Pooling EJB Session Beans per default


Re: Pooling EJB Session Beans per default

Radoslaw Rodak

On 06.08.2014 at 19:48, Jason Greene <[hidden email]> wrote:


On Aug 6, 2014, at 12:21 PM, Andrig Miller <[hidden email]> wrote:



----- Original Message -----
From: "Jason Greene" <[hidden email]>
To: "Andrig Miller" <[hidden email]>
Cc: "Bill Burke" <[hidden email]>, [hidden email]
Sent: Wednesday, August 6, 2014 11:08:02 AM
Subject: Re: [wildfly-dev] Pooling EJB Session Beans per default


On Aug 6, 2014, at 10:49 AM, Andrig Miller <[hidden email]>
wrote:



----- Original Message -----
From: "Bill Burke" <[hidden email]>
To: [hidden email]
Sent: Wednesday, August 6, 2014 9:30:06 AM
Subject: Re: [wildfly-dev] Pooling EJB Session Beans per default




This conversation is a perfect example of misinformation that
causes us performance and scalability problems within our code
bases.

It’s just a surprising result. The pool saves a few allocations, but
it also has the cost of concurrency usage which can trigger
blocking, additional barriers, and busy looping on CAS. You also
still have object churn in the underlying pool data structures that
occurs per invocation since every invocation is a check-out and a
check-in (requires a new node object instance), and if the semaphore
blocks you have additional allocation for the entry in the wait
queue. You factor in the remaining allocation savings relative to
other allocations that are required for the invocation, and it
should be a very small percentage. For that very small percentage to
lead to several times a difference in performance to me hints at
other factors being involved.
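
To make the costs listed above concrete, here is a minimal sketch of a semaphore-bounded pool. It is an assumption for illustration only: the class name BoundedPoolSketch and its methods are hypothetical, and this is not the actual StrictMaxPool code. It shows where each invocation touches shared state: the semaphore on check-out, and a queue node allocated on every check-in.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/** Hypothetical bounded pool, for illustration only; not WildFly's StrictMaxPool. */
final class BoundedPoolSketch<T> {
    // Caps concurrent check-outs; callers block here once the cap is reached.
    private final Semaphore permits;
    // Lock-free queue of idle instances; updated with CAS operations.
    private final ConcurrentLinkedQueue<T> idle = new ConcurrentLinkedQueue<>();
    private final Supplier<T> factory;

    BoundedPoolSketch(int maxSize, Supplier<T> factory) {
        this.permits = new Semaphore(maxSize);
        this.factory = factory;
    }

    /** Check-out: waits when maxSize instances are already in use. */
    T acquire() {
        permits.acquireUninterruptibly();
        T instance = idle.poll();
        // Create a new instance only when no idle one is available.
        return instance != null ? instance : factory.get();
    }

    /** Check-in: offer() allocates a fresh queue node on every call. */
    void release(T instance) {
        idle.offer(instance);
        permits.release();
    }

    int availablePermits() {
        return permits.availablePermits();
    }
}
```

Even in this small sketch, one bean allocation saved is traded for a semaphore acquire and release plus a queue node per invocation, which is the trade-off described above.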


All logically thought through.  At a 15% lower transaction rate than we are doing now, we saw 4 gigabytes per second of object allocation.  We, with Sanne doing most of the work, managed to get that down to 3 gigabytes per second (I would have loved to get it to 2).  Much of that was Hibernate allocations, and of course that was with pooling on.  We have not spent the time to pinpoint the exact differences, memory and other, between having pooling on vs. off.  Our priority has been to continue scaling the workload and fix any problems we see as a result.  We have managed to increase the transaction rate another 15% in the last couple of months, but still have another 17+% to go on a single JVM before we start looking at two JVMs for the testing.

Once we get to our goal, I would love to put this on our list of tasks, so we can get the specific facts, and instead of talking theory, we will know exactly what can and cannot be done, and whether no pooling could ever match pooled.

Fair enough, and I certainly didn’t mean to imply that such work should fall to your team; I was just speaking generally. In any case, what I would really like us to achieve is a default implementation that performs generally well on all usage patterns, with no tuning required. Since we know that initialization can be costly for some applications' usage of SLSB, such an implementation will definitely require a form of pooling.

I suspect that a thread local based design with the pooling tied to worker threads will give us this. Alternatively a shared pool which is auto-tuned to match might be worth looking into.
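
A thread-local design of the kind suggested here could look roughly like the sketch below. This is an assumption for illustration: ThreadLocalPoolSketch is a hypothetical name, not an existing WildFly class, and a real container would also have to handle lifecycle callbacks and cleanup when worker threads are retired.

```java
import java.util.function.Supplier;

/** Hypothetical per-thread instance holder; illustrative only, not a WildFly API. */
final class ThreadLocalPoolSketch<T> {
    private final ThreadLocal<T> perThread;

    ThreadLocalPoolSketch(Supplier<T> factory) {
        // Each worker thread lazily creates its own instance on first use.
        this.perThread = ThreadLocal.withInitial(factory);
    }

    /** No lock, semaphore, or queue node: the calling thread reuses its own instance. */
    T get() {
        return perThread.get();
    }
}
```

The pool size then follows the worker thread count without tuning. One caveat of this simplified form: a nested call to the same bean type on the same thread would receive the same instance, so a real design would need to account for re-entrancy.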

If there is anyone lurking who wishes to contribute in this area, speak up, and I’ll work with you on it. As doge would say “Such Fun. Much Glory” :)

I would expect something better than this :-)))




--
Jason T. Greene
WildFly Lead / JBoss EAP Platform Architect
JBoss, a division of Red Hat


_______________________________________________
wildfly-dev mailing list
[hidden email]
https://lists.jboss.org/mailman/listinfo/wildfly-dev



Re: Pooling EJB Session Beans per default

Radoslaw Rodak
In reply to this post by Andrig Miller

On 06.08.2014 at 19:21, Andrig Miller <[hidden email]> wrote:



----- Original Message -----
From: "Jason Greene" <[hidden email]>
To: "Andrig Miller" <[hidden email]>
Cc: "Bill Burke" <[hidden email]>, [hidden email]
Sent: Wednesday, August 6, 2014 11:08:02 AM
Subject: Re: [wildfly-dev] Pooling EJB Session Beans per default


On Aug 6, 2014, at 10:49 AM, Andrig Miller <[hidden email]>
wrote:



----- Original Message -----
From: "Bill Burke" <[hidden email]>
To: [hidden email]
Sent: Wednesday, August 6, 2014 9:30:06 AM
Subject: Re: [wildfly-dev] Pooling EJB Session Beans per default



On 8/6/2014 10:50 AM, Andrig Miller wrote:


----- Original Message -----
From: "Radoslaw Rodak" <[hidden email]>
To: [hidden email]
Sent: Tuesday, August 5, 2014 6:51:03 PM
Subject: Re: [wildfly-dev] Pooling EJB Session Beans per default


On 06.08.2014 at 00:36, Bill Burke <[hidden email]> wrote:



On 8/5/2014 3:54 PM, Andrig Miller wrote:
It's a horrible theory. :)  How many EJB instances of a given type are
created per request?  Generally only 1.  1 instance of one object of one
type!  My $5 bet is that if you went into EJB code and started counting
how many object allocations were made per request, you'd lose count very
quickly.  Better yet, run a single remote EJB request through a perf
tool and let it count the number of allocations for you.  It will be
greater than 1.  :)

Maybe the StrictMaxPool has an effect on performance because it creates
a global synchronization bottleneck.  Throughput is lower and you end up
having fewer concurrent per-request objects being allocated and GC'd.


The number per request, while relevant, is only part of the story.
The number of concurrent requests happening in the server dictates the
object allocation rate.  Given enough concurrency, even a very small
number of object allocations per request can create an object allocation
rate that can no longer be sustained.


I'm saying that the number of concurrent requests might not dictate
object allocation rate.  There are probably a number of allocations that
happen after the EJB instance is obtained, i.e. interception chains,
contexts, etc.  If StrictMaxPool blocks until a new instance is
available, then there would be fewer allocations per request as blocking
threads would be serialized.


Scenario 1)
------------------
Let's say we have a pool of 100 stateless EJBs and a constant load of
50 requests per second, processed by 50 EJBs from the pool in one second.
After 1000 seconds, how many new EJB instances will have been created
with a pool? Answer: 0 new EJBs, worst case 100 EJBs in the pool. Of
course object allocation is much higher, since one EJB call leads to
many objects beyond the one EJB, but let's look at the situation
without a pool.

50 requests/s * 1000 seconds = worst case 50,000 EJB instances on the
Java heap, where 1 EJB might have many objects, as long as garbage
collection has not been triggered. That sounds to me like filling the
JVM heap faster and probably having more frequent GC, depending on the
GC strategy.

Scenario 2)
------------------
Same as before: the load is still 50 requests per second, BUT the EJB
method call takes 10 s.
After 10 s we have 500 EJB instances without a pool, after 11 s 550 - 10
= 540 EJB instances, after 12 s 580 EJBs … after some time very bad
perf … full GC … and maybe OutOfMemory.

So… performance advantage could also turn into disadvantage :-)
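
The headline figures in the two scenarios can be checked with a few lines of arithmetic. The sketch below is only a back-of-envelope aid; the class and method names are hypothetical, and the use of Little's law (requests in flight = arrival rate x time per call) for Scenario 2 is an assumption brought in here, not something stated in the thread.

```java
/** Back-of-envelope arithmetic for the two scenarios; inputs are the figures quoted above. */
final class ScenarioArithmetic {

    /** Without a pool, one bean instance is created per request. */
    static long instancesCreatedWithoutPool(long requestsPerSecond, long seconds) {
        return requestsPerSecond * seconds;
    }

    /** Little's law: average requests in flight = arrival rate x time per call. */
    static long instancesInFlight(long requestsPerSecond, long callSeconds) {
        return requestsPerSecond * callSeconds;
    }

    public static void main(String[] args) {
        // Scenario 1: 50 requests/s for 1000 s
        System.out.println(instancesCreatedWithoutPool(50, 1000)); // prints 50000
        // Scenario 2: 50 requests/s, each call taking 10 s
        System.out.println(instancesInFlight(50, 10));             // prints 500
    }
}
```

Instances beyond the in-flight count are garbage; how long they stay on the heap depends on when the collector runs, which is the part neither figure captures.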


Whoever is investigating StrictMaxPool, or EJB pooling in general,
should stop.  It's pointless.

Agreed, pools are outdated… but something like a WorkManager with min
and max threads, or even better always no fewer than X idle threads,
would be useful :-)

Radek


The scenarios above are what is outdated.  Fifty requests per
second isn't any load at all!  We have hundreds of thousands of
clients that we have to scale to, and lots more than 50 requests
per second.

What you mean to say is that you need to scale to hundreds of thousands
of clients on meaningless no-op benchmarks. :)  I do know that the old
SpecJ Java EE benchmarks artificially made EJB pooling important, as
process-intensive calculation results were cached in these instances.
But real-world apps don't use this feature/anti-pattern.


I am not talking about a meaningless no-op benchmark, but a
benchmark that does lots of work.  We don't use meaningless no-op
benchmarks on the performance team, with some exception for
microbenchmarks that we have carefully crafted that model the
interactions for a specific component within the context of how it
is actually used for a real application.

Also, however crappy it was, I did implement an EJB container at one
time in my career.  :)  I know for a fact that there are a number of
per-request internal support objects that need to be allocated.  Let's
count:

* The argument array (for reflection)
* Each argument of the method call
* The response object
* Interceptor context object
* The interceptor context attribute map
* EJBContext
* Subject, Principal, role mappings
* Transaction context
* The message object(s) specific to the remote EJB protocol

Starts to add up, huh?  I'm probably missing a bunch more.  Throw in
interaction with JPA and you end up with even more per-request objects
being allocated.  You still believe pooling one EJB instance matters?


See John O'Hara's post which shows our non-meaningless benchmark
and the difference that pooling makes vs. non-pooling.  It is a
dramatic difference to say the least.

There is certainly a correlation identified between the results of
this benchmark and the use of pooling. However the underlying cause
of the resulting difference is still unknown. If we knew
definitively how and why this happens it would help in optimizing
this further. As an example, if it turned out to be some secondary
factor, like the throttling aspect of the pool, then eliminating
these allocations (and others) with a zero-tuning approach, like
thread local pooling, would offer little to no improvement. If we
discovered that it is indeed extreme object allocation, and that it came
from thousands of nested calls in a request, then having a temporary
per-request thread local cache would dramatically improve the
results, and be cheap/quick to implement vs a full thread local
solution. If there is a bug in our code somewhere where under
certain situations we create hundreds of objects, when we should be
creating 10s, and the pool covers that up, fixing that bug and
removing the pool could lead to better results. If it turns out
there is only 3% extra churn but that extra churn causes a 10x perf
reduction in GC, then we better understand those limits and
potentially work with the openjdk team in that area.
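
The "temporary per-request thread local cache" mentioned above could be sketched as follows. This is a hypothetical illustration, not WildFly code: the class RequestScopedCacheSketch and its methods are invented names, and the sketch assumes the container calls endRequest() when the outermost invocation completes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical per-request cache; names are illustrative, not WildFly code. */
final class RequestScopedCacheSketch {
    // One map per thread, holding the instances created during the current request.
    private static final ThreadLocal<Map<Class<?>, Object>> CACHE =
            ThreadLocal.withInitial(HashMap::new);

    /** Returns the instance cached for this request, creating it on first use. */
    static <T> T instanceFor(Class<T> type, Supplier<T> factory) {
        Map<Class<?>, Object> cache = CACHE.get();
        Object cached = cache.get(type);
        if (cached == null) {
            cached = factory.get();
            cache.put(type, cached);
        }
        return type.cast(cached);
    }

    /** Assumed to be called when the outermost request completes. */
    static void endRequest() {
        CACHE.remove();
    }
}
```

With thousands of nested calls to the same bean type in one request, this would create one instance per type per request instead of one per call. It only helps in that specific case, and it assumes the nested calls are sequential rather than re-entrant.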



This conversation is a perfect example of misinformation that
causes us performance and scalability problems within our code
bases.

It’s just a surprising result. The pool saves a few allocations, but
it also has the cost of concurrency usage which can trigger
blocking, additional barriers, and busy looping on CAS. You also
still have object churn in the underlying pool data structures that
occurs per invocation since every invocation is a check-out and a
check-in (requires a new node object instance), and if the semaphore
blocks you have additional allocation for the entry in the wait
queue. You factor in the remaining allocation savings relative to
other allocations that are required for the invocation, and it
should be a very small percentage. For that very small percentage to
lead to several times a difference in performance to me hints at
other factors being involved.


All logically thought through.  At a 15% lower transaction rate than we are doing now, we saw 4 gigabytes per second of object allocation.  We, with Sanne doing most of the work, managed to get that down to 3 gigabytes per second (I would have loved to get it to 2).  Much of that was Hibernate allocations, and of course that was with pooling on.  We have not spent the time to pinpoint the exact differences, memory and other, between having pooling on vs. off.  Our priority has been to continue scaling the workload and fix any problems we see as a result.  We have managed to increase the transaction rate another 15% in the last couple of months, but still have another 17+% to go on a single JVM before we start looking at two JVMs for the testing.

Once we get to our goal, I would love to put this on our list of tasks, so we can get the specific facts, and instead of talking theory, we will know exactly what can and cannot be done, and whether no pooling could ever match pooled.

I’m lucky my employer paid for this APM tool for us... it saved me many weeks of work!



Andy

--
Jason T. Greene
WildFly Lead / JBoss EAP Platform Architect
JBoss, a division of Red Hat




Re: Pooling EJB Session Beans per default

Andrig Miller
In reply to this post by jtgreene


----- Original Message -----

> From: "Jason Greene" <[hidden email]>
> To: "Bill Burke" <[hidden email]>
> Cc: [hidden email]
> Sent: Wednesday, August 6, 2014 11:37:07 AM
> Subject: Re: [wildfly-dev] Pooling EJB Session Beans per default
>
>
> On Aug 6, 2014, at 10:30 AM, Bill Burke <[hidden email]> wrote:
>
> > What you mean to say is that you need to scale to hundreds of
> > thousands of clients on meaningless no-op benchmarks. :)  I do know
> > that the old SpecJ Java EE benchmarks artificially made EJB pooling
> > important, as process-intensive calculation results were cached in
> > these instances. But real-world apps don't use this
> > feature/anti-pattern.
>
> If the benchmark in question is doing that, that would most
> definitely explain this. I mean we know the following can perform
> poorly without pooling:
>
> 1. Expensive initialization in @PostConstruct

There are no @PreConstruct or @PostConstruct methods.

> 2. Lazy expensive initialization in an invocation (what you allude to
> above)

No caching or other lazy initialization is in the stateless session beans.

> 3. Expensive initialization in a constructor

I'd have to look at the code again, but I don't remember anything like this.

>
> Large allocations of objects count as expensive initialization.
>
> --
> Jason T. Greene
> WildFly Lead / JBoss EAP Platform Architect
> JBoss, a division of Red Hat
>
>

Re: Pooling EJB Session Beans per default

Andrig Miller
In reply to this post by jtgreene


----- Original Message -----

> From: "Jason Greene" <[hidden email]>
> To: "Andrig Miller" <[hidden email]>
> Cc: "Bill Burke" <[hidden email]>, [hidden email]
> Sent: Wednesday, August 6, 2014 11:48:53 AM
> Subject: Re: [wildfly-dev] Pooling EJB Session Beans per default
>
>
> On Aug 6, 2014, at 12:21 PM, Andrig Miller <[hidden email]>
> wrote:
>
> >
> >
> > ----- Original Message -----
> >> From: "Jason Greene" <[hidden email]>
> >> To: "Andrig Miller" <[hidden email]>
> >> Cc: "Bill Burke" <[hidden email]>, [hidden email]
> >> Sent: Wednesday, August 6, 2014 11:08:02 AM
> >> Subject: Re: [wildfly-dev] Pooling EJB Session Beans per default
> >>
> >>
> >> On Aug 6, 2014, at 10:49 AM, Andrig Miller <[hidden email]>
> >> wrote:
> >>
> >>>
> >>>
> >>> ----- Original Message -----
> >>>> From: "Bill Burke" <[hidden email]>
> >>>> To: [hidden email]
> >>>> Sent: Wednesday, August 6, 2014 9:30:06 AM
> >>>> Subject: Re: [wildfly-dev] Pooling EJB Session Beans per default
> >>>>
> >>>
>
>
> >>> This conversation is a perfect example of misinformation that
> >>> causes us performance and scalability problems within our code
> >>> bases.
> >>
> >> It’s just a surprising result. The pool saves a few allocations,
> >> but it also has the cost of concurrency usage which can trigger
> >> blocking, additional barriers, and busy looping on CAS. You also
> >> still have object churn in the underlying pool data structures
> >> that occurs per invocation since every invocation is a check-out
> >> and a check-in (requires a new node object instance), and if the
> >> semaphore blocks you have additional allocation for the entry in
> >> the wait queue. You factor in the remaining allocation savings
> >> relative to other allocations that are required for the
> >> invocation, and it should be a very small percentage. For that
> >> very small percentage to lead to several times a difference in
> >> performance to me hints at other factors being involved.
> >>
> >
> > All logically thought through.  At a 15% lower transaction rate
> > than we are doing now, we saw 4 gigabytes per second of object
> > allocation.  We, with Sanne doing most of the work, managed to get
> > that down to 3 gigabytes per second (I would have loved to get it
> > to 2).  Much of that was Hibernate allocations, and of course that
> > was with pooling on.  We have not spent the time to pinpoint the
> > exact differences, memory and other, between having pooling on vs.
> > off.  Our priority has been to continue scaling the workload and
> > fix any problems we see as a result.  We have managed to increase
> > the transaction rate another 15% in the last couple of months, but
> > still have another 17+% to go on a single JVM before we start
> > looking at two JVMs for the testing.
> >
> > Once we get to our goal, I would love to put this on our list of
> > tasks, so we can get the specific facts, and instead of talking
> > theory, we will know exactly what can and cannot be done, and
> > whether no pooling could ever match pooled.
>
> Fair enough, and I certainly didn’t mean to imply that such work
> should fall to your team; I was just speaking generally. In any case,
> what I would really like us to achieve is a default implementation
> that performs generally well on all usage patterns, with no tuning
> required. Since we know that initialization can be costly for some
> applications' usage of SLSB, such an implementation will definitely
> require a form of pooling.
>

No problem, and I think we should do this, we just cannot afford to spend the time right now.

We share the same goal.  I too would love to have a default configuration that requires no tuning and that performs and scales really well.

> I suspect that a thread local based design with the pooling tied to
> worker threads will give us this. Alternatively a shared pool which
> is auto-tuned to match might be worth looking into.
>
> If there is anyone lurking who wishes to contribute in this area,
> speak up, and I’ll work with you on it. As doge would say “Such
> Fun. Much Glory” :)
>
> --
> Jason T. Greene
> WildFly Lead / JBoss EAP Platform Architect
> JBoss, a division of Red Hat
>
>
