Clustering tests


Clustering tests

kkhan
Hi,

I've had a vague thought. I'm not familiar with the clustering tests, but at a glance there seems to be a lot of stopping and starting of servers going on? These tests run very slowly, so if all this stopping/starting is contributing to the slowness, it could be an idea to explore changing the container so that:

stop -> executes a reload on the server so that it is reloaded in admin-only mode
start -> executes a reload on the server so that it is reloaded in 'normal' mode

Radoslav, do you see any issues with this?
_______________________________________________
jboss-as7-dev mailing list
[hidden email]
https://lists.jboss.org/mailman/listinfo/jboss-as7-dev

Re: Clustering tests

Radoslav Husar
On 03/06/2012 06:34 PM, Kabir Khan wrote:
> Hi,
>
> I've had a vague thought. I'm not familiar with the clustering tests
> but at a glance there seems to be a lot of stopping and starting of
> servers going on? These tests run very slowly, so if all this

Hi Kabir,

Thanks for the thought. Yes, that is true, and it surely contributes to
the slowness.

The reason for all the stopping is that we need to use manual
containers (for simulating failover, state transfer, undeploy,
singleton, etc.). The containers are set to manual mode and the
deployments are unmanaged.

Unfortunately, this currently means that the containers are stopped
after *each* test class and then started again. ARQ does this. We would
also need to be able to order the test class executions. In most of these
cases (currently all), the running container could be re-used (even
though we shut it down in the middle of the test).

For this we would probably need support in Arquillian (class ordering +
not turning off the container after each class?). Aslak, WDYT?

> stopping/starting is contributing to the slowness, it could be an
> idea to explore changing the container so that
>
> stop -> executes a reload on the server so that it is reloaded in
>   admin-only mode
> start -> executes a reload on the server so that it is reloaded in
>   'normal' mode
>
> Radoslav, do you see any issues with this?

How would this actually work? Use auto containers and use stop and start
to simulate up and down without having to do any fix in ARQ? Hmm, that's
interesting.

This would, however, prohibit stopping by kill or a custom "kill", and
probably the use of different container configurations as well.

However, this seems to me too much of a workaround (for instance the
server's JVM never shuts down).

Rado

Re: Clustering tests

Richard Achmatowicz
Kabir, what exactly is admin mode? Is it the same server profile but
with controls on communication from clients?

The problem is that you're replacing a dead server (no communication
channels) with a live server in admin mode (with live communication
channels). So IIUC, you depend on the admin mode working correctly to
block communications from the outside. Also, if JGroups detected failed
group members using a standard OS ping mechanism (which it doesn't),
then this could change JGroups view of failed vs non-failed hosts - that
is, if admin mode did not also block pings.



Re: Clustering tests

kkhan
In reply to this post by Radoslav Husar
I've answered as best I can without having looked too much into it.
On 8 Mar 2012, at 16:28, Radoslav Husar wrote:

> On 03/06/2012 06:34 PM, Kabir Khan wrote:
>> Hi,
>>
>> I've had a vague thought. I'm not familiar with the clustering tests
>> but at a glance there seems to be a lot of stopping and starting of
>> servers going on? These tests run very slowly, so if all this
>
> Hi Kabir,
>
> Thanks for the thought. Yes, that is true, and it surely contributes to
> the slowness.
>
> The reason for all the stopping is that we need to use manual
> containers (for simulating failover, state transfer, undeploy,
> singleton, etc.). The containers are set to manual mode and the
> deployments are unmanaged.
Understood :-)

>
> Unfortunately, this currently means that the containers are stopped
> after *each* test class and then started again. ARQ does this. We would
> also need to be able to order the test class executions. In most of these
> cases (currently all), the running container could be re-used (even
> though we shut it down in the middle of the test).
>
> For this we would probably need support in Arquillian (class ordering +
> not turning off the container after each class?). Aslak, WDYT?
>
>> stopping/starting is contributing to the slowness, it could be an
>> idea to explore changing the container so that
>>
>> stop -> executes a reload on the server so that it is reloaded in
>>   admin-only mode
>> start -> executes a reload on the server so that it is reloaded in
>>   'normal' mode
>>
>> Radoslav, do you see any issues with this?
>
> How would this actually work? Use auto containers and use stop and start
> to simulate up and down without having to do any fix in ARQ? Hmm, that's
> interesting.

I think changing ManagedDeployableContainer in the AS source tree might work, or alternatively creating another container for clustering. I'll just keep calling it ManagedDeployableContainer for now :-) ManagedDeployableContainer is where the start/stop calls seem to end up. Some tracking of whether it is currently started or stopped could be added, so that it knows whether to start the server for real (on first start) or to reload it out of admin-only mode. A problem is when to do the real stop rather than the reload into admin-only, e.g. at the end of the suite, but there might be some hooks that can be used? I guess it really depends on whether the ManagedDeployableContainer instances behind the ContainerController are the same for each test or not.

I've not really looked at it enough to know about the ordering stuff you mention.

>
> This would, however, prohibit stopping by kill or a custom "kill", and
> probably the use of different container configurations as well.
Kill probably should kill the process, and be tracked in the ManagedDeployableContainer so that the next start() does a real start.
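
To make the state tracking described above a bit more concrete, here is a minimal sketch in Java. It assumes a hypothetical `ServerProcess` abstraction for launching, reloading and killing the server; none of the names below come from Arquillian or the AS source tree, and the real ManagedDeployableContainer may be structured quite differently. The question of when to do the real stop (e.g. at the end of the suite) is deliberately left open.

```java
import java.util.ArrayList;
import java.util.List;

class ReloadingContainerSketch {

    /** Hypothetical low-level control of the server; not an Arquillian or AS7 API. */
    interface ServerProcess {
        void launch();                  // fork a new server JVM
        void reload(boolean adminOnly); // equivalent of :reload(admin-only=...)
        void destroy();                 // kill the server process
    }

    /** Container whose stop()/start() reload the server instead of restarting the JVM. */
    static class ReloadingContainer {
        enum State { NOT_RUNNING, NORMAL, ADMIN_ONLY }

        private final ServerProcess process;
        private State state = State.NOT_RUNNING;

        ReloadingContainer(ServerProcess process) {
            this.process = process;
        }

        void start() {
            switch (state) {
                case NOT_RUNNING:
                    process.launch();      // first start, or first start after a kill
                    break;
                case ADMIN_ONLY:
                    process.reload(false); // cheap path: reload into normal mode
                    break;
                case NORMAL:
                    return;                // already up, nothing to do
            }
            state = State.NORMAL;
        }

        void stop() {
            if (state == State.NORMAL) {
                process.reload(true);      // simulate "down" by reloading into admin-only
                state = State.ADMIN_ONLY;
            }
        }

        void kill() {
            if (state != State.NOT_RUNNING) {
                process.destroy();         // a kill really ends the process
                state = State.NOT_RUNNING; // so the next start() does a real start
            }
        }

        State state() {
            return state;
        }
    }

    /** Fake process that only records the calls it receives, for demonstration. */
    static class RecordingProcess implements ServerProcess {
        final List<String> calls = new ArrayList<>();

        public void launch() { calls.add("launch"); }
        public void reload(boolean adminOnly) { calls.add("reload(admin-only=" + adminOnly + ")"); }
        public void destroy() { calls.add("destroy"); }
    }

    public static void main(String[] args) {
        RecordingProcess process = new RecordingProcess();
        ReloadingContainer container = new ReloadingContainer(process);
        container.start(); // real start
        container.stop();  // reload into admin-only
        container.start(); // reload back into normal mode
        container.kill();  // real kill
        container.start(); // real start again
        // prints [launch, reload(admin-only=true), reload(admin-only=false), destroy, launch]
        System.out.println(process.calls);
    }
}
```

Whether a single instance like this would survive across test classes depends on the open question above, i.e. whether the container instances behind the ContainerController are shared between tests.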

> However, this seems to me too much of a workaround (for instance the
> server's JVM never shuts down).
>
> Rado

Re: Clustering tests

kkhan
In reply to this post by Richard Achmatowicz
A reload stops all running services. Admin mode is a server process with the management interfaces enabled but NO subsystems started. So reloading into admin mode means that the process has no JGroups stack running. Reloading out of admin mode would then install the normal subsystem services again.

To see it in action, try these commands from the CLI:
[standalone@localhost:9999 /] :reload(admin-only=true)
{"outcome" => "success"}
[standalone@localhost:9999 /] :reload(admin-only=false)
{"outcome" => "success"}

And look out for how #@*%ing fast a reload is compared to a normal start :-)


Re: Clustering tests

kkhan

On 8 Mar 2012, at 16:56, Kabir Khan wrote:
>
> And look out for how #@*%ing fast a reload is compared to a normal start :-)

Mainly due to the classes already having been loaded

Re: Clustering tests

Brian Stansberry
In reply to this post by Richard Achmatowicz
On 3/8/12 10:48 AM, Richard Achmatowicz wrote:
> Kabir, what exactly is admin mode? Is it the same server profile but
> with controls on communication from clients?
>

Simplest explanation:

[standalone@localhost:9999 /] :read-operation-description(name=reload)
{
     "outcome" => "success",
     "result" => {
         "operation-name" => "reload",
         "description" => "Reloads the server by shutting down all its services and starting again. The JVM itself is not restarted.",
         "request-properties" => {"admin-only" => {
             "type" => BOOLEAN,
             "description" => "Whether the server should start in running mode ADMIN_ONLY when it restarts. An ADMIN_ONLY server will start any configured management interfaces and accept management requests, but will not start services used for handling end user requests.",
             "expressions-allowed" => false,
             "required" => false,
             "nillable" => true
         }},
         "reply-properties" => {},
         "read-only" => false
     }
}

> The problem is that you're replacing a dead server (no communication
> channels) with a live server in admin mode (with live communication
> channels). So IIUC, you depend on the admin mode working correctly to
> block communications from the outside.

True. It's a tradeoff. I certainly wouldn't advocate doing this every
place we want to restart a server. But in some places it might make sense.
If we end up stopping and starting servers 100 times over the course of
a clustering testsuite run, some of those may not really need a full
process restart.

If an admin-only server has a running JGroups channel, the JGroups
subsystem is broken. When you start in admin-only mode, the
Stage.RUNTIME handlers for subsystems should not execute.
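
As a rough illustration of that rule, here is a toy model in Java. It is only a sketch under stated assumptions: `ToySubsystem`, `boot` and `startedServices` are made-up names, and this is not the AS7 controller API. It only models the idea that a subsystem's configuration is always processed, while its runtime services are started only in normal mode.

```java
import java.util.ArrayList;
import java.util.List;

class RunningModeSketch {

    enum RunningMode { NORMAL, ADMIN_ONLY }

    /** Toy stand-in for a subsystem; hypothetical, not the real handler API. */
    static final class ToySubsystem {
        final String name;
        boolean modelPopulated;         // configuration is read in every mode
        boolean runtimeServicesStarted; // e.g. a JGroups channel

        ToySubsystem(String name) {
            this.name = name;
        }

        void boot(RunningMode mode) {
            modelPopulated = true;
            // The runtime step is skipped entirely when the server is admin-only.
            runtimeServicesStarted = (mode == RunningMode.NORMAL);
        }
    }

    /** Boots the named subsystems in the given mode and returns those with running services. */
    static List<String> startedServices(RunningMode mode, String... subsystemNames) {
        List<String> started = new ArrayList<>();
        for (String subsystemName : subsystemNames) {
            ToySubsystem subsystem = new ToySubsystem(subsystemName);
            subsystem.boot(mode);
            if (subsystem.runtimeServicesStarted) {
                started.add(subsystemName);
            }
        }
        return started;
    }

    public static void main(String[] args) {
        // prints []
        System.out.println(startedServices(RunningMode.ADMIN_ONLY, "jgroups", "infinispan"));
        // prints [jgroups, infinispan]
        System.out.println(startedServices(RunningMode.NORMAL, "jgroups", "infinispan"));
    }
}
```

In this model an admin-only server with an open channel can only come from a subsystem that ignores the running mode, which is the sense in which such a subsystem would be broken.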

> Also, if JGroups detected failed
> group members using a standard OS ping mechanism (which it doesn't),
> then this could change JGroups view of failed vs non-failed hosts - that
> is, if admin mode did not also block pings.

If JGroups is using simple OS pings (which it isn't) to validate the
existence of a working peer, then JGroups is badly broken. ;) And if it
started doing that and it found the socket for the channel open on an
admin-only peer, then the JGroups subsystem is broken.

Bottom line, I think this is a good thing to think about and keep in
mind as the clustering testsuite grows. It's not a no-brainer thing to
do, but it is one possible way to slow down the rapid increase in
testsuite execution time that we are seeing. Execution time is a real
issue that impacts quality, since a major quality factor is our ability
to have all patches manually reviewed and tested before they are merged.


--
Brian Stansberry
Principal Software Engineer
JBoss by Red Hat