[wildfly-dev] Call For Help (Testsuite)

jtgreene
Administrator
We still have a number of intermittent test failures that have been around for over a year now. I'm asking for everyone's help in doing what we can to make them stable. If you submit a PR and you see what looks like an intermittent failure, can you do some investigation and report your findings, even if it is not your area? It would be awesome if you could report what you find to the mailing list and rope in help.

Nearly all of these seem to only occur when virtualization is involved, so if need be we can work out a plan to create either a special run to capture diagnostic info, or I can give access to a dedicated slave.

If anyone has any further ideas on how to tackle these issues I am all ears.

--
Jason T. Greene
WildFly Lead / JBoss EAP Platform Architect
JBoss, a division of Red Hat


_______________________________________________
wildfly-dev mailing list
[hidden email]
https://lists.jboss.org/mailman/listinfo/wildfly-dev

Re: [wildfly-dev] Call For Help (Testsuite)

Brian Stansberry
If anyone sees a hang issue with AutoIgnoredResourcesDomainTestCase,
there's a JIRA[1] and a quick-and-dirty fix[2].

[1] https://issues.jboss.org/browse/WFLY-1511
[2] https://github.com/wildfly/wildfly/pull/4628

On 6/12/13 3:01 PM, Jason Greene wrote:

> We still have a number of intermittent test failures that have been around for over a year now. I'm asking for everyone's help in doing what we can to make them stable. If you submit a PR, and you see what looks like an intermittent failure, can you do some investigation and report your findings even if it is not your area? It would be awesome if you can report what you find to the mailing list, and rope in help.
>
> Nearly all of these seem to only occur when virtualization is involved, so if need be we can work out a plan to create either a special run to capture diagnostic info, or I can give access to a dedicated slave.
>
> If anyone has any further ideas on how to tackle these issues I am all ears.


--
Brian Stansberry
Principal Software Engineer
JBoss by Red Hat

Re: [wildfly-dev] Call For Help (Testsuite)

Andrig Miller
In reply to this post by jtgreene
I recently helped out on a customer case, which came from the Linux side of the house, that had to do with Java when virtualized. The customer produced a test case that emulated the application where the performance slowdown was occurring. It turned out that they were using ReentrantReadWriteLock for some synchronization. I did some thread dumps of the test case, looked at the source too, and found that ReentrantReadWriteLock was using the ...Unsafe.park() native method, which does a futex() system call. That system call has terrible performance when virtualized, compared to using ReentrantLock, which has been changed with JDK 7 to no longer depend on the native code, and does everything in Java, in user space. Performance was so much better, and with some virtualization configuration settings was almost equal to bare metal performance.

That's a long story, just to say, if we have intermittent test suite failures only when run virtualized, perhaps we have some Java code that is using native code that calls futex() in the OS.  If so, it will perform very poorly compared to bare metal, and could be part of the problem with intermittent test failures, especially if there are any timing issues in the test cases.
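
As a rough illustration of the kind of check described above, here is a minimal sketch that lists the threads in the current JVM whose top stack frame is an Unsafe.park() call. It uses only the standard java.lang.management API; the class name ParkedThreadReport and the method parkedThreads() are made up for this example, and matching on a class name ending in "Unsafe" is an assumption about how the frame appears in common JDKs.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the WildFly testsuite.
final class ParkedThreadReport {

    /** Returns the names of live threads whose top stack frame is Unsafe.park(). */
    static List<String> parkedThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> names = new ArrayList<>();
        // No monitor or synchronizer details are needed, only stack traces.
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info == null) {
                continue; // the thread ended while the dump was being taken
            }
            StackTraceElement[] stack = info.getStackTrace();
            if (stack.length == 0) {
                continue;
            }
            StackTraceElement top = stack[0];
            // Assumption: the frame is sun.misc.Unsafe.park or jdk.internal.misc.Unsafe.park.
            if ("park".equals(top.getMethodName()) && top.getClassName().endsWith("Unsafe")) {
                names.add(info.getThreadName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        for (String name : parkedThreads()) {
            System.out.println("parked: " + name);
        }
    }
}
```

A parked thread is not a problem in itself; the report only helps to see which code paths end up in park() while a slow or hanging test is running.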

Just an FYI for something to look out for.

Andy


Re: [wildfly-dev] Call For Help (Testsuite)

Josef Cacek
In reply to this post by jtgreene
It would also be nice if the Jenkins slaves used in the lightning lab were stable enough.
For instance, we're experiencing problems with jenkins-slave5 now, and if a PR gets assigned such a machine (4 times in a row) for the AS testsuite run, it really makes the developer angry.

BR,

-- josef


Re: [wildfly-dev] Call For Help (Testsuite)

Tomaž Cerar-2

On Thu, Jun 13, 2013 at 5:13 AM, Josef Cacek <[hidden email]> wrote:
> It would also be nice if the jenkins slaves used in lightning lab are stable enough.
> For instance we're experiencing problems with the jenkins-slave5 now and if a PR gets assigned such a machine (4 times in a row) for the AS testsuite run, then it really makes the developer angry.


We are working on that; expect some announcements on this topic shortly.
As for slave5, I am looking into it.

--
tomaz




Re: [wildfly-dev] Call For Help (Testsuite)

Tomaž Cerar-2
One more thing.

"Lightning lab" is just one underpowered server used and maintained by the AS team;
we wish we could call it a "lab", as that would imply more than one server :-)

--
tomaz



Re: [wildfly-dev] Call For Help (Testsuite)

Jason T. Greene
In reply to this post by Tomaž Cerar-2
I fixed it last night. The problem was that an OSGi test with an intermittent hang embeds AS (in a Surefire process). So it hung while blocking port 9999, and the cleanup script only killed app server processes. I have adjusted it to kill Maven processes as well.
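
A leftover process like this could in principle be noticed before a run starts by checking whether the port is still bound. The following is only a sketch of that idea, not the actual cleanup script; the class PortCheck and the method isPortFree() are hypothetical names, and port 9999 is the one mentioned above.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;

// Hypothetical pre-run check, not the cleanup script used on lightning.
final class PortCheck {

    /** Returns true if a listening socket can be bound to the port on the loopback address. */
    static boolean isPortFree(int port) {
        try (ServerSocket probe = new ServerSocket(port, 1, InetAddress.getLoopbackAddress())) {
            return true; // the bind succeeded; the probe socket is closed on exit
        } catch (IOException e) {
            return false; // typically "address already in use"
        }
    }

    public static void main(String[] args) {
        int port = 9999; // the port the hung process was blocking
        System.out.println("port " + port + (isPortFree(port) ? " is free" : " is still in use"));
    }
}
```

A check like this only reports the symptom; finding and killing the process that holds the port is still up to the cleanup step.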

I know people think there are issues with lightning, but really it's our code that is the problem.


Re: [wildfly-dev] Call For Help (Testsuite)

Carlo de Wolf
In reply to this post by Andrig Miller
There are intermittent failures running on bare metal as well. I think
they are confined to conditions where the machine is stressed.
I'm especially interested in cases where the GC seems to come through at
an ill-timed moment, at which point it looks like connections that are
in use are actually closed asynchronously.

Carlo


Re: [wildfly-dev] Call For Help (Testsuite)

Tomaž Cerar-2
Carlo,

What are the most common problems your team found when running on bare metal?

Anything standing out?

From the analysis we have on TC (a combination of Linux and Windows jobs), we can see something like this:

Most commonly failing, more than 10% of the time:

org.jboss.as.test.integration.osgi.jta.TransactionTestCase.testUserTransaction (this one fails about 50% of the time)
org.jboss.as.test.integration.osgi.deployment.BundleReplaceTestCase.testDirectBundleReplace

A bit less common, < 10% of the time (no particular order):

org.jboss.as.test.integration.domain.suites.DeploymentManagementTestCase.testFullReplaceViaHash
org.jboss.as.test.manualmode.messaging.HornetQBackupActivationTestCase.testActiveBackupReload
org.jboss.as.test.manualmode.messaging.HornetQBackupActivationTestCase.testLiveReload
org.jboss.as.test.integration.osgi.jndi.JNDITestCase.testInitialContextFactoryBuilderService    
org.jboss.as.test.integration.osgi.jndi.JNDITestCase.testObjectFactoryOSGiService
org.jboss.as.test.manualmode.ejb.shutdown.RemoteCallWhileShuttingDownTestCase.testServerShutdownRemoteCall
org.jboss.as.test.clustering.cluster.ejb2.invalidation.CacheInvalidationTestCase(SYNC-tcp).testCacheInvalidation

Plus a few more that fail less commonly.

From my point of view, the most annoying one is the OSGi testUserTransaction. Thomas, can you please take a look at it?

As the list above shows, most of the intermittent test failures are in either manual mode tests or OSGi; everything else is in the minority, and we are working on fixing it.
There used to be lots of problems with clustering, but those were mostly fixed.

--
tomaz



Re: [wildfly-dev] Call For Help (Testsuite)

Jeff Mesnil
On 14 June 2013 at 11:25, Tomaž Cerar <[hidden email]> wrote:

> org.jboss.as.test.manualmode.messaging.HornetQBackupActivationTestCase.testActiveBackupReload
> org.jboss.as.test.manualmode.messaging.HornetQBackupActivationTestCase.testLiveReload

I've sent a PR[1] for these tests. The intermittent failures occur during a :reload operation when the connection is closed before the operation result returns.
I swallow the exception for this special case, which should prevent any more failures.

Note that these tests would be revisited when we provide a high-level reload operation on the client that takes care of the disconnection/reconnection issues.
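
To make the workaround described above concrete, here is a minimal sketch of the pattern, not the code in the PR. The OperationExecutor interface is a hypothetical stand-in for a management client, and executeTolerantOfReload() is an invented helper name; only the ":reload" operation comes from the description above.

```java
import java.io.IOException;

// Illustration only; the real tests use the WildFly management client, which is not shown here.
final class ReloadSketch {

    /** Hypothetical stand-in for a client that executes management operations. */
    interface OperationExecutor {
        String execute(String operation) throws IOException;
    }

    /**
     * Executes the operation. If the connection drops while a :reload is in flight,
     * the exception is swallowed and null is returned; any other failure is rethrown.
     */
    static String executeTolerantOfReload(OperationExecutor client, String operation) throws IOException {
        try {
            return client.execute(operation);
        } catch (IOException e) {
            if (":reload".equals(operation)) {
                return null; // expected: the server closed the connection before replying
            }
            throw e;
        }
    }

    public static void main(String[] args) throws IOException {
        // Simulates a server that closes the connection before the result returns.
        OperationExecutor dropping = op -> { throw new IOException("connection closed"); };
        System.out.println("reload result: " + executeTolerantOfReload(dropping, ":reload"));
    }
}
```

Limiting the catch to the reload case keeps real connection problems in other operations visible as test failures.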

jeff

[1] https://github.com/wildfly/wildfly/pull/4639

--
Jeff Mesnil
JBoss, a division of Red Hat
http://jmesnil.net/



Re: [wildfly-dev] Call For Help (Testsuite)

Emanuel Muckenhuber

On 06/14/2013 11:33 AM, Jeff Mesnil wrote:
> On 14 June 2013 at 11:25, Tomaž Cerar <[hidden email]> wrote:
>
>> org.jboss.as.test.manualmode.messaging.HornetQBackupActivationTestCase.testActiveBackupReload
>> org.jboss.as.test.manualmode.messaging.HornetQBackupActivationTestCase.testLiveReload
>
> I've sent a PR[1] for these tests. The intermittent failures occur during a :reload operation when the connection is closed before the operation result returns.
> I swallow up the exception for this special case and this should prevent anymore failure.
>
> Note that these tests would be revisited when we provide a high-level reload operation on the client that takes care of the disconnection/reconnection issues.

Were those tests still failing recently? I thought we had fixed the
execution of that :reload operation for good. There are still other
(client side) issues related to reloading a server, but we do have some
better handling for this operation now. So if that is still an issue,
then I have to look into it again.

Emanuel

Re: [wildfly-dev] Call For Help (Testsuite)

Jaikiran Pai
Yes, the HornetQ test which does a "reload" and belongs to the manual
mode testsuite was failing (often) on the reload operation. There's a
JIRA https://issues.jboss.org/browse/WFLY-1518 where I've attached the
whole log file. Let me know if you need more data/logs, we might be able
to get it from the TeamCity environment.

-Jaikiran

Re: [wildfly-dev] Call For Help (Testsuite)

Scott Marlow
In reply to this post by jtgreene
I'm looking into why the
org.hibernate.cache.infinispan.InfinispanRegionFactory.manager instance
variable is null when it should still be non-null (WFLY-1531), which
caused
org.jboss.as.test.clustering.cluster.ejb3.security.FailoverWithSecurityTestCase
to fail with an NPE on Tomaz's test machine.

If anyone sees NPE failures in
org.jboss.as.test.clustering.cluster.ejb3.security.FailoverWithSecurityTestCase,
please let me know.  +10 if you can save the clustered testsuite
server.log so we can see if the previous test completed before starting
FailoverWithSecurityTestCase. :)


Re: [wildfly-dev] Call For Help (Testsuite)

Aleksandar Kostadinov
In reply to this post by jtgreene
Not exactly an answer, but can we have a persistent Amazon EC2 small
instance to run the testsuite? This way such intermittent issues would
be caught soon after being introduced.


Re: [wildfly-dev] Call For Help (Testsuite)

jtgreene
Administrator
Unfortunately, EC2 won't run the test suite, since some of the tests rely on multicast. Lightning is virtualized, but it's at capacity. We may be getting some more hardware, though, which would help.

--
Jason T. Greene
WildFly Lead / JBoss EAP Platform Architect
JBoss, a division of Red Hat



Re: [wildfly-dev] Call For Help (Testsuite)

Aleksandar Kostadinov
Multicast is not a problem. It works fine on the local machine as long
as you don't expect another machine to see the messages.


Re: [wildfly-dev] Call For Help (Testsuite)

jtgreene
Administrator
Ah yes duh. We have to do some Linux hacks to get IPv6 multicast to work on the loopback with 2 addresses, but that should work on EC2 as well.

We'd have to get a large instance because we need more than 1 core to adequately test. So I'll have to ask around about budget.

--
Jason T. Greene
WildFly Lead / JBoss EAP Platform Architect
JBoss, a division of Red Hat


Re: [wildfly-dev] Call For Help (Testsuite)

Aleksandar Kostadinov
My hope was that the test suite would run reliably on a smaller instance:
at most a medium, which is the largest 32-bit instance.

The test suite should also be run on modern multi-CPU/multi-core machines,
but running it on slower ones as well will avoid issues later when
certifying legacy hardware and cloud environments.

Jason Greene wrote, On 06/14/2013 07:01 PM (EEST):

> Ah yes duh. We have to do some Linux hacks to get IPv6 multicast to work on the loopback with 2 addresses, but that should work on EC2 as well.
>
> We'd have to get a large instance because we need more than 1 core to adequately test. So I'll have to ask around about budget.

Re: [wildfly-dev] Call For Help (Testsuite)

jtgreene
Administrator
Well, the problem is that you really need more than one core to catch timing and memory-visibility issues.
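
To illustrate the kind of timing issue meant here, below is a minimal sketch (not taken from the WildFly test suite; the class and method names are made up for this example). Several threads increment a plain int field and an AtomicInteger side by side. With more than one core, the plain counter will typically lose updates, while on a single core the same code often comes out right, which is how such bugs can hide.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class, for illustration only.
final class LostUpdateDemo {
    // Plain field: concurrent increments may overwrite each other.
    private static int plainCounter;
    private static final AtomicInteger atomicCounter = new AtomicInteger();

    /**
     * Starts `threads` workers that each increment both counters
     * `perThread` times. Returns {plain, atomic}.
     */
    static int[] run(int threads, int perThread) {
        plainCounter = 0;
        atomicCounter.set(0);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int n = 0; n < perThread; n++) {
                    plainCounter++;                  // unsynchronized read-modify-write
                    atomicCounter.incrementAndGet(); // atomic read-modify-write
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join(); // join() makes the workers' writes visible to this thread
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return new int[] { plainCounter, atomicCounter.get() };
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        int[] result = run(4, 100_000);
        System.out.println("cores=" + cores
                + " plain=" + result[0]
                + " atomic=" + result[1]
                + " expected=" + (4 * 100_000));
    }
}
```

The atomic counter always reaches the expected total. The plain counter can only match it or fall short, and how far it falls short varies from run to run and from machine to machine.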

On Jun 14, 2013, at 11:40 AM, Aleksandar Kostadinov <[hidden email]> wrote:

> My hope was that the test suite would run reliably on a smaller instance: at most a medium, which is the largest 32-bit instance.
>
> The test suite should also be run on modern multi-CPU/multi-core machines, but running it on slower ones as well will avoid issues later when certifying legacy hardware and cloud environments.

--
Jason T. Greene
WildFly Lead / JBoss EAP Platform Architect
JBoss, a division of Red Hat


Re: [wildfly-dev] Call For Help (Testsuite)

Aleksandar Kostadinov
In my experience, slow machines tend to reveal a lot of problems. Those
may be test suite deficiencies more than project issues, but they still
do a lot of harm when found long after being introduced.

I'm all for running the test suite at least daily on a range of machine
types, but I think a slow run and a fast run would cover most of the
benefit. A large instance will be an improvement in any case, though.
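
One common source of the test suite deficiencies mentioned above is a hard-coded timeout that is fine on a fast machine and too short on a slow one. A minimal sketch of one way to handle that, assuming a percentage factor supplied as a system property (the class name and the property name "example.timeout.factor" are invented for this example):

```java
// Hypothetical helper, for illustration only.
final class ScaledTimeout {
    // Made-up property name; 100 means "leave timeouts unchanged".
    static final String FACTOR_PROPERTY = "example.timeout.factor";

    /** Scales a base timeout by a percentage factor (100 = unchanged). */
    static long adjust(long baseMillis, int factorPercent) {
        if (factorPercent <= 0) {
            throw new IllegalArgumentException("factor must be positive");
        }
        return baseMillis * factorPercent / 100;
    }

    /** Scales using the factor from the system property, defaulting to 100. */
    static long adjust(long baseMillis) {
        return adjust(baseMillis, Integer.getInteger(FACTOR_PROPERTY, 100));
    }

    public static void main(String[] args) {
        // e.g. java -Dexample.timeout.factor=250 ScaledTimeout.java
        System.out.println("adjusted timeout: " + adjust(5_000) + " ms");
    }
}
```

A slow run could then pass a larger factor and a fast run the default, so the same tests are exercised on both kinds of machine without editing each timeout by hand.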

Jason Greene wrote, On 06/14/2013 07:46 PM (EEST):

> Well the problem is that you really need more than one core to catch timing & memory visibility issues.