Examples of heuristic outcome and other transaction errors

Misty Stanley-Jones
Hi all,

I am documenting Transactions for EAP6, and I need some information and examples about:

* Heuristic outcomes
* Error handling in general

Specifically I would like to see examples of error messages, and some troubleshooting techniques. I am at a loss to know how to write examples that will generate errors. Thanks for your help!

--


Misty Stanley-Jones, RHCE
Content Author, ECS Brisbane
☺: misty (Freenode IRC) ✉: [hidden email] ☏: +61 7 3514 8105 ☏: 88105

_______________________________________________
jboss-as7-dev mailing list
[hidden email]
https://lists.jboss.org/mailman/listinfo/jboss-as7-dev

Re: Examples of heuristic outcome and other transaction errors

Jonathan Halliday

On 07/11/2011 12:31 AM, Misty Stanley-Jones wrote:
> Hi all,
>
> I am documenting Transactions for EAP6, and I need some information and examples about:
>
> * Heuristic outcomes
> * Error handling in general
>
> Specifically I would like to see examples of error messages, and some troubleshooting techniques. I am at a loss to know how to write examples that will generate errors. Thanks for your help!
>

Hmm, yes. Forcing heuristic outcomes is not easy - the system does
everything in its power to avoid them, so making one occur is hard work.
However, it's a common request from QE, GSS and the occasional customer,
so here is the stock answer:

1) Run under high load and disable, either in software or hardware, one
or more of the resource managers, e.g. physically pull out the network
cable connecting the app server to the resource manager. This is a
probabilistic approach - if the load is high enough, chances are you'll
hit at least one transaction at the right point in its lifecycle to
cause a heuristic. Finding it in the huge server log files may take a
while though :-)

2) Run a single transaction, hold it in the debugger at the critical
point, disable the RM and allow execution to continue. More elegant, but
requires some understanding of the source code (hint: BasicAction.prepare)

3) Write a dummy XAResource for enlistment into the transaction and have
it fail at an appropriate point e.g. in the commit call.

4) Use Byteman to instrument an existing XAResource to fail at an
appropriate point.
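
For option 3, a minimal sketch of such a dummy resource might look like
the following. It relies only on the javax.transaction.xa types that ship
with the JDK. The class name FailingXAResource is made up for
illustration, and the choice of XA_HEURRB (heuristic rollback) as the
failure is an assumption; any of the other XA_HEUR* codes could be thrown
instead.

```java
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;

/**
 * Dummy resource that votes to commit in prepare() and then reports a
 * heuristic rollback from commit(). Hypothetical test helper, not part
 * of any JBoss or TS API.
 */
final class FailingXAResource implements XAResource {
    private int timeoutSeconds = 0;

    @Override
    public int prepare(Xid xid) throws XAException {
        // Vote to commit so the coordinator proceeds to phase two.
        return XA_OK;
    }

    @Override
    public void commit(Xid xid, boolean onePhase) throws XAException {
        // Claim that this branch was already rolled back unilaterally.
        throw new XAException(XAException.XA_HEURRB);
    }

    // The remaining methods are no-ops; a dummy resource holds no real state.
    @Override public void rollback(Xid xid) throws XAException { }
    @Override public void start(Xid xid, int flags) throws XAException { }
    @Override public void end(Xid xid, int flags) throws XAException { }
    @Override public void forget(Xid xid) throws XAException { }

    @Override
    public Xid[] recover(int flag) throws XAException {
        return new Xid[0]; // nothing in doubt
    }

    @Override
    public boolean isSameRM(XAResource other) throws XAException {
        return other == this;
    }

    @Override
    public int getTransactionTimeout() throws XAException {
        return timeoutSeconds;
    }

    @Override
    public boolean setTransactionTimeout(int seconds) throws XAException {
        timeoutSeconds = seconds;
        return true;
    }
}
```

To use it, enlist an instance in a running transaction alongside at least
one other resource, so that the transaction manager runs the full
two-phase protocol rather than a one-phase commit. The commit should then
typically surface to the caller as one of the JTA heuristic exceptions.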

These approaches all have one thing in common: they are intended for
software engineers. Unless you have a tame developer to hand, your best
bet will be to find a pre-existing example. The TS testsuite has
several, although not in the most user-friendly form. There was also
some work done for the anti-FUD video which is probably still hanging
around somewhere.

As for troubleshooting advice:

Heuristics typically fall into two groups - those caused by transient
failures in the environment and those caused by coding errors. The
former are more common in production or stress tests, whilst the latter
are normally weeded out at development time.

Heuristics caused by transient failures rarely need much in the way of
root cause analysis - the network break / db server outage / power loss
/ etc is normally readily apparent. The tricky bit is resolving by hand
any transactions that are in a heuristically completed state. The TS
system only automatically resolves pending transactions during recovery,
not heuristically completed ones. Cleaning those up can require
identifying which RMs were involved, examining state in the transaction
manager and RMs and manually forcing log cleanup and data reconciliation
in one or more of them. It's not easy and the exact steps depend on the
specific RMs involved (each has its own management tooling) and the data
content of the transaction branch in the RMs.  I'm in the process of
adding additional logging output to assist with this type of thing, but
from the point of view of the AS7/EAP6 user the biggest problem right
now is we're missing an integrated tx objectstore browser frontend :-(
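
When working out which RMs reported what, it can help to translate the
numeric errorCode of an XAException into the name of the heuristic. The
four heuristic codes are constants on javax.transaction.xa.XAException in
the JDK. The helper class below (XaErrorCodes, a name invented for this
sketch) is only an illustration of that mapping, not something the TS
code provides.

```java
import javax.transaction.xa.XAException;

/** Hypothetical helper that names the XA heuristic error codes. */
final class XaErrorCodes {
    private XaErrorCodes() { }

    /** True if the exception carries one of the four heuristic codes. */
    static boolean isHeuristic(XAException e) {
        switch (e.errorCode) {
            case XAException.XA_HEURCOM:
            case XAException.XA_HEURRB:
            case XAException.XA_HEURMIX:
            case XAException.XA_HEURHAZ:
                return true;
            default:
                return false;
        }
    }

    /** Short human-readable description for log messages. */
    static String describe(XAException e) {
        switch (e.errorCode) {
            case XAException.XA_HEURCOM:
                return "XA_HEURCOM: branch was heuristically committed";
            case XAException.XA_HEURRB:
                return "XA_HEURRB: branch was heuristically rolled back";
            case XAException.XA_HEURMIX:
                return "XA_HEURMIX: branch was partly committed, partly rolled back";
            case XAException.XA_HEURHAZ:
                return "XA_HEURHAZ: branch may have been heuristically completed";
            default:
                return "errorCode " + e.errorCode + " (not a heuristic code)";
        }
    }
}
```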

Heuristics caused by misbehaving (i.e. not spec compliant) RMs are less
common but do occur occasionally. Unless the user is also the RM author,
troubleshooting these is normally a job for GSS and the TS devs. The
root cause is normally some disagreement or misunderstanding in the
interpretation of an arcane point in the spec. In practical terms the
cleanup is simpler, as in dev environments simply deleting all the
relevant log records in the RM and app server will reset things - data
reconciliation is rarely a significant concern here.

As for general (non-heuristic) error handling in transactions, the key
cases are:

- transaction timed out in the background and the business logic thread
did not notice. This frequently manifests as Hibernate suddenly being
unable to get a db connection for lazy loading. The immediate fix is to
begin a new Hibernate session. If it's happening frequently, lengthen
the timeout, restructure your code or tune the environment.

- transaction already running on a thread, normally because thread
pooling is used and something forgot to disassociate an old transaction
before releasing the thread back to the pool. Use finally blocks for
cleanup and be aware that TransactionManager.[commit|rollback]
disassociate the transaction from the thread but
Transaction.[commit|rollback] don't. Yes, I'm looking at you,
Mr. Spring Framework.

- recovery failed due to non-serializable resource. Usually caused by
failing to configure the recovery plugin properly and hopefully a thing
of the past now that it is at last done automatically by the server :-)

- the system disallowed enlistment of a second local resource. This is
just a problem of user education - the difference between local-ds and
xa-ds is not well understood.
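
The "transaction already running on a thread" case can be sketched as
follows. To keep the example runnable without a JTA jar, it uses a
made-up stand-in (ThreadBoundTx) that models only the thread association;
it is not the javax.transaction API. In real code the corresponding calls
would be TransactionManager.begin(), commit() and rollback(). PooledWork
and runInTx are likewise hypothetical names.

```java
/** Hypothetical stand-in that models only thread association; not the JTA API. */
final class ThreadBoundTx {
    private static final ThreadLocal<Boolean> ACTIVE =
            ThreadLocal.withInitial(() -> false);

    private ThreadBoundTx() { }

    static void begin() {
        if (ACTIVE.get()) {
            // This is the failure a pooled thread hits when an earlier
            // task left its transaction associated with the thread.
            throw new IllegalStateException(
                    "thread is already associated with a transaction");
        }
        ACTIVE.set(true);
    }

    // Both calls end the transaction and disassociate it from the thread.
    static void commit() { ACTIVE.set(false); }
    static void rollback() { ACTIVE.set(false); }

    static boolean isActive() { return ACTIVE.get(); }
}

final class PooledWork {
    private PooledWork() { }

    /** Runs the work in a transaction and always leaves the thread clean. */
    static void runInTx(Runnable work) {
        ThreadBoundTx.begin();
        boolean committed = false;
        try {
            work.run();
            ThreadBoundTx.commit();
            committed = true;
        } finally {
            if (!committed) {
                // Disassociate before the thread goes back to the pool,
                // even if the work threw.
                ThreadBoundTx.rollback();
            }
        }
    }
}
```

Without the finally block, a task that throws would leave the flag set,
and the next task scheduled on the same pooled thread would fail in
begin() even though its own code is correct.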

I think those cover a majority of the support load, but the GSS guys may
have others.

Jonathan.
