Transactions in Batch (JSR 352)

James Perkins
In the JSR 352 batch specification there are some issues around transactions with JCA. These issues would mainly be seen with JDBC item readers/writers.

Here is a thought dump from a sit-down I had with the JCA team. If anyone has opinions on how this should be handled, let's talk it out. I imagine this is a fairly important issue: batch jobs will often be legacy jobs, and JDBC is probably heavily used.



Batch Transactions with JCA

Problem:
Batch requires a Transaction.begin() and a Transaction.commit() around each open, each close and each chunk processed. A connection obtained from JCA in one of these steps and used in a later one therefore crosses transaction boundaries.
    This requires that the JCA CachedConnectionManager (CCM) is enabled for the resource in question (Jesper)

Transaction code should be written like:
Connection c
Transaction.begin()
c = DataSource.getConnection()
c.close()
Transaction.commit()
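
To make the boundary problem concrete, here is a toy model in plain Java. It is not IronJacamar or JTA code: BoundaryCheck, Handle and both demo methods are hypothetical names, and the "kill the handle at commit" behaviour is only an approximation of what the tracking check reports.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model (hypothetical) of a pool that checks for handles left open at commit time. */
final class BoundaryCheck {

    /** Stand-in for a connection handle; not a real JDBC Connection. */
    static final class Handle implements AutoCloseable {
        private final Set<Handle> open;

        Handle(Set<Handle> open) {
            this.open = open;
            open.add(this);
        }

        @Override
        public void close() {
            open.remove(this);
        }
    }

    private final Set<Handle> openInTx = new HashSet<>();
    private boolean active;

    void begin() {
        active = true;
    }

    Handle getConnection() {
        if (!active) {
            throw new IllegalStateException("no active transaction");
        }
        return new Handle(openInTx);
    }

    void commit() {
        active = false;
        if (!openInTx.isEmpty()) {
            int leaked = openInTx.size();
            openInTx.clear(); // model assumption: the offending handle is discarded
            throw new IllegalStateException(leaked + " handle(s) crossed the transaction boundary");
        }
    }

    /** Shape from the pseudocode above: get and close inside one begin/commit pair. */
    static boolean closedInsideTx() {
        BoundaryCheck tx = new BoundaryCheck();
        tx.begin();
        try (Handle c = tx.getConnection()) {
            // use the connection here
        }
        tx.commit(); // nothing is open, so the check passes
        return true;
    }

    /** Problem shape: the handle is still open when the transaction commits. */
    static boolean heldAcrossCommit() {
        BoundaryCheck tx = new BoundaryCheck();
        tx.begin();
        Handle c = tx.getConnection();
        try {
            tx.commit();
            return false; // the check did not fire
        } catch (IllegalStateException expected) {
            return true; // the check fired
        } finally {
            c.close();
        }
    }

    public static void main(String[] args) {
        System.out.println("closed inside tx passes: " + closedInsideTx());
        System.out.println("held across commit detected: " + heldAcrossCommit());
    }
}
```

The second method is what a reader or writer does when it opens a connection in open() and keeps using it in later chunks.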

Chunk processing flow:
Transaction.begin()
ItemReader.open()
ItemWriter.open()
Transaction.commit()

repeat until item reader returns null
    Transaction.begin()
    repeat-on-chunk
        ItemReader.readItem()
        ItemProcessor.processItem()
    ItemWriter.writeItems()
    Transaction.commit()

Transaction.begin()
ItemReader.close()
ItemWriter.close()
Transaction.commit()
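
The flow above can be traced with a small stand-alone sketch. It is not JBeret code: ChunkLoopSketch and the trace strings are made up for illustration, the reader is a plain Iterator, processItem is stood in for by toUpperCase(), and itemCount plays the role of the chunk's item-count attribute.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical sketch of the chunk loop that records where transaction boundaries fall. */
final class ChunkLoopSketch {

    /** Runs the loop over the given items and returns the recorded event trace. */
    static List<String> run(List<String> items, int itemCount) {
        List<String> trace = new ArrayList<>();
        Iterator<String> reader = items.iterator();

        // Open phase: one transaction around both open() calls.
        trace.add("begin");
        trace.add("reader.open");
        trace.add("writer.open");
        trace.add("commit");

        // Chunk phase: one transaction per chunk.
        boolean exhausted = false;
        while (!exhausted) {
            trace.add("begin");
            List<String> chunk = new ArrayList<>();
            while (chunk.size() < itemCount) {
                if (!reader.hasNext()) { // stands in for readItem() returning null
                    exhausted = true;
                    break;
                }
                chunk.add(reader.next().toUpperCase()); // readItem + processItem
            }
            if (!chunk.isEmpty()) {
                trace.add("write" + chunk); // writeItems is called once per chunk
            }
            trace.add("commit");
        }

        // Close phase: one transaction around both close() calls.
        trace.add("begin");
        trace.add("reader.close");
        trace.add("writer.close");
        trace.add("commit");
        return trace;
    }

    public static void main(String[] args) {
        for (String event : run(Arrays.asList("a", "b", "c"), 2)) {
            System.out.println(event);
        }
    }
}
```

Anything acquired at "reader.open" or "writer.open" and released at the matching close has lived through every commit in between, which is exactly the boundary crossing described in the problem statement.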

Where seen:
  • No errors in 8.1+
  • Upstream, with the tracking attribute set to true, an exception is thrown: /subsystem=datasources/data-source=ExampleDS:write-attribute(name=tracking, value=true)
  • https://gist.github.com/jamezp/cf39f92913425c83929f
    • This shows where the connection that crosses the transaction boundary, and is hence killed, was allocated (Jesper)

Questions:
  • Will the tracking attribute be used often?
    • The feature is meant to help people find where their code makes assumptions that aren't true (Jesper)
      • The corner case is just the "c.close()" call (Jesper)


Possible Fixes:
  • Leave batch as is. If the tracking attribute is set to true, exceptions will be thrown.
    • Remember that the underlying connection will be killed (Jesper)
  • Add a property to JBeret to use local (fake) transactions. This is what we currently do, and I feel it's just wrong.
  • Create a deployment descriptor (or possibly a property that can be passed when starting the job) to allow different styles of transactions.
Example JBoss Way:
repeat until item reader returns null
    Transaction.begin()
    ItemReader.open()
    ItemWriter.open()
    repeat-on-chunk
        ItemReader.readItem()
        ItemProcessor.processItem()
    ItemWriter.writeItems()
    ItemReader.close()
    ItemWriter.close()
    Transaction.commit()



-- 
James R. Perkins
JBoss by Red Hat

_______________________________________________
wildfly-dev mailing list
[hidden email]
https://lists.jboss.org/mailman/listinfo/wildfly-dev

Re: Transactions in Batch (JSR 352)

James Perkins
I'm retracting my email due to my misunderstanding. It's likely most item readers will not need a JTA data source. Therefore, if you use a non-JTA data source you can cache the connection for the readItem() method. Any XA resources need to be opened and closed within the scope of a single transaction. For example, in ItemWriter.writeItems(List<Object> items) the XA resource must be opened and closed within this method and NOT in the ItemWriter.open() and ItemWriter.close() methods.

In other words, there are no changes needed. It just needs to be noted that one should not cache a connection from an XA data source in the open method and attempt to reuse it later.
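
A minimal sketch of that writer shape, under some assumptions: PerChunkWriter only mirrors the open/writeItems/close method names and does not implement the real javax.batch.api.chunk.ItemWriter interface, the SQL statement and table are invented, and fakeDataSource is a hypothetical in-memory stand-in (built with java.lang.reflect.Proxy) so the example runs without a database or container.

```java
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import javax.sql.DataSource;

/** Writer sketch: the connection lives only inside writeItems(). */
final class PerChunkWriter {
    private final DataSource ds; // the factory is kept; no Connection is ever stored in a field

    PerChunkWriter(DataSource ds) {
        this.ds = ds;
    }

    void open() {
        // Nothing transactional is acquired here.
    }

    void writeItems(List<Object> items) throws SQLException {
        // Called inside the chunk transaction: acquire and release within this call.
        try (Connection c = ds.getConnection();
             PreparedStatement ps = c.prepareStatement("INSERT INTO items (val) VALUES (?)")) {
            for (Object item : items) {
                ps.setString(1, String.valueOf(item));
                ps.addBatch();
            }
            ps.executeBatch();
        }
    }

    void close() {
        // Nothing to release.
    }

    /** Hypothetical stand-in that only counts connections opened and closed. */
    static DataSource fakeDataSource(AtomicInteger opened, AtomicInteger closed) {
        ClassLoader cl = PerChunkWriter.class.getClassLoader();
        return (DataSource) Proxy.newProxyInstance(cl, new Class<?>[] {DataSource.class}, (p, m, a) -> {
            if (!m.getName().equals("getConnection")) {
                throw new UnsupportedOperationException(m.getName());
            }
            opened.incrementAndGet();
            return Proxy.newProxyInstance(cl, new Class<?>[] {Connection.class}, (cp, cm, ca) -> {
                switch (cm.getName()) {
                    case "close":
                        closed.incrementAndGet();
                        return null;
                    case "prepareStatement":
                        // Only the void setters, addBatch, executeBatch and close are used above.
                        return Proxy.newProxyInstance(cl, new Class<?>[] {PreparedStatement.class},
                                (sp, sm, sa) -> sm.getName().equals("executeBatch") ? new int[0] : null);
                    default:
                        throw new UnsupportedOperationException(cm.getName());
                }
            });
        });
    }

    /** Writes two chunks and returns {connections opened, connections closed}. */
    static int[] demo() {
        AtomicInteger opened = new AtomicInteger();
        AtomicInteger closed = new AtomicInteger();
        PerChunkWriter w = new PerChunkWriter(fakeDataSource(opened, closed));
        try {
            w.open();
            w.writeItems(Arrays.<Object>asList("a", "b"));
            w.writeItems(Arrays.<Object>asList("c"));
            w.close();
        } catch (SQLException e) {
            throw new IllegalStateException(e);
        }
        return new int[] {opened.get(), closed.get()};
    }

    public static void main(String[] args) {
        int[] counts = demo();
        System.out.println("opened=" + counts[0] + ", closed=" + counts[1]);
    }
}
```

Each chunk opens and closes its own connection, so no handle is still open when the chunk transaction commits.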

On 02/03/2015 02:21 PM, James R. Perkins wrote:
> [...]

-- 
James R. Perkins
JBoss by Red Hat

Re: Transactions in Batch (JSR 352)

Jesper Pedersen-2
On 02/04/2015 01:42 PM, James R. Perkins wrote:

> I'm retracting my email due to my misunderstanding. It's likely most
> item readers will not need a JTA data source. Therefore if you use a
> non-JTA data source you can cache the connection for the readItem()
> method. Any XA resources need to be opened and closed within the scope
> of a single transaction. For example ItemWriter.writeItems(List<Object>
> items) the XA resource must be opened and closed within this method and
> NOT in the ItemWriter.open() and ItemWriter.close() methods.
>
> In other words, there are no changes needed. It just needs to be noted
> one should not cache an XA data source in the open method and attempt to
> reuse it later.
>

Just remember that any non-transactional resource (NoTransaction) needs
an explicit .close() in order to be returned to the connection pool.
Otherwise you will run out of connections fast ;) Since there is no
transactional boundary to do the check on, scenarios like this can be
investigated using the IronJacamar leak detector pool:

http://www.ironjacamar.org/doc/userguide/1.2/en-US/html/ch04.html#configuration_ironjacamar_leakpool

Batch is still a bit up in the air IMHO, and the container is counting
on the user to do the right thing. Which we all know doesn't always
happen. So, I think it is a good idea to explore different transactional
models looking at existing batch frameworks.

Best regards,
  Jesper

Re: Transactions in Batch (JSR 352)

James Perkins

On 02/04/2015 10:51 AM, Jesper Pedersen wrote:

> On 02/04/2015 01:42 PM, James R. Perkins wrote:
>> I'm retracting my email due to my misunderstanding. It's likely most
>> item readers will not need a JTA data source. Therefore if you use a
>> non-JTA data source you can cache the connection for the readItem()
>> method. Any XA resources need to be opened and closed within the scope
>> of a single transaction. For example ItemWriter.writeItems(List<Object>
>> items) the XA resource must be opened and closed within this method and
>> NOT in the ItemWriter.open() and ItemWriter.close() methods.
>>
>> In other words, there are no changes needed. It just needs to be noted
>> one should not cache an XA data source in the open method and attempt to
>> reuse it later.
>>
>
> Just remember that any non-transactional resource (NoTransaction)
> needs an explicit .close() in order to be returned to the connection
> pool. Otherwise you will run out of connections fast ;) Scenarios like
> this can be investigated using the IronJacamar leak detector pool,

Right, and this *should* happen in the Item(Reader|Writer).close(),
correct? Of course the user needs to ensure that happens, but it's
"safe" assuming they do, right?

>
> http://www.ironjacamar.org/doc/userguide/1.2/en-US/html/ch04.html#configuration_ironjacamar_leakpool
>
> since there is no transactional boundary to do the check on.
>
> Batch is still a bit up in the air IMHO, and the container is counting
> on the user to do the right thing. Which we all know doesn't always
> happen. So, I think it is a good idea to explore different
> transactional models looking at existing batch frameworks.

I'm not necessarily opposed to this; I'm just not as concerned as I was :).
It might make sense to have another option for users that do need XA
resources and don't want to open/close them in each readItem().

>
> Best regards,
>  Jesper
>

--
James R. Perkins
JBoss by Red Hat

Re: Transactions in Batch (JSR 352)

Jesper Pedersen-2
On 02/04/2015 01:55 PM, James R. Perkins wrote:
>> Just remember that any non-transactional resource (NoTransaction)
>> needs an explicit .close() in order to be returned to the connection
>> pool. Otherwise you will run out of connections fast ;) Scenarios like
>> this can be investigated using the IronJacamar leak detector pool,
>
> Right and this *should* happen in the Item(Reader|Writer).close()
> correct? Of course the user needs to ensure that happens, but it's
> "safe" assuming they do right?
>

Yeah, because there are no bugs in applications.

... Of course that doesn't explain why <ccm debug=true> and the leak
detector pool are needed ... must be something else ...

try/catch/finally is a good thing.

Best regards,
  Jesper

Re: Transactions in Batch (JSR 352)

David Lloyd-2
On 02/04/2015 01:01 PM, Jesper Pedersen wrote:

> On 02/04/2015 01:55 PM, James R. Perkins wrote:
>>> Just remember that any non-transactional resource (NoTransaction)
>>> needs an explicit .close() in order to be returned to the connection
>>> pool. Otherwise you will run out of connections fast ;) Scenarios like
>>> this can be investigated using the IronJacamar leak detector pool,
>>
>> Right and this *should* happen in the Item(Reader|Writer).close()
>> correct? Of course the user needs to ensure that happens, but it's
>> "safe" assuming they do right?
>>
>
> Yeah, because there are no bugs in applications.
>
> ... Of course that doesn't explain why <ccm debug=true> and the leak
> detector pool are needed ... must be something else ...
>
> try/catch/finally is a good thing.

Connection objects are AutoCloseable, so you can even do:

   try (Connection c = ds.getConnection()) {
      // do stuff
   }

Simple and clean.
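
To show why this matters for pooled, non-transactional connections, here is a toy comparison. TinyPool and Lease are hypothetical stand-ins for a connection pool and a connection handle, not IronJacamar classes; the only real language feature relied on is try-with-resources calling close() even when the body throws.

```java
/** Toy fixed-size pool (hypothetical) used to contrast leaking with try-with-resources. */
final class TinyPool {
    private final int capacity;
    private int inUse;

    TinyPool(int capacity) {
        this.capacity = capacity;
    }

    /** Stand-in for a pooled connection handle. */
    final class Lease implements AutoCloseable {
        private boolean closed;

        @Override
        public void close() {
            if (!closed) { // closing twice must not return the slot twice
                closed = true;
                inUse--;
            }
        }
    }

    Lease acquire() {
        if (inUse == capacity) {
            throw new IllegalStateException("pool exhausted");
        }
        inUse++;
        return new Lease();
    }

    int inUse() {
        return inUse;
    }

    /** Leaky pattern: never calls close(). Returns how many acquires succeeded. */
    static int leakyReads(TinyPool pool, int attempts) {
        int ok = 0;
        try {
            for (int i = 0; i < attempts; i++) {
                pool.acquire();
                ok++;
            }
        } catch (IllegalStateException exhausted) {
            // The pool ran dry because nothing was ever returned.
        }
        return ok;
    }

    /** Safe pattern: the lease is returned even when the body throws. */
    static int safeReads(TinyPool pool, int attempts) {
        int ok = 0;
        for (int i = 0; i < attempts; i++) {
            try (Lease lease = pool.acquire()) {
                if (i % 2 == 1) {
                    throw new RuntimeException("simulated failure while reading");
                }
                ok++;
            } catch (RuntimeException e) {
                // By the time this runs, try-with-resources has already closed the lease.
            }
        }
        return ok;
    }

    public static void main(String[] args) {
        System.out.println("leaky acquires before exhaustion: " + leakyReads(new TinyPool(2), 10));
        TinyPool pool = new TinyPool(2);
        System.out.println("safe successful reads: " + safeReads(pool, 10));
        System.out.println("leases still out afterwards: " + pool.inUse());
    }
}
```

With a pool of two, the leaky loop stops after two acquires, while the try-with-resources loop gets through all ten iterations and leaves nothing checked out, failures included.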
--
- DML

Re: Transactions in Batch (JSR 352)

Jesper Pedersen-2
On 02/04/2015 02:50 PM, David M. Lloyd wrote:
>> try/catch/finally is a good thing.
>
> Connection objects are AutoCloseable, so you can even do:
>
>     try (Connection c = ds.getConnection()) {
>        // do stuff
>     }
>
> Simple and clean.

Hopefully it will make it into the JCA spec some day :) We are still
using bow and arrow in many cases.
