In this post I compare Java batch implementations in two different frameworks: Spring Batch and Akka. I used both frameworks to build a simple ETL program and ran each with a commit size of 1 and of 10. I was curious what improvements each batch job would show and how increased processing time would affect completion time. I also look at how each framework handles concurrency.
Spring Batch
Spring Batch is part of the Spring framework family. It is a heavily structured framework that breaks a batch job down into what it calls readers, processors, and writers. The reader reads from a source, the processor transforms the data, and the writer outputs the transformed data to a destination. By default it executes synchronously, but it can be configured to use thread pools and to fork threads asynchronously. The goal of Spring Batch is to batch your write operations together. By doing this, the most expensive operation (the write) is performed only once per chunk of commit-size items instead of once per item. This is referred to as chunk processing.
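To make the effect of the commit size concrete, here is a minimal plain-Java sketch of the read/process/write loop. It is not Spring Batch's API; the class `ChunkSketch`, the method `runChunked`, and the "row-" transformation are hypothetical names chosen for illustration, and the "writer" only collects chunks in memory.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Illustration only: a hand-rolled chunk loop, not Spring Batch itself.
class ChunkSketch {

    /**
     * Reads items one at a time, transforms each one, and hands the results
     * to the writer in groups of at most commitSize items.
     * Returns how many times the writer was called.
     */
    static <I, O> int runChunked(Iterable<I> reader,
                                 Function<I, O> processor,
                                 Consumer<List<O>> writer,
                                 int commitSize) {
        int writes = 0;
        List<O> chunk = new ArrayList<>(commitSize);
        for (I item : reader) {
            chunk.add(processor.apply(item));
            if (chunk.size() == commitSize) {
                writer.accept(new ArrayList<>(chunk)); // one write per full chunk
                writes++;
                chunk.clear();
            }
        }
        if (!chunk.isEmpty()) {
            writer.accept(new ArrayList<>(chunk)); // flush the final partial chunk
            writes++;
        }
        return writes;
    }

    public static void main(String[] args) {
        List<Integer> input = new ArrayList<>();
        for (int i = 1; i <= 25; i++) {
            input.add(i);
        }
        List<List<String>> written = new ArrayList<>();
        Function<Integer, String> processor = n -> "row-" + n;
        Consumer<List<String>> writer = written::add;

        int writes = runChunked(input, processor, writer, 10);
        System.out.println(writes + " writes for " + input.size() + " items");
    }
}
```

With 25 items and a commit size of 10 the writer is called 3 times. With a commit size of 1 it would be called 25 times, which is the difference the two test runs are meant to expose.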
Akka
Akka is an actor-based framework designed for concurrency. In Akka you create an actor system and actors. After starting the system, you send immutable messages to the actors. By default each actor has a mailbox, which is a queue. The actor reads messages off the queue and processes them one at a time. The actor system is designed to be non-blocking and asynchronous, so if one actor is running and a new actor is launched, there is no guarantee that they run on the same thread. The actor system manages the number of threads.
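The mailbox idea can be sketched in plain Java without Akka. The example below is only an approximation under stated assumptions: `MailboxSketch`, `UppercaseActor`, `tell`, and the `STOP` marker are hypothetical names, the "actor" is pinned to a single dedicated thread (which a real actor system does not guarantee), and immutable `String` values stand in for messages.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;

// Illustration only: one thread draining one queue, not Akka's actor API.
class MailboxSketch {
    // Hypothetical marker message that tells the actor to shut down.
    static final String STOP = "__stop__";

    static final class UppercaseActor implements Runnable {
        private final BlockingQueue<String> mailbox = new LinkedBlockingQueue<>();
        final List<String> handled = new CopyOnWriteArrayList<>();

        // Senders only enqueue; they never wait for the message to be processed.
        void tell(String message) {
            mailbox.add(message);
        }

        @Override
        public void run() {
            try {
                while (true) {
                    String message = mailbox.take(); // waits until a message arrives
                    if (STOP.equals(message)) {
                        return;
                    }
                    handled.add(message.toUpperCase(Locale.ROOT));
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    /** Sends every message to a fresh actor and returns what it handled, in order. */
    static List<String> runActor(List<String> messages) {
        UppercaseActor actor = new UppercaseActor();
        Thread thread = new Thread(actor);
        thread.start();
        for (String message : messages) {
            actor.tell(message);
        }
        actor.tell(STOP);
        try {
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new ArrayList<>(actor.handled);
    }

    public static void main(String[] args) {
        System.out.println(runActor(List.of("extract", "transform", "load")));
    }
}
```

Because a single consumer drains a FIFO queue, messages from one sender are handled in the order they were sent, and the sender never blocks on the processing itself.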
Every actor system has a dispatcher. The dispatcher controls how actors and their mailboxes are scheduled across threads. The image below shows a balancing dispatcher. A balancing dispatcher essentially shares one mailbox of work across however many worker actors there are. When a worker has completed its work, it responds saying it is done, and the dispatcher pushes another message onto the mailbox for the worker to handle.
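The shared-mailbox idea can be approximated in plain Java with several worker threads pulling from one queue. This is a simplification and not Akka's dispatcher: workers here pull the next item themselves instead of reporting back and being handed one, and `SharedMailboxSketch` and `squareAll` are hypothetical names for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

// Illustration only: N worker threads sharing one queue of work.
class SharedMailboxSketch {

    /** Squares every input using workerCount threads that share a single queue. */
    static Map<Integer, Integer> squareAll(List<Integer> inputs, int workerCount) {
        // All work is queued up front, so an empty queue means the job is done.
        Queue<Integer> sharedMailbox = new ConcurrentLinkedQueue<>(inputs);
        Map<Integer, Integer> results = new ConcurrentHashMap<>();

        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < workerCount; i++) {
            Thread worker = new Thread(() -> {
                Integer n;
                // Whichever worker is free takes the next item.
                while ((n = sharedMailbox.poll()) != null) {
                    results.put(n, n * n);
                }
            });
            workers.add(worker);
            worker.start();
        }

        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> squares = squareAll(List.of(1, 2, 3, 4, 5, 6), 3);
        System.out.println(squares.size() + " items processed");
    }
}
```

The point of sharing one queue is that a slow item ties up only one worker; the others keep draining the remaining work instead of idling behind their own private backlog.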
Execution
In my tests I looked at a few different execution types. First I ran each framework with its default behavior, which means no pooling: just one synchronous execution. Next I tested each with different thread pool sizes and numbers of forks/asynchronous executions. The Akka and Spring Batch executions are not identical, but within the realm of each framework I found them to be analogous.
Next Steps
In my next post I will walk through my implementations and what the results showed. It is available here: Part 2