To Spring Batch or ETL

Arun Kumarr
Ranch Hand

Joined: May 16, 2005
Posts: 513

We are in the solutioning phase of a project. Daily files come in that have to be parsed, validated and loaded into the database.
The file size could be anywhere from 20 MB to 200 MB. We wanted to use Spring Batch, but considering the file size, we are also thinking about an ETL tool to do the job.
I am curious (partly to rule out the ETL option): is there a cap on the file size that Spring Batch can process?
The SLA to load the files varies from 1 hour to 5 hours and is not tied to file size, so let's take the worst case: a 200 MB file with a 1-hour SLA.
Has anyone used such a combination in their project? What is your advice? If you have suggestions, or ran into problems while loading bulk files, please let me know.

If you are not laughing at yourself, then you just didn't get the joke.
Mark Spritzler
ranger
Sheriff

Joined: Feb 05, 2001
Posts: 17249
    

There is no cap on the file size.

You can also use techniques like multi-threading, partitioning, and remoting jobs to increase performance if the out-of-the-box single-threaded chunk reading is too slow. But you would be surprised at the speed.
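For scale, 200 MB in one hour works out to roughly 57 KB per second, which is a modest rate for sequential file reading. The chunk-oriented idea mentioned above can be sketched in plain Java as follows. This is only a conceptual illustration, not the Spring Batch API: `ChunkedLoader`, `load`, and the `chunkWriter` callback are hypothetical names. Lines are pulled one at a time, so memory use depends on the chunk size rather than the file size, and each chunk would typically map to one batch insert and commit.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Conceptual sketch only: plain Java, not the Spring Batch API.
class ChunkedLoader {

    /**
     * Pulls lines one at a time, buffers them into chunks of chunkSize,
     * and hands each full chunk (plus the final partial one) to chunkWriter.
     * Returns the total number of lines read.
     */
    static int load(Iterator<String> lines, int chunkSize,
                    Consumer<List<String>> chunkWriter) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<String> chunk = new ArrayList<>(chunkSize);
        int total = 0;
        while (lines.hasNext()) {
            chunk.add(lines.next());
            total++;
            if (chunk.size() == chunkSize) {
                // In a real loader this is where one batch insert + commit would go.
                chunkWriter.accept(new ArrayList<>(chunk));
                chunk.clear();
            }
        }
        if (!chunk.isEmpty()) {
            // Flush the last, possibly smaller, chunk.
            chunkWriter.accept(new ArrayList<>(chunk));
        }
        return total;
    }
}
```

A caller could pass something like `bufferedReader.lines().iterator()` as the source, so the whole file never has to sit in memory at once.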

I highly recommend Spring Batch over an ETL tool, because I think Spring Batch is easier to use, set up, and code to. It is also very powerful, with a database that stores job executions and statistics.

Hope that helps

Mark


William P O'Sullivan
Ranch Hand

Joined: Mar 28, 2012
Posts: 860

Depending on your needs ...

Spring Batch will not do the parsing for you. You will need to receive the files, process them, validate them, etc.
Look into Mule ESB also, for automatic triggering when files arrive in certain folders/directories.

Also, for ETL, look at Talend. I believe it's open source and can transform all sorts of files.

WP
Arun Kumarr
Ranch Hand

Joined: May 16, 2005
Posts: 513

Thank you, Mark and William. We did consider Talend and Kettle for the ETL job.
Would it be a fair assumption to think of an ETL tool as parsers + a Spring Batch-like framework + a UI?
Also, I believe ETL tools allow run-time configuration changes to field mappings, which is tough in Spring Batch (code change, compile and deploy).
So when I have to make the call, I'll check how large the changes to fields and field mappings are likely to be. If they are large, we would suggest going with the ETL tool; otherwise we would prefer Spring Batch (my personal preference too).
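One way to soften the "code change, compile and deploy" concern is to keep the column-to-field mapping in external configuration and read it at startup. The sketch below is plain Java under stated assumptions, not something the thread or Spring Batch prescribes: the class `ConfigurableFieldMapping` and the property keys `columns` and `delimiter` are hypothetical. Changing the column order or names then means editing a properties file rather than recompiling.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;
import java.util.regex.Pattern;

// Hypothetical sketch: field names and delimiter come from configuration.
class ConfigurableFieldMapping {
    private final String[] fieldNames;
    private final String delimiter;

    ConfigurableFieldMapping(Properties config) {
        // "columns" and "delimiter" are assumed property keys for this example.
        this.delimiter = config.getProperty("delimiter", ",");
        String columns = config.getProperty("columns");
        if (columns == null || columns.trim().isEmpty()) {
            throw new IllegalArgumentException("Property 'columns' is required");
        }
        this.fieldNames = columns.trim().split("\\s*,\\s*");
    }

    /** Maps one delimited line to field-name -> value, in configured order. */
    Map<String, String> map(String line) {
        // Naive split: does not handle quoted fields containing the delimiter.
        String[] values = line.split(Pattern.quote(delimiter), -1);
        if (values.length != fieldNames.length) {
            throw new IllegalArgumentException(
                "Expected " + fieldNames.length + " fields but got " + values.length);
        }
        Map<String, String> record = new LinkedHashMap<>();
        for (int i = 0; i < fieldNames.length; i++) {
            record.put(fieldNames[i], values[i].trim());
        }
        return record;
    }
}
```

This only covers renaming and reordering columns. Validation rules or type conversions that change often would still need code changes, or the kind of mapping UI an ETL tool provides.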
Mark Spritzler
ranger
Sheriff

Joined: Feb 05, 2001
Posts: 17249
    

William P O'Sullivan wrote:
Depending on your needs ...

Spring Batch will not do the parsing for you. You will need to receive the files, process them, validate them, etc.
Look into Mule ESB also, for automatic triggering when files arrive in certain folders/directories.

Also, for ETL, look at Talend. I believe it's open source and can transform all sorts of files.

WP


That's partly true, but not the whole picture. Spring Batch has built-in mechanisms for reading and parsing files. You still have to write some code to map what is in a line of the file to the domain object that represents it, but to me, doing that mapping through a simple callback interface is a huge gain; it takes away the pain points of writing parsing code.

Also, combining Spring Integration with Spring Batch can add transformations and much more.
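To illustrate the callback idea, here is a minimal plain-Java sketch. In Spring Batch itself this role is played by classes such as `FlatFileItemReader` together with a `LineMapper` or `FieldSetMapper`; the names below (`Trade`, `FieldMapper`, `DelimitedParser`, `TradeMapper`) are hypothetical stand-ins so the example runs without the framework. The generic parser handles splitting lines, and the only code specific to the file format is the small mapper.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical domain object representing one line of the file.
class Trade {
    final String account;
    final String symbol;
    final BigDecimal amount;

    Trade(String account, String symbol, BigDecimal amount) {
        this.account = account;
        this.symbol = symbol;
        this.amount = amount;
    }
}

// Hypothetical callback: turns the fields of one line into a domain object.
interface FieldMapper<T> {
    T map(String[] fields, int lineNumber);
}

// Generic part: splitting and iteration, written once.
class DelimitedParser {
    static <T> List<T> parse(List<String> lines, String delimiter, FieldMapper<T> mapper) {
        List<T> items = new ArrayList<>();
        int lineNumber = 0;
        for (String line : lines) {
            lineNumber++;
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            // Naive split: does not handle quoted fields containing the delimiter.
            String[] fields = line.split(Pattern.quote(delimiter), -1);
            items.add(mapper.map(fields, lineNumber));
        }
        return items;
    }
}

// Format-specific part: the only code written per file layout.
class TradeMapper implements FieldMapper<Trade> {
    @Override
    public Trade map(String[] fields, int lineNumber) {
        if (fields.length != 3) {
            throw new IllegalArgumentException(
                "Line " + lineNumber + ": expected 3 fields, got " + fields.length);
        }
        return new Trade(fields[0].trim(), fields[1].trim(), new BigDecimal(fields[2].trim()));
    }
}
```

Validation can live in the mapper, as with the field-count check here, or in a separate processing step after mapping.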

Mark
 