I'm reasonably experienced with Spring and have done several Batch projects before, but I'm facing a new situation and I'm not sure how to implement it. The situation is this:
This Batch Project requires querying SFDC for data from a stage table and then making a second query to pull the real data later on based on the results of the first. So the point where I'm stuck is, how exactly to go about implementing the ItemReader for this. The first query (ID Query) is basically pulling back a bulk load of ID's based on some parameters and could be a load of a few hundred ID's.
Basically, how do I set up my ItemReader to make this query, pull back the set of ID's, and then pass them back correctly without sticking them into some kind of collection object and passing back the collection? It's my understanding that the ItemReader returns the items being read in 1 at a time, so if I pull back 100 in a single query, how do I go about parsing these back with the Item Reader?
Also, I'm aware that there is a limit on how many results an SFDC query can return in one go (200 if I'm not mistaken), or at least the instance we are working with has this limit. Say there were 300 records: how should I go about ensuring proper chunking of the results so that I can get all 300? I know it could be hacked together to, say, pull 200, set the status of those 200 to in-process, and then query again... but that just doesn't strike me as the best way to do this.
I'd appreciate any help and advice I can get. Examples are always best but I'm more or less stuck on what is the best way to implement this.
Not quite sure, but I would probably have an ItemReader read the first batch of IDs and pass them to an ItemProcessor, where you can do whatever you want: loop through one ID at a time and run the second query. Then pass the results of the second query to your ItemWriter.
I am not sure how to solve your problem, but it does seem like you are getting stuck because you are looking at a Job as Steps, and each Step as a Reader and a Writer, and missing that you can also create a Processor.
The other thing you might not have considered is breaking it down into more steps rather than doing it all in one step. What if the first step gets the IDs from the first query and stores them somewhere? Then the next step gets back the IDs that you need to run the second query.
I definitely appreciate the advice and that gave me some great things to think about. I should have posted this in the original post. But the SFDC instance I am working with has Governance limits which only allow a certain number of calls to be made to and from it within a given time frame. If I did a per record query back for the 2nd query, I would hit those limits pretty quickly (this is a half hour - hourly job). So I have to do bulk calls as they only count as 1 call vs the Governance Limits. So my 2nd query back is a big bulk call on the list of ID's the first returned. If I could do a per record call, that absolutely would be how I would do it (with the ItemProcessor).
The only way I can think to use the ItemProcessor is to cheat the Reader a bit and have the reader pull the entire data set back in 1 run and return a single object that is a collection of all that data. But that, to me, seems to break the intended usage of a Reader and doesn't feel right alongside best practices.
The biggest problem I have is that Reader/Writer steps seem to be designed to work on a per-record basis (i.e. each call to read() on the reader returns 1 record, the processor works on 1 record at a time, and only the writer works on the whole data set at once). But in this case, because of these limits, I have to do bulk queries, and I don't think any of the prebuilt Reader beans (such as IbatisPagingItemReader etc.) will work for this because of how we have to interact with SFDC. So I'm basically stumped on how to do it right, because I don't just want to hack something together. I'm sure there has to be something in the Spring Batch tool kit that was built for a scenario like this...
I'm pretty sure I'm going to break it down into multiple steps. Right now I'm thinking:
Step 1) Reader: performs ID Query
Writer: writes ID's to 11i Stage Table
Step 2) Reader: reads ID's from 11i Stage Table. Performs Lead Query
Writer: pushes data to consuming service.
But for step 2, that has the Reader essentially performing 2 reads: 1 from the stage table and the other from SFDC using that data. What would be the proper way to break it up to conform to the best practices and intended functionality? I've seen suggestions on the web of implementing a StepExecutionListener with my Reader and having the beforeStep function run the query against the stage table for that data, and then having the actual read() make the call to SFDC (in this case). But that doesn't seem like the right way to handle it.
So that's where I'm stuck. I could implement this in any number of ways, but I can't figure out the right way to do it that conforms to the intended usage and best practices. I'd appreciate any further advice or insight.
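One option that may fit the Step 2 plan above: in a chunk-oriented step the writer is handed the whole chunk, so the reader can return stage-table IDs one at a time and the writer can issue one bulk lookup per chunk. The sketch below is plain Java, not Spring Batch code. `ChunkWriter`, `BulkLeadWriter`, `ChunkLoop` and the `bulkLeadQuery` function are hypothetical stand-ins, and the loop only imitates what the framework does with a commit interval.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Stand-in for a chunk-oriented writer contract (an assumption, not the Spring Batch interface).
interface ChunkWriter<T> {
    void write(List<? extends T> chunk);
}

// Performs ONE bulk lookup per chunk of IDs instead of one call per ID.
class BulkLeadWriter implements ChunkWriter<String> {
    // Hypothetical bulk query: takes a list of IDs, returns the matching lead records.
    private final Function<List<String>, List<String>> bulkLeadQuery;
    final List<String> delivered = new ArrayList<>();
    int bulkCalls = 0;

    BulkLeadWriter(Function<List<String>, List<String>> bulkLeadQuery) {
        this.bulkLeadQuery = bulkLeadQuery;
    }

    @Override
    public void write(List<? extends String> chunk) {
        bulkCalls++; // each chunk costs a single call against the governance limit
        delivered.addAll(bulkLeadQuery.apply(new ArrayList<>(chunk)));
    }
}

// Imitates the read-one-at-a-time, write-per-chunk loop of a chunk-oriented step.
class ChunkLoop {
    static <T> void run(Iterator<T> reader, int chunkSize, ChunkWriter<T> writer) {
        List<T> chunk = new ArrayList<>();
        while (reader.hasNext()) {
            chunk.add(reader.next());
            if (chunk.size() == chunkSize) {
                writer.write(chunk);
                chunk = new ArrayList<>();
            }
        }
        if (!chunk.isEmpty()) {
            writer.write(chunk); // flush the final partial chunk
        }
    }
}
```

With 300 IDs and a chunk size of 200, this makes two bulk calls rather than 300 single-record calls, while the reader still returns one item per read.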
Well I've managed to eliminate most of my problems, but I'm still having the issue of getting my Custom ItemReader to return results individually.
The situation is, my reader makes a query to SFDC that pulls back multiple results at once (searching for all records that meet a set of criteria). But I'm not sure how I am supposed to properly handle this in Spring Batch so that the Reader's read() method returns one record at a time and the ItemProcessor receives a single record at a time.
This is basically what I have at the moment. The leadIdRetrievalService runs the query that pulls back all the id's and spits them back out in a list to the reader. What I can't figure out, is how to handle this so that the Reader meets the requirements of returning 1 record per call to read().
Any advice would be helpful as this has me twisted in knots as to what is the right way to do this. I know I could simply configure the reader to return the whole list, but that would make the Processor expect a List of the ID's and the writer a list of lists of Id's.
Again, I think you might be stuck in seeing the step as having to look one and only one way.
Since you have that one limitation, take it into account and think: how would I do this if I personally, not a computer, were doing it? What would my steps be? A single read from an ItemReader does not have to equate to one row in the database. One read could return the contents of a whole table, for instance. You could have a first step that reads all the data and stores it in a file or somewhere else. That gets you past the limitation, and the next step can then work one row at a time. Break things down. A Job can have many steps, and a step can be reader-writer or reader-processor-writer. And as you mentioned, you have ExecutionContexts to store data in. Maybe in a read/write step you just read it all in and store the data in the JobExecutionContext. The problem with that, though, is that Spring Batch will persist whatever is in the ExecutionContexts.
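To illustrate the point that one read need not equal one row, here is a minimal sketch of a reader whose item type is a batch of IDs, so each downstream call can be a single bulk query. It is plain Java rather than Spring Batch code. `BatchItemReader` is a simplified stand-in for the reader contract (return null when there is nothing left), and `IdBatchReader` and the batch size are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Simplified stand-in for a reader contract: one item per call, null when exhausted.
interface BatchItemReader<T> {
    T read();
}

// Each "item" is a list of up to batchSize IDs rather than a single ID.
class IdBatchReader implements BatchItemReader<List<String>> {
    private final Iterator<String> ids;
    private final int batchSize;

    IdBatchReader(Iterable<String> ids, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.ids = ids.iterator();
        this.batchSize = batchSize;
    }

    @Override
    public List<String> read() {
        if (!ids.hasNext()) {
            return null; // signals end of input
        }
        List<String> batch = new ArrayList<>(batchSize);
        while (ids.hasNext() && batch.size() < batchSize) {
            batch.add(ids.next());
        }
        return batch;
    }
}
```

The trade-off is the one raised earlier in the thread: the processor then receives a list, and the writer receives a list of lists, so this only pays off when the bulk call is the unit of work.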
Also, your Readers etc. can have instance variables to hold onto stuff too.
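A minimal sketch of that idea, assuming the bulk query can be fetched a page at a time: the reader keeps the current page and a running offset in instance variables, hands out one ID per call to read(), and fetches the next page only when the buffer runs dry. This is plain Java, not Spring Batch code. `SimpleItemReader` is a simplified stand-in for the reader contract, and `IdPageSource` with its offset-based `fetchPage` is a hypothetical service, not a real SFDC API.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Simplified stand-in for a reader contract: one item per call, null when exhausted.
interface SimpleItemReader<T> {
    T read();
}

// Hypothetical paging service (an assumption): returns up to pageSize IDs starting at offset.
interface IdPageSource {
    List<String> fetchPage(int offset, int pageSize);
}

class BufferedIdReader implements SimpleItemReader<String> {
    private final IdPageSource source;
    private final int pageSize;
    // Instance state: the page currently being handed out and where the next page starts.
    private Iterator<String> buffer = Collections.emptyIterator();
    private int offset = 0;
    private boolean exhausted = false;

    BufferedIdReader(IdPageSource source, int pageSize) {
        this.source = source;
        this.pageSize = pageSize;
    }

    @Override
    public String read() {
        if (!buffer.hasNext() && !exhausted) {
            List<String> page = source.fetchPage(offset, pageSize);
            offset += page.size();
            exhausted = page.size() < pageSize; // a short page means nothing is left
            buffer = page.iterator();
        }
        return buffer.hasNext() ? buffer.next() : null;
    }
}
```

With a page size of 200 and 300 matching records, the reader makes two bulk fetches and still returns the IDs one at a time. A restartable version would also need to save the offset somewhere, which this sketch leaves out.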