I am considering using the Spring Batch framework to migrate content (including metadata and binary assets) from a source content management system to a target one. The content also has to be transformed for the target system. I wanted to see whether people are using Spring Batch for such content migration scenarios. I have read the Spring Batch documentation (http://static.springsource.org/spring-batch) and understand that I may need to write my own custom ItemReaders/Processors/Writers. Also, some of the content migration tasks don't seem to fit very well into the Spring Batch step domain language. Let me know what you guys think of Spring Batch and any issues/gotchas to consider.
I have not had the opportunity to use Spring Batch yet, so I'll let someone else comment on that. But when you start talking about transformation etc., I think of Spring Integration. You might look at that project as well to see if it solves the issues where Spring Batch is lacking.
Yes, there are built-in readers and writers. Mostly you would implement ItemProcessors. So you could call out to Spring Integration in your processor and probably use the built-in readers and writers.
For instance, say you have files on the file system that are written in XML. Actually, in this example I wouldn't need a processor, as I would want the reader to read in the file using JAXB. This would transform your XML to Java objects. Then, say you want to save that info into a database: you could use the built-in JDBC writer.
If you are, say, reading from one content management system that stores docs in XML, and you want to save to another content management system that also uses XML but in a different format, that is where you want to use a processor to call out to Spring Integration to do the XML-to-XML transformation. You don't have to, but it will probably be a cleaner approach.
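To make the XML-to-XML processor idea a bit more concrete, here is a minimal sketch. It is only an illustration under some assumptions: it uses the plain JDK XSLT API (javax.xml.transform) instead of Spring Integration, the ItemProcessor interface is a local stand-in with the same shape as Spring Batch's ItemProcessor so the snippet compiles without the framework, and the stylesheet and the element names (doc, title, article, headline) are made up.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class XsltProcessorSketch {

    /** Local stand-in with the same shape as Spring Batch's ItemProcessor<I, O>. */
    interface ItemProcessor<I, O> {
        O process(I item) throws Exception;
    }

    /** Applies one compiled stylesheet to every XML document that passes through. */
    static final class XsltItemProcessor implements ItemProcessor<String, String> {
        // Templates is compiled once and can be shared between threads.
        private final Templates templates;

        XsltItemProcessor(String stylesheet) throws TransformerException {
            this.templates = TransformerFactory.newInstance()
                    .newTemplates(new StreamSource(new StringReader(stylesheet)));
        }

        @Override
        public String process(String sourceXml) throws TransformerException {
            // A Transformer is not thread-safe, so create a fresh one per item.
            Transformer transformer = templates.newTransformer();
            StringWriter out = new StringWriter();
            transformer.transform(
                    new StreamSource(new StringReader(sourceXml)),
                    new StreamResult(out));
            return out.toString();
        }
    }

    // Hypothetical stylesheet: rewrites <doc><title>..</title></doc>
    // into <article><headline>..</headline></article>.
    static final String STYLESHEET =
            "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
          + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
          + "<xsl:template match=\"/doc\">"
          + "<article><headline><xsl:value-of select=\"title\"/></headline></article>"
          + "</xsl:template>"
          + "</xsl:stylesheet>";

    public static void main(String[] args) throws Exception {
        XsltItemProcessor processor = new XsltItemProcessor(STYLESHEET);
        System.out.println(processor.process("<doc><title>Hello</title></doc>"));
    }
}
```

In a real job the class would implement Spring Batch's own ItemProcessor, and the built-in readers and writers would sit on either side of it.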
Thanks for the replies. Yes, I think I would end up implementing ItemProcessors for parsing/transforming content. In my case the source content is going to be HTML markup and the target is XML (in the Teamsite content management system). We chose this instead of reading from the DB because of the complexity of the source data model and access. I have not looked at Spring Integration yet, but I am thinking of using XSLT or some templating engine for the transformation. So at a high level, these are the tasks I have identified for the batch job.
1. Retrieve content (HTML markup).
2. Parse HTML and retrieve dependencies from content (images, binary assets...).
3. Write dependencies to target filesystem.
4. Write metadata to target filesystem.
5. Transform content.
6. Write content to target filesystem.
7. Publish/send acknowledgement...
As input to the Spring Batch job I am using a CSV file which has the source/target URL and associated metadata.... I wanted to get some ideas on how to design the Spring Batch config (job/steps/tasklets and ItemReaders/Processors/Writers).
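As a rough illustration of the first two tasks driven by a CSV row, here is a small sketch in plain Java. Everything in it is an assumption made for the example: the three-column layout (sourceUrl, targetPath, title), the MigrationItem class, and the naive regular expression used to find image/PDF references. A real job would more likely use a built-in flat-file reader for the CSV and a proper HTML parser for the dependencies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MigrationSketch {

    /** One row of the hypothetical input CSV: sourceUrl,targetPath,title. */
    static final class MigrationItem {
        final String sourceUrl;
        final String targetPath;
        final String title;

        MigrationItem(String sourceUrl, String targetPath, String title) {
            this.sourceUrl = sourceUrl;
            this.targetPath = targetPath;
            this.title = title;
        }
    }

    /** Maps one CSV line to an item; assumes no quoted or embedded commas. */
    static MigrationItem parseLine(String line) {
        String[] fields = line.split(",", -1);
        if (fields.length != 3) {
            throw new IllegalArgumentException("Expected 3 fields: " + line);
        }
        return new MigrationItem(fields[0].trim(), fields[1].trim(), fields[2].trim());
    }

    // Naive pattern: double-quoted src/href values ending in a known asset extension.
    private static final Pattern ASSET = Pattern.compile(
            "(?:src|href)\\s*=\\s*\"([^\"]+\\.(?:png|jpe?g|gif|pdf))\"",
            Pattern.CASE_INSENSITIVE);

    /** Returns the asset URLs referenced by the markup, in document order. */
    static List<String> findDependencies(String html) {
        List<String> dependencies = new ArrayList<>();
        Matcher matcher = ASSET.matcher(html);
        while (matcher.find()) {
            dependencies.add(matcher.group(1));
        }
        return dependencies;
    }

    public static void main(String[] args) {
        MigrationItem item = parseLine("http://old.example/a.html, /target/a.xml, Page A");
        String html = "<img src=\"/img/logo.png\"><a href=\"/docs/spec.pdf\">spec</a>";
        System.out.println(item.sourceUrl + " -> " + item.targetPath);
        System.out.println(findDependencies(html));
    }
}
```

In terms of the task list, parseLine would belong to the reader side, and findDependencies would sit in a processor that hands the asset URLs on to whatever writes them to the target filesystem.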
So I have done a POC of Spring Batch for our content migration and it was successful. A lot of the out-of-the-box ItemReaders/ItemWriters will be helpful in our case. I have also designed my initial job config. One question I had was about writing the content/dependencies to the filesystem. I am thinking of having a custom ItemWriter implementation which will retrieve the images/PDFs/content through HTTP and write them to the target file system. My main concern is that, since this is a custom ItemWriter, do I also need to implement the methods of the ItemStream interface for storing state and providing restartability? If that's the case, any examples would be helpful.
It depends on many factors, mostly on how you write it. So there is no single answer as to whether you have to implement any of those interfaces. ItemStream is just one option. There are also job and step listeners (JobExecutionListener and StepExecutionListener) that can be used to get at the job-level and step-level ExecutionContext to store data, if you have to save data for restarts. There are also listeners for restarts and retries that might be of interest.
Or, if your process removes the files from the source after a copy, then no state is needed to restart; it is just a matter of processing the files that remain on the source side.
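If you do go the ItemStream route, the idea looks roughly like the sketch below. Treat it as an illustration only: the ItemStream interface here is a local stand-in that mirrors the three callbacks of the Spring Batch interface (open, update, close), a plain Map stands in for the ExecutionContext, the key name is made up, and the actual HTTP fetch and file write are left out (the writer just records the URLs).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class RestartSketch {

    /** Stand-in mirroring the callbacks of Spring Batch's ItemStream; a Map replaces ExecutionContext. */
    interface ItemStream {
        void open(Map<String, Object> executionContext);
        void update(Map<String, Object> executionContext);
        void close();
    }

    /** Writer that remembers how many assets it has already handled. */
    static final class CountingAssetWriter implements ItemStream {
        static final String KEY = "assetWriter.written"; // hypothetical context key

        private int written;
        final List<String> copied = new ArrayList<>();

        @Override
        public void open(Map<String, Object> executionContext) {
            // On a restart the saved count is present; on a first run it is not.
            Object saved = executionContext.get(KEY);
            written = (saved == null) ? 0 : (Integer) saved;
        }

        @Override
        public void update(Map<String, Object> executionContext) {
            // Called at a commit point so the count survives a failure.
            executionContext.put(KEY, written);
        }

        @Override
        public void close() {
            // Nothing to release in this sketch.
        }

        /** Index of the first item that still needs to be written. */
        int resumeIndex() {
            return written;
        }

        /** Placeholder for "fetch over HTTP and write to the target filesystem". */
        void write(List<String> urls) {
            for (String url : urls) {
                copied.add(url);
                written++;
            }
        }
    }
}
```

On the first run the writer starts from zero and saves its count at each update; after a failure, a new instance opened with the same context picks up at resumeIndex(). In Spring Batch itself it is usually the reader that restores its position this way, so a writer only needs this if it keeps state of its own.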
Lots of reasons, possibilities, so no direct answer to your question.