
Parsing and Storing HUGE XML files

 
vicky kumar
Ranch Hand
Posts: 55
I have a huge XML file containing, say, millions of records. I get this file from the customer on a daily basis. My requirement is to parse the file and store all of its records in a database. Because of the size of the file, it can run into memory issues. Is there a way to parse the file in chunks and store the records in the DB without running into memory problems? I believe any XML technique like XPointer, XQuery or XPath will hold the whole document in a DOM, and that will be a problem.

Please let me know if anyone has done such an implementation in their work.

Thanks
Vicky
 
Paul Clapham
Sheriff
Posts: 21107
Parsing with SAX or StAX should do what you want.
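For reference, a minimal StAX sketch of this idea. The element names `record` and `id` and the class `StaxIdReader` are hypothetical, not taken from the customer's file. The cursor API in `javax.xml.stream` pulls one event at a time, so each value can be handed off (for example to a DB insert) as soon as it is read, without building a tree first.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StaxIdReader {

    // Pulls parse events one at a time and passes the text of every <id>
    // element (hypothetical name) to onId. The document is never built into
    // a tree, so memory use does not grow with the number of records.
    // Returns how many ids were seen.
    static int forEachId(Reader source, Consumer<String> onId) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Refuse DTDs: a common hardening step for files received from outside.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        XMLStreamReader reader = factory.createXMLStreamReader(source);
        int count = 0;
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "id".equals(reader.getLocalName())) {
                    // Reads the text content and leaves the cursor on </id>.
                    onId.accept(reader.getElementText());
                    count++;
                }
            }
        } finally {
            reader.close(); // frees parser resources; does not close `source`
        }
        return count;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<records><record><id>1</id><name>a</name></record>"
                + "<record><id>2</id><name>b</name></record></records>";
        List<String> ids = new ArrayList<>();
        int n = forEachId(new StringReader(xml), ids::add);
        System.out.println(n + " " + ids); // prints 2 [1, 2]
    }
}
```

With the real file you would pass a reader over the file (or use the `createXMLStreamReader(InputStream)` overload so the parser honours the XML encoding declaration), and `onId` would do the DB work instead of collecting into a list.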
 
William Brogden
Author and all-around good cowpoke
Rancher
Posts: 13061
I believe any XML technique like XPointer, XQuery or XPath will hold it in DOM and that will be a problem.


You are exactly right.

This may be a job for "pipeline" style processing. I wrote two survey articles on XML pipelines: article 1 and article 2.
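One way to get pipeline-style processing inside a single Java program is to chain SAX filters: each stage extends `org.xml.sax.helpers.XMLFilterImpl`, sees the event stream, and forwards what it wants to the next stage. This is an illustrative sketch, not necessarily the approach those articles describe; the stage that drops a bulky element (here a hypothetical `note`) before record extraction is only an example of what a stage might do, and the class names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Pipeline stage: drops every element named skipName together with its child
// elements and text, and forwards all other element/text events unchanged.
class SkipElementFilter extends XMLFilterImpl {
    private final String skipName;
    private int skipDepth = 0; // > 0 while inside a skipped subtree

    SkipElementFilter(XMLReader parent, String skipName) {
        super(parent);
        this.skipName = skipName;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        if (skipDepth > 0 || qName.equals(skipName)) {
            skipDepth++;
            return;
        }
        super.startElement(uri, localName, qName, atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (skipDepth > 0) {
            skipDepth--;
            return;
        }
        super.endElement(uri, localName, qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (skipDepth == 0) {
            super.characters(ch, start, length);
        }
    }
}

class PipelineDemo {
    // Wires parser -> SkipElementFilter -> final handler and returns the
    // names of the elements that reached the last stage.
    static List<String> run(String xml, String skipName) throws Exception {
        XMLReader parser = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        XMLFilterImpl filter = new SkipElementFilter(parser, skipName);
        List<String> seen = new ArrayList<>();
        filter.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                seen.add(qName);
            }
        });
        filter.parse(new InputSource(new StringReader(xml)));
        return seen;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<records><record><id>1</id><note>long text</note></record></records>";
        System.out.println(run(xml, "note")); // prints [records, record, id]
    }
}
```

Further stages are added by constructing another filter with the previous one as its parent; the whole chain still processes the file in a single streaming pass.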

I strongly recommend Harold's online book chapter on SAX processing.
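A minimal SAX sketch along those lines, assuming flat records of the form `<record><field>text</field>...</record>` (the element names and the class `RecordHandler` are hypothetical): the handler fills a map for the current record, hands it to a callback at `</record>`, then drops it, so only one record is held in memory at a time.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Collects the child elements of each record element into a field map,
// passes the map to onRecord at the record's end tag, then forgets it.
// Assumes flat records: <record><field>text</field>...</record>.
class RecordHandler extends DefaultHandler {
    private final String recordName;
    private final Consumer<Map<String, String>> onRecord;
    private final StringBuilder text = new StringBuilder();
    private Map<String, String> current; // null while outside a record

    RecordHandler(String recordName, Consumer<Map<String, String>> onRecord) {
        this.recordName = recordName;
        this.onRecord = onRecord;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equals(recordName)) {
            current = new LinkedHashMap<>();
        }
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // SAX may deliver one text node in several chunks, so accumulate.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (current != null) {
            if (qName.equals(recordName)) {
                onRecord.accept(current); // e.g. insert into the DB here
                current = null;
            } else {
                current.put(qName, text.toString().trim());
            }
        }
        text.setLength(0);
    }

    static void parse(String xml, String recordName, Consumer<Map<String, String>> onRecord)
            throws Exception {
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new RecordHandler(recordName, onRecord));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<records><record><id>1</id><name>a</name></record>"
                + "<record><id>2</id><name>b</name></record></records>";
        List<Map<String, String>> seen = new ArrayList<>();
        parse(xml, "record", seen::add); // a real job would write each record out instead
        System.out.println(seen); // prints [{id=1, name=a}, {id=2, name=b}]
    }
}
```

For the real file, `SAXParser.parse(File, DefaultHandler)` reads directly from disk; the callback is where each record would be validated and written out.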

Bill

[ October 08, 2008: Message edited by: William Brogden ]
 
Yves Zoundi
Ranch Hand
Posts: 47
As Paul Clapham said, you want SAX or StAX, or any other event-based XML parsing library (XPP3, etc.). Trying to load the document into a tree-based XML API will probably give you an OutOfMemoryError; you'll try playing with the heap size and get nowhere...
Depending on the structure and complexity of the XML document, it will be less convenient, but at least you'll be able to process the file.
 
Neeraj Vij
Ranch Hand
Posts: 315
Hi,

I have the same issue to address, but with a few extra requirements:

1.) XSD validation: if the XML does not conform to the XSD, I need to save the reason and show it to the user.

2.) If the XML is fine, perform user validations and save the records into the DB.
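For requirement 1, the JDK's `javax.xml.validation` API can validate against an XSD and report every problem through an `ErrorHandler` instead of stopping at the first one. A minimal sketch; the demo schema, its element names and the class `StreamingXsdCheck` are hypothetical. With a `StreamSource`, the JDK's built-in validator is expected to check the document as it reads rather than building a DOM first, though memory behaviour is worth confirming on a real file.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class StreamingXsdCheck {
    // Hypothetical demo schema: <records> with any number of <record><id>int</id></record>.
    static final String DEMO_XSD =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
            + "<xs:element name='records'><xs:complexType><xs:sequence>"
            + "<xs:element name='record' maxOccurs='unbounded'><xs:complexType><xs:sequence>"
            + "<xs:element name='id' type='xs:int'/>"
            + "</xs:sequence></xs:complexType></xs:element>"
            + "</xs:sequence></xs:complexType></xs:element>"
            + "</xs:schema>";

    // Validates xml against xsd and returns one "line:column: message" entry
    // per problem; an empty list means the document is valid.
    static List<String> validate(String xsd, String xml) throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
        Validator validator = schema.newValidator();
        List<String> problems = new ArrayList<>();
        validator.setErrorHandler(new ErrorHandler() {
            @Override
            public void warning(SAXParseException e) { remember(e); }

            @Override
            public void error(SAXParseException e) { remember(e); } // invalid: keep going

            @Override
            public void fatalError(SAXParseException e) throws SAXException {
                remember(e); // not well-formed: parsing cannot continue
                throw e;
            }

            private void remember(SAXParseException e) {
                problems.add(e.getLineNumber() + ":" + e.getColumnNumber() + ": " + e.getMessage());
            }
        });
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
        } catch (SAXException e) {
            if (problems.isEmpty()) {
                throw e; // a failure that was not reported through the handler
            }
        }
        return problems;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(validate(DEMO_XSD, "<records><record><id>1</id></record></records>")); // prints []
        for (String p : validate(DEMO_XSD, "<records><record><id>abc</id></record></records>")) {
            System.out.println(p);
        }
    }
}
```

Validity errors go to `error()`, which lets validation continue, so one run can collect all the reasons to save and show to the user; `fatalError()` (XML that is not well-formed) ends the parse. To validate and extract records in a single pass instead, `Schema.newValidatorHandler()` returns a `ValidatorHandler` that can sit in front of a SAX `ContentHandler`.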

I was thinking of using the Castor API.

But my concern is mapping around 150 fields to Java classes and then saving them into the DB.

Should I map Java classes to the XML fields using Castor and then do the validations, etc.,

or

should I use a SAX parser to parse the file and populate the fields one by one?
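On the "populate the fields one by one" option: a SAX or StAX loop does not have to mean 150 hand-written setters. A minimal sketch, assuming each record arrives as a field-name-to-text map (as a SAX handler collecting a record's child elements would produce) and that the XML element names match the DB column names; the class `BatchingRecordWriter` and the table/column names are hypothetical. It uses only standard JDBC calls (`prepareStatement`, `setString`, `addBatch`, `executeBatch`) and keeps at most `batchSize` records in memory.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Buffers records (field name -> text) and inserts them with JDBC batches,
// so at most batchSize records are held in memory. The column list drives
// both the INSERT statement and the binding order. Table and column names are
// hypothetical and are concatenated into SQL, so they must be trusted values.
class BatchingRecordWriter implements AutoCloseable {
    private final Connection conn;
    private final List<String> columns;
    private final String sql;
    private final int batchSize;
    private final List<Map<String, String>> buffer = new ArrayList<>();

    BatchingRecordWriter(Connection conn, String table, List<String> columns, int batchSize) {
        this.conn = conn;
        this.columns = List.copyOf(columns);
        this.sql = insertSql(table, this.columns);
        this.batchSize = batchSize;
    }

    // e.g. insertSql("customer", List.of("id", "name"))
    //   -> INSERT INTO customer (id, name) VALUES (?, ?)
    static String insertSql(String table, List<String> columns) {
        StringJoiner names = new StringJoiner(", ", "(", ")");
        StringJoiner marks = new StringJoiner(", ", "(", ")");
        for (String column : columns) {
            names.add(column);
            marks.add("?");
        }
        return "INSERT INTO " + table + " " + names + " VALUES " + marks;
    }

    // Queues one record; sends a batch to the database once batchSize is reached.
    void write(Map<String, String> record) throws SQLException {
        buffer.add(record);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Sends any buffered records as one JDBC batch.
    void flush() throws SQLException {
        if (buffer.isEmpty()) {
            return;
        }
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (Map<String, String> record : buffer) {
                for (int i = 0; i < columns.size(); i++) {
                    // A field missing from the XML is bound as null.
                    ps.setString(i + 1, record.get(columns.get(i)));
                }
                ps.addBatch();
            }
            ps.executeBatch();
        }
        buffer.clear();
    }

    // Flushes the final, possibly partial, batch.
    @Override
    public void close() throws SQLException {
        flush();
    }
}
```

A parser callback would call `write(record)` once per record (wrapping `SQLException` in an unchecked exception if the callback is a `java.util.function.Consumer`), and `close()` flushes the last partial batch. Whether to commit after each batch is a separate decision for your application.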

Any suggestions would be appreciated.

Thanks,
Neeraj.
 