Parsing and Storing HUGE XML files

vicky kumar
Ranch Hand

Joined: Dec 13, 2002
Posts: 55
I have a huge XML file containing millions of records, which I receive from the customer daily. I need to parse the file and store all of its records in a database. Because of the file's size, parsing it can run into memory issues. Is there a way to parse the file in chunks and store them in the DB without running out of memory? I believe any XML technique like XPointer, XQuery or XPath will hold the whole document in a DOM, and that will be a problem.

Please let me know if anyone has implemented something like this.

Thanks
Vicky
Paul Clapham
Bartender

Joined: Oct 14, 2005
Posts: 18541
    

Parsing with SAX or StAX should do what you want.
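
A minimal StAX sketch of that advice, assuming a flat layout in which each <record> element holds only simple text children. The element name "record", the batch size and the sink callback are illustrative and not taken from the poster's file. Records are handed over in fixed-size batches, so only one batch is held in memory at a time; the sink is where a JDBC PreparedStatement with addBatch()/executeBatch() could write each batch to the database.

```java
import java.io.Reader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StaxBatchSketch {

    /**
     * Streams <record> elements (hypothetical name) and passes them to the
     * sink in batches. Returns the total number of records read.
     */
    static int streamRecords(Reader xml, int batchSize,
                             Consumer<List<Map<String, String>>> sink)
            throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Defensive settings for files that come from outside: no DTDs,
        // no external entities.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        XMLStreamReader reader = factory.createXMLStreamReader(xml);

        List<Map<String, String>> batch = new ArrayList<>();
        Map<String, String> current = null;
        int total = 0;
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = reader.getLocalName();
                    if ("record".equals(name)) {
                        current = new LinkedHashMap<>();
                    } else if (current != null) {
                        // Assumes a text-only child; getElementText() reads
                        // up to the matching end tag.
                        current.put(name, reader.getElementText());
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && "record".equals(reader.getLocalName())) {
                    batch.add(current);
                    current = null;
                    total++;
                    if (batch.size() == batchSize) {
                        sink.accept(batch);        // e.g. one JDBC batch insert
                        batch = new ArrayList<>(); // let the old batch be collected
                    }
                }
            }
            if (!batch.isEmpty()) {
                sink.accept(batch); // flush the final partial batch
            }
        } finally {
            reader.close();
        }
        return total;
    }
}
```

For a real file, the Reader could come from Files.newBufferedReader(path), and a call might look like StaxBatchSketch.streamRecords(reader, 1000, batch -> saveBatch(batch)), where saveBatch is a hypothetical method that does the database insert.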
William Brogden
Author and all-around good cowpoke
Rancher

Joined: Mar 22, 2000
Posts: 12761
    
Quote: "I believe any XML technique like XPointer, XQuery or XPath will hold it in DOM and that will be a problem."


You are exactly right.

This may be a job for "pipeline" style processing. I wrote two survey articles (article 1 and article 2) on XML pipelines.

I strongly recommend Harold's online book chapter on SAX processing.

Bill

[ October 08, 2008: Message edited by: William Brogden ]
Yves Zoundi
Ranch Hand

Joined: Aug 31, 2008
Posts: 47
Like Paul Clapham said, you want SAX or StAX, or any other event-based XML parsing library (xpp3, etc.). Trying to load the document into a tree-based XML API will probably give you an OutOfMemoryError; you'll try playing with the heap size and get nowhere.
Event-based parsing may be less convenient, depending on the document's structure and complexity, but at least you'll be able to process the file.
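
To illustrate the "less convenient" part, here is a rough SAX sketch under the same kind of assumption: flat <record> elements with text-only children, where the element name and the per-record callback are hypothetical. With an event-based parser the handler has to track the current element and accumulate text itself, but only one record is in memory at a time.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxRecordHandler extends DefaultHandler {
    private final Consumer<Map<String, String>> onRecord;
    private final StringBuilder text = new StringBuilder();
    private Map<String, String> current; // null while outside a <record>

    SaxRecordHandler(Consumer<Map<String, String>> onRecord) {
        this.onRecord = onRecord;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0); // discard whitespace seen between elements
        if ("record".equals(qName)) {
            current = new LinkedHashMap<>();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // The parser may deliver one text node in several chunks, so append.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("record".equals(qName)) {
            onRecord.accept(current); // record complete: hand it off
            current = null;
        } else if (current != null) {
            current.put(qName, text.toString().trim());
        }
        text.setLength(0);
    }

    /** Parses the input with a default (non-namespace-aware) SAX parser. */
    static void parse(InputSource in, Consumer<Map<String, String>> onRecord) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(in, new SaxRecordHandler(onRecord));
    }
}
```

The handler uses qName because the default SAXParserFactory is not namespace aware; a document that uses namespaces would need setNamespaceAware(true) and localName instead.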


Author of VFSJFileChooser and XPontus XML Editor
Neeraj Vij
Ranch Hand

Joined: Nov 25, 2003
Posts: 315
Hi,

I have the same issue to address, plus a few extra requirements.

1.) XSD validation: if the XML does not conform to the XSD, I need to save the reason and show it to the user.

2.) If the XML is valid, perform user validations and save the records into the DB.

I was thinking of using the Castor API.

But my concern is mapping around 150 fields to Java classes and then saving them into the DB.

Should I map Java classes to XML fields using Castor and then do the validations,

or

should I use a SAX parser to parse the file and populate the fields one by one?

Any suggestions are welcome.

Thanks,
Neeraj.
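
For point 1, one possible approach (a sketch, not something confirmed in this thread) is the standard javax.xml.validation API with a custom ErrorHandler that collects each problem along with its line and column, so the reasons can be saved and shown to the user. Feeding the validator a StreamSource lets it read the document as a stream rather than requiring a DOM. The class and method names below are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class XsdValidationSketch {

    /** Returns one message per problem found; an empty list means the XML is valid. */
    static List<String> validate(Reader xsd, Reader xml) throws SAXException, IOException {
        SchemaFactory schemaFactory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = schemaFactory.newSchema(new StreamSource(xsd));
        Validator validator = schema.newValidator();

        final List<String> problems = new ArrayList<>();
        validator.setErrorHandler(new ErrorHandler() {
            private void record(String level, SAXParseException e) {
                problems.add(level + " at line " + e.getLineNumber()
                        + ", column " + e.getColumnNumber() + ": " + e.getMessage());
            }

            @Override
            public void warning(SAXParseException e) {
                record("warning", e);
            }

            @Override
            public void error(SAXParseException e) {
                record("error", e); // not rethrown, so validation continues
            }

            @Override
            public void fatalError(SAXParseException e) throws SAXException {
                record("fatal", e);
                throw e; // document is not well-formed: stop here
            }
        });

        try {
            validator.validate(new StreamSource(xml));
        } catch (SAXParseException e) {
            // Normally already recorded by fatalError(); keep a message just in case.
            if (problems.isEmpty()) {
                problems.add("fatal: " + e.getMessage());
            }
        }
        return problems;
    }
}
```

If the returned list is empty, the file could then be processed in a second streaming pass (SAX or StAX) that applies the user validations and saves the records, at the cost of reading the file twice.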
 