Huge XML file.

 
Ranch Hand
Posts: 215
Hi, I am using a SAX parser. I have an XML file that can contain up to millions of records. Is there any problem with parsing such a huge file?
Please comment.

Thanks in advance.
 
Marshal
Posts: 28193
There are all kinds of things that you can do wrong when you are writing programs. Do you have a more specific question about that scenario?
 
Author and all-around good cowpoke
Posts: 13078
We get similar questions frequently; browse through or search this forum.

The most important questions are:
1. What do you have to get out of the file?
2. Do you have to write a modified XML document?

Generally speaking, SAX techniques do not keep any data beyond the current buffer load and current event, so file size is irrelevant to memory use.
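
To illustrate the point, here is a minimal sketch using the JDK's built-in SAX parser. The element name `record` and the sample document are assumptions made up for the example; the handler keeps only a running count, so its memory use should not grow with the number of records.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class SaxRecordCounter {

    /** Keeps a single counter; no element or text is retained after its event. */
    static final class CountingHandler extends DefaultHandler {
        private final String elementName;
        private long count;

        CountingHandler(String elementName) {
            this.elementName = elementName;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            // The default factory is not namespace-aware, so qName is the plain tag name.
            if (elementName.equals(qName)) {
                count++;
            }
        }

        long count() {
            return count;
        }
    }

    /** Streams the input through the parser and returns how many matching elements were seen. */
    static long countElements(InputStream in, String elementName)
            throws ParserConfigurationException, SAXException, IOException {
        CountingHandler handler = new CountingHandler(elementName);
        SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        return handler.count();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample data; a real run would open the large file instead.
        String xml = "<records><record id=\"1\"/><record id=\"2\"/><record id=\"3\"/></records>";
        try (InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))) {
            System.out.println(countElements(in, "record")); // prints 3
        }
    }
}
```

For a file on disk, the same method can be handed a `FileInputStream`; nothing else in the sketch needs to change.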

Bill
 
Rahul Ba
Ranch Hand
Posts: 215
Yes, I mean that there are millions of fields and I want to retrieve the value from each of those fields.
So, parsing millions of fields... will it work properly? Will it be slow?

Thanks in advance for your opinion.
 
William Brogden
Author and all-around good cowpoke
Posts: 13078
SAX is the fastest way to parse your file because - as I already said - it does not build and keep a whole lot of objects.

With SAX it is up to you, the programmer, to keep track of everything in the XML hierarchy so you can recognize exactly where the parser is in the document.
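
One common way to do that bookkeeping is to keep a stack of open element names. The sketch below is only one possible approach, and the document layout (`/orders/order/total`) is invented for the example. Each value is handed to a callback as soon as its element closes, so nothing accumulates unless the callback chooses to store it.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class PathTrackingHandler extends DefaultHandler {
    private final Deque<String> path = new ArrayDeque<>();   // open elements, root first
    private final StringBuilder text = new StringBuilder();  // text of the current target element
    private final String targetPath;
    private final Consumer<String> sink;
    private boolean inTarget;

    PathTrackingHandler(String targetPath, Consumer<String> sink) {
        this.targetPath = targetPath;
        this.sink = sink;
    }

    /** True when the innermost open element is exactly the target path. */
    private boolean atTarget() {
        return ("/" + String.join("/", path)).equals(targetPath);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        path.addLast(qName);
        inTarget = atTarget();
        if (inTarget) {
            text.setLength(0); // a new target element starts: discard the previous value
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // The parser may deliver one text node in several chunks, so append rather than assign.
        if (inTarget) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (atTarget()) {
            sink.accept(text.toString().trim());
        }
        path.removeLast();
        inTarget = atTarget();
    }

    /** Convenience for small inputs: collects every value found at targetPath. */
    static List<String> extract(String xml, String targetPath) throws Exception {
        List<String> values = new ArrayList<>();
        SAXParserFactory.newInstance().newSAXParser().parse(
                new InputSource(new StringReader(xml)),
                new PathTrackingHandler(targetPath, values::add));
        return values;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample document.
        String xml = "<orders>"
                + "<order><id>17</id><total>9.50</total></order>"
                + "<order><id>18</id><total>12.00</total></order>"
                + "</orders>";
        System.out.println(extract(xml, "/orders/order/total")); // prints [9.50, 12.00]
    }
}
```

With millions of fields, the callback would normally process or write out each value immediately instead of adding it to a list as `extract` does here.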

Why are you worried about this? A few simple experiments would reveal just how much time the raw parsing would take with your document.
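
Such an experiment could look roughly like the sketch below. It writes a synthetic file (the `record`/`id`/`name` layout and the default of one million records are assumptions, not the poster's actual data), parses it with a handler that only counts elements, and prints the elapsed time. A single run like this gives a rough figure only, since JIT warm-up and disk caching affect it.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class SaxTimingExperiment {

    /** Does the least possible work per event, so the timing reflects raw parsing. */
    static final class ElementCounter extends DefaultHandler {
        long elements;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            elements++;
        }
    }

    /** Writes a temporary XML file with the given number of made-up records. */
    static Path writeSample(int records) throws IOException {
        Path file = Files.createTempFile("sax-sample", ".xml");
        try (BufferedWriter out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            out.write("<records>");
            for (int i = 0; i < records; i++) {
                out.write("<record><id>" + i + "</id><name>item" + i + "</name></record>");
            }
            out.write("</records>");
        }
        return file;
    }

    /** Parses the file and returns the number of elements encountered. */
    static long parse(Path file) throws ParserConfigurationException, SAXException, IOException {
        ElementCounter counter = new ElementCounter();
        SAXParserFactory.newInstance().newSAXParser().parse(file.toFile(), counter);
        return counter.elements;
    }

    public static void main(String[] args) throws Exception {
        int records = args.length > 0 ? Integer.parseInt(args[0]) : 1_000_000;
        Path file = writeSample(records);
        try {
            long start = System.nanoTime();
            long elements = parse(file);
            long millis = (System.nanoTime() - start) / 1_000_000;
            System.out.printf("%d elements, %d bytes, %d ms%n",
                    elements, Files.size(file), millis);
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Running it with a few different record counts shows how the parse time scales with file size on the machine in question.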

Bill
 
Greenhorn
Posts: 9
Actually, VTD-XML claims to be faster than SAX, even for huge files.
In addition, VTD-XML supports XPath and random access like DOM.
You might want to investigate that as well.

http://vtd-xml.sf.net
 
William Brogden
Author and all-around good cowpoke
Posts: 13078
Although VTD sounds interesting, I immediately hit this alarming comment:

We propose a "non-extractive" tokenization approach that maintains the source document intact in memory.

And then it turns out their "huge" XML file for testing was (wait for it....)

"po_huge.xml" ----- 9,907,759 bytes

Yes, you can accomplish miracles of speed if you make a few tiny assumptions.....

But if I had to run lots of XPath expressions on a medium-sized XML document, I would look into it.

Bill
 
Jamie Zhang
Greenhorn
Posts: 9
The regular VTD-XML processes files up to 2 GB in size.
VTD-XML-HUGE processes XML files up to 256 GB in size.

The memory consumption of VTD-XML is 1.3~1.5x the size of the XML document,

so for a 100 MB file, VTD-XML would consume roughly 130 to 150 MB of memory.

With VTD-XML-HUGE, you can potentially use memory mapping...
 