XML parsing vs. simple text parsing using Java streams

 
john wesley
Ranch Hand
Posts: 47
Hi,
I have a situation here. I am currently storing huge amounts of data (half a GB to one GB, at most 2 GB) in text files (CSV style), and then I parse them using simple Java streams: I read them once, calculate some summaries and fill in an Oracle table. However, I am now thinking of storing the data in XML format instead of text and using SAX for parsing. The file size would surely shoot up, maybe double, but more important is parsing performance. Is XML suited for this amount of data? Will SAX parsing be any better than simply reading the text file with Java streams and tokenizing it?
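For reference, the current one-pass CSV approach might look roughly like the sketch below. It is not the poster's actual code: the class name `CsvSummary`, the `key,amount` line layout and the per-key totals are hypothetical stand-ins for the real summaries, and the Oracle insert is left out.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

public class CsvSummary {

    // Sums the numeric second column per key found in the first column.
    // Assumed (hypothetical) layout: "key,amount" lines, no quoting or embedded commas.
    public static Map<String, Long> summarize(Reader source) throws IOException {
        Map<String, Long> totals = new HashMap<>();
        try (BufferedReader in = new BufferedReader(source, 1 << 16)) { // 64K-char buffer
            String line;
            while ((line = in.readLine()) != null) {
                int comma = line.indexOf(',');
                if (comma < 0) continue;                 // skip blank or malformed lines
                String key = line.substring(0, comma);
                long amount = Long.parseLong(line.substring(comma + 1).trim());
                totals.merge(key, amount, Long::sum);    // single pass, one entry per key
            }
        }
        return totals;
    }

    public static void main(String[] args) throws IOException {
        Map<String, Long> totals = summarize(new StringReader("a,1\nb,2\na,3\n"));
        System.out.println(totals.get("a") + " " + totals.get("b")); // prints "4 2"
    }
}
```

For a real file, `Files.newBufferedReader(path)` can be passed instead of the `StringReader`; memory use depends only on the number of distinct keys, not on the file size.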

Can someone please shed some light on this issue?
Thanks,

.....jw
 
Arun Prasath
Ranch Hand
Posts: 192
Yes, you can write huge amounts of XML, though not with plain SAX or DOM; use the SAX extensions that are available instead.
I would suggest reading this article.
It is a good one that covers exactly that.
Hope it helps.
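Since the original question is about reading, here is a minimal sketch of the SAX alternative using the standard JAXP SAX API (`SAXParserFactory`, `DefaultHandler`). It uses plain SAX for parsing, not the SAX extensions for writing mentioned above, and the `<records>`/`<record key="…" amount="…"/>` layout and the class name `SaxSummary` are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SaxSummary {

    // SAX pushes one event per element, so memory stays flat regardless of
    // file size, but every attribute value arrives as a String.
    static class SummaryHandler extends DefaultHandler {
        final Map<String, Long> totals = new HashMap<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            if ("record".equals(qName)) {                     // hypothetical element name
                String key = attrs.getValue("key");           // hypothetical attribute names
                long amount = Long.parseLong(attrs.getValue("amount"));
                totals.merge(key, amount, Long::sum);
            }
        }
    }

    public static Map<String, Long> summarize(InputSource source)
            throws ParserConfigurationException, SAXException, IOException {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        SummaryHandler handler = new SummaryHandler();
        parser.parse(source, handler);                        // streams events to the handler
        return handler.totals;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<records><record key=\"a\" amount=\"1\"/>"
                + "<record key=\"b\" amount=\"2\"/><record key=\"a\" amount=\"3\"/></records>";
        Map<String, Long> totals = summarize(new InputSource(new StringReader(xml)));
        System.out.println(totals.get("a") + " " + totals.get("b")); // prints "4 2"
    }
}
```

Like a line-by-line CSV reader, this never holds the whole document in memory, but each record now goes through the XML tokenizer and attribute handling before its values can be parsed, so some extra per-record cost should be expected.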
 
William Brogden
Author and all-around good cowpoke
Rancher
Posts: 13044
Since XML parsing will add LOTS of overhead, I can't imagine how you could avoid a major slowdown. Any XML processing involves creating lots of objects, converting to and from String, etc.
IF (big if) your data is all ASCII, you will be much faster handling the input as byte streams and byte[] buffers rather than character streams, and staying well away from String conversion until the last minute.
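A minimal sketch of that byte-level approach, assuming pure ASCII input with hypothetical `key,amount` lines and integer amounts: it reads into one reused byte[] buffer and accumulates digits directly, so no String is created per line. `ByteCsvTotal` and `sumSecondColumn` are illustrative names, and only a grand total is computed to keep it short.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ByteCsvTotal {

    // Sums the second comma-separated field of every line, working directly on bytes.
    // Assumes pure ASCII input, "key,amount" lines and integer amounts.
    public static long sumSecondColumn(InputStream in) throws IOException {
        byte[] buf = new byte[1 << 16];   // 64 KB buffer, reused for every read
        long total = 0;
        long value = 0;                   // number being assembled from ASCII digits
        boolean negative = false;
        int field = 0;                    // index of the current field within the line
        int n;
        while ((n = in.read(buf)) != -1) {
            for (int i = 0; i < n; i++) {
                byte b = buf[i];
                if (b == ',') {
                    field++;
                } else if (b == '\n') {
                    total += negative ? -value : value;   // line finished: add its amount
                    value = 0;
                    negative = false;
                    field = 0;
                } else if (field == 1) {
                    if (b == '-') negative = true;
                    else if (b >= '0' && b <= '9') value = value * 10 + (b - '0');
                    // other bytes such as '\r' or spaces are ignored
                }
            }
        }
        total += negative ? -value : value;   // last line may lack a trailing newline
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "a,1\nb,2\na,3".getBytes(StandardCharsets.US_ASCII);
        System.out.println(sumSecondColumn(new ByteArrayInputStream(data))); // prints 6
    }
}
```

For a real file, open it with something like `new FileInputStream("data.csv")` (hypothetical path) in a try-with-resources block; because the method does its own buffering, a `BufferedInputStream` wrapper is not needed. Strings would only be created at the end, for example when binding the summary values to the Oracle insert.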
XML shines when the data structure is complex; anything that can be represented as CSV is not a good candidate.
Bill
 