
Utility that uses terse XML tags

 
bob connolly
Ranch Hand
Posts: 204
Hello,

Has anyone run across a utility that converts XML tags into short, cryptic characters that take up less space?

I'm hoping to find a utility that will read a BILLION-record XML file and convert the tags to special byte-sized characters to reduce the file size!

This utility would do something like convert <intex_traunche_cusip_id> to something like #@, or some kind of unmodifiable, cryptic, short value.

Thanks for any references or suggestions!

bc
 
William Brogden
Author and all-around good cowpoke
Rancher
Posts: 13048
Actually, that would be pretty trivial to program using SAX. The startElement and endElement methods would have to build and use a replacement table. However, before you embark on that, you should look into how much compression plain ZIP compression can provide.
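A minimal sketch of the SAX approach, assuming the standard JAXP parser that ships with the JDK. The class name `TagShortener`, the `shorten` helper, and the `t0`, `t1`, ... code scheme are hypothetical choices for illustration. The codes start with a letter because characters like `#@` are not legal XML element names, so a letter-prefixed code keeps the output parseable. Only element names are shortened, comments and processing instructions are dropped, and the replacement table has to be saved alongside the output so the original names can be restored.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical SAX handler: streams XML through, replacing each element
// name with a short code looked up in (or added to) a replacement table.
public class TagShortener extends DefaultHandler {
    private final Map<String, String> table = new LinkedHashMap<>();
    private final Writer out;

    public TagShortener(Writer out) {
        this.out = out;
    }

    // Original name -> short code; save this to reverse the mapping later.
    public Map<String, String> getTable() {
        return table;
    }

    // Codes are "t" + a base-36 counter: t0, t1, ..., tz, t10, ...
    private String codeFor(String qName) {
        String code = table.get(qName);
        if (code == null) {
            code = "t" + Integer.toString(table.size(), 36);
            table.put(qName, code);
        }
        return code;
    }

    // Escape the characters that are special in text and attribute values.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    private void write(String s) throws SAXException {
        try {
            out.write(s);
        } catch (IOException e) {
            throw new SAXException(e);
        }
    }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) throws SAXException {
        StringBuilder sb = new StringBuilder("<").append(codeFor(qName));
        for (int i = 0; i < atts.getLength(); i++) {
            // Attribute names are kept as-is; only element names are shortened.
            sb.append(' ').append(atts.getQName(i)).append("=\"")
              .append(escape(atts.getValue(i))).append('"');
        }
        write(sb.append('>').toString());
    }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        write("</" + table.get(qName) + ">");
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        write(escape(new String(ch, start, length)));
    }

    // Convenience wrapper for small inputs: returns the shortened XML and
    // copies the replacement table into tableOut.
    public static String shorten(String xml, Map<String, String> tableOut) {
        StringWriter sw = new StringWriter();
        TagShortener handler = new TagShortener(sw);
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not shorten XML", e);
        }
        tableOut.putAll(handler.getTable());
        return sw.toString();
    }
}
```

For example, `TagShortener.shorten("<trade><intex_traunche_cusip_id>123</intex_traunche_cusip_id></trade>", table)` returns `<t0><t1>123</t1></t0>` and fills `table` with `{trade=t0, intex_traunche_cusip_id=t1}`. For a billion-record file you would construct `TagShortener` directly around a `BufferedWriter` on the output file and pass the input `File` to `SAXParser.parse`; memory use stays flat because SAX never builds a tree.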

I looked into both ZIP encoding and "Fast Infoset" for this article. ZIP encoding compressed my test file by more than a factor of 10 with only a minor effect on parsing time.
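The ZIP suggestion can be sketched with `java.util.zip`. This sketch swaps in GZIP, which applies the same DEFLATE compression as ZIP but wraps a single stream instead of an archive of entries; the `GzipXml` class and its method names are hypothetical. What it illustrates is that a SAX parser can read straight from a `GZIPInputStream`, so the compressed file never has to be expanded on disk before parsing.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: gzip an XML stream, then parse the compressed bytes directly.
public class GzipXml {

    // Copy 'in' through a GZIP stream into 'out' (closes 'out' when done).
    public static void compress(InputStream in, OutputStream out) {
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            byte[] buf = new byte[64 * 1024];
            int n;
            while ((n = in.read(buf)) != -1) {
                gz.write(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // SAX-parse a gzipped XML stream on the fly and count its elements.
    public static long countElements(InputStream gzipped) {
        long[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                count[0]++;
            }
        };
        try (InputStream in = new GZIPInputStream(new BufferedInputStream(gzipped))) {
            SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return count[0];
    }
}
```

For a real file you would pass a `FileInputStream` and `FileOutputStream` (file names are up to you). The compression ratio you get depends on how repetitive your tags and data are, so it is worth measuring on your own records before writing a custom tag encoder.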

Let us know what you come up with, I think a lot of people are worried about large XML files.

Bill
 
bob connolly
Ranch Hand
Posts: 204
Thanks William, very good article!

I'm going to take a closer look into the specifications for that Fast Infoset technique!

And it's good to know that zipping is about the best anyone can do right now!

Have a good one William!

bc