Utility that uses Terse XML tags

bob connolly
Ranch Hand

Joined: Mar 10, 2004
Posts: 204

Has anyone run across a utility to convert XML tags to short, cryptic characters that take up less memory?

I'm hoping to find a utility that will read a BILLION record XML file and convert the tags to a special byte-sized character to reduce file size!

This utility would do something like convert <intex_traunche_cusip_id> to something like #@ or some kind of unmodifiable, cryptic, small value?

Thanks for any references or suggestions!

William Brogden
Author and all-around good cowpoke

Joined: Mar 22, 2000
Posts: 13036
Actually that would be pretty trivial to program using SAX. The startElement and endElement methods would have to build and use a replacement table. However, before you embark on that you should look into how much compression the plain ZIP compression utility can provide.
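
A minimal sketch of that SAX approach might look like the following. The class name TerseTagWriter, the generated short names (t0, t1, ...) and the in-memory StringBuilder output are all assumptions made for illustration; for a file with a billion records you would write to a buffered Writer instead and save the replacement table so the short names can be mapped back later.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: rewrites element names to short generated names.
class TerseTagWriter extends DefaultHandler {
    // Replacement table: original element name -> short name, in order of first appearance.
    private final Map<String, String> table = new LinkedHashMap<>();
    private final StringBuilder out = new StringBuilder();

    private String shortName(String qName) {
        String s = table.get(qName);
        if (s == null) {
            s = "t" + table.size();   // t0, t1, t2, ... (naming scheme is an assumption)
            table.put(qName, s);
        }
        return s;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        out.append('<').append(shortName(qName));
        // Attribute names are copied unchanged; only element names are shortened here.
        for (int i = 0; i < atts.getLength(); i++) {
            out.append(' ').append(atts.getQName(i))
               .append("=\"").append(escape(atts.getValue(i))).append('"');
        }
        out.append('>');
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        out.append("</").append(shortName(qName)).append('>');
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        out.append(escape(new String(ch, start, length)));
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    Map<String, String> getTable() {
        return table;
    }

    // Parses the input and returns the same document with terse element names.
    static String convert(String xml) throws Exception {
        TerseTagWriter handler = new TerseTagWriter();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(convert(
            "<deal><intex_traunche_cusip_id>123</intex_traunche_cusip_id></deal>"));
    }
}
```

The sketch drops comments, processing instructions and the XML declaration, and it ignores namespaces, so treat it as a starting point rather than a finished converter.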

I looked into both ZIP encoding and "fast infoset" for this article. ZIP encoding compressed my test file by more than a factor of 10 with only a minor effect on parsing time.
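
To try the compression route yourself, something along these lines should work. It is only a sketch: it uses GZIP from java.util.zip (the same DEFLATE algorithm as ZIP, without the archive wrapper) rather than whatever the article used, and the class name GzipXmlDemo and the generated sample data are made up for illustration. The parser reads straight from the compressed stream, so the uncompressed file never needs to exist on disk.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: compress XML with GZIP, then SAX-parse the compressed bytes.
class GzipXmlDemo {
    // Compresses a byte array with GZIP (DEFLATE).
    static byte[] gzip(byte[] raw) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(raw);
        }
        return buf.toByteArray();
    }

    // Parses gzipped XML directly from the decompressing stream and counts start tags.
    static int countElements(byte[] gzipped) throws Exception {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                count[0]++;
            }
        };
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gzipped))) {
            SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        }
        return count[0];
    }

    // Builds repetitive sample XML using the long tag name from the question.
    static byte[] sampleXml(int records) {
        StringBuilder sb = new StringBuilder("<deals>");
        for (int i = 0; i < records; i++) {
            sb.append("<intex_traunche_cusip_id>").append(i).append("</intex_traunche_cusip_id>");
        }
        return sb.append("</deals>").toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        byte[] raw = sampleXml(10000);
        byte[] packed = gzip(raw);
        System.out.println(raw.length + " bytes raw, " + packed.length + " bytes gzipped");
        System.out.println(countElements(packed) + " elements parsed from the compressed stream");
    }
}
```

The ratio you get depends heavily on your data, so measure it on a sample of the real file before deciding whether terse tags are worth the trouble.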

Let us know what you come up with. I think a lot of people are worried about large XML files.

bob connolly
Ranch Hand

Joined: Mar 10, 2004
Posts: 204
Thanks William, very good article!

I'm going to take a closer look into the specifications for that Fast Infoset technique!

And it's good to know that the zipping is about the best anyone can do for right now!

Have a good one William!
