• Post Reply Bookmark Topic Watch Topic
  • New Topic
programming forums Java Mobile Certification Databases Caching Books Engineering Micro Controllers OS Languages Paradigms IDEs Build Tools Frameworks Application Servers Open Source This Site Careers Other Pie Elite all forums
this forum made possible by our volunteer staff, including ...
Marshals:
  • Campbell Ritchie
  • Jeanne Boyarsky
  • Ron McLeod
  • Paul Clapham
  • Liutauras Vilda
Sheriffs:
  • paul wheaton
  • Rob Spoor
  • Devaka Cooray
Saloon Keepers:
  • Stephan van Hulst
  • Tim Holloway
  • Carey Brown
  • Frits Walraven
  • Tim Moores
Bartenders:
  • Mikalai Zaikin

Utiltiy that uses Terse XML tags

 
Ranch Hand
Posts: 204
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Hello,

Has anyone run accross a utility to convert XML tags to low memory sized cryptic characters?

I'm hoping to find a utility that will read a BILLION record XML file and convert the tags to a special byte size character to reduce file size!

This utility would do something like convert <intex_traunche_cusip_id> to somthing like #@ or somekind of unmodifyable crytptic low size value?

Thanks for any references or suggestions!

bc
 
Author and all-around good cowpoke
Posts: 13078
6
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Actually that would be pretty trivial to program using SAX. The startElement and endElement methods would have to build and use a replacement table. However, before you embark on that you should look into how much compression the plain ZIP compression utility can provide.

I looked into both ZIP encoding and "fast infoset" for this article. ZIP encoding compressed my test file by more than a factor of 10 with only a minor effect on parsing time.

Let us know what you come up with, I think a lot of people are worried about large XML files.

Bill
 
bob connolly
Ranch Hand
Posts: 204
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Thanks William, very good article!

I'm going to take a closer look into the specifications for that Fast Infoset technique!

And it's good to know that the zipping is about the best anyone can do for right now!

Have a good one William!

bc
 
Because those who mind don't matter and those who matter don't mind - Seuss. Tiny ad:
a bit of art, as a gift, the permaculture playing cards
https://gardener-gift.com
reply
    Bookmark Topic Watch Topic
  • New Topic