XML parsing examples and/or tutorials

 
Ranch Hand
Posts: 35
Hi,

I'm trying to learn Java, XML and regex at the same time. I need this to create a program that will parse a mixed-content XML file and produce another. The input files are big documents, and the output files are the same documents with markup rewritten based on their textual content. My biggest challenge seems to be identifying textual patterns that may cross node boundaries. A node could be an element, a comment, a processing instruction and so on. Of course, elements are not created equal, and most can have attributes that I need to retain and possibly add to newly created elements.
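
To make the cross-boundary problem concrete, here is a minimal sketch of one possible approach, not a definitive solution: collect the text nodes in document order, concatenate them into one string while remembering where each node starts, run the regex on that string, and map each match back to the nodes it touches. The names `CrossNodeMatch`, `collectText`, `findMatches` and `nodeIndexAt`, as well as the sample document, are made up for illustration; only standard JAXP/DOM and `java.util.regex` calls are used.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class; all names here are illustrative assumptions.
class CrossNodeMatch {

    /** Collects text and CDATA nodes in document order; comments and PIs are skipped. */
    static void collectText(Node node, List<Node> out) {
        short type = node.getNodeType();
        if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
            out.add(node);
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            collectText(c, out);
        }
    }

    /** Index of the last text node whose start offset is <= the given offset. */
    static int nodeIndexAt(int[] starts, int offset) {
        int idx = 0;
        for (int i = 0; i < starts.length; i++) {
            if (starts[i] <= offset) {
                idx = i;
            }
        }
        return idx;
    }

    /** Reports each match and the number of text nodes it spans. */
    static List<String> findMatches(String xml, String regex) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        List<Node> textNodes = new ArrayList<>();
        collectText(doc.getDocumentElement(), textNodes);

        // Flatten the text and remember the start offset of every node.
        StringBuilder flat = new StringBuilder();
        int[] starts = new int[textNodes.size()];
        for (int i = 0; i < textNodes.size(); i++) {
            starts[i] = flat.length();
            flat.append(textNodes.get(i).getNodeValue());
        }

        List<String> results = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(flat);
        while (m.find()) {
            int first = nodeIndexAt(starts, m.start());
            // Guard against zero-length matches, where end - 1 < start.
            int last = nodeIndexAt(starts, Math.max(m.start(), m.end() - 1));
            results.add(m.group() + " spans " + (last - first + 1) + " text node(s)");
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<p>See <b>RFC</b> 2119 and <i>RFC <!-- note -->82</i>2.</p>";
        for (String line : findMatches(xml, "RFC \\d+")) {
            System.out.println(line);
        }
    }
}
```

In the sample, "RFC 2119" starts inside the `b` element and ends in the following text, and "RFC 822" is interrupted by a comment and an element boundary, so neither would be found by matching each text node on its own. For very large documents the flattened string costs extra memory on top of the DOM tree.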

Most likely I will use DOM since I need to do some look-ahead and perhaps also look-behind to recognize patterns and where they start and end. DOM also seems to be a good choice with mixed content (an element can contain text and child elements in any order and recursively). Feel free to try to convince me there is a better alternative to DOM!
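
As a rough illustration of the kind of DOM rewriting meant here, the sketch below handles only the simple case where a match lies inside a single text node: it splits the node around the match with `Text.splitText`, wraps the matched part in a new element, and serializes the result. Attributes on the surrounding elements are left untouched. `WrapMatch`, `wrapFirstMatch`, `rewrite`, the `ref` tag name and the sample input are assumptions made for the example, not part of any existing code.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

// Hypothetical helper class; all names here are illustrative assumptions.
class WrapMatch {

    /**
     * Wraps the first match in the given text node in a new element.
     * Returns the text node after the match, or null if nothing was wrapped
     * (no match, or the first match is empty).
     */
    static Text wrapFirstMatch(Text text, Pattern p, String tagName) {
        Matcher m = p.matcher(text.getData());
        if (!m.find() || m.end() == m.start()) {
            return null;
        }
        Text matched = text.splitText(m.start());            // text keeps the prefix
        Text tail = matched.splitText(m.end() - m.start());   // matched keeps only the match
        Element wrapper = text.getOwnerDocument().createElement(tagName);
        matched.getParentNode().replaceChild(wrapper, matched);
        wrapper.appendChild(matched);
        return tail;
    }

    /** Collects plain text nodes in document order. */
    static void collectText(Node node, List<Text> out) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            out.add((Text) node);
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            collectText(c, out);
        }
    }

    /** Parses the XML, wraps every match found within a single text node, and serializes. */
    static String rewrite(String xml, String regex, String tagName) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        // Snapshot the text nodes first so the tree is not modified while walking it.
        List<Text> texts = new ArrayList<>();
        collectText(doc.getDocumentElement(), texts);

        Pattern p = Pattern.compile(regex);
        for (Text t : texts) {
            Text current = t;
            while (current != null) {
                current = wrapFirstMatch(current, p, tagName);
            }
        }

        Transformer tf = TransformerFactory.newInstance().newTransformer();
        tf.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        tf.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(rewrite("<p class=\"x\">See RFC 2119 now.</p>", "RFC \\d+", "ref"));
    }
}
```

Matches that cross node boundaries need more work than this, because the start and end of the match then live in different text nodes, possibly under different parents, and a single wrapper element cannot always enclose them without breaking the nesting.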

I have also looked at XPath. I can see that it is powerful but I don't see how it could help me.

I have found some examples and a little bit of tutorial information but most tackle rather simple problems. What I would like to get is pointers to XML parsing and construction examples that could give me more ideas and inspiration to learn good techniques for handling semi-complex cases.
 
Ranch Hand
Posts: 151
HtmlUnit helps a lot with this. This might help you a lot: webpage
 
T Dahl
Ranch Hand
Posts: 35
Thanks!

A parser for HTML probably has many of the same challenges as a parser for XML. I will see if I can find some inspiration in the source code (which at first sight looks enormous).

Other pointers are still welcome of course!
 