XML Parsing

 
Ranch Hand
Posts: 65
How do I convert this

into
 
author
Posts: 3892
You've got to be kidding, right? This is natural language processing, not XML parsing. Natural language processing of this type (taking arbitrary English text and transforming it into some structured format) is a VERY hard problem, and one usually solved by people with years of experience in AI.
The way this is normally done (in most rule-based systems) is to FORCE the user to input the rules in some sort of very restricted UI that only allows them to construct valid rules through some sort of pull-down set of operators and operands.
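As a rough illustration of the restricted approach described above, here is a minimal Java sketch of the data model that such a pull-down UI could sit on top of. The operand and operator names (`CART_TOTAL`, `GREATER_THAN`, and so on) are hypothetical and not taken from the original question; the point is only that a rule assembled from fixed lists is valid by construction.

```java
class RuleBuilderSketch {
    // Hypothetical vocabulary: a real UI would fill its pull-down lists from these.
    enum Operand { CART_TOTAL, ITEM_COUNT }

    enum Operator {
        GREATER_THAN(">"), LESS_THAN("<"), EQUAL_TO("==");

        final String symbol;

        Operator(String symbol) { this.symbol = symbol; }

        // Applies this operator to the actual value and the rule's threshold.
        boolean test(double actual, double threshold) {
            switch (this) {
                case GREATER_THAN: return actual > threshold;
                case LESS_THAN:    return actual < threshold;
                default:           return actual == threshold;
            }
        }
    }

    // One choice from each list plus a number; no free text is ever parsed.
    static final class Rule {
        final Operand operand;
        final Operator operator;
        final double threshold;

        Rule(Operand operand, Operator operator, double threshold) {
            this.operand = operand;
            this.operator = operator;
            this.threshold = threshold;
        }

        boolean matches(double actual) {
            return operator.test(actual, threshold);
        }

        String describe() {
            return operand + " " + operator.symbol + " " + threshold;
        }
    }

    public static void main(String[] args) {
        Rule rule = new Rule(Operand.CART_TOTAL, Operator.GREATER_THAN, 100.0);
        System.out.println(rule.describe());
        System.out.println(rule.matches(150.0));
    }
}
```

Because the user can only pick from the enums, there is nothing ambiguous left to interpret, which is the whole appeal of this design over free text.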
If this is a customer requirement, you need to go back to the customer and re-negotiate for something more like the option I describe above. This is not going to be easy (or in my opinion, possible) to solve in its most extreme case.
Kyle
 
author
Posts: 11962
Via manual coding... Seriously, you could consider trying a "compiler compiler" such as JavaCC, but I'm pretty sure that would be overkill (i.e. it's not one of the simplest tools to learn).
 
Lasse Koskela
author
Posts: 11962
Oh, and since the topic has nothing to do with this forum, I'll ask the moderator to move the post to somewhere else (I don't know yet where).
 
author and deputy
Posts: 3150
I agree with Kyle.
I tried to do exactly what you are describing using JavaCC and Java regular expressions. In the end, it was only a waste of time, and the results were not always consistent.
 
Author & Gold Digger
Posts: 7617
I'm moving this post to the XML and Related Technologies forum. Please continue this discussion there. Thank you.
 
Author and all-around good cowpoke
Posts: 13078
I do this sort of thing a lot. There is no magic wand, just rather laborious hand coding. It helps immensely if the raw data is very consistent; in other words, if "Rule:" is always in that case and at the beginning of a text line.
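To give an idea of what that hand coding can look like in the consistent case, here is a minimal Java sketch that pulls out every line starting with "Rule:". The class name, method name, and sample text are made up for illustration; the original input from the question is not available.

```java
import java.util.ArrayList;
import java.util.List;

class RuleLineExtractor {
    // Assumes the marker is always spelled exactly "Rule:" at the start of a line.
    private static final String PREFIX = "Rule:";

    // Returns the text following the marker for each matching line.
    static List<String> extractRules(String rawText) {
        List<String> rules = new ArrayList<>();
        for (String line : rawText.split("\\R")) {
            if (line.startsWith(PREFIX)) {
                rules.add(line.substring(PREFIX.length()).trim());
            }
        }
        return rules;
    }

    public static void main(String[] args) {
        // Hypothetical sample input.
        String raw = "Rule: first rule text\nsome other line\nRule: second rule text\n";
        System.out.println(extractRules(raw));
    }
}
```

This only isolates the rule text. Interpreting what each rule means is still the laborious part, and lines where the marker is indented or in a different case are skipped.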
Bill
 
Kyle Brown
author
Posts: 3892
The problem that I see is that the text appears to be unstructured. In other words, there's no way to predict what the rules will look like, which makes even hand coding darn near impossible. For instance:
the total purchase amount of a shopping cart
could be phrased in at least a half-dozen different ways (think about reversing the order of "shopping cart" and "purchase amount", leaving off "total" or "purchase", etc.).
There has to be some structure to the text to make this even possible, which is why I suggested solving this with a GUI that provides that structure explicitly, rather than allowing free text.
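A small Java sketch can make the phrasing problem concrete: a pattern hand-written for one wording fails on close variants. The variant phrasings in `main` are illustrative guesses at the kinds of rewording mentioned above, not text from the original question.

```java
import java.util.regex.Pattern;

class PhrasingDemo {
    // A hand-coded pattern for exactly one phrasing.
    static final Pattern CART_TOTAL = Pattern.compile(
            "the total purchase amount of a shopping cart",
            Pattern.CASE_INSENSITIVE);

    static boolean recognizes(String text) {
        return CART_TOTAL.matcher(text).find();
    }

    public static void main(String[] args) {
        String[] phrasings = {
            "the total purchase amount of a shopping cart",  // original wording
            "the purchase amount of a shopping cart",        // "total" left off
            "the total amount of a shopping cart",           // "purchase" left off
            "a shopping cart's total purchase amount"        // order reversed
        };
        for (String phrasing : phrasings) {
            System.out.println(recognizes(phrasing) + "  " + phrasing);
        }
    }
}
```

Only the first phrasing is recognized. Every extra variant needs another hand-written pattern, and there is no way to know in advance that the list is complete.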
Kyle
 
Lasse Koskela
author
Posts: 11962
FYI, IBM developerWorks has recently published an article titled "Analyze non-XML data with XSLT".