You've got to be kidding, right? This is natural language processing, not XML parsing. Natural language processing of this type (taking arbitrary English text and transforming it into some structured format) is a VERY hard problem, and one usually solved by people with years of experience in AI. The way this is normally done (in most rule-based systems) is to FORCE the user to enter rules through a very restricted UI that only lets them build valid rules, for example by picking operators and operands from pull-down lists. If this is a customer requirement, you need to go back to the customer and renegotiate for something closer to the approach I just described. In its most extreme form (arbitrary free text), this is not going to be easy to solve, or, in my opinion, even possible.
Kyle
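A minimal Java sketch of the kind of data model that might sit behind such a restricted rule builder. The fields, operators, and the `RuleBuilderSketch` class are all hypothetical; the idea is that each pull-down is filled from an enum, so the user can only assemble valid rules and nothing ever has to be parsed from free text.

```java
import java.util.Map;

// Hypothetical rule model behind a restricted rule-builder UI: each
// pull-down is populated from an enum, so the user can only pick valid
// fields and operators; the only free-form input is a numeric value.
class RuleBuilderSketch {

    // Fields offered in the first pull-down (illustrative only).
    enum Field { CART_TOTAL, ITEM_COUNT }

    // Operators offered in the second pull-down.
    enum Operator {
        GREATER_THAN(">"), LESS_THAN("<"), EQUALS("=");

        final String symbol;

        Operator(String symbol) { this.symbol = symbol; }

        // Apply the comparison this operator stands for.
        boolean test(double left, double right) {
            switch (this) {
                case GREATER_THAN: return left > right;
                case LESS_THAN:    return left < right;
                default:           return left == right;
            }
        }
    }

    // A rule is just the user's three selections; no text parsing is involved.
    static final class Rule {
        final Field field;
        final Operator op;
        final double value;

        Rule(Field field, Operator op, double value) {
            this.field = field;
            this.op = op;
            this.value = value;
        }

        // Evaluate the rule against the current value of its field, if known.
        boolean matches(Map<Field, Double> facts) {
            Double actual = facts.get(field);
            return actual != null && op.test(actual, value);
        }

        @Override
        public String toString() {
            return field + " " + op.symbol + " " + value;
        }
    }

    public static void main(String[] args) {
        // What the UI would produce after the user picks CART_TOTAL, ">", and types 100.
        Rule rule = new Rule(Field.CART_TOTAL, Operator.GREATER_THAN, 100);
        Map<Field, Double> cart = Map.of(Field.CART_TOTAL, 149.99, Field.ITEM_COUNT, 3.0);
        System.out.println(rule + " -> " + rule.matches(cart));
    }
}
```

Running `main` prints `CART_TOTAL > 100.0 -> true`. Because every rule is built from enum selections, validity is guaranteed by construction rather than checked after the fact.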
Via manual coding... Seriously, you could try a parser generator ("compiler compiler") such as JavaCC, but I'm pretty sure that would be overkill (it's not one of the simplest tools to learn).
I agree with Kyle. I tried to do exactly what you describe using JavaCC and Java regular expressions. In the end, it was just a waste of time, and the results were not always consistent.
I do this sort of thing a lot. There is no magic wand, just rather laborious hand coding. It helps immensely if the raw data is very consistent; for example, if "Rule:" always appears in exactly that case and at the beginning of a line of text.
Bill
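A rough sketch of what that hand coding can look like when the data is consistent, assuming (hypothetically) that every rule sits on its own line beginning with the exact, case-sensitive prefix `Rule:`. The class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hand-coded extraction that relies on consistent raw text: every rule is
// assumed to start a line with the exact prefix "Rule:". Anything else is ignored.
class RuleLineExtractor {

    // MULTILINE makes ^ and $ match at line boundaries; [ \t] avoids spanning lines,
    // and \S requires at least one non-blank character after the prefix.
    private static final Pattern RULE_LINE =
            Pattern.compile("^Rule:[ \\t]*(\\S.*?)[ \\t]*$", Pattern.MULTILINE);

    static List<String> extractRules(String rawText) {
        List<String> rules = new ArrayList<>();
        Matcher m = RULE_LINE.matcher(rawText);
        while (m.find()) {
            rules.add(m.group(1)); // rule text after the prefix, without trailing blanks
        }
        return rules;
    }

    public static void main(String[] args) {
        String raw = "Promotions for Q3\n"
                + "Rule: cart total greater than 100\n"
                + "  rule: free shipping on books\n" // wrong case and indented: skipped
                + "Rule: item count at least 3\n";
        System.out.println(extractRules(raw));
    }
}
```

With the sample text this prints `[cart total greater than 100, item count at least 3]`. The lowercase, indented line is silently skipped, which is exactly why consistent input matters so much for this approach.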
The problem I see is that the text appears to be unstructured. In other words, there's no way to predict what the rules will look like, which makes even hand coding darn near impossible. For instance, "the total purchase amount of a shopping cart" could be phrased in at least half a dozen different ways (reversing the order of "shopping cart" and "purchase amount", leaving off "total" or "purchase", and so on). The text needs some structure for this to be possible at all, which is why I suggested a GUI that provides that structure explicitly instead of allowing free text.
Kyle
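To illustrate the point, here is a small Java sketch (the pattern and class name are hypothetical) that hand-codes one phrasing of the cart-total condition as a regular expression and then tries a few of the rewordings mentioned above.

```java
import java.util.List;
import java.util.regex.Pattern;

// Shows why free text defeats hand-coded patterns: a pattern written for
// one phrasing misses simple rewordings of the same condition.
class PhrasingBrittleness {

    // Hypothetical pattern for "total purchase amount of the shopping cart" (case-insensitive).
    private static final Pattern CART_TOTAL = Pattern.compile(
            "(?i)\\btotal purchase amount of (?:the |a )?shopping cart\\b");

    static boolean recognizes(String text) {
        return CART_TOTAL.matcher(text).find();
    }

    public static void main(String[] args) {
        List<String> phrasings = List.of(
                "the total purchase amount of the shopping cart exceeds 100",
                "the shopping cart's total purchase amount exceeds 100", // order reversed
                "the purchase amount of the shopping cart exceeds 100",  // "total" left off
                "the total amount of the shopping cart exceeds 100");    // "purchase" left off
        for (String p : phrasings) {
            System.out.println(recognizes(p) + "  " + p);
        }
    }
}
```

Only the first phrasing is recognized (`true`, then three `false` lines). Each variant would need its own pattern, and free text can produce many more variants than these.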