I'm stuck on this problem...
I'm writing a program that behaves like a parser. It reads HTML, ignores everything inside the tags, and extracts the numbers that are outside the tags (the numbers that are visible when the page is viewed in a browser).
However, the problem is this: the program reads the input character by character. When it comes to a '<', it assumes this is the start of a tag and ignores everything until a '>' comes up.
So, for example, in the following sentence:
three < five
the program will keep looking for a '>' and never terminate.
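To make the behaviour concrete, here is a simplified sketch of the logic described above, written in Python purely for illustration. The function name `extract_visible_numbers` is hypothetical, and the sketch uses digits rather than number words. Because it works on a complete string, it stops at the end of the input instead of waiting forever, so the symptom shows up as numbers being silently dropped after a stray '<'.

```python
def extract_visible_numbers(html: str) -> list[str]:
    """Naive scan: treat every '<' as a tag opener and skip to the next '>'."""
    numbers = []      # digit runs found outside (presumed) tags
    current = ""      # digit run currently being built
    in_tag = False    # True while skipping what we believe is a tag

    for ch in html:
        if in_tag:
            # Everything is ignored until a '>' closes the presumed tag.
            if ch == ">":
                in_tag = False
        elif ch == "<":
            # Flush any pending number, then start skipping.
            if current:
                numbers.append(current)
                current = ""
            in_tag = True
        elif ch.isdigit():
            current += ch
        elif current:
            # A non-digit ends the current number.
            numbers.append(current)
            current = ""

    if current:
        numbers.append(current)
    return numbers


print(extract_visible_numbers("<p>10 apples</p>"))  # ['10']
print(extract_visible_numbers("3 < 5 and 7"))       # ['3'], the 5 and 7 are swallowed
```

In the second call the '<' is a plain less-than sign, yet everything after it is treated as the inside of a tag.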
What's the best solution to that? Is there any way to identify an HTML tag?
Thanks in advance.
Flora