
HTML parser

 
Flora Ng
Greenhorn
Posts: 11
I'm stuck on this problem.
I'm writing a program that behaves like a parser. It reads HTML, ignores everything inside the tags, and extracts the numbers that appear outside them (the numbers a browser would display).
The problem is that the program reads the input character by character. When it reaches a '<', it assumes a tag is opening and ignores everything until a '>' comes up.
So, for example, in the following sentence:
three < five<br /> The system will keep looking for a '>' and never terminate.
What's the best solution to that? Is there any way to identify an HTML tag?
Thanks in advance.
Flora
 
Cindy Glass
"The Hood"
Sheriff
Posts: 8521
The traditional way around it is to use "& lt;" and "& gt;" (without the spaces) when you want to display less-than and greater-than signs and have them recognized as text rather than HTML. This of course only works if YOU control the input into the HTML page.
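
As a rough illustration of that escaping step, here is a minimal Java sketch. The class name HtmlEscape and the method escape are made up for this example; it assumes you are the one producing the page text, and it also escapes '&' so that text which already looks like an entity is not misread.

```java
public class HtmlEscape {
    // Hypothetical helper: replaces the characters HTML treats as markup
    // with their entities so a browser (or parser) sees them as plain text.
    public static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break; // must be escaped too
                case '<': out.append("&lt;");  break;
                case '>': out.append("&gt;");  break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: three &lt; five
        System.out.println(escape("three < five"));
    }
}
```

With the text escaped this way, a character-by-character reader never meets a bare '<' that is not the start of a tag.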

[This message has been edited by Cindy Glass (edited August 16, 2001).]
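
If the input cannot be controlled, one possible heuristic (an assumption here, not something stated above) is to look one character ahead: treat '<' as the start of a tag only when it is followed by a letter, '/' or '!', and only when a closing '>' actually exists. The sketch below uses made-up names (VisibleNumbers, extract, startsTag) and collects runs of digits found outside tags. It does not handle comments, script contents, or attribute values that contain '>'.

```java
import java.util.ArrayList;
import java.util.List;

public class VisibleNumbers {
    // Heuristic: '<' opens a tag only if the next character is a letter, '/' or '!'.
    static boolean startsTag(String html, int i) {
        if (i + 1 >= html.length()) return false;
        char next = html.charAt(i + 1);
        return Character.isLetter(next) || next == '/' || next == '!';
    }

    // Returns the digit runs that appear outside of tags, in order.
    public static List<String> extract(String html) {
        List<String> numbers = new ArrayList<>();
        StringBuilder digits = new StringBuilder();
        int i = 0;
        while (i < html.length()) {
            char c = html.charAt(i);
            if (c == '<' && startsTag(html, i)) {
                int close = html.indexOf('>', i);
                if (close >= 0) {
                    flush(digits, numbers);
                    i = close + 1;      // skip the whole tag
                    continue;
                }
                // No closing '>' anywhere: treat this '<' as ordinary text.
            }
            if (Character.isDigit(c)) {
                digits.append(c);
            } else {
                flush(digits, numbers);
            }
            i++;
        }
        flush(digits, numbers);
        return numbers;
    }

    // Moves any pending digit run into the result list.
    private static void flush(StringBuilder digits, List<String> numbers) {
        if (digits.length() > 0) {
            numbers.add(digits.toString());
            digits.setLength(0);
        }
    }

    public static void main(String[] args) {
        // prints: [3, 5, 42]
        System.out.println(extract("3 < 5<br /> and <b id=\"x7\">42</b>"));
    }
}
```

Because the loop always advances and an unclosed '<' falls back to being plain text, the scan terminates even on input such as "three < five".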
 