HTML parser

Flora Ng
Greenhorn

Joined: Jul 05, 2001
Posts: 11
I'm stuck on this problem.
I'm writing a program that behaves like a parser. It reads the HTML, ignores everything inside the tags, and extracts the numbers that are outside the tags (the numbers that are visible when the page is viewed in a browser).
The problem is this: the program reads the input character by character. When it comes to a '<', it assumes a tag has opened and ignores everything until a '>' comes up.
So, for example, in the following sentence:
three < five<br />
the program treats the first '<' as the start of a tag and keeps looking for a '>'. It skips text that should have been read, and if no '>' follows it never finds the end of the "tag" at all.
What's the best solution to that? Is there any way to identify a real HTML tag?
Thanks in advance.
Flora
Cindy Glass
"The Hood"
Sheriff

Joined: Sep 29, 2000
Posts: 8521
The traditional way around it is to write "&lt;" and "&gt;" when you want to display less-than and greater-than signs and have them recognized as not being HTML. This of course only works if YOU get to control the input into the HTML page.
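A minimal sketch of both halves of this, under stated assumptions. The class name VisibleNumbers and the methods escape, extractNumbers, startsTag and flush are hypothetical names made up for illustration. escape shows the entity replacement for the case where you control the input. extractNumbers is one possible workaround for the case where you don't: it assumes a '<' only opens a tag when the next character is a letter, '/' or '!', and otherwise treats it as ordinary text. That rule is a heuristic, not a full HTML parser.

```java
import java.util.ArrayList;
import java.util.List;

class VisibleNumbers {

    // Replaces characters that would otherwise be read as markup.
    // '&' goes first so the entities produced below are not escaped again.
    static String escape(String text) {
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;");
    }

    // Assumption: a tag starts with '<' followed by a letter, '/' or '!'.
    static boolean startsTag(char next) {
        return (next >= 'a' && next <= 'z')
            || (next >= 'A' && next <= 'Z')
            || next == '/'
            || next == '!';
    }

    // Moves any digits collected so far into the result list.
    static void flush(StringBuilder digits, List<String> numbers) {
        if (digits.length() > 0) {
            numbers.add(digits.toString());
            digits.setLength(0);
        }
    }

    // Collects runs of digits that appear outside tags.
    static List<String> extractNumbers(String html) {
        List<String> numbers = new ArrayList<>();
        StringBuilder digits = new StringBuilder();
        boolean inTag = false;
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (inTag) {
                // Inside a tag: ignore everything up to the closing '>'.
                if (c == '>') {
                    inTag = false;
                }
            } else if (c == '<' && i + 1 < html.length()
                    && startsTag(html.charAt(i + 1))) {
                // Looks like a real tag, so end the current number and skip it.
                inTag = true;
                flush(digits, numbers);
            } else if (c >= '0' && c <= '9') {
                digits.append(c);
            } else {
                // Any other character, including a lone '<', ends a number.
                flush(digits, numbers);
            }
        }
        flush(digits, numbers);
        return numbers;
    }
}
```

With this sketch, extractNumbers("3 < 5<br />12 apples <b>7</b>") collects 3, 5, 12 and 7, and escape("three < five") gives "three &lt; five". It does not handle comments, scripts, a '>' inside an attribute value, or numeric entities such as &#60; (the digits would be picked up as a number), so treat it as a starting point only.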

[This message has been edited by Cindy Glass (edited August 16, 2001).]


"JavaRanch, where the deer and the Certified play" - David O'Meara
 