Creating an HTML parser in Java

Neaman Shafiq

Joined: Feb 07, 2001
Posts: 5
I am currently producing an online tutorial that children can use to learn HTML. I need a mechanism that lets the child enter his or her HTML code into a window or applet, which I can then take, process, and return the output for. The processing involves parsing the HTML code the child provides. Does anyone know how this can be done with Java and its libraries? I do not want to use string tokenizers for this: although they work, they seem tedious to program and inefficient.
Help, please!
Carl Trusiak

Joined: Jun 13, 2000
Posts: 3340
Read the documentation for javax.swing.text.html.HTMLEditorKit.Parser.
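To give a feel for how that API is used, here is a minimal sketch, assuming Java 8 or later. The JDK ships javax.swing.text.html.parser.ParserDelegator as its concrete HTMLEditorKit.Parser: you pass it a Reader and an HTMLEditorKit.ParserCallback, and it calls back once per start tag, end tag, simple (empty) tag and run of text. The class name TagEvents and the "start p" style event strings are hypothetical, chosen only for illustration. The parser may also report tags it infers on its own, such as html and body.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class TagEvents {

    /** Parses an HTML string and records each parser callback as a short line such as "start p". */
    public static List<String> parse(String html) {
        List<String> events = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                events.add("start " + t); // HTML.Tag.toString() yields the tag name, e.g. "p"
            }

            @Override
            public void handleEndTag(HTML.Tag t, int pos) {
                events.add("end " + t);
            }

            @Override
            public void handleSimpleTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                events.add("simple " + t); // tags without a closing tag, e.g. <br> or <img>
            }

            @Override
            public void handleText(char[] data, int pos) {
                events.add("text " + new String(data));
            }
        };
        try {
            // The third argument asks the parser to ignore any charset declared in the document.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return events;
    }

    public static void main(String[] args) {
        parse("<p>Hello <b>world</b><br></p>").forEach(System.out::println);
    }
}
```

Because the callback only sees events, the tutorial code decides what "processing" means (counting tags, echoing text, and so on) without ever tokenizing the string by hand.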

I Hope This Helps
Carl Trusiak, SCJP2, SCWCD
Ranch Hand

Joined: Jul 06, 2001
Posts: 54
Hi Shafiq,
Start by creating an applet. This applet will contain two panes:
The first one will contain a javax.swing.JTextArea where kids enter the HTML text. Let us call this text area inputHtmlText.
The second pane will contain a javax.swing.JEditorPane; let us call it htmlViewer:
javax.swing.JTextArea inputHtmlText = new javax.swing.JTextArea();
javax.swing.JEditorPane htmlViewer = null;
Suppose that the text entered by a kid in the text area is stored in a String:
String htmlString = inputHtmlText.getText();
Now construct a new JEditorPane:
htmlViewer = new javax.swing.JEditorPane("text/html", htmlString);
And you are done.
I assume that you are familiar with event handling and Swing.
You should also hope that the browser used by the kid supports JRE 1.2.2.
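The steps above can be put together roughly as follows. This is a sketch under a few assumptions: it hosts the two panes in a JPanel shown in a JFrame instead of an applet, so it runs standalone on a current JDK; it reuses one JEditorPane via setContentType("text/html") and setText(...) instead of constructing a new pane on each click, so the pane already in the layout is the one that updates; and the class name HtmlPlayground and the "Show" button are hypothetical. Only inputHtmlText and htmlViewer come from the post.

```java
import java.awt.BorderLayout;
import javax.swing.JButton;
import javax.swing.JEditorPane;
import javax.swing.JFrame;
import javax.swing.JPanel;
import javax.swing.JScrollPane;
import javax.swing.JSplitPane;
import javax.swing.JTextArea;
import javax.swing.SwingUtilities;

public class HtmlPlayground extends JPanel {
    // Left pane: where the kid types HTML.
    final JTextArea inputHtmlText = new JTextArea("<h1>Hello</h1><p>Type your HTML here.</p>");
    // Right pane: renders whatever HTML was typed.
    final JEditorPane htmlViewer = new JEditorPane();

    public HtmlPlayground() {
        super(new BorderLayout());
        htmlViewer.setContentType("text/html"); // installs an HTMLEditorKit on the pane
        htmlViewer.setEditable(false);          // the output pane is read-only

        JButton show = new JButton("Show");
        show.addActionListener(e -> render());  // re-render on each click

        JSplitPane split = new JSplitPane(JSplitPane.HORIZONTAL_SPLIT,
                new JScrollPane(inputHtmlText), new JScrollPane(htmlViewer));
        add(split, BorderLayout.CENTER);
        add(show, BorderLayout.SOUTH);
    }

    /** Hands the typed HTML to the viewer, which parses and renders it. */
    void render() {
        htmlViewer.setText(inputHtmlText.getText());
    }

    public static void main(String[] args) {
        SwingUtilities.invokeLater(() -> {
            JFrame frame = new JFrame("HTML playground");
            frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
            frame.add(new HtmlPlayground());
            frame.setSize(800, 400);
            frame.setVisible(true);
        });
    }
}
```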
Take care

Omar IRAQI Houssaini
Jan Sauerwein

Joined: Jul 08, 2001
Posts: 6
If you want to implement the whole HTML 4.0 standard, I wish you a lot of fun, and I hope you have enough time next year.
For very easy HTML the
is enough. But if you want to use style sheets or JavaScript, it falls short.
Writing a real parser isn't easy; there is a lot of mathematics involved. Look at the W3C's language definitions and you won't want to do it any longer. The grammar for a really good HTML parser is very complex.
I recommend using another programming language for that. In C/C++ you can use the classes of the Mozilla project, so you don't have to identify the tags and check their correctness yourself.
I hope I have shown you that programming a complete HTML parser yourself is not a good idea.
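For checking the correctness of a child's tags without writing a parser, the JDK parser can at least report what it found wrong. A minimal sketch, assuming Java 8 or later: it overrides only HTMLEditorKit.ParserCallback.handleError, which ParserDelegator calls with a message and a character offset. The wording of those messages is implementation-defined and fairly terse, and the JDK parser is built around an HTML 3.2 DTD, so it will not validate full HTML 4.0, style sheets or scripts, which is consistent with the point above. The class name HtmlChecker and the sample input are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class HtmlChecker {

    /** Returns the parser's complaints about the markup, each prefixed with its character offset. */
    public static List<String> check(String html) {
        List<String> problems = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleError(String errorMsg, int pos) {
                // errorMsg wording is implementation-defined; a tutorial would
                // probably map it to friendlier text for children.
                problems.add("offset " + pos + ": " + errorMsg);
            }
        };
        try {
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return problems;
    }

    public static void main(String[] args) {
        // Hypothetical child input with a misspelled closing tag.
        check("<p>My <b>first page</bb></p>").forEach(System.out::println);
    }
}
```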
Ranch Hand

Joined: Jul 06, 2001
Posts: 54
Hi Jan Sauerwein,
I appreciate your contribution, but I think you didn't read my solution.
The javax.swing.JEditorPane uses the default parser provided by the HTMLEditorKit, and unfortunately this is the same parser used to implement Sun's HotJava browser!
What is great about the JEditorPane class is that it hides all these parsing details from the programmer: he just passes the HTML string to the JEditorPane constructor, and the JEditorPane object does the rest.
So if one just wants to display the content of an HTML file, I think the solution I provided is the easiest one.
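Even though JEditorPane hides the parsing, the parsed result is still reachable if the tutorial needs to inspect what the child wrote. A minimal sketch, assuming Java 8 or later: after the pane parses the string, getDocument() returns the resulting document, and javax.swing.text.ElementIterator walks its elements depth-first. For HTML documents each element's tag is stored under StyleConstants.NameAttribute, and runs of text appear as "content". The class name DocumentWalker is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import javax.swing.JEditorPane;
import javax.swing.text.Element;
import javax.swing.text.ElementIterator;
import javax.swing.text.StyleConstants;

public class DocumentWalker {

    /** Lets JEditorPane parse the HTML, then lists the element names of the resulting document, depth-first. */
    public static List<String> elementNames(String html) {
        // Same constructor as in the solution above: content type plus text.
        JEditorPane pane = new JEditorPane("text/html", html);
        List<String> names = new ArrayList<>();
        ElementIterator it = new ElementIterator(pane.getDocument());
        // The first call to next() returns the root element; null signals the end.
        for (Element e = it.next(); e != null; e = it.next()) {
            Object tag = e.getAttributes().getAttribute(StyleConstants.NameAttribute);
            names.add(String.valueOf(tag)); // e.g. "html", "body", "p", "content"
        }
        return names;
    }

    public static void main(String[] args) {
        elementNames("<h1>Title</h1><p>Some <b>bold</b> text</p>").forEach(System.out::println);
    }
}
```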
[This message has been edited by Omar IRAQI (edited July 08, 2001).]
I agree. Here's the link: