
Creating a Parser

Peter Allen
Greenhorn

Joined: Apr 13, 2002
Posts: 13
Sorry, I couldn't find an answer to this. I've been trying to create my own parser that converts a .java file into .html format and inserts syntax highlighting. I thought I would give it a try, but it is harder than I expected. My code is in a sorry state, and I apologize for that, but it does highlight all the keywords defined. The only exception is when a keyword doesn't have a delimiter character before it. This is a big problem for me. Here is what I have:

This code reads its own .java file, creates an HTML file, and should color all the keywords red. The first import statement isn't highlighted red because it has no character before it, and there are other cases where this error can occur (e.g. if(true) {break;} or any other keyword written in a similar way). I believe I have a bad design or I'm overlooking something. Can anyone help me out? Thanks
PS: Also if anyone can find a better algorithm or can do something more efficiently, I'd appreciate help on it or a solution to that particular problem. I know this code is pretty inefficient, but this is the best I could do. Thanks again.
[ April 26, 2002: Message edited by: Peter Allen ]
[ April 26, 2002: Message edited by: Dirk Schreckmann ]
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18671
Urg.
I'd recommend editing your post to put in some line breaks, so we can read it.
Also, in general "parsing" is rarely a beginner-level topic. This question might be better asked in intermediate. I'd transfer it, but I'll let you edit the question first.


"I'm not back." - Bill Harding, Twister
Peter Allen
Greenhorn

Joined: Apr 13, 2002
Posts: 13
Oops, sorry I should have seen that.
Peter Allen
Greenhorn

Joined: Apr 13, 2002
Posts: 13
I'm not trying to get too fancy with this. It ALMOST works; I just need help recognizing keywords when they're at different positions like the one I mentioned (for example, when a keyword is not surrounded by a space, tab, etc.). I just wanted help finding a way to fix this; otherwise it works the way I wanted it to, even if it is very, very messy. Thanks
Dirk Schreckmann
Sheriff

Joined: Dec 10, 2001
Posts: 7023
I'd guess Jim would accept the code formatting, now. So, I'm moving this to Java in General (intermediate)...


Mapraputa Is
Leverager of our synergies
Sheriff

Joined: Aug 26, 2000
Posts: 10065
What you are writing is called a "lexical analyzer" or "lexer". Are you doing it because you need this functionality, for the fun of it, or as an educational experience? If the first is true, there is a generic tool that takes a grammar specification and generates Java classes to turn "raw" input into tokens: http://www.cs.princeton.edu/%7Eappel/modern/java/JLex/current/manual.html
Or you could use Java Compiler Compiler (JavaCC). It comes with a Java grammar specification ready to use, and even a "JavaCC grammar to convert Java or JavaCC code to HTML": http://www.cobase.cs.ucla.edu/pub/javacc/#Jsection
Dirk Schreckmann
Sheriff

Joined: Dec 10, 2001
Posts: 7023
Originally posted by Peter Allen:
The only exception is when a keyword doesn't have a delimiter character before it.

One idea that occurs to me, then: look for the keyword in the entire String (or StringBuffer) rather than relying on a delimiter.
Also, you have a couple of places where using a StringBuffer (which is mutable while a String is not) would improve efficiency overall.
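One possible way to search the whole line for keywords is sketched below. It swaps in java.util.regex word boundaries (\b), which is my assumption and not something from the original code; the class name, the keyword subset, and the red font tag are all hypothetical. Treat it as a rough sketch, not a fix for the code as posted.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KeywordHighlighter {
    // Hypothetical subset of keywords; a real list would cover all of them.
    private static final Pattern KEYWORDS =
        Pattern.compile("\\b(import|public|class|if|else|break|return)\\b");

    /** Escapes HTML special characters, then wraps each keyword in a red font tag. */
    public static String highlight(String line) {
        // Escape first, so the tags inserted afterwards are not escaped themselves.
        String escaped = line.replace("&", "&amp;")
                             .replace("<", "&lt;")
                             .replace(">", "&gt;");
        Matcher m = KEYWORDS.matcher(escaped);
        // \b matches at the start of the line too, so no leading delimiter is needed.
        return m.replaceAll("<font color=\"red\">$1</font>");
    }

    public static void main(String[] args) {
        System.out.println(highlight("import java.io.*;"));
        System.out.println(highlight("if(true) {break;}"));
    }
}
```

Note that this still highlights keywords that appear inside string literals and comments, so it does not replace a real lexer.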
Jim Yingst
Wanderer
Sheriff

Joined: Jan 30, 2000
Posts: 18671
Or, skip saving things in Strings or StringBuffers entirely, and write your output directly to the Writer bw. OK, you'll have to save some things along the way, but much of it is not necessary. I'd just read one line from the file, process it, and write the results to the Writer before moving on.
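A minimal sketch of that read-process-write loop might look like the following. Only the name bw comes from the thread; the class name, the convert method, the pre wrapper, and the processLine step (which here only escapes HTML characters) are placeholders I made up for illustration.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

public class LineByLineConverter {
    /** Hypothetical per-line step; here it only escapes HTML characters. */
    static String processLine(String line) {
        return line.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;");
    }

    /** Reads one line at a time and writes the result before reading the next. */
    public static void convert(Reader in, Writer out) throws IOException {
        BufferedReader br = new BufferedReader(in);
        BufferedWriter bw = new BufferedWriter(out);
        bw.write("<pre>");
        bw.newLine();
        String line;
        while ((line = br.readLine()) != null) {
            bw.write(processLine(line)); // nothing is accumulated in memory
            bw.newLine();
        }
        bw.write("</pre>");
        bw.newLine();
        bw.flush();
    }

    public static void main(String[] args) throws IOException {
        StringWriter sw = new StringWriter();
        convert(new StringReader("if (a < b) {\n}"), sw);
        System.out.print(sw);
    }
}
```

With real files you would pass a FileReader and a FileWriter instead of the in-memory streams used in main.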
To be fair, this sort of performance optimization is less important than getting the logic right. Here are some perfectly legal test cases to consider:

Can you highlight this correctly?
Incidentally, a related discussion may be found here.
[ April 27, 2002: Message edited by: Jim Yingst ]
Peter Allen
Greenhorn

Joined: Apr 13, 2002
Posts: 13
Thanks for the help. My friend found what was wrong: the HTML tags I was inserting were affecting the final String and how the StringTokenizer worked. I knew I nearly had it working, but at the same time I feel stupid for not catching it. Thank you for your help though.
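The original code is not shown here, so the following is only a guess at how that kind of interference can be avoided: tokenize the raw line first, with StringTokenizer's returnDelims flag set to true so the delimiters are kept, and add the HTML tags only while building the output. The class name, keyword subset, and delimiter string are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.StringTokenizer;

public class TokenizeThenTag {
    // Hypothetical keyword subset and delimiter set.
    private static final Set<String> KEYWORDS =
        new HashSet<>(Arrays.asList("import", "public", "class", "if", "break"));
    private static final String DELIMS = " \t(){}[];,.=+-*/<>!&|";

    /** Tokenizes the untouched source line, then tags keywords in the output only. */
    public static String highlight(String rawLine) {
        // returnDelims = true: delimiters come back as one-character tokens.
        StringTokenizer st = new StringTokenizer(rawLine, DELIMS, true);
        StringBuilder out = new StringBuilder();
        while (st.hasMoreTokens()) {
            String tok = st.nextToken();
            if (KEYWORDS.contains(tok)) {
                out.append("<font color=\"red\">").append(tok).append("</font>");
            } else {
                out.append(escape(tok));
            }
        }
        return out.toString();
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        System.out.println(highlight("if(true) {break;}"));
    }
}
```

Because the tokenizer never sees the inserted tags, characters such as < and > in the markup cannot split or hide a keyword.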
Steve Deadsea
Ranch Hand

Joined: Dec 03, 2001
Posts: 125
There are already libraries out there that will do what you want.
http://ostermiller.org/syntax/
 
 