
Creating a Parser

 
Peter Allen
Greenhorn
Posts: 13
Sorry, I couldn't find an answer to this. I've been trying to create my own parser to convert a .java file into .html format and insert syntax highlighting. I thought I would give it a try, but it's harder than I expected. My code is very messy, and I apologize for that, but it does highlight all the keywords defined. The only exception is when a keyword doesn't have a delimiter character before it. This is a big problem for me. Here is what I have:

This code reads its own .java file, creates an HTML file, and should color all the keywords red. The first import statement isn't highlighted red because it has no character before it, and there are other cases where this error can occur (e.g. if(true) {break;} or any other keyword used in a similar way). I believe I have a bad design or I'm overlooking something. Can anyone help me out? Thanks.
PS: Also, if anyone knows a better algorithm or a more efficient way to do this, I'd appreciate help with it or a solution to that particular problem. I know this code is pretty inefficient, but it's the best I could do. Thanks again.
[ April 26, 2002: Message edited by: Peter Allen ]
[ April 26, 2002: Message edited by: Dirk Schreckmann ]
 
Jim Yingst
Wanderer
Sheriff
Posts: 18671
Urg.
I'd recommend editing your post to put in some line breaks, so we can read it.
Also, in general "parsing" is rarely a beginner-level topic. This question might be better asked in intermediate. I'd transfer it, but I'll let you edit the question first.
 
Peter Allen
Greenhorn
Posts: 13
Oops, sorry I should have seen that.
 
Peter Allen
Greenhorn
Posts: 13
I'm not trying to get too fancy with this. It ALMOST works; I just need help recognizing keywords when they appear in positions like the ones I mentioned (for example, when a keyword isn't surrounded by a space, tab, etc.). I just want help finding a way to fix this. Otherwise it works the way I wanted it to, even if it is very, very messy. Thanks.
 
Dirk Schreckmann
Sheriff
Posts: 7023
I'd guess Jim would accept the code formatting, now. So, I'm moving this to Java in General (intermediate)...
 
Mapraputa Is
Leverager of our synergies
Sheriff
Posts: 10065
What you are writing is called "Lexical analyzer" or "Lexer". Are you doing it because you need this functionality, for the fun of it, or as an educational experience? If the first is true, there is a generic tool that takes a grammar specification and generates Java classes to turn "raw" input into tokens: http://www.cs.princeton.edu/%7Eappel/modern/java/JLex/current/manual.html
Or you could use Java Compiler Compiler (JavaCC). This has one Java grammar specification ready, and even "JavaCC grammar to convert Java or JavaCC code to HTML" http://www.cobase.cs.ucla.edu/pub/javacc/#Jsection
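For the "educational experience" case, here is a minimal hand-written sketch of what a lexer does: it walks the line character by character, groups characters into tokens (comment, string literal, identifier-like word, anything else), and only then decides whether a word is a keyword. The class name `MiniLexer`, the short keyword list, and the `<span>` markup are assumptions made for illustration, not anything from the original code. It deliberately ignores block comments and character literals, so treat it as a starting point rather than a complete Java lexer.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration: a tiny single-line lexer for keyword highlighting.
final class MiniLexer {
    // Assumed, incomplete keyword list; a real tool would list all Java keywords.
    private static final Set<String> KEYWORDS = new HashSet<>(Arrays.asList(
            "import", "class", "if", "else", "break", "return", "public", "static", "void"));

    static String escapeHtml(String s) {
        // '&' must be replaced first so the other entities are not double-escaped.
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    static String highlightLine(String line) {
        StringBuilder out = new StringBuilder();
        int n = line.length();
        int i = 0;
        while (i < n) {
            char c = line.charAt(i);
            if (c == '/' && i + 1 < n && line.charAt(i + 1) == '/') {
                // Line comment: copy the rest of the line without highlighting.
                out.append(escapeHtml(line.substring(i)));
                break;
            } else if (c == '"') {
                // String literal: scan to the closing quote, skipping escaped characters.
                int j = i + 1;
                while (j < n && line.charAt(j) != '"') {
                    if (line.charAt(j) == '\\') {
                        j++;
                    }
                    j++;
                }
                int end = Math.min(j + 1, n);
                out.append(escapeHtml(line.substring(i, end)));
                i = end;
            } else if (Character.isJavaIdentifierStart(c)) {
                // Word token: read the whole identifier, then check the keyword set.
                int j = i + 1;
                while (j < n && Character.isJavaIdentifierPart(line.charAt(j))) {
                    j++;
                }
                String word = line.substring(i, j);
                if (KEYWORDS.contains(word)) {
                    out.append("<span style=\"color:red\">").append(word).append("</span>");
                } else {
                    out.append(word);
                }
                i = j;
            } else {
                // Any other character (whitespace, punctuation, digits) passes through.
                out.append(escapeHtml(String.valueOf(c)));
                i++;
            }
        }
        return out.toString();
    }
}
```

Because a keyword is recognized by where the word starts and ends, not by what precedes it, `if(true) {break;}` and a keyword at the very start of a line need no special handling.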
 
Dirk Schreckmann
Sheriff
Posts: 7023
Originally posted by Peter Allen:
The only exception is when a keyword doesn't have a delimiter character before it.

One idea that occurs to me, then: look for the keyword in the entire String (or StringBuffer) rather than relying on a delimiter.
Also, you have a couple of places where using a StringBuffer (which is mutable, while a String is not) would improve overall efficiency.
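A rough sketch of that search-based idea, under some assumptions: the class `KeywordSearch` and method `wrap` are hypothetical names, and `StringBuilder` stands in for `StringBuffer` as its unsynchronized equivalent. It finds each occurrence with `indexOf` and checks the neighboring characters itself, so a keyword at position 0 is handled and `import` inside `importer` is left alone. It does not know about string literals or comments.

```java
// Hypothetical illustration: wrap whole-word occurrences of one keyword.
final class KeywordSearch {
    static String wrap(String line, String keyword) {
        if (keyword.isEmpty()) {
            return line; // guard: an empty keyword would never advance the search
        }
        StringBuilder out = new StringBuilder();
        int from = 0;
        int at;
        while ((at = line.indexOf(keyword, from)) >= 0) {
            int end = at + keyword.length();
            // A match counts only if it is not glued to other identifier characters.
            boolean startOk = at == 0
                    || !Character.isJavaIdentifierPart(line.charAt(at - 1));
            boolean endOk = end == line.length()
                    || !Character.isJavaIdentifierPart(line.charAt(end));
            out.append(line, from, at); // text before the match, unchanged
            if (startOk && endOk) {
                out.append("<span style=\"color:red\">").append(keyword).append("</span>");
            } else {
                out.append(keyword);
            }
            from = end;
        }
        out.append(line, from, line.length()); // remainder after the last match
        return out.toString();
    }
}
```

Note that the output is built in a separate buffer while the search always runs against the untouched input line, so inserted tags can never be matched by a later search.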
 
Jim Yingst
Wanderer
Sheriff
Posts: 18671
Or, skip saving things in Strings or StringBuffers entirely, and write your output directly to the Writer bw. OK, you'll have to save some things along the way, but much of what is being stored doesn't need to be. I'd just read one line from the file, process it, and write the results to the Writer before moving on.
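Something like the following shows the read-process-write loop. This is only a sketch: `LineStreamer`, `convert`, and the parameter names are made up for the example, and the per-line "processing" here is just HTML escaping, which is where a highlighting step would plug in.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Writer;

// Hypothetical illustration: stream a source file to HTML one line at a time.
final class LineStreamer {
    static String escapeHtml(String s) {
        // '&' first, so the entities produced below are not escaped again.
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    static void convert(BufferedReader in, Writer out) throws IOException {
        out.write("<pre>\n");
        String line;
        // Only the current line is held in memory; nothing accumulates.
        while ((line = in.readLine()) != null) {
            out.write(escapeHtml(line)); // replace with a highlighting step as needed
            out.write('\n');
        }
        out.write("</pre>\n");
    }
}
```

The caller would typically wrap a FileReader and FileWriter in a BufferedReader and BufferedWriter and close both when done.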
To be fair, this sort of performance optimization is less important than getting the logic right. Here are some perfectly legal test cases to consider:

Can you highlight this correctly?
Incidentally, a related discussion may be found here.
[ April 27, 2002: Message edited by: Jim Yingst ]
 
Peter Allen
Greenhorn
Posts: 13
Thanks for the help. My friend found what was wrong: the HTML tags I was inserting were affecting the final String and how the StringTokenizer worked. I knew I almost had it working, but at the same time I feel stupid for not catching it. Thank you for your help though.
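For anyone who runs into the same thing, here is one way to keep inserted tags from interfering with StringTokenizer. This is not the actual fix from my code, just a sketch of the general idea with made-up names (`TokenizerHighlighter`, the delimiter list, and the keyword set are all assumptions): tokenize the untouched source line once, with `returnDelims` set to `true` so the delimiters are kept, and append the tags to a separate output buffer that is never tokenized again.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.StringTokenizer;

// Hypothetical illustration: tokenize the raw line once, build HTML separately.
final class TokenizerHighlighter {
    // Assumed, incomplete keyword and delimiter lists.
    private static final Set<String> KEYWORDS = new HashSet<>(Arrays.asList(
            "import", "if", "break", "class", "return"));
    private static final String DELIMS = " \t(){}[];,.=+-*/<>!&|";

    static String escapeHtml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    static String highlight(String line) {
        // returnDelims = true: each delimiter comes back as its own one-character token.
        StringTokenizer st = new StringTokenizer(line, DELIMS, true);
        StringBuilder out = new StringBuilder();
        while (st.hasMoreTokens()) {
            String tok = st.nextToken();
            if (KEYWORDS.contains(tok)) {
                out.append("<span style=\"color:red\">").append(tok).append("</span>");
            } else {
                out.append(escapeHtml(tok)); // delimiters like '<' are escaped here
            }
        }
        return out.toString();
    }
}
```

Since `<` and `>` from the tags only ever exist in the output buffer, they cannot be mistaken for delimiters or tokens on a later pass.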
 
Steve Deadsea
Ranch Hand
Posts: 125
There are already libraries out there that will do what you want.
http://ostermiller.org/syntax/
 