Is This an Optimal Substring Solution???

Rex Winn
Greenhorn

Joined: Nov 01, 2005
Posts: 3
I'm new to Java but experienced as a developer. I'm working on a string parser that has to fire every second and scan a full page of text that is rendered as HTML. I'm wondering if I can make it run faster using better code. Here's what I'm using (see code). Is there a better or faster way to do this? Would I get better performance using regex?

Something that may help the experts: HTML_STAT_SUMMARY gets refreshed about every 5 seconds. I query that page of HTML for about 30 substrings, and I have 30 functions set up just like the one below. Is this a good approach, or should I leverage my position in the document and move forward on each search? I do know where each value will tend to be in HTML_STAT_SUMMARY, so I think I could be more efficient in stepping through by saving my current position and doing forward lookups.

All comments or ideas are welcome.

Harald Kirsch
Ranch Hand

Joined: Oct 14, 2005
Posts: 37
A few remarks:

It is rather surprising that the string to parse is not passed in as a parameter. Getting it into the function scope by accessing the global variable HTML_STAT_SUMMARY is weird.

At one point you use TAG_CARRIER_INFO.length(), but elsewhere you use a hard-coded 10.

If the 30 tags are in a certain order, it would definitely pay to search for them in order and avoid reading the whole string 30 times.
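
To make the three remarks above concrete, here is a minimal sketch, not the original poster's code. It assumes each value runs from the end of its tag to the next '<', and the tag strings and sample page are made up for illustration. The page is passed in as a parameter, the offset comes from tag.length() rather than a hard-coded number, and each search starts where the previous one stopped.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ForwardScan {

    /**
     * Extracts the text that follows each tag, scanning forward only.
     * Assumption: tags are listed in the order they appear in the page,
     * and each value ends at the next '<' (hypothetical page layout).
     */
    static Map<String, String> extractInOrder(String html, String[] tags) {
        Map<String, String> values = new LinkedHashMap<>();
        int pos = 0; // current position in the page; never moves backwards
        for (String tag : tags) {
            int start = html.indexOf(tag, pos); // search from pos, not from 0
            if (start < 0) {
                continue; // tag not found: keep pos and try the next tag
            }
            int valueStart = start + tag.length(); // no hard-coded offset
            int valueEnd = html.indexOf('<', valueStart);
            if (valueEnd < 0) {
                valueEnd = html.length(); // value runs to the end of the page
            }
            values.put(tag, html.substring(valueStart, valueEnd).trim());
            pos = valueEnd; // the next search resumes here
        }
        return values;
    }

    public static void main(String[] args) {
        // Hypothetical sample page and tags, for illustration only.
        String page = "<p>Carrier: Acme</p><p>Signal: 87%</p>";
        String[] tags = {"Carrier:", "Signal:"};
        System.out.println(extractInOrder(page, tags));
    }
}
```

With 30 tags this reads the page roughly once instead of 30 times, but only if the tags really do appear in the listed order. A tag that appears out of order is simply skipped.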

Finally a shameless plug: Searching for 30 (or many more) strings (aka regular expressions) in a document to extract surrounding text is the perfect use case for monq.jfa (GPL software). It allows you to stick pattern/action pairs into a finite automaton that reads your text and calls the actions whenever a pattern is matched. Throughput is 1.5MB/s on a 2.6GHz Pentium. Setting up the automaton looks like this:

Javadoc: http://www.ebi.ac.uk/~kirsch/monq-doc/
Download: ftp://ftp.ebi.ac.uk/pub/software/textmining/monq/
Tutorial: http://www.ebi.ac.uk/~kirsch/JfaWiki/
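
For comparison only, and not using monq.jfa at all: a rough sketch of the "many patterns, one pass" idea with the standard java.util.regex package. The labels (Carrier, Signal, Uptime) and the assumption that each value ends at the next '<' are hypothetical. Whether this beats plain indexOf calls would have to be measured on the real page.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SinglePassRegex {

    // Hypothetical labels combined into one alternation, compiled once and reused.
    // Group 1 is the label, group 2 is everything up to the next '<'.
    static final Pattern FIELD =
            Pattern.compile("(Carrier|Signal|Uptime):\\s*([^<]*)");

    /** Walks the page once, left to right, collecting every label/value pair found. */
    static Map<String, String> extract(String html) {
        Map<String, String> values = new LinkedHashMap<>();
        Matcher m = FIELD.matcher(html);
        while (m.find()) {
            values.put(m.group(1), m.group(2).trim());
        }
        return values;
    }

    public static void main(String[] args) {
        // Hypothetical sample page; unlike a forward indexOf scan, order does not matter here.
        String page = "<p>Signal: 87%</p><p>Carrier: Acme</p>";
        System.out.println(extract(page));
    }
}
```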


Harald.
Rex Winn
Greenhorn

Joined: Nov 01, 2005
Posts: 3
OOPS... As far as the parameter goes... I kind of cobbled that whole thing together to simplify reading it. I missed a few things when I cobbled it. But you have the gist of it in your reply.

Shameless Plug? I had found a thing called CUP. Now I'll have to go check out what you are suggesting. A link is worth a 1000 googles...

Thanks for your reply though.
Rex Winn
Greenhorn

Joined: Nov 01, 2005
Posts: 3
Oops, found it in your signature too. I'm at freshmeat right now. Will jump to your signature.
 