| Author |
Is This an Optimal Substring Solution???
|
Rex Winn
Greenhorn
Joined: Nov 01, 2005
Posts: 3
|
|
I'm new to Java but experienced as a developer. I'm working on a string parser that has to fire every second and scan a full page of text that is rendered as HTML, and I'm wondering whether better code could make it run faster. Here's what I'm using (see code). Is there a better or faster way to do this? Would I get better performance using regular expressions? Some context that may help: HTML_STAT_SUMMARY gets refreshed about every 5 seconds. I query that page of HTML for about 30 substrings, and I have 30 functions set up just like the one below. Is this a good approach, or should I take advantage of my position in the document and move forward on each search? I know roughly where each value will be in HTML_STAT_SUMMARY, so I think I could step through more efficiently by saving my current position and doing forward lookups. All comments or ideas are welcome.
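The forward-lookup idea described above could be sketched roughly as follows. This is only an illustration, not the code from the original post: the class name ForwardScanner, the method extractInOrder, and the sample markers "Carrier:" and "Signal:" are all hypothetical, and the sketch assumes that each value ends at the next "<" and that the markers are listed in the order they occur on the page.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: one forward scan instead of 30 independent full scans.
public class ForwardScanner {

    /**
     * Extracts the text that follows each marker, up to the next terminator.
     * Markers must be given in page order; each search resumes where the
     * previous one stopped, so earlier text is not scanned again.
     */
    public static Map<String, String> extractInOrder(String html, String[] markers, String terminator) {
        Map<String, String> values = new LinkedHashMap<>();
        int pos = 0; // saved position in the page
        for (String marker : markers) {
            int start = html.indexOf(marker, pos);
            if (start < 0) {
                continue; // marker missing: leave pos alone so later markers can still be found
            }
            start += marker.length(); // derive the offset from the marker, not from a hardcoded number
            int end = html.indexOf(terminator, start);
            if (end < 0) {
                end = html.length(); // no terminator: take the rest of the page
            }
            values.put(marker, html.substring(start, end).trim());
            pos = end; // the next search starts here
        }
        return values;
    }

    public static void main(String[] args) {
        // Sample page and markers are made up for illustration.
        String page = "<td>Carrier: Acme</td><td>Signal: 42</td>";
        String[] markers = {"Carrier:", "Signal:"};
        System.out.println(extractInOrder(page, markers, "<"));
    }
}
```

Passing the page in as an argument also keeps the method independent of any global such as HTML_STAT_SUMMARY. Whether this is actually faster on a real page would need to be measured.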
|
 |
Harald Kirsch
Ranch Hand
Joined: Oct 14, 2005
Posts: 37
|
|
A few remarks:
It is rather surprising that the string to parse is not passed in as a parameter. Pulling it into the function's scope through the global variable HTML_STAT_SUMMARY is odd.
At one point you use TAG_CARRIER_INFO.length(), but elsewhere you use a hardcoded 10.
If the 30 tags appear in a fixed order, it would definitely pay to search for them in that order and avoid reading the whole string 30 times.
Finally, a shameless plug: searching for 30 (or many more) strings (aka regular expressions) in a document to extract the surrounding text is the perfect use case for monq.jfa (GPL software). It lets you put pattern/action pairs into a finite automaton that reads your text and calls the matching action whenever a pattern is found. Throughput is 1.5MB/s on a 2.6GHz Pentium. Setting up the automaton looks like this:
Javadoc: http://www.ebi.ac.uk/~kirsch/monq-doc/
Download: ftp://ftp.ebi.ac.uk/pub/software/textmining/monq/
Tutorial: http://www.ebi.ac.uk/~kirsch/JfaWiki/
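For readers without monq.jfa at hand, the pattern/action idea can be approximated with the standard java.util.regex package. The sketch below is not the monq.jfa API and makes no claim about its performance. The class OnePassExtractor, the labels "Carrier:" and "Signal:", and the assumption that each value runs up to the next "<" are all hypothetical. It combines the labels into one alternation, compiles the pattern once, and calls the registered action for each match during a single left-to-right scan.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for pattern/action pairs, built on java.util.regex only.
public class OnePassExtractor {
    private final Pattern combined;
    private final Map<String, Consumer<String>> actions;

    public OnePassExtractor(Map<String, Consumer<String>> actionsByLabel) {
        if (actionsByLabel.isEmpty()) {
            throw new IllegalArgumentException("at least one label is required");
        }
        this.actions = new HashMap<>(actionsByLabel);
        StringBuilder alternation = new StringBuilder();
        for (String label : actionsByLabel.keySet()) {
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            alternation.append(Pattern.quote(label)); // treat labels as literal text
        }
        // Group 1 is the label that matched; group 2 is the text up to the next '<'.
        this.combined = Pattern.compile("(" + alternation + ")\\s*([^<]*)");
    }

    /** Scans the page once and calls the action registered for each label found. */
    public void run(String html) {
        Matcher m = combined.matcher(html);
        while (m.find()) {
            actions.get(m.group(1)).accept(m.group(2).trim());
        }
    }

    public static void main(String[] args) {
        Map<String, String> found = new LinkedHashMap<>();
        Map<String, Consumer<String>> actions = new LinkedHashMap<>();
        actions.put("Carrier:", v -> found.put("carrier", v));
        actions.put("Signal:", v -> found.put("signal", v));
        // Build once, then reuse the same extractor on every refresh of the page.
        new OnePassExtractor(actions).run("<td>Carrier: Acme</td><td>Signal: 42</td>");
        System.out.println(found);
    }
}
```

If one label is a prefix of another, the alternation picks whichever is listed first, so the longer label should be registered first. Unlike a forward scan with a saved position, this approach does not depend on the order of the labels in the page.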
|
Harald.
|
 |
Rex Winn
Greenhorn
Joined: Nov 01, 2005
Posts: 3
|
|
Oops. As far as the parameter goes, I cobbled that example together to make it easier to read, and I missed a few things in the process. But you got the gist of it in your reply. Shameless plug? I had found a thing called CUP. Now I'll have to go check out what you are suggesting. A link is worth 1000 googles... Thanks for your reply, though.
|
 |
Rex Winn
Greenhorn
Joined: Nov 01, 2005
Posts: 3
|
|
|
Oops, found it in your signature too. I'm at freshmeat right now. Will jump to your signature.
|
 |
 |