I'm new to Java but an experienced developer. I'm working on a string parser that has to fire every second and scan a full page of text that is rendered as HTML. I'm wondering whether I can make it run faster with better code. Here's what I'm using (see code). Is there a better/faster way to do this? Would I get better performance using RegEx?
One thing that may help the experts: HTML_STAT_SUMMARY gets refreshed about every 5 seconds. I query that page of HTML for about 30 substrings, and I have 30 functions set up just like the one below. Is this a good approach, or should I leverage my position in the document and move forward on each search? I do know where each value will tend to be in HTML_STAT_SUMMARY, so I think I could be more efficient by saving my current position and doing forward lookups.
All comments or ideas are welcome.
A few remarks:
It is rather surprising that the string to parse is not passed in as a parameter. Pulling it into the function by accessing the global variable HTML_STAT_SUMMARY is odd.
At one point you use TAG_CARRIER_INFO.length(), but elsewhere you use a hardcoded 10.
If the 30 tags appear in a fixed order, it would definitely pay to search for them in order and avoid scanning the whole string 30 times.
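To make the three remarks above concrete, here is a minimal sketch, not your actual code. The class name ForwardScan, the method extractInOrder, and the rule that a value runs from the end of its tag to the next '<' are all assumptions for illustration; adapt them to what your page really looks like. The page is passed in as a parameter, offsets come from tag.length() rather than a hardcoded number, and each search starts where the previous one ended.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: names and the "value ends at the next '<'" rule are assumptions.
class ForwardScan {

    /**
     * Looks up each tag in document order, starting every search at the
     * position where the previous value ended, so the page is walked once.
     */
    static Map<String, String> extractInOrder(String html, String[] tagsInOrder) {
        Map<String, String> values = new LinkedHashMap<>();
        int pos = 0; // current position in the page; only ever moves forward

        for (String tag : tagsInOrder) {
            int start = html.indexOf(tag, pos);
            if (start < 0) {
                continue; // tag not found: leave pos unchanged and try the next tag
            }
            start += tag.length(); // skip past the tag itself, no magic numbers

            int end = html.indexOf('<', start); // assumed end of the value
            if (end < 0) {
                end = html.length();
            }

            values.put(tag, html.substring(start, end).trim());
            pos = end; // the next search resumes here
        }
        return values;
    }
}
```

A call would then look like ForwardScan.extractInOrder(page, new String[] {"Carrier:</b>", "Signal:</b>"}), where the tag strings are made up for the example. Keeping the tags in an array also replaces the 30 near-identical functions with one loop.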
Finally, a shameless plug: searching for 30 (or many more) strings (aka regular expressions) in a document to extract the surrounding text is the perfect use case for monq.jfa (GPL software). It lets you stick pattern/action pairs into a finite automaton that reads your text and calls the actions whenever a pattern is matched. Throughput is 1.5MB/s on a 2.6GHz Pentium. Setting up the automaton looks like this: