Hi, In looking at the questions and answers on various forums, I saw a way to improve one of my programs. It's a search program that looks through HTML files that I have downloaded from various sites and saved on my hard drive. I'm on a dial-up connection and sometimes find using Google a problem. And I enjoy writing/using my own code. For many of the folders of HTML I have (such as the Java Tutorial), I can invoke my search program (as an applet) from the browser while I'm looking at the pages.
One of the items I saw that I wanted to use was the java -Xprof option. I used it with the search program and found heavy usage in the toUpperCase() method. The other question I saw was how to determine what language a String was in. This led to thinking of Strings as being made up of characters, each with a value from 0 to 64K. Eureka!!
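A minimal sketch of that idea, assuming a lookup table with one entry per possible char value that is filled once from Character.toUpperCase(). The class name TableUpper and the method toUpper are made up for illustration and are not the code from the search program:

```java
public class TableUpper {
    // Hypothetical table: one entry per char value (0..65535), built once at class load.
    private static final char[] UPPER = new char[Character.MAX_VALUE + 1];
    static {
        for (int i = 0; i < UPPER.length; i++) {
            UPPER[i] = Character.toUpperCase((char) i);
        }
    }

    // Uppercases char by char with a table lookup; the length never changes.
    static String toUpper(String s) {
        char[] out = s.toCharArray();
        for (int i = 0; i < out.length; i++) {
            out[i] = UPPER[out[i]];
        }
        return new String(out);
    }

    public static void main(String[] args) {
        System.out.println(toUpper("Java Tutorial")); // prints JAVA TUTORIAL
    }
}
```

One thing to keep in mind with any char-by-char table: it cannot reproduce the cases where String.toUpperCase() changes the length of the text. For example, String.toUpperCase() turns 'ß' into "SS", while Character.toUpperCase('ß') leaves it unchanged, so the table version leaves it unchanged too.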
A part of the Xprof output for my search program follows. This search took 10.6 seconds and looked at 979 files.
Searched 979 files in 808 dir, total time=9593, average=9, duration= 10609
Flat profile of 10.62 secs (220 total ticks): Thread-3
The question: What have I overlooked? Are there characters that my toUpperCase() method will miss? Am I interpreting the data incorrectly?
Joined: Nov 16, 2009
Sorry, I realise this is an old post - was looking for -Xprof experiences and found your case-changing code.
I remember doing something similar recently, but have a lingering impression that my original problem with toUpper/toLower might have been confused by charsets. If you're doing this on your own PC, the code-free way of doing it would be to skip the check for whether the character is in your array: fill in every place of the 256-place array (like UPPER_CHAR_ARRAY[(int)'!'] = '!') and assign from it unconditionally.
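Something along these lines, as a rough sketch only. It assumes every character in the text is below 256 (Latin-1 or plain ASCII pages) and it only maps a-z; the class name Latin1Upper and the method upperInPlace are invented for the example:

```java
public class Latin1Upper {
    // 256-place table; every slot is filled so the lookup needs no "is it in the table?" test.
    static final char[] UPPER_CHAR_ARRAY = new char[256];
    static {
        // First map every value to itself, so '!' stays '!', '7' stays '7', and so on.
        for (int i = 0; i < UPPER_CHAR_ARRAY.length; i++) {
            UPPER_CHAR_ARRAY[i] = (char) i;
        }
        // Then overwrite only the lowercase ASCII letters with their uppercase forms.
        for (char c = 'a'; c <= 'z'; c++) {
            UPPER_CHAR_ARRAY[c] = (char) (c - 32);
        }
    }

    // ASSUMPTION: all chars are < 256; anything larger throws ArrayIndexOutOfBoundsException.
    static void upperInPlace(char[] text) {
        for (int i = 0; i < text.length; i++) {
            text[i] = UPPER_CHAR_ARRAY[text[i]];
        }
    }

    public static void main(String[] args) {
        char[] text = "find me!".toCharArray();
        upperInPlace(text);
        System.out.println(new String(text)); // prints FIND ME!
    }
}
```

If the pages can contain characters above 255, the table would need to be bigger or the lookup would need a range test, which brings the check back.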
I did this for a search engine (only a hobby on my desk!) last year, but saved the pages in a 'canonical form', so I didn't need to uppercase them every time I needed to search. For your needs, you could just double up your storage and save an uppercased copy, or you could use javax.swing.something.HTMLEditorKit to strip out the non-text (it wasn't 100% for me, but not bad), uppercase what's left and compress the originals - that would probably save time and space, at the expense of a little bit of pre-processing.
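To give an idea of the pre-processing step, here is a rough sketch that uses the Swing HTML parser (HTMLEditorKit.ParserCallback driven by ParserDelegator) to collect the text and uppercase it once. The class name CanonicalText and the method extractUpper are invented for the example, and this is not the code from my search engine; the compression and storage side is left out:

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class CanonicalText {
    // Returns the text content of the HTML, uppercased once so searches need no case conversion.
    static String extractUpper(String html) {
        final StringBuilder sb = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                // Called for each run of text between tags; a space keeps the runs apart.
                sb.append(data).append(' ');
            }
        };
        try {
            // 'true' tells the parser to ignore charset declarations inside the page.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString().toUpperCase();
    }

    public static void main(String[] args) {
        String page = "<html><body><p>Hello <b>world</b></p></body></html>";
        System.out.println(extractUpper(page));
    }
}
```

The uppercased text would be written out next to (or instead of) the original page, so each search only has to uppercase the search term.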
Heh. That's enough! I often come to CodeRanch, never joined it until now. I'd better get back to looking for -Xprof...