Example of Using -Xprof to improve code

Norm Radder
Ranch Hand

Joined: Aug 10, 2005
Posts: 685
Hi,
While looking at the questions and answers on various forums, I saw a way to improve one of my programs. It's a search program that looks through HTML files that I have downloaded from various sites and saved on my hard drive. I'm on a dial-up connection and sometimes find using Google a problem. And I enjoy writing and using my own code. For many of the folders of HTML I have (such as the Java Tutorial) I can invoke my search program (as an applet) from the browser while I'm looking at the pages.

One of the items I saw that I wanted to use was the java -Xprof option. I used it with the search program and found that a lot of time was being spent in the toUpperCase() method. The other question I saw was how to determine what language a String is in. This led me to think of Strings as sequences of characters, each with a value from 0 to 64K. Eureka!!

A part of the Xprof output for my search program follows. This search took 10.6 seconds and looked at 979 files.

Searched 979 files in 808 dir, total time=9593, average=9, duration= 10609

Flat profile of 10.62 secs (220 total ticks): Thread-3

Interpreted + native Method
2.4% 0 + 5 java.io.WinNTFileSystem.list
1.0% 0 + 2 java.io.FileInputStream.open
...
13.5% 17 + 11 Total interpreted

Compiled + native Method
13.9% 29 + 0 java.lang.Character.toUpperCaseEx
12.0% 25 + 0 sun.nio.cs.SingleByteDecoder.decodeArrayLoop
11.5% 24 + 0 java.lang.String.codePointAt
9.6% 20 + 0 java.lang.String.toUpperCase
7.2% 15 + 0 java.lang.String.indexOf


Three of the lines above (toUpperCaseEx, codePointAt and toUpperCase) show that toUpperCase() is very expensive: together they account for about 35% of the profile. I uppercase everything to make finding strings easier, for example before calling indexOf().

I thought about my application and realized that I was only interested in a-z being uppercased to A-Z. So I created a small class that would only uppercase those 26 letters:
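A minimal sketch of such a class, assuming a small lookup array in which a zero entry means "leave this character alone". Only the names UC_a_to_z and toUpperCase come from the profile output further down; the array name, its size, and the main method are assumptions for illustration.

```java
// Sketch only: uppercases a-z and leaves every other character untouched.
final class UC_a_to_z {
    // Lookup table indexed by char value; 0 means "no mapping, keep the char".
    // (Array name and size are assumptions, not the original listing.)
    private static final char[] UPPER_CHAR_ARRAY = new char[128];

    static {
        for (char c = 'a'; c <= 'z'; c++) {
            UPPER_CHAR_ARRAY[c] = (char) (c - 'a' + 'A');
        }
    }

    static String toUpperCase(String s) {
        char[] chars = s.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            char c = chars[i];
            // Range check first, then the zero-member check.
            if (c < UPPER_CHAR_ARRAY.length && UPPER_CHAR_ARRAY[c] != 0) {
                chars[i] = UPPER_CHAR_ARRAY[c];
            }
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        // Accented letters such as \u00e9 are deliberately not touched.
        System.out.println(toUpperCase("<a href=\"x.html\">caf\u00e9</a>"));
    }
}
```

Anything outside a-z, including accented letters, passes through unchanged, which is the trade-off against String.toUpperCase().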

I then used the above method to uppercase the strings before searching them. I got roughly a 25% time improvement: the following shows the same search taking 7.9 seconds instead of 10.6.

Searched 979 files in 808 dir, total time=6908, average=7, duration= 7891

Flat profile of 7.93 secs (154 total ticks): Thread-3

Interpreted + native Method
3.8% 0 + 5 java.io.WinNTFileSystem.getBooleanAttributes
3.8% 5 + 0 java.awt.EventQueue.postEventPrivate
2.3% 0 + 3 java.io.WinNTFileSystem.list
1.5% 0 + 2 java.io.FileInputStream.open
...
22.9% 17 + 13 Total interpreted

Compiled + native Method
20.6% 27 + 0 NormsTools.UC_a_to_z.toUpperCase
12.2% 16 + 0 java.lang.String.indexOf
12.2% 16 + 0 sun.nio.cs.SingleByteDecoder.decodeArrayLoop
6.1% 8 + 0 NormsDev.SearchFiles.SearchStrings.searchFile
5.3% 7 + 0 java.io.BufferedReader.readLine


The question: What have I overlooked? Are there characters that my toUpperCase() method will miss? Am I interpreting the data incorrectly?

Thanks,
Norm
Sean Collins
Greenhorn

Joined: Nov 16, 2009
Posts: 8
Sorry, I realise this is an old post - was looking for -Xprof experiences and found your case-changing code.

I remember doing something similar recently, but have a lingering impression that my original problem with toUpper/toLower might have been confused by charsets. If you're doing this on your own PC, a simpler way would be to skip the check for whether the character has an entry in your array: fill in all the places in the 256-place array (like UPPER_CHAR_ARRAY[(int)'!'] = '!') and assign from it unconditionally:
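Something along these lines, as a rough sketch rather than tested production code. The class name UpperTable256 is made up for the example, and the bounds test against the constant 256 is an assumption to keep characters outside a single-byte charset from throwing.

```java
// Sketch: every slot of a 256-entry table is filled, so no zero-member check is needed.
final class UpperTable256 {
    private static final char[] UPPER_CHAR_ARRAY = new char[256];

    static {
        // Identity mapping for everything, e.g. UPPER_CHAR_ARRAY['!'] == '!'.
        for (int i = 0; i < 256; i++) {
            UPPER_CHAR_ARRAY[i] = (char) i;
        }
        // Then overwrite only the 26 lower-case letters.
        for (char c = 'a'; c <= 'z'; c++) {
            UPPER_CHAR_ARRAY[c] = (char) (c - 'a' + 'A');
        }
    }

    static String toUpperCase(String s) {
        char[] chars = s.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            char c = chars[i];
            // Compare against a constant instead of reading .length each time.
            if (c < 256) {
                chars[i] = UPPER_CHAR_ARRAY[c];
            }
        }
        return new String(chars);
    }
}
```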


That saves you a .length lookup in the loop, and a zero-member check. Now that I check the Java Language Specification, char is a 16-bit type (http://java.sun.com/docs/books/jls/third_edition/html/typesValues.html#51034), so I'm sure you could spare the memory for a 64K-entry array (128 KB of chars)! Just assign all 64K array members the same value as their index, except for your lower case characters, and lose the range check:
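Roughly like this, again only a sketch; the class name UpperTable64K is invented for the example. Because every possible char value has a slot, the lookup cannot go out of bounds.

```java
// Sketch: one table entry per possible char value, so no range check at all.
final class UpperTable64K {
    // Character.MAX_VALUE is 0xFFFF, so the table has 65536 entries.
    private static final char[] UPPER_CHAR_ARRAY = new char[Character.MAX_VALUE + 1];

    static {
        // Each member starts out equal to its own index.
        for (int i = 0; i < UPPER_CHAR_ARRAY.length; i++) {
            UPPER_CHAR_ARRAY[i] = (char) i;
        }
        // Only a-z are remapped.
        for (char c = 'a'; c <= 'z'; c++) {
            UPPER_CHAR_ARRAY[c] = (char) (c - 'a' + 'A');
        }
    }

    static String toUpperCase(String s) {
        char[] chars = s.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            // A char always indexes safely into a 65536-entry array.
            chars[i] = UPPER_CHAR_ARRAY[chars[i]];
        }
        return new String(chars);
    }
}
```

Whether the bigger table is actually faster would need to be measured, since it no longer fits in the smallest CPU caches.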



I did this for a search engine (only a hobby on my desk!) last year, but saved the pages in a 'canonical form', so I didn't need to uppercase them every time I needed to search. For your needs, you could just double up your storage and save an uppercased copy, or you could use javax.swing.text.html.HTMLEditorKit to strip out the non-text (it wasn't 100% for me, but not bad), uppercase what's left and compress the originals. That would probably save time and space, at the expense of a little bit of pre-processing.

Heh. That's enough! I often come to CodeRanch, never joined it until now. I'd better get back to looking for -Xprof...
Campbell Ritchie
Sheriff

Joined: Oct 13, 2005
Posts: 38494
    
Welcome to the Ranch, Sean Collins.
 