• Post Reply Bookmark Topic Watch Topic
  • New Topic
programming forums Java Mobile Certification Databases Caching Books Engineering Micro Controllers OS Languages Paradigms IDEs Build Tools Frameworks Application Servers Open Source This Site Careers Other Pie Elite all forums
this forum made possible by our volunteer staff, including ...
Marshals:
  • Campbell Ritchie
  • Jeanne Boyarsky
  • Ron McLeod
  • Paul Clapham
  • Liutauras Vilda
Sheriffs:
  • paul wheaton
  • Rob Spoor
  • Devaka Cooray
Saloon Keepers:
  • Stephan van Hulst
  • Tim Holloway
  • Carey Brown
  • Frits Walraven
  • Tim Moores
Bartenders:
  • Mikalai Zaikin

optimization?

 
blacksmith
Posts: 1332
2
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
I recently had the opportunity to optimize a bit of code. Each time a particular page was loaded, this bit of code was called three times, and each time it was called, it in turn created 134 java.text.DateFormat symbols, one for each available Locale on the machine. The three times through this bit of code took a total of 1752 milliseconds:

1752ms for 402 DateFormats (unoptimized first page load)

I found a way to cache the results the first time the bit of code was called, so that it wouldn't need to create any DateFormat objects on subsequent calls. Guess first if you want, before scrolling down for the results ...










The results were:

1564ms for 134 DateFormats (optimized first page load)

Of course, that's just for the first page load. For subsequent page loads, the results were:

0ms for 0 DateFormats (optimized second page load)

Now here's the challenge: how much did the optimization actually gain? That is, how many milliseconds were spent on subsequent page loads in the unoptimized code? And most importantly, why?

I'll post the actual result after people have had the chance to discuss a bit.
 
Ranch Hand
Posts: 113
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

0ms for 0 DateFormats (optimized second page load)



The second time the page is loaded, the program refers to the cache? Does that mean that you need to create the objects once? Or do you do it once per load on the first exacution of the bit of code?

If it took you at first (1752 + X)ms to load a page, and now you've made it (1564 + X)ms to load a page, then the gain is 188ms per page load, of course. What other factors should be taken into account? Am I looking at this from the right angle?
 
Warren Dew
blacksmith
Posts: 1332
2
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Maxim Katcharov:

The second time the page is loaded, the program refers to the cache?

In the optimized version, yes.

If it took you at first (1752 + X)ms to load a page, and now you've made it (1564 + X)ms to load a page, then the gain is 188ms per page load, of course. What other factors should be taken into account? Am I looking at this from the right angle?

Well, you're looking at it from the same angle that I was. I expected the second load using the unoptimized code to be about 300 ms as a result (since on the second page load, I'm saving all 402 calls rather than just 268 calls). Question is, was I right or wrong, and why?
 
Maxim Katcharov
Ranch Hand
Posts: 113
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Well, I think it depends on how you implement the cache... if you're serializing these and flushing them from ram so as not to take up space between page loads, then you'll need to read them from the file, and that may take a fair bit of time. Otherwise you just return the cache.

page load 1: create+store, reuse, reuse
page load 2: load, reuse, reuse
or
page load 2: reuse, reuse, reuse

That is, how many milliseconds were spent on subsequent page loads in the unoptimized code?



hmm, I'm a little confused... since you've updated that one section, wouldnt subsequent page loads all call the updated section, which quickly returns the cache?
 
Ranch Hand
Posts: 1923
Scala Postgres Database Linux
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
In the unoptimized mode, you create one Format in 4.35 ms.
To create and store it, you need 11.76 ms.
To fetch it, as I understood, you need 0 ms.
What is the question?

A subsequent optimized load gains 4.35 ms per Format.
And costs an initial price of 11.76-4.35.

So for two calls, the unoptimized version would be faster, for 3 and more calls you earn more and more time.
 
Warren Dew
blacksmith
Posts: 1332
2
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
I think I need to clarify my question:

1752ms for 402 DateFormats (unoptimized version, first page load)
???ms for 402 DateFormats (unoptimized version, second page load)

1564ms for 134 DateFormats (optimized version, first page load)
0ms for 0 DateFormats (optimized version, second page load)

So the question is, fill in the ??? and explain how you got to your answer.
 
Stefan Wagner
Ranch Hand
Posts: 1923
Scala Postgres Database Linux
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
1752 or 0
guessing.
Perhaps the non-optimized version is caching too, but why wouldn't it work for the first page?
[ October 01, 2004: Message edited by: Stefan Wagner ]
 
Maxim Katcharov
Ranch Hand
Posts: 113
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Seeing the code would help with a more accurate guess, but I'll go with 1756. Unless there was some trickery involved.
 
Stefan Wagner
Ranch Hand
Posts: 1923
Scala Postgres Database Linux
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Yes. That's more rational, because you wouldn't optimize a well running code.
 
Warren Dew
blacksmith
Posts: 1332
2
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Well, to be honest, I didn't realize the second load would be different from the first load until after the optimization was done.

Originally I was thinking that the first load would be reduced to 1/3 of 1752ms, or somewhere in the range of 584ms.

When it turned out only to reduce things to 1564ms rather than 584ms, I figured that the DateFormat class must already be doing some caching of expensive startup calculations (or maybe the class just takes a long time to load). Given that the real savings were only 188ms for the optimization on the first page load - 94ms per 134 DateFormats created - I figured that would be the cost of not being optimized for the second page load as well. Since 402 DateFormats would have to be created by the unoptimized code, I figured that ??? = 282ms or so.

But I was wrong again. In actuality, ??? = 799, which is different than either of the obvious possibilities. So we have:

1752ms for 402 DateFormats (unoptimized version, first page load)
799ms for 402 DateFormats (unoptimized version, second page load)

1564ms for 134 DateFormats (optimized version, first page load)
0ms for 0 DateFormats (optimized version, second page load)

Now we get to the interesting question: why?

Note: I don't actually know the answer to that question. I have some theories, but they aren't necessarily better than anyone else's.
 
Maxim Katcharov
Ranch Hand
Posts: 113
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Had a look at the source, and it caches. The static fields are mostly finals of the regular sort, but



is definitley not. It's used in only one method:



Looks like you did 800ms better than the sun programmers. Too bad that the cache they made is redundant for the most part when it comes to someone doing some proper optimization. I believe this also means that NumberFormats can be safley compared with == (for the most part, I do believe that serialization ruins it).

I wonder how many other classes do this...
 
Warren Dew
blacksmith
Posts: 1332
2
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Maxim Katcharov:

Too bad that the cache they made is redundant for the most part when it comes to someone doing some proper optimization.

To be fair, Sun's optimization is good for the general case; they can't predict specific cases where further optimization is possible. One of the things I take away from this is that there's a lot of optimization already built into the standard libraries and types, and I think that's a good thing.

The optimization you spotted explains the difference between the 1752ms for the first page load and the 799ms for the second page load. What it doesn't fully explain, though, is the difference between the 188ms for the last 268 calls in the first page load - 94ms/134 calls - and the 799ms for the 402 calls in the second page load - 266ms/134 calls on average. It seems like the time taken in the unoptimized version goes something like this, at a guess:

1564ms for first 134 calls (Sun caching is happening here)
94ms for next 134 calls
94ms for next 134 calls

(first page load finished, second page load)

611ms for next 134 calls
94ms for next 134 calls
94ms for next 134 calls

so the time taken actually goes back up by a substantial amount between the first page load and the second page load.

I can think of two possible explanations for that. First, the information that Sun's code cached to memory might have been flushed from the CPU cache to main memory; when the second page is loaded, it takes longer because it has to be fetched from main memory instead of the processor cache, though not as long as if it had to be recalculated from scratch. Second, it could be that some just in time (JIT) optimizations that speed the second and third set of 134 calls on the first page load are lost before the second page load, and so the JIT compiler has to do the optimization again.
 
Maxim Katcharov
Ranch Hand
Posts: 113
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
You're right of course, the optimization that Sun did works just fine. If a rare someone thought it was critical to optimize the class to custom fit the application, they could just make their own class to get around that. A library that is easy to use for 1000 people and innapropriate for 1 is better than vice versa.

I wonder if there's a list of optimizations and specific notes on library classes out there...
reply
    Bookmark Topic Watch Topic
  • New Topic