Big O notation

Pho Tek
Ranch Hand

Joined: Nov 05, 2000
Posts: 761

Taken from this page:

Both walking and traveling at the speed of light have a time-as-function-of-distance big-oh complexity of O(N). Altho they have the same big-oh characteristics, one is rather faster than the other. Big-oh analysis only tells you how it grows with the size of the problem, not how efficient it is.

Imagine I have to walk a 3-mile stretch of road every day to catch the train. Depending on the time of day, the number of cars plying it will vary. Close to peak hours, you will find bumper-to-bumper traffic which lasts for several hours. Vehicles are crawling. It takes me 20 minutes of brisk walking to cover that distance. A car or a bus will take longer during peak hours.

So during peak hours, it is:
O(1) by walking
O(n) by driving where n is the number of cars

So walking is more efficient than driving during peak hours. So is that web page not accurate?
Mike Simmons
Ranch Hand

Joined: Mar 05, 2008
Posts: 3018
    
When the web page says walking is O(N), the N they refer to is distance. When you talk about O(N) and O(1), the N you are talking about is the number of cars. Two unrelated concepts.
[ August 14, 2008: Message edited by: Mike Simmons ]
Bear Bibeault
Author and ninkuma
Marshal

Joined: Jan 10, 2002
Posts: 61606
    
[image]
Scott Selikoff
author
Saloon Keeper

Joined: Oct 23, 2005
Posts: 3716
    

Awesome Bear, just awesome.


Andrew Monkhouse
author and jackaroo
Marshal Commander

Joined: Mar 28, 2003
Posts: 11503
    

Originally posted by Pho Tek:
Imagine I have to walk a 3 mile stretch of road everyday to catch the train.

[...snip...]

So during peak hours, it is:
O(1) by walking
O(n) by driving where n is the number of cars

So walking is more efficient than driving during peak hours. So is that web page not accurate?


From that section you quoted (with my highlighting):

Big-oh analysis only tells you how it grows with the size of the problem, not how efficient it is.


They explicitly mention that big-O does not consider efficiency, which answers your question.

To put it another way, consider these 2 methods:

(Yes, this is imperfect code. It is only intended as an example.)
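Something like the following pair - an illustrative reconstruction, with method names and details invented rather than the original code:

// Two ways to compute base^power. Illustrative only: no overflow
// checks and no handling of negative exponents.
public class PowerExamples {

    // Iterative version: the loop body runs 'power' times.
    public static long powerIterative(long base, int power) {
        long result = 1;
        for (int i = 0; i < power; i++) {
            result *= base;
        }
        return result;
    }

    // Recursive version: makes 'power' recursive calls,
    // pushing one stack frame per call.
    public static long powerRecursive(long base, int power) {
        if (power == 0) {
            return 1;
        }
        return base * powerRecursive(base, power - 1);
    }

    public static void main(String[] args) {
        System.out.println(powerIterative(2, 10)); // 1024
        System.out.println(powerRecursive(2, 10)); // 1024
    }
}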

Since we are not concerned with efficiency, all we really care about is the number of loops we make. In which case both of these methods have a complexity of O(n) where n = power.

While the iterative solution is more efficient (since there are fewer operations within the loops, and we don't have the overhead of putting anything on our stack for each recursion), the efficiency is not considered for big-O analysis.

If you were trying to consider efficiency, then you might write the analysis as something like:
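Perhaps something like this (every number below is invented):

    iterative: time(n) ≈ 3n * (seconds per CPU instruction)
    recursive: time(n) ≈ (3 + 6)n * (seconds per CPU instruction)

where 3 is a guess at the CPU instructions per loop iteration and 6 a guess at the extra cost of each stack frame.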

Note: I just made that up - there would have to be a lot more thought put into it before I ever try to create a real formula to estimate time taken.

Or, going back to your example relating to traveling 3 miles, that might be something like: time = distance / (average speed). Normally the average speed is considered a constant (yes, even stop-and-go traffic still has an "average speed" for the entire trip). And constants are removed from the equation, leaving you with an O(n) where n is the distance to be traveled (again - according to their problem space, whereby they stated that they were looking at time as a function of distance - nothing else).

Now, going back to my own problem space, I am going to try and make my code a little more efficient in the long run, so I am going to have code such as:
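Something along these lines (exponentiation by squaring; the exact code is illustrative):

// Halves 'power' on each call, so it recurses roughly log2(power) times.
public class PowerBySquaring {

    public static long power(long base, int power) {
        if (power == 0) {
            return 1;
        }
        long half = power(base, power / 2);
        if (power % 2 == 0) {
            return half * half;          // even exponent: (base^(p/2))^2
        }
        return base * half * half;       // odd exponent: one extra factor of base
    }

    public static void main(String[] args) {
        System.out.println(power(2, 10)); // 1024
    }
}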
Since I can ignore constants, I know that in big-O notation this is O(log n) - much more efficient than the previous attempts, which were O(n).

If I were to try to re-apply my totally bogus way of calculating time to process, I would possibly come up with some formula like:
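Again with invented constants, perhaps:

    time(n) ≈ (3 + 6) * log2(n) * (seconds per CPU instruction)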

The problem with having extended calculations is that it doesn't really help you much. Consider graphing those two equations, using the totally bogus constants that each "constant operation takes 3 cpu instructions" and "each addition to the stack takes 6 cpu instructions":

[graph comparing the estimated running times of the solutions]
So where did this get us?

To make these bogus calculations I had to:
  • Guess at how many CPU instructions were in each iteration or recursion. (I could have decompiled the Java byte code to work out how many Java virtual machine instructions were built for each real command, but that would only add a little more accuracy - unless I decompiled the Java runtime and worked out what it was actually using on the machine, I will never really know.)
  • Guess at how many CPU seconds it would take to run those instructions. (Having decompiled the Java runtime and its libraries in the first step, I suppose I could look up the specifications for the machine I am running on, and check the idle value of the operating system, which would give me a slightly more accurate guess.)

But I am going to be in big trouble if anyone changes a single line of code, or the machine gets upgraded, or some other service starts running on the operating system - all my numbers would be useless.

All that is really valuable from that graph is that my final solution is better than the other two. And by the looks of things my iterative solution starts out better.

But if I got my numbers wrong - if creating the stack is practically a non-event while the actual operations are the heavy processes - then I get a different graph:

[second graph, with the cost assumptions reversed]
In that case the first recursive solution started out better than the iterative solution; however, it soon became the worst solution.

No matter what, though, the second recursive solution - the O(log n) solution - always ended up much better than the other two.

And I knew that from the start. Just looking at how many iterations or recursions there were, without caring about constant operations, told me that O(log n) was better than O(n). No need for guessing at numbers - I have a quick method of comparing 2 or more potential solutions to a problem.

Finally (bet you thought I'd never shut up!):

You would not normally want to spend anywhere near the amount of time I just spent on this post analyzing each and every one of your methods in your production code. Normally, if you have a problem with performance, you should run a profiler to find out where the problem really lies; then you can look at the area that is problematic. And usually a quick analysis will show you whether your code is linear, logarithmic, quadratic, and so on. Knowing this can be useful when you work out an alternate solution - finding the complexity can allow you to perform a theoretical comparison of the solutions without actually hurting production systems.

Regards, Andrew
[ August 15, 2008: Message edited by: Andrew Monkhouse ]

Andrew Monkhouse
author and jackaroo
Marshal Commander

Joined: Mar 28, 2003
Posts: 11503

Originally posted by Scott Selikoff:
Awesome Bear, just awesome.


O(sum) Bear, just O(sum)

Regards, Andrew
Scott Selikoff
author
Saloon Keeper

Joined: Oct 23, 2005
Posts: 3716

My 2 cents...

While Big O notation gives you a guide for performance as n grows arbitrarily large, it does not compute average (amortized) running time, nor does it help when n is reasonably small.

Think of Big O as the guiding force as the size of your data set (the number of records, for example) grows huge. If it is unlikely the data set will ever grow very big, then there might be better algorithms. Always keep in mind it's an estimate of worst-case growth.

In short? From time to time it's good to throw away the theoretical limits for a second and compute experimental numbers on real-world data. After all, if your system crashes because of performance and your boss wants to know why, getting into a discussion of Big O (robots or performance) isn't going to go well. You need to do tests with real data to know if your system is going to be able to handle it.

Lastly, about performance: there are always ways to improve an algorithm, but you tend to get diminishing returns the more you work on it. In other words, you'll get a 10% improvement initially... weeks later you might get a 2% improvement... months later you'll be fighting for a 0.05% improvement. At some point it's better just to try a new approach altogether, or you tend to overfit your data (over-using training data to give incorrect/biased measures of performance). A common practice for performance tuning is to go after the low-hanging fruit (the easy performance fix that gave you the 10%) and avoid trying to find convoluted enhancements that will rarely lead to much performance increase.
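For instance, a quick-and-dirty check might look like the following sketch (the work method is just a stand-in for whatever you are tuning, and a serious benchmark would also need JIT warm-up and repeated runs):

// Crude timing of one candidate method across growing input sizes.
// Not a proper benchmark: no warm-up, single run per size.
public class RoughTiming {

    // Stand-in for the method under test.
    static long work(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 1000, 100000, 10000000}) {
            long start = System.nanoTime();
            long result = work(n);
            long micros = (System.nanoTime() - start) / 1000;
            System.out.println("n=" + n + ": " + micros + " us (result " + result + ")");
        }
    }
}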
[ August 17, 2008: Message edited by: Scott Selikoff ]
Pat Farrell
Rancher

Joined: Aug 11, 2007
Posts: 4659

Originally posted by Scott Selikoff:
While Big O notation gives you a guide for performance as n grows arbitrarily large, it does not compute average (amortized) running time nor does it help when n is reasonably small.


Big O is only useful when N is big. If N is small, it doesn't matter. You can use an N^4 algorithm when N is small; when N > 10,000, you have problems.

One thing that many folks forget is that Big O ignores the constants. The technical definition is

O = C1 * N^K + C2

and when N is big, and especially when K is big, the C1 and C2 constants become too small to worry about. But it's still correct if the C1 and C2 values are huge.

If N = 10, C1 = one million, and C2 = 20 million, then don't rely on Big O.
Mike Simmons
Ranch Hand

Joined: Mar 05, 2008
Posts: 3018

[Pat]: One thing that many folks forget is that Big O ignores the constants.

To be fair, that was the primary point of the walking-vs.-speed-of-light example cited at the beginning of this thread.

[Pat]: The technical definition is...

That hardly seems very general - what's the point of allowing a constant term C2 if you aren't going to allow other lower-order terms? Even a simple quadratic equation is outside the above paradigm. Nor does the "O =" part look useful. The execution time is equal to... well, some formula, often more complex than the strangely limited hybrid you offer. A standard example would be something like

time(n) = C_K * n^K + C_(K-1) * n^(K-1) + ... + C_1 * n + C_0

The order is just the highest-order term of such a formula, omitting a multiplicative constant: n^K in this case. Of course, some formulas can't be easily converted to power series. The factorial function n! is one such example. Other formulas like n*log(n) can be converted to power series, but we nonetheless usually find it easier (and more accurate) to just refer directly to the log function.
[ August 17, 2008: Message edited by: Mike Simmons ]
Pat Farrell
Rancher

Joined: Aug 11, 2007
Posts: 4659

Originally posted by Mike Simmons:
That hardly seems very general - what's the point of allowing a constant term C2 if you aren't going to allow other lower-order terms?


I simplified it, a lot. The point is that Big Oh is about gross comparisons: is it O(n), or O(n * ln(n)), or O(n^X) for some X?

The formal definition is all about limits, which is easy calculus, but a lot of developers don't like to be reminded of calculus.

Since it's about limits, as N goes to infinity, the importance of the constants goes away. But they are there.
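For the record, the usual formal statement is: f(N) is O(g(N)) if there exist constants C > 0 and N0 such that f(N) <= C * g(N) for all N >= N0. That constant C is exactly where factors like C1 and C2 get absorbed once N is large enough.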

A quadratic formula would be O(N^2). While it starts as

C2 * N^2 + C1 * N + C0

once N gets big, the N^2 term overwhelms the rest.

There is no point in talking about Big Oh when you are talking about small N. The traveling salesman problem can be done in your head when N = 2, and it's not hard when N = 4. But it's known to be NP-hard, as it gets hard faster than N^K for any value of K. Which means it gets harder than, say, N^100, which gets really, really big fast.
     