Win a copy of Java Database Connections & Transactions (e-book only) this week in the JDBC forum!
  • Post Reply Bookmark Topic Watch Topic
  • New Topic
programming forums Java Mobile Certification Databases Caching Books Engineering Micro Controllers OS Languages Paradigms IDEs Build Tools Frameworks Application Servers Open Source This Site Careers Other all forums
this forum made possible by our volunteer staff, including ...
Marshals:
  • Campbell Ritchie
  • Devaka Cooray
  • Knute Snortum
  • Paul Clapham
  • Tim Cooke
Sheriffs:
  • Liutauras Vilda
  • Jeanne Boyarsky
  • Bear Bibeault
Saloon Keepers:
  • Tim Moores
  • Stephan van Hulst
  • Ron McLeod
  • Piet Souris
  • Frits Walraven
Bartenders:
  • Ganesh Patekar
  • Tim Holloway
  • salvin francis

Big O notation question  RSS feed

 
Ranch Hand
Posts: 39
2
Java Mac Ubuntu
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Where should I place a Big O notation question? I was thinking maybe the sections for General Computing, Programming Diversions, or Java Beginner. Thanks.
 
Sheriff
Posts: 6775
469
BSD Linux Mac OS X VI Editor
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I presume your question is related with Java code, so added it to Beginning Java forum. Thanks for asking for a right forum.

Please ask Question here.
 
Marshal
Posts: 64496
225
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Welcome to the Ranch

Fire away with the question.
 
Brian Jones Jr.
Ranch Hand
Posts: 39
2
Java Mac Ubuntu
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Here's the question and the solution. I don't understand how they got the solution.

Question:
Suppose we had an algorithm that took in an array of strings, sorted each string, and then sorted the full array. What would the runtime be?

Answer:
Many people will reason the following: sorting each string is O(N log N) and we have to do this for each string, so that's O(N*N log N). We also have to sort this array, so that's an additional O(N log N) work. Therefore, the total runtime is O( (N^2) log N + N log N ), which is just O( (N^2) log N). This is completely incorrect. The problem is that we used N in two different ways.

Let's define new terms and use names that are logical.
-Let "s" be the length of the longest string.
-let "a" be the length of the array.

Now we can work through this in parts:
-Sorting each string is O(s log s)
-We have to do this for every string (and there are "a" strings), so that's O(a*s log s).
-Now we have to sort all the strings. There are "a" strings, so you'll may be inclined to say that this takes O(a log a) time. You should also take into account that you need to compare the strings. Each string comparison takes O(s) time. There are O(a log a) comparisons, therefore this will take O(a*s(log a + Logs)).

If you add up these two parts, you get O(a*s(log a + log s)).

Can we first start with how they figured out how sorting each string is O(s log s)? I am a little confused by the wording of the question, but I think the string has a bunch of characters that needs to be sorted alphabetically. I think the "log N" comes from the fact that the possible combinations shrinks a lot as each index is figured out, but that doesn't really seem like a really good explanation (can you provide a visual or mathematical explanation here?). I don't know where the other "s" comes from.  Thanks.
 
author
Posts: 23832
140
C++ Chrome Eclipse IDE Firefox Browser Java jQuery Linux VI Editor Windows
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator

Brian Jones Jr. wrote:
Can we first start with how they figured out how sorting each string is O(s log s)?  



They didn't. It is just an assumption.

Basically, sorting algorithms can run the gamut on how will they scale with size. There was no mention of which algorithm was used, hence, no mention of what the big O for it is. Saying it is O(s log s) was probably chosen because that this the most common for many sorting algorithms.

Henry
 
Saloon Keeper
Posts: 10249
216
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
It simply comes from the fastest sorting algorithms having an average time complexity with an upper bound in O(n log n). We substitute s for n, because the sorting is based on the length of the strings.
 
Brian Jones Jr.
Ranch Hand
Posts: 39
2
Java Mac Ubuntu
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Thanks Henry and Stephen!! I think the rest makes sense now
 
Campbell Ritchie
Marshal
Posts: 64496
225
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
It will actually reduce to O(nlogn) if I try a bit of jiggery‑pokery.

Now, you cannot sort Strings in Java® because Strings are immutable. You would have to think of Strings the way they do in C, viz. as a *char or char[], which can be sorted. Let us assume we are using something like quicksort or merge sort which runs in nlogn time. So you can sort your Strings in slogs time. So far so good. You are going to have your figures go up the creek if the Strings are very different in length, so let's make them all the same size.
Your String comparisons will run in linear time O(n) and that is a lower complexity than O(nlogn), so let's ignore it.
Once you have done the comparisons, let's choose another O(nlogn) algorithm to sort the array of Strings, which would be a **char or char[][] in C or similar. If the Strings are all s characters long, you are going to do our sorting in O(nlogn + s) time and we can forget the s. Now your calculation that you are running in O(s×n(logn + logs)) looks correct.

But: logx + logy = log(x × y), so your formula reduces to O(s×n log(s×n)), and that is a kind of O(nlogn) complexity.

Interesting question Please tell us where it comes from, so as to avoid copyright problems.
 
Brian Jones Jr.
Ranch Hand
Posts: 39
2
Java Mac Ubuntu
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator

Campbell Ritchie wrote: Interesting question Please tell us where it comes from, so as to avoid copyright problems.



Thanks for your insight Campbell

Cracking the Coding Interview 6th ed. p.49 by Gayle Laakmann
 
Brian Jones Jr.
Ranch Hand
Posts: 39
2
Java Mac Ubuntu
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Campbell. I checked the errata for the book, and I didn't see anything wrong with the solution they provided. According to the book, "There is no way to reduce it further." So, the correct answer should be O(a*s(log a + log s)).

Thanks for pointing out that String class is immutable in Java. I guess we have to assume they meant char[].
 
Campbell Ritchie
Marshal
Posts: 64496
225
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
They have forgotten that
loga + logs = log(a × s)
That is not an erratum; I think it is simpler than what they printed however. And it will reduce to
O(nlogn) where
n = a × s

That's my story and I am sticking to it
 
Campbell Ritchie
Marshal
Posts: 64496
225
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator

A few minutes ago, I wrote:. . . O(nlogn) where
n = a × s . . .

Let's change that to O((n)log(n)) to maintain the precedence of the multiplication.

You have been reported to the other mods for asking such an interesting question Unfortunately it will be too scary for beginners, so I shall move you probably to two three[←edited after posting] other fora.
 
Greenhorn
Posts: 2
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I realize this is over 2 years old now, but I'm going through the Cracking the Coding Interview book and found my way to this forum as I didn't immediately understand the explanation given in the book. Having dug into it a bit, I think I get it now.

And I think there's one slight mistype in Campbell's post:

Campbell Ritchie wrote:It will actually reduce to O(nlogn) if I try a bit of jiggery‑pokery.
Your String comparisons will run in linear time O(n) and that is a lower complexity than O(nlogn), so let's ignore it.


It seems like we *can't* drop the O(n) term for string comparisons, since it's actually an O(s) term on an independent input ('s' for length of the strings) rather than 'n' for the length of the array.

Campbell Ritchie wrote:If the Strings are all s characters long, you are going to do our sorting in O(nlogn + s) time and we can forget the s.


This is my first time learning how to analyze algorithm complexity, but I believe this is the understanding to have here:

O(n*log(n)) - This is the upper bound of 'number of operations' in the sorting algorithm that we run to sort the array. But how long does each operation take? Well, it's a comparison operation that depends on the length of the string in each element of the array. So, for each comparison, we have an operation: O(s)

Therefore to describe the complexity of the array-sorting portion of this problem we should multiply instead of adding: O(s * (n * log(n)))


Sorry if this reads as a pedantic response years later - like I said, I'm just learning this stuff and trying to make sure I really understand it.
 
Steven Lybeck
Greenhorn
Posts: 2
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Sorry for the double post - I just wanted to edit to add: I agree with your ultimate conclusion that the notation could be further reduced from the book's answer O(a*s(log a + log s)) to: O(a*s*log(a*s))

The Cracking the Coding Interview book takes pains to remind that your complexity notation has to reference all inputs that can scale independently, so in this example it must take into account both 's' (length of strings) and 'a' ( length of array. I guess folks would be satisfied with O(n*log(n)) as long as you also make it clear that n=a*s
 
Campbell Ritchie
Marshal
Posts: 64496
225
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Welcome to the Ranch

I don';t remember what I thought two years ago, but, if you have two complexities added together, you consider what would happen if the size (n or s) becomes very large. Then that largest complexity will overwhelm all the others.
 
Stephan van Hulst
Saloon Keeper
Posts: 10249
216
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I agree with Steven though, your independent inputs must not get lost. For instance, let's say we have an algorithm that is in O(n³ + m). Even though this complexity class indicates that the algorithm's run-time will grow much quicker for input n, we may not leave out the linear growth term for input m, because it's independent.

However, what Campbell is doing here is substitution. In substitution, nothing gets lost, you just redefine your inputs. In this particular case, Campbell defined n to mean the combined lengths of the array and the longest string in that array.
 
A "dutch baby" is not a baby. But this tiny ad is baby sized:
how do I do my own kindle-like thing - without amazon
https://coderanch.com/t/711421/engineering/kindle-amazon
  • Post Reply Bookmark Topic Watch Topic
  • New Topic
Boost this thread!