Well, I'd probably put the characters into two Sets and apply a removeAll operation.

But if you have to do it using arrays, here is one way:

Create a boolean array with one boolean value for each possible character value. Set every value to false initially.

Now, for every char in the first string, set the corresponding boolean value to true. Then for every char in the second string, set it to false.

Every char that now has a correspending boolean value of true is one that needs to be in your output.

Out of the head I'd say that this has a complexity of O(n) (where n is the sum of characters in the two strings).

The soul is dyed the color of its thoughts. Think only on those things that are in line with your principles and can bear the light of day. The content of your character is your choice. Day by day, what you do is who you become. Your integrity is your destiny - it is the light that guides your way. - Heraclitus

Nagya Chindhi
Greenhorn

Joined: Jun 30, 2004
Posts: 3

posted

0

What happens if a character is repeated. Each character is not necessarily unique. Like in our case, the character "e" occurs twice.

Originally posted by Ilja Preuss: Well, I'd probably put the characters into two Sets and apply a removeAll operation.

But if you have to do it using arrays, here is one way:

Create a boolean array with one boolean value for each possible character value. Set every value to false initially.

Now, for every char in the first string, set the corresponding boolean value to true. Then for every char in the second string, set it to false.

Every char that now has a correspending boolean value of true is one that needs to be in your output.

Out of the head I'd say that this has a complexity of O(n) (where n is the sum of characters in the two strings).

I would make a 'map' out of the first string with a counter for each character. Either two arrays char[52] int[52] or make some object that had a char and an int counter.

grab the first String and push it into the 'map', then grab the second String and do removes. A final walk down the map would finish you off with the remainder.

That should still be O(n), well O(n + m), the two Strings, but basically the same.

My only question would be what if you are trying to eleminate a char that wasn't in the first String? I suppose you would just ignore it.

James Carman, President<br />Carman Consulting, Inc.

James Carman
Ranch Hand

Joined: Feb 20, 2001
Posts: 580

posted

0

Oh, that's O(m + n) where m = len(string) and n = len(invalidString)

Steven Bell
Ranch Hand

Joined: Dec 29, 2004
Posts: 1071

posted

0

Yes. At a glance that looks about right. And yes also on the O(m+n). It all depends on how detailed you are getting with your performance analysis. In general terms O(n+m) is roughly equal to O(n) because constants can be ignored and you end up with, worst case, O(2n) or O(2m)

Ilja Preuss
author
Sheriff

Joined: Jul 11, 2001
Posts: 14112

posted

0

Originally posted by Steven Bell: I would make a 'map' out of the first string with a counter for each character.

Well, let's say m = 2n. Then the whole thing is O(3n), which is still O(n). We could invent other relationships between m and n which trasnform the equation in all sorts of other ways if we want. But with big O notation, all we really care about is the exponent - variable names are generally interchangeable. N is the number of "things" - whatever things are most significant. The listener is expected to apply some common sense to figure out which things those are. [ February 28, 2005: Message edited by: Jim Yingst ]

"I'm not back." - Bill Harding, Twister

Ilja Preuss
author
Sheriff

Joined: Jul 11, 2001
Posts: 14112

posted

0

To me, O(m+n) means that

- we have two inputs into the algorithm - the two inputs are independent of each other, and - the algorithm is linear over both inputs

That looks differently to me than O(n), but I might simply have stopped my studies too early, I guess...

The total input is N = n + m, and the algorithm is O(N). All this means is that it has run time linearly dependent on the inputs. Whether the actual runtime is O(2n + 5m) makes no difference as it's still O(N). Any O(xN) algorithm is considered O(N) since xN is a linear function on N. Similarly, an O(xN + y) algorithm is also O(N).

On the other hand, if it weren't linearly dependent on both inputs, I guess you could consider them seperately, for example O(n^2,m^3). Then again, one of the inputs usually overshadows the other, and the problems we were given in school always had simple inputs. In a collection we dealt with just n, the number of items.

Ilja Preuss
author
Sheriff

Joined: Jul 11, 2001
Posts: 14112

posted

0

Originally posted by David Harkness: Similarly, an O(xN + y) algorithm is also O(N).

In my understanding, that's only true if y is a constant, or if you can prove that y < xN for all N greater some value.

In my understanding, that's only true if y is a constant, or if you can prove that y < xN for all N greater some value.

This isn't quite right, either. If we assume x is 1 and y is 2N then y > xN, but O(xN+y) is still O(N). I think David meant x and y to be constant, though.

I think it helps to keep in mind that the big O notation describes categories of behavior (e.g., linear, quadratic, logarithmic, etc.). It's pretty easy to determine which category by recalling your first year calculus, in particular L'Hospital's Rule.

In my understanding, that's only true if y is a constant, or if you can prove that y < xN for all N greater some value.

y is constant wrt N. If y is dependent of some other factor, then that factor should be considered as an input to your functions

So, function funcA takes time T(funcA) = an + y, and y = bm (where a and b are constants) then T= an + bm, and m is an input to your function. If y is dependent on n, then you are expressing T in a wrong way. The correct way to express T is this

T = a(p)n^p + a(p-1)n^(p-1) + ......... a(1)n + a(0) + b(p)m^p + b(p-1)b^(p-1) + ......... b(1)m + b(0) where n and m are inputs to your function and a(p), a(p-1)....a(1), a(0), b(p), b(p-1)....b(1), b(0) are constants

To simplify computing performance, we always take highest order of the equation. So, if your T is

T = 2n^2 + 10 n + 20 + 5m + 50

for a really large n, 2n^2 will be much larger than other parts of the equation. So, large that 10n + 20 + 5m + 50 becomes negligible

So, for large n, T = 2n^2

When we denote the Order, we ignore the coefficient of that equation . So,

Originally posted by Glenn Murray: I think David meant x and y to be constant, though.

Yes, I could have been more clear. I had just finished watching Shaun of the Dead and had a difficult time getting my mind into a mathematical frame.

Jayesh, thank you for posting a much more rigorous description.

Ilja Preuss
author
Sheriff

Joined: Jul 11, 2001
Posts: 14112

posted

0

So, if an algorithm has to inputs, and I want to say that it is linear regarding both inputs, how do I say that (say, in contrast to being linear to one, but constant regarding the other input)?

subject: Efficient algorithm for string manipulation