• Post Reply Bookmark Topic Watch Topic
  • New Topic
programming forums Java Mobile Certification Databases Caching Books Engineering Micro Controllers OS Languages Paradigms IDEs Build Tools Frameworks Application Servers Open Source This Site Careers Other Pie Elite all forums
this forum made possible by our volunteer staff, including ...
Marshals:
  • Campbell Ritchie
  • Jeanne Boyarsky
  • Ron McLeod
  • Paul Clapham
  • Liutauras Vilda
Sheriffs:
  • paul wheaton
  • Rob Spoor
  • Devaka Cooray
Saloon Keepers:
  • Stephan van Hulst
  • Tim Holloway
  • Carey Brown
  • Frits Walraven
  • Tim Moores
Bartenders:
  • Mikalai Zaikin

Finding String uniqueness

 
Ranch Hand
Posts: 175
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Hi,

There is a very popular Question on finding the unique characters in the given string and the suggested code is



but the problem is, it doesn't work and I wont expect it to work; because, in "dup check" line we will be comparing two different numbers first thing,
Secondly say two different number may still have at least a bit sit in common in that case, "&" operator will still return true.
As we can see in this example, even if "a" is repeating it is not able to identify. How do I fix this problem?

Thanks,
 
Marshal
Posts: 28177
95
Eclipse IDE Firefox Browser MySQL Database
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
I'd fix it by getting rid of all of the code which does bit-shifting and things like that, and replace it by simple code which just compares char values to see if they are equal. I have no idea what that bit-twiddling code is for but, as you observe, it doesn't contribute to solving the problem. Just toss it and rewrite.
 
Saloon Keeper
Posts: 15484
363
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
The checker here is used to keep track of which characters have already been used. However, the problem is that shifted is being compared against val, but shifted should be compared against the checker. That should solve the problem.

However, as Paul pointed out, it's much better to use clear code that solves the problem intuitively. Instead of shifting bits into an int, use a Set<Character> charactersUsed.
 
Chandra shekar M
Ranch Hand
Posts: 175
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Yes, Stephen

you are correct I have to correct if( (val & shifted) > 0) to if( (checker & shifted) > 0). And it works fine. let me analyze what is going on and post it back.
 
Marshal
Posts: 79151
377
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Please also check your posted code; you have ><< in line 8. I think there is an error in the forum sofware causing that; should it read <<?

I look forward to the explanation because the code you have posted with those shifts is completely incomprehensible. And what happens if you increase the length of the String tested?
 
Stephan van Hulst
Saloon Keeper
Posts: 15484
363
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Maybe I can clarify the algorithm a little bit, it's actually quite similar to using a set to keep track of used characters.

The variable checker is our set of used characters. Masking checker with the shifted character using the & operator is analogous to calling contains() on a set.

At the end of the loop, the character is added to the set using the |= operator.

The only reason to use this code is that it might be slightly more performant than using a Set<Character>. The cost is that it's unclear, and it only works for 32 different characters, marginally more than the amount of letters in the English alphabet.
 
Chandra shekar M
Ranch Hand
Posts: 175
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Ok here is my explanation

1. ASCII character size is 256, as java is unicode double byte characters we have 2^15= 32768 characters. So, if we have a string > than this we can immediately
return that it is not unique
2. "shifted = (1 << val);" will set the bit representing the character and then "if( (checker & shifted) > 0)" will check if the newly set bit, is already visited
for eg: say string is "aba". for first character 'a', which is 97, "0001" << 97times which will be 0010. Now when we '0000' & '0010' it will fail the condition.
3. Now "checker |= shifted" will keep track of the visited characters, "0000" | "0010"= "0010" which is checker.
4. say for net character 'b' which is 98, "0001" << 98 will be 0100 and "0010" & "0100" will fail again and then "0010" | "0100" = "0110"
as you can see for 'a' the bit gets rotated 97 times and get set at 2, for 'b' it will be 3, for 'c' it will 4. which will be kept track by checker.
Now when again a duplicate character apapers in the string "0110" & "0010" > 0 will be true as 97th bit is already set.

I deciphered the algorithm, by debugging it.

Thanks,
 
Bartender
Posts: 10780
71
Hibernate Eclipse IDE Ubuntu
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Chandra shekar M wrote:you are correct I have to correct if( (val & shifted) > 0) to if( (checker & shifted) > 0).


No, no, no. That's correcting a solution which was WRONG to begin with - and plainly so complicated that YOU couldn't work out what was wrong with it.
It may solve your immediate problem, but it certainly doesn't validate the solution.

Even after Stephan's suggestion, would you be happy for your program to be a crucial part of the next Space Shuttle mission? And if not, why not?

Every single program you write (assuming we're not in the AI realm) needs to:
(a) Be correct. And that is not simply "I tested it with some stuff and it works". You need to know that it works. On time. Every time.
(b) Have other people who understand HOW it works. Otherwise it's Kleenex.

Winston
 
Winston Gutkowski
Bartender
Posts: 10780
71
Hibernate Eclipse IDE Ubuntu
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Chandra shekar M wrote:Ok here is my explanation
1. ASCII character size is 256, as java is unicode double byte characters...


Why does there need to be an explanation? The problem (as you stated it) is very clear:
  finding the unique characters in the given string

And here's my solution:
It may not be as slick or as fast or as compact as as the algorithm you're trying to turn into Java, but it WORKS (at least for any String that doesn't use "extended" characters).

And I know it works because I can prove it by going through the code step-by-step with anyone who's interested - although I hope I wouldn't have to.

Program simply and naturally. It's a key to a long and happy life..

Winston
 
Paul Clapham
Marshal
Posts: 28177
95
Eclipse IDE Firefox Browser MySQL Database
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Chandra shekar M wrote:I deciphered the algorithm, by debugging it.



But as far as I can see it doesn't satisfy the only requirement stated, namely "finding the unique characters in the given string". It prints a message which says when a duplicate is found, sure, but it doesn't tell us which characters aren't duplicates. Or perhaps it does, but the code is beyond my comprehension again?
 
Campbell Ritchie
Marshal
Posts: 79151
377
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Chandra shekar M wrote:Ok here is my explanation

1. ASCII character size is 256 . . . characters we have 2^15= 32768 characters. . . ,

Where did you get those figures from? By the way: they are both wrong.
 
Bartender
Posts: 5465
212
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
ah well, a little bit shuffling never hurt anybody!

A more serious problem with the method is that it is wrong.
With the corrected code:

the output is

So you should take a long enough BitSet for your 'checker'.
 
Chandra shekar M
Ranch Hand
Posts: 175
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Campbell Ritchie wrote:

Chandra shekar M wrote:Ok here is my explanation

1. ASCII character size is 256 . . . characters we have 2^15= 32768 characters. . . ,

Where did you get those figures from? By the way: they are both wrong.



Could you please tell me the correct one ?
 
Chandra shekar M
Ranch Hand
Posts: 175
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Piet Souris wrote:ah well, a little bit shuffling never hurt anybody!

A more serious problem with the method is that it is wrong.
With the corrected code:

the output is

So you should take a long enough BitSet for your 'checker'.



how big it should be then?
 
Paul Clapham
Marshal
Posts: 28177
95
Eclipse IDE Firefox Browser MySQL Database
  • Likes 1
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Chandra shekar M wrote:how big it should be then?



I think a better question is "Why does this algorithm think that 1 and q are the same character?" At least, you need to answer that before you start trying to salvage the code.
 
Campbell Ritchie
Marshal
Posts: 79151
377
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
That code looks beyond salvaging unless you are going in for an obfuscated code competition.

You can find out the bounds for ASCII by searching and the bounds for the char primitive are in the Java® Language Specification.
 
Stephan van Hulst
Saloon Keeper
Posts: 15484
363
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
It doesn't necessarily have to do with the bounds for ASCII versus the bounds of char.

The problem is that an int can only hold 32 different bits, so the algorithm can only check if two characters are congruent modulo 32. '1' and 'q' are congruent modulo 32. Like Piet said, you need a bigger bit set if you want to check a wider range of possible characters.

However, any solution using a bit set is going to be horrible versus Winston's solution.
 
Chandra shekar M
Ranch Hand
Posts: 175
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Piet Souris wrote:ah well, a little bit shuffling never hurt anybody!

A more serious problem with the method is that it is wrong.
With the corrected code:

the output is

So you should take a long enough BitSet for your 'checker'.



Corrected code is


Should fix the issue
 
Winston Gutkowski
Bartender
Posts: 10780
71
Hibernate Eclipse IDE Ubuntu
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Chandra shekar M wrote:Corrected code [...] should fix the issue


Should it? I don't see how, because line 8 is still wrong, and now you're checking a bit in a number based on the numeric value of a character. Furthermore, you're doing it on a BigInteger, which is a sigh/magnitude number, so I'm not exactly sure how it works for negative numbers.

I hate to say, but your logic seems to be getting worse (and more tortuous), not better.

Why don't you try to write out in English (or your native language) what this logic is meant to be doing, step-by-step, and see whether it fulfils the requirement of finding the unique characters in a String.

Winston
 
Piet Souris
Bartender
Posts: 5465
212
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
In principal, there is nothing wrong. The ratio behind the method
has already been explained. But BigInteger is a bit dangerous
to use.

Much more important though is that when a fixed method does
not fail in a situation where it failed before, that it does not mean
that it is indeed fixed!

For example: look at the output of this:

You see, it does not report dups in "1q2r3s4t" anymore, but
now it looks as if it will never report any dups anymore.

How come? Well, it has to do with the same pitfall as
string.replace. Run the following code:



See the API for BigInteger.setBit and figure out
what's the difference. Then, fix your method again
and give it a thorough test!

To avoid all this messing around, I advised the use
of a BitSet. It works much easier than a BigInteger,
and to answer my remark about how long it should
be:
since chars are 16 bit in Java, there are 2^16
possible different characters, and so wee need that
many bits. Here goes:

It is a good excersize to use this in your code. Try it!

Lastly: your topic title was about finding the unique
characters in a given string. Now, we're capable
of printing out the duplicates, but that does not give
us the unique characters. See if you can adapt your
method to return the uniques. That is quite some more
work! But you will learn quite a few things from it.

Of course, Winstons code is much easier, but not half
as much fun!
 
Winston Gutkowski
Bartender
Posts: 10780
71
Hibernate Eclipse IDE Ubuntu
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Piet Souris wrote:In principal, there is nothing wrong...


Not with checking bits, no; but
  shifted = (1 << val);
is wrong, for the reasons Stephan explained, and is still part of Chandra's code.

Off-topic question: Does anyone know why BitSet is based on long arrays? It's not explicitly stated, but it's fairly obvious from the available factories (and is implemented that way, at least in the Oracle class), and it seems an odd choice, because longs aren't atomic - at least, I presume they still aren't.

I have the same query about BigInteger - why is it based on ints rather than chars, which would seem to be the obvious choice, particularly as they're unsigned. There are hundreds of lines of boilerplate in the class just to deal with that alone.

Ah well, not being a Vulcan I can only wonder.

Winston
 
Campbell Ritchie
Marshal
Posts: 79151
377
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Piet Souris wrote:. . . But BigInteger is a bit dangerous to use.. . .

Where on earth did you get that from?
 
Chandra shekar M
Ranch Hand
Posts: 175
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Thanks Guys, really appreciate the help.

 
With a little knowledge, a cast iron skillet is non-stick and lasts a lifetime.
reply
    Bookmark Topic Watch Topic
  • New Topic