• Post Reply Bookmark Topic Watch Topic
  • New Topic
programming forums Java Mobile Certification Databases Caching Books Engineering Micro Controllers OS Languages Paradigms IDEs Build Tools Frameworks Application Servers Open Source This Site Careers Other Pie Elite all forums
this forum made possible by our volunteer staff, including ...
Marshals:
  • Campbell Ritchie
  • Jeanne Boyarsky
  • Ron McLeod
  • Paul Clapham
  • Liutauras Vilda
Sheriffs:
  • paul wheaton
  • Rob Spoor
  • Devaka Cooray
Saloon Keepers:
  • Stephan van Hulst
  • Tim Holloway
  • Carey Brown
  • Frits Walraven
  • Tim Moores
Bartenders:
  • Mikalai Zaikin

Similar String grouping in Java

 
Ranch Hand
Posts: 39
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Hi,
I am working on a problem wherein I need to Group similar strings in an Array.
i.e If my String array is say "AXPZ","BXT","PZXA","XAZP","XBT","TBX","ABCD"
then I would group AXPZ,PZXA and XAZP in one group.
BXT,XBT and TBX in 2nd group
ABCD in 3rd group.
A group consists of String having same characters (order does not matter)
1)What would be the best approach for the problem.
2) What would be the final DataStructure representing groups and Similar Strings.
3)What should be the approach for this problem.

If anyone has worked on similar problem do provide some comments or ways to approach it.

Regards,
Ankur Luthra
 
Bartender
Posts: 6109
6
Android IntelliJ IDE Java
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Ankur Luthra wrote:
1)What would be the best approach for the problem.



"Best" depends on your specific requirements, constraints, and use cases. It's likely that there isn't one approach that can be definitely said to be "best".


2) What would be the final DataStructure representing groups and Similar Strings.
3)What should be the approach for this problem.



One idea that comes to mind--that may or may not meet your requirements--is to define a class that stores the original version of each string and the sorted version. Then, to determine if two strings are "similar", you just look at their sorted versions. For example, AXPZ,PZXA and XAZP will all sort to APXZ.

Another approach--that may or may not work for you--is to use a BitSet for each string and turn on the bits corresponding to the letters that are present in that String. If two BitSets have the same bits turned on, then the corresponding Strings contain the same letters.

I leave it to you to consider the relative merits and demerits of these approaches in the context of your requirements.

 
Ankur Luthra
Ranch Hand
Posts: 39
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
I was able to do it in following way.Any comments to improve performance if the Array size increases?
 
Bartender
Posts: 10780
71
Hibernate Eclipse IDE Ubuntu
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Jeff Verdegan wrote:Another approach--that may or may not work for you--is to use a BitSet for each string and turn on the bits corresponding to the letters that are present in that String. If two BitSets have the same bits turned on, then the corresponding Strings contain the same letters.


I fear I see a fatal flaw in that approach, kemosabe: wouldn't "AB" and "AAB" produce the same BitSet?

Winston
 
Winston Gutkowski
Bartender
Posts: 10780
71
Hibernate Eclipse IDE Ubuntu
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Ankur Luthra wrote:I was able to do it in following way. Any comments to improve performance if the Array size increases?


Well, I notice you've defined a HashSet, but don't appear to use it. What about a HashSet of sorted strings?

Better yet might be to create a HashMap, where each key is a sorted string, and each value is a List of all the strings that contain the same letters.

Winston
 
Jeff Verdegan
Bartender
Posts: 6109
6
Android IntelliJ IDE Java
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Winston Gutkowski wrote:

Jeff Verdegan wrote:Another approach--that may or may not work for you--is to use a BitSet for each string and turn on the bits corresponding to the letters that are present in that String. If two BitSets have the same bits turned on, then the corresponding Strings contain the same letters.


I fear I see a fatal flaw in that approach, kemosabe: wouldn't "AB" and "AAB" produce the same BitSet?

Winston



Part of what I meant when I told the OP, "I leave it to you to consider the relative merits and demerits of these approaches in the context of your requirements," was defining whether he could have duplicated letters and what effect it would have on using one approach or the other--and, for that matter, if AB and AAB should be considered a "match" by his rules.
 
Jeff Verdegan
Bartender
Posts: 6109
6
Android IntelliJ IDE Java
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Ankur Luthra wrote:I was able to do it in following way.Any comments to improve performance if the Array size increases?



Not regarding performance in particular, just general comments:

1. Your ModifiedString class should not have setters. Just take the single, unsorted String as a constructor arg, and set the original and sorted variables right there.

2. It looks like there's way too much going on there. An array, and 3 Lists?

3. Don't use get(i) for iteration. Use an Iterator, whether explicit, or implied via foreach loop.

4. Two of your ArrayLists are unparameterized. You should use generics there.

5. Unless you know your variable specifically needs to be an ArrayList, just declare it as List rather than ArrayList.
 
Ankur Luthra
Ranch Hand
Posts: 39
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Jeff Verdegan wrote:

Ankur Luthra wrote:I was able to do it in following way.Any comments to improve performance if the Array size increases?



Not regarding performance in particular, just general comments:

1. Your ModifiedString class should not have setters. Just take the single, unsorted String as a constructor arg, and set the original and sorted variables right there.

2. It looks like there's way too much going on there. An array, and 3 Lists?

3. Don't use get(i) for iteration. Use an Iterator, whether explicit, or implied via foreach loop.

4. Two of your ArrayLists are unparameterized. You should use generics there.

5. Unless you know your variable specifically needs to be an ArrayList, just declare it as List rather than ArrayList.


Thank you Jeff.I was focusing mainly on getting the work done so skipped these points.Thanks for the Inputs
 
Ankur Luthra
Ranch Hand
Posts: 39
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator


Well, I notice you've defined a HashSet, but don't appear to use it. What about a HashSet of sorted strings?


Can you please tell me where can I fit in HashSet in this.Are you talking about the final Data Structure i.e. finalList in my case or taking making al HashSet?
 
Winston Gutkowski
Bartender
Posts: 10780
71
Hibernate Eclipse IDE Ubuntu
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Jeff Verdegan wrote:Part of what I meant when I told the OP, "I leave it to you to consider the relative merits and demerits of these approaches in the context of your requirements,"


Oh, OK. My profuse apologies.

Winston
 
Winston Gutkowski
Bartender
Posts: 10780
71
Hibernate Eclipse IDE Ubuntu
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Ankur Luthra wrote: Can you please tell me where can I fit in HashSet in this.Are you talking about the final Data Structure i.e. finalList in my case or taking making al HashSet?


Well, there are several ways of doing what you want; but the main advantage of a HashSet / HashMap in this case is speed, because it can do a "contains" check in O(1) time. Most other structures will require either O(n) or O(log(n))) time.

Assuming you have something like a HashMap<String, List<String>>, my logic would go something like this :However, as Jeff pointed out, if you're only interested in grouping strings that contain the same letters, regardless of count; you may need another method.

HIH

Winston
 
Jeff Verdegan
Bartender
Posts: 6109
6
Android IntelliJ IDE Java
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Winston Gutkowski wrote:

Jeff Verdegan wrote:Part of what I meant when I told the OP, "I leave it to you to consider the relative merits and demerits of these approaches in the context of your requirements,"


Oh, OK. My profuse apologies.

Winston



None necessary, of course.
 
Thank you my well lotioned goddess! Here, have my favorite tiny ad!
a bit of art, as a gift, the permaculture playing cards
https://gardener-gift.com
reply
    Bookmark Topic Watch Topic
  • New Topic