• Post Reply Bookmark Topic Watch Topic
  • New Topic
programming forums Java Mobile Certification Databases Caching Books Engineering Micro Controllers OS Languages Paradigms IDEs Build Tools Frameworks Application Servers Open Source This Site Careers Other Pie Elite all forums
this forum made possible by our volunteer staff, including ...
Marshals:
  • Campbell Ritchie
  • Jeanne Boyarsky
  • Ron McLeod
  • Paul Clapham
  • Liutauras Vilda
Sheriffs:
  • paul wheaton
  • Rob Spoor
  • Devaka Cooray
Saloon Keepers:
  • Stephan van Hulst
  • Tim Holloway
  • Carey Brown
  • Frits Walraven
  • Tim Moores
Bartenders:
  • Mikalai Zaikin

Find shortest string with all the words in any order

 
Greenhorn
Posts: 1
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Hi all,

this is my 1st post here, so sorry if i missed forum to post this question.

I was solving this code question (like for interview):
"Given a large document and a short pattern consisting of a few words (eg. W1 W2 W3), find the shortest string that has all the words in any order (for eg. W2 foo bar dog W1 cat W3 -- is a valid pattern)"
this is from here: http://www.glassdoor.com/Interview/Given-a-large-document-and-a-short-pattern-consisting-of-a-few-words-eg-W1-W2-W3-find-the-shortest-string-that-has-all-QTN_216408.htm

Here is my code. I tested this using Peter Norvig's big.txt (~1mil words) and it runs just under 3sec for my test case.

Is this fast enough? Other comments? Thankx in advance!


 
Ranch Hand
Posts: 423
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Red Ronin wrote:
Is this fast enough? Other comments? Thankx in advance!


If performance satisfies you (or your customer) ..... then yes, it is fast enough.
In other case there are some improvements possible.

First, you don't need 6 different regex patterns to check this condition,
only one is sufficient:


Follow this link to study regex patterns, it's really cool: http://www.regular-expressions.info/lookaround.html

Second, don't use regex at all,
simple 3 calls to a function String.contains() would be much more faster:
http://docs.oracle.com/javase/6/docs/api/java/lang/String.html#contains%28java.lang.CharSequence%29


Third, the given problem is: "find the shortest string".
In your case, a "string" is equivalent to one line from the file.
So, if you have found a line that matches the condition, and the length of this line is X,
then you need to check only lines that have length < X.
For each new line check if it size is < X, if no - then just skip this line and take the next one.
If you find a new line shorter than X, that meet the condition too, then set X = length on this new line.
In this way you avoid searching substrings within the string probably for 50% or even more lines.
 
Bartender
Posts: 10780
71
Hibernate Eclipse IDE Ubuntu
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Ireneusz Kordal wrote:Third, the given problem is: "find the shortest string".
In your case, a "string" is equivalent to one line from the file.
So, if you have found a line that matches the condition, and the length of this line is X,
then you need to check only lines that have length < X.


Hmmm. Not so sure about that (although I totally agree that contains() is better than a regex).

If you used indexOf() and stored the results, you could then check the span between the first and last occurrence to work out your "shortest string", maybe something like:but the wording of the problem is very open to interpretation.

Winston

BTW: The above could be optimized along the lines that Ireneusz suggested to eliminate the third search if the first two produce a span that is greater than the one you already have; but personally I suspect it'll make the code a lot messier for not a lot of gain.
 
Ranch Hand
Posts: 50
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
What if the shortest string is between lines? What if the shortest string is a sub string in a line? ex: a w1 w2 w1 w3 x

I believe the shortest Time possible is O(n) for this problem, since divide and conquer would give O(n log n).

Your algorithm would be the almost correct one since checking for 6 patterns is still consider constant order, you would only have to change your line reader and evaluate more like the pseudo code from Ireneusz Kordal.

Performance means the fastest and right one.


To be completely honest this is a limited example of maximum or minimum sub sequence where S is only of w1, w2, w3 and a that represents any other word.
reply
    Bookmark Topic Watch Topic
  • New Topic