Frank Poupard wrote:Hello, I try to implement a way to find the longuest repeted substring given a string (known as the LRS problem). I intend to do it in a very simple fashion no matter the algorithm's efficiency.

Fair enough; but, as Einstein said: Everything should be as simple as possible,

*but no simpler*.

One thing I can assure you of: You will NOT solve this problem by coding. So, my advice:

Stop.

As you may have gathered from looking at the Wikipedia pages on it, this is

__not__ a simple problem. I don't know how to solve it optimally, and I've been programming for nearly 40 years - and most of the fastest solutions seem to have been developed in the '90's.

However, there are a few things I can glean simply from the requirements:

1. For a source String of length n, the substring I'm looking for will be <= n-1 characters long if overlapping substrings are allowed, or n/2 if they aren't (see below).

2. In order to quickly accept or eliminate candidates, I'll need a fast way of storing and finding substrings - and in

Java,

`HashMap`s (←click) are often the best solution for stuff like that.

3. The number of

*possible* substrings of any length in a String of length n is a triangular number - n (for length 1) + n-1 (for length 2) + n-2 (for length 3) +.... and so on.

4. If there are no repeated

*characters* (substrings of length 1) in the string, then there can't possibly be any repeating substrings; and I

*think* this property holds true for any length of substring.

If so, we can start at x=1, and simply build maps of substrings of length x until we find a length that does

`not` contain any duplicates, at which point we know that the longest possible one was x-1.

How do we know whether there are any duplicates? Simple: keep a list of the start points of each substring we check and, if it has already been found (ie, our map already contains it as a key), ADD it to the existing list. Then, any substring in our map that has more then 1 start point may be a duplicate (see below).

The only other thing you have to decide (or you may have been told) is whether a "duplicate" can

*overlap* an existing substring or not. If not, it adds an extra step to the procedure, but it also means you only have to look for substrings up to length n/2.

Now I'm pretty sure that this

__isn't__ the fastest way to do it, but I'm also pretty sure it will work. And note that I've done it

*without writing a line of code*.

So, a lesson for the future: You don't solve problems by coding, you solve them by

*thinking*.

Hope it helps.

Winston