I was looking at a Puzzle from a website called BrainBashers this morning. Here's the Q
During the recent BrainBashers cipher convention, a binary code contest took place. The contest consisted of a binary code transmission where the spaces between the letters were missing and there was no punctuation. Each letter of the alphabet was translated into its binary equivalent based on its position in the alphabet, a=1, b=10, c=11, d=100, e=101, f=110, g=111, h=1000, i=1001, j=1010, k=1011, l=1100, m=1101, n=1110, o=1111, p=10000, q=10001, r=10010, s=10011, t=10100, u=10101, v=10110, w=10111, x=11000, y=11001, z=11010. Can you find 10 countries?
Having a list of possible countries at hand, my solution would be to encode all country names into the binary code and then look for matches in the given encoded strings. But it would probably count as cheating.
Joined: Feb 18, 2005
Martin Vajsar wrote:Having a list of possible countries at hand, my solution would be to encode all country names into the binary code and then look for matches in the given encoded strings. But it would probably count as cheating.
Good one. Ok, let's pretend that there isn't a List<> but rather a bool method that takes a string and returns whether or not the string is a country name. Yeah, that's it.
You bigheaded English speaking people!
Why would you think these binary strings represent countrynames written in English?
Would you find "Sverige" or "Nederland"?
Nevertheless: disappointing puzzle. Wrote myself a dream of a recursive solution, only to find the last string alone having 680+
translations (possibly not all unique; I used an ArrayList in stead of a Hashset). I need a list of countries to see if there's a country in it, but in what language?
I found it surprising that the build times differ, and to find that HashSet and TreeSet hardly differ in speed.
But maybe that's only by the fact that there are only 2 binary strings that give rise to 300.000+ words.
Indeed, you do need a list of country names to solve this puzzle, but having such a list at your disposal: I don't think you can beat Martin's solution.
(well, assuming Mike's default language, and that the possibly non native English speaking author of the puzzle got every spelling correct)
Nevertheless, it reminds me of how fast computers are, nowadays: generating nearly 700.000 words, in about two seconds,
at least on my 7 year old E6400 processor. Truly amazing!
Joined: Feb 18, 2005
The next step is to try to identify heuristics that might make the search terminate quicker.
For instance, if we assume that the letter encodings are tried in order (A, B, C, ...) the last string (1101111111101111111110010011) would first find "ABAAAAAAABAAAAAAAADDAA" as a possible country name. Next would be "ABAAAAAAABAAAAAAAADDC". Would it be any more efficient to loop through the letters in reverse order (Z, Y, X, ...). In that case, the first word to try would be "MONOODS". One last option might be to use letter frequency, either for all of the problem space language (English or otherwise) or for some possibly non-authoritative list of countries (E, T, A, O, I, N, S, H, R, ...).
(My code above loops through letterCodes.keys(), so it's essentially using a random order of the letters.)
For any of those strategies, we could identify a country name that makes it inefficient. If you start search with "AAAAA", then it will take a while to find ZIMBABWE. Conversely, if you start with "Z" and work backwards, then it will take a while to find ALBANIA. The trick is to find the strategy that is most likely to find the most country names the fastest.
Any thoughts on either this or any other optimization?
Joined: Mar 05, 2008
Well, I think you guys were too quick to add rules precluding Martin's solution, which doesn't seem like cheating at all to me. But I think there may be faster solutions possible, given a complete list of possible country names. I would be inclined to store the binary-coded names in a binary trie, so that each digit is a node, and countries with the same starting letters could share the same nodes, until they diverge. This would allow you to search a pattern for matches against all possible country names, simultaneously, rather than searching for each country name separately. And it allows you to terminate a search much earlier, if the sequence of chars so far cannot match any possible country name.
Ah, after a bit of research, what I'm thinking of seems to be similar to the Aho-Corasick string matching algorithm. Modified for a binary alphabet, which should allow a few nice optimizations.
Get a list of all the country names in English , translate it into that binary encoding , then parse the input checking for any occurrence in the list.
This is probably the best architecture for a solution because, no matter what other solution you can come up with:
- at some point you need to compare a string (chars in the input) with the name of a country to see if they match , thus you need to have a list of all countries
- encoding a char into binary has a lower or equal cost than encoding a binary in a char
- whatever algorithm you chose for a solution , it will iterate on the input so it has at least the complexity O(n) , n = number of digits in the input
- this algorithm has O(n)
Anyhow , the real question then is how given an input String find the first occurrence of a certain sub string in it.
This is where you need to optimize. Looks like O(n) to me , but how fast can you make it go.