I am trying to read in a text from a file and then using an algorithm to find the anagrams by using the concept of summaries. Suppose that a word is a string of one or more lower case Roman letters a through z, without accents, blanks, or punctuation marks. Also suppose that the letter a corresponds to 0, that b corresponds to 1, etc., up to z that corresponds to 25. (These are not the A SCII or Unicode codes for those letters!) For each word, we make an array of 26 integers, called a summary of that word. In the summary, the element at index k tells how many times the letter corresponding to k appears in the word. For example, if we have the word present, then its summary looks like this, where the large numbers are array elements, and the small numbe
After we have read all the words, there will be many nodes in the anagram tree, each with a list of one or more words. The words in each list are anagrams of each other, because they all have the same summary. We then traverse the tree, visiting each node. If we find a node whose list has only one word, then we ignore it, because that word is an anagram of only itself. However, if we find a node whose list has two or more words, then we print those words, because they are anagrams of each other.
Anagram trees. Because of the K and R algorithm, summaries are totally ordered, so we can use them as keys in a binary search tree. We will call it an anagram tree. The anagram tree’s keys are summaries, and its values are linear singly linked lists of strings, in which each string represents a word. The anagram tree is the basis for our efficient anagram finding program.
Here is my idea. We start with an empty anagram tree. Then we read words as strings from a text file. Every time we read a word, we construct its summary. Then we search for a node in the tree that has the summary as its key, using the "K-and-R" algorithm to control the search. If we find the node, then we add the word to the node’s list (if it’s not already there). If we do not find the node, then we add a new node to the list. The new node’s key is the summary, and its value is a list that contains the word.
The exception message says the problem is in the copySummaries() method. The only things that can be null and cause the NPE there are your left and right arrays. The next stack trace message points to the add() method. So you'll probably want to check the things you're passing to the copySummaries() method and make sure they are not null.
If you used a Scanner and .useDelimiter("[^A-Za-z]"), you could eliminate a lot of the code for the things that Scanner can do that you are doing yourself.
The best ideas are the crazy ones. If you have a crazy idea and it works, it's really valuable.—Kent Beck