Winston Gutkowski wrote:Actually, you've stored a bit more than that, and I wonder if some of the things that you store aren't redundant.
Winston Gutkowski wrote:Also: DON'T make instance variables public. EVER.
Winston Gutkowski wrote:Well, Java has a very useful structure called an IdentityHashMap, that allows you to quickly retrieve values for specific objects. So, you could either set up one for each value, or you could create a NodeInfo class that contains all "non-essential" information about a TrieNode, and store that in an IdentityHashMap<TrieNode, NodeInfo>.
Then, whenever you encounter a TrieNode and you want to know its "count", you simply look it up in the Map.
Some of these values (eg, 'count') could also probably be calculated while the Trie is being built; although I have to admit that it "smells" like a value that might be better calculated via a method as required - but TBH I don't know.
Piet Souris wrote:
Not sure about your last reply. The way I look at it is what I already wrote. Given that I have an 'a' at level 1, then according to me there are two possibilities: either we're dealing with the word 'a' or we are dealing with the word 'ab'. Two possibilities therefore, with an entropy of 1. Now, in order to get that entropy, we must calculate the p for every child, taking that childs 'count' into consideration.
And that's why I talked about a special kind of null-character, so that the formula would only have to look at the children, and not also to inspect the 'isEnd' variable.
Winston Gutkowski wrote:
In fact, they include an even "simpler" structure:
where, instead of a List, each Node points to its "first" child; but that child may have "siblings".
Piet Souris wrote:Anyway, I've written my own version, according to what I understand of it.
I think it is about Alicia's version. but I added my own 'TrieEndOfWord class, to see if it works
Piet Souris wrote:Given an ‘a’ at level 1 (so in fact I have a word that starts with an ‘a’), and I ask what possible outcomes I have one level down, I see the following possibilities:
1) The ‘empty’ character, meaning end-of-word
2) ‘b’
Both with probability 0.5.
Piet Souris wrote:
If I define my own notation, I’d say that the ‘entropy’, given an ‘a’ at level 1, is:
- { 0.5 * log_2(0.5) + 0.5 * log_2(0.5) } = - log_2(0.5) = - (-1) = 1
So H(a, 1, 1) (is entropy, given an ‘a’ at level 1 and going 1 level deep) = 1.
Joel McNary wrote:I would suggest using Integer as the type for your list. Integers in Java are 32 bits, and you need to handle m <= 16, so Integer will be more than enough. If you want, you could use Short (which has a length of 16 bits), but if you start debugging you could see some unexpected display results and get confused. Short is a type really only used in memory-limited applications, though, and so isn't commonly used in classroom assignments. Other than that, your next steps look correct to me.
Joel McNary wrote:1). I would not use Byte as the keys in my maps, because that will not handle your larger m-values (Bytes handle a maximum of m = 8, and even then you might see some unexpected results if you print out the Map, since bytes in Java are signed!)
Joel McNary wrote:2). You will need another method to transform your byte[] input to an array (or List) of something else. For example, an input of byte[] {114} and m = 2 should give you back an array of two-bit numbers ({1, 3, 0, 2}). I leave it as an exercise to you to figure out why 114 would become those four numbers, but I gave you the answer already in this post. What should an input of byte[]{114} and m = 4 return? If you can figure that out, you are most of the way to writing the transformation function.
Joel McNary wrote:3). Do you have to handle cases like m = 3? That can be interesting, because there is no guarantee that a file will have an even multiple of 3 bits. What should be done in those cases? (What would byte[]{114} and m=3 return?
Joel McNary wrote:But before you start those changes, can you get it working front-to-back for m = 8? What entropy value does it give you for that text input?