Hello everyone. A friend of mine, or rather an acquaintance, gave me one of his old university projects, which I'm working through for practice, but I'm stuck.
A training set is provided, specifically this one (http://i57.tinypic.com/6zrp5v.jpg), and the purpose of the program is to print out the rules and other information.
A sample output of this program is here (http://i61.tinypic.com/32zjddc.jpg).
Now we come to my problem.
I think the problem is in my method for determining whether a node is a leaf. The method in question is this:
The assignment gave a hint: a node is a leaf if the number of training examples that fall within the current partition is less than numberOfExamplesPerLeaf, or if all the training examples that fall within the current partition belong to the same class.
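To check that I am reading the hint correctly, here is a minimal sketch in Java of the rule on its own, outside my project. Only numberOfExamplesPerLeaf comes from the assignment; the class name, the classValues array, the begin/end indices and the sample labels are all made up for illustration, and this is not the project's actual method.

```java
public class LeafRuleSketch {

    // Hypothetical helper: classValues holds the class label (last column) of
    // every training example, in the training set's current order.
    // begin and end (both inclusive) delimit the current partition.
    static boolean isLeaf(String[] classValues, int begin, int end,
                          int numberOfExamplesPerLeaf) {
        int size = end - begin + 1;
        // Condition 1: too few examples fall within the partition.
        if (size < numberOfExamplesPerLeaf) {
            return true;
        }
        // Condition 2: every example in the partition has the same class.
        for (int i = begin + 1; i <= end; i++) {
            if (!classValues[i].equals(classValues[begin])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Made-up labels, not the real training set.
        String[] classValues = {"yes", "yes", "yes", "no", "yes"};
        System.out.println(isLeaf(classValues, 0, 2, 2)); // true: one class only
        System.out.println(isLeaf(classValues, 2, 4, 2)); // false: mixed classes
        System.out.println(isLeaf(classValues, 3, 3, 2)); // true: 1 example < 2
    }
}
```

The point of the sketch is only that the test is applied to a whole range of examples at once, not to a single example.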
Analyzing the sample output, I could see that, for example, "Overcast" is a leaf because the corresponding class values (the last column of the training set) are all the same.
The problem is that I do not understand how to examine all the examples with a given value (e.g. all the "Overcast" ones) to determine whether the node is a leaf or not.
I say this because this is how I wrote the method that analyzes the entire training set:
That is, each example becomes a single element of the array mapSplit (an array whose elements contain the value of the node, the start and end indices in the training set, and the number of children).
Obviously it would not be difficult to group examples with the same value into a single element of mapSplit (because the training set is sorted by the attribute before setSplitInfo is executed), but then it would no longer be possible to go through the individual values of the training set via mapSplit, because they would all be grouped together... I do not know if I am getting the idea across.
How would you solve this problem? I would say it is a logic problem rather than a programming one.
For completeness, I have uploaded the complete source code of the project here (http://www.filedropper.com/decisiontree) so that, if you have time, you can try to see how it works, because I do not think I was clear enough in my explanation.
There are very few files, so it will not take long to try it, since I'm not even halfway through the entire project.
Mind you, I do NOT want the solution (so I do not want a single line of code), only some advice, because I MUST get there myself.