Part I: Learning
- Our general algorithm for inferring decision trees cannot handle examples with missing attribute values. In real life this happens all the time: the datum was never collected, it contained an error and was removed, or any number of other real-world reasons. Taking the general algorithm as a baseline, let's find a way to modify it to handle this case.
- First, we need to classify examples that are missing attribute values. Suppose an example x has a missing value for attribute A, and the decision tree tests A at a node that x reaches. One way to handle this case is to pretend that the example has all possible values for A, but to weight each value by its frequency among the examples that reach that node. The classification algorithm should then follow all branches at any node whose tested value is missing, multiplying the weights along each path. Write a modified classification algorithm (in pseudocode) for decision trees with this behavior.
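For reference, one way the weighted-branch classifier might look in Python (the `Leaf`/`Node` representation here is hypothetical; only the weighting scheme comes from the exercise):

```python
from collections import namedtuple

# Hypothetical tree representation: a Leaf stores a class label; a Node tests
# one attribute and maps each attribute value to a subtree. value_weights is
# the frequency of each value among training examples that reached the node.
Leaf = namedtuple("Leaf", ["label"])
Node = namedtuple("Node", ["attribute", "branches", "value_weights"])

def classify(tree, example, weight=1.0, scores=None):
    """Return {label: total weight}, following every branch when the
    example's value for the tested attribute is missing."""
    if scores is None:
        scores = {}
    if isinstance(tree, Leaf):
        scores[tree.label] = scores.get(tree.label, 0.0) + weight
        return scores
    value = example.get(tree.attribute)  # None means the value is missing
    if value is not None:
        classify(tree.branches[value], example, weight, scores)
    else:
        # Follow all branches, multiplying in each value's frequency.
        for v, subtree in tree.branches.items():
            classify(subtree, example, weight * tree.value_weights[v], scores)
    return scores
```

The predicted class is then simply the label with the largest total weight.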
- Modify the information-gain calculation so that, for any collection of examples C at a given node during learning, examples with missing values for any of the remaining attributes are assigned fractional default values according to the frequencies of those values in C.
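A sketch of the fractional-count version in Python (representing examples as dicts with a `"label"` key is an assumption, not part of the exercise): an example missing the attribute contributes fractionally to every child, in proportion to each value's frequency in C.

```python
import math
from collections import Counter

def entropy(label_weights):
    """Entropy of a {label: weight} distribution (weights may be fractional)."""
    total = sum(label_weights.values())
    return -sum((w / total) * math.log2(w / total)
                for w in label_weights.values() if w > 0)

def information_gain(examples, attribute, label_key="label"):
    # Frequencies of the attribute's observed (non-missing) values in C.
    observed = Counter(ex[attribute] for ex in examples
                       if ex.get(attribute) is not None)
    total = sum(observed.values())
    parent = Counter(ex[label_key] for ex in examples)
    children = {v: Counter() for v in observed}
    for ex in examples:
        v = ex.get(attribute)
        if v is not None:
            children[v][ex[label_key]] += 1.0
        else:
            # Missing: split this example fractionally across all values.
            for v2, count in observed.items():
                children[v2][ex[label_key]] += count / total
    remainder = sum(sum(c.values()) / len(examples) * entropy(c)
                    for c in children.values())
    return entropy(parent) - remainder
```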
- (Warmup for tomorrow) Consider the problem of separating N data points, labeled positive and negative, in d dimensions using a linear separator of dimension d-1. Obviously, for N=2 distinct points on a line (d=1), a single point (a separator of dimension d-1 = 0) always separates them, wherever they lie.
- Show that it can always be done for N=3 points on a plane of dimension d=2, unless they are collinear.
- Show that it cannot always be done for N=4 points on a plane of dimension d=2.
- Show that it can always be done for N=4 points in a space of dimension d=3, unless they are coplanar.
- Show that it cannot always be done for N=5 points in a space of dimension d=3.
- (Super Challenge) Show that N points in general position, but not N+1 points, are always linearly separable in a space of dimension N-1, i.e., by an (N-2)-dimensional hyperplane.
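These separability questions can also be explored empirically. Here is a small sketch using the classic perceptron (my own addition, not part of the exercises): a mistake-free pass through the data exhibits a separator, so for a non-separable labeling such as XOR the algorithm makes at least one mistake on every pass and never halts with an answer.

```python
def perceptron(points, labels, epochs=1000):
    """Seek (w, b) with label * (w . x + b) > 0 for every point.
    Returns (w, b) after a mistake-free pass, else None after `epochs` passes.
    A mistake-free pass certifies separability; for non-separable data
    every pass necessarily makes at least one mistake."""
    w, b = [0.0] * len(points[0]), 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(points, labels):
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                # Standard perceptron update on a misclassified point.
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:
            return w, b
    return None
```

Three non-collinear points in the plane converge quickly under any labeling, while the four-point XOR labeling (N=4, d=2) never does, matching the claims above.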
Part II: FaceThingy, continued
For the next part of FaceThingy, implement the most basic decision tree learner in the following steps.
- Feature selection. To train with Haar-like features, you will need to build some Haar-like features! Go to the original Viola-Jones paper (http://research.microsoft.com/en-us/um/people/viola/Pubs/Detect/violaJones_CVPR2001.pdf) and refer to Figure 3. Implement those four features, plus any others you think would be good predictors of faces. Create at least ten features.
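As a starting point, here is a sketch of a two-rectangle feature computed via an integral image, as in the Viola-Jones paper (the function names and the list-of-lists image format are assumptions of this sketch):

```python
def integral_image(img):
    """Summed-area table: ii[y][x] = sum of img over rows 0..y, cols 0..x."""
    h, w = len(img), len(img[0])
    ii = [[0] * w for _ in range(h)]
    for y in range(h):
        row = 0
        for x in range(w):
            row += img[y][x]
            ii[y][x] = row + (ii[y - 1][x] if y else 0)
    return ii

def rect_sum(ii, x, y, w, h):
    """Pixel sum over the rectangle with top-left (x, y) and size w by h,
    using four lookups in the integral image."""
    a = ii[y + h - 1][x + w - 1]
    b = ii[y - 1][x + w - 1] if y else 0
    c = ii[y + h - 1][x - 1] if x else 0
    d = ii[y - 1][x - 1] if x and y else 0
    return a - b - c + d

def two_rect_horizontal(ii, x, y, w, h):
    """Two-rectangle Haar-like feature: left half minus right half."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)
```

The other feature types from Figure 3 (vertical two-rectangle, three-rectangle, four-rectangle) follow the same pattern of signed `rect_sum` calls.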
- Write a method that, given a set of labeled images, uses the information-gain function to choose the single decision node that best classifies the data. You may use the DecisionTree class, or make your own.
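A minimal sketch of such a selector, assuming each feature is a callable mapping an image to a number and each candidate node is a threshold test of the form feature(image) >= threshold (the threshold search is an assumption; the exercise only specifies using information gain):

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a list of class labels."""
    if not labels:
        return 0.0
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def best_stump(examples, features):
    """examples: list of (image, label); features: callables image -> number.
    Tries each feature with thresholds midway between consecutive observed
    values and keeps the (feature, threshold) pair with the highest gain."""
    labels = [y for _, y in examples]
    base = entropy(labels)
    best, best_gain = None, -1.0
    for f in features:
        values = sorted({f(img) for img, _ in examples})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for img, y in examples if f(img) < t]
            right = [y for img, y in examples if f(img) >= t]
            remainder = (len(left) * entropy(left)
                         + len(right) * entropy(right)) / len(examples)
            if base - remainder > best_gain:
                best_gain, best = base - remainder, (f, t)
    return best, best_gain
```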
- You should now have a method that finds a single node. Write a recursive method that uses it to build the entire tree. Your algorithm should mirror the general algorithm presented in class.
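One way the recursive learner might look, as a self-contained sketch over discrete-valued attributes stored in dicts (the representation is an assumption; adapt it to your own feature/threshold nodes):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def build_tree(examples, attributes, label_key="label"):
    """examples: list of dicts; attributes: names of discrete attributes.
    A leaf is a bare label; an internal node is (attribute, {value: subtree})."""
    labels = [ex[label_key] for ex in examples]
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]  # pure or majority leaf

    def gain(a):
        remainder = 0.0
        for v in {ex[a] for ex in examples}:
            subset = [ex[label_key] for ex in examples if ex[a] == v]
            remainder += len(subset) / len(examples) * entropy(subset)
        return entropy(labels) - remainder

    best = max(attributes, key=gain)
    rest = [a for a in attributes if a != best]
    return (best, {v: build_tree([ex for ex in examples if ex[best] == v], rest)
                   for v in {ex[best] for ex in examples}})

def predict(tree, example):
    """Walk the tree until a leaf label is reached."""
    while isinstance(tree, tuple):
        attribute, branches = tree
        tree = branches[example[attribute]]
    return tree
```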