Part 0: Machine Learning Formulation
- For each of the following learning problems, answer these questions. Would you use regression or classification? What are good features? What could an evaluation function be? What would the optimum (maximum or minimum) of that function correspond to? Are there local minima or maxima to be aware of?
- Predicting whether a drink was made by the Coca-Cola company or PepsiCo.
- Predicting someone’s age.
- Predicting what instrument someone plays.
- Predicting what the temperature will be 2 weeks from now.
- Predicting the APACHE II score. The APACHE II score is calculated for people in the emergency room (ER) or intensive care unit (ICU) as a way to assess how severe their condition is. Higher scores mean a higher risk of death.
- Predicting the maximum safe speed to approach the road surface in front of an autonomous car.
- Predicting how to construct a sentence in response when presented with a question.
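To make "evaluation function" concrete, here is a minimal sketch of two common choices: accuracy for classification (you want the maximum) and mean squared error for regression (you want the minimum). The class and method names are made up for illustration, and these are by no means the only reasonable evaluation functions for the problems above.

```java
// Illustrative sketch only: the class and method names are hypothetical.
final class Metrics {
    private Metrics() {}

    /** Fraction of predictions that match the true labels (higher is better). */
    static double accuracy(String[] actual, String[] predicted) {
        if (actual.length != predicted.length || actual.length == 0) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
        int correct = 0;
        for (int i = 0; i < actual.length; i++) {
            if (actual[i].equals(predicted[i])) {
                correct++;
            }
        }
        return (double) correct / actual.length;
    }

    /** Average of the squared differences between truth and prediction (lower is better). */
    static double meanSquaredError(double[] actual, double[] predicted) {
        if (actual.length != predicted.length || actual.length == 0) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
        double sum = 0.0;
        for (int i = 0; i < actual.length; i++) {
            double diff = actual[i] - predicted[i];
            sum += diff * diff;
        }
        return sum / actual.length;
    }
}
```

For example, guessing "Coke" for three out of four drinks when only two of them are Coke gives an accuracy of 0.75, and predicting ages 32 and 37 for people who are 30 and 40 gives a mean squared error of 6.5.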
Part I: Playing with Data
Here, you will get used to thinking in terms of machine learning and large data sets.
First, download a piece of software called Weka, here: http://www.cs.waikato.ac.nz/ml/weka/. Weka is an extremely handy tool that implements a lot of machine learning algorithms and wraps them up in a relatively easy-to-use interface. You will be playing around in here, and it should be pretty interesting, but remember that this is somewhat of a double-edged sword: while Weka is extremely powerful, it hides the complexity of many of these algorithms. It’s hard to appreciate how cool some of these algorithms are when they’re just presented fully implemented in a list. Also, many algorithms come with lots of parameters you can fiddle with, but without a thorough knowledge of the algorithms, many of these parameters probably will not make sense.
Weka was made for people who already understand machine learning. So, go play with it, but use it as a tool for learning more, and not as just a destination.
After Weka is downloaded, download the iris data set at http://archive.ics.uci.edu/ml/machine-learning-databases/iris/. Grab the .data and .names files. This dataset is a classic in machine learning. It contains measurements of three species of Iris plants and was made famous by Sir Ronald Aylmer Fisher. The three species, setosa, versicolor, and virginica, can be differentiated by the 4 attributes that were collected: sepal length, sepal width, petal length, and petal width.
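If you want to peek at the raw data outside of Weka, each line of the .data file is a comma-separated record: the four measurements followed by the species name. Here is a rough sketch of reading one such line in Java. The IrisRow class is hypothetical (it is not part of Weka or of any starter code), and it assumes the attribute order listed above.

```java
// Illustrative sketch: IrisRow is a hypothetical helper, not part of Weka.
final class IrisRow {
    /** Sepal length, sepal width, petal length, petal width (assumed order). */
    final double[] features;
    /** Species label, e.g. "Iris-setosa". */
    final String species;

    IrisRow(double[] features, String species) {
        this.features = features;
        this.species = species;
    }

    /** Parses one comma-separated line such as "5.1,3.5,1.4,0.2,Iris-setosa". */
    static IrisRow parse(String line) {
        String[] parts = line.trim().split(",");
        if (parts.length != 5) {
            throw new IllegalArgumentException("expected 5 comma-separated fields: " + line);
        }
        double[] values = new double[4];
        for (int i = 0; i < 4; i++) {
            values[i] = Double.parseDouble(parts[i]);
        }
        return new IrisRow(values, parts[4]);
    }
}
```

Calling IrisRow.parse on each non-empty line of the file gives you a list of rows you can inspect or summarize by hand before letting Weka do the heavy lifting.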
For this problem, please keep a log of what steps you are taking. This is what you will be turning in, and it will also help you think about the problem.
Formulate this learning problem a little more formally: what are the features? What are the response variable(s)? How can you use learning on this problem?
After (and only after) you have understood what your actual problem is, try to solve it using the tools provided in Weka. Play around if you don’t understand how to use it. If you continue to struggle, though, ask one of us for help.
That was a warm-up. The data was relatively clean, and everything was pretty much in its own cluster. Now, we will try a more difficult regression problem. Download this dataset: http://archive.ics.uci.edu/ml/machine-learning-databases/abalone/
You are now going to try to predict the age of abalone using regression on the given attributes. Repeat the steps you took for the irises: formulate the problem, then determine a good model, and finally build the regression.
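To get a feel for what "building the regression" means underneath the Weka interface, here is a minimal sketch of the simplest possible case: least-squares linear regression on a single attribute. This is not what Weka does internally and it is far simpler than what the abalone data needs (which has several attributes); the class name and the toy numbers in the usage note are made up for illustration.

```java
// Illustrative sketch: one-attribute least-squares regression, y = intercept + slope * x.
final class SimpleLinearRegression {
    final double intercept;
    final double slope;

    private SimpleLinearRegression(double intercept, double slope) {
        this.intercept = intercept;
        this.slope = slope;
    }

    /** Fits the line that minimizes the sum of squared errors on the given points. */
    static SimpleLinearRegression fit(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need at least two (x, y) pairs");
        }
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double sxy = 0.0; // sum of (x - meanX) * (y - meanY)
        double sxx = 0.0; // sum of (x - meanX)^2
        for (int i = 0; i < n; i++) {
            sxy += (x[i] - meanX) * (y[i] - meanY);
            sxx += (x[i] - meanX) * (x[i] - meanX);
        }
        if (sxx == 0.0) {
            throw new IllegalArgumentException("x values must not all be equal");
        }
        double fittedSlope = sxy / sxx;
        return new SimpleLinearRegression(meanY - fittedSlope * meanX, fittedSlope);
    }

    /** Predicts y for a new x using the fitted line. */
    double predict(double x) {
        return intercept + slope * x;
    }
}
```

Fitting the toy points (1, 3), (2, 5), (3, 7), (4, 9) recovers slope 2 and intercept 1, so predict(5) returns 11. The minimum of the squared-error function is what the fit is looking for, which ties back to the questions in Part 0.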
Part II: FaceThingy
Many of you have expressed preliminary excitement over this. Yes, it’s true. You will build a face detection program, and, for those of you daring enough, facial recognition that can tell people apart.
I’m also really sorry, but there’s an update to SimpleGraphics that allows drawing raw images, which you will need for this. Download it here: SimpleGraphics
Then, download the starter project for FaceThingy: FaceThingy (Note: this is about 6.5MB)
In class, we talked about how this code generally works so far. For today, try doing these things:
(1) To get used to manipulating images, read in a folder of images, then display the first image. When a user clicks the mouse on the right half of the screen, advance forward one image. If the click is on the left half, advance back one image.
(2) Make a button. When that button is clicked, use the BufferedImage.getRGB and setRGB methods to flip the image vertically.
(3) Now, for the actual important part, go implement the apply method in HaarFeature. This will actually calculate the value of the Haar-like feature applied to the image at the specified sub-image.
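For step (1), the only real logic is deciding which image to show after a click. A minimal sketch of that decision, kept separate from any drawing code, might look like this. The ImageNavigator class is hypothetical, it does not use SimpleGraphics at all, and it assumes you want to stop at the first and last image rather than wrap around.

```java
// Illustrative sketch: ImageNavigator is a hypothetical helper, independent of SimpleGraphics.
final class ImageNavigator {
    private ImageNavigator() {}

    /**
     * Returns the index of the image to show after a click.
     * A click on the right half moves forward, a click on the left half moves back.
     * Assumption: the index is clamped at both ends instead of wrapping around.
     */
    static int nextIndex(int current, int imageCount, int clickX, int windowWidth) {
        if (imageCount <= 0) {
            throw new IllegalArgumentException("need at least one image");
        }
        // Compare 2 * clickX with the width to avoid integer-division rounding.
        int step = (2 * clickX >= windowWidth) ? 1 : -1;
        int next = current + step;
        return Math.max(0, Math.min(imageCount - 1, next));
    }
}
```

You would call nextIndex from whatever mouse callback your graphics library provides, passing in the x coordinate of the click and the current window width, and then redraw the image at the returned index.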
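For step (2), here is one possible sketch of the flip itself using only BufferedImage.getRGB and setRGB. It assumes "flip vertically" means mirroring top to bottom (the top row swaps with the bottom row), and it flips the image in place. The ImageFlip class name is made up, and hooking it up to your button is left to you.

```java
import java.awt.image.BufferedImage;

// Illustrative sketch: ImageFlip is a hypothetical helper class.
final class ImageFlip {
    private ImageFlip() {}

    /** Mirrors the image top-to-bottom in place by swapping pixel rows. */
    static void flipVertically(BufferedImage image) {
        int width = image.getWidth();
        int height = image.getHeight();
        // Only walk the top half; each row is swapped with its mirror row.
        for (int y = 0; y < height / 2; y++) {
            int mirrorY = height - 1 - y;
            for (int x = 0; x < width; x++) {
                int topPixel = image.getRGB(x, y);
                image.setRGB(x, y, image.getRGB(x, mirrorY));
                image.setRGB(x, mirrorY, topPixel);
            }
        }
    }
}
```

Swapping pixel by pixel is slow for large images, but it keeps the example close to the two methods named in the step. If the image has an odd height, the middle row stays where it is.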
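For step (3), the sketch below shows the general idea behind a Haar-like feature: add up the pixel intensities in one rectangle and subtract the sum in a neighboring rectangle, using an integral image so each rectangle sum takes only four lookups. This is not the starter code's HaarFeature class. The HaarSketch class, its method signatures, the use of a plain int[][] of grayscale values, and the sign convention (left half minus right half) are all assumptions made for illustration, so adapt the idea to whatever the apply method in the starter project actually receives.

```java
// Illustrative sketch: HaarSketch and its signatures are hypothetical, not the starter code.
final class HaarSketch {
    private HaarSketch() {}

    /**
     * Builds an integral image with one extra row and column of zeros:
     * ii[y][x] holds the sum of gray[0..y-1][0..x-1].
     */
    static long[][] integralImage(int[][] gray) {
        int height = gray.length;
        int width = gray[0].length;
        long[][] ii = new long[height + 1][width + 1];
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                ii[y + 1][x + 1] = gray[y][x] + ii[y][x + 1] + ii[y + 1][x] - ii[y][x];
            }
        }
        return ii;
    }

    /** Sum of the pixels in the rectangle with top-left corner (x, y) and size w by h. */
    static long rectSum(long[][] ii, int x, int y, int w, int h) {
        return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x];
    }

    /**
     * Two-rectangle feature over the sub-image at (x, y) with size w by h:
     * sum of the left half minus sum of the right half (assumed sign convention).
     * Assumes w is even so both halves have the same width.
     */
    static long twoRectHorizontal(long[][] ii, int x, int y, int w, int h) {
        int half = w / 2;
        return rectSum(ii, x, y, half, h) - rectSum(ii, x + half, y, half, h);
    }
}
```

For the tiny 2 by 4 image with rows {1, 2, 3, 4} and {5, 6, 7, 8}, the left half sums to 14 and the right half to 22, so the feature value over the whole image is -8. A large positive or negative value means the two halves differ a lot in brightness, which is exactly the kind of edge-like structure these features pick up on faces.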
Part III: Submit!
If there’s anything you still haven’t submitted, now’s the time to do so!