This webpage contains the online material for the kernel method practical.
To get started, run
exercise1
and have a look at the output.
Question 1: How does the error depend on the number of dimensions D? Does it go up or down? Is that expected?
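To build intuition for what exercise1 measures, here is an illustrative sketch (not the actual exercise code) of approximating a Gaussian kernel with D random Fourier features; sigma, D, and the variable names are assumptions:

```matlab
% Sketch: approximate k(x,z) = exp(-||x-z||^2 / (2*sigma^2))
% with D random Fourier features (Rahimi & Recht style).
sigma = 1; D = 500; d = 2;
x = randn(d,1); z = randn(d,1);
W = randn(D,d) / sigma;            % random frequencies ~ N(0, I/sigma^2)
b = 2*pi*rand(D,1);                % random phases in [0, 2*pi]
phi = @(u) sqrt(2/D) * cos(W*u + b);
k_exact  = exp(-norm(x-z)^2 / (2*sigma^2));
k_approx = phi(x)' * phi(z);       % inner product of feature maps
fprintf('exact %.4f, approx %.4f\n', k_exact, k_approx);
```

Re-running the last lines with larger or smaller D shows how the approximation error behaves as the number of features changes.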
OK, apparently the approximation is working. Let us see it in action now. We classify a simple circle: points with radius smaller than one are negative, points with radius bigger than one are positive. Start the first run of this classification problem:
exercise2
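For reference, a dataset of this kind can be generated along the following lines (an illustrative sketch; exercise2 may construct it differently):

```matlab
% Sketch: points in the plane, labelled by whether they lie
% inside (-1) or outside (+1) the unit circle.
n = 200;
X = 3*rand(2,n) - 1.5;             % uniform points in [-1.5, 1.5]^2
y = sign(sum(X.^2,1) - 1);         % radius < 1 -> -1, radius > 1 -> +1
```

No linear decision boundary separates the two classes here, which is exactly what makes this toy problem interesting for kernel methods.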
Question 2: Make sure you understand the plots and the text output. Do the performance numbers make sense to you? Which classifier is best? How does the linear SVM perform? Why?
Question 3: The linear SVM does not perform well. Maybe an embedding into a higher-dimensional space in the following way works? (Do not type this in; read the rest of the question first.)
% no need to type this into the shell, read on
w = randn(D,size(X,1)); X = w*X;
What do you expect? Think first! Then set the variable do_embed to one and re-run exercise2; this will apply the above transformation:
do_embed = 1; exercise2
Does this match your expectations?
Question 4: Change the number of training points to 200 and re-run exercise2:
num_train=200; exercise2
What happens to the runtimes of the SVM trainings? Why do you think that is? Is there something special about the dataset?
Question 5: Shogun supports specialized linear SVM solvers, and those are much faster. Let us switch to a different solver:
linear_solver=1; exercise2
Is the runtime better now?
Question 6: Appreciate the speed gain by changing the number of training examples to something very high, say 100k, and re-run:
num_train=100000; exercise2
Which method is the best in terms of speed/accuracy?
Question 7: Exit Matlab:
exit
Bonus Question (if you have time before the next part): In case you know what a neural network is, do you see a connection? How could these random Fourier features also be interpreted? Do you see where backpropagation could be used? Is the overall system still convex?
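As a hint for thinking about this, here is a schematic sketch (variable names are assumptions, not code from the exercises) of how the random-feature pipeline can be written as a one-hidden-layer architecture:

```matlab
% Sketch: random Fourier features viewed as a one-hidden-layer network.
% The first layer (W, b) is random and frozen; only the linear output
% layer alpha is fitted, e.g. by regularized least squares.
hidden = @(Xin) sqrt(2/D) * cos(W*Xin + b);   % fixed random hidden layer
H = hidden(X);                                % D x n hidden activations
alpha = (H*H' + lambda*eye(D)) \ (H*y');      % convex fit of output weights
yhat = sign(alpha' * hidden(Xtest));          % prediction
```

Ask yourself what changes if W and b are also trained by backpropagation instead of being fixed at random.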
Setup: If you are within a Matlab session, exit Matlab first. To get started for this part, run the script
./MSchallenge.sh
This will open a Matlab session and ask you to provide a name for your team. Next, it will load the dataset, train a baseline linear least-squares model, and save your first submission.
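For orientation, a linear least-squares baseline of this kind can be fitted in a few lines; the sketch below is illustrative only, and the variable names (Xtr, ytr, Xte) are assumptions rather than the actual names used by MSchallenge.sh:

```matlab
% Sketch: linear least-squares baseline via the normal equations.
% Xtr is d x n (one column per example), ytr is n x 1.
w = (Xtr * Xtr') \ (Xtr * ytr);   % solve (X X') w = X y
yhat = w' * Xte;                  % predictions on test inputs
```

Beating this baseline is the point of the challenge: a linear model cannot capture nonlinear structure that KRR with a suitable kernel can.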
Challenge: Now, you should write your own script to improve over the baseline and, possibly, win the competition! You can use the function krr.m to train a Kernel Ridge Regression model for several values of the regularization parameter. Type
help krr
for a description of the interface. You will need to design a proper model selection mechanism for KRR.
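One common model selection mechanism is hold-out validation over a grid of regularization values. The sketch below is hypothetical: the signature of krr and the existence of a separate prediction step are assumptions, so check `help krr` for the real interface before adapting it:

```matlab
% Sketch: hold-out model selection for the KRR regularizer lambda.
% Assumed (hypothetical) interface: model = krr(X, y, lambda).
lambdas = logspace(-6, 2, 9);
n = size(Xtr, 2);
idx  = randperm(n);  nval = round(n/5);
val  = idx(1:nval);  trn  = idx(nval+1:end);     % 80/20 split
best_err = inf;
for lambda = lambdas
    model = krr(Xtr(:,trn), ytr(trn), lambda);   % assumed signature
    yhat  = predict(model, Xtr(:,val));          % assumed helper
    err   = sqrt(mean((yhat - ytr(val)).^2));    % validation RMSE
    if err < best_err, best_err = err; best_lambda = lambda; end
end
```

Cross-validation (averaging over several splits) is a more robust variant of the same idea, at the cost of extra training runs.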
Editing: If for some reason you cannot use the Matlab GUI, you can resort to editors such as vim, joe, pico, or emacs. Alternatively, you can transfer the files to your local computer and work with a local text editor. You will need to transfer the files back to the MLSS machine in order to be evaluated. If you have a Linux/macOS system, you can use the command
scp
to transfer the files. If you use Windows, please download and install WinSCP.
Hints: If you do not know where to start, edit one of the two incomplete example scripts, rrval.m or krrrbfGCV.m, and try to address the TODOs in the code. Make sure to save the results using the function writeoutput.m.
LeaderBoard: We will update the LeaderBoard below once in a while (obviously, not too often!) by reading your output files and computing your performance score. Keep an eye on it to see how well you are doing. Again, note that this is deliberately not a real-time leaderboard, to prevent you from directly optimizing the performance score! Remember that FIT = 1 - NormalizedRMSE (the higher, the better).