Overview
Machine learning studies the question “how can we build computer programs that automatically improve their performance through experience?” This includes learning to perform many types of tasks based on many types of experience. For example, it includes robots learning to better navigate based on experience gained by roaming their environments, medical decision aids that learn to predict which therapies work best for which diseases based on data mining of historical health records, and speech recognition systems that learn to better understand your speech based on experience listening to you.
This course is designed to give PhD students a thorough grounding in the methods, theory, mathematics and algorithms needed to do research and applications in machine learning. The topics of the course draw from machine learning, classical statistics, data mining, Bayesian statistics and information theory. Students entering the class with a preexisting working knowledge of probability, statistics and algorithms will be at an advantage, but the class has been designed so that anyone with a strong numerate background can catch up and fully participate.
Prerequisites
 Basic probability and statistics are a plus.
 Basic linear algebra (matrices, vectors, eigenvalues) is a plus. Knowing functional analysis would be great but not required.
 Ability to write code beyond ‘Hello World’, preferably in a language other than Matlab or R.
 Basic knowledge of optimization. Having attended a convex optimization class would be great but the recitations will cover this.
 You should have no trouble answering the questions on the self-evaluation handed out for the 10-601 course.
Resources
For videos of a specific class, go to the individual lectures in the schedule below. This is also where you’ll find pointers to further reading material.
 Lecture slides in Keynote 1 2 5 6 7 and PDF 1 2 3 (annotated) 4 5 6 7 13
 Annotated Videos 2 3 4 5 9 10 11 13 18
 Problems 1 2
 Solutions HW1.Q1 HW1.Q3 HW2.Q12
Instructor
Geoffrey J. Gordon and Alex Smola
Teaching Assistants
Carlton Downey, Ahmed Hefny, Dougal Sutherland, Leila Wehbe, and Jing Xiang
Unit 1. Introduction
 Machine Learning Problems
 Classification, Regression, Annotation
 Forecasting
 Novelty detection
 Data
 Labeled, unlabeled
 Semi-supervised, transductive, responsive environment, covariate shift
 Applications
 Optical character recognition
 Bioinformatics
 Computational advertising
 Self-driving cars
 Network security
Unit 2: Basic Tools
 Linear regression
 Optimization problem
 Examples
 Overfitting
 Parzen windows
 Basic idea (smoothing over empirical average)
 Kernels
 Model selection
 Overfitting and underfitting
 Cross-validation and leave-one-out estimation
 Bias-variance tradeoff
 Curse of dimensionality
 Watson-Nadaraya estimators
 Regression
 Classification
 Nearest neighbor estimator
 Limit case via Parzen
 Fast lookup
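The Parzen-window idea listed above (smoothing over the empirical average) fits in a few lines of code. This is a minimal sketch with a Gaussian kernel; the bandwidth h = 0.5 and the toy data are illustrative choices, not values from the course.

```python
import math

def parzen_density(x, samples, h=0.5):
    """Parzen-window density estimate: the average of Gaussian
    kernels of bandwidth h centered at each observed sample."""
    norm = 1.0 / (h * math.sqrt(2 * math.pi))
    return sum(norm * math.exp(-0.5 * ((x - xi) / h) ** 2)
               for xi in samples) / len(samples)

# Toy sample: most mass near 0, one outlier at 2.
data = [0.0, 0.1, -0.1, 0.05, 2.0]
```

Evaluating the estimate near the bulk of the data gives a much larger value than far from it, and shrinking h trades smoothness for fidelity to the sample, which is exactly the model-selection issue (overfitting vs. underfitting) in the list above.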
Slides available in PDF.
Unit 3: Naive Bayes
 Bayes Rule
 Multiple testing
 Discrete attributes
 Continuous random variables
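For the discrete-attribute case, Bayes rule combined with the naive independence assumption gives a very short classifier. The sketch below uses add-one (Laplace) smoothing and binary-valued toy features; both are illustrative assumptions, not necessarily the course's exact formulation.

```python
from collections import Counter, defaultdict

def train_nb(X, y):
    """Estimate class counts and per-feature value counts from
    discrete training data (X: tuples of values, y: labels)."""
    classes = Counter(y)
    counts = defaultdict(Counter)  # (feature index, class) -> value counts
    for xs, c in zip(X, y):
        for j, v in enumerate(xs):
            counts[(j, c)][v] += 1
    return classes, counts

def predict_nb(xs, classes, counts):
    """Pick the class maximizing P(c) * prod_j P(x_j | c),
    with add-one smoothing (binary feature values assumed)."""
    n = sum(classes.values())
    best, best_p = None, -1.0
    for c, nc in classes.items():
        p = nc / n
        for j, v in enumerate(xs):
            p *= (counts[(j, c)][v] + 1) / (nc + 2)
        if p > best_p:
            best, best_p = c, p
    return best

# Toy data: features = (tall?, heavy?), two classes.
X = [(1, 1), (1, 0), (0, 0), (0, 1), (0, 0)]
y = ["a", "a", "b", "b", "b"]
```

Smoothing matters because a single unseen attribute value would otherwise zero out an entire class posterior; for continuous attributes one would instead plug in a density estimate such as a per-class Gaussian.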
Slides available in 3a and 3b. Annotated versions are 3a and 3b.
Unit 4: Perceptron
 Application – Hebbian learning
 Perceptron
 Algorithm
 Convergence proof
 Properties
 Kernel trick
 Basic idea
 Kernel Perceptron
 Kernel expansion
 Kernel examples
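The basic (non-kernelized) perceptron algorithm from the list above can be sketched as follows; the toy AND-style dataset and the epoch cap are illustrative choices.

```python
def perceptron(X, y, epochs=10):
    """Perceptron: on every mistake, add y_i * x_i to the weights.
    Converges in finitely many mistakes if the data are separable."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for xs, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xs)) + b
            if yi * score <= 0:                       # misclassified
                w = [wj + yi * xj for wj, xj in zip(w, xs)]
                b += yi
                mistakes += 1
        if mistakes == 0:                             # converged
            break
    return w, b

# Linearly separable toy data (logical AND), labels in {-1, +1}.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [-1, -1, -1, 1]
```

Since the update only ever adds multiples of training points, the final weights are a linear combination of the x_i; replacing every inner product with a kernel evaluation gives the kernel perceptron and its kernel expansion listed above.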
Slides available in PDF and Keynote. If you want to extract the equations from the slides you can do so by using LaTeXit, simply by dragging the equation images into it.
Unit 5: Optimization
 Unconstrained problems
 Gradient descent
 Newton’s method
 Convexity
 Properties
 Lagrange function
 Wolfe dual
 Batch methods
 Distributed subgradient
 Bundle methods
 Online methods
 Unconstrained subgradient
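Gradient descent for unconstrained problems, the first item above, can be sketched in one loop; the step size, iteration count, and quadratic objective are illustrative choices.

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Plain gradient descent: repeatedly step against the gradient."""
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x_star = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
```

On this convex quadratic each step shrinks the error by a constant factor (here 0.8), so the iterates converge linearly to the minimizer x = 3; Newton's method would use curvature to jump there in a single step.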
Slides in Keynote and PDF are here. If you want to extract the equations from the slides you can do so by using LaTeXit, simply by dragging the equation images into it.
Unit 6: Duality (Support Vector Classification)
Slides available: Lectures 9 (annotated) and 10 (annotated).
Unit 7: SVM (Support Vector Classification)
 Application – Optical Character Recognition
 Support Vector Machines
 Large Margin Classification
 Optimization Problem
 Dual Problem
 Properties
 Support Vectors
 Support Vector expansion
 Soft Margin Classifier
 Noise tolerance
 Optimization problem
 Dual problem
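A soft margin classifier of the kind listed above can also be trained in the primal by (sub)gradient descent on the regularized hinge loss, tying this unit back to Unit 5; the lectures derive the dual instead, so this sketch, with illustrative toy data and hyperparameters, is only one of several routes.

```python
def train_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Subgradient descent on the soft-margin objective
    (lam/2)||w||^2 + (1/n) sum_i max(0, 1 - y_i (w.x_i + b))."""
    d, n = len(X[0]), len(X)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [lam * wj for wj in w], 0.0   # regularizer gradient
        for xs, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xs)) + b)
            if margin < 1:                      # point violates the margin
                for j in range(d):
                    gw[j] -= yi * xs[j] / n
                gb -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b

# Separable toy data, labels in {-1, +1}.
X = [(2.0, 0.0), (0.0, 2.0), (-2.0, 0.0), (0.0, -2.0)]
y = [1, 1, -1, -1]
```

Only points with margin below 1 contribute to the update; in the dual formulation these are precisely the support vectors that appear in the support vector expansion.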
Slides available in PDF.
Unit 8: Kernel Methods and Regularization
Slides available in PDF.
Unit 9: Tail Bounds & Averages
Slides available in PDF.
Student Presentations [Playlist]
(Video Source: Alex Smola. Note Source: alex.smola.org)