Description

This course will provide an introduction to the theory of statistical learning and practical machine learning algorithms. We will study both practical algorithms for statistical inference and theoretical aspects of how to reason about and work with probabilistic models. We will consider a variety of applications, including classification, prediction, regression, clustering, modeling, and data exploration/visualization.

Download the syllabus.

Instructor

Mark Davenport
Email: mdav (at) gatech (dot) edu
Office: Coda, S1117
Phone: (404) 894-2881
Office Hours: TBD.

Lectures

Lectures are Tuesdays and Thursdays from 9:30-10:45am and will be held in Clough Commons, Room 152. Lectures will also be available online live (via the link provided in Canvas) and after a brief delay will be made available in the Canvas Media Gallery.

Prerequisites

Throughout this course we will take a statistical perspective, which will require familiarity with basic concepts in probability (e.g., random variables, expectation, independence, joint distributions, conditional distributions, Bayes' rule, and the multivariate normal distribution). We will also be using the language of linear algebra to describe the algorithms and carry out any analysis, so you should be familiar with concepts such as norms, inner products, orthogonality, linear independence, eigenvalues/vectors, eigenvalue decompositions, etc., as well as the basics of multivariable calculus such as partial derivatives, gradients, and the chain rule. If you have had courses on these topics as an undergraduate (or more recently), you should be able to fill in any gaps in your understanding as the semester progresses. Finally, many of the homework assignments and the course projects will require the use of Python. Prior experience with Python is not necessary, but I am assuming familiarity with the basics of scientific programming (e.g., experience with C, MATLAB, or some other programming language).
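As a rough gauge of the level of scientific programming assumed, consider the short Python sketch below (it uses NumPy purely for illustration; the specific packages used in the course are not fixed by this description). It draws noisy data from a linear model and recovers the weights with an ordinary least-squares solve. If code along these lines looks approachable, you have the programming background I am assuming.

    # Minimal scientific-Python sketch: fit a linear model by least squares.
    # NumPy is used here only as an illustrative example of the kind of
    # numerical tools the assignments will involve.
    import numpy as np

    rng = np.random.default_rng(0)

    n, d = 200, 3                      # number of samples and features
    X = rng.normal(size=(n, d))        # design matrix
    w_true = np.array([1.0, -2.0, 0.5])
    y = X @ w_true + 0.1 * rng.normal(size=n)   # noisy linear observations

    # Recover the weights by solving the least-squares problem min_w ||Xw - y||^2
    w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    print("estimated weights:", w_hat)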

Text

There is no required text for this course. Many books, journal papers, and other online resources have been influential in the development of the course; these will be listed in the resources section of the course web site. The bulk of the material will be provided in my lecture notes.

Topics Covered

  • Theory of generalization
    • Introduction to classification
    • Concentration inequalities and generalization bounds
    • The Bayes classifier and the likelihood ratio test
    • Nearest neighbor classification and consistency
    • Vapnik-Chervonenkis (VC) dimension
    • VC generalization bounds
    • Bias-variance tradeoff
    • Overfitting
  • Supervised learning
    • Linear classifiers
      • plug-in classifiers (linear discriminant analysis, logistic regression, naive Bayes)
      • the perceptron algorithm and single-layer neural networks
      • maximum margin principle, separating hyperplanes, and support vector machines (SVMs)
    • From linear to nonlinear: feature maps and the "kernel trick"
    • Kernel-based SVMs
    • Regression
      • least-squares
      • regularization
      • the LASSO
      • kernel ridge regression
    • Model selection, error estimation, and validation
  • Unsupervised learning
    • Feature selection
    • Dimensionality reduction
      • principal component analysis (PCA)
      • multidimensional scaling (MDS)
      • manifold learning
    • Latent variables and structured matrix factorization
      • non-negative matrix factorization
      • sparse PCA
      • dictionary learning
      • latent semantic indexing, topic modeling
      • matrix completion
    • Density estimation
    • Clustering
      • k-means
      • Gaussian mixture models and expectation-maximization
      • spectral clustering
  • Advanced supervised learning
    • Decision trees
    • Ensemble methods
    • Random forests
    • Multi-layer neural networks and backpropagation
    • Deep learning
  • Other topics (as time and interest permit)
    • Graphical models
    • Reinforcement learning