MATA6200 Topics in Mathematics of Data Science (4 cr)
Description
Content
This course covers the basics of optimization and computational linear algebra used in Data Science.
(1st part) Introduction: Classification problems and examples (imaging, shape analysis, word mover's distance, etc.). Brief review of some elements of Linear Algebra. Linear systems: least squares, linear regression, singular value decomposition/principal component analysis, Rayleigh quotients, K-means.
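For concreteness, the least-squares problem listed in this part can be stated as follows (a standard formulation, sketched here for orientation rather than taken from the course materials):

```latex
% Least squares: given A \in R^{m x n} and b \in R^m, minimize the residual
\min_{x \in \mathbb{R}^n} \; \|Ax - b\|_2^2
% When A has full column rank, the minimizer is characterized by the normal equations
A^{\top} A \, x = A^{\top} b
```

The singular value decomposition provides a numerically stable way to solve this problem even when $A$ is rank-deficient.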
(2nd part) Convergence: Local optima and global optima. Some elements of Convex Analysis. Convexity and smoothness, Lipschitz functions, strong convexity. Gradient descent methods, Newton's method.
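As an illustration of the convergence questions studied in this part, the basic gradient descent iteration and its classical rate for smooth convex functions can be written as (a standard textbook statement, included here only as a sketch):

```latex
% Gradient descent with step size \eta > 0:
x_{k+1} = x_k - \eta \, \nabla f(x_k)
% If f is convex and L-smooth, then with \eta = 1/L one has the classical rate
f(x_k) - f(x^{*}) \le \frac{L \, \|x_0 - x^{*}\|^2}{2k}
```

Strong convexity improves this sublinear rate to a linear (geometric) one.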
(3rd part) Metric Learning / Cost functions: Distances (Euclidean, Earth Mover's distance) and f-divergences (e.g., Kullback–Leibler). Properties on the real line, with a study of the particular case of the space of Gaussian distributions in 1d.
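To fix ideas, the Kullback–Leibler divergence mentioned here, and its closed form in the 1d Gaussian case studied in this part, are (standard formulas, shown as a sketch):

```latex
% Kullback--Leibler divergence between densities p and q on the real line:
\mathrm{KL}(p \,\|\, q) = \int_{\mathbb{R}} p(x) \log \frac{p(x)}{q(x)} \, dx
% Closed form for two Gaussians N(\mu_1, \sigma_1^2) and N(\mu_2, \sigma_2^2):
\mathrm{KL} = \log \frac{\sigma_2}{\sigma_1}
  + \frac{\sigma_1^2 + (\mu_1 - \mu_2)^2}{2 \sigma_2^2} - \frac{1}{2}
```

Unlike the Euclidean and Earth Mover's distances, the KL divergence is not symmetric and does not satisfy the triangle inequality.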
Time allowing, further topics may be discussed, for instance: linear programming and the Sinkhorn algorithm, neural networks, generative models, zero-sum games, etc.
Completion methods
We expect students to be very active in the development of the course. Grades will be based on homework (1/2) and a final exam (1/2). Students will receive short material to work on before the lectures, and about three or four homework assignments will be provided for personal study after the lectures.
Learning outcomes
• Study convergence of basic algorithms used in Applied Mathematics and Data Science.
• Understand some theoretical issues in Machine Learning algorithms and be aware of open problems in the field.
• Reinforce some concepts studied in Calculus and Linear Algebra by applying them to concrete problems.
• Be familiar with some concepts and theorems in Convex Analysis.
• Be introduced to some mathematical ideas and concepts which will be developed further in a master's degree in mathematics.
Additional information
The goal of this course is to introduce some mathematical aspects of Data Science to
undergraduate students in mathematics. In particular, we aim to develop mathematical tools that are crucial to understand basic theoretical issues in supervised and unsupervised learning.
This course focuses on fundamental aspects of the theory and the formalism, rather than studying different types of algorithms and recipes for solving specific problems. One of the course goals is to present some applications of Calculus and Linear Algebra (studied in the first year of a degree in mathematics) in an applied context.
Programming: No programming skills are required and there will be no coding homework in this course.
Description of prerequisites
Study materials
The course material is inspired by selected sections of the following books.
- On Convex Analysis:
1. Sébastien Bubeck. Convex Optimization: Algorithms and Complexity. Foundations and Trends in Machine Learning, Vol. 8, No. 3-4 (2015), pp. 231-358.
2. Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press. Available at: https://web.stanford.edu/~boyd/cvxbook/
- On Linear Algebra:
3. Gilbert Strang. Introduction to Linear Algebra, Fifth Edition, 2016.
- Complementary material (selected topics):
4. Lindsay I. Smith. A Tutorial on Principal Components Analysis. Available at: https://www.cs.otago.ac.nz/cosc453/student_tutorials/principal_components.pdf
5. Thomas M. Cover and Joy A. Thomas. Elements of Information Theory, 2nd Edition (Wiley Series in Telecommunications and Signal Processing), 2006.