This course requires a minimum of 4-6 hours of work per week.
There are weekly assignments, including both theoretical and programming questions.
Assignments should be done individually; however, discussion with your friends is encouraged.
Many of the questions in the assignments may seem challenging at first, but with enough time and effort, every student can solve them.
In addition to the course materials, you are encouraged to explore the many resources available elsewhere.
Being active in class is VERY important to me. This is not a spectator sport. The more you engage with the course, the better you will understand the concepts.
Always ask any questions you may have. There is no reason to be shy.
Most questions and examples come with real-world datasets.
To be successful in this course, you need to develop both your mathematical background and programming skills.
Course Material
There is no formal reference for this course. All material will be posted online on the course webpage. Please check the course webpage regularly, since course materials will be added weekly as the class progresses.
Here is a brief list of useful sources for self-study. The full list of sources, including useful books, papers, blogs, etc., can be found here.
Machine/Deep Learning:
G. James, D. Witten, T. Hastie, R. Tibshirani, An Introduction to Statistical Learning (link)
C. M. Bishop, Pattern Recognition and Machine Learning, 2006 (link)
K. Murphy, Machine Learning: A Probabilistic Perspective, 2012 (link)
Andrew Ng, Machine Learning Specialization, 2012 (link)
Hal Daumé III, A Course in Machine Learning, 2017 (link)
Andrew Ng, The Deep Learning Specialization, 2018 (link)
K. Murphy, Probabilistic Machine Learning: An Introduction, 2022 (link)
K. Murphy, Probabilistic Machine Learning: Advanced Topics, 2023 (link)
Programming:
C. R. Severance, Python for Everybody: Exploring Data in Python 3, 2016 (link)
W. McKinney, Python for Data Analysis, 2nd Edition, 2017 (link)
Most datasets used in this course and in ML literature can be found through the following links:
UCI Machine Learning Repository. The Machine Learning Repository at UCI provides an up-to-date resource for open-source datasets (link).
Google Dataset Search. Similar to how Google Scholar works, Dataset Search lets you find datasets wherever they are hosted, whether it’s a publisher’s site, a digital library, or an author’s web page. It contains over 25 million datasets (link).
Kaggle. Kaggle hosts a large collection of datasets, including over 50,000 public datasets and 400,000 public notebooks for exploratory data analysis (EDA) (link).
VisualData. It lists computer vision datasets organized by category and supports search queries (link).
CMU Libraries. This database includes high-quality datasets collected by Huajin Wang at CMU (link).
The Big Bad NLP Database. This cool database contains datasets for various natural language processing tasks, created and curated by Quantum Stat (link).
Hugging Face. This popular hub and framework contains 46,121 datasets used in state-of-the-art ML/DL research. It covers all data modalities, including Natural Language Processing (NLP), Computer Vision (CV), Speech, Tabular, and Multimodal (link).
We start by introducing what machine learning is, and why we need to know about it. Next, we review the required mathematical background for understanding Machine Learning (ML) and Deep Learning (DL).
We cover the following fundamental topics:
Linear Algebra
Probability
Statistics
Optimization
Next, we do a crash course in Python and NumPy.
The tentative content of the course is listed as follows (please note that this list is subject to change):