Useful Links

A list of sources for learning ML/DL and programming, including useful books, papers, blogs, etc.:

  • ML books:
    1. G. James, D. Witten, T. Hastie, R. Tibshirani, An Introduction to Statistical Learning (link)
    2. C. M. Bishop, Pattern Recognition and Machine Learning, 2006 (link)
    3. K. Murphy, Machine Learning: A Probabilistic Perspective, 2012 (link)
    4. Hal Daumé III, A Course in Machine Learning, 2017 (link)
    5. K. Murphy, Probabilistic Machine Learning: An Introduction, 2022 (link)
    6. A. Lindholm, N. Wahlström, F. Lindsten, T. Schön, Machine Learning: A First Course for Engineers and Scientists (link)
    7. K. Murphy, Probabilistic Machine Learning: Advanced Topics, 2023 (link)
  • Programming:
    1. C. R. Severance, Python for Everybody: Exploring Data in Python 3, 2016 (link)
    2. W. McKinney, Python for Data Analysis, 2nd Edition, 2017 (link)
    3. Q. Larson, Python Data Science, 2020 (link)
    4. J. Brownlee, Crash Course in Python for Machine Learning Developers, 2016 (link)
    5. Become a Python Master (link)
    6. Data Science With Python Core Skills (link)
    7. M. Yasoob, A great resource to get started with Python (link)
  • ML blogs:
    1. Lilian’s blog with many interesting learning notes (link)
    2. Distill, a scientific journal that operated from 2016-2021 with many well-explained ML concepts (link).
    3. Shakir Mohamed’s blog with many learning resources in ML/DL (link)
    4. A blog affiliated with Ermon’s research group at Stanford with some advanced topics in ML (link)
    5. A research blog for Zico Kolter’s research group at CMU with theoretical topics (link)
    6. Off the Convex Path, a blog with concepts in optimization methods (link)
    7. A blog by Eric Jang with technical material in ML/DL (link)
    8. Machine Learning Research Blog by Francis Bach (link)
    9. A blog about NLP and ML, highlighting many research papers (link)
    10. Posts on machine learning and statistics by Ferenc Huszár (link)
    11. A blog with materials on DL, AI, and Cloud GPUs (link)
    12. Daniel Gissin’s blog about optimization (link)
    13. Determined AI with introductory and advanced materials in ML/DL (link)
    14. The CMU ML blog, a general-audience medium for CMU researchers (link)
    15. Machine learning and learning theory research (link)
    16. A blog about DL methods in Speech by Loren Lugosch (link)
    17. A blog about transformers by Jonathan Bgn (link)
    18. Jay Alammar’s blog about BERT, GPT, etc. (link)
    19. A blog about mathematical concepts in ML by Gregory Gundersen (link)
    20. Columbia Advanced Machine Learning Seminar (link)
  • Data Science:
    1. KDnuggets, courses/blogs in Data Science, Machine Learning, AI & Analytics (link)
    2. Analytics Vidhya, courses/blogs/books in Data Science, Machine Learning (link)
  • Courses:
    1. Andrew Ng, Machine Learning Specialization, 2012 (link)
    2. Hal Daumé III, A Course in Machine Learning, 2017 (link)
    3. Andrew Ng, The Deep Learning Specialization, 2018 (link)
    4. Bayesian Methods in Machine Learning by Roman Garnett (Washington University in St. Louis) (link)
  • Data repositories used in ML literature:
    1. UCI Machine Learning Repository. The Machine Learning Repository at UCI provides an up-to-date resource for open-source datasets (link).
    2. Google Dataset Search. Similar to how Google Scholar works, Dataset Search lets you find datasets wherever they are hosted, whether it’s a publisher’s site, a digital library, or an author’s web page. It contains over 25 million datasets (link).
    3. Kaggle. Kaggle hosts a large collection of datasets, including over 50,000 public datasets and 400,000 public notebooks for exploratory data analysis (EDA) (link).
    4. VisualData. It includes computer vision datasets by category; it allows searchable queries (link).
    5. CMU Libraries. This database includes high-quality datasets collected by Huajin Wang at CMU (link).
    6. The Big Bad NLP Database. This database contains datasets for various natural language processing tasks, created and curated by Quantum Stat (link).
    7. Hugging Face. This popular hub and framework contains 46,121 datasets used in state-of-the-art ML/DL research. It covers data modalities such as Natural Language Processing (NLP), Computer Vision (CV), Speech, Tabular, and Multimodal (link).
    8. Data.world. Data.world calls itself a “collaborative data community” and holds over 100,000 datasets on topics ranging from crime to social media (link).
    9. Data.gov. This is the US Government’s open data portal, created as part of an effort to be more transparent. It hosts over 300,000 datasets from different fields such as environment, ocean, and agriculture (link).
    10. Earthdata. Earthdata was created by NASA as part of its Earth Science Data Systems Program called Earth Observing System Data and Information System (EOSDIS) (link).
    11. Europeana Data. Open metadata on 20 million texts, images, videos and sounds gathered by Europeana (link).
    12. Data Planet. The largest repository of standardized and structured statistical data (link).