In this post, I suggest some well-known free resources in the ML ecosystem.
Table of Contents:
- Learning about ML
- Remote computing (for free)
- Papers and written algorithms
- Datasets for ML
- Learning to code
- Python libraries for ML
Learning about ML
See my post on Some references for getting started and beyond with ML.
Remote computing (for free)
- Google Colab: “Colaboratory is a free Jupyter notebook environment that requires no setup, and runs entirely (writing, running, & sharing code) on the Cloud.”
- Kaggle Kernels: “Kaggle Kernels is a no-setup, customizable, Jupyter Notebooks environment. Access free GPUs and a huge repository of community published data & code.”
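Both services let you attach a GPU to a notebook session, but it is easy to forget to switch it on. A minimal sketch for checking from inside a notebook is shown here. It assumes NVIDIA's `nvidia-smi` command-line tool is on the PATH whenever a GPU runtime is attached, which is a heuristic rather than something either platform guarantees. The function name `list_nvidia_gpus` is my own, not part of any library.

```python
import shutil
import subprocess


def list_nvidia_gpus() -> list[str]:
    """Return the names of visible NVIDIA GPUs, or an empty list if none are found."""
    # Assumption: `nvidia-smi` is on PATH whenever a GPU runtime is attached.
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            check=True,
            timeout=10,
        )
    except (subprocess.SubprocessError, OSError):
        # The tool exists but failed (e.g. no driver loaded): report no GPU.
        return []
    # One GPU name per non-empty output line.
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


gpus = list_nvidia_gpus()
if gpus:
    print("GPU(s) found:", ", ".join(gpus))
else:
    print("No NVIDIA GPU visible in this session.")
```

If the list comes back empty, the session is probably running on CPU only and the accelerator setting of the notebook is worth a second look.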
Papers and written algorithms
- arXiv: “arXiv is a free distribution service and an open-access archive for 1,657,256 scholarly articles in the fields of physics, mathematics, computer science, quantitative biology, quantitative finance, statistics, electrical engineering and systems science, and economics.”
- Arxiv Sanity Preserver by Andrej Karpathy: “This project is a web interface that attempts to tame the overwhelming flood of papers on Arxiv. It allows researchers to keep track of recent papers, search for papers, sort papers by similarity to any paper, see recent popular papers, to add papers to a personal library, and to get personalized recommendations of (new or old) Arxiv papers.”
- Connected Papers: “A visual tool to help researchers and applied scientists find academic papers relevant to their field of work.”
- GitHub: “GitHub is a development platform inspired by the way you work. From open source to business, you can host and review code, manage projects, and build software alongside 40 million developers.”
- HAL (Hyper Articles en Ligne): “an open archive where authors can deposit scholarly documents from all academic fields”
- Papers With Code: “The mission of Papers With Code is to create a free and open resource with Machine Learning papers, code and evaluation tables.”
- Semantic Scholar by the Allen Institute for AI: “Semantic Scholar is a free, AI-powered research tool for scientific literature.”
Since October 2020, Papers With Code has partnered with arXiv: machine learning articles on arXiv now have a Code tab that links each paper to its official and community code.
Datasets for ML
- data.gouv.fr: “Open platform for French public data”
- Google Dataset Search: “Dataset Search allows users to find datasets on the Web through a simple search.”
- ImageNet: “ImageNet is an image dataset organized according to the WordNet hierarchy. Each meaningful concept in WordNet, possibly described by multiple words or word phrases, is called a “synonym set” or “synset”. There are more than 100,000 synsets in WordNet, majority of them are nouns (80,000+).”
- Kaggle: 28,680 datasets
- UCI Machine Learning Repository: “We currently maintain 488 data sets as a service to the machine learning community. You may view all data sets through our searchable interface.”
- List of datasets for machine-learning research from Wikipedia
Learning to code
- Codecademy: a popular online tutorial platform for beginners.
- HackerRank: “Join over 7 million developers, practice coding skills, prepare for interviews, and get hired.”
- Jupyter Notebook Tutorial: The Definitive Guide by DataCamp: “This tutorial explains how to install, run, and use Jupyter Notebooks for data science, including tips, best practices, and examples.”
- LeetCode: “LeetCode is the best platform to help you enhance your skills, expand your knowledge and prepare for technical interviews.”
- Python for Non-Programmers: “If you’ve never programmed before, the tutorials on this page are recommended for you; they don’t assume that you have previous experience.”
- Stack Overflow: a help forum for developers with over 9.5 million members.
The Missing Semester of Your CS Education, a lecture series from MIT, is also well worth following. The 2020 video lectures are available on YouTube. In the course's own words:
“Classes teach you all about advanced topics within CS, from operating systems to machine learning, but there’s one critical subject that’s rarely covered, and is instead left to students to figure out on their own: proficiency with their tools. We’ll teach you how to master the command-line, use a powerful text editor, use fancy features of version control systems, and much more!”
Python libraries for ML
See my post on Python libraries for Data Scientists.