Tutorial: Machine Learning for Astronomy with Scikit-learn
AstroML
For more information on machine learning for Astronomy, see the astroML code and examples.
This tutorial offers a brief introduction to the fields of machine learning and statistical data analysis, and their application to several problems in the field of astronomy. These learning tasks are enabled by the tools available in the open-source package scikit-learn.
scikit-learn is a Python module that integrates classic machine learning algorithms into the tightly knit world of scientific Python packages (NumPy, SciPy, Matplotlib). It aims to provide simple and efficient solutions to learning problems that are accessible to everybody and reusable in various contexts: machine learning as a versatile tool for science and engineering.
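As a rough illustration of how scikit-learn builds on NumPy arrays, the sketch below shows the two calling patterns named in the outline of this tutorial, `model.fit(X, y)` for supervised learning and `model.fit(X)` for unsupervised learning. The synthetic data, the choice of `GaussianNB` and `KMeans`, and all variable names are assumptions made for this example; they are not taken from the tutorial's own notebooks.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.cluster import KMeans

# Synthetic data (an assumption for illustration): two well-separated 2-D blobs.
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0.0, 0.5, size=(50, 2)),
               rng.normal(5.0, 0.5, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

# Supervised learning: the estimator is given features and known labels.
clf = GaussianNB()
clf.fit(X, y)
pred = clf.predict(X)

# Unsupervised learning: the estimator is given the features only.
km = KMeans(n_clusters=2, n_init=10, random_state=0)
km.fit(X)
labels = km.labels_
```

Both estimators take the same `(n_samples, n_features)` array `X`; the only difference in the call is whether a label array `y` is passed to `fit`.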
Many of the examples and exercises in this tutorial require the IPython notebook, a tool that provides an intuitive web-based interactive environment for scientific Python. Some of the material in the notebooks is duplicated in the following pages, but the IPython notebook is required for some parts. For information on how to download the associated notebooks, see the Tutorial Setup and Installation page.
Note
This document is meant to be used with scikit-learn version 0.11+. Find the latest version here.
- 1. Tutorial Setup and Installation
- 2. Machine Learning 101: General Concepts
  - 2.1. Features and feature extraction
  - 2.2. Supervised Learning, Unsupervised Learning, and scikit-learn syntax
  - 2.3. Supervised Learning: model.fit(X, y)
  - 2.4. Unsupervised Learning: model.fit(X)
  - 2.5. Linearly separable data
  - 2.6. Hyperparameters, training set, test set and overfitting
  - 2.7. Key takeaway points
- 3. Machine Learning 102: Practical Advice
- 4. Classification: Learning Labels of Astronomical Sources
- 5. Regression: Photometric Redshifts of Galaxies
- 6. Dimensionality Reduction of Astronomical Spectra
- 7. Exercises: Taking it a step further
- 8. Code examples





