Textbook Figures

This section makes available the source code used to generate every figure in the second edition of the book Statistics, Data Mining, and Machine Learning in Astronomy. Many of the figures are fairly self-explanatory, though some will be less so without the book as a reference. The table of contents of the book can be seen here (pdf).

The source code for the figures in the first edition is available from the dropdown menu.

Getting Started/Frequently Asked Questions

There is so much here: where to begin?

  1. Getting SDSS and other data, and quick analysis and plotting:

    • How do I access SDSS imaging data and plot various color-color diagrams? Chapter 1.

    • How do I access an SDSS spectrum and plot it? Chapter 1.

    • How do I plot data in pixelated sky projections? Chapter 1.

    • How can I visualize a four-dimensional data set and its intrinsic correlations? Chapter 1.

  2. Basic statistical tools:

    • How do I use Python to evaluate and plot various statistical distributions, such as Cauchy, Laplace, etc.? Chapter 3.

    • How do I robustly estimate location and scale parameters of a one-dimensional data set? Chapter 3.

    • How do I robustly estimate parameters of a two-dimensional Gaussian? Chapter 3.

    • How do I account for selection effects (e.g. luminosity functions)? Chapter 4.

    • How do I generate a simulated sample drawn from an arbitrary distribution? Chapter 4.

    • How do I choose the optimal bin width for a histogram? Do bins need to be the same size? Chapters 4 and 5.

    • How do I fit y(x) when y has non-Gaussian uncertainties? Chapter 8.

    • How do I fit y(x) when both x and y have non-negligible uncertainties? Chapter 8.

  3. Non-trivial data mining and other tools:

    • How do I run PCA on many SDSS spectra? Chapter 7.

    • How do I fit a multi-component Gaussian (or any other function) to my histogram? Chapter 5.

    • How do I decide if I have a “detection”? Chapters 4, 5, 8.

    • How do I fit a multi-component Gaussian (while accounting for errors) to my multi-dimensional data? Chapter 6.

    • How do I justify the use of, for example, a parabola instead of a straight line to fit my data? Chapter 5.

    • How do I use Markov Chain Monte Carlo to fit a complex function to my multi-dimensional data? Chapter 5.

    • How do I estimate the underlying density traced by a finite-size sample of points? Chapter 6.

    • How do I find clusters (over-densities, classes, features) in my data set? Chapters 6 and 9.

    • How do I estimate a light curve period (Lomb-Scargle)? Chapter 10.

    • How do I analyze a non-periodic light curve? Chapter 10.

    • How do I estimate the power spectrum for unevenly sampled data with large heteroscedastic uncertainties? Chapter 10.

    • How do I use detection times for individual photons to estimate an exponential decay time? Chapter 10.
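
As a small taste of the questions above, here is a minimal sketch of robust location and scale estimation for a one-dimensional data set (the Chapter 3 question). It is not the book's figure code. It assumes the median as the location estimate and a rank-based width, 0.7413 times the interquartile range, as the scale estimate; that factor makes the width equal to the standard deviation for a Gaussian. The function name `robust_location_scale`, the simulated sample, and the outlier value are all hypothetical, and only the Python standard library is used so that the sketch runs without extra dependencies.

```python
import random
import statistics


def robust_location_scale(values):
    """Return (median, sigma_G) for a one-dimensional sample.

    sigma_G = 0.7413 * (q75 - q25) rescales the interquartile range so
    that it matches the standard deviation of a Gaussian distribution.
    """
    q25, _, q75 = statistics.quantiles(values, n=4, method="inclusive")
    location = statistics.median(values)
    sigma_g = 0.7413 * (q75 - q25)
    return location, sigma_g


# Hypothetical test data: a Gaussian sample (mean 10, sigma 2)
# contaminated by 1% gross outliers.
rng = random.Random(42)
sample = [rng.gauss(10.0, 2.0) for _ in range(5000)]
sample += [1000.0] * 50

location, scale = robust_location_scale(sample)
mean = statistics.fmean(sample)
stdev = statistics.stdev(sample)

print(f"median = {location:.2f}, sigma_G = {scale:.2f}")
print(f"mean   = {mean:.2f}, stdev   = {stdev:.2f}")
```

With this contamination the median and sigma_G should stay close to the underlying Gaussian parameters, while the mean and standard deviation are pulled far away by the outliers. For real analyses, an array library such as NumPy would be the more usual choice.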