This section makes available the source code used to generate every figure in the book Statistics, Data Mining, and Machine Learning in Astronomy. Many of the figures are fairly self-explanatory, though some will be less so without the book as a reference. The table of contents of the book can be seen here (pdf).
Each chapter links to a page with thumbnails of the figures from the chapter.
- Chapter 1: Introduction
- Chapter 2: Fast Computation and Massive Datasets
- Chapter 3: Probability and Statistical Distributions
- Chapter 4: Classical Statistical Inference
- Chapter 5: Bayesian Statistical Inference
- Chapter 6: Searching for Structure in Point Data
- Chapter 7: Dimensionality and its Reduction
- Chapter 8: Regression and Model Fitting
- Chapter 9: Classification
- Chapter 10: Time Series Analysis
Getting Started/Frequently Asked Questions
There is so much here: where to begin?
- Getting SDSS and other data, and quick analysis and plotting:
- How do I access SDSS imaging data and plot various color-color diagrams? Chapter 1.
- How do I access an SDSS spectrum and plot it? Chapter 1.
- How do I plot data in pixelated sky projections? Chapter 1.
- How can I visualize a four-dimensional data set and its intrinsic correlations? Chapter 1.
- Basic statistical tools:
- How do I use Python to evaluate and plot various statistical distributions, such as Cauchy, Laplace, etc.? Chapter 3.
- How do I robustly estimate location and scale parameters of a one-dimensional data set? Chapter 3.
- How do I robustly estimate parameters of a two-dimensional Gaussian? Chapter 3.
- How do I account for selection effects (e.g. luminosity functions)? Chapter 4.
- How do I generate a simulated sample drawn from an arbitrary distribution? Chapter 4.
- How do I choose the optimal bin width for a histogram? Do bins need to be the same size? Chapters 4 and 5.
- How do I fit y(x) when y has non-Gaussian uncertainties? Chapter 8.
- How do I fit y(x) when both x and y have non-negligible uncertainties? Chapter 8.
- Non-trivial data mining and other tools:
- How do I run PCA on many SDSS spectra? Chapter 7.
- How do I fit a multi-component Gaussian (or any other function) to my histogram? Chapter 5.
- How do I decide if I have a “detection”? Chapters 4, 5, 8.
- How do I fit a multi-component Gaussian (while accounting for errors) to my multi-dimensional data? Chapter 6.
- How do I justify the use of, for example, a parabola instead of a straight line to fit my data? Chapter 5.
- How do I use Markov Chain Monte Carlo to fit a complex function to my multi-dimensional data? Chapter 5.
- How do I estimate the underlying density traced by a finite-size sample of points? Chapter 6.
- How do I find clusters (over-densities, classes, features) in my data set? Chapters 6 and 9.
- How do I estimate a light curve period (Lomb-Scargle)? Chapter 10.
- How do I analyze a non-periodic light curve? Chapter 10.
- How do I estimate the power spectrum for unevenly sampled data with large heteroscedastic uncertainties? Chapter 10.
- How do I use detection times for individual photons to estimate an exponential decay time? Chapter 10.
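As a quick taste of the kind of question listed above, the following is a minimal sketch of two of the basic statistical tools: a robust estimate of location and scale for a one-dimensional data set, and a rule-of-thumb histogram bin width. It is not the book's figure code. It uses only the Python standard library, and the function names `robust_location_scale` and `freedman_diaconis_width` are hypothetical names chosen for this illustration. The scale estimate assumes the common convention of rescaling the interquartile range by 0.7413 so that it matches the standard deviation for Gaussian data, and the bin width uses the Freedman-Diaconis rule, which is only one of several possible choices.

```python
import random
import statistics


def interquartile_range(values):
    """Return q75 - q25 using linear interpolation between data points."""
    q25, _, q75 = statistics.quantiles(values, n=4, method="inclusive")
    return q75 - q25


def robust_location_scale(values):
    """Return (median, sigma_G), where sigma_G = 0.7413 * IQR.

    The factor 0.7413 makes sigma_G equal to the standard deviation
    when the data are Gaussian (assumption stated in the lead-in).
    """
    return statistics.median(values), 0.7413 * interquartile_range(values)


def freedman_diaconis_width(values):
    """Return the Freedman-Diaconis bin width, 2 * IQR / N^(1/3)."""
    return 2.0 * interquartile_range(values) / len(values) ** (1.0 / 3.0)


# Simulated data (illustrative only): a Gaussian core plus gross outliers.
rng = random.Random(0)
sample = [rng.gauss(10.0, 2.0) for _ in range(1000)]
sample += [1000.0] * 10

location, scale = robust_location_scale(sample)
width = freedman_diaconis_width(sample)

print(f"median = {location:.2f}, sigma_G = {scale:.2f}")
print(f"mean   = {statistics.fmean(sample):.2f}, "
      f"stdev   = {statistics.stdev(sample):.2f}")
print(f"Freedman-Diaconis bin width = {width:.2f}")
```

Comparing the two printed lines shows why the robust estimators matter: the handful of outliers pulls the mean and standard deviation far from the Gaussian core, while the median and the rescaled interquartile range barely move. Chapters 3, 4 and 5 cover these topics in full.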