Parameter Estimation for a Binomial Distribution#

Introduction#

This chapter illustrates the use of parameter estimation for a binomial distribution given a set of measurements, and investigates how changing the parameter b (explained below) changes the resulting probabilities. In astronomical applications, this binary-outcome distribution can be used to statistically determine the fraction of galaxies that show evidence for a black hole in their center. The chapter compares the conclusions drawn from the binomial distribution with those from a Gaussian approximation, using a sample data set.

Binomial Distribution#

Unlike the Gaussian distribution, which describes the distribution of a continuous variable, the binomial distribution describes the distribution of a variable that can take only two discrete values (say, 0 or 1, success vs. failure, or an event happening or not). If the probability of success is b, then the distribution of the discrete variable k (an integer, unlike the real-valued x), which counts how many times success occurred in N trials (i.e., measurements), is given by

\[p(k|b,N)=\frac{N!}{k!(N-k)!}b^k(1-b)^{N-k}\]

Here we have the mean of the binomial distribution given by \(\bar{k} = bN\).
The standard deviation is \(\sigma_k = [N b(1-b)]^{1/2}\).
And the uncertainty (standard error) is \(\sigma_b = \sigma_k/N\).
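To make these formulas concrete, here is a minimal numerical sketch using scipy.stats.binom (the values N = 10 and b = 0.4 are assumptions chosen to match the example data used later in this chapter):

import numpy as np
from scipy.stats import binom

N, b = 10, 0.4

# mean, standard deviation, and standard error from the formulas above
k_bar = b * N                        # = 4.0
sigma_k = np.sqrt(N * b * (1 - b))   # ~ 1.55
sigma_b = sigma_k / N                # ~ 0.155

# cross-check against scipy's binomial distribution object
dist = binom(N, b)
print(k_bar, dist.mean())    # both 4.0
print(sigma_k, dist.std())   # both ~ 1.549

Note that \(\sigma_b \approx 0.155\), which (rounded to 0.16) is the width used for the Gaussian approximation in the code later in this chapter.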

Posterior probability distribution in binomial distribution#

Given a set of N measurements (or trials), \(\{x_i\}\), drawn from a binomial distribution described by parameter b, we seek the posterior probability distribution \(p(b|\{x_i\})\).

When N is large, b and its (presumably Gaussian) uncertainty \(\sigma_b\) can be determined using the equations above. For small N, the proper procedure is as follows. Assuming that the prior for b is flat in the range 0–1, the posterior probability for b is

\[p(b|k,N)=Cb^k(1-b)^{N-k}\]

where k is now the actual observed number of successes in a data set of N values, and C is a normalization factor which can be determined from the condition \(\int_0^1 p(b|k,N)\,db = 1\). The maximum of the posterior occurs at \(b_0 = k/N\).
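The normalization integral is a Beta function, \(\int_0^1 b^k(1-b)^{N-k}\,db = B(k+1, N-k+1)\), so \(C = 1/B(k+1, N-k+1) = (N+1)!/[k!(N-k)!]\). Below is a minimal sketch verifying this numerically with scipy.special.beta and a simple grid integration (the values n = 10 and k = 4 match the example used later in this chapter):

import numpy as np
from scipy.special import beta

n, k = 10, 4

# analytic normalization constant: C = 1 / B(k+1, n-k+1)
C = 1.0 / beta(k + 1, n - k + 1)

# numerical check: integrate C * b^k * (1-b)^(n-k) over [0, 1]
b = np.linspace(0, 1, 10001)
integral = np.trapz(C * b**k * (1 - b)**(n - k), b)
print(C, integral)  # the integral should be very close to 1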

Import Data and Functions#

import numpy as np
from scipy.stats import norm
from matplotlib import pyplot as plt
from astroML.plotting.mcmc import convert_to_stdev  # needed for the contour plot below

Define functions and calculate results from the data#

Here we vary the value of b and compute the resulting posterior probability distribution for our data set, following the equation derived above:

\[p(b|k,N)=Cb^k(1-b)^{N-k}\]

For comparison, we also calculate a Gaussian approximation from the same data set.

n = 10  # number of trials (measurements)
k = 4   # number of successes from the n trials

b = np.linspace(0, 1, 100)
db = b[1] - b[0]

# compute the posterior p(b|k,n) on the grid and normalize it to unit area
p_b = b ** k * (1 - b) ** (n - k)
p_b /= p_b.sum()
p_b /= db

# cumulative distribution of the posterior
cuml_p_b = p_b.cumsum()
cuml_p_b /= cuml_p_b[-1]

# compute the Gaussian approximation with mean b0 = k/n
# and width sigma_b = sqrt(b0 * (1 - b0) / n) ~ 0.16
p_g = norm(k * 1. / n, 0.16).pdf(b)
cuml_p_g = p_g.cumsum()
cuml_p_g /= cuml_p_g[-1]

Show the comparison results#

  • The solid line in the left panel shows the posterior pdf \(p(b|k,N)\) for k = 4 and N = 10. The dashed line shows a Gaussian approximation.

  • The right panel shows the corresponding cumulative distributions.
    In comparison, a value of b < 0.1 is marginally likely according to the Gaussian approximation (\(p_{approx}(<0.1) \approx 0.03\)) but strongly rejected by the true distribution (\(p_{true}(<0.1) \approx 0.003\)); this is checked numerically below.
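These two tail probabilities can be read directly off the cumulative arrays computed above (a quick sketch, assuming b, cuml_p_b, and cuml_p_g from the previous cell are still in scope):

# index of the grid point closest to b = 0.1
i = np.argmin(np.abs(b - 0.1))

# cumulative probability P(< 0.1) under each distribution
print(cuml_p_b[i])  # true posterior: ~ 0.003
print(cuml_p_g[i])  # Gaussian approximation: ~ 0.03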

# Plot posterior as a function of b
fig = plt.figure(figsize=(10, 5))
fig.subplots_adjust(left=0.11, right=0.95, wspace=0.35, bottom=0.18)

ax = fig.add_subplot(121)
ax.plot(b, p_b, '-b')
ax.plot(b, p_g, '--r')

ax.set_ylim(-0.05, 3)
ax.set_xlim(0,1)

ax.set_xlabel('$b$')
ax.set_ylabel('$p(b|x,I)$')

ax = fig.add_subplot(122, yscale='log')
ax.plot(b, cuml_p_b, '-b')
ax.plot(b, cuml_p_g, '--r')
ax.plot([0.1, 0.1], [1E-6, 2], ':k')  # dotted line marking b = 0.1

ax.set_xlabel('$b$')
ax.set_ylabel('$P(<b|x,I)$')
ax.set_ylim(1E-6, 2)
ax.set_xlim(0,1)

plt.show()

Log-likelihood for Binomial Distribution#

Another approach is to work with the logarithm of the posterior probability density function (pdf), which turns the product form into a sum and is numerically much better behaved when searching for the maximum. Taking the logarithm of the binomial posterior above gives, up to an additive constant independent of b,

\[\ln L(b|k,N) = \ln\left[p(b|k,N)\right] = k\ln b + (N-k)\ln(1-b) + \mathrm{const.}\]

This has the same maximum as the posterior itself, at \(b_0 = k/N\).

# Define the function for calculating the binomial log-likelihood
def bi_logL(b, k, n):
    """Binomial log-likelihood (up to a constant independent of b)"""
    return k * np.log(b) + (n - k) * np.log(1 - b)

# Define the (b, k) grid and compute logL by broadcasting
b = np.linspace(0.01, 0.99, 70)   # b must lie strictly inside (0, 1)
k = np.linspace(1, 69, 70)        # number of successes out of n trials
n = 70

logL = bi_logL(b, k[:, np.newaxis], n)  # shape (len(k), len(b))
logL -= logL.max()
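As a quick sanity check on the grid (the row index 35 below is an arbitrary choice), the value of b that maximizes each row of logL should fall near the analytic maximum-posterior value \(b_0 = k/n\) noted earlier:

# pick one row of the grid (i.e., one value of k) and locate its peak
i = 35
b_best = b[np.argmax(logL[i])]
print(b_best, k[i] / n)  # the two values should nearly agree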

Plot the result#

fig = plt.figure(figsize=(5, 3.75))
plt.imshow(logL, origin='lower',
           extent=(b[0], b[-1], k[0], k[-1]),
           cmap=plt.cm.binary,
           aspect='auto')
plt.colorbar().set_label(r'$\log(L)$')
plt.clim(-5, 0)

plt.contour(b, k, convert_to_stdev(logL),
            levels=(0.683, 0.955, 0.997),
            colors='k')

plt.text(0.5, 0.93, r'$\ln L(b,k)\ \mathrm{for}\ n=70$',
         bbox=dict(ec='k', fc='w', alpha=0.9),
         ha='center', va='center', transform=plt.gca().transAxes)

plt.xlabel(r'$b$')
plt.ylabel(r'$k$')

plt.show()