# Parameter Estimation for a Binomial Distribution¶

## Introduction¶

This chapter illustrates the uses of parameter estimation in generating Binomial distribution for a set of measurement, and investigates how the change of parameter b (explained below) will change the probability result. In astronomical application, we can use this binary result distrinution to statistically determine the fraction of galaxies which show evidence for a black hole in their center. This chapter will compare the conclusion drawn from Binomial distribution and from Gaussian distribution using a sample data.

## Binomial Distribution¶

Unlike the Gaussian distribution, which describes the distribution of a continuous variable, the binomial distribution describes the distribution of a variable that can take only two discrete values (say, 0 or 1, or success vs. failure, or an event happening or not). If the probability of success is b, then the distribution of a discrete variable k (an integer number, unlike x which is a real number) that measures how many times success occurred in N trials (i.e., measurements), is given by

$p(k|b,N)=\frac{N!}{k!(N-k)!}b^k(1-b)^{N-k}$

Here we have the mean of the binomial distribution given by $$\bar{k} = bN$$.
The standard deviation is $$\sigma_k = [N b(1-b)]^{1/2}$$.
And the uncertainty (standard error) is $$\sigma_b = \sigma_k/N$$.

## Posterior probability distribution in binomial distribution¶

Given a set of N measurements (or trials), $${x_i}$$, drawn from a binomial distribution described with parameter b, we seek the posterior probability distribution $$p(b|{x_i}$$).

When N is large, b and its (presumably Gaussian) uncertainty $$\sigma_b$$ can be determined using the equation above. For small N, the proper procedure is as follows. Assuming that the prior for b is at in the range 0-1, the posterior probability for b is

$p(b|k,N)=Cb^k(1-b)^{N-k}$

where k is now the actual observed number of successes in a data set of N values, and C is a normalization factor with can be determined from the condition $$\int_0^1 p(b|k,N)db = 1$$. The maximum posterior occurs at $$b_0 = k/N$$.

## Import Data and Functions¶

import numpy as np
import matplotlib
from scipy.stats import norm
from matplotlib import pyplot as plt


### Define functions and calculate result from data¶

Here we vary the b value and draw the resulted posterior probability distribution from our data set. The equation is described below: $$$p(b|k,N)=Cb^k(1-b)^{N-k}$$$ In comparison, we also calculate a Gaussian distribution from the same data set.

n = 10  # number of points
k = 4   # number of successes from n draws

b = np.linspace(0, 1, 100)
db = b - b

# compute the probability p(b)
p_b = b ** k * (1 - b) ** (n - k)
p_b /= p_b.sum()
p_b /= db
cuml_p_b = p_b.cumsum()
cuml_p_b /= cuml_p_b[-1]

# compute the gaussian approximation
p_g = norm(k * 1. / n, 0.16).pdf(b)
cuml_p_g = p_g.cumsum()
cuml_p_g /= cuml_p_g[-1]


### Show comparison result¶

• The solid line in the left panel shows the posterior pdf $$p(b|k,N)$$ for k = 4 and N = 10. The dashed line shows a Gaussian approximation.

• The right panel shows the corresponding cumulative distributions.
In comparison, a value of 0.1 is marginally likely according to the Gaussian approximation ($$p_{approx}$$(< 0.1) $$\approx$$ 0.03) but strongly rejected by the true distribution ($$p_{true}$$(< 0.1) $$\approx$$ 0.003).

# Plot posterior as a function of b
fig = plt.figure(figsize=(10, 5))
fig.subplots_adjust(left=0.11, right=0.95, wspace=0.35, bottom=0.18)

ax = fig.add_subplot(121)
ax.plot(b, p_b, '-b')
ax.plot(b, p_g, '--r')

ax.set_ylim(-0.05, 3)
ax.set_xlim(0,1)

ax.set_xlabel('$b$')
ax.set_ylabel('$p(b|x,I)$')

ax = fig.add_subplot(122, yscale='log')
ax.plot(b, cuml_p_b, '-b')
ax.plot(b, cuml_p_g, '--r')
ax.plot([0.1, 0.1], [1E-6, 2], ':k')

ax.set_xlabel('$b$')
ax.set_ylabel('$P(<b|x,I)$')
ax.set_ylim(1E-6, 2)
ax.set_xlim(0,1)

plt.show()


## Log-likelihood for Binomial Distribution¶

Suppose we have N measurements, $${x_i}$$. The measurement errors are gaussian, and the measurement error for each measurement is $$\sigma_i$$. This method applies logrithm in searching the posterior probability density function (pdf). Given that the likelihood function for a single measurement, $$x_i$$, is assumed to follow a Gaussian distribution $$\mathcal{N}(\mu,\sigma)$$, the likelihood for all measurements is given by $$$L_p({x_i}|b,k,N) = ln[p(b,k,N|{x_i})]$$$

# Define the function for calculating log-likelihood Binomial distribution
def bi_logL(b, k, n):
"""Binomial likelihood"""
return np.log( b**k * (1-b)**(n-k) )

# Define the grid and compute logL
b = np.linspace(1, 5, 70)
k = np.linspace(1, 5, 70)
n = 70

logL = bi_logL(b, k, n)
logL -= logL.max()


## plot result¶

fig = plt.figure(figsize=(5, 3.75))
plt.imshow(logL, origin='lower',
extent=(b, b[-1], k, k[-1]),
cmap=plt.cm.binary,
aspect='auto')
plt.colorbar().set_label(r'$\log(L)$')
plt.clim(-5, 0)

plt.contour(b, k, convert_to_stdev(logL),
levels=(0.683, 0.955, 0.997),
colors='k')

plt.text(0.5, 0.93, r'$L(\mu,\sigma)\ \mathrm{for}\ \bar{x}=1,\ V=4,\ n=10$',
bbox=dict(ec='k', fc='w', alpha=0.9),
ha='center', va='center', transform=plt.gca().transAxes)

plt.xlabel(r'$\mu$')
plt.ylabel(r'$\sigma$')

plt.show()