Fundamentals of Statistics: Distributions
Tuesday, May 18th, 2010
On some level, statistics is the process of describing distributions, which help us describe the probability of an event given a certain set of circumstances. I’ll get into the particulars of a few different types of statistical distributions, such as the binomial, Poisson, and Gaussian distributions, which are commonly used to describe scientific data. Before we get into that, though, I want to talk a bit about the statistical distribution as a concept, particularly the idea of a parent distribution and a sample distribution.
A lot of physics boils down to trying to characterize random processes. The ubiquitous, quintessential example is a coin flip. If you want to test the fairness of a coin, you might flip it many times and count the number of heads. What you would have collected is a sample distribution drawn from the coin’s parent distribution, the distribution that describes the coin’s underlying randomness. If you attempt this experiment with a real coin, you will likely get something very close to the canonical distribution for a coin flip, the binomial distribution with a probability of heads of 0.5: a real coin’s probability of heads is very close to 0.5, and with enough flips the sample tends to lie very close to the parent.
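To make the parent/sample distinction concrete for the coin, here is a minimal Python sketch (not the author’s script) that assumes a coin with probability p of heads. It repeats an experiment of n_flips flips many times, tallies how often each number of heads turns up, and compares those sample frequencies with the binomial parent distribution P(k) = C(n, k) p^k (1 − p)^(n − k). The function names, the seed, and the default parameter values are illustrative choices, not anything from the original post.

```python
import math
import random
from collections import Counter


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Parent distribution: probability of exactly k heads in n flips."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def sample_head_counts(n_flips: int, n_trials: int, p: float, rng: random.Random) -> Counter:
    """Sample distribution: repeat the n_flips experiment n_trials times
    and count how often each number of heads occurs."""
    counts = Counter()
    for _ in range(n_trials):
        heads = sum(1 for _ in range(n_flips) if rng.random() < p)
        counts[heads] += 1
    return counts


def compare(n_flips: int = 10, n_trials: int = 1000, p: float = 0.5, seed: int = 2010):
    """Return (k, sample frequency, parent probability) for k = 0..n_flips."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    counts = sample_head_counts(n_flips, n_trials, p, rng)
    return [
        (k, counts[k] / n_trials, binomial_pmf(k, n_flips, p))
        for k in range(n_flips + 1)
    ]


for k, observed, expected in compare():
    print(f"{k:2d} heads: sample {observed:.3f}   parent {expected:.3f}")
```

Raising n_trials makes the sample frequencies hug the parent probabilities more tightly; with only a handful of trials the two columns can differ noticeably.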
For other distributions the distinction between parent and sample becomes much more apparent. I threw together a little Python script you can play with yourself that generates 100 numbers from a randomly seeded pseudo-random number generator and plots them on the same axes as the probability density function of the parent distribution. Potential results might look like this. The legend lists the mean and standard deviation of the parent Gaussian distribution, along with the values calculated from a sample of 100 “measurements”. A common convention is to refer to the statistics of the parent distribution by the Greek letters $\mu$ and $\sigma$, and to the sample statistics by the roman letters $\bar{x}$ and $s$ (or $\hat{\sigma}$ for the standard deviation).
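The original script isn’t reproduced here, but a minimal, standard-library-only sketch of the same idea might look like the following. It assumes a parent Gaussian with $\mu = 0$ and $\sigma = 1$ (arbitrary choices), draws 100 “measurements” from it, and computes the sample mean $\bar{x} = \frac{1}{N}\sum_i x_i$ and the sample standard deviation $s = \sqrt{\frac{1}{N-1}\sum_i (x_i - \bar{x})^2}$ for comparison. The names `gaussian_pdf` and `draw_sample` are hypothetical helpers, and plotting is left out: the histogram of `sample` and the curve from `gaussian_pdf` are what would go on the shared axes.

```python
import math
import random
import statistics


def gaussian_pdf(x: float, mu: float, sigma: float) -> float:
    """Probability density of the parent Gaussian distribution at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def draw_sample(n: int, mu: float, sigma: float, seed=None) -> list[float]:
    """Draw n pseudo-random 'measurements' from the parent Gaussian.
    seed=None seeds the generator from system randomness, so each run differs."""
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) for _ in range(n)]


MU, SIGMA, N = 0.0, 1.0, 100

sample = draw_sample(N, MU, SIGMA)
x_bar = statistics.mean(sample)  # sample mean, an estimate of mu
s = statistics.stdev(sample)     # sample std dev (N - 1 in the denominator), an estimate of sigma

print(f"parent: mu    = {MU:.3f}, sigma = {SIGMA:.3f}")
print(f"sample: x_bar = {x_bar:.3f}, s     = {s:.3f}")
```

Running it a few times shows $\bar{x}$ and $s$ wandering around $\mu$ and $\sigma$ from run to run; passing a fixed `seed` makes a run repeatable.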