Fundamentals of Statistics: Binomial Distribution

May 27th, 2010

Remember those circular metal pieces you used to do your laundry with back in college or graduate school? They have easily distinguished sides, so you can toss one in the air and let it land in one of two configurations, known as “heads” or “tails” (for reasons that are becoming increasingly obscure thanks to the facelift on the American penny). Naturally, there is a 1/2 chance of either outcome. Or, as I mentioned earlier, as the number of tosses grows toward infinity, the split approaches an even 50/50.

What I want to do next is add another coin. Each coin can be heads or tails, so we have a total of four possible configurations, HH, TT, HT, or TH, each with probability 1/4. You can keep adding coins, and the number of unique outcomes grows like 2^{n} for n coins tossed. Let’s say, though, that you don’t care in which order the events occur, only how many coins land heads up. In that case you’re concerned with the number of possible combinations, as opposed to the number of permutations. We can find the number of combinations by first finding the number of permutations and then dividing by the degeneracy, the number of different orderings that describe the same combination.
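To make the permutations-versus-combinations distinction concrete, here is a small sketch (my own illustration, not code from the original post; the function name `outcome_counts` is hypothetical) that lists every ordered outcome of n tosses and then groups them by number of heads.

```python
from collections import Counter
from itertools import product


def outcome_counts(n):
    """Enumerate all 2**n ordered outcomes (permutations) of n coin tosses,
    then group them by how many heads each contains (combinations)."""
    outcomes = list(product("HT", repeat=n))  # e.g. ('H', 'T') for n = 2
    by_heads = Counter(o.count("H") for o in outcomes)
    return len(outcomes), dict(sorted(by_heads.items()))


total, by_heads = outcome_counts(2)
print(total)     # 4 ordered outcomes: HH, HT, TH, TT
print(by_heads)  # {0: 1, 1: 2, 2: 1}
```

For two coins, HT and TH are different permutations but the same combination (one head), which is why the middle count is 2.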

So if we’re interested in tossing n coins and getting x heads, we have a choice of n coins for our first head, then n-1 for our second, and so on until we have (n-x+1) coins left for the last of our x heads. While there might be some initial despair, this product can be easily summed up with factorials as \frac{n!}{(n-x)!}. As I mentioned, if we don’t care in what order the heads came up, we need to divide by the degeneracy. For x heads there are x! possible orderings, so the number of combinations of n items sampled x at a time is given by {n \choose x}=\frac{n!}{x!(n-x)!}, which is frequently called “n choose x”.
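The counting argument above translates almost line for line into code. This is a minimal sketch (the helper name `n_choose_x` is hypothetical); it checks the result against Python’s built-in `math.comb`, which assumes Python 3.8 or later.

```python
from math import comb, factorial


def n_choose_x(n, x):
    """Combinations = (ordered selections) / (orderings of the same x heads)."""
    permutations = factorial(n) // factorial(n - x)  # n (n-1) ... (n-x+1)
    degeneracy = factorial(x)                        # x! orderings of one set
    return permutations // degeneracy


print(n_choose_x(10, 5))  # 252
assert n_choose_x(10, 5) == comb(10, 5)
```

Integer division is exact here because n!/(n-x)! is always a multiple of x!.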

Probability: We can get the probability from this quite simply. We know the probability of an individual success, p (1/2 in this case), and we know how many ways the x successes can be arranged among the n trials, so we combine these to get P(x; n, p) = {n \choose x}p^{x}q^{n-x}, where q=1-p. The name “binomial distribution” arises from the binomial theorem: each P(x; n, p) is one term in the expansion of (p+q)^{n}, so the sum over all possible values of x must be (p+q)^{n}=1^{n}=1.
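Here is a short sketch of the probability formula, with a check that the terms sum to one as the binomial theorem promises. The function name `binomial_pmf` and the test values of n and p are my own choices for illustration.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(x; n, p) = C(n, x) p^x q^(n - x), with q = 1 - p."""
    q = 1.0 - p
    return comb(n, x) * p**x * q**(n - x)


n, p = 10, 0.5
total = sum(binomial_pmf(x, n, p) for x in range(n + 1))
print(total)  # should be 1.0, up to floating-point rounding
```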

Mean: I’ll spare you the mathematical justification for the mean and standard deviation, as it is not straightforward. We can, however, get these correctly with a bit of intuition. In the coin-tossing experiment, where p=0.5, we expect to get half heads and half tails. More generally, for n trials each with success probability p, we expect np of our outcome of interest, leading to \mu = np.
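The intuition can be checked numerically without the formal proof: compute the mean directly as the weighted sum over x and compare it with np. This is only a numerical check under example values I picked (a coin, and a die with p = 1/6), not a derivation.

```python
from math import comb


def binomial_mean(n, p):
    """Mean computed directly as sum over x of x * P(x; n, p)."""
    return sum(x * comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1))


for n, p in [(10, 0.5), (12, 1 / 6)]:
    # The last two columns should agree up to rounding.
    print(n, round(p, 4), binomial_mean(n, p), n * p)
```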

Standard Deviation: The intuitive tack for this one is a little less, well, intuitive. We can exploit a trick called “expectation of the square minus square of the expectation” to get the variance, \sigma^{2} = \langle x^{2}\rangle - \langle x\rangle^{2}. For a single coin flip, scoring heads as 1 and tails as 0, the mean of the square is just p, because 1^{2}=1 and 0^{2}=0. The square of the mean is p^{2}. So we can calculate \sigma^{2} = p - p^{2} = p(1-p). For n independent trials the variances simply add, giving \sigma^{2} = np(1-p), so the standard deviation for the binomial distribution is \sigma = \sqrt{np(1-p)}.
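Here is a sketch of the “mean of the square minus square of the mean” trick for one flip (scored 1 for heads, 0 for tails), scaled up to n flips and compared with the standard deviation computed from the full distribution. The function names are hypothetical and the example values are arbitrary.

```python
from math import comb, sqrt


def single_flip_variance(p):
    """<x^2> - <x>^2 for one trial scored 1 (success) or 0 (failure)."""
    mean_of_square = p * 1**2 + (1 - p) * 0**2   # = p
    square_of_mean = (p * 1 + (1 - p) * 0) ** 2  # = p^2
    return mean_of_square - square_of_mean       # = p(1 - p)


def binomial_sigma(n, p):
    """Standard deviation computed from the full distribution, for comparison."""
    pmf = [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]
    mu = sum(x * w for x, w in enumerate(pmf))
    var = sum((x - mu) ** 2 * w for x, w in enumerate(pmf))
    return sqrt(var)


n, p = 10, 0.5
print(sqrt(n * single_flip_variance(p)))  # sqrt(n p (1 - p))
print(binomial_sigma(n, p))               # same value, up to rounding
```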

Naturally, we don’t need a p=1/2 coin flip. The same reasoning works just as well for rolling dice, where the probability of a particular face on one die is 1/6. If you’re sufficiently nerdy to have ever played Settlers of Catan, you might recall the dots under each number on the game board. These count the combinations of two dice that yield that total. So really, any game in which you roll dice can be partially characterized by the binomial distribution. Next, we can take the limit of many trials with a very small probability of success (keeping np fixed) and arrive at the Poisson distribution, which is extremely useful for understanding the results of surveys and polls, so stay tuned for that.
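As a sketch of the Catan example, the code below counts the two-dice combinations behind each number’s dots. It then uses the resulting per-roll probability in the binomial formula. The “exactly 3 eights in 20 rolls” question is a hypothetical example of my own, not something from the post.

```python
from collections import Counter
from itertools import product
from math import comb

# Count ordered (die1, die2) pairs for each total; this is what the
# dots under each Catan number token represent.
ways = Counter(a + b for a, b in product(range(1, 7), repeat=2))
for total in range(2, 13):
    print(total, ways[total], f"{ways[total] / 36:.3f}")

# Binomial step (hypothetical numbers): chance that an 8 comes up
# exactly 3 times in 20 rolls, with p = 5/36 per roll.
p8 = ways[8] / 36
n_rolls, k = 20, 3
prob = comb(n_rolls, k) * p8**k * (1 - p8) ** (n_rolls - k)
print(f"{prob:.3f}")
```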

Fundamentals of Statistics: Distributions

May 18th, 2010

On some level, statistics is the process of describing distributions, which tell us the probability of an event given a certain set of circumstances. I’ll get into the particulars of a few types of statistical distributions, like the binomial, Poisson, and Gaussian distributions, which are commonly used to describe scientific data. Before we get into that, though, I want to talk a bit about the statistical distribution as a concept, particularly the idea of a parent distribution and a sample distribution.

A lot of physics boils down to trying to characterize random processes. The ubiquitous, quintessential example is a coin flip. If you want to test the fairness of a coin, you might flip it many times and count the number of heads. What you have collected is a sample distribution drawn from the parent distribution that describes the randomness of the coin. If you attempt this experiment with a real coin, you will likely get something very close to the canonical distribution for a coin flip, the binomial distribution, because for a probability close to 0.5 the sample tends to lie very close to the parent.
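The coin-fairness experiment can be simulated to show a sample drifting toward its parent as the number of flips grows. This is a minimal sketch assuming a pseudo-random generator is an adequate stand-in for a real coin; `flip_sample` and its parameters are hypothetical names.

```python
import random


def flip_sample(n_flips, p_heads=0.5, seed=None):
    """Simulate n_flips tosses of a coin with P(heads) = p_heads and
    return the observed fraction of heads (a sample statistic)."""
    rng = random.Random(seed)
    heads = sum(rng.random() < p_heads for _ in range(n_flips))
    return heads / n_flips


for n_flips in (10, 100, 10_000):
    print(n_flips, flip_sample(n_flips))
```

Run it a few times: the 10-flip fractions jump around a lot, while the 10,000-flip fractions stay close to the parent value of 0.5.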

For other distributions the distinction becomes much more apparent. I threw together a little Python script you can play with yourself that generates 100 numbers from a randomly seeded pseudo-random number generator. It then plots these data on the same axes as the probability density function of the parent distribution. Potential results might look like this. The values of the mean and standard deviation for the parent Gaussian distribution are listed in the legend, along with the values calculated from a sample of 100 “measurements”. A common convention is to refer to the statistics of the parent distribution by the Greek letters \mu and \sigma and the sample statistics by the Roman letters \bar{x} and s (or s^{2} for the variance).
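The original script isn’t reproduced here, so this is only a rough stand-in under my own assumptions: a standard normal parent (μ = 0, σ = 1), standard-library `random` and `statistics`, and no plotting. It draws 100 “measurements” and prints the sample statistics next to the parent values.

```python
import random
import statistics

MU, SIGMA, N = 0.0, 1.0, 100  # assumed parent parameters, not the original script's

rng = random.Random()  # seeded from system entropy/time, so each run differs
sample = [rng.gauss(MU, SIGMA) for _ in range(N)]

x_bar = statistics.mean(sample)  # sample mean, an estimate of mu
s = statistics.stdev(sample)     # sample std. dev. (n - 1 denominator), estimates sigma
print(f"parent: mu = {MU}, sigma = {SIGMA}")
print(f"sample: x_bar = {x_bar:.3f}, s = {s:.3f}, s^2 = {s**2:.3f}")
```

To reproduce the plot, you could histogram `sample` with a plotting library and overlay the Gaussian density. I’ve left that out to keep the sketch dependency-free.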


In Honour of Lasers

May 13th, 2010

May 16th is recognized as the anniversary of the laser. Lasers are pretty cool. I’ve had the opportunity to work on the back end of some laser systems for studying atmospheric aerosols, which is a bit like saying I’ve had the chance to live in the same state as John Mellencamp, except I like lasers way more than I like John Mellencamp. Anyway, on with the list of cool laser links:

Nuclear Followup

May 10th, 2010

In almost perfect lockstep harmony with the saying Mark Twain popularized, “There are three kinds of lies: lies, damned lies, and statistics,” I messed up a little bit. There is a six percent chance of a premature death being caused by a car accident, but the number I was comparing this to was the probability per year of a premature death from a nuclear reactor. A better comparison would have used the next line of the Wolfram Alpha search, which gives a value of 15 deaths per 100,000 persons per year, or 1.5e-4. That is still far more likely than the 1e-8 to 1e-11 per year from the 1991 Nuclear Regulatory Commission safety report.
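For the arithmetic behind the corrected comparison, here is a quick sketch. It takes the two quoted figures at face value and assumes both are per-person, per-year rates, so that dividing them is meaningful.

```python
car_rate = 15 / 100_000        # deaths per person per year (Wolfram Alpha figure above)
reactor_rates = (1e-8, 1e-11)  # per-year range from the 1991 NRC report

print(f"car accidents: {car_rate:.1e} per year")
for r in reactor_rates:
    print(f"ratio to {r:.0e}: {car_rate / r:.1e}")
```

Under that assumption, the per-year car accident risk comes out roughly 1.5e4 to 1.5e7 times larger than the reactor figure.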

Just wanted to clear that up.

The Pro-Nuke Environmentalist

May 5th, 2010

I had an interesting chat with an old friend a few days ago. It started with the sunken oil rig in the Gulf of Mexico wreaking havoc with the environment, which was essentially a segue into accusing the oil and power companies of “suppressing” clean energy technology. I’m inclined to give the corporations the benefit of the doubt that they are not cartoonishly evil villains. Thus, I’d assume they would be more likely to actively develop cleaner technology since they will be able to make money just as easily off of it once it’s adopted, which it eventually will be.

However, I am told that their “suppression” of technology is common knowledge (although no sources have been forthcoming from either my associate or my Google searches) and that they only want technology they can “control.” I’m curious as to what kinds of developments they’re sitting on. Do they need a control collar for a living spaceship à la Farscape? But I digress. Suppression of safe, clean, efficient power is a reality, but it is not the work of corporate technocrats. Rather, it is the environmentalists themselves carrying its banner. Its name: No Nukes.

NYT Opinion Piece on Bayes Theorem

April 27th, 2010

In this New York Times opinion piece, Steven Strogatz proposes that Bayes’ theorem is too complicated for people and that we should switch to something more intuitive. I’m not exactly endlessly proficient with Bayes’ theorem, but I think his alternative method for approaching problems of conditional probability amounts to a reworking of Bayes’ theorem. Check out the Wikipedia article. With the information he gives you for the mammogram problem, you should be able to get the correct solution by plugging numbers into the “simple statement” of the theorem. He even gives you the correct answer, so you can check yourself.
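To back up the claim that counting out a population amounts to Bayes’ theorem in disguise, here is a sketch that computes the same posterior both ways using exact fractions. The prior, sensitivity, and false-positive rate are hypothetical values chosen for illustration, **not** the numbers from the Strogatz piece. The function names are mine as well.

```python
from fractions import Fraction


def bayes_posterior(prior, sensitivity, false_positive_rate):
    """P(condition | positive test) straight from Bayes' theorem."""
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive


def natural_frequency_posterior(population, prior, sensitivity, false_positive_rate):
    """Same question, answered by counting people in a hypothetical population."""
    sick = population * prior
    true_positives = sick * sensitivity
    false_positives = (population - sick) * false_positive_rate
    return true_positives / (true_positives + false_positives)


# Hypothetical inputs for illustration only; these are NOT Strogatz's numbers.
prior, sens, fpr = Fraction(1, 100), Fraction(9, 10), Fraction(1, 10)
print(bayes_posterior(prior, sens, fpr))                     # 1/12
print(natural_frequency_posterior(1000, prior, sens, fpr))  # 1/12
```

The population size cancels out of the counting version, which is exactly why the two give the same answer.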

State of the Blogger

April 13th, 2010

I’m alive, just very negligent in my duties when it comes to blogging about scientific topics. Also, I’m now engaged, which is something I wish I had set up data collection for in advance, because from what I’ve observed the two most common questions after a woman becomes engaged are “When is the date?” and “What does the ring look like?” I’d like to know whether those are actually modal, or whether it’s just my selective listening. I’ll probably talk a bit about having a feminist, atheist wedding as that time comes.

I have a guitar project that is making slow progress, but progress nonetheless. If anyone wants to ask me physics questions about music, I really will do my best to answer them, as long as they don’t require any formal (non-self-taught) background in music. So seriously, fire away; it will give me a clear goal to work towards.

I’m thinking about jumping into do-it-yourself electronics. At work I’ve used single-board computers that can interface with a GPS receiver for precise timing and trigger a laser. They can also steer the laser, but I didn’t work on that part of the code. So I’m eyeing an Arduino microcontroller. I’m not sure what I want to do with it yet: either a robot that can chase my fiancée’s cat, or a robotic coffee machine.

Biggest skeptical thing going on at the moment? The pope. A lot of vociferous atheist types like Rebecca Watson have come out supporting papal prosecution. Other vociferous atheists like PZ Myers have agreed with Rebecca. Honestly, what they said. If there’s anything I find irritating about the Internet, it is the prevalence of vehement agreement with each other. I’m just trying it out to see how it feels.

Happy Ada Lovelace Day!

March 24th, 2010

I can’t make any proper claim to participation this year, as life has been something of a whirlwind and I’m just already in too many places at once. I will encourage y’all, though, to head on over to read what others have contributed at Finding Ada, or the Facebook group.

Yes, I’m male. But I’ve gotten more inspiration than I can measure from strong women both in my personal life and in the public sphere. The beautiful thing about feminism is that, in the end, everybody wins. There’s perfectly adequate reason to celebrate the participation of women in science and technology, and usually the essays and the blog-posts composed today demonstrate why.

Numbers can Cause Confusion: Binomial Coefficients Can Help

February 15th, 2010

Bad Science writer Ben Goldacre wrote about how “Guns don’t kill people, puppies do,” where he points out some ways in which the probabilities of things can be a bit confusing. An extremely common example, which was messed up by a British newspaper, is the probability of multiple people (siblings in this case) sharing a birthday.

Like he says, the trick is that it doesn’t matter when the first child was born, only that the next two were born on the same day. This is a trivial case: the binomial coefficient {365 \choose 1} = 365, the number of days the shared birthday could fall on, simply cancels out the leading 1/365, leaving (1/365)^{2}. In general, though, you can use the binomial coefficient (sometimes called “n choose k”) to calculate probabilities where you can get the same result in multiple ways.
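Here is a sketch of the sibling calculation in exact fractions, under simplifying assumptions: 365 equally likely birthdays, no leap years, and independent births (so no twins). The function name is hypothetical.

```python
from fractions import Fraction

DAYS = 365  # assumes 365 equally likely birthdays; ignores leap years and twins


def p_all_share_birthday(k, days=DAYS):
    """Probability that k independent children all share one birthday."""
    # The shared day can be any of `days` days, and each child lands on it
    # with probability 1/days: days * (1/days)**k == (1/days)**(k - 1).
    return days * Fraction(1, days) ** k


print(p_all_share_birthday(3))         # 1/133225
print(float(p_all_share_birthday(3)))  # about 7.5e-06
```

The factor of 365 out front is the cancellation described above: three children sharing a birthday is only as unlikely as two specific children landing on the first child’s day.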

{n \choose k} = \frac{n!}{k! (n-k)!}

So if you’re guessing the suit of a hidden playing card, you have a 1/4 chance of guessing correctly. If you guess 10 different cards and get 5 correct, you might be tempted to say your feat was as likely as (\frac{1}{4})^{5}\times(\frac{3}{4})^{5}=0.00023, and that you might have psychic powers. What you’re missing, though, is the number of ways you could have gotten 5 of the 10 guesses right, {10 \choose 5} = 252, which means the probability of correctly guessing exactly half the cards was 252 \times 0.00023 \approx 0.058. You could still be psychic, but now it doesn’t seem quite as likely.
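Here is the card-guessing arithmetic as a short sketch, assuming each guess is independent with a 1/4 chance of naming the right suit. It separates the probability of one specific hit/miss pattern from the number of patterns with exactly 5 hits.

```python
from math import comb

p, n, k = 1 / 4, 10, 5

one_sequence = p**k * (1 - p) ** (n - k)  # one specific pattern of hits and misses
ways = comb(n, k)                         # how many patterns have exactly 5 hits
print(f"{one_sequence:.5f}")              # 0.00023
print(ways)                               # 252
print(f"{ways * one_sequence:.3f}")       # 0.058
```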

Tune in later for a crash-course introduction to P-values and hypothesis testing, so we can conclude with more certainty one way or the other on your psychic abilities.

Attempting to fix LaTeX support (again).

February 12th, 2010

I have ideas kicking around all the time for things I want to write about, but more often than not they’re derailed because my LaTeX support is broken again or something else is wrong. So if all goes according to plan, the next line should contain the differential form of Ampère’s law with Maxwell’s correction:

 \vec{\nabla} \times \vec{B} = \mu_{0}\vec{J} + \mu_{0}\epsilon_{0} \frac{\partial\vec{E}}{\partial t}
Feel free to see if it works in the comments. I have no idea if it’s even supposed to. Put equations between double dollar signs ($$) and proceed as usual if you’re TeX savvy.