Posts Tagged ‘statistics’

High Hopes for Old Dreams

Friday, September 3rd, 2010

One of my long-term goals is to be old one day. One of those tingling distant fears I have is that being old will impair my ability to solve mental problems. I practically live off good feelings from problem solving. So this article from completely outside my expertise caught my eye: Cognitive activity and the cognitive morbidity of Alzheimer disease.

See those numbers in parentheses? Those pretty much sum up the statistics for each test they did. I’ll ignore the first two for the moment because I don’t have faith in my own ability to interpret them correctly for you, but the last one is the infamous p-value. This value tells you how likely it is to get a result at least as extreme as yours, assuming the null hypothesis is true.

In their case, the null hypothesis appears to be that cognitive activity has no effect on the progression of dementia for individuals at all three stages. The first group, which had no cognitive impairment at the onset of the study, saw less degradation with more activity. The p-value of 0.003 means there is a 0.3% chance of seeing an effect at least this strong if the null hypothesis is true, so they conclude that cognitive activity helps.

The second group, with “mild cognitive impairment,” had a result with a p-value of 0.300, meaning a result like theirs would turn up about 30% of the time even if the null hypothesis were true. The text of the article reflects this lack of a meaningful result. Lastly, for the group with Alzheimer disease, the degradation of cognitive function decreased with increased activity with p<0.001, considered a strong result for rejecting the null hypothesis.

That last phrase is an important point. Hypothesis testing like this must be carefully constructed. All a p-value tells you is how likely data like yours are, assuming your null hypothesis. Rejecting the null hypothesis does not necessarily imply the result being tested. My interpretation of the above article as a whole would be that they’ve found compelling evidence that cognitive activity affects the rate of progression of dementia at various stages. What those specific effects are, they can’t really say yet. That’s why they’ll probably design another study to home in on the details. I could go on about journalists jumping the gun and exaggerating results to satisfy people’s craving for instant gratification, but I think that’s already been said.
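To make the definition of a p-value concrete, here is a minimal sketch using a made-up experiment rather than the study's data: 60 heads in 100 flips, with a fair coin as the null hypothesis. The function name and the numbers are my own illustrative assumptions. It adds up the probability of every outcome at least as far from 50 heads as the one observed.

```python
from math import comb

def two_sided_p_value(heads, flips, p_null=0.5):
    """Exact two-sided binomial p-value for a fair-coin null hypothesis.

    Sums the probability of every outcome at least as far from the
    expected count as the observed one. This symmetric tail rule is
    only appropriate when p_null is 0.5.
    """
    expected = flips * p_null
    observed_gap = abs(heads - expected)
    total = 0.0
    for k in range(flips + 1):
        if abs(k - expected) >= observed_gap:
            total += comb(flips, k) * p_null**k * (1 - p_null)**(flips - k)
    return total

# Hypothetical experiment: 60 heads out of 100 flips.
p_value = two_sided_p_value(60, 100)
print(f"p-value for 60 heads in 100 flips: {p_value:.4f}")
```

Note what the number is not: it is not the probability that the coin is fair. It only says how often a fair coin would produce a result this lopsided or worse.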

Facebook Meme Statistics: Population Growth

Monday, May 31st, 2010

I saw this one go around a bit on Facebook. Not sure how widespread it is.

Of all the humans who’ve ever lived, 6.4 percent are alive today. The sheer number of people is overwhelming natural systems, destroying biodiversity, and challenging efforts to control global warming. Earth’s population is rising at 80 million people per year – roughly the number of unwanted pregnancies. Solving the population problem means making every child a wanted child.

I’d say this has a fair sentiment. I’m a huge proponent of reproductive rights, meaning access to birth control and abortion. No matter what agenda you’re trying to push, though, one thing you should never do is use misleading statistics. All of the numbers in this post check out; they are accurate, but they aren’t stated in a way that puts them into perspective so that you can understand what they mean. The 80 million people per year figure is, to most people, myself included, just a really big number.

If we take another step back and look at how that number has changed over time as a percentage of the total population, we’ll see that the growth rate was over 2% for most of the 1960s and 70s and has been decreasing ever since. Quoting a large figure like 80 million people per year triggers panic, not because it’s necessarily bad, but because it’s a really big number and we’re not sure how to handle that information. It sounds bad, right? So I think this is just a scare tactic to convince people we need to put the hand brake on global population growth by making careful, sane family planning the norm (something we should do, just not for this reason) because Earth’s population is screaming out of control… when it’s not.
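Putting the raw figure into perspective is a one-line calculation. This is a rough sketch: the 80 million comes from the quoted meme, while the world population of about 6.9 billion is my own round 2010 figure, assumed for illustration.

```python
annual_increase = 80e6     # people per year, from the quoted meme
world_population = 6.9e9   # ASSUMPTION: rough 2010 world population, not from the meme

# Express the yearly increase as a percentage of the total population.
growth_rate_percent = 100 * annual_increase / world_population
print(f"Growth rate: about {growth_rate_percent:.2f}% per year")
```

Under that assumption the rate comes out a little above 1% per year, well below the 2% level mentioned above.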

Fundamentals of Statistics: Distributions

Tuesday, May 18th, 2010

On some level, statistics is the process of describing distributions, which help us describe the probability of an event given a certain set of circumstances. I’ll get into the particulars of a few different types of statistical distributions, like the binomial, Poisson, and Gaussian distributions, which are commonly used to describe scientific data. Before we get into that, though, I wanted to just talk a bit about the statistical distribution as a concept, particularly the idea of a parent distribution and a sample distribution.

A lot of physics boils down to trying to characterize a random process. The ubiquitous, quintessential example is a coin flip. If you want to test the fairness of a coin you might flip it many times and count the number of heads. What you would have collected is a sample distribution from the parent distribution describing the randomness of the coin. If you attempt this experiment with a real coin, you will likely get something very close to the canonical distribution for a coin flip, the binomial distribution, because for a probability close to 0.5 the sample tends to lie very close to the parent.

For other distributions the important distinction becomes much more apparent. I threw together a little Python script you can play with yourself that generates 100 numbers from a randomly seeded pseudo-random number generator. It’ll then plot these data on the same axis as the probability density function of the parent distribution. Potential results might look like this. The values of the mean and standard deviation for a Gaussian distribution are listed in the legend, as well as the calculated values for a sample of 100 “measurements”. A common convention is to refer to the statistics of the parent distribution by the Greek letters \mu and \sigma and the sample statistics by the Roman letters \bar{x} and s (or s^{2} for the variance).
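The original script isn't reproduced here, so what follows is a stand-in sketch of the same idea rather than that script. It draws 100 values from a Gaussian parent and compares the sample statistics with the parent's. The parent values MU and SIGMA are arbitrary choices of mine, and the plotting step is left out to keep it to the standard library.

```python
import random
import statistics

MU = 0.0      # parent mean (arbitrary illustrative value)
SIGMA = 1.0   # parent standard deviation (arbitrary illustrative value)
N = 100       # number of simulated "measurements"

# With no argument, the generator is seeded from system entropy,
# so every run gives a different sample. Pass an int for repeatability.
rng = random.Random()
sample = [rng.gauss(MU, SIGMA) for _ in range(N)]

x_bar = statistics.mean(sample)   # sample mean
s = statistics.stdev(sample)      # sample standard deviation (n - 1 denominator)

print(f"parent: mu    = {MU:.3f}, sigma = {SIGMA:.3f}")
print(f"sample: x_bar = {x_bar:.3f}, s     = {s:.3f}")
```

Run it a few times and the sample values wander around the parent values without ever quite landing on them, which is the whole point of keeping the two sets of symbols separate.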

NYT Opinion Piece on Bayes’ Theorem

Tuesday, April 27th, 2010

Steven Strogatz is proposing, in this New York Times opinion piece, that Bayes’ theorem is too complicated for people and that we should switch to something more intuitive. I’m not exactly endlessly proficient with Bayes’ theorem, but I think his alternative method for approaching problems of conditional probability amounts to a reworking of Bayes’ theorem. Check out the Wikipedia article. With the information he gives you for the mammogram problem, you should be able to get the correct solution by plugging in numbers to the “simple statement” of the theorem. He even gives you the correct answer, so you can check yourself.
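For anyone who wants to try the plug-in approach, here is a minimal sketch of the simple statement of Bayes’ theorem for a screening test. The three input numbers are illustrative values I chose for a mammogram-style problem, not figures quoted from the column, and the function name is my own.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(condition | positive test) from the simple form of Bayes' theorem.

    P(A|B) = P(B|A) P(A) / [P(B|A) P(A) + P(B|not A) P(not A)]
    """
    true_positives = sensitivity * prior
    false_positives = false_positive_rate * (1 - prior)
    return true_positives / (true_positives + false_positives)

# ASSUMED illustrative inputs (not taken from the article):
prior = 0.008                # fraction of women screened who have the disease
sensitivity = 0.90           # P(positive test | disease)
false_positive_rate = 0.07   # P(positive test | no disease)

p = posterior(prior, sensitivity, false_positive_rate)
print(f"P(disease | positive test) = {p:.3f}")
```

With these inputs the answer is a bit over 9%, far lower than most people guess, because the false positives from the large healthy group swamp the true positives from the small sick group.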

Numbers can Cause Confusion: Binomial Coefficients Can Help

Monday, February 15th, 2010

Bad Science writer Ben Goldacre wrote about how “Guns don’t kill people, puppies do,” where he points out some ways in which the probabilities of things can be a bit confusing. An extremely common example, which was messed up by a British newspaper, is the probability of multiple people (siblings in this case) sharing a birthday.

Like he says, the trick is that it doesn’t matter when the first child was born, only that the next two were born on the same day. This is a trivial case in which the binomial coefficient simply cancels out the leading 1/365. In general, though, you can use the binomial coefficient (sometimes called “n choose k”) to calculate probabilities where you can get the same result in multiple ways.

{n \choose k} = \frac{n!}{k! (n-k)!}

So if you’re guessing the suit of a hidden playing card, you have a 1/4 chance of guessing correctly. If you guess for 10 different cards and get 5 correct, you might be tempted to say your feat was as likely as (\frac{1}{4})^{5}\times(\frac{3}{4})^{5}=0.00023, and that you might have psychic powers. That number, though, is the probability of one particular sequence of hits and misses. What you’re missing is the number of ways you could have guessed 5 correctly, {10 \choose 5} = 252, which means the probability of correctly guessing exactly half the cards was 252 \times 0.00023 = 0.058. You could still be psychic, but now it doesn’t seem quite as likely.
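The arithmetic above is easy to check with a few lines of Python. This is just a sketch of the calculation in the text; the function name is my own.

```python
from math import comb

def prob_exactly(k, n, p):
    """Probability of exactly k successes in n independent tries, each with chance p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# One specific sequence of 5 hits and 5 misses when guessing suits.
one_sequence = 0.25**5 * 0.75**5

# Number of different orderings of 5 hits among 10 guesses.
ways = comb(10, 5)

# Probability of getting exactly 5 of 10 right, in any order.
p_five = prob_exactly(5, 10, 0.25)

print(f"one sequence: {one_sequence:.5f}")
print(f"10 choose 5:  {ways}")
print(f"exactly 5:    {p_five:.3f}")
```

Swapping in other values of k, n, and p covers any "same result, multiple ways" problem of this type.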

Tune in later for a crash-course introduction to P-values and hypothesis testing, so we can conclude with more certainty one way or the other on your psychic abilities.

Piled Higher and Deeper on News Media Polls

Monday, January 25th, 2010

I’d be remiss in my duties if I failed to post a link to Jorge Cham’s recent PHD comic about the use of polls and statistics in the news media. He hits the proverbial nail on its proverbial head. Also, if you haven’t read the comic before, it will be an amusing and whimsical dark comedy to those who have not experienced grad school, and a humorously whimsical reflection of the dark, bleak reality that is graduate school for those who have. Has anyone seen my thesis topic?

Governator Coded F-Bomb?

Wednesday, October 28th, 2009

So I’ve seen this discussion floating around, best illustrated here by BoingBoing, that Governor Schwarzenegger included a coded message along the left-hand column of a veto letter. I’ve also seen a lot of requests for some statistics on this one. I’m not even sure how to correctly approach this from a statistical point of view, and, alas, I’m currently in crisis mode at work so I don’t see an opportunity to play with it presenting itself.

I have seen it reported by the Twitter that, “A cryptologist calculated the odds of Schwarzenegger’s “F@$% You” letter happening randomly: 5.5 in 1 trillion.” This is accompanied by a hat tip to NPR with no link. Does anyone have any idea what segment this was on so I can dig it up on the archive or maybe a link to the article if there’s a text version? I’ll let y’all know if I come across anything myself.

(Poor) Graphical Representation of Data

Friday, October 23rd, 2009

The internet is generally bad at math. I suspect the anonymity reduces the shame one might otherwise worry about should one be completely wrong, so folks are much more likely just to say things without thinking about it first. Unfortunately for me, I really like funny graphs. Back in college I had the flying spaghetti monster graph tacked to my wall. That one’s still funny for a lot of reasons.

I’ve gotten into the habit of reading GraphJam, though. If you are unfamiliar with it, GraphJam is a strangely modified WordPress blog in which people submit humorous graphs and some of these get posted on the main page. The frequency with which people seem to misunderstand what a graph means is alarming. The most egregious transgressions come from Venn diagrams, which is somewhat surprising since these are no more complicated than (THING A (overlap) THING B), and you can get a correctly used one that will fill you with warm feelings every weekday morning at Indexed. But I digress…

Online Dating Response Rates

Thursday, September 24th, 2009

There’s a set of interesting statistics collected from the OkCupid online dating site. They’ve tracked the response rate to messages containing certain words or combinations of words. To sum up the results, using correct English grammar and spelling, as well as unusual words to set a message apart from others, really helps. Complimenting a woman on her looks doesn’t. I actually met my girlfriend online, and she says that most people said something about her breasts. I asked her about her favorite place to get coffee. Guess I won on that one. If you must know, though, she’s fond of Dewy’s, while I prefer Phoenix.

Anyway, the other stat, emphasized by Sean “The Guy Who Wrote My GR Textbook” Carroll, is that mentioning atheism improved the response rate more than mentioning Christianity did. Mentioning god without a proper name dropped the response rate below average. So add that to your list of reasons why atheism is a good way to start conversations.

Slightly In Love With Wolfram

Friday, July 3rd, 2009

If you’re a huge nerd, you’ve probably at least seen Wolfram Alpha. At least once a week or so I stop to play with it to see if I can get it to do anything cool. So it can tell you where most of the world’s Portuguese speakers live and give you a side-by-side comparison of the population, land area, and GDP of South Africa and Lesotho. Both of those are extremely cool, but this morning I found a new one. Just in case you ever wanted to know what would happen if you rolled two 12-sided dice (since 3rd Ed. D&D will probably never give you the opportunity) it will now give you the mean value and a histogram! It works for any number of dice with any number of sides. Even the mysteriously fair 5-sider. There seems to have been a large gaming-related “expansion” or something to the site, because it will also tell you the probability of a full house.
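If you’d rather see where those dice numbers come from, here is a small brute-force sketch in Python. It has nothing to do with how Wolfram Alpha does it internally; it simply enumerates every possible roll, and the function name is my own.

```python
from collections import Counter
from itertools import product

def dice_sum_distribution(num_dice, sides):
    """Map each possible total to its probability, by enumerating every roll."""
    faces = range(1, sides + 1)
    counts = Counter(sum(roll) for roll in product(faces, repeat=num_dice))
    total_rolls = sides ** num_dice
    return {total: n / total_rolls for total, n in sorted(counts.items())}

dist = dice_sum_distribution(2, 12)
mean = sum(total * prob for total, prob in dist.items())
print(f"mean of two 12-sided dice: {mean:.1f}")

# Text histogram: one '#' per way of rolling that total (144 rolls in all).
for total, prob in dist.items():
    print(f"{total:2d} {'#' * round(prob * 144)}")
```

Enumeration gets slow for lots of dice, since the number of rolls grows as sides to the power of num_dice, but for a handful of dice it is instant.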

This kind of thing could probably entertain me for hours.