Distributions and formulas: statistics again!

The number of heads in 100 flips follows the binomial distribution. Most statistics books discuss the binomial formula, so we'll skip it here. Below you can see the distribution of the number of heads in 1,040 sets of 100 flips, with the normal curve overlaid in black.

[Figure: distribution of the number of heads]
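This kind of experiment is easy to approximate by simulation. The sketch below is not how the data in the figure were produced. It assumes Python's `random` module stands in for a fair coin and uses a fixed seed so the run is repeatable. It draws 1,040 sets of 100 flips, counts the heads in each set, and prints a crude text histogram. The helper name `simulate_head_counts` is illustrative.

```python
import random
from collections import Counter
from statistics import mean, pstdev


def simulate_head_counts(n_sets: int = 1040, n_flips: int = 100, seed: int = 1) -> list[int]:
    """Return the number of heads in each of n_sets sets of n_flips simulated fair flips."""
    rng = random.Random(seed)  # seeded so the results are repeatable
    # Each comparison is True (heads) or False (tails); summing counts the heads
    return [sum(rng.random() < 0.5 for _ in range(n_flips)) for _ in range(n_sets)]


counts = simulate_head_counts()
print(f"mean = {mean(counts):.2f}, sd = {pstdev(counts):.2f}")

# Crude text histogram: bins of 2 heads, one '#' per 4 sets
bins = Counter(c // 2 * 2 for c in counts)
for start in sorted(bins):
    print(f"{start:3d}-{start + 1:<3d} {'#' * (bins[start] // 4)}")
```

Changing `seed` gives a different batch, and the histogram's shape changes slightly from batch to batch.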

What does this mean for your data?

More formally, the theoretical distribution has a mean of 50 and a standard deviation of 5. This means that roughly 68% of the time the number of heads will fall between 45 and 55 (within one standard deviation of the mean), and roughly 95% of the time it will fall between 41 and 59 (within 1.96 standard deviations).
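The mean and standard deviation come from the usual binomial formulas, n·p and sqrt(n·p·(1 − p)), with n = 100 and p = 0.5. The sketch below computes them and also sums the exact binomial probabilities for the two ranges. The function names are illustrative, not from any library.

```python
import math


def binomial_mean_sd(n: int, p: float = 0.5) -> tuple[float, float]:
    """Mean n*p and standard deviation sqrt(n*p*(1-p)) of a binomial count."""
    return n * p, math.sqrt(n * p * (1 - p))


def prob_heads_between(n: int, lo: int, hi: int, p: float = 0.5) -> float:
    """Exact binomial probability that lo <= number of heads <= hi."""
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(lo, hi + 1))


mu, sigma = binomial_mean_sd(100)  # 50.0 and 5.0 for 100 fair flips
print(f"mean = {mu}, sd = {sigma}")
print(f"1.96-sd interval: {mu - 1.96 * sigma:.1f} to {mu + 1.96 * sigma:.1f}")
print(f"P(45 <= heads <= 55) = {prob_heads_between(100, 45, 55):.3f}")
print(f"P(41 <= heads <= 59) = {prob_heads_between(100, 41, 59):.3f}")
```

The exact probabilities differ a little from 68% and 95%. This is because the number of heads is a whole number and the normal curve only approximates it.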

Level 3:

Note that even with 1,040 samples of 100 flips, the distribution above is not perfectly normal: there were too many samples with more than 50 heads, creating a negative skew. Why do you suppose that is? (Write your answer below.)
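One way to explore this question is to compute the skewness of several simulated batches of 1,040 sets. The sketch below is hypothetical. Python's `random` module stands in for real coins, and the helper names are illustrative. It uses the simple moment estimator of skewness, which is the mean cubed deviation divided by the cubed standard deviation. A negative value means a longer tail on the low side.

```python
import random
from statistics import mean, pstdev


def simulate_head_counts(n_sets: int, n_flips: int, rng: random.Random) -> list[int]:
    """Number of heads in each of n_sets sets of n_flips simulated fair flips."""
    return [sum(rng.random() < 0.5 for _ in range(n_flips)) for _ in range(n_sets)]


def sample_skewness(xs: list[float]) -> float:
    """Moment estimator of skewness: mean cubed deviation / (population sd) ** 3."""
    m, s = mean(xs), pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * s**3)


rng = random.Random(2024)  # fixed seed so the batches are repeatable
for batch in range(5):
    counts = simulate_head_counts(1040, 100, rng)
    print(f"batch {batch}: skewness = {sample_skewness(counts):+.3f}")
```

For a fair coin, the theoretical skewness of the head count is exactly 0. Any nonzero value printed for a simulated batch therefore comes from sampling variation within that batch.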

Runs

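The number of runs is approximately normally distributed around a mean of 51 with a standard deviation of about 5. The runs distribution therefore looks just like the previous distribution, shifted over by 1. If you are like most people, your sequence will have more than 56 runs (one standard deviation above the mean) and maybe even more than 61 (two standard deviations above). That many runs suggests the sequence is not random: you are switching between heads and tails too often.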
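The sketch below shows one way to count runs. It makes two assumptions: a run is a maximal streak of identical outcomes (so HHTHH has three runs), and a sequence is written as a string of 'H' and 'T' characters. The z-score uses the mean of 51 and standard deviation of 5 quoted for 100 flips. `count_runs` and `runs_z_score` are illustrative names.

```python
def count_runs(flips: str) -> int:
    """Count maximal streaks of identical outcomes, e.g. 'HHTHH' has 3 runs."""
    if not flips:
        return 0
    # The first flip starts a run; every switch between outcomes starts another
    return 1 + sum(1 for a, b in zip(flips, flips[1:]) if a != b)


def runs_z_score(flips: str, expected: float = 51.0, sd: float = 5.0) -> float:
    """How many standard deviations the observed run count lies from the expected count."""
    return (count_runs(flips) - expected) / sd


# A strictly alternating sequence of 100 flips has 100 runs, far above 51
alternating = "HT" * 50
print(count_runs(alternating), runs_z_score(alternating))  # 100 9.8
```

The default 51 and 5 only apply to 100 flips with roughly equal heads and tails. For other sequences, pass in the expected count and standard deviation that match them.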

Level 3:

The formula for the mean number of runs is:

$$\mu_R = \frac{2HT}{N} + 1$$

where H = the number of heads, T = the number of tails, and N = the number of flips.

The standard deviation for the number of runs is:

$$\sigma_R = \sqrt{\frac{2HT\,(2HT - N)}{N^2\,(N - 1)}}$$
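Given H heads and T tails (N = H + T), the standard runs-test (Wald–Wolfowitz) formulas are: mean = 2HT/N + 1 and standard deviation = sqrt(2HT(2HT − N) / (N²(N − 1))). The sketch below applies them. `runs_mean_sd` is an illustrative name, not a library function.

```python
import math


def runs_mean_sd(heads: int, tails: int) -> tuple[float, float]:
    """Expected number of runs and its standard deviation, given the head and tail counts."""
    n = heads + tails
    two_ht = 2 * heads * tails
    mean = two_ht / n + 1
    sd = math.sqrt(two_ht * (two_ht - n) / (n**2 * (n - 1)))
    return mean, sd


print(runs_mean_sd(50, 50))  # mean 51.0, sd about 4.97
print(runs_mean_sd(70, 30))  # mean 43.0: fewer runs are expected when the counts are unbalanced
```

With 50 heads and 50 tails, these formulas reproduce the mean of 51 and the standard deviation of about 5 used for the runs distribution.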

Longest Run

The distribution of the longest run is determined by a very complicated formula that depends on the number of heads in the sequence (you don't even want to look at the formula!). Below you can see what the distribution looks like for 1,040 sets of 100 flips.

[Figure: distribution of the longest run]
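The formula itself is not given here, but a simulation can approximate the shape of the distribution instead. The sketch below is a Monte Carlo stand-in, not the formula the text refers to. It assumes Python's `random` module as the coin and counts the longest streak of either heads or tails. `longest_run` is an illustrative name.

```python
import random
from collections import Counter


def longest_run(flips: str) -> int:
    """Length of the longest streak of identical outcomes (heads or tails)."""
    best = current = 0
    prev = None
    for f in flips:
        # Extend the current streak if the outcome repeats, otherwise start a new one
        current = current + 1 if f == prev else 1
        best = max(best, current)
        prev = f
    return best


rng = random.Random(7)  # fixed seed so the output is repeatable
sets = ["".join(rng.choice("HT") for _ in range(100)) for _ in range(1040)]
dist = Counter(longest_run(s) for s in sets)
for length in sorted(dist):
    print(f"{length:2d} {'#' * (dist[length] // 5)}")
```

The printed histogram should show a long tail toward longer runs, which is the positive skew the text describes.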

As you can see, this distribution is not normal. Because it is positively skewed, you can't use the mean and standard deviation to work out what to expect from your imagined sequence. As a rough estimate, the longest run will be about 7. Around 68% of the time it will fall between 5 and 8, and 95% of the time between 4 and 10. If you are like most people, your longest run will be too short.
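For a fair coin, these rough figures can also be checked exactly. This calculation does not condition on the number of heads, so it is simpler than the one the text alludes to. A sequence is fixed by its first outcome plus the lengths of its runs. The number of 100-flip sequences whose longest run is at most k is therefore 2 times the number of ways to write 100 as an ordered sum of parts no larger than k. The sketch counts these ways with a small dynamic program. `prob_longest_run_at_most` is an illustrative name.

```python
from fractions import Fraction


def prob_longest_run_at_most(n: int, k: int) -> Fraction:
    """Exact P(longest run of heads or tails <= k) in n fair flips (n >= 1, k >= 1)."""
    # ways[m] = number of ordered ways to split m flips into runs of length 1..k
    ways = [1] + [0] * n
    for m in range(1, n + 1):
        ways[m] = sum(ways[m - j] for j in range(1, min(k, m) + 1))
    # Two choices for the first outcome (H or T) fix the rest of the sequence
    return Fraction(2 * ways[n], 2**n)


for k in range(3, 12):
    print(k, round(float(prob_longest_run_at_most(100, k)), 3))

p_5_to_8 = prob_longest_run_at_most(100, 8) - prob_longest_run_at_most(100, 4)
p_4_to_10 = prob_longest_run_at_most(100, 10) - prob_longest_run_at_most(100, 3)
print(f"P(5 <= longest <= 8)  = {float(p_5_to_8):.2f}")
print(f"P(4 <= longest <= 10) = {float(p_4_to_10):.2f}")
```

Exact fractions avoid rounding error in the subtraction. The `float` conversion is only for display.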

Level 3:

The difficulty with calculating the significance of the longest run is that it depends on the proportion of heads and tails in your series. If one outcome (heads or tails) is much more common than the other, you are more likely to get a long streak: the more of an outcome you have, the more ways there are to string it together into a long run. So if your proportion of heads deviates substantially from .50, your longest streak is more likely to be 8 or 9 than 7.
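One way to see this effect is to take a fixed collection of heads and tails, shuffle it many times, and average the longest run. The sketch below is a simulation under stated assumptions. `random.shuffle` stands in for the complicated formula, the helper names are illustrative, and the number of trials and the seed are arbitrary choices.

```python
import random
from statistics import mean


def longest_run(flips: list[str]) -> int:
    """Length of the longest streak of identical outcomes."""
    best = current = 0
    prev = None
    for f in flips:
        current = current + 1 if f == prev else 1
        best = max(best, current)
        prev = f
    return best


def average_longest_run(heads: int, tails: int, trials: int = 2000, seed: int = 3) -> float:
    """Average longest run over random orderings of a fixed number of heads and tails."""
    rng = random.Random(seed)
    flips = ["H"] * heads + ["T"] * tails
    results = []
    for _ in range(trials):
        rng.shuffle(flips)  # new random order, same head/tail proportion
        results.append(longest_run(flips))
    return mean(results)


for heads in (50, 60, 70):
    print(heads, round(average_longest_run(heads, 100 - heads), 2))
```

Holding the head count fixed and shuffling isolates the effect of the proportion. Order is the only thing that varies between trials.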