Project 1

Overview

The text contains tables of cumulative binomial and poisson probabilities.

Alternatively, one can produce the tables with a spreadsheet program.

The spreadsheet Tables.xls contains the cumulative binomial probabilities for n=15 and the cumulative Poisson probabilities table in the text extended to x=40.

The objective of this project is to produce a spreadsheet that can serve as a table for the geometric, negative binomial, and hypergeometric distributions (for which there are no tables in the text).

The Geometric and Negative Binomial Distributions

These two closely related distributions arise from an experiment similar to the binomial distribution:

In the special case of r=1, this is called a Geometric distribution. In this case, we conduct Bernoulli trials until the first success, and set X equal to the number of failures that precede the first success. An example would be tossing a coin until the first heads. In this case, X would be the number of tails that precede the first heads.

If r>1, it is called a Negative Binomial distribution. In this case, we conduct Bernoulli trials until we obtain r success, and set X equal to the number of failures that precede the rth success. An example would be tossing a coin until the third time it comes up heads. In this case, X would be the number of tails that precede the third heads.

For both the negative binomial and geometric distributions, the sample space S is the set of nonnegative integers: 0,1,2,3,4,. . . .

Your spreadsheet should contain:

In addition, it should compute the approximate mean and variance of X using only values less than 100. In most cases this should be very close to the exact values given by the formulas in the text on page 120. You should use the formula on page 101 for the expected value. It is probably easier to use the shortcut formula for the variance on page 105.

Your spreadsheet should be designed so that you can change the values of p and r and have the other cells recalculate automatically (i.e., the arguments to the BINOMDIST function for the number of successes required and the probability of success on each trial should be references to the cells containing r and p.

If you use excel you can use the NEGBINOMDIST function for both the geometric and negative binomial distributions. If you use GNUMERIC, you should use GEOMDIST for the geometric and NEGBINOMDIST for the negative binomial (GNUMERIC is an excellent free spreadsheet program from the Free Software Foundation at MIT).

Unlike BINOMDIST, NEGBINOMDIST does not have a "cumulative" option, so you will have to compute this with a formula.

The Hypergeometric Distribution

This distribution arises from the following experiment: Unlike the binomial and negative binomial models, because we do not replace the chip that was drawn, the probability of a red chip is not constant. This leads to the rather complicated formula on page 117.

The hypergeometric distribution arises when we select n individuals from a finite population.

It also applies to capture-recapture methods for estimating population sizes. A certain number (M) of individuals are captured, tagged, and released. Now we have a population of N individuals, M of which are tagged. Some time later, a sample of n individuals is captured and the number with tags is recorded as X. The objective is usually to estimate N.

The sample space for X is the set of integers from zero to n, inclusive: 0,1,2,. . .,n Your spreadsheet should contain:

In addition, it should compute the mean and variance of X for the given values of n, N, and M. (Hint: Use the HYPERGEOMDIST to compute the probabilities. HYPERGEOMDIST returns #NUM! for values of X that are not possible. When computing expected values, use the SUMIF function to exclude these.)

As with the negative binomial distribution, your spreadsheet should recalculate everything when you change n, N, or M.