In population genetics, two equations let us estimate the frequency of *alleles* within a population and the proportion of homozygotes versus heterozygotes for a recessive trait. They are known today as the Hardy-Weinberg equations because they were proposed independently, at about the same time, by two scientists. Like many equations, they assume a model that does not exactly reflect the real world, but they do give us an understanding of the rules of the system.

The two equations are:

**q + p = 1**

**p^{2} + 2pq + q^{2} = 1**

It’s that easy. In each of these equations p stands for the frequency of one allele in a population and q stands for the frequency of the other allele. Assuming there are only two alleles, they must add up to 100%, represented by the decimal number 1 here.
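As a worked example, here is a minimal Python sketch of how the two equations are typically used together. It assumes the second equation in its standard form, where p² and q² are the expected frequencies of the two homozygotes and 2pq is the expected frequency of heterozygotes. The function name `hardy_weinberg` and the 4% figure are illustrative assumptions, not taken from any real data set.

```python
import math

def hardy_weinberg(recessive_phenotype_freq):
    """Estimate allele and genotype frequencies from the observed
    frequency of the homozygous recessive phenotype (q^2)."""
    q = math.sqrt(recessive_phenotype_freq)  # recessive allele frequency
    p = 1.0 - q                              # from p + q = 1
    return {
        "p": p,
        "q": q,
        "homozygous_dominant": p * p,   # p^2
        "heterozygous": 2 * p * q,      # 2pq
        "homozygous_recessive": q * q,  # q^2
    }

# Hypothetical input: 4% of the population shows the recessive trait.
result = hardy_weinberg(0.04)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

With 4% of the population showing the recessive trait, q works out to 0.2 and p to 0.8, so roughly 32% of the population would be expected to be heterozygous carriers.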

In order to use these equations, certain conditions must be adhered to.

- No gene flow (immigration / emigration)
- No sexual selection
- No survival selection
- No mutations
- No genetic drift

The last one is the one that has been interesting me lately.

What is genetic drift? It describes statistical anomalies, like a run of ‘Red’ on the roulette wheel or an unexpectedly long string of ‘Heads’ when tossing a coin.

What happens during genetic drift is that one allele becomes favored just because of such a statistical swing. But unlike roulette or coin tosses, when an allele loses out for a number of generations, it stands a diminishing chance of being seen again. The statistical anomaly becomes ‘hard-coded’ and self-reinforcing, such that eventually alleles disappear.

The key is that drift has a much stronger effect in small populations, while in large populations its effect is barely noticeable. Using our coin toss example, if you toss a coin ten times, it is not especially surprising to get 8 ‘heads’ and 2 ‘tails’, whereas in 1000 tosses, getting 800 ‘heads’ is nearly inconceivable.
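To put numbers on the coin toss comparison, here is a small sketch that computes the exact binomial probability of getting at least a given number of heads with a fair coin. The helper name `prob_at_least` is my own, and it assumes Python 3.8 or later for `math.comb`.

```python
from math import comb

def prob_at_least(heads, tosses):
    """Exact probability of at least `heads` heads in `tosses` fair coin tosses."""
    # Count the outcomes with `heads` or more heads, then divide by all 2^n outcomes.
    favourable = sum(comb(tosses, k) for k in range(heads, tosses + 1))
    return favourable / 2 ** tosses

print(prob_at_least(8, 10))      # 0.0546875, i.e. about 1 time in 18
print(prob_at_least(800, 1000))  # vanishingly small
```

The same 80% skew that turns up about once in every 18 tries with ten tosses is, for practical purposes, never seen with a thousand.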

I encountered this while coding a genetics simulation program (note: my simulation uses a Wright-Fisher model that has distinct, non-overlapping generations). I wrote the program and started testing it by allowing random breeding to occur over 100 generations or so. I started with only 100 animals in my simulation, but regularly saw one allele replace all the others, meaning that the population had lost diversity.

Below is an example with 100 organisms with four alleles for the gene breeding randomly for 200 generations.

I was sure it was a problem with my algorithm. Then I started increasing the number of animals and the ‘problem’ went away.

Here’s a second experiment at the other end of the spectrum using 50,000 animals also with four alleles breeding for 200 generations. I’ve forced Excel to graph this out on the same axis.

All this, just to demonstrate to myself that the prohibition against genetic drift is actually another way of saying, “This only works with large populations.”
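For anyone who wants to try this at home, here is a minimal sketch of a neutral Wright-Fisher simulation. It is not my actual program: the function name `wright_fisher`, the equal starting frequencies, and the treatment of the population as a pool of 2N gene copies are all simplifying assumptions. The large population is also set to 5,000 rather than 50,000 just to keep the pure-Python run time short.

```python
import random
from collections import Counter

def wright_fisher(num_animals, num_alleles=4, generations=200, seed=None):
    """Simulate neutral drift with non-overlapping generations.

    Returns a list of {allele: frequency} dicts, one per generation,
    starting with the initial generation.
    """
    rng = random.Random(seed)
    copies = 2 * num_animals  # diploid: two gene copies per animal
    # Start with the alleles as evenly represented as possible.
    pool = [i % num_alleles for i in range(copies)]
    history = []
    for _ in range(generations + 1):
        counts = Counter(pool)
        history.append({a: counts.get(a, 0) / copies for a in range(num_alleles)})
        # Next generation: each gene copy is drawn at random, with
        # replacement, from the current generation's pool.
        pool = rng.choices(pool, k=copies)
    return history

for size in (100, 5_000):
    final = wright_fisher(size, seed=1)[-1]
    surviving = sum(1 for freq in final.values() if freq > 0)
    print(f"{size} animals: {surviving} of 4 alleles remain after 200 generations")
```

Because there is no mutation or migration in this sketch, an allele whose frequency hits zero can never come back, which is exactly the self-reinforcing behavior described earlier. Changing the seed gives a different run each time, and the small population should lose alleles far more often than the large one.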

What interested me is how to know whether your population is large enough to ‘resist’ genetic drift. And, how quickly will genetic drift drive alleles to fixation / loss?

“The expected number of generations for fixation to occur is proportional to the population size, such that fixation is predicted to occur much more rapidly in smaller populations.”

Not surprisingly, there is an equation designed to predict the time (# of generations) before an allele is lost by drift.

The expected time for the neutral allele to be lost through genetic drift can be calculated as

**T = -4N_{e}p ln(p) / (1 - p)**

where *T* is the number of generations, *N_{e}* is the effective population size, and *p* is the initial frequency for the given allele.
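To get a feel for the scale involved, here is a small sketch that evaluates the standard diffusion approximations for the expected time to loss and to fixation of a neutral allele (the formulas are written out in the docstrings). The function names are my own, and both results are conditional: the time to loss assumes the allele is eventually lost, and the time to fixation assumes it eventually fixes.

```python
import math

def expected_generations_to_loss(effective_size, p):
    """T_lost ~ -(4 * Ne * p / (1 - p)) * ln(p), for a neutral allele
    at initial frequency p, given that it is eventually lost."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return -4.0 * effective_size * p / (1.0 - p) * math.log(p)

def expected_generations_to_fixation(effective_size, p):
    """T_fixed ~ -(4 * Ne * (1 - p) / p) * ln(1 - p), for a neutral allele
    at initial frequency p, given that it eventually fixes."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return -4.0 * effective_size * (1.0 - p) / p * math.log(1.0 - p)

# Four alleles starting at equal frequency, as in the experiments above.
for ne in (100, 50_000):
    t_lost = expected_generations_to_loss(ne, 0.25)
    print(f"Ne = {ne}: about {t_lost:.0f} generations to loss")
```

With an initial frequency of 0.25, this gives about 185 generations for an effective population of 100 and about 92,000 generations for 50,000. That fits what I saw: in a 200-generation run, the small population has time to lose alleles and the large one does not.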

(This section is informed greatly by the work of Otto and Whitlock at the University of British Columbia, Vancouver.)

Sometimes having a computer simulation comes in handy to help get a better look at how these rules apply given different populations. I’d like to get this simulation built into a simple app for either desktop or mobile device to make public, but I have been having a lot of difficulty making the leap from a program running in the console to something worth sharing.