
4.2.1 Individuals and Populations

To simulate sexual reproduction we need to represent the entire genome of each individual in the population. One possibility is to use a pair of strings for each individual. For example, if the genome contains n loci, two strings of "0"s and "1"s, each of length n, are sufficient to represent any individual, and an array of 2k such strings will hold a population of up to k individuals.
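As a rough illustration of the string-pair representation, the sketch below stores each individual as two character arrays. The names N_LOCI, MAX_POP, StringIndividual, string_population, and string_individual_init are hypothetical, chosen only for this example.

```c
#include <string.h>

#define N_LOCI  8    /* hypothetical genome length n */
#define MAX_POP 100  /* hypothetical population capacity k */

/* One individual: two strands, each a string of '0' and '1' characters. */
typedef struct {
    char strand0[N_LOCI + 1];  /* +1 for the terminating '\0' */
    char strand1[N_LOCI + 1];
} StringIndividual;

/* Room for k individuals, i.e. 2k strings in total. */
StringIndividual string_population[MAX_POP];

/* Fill both strands with '0' characters and terminate them. */
void string_individual_init(StringIndividual *ind)
{
    memset(ind->strand0, '0', N_LOCI);
    memset(ind->strand1, '0', N_LOCI);
    ind->strand0[N_LOCI] = '\0';
    ind->strand1[N_LOCI] = '\0';
}
```

Each locus costs one full char per strand here, plus a terminator per string.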

Using two strings per individual wastes a lot of space, however: each character occupies at least eight bits but records only one of two possible alleles. Given the description of the mathematical model, another representation suggests itself: since we are modelling only two alleles, one bit per locus is enough, and we can use a "bit vector" instead of a string. In C, an unsigned integer is a convenient bit vector; on a machine with 32-bit words, a variable of type unsigned int can hold 32 loci of one strand, and a pair of such integers can represent an organism with up to 32 loci in its genome.
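A minimal sketch of the bit-vector representation follows. It uses uint32_t from <stdint.h> rather than unsigned int so that each strand is exactly 32 bits wide regardless of the machine; that substitution, and the names Individual, POP_SIZE, and population, are assumptions made for this example.

```c
#include <stdint.h>

/* One diploid individual: one bit per locus in each of two strands. */
typedef struct {
    uint32_t strand0;
    uint32_t strand1;
} Individual;

#define POP_SIZE 1000  /* hypothetical population capacity */

/* Objects with static storage duration start out with all bits 0. */
Individual population[POP_SIZE];
```

With this layout an individual with 32 loci occupies 8 bytes, whereas two 32-character strings would need at least 64 bytes before counting terminators.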

In this representation, the status of locus i is determined by the ith bit in each strand. A 0 bit will represent a wild gene, and a 1 bit will represent a mutation. As a simple example, consider a species that has only 8 loci. The individual with genes

    strand 0:  0 0 1 0 0 1 1 0
    strand 1:  0 1 1 1 0 0 0 1

would be represented in C as
unsigned char strand0;
unsigned char strand1;

strand0 = 0x26;	/* 26 in hex = 00100110 in binary */
strand1 = 0x71;	/* 71 in hex = 01110001 in binary */
In this example, counting bits from the left (most significant) end, the individual is homozygous wild at the first locus, heterozygous at the second locus (since the 2nd bit in the first strand is a 0 and the corresponding bit in the second strand is a 1), and homozygous mutant at the third locus.
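The classification just described can be sketched as a small function that counts the mutant copies at a locus. The names Genotype, N_LOCI, locus_bit, and genotype_at are hypothetical, and the sketch assumes loci are numbered from 1 starting at the most significant bit, as in the example above.

```c
#define N_LOCI 8  /* loci per strand in the 8-locus example */

/* The enum value equals the number of mutant (1) bits at the locus. */
typedef enum {
    HOMOZYGOUS_WILD   = 0,
    HETEROZYGOUS      = 1,
    HOMOZYGOUS_MUTANT = 2
} Genotype;

/* Bit for a locus, numbered 1..N_LOCI from the leftmost bit. */
static unsigned locus_bit(unsigned char strand, int locus)
{
    return (strand >> (N_LOCI - locus)) & 1u;
}

/* Classify one locus of an individual given its two strands. */
Genotype genotype_at(unsigned char strand0, unsigned char strand1, int locus)
{
    return (Genotype)(locus_bit(strand0, locus) + locus_bit(strand1, locus));
}
```

For the individual above, genotype_at(0x26, 0x71, 1) yields HOMOZYGOUS_WILD, locus 2 yields HETEROZYGOUS, and locus 3 yields HOMOZYGOUS_MUTANT.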

For situations in which we need to model a genome that is longer than the word size of a machine, we will have to replace the single unsigned integer with an array of unsigned integers. For example, to model a genome with 1 million loci on a machine with 32-bit integers we will need 1,000,000/32 = 31,250 words for each strand.
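One way to handle a multi-word strand is sketched below: the word count is the genome length divided by the word size, rounded up, and locus i is found by splitting i into a word index and a bit offset. The helper names are hypothetical, and the sketch numbers loci from 0 starting at the least significant bit of the first word, a convention chosen here for simplicity; any consistent numbering would do.

```c
#include <stddef.h>
#include <stdint.h>

#define WORD_BITS 32u  /* bits per uint32_t word */

/* Words needed to hold n_loci bits, rounded up. */
size_t words_needed(size_t n_loci)
{
    return (n_loci + WORD_BITS - 1) / WORD_BITS;
}

/* Read locus i (0-based) from a strand stored as an array of words. */
unsigned get_locus(const uint32_t *strand, size_t i)
{
    return (strand[i / WORD_BITS] >> (i % WORD_BITS)) & 1u;
}

/* Mark locus i (0-based) as mutant by setting its bit to 1. */
void set_locus(uint32_t *strand, size_t i)
{
    strand[i / WORD_BITS] |= (uint32_t)1 << (i % WORD_BITS);
}
```

For a genome of 1,000,000 loci, words_needed returns 31,250; a genome of 33 loci needs 2 words, with the last word only partly used.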

Whatever representation is chosen, the arrays should be initialized to all 0s to model an initially healthy population.
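A sketch of one way to obtain such a zeroed population uses calloc, which returns memory with all bits 0. The flat layout (both strands of each individual stored back to back in a single array of 32-bit words) and the names new_population and strand_of are assumptions made for this example.

```c
#include <stdint.h>
#include <stdlib.h>

/* Allocate zeroed storage for k individuals, two strands each, with
 * words_per_strand 32-bit words per strand. Returns NULL on failure;
 * the caller releases the memory with free(). */
uint32_t *new_population(size_t k, size_t words_per_strand)
{
    return calloc(2 * k * words_per_strand, sizeof(uint32_t));
}

/* Pointer to strand s (0 or 1) of individual j in that flat array. */
uint32_t *strand_of(uint32_t *pop, size_t words_per_strand, size_t j, int s)
{
    return pop + (2 * j + (size_t)s) * words_per_strand;
}
```

Because every bit starts at 0, every locus of every individual is wild type until the simulation introduces mutations.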