To simulate sexual reproduction we need to represent the entire genome of each individual in the population. One possibility is to use a pair of strings for each individual. For example, if the genome contains n loci, two strings of 0s and 1s, each of length n, are sufficient to represent any individual, and an array of 2k strings will hold a population of up to k individuals. Using two strings per individual wastes a lot of space, however, since each string spends a whole character on a single 0 or 1.
Given the description of the mathematical model, another representation suggests itself: since we are modelling only two alleles, we can use a "bit vector" instead of a string. In C, an unsigned integer is a convenient bit vector; on a machine with 32-bit words, a variable of type unsigned int can hold 32 loci of one strand, and a pair of such integers can represent an organism with up to 32 loci in its genome. In this representation, the status of locus i is determined by the i-th bit in each strand. A 0 bit will represent a wild gene, and a 1 bit will represent a mutation.
As a simple example, consider a species that has only 8 loci. The individual with genes

    00100110
    01110001

would be represented in C as

    unsigned char strand0;
    unsigned char strand1;

    strand0 = 0x26;   /* 26 in hex = 00100110 in binary */
    strand1 = 0x71;   /* 71 in hex = 01110001 in binary */

In this example, counting loci from the left, the individual is homozygous wild at the first locus, heterozygous at the second locus (since the 2nd bit in the first strand is a 0 and the corresponding bit in the second strand is a 1), and homozygous mutant at the third locus.
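The reading of the 8-locus example can be checked mechanically. The following is only a sketch: the names allele_at, genotype_at, NUM_LOCI and the genotype enumeration are hypothetical and do not come from the text, and it assumes that loci are numbered 1 to 8 starting from the leftmost (most significant) bit, which is the convention the example appears to use.

```c
#include <assert.h>

#define NUM_LOCI 8   /* assumption: the 8-locus species from the example */

/* Hypothetical labels for the three possible states of a locus. */
enum genotype { HOMOZYGOUS_WILD, HETEROZYGOUS, HOMOZYGOUS_MUTANT };

/* Return the bit (0 = wild, 1 = mutant) at the given locus of one strand.
   Loci are numbered from 1, with locus 1 being the leftmost bit. */
unsigned allele_at(unsigned char strand, unsigned locus)
{
    assert(locus >= 1 && locus <= NUM_LOCI);
    return (strand >> (NUM_LOCI - locus)) & 1u;
}

/* Classify a locus by counting how many of its two alleles are mutant. */
enum genotype genotype_at(unsigned char strand0, unsigned char strand1,
                          unsigned locus)
{
    unsigned mutant_count = allele_at(strand0, locus) + allele_at(strand1, locus);

    switch (mutant_count) {
    case 0:  return HOMOZYGOUS_WILD;
    case 1:  return HETEROZYGOUS;
    default: return HOMOZYGOUS_MUTANT;
    }
}
```

With strand0 = 0x26 and strand1 = 0x71, calling genotype_at for loci 1, 2 and 3 yields HOMOZYGOUS_WILD, HETEROZYGOUS and HOMOZYGOUS_MUTANT respectively, matching the description of the example. Numbering from the rightmost bit instead would work equally well, provided the choice is used consistently.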
For situations in which we need to model a genome that is longer than the word size of a machine, we will have to replace the single unsigned integer with an array of unsigned integers. For example, to model a genome with 1 million loci on a machine with 32-bit integers we will need 1,000,000 / 32 = 31,250 words for each strand.
Whatever representation is chosen, the arrays should be initialized to all 0s to model an initially healthy population.
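One possible way to handle a long genome is sketched below. It is not a definitive implementation: the function names (words_per_strand, new_strand, strand_set_mutant, strand_get) are hypothetical, the fixed-width type uint32_t is used in place of unsigned int so that the word size does not depend on the machine, and loci are numbered from 0 starting at the least significant bit of the first word, which is an arbitrary choice. Allocating with calloc gives a strand whose bits are all 0, that is, all wild.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define BITS_PER_WORD 32u   /* assumption: 32-bit words, as in the text */

/* Number of 32-bit words needed to hold num_loci bits, rounded up. */
size_t words_per_strand(size_t num_loci)
{
    return (num_loci + BITS_PER_WORD - 1) / BITS_PER_WORD;
}

/* Allocate one strand with every locus set to 0 (wild).
   Returns NULL if the allocation fails; the caller must free() the result. */
uint32_t *new_strand(size_t num_loci)
{
    return calloc(words_per_strand(num_loci), sizeof(uint32_t));
}

/* Mark the given locus (numbered from 0) as mutant by setting its bit. */
void strand_set_mutant(uint32_t *strand, size_t locus)
{
    assert(strand != NULL);
    strand[locus / BITS_PER_WORD] |= (uint32_t)1 << (locus % BITS_PER_WORD);
}

/* Return 0 (wild) or 1 (mutant) for the given locus (numbered from 0). */
unsigned strand_get(const uint32_t *strand, size_t locus)
{
    assert(strand != NULL);
    return (strand[locus / BITS_PER_WORD] >> (locus % BITS_PER_WORD)) & 1u;
}
```

For a genome of 1 million loci, words_per_strand returns 31,250, so an individual needs two such arrays, one per strand. Dividing the locus number by 32 selects the word, and the remainder selects the bit within that word.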