The initial population for a GA search is usually selected randomly, although there may be occasions when heuristic selection is appropriate (Grefenstette [34]). Within the algorithm, population selection is based on the principle of 'survival of the fittest'. The standard procedure is to define the probability of a particular solution $s$'s survival to be:

$$P_s = \frac{f_s}{\sum f}$$

if the objective function is to be maximized, where $f_s$ is the *fitness* (objective value) of solution $s$, and

$$\sum f = \sum_{j=1}^{N} f_j$$

is the total fitness of the population (of size $N$), or

$$P_s = \frac{1/f_s}{\sum_{j=1}^{N} (1/f_j)}$$

if $f$ is to be minimized. The new population is then selected by simulating the spinning of a suitably weighted roulette wheel $N$ times.
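The roulette wheel procedure just described can be sketched in Python as follows; the function name and structure are illustrative choices, not part of the original scheme, and positive fitness values (maximization) are assumed:

```python
import random

def roulette_select(population, fitness, n_copies):
    """Select n_copies solutions with probability proportional to fitness.

    Assumes all fitness values are positive (maximization case).
    """
    total = sum(fitness)
    chosen = []
    for _ in range(n_copies):
        # Spin the wheel: walk through cumulative fitness until the
        # random pointer r is passed.
        r = random.uniform(0, total)
        cumulative = 0.0
        for solution, f in zip(population, fitness):
            cumulative += f
            if cumulative >= r:
                chosen.append(solution)
                break
    return chosen
```

Each spin is an independent draw, which is the source of the high variance discussed below.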

It is clear that $f_s$ must always be positive for this scheme to be used. Its range and scaling are also important. For instance, early in a search it is possible for a few superindividuals (solutions with fitness values significantly better than average) to dominate the selection process. Various schemes have been suggested to overcome this potential danger, of which the simplest is linear scaling, whereby $f$ is rescaled through an equation of the form:

$$f' = af + b$$

Coefficients $a$ and $b$ are chosen each generation so that the average values of $f$ and $f'$ are equal, and so that the maximum value of $f'$ is a specified multiple (usually twice) of the average. Linear scaling risks the introduction of negative values of $f'$ for low-performance solutions and must, therefore, be used with caution.
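Solving the two conditions (mean preserved, maximum mapped to a multiple of the mean) gives explicit formulas for $a$ and $b$. A minimal sketch, with an illustrative clamp to guard against the negative values mentioned above:

```python
def linear_scaling(fitness, multiple=2.0):
    """Rescale fitness via f' = a*f + b so that the mean is preserved
    and the best value maps to `multiple` times the mean.

    Solving mean(f') = mean(f) and max(f') = multiple * mean(f) gives
    the coefficients below. Negative scaled values are clamped to zero,
    one simple guard (this clamp can slightly shift the mean).
    """
    n = len(fitness)
    mean = sum(fitness) / n
    fmax = max(fitness)
    if fmax == mean:          # all fitnesses equal: no scaling needed
        return list(fitness)
    a = (multiple - 1.0) * mean / (fmax - mean)
    b = mean * (fmax - multiple * mean) / (fmax - mean)
    return [max(0.0, a * f + b) for f in fitness]
```

With `fitness = [1, 2, 3, 4]` the mean 2.5 is preserved and the best solution is rescaled to 5.0, twice the mean.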

Baker [3] suggested that $P_s$ should simply be made a (linear) function of the solution's rank within the population. For example, the best solution might be allocated a survival probability of $2/N$. If this is the case, that for the worst solution is then constrained to be zero, because of the normalization condition that the survival probabilities sum to unity. This ranking scheme has been found to overcome the problems of over- or under-selection, without revealing any obvious drawbacks.
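For a population of size $N$, these constraints pin down the whole linear ranking distribution: $P(\text{rank } k) = 2(k-1)/(N(N-1))$ with rank 1 the worst. A brief sketch (the function name is mine):

```python
def rank_probabilities(n):
    """Linear ranking survival probabilities in the style of Baker's
    scheme: rank 1 is the worst solution, rank n the best.

    The best solution gets 2/n, the worst gets 0, probabilities are
    linear in rank, and they sum to one.
    """
    return [2.0 * (k - 1) / (n * (n - 1)) for k in range(1, n + 1)]
```

Note that the probabilities depend only on rank, not on the magnitude of the fitness values, which is what removes the over- and under-selection problems.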

Roulette wheel selection suffers from the
disadvantage of being a high-variance process, with the result that
there are often large differences between the actual and expected
numbers of copies made: there is no guarantee that the best
solution will be copied. De Jong [15] tested an
*elitist* scheme, which
gave just such a guarantee by enlarging the population to include a
copy of the best solution if it had not been retained. He found that
on problems with just one maximum (or minimum) the algorithm's
performance was much improved, but on multimodal problems it was
degraded.
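The elitist step itself is simple; a minimal sketch, assuming solutions can be compared for membership directly:

```python
def apply_elitism(old_best, new_population):
    """Elitism in the style tested by De Jong: enlarge the new
    population with a copy of the previous generation's best solution
    if selection failed to retain it."""
    if old_best not in new_population:
        return new_population + [old_best]
    return new_population
```

The copy is appended only when needed, so the population size grows by at most one per generation.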

Numerous schemes which introduce various levels of
determinism into the selection process have been investigated.
Overall, it seems that a procedure entitled *stochastic remainder
selection without replacement* offers the best performance. In this,
the expected number of copies of each solution is calculated as

$$e_s = N P_s = \frac{f_s}{\bar{f}}$$

where $\bar{f} = \sum f / N$ is the average fitness. Each solution is then copied $\lfloor e_s \rfloor$ times, $\lfloor e_s \rfloor$ being the integer part of $e_s$. The fractional remainder

$$r_s = e_s - \lfloor e_s \rfloor$$

is treated as the probability of further duplication. For example, a solution for which $e_s = 1.8$ would certainly be copied once and would be copied again with probability 0.8. Each solution is successively subjected to an appropriately weighted simulated coin toss until the new population is complete.
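Putting the deterministic and stochastic parts together, the procedure can be sketched as follows; the function name, the once-only use of each remainder, and the rounding guard are my own illustrative choices:

```python
import random

def stochastic_remainder_select(population, fitness):
    """Stochastic remainder selection without replacement (sketch).

    Each solution is copied int(e) times, where e = f / mean fitness;
    the fractional part of e is then used as the probability of one
    further copy, via weighted coin tosses, until the population is
    full. Positive fitness values (maximization) are assumed.
    """
    n = len(population)
    mean = sum(fitness) / n
    expected = [f / mean for f in fitness]

    # Deterministic part: integer number of copies.
    new_pop = []
    for s, e in zip(population, expected):
        new_pop.extend([s] * int(e))

    # Stochastic part: coin tosses weighted by the fractional
    # remainders, cycling through the solutions until the new
    # population is complete.
    remainders = [e - int(e) for e in expected]
    i = 0
    while len(new_pop) < n:
        if all(r == 0.0 for r in remainders):
            # Rounding guard: fill any leftover slots with the fittest.
            best = population[fitness.index(max(fitness))]
            new_pop.extend([best] * (n - len(new_pop)))
            break
        if random.random() < remainders[i]:
            new_pop.append(population[i])
            remainders[i] = 0.0  # "without replacement": remainder used once
        i = (i + 1) % n
    return new_pop
```

Since the $e_s$ sum to $N$, the deterministic copies never overfill the population, and the remainders supply exactly the expected number of extra copies on average.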