The initial population for a GA search is usually selected randomly, although there may be occasions when heuristic selection is appropriate (Grefenstette). Within the algorithm, population selection is based on the principle of `survival of the fittest.' The standard procedure is to define the probability of a particular solution $i$'s survival to be

$$p_i = \frac{f_i}{F}$$

if the objective function is to be maximized, where $f_i$ is the fitness (objective value) of solution $i$, and

$$F = \sum_{j=1}^{N} f_j$$

is the total fitness of the population (of size $N$), or

$$p_i = \frac{F - f_i}{(N-1)F}$$

if $f$ is to be minimized. The new population is then selected by simulating the spinning of a suitably weighted roulette wheel $N$ times.
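The roulette-wheel procedure for the maximization case can be sketched as follows; the function name, solution labels, and fitness values are illustrative, not from the source:

```python
import random

def roulette_wheel_select(population, fitnesses):
    """Draw one solution with probability proportional to its fitness
    (maximization case: p_i = f_i / F)."""
    total = sum(fitnesses)                # F, the total population fitness
    spin = random.uniform(0.0, total)     # where the wheel pointer stops
    running = 0.0
    for solution, f in zip(population, fitnesses):
        running += f                      # each solution owns a slice of width f_i
        if spin <= running:
            return solution
    return population[-1]                 # guard against floating-point rounding

# The new population is formed by spinning the wheel N times.
population = ["a", "b", "c", "d"]
fitnesses = [10.0, 20.0, 30.0, 40.0]
new_population = [roulette_wheel_select(population, fitnesses)
                  for _ in range(len(population))]
```

Note that fitter solutions occupy wider slices of the wheel, so they are more likely to be copied, but every solution with positive fitness retains some chance of survival.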
It is clear that $f_i$ must always be positive for this scheme to be used. Its range and scaling are also important. For instance, early in a search it is possible for a few superindividuals (solutions with fitness values significantly better than average) to dominate the selection process. Various schemes have been suggested to overcome this potential danger, of which the simplest is linear scaling, whereby $f_i$ is rescaled through an equation of the form:

$$f_i' = a f_i + b$$
Coefficients $a$ and $b$ are chosen each generation so that the average values of $f$ and $f'$ are equal, and so that the maximum value of $f'$ is a specified multiple (usually twice) of the average. Linear scaling risks the introduction of negative values of $f'$ for low-performance solutions and must, therefore, be used with caution.
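Solving the two conditions above (average preserved, maximum equal to $c$ times the average) gives the coefficients directly; a minimal sketch, with the function name and sample fitnesses chosen here for illustration:

```python
def linear_scaling(fitnesses, c=2.0):
    """Rescale f' = a*f + b so that avg(f') == avg(f) and
    max(f') == c * avg(f); c = 2 is the usual choice."""
    f_avg = sum(fitnesses) / len(fitnesses)
    f_max = max(fitnesses)
    if f_max == f_avg:                    # all fitnesses equal: nothing to scale
        return list(fitnesses)
    # From a*f_avg + b = f_avg and a*f_max + b = c*f_avg:
    a = (c - 1.0) * f_avg / (f_max - f_avg)
    b = f_avg * (1.0 - a)
    return [a * f + b for f in fitnesses]  # may go negative for poor solutions

scaled = linear_scaling([1.0, 2.0, 3.0, 10.0])
```

For the sample values the average (4.0) is preserved and the best scaled fitness is exactly twice the average; with a sufficiently poor solution in the population, the same formula would drive its scaled fitness below zero, which is the danger noted above.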
Baker suggested that $p_i$ should simply be made a (linear) function of the solution's rank within the population. For example, the best solution might be allocated a survival probability of $2/N$. If this is the case, that for the worst solution is then constrained to be zero, because of the normalization condition that the survival probabilities sum to unity. This ranking scheme has been found to overcome the problems of over- or underselection, without revealing any obvious drawbacks.
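A sketch of this linear ranking scheme (maximization, no tie handling; the function name is my own): with rank $0$ for the worst solution and $N-1$ for the best, $p_i = 2\,\mathrm{rank}_i / (N(N-1))$ is linear in rank, sums to unity, and gives the best solution exactly $2/N$.

```python
def ranking_probabilities(fitnesses):
    """Linear ranking: survival probability is a linear function of rank,
    with the best solution at 2/N and the worst at 0."""
    n = len(fitnesses)
    order = sorted(range(n), key=lambda i: fitnesses[i])  # worst ... best
    probs = [0.0] * n
    for rank, i in enumerate(order):      # rank 0 = worst, n-1 = best
        probs[i] = 2.0 * rank / (n * (n - 1))
    return probs

p = ranking_probabilities([5.0, 1.0, 3.0, 100.0])
# the superindividual with fitness 100 gets only p = 2/N = 0.5,
# however large its raw fitness advantage
```

This is why ranking sidesteps the superindividual problem: selection pressure depends only on relative order, not on the magnitude of the fitness values.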
Roulette wheel selection suffers from the disadvantage of being a high-variance process, with the result that there are often large differences between the actual and expected numbers of copies made: there is no guarantee that the best solution will be copied. De Jong tested an elitist scheme, which gave just such a guarantee by enlarging the population to include a copy of the best solution if it had not been retained. He found that on problems with just one maximum (or minimum) the algorithm's performance was much improved, but on multimodal problems it was degraded.
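The elitist step can be sketched in a few lines (maximization case; the function name and toy data are illustrative only):

```python
def elitist_step(old_pop, new_pop, fitness):
    """If the best solution of the old population did not survive selection,
    enlarge the new population with a copy of it."""
    best = max(old_pop, key=fitness)
    if best not in new_pop:
        new_pop = new_pop + [best]        # population temporarily grows by one
    return new_pop

# The best old solution (5) was lost during selection, so it is re-inserted.
pop = elitist_step([1, 5, 3], [1, 3, 3], fitness=lambda x: x)
```

The guarantee is exactly what helps on unimodal landscapes and hurts on multimodal ones: the preserved incumbent can pull the whole population toward a single peak.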
Numerous schemes that introduce various levels of determinism into the selection process have been investigated. Overall, it seems that a procedure entitled stochastic remainder selection without replacement offers the best performance. In this, the expected number of copies of each solution $i$ is calculated as

$$e_i = N p_i$$

Each solution is then copied $\lfloor e_i \rfloor$ times, $\lfloor e_i \rfloor$ being the integer part of $e_i$. The fractional remainder

$$r_i = e_i - \lfloor e_i \rfloor$$

is treated as the probability of further duplication. For example, a solution for which $e_i = 1.8$ would certainly be copied once and would be copied again with probability 0.8. Each solution is successively subjected to an appropriately weighted simulated coin toss until the new population is complete.
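The whole procedure can be sketched as follows (maximization case; function name and sample data are illustrative, and "without replacement" is implemented by zeroing a remainder once it succeeds):

```python
import random

def stochastic_remainder_selection(population, fitnesses):
    """Stochastic remainder selection without replacement: solution i is
    copied floor(e_i) times, where e_i = N * p_i, then receives at most one
    further copy with probability equal to the fractional remainder."""
    n = len(population)
    total = sum(fitnesses)
    expected = [n * f / total for f in fitnesses]   # e_i = N * p_i
    new_pop = []
    # Deterministic part: floor(e_i) guaranteed copies of each solution.
    for sol, e in zip(population, expected):
        new_pop.extend([sol] * int(e))
    # Stochastic part: cycle through weighted coin tosses on the remainders,
    # removing a remainder from play once it yields a copy.
    remainders = [(sol, e - int(e)) for sol, e in zip(population, expected)]
    idx = 0
    while len(new_pop) < n:
        sol, r = remainders[idx % n]
        if random.random() < r:
            new_pop.append(sol)
            remainders[idx % n] = (sol, 0.0)        # without replacement
        idx += 1
    return new_pop

# With fitnesses [10, 20, 30, 40], e = [0.4, 0.8, 1.2, 1.6]: "c" and "d"
# are each copied once deterministically, then the remainders fill the rest.
res = stochastic_remainder_selection(["a", "b", "c", "d"], [10.0, 20.0, 30.0, 40.0])
```

Because the integer parts are allocated deterministically, the variance of the copy counts is much lower than under pure roulette-wheel sampling, which is the scheme's main attraction.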