One of the advantages of this style of parallel machine organization is a savings in the amount of logic. Anywhere from 20% to 50% of the logic on a typical processor chip is devoted to control, namely to fetching, decoding, and scheduling instructions. The remainder is used for on-chip storage (registers and cache) and the logic required to implement the data processing (adders, multipliers, etc.). In an SIMD machine, only one control unit fetches and processes instructions, so more logic can be dedicated to arithmetic circuits and registers. For example, 32 PEs fit on one chip in the MasPar MP-1, and a 1024- processor system is built from 32 chips, all of which fit on a single board (the control unit occupies a separate board).
Vector processing is performed on an SIMD machine by distributing elements of vectors across all data memories. For example, suppose we have two vectors, a and b, and a machine with 1024 PEs. We would store in location 0 of memory i and in location 1 of memory i. To add a and b, the machine would tell each PE to load the contents of location 0 into one register, the contents of location 1 into another register, add the two registers, and write the result. As long as the number of PEs is greater than the length of the vectors, vector processing on an SIMD machine is done in constant time, i.e. it does not depend on the length of the vectors. Vector operations on a pipelined SISD vector processor, however, take time that is a linear function of the length of the vectors.