A good example of a superscalar processor is the IBM RS/6000 . There are three major subsystems in this processor: the instruction fetch unit, an integer processor, and a floating point processor. The instruction fetch unit is a 2- stage pipeline; during the first stage a packet of four instructions is fetched from an instruction cache, and in the second stage instructions are routed to the integer processor and/or floating point processor. An interesting feature of this instruction unit is that it executes branch instructions itself so that in a tight loop there is effectively no overhead from branching since the instruction unit executes branches while the data units are computing values. The integer unit is a four-stage pipeline. In addition to executing data processing instructions this unit does some preprocessing for the floating point unit. The floating point unit itself is six stages deep.
The following example from  shows the potential of this style of computing. This code from a computer graphics application rotates and displaces a set of (x,y) pairs by an angle q and displacement ():