Use a Gantt chart to illustrate the effectiveness of vector chaining. Use the following parameters: each vector is 64 elements long, the multiply pipeline is 7 stages deep, the add pipeline is 6 stages deep, and there is a one-cycle delay (called the chain slot time) between the production of the first product and the moment the first pair of operands goes to the adder. How many cycles does it take to compute V4 = V3 + (V0 * V1) without chaining? With chaining?
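One way to work the exercise is to lay out when each functional unit is busy. The Python script below builds those schedules under a simple timing model and prints a text Gantt chart for each case. The model is an assumption, not something the exercise states:

- Cycles are numbered from 1.
- Each pipeline accepts one element per cycle.
- An element entering a d-stage pipeline in cycle c produces its result at the end of cycle c + d − 1.
- Without chaining, the add issues in the cycle after the last product appears, with no extra write-back delay.

The helper names (`schedule`, `total_cycles`, `gantt`) are illustrative only.

```python
# Sketch: cycle counts and a text Gantt chart for V4 = V3 + (V0 * V1).
# ASSUMED timing model: cycles start at 1, one element enters a pipeline
# per cycle, and an element entering a depth-d pipeline in cycle c leaves
# at the end of cycle c + d - 1. No write-back delay is modeled when
# chaining is off.

VLEN = 64        # elements per vector
MUL_DEPTH = 7    # multiply pipeline stages
ADD_DEPTH = 6    # add pipeline stages
CHAIN_SLOT = 1   # idle cycles between first product and first adder issue


def last_result_cycle(start: int, depth: int, n: int) -> int:
    """Cycle in which the last of n elements leaves a depth-stage pipeline."""
    return start + (n - 1) + (depth - 1)


def schedule(chained: bool) -> dict[str, tuple[int, int]]:
    """Return {unit: (first issue cycle, last result cycle)}."""
    mul_start = 1
    mul_end = last_result_cycle(mul_start, MUL_DEPTH, VLEN)
    if chained:
        # The first product leaves the multiplier at the end of this cycle;
        # after the chain slot, the adder takes its first operand pair.
        first_product = mul_start + MUL_DEPTH - 1
        add_start = first_product + CHAIN_SLOT + 1
    else:
        # Without chaining, the add waits until all of V0*V1 is produced.
        add_start = mul_end + 1
    add_end = last_result_cycle(add_start, ADD_DEPTH, VLEN)
    return {"MUL": (mul_start, mul_end), "ADD": (add_start, add_end)}


def total_cycles(chained: bool) -> int:
    """Cycle in which the final element of V4 is produced."""
    return max(end for _, end in schedule(chained).values())


def gantt(chained: bool, scale: int = 4) -> str:
    """Text Gantt chart; each character covers `scale` cycles."""
    horizon = total_cycles(chained)
    rows = []
    for unit, (start, end) in schedule(chained).items():
        cells = []
        for lo in range(1, horizon + 1, scale):
            hi = lo + scale - 1
            # Mark the cell busy if any cycle in [lo, hi] is occupied.
            cells.append("#" if start <= hi and end >= lo else ".")
        rows.append(f"{unit:<4}|{''.join(cells)}| cycles {start}-{end}")
    return "\n".join(rows)


if __name__ == "__main__":
    for chained in (False, True):
        label = "with chaining" if chained else "without chaining"
        print(f"{label}: {total_cycles(chained)} cycles")
        print(gantt(chained))
        print()
```

Under this model, the unchained case takes (7 + 63) + (6 + 63) = 139 cycles. The chained case takes 7 + 1 + 6 + 63 = 77 cycles, because the adder overlaps almost entirely with the multiplier. Other counting conventions, such as charging n + d cycles per pipeline or adding a register write-back cycle before an unchained add, shift these totals by a cycle or two. The roughly 1.8× gap between the two cases does not change.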
Figure 22: A single vertex in a 3D cube.