The process of fetching the next instruction while the current instruction is being executed is called "pipelining". Processors support pipelining to increase the speed of program execution: several operations take place simultaneously rather than serially, which increases throughput. The pipeline has three stages, fetch, decode and execute, as shown in the figure.
The three stages used in the pipeline are:
(i) Fetch : In this stage the ARM processor fetches the instruction from the memory.
(ii) Decode : In this stage the processor recognizes the instruction that is to be executed.
(iii) Execute : In this stage the processor processes the instruction and writes the result back to the desired register.
If these three stages of execution are overlapped, we achieve a higher speed of execution. Such a pipeline is used in the ARM7 family of processors. Once the pipeline is filled, each instruction requires one cycle to complete execution. The figure below shows a three-stage pipelined instruction sequence.
In the first cycle, the processor fetches instruction 1 from the memory. In the second cycle, the processor fetches instruction 2 from the memory and decodes instruction 1. In the third cycle, the processor fetches instruction 3 from memory, decodes instruction 2 and executes instruction 1. In the fourth cycle, the processor fetches instruction 4, decodes instruction 3 and executes instruction 2. Each instruction thus takes three cycles to pass through the pipeline, but once the pipeline is full one instruction completes in every cycle, i.e. the pipeline delivers a throughput of one instruction per cycle.
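The cycle-by-cycle overlap described above can be reproduced with a short simulation. This is only a minimal sketch: it assumes an ideal pipeline in which every stage takes exactly one cycle and nothing stalls, and the name pipeline_schedule is chosen here purely for illustration.

```python
STAGES = ("fetch", "decode", "execute")


def pipeline_schedule(num_instructions):
    """Return one dict per clock cycle, mapping stage name -> instruction number.

    Assumes an ideal pipeline: instruction i (numbered from 1) is fetched in
    cycle i, decoded in cycle i + 1 and executed in cycle i + 2.
    """
    total_cycles = num_instructions + len(STAGES) - 1
    schedule = []
    for cycle in range(1, total_cycles + 1):
        row = {}
        for offset, stage in enumerate(STAGES):
            instruction = cycle - offset
            # Only record the stage if a real instruction occupies it.
            if 1 <= instruction <= num_instructions:
                row[stage] = instruction
        schedule.append(row)
    return schedule


# Print which instruction sits in which stage during each cycle.
for cycle, row in enumerate(pipeline_schedule(4), start=1):
    print(cycle, row)
```

For four instructions, the third row of the schedule has instruction 3 in fetch, instruction 2 in decode and instruction 1 in execute, matching the third cycle in the description.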
In the case of a multi-cycle instruction, as shown in Fig. 9.10.2(b), instruction 2 (i.e. STR, the store instruction) requires 4 clock cycles, and hence the pipeline stalls for one clock cycle. The first instruction completes execution in the third clock cycle, while the second instruction, instead of completing execution in the fourth clock cycle, completes in the fifth. Thereafter every instruction completes execution in one clock cycle, as seen in this figure.
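The effect of the stall on completion times can be checked with a small calculation. The sketch below assumes a strictly in-order three-stage pipeline in which only the execute stage can take more than one cycle; completion_cycles and its argument are hypothetical names used for illustration.

```python
def completion_cycles(execute_cycles):
    """Return the clock cycle in which each instruction finishes executing.

    execute_cycles[i] is the number of cycles instruction i spends in the
    execute stage. Fetch and decode are assumed to take one cycle each.
    """
    finished = []
    previous_done = 0
    for index, cycles in enumerate(execute_cycles):
        # Without stalls, instruction index (0-based) reaches execute in
        # cycle index + 3: fetched in index + 1, decoded in index + 2.
        earliest_start = index + 3
        # It must also wait until the previous instruction leaves execute.
        start = max(earliest_start, previous_done + 1)
        previous_done = start + cycles - 1
        finished.append(previous_done)
    return finished


# Instruction 2 (the STR) spends two cycles in execute, four cycles in total.
print(completion_cycles([1, 2, 1, 1]))  # [3, 5, 6, 7]
```

The result agrees with the description: the first instruction completes in cycle 3, the store in cycle 5 instead of cycle 4, and each later instruction completes one cycle after the one before it.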
Increasing the number of stages in the pipeline reduces the amount of work done in each stage, so the processor can be operated at a higher clock frequency, which improves performance.
However, because more cycles are required to fill a longer pipeline, the system latency also increases. Data dependencies between instructions in different stages also become more likely as the number of pipeline stages increases, so instructions need to be scheduled carefully when writing code in order to reduce these dependencies.
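The trade-off between pipeline depth, latency and throughput can be made concrete with the usual idealized cycle count. The sketch below assumes every stage takes one clock cycle and no stalls occur, so n instructions on a k-stage pipeline need k + n - 1 cycles; the function names are illustrative only, and the speedup is measured against a hypothetical non-pipelined machine that spends k cycles on every instruction.

```python
def ideal_cycles(num_instructions, num_stages):
    """Cycles needed to run the instructions on an ideal pipeline.

    The first instruction needs num_stages cycles to pass through the
    pipeline (the fill latency); each later one completes a cycle after it.
    """
    return num_stages + num_instructions - 1


def ideal_speedup(num_instructions, num_stages):
    """Speedup over a non-pipelined machine taking num_stages cycles each."""
    unpipelined = num_instructions * num_stages
    return unpipelined / ideal_cycles(num_instructions, num_stages)


# A deeper pipeline takes longer to deliver its first result...
print(ideal_cycles(1, 3), ideal_cycles(1, 5))
# ...but for long instruction streams the speedup approaches the stage count.
print(ideal_speedup(1000, 3), ideal_speedup(1000, 5))
```

In practice the gain is smaller than this ideal figure, because stalls caused by multi-cycle instructions and data dependencies are ignored here.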
The organization of an ARM processor with a three-stage pipeline consists of the following:
1. Register bank: This includes the various registers seen in the programmer's model of ARM.
2. Barrel shifter: This is the barrel shifter discussed earlier in this chapter, used to shift or rotate operands.
3. ALU: This unit performs the various arithmetic and logical operations.
4. Address register and incrementer: The address register stores the address, and the incrementer increments it so that it points to the next instruction.
5. Data register: This is used as a buffer to hold data being written to or read from the memory.
6. Instruction decoder: As the name says, it decodes the instruction. It is associated with the control logic, which issues the control signals accordingly.
The figure shows the implementation of three-stage pipelining in ARM, and the flow of an instruction through it follows the three stages of the pipeline. To fetch an instruction, the PC (at the top) supplies the address to the address register, and the address is incremented by the incrementer so that the PC points to the next instruction. The instruction is fetched through the data bus (at the bottom) and is given to the instruction decode and control unit. This unit decodes the instruction, which is the second stage of the pipeline. In the third stage, the instruction is executed by the ALU, multiplier and barrel shifter, using the registers from the register bank.
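The fetch path through the address register and incrementer can be sketched in a few lines. This is a simplified model, not the actual hardware: the class FetchUnit, the dictionary used as memory and the mnemonics stored in it are all hypothetical, and the step of 4 assumes 32-bit ARM instructions occupying four bytes each.

```python
class FetchUnit:
    """Toy model of the address register and incrementer on the fetch path."""

    def __init__(self, memory, start_address=0, instruction_size=4):
        self.memory = memory                      # address -> instruction
        self.address_register = start_address     # address of next fetch
        self.instruction_size = instruction_size  # assumed 4 bytes (ARM state)

    def fetch(self):
        """Read the instruction at the current address, then advance."""
        instruction = self.memory[self.address_register]
        # The incrementer moves the address on to the next instruction.
        self.address_register += self.instruction_size
        return instruction


# Hypothetical program: three instructions at consecutive word addresses.
program = {0: "MOV", 4: "ADD", 8: "STR"}
unit = FetchUnit(program)
fetched = [unit.fetch() for _ in range(3)]
print(fetched, unit.address_register)
```

Each call to fetch returns the instruction that would be handed to the instruction decoder, while the address register already points to the following instruction.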