Analog Devices ee-123 Application Notes

Engineer To Engineer Note
EE-123
Technical Notes on using Analog Devices’ DSP components and development tools
Phone: (800) ANALOG-D, FAX: (781) 461-3010, EMAIL: dsp.support@analog.com, FTP: ftp.analog.com, WEB: www.analog.com/dsp
An Overview of the ADSP-219x Pipeline
Last modified 10/13/00
This tech-note is intended to provide a brief description of the salient features of the ADSP-219x pipeline to users and programmers of the ADSP-219x, to help them to optimize their programs to maximize performance and throughput. Specific cases that illustrate the pipeline operation (such as loops, jumps, and interrupt calls) will be explained.
Knowledge of ADSP-2100 family assembly language (which is identical to ADSP-219x assembly) is assumed. For more detailed information on the ADSP-219x pipeline, please refer to the ADSP-219x User’s Manual.
Introduction
The ADSP-219x has a six-stage instruction pipeline comprising the Look-ahead, Pre-fetch, Fetch, Address generation, Decode, and Execute stages. During any given cycle, therefore, up to six instructions may be in different stages of completion. Note that because this is not an execution pipeline, the DSP core executes every instruction in a single core cycle. A two-stage memory pipeline is incorporated within the instruction pipeline: it takes two core cycles for data to be available on the data bus after the address has been placed on the address bus.
The additional depth in the pipeline allows a much higher operating speed for the processor core. The functions of the different stages of the pipeline are, briefly:
Look-ahead. In the look-ahead stage, the pipeline places an instruction address on the PMA bus.
The look-ahead stage is also used by the sequencer in resolving bus conflicts. The two stages of the pipeline that can use the PMA bus are look-ahead and address generation. If both need the bus in the same cycle, the sequencer checks the instruction cache to determine whether the instruction that causes the conflict has already been cached. If the instruction was cached, the sequencer takes the look-ahead instruction from the instruction cache, allowing the address generation stage to use the PMA bus. If the instruction was not cached, the address generation stage gets priority on the PMA bus in the current cycle, and the program sequencer gets the PMA bus in the next cycle.
Pre-fetch. The pre-fetch stage is essentially spent waiting for memory accesses that were initiated in the look-ahead stage.
Fetch. In this stage, the instruction that was “looked-ahead” two cycles ago is fetched from program memory, over the PMD bus.
Copyright 2000, Analog Devices, Inc. All rights reserved. Analog Devices assumes no responsibility for customer product design or the use or application of customers’ products or for any infringements of patents or rights of others which may result from Analog Devices assistance. All trademarks and logos are property of their respective holders. Information furnished by Analog Devices Applications and Development Tools Engineers is believed to be accurate and reliable, however no responsibility is assumed by Analog Devices regarding the technical accuracy of the content provided in all Analog Devices’ Engineer-to-Engineer Notes.
Address generation. In this stage, certain parts of the instruction are decoded, such as DAG operations. If any memory data is required by the instruction, the address for this data is placed on the appropriate address bus (PMA for PM data and DMA for DM data).
Decode. This stage is used by the processor to decode the rest of the instruction and to set up the computational units. It is also spent waiting for memory accesses to occur.
Execute. In this stage, the instruction is executed, status conditions are set, and results and outputs are written to the appropriate destination.
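The PMA bus conflict rule described under the look-ahead stage can be sketched as a small decision function. This is only an illustrative model in Python, not a description of the actual sequencer logic: the names `PmaDecision` and `arbitrate_pma` are hypothetical, and the behaviour when there is no conflict is an assumption, since the text above only covers the conflict case.

```python
from typing import NamedTuple, Optional


class PmaDecision(NamedTuple):
    pma_owner: Optional[str]           # stage that drives the PMA bus this cycle
    instruction_source: Optional[str]  # where the look-ahead instruction comes from
    lookahead_delayed: bool            # True if the sequencer must wait for the next cycle


def arbitrate_pma(lookahead_request: bool, addrgen_request: bool, cached: bool) -> PmaDecision:
    """Decide who uses the PMA bus in one cycle (hypothetical model)."""
    if lookahead_request and addrgen_request:
        if cached:
            # Conflict, cache hit: the instruction comes from the cache,
            # which frees the PMA bus for the address generation stage.
            return PmaDecision("address generation", "cache", False)
        # Conflict, cache miss: address generation wins this cycle and
        # the sequencer gets the PMA bus in the next cycle.
        return PmaDecision("address generation", None, True)
    if addrgen_request:
        return PmaDecision("address generation", None, False)
    if lookahead_request:
        # Assumption: with no conflict the sequencer simply uses the bus.
        return PmaDecision("look-ahead", "program memory", False)
    return PmaDecision(None, None, False)
```

For example, `arbitrate_pma(True, True, False)` reports that the look-ahead access is delayed by one cycle, which is the case a programmer would want to avoid in a tight inner loop.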
Syntax and Terminology
For the remainder of the document, the following syntax will be followed in pipeline diagrams and discussions. The stages of the pipeline will be indicated by their first letters in boldface upper case, addresses will be indicated by the prefix ‘a’ (e.g., a1, a2, a3, ...), while instructions will be prefixed by ‘i’ (e.g., i1, i2, i3, ...). The pipeline blocks will contain the corresponding instructions as they traverse the pipeline. Blocks that are crossed out represent instances where the sequencer has invalidated the contents of that pipeline stage and effectively replaced that instruction with a NOP. Blocks that are grayed out represent a cache lookup.
Cache hit. A cache hit occurs when the program sequencer determines that the address for an instruction is present in the instruction cache. The result of a cache hit is the sequencer gets the instruction from the cache instead of fetching it from program memory, thereby freeing up the PMA bus for another access.
Cache miss. A cache miss occurs when the address looked up by the sequencer is not present in the instruction cache. In the event of a cache miss, the sequencer has to fetch the instruction from memory.
In relevant cases, the pipeline diagram will be accompanied by a block diagram describing the contents of the Program Memory Address (PMA), Program Memory Data (PMD), Data Memory Address (DMA), and Data Memory Data (DMD) buses in that cycle. For example, consider the following sequence of consecutive instructions and addresses. Figure 1(a) shows the pipeline diagram, while Figure 1(b) shows the contents of the four buses. (For the sake of simplicity, the contents of the DMA and DMD buses have been left out.)
Address  Instruction
a1       i1
a2       i2
a3       i3
a4       i4
a5       i5
a6       i6
a7       i7
Notes on using Analog Devices’ DSP, audio, & video components from the Computer Products Division
Phone: (800) ANALOG-D or (781) 461-3881, FAX: (781) 461-3010, EMAIL: dsp.support@analog.com
PIPELINE STAGE (rows) vs. CLOCK CYCLES, TIME (columns)
L  i6  i7  :   :   :   :
P  i5  i6  i7  :   :   :
F  i4  i5  i6  i7  :   :
A  i3  i4  i5  i6  i7  :
D  i2  i3  i4  i5  i6  i7
E  i1  i2  i3  i4  i5  i6
Figure 1(a)
PMA a6 a7 a8 .. .. ..
PMD i4 i5 i6 .. .. ..
DMA
DMD
Figure 1(b)
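The stage occupancy in Figure 1(a) follows a simple rule: in every cycle, each stage holds the instruction one position ahead of the stage after it. The Python sketch below regenerates the rows of the figure for straight-line code under that rule. The function names are hypothetical. The bus rows are derived on the assumption that PMA carries the address of the instruction in look-ahead and PMD carries the instruction in fetch, which matches the first entries of Figure 1(b). The sketch stops at the last listed instruction, so it prints ".." where the figure continues to a8.

```python
STAGES = ["L", "P", "F", "A", "D", "E"]  # Look-ahead ... Execute


def straight_line_pipeline(num_instructions=7, cycles=6):
    """Return {stage: [contents per cycle]} for consecutive instructions i1..iN."""
    rows = {stage: [] for stage in STAGES}
    for cycle in range(cycles):
        for position, stage in enumerate(STAGES):
            # Execute holds i1 in the first cycle; every earlier stage
            # is one instruction further ahead.
            index = cycle + (len(STAGES) - position)
            rows[stage].append(f"i{index}" if index <= num_instructions else ":")
    return rows


def bus_contents(rows):
    """Derive PMA/PMD rows (assumed: PMA follows stage L, PMD follows stage F)."""
    pma = [cell.replace("i", "a") if cell != ":" else ".." for cell in rows["L"]]
    pmd = [cell if cell != ":" else ".." for cell in rows["F"]]
    return pma, pmd


rows = straight_line_pipeline()
for stage in STAGES:
    print(stage, " ".join(rows[stage]))
```

Changing `num_instructions` or `cycles` gives the same kind of diagram for longer code sequences, which can help when counting cycles by hand.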
Case 1. Latencies on Jumps/Calls
In all, the ADSP-219x supports five varieties of delayed and non-delayed jumps and calls. These include a 13-bit conditional jump/call (type 10) and a 16-bit unconditional jump/call (type 10a), both of which use addresses relative to the current PC rather than absolute addresses. The range of relative addresses is -4096 to +4095 for the type 10 instruction and -32768 to +32767 for the type 10a instruction.
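The two ranges quoted above are simply the ranges of 13-bit and 16-bit two's-complement offsets. A minimal Python sketch of the arithmetic follows; the helper names are hypothetical and the offset is assumed to be measured from the current PC, as stated in the text.

```python
def signed_range(bits):
    """Smallest and largest value of a two's-complement field of the given width."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


def fits_type10(offset):
    """True if the offset fits the 13-bit relative field (type 10)."""
    low, high = signed_range(13)
    return low <= offset <= high


def fits_type10a(offset):
    """True if the offset fits the 16-bit relative field (type 10a)."""
    low, high = signed_range(16)
    return low <= offset <= high
```

An offset of +5000, for example, is out of reach for a type 10 branch but fits a type 10a branch.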
In addition, the ADSP-219x also supports a delayed or non-delayed conditional Indirect Jump/Call (where the address to jump or call is passed in a DAG index register). Note that the destination address is the absolute address contained in the DAG register, with the 8 MSBs of the destination address taken from the corresponding page register.
e.g.,
IF NE CALL (I4);       // make sure you set up IJPG and I4 before you execute this instruction
IF AV JUMP (I5) (db);  // same holds true for this instruction
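The way the destination of an indirect jump or call is formed can be illustrated with a short Python sketch. It assumes a 24-bit address whose upper 8 bits come from the page register (IJPG) and whose lower 16 bits come from a 16-bit DAG index register. The 16-bit register width and the function name are assumptions made for the illustration.

```python
def indirect_target(ijpg, index_register):
    """Combine an 8-bit page value and a 16-bit index register into a 24-bit address."""
    if not 0 <= ijpg <= 0xFF:
        raise ValueError("page register value must fit in 8 bits")
    if not 0 <= index_register <= 0xFFFF:
        raise ValueError("index register value must fit in 16 bits")
    # The page register supplies the 8 MSBs, the index register the 16 LSBs.
    return (ijpg << 16) | index_register
```

With a page value of 0x01 and an index register value of 0x2345, the sketch gives the address 0x012345, which shows why both registers must be set up before the branch executes.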
Finally, the ADSP-219x also supports a 2-instruction, conditional, non-delayed absolute jump/call. The absolute 24-bit address is specified in the instruction. When a jump is used, the linker decides whether the absolute address needs to be specified; whenever possible, the shorter, faster relative jumps/calls will be used. To force the linker to use the absolute jump, prefix the jump with an “L”.
e.g.,
LJUMP foo;
LJUMP 0xFF0000;
The effect of the instruction pipeline is to introduce a latency of 4 core processor cycles for both conditional and unconditional jumps and calls if the branch is taken, and no latency if the branch is not taken. Figures 2(a) and 2(b) illustrate an example case of the branch taken and the branch not taken, respectively.
Address  Instruction
a1       i1: MR=MR+MX0*MY0 (SS);
a2       i2: IF COND JUMP aa1;
a3       i3
a4       i4
a5       i5
a6       i6
a7       i7
...      ...
aa1      ii1
CLOCK CYCLES
L  i6  ii1 ii2 :   :
P  i5  i6  ii1 ii2 :   :
F  i4  i5  i6  ii1 ii2 :
A  i3  i4  i5  i6  ii1 ii2 :
D  i2  i3  i4  i5  i6  ii1 ii2
E  i1  i2  i3  i4  i5  i6  ii1
Figure 2(a) Branch taken
CLOCK CYCLES
L  i6  i7  :   :   :
P  i5  i6  i7  :   :   :
F  i4  i5  i6  i7  :   :
A  i3  i4  i5  i6  i7  :
D  i2  i3  i4  i5  i6  i7
E  i1  i2  i3  i4  i5  i6
Figure 2(b) Branch not taken
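The cost of a non-delayed branch can be summarized by looking only at what reaches the execute stage. The Python sketch below is a simplified model: the function names are hypothetical, and the four invalidated pipeline slots of a taken branch are written as "nop", in line with the description of invalidated stages given earlier.

```python
BRANCH_TAKEN_LATENCY = 4  # core cycles lost when a non-delayed branch is taken


def execute_trace(up_to_branch, fall_through, target, taken):
    """List what the execute stage sees, cycle by cycle, around one branch."""
    if taken:
        # Instructions already in the pipeline behind the branch are invalidated.
        return up_to_branch + ["nop"] * BRANCH_TAKEN_LATENCY + target
    return up_to_branch + fall_through


def branch_overhead(outcomes):
    """Total cycles lost for a sequence of branch outcomes (True = taken)."""
    return sum(BRANCH_TAKEN_LATENCY for taken in outcomes if taken)
```

For the example above, `execute_trace(["i1", "i2"], ["i3", "i4"], ["ii1"], True)` places four empty cycles between i2 and ii1, while the not-taken case continues directly with i3.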
Delayed Jumps/Calls
To compensate for the increased overhead, the ADSP-219x programming model gives the programmer the option (which did not exist on the ADSP-218x) to use delayed branches and function calls. Two instructions can be executed in the pipeline pending the branch. Figures 3(a) and 3(b) show the pipeline structure for a delayed jump that is taken and not taken, respectively. Note that the instructions in the delayed branch slots are executed regardless of whether the jump is taken or not. Also, there are some restrictions on the types of instructions that can be part of a delayed branch slot. For example, stack manipulation operations such as pushes and pops of stacks are not allowed. Multi-word instructions are allowed only in the first delay slot.
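The two restrictions named above lend themselves to a simple check. The Python sketch below validates a pair of delay-slot instructions against those two rules only. The text presents them as examples, so the list is not exhaustive. The `Instruction` record, its fields, and `check_delay_slots` are hypothetical names introduced for this illustration and are not part of any Analog Devices tool.

```python
from dataclasses import dataclass


@dataclass
class Instruction:
    text: str                  # assembly text, for reporting only
    words: int = 1             # instruction length in words
    is_stack_op: bool = False  # True for pushes and pops of stacks


def check_delay_slots(slots):
    """Return a list of rule violations for the instructions after a delayed branch."""
    problems = []
    if len(slots) != 2:
        problems.append("a delayed branch has exactly two delay slots")
    for position, instruction in enumerate(slots, start=1):
        if instruction.is_stack_op:
            problems.append(f"slot {position}: stack push/pop is not allowed")
        if instruction.words > 1 and position != 1:
            problems.append(f"slot {position}: multi-word instruction only allowed in slot 1")
    return problems
```

A pair such as a single-word multiply-accumulate followed by a two-word instruction would be flagged, while swapping the two would pass these two checks.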