Analog Devices EE171 Application Notes

Engineer To Engineer Note EE-171
Technical Notes on using Analog Devices' DSP components and development tools
Contact our technical support by phone: (800) ANALOG-D or e-mail: dsp.support@analog.com. Or visit our on-line resources: http://www.analog.com/dsp and http://www.analog.com/dsp/EZAnswer
ADSP-BF535 Blackfin® Processor Multi-cycle Instructions and Latencies
Contributed by DSP Apps May 13, 2003

Introduction

This document describes the multi-cycle instructions and latencies specific to the ADSP-BF535 Blackfin® Processor. Multi-cycle instructions are ones that take more than one cycle to complete. This cycle penalty cannot be avoided without removing the instruction that caused it. A latency condition can occur when two instructions require extra cycles to complete because they are close to each other in the assembly program. The programmer can avoid this cycle penalty by separating the two instructions. Other causes for latencies are memory stalls and store buffer hazards. For many latency conditions, a discussion of how to improve performance is also provided.

Multi-cycle Instructions

This section describes the instructions that take more than one cycle to complete. All instructions not mentioned in this discussion are single-cycle instructions.
Multi-cycle instructions fall into these types: Push Multiple/Pop Multiple, 32-bit Multiply, Call, Jump, Conditional Branch, Return, Core and System Synchronization, Linkage, and Interrupts and Emulation. In the following examples, the total number of cycles needed to complete an instruction is shown next to that instruction.

Push Multiple/Pop Multiple

The Push Multiple and Pop Multiple instructions take n cycles to complete, where n is the number of registers pushed or popped.
Example                     Number of Cycles
[--SP] = (R7:0, P5:0);      14
(R7:0, P5:3) = [SP++];      11
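The one-cycle-per-register rule can be checked against the two examples with a short Python sketch. The helper name count_registers is hypothetical, and the parser assumes operands are written as range lists of the form shown in the examples.

```python
import re

def count_registers(operand: str) -> int:
    """Count the registers named by a range list such as "(R7:0, P5:3)".

    Each item "Xhi:lo" is assumed to cover registers lo..hi inclusive.
    """
    total = 0
    for hi, lo in re.findall(r"[RP](\d):(\d)", operand):
        total += int(hi) - int(lo) + 1
    return total

# One cycle per register, per the rule stated above.
push_cycles = count_registers("(R7:0, P5:0)")  # 8 data + 6 pointer registers
pop_cycles = count_registers("(R7:0, P5:3)")   # 8 data + 3 pointer registers
print(push_cycles, pop_cycles)                 # 14 11
```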
Copyright 2003, Analog Devices, Inc. All rights reserved. Analog Devices assumes no responsibility for customer product design or the use or application of customers’ products or for any infringements of patents or rights of others which may result from Analog Devices assistance. All trademarks and logos are property of their respective holders. Information furnished by Analog Devices Applications and Development Tools Engineers is believed to be accurate and reliable, however no responsibility is assumed by Analog Devices regarding technical accuracy and topicality of the content provided in Analog Devices’ Engineer-to-Engineer Notes.
32-bit Multiply (modulo 2^32)

Example        Number of Cycles
R0 *= R1;      5
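As an illustration of what "modulo 2^32" means for the result, the following Python sketch models the arithmetic only. It assumes that the instruction keeps the low 32 bits of the product; the function name mul32 is hypothetical, and the sketch says nothing about timing.

```python
MASK32 = 0xFFFFFFFF  # keeps the low 32 bits, i.e. reduces modulo 2**32

def mul32(a: int, b: int) -> int:
    """Model of a 32-bit multiply whose result wraps modulo 2**32."""
    return (a * b) & MASK32

print(hex(mul32(0x00010000, 0x00010000)))  # 2**16 * 2**16 = 2**32, wraps to 0x0
print(hex(mul32(0xFFFFFFFF, 2)))           # 0xfffffffe
```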

Call, Jump

Example        Number of Cycles
CALL 0x22;     4
JUMP(P0);      4

Conditional Branch

The number of cycles a branch takes depends on the prediction as well as the actual outcome.
Prediction    Outcome      Number of Cycles
Taken         Taken        4 cycles
Taken         Not taken    7 cycles
Not taken     Taken        7 cycles
Not taken     Not taken    1 cycle
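To compare the two prediction choices for a given branch, the table can be turned into an average cost. The Python sketch below is a simple model, not a statement about how the hardware predicts: the function name expected_branch_cycles and the parameter p_taken (the fraction of executions in which the branch is actually taken) are assumptions introduced for illustration.

```python
# (prediction, outcome) -> cycles, copied from the table above.
BRANCH_CYCLES = {
    ("taken", "taken"): 4,
    ("taken", "not taken"): 7,
    ("not taken", "taken"): 7,
    ("not taken", "not taken"): 1,
}

def expected_branch_cycles(prediction: str, p_taken: float) -> float:
    """Average cycles per execution for a branch with a fixed prediction."""
    if not 0.0 <= p_taken <= 1.0:
        raise ValueError("p_taken must be between 0 and 1")
    taken = BRANCH_CYCLES[(prediction, "taken")]
    not_taken = BRANCH_CYCLES[(prediction, "not taken")]
    return p_taken * taken + (1.0 - p_taken) * not_taken

# Hypothetical loop-closing branch that is taken 90% of the time.
for prediction in ("taken", "not taken"):
    print(prediction, expected_branch_cycles(prediction, 0.9))
```

Under this model, predicting taken costs 7 - 3p cycles on average and predicting not taken costs 1 + 6p, so predicting taken is cheaper only when the branch is taken more than two thirds of the time.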

Return

Examples       Number of Cycles
RTX;           7
RTE;           7
RTN;           7
RTI;           7
RTS;           4

Core and System Synchronization

Examples       Number of Cycles
CSYNC;         7
SSYNC;         7

Linkage

Examples       Number of Cycles
LINK 4;        4 cycles
UNLINK;        3 cycles

Interrupts and Emulation

Examples       Number of Cycles
RAISE 10;      3 cycles
EMUEXCPT;      3 cycles
STI R4;        3 cycles
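The counts listed in this section can be tallied for a short instruction sequence. The Python sketch below copies the cycle counts from the examples above and treats every unlisted mnemonic as single-cycle, as stated at the start of the section. The helper name sequence_cycles and the sample sequence are hypothetical; the tally ignores stalls and the variable costs of Push Multiple/Pop Multiple and conditional branches.

```python
# Cycle counts copied from the examples in this section.
CYCLES = {
    "CALL": 4, "JUMP": 4,
    "RTS": 4, "RTX": 7, "RTE": 7, "RTN": 7, "RTI": 7,
    "CSYNC": 7, "SSYNC": 7,
    "LINK": 4, "UNLINK": 3,
    "RAISE": 3, "EMUEXCPT": 3, "STI": 3,
}

def sequence_cycles(mnemonics):
    """Sum the cycles for a list of mnemonics; unlisted ones count as 1."""
    return sum(CYCLES.get(m, 1) for m in mnemonics)

# Hypothetical subroutine overhead: call, frame setup, frame teardown, return.
overhead = sequence_cycles(["CALL", "LINK", "UNLINK", "RTS"])
print(overhead)  # 4 + 4 + 3 + 4 = 15
```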

Instruction Latencies

Unlike multi-cycle instructions, instruction latencies (or stall cycles) depend on the placement of specific instruction pairs relative to one another. A latency can be avoided by separating the two instructions of the pair by as many instructions as there are stall cycles between them. For example, if a pair of instructions incurs a 2-cycle latency, separating them by two instructions eliminates that latency.
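The separation rule can be written as a one-line model. The Python sketch below assumes that each unrelated instruction placed between the pair hides exactly one stall cycle; the text above only states the fully separated case, so the partial cases are an assumption of the sketch. The function name remaining_stalls is hypothetical.

```python
def remaining_stalls(pair_stalls: int, separation: int) -> int:
    """Stall cycles left after placing `separation` unrelated instructions
    between the two instructions of a pair.

    Assumes each intervening instruction hides exactly one stall cycle.
    """
    if pair_stalls < 0 or separation < 0:
        raise ValueError("counts must be non-negative")
    return max(0, pair_stalls - separation)

# A pair with a 2-cycle latency, separated by 0, 1, 2, and 3 instructions.
for n in range(4):
    print(n, remaining_stalls(2, n))
```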
Bold blue type is used to identify register dependencies within the instruction pairs. An example of a dependency is when a register is accessed in the instruction immediately following an instruction that modified the register. The lack of the color blue in an entry indicates that the latency condition will occur regardless of what registers are used. Italicized red type is used to highlight the stall consequences.
Instruction latencies are separated into these groups: Accumulator to Data Register Latencies, Register Move Latencies, Move Conditional and Move CC Latencies, Loop Setup Latencies, Instructions Within Hardware Loop Latencies, Loop Buffer Misalignment Latencies, and Miscellaneous Latencies. The total cycle time of each entry can be calculated by adding the cycles taken by each instruction to the number of stall cycles for the instruction pair.
Refer to the Appendix for abbreviations, instruction group descriptions, as well as register groupings.
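The tables in this section list each instruction as <cycles> or <cycles+stalls>. As a sketch of the total-cycle calculation described above, the Python below parses that notation and adds up a pair. The helper names parse_entry and pair_total are hypothetical.

```python
import re

def parse_entry(entry: str):
    """Split "<1>" or "<1+2>" into (cycles, stalls)."""
    m = re.fullmatch(r"<(\d+)(?:\+(\d+))?>", entry.strip())
    if m is None:
        raise ValueError(f"unrecognized entry: {entry!r}")
    return int(m.group(1)), int(m.group(2) or 0)

def pair_total(first: str, second: str) -> int:
    """Total cycles for an instruction pair, stall cycles included."""
    return sum(sum(parse_entry(e)) for e in (first, second))

print(pair_total("<1>", "<1+2>"))  # 1 + (1 + 2) = 4
```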

Accumulator to Data Register Latencies

Description                                     Example                                       <Cycles + Stalls>
- dreg = Areg2Dreg op                           R1 = R6.L * R4.H (IS);                        <1>
- video op using dreg as src                    R5 = BYTEOP1P (R3:2, R1:0);                   <1+2>

- dreg = Areg2Dreg op                           R4.L = (A0 = R3.H*R1.H);                      <1>
- rnd12/rnd20 using dreg as src                 R0.H = R2 + R4 (RND12);                       <1+1>

- dreg = Areg2Dreg op                           R4.L = (A0 = R3.H*R1.H);                      <1>
- shift/rotate op using dreg as src             R1 = ROT R2 BY R4.L;                          <1+1>

- dreg = Areg2Dreg op                           R0.H=R0.L=SIGN(R2.H)*R3.H+SIGN(R2.L)*R3.L;    <1>
- add on sign using dreg as src                 R6.H=R6.L=SIGN(R0.H)*R1.H+SIGN(R0.L)*R1.L;    <1+1>

Register Move Latencies

In each of the following cases, the stall condition occurs when the same register is used in both instructions.
Description                                     Example                 <Cycles + Stalls>
- dreg = sysreg                                 R0 = LC0;               <1>
- ALU op using dreg as src (or vector ALU op)   R2 = R1 + R0;           <1+1>
                                                ----------------
                                                R2 = LC0;               <1>
                                                R1.L = R2 (RND);        <1+1>

- dreg = preg                                   R0 = P0;                <1>
- sysreg = dreg                                 ASTAT = R0;             <1+1>

- dreg = sysreg                                 R0 = ASTAT;             <1>
- dreg = dreg                                   R1 = R0;                <1+1>

- dreg = sysreg                                 R0 = LC0;               <1>
- multiply/video op with dreg as src            R2.H = R1.L * R0.H;     <1+2>

- dreg = sysreg                                 R0 = LC0;               <1>
- accreg = dreg                                 A0 = R0;                <1+1>

- preg = dreg                                   P0 = R3;                <1>
- any op using preg                             R0 = P0;                <1+3>

- dagreg = dreg                                 I3 = R3;                <1>
- any op using dagreg                           R0 = I3;                <1+3>

- dreg = sysreg                                 R0 = LC0;               <1>
- sysreg = dreg                                 ASTAT = R0;             <1+1>

- accreg = sysreg                               A0.w = LC0;             <1>
- accreg = dreg                                 A0 = R0;                <1+1>

- accreg = sysreg                               A0.w = LC0;             <1>
- accreg = preg                                 A0.w = P0;              <1+1>

- accreg = sysreg                               A0.w = LC0;             <1>
- accreg = accreg                               A1 = A0;                <1+1>

- accreg = sysreg                               A0.w = LC0;             <1>
- dreg = accreg                                 R0.L = A0.x;            <1+1>

- accreg = sysreg                               A0.w = LC0;             <1>
- sysreg = accreg                               ASTAT = A0.w;           <1+1>

- accreg = sysreg                               A1.x = LC0;             <1>
- math op using accreg as src                   R1.H = (A0+=A1);        <1+1>

- accreg = sysreg                               A0.w = LC0;             <1>
- POP to accreg                                 A0.w = [SP++];          <1+1>

- POP to dagreg                                 I3 = [SP++];            <1>
- any op using dagreg                           R0 = I3;                <1+3>

Move Conditional and Move CC Latencies

In each of the following cases, the stall condition occurs when the same register is used in both instructions.
Description                                     Example                 <Cycles + Stalls>
- dreg = CC                                     R0 = CC;                <1>
- if CC dreg = dreg                             if CC R1 = R0;          <1+1>

- if CC dreg = dreg                             if CC R0 = R1;          <1>
- multiply/video op using dreg as src           R2.H = R1.L * R0.H;     <1+1>
                                                ----------------
                                                if CC R1 = R3;          <1>
                                                SAA (R3:2, R1:0);       <1+1>

- if CC dreg = preg                             if CC R0 = P0;          <1>
- math op using dreg as src                     R2 = R1 + R0;           <1+1>
                                                ----------------
                                                if CC R3 = P1;          <1>