Technical Notes on using Analog Devices' DSP components and development tools
Contact our technical support by phone: (800) ANALOG-D or e-mail: dsp.support@analog.com
Or visit our on-line resources http://www.analog.com/dsp and http://www.analog.com/dsp/EZAnswer
ADSP-BF535 Blackfin® Processor Multi-cycle Instructions and Latencies
Contributed by DSP Apps May 13, 2003
Introduction
This document describes the multi-cycle instructions and latencies specific to the ADSP-BF535
Blackfin® Processor. Multi-cycle instructions are ones that take more than one cycle to complete. This
cycle penalty cannot be avoided without removing the instruction that caused it. A latency condition can
occur when two instructions require extra cycles to complete because they are close to each other in the
assembly program. The programmer can avoid this cycle penalty by separating the two instructions.
Other causes for latencies are memory stalls and store buffer hazards. For many latency conditions, a
discussion of how to improve performance is also provided.
Multi-cycle Instructions
This section describes the instructions that take more than one cycle to complete. All instructions not
mentioned in this discussion are single-cycle instructions.
Multi-cycle instructions fall into the following categories: Push Multiple/Pop Multiple, 32-bit Multiply, Call,
Jump, Conditional Branch, Return, Core and System Synchronization, Linkage, and Interrupts and Emulation.
In the following examples, the total number of cycles needed to complete a certain instruction is shown
next to the corresponding instruction.
Push Multiple/Pop Multiple
The Push Multiple and Pop Multiple instructions take n cycles to complete, where n is the number of
registers pushed or popped.
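As a rough illustration, this cost rule can be applied mechanically to the register-range form of the instruction. The sketch below assumes Blackfin's range syntax, in which (R7:4, P5:3) names R7 down to R4 and P5 down to P3. The function name push_pop_multiple_cycles is hypothetical, and the parser only reads the range list. It does not validate the full instruction.

```python
import re

# Matches register ranges such as "R7:4" or "P5:3" in a Push/Pop Multiple list.
_RANGE = re.compile(r"\b([RP])([0-7])\s*:\s*([0-7])\b", re.IGNORECASE)


def push_pop_multiple_cycles(instruction: str) -> int:
    """Return n, the number of registers pushed or popped (one cycle each).

    Hypothetical helper: assumes each range starts at the top register of
    its bank (R7 for data registers, P5 for pointer registers).
    """
    count = 0
    for bank, high, low in _RANGE.findall(instruction):
        high, low = int(high), int(low)
        top = 7 if bank.upper() == "R" else 5
        if high != top or low > high:
            raise ValueError(f"unsupported range {bank}{high}:{low}")
        count += high - low + 1
    if count == 0:
        raise ValueError("no register range found")
    return count


print(push_pop_multiple_cycles("[--SP] = (R7:4, P5:3);"))  # 7 registers -> 7 cycles
print(push_pop_multiple_cycles("(R7:0) = [SP++];"))        # 8 registers -> 8 cycles
```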
Copyright 2003, Analog Devices, Inc. All rights reserved. Analog Devices assumes no responsibility for customer product design or the use or application of
customers’ products or for any infringements of patents or rights of others which may result from Analog Devices assistance. All trademarks and logos are property
of their respective holders. Information furnished by Analog Devices Applications and Development Tools Engineers is believed to be accurate and reliable, however
no responsibility is assumed by Analog Devices regarding technical accuracy and topicality of the content provided in Analog Devices’ Engineer-to-Engineer Notes.
32-bit Multiply (modulo 2^32)
Example Number of Cycles
R0 *= R1; 5
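The modulo 2^32 qualifier means that only the low 32 bits of the product are kept in R0. The short Python model below shows that wraparound. The names mul32_mod and to_signed32 are hypothetical. The low 32 bits of a two's-complement product do not depend on signedness, so the same result can be read as either signed or unsigned.

```python
MASK32 = 0xFFFFFFFF


def mul32_mod(a: int, b: int) -> int:
    """Model 'R0 *= R1;': keep only the low 32 bits of the product."""
    return ((a & MASK32) * (b & MASK32)) & MASK32


def to_signed32(x: int) -> int:
    """Reinterpret a 32-bit pattern as a two's-complement signed value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


print(hex(mul32_mod(0x10000, 0x10000)))        # 0x0: 2**32 wraps to zero
print(to_signed32(mul32_mod(-3 & MASK32, 5)))  # -15
```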
Call, Jump
Example Number of Cycles
CALL 0x22; 4
JUMP(P0); 4
Conditional Branch
The number of cycles a branch takes depends on the prediction as well as the actual outcome.
Prediction    Outcome      Number of Cycles
Taken         Taken        4
Taken         Not taken    7
Not taken     Taken        7
Not taken     Not taken    1
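Because the cost depends on both the prediction and the outcome, the better prediction for a branch depends on how often that branch is taken. The sketch below averages the table values for a branch taken with probability p_taken. The function name and the probability model are illustrative assumptions.

```python
# Cycles from the Conditional Branch table, keyed by (predicted taken, actually taken).
BRANCH_CYCLES = {
    (True, True): 4,
    (True, False): 7,
    (False, True): 7,
    (False, False): 1,
}


def expected_branch_cycles(predict_taken: bool, p_taken: float) -> float:
    """Average cycles per execution for a branch taken with probability p_taken."""
    if not 0.0 <= p_taken <= 1.0:
        raise ValueError("p_taken must be between 0 and 1")
    return (p_taken * BRANCH_CYCLES[(predict_taken, True)]
            + (1.0 - p_taken) * BRANCH_CYCLES[(predict_taken, False)])


for p in (0.25, 0.5, 0.9):
    print(p, expected_branch_cycles(True, p), expected_branch_cycles(False, p))
```

Working the table by hand gives these averages. Predicting taken costs 7 - 3p cycles, and predicting not taken costs 1 + 6p cycles. The two break even at p = 2/3. Under this model, a taken prediction only pays off for branches taken more than about two-thirds of the time.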
Return
Examples Number of Cycles
RTX; 7
RTE; 7
RTN; 7
RTI; 7
RTS; 4
Core and System Synchronization
Examples Number of Cycles
CSYNC; 7
SSYNC; 7
Linkage
Examples Number of Cycles
LINK 4; 4
UNLINK; 3
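Taken together, the Call, Return, Linkage, and Push Multiple/Pop Multiple figures give a rough per-call overhead. The sketch below adds the table values for a subroutine that is entered with CALL and left with RTS. Optionally, it also counts a frame built with LINK/UNLINK and registers saved with Push/Pop Multiple. The function name is hypothetical, and the sketch ignores the subroutine body and any stall cycles.

```python
# Cycle counts copied from the tables in this section.
CALL_CYCLES = 4
RTS_CYCLES = 4
LINK_CYCLES = 4
UNLINK_CYCLES = 3


def call_overhead(uses_frame: bool, saved_regs: int = 0) -> int:
    """Cycles spent on call/return bookkeeping only (body and stalls excluded)."""
    total = CALL_CYCLES + RTS_CYCLES
    if uses_frame:
        total += LINK_CYCLES + UNLINK_CYCLES
    # Push Multiple and Pop Multiple each take one cycle per register.
    total += 2 * saved_regs
    return total


print(call_overhead(False))    # 8
print(call_overhead(True, 4))  # 23
```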
ADSP-BF535 Blackfin® Processor Multi-cycle Instructions and Latencies (EE-171) Page 2 of 15
Unlike multi-cycle instructions, instruction latencies (or stall cycles) depend on the placement of
specific instruction pairs relative to one another. They can be avoided by separating the two instructions by
as many other instructions as there are stall cycles between them. For example, if a pair of instructions
incurs a 2-cycle latency, placing two other instructions between them eliminates that latency.
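The separation rule reduces to simple arithmetic. The sketch below assumes, as the example does, that every instruction placed between the pair is single-cycle and does not itself depend on either instruction. The name remaining_stalls is hypothetical.

```python
def remaining_stalls(stalls: int, instructions_between: int) -> int:
    """Stall cycles left after placing independent single-cycle instructions
    between a dependent pair; each one hides one stall cycle."""
    if stalls < 0 or instructions_between < 0:
        raise ValueError("counts must be non-negative")
    return max(0, stalls - instructions_between)


for gap in range(4):
    print(gap, remaining_stalls(2, gap))  # remaining stalls for gaps 0..3: 2, 1, 0, 0
```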
Bold blue type is used to identify register dependencies within the instruction pairs. An example of a
dependency is when a register is accessed in the instruction immediately following an instruction that
modified the register. The absence of blue in an entry indicates that the latency condition occurs
regardless of which registers are used. Italicized red type is used to highlight the stall consequences.
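A register dependency of this kind can be found mechanically by intersecting the registers the first instruction writes with the registers the second instruction reads. The sketch below uses hypothetical, hand-written read/write lists rather than a real instruction decoder. It treats half-register references such as R4.L, and register pairs such as R1:0, as touching their full registers.

```python
import re

# Matches register pairs such as "R1:0" or "R3:2".
_PAIR = re.compile(r"^([RP])(\d):(\d)$")


def _expand(reg: str) -> set:
    """Map 'R4.L' -> {'R4'} and 'R1:0' -> {'R0', 'R1'}."""
    reg = reg.strip().upper()
    m = _PAIR.match(reg)
    if m:
        bank, high, low = m.group(1), int(m.group(2)), int(m.group(3))
        return {f"{bank}{i}" for i in range(low, high + 1)}
    return {reg.split(".")[0]}


def dependent_registers(first_writes, second_reads) -> set:
    """Registers written by the first instruction and read by the second."""
    written = set().union(*(_expand(r) for r in first_writes))
    read = set().union(*(_expand(r) for r in second_reads))
    return written & read


# R1 = R6.L * R4.H (IS);  followed by  R5 = BYTEOP1P (R3:2, R1:0);
print(dependent_registers(["R1"], ["R3:2", "R1:0"]))  # {'R1'}
```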
Instruction latencies are separated into these groups: Accumulator to Data Register Latencies, Register
Move Latencies, Move Conditional and Move CC Latencies, Loop Setup Latencies, Instructions Within
Hardware Loop Latencies, Loop Buffer Misalignment Latencies, and Miscellaneous Latencies. The total
cycle time of each entry can be calculated by adding the cycles taken by each instruction to the number of
stall cycles for the instruction pair.
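The latency tables annotate each instruction as <Cycles + Stalls>: <1> for an instruction, and <1+2> for a dependent instruction that follows it. The minimal sketch below applies the total-cycle rule by parsing those annotations and summing them. The function names are hypothetical.

```python
import re

# Matches "<1>" or "<1+2>": cycles, optionally followed by stall cycles.
_ANNOTATION = re.compile(r"^<\s*(\d+)\s*(?:\+\s*(\d+)\s*)?>$")


def parse_annotation(text: str) -> tuple:
    """Split a '<Cycles + Stalls>' entry into (cycles, stalls)."""
    m = _ANNOTATION.match(text.strip())
    if m is None:
        raise ValueError(f"not a <cycles+stalls> entry: {text!r}")
    return int(m.group(1)), int(m.group(2) or 0)


def pair_total_cycles(first: str, second: str) -> int:
    """Total cycles for an instruction pair: each instruction's cycles plus stalls."""
    return sum(parse_annotation(first)) + sum(parse_annotation(second))


print(pair_total_cycles("<1>", "<1+2>"))  # 4
```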
Refer to the Appendix for abbreviations, instruction group descriptions, as well as register groupings.
Accumulator to Data Register Latencies
Description                          Example                                            <Cycles + Stalls>

- dreg = Areg2Dreg op                R1 = R6.L * R4.H (IS);                             <1>
- video op using dreg as src         R5 = BYTEOP1P (R3:2, R1:0);                        <1+2>

- dreg = Areg2Dreg op                R4.L = (A0 = R3.H*R1.H);                           <1>
- rnd12/rnd20 using dreg as src      R0.H = R2 + R4 (RND12);                            <1+1>

- dreg = Areg2Dreg op                R4.L = (A0 = R3.H*R1.H);                           <1>
- shift/rotate op using dreg as src  R1 = ROT R2 BY R4.L;                               <1+1>

- dreg = Areg2Dreg op                R0.H = R0.L = SIGN(R2.H)*R3.H + SIGN(R2.L)*R3.L;   <1>
- add on sign using dreg as src      R6.H = R6.L = SIGN(R0.H)*R1.H + SIGN(R0.L)*R1.L;   <1+1>
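The four rows of this table can be captured as a lookup keyed by how the second instruction uses the result. The sketch below is hypothetical: the category labels are invented for illustration. It assumes that each independent single-cycle instruction placed between the pair hides one stall, following the separation rule described earlier in this note. It also treats consumers not listed in the table as stall-free, which this note does not state.

```python
# Stall cycles from the Accumulator to Data Register Latencies table, keyed by how
# the next instruction uses the data register written by an Areg2Dreg op.
# The keys are hypothetical labels for the table's rows.
AREG2DREG_STALLS = {
    "video": 2,         # e.g. R5 = BYTEOP1P (R3:2, R1:0);
    "rnd12_rnd20": 1,   # e.g. R0.H = R2 + R4 (RND12);
    "shift_rotate": 1,  # e.g. R1 = ROT R2 BY R4.L;
    "add_on_sign": 1,   # e.g. SIGN(R0.H)*R1.H + SIGN(R0.L)*R1.L
}


def areg2dreg_stalls(consumer: str, instructions_between: int = 0) -> int:
    """Stalls for a consumer of an Areg2Dreg result, after separation.

    Consumers not listed in the table are assumed stall-free here.
    """
    stalls = AREG2DREG_STALLS.get(consumer, 0)
    return max(0, stalls - instructions_between)


print(areg2dreg_stalls("video"))     # 2
print(areg2dreg_stalls("video", 2))  # 0
```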
Register Move Latencies
In each of the following cases, the stall condition occurs when the same register is used in both
instructions.