Technical Notes on using Analog Devices' DSP components and development tools
Contact our technical support by phone: (800) ANALOG-D or e-mail: dsp.support@analog.com
Or visit our online resources http://www.analog.com/dsp and http://www.analog.com/dsp/EZAnswers
ADSP-BF531/532/533 Blackfin® Processor Multi-cycle Instructions and Latencies
Contributed by Tom L. September 24, 2003
Introduction
This document describes the multicycle instructions and latencies specific to the ADSP-BF531/532/533
Blackfin® Processor devices. Multicycle instructions take more than one cycle to complete; their cycle
count cannot be reduced except by removing the instruction itself. A latency condition can occur when two
instructions require extra cycles to complete because they are close to each other in the assembly
program. The programmer can avoid this cycle penalty by separating the two instructions. Other causes of
latencies are memory stalls and store buffer hazards; for these conditions, this document discusses how
to improve performance.
The Pipeline Viewer in the VisualDSP++™ simulator shows how instructions are pushed through the
processor’s pipeline. While the causes of conditions such as stalls can be discovered interactively with
the Pipeline Viewer, this document provides more detail about the nature of execution latencies.
All of the cycle counts described in this document are based on the assumption that code is
executed from L1 memory.
Multicycle Instructions
This section describes the instructions that take more than one cycle to complete. All instructions not
mentioned in this discussion are single-cycle instructions, provided they are executed from L1 memory.
Multicycle instructions fall into the following categories: Push Multiple/Pop Multiple, 32-bit Multiply,
Call, Jump, Conditional Branch, Returns from Events, Core and System Synchronization, Linkage,
Interrupts and Emulation, and Testset. In the following examples, the total number of cycles needed to
complete each instruction is shown next to it. Full descriptions of each instruction’s functionality are
provided in the Blackfin Processor Instruction Set Reference.
Push Multiple/Pop Multiple
The Push Multiple and Pop Multiple instructions take n core cycles to complete, where n is the number of
registers pushed or popped, assuming the stack is located in L1 data memory.
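To make the n-cycle rule concrete, the C sketch below counts the registers named by a Push Multiple or Pop Multiple range. It assumes the range form [--SP] = (R7:Dlow, P5:Plow), in which the data-register range always ends at R7 and the pointer-register range always ends at P5. The function name, and the convention of passing 8 or 6 to mean "no data registers" or "no pointer registers", are hypothetical choices made for this example; the result assumes the stack is in L1 data memory, as stated above.

```c
/* Hypothetical helper: cycles for a Push/Pop Multiple such as
 *   [--SP] = (R7:dlow, P5:plow);
 * Assumes the stack is in L1 data memory, where the cost is one core
 * cycle per register transferred.
 * Pass dlow = 8 if no data registers are listed and plow = 6 if no
 * pointer registers are listed. */
int push_pop_multiple_cycles(int dlow, int plow)
{
    int dregs = (dlow <= 7) ? 8 - dlow : 0;  /* R7 down to R(dlow) */
    int pregs = (plow <= 5) ? 6 - plow : 0;  /* P5 down to P(plow) */
    return dregs + pregs;                    /* n registers -> n cycles */
}
```

For example, [--SP] = (R7:5, P5:1); transfers R7 through R5 and P5 through P1, eight registers in all, so under this rule it takes 8 cycles.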
Copyright 2003, Analog Devices, Inc. All rights reserved. Analog Devices assumes no responsibility for customer product design or the use or application of
customers’ products or for any infringements of patents or rights of others which may result from Analog Devices assistance. All trademarks and logos are property
of their respective holders. Information furnished by Analog Devices Applications and Development Tools Engineers is believed to be accurate and reliable, however
no responsibility is assumed by Analog Devices regarding technical accuracy and topicality of the content provided in Analog Devices’ Engineer-to-Engineer Notes.
32-bit Multiply
The 32-bit by 32-bit integer multiply instruction always takes 3 cycles to complete.
Example Number of Cycles
R0 *= R1; 3 cycles
Call, Jump
All call and jump instructions take 5 cycles to complete, provided the target address is an aligned location
(see Instruction Alignment Unit Empty Latencies later in this document).
Conditional Branch
The number of cycles a branch takes depends on the prediction as well as the actual outcome.
Prediction   Outcome     Number of Cycles
Taken        Taken       5 cycles
Taken        Not taken   9 cycles
Not taken    Taken       9 cycles
Not taken    Not taken   1 cycle
ADSP-BF531/532/533 Blackfin® Processor Multi-cycle Instructions and Latencies (EE-197) Page 2 of 12
Returns from Events
Examples Number of Cycles
RTX; // return from an exception 5 cycles
RTE; // return from emulation 5 cycles
RTN; // return from an NMI 5 cycles
RTI; // return from an interrupt 5 cycles
RTS; // return from a subroutine 5 cycles
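The Conditional Branch table above can be encoded directly, which is handy when estimating the cost of a loop-closing or error-check branch. The sketch below is a hypothetical C model, not an Analog Devices API: predicted_taken is the static prediction chosen in the source (for example, with the (BP) option on IF CC JUMP), and actually_taken is the run-time outcome. The expected-cost helper assumes you know, or can estimate, how often the branch is taken.

```c
#include <stdbool.h>

/* Cycle cost of a conditional branch, per the Conditional Branch table
 * in EE-197 (code assumed to execute from L1 memory).
 * predicted_taken: static prediction encoded in the instruction
 * actually_taken:  run-time outcome of the CC test */
int cond_branch_cycles(bool predicted_taken, bool actually_taken)
{
    if (predicted_taken)
        return actually_taken ? 5 : 9;
    return actually_taken ? 9 : 1;
}

/* Average cycles for a branch that is taken with probability p_taken
 * (0.0 to 1.0), given its static prediction. */
double expected_branch_cycles(bool predicted_taken, double p_taken)
{
    return p_taken * cond_branch_cycles(predicted_taken, true)
         + (1.0 - p_taken) * cond_branch_cycles(predicted_taken, false);
}
```

For a branch that is taken on most iterations, predicting it taken keeps the common case at 5 cycles instead of 9; for a rarely taken branch, such as an error check, predicting it not taken keeps the common case at 1 cycle.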
Core and System Synchronization
Examples Number of Cycles
CSYNC; 10 cycles
SSYNC; >10 cycles
Linkage
Examples Number of Cycles
LINK 4; 3 cycles
UNLINK; 2 cycles
Interrupts and Emulation
Examples Number of Cycles
RAISE 10; 3 cycles (if interrupt branch is not taken)
EXCPT 3; 3 cycles (if exception branch is not taken)
STI R4; 3 cycles
Testset
The TESTSET instruction is a multicycle instruction that executes in a variable number of cycles. The
count depends on the cycles needed for a read acknowledge from off-core memory, and on whether the
address being tested is both in the cache and dirty. The number of cycles can be determined as follows:
cycles = 1 (instruction) + 1 (stall) + x (read acknowledge) + y (cache latency)
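The TESTSET formula can be written as a one-line C function. This is a sketch under stated assumptions: the function and parameter names are hypothetical, and x (read acknowledge) and y (cache latency) are system-dependent values that this note does not give, so they must be measured or taken from the memory system documentation. The sketch assumes y is 0 when the tested address is not both cached and dirty.

```c
/* Hypothetical helper implementing the TESTSET cycle formula:
 *   cycles = 1 (instruction) + 1 (stall) + x (read acknowledge) + y (cache latency)
 * read_ack_cycles (x): cycles until the off-core read is acknowledged
 * cache_cycles    (y): extra cycles when the tested address is cached and
 *                      dirty; assumed to be 0 otherwise */
unsigned testset_cycles(unsigned read_ack_cycles, unsigned cache_cycles)
{
    const unsigned instruction = 1;  /* the TESTSET instruction itself */
    const unsigned stall = 1;        /* fixed one-cycle stall */
    return instruction + stall + read_ack_cycles + cache_cycles;
}
```

The fixed part of the cost is therefore 2 cycles; everything beyond that comes from the memory system.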
Instruction Latencies
Unlike multicycle instructions, instruction latencies are contingent on the placement of specific instruction
pairs relative to one another. They can be avoided by separating the two instructions by as many
instructions as the number of stall cycles the pair incurs. For example, if a pair of instructions incurs a
two-cycle latency, separating them by two instructions eliminates that latency.
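The separation rule can be expressed as a small C function. It is a simplified model rather than a statement from this note: it assumes that each single-cycle instruction placed between the pair hides exactly one stall cycle, so a partially separated pair keeps the remainder. The function name is hypothetical.

```c
/* Stall cycles left when `gap` single-cycle instructions are placed
 * between a latency-causing pair. Simplified model: assumes each
 * intervening instruction hides exactly one stall cycle. */
int remaining_stalls(int latency, int gap)
{
    int left = latency - gap;
    return left > 0 ? left : 0;  /* never negative */
}
```

The cycles are saved only when the gap is filled with useful, independent instructions; filling it with NOPs merely trades stall cycles for NOP cycles.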
Bold blue type is used to identify register dependencies within the instruction pairs. A dependency
occurs if a register is accessed in the instruction immediately following an instruction that modifies that
register. An entry without blue type indicates that the latency condition occurs regardless of which
registers are used. Italicized red type is used to highlight the stall consequences.
Instruction latencies are separated into these groups: Accumulator to Data Register Latencies, Register
Move Latencies, Move Conditional and Move CC Latencies, Loop Setup Latencies, Hardware Loop
Latencies, Instruction Alignment Unit Latencies, and Miscellaneous Latencies. The total cycle time of
each entry can be calculated by adding the cycles taken by each instruction to the number of stall cycles
for the instruction pair.
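The <Cycles + Stalls> entries in the tables can be summed mechanically. For example, the P0 = R3; / R0 = P0; pair in the Register Move Latencies table is listed as <1> and <1+4>, for a total of 1 + 1 + 4 = 6 cycles. The C sketch below parses that notation; the function names are hypothetical and exist only for this example.

```c
#include <stdio.h>

/* Parses a table entry such as "<1>" or "<1+4>" and returns
 * cycles + stalls. Returns -1 if the entry does not start with '<'
 * followed by a number. */
int entry_cycles(const char *entry)
{
    int cycles = 0, stalls = 0;
    int matched = sscanf(entry, "<%d+%d>", &cycles, &stalls);
    if (matched == 2)
        return cycles + stalls;  /* instruction cycles plus stalls */
    if (matched == 1)
        return cycles;           /* "<1>": no stall on this instruction */
    return -1;
}

/* Total cycle time of an instruction pair: the sum of both entries. */
int pair_cycles(const char *first, const char *second)
{
    int a = entry_cycles(first);
    int b = entry_cycles(second);
    return (a < 0 || b < 0) ? -1 : a + b;
}
```

Calling pair_cycles("<1>", "<1+4>") returns 6, matching the hand calculation above.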
Refer to the Appendix for abbreviations, instruction group descriptions, as well as register groupings.
Accumulator to Data Register Latencies
Description                      Example                        <Cycles + Stalls>
- dreg = Areg2Dreg op            R1 = R6.L * R4.H (IS);         <1>
- video op using dreg as src     R5 = BYTEOP1P (R3:2, R1:0);    <1+1>
Register Move Latencies
In each of the following cases, the stall condition occurs when the same register is used in both
instructions.
Description                              Example                              <Cycles + Stalls>
- dreg = sysreg                          R0 = LC0;                            <1>
- multiply/video op with dreg as src     R2.H = R1.L * R0.H;                  <1+1>

- preg = dreg                            P0 = R3;                             <1>
- any op using preg                      R0 = P0;                             <1+4>

- dagreg = dreg                          I3 = R3;                             <1>
- any op using dagreg                    R0 = I3;                             <1+4>

- POP to dagreg                          I3 = [SP++];                         <1>
- any op using dagreg                    R0 = I3;                             <1+3>

- LOAD/POP to preg                       P3 = [SP++];                         <1>
- any op using preg                      R0 = P3;                             <1+3>

- dreg = seqreg                          R0 = RETS;                           <1>
- any ALU op using dreg                  R1 = R0 + R3;                        <1+1>

- dreg = MMR register                    R3 = [P0]; // P0 points to an MMR    <1>
- any ALU op using dreg                  R0 = R3 - R0;                        <1+1>
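The Register Move Latencies table above can be turned into a lookup for a quick static check of adjacent instruction pairs. In this hypothetical C sketch, the enum names and the function are illustrative only, and registers are compared as whole-register name strings, so a half-register operand such as R0.H should be passed as "R0". The stall is charged only when the register written by the first instruction is the one used by the second, as the table specifies.

```c
#include <string.h>

/* Instruction-pair kinds from the Register Move Latencies table. */
enum move_pair {
    DREG_FROM_SYSREG,   /* dreg = sysreg; multiply/video op with dreg as src */
    PREG_FROM_DREG,     /* preg = dreg; any op using preg */
    DAGREG_FROM_DREG,   /* dagreg = dreg; any op using dagreg */
    POP_TO_DAGREG,      /* POP to dagreg; any op using dagreg */
    LOAD_POP_TO_PREG,   /* LOAD/POP to preg; any op using preg */
    DREG_FROM_SEQREG,   /* dreg = seqreg; any ALU op using dreg */
    DREG_FROM_MMR       /* dreg = MMR register; any ALU op using dreg */
};

/* Stall cycles listed in the table for each pair kind. */
static const int move_pair_stalls[] = {
    [DREG_FROM_SYSREG] = 1,
    [PREG_FROM_DREG]   = 4,
    [DAGREG_FROM_DREG] = 4,
    [POP_TO_DAGREG]    = 3,
    [LOAD_POP_TO_PREG] = 3,
    [DREG_FROM_SEQREG] = 1,
    [DREG_FROM_MMR]    = 1,
};

/* Stalls for an adjacent pair: the table value when dst (register written
 * by the first instruction) equals src (register used by the second),
 * and 0 otherwise. */
int register_move_stalls(enum move_pair kind, const char *dst, const char *src)
{
    return strcmp(dst, src) == 0 ? move_pair_stalls[kind] : 0;
}
```

For P0 = R3; followed by R0 = P0;, register_move_stalls(PREG_FROM_DREG, "P0", "P0") returns 4; if the second instruction used P1 instead, no stall would be charged.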
Move Conditional and Move CC Latencies
In each of the following cases, the stall condition occurs when the same register is used in both
instructions.