Please be aware that an important notice concerning availability, standard warranty, and use in critical applications of
Texas Instruments semiconductor products and disclaimers thereto appears at the end of this data sheet.
† IEEE Standard 1149.1–1990, IEEE Standard Test Access Port and Boundary-Scan Architecture
PRODUCTION DATA information is current as of publication date.
Products conform to specifications per the terms of Texas Instruments
standard warranty. Production processing does not necessarily include
testing of all parameters.
POST OFFICE BOX 1443 • HOUSTON, TEXAS 77251–1443
Copyright 1998, Texas Instruments Incorporated
On products compliant to MIL-PRF-38535, all parameters are tested
unless otherwise noted. On all other products, production
processing does not necessarily include testing of all parameters.
SDRAM-type cycles . . . 105
special register set cycles . . . 116
device reset . . . 127
absolute maximum ratings over specified temperature ranges . . . 128
recommended operating conditions . . . 128
electrical characteristics over recommended ranges of supply voltage and operating case temperature . . . 128
signal transition levels . . . 129
timing parameter symbology . . . 130
general notes on timing parameters . . . 130
CLKIN timing requirements . . . 131
local-bus switching characteristics over full operating range: CLKOUT . . . 131
device reset timing requirements . . . 132
local bus timing requirements: cycle configuration inputs . . . 133
local bus timing: cycle completion inputs . . . 134
general output signal characteristics over operating conditions . . . 137
data input timing . . . 139
local bus timing: 2-cycle/column CAS timing . . . 140
external interrupt timing . . . 141
XPT input timing . . . 142
host-interface timing . . . 143
video interface timing: SCLK timing . . . 144
video interface timing: FCLK input and video outputs . . . 145
video interface timing: external sync inputs . . . 146
emulator interface connection . . . 147
MECHANICAL DATA . . . 150
MECHANICAL DATA . . . 151
description
The SMJ320C80 is a single-chip, multiple-instruction, multiple-data (MIMD) parallel processor capable of performing over two billion operations
per second. It consists of a 32-bit RISC master processor with a 100-MFLOPS (million floating-point operations
per second) IEEE floating-point unit, four 32-bit parallel processing digital signal processors (DSPs), a transfer
controller with up to 400-MBps off-chip transfer rate, and a video controller. All the processors are tightly coupled
through an on-chip crossbar that provides shared access to on-chip RAM. This performance and
programmability make the ’C80 ideally suited for video, imaging, and high-speed telecommunications
applications.
LOCAL MEMORY INTERFACE
A31–A0 (O): Address bus. A31–A0 output the 32-bit byte address of the external memory cycle. The address can be multiplexed for DRAM accesses.
AS2–AS0 (I): Address-shift selection. AS2–AS0 determine how the column address appears on the address bus. Eight shift values are supported, including zero.
BS1–BS0 (I): Bus size selection. BS1–BS0 indicate the bus size of the memory or other devices being accessed, allowing dynamic bus sizing for data buses less than 64 bits wide.
CT2–CT0 (I): Cycle timing selection. CT2–CT0 determine the timing of the current memory access.
D63–D0 (I/O): Data bus. D63–D0 transfer up to 64 bits of data per memory cycle into or out of the ’C80.
DBEN (O): Data-buffer enable. DBEN drives the active-low output enables of bidirectional transceivers that can be used to buffer input and output data on D63–D0.
DDIN (O): Data direction indicator. DDIN indicates the direction of the data that passes through the transceivers. When DDIN is low, the transfer is from external memory into the ’C80.
FAULT (I): Fault. FAULT is driven low by external circuitry to inform the ’C80 that a fault has occurred on the current memory row access.
PS3–PS0 (I): Page size indication. PS3–PS0 indicate the page size of the memory device(s) being accessed by the current cycle. The ’C80 uses this information to determine when to begin a new row access.
READY (I): Ready. READY indicates that the external device is ready to complete the memory cycle. READY is driven low by external circuitry to insert wait states into a memory cycle.
RL (O): Row latch. The high-to-low transition of RL can be used to latch the valid 32-bit byte address that is present on A31–A0.
RETRY (I): Retry. RETRY is driven low by external circuitry to indicate that the addressed memory is busy. The ’C80 memory cycle is rescheduled.
STATUS5–STATUS0 (O): Status code. At row time, STATUS5–STATUS0 indicate the type of cycle being performed. At column time, they identify the processor and type of request that initiated the cycle.
UTIME (I): User-timing selection. UTIME causes the timing of RAS and CAS/DQM7–CAS/DQM0 to be modified so that custom memory timings can be generated. During reset, UTIME selects the endian mode in which the ’C80 operates.

DRAM, VRAM, AND SDRAM CONTROL
CAS/DQM7–CAS/DQM0 (O): Column-address strobes. CAS/DQM7–CAS/DQM0 drive the CAS inputs of DRAMs and VRAMs, or the DQM inputs of synchronous dynamic random-access memories (SDRAMs). The eight strobes provide byte-write access to memory.
DSF (O): Special function. DSF selects special VRAM functions such as block write, load color register, split-register transfer, and synchronous graphics random-access memory (SGRAM) block write.
RAS (O): Row-address strobe. RAS drives the RAS inputs of DRAMs, VRAMs, and SDRAMs.
TRG/CAS (O): Transfer/output enable or column-address strobe. TRG/CAS is used as an output enable for DRAMs and VRAMs, and also as a transfer enable for VRAMs. TRG/CAS also drives the CAS inputs of SDRAMs.
W (O): Write enable. W is driven low before CAS during write cycles. W controls the direction of the transfer during VRAM transfer cycles.

† I = input, O = output, Z = high-impedance
‡ This pin has an internal pullup and can be left unconnected during normal operation.
§ This pin has an internal pulldown and can be left unconnected during normal operation.
¶ For proper operation, all VDD and VSS pins must be connected externally.
SMJ320C80
DIGITAL SIGNAL PROCESSOR
SGUS025 – AUGUST 1998
Terminal Functions (Continued)

HOST INTERFACE
HACK (O): Host acknowledge. The ’C80 drives HACK low following an active HREQ to indicate that it has driven the local memory bus signals to the high-impedance state and is relinquishing the bus. HACK is driven high asynchronously following HREQ being detected inactive, and then the ’C80 resumes driving the bus.
HREQ (I): Host request. An external device drives HREQ low to request ownership of the local memory bus. When HREQ is high, the ’C80 owns and drives the bus. HREQ is synchronized internally to the ’C80’s internal clock. HREQ is also used at reset to determine the power-up state of the MP. If HREQ is low at the rising edge of RESET, the MP comes up running. If HREQ is high, the MP remains halted until the first interrupt occurrence on EINT3.
REQ1, REQ0 (O): Internal cycle request. REQ1 and REQ0 provide a two-bit code indicating the highest-priority memory cycle request that is being received by the TC. External logic can monitor REQ1 and REQ0 to determine whether it is necessary to relinquish the local memory bus to the ’C80.

SYSTEM CONTROL
CLKIN (I): Input clock. CLKIN generates the internal ’C80 clocks to which all processor functions (except the frame timers) are synchronous.
CLKOUT (O): Local output clock. CLKOUT provides a way to synchronize external circuitry to internal timings. All ’C80 output signals (except the VC signals) are synchronous to this clock.
EINT1, EINT2, EINT3 (I): Edge-triggered interrupts. EINT1, EINT2, and EINT3 allow external devices to interrupt the master processor (MP) on one of three interrupt levels (EINT1 is the highest priority). The interrupts are rising-edge triggered. EINT3 also serves as an unhalt signal: if the MP is powered up halted, the first rising edge on EINT3 causes the MP to unhalt and fetch its reset vector (the EINT3 interrupt-pending bit is not set in this case).
LINT4 (I): Level-triggered interrupt. LINT4 provides an active-low level-triggered interrupt to the MP. Its priority falls below that of the edge-triggered interrupts. Any interrupt request should remain low until it is recognized by the ’C80.
RESET (I): Reset. RESET is driven low to reset the ’C80 (all processors). During reset, all internal registers are set to their initial state and all outputs are driven to their inactive or high-impedance levels. At the rising edge of RESET, the MP reset mode and the ’C80’s operating endian mode are determined by the levels of the HREQ and UTIME pins, respectively.
XPT2–XPT0 (I): External packet transfer. External devices use XPT2–XPT0 to request a high-priority external packet transfer (XPT) by the TC.

EMULATION CONTROL
EMU0, EMU1‡ (I/O): Emulation pins. EMU0 and EMU1 support emulation host interrupts, special functions targeted at a single processor, and multiprocessor halt-event communications.
TCK‡ (I): Test clock. TCK provides the clock for the ’C80 IEEE-1149.1 logic, allowing it to be compatible with other IEEE-1149.1 devices, controllers, and test equipment designed for different clock rates.
TDI‡ (I): Test data input. TDI provides input data for all IEEE-1149.1 instructions and data scans of the ’C80.
TDO (O): Test data output. TDO provides output data for all IEEE-1149.1 instructions and data scans of the ’C80.
TMS‡ (I): Test-mode select. TMS controls the IEEE-1149.1 state machine.
TRST§ (I): Test reset. TRST resets the ’C80 IEEE-1149.1 module. When TRST is low, all boundary-scan logic is disabled, allowing normal ’C80 operation.
VIDEO INTERFACE
CAREA0, CAREA1 (O): Composite area. CAREA0 and CAREA1 define a special area such as an overscan boundary. This area represents the logical OR of the internal horizontal and vertical area signals.
CBLNK0/VBLNK0, CBLNK1/VBLNK1 (O): Composite blanking/vertical blanking. Each of CBLNK0/VBLNK0 and CBLNK1/VBLNK1 provides one of two blanking functions, depending on the configuration of the CSYNC/HBLNK pin:
  • Composite blanking disables pixel display/capture during both horizontal and vertical retrace periods and is enabled when CSYNC is selected for composite-sync video systems.
  • Vertical blanking disables pixel display/capture during vertical retrace periods and is enabled when HBLNK is selected for separate-sync video systems.
  Following reset, CBLNK0/VBLNK0 and CBLNK1/VBLNK1 are configured as CBLNK0 and CBLNK1, respectively.
CSYNC0/HBLNK0, CSYNC1/HBLNK1 (I/O/Z): Composite sync/horizontal blanking. CSYNC0/HBLNK0 and CSYNC1/HBLNK1 can be programmed for one of two functions:
  • Composite sync is for use on composite-sync video systems and can be programmed as an input, output, or high-impedance signal. As an input, the ’C80 extracts horizontal and vertical sync information from externally generated active-low sync pulses. As an output, the active-low composite-sync pulses are generated from either external HSYNC and VSYNC signals or the ’C80’s internal video timers. In the high-impedance state, the pin is neither driven nor allowed to drive circuitry.
  • Horizontal blank disables pixel display/capture during horizontal retrace periods in separate-sync video systems and can be used as an output only.
  Immediately following reset, CSYNC0/HBLNK0 and CSYNC1/HBLNK1 are configured as high-impedance CSYNC0 and CSYNC1, respectively.
FCLK0, FCLK1 (I): Frame clock. FCLK0 and FCLK1 are derived from the external video system’s dot clock and drive the ’C80 video logic for frame timer 0 and frame timer 1.
HSYNC0, HSYNC1 (I/O/Z): Horizontal sync. HSYNC0 and HSYNC1 control the video system. They can be programmed as input, output, or high-impedance signals. As an input, HSYNCx synchronizes the video timer to externally generated horizontal sync pulses. As an output, HSYNCx is an active-low horizontal sync pulse generated by the ’C80 on-chip frame timer. In the high-impedance state, the pin is not driven, and no internal synchronization is allowed to occur. Immediately following reset, HSYNC0 and HSYNC1 are in the high-impedance state.
SCLK0, SCLK1 (I): Serial data clock. The ’C80 shift register transfer (SRT) controller uses SCLK0 and SCLK1 to track the VRAM tap point when using midline reload. SCLK0 and SCLK1 should be the same signals that clock the serial registers on the VRAMs controlled by frame timer 0 and frame timer 1, respectively.
VSYNC0, VSYNC1 (I/O/Z): Vertical sync. VSYNC0 and VSYNC1 control the video system. They can be programmed as inputs, outputs, or high-impedance signals. As inputs, VSYNCx synchronize the frame timer to externally generated vertical-sync pulses. As outputs, VSYNCx are active-low vertical-sync pulses generated by the ’C80 on-chip frame timer. In the high-impedance state, the pin is not driven and no internal synchronization is allowed to occur. Immediately following reset, VSYNCx is in the high-impedance state.

POWER
VSS¶ (I): Ground. Electrical ground inputs.
VDD¶ (I): Power. Nominal 3.3-V power supply inputs.
MISCELLANEOUS
NC: No connect. NC pins serve as an alignment key or are reserved for factory use, and must be left unconnected.
architecture
Figure 1 shows the major components of the ’C80: the master processor (MP), the parallel digital signal
processors (PPs), the transfer controller (TC), the video controller (VC), and the IEEE-1149.1 emulation interface. Shared access to
on-chip RAM is achieved through the crossbar; crossbar connections are marked in Figure 1. Each PP can
perform three accesses per cycle through its local (L), global (G), and instruction (I) ports. The MP can access
two RAMs per cycle through its crossbar/data (C/D) and instruction (I) ports, and the TC can access one RAM
through its crossbar interface. Up to 15 simultaneous accesses (4 PPs × 3 + 2 + 1) are supported in each cycle. Addresses can
be changed every cycle, allowing the crossbar matrix to be reconfigured on a cycle-by-cycle basis. Contention
between processors for the same RAM in the same cycle is resolved by a round-robin priority scheme. In
addition to the crossbar, a 32-bit data path exists between the MP and the TC and VC. This path allows the MP to
access TC control registers that are memory-mapped into the MP memory space.
The ’C80 has a 4G-byte address space, as shown in Figure 2. The lower 32M bytes are used to address internal
RAM and memory-mapped registers.
Figure 1. Block Diagram Showing Data Paths
(ADSP0–ADSP3, each with L, G, and I ports, Data RAM0–RAM2, a parameter RAM, and an instruction cache; MP with C/D and I ports, a parameter RAM, data cache, and instruction cache; TC; VC; IEEE-1149.1 (JTAG) interface; 64-bit and 32-bit data paths. Port legend: C/D = crossbar/data port, G = global port, L = local port, I = instruction port.)
ADDRESS RANGE              REGION                        SIZE
0x02000000–0xFFFFFFFF      External Memory               4064M bytes
0x01820400–0x01FFFFFF      Reserved                      8063K bytes
0x01820200–0x018203FF      Memory-Mapped VC Registers    512 bytes
0x01820000–0x018201FF      Memory-Mapped TC Registers    512 bytes
0x01819000–0x0181FFFF      Reserved                      28K bytes
0x01818000–0x01818FFF      MP Instruction Cache          4K bytes
0x01811000–0x01817FFF      Reserved                      28K bytes
0x01810000–0x01810FFF      MP Data Cache                 4K bytes
0x01808000–0x0180FFFF      Reserved                      32K bytes
0x01807800–0x01807FFF      ADSP3 Instruction Cache       2K bytes
0x01806000–0x018077FF      Reserved                      6K bytes
0x01805800–0x01805FFF      ADSP2 Instruction Cache       2K bytes
0x01804000–0x018057FF      Reserved                      6K bytes
0x01803800–0x01803FFF      ADSP1 Instruction Cache       2K bytes
0x01802000–0x018037FF      Reserved                      6K bytes
0x01801800–0x01801FFF      ADSP0 Instruction Cache       2K bytes
0x01010800–0x018017FF      Reserved                      8132K bytes
0x01010000–0x010107FF      MP Parameter RAM              2K bytes
0x01003800–0x0100FFFF      Reserved                      50K bytes
0x01003000–0x010037FF      ADSP3 Parameter RAM           2K bytes
0x01002800–0x01002FFF      Reserved                      2K bytes
0x01002000–0x010027FF      ADSP2 Parameter RAM           2K bytes
0x01001800–0x01001FFF      Reserved                      2K bytes
0x01001000–0x010017FF      ADSP1 Parameter RAM           2K bytes
0x01000800–0x01000FFF      Reserved                      2K bytes
0x01000000–0x010007FF      ADSP0 Parameter RAM           2K bytes
0x0000B800–0x00FFFFFF      Reserved                      16338K bytes
0x0000B000–0x0000B7FF      ADSP3 Data RAM2               2K bytes
0x0000A800–0x0000AFFF      Reserved                      2K bytes
0x0000A000–0x0000A7FF      ADSP2 Data RAM2               2K bytes
0x00009800–0x00009FFF      Reserved                      2K bytes
0x00009000–0x000097FF      ADSP1 Data RAM2               2K bytes
0x00008800–0x00008FFF      Reserved                      2K bytes
0x00008000–0x000087FF      ADSP0 Data RAM2               2K bytes
0x00004000–0x00007FFF      Reserved                      16K bytes
0x00003800–0x00003FFF      ADSP3 Data RAM1               2K bytes
0x00003000–0x000037FF      ADSP3 Data RAM0               2K bytes
0x00002800–0x00002FFF      ADSP2 Data RAM1               2K bytes
0x00002000–0x000027FF      ADSP2 Data RAM0               2K bytes
0x00001800–0x00001FFF      ADSP1 Data RAM1               2K bytes
0x00001000–0x000017FF      ADSP1 Data RAM0               2K bytes
0x00000800–0x00000FFF      ADSP0 Data RAM1               2K bytes
0x00000000–0x000007FF      ADSP0 Data RAM0               2K bytes
Figure 2. Memory Map
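The per-PP regions in Figure 2 repeat at fixed strides, so their base addresses can be computed rather than tabulated. The following C sketch is illustrative only: the function names are hypothetical, and every constant is read directly from the memory map above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: base addresses of the per-ADSP on-chip memories,
 * derived from the Figure 2 memory map (each region is 2K bytes). */

/* Data RAM0/RAM1 of ADSPn share a 4K-byte slot at n * 0x1000;
 * Data RAM2 of ADSPn is at 0x8000 + n * 0x1000. */
static uint32_t adsp_data_ram_base(unsigned pp, unsigned ram)
{
    assert(pp < 4u && ram < 3u);
    if (ram == 2u)
        return 0x00008000u + pp * 0x1000u;
    return pp * 0x1000u + ram * 0x800u;
}

/* ADSPn parameter RAM: 0x01000000 + n * 0x1000. */
static uint32_t adsp_param_ram_base(unsigned pp)
{
    assert(pp < 4u);
    return 0x01000000u + pp * 0x1000u;
}

/* ADSPn instruction cache: 0x01801800 + n * 0x2000. */
static uint32_t adsp_icache_base(unsigned pp)
{
    assert(pp < 4u);
    return 0x01801800u + pp * 0x2000u;
}

/* Everything from 0x02000000 upward is external memory. */
static int is_external_address(uint32_t addr)
{
    return addr >= 0x02000000u;
}
```

For example, adsp_data_ram_base(3, 2) yields 0x0000B000, matching the ADSP3 Data RAM2 row.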
master processor (MP) architecture
The master processor (MP) is a 32-bit RISC processor with an integral IEEE-754 floating-point unit. The MP
is designed for efficient execution of C code and can perform well over 130000 dhrystones/s.
Major tasks that the MP typically performs are:
• Task control and user interface
• Information processing and analysis
• IEEE-754 floating point (including graphics transforms)
MP functional block diagram
Figure 3 shows a block diagram of the master processor. Key features of the MP include:
• Floating-point operation and parallel load or store
• Multiply and accumulate
• High performance
  – 50 million instructions per second (MIPS)
  – 100 million floating-point operations per second (MFLOPS)
  – Over 130000 dhrystones/s
Figure 3. MP Block Diagram
(Register file of thirty-one 32-bit registers, barrel rotator, mask generator, zero comparator, integer arithmetic and logic unit (ALU), leftmost/rightmost-one logic, timer, control registers, instruction register, program counters (PCs), PC incrementer, scoreboard, double-precision floating-point multiplier (single-precision core), double-precision floating-point adder, double-precision floating-point accumulators, emulation logic, instruction-cache controller, data-cache controller, crossbar interface, and endian multiplexers.)
MP general-purpose registers
The MP contains thirty-one 32-bit general-purpose registers, R1–R31. Register R0 always reads as zero, and writes
to it are discarded. Double-precision values are always stored in an even-odd register pair, with the
higher-numbered (odd) register holding the sign bit and exponent. The R0/R1 pair is not available for this use.
A scoreboard keeps track of which registers are awaiting loads or the result of a previous instruction and stalls
the instruction pipeline until the register contains valid data. As a recommended software convention, R1 is
typically used as a stack pointer and R31 as a return-address link register.
Figure 4 shows the MP general-purpose registers.
Figure 4. MP General-Purpose Registers
(32-bit registers: R0 (zero/discard) and R1–R31. 64-bit register pairs: R2/R3 through R30/R31, holding integer, unsigned integer, bit, or floating-point data; the R0/R1 pair is not available.)
The 32-bit registers can contain signed-integer, unsigned-integer, or single-precision floating-point values.
Signed bytes and halfwords are sign-extended to 32 bits, and unsigned bytes and halfwords are zero-filled. Doublewords can be stored in a
64-bit even/odd register pair. Double-precision floating-point values are referenced using the even register
number or the register pair. Figure 5 through Figure 7 show the register data formats.
Single-precision floating point: S (bit 31), E (bits 30–23), M (bits 22–0)
Signed 32-bit integer: S (bit 31), I (bits 30–0)
Unsigned 32-bit integer: U (bits 31–0)
Legend: S = sign bit, E = exponent, M = value (mantissa), I = signed integer value, U = unsigned integer value, MS = most significant, LS = least significant
Figure 5. MP Register 32-Bit Data Formats
Signed byte: S (bits 31–7; the byte's sign bit replicated), I (bits 6–0)
Unsigned byte: 0 (bits 31–8), U (bits 7–0)
Signed halfword: S (bits 31–15; the halfword's sign bit replicated), I (bits 14–0)
Unsigned halfword: 0 (bits 31–16), U (bits 15–0)
Legend: S = sign bit(s), I = signed byte/halfword value, U = unsigned byte/halfword value, MS = most significant, LS = least significant
Figure 6. MP Register 8-Bit and 16-Bit Data Formats
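The sign-extension and zero-fill rules in Figure 6 can be expressed as a small C sketch. The function names are hypothetical stand-ins for what happens to the 32-bit destination register when a byte or halfword is loaded; they are not MP instructions or a TI API.

```c
#include <stdint.h>

/* Signed byte: replicate bit 7 into bits 31-8. */
static uint32_t load_signed_byte(uint8_t b)
{
    return (b & 0x80u) ? (0xFFFFFF00u | b) : (uint32_t)b;
}

/* Unsigned byte: bits 31-8 are zero-filled. */
static uint32_t load_unsigned_byte(uint8_t b)
{
    return (uint32_t)b;
}

/* Signed halfword: replicate bit 15 into bits 31-16. */
static uint32_t load_signed_half(uint16_t h)
{
    return (h & 0x8000u) ? (0xFFFF0000u | h) : (uint32_t)h;
}

/* Unsigned halfword: bits 31-16 are zero-filled. */
static uint32_t load_unsigned_half(uint16_t h)
{
    return (uint32_t)h;
}
```

The same bit pattern (0x80) thus becomes 0xFFFFFF80 as a signed byte but 0x00000080 as an unsigned byte.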
Double-precision floating point:
  Odd register (most significant 32-bit word): S (bit 31), E (bits 30–20), M (bits 19–0, most significant mantissa bits)
  Even register (least significant 32-bit word): M (bits 31–0, least significant mantissa bits)
Doubleword:
  Odd register: most significant 32-bit word
  Even register: least significant 32-bit word
Legend: S = sign bit, E = exponent, M = mantissa value, MS = most significant, LS = least significant
Figure 7. MP Register 64-Bit Data Formats
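The register-pair rules above (the even register names the pair, the odd register holds the sign and exponent, and R0/R1 is excluded) can be illustrated with a short C sketch. It assumes the host's double is IEEE-754 binary64 and shares byte order with its 64-bit integers, which holds on common platforms; the helper names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Nonzero if `reg` can name a double-precision pair: an even register
 * from R2 to R30 (the R0/R1 pair is not available). */
static int is_valid_double_pair(unsigned reg)
{
    return reg >= 2u && reg <= 30u && (reg % 2u) == 0u;
}

/* Split a double into the two 32-bit words of an even/odd pair: the odd
 * (higher-numbered) register receives the sign, exponent, and upper 20
 * mantissa bits; the even register receives the low 32 mantissa bits. */
static void split_double(double d, uint32_t *even_reg, uint32_t *odd_reg)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);   /* reinterpret the binary64 pattern */
    *even_reg = (uint32_t)(bits & 0xFFFFFFFFu);
    *odd_reg  = (uint32_t)(bits >> 32);
}
```

For 1.0, the odd register would hold 0x3FF00000 (biased exponent 1023) and the even register 0x00000000.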
MP double-precision floating-point accumulators
The MP has four double-precision floating-point accumulator registers (see Figure 8) that hold intermediate floating-point
results.
In addition to the general-purpose registers, there are a number of control registers that are used to represent
the state of the processor. Table 1 shows the control register numbers of the accessible registers.
Table 1. Control Register Numbers
NUMBER            NAME           DESCRIPTION
0x0000            EPC            Exception Program Counter
0x0001            EIP            Exception Instruction Pointer
0x0002            CONFIG         Configuration
0x0003            –              Reserved
0x0004            INTPEN         Interrupt Pending Register
0x0005            –              Reserved
0x0006            IE             Interrupt Enable Register
0x0007            –              Reserved
0x0008            FPST           Floating-Point Status
0x0009            –              Reserved
0x000A            PPERROR        PP Error Register
0x000B            –              Reserved
                  PKTREQ         Packet-Transfer Request Register
0x0011            FLTADR         Faulting Address
0x0012            FLTTAG         Faulting Tag
0x0013            FLTDTL         Faulting Data (low)
0x0014            FLTDTH         Faulting Data (high)
0x0015–0x001F     –              Reserved
0x0020            SYSSTK         System Stack Pointer
0x0021            SYSTMP         System Temporary Register
0x0022–0x002F     –              Reserved
0x0030            MPC            Emulator Exception Program Counter
0x0031            MIP            Emulator Exception Instruction Pointer
0x0032            –              Reserved
0x0033            ECOMCNTL       Emulator Communication Control
0x0034            ANASTAT        Emulation Analysis Status Register
0x0035–0x0038     –              Reserved
0x0039            BRK1           Emulation Breakpoint 1 Register
0x003A            BRK2           Emulation Breakpoint 2 Register
0x0200–0x020F     ITAG0–ITAG15   Instruction Cache Tags 0 to 15
0x4000            IN0P           Vector Load Pointer 0
0x4001            IN1P           Vector Load Pointer 1
0x4002            OUTP           Vector Store Pointer
MP pipeline registers
The MP uses a three-stage fetch, execute, access (FEA) pipeline. The primary pipeline registers are
manipulated implicitly by branch and trap instructions and are not accessible by the user. The exception and
emulation pipeline registers are user-accessible as control registers. All pipeline registers are 32 bits.
                        Program Execution Mode
                        Normal    Exception    Emulation
Program Counter         PC        EPC          MPC
Instruction Pointer     IP        EIP          MIP
Instruction Register    IR
• Instruction register (IR) contains the instruction being
executed.
• Instruction pointer (IP) points to the instruction being
executed.
• Program counter (PC) points to the instruction being
fetched.
• Exception/emulator instruction pointer (EIP/MIP) points to the
instruction that would have been executed had the exception /
emulation trap not occurred.
• Exception/emulator program counter (EPC/MPC) points to the
instruction to be fetched on returning from the exception/emulation
trap.
Figure 9. MP FEA Pipeline Registers
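As a simplified illustration of Figure 9, the following C sketch saves IP and PC into EIP and EPC when an exception trap is taken and restores them on return. It is a hypothetical model, not the MP's documented trap sequence: it ignores delayed branches, the access stage, and vector lookup, and the structure and function names are invented for illustration.

```c
#include <stdint.h>

/* Hypothetical model of the normal and exception pipeline registers. */
struct mp_pipeline {
    uint32_t pc;   /* instruction being fetched */
    uint32_t ip;   /* instruction being executed */
    uint32_t epc;  /* instruction to fetch on return from the exception */
    uint32_t eip;  /* instruction that would have executed without the trap */
};

/* Take an exception: preserve where execution and fetch were, then
 * start fetching the handler at `handler_addr`. */
static void take_exception(struct mp_pipeline *p, uint32_t handler_addr)
{
    p->eip = p->ip;
    p->epc = p->pc;
    p->pc  = handler_addr;
}

/* Return from the exception: resume the preempted instruction and
 * continue fetching where the fetch stage left off. */
static void return_from_exception(struct mp_pipeline *p)
{
    p->ip = p->eip;
    p->pc = p->epc;
}
```

Keeping both EIP and EPC, rather than a single return address, lets a return resume correctly even when the executing and fetched instructions are not sequential, as after a branch.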
configuration (CONFIG) register (0x0002)
The CONFIG register controls or reflects the state of certain options as shown in Figure 10.
Fields shown in Figure 10: E, R, T, H, X, Type, and Release (remaining bits reserved).
E = Endian mode (0 = big-endian, 1 = little-endian); read only
interrupt-enable (IE) register (0x0006)
The IE register contains enable bits for each of the interrupts/traps, as shown in Figure 11. Both the
global-interrupt-enable (ie) bit and the appropriate individual interrupt-enable bit must be set for an
interrupt to occur.
pe = PP error
x4 = External interrupt 4 (LINT4)
x3 = External interrupt 3 (EINT3)
bp = Bad packet transfer
pb = Packet transfer busy
pc = Packet transfer complete
mi = MP message interrupt
p3 = PP3 message interrupt
p2 = PP2 message interrupt
p1 = PP1 message interrupt
p0 = PP0 message interrupt
io = Integer overflow
mf = Memory fault
x2 = External interrupt 2 (EINT2)
x1 = External interrupt 1 (EINT1)
ti = MP timer interrupt
f1 = Frame-timer 1 interrupt
f0 = Frame-timer 0 interrupt
fx = Floating-point inexact
fu = Floating-point underflow
fo = Floating-point overflow
fz = Floating-point divide-by-zero
fi = Floating-point invalid
ie = Global-interrupt enable
Figure 11. IE Register
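The enable rule above (the global ie bit, the individual enable bit, and the pending bit must all be set) can be sketched in C. Because this section does not give the bit positions, the masks are passed in as parameters; any mask values used with it are placeholders, and the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* An interrupt is taken only if it is pending in INTPEN, its individual
 * enable bit is set in IE, and the global ie bit is set in IE.
 * `source_mask` selects the interrupt's bit; `global_ie_mask` selects ie. */
static bool mp_interrupt_taken(uint32_t ie_reg, uint32_t intpen,
                               uint32_t source_mask, uint32_t global_ie_mask)
{
    bool global_on = (ie_reg & global_ie_mask) != 0u;
    bool source_on = (ie_reg & source_mask) != 0u;
    bool pending   = (intpen & source_mask) != 0u;
    return global_on && source_on && pending;
}
```

Clearing the single ie bit therefore masks every interrupt without disturbing the individual enable settings.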
interrupt-pending (INTPEN) register (0x0004)
The bits in the INTPEN register show the current state of each interrupt/trap. Pending interrupts do not occur unless
the ie bit and the corresponding interrupt-enable bit are set. Software must write a 1 to the appropriate INTPEN bit
to clear an interrupt. Figure 12 shows the INTPEN bit locations.
The INTPEN register uses the same bit fields as the IE register (pe, x4, x3, bp, pb, pc, mi, p3–p0, io, mf, x2, x1, ti, f1, f0, fx, fu, fo, fz, fi, and ie).
Figure 12. INTPEN Register
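Because software clears an INTPEN bit by writing a 1 to it, a write behaves as a write-one-to-clear operation. The C sketch below models that behavior under the assumption that writing 0 to a bit leaves it unchanged; the function name is hypothetical.

```c
#include <stdint.h>

/* Value of INTPEN after software writes `written`: every bit written as 1
 * is cleared; bits written as 0 are assumed to keep their state. */
static uint32_t intpen_after_write(uint32_t intpen, uint32_t written)
{
    return intpen & ~written;
}
```

One consequence of this model: writing back the full value just read from INTPEN would acknowledge every pending interrupt at once, so a handler would normally write only the bit it is servicing.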
floating-point status (FPST) register (0x0008)
FPST contains status and control information for the floating-point unit (FPU) as shown in Figure 13. Bits 17–21
are read/write FPU control bits. Bits 22–26 are read/write accumulated status bits. All other bits show the status
of the last FPU instruction to complete and are read only.
Fields shown in Figure 13 include dest, opcode, drm, vm, sm, and the accumulated status bits ai, az, ao, au, and ax.
dest = Destination register value
ai = Accumulated value invalid
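The FPST bit groups stated above (bits 17–21 read/write control, bits 22–26 read/write accumulated status, everything else read-only) translate directly into masks. This C sketch models a software write to FPST under the assumption that read-only bits simply ignore the written value; the macro and function names are hypothetical.

```c
#include <stdint.h>

#define FPST_CONTROL_MASK  (0x1Fu << 17)   /* bits 17-21: 0x003E0000 */
#define FPST_ACCUM_MASK    (0x1Fu << 22)   /* bits 22-26: 0x07C00000 */
#define FPST_WRITABLE_MASK (FPST_CONTROL_MASK | FPST_ACCUM_MASK)

/* FPST contents after software writes `value`: only the read/write
 * fields change; read-only status bits keep their previous contents. */
static uint32_t fpst_after_write(uint32_t old_fpst, uint32_t value)
{
    return (old_fpst & ~FPST_WRITABLE_MASK) | (value & FPST_WRITABLE_MASK);
}

/* Clear the accumulated status bits while preserving the control bits. */
static uint32_t fpst_clear_accumulated(uint32_t fpst)
{
    return fpst & ~FPST_ACCUM_MASK;
}
```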
PP error (PPERROR) register (0x000A)
The bits in the PPERROR register reflect parallel processor errors (see Figure 14). When a PP error interrupt
occurs, the MP can read these bits to determine the cause of the error.
Fields shown in Figure 14 (other bits reserved):
h = PP halted
I = PP illegal instruction
f = PP fault type (0 = icache, 1 = direct external access (DEA))
packet-transfer request (PKTREQ) register
PKTREQ controls the submission and priority of packet-transfer requests, as shown in Figure 15. It also
indicates when a packet transfer is currently active. The low-order bits, from most to least significant, are:
I = Immediate (urgent) priority selected
F = High (foreground) priority selected
S = Suspend packet transfer
Q = Packet transfer queued; read only
P = Submit packet-transfer request
All other bits are reserved.
Figure 15. PKTREQ Register
memory-fault registers
The five read-only memory-fault registers contain information about memory address exceptions, as shown in
Figure 16.
cache registers
The ILRU and DLRU registers track least-recently-used (LRU) information for the sixteen instruction-cache blocks and
sixteen data-cache blocks. The ITAGxx registers contain block addresses and the present flags for each
sub-block. DTAGxx registers are identical to ITAGxx registers but also include a dirty bit for each sub-block. Figure 17
shows the cache registers.
In the LRU registers, the mru, nmru, nlru, and lru fields each hold the value 0, 1, 2, or 3, identifying a block number; within each set, the four values are mutually exclusive.
MRU = Most-recently-used
NMRU = Next most-recently-used
NLRU = Next least-recently-used
LRU = Least-recently-used
In the tag registers, each of sub-blocks 3 through 0 has a P (sub-block present) bit; DTAGxx registers also have a D (sub-block dirty) bit for each sub-block.
Figure 17. Cache Registers
MP cache architecture
The MP contains two four-way set-associative 4K-byte caches, one for instructions and one for data. Each cache is divided into
four sets with four blocks in each set. Each block represents 256 bytes of contiguous instructions or data and
is aligned to a 256-byte address boundary. Each block is partitioned into four sub-blocks that each contain
sixteen 32-bit words and are aligned to 64-byte boundaries within the block. Cache misses cause one sub-block
to be loaded into cache. Figure 18 shows the cache architecture for one of the four sets in each cache. Figure 19
shows how addresses map into the cache using the cache tags and address bits.
(Each set holds Blocks 0–3, each with its own tag register (Tag Reg 0–3) and four sub-blocks; an LRU stack for the set orders the blocks as MRU, NMRU, NLRU, and LRU.)
Figure 18. MP Cache Architecture (x4 Sets)
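The MRU/NMRU/NLRU/LRU ordering in Figure 18 behaves like a true-LRU stack. The following C sketch shows that textbook update for one four-block set. It assumes, beyond what this section states, that every access moves the touched block to the MRU position; the structure and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* LRU stack for one set: order[0] = MRU, order[1] = NMRU,
 * order[2] = NLRU, order[3] = LRU. Entries are distinct block numbers 0-3. */
struct lru_set {
    uint8_t order[4];
};

/* Record an access to `block`: move it to MRU and shift the blocks that
 * were more recently used down one position. */
static void lru_touch(struct lru_set *s, uint8_t block)
{
    assert(block < 4u);
    int pos = 0;
    while (pos < 4 && s->order[pos] != block)
        pos++;
    assert(pos < 4);   /* block must already be in the stack */
    for (; pos > 0; pos--)
        s->order[pos] = s->order[pos - 1];
    s->order[0] = block;
}

/* Block to replace on a miss: the least-recently-used entry. */
static uint8_t lru_victim(const struct lru_set *s)
{
    return s->order[3];
}
```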
32-bit logical address:
  bits 31–10 (T): Tag address bits
  bits 9–8 (S): Set select (0–3)
  bits 7–6 (s): Sub-block (within block) select (0–3)
  bits 5–2 (W): Word (within sub-block) select (0–15)
  bits 1–0 (B): Byte (within word) select (0–3)
Address in the on-chip MP 4K cache RAMs (Banks 0 and 1, Sets 0–3):
  bits 11–10 (S): Set select
  bits 9–8 (A): Block select (which tag matched) (0–3)
  bits 7–6 (s): Sub-block select
  bits 5–2 (W): Word select
  bits 1–0 (B): Byte select
Figure 19. MP Cache Addressing
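The field boundaries in Figure 19 can be checked with a short C sketch that splits a logical address and then rebuilds the 12-bit cache-RAM offset once the matching block is known. The structure and function names are hypothetical.

```c
#include <stdint.h>

struct mp_cache_addr {
    uint32_t tag;       /* bits 31-10 (22 bits) */
    unsigned set;       /* bits 9-8 */
    unsigned subblock;  /* bits 7-6 */
    unsigned word;      /* bits 5-2 */
    unsigned byte;      /* bits 1-0 */
};

static struct mp_cache_addr mp_cache_split(uint32_t addr)
{
    struct mp_cache_addr a;
    a.tag      = addr >> 10;
    a.set      = (addr >> 8) & 0x3u;
    a.subblock = (addr >> 6) & 0x3u;
    a.word     = (addr >> 2) & 0xFu;
    a.byte     = addr & 0x3u;
    return a;
}

/* Offset in the 4K-byte cache RAM: S S A A s s W W W W B B, where A is
 * the block (0-3) whose tag register matched. */
static uint32_t mp_cache_ram_offset(struct mp_cache_addr a, unsigned block)
{
    return ((uint32_t)a.set << 10) | ((uint32_t)(block & 0x3u) << 8) |
           ((uint32_t)a.subblock << 6) | ((uint32_t)a.word << 2) | a.byte;
}
```

For example, address 0x12345678 falls in set 2, sub-block 1, word 14, byte 0; if block 3 matches, the data sits at cache-RAM offset 0xB78.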
MP parameter RAM
The parameter RAM is a noncacheable, 2K-byte, on-chip RAM that contains MP interrupt vectors, MP-requested
TC task buffers, and a general-purpose area. Figure 20 shows the parameter RAM address map.
Figure 20. MP Parameter RAM
0x01010000–0x0101007F   Suspended PT Parameters (128 Bytes)
0x01010080–0x010100BF   Reserved (64 Bytes)
0x010100C0–0x010100FB   XPT Linked List Start Addresses (60 Bytes)
0x010100FC–0x010100FF   MP Linked List Start Address
0x01010100–0x0101017F   Off-Chip to Off-Chip PT Buffer (128 Bytes)
0x01010180–0x0101021F   Interrupt and Trap Vectors (160 Bytes)
0x01010220–0x0101029F   XPT Off-Chip to Off-Chip PT Buffer (128 Bytes)
0x010102A0–0x010107FF   General-Purpose RAM (1376 Bytes)
XPT linked list start addresses (4 bytes each):
XPTf 0x010100C0   XPTe 0x010100C4   XPTd 0x010100C8   XPTc 0x010100CC
XPTb 0x010100D0   XPTa 0x010100D4   XPT9 0x010100D8   XPT8 0x010100DC
XPT7 0x010100E0   XPT6 0x010100E4   XPT5 0x010100E8   XPT4 0x010100EC
XPT3 0x010100F0   XPT2 0x010100F4   XPT1 0x010100F8
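Software that manages transfer requests may need to locate entries in this map. The sketch below derives the region boundaries from the region sizes in Figure 20, starting at 0x01010000, and assumes the XPT entries are packed 4 bytes apart below the MP linked-list start address (consistent with XPT1 at 0x010100F8 and XPT7 at 0x010100E0). The names xpt_list_start_addr, mp_pram_region, and mp_pram_map are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Regions of the MP parameter RAM, laid out from the sizes in Figure 20. */
typedef struct {
    uint32_t first, last;  /* inclusive byte addresses */
    const char *name;
} pram_region;

static const pram_region mp_pram_map[] = {
    {0x01010000u, 0x0101007Fu, "Suspended PT parameters"},
    {0x01010080u, 0x010100BFu, "Reserved"},
    {0x010100C0u, 0x010100FBu, "XPT linked-list start addresses"},
    {0x010100FCu, 0x010100FFu, "MP linked-list start address"},
    {0x01010100u, 0x0101017Fu, "Off-chip to off-chip PT buffer"},
    {0x01010180u, 0x0101021Fu, "Interrupt and trap vectors"},
    {0x01010220u, 0x0101029Fu, "XPT off-chip to off-chip PT buffer"},
    {0x010102A0u, 0x010107FFu, "General-purpose RAM"},
};

/* Hypothetical helper: linked-list start entry for XPT channel n
 * (1..15, i.e. XPT1..XPTf), assuming 4-byte entries packed downward
 * from the MP linked-list start address at 0x010100FC. */
static uint32_t xpt_list_start_addr(unsigned n)
{
    return 0x010100FCu - 4u * n;
}

/* Hypothetical helper: name of the region containing addr, or NULL if
 * addr lies outside the 2K-byte MP parameter RAM. */
static const char *mp_pram_region(uint32_t addr)
{
    for (size_t i = 0; i < sizeof mp_pram_map / sizeof mp_pram_map[0]; i++)
        if (addr >= mp_pram_map[i].first && addr <= mp_pram_map[i].last)
            return mp_pram_map[i].name;
    return NULL;
}
```

Under this layout XPTf (n = 15) lands at 0x010100C0, the first byte of the 60-byte XPT list area, and the general-purpose area spans 0x010102A0–0x010107FF.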
MP interrupt vectors
Table 2 and Table 3 show the MP interrupts and traps and their vector addresses.
The three basic classes of MP instruction opcodes are: short immediate, three register, and long immediate.
Figure 21 shows the opcode structure for each class of instruction.
Figure 21. MP Opcode Formats
Short immediate:  bits 31–27 Dest | 26–22 Source 2 | 21–15 Opcode | 14–0 15-Bit Immediate
Three register:   bits 31–27 Dest | 26–22 Source 2 | 21–20 = 1 1 | 19–13 Opcode | 12 = 0 | 11–5 Options | 4–0 Source 1
Long immediate:   bits 31–27 Dest | 26–22 Source 2 | 21–20 = 1 1 | 19–13 Opcode | 12 = 1 | 11–5 Options | 4–0 Source 1, followed by a 32-Bit Long Immediate word
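The three formats can be told apart from the fixed bits in Figure 21. The decoder below is a sketch under one assumption that the figure does not state: that the value 1 1 in bits 21–20 identifies the three-register and long-immediate formats, so no short-immediate opcode uses that pattern. The struct and function names are hypothetical, and the 15-bit immediate is returned unextended because its signedness depends on the instruction.

```c
#include <stdint.h>

typedef enum { MP_SHORT_IMM, MP_THREE_REG, MP_LONG_IMM } mp_format;

typedef struct {
    mp_format format;
    unsigned dest;     /* bits 31-27 */
    unsigned src2;     /* bits 26-22 */
    unsigned opcode;   /* bits 21-15 (short) or 19-13 (three-register/long) */
    unsigned options;  /* bits 11-5, three-register/long-immediate only */
    unsigned src1;     /* bits 4-0, three-register/long-immediate only */
    uint32_t imm15;    /* bits 14-0, short-immediate only (raw field) */
} mp_insn;

/* ASSUMPTION: bits 21-20 == 11 marks the three-register/long-immediate forms. */
static mp_insn mp_decode(uint32_t w)
{
    mp_insn d = {0};
    d.dest = (w >> 27) & 0x1Fu;
    d.src2 = (w >> 22) & 0x1Fu;
    if (((w >> 20) & 0x3u) == 0x3u) {
        /* Bit 12 selects a long immediate (1) or a register Source 1 (0). */
        d.format  = ((w >> 12) & 1u) ? MP_LONG_IMM : MP_THREE_REG;
        d.opcode  = (w >> 13) & 0x7Fu;
        d.options = (w >> 5) & 0x7Fu;
        d.src1    = w & 0x1Fu;
    } else {
        d.format = MP_SHORT_IMM;
        d.opcode = (w >> 15) & 0x7Fu;
        d.imm15  = w & 0x7FFFu;
    }
    return d;
}
```

For a long-immediate instruction, the caller would fetch the following 32-bit word as the immediate operand; the sketch decodes only the first word.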
MP opcode summary
Table 4 through Table 6 show the opcode formats for the MP. Table 7 summarizes the master processor
instruction set.
–   Reserved bit (code as 0)
A   Annul delay slot instruction if branch taken
E   Emulation trap bit
F   Clear present flags
i   Invert endmask
M   Modify, write modified address back to register
n   Rotate sense for shifting
SZ  Size (0 = byte, 1 = halfword, 2 = word, 3 = doubleword)
U   Unsigned form
MP opcode summary (continued)
Table 5. Long-Immediate and Three-Register Opcodes
–   Reserved bit (code as 0)
D   Direct external access bit
E   Emulation trap bit
F   Clear present flags
i   Invert endmask
l   Long immediate
M   Modify, write modified address back to register
n   Rotate sense for shifting
S   Scale offset by data size
SZ  Size (0 = byte, 1 = halfword, 2 = word, 3 = doubleword)

–            Reserved bit (code as 0)
a            Floating-point accumulator select
C            Constant operands rather than register
d            Destination precision for vector (0 = sp, 1 = dp)
Dest         Destination register
l            Long immediate 32-bit data
m            Parallel memory operation specifier
Mem Src/Dst  Vector store or load source/destination register
P            Destination precision for parallel load/store (0 = single, 1 = double)
P1           Precision of source1 operand
P2           Precision of source2 operand
PD           Precision of destination result
RM           Rounding Mode (0 = N, 1 = Z, 2 = P, 3 = M)
S            Scale offset by data size
Z            Use 0 rather than accumulator
MP opcode summary (continued)
Table 7. Summary of MP Opcodes
INSTRUCTION  DESCRIPTION
add          Signed integer add
and.tt       Bitwise AND
and.ff       Bitwise AND with 1s complement
and.ft       Bitwise AND with 1s complement
and.tf       Bitwise AND with 1s complement
bbo          Branch bit one
bbz          Branch bit zero
fsub         Floating-point subtract
illop        Illegal operation
jsr          Jump and save return
ld           Load signed into register
ld.u         Load unsigned into register
lmo          Leftmost one
or.tt        Bitwise OR
or.ff        Bitwise OR with 1s complement
or.ft        Bitwise OR with 1s complement
or.tf        Bitwise OR with 1s complement
rdcr         Read control register
rmo          Rightmost one
shift.dz     Shift, disable mask, zero extend
shift.dm     Shift, disable mask, merge
vmsub        Vector floating-point subtract accumulator from source
vrnd(FP)     Vector round with floating-point input
vrnd(Int)    Vector round with integer input
vsub         Vector floating-point subtract
xnor         Bitwise exclusive NOR
xor          Bitwise exclusive OR
Vector floating-point multiply and add to accumulator
Vector floating-point multiply and subtract from accumulator
PP architecture
The parallel processor (PP) is a 32-bit integer DSP optimized for imaging and graphics applications. Within a single instruction, each PP can execute a multiply, an ALU operation, and two memory accesses in parallel. This internal parallelism allows a single PP to achieve over 500 million operations per second for certain algorithms.
The PP has a three-input ALU that supports all 256 three-input Boolean combinations and many combinations of arithmetic and Boolean functions. Hardware in the input data path to the ALU supports data merging and bit-to-byte, bit-to-halfword, and bit-to-word translations. Typical tasks performed by a PP include:
Figure 22 shows a block diagram of a parallel processor. Key features of the PP include:
• 64-bit instruction word (supports multiple parallel operations)
• Three-stage pipeline for fast instruction cycle
• Numerous registers
  – 8 data, 10 address, 6 index registers
  – 20 other user-visible registers
• Data unit
  – 16 × 16 integer multiplier (optional dual 8 × 8)
  – Splittable 3-input ALU
  – 32-bit barrel rotator
  – Mask generator
  – Multiple status flag expander for translations to/from 1-bit-per-pixel space
  – Conditional assignment of data unit results
  – Conditional source selection
  – Special processing hardware: leftmost one/rightmost one, leftmost bit change/rightmost bit change
• Memory addressing
  – Two address units (global and local) provide up to two 32-bit accesses in parallel with data-unit operation
  – 12 addressing modes (immediate and indexed)
  – Byte, halfword, and word addressability
  – Scaled indexed addressing
  – Conditional assignment for loads
  – Conditional source selection for stores
Figure 22 legend: IAP = instruction address port, LAP = local address port, GAP = global address port.
PP registers
The PP contains many general-purpose registers, status registers, and configuration registers. All PP registers
are 32-bit registers. Figure 23 shows the accessible registers of the PP blocks.
The data unit contains eight 32-bit general-purpose data registers (d0–d7) referred to as the D registers. The
d0 register also acts as the control register for extended ALU (EALU) operations.
d0 register
Figure 24 shows the format when d0 is used as the EALU control register.
The mf register records status information from each split ALU segment for multiple arithmetic operations. The
mf register can be expanded to generate a mask for the ALU. Figure 25 shows the mf register format.
Figure 26. sr Format
Fields: N, C, V, Z, R, MSS, Msize, Asize (bits marked – are reserved)
N      Negative status bit
C      Carry status bit
V      Overflow status bit
Z      Zero status bit
R      Rotation bit
MSS    mf status selection: 00 = set by zero, 01 = set by sign, 10 = set by extended result, 11 = reserved
Msize  Expander data size
Asize  Split ALU data size
PP address-unit registers
address registers
The address unit contains ten 32-bit address registers that hold the base address for address computations or can be used for general-purpose data. Registers a0–a4 are used for local-address computations and registers a8–a12 are used for global-address computations.
index registers
The six 32-bit index registers contain index values for use with the address registers in address computations, or they can be used for general-purpose data. Registers x0–x2 are used by the local-address unit and registers x8–x10 are used by the global-address unit.
stack pointer (sp)
The sp contains the address of the top of the PP’s system stack. The stack pointer is addressed as a6 by the
local-address unit and as a14 by the global-address unit. Figure 27 shows the sp register format.
The zero registers are read-as-zero address registers for the local-address unit (a7) and the global-address unit (a15). Writes to these registers are ignored, so they can be specified as the destination when an operation's result is to be discarded.
The loop registers control three levels of zero-overhead loops. The 32-bit loop-start registers (ls0 – ls2) and
loop-end registers (le0 – le2) contain the starting and ending addresses for the loops. The loop-counter registers
(lc0 – lc2) contain the number of repetitions remaining in their associated loops. The lr0 – lr2 registers are loop
reload registers used to support nested loops. The format for the loop-control (lctl) register is shown in Figure 29.
There are also six special write-only mappings of the loop-reload registers. The lrs0–lrs2 codes are used for fast initialization of the lsn, lrn, and lcn registers for multi-instruction loops, while the lrse0–lrse2 codes are used for fast initialization of single-instruction loops.
The PFC unit contains a pointer to each stage of the PP pipeline. The pc contains the program counter, which points to the instruction being fetched. The ipa points to the instruction in the address stage of the pipeline, and the ipe points to the instruction in the execute stage. The instruction pointer return-from-subroutine (iprs) register contains the return address for a subroutine call.
The interrupt-enable (inten) register allows individual interrupts to be enabled and configures the interrupt flag
(intflg) register operation. The intflg register contains the interrupt flag bits. Interrupt priority increases moving
from left to right on intflg.
Figure 33. Cache-Tag Registers
LRU code: 00 = most-recently-used (MRU), 01 = next MRU (NMRU), 10 = next LRU, 11 = LRU
PP#: PP number (read only): 000 = PP0, 001 = PP1, 010 = PP2, 011 = PP3, 1xx = not implemented
Present bits for sub-blocks 3–0
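The 2-bit LRU codes describe each block's position in its LRU stack. One common way to maintain such codes in software models is shown below: on an access, every block that was more recent than the accessed block ages by one position and the accessed block becomes MRU. This is a generic LRU-stack sketch using the Figure 33 encodings, not a description of the PP's internal update logic; the function names are hypothetical.

```c
#include <stdint.h>

/* 2-bit LRU codes as encoded in Figure 33. */
enum { LRU_MRU = 0, LRU_NMRU = 1, LRU_NLRU = 2, LRU_LRU = 3 };

/* rank[b] is the LRU code of block b; the four codes always form a
 * permutation of 0..3. Accessing `block` makes it MRU. */
static void lru_touch(uint8_t rank[4], unsigned block)
{
    uint8_t old = rank[block];
    for (unsigned b = 0; b < 4; b++)
        if (rank[b] < old)
            rank[b]++;          /* blocks more recent than `block` age by one */
    rank[block] = LRU_MRU;
}

/* The block to replace on a miss is the one whose code is LRU (11). */
static unsigned lru_victim(const uint8_t rank[4])
{
    for (unsigned b = 0; b < 4; b++)
        if (rank[b] == LRU_LRU)
            return b;
    return 0;                   /* unreachable while rank is a permutation */
}
```

Starting from codes {0, 1, 2, 3} for blocks 0–3, touching block 2 yields {1, 2, 0, 3}, so block 3 remains the replacement candidate.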
PP cache architecture
Each PP has its own 2K-byte instruction cache. The cache is divided into four blocks, and each block is divided into four sub-blocks that each contain sixteen 64-bit instructions. A cache miss causes one sub-block to be loaded into the cache. Figure 34 shows the cache architecture, and Figure 35 shows how addresses map into the cache using the cache tags and address bits.
The parameter RAM is a 2K-byte, on-chip RAM that contains PP-interrupt vectors, PP-requested TC task buffers, and a general-purpose area. The parameter RAM is not cached. Figure 36 shows the parameter RAM address map.
Figure 36. PP Parameter RAM Address Map (# = PP Number)
0x0100#000–0x0100#07F   Suspended PT Parameters (128 Bytes)
0x0100#080–0x0100#0F7   Reserved (120 Bytes)
0x0100#0F8–0x0100#0FB   DEA / Cache Fault Address
0x0100#0FC–0x0100#0FF   PP Linked-List Start Address
0x0100#100–0x0100#17F   Off-Chip to Off-Chip PT Buffer (128 Bytes)
0x0100#180–0x0100#1FF   Interrupt Vectors (128 Bytes)
0x0100#200 and up       General-Purpose RAM (3572 Bytes Less Stack Size)
Application-Dependent Boundary
Stack (Stack Pointer After Reset: 0x0100#FF7)
0x0100#FF4–0x0100#FFF   Stack State Information After Reset (12 Bytes)
PP-interrupt vectors
The PP interrupts and their vector addresses are shown in Table 8.
Table 8. PP-Interrupt Vectors
NAME     VECTOR ADDRESS   INTERRUPT
TASK     0x0100#1B8       Task Interrupt
PTQ      0x0100#1C4       Packet Transfer Queued
PTERR    0x0100#1C8       Packet Transfer Error
PTEND    0x0100#1CC       Packet Transfer End
MPMSG    0x0100#1D0       MP Message
PP0MSG   0x0100#1E0       PP0 Message
PP1MSG   0x0100#1E4       PP1 Message
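The # in each vector address is the PP number, occupying one hex digit. The sketch below substitutes that digit to form a concrete vector address; the enum and function names are hypothetical, and the offsets are copied from Table 8.

```c
#include <stdint.h>

/* Low 12 bits of the 0x0100#xxx vector addresses in Table 8. */
enum {
    PP_VEC_TASK   = 0x1B8, PP_VEC_PTQ    = 0x1C4, PP_VEC_PTERR = 0x1C8,
    PP_VEC_PTEND  = 0x1CC, PP_VEC_MPMSG  = 0x1D0, PP_VEC_PP0MSG = 0x1E0,
    PP_VEC_PP1MSG = 0x1E4
};

/* Replace the '#' digit (bits 15-12) with the PP number (0-3). */
static uint32_t pp_vector_addr(unsigned pp, unsigned offset)
{
    return 0x01000000u | ((uint32_t)(pp & 0xFu) << 12) | (offset & 0xFFFu);
}
```

For example, the task-interrupt vector of PP2 is at 0x010021B8.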
PP data-unit architecture
The data unit has independent data paths for the ALU and the multiplier, each with its own set of hardware
functions. The multiplier data path includes a 16 × 16 multiplier, a halfword swapper, and rounding hardware.
The ALU data path includes a 32-bit three-input ALU, a barrel rotator, mask generator, multiple flag (mf)
expander, left/rightmost one and left/rightmost bit-change logic, and several multiplexers. Figure 37 shows the
data-unit block diagram.
Figure 37. Data-Unit Block Diagram
(Blocks: splittable multiplier with scale, round, and swap/merge stages; splittable three-input ALU with A, B, and C ports and ALU function code logic; barrel rotator with rotate-amount multiplexer; mask generator and mask generator multiplexer; C-port multiplexer; mf expander; LMO, RMO, LMBC, and RMBC logic. Status outputs: N, C, V, Z, LV, and mf.)
Operand legend:
src1      Any register; D register only for the leftmost/rightmost one (LMO/RMO) and leftmost/rightmost bit change (LMBC/RMBC) hardware
src2      D register or, in some cases, a 5-bit or 32-bit immediate
src3      D register only
src4      D register only
dst/dst1  Any register
dst2      D register only
dstc      D register only (destination companion register source)
d0        5 LSBs of d0
0x1, 0    Constants
PP data-unit architecture (continued)
The PP’s ALU can operate as one 32-bit ALU or be split into two 16-bit ALUs or four 8-bit ALUs. Figure 38 shows the multiple-arithmetic data flow for a four-way 8-bit split of the ALU (called multiple-byte arithmetic). In split mode, the ALU segments operate as independent parallel ALUs, each receiving the same function code.
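Multiple-byte arithmetic can be modeled in portable C as four independent 8-bit additions whose carries never cross a byte boundary, with one status bit per byte collected into a 4-bit flag word and then replicated into a byte mask, similar to the mf register and expander path in Figure 38. This is a software model only. It records carry-out as the per-byte flag for illustration, assumes flag bit i corresponds to byte i, and the function names are hypothetical.

```c
#include <stdint.h>

/* Four independent 8-bit adds. *carries receives one bit per byte:
 * bit i = carry-out of byte i (ASSUMED bit-to-byte ordering). */
static uint32_t add_u8x4(uint32_t a, uint32_t b, unsigned *carries)
{
    uint32_t sum = 0;
    unsigned c = 0;
    for (unsigned i = 0; i < 4; i++) {
        unsigned s = ((a >> (8 * i)) & 0xFFu) + ((b >> (8 * i)) & 0xFFu);
        sum |= (uint32_t)(s & 0xFFu) << (8 * i);  /* keep the low 8 bits only */
        c |= (s >> 8) << i;                       /* carry does not propagate */
    }
    *carries = c;
    return sum;
}

/* Expander model: replicate each of 4 flag bits into a byte (1 -> 0xFF). */
static uint32_t expand_u8x4(unsigned flags)
{
    uint32_t m = 0;
    for (unsigned i = 0; i < 4; i++)
        if (flags & (1u << i))
            m |= 0xFFu << (8 * i);
    return m;
}
```

ORing the sum with the expanded carry mask gives an unsigned saturating byte add, a typical use of this flag-to-mask technique: adding 0x01FF8010 and 0x01018020 yields a sum of 0x02000030 with carries from bytes 1 and 2, and a saturated result of 0x02FFFF30.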
Figure 38. Multiple-Byte Arithmetic Data Flow (the 32-bit A, B, and C inputs pass through rotate and clear logic and are split into four 8-bit ALU segments, each with its own carry-in logic and carry-out; each segment contributes one C, Z, S, or E status bit to the 4-bit mf register, whose bits the expander replicates)
PP multiplier
The PP’s hardware multiplier can perform one 16 × 16 multiply with a 32-bit result or two 8 × 8 multiplies with two 16-bit results in a single cycle. A 16 × 16 multiply can use signed or unsigned operands, as shown in Figure 39. When performing two simultaneous 8 × 8 split multiplies, the first input word contains unsigned byte operands and the second input word contains signed or unsigned byte operands. These formats are shown in Figure 40 and Figure 41.
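The arithmetic of these multiply forms can be modeled in C. The 16 × 16 cases follow directly from the text. For the split 8 × 8 case, Figures 40 and 41 are not reproduced here, so this sketch places the operands in byte 0 and byte 2 and the two 16-bit products in the corresponding halfwords; that layout is hypothetical, as are the function names. The sign handling matches the text: the first word's bytes are unsigned, the second word's bytes are signed or unsigned.

```c
#include <stdbool.h>
#include <stdint.h>

/* Portable sign extension of the low 16 or 8 bits. */
static int32_t sext16(uint32_t v) { return (int32_t)((v & 0xFFFFu) ^ 0x8000u) - 0x8000; }
static int32_t sext8(uint32_t v)  { return (int32_t)((v & 0xFFu) ^ 0x80u) - 0x80; }

/* 16 x 16 multiplies with 32-bit results. */
static int32_t mul16s(uint32_t a, uint32_t b)  { return sext16(a) * sext16(b); }
static uint32_t mul16u(uint32_t a, uint32_t b) { return (a & 0xFFFFu) * (b & 0xFFFFu); }

/* Two 8 x 8 multiplies. HYPOTHETICAL layout: operands in bytes 0 and 2,
 * 16-bit products in halfwords 0 and 1. `a` bytes are unsigned; `b` bytes
 * are signed when b_signed is true. */
static uint32_t mul8x2(uint32_t a, uint32_t b, bool b_signed)
{
    uint32_t r = 0;
    for (unsigned h = 0; h < 2; h++) {
        uint32_t ua = (a >> (16 * h)) & 0xFFu;
        uint32_t ub = (b >> (16 * h)) & 0xFFu;
        int32_t p = b_signed ? (int32_t)ua * sext8(ub) : (int32_t)(ua * ub);
        r |= ((uint32_t)p & 0xFFFFu) << (16 * h);  /* both ranges fit in 16 bits */
    }
    return r;
}
```

An unsigned-by-signed byte product lies in the range −32640 to 32385 and an unsigned-by-unsigned product is at most 65025, so each result fits in its 16-bit field without loss.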
The program-flow-control (pfc) unit performs instruction fetching and decoding, loop control, and handshaking
with the transfer controller. The pfc unit architecture is shown in Figure 43.
The PP has a three-stage fetch, address, execute (FAE) pipeline as shown in Figure 42. The pc, ipa, and ipe
registers point to the address of the instruction in each stage of the pipeline. On each cycle in which the pipeline
advances, ipa is copied into ipe, pc is copied into ipa, and the pc is incremented by one instruction (8 bytes).
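The pointer updates described above can be written as one step function. Note the order: ipe must take the old ipa before ipa is overwritten with pc. This sketch models only straight-line advance; branches, loops, and stalls are not modeled, and the names are hypothetical.

```c
#include <stdint.h>

/* Pointers to the instructions in the fetch (pc), address (ipa),
 * and execute (ipe) stages of the PP pipeline. */
typedef struct { uint32_t pc, ipa, ipe; } pp_pipe;

/* One pipeline advance with no change of flow. */
static void pp_pipe_advance(pp_pipe *p)
{
    p->ipe = p->ipa;   /* address stage moves to execute */
    p->ipa = p->pc;    /* fetch stage moves to address */
    p->pc += 8u;       /* next 64-bit instruction */
}
```

Starting with pc = 0x100, two advances leave ipe = 0x100, ipa = 0x108, and pc = 0x110, so the first instruction reaches the execute stage.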
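Figure 42. FAE-Instruction Pipeline (instructions one, two, and three enter on successive cycles and each passes through the fetch, address, and execute stages over cycles T1–T5; pc, ipa, and ipe point to the instructions in the fetch, address, and execute stages)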
PP program-flow-control unit architecture (continued)
Figure 43. Program-Flow-Control Unit Block Diagram (pc with incrementer, iprs, ipa, and ipe; cache controller with tag registers, present bits, LRU stack, and tag comparators; loop controllers 0–2, each with ls, le, a comparator, lr, and lc with decrement and zero detection; lctl loop control; instruction decode; FAE pipeline control; control signal generation. Interfaces: instruction address, instruction, and control signals.)
PP address-unit architecture
The PP has a local-address unit and a global-address unit, which operate independently of each other. The address units support twelve addressing modes. Instead of performing a memory access, either or both address units can write the result of an address computation directly to a PP register. This address-unit arithmetic supplements the data unit during compute-intensive algorithms.
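The address-unit datapath (base register, optionally scaled index, 32-bit adder/subtracter, and preindex/postindex selection) can be modeled as below. This is a simplified, hypothetical model rather than the PP's twelve documented addressing modes. It assumes that scaling multiplies the index by the access size, that postindexing presents the unmodified base to memory, and that the sum is written back only when modify is requested. When the address unit is used for arithmetic instead of a memory access, the computed sum is the value that would be written to the destination register.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t address;   /* address presented to the memory port */
    uint32_t new_base;  /* value left in the base (address) register */
} pp_ea;

/* HYPOTHETICAL model of one address computation.
 * size_log2: 0 = byte, 1 = halfword, 2 = word. */
static pp_ea pp_effective_address(uint32_t base, int32_t index,
                                  unsigned size_log2, bool scaled,
                                  bool postindex, bool modify)
{
    /* Scaled indexing multiplies the index by the access size. */
    uint32_t offset = scaled ? ((uint32_t)index << size_log2) : (uint32_t)index;
    uint32_t sum = base + offset;           /* 32-bit adder, wraps modulo 2^32 */
    pp_ea r;
    r.address  = postindex ? base : sum;    /* postindex: access at the old base */
    r.new_base = modify ? sum : base;       /* write back only when modifying */
    return r;
}
```

With base 0x1000 and index 3 on word accesses, a scaled preindexed access reads 0x100C; the postindexed form with modify reads 0x1000 and leaves 0x100C in the base register.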
Figure 44. Address-Unit Architecture
Local-address unit: address registers a0–a4 (a7 = 0; sp = a6), PP-relative pba and dba via the PP-relative multiplexer; index registers x0–x2 via the index multiplexer and index scaler (scaled by data size); 32-bit adder/subtracter unit; preindex/postindex multiplexer; drives the local-address port. Offset input from the global destination bus; output to the global source bus.
Global-address unit: address registers a8–a12 (a15 = 0; sp = a14), pba and dba; index registers x8–x10; the same datapath; drives the global-address port.
PP instruction set
PP instructions are represented by algebraic expressions for the operations performed in parallel by the
multiplier, ALU, global-address unit, and local-address unit. The expressions use the || symbol to indicate
operations that are to be performed in parallel. The PP ALU operator syntax is shown in Table 9. The data unit
operations (multiplier and ALU) are summarized in Table 10 and the parallel transfers (global and local) are
summarized in Table 11.
Table 9. PP Operators by Precedence
OPERATOR         FUNCTION
src1 [n] src1–1  Select odd (n = true) or even (n = false) register of D register pair based on negative condition code
( )              Subexpression delimiters
@mf              Expander operator
%                Mask generator
%%               Nonmultiple mask generator (EALU only)
%!               Modified mask generator (0xFFFFFFFF output for 0 input)
%%!              Nonmultiple shift right mask generator (EALU only)
\\               Rotate left
<<               Shift left (pseudo-op for rotate and mask)
>>u              Unsigned shift right
>> or >>s        Signed shift right
&                Bitwise AND
^                Bitwise XOR
|                Bitwise OR
+                Addition
–                Subtraction
=[cond]          Conditional assignment
=[cond.pro]      Conditional assignment with status protection
=                Equate
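Table 9 describes << as a pseudo-op built from rotate and mask. One way to see how such shifts can be composed, and why a mask generator that outputs 0xFFFFFFFF for a 0 input is useful, is the model below. It assumes % produces n ones in the least significant bits, with n taken from the low 5 bits of its input; this is an illustrative model consistent with the table's descriptions, not the PP's documented implementation, and the function names are hypothetical.

```c
#include <stdint.h>

/* \\ : rotate left by n (mod 32). */
static uint32_t pp_rotl(uint32_t x, unsigned n)
{
    n &= 31u;
    return n ? (x << n) | (x >> (32u - n)) : x;
}

/* % : ASSUMED to give n ones in the LSBs (0 for n == 0). */
static uint32_t mask_gen(unsigned n)
{
    n &= 31u;
    return n ? 0xFFFFFFFFu >> (32u - n) : 0u;
}

/* %! : same, but 0xFFFFFFFF for a 0 input. */
static uint32_t mask_gen_mod(unsigned n)
{
    n &= 31u;
    return n ? mask_gen(n) : 0xFFFFFFFFu;
}

/* x << n as rotate-and-mask: clear the n bits that wrapped around. */
static uint32_t pp_shl(uint32_t x, unsigned n)
{
    return pp_rotl(x, n) & ~mask_gen(n);
}

/* x >>u n as rotate left by 32 - n, keeping the low 32 - n bits.
 * For n == 0 the rotate amount is 0 mod 32, so %! keeps all 32 bits. */
static uint32_t pp_shr_u(uint32_t x, unsigned n)
{
    unsigned r = (32u - (n & 31u)) & 31u;
    return pp_rotl(x, r) & mask_gen_mod(r);
}
```

With a plain % mask, a zero-length unsigned shift right would produce 0 instead of x, which is the corner case the modified mask generator handles in this model.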
PP instruction set (continued)
Table 10. Summary of Data-Unit Operations
Operation:    Base set ALUs
Description:  Perform an ALU operation specifying the ALU function, two source operands and one destination operand, and operand routing. The ALU function is one of 256 three-input Boolean operations or one of 16 arithmetic operations combined with one of 16 function modifiers.
Syntax:       dst = [fmod] [ [[cond [.pro] ]] ] ALU_EXPRESSION
Examples:     d6 = (d6 ^ d4) & d2
              d3 = [nn.nv] d1 –1

Operation:    EALU || ROTATE
Description:  Perform an extended ALU (EALU) operation (specified in d0) with one of two data routings to the ALU, and optionally write the barrel rotator output to a second destination register. The ALU function is one of 256 Boolean or 256 arithmetic functions.
Syntax:       dst1 = [ [[cond [.pro] ]] ] ealu (src2, [dst2 = ] [