Please be aware that an important notice concerning availability, standard warranty, and use in critical applications of
Texas Instruments semiconductor products and disclaimers thereto appears at the end of this data sheet.
† IEEE Standard 1149.1–1990, IEEE Standard Test Access Port and Boundary-Scan Architecture
PRODUCTION DATA information is current as of publication date.
Products conform to specifications per the terms of Texas Instruments
standard warranty. Production processing does not necessarily include
testing of all parameters.
POST OFFICE BOX 1443 • HOUSTON, TEXAS 77251–1443
Copyright 1997, Texas Instruments Incorporated
TMS320C80
DIGITAL SIGNAL PROCESSOR
SPRS023B – JULY 1994 – REVISED OCTOBER 1997
description
The TMS320C80 is a single-chip, MIMD parallel processor capable of performing over two billion operations
per second. It consists of a 32-bit RISC master processor with a 120-MFLOP IEEE floating-point unit, four 32-bit
parallel processing digital signal processors (DSPs), a transfer controller with up to 480M-byte/s off-chip
transfer rate, and a video controller. All the processors are coupled tightly through an on-chip crossbar that
provides shared access to on-chip RAM. This performance and programmability make the ’C80 ideally suited
for video, imaging, and high-speed telecommunications applications.
Terminal Functions

TERMINAL NAME         TYPE†  DESCRIPTION

LOCAL MEMORY INTERFACE
A31–A0                O      Address bus. A31–A0 output the 32-bit byte address of the external memory cycle.
                             The address can be multiplexed for DRAM accesses.
AS2–AS0               I      Address-shift selection. AS2–AS0 determine how the column address appears on the
                             address bus. Eight shift values are supported, including zero.
BS1–BS0               I      Bus-size selection. BS1–BS0 indicate the bus size of the memory or other device
                             being accessed, allowing dynamic bus sizing for data buses less than 64 bits wide.
CT2–CT0               I      Cycle-timing selection. CT2–CT0 signals determine the timing of the current
                             memory access.
D63–D0                I/O    Data bus. D63–D0 transfer up to 64 bits of data per memory cycle into or out of
                             the ’C80.
DBEN                  O      Data-buffer enable. DBEN drives the active-low output-enables of bidirectional
                             transceivers that can be used to buffer input and output data on D63–D0.
DDIN                  O      Data-direction indicator. DDIN indicates the direction of the data that passes
                             through the transceivers. When DDIN is low, the transfer is from external memory
                             into the ’C80.
FAULT                 I      Fault. FAULT is driven low by external circuitry to inform the ’C80 that a fault
                             has occurred on the current memory row-access.
PS3–PS0               I      Page-size indication. PS3–PS0 indicate the page size of the memory device(s)
                             being accessed by the current cycle. The ’C80 uses this information to determine
                             when to begin a new row-access.
READY                 I      Ready. READY indicates that the external device is ready to complete the memory
                             cycle. READY is driven low by external circuitry to insert wait states into a
                             memory cycle.
RL                    O      Row latch. The high-to-low transition of RL can be used to latch the valid 32-bit
                             byte address that is present on A31–A0.
RETRY                 I      Retry. RETRY is driven low by external circuitry to indicate that the addressed
                             memory is busy. The ’C80 memory cycle is rescheduled.
STATUS5–STATUS0       O      Status code. At row time, STATUS5–STATUS0 indicate the type of cycle being
                             performed. At column time, they identify the processor and type of request that
                             initiated the cycle.
UTIME                 I      User-timing selection. UTIME causes the timing of RAS and CAS/DQM7–CAS/DQM0 to be
                             modified so that custom memory timings can be generated. During reset, UTIME
                             selects the endian mode in which the ’C80 operates.

DRAM, VRAM, AND SDRAM CONTROL
CAS/DQM7–CAS/DQM0     O      Column-address strobes. CAS/DQM7–CAS/DQM0 drive the CAS inputs of DRAMs and
                             VRAMs, or the DQM input of SDRAMs. The eight strobes provide byte-write access
                             to memory.
DSF                   O      Special function. DSF selects special VRAM functions such as block-write, load
                             color register, split-register transfer, and SGRAM block write.
RAS                   O      Row-address strobe. RAS drives the RAS inputs of DRAMs, VRAMs, and SDRAMs.
TRG/CAS               O      Transfer/output enable or column-address strobe. TRG/CAS is used as an
                             output-enable for DRAMs and VRAMs, and also as a transfer-enable for VRAMs.
                             TRG/CAS also drives the CAS inputs of SDRAMs.
W                     O      Write enable. W is driven low before CAS during write cycles. W controls the
                             direction of the transfer during VRAM transfer cycles.

† I = input, O = output, Z = high impedance
Terminal Functions (Continued)
TERMINAL NAME         TYPE†  DESCRIPTION

HOST INTERFACE
HACK                  O      Host acknowledge. The ’C80 drives HACK output low following an active HREQ to
                             indicate that it has driven the local-memory-bus signals to the high-impedance
                             state and is relinquishing the bus. HACK is driven high asynchronously following
                             HREQ being detected inactive, and then the ’C80 resumes driving the bus.
HREQ                  I      Host request. An external device drives HREQ low to request ownership of the
                             local-memory bus. When HREQ is high, the ’C80 owns and drives the bus. HREQ is
                             synchronized internally to the ’C80’s internal clock. Also, HREQ is used at reset
                             to determine the power-up state of the MP. If HREQ is low at the rising edge of
                             RESET, the MP comes up running. If HREQ is high, the MP remains halted until the
                             first interrupt occurrence on EINT3.
REQ1, REQ0            O      Internal cycle request. REQ1 and REQ0 provide a two-bit code indicating the
                             highest-priority memory-cycle request that is being received by the TC. External
                             logic can monitor REQ1 and REQ0 to determine if it is necessary to relinquish the
                             local-memory bus to the ’C80.

SYSTEM CONTROL
CLKIN                 I      Input clock. CLKIN generates the internal ’C80 clocks to which all processor
                             functions (except the frame timers) are synchronous.
CLKOUT                O      Local output clock. CLKOUT provides a way to synchronize external circuitry to
                             internal timings. All ’C80 output signals (except the VC signals) are synchronous
                             to this clock.
EINT1, EINT2, EINT3   I      Edge-triggered interrupts. EINT1, EINT2, and EINT3 allow external devices to
                             interrupt the master processor (MP) on one of three interrupt levels (EINT1 is
                             the highest priority). The interrupts are rising-edge triggered. EINT3 also
                             serves as an unhalt signal. If the MP is powered-up halted, the first rising edge
                             on EINT3 causes the MP to unhalt and fetch its reset vector (the EINT3
                             interrupt-pending bit is not set in this case).
LINT4                 I      Level-triggered interrupt. LINT4 provides an active-low level-triggered interrupt
                             to the MP. Its priority falls below that of the edge-triggered interrupts. Any
                             interrupt request should remain low until it is recognized by the ’C80.
RESET                 I      Reset. RESET is driven low to reset the ’C80 (all processors). During reset, all
                             internal registers are set to their initial state and all outputs are driven to
                             their inactive or high-impedance levels. During the rising edge of RESET, the MP
                             reset mode and the ’C80’s operating endian mode are determined by the levels of
                             HREQ and UTIME pins, respectively.
XPT2–XPT0             I      External packet transfer. XPT2–XPT0 are used by external devices to request a
                             high-priority XPT by the TC.

EMULATION CONTROL
EMU0, EMU1‡           I/O    Emulation pins. EMU0 and EMU1 are used to support emulation host interrupts,
                             special functions targeted at a single processor, and multiprocessor halt-event
                             communications.
TCK‡                  I      Test clock. TCK provides the clock for the ’C80 IEEE-1149.1 logic, allowing it to
                             be compatible with other IEEE-1149.1 devices, controllers, and test equipment
                             designed for different clock rates.
TDI‡                  I      Test data input. TDI provides input data for all IEEE-1149.1 instructions and
                             data scans of the ’C80.
TDO                   O      Test data output. TDO provides output data for all IEEE-1149.1 instructions and
                             data scans of the ’C80.
TMS‡                  I      Test-mode select. TMS controls the IEEE-1149.1 state machine.
TRST§                 I      Test reset. TRST resets the ’C80 IEEE-1149.1 module. When low, all boundary-scan
                             logic is disabled, allowing normal ’C80 operation.

† I = input, O = output, Z = high impedance
‡ This pin has an internal pullup and can be left unconnected during normal operation.
§ This pin has an internal pulldown and can be left unconnected during normal operation.
Terminal Functions (Continued)
TERMINAL NAME         TYPE†  DESCRIPTION

VIDEO INTERFACE
CAREA0, CAREA1        O      Composite area. CAREA0 and CAREA1 define a special area such as an overscan
                             boundary. This area represents the logical OR of the internal horizontal and
                             vertical area signals.
CBLNK0/VBLNK0,        O      Composite blanking / vertical blanking. Each of CBLNK0/VBLNK0 and CBLNK1/VBLNK1
CBLNK1/VBLNK1                provides one of two blanking functions, depending on the configuration of the
                             CSYNC/HBLNK pin:
                               Composite blanking disables pixel display/capture during both horizontal and
                               vertical retrace periods and is enabled when CSYNC is selected for
                               composite-sync video systems.
                               Vertical blanking disables pixel display/capture during vertical retrace
                               periods and is enabled when HBLNK is selected for separate-sync video systems.
                             Following reset, CBLNK0/VBLNK0 and CBLNK1/VBLNK1 are configured as CBLNK0 and
                             CBLNK1, respectively.
CSYNC0/HBLNK0,        I/O/Z  Composite sync / horizontal blanking. CSYNC0/HBLNK0 and CSYNC1/HBLNK1 can be
CSYNC1/HBLNK1                programmed for one of two functions:
                               Composite sync is for use on composite-sync video systems and can be
                               programmed as an input, output, or high-impedance signal. As an input, the
                               ’C80 extracts horizontal and vertical sync information from externally
                               generated active-low sync pulses. As an output, the active-low composite sync
                               pulses are generated from either external HSYNC and VSYNC signals or the
                               ’C80’s internal video timers. In the high-impedance state, the pin is neither
                               driven nor allowed to drive circuitry.
                               Horizontal blank disables pixel display/capture during horizontal retrace
                               periods in separate-sync video systems and can be used as an output only.
                             Immediately following reset, CSYNC0/HBLNK0 and CSYNC1/HBLNK1 are configured as
                             high-impedance CSYNC0 and CSYNC1, respectively.
FCLK0, FCLK1          I      Frame clock. FCLK0 and FCLK1 are derived from the external video system’s
                             dotclock and are used to drive the ’C80 video logic for frame timer 0 and frame
                             timer 1.
HSYNC0, HSYNC1        I/O/Z  Horizontal sync. HSYNC0 and HSYNC1 control the video system. They can be
                             programmed as input, output, or high-impedance signals. As an input, HSYNC
                             synchronizes the video timer to externally generated horizontal sync pulses. As
                             an output, HSYNC is an active-low horizontal sync pulse generated by the ’C80
                             on-chip frame timer. In the high-impedance state, the pin is not driven, and no
                             internal synchronization is allowed to occur. Immediately following reset,
                             HSYNC0 and HSYNC1 are in the high-impedance state.
SCLK0, SCLK1          I      Serial-data clock. SCLK0 and SCLK1 are used by the ’C80 SRT controller to track
                             the VRAM tap point when using midline reload. SCLK0 and SCLK1 should be the same
                             signals that clock the serial register on the VRAMs controlled by frame timer 0
                             and frame timer 1, respectively.
VSYNC0, VSYNC1        I/O/Z  Vertical sync. VSYNC0 and VSYNC1 control the video system. They can be
                             programmed as inputs, outputs, or high-impedance signals. As inputs, VSYNCx
                             synchronizes the frame timer to externally generated vertical-sync pulses. As
                             outputs, VSYNCx are active-low vertical-sync pulses generated by the ’C80
                             on-chip frame timer. In the high-impedance state, the pin is not driven and no
                             internal synchronization is allowed to occur. Immediately following reset,
                             VSYNCx is in the high-impedance state.

POWER
VSS‡                  I      Ground. Electrical ground inputs
VDD‡                  I      Power. Nominal 3.3-V power supply inputs

MISCELLANEOUS
No Connect                   No connect serves as an alignment key and must be left unconnected.
FF2–FF1                      FF2–FF1 (GF package only) are reserved for factory use and should be left
                             unconnected.

† I = input, O = output, Z = high impedance
‡ For proper operation, all VDD and VSS pins must be connected externally.
architecture
Figure 1 shows the major components of the ’C80: the master processor (MP), the parallel digital signal
processors (PPs), the transfer controller (TC), the video controller (VC), and the IEEE-1149.1 emulation
interface. Shared access to on-chip RAM is achieved through the crossbar. Crossbar connections are
marked in the figure. Each PP can perform three accesses per cycle through its local (L), global (G), and
instruction (I) ports. The MP can access two RAMs per cycle through its crossbar/data (C/D) and instruction
(I) ports, and the TC can access one RAM through its crossbar interface. Up to 15 simultaneous accesses are
supported in each cycle. Addresses can be changed every cycle, allowing the crossbar matrix to be changed
on a cycle-by-cycle basis. Contention between processors for the same RAM in the same cycle is resolved by
a round-robin priority scheme. In addition to the crossbar, a 32-bit datapath exists between the MP and the TC
and VC. This allows the MP to access TC and VC on-chip registers that are memory mapped into the MP
memory space.
The ’C80 has a 4G-byte address space as shown in Figure 2. The lower 32M bytes are used to address internal
RAM and memory-mapped registers.
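The figure of 15 simultaneous accesses follows from the per-processor port counts given above. The short Python sketch below only restates that arithmetic; the dictionary and function names are illustrative and do not come from the data sheet.

```python
# Crossbar ports per processor type, as stated in the architecture text:
# each PP has L, G, and I ports; the MP has C/D and I ports; the TC has one.
PORTS_PER_UNIT = {"PP": 3, "MP": 2, "TC": 1}
UNIT_COUNT = {"PP": 4, "MP": 1, "TC": 1}


def max_simultaneous_accesses():
    """Sum the crossbar ports of all processors on the chip."""
    return sum(PORTS_PER_UNIT[u] * UNIT_COUNT[u] for u in PORTS_PER_UNIT)


print(max_simultaneous_accesses())  # 4*3 + 2 + 1 = 15
```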
[Figure 1: block diagram. PP3–PP0 (each with L, G, and I ports), the MP (C/D and I ports), the TC, the VC, and
the IEEE-1149.1 (JTAG) interface connect over 32-bit and 64-bit datapaths through the crossbar to the on-chip
memories: Data RAM0–RAM2, parameter RAM, and instruction cache for each PP, plus the MP parameter RAM and caches.]
Figure 1. Block Diagram Showing Datapaths
ADDRESS RANGE             REGION                         SIZE
0x00000000–0x000007FF     PP0 Data RAM0                  2K bytes
0x00000800–0x00000FFF     PP0 Data RAM1                  2K bytes
0x00001000–0x000017FF     PP1 Data RAM0                  2K bytes
0x00001800–0x00001FFF     PP1 Data RAM1                  2K bytes
0x00002000–0x000027FF     PP2 Data RAM0                  2K bytes
0x00002800–0x00002FFF     PP2 Data RAM1                  2K bytes
0x00003000–0x000037FF     PP3 Data RAM0                  2K bytes
0x00003800–0x00003FFF     PP3 Data RAM1                  2K bytes
0x00004000–0x00007FFF     Reserved                       16K bytes
0x00008000–0x000087FF     PP0 Data RAM2                  2K bytes
0x00008800–0x00008FFF     Reserved                       2K bytes
0x00009000–0x000097FF     PP1 Data RAM2                  2K bytes
0x00009800–0x00009FFF     Reserved                       2K bytes
0x0000A000–0x0000A7FF     PP2 Data RAM2                  2K bytes
0x0000A800–0x0000AFFF     Reserved                       2K bytes
0x0000B000–0x0000B7FF     PP3 Data RAM2                  2K bytes
0x0000B800–0x00FFFFFF     Reserved                       16730112 bytes
0x01000000–0x010007FF     PP0 Parameter RAM              2K bytes
0x01000800–0x01000FFF     Reserved                       2K bytes
0x01001000–0x010017FF     PP1 Parameter RAM              2K bytes
0x01001800–0x01001FFF     Reserved                       2K bytes
0x01002000–0x010027FF     PP2 Parameter RAM              2K bytes
0x01002800–0x01002FFF     Reserved                       2K bytes
0x01003000–0x010037FF     PP3 Parameter RAM              2K bytes
0x01003800–0x0100FFFF     Reserved                       51200 bytes
0x01010000–0x010107FF     MP Parameter RAM               2K bytes
0x01010800–0x018017FF     Reserved                       8327168 bytes
0x01801800–0x01801FFF     PP0 Instruction Cache          2K bytes
0x01802000–0x018037FF     Reserved                       6K bytes
0x01803800–0x01803FFF     PP1 Instruction Cache          2K bytes
0x01804000–0x018057FF     Reserved                       6K bytes
0x01805800–0x01805FFF     PP2 Instruction Cache          2K bytes
0x01806000–0x018077FF     Reserved                       6K bytes
0x01807800–0x01807FFF     PP3 Instruction Cache          2K bytes
0x01808000–0x0180FFFF     Reserved                       32K bytes
0x01810000–0x01810FFF     MP Data Cache                  4K bytes
0x01811000–0x01817FFF     Reserved                       28K bytes
0x01818000–0x01818FFF     MP Instruction Cache           4K bytes
0x01819000–0x0181FFFF     Reserved                       28K bytes
0x01820000–0x018201FF     Memory-Mapped TC Registers     512 bytes
0x01820200–0x018203FF     Memory-Mapped VC Registers     512 bytes
0x01820400–0x01FFFFFF     Reserved                       8256512 bytes
0x02000000–0xFFFFFFFF     External Memory                4064M bytes
Figure 2. Memory Map
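As an illustration of how the memory map can be used, the Python sketch below looks up the region that contains a given byte address. It carries only a few rows of Figure 2, and the names `REGIONS`, `region_of`, and `region_size` are hypothetical helpers, not part of any TI tool or library.

```python
# A partial copy of Figure 2: (first address, last address, region name).
# Only a handful of rows are included to keep the sketch short.
REGIONS = [
    (0x00000000, 0x000007FF, "PP0 Data RAM0"),
    (0x00008000, 0x000087FF, "PP0 Data RAM2"),
    (0x01000000, 0x010007FF, "PP0 Parameter RAM"),
    (0x01010000, 0x010107FF, "MP Parameter RAM"),
    (0x01820000, 0x018201FF, "Memory-Mapped TC Registers"),
    (0x01820200, 0x018203FF, "Memory-Mapped VC Registers"),
    (0x02000000, 0xFFFFFFFF, "External Memory"),
]


def region_of(addr):
    """Return the region name for addr, or None if it is not in this partial table."""
    for first, last, name in REGIONS:
        if first <= addr <= last:
            return name
    return None


def region_size(first, last):
    """Size in bytes of an inclusive address range."""
    return last - first + 1


def is_on_chip(addr):
    """The lower 32M bytes address internal RAM and memory-mapped registers."""
    return 0 <= addr < 0x02000000


print(region_of(0x01010180))                            # MP Parameter RAM
print(region_size(0x02000000, 0xFFFFFFFF) // 2**20)     # 4064 (M bytes)
```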
master processor (MP) architecture
The master processor (MP) is a 32-bit RISC processor with an integral IEEE-754 floating-point unit. The MP
is designed for effective execution of C code and is capable of performing at well over 130K dhrystones/s. Major
tasks which the MP typically performs are:
• Task control and user interface
• Information processing and analysis
• IEEE-754 floating point (including graphics transforms)
MP functional block diagram
Figure 3 shows a block diagram of the master processor. Key features of the MP include:
• Floating-point operation and parallel load or store
• Multiply and accumulate
• High performance
  – 60 million instructions per second (MIPS)
  – 120 million floating-point operations per second (MFLOPS)
  – Over 130K dhrystones/s
[Figure 3: block diagram. Blocks shown: register file (thirty-one 32-bit registers), barrel rotator, mask
generator, zero comparator, integer ALU, leftmost/rightmost one, timer, control registers, instruction register,
program counters, PC incrementer, scoreboard, double-precision floating-point multiplier (single-precision
core), double-precision floating-point accumulators, double-precision floating-point adder, emulation logic,
instruction cache controller, data/cache controller, endian multiplexers, and crossbar interface.]
Figure 3. MP Block Diagram
MP general-purpose registers
The MP contains 31 32-bit general-purpose registers, R1–R31. Register R0 always reads as zero, and writes
to it are discarded. Double-precision values are always stored in an even-odd register pair, with the
higher-numbered register always holding the sign bit and exponent. The R0/R1 pair is not available for this
use. A scoreboard keeps track of which registers are awaiting loads or the result of a previous instruction and
stalls the instruction pipeline until the register contains valid data. As a recommended software convention,
R1 is used as a stack pointer and R31 as a return-address link register.
Figure 4 shows the MP general-purpose registers.
32-Bit Registers          64-Bit Register Pairs
R0 (Zero/Discard), R1     Not Available
R2, R3                    R2, R3
R4, R5                    R4, R5
• • •                     • • •
R30, R31                  R30, R31
Figure 4. MP General-Purpose Registers
The 32-bit registers can contain signed-integer, unsigned-integer, or single precision floating-point values.
Signed and unsigned bytes and halfwords are sign extended or zero-filled. Doublewords may be stored in a
64-bit even/odd register pair. Double-precision floating-point values are referenced using the even register
number or the register pair. Figure 5 through Figure 7 show the register data formats.
Single-Precision Floating Point
  Bits:    31     30–23         22–0
  Field:   S      E (8 bits)    M (23 bits)
Signed 32-Bit Integer
  Bits:    31     30–0
  Field:   S      I (31 bits)
Unsigned 32-Bit Integer
  Bits:    31–0
  Field:   U (32 bits)
Bit 31 is the most significant (MS) bit and bit 0 is the least significant (LS) bit.
Figure 5. MP Register 32-Bit Data Formats
Signed Byte
  Bits:    31–7               6–0
  Field:   S (sign extended)  I (7 bits)
Unsigned Byte
  Bits:    31–8               7–0
  Field:   0 (zero-filled)    U (8 bits)
Signed Halfword
  Bits:    31–15              14–0
  Field:   S (sign extended)  I (15 bits)
Unsigned Halfword
  Bits:    31–16              15–0
  Field:   0 (zero-filled)    U (16 bits)
Bit 31 is the most significant (MS) bit and bit 0 is the least significant (LS) bit.
Figure 6. MP Register 8-Bit and 16-Bit Data
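The sign-extension and zero-fill rules for bytes and halfwords can be sketched in a few lines of Python. This is an illustration of the register formats only, not a model of the MP load instructions; the function name `extend_to_register` is hypothetical.

```python
def extend_to_register(raw, size_bits, signed):
    """Place an 8- or 16-bit value in a 32-bit register image.

    Signed values are sign extended into the upper bits; unsigned values are zero-filled.
    """
    mask = (1 << size_bits) - 1
    value = raw & mask
    if signed and (value >> (size_bits - 1)) & 1:
        # Copy the sign bit into every bit above the value.
        value |= 0xFFFFFFFF & ~mask
    return value


print(hex(extend_to_register(0x80, 8, signed=True)))     # 0xffffff80
print(hex(extend_to_register(0x80, 8, signed=False)))    # 0x80
print(hex(extend_to_register(0x8000, 16, signed=True)))  # 0xffff8000
```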
Doubleword
  Odd register,  bits 31–0:   Most significant 32-bit word
  Even register, bits 31–0:   Least significant 32-bit word
Double-Precision Floating Point
  Odd register
    Bits:    31     30–20          19–0
    Field:   S      E (11 bits)    M (20 bits)
  Even register
    Bits:    31–0
    Field:   M (32 bits)
Figure 7. MP Register 64-Bit Data
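To make the register-pair layout concrete, the sketch below splits an IEEE-754 double into the two 32-bit words of Figure 7 using Python's standard `struct` module. The helper names are hypothetical; the only facts taken from the text are that the odd (higher-numbered) register holds the sign, exponent, and upper mantissa bits, and the even register holds the lower 32 mantissa bits.

```python
import struct


def split_double(value):
    """Return (even_word, odd_word) for a double-precision value."""
    bits = struct.unpack(">Q", struct.pack(">d", value))[0]
    odd_word = bits >> 32            # S, 11-bit exponent, upper 20 mantissa bits
    even_word = bits & 0xFFFFFFFF    # lower 32 mantissa bits
    return even_word, odd_word


def join_double(even_word, odd_word):
    """Rebuild the double from the two register images."""
    bits = (odd_word << 32) | even_word
    return struct.unpack(">d", struct.pack(">Q", bits))[0]


even, odd = split_double(1.0)
print(hex(even), hex(odd))  # 0x0 0x3ff00000
```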
MP double-precision floating-point accumulators
There are four double-precision floating-point registers (see Figure 8) to accumulate intermediate floating-point
results.
a0   Accumulator 0
a1   Accumulator 1
a2   Accumulator 2
a3   Accumulator 3
In addition to the general-purpose registers, there are a number of control registers that are used to represent
the state of the processor. Table 1 shows the control register numbers of the accessible registers.
Table 1. Control Register Numbers
NO.      NAME     DESCRIPTION                      NO.             NAME       DESCRIPTION
0x0000   EPC      Exception Program Counter        0x0015–0x001F   —          Reserved
0x0001   EIP      Exception Instruction Pointer    0x0020          SYSSTK     System Stack Pointer
0x0002   CONFIG   Configuration                    0x0021          SYSTMP     System Temporary Register
0x0003   —        Reserved                         0x0022–0x002F   —          Reserved
0x0004   INTPEN   Interrupt Pending                0x0030          MPC        Emulator Exception Program Cntr
0x0005   —        Reserved                         0x0031          MIP        Emulator Exception Instruction Ptr
0x0006   IE       Interrupt Enable                 0x0032          —          Reserved
0x0007   —        Reserved                         0x0033          ECOMCNTL   Emulator Communication Control
0x0008   FPST     Floating-Point Status            0x0034          ANASTAT    Emulation Analysis Status Reg
0x0013   FLTDTL   Faulting Data (low)              0x4002          OUTP       Vector Store Pointer
0x0014   FLTDTH   Faulting Data (high)
MP pipeline registers
The MP uses a three-stage fetch, execute, access (FEA) pipeline. The primary pipeline registers are
manipulated implicitly by branch and trap instructions and are not accessible by the user. The exception and
emulation pipeline registers are user accessible as control registers. All pipeline registers are 32 bits.
                        Program Execution Mode
                        Normal    Exception    Emulation
Program Counter         PC        EPC          MPC
Instruction Pointer     IP        EIP          MIP
Instruction Register    IR
• Instruction register (IR) contains the instruction being
executed.
• Instruction pointer (IP) points to the instruction being
executed.
• Program counter (PC) points to the instruction being
fetched.
• Exception/emulator instruction pointer (EIP/MIP) points to the
instruction that would have been executed had the exception /
emulation trap not occurred.
• Exception/emulator program counter (EPC/MPC) points to the
instruction to be fetched on returning from the exception/emulation
trap.
Figure 9. MP FEA Pipeline Registers
configuration (CONFIG) register (0x0002)
The CONFIG register controls or reflects the state of certain options as shown in Figure 10.
interrupt-enable (IE) register (0x0006)
The IE register contains enable bits for each of the interrupts/traps as shown in Figure 11. The
global-interrupt-enable (ie) bit and the appropriate individual interrupt-enable bit must be set in order for an
interrupt to occur.
pe  x4  x3  bp  pb  pc  mi  p3  p2  p1  p0  io  mf  x2  x1  ti  f1  f0  fx  fu  fo  fz  fi  ie

pe   PP error
x4   External interrupt 4 (LINT4)
x3   External interrupt 3 (EINT3)
bp   Bad packet transfer
pb   Packet transfer busy
pc   Packet transfer complete
mi   Message (MP self) interrupt
p3   PP3 message interrupt
p2   PP2 message interrupt
p1   PP1 message interrupt
p0   PP0 message interrupt
io   Integer overflow
mf   Memory fault
x2   External interrupt 2 (EINT2)
x1   External interrupt 1 (EINT1)
ti   MP timer interrupt
f1   Frame-timer 1 interrupt
f0   Frame-timer 0 interrupt
fx   Floating-point inexact
fu   Floating-point underflow
fo   Floating-point overflow
fz   Floating-point divide-by-zero
fi   Floating-point invalid
ie   Global-interrupt enable
Figure 11. IE Register
interrupt-pending (INTPEN) register (0x0004)
The bits in the INTPEN register show the current state of each interrupt/trap. Pending interrupts do not occur
unless the ie bit and the corresponding interrupt-enable bit are set. Software must write a 1 to the appropriate
INTPEN bit to clear an interrupt. Figure 12 shows the INTPEN register locations.
pe  x4  x3  bp  pb  pc  mi  p3  p2  p1  p0  io  mf  x2  x1  ti  f1  f0  fx  fu  fo  fz  fi
Figure 12. INTPEN Register
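The interaction between the IE and INTPEN registers can be sketched as a small behavioral model. The Python class below is an assumption-laden illustration: it tracks fields by name rather than by bit position (the bit positions are not reproduced here), and the class and method names are hypothetical.

```python
# Field names shared by the IE and INTPEN registers (ie itself is handled separately).
FIELDS = ("pe x4 x3 bp pb pc mi p3 p2 p1 p0 io mf "
          "x2 x1 ti f1 f0 fx fu fo fz fi").split()


class InterruptModel:
    """Name-based sketch of IE/INTPEN behavior; not a register-accurate model."""

    def __init__(self):
        self.global_enable = False   # the ie bit
        self.enabled = set()         # individual interrupt-enable bits that are set
        self.pending = set()         # INTPEN bits that are set

    def raise_interrupt(self, name):
        if name not in FIELDS:
            raise ValueError(name)
        self.pending.add(name)

    def deliverable(self):
        """Pending interrupts occur only if ie and the individual enable bit are set."""
        if not self.global_enable:
            return set()
        return self.pending & self.enabled

    def write_intpen(self, ones):
        """Writing a 1 to an INTPEN bit clears that pending interrupt."""
        self.pending -= set(ones)


m = InterruptModel()
m.raise_interrupt("ti")
m.enabled.add("ti")
print(m.deliverable())   # set(): ie is still clear
m.global_enable = True
print(m.deliverable())   # {'ti'}
m.write_intpen({"ti"})
print(m.deliverable())   # set()
```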
floating-point status register (FPST) (0x0008)
FPST contains status and control information for the FPU as shown in Figure 13. Bits 17–21 are read/write
floating-point unit (FPU) control bits. Bits 22–26 are read/write accumulated status bits. All other bits show the
status of the last FPU instruction to complete and are read only.
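The read/write and read-only bit ranges of FPST can be expressed as masks. The sketch below derives them from the bit numbers given above; treating a write to a read-only bit as ignored is an assumption made for illustration, and the names are hypothetical.

```python
def bit_range_mask(low, high):
    """Mask with bits low..high (inclusive) set."""
    return ((1 << (high - low + 1)) - 1) << low


FPST_CONTROL_MASK = bit_range_mask(17, 21)      # read/write FPU control bits
FPST_ACCUM_MASK = bit_range_mask(22, 26)        # read/write accumulated status bits
FPST_WRITABLE_MASK = FPST_CONTROL_MASK | FPST_ACCUM_MASK


def write_fpst(current, new_value):
    """Update only the writable bits; read-only bits keep their value (assumed)."""
    keep = current & ~FPST_WRITABLE_MASK & 0xFFFFFFFF
    return keep | (new_value & FPST_WRITABLE_MASK)


print(hex(FPST_WRITABLE_MASK))  # 0x7fe0000
```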
The bits in the PPERROR register reflect parallel processor errors (see Figure 14). The MP can use these when
a PP error interrupt occurs to determine the cause of the error.
PKTREQ controls the submission and priority of packet-transfer requests as shown in Figure 15. It also
indicates that a packet transfer is currently active.
The ILRU and DLRU registers track least-recently-used (LRU) information for the sixteen instruction-cache and
sixteen data-cache blocks. The ITAGxx registers contain block addresses and the present flags for each
sub-block. The DTAGxx registers are identical to the ITAGxx registers but include dirty bits for each sub-block.
Figure 17 shows the cache registers.
mru, nmru, nlru, and lru have the value 0, 1, 2, or 3 representing the block number and are mutually exclusive for each set.
lru   Least-recently-used block
P     Sub-block present
D     Sub-block dirty
Figure 17. Cache Registers
MP cache architecture
The MP contains two four-way set-associative, 4K caches for instructions and data. Each cache is divided into
four sets with four blocks in each set. Each block represents 256 bytes of contiguous instructions or data and
is aligned to a 256-byte address boundary. Each block is partitioned into four sub-blocks that each contain
sixteen 32-bit words and are aligned to 64-byte boundaries within the block. Cache misses cause one sub-block
to be loaded into cache. Figure 18 shows the cache architecture for one of the four sets in each cache. Figure 19
shows how addresses map into the cache using the cache tags and address bits.
T – Tag Address Bits
S – Set Select Bits (0–3)
A – Block Select (which tag matched) (0–3)
s – Sub-Block (within block) Select (0–3)
W – Word (within sub-block) Select (0–15)
B – Byte (within word) Select (0–3)

Address in On-Chip Cache Bank
  Bits:    11–10   9–8   7–6   5–2     1–0
  Field:   S S     A A   s s   W W W W B B
Figure 19. MP Cache Addressing
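The field layout of the on-chip cache bank address in Figure 19 can be decoded with shifts and masks. The Python sketch below covers only the 12-bit cache bank address shown in the figure (it does not model tag comparison), and the function name is hypothetical.

```python
def decode_cache_bank_address(addr):
    """Split a 12-bit cache bank address into the fields of Figure 19."""
    return {
        "set": (addr >> 10) & 0x3,        # S S, bits 11-10
        "block": (addr >> 8) & 0x3,       # A A, bits 9-8
        "sub_block": (addr >> 6) & 0x3,   # s s, bits 7-6
        "word": (addr >> 2) & 0xF,        # W W W W, bits 5-2
        "byte": addr & 0x3,               # B B, bits 1-0
    }


# Sizes implied by the cache description: 4 sets x 4 blocks x 256 bytes = 4K bytes,
# and each sub-block holds sixteen 32-bit words = 64 bytes.
CACHE_BYTES = 4 * 4 * 256
SUB_BLOCK_BYTES = 16 * 4

print(decode_cache_bank_address(0x9A7))
# {'set': 2, 'block': 1, 'sub_block': 2, 'word': 9, 'byte': 3}
```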
MP parameter RAM
The parameter RAM is a noncacheable, 2K-byte, on-chip RAM that contains MP-interrupt vectors,
MP-requested TC task buffers, and a general-purpose area. Figure 20 shows the parameter RAM address map.
ADDRESS RANGE             CONTENTS                               SIZE
0x01010000–0x0101007F     Suspended PT Parameters                128 bytes
0x01010080–0x010100DF     Reserved                               96 bytes
0x010100E0–0x010100FB     XPT Linked List Start Addresses        28 bytes
0x010100FC–0x010100FF     MP Linked List Start Address
0x01010100–0x0101017F     Off-Chip to Off-Chip PT Buffer         128 bytes
0x01010180–0x0101021F     Interrupt and Trap Vectors             160 bytes
0x01010220–0x0101029F     XPT Off-Chip to Off-Chip PT Buffer     128 bytes
0x010102A0–0x010107FF     General-Purpose RAM                    1376 bytes

XPT linked list start addresses:
0x010100E0     XPT7/SOF0 Linked List Start Address
0x010100E4     XPT6/SAM0 Linked List Start Address
0x010100E8     XPT5/SOF1 Linked List Start Address
0x010100EC     XPT4/SAM1 Linked List Start Address
0x010100F0     XPT3 Linked List Start Address
0x010100F4     XPT2 Linked List Start Address
0x010100F8     XPT1 Linked List Start Address
Figure 20. MP Parameter RAM
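The XPT linked-list start-address slots in Figure 20 are consecutive 32-bit words in descending XPT order, so a slot address can be computed rather than looked up. The Python sketch below shows that arithmetic; the constant and function names are hypothetical.

```python
XPT_SLOT_BASE = 0x010100E0   # slot for XPT7/SOF0, the first entry in Figure 20
XPT_SLOT_BYTES = 4           # each slot holds one 32-bit start address


def xpt_linked_list_slot(xpt_number):
    """Parameter-RAM address holding the linked-list start address for XPT1..XPT7."""
    if not 1 <= xpt_number <= 7:
        raise ValueError("XPT number must be in the range 1..7")
    return XPT_SLOT_BASE + XPT_SLOT_BYTES * (7 - xpt_number)


for n in range(7, 0, -1):
    print(f"XPT{n}: 0x{xpt_linked_list_slot(n):08X}")
```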
MP interrupt vectors
Table 2 and Table 3 show the MP interrupts and traps and their vector addresses.
The three basic classes of MP instruction opcodes are: short immediate, three register, and long immediate.
Figure 21 shows the opcode structure for each class of instruction.
Short Immediate
  Bits:    31–27   26–22      21–15    14–0
  Field:   Dest    Source 2   Opcode   15-Bit Immediate
Three Register
  Bits:    31–27   26–22      21–20   19–13    12   11–5      4–0
  Field:   Dest    Source 2   1 1     Opcode   0    Options   Source 1
Long Immediate
  Bits:    31–27   26–22      21–20   19–13    12   11–5      4–0
  Field:   Dest    Source 2   1 1     Opcode   1    Options   Source 1
  Followed by a 32-Bit Long Immediate
Figure 21. MP Opcode Formats
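The three formats in Figure 21 can be told apart from the instruction word itself. The Python sketch below assumes, based on the figure, that bits 21–20 equal to 1 1 mark the three-register and long-immediate formats and that bit 12 selects between them; it extracts fields only and does not interpret the opcode. The function name is hypothetical.

```python
def decode_mp_format(word):
    """Classify a 32-bit MP instruction word and extract its fields (sketch)."""
    fields = {
        "dest": (word >> 27) & 0x1F,      # bits 31-27
        "source2": (word >> 22) & 0x1F,   # bits 26-22
    }
    if (word >> 20) & 0x3 == 0x3:
        # Bits 21-20 are 1 1: three-register or long-immediate format.
        long_imm = (word >> 12) & 0x1
        fields["format"] = "long immediate" if long_imm else "three register"
        fields["opcode"] = (word >> 13) & 0x7F    # bits 19-13
        fields["options"] = (word >> 5) & 0x7F    # bits 11-5
        fields["source1"] = word & 0x1F           # bits 4-0
    else:
        fields["format"] = "short immediate"
        fields["opcode"] = (word >> 15) & 0x7F    # bits 21-15
        fields["immediate"] = word & 0x7FFF       # bits 14-0
    return fields


# Example: a three-register word with Dest = 3, Source 2 = 5, opcode 1111000.
example = (3 << 27) | (5 << 22) | (0x3 << 20) | (0x78 << 13)
print(decode_mp_format(example)["format"])  # three register
```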
MP opcode summary
Table 4 through Table 6 show the opcode formats for the MP. Table 7 summarizes the master processor
instruction set.
–    Reserved bit (code as 0)                         M    Modify, write modified address back to register
A    Annul delay slot instruction if branch taken     n    Rotate sense for shifting
E    Emulation trap bit                               SZ   Size (0 = byte, 1 = halfword, 2 = word, 3 = doubleword)
F    Clear present flags                              U    Unsigned form
i    Invert endmask
Dest   Source   00000 00   Unsigned Immediate
Table 5. Long-Immediate and Three-Register Opcodes
–             Reserved bit (code as 0)                       l    Long immediate
D             Direct external access bit                     M    Modify, write modified address back to register
E             Emulation trap bit                             n    Rotate sense for shifting
F             Clear present flags                            S    Scale offset by data size
i             Invert endmask                                 SZ   Size (0 = byte, 1 = halfword, 2 = word, 3 = doubleword)
Mem Src/Dst   Vector store or load source/dst register       Z    Use 0 rather than accumulator

fdiv   Dest   Source2   1 1   1110011   l   – PD P2 P1   Source1
lmo    Dest   Source    1 1   1111000   –   –––––––      –––––
rmo    Dest   Source    1 1   1111001   –   –––––––      –––––
–      Reserved bit (code as 0)                              P    Dest precision for parallel load/store (0 = single, 1 = double)
a      Floating-point accumulator select                     P1   Precision of source1 operand
C      Constant operands rather than register                P2   Precision of source2 operand
d      Destination precision for vector (0 = sp, 1 = dp)     PD   Precision of destination result
l      Long immediate 32-bit data                            RM   Rounding Mode (0 = N, 1 = Z, 2 = P, 3 = M)
m      Parallel memory operation specifier                   s    Scale offset by data size
Dest   Destination register

Source2/Dest   11 110– 000   l   – m P – d m s   Source1
Table 7. Summary of MP Opcodes
INSTRUCTION   DESCRIPTION                        INSTRUCTION   DESCRIPTION
add           Signed integer add                 or.ff         Bitwise OR with 1s complement
and.tt        Bitwise AND                        or.ft         Bitwise OR with 1s complement
and.ff        Bitwise AND with 1s complement     or.tf         Bitwise OR with 1s complement
and.ft        Bitwise AND with 1s complement     rdcr          Read control register
and.tf        Bitwise AND with 1s complement     rmo           Rightmost one
bbo           Branch bit one                     shift.dz      Shift, disable mask, zero extend
bbz           Branch bit zero                    shift.dm      Shift, disable mask, merge
illop         Illegal operation                  vrnd(FP)      Vector round with floating-point input
jsr           Jump and save return               vrnd(Int)     Vector round with integer input
ld            Load signed into register          vsub          Vector floating-point subtract
ld.u          Load unsigned into register        xnor          Bitwise exclusive NOR
lmo           Leftmost one                       xor           Bitwise exclusive OR
or.tt         Bitwise OR
PP architecture
The parallel processor (PP) is a 32-bit integer DSP optimized for imaging and graphics applications. Each PP
can execute in parallel a multiply, an ALU operation, and two memory accesses within a single instruction. This
internal parallelism allows a single PP to achieve over 500 million operations per second for certain algorithms.
The PP has a three-input ALU that supports all 256 three-input Boolean combinations and many combinations
of arithmetic and Boolean functions. Data-merging and bit-to-byte, bit-to-word, and bit-to-halfword translations
are supported by hardware in the input data path to the ALU. Typical tasks performed by a PP include:
Figure 22 shows a block diagram of a parallel processor. Key features of the PP include:
• 64-bit instruction word (supports multiple parallel operations)
• Three-stage pipeline for fast instruction cycle
• Numerous registers
  – 8 data, 10 address, 6 index registers
  – 20 other user-visible registers
• Data unit
  – 16x16 integer multiplier (optional dual 8x8)
  – Splittable 3-input ALU
  – 32-bit barrel rotator
  – Mask generator
  – Multiple-status flag expander for translations to/from 1-bit-per-pixel space
  – Conditional assignment of data unit results
  – Leftmost one / rightmost one
  – Leftmost bit change / rightmost bit change
• Memory addressing
  – Two address units (global and local) provide up to two 32-bit accesses in parallel with data unit operation
  – 12 addressing modes (immediate and indexed)
  – Byte, halfword, and word addressability
  – Scaled indexed addressing
  – Conditional assignment for loads
  – Conditional source selection for stores
• Program flow
  – Three hardware loop controllers
      Zero-overhead looping / branching
      Nested loops
      Multiple loop endpoints
  – Instruction cache management
  – PC mapped to register file
  – Interrupts for messages and context switching
IAP  Instruction address port
LAP  Local address port
GAP  Global address port
PP registers
The PP contains many general-purpose registers, status registers, and configuration registers. All PP registers
are 32-bit registers. Figure 23 shows the accessible registers of the PP blocks.
The data unit contains eight 32-bit general-purpose data registers (d0–d7) referred to as the D registers. The
d0 register also acts as the control register for EALU operations.
d0 register
Figure 24 shows the format when d0 is used as the EALU control register.
The mf register records status information from each split ALU segment for multiple arithmetic operations. The
mf register may be expanded to generate a mask for the ALU. Figure 25 shows the mf register format.
[Register format diagram not reproduced. Recoverable field labels: expander data size, split ALU data size. Recoverable encodings: 01 – set by sign, 11 – reserved.]
Figure 26. sr Format
PP address-unit registers
address registers
The address unit contains ten 32-bit address registers that hold the base address for address
computations or that can be used for general-purpose data. Registers a0–a4 are used for local-address
computations and registers a8–a12 are used for global-address computations.
index registers
The six 32-bit index registers contain index values for use with the address registers in address computations
or they can be used for general-purpose data. Registers x0–x2 are used by the local-address unit and registers
x8–x10 are used by the global-address unit.
stack pointer (sp)
The sp contains the address of the bottom of the PP’s system stack. The stack pointer is addressed as a6 by
the local-address unit and as a14 by the global-address unit. Figure 27 shows the sp register format.
The zero registers are read-as-zero address registers for the local-address unit (a7) and global-address unit
(a15). Writes to these registers are ignored, so they can be specified as destinations when operational results are to be discarded.
Figure 28 shows the zero register format.
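The register naming described above can be summarized in a small model. This is only a sketch: the class and method names are hypothetical, and the model captures nothing beyond the aliasing stated in the text (sp is visible as a6 and a14; a7 and a15 read as zero and ignore writes).

```python
# Hypothetical model of PP address-register naming; not TI code.
MASK32 = 0xFFFFFFFF


class AddressRegisters:
    LOCAL = {f"a{i}" for i in range(0, 5)}     # a0-a4, local-address unit
    GLOBAL = {f"a{i}" for i in range(8, 13)}   # a8-a12, global-address unit
    SP_ALIASES = {"a6", "a14"}                 # both names refer to sp
    ZERO = {"a7", "a15"}                       # read-as-zero registers

    def __init__(self):
        self.regs = {name: 0 for name in self.LOCAL | self.GLOBAL}
        self.sp = 0

    def read(self, name):
        if name in self.ZERO:
            return 0
        if name in self.SP_ALIASES:
            return self.sp
        return self.regs[name]

    def write(self, name, value):
        if name in self.ZERO:
            return                      # write ignored: the result is discarded
        if name in self.SP_ALIASES:
            self.sp = value & MASK32
        else:
            self.regs[name] = value & MASK32
```

A value written through a6 is read back through a14, and a write to a7 leaves it reading as zero.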
The loop registers control three levels of zero-overhead loops. The 32-bit loop-start registers (ls0–ls2) and
loop-end registers (le0–le2) contain the starting and ending addresses for the loops. The loop-counter registers
(lc0–lc2) contain the number of repetitions remaining in their associated loops. The lr0–lr2 registers are loop-
reload registers used to support nested loops. The format for the loop-control (lctl) register is shown in Figure 29.
There are also six special write-only mappings of the loop-reload registers. The lrs0–lrs2 codes are used for
fast initialization of the lsn, lrn, and lcn registers for multi-instruction loops, while the lrse0–lrse2 codes are used
for fast initialization of single-instruction loops.
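The interaction of the ls, le, lc, and lr registers can be illustrated with a minimal sketch of one loop controller. The exact hardware rules are not given in this section, so the behavior below is an assumption: when the instruction at le has been fetched and lc is nonzero, control returns to ls and lc is decremented; when lc is zero, lc is reloaded from lr and execution falls through. The names `LoopController` and `trace` are hypothetical.

```python
# Hypothetical single loop controller; the end-of-loop rule is an assumption.
INSTRUCTION_BYTES = 8   # one 64-bit instruction


class LoopController:
    def __init__(self, ls, le, count):
        self.ls = ls        # loop start address
        self.le = le        # loop end address
        self.lr = count     # reload value
        self.lc = count     # repetitions remaining

    def next_pc(self, pc):
        """Return the address of the instruction that follows pc."""
        if pc == self.le:
            if self.lc != 0:
                self.lc -= 1
                return self.ls          # loop back without a branch instruction
            self.lc = self.lr           # re-arm, e.g. for an enclosing loop
        return pc + INSTRUCTION_BYTES


def trace(loop, pc, steps):
    """Collect the addresses visited over a number of instruction steps."""
    visited = []
    for _ in range(steps):
        visited.append(pc)
        pc = loop.next_pc(pc)
    return visited
```

Under this assumption, a two-instruction loop body at 0x10–0x18 with a count of 2 runs three times (the first pass plus two repetitions) before execution continues at 0x20.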
The PFC unit contains a pointer to each stage of the PP pipeline. The pc contains the program counter which
points to the instruction being fetched. The ipa points to the instruction in the address stage of the pipeline and
the ipe points to the instruction in the execute stage of the pipeline. The instruction pointer
return-from-subroutine (iprs) register contains the return address for a subroutine call. Figure 30 shows the
variable pipeline register format.
The interrupt-enable (inten) register allows individual interrupts to be enabled and configures the interrupt flag
(intflg) register operation. The intflg register contains the interrupt flag bits. Interrupt priority increases moving
from left to right on intflg. Figure 31 shows the PP-interrupt register format.
Each PP has its own 2K-byte instruction cache. Each cache is divided into four blocks and each block is divided
into four sub-blocks containing 16 64-bit instructions each. Cache misses cause one sub-block to be loaded
into cache. Figure 34 shows the cache architecture for one of the four sets in each cache. Figure 35 shows how
addresses map into the cache using the cache tags and address bits.
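The cache geometry stated above can be checked with a short calculation. The sketch below also splits an instruction address into fields; the field layout is an assumption (contiguous fields counted from the least significant bit, with the tag identifying a block-sized region), since Figure 35 is not reproduced here. The function name `split_address` is hypothetical.

```python
INSTRUCTION_BYTES = 8               # 64-bit instruction word
INSTRUCTIONS_PER_SUBBLOCK = 16
SUBBLOCKS_PER_BLOCK = 4
BLOCKS = 4

SUBBLOCK_BYTES = INSTRUCTION_BYTES * INSTRUCTIONS_PER_SUBBLOCK   # 128
BLOCK_BYTES = SUBBLOCK_BYTES * SUBBLOCKS_PER_BLOCK               # 512
CACHE_BYTES = BLOCK_BYTES * BLOCKS                               # 2048 = 2K bytes


def split_address(addr):
    """Split an instruction address, assuming contiguous fields from the LSB."""
    return {
        "tag": addr // BLOCK_BYTES,
        "sub_block": (addr % BLOCK_BYTES) // SUBBLOCK_BYTES,
        "instruction": (addr % SUBBLOCK_BYTES) // INSTRUCTION_BYTES,
    }
```

With these sizes a miss loads one 128-byte sub-block, and the four blocks together account for the full 2K bytes.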
The parameter RAM is a 2K-byte on-chip RAM that contains PP-interrupt vectors, PP-requested TC task
buffers, and a general-purpose area. The parameter RAM does not use the cache memory. Figure 36 shows
the parameter RAM address map.
ADDRESS                                     CONTENTS
0x0100#000–0x0100#07F                       Suspended PT Parameters (128 Bytes)
0x0100#080–0x0100#0DF                       Reserved (96 Bytes)
0x0100#0E0–0x0100#0F7                       Restricted for Operating System Use (24 Bytes)
0x0100#0F8–0x0100#0FB                       Cache Fault Address
0x0100#0FC–0x0100#0FF                       PP-Linked List Start Address
0x0100#100–0x0100#17F                       Off-Chip to Off-Chip PT Buffer (128 Bytes)
0x0100#180–0x0100#1FF                       Interrupt Vectors (128 Bytes)
0x0100#200 up to the                        General-Purpose RAM (1524 Bytes Less Stack Size)
  application-dependent boundary
Application-dependent boundary up to        Stack
  0x0100#7F0 (stack pointer after reset)
0x0100#7F4–0x0100#7FF                       Stack State Information After Reset (12 Bytes)
# – PP Number
Figure 36. PP Parameter RAM
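The "#" digit in the Figure 36 addresses selects the PP, so a parameter RAM address can be formed from the PP number and an offset. The sketch below is illustrative only; the names `parameter_ram_address` and `REGIONS` are hypothetical, and only the fixed regions listed in the figure are included.

```python
PARAMETER_RAM_BASE = 0x01000000   # 0x0100#000 with # = 0
PP_STRIDE = 0x1000                # the PP number occupies one hex digit

# (first offset, last offset) for the fixed regions listed in Figure 36
REGIONS = {
    "suspended_pt_parameters": (0x000, 0x07F),
    "reserved": (0x080, 0x0DF),
    "operating_system": (0x0E0, 0x0F7),
    "cache_fault_address": (0x0F8, 0x0FB),
    "linked_list_start_address": (0x0FC, 0x0FF),
    "off_chip_pt_buffer": (0x100, 0x17F),
    "interrupt_vectors": (0x180, 0x1FF),
    "stack_state_after_reset": (0x7F4, 0x7FF),
}


def parameter_ram_address(pp, offset):
    """Return the address of a parameter RAM byte for PP number pp (0-3)."""
    if not 0 <= pp <= 3:
        raise ValueError("the 'C80 has four PPs, numbered 0 to 3")
    if not 0 <= offset <= 0x7FF:
        raise ValueError("offset lies outside the 2K-byte parameter RAM")
    return PARAMETER_RAM_BASE + pp * PP_STRIDE + offset


def region_size(name):
    first, last = REGIONS[name]
    return last - first + 1
```

For example, the interrupt vectors of PP2 start at `parameter_ram_address(2, 0x180)`, which is 0x01002180.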
PP interrupt vectors
The PP interrupts and their vector addresses are shown in Table 8.
INTERRUPT
Task Interrupt
Packet Transfer Queued
Packet-Transfer Error
Packet-Transfer End
MP Message
PP0 Message
PP1 Message
PP2 Message
PP3 Message
PP data-unit architecture
The data unit has independent data paths for the ALU and the multiplier, each with its own set of hardware
functions. The multiplier data path includes a 16 × 16 multiplier, a halfword swapper, and rounding hardware.
The ALU data path includes a 32-bit three-input ALU, a barrel rotator, mask generator, mf expander,
left/rightmost one and left/rightmost bit-change logic, and several multiplexers. Figure 37 shows the data-unit
block diagram.
[Block diagram not reproduced. Labeled blocks: multiplier (splittable) with scale, round, and swap/merge; rotate amount multiplexer; rotator input multiplexer; barrel rotator; mask generator and mask generator multiplexer; expander; C port multiplexer; function code logic; three-input ALU (splittable) with inputs A, B, and C; LMO, RMO, LMBC, RMBC logic; status outputs N, C, V, Z, LV; mf; ALU sign bit.]
Legend:
src1      Any register; D reg only for l/rmo, l/rmbc hardware
src2      D reg or sometimes 15/32-bit immediate
src3      D reg only
src4      D reg only
dst/dst1  Any register
dst2      D reg only
dstc      D reg only (destination companion reg source)
d0        5 LSBs of d0
0x1       Constant
0         Constant
Figure 37. Data Unit Block Diagram
The PP’s ALU can operate as one 32-bit ALU or be split into two 16-bit ALUs or four 8-bit ALUs. Figure 38 shows the multiple
arithmetic data flow for the case of a four 8-bit split of the ALU (called multiple-byte arithmetic). When split, the ALU
operates as independent parallel ALUs, each of which receives the same function code.
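Multiple-byte arithmetic can be illustrated with a minimal sketch of a four-way split add. Several details are assumptions made for illustration: the status recorded per segment is the carry-out, segment 0 is the least significant byte and maps to bit 0 of the status field, and the expander replicates each status bit across its byte. The function names are hypothetical.

```python
def multiple_byte_add(a, b):
    """Add four 8-bit segments independently; return (result, carry_bits)."""
    result = 0
    carries = 0
    for lane in range(4):
        shift = 8 * lane
        total = ((a >> shift) & 0xFF) + ((b >> shift) & 0xFF)
        result |= (total & 0xFF) << shift      # no carry into the next segment
        carries |= (total >> 8) << lane        # one status bit per segment
    return result, carries


def expand_to_mask(bits):
    """Replicate each of four status bits across its 8-bit segment."""
    mask = 0
    for lane in range(4):
        if (bits >> lane) & 1:
            mask |= 0xFF << (8 * lane)
    return mask
```

One use of the expanded mask is saturation: OR-ing `expand_to_mask(carries)` into the result forces every segment that overflowed to 0xFF, which gives an unsigned saturating byte add.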
[Data-flow diagram not reproduced. Recoverable labels: 32-bit A, B, and C inputs, each split into four 8-bit segments; C-IN logic and C-Out for each segment; one status output per segment (C, Z, S, or E); 4-bit mf register field with rotate and clear; expander (replicate); sr(C).]
Figure 38. Multiple-Byte Arithmetic Data Flow
PP multiplier
The PP’s hardware multiplier can perform one 16x16 multiply with a 32-bit result or two 8x8 multiplies with two
16-bit results in a single cycle. A 16x16 multiply can use signed or unsigned operands as shown in Figure 39.
When performing two simultaneous 8x8 split multiplies, the first input word contains unsigned byte operands
and the second input word contains signed or unsigned byte operands. These formats are shown in Figure 40
and Figure 41.
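The two multiplier modes can be sketched as follows. Figures 39 through 41 are not reproduced in this section, so the operand placement is an assumption: the 16x16 multiply uses the low halfword of each input, the split multiply uses the two low bytes of each input, and the two 16-bit products are packed into one 32-bit word. The function names are hypothetical.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def multiply_16x16(a, b, signed=True):
    """One 16x16 multiply of the low halfwords with a 32-bit result."""
    x, y = a & 0xFFFF, b & 0xFFFF
    if signed:
        x, y = to_signed(x, 16), to_signed(y, 16)
    return (x * y) & 0xFFFFFFFF


def multiply_dual_8x8(a, b, b_signed=False):
    """Two 8x8 multiplies: a holds unsigned bytes, b signed or unsigned bytes."""
    packed = 0
    for lane in range(2):
        x = (a >> (8 * lane)) & 0xFF
        y = (b >> (8 * lane)) & 0xFF
        if b_signed:
            y = to_signed(y, 8)
        packed |= ((x * y) & 0xFFFF) << (16 * lane)   # one 16-bit product per lane
    return packed
```

The `b_signed` flag mirrors the statement above that only the second input word may hold signed byte operands.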
The PP has a three-stage fetch, address, execute (FAE) pipeline as shown in Figure 42. The pc, ipa, and ipe
registers point to the address of the instruction in each stage of the pipeline. On each cycle in which the pipeline
advances, ipa is copied into ipe, pc is copied into ipa, and the pc is incremented by one instruction (8 bytes).
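The pointer movement described above can be written as a single update step. This is a sketch of the stated rule only; stalls, branches, and loop-controller redirection are not modeled, and the function name `advance` is hypothetical.

```python
INSTRUCTION_BYTES = 8   # one 64-bit instruction


def advance(state):
    """Advance the FAE pipeline pointers by one cycle (no stalls or branches)."""
    return {
        "ipe": state["ipa"],                                    # address -> execute
        "ipa": state["pc"],                                     # fetch -> address
        "pc": (state["pc"] + INSTRUCTION_BYTES) & 0xFFFFFFFF,   # next fetch
    }
```

Starting from pc = 0x110, ipa = 0x108, ipe = 0x100, one step gives pc = 0x118, ipa = 0x110, ipe = 0x108.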
The program-flow-control (pfc) unit performs instruction fetching and decoding, loop control, and handshaking
with the transfer controller. The pfc unit architecture is shown in Figure 43.
                    T1       T2        T3        T4        T5
Instruction One     Fetch    Address   Execute
Instruction Two              Fetch     Address   Execute
Instruction Three                      Fetch     Address   Execute
Pointers: pc (fetch), ipa (address), ipe (execute)
Figure 42. FAE-Instruction Pipeline
PP program-flow-control unit architecture (continued)
[Block diagram not reproduced. Labeled blocks: pc with incrementer, ipa, ipe, and iprs; cache controller with tag registers, present bits, LRU stack, and tag comparators; loop controllers 0, 1, and 2 (loop controller 0 shown with ls0, le0, comparator, lr0, lc0, decrementer, and zero detect); lctl and loop control; instruction decode; FAE pipeline control; control signal generation.]
Figure 43. Program-Flow-Control Unit Block Diagram
PP address-unit architecture
The PP has a local-address unit and a global-address unit, which operate independently of each other. The address units
support twelve different addressing modes. In place of performing a memory access, either or both of the
address units can perform an address computation whose result is written directly to a PP register.
This address-unit arithmetic provides additional arithmetic operations to supplement the
data unit during compute-intensive algorithms. Figure 44 shows the address-unit architecture.
[Block diagram not reproduced. The local-address unit and the global-address unit have the same structure: a PP-relative multiplexer (pba, dba), an offset input, an index multiplexer, an index scaler controlled by scale and data size, a preindex/postindex multiplexer, and a 32-bit adder/subtracter unit, with connections to the source bus and the global destination bus.]
Local-address unit:   a0–a4 (a7 = 0), x0–x2, sp = a6 (local), output to the local-address port
Global-address unit:  a8–a12 (a15 = 0), x8–x10, sp = a14 (global), output to the global-address port
Figure 44. Address-Unit Architecture
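The index scaler, adder/subtracter, and preindex/postindex selection named in Figure 44 suggest the following minimal sketch of one address computation. The parameter names and the exact behavior are assumptions for illustration; the twelve addressing modes themselves are not listed in this section, and whether the adder output is written back to the base register is not modeled.

```python
MASK32 = 0xFFFFFFFF
SCALE_SHIFT = {"byte": 0, "halfword": 1, "word": 2}   # log2 of the data size


def address_unit(base, index, size="word", scaled=True,
                 subtract=False, preindex=True):
    """Return (access_address, adder_output) for one address computation."""
    offset = index << SCALE_SHIFT[size] if scaled else index
    computed = (base - offset if subtract else base + offset) & MASK32
    # Preindex: the access uses the adder output.
    # Postindex: the access uses the unmodified base address.
    access = computed if preindex else base
    return access, computed
```

For address-unit arithmetic, the adder output would be written to a PP register instead of driving a memory access.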
PP instruction set
PP instructions are represented by algebraic expressions for the operations performed in parallel by the
multiplier, ALU, global-address unit, and local-address unit. The expressions use the || symbol to indicate
operations that are to be performed in parallel. The PP ALU operator syntax is shown in Table 9. The data unit
operations (multiplier and ALU) are summarized in Table 10 and the parallel transfers (global and local) are
summarized in Table 11.
Table 9. PP Operators by Precedence
OPERATOR          FUNCTION
src1 [n] src1–1   Select odd (n=true) or even (n=false) register of D register pair
                  based on negative condition code
( )               Subexpression delimiters
@mf               Expander operator
%                 Mask generator
%%                Nonmultiple mask generator (EALU only)
%!                Modified mask generator (0xFFFFFFFF output for 0 input)
%%!               Nonmultiple shift right mask generator (EALU only)
\\                Rotate left
<<                Shift left (pseudo-op for rotate and mask)
>>u               Unsigned shift right
>> or >>s         Signed shift right
&                 Bitwise AND
^                 Bitwise XOR
|                 Bitwise OR
+                 Addition
–                 Subtraction
=[cond]           Conditional assignment
=[cond.pro]       Conditional assignment with status protection
=                 Equate
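The table describes shift left as a pseudo-op for rotate and mask. A minimal sketch of that idea follows. The mask-generator behavior is an assumption (n right-aligned ones for an input n, with 0 giving an all-zero mask), and only the 0xFFFFFFFF-for-0 behavior of the modified mask generator comes from the table. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def mask(n):
    """Assumed model of '%': n right-aligned ones (0 gives an all-zero mask)."""
    return (1 << (n & 31)) - 1


def modified_mask(n):
    """Model of '%!': same as mask(), except 0 gives 0xFFFFFFFF (per Table 9)."""
    return MASK32 if (n & 31) == 0 else mask(n)


def rotate_left(x, n):
    """Model of the rotate-left operator on a 32-bit value."""
    n &= 31
    return ((x << n) | (x >> (32 - n))) & MASK32


def shift_left(x, n):
    """Shift left built from a rotate followed by a mask."""
    return rotate_left(x, n) & ~mask(n) & MASK32
```

Clearing the n low bits after the rotate removes the bits that wrapped around, which leaves the same value as a plain 32-bit shift left.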
Table 10. Summary of Data-Unit Operations
Operation    Base set ALUs
Description  Perform an ALU operation specifying ALU function, 2 src and 1 dest operand, and operand routing. ALU function is one of
Operation    EALU || ROTATE
Description  Perform an extended ALU (EALU) operation (specified in d0) with one of two data routings to the ALU and optionally write