The following changes have been made to this book.
Change History
Date               Issue  Confidentiality                     Change
15 May 2006        A      Confidential                        First release for r0p1
22 October 2007    B      Non-Confidential                    First release for r1p2
16 June 2008       C      Non-Confidential Restricted Access  First release for r1p3
11 September 2009  D      Non-Confidential                    Second release for r1p3
20 November 2009   E      Non-Confidential                    Documentation update for r1p3
Proprietary Notice
Words and logos marked with ® or ™ are registered trademarks or trademarks of ARM Limited in the EU and other
countries, except as otherwise stated below in this proprietary notice. Other brands and names mentioned herein may
be the trademarks of their respective owners.
Neither the whole nor any part of the information contained in, or the product described in, this document may be
adapted or reproduced in any material form except with the prior written permission of the copyright holder.
The product described in this document is subject to continuous developments and improvements. All particulars of the
product and its use contained in this document are given by ARM in good faith. However, all warranties implied or
expressed, including but not limited to implied warranties of merchantability, or fitness for purpose, are excluded.
This document is intended only to assist the reader in the use of the product. ARM shall not be liable for any loss or
damage arising from the use of any information in this document, or any error or omission in such information, or any
incorrect use of the product.
Some material in this document is based on ANSI/IEEE Std 754-1985, IEEE Standard for Binary Floating-Point Arithmetic. The IEEE disclaims any responsibility or liability resulting from the placement and use in the described
manner.
Where the term ARM is used it means “ARM or any of its subsidiaries as appropriate”.
Confidentiality Status
This document is Non-Confidential. The right to use, copy and disclose this document may be subject to license
restrictions in accordance with the terms of the agreement entered into by ARM and the party that ARM delivered this
document to.
Unrestricted Access is an ARM internal classification.
Product Status
The information in this document is final, that is for a developed product.
Figure 1-2 Processor Fetch and Decode pipeline stages, page 1-17
Figure 1-3 Cortex-R4 Issue and Execution pipeline stages, page 1-17
Figure 1-4 Cortex-R4F Issue and Execution pipeline stages, page 1-18
Figure 2-1 Byte-invariant big-endian (BE-8) format, page 2-6
Figure 2-2 Little-endian format, page 2-6
Figure 11-3 Debug ROM Address Register format, page 11-12
Figure 11-4 Debug Self Address Offset Register format, page 11-13
Figure 11-5 Debug Status and Control Register format, page 11-14
Figure 11-6 Watchpoint Fault Address Register format, page 11-19
Figure 11-7 Vector Catch Register format, page 11-20
Figure 11-8 Debug State Cache Control Register format, page 11-21
Figure 11-9 Debug Run Control Register format, page 11-22
Figure 11-10 Breakpoint Control Registers format, page 11-23
Figure 11-11 Watchpoint Control Registers format, page 11-27
Figure 11-12 OS Lock Status Register format, page 11-29
Figure 11-13 Authentication Status Register format, page 11-29
Figure 11-14 PRCR format, page 11-30
Figure 11-15 PRSR format, page 11-31
Figure 11-16 Claim Tag Set Register format, page 11-33
Figure 11-17 Claim Tag Clear Register format, page 11-34
Figure 11-18 Lock Status Register format, page 11-34
Figure 11-19 Device Type Register format, page 11-35
Figure 12-1 FPU register bank, page 12-3
Figure 12-2 Floating-Point System ID Register format, page 12-5
Figure 12-3 Floating-Point Status and Control Register format, page 12-6
Figure 12-4 Floating-Point Exception Register format, page 12-7
Figure 12-5 MVFR0 Register format, page 12-8
Figure 12-6 MVFR1 Register format, page 12-9
Figure 13-1 ITETMIF Register bit assignments, page 13-7
Figure 13-2 ITMISCOUT Register bit assignments, page 13-8
Figure 13-3 ITMISCIN Register bit assignments, page 13-9
Figure 13-4 ITCTRL Register bit assignments, page 13-9
This is the Technical Reference Manual (TRM) for the Cortex-R4 and Cortex-R4F processors.
In this book the generic term processor means both the Cortex-R4 and Cortex-R4F processors.
Any differences between the two processors are described where necessary.
Note
The Cortex-R4F processor is a Cortex-R4 processor that includes the optional Floating Point
Unit (FPU) extension. See Product revision information on page 1-24 for more information.
In this book, references to the Cortex-R4 processor also apply to the Cortex-R4F processor,
unless the context makes it clear that this is not the case.
The rnpn identifier indicates the revision status of the product described in this book, where:
rn  Identifies the major revision of the product.
pn  Identifies the minor revision or modification status of the product.
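For example, r1p3 identifies major revision 1, minor revision 3. The following C sketch shows how a tool might split such an identifier into its two numbers. The function name parse_rnpn is hypothetical and is not part of any ARM software.

```c
#include <stdbool.h>
#include <ctype.h>

/* Hypothetical helper: parse "r<major>p<minor>", for example "r1p3".
   Returns false if the string does not have exactly that form. */
static bool parse_rnpn(const char *s, unsigned *major, unsigned *minor) {
    unsigned *fields[2] = { major, minor };
    const char tags[2] = { 'r', 'p' };
    for (int f = 0; f < 2; f++) {
        /* Each field must start with its tag letter followed by a digit. */
        if (*s++ != tags[f] || !isdigit((unsigned char)*s))
            return false;
        unsigned v = 0;
        while (isdigit((unsigned char)*s))
            v = v * 10u + (unsigned)(*s++ - '0');
        *fields[f] = v;
    }
    return *s == '\0'; /* reject trailing characters */
}
```

Calling parse_rnpn("r1p3", &rn, &pn) sets rn to 1 and pn to 3.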
Using this book
This book is written for system designers, system integrators, and programmers who are
designing or programming a System-on-Chip (SoC) that uses the processor.
This book is organized into the following chapters:
Chapter 1 Introduction
Read this for an introduction to the processor and descriptions of the major
functional blocks.
Chapter 2 Programmer’s Model
Read this for a description of the processor registers and programming
information.
Chapter 3 Processor Initialization, Resets, and Clocking
Read this for a description of clocking and resetting the processor, and the steps
that the software must take to initialize the processor after reset.
Chapter 4 System Control Coprocessor
Read this for a description of the system control coprocessor registers and
programming information.
Chapter 5 Prefetch Unit
Read this for a description of the functions of the Prefetch Unit (PFU), including
dynamic branch prediction and the return stack.
Chapter 6 Events and Performance Monitor
Read this for a description of the Performance Monitoring Unit (PMU) and the
event bus.
Chapter 7 Memory Protection Unit
Read this for a description of the Memory Protection Unit (MPU) and the access
permissions process.
Chapter 8 Level One Memory System
Read this for a description of the Level One (L1) memory system.
Chapter 10 Power Control
Read this for a description of the power control facilities.
Chapter 11 Debug
Read this for a description of the debug support.
Chapter 12 FPU Programmer’s Model
Read this for a description of the Floating Point Unit (FPU) support in the
Cortex-R4F processor.
Chapter 13 Integration Test Registers
Read this for a description of the Integration Test Registers, and of integration
testing of the processor with an ETM-R4 trace macrocell.
Chapter 14 Cycle Timings and Interlock Behavior
Read this for a description of the instruction cycle timing and instruction
interlocks.
Chapter 15 AC Characteristics
Read this for a description of the timing parameters applicable to the processor.
Appendix A Processor Signal Descriptions
Read this for a description of the inputs and outputs of the processor.
Appendix B ECC Schemes
Read this for a description of how to select the Error Checking and Correction
(ECC) scheme depending on the Tightly-Coupled Memory (TCM) configuration.
Appendix C Revisions
Read this for a description of the technical changes between released issues of this
book.
Glossary
Read this for definitions of terms used in this book.
Conventions
Conventions that this book can use are described in:
•Typographical
•Timing diagrams on page xix
•Signals on page xix.
Typographical
The typographical conventions are:
italic Highlights important notes, introduces special terminology, denotes
internal cross-references, and citations.
bold Highlights interface elements, such as menu names. Denotes signal
names. Also used for terms in descriptive lists, where appropriate.
monospace Denotes text that you can enter at the keyboard, such as commands, file
and program names, and source code.
monospace (underlined)
Denotes a permitted abbreviation for a command or option. You can enter
the underlined text instead of the full command or option name.
monospace italic
Denotes arguments to monospace text where the argument is to be
replaced by a specific value.
monospace bold
Denotes language keywords when used outside example code.
< and > Enclose replaceable terms for assembler syntax where they appear in code
or code fragments. For example:
MRC p15, 0, <Rd>, <CRn>, <CRm>, <Opcode_2>
Timing diagrams
The figure named Key to timing diagram conventions explains the components used in timing
diagrams. Variations, when they occur, have clear labels. You must not assume any timing
information that is not explicit in the diagrams.
Shaded bus and signal areas are undefined, so the bus or signal can assume any value within the
shaded area at that time. The actual level is unimportant and does not affect normal operation.
Key to timing diagram conventions
[Figure not reproduced. Its legend entries are: Clock; HIGH to LOW; Transient; HIGH/LOW to HIGH; Bus stable; Bus to high impedance; Bus change; High impedance to stable bus.]
Signals
The signal conventions are:
Signal level The level of an asserted signal depends on whether the signal is
active-HIGH or active-LOW. Asserted means:
•HIGH for active-HIGH signals
•LOW for active-LOW signals.
Lower-case n At the start or end of a signal name denotes an active-LOW signal.
Prefix A Denotes global Advanced eXtensible Interface (AXI) signals.
Prefix AR Denotes AXI read address channel signals.
Prefix AW Denotes AXI write address channel signals.
Prefix B Denotes AXI write response channel signals.
Prefix P Denotes Advanced Peripheral Bus (APB) signals.
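The naming conventions above can be applied mechanically, for example by a script that annotates a signal list. The C sketch below is a simple heuristic under that assumption: it treats a lower-case n at either end of a name as active-LOW and classifies the prefixes listed above, testing the two-letter prefixes before the single-letter ones. The function names are hypothetical, and a name that merely happens to start with A, B, or P would be misclassified, so a real tool should work from the signal list in Appendix A.

```c
#include <stdbool.h>
#include <string.h>

typedef enum { LEVEL_LOW = 0, LEVEL_HIGH = 1 } level_t;

/* A lower-case n at the start or end of a name marks an active-LOW signal. */
static bool is_active_low(const char *name) {
    size_t len = strlen(name);
    return len > 0 && (name[0] == 'n' || name[len - 1] == 'n');
}

/* Level at which the signal is asserted: LOW for active-LOW, else HIGH. */
static level_t asserted_level(const char *name) {
    return is_active_low(name) ? LEVEL_LOW : LEVEL_HIGH;
}

/* Heuristic prefix classification; AR and AW are checked before A. */
static const char *signal_group(const char *name) {
    if (strncmp(name, "AR", 2) == 0) return "AXI read address channel";
    if (strncmp(name, "AW", 2) == 0) return "AXI write address channel";
    if (name[0] == 'A') return "global AXI";
    if (name[0] == 'B') return "AXI write response channel";
    if (name[0] == 'P') return "APB";
    return "other";
}
```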
The processor implements the ARMv7-R architecture and ARMv7 debug architecture. In
addition, the Cortex-R4F processor implements the VFPv3-D16 architecture. This includes the
VFPv3 instruction set.
The ARMv7-R architecture provides 32-bit ARM and 16-bit and 32-bit Thumb instruction sets,
including a range of Single Instruction, Multiple-Data (SIMD) Digital Signal Processing (DSP)
instructions that operate on 16-bit or 8-bit data values in 32-bit registers.
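As an illustration of a SIMD DSP instruction, UADD8 performs four unsigned 8-bit additions on the byte lanes of two 32-bit registers. Each lane result wraps modulo 256, and the corresponding APSR.GE bit is set when that lane's sum is 0x100 or more. The C model below is a minimal sketch of that lane behavior for explanation only; the type and function names are hypothetical, and it does not model any other architectural state.

```c
#include <stdint.h>

/* Result of the model: the packed 32-bit result and the four GE bits. */
typedef struct {
    uint32_t result;
    unsigned ge; /* bit n set when lane n produced a carry out */
} simd_result_t;

/* Illustrative model of UADD8: four independent unsigned byte adds. */
static simd_result_t uadd8_model(uint32_t a, uint32_t b) {
    simd_result_t r = { 0u, 0u };
    for (unsigned lane = 0; lane < 4; lane++) {
        unsigned shift = lane * 8u;
        unsigned sum = ((a >> shift) & 0xFFu) + ((b >> shift) & 0xFFu);
        r.result |= (uint32_t)(sum & 0xFFu) << shift; /* wrap modulo 256 */
        if (sum > 0xFFu)
            r.ge |= 1u << lane;
    }
    return r;
}
```

For example, uadd8_model(0x01FF7F80, 0x01018080) gives a result of 0x0200FF00 with GE bits 0b0101, because lanes 0 and 2 overflow.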
See the ARM Architecture Reference Manual, ARMv7-A and ARMv7-R edition for more
information on the:
This section describes the main components of the processor:
•Data Processing Unit on page 1-5
•Load/store unit on page 1-5
•Prefetch unit on page 1-5
•L1 memory system on page 1-5
•L2 AXI interfaces on page 1-7
•Debug on page 1-8
•System control coprocessor on page 1-9
•Interrupt handling on page 1-9.
Figure 1-1 shows the structure of the processor.
Figure 1-1 Processor block diagram
[Figure not reproduced. It shows the Prefetch Unit, Data Processing Unit, Load/Store Unit, and Memory Protection Unit; the Level one memory system with L1 instruction cache control and RAM and L1 data cache control and RAM; the Tightly-Coupled Memory (TCM) interface to the ATCM, B0TCM, and B1TCM; the ETM interface; the Debug interface; and the Level two interface, with an AXI master port on the AXI master bus and an AXI slave port on the AXI slave bus.]
The Prefetch Unit (PFU) fetches instructions from the memory system, predicts branches, and
passes instructions to the Data Processing Unit (DPU). The DPU executes all instructions and
uses the Load/Store Unit (LSU) for data memory transfers. The PFU and LSU interface to the
L1 memory system, which contains L1 instruction and data caches and an interface to an L2 system.
The L1 memory system can also contain optional TCM interfaces.
1.3.1 Data Processing Unit
The DPU holds most of the program-visible state of the processor, such as general-purpose
registers, status registers and control registers. It decodes and executes instructions, operating
on data held in the registers in accordance with the ARM Architecture. Instructions are fed to
the DPU from the PFU through a buffer. The DPU performs instructions that require data to be
transferred to or from the memory system by interfacing to the LSU. See Chapter 2
Programmer’s Model for more information.
Floating Point Unit
The Floating Point Unit (FPU) is an optional part of the DPU. It includes the VFP register
file and status registers, and performs floating-point operations on the data held in the VFP
register file. See Chapter 12 FPU Programmer’s Model for more information.
1.3.2 Load/store unit
The LSU manages all load and store operations, interfacing the DPU to the TCMs, caches,
and L2 memory interfaces.
1.3.3 Prefetch unit
The PFU obtains instructions from the instruction cache, the TCMs, or from external memory
and predicts the outcome of branches in the instruction stream. See Chapter 5 Prefetch Unit for
more information.
Branch prediction
The branch predictor is a global type that uses history registers and a 256-entry pattern history
table.
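The text states only that the predictor is a global type with history registers and a 256-entry pattern history table. To show how a predictor of that general kind works, the following C sketch uses gshare-style indexing, in which the branch address is XORed with a global history of recent branch outcomes to select a table entry. The 8-bit history register, the XOR indexing, and the 2-bit saturating counters are assumptions chosen for illustration; they are not a description of the Cortex-R4 implementation, and all names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PHT_ENTRIES 256u /* 256-entry pattern history table, as in the text */

typedef struct {
    uint8_t pht[PHT_ENTRIES]; /* assumed 2-bit saturating counters, 0..3 */
    uint8_t ghr;              /* assumed 8-bit global history register */
} predictor_t;

static void predictor_init(predictor_t *p) {
    memset(p->pht, 1, sizeof p->pht); /* start every entry weakly not-taken */
    p->ghr = 0;
}

/* Assumed gshare-style index: word address XOR global history. */
static unsigned pht_index(const predictor_t *p, uint32_t pc) {
    return ((pc >> 2) ^ p->ghr) & (PHT_ENTRIES - 1u);
}

/* Predict taken when the selected counter is 2 or 3. */
static bool predict(const predictor_t *p, uint32_t pc) {
    return p->pht[pht_index(p, pc)] >= 2u;
}

/* Train the selected counter, then shift the outcome into the history. */
static void update(predictor_t *p, uint32_t pc, bool taken) {
    uint8_t *c = &p->pht[pht_index(p, pc)];
    if (taken && *c < 3u) (*c)++;
    if (!taken && *c > 0u) (*c)--;
    p->ghr = (uint8_t)((p->ghr << 1) | (taken ? 1u : 0u));
}
```

A loop branch that is always taken fills the history with ones, so after a few iterations it repeatedly selects the same counter and is predicted taken.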
Return stack
The PFU includes a 4-entry return stack to accelerate returns from procedure calls. For each
procedure call, the return address is pushed onto a hardware stack. When a procedure return is
recognized, the address held in the return stack is popped, and the prefetch unit uses it as the
predicted return address.
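The push and pop behavior described above can be sketched in C as a small circular buffer. The text gives the depth of four entries but does not say what happens when calls nest more than four deep, so this sketch assumes that the oldest entry is overwritten on overflow and that a pop from an empty stack gives no prediction. Both behaviors and all names are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define RS_DEPTH 4u /* 4-entry return stack, as in the text */

typedef struct {
    uint32_t addr[RS_DEPTH];
    unsigned top;   /* index of the next free slot */
    unsigned count; /* number of valid entries, at most RS_DEPTH */
} return_stack_t;

static void rs_init(return_stack_t *rs) {
    rs->top = 0;
    rs->count = 0;
}

/* On a procedure call: push the return address (assumed to overwrite
   the oldest entry when the stack is full). */
static void rs_push(return_stack_t *rs, uint32_t ret) {
    rs->addr[rs->top] = ret;
    rs->top = (rs->top + 1u) % RS_DEPTH;
    if (rs->count < RS_DEPTH)
        rs->count++;
}

/* On a procedure return: pop the predicted return address.
   Returns false when no prediction is available. */
static bool rs_pop(return_stack_t *rs, uint32_t *predicted) {
    if (rs->count == 0)
        return false;
    rs->top = (rs->top + RS_DEPTH - 1u) % RS_DEPTH;
    rs->count--;
    *predicted = rs->addr[rs->top];
    return true;
}
```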
1.3.4 L1 memory system
The processor L1 memory system includes the following features:
•separate instruction and data caches
•flexible TCM interfaces
•64-bit datapaths throughout the memory system
•MPU that supports configurable memory region sizes
•export of memory attributes for L2 memory system
•parity or ECC supported on local memories.
For more information on the blocks in the L1 memory system, see:
You can configure the processor to include separate instruction and data caches. The caches
have the following features:
•Support for independent configuration of the instruction and data cache sizes between
4KB and 64KB.
•Pseudo-random cache replacement policy.
•8-word cache line length. Cache lines can be either write-back or write-through,
determined by MPU region.
•Ability to disable each cache independently.
•Streaming of sequential data from LDM and LDRD operations, and sequential instruction
fetches.
•Critical word first filling of the cache on a cache miss.
•Implementation of all the cache RAM blocks and the associated tag and valid RAM
blocks using standard ASIC RAM compilers.
•Parity or ECC supported on local memories.
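An 8-word line is 32 bytes, so the low five address bits select a byte within a line, and a 4KB cache holds 128 lines. The following C sketch computes these quantities and the order in which the words of a line are returned for a critical-word-first fill. It assumes that cache sizes are powers of two and that the fill wraps around the line after the missed word; the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define WORD_BYTES 4u
#define LINE_WORDS 8u                        /* 8-word line, from the text */
#define LINE_BYTES (LINE_WORDS * WORD_BYTES) /* 32 bytes */

/* Assumes the permitted sizes are powers of two from 4KB to 64KB. */
static bool cache_size_valid(uint32_t bytes) {
    return bytes >= 4u * 1024u && bytes <= 64u * 1024u &&
           (bytes & (bytes - 1u)) == 0;
}

static uint32_t cache_lines(uint32_t bytes) { return bytes / LINE_BYTES; }

/* Address of the first byte of the line that contains addr. */
static uint32_t line_base(uint32_t addr) { return addr & ~(LINE_BYTES - 1u); }

/* Critical word first: the missed word, then the rest of the line,
   wrapping back to word 0 (assumed wrapping order). */
static void fill_order(uint32_t addr, unsigned order[LINE_WORDS]) {
    unsigned first = (addr / WORD_BYTES) % LINE_WORDS;
    for (unsigned i = 0; i < LINE_WORDS; i++)
        order[i] = (first + i) % LINE_WORDS;
}
```

A miss on address 0x1234 falls in the line at 0x1220 and requests word 5 first, followed by words 6, 7, 0, 1, 2, 3, and 4.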
Memory Protection Unit
An optional MPU provides memory attributes for embedded control applications. You can
configure the MPU to have eight or twelve regions, each with a minimum resolution of 32 bytes.
MPU regions can overlap, and the highest numbered region has the highest priority.
The MPU checks for protection and memory attributes, and some of these can be passed to an
external L2 memory system.
For more information, see Chapter 7 Memory Protection Unit.
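Because MPU regions can overlap and the highest numbered region has the highest priority, the region that governs an address can be found by searching from the highest numbered region downwards and stopping at the first enabled match. The C sketch below shows that search. The region structure is a hypothetical simplification: it assumes each region is described by a base address and a size in bytes, and it ignores the access permission and attribute fields covered in Chapter 7.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified region descriptor. */
typedef struct {
    bool     enabled;
    uint32_t base; /* start address of the region */
    uint32_t size; /* size in bytes, minimum 32 */
} mpu_region_t;

/* Returns the number of the region that governs addr, or -1 if no
   enabled region contains it. The highest numbered match wins. */
static int mpu_lookup(const mpu_region_t *regions, unsigned count, uint32_t addr) {
    for (int i = (int)count - 1; i >= 0; i--) {
        const mpu_region_t *r = &regions[i];
        /* addr - base < size avoids overflow at the top of memory */
        if (r->enabled && addr >= r->base && addr - r->base < r->size)
            return i;
    }
    return -1;
}
```

With region 0 covering 0x0000 to 0xFFFF and region 1 covering 0x1000 to 0x10FF, address 0x1080 is governed by region 1 and address 0x2000 by region 0.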
TCM interfaces
Because some applications might not respond well to caching, there are two TCM interfaces that
permit connection to configurable memory blocks of Tightly-Coupled Memory (ATCM and
BTCM). These ensure high-speed access to code or data. As an option, the BTCM can have two
memory ports for increased bandwidth.
An ATCM typically holds interrupt or exception code that must be accessed at high speed,
without any potential delay resulting from a cache miss.
A BTCM typically holds a block of data for intensive processing, such as audio or video
processing.
You can individually configure the TCM blocks at any naturally aligned address in the memory
map. Permissible TCM block sizes are:
The TCMs are external to the processor. This provides flexibility in optimizing the TCM
subsystem for performance, power, and RAM type. The INITRAMA and INITRAMB pins
enable booting from the ATCM or BTCM, respectively. Both the ATCM and BTCM support
wait states.
For more information, see Chapter 8 Level One Memory System.
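A naturally aligned address for a block is a multiple of the block size. For a power-of-two size, that means the address bits below the size are all zero. The C sketch below checks this condition for a proposed TCM base address. It assumes TCM block sizes are nonzero powers of two, and the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if base is naturally aligned for a TCM block of size bytes.
   Assumes size is a nonzero power of two; other sizes are rejected. */
static bool tcm_naturally_aligned(uint32_t base, uint32_t size) {
    if (size == 0 || (size & (size - 1u)) != 0)
        return false;
    return (base & (size - 1u)) == 0;
}
```

For example, a 128KB block can start at 0x00020000 but not at 0x00010000.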
Error correction and detection
To increase the tolerance of the system to soft memory faults, you can configure the caches for
either:
•parity generation and error correction/detection
•ECC code generation, single-bit error correction, and two-bit error detection.
Similarly, you can configure the TCM interfaces for:
•parity generation and error detection
•ECC code generation, single-bit error correction, and two-bit error detection.
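To illustrate how a code can correct single-bit errors and detect two-bit errors, the following C sketch implements an extended Hamming code over 8 data bits: four Hamming check bits locate a single flipped bit, and an overall parity bit distinguishes one flip from two. This is a textbook SEC-DED construction chosen for explanation, not the scheme used by the processor; the actual ECC schemes and data widths are described in Appendix B ECC Schemes. All names are hypothetical.

```c
#include <stdint.h>

typedef enum { ECC_OK, ECC_CORRECTED, ECC_DOUBLE_ERROR } ecc_status_t;

/* Codeword layout: bit 0 overall parity, bits 1, 2, 4, 8 Hamming
   check bits, remaining positions 3..12 hold the data bits. */
static const unsigned data_pos[8] = { 3, 5, 6, 7, 9, 10, 11, 12 };

static uint16_t secded_encode(uint8_t data) {
    uint16_t cw = 0;
    for (unsigned i = 0; i < 8; i++)
        if (data & (1u << i)) cw |= (uint16_t)(1u << data_pos[i]);
    /* Each check bit p gives even parity over positions that have bit p set. */
    for (unsigned p = 1; p <= 8; p <<= 1) {
        unsigned parity = 0;
        for (unsigned pos = 1; pos <= 12; pos++)
            if ((pos & p) && ((cw >> pos) & 1u)) parity ^= 1u;
        if (parity) cw |= (uint16_t)(1u << p);
    }
    /* Overall parity bit makes the whole 13-bit codeword even. */
    unsigned overall = 0;
    for (unsigned pos = 1; pos <= 12; pos++) overall ^= (cw >> pos) & 1u;
    return (uint16_t)(cw | overall);
}

static ecc_status_t secded_decode(uint16_t cw, uint8_t *data) {
    unsigned syndrome = 0, overall = 0;
    for (unsigned pos = 0; pos <= 12; pos++)
        if ((cw >> pos) & 1u) { syndrome ^= pos; overall ^= 1u; }
    ecc_status_t st = ECC_OK;
    if (overall) {
        /* Odd number of flips: treat as one error at position syndrome
           (syndrome 0 means the overall parity bit itself flipped). */
        cw ^= (uint16_t)(1u << syndrome);
        st = ECC_CORRECTED;
    } else if (syndrome) {
        return ECC_DOUBLE_ERROR; /* even flips, nonzero syndrome */
    }
    uint8_t d = 0;
    for (unsigned i = 0; i < 8; i++)
        if ((cw >> data_pos[i]) & 1u) d |= (uint8_t)(1u << i);
    *data = d;
    return st;
}
```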
For more information, see Chapter 8 Level One Memory System.
1.3.5 L2 AXI interfaces
The L2 AXI interfaces, an AXI master port and an AXI slave port, connect the L1 memory system
to peripherals and to external memory.
AXI master interface
The AXI master interface provides a high bandwidth interface to second level caches, on-chip
RAM, peripherals, and interfaces to external memory. It consists of a single AXI port with a
64-bit read channel and a 64-bit write channel for instruction and data fetches.
The AXI master can run at the same frequency as the processor, or at a lower synchronous
frequency. If asynchronous clocking is required, you must use an external asynchronous AXI slice.
AXI slave interface
The AXI slave interface enables AXI masters, including the AXI master port of the processor,
to access data and instruction cache RAMs and TCMs on the AXI system bus. You can use this
for DMA into and out of the TCM RAMs and for software test of the TCM and cache RAMs.
The slave interface can run at the same frequency as the processor, or at a lower synchronous
frequency. If asynchronous clocking is required, you must use an external asynchronous AXI slice.
Bits in the Auxiliary Control Register and Slave Port Control Register can control access to the
AXI slave. Access to the TCM RAMs can be granted to any master, to only privileged masters,
or completely disabled. Access to the cache RAMs can be separately controlled in a similar way.
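The three access policies described above, applied separately to the TCM RAMs and the cache RAMs, can be modeled as in the following C sketch. The enumeration and structure are hypothetical: the real control bits are in the Auxiliary Control Register and Slave Port Control Register, whose layouts are not reproduced here. In AXI, whether a transaction is privileged is signaled on bit 0 of ARPROT or AWPROT, which is the input the privileged flag stands in for.

```c
#include <stdbool.h>

/* Hypothetical encoding of the three policies described in the text. */
typedef enum {
    SLAVE_ACCESS_DISABLED,
    SLAVE_ACCESS_PRIVILEGED_ONLY,
    SLAVE_ACCESS_ANY
} slave_access_t;

/* TCM RAMs and cache RAMs are controlled separately. */
typedef struct {
    slave_access_t tcm;
    slave_access_t cache;
} slave_port_policy_t;

static bool access_allowed(slave_access_t policy, bool privileged) {
    switch (policy) {
    case SLAVE_ACCESS_ANY:             return true;
    case SLAVE_ACCESS_PRIVILEGED_ONLY: return privileged;
    case SLAVE_ACCESS_DISABLED:
    default:                           return false;
    }
}

/* Decide whether an AXI slave access to the TCMs or caches is permitted. */
static bool slave_access_allowed(const slave_port_policy_t *p,
                                 bool to_tcm, bool privileged) {
    return access_allowed(to_tcm ? p->tcm : p->cache, privileged);
}
```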
1.3.6 Debug
The processor has a CoreSight-compliant Advanced Peripheral Bus version 3 (APBv3) debug
interface. This permits system access to debug resources, for example, the setting of
watchpoints and breakpoints.
The processor provides extensive support for real-time debug and performance profiling.
The following sections give an overview of debug:
•System performance monitoring
•ETM interface
•Real-time debug facilities.
System performance monitoring
This is a group of counters that you can configure to monitor the operation of the processor and
memory system. For more information, see About the PMU on page 6-6.
ETM interface
The Embedded Trace Macrocell (ETM) interface enables you to connect an external ETM unit
to the processor for real-time code tracing of the core in an embedded system.
The ETM interface collects various processor signals and drives these signals from the
processor. The interface is unidirectional and runs at the full speed of the processor. The ETM
interface connects directly to the external ETM unit without any additional glue logic. You can
disable the ETM interface for power saving. For more information, see the CoreSight ETM-R4 Technical Reference Manual.
Real-time debug facilities
The processor contains an EmbeddedICE-RT logic unit to provide real-time debug facilities. It
has:
•up to eight breakpoints
•up to eight watchpoints
•a Debug Communications Channel (DCC).
Note
The number of breakpoints and watchpoints is configured during implementation, see
Configurable options on page 1-13.
The EmbeddedICE-RT logic monitors the internal address and data buses. You access the
EmbeddedICE-RT logic through a memory-mapped APB interface.
The processor implements the ARMv7 Debug architecture, including the extensions of the
architecture to support CoreSight.
To get full access to the processor debug capability, you can access the debug register map
through the APBv3 slave port. See Chapter 11 Debug for more information on debug.
The EmbeddedICE-RT logic supports two modes of debug operation:
Halt mode
On a debug event, such as a breakpoint or watchpoint, the debug logic stops the
processor and forces it into debug state. This enables you to examine the internal
state of the processor, and the external state of the system, independently of
other system activity. When the debugging process completes, the processor and
system state are restored, and normal program execution resumes.
Monitor debug mode
On a debug event, the processor generates a debug exception instead of entering
debug state, as it does in halt mode. The exception entry enables a debug monitor
program to debug the processor while critical interrupt service routines continue
to run on the processor. The debug monitor program can communicate with
the debug host over the DCC or any other communications interface in the
system.
1.3.7 System control coprocessor
The system control coprocessor provides configuration and control of the memory system and
its associated functionality. Other system-level operations, such as memory barrier instructions,
are also managed through the system control coprocessor.
For more information, see System control and configuration on page 4-4.
1.3.8 Interrupt handling
Interrupt handling in the processor is compatible with previous ARM architectures, but has
several additional features to improve interrupt performance for real-time applications.
VIC port
The core has a dedicated port that enables an external interrupt controller, such as the ARM
PrimeCell Vectored Interrupt Controller (VIC), to supply a vector address along with an Interrupt Request (IRQ) signal. This provides faster interrupt entry, but you can disable it for
compatibility with earlier interrupt controllers.
If you do not have a VIC in your design, you must ensure that the nIRQ and nFIQ signals, once
asserted, are held LOW until the exception handler clears them.
Low interrupt latency
On receipt of an interrupt, the processor abandons any pending restartable memory operations.
Restartable memory operations are the multiword transfer instructions LDM, LDRD, STRD, STM,
PUSH, and POP that can access Normal memory.
Note
To minimize the interrupt latency, ARM recommends that you do not perform:
•multiple accesses to areas of memory marked as Device or Strongly Ordered
•SWP operations to slow areas of memory.
Exception processing
The ARMv7-R architecture contains exception processing instructions to reduce interrupt
handler entry and exit time: