5 December 2003    C    Third release. Includes r0p5 changes. Defects corrected.
26 January 2004    D    Fourth release. Includes r0p4. Technically identical to previous release.
Proprietary Notice
Words and logos marked with ® or ™ are registered trademarks or trademarks owned by ARM Limited, except
as otherwise stated below in this proprietary notice. Other brands and names mentioned herein may be the
trademarks of their respective owners.
Neither the whole nor any part of the information contained in, or the product described in, this document
may be adapted or reproduced in any material form except with the prior written permission of the copyright
holder.
The product described in this document is subject to continuous developments and improvements. All
particulars of the product and its use contained in this document are given by ARM in good faith. However,
all warranties implied or expressed, including but not limited to implied warranties of merchantability, or
fitness for purpose, are excluded.
This document is intended only to assist the reader in the use of the product. ARM Limited shall not be liable
for any loss or damage arising from the use of any information in this document, or any error or omission in
such information, or any incorrect use of the product.
Confidentiality Status
This document is Open Access. This document has no restriction on distribution.
Product Status
The information in this document is final, that is for a developed product.
This is the Technical Reference Manual for the ARM926EJ-S processor.
Product revision status
The rnpn identifier indicates the revision status of the product described in this manual,
where:
rn    Identifies the major revision of the product.
pn    Identifies the minor revision or modification status of the product.
Intended audience
This document has been written for experienced hardware and software engineers who
have previous experience of ARM products, and who wish to use an ARM926EJ-S
processor in their system design.
Using this manual
This document is organized into the following chapters:
Chapter 1 Introduction
Read this chapter for an overview of the ARM926EJ-S processor.
Chapter 2 Programmer’s Model
Read this chapter for details of the programmer’s model and
ARM926EJ-S registers.
Chapter 3 Memory Management Unit
Read this chapter for details of the Memory Management Unit (MMU)
and address translation process and how to use the CP15 register to
enable and disable the MMU.
Chapter 4 Caches and Write Buffer
Read this chapter for a description of the instruction cache, the data
cache, the write buffer, and the physical address tag RAM.
Chapter 5 Tightly-Coupled Memory Interface
Read this chapter for a description of the Tightly-Coupled Memory
(TCM) interface and how to use the CP15 region register to enable and
disable the caches. It includes examples on how various RAM types can
be connected.
Chapter 6 Bus Interface Unit
Read this chapter for a description of the Bus Interface Unit (BIU)
interface to AMBA.
Chapter 7 Noncachable Instruction Fetches
Read this chapter for a description of how speculative noncachable
instruction fetches are used in the ARM926EJ-S processor to improve
performance.
Chapter 8 Coprocessor Interface
Read this chapter for a description of the coprocessor interface. The
chapter includes timing diagrams for coprocessor operations.
Chapter 9 Instruction Memory Barrier
Read this chapter for the Instruction Memory Barrier (IMB) description
and how IMB operations are used to ensure consistency between data and
instruction streams processed by the ARM926EJ-S processor.
Chapter 10 Embedded Trace Macrocell Support
Read this chapter to understand how Embedded Trace Macrocell (ETM)
is supported in the ARM926EJ-S processor.
Chapter 11 Debug Support
Read this chapter for a description of the debug interface and
EmbeddedICE-RT.
Chapter 12 Power Management
Read this chapter for a description of the power management facilities
provided by the ARM926EJ-S processor.
Appendix A Signal Descriptions
This appendix lists the ARM926EJ-S processor signals in functional
groups.
Appendix B CP15 Test and Debug Registers
Read this appendix for detailed information on the registers used for test
and debug.
The level of an asserted signal depends on whether the signal is active-HIGH or
active-LOW. Asserted means HIGH for active-HIGH signals and LOW for active-LOW
signals:
Prefix H Denotes Advanced High-performance Bus (AHB) signals.
Prefix n Denotes active-LOW signals except in the case of AHB or Advanced
Peripheral Bus (APB) reset signals. These are named HRESETn and
PRESETn respectively.
Prefix DH Denotes data side AHB signals.
Prefix IH Denotes instruction side AHB signals.
Prefix DR Denotes data side TCM interface signals.
Prefix IR Denotes instruction side TCM interface signals.
This chapter describes the ARM926EJ-S registers in CP15, the system control
coprocessor, and provides information for programming the microprocessor. It contains
the following sections:
• About the programmer’s model
• Summary of ARM926EJ-S system control coprocessor (CP15) registers
The system control coprocessor (CP15) is used to configure and control the
ARM926EJ-S processor. The caches, Tightly-Coupled Memories (TCMs), Memory Management Unit (MMU), and most other system options are controlled using CP15
registers. You can only access CP15 registers with MRC and MCR instructions in a
privileged mode. CDP, LDC, STC, MCRR, and MRRC instructions, and unprivileged
MRC or MCR instructions to CP15 cause the Undefined instruction exception to be
taken.
All CP15 register bits that are defined and contain state are set to 0 by Reset except:
•The V bit is set to 0 at reset if the VINITHI signal is LOW, or 1 if the VINITHI
signal is HIGH.
•The B bit is set to 0 at reset if the BIGENDINIT signal is LOW, or 1 if the
BIGENDINIT signal is HIGH.
•The instruction TCM is enabled at reset if the INITRAM pin is HIGH. This
enables booting from the instruction TCM and sets the ITCM bit in the ITCM
region register to 1.
2.2.1 Addresses in an ARM926EJ-S system
Three distinct types of address exist in an ARM926EJ-S system. Table 2-2 shows the
address types in the ARM926EJ-S processor.
This is an example of the address manipulation that occurs when the ARM9EJ-S core
requests an instruction:
1.The VA of the instruction is issued by the ARM9EJ-S core.
2.The VA is translated using the FCSE PID value to the MVA. The Instruction
Cache (ICache) and Memory Management Unit (MMU) detect the MVA (see
Process ID Register c13 on page 2-33).
3.If the protection check carried out by the MMU on the MVA does not abort and
the MVA tag is in the ICache, the instruction data is returned to the ARM9EJ-S
core.
4.If the protection check carried out by the MMU on the MVA does not abort, and
the cache misses (the MVA tag is not in the cache), then the MMU translates the
MVA to produce the PA. This address is given to the AMBA bus interface to
perform an external access.
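The VA to MVA step in the sequence above can be sketched in C. This is an illustration only: the function name is hypothetical, and the relocation rule (only addresses with VA[31:25] equal to zero are modified, with the 7-bit FCSE PID placed in MVA[31:25]) is taken from the ARMv5 FCSE definition rather than from this section.

```c
#include <stdint.h>

/* Translate a virtual address (VA) to a modified virtual address (MVA).
 * Assumption (ARMv5 FCSE behavior, not stated in this section): only
 * addresses in the bottom 32MB (VA[31:25] == 0) are relocated, and the
 * 7-bit FCSE PID is placed in MVA[31:25]. */
static uint32_t fcse_va_to_mva(uint32_t va, uint32_t fcse_pid)
{
    if ((va >> 25) == 0u) {
        return va | ((fcse_pid & 0x7Fu) << 25);
    }
    return va; /* addresses at or above 32MB pass through unchanged */
}
```

With an FCSE PID of 0 the function returns the VA unchanged, which corresponds to the flat case VA = MVA.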
2.2.2 Accessing CP15 registers
You can only access CP15 registers with MRC and MCR instructions in a privileged
mode. The instruction bit pattern of the MCR and MRC instructions is shown in
Figure 2-1 on page 2-5.
Attempting to read from a write-only register, or writing to a read-only register causes
Unpredictable results. In all instructions that access CP15:
•The Opcode_1 field Should Be Zero except when the values specified are used to
select the desired operations. Using other values results in Unpredictable
behavior.
•The Opcode_2 and CRm fields Should Be Zero except when the values specified
are used to select the desired behavior. Using other values results in Unpredictable
behavior.
Table 2-3 shows the terms and abbreviations used in this chapter.
Table 2-3 CP15 abbreviations
Term                         Abbreviation  Description
Unpredictable                UNP           For reads: The data returned when reading from
                                           this location is unpredictable. It can have any
                                           value.
                                           For writes: Writing to this location causes
                                           unpredictable behavior, or an unpredictable
                                           change in device configuration.
Undefined                    UND           An instruction that accesses CP15 in the manner
                                           indicated takes the Undefined instruction
                                           exception.
Should Be Zero               SBZ           When writing to this location, all bits of this field
                                           Should Be Zero.
Should Be One                SBO           When writing to this location, all bits in this field
                                           Should Be One.
Should Be Zero or Preserved  SBZP          When writing to this location, all bits of this field
                                           Should Be Zero or preserved by writing the same
                                           value that has been previously read from the same
                                           field.
In all cases, reading from, or writing any data values to any CP15 registers, including
those fields specified as Unpredictable, Should Be One, or Should Be Zero does not
cause any physical damage to the chip.
The following registers are described in this section:
• ID Code, Cache Type, and TCM Status Registers, c0
• Control Register c1
• Translation Table Base Register c2
• Domain Access Control Register c3
• Register c4
• Fault Status Registers c5
• Fault Address Register c6
• Cache Operations Register c7
• TLB Operations Register c8
• Cache Lockdown and TCM Region Registers c9
• TLB Lockdown Register c10
• Register c11 and c12
• Process ID Register c13
• Register c14
• Test and Debug Register c15.
Programmer’s Model
2.3.1 ID Code, Cache Type, and TCM Status Registers, c0
Register c0 accesses the ID Register, Cache Type Register, and TCM Status Registers.
Reading from this register returns the device ID, the cache type, or the TCM status
depending on the value of Opcode_2 used:
Opcode_2 = 0 ID value.
Opcode_2 = 1 instruction and data cache type.
Opcode_2 = 2 TCM status.
The CRm field Should Be Zero when reading from these registers. Table 2-4 shows the
instructions you can use to read register c0.
ID Code Register c0
This is a read-only register that returns the 32-bit device ID code.
You can access the ID Code Register by reading CP15 register c0 with the Opcode_2
field set to any value other than 1 or 2. For example:
MRC p15, 0, <Rd>, c0, c0, {0, 3-7} ;returns ID
The contents of the ID Code Register are shown in Table 2-5.
Table 2-5 Register 0, ID code
Register bits  Function                             Value
[31:24]        ASCII code of implementer trademark  0x41
[23:20]        Variant                              0x0
[19:16]        Architecture (ARMv5TEJ)              0x6
[15:4]         Part number                          0x926
[3:0]          Revision                             0x05 (a)
a. The revision value can be in the range 0x0 to 0x5, depending on the
layout revision you are using.
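The field boundaries in Table 2-5 can be applied to a value read from the ID Code Register as in the following C sketch. The structure and function names are hypothetical, and the code only splits a 32-bit value that has already been read. It does not issue the MRC instruction itself.

```c
#include <stdint.h>

/* Fields of the ID Code Register, following Table 2-5. */
struct id_code {
    uint8_t  implementer;   /* bits [31:24] */
    uint8_t  variant;       /* bits [23:20] */
    uint8_t  architecture;  /* bits [19:16] */
    uint16_t part_number;   /* bits [15:4]  */
    uint8_t  revision;      /* bits [3:0]   */
};

static struct id_code decode_id_code(uint32_t id)
{
    struct id_code f;
    f.implementer  = (uint8_t)((id >> 24) & 0xFFu);
    f.variant      = (uint8_t)((id >> 20) & 0xFu);
    f.architecture = (uint8_t)((id >> 16) & 0xFu);
    f.part_number  = (uint16_t)((id >> 4) & 0xFFFu);
    f.revision     = (uint8_t)(id & 0xFu);
    return f;
}
```

For example, the value 0x41069265 splits into implementer 0x41, variant 0x0, architecture 0x6, part number 0x926, and revision 0x5.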
Cache Type Register c0
This is a read-only register that contains information about the size and architecture of
the Instruction Cache (ICache) and Data Cache (DCache) enabling operating systems
to establish how to perform such operations as cache cleaning and lockdown.
You can access the Cache Type Register by reading CP15 register c0 with the Opcode_2
field set to 1. For example:
MRC p15, 0, <Rd>, c0, c0, 1 ; returns cache type
Ctype The Ctype field determines the cache type. See Table 2-6.
S bit Specifies if the cache is a unified cache (S=0), or separate ICache and
DCache (S=1). If S=0, the Isize and Dsize fields both describe the unified
cache and must be identical. In the ARM926EJ-S processor, this bit is set
to a 1 to denote separate caches.
Dsize Specifies the size, line length, and associativity of the DCache, or of the
unified cache if the S bit is 0.
Isize Specifies the size, line length, and associativity of the ICache, or of the
unified cache if the S bit is 0.
The Ctype field specifies if the cache supports lockdown or not, and how it is cleaned.
The encoding is shown in Table 2-6. All unused values are reserved.
Table 2-6 Ctype encoding
Value  Method      Cache cleaning         Cache lockdown
b1110  Write-back  Register 7 operations  Format C (a)
a. See Cache Lockdown Register c9 on page 2-26 for more details on
Format C for cache lockdown.
The Dsize and Isize fields in the Cache Type Register have the same format. This is
shown in Figure 2-3.
Bits     Field
[11:10]  0 0
[9:6]    Size
[5:3]    Assoc
[2]      M
[1:0]    Len
Figure 2-3 Dsize and Isize field format
Size The Size field determines the cache size in conjunction with the M bit.
Assoc The Assoc field determines the cache associativity in conjunction with
the M bit.
M bit The multiplier bit determines the cache size and cache associativity
values in conjunction with the Size and Assoc fields. If the cache is
present, M must be set to 0. If the cache is absent, M must be set to 1. For
the ARM926EJ-S processor, M is always set to 0.
Len The Len field determines the line length of the cache.
The size of the cache is determined by the Size field and the M bit. The M bit is 0 for
the DCache and ICache. The Size field is bits [21:18] for the DCache and bits [9:6] for
the ICache. The minimum size of each cache is 4KB, and the maximum size is 128KB.
Table 2-7 shows the cache size encoding.
Table 2-7 Cache size encoding (M=0)
Size field  Cache size
b0011       4KB
b0100       8KB
b0101       16KB
b0110       32KB
b0111       64KB
b1000       128KB
The associativity of the cache is determined by the Assoc field and the M bit. The M bit
is 0 for the DCache and ICache. The Assoc field is bits [17:15] for the DCache and bits
[5:3] for the ICache. Table 2-8 shows the cache associativity encoding.
The line length of the cache is determined by the Len field. The Len field is bits [13:12]
for the DCache and bits [1:0] for the ICache. Table 2-9 shows the line length encoding.
Table 2-9 Line length encoding
Len field     Cache line length
b10           8 words (32 bytes)
Other values  Reserved
The cache type register values for an ARM926EJ-S processor with the following
configuration are shown in Table 2-10:
•separate instruction and data caches
•DCache size = 8KB, ICache size = 16KB
•associativity = 4-way
•line length = eight words
•caches use write-back, register 7 for cache cleaning, and Format C for cache
lockdown.
See Cache Lockdown Register c9 on page 2-26 for more details on Format C for cache
lockdown.
Table 2-10 Example Cache Type Register format (continued)
Function         Register bits  Value
Isize  Reserved  [11:10]        b00
       Size      [9:6]          b0101 = 16KB
       Assoc     [5:3]          b010 = 4-way
       M         [2]            b0
       Len       [1:0]          b10 = 8 words per line (32 bytes)
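The Dsize and Isize decoding described above can be sketched in C as follows. The names are hypothetical. The size lookup follows Table 2-7 and the line length follows Table 2-9. The associativity rule (ways = 2 to the power of Assoc when M = 0) is an assumption taken from the ARM architecture definition of this register, consistent with the b010 = 4-way example, because Table 2-8 is not reproduced here.

```c
#include <stdint.h>

struct cache_geometry {
    uint32_t size_bytes;  /* 0 if the Size encoding is not in Table 2-7 */
    uint32_t ways;        /* 0 if M != 0 */
    uint32_t line_bytes;  /* 0 if the Len encoding is reserved */
};

/* shift selects the field: 12 for Dsize (bits [23:12]), 0 for Isize
 * (bits [11:0]) of the Cache Type Register value. */
static struct cache_geometry decode_cache_field(uint32_t cache_type,
                                                unsigned shift)
{
    uint32_t field = (cache_type >> shift) & 0xFFFu;
    uint32_t size  = (field >> 6) & 0xFu;  /* bits [9:6] of the field */
    uint32_t assoc = (field >> 3) & 0x7u;  /* bits [5:3] */
    uint32_t m     = (field >> 2) & 0x1u;  /* bit  [2]   */
    uint32_t len   = field & 0x3u;         /* bits [1:0] */
    struct cache_geometry g = { 0u, 0u, 0u };

    if (m == 0u && size >= 3u && size <= 8u) {
        g.size_bytes = 512u << size;       /* b0011 = 4KB ... b1000 = 128KB */
    }
    if (m == 0u) {
        g.ways = 1u << assoc;              /* assumed rule: b010 = 4-way */
    }
    if (len == 2u) {
        g.line_bytes = 32u;                /* b10 = 8 words (32 bytes) */
    }
    return g;
}
```

For the example configuration above, a register value whose low 24 bits are 0x112152 decodes to an 8KB, 4-way DCache and a 16KB, 4-way ICache, both with 32-byte lines.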
TCM Status Register c0
This is a read-only register that enables operating systems to establish if TCM memories
are present. See also TCM Region Register c9 on page 2-29.
You can access the TCM Status Register by reading CP15 register c0 with the Opcode_2
field set to 2. For example:
MRC p15,0,<Rd>,c0,c0,2 ;returns TCM details
The format of the TCM Status Register is shown in Figure 2-4.
Bits     Field
[31:17]  SBZ/UNP
[16]     DTCM present
[15:1]   SBZ/UNP
[0]      ITCM present
Figure 2-4 TCM Status Register format
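Because all bits other than 16 and 0 are SBZ/UNP, software should mask them off before testing for TCM presence. A minimal C sketch, with hypothetical function names, operating on a value already read from the TCM Status Register:

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 16 of the TCM Status Register: data TCM present. */
static bool dtcm_present(uint32_t tcm_status)
{
    return ((tcm_status >> 16) & 1u) != 0u;
}

/* Bit 0 of the TCM Status Register: instruction TCM present. */
static bool itcm_present(uint32_t tcm_status)
{
    return (tcm_status & 1u) != 0u;
}
```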
2.3.2 Control Register c1
Register c1 is the Control Register for the ARM926EJ-S processor. This register
specifies the configuration used to enable and disable the caches and MMU. It is
recommended that you access this register using a read-modify-write sequence.
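For both reading and writing, the CRm and Opcode_2 fields Should Be Zero. To read
and write this register, use the instructions:
MRC p15, 0, <Rd>, c1, c0, 0 ; read control register
MCR p15, 0, <Rd>, c1, c0, 0 ; write control register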
All defined control bits are set to zero on reset except the V bit and the B bit. The V bit
is set to zero at reset if the VINITHI signal is LOW, or one if the VINITHI signal is
HIGH. The B bit is set to zero at reset if the BIGENDINIT signal is LOW, or one if the
BIGENDINIT signal is HIGH.
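The recommended read-modify-write sequence amounts to changing only the bits of interest and writing every other bit back as read. The C sketch below shows the bit manipulation only. The macro and function names are hypothetical, and the meanings given for the M, C, and I bits (MMU, DCache, and ICache enable) are assumptions from the ARM architecture, because the corresponding rows of Table 2-11 are not reproduced in this section.

```c
#include <stdint.h>

/* Bit positions follow Figure 2-5. The enable meanings are assumed. */
#define CP15_C1_M (1u << 0)   /* assumed: MMU enable    */
#define CP15_C1_C (1u << 2)   /* assumed: DCache enable */
#define CP15_C1_I (1u << 12)  /* assumed: ICache enable */

/* Compute the value to write back: clear the requested bits, set the
 * requested bits, and preserve everything else exactly as read. */
static uint32_t control_modify(uint32_t current, uint32_t set, uint32_t clear)
{
    return (current & ~clear) | set;
}
```

As an illustration, starting from a value with only the SBO bits set (0x00050078), setting M, C, and I gives 0x0005107D, and the reserved and SBO bits are written back unchanged.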
Figure 2-5 shows the format of the Control Register.
Bits     Field
[31:19]  SBZ
[18]     SBO
[17]     SBZ
[16]     SBO
[15]     L4
[14]     RR
[13]     V
[12]     I
[11:10]  SBZ
[9]      R
[8]      S
[7]      B
[6:3]    SBO
[2]      C
[1]      A
[0]      M
Figure 2-5 Control Register format
Table 2-11 describes the functions of the Control Register bits.
Table 2-11 Control bit functions register c1
Bit      Name    Function
[31:19]  -       Reserved.
                 When read returns an Unpredictable value.
                 When written Should Be Zero, or a value read from bits [31:19] on the
                 same processor.
                 Using a read-modify-write sequence when modifying this register
                 provides the greatest future compatibility.
[18]     -       Reserved, SBO. Read = 1, write = 1.
[17]     -       Reserved, SBZ. Read = 0, write = 0.
[16]     -       Reserved, SBO. Read = 1, write = 1.
[15]     L4 bit  Determines if the T bit is set when load instructions change the PC:
                 0 = loads to PC set the T bit
                 1 = loads to PC do not set T bit (ARMv4 behavior).
                 For more details see the ARM Architecture Reference Manual.
[14]     RR bit  Replacement strategy for ICache and DCache:
                 0 = Random replacement
                 1 = Round-robin replacement.
Assuming that TCM regions are disabled, the caches behave as shown in Table 2-12.
Table 2-12 Effects of Control Register on caches
Cache            MMU                  Behavior
ICache disabled  Enabled or disabled  All instruction fetches are from external memory (AHB).
ICache enabled   Disabled             All instruction fetches are cachable, with no protection
                                      checks. All addresses are flat mapped. That is,
                                      VA = MVA = PA.
ICache enabled   Enabled              Instruction fetches are cachable or noncachable, and
                                      protection checks are performed. All addresses are
                                      remapped from VA to PA, depending on the MMU page
                                      table entry. That is, VA translated to MVA, MVA
                                      remapped to PA.
DCache disabled  Enabled or disabled  All data accesses are to external memory (AHB).
DCache enabled   Disabled             All data accesses are noncachable nonbufferable. All
                                      addresses are flat mapped. That is, VA = MVA = PA.
DCache enabled   Enabled              All data accesses are cachable or noncachable, and
                                      protection checks are performed. All addresses are
                                      remapped from VA to PA, depending on the MMU page
                                      table entry. That is, VA translated to MVA, MVA
                                      remapped to PA.
If either the DCache or the ICache is disabled, then the contents of that cache are not
accessed. If the cache is subsequently re-enabled, the contents will not have changed.
To guarantee that memory coherency is maintained, the DCache must be cleaned of
dirty data before it is disabled.
The M bit of the Control Register, when combined with the En bit in the respective TCM
region register c9, directly affects the TCM interface behavior, as shown in Table 2-13.
Table 2-13 Effects of Control Register on TCM interface
TCM                       MMU       Cache            Behavior
Instruction TCM disabled  Disabled  ICache disabled  All instruction fetches are from the external memory (AHB).
Instruction TCM enabled   Disabled  ICache disabled  All instruction fetches are from the TCM interface, or from
                                                     external memory (AHB), depending on the setting of the base
                                                     address in the instruction TCM region register. No protection
                                                     checks are made. All addresses are flat mapped. That is,
                                                     VA = MVA = PA.
Instruction TCM enabled   Disabled  ICache enabled   All instruction fetches are from the TCM interface, or from
                                                     the ICache, depending on the setting of the base address in
                                                     the instruction TCM region register. No protection checks
                                                     are made. All addresses are flat mapped. That is,
                                                     VA = MVA = PA.
Instruction TCM enabled   Enabled   ICache enabled   All instruction fetches are from the TCM interface, or from
                                                     the ICache/AHB interface, depending on the setting of the
                                                     base address in the instruction TCM region register.
                                                     Protection checks are made. All addresses are remapped from
                                                     VA to PA, depending on the page entry. That is, the VA is
                                                     translated to an MVA, and the MVA is remapped to a PA.
Data TCM disabled         Disabled  DCache disabled  All data accesses are to external memory (AHB).
Data TCM enabled          Disabled  DCache disabled  All data accesses are to the TCM interface, or to the
                                                     external memory, depending on the setting of the base
                                                     address in the data TCM region register. No protection
                                                     checks are made. All addresses are flat mapped. That is,
                                                     VA = MVA = PA.
Data TCM enabled          Disabled  DCache enabled   All data accesses are to the TCM interface or to external
                                                     memory, depending on the setting of the base address in the
                                                     data TCM region register. All addresses are flat mapped.
                                                     That is, VA = MVA = PA.
Data TCM enabled          Enabled   DCache enabled   All data accesses are either from the TCM interface, or from
                                                     the DCache/AHB interface, depending on the setting of the
                                                     base address in the data TCM region register. Protection
                                                     checks are made. All addresses are remapped from VA to PA,
                                                     depending on the page entry. That is, the VA is translated
                                                     to an MVA, and the MVA is remapped to a PA.
Note
Read accesses on the TCM interface are not prevented when an ARM9EJ-S core
memory access is aborted. All reads on the TCM interface must be treated as
speculative. ARM926EJ-S processor write accesses that are aborted do not take place on
the TCM interface.
2.3.3 Translation Table Base Register c2
Register c2 is the Translation Table Base Register (TTBR), for the base address of the
first-level translation table.
Reading from c2 returns the pointer to the currently active first-level translation table in
bits [31:14] and an Unpredictable value in bits [13:0].
Writing to register c2 updates the pointer to the first-level translation table from the
value in bits [31:14] of the written value. Bits [13:0] Should Be Zero.
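Because only bits [31:14] hold the pointer, the first-level translation table must start on a 16KB boundary, and the low 14 bits of a value read back must be masked before use. The following C sketch illustrates both points. The macro and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define TTBR_BASE_MASK 0xFFFFC000u  /* bits [31:14] */

/* A table base can be written to c2 only if bits [13:0] are zero,
 * that is, if the table is aligned to 16KB. */
static bool ttbr_base_is_valid(uint32_t table_base)
{
    return (table_base & ~TTBR_BASE_MASK) == 0u;
}

/* Bits [13:0] of a value read from c2 are Unpredictable, so discard them. */
static uint32_t ttbr_base_from_read(uint32_t ttbr)
{
    return ttbr & TTBR_BASE_MASK;
}
```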
You can use the following instructions to access the TTBR:
MRC p15, 0, <Rd>, c2, c0, 0 ; read TTBR
MCR p15, 0, <Rd>, c2, c0, 0 ; write TTBR
2.3.5 Register c4
Accessing (reading or writing) this register causes Unpredictable behavior.
2.3.6 Fault Status Registers c5
Register c5 accesses the Fault Status Registers (FSRs). The FSRs contain the source of
the last instruction or data fault. The instruction-side FSR is intended for debug
purposes only. The FSR is updated for alignment faults, and external aborts that occur
while the MMU is disabled.
Table 2-16 shows the encodings used for the status field in the FSR, and if the Domain
field contains valid information. See Fault address and fault status registers on
page 3-21 for details of MMU aborts.
Table 2-16 FSR status field encoding
Priority  Source                         Size             Status  Domain
Highest   Alignment                      -                b00x1   Invalid
          External abort on translation  First level      b1100   Invalid
                                         Second level     b1110   Valid
          Translation                    Section          b0101   Invalid
                                         Page             b0111   Valid
          Domain                         Section          b1001   Valid
                                         Page             b1011   Valid
          Permission                     Section          b1101   Valid
                                         Page             b1111   Valid
Lowest    External abort                 Section or page  b10x0   Invalid
2.3.7 Fault Address Register c6
Register c6 accesses the Fault Address Register (FAR). The FAR contains the Modified
Virtual Address of the access being attempted when a Data Abort occurred. The FAR is
only updated for Data Aborts, not for Prefetch Aborts. The FAR is updated for
alignment faults, and external aborts that occur while the MMU is disabled.
You can use the following instructions to access the FAR:
MRC p15, 0, <Rd>, c6, c0, 0 ; read FAR
MCR p15, 0, <Rd>, c6, c0, 0 ; write FAR
Writing c6 sets the FAR to the value of the data written. This is useful for a debugger to
restore the value of the FAR to a previous state.
The CRm and Opcode_2 fields Should Be Zero when reading or writing CP15 c6.
2.3.8 Cache Operations Register c7
Register c7 controls the caches and the write buffer. The function of each cache
operation is selected by the Opcode_2 and CRm fields in the MCR instruction used to
write to CP15 c7. Writing other Opcode_2 or CRm values is Unpredictable.
Reading from CP15 c7 is Unpredictable, with the exception of the two test and clean
operations (see Table 2-18 on page 2-22 and Test and clean operations on page 2-24).
You can use the following instruction to write to c7:
Table 2-17 Function descriptions register c7 (continued)
Function              Description
Prefetch ICache line  Performs an ICache lookup of the specified modified
                      virtual address. If the cache misses, and the region is
                      cachable, a linefill is performed.
Drain write buffer    This instruction acts as an explicit memory barrier. It drains
                      the contents of the write buffers of all memory stores
                      occurring in program order before this instruction is
                      completed. No instructions occurring in program order
                      after this instruction are executed until it completes. This
                      can be used when timing of specific stores to the level two
                      memory system has to be controlled (for example, when a
                      store to an interrupt acknowledge location has to complete
                      before interrupts are enabled).
Wait for interrupt    This instruction drains the contents of the write buffers,
                      puts the processor into a low-power state, and stops it from
                      executing further instructions until an interrupt (or debug
                      request) occurs. When an interrupt does occur, the MCR
                      instruction completes and the IRQ or FIQ handler is entered
                      as normal. The return link in R14_irq or R14_fiq contains
                      the address of the MCR instruction plus eight, so that the
                      typical instruction used for interrupt return
                      (SUBS PC,R14,#4) returns to the instruction following the MCR.
Table 2-18 lists the cache operation functions and the associated data and instruction
formats for c7.
Function                                     Data     Instruction
Invalidate DCache single entry (Set/Way)     Set/Way  MCR p15, 0, <Rd>, c7, c6, 2
Clean DCache single entry (MVA)              MVA      MCR p15, 0, <Rd>, c7, c10, 1
Clean DCache single entry (Set/Way)          Set/Way  MCR p15, 0, <Rd>, c7, c10, 2
Test and clean DCache                        -        MRC p15, 0, <Rd>, c7, c10, 3
Clean and invalidate DCache entry (MVA)      MVA      MCR p15, 0, <Rd>, c7, c14, 1
Clean and invalidate DCache entry (Set/Way)  Set/Way  MCR p15, 0, <Rd>, c7, c14, 2
Test, clean, and invalidate DCache           -        MRC p15, 0, <Rd>, c7, c14, 3
Drain write buffer                           SBZ      MCR p15, 0, <Rd>, c7, c10, 4
Wait for interrupt                           SBZ      MCR p15, 0, <Rd>, c7, c0, 4
The MVA format for Rd for the CP15 c7 MCR operations is shown in Figure 2-9. The
Tag, Set, and Word fields define the MVA. For all of the cache operations, Word Should
Be Zero.
Bits      Field
[31:S+5]  Tag
[S+4:5]   Set (= index)
[4:2]     Word
[1:0]     SBZ
Figure 2-9 Register c7 MVA format
The Set/Way format for Rd for the CP15 c7 MCR operations is shown in Figure 2-10
on page 2-24, where A and S are the base-two logarithms of the associativity and the
number of sets. The Set, Way, and Word fields define the format. For all of the cache
operations, Word Should Be Zero.
For a 16KB cache, 4-way set associative, 8-word line, then:
• A = log2(associativity) = log2(4) = 2
• S = log2(NSETS), where NSETS = cache size in bytes / associativity / line length in bytes.
  Here NSETS = 16384 / 4 / 32 = 128, so S = log2(128) = 7.
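The calculation of A and S, and the construction of a Set/Way operand for Rd, can be sketched in C. The function names are hypothetical. The field placement used by set_way_operand (way in the top A bits, set starting at bit 5, Word and the low bits zero) is an assumption based on the usual ARMv5 Set/Way layout, because Figure 2-10 is not reproduced in this section.

```c
#include <stdint.h>

/* Base-two logarithm of a power of two. */
static unsigned log2_u32(uint32_t x)
{
    unsigned n = 0u;
    while (x > 1u) {
        x >>= 1;
        n++;
    }
    return n;
}

/* S = log2(cache size in bytes / associativity / line length in bytes). */
static unsigned cache_set_bits(uint32_t cache_bytes, uint32_t ways,
                               uint32_t line_bytes)
{
    return log2_u32(cache_bytes / ways / line_bytes);
}

/* Assumed layout: way in bits [31:32-A], set in bits [S+4:5], Word = 0. */
static uint32_t set_way_operand(uint32_t set, uint32_t way, unsigned a_bits)
{
    uint32_t way_field = (a_bits == 0u) ? 0u : (way << (32u - a_bits));
    return way_field | (set << 5);
}
```

For the 16KB, 4-way, 8-word-line cache above, log2_u32(4) gives A = 2 and cache_set_bits(16384, 4, 32) gives S = 7, so sets range from 0 to 127 and ways from 0 to 3.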
The test and clean DCache instruction provides an efficient way to clean the entire
DCache using a simple loop. The test and clean DCache instruction tests a number of
lines in the DCache to determine if any of them are dirty. If any dirty lines are found,
then one of those lines is cleaned. The test and clean DCache instruction also returns the
status of the entire DCache in bit 30.
Note
The test and clean DCache instruction, MRC p15, 0, r15, c7, c10, 3, is a special
encoding that uses r15 as a destination operand. However, the PC is not changed by
using this instruction. This MRC instruction also sets the condition code flags.
If the cache contains any dirty lines, bit 30 is set to 0. If the cache contains no dirty lines,
bit 30 is set to 1. This means that you can use the following loop to clean the entire
DCache:
tc_loop:  MRC p15, 0, r15, c7, c10, 3 ; test and clean
          BNE tc_loop
The test, clean, and invalidate DCache instruction is the same as test and clean DCache,
except that when the entire cache has been cleaned, it is invalidated. This means that
you can use the following loop to clean and invalidate the entire DCache:
tci_loop: MRC p15, 0, r15, c7, c14, 3 ; test, clean, and invalidate
          BNE tci_loop
2.3.9 TLB Operations Register c8
This is a write-only register used to control the Translation Lookaside Buffer (TLB).
There is a single TLB used to hold entries for both data and instructions. The TLB is
divided into two parts:
The fully-associative part (also referred to as the lockdown part of the TLB) is used to
store entries to be locked down. Entries held in the lockdown part of the TLB are
preserved during an invalidate TLB operation. Entries can be removed from the
lockdown TLB using an invalidate TLB single entry operation.
Six TLB operations are defined, and the function to be performed is selected by the
Opcode_2 and CRm fields in the MCR instruction used to write CP15 c8. Writing other
Opcode_2 or CRm values is Unpredictable. Reading from this register is Unpredictable.
You can use the instructions shown in Table 2-19 to perform TLB operations.
Invalidate instruction TLB single entry (MVA)  Invalidate single entry         MVA
Invalidate data TLB                            Invalidate set-associative TLB  SBZ
Invalidate data TLB single entry (MVA)         Invalidate single entry         MVA
Those instructions that are intended to be used with dual TLB implementations (such as
the ARM920T core or the ARM1020T core) apply to any entry, regardless of the type
of access that caused the entry to be loaded into the TLB (see the ARM Architecture Reference Manual).
The invalidate TLB operations invalidate all the unpreserved entries in the TLB. The
invalidate TLB single entry operations invalidate any TLB entry corresponding to the
Modified Virtual Address given in Rd, regardless of its preserved state. See TLB Lockdown Register c10 on page 2-32 for a description of how to preserve entries in the
TLB.
Figure 2-11 on page 2-26 shows the Modified Virtual Address format used for
invalidate TLB single entry operations.
Figure 2-11 Register c8 MVA format
If either small or large pages are used, and these pages contain subpage access
permissions that are different, then you must use four invalidate TLB single entry
operations, with the MVA set to each subpage, to invalidate all information related to
that page held in a TLB.
2.3.10 Cache Lockdown and TCM Region Registers c9
Register c9 accesses the Cache Lockdown and TCM Region Registers. The register
accessed is determined by the value of the CRm field:
CRm = c0 selects the Cache Lockdown Register
CRm = c1 selects the TCM Region Register.
Other values of CRm are reserved.
Cache Lockdown Register c9
The Cache Lockdown Register uses a cache-way-based locking scheme (Format C) that
enables you to control each cache way independently.
These registers enable you to control which cache ways of the four-way cache are used
for the allocation on a linefill. When the registers are defined, subsequent linefills are
only placed in the specified target cache way. This gives you some control over the
cache pollution caused by particular applications, and provides a traditional lockdown
operation for locking critical code into the cache.
A locking bit for each cache way determines if the normal cache allocation is allowed
to access that cache way. See Table 2-21 on page 2-28.
A maximum of three cache ways of the four-way associative cache can be locked,
ensuring that normal cache line replacement is performed.
Note
If no cache ways have L bits set to 0, then cache way 3 is used for all linefills.
The first four bits of this register determine the L bit for the associated cache way. The
Opcode_2 field of the MRC or MCR instruction determines whether the instruction or
data lockdown register is accessed:
Opcode_2 = 0 Selects the DCache lockdown register.
Opcode_2 = 1 Selects the ICache lockdown register.
You can use the instructions shown in Table 2-20 to access the Cache Lockdown
Register.
Table 2-20 Cache Lockdown Register instructions
Function                        Data    Instruction
Read DCache Lockdown Register   L bits  MRC p15,0,<Rd>,c9,c0,0
Write DCache Lockdown Register  L bits  MCR p15,0,<Rd>,c9,c0,0
Read ICache Lockdown Register   L bits  MRC p15,0,<Rd>,c9,c0,1
Write ICache Lockdown Register  L bits  MCR p15,0,<Rd>,c9,c0,1
You must only modify the Cache Lockdown Register using a read-modify-write
sequence. For example:
The format of the Cache Lockdown Register L bits is shown in Table 2-21. All cache
ways are available for allocation from reset.
Table 2-21 Cache Lockdown Register L bits
Bits     4-way associative  Notes
[31:16]  UNP/SBZP           Reserved
[15:4]   0xFFF              SBO
[3]      L bit for Way 3    Bits [3:0] are the L bits for each cache way:
[2]      L bit for Way 2    0 = Allocation to the cache way is determined by the
[1]      L bit for Way 1        standard replacement algorithm (reset state)
[0]      L bit for Way 0    1 = No allocation is performed to this cache way.
You can use the cache lockdown and cache unlock procedures described in:
•Specific loading of addresses into a cache way
•Cache unlock procedure on page 2-29.
Specific loading of addresses into a cache way
The procedure to lock down code and data into way i of a cache with N ways using
Format C involves making it impossible to allocate to any cache way other than the
target cache way:
1.Ensure that no processor exceptions can occur during the execution of this
procedure, for example by disabling interrupts. If this is not possible, all code and
data used by any exception handlers must be treated as code and data as in steps
2 and 3.
2.If an ICache way is being locked down, ensure that all the code executed by the
lockdown procedure is in an uncachable area of memory (including TCM) or in
an already locked cache way.
3.If a DCache way is being locked down, ensure that all data used by the lockdown
procedure is in an uncachable area of memory (including TCM) or is in an already
locked cache way.
4.Ensure that the data/instructions that are to be locked down are in a cachable area
of memory.
5.Ensure that the data/instructions that are to be locked down are not already in the
cache. Use the register c7 clean and/or invalidate operations to ensure this.
6.Write to register c9, CRm == 0, setting L==0 for bit i and L==1 for all other ways.
7.For each of the cache lines to be locked down in cache way i:
•If a DCache is being locked down, use an LDR instruction to load a word
from the memory cache line to ensure that the memory cache line is loaded
into the cache.
•If an ICache is being locked down, use the register c7 MCR prefetch ICache
line (CRm == c13, Opcode_2 == 1) to fetch the memory cache line into the
cache.
8.Write to register c9, CRm == 0 setting L == 1 for bit i and restoring all the other
bits to the values they had before the lockdown routine was started.
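The register values written in steps 6 and 8 can be sketched in C as follows. This is an illustration under stated assumptions: the cache has four ways, saved is the register value read before the routine started, and the function names are hypothetical. The cache line loads of step 7 and the CP15 accesses are not modeled.

```c
#include <assert.h>
#include <stdint.h>

#define L_BITS_MASK 0xFu   /* bits [3:0]: one L bit per cache way */

/* Step 6: L == 0 for the target way and L == 1 for all other ways.
 * Bits outside [3:0] are carried over from the saved value. */
uint32_t lockdown_step6_value(uint32_t saved, unsigned way)
{
    assert(way < 4u);
    uint32_t l_bits = L_BITS_MASK & ~(UINT32_C(1) << way);
    return (saved & ~(uint32_t)L_BITS_MASK) | l_bits;
}

/* Step 8: L == 1 for the target way, with every other bit restored
 * to the value it had before the routine started. */
uint32_t lockdown_step8_value(uint32_t saved, unsigned way)
{
    assert(way < 4u);
    return saved | (UINT32_C(1) << way);
}
```

For example, locking way 2 from the reset state 0x0000FFF0 writes 0x0000FFFB in step 6 and 0x0000FFF4 in step 8.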
Cache unlock procedure
To unlock the locked down portion of the cache, write to register c9 setting L == 0 for
the appropriate bit. For example, the following sequence sets the L bit to 0 for way 0 of
the ICache, unlocking way 0:
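A minimal C sketch of the value computed by such a sequence is given here. It assumes the ICache Lockdown Register has been read with the Opcode_2 = 1 MRC instruction from Table 2-20 and that the result is written back with the matching MCR instruction. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: clear the L bit for `way`, leaving all other
 * bits of the lockdown register value unchanged. */
uint32_t lockdown_clear_l_bit(uint32_t current, unsigned way)
{
    assert(way < 4u);                       /* ways 0 to 3 */
    return current & ~(UINT32_C(1) << way);
}
```

Calling lockdown_clear_l_bit(value, 0) corresponds to unlocking way 0.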
The ARM926EJ-S processor supports physically-indexed, physically-tagged TCM.
The TCM Region Register supports one region of instruction TCM and one region of
data TCM. The minimum size of TCM region that can be supported is 4KB. The TCM
Status Register indicates if TCM memories are attached (see TCM Status Register c0 on
page 2-12). The size of each TCM region is defined by the DRSIZE and IRSIZE input
pins.
The data TCM is always disabled at reset. The instruction TCM is enabled at reset if the
INITRAM pin is HIGH. This enables booting from the instruction TCM and sets the
ITCM enable bit in the ITCM region register. You can use the TCM Region Register
instructions listed in Table 2-22.
If either the data or instruction TCM is disabled, then the contents of the respective
TCM are not accessed. If the TCM is subsequently re-enabled, the contents will not
have been changed by the ARM926EJ-S processor.
For a Harvard arrangement, the instruction-side TCM must be accessible for both reads
and writes during normal operation, and for loading code, or for debug activity. This
enables accesses to literal pools, undefined instruction emulation, and parameter
passing for SWI operations. You must insert an Instruction Memory Barrier (IMB)
between a write to the instruction TCM and the instructions being read from the
instruction TCM. See Chapter 9 Instruction Memory Barrier for more details.
Note
Instruction fetches from the data TCM are not possible. An attempt to fetch an
instruction from an address in the data TCM space does not result in an access to the
data TCM, and the instruction is fetched from main memory. These accesses can result
in external aborts, because the address range might not be supported in main memory.
The instruction TCM must not be programmed to the same base address as the data
TCM. If the two TCMs are of different sizes, the regions in physical memory must not
overlap. If they do overlap, it is Unpredictable which memory is accessed.
Note
The base address value setting must be aligned to the TCM size.
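The alignment and overlap rules can be checked in software before the TCM Region Registers are programmed. The following C sketch is illustrative only. It assumes that TCM sizes are powers of two, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if `base` is aligned to `size`. Assumes size is a power of two. */
bool tcm_base_is_aligned(uint32_t base, uint32_t size)
{
    assert(size != 0u && (size & (size - 1u)) == 0u);
    return (base & (size - 1u)) == 0u;
}

/* True if the regions [a_base, a_base + a_size) and
 * [b_base, b_base + b_size) share any address. 64-bit arithmetic
 * avoids wrap-around for regions that end at the top of memory. */
bool tcm_regions_overlap(uint32_t a_base, uint32_t a_size,
                         uint32_t b_base, uint32_t b_size)
{
    uint64_t a_end = (uint64_t)a_base + a_size;
    uint64_t b_end = (uint64_t)b_base + b_size;
    return a_base < b_end && b_base < a_end;
}
```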
The TLB Lockdown Register controls where hardware page table walks place the TLB
entry, in the set associative region or the lockdown region of the TLB, and if in the
lockdown region, which entry is written. The lockdown region of the TLB contains
eight entries. See TLB structure on page 3-31 for a description of the structure of the
TLB.
Writing the TLB Lockdown Register with the preserve bit (P bit) set to:
1 Means subsequent hardware page table walks place the TLB entry in the
lockdown region at the entry specified by the victim, in the range 0 to 7.
0 Means subsequent hardware page table walks place the TLB entry in the
set associative region of the TLB.
TLB entries in the lockdown region are preserved so that invalidate TLB operations
only invalidate the unpreserved entries in the TLB. That is, those in the set-associative
region. Invalidate TLB single entry operations invalidate any TLB entry corresponding
to the Modified Virtual Address given in Rd, regardless of their preserved state. That is,
if they are in the lockdown or set-associative regions of the TLB. See TLB Operations Register c8 on page 2-24 for a description of the TLB invalidate operations.
The instructions you can use to program the TLB Lockdown Register are shown in
Table 2-25.
Table 2-25 Programming the TLB Lockdown Register
Function                        Instruction
Read data TLB lockdown victim   MRC p15,0,<Rd>,c10,c0,0
Write data TLB lockdown victim  MCR p15,0,<Rd>,c10,c0,0
Figure 2-14 shows the TLB Lockdown Register format.
Bits [31:29]: SBZ; [28:26]: Victim; [25:1]: SBZ/UNP; [0]: P
Figure 2-14 TLB Lockdown Register format
The victim automatically increments after any table walk that results in an entry being
written into the lockdown part of the TLB.
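The victim and P fields can be packed and unpacked as in the following C sketch. It assumes the field positions given for the TLB Lockdown Register (victim in bits [28:26], P in bit [0]). The macro and function names are hypothetical, and the automatic increment of the victim is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TLB_LOCK_P_BIT        UINT32_C(1)   /* bit [0] */
#define TLB_LOCK_VICTIM_SHIFT 26u           /* bits [28:26] */
#define TLB_LOCK_VICTIM_MASK  (UINT32_C(7) << TLB_LOCK_VICTIM_SHIFT)

/* Extract the victim entry number (0 to 7). */
unsigned tlb_lock_victim(uint32_t reg)
{
    return (unsigned)((reg & TLB_LOCK_VICTIM_MASK) >> TLB_LOCK_VICTIM_SHIFT);
}

/* True if the preserve (P) bit is set. */
bool tlb_lock_preserve(uint32_t reg)
{
    return (reg & TLB_LOCK_P_BIT) != 0u;
}

/* Build a register value; all other bits are written as zero. */
uint32_t tlb_lock_encode(unsigned victim, bool preserve)
{
    assert(victim < 8u);                    /* eight lockdown entries */
    return ((uint32_t)victim << TLB_LOCK_VICTIM_SHIFT)
         | (preserve ? TLB_LOCK_P_BIT : 0u);
}
```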
It is not possible for a lockdown entry to entirely map either small or large pages, unless
all the subpage access permissions are identical. Entries can still be written into the
lockdown region, but the address range that is mapped only covers the subpage
corresponding to the address that was used to perform the page table walk.
Example 2-1 is a code sequence that locks down an entry to the current victim.
Example 2-1 Lock down an entry to the current victim
ADR r1,LockAddr ; set r1 to the value of the address to be locked down
MCR p15,0,r1,c8,c7,1 ; invalidate TLB single entry to ensure that
; LockAddr is not already in the TLB
MRC p15,0,r0,c10,c0,0 ; read the lockdown register
ORR r0,r0,#1 ; set the preserve bit
MCR p15,0,r0,c10,c0,0 ; write to the lockdown register
LDR r1,[r1] ; TLB will miss, and entry will be loaded
MRC p15,0,r0,c10,c0,0 ; read the lockdown register (victim will have
; incremented)
BIC r0,r0,#1 ; clear preserve bit
MCR p15,0,r0,c10,c0,0 ; write to the lockdown register
2.3.12 Register c11 and c12
Accessing (reading or writing) these registers causes Unpredictable behavior.
2.3.13 Process ID Register c13
Register c13 accesses the process identifier registers. The register accessed depends on
the value of the Opcode_2 field:
Opcode_2 = 0 Selects the Fast Context Switch Extension (FCSE) Process
Identifier (PID) Register.
Opcode_2 = 1 Selects the Context ID Register.
You can use the process ID register to determine the process that is currently running.
The process identifier is set to 0 at reset.
Addresses issued by the ARM9EJ-S core in the range 0 to 32MB are translated in
accordance with the value contained in this register. Address A becomes A + (FCSE
PID x 32MB). It is this modified address that is seen by the caches, MMU, and TCM
interface. Addresses above 32MB are not modified. The FCSE PID is a seven-bit field,
enabling 128 x 32MB processes to be mapped.
If the FCSE PID is 0, there is a flat mapping between the virtual addresses output by the
ARM9EJ-S core and the modified virtual addresses used by the caches, MMU, and
TCM interface. The FCSE PID is set to 0 at system reset.
If the MMU is disabled, then no FCSE address translation occurs.
FCSE translation is not applied for addresses used for entry based cache or TLB
maintenance operations. For these operations VA = MVA.
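The FCSE relocation rule can be expressed as a small C function. This is a sketch, not a description of the hardware implementation. It assumes the rules stated above: addresses below 32MB are relocated by FCSE PID x 32MB, all other addresses are unchanged, and no translation occurs when the MMU is disabled. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FCSE_REGION_SIZE UINT32_C(0x02000000)   /* 32MB */

/* Convert a virtual address to a modified virtual address. */
uint32_t fcse_mva(uint32_t va, uint32_t fcse_pid, bool mmu_enabled)
{
    assert(fcse_pid < 128u);                    /* seven-bit field */
    if (!mmu_enabled || va >= FCSE_REGION_SIZE) {
        return va;                              /* not modified */
    }
    return va + fcse_pid * FCSE_REGION_SIZE;
}

/* Value to write to the FCSE PID Register: the PID sits in bits [31:25]. */
uint32_t fcse_pid_register_value(uint32_t fcse_pid)
{
    assert(fcse_pid < 128u);
    return fcse_pid << 25;
}
```

For example, with an FCSE PID of 1 and the MMU enabled, virtual address 0x00001000 becomes MVA 0x02001000.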
Table 2-26 shows the ARM instructions that can be used to access the FCSE PID
Register.
Table 2-26 FCSE PID Register operations
Function        Data      ARM Instruction
Read FCSE PID   FCSE PID  MRC p15,0,<Rd>,c13,c0,0
Write FCSE PID  FCSE PID  MCR p15,0,<Rd>,c13,c0,0
The format of the FCSE PID Register is shown in Figure 2-15.
Bits [31:25]: FCSE PID; [24:0]: SBZ
Figure 2-15 Process ID Register format
Performing a fast context switch
You can perform a fast context switch by writing to CP15 register c13 with Opcode_2
= 0. The contents of the caches and the TLB do not have to be flushed after a fast context
switch because they still hold valid address tags. The two instructions after the FCSE
PID has been written have been fetched with the old FCSE PID, as the following code
example shows:
MOV r0, #1:SHL:25       ; Fetched with FCSE PID = 0
MCR p15,0,r0,c13,c0,0   ; Fetched with FCSE PID = 0
A1                      ; Fetched with FCSE PID = 0
A2                      ; Fetched with FCSE PID = 0
A3                      ; Fetched with FCSE PID = 1
Where A1, A2, and A3 are the three instructions following the fast context switch.
Context ID Register
The Context ID Register provides a mechanism to allow real-time trace tools to identify
the currently executing process in multi-tasking environments.
The contents of this register are replicated on the ETMPROCID pins of the
ARM926EJ-S processor. ETMPROCIDWR is pulsed when a write occurs to the
Context ID Register.
Table 2-27 shows the ARM instructions that you can use to access the Context ID
Register.
Table 2-27 Context ID register operations
Function          Data        ARM Instruction
Read context ID   Context ID  MRC p15,0,<Rd>,c13,c0,1
Write context ID  Context ID  MCR p15,0,<Rd>,c13,c0,1
The format of the Context ID Register, Rd, transferred during this operation is shown
in Figure 2-16.
Bits [31:0]: Context identifier
Figure 2-16 Context ID Register format
2.3.14 Register c14
Accessing (reading or writing) this register is reserved.
You can use register c15 to provide device-specific test and debug operations in
ARM926EJ-S processors. Appendix B CP15 Test and Debug Registers describes the
registers and functions available using CP15 c15. This register is defined to be reserved
for implementation-defined purposes in the ARM Architecture Reference Manual. If
you write software that uses the device-specific facilities provided by c15, then this
software is unlikely to be either backwards or forwards compatible.
The ARM926EJ-S MMU is an ARM architecture v5 MMU. It provides virtual memory
features required by systems operating on platforms such as Symbian OS, WindowsCE,
and Linux. A single set of two-level page tables stored in main memory is used to
control the address translation, permission checks, and memory region attributes for
both data and instruction accesses.
The MMU uses a single unified Translation Lookaside Buffer (TLB) to cache the
information held in the page tables.
To support both sections and pages, there are two levels of address translation. The
MMU puts the translated physical addresses into the TLB.
The MMU TLB has two parts:
•the main TLB
•the lockdown TLB.
The main TLB is a two-way, set-associative cache for page table information. It has 32
entries per way for a total of 64 entries. The lockdown TLB is an eight-entry
fully-associative cache that contains locked TLB entries. Locking TLB entries can
ensure that a memory access to a given region never incurs the penalty of a page table
walk. For more details of the TLBs see TLB structure on page 3-31.
The MMU features are:
•standard ARM architecture v4 and v5 MMU mapping sizes, domains, and access
protection scheme
•mapping sizes are 1MB (sections), 64KB (large pages), 4KB (small pages), and
1KB (tiny pages)
•access permissions for large pages and small pages can be specified separately for
each quarter of the page (subpage permissions)
•hardware page table walks
•invalidate entire TLB using CP15 c8
•invalidate TLB entry selected by MVA, using CP15 c8
For large and small pages, access permissions are defined for each subpage (1KB for
small pages, 16KB for large pages). Sections and tiny pages have a single set of access
permissions.
All regions of memory have an associated domain. A domain is the primary access
control mechanism for a region of memory. It defines the conditions necessary for an
access to proceed. The domain determines if:
•access permissions are used to qualify the access
•the access is unconditionally allowed to proceed
•the access is unconditionally aborted.
In the latter two cases, the access permission attributes are ignored.
There are 16 domains. These are configured using the domain access control register
(see Domain Access Control Register c3 on page 2-17).
3.1.2 Translated entries
The main TLB caches 64 translated entries. If, during a memory access, the main TLB
contains a translated entry for the MVA, the MMU reads the protection data to determine
if the access is permitted:
Memory Management Unit
•if access is permitted and an off-chip access is required, the MMU outputs the
appropriate physical address corresponding to the MVA
•if access is permitted and an off-chip access is not required, the cache or TCM
services the access
•if access is not permitted, the MMU signals the CPU core to abort.
If the TLB misses (it does not contain an entry for the MVA) the translation table walk
hardware is invoked to retrieve the translation information from a translation table in
physical memory. When retrieved, the translation information is written into the TLB,
possibly overwriting an existing value.
To enable use of TLB locking features, the location to be written can be specified using
CP15 c10 TLB Lockdown Register.
At reset the MMU is turned off, no address mapping occurs, and all regions are marked
as noncachable and nonbufferable.
Table 3-1 shows the CP15 registers that are used in conjunction with page table
descriptors stored in memory to determine the operation of the MMU.
Table 3-1 MMU program-accessible CP15 registers
Register                 Bits         Register description
Control register c1      M, A, S, R   Contains bits to enable the MMU (M bit), enable data address
                                      alignment checks (A bit), and to control the access protection
                                      scheme (S bit and R bit).
Translation table        [31:14]      Holds the physical address of the base of the translation table
base register c2                      maintained in main memory. This base address must be on a 16KB
                                      boundary.
Domain access            [31:0]       Comprises 16 two-bit fields. Each field defines the access control
control register c3                   attributes for one of 16 domains (D15 to D0).
Fault status             [7:0]        Indicates the cause of a Data or Prefetch Abort, and the domain
registers, IFSR                       number of the aborted access, when an abort occurs. Bits [7:4]
and DFSR, c5                          specify which of the 16 domains (D15 to D0) was being accessed
                                      when a fault occurred. Bits [3:0] indicate the type of access
                                      being attempted. The value of all other bits is Unpredictable.
                                      The encoding of these bits is shown in Table 3-9 on page 3-22.
Fault address            [31:0]       Holds the MVA associated with the access that caused the Data
register c6                           Abort. See Table 3-9 on page 3-22 for details of the address
                                      stored for each type of fault. The ARM9EJ-S register R14_abt
                                      holds the VA associated with a Prefetch Abort.
TLB operations           [31:0]       This register is used to perform TLB maintenance operations.
register c8                           These are either invalidating all the (unpreserved) entries in
                                      the TLB, or invalidating a specific entry.
TLB lockdown             [28:26] and  Enables specific page table entries to be locked into the TLB.
register c10             [0]          Locking entries in the TLB guarantees that accesses to the locked
                                      page or section can proceed without incurring the time penalty of
                                      a TLB miss. This enables the execution latency for time-critical
                                      pieces of code such as interrupt handlers to be minimized.
All the CP15 MMU registers, except c8, contain state that can be read using MRC
instructions, and written using MCR instructions. Registers c5 and c6 are also written
by the MMU during an abort. Writing to c8 causes the MMU to perform a TLB
operation, to manipulate TLB entries. This register is write-only.
The CP15 registers are described in Chapter 2 Programmer’s Model.
The VA generated by the CPU core is converted to a Modified Virtual Address (MVA)
by the FCSE using the value held in CP15 c13. The MMU translates MVAs into
physical addresses to access external memory, and also performs access permission
checking.
The MMU table-walking hardware is used to add entries to the TLB. The translation
information that comprises both the address translation data and the access permission
data resides in a translation table located in physical memory. The MMU provides the
logic for automatically traversing this translation table and loading entries into the TLB.
The number of stages in the hardware table walking and permission checking process
is one or two depending on whether the address is marked as a section-mapped access
or a page-mapped access.
There are three sizes of page-mapped accesses and one size of section-mapped access.
Page-mapped accesses are for:
•large pages
•small pages
•tiny pages.
The translation process always begins in the same way, with a level one fetch. A
section-mapped access requires only a level one fetch, but a page-mapped access
requires an additional level two fetch.
The hardware translation process is initiated when the TLB does not contain a
translation for the requested MVA. The Translation Table Base Register (TTBR), CP15
register c2, points to the base address of a table in physical memory that contains section
or page descriptors, or both. The 14 low-order bits [13:0] of the TTBR are
Unpredictable on a read, and the table must reside on a 16KB boundary. Figure 3-1
shows the format of the TTBR.
(Figure 3-1: bits [31:14] hold the translation base; bits [13:0] are Unpredictable on a read.)
The translation table has up to 4096 x 32-bit entries, each describing 1MB of virtual
memory. This enables up to 4GB of virtual memory to be addressed.
Figure 3-2 on page 3-7 shows the table walk process.
The address formed from the translation base in TTBR bits [31:14] and the table index
in MVA bits [31:20] selects a 4-byte translation table entry. This is a first-level
descriptor for either a section or a page table.
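The address of the first-level descriptor can be computed as in the following C sketch. It assumes the layout shown in the translation figures of this chapter: the translation base occupies bits [31:14], the table index taken from MVA bits [31:20] occupies bits [13:2], and bits [1:0] are zero. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Address of the first-level descriptor for a given MVA. */
uint32_t first_level_descriptor_address(uint32_t ttbr, uint32_t mva)
{
    uint32_t base  = ttbr & UINT32_C(0xFFFFC000);  /* TTBR[31:14] */
    uint32_t index = mva >> 20;                    /* MVA[31:20] */
    assert(index < 4096u);          /* the table has up to 4096 entries */
    return base | (index << 2);     /* 4 bytes per entry */
}
```

Because the low-order 14 bits of the TTBR are masked off, any Unpredictable value read from those bits does not affect the result.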
3.2.3 First-level descriptor
The first-level descriptor returned is a section descriptor, a coarse page table descriptor,
or a fine page table descriptor, or is invalid. Figure 3-4 on page 3-9 shows the format of
a first-level descriptor.
Section descriptor bit assignments are described in Table 3-4.
Table 3-4 Section descriptor bits
Bits     Description
[31:20]  Form the corresponding bits of the physical address for a section
[19:12]  Always written as 0
[11:10]  The AP bits specify the access permissions for this section
[9]      Always written as 0
[8:5]    Specify one of the 16 possible domains (held in the domain access control register)
         that contain the primary access controls
[4]      Should be written as 1, for backwards compatibility
[3:2]    These bits (C and B) indicate if the area of memory mapped by this section is
         treated as write-back cachable, write-through cachable, noncached buffered, or
         noncached nonbuffered
[1:0]    These bits must be 10 to indicate a section descriptor
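The section descriptor fields can be decoded as in the following C sketch. It assumes the bit assignments in Table 3-4 and the first-level type encodings used in this chapter (00 invalid, 01 coarse page table, 10 section, 11 fine page table). The low-order 20 bits of the physical address are taken from the MVA. All type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    L1_INVALID = 0,   /* bits [1:0] == 00 */
    L1_COARSE  = 1,   /* bits [1:0] == 01 */
    L1_SECTION = 2,   /* bits [1:0] == 10 */
    L1_FINE    = 3    /* bits [1:0] == 11 */
} l1_type_t;

l1_type_t l1_descriptor_type(uint32_t desc)
{
    return (l1_type_t)(desc & 3u);
}

/* Domain number, bits [8:5]. */
unsigned section_domain(uint32_t desc) { return (desc >> 5) & 0xFu; }

/* AP bits, bits [11:10]. */
unsigned section_ap(uint32_t desc) { return (desc >> 10) & 3u; }

/* Physical address: descriptor bits [31:20] and MVA bits [19:0]. */
uint32_t section_physical_address(uint32_t desc, uint32_t mva)
{
    assert(l1_descriptor_type(desc) == L1_SECTION);
    return (desc & UINT32_C(0xFFF00000)) | (mva & UINT32_C(0x000FFFFF));
}
```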
3.2.5 Coarse page table descriptor
A coarse page table descriptor provides the base address of a page table that contains
second-level descriptors for either large page or small page accesses. Coarse page tables
have 256 entries, splitting the 1MB that the table describes into 4KB blocks. Figure 3-6
shows the format of a coarse page table descriptor.
Bits [31:10]: Coarse page table base address; [9]: SBZ; [8:5]: Domain; [4]: 1; [3:2]: SBZ; [1:0]: 0 1
Figure 3-6 Coarse page table descriptor
Note
If a coarse page table descriptor is returned from the first-level fetch, a second-level
fetch is initiated.
Coarse page table descriptor bit assignments are described in Table 3-5.
Table 3-5 Coarse page table descriptor bits
Bits     Description
[31:10]  These bits form the base for referencing the second-level descriptor (the coarse
         page table index for the entry is derived from the MVA)
[9]      Always written as 0
[8:5]    These bits specify one of the 16 possible domains (held in the domain access
         control registers) that contain the primary access controls
[4]      Always written as 1
[3:2]    Always written as 0
[1:0]    These bits must be 01 to indicate a coarse page table descriptor
3.2.6 Fine page table descriptor
A fine page table descriptor provides the base address of a page table that contains
second-level descriptors for large page, small page, or tiny page accesses. Fine page
tables have 1024 entries, splitting the 1MB that the table describes into 1KB blocks.
Figure 3-7 shows the format of a fine page table descriptor.
Bits [31:12]: Fine page table base address; [11:9]: SBZ; [8:5]: Domain; [4]: 1; [3:2]: SBZ; [1:0]: 1 1
Figure 3-7 Fine page table descriptor
Note
If a fine page table descriptor is returned from the first-level fetch, a second-level fetch
is initiated.
If the first-level fetch returns either a coarse page table descriptor or a fine page table
descriptor, this provides the base address of the page table to be used. The page table is
then accessed and a second-level descriptor is returned. Figure 3-9 on page 3-15 shows
the format of second-level descriptors.
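The address of the second-level descriptor can be sketched in C as follows. The sketch assumes the layouts used in this chapter: a coarse page table base in descriptor bits [31:10] indexed by MVA bits [19:12] (256 entries), and a fine page table base in descriptor bits [31:12] indexed by MVA bits [19:10] (1024 entries). The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Second-level descriptor address for a coarse page table. */
uint32_t coarse_l2_descriptor_address(uint32_t l1_desc, uint32_t mva)
{
    assert((l1_desc & 3u) == 1u);                    /* 01: coarse */
    uint32_t base  = l1_desc & UINT32_C(0xFFFFFC00); /* bits [31:10] */
    uint32_t index = (mva >> 12) & 0xFFu;            /* MVA[19:12] */
    return base | (index << 2);                      /* 4 bytes per entry */
}

/* Second-level descriptor address for a fine page table. */
uint32_t fine_l2_descriptor_address(uint32_t l1_desc, uint32_t mva)
{
    assert((l1_desc & 3u) == 3u);                    /* 11: fine */
    uint32_t base  = l1_desc & UINT32_C(0xFFFFF000); /* bits [31:12] */
    uint32_t index = (mva >> 10) & 0x3FFu;           /* MVA[19:10] */
    return base | (index << 2);
}
```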
A second-level descriptor defines a tiny, a small, or a large page descriptor, or is invalid:
•a large page descriptor provides the base address of a 64KB block of memory
•a small page descriptor provides the base address of a 4KB block of memory
•a tiny page descriptor provides the base address of a 1KB block of memory.
Coarse page tables provide base addresses for either small or large pages. Large page
descriptors must be repeated in 16 consecutive entries. Small page descriptors must be
repeated in each consecutive entry.
Fine page tables provide base addresses for large, small, or tiny pages. Large page
descriptors must be repeated in 64 consecutive entries. Small page descriptors must be
repeated in four consecutive entries and tiny page descriptors must be repeated in each
consecutive entry.
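The repeat counts follow from dividing the page size by the size of the block that each table entry describes (4KB for a coarse page table, 1KB for a fine page table). The following C sketch illustrates the arithmetic. The names are hypothetical.

```c
#include <assert.h>

enum { KB = 1024 };

typedef enum { TABLE_COARSE, TABLE_FINE } table_kind_t;

/* Size of the block described by one table entry. */
unsigned entry_span_bytes(table_kind_t kind)
{
    return kind == TABLE_COARSE ? 4u * KB : 1u * KB;
}

/* Number of consecutive entries that must hold the same descriptor. */
unsigned descriptor_repeat_count(table_kind_t kind, unsigned page_bytes)
{
    unsigned span = entry_span_bytes(kind);
    assert(page_bytes >= span && page_bytes % span == 0u);
    return page_bytes / span;
}
```

A repeat count of 1 means the descriptor occupies a single entry.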
Second-level descriptor bit assignments are described in Table 3-7.
Table 3-7 Second-level descriptor bits
Bits
Large    Small    Tiny     Description
[31:16]  [31:12]  [31:10]  These bits form the corresponding bits of the physical
Figure 3-10 Large page translation from a coarse page table
Because the upper four bits of the page index and low-order four bits of the coarse page
table index overlap, each coarse page table entry for a large page must be duplicated 16
times (in consecutive memory locations) in the coarse page table.
If a large page descriptor is included in a fine page table, the high-order six bits of the
page index and low-order six bits of the fine page table index overlap. Each fine page
table entry for a large page must therefore be duplicated 64 times.
Figure 3-11 shows the complete translation sequence for a 4KB small page.
Modified virtual address: table index [31:20], level two table index [19:12], page index [11:0]
Translation table base: translation base [31:14]
First-level descriptor address: translation base [31:14], table index [13:2], 0 0 [1:0]
First-level descriptor: coarse page table base address [31:10], domain [8:5], 1 [4], 0 1 [1:0]
Second-level descriptor address: coarse page table base address [31:10], L2 table index [9:2], 0 0 [1:0]
Second-level descriptor: page base address [31:12], AP3 [11:10], AP2 [9:8], AP1 [7:6], AP0 [5:4], C [3], B [2], 1 0 [1:0]
Physical address: page base address [31:12], page index [11:0]
Figure 3-11 Small page translation from a coarse page table
If a small page descriptor is included in a fine page table, the upper two bits of the page
index and low-order two bits of the fine page table index overlap. Each fine page table
entry for a small page must therefore be duplicated four times.
Figure 3-12 shows the complete translation sequence for a 1KB tiny page.
Modified virtual address: table index [31:20], level two table index [19:10], page index [9:0]
Translation table base: translation base [31:14]
First-level descriptor address: translation base [31:14], table index [13:2], 0 0 [1:0]
First-level descriptor: fine page table base address [31:12], domain [8:5], 1 [4], 1 1 [1:0]
Second-level descriptor address: fine page table base address [31:12], L2 table index [11:2], 0 0 [1:0]
Second-level descriptor: page base address [31:10], AP [5:4], C [3], B [2], 1 1 [1:0]
Physical address: page base address [31:10], page index [9:0]
Figure 3-12 Tiny page translation from a fine page table
Page translation involves one additional step beyond that of a section translation. The
first-level descriptor is the fine page table descriptor and this is used to point to the
second-level descriptor.
The domain specified in the first-level descriptor and access permissions specified in
the second-level descriptor together determine whether the access has permissions to
proceed. See section Domain access control on page 3-24 for details.
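The final step of a page translation, combining the page base address with the page index, can be sketched in C as follows. The sketch assumes the base address fields given for second-level descriptors ([31:16] large, [31:12] small, [31:10] tiny) and the type encodings 10 for small and 11 for tiny pages. The 01 encoding for large pages is taken from the ARM architecture and does not appear in the text above. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    L2_INVALID = 0,   /* bits [1:0] == 00 */
    L2_LARGE   = 1,   /* bits [1:0] == 01, 64KB page */
    L2_SMALL   = 2,   /* bits [1:0] == 10, 4KB page */
    L2_TINY    = 3    /* bits [1:0] == 11, 1KB page */
} l2_type_t;

/* Physical address: page base from the descriptor, page index from the MVA. */
uint32_t page_physical_address(uint32_t l2_desc, uint32_t mva)
{
    uint32_t offset_mask;

    switch (l2_desc & 3u) {
    case L2_LARGE: offset_mask = UINT32_C(0x0000FFFF); break;
    case L2_SMALL: offset_mask = UINT32_C(0x00000FFF); break;
    case L2_TINY:  offset_mask = UINT32_C(0x000003FF); break;
    default:
        assert(!"invalid second-level descriptor");
        return 0u;
    }
    return (l2_desc & ~offset_mask) | (mva & offset_mask);
}
```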
Subpages
You can define access permissions for subpages of small and large pages. If, during a
page table walk, a small or large page has a different subpage permission, only the
subpage being accessed is written into the TLB. For example, a 16KB (large page)
subpage entry is written into the TLB if the subpage permission differs, and a 64KB
entry is put in the TLB if the subpage permissions are identical.
When you use subpage permissions, and the page entry then has to be invalidated, you
must invalidate all four subpages separately.
The MMU generates an abort on the following types of faults:
•alignment faults (data accesses only)
•translation faults
•domain faults
•permission faults.
In addition, an external abort can be raised by the external system. This can happen only
for access types that have the core synchronized to the external system:
•page walks
•noncached reads
•nonbuffered writes
•noncached read-lock-write sequence (SWP).
Alignment fault checking is enabled by the A bit in CP15 c1. Alignment fault checking
is not affected by whether or not the MMU is enabled. Translation, domain, and
permission faults are only generated when the MMU is enabled.
The access control mechanisms of the MMU detect the conditions that produce these
faults. If a fault is detected as a result of a memory access, the MMU aborts the access
and signals the fault condition to the CPU core. The MMU retains status and address
information about faults generated by the data accesses in the data fault status register
and fault address register (see Fault address and fault status registers).
The MMU also retains status about faults generated by instruction fetches in the
instruction fault status register.
Note
The address information for an instruction side abort is contained in the core link
register r14_abt.
An access violation for a given memory access inhibits any corresponding external
access to the AHB interface, with an abort returned to the CPU core.
3.3.1 Fault address and fault status registers
On a Data Abort, the MMU places an encoded four-bit value, the fault status, along with
the four-bit encoded domain number, in the data FSR. Similarly, on a Prefetch Abort, the
MMU places the fault status and domain number in the instruction FSR (intended for
debug purposes only). In addition, the MVA associated with the Data Abort is latched
into the FAR. If an access violation simultaneously generates more than one source of
abort, they are encoded in the priority given in Table 3-9. The FAR is not updated by
faults caused by instruction prefetches.
Table 3-9 shows the various access permissions and controls supported by the data
MMU, and how these are interpreted to generate faults.
Table 3-9 Priority encoding of fault status
Priority  Source                         Size             Status  Domain
Highest   Alignment                      -                b00x1   Invalid
          External abort on translation  First level      b1100   Invalid
                                         Second level     b1110   Valid
          Translation                    Section          b0101   Invalid
                                         Page             b0111   Valid
          Domain                         Section          b1001   Valid
                                         Page             b1011   Valid
          Permission                     Section          b1101   Valid
                                         Page             b1111   Valid
Lowest    External abort                 Section or page  b10x0   Invalid
Note
Alignment faults can write either b0001 or b0011 into FSR[3:0].
Invalid values can occur in the status bit encoding for domain faults. This happens when
the fault is raised before a valid domain field has been read from a page table
description.
Aborts masked by a higher priority abort can be regenerated by fixing the cause of the
higher priority abort, and repeating the access.
Alignment faults are not possible for instruction fetches.
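An abort handler can decode the fault status and domain fields as in the following C sketch. It assumes the status encodings in Table 3-9, with the status in FSR[3:0] and the domain in FSR[7:4]. All other bits are masked because their value is Unpredictable. The type and function names are hypothetical, and fsr_make exists only to build example values.

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    FAULT_ALIGNMENT,
    FAULT_EXTERNAL_ABORT_ON_TRANSLATION,
    FAULT_TRANSLATION,
    FAULT_DOMAIN,
    FAULT_PERMISSION,
    FAULT_EXTERNAL_ABORT,
    FAULT_UNKNOWN
} fault_source_t;

unsigned fsr_status(uint32_t fsr) { return fsr & 0xFu; }
unsigned fsr_domain(uint32_t fsr) { return (fsr >> 4) & 0xFu; }

/* Build an example FSR value from a status and a domain number. */
uint32_t fsr_make(unsigned status, unsigned domain)
{
    assert(status < 16u && domain < 16u);
    return ((uint32_t)domain << 4) | status;
}

/* Map the four-bit status to its source in Table 3-9. */
fault_source_t fsr_source(uint32_t fsr)
{
    switch (fsr_status(fsr)) {
    case 0x1: case 0x3: return FAULT_ALIGNMENT;                     /* b00x1 */
    case 0xC: case 0xE: return FAULT_EXTERNAL_ABORT_ON_TRANSLATION;
    case 0x5: case 0x7: return FAULT_TRANSLATION;
    case 0x9: case 0xB: return FAULT_DOMAIN;
    case 0xD: case 0xF: return FAULT_PERMISSION;
    case 0x8: case 0xA: return FAULT_EXTERNAL_ABORT;                /* b10x0 */
    default:            return FAULT_UNKNOWN;
    }
}
```

The domain field returned by fsr_domain is meaningful only for the encodings that Table 3-9 marks as Valid.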
The instruction FSR can also be updated for instruction prefetch operations
(
For load and store instructions that can involve the transfer of more than one word
(LDM/STM, LDRD, STRD, and STC/LDC), the value written into the FAR register
depends on the type of access, and for external aborts, on whether or not the access
crosses a 1KB boundary. Table 3-10 shows the FAR values for multi-word transfers.
Table 3-10 FAR values for multi-word transfers
Source                               FAR
Alignment                            MVA of first aborted address in transfer.
External abort on translation        MVA of first aborted address in transfer.
Translation                          MVA of first aborted address in transfer.
Domain                               MVA of first aborted address in transfer.
Permission                           MVA of first aborted address in transfer.
External abort for noncached reads,  MVA of last address before 1KB boundary if any
or nonbuffered writes                word of the transfer before 1KB boundary is
                                     externally aborted.
                                     MVA of last address in transfer if the first
                                     externally aborted word is after 1KB boundary.
Compatibility Issues
To enable code to be easily ported to ARM architecture v4 or v5 MMUs, or to future
architectures, it is recommended that no reliance is made on external abort behavior.
The instruction FSR is intended for debugging purposes only. Code that is intended to
be ported to other ARM architecture v4 or v5 MMUs must not use the instruction FSR.
MMU accesses are primarily controlled through the use of domains. There are 16
domains and each has a two-bit field to define access to it. Two types of user are
supported:
•clients
•managers.
The domains are defined in the domain access control register, CP15 c3. Figure 2-7 on
page 2-18 shows how the 32 bits of the register are allocated to define the 16 two-bit
domains.
Table 3-11 defines how the bits within each domain are interpreted to specify the access
permissions.
Table 3-11 Domain access control register, access control bits
Value  Meaning    Description
0 0    No access  Any access generates a domain fault.
0 1    Client     Accesses are checked against the access permission bits in
                  the section or page descriptor.
1 0    Reserved   Reserved. Currently behaves like the no access mode.
1 1    Manager    Accesses are not checked against the access permission
                  bits so a permission fault cannot be generated.
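The domain check can be sketched in C as follows. The sketch assumes that domain n occupies bits [2n+1:2n] of the domain access control register, with D0 in bits [1:0], and uses the encodings from Table 3-11. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    DOMAIN_NO_ACCESS = 0,   /* 00 */
    DOMAIN_CLIENT    = 1,   /* 01 */
    DOMAIN_RESERVED  = 2,   /* 10 */
    DOMAIN_MANAGER   = 3    /* 11 */
} domain_access_t;

/* Extract the two-bit access field for one of the 16 domains. */
domain_access_t domain_access(uint32_t dacr, unsigned domain)
{
    assert(domain < 16u);
    return (domain_access_t)((dacr >> (2u * domain)) & 3u);
}

/* No access and reserved both produce a domain fault. */
bool domain_generates_fault(domain_access_t access)
{
    return access == DOMAIN_NO_ACCESS || access == DOMAIN_RESERVED;
}

/* Only client accesses are checked against the AP bits. */
bool domain_checks_permissions(domain_access_t access)
{
    return access == DOMAIN_CLIENT;
}
```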
Table 3-12 shows how to interpret the Access Permission (AP) bits and how their
interpretation is dependent on the R and S bits (Control Register c1 bits [9:8]).
The sequence the MMU uses to check for access faults is different for sections and
pages. The sequence for both types of access is shown in Figure 3-13.
(Flowchart: starting from the modified virtual address, the MMU checks the address
alignment (misaligned: alignment fault) and gets the first-level descriptor (invalid:
section translation fault). For a page it then gets the page table entry (invalid: page
translation fault). Next it checks the domain status (no access (00) or reserved (10):
section or page domain fault). Client (01) accesses have their access permissions
checked (violation: section or page permission fault). Manager (11) accesses skip the
permission check. The result is the physical address.)
Figure 3-13 Sequence for checking faults
The conditions that generate each of the faults are described in:
If alignment fault checking is enabled (the A bit in CP15 c1 is set), the MMU generates
an alignment fault on any data word access if the address is not word-aligned, or on any
halfword access if the address is not halfword-aligned, irrespective of whether the
MMU is enabled or not. An alignment fault is not generated on any instruction fetch or
any byte access.
If an access generates an alignment fault, the access sequence aborts without reference
to other permission checks.
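The alignment rule for data accesses can be written as a small C predicate. This is a sketch that assumes only byte, halfword, and word accesses, as described above. The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if a data access of `access_bytes` at `address` generates an
 * alignment fault. The state of the MMU does not affect the result. */
bool alignment_fault(uint32_t address, unsigned access_bytes, bool a_bit)
{
    assert(access_bytes == 1u || access_bytes == 2u || access_bytes == 4u);
    if (!a_bit) {
        return false;               /* alignment checking disabled */
    }
    return (address & (access_bytes - 1u)) != 0u;
}
```

A byte access never faults because access_bytes - 1 is zero.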
3.5.2 Translation faults
There are two types of translation fault:
Section A section translation fault is generated if the level one descriptor is
marked as invalid. This happens if bits [1:0] of the descriptor are both 0.
Page A page translation fault is generated if the level two descriptor is marked
as invalid. This happens if bits [1:0] of the descriptor are both 0.
3.5.3 Domain faults
There are two types of domain fault:
Section The level one descriptor holds the four-bit domain field, which selects
one of the 16 two-bit domains in the domain access control register. The
two bits of the specified domain are then checked for access permissions
as described in Table 3-12 on page 3-24. The domain is checked when the
level one descriptor is returned.
Page The level one descriptor holds the four-bit domain field, which selects
one of the 16 two-bit domains in the domain access control register. The
two bits of the specified domain are then checked for access permissions
as described in Table 3-12 on page 3-24. The domain is checked when the
level one descriptor is returned.
If the specified access is either no access (00), or reserved (10), then either a section
domain fault or page domain fault occurs.
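The domain lookup can be sketched as two shifts and a mask. The code below is illustrative only. It assumes the four-bit domain field occupies bits [8:5] of the level one descriptor and that domain n occupies bits [2n+1:2n] of the domain access control register; neither bit position is stated in this section, so check them against the descriptor and register formats before relying on them. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Two-bit domain access encodings named in the text. */
typedef enum {
    DOMAIN_NO_ACCESS = 0, /* 00 */
    DOMAIN_CLIENT    = 1, /* 01 */
    DOMAIN_RESERVED  = 2, /* 10 */
    DOMAIN_MANAGER   = 3  /* 11 */
} domain_access_t;

/* Assumption: domain field is bits [8:5] of the level one descriptor. */
static unsigned descriptor_domain(uint32_t l1_desc)
{
    return (l1_desc >> 5) & 0xFu;
}

/* Assumption: domain n occupies bits [2n+1:2n] of the register value. */
static domain_access_t domain_access(uint32_t dacr, unsigned domain)
{
    assert(domain < 16u);
    return (domain_access_t)((dacr >> (2u * domain)) & 0x3u);
}

/* A domain fault occurs for no access (00) or reserved (10). */
static bool domain_fault(uint32_t dacr, uint32_t l1_desc)
{
    domain_access_t a = domain_access(dacr, descriptor_domain(l1_desc));
    return a == DOMAIN_NO_ACCESS || a == DOMAIN_RESERVED;
}
```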
If the two-bit domain field returns 01 (client), then access permissions are checked as
follows:
Section If the level one descriptor defines a section-mapped access, the AP bits of
the descriptor define whether or not the access is allowed, according to
Table 3-12 on page 3-24. Their interpretation is dependent on the setting
of the S and R bits (CP15 c1 bits 8 and 9). If the access is not allowed, a
section permission fault is generated.
Large page or small page
If the level one descriptor defines a page-mapped access and the level two
descriptor is for a large or small page, four access permission fields (ap3
to ap0) are specified, each corresponding to one quarter of the page. For
small pages ap3 is selected by the top 1KB of the page and ap0 is selected
by the bottom 1KB of the page. For large pages, ap3 is selected by the top
16KB of the page and ap0 is selected by the bottom 16KB of the page.
The selected AP bits are then interpreted in exactly the same way as for
a section (see Table 3-12 on page 3-24). The only difference is that the
fault generated is a page permission fault.
Tiny page If the level one descriptor defines a page-mapped access, and the level
two descriptor is for a tiny page, the AP bits of the level two descriptor
define whether or not the access is allowed in the same way as for a
section. The fault generated is a page permission fault.
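Selecting one of the four AP fields for a large or small page amounts to picking the quarter of the page that the address falls in. The sketch below derives the quarter from the sizes given above (1KB quarters for small pages, 16KB quarters for large pages). The placement of ap0 to ap3 at bits [5:4], [7:6], [9:8], and [11:10] of the level two descriptor is an assumption not stated in this section, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { PAGE_LARGE, PAGE_SMALL } page_kind_t;

/* Index (0..3) of the quarter of the page that the address falls in.
 * Small page: 1KB quarters, so address bits [11:10].
 * Large page: 16KB quarters, so address bits [15:14]. */
static unsigned subpage_index(uint32_t mva, page_kind_t kind)
{
    assert(kind == PAGE_LARGE || kind == PAGE_SMALL);
    unsigned shift = (kind == PAGE_SMALL) ? 10u : 14u;
    return (mva >> shift) & 0x3u;
}

/* Assumption: apN occupies bits [5+2N:4+2N] of the level two descriptor. */
static unsigned selected_ap(uint32_t l2_desc, uint32_t mva, page_kind_t kind)
{
    unsigned n = subpage_index(mva, kind);
    return (l2_desc >> (4u + 2u * n)) & 0x3u;
}
```

The two-bit value returned by `selected_ap` would then be interpreted together with the S and R bits in the same way as the AP bits of a section.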
In addition to the MMU generated aborts, external aborts can be generated for certain
types of access that involve transfers over the AHB bus. These can be used to flag errors
on external memory accesses. However, not all accesses can be aborted in this way.
The following accesses can be externally aborted:
• page walks
• noncached reads
• nonbuffered writes
• noncached read-lock-write (SWP) sequence.
For a read-lock-write (SWP) sequence, if the read externally aborts, the write is always
attempted.
A swap to an NCB region is forced to have precisely the same behavior as a swap to an
NCNB region. This means that the write part of a swap to an NCB region can be
externally aborted.
3.6.1 Enabling the MMU
Before enabling the MMU using CP15 c1 you must:
1. Program the TTB register (CP15 c2) and the domain access control register (CP15
c3).
2. Program first-level and second-level page tables as required, ensuring that a valid
translation table is placed in memory at the location specified by the TTB register.
When these steps have been performed, you can enable the MMU by setting CP15 c1
bit 0 HIGH.
Care must be taken if the translated address differs from the untranslated address
because several instructions following the enabling of the MMU might have been
prefetched with the MMU off (VA = MVA = PA).
In this case, enabling the MMU can be considered as a branch with delayed execution.
A similar situation occurs when the MMU is disabled. Consider the following code
sequence:
MRC p15, 0, R1, c1, c0, 0 ; Read control register
ORR R1, R1, #0x1 ; Set M bit
MCR p15, 0, R1, c1, c0, 0 ; Write control register and enable MMU
; Fetch Flat
; Fetch Flat
; Fetch Translated
Because the same register, CP15 c1, controls the enabling of the ICache, DCache, and
the MMU, all three can be enabled using a single MCR instruction.
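As a rough illustration of enabling all three with one write, the value written back by the MCR can be computed as follows. Only the M bit position (bit 0) is stated in this section. The DCache enable at bit 2 and the ICache enable at bit 12 are assumptions here, and the macro and function names are hypothetical. The sketch only computes the register value; it does not perform the CP15 access.

```c
#include <assert.h>
#include <stdint.h>

#define CP15_C1_M (1u << 0)   /* MMU enable: bit 0, as stated in the text */
#define CP15_C1_C (1u << 2)   /* DCache enable: assumed bit position */
#define CP15_C1_I (1u << 12)  /* ICache enable: assumed bit position */

/* Given the value read from CP15 c1, return the value to write back so
 * that the MMU, DCache, and ICache are enabled by a single MCR.
 * All other bits are preserved (read-modify-write). */
static uint32_t control_enable_mmu_and_caches(uint32_t c1)
{
    return c1 | CP15_C1_M | CP15_C1_C | CP15_C1_I;
}
```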
3.6.2 Disabling the MMU
To disable the MMU, clear bit 0 in CP15 c1.
If the MMU is enabled, then disabled, and subsequently re-enabled, the contents of the
TLB are preserved. If these are now invalid, then the TLB must be invalidated before
re-enabling the MMU. See TLB Operations Register c8 on page 2-24.
The MMU contains a single unified TLB used for both data accesses and instruction
fetches. The TLB is divided into two parts:
• an eight-entry fully-associative part used exclusively for holding locked down
TLB entries
• a set-associative part for all other entries, 2 way x 32 entry.
Whether an entry is placed in the set-associative, or lockdown part of the TLB is
dependent on the state of the TLB lockdown register, when the entry is written into the
TLB (see TLB Lockdown Register c10 on page 2-32).
When an entry has been written into the lockdown part of the TLB, it can only be
removed by being overwritten explicitly, or by an MVA-based TLB invalidate
operation, where the MVA matches the locked down entry.
The structure of the set-associative part of the TLB does not form part of the
programmer's model for the ARM926EJ-S processor. No assumptions must be made
about the structure, replacement algorithm, or persistence of entries in the
set-associative part. Specifically:
• Any entry written into the set-associative part of the TLB can be removed at any
time. The set-associative part of the TLB must be considered as a temporary cache
of translation/page table information. No reliance must be placed on an entry
either residing or not residing in the set-associative TLB, unless that entry already
exists in the lockdown TLB. The set-associative part of the TLB can contain
entries that are defined in the page tables but do not correspond to address values
that have been accessed since the TLB was invalidated.
• The set-associative part of the TLB must be considered as a cache of the
underlying page table, where memory coherency must be maintained at all times.
If a level one descriptor is modified in main memory, then to guarantee coherency
either an invalidate TLB or invalidate TLB by entry operation must be used to
remove any cached copies of the level one descriptor. This is required regardless
of the type of level one descriptor (section, level two page table reference, or
fault).
• If any of the subpage permissions for a given page are different, then each of the
subpages are treated separately. To invalidate all the entries associated with a page
with subpage permissions then four MVA-based invalidate operations are
required, one for each subpage.
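The four MVA-based invalidate operations needed for a page with subpage permissions target one address in each quarter of the page. The sketch below only computes those four addresses; issuing the actual CP15 c8 invalidate for each one is left to the caller. The function name `subpage_mvas` is hypothetical, and the accepted page sizes (4KB small pages and 64KB large pages) follow from the 1KB and 16KB quarter sizes given for subpage permissions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fill out[0..3] with one MVA per subpage of the page containing mva.
 * page_size must be 4096 (small page) or 65536 (large page). */
static void subpage_mvas(uint32_t mva, uint32_t page_size, uint32_t out[4])
{
    assert(out != NULL);
    assert(page_size == 4096u || page_size == 65536u);
    uint32_t subpage_size = page_size / 4u;
    uint32_t page_base = mva & ~(page_size - 1u); /* align down to the page */
    for (unsigned i = 0; i < 4u; i++)
        out[i] = page_base + i * subpage_size;
}
```

A caller would pass each element of `out` to its own MVA-based TLB invalidate routine.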
The size of the caches can be from 4KB to 128KB, in power of two increments.
The caches have the following features:
• The caches are virtual index, virtual tag, addressed using the Modified Virtual
Address (MVA). This enables the avoidance of cache cleaning and/or invalidating
on context switch.
• The caches are four-way set associative, with a cache line length of eight words
per line (32 bytes per line), and with two dirty bits in the DCache.
• The DCache supports write-through and write-back (or copyback) cache
operations, selected by memory region using the C and B bits in the MMU
translation tables.
• Allocate on read-miss is supported. The caches perform critical-word first cache
refilling.
• Pseudo-random or round-robin replacement, selectable by the RR bit in CP15 c1.
• Cache lockdown registers enable control over which cache ways are used for
allocation on a linefill, providing a mechanism for both lockdown and controlling
cache pollution.
• The DCache stores the Physical Address (PA) tag corresponding to each DCache
entry in the tag RAM for use during cache line write-backs, in addition to the
Virtual Address tag stored in the tag RAM. This means that the MMU is not
involved in DCache write-back operations, removing the possibility of TLB
misses related to the write-back address.
• The PLD data preload instruction does not cause data cache linefills. It is treated
as a NOP instruction.
• Cache maintenance operations to provide efficient invalidation of:
—the entire DCache or ICache
—regions of the DCache or ICache
—regions of virtual memory.
They also provide operations for efficient cleaning and invalidation of:
The latter allows DCache coherency to be efficiently maintained when small code
changes occur, for example for self-modifying code and changes to exception
vectors.
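The cache geometry follows from the figures above: four ways, 32-byte lines, and a size between 4KB and 128KB in powers of two. The sketch below derives the number of sets for a given cache size. The set index function assumes the conventional arrangement in which the index is taken from the MVA bits directly above the line offset; that arrangement is an assumption, not something stated in this section. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { CACHE_WAYS = 4, CACHE_LINE_BYTES = 32 };

/* Valid sizes are powers of two from 4KB to 128KB. */
static bool cache_size_is_valid(uint32_t bytes)
{
    return bytes >= 4096u && bytes <= 131072u && (bytes & (bytes - 1u)) == 0;
}

/* Number of sets = size / (ways * line length). */
static uint32_t cache_num_sets(uint32_t bytes)
{
    assert(cache_size_is_valid(bytes));
    return bytes / (uint32_t)(CACHE_WAYS * CACHE_LINE_BYTES);
}

/* Assumption: the set index is the line number modulo the number of sets. */
static uint32_t cache_set_index(uint32_t mva, uint32_t bytes)
{
    return (mva / (uint32_t)CACHE_LINE_BYTES) % cache_num_sets(bytes);
}
```

For example, a 16KB cache has 128 sets under these figures, so seven address bits above the five-bit line offset would select the set.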
The write buffer is used for all writes to a noncachable, bufferable region, write-through
region, and write misses to a write-back region. A separate buffer is incorporated in the
DCache for holding write-back data for cache line evictions or cleaning of dirty cache
lines.
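The rule for which writes pass through the main write buffer can be summarized as a small decision function. This is a sketch of the paragraph above only. Region types are modeled with a hypothetical enum rather than decoded from the C and B bits, and a write to a region type not listed in the text (taken here to be NCNB) is treated as not using the buffer, which is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical region classification; not decoded from the C and B bits. */
typedef enum {
    REGION_NCNB,          /* noncachable, nonbufferable */
    REGION_NCB,           /* noncachable, bufferable */
    REGION_WRITE_THROUGH,
    REGION_WRITE_BACK
} region_type_t;

/* Returns true if a write to the region goes through the main write buffer.
 * dcache_hit only matters for write-back regions, where only misses use it. */
static bool write_uses_main_write_buffer(region_type_t region, bool dcache_hit)
{
    switch (region) {
    case REGION_NCB:
    case REGION_WRITE_THROUGH:
        return true;
    case REGION_WRITE_BACK:
        return !dcache_hit;
    case REGION_NCNB:
        return false; /* assumption: not listed in the text */
    }
    assert(!"unknown region type");
    return false;
}
```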
The main write buffer has a 16-word data buffer and a four-address buffer.
The DCache write-back buffer has eight data word entries and a single address entry.
The MCR drain write buffer instruction enables both write buffers to be drained under
software control.
The MCR wait for interrupt causes both write buffers to be drained and the
ARM926EJ-S processor to be put into a low-power state until an interrupt occurs.
Write buffer behavior is described in Table 4-4 on page 4-6.
No forwarding takes place for read accesses that have corresponding pending writes
in the write buffer. For such accesses the write buffer is drained and the value is fetched
from external memory.