The MicroBlaze™ Processor Reference Guide provides information about the 32-bit soft
processor, MicroBlaze, which is included in Vivado. The document is intended as a guide to
the MicroBlaze hardware architecture.
Guide Contents
This guide contains the following chapters:
• Chapter 2, MicroBlaze Architecture contains an overview of MicroBlaze features as well
as information on Big-Endian and Little-Endian bit-reversed format, 32-bit general
purpose registers, cache software support, and AXI4-Stream interfaces.
• Chapter 3, MicroBlaze Signal Interface Description describes the types of signal
interfaces that can be used to connect MicroBlaze.
• Chapter 4, MicroBlaze Application Binary Interface describes the Application Binary
Interface important for developing software in assembly language for the processor.
• Chapter 5, MicroBlaze Instruction Set Architecture provides notation, formats, and
instructions for the Instruction Set Architecture (ISA) of MicroBlaze.
• Appendix A, Performance and Resource Utilization contains maximum frequencies and
resource utilization numbers for different configurations and devices.
• Appendix B, Additional Resources and Legal Notices provides links to documentation
and additional resources.
MicroBlaze Processor Reference Guide
UG984 (v2018.2) June 21, 2018 www.xilinx.com
Chapter 2
MicroBlaze Architecture
Introduction
This chapter contains an overview of MicroBlaze™ features and detailed information on
MicroBlaze architecture including Big-Endian or Little-Endian bit-reversed format, 32-bit
general purpose registers, virtual-memory management, cache software support, and
AXI4-Stream interfaces.
Overview
The MicroBlaze embedded processor soft core is a reduced instruction set computer (RISC)
optimized for implementation in Xilinx® Field Programmable Gate Arrays (FPGAs). The
following figure shows a functional block diagram of the MicroBlaze core.
Figure 2-1: MicroBlaze Core Block Diagram
[Block diagram. Instruction-side bus interface: M_AXI_IC, M_ACE_IC, M_AXI_IP, ILMB.
Data-side bus interface: M_AXI_DC, M_ACE_DC, M_AXI_DP, DLMB, M0_AXIS .. M15_AXIS,
S0_AXIS .. S15_AXIS. Core blocks: Program Counter, Instruction Buffer, Branch Target
Cache, Instruction Decode, Special Purpose Registers, Register File (32 x 32b), ALU,
Shift, Barrel Shift, Multiplier, Divider, FPU, Memory Management Unit (MMU) with ITLB,
DTLB, and UTLB, I-Cache, D-Cache, and bus interfaces (Bus IF). The legend marks
optional MicroBlaze features.]
Features
The MicroBlaze soft core processor is highly configurable, allowing you to select a specific
set of features required by your design.
The fixed feature set of the processor includes:
• Thirty-two 32-bit general purpose registers
• 32-bit instruction word with three operands and two addressing modes
• Default 32-bit address bus, extensible to 64 bits
• Single issue pipeline
In addition to these fixed features, the MicroBlaze processor is parameterized to allow
selective enabling of additional functionality. Older (deprecated) versions of MicroBlaze
support a subset of the optional features described in this manual. Only the latest
(preferred) version of MicroBlaze (v10.0) supports all options.
RECOMMENDED: Xilinx recommends that all new designs use the latest preferred version of the
MicroBlaze processor.
The following table provides an overview of the configurable features by MicroBlaze
versions.
Table 2-1: Configurable Feature Overview by MicroBlaze Version
Feature | v9.2 | v9.3 | v9.4 | v9.5 | v9.6 | v10.0
Disable hardware multiplier (1) | option | option | option | option | option | option
Hardware debug readable ESR and EAR | Yes | Yes | Yes | Yes | Yes | Yes
Processor Version Register (PVR) | option | option | option | option | option | option
Area or speed optimized | option | option | option | option | option | option
Hardware multiplier 64-bit result | option | option | option | option | option | option
LUT cache memory | option | option | option | option | option | option
Floating-point conversion and square root instructions | option | option | option | option | option | option
Memory Management Unit (MMU) | option | option | option | option | option | option
Extended stream instructions | option | option | option | option | option | option
Use Cache Interface for All I-Cache Memory Accesses | option | option | option | option | option | option
Use Cache Interface for All D-Cache Memory Accesses | option | option | option | option | option | option
Use Write-back Caching Policy for D-Cache | option | option | option | option | option | option
Branch Target Cache (BTC) | option | option | option | option | option | option
Streams for I-Cache | option | option | option | option | option | option
Victim handling for I-Cache | option | option | option | option | option | option
Victim handling for D-Cache | option | option | option | option | option | option
AXI4 (M_AXI_DP) data side interface | option | option | option | option | option | option
AXI4 (M_AXI_IP) instruction side interface | option | option | option | option | option | option
AXI4 (M_AXI_DC) protocol for DCache | option | option | option | option | option | option
AXI4 (M_AXI_IC) protocol for ICache | option | option | option | option | option | option
AXI4 protocol for stream accesses | option | option | option | option | option | option
Fault tolerant features | option | option | option | option | option | option
Force distributed RAM for cache tags | option | option | option | option | option | option
Configurable cache data widths | option | option | option | option | option | option
Count Leading Zeros instruction | option | option | option | option | option | option
Memory Barrier instruction | Yes | Yes | Yes | Yes | Yes | Yes
Stack overflow and underflow detection | option | option | option | option | option | option
Allow stream instructions in user mode | option | option | option | option | option | option
Lockstep support | option | option | option | option | option | option
Configurable use of FPGA primitives | option | option | option | option | option | option
Low-latency interrupt mode | option | option | option | option | option | option
Swap instructions | option | option | option | option | option | option
Sleep mode and sleep instruction | Yes | Yes | Yes | Yes | Yes | Yes
Relocatable base vectors | option | option | option | option | option | option
ACE (M_ACE_DC) protocol for DCache | option | option | option | option | option | option
ACE (M_ACE_IC) protocol for ICache | option | option | option | option | option | option
Extended debug: performance monitoring, program trace, non-intrusive profiling | | option | option | option | option | option
Reset mode: enter sleep or debug halt at reset | | option | option | option | option | option
Extended debug: external program trace | | | option | option | option | option
Extended data addressing | | | | | option | option
Pipeline pause functionality | | | | | Yes | Yes
Hibernate and suspend instructions | | | | | Yes | Yes
Non-secure mode | | | | | Yes | Yes
Bit field instructions (2) | | | | | | option
Parallel debug interface | | | | | | option
MMU Physical Address Extension | | | | | | option
1. Used for saving DSP48E primitives.
2. Bit field instructions are available when C_USE_BARREL = 1.
Data Types and Endianness
The MicroBlaze processor uses Big-Endian or Little-Endian format to represent data,
depending on the selected endianness. The parameter C_ENDIANNESS is set to 1
(little-endian) by default.
The hardware supported data types for MicroBlaze are word, half word, and byte. When
using the reversed load and store instructions LHUR, LWR, SHR, and SWR, the bytes in the
data are reversed, as indicated by the byte-reversed order.
The following tables show the bit and byte organization for each type.
Table 2-2: Word Data Type
Big-Endian Byte Address | n | n+1 | n+2 | n+3
Big-Endian Byte Significance | MSByte | | | LSByte
Big-Endian Byte Order | n | n+1 | n+2 | n+3
Big-Endian Byte-Reversed Order | n+3 | n+2 | n+1 | n
Little-Endian Byte Address | n+3 | n+2 | n+1 | n
Little-Endian Byte Significance | MSByte | | | LSByte
Little-Endian Byte Order | n+3 | n+2 | n+1 | n
Little-Endian Byte-Reversed Order | n | n+1 | n+2 | n+3
Bit Label | 0 | | | 31
Bit Significance | MSBit | | | LSBit
Table 2-3: Half Word Data Type
Big-Endian Byte Address | n | n+1
Big-Endian Byte Significance | MSByte | LSByte
Big-Endian Byte Order | n | n+1
Big-Endian Byte-Reversed Order | n+1 | n
Little-Endian Byte Address | n+1 | n
Little-Endian Byte Significance | MSByte | LSByte
Little-Endian Byte Order | n+1 | n
Little-Endian Byte-Reversed Order | n | n+1
Bit Label | 0 | 15
Bit Significance | MSBit | LSBit
Table 2-4: Byte Data Type
Byte Address | n
Bit Label | 0 .. 7
Bit Significance | MSBit .. LSBit
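The byte orders in Tables 2-2 and 2-3 can be illustrated with a small Python model. This is only a sketch of the byte ordering, not of the hardware: the function name `load` and its parameters are hypothetical, and memory is modelled as a plain `bytes` object.

```python
def load(mem: bytes, addr: int, size: int, little_endian: bool = True,
         byte_reversed: bool = False) -> int:
    """Assemble a byte (1), half word (2) or word (4) from memory.

    byte_reversed=True mimics the reversed loads (LHUR, LWR): the bytes
    are combined in the opposite order to the configured endianness.
    """
    if size not in (1, 2, 4):
        raise ValueError("size must be 1, 2 or 4 bytes")
    # Reversal flips whichever byte order the endianness would select.
    use_little = little_endian != byte_reversed
    order = "little" if use_little else "big"
    return int.from_bytes(mem[addr:addr + size], order)


memory = bytes([0x11, 0x22, 0x33, 0x44])  # bytes at addresses n .. n+3

# Big-endian: the byte at address n is the most significant byte.
print(hex(load(memory, 0, 4, little_endian=False)))
# Little-endian: the byte at address n+3 is the most significant byte.
print(hex(load(memory, 0, 4, little_endian=True)))
# Byte-reversed load on a little-endian system gives the big-endian view.
print(hex(load(memory, 0, 4, little_endian=True, byte_reversed=True)))
```

A byte-reversed access on a little-endian configuration yields the same value as a normal access on a big-endian configuration, which is what the "Byte-Reversed Order" rows of the tables express.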
Instructions
Instruction Summary
All MicroBlaze instructions are 32 bits and are defined as either Type A or Type B. Type A
instructions have up to two source register operands and one destination register operand.
Type B instructions have one source register and a 16-bit immediate operand (which can be
extended to 32 bits by preceding the Type B instruction with an imm instruction).
Type B instructions have a single destination register operand. Instructions are provided in
the following functional categories: arithmetic, logical, branch, load/store, and special. The
following table describes the instruction set nomenclature used in the semantics of each
instruction. Table 2-6 lists the MicroBlaze instruction set. See Chapter 5, MicroBlaze
Instruction Set Architecture, for more information on these instructions.
Table 2-5: Instruction Set Nomenclature
Symbol | Description
Ra | R0 - R31, General Purpose Register, source operand a
Rb | R0 - R31, General Purpose Register, source operand b
Rd | R0 - R31, General Purpose Register, destination operand
SPR[x] | Special Purpose Register number x
MSR | Machine Status Register = SPR[1]
ESR | Exception Status Register = SPR[5]
EAR | Exception Address Register = SPR[3]
FSR | Floating-point Unit Status Register = SPR[7]
PVRx | Processor Version Register, where x is the register number = SPR[8192 + x]
BTR | Branch Target Register = SPR[11]
PC | Execute stage Program Counter = SPR[0]
x[y] | Bit y of register x
x[y:z] | Bit range y to z of register x
x̄ (x with an overbar) | Bit inverted value of register x
Imm | 16 bit immediate value
Immx | x bit immediate value
FSLx | 4 bit AXI4-Stream port designator, where x is the port number
C | Carry flag, MSR[29]
Sa | Special Purpose Register, source operand
Sd | Special Purpose Register, destination operand
s(x) | Sign extend argument x to 32-bit value
*Addr | Memory contents at location Addr (data-size aligned)
:= | Assignment operator
= | Equality comparison
!= | Inequality comparison
> | Greater than comparison
>= | Greater than or equal comparison
< | Less than comparison
<= | Less than or equal comparison
+ | Arithmetic add
* | Arithmetic multiply
/ | Arithmetic divide
>> x | Bit shift right x bits
<< x | Bit shift left x bits
and | Logic AND
or | Logic OR
xor | Logic exclusive OR
op1 if cond else op2 | Perform op1 if condition cond is true, else perform op2
& | Concatenate. For example “0000100 & Imm7” is the concatenation of the fixed field “0000100” and a 7 bit immediate value.
signed | Operation performed on signed integer data type. All arithmetic operations are performed on signed word operands, unless otherwise specified
unsigned | Operation performed on unsigned integer data type
float | Operation performed on floating-point data type
clz(r) | Count leading zeros
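The Type A and Type B formats can be made concrete with a short Python sketch that splits a 32-bit instruction word into its fields. The field positions (bits 0-5, 6-10, 11-15, 16-20, 21-31 for Type A; bits 0-5, 6-10, 11-15, 16-31 for Type B) are taken from the header of Table 2-6, with bit 0 as the most significant bit. The helper names `bits`, `decode_type_a`, and `decode_type_b` are hypothetical, and the caller is assumed to know which type a word is.

```python
def bits(word: int, first: int, last: int) -> int:
    """Return bits first..last (inclusive) of a 32-bit word, bit 0 = MSB."""
    width = last - first + 1
    return (word >> (31 - last)) & ((1 << width) - 1)


def decode_type_a(word: int) -> dict:
    """Split a Type A word: opcode, Rd, Ra, Rb and the 11-bit tail field."""
    return {
        "opcode": bits(word, 0, 5),
        "rd": bits(word, 6, 10),
        "ra": bits(word, 11, 15),
        "rb": bits(word, 16, 20),
        "func": bits(word, 21, 31),
    }


def decode_type_b(word: int) -> dict:
    """Split a Type B word: opcode, Rd, Ra and the 16-bit immediate."""
    return {
        "opcode": bits(word, 0, 5),
        "rd": bits(word, 6, 10),
        "ra": bits(word, 11, 15),
        "imm": bits(word, 16, 31),
    }


# ADD r3,r4,r5: opcode 000000, Rd=3, Ra=4, Rb=5, tail 00000000000
add_word = (0b000000 << 26) | (3 << 21) | (4 << 16) | (5 << 11)
print(decode_type_a(add_word))

# ORI r3,r4,0x1234: opcode 101000, Rd=3, Ra=4, Imm=0x1234
ori_word = (0b101000 << 26) | (3 << 21) | (4 << 16) | 0x1234
print(decode_type_b(ori_word))
```

Because the guide numbers bits from the most significant end, the helper converts a bit label into a right shift of 31 minus the label of the last bit in the field.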
Table 2-6: MicroBlaze Instruction Set Summary
In the Semantics column, ~x denotes the bit inverted value of x (shown with an overbar in
the nomenclature of Table 2-5).
Type A | 0-5 | 6-10 | 11-15 | 16-20 | 21-31 | Semantics
Type B | 0-5 | 6-10 | 11-15 | 16-31 | Semantics
ADD Rd,Ra,Rb | 000000 | Rd | Ra | Rb | 00000000000 | Rd := Rb + Ra
RSUB Rd,Ra,Rb | 000001 | Rd | Ra | Rb | 00000000000 | Rd := Rb + ~Ra + 1
ADDC Rd,Ra,Rb | 000010 | Rd | Ra | Rb | 00000000000 | Rd := Rb + Ra + C
RSUBC Rd,Ra,Rb | 000011 | Rd | Ra | Rb | 00000000000 | Rd := Rb + ~Ra + C
ADDK Rd,Ra,Rb | 000100 | Rd | Ra | Rb | 00000000000 | Rd := Rb + Ra
RSUBK Rd,Ra,Rb | 000101 | Rd | Ra | Rb | 00000000000 | Rd := Rb + ~Ra + 1
CMP Rd,Ra,Rb | 000101 | Rd | Ra | Rb | 00000000001 | Rd := Rb + ~Ra + 1; Rd[0] := 0 if (Rb >= Ra) else Rd[0] := 1
CMPU Rd,Ra,Rb | 000101 | Rd | Ra | Rb | 00000000011 | Rd := Rb + ~Ra + 1 (unsigned)
BRLD Rd,Rb | 100110 | Rd | 10100 | Rb | 00000000000 | PC := PC + Rb; Rd := PC
BRA Rb | 100110 | 00000 | 01000 | Rb | 00000000000 | PC := Rb
BRAD Rb | 100110 | 00000 | 11000 | Rb | 00000000000 | PC := Rb
BRALD Rd,Rb | 100110 | Rd | 11100 | Rb | 00000000000 | PC := Rb; Rd := PC
BRK Rd,Rb | 100110 | Rd | 01100 | Rb | 00000000000 | PC := Rb; Rd := PC; MSR[BIP] := 1
BEQ Ra,Rb | 100111 | 00000 | Ra | Rb | 00000000000 | PC := PC + Rb if Ra = 0
BNE Ra,Rb | 100111 | 00001 | Ra | Rb | 00000000000 | PC := PC + Rb if Ra != 0
BLT Ra,Rb | 100111 | 00010 | Ra | Rb | 00000000000 | PC := PC + Rb if Ra < 0
BLE Ra,Rb | 100111 | 00011 | Ra | Rb | 00000000000 | PC := PC + Rb if Ra <= 0
BGT Ra,Rb | 100111 | 00100 | Ra | Rb | 00000000000 | PC := PC + Rb if Ra > 0
BGE Ra,Rb | 100111 | 00101 | Ra | Rb | 00000000000 | PC := PC + Rb if Ra >= 0
BEQD Ra,Rb | 100111 | 10000 | Ra | Rb | 00000000000 | PC := PC + Rb if Ra = 0
BNED Ra,Rb | 100111 | 10001 | Ra | Rb | 00000000000 | PC := PC + Rb if Ra != 0
BLTD Ra,Rb | 100111 | 10010 | Ra | Rb | 00000000000 | PC := PC + Rb if Ra < 0
BLED Ra,Rb | 100111 | 10011 | Ra | Rb | 00000000000 | PC := PC + Rb if Ra <= 0
BGTD Ra,Rb | 100111 | 10100 | Ra | Rb | 00000000000 | PC := PC + Rb if Ra > 0
BGED Ra,Rb | 100111 | 10101 | Ra | Rb | 00000000000 | PC := PC + Rb if Ra >= 0
ORI Rd,Ra,Imm | 101000 | Rd | Ra | Imm | Rd := Ra or s(Imm)
ANDI Rd,Ra,Imm | 101001 | Rd | Ra | Imm | Rd := Ra and s(Imm)
XORI Rd,Ra,Imm | 101010 | Rd | Ra | Imm | Rd := Ra xor s(Imm)
ANDNI Rd,Ra,Imm | 101011 | Rd | Ra | Imm | Rd := Ra and ~s(Imm)
IMM Imm | 101100 | 00000 | 00000 | Imm | Imm[0:15] := Imm
RTSD Ra,Imm | 101101 | 10000 | Ra | Imm | PC := Ra + s(Imm)
RTID Ra,Imm | 101101 | 10001 | Ra | Imm | PC := Ra + s(Imm); MSR[IE] := 1
RTBD Ra,Imm | 101101 | 10010 | Ra | Imm | PC := Ra + s(Imm); MSR[BIP] := 0
RTED Ra,Imm | 101101 | 10100 | Ra | Imm | PC := Ra + s(Imm); MSR[EE] := 1, MSR[EIP] := 0; ESR := 0
BRI Imm | 101110 | 00000 | 00000 | Imm | PC := PC + s(Imm)
MBAR Imm | 101110 | Imm | 00010 | 0000000000000100 | PC := PC + 4; Wait for memory accesses.
BRID Imm | 101110 | 00000 | 10000 | Imm | PC := PC + s(Imm)
BRLID Rd,Imm | 101110 | Rd | 10100 | Imm | PC := PC + s(Imm); Rd := PC
BRAI Imm | 101110 | 00000 | 01000 | Imm | PC := s(Imm)
BRAID Imm | 101110 | 00000 | 11000 | Imm | PC := s(Imm)
BRALID Rd,Imm | 101110 | Rd | 11100 | Imm | PC := s(Imm); Rd := PC
BRKI Rd,Imm | 101110 | Rd | 01100 | Imm | PC := s(Imm); Rd := PC; MSR[BIP] := 1
BEQI Ra,Imm | 101111 | 00000 | Ra | Imm | PC := PC + s(Imm) if Ra = 0
BNEI Ra,Imm | 101111 | 00001 | Ra | Imm | PC := PC + s(Imm) if Ra != 0
BLTI Ra,Imm | 101111 | 00010 | Ra | Imm | PC := PC + s(Imm) if Ra < 0
BLEI Ra,Imm | 101111 | 00011 | Ra | Imm | PC := PC + s(Imm) if Ra <= 0
BGTI Ra,Imm | 101111 | 00100 | Ra | Imm | PC := PC + s(Imm) if Ra > 0
BGEI Ra,Imm | 101111 | 00101 | Ra | Imm | PC := PC + s(Imm) if Ra >= 0
BEQID Ra,Imm | 101111 | 10000 | Ra | Imm | PC := PC + s(Imm) if Ra = 0
BNEID Ra,Imm | 101111 | 10001 | Ra | Imm | PC := PC + s(Imm) if Ra != 0
BLTID Ra,Imm | 101111 | 10010 | Ra | Imm | PC := PC + s(Imm) if Ra < 0
BLEID Ra,Imm | 101111 | 10011 | Ra | Imm | PC := PC + s(Imm) if Ra <= 0
BGTID Ra,Imm | 101111 | 10100 | Ra | Imm | PC := PC + s(Imm) if Ra > 0
BGEID Ra,Imm | 101111 | 10101 | Ra | Imm | PC := PC + s(Imm) if Ra >= 0
LBU Rd,Ra,Rb | 110000 | Rd | Ra | Rb | 00000000000 | Addr := Ra + Rb; Rd[0:23] := 0; Rd[24:31] := *Addr[0:7]
LBUR Rd,Ra,Rb | 110000 | Rd | Ra | Rb | 01000000000 | Addr := Ra + Rb; Rd[0:23] := 0; Rd[24:31] := *Addr[0:7]
LBUEA Rd,Ra,Rb | 110000 | Rd | Ra | Rb | 00010000000 | Addr := Ra & Rb; Rd[0:23] := 0; Rd[24:31] := *Addr[0:7]
LHU Rd,Ra,Rb | 110001 | Rd | Ra | Rb | 00000000000 | Addr := Ra + Rb; Rd[0:15] := 0; Rd[16:31] := *Addr[0:15]
LHUR Rd,Ra,Rb | 110001 | Rd | Ra | Rb | 01000000000 | Addr := Ra + Rb; Rd[0:15] := 0; Rd[16:31] := *Addr[0:15]
LHUEA Rd,Ra,Rb | 110001 | Rd | Ra | Rb | 00010000000 | Addr := Ra & Rb; Rd[0:15] := 0; Rd[16:31] := *Addr[0:15]
LW Rd,Ra,Rb | 110010 | Rd | Ra | Rb | 00000000000 | Addr := Ra + Rb; Rd := *Addr
LWR Rd,Ra,Rb | 110010 | Rd | Ra | Rb | 01000000000 | Addr := Ra + Rb; Rd := *Addr
LWX Rd,Ra,Rb | 110010 | Rd | Ra | Rb | 10000000000 | Addr := Ra + Rb; Rd := *Addr; Reservation := 1
LWEA Rd,Ra,Rb | 110010 | Rd | Ra | Rb | 00010000000 | Addr := Ra & Rb; Rd := *Addr
SB Rd,Ra,Rb | 110100 | Rd | Ra | Rb | 00000000000 | Addr := Ra + Rb; *Addr[0:7] := Rd[24:31]
SBR Rd,Ra,Rb | 110100 | Rd | Ra | Rb | 01000000000 | Addr := Ra + Rb; *Addr[0:7] := Rd[24:31]
SBEA Rd,Ra,Rb | 110100 | Rd | Ra | Rb | 00010000000 | Addr := Ra & Rb; *Addr[0:7] := Rd[24:31]
SH Rd,Ra,Rb | 110101 | Rd | Ra | Rb | 00000000000 | Addr := Ra + Rb; *Addr[0:15] := Rd[16:31]
SHR Rd,Ra,Rb | 110101 | Rd | Ra | Rb | 01000000000 | Addr := Ra + Rb; *Addr[0:15] := Rd[16:31]
SHEA Rd,Ra,Rb | 110101 | Rd | Ra | Rb | 00010000000 | Addr := Ra & Rb; *Addr[0:15] := Rd[16:31]
SW Rd,Ra,Rb | 110110 | Rd | Ra | Rb | 00000000000 | Addr := Ra + Rb; *Addr := Rd
SWR Rd,Ra,Rb | 110110 | Rd | Ra | Rb | 01000000000 | Addr := Ra + Rb; *Addr := Rd
SWX Rd,Ra,Rb | 110110 | Rd | Ra | Rb | 10000000000 | Addr := Ra + Rb; *Addr := Rd if Reservation = 1; Reservation := 0
SWEA Rd,Ra,Rb | 110110 | Rd | Ra | Rb | 00010000000 | Addr := Ra & Rb; *Addr := Rd
LBUI Rd,Ra,Imm | 111000 | Rd | Ra | Imm | Addr := Ra + s(Imm); Rd[0:23] := 0; Rd[24:31] := *Addr[0:7]
LHUI Rd,Ra,Imm | 111001 | Rd | Ra | Imm | Addr := Ra + s(Imm); Rd[0:15] := 0; Rd[16:31] := *Addr[0:15]
LWI Rd,Ra,Imm | 111010 | Rd | Ra | Imm | Addr := Ra + s(Imm); Rd := *Addr
SBI Rd,Ra,Imm | 111100 | Rd | Ra | Imm | Addr := Ra + s(Imm); *Addr[0:7] := Rd[24:31]
SHI Rd,Ra,Imm | 111101 | Rd | Ra | Imm | Addr := Ra + s(Imm); *Addr[0:15] := Rd[16:31]
SWI Rd,Ra,Imm | 111110 | Rd | Ra | Imm | Addr := Ra + s(Imm); *Addr := Rd
1. Due to the many different corner cases involved in floating-point arithmetic, only the normal behavior is described. A full
description of the behavior can be found in Chapter 5, “MicroBlaze Instruction Set Architecture.”
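The immediate handling used by the Type B rows above, s(Imm) and the imm prefix, can be sketched in Python as follows. This is an illustrative model only: the function names are hypothetical, and the imm prefix is modelled as supplying the upper 16 bits of the 32-bit operand, with the following Type B instruction supplying the lower 16 bits.

```python
from typing import Optional


def sign_extend16(imm16: int) -> int:
    """s(Imm): sign extend a 16-bit immediate to a 32-bit value."""
    imm16 &= 0xFFFF
    if imm16 & 0x8000:           # sign bit set: fill the upper half with ones
        return imm16 | 0xFFFF0000
    return imm16


def effective_immediate(imm16: int, imm_prefix: Optional[int] = None) -> int:
    """32-bit operand seen by a Type B instruction.

    Without a preceding imm instruction the 16-bit field is sign extended.
    With one, the prefix value forms the upper half and no sign extension
    takes place.
    """
    if imm_prefix is None:
        return sign_extend16(imm16)
    return ((imm_prefix & 0xFFFF) << 16) | (imm16 & 0xFFFF)


print(hex(effective_immediate(0xFFFF)))          # sign extended
print(hex(effective_immediate(0x5678, 0x1234)))  # imm 0x1234 then Type B 0x5678
```

This is why loading an arbitrary 32-bit constant takes two instruction words, an imm followed by the Type B instruction that uses the constant.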
Semaphore Synchronization
The LWX and SWX instructions are used to implement common semaphore operations,
including test and set, compare and swap, exchange memory, and fetch and add. They are
also used to implement spinlocks.
These instructions are typically used by system programs and are called by application
programs as needed.
Generally, a program uses LWX to load a semaphore from memory, causing the reservation
to be set (the processor maintains the reservation internally). The program can compute a
result based on the semaphore value and conditionally store the result back to the same
memory location using the SWX instruction. The conditional store is performed based on
the existence of the reservation established by the preceding LWX instruction. If the
reservation exists when the store is executed, the store is performed and MSR[C] is cleared
to 0. If the reservation does not exist when the store is executed, the target memory
location is not modified and MSR[C] is set to 1.
If the store is successful, the sequence of instructions from the semaphore load to the
semaphore store appears to be executed atomically: no other device modified the
semaphore location between the read and the update. Other devices can read from the
semaphore location during the operation.
For a semaphore operation to work properly, the LWX instruction must be paired with an
SWX instruction, and both must specify identical addresses.
The reservation granularity in MicroBlaze is a word. For both instructions, the address must
be word aligned. No unaligned exceptions are generated for these instructions.
The conditional store is always attempted when a reservation exists, even if the store
address does not match the load address that set the reservation.
Only one reservation can be maintained at a time. The address associated with the
reservation can be changed by executing a subsequent LWX instruction.
The conditional store is performed based upon the reservation established by the last LWX
instruction executed. Executing an SWX instruction always clears a reservation held by the
processor, whether the address matches that established by the LWX or not.
Reset, interrupts, exceptions, and breaks (including the BRK and BRKI instructions) all clear
the reservation.
The following provides general guidelines for using the LWX and SWX instructions:
• The LWX and SWX instructions should be paired and use the same address.
• An unpaired SWX instruction to an arbitrary address can be used to clear any
reservation held by the processor.
• A conditional sequence begins with an LWX instruction. It can be followed by memory
accesses and/or computations on the loaded value. The sequence ends with an SWX
instruction. In most cases, failure of the SWX instruction should cause a branch back to
the LWX for a repeated attempt.
• An LWX instruction can be left unpaired when executing certain synchronization
primitives if the value loaded by the LWX is not zero. An implementation of Test and Set
exemplifies this:
    loop:   lwx     r5,r3,r0    ; load and reserve
            bnei    r5,next     ; branch if not equal to zero
            addik   r5,r5,1     ; increment value
            swx     r5,r3,r0    ; try to store non-zero value
            addic   r5,r0,0     ; check reservation
            bnei    r5,loop     ; loop if reservation lost
    next:
• Performance can be improved by minimizing looping on an LWX instruction that fails to
return a desired value. Performance can also be improved by using an ordinary load
instruction to do the initial value check. An implementation of a spinlock exemplifies
this:
    loop:   lw      r5,r3,r0    ; load the word
            bnei    r5,loop     ; loop back if word not equal to 0
            lwx     r5,r3,r0    ; try reserving again
            bnei    r5,loop     ; likely that no branch is needed
            addik   r5,r5,1     ; increment value
            swx     r5,r3,r0    ; try to store non-zero value
            addic   r5,r0,0     ; check reservation
            bnei    r5,loop     ; loop if reservation lost
• Minimizing the looping on an LWX/SWX instruction pair increases the likelihood that
forward progress is made. The old value should be tested before attempting the store.
If the order is reversed (store before load), more SWX instructions are executed and
reservations are more likely to be lost between the LWX and SWX instructions.
Self-modifying Code
When using self-modifying code, software must ensure that the modified instructions have
been written to memory prior to fetching them for execution. There are several aspects to
consider:
• The instructions to be modified could already have been fetched prior to modification:
  - Into the instruction prefetch buffer
  - Into the instruction cache, if it is enabled
  - Into a stream buffer, if instruction cache stream buffers are used
  - Into the instruction cache, and then saved in a victim buffer, if victim buffers are
    used.
  To ensure that the modified code is always executed instead of the old unmodified
  code, software must handle all these cases.
• If one or more of the instructions to be modified is a branch, and the branch target
cache is used, the branch target address might have been cached.
  To avoid using the cached branch target address, software must ensure that the branch
  target cache is cleared prior to executing the modified code.
• The modified instructions might not have been written to memory prior to execution:
  - They might be en-route to memory, in temporary storage in the interconnect or the
    memory controller.
  - They might be stored in the data cache, if write-back cache is used.
  - They might be saved in a victim buffer, if write-back cache and victim buffers are
    used.
  Software must ensure that the modified instructions have been written to memory before
  being fetched by the processor.
The annotated code below shows how each of the above issues can be addressed. This code
assumes that both instruction cache and write-back data cache are used. If not, the
corresponding instructions can be omitted.
The following code exemplifies storing a modified instruction:
    swi         r5,r6,0     ; r5 = new instruction
                            ; r6 = physical instruction address
    wdc.flush   r6,r0       ; flush write-back data cache line
    mbar        1           ; ensure new instruction is written to memory
    wic         r7,r0       ; invalidate line, empty stream & victim buffers
The physical and virtual addresses above are identical, unless MMU virtual mode is used. If
the MMU is enabled, the code sequences must be executed in real mode, because WIC and
WDC are privileged instructions. The first instruction after the code sequences above must
not be modified, because it might have been prefetched.
Registers
MicroBlaze has an orthogonal instruction set architecture. It has thirty-two 32-bit general
purpose registers and up to eighteen 32-bit special purpose registers, depending on
configured options.
General Purpose Registers
The thirty-two 32-bit General Purpose Registers are numbered R0 through R31. The register
file is reset on bit stream download (reset value is 0x00000000). The following figure is a
representation of a General Purpose Register and Table 2-7 provides a description of each
register and the register reset value (if existing).
Note: The register file is not reset by the external reset inputs: Reset and Debug_Rst.
Figure 2-2: R0-R31
[Register layout: each register R0 - R31 occupies bits 0 to 31.]
Table 2-7: General Purpose Registers (R0-R31)
Bits | Name | Description | Reset Value
0:31 | R0 | Always has a value of zero. Anything written to R0 is discarded | 0x00000000
0:31 | R1 through R13 | 32-bit general purpose registers | -
0:31 | R14 | 32-bit register used to store return addresses for interrupts. | -
0:31 | R15 | 32-bit general purpose register. Recommended for storing return addresses for user vectors. | -
0:31 | R16 | 32-bit register used to store return addresses for breaks. | -
0:31 | R17 | If MicroBlaze is configured to support hardware exceptions, this register is loaded with the address of the instruction following the instruction causing the HW exception, except for exceptions in delay slots that use BTR instead (see Branch Target Register (BTR)); if not, it is a general purpose register. | -
0:31 | R18 through R31 | R18 through R31 are 32-bit general purpose registers. | -
See Table 4-2 for software conventions on general purpose register usage.
Special Purpose Registers
Program Counter (PC)
The program counter (PC) is the 32-bit address of the instruction being executed. It can be
read with an MFS instruction, but it cannot be written with an MTS instruction. When used
with the MFS instruction the PC register is specified by setting Sa = 0x0000. The following
figure illustrates the PC and Table 2-8 provides a description and reset value.
Figure 2-3: PC
[Register layout: PC occupies bits 0 to 31.]
Table 2-8: Program Counter (PC)
Bits | Name | Description | Reset Value
0:31 | PC | Program Counter. Address of executing instruction, that is, “mfs r2, 0” stores the address of the mfs instruction itself in R2. | 0x00000000
Machine Status Register (MSR)
The Machine Status Register contains control and status bits for the processor. It can be
read with an MFS instruction. When reading the MSR, bit 29 is replicated in bit 0 as the carry
copy. MSR can be written using either an MTS instruction or the dedicated MSRSET and
MSRCLR instructions.
When writing to the MSR using MSRSET or MSRCLR, the Carry bit takes effect immediately
and the remaining bits take effect one clock cycle later. When writing using MTS, all bits
take effect one clock cycle later. Any value written to bit 0 is discarded.
When used with an MTS or MFS instruction, the MSR is specified by setting Sx = 0x0001.
The following figure illustrates the MSR register and Table 2-9 provides the bit description
and reset values.
Figure 2-4: MSR
[Register layout, bit 0 to bit 31: CC (0), Reserved, VMS (17), VM (18), UMS (19), UM (20),
PVR (21), EIP (22), EE (23), DCE (24), DZO (25), ICE (26), FSL (27), BIP (28), C (29),
IE (30), RES (31).]
Table 2-9: Machine Status Register (MSR)
Bits | Name | Description | Reset Value
0 | CC | Arithmetic Carry Copy. Copy of the Arithmetic Carry (bit 29). CC is always the same as bit C. | 0
1:16 | Reserved | |
17 | VMS | Virtual Protected Mode Save. Only available when configured with an MMU (if C_USE_MMU > 1 and C_AREA_OPTIMIZED = 0 or 2). Read/Write |
18 | VM | Virtual Protected Mode. 0 = MMU address translation and access protection disabled, with ... |
19 | UMS | ... |
20 | UM | ... 1 = User Mode, certain instructions are not allowed. Only available when configured with an MMU (if C_USE_MMU > 0 and C_AREA_OPTIMIZED = 0 or 2). Read/Write |
21 | PVR | Processor Version Register exists. 0 = No Processor Version Register. 1 = Processor Version Register exists. Read only | Based on parameter C_PVR
22 | EIP | Exception In Progress. 0 = No hardware exception in progress. 1 = Hardware exception in progress. Only available if configured with exception support (C_*_EXCEPTION or C_USE_MMU > 0). Read/Write | 0
23 | EE | Exception Enable (1). 0 = Hardware exceptions disabled. 1 = Hardware exceptions enabled. Only available if configured with exception support (C_*_EXCEPTION or C_USE_MMU > 0). Read/Write | 0
24 | DCE | Data Cache Enable. 0 = Data Cache disabled. 1 = Data Cache enabled. Only available if configured to use data cache (C_USE_DCACHE = 1). Read/Write | 0
25 | DZO | Division by Zero or Division Overflow (2). 0 = No division by zero or division overflow has occurred. 1 = Division by zero or division overflow has occurred. Only available if configured to use hardware divider (C_USE_DIV = 1). Read/Write | 0
26 | ICE | Instruction Cache Enable. 0 = Instruction Cache disabled. 1 = Instruction Cache enabled. Only available if configured to use instruction cache (C_USE_ICACHE = 1). Read/Write | 0
27 | FSL | AXI4-Stream Error. 0 = get or getd had no error. 1 = get or getd control type mismatch. This bit is sticky, that is, it is set by a get or getd instruction when a control bit mismatch occurs. To clear it an MTS or MSRCLR instruction must be used. Only available if configured to use stream links (C_FSL_LINKS > 0). Read/Write | 0
28 | BIP | Break in Progress. 0 = No Break in Progress. 1 = Break in Progress. Break Sources can be software break instruction or hardware break from Ext_Brk or Ext_NM_Brk pin. Read/Write | 0
29 | C | Arithmetic Carry. 0 = No Carry (Borrow). 1 = Carry (No Borrow). Read/Write | 0
30 | IE | Interrupt Enable. 0 = Interrupts disabled. 1 = Interrupts enabled. Read/Write | 0
31 | - | Reserved | 0
1. The MMU exceptions (Data Storage Exception, Instruction Storage Exception, Data TLB Miss Exception, Instruction
TLB Miss Exception) cannot be disabled, and are not affected by this bit.
2. This bit is only used for integer divide-by-zero or divide overflow signaling. There is a floating-point equivalent in
the FSR. The DZO-bit flags divide by zero or divide overflow conditions regardless if the processor is configured
with exception handling or not.
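Because the guide labels bit 0 as the most significant bit, turning an MSR bit number into a mask is a common source of mistakes. The Python sketch below shows one way to do the conversion and to model the carry copy on a read. The bit positions come from Table 2-9 and Figure 2-4; the names `MSR_BITS`, `msr_mask`, `msr_read`, and `msr_flags` are hypothetical helpers, not part of any Xilinx API.

```python
# Bit numbers as used in the guide (bit 0 = most significant bit).
MSR_BITS = {
    "CC": 0, "VMS": 17, "VM": 18, "UMS": 19, "UM": 20, "PVR": 21,
    "EIP": 22, "EE": 23, "DCE": 24, "DZO": 25, "ICE": 26, "FSL": 27,
    "BIP": 28, "C": 29, "IE": 30,
}


def msr_mask(name: str) -> int:
    """Mask for a named MSR bit in conventional (bit 0 = LSB) terms."""
    return 1 << (31 - MSR_BITS[name])


def msr_read(stored: int) -> int:
    """Model a read of the MSR: bit 29 (C) is replicated in bit 0 (CC)."""
    value = stored & 0xFFFFFFFF & ~msr_mask("CC")
    if value & msr_mask("C"):
        value |= msr_mask("CC")
    return value


def msr_flags(value: int) -> dict:
    """Decode an MSR value into a name -> bool mapping."""
    return {name: bool(value & msr_mask(name)) for name in MSR_BITS}


print(hex(msr_mask("IE")))          # interrupt enable mask
print(hex(msr_read(0x00000006)))    # C and IE set, CC mirrors C
```

A value written to bit 0 is discarded, so the model recomputes CC from C on every read instead of storing it.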
Exception Address Register (EAR)
The Exception Address Register stores the full load/store address that caused the exception
for the following:
•An unaligned access exception that specifies the unaligned access data address
X-Ref Target - Figure 2-5
•An
M_AXI_DP exception that specifies the failing AXI4 data access address
•A data storage exception that specifies the (virtual) effective address accessed
•An instruction storage exception that specifies the (virtual) effective address read
•A data TLB miss exception that specifies the (virtual) effective address accessed
•An instruction TLB miss exception that specifies the (virtual) effective address read
The contents of this register are undefined for all other exceptions. When read with the MFS
or MFSE instruction, the EAR is specified by setting Sa = 0x0003. The EAR register is
illustrated in the following figure and Table 2-10 provides bit descriptions and reset values.
When extended data addressing is enabled (parameter C_ADDR_SIZE > 32), the 32 least
significant bits of the register are read with the MFS instruction, and the most significant
bits with the MFSE instruction.
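The split read described above can be illustrated with a small host-side Python sketch that joins the two values into one address. The function name combine_ear is hypothetical, and the sketch assumes the MFSE result is right-aligned in its 32-bit word, which this section does not state.

```python
def combine_ear(low_word: int, high_word: int, addr_size: int) -> int:
    """Join the MFS (low 32 bits) and MFSE (upper bits) reads of EAR.

    Assumption: the MFSE result is right-aligned in its 32-bit word.
    addr_size stands for the C_ADDR_SIZE parameter.
    """
    if not 32 <= addr_size <= 64:
        raise ValueError("addr_size (C_ADDR_SIZE) must be in the range 32..64")
    # Only addr_size - 32 upper bits are meaningful.
    high_mask = (1 << (addr_size - 32)) - 1
    return ((high_word & high_mask) << 32) | (low_word & 0xFFFFFFFF)
```

With addr_size equal to 32 the upper word is ignored, so the result is simply the MFS value.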
Figure 2-5: EAR
Table 2-10: Exception Address Register (EAR)
Bits | Name | Description | Reset Value
0:C_ADDR_SIZE-1 | EAR | Exception Address Register | 0
Exception Status Register (ESR)
The Exception Status Register contains status bits for the processor. When read with the
MFS instruction, the ESR is specified by setting Sa = 0x0005. The ESR register is illustrated
in the following figure, Table 2-11 provides bit descriptions and reset values, and Table 2-12
provides the Exception Specific Status (ESS).
Figure 2-6: ESR
Table 2-11: Exception Status Register (ESR)
Bits | Name | Description | Reset Value
0:18 | Reserved | |
19 | DS | Delay Slot Exception. 0 = not caused by delay slot instruction; 1 = caused by delay slot instruction. Read-only | 0
20:26 | ESS | Exception Specific Status. For details, see Table 2-12. Read-only | See Table 2-12
27:31 | EC | Exception Cause. 00000 = Stream exception; 00001 = Unaligned data access exception; 00010 = Illegal op-code exception; 00011 = Instruction bus error exception; 00100 = Data bus error exception; 00101 = Divide exception; 00110 = Floating-point unit exception; 00111 = Privileged instruction exception; 00111 = Stack protection violation exception; 10000 = Data storage exception; 10001 = Instruction storage exception; 10010 = Data TLB miss exception; 10011 = Instruction TLB miss exception. Read-only | 0
Table 2-12: Exception Specific Status (ESS)
Exception Cause | Bits | Name | Description | Reset Value
Unaligned Data Access | 20 | W | Word Access Exception. 0 = unaligned halfword access; 1 = unaligned word access | 0
Unaligned Data Access | 21 | S | Store Access Exception. 0 = unaligned load access; 1 = unaligned store access | 0
Unaligned Data Access | 22:26 | Rx | Source/Destination Register. General purpose register used as source (Store) or destination (Load) in unaligned access | 0
Illegal Instruction | 20:26 | Reserved | | 0
Instruction bus error | 20 | ECC | Exception caused by ILMB correctable or uncorrectable error | 0
Instruction bus error | 21:26 | Reserved | | 0
Data bus error | 20 | ECC | Exception caused by DLMB correctable or uncorrectable error | 0
Data bus error | 21:26 | Reserved | | 0
Divide | 20 | DEC | Divide - Division exception cause. 0 = Divide-By-Zero; 1 = Division Overflow | 0
Divide | 21:26 | Reserved | | 0
Floating-point unit | 20:26 | Reserved | | 0
Privileged instruction | 20:26 | Reserved | | 0
Stack protection violation | 20:26 | Reserved | | 0
Stream | 20:22 | Reserved | | 0
Stream | 23:26 | FSL | AXI4-Stream index that caused the exception | 0
Data storage | 20 | DIZ | Data storage - Zone protection. 0 = Did not occur; 1 = Occurred | 0
Data storage | 21 | S | Data storage - Store instruction. 0 = Did not occur; 1 = Occurred | 0
Data storage | 22:26 | Reserved | | 0
Instruction storage | 20 | DIZ | Instruction storage - Zone protection. 0 = Did not occur; 1 = Occurred | 0
Instruction storage | 21:26 | Reserved | | 0
Data TLB miss | 20 | Reserved | | 0
Data TLB miss | 21 | S | Data TLB miss - Store instruction. 0 = Did not occur; 1 = Occurred | 0
Data TLB miss | 22:26 | Reserved | | 0
Instruction TLB miss | 20:26 | Reserved | | 0
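As an illustration of the ESR layout, the following Python sketch decodes a raw 32-bit ESR value, for example one captured in a debugger. It relies on the bit numbering used in the tables, where bit 0 is the most significant bit and bit 31 the least significant. The names field and decode_esr are hypothetical helpers, not part of any MicroBlaze tool.

```python
EXCEPTION_CAUSES = {
    0b00000: "Stream exception",
    0b00001: "Unaligned data access exception",
    0b00010: "Illegal op-code exception",
    0b00011: "Instruction bus error exception",
    0b00100: "Data bus error exception",
    0b00101: "Divide exception",
    0b00110: "Floating-point unit exception",
    # The table lists both causes below under the same code.
    0b00111: "Privileged instruction or stack protection violation exception",
    0b10000: "Data storage exception",
    0b10001: "Instruction storage exception",
    0b10010: "Data TLB miss exception",
    0b10011: "Instruction TLB miss exception",
}


def field(value: int, first_bit: int, last_bit: int) -> int:
    """Extract bits first_bit:last_bit of a 32-bit word, bit 0 being the MSB."""
    width = last_bit - first_bit + 1
    return (value >> (31 - last_bit)) & ((1 << width) - 1)


def decode_esr(esr: int) -> dict:
    """Split an ESR value into DS, ESS and EC, plus unaligned-access details."""
    ec = field(esr, 27, 31)
    info = {
        "DS": field(esr, 19, 19),
        "ESS": field(esr, 20, 26),
        "EC": ec,
        "cause": EXCEPTION_CAUSES.get(ec, "unknown"),
    }
    if ec == 0b00001:  # ESS layout for unaligned data access
        info["W"] = field(esr, 20, 20)
        info["S"] = field(esr, 21, 21)
        info["Rx"] = field(esr, 22, 26)
    return info
```

For example, decode_esr(0x1CA1) describes an unaligned word store through register 5 that occurred in a delay slot.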
Branch Target Register (BTR)
The Branch Target Register only exists if the MicroBlaze processor is configured to use
exceptions. The register stores the branch target address for all delay slot branch
instructions executed while MSR[EIP] = 0. If an exception is caused by an instruction in a
delay slot (that is, ESR[DS]=1), the exception handler should return execution to the address
stored in BTR instead of the normal exception return address stored in R17. When read with
the MFS instruction, the BTR is specified by setting Sa = 0x000B. The BTR register is
illustrated in the following figure and
Table 2-13 provides bit descriptions and reset values.
Figure 2-7: BTR
Table 2-13: Branch Target Register (BTR)
Bits | Name | Description | Reset Value
0:31 | BTR | Branch target address used by handler when returning from an exception caused by an instruction in a delay slot. Read-only | 0x00000000
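The choice between BTR and R17 can be sketched in Python as follows. This is a model of the decision only, not handler code; the function name exception_return_address is hypothetical, and the three arguments are assumed to be register values already read by the handler.

```python
ESR_DS_MASK = 1 << (31 - 19)  # ESR[DS] is bit 19, with bit 0 as the MSB


def exception_return_address(esr: int, btr: int, r17: int) -> int:
    """Pick the address at which execution should resume after an exception."""
    if esr & ESR_DS_MASK:
        # The faulting instruction was in a delay slot: resume at the branch target.
        return btr & 0xFFFFFFFF
    # Otherwise use the normal exception return address held in R17.
    return r17 & 0xFFFFFFFF
```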
Floating-Point Status Register (FSR)
The Floating-Point Status Register contains status bits for the floating-point unit. It can be
read with an MFS, and written with an MTS instruction. When read or written, the register is
specified by setting Sa = 0x0007. The bits in this register are sticky: floating-point
instructions can only set bits in the register, and the only way to clear the register is by
using the MTS instruction. The following figure illustrates the FSR register and Table 2-14
provides bit descriptions and reset values.
Figure 2-8: FSR
Table 2-14: Floating-Point Status Register (FSR)
Bits | Name | Description | Reset Value
0:26 | Reserved | | undefined
27 | IO | Invalid operation | 0
28 | DZ | Divide-by-zero | 0
29 | OF | Overflow | 0
30 | UF | Underflow | 0
31 | DO | Denormalized operand error | 0
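The sticky behavior can be modeled with a short Python sketch. The class FsrModel and its method names are hypothetical. The model tracks only the five flag bits and ignores the reserved bits, whose value the table gives as undefined.

```python
# Flag masks, derived from the bit positions in Table 2-14 (bit 31 is the LSB).
FSR_IO = 1 << 4  # bit 27, invalid operation
FSR_DZ = 1 << 3  # bit 28, divide-by-zero
FSR_OF = 1 << 2  # bit 29, overflow
FSR_UF = 1 << 1  # bit 30, underflow
FSR_DO = 1 << 0  # bit 31, denormalized operand error
FLAG_NAMES = [(FSR_IO, "IO"), (FSR_DZ, "DZ"), (FSR_OF, "OF"), (FSR_UF, "UF"), (FSR_DO, "DO")]


class FsrModel:
    """Model of the sticky FSR flags (reserved bits are not modeled)."""

    def __init__(self) -> None:
        self.value = 0

    def record(self, flags: int) -> None:
        """What a floating-point instruction does: it can only set bits."""
        self.value |= flags & 0x1F

    def mts(self, value: int) -> None:
        """What an MTS write does: it replaces the flags, so it can clear them."""
        self.value = value & 0x1F

    def names(self) -> list:
        """Return the names of the flags currently set."""
        return [name for mask, name in FLAG_NAMES if self.value & mask]
```

Recording FSR_DZ and then FSR_OF leaves both flags set until mts(0) is called.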
Exception Data Register (EDR)
The Exception Data Register stores data read on an AXI4-Stream link that caused a stream
exception.
The contents of this register are undefined for all other exceptions. When read with the MFS
instruction, the EDR is specified by setting Sa = 0x000D. The following figure illustrates the
EDR register and Table 2-15 provides bit descriptions and reset values.
Note: The register is only implemented if C_FSL_LINKS is greater than 0 and C_FSL_EXCEPTION
is set to 1.
Figure 2-9: EDR
Table 2-15: Exception Data Register (EDR)
Bits | Name | Description | Reset Value
0:31 | EDR | Exception Data Register | 0x00000000
Stack Low Register (SLR)
The Stack Low Register stores the stack low limit used to detect stack overflow. When the
address of a load or store instruction using the stack pointer (register R1) as rA is less than
the Stack Low Register, a stack overflow occurs, causing a Stack Protection Violation
exception if exceptions are enabled in MSR.
When read with the MFS instruction, the SLR is specified by setting Sa = 0x0800.
Figure 2-10 illustrates the SLR register and Table 2-16 provides bit descriptions and reset
values.
Note: The register is only implemented if stack protection is enabled by setting the parameter
C_USE_STACK_PROTECTION to 1. If stack protection is not implemented, writing to the register has
no effect.
Note: Stack protection is not available when the MMU is enabled (C_USE_MMU > 0). With the MMU,
page-based memory protection is provided through the UTLB instead.
Figure 2-10: SLR
Table 2-16: Stack Low Register (SLR)
Bits | Name | Description | Reset Value
0:31 | SLR | Stack Low Register | 0x00000000
Stack High Register (SHR)
The Stack High Register stores the stack high limit used to detect stack underflow. When the
address of a load or store instruction using the stack pointer (register R1) as rA is greater
than the Stack High Register, a stack underflow occurs, causing a Stack Protection Violation
exception if exceptions are enabled in MSR.
When read with the MFS instruction, the SHR is specified by setting Sa = 0x0802. The
following figure illustrates the SHR register and Table 2-17 provides bit descriptions and
reset values.
Note: The register is only implemented if stack protection is enabled by setting the parameter
C_USE_STACK_PROTECTION to 1. If stack protection is not implemented, writing to the register has
no effect.
Note: Stack protection is not available when the MMU is enabled (C_USE_MMU > 0). With the MMU,
page-based memory protection is provided through the UTLB instead.
Figure 2-11: SHR
Table 2-17: Stack High Register (SHR)
Bits | Name | Description | Reset Value
0:31 | SHR | Stack High Register | 0xFFFFFFFF
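The two limit checks can be summarized in a small Python sketch. The function name stack_access_status is hypothetical, and its default arguments are the SLR and SHR reset values from Table 2-16 and Table 2-17. The sketch classifies an address only; whether a Stack Protection Violation exception is actually raised also depends on exceptions being enabled in MSR.

```python
def stack_access_status(address: int,
                        slr: int = 0x00000000,
                        shr: int = 0xFFFFFFFF) -> str:
    """Classify a load/store address that uses R1 as rA against SLR and SHR."""
    if address < slr:
        return "overflow"   # below the stack low limit
    if address > shr:
        return "underflow"  # above the stack high limit
    return "ok"
```

With the reset values every 32-bit address is in range, so no violation can occur until software programs the limits.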
Process Identifier Register (PID)
The Process Identifier Register is used to uniquely identify a software process during MMU
address translation. It is controlled by the C_USE_MMU configuration option on MicroBlaze.
The register is only implemented if C_USE_MMU is greater than 1 (User Mode) and
C_AREA_OPTIMIZED is set to 0 (Performance) or 2 (Frequency).
When accessed with the MFS and MTS instructions, the PID is specified by setting Sa =
0x1000. The register is accessible according to the memory management special registers
parameter C_MMU_TLB_ACCESS.
PID is also used when accessing a TLB entry:
•When writing Translation Look-Aside Buffer High (TLBHI) the value of PID is stored in
the TID field of the TLB entry
•When reading TLBHI and MSR[UM] is not set, the value in the TID field is stored in PID
The following figure illustrates the PID register and Table 2-18 provides bit descriptions and
reset values.
Figure 2-12: PID
Table 2-18: Process Identifier Register (PID)
Bits | Name | Description | Reset Value
0:23 | Reserved | |
24:31 | PID | Used to uniquely identify a software process during MMU address translation. Read/Write | 0x00
Zone Protection Register (ZPR)
The Zone Protection Register is used to override MMU memory protection defined in TLB
entries. It is controlled by the C_USE_MMU configuration option on MicroBlaze. The register
is only implemented if C_USE_MMU is greater than 1 (User Mode), C_AREA_OPTIMIZED is set
to 0 (Performance) or 2 (Frequency), and if the number of specified memory protection
zones is greater than zero (C_MMU_ZONES > 0). The implemented register bits depend on the
number of specified memory protection zones (C_MMU_ZONES). When accessed with the
MFS and MTS instructions, the ZPR is specified by setting Sa = 0x1001. The register is
accessible according to the memory management special registers parameter
C_MMU_TLB_ACCESS.
The following figure illustrates the ZPR register and Table 2-19 provides bit descriptions
and reset values.
Figure 2-13: ZPR
Table 2-19: Zone Protection Register (ZPR)
Bits | Name | Description | Reset Value
0:1, 2:3, ..., 30:31 | ZP0, ZP1, ..., ZP15 | Zone Protect, encoded as listed below |
User mode (MSR[UM] = 1):
00 = Override V in TLB entry. No access to the page is allowed
01 = No override. Use V, WR and EX from TLB entry
10 = No override. Use V, WR and EX from TLB entry
11 = Override WR and EX in TLB entry. Access the page as writable and executable
Privileged mode (MSR[UM] = 0):
00 = No override. Use V, WR and EX from TLB entry
01 = No override. Use V, WR and EX from TLB entry
10 = Override WR and EX in TLB entry. Access the page as writable and executable
11 = Override WR and EX in TLB entry. Access the page as writable and executable
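The zone protect encoding can be expressed as a lookup, shown here as a Python sketch. The function names zone_field and effective_access are hypothetical. The sketch assumes the bit numbering of the table, where ZP0 occupies bits 0:1 (the two most significant bits), and returns the effective (valid, writable, executable) permissions for a TLB entry.

```python
def zone_field(zpr: int, zone: int) -> int:
    """Return the 2-bit ZP field for zone 0..15 (ZP0 is in the two MSBs)."""
    if not 0 <= zone <= 15:
        raise ValueError("zone must be in the range 0..15")
    return (zpr >> (30 - 2 * zone)) & 0b11


def effective_access(zpr: int, zone: int, user_mode: bool,
                     v: bool, wr: bool, ex: bool) -> tuple:
    """Apply the ZPR override to the V, WR and EX attributes of a TLB entry."""
    zp = zone_field(zpr, zone)
    if user_mode:
        if zp == 0b00:
            return (False, False, False)  # V overridden: no access allowed
        if zp == 0b11:
            return (v, True, True)        # WR and EX overridden
        return (v, wr, ex)                # 01 and 10: no override
    if zp in (0b10, 0b11):
        return (v, True, True)            # privileged: WR and EX overridden
    return (v, wr, ex)                    # privileged 00 and 01: no override
```

A ZPR value of 0x00000000 therefore blocks all user-mode access to pages in every zone, while privileged accesses still follow the TLB entry.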
Translation Look-Aside Buffer Low Register (TLBLO)
The Translation Look-Aside Buffer Low Register is used to access MMU Unified Translation
Look-Aside Buffer (UTLB) entries. It is controlled by the C_USE_MMU configuration option on
MicroBlaze. The register is only implemented if C_USE_MMU is greater than 1 (User Mode),
and C_AREA_OPTIMIZED is set to 0 (Performance) or 2 (Frequency). When accessed with the
MFS and MTS instructions, the TLBLO is specified by setting Sa = 0x1003.
When reading or writing TLBLO, the UTLB entry indexed by the TLBX register is accessed.
The register is readable according to the memory management special registers parameter
C_MMU_TLB_ACCESS.
When the MMU Physical Address Extension (PAE) is enabled (parameters C_USE_MMU = 3
and C_ADDR_SIZE > 32), the 32 least significant bits of TLBLO are accessed with the MFS
and MTS instructions, and the most significant bits with the MFSE and MTSE instructions.
When writing the register with PAE enabled, the most significant bits must be written first.
The UTLB is reset on bit stream download (reset value is 0x00000000 for all TLBLO entries).
Note: The UTLB is not reset by the external reset inputs: Reset and Debug_Rst. This means that
the entire UTLB must be initialized after reset, to avoid any stale data.
The following figure illustrates the TLBLO register and Table 2-20 provides bit descriptions
and reset values. When PAE is enabled, the RPN field of the register is extended according
to the C_ADDR_SIZE parameter up to 54 bits to be able to hold up to a 64-bit physical
address.
RPN: When a TLB hit occurs, this field is read from the TLB entry and is used to form the physical address. Depending on the value of the SIZE field, some of the RPN bits are not used in the physical address. Software must clear unused bits in this field to zero. Only defined when C_USE_MMU=3 (Virtual). Read/Write. Reset value: 0x000000
EX: Executable. When bit is set to 1, the page contains executable code, and instructions can be fetched from the page. When bit is cleared to 0, instructions cannot be fetched from the page. Attempts to fetch instructions from a page with a clear EX bit cause an instruction storage exception. Read/Write. Reset value: 0
WR: Writable. When bit is set to 1, the page is writable and store instructions can be used to store data at addresses within the page. When bit is cleared to 0, the page is read-only (not writable). Attempts to store data into a page with a clear WR bit cause a data storage exception. Read/Write. Reset value: 0
ZSEL (bits 24:27, or n-8:n-5): Zone Select. This field selects one of 16 zone fields (Z0-Z15) from the zone protection register (ZPR). For example, if ZSEL is 0x5, zone field Z5 is selected. The selected ZPR field is used to modify the access protection specified by the TLB entry EX and WR fields. It is also used to prevent access to a page by overriding the TLB V (valid) field.
Write-through control bit: When the parameter C_DCACHE_USE_WRITEBACK is set to 1, this bit controls caching policy. A write-through policy is selected when set to 1, and a write-back policy is selected otherwise. This bit is fixed to 1, and write-through is always used, when C_DCACHE_USE_WRITEBACK is cleared to 0. Read/Write. Reset value: 0/1
I: Inhibit Caching. When bit is set to 1, accesses to the page are not cached (caching is inhibited). When cleared to 0, accesses to the page are cacheable. Read/Write. Reset value: 0
M: Memory Coherent. This bit is fixed to 0, because memory coherence is not implemented on MicroBlaze. Read Only. Reset value: 0
G: Guarded. When bit is set to 1, speculative page accesses are not allowed (memory is guarded). When cleared to 0, speculative page accesses are allowed. The G attribute can be used to protect memory-mapped I/O devices from inappropriate instruction accesses. Read/Write. Reset value: 0
1. The bit index n = C_ADDR_SIZE applies when PAE is enabled.
Translation Look-Aside Buffer High Register (TLBHI)
The Translation Look-Aside Buffer High Register is used to access MMU Unified Translation
Look-Aside Buffer (UTLB) entries. It is controlled by the C_USE_MMU configuration option on
MicroBlaze. The register is only implemented if C_USE_MMU is greater than 1 (User Mode),
and C_AREA_OPTIMIZED is set to 0 (Performance) or 2 (Frequency). When accessed with the
MFS and MTS instructions, the TLBHI is specified by setting Sa = 0x1004. When reading or
writing TLBHI, the UTLB entry indexed by the TLBX register is accessed.
The register is readable according to the memory management special registers parameter
C_MMU_TLB_ACCESS.
PID is also used when accessing a TLB entry:
•When writing TLBHI the value of PID is stored in the TID field of the TLB entry
•When reading TLBHI and MSR[UM] is not set, the value in the TID field is stored in PID
The UTLB is reset on bit stream download (reset value is 0x00000000 for all TLBHI entries).
Note: The UTLB is not reset by the external reset inputs: Reset and Debug_Rst.
The following figure illustrates the TLBHI register and Table 2-21 provides bit descriptions
and reset values.
Figure 2-15: TLBHI
Table 2-21: Translation Look-Aside Buffer High Register (TLBHI)
Bits | Name | Description | Reset Value
0:21 | TAG | TLB-entry tag. Is compared with the page number portion of the virtual memory address under the control of the SIZE field. Read/Write | 0x000000
22:24 | SIZE | Size. Specifies the page size. The SIZE field controls the bit range used in comparing the TAG field with the page number portion of the virtual memory address. The page sizes defined by this field are listed in Table 2-38. Read/Write | 000
25 | V | Valid. When this bit is set to 1, the TLB entry is valid and contains a page-translation entry. When cleared to 0, the TLB entry is invalid. Read/Write | 0
26 | E | Endian. When this bit is set to 1, the page is accessed as a big endian page. When cleared to 0, the page is accessed as a little endian page. The E bit only affects data read or data write accesses. Instruction accesses are not affected. The E bit is only implemented when the parameter C_USE_REORDER_INSTR is set to 1, otherwise it is fixed to 0. Read/Write | 0
27 | U0 | User Defined. This bit is fixed to 0, since there are no user defined storage attributes on MicroBlaze. Read Only | 0
28:31 | Reserved | |
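To make the TLBHI layout concrete, the following Python sketch packs and unpacks the fields of Table 2-21. The helper names msb0_field, encode_tlbhi and decode_tlbhi are hypothetical. The sketch works on plain integers and assumes the table's numbering, where bit 0 is the most significant bit of the 32-bit word.

```python
def msb0_field(value: int, first_bit: int, last_bit: int) -> int:
    """Extract bits first_bit:last_bit of a 32-bit word, bit 0 being the MSB."""
    width = last_bit - first_bit + 1
    return (value >> (31 - last_bit)) & ((1 << width) - 1)


def encode_tlbhi(tag: int, size: int, valid: int, endian: int = 0) -> int:
    """Build a TLBHI word; U0 and the reserved bits are left at 0."""
    return (((tag & 0x3FFFFF) << 10)   # TAG, bits 0:21
            | ((size & 0x7) << 7)      # SIZE, bits 22:24
            | ((valid & 0x1) << 6)     # V, bit 25
            | ((endian & 0x1) << 5))   # E, bit 26


def decode_tlbhi(tlbhi: int) -> dict:
    """Split a TLBHI word into its named fields."""
    return {
        "TAG": msb0_field(tlbhi, 0, 21),
        "SIZE": msb0_field(tlbhi, 22, 24),
        "V": msb0_field(tlbhi, 25, 25),
        "E": msb0_field(tlbhi, 26, 26),
        "U0": msb0_field(tlbhi, 27, 27),
    }
```

The meaning of each SIZE code is not modeled here, since the page size table is outside this section.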
Translation Look-Aside Buffer Index Register (TLBX)
The Translation Look-Aside Buffer Index Register is used as an index to the Unified
Translation Look-Aside Buffer (UTLB) when accessing the TLBLO and TLBHI registers. It is
controlled by the C_USE_MMU configuration option on MicroBlaze. The register is only
implemented if C_USE_MMU is greater than 1 (User Mode), and C_AREA_OPTIMIZED is set to
0 (Performance) or 2 (Frequency). When accessed with the MFS and MTS instructions, the
TLBX is specified by setting Sa = 0x1002.
The following figure illustrates the TLBX register and Table 2-22 provides bit descriptions
and reset values.
Figure 2-16: TLBX
Table 2-22: Translation Look-Aside Buffer Index Register (TLBX)
Bits | Name | Description | Reset Value
0 | MISS | TLB Miss. This bit is cleared to 0 when the TLBSX register is written with a virtual address, and the virtual address is found in a TLB entry. The bit is set to 1 if the virtual address is not found. It is also cleared when the TLBX register itself is written. Read Only. Can be read if the memory management special registers parameter C_MMU_TLB_ACCESS > 0 (MINIMAL). | 0
1:25 | Reserved | |
26:31 | INDEX | TLB Index. This field is used to index the Translation Look-Aside Buffer entry accessed by the TLBLO and TLBHI registers. The field is updated with a TLB index when the TLBSX register is written with a virtual address, and the virtual address is found in the corresponding TLB entry. Read/Write. Can be read and written if the memory management special registers parameter C_MMU_TLB_ACCESS > 0 (MINIMAL). | 000000
Translation Look-Aside Buffer Search Index Register (TLBSX)
The Translation Look-Aside Buffer Search Index Register (TLBSX) is used to search for a
virtual page number in the Unified Translation Look-Aside Buffer (UTLB). It is controlled by
the C_USE_MMU configuration option on the MicroBlaze processor.
The register is only implemented if C_USE_MMU is greater than 1 (User Mode), and
C_AREA_OPTIMIZED is set to 0 (Performance) or 2 (Frequency).
When written with the MTS instruction, the TLBSX is specified by setting Sa = 0x1005. The
following figure illustrates the TLBSX register and Table 2-23 provides bit descriptions and
reset values.
Figure 2-17: TLBSX
Table 2-23: Translation Look-Aside Buffer Index Search Register (TLBSX)
Bits | Name | Description | Reset Value
0:21 | VPN | Virtual Page Number. This field represents the page number portion of the virtual memory address. It is compared with the page number portion of the virtual memory address under the control of the SIZE field, in each of the Translation Look-Aside Buffer entries that have the V bit set to 1. If the virtual page number is found, the TLBX register is written with the index of the TLB entry and the MISS bit in TLBX is cleared to 0. If the virtual page number is not found in any of the TLB entries, the MISS bit in the TLBX register is set to 1. Write Only |
22:31 | Reserved | |
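The interaction between a TLBSX write and the TLBX register can be sketched with a simplified Python model. The function search_utlb is hypothetical, and the model makes two assumptions that go beyond this section: it compares full 22-bit tags and ignores the SIZE-controlled masking, and on a miss it keeps the previous INDEX value, which the text leaves unspecified.

```python
TLBX_MISS = 1 << 31        # MISS is bit 0, the most significant bit
TLBX_INDEX_MASK = 0x3F     # INDEX is bits 26:31, the six least significant bits


def search_utlb(entries, vpn: int, tlbx: int = 0) -> int:
    """Return the new TLBX value after writing vpn to TLBSX.

    entries is a list of (tag, valid) tuples, one per UTLB entry.
    Simplification: tags are compared in full, without SIZE masking.
    """
    for index, (tag, valid) in enumerate(entries):
        if valid and tag == vpn:
            # Hit: MISS is cleared and INDEX is set to the matching entry.
            return index & TLBX_INDEX_MASK
    # Miss: MISS is set; keeping the old INDEX is an assumption of this model.
    return (tlbx & TLBX_INDEX_MASK) | TLBX_MISS
```

After a hit, the returned INDEX selects the entry that subsequent TLBLO and TLBHI accesses operate on.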
Processor Version Register (PVR)
The Processor Version Register is controlled by the C_PVR configuration option on
MicroBlaze.
•When C_PVR is set to 0 (None) the processor does not implement any PVR and
MSR[PVR]=0.
•When C_PVR is set to 1 (Basic), MicroBlaze implements only the first register: PVR0, and
if set to 2 (Full), all 13 PVR registers (PVR0 to PVR12) are implemented.
When read with the MFS or MFSE instruction the PVR is specified by setting Sa = 0x200x,
with x being the register number between 0x0 and 0xC.
When extended data addressing is enabled (parameter C_ADDR_SIZE > 32), the 32 least
significant bits of PVR8 and PVR9 are read with the MFS instruction, and the most
significant bits with the MFSE instruction.
When physical address extension (PAE) is enabled (parameters C_USE_MMU = 3 and
C_ADDR_SIZE > 32), the 32 least significant bits of PVR6 and PVR7 are read with the MFS
instruction, and the most significant bits with the MFSE instruction.
Table 2-24 through Table 2-36 provide bit descriptions and values.
0:31 | VECTORS | Location of MicroBlaze vectors | C_BASE_VECTORS
Pipeline Architecture
MicroBlaze instruction execution is pipelined. For most instructions, each stage takes one
clock cycle to complete. Consequently, the number of clock cycles necessary for a specific
instruction to complete is equal to the number of pipeline stages, and one instruction is
completed on every cycle in the absence of data, control or structural hazards.
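The cycle count implied by this paragraph can be written as a small Python sketch. The function total_cycles is hypothetical. It assumes an initially empty pipeline and treats all stall cycles as one lump sum added to the ideal count.

```python
def total_cycles(instruction_count: int, stages: int, stall_cycles: int = 0) -> int:
    """Cycles to complete a run of instructions on an initially empty pipeline.

    The first instruction needs one cycle per stage; each further instruction
    completes one cycle later, plus any cycles lost to stalls.
    """
    if instruction_count <= 0:
        return 0
    return stages + (instruction_count - 1) + stall_cycles
```

For example, three instructions on the three stage pipeline take 5 cycles without hazards, and 7 cycles when one of them spends two extra cycles in the execute stage, as in the three stage pipeline example later in this chapter.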
A data hazard occurs when the result of an instruction is needed by a subsequent
instruction. This can result in stalling the pipeline, unless the result can be forwarded to the
subsequent instruction. The MicroBlaze GNU Compiler attempts to avoid data hazards by
reordering instructions during optimization.
A control hazard occurs when a branch is taken, and the next instruction is not immediately
available. This results in stalling the pipeline. MicroBlaze provides delay slot branches and
the optional branch target cache to reduce the number of stall cycles.
A structural hazard occurs for a few instructions that require multiple clock cycles in the
execute stage or a later stage to complete. This is handled by stalling the pipeline.
Load and store instructions accessing slower memory might take multiple cycles. The
pipeline is stalled until the access completes. MicroBlaze provides the optional data cache
to improve the average latency of slower memory.
When executing from slower memory, instruction fetches might take multiple cycles. This
additional latency directly affects the efficiency of the pipeline. MicroBlaze implements an
instruction prefetch buffer that reduces the impact of such multi-cycle instruction memory
latency. While the pipeline is stalled for any other reason, the prefetch buffer continues to
load sequential instructions speculatively. When the pipeline resumes execution, the fetch
stage can load new instructions directly from the prefetch buffer instead of waiting for the
instruction memory access to complete.
If instructions are modified during execution (for example with self-modifying code), the
prefetch buffer should be emptied before executing the modified instructions, to ensure
that it does not contain the old unmodified instructions.
RECOMMENDED: The recommended way to do this is using an MBAR instruction, although it is also
possible to use a synchronizing branch instruction, for example BRI 4.
MicroBlaze also provides the optional instruction cache to improve the average instruction
fetch latency of slower memory.
All hazards are independent, and can potentially occur simultaneously. In such cases, the
number of cycles the pipeline is stalled is defined by the hazard with the longest stall
duration.
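In other words, simultaneous stalls overlap rather than add up. A minimal Python sketch of that rule, with a hypothetical function name:

```python
def combined_stall(*stall_cycles: int) -> int:
    """Stall length when several hazards occur at once: the longest one wins."""
    return max(stall_cycles, default=0)
```

A 1 cycle data hazard that coincides with a 3 cycle memory access stall therefore costs 3 cycles, not 4.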
Three Stage Pipeline
With C_AREA_OPTIMIZED set to 1 (Area), the pipeline is divided into three stages to
minimize hardware cost: Fetch, Decode, and Execute.
 | cycle1 | cycle2 | cycle3 | cycle4 | cycle5 | cycle6 | cycle7
instruction 1 | Fetch | Decode | Execute | | | |
instruction 2 | | Fetch | Decode | Execute | Execute | Execute |
instruction 3 | | | Fetch | Decode | Stall | Stall | Execute
The three stage pipeline does not have any data hazards. Pipeline stalls are caused by
control hazards, structural hazards due to multi-cycle instructions, memory accesses using
slower memory, instruction fetch from slower memory, or stream accesses.
The multi-cycle instruction categories are barrel shift, multiply, divide and floating-point
instructions.
Five Stage Pipeline
With C_AREA_OPTIMIZED set to 0 (Performance), the pipeline is divided into five stages to
maximize performance: Fetch (IF), Decode (OF), Execute (EX), Access Memory (MEM), and
Writeback (WB).
The five stage pipeline has two kinds of data hazard:
•An instruction in OF needs the result from an instruction in EX as a source operand. In
this case, the EX instruction categories are load, store, barrel shift, multiply, divide, and
floating-point instructions. This results in a 1-2 cycle stall.
•An instruction in OF uses the result from an instruction in MEM as a source operand. In
this case, the MEM instruction categories are load, multiply, and floating-point
instructions. This results in a 1 cycle stall.
Pipeline stalls are caused by data hazards, control hazards, structural hazards due to
multi-cycle instructions, memory accesses using slower memory, instruction fetch from slower
memory, or stream accesses.
The multi-cycle instruction categories are divide and floating-point instructions.
Eight Stage Pipeline
With C_AREA_OPTIMIZED set to 2 (Frequency), the pipeline is divided into eight stages to
maximize possible frequency: Fetch (IF), Decode (OF), Execute (EX), Access Memory 0 (M0),
Access Memory 1 (M1), Access Memory 2 (M2), Access Memory 3 (M3) and Writeback (WB).
The eight stage pipeline has four kinds of data hazard:
•An instruction in OF needs the result from an instruction in EX as a source operand. In
this case, the EX instruction categories are load, store, barrel shift, multiply, divide, and
floating-point instructions. This results in a 1-5 cycle stall.
•An instruction in OF uses the result from an instruction in M0 as a source operand. In
this case, the M0 instruction categories are load, multiply, divide, and floating-point
instructions. This results in a 1-4 cycle stall.
•An instruction in OF uses the result from an instruction in M1 or M2 as a source
operand. In this case, the M1 or M2 instruction categories are load, divide, and
floating-point instructions. This results in a 1-3 or 1-2 cycle stall respectively.
•An instruction in OF uses the result from an instruction in M3 as a source operand. In
this case, M3 instruction categories are load and floating-point instructions. This results
in a 1 cycle stall.
In addition to multi-cycle instructions, there are three other kinds of structural hazards:
•An instruction in OF is a stream instruction, and the instruction in EX is a stream, load,
store, divide, or floating-point instruction with corresponding exception implemented.
This results in a 1 cycle stall.
•An instruction in OF is a stream instruction, and the instruction in M0, M1, M2 or M3 is
a load, store, divide, or floating-point instruction with corresponding exception
implemented. This results in a 1 cycle stall.
•An instruction in M0 is a load or store instruction, and the instruction in M1, M2 or M3
is a load, store, divide, or floating-point instruction with corresponding exception
implemented. This results in a 1 cycle stall.
Pipeline stalls are caused by data hazards, control hazards, structural hazards, memory
accesses using slower memory, instruction fetch from slower memory, or stream accesses.
The multi-cycle instruction categories are divide instructions and floating-point instructions
FDIV, FINT, and FSQRT.
Branches
Normally the instructions in the fetch and decode stages (as well as prefetch buffer) are
flushed when executing a taken branch. The fetch pipeline stage is then reloaded with a new
instruction from the calculated branch address. A taken branch in MicroBlaze takes three
clock cycles to execute, two of which are required for refilling the pipeline. To reduce this
latency overhead, MicroBlaze supports branches with delay slots and the optional branch
target cache.
Delay Slots
When executing a taken branch with delay slot, only the fetch pipeline stage in MicroBlaze
is flushed. The instruction in the decode stage (branch delay slot) is allowed to complete.
This technique effectively reduces the branch penalty from two clock cycles to one. Branch
instructions with delay slots have a D appended to the instruction mnemonic. For example,
the BNE instruction does not execute the subsequent instruction (does not have a delay
slot), whereas BNED executes the next instruction before control is transferred to the
branch location.
A delay slot must not contain the following instructions: IMM, branch, or break. Interrupts
and external hardware breaks are deferred until after the delay slot branch has been
completed. Instructions that could cause recoverable exceptions (for example unaligned
word or halfword load and store) are allowed in the delay slot.
If an exception is caused in a delay slot the ESR[DS] bit is set, and the exception handler is
responsible for returning the execution to the branch target (stored in the special purpose
register BTR). If the ESR[DS] bit is set, register R17 is not valid (otherwise it contains the
address following the instruction causing the exception).
Branch Target Cache
To improve branch performance, MicroBlaze provides a branch target cache (BTC) coupled
with a branch prediction scheme. With the BTC enabled, a correctly predicted immediate
branch or return instruction incurs no overhead.
The BTC operates by saving the target address of each immediate branch and return
instruction the first time the instruction is encountered. The next time it is encountered, it
is usually found in the Branch Target Cache, and the Instruction Fetch Program Counter is
then simply changed to the saved target address, in case the branch should be taken.
Unconditional branches and return instructions are always taken, whereas conditional
branches use branch prediction, to avoid taking a branch that should not have been taken
and vice versa.
The BTC is cleared when a memory barrier (MBAR 0) or synchronizing branch (BRI 4) is
executed. This also occurs when the memory barrier or synchronizing branch follows
immediately after a branch instruction, even if that branch is taken. To avoid inadvertently
clearing the BTC, the memory barrier or synchronizing branch should not be placed
immediately after a branch instruction.
There are three cases where the branch prediction can cause a mispredict, namely:
•A conditional branch that should not have been taken, is actually taken,
•A conditional branch that should actually have been taken, is not taken,
•The target address of a return instruction is incorrect, which might occur when
returning from a function called from different places in the code.
All of these cases are detected and corrected when the branch or return instruction reaches
the execute stage, and the branch prediction bits or target address are updated in the BTC,
to reflect the actual instruction behavior. This correction incurs a penalty of 2 clock cycles
for the 5-stage pipeline and 7-9 clock cycles for the 8-stage pipeline.
The size of the BTC can be selected with C_BRANCH_TARGET_CACHE_SIZE. The default
recommended setting uses one block RAM, and provides 512 entries. When selecting 64
entries or below, distributed RAM is used to implement the BTC, otherwise block RAM is
used.
When the BTC uses block RAM, and C_FAULT_TOLERANT is set to 1, block RAMs are
protected by parity. In case of a parity error, the branch is not predicted. To avoid
accumulating errors in this case, the BTC should be cleared periodically by a synchronizing
branch.
The Branch Target Cache is available when C_USE_BRANCH_TARGET_CACHE is set to 1 and
C_AREA_OPTIMIZED is set to 0 (Performance) or 2 (Frequency).
Pipeline Hazard Example
The effect of a data hazard is illustrated in Table 2-37, using the five stage pipeline.
The example shows a data hazard for a multiplication instruction, where the subsequent
add instruction needs the result in register r3 to proceed. This means that the add
instruction is stalled in OF during cycle 3 and 4 until the multiplication is complete.
Table 2-37: Multiplication Data Hazard Example
Cycle  IF              OF              EX              MEM             WB
1      mul r3, r4, r5
2      add r6, r3, r4  mul r3, r4, r5
3                      add r6, r3, r4  mul r3, r4, r5
4                      add r6, r3, r4  -               mul r3, r4, r5
5                      add r6, r3, r4  -               -               mul r3, r4, r5
6                                      add r6, r3, r4  -               -
Avoiding Data Hazards
In some cases, the MicroBlaze GNU Compiler is not able to optimize code to completely
avoid data hazards. However, it is often possible to change the source code in order to
achieve this, mainly by better utilization of the general purpose registers.
Two C code examples are shown here:
•Multiplication of a static array in memory
static int a[4], b[4], c[4];
register int a0, a1, a2, a3, b0, b1, b2, b3, c0, c1, c2, c3;
This code ensures that load instructions are first executed to load operands into
separate registers, which are then multiplied and finally stored. The code can be
extended up to 8 multiplications without running out of general purpose registers.
•Fetching a data packet from an AXI4-Stream interface.
This code ensures that get instructions using different registers are first executed, and
then data is stored. The code can be extended to up to 16 accesses without running out
of general purpose registers.
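The listings for these two examples are incomplete in this text. The following is a minimal sketch of the pattern they describe, not the original code. It assumes four-element arrays, a hypothetical STREAM_GET wrapper, and (on MicroBlaze) the getfsl macro from the BSP header fsl.h reading stream link 0. On other hosts a simple counter stands in for the stream so that the sketch can be compiled anywhere.

```c
#ifdef __MICROBLAZE__
#include "fsl.h"                      /* assumed BSP header providing getfsl() */
#define STREAM_GET(val) getfsl(val, 0)
#else
/* Host stand-in for the stream link (assumption, for illustration only). */
static unsigned int mock_stream_word = 100;
#define STREAM_GET(val) ((val) = mock_stream_word++)
#endif

static int a[4] = {1, 2, 3, 4};
static int b[4] = {5, 6, 7, 8};
static int c[4];
static unsigned int packet[4];

void multiply_arrays(void)
{
    register int a0, a1, a2, a3, b0, b1, b2, b3, c0, c1, c2, c3;

    /* Load every operand into its own register first. */
    a0 = a[0]; a1 = a[1]; a2 = a[2]; a3 = a[3];
    b0 = b[0]; b1 = b[1]; b2 = b[2]; b3 = b[3];

    /* Independent multiplications: no result is consumed immediately. */
    c0 = a0 * b0; c1 = a1 * b1; c2 = a2 * b2; c3 = a3 * b3;

    /* Store the results last. */
    c[0] = c0; c[1] = c1; c[2] = c2; c[3] = c3;
}

void fetch_packet(void)
{
    register unsigned int w0, w1, w2, w3;

    /* Issue all stream reads into different registers first ... */
    STREAM_GET(w0); STREAM_GET(w1); STREAM_GET(w2); STREAM_GET(w3);

    /* ... then store the received words. */
    packet[0] = w0; packet[1] = w1; packet[2] = w2; packet[3] = w3;
}
```

The compiler remains free to reorder these statements, so the generated assembly should be inspected to confirm that dependent instructions are actually separated.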
Memory Architecture
MicroBlaze is implemented with a Harvard memory architecture; instruction and data
accesses are done in separate address spaces.
The instruction address space has a 32-bit virtual address range (that is, handles up to 4GB
of instructions), and can be extended up to a 64-bit physical address range when using the
MMU in virtual mode.
The data address space has a default 32-bit range, and can be extended up to a 64-bit
range (that is, handles from 4GB to 16EB of data). The instruction and data memory ranges
can be made to overlap by mapping them both to the same physical memory. The latter is
necessary for software debugging.
Both instruction and data interfaces of MicroBlaze are default 32 bits wide and use big
endian or little endian, bit-reversed format, depending on the selected endianness.
MicroBlaze supports word, halfword, and byte accesses to data memory.
Big endian format is only supported when using the MMU in virtual or protected mode
(C_USE_MMU > 1) or when reorder instructions are enabled (C_USE_REORDER_INSTR = 1).
Data accesses must be aligned (word accesses must be on word boundaries, halfword on
halfword boundaries), unless the processor is configured to support unaligned exceptions.
All instruction accesses must be word aligned.
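The alignment rules above amount to a power-of-two mask test. The helpers below are a small sketch with hypothetical names; size_bytes is assumed to be 1 (byte), 2 (halfword) or 4 (word).

```c
#include <stdbool.h>
#include <stdint.h>

/* True when a data access of size_bytes (1, 2 or 4) is naturally aligned. */
bool data_access_is_aligned(uint32_t addr, unsigned size_bytes)
{
    return (addr & (size_bytes - 1u)) == 0u;
}

/* Instruction accesses must always be word aligned. */
bool instruction_fetch_is_aligned(uint32_t addr)
{
    return (addr & 3u) == 0u;
}
```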
MicroBlaze prefetches instructions to improve performance, using the instruction prefetch
buffer and (if enabled) instruction cache streams. To avoid attempts to prefetch instructions
beyond the end of physical memory, which might cause an instruction bus error or a
processor stall, instructions must not be located too close to the end of physical memory.
The instruction prefetch buffer requires 16 bytes margin, and using instruction cache
streams adds two additional cache lines (32, 64 or 128 bytes).
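The margin can be computed from the cache line length. The sketch below uses a hypothetical function name and reads the text above as 16 bytes for the prefetch buffer plus two cache lines when instruction cache streams are used.

```c
/* Bytes to keep free between the last instruction and the end of physical
   memory. cache_line_words is 4, 8 or 16; use_cache_streams is non-zero
   when instruction cache streams are enabled. */
unsigned instruction_end_margin_bytes(unsigned cache_line_words,
                                      int use_cache_streams)
{
    unsigned margin = 16u;                      /* instruction prefetch buffer */

    if (use_cache_streams) {
        margin += 2u * cache_line_words * 4u;   /* two additional cache lines */
    }
    return margin;
}
```

Under this reading, 8-word cache lines with streams enabled give a margin of 16 + 64 = 80 bytes.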
MicroBlaze does not separate data accesses to I/O and memory (it uses memory-mapped
I/O). The processor has up to three interfaces for memory accesses:
•Local Memory Bus (LMB)
•Advanced eXtensible Interface (AXI4) for peripheral access
•Advanced eXtensible Interface (AXI4) or AXI Coherency Extension (ACE) for cache
access
The LMB memory address range must not overlap with AXI4 ranges.
The C_ENDIANNESS parameter is always set to little endian.
MicroBlaze has a single cycle latency for accesses to local memory (LMB) and for cache read
hits, except with C_AREA_OPTIMIZED set to 1 (Area), when data side accesses and data
cache read hits require two clock cycles, and with C_FAULT_TOLERANT set to 1, when byte
writes and halfword writes to LMB normally require two clock cycles.
The data cache write latency depends on C_DCACHE_USE_WRITEBACK. When
C_DCACHE_USE_WRITEBACK is set to 1, the write latency normally is one cycle (more if the
cache needs to do memory accesses). When C_DCACHE_USE_WRITEBACK is cleared to 0, the
write latency normally is two cycles (more if the posted-write buffer in the memory
controller is full).
The MicroBlaze instruction and data caches can be configured to use 4, 8 or 16 word cache
lines. When using a longer cache line, more bytes are prefetched, which generally improves
performance for software with sequential access patterns. However, for software with a
more random access pattern the performance can instead decrease for a given cache size.
This is caused by a reduced cache hit rate due to fewer available cache lines.
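The trade-off is easy to quantify: for a fixed cache size, the number of cache lines falls as the line length grows. The helper below is a sketch with a hypothetical name.

```c
/* Number of cache lines for a given cache size and line length.
   line_words is 4, 8 or 16; each word is 4 bytes. */
unsigned cache_line_count(unsigned cache_size_bytes, unsigned line_words)
{
    return cache_size_bytes / (line_words * 4u);
}
```

For example, an 8 kB cache holds 512 lines of 4 words but only 128 lines of 16 words.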
For details on the different memory interfaces, see Chapter 3, MicroBlaze Signal Interface
Description.
Privileged Instructions
The following MicroBlaze instructions are privileged:
•GET, GETD, PUT, PUTD (except when explicitly allowed)
•WIC, WDC
•MTS, MTSE
•MSRCLR, MSRSET (except when only the C bit is affected)
•BRK
•RTID, RTBD, RTED
•BRKI (except when jumping to physical address C_BASE_VECTORS + 0x8 or
C_BASE_VECTORS + 0x18)
•SLEEP, HIBERNATE, SUSPEND
•LBUEA, LHUEA, LWEA, SBEA, SHEA, SWEA (except when explicitly allowed)
Attempted use of these instructions when running in user mode causes a privileged
instruction exception. When setting the parameter C_MMU_PRIVILEGED_INSTR to 1 or 3, the
instructions GET, GETD, PUT, and PUTD are not considered privileged, and can be executed
when running in user mode.
CAUTION! It is strongly discouraged to do this, unless absolutely necessary for performance reasons,
because it allows application processes to interfere with each other.
When setting the parameter C_MMU_PRIVILEGED_INSTR to 2 or 3, the extended address
instructions LBUEA, LHUEA, LWEA, SBEA, SHEA, and SWEA are not considered privileged, and
will bypass the MMU translation, treating the extended address as a physical address. This
is useful to run software in virtual mode while still having direct access to the full physical
address space, but is discouraged in all cases where protection between application
processes is necessary.
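The values 1, 2 and 3 of C_MMU_PRIVILEGED_INSTR described above behave like two independent flags. The sketch below decodes them that way; the function names are hypothetical, and treating the parameter as a bit field is an inference from the listed values, not a documented encoding.

```c
#include <stdbool.h>

/* Values 1 and 3: GET, GETD, PUT and PUTD may be executed in user mode. */
bool stream_instr_allowed_in_user_mode(unsigned c_mmu_privileged_instr)
{
    return (c_mmu_privileged_instr & 1u) != 0u;
}

/* Values 2 and 3: LBUEA, LHUEA, LWEA, SBEA, SHEA and SWEA may be executed
   in user mode. */
bool extended_addr_instr_allowed_in_user_mode(unsigned c_mmu_privileged_instr)
{
    return (c_mmu_privileged_instr & 2u) != 0u;
}
```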
There are six ways to leave user mode and virtual mode, including:
•Executing “BRALID Re,C_BASE_VECTORS + 0x8” to perform a user vector exception
•Executing “BRKI”, jumping to physical address C_BASE_VECTORS + 0x8 or
C_BASE_VECTORS + 0x18
In all of these cases, except hardware generated reset, the user mode and virtual mode
status is saved in the MSR UMS and VMS bits.
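The effect on the MSR can be modeled in a few lines of C. This is a sketch of the behavior described above, not a hardware description. The mask values are assumptions taken from the MSR bit layout (VMS, VM, UMS and UM at MSB-0 bit positions 17 to 20) and should be checked against the MSR register description.

```c
#include <stdint.h>

/* Assumed MSR masks; verify against the MSR register description. */
#define MSR_UM  0x00000800u   /* user mode */
#define MSR_UMS 0x00001000u   /* user mode save */
#define MSR_VM  0x00002000u   /* virtual mode */
#define MSR_VMS 0x00004000u   /* virtual mode save */

/* Hypothetical model: returns the MSR value after user mode and virtual
   mode are left (for any cause other than a hardware generated reset). */
uint32_t msr_after_leaving_user_mode(uint32_t msr)
{
    uint32_t result = msr & ~(MSR_UMS | MSR_VMS);

    if (msr & MSR_UM) {
        result |= MSR_UMS;            /* save user mode status */
    }
    if (msr & MSR_VM) {
        result |= MSR_VMS;            /* save virtual mode status */
    }
    return result & ~(MSR_UM | MSR_VM);   /* enter privileged, real mode */
}
```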
Application (user-mode) programs transfer control to system-service routines (privileged
mode programs) using the BRALID or BRKI instruction, jumping to physical address
C_BASE_VECTORS + 0x8. Executing this instruction causes a system-call exception to occur.
The exception handler determines which system-service routine to call and whether the
calling application has permission to call that service. If permission is granted, the
exception handler performs the actual procedure call to the system-service routine on
behalf of the application program.
The execution environment expected by the system-service routine requires the execution
of prologue instructions to set up that environment. Those instructions usually create the
block of storage that holds procedural information (the activation record), update and
initialize pointers, and save volatile registers (the registers that the system-service routine
uses). Prologue code can be inserted by the linker when creating an executable module, or
it can be included as stub code in either the system-call interrupt handler or the
system-library routines.
Returns from the system-service routine reverse the process described above. Epilogue
code is executed to unwind and deallocate the activation record, restore pointers, and
restore volatile registers. The interrupt handler executes a return from exception instruction
(RTED) to return to the application.
Virtual-Memory Management
Programs running on MicroBlaze use effective addresses to access a flat 4 GB address
space. The processor can interpret this address space in one of two ways, depending on the
translation mode:
•In real mode, effective addresses are used to directly access physical memory
•In virtual mode, effective addresses are translated into physical addresses by the
virtual-memory management hardware in the processor
Virtual mode provides system software with the ability to relocate programs and data
anywhere in the physical address space. System software can move inactive programs and
data out of physical memory when space is required by active programs and data.
Relocation can make it appear to a program that more memory exists than is actually
implemented by the system. This frees the programmer from working within the limits
imposed by the amount of physical memory present in a system. Programmers do not need
to know which physical-memory addresses are assigned to other software processes and
hardware devices. The addresses visible to programs are translated into the appropriate
physical addresses by the processor.
Virtual mode provides greater control over memory protection. Blocks of memory as small
as 1 KB can be individually protected from unauthorized access. Protection and relocation
enable system software to support multitasking. This capability gives the appearance of
simultaneous or near-simultaneous execution of multiple programs.
In MicroBlaze, virtual mode is implemented by the memory-management unit (MMU),
available when C_USE_MMU is set to 3 (Virtual) and C_AREA_OPTIMIZED is set to 0
(Performance) or 2 (Frequency). The MMU controls effective-address to physical-address
mapping and supports memory protection. Using these capabilities, system software can
implement demand-paged virtual memory and other memory management schemes.
The MicroBlaze MMU implementation is based upon the PowerPC™ 405 processor.
The MMU features are summarized as follows:
•Translates effective addresses into physical addresses
•Controls page-level access during address translation
•Provides additional virtual-mode protection control through the use of zones
•Provides independent control over instruction-address and data-address translation
and protection
•Supports eight page sizes: 1 kB, 4 kB, 16 kB, 64 kB, 256 kB, 1 MB, 4 MB, and 16 MB. Any
combination of page sizes can be used by system software
•Software controls the page-replacement strategy
Real Mode
The processor references memory when it fetches an instruction and when it accesses data
with a load or store instruction. Programs reference memory locations using a 32-bit
effective address calculated by the processor. When real mode is enabled, the physical
address is identical to the effective address and the processor uses it to access physical
memory. After a processor reset, the processor operates in real mode. Real mode can also
be enabled by clearing the VM bit in the MSR.
Physical-memory data accesses (loads and stores) are performed in real mode using the
effective address. Real mode does not provide system software with virtual address
translation, but the full memory access-protection is available, implemented when
C_USE_MMU > 1 (User Mode) and C_AREA_OPTIMIZED = 0 (Performance) or 2 (Frequency).
Implementation of a real-mode memory manager is more straightforward than a virtual-mode memory manager. Real mode is often an appropriate solution for memory
management in simple embedded environments, when access-protection is necessary, but
virtual address translation is not required.
Virtual Mode
In virtual mode, the processor translates an effective address into a physical address using
the process shown in Figure 2-18. With the Physical Address Extension (PAE) the physical
address can be extended up to 64 bits. Virtual mode can be enabled by setting the VM bit
in the MSR.
Figure 2-18: Virtual-Mode Address Translation
Each address shown in Figure 2-18 contains a page-number field and an offset field. The
page number represents the portion of the address translated by the MMU. The offset
represents the byte offset into a page and is not translated by the MMU. The virtual address
consists of an additional field, called the process ID (PID), which is taken from the PID
register (see Process-ID Register, page 36). The combination of PID and effective page
number (EPN) is referred to as the virtual page number (VPN). The value n is determined by
the page size, as shown in Table 2-38.
System software maintains a page-translation table that contains entries used to translate
each virtual page into a physical page. The page size defined by a page translation entry
determines the size of the page number and offset fields. For example, when a 4 kB page
size is used, the page-number field is 20 bits and the offset field is 12 bits. The VPN in this
case is 28 bits.
Then the most frequently used page translations are stored in the translation look-aside
buffer (TLB). When translating a virtual address, the MMU examines the page-translation
entries for a matching VPN (PID and EPN). Rather than examining all entries in the table,
only entries contained in the processor TLB are examined. When a page-translation entry is
found with a matching VPN, the corresponding physical-page number is read from the
entry and combined with the offset to form the physical address. This physical address is
used by the processor to reference memory.
System software can use the PID to uniquely identify software processes (tasks, subroutines,
threads) running on the processor. Independently compiled processes can operate in
effective-address regions that overlap each other. This overlap must be resolved by system
software if multitasking is supported. Assigning a PID to each process enables system
software to resolve the overlap by relocating each process into a unique region of virtual-address space. The virtual-address space mappings enable independent translation of each
process into the physical-address space.
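The field widths quoted in this section follow directly from the page size. The sketch below reproduces the 4 kB example (12-bit offset, 20-bit page number, 28-bit VPN); the type and function names are hypothetical, and the 8-bit PID width is inferred from that example.

```c
#include <stdint.h>

#define EA_BITS  32u   /* effective address width */
#define PID_BITS 8u    /* inferred: 28-bit VPN minus 20-bit EPN for 4 kB pages */

typedef struct {
    unsigned offset_bits;   /* untranslated byte offset within the page */
    unsigned epn_bits;      /* effective page number */
    unsigned vpn_bits;      /* PID plus EPN */
} page_fields;

/* page_size_bytes must be one of the supported power-of-two page sizes
   (1 kB up to 16 MB). */
page_fields page_fields_for(uint32_t page_size_bytes)
{
    page_fields f;

    f.offset_bits = 0u;
    while ((UINT32_C(1) << f.offset_bits) < page_size_bytes) {
        f.offset_bits++;
    }
    f.epn_bits = EA_BITS - f.offset_bits;
    f.vpn_bits = f.epn_bits + PID_BITS;
    return f;
}
```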
Page-Translation Table
The page-translation table is a software-defined and software-managed data structure
containing page translations. The requirement for software-managed page translation
represents an architectural trade-off targeted at embedded-system applications.
Embedded systems tend to have a tightly controlled operating environment and a well-defined set of application software. That environment enables virtual-memory
management to be optimized for each embedded system in the following ways:
•The page-translation table can be organized to maximize page-table search
performance (also called table walking) so that a given page-translation entry is
located quickly. Most general-purpose processors implement either an indexed page
table (simple search method, large page-table size) or a hashed page table (complex
search method, small page-table size). With software table walking, any hybrid
organization can be employed that suits the particular embedded system. Both the
page-table size and access time can be optimized.
•Independent page sizes can be used for application modules, device drivers, system
service routines, and data. Independent page-size selection enables system software to
more efficiently use memory by reducing fragmentation (unused memory). For
example, a large data structure can be allocated to a 16 MB page and a small I/O
device-driver can be allocated to a 1 KB page.
•Page replacement can be tuned to minimize the occurrence of missing page
translations. As described in the following section, the most-frequently used page
translations are stored in the translation look-aside buffer (TLB).
Software is responsible for deciding which translations are stored in the TLB and which
translations are replaced when a new translation is required. The replacement strategy
can be tuned to avoid thrashing, whereby page-translation entries are constantly being
moved in and out of the TLB. The replacement strategy can also be tuned to prevent
replacement of critical-page translations, a process sometimes referred to as page
locking.
The unified 64-entry TLB, managed by software, caches a subset of instruction and data
page-translation entries accessible by the MMU. Software is responsible for reading entries
from the page-translation table in system memory and storing them in the TLB. The
following section describes the unified TLB in more detail. Internally, the MMU also contains
shadow TLBs for instructions and data, with sizes configurable by C_MMU_ITLB_SIZE and
C_MMU_DTLB_SIZE respectively.
These shadow TLBs are managed entirely by the processor (transparent to software) and are
used to minimize access conflicts with the unified TLB.
Translation Look-Aside Buffer
The translation look-aside buffer (TLB) is used by the MicroBlaze MMU for address
translation when the processor is running in virtual mode, memory protection, and storage
control. Each entry within the TLB contains the information necessary to identify a virtual
page (PID and effective page number), specify its translation into a physical page,
determine the protection characteristics of the page, and specify the storage attributes
associated with the page.
The MicroBlaze TLB is physically implemented as three separate TLBs:
•Unified TLB: The UTLB contains 64 entries and is pseudo-associative. Instruction-page
and data-page translation can be stored in any UTLB entry. The initialization and
management of the UTLB is controlled completely by software.
•Instruction Shadow TLB: The ITLB contains instruction page-translation entries and is
fully associative. The page-translation entries stored in the ITLB represent the most-recently accessed instruction-page translations from the UTLB. The ITLB is used to
minimize contention between instruction translation and UTLB-update operations. The
initialization and management of the ITLB is controlled completely by hardware and is
transparent to software.
•Data Shadow TLB: The DTLB contains data page-translation entries and is fully
associative. The page-translation entries stored in the DTLB represent the most-recently
accessed data-page translations from the UTLB. The DTLB is used to minimize
contention between data translation and UTLB-update operations. The initialization
and management of the DTLB is controlled completely by hardware and is transparent
to software.
The following figure provides the translation flow for TLB.
Figure 2-19: TLB Address Translation Flow
TLB Entry Format
The following figure shows the format of a TLB entry. Each TLB entry ranges from 68 bits up
to 100 bits and is composed of two portions: TLBLO (also referred to as the data entry), and
TLBHI (also referred to as the tag entry).
Figure 2-20: TLB Entry Format (PAE Disabled)
When the Physical Address Extension (PAE) is enabled, the TLB entry is extended with up to
32 additional bits in the TLBLO RPN field to support up to a 64 bit physical address.
The TLB entry contents are described in more detail in Table 2-20 and Table 2-21, including
the TLBLO format with PAE enabled.
The fields within a TLB entry are categorized as follows:
•Virtual-page identification (TAG, SIZE, V, TID): These fields identify the page-translation
entry. They are compared with the virtual-page number during the translation process.
•Physical-page identification (RPN, SIZE): These fields identify the translated page in
physical memory.
•Access control (EX, WR, ZSEL): These fields specify the type of access allowed in the
page and are used to protect pages from improper accesses.
•Storage attributes (W, I, M, G, E, U0): These fields specify the storage-control attributes,
such as caching policy for the data cache (write-back or write-through), whether a page
is cacheable, and how bytes are ordered (endianness).
Table 2-38 shows the relationship between the TLB-entry SIZE field and the translated
page size. This table also shows how the page size determines which address bits are
involved in a tag comparison, which address bits are used as a page offset, and which bits
in the physical page number are used in the physical address. With PAE enabled, the most
significant bits of the physical address are directly taken from the extended RPN field.
Table 2-38: Page-Translation Bit Ranges by Page Size
When the MMU translates a virtual address (the combination of PID and effective address)
into a physical address, it first examines the appropriate shadow TLB for the page
translation entry. If an entry is found, it is used to access physical memory. If an entry is not
found, the MMU examines the UTLB for the entry. A delay occurs each time the UTLB must
be accessed due to a shadow TLB miss. The miss latency ranges from 2-32 cycles. The DTLB
has priority over the ITLB if both simultaneously access the UTLB.
Figure 2-21 shows the logical process the MMU follows when examining a page-translation
entry in one of the shadow TLBs or the UTLB. All valid entries in the TLB are checked.
A TLB hit occurs when all of the following conditions are met by a TLB entry:
•The entry is valid
•The TAG field in the entry matches the effective address EPN under the control of the
SIZE field in the entry
•The TID field in the entry matches the PID
If any of the above conditions are not met, a TLB miss occurs. A TLB miss causes an
exception, described as follows:
A TID value of 0x00 causes the MMU to ignore the comparison between the TID and PID.
Only the TAG and EA[EPN] are compared. A TLB entry with TID=0x00 represents a
process-independent translation. Pages that are accessed globally by all processes should be
assigned a TID value of 0x00. A PID value of 0x00 does not identify a process that can access
any page. When PID=0x00, a page-translation hit only occurs when TID=0x00. It is possible
for software to load the TLB with multiple entries that match an EA[EPN] and PID
combination. However, this is considered a programming error and results in undefined
behavior.
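The hit conditions and the TID=0x00 rule can be expressed as a simplified software model. The structure below is an assumption made for illustration and is not the hardware entry format: the tag is stored at its position within the effective address, and the SIZE field is represented by the number of offset bits it implies.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of the fields that take part in the comparison. */
typedef struct {
    bool     valid;         /* V */
    uint32_t tag;           /* TAG, held at its effective-address position */
    unsigned offset_bits;   /* implied by SIZE: 10 (1 kB) up to 24 (16 MB) */
    uint8_t  tid;           /* TID */
} tlb_entry;

/* Returns true when the entry translates effective address ea for the
   process identified by pid. */
bool tlb_entry_hits(const tlb_entry *e, uint32_t ea, uint8_t pid)
{
    uint32_t epn_mask = ~((UINT32_C(1) << e->offset_bits) - 1u);

    if (!e->valid) {
        return false;                               /* entry is not valid */
    }
    if ((e->tag & epn_mask) != (ea & epn_mask)) {
        return false;                               /* TAG does not match EPN */
    }
    if (e->tid != 0x00 && e->tid != pid) {
        return false;                               /* TID does not match PID */
    }
    return true;                                    /* TID=0x00 matches any PID */
}
```

Note that the model treats PID=0x00 like any other PID value: it hits only entries whose TID is 0x00, as stated above.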
When a hit occurs, the MMU reads the RPN field from the corresponding TLB entry. Some
or all of the bits in this field are used, depending on the value of the SIZE field (see
Table 2-38).
For example, with PAE disabled, if the SIZE field specifies a 256 kB page size, RPN[0:13]
represents the physical page number and is used to form the physical address. RPN[14:21]
is not used, and software must clear those bits to 0 when initializing the TLB entry. The
remainder of the physical address is taken from the page-offset portion of the EA. If the
page size is 256 kB, the 32-bit physical address is formed by concatenating RPN[0:13] with
bits 14:31 of the effective address.
Instead, with PAE enabled and assuming a physical address size of 40 bits (C_ADDR_SIZE set
to 40), RPN[0:21] represents the physical page number and RPN[22:29] is not used. The
40-bit physical address is formed by concatenating RPN[0:21] with bits 14:31 of the effective
address.
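The concatenation in the 256 kB example can be sketched as a mask-and-merge operation. The function name is hypothetical, and the real page number is assumed to be passed as a page-aligned physical base address rather than as the raw RPN field. A 256 kB page has an 18-bit offset, matching bits 14:31 of the effective address.

```c
#include <stdint.h>

/* Forms a physical address from a page-aligned physical base (derived from
   the RPN) and the page offset taken from the effective address.
   offset_bits is 18 for a 256 kB page. */
uint64_t form_physical_address(uint64_t rpn_base, uint32_t ea,
                               unsigned offset_bits)
{
    uint64_t offset_mask = (UINT64_C(1) << offset_bits) - 1u;

    /* Upper bits come from the translation, lower bits from the EA. */
    return (rpn_base & ~offset_mask) | ((uint64_t)ea & offset_mask);
}
```

The 64-bit return type covers both cases: with PAE disabled the result fits in 32 bits, and with a 40-bit physical address the upper bits come from the extended RPN.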
Prior to accessing physical memory, the MMU examines the TLB-entry access-control fields.
These fields indicate whether the currently executing program is allowed to perform the
requested memory access.
If access is allowed, the MMU checks the storage-attribute fields to determine how to
access the page. The storage-attribute fields specify the caching policy for memory
accesses.
TLB Access Failures
A TLB-access failure causes an exception to occur. This interrupts execution of the
instruction that caused the failure and transfers control to an interrupt handler to resolve
the failure. A TLB access can fail for two reasons:
•A matching TLB entry was not found, resulting in a TLB miss
•A matching TLB entry was found, but access to the page was prevented by either the
storage attributes or zone protection
When an interrupt occurs, the processor enters real mode by clearing MSR[VM] to 0. In real
mode, all address translation and memory-protection checks performed by the MMU are
disabled. After system software initializes the UTLB with page-translation entries,
management of the MicroBlaze UTLB is usually performed using interrupt handlers running
in real mode.
The following figure diagrams the general process for examining a TLB entry.
Figure 2-21: General Process for Examining a TLB Entry
The following sections describe the conditions under which exceptions occur due to TLB
access failures.
Data-Storage Exception
When virtual mode is enabled, (MSR[VM]=1), a data-storage exception occurs when access
to a page is not permitted for any of the following reasons:
•From user mode:
    -The TLB entry specifies a zone field that prevents access to the page (ZPR[Zn]=00).
    This applies to load and store instructions.
    -The TLB entry specifies a read-only page (TLBLO[WR]=0) that is not otherwise
    overridden by the zone field (ZPR[Zn]≠11). This applies to store instructions.
•From privileged mode:
    -The TLB entry specifies a read-only page (TLBLO[WR]=0) that is not otherwise
    overridden by the zone field (ZPR[Zn]≠10 and ZPR[Zn]≠11). This applies to store
    instructions.
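The conditions above can be collected into a single predicate. The following C sketch is a model of the listed rules only; the names are hypothetical, and zone stands for the two-bit ZPR[Zn] value selected by the TLB entry.

```c
#include <stdbool.h>

typedef enum { DATA_LOAD, DATA_STORE } data_access_kind;

/* Returns false when the access would raise a data-storage exception.
   zone is the 2-bit ZPR[Zn] value, wr is TLBLO[WR]. */
bool data_access_permitted(bool user_mode, unsigned zone, bool wr,
                           data_access_kind kind)
{
    bool store_to_read_only = (kind == DATA_STORE) && !wr;

    if (user_mode) {
        if (zone == 0x0u) {
            return false;           /* ZPR[Zn]=00: no user-mode access */
        }
        if (store_to_read_only && zone != 0x3u) {
            return false;           /* read-only, not overridden by zone 11 */
        }
    } else {
        if (store_to_read_only && zone != 0x2u && zone != 0x3u) {
            return false;           /* read-only, not overridden by 10 or 11 */
        }
    }
    return true;
}
```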
Instruction-Storage Exception
When virtual mode is enabled, (MSR[VM]=1), an instruction-storage exception occurs when
access to a page is not permitted for any of the following reasons:
•From user mode:
    -The TLB entry specifies a zone field that prevents access to the page (ZPR[Zn]=00).
    -The TLB entry specifies a non-executable page (TLBLO[EX]=0) that is not otherwise
    overridden by the zone field (ZPR[Zn]≠11).
    -The TLB entry specifies a guarded-storage page (TLBLO[G]=1).
•From privileged mode:
    -The TLB entry specifies a non-executable page (TLBLO[EX]=0) that is not otherwise
    overridden by the zone field (ZPR[Zn]≠10 and ZPR[Zn]≠11).
    -The TLB entry specifies a guarded-storage page (TLBLO[G]=1).
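The instruction-side rules differ from the data-side rules mainly in the guarded-storage check. A sketch of the listed conditions, with a hypothetical function name and zone again standing for the two-bit ZPR[Zn] value:

```c
#include <stdbool.h>

/* Returns false when the fetch would raise an instruction-storage exception.
   ex is TLBLO[EX], guarded is TLBLO[G]. */
bool instruction_fetch_permitted(bool user_mode, unsigned zone, bool ex,
                                 bool guarded)
{
    if (guarded) {
        return false;               /* guarded storage: both modes */
    }
    if (user_mode) {
        if (zone == 0x0u) {
            return false;           /* ZPR[Zn]=00: no user-mode access */
        }
        if (!ex && zone != 0x3u) {
            return false;           /* non-executable, not overridden by 11 */
        }
    } else {
        if (!ex && zone != 0x2u && zone != 0x3u) {
            return false;           /* non-executable, not overridden */
        }
    }
    return true;
}
```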
Data TLB-Miss Exception
When virtual mode is enabled (MSR[VM]=1) a data TLB-miss exception occurs if a valid,
matching TLB entry was not found in the TLB (shadow and UTLB). Any load or store
instruction can cause a data TLB-miss exception.
Instruction TLB-Miss Exception
When virtual mode is enabled (MSR[VM]=1) an instruction TLB-miss exception occurs if a
valid, matching TLB entry was not found in the TLB (shadow and UTLB). Any instruction
fetch can cause an instruction TLB-miss exception.
Access Protection
System software uses access protection to protect sensitive memory locations from
improper access. System software can restrict memory accesses for both user-mode and
privileged-mode software. Restrictions can be placed on reads, writes, and instruction
fetches. Access protection is available when virtual protected mode is enabled.
Access control applies to instruction fetches, data loads, and data stores. The TLB entry for
a virtual page specifies the type of access allowed to the page.
The TLB entry also specifies a zone-protection field in the zone-protection register that is
used to override the access controls specified by the TLB entry.
TLB Access-Protection Controls
Each TLB entry controls three types of access:
•Process: Processes are protected from unauthorized access by assigning a unique
process ID (PID) to each process. When system software starts a user-mode application,
it loads the PID for that application into the PID register. As the application executes,
memory addresses are translated using only TLB entries with a TID field in Translation
Look-Aside Buffer High (TLBHI) that matches the PID. This enables system software to
restrict accesses for an application to a specific area in virtual memory.
A TLB entry with TID=0x00 represents a process-independent translation. Pages that
are accessed globally by all processes should be assigned a TID value of 0x00.
•Execution: The processor executes instructions only if they are fetched from a virtual
page marked as executable (TLBLO[EX]=1). Clearing TLBLO[EX] to 0 prevents execution
of instructions fetched from a page, instead causing an instruction-storage interrupt
(ISI) to occur. The ISI does not occur when the instruction is fetched, but instead occurs
when the instruction is executed. This prevents speculatively fetched instructions that
are later discarded (rather than executed) from causing an ISI.
The zone-protection register can override execution protection.
•Read/Write: Data is written only to virtual pages marked as writable (TLBLO[WR]=1).
Clearing TLBLO[WR] to 0 marks a page as read-only. An attempt to write to a read-only
page causes a data-storage interrupt (DSI) to occur.
The zone-protection register can override write protection.
TLB entries cannot be used to prevent programs from reading pages. In virtual mode, zone
protection is used to read-protect pages. This is done by defining a no-access-allowed zone
(ZPR[Zn] = 00) and using it to override the TLB-entry access protection. Only programs
running in user mode can be prevented from reading a page. Privileged programs always
have read access to a page.
Zone Protection
Zone protection is used to override the access protection specified in a TLB entry. Zones are
an arbitrary grouping of virtual pages with common access protection. Zones can contain
any number of pages specifying any combination of page sizes. There is no requirement for
a zone to contain adjacent pages.
The zone-protection register (ZPR) is a 32-bit register used to specify the type of protection
override applied to each of 16 possible zones. The protection override for a zone is encoded
in the ZPR as a 2-bit field.
The 4-bit zone-select field in a TLB entry (TLBLO[ZSEL]) selects one of the 16 zone fields
from the ZPR (Z0–Z15). For example, zone Z5 is selected when ZSEL = 0101.
Changing a zone field in the ZPR applies a protection override across all pages in that zone.
Without the ZPR, protection changes require individual alterations to each page translation
entry within the zone.
Unimplemented zones (when C_MMU_ZONES < 16) are treated as if they contained 11.
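As a rough illustration of how TLBLO[ZSEL] selects a 2-bit field from the ZPR, the following
C sketch extracts and updates zone fields in a 32-bit value. The function names are
hypothetical. The sketch assumes that Z0 occupies the two most significant bits and Z15 the
two least significant bits (following the MicroBlaze convention that bit 0 is the most
significant bit); check this against the ZPR register description before relying on it.

```c
#include <stdint.h>

/* Returns the 2-bit protection field for zone `zsel` (0..15).
 * `mmu_zones` models C_MMU_ZONES: unimplemented zones read as 11.
 * Assumption: Z0 is in the two most significant bits of the ZPR. */
unsigned zpr_zone_field(uint32_t zpr, unsigned zsel, unsigned mmu_zones)
{
    zsel &= 0xFu;
    if (zsel >= mmu_zones)
        return 0x3u;
    unsigned shift = 30u - 2u * zsel;
    return (unsigned)((zpr >> shift) & 0x3u);
}

/* Returns a copy of `zpr` with the field of zone `zsel` replaced by `field`. */
uint32_t zpr_set_zone_field(uint32_t zpr, unsigned zsel, unsigned field)
{
    unsigned shift = 30u - 2u * (zsel & 0xFu);
    uint32_t mask = UINT32_C(3) << shift;
    return (zpr & ~mask) | ((uint32_t)(field & 0x3u) << shift);
}
```

Changing one field with `zpr_set_zone_field` models the single-register update that changes
the protection override for every page whose ZSEL selects that zone.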
UTLB Management
The UTLB serves as the interface between the processor MMU and memory-management
software. System software manages the UTLB to tell the MMU how to translate virtual
addresses into physical addresses. When a problem occurs due to a missing translation or
an access violation, the MMU communicates the problem to system software using the
exception mechanism. System software is responsible for providing interrupt handlers to
correct these problems so that the MMU can proceed with memory translation.
Software reads and writes UTLB entries using the MFS and MTS instructions, respectively.
With PAE enabled, the MFSE and MTSE instructions are used to access the most significant
part of the real page number. These instructions use the TLBX register index (numbered 0 to
63) corresponding to one of the 64 entries in the UTLB. The tag and data portions are read
and written separately, so software must execute two MFS or MTS instructions, and also an
additional MFSE or MTSE instruction when PAE is enabled, to completely access an entry.
The UTLB is searched for a specific translation using the TLBSX register. TLBSX locates a
translation using an effective address and loads the corresponding UTLB index into the
TLBX register.
Individual UTLB entries are invalidated using the MTS instruction to clear the valid bit in the
tag portion of a TLB entry (TLBHI[V]).
When C_FAULT_TOLERANT is set to 1, the UTLB block RAM is protected by parity. In case of
a parity error, a TLB miss exception occurs. To avoid accumulating errors in this case, each
entry in the UTLB should be periodically invalidated.
Recording Page Access and Page Modification
Software management of virtual-memory poses several challenges:
•In a virtual-memory environment, software and data often consume more memory than
is physically available. Some of the software and data pages must be stored outside
physical memory, such as on a hard drive, when they are not used. Ideally, the most
frequently used pages stay in physical memory and infrequently used pages are stored
elsewhere.
•When pages in physical-memory are replaced to make room for new pages, it is
important to know whether the replaced (old) pages were modified.
If they were modified, they must be saved prior to loading the replacement (new) pages.
If the old pages were not modified, the new pages can be loaded without saving the old
pages.
•A limited number of page translations are kept in the UTLB. The remaining translations
must be stored in the page-translation table. When a translation is not found in the
UTLB (due to a miss), system software must decide which UTLB entry to discard so that
the missing translation can be loaded. It is desirable for system software to replace
infrequently used translations rather than frequently used translations.
Solving the above problems in an efficient manner requires keeping track of page accesses
and page modifications. MicroBlaze does not track page access and page modification in
hardware. Instead, system software can use the TLB-miss exceptions and the data-storage
exception to collect this information. As the information is collected, it can be stored in a
data structure associated with the page-translation table.
Page-access information is used to determine which pages should be kept in physical
memory and which are replaced when physical-memory space is required. System software
can use the valid bit in the TLB entry (TLBHI[V]) to monitor page accesses. This requires
page translations be initialized as not valid (TLBHI[V]=0) to indicate they have not been
accessed. The first attempt to access a page causes a TLB-miss exception, either because
the UTLB entry is marked not valid or because the page translation is not present in the
UTLB. The TLB-miss handler updates the UTLB with a valid translation (TLBHI[V]=1). The set
valid bit serves as a record that the page and its translation have been accessed. The
TLB-miss handler can also record the information in a separate data structure associated with
the page-translation entry.
Page-modification information is used to indicate whether an old page can be overwritten
with a new page or the old page must first be stored to a hard disk. System software can use
the write-protection bit in the TLB entry (TLBLO[WR]) to monitor page modification. This
requires page translations be initialized as read-only (TLBLO[WR]=0) to indicate they have
not been modified. The first attempt to write data into a page causes a data-storage
exception, assuming the page has already been accessed and marked valid as described
above. If software has permission to write into the page, the data-storage handler marks the
page as writable (TLBLO[WR]=1) and returns.
The set write-protection bit serves as a record that a page has been modified. The
data-storage handler can also record this information in a separate data structure associated
with the page-translation entry.
Tracking page modification is useful when virtual mode is first entered and when a new
process is started.
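The tracking scheme described above can be sketched as two handlers operating on a software
page-table entry. This is a host-side model under stated assumptions: the `pte_t` and
`tlb_bits_t` types, their fields, and the handler names are hypothetical, and real handlers
would update the UTLB with MTS instructions instead of a C struct.

```c
#include <stdbool.h>

/* Hypothetical software page-table entry; field names are illustrative. */
typedef struct {
    bool may_write;  /* software policy: the process may write this page */
    bool accessed;   /* recorded by the TLB-miss handler                  */
    bool modified;   /* recorded by the data-storage handler              */
} pte_t;

/* The two TLB bits used for tracking. */
typedef struct {
    bool v;   /* TLBHI[V]  */
    bool wr;  /* TLBLO[WR] */
} tlb_bits_t;

/* TLB-miss handler: load the translation as valid but read-only,
 * and record that the page has been accessed. */
void on_tlb_miss(pte_t *pte, tlb_bits_t *tlb)
{
    tlb->v = true;
    tlb->wr = false;
    pte->accessed = true;
}

/* Data-storage handler for a store to a page marked read-only.
 * Returns true if the store can be retried, false for a real violation. */
bool on_data_storage(pte_t *pte, tlb_bits_t *tlb)
{
    if (!pte->may_write)
        return false;
    tlb->wr = true;        /* the set WR bit now records the modification */
    pte->modified = true;
    return true;
}
```

When a page is later chosen for replacement, `modified` tells system software whether the old
contents must be saved first.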
Reset, Interrupts, Exceptions, and Break
MicroBlaze supports reset, interrupt, user exception, break, and hardware exceptions. The
following section describes the execution flow associated with each of these events.
The relative priority starting with the highest is:
1. Reset
2. Hardware Exception
3. Non-maskable Break
4. Break
5. Interrupt
6. User Vector (Exception)
Table 2-39 defines the memory address locations of the associated vectors and the
hardware enforced register file locations for return addresses. Each vector allocates two
addresses to allow full address range branching (requires an IMM followed by a BRAI
instruction). Normally the vectors start at address 0x00000000, but the parameter
C_BASE_VECTORS can be used to locate them anywhere in memory.
The address range 0x28 to 0x4F is reserved for future software support by Xilinx. Allocating
these addresses for user applications is likely to conflict with future releases of SDK support
software.
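Table 2-39: Vectors and Return Address Register File Location
Event                                Vector Address                   Register File Return Address
Reset                                C_BASE_VECTORS + 0x00            -
User Vector (Exception)              C_BASE_VECTORS + 0x08            Rx
Interrupt (1)                        C_BASE_VECTORS + 0x10            R14
Break: Non-maskable hardware         C_BASE_VECTORS + 0x18            R16
Break: Hardware                      C_BASE_VECTORS + 0x18            R16
Break: Software                      C_BASE_VECTORS + 0x18            R16
Hardware Exception                   C_BASE_VECTORS + 0x20            R17
Reserved by Xilinx for future use    C_BASE_VECTORS + 0x28 to 0x4F    -
1. With low-latency interrupt mode, the vector address is supplied by the Interrupt Controller.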
All of these events will clear the reservation bit, used together with the LWX and SWX
instructions to implement mutual exclusion, such as semaphores and spinlocks.
Reset
When a Reset or Debug_Rst (1) occurs, MicroBlaze flushes the pipeline and starts fetching
instructions from the reset vector (address 0x0). Both external reset signals are active high
and should be asserted for a minimum of 16 cycles. See MicroBlaze Core Configurability in
Chapter 3 for more information on the MSR reset value parameters.
1. Reset input controlled by the debugger using MDM.
Hardware Exceptions
MicroBlaze can be configured to trap the following internal error conditions: illegal
instruction, instruction and data bus error, and unaligned access. The divide exception can
only be enabled if the processor is configured with a hardware divider (C_USE_DIV=1).
When configured with a hardware floating-point unit (C_USE_FPU>0), it can also trap the
following floating-point specific exceptions: underflow, overflow, float division-by-zero,
invalid operation, and denormalized operand error.
When configured with a hardware memory management unit (MMU), it can also trap the
following memory management specific exceptions: Illegal Instruction Exception, Data
Storage Exception, Instruction Storage Exception, Data TLB Miss Exception, and Instruction
TLB Miss Exception.
A hardware exception causes MicroBlaze to flush the pipeline and branch to the hardware
exception vector (address C_BASE_VECTORS + 0x20). The execution stage instruction in the
exception cycle is not executed.
The exception also updates the general purpose register R17 in the following manner:
•For the MMU exceptions (Data Storage Exception, Instruction Storage Exception, Data
TLB Miss Exception, Instruction TLB Miss Exception) the register R17 is loaded with the
appropriate program counter value to re-execute the instruction causing the exception
upon return. The value is adjusted to return to a preceding IMM instruction, if any. If the
exception is caused by an instruction in a branch delay slot, the value is adjusted to
return to the branch instruction, including adjustment for a preceding IMM instruction,
if any.
•For all other exceptions the register R17 is loaded with the program counter value of
the subsequent instruction, unless the exception is caused by an instruction in a branch
delay slot. If the exception is caused by an instruction in a branch delay slot, the
ESR[DS] bit is set. In this case the exception handler should resume execution from the
branch target address stored in BTR.
The EE and EIP bits in MSR are automatically reverted when executing the RTED instruction.
The VM and UM bits in MSR are automatically reverted from VMS and UMS when executing
the RTED, RTBD, and RTID instructions.
Exception Priority
When two or more exceptions occur simultaneously, they are handled in the following
order, from the highest priority to the lowest:
•Instruction Bus Exception
•Instruction TLB Miss Exception
•Instruction Storage Exception
•Illegal Opcode Exception
•Privileged Instruction Exception or Stack Protection Violation Exception
•Data TLB Miss Exception
•Data Storage Exception
•Unaligned Exception
•Data Bus Exception
•Divide Exception
•FPU Exception
•Stream Exception
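The priority order above can be expressed as a simple selection over a set of simultaneously
pending exceptions. In this C sketch the enumerator values only encode the order of the list;
they are hypothetical and are not the exception cause codes reported in ESR[EC].

```c
#include <stdint.h>

/* Exceptions in priority order, highest first. Values are illustrative
 * list positions, not ESR[EC] encodings. */
typedef enum {
    EXC_INSTRUCTION_BUS = 0,
    EXC_INSTRUCTION_TLB_MISS,
    EXC_INSTRUCTION_STORAGE,
    EXC_ILLEGAL_OPCODE,
    EXC_PRIVILEGED_OR_STACK,   /* Privileged Instruction or Stack Protection Violation */
    EXC_DATA_TLB_MISS,
    EXC_DATA_STORAGE,
    EXC_UNALIGNED,
    EXC_DATA_BUS,
    EXC_DIVIDE,
    EXC_FPU,
    EXC_STREAM,
    EXC_COUNT
} exception_t;

/* `pending` has bit n set when exception n is pending.
 * Returns the highest-priority pending exception, or EXC_COUNT if none. */
exception_t highest_priority(uint32_t pending)
{
    for (int e = 0; e < EXC_COUNT; e++) {
        if (pending & (UINT32_C(1) << e))
            return (exception_t)e;
    }
    return EXC_COUNT;
}
```

For instance, if a divide exception and a data TLB miss are pending together, the data TLB
miss is selected.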
Exception Causes
•Stream Exception: The AXI4-Stream exception is caused by executing a get or getd
instruction with the ‘e’ bit set to ‘1’ when there is a control bit mismatch.
•Instruction Bus Exception: The instruction bus exception is caused by errors when
reading data from memory.
  - The instruction peripheral AXI4 interface (M_AXI_IP) exception is caused by an error
    response on M_AXI_IP_RRESP.
  - The instruction cache AXI4 interface (M_AXI_IC) exception is caused by an error
    response on M_AXI_IC_RRESP. The exception can only occur when
    C_ICACHE_ALWAYS_USED is set to 1 and the cache is turned off, or if the MMU Inhibit
    Caching bit is set for the address. In all other cases the response is ignored.
  - The instruction side local memory (ILMB) can only cause an instruction bus exception
    when either an uncorrectable error occurs in the LMB memory, as indicated by the
    IUE signal, or C_ECC_USE_CE_EXCEPTION is set to 1 and a correctable error occurs
    in the LMB memory, as indicated by the ICE signal.
•Illegal Opcode Exception: The illegal opcode exception is caused by an instruction
with an invalid major opcode (bits 0 through 5 of instruction). Bits 6 through 31 of the
instruction are not checked. Optional processor instructions are detected as illegal if
not enabled. If the optional feature
C_OPCODE_0x0_ILLEGAL is enabled, an illegal
opcode exception is also caused if the instruction is equal to 0x00000000.
•Data Bus Exception: The data bus exception is caused by errors when reading data
from memory or writing data to memory.
  - The data peripheral AXI4 interface (M_AXI_DP) exception is caused by an error
    response on M_AXI_DP_RRESP or M_AXI_DP_BRESP.
  - The data cache AXI4 interface (M_AXI_DC) exception is caused by:
      - An error response on M_AXI_DC_RRESP or M_AXI_DC_BRESP,
      - OKAY response on M_AXI_DC_RRESP in case of an exclusive access using LWX.
    The exception can only occur when C_DCACHE_ALWAYS_USED is set to 1 and the
    cache is turned off, when an exclusive access using LWX or SWX is performed, or if the
    MMU Inhibit Caching bit is set for the address. In all other cases the response is
    ignored.
  - The data side local memory (DLMB) can only cause a data bus exception when
    either an uncorrectable error occurs in the LMB memory, as indicated by the DUE
    signal, or C_ECC_USE_CE_EXCEPTION is set to 1 and a correctable error occurs in the
    LMB memory, as indicated by the DCE signal. An error can occur for all read
    accesses, and for byte and halfword write accesses.
•Unaligned Exception: The unaligned exception is caused by a word access where the
address to the data bus has bits 30 or 31 set, or a half-word access with bit 31 set.
•Divide Exception: The divide exception is caused by an integer division (idiv or
idivu) where the divisor is zero, or by a signed integer division (idiv) where overflow
occurs (-2147483648 / -1).
•FPU Exception: An FPU exception is caused by an underflow, overflow, divide-by-zero,
illegal operation, or denormalized operand occurring with a floating-point instruction.
  - Underflow occurs when the result is denormalized.
  - Overflow occurs when the result is not-a-number (NaN).
  - The divide-by-zero FPU exception is caused by the rA operand to fdiv being zero
    when rB is not infinite.
  - Illegal operation is caused by a signaling NaN operand or by illegal infinite or zero
    operand combinations.
•Privileged Instruction Exception: The Privileged Instruction exception is caused by an
attempt to execute a privileged instruction in User Mode.
•Stack Protection Violation Exception: A Stack Protection Violation exception is
caused by executing a load or store instruction using the stack pointer (register R1) as
rA with an address outside the stack boundaries defined by the special Stack Low and
Stack High registers, causing a stack overflow or a stack underflow.
•Data Storage Exception: The Data Storage exception is caused by an attempt to
access data in memory that results in a memory-protection violation.
•Instruction Storage Exception: The Instruction Storage exception is caused by an
attempt to access instructions in memory that results in a memory-protection violation.
•Data TLB Miss Exception: The Data TLB Miss exception is caused by an attempt to
access data in memory, when a valid Translation Look-Aside Buffer entry is not present,
and virtual protected mode is enabled.
•Instruction TLB Miss Exception: The Instruction TLB Miss exception is caused by an
attempt to access instructions in memory, when a valid Translation Look-Aside Buffer
entry is not present, and virtual protected mode is enabled.
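Two of the causes listed above, the unaligned exception and the divide exception, reduce to
simple predicates. The following C sketch shows them as host-side checks; the function names
are hypothetical. It relies on the MicroBlaze bit numbering in which bit 31 is the least
significant bit, so "bits 30 or 31 set" corresponds to the two low-order address bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* `size` is the access size in bytes: 1, 2, or 4. */
bool is_unaligned(uint32_t addr, unsigned size)
{
    if (size == 4)
        return (addr & 0x3u) != 0;   /* word access: bits 30 or 31 set */
    if (size == 2)
        return (addr & 0x1u) != 0;   /* halfword access: bit 31 set */
    return false;                    /* byte accesses are always aligned */
}

/* Signed division (idiv): divisor of zero, or overflow (-2147483648 / -1). */
bool idiv_raises(int32_t dividend, int32_t divisor)
{
    return divisor == 0 || (dividend == INT32_MIN && divisor == -1);
}

/* Unsigned division (idivu): only a divisor of zero raises the exception. */
bool idivu_raises(uint32_t divisor)
{
    return divisor == 0;
}
```

Software that cannot tolerate these exceptions can apply the same checks before issuing the
access or the division.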
Should an Instruction Bus Exception, Illegal Opcode Exception, or Data Bus Exception occur
when C_FAULT_TOLERANT is set to 1, and an exception is in progress (that is MSR[EIP] set
and MSR[EE] cleared), the pipeline is halted, and the external signal MB_Error is set.
Imprecise Exceptions
Normally all exceptions in MicroBlaze are precise, meaning that any instructions in the
pipeline after the instruction causing an exception are invalidated, and have no effect.
When C_IMPRECISE_EXCEPTIONS is set to 1 (ECC) an Instruction Bus Exception or Data Bus
Exception caused by ECC errors in LMB memory is not precise, meaning that a subsequent
memory access instruction in the pipeline might be executed. If this behavior is acceptable,
the maximum frequency can be improved by setting this parameter to 1.
Equivalent Pseudocode
ESR[DS] ← exception in delay slot
if ESR[DS] then
  BTR ← branch target PC
  if MMU exception then
    if branch preceded by IMM then
      r17 ← PC - 8
    else
      r17 ← PC - 4
  else
    r17 ← invalid value
else if MMU exception then
  if instruction preceded by IMM then
    r17 ← PC - 4
  else
    r17 ← PC
else
  r17 ← PC + 4
PC ← C_BASE_VECTORS + 0x00000020
MSR[EE] ← 0, MSR[EIP] ← 1
MSR[UMS] ← MSR[UM], MSR[UM] ← 0, MSR[VMS] ← MSR[VM], MSR[VM] ← 0
ESR[EC] ← exception specific value
ESR[ESS] ← exception specific value
EAR ← exception specific value
FSR ← exception specific value
Reservation ← 0
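The R17 selection in the pseudocode can be modeled as a small C function, which makes the
five cases easy to check. This is a sketch for illustration only: `exc_info_t` and
`exception_r17` are hypothetical names, and the function reports the "invalid value" case by
returning false instead of writing R17.

```c
#include <stdbool.h>
#include <stdint.h>

/* Inputs to the R17 selection (illustrative names). */
typedef struct {
    bool delay_slot;  /* ESR[DS]: exception in a branch delay slot      */
    bool mmu;         /* one of the four MMU exceptions                 */
    bool imm_prefix;  /* the branch / instruction is preceded by an IMM */
} exc_info_t;

/* Computes the return address written to R17 for an exception at `pc`.
 * Returns false when R17 holds no valid return address (non-MMU exception
 * in a delay slot; the handler resumes from BTR instead). */
bool exception_r17(uint32_t pc, exc_info_t x, uint32_t *r17)
{
    if (x.delay_slot) {
        if (!x.mmu)
            return false;
        *r17 = pc - (x.imm_prefix ? 8u : 4u);  /* back to the branch (and its IMM) */
        return true;
    }
    if (x.mmu) {
        *r17 = pc - (x.imm_prefix ? 4u : 0u);  /* re-execute the instruction */
        return true;
    }
    *r17 = pc + 4u;                            /* continue with the next instruction */
    return true;
}
```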
Breaks
There are two kinds of breaks:
•Hardware (external) breaks
•Software (internal) breaks
Hardware Breaks
Hardware breaks are performed by asserting the external break signal (that is, the Ext_BRK
and Ext_NM_BRK input ports). On a break, the instruction in the execution stage completes
while the instruction in the decode stage is replaced by a branch to the break vector
(address C_BASE_VECTORS + 0x18).
The break return address (the PC associated with the instruction in the decode stage at the
time of the break) is automatically loaded into general purpose register R16. MicroBlaze
also sets the Break In Progress (BIP) flag in the Machine Status Register (MSR).
A normal hardware break (that is, the Ext_BRK input port) is only handled when MSR[BIP]
and MSR[EIP] are set to 0 (that is, there is no break or exception in progress). The Break In
Progress flag disables interrupts. A non-maskable break (that is, the Ext_NM_BRK input
port) is always handled immediately.
The BIP bit in the MSR is automatically cleared when executing the RTBD instruction.
The Ext_BRK signal must be kept asserted until the break has occurred, and deasserted
before the RTBD instruction is executed. The Ext_NM_BRK signal must only be asserted one
clock cycle.
Software Breaks
To perform a software break, use the brk and brki instructions. Refer to Chapter 5,
MicroBlaze Instruction Set Architecture for detailed information on software breaks.
As a special case, when C_DEBUG_ENABLED is greater than zero, and "brki rD,0x18" is
executed, a software breakpoint is signaled to the debugger; for example, the Xilinx System
Debugger (XSDB) tool, irrespective of the value of C_BASE_VECTORS. In this case the BIP bit
in the MSR is not set.
Latency
The time it takes the MicroBlaze processor to enter a break service routine from the time
the break occurs depends on the instruction currently in the execution stage and the
latency to the memory storing the break vector.
Interrupt
MicroBlaze supports one external interrupt source (connected to the Interrupt input port).
The processor only reacts to interrupts if the Interrupt Enable (IE) bit in the Machine Status
Register (MSR) is set to 1. On an interrupt, the instruction in the execution stage completes
while the instruction in the decode stage is replaced by a branch to the interrupt vector.
This is either address C_BASE_VECTORS + 0x10, or with low-latency interrupt mode, the
address supplied by the Interrupt Controller.
The interrupt return address (the PC associated with the instruction in the decode stage at
the time of the interrupt) is automatically loaded into general purpose register R14. In
addition, the processor also disables future interrupts by clearing the IE bit in the MSR. The
IE bit is automatically set again when executing the RTID instruction.
Interrupts are ignored by the processor if either of the break in progress (BIP) or exception
in progress (EIP) bits in the MSR are set to 1.
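The interrupt entry and return behavior described above can be sketched as a small state
model. The `cpu_t` structure and the function names below are hypothetical, and the MSR bits
are modeled as separate booleans so that no bit positions are assumed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Minimal processor state for the interrupt model (illustrative). */
typedef struct {
    bool     ie;   /* MSR[IE]  */
    bool     bip;  /* MSR[BIP] */
    bool     eip;  /* MSR[EIP] */
    uint32_t r14;  /* interrupt return address register */
} cpu_t;

/* Attempts to take a pending interrupt. `decode_pc` is the PC of the
 * instruction in the decode stage. Returns true if the interrupt is taken. */
bool try_interrupt(cpu_t *c, uint32_t decode_pc)
{
    if (!c->ie || c->bip || c->eip)
        return false;        /* ignored: disabled, or break/exception in progress */
    c->r14 = decode_pc;      /* return address */
    c->ie = false;           /* further interrupts are disabled */
    return true;
}

/* Models the effect of RTID on MSR[IE]. */
void rtid(cpu_t *c)
{
    c->ie = true;
}
```

A second interrupt arriving inside the handler is therefore not taken until RTID executes or
software sets MSR[IE] explicitly.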
By using the parameter C_INTERRUPT_IS_EDGE, the external interrupt can either be set to
level-sensitive or edge-triggered:
•When using level-sensitive interrupts, the Interrupt input must remain set until
MicroBlaze has taken the interrupt, and jumped to the interrupt vector. Software must
acknowledge the interrupt at the source to clear it before returning from the interrupt
handler. If not, the interrupt is taken again, as soon as interrupts are enabled when
returning from the interrupt handler.
•When using edge-triggered interrupts, MicroBlaze detects and latches the Interrupt
input edge, which means that the input only needs to be asserted one clock cycle. The
interrupt input can remain asserted, but must be deasserted at least one clock cycle
before a new interrupt can be detected. The latching of an edge-triggered interrupt is
independent of the IE bit in MSR. Should an interrupt occur while the IE bit is 0, it will
immediately be serviced when the IE bit is set to 1.
With periodic interrupt sources, such as the FIT Timer IP core, that do not have a method to
clear the interrupt from software, it is recommended to use edge-triggered interrupts.
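The edge-triggered behavior can be illustrated with a small latch model. This is a sketch
under assumptions: it models a rising-edge, active-high input sampled once per clock, and
the type and function names are hypothetical.

```c
#include <stdbool.h>

/* State of the edge detector (illustrative). Zero-initialize before use. */
typedef struct {
    bool prev_level;  /* Interrupt input sampled on the previous clock */
    bool latched;     /* an edge has been seen and not yet serviced    */
} edge_irq_t;

/* Call once per clock cycle with the sampled Interrupt input.
 * The latch is set on a rising edge, independent of MSR[IE]. */
void edge_irq_clock(edge_irq_t *s, bool level)
{
    if (level && !s->prev_level)
        s->latched = true;
    s->prev_level = level;
}

/* Returns true, and clears the latch, if the interrupt is taken now. */
bool edge_irq_service(edge_irq_t *s, bool ie)
{
    if (!ie || !s->latched)
        return false;
    s->latched = false;
    return true;
}
```

An edge seen while IE is 0 stays latched and is serviced as soon as IE is set to 1, whereas
an input that simply remains asserted produces no second interrupt until it has been
deasserted for at least one clock cycle.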
Low-latency Interrupt Mode
A low-latency interrupt mode is available, which allows the Interrupt Controller to directly
supply the interrupt vector for each individual interrupt (using the Interrupt_Address
input port). The address of each fast interrupt handler must be passed to the Interrupt
Controller when initializing the interrupt system. When a particular interrupt occurs, this
address is supplied by the Interrupt Controller, which allows MicroBlaze to directly jump to
the handler code.
With this mode, MicroBlaze also directly sends the appropriate interrupt acknowledge to
the Interrupt Controller (using the Interrupt_Ack output port), although it is still the
responsibility of the Interrupt Service Routine to acknowledge level sensitive interrupts at
the source.
This information allows the Interrupt Controller to acknowledge interrupts appropriately,
both for level-sensitive and edge-triggered interrupts.
To inform the Interrupt Controller of the interrupt handling events, Interrupt_Ack is set to:
•01: When MicroBlaze jumps to the interrupt handler code,
•10: When the RTID instruction is executed to return from interrupt,
•11: When MSR[IE] is changed from 0 to 1, which enables interrupts again.
The Interrupt_Ack output port is active during one clock cycle, and is then reset to 00.
Latency
The time it takes MicroBlaze to enter an Interrupt Service Routine (ISR) from the time an
interrupt occurs, depends on the configuration of the processor and the latency of the
memory controller storing the interrupt vectors. If MicroBlaze is configured to have a
hardware divider, the largest latency happens when an interrupt occurs during the
execution of a division instruction.
With low-latency interrupt mode, the time to enter the ISR is significantly reduced, since
the interrupt vector for each individual interrupt is directly supplied by the Interrupt
Controller. With compiler support for fast interrupts, there is no need for a common ISR at
all. Instead, the ISR for each individual interrupt will be directly called, and the compiler
takes care of saving and restoring registers used by the ISR.
User Vector (Exception)
The user exception vector is located at address 0x8. A user exception is caused by inserting
a ‘BRALID Rx,0x8’ instruction in the software flow. Although Rx could be any general
purpose register, Xilinx recommends using R15 for storing the user exception return
address, and using the RTSD instruction to return from the user exception handler.
Instruction Cache
MicroBlaze can be used with an optional instruction cache for improved performance when
executing code that resides outside the LMB address range.
The instruction cache has the following features:
•Direct mapped (1-way associative)
•User selectable cacheable memory address range
•Configurable cache and tag size
•Caching over AXI4 interface (M_AXI_IC)
•Option to use 4, 8 or 16 word cache-line
•Cache on and off controlled using a bit in the MSR
•Optional WIC instruction to invalidate instruction cache lines
•Optional stream buffers to improve performance by speculatively prefetching
instructions
•Optional victim cache to improve performance by saving evicted cache lines
•Optional parity protection that invalidates cache lines if a Block RAM bit error is
detected
•Optional data width selection to either use 32 bits, an entire cache line, or 512 bits
General Instruction Cache Functionality
When the instruction cache is used, the memory address space is split into two segments:
a cacheable segment and a non-cacheable segment. The cacheable segment is determined
by two parameters: C_ICACHE_BASEADDR and C_ICACHE_HIGHADDR. All addresses within
this range correspond to the cacheable address segment. All other addresses are
non-cacheable.
The cacheable segment size must be 2^N, where N is a positive integer. The range specified
by C_ICACHE_BASEADDR and C_ICACHE_HIGHADDR must comprise a complete power-of-two
range, such that range = 2^N and the N least significant bits of C_ICACHE_BASEADDR must be
zero.
The cacheable instruction address consists of two parts: the cache address, and the tag
address. The MicroBlaze instruction cache can be configured from 64 bytes to 64 kB. This
corresponds to a cache address of between 6 and 16 bits. The tag address together with the
cache address should match the full address of cacheable memory.
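The constraints on the cacheable segment can be checked with a few lines of C. The function
name below is hypothetical; it treats C_ICACHE_HIGHADDR as the last byte address of the
segment (inclusive), which is an assumption consistent with a 2^N segment size.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if [base, high] is a valid cacheable segment:
 * size is 2^N with N >= 1, and the N low bits of base are zero. */
bool cacheable_range_ok(uint32_t base, uint32_t high)
{
    if (high < base)
        return false;
    uint64_t size = (uint64_t)high - base + 1u;   /* inclusive range */
    if (size < 2u || (size & (size - 1u)) != 0u)
        return false;                             /* not a power of two */
    return (base & (uint32_t)(size - 1u)) == 0u;  /* base aligned to size */
}
```

A 64 kB segment starting at 0x00300000 passes the check, while the same size starting at
0x00308000 fails because the base address is not aligned to the segment size.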
When selecting cache sizes below 2 kB, distributed RAM is used to implement the Tag RAM
and Instruction RAM. Distributed RAM is always used to implement the Tag RAM, when
setting the parameter C_ICACHE_FORCE_TAG_LUTRAM to 1. This parameter is only available
with cache size 8 kB and less for 4 word cache-lines, with 16 kB and less for 8 word
cache-lines, and with 32 kB and less for 16 word cache-lines.
For example: in a MicroBlaze configured with C_ICACHE_BASEADDR=0x00300000, a 64 kB
cacheable range, a 4 kB cache with 8-word cache lines, and C_ICACHE_FORCE_TAG_LUTRAM=0;
the cacheable memory of 64 kB uses 16 bits of byte address, and the 4 kB cache uses 12 bits
of byte address, thus the required address tag width is: 16-12=4 bits. The total number of
block RAM primitives required in this configuration is: 2 RAMB16 for storing the 1024
instruction words, and 1 RAMB16 for 128 cache line entries, each consisting of: 4 bits of
tag, 8 word-valid bits, 1 line-valid bit. In total 3 RAMB16 primitives.
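The arithmetic in this example can be reproduced with a short C sketch. The structure and
function names are hypothetical; the inputs are the cacheable segment size, the cache size
(both in bytes, both powers of two), and the cache-line length in words.

```c
#include <stdint.h>

/* Integer base-2 logarithm for powers of two. */
unsigned log2_u32(uint32_t x)
{
    unsigned n = 0;
    while (x > 1u) {
        x >>= 1;
        n++;
    }
    return n;
}

/* Derived cache geometry (illustrative). */
typedef struct {
    unsigned tag_bits;  /* address tag width                 */
    unsigned words;     /* number of 32-bit instruction words */
    unsigned lines;     /* number of cache line entries       */
} icache_geometry_t;

icache_geometry_t icache_geometry(uint32_t cacheable_bytes,
                                  uint32_t cache_bytes,
                                  unsigned line_words)
{
    icache_geometry_t g;
    g.tag_bits = log2_u32(cacheable_bytes) - log2_u32(cache_bytes);
    g.words = cache_bytes / 4u;
    g.lines = g.words / line_words;
    return g;
}
```

Calling `icache_geometry(64 * 1024, 4 * 1024, 8)` gives 4 tag bits, 1024 words, and 128 cache
lines, matching the figures in the example.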
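The following figure shows the organization of Instruction Cache.
Figure 2-22: Instruction Cache Organization (instruction address bits 0 to 31 split into Tag
Address and Cache Address; Tag RAM indexed by line address, Instruction RAM indexed by word
address; outputs Cache_Hit and Cache_instruction_data)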
Instruction Cache Operation
For every instruction fetched, the instruction cache detects if the instruction address
belongs to the cacheable segment. If the address is non-cacheable, the cache controller
ignores the instruction and lets the M_AXI_IP or LMB complete the request. If the address
is cacheable, a lookup is performed on the tag memory to check if the requested address is
currently cached. The lookup is successful if: the word and line valid bits are set, and the tag
address matches the instruction address tag segment. On a cache miss, the cache controller
requests the new instruction over the instruction AXI4 interface (M_AXI_IC), and waits for
the memory controller to return the associated cache line.
C_ICACHE_DATA_WIDTH determines the bus data width, either 32 bits, an entire cache line
(128, 256 or 512 bits), or 512 bits.
When C_FAULT_TOLERANT is set to 1, a cache miss also occurs if a parity error is detected in
a tag or instruction Block RAM.
The instruction cache issues burst accesses for the AXI4 interface when 32-bit data width is
used, otherwise single accesses are used.
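The lookup described in this section can be modeled in software as a direct-mapped cache with
per-word and per-line valid bits. The sketch below is not the hardware implementation: the
geometry (4 kB cache, 8-word lines) is taken as an assumption, all names are hypothetical,
and the model stores the full upper address bits as the tag, whereas the hardware only needs
the bits that vary within the cacheable segment.

```c
#include <stdbool.h>
#include <stdint.h>

#define LINE_WORDS 8u     /* words per cache line (assumed)   */
#define NUM_LINES  128u   /* 4 kB / (8 words * 4 bytes/word)  */

typedef struct {
    uint32_t tag;
    bool     line_valid;
    uint8_t  word_valid;          /* one valid bit per word */
    uint32_t word[LINE_WORDS];
} icache_line_t;

typedef struct {
    uint32_t      base, high;     /* cacheable segment, inclusive */
    icache_line_t line[NUM_LINES];
} icache_t;

typedef enum { IC_NON_CACHEABLE, IC_MISS, IC_HIT } ic_result_t;

static uint32_t word_of(uint32_t a) { return (a >> 2) % LINE_WORDS; }
static uint32_t line_of(uint32_t a) { return (a / (4u * LINE_WORDS)) % NUM_LINES; }
static uint32_t tag_of(uint32_t a)  { return a / (4u * LINE_WORDS * NUM_LINES); }

/* Looks up `addr`; on a hit the instruction word is stored in *insn. */
ic_result_t icache_lookup(const icache_t *c, uint32_t addr, uint32_t *insn)
{
    if (addr < c->base || addr > c->high)
        return IC_NON_CACHEABLE;  /* request is completed by M_AXI_IP or LMB */
    const icache_line_t *l = &c->line[line_of(addr)];
    bool word_ok = ((l->word_valid >> word_of(addr)) & 1u) != 0;
    if (l->line_valid && word_ok && l->tag == tag_of(addr)) {
        *insn = l->word[word_of(addr)];
        return IC_HIT;
    }
    return IC_MISS;               /* fetch over M_AXI_IC, then fill */
}

/* Stores one fetched word, replacing the line if the tag differs. */
void icache_fill_word(icache_t *c, uint32_t addr, uint32_t insn)
{
    icache_line_t *l = &c->line[line_of(addr)];
    if (!l->line_valid || l->tag != tag_of(addr)) {
        l->tag = tag_of(addr);
        l->line_valid = true;
        l->word_valid = 0;
    }
    l->word[word_of(addr)] = insn;
    l->word_valid |= (uint8_t)(1u << word_of(addr));
}
```

Two addresses that differ only in their tag bits map to the same line, so filling one evicts
the other; this is the conflict behavior that the optional victim cache is intended to soften.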
Stream Buffers
When stream buffers are enabled, by setting the parameter C_ICACHE_STREAMS to 1, the
cache will speculatively fetch cache lines in advance in sequence following the last
requested address, until the stream buffer is full.
The stream buffer can hold up to two cache lines. Should the processor subsequently
request instructions from a cache line prefetched by the stream buffer, which occurs in
linear code, they are immediately available.
The stream buffer often improves performance, since the processor generally has to spend
less time waiting for instructions to be fetched from memory.
C_ICACHE_DATA_WIDTH determines the amount of data transferred from the stream buffer
each clock cycle, either 32 bits or an entire cache line.
To be able to use instruction cache stream buffers, area optimization must not be enabled.
Victim Cache
The victim cache is enabled by setting the parameter C_ICACHE_VICTIMS to 2, 4 or 8. This
defines the number of cache lines that can be stored in the victim cache. Whenever a cache
line is evicted from the cache, it is saved in the victim cache. By saving the most recent lines
they can be fetched much faster, should the processor request them, thereby improving
performance. If the victim cache is not used, all evicted cache lines must be read from
memory again when they are needed.
C_ICACHE_DATA_WIDTH determines the amount of data transferred from/to the victim
cache each clock cycle, either 32 bits or an entire cache line.
Note: To be able to use the victim cache, area optimization must not be enabled.
Instruction Cache Software Support
MSR Bit
The ICE bit in the MSR provides software control to enable and disable caches.
The contents of the cache are preserved by default when the cache is disabled. You can
invalidate cache lines using the WIC instruction or using the hardware debug logic of
MicroBlaze.
WIC Instruction
The optional WIC instruction (C_ALLOW_ICACHE_WR=1) is used to invalidate cache lines in
the instruction cache from an application. For a detailed description, see Chapter 5,
MicroBlaze Instruction Set Architecture.
The WIC instruction can also be used together with parity protection to periodically
invalidate entries in the cache, to avoid accumulating errors.
Data Cache
Overview
The MicroBlaze processor can be used with an optional data cache for improved
performance. The cached memory range must not include addresses in the LMB address
range. The data cache has the following features:
•Direct mapped (1-way associative)
•Write-through or Write-back
•User selectable cacheable memory address range
•Configurable cache size and tag size
•Caching over AXI4 interface (M_AXI_DC)
•Option to use 4, 8 or 16 word cache-lines
•Cache on and off controlled using a bit in the MSR
•Optional WDC instruction to invalidate or flush data cache lines
•Optional victim cache with write-back to improve performance by saving evicted cache
lines
•Optional parity protection for write-through cache that invalidates cache lines if a Block
RAM bit error is detected
•Optional data width selection to either use 32 bits, an entire cache line, or 512 bits
General Data Cache Functionality
When the data cache is used, the memory address space is split into two segments: a
cacheable segment and a non-cacheable segment. The cacheable area is determined by
two parameters: C_DCACHE_BASEADDR and C_DCACHE_HIGHADDR. All addresses within this
range correspond to the cacheable address space. All other addresses are non-cacheable.
The cacheable segment size must be 2^N, where N is a positive integer. The range specified
by C_DCACHE_BASEADDR and C_DCACHE_HIGHADDR must comprise a complete power-of-two
range, such that range = 2^N and the N least significant bits of C_DCACHE_BASEADDR must be
zero.
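The two range constraints above (a size of 2^N bytes and a base address with its N least significant bits cleared) can be checked with a few bit operations. The following is a minimal sketch, not part of the MicroBlaze tools; the function name dcache_range_is_valid is hypothetical, and the high address 0x00403FFF used in the usage note is an assumed value chosen to give a 16 kB range.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: returns true when [base, high] spans exactly
 * 2^N bytes (N >= 1) and the N least significant bits of base are zero. */
bool dcache_range_is_valid(uint32_t base, uint32_t high)
{
    if (high <= base) {
        return false;               /* empty or single-byte range */
    }
    /* For a 2^N byte range, high - base equals 2^N - 1, an all-ones mask. */
    uint32_t mask = high - base;
    bool size_is_pow2 = (mask & (mask + 1u)) == 0u;
    bool base_aligned = (base & mask) == 0u;
    return size_is_pow2 && base_aligned;
}
```

For instance, dcache_range_is_valid(0x00400000, 0x00403FFF) accepts an aligned 16 kB range, while a base of 0x00401000 with the same size is rejected because the low 14 bits of the base are not all zero.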
The following figure shows the Data Cache organization.
Figure 2-23: Data Cache Organization
The cacheable data address consists of two parts: the cache address, and the tag address.
The MicroBlaze data cache can be configured from 64 bytes to 64 kB. This corresponds to
a cache address of between 6 and 16 bits. The tag address together with the cache address
should match the full address of cacheable memory. When selecting cache sizes below 2 kB,
distributed RAM is used to implement the Tag RAM and Data RAM, except that block RAM
is always used for the Data RAM when C_AREA_OPTIMIZED is set to 1 (Area) and
C_DCACHE_USE_WRITEBACK is not set. Distributed RAM is always used to implement the Tag
RAM, when setting the parameter C_DCACHE_FORCE_TAG_LUTRAM to 1. This parameter is
only available with cache size 8 kB and less for 4 word cache-lines, with 16 kB and less for
8 word cache-lines, and with 32 kB and less for 16 word cache-lines.
For example, in a MicroBlaze configured with C_DCACHE_BASEADDR=0x00400000,
C_DCACHE_FORCE_TAG_LUTRAM=0; the cacheable memory of 16 kB uses 14 bits of byte
address, and the 2 kB cache uses 11 bits of byte address, thus the required address tag
width is 14-11=3 bits. The total number of block RAM primitives required in this
configuration is 1 RAMB16 for storing the 512 data words, and 1 RAMB16 for 128 cache line
entries, each consisting of 3 bits of tag, 4 word-valid bits, 1 line-valid bit. In total, 2 RAMB16
primitives.
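The arithmetic in this example can be reproduced with a short calculation. The sketch below is illustrative only; the function names are hypothetical, and it assumes that both sizes are powers of two and that each tag entry holds the tag bits, one valid bit per word, and one line-valid bit, as described in the example.

```c
#include <stdint.h>

/* log2 of a power-of-two value; returns 0 for inputs 0 or 1. */
static unsigned log2_pow2(uint32_t v)
{
    unsigned n = 0;
    while (v > 1u) {
        v >>= 1;
        n++;
    }
    return n;
}

/* Tag width = byte-address bits of cacheable memory - byte-address bits of cache. */
unsigned dcache_tag_bits(uint32_t cacheable_bytes, uint32_t cache_bytes)
{
    return log2_pow2(cacheable_bytes) - log2_pow2(cache_bytes);
}

/* Number of cache lines, with 4 bytes per word. */
unsigned dcache_line_count(uint32_t cache_bytes, uint32_t words_per_line)
{
    return cache_bytes / (words_per_line * 4u);
}

/* Bits per tag entry: tag + one valid bit per word + one line-valid bit. */
unsigned dcache_tag_entry_bits(unsigned tag_bits, uint32_t words_per_line)
{
    return tag_bits + words_per_line + 1u;
}
```

With 16 kB of cacheable memory, a 2 kB cache and 4 word cache-lines, this gives 14 - 11 = 3 tag bits, 128 cache line entries, and 3 + 4 + 1 = 8 bits per entry, matching the figures above.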
Data Cache Operation
The caching policy used by the MicroBlaze data cache, write-back or write-through, is
determined by the parameter C_DCACHE_USE_WRITEBACK. When this parameter is set, a
write-back protocol is implemented; otherwise write-through is implemented.
However, when configured with an MMU (C_USE_MMU > 1, C_AREA_OPTIMIZED = 0
(Performance) or 2 (Frequency), C_DCACHE_USE_WRITEBACK = 1), the caching policy in
virtual mode is determined by the W storage attribute in the TLB entry, whereas write-back
is used in real mode.
With the write-back protocol, a store to an address within the cacheable range always
updates the cached data. If the target address word is not in the cache (that is, the access
is a cache miss), and the location in the cache contains data that has not yet been written
to memory (the cache location is dirty), the old data is written over the data AXI4 interface
(M_AXI_DC) to external memory before updating the cache with the new data. If only a
single word needs to be written, a single word write is used, otherwise a burst write is used.
For byte or halfword stores, in case of a cache miss, the address is first requested over the
data AXI4 interface, while a word store only updates the cache.
With the write-through protocol, a store to an address within the cacheable range
generates an equivalent byte, halfword, or word write over the data AXI4 interface to
external memory. The write also updates the cached data if the target address word is in the
cache (that is, the write is a cache hit). A write cache-miss does not load the associated
cache line into the cache.
Provided that the cache is enabled, a load from an address within the cacheable range
triggers a check to determine if the requested data is currently cached. If it is (that is, on a
cache hit), the requested data is retrieved from the cache. If not (that is, on a cache miss), the
address is requested over the data AXI4 interface using a burst read, and the processor
pipeline stalls until the cache line associated with the requested address is returned from the
external memory controller.
The parameter C_DCACHE_DATA_WIDTH determines the bus data width, either 32 bits, an
entire cache line (128, 256 or 512 bits), or 512 bits.
When C_FAULT_TOLERANT is set to 1 and write-through protocol is used, a cache miss also
occurs if a parity error is detected in the tag or data block RAM.
The following table summarizes all types of accesses issued by the data cache AXI4
interface.
Table 2-40: Data Cache Interface Accesses
Policy         State           Direction  Access Type
Write-through  Cache Enabled   Read       Burst for 32-bit interface non-exclusive access and exclusive access with ACE enabled, single access otherwise
Write-through  Cache Enabled   Write      Single access
Write-through  Cache Disabled  Read       Burst for 32-bit interface exclusive access with ACE enabled, single access otherwise
Write-through  Cache Disabled  Write      Single access
Write-back     Cache Enabled   Read       Burst for 32-bit interface, single access otherwise
Write-back     Cache Enabled   Write      Burst for 32-bit interface cache lines with more than one valid word, a single access otherwise
Write-back     Cache Disabled  Read       Burst for 32-bit interface non-exclusive access, discarding all but the desired data, a single access otherwise
Write-back     Cache Disabled  Write      Single access
Victim Cache
The victim cache is enabled by setting the parameter C_DCACHE_VICTIMS to 2, 4 or 8. This
defines the number of cache lines that can be stored in the victim cache. Whenever a
complete cache line is evicted from the cache, it is saved in the victim cache. By saving the
most recent lines they can be fetched much faster, should the processor request them,
thereby improving performance. If the victim cache is not used, all evicted cache lines must
be read from memory again when they are needed.
With the AXI4 interface, C_DCACHE_DATA_WIDTH determines the amount of data transferred
from/to the victim cache each clock cycle, either 32 bits or an entire cache line.
Note: To be able to use the victim cache, write-back must be enabled and area optimization must
not be enabled.
Data Cache Software Support
MSR Bit
The DCE bit in the MSR controls whether or not the cache is enabled. When disabling
caches the user must ensure that all the prior writes within the cacheable range have been
completed in external memory before reading back over M_AXI_DP. This can be done by
writing to a semaphore immediately before turning off caches, and then polling in a loop until
it has been written. The contents of the cache are preserved when the cache is disabled.
WDC Instruction
The optional WDC instruction (C_ALLOW_DCACHE_WR=1) is used to invalidate or flush cache
lines in the data cache from an application. For a detailed description, please refer to
Chapter 5, MicroBlaze Instruction Set Architecture.
The WDC instruction can also be used together with parity protection to periodically
invalidate entries in the cache, to avoid accumulating errors.
With an external L2 cache, such as the System Cache, connected to MicroBlaze using the
ACE interface, external cache invalidate or flush can be performed with WDC. See the
LogiCore IP System Cache Product Guide (PG118) [Ref 6] for more information on the System
Cache.
Floating-Point Unit (FPU)
Overview
The MicroBlaze floating-point unit is based on the IEEE 754-1985 standard [Ref 18]:
•Uses IEEE 754 single precision floating-point format, including definitions for infinity,
not-a-number (NaN), and zero
•Supports addition, subtraction, multiplication, division, comparison, conversion and
square root instructions
•Implements round-to-nearest mode
•Generates sticky status bits for: underflow, overflow, divide-by-zero and invalid
operation
For improved performance, the following non-standard simplifications are made:
•Denormalized(1) operands are not supported. A hardware floating-point operation on a
denormalized number returns a quiet NaN and sets the sticky denormalized operand
error bit in FSR; see Floating-Point Status Register (FSR).
•A denormalized result is stored as a signed 0 with the underflow bit set in FSR. This
method is commonly referred to as Flush-to-Zero (FTZ)
•An operation on a quiet NaN returns the fixed NaN: 0xFFC00000, rather than one of the
NaN operands
•Overflow as a result of a floating-point operation always returns signed ∞
Format
An IEEE 754 single precision floating-point number is composed of the following three
fields:
1. 1-bit sign
2. 8-bit biased exponent
3. 23-bit fraction (a.k.a. mantissa or significand)
The fields are stored in a 32 bit word as defined in the following figure:
1. Numbers that are so close to 0 that they cannot be represented with full precision, that is, any number n that falls in the
following ranges: (1.17549 * 10^-38 > n > 0), or (0 > n > -1.17549 * 10^-38)
[Figure 2-24 field layout: bit 0 = sign, bits 1 to 8 = exponent, bits 9 to 31 = fraction]
Figure 2-24: IEEE 754 Single Precision Format
The value of a floating-point number v in MicroBlaze has the following interpretation:
1. If exponent = 255 and fraction <> 0, then v = NaN, regardless of the sign bit
2. If exponent = 255 and fraction = 0, then v = (-1)^sign * ∞
3. If 0 < exponent < 255, then v = (-1)^sign * 2^(exponent-127) * (1.fraction)
4. If exponent = 0 and fraction <> 0, then v = (-1)^sign * 2^-126 * (0.fraction)
5. If exponent = 0 and fraction = 0, then v = (-1)^sign * 0
For practical purposes only 3 and 5 are useful, while the others all represent either an error
or numbers that can no longer be represented with full precision in a 32 bit format.
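The five cases above can be expressed as a small classifier over the sign, exponent and fraction fields. This is a host-side sketch rather than MicroBlaze library code: all names (mb_fp_split, mb_fp_classify and so on) are hypothetical, and it assumes that the C float type is an IEEE 754 single precision value. Note that the figure numbers bits from the most significant end, so its bit 0 (sign) is the top bit of the 32-bit word.

```c
#include <stdint.h>
#include <string.h>

typedef enum {
    MB_FP_NAN,       /* case 1: exponent = 255, fraction <> 0 */
    MB_FP_INFINITY,  /* case 2: exponent = 255, fraction = 0  */
    MB_FP_NORMAL,    /* case 3: 0 < exponent < 255            */
    MB_FP_DENORMAL,  /* case 4: exponent = 0, fraction <> 0   */
    MB_FP_ZERO       /* case 5: exponent = 0, fraction = 0    */
} mb_fp_class_t;

typedef struct {
    unsigned sign;      /* 1 bit */
    unsigned exponent;  /* 8 bits, biased by 127 */
    uint32_t fraction;  /* 23 bits */
} mb_fp_fields_t;

/* Reinterpret a 32-bit pattern as a float without violating aliasing rules. */
float mb_fp_from_bits(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Split a float into its three IEEE 754 single precision fields. */
mb_fp_fields_t mb_fp_split(float f)
{
    uint32_t bits;
    mb_fp_fields_t r;
    memcpy(&bits, &f, sizeof bits);
    r.sign = (unsigned)(bits >> 31);
    r.exponent = (unsigned)((bits >> 23) & 0xFFu);
    r.fraction = bits & 0x7FFFFFu;
    return r;
}

/* Apply the five interpretation rules listed above. */
mb_fp_class_t mb_fp_classify(float f)
{
    mb_fp_fields_t p = mb_fp_split(f);
    if (p.exponent == 255u) {
        return (p.fraction != 0u) ? MB_FP_NAN : MB_FP_INFINITY;
    }
    if (p.exponent == 0u) {
        return (p.fraction != 0u) ? MB_FP_DENORMAL : MB_FP_ZERO;
    }
    return MB_FP_NORMAL;
}
```

For example, the fixed NaN pattern 0xFFC00000 classifies as NaN, and the smallest positive pattern 0x00000001 classifies as denormalized, which is the kind of operand the hardware FPU does not support.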
Rounding
The MicroBlaze FPU only implements the default rounding mode, “Round-to-nearest”,
specified in IEEE 754. By definition, the result of any floating-point operation should return
the nearest single precision value to the infinitely precise result. If the two nearest
representable values are equally near, then the one with its least significant bit zero is
returned.
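The tie-breaking rule can be observed with values just above 2^24, where single precision can only represent even integers. The sketch below runs on a host and assumes that float is IEEE 754 single precision and that the default round-to-nearest mode is active; the function name add_rounded is hypothetical.

```c
/* Add two floats and force the result to be rounded to single precision.
 * The volatile store prevents the compiler from keeping extra precision. */
float add_rounded(float a, float b)
{
    volatile float r = a + b;
    return r;
}
```

Under these assumptions, add_rounded(16777216.0f, 1.0f) returns 16777216.0f and add_rounded(16777218.0f, 1.0f) returns 16777220.0f. In both cases the exact sum lies halfway between two representable values, and the one whose least significant fraction bit is zero is chosen.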
Operations
All MicroBlaze FPU operations use the processor's general purpose registers rather than a
dedicated floating-point register file, see General Purpose Registers.
Arithmetic
The FPU implements the following floating-point operations:
•addition, fadd
•subtraction, frsub
•multiplication, fmul
•division, fdiv
•square root, fsqrt (available if C_USE_FPU = 2, EXTENDED)
Comparison
The FPU implements the following floating-point comparisons:
•compare less-than, fcmp.lt
•compare equal, fcmp.eq
•compare less-or-equal, fcmp.le
•compare greater-than, fcmp.gt
•compare not-equal, fcmp.ne
•compare greater-or-equal, fcmp.ge
•compare unordered, fcmp.un (used for NaN)
Conversion
The FPU implements the following conversions (available if C_USE_FPU = 2, EXTENDED):
•convert from signed integer to floating-point, flt
•convert from floating-point to signed integer, fint
Exceptions
The floating-point unit uses the regular hardware exception mechanism in MicroBlaze.
When enabled, exceptions are thrown for all the IEEE standard conditions: underflow,
overflow, divide-by-zero, and illegal operation, as well as for the MicroBlaze specific
exception: denormalized operand error.
A floating-point exception inhibits the write to the destination register (Rd). This allows a
floating-point exception handler to operate on the uncorrupted register file.
Software Support
The SDK compiler system, based on GCC, provides support for the Floating-Point Unit
compliant with the MicroBlaze API. Compiler flags are automatically added to the GCC
command line based on the type of FPU present in the system, when using SDK.
All double-precision operations are emulated in software. Be aware that the xil_printf()
function does not support floating-point output. The standard C library printf() and
related functions do support floating-point output, but will increase the program code size.
Libraries and Binary Compatibility
The SDK compiler system only includes software floating-point C runtime libraries. To take
advantage of the hardware FPU, the libraries must be recompiled with the appropriate
compiler switches.
For all cases where separate compilation is used, it is very important that you ensure the
consistency of FPU compiler flags throughout the build.
Operator Latencies
The latencies of the various operations supported by the FPU are listed in Chapter 5,
“MicroBlaze Instruction Set Architecture.” The FPU instructions are not pipelined, so only
one operation can be ongoing at any time.
C Language Programming
To gain maximum benefit from the FPU without low-level assembly-language
programming, it is important to consider how the C compiler will interpret your source
code. Very often the same algorithm can be expressed in many different ways, and some are
more efficient than others.
Immediate Constants
Floating-point constants in C are double-precision by default. When using a single-precision
FPU, careless coding could result in double-precision software emulation routines
being used instead of the native single-precision instructions. To avoid this, explicitly
specify (by cast or suffix) that immediate constants in your arithmetic expressions are
single-precision values.
For example:
float x = 0.0;
...
x += (float)1.0; /* float addition */
x += 1.0F;       /* alternative to above */
x += 1.0;        /* warning - uses double addition! */
Note that the GNU C compiler can be instructed to treat all floating-point constants as
single-precision (contrary to the ANSI C standard) by supplying the compiler flag
-fsingle-precision-constant.
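The effect of the constant's type can be checked at compile time, because sizeof reports the type in which an expression is evaluated without executing it. The following sketch is illustrative; the function names are hypothetical, and it assumes that the code is built without a flag that changes the type of unsuffixed constants.

```c
#include <stddef.h>

/* Size of the type used when a float is added to an unsuffixed constant. */
size_t width_with_double_constant(void)
{
    float x = 0.0F;
    return sizeof(x + 1.0);    /* x is promoted: the addition is done in double */
}

/* Size of the type used when a float is added to an F-suffixed constant. */
size_t width_with_float_constant(void)
{
    float x = 0.0F;
    return sizeof(x + 1.0F);   /* both operands are float: the addition stays in float */
}
```

The first function returns sizeof(double) and the second returns sizeof(float), which is the difference between a software-emulated double addition and a native single-precision FPU instruction.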
Avoiding Unnecessary Casting
While conversions between floating-point and integer formats are supported in hardware
by the FPU, when C_USE_FPU is set to 2 (Extended), it is still best to avoid them when
possible.
The following not-recommended example calculates the sum of squares of the integers
from 1 to 10 using floating-point representation:
float sum, t;
int i;
sum = 0.0f;
for (i = 1; i <= 10; i++) {
    t = (float)i;
    sum += t * t;
}
The above code requires a cast from an integer to a float on each loop iteration. This can be
rewritten as:
float sum, t;
int i;
t = sum = 0.0f;
for (i = 1; i <= 10; i++) {
    t += 1.0f;
    sum += t * t;
}
Note: The compiler is not at liberty to perform this optimization in general, as the two code
fragments above might give different results in some cases (for example, very large t).
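The case of very large t can be made concrete. Once t reaches 2^24 (16777216), adding 1.0f no longer changes a single-precision value, so the incrementing loop stalls while the casting loop continues to follow i. The sketch below runs on a host and assumes IEEE 754 single precision with round-to-nearest; the function name float_increment_is_lost is hypothetical.

```c
#include <stdbool.h>

/* Returns true if adding 1.0f to t no longer changes its value. */
bool float_increment_is_lost(float t)
{
    volatile float next = t + 1.0f;   /* volatile forces rounding to float */
    return next == t;
}
```

Under these assumptions, float_increment_is_lost(10.0f) is false and float_increment_is_lost(16777216.0f) is true, whereas the cast (float)16777218 still yields the exact value 16777218.0f.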
Using Square Root Runtime Library Function
The standard C runtime math library functions operate using double-precision arithmetic.
When using a single-precision FPU, calls to the square root functions (sqrt()) result in
inefficient emulation routines being used instead of FPU instructions:
#include <math.h>
...
float x=-1.0F;
...
x = sqrt(x); /* uses double precision */
Here the math.h header is included to avoid a warning message from the compiler.
When used with single-precision data types, the result is a cast to double, a runtime library
call is made (which does not use the FPU) and then a truncation back to float is performed.
The solution is to use the non-ANSI function sqrtf() instead, which operates using single
precision and can be carried out using the FPU. For example:
#include <math.h>
...
float x=-1.0F;
...
x = sqrtf(x); /* uses single precision */
Note: When compiling this code, the compiler flag -fno-math-errno (in addition to
-mhard-float and -mxl-float-sqrt) must be used, to ensure that the compiler does not
generate unnecessary code to handle error conditions by updating the errno variable.
Stream Link Interfaces
MicroBlaze can be configured with up to 16 AXI4-Stream interfaces, each consisting of one
input and one output port. The channels are dedicated uni-directional point-to-point data
streaming interfaces.
For detailed information on the AXI4-Stream interface, please refer to the AMBA 4 AXI4-Stream
Protocol Specification, Version 1.0 (Arm IHI 0051A) [Ref 14] document.
The interfaces on MicroBlaze are 32 bits wide. A separate bit indicates whether the
sent/received word is of control or data type. The get instruction in the MicroBlaze ISA is
used to transfer information from a port to a general purpose register. The put instruction
is used to transfer data in the opposite direction. Both instructions come in 4 flavors:
blocking data, non-blocking data, blocking control, and non-blocking control. For a
detailed description of the get and put instructions, see Chapter 5, MicroBlaze Instruction
Set Architecture.
Hardware Acceleration
Each link provides a low latency dedicated interface to the processor pipeline. Thus they are
ideal for extending the processor's execution unit with custom hardware accelerators. A
simple example is illustrated in the following figure. The code uses RFSLx to indicate the
used link.
[Figure 2-25 content: the MicroBlaze register file is connected through Link x to a custom HW
accelerator containing Op 1 Reg, Op 2 Reg, Config Reg, the function f_x and Result Reg. The
figure shows the following code:]
// Configure fx
cput Rc, RFSLx
// Store operands
put Ra, RFSLx // op 1
put Rb, RFSLx // op 2
// Load result
Figure 2-25: Stream Link Used with HW Accelerated Function f_x
This method is similar to extending the ISA with custom instructions, but has the benefit of
not making the overall speed of the processor pipeline dependent on the custom function.
Also, there are no additional requirements on the software tool chain associated with this
type of functional extension.
Debug and Trace
Debug Overview
MicroBlaze features a debug interface to support JTAG based software debugging tools
(commonly known as BDM or Background Debug Mode debuggers) like the Xilinx System
Debugger (XSDB) tool. The debug interface is designed to be connected to the Xilinx
Microprocessor Debug Module (MDM) core, which interfaces with the JTAG port of Xilinx
FPGAs. Multiple MicroBlaze instances can be interfaced with a single MDM to enable
multiprocessor debugging.
To be able to download programs, set software breakpoints and disassemble code, the
instruction and data memory ranges must overlap, and use the same physical memory.
Debug registers are accessed using the debug interface, and are not directly visible to
software running on the processor, unless the MDM is configured to enable software access
to user-accessible debug registers. The debug interface can either use JTAG serial access or
AXI4-Lite parallel access, controlled by the parameter
C_DEBUG_INTERFACE.
See the MicroBlaze Debug Module (MDM) Product Guide (PG115) [Ref 4] for a detailed
description of the MDM features.
The basic debugging features enabled by setting C_DEBUG_ENABLED to 1 (Basic) include:
•Configurable number of hardware breakpoints and watchpoints and unlimited software
breakpoints
•External processor control enables debug tools to stop, reset, and single step
MicroBlaze
•Read from and write to: memory, general purpose registers, and special purpose
registers, except EAR, EDR, ESR, BTR and PVR0 - PVR12, which can only be read
•Support for multiple processors
The extended debugging features enabled by setting C_DEBUG_ENABLED to 2 (Extended)
include:
•Configurable number of performance monitoring event and latency counters
•Program Trace:
  - Embedded program trace with configurable trace buffer size
  - External program trace for multiple processors, provided by the MDM
•Non-intrusive profiling support with configurable profiling buffer size
•Cross trigger support between multiple processors, and external cross trigger inputs
and outputs, provided by the MDM
Performance Monitoring
With extended debugging, MicroBlaze provides performance monitoring counters to count
various events and to measure latency during program execution. The number of event
counters and latency counters can be configured with C_DEBUG_EVENT_COUNTERS and
C_DEBUG_LATENCY_COUNTERS respectively, and the counter width can be set to 32, 48 or 64
bits with C_DEBUG_COUNTER_WIDTH. With the default configuration, the counter width is set
to 32 bits and there are five event counters and one latency counter.
An event counter simply counts the number of times a certain event has occurred, whereas
a latency counter provides the following information:
•Number of times the event has occurred (N)
•The sum of each event latency, measured by counting clock cycles from when the event
starts until it finishes (ΣL), used to calculate the mean latency
•The sum of each event latency squared (ΣL^2), used to calculate the latency standard
deviation
•The minimum, shortest, measured latency for all events (Lmin)
•The maximum, longest, measured latency for all events (Lmax)
The mean latency (μ) is calculated by the formula:
$$\mu = \frac{\Sigma L}{N}$$
The standard deviation (σ) of the latency is calculated by the formula:
$$\sigma = \frac{\sqrt{N \, \Sigma L^2 - (\Sigma L)^2}}{N}$$
Counting can be started or stopped using the Performance Counter Command Register or
by cross trigger events (see
Table 2-62).
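The mean and standard deviation can be derived in software from the three sampled items N, ΣL and ΣL^2 of a latency counter. The following is a minimal sketch under stated assumptions: the type latency_sample_t and the function names are hypothetical, the values are assumed to have already been read from the counter, and a small Newton iteration stands in for sqrt() only to keep the sketch free of a math library dependency.

```c
#include <stdint.h>

/* Hypothetical container for the sampled items of one latency counter. */
typedef struct {
    uint64_t n;       /* number of events, N */
    uint64_t sum;     /* sum of latencies, sum(L) */
    uint64_t sum_sq;  /* sum of squared latencies, sum(L^2) */
} latency_sample_t;

/* Newton iteration for the square root of a non-negative value. */
static double sqrt_newton(double x)
{
    double r;
    int i;
    if (x <= 0.0) {
        return 0.0;
    }
    r = (x > 1.0) ? x : 1.0;          /* start at or above the root */
    for (i = 0; i < 100; i++) {
        r = 0.5 * (r + x / r);
    }
    return r;
}

/* mean = sum(L) / N */
double latency_mean(const latency_sample_t *s)
{
    return (s->n != 0u) ? (double)s->sum / (double)s->n : 0.0;
}

/* stddev = sqrt(N * sum(L^2) - sum(L)^2) / N */
double latency_stddev(const latency_sample_t *s)
{
    double n, sum, radicand;
    if (s->n == 0u) {
        return 0.0;
    }
    n = (double)s->n;
    sum = (double)s->sum;
    radicand = n * (double)s->sum_sq - sum * sum;
    if (radicand < 0.0) {
        radicand = 0.0;               /* guard against rounding error */
    }
    return sqrt_newton(radicand) / n;
}
```

As an example, eight measured latencies of 2, 4, 4, 4, 5, 5, 7 and 9 cycles give N = 8, ΣL = 40 and ΣL^2 = 232, for a mean of 5 cycles and a standard deviation of 2 cycles.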
When configuring, reading or writing counters, they are accessed sequentially through the
performance counter registers. After every access the selected counter item is incremented.
All counters are sampled simultaneously for reading using the Performance Counter
Command Register. This can be done while counting, or after counting has been stopped.
When an event counter reaches its maximum value, the overflow status bit is set, and the
external interrupt signal
Dbg_Intr is set to one. The interrupt signal is reset to zero by
clearing the counters using the Performance Counter Command Register.
By using one of the event counters to count number of clock cycles, and initializing this
counter to overflow after a predetermined sampling interval, the external interrupt can be
used to periodically sample the performance counters.
The available events are described in Table 2-41, listed in numerical order.
A typical procedure to follow when initializing and using the performance monitoring
counters is delineated in the steps below.
1. Initialize the events to be monitored:
  - Use the Performance Command Register (Table 2-44) to reset the selected counter
    to the first counter, by setting the Reset bit.
  - Write the desired event numbers for all counters in order, using the Performance
    Control Register (Table 2-43). With the default configuration this means writing the
    register five times for the event counters and then once for the latency counter.
2. Clear all counters and start monitoring using the Performance Command Register, by
setting the Clear and Start bits.
3. Run the program or function to be monitored.
4. Sample counters and stop monitoring using the Performance Command Register, by
setting the Sample and Stop bits.
5. Read the results from all counters:
  - Use the Performance Command Register to reset the selected counter to the first
    counter, by setting the Reset bit.
  - Read the status for all counters in order, using the Performance Counter Status
    Register (Table 2-45). With the default configuration this means reading the register
    five times for the event counters and then once for the latency counter. Ensure that
    the result is valid by checking that the overflow and full bits are not set.
  - Use the Performance Command Register to reset the selected counter to the first
    counter, by setting the Reset bit.
  - Read the counter items for all counters in order, using the Performance Counter
    Data Read Register (Table 2-46). With the default configuration this means reading
    the register five times for the event counters and then four times for the latency
    counter as described in Table 2-47.
6. Calculate the final results, depending on the measured events, for example:
  - Use the formulas above to determine the mean latency and standard deviation for
    any measured latency.
  - The clock cycles per instruction (CPI) can be calculated by E30 / E0.
  - The instruction and data cache hit rates can be calculated by E11 / E10 and E47 / E46.
  - The instruction cache miss latency is determined by (E60(ΣL) - E60(N)) / (E10 - E11),
    and equivalent formulas can be used to determine the data cache read and write
    miss latencies.
  - The ratio of floating-point instructions in a program is E29/E0.
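The example calculations in step 6 can be collected in a few helper functions. This is a sketch only: the structure perf_counts_t and the function names are hypothetical, the fields simply hold the sampled values of the events named in the formulas (E0, E10, E11, E29, E30, E46, E47 and the N and ΣL items of E60), and a zero denominator is reported as 0.0 by assumption.

```c
#include <stdint.h>

/* Hypothetical container for sampled counter values, indexed by event number. */
typedef struct {
    uint64_t e0, e10, e11, e29, e30, e46, e47;
    uint64_t e60_n;    /* N item of latency event 60 */
    uint64_t e60_sum;  /* sum(L) item of latency event 60 */
} perf_counts_t;

/* Divide two counts, returning 0.0 when the denominator is zero. */
static double ratio(uint64_t num, uint64_t den)
{
    return (den != 0u) ? (double)num / (double)den : 0.0;
}

double perf_cpi(const perf_counts_t *c)             { return ratio(c->e30, c->e0); }
double perf_icache_hit_rate(const perf_counts_t *c) { return ratio(c->e11, c->e10); }
double perf_dcache_hit_rate(const perf_counts_t *c) { return ratio(c->e47, c->e46); }
double perf_fp_ratio(const perf_counts_t *c)        { return ratio(c->e29, c->e0); }

/* (E60(sum L) - E60(N)) / (E10 - E11) */
double perf_icache_miss_latency(const perf_counts_t *c)
{
    if (c->e10 <= c->e11 || c->e60_sum < c->e60_n) {
        return 0.0;                   /* no misses, or inconsistent samples */
    }
    return ratio(c->e60_sum - c->e60_n, c->e10 - c->e11);
}
```

With illustrative values E0 = 1000, E30 = 1500, E10 = 400, E11 = 380, E60(N) = 400 and E60(ΣL) = 800, the sketch reports a CPI of 1.5, an instruction cache hit rate of 0.95, and an instruction cache miss latency of 20 cycles.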
Event  Description
23     Exception taken
24     Interrupt occurred
25     Pipeline stalled due to operand fetch stage (OF)
26     Pipeline stalled due to execute stage (EX)
27     Pipeline stalled due to memory stage (MEM)
28     Integer divide (idiv, idivu) executed
38     Taken conditional branch with delay slot executed
52     Branch target cache hit for a branch or return
53     MMU instruction side access request
54     MMU instruction TLB (ITLB) hit
55     MMU data TLB (DTLB) hit
56     MMU unified TLB (UTLB) hit
Latency and Event Counter events
57     Interrupt latency from input to interrupt vector
58     Data cache latency for memory read
59     Data cache latency for memory write
60     Instruction cache latency for memory read
61     MMU address lookup latency
62     Peripheral AXI interface data read latency
63     Peripheral AXI interface data write latency
The debug registers used to configure and control performance monitoring, and to read or
write the event and latency counters, are listed in
Table 2-42. All of these registers except
the Performance Counter Command register are accessed repeatedly to read or write
information, first for all of the event counters followed by all of the latency counters.
The DBG_CTRL value indicates the value to use in the MDM Debug Register Access Control
Register to access the register, used with MDM software access to debug registers.
Select event for each configured counter, according to Table 2-41
Command to clear counters, start or stop counting, or sample counters
Read the sampled status for each configured performance counter
Read the sampled values for each configured performance counter
Write initial values for each configured performance counter
Performance Counter Control Register
The Performance Counter Control Register (PCCTRLR) is used to define the events that are
counted by the configured performance counters. To define the events for all configured
counters, the register should be written repeatedly for each of the counters. This register is
a write-only register. Issuing a read request has no effect, and undefined data is read.
Every time the register is written, the selected counter is incremented. By using the
Performance Counter Command Register, the selected counter can be reset to the first
counter again. See the following figure and table.
[Register layout: bits 31:8 = Reserved, bits 7:0 = Event]
Figure 2-26: Performance Counter Control Register
Table 2-43: Performance Counter Control Register (PCCTRLR)
Bits  Name   Description                                          Reset Value
7:0   Event  Performance counter event, according to Table 2-41.  0