with 7-stage pipeline, 8 register windows, 4x4 KiB
instruction and 4x4 KiB data caches.
• Double-precision IEEE-754 floating point units
• 2 MiB Level-2 cache
• 64-bit PC100 SDRAM memory interface with Reed-Solomon EDAC*
• 8/16-bit PROM/IO interface with EDAC*
• SpaceWire router with eight SpaceWire links
• 2x 10/100/1000 Mbit Ethernet interfaces*
• PCI Initiator/Target interface*
• MIL-STD-1553B interface*
• 2x CAN 2.0 controller interface*
• 2x UART, SPI, Timers and watchdog, 16+22 GPIO*
• CPU and I/O memory management units
• SpaceWire Time Distribution Protocol controller and
support for time synchronisation
• JTAG, Ethernet* and SpaceWire* debug links
* Interfaces have shared pins
Description
The GR740 device is a radiation-hard system-on-chip featuring a quad-core fault-tolerant LEON4
SPARC V8 processor, eight-port SpaceWire router, PCI initiator/target interface, MIL-STD-1553B
interface, CAN 2.0 interfaces and 10/100/1000 Mbit Ethernet interfaces.
Specification
• System frequency: 250 MHz
• Main memory interface: PC100 SDRAM
• SpaceWire router with SpaceWire links: 300 Mbit/s
• 33 MHz PCI 2.3 initiator/target interface
• Ethernet 10/100/1000 Mbit MACs
• CCGA625 / LGA625 package
Applications
The GR740 device is targeted at high-performance general purpose
processing. The architecture is suitable for both symmetric and
asymmetric multiprocessing. Shared resources can be monitored to
support mixed-criticality applications.
GR740-UM-DS, Nov 2017, Version 1.7, www.cobham.com/gaisler
1.2 Preliminary data sheet limitations
1.3 Updates and feedback
3.2 Configuration for flight
3.4 Complete signal list
4.9 Clock gating unit
4.10 Debug AHB bus clocking
4.11 Notes on Ethernet interface clock and mode switch
6.2 LEON4 integer unit
12.8 ASMP support
13.5 Configuration port
14 Gigabit Ethernet Media Access Controller (MAC)
16.7 Clocking and reset
17.4 Status and monitoring
23.4 Loop back mode
Specification, SPARC-V8E, Version 1.0, SPARC International Inc.
1.8 Document revision history
Change record information is provided in table 1.
Table 1. Change record
Version   Date            Note
1.0       2015 April      First public release of GR740 document.
1.1       2015 November   Fix typo of CE/NE bit in AHBSTAT section.
1.2       2016 January    Correct name of TOV field in DSU Instruction trace buffer control register 1
1.3       2016 February   Correct information on LEON4 AMBA access size in section 6.7.4.
1.4       2016 June       Change status from advanced to preliminary data sheet
Clarify that Level-2 cache is unified.
Correct L4STAT section 26.1 to state that the unit has sixteen counters.
Correct GRSPWROUTER documentation: Error in the description of the ICODEGEN register. The UA bit is independent of the setting of the AH bit. It is not required for AH to be set
in order for UA to have effect.
Corrected AHBTRACE TIMETAG register APB address offset in table caption.
Corrected MEMSCRUB APB address offsets in table captions for two last range registers.
Corrected SPICTRL MASK register access attributes.
Added missing reset values for L2C Scrub delay register and Access control register.
Document TCTRL register WS and WN fields in timer unit section.
Correct reset value for LEON4 %asr17.DBP, CCTRL.DS and %tbr.
Update pinlist in section 40.3
Updated front page and back page.
Converted to new headers and footers.
Corrected description for EDCL 1 bootstrap signals (GPIO[5:4])
Corrected register table headings and add value for trace buffer FDEPTH field in GRPCI2
section 20.
Add note about pulsed interrupts in interrupt controller section
Update footer
Correct typos in %ASR22-23 description in section 6.10.3
Correct typo on Memory scrubber Error Threshold registers, BECTE field.
Add package drawing in section 40.4.
Correct to PCIMODE_ENABLE=HIGH in table 27, row 1, column 3.
Correct Level-2 cache tag and checkbit register layout in section 9.4
Correct SDCFG2 register reference in section 10.4.6.
Correct reference to description of tick-out connection SpaceWire router register descriptions under section 13.4.8.
Added description in section 4.11 of how to handle Ethernet TXCLK and mode switch to
Gigabit operation.
Clarify in section 29.1 that temperature sensor is disabled on current prototype and engineering model devices.
Update description of GRGPIO IFLAG register in section 22.3.10.
Add note about development board in new section 1.5.
Clarify trace point usage in sections 6.9.1 and 33.4.
Clarify in section 13.5.3 that SpaceWire router RTR.RTCOMB register is only accessible via
RMAP.
Rephrase unified cache description in Level-2 cache section 9.1.
Update Level-2 cache error injection description in sections 9.3.6 and 9.4.5.
Minor updates to supplies in section 39.
Table 1. Change record
Version   Date            Note
1.5       2016 November   Update system frequency and package types on front page.
1.6       2017 March      Update feature list on front page to mark interfaces subject to pin sharing.
Add bootstrap signal requirement for flight (section 3.2) and pin driver configuration (3.5)
Add note that it is the top part of the data bus that is used for PROM in 8-bit mode in section
3.3.1.
Restructure pin multiplexing tables 24,25,27 to have consistent naming with table 28.
Correct missing PROMIO_READ signal in table 28
Correct maximum number of SDRAM banks supported (4) in section 10.1, correct register
name in 10.5.4
Rename section 15 to Spacewire Debug Link for clarity.
Revised GRSPWROUTER section 13 for readability.
Add note that GPTIMER TCTRL LD is automatically cleared after load in section 20.3.
Major update to electrical characteristics section 39, update list of parameters, add power-up/
down sequencing, cold sparing, add MDIO diagram, update clock table, reference internal
clocks in diagrams, update thermal information and AC limits.
Add errata in section 43.
Update SpaceWire link speed on front page
Changed title of GRSPW2 (SpaceWire Debug Link) section
Clarify signal names in pin-multiplexing tables 24 and 25.
Add pin driver configuration section 3.5, add reference in section 13.1.
Remove references to PC133 SDRAM operation.
Added placement diagram in section 40.2.
Removed LEON4 section on partial WRPSR (unsupported due to errata)
Update section 1.1 (Scope).
Move description of Debug AHB bus and corresponding controller documentation to be last
bus described in document. This modifies section numbers for section 12 to 36.
Change order of IOMMU and SpaceWire router sections.
Update errata section 43 overview, added LVDS ESD sensitivity erratum.
Correct UART1_RXD signal name in table 24.
Added section 1.6 with reference to technical note on validation and benchmarking.
Updates under section 12 to clarify that bus selection can be made even if IOMMU is disabled.
Describe planned package dimension change in section 40.4.
Note that TESTEN should be connected to ground in section 3.4
Correct PCI_HOSTN signal name typo in table 27.
Correct PCIMODE_ENABLE heading in table 27.
Effect of bootstrap signal GPIO[15] was inverted. LOW enables full PROM/IO interface,
corrected in table 23 and section 3.3.1.
Table 1. Change record
Version   Date            Note
1.7       2017 November   Updated ordering information in section 42.
Updated placement diagram under section 40.
Add new package drawings in section 40.4.
Add information on booting over RMAP, changes in sections 1.7 and 5.3.
Add information about bridges, posted writes and AMBA ERROR response propagation to
sections 2.3, 5.10, 6.2.13, 6.3.5, 6.7.4, 10.5.1, 13.4.4.9, 13.4.5.7.2, 14.3.3, 14.4.4, 15.4.4,
16.4.5, 17.6.6, 19.7.1, 19.7.2, 19.8, 19.9, 35.5.9, 35.6.7, 35.7.2, 35.8.2, 35.9, 37.2.2.
Add information on PROM EDAC handling with multiple external devices in section 1.7
and 19.7.1.
Change errata section 43 to also include design changes between silicon revisions. Update
and add additional errata descriptions. Add silicon revision 1 column in table 602.
Document new L2 cache register fields in section 9.4.
Add partial WRPSR description to LEON4 section 6.2.16.
Extend LEON4 MMU TLB disable description in section 6.10.8.
Describe new IRQMP boot/monitor interface in sections 21.2.10 and 21.3.
Update GRGPIO interrupt flag register description in section 22.3.10.
Added description of AHB status register multiple error logging and filtering in section 27.
Correct number of up-counter bits in section 5.9.2.
Clarify timetag counter behaviour in sections 6.10.4 and 36.1.
Document PCI controller DFA bit in section 15.10.1.
Clarify PCI target supported byte-enables in section 15.5.3.
Update PCI DMA controller description in section 15.6.3.
Update register for bootstrap signals description in section 28.3 for silicon revision 1.
Add reference to GRLIB-AN-0004 in sections 1.7 and 6.11.4.
Indicate AHB and instruction trace buffer sizes in section 2.1.
Add note about using the MMU to mark memory as cacheable in section 6.3.6.
Describe SDRAM bus parking functionality in section 10.6.2.
Update description of SDRAM controller BANKSZ field in section 10.6.1.
Clarifications about internal and external SDRAM banks under section 10.
Update SpaceWire router configuration port memory range in sections 2.3 and 13.5.3.
Document SpaceWire router AMBA port interrupt in section 2.4 and table 193.
Describe SpaceWire TDP functionality added for silicon revision 1 in sections 3.1, 5.9.2, and
31.
Added information on SpaceWire receive rate in section 13.3.1.2. Clarify that t_SPW4 and t_SPW5
in table 587 are valid assuming use of SpW PLL in nominal mode.
Extend section 5.5 ASMP configurations to 5.5 Separation and ASMP configurations.
Add description of LEON4 %ASR16 register in section 6.10.2.
Updated LEON4 %ASR17 description in section 6.10.3.
Corrected range and recommended values of RTR.AMBADMACTRL.INTNUM register in
table 160.
Corrected range of RTR.ICODEGEN.IN register in table 193.
Update temperature sensor controller documentation in section 29.
Corrected field ranges in SPI controller mode register description in table 421.
Updated package references to CCGA/LGA on front page and in sections 40 and 42.
Update processor status monitoring description in 21.2.4.
Clarify that PROC_ERRORN is connected to processor 0 only, in section 6.2.13
Clarify bootstrap signal effects in section 3.1. Clarify that GPIO[7:6] are still used to disable
EDCL 1. Update clock gate unit conditions in section 25.
Add GRLIB-TN-0013 issue in section 43.2.27.
Clarify that WDOGN and ERRORN are open-drain in tables 28 and 597.
Updated Absolute Maximum Ratings and recommended operating conditions, adding overshoot specifications, in section 39.
1.9 Acronyms
Table 2. Acronyms
Acronym       Comment
AHB           Advanced High-performance bus, part of [AMBA]
AMBA          Advanced Microcontroller Bus Architecture
AMP           See ASMP
APB           Advanced Peripheral Bus, part of [AMBA]
ASMP          Asymmetric Multi-Processing (in the context of this document: different OS instances
              running on own processor cores)
BCH           Bose-Hocquenghem-Chaudhuri, class of error-correcting codes
CAN           Controller Area Network, bus standard
CPU           Central Processing Unit, used to refer to one LEON4 processor core.
DCL           Debug Communication Link. Provides a bridge between an external interface and on-chip
              AHB bus.
DDR           Double Data Rate
DMA           Direct Memory Access
DSU           Debug Support Unit
EDAC          Error Detection and Correction
EDCL          Ethernet Debug Communication Link
FIFO          First-In-First-Out, refers to buffer type
FPU           Floating Point Unit
Gb            Gigabit, 10^9 bits
GB            Gigabyte, 10^9 bytes
GiB           Gibibyte, gigabinary byte, 2^30 bytes, unit defined in IEEE 1541-200
I/O           Input/Output
IP, IPv4      Internet Protocol (version 4)
ISR           Interrupt Service Routine
JTAG          Joint Test Action Group (developer of IEEE Standard 1149.1-1990)
kB            Kilobyte, 10^3 bytes
KiB           Kibibyte, 2^10 bytes, unit defined in IEEE 1541-2002
L2            Level-2, used in L2 cache abbreviation
MAC           Media Access Controller
Mb, Mbit      Megabit, 10^6 bits
MB, Mbyte     Megabyte, 10^6 bytes
MiB           Mebibyte, 2^20 bytes, unit defined in IEEE 1541-2002
OS            Operating System
PCI           Peripheral Component Interconnect
PROM          Programmable Read Only Memory. In this document used to signify boot-PROM.
RAM           Random Access Memory
RMAP          Remote Memory Access Protocol
SEE           Single Event Effects
SEL/SEU/SET   Single Event Latchup/Upset/Transient
SMP           Symmetric Multi-Processing
SPARC         Scalable Processor ARChitecture
TCP           Transmission Control Protocol
UART          Universal Asynchronous Receiver/Transmitter
UDP           User Datagram Protocol
1.10 Definitions
This section and the following subsections define the typographic and naming conventions used
throughout this document.
1.10.1 Bit numbering
The following conventions are used for bit numbering:
• The most significant bit (MSb) of a data type has the leftmost position
• The least significant bit (LSb) of a data type has the rightmost position
• Unless otherwise indicated, the MSb of a data type has the highest bit number and the LSb the
lowest bit number
1.10.2 Radix
The following conventions are used for writing numbers:
• Binary numbers are indicated by the prefix "0b", e.g. 0b1010.
• Hexadecimal numbers are indicated by the prefix "0x", e.g. 0xF00F.
• Unless a radix is explicitly declared, the number should be considered a decimal.
1.10.3 Data types
Byte (BYTE)            8 bits of data
Halfword (HWORD)       16 bits of data
Word (WORD)            32 bits of data
Double word (DWORD)    64 bits of data
Quad word (4WORD)      128 bits of data
1.11 Register descriptions
An example register, showing the register layout used throughout this document, can be seen
in table 3. The values used for the reset value fields are described in table 4, and the values
used for the field type fields are described in table 5. Fields that are named RESERVED,
RES, or R are read-only fields. These fields can be written with zero or with the value read
from the same register field.
Reset values:  <Reset value for EF3>  <Reset value for EF2>  <Reset value for EF1>  <Reset value for EF0>
Field types:   <Field type for EF3>  <Field type for EF2>  <Field type for EF1>  <Field type for EF0>
31:24   Example field 3 (EF3) - <Field description>
23:16   Example field 2 (EF2) - <Field description>
15:8    Example field 1 (EF1) - <Field description>
7:0     Example field 0 (EF0) - <Field description>
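As an illustration of the example register layout above, the following sketch splits a 32-bit register value into the four example fields. Only the bit ranges (31:24, 23:16, 15:8 and 7:0) come from the layout; the names `EXAMPLE_FIELDS`, `extract_field` and `split_example_register` are assumptions made for this example.

```python
# Bit ranges (msb, lsb) of the example fields, taken from the example register layout.
EXAMPLE_FIELDS = {
    "EF3": (31, 24),
    "EF2": (23, 16),
    "EF1": (15, 8),
    "EF0": (7, 0),
}


def extract_field(value, msb, lsb):
    """Return bits msb..lsb (inclusive) of a register value."""
    width = msb - lsb + 1
    return (value >> lsb) & ((1 << width) - 1)


def split_example_register(value):
    """Hypothetical helper: decode all example fields of one register value."""
    return {name: extract_field(value, msb, lsb)
            for name, (msb, lsb) in EXAMPLE_FIELDS.items()}
```

The same `extract_field` helper can be reused for any register in this document by passing the bit range given in the register's field list.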
Table 4. Reset value definitions
Value   Description
0       Reset value 0.
1       Reset value 1. Used for single-bit fields.
0xNN    Hexadecimal representation of reset value. Used for multi-bit fields.
0bNN    Binary representation of reset value. Used for multi-bit fields.
NR      Field not reset
*       Special reset condition, described in textual description of the field. Used for example when reset
        value is taken from a pin.
-       Don’t care / Not applicable
Table 5. Field type definitions
Value   Description
r       Read-only. Writes have no effect.
w       Write-only. Used for a writable field in a register where the field’s read-value has no meaning.
rw      Readable and writable.
rw*     Readable and writable. Special condition for write, described in textual description of field.
wc      Write-clear. Readable, and cleared when written with a 1
cas     Readable, and writable through compare-and-swap. Only applies to SpaceWire Plug-and-Play
        registers.
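The difference between the r, rw and wc field types can be made concrete with a small software model. The sketch below is a toy model only, not a description of the hardware: the class `ModelRegister`, its masks and the `hw_set` method are hypothetical names introduced for illustration, and the w, rw* and cas types are not modelled.

```python
class ModelRegister:
    """Toy software model of a 32-bit register with per-bit field types."""

    def __init__(self, reset=0, rw_mask=0, wc_mask=0):
        self.value = reset & 0xFFFFFFFF
        self.rw_mask = rw_mask  # bits of field type "rw"
        self.wc_mask = wc_mask  # bits of field type "wc"

    def read(self):
        return self.value

    def write(self, data):
        data &= 0xFFFFFFFF
        # "rw" bits take the written value.
        rw_bits = data & self.rw_mask
        # "wc" bits are cleared where a 1 is written, otherwise kept.
        wc_bits = self.value & self.wc_mask & ~data
        # All remaining bits are treated as read-only ("r"): writes have no effect.
        ro_bits = self.value & ~(self.rw_mask | self.wc_mask)
        self.value = (rw_bits | wc_bits | ro_bits) & 0xFFFFFFFF

    def hw_set(self, bits):
        """Model the hardware setting status bits (for example an error flag)."""
        self.value |= bits & 0xFFFFFFFF
```

In this model, software that wants to clear a wc flag without disturbing rw fields in the same register has to write back the current rw contents together with a 1 in the flag position.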
2 Architecture
2.1 Overview
The system is built around five AMBA AHB buses: one 128-bit Processor AHB bus, one 128-bit
Memory AHB bus, two 32-bit I/O AHB buses and one 32-bit Debug AHB bus. The Processor AHB
bus houses four LEON4FT processor cores connected to a shared L2 cache. The Memory AHB bus is
located between the L2 cache and the main external memory interface (SDRAM) and attaches a memory scrubber.
The two separate I/O AHB buses connect peripherals. Slave interfaces of the PCI master/target and
PROM/IO memory controller are placed on one bus (Slave I/O AHB bus). All master/DMA interfaces
are placed on the other bus (Master I/O AHB bus). The Master I/O AHB bus connects to the Processor
AHB bus via an AHB/AHB bridge that provides access restriction and address translation (IOMMU)
functionality. The IOMMU also has an AHB master interface connected to the Memory AHB bus.
The AHB master interface to use when propagating traffic from a peripheral on the Master I/O AHB
bus is dynamically configurable.
Peripheral unit register interfaces such as timers, interrupt controllers, UARTs, general purpose I/O
port, SPI controller, MIL-STD-1553B interface, Ethernet MACs, CAN controllers, and SpaceWire
router AMBA interfaces are connected via two AHB/APB bridges that are attached to the Processor
AHB bus.
The fifth bus, a dedicated 32-bit Debug AHB bus, connects a debug support unit (DSU), one AHB
trace buffer monitoring the Master I/O AHB bus and several debug communication links. The Debug
AHB bus allows for non-intrusive debugging through the DSU and direct access to the complete system, as the Debug AHB bus is not placed behind an AHB bridge with access restriction functionality.
The chapters in this document have been grouped according to the bus topology. The first chapters describe
components connected to the Processor AHB bus, followed by the Memory AHB bus, Master I/O
AHB bus and finally Slave I/O AHB bus, APB buses and Debug AHB bus.
The GR740 has the following on-chip functions:
• 4x LEON4 SPARC V8 processor cores with MMU and GRFPU floating-point unit
• Level-2 cache, 4-ways, BCH protection, supports locking of 1-4 ways
• Debug Support Unit (DSU) with instruction (512 lines) and AHB trace (256 lines) buffers
• Ethernet, JTAG and SpaceWire debug communication links
• 96-bit PC100 SDRAM memory controller with Reed-Solomon EDAC
• Hardware memory scrubber
• 8/16-bit PROM/IO controller with BCH EDAC
• I/O Memory Management Unit (IOMMU) with support for eight groups of DMA units
• 8-port SpaceWire router/switch with four on-chip AMBA ports with RMAP
• SpaceWire TDP controller
• 2x 10/100/1000 Mbit Ethernet MAC
• 32-bit 33 MHz PCI master/target interface with DMA engine
• MIL-STD-1553B interface controller
• 2x CAN 2.0B controllers
• 2x UART
• SPI master/slave controller
• Interrupt controller with extended support for asymmetric multiprocessing
• 1x Timer unit with five timers, time latch/set functionality and watchdog functionality
• 4x Timer unit with four timers and time latch/set functionality
• Separate AHB and PCI trace buffers
• Temperature sensor
• Clock gating unit
• LEON4 statistics unit (performance counters)
• Pad and PLL control unit
• AHB status registers
2.2 Cores
The design is based on the following IP cores from the GRLIB IP Library:
Core          Function                                   Section   Vendor   Device
GRETH_GBIT    10/100/1000 Ethernet MAC with DCL          14        0x01     0x01D
GRGPIO        General Purpose I/O Port                   22        0x01     0x01A
GRGPRBANK     General Purpose Register Bank              30        0x01     0x08F
GRGPREG       General Purpose Register                   28        0x01     0x087
GRIOMMU       AHB/AHB bridge with protection (IOMMU)     12        0x01     0x04F
GRPCI2        Fast 32-bit PCI bridge                     15        0x01     0x07C
GRSPW2        SpaceWire codec with RMAP                  35        0x01     0x029
GRSPWROUTER   SpaceWire router switch                    13        0x01     0x08B
GRSPWTDP      SpaceWire - Time Distribution Protocol     31        0x01     0x097
FTMCTRL       8/16/32-bit memory controller with EDAC    19        0x01     0x054
L2CACHE       Level 2 cache                              9         0x01     0x04B
L4STAT        LEON4 statistical unit                     26        0x01     0x047
LEON4         LEON4 SPARC V8 32-bit processor            6         0x01     0x048
MEMSCRUB      Memory scrubber                            11        0x01     0x057
SPICTRL       SPI controller                             24        0x01     0x02D
GR740THSENS   GR740 Temperature sensor controller        29        0x01     0x099
The information in the last two columns is available via plug’n’play information in the system and is
used by software to detect units and to initialize software drivers.
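A minimal sketch of how software might use the vendor and device identifiers to recognise a unit is shown below. The identifier values are copied from the core list above, which may not be complete in this excerpt; the dictionary `GR740_CORES` and the function `identify_core` are hypothetical names, and reading the identifiers out of the plug'n'play area is not shown.

```python
# (vendor ID, device ID) -> core name, copied from the core list above.
GR740_CORES = {
    (0x01, 0x01D): "GRETH_GBIT",
    (0x01, 0x01A): "GRGPIO",
    (0x01, 0x08F): "GRGPRBANK",
    (0x01, 0x087): "GRGPREG",
    (0x01, 0x04F): "GRIOMMU",
    (0x01, 0x07C): "GRPCI2",
    (0x01, 0x029): "GRSPW2",
    (0x01, 0x08B): "GRSPWROUTER",
    (0x01, 0x097): "GRSPWTDP",
    (0x01, 0x054): "FTMCTRL",
    (0x01, 0x04B): "L2CACHE",
    (0x01, 0x047): "L4STAT",
    (0x01, 0x048): "LEON4",
    (0x01, 0x057): "MEMSCRUB",
    (0x01, 0x02D): "SPICTRL",
    (0x01, 0x099): "GR740THSENS",
}


def identify_core(vendor, device):
    """Return the core name for a vendor/device pair, or None if it is not listed."""
    return GR740_CORES.get((vendor, device))
```

A driver framework would typically use such a lookup to decide which driver to bind to each detected unit.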
2.3 Memory map
The memory map of the internal AHB and APB buses as seen from the processor cores can be seen
below. Software does not need to be aware that a bridge is positioned between the processor and a
peripheral since the address mapping between buses is one-to-one.
0xFFEFF000 - 0xFFEFFFFF    Memory bus plug&play area       Memory
0xFFF00000 - 0xFFFFEFFF    Unused                          Processor
0xFFFFF000 - 0xFFFFFFFF    Processor bus plug&play area    Processor
When connecting to the system via one of the debug communication links (JTAG, Ethernet, USB, or
SpaceWire) connected to the Debug AHB bus, several debug support peripherals will be visible.
Table 8 below lists the address map of these peripherals. Note that peripherals in the address range
0xE0000000 - 0xEFFFFFFF are not accessible from the processors or from any peripherals on the
Master I/O AHB bus. Accesses to this range from any peripheral not located on the Debug AHB bus
will result in an AMBA ERROR response (see also the AMBA ERROR propagation description in
section 5.10). Apart from the area 0xE0000000 - 0xEFFFFFFF, the AMBA memory space seen via
the debug communication links is identical to the address space seen from other masters in the system.
Accesses to unused AMBA AHB address space will result in an AMBA ERROR response. This
applies to the memory areas that are marked as "Unused" in the table above. Accesses to unused areas
located on one of the AHB/APB bridges will not have any effect. Note that these unoccupied address
ranges are not marked as "Unused" in the table above. No AMBA ERROR response will be given for
memory allocated to one of the APB bridges. See also the AMBA ERROR propagation description in
section 5.10.
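The visibility rule for the range 0xE0000000 - 0xEFFFFFFF can be expressed as a short check. This is a sketch of that single rule only: it does not model the "Unused" areas or the APB bridge behaviour described above, and the function names and the `master_on_debug_bus` flag are assumptions made for illustration.

```python
# Address range that is only reachable from masters on the Debug AHB bus.
DEBUG_ONLY_START = 0xE0000000
DEBUG_ONLY_END = 0xEFFFFFFF


def is_debug_only(address):
    """True if the address lies in the range visible only from the Debug AHB bus."""
    return DEBUG_ONLY_START <= address <= DEBUG_ONLY_END


def debug_range_access_gives_error(address, master_on_debug_bus):
    """True if the access ends in an AMBA ERROR response because a master
    outside the Debug AHB bus addresses the debug-only range."""
    return is_debug_only(address) and not master_on_debug_bus
```

For example, a processor access to 0xE0000000 returns an error in this model, while the same access issued through a debug communication link does not.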
Table 8. AMBA address range 0xE0000000 - 0xEFFFFFFF on Debug AHB bus
Peripheral    Address range              Comment
DSU4          0xE0000000 - 0xE07FFFFF    Debug Support Unit area for processor 0
              0xE1000000 - 0xE17FFFFF    Debug Support Unit area for processor 1
              0xE2000000 - 0xE27FFFFF    Debug Support Unit area for processor 2
              0xE3000000 - 0xE37FFFFF    Debug Support Unit area for processor 3
APBBRIDGED    0xE4000400 - 0xE40FFFFF    APB bridge on Debug AHB bus
GRSPW2        0xE4000000 - 0xE40000FF    SpaceWire RMAP target with AMBA interface
L4STAT        0xE4000200 - 0xE40003FF    LEON4 Statistics unit, secondary port
APBBRIDGED    0xE40FFF00 - 0xE40FFFFF    Debug APB bus plug&play area
              0xE4100000 - 0xEEFFFFFF    Unused
AHBTRACE      0xEFF00000 - 0xEFF1FFFF    AHB trace buffer, tracing master I/O AHB bus
              0xEFF20000 - 0xEFFFEFFF    Unused
              0xEFFFF000 - 0xEFFFFFFF    Debug AHB bus plug&play area
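The four DSU areas in table 8 follow a regular pattern, which the sketch below captures. It assumes that the four listed ranges map to processors 0 to 3 in the order given; the constant and function names are hypothetical and not part of any GRLIB software interface.

```python
DSU_BASE = 0xE0000000       # start of the DSU area for processor 0
DSU_STRIDE = 0x01000000     # distance between consecutive DSU areas
DSU_AREA_SIZE = 0x00800000  # size of each DSU area
NUM_PROCESSORS = 4


def dsu_area(cpu):
    """Return (first, last) address of the DSU area for the given processor."""
    if not 0 <= cpu < NUM_PROCESSORS:
        raise ValueError("processor index must be 0..3")
    start = DSU_BASE + cpu * DSU_STRIDE
    return start, start + DSU_AREA_SIZE - 1


def dsu_cpu_for_address(address):
    """Return the processor whose DSU area contains the address, or None."""
    for cpu in range(NUM_PROCESSORS):
        start, end = dsu_area(cpu)
        if start <= address <= end:
            return cpu
    return None
```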
2.4 Interrupts
The table below indicates the interrupt assignments. Note that the table describes interrupt bus
lines; these can be remapped in the interrupt controller.
Table 9. Interrupt assignments
Interrupt   Peripheral             Comment
1           GPTIMER0               GPTIMER unit 0, timer 1
2           GPTIMER0               GPTIMER unit 0, timer 2
3           GPTIMER0               GPTIMER unit 0, timer 3
4           GPTIMER0               GPTIMER unit 0, timer 4
5           GPTIMER0               GPTIMER unit 0, timer 5
6           GPTIMER1               Shared interrupt for all timers on GPTIMER unit 1
7           GPTIMER2               Shared interrupt for all timers on GPTIMER unit 2
8           GPTIMER3               Shared interrupt for all timers on GPTIMER unit 3
9           GPTIMER4               Shared interrupt for all timers on GPTIMER unit 4
10          IRQ(A)MP               Extended interrupt line.
11          GRPCI/PCIDMA           PCI master/target and PCI DMA
12          Unassigned             Suitable for use by software for inter-processor and
                                   inter-process synchronization.
13          Unassigned
14          Unassigned
15          Unassigned             Note: Not maskable by processor
16          GRGPIO0 /1 / CAN       The GPIO port has configuration registers that determine
                                   the mapping between general purpose I/O lines
                                   and the four interrupt lines allocated to the GPIO port.
                                   Interrupt lines 16 -18 are shared between the GPIO port
                                   and CAN controllers.
                                   Interrupt line 19 is shared between the GPIO port and
                                   the SPI controller.
27          AHBSTAT/ST65THSENS     Shared by all AHB Status registers in design and by
                                   temperature sensor.
28          MEMSCRUB/L2CACHE       Memory scrubber and L2 cache
29          APBUART0               UART 0
30          APBUART1               UART 1
31          GRIOMMU / GRSPWTDP /   IOMMU register interface interrupt.
            SPWROUTER              CCSDS TDP controller interrupt
                                   SpaceWire router AMBA configuration port interrupt
                                   (only applies to silicon revision 1)
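The timer rows of table 9 follow a simple rule: each timer of GPTIMER unit 0 has its own interrupt line, while units 1 to 4 each share one line. The sketch below encodes that rule for the default assignment, before any remapping in the interrupt controller; the function name `gptimer_interrupt` is an assumption made for this example.

```python
def gptimer_interrupt(unit, timer):
    """Return the default interrupt bus line for a GPTIMER unit and timer.

    Timers are numbered from 1, as in the interrupt assignment table.
    Unit 0 has five timers with separate lines 1..5; units 1..4 have four
    timers each and one shared line per unit (6..9).
    """
    if unit == 0:
        if not 1 <= timer <= 5:
            raise ValueError("GPTIMER unit 0 has timers 1..5")
        return timer
    if 1 <= unit <= 4:
        if not 1 <= timer <= 4:
            raise ValueError("GPTIMER units 1..4 have timers 1..4")
        return 5 + unit
    raise ValueError("GPTIMER unit must be 0..4")
```

Because units 1 to 4 share one line per unit, an interrupt handler for those units has to inspect the timer registers to find out which timer raised the interrupt.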
2.5 Plug & play and bus index information
The format of GRLIB AMBA Plug&play information is given in sections 37 and 38. The address
ranges of the plug&play configuration areas are given in the preceding section and are also replicated
for each unit in the tables below. The plug&play areas are used by software to detect the system-on-chip
architecture. The values in the tables below are fixed. The tables also include the bus indexes for
all masters and slaves on the system’s AHB and APB buses.
The plug & play memory map and bus indexes for AMBA AHB masters on the Processor AHB bus
are shown in table 10.
Table 10. Plug & play information for masters on Processor AHB bus
The bus index for the AMBA AHB slave on the Master I/O AHB bus is shown in table 19.
Table 19. Bus index information for slaves on Master I/O AHB bus
SlaveIndexFunctionAddress range
GRIOMMU0IOMMU slave interfaceNot applicable
The plug & play memory map and bus indexes for AMBA APB slaves connected via the AHB/APB
bridges on the Slave I/O AHB bus are shown in tables 20 and 21.
Table 20. Plug & play information for APB slaves connected via the first APB bridge on Slave I/O AHB bus
3 Signals
3.1 Bootstrap signals
The power-up and initialisation state is affected by several external signals as shown in table 23. The
bootstrap signals taken via GPIO are saved when the on-chip system reset is released. This occurs
after deassertion of the SYS_RESETN input and lock of all active PLLs (see also reset description in
section 4). This means that if a peripheral, such as the Ethernet controller, is clock gated off and then
reset and enabled at a later time, the bootstrap signal value will be taken from the saved value present
in a general purpose register described in section 28. See also section 4.9 for further information on
the conditions for clock gating per peripheral.
Table 23. Bootstrap signals
Bootstrap signalDescription
DSU_ENEnables the Debug Support Unit (DSU) and other members connected to the Debug AHB bus. If
BREAKPuts all processors in debug mode when asserted while DSU_EN is HIGH. When DSU_EN is
PCIMODE_ENABLEEnables PCI mode. If the bootstrap signal MEM_IFWIDTH is HIGH then PCIMODE_EN-
MEM_IFWIDTHSelects the width of SDRAM interface. If this signal is LOW then the external memory interface
MEM_CLKSELThe value of this signal determines the clock source for the SDRAM memory. If this signal is
GPIO[5:0]Sets the least significant address nibble of the IP and MAC address for Ethernet Debug Commu-
DSU_EN is HIGH the DSU and the Debug AHB bus will be clocked. If DSU_EN is LOW the
DSU and all members on the Debug AHB bus will be clock gated off.
A special case exists for the Ethernet controllers. These controller have master interfaces connected to the Debug AHB bus and debug traffic can optionally be routed to this bus. If DSU_EN
is LOW then the Ethernet Debug Communications Link (EDCL) functionality will be disabled
and the Ethernet controllers will be clock gated off after reset. If DSU_EN is HIGH then the
Ethernet controller clocks will be enabled. With DSU_EN HIGH, the EDCL functionality will
be further configured by GPIO[7:0] as described further down in this table.
LOW, BREAK is assigned to the timer enable bit of the watchdog timer and also controls if the
first processor starts executing after reset.
ABLE selects if the top-half of the SDRAM interface should be used for the PCI controller
(HIGH) or Ethernet port 1 (LOW).
uses 64 data bits with up to 32 check bits. If this signal is HIGH then the external memory interface uses 32 data bits with up to 16 check bits and the top half of the SDRAM interface is used
for PCI or Ethernet port 1, as determined by the PCIMODE_ENABLE bootstrap signal.
low then the memory clock and the system clock has the same source, otherwise the source for
the memory clock is the MEM_EXTCLOCK clock input.
nication Link (EDCL) 0 and 1. GPIO [1:0] is also connected to the SpaceWire TDP controller:
For the Ethernet controllers:
GPIO[1:0] sets the least significant bits of the nibble for EDCL 0 and EDCL1
GPIO[3:2] sets the top nibble bits for EDCL 0 and GPIO[5:4] set the top nibble bits for EDCL1.
It is possible to disable the EDCLs at reset with bootstrap signals. As mentioned, when DSU_EN
is LOW then the EDCLs will be disabled. EDCL 0 is also disabled if GPIO[3:0] is set to 0b1111
when Ethernet controller 0 leaves reset. EDCL 1 is disabled when GPIO[7:4] is set to 0b1111
when Ethernet controller 1 leaves reset. Note that this means that the disable condition for EDCL
1 makes use of the bootstrap signals GPIO[7:6] that are used to configure SpaceWire router distributed interrupts.
The connections to the SpaceWire TDP controller are as follows:
GPIO[0] is connected to the set elapsed time input, see section 31.3.11.
GPIO[1] is connected to the increment elapsed time input, see section 31.3.3.
Note: The TDP connections are only available in silicon revision 1.
GR740-UM-DS, Nov 2017, Version 1.727www.cobham.com/gaisler
"00" - Interrupts with acknowledgment mode (32 interrupts with acknowledgments);
"01" - Extended interrupt mode (64 interrupts, no acknowledgments);
"10" - Distributed interrupts disabled, all Dist. Interrupt codes treated as Time-Codes;
"11" - Dist. interrupt disabled, Control code treated as Time-Code if CTRL flags are zero.
GPIO[9:8]  Selects if Ethernet Debug Communication Link 0 (GPIO[8]) and Link 1 (GPIO[9]) traffic should be routed over the Debug AHB bus (HIGH) or the Master I/O AHB bus (LOW).
GPIO[10]  Selects the PROM width. 0: 8-bit PROM, 1: 16-bit PROM
GPIO[11]  Controls the clock gate settings for the SpaceWire router.
GPIO[13:12]  Sets the two least significant bits of the SpaceWire router’s instance ID.
GPIO[14]  Controls reset value of PROM/IO controller’s PROM EDAC enable (PE) bit. When this input is ’1’ at reset, EDAC checking of the PROM area will be enabled.
GPIO[15]  Selects if the PROM/IO interface should be enabled after reset. If this signal is LOW then the PROM/IO interface is enabled. Otherwise the PROM/IO interface pins are routed to their alternative functions.
PLL_BYPASS[2:0]  Bypass PLL and use clock input directly. 2: SpW clock, 1: SDRAM clock, 0: System clock PLL bypass.
PLL_IGNLOCK  The PLL outputs of the device are gated until the PLL lock outputs have been asserted. Setting this signal HIGH disables this clock gating for all PLLs, and also removes the lock signals from the reset generation.
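As an illustration of the GPIO bootstrap assignments listed above, the sketch below decodes GPIO[15:6] from a 16-bit value sampled at reset. The function and key names are hypothetical. The reading of GPIO[11] (HIGH gates the SpaceWire router off) is taken from section 5.3.

```python
DIST_IRQ_MODES = {
    0b00: "interrupts with acknowledgment (32 interrupts)",
    0b01: "extended interrupt mode (64 interrupts, no acknowledgments)",
    0b10: "distributed interrupts disabled, all codes treated as Time-Codes",
    0b11: "distributed interrupts disabled, Time-Code if CTRL flags are zero",
}


def decode_bootstrap(gpio: int) -> dict:
    """Decode the GPIO[15:6] bootstrap bits of a 16-bit sampled value."""
    def bit(n: int) -> int:
        return (gpio >> n) & 1

    return {
        "dist_irq_mode": DIST_IRQ_MODES[(gpio >> 6) & 0b11],      # GPIO[7:6]
        "edcl0_bus": "Debug AHB" if bit(8) else "Master I/O AHB",  # GPIO[8]
        "edcl1_bus": "Debug AHB" if bit(9) else "Master I/O AHB",  # GPIO[9]
        "prom_width_bits": 16 if bit(10) else 8,                   # GPIO[10]
        "spw_router_gated": bool(bit(11)),      # GPIO[11], see section 5.3
        "spw_instance_id_lsbs": (gpio >> 12) & 0b11,               # GPIO[13:12]
        "prom_edac_enabled": bool(bit(14)),                        # GPIO[14]
        "promio_enabled": not bit(15),          # GPIO[15] LOW enables PROM/IO
    }
```

For example, `decode_bootstrap(0x4540)` describes a board with a 16-bit PROM, PROM EDAC enabled, EDCL 0 on the Debug AHB bus and the extended interrupt mode selected.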
3.2 Configuration for flight
To achieve the intended radiation tolerance in flight, certain bootstrap signals must be held at a fixed
configuration:
• DSU_EN must be held low (disabling debug interfaces)
• JTAG_TRST must be held low (disabling the JTAG TAP)
3.3 Pin multiplexing
The device shares pins between the following groups of interfaces:
• Part of the PROM/IO interface shares pins with UART 0, UART 1, CAN 0, CAN 1, SpaceWire
debug and MIL-STD-1553B. The pins can also be controlled as general-purpose I/O.
• The top half of the SDRAM interface shares pins with PCI and Ethernet port 1.
The sections below describe multiplexing for the affected interfaces. Section 30 describes the peripheral through which software controls the multiplexing.
3.3.1 PROM/IO interface multiplexing
The selection between the PROM/IO interface and the other low-speed interfaces on the same pins is
done at boot time via the bootstrap signal GPIO[15]. When GPIO[15] is LOW during reset, the
full PROM/IO interface will be available. When GPIO[15] is HIGH during reset, the alternative function is routed to the shared pins.
The multiplexing has been designed so that even if starting with all the multiplexed pins set to their
alternative (peripheral) mode, enough dedicated PROM/IO pins are still available to access an 8-bit,
64 KiB boot PROM for bootstrapping the system. Note that it is the top part of the data bus (PROMIO_DATA[15:8]) that is used for the PROM in 8-bit mode.
After reset, the setting can be reconfigured on a pin-by-pin basis by software using a register interface
(see the General Purpose Register Bank section). The register interface can also reconfigure the multiplexed I/Os to function as general-purpose I/Os.
If only a subset of the alternative functions are desired and a larger PROM or IO interface is desired,
then GPIO[15] should be kept LOW during reset and software can then during boot assign a subset of
the signals to alternative functions. In this case, the effect of address lines tied to peripherals on the
board toggling during the first PROM accesses before they have been re-configured to their correct
function will need to be considered at the system design level.
A few inputs belonging to the SpaceWire debug and UART CTS signals are shared with GPIO bus
pins without any explicit multiplexing; these inputs are simply connected to both functions at the
same time. Note that the UART CTS signals are ignored by default and will therefore not affect
UART operation unless flow control is enabled in the UART’s control register.
Table 24. Multiplexed PROM/IO interface pins with alternative functions and control register bit position
Pin name*   Primary function   Alternative function   GPIO2 function   Register
Table 25. Shared GPIO interface pins with slow interfaces
            Primary function       Second function
Pin name*   Signal          Dir    Signal        Dir
GPIO[7]     (as pin name)   IO     SPWD_RXD      I
GPIO[6]     (as pin name)   IO     SPWD_RXS      I
GPIO[5]     (as pin name)   IO     UART0_CTSN    I
GPIO[4]     (as pin name)   IO     UART1_CTSN    I
* See section 40.3 for pin assignments
3.3.2 SDRAM interface multiplexing
The top half of the SDRAM interface shares pins with PCI and Ethernet port 1. The selection between
full SDRAM, PCI and Ethernet is made with the bootstrap signals MEM_IFWIDTH and PCIMODE_ENABLE.
This configuration is static and should be kept constant during the runtime
of the device (a change will require a full reset of the device). Some of the data mask (DQM)
bits are used as clock inputs in the alternative modes, and their direction will therefore depend on configuration.
Table 26. Selection between SDRAM, PCI and Ethernet 1
MEM_IFWIDTH   PCIMODE_ENABLE   SDRAM interface               Ethernet port 1   PCI
0             0                64 data bits, 32 check bits   Unavailable       Unavailable
0             1                64 data bits, 32 check bits   Unavailable       Unavailable
1             0                32 data bits, 16 check bits   Available         Unavailable
1             1                32 data bits, 16 check bits   Unavailable       Available
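The selection in Table 26 can be expressed as a lookup. The sketch below is illustrative only: the function and key names are hypothetical, and it assumes the table is read as four rows, with MEM_IFWIDTH=0 giving the full-width SDRAM interface regardless of PCIMODE_ENABLE.

```python
def sdram_pin_mode(mem_ifwidth: int, pcimode_enable: int) -> dict:
    """Return which interfaces are available for given bootstrap levels.

    mem_ifwidth, pcimode_enable -- 0 (LOW) or 1 (HIGH), sampled at reset
    """
    if mem_ifwidth == 0:
        # Full-width SDRAM uses all shared pins; Ethernet 1 and PCI are off.
        return {"sdram_data_bits": 64, "sdram_check_bits": 32,
                "ethernet1": False, "pci": False}
    # Half-width SDRAM frees the top half of the interface for one of the two.
    return {"sdram_data_bits": 32, "sdram_check_bits": 16,
            "ethernet1": pcimode_enable == 0, "pci": pcimode_enable == 1}
```

Since the configuration is static, such a check belongs in board-level design review or early boot diagnostics rather than in runtime code.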
Table 27. Multiplexed SDRAM interface pins with PCI or Ethernet interfaces
             SDRAM function       ETHERNET1 function    PCI function
             (MEM_IFWIDTH=LOW)    (MEM_IFWIDTH=HIGH,    (MEM_IFWIDTH=HIGH,
                                  PCIMODE_ENABLE=LOW)   PCIMODE_ENABLE=HIGH)
Pin name*    Signal         Dir   Signal        Dir     Signal       Dir
MEM_DQ[95]   (as pin name)  IO    ETH1_TXD[7]   O       PCI_AD[31]   IO
MEM_DQ[94]   (as pin name)  IO    ETH1_TXD[6]   O       PCI_AD[30]   IO
MEM_DQ[93]   (as pin name)  IO    ETH1_TXD[5]   O       PCI_AD[29]   IO
MEM_DQ[92]   (as pin name)  IO    ETH1_TXD[4]   O       PCI_AD[28]   IO
MEM_DQ[91]   (as pin name)  IO    ETH1_TXD[3]   O       PCI_AD[27]   IO
MEM_DQ[90]   (as pin name)  IO    ETH1_TXD[2]   O       PCI_AD[26]   IO
MEM_DQ[89]   (as pin name)  IO    ETH1_TXD[1]   O       PCI_AD[25]   IO
MEM_DQ[88]   (as pin name)  IO    ETH1_TXD[0]   O       PCI_AD[24]   IO
MEM_DQ[87]   (as pin name)  IO    ETH1_TXEN     O       PCI_AD[23]   IO
MEM_DQ[86]   (as pin name)  IO    ETH1_TXER     O       PCI_AD[22]   IO
MEM_DQ[85]   (as pin name)  IO    (none)        I       PCI_AD[21]   IO
MEM_DQ[84]   (as pin name)  IO    (none)        I       PCI_AD[20]   IO
MEM_DQ[83]   (as pin name)  IO    (none)        I       PCI_AD[19]   IO
MEM_DQ[82]   (as pin name)  IO    (none)        I       PCI_AD[18]   IO
MEM_DQ[81]   (as pin name)  IO    (none)        I       PCI_AD[17]   IO
MEM_DQ[80]   (as pin name)  IO    (none)        I       PCI_AD[16]   IO
MEM_DQ[63]   (as pin name)  IO    ETH1_RXD[7]   I       PCI_AD[15]   IO
MEM_DQ[62]   (as pin name)  IO    ETH1_RXD[6]   I       PCI_AD[14]   IO
MEM_DQ[61]   (as pin name)  IO    ETH1_RXD[5]   I       PCI_AD[13]   IO
MEM_DQ[60]   (as pin name)  IO    ETH1_RXD[4]   I       PCI_AD[12]   IO
MEM_DQ[59]   (as pin name)  IO    ETH1_RXD[3]   I       PCI_AD[11]   IO
MEM_DQ[58]   (as pin name)  IO    ETH1_RXD[2]   I       PCI_AD[10]   IO
MEM_DQ[57]   (as pin name)  IO    ETH1_RXD[1]   I       PCI_AD[9]    IO
MEM_DQ[56]   (as pin name)  IO    ETH1_RXD[0]   I       PCI_AD[8]    IO
MEM_DQ[55]   (as pin name)  IO    ETH1_RXDV     I       PCI_AD[7]    IO
MEM_DQ[54]   (as pin name)  IO    ETH1_RXER     I       PCI_AD[6]    IO
MEM_DQ[53]   (as pin name)  IO    ETH1_COL      I       PCI_AD[5]    IO
MEM_DQ[52]   (as pin name)  IO    ETH1_CRS      I       PCI_AD[4]    IO
MEM_DQ[51]   (as pin name)  IO    ETH1_MDINT    I       PCI_AD[3]    IO
MEM_DQ[50]   (as pin name)  IO    (none)        I       PCI_AD[2]    IO
MEM_DQ[49]   (as pin name)  IO    (none)        I       PCI_AD[1]    IO
MEM_DQ[48]   (as pin name)  IO    (none)        I       PCI_AD[0]    IO
MEM_DQ[47]   (as pin name)  IO    (none)        I       PCI_CBE[3]   IO
MEM_DQ[46]   (as pin name)  IO    (none)        I       PCI_CBE[2]   IO
MEM_DQ[45]   (as pin name)  IO    (none)        I       PCI_CBE[1]   IO
MEM_DQ[44]   (as pin name)  IO    (none)        I       PCI_CBE[0]   IO
MEM_DQ[43]   (as pin name)  IO    (none)        I       PCI_FRAME    IO
MEM_DQ[42]   (as pin name)  IO    (none)        I       PCI_REQ      O
MEM_DQ[41]   (as pin name)  IO    (none)        I       PCI_GNT      I
MEM_DQ[40]   (as pin name)  IO    (none)        I       PCI_IRDY     IO
MEM_DQ[39]   (as pin name)  IO    (none)        I       PCI_TRDY     IO
MEM_DQ[38]   (as pin name)  IO    (none)        I       PCI_PAR      IO
MEM_DQ[37]   (as pin name)  IO    (none)        I       PCI_PERR     IO
MEM_DQ[36]   (as pin name)  IO    (none)        I       PCI_SERR     IO
MEM_DQ[35]   (as pin name)  IO    (none)        I       PCI_DEVSEL   IO
MEM_DQ[34]   (as pin name)  IO    (none)        I       PCI_STOP     IO
MEM_DQ[33]   (as pin name)  IO    (none)        I       PCI_INTA     IO
MEM_DQ[32]   (as pin name)  IO    (none)        I       PCI_INTB     I
MEM_DQM[11]  (as pin name)  O     ETH1_GTXCLK   I       PCI_M66EN    I
MEM_DQM[10]  (as pin name)  O     ETH1_TXCLK    I       PCI_HOSTN    I
MEM_DQM[7]   (as pin name)  O     ETH1_RXCLK    I       PCI_IDSEL    I
MEM_DQM[6]   (as pin name)  O     (none)        I       PCI_CLK      I
MEM_DQM[5]   (as pin name)  O     (none)        I       PCI_INTC     I
MEM_DQM[4]   (as pin name)  O     (none)        I       PCI_INTD     I
* See section 40.3 for pin assignments
3.4 Complete signal list
The listing below shows all interface signals, sorted by interface. Some of these signals are located on
shared pins as indicated in the table, therefore some physical pins will map to more than one entry in
this table, with the pin name taken from the primary function of that pin. Section 3.3 and the device
pin assignments in section 40.3 detail the pin sharing.
Table 28. All external signals, before pin sharing
CAN_RXD[1:0]     CAN controller, receive data (shares pin with PROM/IO interface)
CAN_TXD[1:0]     CAN controller, transmit data (shares pin with PROM/IO interface)
TESTEN           Test enable signal. This signal puts the device in test mode. Connect to ground.
PLL_BYPASS[2:0]  Bypass PLL. See description of bootstrap signals.
PLL_IGNLOCK      Ignore PLL lock. See description of bootstrap signals.
PLL_LOCKED[5:0]  PLL coarse/fine lock. See description in clocking section
See 3.3.1   In    High
See 3.3.1   In    High
See 3.3.1   Out   High
See 3.3.1   In    High
See 3.3.1   In    High
See 3.3.1   In    High
See 3.3.1   Out   High
See 3.3.1   Out   High
See 3.3.1   In    -
See 3.3.1   Out   -
No          In    High
No          In    High
No          In    High
No          Out   High
3.5 Pin driver configuration
The drive strength of the single-ended outputs in the device is software programmable through the
general-purpose register bank (see section 30).
LVDS drivers that are not used in the application can be turned off to save power. This is controlled
via the register bank interface. Note that there is no automatic turning off of the LVDS drivers of disabled or inactive SpaceWire links in this device, so this must be managed by the application software.
Applications not using the SpaceWire router at all are recommended to disable all SpaceWire LVDS
drivers during boot.
4 Clocking and reset
4.1 Clock inputs
The table below specifies the clock inputs to the device.
Table 29. Clock inputs
Clock input    Description                                                        Recommended frequency
SYS_CLK        System clock input. A clock based on this clock input via PLL      50 MHz
               (unless PLL is bypassed) is used to clock the processors, on-chip
               buses and on-chip peripherals.
MEM_EXTCLK     Alternative memory interface clock. Clock that either directly,    50 MHz
               or through a PLL, provides an alternative clock for the SDRAM
               memory interface. See description in table 23, section 3.1.
SPW_CLK        SpaceWire clock. Clock that either directly, or through a PLL      50 MHz
               (recommended operating mode), provides a clock for the SpaceWire
               interfaces. See also sections 13.3.1.2 and 13.3.2.
JTAG_TCK       JTAG clock                                                         10 MHz
ETH0_GTXCLK    Ethernet Gigabit MAC 0 clock                                       125 MHz
ETH0_TXCLK     Ethernet MAC 0 transmit clock                                      25 MHz
ETH0_RXCLK     Ethernet MAC 0 receive clock                                       25 MHz (MII), 125 MHz (GMII)
ETH1_GTXCLK    Ethernet Gigabit MAC 1 clock                                       125 MHz
ETH1_TXCLK     Ethernet MAC 1 transmit clock                                      25 MHz
ETH1_RXCLK     Ethernet MAC 1 receive clock                                       25 MHz (MII), 125 MHz (GMII)
PCI_CLK        PCI interface clock                                                66 or 33 MHz (TBD)
GR1553_CLK     MIL-STD-1553B interface clock                                      20 MHz
The design makes use of clock multipliers to create the system clock, memory interface clock, and the
SpaceWire transmitter clock.
4.2 Clock loop for SDRAM
Due to drive strength limitations, the device may not be suitable to feed the clock directly to
SDRAMs at higher speeds. The device therefore implements a clock looping scheme for the SDRAM
clock: the generated SDRAM clock goes out on either the single-ended MEM_CLK_OUT or
the differential MEM_CLK_OUT_DIFF output, and should then be split on the PCB and fed both to the
SDRAM and back to the device’s MEM_CLK_IN input. In the device, the MEM_CLK_IN input clocks
both the SDRAM interfacing registers and the SDRAM controller. See figure 1.
Both the differential and single-ended clock outputs are on by default after reset. During boot,
software can disable the unused output in order to avoid unnecessary switching activity.
While what is described above is the intended usage, technically there is no requirement that the clock
fed to the MEM_CLK_IN input is related in frequency or phase to the clock going out the loop or any
other clock in the system. Other ways of generating the SDRAM clocks, such as external PLLs, are
also possible.
Note: The external feedback loop is always required, regardless of which clock source is selected.
The memory controller SDRAM domain is never clocked internally, only through MEM_CLK_IN.
4.3 Reset scheme
The device has an on-chip reset generator that creates a reset signal that is fed to the rest of the system.
This is asynchronously asserted when the external SYS_RESETN input is asserted and synchronously
deasserted a few cycles after the SYS_RESETN input has been deasserted.
The reset generation also considers the locking status of the PLLs, and will not deassert reset until the
PLLs have achieved lock. In the event PLL lock is lost, the system will again go into reset. Only the
lock signals of PLLs that are in use (that is, neither bypassed nor deselected by MEM_CLKSEL) are considered. If
external PLLs are also used on the board, a separate input SYS_EXTLOCK is available so that
the lock status of these PLLs can also be included in the reset generation.
Where this default behavior is unwanted, the PLL_IGNLOCK bootstrap signal, when tied HIGH, will
cause the lock statuses of the internal PLLs to be ignored (treated as always in lock) in the reset generation. The SYS_EXTLOCK signal is never ignored. Since all the lock signals are available on package pins, custom lock handling can be implemented on board level.
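The lock handling described above can be summarised as combinational logic. This is a behavioural sketch under stated assumptions, not a description of the actual reset generator: it ignores the synchronous release delay, the function name is hypothetical, and SYS_EXTLOCK is assumed to be HIGH when the external PLLs are locked.

```python
def system_reset_asserted(sys_resetn: bool,
                          pll_locks_in_use: list,
                          pll_ignlock: bool,
                          sys_extlock: bool) -> bool:
    """Return True while the internal system reset would be held asserted.

    sys_resetn       -- level of SYS_RESETN (False = reset asserted)
    pll_locks_in_use -- lock status of each PLL that is neither bypassed
                        nor deselected by MEM_CLKSEL
    pll_ignlock      -- level of the PLL_IGNLOCK bootstrap signal
    sys_extlock      -- assumed HIGH when external PLLs are locked
    """
    if not sys_resetn:
        return True
    # PLL_IGNLOCK makes the internal PLLs count as always locked.
    internal_locked = pll_ignlock or all(pll_locks_in_use)
    # SYS_EXTLOCK is never ignored.
    return not (internal_locked and sys_extlock)
```

Note that an empty `pll_locks_in_use` list (all PLLs bypassed) evaluates as locked, which matches the rule that only PLLs in use are considered.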
The bootstrap signal sampling, the general purpose register bank, and the PLL reconfiguration module have separate reset generation that is only reset when the master reset signal SYS_RESETN is asserted and will
not be affected by PLL lock status.
The JTAG_TRST input asynchronously resets the JTAG TAP in the device. This can be asserted at
any time while the device is running without affecting device function provided that a JTAG debug
access into the system is not currently in progress. The JTAG_TRST input must be asserted on power-up to ensure that the TAP instruction register cannot power up set to a test command. If JTAG is
unused, JTAG_TRST should be tied low on the board.
Other peripherals, such as Ethernet, SpaceWire and PCI are all reset via internal signals generated
from the SYS_RESETN input and PLL lock signals, as described above. To ensure proper reset of all
the clock domains in the device, care must be taken to ensure that all external clocks for interfaces that
will be used are active and toggling before the interface is enabled and ungated in the clock gating
unit.
4.4 Clock multiplexing for main system clock, SDRAM and SpaceWire
The diagram below shows how the clocks are multiplexed in the design.
Figure 1. GR740 clock multiplexing (diagram showing SYSPLL, MEMPLL and SPWPLL with their pll_bypass[2:0] multiplexers, the mem_clksel selection, the external clock split/loop from mem_clk_out / mem_clk_out_diff to mem_clk_in, the GRETH1 and PCI clock inputs on mem_dqm[11], mem_dqm[10], mem_dqm[7] and mem_dqm[6], and the clock gating unit)
4.5 PLL control and configuration
Each PLL is put in power down mode whenever any of the following applies:
• The master reset signal SYS_RESETN is asserted
• The PLL reconfiguration unit is commanded to reprogram the PLLs
• The PLL is set to be bypassed using the PLL_BYPASS bootstrap signal
The rest of the PLL configuration is controlled by the PLL reconfiguration unit. When SYS_RESETN
is asserted this will be reset asynchronously to the default configuration. The reconfiguration unit can
then be reprogrammed to other PLL configurations via the general purpose register bank interface
(see section 30.2.4).
The configuration values tabulated below are the only supported configurations; other configurations
are invalid and may lead to malfunction. Note also that when overclocking the device by exceeding
the maximum clock frequencies given in the datasheet, correct functionality is not guaranteed and
power consumption may exceed typical values.
Table 30. Supported SYSPLL configurations
SYSPLL        SYS_CLK                  System clock                Memory clock if   Comment
Config word   Input range                                          MEM_CLKSEL=LOW
000010101     40-85 MHz (50 MHz nom)   5 x SYS_CLK (250 MHz nom)   1 x SYS_CLK       Default configuration
000001100     33.3-70 MHz              6 x SYS_CLK                 1 x SYS_CLK
000001010     25-53 MHz                8 x SYS_CLK                 2 x SYS_CLK
Table 31. Supported MEMPLL configurations
MEMPLL        MEM_EXTCLK    Memory clock if    Comment
Config word   Input range   MEM_CLKSEL=HIGH
000001010     25-53 MHz     2 x MEM_EXTCLK     Default configuration
Table 32. Supported SPWPLL configurations
SPWPLL        SPW_CLK       SpaceWire clock   Comment
Config word   Input range
000010000     25-53 MHz     8 x SPW_CLK       Default configuration
000001100     33.3-70 MHz   6 x SPW_CLK
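To illustrate how the SYSPLL table is used, the sketch below looks up a configuration word, checks the input range and returns the resulting clock frequencies. The dictionary and function names are hypothetical, and the multipliers are transcribed from Table 30 as printed in this document.

```python
# Config word -> (min input MHz, max input MHz, system multiplier,
#                 memory multiplier when MEM_CLKSEL=LOW)
SYSPLL_CONFIGS = {
    0b000010101: (40.0, 85.0, 5, 1),   # default configuration
    0b000001100: (33.3, 70.0, 6, 1),
    0b000001010: (25.0, 53.0, 8, 2),
}


def syspll_clocks(config_word: int, sys_clk_mhz: float) -> tuple:
    """Return (system clock MHz, memory clock MHz) for a SYSPLL setting.

    Raises ValueError for unsupported config words or an input clock
    outside the tabulated range.
    """
    if config_word not in SYSPLL_CONFIGS:
        raise ValueError("unsupported SYSPLL config word")
    lo, hi, sys_mult, mem_mult = SYSPLL_CONFIGS[config_word]
    if not lo <= sys_clk_mhz <= hi:
        raise ValueError("SYS_CLK outside supported input range")
    return sys_mult * sys_clk_mhz, mem_mult * sys_clk_mhz
```

Staying inside the input range does not by itself guarantee that the resulting system clock is within the maximum frequency given in the datasheet, so that limit still has to be checked separately.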
4.6 PLL watchdog
An additional watchdog is included in the system to detect if the main system clock stops running due
to PLL malfunction or other unforeseen issue. The PLL watchdog is combined with the regular
watchdog (GPTIMER0 timer 5) status and output on the WDOGN open-drain output. The watchdog
has no other effect on the system so if no watchdog functionality is wanted then the WDOGN output
can be ignored.
The PLL watchdog is clocked by the SYS_CLK input clock, and will trigger after 100 million SYS_CLK cycles (2.0 seconds at the nominal 50 MHz input frequency) unless it is restarted. It is restarted
whenever the GPTIMER0 TCTRL5 register is written (regardless of value written). Since this register
is written as part of the normal system watchdog handling, the PLL watchdog will not need any additional handling by software. The timeout value is fixed and can not be reprogrammed, and the current
status of the PLL watchdog is not accessible from software.
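Since the cycle count is fixed, the timeout scales with the SYS_CLK input frequency. A minimal sketch of the arithmetic (the constant and function names are illustrative):

```python
PLL_WDOG_CYCLES = 100_000_000  # fixed timeout in SYS_CLK cycles


def pll_watchdog_timeout_s(sys_clk_hz: float) -> float:
    """Time in seconds before the PLL watchdog triggers if not restarted."""
    return PLL_WDOG_CYCLES / sys_clk_hz
```

At the nominal 50 MHz input this gives 2.0 seconds; at a 25 MHz input the same cycle count corresponds to 4.0 seconds, so the software watchdog service interval must be chosen against the actual SYS_CLK frequency.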
4.7 PCI clock
The PCI clock is taken from the MEM_DQM[6] signal (PCI_CLK in table 27) when the SDRAM is in half-width and PCI
mode is enabled. The device is capable of 33 MHz and 66 MHz operation (TBC). The input signal
PCI_M66EN must reflect the frequency of the input PCI clock. PCI_M66EN should be HIGH if the
PCI clock is a 66 MHz clock and LOW if the PCI clock frequency is 33 MHz.
4.8 MIL-STD-1553B clock
The 20 MHz clock for the MIL-STD-1553B codec is taken from the dedicated pin GR1553_CLK.
4.9 Clock gating unit
The design has a clock gating unit through which individual units can have their AHB clocks enabled/
disabled and resets driven. The peripherals connected to the clock gating unit are listed in the table
below.
Table 33. Devices with gatable clock
Device                                  State after system reset
Ethernet MAC 0                          The Ethernet MACs are gated off after reset unless the Debug Support
                                        Unit is enabled via the DSU_EN signal.
Ethernet MAC 1                          Ethernet MAC 1 is also disabled whenever MEM_IFWIDTH is LOW
                                        or PCI mode is enabled (PCIMODE_ENABLE = HIGH)
SpaceWire router                        The SpaceWire router is disabled after reset unless general purpose
                                        I/O line 11 (GPIO[11]) signals a PROM-less system
PCI Target/Initiator and PCI DMA unit   Enabled after reset if PCIMODE_ENABLE=HIGH. Otherwise
                                        disabled.
MIL-STD-1553B interface controller      Disabled after reset
CAN 2.0 controller                      Disabled after reset
LEON4 Statistics unit                   Disabled after reset
UART 0                                  Enabled after reset
UART 1                                  Enabled after reset
SPI controller                          Disabled after reset
PROM/IO memory controller               Enabled after reset
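The reset state in Table 33 depends on a handful of bootstrap signals, which the sketch below collects into one function. It is an illustrative model only: the function name is hypothetical, and GPIO[11] LOW is taken to signal a PROM-less system, following section 5.3.

```python
def clocks_enabled_after_reset(dsu_en: bool, mem_ifwidth: bool,
                               pcimode_enable: bool, gpio11: bool) -> dict:
    """Map each gatable device to True (clock enabled) or False (gated off).

    All arguments are bootstrap levels at reset, True meaning HIGH.
    """
    eth_ungated = dsu_en  # MACs are gated off unless the DSU is enabled
    return {
        "Ethernet MAC 0": eth_ungated,
        # MAC 1 additionally needs half-width SDRAM with PCI mode off.
        "Ethernet MAC 1": eth_ungated and mem_ifwidth and not pcimode_enable,
        "SpaceWire router": not gpio11,
        "PCI Target/Initiator and PCI DMA unit": pcimode_enable,
        "MIL-STD-1553B interface controller": False,
        "CAN 2.0 controller": False,
        "LEON4 Statistics unit": False,
        "UART 0": True,
        "UART 1": True,
        "SPI controller": False,
        "PROM/IO memory controller": True,
    }
```

A boot loader could compare such a table against the set of peripherals the application needs in order to decide which clocks to enable in the clock gating unit.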
The LEON4 processor cores will automatically be clock gated when the processor enters power-down
or halt state. A processor’s floating-point unit (GRFPU) will be clock gated when the corresponding
processor has disabled FPU operations by setting the %psr.ef bit to zero, or when the processor has
entered power-down/halt mode. After reset, processors 1 to 3 will be in power-down mode. Processor
0 will start executing if the BREAK bootstrap signal is LOW. If the BREAK bootstrap signal is HIGH
then processor 0 will also enter power-down mode. If the device has debug mode enabled via the
DSU_EN signal (=HIGH) then the processors will enter debug mode instead of power-down mode.
For more information see chapter about the clock gating unit, section 25.
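The processor state after reset follows from the BREAK and DSU_EN signals. The sketch below encodes one reading of the paragraph above, namely that a processor which would otherwise enter power-down enters debug mode when DSU_EN is HIGH; the function name and state strings are illustrative.

```python
def processor_states_after_reset(break_high: bool, dsu_en: bool) -> list:
    """Return the state of processors 0..3 after reset (illustrative model)."""
    # Debug mode replaces power-down when the DSU is enabled.
    idle = "debug mode" if dsu_en else "power-down"
    cpu0 = idle if break_high else "running"
    # Processors 1 to 3 never start executing on their own after reset.
    return [cpu0, idle, idle, idle]
```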
4.10 Debug AHB bus clocking
All members of the Debug AHB bus will be gated off when the DSU_EN signal is low.
4.11 Notes on Ethernet interface clock and mode switch
The Ethernet interface transmit clocks (ETH0_TXCLK and ETH1_TXCLK) are used internally in the
device to clock registers that select between 10/100 and 1000 Mbit (Gigabit) mode for the respective
Ethernet controller. The default PHY (Ethernet transceiver) behaviour is to enter 10/100 Mbit mode
after reset. When a mode switch is made to 1000 Mbit mode, a signal will change internally in the
Ethernet controller. For this signal change to propagate through the register that selects between 10/
100 and 1000 Mbit mode the corresponding TXCLK must be present. Ethernet PHYs may disable the
TXCLK when entering 1000 Mbit mode and this may cause the internal register value in the GR740
to remain at the 10/100 Mbit value after the PHY has entered 1000 Mbit operation. When this happens
the system will not be able to transmit Ethernet traffic.
When the Ethernet debug communication link (EDCL) is enabled then the Ethernet controller will
automatically try to configure the PHY after reset. If the device is connected to a Gigabit network then
care must be taken to ensure that the TXCLK is available after the PHY has switched to 1000 Mbit
mode.
If the device will only be used on a 10/100 Mbit network then the TXCLK inputs can be connected
directly to the PHY 10/100 transmit clock. If the device will only be used on a 1000 Mbit network
then the Gigabit transmit clock can be connected to the TXCLK input.
If the device should adapt to both 10/100 Mbit networks and 1000 Mbit networks then the TXCLK
input(s) should be connected to the PHY 10/100 transmit clock(s). Software then needs to perform a
special sequence when the PHY has determined that it is connected to a Gigabit network: If the software driver finds that the device is connected to a Gigabit network then software needs to force the
PHY into 10/100 Mbit mode in order to enable the TXCLK. Software can then re-enable 1000 Mbit
operation. Note that this allows the system to adapt to both 10/100 networks and 1000 Mbit networks.
With this configuration, the EDCL will still be unavailable after reset when connected to a Gigabit
network.
5 Technical notes
5.1 GRLIB AMBA plug&play scanning
The bus structure in this design requires some special consideration with regard to plug&play scanning. The default behavior of GRLIB AMBA plug&play scanning routines is to start scanning at
address 0xFFFFF000. If any AHB/AHB bridges, APB bridges or L2 cache controllers are detected
during the scan, the general scanning routine traverses the bridge and reads the plug&play information from the bus behind the bridge. In this design, the default 0xFFFFF000 address gives plug&play
information for the Processor AHB bus. This plug&play area contains information which allows software to detect all devices on the Processor, Slave I/O, Master I/O and Memory AHB buses.
The plug&play information on the Processor bus does not contain any references to the plug&play
information on the Debug AHB bus, nor can the peripherals on the Debug AHB bus be accessed from
the Processor AHB bus as the buses are connected using a uni-directional bridge. In order to detect the
peripherals on the Debug AHB bus, the debug monitor used must be informed about the memory map
of the bus, or be instructed to start plug&play scanning at address 0xEFFFF000 from where all the
other plug&play areas in the system can be found.
Depending on the debug monitor used, the monitor may detect that it connects to a GR740 design and
start scanning on the Debug AHB bus (this applies to GRMON2 from Cobham Gaisler). Otherwise
the address 0xEFFFF000 should be specified to the monitor. In the case where the monitor detects that
it is connected to a GR740 design, it may be necessary to force the monitor to start scanning at the
default address 0xFFFFF000 when connecting with a debug monitor through the Master I/O bus, from
which the Debug AHB bus cannot be accessed (this is not required for GRMON2).
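The choice of scan start address can be captured in a few lines. This is a minimal sketch for a custom debug tool, with hypothetical names; the two addresses are the ones given in this section.

```python
PNP_PROCESSOR_BUS = 0xFFFFF000  # default GRLIB plug&play area (Processor AHB bus)
PNP_DEBUG_BUS = 0xEFFFF000      # plug&play area reachable from the Debug AHB bus


def pnp_scan_start(connected_via_debug_bus: bool) -> int:
    """Pick the plug&play scan start address for the current debug link.

    Links on the Debug AHB bus can start at 0xEFFFF000 and find all other
    plug&play areas from there. Links entering through the Master I/O bus
    cannot reach the Debug AHB bus and use the default address.
    """
    return PNP_DEBUG_BUS if connected_via_debug_bus else PNP_PROCESSOR_BUS
```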
5.2 Processor register file initialisation and data scrubbing
Please refer to section 6.11.
5.3 PROM-less systems and SpaceWire RMAP
The system has support for PROM-less operation where system software is uploaded by an external
entity. In order to allow system software to be uploaded via RMAP the bootstrap signal GPIO[11]
should be low in order to not clock gate off the SpaceWire router after system reset. The IOMMU will
be in pass-through after reset allowing an external entity to upload software, change the processor
reset start address, and wake the processors up via the multiprocessor interrupt controller’s register
interface. In order to prevent the processor from starting execution, the external BREAK signal
should be asserted (and the DSU needs to be disabled, see bootstrap signal descriptions in section
3.1). This will also prevent the timer unit’s watchdog timer from being started. Note that the PLL
watchdog described in section 4.6 will still be active and external units must either pet this watchdog
or have the WDOGN signal disconnected from reset circuitry to prevent reset of the device.
If the system has a boot PROM available it is recommended to have the SpaceWire router gated off
after reset by setting the bootstrap signal GPIO[11] high during system reset. If router functionality
needs to be immediately available, the designer should consider disabling RMAP or enabling IOMMU
protection early in the software boot process so that external entities cannot interfere with system
operation. It takes 20 microseconds for the SpaceWire links to enter run state. Before that, incoming
RMAP traffic cannot enter the system. This leaves time (4000 cycles at 200 MHz system frequency)
for the processors to disable RMAP via a register write, or to set up rudimentary IOMMU protection.
Additional information and example software is available in [SPWBT].
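The cycle budget quoted above follows directly from the link start-up time and the system frequency. A minimal sketch of the calculation (the function name is illustrative):

```python
def cycles_before_run_state(system_hz: int, link_startup_us: int = 20) -> int:
    """System clock cycles available before SpaceWire links enter run state."""
    # Integer arithmetic keeps the result exact for whole-MHz frequencies.
    return system_hz * link_startup_us // 1_000_000
```

With a 200 MHz system clock this gives the 4000 cycles mentioned in the text; at the nominal 250 MHz system frequency the same 20 microseconds correspond to 5000 cycles.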
5.4 System integrity and debug communication links
The debug communication links have unrestricted access to all parts of the system. When the Debug
AHB bus is clock gated off via the external DSU_EN signal, all debug communication links will be disabled. However, the Ethernet Debug Communication Links (EDCLs) can still be enabled via the
Ethernet controllers’ register interfaces. Since the Debug AHB bus is gated off, the only path for
EDCL traffic into the system is through the IOMMU. Since EDCL traffic flows through the same
AHB master interface as normal Ethernet traffic the IOMMU may not provide adequate protection. To
ensure that EDCL traffic cannot be harmful, even if accidentally enabled, it is recommended to tie
GPIO[9:8] HIGH during system reset in order to force EDCL traffic onto the gated Debug AHB bus.
5.5 Separation and ASMP configurations
The system supports running different OS instances on each of the processor cores. The use of ASMP
configurations is eased by:
• The multiprocessor interrupt controller that contains four internal interrupt controllers. This
means that each OS (up to four) can have direct access to its own interrupt controller. It is also
possible to run two SMP operating systems simultaneously.
• The availability of several general purpose timer units allows each OS to have a dedicated timer
unit.
• All peripheral registers are mapped on 4 KiB address boundaries. This allows using the system’s
memory management units to provide separation between operating systems.
• The I/O memory management unit (IOMMU) can prevent DMA capable peripheral controllers
belonging to one OS from overwriting memory areas belonging to another OS.
• The L2 cache supports replacement policies based on AHB master bus index. This means that the
L2 cache can be configured so that one processor cannot evict data allocated by accesses from
another processor.
The system does not provide full separation between operating systems. The main memory interface
and AMBA buses are shared. Since space separation is provided by processor memory management
units, it is possible for one operating system to disable the memory management unit and access memory areas assigned to another operating system.
There are also other shared resources that require all software instances that can access them to
behave properly:
• The Ethernet MDIO bus is shared. Both Ethernet controllers can access the same MDIO bus and
it is possible to use one Ethernet controller's interface to reconfigure a transceiver connected to
the other Ethernet controller. It is also possible to generate MDIO interrupts that will, if
unmasked, assert a processor interrupt.
• The General Purpose IO port provides functionality that allows use by multiple processors using
logical-AND/OR/XOR registers to change the registers. All processors that use these registers
can access the full register interface of the GPIO port and can interfere with each other.
Overview of the start-up process of a separated ASMP system:
• On power up, processor 0 starts executing.
• Processor 0 boot code sets up memory controller and other critical shared resources.
• Processor 0 sets up the IOMMU access protection vectors so that each peripheral DMA can only
access memory address space belonging to its chosen partition, and sets up the IRQ routing so
that peripheral IRQs will go to the internal interrupt controller belonging to that partition.
• Processor 0 now starts all the other cores by writing to the interrupt controller.
• Each processor now runs supervisor code that sets up its MMU page tables so that it cannot
access peripherals and memory belonging to other partitions, and also so that it can only access
its own timer and interrupt controller, and also so the processor cannot write to the page tables
themselves.
• Code in each partition can now run separated.
Note that the supervisor code running on the processors has to be trusted/audited to not manipulate
the MMU setup in an illegal way (through stores to MMU configuration ASI 0x19) after it has been
set up.
The level-2 cache can be either shared between the partitions, or it can be partitioned to reduce timing
interference. This may be done by using the level-2 cache's single master per way option. Another
more flexible option is using the MMU address translation inside each partition to map the virtual
address space to physical address ranges that can never end up in the same level-2 sets (only physical
addresses with the same address bits 18:5 can end up in the same L2 set, and the MMU translation
allows translation of bits 31:12).
5.6 Clock gating
Some peripherals are clock gated after reset (see section 4.9). Software drivers for LEON systems
generally assume that the peripheral clocks are enabled and the clock gating unit should be configured
by the bootloader or debug tool. The GRMON debugger has support for enabling all clocks when connecting to the device and clocks for specific peripherals can also be enabled via the command line
interface. Please see the GRMON user manual and operating system documentation for more information.
5.7 Software portability
5.7.1 Instruction set architecture
The LEON4 processor used in this design implements the SPARC V8 instruction set architecture.
This means that any compiler that produces valid SPARC V8 executables can be used. Full instruction
set compatibility is kept with LEON2FT and LEON3FT applications. The LEON4 processor implements the SPARC V9 compare and swap (CAS) instruction. This instruction is not available on
LEON2FT and is optional for LEON3FT implementations. Programs that utilize this instruction may
therefore not be backward compatible with legacy systems. See also information about the memory
map in section 5.7.4 below.
5.7.2 Peripherals
All peripherals in the design are IP cores from Cobham Gaisler’s GRLIB IP library. Standard GRLIB
software drivers can be used.
For software driver development, this document describes the capabilities offered by the GR740 system. In order to write a generic driver for a GRLIB IP core, that can be used on all systems based on
GRLIB, please also refer to the generic IP core documentation. Note, however, that the generic documentation may describe functionality not present in this implementation and that this datasheet supersedes any IP core documentation.
5.7.3 Plug and play
Standard GRLIB AMBA plug&play layout is used (see sections 37 and 38). The same software routines used for typical LEON/GRLIB systems can be used.
5.7.4 Memory map
Many LEON2FT and LEON3FT systems use a memory map with ROM mapped at 0x0 and RAM
mapped at 0x40000000. This design has RAM mapped at 0x0 and ROM mapped at 0xC0000000.
In general this does not affect applications running on an operating system, but it has implications for
software running on bare metal. Please refer to the operating system documentation to see if and how
special consideration can be taken for systems with RAM at 0x0 and ROM at 0xC0000000.
Differences in memory map may also mean that prebuilt system software images are not portable
between systems and need to be rebuilt, even if the software makes use of plug&play to discover
peripheral register addresses.
5.8 Level-2 cache
The Level-2 (L2) cache controller is disabled after system reset. From a performance perspective it is
recommended that the L2 cache is enabled as early in the boot process as possible. The L2 cache
contents must be invalidated when the cache is enabled; see section 9 for details.
5.9 Time synchronisation
5.9.1 Overview
The system includes hardware functionality for time synchronization where the system can be
configured to save the current time value on certain events. The system also supports toggling GPIO lines on
timer ticks and when the current time value is latched. The events triggering the time latch and GPIO
toggle responses are fully handled in hardware without requiring involvement from software, except
for the need for initial configuration of the peripherals. This section provides an overview of the
available resources. Please refer to the documentation of the peripherals for further details.
The following events can trigger time latching:
• Assertion of any of the interrupt lines. This includes CAN controller RX and TX events and also
events signaled via GPIO inputs, since the GPIO ports can be configured to generate interrupts.
• SpaceWire Time-Code reception (signaled via router tick outputs 0 - 3, via SpaceWire router
AMBA port interrupts and via TDP controller).
• MIL-STD-1553B reception of synchronize mode command (when operating as RT).
The following events can trigger a synchronization message or action:
• GPIO lines can be toggled on GPTIMER 0 timer ticks and all events that trigger time latching.
• The TDP controller can initiate transmission of SpaceWire Time-Codes.
• MIL-STD-1553B message transmission can be triggered by the timer 3 tick on GPTIMER 0
(when operating as BC).
5.9.2 Available timers
The following timers are available in the system:
• Processor up-counter - The up-counter, accessible via the internal registers %ASR22 and %ASR23
in the processors, provides a 56-bit value (see section 6.10.4). All four processors share the same
counter. The low part of this counter is also used for interrupt timestamping, the system’s trace
buffers, and performance counter timestamping.
• General purpose timer units - Five general purpose timer units (GPTIMER0 - 4) provide 21 32-bit
timers. GPTIMER0 has five timers, where the last timer is used as the system watchdog, and
GPTIMER1-4 each have four 32-bit timers. All timer units are capable of latching or setting the
time based on events on the interrupt bus or on separate inputs (called external events).
• The TDP controller provides basic time keeping functions such as an Elapsed Time counter according
to the CCSDS Unsegmented Code specification. It provides support for setting and sampling
the Elapsed Time counter. The Elapsed Time counter can be incremented either using an internal
frequency synthesizer or by using the external ET increment signal (mapped to GPIO[1], only
available in silicon revision 1). The TDP controller also implements the Time Distribution
Protocol (TDP). The aim of TDP is to distribute and synchronize time across a SpaceWire network.
The TDP controller also provides external datation services. Four external datation services are
implemented, each of which can latch the Elapsed Time counter when a specified event occurs. All
external datation services share the same event inputs. The event on which the time stamp must
occur is configurable individually (using mask registers) for each external datation service.
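As an illustration of working with the 56-bit processor up-counter, the following C sketch combines two 32-bit register samples into one value and computes an elapsed count with 56-bit wrap-around. The register layout used here (counter bits 55:32 in the low 24 bits of the %ASR22 value, counter bits 31:0 in the %ASR23 value) is an assumption made for this example and should be checked against section 6.10.4; the function names are hypothetical.

```c
#include <stdint.h>

/* ASSUMED layout, to be verified against section 6.10.4:
 *   asr22: counter bits 55:32 in bits 23:0 (upper bits ignored here)
 *   asr23: counter bits 31:0 */
static inline uint64_t upcounter_combine(uint32_t asr22, uint32_t asr23)
{
    return ((uint64_t)(asr22 & 0x00FFFFFFu) << 32) | asr23;
}

/* Elapsed ticks between two samples, allowing for the counter
 * wrapping around at 56 bits. */
static inline uint64_t upcounter_elapsed(uint64_t start, uint64_t end)
{
    return (end - start) & 0x00FFFFFFFFFFFFFFull;
}
```

On target hardware the two registers are read by separate instructions, so software would typically also need to guard against the low part wrapping between the two reads; that is not shown here.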
5.9.3 Generation of synchronization messages and events
The system’s first general purpose I/O port supports toggling external signals GPIO(15:0) based on
on-chip events. The PULSE register controlling this functionality is described in section 22.3.11. The
table below gives an overview of the connections:
Table 34. Events that can invert GPIO output
GPIO line number   Event
0                  GPTIMER 0 tick 0
1                  GPTIMER 0 tick 1
2                  GPTIMER 0 tick 2
3                  GPTIMER 0 tick 3
4                  GPTIMER 0 tick 4
5                  TDP controller CTICK
                   A pulse is generated when a SpaceWire Time-Code is transmitted
                   while the TDP controller is acting as initiator.
                   A pulse is also generated when a diagnostic SpaceWire Time-Code
                   is generated while the TDP controller is acting as target.
6                  TDP controller JTICK
                   The incoming SpaceWire Time-Code provides an output pulse
                   when the TDP controller is acting as target. This output is used to
                   visualize the jitter in incoming SpaceWire Time-Codes.
7                  TDP controller datation pulse 0
8                  TDP controller datation pulse 1
9                  TDP controller datation pulse 2
10                 TDP controller datation pulse 3
11                 GPTIMER 0 latch disable
12                 GPTIMER 1 latch disable
13                 GPTIMER 2 latch disable
14                 GPTIMER 3 latch disable
15                 GPTIMER 4 latch disable
Note that the connection of the GPTIMER latch disable events and TDP controller datation pulses
allows the system to be configured so that any interrupt in the device can invert the value of the
corresponding GPIO lines. In this case the TDP controller and timer unit act as filters, since they have mask
registers to select which interrupts, or other event sources, should cause time to be latched.
5.10 Bridges, posted-writes and ERROR response propagation
The GR740 system consists of several AHB buses connected via bridges. The bridges in the system
make use of posted writes. Write operations on the slave side of a bridge will complete and then the
write operation on the master side of the bridge will be started. In case the write operation then
receives an AMBA ERROR response, this will not be propagated back to the first master since that
write operation has already completed. This means that peripherals capable of DMA and the processors may perform write accesses to unmapped areas, memory with uncorrectable errors and write-protected regions without seeing the AMBA ERROR response, which would otherwise cause a processor
to trap or a peripheral to stop processing and assert an interrupt. Instead, write errors need to be monitored using the system’s status registers.
Please note that status register monitoring is an important part of handling EDAC errors in external
memory. See [GR-AN-0004] for further information.
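The consequence of posted writes can be illustrated with a small host-side model in C. This is a sketch only: the structure, the function names and the address range treated as mapped are all hypothetical, and the `error_status` field merely stands in for the system's real status registers. It shows that the originating master sees the write complete successfully, while the ERROR response only becomes visible later through status monitoring.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical model of one bridge with a single posted write. */
struct bridge_model {
    bool     write_pending;  /* a write is buffered inside the bridge     */
    uint32_t pending_addr;
    bool     error_status;   /* stands in for a system status register    */
    uint32_t error_addr;     /* address of the failed write, for the model */
};

/* Hypothetical decode: only addresses below this limit are "mapped". */
static inline bool model_addr_is_mapped(uint32_t addr)
{
    return addr < 0x10000000u;
}

/* Master-visible part of the write: it always completes without error,
 * because the bridge accepts (posts) the write. */
static inline bool bridge_post_write(struct bridge_model *b, uint32_t addr)
{
    b->write_pending = true;
    b->pending_addr  = addr;
    return true;
}

/* Later, the bridge performs the write on the far bus. An ERROR
 * response at this point can only be recorded in status. */
static inline void bridge_drain(struct bridge_model *b)
{
    if (!b->write_pending)
        return;
    b->write_pending = false;
    if (!model_addr_is_mapped(b->pending_addr)) {
        b->error_status = true;
        b->error_addr   = b->pending_addr;
    }
}
```

In this model, a write to an unmapped address returns success from `bridge_post_write()`, and the failure is only observable by polling `error_status` after `bridge_drain()` has run.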
LEON4 is a 32-bit processor core conforming to the IEEE-1754 (SPARC V8) architecture [SPARC]
with a subset of the V8E extensions [V8E]. It is designed for embedded applications, combining high
performance with low complexity and low power consumption.
The LEON4 core has the following main features: 7-stage pipeline with Harvard architecture, separate instruction and data caches, hardware multiplier and divider, on-chip debug support and multiprocessor extensions.
The LEON4 processors in this device have fault-tolerance against SEU errors. The fault-tolerance is
focused on the protection of on-chip RAM blocks, which are used to implement IU/FPU register files
and the L1 cache memory.
6.1.1 Integer unit
The LEON4 integer unit is implemented according to the SPARC V8 manual [SPARC], including
hardware multiply and divide instructions. The number of register windows is eight. The pipeline
consists of 7 stages with a separate instruction and data cache interface.
6.1.2 Cache sub-system
LEON4 has a cache system consisting of a separate instruction and data cache. Both caches have
four ways, four KiB/way, and 32 bytes per line. The instruction cache maintains one valid bit per
cache line and uses streaming during line-refill to minimize refill latency. The data cache has one
valid bit per cache line, uses write-through policy and implements a double-word write-buffer.
Bus snooping on the AHB bus maintains cache coherency for the data cache.
6.1.3 Floating-point unit and co-processor
The LEON4 integer unit provides interfaces for the high-performance GRFPU floating-point
unit. The floating-point processor executes in parallel with the integer unit, and does not block the
operation unless a data or resource dependency exists. The floating-point controller and floating-point
unit are further described in sections 7 and 8.
6.1.4 Memory management unit
Each processor core contains a SPARC V8 Reference Memory Management Unit (SRMMU). The
SRMMU implements the full SPARC V8 MMU specification, and provides mapping between multiple 32-bit virtual address spaces and physical memory. A three-level hardware table-walk is implemented, and the MMU has 16 instruction and 16 data fully associative TLB entries.
6.1.5 On-chip debug support
The LEON4 pipeline includes functionality to allow non-intrusive debugging on target hardware. To
aid software debugging, up to four watchpoint registers can be enabled. Each register can cause a
breakpoint trap on an arbitrary instruction or data address range. When the (optional) debug support
unit is attached, the watchpoints can be used to enter debug mode. Through a debug support interface,
full access to all processor registers and caches is provided. The debug interface also allows single
stepping, instruction tracing and hardware breakpoint/watchpoint control. An internal trace buffer can
monitor and store executed instructions, which can later be read out via the debug interface.
6.1.6 Interrupt interface
LEON4 supports the SPARC V8 interrupt model with a total of 15 asynchronous interrupts. The interrupt interface provides functionality to both generate and acknowledge interrupts.
6.1.7 AMBA interface
The cache system implements an AMBA AHB master to load and store data to/from the caches. The
interface is compliant with the AMBA-2.0 standard. During line refill, incremental bursts are
generated to optimise the data transfer. The AMBA interface makes use of the full width of the 128-bit bus
on cache line fills. The processor also has a snoop AHB slave input port which is used to monitor the
accesses made by other masters on the processor AHB bus.
6.1.8 Power-down mode
The LEON4 processor core implements a power-down mode, which halts the pipeline and caches
until the next interrupt. The processor supports clock gating during the power down period by providing a clock-enable signal to the system’s clock gating unit. A small part of the processor is always
clocked, to check for wake-up conditions and maintain cache coherency.
6.1.9 Multi-processor support
LEON4 is designed to be used in multi-processor systems. Each processor has a unique index to allow
processor enumeration. The write-through caches and snooping mechanism guarantee memory
coherency in shared-memory systems.
Figure 3. LEON4 integer unit datapath diagram (pipeline stages shown: Fetch, Decode, Register Access,
Execute, Memory, Exception, Write-back)
6.2 LEON4 integer unit
6.2.1 Overview
The LEON4 integer unit implements the integer part of the SPARC V8 instruction set. The
implementation is focused on high performance and low complexity. The LEON4 integer unit has the following
main features:
• 7-stage instruction pipeline
• Separate instruction and data cache interface
• Support for eight register windows
• Hardware multiplier and Radix-2 divider (non-restoring)
• Static branch prediction
• Single-vector trapping for reduced code size
6.2.2 Instruction pipeline
The LEON4 integer unit uses a single instruction issue pipeline with 7 stages:
1. FE (Instruction Fetch): If the instruction cache is enabled, the instruction is fetched from the instruction cache.
Otherwise, the fetch is forwarded to the memory controller. The instruction is valid at the end of this stage and is
latched inside the IU.
2. DE (Decode): The instruction is decoded and the CALL/Branch target addresses are generated.
3. RA (Register access): Operands are read from the register file or from internal data bypasses.
4. EX (Execute): ALU, logical, and shift operations are performed. For memory operations (e.g., LD) and for JMPL/
RETT, the address is generated.
5. ME (Memory): Data cache is accessed. Store data read out in the execution stage is written to the data cache at this
time.
6. XC (Exception): Traps and interrupts are resolved. For cache reads, the data is aligned.
7. WR (Write): The results of ALU and cache operations are written back to the register file.
Table 35 lists the cycles per instruction (assuming cache hit and no icc or load interlock):
Table 35. Instruction timing
Instruction              Cycles (MMU disabled)
JMPL, RETT               3
SMUL/UMUL                1*
SDIV/UDIV                35
Taken Trap               5
Atomic load/store        5
All other instructions   1
* Multiplication cycle count is 1 clock (1 clock issue rate, 2 clock data latency), for the 32x32 multiplier.
Additional conditions that can extend an instruction’s duration in the pipeline are listed in the table and
text below.
Branch interlock: When a conditional branch or trap is performed 1-2 cycles after an instruction
which modifies the condition codes, 1-2 cycles of delay are added to allow the condition to be
computed. If static branch prediction is enabled, this extra delay is incurred only if the branch is not taken.
Load delay: When using data shortly after the load instruction, the second instruction will be delayed
to satisfy the pipeline’s load delay.
Mul latency: For pipelined multiplier implementations there is 1 cycle of extra data latency. Accessing
the result immediately after a MUL will then add one cycle of pipeline delay.
Hold cycles: During cache miss processing or when blocking on the store buffer, the pipeline will be
held still until the data is ready, effectively extending the execution time of the instruction causing the
miss by the corresponding number of cycles. Note that since the whole pipeline is held still, hold
cycles will not mask load delay or interlock delays. For instance on a load cache miss followed by a
data-dependent instruction, both hold cycles and load delay will be incurred.
FPU: The floating-point unit or coprocessor may need to hold the pipeline or extend a specific
instruction.
Certain specific events that cause these types of locks and their timing are listed in table 36 below.
Table 36. Event timing
Event                                                                     Cycles
Instruction cache miss processing, MMU disabled                           3 + mem latency
Instruction cache miss processing, MMU enabled                            5 + mem latency
Data cache miss processing, MMU disabled (read), L2 hit                   3 + mem latency
Data cache miss processing, MMU disabled (write), write-buffer empty      0
Data cache miss processing, MMU enabled (read)                            5 + mem latency
Data cache miss processing, MMU enabled (write), write-buffer empty       0
Branch prediction miss, one instruction between branch and ICC setting    1
Pipeline restart due to register file or cache error correction           7
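The figures in tables 35 and 36 can be combined into a rough cycle estimate, as in the C sketch below. It is a simplified illustration only: the enum and function names are hypothetical, `mem_latency` is a caller-supplied value in clock cycles that this section does not quantify, and interlocks, load delay and FPU effects are deliberately left out.

```c
#include <stdbool.h>

/* Hypothetical instruction classes matching the rows of Table 35. */
enum insn_class {
    INSN_JMPL_RETT,
    INSN_MUL,
    INSN_DIV,
    INSN_TAKEN_TRAP,
    INSN_ATOMIC,
    INSN_OTHER
};

/* Base cycles per instruction from Table 35 (cache hit, no icc or
 * load interlock). */
static inline unsigned base_cycles(enum insn_class c)
{
    switch (c) {
    case INSN_JMPL_RETT:  return 3;
    case INSN_MUL:        return 1;   /* issue rate; data latency is 2 */
    case INSN_DIV:        return 35;
    case INSN_TAKEN_TRAP: return 5;
    case INSN_ATOMIC:     return 5;
    default:              return 1;
    }
}

/* Hold cycles for a read miss from Table 36:
 * 3 + mem latency with the MMU disabled, 5 + mem latency with it enabled. */
static inline unsigned read_miss_hold_cycles(bool mmu_enabled,
                                             unsigned mem_latency)
{
    return (mmu_enabled ? 5u : 3u) + mem_latency;
}
```

For example, a load that misses in the data cache with the MMU disabled and a hypothetical memory latency of 10 cycles would be estimated at `base_cycles(INSN_OTHER) + read_miss_hold_cycles(false, 10)` cycles.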
6.2.3 SPARC Implementor’s ID
Cobham Gaisler is assigned number 15 (0xF) as SPARC implementor’s identification. This value is
hard-coded into bits 31:28 in the %psr register. The version number for LEON4 is 3 (same as for
LEON3 to provide software compatibility), which is hard-coded into bits 27:24 of the %psr.
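The field positions above translate directly into a decode routine. The following C sketch assumes the %psr value has already been read into a 32-bit variable (the read itself requires a target-specific instruction and is not shown); the function names are hypothetical.

```c
#include <stdint.h>

/* Implementor ID: %psr bits 31:28. */
static inline unsigned psr_impl(uint32_t psr)
{
    return (psr >> 28) & 0xFu;
}

/* Version: %psr bits 27:24. */
static inline unsigned psr_ver(uint32_t psr)
{
    return (psr >> 24) & 0xFu;
}

/* True for implementor 0xF, version 3. Since LEON3 reports the same
 * version number, this check alone cannot tell LEON3 and LEON4 apart. */
static inline int psr_is_gaisler_leon(uint32_t psr)
{
    return psr_impl(psr) == 0xFu && psr_ver(psr) == 3u;
}
```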
6.2.4 Divide instructions
Full support for SPARC V8 divide instructions is provided (SDIV, UDIV, SDIVCC & UDIVCC). The
divide instructions perform a 64-by-32 bit divide and produce a 32-bit result. Rounding and overflow
detection are performed as defined in the SPARC V8 manual.
6.2.5 Multiply instructions
The LEON processor supports the SPARC integer multiply instructions UMUL, SMUL, UMULCC
and SMULCC. These instructions perform a 32x32-bit integer multiply, producing a 64-bit result.
SMUL and SMULCC perform signed multiply while UMUL and UMULCC perform unsigned
multiply. UMULCC and SMULCC also set the condition codes to reflect the result. The multiply
instructions are performed using a 32x32 pipelined hardware multiplier.
6.2.6 Multiply and accumulate instructions
This implementation does not support multiply-and-accumulate (UMAC, SMAC) instructions.
6.2.7 Compare and Swap instruction (CASA)
LEON4 implements the SPARC V9 Compare and Swap Alternative (CASA) instruction. CASA
operates as described in the SPARC V9 manual. The instruction is privileged, except when setting
ASI = 0xA (user data).
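For readers unfamiliar with compare-and-swap, the data flow defined by SPARC V9 can be modelled in a few lines of C. This is a sequential sketch with a hypothetical function name: it reproduces what ends up in memory and in the destination register, but not the atomicity that the hardware instruction provides, and it ignores the ASI.

```c
#include <stdint.h>

/* Sequential model of the compare-and-swap data flow:
 *   mem      word addressed by the instruction
 *   compare  value the memory word is compared against
 *   swap     value stored to memory if the comparison matches
 * Returns the old memory value, which the instruction places in the
 * destination register whether or not the swap happened. */
static inline uint32_t casa_model(uint32_t *mem, uint32_t compare,
                                  uint32_t swap)
{
    uint32_t old = *mem;
    if (old == compare)
        *mem = swap;
    return old;
}
```

A caller detects success by checking whether the returned value equals `compare`, which is the usual basis for lock acquisition loops.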
6.2.8 Branch prediction
Static branch prediction can optionally be enabled, and reduces the penalty for branches preceded
by an instruction that modifies the integer condition codes. The predictor uses a branch-always
strategy, and starts fetching instructions from the branch address. On a prediction hit, 1 or 2 clock cycles
are saved, and there is no extra penalty incurred for misprediction as long as the branch target can be
fetched from cache.
6.2.9 Register file data protection
The integer and FPU register files are protected against soft errors. Data errors will then be transparently corrected without impact at application level. Correction is done for the read data value. The
error remains in the register file and will be corrected on the next write to the register file position.
6.2.10 Hardware breakpoints
The integer unit supports four hardware breakpoints. Each breakpoint consists of a pair of
ancillary state registers (see section 6.10.5). Any binary aligned address range can be watched for
instruction or data access, and on a breakpoint hit, trap 0x0B is generated.
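The notion of a "binary aligned address range" can be illustrated with an address and mask pair, as in the C sketch below. The structure and function names are hypothetical, and the sketch models only the matching rule; the actual register layout, including enable bits, is given in section 6.10.5 and is not reproduced here.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical software view of one breakpoint register pair. */
struct hw_breakpoint {
    uint32_t addr;  /* base address of the watched range        */
    uint32_t mask;  /* 1 = bit must match, 0 = bit is don't-care */
};

/* An access hits when all address bits selected by the mask match. */
static inline bool breakpoint_hit(const struct hw_breakpoint *bp,
                                  uint32_t access_addr)
{
    return ((access_addr ^ bp->addr) & bp->mask) == 0;
}

/* Build a breakpoint covering a range whose size is a power of two and
 * whose base is aligned to that size (a binary aligned range). */
static inline struct hw_breakpoint breakpoint_for_range(uint32_t base,
                                                        uint32_t size)
{
    struct hw_breakpoint bp;
    bp.mask = ~(size - 1u);
    bp.addr = base & bp.mask;
    return bp;
}
```

With this rule, a 256-byte range at 0x40001000 is described by the mask 0xFFFFFF00, and ranges that are not a power of two in size cannot be expressed with a single pair.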
6.2.11 Instruction trace buffer
The instruction trace buffer consists of a circular buffer that stores executed instructions. This is
enabled and accessed only through the processor’s debug port via the Debug Support Unit. When
enabled, the following information is stored in real time, without affecting performance:
• Instruction address and opcode
• Instruction result
• Load/store data and address
• Trap information
• 30-bit time tag
The operation and control of the trace buffer is further described in section 33.4. Note that each processor has its own trace buffer allowing simultaneous tracing of all instruction streams.
The time tag value in the trace buffer has the same time source as the up-counter described in section
6.10.4.
6.2.12 Processor configuration register
The ancillary state register 17 (%asr17) provides information on implementation-specific characteristics for the processor. This can be used to enhance the performance of software. See section 6.10.5 for
layout.
6.2.13 Exceptions
LEON4 adheres to the general SPARC trap model. The table below shows the implemented traps and
their individual priority. When PSR (processor status register) bit ET=0, an exception trap causes the
processor to halt execution and enter error mode. When processor 0 enters error mode, the external
PROC_ERRORN signal will be asserted.
Table 37. Trap allocation and priority
Trap                           TT     Pri   Description                                        Class
reset                          0x00   1     Power-on reset                                     Interrupting
data_store_error               0x2b   2     write buffer error during data store               Interrupting
instruction_access_exception   0x01   3     Error or MMU page fault during instruction fetch   Precise
privileged_instruction         0x03   4     Execution of privileged instruction in user mode   Precise
illegal_instruction            0x02   5     UNIMP or other un-implemented instruction          Precise
fp_disabled                    0x04   6     FP instruction while FPU disabled                  Precise
cp_disabled                    0x24   6     CP instruction while Co-processor disabled         Precise
The fp_exception trap is deferred. The data_store_error is delivered as a deferred exception but is
non-resumable and therefore classed as interrupting in the table above.
For data_store_error, see also the AMBA ERROR propagation description in section 5.10.
6.2.14 Single vector trapping (SVT)
Single-vector trapping (SVT) is a SPARC V8e [V8E] option to reduce code size for embedded
applications. When enabled, any taken trap will always jump to the reset trap handler (%tbr.tba + 0). The
trap type will be indicated in %tbr.tt, and must be decoded by the shared trap handler. SVT is enabled
by setting bit 13 in %asr17.
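The effect of SVT on the trap target address can be sketched in C. The SVT bit position is taken from the text above; the %tbr field positions (tba in bits 31:12, tt in bits 11:4) follow the SPARC V8 definition of the register. The function and macro names are hypothetical, and the register values are assumed to have been read into ordinary variables.

```c
#include <stdint.h>

#define ASR17_SVT (1u << 13)   /* SVT enable: bit 13 of %asr17 */

/* Value to write back to %asr17 in order to enable SVT. */
static inline uint32_t asr17_with_svt(uint32_t asr17)
{
    return asr17 | ASR17_SVT;
}

/* Trap type field: %tbr bits 11:4 (SPARC V8). */
static inline unsigned tbr_trap_type(uint32_t tbr)
{
    return (tbr >> 4) & 0xFFu;
}

/* Address that a taken trap jumps to. Without SVT each trap type has
 * its own 16-byte table entry; with SVT every trap goes to tba + 0. */
static inline uint32_t trap_vector(uint32_t tbr, int svt_enabled)
{
    uint32_t tba = tbr & 0xFFFFF000u;
    return svt_enabled ? tba : (tba | (tbr & 0x00000FF0u));
}
```

A shared SVT handler would typically start by calling something like `tbr_trap_type()` on the current %tbr value and then branch through a software dispatch table.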
6.2.15 Address space identifiers (ASI)
In addition to the address, a SPARC processor also generates an 8-bit address space identifier (ASI),
providing up to 256 separate, 32-bit address spaces. During normal operation, the LEON4 processor
accesses instructions and data using ASI 0x8 - 0xB as defined in the SPARC standard. Using the
LDA/STA instructions, alternative address spaces can be accessed. The different available ASIs are
described in section 6.9.
6.2.16 Partial WRPSR
Partial write %PSR (WRPSR) is a SPARC V8e option that allows WRPSR instructions to only affect
the %PSR.ET field. If the WRPSR instruction’s rd field is non-zero, then the WRPSR write will only
update ET.
Partial WRPSR should only be used on silicon revision 1 of the GR740 device, see section 43.2.4.
6.2.17 Power-down
The processor has a power-down feature to minimize power consumption during idle periods. The
power-down mode is entered by performing a WRASR instruction to %asr19:
wr %g0, %asr19
During power-down, the pipeline is halted until the next interrupt occurs. Signals inside the processor
pipeline and caches are then static, reducing power consumption from dynamic switching. The default
setting of the clock-gating unit is to also disable the processor and FPU clock when the processor
enters this idle mode.
Note: %asr19 must always be written with the data value zero to ensure compatibility with future
extensions.
Note: This instruction must be performed in supervisor mode with interrupts enabled.
When resuming from power-down, the pipeline will be re-filled from the point of power-down and the
first instruction following the WRASR instruction will be executed prior to taking the interrupt trap.
Up to six instructions after the WRASR instruction will be fetched (possibly with cache miss if they
are not in cache) prior to fetching the trap handler.
6.2.18 Processor reset operation
The following table indicates the reset values of the registers which are affected by system reset. See
also reset values specified for other registers, such as the cache control register in sections 6.9 and
6.10. All other registers maintain their value or are undefined.
Table 38. Processor reset values
Register                          Reset value
Trap Base Register                Trap Base Address field reset to 0xC0000000
PC (program counter)              0xC0000000
nPC (next program counter)        0xC0000004
PSR (processor status register)   ET=0, S=1
By default, the execution will start from address 0xC0000000. This can be overridden by setting the
reset start address register on the interrupt controller.
6.2.19 Multi-processor systems
The LEON4 processor supports symmetric multi-processing (SMP) and asymmetric multi-processing
(ASMP) configurations. The ID of the processor on which the code is executing can be read out by
reading the index field of the LEON4 configuration register.
After system reset, only the first processor will start. Note that this depends on the value of the
external signal BREAK: if BREAK is high after system reset, the first processor will either be halted or go
into debug mode, depending on the value of external signal DSU_EN. All other processors will
remain halted in power-down mode.
After the system has been initialized, the remaining processors can be started by writing to the MP
status register, located in the multi-processor interrupt controller. The halted processors start executing
from the reset address. Note that if the reset start address is changed (via the interrupt controller) then
the processors must be started via the interrupt controller’s Processor boot register.
6.3 Cache system
6.3.1 Overview
The LEON4 processor pipeline implements a Harvard architecture with separate instruction and data
buses, connected to two separate cache controllers. As long as the execution does not cause a cache
miss, the cache controllers can serve one beat of an instruction fetch and one data load/store per cycle,
keeping the pipeline running at full speed.
On cache miss, the cache controller will assert a hold signal freezing the IU pipeline, and after delivering the data the hold signal is again lifted so execution continues. For accessing the bus, the cache
controllers share the same AHB connection to the on-chip bus. Certain parts of the MMU (table walk
logic) are also shared between the two caches.
Another important component included in the data cache is the write buffer, allowing stores to proceed in parallel to executing instructions.
Cacheability (memory areas that are cacheable) for both caches is described in section 6.7.2.
6.3.2 Cache operation
Each cache controller has two main memory blocks, the tag memory and the data memory. At each
address in the tag memory, a number of cache entries, ways, are stored for a certain set of possible
memory addresses. The data memory stores the data for the corresponding ways.
For each way, the tag memory contains the following information:
• Valid bits saying if the entry contains valid data or is free. Both caches have a single valid bit for
each cache line.
• The tag, all bits of the cached memory address that are not implied by the set.
• If MMU is enabled, the context ID of the cache entry.
• Check bits for detecting errors.
When a read from cache is performed, the tags and data for all cache ways of the corresponding set
are read out in parallel, the tags and valid bits are compared to the desired address and the matching
way is selected. In the hit case, this is all done in the same cycle to support the full execution rate of
the processor.
In the miss case, the cache will at first deliver incorrect data. However on the following cycle, a hold
signal will be asserted to prevent the processor from proceeding with that data. After the miss has
been processed, the correct data is injected into the pipeline using a memory data strobe (mds) signal,
and afterwards the hold signal can be released. If the missed address is cacheable, then the data read in
from the cache miss will be stored into the cache, possibly replacing one of the existing ways.
In the instruction streaming case, the processor pipeline is stepped one step for every received instruction. If the processor needs extra pipeline cycles to stretch a multi-cycle instruction or due to an interlock condition (see section 6.2), or if the processor jumps/branches away, then the instruction cache
will hold the pipe, fetch the remainder of the cache line, and the pipeline will then proceed normally.
Figure 4. Cache address mapping (4 KiB way, 32 bytes/line): Tag = bits 31:12, Index = bits 11:5,
Offset = bits 4:0
6.3.3 Address mapping
The addresses seen by the CPU are divided into tag, index and offset bits. The index is used to select
the set in the cache, therefore only a limited number of cache lines with the same index part can be
stored at one time in the cache. The tag is stored in the cache and compared upon read.
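The division into tag, index and offset can be written out in C for the L1 cache geometry given in this document (4 KiB per way, 32 bytes per line, hence bits 4:0 for the offset, bits 11:5 for the index and bits 31:12 for the tag). The structure and function names are hypothetical.

```c
#include <stdint.h>

/* The three fields of an address as seen by the L1 caches. */
struct l1_addr {
    uint32_t tag;     /* bits 31:12, stored in the tag memory     */
    uint32_t index;   /* bits 11:5, selects one of 128 sets       */
    uint32_t offset;  /* bits 4:0, byte within the 32-byte line   */
};

static inline struct l1_addr l1_split(uint32_t addr)
{
    struct l1_addr a;
    a.offset = addr & 0x1Fu;
    a.index  = (addr >> 5) & 0x7Fu;
    a.tag    = addr >> 12;
    return a;
}
```

Two addresses with the same index but different tags compete for the four ways of the same set.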
6.3.4 Data cache policy
The data cache employs a write-through policy, meaning that every store made on the CPU will
propagate, via the write buffer, to the bus and there are no “dirty” lines in the cache that have not yet been
written out apart from what is in the buffer. The store will also update the cache if the address is
present; if it is not, no new line is allocated.
Table 39. LEON4 Data caching behavior
Operation                  In cache   Cacheable   Bus action                      Cache action              Load data
Data load                  No         No          Read                            No change                 Bus
                           No         Yes         Read                            Line allocated/replaced   Bus
                           Yes        -           None                            No change                 Cache
Data load with forced      No         No          Read                            No change                 Bus
cache miss (ASI 1)         No         Yes         Read                            Line allocated/replaced   Bus
                           Yes        -           Read                            Data updated              Bus
Data load with MMU         -          -           Read (phys addr)                No change                 Bus
bypass (ASI 0x1C)
Data store                 No         No          Write (via buffer)              No change                 (N/A)
                           No         Yes         Write (via buffer)              No change                 (N/A)
                           Yes        -           Write (via buffer)              Data updated              (N/A)
Data store with MMU        -          -           Write (via buffer, phys addr)   No change                 (N/A)
bypass (ASI 0x1C)
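The ordinary "Data load" and "Data store" rows of table 39 can be expressed as a small decision function in C. This sketch covers only those rows (not the forced-miss or MMU-bypass variants), and all type and function names are hypothetical.

```c
#include <stdbool.h>

enum bus_action   { BUS_NONE, BUS_READ, BUS_WRITE_VIA_BUFFER };
enum cache_action { CACHE_NO_CHANGE, CACHE_LINE_ALLOCATED, CACHE_DATA_UPDATED };

struct dcache_result {
    enum bus_action   bus;
    enum cache_action cache;
};

/* Ordinary data load: a hit is served from the cache; a miss reads
 * from the bus and allocates a line only if the address is cacheable. */
static inline struct dcache_result dcache_load(bool in_cache, bool cacheable)
{
    struct dcache_result r;
    if (in_cache) {
        r.bus   = BUS_NONE;
        r.cache = CACHE_NO_CHANGE;
    } else {
        r.bus   = BUS_READ;
        r.cache = cacheable ? CACHE_LINE_ALLOCATED : CACHE_NO_CHANGE;
    }
    return r;
}

/* Ordinary data store: always written through via the write buffer;
 * the cache is updated on a hit and never allocates on a miss. */
static inline struct dcache_result dcache_store(bool in_cache)
{
    struct dcache_result r;
    r.bus   = BUS_WRITE_VIA_BUFFER;
    r.cache = in_cache ? CACHE_DATA_UPDATED : CACHE_NO_CHANGE;
    return r;
}
```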
6.3.5 Write buffer
The data cache contains a write buffer able to hold a single 8, 16, 32 or 64-bit write. For half-word or
byte stores, the stored data is replicated into proper byte alignment for writing to a word-addressed
device. The write is processed in the background so the system can keep executing while the write is
being processed. However, any following instruction that requires bus access will block until the write
buffer has been emptied. Loads served from cache will not block: due to the cache policy
used, there cannot be a mismatch between cache data and store buffer (the effect of this behavior on
SMP systems is discussed in section 6.7).
Since the processor executes in parallel with the write buffer, a write error will not cause an exception
to the store instruction. Depending on memory and cache activity, the write cycle may not occur until
several clock cycles after the store instruction has completed. If a write error occurs, the currently
executing instruction will take trap 0x2b. This trap can be disabled using the DWT configuration (see
section 6.10.3). See also the AMBA ERROR propagation description in section 5.10.
Note: a 0x2b trap handler should flush the data cache, since a write hit would update the cache while
the memory would keep the old value due to the write error.
6.3.6 Operating with MMU
When MMU is enabled, the virtual addresses seen by the running code no longer correspond directly
to the physical addresses on the AHB bus. The cache uses tags based on the virtual addresses, as this
avoids having to do any additional work to translate the address in the most timing-critical hit case.
However, any time a bus access needs to be made, a translation request has to be sent to the MMU to
convert the virtual address to a physical address. For the write buffer, this work is included in the
background processing of the store. The translation request to the MMU may result in memory
accesses from the MMU to perform table walk, depending on the state of the MMU.
The MMU context ID is included in the cache tags in order to allow switching between multiple
MMU contexts mapping the same virtual address to different physical addresses. Note that the cache
does not detect aliases to the same physical address so in that case the same physical address may be
cached in multiple ways (also see snooping below).
Note: The processor requires cacheable areas to support wide (128-bit) bus accesses. The MMU must
not be used to mark uncacheable areas (such as AMBA plug&play and PCI memory space) as cacheable,
since this will violate the requirements in section 6.7.3.
6.3.7Snooping
The data cache supports AHB bus snooping. The AHB bus the processor is connected to, is monitored
for writes from other masters to an address which is in the cache. If a write is done to a cached
address, that cache line is marked invalid and the processor will be forced to fetch the (new) data from
memory the next time it is read.
For using snooping together with the MMU, an extra tag memory storing physical tags is used to
allow comparing with the physical address on the AHB bus.
The processor can snoop on itself and invalidate any other cache lines aliased to the same physical
address in case there are multiple virtual mappings to the same physical address that is being written.
However, note that this does not happen until the write occurs on the bus so the other virtual aliases
will return the old data in the meantime.
6.3.8 Enabling and disabling cache
Both I and D caches are disabled after reset. They are enabled by writing to the cache control register
(see 6.10.6). Before enabling the caches after a reset they must be flushed to ensure that all tags are
marked invalid.
6.3.9 Cache freeze
Each cache can be in one of three modes: disabled, enabled and frozen. If disabled, no cache operation
is performed and load and store requests are passed directly to the memory controller. If enabled, the
cache operates as described above. In the frozen state, the cache is accessed and kept in sync with the
main memory as if it was enabled, but no new lines are allocated on read misses.
If the DF or IF bit is set, the corresponding cache will be frozen when an asynchronous interrupt is
taken. This can be beneficial in real-time systems to allow a more accurate calculation of worst-case
execution time for a code segment. The execution of the interrupt handler will not evict any cache
lines and when control is returned to the interrupted task, the cache state is identical to what it was
before the interrupt. If a cache has been frozen by an interrupt, it can only be enabled again by
enabling it in the CCR. This is typically done at the end of the interrupt handler before control is
returned to the interrupted task.
6.3.10 Flushing
Both instruction and data cache are flushed either by executing the FLUSH instruction, setting the FI/
FD bits in the cache control register, or by writing to certain ASI address spaces.
Cache flushing takes one clock cycle per cache set, during which the IU will not be halted, but during
which the caches are disabled. When the flush operation is completed, the cache will resume the state
(disabled, enabled or frozen) indicated in the cache control register. Diagnostic access to the cache is
not possible during a flush operation and will cause a data exception (trap=0x09) if attempted.
Note that while the SPARC V8 specifies only that the instructions pointed to by the FLUSH argument
will be flushed, the LEON4 will additionally flush the entire I and D cache (which is permitted by the
standard as the additional flushing only affects performance and not operation). While the LEON4
currently ignores the address argument, it is recommended for future compatibility to only use the
basic flush %g0 form if you want the full flush behavior.
6.3.11 Locking
Cache line locking is not supported by LEON4.
6.3.12 Diagnostic access
The cache tag and data contents can be directly accessed for diagnostics and for locking purposes via
various ASI:s, see section 6.9.5.
6.3.13 Local scratch pad RAM
Local scratch pad RAM is not supported by LEON4.
6.3.14 Fault tolerance support
The cache memories (tags and data) are protected against soft errors using byte-parity codes. On a
detected parity error, the corresponding cache (I or D) will be flushed and the data will be refetched
from external memory. This is done transparently to software execution.
6.4 Memory management unit
6.4.1 Overview
The memory-management unit is compatible with the SPARC V8 reference MMU (SRMMU) architecture described in the SPARC V8 manual, appendix H.
The MMU provides address translation of both instructions and data via page tables stored in memory. When needed, the MMU will automatically access the page tables to calculate the correct physical
address. The latest translations are stored in a special cache called the translation lookaside buffer
(TLB), also referred to as Page Descriptor Cache (PDC) in the SRMMU specification. The MMU also
provides access control, making it possible to “sandbox” unprivileged code and prevent it from accessing the rest
of the system.
6.4.2 MMU/Cache operation
When the MMU is disabled, the MMU is bypassed and the caches operate with physical address mapping. When the MMU is enabled, the cache tags store the virtual address and also include an 8-bit
context field. Both the tag address and context field must match to generate a cache hit. If cache
snooping is used, physical tags must be enabled for it to work when address translation is used, see
section 6.3.7.
Because the cache is virtually tagged, no extra clock cycles are needed in case of a cache load or
instruction cache hit. In case of a miss or write buffer processing, a translation is required, which might
add extra latency to the processing time, depending on whether there is a TLB miss. TLB lookup is done at
the same time as tag lookup and therefore adds no extra clock cycles.
If there is a TLB miss the page table must be traversed, resulting in up to four AMBA read accesses
and one possible writeback operation. See the SRMMU specification for the exact format of the page
table.
An MMU page fault will generate trap 0x09 for the D-cache and trap 0x01 for the I cache, and update
the MMU status registers according to table 40 and the SRMMU specification. In case of multiple
errors, the fault type values are prioritized as the SRMMU specification requires. The cache and
memory will not be modified on an MMU page fault.
Table 40. LEON4 MMU Fault Status Register, fault type values
Fault type  SPARC V8 ref               Priority  Condition
6           Internal error             1         Never issued by LEON SRMMU
4           Translation error          2         AHB error response while performing table walk. Translation errors as defined in SPARC V8 manual. A translation error caused by an AMBA ERROR response will overwrite all other errors. Other translation errors do not overwrite existing translation errors when FAV = 1.
1           Invalid address error      3         Page table entry for address was marked invalid
3           Privilege violation error  4         Access denied based on page table and su status (see SRMMU spec for how privilege and protection error are prioritized)
2           Protection error           5         Access denied based on page table and su status (see SRMMU spec for how privilege and protection error are prioritized)
0           None                       -         No error (inside trap this means the trap occurred when fetching the actual data)
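As an illustration of how a trap handler might interpret the fault type, the sketch below maps fault type values to the names and priorities listed in table 40. The position of the fault type field (bits 4:2 of the fault status register) is taken from the SRMMU specification rather than from this section, and the type and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: one row of table 40. */
struct mmu_fault_info {
    const char *name;   /* SPARC V8 reference name */
    int priority;       /* 1 = highest; 0 = no error, no priority */
};

/* Extract the fault type (FT) field. Bits 4:2 is an assumption
 * based on the SRMMU fault status register layout. */
static inline unsigned mmu_fsr_fault_type(uint32_t fsr)
{
    return (fsr >> 2) & 0x7u;
}

/* Look up a fault type value; returns NULL for values not in table 40. */
static const struct mmu_fault_info *mmu_fault_lookup(unsigned ft)
{
    static const struct mmu_fault_info table[8] = {
        [0] = { "None", 0 },
        [1] = { "Invalid address error", 3 },
        [2] = { "Protection error", 5 },
        [3] = { "Privilege violation error", 4 },
        [4] = { "Translation error", 2 },
        [6] = { "Internal error", 1 },
    };

    if (ft >= 8 || table[ft].name == NULL)
        return NULL;
    return &table[ft];
}
```

A handler would typically read the fault status register through ASI 0x19 and pass the value to mmu_fsr_fault_type() before the lookup.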
6.4.3 Translation look-aside buffer (TLB)
The MMU has separate TLBs for instructions and data. The number of TLB entries (for each implemented TLB) is 16. The organisation of the TLB and number of entries is not visible to the software
and thus does not require any modification to the operating system. The TLB can be flushed using an
STA instruction to ASI 0x18, see section 6.9.6.
6.5 Floating-point unit
The high-performance GRFPU operates on single- and double-precision operands, and implements all
SPARC V8 FPU operations including square root and division. The FPU is interfaced to the LEON4
pipeline using a LEON4-specific FPU controller (GRFPC) that allows FPU instructions to be executed simultaneously with integer instructions. Only in case of a data or resource dependency is the
integer pipeline held. The GRFPU is fully pipelined and allows the start of one instruction each clock
cycle, with the exception of FDIV and FSQRT, which can only be executed one at a time. The FDIV
and FSQRT are however executed in a separate divide unit and do not block the FPU from performing
all other operations in parallel.
All instructions except FDIV and FSQRT have a latency of three cycles, but to improve timing, the
LEON4 FPU controller inserts an extra pipeline stage in the result forwarding path. This results in a
latency of four clock cycles at instruction level. The table below shows the GRFPU instruction timing
when used together with GRFPC:
The GRFPC controller implements the SPARC deferred trap model, and the FPU trap queue (FQ) can
contain up to 7 queued instructions when an FPU exception is taken. The version field in %fsr has the
value of 2 to signal that the processor is implemented with the GRFPU.
The GRFPU does not handle denormalized numbers as inputs and will in that case cause an fp_exception with the FPU trap type set to unfinished_FPOP (tt=2). There is a non-standard mode in the FPU
that will instead replace the denormalized inputs with zero and thus never create this condition.
6.6 Co-processor interface
The coprocessor interface is unused and disabled in this device.
6.7 AMBA interface
6.7.1 Overview
The LEON4 processor has one AHB master interface. The types of AMBA accesses supported and
performed by the processor depend on the accessed memory area’s cachability, whether the corresponding
cache is enabled, and whether the accessed memory area has been marked as being on the wide bus.
Cacheable instructions are fetched with a burst of two 128-bit accesses.
The HPROT signals of the AHB bus are driven to indicate whether the access is instruction or data, and whether
it is a user or supervisor access.
Table 42. HPROT values
Type of access  User/Super  HPROT
Instruction     User        1100
Instruction     Super       1110
Data            User        1101
Data            Super       1111
MMU             Any         1101
In case of atomic accesses, a locked access will be made on the AMBA bus to guarantee atomicity as
seen from other masters on the bus.
6.7.2 Cachability
The processor treats the memory areas 0x00000000 - 0x7FFFFFFF and 0xC0000000 - 0xCFFFFFFF
as cacheable. The rest of the physical address space is treated as uncached.
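The address ranges above can be captured in a small predicate, which may be convenient when a driver has to decide whether LDD/STD or cached accesses are permitted for a given physical address. This is only a sketch; the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if the physical address lies in one of the two areas
 * that the processor treats as cacheable. */
static inline bool gr740_is_cacheable(uint32_t paddr)
{
    /* 0x00000000 - 0x7FFFFFFF */
    if (paddr <= 0x7FFFFFFFu)
        return true;

    /* 0xC0000000 - 0xCFFFFFFF */
    return paddr >= 0xC0000000u && paddr <= 0xCFFFFFFFu;
}
```

For example, gr740_is_cacheable(0xC0000000u) is true, while gr740_is_cacheable(0x80000000u) is false.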
6.7.3 AMBA access size
Cacheable data is fetched in a burst of 128-bit accesses. Data accesses to uncacheable areas may only be
done with 8-, 16- and 32-bit accesses, i.e. the LDD and STD instructions may not be used. If an area is
marked as cacheable then the data cache will automatically try to use 128-bit accesses. This means
that if 128-bit accesses are unwanted and a memory area is mapped as cacheable then software should
only perform data accesses with cache bypass (ASI 0x1C) and no 64-bit loads (LDD) when accessing
the slave. One example of how to use forced cache miss for loads is given by the following function:
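A minimal sketch of such a function, assuming GCC-style inline assembly on a SPARC target. It issues a 32-bit LDA with ASI 0x01 (forced cache miss, see table 43); the ASI number in the instruction could be changed to 0x1C for MMU and cache bypass. The function name and the fallback path for non-SPARC hosts are illustrative assumptions, and ASIs below 0x80 are expected to require supervisor mode.

```c
#include <stdint.h>

/* Hypothetical helper: load one 32-bit word with a forced cache miss. */
static inline uint32_t load_forced_miss(const volatile uint32_t *addr)
{
#if defined(__sparc__)
    uint32_t tmp;

    /* LDA with an immediate ASI of 1 (forced cache miss). */
    __asm__ volatile ("lda [%1] 1, %0"
                      : "=r"(tmp)
                      : "r"(addr)
                      : "memory");
    return tmp;
#else
    /* Assumption, for host-side testing only: an ordinary volatile
     * load, which gives no control over the cache. */
    return *addr;
#endif
}
```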
In the GR740 device, this may primarily be of interest when accessing the PROM area (base address
at 0xC0000000) and possibly also for using the processor to test word and sub-word accesses to the
Level-2 cache and memory controller (memory area 0x00000000 - 0x7FFFFFFF).
The processor only supports using wide accesses to memory areas that are marked as cached. This
means that LDD shall not be used for peripheral register areas.
Store instructions result in an AMBA access with size corresponding to the executed instruction. 64-bit
store instructions (STD) are always translated to 64-bit accesses (never converted into two 32-bit
stores as is done for LEON3). The table below indicates the access types used for instruction and data
accesses depending on cachability and cache configuration.
Processor operation: Instruction fetch
  Area not cacheable (1): Burst of 32-bit read accesses
  Area is cacheable (1): Burst of 128-bit accesses

Processor operation: Data load <= 32-bit
  Area not cacheable (1): Read access with size specified by load instruction
  Area is cacheable (1), cache enabled (2): Burst of 128-bit accesses. Single accesses can be performed via ASI 0x1C.
  Area is cacheable (1), cache disabled: Read access with size specified by load instruction

Processor operation: Data load 64-bit (LDD)
  Area not cacheable (1): Illegal (3). Single 64-bit access will be performed.
  Area is cacheable (1), cache enabled (2): Burst of 128-bit accesses
  Area is cacheable (1), cache disabled: Single 64-bit read access

Processor operation: Data store <= 32-bit
  All areas: Store access with size specified by store instruction.

Processor operation: Data store 64-bit (STD)
  Area not cacheable (1): Illegal (64-bit store to 32-bit area). 64-bit store access will be performed.
  Area is cacheable (1): 64-bit store access

(1) Cached memory regions are 0x00000000 - 0x7FFFFFFF and 0xC0000000 - 0xCFFFFFFF.
(2) Bus accesses for reads will only be made on L1 cache miss or on load with forced cache miss.
(3) Data accesses to uncached areas may only be done with 8-, 16- and 32-bit accesses.
6.7.4 Error handling
An AHB ERROR response received while fetching instructions will normally cause an instruction
access exception (tt=0x1). However if this occurs during streaming on an address that is not needed,
the I cache controller will just not set the corresponding valid bit in the cache tag. If the IU later
fetches an instruction from the failed address, a cache miss will occur, triggering a new access to the
failed address.
An AHB ERROR response while fetching data into the data cache will normally trigger a data_access_exception trap (tt=0x9). If the error was for a part of the cache line other than what was currently
being requested by the pipeline, a trap is not generated and the valid bit for that line is not set.
An ERROR response during an MMU table walk will lead the MMU to set the fault type to Internal
error (1) and generate an instruction or data access exception, depending on which type of access that
caused the table walk.
For store operations, see also the AMBA ERROR propagation description in section 5.10.
6.8 Multi-processor system support
This section gives an overview of issues when using the LEON4 in multi-processor configuration.
6.8.1 Start-up
Only the first processor will start after reset, assuming that the BREAK bootstrap signal is low, and all
other processors will remain halted in power-down mode. After the system has been initialized, the
remaining processors can be started by writing to the ‘multiprocessor status register’, located in the
multiprocessor interrupt controller. The halted processors start executing from the reset address (see
section 6.2.18).
An application in a multiprocessor system can determine which processor it is executing on by checking the processor index field in the LEON4 configuration register (%asr17). As all processors typically have the same reset start address value, boot software must check the processor index and
perform processor specific setup (e.g. initialization of stack pointer) based on the value of the processor index.
It is only possible for a processor to wake other processors up via the ‘multiprocessor status register’.
Once a processor is running it cannot be reset via the interrupt controller. If software detects that one
processor is unresponsive and needs to restart the processor then the full system should be reset, for
example by triggering the system’s watchdog. In order for software to monitor that all processors in a
system are up and running it is recommended to implement a heartbeat mechanism in software.
6.8.2 Shared memory model
Each processor core has its own separate AHB master interface and the AHB controller will arbitrate
between them to share access to the on-chip bus.
If caches are not used, the processors will form a sequentially consistent (SC) system, where every
processor will execute its loads, stores and atomics to memory in program order on the AHB bus and
the different processors’ operations will be interleaved in some order through the AHB arbitration. The
shared memory controller AHB slave is assumed to not reorder accesses so a read always returns the
latest written value to that location on the bus.
When using caches with snooping (and with physical tags if using the MMU), the shared memory will
act according to the slightly weaker SPARC Total Store Order (TSO) model. The TSO model is close
to SC, except that loads may be reordered before stores coming from the same CPU. The stores and
atomics are conceptually placed in a FIFO (see the diagrams in the SPARC standard) and the loads are
allowed to bypass the FIFO if they are not to the same address as the stores. Loaded data from other
addresses may therefore be either older or newer, with respect to the global memory order, than the
stores that have been performed by the same CPU.
In the LEON4 case this happens because cache hits are served without blocking even when there is
data in the write buffer. The loaded data will always return the stored data in case of reading the same
address, because if it is cached, the store updates the cache before being put in the write buffer, and if
it was not in cache then the load will result in a miss which waits for the write buffer to complete.
Loaded data from a different address can be older than the store if it is served by cache before the
write has completed, or newer if it results in a cache miss or if there is a long enough delay for the
store to propagate to memory before reading.
See relevant literature on shared memory systems for more information. These details are mainly of
concern for complex applications using lock-free data structures, such as the Linux kernel. The recommendation for applications is instead to avoid concurrent access to shared structures by using
mutexes/semaphores based on atomic instructions, or to use message passing schemes with one-directional circular buffers.
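As a sketch of the mutex approach, the following spinlock is built on the C11 atomic test-and-set primitive. It assumes a toolchain that provides <stdatomic.h> and maps atomic_flag operations onto the SPARC atomic instructions; that mapping, and all names below, are assumptions rather than something specified in this document.

```c
#include <stdatomic.h>

/* Hypothetical minimal spinlock built on an atomic test-and-set. */
typedef struct {
    atomic_flag locked;
} gr_spinlock_t;

#define GR_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* Try to take the lock once; returns 1 on success, 0 if already held. */
static inline int gr_spin_trylock(gr_spinlock_t *lock)
{
    /* test_and_set returns the previous value: false means we got it. */
    return !atomic_flag_test_and_set_explicit(&lock->locked,
                                              memory_order_acquire);
}

/* Busy-wait until the lock has been taken. */
static inline void gr_spin_lock(gr_spinlock_t *lock)
{
    while (!gr_spin_trylock(lock)) {
        /* spin */
    }
}

/* Release the lock so that another processor can take it. */
static inline void gr_spin_unlock(gr_spinlock_t *lock)
{
    atomic_flag_clear_explicit(&lock->locked, memory_order_release);
}
```

Shared structures are then only touched between gr_spin_lock() and gr_spin_unlock(), so the load/store reordering described above is not observable by the application.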
6.8.3 Memory-mapped hardware
Hardware resources (peripheral registers) are memory mapped on uncacheable address spaces. They
will be accessible from all the CPU:s in a sequentially consistent manner. Since software drivers usually expect to be “alone” accessing the peripheral and the peripheral’s register interfaces are not
designed for concurrent use by multiple masters, using a bare-C application designed for single-processor usage on multiple cores at the same time will generally not work. This can be solved by partitioning the applications so that each peripheral is only accessed by one of the CPU:s. This partitioning
also needs to be done for the interrupts so that the peripheral’s interrupts will be received by the correct processor.
6.9 ASI assignments
6.9.1 Summary
The table shows the ASI usage for LEON.
Table 43. ASI usage
ASI                     Usage
0x01                    Forced cache miss.
0x02                    System control registers (cache control register)
0x08, 0x09, 0x0A, 0x0B  Normal cached access (replace if cacheable)
0x0C                    Instruction cache tags
0x0D                    Instruction cache data
0x0E                    Data cache tags
0x0F                    Data cache data
0x10                    Flush instruction cache (and also data cache when system is implemented with MMU)
0x11                    Flush data cache
0x13                    MMU only: Flush instruction and data cache
0x14                    MMU only: MMU diagnostic D context cache access (deprecated, do not use in new SW applications)
0x15                    MMU only: MMU diagnostic I cache context access (deprecated, do not use in new SW applications)
0x18                    MMU only: Flush TLB and I/D cache
0x19                    MMU only: MMU registers
0x1C                    MMU only: MMU and cache bypass
0x1D                    MMU only: MMU diagnostic access (deprecated, do not use in new SW applications)
0x1E                    MMU only: MMU snoop tags diagnostic access
The processor implements SPARC V8E nonprivileged ASI access: accesses to ASI 0x80 - 0xFF do
not require supervisor privileges. No registers are mapped at ASI 0x80 - 0xFF and the instructions
used to access these areas can be used as trace points for software tracing. Trace filtering (see section
33.4) allows filtering of these instructions.
6.9.2 ASI 0x1, Forced cache miss
ASI 1 is used for systems without cache coherency, to load data that may have changed in the background, for example by DMA units. It can also be used for other reasons, for example diagnostic purposes, to force an AHB load from memory regardless of cache state.
The address mapping of this ASI is matched with the regular address space, and if MMU is enabled
then the address will be translated normally. Stores to this ASI will perform the same way as ordinary
data stores.
For situations where you want to guarantee that the cache is not modified by the access, the MMU and
cache bypass ASI, 0x1C, can be used instead. However this is only available when MMU is implemented.
6.9.3 ASI 0x2, System control registers
ASI 2 contains a few control registers that have not been assigned as ancillary state registers. These
should only be read and written using 32-bit LDA/STA instructions.
All cache registers are accessed through load/store operations to the alternate address space (LDA/
STA), using ASI = 2. The table below shows the register addresses:
Table 44. ASI 2 (system registers) address map
Address  Register
0x00     Cache control register
0x04     Reserved
0x08     Instruction cache configuration register
0x0C     Data cache configuration register
6.9.4 ASI 0x8-0xB, Data/Instruction
These ASIs are assigned by the SPARC standard for normal data and instruction fetches.
Accessing the instruction ASIs explicitly via LDA/STA instructions is not supported in the LEON4
implementation. Using LDA/STA with the user/supervisor data ASI will affect the
HPROT signal emitted by the processor according to section 6.7.1, but MMU access control will still
be done according to the super-user state of the %psr register.
6.9.5 ASI 0xC-0xF, Cache diagnostic access
ASI 0xC-0xF provide diagnostic access to the instruction and data cache memories. These ASIs should only be
accessed by 32-bit LDA/STA instructions. These ASIs cannot be used while a cache flush is in progress.
The same address bits used normally as index are used to index the cache also in the diagnostic
access. For a multi-way cache, the lowest bits above the index part, the lowest bits that would normally be used as tag, are used to select which way to read/write. The remaining address bits are don’t
cares, leading the address map to wrap around.
The tag parity and context bits can also be read out through these ASIs by setting the PS bit in the
cache configuration register. When this bit is set, the parity data is read instead of the ordinary data.
When writing the tag bits, the context bits will always be written with the current context in the MMU
control register. The parity to be written is calculated based on the supplied write value and the context
ID in the MMU control register. The parity bits can be modified via the TB field in the cache control
register.
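Figure 5. ASI 0xC-0xF address mapping and data layout
Example for 4 KiB way, 32 bytes/line, 4 ways.
Address, data diagnostic ASIs (ASI 0xD, 0xF): offset in bits 4:0, index in bits 11:5, way in the lowest bits above the index, remaining bits don’t care.
Address, tag diagnostic ASIs (ASI 0xC, 0xE): bits 4:0 don’t care, index in bits 11:5, way in the lowest bits above the index, remaining bits don’t care.
Data, tag diagnostic ASIs: ATAG in bits 31:10, “00” in bits 9:8, VALID in bits 7:0.
Data, data diagnostic ASIs: cached data word in bits 31:0.
Parity, tag diagnostic ASIs: CTXID in bits 23:16, TPAR in bits 3:0, remaining bits reserved.
Parity, data diagnostic ASIs: DPAR in bits 3:0, remaining bits reserved.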
Field Definitions:
• Address Tag (ATAG) - Contains the tag address of the cache line.
• Valid (V) - When set, the cache line contains valid data. The LEON4 caches only have one valid
bit per cache line which is replicated for the whole 8-bit diagnostic field to keep software backward compatibility.
• CTXID - Context ID, used when MMU is enabled
• TPAR - Byte-wise parity of tag bits, context ID parity is XOR:ed into bit 3.
• DPAR - Byte-wise parity of data bits
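The addressing rule for the diagnostic ASIs can be sketched as follows for the Figure 5 example geometry (4 KiB way, 32 bytes/line, 4 ways). The way number is placed in the lowest bits above the index, as described in the text; the macro and function names are hypothetical, and the resulting address is assumed to be used with LDA/STA on ASI 0xC-0xF.

```c
#include <stdint.h>

/* Geometry of the Figure 5 example. */
#define DIAG_LINE_BYTES 32u
#define DIAG_WAY_BYTES  4096u
#define DIAG_NUM_WAYS   4u

/* Build a diagnostic address: offset in the low bits, index above it,
 * and the way number in the lowest bits above the index. */
static inline uint32_t cache_diag_addr(unsigned way, unsigned index,
                                       unsigned offset)
{
    return ((way % DIAG_NUM_WAYS) * DIAG_WAY_BYTES)
         | ((index % (DIAG_WAY_BYTES / DIAG_LINE_BYTES)) * DIAG_LINE_BYTES)
         | (offset % DIAG_LINE_BYTES);
}

/* Tag word: ATAG occupies bits 31:10. */
static inline uint32_t cache_tag_atag(uint32_t tag)
{
    return tag >> 10;
}

/* Tag word: the single valid bit is replicated over bits 7:0. */
static inline int cache_tag_valid(uint32_t tag)
{
    return (tag & 0xFFu) != 0;
}
```

For example, line 127 of way 3 at byte offset 28 maps to diagnostic address 0x3FFC.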
6.9.6 ASI 0x10, 0x11, 0x13, 0x18 - Flush
For historical reasons there are multiple ASIs that flush the cache in different ways.
Writing to ASI 0x10 will flush the entire instruction cache. If MMU is implemented in the core, both
instruction and data cache will be flushed.
Writing to ASI 0x11 will flush the data cache only.
Writing to ASI 0x13 will flush the instruction cache and data cache. Only available when MMU is
implemented.
Writing to ASI 0x18, which is available only if MMU is implemented, will flush the MMU TLB,
the I-cache, and the D-cache. This will block execution for a few cycles while the TLB is flushed and
then continue asynchronously with the cache flushes continuing in the background.
Figure 6. Snoop cache tag layout
6.9.7 ASI 0x19 - MMU registers
This ASI provides access to the MMU:s control and status registers. The following MMU registers
are implemented:
Table 45. MMU registers (ASI = 0x19)
Address  Register
0x000    MMU control register
0x100    Context pointer register
0x200    Context register
0x300    Fault status register
0x400    Fault address register
6.9.8 ASI 0x1C - MMU and cache bypass
Performing an access via ASI 0x1C will act as if MMU and cache were disabled. The address will not
be translated and the cache will not be used or updated by the access.
6.9.9 ASI 0x1E - MMU snoop tags diagnostic access
If the MMU has been configured to use separate snoop tags, they can be accessed via ASI 0x1E. This
is primarily useful for RAM testing, and should not be performed during normal operation. This ASI
is addressed the same way as the regular diagnostic ASI:s 0xC, 0xE, and the read/written data has the
layout as shown below:
[31:10]: Address tag. The physical address tag of the cache line.
[1]: Parity. The odd parity over the data tag. Only used when processor is implemented with fault-tolerance features.
[0]: Invalid. When set, the cache line is not valid and will cause a cache miss if accessed by the processor. Only present if fast snooping is enabled.
6.10 Configuration registers
6.10.1 PSR, WIM, TBR registers
The %psr, %wim, %tbr registers are implemented as required by the SPARC V8 standard.
Table 46. %psr - Processor state register
Bits    Field     Reset     Access
31:28   IMPL      0b1111    r
27:24   VER       0b0011    r
23:20   ICC       0         r
19:14   RESERVED  0b000000  r
13      EC        0         r
12      EF        0         rw
11:8    PIL       0x0       rw
7       S         1         rw
6       PS        1         rw
5       ET        0         rw
4:0     CWP       0b00000   rw
31: 28 Implementation ID (IMPL), read-only hardwired to “1111” (15)
27: 24 Implementation version (VER), read-only hardwired to “0011” (3) for LEON3/LEON4.
23: 20 Integer condition codes (ICC), see the SPARC V8 manual for details
19: 14 Reserved
13 Enable coprocessor (EC) - read-only
12 Enable floating-point (EF)
11: 8 Processor interrupt level (PIL) - controls the lowest IRQ number that can generate a trap
7 Supervisor (S)
6 Previous supervisor (PS), see SPARC V8 manual for details
5 Enable traps (ET)
4: 0 Current window pointer (CWP)
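The %psr field positions listed above can be decoded with plain shifts and masks. The sketch below assumes the register value has already been read (for example with rd %psr); the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical container for the %psr fields of table 46. */
struct psr_fields {
    unsigned impl;  /* bits 31:28 */
    unsigned ver;   /* bits 27:24 */
    unsigned icc;   /* bits 23:20 */
    unsigned ec;    /* bit 13 */
    unsigned ef;    /* bit 12 */
    unsigned pil;   /* bits 11:8 */
    unsigned s;     /* bit 7 */
    unsigned ps;    /* bit 6 */
    unsigned et;    /* bit 5 */
    unsigned cwp;   /* bits 4:0 */
};

/* Split a raw %psr value into its fields. */
static inline struct psr_fields psr_decode(uint32_t psr)
{
    struct psr_fields f;

    f.impl = (psr >> 28) & 0xFu;
    f.ver  = (psr >> 24) & 0xFu;
    f.icc  = (psr >> 20) & 0xFu;
    f.ec   = (psr >> 13) & 0x1u;
    f.ef   = (psr >> 12) & 0x1u;
    f.pil  = (psr >> 8)  & 0xFu;
    f.s    = (psr >> 7)  & 0x1u;
    f.ps   = (psr >> 6)  & 0x1u;
    f.et   = (psr >> 5)  & 0x1u;
    f.cwp  = psr & 0x1Fu;
    return f;
}
```

With IMPL = 15, VER = 3, S = PS = 1 and all other fields zero, the raw value is 0xF30000C0.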
Table 47. %wim - Window Invalid Mask
Bits    Field     Reset  Access
31:8    RESERVED  0      r
7:0     WIM       NR     rw
31: 8 RESERVED
7: 0 Window Invalid Mask (WIM)
Table 48. %tbr - Trap Base Register
Bits    Field  Access
31:12   TBA    rw
11:4    TT     r
3:0     R      r
Reset value of TBA: Taken from interrupt controller. Default is 0xC000000
31: 12 Trap base address (TBA) - Top 20 bits used for trap table address
11: 4 Trap type (TT) - Last taken trap type.
3: 0 RESERVED
3: 2 FP register file correctable error (FPCE) - Flag set when a correctable error has been detected in the
FP register file. Bit 1 flags uneven registers and bit 0 flags even registers.
1: 0 IU register file correctable error (IUCE) - Flag set when a correctable error has been detected in the
IU register file. Bit 1 flags uneven registers and bit 0 flags even registers.
6.10.3 ASR17, LEON4 configuration register
The ancillary state register 17 (%asr17) provides information on how the LEON4 implementation has
been configured. This can be used to enhance the performance of software, or to support enumeration
in multi-processor systems. There are also a few bits that are writable to configure certain aspects of
the processor.
31: 28 Processor index (INDEX) - Each LEON core gets a unique index to support enumeration. The processors are numbered 0 - 3.
27 Disable Branch Prediction (DBP) - Disables branch prediction when set to ‘1’.
26 Reserved field (R1) - This field must always be written with ’0’.
25 Disable Branch Prediction on instruction cache misses (DBPM) - When set to ‘1’ this avoids instruction cache fetches (and possible MMU table walk) for predicted instructions that may be annulled.
24 Reserved field (R2) - This field must always be written with ’0’.
23: 18 Reserved for future implementations
17 Clock switching (CS) - This field is 0 to signify that this implementation does not support clock switching.
16: 15 CPU clock frequency (CF) - This field is 0 to signify that the CPU runs at the same frequency as the AMBA bus.
14 Disable write error trap (DWT) - When set, a write error trap (tt = 0x2b) will be ignored. Set to zero after reset.
13 Single-vector trapping (SVT) enable - If set, will enable single-vector trapping.
12 Load delay (LD) - 0 to signify that a 1-cycle load delay is used.
11: 10 FPU option (FPU) - “01” = GRFPU.
9 Multiply and accumulate (M) - 0 to signify that (MAC) instructions are unsupported.
8 SPARC V8 (V8) - Set to 1, to signify that the SPARC V8 multiply and divide instructions are available.
7: 5 Number of implemented watchpoints (NWP) - Value is 4.
4: 0 Number of register windows (NWIN) - Number of implemented registers windows corresponds to NWIN+1. Field has value 7.
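Boot code that enumerates processors typically needs only a few of these fields. The sketch below decodes a raw %asr17 value that is assumed to have been read elsewhere (for example with rd %asr17); the function names are hypothetical.

```c
#include <stdint.h>

/* Processor index, bits 31:28. */
static inline unsigned asr17_index(uint32_t v)
{
    return v >> 28;
}

/* Number of register windows is NWIN+1, NWIN in bits 4:0. */
static inline unsigned asr17_num_windows(uint32_t v)
{
    return (v & 0x1Fu) + 1u;
}

/* Number of implemented watchpoints, bits 7:5. */
static inline unsigned asr17_num_watchpoints(uint32_t v)
{
    return (v >> 5) & 0x7u;
}

/* FPU option in bits 11:10, "01" = GRFPU. */
static inline int asr17_has_grfpu(uint32_t v)
{
    return ((v >> 10) & 0x3u) == 0x1u;
}

/* DWT, bit 14: non-zero when the write error trap is disabled. */
static inline int asr17_write_trap_disabled(uint32_t v)
{
    return (v >> 14) & 0x1u;
}
```

A value of 0x20000587, for instance, decodes to processor index 2, 8 register windows, 4 watchpoints and a GRFPU.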
6.10.4 ASR22-23 - Up-counter
The ancillary state registers 22 and 23 (%ASR22-23) contain an internal up-counter that can be read by
software without causing any access on the on-chip bus. The number of available bits in the counter is
56 and corresponds to the DSU time tag counter. %ASR23 contains the least significant part of the
counter value and %ASR22 contains the most significant part.
The time tag value accessible in these registers is the same time tag value used for the system’s trace
buffers and for all processors. The time tag counter will increment when any of the trace buffers is
enabled, or when the time tag counter is forced to be enabled via the DSU register interface, or when
any processor has its %ASR22 Disable Up-counter (DUCNT) field set to zero. It is possible to control
if the time tag counter should increment when the processors enter debug mode, this is configured via
DSU AHB trace buffer control register’s TE and DF fields, see section 33.6.7.
The up-counter value will increment even if all processors have entered power-down mode.
Table 51. %asr22 - LEON4 Up-counter MSbs
Bits    Field         Reset  Access
31      DUCNT         1      rw
30:24   RESERVED      0      r
23:0    UPCNT(55:32)  *      r
31 Disable Up-counter (DUCNT) - Disable upcounter. When set to ‘1’ the up-counter may be disabled.
The value for the up-counter in each processor is taken from a shared timer. The shared counter is
stopped when all processors have DUCNT set to one and the time tag counter in the DSU is disabled.
When cleared, the counter will increment each processor clock cycle. Default (reset) value is ‘1’.
30: 24 RESERVED
23: 0 Counter value (UPCNT(55:32)) - Most significant bits of internal up-counter. Counter is reset to 0 at
reset but may start counting due to conditions described for the DUCNT field.
Table 52. %asr23 - LEON4 Up-counter LSbs
Bits    Field        Reset  Access
31:0    UPCNT(31:0)  *      r
31: 0 Counter value (UPCNT(31:0)) - Least significant bits of internal up-counter. Counter is reset to 0 at
reset but may start counting due to conditions described for the DUCNT field in %asr22.
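The two register values can be combined into the 56-bit counter value as sketched below. The raw values are assumed to have been read with rd %asr22 and rd %asr23; the function names are hypothetical.

```c
#include <stdint.h>

/* Combine %asr22 (UPCNT(55:32) in bits 23:0) and %asr23 (UPCNT(31:0))
 * into one 56-bit counter value. */
static inline uint64_t upcounter_value(uint32_t asr22, uint32_t asr23)
{
    return ((uint64_t)(asr22 & 0x00FFFFFFu) << 32) | asr23;
}

/* DUCNT, bit 31 of %asr22: non-zero when this processor has its
 * up-counter disable bit set. */
static inline int upcounter_disabled(uint32_t asr22)
{
    return (asr22 >> 31) & 0x1u;
}
```

Since the two registers are read by separate instructions, the low word may wrap between the reads. A common precaution (an assumption here, not a requirement stated in this document) is to read %asr22 again afterwards and retry if the upper part has changed.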
Each breakpoint consists of a pair of ancillary state registers (%asr24/25, %asr26/27, %asr28/29 and
%asr30/31); one with the break address and one with a mask:
31: 2 Watchpoint mask (WMASK) - Bit mask controlling which bits to check (1) or ignore (0) for match
1 Break on data load (DL) - Break on data load from the specified address/mask combination
0 Break on data store (DS) - Break on data store to the specified address/mask combination
Note: Setting IF=DL=DS=0 disables the breakpoint.
When there is a hardware watchpoint match and DL or DS is set then trap 0x0B will be generated.
Hardware watchpoints can be used with or without the LEON4 debug support unit (DSU) enabled.
6.10.6 Cache control register
The cache control register located at ASI 0x2, offset 0, contains control and status registers for the I
and D cache.
Table 55. ASI 0x2, 0x00 - CCR - Cache control register
15  Instruction cache flush pending (IP). This bit is set when an instruction cache flush operation is in
progress.
14  Data cache flush pending (DP). This bit is set when a data cache flush operation is in progress.
13:12  Instruction Tag Errors (ITE) - Number of detected parity errors in the instruction tag cache.
11:10  Instruction Data Errors (IDE) - Number of detected parity errors in the instruction data cache.
9:8  Data Tag Errors (DTE) - Number of detected parity errors in the data tag cache.
7:6  Data Data Errors (DDE) - Number of detected parity errors in the data data cache.
5  Data Cache Freeze on Interrupt (DF) - If set, the data cache will automatically be frozen when an
asynchronous interrupt is taken.
4  Instruction Cache Freeze on Interrupt (IF) - If set, the instruction cache will automatically be frozen
when an asynchronous interrupt is taken.
3:2  Data Cache state (DCS) - Indicates the current data cache state according to the following: X0 =
disabled, 01 = frozen, 11 = enabled.
1:0  Instruction Cache state (ICS) - Indicates the current instruction cache state according to the following: X0 =
disabled, 01 = frozen, 11 = enabled.
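As an illustration of the field layout, the following Python sketch decodes bits 15:0 of a CCR value into the fields listed above. It is a model for host-side tooling or log analysis, not device code; the function name and dictionary keys are illustrative, and bits above 15 are not decoded.

```python
# X0 = disabled means both 00 and 10 decode as "disabled".
CACHE_STATE = {0b00: "disabled", 0b10: "disabled", 0b01: "frozen", 0b11: "enabled"}


def decode_ccr(ccr):
    """Decode bits 15:0 of the cache control register into named fields."""
    return {
        "IP": (ccr >> 15) & 0x1,    # instruction cache flush pending
        "DP": (ccr >> 14) & 0x1,    # data cache flush pending
        "ITE": (ccr >> 12) & 0x3,   # instruction tag parity errors
        "IDE": (ccr >> 10) & 0x3,   # instruction data parity errors
        "DTE": (ccr >> 8) & 0x3,    # data tag parity errors
        "DDE": (ccr >> 6) & 0x3,    # data data parity errors
        "DF": (ccr >> 5) & 0x1,     # data cache freeze on interrupt
        "IF": (ccr >> 4) & 0x1,     # instruction cache freeze on interrupt
        "DCS": CACHE_STATE[(ccr >> 2) & 0x3],
        "ICS": CACHE_STATE[ccr & 0x3],
    }
```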
6.10.7 I-cache and D-cache configuration registers
The configuration of the two caches is defined in two registers: the instruction and data configuration
registers. These registers are read-only, except for the REPL field that can be written, and indicate the
size and configuration of the caches. They are located under ASI 2 at offset 8 and 12.
Table 56. ASI 0x2, 0x08 and 0x0C - CCFG - Cache configuration registers
31:28  MMU Implementation ID (IMPL) - Hardcoded to “0000”
27:24  MMU Version ID (VER) - Hardcoded to “0000”.
23:21  Number of ITLB entries (ITLB) - The number of ITLB entries is calculated as 2^ITLB
20:18  Number of DTLB entries (DTLB) - The number of DTLB entries is calculated as 2^DTLB
17:16  RESERVED
15  TLB disable (TD) - When set to 1, the TLB will be disabled and each data access will generate an
MMU page table walk. The TLB should not be disabled on GR740 silicon revision 0, see section
43.2.9.
14  Separate TLB (ST) - This bit is set to 1 to signify that separate instruction and data TLBs are implemented
13:8  RESERVED
7  Partial Store Ordering (PSO) - This field is writable but does not have an effect on processor operation.
6:2  RESERVED
1  No Fault (NF) - When NF= 0, any fault detected by the MMU causes FSR and FAR to be updated
and causes a fault to be generated to the processor. When NF= 1, a fault on an access to ASI 9 is handled as when NF= 0; a fault on an access to any other ASI causes FSR and FAR to be updated but no
fault is generated to the processor.
6.10.9 MMU context pointer and context registers
The MMU context pointer register is located in ASI 0x19 offset 0x100 and the MMU context register
is located in ASI 0x19 offset 0x200. They together determine the location of the root page table
descriptor for the current context. Their definitions follow the SRMMU specification in the SPARC V8
manual with layouts shown below.
In the LEON4, the context bits are ORed with the lower MMU context pointer bits when calculating
the address, so one can use fewer context bits to reduce the size/alignment requirements for the context
table.
6.10.10 MMU fault status register
The MMU fault status register is located in ASI 0x19 offset 0x300, and the definition is based on the
SRMMU specification in the SPARC V8 manual. SPARC V8 specifies that the fault status register should be cleared on read, but on the LEON4 only the FAV bit is cleared on read. The FAV bit is
always set on error in the LEON4 implementation, so it can be used as a valid bit for the other fields.
Table 60. ASI 0x19, offset 0x300 - FSR - MMU Fault Status Register
Bits     31:18      17:10   9:8   7:5   4:2   1     0
Field    RESERVED   EBE     L     AT    FT    FAV   OW
Reset    0          0       0     0     0     0     0
Access   r          r       r     r     r     r     r
31:18  RESERVED
17:10  External bus error (EBE) - Never set on the LEON4
9:8  Level (L) - Level of page table entry causing the fault
7:5  Access type (AT) - See V8 standard
4:2  Fault type (FT) - See table 40.
1  Fault address valid (FAV) - Cleared on read, always written to 1 on fault
0  Overwrite (OW) - Multiple faults of the same priority encountered
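Because only FAV is cleared on read, a trap handler would typically read the FSR once and use FAV to decide whether the other fields are meaningful. The Python sketch below models that decoding; the function names are illustrative and the input is assumed to be a value already read from ASI 0x19 offset 0x300.

```python
def decode_mmu_fsr(fsr):
    """Split an MMU fault status register value into its fields."""
    return {
        "EBE": (fsr >> 10) & 0xFF,  # external bus error, never set on LEON4
        "L": (fsr >> 8) & 0x3,      # level of the faulting page table entry
        "AT": (fsr >> 5) & 0x7,     # access type
        "FT": (fsr >> 2) & 0x7,     # fault type
        "FAV": (fsr >> 1) & 0x1,    # fault address valid
        "OW": fsr & 0x1,            # overwrite
    }


def fault_info(fsr):
    """Return the decoded fields, or None when FAV shows no fault is recorded."""
    fields = decode_mmu_fsr(fsr)
    return fields if fields["FAV"] else None
```

Since the read itself clears FAV, the value should be captured once and passed around rather than read from the register a second time.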
6.10.11 MMU fault address register
The MMU fault address register is located in ASI 0x19 offset 0x400, and the definition follows the
SRMMU specification in the SPARC V8 manual.
Table 61. ASI 0x19, offset 0x400 - FAR - MMU Fault Address Register
Bits     31:12           11:0
Field    FAULT ADDRESS   RESERVED
Reset    NR              0
Access   r               r
31:12  Fault Address (FAULT ADDRESS) - Top bits of virtual address causing translation fault
11:0  RESERVED
6.11 Software considerations
6.11.1 Register file initialization on power up
It is recommended that the boot code for the processor writes all registers in the IU and FPU register
files before launching the main application. This allows software to be portable to both FT and non-FT versions of the LEON3 and LEON4 processors.
6.11.2 Start-up
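After reset, the caches are disabled and the cache control register (CCR) is 0. Before the caches may
be enabled, a flush operation must be performed to initialize (clear) the tags and valid bits. A suitable
assembly sequence could be:
    flush                   ! flush the caches to clear the tags and valid bits
    set 0x81000f, %g1       ! CCR value: ICS = DCS = 11 (enabled); bits 16 and 23 are also set
    sta %g1, [%g0] 2        ! write the value to the CCR (ASI 0x2, offset 0)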
6.11.3 Data scrubbing
There is generally no need to perform data scrubbing on either IU/FPU register files or the cache
memory. During normal operation, the active part of the IU/FPU register files will be flushed to memory on each task switch. This will cause all registers to be checked and corrected if necessary. Since
most real-time operating systems perform several task switches per second, the data in the register
files will be frequently refreshed.
A similar situation arises for the cache memory. In most applications, the cache memory is significantly smaller than the full application image, and the cache contents are gradually replaced as part of
normal operation. For very small programs, the only risk of error build-up is if a part of the application is resident in the cache but not executed for a long period of time. In such cases, executing a
cache flush instruction periodically (e.g. once per minute) is sufficient to refresh the cache contents.
6.11.4 Other considerations
Please see the application note Handling of External Memory EDAC Errors in LEON/GRLIB systems
[GR-AN-0004].
7 Floating-point Control Unit
The GRFPU Control Unit (GRFPC) is used to attach the GRFPU to the LEON integer unit (IU).
GRFPC performs scheduling, decoding and dispatching of the FP operations to the GRFPU as well as
managing the floating-point register file, the floating-point state register (FSR) and the floating-point
deferred-trap queue (FQ). Floating-point operations are executed in parallel with other integer instructions; the LEON integer pipeline is only stalled in case of operand or resource conflicts.
Each of the four LEON4 processor cores in the system integrates a GRFPU control unit that connects
to one GRFPU unit per processor. Each processor has its own dedicated FPU.
7.1 Floating-Point register file
The GRFPU floating-point register file contains 32 32-bit floating-point registers (%f0-%f31). The
register file is accessed by floating-point load and store instructions (LDF, LDDF, STF, STDF) and
floating-point operate instructions (FPop).
7.2 Floating-Point State Register (FSR)
The GRFPC manages the floating-point state register (FSR) containing FPU mode and status information. All fields of the FSR register as defined in SPARC V8 specification are implemented and
managed by the GRFPU conforming to the SPARC V8 specification and the IEEE-754 standard.
The implementation-specific parts of FSR management are the NS (non-standard) bit and the ftt field.
If the NS (non-standard) bit of the FSR register is set, all floating-point operations will be performed
in non-standard mode as described in section 8.2.6. When the NS bit is cleared all operations are performed in standard IEEE-compliant mode.
The following floating-point trap types never occur and are therefore never set in the ftt field:
- unimplemented_FPop: all FPop operations are implemented
- hardware_error: non-resumable hardware error
- invalid_fp_register: no check that double-precision register is 0 mod 2 is performed
GRFPU implements the qne bit of the FSR register which reads 0 if the floating-point deferred-queue
(FQ) is empty and 1 otherwise.
The FSR is accessed using LDFSR and STFSR instructions.
7.3 Floating-Point Exceptions and Floating-Point Deferred-Queue
GRFPU implements the SPARC deferred trap model for floating-point exceptions (fp_exception). A
floating-point exception is caused by a floating-point instruction performing an operation resulting in
one of the following conditions:
• an operation raises an IEEE floating-point exception (ftt = IEEE_754_exception), e.g. executing an
invalid operation such as 0/0 while the NVM bit of the TEM field is set (invalid exception
enabled).
• an operation on denormalized floating-point numbers (in standard IEEE mode) raises the unfinished_FPop floating-point exception
• sequence error: abnormal error condition in the FPU due to the erroneous use of the floating-point instructions in the supervisor software.
The trap is deferred to one of the floating-point instructions (FPop, FP load/store, FP branch) following the trap-inducing instruction (note that this may not be the next floating-point instruction in program order, due to the exception-detecting mechanism and out-of-order instruction execution in the
GRFPC). When the trap is taken, the floating-point deferred-queue (FQ) contains the trap-inducing
instruction and up to seven FPop instructions that were dispatched in the GRFPC but did not complete.
After the trap is taken the qne bit of the FSR is set and remains set until the FQ is emptied. The
STDFQ instruction reads a double-word from the floating-point deferred queue; the first word is the
address of the instruction and the second word is the instruction code. All instructions in the FQ are
FPop type instructions. The first access to the FQ gives a double-word with the trap-inducing instruction, and the following double-words contain pending floating-point instructions. Supervisor software should
emulate FPops from the FQ in the same order as they were read from the FQ.
Note that instructions in the FQ may not appear in the same order as the program order since GRFPU
executes floating-point instructions out-of-order. A floating-point trap is never deferred past an
instruction specifying source registers, destination registers or condition codes that could be modified
by the trap-inducing instruction. Execution or emulation of instructions in the FQ by the supervisor
software therefore gives the same FPU state as if the instructions were executed in the program order.
8 High-performance IEEE-754 Floating-point Unit
8.1 Overview
GRFPU is a high-performance FPU implementing floating-point operations as defined in the IEEE
Standard for Binary Floating-Point Arithmetic (IEEE-754) and the SPARC V8 standard (IEEE-1754).
Supported formats are single and double precision floating-point numbers. The advanced design combines two execution units, a fully pipelined unit for execution of the most common FP operations and
a non-blocking unit for execution of divide and square-root operations.
The logical view of the GRFPU is shown in figure 7.
[Diagram: GRFPU block containing a pipelined execution unit and an iteration unit. Signal labels: clk, reset, operand1, operand2, opcode, opid, start, round, flushid, flush, nonstd, result, resid, allow, except, ready, cc.]
Figure 7. GRFPU Logical View
8.2 Functional description
8.2.1 Floating-point number formats
GRFPU handles floating-point numbers in single or double precision format as defined in the IEEE-754 standard, with the exception of denormalized numbers. See section 8.2.5 for more information on
denormalized numbers.
8.2.2 FP operations
GRFPU supports four types of floating-point operations: arithmetic, compare, convert and move. The
operations implement all FP instructions specified by the SPARC V8 instruction set, and most of the
operations defined in IEEE-754. All operations are summarized in table 62.
Multiplication, FSMULD gives exact double-precision product of two single-precision operands.
Division
Integer to floating-point conversion
The result is rounded in round-to-zero mode.
Rounding according to RND input.
Conversion between floating-point
formats
exception is generated if either operand is a signaling NaN.
exception is generated if either operand is a NaN (quiet or signaling).
put.
Arithmetic operations include addition, subtraction, multiplication, division and square-root. Each
arithmetic operation can be performed in single or double precision formats. Arithmetic operations
have one clock cycle throughput and a latency of four clock cycles, except for divide and square-root
operations, which have a throughput of 16 - 25 clock cycles and latency of 16 - 25 clock cycles (see
table 63). Add, sub and multiply can be started on every clock cycle, providing high throughput for
these common operations. Divide and square-root operations have lower throughput and higher
latency due to complexity of the algorithms, but are executed in parallel with all other FP operations
in a non-blocking iteration unit. Out-of-order execution of operations with different latencies is easily
handled through the GRFPU interface by assigning an id to every operation which appears with the
result on the output once the operation is completed.
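The throughput and latency figures above translate into simple cycle estimates. The Python sketch below is an idealized model that assumes a stream of independent operations with no operand or resource stalls. The function names are illustrative, and the 16 and 25 cycle values are the bounds quoted in the text.

```python
def pipelined_cycles(n_ops, latency=4, issue_interval=1):
    """Cycles until the last of n_ops pipelined operations completes."""
    if n_ops <= 0:
        return 0
    # The first result appears after `latency` cycles; each further
    # operation starts `issue_interval` cycles after the previous one.
    return latency + (n_ops - 1) * issue_interval


def iterative_cycles(n_ops, cycles_per_op):
    """Cycles for n_ops back-to-back divide or square-root operations."""
    return max(n_ops, 0) * cycles_per_op
```

Under these assumptions 100 additions complete in 103 cycles, while 100 divisions take between 1600 and 2500 cycles in the iteration unit.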
* Throughput and latency are data dependent, with two possible cases of equal statistical probability.
Conversion operations execute in a pipelined execution unit and have throughput of one clock cycle
and latency of four clock cycles. Conversion operations provide conversion between different floating-point numbers and between floating-point numbers and integers.
Comparison functions offering two different types of quiet Not-a-Number (QNaN) handling are
provided. Move, negate and absolute value are also provided. These operations never generate an
unfinished exception (the unfinished exception is never signaled since compare, negate, absolute value
and move handle denormalized numbers).
8.2.3 Exceptions
GRFPU detects all exceptions defined by the IEEE-754 standard. This includes detection of Invalid
Operation (NV), Overflow (OF), Underflow (UF), Division-by-Zero (DZ) and Inexact (NX) exception conditions. Generation of special results such as NaNs and infinity is also implemented. Overflow (OF) and underflow (UF) are detected before rounding. If an operation underflows the result is
flushed to zero (GRFPU does not support denormalized numbers or gradual underflow). A special
Unfinished exception (UNF) is signaled when one of the operands is a denormalized number which is
not handled by the arithmetic and conversion operations.
8.2.4 Rounding
All four rounding modes defined in the IEEE-754 standard are supported: round-to-nearest, round-to-+inf, round-to--inf and round-to-zero.
8.2.5 Denormalized numbers
Denormalized numbers are not handled by the GRFPU arithmetic and conversion operations. A system (microprocessor) with the GRFPU could emulate rare cases of operations on denormals in software using non-FPU operations. A special Unfinished exception (UNF) is used to signal an arithmetic
or conversion operation on the denormalized numbers. Compare, move, negate and absolute value
operations can handle denormalized numbers and do not raise the unfinished exception. GRFPU does
not generate any denormalized numbers during arithmetic and conversion operations on normalized
numbers. If the infinitely precise result of an operation is a tiny number (smaller than minimum value
representable in normal format) the result is flushed to zero (with underflow and inexact flags set).
8.2.6 Non-standard Mode
GRFPU can operate in a non-standard mode where all denormalized operands to arithmetic and conversion operations are treated as (correctly signed) zeroes. Calculations are performed on zero operands instead of the denormalized numbers, obeying all rules of floating-point arithmetic including
rounding of the results and detecting exceptions.
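The difference between standard and non-standard handling of a denormalized operand can be modelled on raw single-precision bit patterns. The Python sketch below is illustrative only: the function names are not part of any GRFPU interface, and it covers a single operand of an arithmetic or conversion operation.

```python
SIGN_MASK = 0x80000000
EXP_MASK = 0x7F800000
FRAC_MASK = 0x007FFFFF


def is_denormal_single(bits):
    """True when the IEEE-754 single pattern has a zero exponent and nonzero fraction."""
    return (bits & EXP_MASK) == 0 and (bits & FRAC_MASK) != 0


def operand_handling(bits, ns):
    """Return (operand_used, unfinished) for one operand.

    ns models the NS bit: when set, a denormal is replaced by a zero of the
    same sign; when clear, the Unfinished exception (UNF) is signalled.
    """
    if not is_denormal_single(bits):
        return bits, False
    if ns:
        return bits & SIGN_MASK, False
    return bits, True
```

A negative denormal such as 0x80000001 therefore becomes negative zero (0x80000000) in non-standard mode and raises UNF in standard mode.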
8.2.7 NaNs
GRFPU supports handling of Not-a-Numbers (NaNs) as defined in the IEEE-754 standard. Operations on signaling NaNs (SNaNs) and invalid operations (e.g. inf/inf) generate the Invalid exception
and deliver QNaN_GEN as result. Operations on Quiet NaNs (QNaNs), except for FCMPES and
FCMPED, do not raise any exceptions and propagate QNaNs through the FP operations by delivering
NaN-results according to table 64. QNaN_GEN is 0x7fffe00000000000 for double precision results
and 0x7fff0000 for single precision results.
Table 64. Operations on NaNs
                          Operand 2
Operand 1    FP           QNaN2        SNaN2
none         FP           QNaN2        QNaN_GEN
FP           FP           QNaN2        QNaN_GEN
QNaN1        QNaN1        QNaN2        QNaN_GEN
SNaN1        QNaN_GEN     QNaN_GEN     QNaN_GEN
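The NaN propagation table can be expressed as a small decision function. The Python sketch below encodes operand classes as strings (an illustrative choice) and returns which value the table selects; it does not model the Invalid exception or the FCMPES and FCMPED special case.

```python
QNAN_GEN_DOUBLE = 0x7FFFE00000000000
QNAN_GEN_SINGLE = 0x7FFF0000


def nan_result(op1, op2):
    """Select the result class per table 64.

    op1 is one of "none", "FP", "QNaN", "SNaN"; op2 is one of "FP", "QNaN", "SNaN".
    """
    if op1 == "SNaN" or op2 == "SNaN":
        return "QNaN_GEN"      # any signaling NaN yields the generated QNaN
    if op2 == "QNaN":
        return "QNaN2"         # operand 2 takes precedence over operand 1
    if op1 == "QNaN":
        return "QNaN1"
    return "FP"                # no NaN involved: ordinary result
```

Both QNaN_GEN patterns have an all-ones exponent and the most significant fraction bit set, which is the usual encoding of a quiet NaN.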
Figure 8. Block diagram
[Diagram labels: CPU, CPU, Processor AHB bus, L2C, Memory AHB bus, Memory Controller]
9 Level 2 Cache controller
9.1 Overview
The L2 cache works as an AHB to AHB bridge, caching the data that is read or written via the bridge.
The cache is a unified cache and data may exist in both the processor Level-1 caches and the Level-2
cache, or only in a Level-1 or the Level-2 cache. A front-side AHB interface is connected to the Processor AHB bus, while a backend AHB interface is connected to the Memory AHB bus. Figure 8
shows a system block diagram for the cache controller.
Note that the L2 cache is disabled after reset and should be enabled by boot software.
9.2 Operation
The Level-2 cache is implemented as a multi-way cache with an associativity of four. The replacement policy can be configured as: LRU (least-recently-used), pseudo-random or master-index (where
the way to replace is determined by the master index). The way size is 512 KiB with a line size of 32
bytes.
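The geometry above fixes how a 32-bit address splits into tag, line index and byte offset. The following Python sketch derives the field widths from the way size and line size; the constant and function names are illustrative.

```python
LINE_SIZE = 32               # bytes per cache line
WAY_SIZE = 512 * 1024        # bytes per way
NUM_WAYS = 4

OFFSET_BITS = LINE_SIZE.bit_length() - 1                # 5
INDEX_BITS = (WAY_SIZE // LINE_SIZE).bit_length() - 1   # 14


def split_address(addr):
    """Return (tag, index, offset) for a 32-bit address."""
    offset = addr & (LINE_SIZE - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset
```

With these values the offset occupies address bits 4:0, the index bits 18:5 and the tag bits 31:19, and four ways of 512 KiB give the 2 MiB total.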
9.2.1 Replacement policy
The cache supports three different replacement policies: LRU (least-recently-used), (pseudo-) random
and master-index. The reset value for replacement policy is LRU.
With the master-index replacement policy, master 0 would replace way 1, master 1 would replace way
2 and so on. For master indexes corresponding to a way number larger than the number of implemented ways there are two options to determine which way to replace. One option is to map all these
master indexes to a specific way. This is done by specifying this way in the index-replace field in the
control register and selecting this option in the replacement policy field also located in the control register. It is not allowed to select a locked way in the index-replace field. The second option is to replace
way = ((master index) modulo (number of ways)). This option can be selected in the replacement
policy field.
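The two master-index options can be sketched as follows. This Python model uses zero-based way indices, on the assumption that "way 1" in the prose corresponds to index 0, and it represents the choice between the two options by whether an index-replace way is supplied; both conventions are illustrative rather than taken from the register definition.

```python
def way_to_replace(master_index, num_ways=4, index_replace=None):
    """Select the zero-based way to replace under the master-index policy.

    index_replace models the index-replace field option; when it is None
    the modulo option is used for out-of-range master indexes.
    """
    if master_index < num_ways:
        return master_index            # master i maps directly to way index i
    if index_replace is not None:
        return index_replace           # all other masters share one way
    return master_index % num_ways     # wrap around the implemented ways
```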
9.2.2 Write policy
The cache can be configured to operate as write-through or copy-back cache. Before changing the
write policy to write-through, the cache has to be disabled and flushed (to write back dirty cache lines
to memory). This can be done by setting the Cache disable bit when issuing a flush all command. The
write policy is controlled via the cache control register. More fine-grained control can also be
obtained by enabling the MTRR registers (see text below).
9.2.3 Memory type range registers
The memory type range registers (MTRR) are used to control the cache operation with respect to the
address. Each MTRR can define an area in memory to be uncached, write-through or write-protected.
Each MTRR register consists of a 14-bit address field, a 14-bit mask and two 2-bit control fields. The
address field is compared to the 14 most significant bits of the cache address, masked by the mask
field. If the unmasked bits are equal to the address, an MTRR hit is declared. The cache operation is
then performed according to the control fields (see register descriptions). If no hit is declared or if the
MTRR is disabled, cache operation takes place according to the cache control register. The number of
implemented MTRRs is sixteen. When changing the value of any MTRR register, the cache must be
disabled and flushed (this can be done by setting the Cache disable bit when issuing a flush all command).
Note that the write-protection provided via the MTRR registers is enforced even if the cache is disabled.
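The MTRR comparison described above can be modelled in a few lines. The Python sketch below assumes that a mask bit of 1 selects an address bit for comparison, which is how the description of masked and unmasked bits reads; the function name is illustrative and the control fields are not modelled.

```python
def mtrr_hit(addr, mtrr_addr, mtrr_mask):
    """Compare the 14 most significant address bits against one MTRR."""
    top = (addr >> 18) & 0x3FFF        # address bits 31:18
    return (top & mtrr_mask) == (mtrr_addr & mtrr_mask)
```

Because only bits 31:18 take part, the smallest region an MTRR can describe is 256 KiB. As an example, an address field of 0x1000 with a mask of 0x3C00 covers 0x40000000 - 0x4FFFFFFF.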
9.2.4 Cachability
The cache considers the address range 0x00000000 - 0x7FFFFFFF to be cachable. The cache can also
be configured to use the HPROT signal to override the default cachable area. An access can only be
redefined as non-cachable by the HPROT signal. See table 65 for information on how HPROT can
change the access cachability within a cachable address area. The AMBA AHB signal HPROT[3]
defines the access as cacheable when active high and the AMBA AHB signal HPROT[2] defines the
access as bufferable when active high.
Read miss    Memory access             Memory access   Cache allocation and Memory access
Write hit    Cache and Memory access   Cache access    Cache access
Write miss   Memory access             Memory access   Cache allocation
* When the HPROT-Read-Hit-Bypass bit is set in the cache control register this will generate a Memory access.
9.2.5 Cache tag entry
Table 66 shows the different fields of the cache tag entry for a cache with a way size of 512 KiB.
Table 66. L2C Cache tag entry
Bits     31:19   18:10    9:8     7:6     5     4:0
Field    TAG     000000   Valid   Dirty   RES   LRU
31:19  Address Tag (TAG) - Contains the address of the data held in the cache line.
9:8  Valid bits. When set, the corresponding sub-block of the cache line contains valid data. Valid bit 0
corresponds to the lower 16-byte sub-block (with offset 1) in the cache line and valid bit 1 corresponds to the upper 16-byte sub-block (with offset 0) in the cache line.
7:6  Dirty bits. When set, this sub-block contains modified data.
5  RESERVED
4:0  LRU bits
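A tag entry read through the diagnostic interface could be decoded along these lines. The Python sketch follows the bit positions of table 66; the function names are illustrative, and the line address reconstruction assumes the index is formed from address bits 18:5, which follows from the 512 KiB way size and 32-byte line size.

```python
def decode_l2_tag(entry):
    """Split a 32-bit L2 cache tag entry into its fields."""
    return {
        "tag": (entry >> 19) & 0x1FFF,   # address bits 31:19
        "valid": (entry >> 8) & 0x3,     # one bit per 16-byte sub-block
        "dirty": (entry >> 6) & 0x3,     # one bit per 16-byte sub-block
        "lru": entry & 0x1F,
    }


def line_base_address(entry, index):
    """Rebuild the memory address of the line held at the given index."""
    return (decode_l2_tag(entry)["tag"] << 19) | ((index & 0x3FFF) << 5)
```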
9.2.6 AHB address mapping
The AHB slave interface occupies three AHB address ranges. The first AHB memory bar is used for
memory/cache data access and is mapped at 0x00000000 - 0x7FFFFFFF. The second AHB memory
bar is used for access to configuration registers and the diagnostic interface and is mapped at
0xF0800000 - 0xF08FFFFF. The last AHB memory bar is used to map the IO area of the backend
AHB bus (to access the plug&play information on that bus) and maps the Memory AHB bus area
0xFFE00000 - 0xFFEFFFFF.
9.2.7 Memory protection and Error handling
The L2 cache provides Error Detection And Correction (EDAC) protection for the data and tag memory. One error can be corrected and two errors can be detected with the use of a (39, 32, 7) BCH code.
The EDAC functionality can dynamically be enabled or disabled. Before being enabled the cache
should be flushed. The dirty and valid bits for each cache line are implemented with TMR. When an
EDAC error, a backend AHB error or a write-protection hit in an MTRR register is detected, the error
status register is updated to store the error type. The address which caused the error is also saved in
the error address register. The error types are prioritised such that an uncorrectable EDAC error will
overwrite any other previously stored error in the error status register. In all other cases, the error status register has to be cleared before a new error can be stored. Each error type (correctable EDAC error, uncorrectable EDAC error, write-protection hit, backend AHB error) has a pending register bit. When a pending bit is set
and the corresponding error is unmasked, an interrupt is generated. When an uncorrectable error is detected in the
read data, the cache will respond with an AHB error. AHB error responses can also be enabled for
accesses that match a stored error in the error status register. Error detection is done per cache line. The
cache also provides a correctable error counter accessible via the error status register. After power-up
the error status register needs to be cleared before any valid data can be read out.
Table 67. Cache action on detected EDAC error
Read, Correctable Tag error
  Cache-line not dirty: Tag is corrected before read is handled. Error status is updated with a correctable error.
  Cache-line dirty: Tag is corrected before read is handled. Error status is updated with a correctable error.
Read, Uncorrectable Tag error
  Cache-line not dirty: Cache-line invalidated before read is handled. Error status is updated with a correctable error.
  Cache-line dirty: Cache-line invalidated before read is handled. Error status is updated with an uncorrectable error. Cache data is lost.
Write, Correctable Tag error
  Cache-line not dirty: Tag is corrected before write is handled. Error status is updated with a correctable error.
  Cache-line dirty: Tag is corrected before write is handled. Error status is updated with a correctable error.
Write, Uncorrectable Tag error
  Cache-line not dirty: Cache-line invalidated before write is handled. Error status is updated with a correctable error.
  Cache-line dirty: Cache-line invalidated before write is handled. Error status is updated with an uncorrectable error. Cache data is lost.
Read, Correctable Data error
  Cache-line not dirty: Cache-data is corrected and updated. Error status is updated with a correctable error. AHB access is not affected.
  Cache-line dirty: Cache-data is corrected and updated. Error status is updated with a correctable error. AHB access is not affected.
Read, Uncorrectable Data error
  Cache-line not dirty: Cache-line is invalidated. Error status is updated with a correctable error. AHB access is terminated with retry.
  Cache-line dirty: Cache-line is invalidated. Error status is updated with an uncorrectable error. AHB access is terminated with error.
Write (<32-bit), Correctable Data error
  Cache-line not dirty: Cache-data is corrected and updated. Error status is updated with a correctable error. AHB access is not affected.
  Cache-line dirty: Cache-data is corrected and updated. Error status is updated with a correctable error. AHB access is not affected.
Write (<32-bit), Uncorrectable Data error
  Cache-line not dirty: Cache-line is re-fetched from memory. Error status is updated with a correctable error. AHB access is not affected.
  Cache-line dirty: Cache-line is invalidated. Error status is updated with an uncorrectable error. AHB access write data and cache data is lost.
9.2.8 Scrubber
The cache is implemented with an internal memory scrubber to prevent build-up of errors in the cache
memories. The scrubber is controlled via two registers in the cache configuration interface. To scrub
one specific cache line the index and way of the line is set in the scrub control register. The scrub
operation is started by setting the pending bit to 1. The scrubber can also be configured to continuously loop through and scrub each cache line by setting the enabled bit to 1. In this mode, the delay
between the scrub operation on each cache line is determined by the scrub delay register (in clock
cycles).
9.2.9 Locked way
One or more ways can be configured to be locked (not replaced). The number of ways that should be
locked is configured by the locked-way field in the control register. Locking starts
with the uppermost way (for a 4-way associative cache way 4 is the first locked way, way 3 the second, and so on). After a way is locked, the cache-way has to be flushed with the “way flush” function
to update the tag to match the desired locked address. During this “way flush” operation, the data can
also be fetched from memory.
9.3 Operation
9.3.1 Read
A cachable read access to the cache results in a tag lookup to determine if the requested data is located
in the cache memory. For a hit (requested data is in the cache) the data is read from the cache and no
read access is issued to the memory. If the requested data is not in the cache (cache miss), the cache
controller issues a read access to the memory controller to fetch the cache line containing the
requested data. The replacement policy determines which cache line in a multi-way configuration
should be replaced and its tag is updated. If the replaced cache line is modified (dirty) this data is
stored in a write buffer and after the requested data is fetched from memory the replaced cache line is
written to memory.
For a non-cachable read access to the cache, the cache controller can issue a single read access or a
burst read access to fetch the data from memory. The access type is determined by how the cache is
configured regarding hprot support and bypass line fetch in the access control register. The data is
stored in a read buffer and the state of the cache is not modified in any way.
The cache will insert wait-states until the read access is determined to be a cache hit or miss. For a
cache hit the data is then delivered. For a miss the cache can insert wait-states during the memory
fetch or issue an AMBA SPLIT (depending on how the cache is configured).
9.3.2 Write
A cachable write access to the cache results in a tag lookup to determine if the cache line is present in
the cache. For a hit the cache line is updated. No access is issued to the memory for a copy-back configuration. When the cache is configured as a write-through cache, each write access is also issued
towards memory. For a miss, the replacement policy determines which cache line in a multi-way configuration should be replaced and updates its tag. If the replaced cache line is dirty, it is stored in
a write buffer to be written back to the memory. The new cache line is updated with the data from the
write access and for a non-128-bit access the rest of the cache line is fetched from memory. Finally, the
replaced cache line is written to memory (when copy-back policy is used and the replaced cache line
was marked dirty). When the cache is configured as a write-through cache, no cache lines are marked
as dirty and no cache line needs to be written back to memory. Instead the write access is issued
towards the memory as well. A new cache line is allocated on a miss for a cacheable write access
independent of write policy (copy-back or write-through).
For a non-cachable write access to the cache, the data is stored in a write buffer and the cache controller issues single write accesses to write the data to memory. The state of the cache is unmodified during
this access.
The cache can accept a non sub-word write hit access every clock cycle. When the cache is unable to
accept a new write access the cache inserts wait-states or issues an AMBA SPLIT response depending
on how the cache is configured.
9.3.3 Cache flushing
The cache can be flushed by accessing a cache flush register. There are three flush modes: invalidate
(reset valid bits), write back (write back dirty cache lines to memory, but no invalidation of the cache
content) and flush (write back dirty cache lines to memory and invalidate the cache line). The flush
command can be applied to the entire cache, one way or to only one cache line. The cache line to be
flushed can be addressed in two ways: direct address (specify way and line address) and memory
address (specify which memory address should be flushed in the cache; the controller will make
a cache lookup for the specified address and on a hit, flush that cache line). When the entire cache is
flushed the Memory Address field should be set to zero. Invalidating a cache line takes 5 clock
cycles. If the cache line needs to be written back to memory one additional clock cycle is needed plus
the memory write latency. When the whole cache is flushed the invalidation of the first cache line
takes 5 clock cycles, after this one line can be invalidated each clock cycle. When a cache line needs
to be written back to memory this memory access will be stored in an access buffer. If the buffer is
full, the invalidation of the next cache line will stall until a slot in the buffer has opened up. If the
cache also should be disabled after the flush is complete, it is recommended to set the cache disable
bit together with the flush command in the Flush set/index register instead of writing ‘0’ to the cache
enable bit in the cache control register.
Note that after a processor (or any other AHB master) has initiated a flush the processor is not blocked
by the flush unless it writes or requests data from the Level-2 cache. The cache blocks all accesses
(responds with AMBA SPLIT or wait-states depending on cache configuration) until the flush is complete.
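The timing figures above can be turned into a rough lower bound for a whole-cache invalidation. The sketch below assumes that no dirty lines need to be written back and that the access buffer never stalls; the function name is hypothetical and not part of any device support library.

```c
#include <stdint.h>

/* Cycles needed to invalidate the first cache line (from the text above). */
#define L2C_FIRST_INVALIDATE_CYCLES 5u

/*
 * Lower-bound estimate of the clock cycles needed to invalidate n_lines
 * cache lines during a whole-cache flush. Assumes no write-backs and no
 * access-buffer stalls: the first line takes 5 cycles, every following
 * line takes 1 cycle.
 */
uint64_t l2c_invalidate_all_cycles(uint64_t n_lines)
{
    if (n_lines == 0)
        return 0;
    return L2C_FIRST_INVALIDATE_CYCLES + (n_lines - 1);
}
```

For example, if the 2 MiB cache uses 32-byte lines (an assumption inferred from the ADDR[31:5] field of the flush register), there are 65536 lines and the estimate is 65540 cycles, roughly 262 microseconds at 250 MHz.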
9.3.4 Disabling Cache
To safely disable the cache while it is being accessed, the cache needs to be disabled and
flushed at the same time. This is accomplished by setting the cache disable bit when issuing the flush
command.
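A minimal sketch of a combined flush-and-disable, assuming the Flush (Memory address) register at offset 0x08 is used with FMODE "111" and the DI bit set in the same write. The `l2c_regs` pointer is hypothetical: the base address of the register block is not given in this section.

```c
#include <stdint.h>

/* Register offset from Table 68. */
#define L2C_FLUSH_ADDR_OFFSET 0x08u

/* Flush (Memory address) register fields. */
#define L2C_FLUSH_DI          (1u << 3) /* cache disable */
#define L2C_FMODE_INV_WB_ALL  0x7u      /* "111": invalidate & write-back all lines */

/*
 * Flush the whole cache and disable it with a single register write.
 * l2c_regs is a hypothetical pointer to the start of the L2C register block.
 */
void l2c_flush_and_disable(volatile uint32_t *l2c_regs)
{
    /* The ADDR field (bits 31:5) stays zero when all cache lines are flushed. */
    l2c_regs[L2C_FLUSH_ADDR_OFFSET / 4] = L2C_FLUSH_DI | L2C_FMODE_INV_WB_ALL;
}
```

The Flush set/index register carries a DI bit as well, so the same pattern applies to a way flush or a single-line flush.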
9.3.5 Diagnostic cache access
The diagnostic interface can be used for RAM block testing and direct access to the cache tag, cache
data content and EDAC check bits. The read-check-bits field in the error status/control register selects
whether data content or the EDAC check bits should be read out. On writes, the EDAC check bits can be
selected from the data-check-bit or tag-check-bit register. These registers can also be XOR:ed with the
correct check bits on a write. See the error status/control register for how this is done.
9.3.6 Error injection
Error injection can be performed for data and tag lines either by modifying the value or the checkbits.
The checkbits can be modified by defining a mask that will be XOR:ed with the generated checkbits
or by defining the full checkbits to be written via the tag-check-bit or data-check-bit registers.
The value can be modified by performing a diagnostic access while keeping the existing checkbits.
EDAC checkbits can also be modified on a regular cache access: when the xor-check-bit field in the
error status/control register is set, the data EDAC check bits will be XOR:ed with the data-check-bit register
on the next write, or the tag EDAC check bits will be XOR:ed with the tag-check-bit register on the
next tag replacement. The tag check bit manipulation is only done if the tag-check-bit register is not
zero. The xor-check-bit is reset on the next tag replacement or data write. Errors can also be injected
by writing an address together with the inject bit to the “Error injection” register. This will XOR the
check-bits for the specified address with the data-check-bit register. If the specified address is not
cached, the cache contents will be unchanged.
9.3.7 AHB slave interface
The cache can accept 8-bit (byte), 16-bit (half word), 32-bit (word), 64-bit, and 128-bit single
accesses and also 32-bit, 64-bit, and 128-bit burst accesses. For an access during a flush operation, the
cache will respond with an AHB SPLIT response or with wait-states. For an uncorrectable error or a
backend AHB error on a read access, the cache will respond with an AMBA ERROR response. For a
correctable data error which requires a cache line to be re-fetched from memory, the cache will respond
with an AMBA RETRY response.
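The response rules above amount to a small decision table. The sketch below restates them in C for illustration only; the enum names and the `split_enabled` flag (standing in for "cache configuration") are hypothetical and do not correspond to any register or driver API.

```c
#include <stdbool.h>

/* Possible outcomes on the AHB slave interface. */
typedef enum {
    L2C_RESP_OKAY,
    L2C_RESP_WAIT,   /* wait-states are inserted */
    L2C_RESP_SPLIT,
    L2C_RESP_RETRY,
    L2C_RESP_ERROR
} l2c_resp_t;

/* Conditions named in the text (hypothetical identifiers). */
typedef enum {
    L2C_EV_NONE,
    L2C_EV_FLUSH_IN_PROGRESS,   /* access arrives during a flush */
    L2C_EV_CORRECTABLE_REFETCH, /* correctable data error, line must be re-fetched */
    L2C_EV_UNCORRECTABLE,       /* uncorrectable error */
    L2C_EV_BACKEND_READ_ERROR   /* backend AHB error on a read access */
} l2c_event_t;

/* Map a condition to the response described in section 9.3.7. */
l2c_resp_t l2c_slave_response(l2c_event_t ev, bool split_enabled)
{
    switch (ev) {
    case L2C_EV_FLUSH_IN_PROGRESS:
        return split_enabled ? L2C_RESP_SPLIT : L2C_RESP_WAIT;
    case L2C_EV_CORRECTABLE_REFETCH:
        return L2C_RESP_RETRY;
    case L2C_EV_UNCORRECTABLE:
    case L2C_EV_BACKEND_READ_ERROR:
        return L2C_RESP_ERROR;
    case L2C_EV_NONE:
    default:
        return L2C_RESP_OKAY;
    }
}
```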
9.3.8 AHB master interface
The master interface is the cache’s connection to the memory controller. During a cache line fetch, the
controller can issue either a 32-bit, 64-bit or 128-bit burst access. For a non-cacheable access and in
write-through mode, the cache can also issue an 8-bit (byte), 16-bit (half word), 32-bit (word), 64-bit, or
128-bit single write access.
9.3.9 Cache status
The cache controller has a status register that provides information on the cache configuration (multi-way configuration and set size). The cache also provides access, hit and error correction counters via
the LEON4 statistics unit (see section 26).
9.4 Registers
The cache is configured via registers mapped into the AHB memory address space.
Table 68. L2C: AHB registers
AHB address offset       Register
0x00                     Control register
0x04                     Status register
0x08                     Flush (Memory address)
0x0C                     Flush (set, index)
0x10 - 0x1C              Reserved
0x20                     Error status/control
0x24                     Error address
0x28                     TAG-check-bit
0x2C                     Data-check-bit
0x30                     Scrub Control/Status
0x34                     Scrub Delay
0x38                     Error injection
0x3C                     Access control
0x50                     Error handling / injection configuration
0x80 - 0xFC              MTRR registers
0x80000 - 0x8FFFC        Diagnostic interface (Tag)
                           0x80000: Tag 1, way-1
                           0x80004: Tag 1, way-2
                           0x80008: Tag 1, way-3
                           0x8000C: Tag 1, way-4
                           0x80010: Tag check-bits way-1,2,3,4 (Read only)
                             bit[31] = RESERVED
                             bit[30:24] = check-bits for way-1.
                             bit[23] = RESERVED
                             bit[22:16] = check-bits for way-2.
                             bit[15] = RESERVED
                             bit[14:8] = check-bits for way-3.
                             bit[7] = RESERVED
                             bit[6:0] = check-bits for way-4.
                           0x80020: Tag 2, way-1
                           0x80024: ...
0x200000 - 0x3FFFFC      Diagnostic interface (Data)
                           0x200000 - 0x27FFFC: Data or check-bits way-1
                           0x280000 - 0x2FFFFC: Data or check-bits way-2
                           0x300000 - 0x37FFFC: Data or check-bits way-3
                           0x380000 - 0x3FFFFC: Data or check-bits way-4
                           When check-bits are read out:
                           Only the 32-bit words at offsets 0x0, 0x10, 0x20, ... contain valid check-bits.
                             bit[31] = RESERVED
                             bit[30:24] = check-bits for data word at offset 0x0.
                             bit[23] = RESERVED
                             bit[22:16] = check-bits for data word at offset 0x4.
                             bit[15] = RESERVED
                             bit[14:8] = check-bits for data word at offset 0x8.
                             bit[7] = RESERVED
                             bit[6:0] = check-bits for data word at offset 0xc.
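The diagnostic tag layout in Table 68 can be expressed as two small helpers. This is a sketch under stated assumptions: the 0x20 stride between tag entries is inferred from "Tag 1" at 0x80000 and "Tag 2" at 0x80020, lines and ways are numbered from 1 as in the table, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define L2C_DIAG_TAG_BASE   0x80000u /* start of the diagnostic tag area */
#define L2C_DIAG_TAG_STRIDE 0x20u    /* assumed distance between tag entries */

/* Offset of the tag word for a given line (1-based) and way (1..4). */
uint32_t l2c_diag_tag_offset(uint32_t line, unsigned way)
{
    assert(line >= 1 && way >= 1 && way <= 4);
    return L2C_DIAG_TAG_BASE + (line - 1) * L2C_DIAG_TAG_STRIDE + (way - 1) * 4u;
}

/*
 * Extract one 7-bit check-bit group from a packed check-bit word.
 * Slot 1 is bits 30:24, slot 2 is bits 22:16, slot 3 is bits 14:8 and
 * slot 4 is bits 6:0. For tag check-bits the slot is the way; for data
 * check-bits it selects the data word at offset 0x0, 0x4, 0x8 or 0xc.
 */
unsigned l2c_unpack_checkbits(uint32_t word, unsigned slot)
{
    assert(slot >= 1 && slot <= 4);
    return (word >> (8u * (4u - slot))) & 0x7Fu;
}
```

The same unpacking applies to both tag and data check-bit words because the two layouts in Table 68 use identical bit positions.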
9.4.1 Control register
Table 69. 0x00 - L2CC - L2C Control register
Bits     Field       Reset    Access
31       EN          0        rw
30       EDAC        0        rw
29:28    REPL        0        rw
27:19    RESERVED    0        r
18:16    BBS         0b100    rw
15:12    INDEX-WAY   0        rw
11:8     LOCK        0        rw
7:6      RES         0        r
5        HPRHB       0        rw
4        HPB         0        rw
3        UC          0        rw
2        HC          0        rw
1        WP          0        rw
0        HP          0        rw
31       Cache enable (EN) - When set, the cache controller is enabled. When disabled, the cache is
         bypassed.
30       EDAC enable (EDAC)
29:28    Replacement policy (REPL) -
         00: LRU
         01: (pseudo-) random
         10: Master-index using index-replace field
         11: Master-index using the modulus function
27:19    RESERVED
18:16    Backend bus size configuration (BBS) -
         “100”: Configure backend bus size to 128-bit.
         “011”: Configure backend bus size to 64-bit.
         “010”: Configure backend bus size to 32-bit.
         “000”: No configuration update is done.
         Other values: not supported.
15:12    Master-index replacement (INDEX-WAY) - Way to replace when the Master-index replacement policy
         is selected and the master index is larger than the number of ways in the cache.
11:8     Locked ways (LOCK) - Number of locked ways.
7:6      RESERVED
5        HPROT read hit bypass (HPRHB) - When set, a non-cacheable and non-bufferable read access will
         bypass the cache on a cache hit and return data from memory. Only used with HPROT support.
4        HPROT bufferable (HPB) - When HPROT is used to determine cacheability and this bit is set, all
Flush (Memory address) register
31:5     Memory Address (ADDR) - (To flush all cache lines, this field should be set to zero)
4        RESERVED
3        Cache disable (DI) - Setting this bit to ‘1’ is equal to setting the Cache enable bit to ‘0’ in the Cache
         Control register.
2:0      Flush mode (FMODE) -
         “001”: Invalidate one line, “010”: Write-back one line, “011”: Invalidate & Write-back one line.
         “101”: Invalidate all lines, “110”: Write-back all lines, “111”: Invalidate & Write-back all lines.
         Only dirty cache lines are written back to memory.
Flush set/index register
Field        Reset    Access
INDEX / TAG  NR       rw
FL           0        rw
VB           0        rw
DB           0        rw
R            0        r
WAY          0        rw
DI           0        w
WF           0        rw
FMODE        0        rw
31:16    Cache line index (INDEX) - used when a specific cache line is flushed.
31:10    (TAG) - used when “way flush” is issued. If a specific cache line is flushed, these bits should be set to zero.
         When a way flush is issued, the bits in this field will be written to the TAGs for the selected cache
         way.
9        Fetch Line (FL) - If set to ‘1’, data is fetched from memory when a “way flush” is issued. If a specific
         cache line is flushed, this bit should be set to zero.
8        Valid bit (VB) - used when “way flush” is issued. If a specific cache line is flushed, this bit should be
         set to zero.
7        Dirty bit (DB) - used when “way flush” is issued. If a specific cache line is flushed, this bit should be
         set to zero.
6        RESERVED
5:4      Cache way (WAY) -
3        Cache disable (DI) - Setting this bit to ‘1’ is equal to setting the Cache enable bit to ‘0’ in the Cache
         Control register.
2        Way-flush (WF) - When set, one way is flushed. If a specific cache line should be flushed, this bit
         should be set to zero.
1:0      Flush mode (FMODE) -
         line flush:
         “01”: Invalidate one line
         “10”: Write-back one line (if line is dirty)
         “11”: Invalidate & Write-back one line (if line is dirty).
         way flush:
         “01”: Update Valid/Dirty bits according to register bits[8:7] and TAG according to register
         bits[31:10]
         “10”: Write-back dirty lines to memory
         “11”: Update Valid/Dirty bits according to register bits[8:7] and TAG according to register
         bits[31:10], and Write-back dirty lines to memory.
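As an illustration of the line-flush encoding above, the following sketch packs INDEX, WAY and FMODE into a Flush set/index register value with WF left at zero. The function and enum names are hypothetical, and the `way` argument is the raw 2-bit WAY field value, since the mapping between field value and way number is not stated here.

```c
#include <assert.h>
#include <stdint.h>

/* FMODE encodings for a line flush (bits 1:0). */
enum l2c_line_fmode {
    L2C_LINE_INVALIDATE           = 0x1,
    L2C_LINE_WRITEBACK            = 0x2,
    L2C_LINE_INVALIDATE_WRITEBACK = 0x3
};

/*
 * Build a Flush set/index register value for flushing one cache line.
 * index: cache line index, placed in bits 31:16
 * way:   raw WAY field value (0..3), placed in bits 5:4
 * WF, FL, VB and DB stay zero, as required for a single-line flush.
 */
uint32_t l2c_flush_line_cmd(uint32_t index, unsigned way, enum l2c_line_fmode fmode)
{
    assert(index <= 0xFFFFu && way <= 3u);
    return (index << 16) | ((uint32_t)way << 4) | (uint32_t)fmode;
}
```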
Error status/control register
7:6      Selects (CB) - data-check-bits for diagnostic data write:
         00: use generated check-bits
         01: use check-bits in the data-check-bit register
         10: XOR check-bits with the data-check-bit register
         11: use generated check-bits
         Note: If this field is set to "01" or "10" then check-bits are overridden for all accesses. To get controlled error injection, the internal scrubber should be disabled and no accesses should be made to
         the Level-2 cache.
5:4      Selects (TCB) - tag-check-bits for diagnostic tag write:
         00: use generated check-bits
         01: use check-bits in the tag-check-bit register
         10: XOR check-bits with the tag-check-bit register
         11: use generated check-bits
         Note: If this field is set to "01" or "10" then check-bits are overridden for all accesses. To get controlled error injection, the internal scrubber should be disabled and no accesses should be made to
         the Level-2 cache.
3        Xor check-bits (XOR) - If set, the check-bits for the next data write or tag replace will be XOR:ed
         with the check-bit register. Default value is 0.
2        Read check-bits (RCB) - If set, a diagnostic read to the cache data area will return the check-bits
         related to that data. When this bit is set, check bits for the data at offset 0x0 - 0xc can be read at offset
         0x0, the check bits for data at offset 0x10 - 0x1c can be read at offset 0x10, ...
1        Compare error status (COMP) - If set, a read access matching an uncorrectable error stored in the
         error status register will generate an AHB error response. Default value is 0.
0        Resets (RST) - clears the status register so that a new error can be stored. After power up the status
         register needs to be cleared before any valid data can be read out.
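The low byte of the error status/control register can be assembled from the fields listed above. This is a minimal sketch with hypothetical names; it covers only bits 7:0, so in practice the result would presumably be merged into the register with a read-modify-write that preserves the upper bits.

```c
#include <stdint.h>

/* Encodings shared by the CB (bits 7:6) and TCB (bits 5:4) select fields. */
enum l2c_cb_select {
    L2C_CB_GENERATED     = 0x0, /* use generated check-bits */
    L2C_CB_FROM_REGISTER = 0x1, /* use the check-bit register contents */
    L2C_CB_XOR_REGISTER  = 0x2  /* XOR generated check-bits with the register */
};

/* Single-bit flags in bits 3:0. */
#define L2C_ERR_XOR  (1u << 3) /* XOR check-bits on next data write / tag replace */
#define L2C_ERR_RCB  (1u << 2) /* diagnostic data reads return check-bits */
#define L2C_ERR_COMP (1u << 1) /* compare error status */
#define L2C_ERR_RST  (1u << 0) /* clear the status register */

/* Pack CB, TCB and the flag bits into bits 7:0 of the register value. */
uint32_t l2c_errctrl_low_bits(enum l2c_cb_select cb, enum l2c_cb_select tcb, uint32_t flags)
{
    return ((uint32_t)cb << 6) | ((uint32_t)tcb << 4) | (flags & 0xFu);
}
```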
Scrub Delay register
15:0     Scrub Delay (DEL) - Delay the scrubber waits before issuing the next line scrub operation
Error injection register
Bits     Field    Reset    Access
31:2     ADDR     0        rw
1        R        0        r
0        INJ      0        rw
31:2     Error Inject address (ADDR)
1        RESERVED
0        Inject error (INJ) - Set to ‘1’ to inject an error at “address”.
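A minimal sketch of the value written to the Error injection register, assuming the ADDR field holds the word-aligned address in bits 31:2; the function name is hypothetical.

```c
#include <stdint.h>

/*
 * Build an Error injection register value: the word-aligned target
 * address in bits 31:2, the reserved bit 1 cleared, and INJ (bit 0) set.
 */
uint32_t l2c_error_inject_cmd(uint32_t addr)
{
    return (addr & ~0x3u) | 0x1u;
}
```

Per section 9.3.6, the injection XORs the check-bits of the addressed location with the data-check-bit register, so that register would need to hold a non-zero mask beforehand for the write to have any effect.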
9.4.12 Access control register
Table 80. 0x3C - L2CACCC - L2C Access control register
Bits     Field       Reset    Access
31:15    RESERVED    0        r
14       DSC         0        rw*
13       SH          0        rw*
12:11    RES         0        r
10       SPLITQ      0        rw
9        NHM         0        rw
8        BERR        0        rw
7        OAPM        0        rw
6        FLINE       0        rw
5        DBPF        0        rw
4        128WF       0        rw
3        R           0        r
2        DBPWS       0        rw
1        SPLIT       0        rw
0        R           0        r
31:15    RESERVED
14       Disable cancellation and reissue of scrubber operation (DSC) - When set to ’0’, a write access to the
         same index as an ongoing scrubber operation will cancel and reissue the scrubber operation. When
         set to ’1’ the scrubber operation will complete without detection of the write access. This field is only
         available in silicon revision 1.
13       Scrubber hold (SH) - When set to ’1’ the cache will delay any new access until the current scrubber
         operation is complete. This field is only available in silicon revision 1.
12:11    RESERVED
10       SPLIT queue write order (SPLITQ) - When set, all write accesses (except locked) will be placed in the
         split queue when the split queue is not empty.
9        No hit for cache misses (NHM) - When set, the unsplit read access for a read miss will not trigger the
         access/hit counters.
8        Bit error status (BERR) - When set, the error status signals will represent the actual error detected
         rather than whether the error could be corrected by refetching data from memory.
7        One access/master (OAPM) - When set, only one ongoing access per master is allowed to enter the
         cache. A second access would receive a SPLIT response.
6        (FLINE) - When set, a cache line fetched from memory can be replaced before it has been read out
         by the requesting master.
5        Disable bypass prefetching (DBPF) - When set, bypass accesses will be performed as single accesses
         towards memory.
4        128-bit write line fetch (128WF) - When set, a 128-bit write miss will fetch the rest of the cache
         line from memory.
3        RESERVED
2        Disable wait-states for discarded bypass data (DBPWS) - When set, a split response is given to a
         bypass read access whose data has been discarded and needs to be refetched from memory.
1        Enabled SPLIT response (SPLIT) - When set, the cache will issue an AMBA SPLIT response on
         cache miss.
0        RESERVED
Error handling / injection configuration register
10       (EDI) - Enable invalidation of cache line with uncorrectable data error.
         When set to 1 and an uncorrectable data error is detected, the cache line will be invalidated (removing the error from the cache).
         This field is only available in silicon revision 1.
9        (TER) - Disable error response on uncorrectable TAG error detection.
         When set to 0, the access detecting an uncorrectable TAG error would generate an AMBA error
         response. When set to 1, this access would not generate an error response.
         This field is only available in silicon revision 1.
8        (IMD) - Disable index match only after uncorrectable TAG error.
         When set to 1, the TAG and INDEX are matched against the error address register after a detected uncorrectable TAG error. When set to 0, only the INDEX is matched against the error address register.
         This field is only available in silicon revision 1.
7:0      RESERVED
9.4.14 Memory type range registers
Table 82. 0x80-FC - L2CMTRR - L2C Memory type range register
Bits     Field    Reset    Access
31:18    ADDR     0        rw
17:16    ACC      0        rw
15:2     MASK     0        rw
1        WP       0        rw
0        AC       0        rw
31:18    Address field (ADDR) - to be compared to the cache address [31:18]
17:16    Access field (ACC) - 00: uncached, 01: write-through
15:2     Address mask (MASK) - Only bits set to 1 will be used during address comparison
1        Write-protection (WP) - 0: disabled, 1: enabled
0        Access control field (AC) - 0: disabled, 1: enabled
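The MTRR fields can be illustrated with an encoder and a software model of the address comparison. This is a sketch under two assumptions that the text does not state explicitly: MASK bit i (register bit 2+i) qualifies address bit 18+i, and an entry only takes part in matching when AC is set. All names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* ACC field encodings (bits 17:16). */
enum l2c_mtrr_acc {
    L2C_MTRR_UNCACHED      = 0x0,
    L2C_MTRR_WRITE_THROUGH = 0x1
};

/*
 * Build an MTRR register value.
 * addr:   any address inside the region; only bits 31:18 are kept
 * mask14: 14-bit mask, one bit per address bit 31:18 (assumed alignment)
 */
uint32_t l2c_mtrr_encode(uint32_t addr, uint32_t mask14, enum l2c_mtrr_acc acc,
                         bool write_protect, bool enable)
{
    return (addr & 0xFFFC0000u)
         | ((uint32_t)acc << 16)
         | ((mask14 & 0x3FFFu) << 2)
         | (write_protect ? 0x2u : 0x0u)
         | (enable ? 0x1u : 0x0u);
}

/* Software model of the comparison: only address bits selected by MASK are compared. */
bool l2c_mtrr_matches(uint32_t reg, uint32_t addr)
{
    uint32_t mask = (reg >> 2) & 0x3FFFu;
    uint32_t diff = (reg >> 18) ^ (addr >> 18);
    return (reg & 0x1u) != 0u && (diff & mask) == 0u;
}
```

With all 14 mask bits set, an entry covers exactly one 256 KiB region, since address bits 17:0 never take part in the comparison.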
Figure 9. Memory controller connected to AMBA bus and SDRAM
[Block diagram: AHB slave I/F, AHB front-end with EDAC, read/write data buffers, SDR back-end to SDRAM, mem_ifwidth input]
10 SDRAM Memory Controller with Reed-Solomon EDAC
10.1 Overview
The SDRAM memory controller is a 64+32-bit memory controller which is divided into a front-end
and a back-end part.
10.2 Operation
10.2.1 Memory data width
The controller supports a full-width and a half-width mode, selected via the MEM_IFWIDTH input
signal. In full-width mode, the memory bus has 64 data bits and 0, 16 or 32 check bits, depending on
EDAC configuration. In half-width mode, the memory bus has 32 data bits plus 0, 8 or 16 check bits.
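The width combinations above follow a simple pattern: half-width mode halves both the data bits and the check bits. The sketch below computes the total number of memory bus bits; the `edac_level` parameter (0, 1 or 2 for the three check-bit options) is a hypothetical naming, not a register field.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Total number of memory bus bits (data plus check bits).
 * full_width: true for full-width mode (64 data bits), false for half-width (32)
 * edac_level: 0 = no check bits, 1 = smaller check-bit set, 2 = larger check-bit set
 */
unsigned sdram_bus_bits(bool full_width, unsigned edac_level)
{
    assert(edac_level <= 2);
    unsigned data_bits = full_width ? 64u : 32u;
    unsigned check_bits_per_level = full_width ? 16u : 8u;
    return data_bits + edac_level * check_bits_per_level;
}
```

The maximum of 96 bits in full-width mode corresponds to the "64+32-bit" description in the overview.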
10.2.2 Memory access
When an AHB access is done to the controller, the corresponding request is sent to the memory back-end, which performs the access. For read bursts, the controller streams the read data so that each burst item
is delivered to the bus as soon as it arrives, and wait states are added as needed between each part of
the burst.
The controller has a write buffer holding one write access in EDAC configuration, and two write
accesses in non-EDAC configuration. Each write access can be up to the configured burst length in
size. The controller will mask the write latency by storing the data into the write buffer and releasing
the AHB bus immediately. The latency will be seen however if a read access is done before the writes
have completed or an additional write access is made when all buffers are used.
Writes of 32 bits or less will result in a read-modify-write cycle to update the checkbits (this is done
even if EDAC has been disabled in the control register). In this case, the memory controller generates
wait states on the AHB bus until the read part of the cycle has completed.
10.3 Limitations
The AHB front-end with EDAC is optimized for 64/128-bit masters and does not handle 32-bit bursts
efficiently: each access will result in an RMW cycle in the write case, and a read cycle in the read case.
In this device, this case only happens when the Level-2 cache is disabled or set to write-through
mode.