Fujitsu SPARC64 V User Manual

Fujitsu Limited, 4-1-1 Kamikodanaka, Nakahara-ku, Kawasaki, 211-8588, Japan
SPARC JPS1 Implementation Supplement: Fujitsu SPARC64 V
Fujitsu Limited Release 1.0, 1 July 2002
Part No. 806-6755-1.0
Copyright 2002 Sun Microsystems, Inc., 901 San Antonio Road, Palo Alto, California 94303 U.S.A. All rights reserved. Portions of this document are protected by copyright 1994 SPARC International, Inc. This product or document is protected by copyright and distributed under licenses restricting its use, copying, distribution, and decompilation. No part of this product or document may be reproduced in any form by any means without prior written authorization of Sun and its licensors, if any. Third-party software, including font technology, is copyrighted and licensed from Sun suppliers.
Parts of the product may be derived from Berkeley BSD systems, licensed from the University of California. UNIX is a registered trademark in the U.S. and other countries, exclusively licensed through X/Open Company, Ltd.
Sun, Sun Microsystems, the Sun logo, SunSoft, SunDocs, SunExpress, and Solaris are trademarks, registered trademarks, or service marks of Sun Microsystems, Inc. in the U.S. and other countries. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. in the U.S. and other countries. Products bearing SPARC trademarks are based upon an architecture developed by Sun Microsystems, Inc.
The OPEN LOOK and Sun™ Graphical User Interface was developed by Sun Microsystems, Inc. for its users and licensees. Sun acknowledges the pioneering efforts of Xerox in researching and developing the concept of visual or graphical user interfaces for the computer industry. Sun holds a non-exclusive license from Xerox to the Xerox Graphical User Interface, which license also covers Sun’s licensees who implement OPEN LOOK GUIs and otherwise comply with Sun’s written license agreements.
RESTRICTED RIGHTS: Use, duplication, or disclosure by the U.S. Government is subject to restrictions of FAR 52.227-14(g)(2)(6/87) and FAR 52.227-19(6/87), or DFAR 252.227-7015(b)(6/95) and DFAR 227.7202-3(a).
DOCUMENTATION IS PROVIDED “AS IS” AND ALL EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS AND WARRANTIES, INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NON-INFRINGEMENT, ARE DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD TO BE LEGALLY INVALID.
Copyright© 2002 Fujitsu Limited, 4-1-1 Kamikodanaka, Nakahara-ku, Kawasaki, 211-8588, Japan. All rights reserved. This product and related documentation are protected by copyright and distributed under licenses restricting their use, copying, distribution, and decompilation.
No part of this product or related documentation may be reproduced in any form by any means without prior written authorization of Fujitsu Limited and its licensors, if any.
Portions of this product may be derived from the UNIX and Berkeley 4.3 BSD Systems, licensed from UNIX System Laboratories, Inc., a wholly owned subsidiary of Novell, Inc., and the University of California, respectively.
The product described in this book may be protected by one or more U.S. patents, foreign patents, or pending applications. Fujitsu and the Fujitsu logo are trademarks of Fujitsu Limited. This publication is provided “as is” without warranty of any kind, either express or implied, including, but not limited to, the implied warranties of merchantability, fitness for a particular purpose, or noninfringement. This publication could include technical inaccuracies or typographical errors. Changes are periodically added to the information herein; these changes will be incorporated in new editions of the publication. Fujitsu Limited may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time.
Sun Microsystems, Inc.
901 San Antonio
Palo Alto, California, 94303
U.S.A.
http://www.sun.com

Fujitsu Limited
4-1-1 Kamikodanaka
Nakahara-ku, Kawasaki, 211-8588
Japan
http://www.fujitsu.com/

Contents

1. Overview
    Navigating the SPARC64 V Implementation Supplement
    Fonts and Notational Conventions
    The SPARC64 V processor
    Component Overview
    Instruction Control Unit (IU)
    Execution Unit (EU)
    Storage Unit (SU)
    Secondary Cache and External Access Unit (SXU)
2. Definitions
3. Architectural Overview
4. Data Formats
5. Registers
    Nonprivileged Registers
    Floating-Point State Register (FSR)
    Tick (TICK) Register
    Privileged Registers
    Trap State (TSTATE) Register
    Version (VER) Register
    Ancillary State Registers (ASRs)
    Registers Referenced Through ASIs
    Floating-Point Deferred-Trap Queue (FQ)
    IU Deferred-Trap Queue
6. Instructions
    Instruction Execution
    Data Prefetch
    Instruction Prefetch
    Syncing Instructions
    Instruction Formats and Fields
    Instruction Categories
    Control-Transfer Instructions (CTIs)
    Floating-Point Operate (FPop) Instructions
    Implementation-Dependent Instructions
    Processor Pipeline
    Instruction Fetch Stages
    Issue Stages
    Execution Stages
    Completion Stages
7. Traps
    Processor States, Normal and Special Traps
    RED_state
    error_state
    Trap Categories
    Deferred Traps
    Reset Traps
    Uses of the Trap Categories
    Trap Control
    PIL Control
    Trap-Table Entry Addresses
    Trap Type (TT)
    Details of Supported Traps
    Trap Processing
    Exception and Interrupt Descriptions
    SPARC V9 Implementation-Dependent, Optional Traps That Are Mandatory in SPARC JPS1
    SPARC JPS1 Implementation-Dependent Traps
8. Memory Models
    Overview
    SPARC V9 Memory Model
    Mode Control
    Synchronizing Instruction and Data Memory
A. Instruction Definitions: SPARC64 V Extensions
    Block Load and Store Instructions (VIS I)
    Call and Link
    Implementation-Dependent Instructions
    Floating-Point Multiply-Add/Subtract
    Jump and Link
    Load Quadword, Atomic [Physical]
    Memory Barrier
    Partial Store (VIS I)
    Prefetch Data
    Read State Register
    SHUTDOWN (VIS I)
    Write State Register
    Deprecated Instructions
    Store Barrier
B. IEEE Std 754-1985 Requirements for SPARC V9
    Traps Inhibiting Results
    Floating-Point Nonstandard Mode
    fp_exception_other Exception (ftt=unfinished_FPop)
    Operation Under FSR.NS = 1
C. Implementation Dependencies
    Definition of an Implementation Dependency
    Hardware Characteristics
    Implementation Dependency Categories
    List of Implementation Dependencies
D. Formal Specification of the Memory Models
E. Opcode Maps
F. Memory Management Unit
    Virtual Address Translation
    Translation Table Entry (TTE)
    TSB Organization
    TSB Pointer Formation
    Faults and Traps
    Reset, Disable, and RED_state Behavior
    Internal Registers and ASI operations
    Accessing MMU Registers
    I/D TLB Data In, Data Access, and Tag Read Registers
    I/D TSB Extension Registers
    I/D Synchronous Fault Status Registers (I-SFSR, D-SFSR)
    MMU Bypass
    TLB Replacement Policy
G. Assembly Language Syntax
H. Software Considerations
I. Extending the SPARC V9 Architecture
J. Changes from SPARC V8 to SPARC V9
K. Programming with the Memory Models
L. Address Space Identifiers
    SPARC64 V ASI Assignments
    Special Memory Access ASIs
    Barrier Assist for Parallel Processing
    Interface Definition
    ASI Registers
M. Cache Organization
    Cache Types
    Level-1 Instruction Cache (L1I Cache)
    Level-1 Data Cache (L1D Cache)
    Level-2 Unified Cache (L2 Cache)
    Cache Coherency Protocols
    Cache Control/Status Instructions
    Flush Level-1 Instruction Cache (ASI_FLUSH_L1I)
    Level-2 Cache Control Register (ASI_L2_CTRL)
    L2 Diagnostics Tag Read (ASI_L2_DIAG_TAG_READ)
    L2 Diagnostics Tag Read Registers (ASI_L2_DIAG_TAG_READ_REG)
N. Interrupt Handling
    Interrupt Dispatch
    Interrupt Receive
    Interrupt Global Registers
    Interrupt-Related ASR Registers
    Interrupt Vector Dispatch Register
    Interrupt Vector Dispatch Status Register
    Interrupt Vector Receive Register
O. Reset, RED_state, and error_state
    Reset Types
    Power-on Reset (POR)
    Watchdog Reset (WDR)
    Externally Initiated Reset (XIR)
    Software-Initiated Reset (SIR)
    RED_state and error_state
    RED_state
    error_state
    CPU Fatal Error state
    Processor State after Reset and in RED_state
    Operating Status Register (OPSR)
    Hardware Power-On Reset Sequence
    Firmware Initialization Sequence
P. Error Handling
    Error Classification
    Fatal Error
    error_state Transition Error
    Urgent Error
    Restrainable Error
    Action and Error Control
    Registers Related to Error Handling
    Summary of Actions Upon Error Detection
    Extent of Automatic Source Data Correction for Correctable Error
    Error Marking for Cacheable Data Error
    ASI_EIDR
    Control of Error Action (ASI_ERROR_CONTROL)
    Fatal Error and error_state Transition Error
    ASI_STCHG_ERROR_INFO
    Fatal Error Types
    Types of error_state Transition Errors
    Urgent Error
    URGENT ERROR STATUS (ASI_UGESR)
    Action of async_data_error (ADE) Trap
    Instruction End-Method at ADE Trap
    Expected Software Handling of ADE Trap
    Instruction Access Errors
    Data Access Errors
    Restrainable Errors
    ASI_ASYNC_FAULT_STATUS (ASI_AFSR)
    ASI_ASYNC_FAULT_ADDR_D1
    ASI_ASYNC_FAULT_ADDR_U2
    Expected Software Handling of Restrainable Errors
    Handling of Internal Register Errors
    Register Error Handling (Excluding ASRs and ASI Registers)
    ASR Error Handling
    ASI Register Error Handling
    Cache Error Handling
    Handling of a Cache Tag Error
    Handling of an I1 Cache Data Error
    Handling of a D1 Cache Data Error
    Handling of a U2 Cache Data Error
    Automatic Way Reduction of I1 Cache, D1 Cache, and U2 Cache
    TLB Error Handling
    Handling of TLB Entry Errors
    Automatic Way Reduction of sTLB
    Handling of Extended UPA Bus Interface Error
    Handling of Extended UPA Address Bus Error
    Handling of Extended UPA Data Bus Error
Q. Performance Instrumentation
    Performance Monitor Overview
    Sample Pseudocodes
    Performance Monitor Description
    Instruction Statistics
    Trap-Related Statistics
    MMU Event Counters
    Cache Event Counters
    UPA Event Counters
    Miscellaneous Counters
R. UPA Programmer’s Model
    Mapping of the CPUs UPA Port Slave Area
    UPA PortID Register
    UPA Config Register
S. Summary of Differences between SPARC64 V and UltraSPARC-III
Bibliography
    General References
Index
F.CHAPTER 1

Overview

1.1 Navigating the SPARC64 V Implementation Supplement
We suggest that you approach this Implementation Supplement to the SPARC Joint Programming Specification as follows.
1. Familiarize yourself with the SPARC64 V processor and its components by reading these sections:
    The SPARC64 V processor (Section 1.3)
    Component Overview (Section 1.3.1)
    Processor Pipeline (Chapter 6)
2. Study the terminology in Chapter 2, Definitions.
3. For details of architectural changes, see the remaining chapters in this Implementation Supplement as your interests direct.
For this revision, we added new appendixes: Appendix R, UPA Programmer’s Model, and Appendix S, Summary of Differences between SPARC64 V and UltraSPARC-III.

1.2 Fonts and Notational Conventions

Please refer to Section 1.2 of Commonality for font and notational conventions.
1.3 The SPARC64 V processor
The SPARC64 V processor is a high-performance, high-reliability, and high-integrity processor that fully implements the instruction set architecture that conforms to SPARC V9, as described in JPS1 Commonality. In addition, the SPARC64 V processor implements the following features:
64-bit virtual address space and 43-bit physical address space
Advanced RAS features that enable high-integrity error handling
Microarchitecture for High Performance
The SPARC64 V is an out-of-order execution superscalar processor that issues up to four instructions per cycle. Instructions in the predicted path are issued in program order and are stored temporarily in reservation stations until they are dispatched out of program order to appropriate execution units. Instructions commit in program order when no exceptional conditions occur during execution and all prior instructions commit (that is, the result of the instruction execution becomes visible). Out-of-order execution in SPARC64 V contributes to high performance.
SPARC64 V implements a large branch history buffer to predict its instruction path. The history buffer is large enough to sustain a good prediction rate for large-scale programs such as DBMS and to support the advanced instruction fetch mechanism of SPARC64 V. This instruction fetch scheme predicts the execution path beyond multiple conditional branches in accordance with the branch history. It then tries to prefetch as many instructions on the predicted path as possible to reduce the performance penalty caused by instruction cache misses.
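The relationship between out-of-order execution and in-order commit described above can be illustrated with a small model. This is only a sketch under stated assumptions: the function name `commit_order`, the instruction names, and the finish cycles are hypothetical, and the model ignores issue width, reservation stations, and exceptions.

```python
from collections import deque

def commit_order(program, finish_cycle):
    """Return (cycle, instruction) pairs in the order instructions commit.

    program      -- instruction names in program order
    finish_cycle -- hypothetical cycle at which each instruction finishes
                    executing; these may be out of program order
    """
    pending = deque(program)  # oldest uncommitted instruction is at the head
    committed = []
    cycle = 0
    while pending:
        cycle += 1
        # Only the head may commit: an instruction commits once it has
        # finished and every earlier instruction has already committed.
        while pending and finish_cycle[pending[0]] <= cycle:
            committed.append((cycle, pending.popleft()))
    return committed

program = ["load", "add", "mul", "store"]
finish = {"load": 5, "add": 2, "mul": 3, "store": 6}
result = commit_order(program, finish)
```

In this example the younger `add` and `mul` finish before `load`, yet they become visible only in cycle 5, after `load` has committed, so the commit order always matches program order.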
High Integration
SPARC64 V integrates an on-board, associative, level-2 cache. The level-2 cache is unified for instruction and data. It is the lowest layer in the cache hierarchy.
This integration contributes to both performance and reliability of SPARC64 V. It enables shorter access time and more associativity and thus contributes to higher performance. It contributes to higher reliability by eliminating the external connections for level-2 cache.
High Reliability and High Integrity
SPARC64 V implements the following advanced RAS features for reliability and integrity beyond that of ordinary microprocessors.
1. Advanced RAS features for caches
    Strong cache error protection:
        ECC protection for D1 (Data level 1) cache data, U2 (unified level 2) cache data, and the U2 cache tag.
        Parity protection for I1 (Instruction level 1) cache data.
        Parity protection and duplication for the I1 cache tag and the D1 cache tag.
    Automatic correction of all types of single-bit error:
        Automatic single-bit error correction for the ECC protected data.
        Invalidation and refilling of I1 cache data for the I1 cache data parity error.
        Copying from duplicated tag for I1 cache tag and D1 cache tag parity errors.
    Dynamic way reduction while cache consistency is maintained.
    Error marking for cacheable data uncorrectable errors:
        Special error-marking pattern for cacheable data with uncorrectable errors. The identification of the module that first detects the error is embedded in the special pattern.
        Error-source isolation with faulty module identification in the special error-marking. The identification information enables the processor to avoid repetitive error logging for the same error cause.
2. Advanced RAS features for the core
    Strong error protection:
        Parity protection for all data paths.
        Parity protection for most of software-visible registers and internal temporary registers.
        Parity prediction or residue checking for the accumulator output.
    Hardware instruction retry
    Support for software instruction retry (after failure of hardware instruction retry)
    Error isolation for software recovery:
        Error indication for each programmable register group.
        Indication of retryability of the trapped instruction.
        Use of different error traps to differentiate degrees of adverse effects on the CPU and the system.
3. Extended RAS interface to software
    Error classification according to the severity of the effect on program execution:
        Urgent error (nonmaskable): Unable to continue execution without OS intervention; reported through a trap.
        Restrainable error (maskable): OS controls whether the error is reported through a trap, so error does not directly affect program execution.
    Isolated error indication to determine the effect on software
    Asynchronous data error (ADE) trap for additional errors:
        Relaxed instruction end method (precise, retryable, not retryable) for the async_data_error exception to indicate how the instruction should end; depends on the executing instruction and the detected error.
        Some ADE traps that are deferred but retryable.
        Simultaneous reporting of all detected ADE errors at the error barrier for correct handling of retryability.

1.3.1 Component Overview

The SPARC64 V processor contains these components.
    Instruction control Unit (IU)
    Execution Unit (EU)
    Storage Unit (SU)
    Secondary cache and eXternal access Unit (SXU)
FIGURE 1-1 illustrates the major units; the following subsections describe them.
FIGURE 1-1 SPARC64 V Major Units
[Block diagram of the four major units. SX-Unit: UPA interface logic to the Extended UPA Bus, MoveIn buffer, MoveOut buffer, SX order queue, U2 cache tag and data (2M, 4-way). S-Unit: I-TLB and D-TLB (2048 + 32 entries each), level-1 I cache and level-1 D cache (128 KB, 2-way each), store queue. E-Unit: ALU input and output registers, GUB, FUB, GPR, FPR, and the EXA, EXB, FLA, FLB, EAGA, and EAGB pipelines. I-Unit: instruction fetch pipeline, instruction buffer, branch history, reservation stations, commit stack entries, PC, nPC, CCR, FSR.]

1.3.2 Instruction Control Unit (IU)

The IU predicts the instruction execution path, fetches instructions on the predicted path, distributes the fetched instructions to appropriate reservation stations, and dispatches the instructions to the execution pipeline. The instructions are executed out of order, and the IU commits the instructions in order. Major blocks are defined in TABLE 1-1.

TABLE 1-1 Instruction Control Unit Major Blocks

Name                        Description
Instruction fetch pipeline  Five stages: fetch address generation, iTLB access, iTLB match,
                            I-Cache fetch, and a write to I-buffer.
Branch history              16K entries, 4-way set associative.
Instruction buffer          Six entries, 32 bytes/entry.
Reservation station         Six reservation stations to hold instructions until they can
                            execute: RSBR for branch and the other control-transfer
                            instructions; RSA for load/store instructions; RSEA and RSEB
                            for integer arithmetic instructions; RSFA and RSFB for
                            floating-point arithmetic and VIS instructions.
Commit stack entries        Sixty-four entries; basically one instruction/entry, to hold
                            information about instructions issued but not yet committed.
PC, nPC, CCR, FSR           Program-visible registers for instruction execution control.
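The assignment of instruction classes to reservation stations listed in TABLE 1-1 can be written down as a lookup table. This is a minimal sketch: the station names come from the table, but the class labels used as keys and the function `candidate_stations` are hypothetical names chosen for illustration, and the sketch does not model how the hardware chooses between the two stations of a pair.

```python
# Station names follow TABLE 1-1; the dictionary keys are hypothetical
# labels for the instruction classes named in the table.
STATIONS = {
    "branch": ("RSBR",),               # branch and other control transfers
    "load_store": ("RSA",),            # load/store instructions
    "integer": ("RSEA", "RSEB"),       # integer arithmetic
    "floating_point": ("RSFA", "RSFB"),
    "vis": ("RSFA", "RSFB"),           # VIS shares the floating-point stations
}

def candidate_stations(instr_class):
    """Return the reservation stations that can hold this instruction class."""
    try:
        return STATIONS[instr_class]
    except KeyError:
        raise ValueError(f"unknown instruction class: {instr_class}") from None

# Collect every distinct station named in the table.
all_stations = sorted({name for names in STATIONS.values() for name in names})
```

Counting the distinct names in `all_stations` gives the six reservation stations that the table describes.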

1.3.3 Execution Unit (EU)

The EU carries out execution of all integer arithmetic, logical, shift instructions, all floating-point instructions, and all VIS graphic instructions. TABLE 1-2 describes the EU major blocks.

TABLE 1-2 Execution Unit Major Blocks

Name                                      Description
General register (gr) renaming register   Thirty-two entries, 8 read ports, 2 write ports
file (GUB: gr update buffer)
Gr architecture register file (GPR)       160 entries, 1 read port, 2 write ports
Floating-point (fr) renaming register     Thirty-two entries, 8 read ports, 2 write ports
file (FUB: fr update buffer)
Fr architecture register file (FPR)       Thirty-two entries, 6 read ports, 2 write ports
EU control logic                          Controls the instruction execution stages:
                                          instruction selection, register read, and execution.
Interface registers                       Input/output registers to other units.
Two integer execution pipelines           64-bit ALU and shifters.
(EXA, EXB)
Two floating-point and graphics           Each floating-point execution pipeline can execute
execution pipelines (FLA, FLB)            floating-point multiply, floating-point add/sub,
                                          floating-point multiply and add, floating-point
                                          div/sqrt, and floating-point graphics instructions.
Two virtual address adders for memory     Two 64-bit virtual addresses for load/store.
access pipeline (EAGA, EAGB)

1.3.4 Storage Unit (SU)

The SU handles all sourcing and sinking of data for load and store instructions. TABLE 1-3 describes the SU major blocks.

TABLE 1-3 Storage Unit Major Blocks

Name                            Description
Instruction level-1 cache       128-Kbyte, 2-way associative, 64-byte line; provides low
                                latency instruction source.
Data level-1 cache              128-Kbyte, 2-way associative, 64-byte line, writeback;
                                provides the low latency data source for loads and stores.
Instruction Translation Buffer  1024 entries, 2-way associative TLB for 8-Kbyte pages,
                                1024 entries, 2-way associative TLB for 4-Mbyte pages [1],
                                32 entries, fully associative TLB for unlocked 64-Kbyte,
                                512-Kbyte, 4-Mbyte [1] pages and locked pages in all sizes.
Data Translation Buffer         1024 entries, 2-way associative TLB for 8-Kbyte pages,
                                1024 entries, 2-way associative TLB for 4-Mbyte pages [1],
                                32 entries, fully associative TLB for unlocked 64-Kbyte,
                                512-Kbyte, 4-Mbyte [1] pages and locked pages in all sizes.
Store queue                     Decouples the pipeline from the latency of store operations.
                                Allows the pipeline to continue flowing while the store
                                waits for data, and eventually writes into the data level 1
                                cache.

[1] Unlocked 4-Mbyte page entry is stored either in 2-way associative TLB or fully associative TLB exclusively, depending on the setting.
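The division of translations among the three TLB arrays in TABLE 1-3 can be summarized as a small decision function. This is a sketch only: the function `tlb_structure`, the constant names, and the boolean parameter `four_mb_in_set_assoc` (standing in for "the setting" mentioned in the table footnote) are hypothetical and do not correspond to any register or field defined in this supplement.

```python
KB = 1024
MB = 1024 * KB

SET_ASSOC_8K = "1024-entry 2-way TLB (8-Kbyte pages)"
SET_ASSOC_4M = "1024-entry 2-way TLB (4-Mbyte pages)"
FULLY_ASSOC = "32-entry fully associative TLB"

PAGE_SIZES = (8 * KB, 64 * KB, 512 * KB, 4 * MB)

def tlb_structure(page_size, locked, four_mb_in_set_assoc=True):
    """Return the TLB array of TABLE 1-3 that would hold a translation.

    four_mb_in_set_assoc is a hypothetical stand-in for the setting that
    decides where unlocked 4-Mbyte entries are stored.
    """
    if page_size not in PAGE_SIZES:
        raise ValueError(f"unsupported page size: {page_size}")
    if locked:
        return FULLY_ASSOC            # locked pages of all sizes
    if page_size == 8 * KB:
        return SET_ASSOC_8K
    if page_size == 4 * MB:
        return SET_ASSOC_4M if four_mb_in_set_assoc else FULLY_ASSOC
    return FULLY_ASSOC                # unlocked 64-Kbyte and 512-Kbyte pages

# Total entries per translation buffer, matching "2048 + 32" in FIGURE 1-1.
total_entries = 1024 + 1024 + 32
```

For example, `tlb_structure(64 * KB, locked=False)` selects the fully associative array, while an unlocked 8-Kbyte page always lands in the first 2-way array.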

1.3.5 Secondary Cache and External Access Unit (SXU)

The SXU controls the operation of unified level-2 caches and the external data access interface (extended UPA interface). TABLE 1-4 describes the major blocks of the SXU.

TABLE 1-4 Secondary Cache and External Access Unit Major Blocks

Name                         Description
Unified level-2 cache        2-Mbyte, 4-way associative, 64-byte line, writeback; provides
                             low latency data source for both instruction level-1 cache and
                             data level-1 cache.
Movein buffer                Sixteen entries, 64-bytes/entry; catches returning data from
                             memory system in response to the cache line read request. A
                             maximum of 16 outstanding cache read operations can be issued.
Moveout buffer               Eight entries, 64-bytes/entry; holds writeback data. A maximum
                             of 8 outstanding writeback requests can be issued.
Extended UPA interface       Send/receive transaction packets to/from Extended UPA
control logic                interface connected to the system.
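The cache sizes, associativities, and line sizes given in TABLE 1-3 and TABLE 1-4 determine the number of sets in each cache. The sketch below works through that arithmetic. It assumes power-of-two sizes and a conventional split of an address into line offset and set index; the supplement text in this chapter does not state how the hardware actually indexes the caches, and the function name `cache_geometry` is hypothetical.

```python
def cache_geometry(size_bytes, ways, line_bytes):
    """Return (sets, offset_bits, index_bits) for a set-associative cache.

    Assumes all three parameters are powers of two.
    """
    sets = size_bytes // (ways * line_bytes)
    offset_bits = line_bytes.bit_length() - 1   # log2(line size)
    index_bits = sets.bit_length() - 1          # log2(number of sets)
    return sets, offset_bits, index_bits

# Level-1 caches: 128 Kbytes, 2-way, 64-byte lines (TABLE 1-3).
l1 = cache_geometry(128 * 1024, 2, 64)
# Unified level-2 cache: 2 Mbytes, 4-way, 64-byte lines (TABLE 1-4).
l2 = cache_geometry(2 * 1024 * 1024, 4, 64)
# Bytes that a full movein buffer can hold: 16 entries of 64 bytes.
movein_bytes = 16 * 64
```

Under these assumptions each level-1 cache has 1024 sets (10 index bits) and the level-2 cache has 8192 sets (13 index bits), with 6 offset bits for the 64-byte line in both cases.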
F.CHAPTER 2

Definitions

This chapter defines concepts unique to the SPARC64 V, the Fujitsu implementation of SPARC JPS1. For definition of terms that are common to all implementations, please refer to Chapter 2 of Commonality.
committed Term applied to an instruction when it has completed w ithout error and all
prior instructions have completed without error and have been committed. When an instruction is committ ed, the state of the machine is permanently change d to reflect the result of the ins truction; the previously existing state is no longer needed and can be disca rded.
completed Term applied to an instruction after it has finished, has sent a no nerror stat us to
the issue unit, and all of its source operands are nonspeculative. Note: Although the state of the machine has been temporarily altered by completion of an instruction, th e state has not y et been permanentl y changed and the old state can be recovered until the instruction has been committed.
executed Te rm applied to an instru ction that ha s been proces sed by an execution u nit
such as a load unit. An instruction is in execution as long as it is still being processed by an execution unit.
fetched Term applied to an instruction that is obtained from the I2 instruction cache or
from the on-chip internal cache an d sent to the i ssue unit.
finished Term applied to an instruction when it has completed execution in a functional
unit and has forwarded its result onto a result bus. Results on the result bus are transferred to the register file, as are the waiting instructions in the instruction queues.
initiated Term applied to an inst ruct ion wh en it ha s al l of t he resources that it nee ds (fo r
example, source operands) and has been selected for execution.
instruction dispatch Synonym: instruction initiation.
instruction issued Term applied to an instruction whe n it has bee n dispatched to a reservation
station.
9
instruction retired Term applied to an instruct ion when all machine resources (s erial numbers,
renamed registers) have been reclaimed and are avai lable for use by o ther instructions. An instruc tion can only be retired after it has been committed.
instruction stall Term applied to an instruc tion that is not allowed to be issu ed. Not every
instruction can be issued in a given cycle. The SPARC64 V implementation imposes certain issue constraints bas ed on resource availability and program requirements.
issue-stalling
instruction An instruction that prevents ne w instructions from being is sued until it has
committed.
machine sync The state of a machine when all previously executing instructions have
committed; that is, when no i ssued but uncommitted instructions are in the machine.
Memory Manageme nt
Unit (MMU) Refers to the address translation h ardware in SPARC6 4 V that translates 64-bit
virtual address in to physica l address. The MMU is co mposed of the mITLB, mDTLB, uITLB, uDTLB, and the ASI registers used to manage address translation.
mTLB Main TLB. Sp lit into I and D, cal led m ITLB and mD TLB, respe ctive ly. Contains
address translations for the uITLB and uDTLB . When the uITLB o r uDTLB do not contain a translatio n, they ask the mTLB for the translation. If the mTLB contains the translation, it sends the tran slation to the respective uTLB. If the mTLB does not contain the translation, it ge nerates a fast access exceptio n to a software translation trap handler, which will load the translation information (TTE) into the mTLB and retry the access. See also TLB.
uDTLB Micro Data TLB. A small, fully associative buffer that contains address
translations for data accesses. Misses in the uDTLB are handled by the mTLB.
uITLB Micro Instruction TLB. A small , fully associ ative buffer that contai ns address
translations for instruction accesses. Misses in the uTLB are han dled by th e mTLB.
nonspeculative A distri bution syst em whereby a result i s guaranteed known cor rect or an
operand stat e is known to be valid . SPARC64 V employs sp eculative distribution, meaning that results can be distributed from functional units before the point at which guaranteed validity of the result is known.
reclaimed The status when all instruction-related resources that were held until commit
have been released and are availabl e for subsequent instructions. Instruct ion resources are usually reclaimed a few cycles after they are committed.
rename registers A large set of hardware registers implemented by SPARC64 V that are invisible
to the programmer. Before instructions are issued, source and destination registers are mapped on to this s et of rename registers. This al lows inst ructions that normally would be blocked, waiting for an architected register, to proceed
10 SPARC JPS1 Implementation Supplement: Fujitsu SPARC64 V Release 1.0, 1 July 2002
in parallel . When i nstruction s are committed, results in renamed registers are posted to the architected registers in the proper sequence to produce the correct program results.
scan A method used to initialize all of the machine state within a chip. In a chip that
has been desi gned to be scann able, a ll of t he machin e state is conne cted i n one or several loops c alled scan ri ngs. Initi alization data can be sca nned into the chip through the scan rings. The state of the machine also can be s canned out through the scan rings.
reservation station A holding location that b uffers di spatc h ed in struc ti ons u nt il al l i nput o pera nds
are available. SPARC64 V implements dataflow execution based on operand availability. When opera nds are availabl e, the in structions in the reservation station are scheduled for ex ecution. Reservati on stations also contai n special tag-matching logic that captures the appropriate operand data. Reservation stations are sometimes referred to as queues (for example, the integer queue).
speculative A distribution syst em whereby a result is not g uaranteed as kn own to be
correct or an operan d state is not known to be valid. SPARC64 V employs speculative distribution, meaning results can be distributed from functional units before the point at which guaranteed validity of the result is known.
superscalar An implementation that allows several instructions to be issued, executed, and
committed in one clock cycle. SPARC64 V issues up to 4 instructions per clock cycle.
sync Synonym: machine sync.
syncing instruction An instruction that causes a machine sync. Thus, before a syncing instruction is
issued, all previous instructions (in program order) must have been committed. At that point, the syncing instruction is issued, executed, completed, and committed by itself.
TLB Translation lookaside buffer.
F.CHAPTER
3

Architectural Overview

Please refer to Chapter 3 in the Commonality section of SPARC Joint Programming Specification.
F.CHAPTER
4

Data Formats

Please refer to Chapter 4, Data Formats in Commonality.
F.CHAPTER
5

Registers

The SPARC64 V processor includes two types of registers: general-purpose (that is, working, data, control/status) and ASI registers.
The SPARC V9 architecture also defines two implementation-dependent registers: the IU Deferred-Trap Queue and the Floating-Point Deferred-Trap Queue (FQ); SPARC64 V does not need or contain either queue. All processor traps caused by instruction execution are precise, and there are several disrupting traps caused by asynchronous events, such as interrupts, asynchronous error conditions, and RED_state entry traps.
For general information, please see parallel subsections of Chapter 5 in Commonality. For easier referencing, this chapter follows the organization of Chapter 5 in Commonality.
For information on MMU registers, please refer to Section F.10, Internal Registers and ASI operations, on page 92.
The chapter contains these sections:
Nonprivileged Registers on page 17
Privileged Registers on page 19

5.1 Nonprivileged Registers

Most of the definitions for the registers are as described in the corresponding sections of Commonality. Only SPARC64 V-specific features are described in this section.

5.1.7 Floating-Point State Register (FSR)

Please refer to Section 5.1.7 of Commonality for the description of FSR. The sections below describe SPARC64 V-specific features of the FSR register.
FSR_nonstandard_fp (NS)
SPARC V9 defines the FSR.NS bit which, when set to 1, causes the FPU to produce implementation-dependent results that may not conform to IEEE Std 754-1985. SPARC64 V implements this bit.
When FSR.NS = 1, denormal input operands and denormal results that would otherwise trap are flushed to 0 of the same sign and an inexact exception is signalled (which may be masked by FSR.TEM.NXM). See Section B.6, Floating-Point Nonstandard Mode, on page 61 for details.
When FSR.NS = 0, the normal IEEE Std 754-1985 behavior is implemented.
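The flush behavior under FSR.NS = 1 can be illustrated with a small software model. This is only a sketch: the function names `is_denormal` and `flush_denormal` and the returned inexact flag are assumptions made for illustration, every denormal is flushed (the "would otherwise trap" condition from Section B.6 is not modelled), and only double precision is covered.

```python
import math
import struct

def is_denormal(x: float) -> bool:
    """True if x is a nonzero IEEE 754 double whose exponent field is zero."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    exponent = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)
    return exponent == 0 and fraction != 0

def flush_denormal(x: float, ns: int):
    """Return (value, inexact) for one operand or result.

    With ns == 1 a denormal becomes a zero of the same sign and the
    inexact condition is reported; otherwise the value is unchanged.
    """
    if ns == 1 and is_denormal(x):
        return math.copysign(0.0, x), True
    return x, False
```

Whether the reported inexact condition actually traps depends on FSR.TEM.NXM, which this model leaves to the caller.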
FSR_version (ver)
For each SPARC V9 IU implementation (as identified by its VER.impl field), there may be one or more FPU implementations or none. This field identifies the particular FPU implementation present. For the first SPARC64 V, FSR.ver = 0 (impl. dep. #19); however, future versions of the architecture may set FSR.ver to other values. Consult the SPARC64 V Data Sheet for the setting of FSR.ver for your chipset.
FSR_floating-point_trap_type (ftt)
The complete conditions under which SPARC64 V triggers fp_exception_other with trap type unfinished_FPop are described in Section B.6, Floating-Point Nonstandard Mode, on page 61 (impl. dep. #248).
FSR_current_exception (cexc)
Bits 4 through 0 indicate that one or more IEEE_754 floating-point exceptions were generated by the most recently executed FPop instruction. The absence of an exception causes the corresponding bit to be cleared.
In SPARC64 V, the cexc bits are set according to the following pseudocode:
    if (<LDFSR or LDXFSR commits>)
        <update using data from LDFSR or LDXFSR>;
    else if (<FPop commits with ftt = 0>)
        <update using value from FPU>;
    else if (<FPop commits with IEEE_754_exception>)
        <set one bit in the CEXC field as supplied by FPU>;
    else if (<FPop commits with unfinished_FPop error>)
        <no change>;
    else if (<FPop commits with unimplemented_FPop error>)
        <no change>;
    else
        <no change>;
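The pseudocode above can be restated as a small executable model. The sketch below is illustrative only: the function name `next_cexc` and the event strings are hypothetical, and the IEEE_754_exception case is read as "cexc becomes the single bit supplied by the FPU", which is an interpretation rather than a statement from the manual.

```python
def next_cexc(cexc: int, event: str, data: int = 0) -> int:
    """Return the 5-bit FSR.cexc value after one committed instruction.

    event values (hypothetical labels):
      "ldfsr"     LDFSR or LDXFSR commits; data holds the loaded cexc bits
      "fpop_ok"   FPop commits with ftt = 0; data is the value from the FPU
      "ieee_trap" FPop commits with IEEE_754_exception; data is the one
                  bit supplied by the FPU
      any other   unfinished_FPop, unimplemented_FPop, or a non-FP
                  instruction: cexc is left unchanged
    """
    if event in ("ldfsr", "fpop_ok", "ieee_trap"):
        return data & 0x1F
    return cexc
```

For example, an FPop that completes without any exception clears all five bits, while an unfinished_FPop leaves the previous value in place.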
FSR Conformance
SPARC V9 allows the TEM, cexc, and aexc fields to be implemented in hardware in either of two ways (both of which comply with IEEE Std 754-1985). SPARC64 V follows case (1); that is, it implements all three fields in conformance with IEEE Std 754-1985. See FSR Conformance in Section 5.1.7 of Commonality for more information about other implementation methods.

5.1.9 Tick (TICK) Register

SPARC64 V implements the TICK.counter register as a 63-bit register (impl. dep. #105).
Implementation Note – On SPARC64 V, the counter part of the value returned when the TICK register is read is the value of TICK.counter when the RDTICK instruction is executed. The difference between the counter values read from the TICK register on two reads reflects the number of processor cycles executed between the executions of the RDTICK instructions, not their commits. In longer code sequences, the difference between this value and the value that would have been obtained at commit time is small.
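As a sketch of how software might turn two TICK reads into a cycle count, the following model masks each value to the 63-bit counter and subtracts modulo 2^63. The helper names are hypothetical, and the code assumes the counter occupies bits 62:0 with bit 63 holding TICK.NPT, as in SPARC V9.

```python
TICK_COUNTER_MASK = (1 << 63) - 1   # TICK.counter, bits 62:0

def tick_counter(tick_value: int) -> int:
    """Strip bit 63 (NPT) and keep only the 63-bit counter."""
    return tick_value & TICK_COUNTER_MASK

def elapsed_cycles(first_read: int, second_read: int) -> int:
    """Cycles between two RDTICK results, tolerating one counter wrap."""
    return (tick_counter(second_read) - tick_counter(first_read)) & TICK_COUNTER_MASK
```

The result measures the interval between the executions of the two RDTICK instructions, so for very short sequences it can differ slightly from a commit-to-commit measurement.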

5.2 Privileged Registers

Please refer to Section 5.2 of Commonality for the description of privileged registers.

5.2.6 Trap State (TSTATE) Register

SPARC64 V implements only bits 2:0 of the TSTATE.CWP field. Writes to bits 4 and 3 are ignored, and reads of these bits always return zeroes.
Note – Spurious setting of the PSTATE.RED bit by privileged software should not be performed, since it will take the SPARC64 V into RED_state without the required sequencing.

5.2.9 Version (VER) Register

TABLE 5-1 shows the values for the VER register for SPARC64 V.
TABLE 5-1 VER Register Encodings
Bits     Field    Value
63:48    manuf    0004₁₆ (impl. dep. #104)
47:32    impl     5 (impl. dep. #13)
31:24    mask     n (The value of n depends on the processor chip version)
15:8     maxtl    5
4:0      maxwin   7
The manuf field contains Fujitsu's 8-bit JEDEC code in the lower 8 bits and zeroes in the upper 8 bits. The manuf, impl, and mask fields are implemented so that they may change in future SPARC64 V processor versions. The mask field is incremented by 1 any time a programmer-visible revision is made to the processor. See the SPARC64 V Data Sheet to determine the current setting of the mask field.
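The field positions in TABLE 5-1 translate directly into shifts and masks. The sketch below decodes a VER value that software has already read; the function name `decode_ver` and the example mask revision of 1 are assumptions for illustration only.

```python
def decode_ver(ver: int) -> dict:
    """Split a 64-bit VER value into the fields listed in TABLE 5-1."""
    return {
        "manuf":  (ver >> 48) & 0xFFFF,   # bits 63:48
        "impl":   (ver >> 32) & 0xFFFF,   # bits 47:32
        "mask":   (ver >> 24) & 0xFF,     # bits 31:24
        "maxtl":  (ver >> 8) & 0xFF,      # bits 15:8
        "maxwin": ver & 0x1F,             # bits 4:0
    }

# Example value assembled from TABLE 5-1 with a hypothetical mask of 1.
example_ver = (0x0004 << 48) | (5 << 32) | (1 << 24) | (5 << 8) | 7
fields = decode_ver(example_ver)
```

Because manuf, impl, and mask may change in later processor versions, code that identifies the processor should compare against values taken from the data sheet rather than hard-coding the example above.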

5.2.11 Ancillary State Registers (ASRs)

Please refer to Section 5.2.11 of Commonality for details of the ASRs.
Performance Control Register (PCR) (ASR 16)
SPARC64 V implements the PCR register as described in SPARC JPS1 Commonality, with additional features as described in this section.
In SPARC64 V, the accessibility of PCR when PSTATE.PRIV = 0 is determined by PCR.PRIV. If PSTATE.PRIV = 0 and PCR.PRIV = 1, an attempt to execute either RDPCR or WRPCR will cause a privileged_action exception. If PSTATE.PRIV = 0 and PCR.PRIV = 0, RDPCR operates without privilege violation and WRPCR causes a privileged_action exception only when an attempt is made to change (that is, write 1 to) PCR.PRIV (impl. dep. #250). See Appendix Q, Performance Instrumentation, for a detailed discussion of the PCR and PIC register usage and event count definitions.
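The access rule for PCR can be written as a short decision function. This is a sketch: the name `pcr_access`, the "rd"/"wr" labels, and the string results are hypothetical, and the assumption that privileged software (PSTATE.PRIV = 1) always succeeds is taken from the general JPS1 model rather than stated in this section.

```python
def pcr_access(pstate_priv: int, pcr_priv: int, op: str, new_priv: int = 0) -> str:
    """Return "ok" or "privileged_action" for RDPCR ("rd") or WRPCR ("wr").

    new_priv is the PRIV bit contained in the value being written by WRPCR.
    """
    if pstate_priv == 1:
        return "ok"                    # assumed: privileged code is unrestricted
    if pcr_priv == 1:
        return "privileged_action"     # user access while PCR.PRIV = 1
    if op == "wr" and new_priv == 1:
        return "privileged_action"     # user attempt to write 1 to PCR.PRIV
    return "ok"
```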
The Performance Control Register in SPARC64 V is illustrated in FIGURE 5-1 and described in TABLE 5-2.
    Bits:   63:48  47:32  31:27  26    25  24:22  21  20:18  17  16:11  10  9:4  3     2   1   0
    Field:  0      OVF    0      OVRO  0   NC     0   SC     0   SU     0   SL   ULRO  UT  ST  PRIV
FIGURE 5-1 SPARC64 V Performance Control Register (PCR) (ASR 16)
TABLE 5-2 PCR Bit Description
Bit      Field   Description
47:32    OVF     Overflow Clear/Set/Status. Used to read counter overflow status (via RDPCR) and clear or set counter overflow status bits (via WRPCR). PCR.OVF is a SPARC64 V-specific field (impl. dep. #207).
                 The following layout depicts the SPARC64 V OVF field for four counter pairs. Counter status bits are cleared on write of 0 to the appropriate OVF bit.
                     Bit within OVF:  15:8  7   6   5   4   3   2   1   0
                     Status bit:      0     U3  L3  U2  L2  U1  L1  U0  L0
26       OVRO    Overflow read-only. Write-only/read-as-zero field specifying PCR.OVF update behavior for WRPCR.PCR. The OVRO field is implementation-dependent (impl. dep. #207). WRPCR.PCR with PCR.OVRO = 1 inhibits updating of PCR.OVF for the current write only. The intention of PCR.OVRO is to write PCR while preserving the current PCR.OVF value. PCR.OVF is maintained internally by hardware, so a subsequent RDPCR.PCR returns accurate overflow status at the time.
24:22    NC      Number of counter pairs. Three-bit, read-only field specifying the number of counter pairs, encoded as 0–7 for 1–8 counter pairs (impl. dep. #207). For SPARC64 V, the hardcoded value of NC is 3 (indicating presence of 4 counter pairs).
20:18    SC      Select PIC. In SPARC64 V, three-bit field specifying which counter pair is currently selected as PIC (ASR 17) and which SU/SL values are visible to software. On write, PCR.SC selects which counter pair is updated (unless PCR.ULRO is set; see below). On read, PCR.SC selects which counter pair is to be read through PIC (ASR 17).
16:11    SU      Defined (as S1) in SPARC JPS1 Commonality.
9:4      SL      Defined (as S0) in SPARC JPS1 Commonality.
3        ULRO    Implementation-dependent field (impl. dep. #207) that specifies whether SU/SL are read-only. In SPARC64 V, this field is write-only/read-as-zero, specifying update behavior of SU/SL on write. When PCR.ULRO = 1, SU/SL are considered as read-only; the values set on PCR.SU/PCR.SL are not written into SU/SL. When PCR.ULRO = 0, SU/SL are updated. PCR.ULRO is intended to switch the visible PIC by writing PCR.SC, without affecting the current selection of SU/SL of that PIC. On PCR read, PCR.SU/PCR.SL always shows the current setting of the PIC regardless of PCR.ULRO.
2        UT      Defined in SPARC JPS1 Commonality.
1        ST      Defined in SPARC JPS1 Commonality.
0        PRIV    Defined in SPARC JPS1 Commonality, with the additional function of controlling PCR accessibility as described above (impl. dep. #250).
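The interaction of SC, ULRO, and OVRO is easier to see with concrete bit manipulation. The following sketch builds the value a WRPCR might carry to switch the visible counter pair without disturbing that pair's SU/SL selection or the overflow status; the helper names and the field dictionary are hypothetical, and the field positions are those of TABLE 5-2.

```python
PCR_FIELDS = {            # name: (low bit, width), from TABLE 5-2
    "OVF": (32, 16), "OVRO": (26, 1), "NC": (22, 3), "SC": (18, 3),
    "SU": (11, 6), "SL": (4, 6), "ULRO": (3, 1), "UT": (2, 1),
    "ST": (1, 1), "PRIV": (0, 1),
}

def get_field(pcr: int, name: str) -> int:
    low, width = PCR_FIELDS[name]
    return (pcr >> low) & ((1 << width) - 1)

def set_field(pcr: int, name: str, value: int) -> int:
    low, width = PCR_FIELDS[name]
    mask = ((1 << width) - 1) << low
    return (pcr & ~mask) | ((value << low) & mask)

def select_pair_value(current_pcr: int, pair: int) -> int:
    """Value to write so that counter pair `pair` becomes visible through PIC."""
    value = set_field(current_pcr, "SC", pair)
    value = set_field(value, "ULRO", 1)   # SU/SL in this write are not stored
    value = set_field(value, "OVRO", 1)   # OVF in this write is not stored
    return value
```

Setting both ULRO and OVRO is a design choice for this example: it makes the write affect only the pair selection (and the UT/ST/PRIV bits carried over from `current_pcr`).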
Performance Instrumentation Counter (PIC) Register (ASR 17)
The PIC register is implemented as described in SPARC JPS1 Commonality. Four PICs are implemented in SPARC64 V. Each is accessed through ASR 17, using PCR.SC as a select field. Read/write access to the PIC will access the PICU/PICL counter pair selected by PCR. For PICU/PICL encodings of specific event counters, see Appendix Q, Performance Instrumentation.
Counter Overflow. On overflow, counters wrap to 0, SOFTINT register bit 15 is set, and an interrupt level-15 exception is generated. The counter overflow trap is triggered on the transition from value FFFF FFFF₁₆ to value 0. If multiple overflows are generated simultaneously, then multiple overflow status bits will be set. If overflow status bits are already set, then they remain set on counter overflow.
Overflow status bits are cleared by software writing 0 to the appropriate bit of PCR.OVF and may be set by writing 1 to the appropriate bit. Setting these bits by software does not generate a level 15 interrupt.
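A toy software model can make the overflow rules concrete. Everything named here (`PicModel`, `count`, `write_ovf`) is hypothetical, and the placement of the status bits (bit 2n for PICL of pair n, bit 2n+1 for PICU of pair n) is an assumption based on the OVF layout in TABLE 5-2.

```python
class PicModel:
    """Four PICU/PICL pairs with PCR.OVF status bits and SOFTINT bit 15."""

    def __init__(self):
        self.counters = [[0, 0] for _ in range(4)]   # [PICL, PICU] per pair
        self.ovf = 0
        self.softint = 0

    def count(self, pair: int, upper: bool, events: int = 1) -> None:
        idx = 1 if upper else 0
        total = self.counters[pair][idx] + events
        if total > 0xFFFFFFFF:                   # transition past FFFF FFFF
            self.ovf |= 1 << (2 * pair + idx)    # bits already set stay set
            self.softint |= 1 << 15              # level-15 interrupt request
        self.counters[pair][idx] = total & 0xFFFFFFFF

    def write_ovf(self, value: int) -> None:
        """Software write of PCR.OVF: sets or clears status, no interrupt."""
        self.ovf = value & 0xFF
```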
Dispatch Control Register (DCR) (ASR 18)
The DCR is not implemented in SPARC64 V. Zero is returned on read, and writes to the register are ignored. The DCR is a privileged register; attempted access by nonprivileged (user) code generates a privileged_opcode exception.

5.2.12 Registers Referenced Through ASIs

Data Cache Unit Control Register (DCUCR)
ASI 45₁₆ (ASI_DCU_CONTROL_REGISTER), VA = 0₁₆.
The Data Cache Unit Control Register contains fields that control several memory-related hardware functions. The functions include Instruction, Prefetch, write and data caches, MMUs, and watchpoint setting. SPARC64 V implements most of DCUCR's functions described in Section 5.2.12 of Commonality.
After a power-on reset (POR), all fields of DCUCR, including implementation-dependent fields, are set to 0. After a WDR, XIR, or SIR reset, all fields of DCUCR, including implementation-dependent fields, are set to 0.
The Data Cache Unit Control Register is illustrated in FIGURE 5-2 and described in TABLE 5-3. In the table, bits are grouped by function rather than by strict bit sequence.
    Bits:   63:50  49:48  47:42                     41         40:33  32:25  24  23  22  21  20:4  3   2   1  0
    Field:  0      0      Implementation dependent  WEAK_SPCA  PM     VM     PR  PW  VR  VW  0     DM  IM  0  0
FIGURE 5-2 DCU Control Register Access Data Format (ASI 45₁₆)
TABLE 5-3 DCUCR Description
Bits     Field       Type   Use Description
49:48    CP, CV      RW     Not implemented in SPARC64 V (impl. dep. #232). It reads as 0 and writes to it are ignored.
47:42    impl. dep.         Not used. It reads as 0 and writes to it are ignored.
41       WEAK_SPCA   RW     Used for disabling speculative memory access (impl. dep. #240). When DCUCR.WEAK_SPCA = 1, the branch history table is cleared and no longer issues aggressive instruction prefetch.
                            While DCUCR.WEAK_SPCA = 1, aggressive instruction prefetching is disabled and any load and store instructions are considered presync instructions that are executed when all previous instructions are committed. Because all CTI are considered as not taken, instructions residing beyond 1 Kbyte of a CTI may be fetched and executed.
                            On entering aggressive instruction prefetch disable mode, supervisor software should issue membar #Sync, to make sure all in-flight instructions in the pipeline are discarded.
                            While DCUCR.WEAK_SPCA = 1, an L2 cache flush by writing 1 to ASI_L2_CTRL.U2_FLUSH remains pending internally until DCUCR.WEAK_SPCA is set to 0. To wait for completion of the cache flush, a membar #Sync must be issued after DCUCR.WEAK_SPCA is set to 0. Executing a membar #Sync while DCUCR.WEAK_SPCA = 1 after writing 1 to ASI_L2_CTRL.U2_FLUSH does not wait for the cache flush to complete.
40:33    PM<7:0>            Defined in SPARC JPS1 Commonality.
32:25    VM<7:0>            Defined in SPARC JPS1 Commonality.
24, 23   PR, PW             Defined in SPARC JPS1 Commonality.
22, 21   VR, VW             Defined in SPARC JPS1 Commonality.
20:4                        Reserved.
3        DM                 Defined in SPARC JPS1 Commonality.
2        IM                 Defined in SPARC JPS1 Commonality.
1        DC          RW     Not implemented in SPARC64 V (impl. dep. #252). It reads as 0 and writes to it are ignored.
0        IC          RW     Not implemented in SPARC64 V (impl. dep. #253). It reads as 0 and writes to it are ignored.
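The "reads as 0, writes ignored" fields in TABLE 5-3 can be modelled as a mask applied to whatever software writes. The sketch below is an illustration with hypothetical names; it covers only the fields the table explicitly describes as reading 0 and does not model the reserved bits 63:50 and 20:4.

```python
READS_AS_ZERO = (
    (0b11 << 48)      # CP, CV (impl. dep. #232)
    | (0x3F << 42)    # bits 47:42, implementation dependent, not used
    | (1 << 1)        # DC (impl. dep. #252)
    | (1 << 0)        # IC (impl. dep. #253)
)
WEAK_SPCA = 1 << 41   # disables speculative memory access when set

def dcucr_after_write(value: int) -> int:
    """Value a later read would return for the fields modelled here."""
    return value & ~READS_AS_ZERO & ((1 << 64) - 1)
```

For instance, a write that sets WEAK_SPCA together with the CP and CV bits would read back with only WEAK_SPCA set.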
Data Watchpoint Registers
No implementation-dependent feature of SPARC64 V reduces the reliability of data watchpoints (impl. dep. #244).
SPARC64 V employs a conservative check of PA/VA watchpoints on partial store instructions. See Section A.42, Partial Store (VIS I), on page 57 for details.
Instruction Trap Register
SPARC64 V implements the Instruction Trap Register (impl. dep. #205). In SPARC64 V, the least significant 11 bits (bits 10:0) of a CALL or branch (BPcc, FBPfcc, Bicc, BPr) instruction in an instruction cache are identical to their architectural encoding (as it appears in main memory) (impl. dep. #245).

5.2.13 Floating-Point Deferred-Trap Queue (FQ)

SPARC64 V does not contain a Floating-Point Deferred-Trap Queue (impl. dep. #24). An attempt to read FQ with an RDPR instruction generates an illegal_instruction exception (impl. dep. #25).

5.2.14 IU Deferred-Trap Queue

SPARC64 V neither has nor needs an IU deferred-trap queue (impl. dep. #16).
F.CHAPTER
6

Instructions

This chapter presents SPARC64 V implementation-specific instruction details and the processor pipeline information in these subsections:
Instruction Execution on page 25
Instruction Formats and Fields on page 28
Instruction Categories on page 29
Processor Pipeline on page 31
For additional, general information, please see parallel subsections of Chapter 6 in Commonality. For easy referencing, we follow the organization of Chapter 6 in Commonality.

6.1 Instruction Execution

SPARC64 V is an advanced superscalar implementation of SPARC V9. Several instructions may be issued and executed in parallel. Although SPARC64 V provides serial program execution semantics, some of the implementation characteristics described below are part of the architecture visible to software for correctness and efficiency. The affected software includes optimizing compilers and supervisor code.

6.1.1 Data Prefetch

SPARC64 V employs speculative (out of program order) execution of instructions; in most cases, the effect of these instructions can be undone if the speculation proves to be incorrect.¹ However, exceptions can occur because of speculative data prefetching. Formally, SPARC64 V employs the following rules regarding speculative prefetching:
1. If a memory operation y resolves to a volatile memory address (location[y]), SPARC64 V will not speculatively prefetch location[y] for any reason; location[y] will be fetched or stored to only when operation y is commitable.
2. If a memory operation y resolves to a nonvolatile memory address (location[y]), SPARC64 V may speculatively prefetch location[y], subject to the following subrules:
a. If an operation y can be speculatively prefetched according to the prior rule, operations with store semantics are speculatively prefetched for ownership only if they are prefetched to cacheable locations. Operations without store semantics are speculatively prefetched even if they are noncacheable as long as they are not volatile.
b. Atomic operations (CAS(X)A, LDSTUB, SWAP) are never speculatively prefetched.
¹ An async_data_error may be signalled during speculative data prefetching.
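The prefetch rules reduce to a short predicate. The sketch below is a reading of rules 1, 2a, and 2b under the assumption that each memory operation can be described by four boolean attributes; the function and parameter names are hypothetical, and a True result means only that a speculative prefetch is permitted, not that hardware will perform one.

```python
def may_speculatively_prefetch(volatile: bool, cacheable: bool,
                               store_semantics: bool, atomic: bool) -> bool:
    """Whether the rules above permit a speculative prefetch of location[y]."""
    if volatile:
        return False         # rule 1: volatile locations are never prefetched
    if atomic:
        return False         # rule 2b: CAS(X)A, LDSTUB, SWAP
    if store_semantics:
        return cacheable     # rule 2a: prefetch for ownership, cacheable only
    return True              # rule 2a: loads, even when noncacheable
```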
SPARC64 V provides two mechanisms to avoid speculative execution of a load:
1. Avoid speculation by disallowing speculative accesses to certain memory pages or I/O spaces.
This can be done by setting the E (side-effect) bit in the PTE for all memory pages that should not allow speculation. All accesses made to memory pages that have the E bit set in their PTE will be delayed until they are no longer speculative or until they are cancelled. See Appendix F, Memory Management Unit, for details.
2. Alternate space load instructions that force program order, such as ASI_PHYS_BYPASS_WITH_EBIT[_L] (ASI = 15₁₆, 1D₁₆), will not be speculatively executed.

6.1.2 Instruction Prefetch

The processor prefetches instructions to minimize cases where the processor must wait for instruction fetch. In combination with branch prediction, prefetching may cause the processor to access instructions that are not subsequently executed. In some cases, the speculative instruction accesses will reference data pages. SPARC64 V does not generate a trap for any exception that is caused by an instruction fetch until all of the instructions before it (in program order) have been committed.¹
¹ Hardware errors and other asynchronous errors may generate a trap even if the instruction that caused the trap is never committed.

6.1.3 Syncing Instructions

SPARC64 V has instructions, called syncing instructions, that stop execution for the number of cycles it takes to clear the pipeline and to synchronize the processor. There are two types of synchronization, pre and post. A presyncing instruction waits for all previous instructions to commit, commits by itself, and then issues successive instructions. A postsyncing instruction issues by itself and prevents the successive instructions from issuing until it is committed. Some instructions have both pre- and postsync attributes.
In SPARC64 V almost all instructions commit in order, but store instructions commit before becoming globally visible. A few syncing instructions cause the processor to discard prefetched instructions and to refetch the successive instructions. TABLE 6-1 lists all pre-/postsync instructions and the effects of instruction execution.
TABLE 6-1 SPARC64 V Syncing Instructions
Columns: Opcode; Presyncing (Sync?, Wait for store global visibility?¹); Postsyncing (Sync?, Discard prefetched instructions?)
ALIGNADDRESS{_LITTLE} Yes
BMASK Yes
DONE Yes Yes
FCMP(GT,LE,NE,EQ)(16,32) Yes
FLUSH Yes Yes Yes
FMOV(s,d)icc Yes
FMOVr Yes
LDD Yes Yes
LDDA Yes Yes
LDDFA Yes
memory access with ASI = ASI_PHYS_BYPASS_EC{_LITTLE}, ASI_PHYS_BYPASS_EC_WITH_E_BIT{_LITTLE} Yes Yes
LDFSR, LDXFSR Yes
MEMBAR Yes Yes
MOVfcc Yes
MULScc Yes
PDIST Yes
RDASR Yes
RETRY Yes Yes
SIAM Yes
STBAR Yes
STD Yes
STDA Yes
STDFA Yes
STFSR, STXFSR Yes
Tcc Yes Yes Yes
WRASR Yes Yes²
1. When #cmask != 0.
2. WRGSR only.

6.2 Instruction Formats and Fields

Instructions are encoded in five major 32-bit formats and several minor formats. Please refer to Section 6.2 of Commonality for illustrations of four major formats. FIGURE 6-1 illustrates Format 5, unique to SPARC64 V.
Format 5 (op = 2, op3 = 37₁₆): FMADD, FMSUB, FNMADD, and FNMSUB (in place of IMPDEP2B)
    Bits:   31:30  29:25  24:19  18:14  13:9  8:7  6:5   4:0
    Field:  op     rd     op3    rs1    rs3   var  size  rs2
FIGURE 6-1 Summary of Instruction Formats: Format 5
Instruction fields are those shown in Section 6.2 of Commonality. Three additional fields are implemented in SPARC64 V. They are described in TABLE 6-2.
TABLE 6-2 Instruction Fields Specific to SPARC64 V
Bits     Field   Description
13:9     rs3     This 5-bit field is the address of the third f register source operand for the floating-point multiply-add and multiply-subtract instructions.
8:7      var     This 2-bit field specifies which specific operation (variation) to perform for the floating-point multiply-add and multiply-subtract instructions.
6:5      size    This 2-bit field specifies the size of the operands for the floating-point multiply-add and multiply-subtract instructions.
Since size = 00 is not IMPDEP2B and since size = 11 assumed quad operations but is not implemented in SPARC64 V, the instruction with size = 00 or 11 generates an illegal_instruction exception in SPARC64 V.
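The Format 5 layout can be exercised with a small decoder. This sketch is illustrative: the function name `decode_format5` and the derived "is_format5" and "illegal" keys are hypothetical, and the meaning of individual var and size encodings is deliberately not interpreted beyond the size = 00 or 11 rule stated above.

```python
def decode_format5(insn: int) -> dict:
    """Split a 32-bit instruction word into Format 5 fields."""
    fields = {
        "op":   (insn >> 30) & 0x3,    # bits 31:30
        "rd":   (insn >> 25) & 0x1F,   # bits 29:25
        "op3":  (insn >> 19) & 0x3F,   # bits 24:19
        "rs1":  (insn >> 14) & 0x1F,   # bits 18:14
        "rs3":  (insn >> 9) & 0x1F,    # bits 13:9
        "var":  (insn >> 7) & 0x3,     # bits 8:7
        "size": (insn >> 5) & 0x3,     # bits 6:5
        "rs2":  insn & 0x1F,           # bits 4:0
    }
    fields["is_format5"] = fields["op"] == 2 and fields["op3"] == 0x37
    # size = 00 and size = 11 generate illegal_instruction on SPARC64 V
    fields["illegal"] = fields["is_format5"] and fields["size"] in (0b00, 0b11)
    return fields

# Example word with arbitrary register numbers and size = 01.
word = (2 << 30) | (4 << 25) | (0x37 << 19) | (1 << 14) | (3 << 9) | (1 << 5) | 2
decoded = decode_format5(word)
```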

6.3 Instruction Categories

SPARC V9 instructions comprise the categories listed below. All categories are described in Section 6.3 of Commonality. Subsections in bold face are SPARC64 V implementation dependencies.
Memory access
Memory synchronization
Integer arithmetic
Control transfer (CTI)
Conditional moves
Register window management
State register access
Privileged register access
Floating-point operate (FPop)
Implementation-dependent

6.3.3 Control-Transfer Instructions (CTIs)

These are the basic control-transfer instruction types:
Conditional branch (Bicc, BPcc, BPr, FBfcc, FBPfcc)
Unconditional branch
Call and link (CALL)
Jump and link (JMPL, RETURN)
Return from trap (DONE, RETRY)
Trap (Tcc)
Instructions other than CALL and JMPL are described in their entirety in Section 6.3.2 of Commonality. SPARC64 V implements CALL and JMPL as described below.
CALL and JMPL Instructions
SPARC64 V writes all 64 bits of the PC into the destination register when PSTATE.AM = 0. The upper 32 bits of r[15] (CALL) or of r[rd] (JMPL) are written as zeroes when PSTATE.AM = 1 (impl. dep. #125).
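The effect of PSTATE.AM on the value written by CALL and JMPL can be shown in a few lines. The function name `link_register_value` is hypothetical; the sketch models only the register write described above.

```python
def link_register_value(pc: int, pstate_am: int) -> int:
    """Value written to r[15] (CALL) or r[rd] (JMPL) for an instruction at pc."""
    pc &= (1 << 64) - 1
    if pstate_am == 1:
        return pc & 0xFFFFFFFF    # upper 32 bits are written as zeroes
    return pc                     # all 64 bits of the PC are written
```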
SPARC64 V implements JMPL and CALL return prediction hardware in the form of a special stack, called the Return Address Stack (RAS). Whenever a CALL or JMPL that writes to %o7 (r[15]) occurs, SPARC64 V “pushes” the return address (PC+8) onto the RAS. When either of the synthetic instructions retl (JMPL [%o7+8]) or ret (JMPL [%i7+8]) is subsequently executed, the return address is predicted to be the address stored on the top of the RAS and the RAS is “popped.” If the prediction in the RAS is incorrect, SPARC64 V backs up and starts issuing instructions from the correct target address. This backup takes a few extra cycles.
Programming Note – For maximum performance, software and compilers must take into account how the RAS works. For example, tricks that do nonstandard returns in hopes of boosting performance may require more cycles if they cause the wrong RAS value to be used for predicting the address of the return. Heavily nested calls can also cause earlier entries in the RAS to be overwritten by newer entries, since the RAS has only a limited number of entries. Eventually, some return addresses will be mispredicted because of the overflow of the RAS.
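The overflow effect mentioned in the Programming Note can be seen in a toy model of the RAS. This is a sketch only: the class and method names are hypothetical, the default depth of 4 is an arbitrary assumption (the manual does not state the number of entries here), and returning None for an empty stack is a modelling choice.

```python
class ReturnAddressStack:
    """Bounded stack of predicted return addresses."""

    def __init__(self, entries: int = 4):   # depth is an assumption
        self.entries = entries
        self.stack = []

    def on_call(self, pc: int) -> None:
        """CALL, or JMPL writing %o7, at address pc: push PC+8."""
        if len(self.stack) == self.entries:
            self.stack.pop(0)               # oldest entry is overwritten
        self.stack.append(pc + 8)

    def predict_return(self):
        """ret/retl: pop the predicted target, or None if nothing is stored."""
        return self.stack.pop() if self.stack else None

ras = ReturnAddressStack(entries=2)
for call_pc in (0x100, 0x200, 0x300):       # three nested calls, depth two
    ras.on_call(call_pc)
predictions = [ras.predict_return() for _ in range(3)]
```

In this run the two innermost returns are predicted correctly, while the entry for the outermost call has been overwritten, so its return cannot be predicted from the RAS.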

6.3.7 Floating-Point Operate (FPop) Instructions

The complete conditions of generating an fp_exception_other exception with FSR.ftt = unfinished_FPop are described in Section B.6, Floating-Point Nonstandard Mode, on page 61.
The SPARC64 V-specific FMADD and FMSUB instructions (described below) are also floating-point operations. They require the floating-point unit to be enabled; otherwise, an fp_disabled trap is generated. They also affect the FSR, like FPop instructions. However, these instructions are not included in the FPop category and, hence, reserved encodings in these opcodes generate an illegal_instruction exception, as defined in Section 6.3.9 of Commonality.

6.3.8 Implementation-Dependent Instructions

SPARC64 V uses the IMPDEP2 instruction to implement the Floating-Point Multiply-Add/Subtract and Negative Multiply-Add/Subtract instructions; these have an op3 field = 37₁₆ (IMPDEP2). See Floating-Point Multiply-Add/Subtract on page 50 for fuller definitions of these instructions. Opcode space is reserved in IMPDEP2 for the quad-precision forms of these instructions. However, SPARC64 V does not currently implement the quad-precision forms, and the processor generates an illegal_instruction exception if a quad-precision form is specified. Since these instructions are not part of the required SPARC V9 architecture, the operating system does not supply software emulation routines for the quad versions of these instructions.
SPARC64 V uses the IMPDEP1 instruction to implement the graphics acceleration instructions.

6.4 Processor Pipeline

The pipeline of SPARC64 V consists of fifteen stages, shown in FIGURE 6-2. Each stage is referenced by one or two letters as follows:
IA IT IM IB IR E D P B X U W
Ps Ts Ms Bs Rs (cache access stages; see Section 6.4.3)

6.4.1 Instruction Fetch Stages

IA (Instruction Address generation) Calculate fetch target address.
IT (Instruction TLB Tag access) Instruction TLB tag search. Search of BRHIS and RAS is also started.
IM (Instruction TLB tag Match) Check whether the TLB tag matches. The result of the BRHIS and RAS search is also available at this stage and is forwarded to the IA stage for subsequent fetch.
IB (Instruction cache Buffer read) Read L1 cache data if TLB is hit.
IR (Instruction read Result) Write to I-Buffer.
IA through IR stages are dedicated to instruction fetch. These stages work in concert with the cache access unit to supply instructions to subsequent stages. The instructions fetched from memory or cache are stored in the Instruction Buffer (I-buffer). The I-buffer has six entries, each of which can hold 32-byte-aligned 32-byte data (eight instructions).
SPARC64 V has a branch prediction mechanism and resources named BRHIS (BRanch HIStory) and RAS (Return Address Stack). Instruction fetch stages use these resources to determine fetch addresses.
Instruction fetch stages are designed so that they work independently of subsequent stages as much as possible, and they can fetch instructions even when execution stages stall. These stages fetch until the I-Buffer is full; further fetches are possible by requesting prefetches to the L1 cache.
FIGURE 6-2 SPARC64 V Pipeline (block diagram; labeled blocks: BRHIS, iTLB, L1I, Instruction Buffer, IWR, reservation stations RSA, RSEA, RSEB, RSFA, RSFB, RSBR, execution units EAGA, EAGB, EXA, EXB, FXA, FXB, rename registers GUB and FUB, GPR, FPR, dTLB, L1D, CSE, PC, nPC, ccr, fsr; stages IA IT IM IB IR, E D P B X U W, and Ps Ts Ms Bs Rs)

6.4.2 Issue Stages

E (Entry) Instructions are passed from fetch stages.
D (Decode) Assign resources and dispatch to reservation station (RS).
SPARC64 V is an out-of-order execution CPU. It has six execution units (two arithmetic and logic units, two floating-point units, and two load/store units). Each unit except the load/store unit has its own reservation station. E and D stages are issue stages that decode instructions and dispatch them to the target RS. SPARC64 V can issue up to four instructions per cycle.
The resources needed to execute an instruction are assigned in the issue stages. The resources to be allocated include the following:
Commit stack entry (CSE)
Renaming registers of integer (GUB) and floating-point (FUB)
Entries of reservation stations
Memory access ports
Resources needed for an instruction are specific to the instruction, but all resources must be assigned at these stages. In normal execution, assigned resources are released at the very last stage of the pipeline, W-stage.¹ Instructions between the E-stage and W-stage are considered to be in-flight. When an exception is signalled, all in-flight instructions and the resources used by them are released immediately. This behavior enables the decoder to restart issuing instructions as quickly as possible.
The number of in-flight instructions depends on how many resources are needed by them. The maximum number is 64.
¹ An entry in a reservation station is released at the X-stage.

6.4.3 Execution Stages

P (priority) Select an instruction from those that have met the conditions for execution.
B (buffer read) Read register file, or receive forwarded data from another pipeline.
X (execute) Execution.
Instructions in reservation stations are executed when certain conditions are met, for example, when the values of source registers are known and the execution unit is available. Execution latency varies from one cycle to many cycles, depending on the instruction.
Execution Stages for Cache Access
Memory access requests are passed to the cache access pipeline after the target address is calculated. Cache access stages work the same way as instruction fetch stages, except for the handling of branch prediction. See Section 6.4.1, Instruction Fetch Stages, for details. Stages in instruction fetch and cache access correspond as follows:
Instruction Fetch Stages    Cache Access Stages
IA                          Ps
IT                          Ts
IM                          Ms
IB                          Bs
IR                          Rs
When an exception is signalled, fetch ports and store ports used by memory access instructions are released. The cache access pipeline itself remains working in order to complete outgoing memory accesses. When data is returned, it is then stored to the cache.

6.4.4 Completion Stages

U (Update) Update of physical (renamed) register.
W (Write) Update of architectural registers and retire; exception handling.
After an out-of-order execution, execution reverts to program order to complete. Exception handling is done in the completion stages. Exceptions occurring in execution stages are not handled immediately but are signalled when the instruction is completed.¹
¹ RAS-related exceptions may be signalled before completion.
F.CHAPTER
7

Traps

Please refer to Chapter 7 of Commonality. Section numbers in this chapter correspond to those in Chapter 7 of Commonality.
This chapter adds SPARC64 V-specific information in the following sections:
Processor States, Normal and Special Traps on page 35
RED_state on page 36
error_state on page 36
Trap Categories on page 37
Deferred Traps on page 37
Reset Traps on page 37
Uses of the Trap Categories on page 37
Trap Control on page 38
PIL Control on page 38
Trap-Table Entry Addresses on page 38
Trap Type (TT) on page 38
Details of Supported Traps on page 39
Exception and Interrupt Descriptions on page 39

7.1 Processor States, Normal and Special Traps

Please refer to Section 7.1 of Commonality.

7.1.1 RED_state

RED_state Trap Table
The RED_state trap vector is located at an implementation-dependent address referred to as RSTVaddr. The value of RSTVaddr is a constant within each implementation; in SPARC64 V this virtual address is FFFF FFFF F000 0000₁₆, which translates to physical address 0000 07FF F000 0000₁₆ in RED_state (impl. dep. #114).
RED_state Execution Environment
In RED_state, the processor is forced to execute in a restricted environment by overriding the values of some processor controls and state registers.
Note – The values are overridden, not set, allowing them to be switched atomically.
SPARC64 V has the following implementation-dependent behavior in RED_state (impl. dep. #115):
While in RED_state, all internal ITLB-based translation functions are disabled. DTLB-based translations are disabled upon entry but may be reenabled by software while in RED_state. However, ASI-based access functions to the TLBs are still available.
While mTLBs and uTLBs are disabled, all accesses are assumed to be noncacheable and strongly ordered for data access.
XIR errors are not masked and can cause a trap.
Note – When RED_state is entered because of component failures, the handler should attempt to recover from potentially catastrophic error conditions or to disable the failing components. When RED_state is entered after a reset, the software should create the environment necessary to restore the system to a running state.

7.1.2 error_state

The processor enters error_state when a trap occurs while the processor is already at its maximum supported trap level (that is, when TL = MAXTL) (impl. dep. #39).
Although the standard behavior of the CPU upon an entry into error_state is to internally generate a watchdog_reset (WDR), the CPU optionally stays halted upon an entry to error_state depending on a setting in the OPSR register (impl. dep. #40, #254).

7.2 Trap Categories

Please refer to Section 7.2 of Commonality. An exception or interrupt request can cause any of the following trap types:
Precise trap
Deferred trap
Disrupting trap
Reset trap

7.2.2 Deferred Traps

Please refer to Section 7.2.2 of Commonality.
SPARC64 V implements a deferred trap to signal certain error conditions (impl. dep. #32). Please refer to the description of I_UGE error in the row "Relation between %tpc and the instruction that caused the error" of TABLE P-2 (page 156) for details. See also Instruction End-Method at ADE Trap on page 170.

7.2.4 Reset Traps

Please refer to Section 7.2.4 of Commonality.
In SPARC64 V, a watchdog reset (WDR) occurs when the processor has not committed an instruction for 2^33 processor clocks.
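As a rough illustration of the watchdog interval, the following sketch converts 2^33 processor clocks into seconds. The clock frequencies are assumptions chosen only for illustration; this section gives the clock count, not the operating frequency.

```python
# Watchdog reset (WDR) interval: 2**33 processor clocks without a commit.
WDR_CLOCKS = 2 ** 33


def wdr_timeout_seconds(clock_hz: float) -> float:
    """Return the watchdog interval in seconds for an assumed core clock.

    clock_hz is a hypothetical parameter; the manual section states only
    the number of clocks.
    """
    if clock_hz <= 0:
        raise ValueError("clock_hz must be positive")
    return WDR_CLOCKS / clock_hz


if __name__ == "__main__":
    for ghz in (1.0, 1.35, 2.0):  # assumed clock rates, not from the manual
        print(f"{ghz:.2f} GHz -> {wdr_timeout_seconds(ghz * 1e9):.2f} s")
```

At any clock rate in the low-gigahertz range the interval is a matter of seconds, which is the practical point of the 2^33 figure.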

7.2.5 Uses of the Trap Categories

Please refer to Section 7.2.5 of Commonality.
All exceptions that occur as the result of program execution are precise in SPARC64 V (impl. dep. #33).
An exception caused after the initial access of a multiple-access load or store instruction (LDD(A), STD(A), LDSTUB, CASA, CASXA, or SWAP) that causes a catastrophic exception is precise in SPARC64 V.

7.3 Trap Control

Please refer to Section 7.3 of Commonality.

7.3.1 PIL Control

SPARC64 V receives external interrupts from the UPA interconnect. They cause an interrupt_vector_trap (TT = 60₁₆). The interrupt vector trap handler reads the interrupt information and then schedules SPARC V9-compatible interrupts by writing bits in the SOFTINT register. Please refer to Section 5.2.11 of Commonality for details.
During handling of SPARC V9-compatible interrupts by SPARC64 V, the PIL register is checked. If an interrupt has sufficient priority, SPARC64 V will stop issuing new instructions, will flush all uncommitted instructions, and then will vector to the trap handler. The only exception to this process occurs when SPARC64 V is processing a higher-priority trap.
SPARC64 V takes a normal disrupting trap upon receipt of an interrupt request.

7.4 Trap-Table Entry Addresses

Please refer to Section 7.4 of Commonality.

7.4.2 Trap Type (TT)

Please refer to Section 7.4.2 of Commonality.
SPARC64 V implements all mandatory SPARC V9 and SPARC JPS1 exceptions, as described in Chapter 7 of Commonality, plus the exception listed in TABLE 7-1, which is specific to SPARC64 V (impl. dep. #35; impl. dep. #36).
TABLE 7-1 Exceptions Specific to SPARC64 V
Exception or Interrupt Request    TT       Priority
async_data_error                  040₁₆    2

7.4.4 Details of Supported Traps

Please refer to Section 7.4.4 in Commonality.
SPARC64 V Implementation-Specific Traps
SPARC64 V supports the following implementation-specific trap type:
async_data_error

7.5 Trap Processing

Please refer to Section 7.5 of Commonality.

7.6 Exception and Interrupt Descriptions

Please refer to Section 7.6 of Commonality.

7.6.4 SPARC V9 Implementation-Dependent, Optional Traps That Are Mandatory in SPARC JPS1

Please refer to Section 7.6.4 of Commonality.
SPARC64 V implements all six traps that are implementation dependent in SPARC V9 but mandatory in JPS1 (impl. dep. #35). See Section 7.6.4 of Commonality for details.

7.6.5 SPARC JPS1 Implementation-Dependent Traps

Please refer to Section 7.6.5 of Commonality.
SPARC64 V implements the following traps that are implementation dependent (impl. dep. #35).
async_data_error [tt = 040₁₆] (Preemptive or disrupting) (impl. dep. #218)
SPARC64 V implements the async_data_error exception to signal the following errors.
Uncorrectable errors in the internal architecture registers (general registers–gr, floating-point registers–fr, ASR, ASI registers)
Uncorrectable errors in the core pipeline
System data corruption
Watchdog timeout first time
TLB access error upon access by an ldxa or stxa instruction
Multiple errors may be reported in a single generation of the async_data_error exception. Depending on the situation, the async_data_error trap becomes a precise trap, a disrupting trap, or a preemptive trap upon error detection. The TPC and TNPC stacked by the exception may indicate the exact instruction, the preceding instruction, or the subsequent instruction inducing the error. See Appendix P for details of the async_data_error exception in SPARC64 V.
F.CHAPTER
8

Memory Models

The SPARC V9 architecture is a model that specifies the behavior observable by software on SPARC V9 systems. Therefore, access to memory can be implemented in any manner, as long as the behavior observed by software conforms to that of the models described in Chapter 8 of Commonality and defined in Appendix D, Formal Specification of the Memory Models, also in Commonality.
The SPARC V9 architecture defines three different memory models: Total Store Order (TSO), Partial Store Order (PSO), and Relaxed Memory Order (RMO). All SPARC V9 processors must provide Total Store Order (or a more strongly ordered model, for example, Sequential Consistency) to ensure SPARC V8 compatibility.
Whether the PSO or RMO models are supported by SPARC V9 systems is implementation dependent; SPARC64 V behaves in a manner that guarantees adherence to whichever memory model is currently in effect.
This chapter describes the following major SPARC64 V-specific details of memory models.
SPARC V9 Memory Model on page 42
For general information, please see parallel subsections of Chapter 8 in Commonality. For easier referencing, this chapter follows the organization of Chapter 8 in Commonality, listing subsections whether or not there are implementation-specific details.

8.1 Overview

Note – The words "hardware memory model" denote the underlying hardware memory models as differentiated from the "SPARC V9 memory model," which is the memory model the programmer selects in PSTATE.MM.
SPARC64 V supports only one mode of memory handling to guarantee correct operation under any of the three SPARC V9 memory ordering models (impl. dep. #113):
Total Store Order: All loads are ordered with respect to loads, and all stores are ordered with respect to loads and stores. This behavior is a superset of the requirements for the SPARC V9 memory models TSO, PSO, and RMO. When PSTATE.MM selects TSO or PSO, SPARC64 V operates in this mode. Since programs written for PSO (or RMO) will always work if run under Total Store Order, this behavior is safe but does not take advantage of the reduced restrictions of PSO.

8.4 SPARC V9 Memory Model

Please refer to Section 8.4 of Commonality.
In addition, this section describes SPARC64 V-specific details about the processor/memory interface model.

8.4.5 Mode Control

SPARC64 V implements Total Store Ordering for all PSTATE.MM. Writing 11₂ into PSTATE.MM also causes the machine to use TSO (impl. dep. #119). However, the encoding 11₂ should not be used, since future versions of SPARC64 V may use this encoding for a new memory model.
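The mode-control rule above can be sketched as a small model. This is an illustrative sketch only: the function names are hypothetical, and the 00/01/10 encodings for TSO/PSO/RMO are taken from the SPARC V9 definition of PSTATE.MM rather than from this section.

```python
# PSTATE.MM encodings as defined by SPARC V9 (11 is not assigned a model).
PSTATE_MM_NAMES = {0b00: "TSO", 0b01: "PSO", 0b10: "RMO", 0b11: "reserved"}


def _check(mm: int) -> None:
    if mm not in PSTATE_MM_NAMES:
        raise ValueError("PSTATE.MM is a 2-bit field")


def effective_memory_model(mm: int) -> str:
    """Ordering that SPARC64 V applies for a given PSTATE.MM value."""
    _check(mm)
    # Per this section, every encoding, including 11, behaves as TSO.
    return "TSO"


def is_recommended_encoding(mm: int) -> bool:
    """False for the encoding that software is told not to use."""
    _check(mm)
    return mm != 0b11


if __name__ == "__main__":
    for mm, name in PSTATE_MM_NAMES.items():
        print(f"MM={mm:02b} ({name}): runs as {effective_memory_model(mm)}, "
              f"recommended={is_recommended_encoding(mm)}")
```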

8.4.6 Synchronizing Instruction and Data Memory

All caches in a SPARC64 V-based system (uniprocessor or multiprocessor) have a unified cache consistency protocol and implement strong coherence between instruction and data caches. Writes to any data cache cause invalidations to the corresponding locations in all instruction caches; references to any instruction cache cause corresponding modified data to be flushed and corresponding unmodified data to be invalidated from all data caches. The flush operation is still operative in SPARC64 V, however.
Since the FLUSH instruction synchronizes the processor, the total latency varies depending on the situation in SPARC64 V. Assuming all prior instructions are completed, the latency of FLUSH is 18 CPU cycles.
F.APPENDIX
A

Instruction Definitions: SPARC64 V Extensions

This appendix describes the SPARC64 V-specific implementation of the instructions in Appendix A of Commonality. If an instruction is not described in this appendix, then no SPARC64 V implementation-dependency applies.
See TABLE A-1 of Commonality for the location at which general information about the instruction can be found.
Section numbers refer to the parallel section numbers in Appendix A of Commonality.
TABLE A-1 lists four instructions that are unique to SPARC64 V.
TABLE A-1 Implementation-Specific Instructions
Operation      Name                                      Page       V9 Ext?
FMADD(s,d)     Floating-point multiply add               page 50    ✓
FMSUB(s,d)     Floating-point multiply subtract          page 50    ✓
FNMADD(s,d)    Floating-point multiply negate add        page 50    ✓
FNMSUB(s,d)    Floating-point multiply negate subtract   page 50    ✓
Each instruction definition consists of these parts:
1. A table of the opcodes defined in the subsection with the values of the field(s) that uniquely identify the instruction(s).
2. An illustration of the applicable instruction format(s). In these illustrations a dash (-) indicates that the field is reserved for future versions of the architecture and shall be 0 in any instance of the instruction. If a conforming SPARC V9 implementation encounters nonzero values in these fields, its behavior is undefined.
3. A list of the suggested assembly language syntax, as described in Appendix G, Assembly Language Syntax.
4. A description of the features, restrictions, and exception-causing conditions.
5. A list of exceptions that can occur as a consequence of attempting to execute the instruction(s). Exceptions due to an instruction_access_error, instruction_access_exception, fast_instruction_access_MMU_miss, async_data_error, ECC_error, and interrupts are not listed because they can occur on any instruction. Also, any instruction that is not implemented in hardware shall generate an illegal_instruction exception (or fp_exception_other exception with ftt = unimplemented_FPop for floating-point instructions) when it is executed.
The illegal_instruction trap can occur during chip debug on any instruction that has been programmed into the processor's IIU_INST_TRAP (ASI = 60₁₆, VA = 0). These traps are also not listed under each instruction.
The following traps never occur in SPARC64 V:
instruction_access_MMU_miss
data_access_MMU_miss
data_access_protection
unimplemented_LDD
unimplemented_STD
LDQF_mem_address_not_aligned
STQF_mem_address_not_aligned
internal_processor_error
fp_exception_other (ftt = invalid_fp_register)
This appendix does not include any timing information (in either cycles or clock time).
The following SPARC64 V-specific extensions are described.
Block Load and Store Instructions (VIS I) on page 47
Call and Link on page 49
Implementation-Dependent Instructions on page 49
Jump and Link on page 53
Load Quadword, Atomic [Physical] on page 54
Memory Barrier on page 55
Partial Store (VIS I) on page 57
Prefetch Data on page 57
Read State Register on page 58
SHUTDOWN (VIS I) on page 58
Write State Register on page 59
Deprecated Instructions on page 59

A.4 Block Load and Store Instructions (VIS I)

The following notes summarize behavior of block load/store instructions in SPARC64 V.
1. Block load and store operations are not atomic, in that they are internally decomposed into eight independent, 8-byte load/store operations in SPARC64 V. Each load/store is always issued and performed in the RMO memory model and obeys all prior MEMBAR and atomic instruction-imposed ordering constraints.
2. Block load/store instructions are out of the scope of V9 memory models, meaning that self-consistency of memory reference instructions is not always maintained if block load/store instructions are involved in the execution flow. The following table describes the implemented ordering constraints for block load/store instructions with respect to the other memory reference instructions with an operand address conflict in SPARC64 V:
Program Order for conflicting bld/bst/ld/st
first         next          Ordered/Out-of-Order
store         blockstore    Ordered
store         blockload     Ordered
load          blockstore    Ordered
load          blockload     Ordered
blockstore    store         Out-of-Order
blockstore    load          Out-of-Order
blockstore    blockstore    Out-of-Order
blockstore    blockload     Out-of-Order
blockload     store         Ordered
blockload     load          Ordered
blockload     blockstore    Ordered
blockload     blockload     Ordered
To maintain the memory ordering even for the memory address conflicts, MEMBAR instructions shall be inserted into appropriate locations in the program.
Although self-consistency with respect to the block load/store and the other memory reference instructions is not maintained in some cases, register conflicts between the other instructions and block load/store instructions are maintained in SPARC64 V. The read-after-write, write-after-read, and write-after-write obstructions between a block load/store instruction and the other arithmetic instructions are detected and handled appropriately.
3. Block load instructions operate on the cache if the operand is present.
4. The block store with commit instruction always stores the operand in main storage and invalidates the line in the L1D cache if it is present. The invalidation is performed through an S_INV_REQ transaction through UPA by the system controller.
5. The block store instruction stores the operand into main storage if it is not present in the operand cache and the status of the line is invalid, shared, or owned. In case the line is not present in the L1D cache and is exclusive or modified on the L2 cache, the block store instruction modifies only the line in L2 cache. If the line is present in the operand cache and the status is either clean/shared or clean/owned, the line is stored in main storage. If the line is present in the operand cache and the status is clean/exclusive, the line in the operand cache is invalidated and the operand is stored in the L2 cache. If the line is in the operand cache and the status is modified/modified, the operand is stored in the operand cache. The following table summarizes each cache status before block store and the results of the block store. Blank cells mean that no action occurred in the corresponding cache or memory, and the data, if it exists, is unchanged.
Storage    Status
Cache status before bst
  L1       Invalid    Invalid    Valid         Valid     Valid
  L2       E, M       I, S, O    E             M         S, O
Action
  L1       -          -          invalidate    -         -
  L2       update                update        update    S
  Memory              update     -             -         update
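The ordering table in note 2 above reduces to a simple rule: only pairs whose first access is a block store are performed out of order. The following sketch expresses that as a lookup; the function names are hypothetical and the code only restates the table, it does not model the hardware.

```python
ORDERED = "Ordered"
OUT_OF_ORDER = "Out-of-Order"
KINDS = ("store", "load", "blockstore", "blockload")


def block_ordering(first: str, nxt: str) -> str:
    """Ordering of two address-conflicting accesses in program order.

    Covers only the pairs listed in the table, that is, pairs in which
    at least one access is a block load or block store.
    """
    if first not in KINDS or nxt not in KINDS:
        raise ValueError("unknown access kind")
    if not (first.startswith("block") or nxt.startswith("block")):
        raise ValueError("table covers only pairs involving a block access")
    return OUT_OF_ORDER if first == "blockstore" else ORDERED


def needs_membar(first: str, nxt: str) -> bool:
    """True when software must insert a MEMBAR to keep program order."""
    return block_ordering(first, nxt) == OUT_OF_ORDER


if __name__ == "__main__":
    for first in KINDS:
        for nxt in KINDS:
            if first.startswith("block") or nxt.startswith("block"):
                print(f"{first:<11}{nxt:<11}{block_ordering(first, nxt)}")
```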
Exceptions:
fp_disabled
PA_watchpoint
VA_watchpoint
illegal_instruction (misaligned rd)
mem_address_not_aligned (see Block Load and Store ASIs on page 120)
data_access_exception (see Block Load and Store ASIs on page 120)
LDDF_mem_address_not_aligned (see Block Load and Store ASIs on page 120)
data_access_error
fast_data_access_MMU_miss
fast_data_access_protection

A.12 Call and Link

SPARC64 V clears the upper 32 bits of the PC value in r[15] when PSTATE.AM is set (impl. dep. #125). The value written into r[15] is visible to the instruction in the delay slot.
SPARC64 V has a special hardware table, called the return address stack, to predict the return address from a subroutine. Though the return prediction stack achieves better performance in normal cases, there is a special use of the CALL instruction (call .+8) that may have an undesirable effect on the return address stack. In this case, the CALL instruction is used to read the PC contents, not to call a subroutine. In SPARC64 V, the return address of the CALL (PC+8) is not stored in its return address stack, to avoid a detrimental performance effect. When a ret or retl is executed, the value in the return address stack is used to predict the return address.

A.24 Implementation-Dependent Instructions

Opcode     op3        Operation
IMPDEP1    11 0110    Implementation-Dependent Instruction 1
IMPDEP2    11 0111    Implementation-Dependent Instruction 2
The IMPDEP1 and IMPDEP2 instructions are completely implementation dependent. Implementation-dependent aspects include their operation, the interpretation of bits 29–25 and 18–0 in their encodings, and which (if any) exceptions they may cause.
SPARC64 V uses IMPDEP1 to encode VIS instructions (impl. dep. #106).
SPARC64 V uses IMPDEP2B to encode the Floating-Point Multiply Add/Subtract instructions (impl. dep. #106). See Section A.24.1, Floating-Point Multiply-Add/Subtract, on page 50 for details.
See I.1.2, Implementation-Dependent and Reserved Opcodes, in Commonality for information about extending the SPARC V9 instruction set by means of the implementation-dependent instructions.
Compatibility Note – These instructions replace the CPopn instructions in SPARC V8.
Exceptions:
implementation-dependent (IMPDEP2)

A.24.1 Floating-Point Multiply-Add/Subtract

SPARC64 V uses IMPDEP2B opcode space to encode the Floating-Point Multiply Add/Subtract instructions.
Opcode     Variation    Size    Operation
FMADDs     00           01      Multiply-Add Single
FMADDd     00           10      Multiply-Add Double
FMSUBs     01           01      Multiply-Subtract Single
FMSUBd     01           10      Multiply-Subtract Double
FNMADDs    11           01      Negative Multiply-Add Single
FNMADDd    11           10      Negative Multiply-Add Double
FNMSUBs    10           01      Negative Multiply-Subtract Single
FNMSUBd    10           10      Negative Multiply-Subtract Double
Size 11 is reserved for quad.
Format (5)
Field:  10       rd       110111   rs1      rs3     var    size   rs2
Bits:   31-30    29-25    24-19    18-14    13-9    8-7    6-5    4-0
Operation                     Implementation
Multiply-Add                  rd ← rs1 × rs2 + rs3
Multiply-Subtract             rd ← rs1 × rs2 − rs3
Negative Multiply-Subtract    rd ← −(rs1 × rs2 − rs3)
Negative Multiply-Add         rd ← −(rs1 × rs2 + rs3)
Assembly Language Syntax
fmadds   freg_rs1, freg_rs2, freg_rs3, freg_rd
fmaddd   freg_rs1, freg_rs2, freg_rs3, freg_rd
fmsubs   freg_rs1, freg_rs2, freg_rs3, freg_rd
fmsubd   freg_rs1, freg_rs2, freg_rs3, freg_rd
fnmadds  freg_rs1, freg_rs2, freg_rs3, freg_rd
fnmaddd  freg_rs1, freg_rs2, freg_rs3, freg_rd
fnmsubs  freg_rs1, freg_rs2, freg_rs3, freg_rd
fnmsubd  freg_rs1, freg_rs2, freg_rs3, freg_rd
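The Format (5) layout and the variation/size codes above can be combined into a 32-bit instruction word. The following is a minimal sketch, assuming the caller supplies the raw 5-bit register field values; `encode_fma` is a hypothetical helper, and the mapping from floating-point register names to 5-bit field encodings (which differs for double-precision registers in SPARC V9) is deliberately not modeled.

```python
OP = 0b10               # bits 31-30
OP3_IMPDEP2 = 0b110111  # bits 24-19
VAR = {"fmadd": 0b00, "fmsub": 0b01, "fnmsub": 0b10, "fnmadd": 0b11}
SIZE = {"s": 0b01, "d": 0b10}  # 11 is reserved for quad


def encode_fma(mnemonic: str, rd: int, rs1: int, rs2: int, rs3: int) -> int:
    """Assemble a Floating-Point Multiply-Add/Subtract instruction word.

    rd, rs1, rs2, rs3 are raw 5-bit field values, not register numbers.
    """
    base, size = mnemonic[:-1], mnemonic[-1:]
    if base not in VAR or size not in SIZE:
        raise ValueError(f"unknown mnemonic: {mnemonic!r}")
    for name, field in (("rd", rd), ("rs1", rs1), ("rs2", rs2), ("rs3", rs3)):
        if not 0 <= field < 32:
            raise ValueError(f"{name} must fit in 5 bits")
    return (
        (OP << 30) | (rd << 25) | (OP3_IMPDEP2 << 19) | (rs1 << 14)
        | (rs3 << 9) | (VAR[base] << 7) | (SIZE[size] << 5) | rs2
    )


if __name__ == "__main__":
    print(f"{encode_fma('fmadds', 0, 0, 0, 0):#010x}")
```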
Description
The Floating-point Multiply-Add instructions multiply the registers specified by the rs1 field times the registers specified by the rs2 field, add that product to the registers specified by the rs3 field, then write the result into the registers specified by the rd field.
The Floating-point Multiply-Subtract instructions multiply the registers specified by the rs1 field times the registers specified by the rs2 field, subtract from that product the registers specified by the rs3 field, and then write the result into the registers specified by the rd field.
The Floating-point Negative Multiply-Add instructions multiply the registers specified by the rs1 field times the registers specified by the rs2 field, negate the product, subtract from that negated value the registers specified by the rs3 field, and then write the result into the registers specified by the rd field.
The Floating-point Negative Multiply-Subtract instructions multiply the registers specified by the rs1 field times the registers specified by the rs2 field, negate the product, add that negated product to the registers specified by the rs3 field, and then write the result into the registers specified by the rd field.
All of the operations above are treated as separate multiply and add/subtract operations in SPARC64 V. That is, a multiply operation is first performed with a complete rounding step (as if it were a single multiply operation), and then an add/ subtract operation is performed with a complete rounding step (as if it were a single add/subtract operation). Consequently, at most two rounding errors can be incurred.
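The effect of the two separate rounding steps can be seen in a small numerical sketch. Python floats are IEEE 754 double precision, so the expression `a * b + c` already rounds after the multiply and again after the add, which models the double-precision variants described above. The exact result is computed with `fractions.Fraction` for comparison; the operand values are chosen only for illustration.

```python
from fractions import Fraction


def fmadd(a: float, b: float, c: float) -> float:
    return a * b + c      # multiply rounds, then add rounds


def fmsub(a: float, b: float, c: float) -> float:
    return a * b - c


def fnmadd(a: float, b: float, c: float) -> float:
    return -(a * b + c)


def fnmsub(a: float, b: float, c: float) -> float:
    return -(a * b - c)


def exact_fmadd(a: float, b: float, c: float) -> Fraction:
    """Unrounded a*b + c, for comparison with the two-step result."""
    return Fraction(a) * Fraction(b) + Fraction(c)


if __name__ == "__main__":
    a = 1.0 + 2.0 ** -30
    b = 1.0 - 2.0 ** -30
    # The exact product 1 - 2**-60 rounds to 1.0 before the add.
    print("two roundings:", fmadd(a, b, -1.0))
    print("exact        :", float(exact_fmadd(a, b, -1.0)))
```

With these operands the two-step result is zero while the exact result is -2^-60, so the rounding of the intermediate product is what removes the small term.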
Note that this implementation differs from previous SPARC64 implementations, which incurred at most one rounding error.
Special behaviors in handling traps are generated in a Floating-point Multiply-Add/Subtract instruction in SPARC64 V because of its implementation characteristics. If any trapping exception is detected in the multiply part in the process of a Floating-point Multiply-Add/Subtract instruction, the execution of the instruction is aborted, the exception condition is recorded in FSR.cexc and FSR.aexc, and the CPU traps with the exception condition. The add/subtract part of the instruction is only performed when the multiply part of the instruction does not have any trapping exceptions.
As described in TABLE A-2, if there are trapping IEEE754 exception conditions in either of the operations FMUL or FADD/SUB, only the trapping exception condition is recorded in the cexc, and the aexc is not modified. If there are no trapping IEEE754 exception conditions, every nontrapping exception condition is ORed into the cexc and the cexc is accumulated into the aexc. The boundary conditions of an unfinished_FPop trap for Floating-point Multiply-Add/Subtract instructions are exactly the same as for FMUL and FADD/SUB instructions; if either of the operations detects any conditions for an unfinished_FPop trap, the Floating-point Multiply-Add/Subtract instruction generates the unfinished_FPop exception. In this case, none of rd, cexc, or aexc are modified.
TABLE A-2 Exceptions in Floating-Point Multiply-Add/Subtract Instructions
FMUL        IEEE754 trap                   No trap                        No trap
FADD/SUB                                   IEEE754 trap                   No trap
cexc        Exception condition of FMUL    Exception condition of FADD    Logical OR of the nontrapping exception conditions of FMUL and FADD/SUB
aexc        No change                      No change                      Logical OR of the cexc (above) and the aexc
Detailed contents of cexc and aexc depending on the various conditions are described in TABLE A-3 and TABLE A-4. The following terminology is used: uf, of, inv, and nx are nontrapping IEEE exception conditions: underflow, overflow, invalid operation, and inexact, respectively.
TABLE A-3 Non-Trapping cexc When FSR.NS = 0
           FADD
FMUL       none       nx         of nx       inv
none       none       nx         of nx       inv
nx         nx         nx         of nx       inv nx
of nx      of nx      of nx      of nx       inv of nx
uf nx      uf nx      uf nx      uf of nx    uf inv nx
inv        inv        -          -           inv
TABLE A-4 Non-Trapping aexc When FSR.NS = 1
           FADD
FMUL       none       nx         of nx       uf nx      inv
none       none       nx         of nx       uf nx      inv
nx         nx         nx         of nx       uf nx      inv nx
of nx      of nx      of nx      of nx                  inv of nx
uf nx      uf nx      -          -           -          uf inv nx
inv        inv        -          -           -          inv
In the tables, the conditions in the shaded columns are all reported as an unfinished_FPop trap by SPARC64 V. In addition, the conditions marked "-" do not exist.
Programming Note – The Multiply Add/Subtract instructions are encoded in the SPARC V9 IMPDEP2 opcode space, and they are specific to the SPARC64 V implementation. They cannot be used in any programs that will be executed on any other SPARC V9 processor, unless that implementation exactly matches the SPARC64 V use for the IMPDEP2 opcode.
Exceptions:
fp_disabled
fp_exception_ieee_754 (NV, NX, OF, UF)
illegal_instruction (size = 00₂ or 11₂) (fp_disabled is not checked for these encodings)
fp_exception_other (unfinished_FPop)

A.29 Jump and Link

SPARC64 V clears the upper 32 bits of the PC value in r[rd] when PSTATE.AM is set (impl. dep. #125). The value written into r[rd] is visible to the instruction in the delay slot.
If either of the low-order two bits of the jump address is nonzero, a mem_address_not_aligned exception occurs. However, when the JMPL instruction causes a mem_address_not_aligned trap, DSFSR and DSFAR are not updated.
If the JMPL instruction has r[rd] = 15, SPARC64 V stores PC + 8 in a hardware table called return address stack (RAS). When a ret (jmpl %i7+8, %g0) or retl (jmpl %o7+8, %g0) is executed, the value in the RAS is used to predict the return address.
JMPL with rd = 0 can be used to return from a subroutine. The typical return address is r[31] + 8 if a nonleaf routine (one that uses the SAVE instruction) is entered by a CALL instruction, or r[15] + 8 if a leaf routine (one that does not use the SAVE instruction) is entered by a CALL instruction or by a JMPL instruction with rd = 15.

A.30 Load Quadword, Atomic [Physical]

The Load Quadword ASIs in this section are specific to SPARC64 V, as an extension to SPARC JPS1.
opcode    imm_asi                ASI value    operation
LDDA      ASI_QUAD_LDD_PHYS      34₁₆         128-bit atomic load, physically addressed
LDDA      ASI_QUAD_LDD_PHYS_L    3C₁₆         128-bit atomic load, little-endian, physically addressed
Format (3) LDDA
Field:  11       rd       010011   rs1      i=0   imm_asi   rs2
Bits:   31-30    29-25    24-19    18-14    13    12-5      4-0
Field:  11       rd       010011   rs1      i=1   simm_13
Bits:   31-30    29-25    24-19    18-14    13    12-0
Assembly Language Syntax
ldda [reg_addr] imm_asi, reg_rd
ldda [reg_plus_imm] %asi, reg_rd
Description
ASIs 34₁₆ and 3C₁₆ are used with the LDDA instruction to atomically read a 128-bit data item, using physical addressing. The data are placed in an even/odd pair of 64-bit registers. The lowest-address 64 bits are placed in the even-numbered register; the highest-address 64 bits are placed in the odd-numbered register. The reference is made from the nucleus context.
In addition to the usual traps for LDDA using a privileged ASI, a data_access_exception exception occurs for a noncacheable access or for the use of the quadword-load ASIs with any instruction other than LDDA. A mem_address_not_aligned exception is generated if the access is not aligned on a 16-byte boundary.
ASIs 34₁₆ and 3C₁₆ are supported in SPARC64 V in addition to those for Load Quadword Atomic for virtually addressed data (ASIs 24₁₆ and 2C₁₆).
The memory access for a load quad instruction with ASI_QUAD_LDD_PHYS{_L} behaves as if the following TTE is set:
TTE.NFO = 0
TTE.CP = 1
TTE.CV = 0
TTE.E = 0
TTE.P = 1
TTE.W = 0
Note – TTE.IE depends on the endianness of the ASI. When the ASI is 034₁₆, TTE.IE = 0; TTE.IE = 1 when the ASI is 03C₁₆.
Therefore, the atomic quad load physical instruction can only be applied to a cacheable memory area. Semantically, ASI_QUAD_LDD_PHYS{_L} (034₁₆ and 03C₁₆) is a combination of ASI_NUCLEUS_QUAD_LDD and ASI_PHYS_USE_EC.
With respect to little endian memory, a Load Quadword Atomic instruction behaves as if it comprises two 64-bit loads, each of which is byte-swapped independently before being written into its respective destination register.
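The register placement and the per-half byte swap described above can be sketched as follows. This is an illustrative model only: `quad_load` is a hypothetical name, the 16 bytes are passed in directly rather than read from memory, and atomicity and the TTE attributes are not modeled.

```python
def quad_load(data: bytes, address: int, little_endian: bool = False):
    """Model the register results of a 128-bit quad load.

    Returns (even_register, odd_register) as 64-bit integers.
    """
    if address % 16 != 0:
        # The hardware raises mem_address_not_aligned in this case.
        raise ValueError("mem_address_not_aligned: need 16-byte alignment")
    if len(data) != 16:
        raise ValueError("a quad load transfers exactly 16 bytes")
    # Each 64-bit half is byte-swapped independently for the _L ASI;
    # the lowest-address half always goes to the even register.
    order = "little" if little_endian else "big"
    even = int.from_bytes(data[:8], order)
    odd = int.from_bytes(data[8:], order)
    return even, odd


if __name__ == "__main__":
    payload = bytes(range(16))
    for le in (False, True):
        even, odd = quad_load(payload, 0x1000, little_endian=le)
        print(f"little_endian={le}: even={even:#018x} odd={odd:#018x}")
```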
Exceptions:
privileged_action
PA_watchpoint (recognized on only the first 8 bytes of a transfer)
illegal_instruction (misaligned rd)
mem_address_not_aligned
data_access_exception
data_access_error
fast_data_access_MMU_miss
fast_data_access_protection

A.35 Memory Barrier

Format (3)
Field:  10       0        op3      0 1111   i=1   -       cmask   mmask
Bits:   31-30    29-25    24-19    18-14    13    12-7    6-4     3-0
Assembly Language Syntax
membar membar_mask
Description
The memory barrier instruction, MEMBAR, has two complementary functions: to express order constraints between memory references and to provide explicit control of memory-reference completion. The membar_mask field in the suggested assembly language is the concatenation of the cmask and mmask instruction fields.
The mmask field is encoded in bits 3 through 0 of the instruction. TABLE A-5 specifies the order constraint that each bit of mmask (selected when set to 1) imposes on memory references appearing before and after the MEMBAR. From zero to four mask bits can be selected in the mmask field.
TABLE A-5 Order Constraints Imposed by mmask Bits
Mask Bit    Name           Description
mmask<3>    #StoreStore    The effects of all stores appearing before the MEMBAR instruction must be visible to all processors before the effect of any stores following the MEMBAR. Equivalent to the deprecated STBAR instruction. Has no effect on SPARC64 V since all stores are performed in program order.
mmask<2>    #LoadStore     All loads appearing before the MEMBAR instruction must have been performed before the effects of any stores following the MEMBAR are visible to any other processor. Has no effect on SPARC64 V since all stores are performed in program order and must occur after performance of any load.
mmask<1>    #StoreLoad     The effects of all stores appearing before the MEMBAR instruction must be visible to all processors before loads following the MEMBAR may be performed.
mmask<0>    #LoadLoad      All loads appearing before the MEMBAR instruction must have been performed before any loads following the MEMBAR may be performed. Has no effect on SPARC64 V since all loads are performed after any prior loads.
The cmask field is encoded in bits 6 through 4 of the instruction. Bits in the cmask field, described in TABLE A-6, specify additional constraints on the order of memory references and the processing of instructions. If cmask is zero, then MEMBAR enforces the partial ordering specified by the mmask field; if cmask is nonzero, then completion and partial order constraints are applied.
TABLE A-6
Mask Bit Function Name Description
cmask<2> Synchronization
cmask<1> Memory issue
cmask<0> Lookaside
56 SPARC JPS1 Implementation Supplement: Fujitsu SPARC64 V Release 1.0, 1 July 2002
Bits in the cmask Field
barrier
barrier
barrier
#Sync All operations (including nonmemory reference operations)
appearing before the MEMBAR must have been performed, and
the effects of any exceptions become visible before any
instruction after the MEMBAR may be initiated.
#MemIssue All mem o ry reference op era tions appe arin g b efore the MEMBAR
must have been performed before any memory operation after
the MEMBAR may b e initiated . Equivalen t to #Sync in
SPARC64 V.
#Lookaside A store appearing before the MEMBAR must complete before
any load following the MEMBAR ref ere nci ng t he sa me a dd ress
can be initiat ed. Equiva lent to #Sync in SPARC64 V.
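As an illustration of the encoding described above, the following Python sketch builds the 7-bit membar_mask value (cmask in bits 6 through 4, mmask in bits 3 through 0) from the mnemonic names. The dictionary and function names are hypothetical helpers introduced here, not part of any assembler; only the bit positions come from TABLE A-5 and TABLE A-6.

```python
# Bit positions follow TABLE A-5 (mmask, bits 3..0) and TABLE A-6 (cmask, bits 6..4).
# MEMBAR_BITS, membar_mask and split_mask are illustrative names, not a real API.
MEMBAR_BITS = {
    "#LoadLoad":   1 << 0,  # mmask<0>
    "#StoreLoad":  1 << 1,  # mmask<1>
    "#LoadStore":  1 << 2,  # mmask<2>
    "#StoreStore": 1 << 3,  # mmask<3>
    "#Lookaside":  1 << 4,  # cmask<0>
    "#MemIssue":   1 << 5,  # cmask<1>
    "#Sync":       1 << 6,  # cmask<2>
}

# mmask bits that TABLE A-5 describes as having no effect on SPARC64 V.
NO_EFFECT_ON_SPARC64_V = {"#StoreStore", "#LoadStore", "#LoadLoad"}


def membar_mask(*names):
    """OR together the bits selected by the given mnemonic names."""
    mask = 0
    for name in names:
        mask |= MEMBAR_BITS[name]
    return mask


def split_mask(mask):
    """Split a 7-bit membar_mask into its (cmask, mmask) fields."""
    return (mask >> 4) & 0x7, mask & 0xF
```

For example, `membar_mask("#StoreLoad")` selects only mmask<1>, the one mmask bit that the table does not describe as a no-op on SPARC64 V.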

A.42 Partial Store (VIS I)

Please refer to A.42 in Commonality for general details.
Watchpoint exceptions on partial store instructions occur conservatively on SPARC64 V. The DCUCR Data Watchpoint masks are only checked for nonzero value (watchpoint enabled). The byte store mask (r[rs2]) in the partial store instruction is ignored, and a watchpoint exception can occur even if the mask is zero (that is, no store will take place) (impl. dep. #249).
For a partial store instruction with mask = 0, SPARC64 V still issues a UPA transaction with zero-byte mask.
Exceptions:
fp_disabled
PA_watchpoint
VA_watchpoint
illegal_instruction (misaligned rd)
mem_address_not_aligned (see Partial Store ASIs on page 120)
data_access_exception (see Partial Store ASIs on page 120)
LDDF_mem_address_not_aligned (see Partial Store ASIs on page 120)
data_access_error
fast_data_access_MMU_miss
fast_data_access_protection

A.49 Prefetch Data

Please refer to Section A.49, Prefetch Data, of Commonality for principal information.
The prefetcha instruction of SPARC64 V works for the following ASIs.
ASI_PRIMARY (080₁₆), ASI_PRIMARY_LITTLE (088₁₆)
ASI_SECONDARY (081₁₆), ASI_SECONDARY_LITTLE (089₁₆)
ASI_NUCLEUS (04₁₆), ASI_NUCLEUS_LITTLE (0C₁₆)
ASI_PRIMARY_AS_IF_USER (010₁₆), ASI_PRIMARY_AS_IF_USER_LITTLE (018₁₆)
ASI_SECONDARY_AS_IF_USER (011₁₆), ASI_SECONDARY_AS_IF_USER_LITTLE (019₁₆)
If an ASI other than the above is specified, prefetcha is executed as a nop.
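The ASI rule for prefetcha can be expressed as a simple membership test. The sketch below is illustrative only: the table and function names are hypothetical, and the ASI numbers are the values listed in this section, written as Python hex literals.

```python
# ASIs for which prefetcha performs a prefetch on SPARC64 V, per the list above.
# PREFETCHA_ASIS and prefetcha_is_nop are illustrative names, not a real API.
PREFETCHA_ASIS = {
    0x80: "ASI_PRIMARY",
    0x88: "ASI_PRIMARY_LITTLE",
    0x81: "ASI_SECONDARY",
    0x89: "ASI_SECONDARY_LITTLE",
    0x04: "ASI_NUCLEUS",
    0x0C: "ASI_NUCLEUS_LITTLE",
    0x10: "ASI_PRIMARY_AS_IF_USER",
    0x18: "ASI_PRIMARY_AS_IF_USER_LITTLE",
    0x11: "ASI_SECONDARY_AS_IF_USER",
    0x19: "ASI_SECONDARY_AS_IF_USER_LITTLE",
}


def prefetcha_is_nop(asi):
    """Return True when prefetcha with this ASI is executed as a nop."""
    return asi not in PREFETCHA_ASIS
```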
TABLE A-7 describes prefetch variants implemented in SPARC64 V.
TABLE A-7 Prefetch Variants
fcn | Fetch to | Status | Description
0 | L1D | S | -
1 | L2 | S | -
2 | L1D | M | -
3 | L2 | M | -
4 | - | - | NOP
5-15 | reserved (SPARC V9) | - | illegal_instruction exception is signalled.
16-19 | implementation dependent | - | NOP
20 | L1D | S | If an access causes an mTLB miss, fast_data_access_MMU_miss exception is signalled.
21 | L2 | S | If an access causes an mTLB miss, fast_data_access_MMU_miss exception is signalled.
22 | L1D | M | If an access causes an mTLB miss, fast_data_access_MMU_miss exception is signalled.
23 | L2 | M | If an access causes an mTLB miss, fast_data_access_MMU_miss exception is signalled.
24-31 | implementation dependent | - | NOP
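The fcn decoding in TABLE A-7 can be sketched as a small lookup function. This is a hedged illustration rather than a description of the hardware: the function name and the action strings are hypothetical, and the S and M status codes are copied from the table without interpretation.

```python
def decode_prefetch_fcn(fcn):
    """Return (fetch_to, status, action) for a 5-bit prefetch fcn value.

    The action strings are illustrative labels chosen for this sketch:
      "prefetch"                    fcn 0-3
      "prefetch_trap_on_mtlb_miss"  fcn 20-23 (fast_data_access_MMU_miss on mTLB miss)
      "illegal_instruction"         fcn 5-15 (reserved in SPARC V9)
      "nop"                         fcn 4, 16-19, 24-31
    """
    if not 0 <= fcn <= 31:
        raise ValueError("fcn is a 5-bit field")
    targets = {0: ("L1D", "S"), 1: ("L2", "S"), 2: ("L1D", "M"), 3: ("L2", "M")}
    if fcn in targets:
        return targets[fcn] + ("prefetch",)
    if 20 <= fcn <= 23:
        # Variants 20-23 mirror 0-3 but signal a trap on an mTLB miss.
        return targets[fcn - 20] + ("prefetch_trap_on_mtlb_miss",)
    if 5 <= fcn <= 15:
        return (None, None, "illegal_instruction")
    return (None, None, "nop")
```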

A.51 Read State Register

In SPARC64 V, an RDPCR instruction will generate a privileged_action exception if PSTATE.PRIV = 0 and PCR.PRIV = 1. If PSTATE.PRIV = 0 and PCR.PRIV = 0, RDPCR will not cause any access privilege violation exception (impl. dep. #250).

A.70 SHUTDOWN (VIS I)

In SPARC64 V, SHUTDOWN acts as a NOP in privileged mode (impl. dep. #206).

A.70 Write State Register

In SPARC64 V, a WRPCR instruction will cause a privileged_action exception if PSTATE.PRIV = 0 and PCR.PRIV = 1. If PSTATE.PRIV = 0 and PCR.PRIV = 0, WRPCR causes a privileged_action exception only when an attempt is made to change (that is, write 1 to) PCR.PRIV (impl. dep. #250).

A.71 Deprecated Instructions

The deprecated instructions in A.71 of Commonality are provided only for compatibility with previous versions of the architecture. They should not be used in new software.

A.71.10 Store Barrier

In SPARC64 V, STBAR behaves as NOP since the hardware memory models always enforce the semantics of these MEMBARs for all memory accesses.
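The RDPCR and WRPCR privilege rules given in A.51 and A.70 (impl. dep. #250) can be summarized in two predicates. The sketch below is an illustration under one stated assumption: the text only covers PSTATE.PRIV = 0, so the code assumes that no privileged_action exception is raised when PSTATE.PRIV = 1. The function names are hypothetical.

```python
def rdpcr_raises_privileged_action(pstate_priv, pcr_priv):
    """RDPCR traps only for nonprivileged code when PCR.PRIV = 1."""
    return pstate_priv == 0 and pcr_priv == 1


def wrpcr_raises_privileged_action(pstate_priv, pcr_priv, new_pcr_priv):
    """WRPCR trap rule; new_pcr_priv is the PCR.PRIV bit in the value being written."""
    if pstate_priv == 1:
        # Assumption: privileged code is never subject to privileged_action here.
        return False
    if pcr_priv == 1:
        return True
    # PSTATE.PRIV = 0 and PCR.PRIV = 0: trap only on an attempt to write 1 to PCR.PRIV.
    return new_pcr_priv == 1
```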
F.APPENDIX B

IEEE Std 754-1985 Requirements for SPARC V9

The IEEE Std 754-1985 floating-point standard contains a number of implementation dependencies.
Please see Appendix B of Commonality for choices for these implementation dependencies, to ensure that SPARC V9 implementations are as consistent as possible.
Following is information specific to the SPARC64 V implementation of SPARC V9 in these sections:
Traps Inhibiting Results on page 61
Floating-Point Nonstandard Mode on page 61

B.1 Traps Inhibiting Results

Please refer to Section B.1 of Commonality. The SPARC64 V hardware, in conjunction with kernel or emulation code, produces the results described in this section.

B.6 Floating-Point Nonstandard Mode

In this section, the hardware boundary conditions for the unfinished_FPop exception and the nonstandard mode of SPARC64 V floating-point hardware are discussed.
SPARC64 V floating-point hardware has its specific range of computation. If either the values of input operands or the value of the intermediate result shows that the computation may not fall in the range that hardware provides, SPARC64 V generates an fp_exception_other exception (tt = 022₁₆) with FSR.ftt = 02₁₆ (unfinished_FPop) and the operation is taken over by software.
The kernel emulation routine completes the remaining floating-point operation in accordance with the IEEE 754-1985 floating-point standard (impl. dep. #3).
SPARC64 V implements a nonstandard mode, enabled when FSR.NS is set (see FSR_nonstandard_fp (NS) on page 18). Depending on the setting in FSR.NS, the behavior of SPARC64 V with respect to the floating-point computation varies.

B.6.1 fp_exception_other Exception (ftt=unfinished_FPop)

SPARC64 V may invoke an fp_exception_other (tt = 022₁₆) exception with FSR.ftt = unfinished_FPop (ftt = 02₁₆) in FsTOd, FdTOs, FADD(s,d), FSUB(s,d), FsMULd(s,d), FMUL(s,d), FDIV(s,d), FSQRT(s,d) floating-point instructions. In addition, Floating-point Multiply-Add/Subtract instructions generate the exception, since the instruction is the combination of a multiply and an add/subtract operation: FMADD(s,d), FMSUB(s,d), FNMADD(s,d), and FNMSUB(s,d).
The following basic policies govern the detection of boundary conditions:
1. When one of the operands is a denormalized number and the other operand is a normal non-zero floating-point number (except for a NaN or an infinity), an fp_exception_other with unfinished_FPop condition is signalled. The cases in which the result is a zero or an overflow are excluded.
2. When both operands are denormalized numbers, except for the cases in which the result is a zero or an overflow, an fp_exception_other with unfinished_FPop condition is signalled.
3. When both operands are normal, the result before rounding is a denormalized number, and TEM.UFM = 0, an fp_exception_other with unfinished_FPop condition is signalled, except for the cases in which the result is a zero.
When the result is expected to be a constant, such as an exact zero or an infinity, and an insignificant computation will furnish the result, SPARC64 V tries to calculate the result without signalling an unfinished_FPop exception.
Implementation Note –
Detecting the exact boundary conditions requires a large amount of hardware. SPARC64 V detects approximate boundary conditions by calculating the exponent intermediate result (the exponent before rounding) from input operands, to avoid the hardware cost. Since the computation of the boundary conditions is approximate, the detection of a zero result or an overflow result shall be pessimistic. SPARC64 V generates an unfinished_FPop exception pessimistically.
The equations to calculate the result exponent to detect the boundary conditions from the input exponents are presented in TABLE B-1, where Er is the approximation of the biased result exponent before rounding and is calculated only from the input exponents (esrc1, esrc2). Er is to be used for detecting the boundary condition for an unfinished_FPop.
TABLE B-1 Result Exponent Approximation for Detecting unfinished_FPop Boundary Conditions
Operation | Formula
fmuls | Er = esrc1 + esrc2 - 126
fmuld | Er = esrc1 + esrc2 - 1022
fdivs | Er = esrc1 - esrc2 + 126
fdivd | Er = esrc1 - esrc2 + 1022
esrc1 and esrc2 are the biased exponents of the input operands. When the corresponding input operand is a denormalized number, the value is 0.
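The Er approximation in TABLE B-1 is plain integer arithmetic on the biased exponent fields. The following Python sketch illustrates it; the function names are hypothetical, and the multiply rows are read here as esrc1 + esrc2 minus the constant (126 or 1022), mirroring the divide rows, which add the same constant.

```python
import struct

# Constant added to the exponent sum (multiply) or difference (divide), per TABLE B-1.
# The negative sign on the multiply entries is this sketch's reading of the table.
ER_CONSTANT = {"fmuls": -126, "fmuld": -1022, "fdivs": 126, "fdivd": 1022}


def biased_exponent_single(x):
    """Biased exponent field of x as an IEEE 754 single (0 for denormals and zero)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return (bits >> 23) & 0xFF


def approx_result_exponent(op, esrc1, esrc2):
    """Er, the approximate biased result exponent before rounding."""
    if op in ("fmuls", "fmuld"):
        return esrc1 + esrc2 + ER_CONSTANT[op]
    if op in ("fdivs", "fdivd"):
        return esrc1 - esrc2 + ER_CONSTANT[op]
    raise ValueError("TABLE B-1 covers only fmuls, fmuld, fdivs and fdivd")
```

For instance, dividing a single-precision 1.0 (biased exponent 127) by a denormalized divisor (exponent value 0) gives Er = 127 - 0 + 126 = 253.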
From Er, eres is calculated. eres is a biased result exponent, after mantissa alignment and before rounding, where the appropriate adjustment of the exponent is applied to the result mantissa: left-shifting or right-shifting the mantissa to the implicit 1 at the left of the binary point, subtracting or adding the shift-amount to the exponent. The result mantissa is assumed to be 1.xxxx in calculating eres. If the result is a denormalized number, eres is less than zero.
TABLE B-2 describes the boundary condition of each floating-point instruction that generates an unfinished_FPop exception.
TABLE B-2 unfinished_FPop Boundary Conditions
Operation | Boundary Conditions
FdTOs | -25 < eres < 1 and TEM.UFM = 0.
FsTOd | Second operand (rs2) is a denormalized number.
FADDs, FSUBs, FADDd, FSUBd¹ | 1. One of the operands is a denormalized number, and the other operand is a normal, nonzero floating-point number (except for a NaN and an infinity). 2. Both operands are denormalized numbers. 3. Both operands are normal nonzero floating-point numbers (except for a NaN and an infinity), eres < 1, and TEM.UFM = 0.
FMULs, FMULd | 1. One of the operands is a denormalized number, the other operand is a normal, nonzero floating-point number (except for a NaN and an infinity), and single precision: -25 < Er; double precision: -54 < Er. 2. Both operands are normal, nonzero floating-point numbers (except for a NaN and an infinity), TEM.UFM = 0, and single precision: -25 < eres < 1; double precision: -54 < eres < 1.
FsMULd | 1. One of the operands is a denormalized number, and the other operand is a normal, nonzero floating-point number (except for a NaN and an infinity). 2. Both operands are denormalized numbers.
FDIVs, FDIVd | 1. The dividend (operand1; rs1) is a normal, nonzero floating-point number (except for a NaN and an infinity), the divisor (operand2; rs2) is a denormalized number, and single precision: Er < 255; double precision: Er < 2047. 2. The dividend (operand1; rs1) is a denormalized number, the divisor (operand2; rs2) is a normal, nonzero floating-point number (except for a NaN and an infinity), and single precision: -25 < Er; double precision: -54 < Er. 3. Both operands are denormalized numbers. 4. Both operands are normal, nonzero floating-point numbers (except for a NaN and an infinity), TEM.UFM = 0, and single precision: -25 < eres < 1; double precision: -54 < eres < 1.
FSQRTs, FSQRTd | The input operand (operand2; rs2) is a positive nonzero and is a denormalized number.
1. Operation of 0 and denormalized number generates a result in accordance with the IEEE754-1985 standard.
Pessimistic Zero
If a condition in TABLE B-3 is true, SPARC64 V generates the result as a pessimistic zero, meaning that the result is a denormalized minimum or a zero, depending on the rounding mode (FSR.RD).
TABLE B-3 Conditions for a Pessimistic Zero
Operations | One operand is denormalized¹ | Both are denormalized | Both are normal fp-number²
FdTOs | always | - | eres ≤ -25
FMULs, FMULd | single precision: Er ≤ -25; double precision: Er ≤ -54 | Always | single precision: eres ≤ -25; double precision: eres ≤ -54
FDIVs, FDIVd | single precision: Er ≤ -25; double precision: Er ≤ -54 | Never | single precision: eres ≤ -25; double precision: eres ≤ -54
1. Both operands are non-zero, non-NaN, and non-infinity numbers.
2. Both may be zero, but both are non-NaN and non-infinity numbers.
Pessimistic Overflow
If a condition in TABLE B-4 is true, SPARC64 V regards the operation as having an overflow condition.
TABLE B-4 Pessimistic Overflow Conditions
Operations | Conditions
FDIVs | The divisor (operand2; rs2) is a denormalized number and Er ≥ 255.
FDIVd | The divisor (operand2; rs2) is a denormalized number and Er ≥ 2047.
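To show how the FDIV rows of TABLE B-2, TABLE B-3, and TABLE B-4 fit together, the following Python sketch classifies a division that involves at least one denormalized operand, with FSR.NS = 0 and both operands non-zero, non-NaN, and non-infinity. It is an illustration only: the function name and the returned labels are hypothetical, and the overflow test is read as Er ≥ 255 (single) or Er ≥ 2047 (double), the complement of the Er < 255 / Er < 2047 boundary for unfinished_FPop.

```python
# (pessimistic-zero limit, pessimistic-overflow limit) on Er for each precision.
FDIV_LIMITS = {"single": (-25, 255), "double": (-54, 2047)}


def classify_fdiv_denormal(precision, er, denormal_dividend, denormal_divisor):
    """Classify FDIV with a denormalized operand from the approximate exponent Er.

    Returns "unfinished_FPop", "pessimistic_zero" or "pessimistic_overflow".
    The normal/normal case depends on eres and TEM.UFM and is not covered here.
    """
    zero_limit, overflow_limit = FDIV_LIMITS[precision]
    if denormal_dividend and denormal_divisor:
        # Both denormalized: always a boundary condition, never a pessimistic zero.
        return "unfinished_FPop"
    if denormal_divisor:
        return "pessimistic_overflow" if er >= overflow_limit else "unfinished_FPop"
    if denormal_dividend:
        return "pessimistic_zero" if er <= zero_limit else "unfinished_FPop"
    raise ValueError("at least one operand must be denormalized")
```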

B.6.2 Operation Under FSR.NS = 1

When FSR.NS = 1 (nonstandard mode), SPARC64 V zeroes all the input denormalized operands before the operation and signals an inexact exception if enabled. If the operation generates a denormalized result, SPARC64 V zeroes the result and also signals an inexact exception if enabled. The following list defines the operation in detail.
If either operand is a denormalized number and both operands are non-zero, non-NaN, and non-infinity numbers, the input denormalized operand is replaced with a zero with the same sign, and the operation is performed. If enabled, an inexact exception is signalled; an fp_exception_ieee_754 (tt = 021₁₆) is generated, with nxc = 1 in FSR.cexc (FSR.ftt = 01₁₆; IEEE754_exception). However, if the operation is FDIV(s,d) and either a division_by_zero or an invalid_operation condition is detected, or if the operation is FSQRT(s,d) and an invalid_operation condition is detected, the inexact condition is not reported.
If the result before rounding is a denormalized number, the result is flushed to a zero with the same sign, and either an underflow exception or an inexact exception is signalled, depending on FSR.TEM.
As observed from the preceding, when FSR.NS = 1, SPARC64 V generates neither an unfinished_FPop exception nor a denormalized number as a result.
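The input-flushing step of nonstandard mode can be illustrated on raw single-precision bit patterns. The sketch below is a software model written for this explanation, not the hardware algorithm; the function names are hypothetical, and the returned flag only records that an inexact condition arose (whether it traps or accrues depends on FSR.TEM, and the FDIV and FSQRT exceptions noted above are not modelled).

```python
SIGN_MASK = 0x80000000
EXP_MASK = 0x7F800000
FRAC_MASK = 0x007FFFFF


def is_denormal_single(bits):
    """True when the 32-bit pattern is a denormalized single (exponent 0, fraction != 0)."""
    return (bits & EXP_MASK) == 0 and (bits & FRAC_MASK) != 0


def flush_input_single(bits):
    """Model of FSR.NS = 1 input handling: denormals become a zero of the same sign.

    Returns (new_bits, inexact), where inexact is True if the operand was flushed.
    """
    if is_denormal_single(bits):
        return bits & SIGN_MASK, True
    return bits, False
```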
TABLE B-5 summarizes the behavior of SPARC64 V floating-point hardware depending on FSR.NS.
Note –
The result and behavior of SPARC64 V of the shaded column in the tables Table B-5 and Table B-6 conform to IEEE754-1985 standard.
Note –
Throughout Table B-5 and Table B-6, lowercase exception conditions such as nx, uf, of, dz and nv are nontrapping IEEE 754 exceptions. Uppercase exception conditions such as NX, UF, OF, DZ and NV are trapping IEEE 754 exceptions.
TABLE B-5 Floating-Point Exceptional Conditions and Results
FSR.NS | Denorm : Norm¹ | Result Denorm² | Pessimistic Zero | Pessimistic Overflow | UFM | OFM | NXM | Result
0 | No | Yes | Yes | - | 1 | - | - | UF
0 | No | Yes | Yes | - | 0 | - | 1 | NX
0 | No | Yes | Yes | - | 0 | - | 0 | uf + nx, a signed zero, or a signed Dmin³
0 | No | Yes | No | - | 1 | - | - | UF
0 | No | Yes | No | - | 0 | - | - | unfinished_FPop⁴
0 | No | No | - | - | - | - | - | Conforms to IEEE754-1985
0 | Yes | n/a | Yes | - | 1 | - | - | UF
0 | Yes | n/a | Yes | - | 0 | - | 1 | NX
0 | Yes | n/a | Yes | - | 0 | - | 0 | uf + nx, a signed zero, or a signed Dmin
0 | Yes | n/a | No | Yes | - | 1 | - | OF
0 | Yes | n/a | No | Yes | - | 0 | 1 | NX
0 | Yes | n/a | No | Yes | - | 0 | 0 | of + nx, a signed infinity, or a signed Nmax⁵
0 | Yes | n/a | No | No | - | - | - | unfinished_FPop
1 | No | Yes | - | - | 1 | - | - | UF
1 | No | Yes | - | - | 0 | - | 1 | NX
1 | No | Yes | - | - | 0 | - | 0 | uf + nx, a signed zero
1 | No | No | - | - | - | - | - | Conforms to IEEE754-1985
1 | Yes | - | - | - | - | - | - | See TABLE B-6
1. One of the operands is a denormalized number, and the other operand is a normal or a denormalized number (non-zero, non-NaN, and non-infinity).
2. The result before rounding turns out to be a denormalized number.
3. Dmin = denormalized minimum.
4. If the FPop is either FADD{s,d} or FSUB{s,d} and the operation is 0 ± denormalized number, SPARC64 V does not generate an unfinished_FPop and generates a result according to IEEE754-1985 standard.
5. Nmax = normalized maximum.
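TABLE B-6 describes how SPARC64 V behaves when FSR.NS = 1 (nonstandard mode).
TABLE B-6 Nonarithmetic Operations Under FSR.NS = 1
Operations | op1 = denorm | op2 = denorm | UFM | NXM | DVM | NVM | Result
FsTOd | - | Yes | - | 1 | - | - | NX
FsTOd | - | Yes | - | 0 | - | - | nx, a signed zero
FdTOs | - | Yes | 1 | - | - | - | UF
FdTOs | - | Yes | 0 | 1 | - | - | NX
FdTOs | - | Yes | 0 | 0 | - | - | uf + nx, a signed zero
FADDs, FSUBs, FADDd, FSUBd | Yes | No | - | 1 | - | - | NX
FADDs, FSUBs, FADDd, FSUBd | Yes | No | - | 0 | - | - | nx, op2
FADDs, FSUBs, FADDd, FSUBd | No | Yes | - | 1 | - | - | NX
FADDs, FSUBs, FADDd, FSUBd | No | Yes | - | 0 | - | - | nx, op1
FADDs, FSUBs, FADDd, FSUBd | Yes | Yes | - | 1 | - | - | NX
FADDs, FSUBs, FADDd, FSUBd | Yes | Yes | - | 0 | - | - | nx, a signed zero
FMULs, FMULd, FsMULd | Yes | - | - | 1 | - | - | NX
FMULs, FMULd, FsMULd | Yes | - | - | 0 | - | - | nx, a signed zero
FMULs, FMULd, FsMULd | - | Yes | - | 1 | - | - | NX
FMULs, FMULd, FsMULd | - | Yes | - | 0 | - | - | nx, a signed zero
FDIVs, FDIVd | Yes | No | - | 1 | - | - | NX
FDIVs, FDIVd | Yes | No | - | 0 | - | - | nx, a signed zero
FDIVs, FDIVd | No | Yes | - | - | 1 | - | DZ
FDIVs, FDIVd | No | Yes | - | - | 0 | - | dz, a signed infinity
FDIVs, FDIVd | Yes | Yes | - | - | - | 1 | NV
FDIVs, FDIVd | Yes | Yes | - | - | - | 0 | nv, dNaN¹
FSQRTs, FSQRTd | - | Yes and op2 > 0 | - | 1 | - | - | NX
FSQRTs, FSQRTd | - | Yes and op2 > 0 | - | 0 | - | - | nx, zero
FSQRTs, FSQRTd | - | Yes and op2 < 0 | - | - | - | 1 | NV
FSQRTs, FSQRTd | - | Yes and op2 < 0 | - | - | - | 0 | nv, dNaN¹
1. A single precision dNaN is 7FFF.FFFF₁₆, and a double precision dNaN is 7FFF.FFFF.FFFF.FFFF₁₆.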
F.APPENDIX C

Implementation Dependencies

This appendix summarizes implementation dependencies. In SPARC V9 and SPARC JPS1, the notation “IMPL. DEP. #nn:” identifies the definition of an implementation dependency; the notation “(impl. dep. #nn)” identifies a reference to an implementation dependency. These dependencies are described by their number nn in TABLE C-1 on page 70. These numbers have been removed from the body of this document for SPARC64 V to make the document more readable. TABLE C-1 has been modified to include descriptions of the manner in which SPARC64 V has resolved each implementation dependency.
Note –
SPARC International maintains a document, Implementation Characteristics of Current SPARC-V9-based Products, Revision 9.x, that describes the implementation-dependent design features of all SPARC V9-compliant implementations. Contact SPARC International for this document at
home page: www.sparc.org email: info@sparc.org

C.1 Definition of an Implementation Dependency

Please refer to Section C.1 of Commonality.

C.2 Hardware Characteristics

Please refer to Section C.2 of Commonality.

C.3 Implementation Dependency Categories

Please refer to Section C.3 of Commonality.

C.4 List of Implementation Dependencies

TABLE C-1 provides a complete list of how each implementation dependency is treated in the SPARC64 V implementation.
TABLE C-1 SPARC64 V Implementation Dependencies
Nbr | SPARC64 V Implementation Notes | Page
1 | Software emulation of instructions: The operating system emulates all instructions that generate illegal_instruction or unimplemented_FPop exceptions. | -
2 | Number of IU registers: SPARC64 V supports eight register windows (NWINDOWS = 8). SPARC64 V supports an additional two global register sets (Interrupt globals and MMU globals) for a total of 160 integer registers. | -
3 | Incorrect IEEE Std 754-1985 results: See Section B.6, Floating-Point Nonstandard Mode, on page 61 for details. | 62
4–5 | Reserved. | -
6 | I/O registers privileged status: This dependency is beyond the scope of this publication. It should be defined in each system that uses SPARC64 V. | -
7 | I/O register definitions: This dependency is beyond the scope of this publication. It should be defined in each system that uses SPARC64 V. | -
8 | RDASR/WRASR target registers: See A.50 and A.70 in Commonality for details of implementation-dependent RDASR/WRASR instructions. | -
9 | RDASR/WRASR privileged status: See A.50 and A.70 in Commonality for details of implementation-dependent RDASR/WRASR instructions. | -
10–12 | Reserved. | -
13 | VER.impl: VER.impl = 5 for the SPARC64 V processor. | 20
14–15 | Reserved. | -
16 | IU deferred-trap queue: SPARC64 V neither has nor needs an IU deferred-trap queue. | 24
17 | Reserved. | -
18 | Nonstandard IEEE 754-1985 results: SPARC64 V flushes denormal operands and results to zero when FSR.NS = 1. For the treatment of denormalized numbers, please refer to Section B.6, Floating-Point Nonstandard Mode, on page 61 for details. | 18, 62
19 | FPU version, FSR.ver: FSR.ver = 0 for SPARC64 V. | 18
20–21 | Reserved. | -
22 | FPU TEM, cexc, and aexc: SPARC64 V implements all bits in the TEM, cexc, and aexc fields in hardware. | 19
23 | Floating-point traps: In SPARC64 V floating-point traps are always precise; no FQ is needed. | 24
24 | FPU deferred-trap queue (FQ): SPARC64 V neither has nor needs a floating-point deferred-trap queue. | 24
25 | RDPR of FQ with nonexistent FQ: Attempting to execute an RDPR of the FQ causes an illegal_instruction exception. | 24
26–28 | Reserved. | -
29 | Address space identifier (ASI) definitions: The ASIs that are supported by SPARC64 V are defined in Appendix L, Address Space Identifiers. | 117
30 | ASI address decoding: SPARC64 V supports all of the listed ASIs. | -
31 | Catastrophic error exceptions: SPARC64 V contains a watchdog timer that times out after no instruction has been committed for a specified number of cycles. If the timer times out, the CPU tries to invoke an async_data_error trap. If the counter continues to count to reach 2^33, the processor enters error_state. Upon an entry to error_state, the processor optionally generates a WDR reset to recover from error_state. | 138
32 | Deferred traps: SPARC64 V signals a deferred trap in a few of its severe error conditions. SPARC64 V does not contain a deferred trap queue. | 37, 149
33 | Trap precision: There are no deferred traps in SPARC64 V other than the trap caused by a few severe error conditions. All traps that occur as the result of program execution are precise. | 37
34 | Interrupt clearing: For details of interrupt handling see Appendix N, Interrupt Handling. | -
35 | Implementation-dependent traps: SPARC64 V supports the following traps that are implementation dependent: interrupt_vector_trap (tt = 060₁₆); PA_watchpoint (tt = 061₁₆); VA_watchpoint (tt = 062₁₆); ECC_error (tt = 063₁₆); fast_instruction_access_MMU_miss (tt = 064₁₆ through 067₁₆); fast_data_access_MMU_miss (tt = 068₁₆ through 06B₁₆); fast_data_access_protection (tt = 06C₁₆ through 06F₁₆); async_data_error (tt = 040₁₆). | 39
36 | Trap priorities: SPARC64 V's implementation-dependent traps have the following priorities: interrupt_vector_trap (priority = 16); PA_watchpoint (priority = 12); VA_watchpoint (priority = 11); ECC_error (priority = 33); fast_instruction_access_MMU_miss (priority = 2); fast_data_access_MMU_miss (priority = 12); fast_data_access_protection (priority = 12); async_data_error (priority = 2). | 38
37 | Reset trap: SPARC64 V implements power-on reset (POR) and watchdog reset. | 37
38 | Effect of reset trap on implementation-dependent registers: See Section O.3, Processor State after Reset and in RED_state, on page 141. | 141
39 | Entering error_state on implementation-dependent errors: CPU watchdog timeout at 2^33 ticks, a normal trap, or an SIR at TL = MAXTL causes the CPU to enter error_state. | 36
40 | Error_state processor state: SPARC64 V optionally takes a watchdog reset trap after entry to error_state. Most error-logging register state will be preserved. (See also impl. dep. #254.) | 36
41 | Reserved. | -
42 | FLUSH instruction: SPARC64 V implements the FLUSH instruction in hardware. | -
43 | Reserved. | -
44 | Data access FPU trap: The destination register(s) are unchanged if an access error occurs. | -
45–46 | Reserved. | -
47 | RDASR: See A.50, Read State Register, in Commonality for details. | -
48 | WRASR: See A.70, Write State Register, in Commonality for details. | -
49–54 | Reserved. | -
55 | Floating-point underflow detection: See FSR_underflow in Section 5.1.7 of Commonality for details. | -
56–100 | Reserved. | -
101 | Maximum trap level: MAXTL = 5. | 20
102 | Clean windows trap: SPARC64 V generates a clean_window exception; register windows are cleaned in software. | -
103 | Prefetch instructions: SPARC64 V implements PREFETCH variations 0–3 and 20–23 with the following implementation-dependent characteristics: The prefetches have observable effects in privileged code. Prefetch variants 0–3 do not cause a fast_data_access_MMU_miss trap, because the prefetch is dropped when a fast_data_access_MMU_miss condition happens. On the other hand, prefetch variants 20–23 cause data_access_MMU_miss traps on TLB misses. All prefetches are for 64-byte cache lines, which are aligned on a 64-byte boundary. See Section A.49, Prefetch Data, on page 57, for implemented variations and their characteristics. Prefetches will work normally if the ASI is ASI_PRIMARY, ASI_SECONDARY, ASI_NUCLEUS, ASI_PRIMARY_AS_IF_USER, ASI_SECONDARY_AS_IF_USER, or one of their little-endian pairs. | -
104 | VER.manuf: VER.manuf = 0004₁₆. The least significant 8 bits are Fujitsu’s JEDEC manufacturing code. | 20
105 | TICK register: SPARC64 V implements 63 bits of the TICK register; it increments on every clock cycle. | 19
106 | IMPDEPn instructions: SPARC64 V uses the IMPDEP2 opcode for the Multiply Add/Subtract instructions. SPARC64 V also conforms to Sun’s specification for VIS-1 and VIS-2. | 49
107 | Unimplemented LDD trap: SPARC64 V implements LDD in hardware. | -
108 | Unimplemented STD trap: SPARC64 V implements STD in hardware. | -
109 | LDDF_mem_address_not_aligned: If the address is word aligned but not doubleword aligned, SPARC64 V generates the LDDF_mem_address_not_aligned exception. The trap handler software emulates the instruction. | -
110 | STDF_mem_address_not_aligned: If the address is word aligned but not doubleword aligned, SPARC64 V generates the STDF_mem_address_not_aligned exception. The trap handler software emulates the instruction. | -
111 | LDQF_mem_address_not_aligned: SPARC64 V generates an illegal_instruction exception for all LDQFs. The processor does not perform the check for fp_disabled. The trap handler software emulates the instruction. | -
112 | STQF_mem_address_not_aligned: SPARC64 V generates an illegal_instruction exception for all STQFs. The processor does not perform the check for fp_disabled. The trap handler software emulates the instruction. | -
113 | Implemented memory models: SPARC64 V implements Total Store Order (TSO) for all the memory models specified in PSTATE.MM. See Chapter 8, Memory Models, for details. | 42
114 | RED_state trap vector address (RSTVaddr): RSTVaddr is a constant in SPARC64 V, where: VA = FFFF FFFF F000 0000₁₆ and PA = 07FF F000 0000₁₆. | 36
115 | RED_state processor state: See RED_state on page 36 for details of implementation-specific actions in RED_state. | 36
116 | SIR_enable control flag: See Section A.60 SIR in Commonality for details. | -
117 | MMU disabled prefetch behavior: Prefetch and nonfaulting Load always succeed when the MMU is disabled. | 91
118 | Identifying I/O locations: This dependency is beyond the scope of this publication. It should be defined in a system that uses SPARC64 V. | -
119 | Unimplemented values for PSTATE.MM: Writing 11₂ into PSTATE.MM causes the machine to use the TSO memory model. However, the encoding 11₂ should not be used, since future versions of SPARC64 V may use this encoding for a new memory model. | 42
120 | Coherence and atomicity of memory operations: Although SPARC64 V implements the UPA-based cache coherency mechanism, this dependency is beyond the scope of this publication. It should be defined in a system that uses SPARC64 V. | -
121 | Implementation-dependent memory model: SPARC64 V implements TSO, PSO, and RMO memory models. See Chapter 8, Memory Models, for details. Accesses to pages with the E (Volatile) bit of their MMU page table entry set are also made in program order. | -
122 | FLUSH latency: Since the FLUSH instruction synchronizes the processor, its total latency varies depending on many portions of the SPARC64 V processor’s state. Assuming that all prior instructions are completed, the latency of FLUSH is 18 processor cycles. | -
123 | Input/output (I/O) semantics: This dependency is beyond the scope of this publication. It should be defined in a system that uses SPARC64 V. | -
124 | Implicit ASI when TL > 0: See Section 5.1.7 of Commonality for details. | -
125 | Address masking: When PSTATE.AM = 1, SPARC64 V does mask out the high-order 32 bits of the PC when transmitting it to the destination register. | 29, 49, 53
126 | Register Windows State Registers width: NWINDOWS for SPARC64 V is 8; therefore, only 3 bits are implemented for the following registers: CWP, CANSAVE, CANRESTORE, OTHERWIN. If an attempt is made to write a value greater than NWINDOWS - 1 to any of these registers, the extraneous upper bits are discarded. The CLEANWIN register contains 3 bits. | -
127–201 | Reserved. | -
202 | fast_ECC_error trap: fast_ECC_error trap is not implemented in SPARC64 V. | -
203 | Dispatch Control Register bits 13:6 and 1: SPARC64 V does not implement DCR. | 22
204 | DCR bits 5:3 and 0: SPARC64 V does not implement DCR. | 22
205 | Instruction Trap Register: SPARC64 V implements the Instruction Trap Register. | 24
206 | SHUTDOWN instruction: In privileged mode the SHUTDOWN instruction executes as a NOP in SPARC64 V. | 58
207 | PCR register bits 47:32, 26:17, and bit 3: SPARC64 V uses these bits for the following purposes: Bits 47:32 for set/clear/show status of overflow (OVF). Bit 26 for validity of OVF field (OVRO). Bits 24:22 for number of counter pair (NC). Bits 20:18 for counter selector (SC). Bit 3 for validity of SU/SL field (ULRO). Other implementation-dependent bits are read as 0 and writes to them are ignored. | 20, 21, 201
208 | Ordering of errors captured in instruction execution: The order in which errors are captured during instruction execution is implementation dependent. Ordering can be in program order or in order of detection. | -
209 | Software intervention after instruction-induced error: Precision of the trap to signal an instruction-induced error for which recovery requires software intervention is implementation dependent. | -
210 | ERROR output signal: The causes and the semantics of ERROR output signal are implementation dependent. | -
211 | Error logging registers information: The information that the error logging registers preserve beyond the reset induced by an ERROR signal is implementation dependent. | -
212 | Trap with fatal error: Generation of a trap along with ERROR signal assertion upon detection of a fatal error is implementation dependent. | -
213 | AFSR.PRIV: SPARC64 V does not implement the AFSR.PRIV bit. | -
214 | Enable/disable control for deferred traps: SPARC64 V does not implement a control feature for deferred traps. | -
215 | Error barrier: DONE and RETRY instructions may implicitly provide an error barrier function as MEMBAR #Sync. Whether DONE and RETRY instructions provide an error barrier is implementation dependent. | -
216 | data_access_error trap precision: data_access_error trap is always precise in SPARC64 V. | -
217 | instruction_access_error trap precision: instruction_access_error trap is always precise in SPARC64 V. | -
TABLE C-1
Nbr SPARC64 V Implementation Notes Page
218
SPARC64 V Implementation Dependencies (8 of 11)
async_data_error
async_data_error
trap is implemented in
SPARC64 V
39
, using tt =4016. See
Appendix P for det ails.
) allocation
219 Asynchronous Fault Address Register (
SPARC64 V
VA = 00
VA = 08
implements two AFARs:
for an error occurring in D1 cache.
16
for an error occurring in U2 cache.
16
AFAR
220 Addition of logging and control registers for error handling
177, 178
SPARC64 V implements various features for sustaining reliability. See Appendix P for details.
221 Special/signalling ECCs
The method to generate “special” or signalling ECCs and whether processor-ID is embedded into the data associated with special/signalling ECCs is implementation dependent.
222 TLB organization
85
SPARC64 V has the following TLB organization:
Level-1 micro ITLB (uITLB), 32-way fully associative
Level-1 micro DTLB (uDTLB), 32-way fully associative
Level-2 IMMU-TLB, consisting of sITLB (set-associative Instruction TLB) and fITLB (fully associative Instruction TLB).
Level-2 DMMU-TLB, consisting of sDTLB (set-associative Data TLB) and fDTLB (fully associative Data TLB).
223 TLB multiple-hit detection
86
On SPARC64 V, TLB multiple hit detection is supported. However, the multiple hit is not detected at every TLB reference. When the micro-TLB (uTLB), which is the cache of sTLB and fTLB, matches the virtual address, the multiple hit in sTLB and fTLB is not detected. The multiple hit is detected only when the micro-TLB mismatches and the main TLB is referenced.
224 MMU physical address width
86
The SPARC64 V MMU implements 43-bit physical addresses. The PA field of the TTE holds a 43-bit physical address. Bits 46:43 of each TTE always read as 0 and writes to them are ignored. The MMU translates virtual addresses into 43-bit physical addresses. Each cache tag holds bits 42:6 of physical addresses.
225 TLB locking of entries
87
In SPARC64 V, when a TTE with its lock bit set is written into TLB through the Data In register, the TTE is automatically written into the corresponding fully associative TLB and locked in the TLB. Otherwise, the TTE is written into the corresponding sTLB or fTLB, depending on its page size.
226 TTE support for CV bit
87
SPARC64 V does not support the CV bit in TTE. Since I1 and D1 are virtually indexed caches, unaliasing is supported by SPARC64 V. See also impl. dep. #232.
TABLE C-1 SPARC64 V Implementation Dependencies (9 of 11)
Nbr SPARC64 V Implementation Notes Page
227 TSB number of entries
88
SPARC64 V supports a maximum of 16 million entries in the common TSB and a maximum of 32 million lines in the split TSB.
228 TSB_Hash supplied from TSB or context-ID register
88
TSB_Hash is generated from the context-ID register in SPARC64 V.
229 TSB_Base address generation
88
SPARC64 V generates the TSB_Base address directly from the TSB Extension Registers. To maintain compatibility with UltraSPARC I/II, SPARC64 V provides mode flag MCNTL.JPS1_TSBP. When MCNTL.JPS1_TSBP = 0, the TSB_Base register is used.
230 data_access_exception
89
SPARC64 V generates a data_access_exception trap only for the causes listed in Section 7.6.1 of Commonality.
231 MMU physical address variability
91
SPARC64 V supports both 41-bit and 43-bit physical address mode. The initial width of the physical address is controlled by OPSR.
232 DCU Control Register CP and CV bits
23, 91
SPARC64 V does not implement CP and CV bits in the DCU Control Register. See also impl. dep. #226.
233 TSB_Hash field
92
SPARC64 V does not implement TSB_Hash.
234 TLB replacement algorithm
93
For fTLB, SPARC64 V implements a pseudo-LRU. For sTLB, LRU is used.
235 TLB data access address assignment
94
The MMU TLB data-access address assignment and the purpose of the address are implementation dependent.
236 TSB_Size field width
97
In SPARC64 V, TSB_Size is 4 bits wide, occupying bits 3:0 of the TSB register. The maximum number of TSB entries is, therefore, 512 × 2^15 (16M entries).
237 DSFAR/DSFSR for JMPL/RETURN mem_address_not_aligned
89, 97
A mem_address_not_aligned exception that occurs during a JMPL or RETURN instruction does not update either the D-SFAR or D-SFSR register.
238 TLB page offset for large page sizes
87
On SPARC64 V, even for a large page, written data for TLB Data Register is preserved for bits representing an offset in a page, so the data previously written is returned regardless of the page size.
239 Register access by ASIs 55₁₆ and 5D₁₆
92
In SPARC64 V, VA<63:19> of IMMU ASI 55₁₆ and DMMU ASI 5D₁₆ are ignored. An access to virtual addresses 40000₁₆ to 60FF8₁₆ is treated as an access to 00000₁₆ to 20FF8₁₆.
TABLE C-1 SPARC64 V Implementation Dependencies (10 of 11)
Nbr SPARC64 V Implementation Notes Page
240 DCU Control Register bits 47:41
23
SPARC64 V uses bit 41 for WEAK_SPCA, which enables/disables memory access in speculative paths.
241 Address Masking and DSFAR
SPARC64 V writes zeroes to the more significant 32 bits of DSFAR.
242 TLB lock bit
87
In SPARC64 V, only the fITLB and the fDTLB support the lock bit. The lock bit in sITLB and sDTLB is read as 0 and writes to it are ignored.
243 Interrupt Vector Dispatch Status Register BUSY/NACK pairs
136
In SPARC64 V, 32 BUSY/NACK pairs are implemented in the Interrupt Vector Dispatch Status Register.
244 Data Watchpoint Reliability
24
No implementation-dependent features of SPARC64 V reduce the reliability of data watchpoints.
245 Call/Branch displacement encoding in I-Cache
24
In SPARC64 V, the least significant 11 bits (bits 10:0) of a CALL or branch (BPcc, FBPfcc, Bicc, BPr) instruction in an instruction cache are identical to the architectural encoding (as they appear in main memory).
246 VA<38:29> for Interrupt Vector Dispatch Register Access
136
SPARC64 V ignores all 10 bits of VA<38:29> when the Interrupt Vector Dispatch Register is written.
247 Interrupt Vector Receive Register SID fields
136
SPARC64 V obtains the interrupt source identifier SID_L from the UPA packet.
248 Conditions for fp_exception_other with unfinished_FPop
18
SPARC64 V triggers fp_exception_other with trap type unfinished_FPop under the standard conditions described in Commonality Section 5.1.7.
249 Data watchpoint for Partial Store instruction
57
Watchpoint exceptions on Partial Store instructions occur conservatively on SPARC64 V. The DCUCR Data Watchpoint masks are only checked for nonzero value (watchpoint enabled). The byte store mask (r[rs2]) in the Partial Store instruction is ignored, and a watchpoint exception can occur even if the mask is zero (that is, no store will take place).
250 PCR accessibility when PSTATE.PRIV = 0
20, 22, 58
In SPARC64 V, the accessibility of PCR when PSTATE.PRIV = 0 is determined by PCR.PRIV. If PSTATE.PRIV = 0 and PCR.PRIV = 1, an attempt to execute either RDPCR or WRPCR will cause a privileged_action exception. If PSTATE.PRIV = 0 and PCR.PRIV = 0, RDPCR operates without privilege violation and WRPCR generates a privileged_action exception only when an attempt is made to change (that is, write 1 to) PCR.PRIV.
251 Reserved.
TABLE C-1 SPARC64 V Implementation Dependencies (11 of 11)
Nbr SPARC64 V Implementation Notes Page
252 DCUCR.DC (Data Cache Enable)
24
SPARC64 V does not implement DCUCR.DC.
253 DCUCR.IC (Instruction Cache Enable)
24
SPARC64 V does not implement DCUCR.IC.
254 Means of exiting error_state
37, 146
The standard behavior of a SPARC64 V CPU upon entry into error_state is to reset itself by internally generating a watchdog_reset (WDR). However, OPSR can be set so that when error_state is entered, the processor remains halted in error_state instead of generating a watchdog_reset.
255 LDDFA with ASI E0₁₆ or E1₁₆ and misaligned destination register number
120
No exception is generated based on the destination register rd.
256 LDDFA with ASI E0₁₆ or E1₁₆ and misaligned memory address
120
For LDDFA with ASI E0₁₆ or E1₁₆ and a memory address aligned on a 2^n-byte boundary, a SPARC64 V processor behaves as follows:
n ≥ 3 (≥ 8-byte alignment): no exception related to memory address alignment is generated.
n = 2 (4-byte alignment): LDDF_mem_address_not_aligned exception is generated.
n ≤ 1 (≤ 2-byte alignment): mem_address_not_aligned exception is generated.
257 LDDFA with ASI C0₁₆–C5₁₆ or C8₁₆–CD₁₆ and misaligned memory address
120
For LDDFA with ASI C0₁₆–C5₁₆ or C8₁₆–CD₁₆ and a memory address aligned on a 2^n-byte boundary, a SPARC64 V processor behaves as follows:
n ≥ 3 (≥ 8-byte alignment): no exception related to memory address alignment is generated.
n = 2 (4-byte alignment): LDDF_mem_address_not_aligned exception is generated.
n ≤ 1 (≤ 2-byte alignment): mem_address_not_aligned exception is generated.
258 ASI_SERIAL_ID
119
SPARC64 V provides an identification code for each processor.
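The alignment rule stated for impl. dep. #256 and #257 can be illustrated with a minimal Python sketch. The function name and the use of strings for exception names are assumptions made for illustration only; they are not part of the SPARC64 V specification.

```python
def lddfa_alignment_exception(address):
    """Classify an LDDFA memory address per impl. dep. #256/#257.

    Hypothetical helper: returns None when no alignment exception is
    raised, otherwise the name of the exception as a string.
    """
    if address % 8 == 0:
        # n >= 3: aligned on 8 bytes or more, no alignment exception
        return None
    if address % 4 == 0:
        # n = 2: aligned on 4 bytes only
        return "LDDF_mem_address_not_aligned"
    # n <= 1: aligned on 2 bytes or less
    return "mem_address_not_aligned"
```

For example, an address ending in 4₁₆ falls in the n = 2 case, while any odd address falls in the n ≤ 1 case.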
F.APPENDIX D

Formal Specification of the Memory Models

Please refer to Appendix D of Commonality.
F.APPENDIX E

Opcode Maps

Please refer to Appendix E in Commonality. TABLE E-1 lists the opcode map for the SPARC64 V IMPDEP2 instruction.
TABLE E-1 IMPDEP2 (op = 2, op3 = 37₁₆)
size (instruction<6:5>) | var (instruction<8:7>) = 00 | 01 | 10 | 11
00 | (not used, reserved)
01 | FMADDs | FMSUBs | FNMSUBs | FNMADDs
10 | FMADDd | FMSUBd | FNMSUBd | FNMADDd
11 | (reserved for quad operations)
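The following Python sketch shows one way to decode the var and size fields of an IMPDEP2 instruction word. It assumes the standard SPARC format-3 positions for op (bits 31:30) and op3 (bits 24:19), and it assumes that var = 10 selects FNMSUB and var = 11 selects FNMADD; the function and table names are hypothetical.

```python
# Assumed ordering of operations by the var field (instruction<8:7>).
VAR_NAMES = ("FMADD", "FMSUB", "FNMSUB", "FNMADD")
# size field (instruction<6:5>): 01 = single, 10 = double; 00 and 11 are reserved.
SIZE_SUFFIX = {1: "s", 2: "d"}


def decode_impdep2(instruction):
    """Return a mnemonic, "reserved", or None if the word is not IMPDEP2."""
    op = (instruction >> 30) & 0x3
    op3 = (instruction >> 19) & 0x3F
    if op != 2 or op3 != 0x37:
        return None  # not an IMPDEP2 instruction
    var = (instruction >> 7) & 0x3
    size = (instruction >> 5) & 0x3
    suffix = SIZE_SUFFIX.get(size)
    if suffix is None:
        return "reserved"
    return VAR_NAMES[var] + suffix
```

Only the op, op3, var, and size fields are examined; register fields are ignored in this sketch.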
F.APPENDIX F

Memory Management Unit

The Memory Management Unit (MMU) architecture of SPARC64 V conforms to the MMU architecture defined in Appendix F of Commonality but with some model dependency. See Appendix F in Commonality for the basic definitions of the SPARC64 V MMU.
Section numbers in this appendix correspond to those in Appendix F of Commonality. Figures and tables, however, are numbered consecutively.
This appendix describes the implementation dependencies and other additional information about the SPARC64 V MMU. For SPARC64 V implementations, we first list the implementation dependency as given in TABLE C-1 of Commonality, then describe the SPARC64 V implementation.

F.1 Virtual Address Translation

IMPL. DEP. #222: TLB organization is JPS1 implementation dependent.
SPARC64 V has the following TLB organization:
Level-1 micro ITLB (uITLB), 32-way fully associative
Level-1 micro DTLB (uDTLB), 32-way fully associative
Level-2 IMMU-TLB consists of sITLB (set-associative Instruction TLB) and fITLB (fully associative Instruction TLB).
Level-2 DMMU-TLB consists of sDTLB (set-associative Data TLB) and fDTLB (fully associative Data TLB).
TABLE F-1 shows the organization of SPARC64 V TLBs.
Hardware contains micro-ITLB and micro-DTLB as the temporary memory of the main TLBs, as shown in TABLE F-1. In contrast to the micro-TLBs, sTLB and fTLB are called main TLBs.
The micro-TLBs are coherent to main TLBs and are not visible to software, with the exception of TLB multiple hit detection. Hardware maintains the consistency between micro-TLBs and main TLBs.
No other details on micro-TLB are provided because software cannot execute direct operations to micro-TLB and its configuration is invisible to software.
TABLE F-1 Organization of SPARC64 V TLBs
Feature | sITLB and sDTLB | fITLB and fDTLB
Entries | 2048 | 32
Associativity | 2-way set associative | Fully associative
Page size supported | 8 KB/4 MB | 8 KB/64 KB/512 KB/4 MB
Locked translation entry | Not supported | Supported
Unlocked translation entry | Supported | Supported
IMPL. DEP. #223: Whether TLB multiple-hit detections are supported in JPS1 is implementation dependent.
On SPARC64 V, TLB multiple hit detection is supported. However, the multiple hit is not detected at every TLB reference. When the micro-TLB (uTLB), which is the cache of sTLB and fTLB, matches the virtual address, the multiple hit in sTLB and fTLB is not detected. The multiple hit is detected only when the micro-TLB mismatches and the main TLB is referenced.

F.2 Translation Table Entry (TTE)

IMPL. DEP. in Commonality TABLE F-1: TTE_Data bits 46–43 are implementation dependent.
On SPARC64 V, TTE_Data bits 46:43 are reserved.
IMPL. DEP. #224: Physical address width support by the MMU is implementation dependent in JPS1; minimum PA width is 43 bits.
The SPARC64 V MMU implements 43-bit physical addresses. The PA field of the TTE holds a 43-bit physical address. The MMU translates virtual addresses into 43-bit physical addresses. Each cache tag holds bits 42:6 of physical addresses. Bits 46:43 of each TTE always read as 0 and writes to them are ignored.
A cacheable access for a physical address ≥ 400 0000 0000₁₆ always causes a cache miss in the U2 cache and generates a UPA request for the cacheable access. The urgent error ASI_UGESR.SDC is signalled after the UPA cacheable access is requested.
The physical address length to be passed to the UPA interface is 41 bits or 43 bits, as designated in the ASI_UPA_CONFIG.AM field. When the 41-bit PA is specified in ASI_UPA_CONFIG.AM, the most significant 2 bits of the CPU internal physical address are discarded and only the remaining least significant 41 bits are passed to the UPA address bus. If the discarded most significant 2 bits are not 0, the urgent error ASI_UGESR.SDC is detected after the invalid address transfer to the UPA interface. Otherwise, when the 43-bit PA is specified in ASI_UPA_CONFIG.AM, the entire 43 bits of CPU internal physical address are passed to the UPA address bus.
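The address narrowing described above can be sketched in Python as follows. This is an illustrative model only: the function name is hypothetical, and the boolean parameter stands in for the ASI_UPA_CONFIG.AM setting, whose actual encoding is not given in this section.

```python
INTERNAL_PA_BITS = 43  # width of the CPU internal physical address
UPA_NARROW_BITS = 41   # width passed to UPA in 41-bit mode


def upa_address(pa, am_41bit):
    """Return (address passed to the UPA bus, urgent-error flag).

    am_41bit is a hypothetical stand-in for ASI_UPA_CONFIG.AM selecting
    the 41-bit mode. The flag models detection of ASI_UGESR.SDC when
    nonzero upper bits are discarded.
    """
    if pa < 0 or pa >> INTERNAL_PA_BITS:
        raise ValueError("physical address must fit in 43 bits")
    if am_41bit:
        discarded = pa >> UPA_NARROW_BITS          # most significant 2 bits
        passed = pa & ((1 << UPA_NARROW_BITS) - 1)  # least significant 41 bits
        return passed, discarded != 0
    return pa, False
```

In this model the truncated address is still passed to the bus when the discarded bits are nonzero, matching the statement that the error is detected after the invalid address transfer.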
IMPL. DEP. #238: When page offset bits for larger page size (PA<15:13>, PA<18:13>, and PA<21:13> for 64-Kbyte, 512-Kbyte, and 4-Mbyte pages, respectively) are stored in the TLB, it is implementation dependent whether the data returned from those fields by a Data Access read are zero or the data previously written to them.
On SPARC64 V, the data returned from PA<15:13>, PA<18:13>, and PA<21:13> for 64-Kbyte, 512-Kbyte, and 4-Mbyte pages, respectively, by a Data Access read are the data previously written to them.
IMPL. DEP. #225: The mechanism by which entries in TLB are locked is implementation dependent in JPS1.
In SPARC64 V, when a TTE with its lock bit set is written into TLB through the Data In register, the TTE is automatically written into the corresponding fully associative TLB and locked in the TLB. Otherwise, the TTE is written into the corresponding sTLB or fTLB, depending on its page size.
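A possible reading of this rule, combined with the page sizes listed in TABLE F-1, is sketched below in Python. The text states only that the destination of an unlocked TTE depends on its page size; the assumption here is that page sizes supported by the sTLB (8 KB and 4 MB) go to the sTLB and the remaining sizes go to the fTLB. The function and constant names are hypothetical.

```python
# Page sizes from TABLE F-1.
STLB_PAGE_SIZES = {"8K", "4M"}
FTLB_PAGE_SIZES = {"8K", "64K", "512K", "4M"}


def data_in_target_tlb(page_size, locked):
    """Pick the main TLB that receives a TTE written through Data In.

    Assumption (not stated in the text): an unlocked TTE goes to the
    sTLB whenever the sTLB supports its page size.
    """
    if page_size not in FTLB_PAGE_SIZES:
        raise ValueError("unsupported page size: %r" % (page_size,))
    if locked:
        return "fTLB"  # locked entries always go to the fully associative TLB
    return "sTLB" if page_size in STLB_PAGE_SIZES else "fTLB"
```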
IMPL. DEP. #242: An implementation containing multiple TLBs may implement the L (lock) bit in all TLBs but is only required to implement a lock bit in one TLB for each page size. If the lock bit is not implemented in a particular TLB, it is read as 0 and writes to it are ignored.
In SPARC64 V, only the fITLB and the fDTLB support the lock bit as described in TABLE F-1. The lock bit in sITLB and sDTLB is read as 0 and writes to it are ignored.
IMPL. DEP. #226: Whether the CV bit is supported in TTE is implementation dependent in JPS1. When the CV bit in TTE is not provided and the implementation has virtually indexed caches, the implementation should support hardware unaliasing for the caches.
In SPARC64 V, no TLB supports the CV bit in TTE. SPARC64 V supports hardware unaliasing for the caches. The CV bit in any TLB entry is read as 0 and writes to it are ignored.

F.3.3 TSB Organization

IMPL. DEP. #227: The maximum number of entries in a TSB is implementation dependent in JPS1. See impl. dep. #228 for the limitation of TSB_size in TSB registers.
SPARC64 V supports a maximum of 16 million lines in the common TSB and a maximum of 32 million lines in the split TSB. The maximum number N in FIGURE F-4 of Commonality is 16 million (16 × 2^20).
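The limits above follow from the 4-bit TSB_Size field: a common TSB holds 512 × 2^TSB_Size entries, and a split TSB holds twice that. The Python sketch below works through the arithmetic; the function name is hypothetical, and the factor of two for the split TSB is an inference from the 16 million and 32 million figures quoted in the text.

```python
def tsb_entries(tsb_size, split=False):
    """Number of TSB entries for a given TSB_Size value (0 to 15)."""
    if not 0 <= tsb_size <= 15:
        raise ValueError("TSB_Size is a 4-bit field (0 to 15)")
    entries = 512 << tsb_size  # 512 * 2**tsb_size
    # Assumed: a split TSB has separate 8K and 64K halves of equal size.
    return 2 * entries if split else entries
```

With TSB_Size = 15 this gives 512 × 2^15 = 16 × 2^20 entries for the common TSB and 32 × 2^20 for the split TSB.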

F.4.2 TSB Pointer Formation

IMPL. DEP. #228: Whether TSB_Hash is supplied from a TSB Extension Register or from a context-ID register is implementation dependent in JPS1. Only for cases of direct hash with context-ID can the width of the TSB_size field be wider than 3 bits.
On SPARC64 V, TSB_Hash is supplied from a context-ID register. The width of the TSB_size field is 4 bits.
IMPL. DEP. #229: Whether the implementation generates the TSB Base address by exclusive-ORing the TSB Base Register and a TSB Extension Register or by taking the TSB_Base field directly from the TSB Extension Register is implementation dependent in JPS1. This implementation dependency is only to maintain compatibility with the TLB miss handling software of UltraSPARC I/II.
On SPARC64 V, when ASI_MCNTL.JPS1_TSBP = 1, the TSB Base address is generated by taking the TSB_Base field directly from the TSB Extension Register.
TSB Pointer Formation
On SPARC64 V, the number N in the following equations ranges from 0 to 15; N is defined to be the TSB_Size field of the TSB Base or TSB Extension Register.
SPARC64 V supports the TSB Base from TSB Extension Registers as follows when ASI_MCNTL.JPS1_TSBP = 1.
For a shared TSB (TSB Register split field = 0):
8K_POINTER = TSB_Extension[63:13+N] ∥ (VA[21+N:13] ⊕ TSB_Hash) ∥ 0000
64K_POINTER = TSB_Extension[63:13+N] ∥ (VA[24+N:16] ⊕ TSB_Hash) ∥ 0000
For a split TSB (TSB Register split field = 1):
8K_POINTER = TSB_Extension[63:14+N] ∥ 0 ∥ (VA[21+N:13] ⊕ TSB_Hash) ∥ 0000
64K_POINTER = TSB_Extension[63:14+N] ∥ 1 ∥ (VA[24+N:16] ⊕ TSB_Hash) ∥ 0000
Value of TSB_Hash for both a shared TSB and a split TSB
When 0 <= N <= 4,
TSB_Hash = context_register[N+8:0]
Otherwise, when 5 <= N <= 15,
TSB_Hash[12:0] = context_register[12:0]
TSB_Hash[N+8:13] = 0 (N-4 bits zero)
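The pointer equations above can be modelled with a short Python sketch, given here as an illustration under stated assumptions rather than a description of the hardware. The terms in each equation are read as concatenated bit fields, with the four low-order bits zero. All function and parameter names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def bit_field(value, hi, lo):
    """Extract bits hi:lo (inclusive) of value."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def tsb_hash(context, n):
    """TSB_Hash: context_register[N+8:0], limited to 13 bits when N >= 5."""
    width = min(n + 9, 13)
    return context & ((1 << width) - 1)


def tsb_pointer(tsb_ext, va, context, n, split=False, page_64k=False):
    """Form the 8K or 64K TSB pointer from a TSB Extension Register value."""
    if not 0 <= n <= 15:
        raise ValueError("N (TSB_Size) must be in 0 to 15")
    va_lo = 16 if page_64k else 13
    # VA[21+N:13] for 8K pages, VA[24+N:16] for 64K pages: N+9 bits each.
    index = bit_field(va, va_lo + 8 + n, va_lo) ^ tsb_hash(context, n)
    # Upper bits come straight from the TSB Extension Register.
    base_lo = (14 if split else 13) + n
    base = (tsb_ext >> base_lo) << base_lo
    # In a split TSB, bit 13+N selects the 8K (0) or 64K (1) half.
    select = (1 << (13 + n)) if (split and page_64k) else 0
    return (base | select | (index << 4)) & MASK64
```

For example, with N = 0, a TSB Extension value of 10000₁₆, VA = 2000₁₆, and a context of 0, the shared 8K pointer evaluates to 10010₁₆ in this model.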

F.5 Faults and Traps

IMPL. DEP. #230: The cause of a data_access_exception trap is implementation dependent in JPS1, but there are several mandatory causes of data_access_exception trap.
SPARC64 V signals a data_access_exception for the causes, as defined in F.5 in Commonality. However, caution is needed to deal with an invalid ASI. See Section F.10.9 for details.
IMPL. DEP. #237: Whether the fault status and/or address (DSFSR/DSFAR) are captured when mem_address_not_aligned is generated during a JMPL or RETURN instruction is implementation dependent.
On SPARC64 V, the fault status and address (DSFSR/DSFAR) are not captured when a mem_address_not_aligned exception is generated during a JMPL or RETURN instruction.
Additional information: On SPARC64 V, the two precise traps instruction_access_error and data_access_error are recorded by the MMU in addition to those in TABLE F-2 of Commonality. A modification (the two traps are added) of that table is shown below.
TABLE F-2 MMU Trap Types, Causes, and Stored State Register Update Policy
Ref # | Trap Name | Trap Cause | Registers Updated (Stored State in MMU): I-SFSR | I-MMU Tag Access | D-SFSR, SFAR | D-MMU Tag Access | Trap Type
1. | fast_instruction_access_MMU_miss | I-TLB miss | X² | X | | | 64₁₆–67₁₆