For all product questions and inquiries contact a Cirrus Logic Sales Representative.
To find one nearest you go to www.cirrus.com
IMPORTANT NOTICE
“Preliminary” product information describes products that are in production, but for which full characterization data is not yet available. Cirrus Logic, Inc. and
its subsidiaries (“Cirrus”) believe that the information contained in this document is accurate and reliable. However, the information is subject to change without
notice and is provided “AS IS” without warranty of any kind (express or implied). Customers are advised to obtain the latest version of relevant information to
verify, before placing orders, that information being relied on is current and complete. All products are sold subject to the terms and conditions of sale supplied
at the time of order acknowledgment, including those pertaining to warranty, patent infringement, and limitation of liability. No responsibility is assumed by Cirrus
for the use of this information, including use of this information as the basis for manufacture or sale of any items, or for infringement of patents or other rights
of third parties. This document is the property of Cirrus and by furnishing this information, Cirrus grants no license, express or implied under any patents, mask
work rights, copyrights, trademarks, trade secrets or other intellectual property rights. Cirrus owns the copyrights associated with the information contained
herein and gives consent for copies to be made of the information only for use within your organization with respect to Cirrus integrated circuits or other products
of Cirrus. This consent does not extend to other copying such as copying for general distribution, advertising or promotional purposes, or for creating any work
for resale.
CERTAIN APPLICATIONS USING SEMICONDUCTOR PRODUCTS MAY INVOLVE POTENTIAL RISKS OF DEATH, PERSONAL INJURY, OR SEVERE
PROPERTY OR ENVIRONMENTAL DAMAGE (“CRITICAL APPLICATIONS”). CIRRUS PRODUCTS ARE NOT DESIGNED, AUTHORIZED OR WARRANTED FOR USE IN AIRCRAFT SYSTEMS, MILITARY APPLICATIONS, PRODUCTS SURGICALLY IMPLANTED INTO THE BODY, LIFE SUPPORT PRODUCTS OR OTHER CRITICAL APPLICATIONS (INCLUDING MEDICAL DEVICES, AIRCRAFT SYSTEMS OR COMPONENTS AND PERSONAL OR
AUTOMOTIVE SAFETY OR SECURITY DEVICES). INCLUSION OF CIRRUS PRODUCTS IN SUCH APPLICATIONS IS UNDERSTOOD TO BE FULLY AT
THE CUSTOMER'S RISK AND CIRRUS DISCLAIMS AND MAKES NO WARRANTY, EXPRESS, STATUTORY OR IMPLIED, INCLUDING THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR PARTICULAR PURPOSE, WITH REGARD TO ANY CIRRUS PRODUCT THAT IS USED IN
SUCH A MANNER. IF THE CUSTOMER OR CUSTOMER'S CUSTOMER USES OR PERMITS THE USE OF CIRRUS PRODUCTS IN CRITICAL APPLICATIONS, CUSTOMER AGREES, BY SUCH USE, TO FULLY INDEMNIFY CIRRUS, ITS OFFICERS, DIRECTORS, EMPLOYEES, DISTRIBUTORS AND OTHER AGENTS FROM ANY AND ALL LIABILITY, INCLUDING ATTORNEYS' FEES AND COSTS, THAT MAY RESULT FROM OR ARISE IN CONNECTION
WITH THESE USES.
Cirrus Logic, Cirrus, MaverickCrunch, MaverickKey, and the Cirrus Logic logo designs are trademarks of Cirrus Logic, Inc. All other brand and product names
in this document may be trademarks or service marks of their respective owners.
Microsoft and Windows are registered trademarks of Microsoft Corporation.
Microwire™ is a trademark of National Semiconductor Corp. National Semiconductor is a registered trademark of National Semiconductor Corp.
Texas Instruments is a registered trademark of Texas Instruments, Inc.
Motorola is a registered trademark of Motorola, Inc.
LINUX is a registered trademark of Linus Torvalds.
EP9315 User’s Manual - DS638UM1
Copyright 2004 Cirrus Logic
About the EP9315 User’s Guide
This Guide describes the architecture, hardware, and operation of the Cirrus
Logic EP9315. It is intended to be used in conjunction with the EP9315
Datasheet, which contains the full electrical specifications for the device.
How to Use this Guide
Subject Matter                     Location
AC’97                              Chapter 22 - AC’97 Controller
ARM920T Processor                  Chapter 2 - ARM920T Core and Advanced High-Speed Bus (AHB)
Boot ROM, Hardware and Software    Chapter 4 - Boot ROM
Booting From SROM or SyncFlash     Chapter 13 - SDRAM, SyncROM, and SyncFLASH Controller
                                   Chapter 7 - Raster Engine With Analog/LCD Integrated Timing and Interface
                                   Chapter 14 - UART1 With HDLC and Modem Control Signals
                                   Chapter 15 - UART2
                                   Chapter 16 - UART3 With HDLC Encoder
Related Documents from Cirrus Logic
1. EP9315 Data Sheet, Document Number - DS638PP1
Reference Documents
1. ARM920T Technical Reference Manual
2. AMBA Specification (Rev. 2.0), ARM IHI 0011A, ARM Limited.
3. AHB Example AMBA System (Addendum 01), ARM DDI 0170A, ARM
Limited.
4. The coprocessor instruction assembler notation can be referenced from
ARM programming manuals or the Quick Reference Card, document
number ARM QRC 0001D.
5. The MAC engine is compliant with the requirements of ISO/IEC 8802-3
(1993), Sections 3 and 4.
6. OpenHCI - Open Host Controller Interface Specification for USB, Release
1.0a; Compaq, Microsoft, National Semiconductor.
7. ARM Coprocessor Quick Reference Card, document number ARM QRC
0001D.
8. Information Technology, AT Attachment with Packet Interface - 5
(ATA/ATAPI-5) ANSI NCITS document T13 1321D, Revision 3, 29
February 2000
9. OpenHCI - Open Host Controller Interface Specification for USB,
Release: 1.0a, Released - 09/14/99 2:33 PM
10. ARM PrimeCell PL190-Rel1v1 Revision 1.7 Technical Reference Manual
DDI0181C
11. Audio Codec ‘97, Revision 2.3, April 2002, Intel Corporation
Notational Conventions
This document uses the following conventions:
• Internal and external Signal Names, and Pin Names use mixed upper and
lower case alphanumeric, and are shown in bold font: RDLED.
• Register Bit Fields are named using upper and lower case alphanumeric,
for example, SBOOT, LCSn1.
• Registers are named using mixed upper and lower case alphanumeric,
for example, SysCfg or PxDDR. (Where there are multiple registers with
similar names, a lower case “x” is used as a placeholder. For example, in the
PxDDR registers, x represents a letter between A and H, indicating the
specific port being discussed.)
Caution: In the Internal Register Map in Table 2-7 on page 7-53, some
memory locations are listed as Reserved. These memory
locations should not be used. Reading from these memory
locations will yield invalid data. Writing to these memory locations
may cause unpredictable results.
(An example register description is shown below. This description is used for
the following examples.)
1.1 Introduction
The EP9315 is a highly integrated system-on-chip processor that paves the
way for a multitude of next-generation consumer and industrial electronic
products. Designers of digital media servers and jukeboxes, telematic control
systems, thin clients, set-top boxes, point-of-sale terminals, industrial controls,
biometric security systems, and GPS devices will benefit from the EP9315’s
integrated architecture and advanced features. In fact, with amazingly agile
performance provided by a 200 MHz ARM920T processor, and featuring an
incredibly wide breadth of peripheral interfaces, the EP9315 is well suited to
an even broader range of high volume applications. Furthermore, by enabling
or disabling the EP9315’s peripheral interfaces, designers can reduce
development costs and accelerate time-to-market by creating a single platform
that can be easily modified to deliver a variety of differentiated end products.
The EP9315 features an advanced ARM920T processor design with an MMU
that supports Linux®, Windows® CE, and many other embedded operating
systems. The ARM920T’s 32-bit microcontroller architecture, with a five-stage
pipeline, delivers impressive performance at very low power. The included 16
KByte instruction cache and 16 KByte data cache provide zero-cycle latency
to the current program and data, or can be locked to provide guaranteed
no-latency access to critical instructions and data. For applications with
instruction memory size restrictions, the ARM920T’s compressed Thumb
instruction set provides a space-efficient design that maximizes external
instruction memory usage.
1.4.2 MaverickCrunch™ Coprocessor for Ultra-Fast Math Processing
The MaverickCrunch coprocessor is an advanced, mixed-mode math
coprocessor that greatly accelerates the single and double-precision integer
and floating-point processing capabilities of the ARM920T processor core.
The engine simplifies the end-user’s programming task by using predefined
coprocessor instructions, by utilizing standard ARM compiler tools, and by
requiring just one debugger session for the entire system. Furthermore, the
integrated design provides a single instruction stream and the advantage of
zero latency for cached instructions. To emulate this capability, competitors’
solutions add a DSP to the system, which requires separate
compiler/linker/debugger tool sets. This additional DSP requires programmers
to write two separate programs and debug them simultaneously, which can
result in frustration and costly delays.
The single-cycle integer multiply-accumulate instruction in the
MaverickCrunch coprocessor allows the EP9315 to offer unique speed and
performance while encoding digital audio and video formats, processing data
via Ethernet, and performing other math-intensive computing and
data-processing functions in consumer and industrial electronics.
1.4.3 MaverickKey™ Unique ID Secures Digital Content and OEM
Designs
MaverickKey unique hardware programmed IDs provide an excellent solution
to the growing concern over secure Web content and commerce. With Internet
security playing an important role in the delivery of digital media such as books
or music, traditional software methods are quickly becoming unreliable. The
MaverickKey unique IDs provide OEMs with a method of utilizing specific
hardware IDs for DRM (Digital Rights Management) mechanisms.
MaverickKey uses a specific 32-bit ID and a 128-bit random ID that are
programmed into the EP9315 through the use of laser probing technology.
These IDs can then be used to match secure copyrighted content with the ID
of the target device that the EP9315 is powering, and then deliver the
copyrighted information over a secure connection. In addition, secure
transactions can benefit by matching device IDs to server IDs.
MaverickKey IDs can also be used by OEMs and design houses to protect
against design piracy by presetting ranges for unique IDs. For more
information on securing your design using MaverickKey, please contact your
Cirrus Logic sales representative.
1.4.4 Integrated Three-port USB 2.0 Full Speed Host with Transceivers
The EP9315 integrates three USB 2.0 Full Speed host ports. Fully compliant
to the OHCI USB 2.0 Full Speed specification (12 Mbps), the host ports can be
used to provide connections to a number of external devices including mass
storage devices, external portable devices such as audio players or cameras,
printers, or USB hubs. Naturally, the three-port USB host also supports the
USB 2.0 Low Speed standard. This provides the opportunity to create a wide
array of flexible system configurations.
1.4.5 Integrated Ethernet MAC Reduces BOM Costs
The EP9315 integrates a 1/10/100 Mbps Ethernet Media Access Controller
(MAC) on the device. With a simple connection to an MII-based external PHY,
an EP9315-based system has easy, high-performance, cost-effective Internet
capability.
1.4.6 8x8 Keypad Interface Reduces BOM Costs
The keypad circuitry scans an 8x8 array of 64 normally open, single pole
switches. Any one or two keys depressed will be de-bounced and decoded.
An interrupt is generated whenever a stable set of depressed keys is detected.
If the keypad is not utilized, the 16 column/row pins may be used as
general-purpose I/Os.
The processor includes a 16 KByte boot ROM to set up standard
configurations. Optionally, the processor may be booted from FLASH memory,
over the SPI serial interface, or through the UART. This boot flexibility makes it
easy to design user-controlled, field-upgradable systems. See Chapter 4 on
page 121, for additional details.
1.4.8 Abundant General Purpose I/Os Build Flexible Systems
The EP9315 includes both enhanced and standard general-purpose I/O pins
(GPIOs). The 16 different enhanced GPIOs may individually be configured as
inputs, outputs, or interrupt-enabled inputs. There are an additional 31
standard GPIOs that may individually be used as inputs, outputs, or
open-drain pins. The standard GPIOs are multiplexed with peripheral function pins,
so the number available depends on the utilization of peripherals. Together,
the enhanced and standard GPIOs facilitate easy system design with external
peripherals not integrated on the EP9315.
1.4.9 General-Purpose Memory Interface (SDRAM, SRAM, ROM and
FLASH)
The EP9315 features a unified memory address model in which all memory
devices are accessed over a common address/data bus. A separate internal
bus is dedicated to the read-only Raster/Display refresh engine, while the rest
of the memory accesses are performed via the high-speed processor bus. The
SRAM memory controller supports 8, 16 and 32-bit devices and
accommodates an internal boot ROM concurrently with a 32-bit SDRAM
memory.
1.4.10 12-Bit Analog-to-Digital Converter (ADC) Provides an Integrated
Touch-Screen Interface or General ADC Functionality
The EP9315 includes a 12-bit ADC, which can be utilized either as a
touch-screen interface or for general ADC functionality. The touch-screen interface
performs all sampling, averaging, ADC range checking, and control for a wide
variety of analog-resistive touchscreens. To improve system performance, the
controller only interrupts the processor when a meaningful change occurs.
The touch-screen hardware may be disabled, and the switch matrix and ADC
controlled directly for general ADC usage if desired.
1.4.11 Graphics Accelerator
The EP9315 includes a hardware graphics acceleration engine that improves
graphic performance by handling block copy, block fill and hardware line draw
operations. The graphics accelerator is used in the system to off load graphics
operations from the processor.
1.4.12 PCMCIA Interface
The PCMCIA interface supports one 16-bit PCMCIA PC Card. These devices
are credit card sized peripherals that add memory, mass storage and I/O
capabilities to computer systems, and can be used to further broaden the
options of a designer’s platform.
Chapter 2
ARM920T Core and Advanced High-Speed Bus (AHB)
2.1 Introduction
This section discusses the ARM920T processor core and the Advanced
High-Speed Bus (AHB).
2.2 Overview: ARM920T Processor Core
The ARM920T is a Harvard architecture processor core with separate
16 kbyte instruction and data caches with an 8-word line length used in the
EP9315. The processor core utilizes a five-stage pipeline consisting of fetch,
decode, execute, data memory access, and write stages.
2.2.1 Features
Key features include:
• ARM V4T (32-bit) and Thumb (16-bit compressed) instruction sets
• 32-bit Advanced Micro-Controller Bus Architecture (AMBA)
• 16 kbyte Instruction Cache with lockdown
• 16 kbyte Data Cache (programmable write-through or write-back) with
lockdown
• Write Buffer
• MMU for Microsoft Windows CE and Linux operating systems
• Translation Look-aside Buffers (TLB) with 64 Data and 64 Instruction
Entries
• Programmable Page Sizes of 64 kbyte, 4 kbyte, and 1 kbyte
The ARM920T core follows a Harvard architecture and consists of an
ARM9TDMI core, MMU, instruction and data cache. The core supports both
the 32-bit ARM and 16-bit Thumb instruction sets.
The internal bus structure (AMBA) includes both an internal high-speed bus and
an external low-speed bus. The high-speed AHB (Advanced High-performance
Bus) uses a high-speed internal bus clock to synchronize the
coprocessor, MMU, cache, DMA controller, and memory modules. AMBA
includes an AHB/APB bridge to the lower-speed APB (Advanced Peripheral
Bus). The APB connects to lower-speed peripheral devices such as
UARTs and GPIOs.
[Figure: block diagram. Labels: JTAG, ARM9TDMI Processor core (Integral EmbeddedICE), R13, Data cache, Data MMU, CP15, Write Buffer, Write Back PA TAG RAM, AMBA Bus Int., APB]
The MMU provides address translation for all memory and peripherals, and
can be used to remap memory devices and peripheral address locations.
Sections, large pages, small pages, and tiny pages are programmable to map
memory in 1 Mbyte, 64 kbyte, 4 kbyte, and 1 kbyte blocks, respectively. To
increase system performance, a 64-entry translation look-aside buffer caches
up to 64 address translations, so a TLB miss occurs only for addresses
outside those entries.
A 16 kbyte instruction and a 16 kbyte data cache are included to increase
performance for cache-enabled memory regions. The 64-way associative
cache also has lock-down capability. Writes pass through a write buffer that
holds 16 data words and 4 addresses, which allows cached
instructions to be fetched and decoded while the write buffer sends the
information to the external bus.
The ARM920T core supports a number of coprocessors, including the
MaverickCrunch coprocessor by means of a specific pipeline architecture
interface.
2.2.3.1 ARM9TDMI Core
ARM9TDMI core is responsible for executing both 32-bit ARM and 16-bit
Thumb instructions. Each provides a unique advantage to a system design.
Internally, the instructions enter a 5-stage pipeline. These stages are:
• Instruction Fetch
• Instruction Decode
• Execute
• Data Memory Access
• Register Write
All instructions are fully interlocked. This mechanism delays the execution
stage of an instruction if data for that instruction comes from a previous
instruction and is not yet available. This ensures that software will
function identically across different implementations.
For memory access instructions, the base register used for the access will be
restored by the processor in the event of an Abort exception. The base
register will be restored to the value contained in the processor register before
execution of the instruction.
The ARM9TDMI core memory interface includes separate instruction and
data interfaces to allow concurrent access to instructions and data, which
reduces the CPI (cycles per instruction). Both interfaces use pipelined
addressing. The core can operate in big-endian or little-endian mode.
Endianness affects both the address and the data interfaces.
The memory interface executes four types of memory transfers: sequential,
non-sequential, internal, and coprocessor. It also supports unidirectional and
bidirectional transfer modes.
The core provides a debug interface called JTAG (Joint Test Action Group).
This interface provides debug capability with five external control signals.
There are six scan chains (0 through 5) in the ARM9TDMI controlled by the
JTAG Test Access Port (TAP) controller. Details on the individual scan chain
function and bit order can be found in the ARM920T Technical Reference
Manual.
2.2.3.2 Memory Management Unit
The MMU provides the translation and access permissions for the address
and data ports for the ARM9TDMI core. The MMU is controlled by page tables
stored in system memory and accessed using the CP15 register 1. The main
features of the MMU are as follows:
• Address Translation
•Access Permissions and Domains
• MMU Cache and Write Buffer Access
2.2.3.2.1 Address Translation
The virtual address from the ARM920T core is modified internally by CP15
register 13 (R13) to create a modified virtual address. The MMU then
translates the modified virtual address from R13, using CP15 register 3, into
a physical address to access external memory or a device. The MMU looks up
the physical address through the Translation Table Base (TTB) in system
memory. It also updates the TLB cache.
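The first two steps of this translation can be sketched in C. This is a minimal illustration, not EP9315-specific code: it assumes the usual ARM920T behavior documented in the ARM920T Technical Reference Manual, where a 7-bit process ID held in CP15 register 13 replaces address bits [31:25] when those bits are zero, and where bits [31:20] of the modified virtual address index a 16 kbyte aligned first-level table. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: form the modified virtual address (MVA).
 * Assumption: if VA[31:25] is zero, the 7-bit process ID from
 * CP15 register 13 is placed in those bits; otherwise the VA is
 * used unchanged. */
uint32_t modified_virtual_address(uint32_t va, uint32_t pid7)
{
    assert(pid7 < 128u);              /* process ID is 7 bits wide */
    if ((va >> 25) == 0u) {
        return va | (pid7 << 25);
    }
    return va;
}

/* Hypothetical helper: address of the first-level descriptor.
 * Assumption: the table base is 16 kbyte aligned and MVA[31:20]
 * selects one of 4096 word-sized entries. */
uint32_t first_level_descriptor_address(uint32_t ttb, uint32_t mva)
{
    return (ttb & 0xFFFFC000u) | ((mva >> 20) << 2);
}
```

For example, with a process ID of 1, the virtual address 0x00001000 becomes the modified virtual address 0x02001000, while 0x80001000 is left unchanged.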
The TLB consists of two 64-entry caches, one for data and one for
instructions. If the physical address for the current virtual address is not
found in the TLB (a miss), the processor goes to external memory and uses the
TTB to locate the translation table in system memory. The internal
translation table walk hardware steps through the page tables set up in
external memory to find the appropriate physical address.
When the physical address is acquired, the TLB is updated. When the address
is found in the TLB, system performance increases, since a miss takes
additional cycles to access memory and update the TLB.
Translation of system memory is done by breaking up the memory into
different size blocks called sections, large pages, small pages, and tiny pages.
System memory and registers can be remapped by the MMU. The block sizes
are as follows:
• Section - 1 Mbyte
• Large Page - 64 kbyte
• Small Page - 4 kbyte
• Tiny Page - 1 kbyte
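As a small worked example of what these block sizes mean for an address, the sketch below splits an address into the base of its block and the offset within it. It uses the 4 kbyte small page size given in the Features list of Section 2.2.1. The macro and function names are hypothetical and are not part of any Cirrus or ARM header.

```c
#include <assert.h>
#include <stdint.h>

/* Block sizes in bytes (names are illustrative only). */
#define MMU_SECTION_BYTES     (1024u * 1024u)  /* 1 Mbyte  */
#define MMU_LARGE_PAGE_BYTES  (64u * 1024u)    /* 64 kbyte */
#define MMU_SMALL_PAGE_BYTES  (4u * 1024u)     /* 4 kbyte  */
#define MMU_TINY_PAGE_BYTES   (1u * 1024u)     /* 1 kbyte  */

/* Base address of the block that contains addr. */
uint32_t block_base(uint32_t addr, uint32_t block_bytes)
{
    /* Every block size above is a power of two. */
    assert(block_bytes != 0u && (block_bytes & (block_bytes - 1u)) == 0u);
    return addr & ~(block_bytes - 1u);
}

/* Offset of addr within its block. */
uint32_t block_offset(uint32_t addr, uint32_t block_bytes)
{
    assert(block_bytes != 0u && (block_bytes & (block_bytes - 1u)) == 0u);
    return addr & (block_bytes - 1u);
}
```

For the address 0x12345678, a section mapping gives base 0x12300000 and offset 0x45678, while a tiny page mapping gives base 0x12345400 and offset 0x278.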
2.2.3.2.2 Access Permission and Domains
Access to any section or page of memory is dependent on its domain. The
page table in external memory also contains access permissions for all
subdivisions of external memory. Access to specific instructions or data has three
possible states, assuming access is permitted:
• Client: Access permissions based on the section or page table descriptor
• Manager: Ignore access permissions in the section or page table
descriptor
• No access: Any attempted access generates a domain fault
2.2.3.2.3 MMU Enable
Enabling the MMU allows for system memory control, but is also required if
the data cache and the write buffer are to be used. These features are
enabled for specific memory regions, as defined in the system page table.
MMU enable is done via CP15 register 1. The procedure is as follows:
1. Program the Translation Table Base (TTB) and domain access control
registers.
2. Create level 1 and level 2 pages for the system and enable the cache and
the write buffer.
3. Enable MMU - bit 0 of CP15 register 1.
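The register values involved in the MMU enable procedure can be sketched as plain bit manipulation in C. This is an illustration under stated assumptions, not a complete enable sequence: the actual register writes use MCR instructions to coprocessor 15 and are not shown. Bit 0 (MMU), bit 12 (instruction cache) and bit 14 (RR) of CP15 register 1 are named in this chapter. The data cache enable at bit 2 and the two-bit domain encodings (00 no access, 01 client, 11 manager) are taken from the ARM920T Technical Reference Manual and should be checked there. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* CP15 register 1 (control) bits. */
#define CP15_CTRL_MMU     (1u << 0)   /* MMU enable */
#define CP15_CTRL_DCACHE  (1u << 2)   /* assumption: data cache enable */
#define CP15_CTRL_ICACHE  (1u << 12)  /* instruction cache enable */
#define CP15_CTRL_RR      (1u << 14)  /* round robin replacement */

/* Assumed two-bit domain access encodings (16 domains, 2 bits each). */
#define DOMAIN_NO_ACCESS  0u
#define DOMAIN_CLIENT     1u
#define DOMAIN_MANAGER    3u

/* Return a domain access control value with one domain field replaced. */
uint32_t domain_access_set(uint32_t dacr, unsigned domain, uint32_t access)
{
    unsigned shift;

    assert(domain < 16u);
    shift = domain * 2u;
    return (dacr & ~(3u << shift)) | ((access & 3u) << shift);
}

/* Return a control register value with the MMU and both caches enabled.
 * The caller is expected to have programmed the TTB, the domain access
 * control register and the page tables first (steps 1 and 2). */
uint32_t control_enable_mmu_and_caches(uint32_t ctrl)
{
    return ctrl | CP15_CTRL_MMU | CP15_CTRL_DCACHE | CP15_CTRL_ICACHE;
}
```

Keeping these helpers free of inline assembly makes the bit arithmetic easy to check on a host machine before it is used in startup code.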
2.2.3.3 Cache and Write Buffer
Cache configuration is 64-way set associative. There are separate 16 kbyte
instruction and data caches. The cache has the following characteristics:
• 8 words per line with 1 valid bit and 2 dirty bits per line, allowing
half-line write-backs.
• Write-through and write-back capable, selectable per memory region
defined by the MMU.
• Pseudo random or round robin replacement algorithms for cache misses.
This is determined by the RR bit (bit 14 in CP15 register 1). An 8-word line
is reloaded on a cache miss.
• Independent cache lock-down with granularity of 1/64th of total cache
size or 256 bytes for both instructions and data. Lock-down of the cache
will prevent an eight-word cache line fill of that region of cache.
• For compatibility with Windows CE and to reduce latency, physical
addresses for data cache entries are stored in the PA TAG RAM to
be used for cache line write-back operations without need of the MMU,
which prevents a possible TLB miss that would degrade performance.
• The Write Buffer holds 16 data words and 4 addresses. If enabled,
writes are sent to the buffer directly from the cache, or from the CPU in
the event of a cache miss or when the cache is not enabled.
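The figures in this list fit together arithmetically, as the following sketch shows: 16 kbyte divided by 32-byte lines gives 512 lines per cache, 512 lines divided by 64 ways gives 8 sets, and 1/64th of 16 kbyte is the 256-byte lock-down granule. The address split shown (bits [4:0] for the byte within a line, bits [7:5] for the set, the rest as tag) follows from those numbers; the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_BYTES     (16u * 1024u)               /* per cache */
#define CACHE_WAYS      64u
#define LINE_WORDS      8u
#define LINE_BYTES      (LINE_WORDS * 4u)           /* 32 bytes */
#define CACHE_LINES     (CACHE_BYTES / LINE_BYTES)  /* lines per cache */
#define CACHE_SETS      (CACHE_LINES / CACHE_WAYS)  /* sets per cache */
#define LOCKDOWN_BYTES  (CACHE_BYTES / 64u)         /* lock-down granule */

/* Byte offset within a cache line (address bits [4:0]). */
uint32_t cache_line_offset(uint32_t addr)
{
    return addr % LINE_BYTES;
}

/* Set selected by an address (address bits [7:5]). */
uint32_t cache_set_index(uint32_t addr)
{
    return (addr / LINE_BYTES) % CACHE_SETS;
}

/* Remaining upper address bits, compared against the stored tags. */
uint32_t cache_tag(uint32_t addr)
{
    return addr / (LINE_BYTES * CACHE_SETS);
}

/* Sanity check of the derived geometry. */
void cache_geometry_check(void)
{
    assert(CACHE_LINES == 512u);
    assert(CACHE_SETS == 8u);
    assert(LOCKDOWN_BYTES == 256u);
}
```

One lock-down granule of 256 bytes corresponds to one way across all 8 sets (8 lines of 32 bytes), which is consistent with the 1/64th figure above.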
2.2.3.3.1 Instruction Cache Enable
• At reset, the cache is disabled.
• A write to CP15 register 1, bit 12, will enable or disable the Instruction
Cache. If the Instruction Cache (I-Cache) is enabled without the MMU
enabled, all accesses are treated as cacheable.
• If disabled, current contents are ignored. If re-enabled before a reset,
contents will be unchanged but may not be coherent with main memory. If
so, contents must be flushed before re-enabling.
2.2.3.3.2 Data Cache Enable
• A write to CP15 register 1, bit 2, will enable or disable the Data Cache
(D-Cache)/Write Buffer.
• D-Cache must only be enabled when the MMU is enabled. All data
accesses are subject to MMU and permission checks.
• If disabled, current contents are ignored. If re-enabled before a reset,
contents will be unchanged but may not be coherent with main memory.
Depending on system software, a clean and invalidate action may be
required before re-enabling.
2.2.3.3.3 Write Buffer Enable
• The Write Buffer is enabled by the page table entries in the MMU. The
Write Buffer is not enabled unless the MMU is enabled.
2.2.4 Coprocessor Interface
The MaverickCrunch coprocessor is explained in detail in Chapter 3. The
relationship between the ARM coprocessor instructions and MaverickCrunch
coprocessor is also explained in Chapter 3.
The ARM coprocessor instruction set includes the following:
• LDC - Load coprocessor from memory
• STC - Store coprocessor register to memory
• MRC - Move to ARM register from coprocessor register
• MCR - Move to coprocessor register from ARM register
• Access to sixteen (C0 through C15) 64-bit registers to access the
coprocessor for data transfer and data manipulation to be used with the
above instructions. See Chapter 3, Section 3.2 on page 75 for a code
example.
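To make the MRC and MCR register-transfer instructions more concrete, the sketch below assembles their 32-bit instruction words from the individual fields. It assumes the standard ARM encoding for these two instructions with the "always" condition code, as given in the ARM architecture documentation. The function name is hypothetical, and the sketch does not cover LDC, STC or the MaverickCrunch-specific mnemonics described in Chapter 3.

```c
#include <assert.h>
#include <stdint.h>

/* Encode "MCR/MRC p<cp_num>, <opc1>, r<rd>, c<crn>, c<crm>, <opc2>"
 * with the "always" condition.
 * to_arm != 0 selects MRC (coprocessor register to ARM register),
 * to_arm == 0 selects MCR (ARM register to coprocessor register). */
uint32_t encode_mcr_mrc(int to_arm, unsigned cp_num, unsigned opc1,
                        unsigned rd, unsigned crn, unsigned crm,
                        unsigned opc2)
{
    assert(cp_num < 16u && rd < 16u && crn < 16u && crm < 16u);
    assert(opc1 < 8u && opc2 < 8u);

    return 0xEE000010u                 /* cond = always, bits [27:24] = 1110, bit 4 = 1 */
         | ((uint32_t)opc1 << 21)
         | ((to_arm ? 1u : 0u) << 20)  /* direction bit */
         | ((uint32_t)crn << 16)
         | ((uint32_t)rd << 12)
         | ((uint32_t)cp_num << 8)
         | ((uint32_t)opc2 << 5)
         | (uint32_t)crm;
}
```

For example, reading CP15 register 1 into r0 (`MRC p15, 0, r0, c1, c0, 0`) encodes as 0xEE110F10, and the matching write (`MCR p15, 0, r0, c1, c0, 0`) encodes as 0xEE010F10.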
2.2.5 AMBA AHB Bus Interface Overview
The AMBA AHB is designed for use with high-performance, high clock
frequency system modules. The AHB acts as the high-performance system
backbone bus. AHB supports the efficient connection of processors, on-chip
memories and off-chip external memory interfaces with low-power peripheral
functions. AHB is also specified to ensure ease of use in an efficient design
flow using synthesis and automated test techniques. Figure 2-2 shows a
typical AMBA AHB System.
AHB (Advanced High-Performance Bus) connects with devices that require
greater bandwidth, such as DMA controllers, external system memory, and
coprocessors. The AMBA AHB bus has the following characteristics:
• Burst Transactions
• Split Transactions
• Bus Master hand-over to devices, for example, a DSP or DMA controller
• Single clock edge operations
APB (Advanced Peripheral Bus) is a lower bandwidth lower power bus which
provides the following:
Peripherals that have high bandwidth or latency requirements are connected
to the EP9315 processor using the AHB bus. These include the external
memory interface, Vectored Interrupt Controllers (VIC1, VIC2), DMA,
LCD/Raster registers, USB host, IDE, Ethernet MAC and the bridge to the
APB interface. The AHB/APB Bridge transparently converts the AHB access
into the slower speed APB accesses. All of the control registers for the APB
peripherals are programmed using the AHB/APB bridge interface. The main
AHB data and address lines are configured using a multiplexed bus. This
removes the need for three-state buffers and bus holders and simplifies bus
arbitration. Figure 2-3 shows the main data paths in the EP9315 AHB
implementation.
Figure 2-3. EP9315 Main Data Paths
[Block diagram. Labels: ARM920T, Maverick Crunch, VIC1, VIC2, Ethernet, 18 Bit Raster LCD I/F, SDRAM Controller, EBI, Static Memory/PCMCIA, IDE, USB Host, AHB, Boot ROM, DMA, AHB/APB bridge, Test Support, APB, UARTs, Timers, RTC, Watchdog, Touchscreen, 8x8 Key Mtx, GPIOs, PWM, SPI, I2S, IrDA, AC97, PLL1, PLL2, Clock & State Control]
Before an AMBA AHB transfer can commence, the bus master must be
granted access to the bus. This process is started by the master asserting a
request signal to the arbiter. Then the arbiter indicates when the master will be
granted use of the bus. A granted bus master starts an AMBA AHB transfer
by driving the address and control signals. These signals provide information
on the address, direction and width of the transfer, as well as indicating
whether the transfer forms part of a burst.
Two different forms of burst transfers are allowed:
• Incrementing bursts, which do not wrap at address boundaries
• Wrapping bursts, which wrap at particular address boundaries.
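The difference between the two burst forms can be illustrated by computing the address of the next beat. The sketch assumes the rule from the AMBA Specification (Rev. 2.0) that a wrapping burst wraps at a boundary equal to the beat size multiplied by the number of beats, and that this product is a power of two. The function name and parameters are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Address of the beat that follows addr in a burst.
 * beat_bytes: bytes transferred per beat (for example 4 for a word)
 * beats:      number of beats in the burst (for example 4, 8 or 16)
 * wrapping:   non-zero for a wrapping burst, zero for incrementing */
uint32_t next_burst_address(uint32_t addr, uint32_t beat_bytes,
                            uint32_t beats, int wrapping)
{
    uint32_t next = addr + beat_bytes;

    if (wrapping) {
        uint32_t span = beat_bytes * beats;   /* wrap boundary size */

        /* Assumption: the wrap boundary is a power of two. */
        assert(span != 0u && (span & (span - 1u)) == 0u);

        /* Keep the upper bits of the block, wrap the lower bits. */
        next = (addr & ~(span - 1u)) | (next & (span - 1u));
    }
    return next;
}
```

A four-beat wrapping burst of word transfers that starts at 0x34 therefore visits 0x34, 0x38, 0x3C and 0x30, whereas the incrementing form continues from 0x3C to 0x40.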
A write data bus is used to move data from the master to a slave, while a read
data bus is used to move data from a slave to the master. Every transfer
consists of:
• An address and control cycle
• One or more cycles for the data.
In normal operation a master is allowed to complete all the transfers in a
particular burst before the arbiter grants another master access to the bus.
However, in order to avoid excessive arbitration latencies, it is possible for the
arbiter to break up a burst, and, in such cases, the master must re-arbitrate for
the bus in order to complete the remaining transfers in the burst.
2.2.7 Memory and Bus Access Errors
There are several possible sources of access errors.
• Reads to reserved or undefined register memory addresses will return
indeterminate data. Writes to reserved or undefined memory addresses
are generally ignored, but this behavior is not guaranteed. Many register
addresses are not fully decoded, so aliasing may occur. Addresses and
memory ranges listed as Reserved should not be accessed; access
behavior to these regions is not defined.
• Access to non-existent registers or memory may result in a bus error.
• Any access in the APB control register space will complete normally, as
these devices have no means of signaling an error.
• Access to non-existent AHB/APB registers may result in a bus error,
depending on the device and nature of the error. Device specific access
rules are defined in the device descriptions.
• External memory access is controlled by the Static Memory Controller
(SMC) and the Synchronous Dynamic RAM (SDRAM) controller. In
general, access to non-existent external memory will complete normally,
with reads returning random false data.
The arbitration mechanism is used to ensure that only one master has access
to the bus it controls at any one time. The arbiter performs this function by
observing a number of different requests to use the bus and deciding which is
currently the highest priority master requesting the bus.
The arbitration scheme can be broken down into three main areas:
• The main AHB system bus arbiter
•The SDRAM slave interface arbiter
•The EBI bus arbiter
2.2.8.1 Main AHB Bus Arbiter
This arbiter controls the bus master arbitration for the AHB bus. The AHB bus
has eight master interfaces:
• ARM920T
• DMA controller
• USB host (USB1, 2, 3)
• Ethernet MAC
• LCD/Raster and Raster Hardware Cursor.
These interfaces have an order of priority that is linked closely with the power
saving modes. The power saving modes of Halt and Standby force the arbiter
to grant the default bus master, in this case, the ARM920T.
In summary, the order of priority of the bus masters, from highest to lowest, is
shown in Table 2-1.
Table 2-1: AHB Arbiter Priority Scheme
Priority   PRIORITY 00      PRIORITY 01      PRIORITY 10      PRIORITY 11
Number     (Reset value)
1          Raster Cursor    Raster           Raster           Raster
2          MAC              Raster Cursor    Raster Cursor    DMA
3          USB              MAC              DMA              MAC
4          DMA              USB              USB              USB
5          ARM920T          ARM920T          MAC              Raster Cursor
6          Raster           DMA              ARM920T          ARM920T
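The priority scheme in Table 2-1 can be modeled as a lookup table that is scanned from the highest priority downward. The sketch below is a software model for illustration only, not a description of the arbiter hardware. The enum, the request bitmask layout and the function name are hypothetical, and returning the ARM920T when nothing is requesting reflects its role as the default bus master.

```c
#include <assert.h>
#include <stdint.h>

/* Bus masters as grouped in Table 2-1 (illustrative names). */
enum ahb_master {
    M_RASTER, M_RASTER_CURSOR, M_MAC, M_USB, M_DMA, M_ARM920T, M_COUNT
};

/* One row per PRIORITY setting, highest priority first. */
static const enum ahb_master priority_order[4][M_COUNT] = {
    /* PRIORITY 00 */ { M_RASTER_CURSOR, M_MAC, M_USB, M_DMA, M_ARM920T, M_RASTER },
    /* PRIORITY 01 */ { M_RASTER, M_RASTER_CURSOR, M_MAC, M_USB, M_ARM920T, M_DMA },
    /* PRIORITY 10 */ { M_RASTER, M_RASTER_CURSOR, M_DMA, M_USB, M_MAC, M_ARM920T },
    /* PRIORITY 11 */ { M_RASTER, M_DMA, M_MAC, M_USB, M_RASTER_CURSOR, M_ARM920T },
};

/* requests: bit n set means master n (enum value) is requesting the bus.
 * Returns the requesting master with the highest priority, or the
 * ARM920T as default master when there are no requests. */
enum ahb_master arbitrate(unsigned priority_setting, unsigned requests)
{
    unsigned i;

    assert(priority_setting < 4u);
    for (i = 0; i < M_COUNT; i++) {
        enum ahb_master m = priority_order[priority_setting][i];
        if (requests & (1u << m)) {
            return m;
        }
    }
    return M_ARM920T;
}
```

With the reset setting (PRIORITY 00), simultaneous requests from the DMA controller and the ARM920T are resolved in favor of the DMA controller. With PRIORITY 01 the same pair is resolved in favor of the ARM920T.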
The priority of the Arbiter can be programmed in the BusMstrArb register in
the Clock and State Controller. The Arbiter can also be programmed to
degrant one of the following masters: DMA, USB Host or Ethernet MAC, if an
interrupt (IRQ or FIQ) is pending or being serviced. This prevents one of these
masters from blocking important interrupt service routines. These masters are
prevented from accessing the bus, and their bus requests are masked, until
the IRQ/FIQ is removed (by the Interrupt Service Routine), at which point their
bus requests will be recognized. The default is to program the Arbiter so that it
does not degrant any of these masters.
In normal operation, when the ARM920T is granted the bus and a request to
enter Halt mode is received, the ARM920T is de-granted from the AHB bus.
Any other master requesting the bus in Halt mode (according to the priority)
will be granted the bus. In the case of the entry into Standby, the dummy
master will be granted the bus, which simply performs IDLE transfers. In this
way, all the masters except the ARM920T can be used during Halt mode, but
are shut down during an entry into Standby.
2.2.8.2 SDRAM Slave Arbiter
The SDRAM controller has a slave interface for the main AHB bus and the
Raster controller DMA bus. In order to control the accesses to these memory
systems, the SDRAM controller has an arbiter that prioritizes between the
AHB and the Raster DMA bus. In this case the Raster controller bus is given
priority. If an access from the AHB is requested at the same time as a Raster
DMA, the Raster will be given access while the AHB request is queued.
ARM920T Core and Advanced High-Speed Bus (AHB)
2.2.8.3 EBI Bus Arbiter
This arbiter is used to arbitrate between accesses from the SDRAM controller
and the Static Memory controller. The priority is given to accesses from the
SDRAM controller.
2.3 AHB Decoder
The AHB decoder contains the memory map for all the AHB masters/slaves
and the APB bridge. When a particular address range is selected, the
appropriate signal is generated. It is defined in Table 2-2.
(For additional information, see “Reference Documents”, on Page 4.)
Note: Due to decoding optimization, the AHB peripheral registers are aliased
throughout each peripheral's register bank. Do not program access to an
unspecified register within the bank.
2.3.1 AHB Bus Slave
An AHB slave responds to transfers initiated by bus masters within the
system. The slave uses signals from the decoder to determine when it should
respond to a bus transfer. All other signals required for the transfer, such as
the address and control information, are generated by the bus master.
2.3.2 AHB to APB Bridge
The AHB to APB bridge is an AHB slave, providing an interface between the
high-speed AHB and the low-power APB. Read and write transfers on the
AHB are converted into equivalent transfers on the APB. Because the APB is not
pipelined, wait states are added during transfers to and from the APB when
the AHB is required to wait for the APB.
The main sections of this module are:
•AHB slave bus interface
• APB transfer state machine, which is independent of the device memory
map
• APB output signal generation.
2.3.2.1 Function and Operation of APB Bridge
The APB bridge responds to transaction requests from the currently granted
AHB master. The AHB transactions are then converted into APB transactions.
If an undefined location is accessed, operation of the system continues as
normal, but no peripherals are selected. The APB bridge acts as the only
master on the APB.
Note: Due to decoding optimization, the APB peripheral registers are aliased
throughout each peripheral's register bank. Do not program access to an
unspecified register within the bank.
2.3.3 APB Bus Slave
An APB slave responds to transfers initiated by bus masters within the
system. The slave uses signals from the decoder to determine when it should
respond to a bus transfer. All other signals required for the transfer, such as
the address and control information, are generated by the APB bridge.
2.3.4 Register Definitions
The ARM core has thirty-seven 32-bit internal registers; some are modal and some
are banked. If operating in Thumb state, the processor must switch to ARM state
before taking an exception. The return instruction will restore the processor to
Thumb state. Most tasks execute in User mode. The operating modes are:
User:        Unprivileged normal operating mode
FIQ:         Fast interrupt (high priority) mode when FIQ is asserted
IRQ:         Interrupt request (normal) mode when IRQ is asserted
Supervisor:  Software interrupt instruction (SWI) or reset will cause entry
             into this mode
Abort:       Memory access violation will cause entry into this mode
Undef:       Undefined instructions
System:      Privileged mode. Uses same registers as User mode
Table 2-4 shows the registers available in each ARM920T operating mode. Each
mode banks a specific set of registers. Banked register contents are not
shared between modes. FIQ mode banks the most registers (r8 through r14), so
an FIQ handler has fewer registers to save, which improves performance.
Table 2-4: Register Organization Summary
Privileged Modes: System, Supervisor, Abort, Undefined, IRQ, FIQ
Exception Modes:  Supervisor, Abort, Undefined, IRQ, FIQ
User       System     Supervisor Abort      Undefined  IRQ        FIQ
r0         r0         r0         r0         r0         r0         r0
r1         r1         r1         r1         r1         r1         r1
r2         r2         r2         r2         r2         r2         r2
r3         r3         r3         r3         r3         r3         r3
r4         r4         r4         r4         r4         r4         r4
r5         r5         r5         r5         r5         r5         r5
r6         r6         r6         r6         r6         r6         r6
r7         r7         r7         r7         r7         r7         r7
r8         r8         r8         r8         r8         r8         r8_fiq
r9         r9         r9         r9         r9         r9         r9_fiq
r10        r10        r10        r10        r10        r10        r10_fiq
r11        r11        r11        r11        r11        r11        r11_fiq
r12        r12        r12        r12        r12        r12        r12_fiq
r13 (sp)   r13        r13_svc    r13_abt    r13_und    r13_irq    r13_fiq
r14 (lr)   r14        r14_svc    r14_abt    r14_und    r14_irq    r14_fiq
r15 (pc)   pc         pc         pc         pc         pc         pc
cpsr       cpsr       cpsr       cpsr       cpsr       cpsr       cpsr
                      spsr_svc   spsr_abt   spsr_und   spsr_irq   spsr_fiq
Thumb state low registers: r0 through r7. Thumb state high registers: r8 through r15.
Note: Banked registers are shown with a mode suffix (for example, r13_svc).
User mode in Thumb state generally limits access to r0-r7. There are six
instructions that allow access to the high registers. For these 6 exceptions, the
processor must revert to ARM state. These exceptions are:
• r0-r12: General purpose read/write 32-bit registers
• r13 (sp): Stack Pointer
• r14 (lr): Link Register
• r15 (pc): Program Counter
• cpsr: Current Program Status Register (contains condition codes and
operating modes)
• spsr: Saved Program Status Register (saves CPSR when exception
occurs)
The ARM920T core has 16 coprocessor registers (CP15) for control over the core.
The coprocessor registers are written with the MCR instruction and read with
the MRC instruction, addressed to coprocessor p15.
Table 2-5 describes the CP15 ARM920T registers.
Table 2-5: CP15 ARM920T Register Description
Register  Description
0         ID Code: (Read-only) This register returns a 32-bit device code. ID Code data
          represents the core type, revision, part number, etc. Access to this register is
          done with the following instruction:
              MRC p15, 0, Rd, c0, c0, 0
          Cache Code: This returns the cache type, size, line length, and associativity of
          both the I-Cache and D-Cache. This is accessed with:
              MRC p15, 0, Rd, c0, c0, 1
1         Control Register: (Read/Write) Use this register to enable the MMU, instruction and
          data cache, round robin replacement ('RR' bit), system protection, ROM protection,
          and clocking mode. Read/Write instructions:
              MRC p15, 0, Rd, c1, c0, 0 - Read control register - value stored in Rd
              MCR p15, 0, Rd, c1, c0, 0 - Write control register - value first loaded into Rd
2         Translation Table Base: (Read/Write) This register contains the start address of the
          first level translation table. The upper 18 bits represent the pointer to the table
          base. The lower 14 bits should be 0 for a write and are unpredictable if read.
              MRC p15, 0, Rd, c2, c0, 0 - Read TTB
              MCR p15, 0, Rd, c2, c0, 0 - Write TTB
3         Domain Access Control: (Read/Write) This register specifies permissions for all 16
The overall memory map for the device is shown in Table 2-6.
If internal Boot Mode is selected and the register BootModeClr has been
written, the address range 0x0000_0000 -> 0x0000_FFFF is occupied by the
internal Boot ROM until the internal Boot Code is completed and then the map
reverts back to either Synchronous or Asynchronous memory in this address
space.
NOTE: Some memory locations are listed as Reserved. These memory
locations should not be used. Reading from these memory locations will yield
invalid data. Writing to these memory locations may cause unpredictable
results.
Table 2-5: CP15 ARM920T Register Description (Continued)
Register  Description
10        TLB Lockdown: (Read/Write) Prevents TLB entries from being erased during a table walk.
              MCR p15, 0, Rd, c10, c0, 0 - Write lockdown base pointer for data TLB entry
              MCR p15, 0, Rd, c10, c0, 1 - Write lockdown base pointer for instruction TLB entry
11-12     Reserved
13        FCSE PID Register: (Read/Write) Addresses issued by the ARM9TDMI core in the range
          from 0 to 32 MB are translated by this register to A + FCSE * 32 MB and remapped.
          If translation is turned off, the address passes straight to the MMU.
15        Test Register Only: Reads or writes will cause unpredictable behavior.
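The FCSE remapping described for register 13 can be expressed as a short calculation. This is a sketch under stated assumptions: the function name `fcse_translate` is hypothetical, the PID is taken to be a 7-bit value as in the ARM architecture, and a PID of 0 is treated as "translation off".

```c
#include <assert.h>
#include <stdint.h>

#define FCSE_WINDOW 0x02000000u  /* 32 MB */

/*
 * va:  address issued by the core.
 * pid: FCSE process ID (assumed 7 bits wide; 0 means no remapping).
 * Addresses below 32 MB are moved to va + pid * 32 MB; all other
 * addresses pass through unchanged.
 */
uint32_t fcse_translate(uint32_t va, uint32_t pid)
{
    assert(pid < 128);
    if (va < FCSE_WINDOW)
        return va + pid * FCSE_WINDOW;
    return va;
}
```

With a PID of 3, an access to 0x0000_1000 would be presented to the MMU as 0x0600_1000, while an access at or above 32 MB is unaffected.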
Table 2-6: Global Memory Map for the Two Boot Modes
Address Range                Sync Memory Boot     Async Memory Boot
0x0000_0000 - 0x0000_FFFF
Note: The shaded areas are the memory areas dedicated to system registers. Details
of these registers are in Table 2-7.
2.3.6 Internal Register Map
Registers are set to their default state by the RSTOn pin and by the PRSTn pin
inputs. Some state conserving registers are reset only by the PRSTn pin. All
registers are read/write unless specified otherwise.
2.3.6.1 Memory Access Rules
Any memory address not specifically assigned to a register should be
avoided. Reads to register memory addresses labelled Reserved, Unused or
Undefined will return indeterminate data. Writes to register memory addresses
labelled Reserved, Unused or Undefined are generally ignored, but this
behavior is not guaranteed. Many register addresses are not fully decoded, so
aliasing may occur. Addresses and memory ranges listed as Reserved
(RSVD) should not be accessed; access behavior to these regions is not
defined.
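Table 2-6 boot mode entries for 0x0000_0000 - 0x0000_FFFF:
Sync Memory Boot (ASD0 Pin = 1):   Sync memory (nSDCE3), or Internal Boot ROM if INTBOOT selected
Async Memory Boot (ASD0 Pin = 0):  Async memory (nCS0), or Internal Boot ROM if INTBOOT selected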
The SW Lock field identifies registers with a software lock. The software lock
prevents the register from being written unless a proper unlock operation is
performed immediately prior to writing the target register. Any register whose
accidental alteration could cause system damage is controlled with a software
lock. Each peripheral with software lock capability has its own software lock
register.
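The unlock-then-write sequence can be illustrated with a small software model. This is a sketch only: the structure, the function names, and the key value `HYPOTHETICAL_UNLOCK_KEY` are assumptions made for illustration; the actual unlock operation for each peripheral is given in that peripheral's register description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical key; the real value is defined per peripheral. */
#define HYPOTHETICAL_UNLOCK_KEY 0xAAu

/* Model of one peripheral with a software lock register. */
struct locked_periph {
    bool unlocked;           /* true only right after a valid unlock */
    uint32_t protected_reg;  /* register guarded by the software lock */
};

/* Writing the key to the lock register arms a single protected write. */
void periph_write_swlock(struct locked_periph *p, uint32_t value)
{
    p->unlocked = (value == HYPOTHETICAL_UNLOCK_KEY);
}

/*
 * A write to the protected register succeeds only if the unlock was the
 * immediately preceding operation. The lock re-engages after any attempt.
 */
bool periph_write_protected(struct locked_periph *p, uint32_t value)
{
    bool ok = p->unlocked;
    if (ok)
        p->protected_reg = value;
    p->unlocked = false;
    return ok;
}
```

In this model a second protected write without a fresh unlock is rejected, which mirrors the requirement that the unlock be performed immediately prior to each write.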
Within a register definition, a reserved bit, indicated by the name RSVD, is
not accessible. Software should mask the RSVD bits when reading. RSVD bits
ignore writes; that is, writing a zero or a one has no effect.
Register bits identified as NC must be treated in a specific manner for reads
and writes; see the register description for each register for information on
how to read and write register bits identified as NC. Register bits identified as
NC are functionally alive but have an undocumented or a “don’t care”
operating function. The register description will provide information on how to
handle NC bits.
Unless specified otherwise, all registers can be accessed as a byte, half-word,
or word.
The MaverickCrunch coprocessor accelerates IEEE-754 floating point
arithmetic and 32-bit and 64-bit fixed point arithmetic operations. It provides
an integer multiply-accumulate (MAC) that is considerably faster than the
native MAC implementation in the ARM920T. The MaverickCrunch
coprocessor significantly accelerates the arithmetic processing required to
encode/decode digital audio formats.
The MaverickCrunch coprocessor uses the standard ARM920T coprocessor
interface, sharing its memory interface and instruction stream. All
MaverickCrunch operations are simply ARM920T coprocessor instructions.
The coprocessor handles all internal inter-instruction dependencies by using
internal data forwarding and inserting wait states.
Chapter 3
MaverickCrunch Coprocessor
3.1.1 Features
Key features include:
•IEEE-754 single and double precision floating point
• 32/64-bit integer
• Add/multiply/compare
•Integer Multiply-Accumulate (MAC) 32-bit input with 72-bit accumulate
•Integer Shifts
• Floating point to/from integer conversion
• Sixteen 64-bit registers
•Four 72-bit accumulators
3.1.2 Operational Overview
The MaverickCrunch coprocessor is a true ARM920T coprocessor. It
communicates with the ARM920T via the coprocessor bus and shares the
instruction stream and memory interface of the ARM920T. It runs at the
ARM920T core clock frequency (either FCLK or BCLK).
The coprocessor supports four primary data formats:
• IEEE-754 single precision floating point (24-bit signed significand and 8-bit biased exponent)
• IEEE-754 double precision floating point (53-bit signed significand and 11-bit biased exponent)
• 32-bit integer
• 64-bit integer
The coprocessor performs the following standard operations on all four
supported data formats:
•addition
•subtraction
•multiplication
• absolute value
• negation
•logical left/right shift
•comparison
In addition, for 32-bit integers, the coprocessor provides:
• multiply-accumulate (MAC)
• multiply-subtract (MSB)
Any of the four data formats may be converted to another of the formats. All
four data types may be loaded directly from and stored directly to memory via
the ARM920T coprocessor interface. They may also be moved to or from
ARM920T registers.
The MaverickCrunch coprocessor also provides a 72-bit extended precision
integer format that is used only in the accumulators. The accumulators may
also be used in MAC and MSB operations.
IEEE-754 rounding and exceptions are also provided. Four rounding modes
for floating point operations are:
• round to nearest
• round toward +∞
• round toward -∞
• round toward 0
Exceptions include:
• Invalid operator
• Overflow
• Underflow
• Inexact
Note that the division by zero exception is not supported as the
MaverickCrunch coprocessor does not provide division or square root.
3.1.3 Pipelines and Latency
There are two primary pipelines within the MaverickCrunch coprocessor. One
handles all communication with the ARM920T, while the other, the “data path”
pipeline, handles all arithmetic operations (this one actually operates at one
half the MaverickCrunch coprocessor clock frequency).
The data path pipeline may run synchronously or asynchronously with respect
to the ARM instruction pipeline. If run asynchronously, data path computation
is decoupled from the ARM, allowing high throughput, though arithmetic
exceptions are not synchronous. If run synchronously, exceptions are
synchronous, but throughput suffers.
Assuming no inter-instruction dependencies cause pipeline stalls, arithmetic
instructions can produce a new result every two ARM920T clocks, which is a
maximum throughput of one data path instruction per two ARM920T clocks.
The only exception is 64-bit multiplies (CFMULD or CFMUL64), which require
six extra ARM920T clocks to produce their result, giving a maximum
throughput of one instruction per eight ARM920T clocks.
The normal latency for an arithmetic instruction is approximately nine
ARM920T clocks, from initial decode to the time the result is written to the
register file. A 64-bit multiply requires 15 clocks.
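The clock counts quoted in this section can be combined into a rough cycle estimate. The sketch below is an approximation for planning purposes only: it assumes a stream of independent instructions with no stalls, and the function names are hypothetical.

```c
#include <assert.h>

/* Figures taken from the text above (all in ARM920T clocks). */
enum {
    CLOCKS_PER_RESULT  = 2,  /* new result every two clocks */
    MUL64_EXTRA_CLOCKS = 6,  /* extra clocks for CFMULD / CFMUL64 */
    BASE_LATENCY       = 9   /* approximate decode-to-writeback latency */
};

/* Issue clocks for n_normal ordinary and n_mul64 64-bit multiply instructions. */
unsigned crunch_issue_clocks(unsigned n_normal, unsigned n_mul64)
{
    return n_normal * CLOCKS_PER_RESULT
         + n_mul64 * (CLOCKS_PER_RESULT + MUL64_EXTRA_CLOCKS);
}

/* Approximate latency of a single instruction. */
unsigned crunch_latency(int is_mul64)
{
    return BASE_LATENCY + (is_mul64 ? MUL64_EXTRA_CLOCKS : 0);
}
```

Real code will usually be slower than this estimate because dependent instructions wait for earlier results.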
3.1.4 Data Registers
The MaverickCrunch coprocessor contains the following registers:
•16 64-bit general purpose registers, c0 through c15
• 4 72-bit accumulators, a0 through a3
• 1 status and control register, DSPSC
A single precision floating point value is stored in the upper 32 bits of a 64-bit
register and must be explicitly promoted to double precision to be used in
double precision calculations.
A 32-bit integer is stored in the lower 32 bits of a 64-bit register and is
sign-extended when written, provided the UI bit in the DSPSC is clear:
Bits 63:32        Bit 31    Bits 30:0
Sign Extension    Sign      Data
Hence, 32-bit integers may be used directly in calculations with 64-bit
integers, which are stored as shown:
Bit 63    Bits 62:0
Sign      Data
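The sign-extension rule for 32-bit integers can be sketched in C as follows. The function name `crunch_load32` is hypothetical, and zero-extension when the UI bit is set is an assumption; the text above only states what happens when UI is clear.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Models the 64-bit register contents after a 32-bit integer is written.
 * ui_set == 0: bit 31 is copied into bits 63:32 (sign extension).
 * ui_set != 0: assumed here to zero-extend (not stated in the text).
 */
uint64_t crunch_load32(uint32_t value, int ui_set)
{
    if (ui_set)
        return (uint64_t)value;
    /* Reinterpret as signed, widen, then view the bits as unsigned. */
    return (uint64_t)(int64_t)(int32_t)value;
}
```

Because the upper half already carries the sign, a value loaded this way can take part directly in 64-bit integer arithmetic.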
3.1.5 Integer Saturation Arithmetic
By default, the coprocessor treats all 32-bit and 64-bit integers as signed
values and automatically saturates the results of most integer operations and
all conversions from floating-point to integer format. Instructions that may
saturate their results are:
• CFADD32 and CFADD64
•CFSUB32 and CFSUB64
• CFMUL32 and CFMUL64
• CFMAC32 and CFMSC32
• CFCVTS32 and CFCVTD32
• CFTRUNCS32 and CFTRUNCD32
This behavior, however, can be altered by setting the UI bit and the ISAT bit in
the DSPSC. With the UI bit clear (the default), 32-bit and 64-bit integer
operations are treated as signed with respect to overflow and underflow
detection and saturation as well as compare operations. Setting the UI bit
causes the MaverickCrunch coprocessor to treat all 32-bit and 64-bit integer
operations as unsigned with respect to overflow, underflow, saturation, and
comparison.
With saturation enabled (the default), the maximum representable value is
returned on overflow and the minimum representable value is returned on
underflow. The maximum and minimum values depend on the operand size
and whether the UI bit in the DSPSC is set, as shown in Table 3-1.
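Table 3-1: Saturation for Non-accumulator Instructions
                        32-bit Operand    64-bit Operand
Overflow    Signed      0x7FFF_FFFF       0x7FFF_FFFF_FFFF_FFFF
            Unsigned    0xFFFF_FFFF       0xFFFF_FFFF_FFFF_FFFF
Underflow   Signed      0x8000_0000       0x8000_0000_0000_0000
            Unsigned    0x0000_0000       0x0000_0000_0000_0000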
To disable saturation on overflow and underflow, set the ISAT bit in the
DSPSC.
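The saturating behavior for 32-bit operands can be illustrated with plain C arithmetic. This is a software sketch of the rule stated above (maximum value on overflow, minimum value on underflow), not the coprocessor implementation; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Signed 32-bit add, clamped to [INT32_MIN, INT32_MAX] (UI bit clear). */
int32_t sat_add_s32(int32_t a, int32_t b)
{
    int64_t r = (int64_t)a + b;  /* widen so the sum cannot wrap */
    if (r > INT32_MAX) return INT32_MAX;
    if (r < INT32_MIN) return INT32_MIN;
    return (int32_t)r;
}

/* Unsigned 32-bit add, clamped to UINT32_MAX (UI bit set). */
uint32_t sat_add_u32(uint32_t a, uint32_t b)
{
    uint64_t r = (uint64_t)a + b;
    return r > UINT32_MAX ? UINT32_MAX : (uint32_t)r;
}

/* Unsigned 32-bit subtract, clamped to 0 on underflow (UI bit set). */
uint32_t sat_sub_u32(uint32_t a, uint32_t b)
{
    return a < b ? 0u : a - b;
}
```

Saturation keeps an overflowing result pinned at the nearest representable value instead of wrapping around, which is usually the preferred behavior in audio and other signal processing code.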
Normally, arithmetic instructions that write to an accumulator do not saturate
their results on overflow or underflow. These instructions are:
However, the SAT[1:0] bits in the DSPSC may be set to select one of several
kinds of saturation to occur on the results of these instructions before they are
written to an accumulator.
Note: This action does not affect the operation of instructions that do not write their
result to an accumulator.
Enabling saturation also modifies the representation of data stored in the
accumulator. The three supported bit formats and their maximum and
minimum saturation values are shown in Table 3-2 on page 73.
The bit format x.yy represents x bits before the binary point and yy
fraction bits after it; for example, the bit format 2.62 has
two bits before the binary point and sixty-two fraction bits. Though these formats utilize either
32- or 64-bit integers, the accumulators are 72 bits wide. If the accumulator
saturation mode is disabled (the default), the accumulator bit fields are
assigned as below for a 2’s complement integer.
If the saturation mode 1.63 is selected, the bit field assignments are:
Bits 71:64        Bit 63    Bits 62:0
Sign Extension    Sign      Data
If the saturation mode 1.31 is selected, the bit field assignments are:
Bits 71:64        Bit 63    Bits 62:32    Bits 31:0
Sign Extension    Sign      Data          Unused
If the saturation mode 2.62 is selected, the bit field assignments are:
Bits 71:63        Bit 62    Bits 61:0
Sign Extension    Sign      Data
3.1.6 Comparisons
The Crunch coprocessor provides four compare operations:
•CFCMP32 - 32-bit integer
•CFCMP64 - 64-bit integer
•CFCMPS - single floating point
• CFCMPD - double floating point
The DSPSC register bit UI affects the operation of integer comparisons. If
clear, integers are treated as signed values, and if set, they are treated as
unsigned. DSPSC.UI has no effect on floating point comparisons.
All compare operations update both the FCC[1:0] bits in the DSPSC register
and an ARM register. Though any of the ARM general purpose registers r0
through r14 may be specified as the destination, specifying r15 actually
updates the CPSR flag bits NZCV. This permits the condition code field of any
subsequent ARM instruction to gate the execution of that instruction based on
the result of a Crunch compare operation.
Table 3-3 on page 75 illustrates the legal relationships and, for each one, the
values written to the FCC bits and the NZCV flags. The FCC bits and the
NZCV flags provide the same information, but in different ways and in different
places. Their values depend only on the relationship between the operands,
regardless of whether the operands are considered signed integer, unsigned
integer, or floating point. The unordered relationship can only apply to floating
point operands.
Table 3-3: Comparison Relationships and Their Results
Relationship    FCC[1:0]    NZCV
A = B           00
A < B           01
A > B           10
Unordered       11          0000
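The FCC[1:0] encodings can be mirrored in software when checking results on a host. The sketch below is illustrative only: the enum and function names are hypothetical, and only the FCC encodings (00 equal, 01 less than, 10 greater than, 11 unordered) come from this manual.

```c
#include <assert.h>

/* FCC[1:0] encodings as listed in the DSPSC register description. */
enum crunch_fcc {
    FCC_EQ        = 0,  /* 00: A equals B */
    FCC_LT        = 1,  /* 01: A less than B */
    FCC_GT        = 2,  /* 10: A greater than B */
    FCC_UNORDERED = 3   /* 11: at least one operand is NaN */
};

/* Host-side model of a double precision compare. */
enum crunch_fcc crunch_compare_double(double a, double b)
{
    /* A NaN is the only value that compares unequal to itself. */
    if (a != a || b != b)
        return FCC_UNORDERED;
    if (a == b)
        return FCC_EQ;
    return a < b ? FCC_LT : FCC_GT;
}
```

The unordered result can only arise for floating point operands, which matches the note above that it never applies to integer comparisons.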
The NZCV flags are not computed exactly as with integer comparisons using
the ARM CMP instruction. Hence, when examining the result of Crunch
comparisons, the condition codes field of ARM instructions should be
interpreted differently, as shown in Table 3-4 on page 75. The same six
condition codes should be used whether the comparison operands were
signed integers, unsigned integers, or floating point. No other condition codes
are meaningful.
Table 3-4: ARM Condition Codes and Crunch Compare Results
Condition Code
Opcode[31:28]    Mnemonic    ARM Meaning                     Crunch Meaning
0000             EQ          Equal                           Equal
0001             NE          Not Equal                       Not Equal
1010             GE          Signed Greater Than or Equal    Greater Than or Equal
The examples below show two algorithms, each implemented using the
standard programming languages and the MaverickCrunch instruction set.
3.2.1 Example 1
Sections 3.2.1.2, 3.2.1.3, and 3.2.1.4, show three coding samples performing
the same operation. Section 3.2.1.1 on page 76 shows common setup code
used by all three samples. Section 3.2.1.2 on page 76 shows the program
implemented in C code. Section 3.2.1.3 on page 76 uses ARM assembly
language, accessing the MaverickCrunch with ARM coprocessor instructions.
3.2.1.3 Accessing MaverickCrunch with ARM Coprocessor Instructions
    ldc p5, c0, [r0, #0x0]      ; data section preloaded with 0x0 ("num")
    ldc p5, c1, [r0, #0x4]      ; data section preloaded with 0xa
    ldc p5, c2, [r0, #0x8]      ; data section preloaded with 0x1
    ldc p5, c3, [r0, #0xc]      ; data section preloaded with 0x5
loop
    cdp p5, 1, c0, c0, c3, 0    ; c0 <= c0 * 5
    cdp p5, 3, c0, c0, c2, 6    ; c0 <= c0 - 1
    mrc p5, 0, r15, c0, c1, 4   ; c0 < 10 ?
    blt loop                    ; yes
    stc p5, c0, [r0, #0x0]      ; no, store result
3.2.1.4 MaverickCrunch Assembly Language Instructions
    cfldr32 c0, [r0, #0x0]      ; data section preloaded with 0x0 ("num")
    cfldr32 c1, [r0, #0x4]      ; data section preloaded with 0xa
    cfldr32 c2, [r0, #0x8]      ; data section preloaded with 0x1
    cfldr32 c3, [r0, #0xc]      ; data section preloaded with 0x5
loop
    cfmul32 c0, c0, c3          ; c0 <= c0 * 5
    cfsub32 c0, c0, c2          ; c0 <= c0 - 1
    cfcmp32 r15, c0, c1         ; c0 < 10 ?
    blt loop                    ; yes
    cfstr32 c0, [r0, #0x0]      ; no, store result
3.2.2 Example 2
The following function performs an FIR filter on the given input stream. The
variable "data" points to an array of floating point values to be filtered, "n" is the
number of samples for which the filter should be applied, "filter" is the FIR filter
to be applied, and "m" is the number of taps in the FIR filter. The "data" array
must be "n + m - 1" samples in length, and "n" samples will be produced.
3.2.2.1 C Code
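void
ComputeFIR(float *data, int n, float *filter, int m)
{
    int i, j;
    float sum;

    /* Produce one output sample per iteration, in place. */
    for(i = 0; i < n; i++)
    {
        sum = 0;
        /* Accumulate the m-tap dot product starting at sample i. */
        for(j = 0; j < m; j++)
        {
            sum += data[i + j] * filter[j];
        }
        data[i] = sum;
    }
}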
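A short usage sketch may help when checking a port of this routine. The helper `fir_demo` and its sample values are hypothetical; `ComputeFIR` is repeated from the listing above only so that the fragment compiles on its own.

```c
#include <assert.h>

/* Copy of ComputeFIR from the listing above. */
void ComputeFIR(float *data, int n, float *filter, int m)
{
    int i, j;
    float sum;

    for (i = 0; i < n; i++) {
        sum = 0;
        for (j = 0; j < m; j++)
            sum += data[i + j] * filter[j];
        data[i] = sum;
    }
}

/*
 * Hypothetical helper: runs a 2-tap averaging filter over five samples.
 * With n = 4 and m = 2, the input must hold n + m - 1 = 5 samples.
 */
void fir_demo(float out[4])
{
    float data[5] = { 1.0f, 2.0f, 3.0f, 4.0f, 5.0f };
    float taps[2] = { 0.5f, 0.5f };
    int i;

    ComputeFIR(data, 4, taps, 2);
    for (i = 0; i < 4; i++)
        out[i] = data[i];
}
```

Each output is the average of two adjacent inputs, so the four results are 1.5, 2.5, 3.5, and 4.5. The in-place update is safe because output i is written only after every input at index i or above has been read for that output, and later outputs never read index i again.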
3.2.2.2 MaverickCrunch Assembly Language Instructions
MaverickCrunch Status and Control Register. Accessed only via the
MaverickCrunch instruction set. All bits, including status bits, are both
readable and writable. This register should generally be written only using a
read-modify-write sequence.
RSVD:Reserved. Unknown During Read.
INST: Exception Instruction. Whenever an unmasked exception
occurs, these 32 bits are loaded with the instruction that
caused the exception. Hence, this contains the instruction
that caused the most recent unmasked exception.
DAID: MaverickCrunch Architecture ID. This read-only value is
incremented for each revision of the overall
MaverickCrunch coprocessor architecture. These bits are
“000” for this revision.
HVID: Hardware Version ID. This read-only value is incremented
each time the hardware implementation of the architecture
named by DAID[2:0] is changed, typically done in
response to bugs. These bits are “000” for this version.
ISAT:Integer Saturate Enable. This bit controls whether non-
accumulator integer operations, both signed and
unsigned, will saturate on overflow or underflow.
0 = Saturation enabled.
1 = Saturation disabled.
UI:Unsigned Integer Enable. This bit controls whether non-
accumulator integer operations treat their operands as
signed or unsigned. It also determines the saturation value
if the ISAT bit is clear.
0 = Signed integers.
1 = Unsigned integers.
INT:MaverickCrunch Interrupt. This bit indicates whether an
interrupt has occurred. This bit is identical to the external
interrupt signal.
0 = No interrupt signaled.
1 = Interrupt signaled.
AEXC:Asynchronous Exception Enable. This bit determines
whether exceptions generated by the coprocessor are
signaled synchronously or asynchronously to the
ARM920T. Synchronous exceptions force all data path
instructions to be serialized and to stall the ARM920T. If
exceptions are asynchronous, they are signalled by
assertion of the DSPINT output of the coprocessor, which
may interrupt the ARM920T via the interrupt controller.
Enabling asynchronous exceptions does provide a
performance improvement, but makes it difficult for an
interrupt handler to determine the coprocessor instruction
that caused the exception because the address of the
instruction is not preserved. Exceptions may be
individually enabled by other bits in this register (IXE, UFE,
OFE, and IOE). This bit has no effect if no exceptions are
enabled.
0 = Exceptions are synchronous.
1 = Exceptions are asynchronous
SAT[1:0]:Accumulator saturation mode select. These bits are set to
select the saturation mode or to disable the saturation for
accumulator operations.
0X = Saturation disabled for accumulator operations
10 = Accumulator saturation enabled, bit formats 1.63 and
1.31
11 = Accumulator saturation enabled, bit format 2.62
FCC[1:0]: Compare result flags. Updated by all compare operations.
00 = Operand A equals operand B.
01 = Operand A less than operand B.
10 = Operand A greater than operand B.
11 = Operands are unordered (at least one is NaN).
V:Overflow Flag. Indicates the overflow status of the
previous integer operation.
0 = No overflow.
1 = Overflow.
FWDEN:Forwarding Enable. This bit determines whether data path
writeback results are forwarded to the data path operand
fetch stage and to the STC/MRC execute stage. When
pipeline interlocks occur due to dependencies of data
path, STC, and MRC instruction source operands on data
path results, setting this bit will improve instruction
throughput.
0 = Forwarding not enabled.
1 = Forwarding enabled.
Invalid:0 = No invalid operations detected
1 = An invalid operation was performed.
Denorm: 0 = No denormalized numbers have been supplied as
instruction operands
1 = a denormalized number has been supplied as an
instruction operand.
trapping for IEEE 754 invalid operator exceptions.
0 = Disable software trapping for invalid operator
exceptions.
1 = Enable software trapping for invalid operator
exceptions.
IX:Inexact. Set when an IEEE 754 inexact exception occurs,
regardless of whether or not software trapping for inexact
exceptions is enabled. Writing a “0” to this position clears
the status bit.
0 = No inexact exception detected.
1 = Inexact exception detected.
UF:Underflow. Set when an IEEE 754 underflow exception
occurs, regardless of whether or not software trapping for
underflow exceptions is enabled. Writing a “0” to this
position clears the status bit.
0 = No underflow exception detected.
1 = Underflow exception detected.
OF:Overflow. Set when an IEEE 754 overflow exception
occurs, regardless of whether or not software trapping for
overflow exceptions is enabled. Writing a “0” to this
position clears the status bit.
0 = No overflow exception detected.
1 = Overflow exception detected.
IO:Invalid Operator. Set when an IEEE 754 invalid operator
exception occurs, regardless of whether or not software
trapping for invalid operator exceptions is enabled. Writing
a “0” to this position clears the status bit.
0 = No invalid operator exception detected.
1 = Invalid operator exception detected.
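Because the status bits are cleared by writing "0" and the register should be updated with a read-modify-write sequence, the new register value can be computed as shown below. This is a sketch under stated assumptions: the function name is hypothetical, the DSPSC is modeled as a 64-bit value, and the caller must supply the mask of status bits (IX, UF, OF, IO) from the register layout, which is not reproduced here. On the target, the value would be read with cfmv32sc and written back with cfmvsc32.

```c
#include <assert.h>
#include <stdint.h>

/*
 * current:     value read from the DSPSC.
 * status_mask: bits to clear (caller-supplied; positions are not
 *              defined in this sketch).
 * Returns the value to write back: selected status bits become 0 and
 * every other bit keeps the value that was read.
 */
uint64_t dspsc_clear_status(uint64_t current, uint64_t status_mask)
{
    return current & ~status_mask;
}
```

Preserving the unselected bits matters here because the same register also holds control fields such as ISAT, UI, and SAT[1:0].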
3.4 ARM Coprocessor Instruction Format
The ARM V4T architecture defines five ARM coprocessor instructions:
• CDP - Coprocessor Data Processing
•LDC - Load Coprocessor
• STC - Store Coprocessor
•MCR - Move to Coprocessor Register from ARM Register
• MRC - Move to ARM Register from Coprocessor Register
The coprocessor instruction assembler notation is found in the ARM
programming manuals or the Quick Reference Card. (For additional
(U=1) or subtracted from a base register (U=0). This bit is ignored by the
MaverickCrunch coprocessor.
•N: Specifies the width of a data type involved in a move operation. The
MaverickCrunch coprocessor uses this bit to distinguish between single
precision floating point/32-bit integer numbers (N=0) and double precision
floating point/64-bit integer numbers (N=1).
•W: Specifies whether or not a calculated address is written back to a base
register (W=1) or not (W=0). This bit is ignored by the MaverickCrunch
coprocessor.
•offset: An 8-bit word offset used in address calculations. These bits are
ignored by the MaverickCrunch coprocessor.
Table 3-6, below, and Table 3-7, Table 3-8, and Table 3-9 on page 85, define
the bit values for opcode2, opcode1, and cp_num for all of the
MaverickCrunch instructions.
• CRd, CRn, and CRm each refer to any of the 16 general purpose
MaverickCrunch registers unless otherwise specified
• CRa refers to any of the MaverickCrunch accumulators
• Rd and Rn refer to any of the ARM920T general purpose registers
• <imm> refers to a seven-bit immediate value
The remainder of this section describes in detail each of the individual
MaverickCrunch instructions. The fields in the opcode for each
MaverickCrunch instruction are shown. When specific bit values are required
for the instruction, they are shown as either '1' or '0'. Any field whose value
may vary, such as a register index, is named as in the ARM programming
manuals, and its function described below.
Fields that are ignored by the coprocessor are shaded. Dark shading implies
that a field is processed by the ARM itself and can have any value, while light
shading indicates that the field, though ignored by both the ARM and the
coprocessor, should have the value shown.
Table 3-10: MaverickCrunch Instruction Set
MaverickCrunch       ARM Coprocessor
Instruction Type     Instruction Type   Instruction           Description
Loads                LDC                cfldrs CRd, [Rn]      Load CRd with single stored at address in Rn
                                        cfldrd CRd, [Rn]      Load CRd with double stored at address in Rn
                                        cfldr32 CRd, [Rn]     Load CRd with 32-bit integer stored at address in Rn, sign extend through bit 63
                                        cfldr64 CRd, [Rn]     Load CRd with 64-bit integer stored at address in Rn
Stores               STC                cfstrs CRd, [Rn]      Store single in CRd at address in Rn
                                        cfstrd CRd, [Rn]      Store double in CRd at address in Rn
                                        cfstr32 CRd, [Rn]     Store 32-bit integer in CRd at address in Rn
                                        cfstr64 CRd, [Rn]     Store 64-bit integer in CRd at address in Rn
Moves to             MCR                cfmvsr CRn, Rd        Move single from Rd to CRn[63:32]
coprocessor                             cfmvdlr CRn, Rd       Move lower half of double from Rd to CRn[31:0]
                                        cfmvdhr CRn, Rd       Move upper half of double from Rd to CRn[63:32]
                                        cfmv64lr CRn, Rd      Move lower half of 64-bit integer from Rd to CRn[31:0], sign extend bit 31 through bits [63:32]
                                        cfmv64hr CRn, Rd      Move upper half of 64-bit integer from Rd to CRn[63:32]
Table 3-10: MaverickCrunch Instruction Set (Continued)
MaverickCrunch       ARM Coprocessor
Instruction Type     Instruction Type   Instruction           Description
Moves from           MRC                cfmvrs Rd, CRn        Move single from CRn[63:32] to Rd
coprocessor                             cfmvrdl Rd, CRn       Move lower half of double from CRn[31:0] to Rd
                                        cfmvrdh Rd, CRn       Move upper half of double from CRn[63:32] to Rd
                                        cfmvr64l Rd, CRn      Move lower half of 64-bit integer from CRn[31:0] to Rd
                                        cfmvr64h Rd, CRn      Move upper half of 64-bit integer from CRn[63:32] to Rd
Moves to             CDP                cfmval32 CRd, CRn     Move 32-bit integer from CRn[31:0] to accumulator CRd[31:0]
accumulator                             cfmvam32 CRd, CRn     Move 32-bit integer from CRn[31:0] to accumulator CRd[63:32]
                                        cfmvah32 CRd, CRn     Move lower 8 bits of 32-bit integer from CRn[7:0] to accumulator CRd[71:64]
                                        cfmva32 CRd, CRn      Move 32-bit integer from CRn[31:0] to accumulator CRd[31:0] and sign extend through bit 71
                                        cfmva64 CRd, CRn      Move 64-bit integer from CRn to accumulator CRd[63:0] and sign extend through bit 71
Moves from           CDP                cfmv32al CRd, CRn     Move accumulator CRn[31:0] to 32-bit integer CRd[31:0]
accumulator                             cfmv32am CRd, CRn     Move accumulator CRn[63:32] to 32-bit integer CRd[31:0]
                                        cfmv32ah CRd, CRn     Move accumulator CRn[71:64] to lower 8 bits of 32-bit integer CRd[31:0]
                                        cfmv32a CRd, CRn      Saturate to 32-bit integer and move accumulator CRn[31:0] to 32-bit integer CRd[31:0]
                                        cfmv64a CRd, CRn      Saturate to 64-bit integer and move accumulator CRn[63:0] to 64-bit integer CRd
Move to DSPSC        CDP                cfmvsc32 CRd, CRn     Move CRd to DSPSC; CRn is ignored
Move from DSPSC      CDP                cfmv32sc CRd, CRn     Move DSPSC to CRd; CRn is ignored