TO THE EXTENT PERMITTED BY APPLICABLE LAW, CYPRESS MAKES NO WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, WITH REGARD TO THIS DOCUMENT OR ANY SOFTWARE OR ACCOMPANYING HARDWARE, INCLUDING,
BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. No computing device can be absolutely secure. Therefore, despite security measures implemented in Cypress hardware or software products, Cypress shall have no liability arising out of any security breach, such as unauthorized access to
or use of a Cypress product. CYPRESS DOES NOT REPRESENT, WARRANT, OR GUARANTEE THAT CYPRESS PRODUCTS, OR SYSTEMS CREATED USING CYPRESS PRODUCTS, WILL BE FREE FROM CORRUPTION, ATTACK,
VIRUSES, INTERFERENCE, HACKING, DATA LOSS OR THEFT, OR OTHER SECURITY INTRUSION (collectively, "Security Breach"). Cypress disclaims any liability relating to any Security Breach, and you shall and hereby do release Cypress
from any claim, damage, or other liability arising from any Security Breach. In addition, the products described in these materials may contain design defects or errors known as errata which may cause the product to deviate from published specifications. To the extent permitted by applicable law, Cypress reserves the right to make changes to this document without further
notice. Cypress does not assume any liability arising out of the application or use of any product or circuit described in this
document. Any information provided in this document, including any sample design information or programming code, is provided only for reference purposes. It is the responsibility of the user of this document to properly design, program, and test the
functionality and safety of any application made of this information and any resulting product. “High-Risk Device” means any
device or system whose failure could cause personal injury, death, or property damage. Examples of High-Risk Devices are
weapons, nuclear installations, surgical implants, and other medical devices. “Critical Component” means any component of
a High-Risk Device whose failure to perform can be reasonably expected to cause, directly or indirectly, the failure of the
High-Risk Device, or to affect its safety or effectiveness. Cypress is not liable, in whole or in part, and you shall and hereby do
release Cypress from any claim, damage, or other liability arising from any use of a Cypress product as a Critical Component
in a High-Risk Device. You shall indemnify and hold Cypress, its directors, officers, employees, agents, affiliates, distributors,
and assigns harmless from and against all claims, costs, damages, and expenses, arising out of any claim, including claims
for product liability, personal injury or death, or property damage arising from any use of a Cypress product as a Critical Component in a High-Risk Device. Cypress products are not intended or authorized for use as a Critical Component in any High-Risk Device except to the limited extent that (i) Cypress's published data sheet for the product explicitly states Cypress has
qualified the product for use in a specific High-Risk Device, or (ii) Cypress has given you advance written authorization to use
the product as a Critical Component in the specific High-Risk Device and you have signed a separate indemnification agreement.
Cypress, the Cypress logo, Spansion, the Spansion logo, and combinations thereof, WICED, PSoC, CapSense, EZ-USB, FRAM, and Traveo are trademarks or registered trademarks of Cypress in the United States and other countries. For a more
complete list of Cypress trademarks, visit cypress.com. Other names and brands may be claimed as property of their respective owners.
EZ-USB® FX3™ is Cypress's high-bandwidth USB 3.0 peripheral controller that provides integrated and flexible features.
FX3 integrates the USB 3.0 and USB 2.0 physical layers (PHYs) along with a 32-bit ARM926EJ-S microprocessor for
powerful data processing and for building custom USB SuperSpeed applications.
To provide high-bandwidth access to USB 3.0 data, FX3 contains a hardware unit called General Programmable Interface,
Generation 2 (GPIF II). GPIF II is an enhanced version of the GPIF in FX2LP™, Cypress's USB 2.0 product. GPIF II provides
easy and glueless connectivity to popular interfaces such as asynchronous SRAM and asynchronous and synchronous
address and data multiplexed interfaces. FX3 implements a DMA-centric architecture that enables direct 375-MBps data
transfer from GPIF II to the USB interface without CPU intervention.
An integrated USB 2.0 USB On-The-Go (OTG) controller enables applications in which FX3 may serve dual high-speed roles;
for example, EZ-USB FX3 may function as a High-Speed On-The-Go (HS-OTG) host to USB Mass Storage Class (MSC)
devices and HID-class devices. FX3 contains 512 KB or 256 KB of on-chip SRAM for code and data. EZ-USB FX3 also
provides interfaces to connect to serial peripherals such as UART, SPI, I2C, and I2S. FX3 comes with application
development tools. The software development kit (SDK) provides application examples for accelerating the time to market.
FX3 complies with the USB 3.0 v1.0 specification and is also backward compatible with USB 2.0. It also complies with the
USB 2.0 OTG Specification v2.0.
In addition to these features, FX3S has an integrated storage controller and can support up to two independent mass storage
devices. It can support SD 3.0 and eMMC 4.41 memory cards. It can also support SDIO on these ports.
This TRM describes the following functional blocks of the FX3/FX3S device: CPU subsystem, memory, global control, DMA,
USB, GPIF II, low-bandwidth (serial and GPIO) peripherals, and storage (storage block is specific to FX3S only). Registers
associated with these functional blocks are documented in the Registers chapter on page 228.
The following sections describe the details of USB 3.0 and briefly outline the EZ-USB FX3/FX3S architecture.
1.1 Overview of USB 3.0
This section gives an overview of USB 3.0 and describes the significant changes in each layer from the USB 2.0 specification,
including the new power management features provided in USB 3.0. Refer to AN57294 - USB 101: An Introduction to
Universal Serial Bus 2.0 for more details on USB 2.0. The USB 2.0 and 3.0 specifications can be downloaded from
www.usb.org/developers/docs.
The USB 3.0 specification, released in 2008, allows a maximum signaling rate of 5 Gbps (SuperSpeed), which is 10 times the
signaling rate of High Speed. The USB 3.0 architecture contains three layers: the physical layer, the link layer on top of the
physical layer, and the protocol layer on top of the link layer.
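As a rough check on the figures above, the following sketch computes the ratio of the SuperSpeed and High-Speed signaling rates. It also estimates a ceiling on payload throughput under the assumption that the SuperSpeed link uses 8b/10b line encoding (10 line bits per data byte). That encoding is not stated in this section, and packet and protocol overhead are ignored, so the result is only an upper bound.

```python
# Signaling rates quoted in this section, in bits per second.
SUPERSPEED_BPS = 5_000_000_000
HIGH_SPEED_BPS = 480_000_000

# Assumption (not stated in this section): 8b/10b line encoding,
# so every data byte occupies 10 bits on the wire.
LINE_BITS_PER_BYTE = 10


def signaling_ratio(fast_bps: int, slow_bps: int) -> float:
    """Return how many times faster fast_bps is than slow_bps."""
    return fast_bps / slow_bps


def payload_ceiling_mbytes_per_s(line_rate_bps: int) -> float:
    """Upper bound on payload rate in MB/s, ignoring packet overhead."""
    return line_rate_bps / LINE_BITS_PER_BYTE / 1_000_000


ratio = signaling_ratio(SUPERSPEED_BPS, HIGH_SPEED_BPS)
ceiling = payload_ceiling_mbytes_per_s(SUPERSPEED_BPS)
print(f"SuperSpeed / High-Speed signaling ratio: {ratio:.1f}")
print(f"Payload ceiling under the 8b/10b assumption: {ceiling:.0f} MB/s")
```

The ratio comes out slightly above 10, which matches the "10 times" figure quoted for SuperSpeed.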
The additional bandwidth provided by USB SuperSpeed transactions can benefit applications such as real-time audio and
video streaming that require a higher bus bandwidth at regular intervals. Mass storage applications can also benefit from the
SuperSpeed bandwidth. FX3 has also been used to implement a high-performance PC-based logic analyzer.
1.1.1 Physical Layer
The physical layer refers to the PHY part of the port and the cable connecting the upstream and downstream ports. USB 3.0
cables have separate shielded differential pairs of lines for transmitting and receiving data. These lines exist along with the
USB 2.0 signals. So a USB 3.0 cable contains a total of nine wires including the four wires that are part of the USB 2.0 cable
(see Figure 1-1). The SuperSpeed bus employs a dual-simplex approach that allows simultaneous transmission and
reception of packets. In many cases, a SuperSpeed device may be both transmitting and receiving data at the same time. For
example, during burst transactions, a device may be receiving data from the host and returning acknowledgments associated
with the data it already received.
Figure 1-1. Cross-Section of a USB 3.0 Cable
For power distribution from a USB 3.0 host, 150 mA is defined as one unit load. The USB 3.0 host supplies one unit
load of current to unconfigured devices and six unit loads to configured devices. The USB 3.0 host detects a
device connection based on the receiver end termination; the transmitter is responsible for detecting that
connection. USB 3.0 uses spread-spectrum clocking on its signaling. When spread-spectrum clocking is enabled, the energy
of the signal is spread over a wider frequency band rather than concentrated at a high level in a narrow band, which helps
to reduce EMI. The USB 3.0 physical layer supports low-frequency periodic signaling (LFPS), which is used to
manage signal initiation and low-power management on an idle link so that the link consumes less power. Table 1-1 lists the
differences between the USB 3.0 and USB 2.0 physical layers.
Table 1-1. Differences Between Physical Layers of USB 3.0 and USB 2.0
Feature | USB 3.0 | USB 2.0
Signaling rate | 5 Gbps | 480 Mbps
Data transfers | Dual simplex | Half duplex
Number of pins in the USB cable | 9 (includes the four USB 2.0 wires) | 4 (VBUS, Ground, D+, D-)
Data lines in the cable | Shielded differential pair (SDP, twisted, or twinax) | Unshielded twisted pair (UTP)
Current supplied by the host | 150 mA for unconfigured devices, 900 mA for configured devices | 100 mA for unconfigured devices, 500 mA for configured devices
Device detection by the host | Receiver end termination | Pull-up resistor on D+ or D- lines
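The host current figures above can be expressed as unit loads. The sketch below is illustrative only: the function and dictionary names are made up for this example, and the USB 2.0 decomposition into a 100 mA unit load with five unit loads when configured is inferred from the 100 mA and 500 mA entries in Table 1-1.

```python
# Unit load per bus generation, in milliamps.
UNIT_LOAD_MA = {"USB 3.0": 150, "USB 2.0": 100}

# Number of unit loads the host supplies in each device state.
# The USB 2.0 row is inferred from the 100 mA / 500 mA table entries.
UNIT_LOADS = {
    "USB 3.0": {"unconfigured": 1, "configured": 6},
    "USB 2.0": {"unconfigured": 1, "configured": 5},
}


def host_current_budget_ma(bus: str, state: str) -> int:
    """Return the current (mA) a host supplies for the given bus and device state."""
    return UNIT_LOAD_MA[bus] * UNIT_LOADS[bus][state]


for bus in UNIT_LOAD_MA:
    for state in ("unconfigured", "configured"):
        print(f"{bus} {state}: {host_current_budget_ma(bus, state)} mA")
```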
1.1.2 Link Layer
The link layer is responsible for maintaining a reliable and robust communication channel between the host and the device.
The Link Training and Status State Machine (LTSSM) at the core of the USB 3.0 link layer establishes the link connectivity
and link power management states and transitions. LTSSM consists of 12 states: four operational link states (U0,
U1, U2, U3) and the eight additional states listed below. The link layer offers these four link power states for better power management:
■ U0: Fully powered; link partners are fully powered and ready to send packets
■ U1: Standby with fast recovery; link is in a low-power state and is not ready to send packets but can transition back to U0
within microseconds
■ U2: Standby with slow recovery; link power saving is greater than in U1, and the transition back to U0 takes microseconds to
milliseconds
■ U3: Suspend; greatest power savings and longest recovery back to U0 (milliseconds)
U1, U2, and U3 have increasingly longer wakeup times into U0 and thus allow transmitters to go into increasingly deeper
sleep.
The remaining eight LTSSM states are:
■ Four link initialization and training states (Rx.Detect, Polling, Recovery, Hot Reset)
■ Two link test states (Loopback and Compliance mode)
■ SS.Inactive (link error state where USB 3.0 is nonoperable)
■ SS.Disabled (SuperSpeed bus is disabled and operates as USB 2.0 only)
Link commands are used to maintain the link flow control and to initiate a change in the link power state.
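The grouping of the 12 LTSSM states described above can be captured in a small lookup table. This is a bookkeeping sketch only; the group labels and helper names are chosen for this example and are not FX3 SDK identifiers, and no state transitions are modeled.

```python
# LTSSM states grouped as described in this section.
LTSSM_STATES = {
    "operational": ["U0", "U1", "U2", "U3"],
    "initialization and training": ["Rx.Detect", "Polling", "Recovery", "Hot Reset"],
    "test": ["Loopback", "Compliance Mode"],
    "error": ["SS.Inactive"],
    "disabled": ["SS.Disabled"],
}

# Low-power link states ordered from shortest to longest wakeup time.
LOW_POWER_STATES = ("U1", "U2", "U3")


def state_group(state: str) -> str:
    """Return the group a given LTSSM state belongs to."""
    for group, states in LTSSM_STATES.items():
        if state in states:
            return group
    raise ValueError(f"unknown LTSSM state: {state}")


def is_deeper_sleep(state_a: str, state_b: str) -> bool:
    """Return True if state_a is a deeper low-power state than state_b."""
    return LOW_POWER_STATES.index(state_a) > LOW_POWER_STATES.index(state_b)


all_states = [s for states in LTSSM_STATES.values() for s in states]
print(len(all_states), "states:", ", ".join(all_states))
```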
1.1.3 Protocol Layer
The protocol layer manages the communication rules between a host and device. The SuperSpeed protocol layer includes
the following improvements to enable better performance, efficiency, and power conservation. The information in this section
is provided as background material; the FX3 logic manages protocol details so the application program can deal directly with
USB data.
1.1.3.1 Unicast Transactions
SuperSpeed transactions are routed directly from a root port to the target device with the help of a route string in the packet
header. Therefore, only links in the direct path between the root port and target device see the traffic, which lets other links in
the topology enter or remain in a low-power state.
1.1.3.2 Token/Data/Handshake Sequences
A USB 2.0 transaction consists of three packets: token, data, and handshake. A transaction is initiated with the token packet
and it is always from the host. Data packets deliver the payload data, which can be sourced by the host or device. The
handshake packet acknowledges the error-free receipt of data and is sent by the receiver of the data. But with SuperSpeed,
to save bandwidth, the token is incorporated into the data packet for OUT transactions; it is replaced by the handshake for IN
transactions. So an ACK packet acknowledges the previous data packet sent and requests the next data packet. The
following examples clarify the differences between USB 2.0 and USB 3.0 IN and OUT transactions.
1.1.3.2.1 IN Transaction Example
Figure 1-2 on page 22 illustrates the differences between USB 2.0 and SuperSpeed IN transactions. The example on the
left in Figure 1-2 on page 22 shows the sequence of packets required to perform two USB 2.0 IN transactions that require six
packets:
1. Host broadcasts an IN token packet (1) to initiate the transaction.
2. Device returns the requested data packet (2).
3. Host acknowledges receipt of data with an ACK handshake packet (3).
4. Steps 1-3 are repeated.
The example on the right indicates the packet sequence necessary to perform two back-to-back SuperSpeed IN transactions,
which require only five packets to be exchanged:
1. SuperSpeed uses an ACK header (1) to initiate an IN transaction.
2. The SuperSpeed device returns the data packet (2).
3. The second ACK header (3) both acknowledges receipt of the data and requests a second transaction.
4. The second data packet (4) is delivered by the device.
5. The final ACK header (5) acknowledges receipt of the data, but does not request additional data.
Figure 1-2. USB 2.0 IN Transaction Versus USB 3.0 IN Transaction
1.1.3.2.2 OUT Transaction Example
Figure 1-3 illustrates the differences between USB 2.0 and SuperSpeed OUT transactions. The example on the left shows
two back-to-back OUT transactions that require six packets:
1. Host broadcasts an OUT token packet (1) to initiate the transaction.
2. Host sends a data packet (2) to the device.
3. Device acknowledges receipt of data with an ACK handshake packet (3).
4. Steps 1-3 are repeated.
The right side of Figure 1-3 shows the packet sequence required to perform two back-to-back SuperSpeed OUT transactions,
requiring only four packets to be exchanged:
1. SuperSpeed USB uses a data header (1) to initiate an OUT transaction and to deliver data to the device.
2. Device acknowledges receipt of data via an ACK packet (2).
3. The second data packet (3) initiates the second transaction and delivers data to the device.
4. Device acknowledges receipt of data via an ACK packet (4), completing the sequence.
Figure 1-3. USB 2.0 OUT Transaction Versus USB 3.0 OUT Transaction
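The packet counts in the IN and OUT examples above can be reproduced with a short model. The sketch below only lists who sends which packet for a number of back-to-back transactions; the function names and packet labels are illustrative and carry no timing or error handling.

```python
def usb2_packets(direction: str, transactions: int) -> list:
    """USB 2.0: every transaction is token, data, handshake."""
    if direction not in ("IN", "OUT"):
        raise ValueError("direction must be 'IN' or 'OUT'")
    sequence = []
    for _ in range(transactions):
        if direction == "IN":
            sequence += [("host", "IN token"), ("device", "DATA"), ("host", "ACK")]
        else:
            sequence += [("host", "OUT token"), ("host", "DATA"), ("device", "ACK")]
    return sequence


def superspeed_packets(direction: str, transactions: int) -> list:
    """SuperSpeed: the token is folded into the ACK (IN) or the data packet (OUT)."""
    if direction not in ("IN", "OUT"):
        raise ValueError("direction must be 'IN' or 'OUT'")
    sequence = []
    if direction == "IN":
        # The first ACK header only requests data.
        sequence.append(("host", "ACK (request)"))
        for index in range(transactions):
            sequence.append(("device", "DATA"))
            is_last = index == transactions - 1
            # Intermediate ACKs acknowledge and request; the final one only acknowledges.
            sequence.append(("host", "ACK (final)" if is_last else "ACK (ack + request)"))
    else:
        for _ in range(transactions):
            sequence.append(("host", "DATA (header + payload)"))
            sequence.append(("device", "ACK"))
    return sequence


for direction in ("IN", "OUT"):
    print(direction, "USB 2.0:", len(usb2_packets(direction, 2)),
          "SuperSpeed:", len(superspeed_packets(direction, 2)))
```

For two back-to-back transactions this gives six packets for USB 2.0 in both directions, five for SuperSpeed IN, and four for SuperSpeed OUT, in line with the examples above.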
1.1.3.3 Data Bursting
The SuperSpeed end-to-end protocol supports transmitting the data in bursts (multiple data packets) without receiving an
acknowledgement to improve latency and performance. The protocol allows efficient bus utilization by concurrently
transmitting and receiving over the bus. A transmitter (host or device) can burst multiple packets of data back to back, and the
receiver can transmit data acknowledgements without interrupting the burst of data packets. Also, the host may
simultaneously schedule multiple OUT bursts to be active at the same time as an IN burst. Devices report their ability to
support bursting in their device descriptors. The maximum burst size is 16, and the actual number to be used represents the
number of data packets that can be sent without receiving an acknowledgement.
This bursting approach is illustrated in Figure 1-4 with an IN endpoint that supports a burst size of four. The host initiates the
burst transfer and indicates the expected sequence number of the first data packet returned (Seq=0) and the number of
packets it wishes to receive (NumP=4). The target device responds with a burst sequence of four data packets without
receiving any handshakes. A fifth data packet cannot be returned until data packet zero is acknowledged and the host has
indicated a request for another data packet (that is, a second ACK packet with NumP=4). In this burst example, the host
continues to request additional data by keeping the NumP value at 4.
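The burst example above behaves like a sliding window: the device may run ahead of the host's acknowledgements by at most the burst size. The sketch below is a simplified model of that behavior, not the actual protocol engine. It assumes the host acknowledges one packet at a time, and that sequence numbers wrap at 32, which is not stated in this section.

```python
SEQ_MODULO = 32  # assumption: sequence numbers wrap at 32


def simulate_in_burst(total_packets: int, burst_size: int) -> list:
    """Return a log of (sender, packet type, sequence number, NumP) tuples."""
    if not 1 <= burst_size <= 16:
        raise ValueError("burst size must be between 1 and 16")
    # The host opens the transfer: it expects Seq=0 and accepts burst_size packets.
    log = [("host", "ACK", 0, burst_size)]
    sent = 0   # data packets the device has transmitted
    acked = 0  # data packets the host has acknowledged
    while acked < total_packets:
        # The device may run ahead of the acknowledgements by at most burst_size.
        while sent < total_packets and sent - acked < burst_size:
            log.append(("device", "DATA", sent % SEQ_MODULO, None))
            sent += 1
        # The host acknowledges the oldest outstanding packet and states how
        # many more packets it is still willing to receive.
        acked += 1
        num_p = min(burst_size, total_packets - acked)
        log.append(("host", "ACK", acked % SEQ_MODULO, num_p))
    return log


log = simulate_in_burst(total_packets=8, burst_size=4)
for entry in log:
    print(entry)
```

In this model the device sends four data packets back to back after the initial ACK, and the fifth data packet appears only after the host has acknowledged packet zero with NumP still at 4.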
USB 2.0 uses polling and the NAK handshake packet for flow control. For example, USB keyboards must be constantly polled
by the host software to check for activity. When an IN token packet is delivered and no keyboard activity has occurred, the
keyboard returns a NAK packet. Subsequently, the host software will poll the device again and receive another NAK. This
process continues until there is renewed activity. SuperSpeed flow control uses a poll-once approach coupled with an
asynchronous ready notification. Consider the IN transaction shown in Figure 1-5. An ACK packet initiates the IN transaction.
If the device responds with NRDY (Not Ready), the host stops talking to that device until the device sends an
ERDY (Endpoint Ready) packet to signal that it is now ready to transmit the data. The host therefore does not need to keep polling, which can
significantly reduce SuperSpeed traffic and improve link power management.
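To make the difference between the two flow-control schemes concrete, the following sketch counts the packets exchanged for one IN transfer when the device is idle for a while. It is a simplified model under stated assumptions: each unsuccessful USB 2.0 poll costs an IN token plus a NAK, and the SuperSpeed case follows the NRDY/ERDY sequence described above. The function names are made up for this example.

```python
def usb2_in_packets(idle_polls: int) -> int:
    """Packets on the bus when the host polls idle_polls times before data is ready.

    Each idle poll is an IN token answered by a NAK; the successful poll is
    an IN token, a data packet, and an ACK handshake.
    """
    return 2 * idle_polls + 3


def superspeed_in_packets(device_initially_ready: bool) -> int:
    """Packets on the bus for one SuperSpeed IN transfer.

    Ready:     ACK (request), DATA, ACK
    Not ready: ACK (request), NRDY, ERDY, ACK (request), DATA, ACK
    """
    return 3 if device_initially_ready else 6


for polls in (0, 10, 1000):
    print(f"{polls} idle polls: USB 2.0 = {usb2_in_packets(polls)} packets, "
          f"SuperSpeed = {superspeed_in_packets(polls == 0)} packets")
```

The SuperSpeed count stays constant no matter how long the device is idle, while the USB 2.0 count grows with every poll.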
USB 3.0 enhanced the bulk transfer capabilities by adding a concept called "streams." This allows a device to accept multiple
commands on a pipe from the host and to complete them out of order using the stream IDs.
Table 1-2 lists the differences between the USB 3.0 and USB 2.0 protocol layers:
Table 1-2. Differences Between USB 3.0 and USB 2.0 Protocol Layers
Feature | USB 3.0 | USB 2.0
Bus transaction protocol | Host directly routes the packet to the targeted device with the help of a route string (exception: isochronous timestamp packet) | Host broadcasts the packets to all devices; no route string
Bus transaction protocol | Asynchronous traffic flow | Polled traffic flow
Bus transaction protocol | Simultaneous IN/OUTs | IN or OUT
Data transfer types | USB 2.0 types with SuperSpeed constraints; bulk has streams capability | Four types: control, interrupt, bulk, isochronous
Maximum packet size, control (bytes) | 512 | 64
Maximum packet size, interrupt (bytes) | 1024 | 1024
Maximum packet size, bulk (bytes) | 1024; supports streaming functionality | 512; does not support streaming
Maximum packet size, isochronous (bytes) | 1024 | 1024
1.2 SuperSpeed Power Management
SuperSpeed USB provides a much improved mechanism for entering and exiting low-power states. USB 2.0 implements a
feature known as "Suspend" that forces devices to limit current consumption to 2.5 mA. Entry into the low-power state
requires a minimum of 3 ms, and exit requires more than 20 ms. SuperSpeed power management provides finer granularity
when entering low-power states and reduces entry and exit times. The device can also initiate the low-power link states when
it is idle.
Introduction to EZ-USB FX3
Power management features are implemented at all layers:
The physical layer supports the remote wakeup signaling.
The link layer supports low-power link state entry and exit with the help of LTSSM and link commands. It offers four link power
states (U0, U1, U2, and U3) for better power management. The following architectural features aid in SuperSpeed link power
management:
■ The much higher transmission rates mean that SuperSpeed transactions complete very quickly, leaving links in the idle
state for a longer period of time.
■ The Unicast approach involves only the links in the direct path between the originating root port and target device, leaving
other links idle.
■ The "poll once and notify" mechanism used in SuperSpeed end-to-end flow control reduces overall link traffic.
SuperSpeed power management includes the ability to place a specific function (an interface or a set of interfaces) into a
suspended state. This means that a multifunction device can have some functions suspended while others remain fully
operational. Functions are placed into suspend under software control. The asynchronous Function Wake notification tells
software that a suspended function or device is requesting a remote wakeup.
1.3 FX3/FX3S Features
EZ-USB FX3/FX3S supports the following features.
■ Universal Serial Bus (USB) integration
❐ USB 3.0 and USB 2.0 peripherals compliant with USB 3.0 specification revision 1.0
❐ 5-Gbps USB 3.0 PHY
❐ HS-OTG host and peripheral compliant with OTG Supplement version 2.0
❐ 32 physical endpoints
■ General Programmable Interface (GPIF II)*
❐ Programmable GPIF II, enabling connectivity to a wide range of external devices
❐ Interface frequency of up to 100 MHz
❐ 8-, 16-, and 32-bit data bus
❐ As many as 16 configurable control signals
■ 32-bit CPU
❐ ARM926EJ core with 200-MHz operation
❐ 512 KB or 256 KB embedded SRAM
■ Additional connectivity to the following peripherals:
❐ I2C master controller up to 1 MHz
❐ I2S master (transmitter only) at sampling frequencies of 8 kHz, 16 kHz, 32 kHz, 44.1 kHz, 48 kHz, 96 kHz, and
192 kHz
❐ UART support of up to 4 Mbps
❐ SPI master at 33 MHz
■ Selectable clock input frequencies
❐ 19.2, 26, 38.4, and 52 MHz
❐ 19.2-MHz crystal input support
■ Ultra low-power in core power-down mode
❐ Less than 60 μA with VBATT on and 20 μA with VBATT off
Note: For more details on pin mapping and their descriptions, refer to FX3 and FX3S datasheets.
1.4 Functional Overview
1.4.1 CPU
FX3 has an on-chip 32-bit, 200-MHz ARM926EJ-S core CPU. The core has direct access to 16 KB of instruction tightly
coupled memory (TCM) and 8 KB of data TCM. The ARM926EJ-S processor also has associated instruction cache (I-cache)
and data cache (D-cache) memories. Both the instruction and data caches are 8 KB.
FX3 integrates 512 KB or 256 KB (depending on the part number) of embedded SRAM for storing code and data. The
ARM926EJ-S core provides a JTAG interface for firmware debugging.
FX3 interrupts are managed through the standard ARM PrimeCell Vectored Interrupt Controller (PL192) block. This interrupt
controller provides vectored interrupt support with configurable priorities for all interrupt sources.
Examples of the FX3 firmware are available with the Cypress EZ-USB FX3 Development Kit.
For more information about the CPU subsystem, refer to the FX3 CPU Subsystem chapter on page 34.
For more information about the memory map, refer to the Memory and System Interconnect chapter on page 44.
FX3 enables efficient and flexible DMA transfers between the various peripherals (such as USB, GPIF II, I2S, SPI, and
UART), requiring firmware only to configure data accesses between peripherals. The data transfers are managed by
distributed DMA controllers within each peripheral. For more information about the FX3 DMA interconnect, refer to the FX3
DMA Subsystem chapter on page 58.
1.4.3 USB Interface
The FX3 USB interface supports the following:
■ USB SuperSpeed and High-Speed peripheral functionality is compliant with USB 3.0 specification revision 1.0 and is
backward compatible with the USB 2.0 specification.
■ As a USB peripheral, FX3 supports SuperSpeed, High-Speed, and Full-Speed transfers. As a host, FX3 supports High-
Speed, Full-Speed, and Low-Speed transfers.
■ Complies with OTG Supplement revision 2.0, supporting dual-role operation. As an OTG host, FX3 supports
USB classes such as Mass Storage (MSC) and Human Interface Device (HID).
■ Carkit Pass-through UART functionality on USB D+/D- lines based on the CEA-936A specification
■ 16 IN and 16 OUT endpoints
■ CONTROL, BULK, INTERRUPT, and ISOCHRONOUS endpoints
■ USB 3.0 BULK streams feature
For more information about the USB block, refer to the Universal Serial Bus (USB) chapter on page 78.
1.4.4 GPIF II
The GPIF II is a programmable state machine that enables a flexible interface that may function either as a master or slave to
industry-standard or proprietary interfaces. The high-performance GPIF II interface provides functionality similar to, but more
advanced than, the FX2LP™ GPIF and Slave FIFO interfaces. Both parallel and serial interfaces may be implemented with
GPIF II.
The GPIF II implements an interface by creating a GPIF II state machine. GPIF II state transitions are based on input signals,
and the control output signals are driven as a result of the GPIF II state transitions. Some popular interfaces that can be
implemented with GPIF II are the Slave FIFO interface, SRAM, Address/Data bus interfaces, and Address Multiplexed
(ADMux) interfaces.
For more information about the synchronous Slave FIFO interface, refer to the application note AN65974 - Designing with the
EZ-USB FX3 Slave FIFO Interface.
The key features of GPIF II are:
■ Functions as a master or slave
■ Provides 256 firmware programmable states
■ Supports 8-bit, 16-bit, 24-bit, and 32-bit parallel data buses
■ Enables interface frequencies up to 100 MHz
■ Supports 14 configurable control pins when a 32-bit data bus is used. All control pins can be input, output, or
bidirectional.
■ Supports 16 configurable control pins when a 16- or 8-bit data bus is used. All control pins can be input, output, or
bidirectional.
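The bus-width and frequency limits listed above bound the raw GPIF II transfer rate. The sketch below computes that peak under the assumption that one bus word is transferred per interface clock cycle, and looks up the number of control pins for a given width. The names are illustrative and are not part of the FX3 SDK. The control-pin count for a 24-bit bus is not given in the list above, so the sketch refuses to guess it.

```python
GPIF_MAX_CLOCK_HZ = 100_000_000
SUPPORTED_BUS_WIDTHS = (8, 16, 24, 32)


def gpif_peak_mbytes_per_s(bus_width_bits: int, clock_hz: int = GPIF_MAX_CLOCK_HZ) -> float:
    """Peak raw rate in MB/s, assuming one bus word per clock cycle."""
    if bus_width_bits not in SUPPORTED_BUS_WIDTHS:
        raise ValueError("bus width must be 8, 16, 24, or 32 bits")
    if not 0 < clock_hz <= GPIF_MAX_CLOCK_HZ:
        raise ValueError("interface clock must be above 0 and at most 100 MHz")
    return (bus_width_bits // 8) * clock_hz / 1_000_000


def gpif_control_pins(bus_width_bits: int) -> int:
    """Number of configurable control pins for the given data bus width."""
    if bus_width_bits == 32:
        return 14
    if bus_width_bits in (8, 16):
        return 16
    raise ValueError("control-pin count for this bus width is not given in the text")


for width in (8, 16, 32):
    print(f"{width}-bit bus: {gpif_peak_mbytes_per_s(width):.0f} MB/s peak, "
          f"{gpif_control_pins(width)} control pins")
```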
Cypress's GPIF II Designer tool enables GPIF II designs to be developed quickly and includes examples of common
interfaces. For more information about the GPIF II block, refer to the General Programmable Interface II (GPIF II) chapter.
FX3 UART supports full-duplex communication and consists of the TX, RX, CTS, and RTS signals. The UART is capable of
generating a range of baud rates, from 300 bps to 4608 Kbps, selectable by the firmware. If flow control is enabled, then the
FX3 UART transmits data when the CTS input is asserted. In addition, the FX3 UART asserts the RTS output signal when it is
ready to receive data.
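As a quick illustration of the baud-rate range above, the sketch below validates a requested rate and estimates how long a transfer takes. The 10-bit frame (one start bit, eight data bits, one stop bit) is an assumption made for this example; the frame format is not specified in this section.

```python
UART_MIN_BAUD = 300
UART_MAX_BAUD = 4_608_000  # 4608 Kbps

# Assumption: 8N1 framing, that is 1 start bit + 8 data bits + 1 stop bit.
BITS_PER_FRAME = 10


def is_supported_baud(baud: int) -> bool:
    """Return True if the rate lies within the range quoted for the FX3 UART."""
    return UART_MIN_BAUD <= baud <= UART_MAX_BAUD


def transfer_time_s(num_bytes: int, baud: int) -> float:
    """Estimated time in seconds to send num_bytes at the given baud rate."""
    if not is_supported_baud(baud):
        raise ValueError(f"baud rate {baud} is outside the supported range")
    return num_bytes * BITS_PER_FRAME / baud


print(f"1 KB at 115200 bps: {transfer_time_s(1024, 115_200) * 1000:.1f} ms")
```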
1.4.6 I2C Interface
The FX3 I2C interface is compatible with the I2C Bus Specification revision 3. This I2C interface is capable of operating only
as an I2C master; therefore, it may be used to communicate with other I2C slave devices. For example, FX3 may boot from
an EEPROM connected to the I2C interface, as a selectable boot option.
The FX3 I2C master controller also supports multimaster functionality. The FX3 I2C controller is powered by a dedicated
power pin, VIO5, which also powers the JTAG interface. This gives the I2C interface the flexibility to operate at a different
voltage than other serial interfaces.
The I2C controller supports bus frequencies of 100 kHz, 400 kHz, and 1 MHz. When VIO5 is 1.2 V, the maximum operating
frequency is 100 kHz. When VIO5 is 1.8 V, 2.5 V, or 3.3 V, the operating frequencies supported are 400 kHz and 1 MHz. The
I2C controller supports the clock-stretching feature to enable slower devices to exercise flow control. The I2C interface's SCL
and SDA signals require external pull-up resistors, which must be connected to VIO5.
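The dependency between the VIO5 supply and the I2C bus frequency described above can be written as a lookup. The sketch below follows the text literally: it lists only 100 kHz at 1.2 V, and only 400 kHz and 1 MHz at the higher supplies, because the text does not say whether 100 kHz is also available there. Voltages are given in millivolts to avoid floating-point comparisons; the function names are illustrative.

```python
# Bus frequencies (Hz) listed for each VIO5 supply (mV), taken literally from the text.
I2C_FREQUENCIES_BY_VIO5_MV = {
    1200: (100_000,),
    1800: (400_000, 1_000_000),
    2500: (400_000, 1_000_000),
    3300: (400_000, 1_000_000),
}


def listed_i2c_frequencies_hz(vio5_mv: int) -> tuple:
    """Return the I2C bus frequencies listed for the given VIO5 supply."""
    try:
        return I2C_FREQUENCIES_BY_VIO5_MV[vio5_mv]
    except KeyError:
        raise ValueError(f"VIO5 = {vio5_mv} mV is not one of the listed supply levels")


def is_listed_combination(vio5_mv: int, frequency_hz: int) -> bool:
    """Return True if the text lists this frequency for this supply voltage."""
    return frequency_hz in listed_i2c_frequencies_hz(vio5_mv)


print(is_listed_combination(1200, 400_000))  # 400 kHz is not listed for 1.2 V
```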
1.4.7 I2S Interface
FX3 has an I2S port to support external audio codec devices. FX3 functions only as an I2S master transmitter.
The I2S interface consists of four signals: clock (I2S_CLK), serial data (I2S_SD), word select (I2S_WS), and master system
clock (I2S_MCLK). FX3 can generate the system clock as an output on I2S_MCLK or accept an external system clock input
on I2S_MCLK.
The I2S master (transmitter only) supports sampling frequencies of 8 kHz, 16 kHz, 32 kHz, 44.1 kHz, 48 kHz, 96 kHz, and
192 kHz.
1.4.8 SPI Interface
FX3 provides an SPI master interface. The maximum operation frequency is 33 MHz. The SPI controller supports four modes
of SPI communication. This controller is a single master controller with a single automated Slave Select, Negative-true (SSN)
control. It supports transaction sizes ranging from 4 bits to 32 bits.
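The SPI limits above can be checked with a small validation helper. The sketch assumes that the "four modes" are the conventional SPI modes 0 to 3, each defined by a clock polarity (CPOL) and clock phase (CPHA) pair; this mapping is not spelled out in this section. The function name is hypothetical and is not an FX3 SDK call.

```python
SPI_MAX_CLOCK_HZ = 33_000_000
SPI_MIN_WORD_BITS = 4
SPI_MAX_WORD_BITS = 32

# Assumption: conventional SPI mode numbering, as (CPOL, CPHA) pairs.
SPI_MODES = {0: (0, 0), 1: (0, 1), 2: (1, 0), 3: (1, 1)}


def check_spi_config(mode: int, word_bits: int, clock_hz: int) -> tuple:
    """Return (CPOL, CPHA) for a configuration within the stated limits."""
    if mode not in SPI_MODES:
        raise ValueError("SPI mode must be 0, 1, 2, or 3")
    if not SPI_MIN_WORD_BITS <= word_bits <= SPI_MAX_WORD_BITS:
        raise ValueError("transaction size must be between 4 and 32 bits")
    if not 0 < clock_hz <= SPI_MAX_CLOCK_HZ:
        raise ValueError("SPI clock must be above 0 and at most 33 MHz")
    return SPI_MODES[mode]


cpol, cpha = check_spi_config(mode=3, word_bits=8, clock_hz=10_000_000)
print(f"CPOL={cpol}, CPHA={cpha}")
```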
For more information about the UART, I2C, I2S, and SPI interfaces, refer to the Low Performance Peripherals (LPP) chapter
on page 167.
1.4.9 JTAG Interface
The FX3 JTAG interface is a standard five-pin interface to connect to a JTAG debugger to debug the firmware through the
CPU core's on-chip-debug circuitry. Industry-standard debugging tools for the ARM926EJ-S core can be used for FX3
application development.
1.4.10 Storage Interface
FX3S has two independent storage ports (S0-Port and S1-Port). Both storage ports support the following specifications:
■ MMC system specification, MMCA Technical Committee, version 4.41
■ SD specification, version 3.0
■ SDIO host controller compliant with SDIO Specification version 3.00
Both storage ports support the following features:
FX3S supports the stop clock feature, which can save power if the internal buffer is full when receiving data from the SD/
MMC/SDIO.
1.4.10.2 SD_CLK Output Clock Stop
During the data transfer, the SD_CLK clock can be enabled (on) or disabled (stopped) at any time by the internal flow control
mechanism.
SD_CLK output frequency is dynamically configurable using a clock divider from a system clock. The clock choice for the
divisor is user-configurable through a register. For example, the following frequencies may be configured:
■ 400 kHz - For the SD/MMC card initialization
■ 20 MHz - For a card with 0- to 20-MHz frequency
■ 24 MHz - For a card with 0- to 26-MHz frequency
■ 48 MHz - For a card with 0- to 52-MHz frequency (48-MHz frequency on SD_CLK is supported when the clock input to
FX3S is 19.2 MHz or 38.4 MHz)
■ 52 MHz - For a card with 0- to 52-MHz frequency (52-MHz frequency on SD_CLK is supported when the clock input to
FX3S is 26 MHz or 52 MHz)
■ 100 MHz - For a card with 0- to 100-MHz frequency
If the DDR mode is selected, data is clocked on both the rising and falling edges of the SD clock. DDR clocks run up to 52 MHz.
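The example SD_CLK frequencies above depend on both the card's frequency range and the FX3S input clock. The sketch below encodes that list as a selection function. It covers only the example values quoted above, uses kilohertz integers to avoid floating-point comparisons, and its names are illustrative rather than SDK identifiers. The actual divider registers are not modeled.

```python
SD_CLK_INIT_KHZ = 400  # used during SD/MMC card initialization


def select_sd_clk_khz(card_max_khz: int, input_clock_khz: int) -> int:
    """Pick the example SD_CLK frequency for a card class and FX3S input clock."""
    if card_max_khz == 20_000:
        return 20_000
    if card_max_khz == 26_000:
        return 24_000
    if card_max_khz == 52_000:
        # The achievable frequency depends on the FX3S clock input.
        if input_clock_khz in (19_200, 38_400):
            return 48_000
        if input_clock_khz in (26_000, 52_000):
            return 52_000
        raise ValueError("input clock must be 19.2, 26, 38.4, or 52 MHz")
    if card_max_khz == 100_000:
        return 100_000
    raise ValueError("card class is not one of the listed examples")


print(select_sd_clk_khz(card_max_khz=52_000, input_clock_khz=19_200))
```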
1.4.10.3 Card Insertion and Removal Detection
FX3S supports the following two card insertion and removal detection mechanisms:
■ Use of SD_D[3] data
■ Use of the S0S1_INS pin
1.4.10.4 Write Protection (WP)
The S0_WP/S1_WP (SD Write Protection) on S-Port is used to connect to the WP microswitch of the SD/MMC card
connector. This pin internally connects to a CPU-accessible GPIO for firmware to detect the SD card write protection.
1.4.10.5 SDIO Interrupt
The SDIO interrupt functionality is supported as specified in the SDIO specification version 2.00 (January 30, 2007).
1.4.10.6 SDIO Read-Wait Feature
FX3S supports the optional read-wait and suspend-resume features as defined in the SDIO specification version 2.00
(January 30, 2007).
1.4.10.7 Boot Options
FX3 integrates 32 KB of ROM, which contains a bootloader, allowing FX3 to load boot images from various sources. The boot
mode is selected by the configuration of the PMODE pins as shown in Table 1-4 on page 27.
For details, refer to the application note AN76405 - EZ-USB FX3 Boot Options.
In addition to these FX3 boot options, FX3S supports the following:
■ Boot from eMMC (Storage port)
■ Boot from PMMC (Processor port)
Table 1-4. Boot Mode Selection Based on PMODE Pins
PMODE[2] | PMODE[1] | PMODE[0] | Boot Option
Z | 0 | 0 | Sync ADMUX (16-bit)
Z | 0 | 1 | Async ADMUX (16-bit)
Z | 0 | Z | Async SRAM (16-bit)
Z | 1 | 1 | USB Boot
1 | Z | Z | I2C
Z | 1 | Z | I2C; on failure, USB Boot is enabled
0 | Z | 1 | SPI; on failure, USB Boot is enabled
FX3S-Specific Boot Options
Z | 1 | 0 | PMMC Legacy
0 | 0 | 0 | S0-port (eMMC); on failure, USB Boot is enabled
1 | 0 | 0 | S0-port (eMMC)
Z = Pin is floating; left unconnected.
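The PMODE table above maps directly onto a lookup keyed by the three pin states. The sketch below is a documentation aid, not bootloader code. Each key is a three-character string in PMODE[2], PMODE[1], PMODE[0] order, with "Z" for a floating pin, and the function name is made up for this example.

```python
# Boot options common to FX3 and FX3S, keyed by PMODE[2], PMODE[1], PMODE[0].
BOOT_OPTIONS = {
    "Z00": "Sync ADMUX (16-bit)",
    "Z01": "Async ADMUX (16-bit)",
    "Z0Z": "Async SRAM (16-bit)",
    "Z11": "USB Boot",
    "1ZZ": "I2C",
    "Z1Z": "I2C; on failure, USB Boot is enabled",
    "0Z1": "SPI; on failure, USB Boot is enabled",
}

# Additional options that apply to FX3S only.
FX3S_BOOT_OPTIONS = {
    "Z10": "PMMC Legacy",
    "000": "S0-port (eMMC); on failure, USB Boot is enabled",
    "100": "S0-port (eMMC)",
}


def decode_pmode(pmode: str, is_fx3s: bool = False) -> str:
    """Return the boot option for a PMODE[2:0] setting such as 'Z11'."""
    key = pmode.upper()
    if len(key) != 3 or any(ch not in "01Z" for ch in key):
        raise ValueError("PMODE must be three characters, each 0, 1, or Z")
    if key in BOOT_OPTIONS:
        return BOOT_OPTIONS[key]
    if is_fx3s and key in FX3S_BOOT_OPTIONS:
        return FX3S_BOOT_OPTIONS[key]
    raise ValueError(f"PMODE setting {key} is not listed for this device")


print(decode_pmode("Z11"))
print(decode_pmode("100", is_fx3s=True))
```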
1.4.11 Clocking
FX3 allows either connecting a crystal between the XTALIN and XTALOUT pins or connecting an external clock to the CLKIN
pin. The XTALIN, XTALOUT, CLKIN, and CLKIN_32 pins can be left unconnected if they are not used.
The crystal frequency supported is 19.2 MHz, while the external clock frequencies supported are 19.2, 26, 38.4, and 52 MHz.
FX3 has an on-chip oscillator circuit that uses an external 19.2-MHz (±100 ppm) crystal (when the crystal option is used).
Refer to the application note AN70707 - EZ-USB FX3/FX3S Hardware Design Guidelines and Schematic Checklist for
guidelines on crystal selection. An appropriate load capacitance is required with a crystal. The FSLC[2:0] pins must be
configured appropriately to select the crystal- or clock-frequency option. The configuration options are listed in Table 1-5.
Table 1-5. Crystal and Clock Frequency Selection
FSLC[2] | FSLC[1] | FSLC[0] | Crystal/Clock Frequency
0 | 0 | 0 | 19.2-MHz crystal
1 | 0 | 0 | 19.2-MHz input CLK
1 | 0 | 1 | 26-MHz input CLK
1 | 1 | 0 | 38.4-MHz input CLK
1 | 1 | 1 | 52-MHz input CLK
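The FSLC settings in the table above can be decoded with a small lookup, which is handy when checking a board's strap resistors against the intended clock source. The sketch below is illustrative only; the names are not SDK identifiers, and combinations that do not appear in the table are rejected rather than guessed.

```python
# (FSLC[2], FSLC[1], FSLC[0]) -> (clock source, frequency in MHz)
FSLC_OPTIONS = {
    (0, 0, 0): ("crystal", 19.2),
    (1, 0, 0): ("input CLK", 19.2),
    (1, 0, 1): ("input CLK", 26.0),
    (1, 1, 0): ("input CLK", 38.4),
    (1, 1, 1): ("input CLK", 52.0),
}


def decode_fslc(fslc2: int, fslc1: int, fslc0: int) -> tuple:
    """Return (source, frequency in MHz) for an FSLC[2:0] pin setting."""
    try:
        return FSLC_OPTIONS[(fslc2, fslc1, fslc0)]
    except KeyError:
        raise ValueError(f"FSLC setting {fslc2}{fslc1}{fslc0} is not listed in the table")


source, mhz = decode_fslc(1, 0, 1)
print(f"FSLC=101 selects a {mhz}-MHz {source}")
```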
Clock inputs to FX3 must meet the phase noise and jitter requirements specified in the EZ-USB FX3 datasheet.
The EZ-USB FX3 device has an embedded 32-bit ARM926EJ-S core that delivers a processing capability up to 220 MIPS.
This ARM core is coupled with instruction and data caches, Tightly Coupled Memories (TCM), and a PL192 vectored interrupt
controller (VIC). FX3 also implements the standard ARM JTAG Test Access Point (TAP), which allows you to use standard
JTAG debuggers to debug firmware applications.
The ARM926EJ-S processor is targeted at multitasking applications and can support high-performance and low-power
requirements.
Interrupts in the FX3 device are managed through the standard ARM PL192 VIC block.
2.1 Features
The ARM9 core in the FX3 device supports the following features:
■ Operation at frequencies up to 200 MHz
■ Support for both 32-bit ARM and 16-bit Thumb instructions
■ Integrated data and instruction caches of 8 KB each
■ Dedicated instruction and data TCMs for guaranteed low-latency memory access
■ VIC capable of managing 32 internal interrupt sources with programmable interrupt priorities
■ Standard ARM JTAG interface for debugging
■ Clock frequency control for power saving
■ System RAM on FX3 device serves as main storage for code and data
Figure 2-1 shows the ARM9 core and the associated blocks in the FX3 device. The CPU is associated with TCM blocks that
enable zero-latency accesses to performance-critical instructions and data, and it provides separate instruction and data
caches for other memory accesses. The PL192 VIC manages interrupts raised by the FX3 hardware blocks.
2.3.1 ARM926EJ-S CPU
The FX3 device has an embedded 32-bit ARM926EJ-S CPU core. This makes the device capable of implementing
multitasking applications where high performance and low power consumption are important.
This ARM9 core supports both 32-bit ARM instructions and 16-bit Thumb instructions. The processor has separate advanced
high-performance bus (AHB) interfaces for internal instruction and data accesses. It also has separate instruction and data
TCM interfaces.
Note: The 32-bit ARM instruction set is commonly used in the FX3 SDK from Cypress, as this makes it more convenient to
The following subsections provide information for programming the FX3 device. The FX3 SDK takes care of these aspects
and initializes the ARM core and memory blocks on the FX3 device. The following detail is only required if you are making any
modifications to the base SDK source code. For more details on the ARM CPU architecture, instruction set, and so on, refer
to the ARM926EJ-S Technical Reference Manual.
2.3.1.1 Processor Modes
The ARM architecture supports seven operating modes, as shown in Table 2-1.
Table 2-1. ARM9 CPU Operating Modes
Processor Mode  Abbreviation  Description
User            Usr           Normal program execution mode
FIQ             Fiq           Fast interrupt mode; used for a performance-critical interrupt
IRQ             Irq           General-purpose interrupt handling mode
Supervisor      Svc           Protected mode used by operating systems
Abort           Abt           Instruction/data abort handling mode; used for virtual memory implementation
Undefined       Und           Undefined instruction mode; used to support software emulation of hardware
                              coprocessors; generally not useful on FX3 device
System          Sys           Runs privileged operating system tasks
All of the modes except the user mode are privileged modes that have full access to all system resources and can freely
change modes.
The FIQ, IRQ, supervisor, abort, and undefined modes are entered when specific exceptions occur. These modes have a
separate register set, so that the user mode registers are not corrupted when the exception occurs.
The system mode uses the same set of registers as the user mode and is used to execute OS tasks that do not need a
separate register set.
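The current mode is held in the low five bits of the CPSR. The sketch below uses the mode encodings defined by the ARM architecture to express the two properties described above: which modes are privileged and which have their own banked register set. The function names are illustrative only and do not come from the FX3 SDK.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPSR mode field M[4:0] encodings defined by the ARM architecture. */
enum arm_mode {
    ARM_MODE_USR = 0x10,
    ARM_MODE_FIQ = 0x11,
    ARM_MODE_IRQ = 0x12,
    ARM_MODE_SVC = 0x13,
    ARM_MODE_ABT = 0x17,
    ARM_MODE_UND = 0x1B,
    ARM_MODE_SYS = 0x1F
};

/* Extracts the mode field from a CPSR value. */
uint32_t arm_cpsr_mode(uint32_t cpsr)
{
    return cpsr & 0x1Fu;
}

/* Every mode except user mode is privileged. */
bool arm_mode_is_privileged(uint32_t cpsr)
{
    return arm_cpsr_mode(cpsr) != ARM_MODE_USR;
}

/* The exception modes have their own banked registers; user and system
 * modes share one register set. */
bool arm_mode_has_own_registers(uint32_t cpsr)
{
    switch (arm_cpsr_mode(cpsr)) {
    case ARM_MODE_FIQ:
    case ARM_MODE_IRQ:
    case ARM_MODE_SVC:
    case ARM_MODE_ABT:
    case ARM_MODE_UND:
        return true;
    default:
        return false;
    }
}
```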
Run-time stack regions need to be set up for all these modes at system startup. The following code snippet shows how this
can be done using ARM assembly language code. This code is provided by the Cypress FX3 firmware library, and rarely
needs to be modified.
/* Stack pointers are placed at the top address as the stack grows downwards */
SetupStackPtrs:
ldr r1, =FX3_STACK_BASE /* Load the stack base address */
sub r1, r1, #8 /* Prevent overflow */
Register  Description                               Comments
R0-R7     General-purpose registers                 These are not banked registers, which means that the same
                                                    physical register is used in all processor modes.
R8-R12    General-purpose registers                 Two copies of these registers exist. One copy is used only in
                                                    FIQ mode, and the other copy is used in all other processor
                                                    modes.
R13       Stack pointer (SP)                        Six copies of this register exist. The user and system modes
                                                    share one register, and all other modes have their own SP
                                                    register.
R14       Link register (LR), used to hold return   Six copies of this register exist. The user and system modes
          address when executing branch             share one register, and all other modes have their own LR
          instructions                              register.
R15       Used to read/write the program counter    Single register that is used across all processor modes.
CPSR      Current program status register,          The CPSR register reflects the current program status in each
          provides status flags, global interrupt   of the processor modes.
          enable control, and so on
SPSR      Saved program status register, provides   Separate copies of this register exist for each of the FIQ,
          saved program status for each exception   IRQ, supervisor, abort, and undefined modes.
          mode
2.3.1.3 Exception Vectors
The exception vectors that serve as the entry point for each of the exception modes are stored in a table in the main system
memory. These vectors can be placed at one of two addresses: 0x00000000 or 0xFFFF0000. The selection of the exception
vector table location is based on the ARM standard system control coprocessor (CP15) configuration.
As shown in Table 2-3, the normal exception vectors are located in the address range from 0x0 onward, which falls in the
instruction TCM (ITCM) region. The high exception vectors are located in the address range from 0xFFFF0000, which is part
of the BootROM on the FX3 device. As the high exception vectors are hard-wired to vector to the bootloader on the FX3
device, the normal (user-definable) exception vectors are used when running user applications.
The following code snippet shows the procedure to move the exception vectors to address 0x0. This step is done by the FX3
firmware library as part of device initialization.
MRC p15, 0, r1, c1, c0, 0 /* Read the CP15 register value */
MOV r2, #0xFFFFDFFF
AND r1, r1, r2 /* Mask off the vector location bit. */
MCR p15, 0, r1, c1, c0, 0 /* Write back to CP15 register. */
2.3.1.4 MMU
The MMU in the ARM926EJ-S processor is an ARM architecture v5 implementation, which supports the virtual memory
features required by standard embedded operating systems. The MMU uses a set of two-level page tables located in the
main memory to control the address translation, permission checks, and so on.
The data cache on the ARM core can be enabled only if the MMU is enabled. However, most FX3 designs do not use any
secondary storage and do not need a virtual memory system. FX3 provides a fixed set of page tables that maps each physical
address to the equivalent virtual address.
The ARM926EJ-S processor has associated instruction and data cache memories. The RAM on the FX3 device holds DMA
data buffers in addition to code and data. The DMA driver and APIs in the FX3 library ensure cache coherency by using the
cache clean and invalidate operations.
The instruction and data caches on FX3 are 8 KB each. The caches are four-way set associative with eight-word (32-byte) cache
lines. The data cache implements two dirty bits per cache line and is typically configured for write-back operations.
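The cache geometry follows directly from these numbers: 8 KB divided by (4 ways × 32 bytes per line) gives 64 sets. The sketch below works this out and shows how an address would split into line offset and set index under that geometry. The macro and function names are illustrative, and the address is assumed to be the one the cache is indexed with (on FX3 the page tables map each physical address to the equivalent virtual address, so the two are the same).

```c
#include <stdint.h>

#define FX3_CACHE_SIZE_BYTES  (8u * 1024u)  /* per cache (I or D) */
#define FX3_CACHE_WAYS        4u
#define FX3_CACHE_LINE_BYTES  32u           /* eight 32-bit words */
#define FX3_CACHE_NUM_SETS \
    (FX3_CACHE_SIZE_BYTES / (FX3_CACHE_WAYS * FX3_CACHE_LINE_BYTES))

/* Byte offset of an address within its cache line (address bits [4:0]). */
uint32_t fx3_cache_line_offset(uint32_t addr)
{
    return addr % FX3_CACHE_LINE_BYTES;
}

/* Set that an address maps to (address bits [10:5] for 64 sets). */
uint32_t fx3_cache_set_index(uint32_t addr)
{
    return (addr / FX3_CACHE_LINE_BYTES) % FX3_CACHE_NUM_SETS;
}
```

Two addresses that are a multiple of 2 KB apart (64 sets × 32 bytes) map to the same set, so at most four such lines can be resident in a cache at once.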
The ARM926EJ-S processor supports the following operations using the CP15 coprocessor interface:
■ Invalidating the entire D-cache or I-cache
■ Invalidating (flushing) regions of the D-cache or I-cache
■ Cleaning the entire D-cache
■ Cleaning regions of the D-cache
■ Locking specified memory regions into the D-cache or I-cache
The CPU also provides a write buffer that is used for writes to regions that are not cacheable or bufferable, and for
write-through operations. It is also used for write misses during write-back operations. A separate write buffer is provided as part of
the D-cache for holding write-back data during cache line eviction or clean operations. Instructions are provided to drain both
write buffers, and the CPU ensures data coherency when read operations are addressed to data that is sitting in the write
buffer.
The following code snippet shows the procedure to enable the caches and the MMU on the FX3 device. Refer to the Memory
and System Interconnect chapter on page 44 for more information on cache operations.
MRC p15, 0, r1, c1, c0, 0 /* Read CP15 register value */
ORR r1, r1, #0x1000 /* Update I-Cache enable bit. */
BIC r1, r1, #0x4000 /* Select random replacement. */
ORR r1, r1, #0x05 /* Enable MMU and D-Cache. */
MCR p15, 0, r1, c1, c0, 0 /* Write modified value back */
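The read-modify-write in the snippet above can be expressed as a pure function on the control register value, which makes the individual bits easier to see and to unit test on a host. This is a sketch only: the macro and function names are hypothetical, and the bit positions are those implied by the constants in the assembly (0x1000, 0x4000, 0x05).

```c
#include <stdint.h>

/* CP15 c1 control register bits used by the snippet above. */
#define CP15_CTRL_MMU_EN     (1u << 0)   /* M: MMU enable */
#define CP15_CTRL_DCACHE_EN  (1u << 2)   /* C: D-cache enable */
#define CP15_CTRL_ICACHE_EN  (1u << 12)  /* I: I-cache enable */
#define CP15_CTRL_RR         (1u << 14)  /* RR: cleared for random replacement */

/* Returns the control register value with the I-cache, D-cache, and MMU
 * enabled and random replacement selected; all other bits are preserved. */
uint32_t cp15_ctrl_enable_caches(uint32_t ctrl)
{
    ctrl |= CP15_CTRL_ICACHE_EN;
    ctrl &= ~CP15_CTRL_RR;
    ctrl |= CP15_CTRL_MMU_EN | CP15_CTRL_DCACHE_EN;
    return ctrl;
}
```

On the target, the input would come from the MRC instruction and the result would be written back with MCR, as in the assembly above.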
2.3.1.6 Tightly Coupled Memories
Some operations, such as interrupt handlers, may not be able to tolerate the added latency created by a cache miss. The
ARM9 CPU provides a zero wait state TCM interface to facilitate quick access to such instructions and data. Firmware
applications can locate performance-critical code and data sections in the TCM regions using the appropriate linker settings.
The FX3 SDK provides a linker script that sets up the recommended memory map for FX3 applications.
As in memory and cache access, separate paths are used for instruction and data access from the TCMs. FX3 implements 16
KB of instruction TCM (ITCM) and 8 KB of data TCM (DTCM). The ITCM area can also be accessed by the data side of the
ARM core. This is required to facilitate loading the code into the ITCM region.
The ITCM region on FX3 is located in the address range 0x0000-0x3FFF, and the DTCM region is located in the address
range 0x10000000-0x10001FFF. The ITCM region is typically used to store the ARM exception vectors and the interrupt
service routine (ISR) code. The DTCM region is typically used to store performance-critical data and the run-time stacks for
various processor modes.
The TCMs must be configured as non-cacheable memories, and any instruction or data movement between the TCM and the
main memory must be performed by the CPU. The TCMs are disabled when the device is reset, and they need to be enabled
by the firmware. This is done by the FX3 library as part of device initialization.
MOV r1, #0x15 /* ITCM address is 0x0 and size is 16 KB. */
MCR p15, 0, r1, c9, c1, 1 /* Initialize the ITCM */
LDR r1, =0x10000011 /* DTCM address is 0x10000000 and size is 8 KB; value is not a valid MOV immediate. */
MCR p15, 0, r1, c9, c1, 0 /* Initialize the DTCM */
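The two constants written to the TCM region registers (0x15 and 0x10000011) are consistent with a layout in which the base address occupies the upper bits, a size code sits in bits [5:2] (code n meaning 512 bytes << n), and bit 0 is the enable bit. The sketch below encodes a region under that assumed layout; confirm the field definitions against the ARM926EJ-S Technical Reference Manual before relying on it. The function name is hypothetical.

```c
#include <stdint.h>

/* Builds a TCM region register value from a base address and a size in
 * bytes, assuming: base in the upper bits, size code in bits [5:2]
 * (size = 512 bytes << code), enable in bit 0.
 * Returns 0 if the size is not a power of two of at least 4 KB or the
 * base is not aligned to the size. */
uint32_t tcm_region_value(uint32_t base, uint32_t size_bytes)
{
    uint32_t code = 0u;

    if (size_bytes < 4096u || (size_bytes & (size_bytes - 1u)) != 0u) {
        return 0u;
    }
    if ((base & (size_bytes - 1u)) != 0u) {
        return 0u;
    }
    for (uint32_t s = 512u; s < size_bytes; s <<= 1) {
        code++;
    }
    return base | (code << 2) | 1u;
}
```

With these assumptions, a 16 KB region at 0x0 encodes to 0x15 and an 8 KB region at 0x10000000 encodes to 0x10000011, matching the values used in the initialization snippet.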
FX3 makes the standard ARM JTAG TAP available for users to connect a JTAG debugger. Only the 5-pin JTAG mode of
debugging is supported, and 2-pin serial wire debug (SWD) mode is not supported by the FX3 device. The JTAG pins are
directly connected to the ARM CPU, and they do not support boundary scan.
The JTAG interface allows the use of industry-standard ARM debug probes to debug the firmware running on FX3. No
Cypress custom software tools are required to enable the debugging.
The JTAG instruction register (IR) length for the ARM926EJ-S device is 4 bits. If multiple devices are connected on the JTAG
chain, the offset and length for the FX3 device have to be set correctly to achieve JTAG connection. Refer to Section 5.10.3 of
the Segger J-Link User Guide for instructions on how to do this when using the J-Link debugger.
2.3.1.8 Vectored Interrupt Controller
The FX3 device provides a number of interrupt notifications for the ARM CPU to handle. Interrupts in the FX3 system are
managed through the standard ARM PrimeCell Vectored Interrupt Controller (PL192) block. This interrupt controller provides
vectored interrupt support with configurable priorities for all interrupt sources.
The PL192 controller supports up to 32 interrupt sources and generates the nIRQ and nFIQ signals to the ARM CPU based
on the configuration. The controller is connected to the CPU on the AHB bus and allows you to perform the interrupt
configuration through a set of memory mapped registers.
Table 2-4 shows the various interrupt sources on the FX3 device, along with their vector numbers.
Vector  Interrupt Source  Description                                           Comments
11      STORAGE_DMA       DMA socket interrupt from the storage (SD/MMC)        Applies only to the FX3S™ devices.
                          interface
12      STORAGE0_CORE     General-purpose interrupt from storage interface 0    Applies only to the FX3S™ devices.
13      STORAGE1_CORE     General-purpose interrupt from storage interface 1    Applies only to the FX3S™ devices.
14      UNUSED            Do not enable.
15      I2C_CORE          General-purpose I2C interrupt; indicates conditions
                          such as transfer completion, error detect, and so on
16      I2S_CORE          General-purpose I2S interrupt
17      SPI_CORE          General-purpose SPI interrupt
18      UART_CORE         General-purpose UART interrupt
19      GPIO_CORE         General-purpose GPIO interrupt; common for both
                          simple and complex GPIOs
20      PERIPH_DMA        DMA socket interrupt from any serial peripheral
                          block (I2C, I2S, SPI, or UART)
21      GCTL_POWER        Power detect interrupt; indicates voltage changes     This is commonly used for VBus voltage
                          on any power inputs that can be dynamically           detection.
                          changed during device operation
22-31   UNUSED            Do not enable.
Interrupt raised on FX3 waking up from suspend or standby low-power modes.
This is a custom implementation of the software interrupt scheme.
Watchdog timer interrupt; the watchdog timer functions on the basis of a 32-kHz clock signal with a user-configured
period. This is commonly used for OS scheduling on the FX3 device.
General-purpose GPIF interrupts; indicates conditions such as state machine interrupt, GPIF errors, mailbox register
access, and so on.
General-purpose USB interrupts; indicates various conditions triggered during device or host operation of the USB
block. Applies to both USB device and host mode operation.
You can configure any of the 32 interrupt sources as fast interrupt request (FIQ), which takes the highest priority among the
interrupts. The rest of the interrupts are prioritized by user code through the VIC_VECT_PRIORITY registers. If two or more
interrupts are programmed with the same priority value, the source with the lower vector number assumes the higher priority.
The PL192 controller allows each of the interrupt sources to be independently masked (disabled) or unmasked (enabled) and
provides registers that report the raw interrupt status and the interrupt status after masking. The vector addresses for various
interrupt sources are programmed through the VIC_VEC_ADDRESS registers. When one or more interrupt sources are
active, the controller identifies the highest priority interrupt, stores the corresponding vector address in the VIC_ADDRESS
register, and then asserts the nIRQ signal to interrupt the ARM CPU.
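The selection rule described above can be modelled in software. The sketch below picks the interrupt to service from a raw status word, an enable mask, and a per-vector priority table, resolving ties in favour of the lower vector number. It is a host-side model, not register-accurate driver code: the function name is hypothetical, and it assumes that a numerically smaller priority value means a higher priority, which is the usual PL192 convention.

```c
#include <stdint.h>

#define VIC_NUM_VECTORS 32

/* Returns the vector number of the enabled, pending interrupt with the
 * highest priority, or -1 if none is pending.
 * Assumption: a smaller priority value means a higher priority. */
int vic_select_irq(uint32_t raw_status, uint32_t enable_mask,
                   const uint8_t priority[VIC_NUM_VECTORS])
{
    uint32_t active = raw_status & enable_mask;  /* status after masking */
    int best = -1;

    for (int i = 0; i < VIC_NUM_VECTORS; i++) {
        if ((active & (1u << i)) == 0u) {
            continue;
        }
        /* Strict comparison keeps the lower vector number on a tie. */
        if (best < 0 || priority[i] < priority[best]) {
            best = i;
        }
    }
    return best;
}
```

Scanning from vector 0 upward with a strict less-than comparison is what makes the lower vector number win when two sources share a priority value.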
Refer to Vectored Interrupt Controller (VIC) Registers on page 230 for information about the various configuration and
status registers associated with the VIC.
Some interrupts in the FX3 system need to be prioritized over others to ensure that the application can meet all the USB spec
requirements and transfer rates. In particular, the USB core interrupt must be handled at the highest priority (can be selected
as FIQ), and all the DMA interrupts should be allotted the next high priority level.
To enable a specific interrupt source, the firmware has to do the following:
■ Program the ISR address into the corresponding VIC_VEC_ADDRESS register.
■ Set the VIC_VECT_PRIORITY register value as desired.
■ Enable the interrupt by setting the corresponding bit in the VIC_INT_ENABLE register.
The following code snippet shows the procedure for setting up Fx3IntHandler as the ISR for interrupt vector i:
CY_U3P_VIC_VEC_ADDRESS[i] = Fx3IntHandler; /* ISR address. */
CY_U3P_VIC_VECT_PRIORITY[i] = 2; /* Set the priority to 2. */
CY_U3P_VIC_INT_ENABLE |= (1 << i); /* Enable the interrupt. */
The VIC sets the VIC_ADDRESS register to point to the vector for the active interrupt with the highest priority before making
the nIRQ signal active. The IRQ exception vector is designed to jump to the address pointed to by the VIC_ADDRESS register.
The ISR is responsible for saving all the necessary context information, including the non-banked registers, while executing.
Nesting of interrupts is not allowed by the VIC. The ISR needs to clear the interrupt by writing any value to the
VIC_ADDRESS register before the next interrupt can be raised.
As direct interrupt nesting is not supported by the VIC, it is recommended that the handling of low-priority interrupts be
deferred to allow higher priority interrupts to run with bounded latency. The firmware needs to mask out the deferred interrupts
until the source of the interrupt has been cleared to avoid repeated interrupt calls. This can be done by setting the
corresponding bit in the VIC_INT_CLEAR register.
CY_U3P_VIC_INT_CLEAR = (1 << i); /* Mask out interrupt vector i. */
Note: The ARM CPU uses the ARM instruction set when starting to execute any interrupt handlers. If the ISR uses the Thumb
instruction set, the firmware needs to ensure that the switch to Thumb mode is done before entering the ISR.
Refer to the PL192 Technical Reference Manual for more details on the interrupt controller and its use.
2.3.1.9 CPU Operating Frequency
The operating clock for the FX3 CPU is derived from the input clock or crystal frequency.
First, the input clock is multiplied to generate the FX3 system clock. The system clock frequency is 384 MHz when the FX3
device is clocked using a 19.2-MHz crystal or a 38.4-MHz clock input. The system clock frequency is 416 MHz when the FX3
device is clocked using a 26-MHz or 52-MHz input clock.
The CPU clock is then derived from the system clock using a programmable divider. The minimum divisor supported is 2,
which means that the maximum CPU clock frequency is 192 MHz or 208 MHz, depending on the clock source.
Note: While the multipliers used to derive the system clock are programmable, Cypress strongly recommends the use of the
previously mentioned default frequencies. The device has not been tested for proper functioning at other frequencies.
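The clock arithmetic above can be captured in a small helper. The following sketch maps the supported input frequencies to the default system clock and applies the CPU divider; the function names are hypothetical and frequencies are in kHz so that 19.2 MHz and 38.4 MHz stay integral. It models only the default frequencies named in the text.

```c
#include <stdint.h>

/* Default system clock (kHz) for a supported input clock or crystal
 * frequency (kHz). Returns 0 for an unsupported input frequency. */
uint32_t fx3_sys_clk_khz(uint32_t input_khz)
{
    switch (input_khz) {
    case 19200u:
    case 38400u:
        return 384000u;
    case 26000u:
    case 52000u:
        return 416000u;
    default:
        return 0u;
    }
}

/* CPU clock (kHz) for a given divisor. The minimum supported divisor is 2;
 * smaller values return 0. */
uint32_t fx3_cpu_clk_khz(uint32_t input_khz, uint32_t divisor)
{
    if (divisor < 2u) {
        return 0u;
    }
    return fx3_sys_clk_khz(input_khz) / divisor;
}
```

With a 19.2-MHz crystal and a divisor of 2 this yields 192000 kHz (192 MHz); with a 52-MHz input clock it yields 208000 kHz (208 MHz).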
The CPU clock frequency can be reduced by specifying divisor values greater than 2. As the clocks used by the DMA engine
and the memory mapped I/O (MMIO) register interface are derived from the CPU clock, reducing the CPU clock frequency
will also reduce them and result in reduced data transfer throughput. Reducing the clock frequency will also increase the
effective interrupt latency and can cause USB specification compliance errors. Therefore, Cypress recommends that you use
a reduced clock rate only when USB 3.0 connections are not active.
Refer to the Global Controller (GCTL) chapter on page 51 for details on how to configure the device clocks.
2.3.1.10 CPU Power Modes
The FX3 CPU supports low-power modes, which can be used when the device is not active. The CPU supports the following
power modes:
■ Normal mode: The CPU operates at a clock frequency determined by the programmed dividers.
■ Suspend mode: This applies to both the L1 and L2 modes of the FX3 device. The clock to the CPU is gated in this mode,
and the CPU is placed in the wait for interrupt state. Firmware execution resumes from the next instruction, once the device
wakes from the L1/L2 mode.
■ Standby mode: This applies to the L3 mode of the FX3 device. The CPU is powered down, while the program RAM content
is retained. Firmware execution starts from the reset vector once the device wakes from the L3 mode.
2.3.1.11 Timers
The ARM CPU does not have any associated timer blocks. The FX3 device provides a pair of general-purpose timers that can
also provide the watchdog functionality. These timers are provided as part of the Global Control block on the FX3 device.
These timers operate on a 32-kHz input clock and can be configured in one of the free running counter, timer with interrupt, or
watchdog reset modes.
Note: If the system provides a clock input through the CLKIN_32 pin of the FX3 device, the timer uses this clock. If not, the
32-kHz clock is derived from the system clock, which runs at about 200 MHz.
The WATCHDOG_CS register in the GCTL block controls the operation of these timers. The relevant bits in this register are
shown in Table 2-5.
Table 2-5. Watchdog Timer Control Register
Field Name  Bit Range  Description
MODE0       1:0        0 - Free running mode; counter wraps around after reaching 0xFFFFFFFF.
                       1 - Interrupt mode; raises WATCHDOG_TIMER interrupt when least significant BITS0 bits of the
                           counter are cleared
                       2 - Reset mode; resets the FX3 when least significant BITS0 bits of the counter are cleared
                       3 - Disabled
INTR0       2          Interrupt status for timer 0
BITS0       7:3        Number of bits to be considered when checking for counter limit
MODE1       9:8        Timer mode for timer 1
INTR1       10         Interrupt status for timer 1
BITS1       15:11      Number of bits to be considered when checking for counter limit
The actual counter values for the timers are stored in the WATCHDOG_TIMER0 and WATCHDOG_TIMER1 registers. When
the timer is configured in reset mode, the firmware is expected to restore this register to its initial value before the counter limit
(lowest BITS bits getting set) is reached.
Hint: As a single interrupt vector is used for both timers, the ISR needs to check the INTR bits in the WATCHDOG_CS register
to identify the timer that triggered the interrupt.
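The field layout in Table 2-5 lends itself to a pair of small helpers: one to compose a WATCHDOG_CS value and one to work out which timer raised the shared interrupt. The sketch below operates on plain integers so it can be tested on a host; the names are hypothetical (not SDK definitions), and only the fields listed in the table are modelled.

```c
#include <stdint.h>

/* Timer mode values from Table 2-5. */
enum wdt_mode {
    WDT_MODE_FREE_RUNNING = 0,
    WDT_MODE_INTERRUPT    = 1,
    WDT_MODE_RESET        = 2,
    WDT_MODE_DISABLED     = 3
};

/* Composes the MODEx and BITSx fields of WATCHDOG_CS for both timers.
 * MODE0 = bits 1:0, BITS0 = bits 7:3, MODE1 = bits 9:8, BITS1 = bits 15:11. */
uint32_t wdt_cs_pack(uint32_t mode0, uint32_t bits0,
                     uint32_t mode1, uint32_t bits1)
{
    return ((mode0 & 0x3u)  << 0) |
           ((bits0 & 0x1Fu) << 3) |
           ((mode1 & 0x3u)  << 8) |
           ((bits1 & 0x1Fu) << 11);
}

/* Returns a bitmask of timers with a pending interrupt:
 * bit 0 set if INTR0 (register bit 2) is set,
 * bit 1 set if INTR1 (register bit 10) is set. */
unsigned wdt_cs_pending(uint32_t cs)
{
    return (unsigned)(((cs >> 2) & 1u) | (((cs >> 10) & 1u) << 1));
}
```

An ISR for the shared watchdog vector would read WATCHDOG_CS, pass the value to a helper like wdt_cs_pending, and service each timer whose bit is set.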
The memory subsystem on the FX3 device comprises the system RAM that forms the main memory, SRAM controller, and
AHB-based interconnect that allows the ARM CPU and the hardware blocks to access these memories. The MMIO
interconnect provides access to registers in various peripheral blocks.
Because USB data moves through the system RAM (where USB endpoint buffers are implemented), FX3 implements a
specialized memory controller to arbitrate between the various types of traffic with high throughput and predictable latency.
Details on the arbitration mechanism and priorities are provided in System Interconnect on page 47.
3.1 Features
The FX3 memory and system interconnect supports the following:
■ 512 KB or 256 KB of system memory, depending on the FX3 part number selected
■ 16 KB of Instruction Tightly Coupled Memory (I-TCM) and 8 KB of Data Tightly Coupled Memory (D-TCM)
■ DMA architecture that can deliver 800 MBps bandwidth to memory
■ MMIO register access from CPU at up to 50 MBps (12.5 million 32-bit register accesses per second)
■ Guaranteed and bounded memory access latency for both CPU and DMA accesses
3.2 Block Diagram
Figure 3-1 shows a block diagram of the memory and system interconnect on the FX3 device.
Note: Some FX3 parts (CYUSB3011, CYUSB3012) only have 256 KB of SYSTEM RAM available.
[Figure: FX3 memory map. Surviving labels: Page table (Opt.), 0xE004FFFF, 0xE0040000, 0xE0030000, 0xE0020000, 0x4007FFFF]
Note: The memory map shown is for the CYUSB3014 part, which implements 512 KB of system RAM. For other parts such
as CYUSB3011, the system RAM region is limited to 256 KB (from 0x40000000 to 0x4003FFFF).
The major memory regions on the FX3 device are the following:
■ ITCM: 16 KB dedicated space for holding exception vectors and ISR code. While it is possible to read and write to the
ITCM memory region from the ARM CPU, it is not possible to use this memory region as a target for DMA transfers.
■ DTCM: 8 KB memory region that can be used for holding frequently accessed data structures, ISR data, run-time stacks,
and more. It is not possible to use this region as a target for DMA transfers.
■ System RAM: The main SRAM region that is used for code and data storage as well as for buffering any data that is
flowing through the FX3 device. The System RAM region can be 256 KB or 512 KB, depending on the FX3 part being used.
The first 12 KB of this region is reserved for storing DMA-related data structures (descriptors) that are used by the FX3
hardware. The remainder of the System RAM can be used as required by the application. Figure 3-2 shows the commonly
used subdivisions for the RAM region.
■ MMIO: The register space that holds all the configuration and status registers implemented by all the blocks on the FX3
device. While a total memory region of 256 MB has been allocated for the registers, most of this memory is unused and
unimplemented. Refer to the chapters on each of the FX3 hardware blocks for information on the registers corresponding
to those blocks.
■ BootROM: A 32 KB ROM region that is preprogrammed with the FX3 device bootloader. The bootloader implements
multiple boot modes such as USB boot, I2C boot, SPI boot, and GPIF boot. The desired boot mode is selected through a set
of pin straps. This memory region is not accessible to FX3 user applications.
■ VIC: Control and status registers for the VIC block. They are separate from the other MMIO registers and are located at
the address 0xFFFFF000.
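The region list above can be turned into a simple address classifier, for example to sanity-check buffer addresses in a host-side test. The sketch below uses the address ranges given in this document for the ITCM, DTCM, system RAM (512 KB parts), and VIC. The MMIO range of 0xE0000000 to 0xEFFFFFFF is an assumption inferred from the 256 MB size and the register addresses in the memory map, and the BootROM is left out because its exact range is not stated here. All names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    FX3_REGION_ITCM,
    FX3_REGION_DTCM,
    FX3_REGION_SYSRAM,
    FX3_REGION_MMIO,
    FX3_REGION_VIC,
    FX3_REGION_OTHER
} fx3_region_t;

#define FX3_SYSRAM_BASE       0x40000000u
#define FX3_SYSRAM_END        0x40080000u      /* exclusive; 512 KB parts */
#define FX3_DMA_DESC_RESERVED (12u * 1024u)    /* first 12 KB: DMA descriptors */

fx3_region_t fx3_region_of(uint32_t addr)
{
    if (addr <= 0x00003FFFu) return FX3_REGION_ITCM;
    if (addr >= 0x10000000u && addr <= 0x10001FFFu) return FX3_REGION_DTCM;
    if (addr >= FX3_SYSRAM_BASE && addr < FX3_SYSRAM_END) return FX3_REGION_SYSRAM;
    if (addr >= 0xE0000000u && addr <= 0xEFFFFFFFu) return FX3_REGION_MMIO; /* assumed range */
    if (addr >= 0xFFFFF000u) return FX3_REGION_VIC;
    return FX3_REGION_OTHER;
}

/* A DMA buffer must lie entirely in system RAM, above the 12 KB that is
 * reserved for DMA descriptors. TCM regions are never valid DMA targets. */
bool fx3_is_valid_dma_buffer(uint32_t addr, uint32_t len)
{
    const uint32_t start = FX3_SYSRAM_BASE + FX3_DMA_DESC_RESERVED;

    return len != 0u && addr >= start && addr < FX3_SYSRAM_END &&
           len <= FX3_SYSRAM_END - addr;
}
```

For parts with 256 KB of system RAM, the end address would be 0x40040000 instead.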
3.3.2 System Interconnect
The FX3 device implements a hierarchical AHB-based interconnect system that allows the device interfaces and the ARM
CPU to access the system memory with high throughput and bounded latency. Different parts of the FX3 device function at
different clock rates. The ARM CPU typically runs on a 200 MHz clock and has separate 32-bit wide buses for instruction and
data access. The DMA interconnect runs at 100 MHz and has separate 64-bit buses for read and write accesses. The MMIO
interconnect runs at 100 MHz and has a single 32-bit bus for all register accesses. The system RAM is organized as 128-bit
wide memories and is clocked at 200 MHz.
As the DMA interconnect provides separate buses for read and write transfers, it can issue one read access and one write
access during each DMA clock cycle. This means that the DMA interconnect can simultaneously support DMA read and DMA
write traffic at 800 MBps each (100 MHz times 8 bytes in the 64-bit bus equals 800 MBps).
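The same arithmetic applies to any of the buses described here: peak bandwidth is the clock rate multiplied by the bus width in bytes, assuming one transfer per clock cycle. A minimal sketch follows; the function name is illustrative and MBps is taken to mean 10^6 bytes per second.

```c
#include <stdint.h>

/* Peak bandwidth in MBps for a bus that completes one transfer per clock:
 * clock (MHz) x width (bytes). Width is given in bits. */
uint32_t bus_peak_mbps(uint32_t clock_mhz, uint32_t bus_width_bits)
{
    return clock_mhz * (bus_width_bits / 8u);
}
```

For the DMA interconnect, bus_peak_mbps(100, 64) gives 800 for each direction. By the same calculation, a 128-bit memory clocked at 200 MHz has a peak of 3200 MBps, which is why the system RAM can absorb simultaneous DMA read, DMA write, and CPU traffic.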
The system interconnect supports the following kinds of transfers:
■ CPU traffic to system RAM
■ DMA traffic to system RAM
■ CPU accesses to MMIO registers
■ DMA accesses to MMIO registers
Note: DMA access to MMIO registers is used to synchronize data transfers between a pair of communicating hardware
blocks.
A specialized FX3 memory controller arbitrates between all these accesses. The memory controller connects to the
high-speed system interconnect, which has two 128-bit wide buses running at 200 MHz. The memory controller guarantees equal
SRAM bandwidth for CPU and DMA accesses. It also allows the DMA controller to use any unused CPU cycles so that typical
FX3 applications can sustain a higher bandwidth.
3.3.3 Low-Power Operations
FX3 supports low-power operating modes in which the device clocks can be turned off and most of the blocks can be
powered off. Table 3-1 shows the state of various FX3 blocks in the different power modes.
Block                     Normal  Suspend (L1)           Suspend (L2)           Standby (L3)
ARM9 CPU                  ON      Waiting for interrupt  Waiting for interrupt  OFF
System RAM                ON      Clock gated            Clock gated            Clock gated
USB block                 ON      Clock gated            OFF                    OFF
GPIF                      ON      Clock gated            Clock gated            OFF
Serial peripherals        ON      Clock gated            Clock gated            OFF
(UART, I2C, I2S and SPI)
GPIO                      ON      Holds previous state   Holds previous state   Holds previous state
The device supports two variants of suspend modes: L1 and L2. All the device blocks will be in the clock gated state in the L1
mode. This mode can be entered when the USB connection to the FX3 device has been suspended by the host, and the
device should be configured to wake up on any USB bus activity.
The USB block is powered off in the L2 suspend mode. This mode can be entered only if the VBus input to the device is
turned off. The device can be configured to wake up when the VBus input is detected.
In the standby mode, most of the blocks on the device are powered off. Only the system RAM and the logic required to detect
wakeup requests are left powered on. Since the CPU as well as all the blocks including USB are powered off in this mode,
firmware operation after wakeup is similar to that on a warm reset of the device.
3.3.4 Cache Operations
The FX3 device has dedicated instruction and data caches that improve access latencies to the SYSTEM RAM region. The
caches are not used for TCM, MMIO, and VIC access. TCMs guarantee single-cycle access for both instructions and data,
and they do not require a cache. The MMIO and VIC registers need to be updated atomically, and they are configured as not
cacheable.
The ARM9 architecture supports the following cache operations:
■ Flushing the entire I-cache or D-cache
■ Cleaning (writing back to memory) the entire D-cache
■ Flushing (evicting) a memory region from the I-cache or D-cache
■ Cleaning (writing back to memory) a memory region from the D-cache
■ Loading a specific memory region into the I-cache or D-cache
Refer to the ARM926EJ-S Technical Reference Manual or the ARM System Developer's Guide for details on how to perform
these operations.
Enabling the instruction cache is recommended for all FX3 applications. Enabling the data cache helps improve performance
in applications where the FX3 CPU performs data manipulations.
3.3.4.1 Cache Coherency
Almost all FX3 applications involve transferring data in or out through one or more of the device interfaces. All these data
transfers are achieved through the distributed DMA fabric on the device and make use of a part of the system RAM for
buffering.
As the system RAM is used for DMA buffers as well as for code and data storage, there is a possibility of cache/memory
corruption when the D-cache is enabled. The firmware application needs to avoid this possibility using the following
guidelines. These steps are handled by the FX3 firmware library when the DMA APIs in the SDK are used.
1. As data is loaded or evicted from the cache one line at a time, it is likely that any data that shares a cache line with a DMA
data buffer will be corrupted. Ensure that no code/data shares a cache line with DMA buffers to avoid this possibility.
Hint: It is recommended that all DMA buffers be located in a separate memory region within the system RAM. It is also
recommended that each DMA buffer occupy an integral number of cache lines (32 bytes) to ensure that an adjacent DMA
buffer is not corrupted by a data transfer.
2. Ensure that the memory region corresponding to a DMA buffer is cleaned from the D-cache before initiating any egress
(data output from FX3) data transfers.
3. Ensure that the memory region corresponding to a DMA buffer is flushed (evicted) from the D-cache before initiating any
ingress (data input into FX3) data transfers.
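The alignment part of these guidelines is easy to enforce in code. The sketch below rounds a requested buffer length up to a whole number of 32-byte cache lines and checks that a buffer starts and ends on a line boundary. The helper names are hypothetical and are not FX3 SDK functions; the cache clean and flush operations themselves are left to the SDK's DMA APIs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FX3_CACHE_LINE_SIZE 32u

/* Rounds a buffer length up to an integral number of cache lines. */
size_t dma_buf_round_up(size_t len)
{
    return (len + (FX3_CACHE_LINE_SIZE - 1u)) & ~(size_t)(FX3_CACHE_LINE_SIZE - 1u);
}

/* True if the buffer starts on a cache-line boundary and spans a whole
 * number of lines, so that it cannot share a line with other data. */
bool dma_buf_is_line_aligned(uintptr_t addr, size_t len)
{
    return len != 0u &&
           (addr % FX3_CACHE_LINE_SIZE) == 0u &&
           (len % FX3_CACHE_LINE_SIZE) == 0u;
}
```

A buffer allocator could call dma_buf_round_up on every request and check dma_buf_is_line_aligned on the result. Buffers that pass this check can be cleaned before egress transfers and flushed before ingress transfers without affecting neighbouring data.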
3.3.5 Memory Usage
The TCM and system RAM regions on the FX3 device are general purpose and can be used by the firmware application as
desired. The only constraints are as follows:
■ The initial part of the I-TCM (approximately 256 bytes starting at address 0) is reserved for setting up the ARM exception
vectors.
■ The initial part of the system RAM (12 KB starting from address 0x40000000) is reserved for setting up DMA transfer-
related data structures.
■ The TCM regions cannot be used for direct data transfers, as the DMA engine does not support data transfers from/to
these regions.
This section provides guidelines on mapping out the firmware code, data, and DMA buffers within the available memory on
the device.
Table 3-2 shows the memory sections required by an FX3 firmware application. Figure 3-3 shows the mapping of these
sections to FX3 memory addresses in a typical application. The size of the sections other than exception vectors and DMA
descriptors can be set according to the needs of the application.
Table 3-2. Memory Regions Used by an FX3 Application
Section Name     Description
Vectors          ARM exception vectors; needs to be located at the beginning of the I-TCM region
Text             All executable code for the application
Data             Explicitly initialized global data used by the application
BSS              Uninitialized global data used by the application
Stacks           Stack regions for the ARM processor operating modes (supervisor, user, IRQ, FIQ, abort, and undefined)
Heap             Run-time heap for dynamic allocation (malloc or new) of variables used by the application; this section
                 is optional and is only required if dynamic memory allocation (malloc, free, new, delete) is used
DMA descriptors  Memory region reserved for DMA transfer-related data structures; these structures are used by the FX3
                 hardware and have to be located at the beginning of the system RAM region
DMA buffer       Memory region reserved for DMA data buffers used by the application; as larger DMA buffers can improve
                 data transfer throughput, it is recommended that as much memory as is possible be allocated to this
                 section
Figure 3-3. Memory Map for Typical FX3 Firmware Application
The memory map for the firmware application is specified through a linker script file. The format of the linker script file used by
the GNU toolchain for ARM processors is described in the GNU linker (ld) documentation.
The FX3 Global Controller (GCTL) supports I/O configuration, clock management, power management, and watchdog timer
configuration. The GCTL features include the following:
■ Can generate up to 500-MHz master clock
■ Serves as a clock source for various peripherals (such as UIB, PIB, and LPP) in FX3
■ Supports simple and complex GPIO configuration
■ Supports special function (such as SPI and GPIF II) I/O configuration
■ Supports watchdog timer configuration
■ Supports power mode control and various wakeup source configuration
4.1 GPIO Pins
All 60 GPIO pins in FX3 can function as GPIOs. Each is multiplexed to support other functions/peripheral blocks (such as
UART, SPI, and so on). By default, the pins are allocated in groups to either one function block or the other, depending on the
interface mode, in their respective power domains. In a typical application, not all FX3 peripheral blocks are used. Also, not all
pins of the blocks being used are utilized. Unused pins in each block may be overridden as simple or complex GPIO pins on
a pin-by-pin basis.
Simple GPIOs provide software-controlled and observable input and output capability only. In addition, they can also raise
interrupts. Complex GPIOs add three timer/counter registers for each group and support a variety of time-based functions.
They work off a slow or fast clock. Complex GPIOs can also be used as general-purpose timers by the firmware. There are
eight complex I/O pin groups, the elements of which are chosen in a modulo 8 fashion (complex I/O group 0: GPIO 0, 8, 16;
complex I/O group 1: GPIO 1, 9, 17, and so on). Each group can have different complex I/O functions (such as PWM, one
shot, and so on). However, only one pin from a group can use the complex I/O functions. The rest of the pins in the group are
used as block I/O or simple GPIO. Refer to Table 7 in the
EZ-USB FX3 datasheet for the GPIO configuration options.
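The modulo-8 grouping described above can be expressed in a few lines of C. This is a minimal sketch, not SDK code: the helper names complex_gpio_group and complex_gpio_selection_ok are hypothetical, and the check only encodes the stated rule that a single pin per group can use the complex I/O functions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FX3_COMPLEX_GPIO_GROUPS 8u  /* eight complex I/O pin groups */

/* Complex I/O group of a GPIO number (GPIO 0, 8, 16 map to group 0). */
static unsigned complex_gpio_group(unsigned gpio)
{
    return gpio % FX3_COMPLEX_GPIO_GROUPS;
}

/* Returns true if no two pins in the list fall into the same group. */
static bool complex_gpio_selection_ok(const uint8_t *gpios, size_t count)
{
    bool used[FX3_COMPLEX_GPIO_GROUPS] = { false };

    for (size_t i = 0; i < count; i++) {
        unsigned group = complex_gpio_group(gpios[i]);
        if (used[group]) {
            return false;  /* a second pin from an already used group */
        }
        used[group] = true;
    }
    return true;
}
```

For example, GPIO 1 and GPIO 9 both land in group 1, so a design that needs complex functions on both pins would have to move one of them to a pin in a different group.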
4.1.1 I/O Matrix Configuration
I/O matrix configuration is used to configure the interface mode for I/O pins. The GCTL_IOMATRIX register must be configured before accessing any alternate function pins. Table 4-1 lists the I/O pin alternate functions.
GPIO Pin                            Alternate Function          Description
GPIO[0] to GPIO[15]                 DQ[0] to DQ[15]             GPIF II data pins
GPIO[16]                            PCLK or CLK                 GPIF II clock pin
GPIO[17] to GPIO[29]                CTL[x]                      GPIF II control pins
GPIO[30] to GPIO[32]                PMOD[x]                     Boot mode
GPIO[33] to GPIO[44]*               DQ[16] to DQ[27]            GPIF II data pins
GPIO[45]                            -                           No alternate function
GPIO[46] to GPIO[49]**              DQ[28] to DQ[31] or UART    GPIF II data pins or UART pins
GPIO[50] to GPIO[52] and GPIO[57]   I2S
GPIO[53] to GPIO[56]**              SPI or UART
GPIO[58] to GPIO[59]                I2C
* 24- or 32-bit GPIF II bus width is not supported by all FX3 chips. If the FX3 chip does not support more than a 16-bit bus
width, then alternate functions are not applicable. Refer to the EZ-USB FX3 datasheet for more details.
** If the GPIF II bus width is configured to 8, 16, or 24 bits, then UART lines are available on the GPIO[46] to GPIO[49] pins,
and SPI lines are available on the GPIO[53] to GPIO[56] pins. If the GPIF II bus width is configured to 32 bits, then UART
lines are available on the GPIO[53] to GPIO[56] pins, and SPI is not supported.
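The second footnote amounts to a small lookup from GPIF II bus width to serial pin placement. The sketch below restates it in C for illustration only; serial_pins_t and serial_pins_for_bus_width are hypothetical names, not part of the FX3 SDK, and the function reports nothing beyond what the footnote says.

```c
#include <stdbool.h>

typedef struct {
    bool uart_available;
    bool spi_available;
    int  uart_first_gpio;  /* first of the four UART pins, -1 if none */
    int  spi_first_gpio;   /* first of the four SPI pins, -1 if none */
} serial_pins_t;

/* Maps a GPIF II bus width (in bits) to the UART and SPI pin ranges. */
static serial_pins_t serial_pins_for_bus_width(int gpif_bus_width_bits)
{
    serial_pins_t pins = { false, false, -1, -1 };

    switch (gpif_bus_width_bits) {
    case 8:
    case 16:
    case 24:
        pins.uart_available  = true;
        pins.uart_first_gpio = 46;   /* GPIO[46] to GPIO[49] */
        pins.spi_available   = true;
        pins.spi_first_gpio  = 53;   /* GPIO[53] to GPIO[56] */
        break;
    case 32:
        pins.uart_available  = true;
        pins.uart_first_gpio = 53;   /* GPIO[53] to GPIO[56]; no SPI */
        break;
    default:
        break;                       /* width not listed in the footnote */
    }
    return pins;
}
```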
Table 4-2 lists the FX3 registers associated with GPIO pin control. These registers are described in detail in following tables.
Table 4-2. Registers Associated with GPIO Pins
GCTL_IOMATRIX                GCTL_DS
GCTL_WPU_CFG0                GCTL_WPU_CFG1
GCTL_WPD_CFG0                GCTL_WPD_CFG1
GCTL_GPIO_SIMPLE0            GCTL_GPIO_SIMPLE1
GCTL_GPIO_COMPLEX0           GCTL_GPIO_COMPLEX1
LPP_GPIO_ID                  LPP_GPIO_POWER
LPP_GPIO_PIN_STATUS(n)       LPP_GPIO_PIN_TIMER(n)
LPP_GPIO_PIN_THRESHOLD(n)    LPP_GPIO_SIMPLE(n)
LPP_GPIO_DRIVE_LO_EN         LPP_GPIO_DRIVE_HI_EN
LPP_GPIO_INPUT_EN            LPP_GPIO_PIN_INTR
LPP_GPIO_INVALUE0            LPP_GPIO_INVALUE1
LPP_GPIO_INTR0_REG           LPP_GPIO_INTR1_REG
See I/O Matrix Configuration Register on page 240
The FX3 SDK API CyU3PDeviceConfigureIOMatrix configures the GCTL_IOMATRIX register. A code snippet to configure the
I/O matrix follows. Refer to the SDK firmware example for the complete details on the IOMATRIX configuration.
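CyU3PIoMatrixConfig_t io_cfg;
CyU3PReturnStatus_t   status;

/* Configure the IO matrix for the device.
 * 32 bit bus width is disabled.
 * S0 port is disabled.
 * S1 port is disabled.
 * UART is enabled on S1 port.
 * IOs 43, 45, 52 and 57 are chosen as GPIO.
 */
io_cfg.isDQ32Bit = CyFalse;
io_cfg.s0Mode = CY_U3P_SPORT_INACTIVE;
io_cfg.s1Mode = CY_U3P_SPORT_INACTIVE;
io_cfg.gpioSimpleEn[0] = 0;
io_cfg.gpioSimpleEn[1] = 0x02102800;    /* Bits 11, 13, 20, and 25: IOs 43, 45, 52, and 57 */
io_cfg.gpioComplexEn[0] = 0;
io_cfg.gpioComplexEn[1] = 0;

/* Set the remaining io_cfg fields as in the SDK firmware example before this call. */
status = CyU3PDeviceConfigureIOMatrix (&io_cfg);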
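The value 0x02102800 in the snippet is consistent with a one-bit-per-pin encoding in which the first word covers GPIO 0 to 31 and the second word covers GPIO 32 to 59. Under that assumption, the two enable words can be built from a list of pin numbers as sketched below. The helper build_gpio_enable_words is hypothetical and is not an SDK function.

```c
#include <stddef.h>
#include <stdint.h>

#define FX3_GPIO_COUNT 60u

/* Builds the two 32-bit enable words from a list of GPIO numbers.
 * Assumption: word 0 holds GPIO 0-31 and word 1 holds GPIO 32-59,
 * one bit per pin. Out-of-range pin numbers are ignored. */
static void build_gpio_enable_words(const uint8_t *gpios, size_t count,
                                    uint32_t words[2])
{
    words[0] = 0u;
    words[1] = 0u;

    for (size_t i = 0; i < count; i++) {
        if (gpios[i] >= FX3_GPIO_COUNT) {
            continue;
        }
        words[gpios[i] / 32u] |= (uint32_t)1u << (gpios[i] % 32u);
    }
}
```

Passing the pins 43, 45, 52, and 57 produces 0 for the first word and 0x02102800 for the second, which matches the values assigned to gpioSimpleEn[0] and gpioSimpleEn[1] above.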
The drive strength for I/O pins is programmable even if the pin is configured for an alternate function. The I/O pin drive
strength can be set to quarter strength, half strength, three-quarter strength, or full strength by configuring the appropriate bits
in the GCTL_DS register.
See I/O Drive Strength Configuration Register on page 245.
In the FX3 SDK, I/Os on the FX3 device are grouped based on function into multiple interfaces (GPIF, I2C, I2S, SPI, UART).
The I/O drive strength for each group can be separately configured using APIs.
■ CyU3PSetPportDriveStrength
■ CyU3PSetI2cDriveStrength
■ CyU3PSetGpioDriveStrength
■ CyU3PSetSerialIoDriveStrength
4.1.3 GPIO Pull-up and Pull-down
FX3 supports internal weak (50-kΩ) pull-ups and pull-downs on its I/O pins. A weak pull-up on an I/O can be enabled by setting the GCTL_WPU_CFGx registers. A weak pull-down on an I/O can be enabled by setting the GCTL_WPD_CFGx registers. The default
state of the I/Os at power-on and after reset is tristate.
See the following:
■ GCTL_WPU_CFG on page 247
■ GCTL_WPD_CFG on page 249
The FX3 SDK API CyU3PGpioSetIoMode is used to set pull-up or pull-down on an I/O pin.
4.1.4 Simple GPIO Override
FX3 supports the simple GPIO override option to configure any I/O as a simple GPIO pin. Register CY_U3P_GCTL_GPIO_SIMPLE is used to set the simple GPIO override. Data pins DQ[0] to DQ[31] can be configured as simple GPIOs through the I/O
matrix configuration and do not need the override, whereas all other GPIO pins need the override to function as simple
GPIOs. Refer to GPIO on page 192 for more details on GPIO configuration.
See GCTL_GPIO_SIMPLE on page 241.
The FX3 SDK API CyU3PDeviceGpioOverride is used to configure the CY_U3P_GCTL_GPIO_SIMPLE register.
4.1.5 Complex GPIO Override
FX3 supports the complex GPIO override option to configure any I/O as a complex GPIO pin. Register CY_U3P_GCTL_GPIO_COMPLEX is used to set the complex GPIO override. Refer to GPIO on page 192 for more details on GPIO configuration.
See GCTL_GPIO_COMPLEX on page 243.
The FX3 SDK API CyU3PDeviceGpioOverride is used to configure the CY_U3P_GCTL_GPIO_COMPLEX register.
Three GCTL registers-GCTL_IOPOWER, GCTL_IOPWR_INTR, and GCTL_IOPWR_INTR_MASK-allow the firmware to
monitor the power status of various I/O blocks.
4.1.6.1 GCTL_IOPOWER
See GCTL_IOPOWER on page 251.
4.1.6.2 GCTL_IOPWR_INTR
See GCTL_IOPOWER_INTR on page 253.
4.1.6.3 GCTL_IOPWR_INTR_MASK
See GCTL_IOPOWER_INTR_MASK on page 255.
4.2 Clock Management
Clocks for FX3 peripherals can be configured using the GCTL registers. To enable an FX3 functional block (UART, SPI, and
so on), configure the corresponding clock. The clock can be disabled for the unused functional blocks to save power.
As shown in Figure 4-1, CLKIN and PLL clock are the two input clock sources to the GCTL block. CLKIN is the chip reference
clock provided by a 19.2-MHz, 26-MHz, 38.4-MHz, or 52-MHz external clock source or a 19.2-MHz crystal. It is used as the
source clock for the PLL, which generates a master clock at frequencies up to 500 MHz. This PLL output clock is used to generate all the core clocks in the system.
Programmable dividers generate clocks in GCTL (except the blocks that contain their own PLL, for example, USB block). All
generated clocks have a configurable divide capability and on/off programmability.
Four system clocks are obtained by dividing the master clock by 1, 2, 4, and 16. The system clocks are then used to generate
clocks for most peripherals in the device through the respective Clock Select and Divide (CSD) block. A CSD block is used to
select one of the four system clocks and then divide it using the specified divider value. The depth of the divider is different for
different peripherals.
The CPU clock is derived by selecting and dividing one of the four system clocks by an integer factor between 1 and 16. The
bus clocks are derived from the CPU clock. Independent 4-bit dividers are provided for both the DMA and MMIO bus clocks.
The frequency of the MMIO clock, however, must be an integer divide of the DMA clock frequency. It is not recommended to
drop the frequency of the CPU clock and DMA clock while the device is handling any data traffic.
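The clock derivation above can be checked with simple integer arithmetic. The sketch below is illustrative only: csd_output_hz and mmio_divider_ok are hypothetical helpers, the dividers are the effective divide ratios rather than raw register field values, and individual clocks may carry further restrictions that this check does not model.

```c
#include <stdbool.h>
#include <stdint.h>

/* The four system clocks are the master clock divided by 1, 2, 4, and 16. */
static const uint32_t sys_clk_divisors[4] = { 1u, 2u, 4u, 16u };

/* Clock Select and Divide: select one system clock, then divide it.
 * Returns 0 for an invalid selection or a zero divider. */
static uint32_t csd_output_hz(uint32_t master_hz, unsigned sys_clk_sel,
                              uint32_t divider)
{
    if (sys_clk_sel >= 4u || divider == 0u) {
        return 0u;
    }
    return (master_hz / sys_clk_divisors[sys_clk_sel]) / divider;
}

/* Both bus clocks are divided down from the CPU clock, so the MMIO clock
 * is an integer divide of the DMA clock exactly when the MMIO divider is
 * a multiple of the DMA divider. Dividers above 16 do not fit 4 bits. */
static bool mmio_divider_ok(uint32_t dma_divider, uint32_t mmio_divider)
{
    if (dma_divider == 0u || dma_divider > 16u ||
        mmio_divider == 0u || mmio_divider > 16u) {
        return false;
    }
    return (mmio_divider % dma_divider) == 0u;
}
```

With an assumed 384-MHz master clock, selecting the divide-by-1 system clock and a CPU divider of 2 gives a 192-MHz CPU clock. A DMA divider of 2 with an MMIO divider of 4 passes the check, while a DMA divider of 2 with an MMIO divider of 3 does not.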
A 32-kHz external clock source is used for low-power operation during standby. In the absence of a 32-kHz input clock
source, the application can derive it from the input reference clock.
Certain peripherals deviate from the clock derivation described above. The fast clock source of GPIO is derived from the system clocks using a CSD. The core clock is a fixed division of the fast clock. The slow clock source of GPIO is obtained directly
from the reference clock using a programmable divider. The standby clock is used to implement wakeup on GPIO. The I2S
block can be run off an internal clock derived from the system clocks or from an external clock sourced through the I2S_MCLK pin of the device.
Exceptions to the general clock derivation strategy are blocks that contain their own PLL, because they include a PHY that
provides its clock reference, for example, the USB2PHY and the USB3PHY. Refer to Universal Serial Bus (USB) chapter on
page 78 for more details on the USB block clocking.
The CPU, DMA, and MMIO clock domains are synchronous to each other. However, every peripheral assumes its core clock
to be fully asynchronous from other peripheral core clocks, the computing clock, or the wakeup clock.
Table 4-3. Registers Associated with Clock Management
CY_U3P_GCTL_PLL_CFG          CY_U3P_GCTL_GPIO_FAST_CLK
CY_U3P_GCTL_CPU_CLK_CFG      CY_U3P_GCTL_GPIO_SLOW_CLK
CY_U3P_GCTL_UIB_CORE_CLK     CY_U3P_GCTL_I2C_CORE_CLK
CY_U3P_GCTL_PIB_CORE_CLK     CY_U3P_GCTL_UART_CORE_CLK
CY_U3P_GCTL_SIB0_CORE_CLK    CY_U3P_GCTL_SPI_CORE_CLK
CY_U3P_GCTL_SIB1_CORE_CLK    CY_U3P_GCTL_I2S_CORE_CLK
4.3 Power Management
Table 4-4. Registers Associated with Power Management
Power supply domains in FX3 can be classified in four categories: core power domain, memory power domain, I/O power
domain, and always-on power domain.
The core power domain encompasses a large section of the device, including the CPU, peripheral logic, and interconnect fabric. The system SRAM resides in the memory power domain. I/O logic dwells in the respective peripheral I/O power domain.
The peripheral I/O power domain includes the I2C-IO power domain, I2S-IO power domain, UART-IO power domain, SPI
power domain, IO-GPIO power domain, Clock IO power domain, USB IO power domain, and Processor Port IO power
domain. The always-on power domain hosts the power management controller, the wakeup sources, and their associated
logic.
Wakeup sources force a system in the suspend or standby state to switch to the normal power operation mode. These are
distributed across peripherals and configured in the always-on global configuration block. Some of them include level match
on level sensitive wakeup I/Os, toggle on edge sensitive wakeup I/Os, activity on the USB 2.0 data lines, OTG ID change,
LFPS detection on USB 3.0 RX lines, USB connect event, and watchdog timer-timeout event.
The always-on global configuration block runs off the standby clock and is turned off only in the lowest power state (core
power down).
At any instant, FX3 is in one of the four power modes: normal, suspend, standby, or core power down. When FX3 is actively
executing its tasks, the system is in normal mode. The clock gating techniques in peripherals minimize the overall power consumption.
On detecting prolonged periods of inactivity, the firmware can place FX3 in suspend mode. All ongoing port (peripheral) activities/transfers are completed, ports are disabled, and wakeup sources are set before entering the suspend state. In applications involving USB 3.0, the USB3 PHY is forced into the U3 state. USB2PHY, if used, is forced into suspend. The system
RAM transitions to a low-power mode in which read and write to RAM cannot be performed. The CPU is forced into the halt
state. The ARM core retains its state, including its program counter. All clocks except the 32-kHz standby are turned off by
disabling the system PLL through the global configuration block. In the absence of clocks, the I/O pins can be frozen to retain
their state as long as the I/O power domain is not turned off. The INT# pin can be configured to indicate that FX3 is
in low-power mode.
Further reduction in power is achieved by having the firmware place FX3 into the standby state, where in addition to disabling
clocks, the core power domain is turned off. As in suspend mode, the I/O states of powered peripheral I/O domains are frozen
and the ports are disabled. The essential configuration registers of logic blocks are first saved to the system RAM. Then the
system RAM itself is forced into the low-power memory retention only mode. The warm boot setting is enabled in the global
configuration block. Finally, the core is powered down. When FX3 comes out of standby, the CPU goes through a reset; the
bootloader senses the warm boot mode and restores the system to its original state after reloading the configuration values
(including the firmware resume point) from the system RAM.
Optionally, FX3 can be placed in core powered-down mode from standby mode, which also involves removing power from the
VDD pins. The contents of system SRAM are lost, and I/O pins retain their states, if suitably configured in the firmware. When
power is reapplied to the VDD pins, FX3 performs the normal power-on reset (POR) sequence.
4.3.3 Reset
Resets in FX3 are classified into two categories: hard reset and soft reset.
4.3.4 Hard Reset
A POR or a Reset# pin assertion initiates a hard reset. This sets all register bits to their default states and restarts the program.
4.3.5 Soft Reset
A soft reset is generated by setting the appropriate bits in the GCTL_CONTROL register. There are two types of soft resets:
CPU reset and whole device reset.
■ CPU reset: Resets the CPU program counter. The firmware does not need to be reloaded following a CPU reset.
■ Whole device reset: Identical to a hard reset. The firmware must be reloaded following a whole device reset.
At the heart of the FX3 is a sophisticated, distributed DMA controller that is capable of moving data at 800 MBps and allows
high-performance data transfers between memories and peripherals without CPU intervention. Multiple Advanced High-performance Buses (AHB, as defined by the ARM System Architecture) are used to interconnect the system elements. The
EZ-USB FX3 device architecture includes a DMA fabric that is used to route data between various peripheral interfaces and/or
the system memory of the device.
This chapter focuses on FX3 DMA transfer basics and the registers FX3 firmware uses to initialize and initiate DMA transfers.
For a more advanced and practical DMA usage model, including types of DMA channels and common data transfer
scenarios, refer to the “DMA Engine” section in the “FX3 Firmware” chapter of the FX3 Programmers Manual.
5.2 DMA Features
The DMA subsystem in the FX3 device includes the following features:
■ Distributed DMA controllers
■ Data transfer support in either direction between:
❐ Memory and peripheral
❐ Peripheral and peripheral
❐ Two gateways (virtual ports) of the same peripheral
■ Localized DMA adapter (local DMA controller) to each peripheral
5.3 DMA Block Diagram
Non-CPU-intervened data transfers between a peripheral and the CPU (system memory), between two different peripherals, or
between two different gateways of the same peripheral are collectively referred to as DMA in FX3. All the data in the DMA
subsystem flows through the system memory.
The Advanced Microcontroller Bus Architecture - Advanced High-performance Bus (AMBA AHB) interconnect forms the
central nervous system of FX3. More details on AHB are available in the “Interconnect Fabric” section in the “FX3
Overview” chapter of the FX3 Programmer's Manual. Figure 5-1 shows how the CPU accesses the System Memory using the
System AHB. All peripheral DMA paths connect to the DMA AHB. Bridges between the System bus and the DMA bus are
essential in routing the DMA traffic through the System memory. The width of a peripheral connection to the AHB determines
its throughput. The peripheral core implements the actual logic of the peripheral (I2C, GPIF, and USB).
Figure 5-1. Block Diagram of FX3 DMA Subsystem
[Figure blocks: Peripheral 1 and Peripheral 2 (each with peripheral core logic, a DMA adapter, an I/O matrix, and I/O pads), Peripheral 1 AHB, Peripheral 2 AHB, DMA AHB, Bridge, System AHB, CPU, and System Memory.]
5.4 DMA Overview
FX3 contains a standard, configurable, DMA adapter that is replicated for each DMA-capable peripheral, as depicted in
Figure 5-2. This architecture provides the FX3 distributed DMA controllers. There is only one DMA adapter for all low-
performance peripherals. The DMA adapter is essentially a local DMA controller that initiates DMA transactions to and from
the system memory on behalf of the peripheral that it services. With hardware synchronization between DMA adapters, data
transfers can occur seamlessly between peripherals.
The FX3 DMA subsystem runs on an internal DMA bus clock, dma_bus_clk_i, that is divided down from the CPU clock. The
DMA bus clock divider value is determined by the DMA_DIV field of GCTL_CPU_CLK_CFG, as shown below:
DMA Bus clock divider = (GCTL_CPU_CLK_CFG.DMA_DIV + 1)
See GCTL_CPU_CLK_CFG register on page 260
Divide by 1 (that is, GCTL_CPU_CLK_CFG.DMA_DIV = 0) is illegal and will result in undefined behavior. Thus, the range of
allowed divider values is 2 to 16 (GCTL_CPU_CLK_CFG.DMA_DIV = 1 to 15).
The dma_bus_clk_i frequency is typically set to one-half the CPU clock during device initialization. For example, if the CPU clock
is set to 192 MHz, the register setting GCTL_CPU_CLK_CFG.DMA_DIV = 1 (divider = 2) results in a
dma_bus_clk_i frequency of 96 MHz. Table 5-1 summarizes the DMA clock information.
Note: The maximum DMA clock frequency is half of the CPU clock frequency, as the minimum allowed divider value is '2'.
Therefore, reducing the CPU clock frequency will result in reducing the DMA clock frequency and limit system performance.
The default clock settings for FX3 are:
■ CPU clock = System clock / 2
■ DMA clock = CPU clock / 2
It is recommended that the default clock settings be retained in all cases.
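The divider rule above (divider = DMA_DIV + 1, with DMA_DIV = 0 not allowed) can be captured in a small helper. This is a sketch for checking settings offline, not SDK code. The function name dma_bus_clk_hz is hypothetical, and it takes the raw field value rather than accessing the register.

```c
#include <stdint.h>

/* Returns the DMA bus clock in Hz for a raw GCTL_CPU_CLK_CFG.DMA_DIV
 * field value, or 0 if the value is outside the allowed range 1 to 15. */
static uint32_t dma_bus_clk_hz(uint32_t cpu_clk_hz, uint32_t dma_div_field)
{
    if (dma_div_field < 1u || dma_div_field > 15u) {
        return 0u;  /* divide by 1 is illegal; the field is 4 bits wide */
    }
    return cpu_clk_hz / (dma_div_field + 1u);
}
```

For the 192-MHz CPU clock used in the example above, a field value of 1 yields 96 MHz, and a field value of 0 is rejected.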
Table 5-1. DMA Clock
Domain    Typ/Max Freq    Configuration Register    Source    Description
The CPU, DMA, and MMIO clock domains are synchronous to each other. However, every peripheral assumes its core clock
to be fully asynchronous from other peripheral core clocks, the computing clock, or the wakeup clock.
If the core (peripheral) clock is faster than the bus clock, the DMA adapter for the block runs in the core clock domain and the
DMA adapter reconciles the clocks on its interconnect side. If the core clock is slower than the bus clock, the DMA adapter for
that block runs in the bus clock domain and the DMA adapter reconciles the clocks on its core IP side. This is shown in
Figure 5-3.
Figure 5-3. DMA Adapter Clock
5.5.2 Descriptors, Buffers, and Sockets
DMA descriptors are DMA instructions in a set of registers allocated in the FX3 RAM. A DMA descriptor holds information
about the address and size of the DMA buffer as well as pointers to the next DMA Descriptor. These pointers create DMA
descriptor chains. Descriptors enable the synchronization between sockets as described below.
A DMA buffer is a section of RAM used for intermediate storage of data transferred through the FX3 device. DMA buffers are
allocated from the system RAM by the FX3 firmware; their addresses are stored as part of DMA descriptors. Every buffer
created in the system memory has a descriptor associated with it that contains buffer information such as its address, empty/
full status, and the next buffer/descriptor in the chain.
A socket is a point of connection between a peripheral hardware block and the FX3 RAM. Each peripheral hardware block on
FX3 such as USB, GPIF, UART, and SPI has a fixed number of sockets associated with it. The number of parallel data
channels through a peripheral is equal to the number of its sockets. The socket implementation includes a set of registers that
point to the active DMA descriptor and the enable or flag interrupts associated with the socket. Sockets can directly signal
each other through events or they can signal the FX3 CPU via interrupts. This signaling is configured by firmware.
5.5.3 DMA Descriptors
Descriptors are data structures that keep track of the resources (memory buffers and sockets) used for a DMA transfer. This
data structure is directly interpreted by the DMA hardware on FX3, and has to be located in a specific memory region of the
FX3 RAM as described in the Memory and System Interconnect chapter on page 44. During a transfer, descriptors are loaded
into the active socket one at a time for execution.
Figure 5-4. FX3 DMA Descriptor Structure
DSCR_BUFFER    BUFFER_ADDRESS
DSCR_SYNC      PROD_IP_NUM, PROD_SCK, CONS_IP_NUM, CONS_SCK
DSCR_CHAIN     PROD_NEXT_DSCR, CONS_NEXT_DSCR
DSCR_SIZE      BYTE_COUNT, BUFFER_SIZE
Each descriptor contains four 32-bit words. Figure 5-4 details the fields of the data structure.
DSCR_BUFFER provides the pointer of the buffer used for this transfer.
DSCR_SYNC defines the source and destination of this transfer. Depending on the use case, the event generation and
interrupt can be enabled for either or both the consumer and producer half. Events are used to signal the peer socket,
whereas interrupts are used to notify the CPU.
A unique IP_NUM, which is assigned to each DMA-capable peripheral, is used for the source (producer) and destination
(consumer) of the transfer.
IP_NUM together with the socket number determines the actual socket identity. Table 5-2 details how sockets are identified
by IP identification and socket number.
DSCR_CHAIN contains the pointers to next descriptors for the producer and consumer respectively.
DSCR_SIZE defines the size of the buffer and transfer byte count. The specific transfer status can also be set or monitored
by the following bits:
■ BUFFER_OCCUPIED indicates that data is available in the associated buffer.
■ BUFFER_ERROR indicates whether the data is valid or in error.
■ EOP marks this transfer with end-of-packet. The socket can use this to suspend an EOP condition.
■ MARKER is simply a software indicator for this specific transfer.
See the following register descriptions:
■ DSCR_BUFFER on page 616
■ DSCR_SYNC on page 617
■ DSCR_CHAIN on page 619
■ DSCR_SIZE on page 620
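To make the producer and consumer binding in DSCR_SYNC concrete, the sketch below packs the four identity fields into one 32-bit word. The bit positions are an assumption made for illustration (CONS_SCK in bits 7:0, CONS_IP_NUM in bits 13:8, PROD_SCK in bits 23:16, PROD_IP_NUM in bits 29:24) and are not stated in this section. Confirm them against the DSCR_SYNC register description before relying on them. The function names are hypothetical.

```c
#include <stdint.h>

/* Assumed field positions; verify against the DSCR_SYNC description. */
#define SYNC_CONS_SCK_POS   0u
#define SYNC_CONS_IP_POS    8u
#define SYNC_PROD_SCK_POS  16u
#define SYNC_PROD_IP_POS   24u
#define SYNC_IP_MASK     0x3Fu  /* IP_NUM values such as 0x3F fit in 6 bits */

/* Packs producer and consumer socket identities into one sync word.
 * The event and interrupt enable bits are left cleared. */
static uint32_t dscr_sync_pack(uint8_t prod_ip, uint8_t prod_sck,
                               uint8_t cons_ip, uint8_t cons_sck)
{
    return ((uint32_t)(prod_ip & SYNC_IP_MASK) << SYNC_PROD_IP_POS) |
           ((uint32_t)prod_sck << SYNC_PROD_SCK_POS) |
           ((uint32_t)(cons_ip & SYNC_IP_MASK) << SYNC_CONS_IP_POS) |
           ((uint32_t)cons_sck << SYNC_CONS_SCK_POS);
}

/* Extracts the producer IP_NUM from a packed sync word. */
static uint8_t dscr_sync_prod_ip(uint32_t sync)
{
    return (uint8_t)((sync >> SYNC_PROD_IP_POS) & SYNC_IP_MASK);
}
```

Under these assumptions, a transfer produced by socket 0 of the IP with IP_NUM 0x01 and consumed by socket 1 of the IP with IP_NUM 0x03 packs to 0x01000301.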
The following C code example is the DMA descriptor data structure used in the FX3 SDK.
#include <stdint.h>    /* uint8_t, uint32_t */

/** \brief Descriptor data structure.

    **Description**\n
    This data structure contains the fields that make up a DMA descriptor on
    the FX3 device.

    Each structure member is composed of multiple fields as shown below. Refer
    to the sock_regs.h header file for the definitions used.

    \see CyU3PDmaDscrGetConfig
    \see CyU3PDmaDscrSetConfig
 */
typedef struct CyU3PDmaDescriptor_t
{
    uint8_t *buffer;    /**< Pointer to buffer used. */
    uint32_t sync;      /**< Consumer, Producer binding. */
    uint32_t chain;     /**< Next descriptor links. */
    uint32_t size;      /**< Current and maximum sizes of buffer. */
} CyU3PDmaDescriptor_t;
The actual DMA descriptors are maintained in the FX3 System RAM starting from address 0x40000010. The end of this
space is not defined by hardware and is determined by the firmware. There are a fixed number of fixed-size descriptors in a
fixed location in the main memory. Because of this, the hardware can directly access a descriptor with its descriptor number.
Firmware is responsible for setting up each of the descriptors and linking them to each other and to the sockets. DMA
adapters will load the content of the active DMA descriptor in the socket register region as and when required.
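Because descriptors have a fixed size (four 32-bit words) and a fixed location, a descriptor number maps to an address by plain arithmetic. The sketch below assumes that descriptor n sits at the start of system RAM plus 16 × n, which places descriptor 1 at the 0x40000010 address given above, and that descriptor number 0 is not used. Both points are inferences from the text, and dscr_address is a hypothetical helper.

```c
#include <stdint.h>

#define DSCR_REGION_BASE  0x40000000u  /* start of the system RAM */
#define DSCR_SIZE_BYTES   16u          /* four 32-bit words per descriptor */

/* Returns the system RAM address of descriptor 'index'.
 * Assumption: index 0 is unused, so 0 is returned for it. */
static uint32_t dscr_address(uint16_t index)
{
    if (index == 0u) {
        return 0u;
    }
    return DSCR_REGION_BASE + (uint32_t)index * DSCR_SIZE_BYTES;
}
```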
5.5.4 DMA Buffer
DMA buffers are data buffers allocated in the system memory used for DMA. They can be of any size within the memory
region and byte aligned. However, if the ARM data cache is enabled, the full buffer must be 32-byte aligned
and of a size that is a multiple of 32 bytes.
A data packet contained in the buffer may be smaller than one buffer or even of zero size. A packet may be split over multiple
buffers, or multiple packets can be in one buffer.
Every buffer created in the System memory has a descriptor associated with it. Descriptors are thus associated with buffers
and have a producer, a consumer or both, or neither. A descriptor without a producer and consumer is called "free".
Synchronization between producers and consumers happens at the level of descriptors and buffers. Multiple descriptors may
refer to the same buffer. When more than one descriptor is used to describe the same buffer, it is the responsibility of
the firmware to make sure that the descriptors are coherent and synchronized with respect to each other. Descriptors contain
meta-data about the data in the buffer (in addition to the location and size), in particular the number of valid bytes (refer to
Figure 5-4 and DSCR_SIZE on page 620).
Since a 12-bit field (DSCR_SIZE.BUFFER_SIZE) is used to indicate the size of the DMA buffers in multiples of 16 bytes, the
maximum size of an individual DMA buffer is 0xFFF × 16 = 0xFFF0 bytes (65,520 bytes). Multiple such DMA buffers
can be allocated for a single DMA transfer.
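The size limit follows directly from the field width and the 16-byte unit. The following sketch converts a buffer size in bytes to the 12-bit field value and rejects sizes that cannot be represented. The helper name dma_buffer_size_field is hypothetical, and the position of the field inside DSCR_SIZE is deliberately not modeled.

```c
#include <stdint.h>

#define DMA_BUF_SIZE_UNIT       16u     /* field counts multiples of 16 bytes */
#define DMA_BUF_SIZE_FIELD_MAX  0xFFFu  /* largest 12-bit value */
#define DMA_BUF_SIZE_MAX_BYTES  (DMA_BUF_SIZE_FIELD_MAX * DMA_BUF_SIZE_UNIT)

/* Returns the 12-bit BUFFER_SIZE field value for a size in bytes, or -1
 * if the size is zero, not a multiple of 16, or larger than 0xFFF0. */
static int32_t dma_buffer_size_field(uint32_t size_bytes)
{
    if (size_bytes == 0u ||
        (size_bytes % DMA_BUF_SIZE_UNIT) != 0u ||
        size_bytes > DMA_BUF_SIZE_MAX_BYTES) {
        return -1;
    }
    return (int32_t)(size_bytes / DMA_BUF_SIZE_UNIT);
}
```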
FX3 uses the same 512 KB of system RAM for code and data storage as well as for memory buffers that are used for DMA
transfers into or out of the device. The ARM926EJ-S core on the device also includes an 8 KB data cache. The data cache is
four-way set associative with a cache-line size of 32 bytes and two dirty bits (one for each 16-byte region) per cache line. The
use of the data cache will give a good performance boost to any application that involves firmware access of the data buffer
contents. However, this also leads to a risk of memory corruption in the cases where a cache line contains both software data
structures and DMA data content. The following three sub-sections deal with this issue.
5.5.4.2 Memory Corruption Due to Cache Line Overlap
Consider the scenario represented in Figure 5-5. In this case, both cache lines 0 and 3 have an overlap of firmware data
along with the DMA buffer space. If the DMA buffer is being filled with data by any of the hardware blocks on the device, and
the firmware needs to access this data, memory corruption can happen as described below.
1. If the firmware tries to access the data in the buffer directly, stale memory content from the cache will be retrieved.
2. If the firmware flushes the cache line(s) and then accesses the data in the buffer, any updates to the firmware data structures will be lost.
3. If the firmware cleans the cache line(s) and then accesses the data buffer, the part of DMA buffer that shares a cache line
with the firmware data will get corrupted.
Figure 5-5. Unsafe Overlap of Firmware Data with DMA Buffer
The firmware needs to ensure that the software data structures and DMA buffers do not share cache lines to prevent these
problems.
5.5.4.3 Safe Usage of Data Cache
Consider the scenario in Figure 5-6. In this case, the DMA buffer does not share any cache lines with the firmware data.
Figure 5-6. Safe Overlap of Firmware Data with DMA Buffer
In this case, there is no possibility of the CPU and/or hardware seeing bad data due to the data cache. Whenever the CPU
wants to read a data buffer that has been filled by the DMA hardware, it can flush the corresponding region from the cache
and then initiate a read. Whenever the CPU wants to commit a buffer containing data for an egress DMA operation, it can
clean the region from the cache and then initiate the DMA operation.
5.5.4.4 Alignment Requirement - How Not to Share Cache Lines
The basic requirement is that DMA buffers that may be modified by the hardware should not share a cache line with software
data elements. This translates to a requirement that all DMA buffers which may be used for ingress (data coming in to FX3
and being written into memory by the DMA hardware) transfers should be 32-byte aligned and occupy an integral number of
cache lines (32 bytes each).
This restriction only applies to DMA buffers that may be used for ingress data transfers. The restriction does not apply to DMA
buffers that are used only as a source of egress data.
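The alignment rule for ingress buffers can be enforced with two small helpers: one that rounds a requested size up to a whole number of cache lines, and one that checks an address and size pair. This is a minimal sketch with hypothetical names. It does not allocate memory, and it assumes the 32-byte cache line size stated above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_BYTES 32u

/* Rounds a size up to the next multiple of the cache line size. */
static size_t round_up_to_cache_line(size_t size)
{
    return (size + CACHE_LINE_BYTES - 1u) & ~(size_t)(CACHE_LINE_BYTES - 1u);
}

/* True if the buffer starts on a cache line boundary and covers a
 * non-zero, integral number of cache lines. */
static bool is_cache_safe_buffer(const void *addr, size_t size)
{
    return ((uintptr_t)addr % CACHE_LINE_BYTES) == 0u &&
           size != 0u &&
           (size % CACHE_LINE_BYTES) == 0u;
}
```

A request for 100 bytes would be rounded up to 128 bytes, so that the last cache line of the buffer is not shared with any other data.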
5.5.5 Sockets
A socket is the unidirectional virtual port (gateway) used by a peripheral (IP) block to transfer data to/from the system SRAM.
Each DMA transfer involves one or two sockets. A socket represents either the consuming or the producing half of a transfer.
For a transfer from one peripheral to another, two sockets are involved. A socket is either a consuming socket or a producing
socket at any point in time, not both at the same time.
An FX3 DMA-capable peripheral has multiple sockets in the DMA adapter. The number of sockets and their properties
depend on the specific DMA adapter to the peripheral. Each peripheral block (IP block) in the device can support a predefined
number of sockets which is the maximum number of independent data flows that can be done through that IP at a given point
of time. A producer (ingress) socket is one which moves data from the IP block to the system SRAM. A consumer (egress)
socket is one which takes data from the system SRAM and moves it out through the IP block. Each socket can be identified
with the IP number and the socket number. Table 5-2 is the socket summary for each FX3 peripheral.
Note: Separate DMA adapters are used for USB IN and OUT endpoints to allow greater USB data bandwidth. Interrupts from
both adapters are combined into a single interrupt vector.
Table 5-2. Peripheral DMA Sockets
FX3 Peripheral/IP Block                    IP_NUM  Socket Count  Specific Property                          Notes
GPIF II                                    0x01    32            Socket 0-15: Bidirectional
                                                                 Socket 16-31: Ingress only
USB                                        0x03    16            Socket 0-15 for USB Egress: Egress only    This maps to IN endpoints where FX3 sends data out.
USB-IN                                     0x04    16            Socket 0-15 for USB Ingress: Ingress only  This maps to OUT endpoints where FX3 receives data.
Storage                                    0x02    8             All are bidirectional                      FX3S only
Serial Peripherals (UART, I2C, I2S, SPI)   0x00    8             Socket 0-1 for I2S: Egress only            The purpose of each socket in this adapter is fixed and cannot be changed.
                                                                 Socket 2 for I2C data out: Egress only
                                                                 Socket 3 for UART data out: Egress only
                                                                 Socket 4 for SPI data out: Egress only
                                                                 Socket 5 for I2C data in: Ingress only
                                                                 Socket 6 for UART data in: Ingress only
                                                                 Socket 7 for SPI data in: Ingress only
CPU                                        0x3F    2             Socket 0 for CPU data in
                                                                 Socket 1 for CPU data out
Each socket has its own register set that the firmware uses to control the DMA operations. The socket register base address
is located at offset 0x8000 from the base address of the peripheral.
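The IP_NUM and socket count columns of Table 5-2 are enough to validate a socket number before it is used. The lookup below is an illustrative sketch whose function names are hypothetical. It only restates the table and does not check socket direction.

```c
#include <stdbool.h>
#include <stdint.h>

/* Number of sockets for an IP_NUM from Table 5-2, or 0 if not listed. */
static int socket_count_for_ip(uint8_t ip_num)
{
    switch (ip_num) {
    case 0x00: return 8;   /* serial peripherals (UART, I2C, I2S, SPI) */
    case 0x01: return 32;  /* GPIF II */
    case 0x02: return 8;   /* storage (FX3S only) */
    case 0x03: return 16;  /* USB egress */
    case 0x04: return 16;  /* USB-IN (ingress) */
    case 0x3F: return 2;   /* CPU */
    default:   return 0;
    }
}

/* True if the socket number exists on the given IP block. */
static bool socket_is_valid(uint8_t ip_num, uint8_t sck_num)
{
    return (int)sck_num < socket_count_for_ip(ip_num);
}
```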
Each register set occupies a 128-byte address space with some gaps in between. Figure 5-7 details the socket registers and
their field definitions.
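Given the 0x8000 offset and the 128-byte register set size, the address of a socket register can be computed as sketched below. The sketch assumes that the register sets of consecutive sockets are laid out back to back, which the text does not state explicitly. The register offsets are the ones labeled in Figure 5-7. The function name and the peripheral base address passed to it are placeholders.

```c
#include <stdint.h>

#define SOCKET_REGS_OFFSET  0x8000u  /* from the peripheral base address */
#define SOCKET_REGS_STRIDE  0x80u    /* 128 bytes per socket register set */

/* Register offsets within one socket register set (from Figure 5-7). */
enum {
    SCK_OFF_DSCR_BUFFER = 0x20,
    SCK_OFF_DSCR_SYNC   = 0x24,
    SCK_OFF_DSCR_CHAIN  = 0x28,
    SCK_OFF_DSCR_SIZE   = 0x2C,
    SCK_OFF_EVENT       = 0x7C
};

/* Address of a register in socket 'sck_num' of a peripheral.
 * Assumption: socket n starts at offset 0x8000 + n * 128. */
static uint32_t socket_reg_addr(uint32_t periph_base, uint32_t sck_num,
                                uint32_t reg_offset)
{
    return periph_base + SOCKET_REGS_OFFSET +
           sck_num * SOCKET_REGS_STRIDE + reg_offset;
}
```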
SCK_DSCR on page 605 describes the current descriptor to be loaded for this socket.
SCK_SIZE on page 607 sets the amount of data to be transferred. A zero value in this register means the data amount is
SCK_INTR_MASK
0x18
0x1C
0x20DSCR_BUFFER
0x24DSCR_SYNC
0x28DSCR_CHAI N
0x2CDSCR_SIZE
0x30
…
0x78
0x7C
EVENT_ Type
EVENT
T
T
SCK_COUNT on page 608 reports the amount of data transferred.
SCK_STATUS on page 609 is the socket control and status register.
SCK_INTR on page 612 is the interrupt request register that indicates the status of the interrupt source. The same register is
used to clear the interrupt status.
SCK_INTR_MASK on page 614 is used to enable the interrupt source to CPU.
DSCR_BUFFER on page 616, DSCR_SYNC on page 617, DSCR_CHAIN on page 619, and DSCR_SIZE on page 620
indicate the currently loaded active descriptor in the socket. Figure 5-4 details the field definition of the descriptor.
EVENT is the socket-event communication register. FX3 DMA supports two possible events:
■ Consume event: Data has been read out of the buffer.
■ Produce event: Data has been filled into the buffer.
When a DMA transfer involves two sockets, the producer socket sends a produce event to the peer consumer socket after
data is written to the buffer. Similarly, the consumer socket sends a consume event to the peer producer socket after the data
is read from the buffer. The event signaling between the two sockets is usually handled by hardware, and the CPU is not
involved through the entire transfer. However, there are situations when the firmware needs to generate the event manually
by writing the appropriate event in the EVENT register.
Figure 5-7. FX3 DMA Socket Registers
DSCR_COUN
Reserved
Reserved
BUFFER_ADDRESS
TRANS_SIZE
TRANS_COUNT
Rese rved
Currentlyloadedactivedescriptor
Rese rved
DSCR_NUMBERDSCR_LOW
AVL_MINReservedSTATE
AVL_COUN
CON SUME_EVEN
ACTIVE_DSCRReserved
For detailed field description, see the following:
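As a hedged illustration of the register map in Figure 5-7, the sketch below mirrors one 128-byte socket register set as a C structure and checks the layout at compile time. The offsets 0x20 to 0x2C and 0x7C are taken from the figure; the placement of the SCK_xxx registers at 0x00 to 0x14 is an assumption based on the order in which they are described, and the type and member names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of one socket register set (names are illustrative). */
typedef struct {
    volatile uint32_t sck_dscr;      /* 0x00 (assumed) SCK_DSCR      */
    volatile uint32_t sck_size;      /* 0x04 (assumed) SCK_SIZE      */
    volatile uint32_t sck_count;     /* 0x08 (assumed) SCK_COUNT     */
    volatile uint32_t sck_status;    /* 0x0C (assumed) SCK_STATUS    */
    volatile uint32_t sck_intr;      /* 0x10 (assumed) SCK_INTR      */
    volatile uint32_t sck_intr_mask; /* 0x14 (assumed) SCK_INTR_MASK */
    volatile uint32_t reserved0[2];  /* 0x18-0x1C reserved           */
    volatile uint32_t dscr_buffer;   /* 0x20 DSCR_BUFFER             */
    volatile uint32_t dscr_sync;     /* 0x24 DSCR_SYNC               */
    volatile uint32_t dscr_chain;    /* 0x28 DSCR_CHAIN              */
    volatile uint32_t dscr_size;     /* 0x2C DSCR_SIZE               */
    volatile uint32_t reserved1[19]; /* 0x30-0x78 reserved           */
    volatile uint32_t event;         /* 0x7C EVENT                   */
} socket_regs_t;

/* Compile-time checks that the mirror matches the documented offsets. */
_Static_assert(offsetof(socket_regs_t, dscr_buffer) == 0x20, "DSCR_BUFFER offset");
_Static_assert(offsetof(socket_regs_t, dscr_size) == 0x2C, "DSCR_SIZE offset");
_Static_assert(offsetof(socket_regs_t, event) == 0x7C, "EVENT offset");
_Static_assert(sizeof(socket_regs_t) == 0x80, "register set must span 128 bytes");
```

Checking offsets at compile time is a design choice that catches a miscounted reserved gap before any register is touched.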
Each socket is in one of the states listed in Table 5-3. The SCK_STATUS.STATE field can be read to
determine the socket state. The socket states are discussed in more detail later.
Table 5-3. Socket States
Socket State | SCK_STATUS.STATE Value | Description
DESCR | 0 | Descriptor state. This is the default initial state indicating the descriptor registers are NOT valid in the adapter. The adapter will start loading the descriptor from the memory if the socket becomes enabled and not suspended. Suspend has no effect on any other state.
STALL | 1 | Stall state. The socket is stalled, waiting for data to be loaded into the Fetch Queue or waiting for an event.
ACTIVE | 2 | Active state. The socket is available for core data transfers.
EVENT | 3 | Event state. Core transfer is done. The descriptor is being written back into the memory and an event is being generated if enabled.
CHECK1 | 4 | Check states. An active socket gets here based on the core's EOP request to check the transfer size and determine whether the buffer should be wrapped up. Depending on the result, the socket will either go back to the Active state or move to the Event state.
SUSPENDED | 5 | The socket is suspended.
CHECK2 | 6 | Check states. An active socket gets here based on the core's EOP request to check the transfer size and determine whether the buffer should be wrapped up. Depending on the result, the socket will either go back to the Active state or move to the Event state.
WAITING | 7 | Waiting for confirmation that the event was sent.
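The state encoding in Table 5-3 can be decoded in firmware along the following lines. This is a minimal sketch: the position of the STATE field inside SCK_STATUS (bits 15-17 here) is an assumption made for illustration, and the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed location of SCK_STATUS.STATE; check the register description. */
#define SCK_STATE_POS  15u
#define SCK_STATE_MASK (0x7u << SCK_STATE_POS)

/* Values taken from Table 5-3. */
typedef enum {
    SCK_STATE_DESCR = 0, SCK_STATE_STALL = 1, SCK_STATE_ACTIVE = 2,
    SCK_STATE_EVENT = 3, SCK_STATE_CHECK1 = 4, SCK_STATE_SUSPENDED = 5,
    SCK_STATE_CHECK2 = 6, SCK_STATE_WAITING = 7
} sck_state_t;

/* Extract the state from a raw SCK_STATUS value. */
static sck_state_t sck_state(uint32_t status)
{
    return (sck_state_t)((status & SCK_STATE_MASK) >> SCK_STATE_POS);
}

/* Human-readable name, for debug logging. */
static const char *sck_state_name(sck_state_t state)
{
    static const char *const names[8] = {
        "DESCR", "STALL", "ACTIVE", "EVENT",
        "CHECK1", "SUSPENDED", "CHECK2", "WAITING"
    };
    return names[(unsigned)state & 7u];
}
```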
The following C code example shows the DMA socket data structure used in the FX3 SDK.
/** \brief DMA socket register structure.

    **Description**\n
    Each hardware block on the FX3 device implements a number of DMA sockets through
    which it handles data transfers with the external world. Each DMA socket serves as
    an endpoint for an independent data stream going through the hardware block.

    Each socket has a set of registers associated with it, that reflect the configuration
    and status information for that socket. The CyU3PDmaSocket structure is a replica
    of the config/status registers for a socket and is designed to perform socket configuration
    and status checks directly from firmware.

    See the sock_regs.h header file for the definitions of the fields that make up each
    of these registers.

    \see CyU3PDmaSocketConfig_t
 */
typedef struct CyU3PDmaSocket_t
{
    uvint32_t dscrChain;  /**< The descriptor chain associated with the socket. */
    uvint32_t xferSize;   /**< The transfer size requested for this socket. The size can
                               be specified in bytes or in terms of number of buffers,
                               depending on the UNIT field in the status value. */
    uvint32_t xferCount;  /**< The completed transfer count for this socket. */
    uvint32_t status;     /**< Socket configuration and status register. */
    /* The remaining members of the structure are not shown in this excerpt. */
} CyU3PDmaSocket_t;
Sockets are to be initialized, inspected, and modified by firmware as described in this section.
5.5.5.2 Initializing a Socket
Sockets can be initialized only when in the IDLE state, that is, SCK_STATUS.ENABLED=0. If this is not the case, the socket
must first be terminated (see 5.5.5.3 Terminating a Socket on page 68).
The general procedure is as follows:
1. Descriptors are allocated or located and initialized in memory. These descriptors are chained appropriately.
2. The SCK_xxx registers are initialized with the proper configuration values. This includes SCK_DSCR, which contains the
number of the first descriptor to be loaded. DSCR_xxx registers are not initialized - the DMA adapter will load those by
itself.
3. SCK_STATUS.GO_ENABLE is set to '1' to activate the socket (if so desired).
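The initialization steps above can be sketched as follows, assuming the descriptors of step 1 have already been prepared by the caller. The register view, the bit positions of GO_ENABLE and ENABLED, and the function name are assumptions made for this illustration; they should be checked against the SCK_STATUS register description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of the first four socket registers. */
typedef struct {
    volatile uint32_t dscr;   /* SCK_DSCR   */
    volatile uint32_t size;   /* SCK_SIZE   */
    volatile uint32_t count;  /* SCK_COUNT  */
    volatile uint32_t status; /* SCK_STATUS */
} sck_regs_t;

/* Bit positions are assumptions made for this sketch. */
#define SCK_GO_ENABLE (1u << 31)
#define SCK_ENABLED   (1u << 20)

/* Returns false if the socket is not idle and must be terminated first. */
static bool sck_init(sck_regs_t *s, uint16_t first_dscr, uint32_t xfer_size,
                     uint32_t config, bool start)
{
    if (s->status & SCK_ENABLED)
        return false;                    /* only an idle socket may be initialized */

    s->status = config & ~SCK_GO_ENABLE; /* step 2: configure while still disabled */
    s->dscr   = first_dscr;              /* number of the first descriptor to load */
    s->size   = xfer_size;
    s->count  = 0;
    /* DSCR_xxx registers are left alone; the adapter loads them itself. */

    if (start)
        s->status |= SCK_GO_ENABLE;      /* step 3: activate the socket */
    return true;
}
```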
5.5.5.3 Terminating a Socket
Sockets can be terminated at any time. If a socket is active, its activities will be aborted after an unspecified amount of time.
The general procedure is as follows:
1. SCK_STATUS.GO_ENABLE is cleared to '0'. The IP will continue to perform an unspecified amount of its pending activity.
2. It is permissible to write '0' multiple times to SCK_STATUS.GO_ENABLE while the socket is being terminated, but it is illegal to write '1' to it during this time.
3. SCK_STATUS.ENABLED is read back until its value changes to '0'. No further activity emanates from this socket after
SCK_STATUS.ENABLED is observed as cleared.
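A minimal sketch of the termination procedure is shown here. The bit positions and the function name are assumptions, and the bounded poll count is a design choice of this sketch (the procedure itself does not specify a time limit); it keeps firmware from spinning forever if the socket never reports idle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions are assumptions made for this sketch. */
#define SCK_GO_ENABLE (1u << 31)
#define SCK_ENABLED   (1u << 20)

/* Clears GO_ENABLE, then polls ENABLED. Returns true once the socket is idle. */
static bool sck_terminate(volatile uint32_t *status, uint32_t max_polls)
{
    *status &= ~SCK_GO_ENABLE;            /* step 1: request termination */

    for (uint32_t i = 0; i < max_polls; i++) {
        if ((*status & SCK_ENABLED) == 0u)
            return true;                  /* step 3: no further socket activity */
    }
    return false;                         /* still enabled after max_polls reads */
}
```

While the function is polling, no other code should write '1' to GO_ENABLE, in line with step 2.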
5.5.5.4 Modifying or Suspending a Socket
Sockets can normally be modified safely only when in the suspended state. A socket that is active or stalled must first be
suspended. A socket that is suspended will not complete an ongoing transfer, but rather go into the suspended state almost
immediately. It is possible, though, that going into the suspended state safely takes a noticeable (but small) number of cycles.
The general procedure is as follows:
1. SCK_STATUS.GO_SUSPEND is set to '1'.
2. SCK_STATUS.SUSPENDED is read back until its value changes to '1', or SCK_INTR.SUSPEND is used as the interrupt
source to indicate the suspend status.
3. Any changes are made to SCK_xxx or DSCR_xxx registers.
4. SCK_STATUS.GO_SUSPEND is cleared. If SCK_DSCR.active_dscr was modified, the socket will load the new descriptor from the memory; otherwise, it will resume the operation using the current contents of DSCR_xxx.
Note that SCK_STATUS.SUSPENDED will only take effect after the transfer of the current buffer completes. This may take a
long time.
Note that it is also possible to modify sockets that are in the stalled state, provided that it is known that no synchronization
EVENT(s) will occur. This is normally the case when the socket is waiting for firmware to generate an event, that is, the socket
is not coupled to another socket in another adapter. In this case it is not necessary to first suspend the socket.
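The suspend, modify, and resume procedure can be sketched as follows, here changing the transfer size as the example modification. The bit positions, register view, and function name are assumptions. On a poll timeout this sketch withdraws the suspend request without modifying anything, which is one possible policy rather than documented behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of the first four socket registers. */
typedef struct {
    volatile uint32_t dscr, size, count, status;
} sck_regs_t;

/* Bit positions are assumptions made for this sketch. */
#define SCK_GO_SUSPEND (1u << 30)
#define SCK_SUSPENDED  (1u << 19)

/* Suspends the socket, updates SCK_SIZE, and resumes. Returns true on success. */
static bool sck_set_size_suspended(sck_regs_t *s, uint32_t new_size,
                                   uint32_t max_polls)
{
    bool suspended = false;

    s->status |= SCK_GO_SUSPEND;               /* step 1: request suspend */
    for (uint32_t i = 0; i < max_polls; i++) { /* step 2: wait for SUSPENDED */
        if (s->status & SCK_SUSPENDED) {
            suspended = true;
            break;
        }
    }
    if (suspended)
        s->size = new_size;                    /* step 3: modify while suspended */

    s->status &= ~SCK_GO_SUSPEND;              /* step 4: resume */
    return suspended;
}
```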
Figure 5-8 annotations (diagram of IP Block A with Socket #0 to Socket #M and IP Block B with Socket #0 to Socket #N, each socket referencing descriptors that point to buffers):
Sockets
* Each IP block has its own sockets
* Sockets can use the same or different descriptors
* Sockets use a circular or linear list of descriptors
* Unlimited number of descriptors per socket
Descriptors
* Descriptors point to exactly one buffer
* Descriptors are either consuming or producing
* Multiple descriptors can point to the same buffer
* Synchronization is done by using the same descriptor
* Descriptors have status (free/occupied, #bytes, ...)
Buffers
* Buffers are regions in main memory
* Buffers can have any size (in bytes)
* Buffers can be anywhere (byte aligned)
* Buffer pool is managed in software
* Multiple descriptors can use the same buffer
* Buffers may contain packets smaller than their size
* Buffers may contain zero-length packets
5.5.5.5 Inspecting a Socket
Sockets can be inspected at any time. When a socket is active, values are not guaranteed to be accurate due to the activity
and clock domain crossing issues. The general procedure is as follows:
1. Any value in SCK_xxx or DSCR_xxx registers is read twice in succession.
2. If the values of two reads match they are accurate. If they are different, go back to step 1.
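The two-step inspection procedure amounts to reading until two consecutive reads agree. A minimal sketch, with a hypothetical function name:

```c
#include <assert.h>
#include <stdint.h>

/* Reads a socket register twice in succession and repeats until both reads match. */
static uint32_t sck_read_stable(const volatile uint32_t *reg)
{
    uint32_t first, second;

    do {
        first  = *reg;   /* step 1: two reads in succession */
        second = *reg;
    } while (first != second); /* step 2: retry on mismatch */

    return first;
}
```

The loop is unbounded, as in the procedure; firmware that cannot tolerate this may prefer to cap the number of retries.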
5.5.5.6 Wrapping Up a Socket
A socket that is in the Active state, but that is not expected to send/receive any more data can be forcibly wrapped up by
asserting SCK_STATUS.WRAPUP. This should never be done for sockets that are not in the Active state or that may send/
receive data. This option is normally used when the core IP has transitioned into an error or partial completion state and the
firmware is required to clean up the remaining, unexecuted, portion of the transfer.
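Because WRAPUP must only be asserted on a socket in the Active state, firmware may want to guard the write. The sketch below assumes a bit position for WRAPUP and for the STATE field, and uses a hypothetical function name. The caller remains responsible for knowing that the socket will not send or receive more data, since the state can change between the check and the write.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions are assumptions made for this sketch. */
#define SCK_WRAPUP       (1u << 28)
#define SCK_STATE_POS    15u
#define SCK_STATE_MASK   (0x7u << SCK_STATE_POS)
#define SCK_STATE_ACTIVE 2u /* ACTIVE value of SCK_STATUS.STATE */

/* Asserts WRAPUP only if the socket currently reports the Active state. */
static bool sck_wrapup_if_active(volatile uint32_t *status)
{
    uint32_t state = (*status & SCK_STATE_MASK) >> SCK_STATE_POS;

    if (state != SCK_STATE_ACTIVE)
        return false;       /* never wrap up a socket that is not Active */

    *status |= SCK_WRAPUP;  /* forcibly wrap up the current buffer */
    return true;
}
```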
5.5.6 Illustration of Descriptor, Buffer, and Socket Usage
Figure 5-8 sums up the concepts discussed in sections 5.5.2 Descriptors, Buffers, and Sockets on page 61 to 5.5.5
Sockets on page 65.
Figure 5-8. Sockets, Descriptors, and Buffers
5.5.7 Understanding DMA Operation: Peripheral to Peripheral
This section explains at a high level how DMA descriptors, buffers, and sockets are tied together to achieve the required DMA
operation, using an example peripheral-to-peripheral DMA operation.
1. When a producer has filled a buffer (an end-of-packet or a buffer-full event occurred - note that packets can be empty), it
updates the descriptor associated with the buffer in the main memory to indicate that the buffer is full (DMA-to-memory
write transaction).
2. The producer then sends a produce event to the consuming socket of its descriptor (DMA-to-MMIO write transaction).
This event contains the number of the descriptor to which it relates.
3. The producer then loads the next descriptor for the socket and makes it active (DMA-to-memory read transaction).
4. If the new descriptor indicates its buffer space is occupied, the socket stalls. If a consume event is received for the current
descriptor, the descriptor is either updated using the data in the event or reloaded from the memory (DMA-to-memory
read transaction). DMA data write transfers may resume. Go to step (1).
5. If no descriptor is available (next_dscr=0xFFFF), the socket is suspended. When the software extends the descriptor list
and explicitly 'resumes', the IP block operation continues. Go to step (3).
The consumer behavior is exactly symmetric:
1. When a consumer has emptied a buffer (an end-of-packet or a buffer-empty event occurred), it updates the descriptor
associated with the buffer in the main memory to indicate that the buffer is empty and available (DMA-to-memory write
transaction).
2. The consumer then sends a consume event to the producing socket of its descriptor (DMA-to-MMIO write transaction).
This event contains the number of the descriptor to which it relates.
3. The consumer then loads the next descriptor for the socket and makes it active (DMA-to-memory read transaction).
4. If the new descriptor indicates its buffer is empty, the socket stalls. If a produce event is received for the current descriptor,
the descriptor is either updated using the data from the event or reloaded from the memory (DMA-to-memory read transaction). DMA data read transfers may resume. Go to step (1).
5. If no descriptor is available the socket is suspended. When the software extends the descriptor list and explicitly
'resumes', the IP block operation continues. Go to step (3).
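The producer and consumer steps above can be modeled in a few lines of host-side C. This is a toy software model, not firmware: descriptors are reduced to an occupied flag and a next index, events are modeled as a direct state change on the peer, and all type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DSCR_NONE 0xFFFFu /* end of descriptor chain */

typedef enum { SCK_ACTIVE, SCK_STALLED, SCK_SUSPENDED } model_state_t;
typedef struct { bool occupied; uint16_t next; } dscr_t;
typedef struct { uint16_t cur; model_state_t state; } sock_t;

/* Steps 3-5: load descriptor n. A producer needs an empty buffer,
 * a consumer needs an occupied one; otherwise the socket stalls. */
static void load_dscr(sock_t *s, const dscr_t *d, uint16_t n, bool is_consumer)
{
    s->cur = n;
    if (n == DSCR_NONE)
        s->state = SCK_SUSPENDED;
    else
        s->state = (d[n].occupied == is_consumer) ? SCK_ACTIVE : SCK_STALLED;
}

/* Steps 1-2: write back the descriptor and send the event to the peer,
 * then move on to the next descriptor in the chain. */
static void finish_buffer(sock_t *self, sock_t *peer, dscr_t *d, bool is_consumer)
{
    uint16_t done = self->cur;

    assert(self->state == SCK_ACTIVE);
    d[done].occupied = !is_consumer;  /* producer marks full, consumer marks empty */

    /* Produce/consume event: a peer stalled on this descriptor may resume. */
    if (peer->state == SCK_STALLED && peer->cur == done)
        peer->state = SCK_ACTIVE;

    load_dscr(self, d, d[done].next, is_consumer);
}
```

With a two-descriptor ring, the producer stalls once both buffers are full and resumes as soon as the consumer empties the buffer it is waiting on.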
Refer to the section "Setting up the DMA System" in AN75779, which explains graphically, at a high level, how
sockets, descriptors, and buffers are tied together in the FX3 DMA system.
5.5.8 Interrupt Requests
Each DMA-capable peripheral block has dedicated global interrupt request lines to the PL192 VIC. Refer to 2.3.1.8 Vectored
Interrupt Controller on page 40 for more details; Table 2-4 on page 41 lists
the various FX3 interrupt sources. Table 5-4 describes the DMA interrupt lines.
Table 5-4. Global DMA Interrupt Request to VIC
Interrupt | VIC Line | Interrupt Source | Description
GPIF_DMA | 6 | GPIF DMA adapter | DMA socket interrupt from the GPIF block
USB_DMA | 8 | USB DMA adapter | Applies to both USB device and host mode operation
STORAGE_DMA | 11 | Storage (SD/MMC) interface DMA adapter | Applies only to FX3S devices
PERIPH_DMA | 20 | Serial peripheral block DMA adapter |
5.5.9 DMA Interrupts
Although DMA transfer takes place independently from CPU execution, the CPU may need to be notified when certain
transfer conditions are met, when an error has occurred, or when the transfer is complete. See SCK_INTR on page 623.
Each peripheral has a global SCK_INTR register, in which each bit represents the socket number that generates the interrupt.
The bit description is as described above. There is also a per-socket SCK_INTR in which each bit represents the interrupt
source from the corresponding socket. Bit description is explained in 5.5.5 Sockets on page 65. The logical OR of all socket
interrupts presented in the per-socket SCK_INTR register represents the corresponding bit in the peripheral global
SCK_INTR register. This is illustrated in Figure 5-10.
Figure 5-10. Global and Per-Socket SCK_INTR Register
When a DMA interrupt occurs, the CPU is notified by VIC with the DMA interrupt line specific to the peripheral, as shown in
Table 5-4. Then the CPU can check the peripheral's global SCK_INTR register to find the socket number that needs attention.
The global SCK_INTR register is read-only, and its interrupt bits are cleared by clearing the interrupt cause or bit in the per-socket SCK_INTR register itself. Once the CPU finds the socket number, it can find the source of the interrupt from the per-socket SCK_INTR register of the corresponding socket.
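A DMA interrupt handler following this two-level scheme might look like the sketch below. It assumes that the per-socket SCK_INTR registers are spaced one register set apart (32 words for a 128-byte set) and that a pending source is cleared by writing '1' to its bit; both points, and all names, are assumptions to be checked against the register descriptions. On plain memory, as in a host-side test, the write-back simply leaves the value unchanged.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_SOCKETS 32u

/* For each bit set in the global SCK_INTR value, reads the per-socket
 * SCK_INTR register, records its sources, and writes them back to clear.
 * sck0_intr points at socket 0's SCK_INTR; stride_words is the distance
 * between consecutive sockets in 32-bit words. Returns sockets serviced. */
static unsigned dma_service_sockets(uint32_t global_intr,
                                    volatile uint32_t *sck0_intr,
                                    uint32_t stride_words,
                                    uint32_t sources[MAX_SOCKETS])
{
    unsigned serviced = 0;

    for (uint32_t n = 0; n < MAX_SOCKETS; n++) {
        if ((global_intr & (1u << n)) == 0u)
            continue;                        /* socket n did not interrupt */

        volatile uint32_t *intr = sck0_intr + n * stride_words;
        sources[n] = *intr;                  /* which sources are pending */
        *intr = sources[n];                  /* assumed write-1-to-clear */
        serviced++;
    }
    return serviced;
}
```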
5.6 Programming Sequence
5.6.1 Initialization
The default state of most sockets is indicated in the register map. Briefly, in the default state the socket is disabled and holds
no descriptor. Sockets are initialized as described in 5.5.5.2 Initializing a Socket on page 68. The same steps are detailed
below.
1. Initialize and allocate descriptor(s) in the main memory. The base address for descriptor allocation starts at 0x40000010.
In addition to specifying where the data buffer is located, each descriptor data structure must be configured properly with
its associated peripherals, sockets, and event/interrupt flags for both consumer and producer halves. If a list of descriptors
is used, descriptors can be chained with DSCR_CHAIN field of the descriptor structure. The DSCR_CHAIN specifies the
next descriptor number in the chain for both consumer and producer. A value of 0xFFFF terminates the descriptor chain.
2. Initialize sockets with SCK_xxx registers. In addition to specifying the associated descriptor (or top of the descriptor chain)
for the socket, these registers control how the socket behaves during the transfer (that is, interrupt/event generation) and
report the current status of the socket.
3. Enable socket(s). Sockets can be enabled by setting the SCK_STATUS.GO_ENABLE bit. Once a socket is enabled, it loads
the descriptor specified in the SCK_DSCR register (top of the descriptor chain) and starts the transfer accordingly.
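Step 1 can be illustrated with a sketch that links a run of descriptors into a linear or circular chain. The four-word descriptor layout follows the DSCR_BUFFER, DSCR_SYNC, DSCR_CHAIN, and DSCR_SIZE fields. Packing the producer's next descriptor into the upper 16 bits of DSCR_CHAIN and the consumer's into the lower 16 bits, the 16-byte descriptor spacing, and the region base of 0x40000000 with descriptor 0 unused are assumptions of this sketch, as are all names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DSCR_END       0xFFFFu      /* terminates the descriptor chain */
#define DSCR_REGION    0x40000000u  /* assumed start of the descriptor region */
#define DSCR_SIZE_BYTES 16u         /* assumed: four 32-bit words each */

typedef struct {
    uint32_t buffer; /* DSCR_BUFFER */
    uint32_t sync;   /* DSCR_SYNC   */
    uint32_t chain;  /* DSCR_CHAIN  */
    uint32_t size;   /* DSCR_SIZE   */
} dma_dscr_t;

/* Memory address of descriptor number n (descriptor 1 lands at 0x40000010). */
static uint32_t dscr_addr(uint16_t n)
{
    return DSCR_REGION + (uint32_t)n * DSCR_SIZE_BYTES;
}

/* Assumed DSCR_CHAIN packing: producer next in bits 31-16, consumer next in 15-0. */
static uint32_t chain_word(uint16_t next_prod, uint16_t next_cons)
{
    return ((uint32_t)next_prod << 16) | next_cons;
}

/* Links descriptors first .. first+count-1. A circular chain wraps back to
 * first; a linear chain ends with DSCR_END. */
static void dscr_link(dma_dscr_t *pool, uint16_t first, uint16_t count, bool circular)
{
    for (uint16_t i = 0; i < count; i++) {
        uint16_t next;
        if (i + 1u < count)
            next = (uint16_t)(first + i + 1u);
        else
            next = circular ? first : DSCR_END;
        pool[first + i].chain = chain_word(next, next);
    }
}
```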
5.6.1.1 Producer Half
When the socket for the producer half is enabled, it loads the descriptor specified in the SCK_DSCR register and makes it
active. With a valid buffer specified by the active descriptor, the socket goes to the active state and starts to fill the data buffer.
When the producer socket has filled the buffer, it updates the active descriptor associated with the buffer in main memory and
then sends a produce event to its peer consuming socket defined in the dscr_sync field of the descriptor structure along with
the descriptor number itself.
If there is a valid peripheral consumer socket specified in the descriptor, the producer notifies the consumer socket by writing
to the EVENT register in the consumer socket. The EVENT value will indicate the DMA descriptor that has been updated and
specify that a PRODUCE event is being sent.
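When firmware has to generate such an event itself, the write could be sketched as follows. The encoding of the EVENT value (descriptor number in bits 0-15, with bit 16 set for a produce event and clear for a consume event) is an assumption made for this illustration, and the function names are hypothetical; the EVENT register description is the authority on the actual layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed EVENT register encoding for this sketch. */
#define EVENT_DSCR_MASK    0x0000FFFFu /* descriptor the event relates to */
#define EVENT_TYPE_PRODUCE (1u << 16)  /* set: produce event, clear: consume event */

/* Builds the EVENT value for a given descriptor and event type. */
static uint32_t make_event(uint16_t dscr_num, bool produce)
{
    return ((uint32_t)dscr_num & EVENT_DSCR_MASK) |
           (produce ? EVENT_TYPE_PRODUCE : 0u);
}

/* Sends the event by writing it to the peer socket's EVENT register. */
static void send_event(volatile uint32_t *peer_event_reg, uint16_t dscr_num,
                       bool produce)
{
    *peer_event_reg = make_event(dscr_num, produce);
}
```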
If the descriptor does not specify a valid consumer socket, it is the firmware's responsibility to identify and work on the
descriptor that has been produced. The SCK_INTR_MASK.PRODUCE_EVENT bit can be set to enable notification of
produce events to the CPU through the DMA interrupt. The firmware can then take appropriate actions to use the data that
has been received.
The producer socket then loads the next descriptor in the chain and makes it active. If the descriptor indicates its buffer space
is not available, the socket goes to the stall state. When the buffer becomes available, as indicated through the socket's EVENT register,
the socket becomes active again and starts to fill the buffer pointed to by the active descriptor.
If a consume event is received for the current descriptor, the descriptor is either updated using data in the EVENT register or
reloaded from memory.
If no descriptor is available (next_dscr=0xFFFF), the socket goes to the suspend state.
A DMA transfer from a peripheral to the system memory involves only the producer half.
5.6.1.2 Consumer Half
Similar to the producer half, when the socket for the consumer half is enabled, it loads the descriptor specified in the
SCK_DSCR register and makes it active. If the consumer socket is enabled at the same time as the producer socket, most
likely the buffer is waiting to be filled by the producer and is not available for the consumer socket. In this case, the consumer
socket goes to the stall state and waits for the buffer to become available upon a produce event.
When the buffer is available, it goes to the active state and starts consuming data in the buffer. When the consumer socket
has emptied a buffer, it updates the descriptor associated with the buffer in main memory and then sends a consume event to
its peer producer socket defined in the dscr_sync field of the descriptor structure, along with the descriptor number itself.
If the descriptor does not specify a valid producer socket, it is the firmware's responsibility to populate the data buffer in the
next descriptor in the descriptor chain. The SCK_INTR_MASK.CONSUME_EVENT bit can be set to enable notification of
consume events to the CPU through the DMA interrupt. The firmware can then take appropriate actions to populate the next
buffer.
The consumer socket then loads the next descriptor in the chain and makes it active. If the descriptor indicates its buffer
is empty, the socket goes to the stall state. When the buffer is filled, as indicated through the socket's EVENT register, the
socket becomes active again and starts to empty the buffer pointed to by the active descriptor.
If a produce event is received for the current descriptor, the descriptor is either updated using data from the event or reloaded
from memory.
If no descriptor is available (next_dscr=0xFFFF), the socket goes to the suspend state.
A DMA transfer from the system memory to a peripheral involves only the consumer half.
5.6.2 Peripheral to Peripheral Transfer
A DMA transfer from one peripheral to another peripheral involves both the producer and consumer halves. The sequence of
events for the producer and the consumer is identical to the standalone case, except that when the producer half completes
the transfer, a produce event is sent from the producer socket to the peer consumer socket to trigger the consumer half.
In this case, the whole peripheral-to-peripheral transfer can take place without CPU intervention in the transfer itself.
Figure 5-11 depicts the connection model used to describe peripheral-to-peripheral transfers. It also considers the software
drivers for the ingress and egress peripherals (which are mutually independent) and the higher-level software that manages the
endpoint.
buf = (uint32_t *)address;
if (direction)
    s = (PSCK_T)&UIB->sck[epNum & 0x0F];
else
    s = (PSCK_T)&UIBIN->sck[epNum & 0x0F];

/* Initializing socket */
id = ((uint32_t)s >> 16) & 0x1f;   /* use bits 16-20 to decode IP_NUM */
ch = ((uint32_t)s >> 7) & 0x1f;    /* use bits 7-11 to decode socket number */

/* Default SCK_STATUS register setting. Refer to the SCK_STATUS register description. */
s->status = CY_U3P_LPP_SCK_STATUS_DEFAULT;
while (s->status & CY_U3P_LPP_ENABLED)   /* checking if the socket is active */
    __nop ();

s->status = CY_U3P_LPP_SCK_STATUS_DEFAULT | CY_U3P_LPP_UNIT;   /* setting SCK_STATUS.unit = 1 */
s->intr = 0xFF;                /* clear all previous interrupts */
s->dscr = DSCR_ADDR(dscr);     /* loading the first descriptor (top of descriptor chain) */
s->size = 1;
s->count = 0;

if (((epNum & 0x0F) == 0) && (IsNewCtrlRqtReceived ()))
{
    /* This request has been aborted due to a new control request. Just reset
       the USB socket and return an error. */
Although it is rarely needed, it is possible to interpose a CPU intervention in between an ingress and egress DMA adapter.
Conceptually, this means the CPU (the software component) acts as the consumer agent to the producing (ingress) adapter
and acts as the producer agent to the consuming (egress) adapter. This model is depicted in Figure 5-12.
Figure 5-12. CPU Intervention in DMA Transfer Path
This mode will have a significant negative performance impact on high-bandwidth transfers because the CPU gets into the
critical path of every buffer transferred. This mode should only be used to handle special case stream requirements or to
implement processing of the actual data by the CPU such as DSP applications.
A DMA channel is a software construct that encapsulates all of the DMA elements used (discussed so far in this chapter) in a
single data flow. The DMA manager in the FX3 firmware library introduces the notion of a DMA channel that encapsulates the
hardware resources such as sockets, buffers and descriptors used for handling a data flow through the device. The channel
concept is used to hide the complexity of configuring all of these resources in a consistent manner. The DMA manager
provides API functions that can be used to create data flows between any two interfaces on the FX3 device.
A DMA channel implementation in which sockets directly signal each other through events, or signal the FX3 CPU through
interrupts when so configured by firmware, is called an automatic DMA channel. Alternatively, when there is CPU intervention as
explained in section DMA Features on page 58, it is called a manual DMA channel. For more details on DMA channels and the
types of DMA channels supported, refer to the DMA Engine section in the FX3 Programmer’s Manual.
USB is a successful peripheral interconnect defined and heavily adopted in consumer electronics and PC peripherals. The
first version of the specification, USB 1.0, released in 1996, defined two transfer speeds to address the different types of
devices available at that time: 1.5 Mbps (Low Speed) to address devices such as keyboards and joysticks; and 12 Mbps (Full
Speed) to address devices such as disk drives. The USB 2.0 specification, released in 2000, supports a maximum signaling
rate of 480 Mbps (High Speed), which is 40 times the signaling rate of Full Speed. With the introduction of USB OTG, USB
went beyond a peripheral interconnect. It enabled printers to use USB to directly connect to cameras and PDAs to use USB-connected keyboards and mice. The new generation USB 3.0 is the next revolution in device interconnect technology. It
features the same ease of use and flexibility that users expect, at a much higher (5-Gbps) data rate and advanced power
management.
6.2 Features
The FX3 USB subsystem features the following controllers to support many advanced features of the USB standard:
■ USB Interface Block (UIB)
❐ USB 3.0 function controller
❐ USB 2.0 function controller
❐ USB 2.0 embedded host controller
❐ USB 2.0 OTG controller
■ USB I/O system
❐ Dedicated USB 2.0 OTG PHY and USB 3.0 PHY
❐ Integrated voltage regulator
The USB 3.0 and USB 2.0 subsystems have dedicated transceivers, but they share the same I/O interconnect in the back
end. This means that only one of the USB 3.0 and USB 2.0 controllers can be active at a time.
The FX3 USB subsystem supports the following modes of operation:
■ USB 3.0 peripheral in SuperSpeed (5 Gbps)
■ USB 2.0 peripheral in High/Full Speed (480/12 Mbps, respectively)
■ USB 2.0 host in High/Full/Low Speed (480/12/1.5 Mbps respectively), with one downstream port; the hub is not supported
■ USB 2.0 OTG dual-role device (DRD), with Host Negotiation Protocol (HNP) and Session Request Protocol (SRP)
support
6.3 Block Diagram
Figure 6-1 shows the top-level block diagram of the FX3 USB subsystem.
The FX3 USB Interface Block (UIB) includes the following components:
6.4.2 USB 3.0 Function Controller
The USB 3.0 function controller implements both the link and protocol layers of the USB 3.0 specification.
6.4.3 USB 2.0 Function Controller
The USB 2.0 function controller implements the protocol layer of the USB 2.0 specification with an integrated serial interface
engine (SIE) and a token processor (TP).
6.4.4 USB 2.0 Embedded Host
The FX3 USB 2.0 embedded host is simpler than a full-featured PC-based controller. Embedded USB hosts are defined to
support a limited peripheral list and to operate with limited memory (compared to a PC). In essence, the host controller
functions similar to a device controller, with added scheduling capability to control and initiate data traffic on the bus. The USB
2.0 embedded host includes the following features:
■ EHCI-like interface in high-speed mode (EHCI is the PC-based Enhanced Host Controller Interface)
■ OHCI-like interface in full- and low-speed modes (OHCI is the PC-based Open Host Controller Interface)
■ Support for point-to-point communications with one downstream port
■ Performance of all transaction scheduling in hardware
■ DMA adapter interface common to other FX3 peripherals
■ Fixed, one-to-one unidirectional endpoint to DMA socket mapping
■ Shared hardware endpoint managers (EPMs) among USB 2.0 device, USB 2.0 host, and USB 3.0 device controllers
6.4.5 USB OTG Controller
The FX3 USB subsystem is capable of supporting a USB 2.0 host, USB 2.0 peripheral, and a USB 3.0 peripheral. The FX3
OTG controller has global control of these functions. Dynamic role swapping is limited to USB 2.0 only by enabling the
appropriate USB 2.0 host or peripheral controller. The necessary control interface of the OTG controller facilitates:
■ Global control of the embedded host, and the USB 2.0 device function
■ Session Request Protocol (SRP) support per the OTG 2.0 specification
■ Host Negotiation Protocol (HNP) support per the OTG 2.0 specification
6.4.6 End-Point Memory
The end-point memory (EPM) supports data transfers through the USB 2.0 host controller, USB 2.0 function controller, and
USB 3.0 function controller blocks. It also supports the USB 3.0 bulk stream protocol. Two EPM units are available, dedicated
to each data direction.
6.4.7 DMA Adapters
The UIB has two dedicated DMA adapters that manage all DMA data flow in and out of the UIB block, one for each direction.
These DMA adapters are shared among the USB 3.0 device, USB 2.0 device, and USB 2.0 host controllers. Endpoint to DMA
socket mapping is fixed and unidirectional; that is, ingress endpoints 0 to 15 are mapped to UIB ingress DMA sockets 0 to 15,
and egress endpoints 0 to 15 are mapped to UIB egress DMA sockets 0 to 15. Hence the terms “socket” and “endpoint” are
interchangeable within the USB block. The DMA adapter is identical to those within other FX3 peripherals. Refer to the FX3
DMA Subsystem chapter on page 58 to learn how the FX3 DMA works.
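The fixed endpoint-to-socket mapping can be expressed as a small helper. The sketch relies on the standard USB endpoint address format (bit 7 set for IN endpoints, endpoint number in the low four bits) and on the IP_NUM values 0x03 (USB egress) and 0x04 (USB-IN ingress) from Table 5-2. Combining them into a socket ID as (IP_NUM << 8) | socket number is an assumed convention, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define USB_EP_DIR_IN 0x80u /* bit 7 of a USB endpoint address: IN endpoint */
#define IP_NUM_USB    0x03u /* egress adapter: IN endpoints, FX3 sends data */
#define IP_NUM_USB_IN 0x04u /* ingress adapter: OUT endpoints, FX3 receives data */

/* Maps a USB endpoint address (for example 0x81 or 0x02) to a socket ID. */
static uint16_t usb_ep_to_socket_id(uint8_t ep_addr)
{
    uint16_t ip_num  = (ep_addr & USB_EP_DIR_IN) ? IP_NUM_USB : IP_NUM_USB_IN;
    uint16_t sck_num = (uint16_t)(ep_addr & 0x0Fu); /* endpoint n uses socket n */

    return (uint16_t)((ip_num << 8) | sck_num);
}
```

Under these assumptions, IN endpoint 1 (address 0x81) maps to socket ID 0x0301 and OUT endpoint 2 (address 0x02) maps to 0x0402.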
6.4.8 USB I/O System
6.4.8.1 USB 2.0 OTG PHY
Within USB 2.0 subsystems, FX3 has a USB 2.0 transceiver with a UTMI+ interface to the back-end, multiplexed between the
USB 2.0 function and USB 2.0 embedded host controllers. It contains the required transceiver and OTG functionality,
including:
■ Standard four-wire signaling (VBUS, D+, D-, GND)
■ USB 2.0 High-/Full-/Low-Speed data transmission rate
■ USB 2.0 test modes fully supported
■ VBUS sensing for connection detection
■ Sampling of the USB_ID input for detection of A-device or B-device connection
■ Charging and discharging of DP line for starting a session as B-device
For USB 3.0, FX3 has a separate, dedicated USB 3.0 transceiver with a PIPE 3-compliant interface directly connected to the
back-end USB 3.0 function controller. It includes these features:
■ Dedicated, dual-simplex differential pairs for data transmit (SSTX+/-) and receive (SSRX+/-)
■ Sideband functionality (such as reset, wake) with Low Frequency Periodic Signaling (LFPS)
■ USB hot plug with receiver termination for connect/disconnect detection
■ 5-Gbps SuperSpeed data transmission rate over 3-meter USB 3.0 cable
FX3 features a common top-level register interface shared among all UIB functional blocks, as shown in the following code.
Some functional blocks may also have their own specific register interface, which is described in their respective sections.
In addition, the EPM uses a clock, uib_epm_clk_i, that is 125 MHz when the USB 3.0 function controller is active. The
uib_epm_clk_i configuration source is from GCTL_UIB_CORE_CLK.EPMCLK_SRC and enabled by
GCTL_UIB_CORE_CLK.CLK_EN.
6.6.1.2 Interrupt Requests
The UIB block has three global interrupt sources to the VIC, listed in Table 6-2, which are shared among USB 3.0, USB 2.0,
and OTG controllers.
usbep0_int | 10 | USB 3.0 device or USB 2.0 device | EP0 interrupt that is used only in device mode
 | | USB 3.0 function, USB 2.0 function, USB 2.0 host, USB 2.0 OTG, charger detect, EPM | UIB core interrupt
UIB has a global interrupt register, UIB_INTR, which contains interrupt sources from the respective functional blocks (USB
3.0 function, USB 2.0 function, USB 2.0 host, USB 2.0 OTG, charger detect, EPM). The UIB core interrupt to VIC is the logical
OR of interrupt sources in UIB_INTR.
The USB 3.0 link layer interrupts are located in UIB_LNK_INTR. UIB_INTR.LNK_INT is the logical OR of the interrupt sources
in UIB_LNK_INTR.
USB 3.0 protocol layer interrupts are located in UIB_PROT_INTR. UIB_INTR.PROT_INT is the logical OR of the interrupt
sources in UIB_PROT_INTR.
USB 3.0 function endpoint interrupts are located in UIB_PROT_EP_INTR. UIB_INTR.PROT_EP_INT is the logical OR of the
interrupt sources in UIB_PROT_EP_INTR.
6.6.1.3 USB 3.0 Functional Description
The SuperSpeed bus is a layered communications architecture that comprises the following elements:
SuperSpeed interconnect: The SuperSpeed interconnect is the manner in which devices are connected to and communicate
with the host over the SuperSpeed bus. It includes the topology of devices connected to the bus, the communication layers,
the relationships between them, and how they interact to accomplish information exchanges between the host and devices.
Devices: Devices implement the required function end of SuperSpeed communication layers to provide a specific function of
the application, for example, a mass storage device. The terms "USB device" and "USB function" are interchangeable.
Host: The host implements the required host end of SuperSpeed communication layers to use the functions of the attached
devices. It owns the SuperSpeed data activity schedule and management of the SuperSpeed bus and all devices connected
to it.
As shown in Figure 6-2, the rows (device or host, protocol, link, physical) represent the communication layers of the
SuperSpeed interconnect, namely:
■ Physical (PHY) layer
■ Link layer
■ Protocol layer
The FX3 USB 3.0 function controller design follows the same basic SuperSpeed architecture.
The physical layer defines the PHY portion of the port and the physical connection between a downstream facing port (on a
host or hub) and an upstream facing port (on a device). The FX3 USB 3.0 function physical connection comprises two
differential data pairs, one transmit path and one receive path. The nominal signaling data rate is 5 Gbps. The electrical
aspects of each path are characterized as a transmitter, channel, and receiver; these collectively represent a unidirectional
differential link. Each differential link is AC-coupled with capacitors located on the transmitter side of the differential link. The
channel includes the electrical characteristics of the cables and connectors.
At an electrical level, each differential link is initialized by enabling its receiver termination. The transmitter is responsible for
detecting the far-end receiver termination as an indication of a bus connection and informing the link layer so the connect
status can be factored into link operation and management. When receiver termination is present but no signaling is occurring
on the differential link, it is considered to be in the electrical idle state. In this state, LFPS is used to signal initialization and
power management information. The LFPS is relatively simple to generate and detect and uses very little power.
FX3 USB 3.0 PHY has its own clock domain with Spread Spectrum Clock (SSC) modulation. The USB 3.0 cable does not
include a reference clock, so the clock domains on each end of the physical connection are not directly connected. Bit-level
timing synchronization relies on the local receiver aligning its bit recovery clock to the remote transmitter's clock by phase-locking to the signal transitions in the received bit stream. The receiver needs enough transitions to reliably recover clock and
data from the bit stream. To assure that adequate transitions occur in the bit stream independent of the data content being
transmitted, the transmitter encodes data and control characters into symbols using an 8b/10b code. Control symbols are
used to achieve byte alignment and are used for framing data and managing the link. Special characteristics make control
symbols uniquely identifiable from data symbols.
The signal (timing, jitter tolerance, and so on) and electrical (DC characteristics, channel capacitance, and so on)
performance of SuperSpeed links is defined with compliance requirements specified in terms of transmit and receive signaling
eye diagrams. The FX3 USB 3.0 function physical layer receives 8-bit data from the link layer and scrambles the data to
reduce EMI emissions. It then encodes the scrambled 8-bit data into 10-bit symbols for transmission over the physical
connection. The resultant data is sent at a rate that includes spread spectrum to further lower the EMI emissions. The bit
stream is recovered from the differential link by the receiver, assembled into 10-bit symbols, and decoded and descrambled,
producing 8-bit data that is then sent to the link layer for further processing.
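The scramble and descramble steps described above can be illustrated with a small additive (XOR) scrambler. This is only a sketch: the function names are hypothetical, and the feedback taps, seed, and bit ordering are assumptions chosen for illustration rather than a bit-exact copy of the scrambler defined by the USB 3.0 specification. On FX3 this work is done by the PHY hardware, not by firmware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Produce the next keystream byte from a 16-bit LFSR.
 * The taps 0x0039 (x^5 + x^4 + x^3 + 1, with x^16 implied) are an
 * assumption for this sketch, as is the bit ordering. */
uint8_t lfsr_next_byte(uint16_t *state)
{
    uint8_t out = 0;
    for (int bit = 0; bit < 8; bit++) {
        uint16_t msb = (uint16_t)((*state >> 15) & 1u);
        out |= (uint8_t)(msb << bit);
        *state = (uint16_t)(*state << 1);
        if (msb) {
            *state ^= 0x0039u;
        }
    }
    return out;
}

/* XOR the buffer with the LFSR keystream. Because XOR is its own
 * inverse, calling this twice with the same seed restores the data,
 * which is why the receiver can descramble with the same logic. */
void scramble(uint8_t *buf, size_t len, uint16_t seed)
{
    assert(buf != NULL || len == 0);
    uint16_t state = seed;
    for (size_t i = 0; i < len; i++) {
        buf[i] ^= lfsr_next_byte(&state);
    }
}
```

Scrambling a run of identical bytes yields a varying byte stream, which spreads the signal energy; a second pass with the same seed returns the original bytes.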
FX3 PHY carries out the physical layer control settings and operations via the PHY register interface, which is combined with
the link layer register interface for easy firmware implementation.
6.6.3 Link Layer
A SuperSpeed link is a logical and physical connection of two ports. The connected ports are called "link partners." A port has
a physical part and a logical part. The FX3 USB 3.0 function link layer defines the logical portion of a port and the
communications between link partners.
The logical portion of a port contains:
■ State machines for managing its end of the physical connection. These include physical layer initialization and event
management, that is, connect, removal, and power management.
■ State machines and buffering for managing information exchanges with the link partner. It implements protocols for flow
control, reliable delivery (port to port) of packet headers, and link power management.
■ Buffering for data and protocol layer information elements.
The logical portion of a port does the following:
■ Provides the correct framing of sequences of bytes into packets during transmission; for example, insertion of packet
delimiters
■ Detects received packets, including packet delimiters and error checks of received header packets (for reliable delivery)
■ Provides an appropriate interface to the protocol layer for pass-through of protocol-layer packet information exchanges
The FX3 USB 3.0 function physical layer provides the logical port with an interface through which it can do the following:
■ Manage the state of its PHY (that is, its end of the physical connection), including power management and events
(connection, removal, and wake).
■ Transmit and receive byte streams, with additional signals that qualify the byte stream as control sequences or data. The
physical layer includes discrete transmit and receive physical links; therefore, a port can simultaneously transmit and
receive control and data information.
The FX3 USB 3.0 function controller has a unified register interface for physical and link layers as in the following code. The
register level programming is managed by the USB driver block in the FX3 SDK.
/* FX3 USB 3.0 Function Physical Layer and Link Layer Register Interface */
The FX3 USB 3.0 function protocol layer defines the "end-to-end" communication rules between a host and device. The
SuperSpeed protocol provides for application data information exchanges between a host and a device endpoint. This
communications relationship is called a "pipe." It is a host-directed protocol, which means the host determines when
application data is transferred between the host and the device. SuperSpeed is not a polled protocol, as a device can
asynchronously request service from the host on behalf of a particular endpoint.
All protocol layer communications are accomplished via the exchange of packets. Packets are sequences of data bytes with
specific control sequences that serve as delimiters managed by the link layer. Unlike USB 2.0, which uses a broadcast
mechanism, USB 3.0 routes host-transmitted protocol packets through intervening hubs directly to a peripheral device. A
peripheral device considers itself targeted by any protocol layer packet it receives. Device-transmitted protocol packets
simply flow upstream through hubs to the host.
Packet headers are the building blocks of the protocol layer. They are fixed-size packets with type and subtype field
encodings for specific purposes. A small record within a packet header is utilized by the link layer (port to port) to manage the
flow of the packet from port to port. Packet headers are delivered through the link layer (port to port) reliably. The remaining
fields are utilized by the end-to-end protocol.
Application data is transmitted within data packet payloads, which are preceded (in the protocol) by specifically encoded data
packet headers. Data packet payloads are not delivered reliably through the link layer (however, the accompanying data
packet headers are delivered reliably). The protocol layer supports the reliable delivery of data packets via explicit
acknowledgement (header) packets and the retransmission of lost or corrupt data. Not all data information exchanges utilize
data acknowledgements.
Data may be transmitted in bursts of back-to-back sequences of data packets (depending on the scheduling by the host). The
protocol allows efficient bus utilization by concurrently transmitting and receiving over the link. For example, a transmitter
(host or device) can burst multiple packets of data back to back, while the receiver can transmit data acknowledgements
without interrupting the burst of data packets. The number of data packets in a specific burst is scheduled by the host.
Furthermore, a host may simultaneously schedule multiple OUT bursts to be active at the same time as an IN burst.
The protocol provides flow control support for some transfer types. A device-initiated flow control is signaled by a device via a
defined protocol packet. A host-initiated flow control event is realized via the host schedule (host will simply not schedule
information flows for a pipe unless it has data or buffering available). On receipt of a flow control event, the host removes the
pipe from its schedule. Resumption of scheduling information flows for a pipe may be initiated by the host or device. A device
endpoint notifies a host of its readiness (to source or sink data) via an asynchronously transmitted "ready" packet. On receipt
of the "ready" notification, the host adds the pipe to its schedule, assuming that it still has data or buffering available.
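The scheduling rules in the preceding paragraph can be summarized as a small model of the host's view of one pipe. This is an illustrative sketch only; the type and function names are hypothetical and do not correspond to any FX3 SDK or host-controller API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical host-side model of a single pipe. */
typedef struct {
    bool scheduled;      /* pipe is currently in the host schedule */
    bool host_has_work;  /* host has data to send or buffering to receive */
} pipe_model_t;

/* A flow control event removes the pipe from the host schedule. */
void pipe_on_flow_control(pipe_model_t *p)
{
    assert(p != NULL);
    p->scheduled = false;
}

/* A "ready" notification from the device endpoint puts the pipe back
 * in the schedule, but only if the host still has data or buffering. */
void pipe_on_ready(pipe_model_t *p)
{
    assert(p != NULL);
    p->scheduled = p->host_has_work;
}
```

The model makes one point explicit: a "ready" notification is a request, not a guarantee. The host resumes the pipe only when it has something to transfer.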
Independent information streams can be explicitly delineated and multiplexed on the bulk transfer type. This means that
through a single pipe instance, more than one data stream can be tagged by the source and identified by the sink. The
protocol provides for the device to direct which data stream is active on the pipe.
Devices may asynchronously transmit notifications to the host. These notifications are used to convey a change in the device
state. A host transmits a special packet header to the bus that includes the host's timestamp. The value in this packet is used
to keep devices in synchronization with the host. In contrast to other packet types, the timestamp packet is forwarded down all
paths not in a low-power state. The timestamp packet transmission is scheduled by the host at a specification-determined
period.
The FX3 USB 3.0 function controller has a dedicated register interface for the protocol layer as shown in the following code.
The register-level programming and interrupt handling are performed by the USB driver in the FX3 SDK.
/* FX3 USB 3.0 Function Protocol Layer Register Interface */
In addition, the EPM uses a clock uib_epm_clk_i that is 100 MHz when the USB 2.0 function controller is active. The
uib_epm_clk_i configuration source is from GCTL_UIB_CORE_CLK.EPMCLK_SRC and enabled by
GCTL_UIB_CORE_CLK.CLK_EN.
6.7.2 Interrupt Requests
The UIB block has three global interrupt sources to the VIC, listed in Table 6-2, which are shared among USB 3.0, USB 2.0
and OTG controllers.
UIB has a global interrupt register, UIB_INTR, which contains interrupt sources from the respective functional blocks (USB
3.0 function, USB 2.0 function, USB 2.0 host, USB 2.0 OTG, charger detect, EPM). The UIB core interrupt to VIC is the logical
OR of interrupt sources in UIB_INTR.
USB 2.0 function controller interrupts are located in UIB_DEV_CTL_INTR. UIB_INTR.DEV_CTL_INT is the logical OR of the
interrupt sources in UIB_DEV_CTL_INTR.
USB 2.0 function endpoint interrupts are located in UIB_DEV_EP_INTR. UIB_INTR.DEV_EP_INT is the logical OR of the
interrupt sources in UIB_DEV_EP_INTR.
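The two-level OR structure described above can be modeled in a few lines of C. This is a sketch under stated assumptions: the structure, the function names, and the bit positions are hypothetical stand-ins for the real registers, and interrupt masking is left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit positions in the modeled top-level register. */
#define MODEL_DEV_CTL_INT (1u << 0)
#define MODEL_DEV_EP_INT  (1u << 1)

typedef struct {
    uint32_t dev_ctl_intr;  /* stands in for UIB_DEV_CTL_INTR */
    uint32_t dev_ep_intr;   /* stands in for UIB_DEV_EP_INTR */
} uib_model_t;

/* Each top-level bit is the logical OR of every source bit in the
 * corresponding second-level register. */
uint32_t uib_model_intr(const uib_model_t *u)
{
    assert(u != NULL);
    uint32_t top = 0;
    if (u->dev_ctl_intr != 0) {
        top |= MODEL_DEV_CTL_INT;
    }
    if (u->dev_ep_intr != 0) {
        top |= MODEL_DEV_EP_INT;
    }
    return top;
}

/* The core interrupt line is the logical OR of the top-level bits. */
bool uib_model_core_irq(const uib_model_t *u)
{
    return uib_model_intr(u) != 0;
}
```

An interrupt handler built on this structure would read the top-level register first, then read only the second-level registers whose summary bit is set.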
6.7.3 USB 2.0 Functional Description
The USB 2.0 function controller hardware includes a Serial Interface Engine (SIE) and a Token Processor (TP). It does the
following:
■ Handles the handshake between the endpoint and the host device
■ Generates an interrupt when valid data packets are received
■ Generates an interrupt when an error in transmission occurs
■ Moves valid data to/from the endpoint
■ Handles all the bit stuffing required
6.7.3.1 Serial Interface Engine
The SIE is responsible for handling the USB traffic at the byte-level and for detecting the suspend, reset, and resume USB
bus states. It parses the traffic, decoding the types of packets that have been received and translating the packet types into
bytes for transmission back to the host on the USB bus.
6.7.3.2 Token Processor
The TP handles most of the protocol described in chapter 8 of the USB specification. It receives the USB basic protocol
commands from the host and generates the appropriate sequence of responses by synchronizing the frame timer, receiving/
transmitting data, or receiving/transmitting handshake responses to the commands themselves. It does all of this based on
the information provided by the SIE, which is responsible for decoding the bytes into those commands.
6.7.4 USB 2.0 Function Registers
The FX3 USB 2.0 function registers can be accessed directly from the UIB top-level register interface. These registers are
shown below and are programmed by the USB driver in the FX3 SDK.
/* FX3 USB 2.0 Function Register Interface */
/* These definitions extracted from the Top Level UIB register interface */
USB 2.0 reset is detected by the SIE and is reported to the TP. The URESET bit in the DEV_CTL_INTR register is set, and
the corresponding interrupt is generated (if not masked).
6.7.6 USB Suspend
USB 2.0 suspend is detected by the SIE and is reported to the TP. The SUSP bit in the DEV_CTL_INTR register is set, and
the corresponding interrupt is generated (if not masked).
6.7.7 USB Resume
USB 2.0 resume is detected by the SIE and is reported to the TP. The SUSP bit in the DEV_CTL_INTR register is cleared,
and the corresponding interrupt is generated (if not masked). A resume by the device is generated when the firmware sets
the SIGRSUME bit in the PWR_CS register. The firmware must also clear this bit to end the resume signaling.
6.7.8 Start of Frame
Start of frame (SOF) timer packets are received by the SIE and reported to the TP. The SOF bit in the UIB_DEV_CTL_INTR
register is set, and the corresponding interrupt is generated (if not masked). The frame number is stored in the FRAMECNT
register. The SIE is capable of generating synthetic SOF notifications to replace any SOF packets that get lost. This feature
can be controlled using the NOSYNSOF bit in the DEV_PWR_CS register.
6.7.9 SETUP Packet
SETUP packets (to endpoint 0) are received by the SIE and reported to the TP. The SETUP data is stored in registers
DEV_SETUPDAT0 and DEV_SETUPDAT1. The SUDAV bit in the DEV_CTL_INTR register is set, as is the SUTOK bit if the
packet is received correctly, and their corresponding interrupts are generated if not masked. Upon receipt of a SetAddress
command from the host, the DEV_CS register field DEVICEADDR is updated with the new device address.
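As an illustration of how firmware might unpack the two SETUP data words, the sketch below splits them into the standard USB SETUP fields. It assumes that the eight SETUP bytes are packed little-endian into the two 32-bit registers (bmRequestType in the lowest byte of the first word); that packing, the type name, and the function name are assumptions for this example and should be checked against the register definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standard USB SETUP packet fields. */
typedef struct {
    uint8_t  bmRequestType;
    uint8_t  bRequest;
    uint16_t wValue;
    uint16_t wIndex;
    uint16_t wLength;
} setup_packet_t;

/* Hypothetical helper: unpack two 32-bit SETUP data words, assuming
 * the eight SETUP bytes are stored in little-endian order. */
void decode_setup(uint32_t setupdat0, uint32_t setupdat1, setup_packet_t *out)
{
    assert(out != NULL);
    out->bmRequestType = (uint8_t)(setupdat0 & 0xFFu);
    out->bRequest      = (uint8_t)((setupdat0 >> 8) & 0xFFu);
    out->wValue        = (uint16_t)(setupdat0 >> 16);
    out->wIndex        = (uint16_t)(setupdat1 & 0xFFFFu);
    out->wLength       = (uint16_t)(setupdat1 >> 16);
}
```

Under this assumption, a GET_DESCRIPTOR request for the device descriptor (bytes 80 06 00 01 00 00 12 00) would appear as the words 0x01000680 and 0x00120000.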
6.7.10 IN Packet
IN packets are received by the SIE and reported to the TP, which generates the appropriate responses to the SIE and
updates the corresponding DEV_EPO_CS register for the endpoint to which the packet was sent.
6.7.11 OUT Packet
OUT packets are received by the SIE and reported to the TP, which generates the appropriate responses to the SIE and
updates the corresponding DEV_EPO_CS register for the endpoint to which the packet was sent.
6.8 USB 3.0 and USB 2.0 Function Coordination
When the FX3 is functioning as a USB device, the USB 3.0 PHY or the USB 2.0 PHY needs to be turned on based on the
capabilities of the USB host to which FX3 is connected. The USB 3.0 specification requires that only one of the PHY layers be
operational at most times during the USB device operation. The exception to this rule is a small time window in which the
device is attempting to move from USB 2.0 mode to USB 3.0 mode. The firmware application on FX3 is responsible for
identifying the host capabilities and setting the USB connection accordingly. The following procedure should be used by the
FX3 firmware for USB connection negotiation:
1. Wait for a valid VBus voltage (GCTL_IOPWR interrupt).
2. Turn on the USB 3.0 PHY to start 3.0 receiver detection.
   a. If receiver detection succeeds, the LNK_LTSSM_CONNECT interrupt will be received. If this interrupt is received, the
      device will proceed with enumeration in USB 3.0 mode.
3. If receiver detection fails, the LNK_LTSSM_DISCONNECT interrupt will be received. If this interrupt is received:
   a. Turn off the USB 3.0 PHY and turn on the USB 2.0 PHY.
   b. A USB 2.0 bus reset will be received as part of USB 2.0 connection startup.
   c. The 3.0 PHY should be re-enabled on receiving the URESET interrupt that is triggered on a 2.0 bus reset. Both the 2.0
      and 3.0 PHYs will be active at this time.
   d. If the 3.0 receiver detection succeeds (LNK_LTSSM_CONNECT):
      i. Turn off the USB 2.0 PHY.
      ii. Proceed with enumeration as a USB 3.0 device.
   e. If the 3.0 receiver detection fails (LNK_LTSSM_DISCONNECT):
      i. Turn off the USB 3.0 PHY.
      ii. Check the number of times that 3.0 receiver detection has failed. If this count is greater than 3:
         ■ Proceed with enumeration as a USB 2.0 device.
         ■ There is no need to attempt 3.0 enumeration on any further bus resets.
Any PHYs that are enabled need to be disabled when the VBus voltage is removed. The entire previous procedure needs to
be repeated when valid VBus is detected again.
Note that USB 3.0 PHY on the FX3 needs to be turned off when VBus is removed or a host disconnect is discovered by other
means. If the 3.0 PHY is left turned on, the 3.0 link startup is liable to fail when connected again to the host.
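The negotiation procedure can be pictured as a small event-driven state model. The sketch below is illustrative only and is not the FX3 SDK implementation: the event names, structure, and function are hypothetical, and it assumes that every LNK_LTSSM_DISCONNECT counts as one failed receiver detection.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical events, named after the interrupts in the procedure. */
typedef enum {
    EVT_VBUS_VALID,     /* valid VBus detected (GCTL_IOPWR) */
    EVT_VBUS_REMOVED,   /* VBus removed or host disconnect */
    EVT_SS_CONNECT,     /* LNK_LTSSM_CONNECT */
    EVT_SS_DISCONNECT,  /* LNK_LTSSM_DISCONNECT */
    EVT_USB2_RESET      /* URESET on a USB 2.0 bus reset */
} conn_event_t;

typedef struct {
    bool     phy3_on;        /* USB 3.0 PHY enabled */
    bool     phy2_on;        /* USB 2.0 PHY enabled */
    bool     retry_ss;       /* retry 3.0 on the next 2.0 bus reset */
    unsigned ss_fail_count;  /* failed 3.0 receiver detections */
} conn_model_t;

#define SS_MAX_FAILURES 3u

void conn_handle_event(conn_model_t *c, conn_event_t evt)
{
    assert(c != NULL);
    switch (evt) {
    case EVT_VBUS_VALID:     /* steps 1 and 2: start 3.0 detection */
        c->phy3_on = true;
        c->phy2_on = false;
        c->retry_ss = true;
        c->ss_fail_count = 0;
        break;
    case EVT_SS_CONNECT:     /* steps 2a and 3d: run as USB 3.0 */
        c->phy2_on = false;
        break;
    case EVT_SS_DISCONNECT:  /* steps 3a and 3e: fall back to USB 2.0 */
        c->phy3_on = false;
        c->phy2_on = true;
        c->ss_fail_count++;
        if (c->ss_fail_count > SS_MAX_FAILURES) {
            c->retry_ss = false;  /* stay in USB 2.0 mode */
        }
        break;
    case EVT_USB2_RESET:     /* step 3c: both PHYs briefly active */
        if (c->phy2_on && c->retry_ss) {
            c->phy3_on = true;
        }
        break;
    case EVT_VBUS_REMOVED:   /* all PHYs off until VBus returns */
        c->phy3_on = false;
        c->phy2_on = false;
        break;
    }
}
```

The only state in which both PHYs are on is the window between a USB 2.0 bus reset and the next 3.0 connect or disconnect event, matching the exception noted at the start of this section.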
Before USB 3.0 is operational, the function controller needs to be initialized. The following code example from the FX3 SDK
implements the UIB initialization sequence for the USB 3.0 function.
static void
CyU3PUibInit (
void)
{
uint8_t ep = 0;
/* Enable the Power regulators*/
GCTLAON->wakeup_en = 0;
GCTLAON->wakeup_polarity = 0;
Once the USB 3.0 function controller is initialized, it needs to be enabled. The following code snippet from the FX3 SDK
implements the sequence to enable the USB 3.0 function controller.
/* Make sure that all relevant USB 3.0 interrupts are enabled. */
USB3LNK->lnk_intr = 0xFFFFFFFF;
USB3LNK->lnk_intr_mask = CY_U3P_UIB_LGO_U3 |
CY_U3P_UIB_LTSSM_CONNECT | CY_U3P_UIB_LTSSM_DISCONNECT | CY_U3P_UIB_LTSSM_RESET |
CY_U3P_UIB_LTSSM_STATE_CHG;
USB3PROT->prot_intr = 0xFFFFFFFF;
USB3PROT->prot_intr_mask = (CY_U3P_UIB_STATUS_STAGE | CY_U3P_UIB_SUTOK_EN |
CY_U3P_UIB_EP0_STALLED_EN |
CY_U3P_UIB_TIMEOUT_PORT_CAP_EN | CY_U3P_UIB_TIMEOUT_PORT_CFG_EN |
CY_U3P_UIB_LMP_RCV_EN |
CY_U3P_UIB_LMP_PORT_CAP_EN | CY_U3P_UIB_LMP_PORT_CFG_EN);
/* Set port config and capability timers to their initial values. */
USB3PROT->prot_lmp_port_capability_timer = CY_U3P_UIB_PROT_LMP_PORT_CAP_TIMER_VALUE;
USB3PROT->prot_lmp_port_configuration_timer = CY_U3P_UIB_PROT_LMP_PORT_CFG_TIMER_VALUE;
/* Turn on AUTO response to LGO_U3 command from host. */
USB3LNK->lnk_compliance_pattern_8 |= CY_U3P_UIB_LFPS;
USB3LNK->lnk_phy_conf = 0xE0000001;
CyU3PSetUsbCoreClock (1, 0);
CyU3PBusyWait (10);
/* Force LTSSM into SS.Disabled state for 100us after the PHY is turned on. */
USB3LNK->lnk_ltssm_state = (CY_U3P_UIB_LNK_STATE_SSDISABLED << CY_U3P_UIB_LTSSM_OVERRIDE_VALUE_POS) |
CY_U3P_UIB_LTSSM_OVERRIDE_EN;
UIB->otg_ctrl |= CY_U3P_UIB_SSDEV_ENABLE;
CyU3PBusyWait (100);
USB3LNK->lnk_ltssm_state &= ~CY_U3P_UIB_LTSSM_OVERRIDE_EN;
When the USB 3.0 termination detection or link training fails, the FX3 UIB can fall back to USB 2.0 mode. The following
function implements this fallback mechanism.
static void
CyU3PUsbFallBackToUsb2 (
void)
{
CyFx3UsbWritePhyReg (0x1005, 0x0000);
/* Force the link state machine into SS.Disabled. */
USB3LNK->lnk_ltssm_state = (CY_U3P_UIB_LNK_STATE_SSDISABLED <<
CY_U3P_UIB_LTSSM_OVERRIDE_VALUE_POS) |
CY_U3P_UIB_LTSSM_OVERRIDE_EN;
/* Keep track of the number of times the 3.0 link training has failed. */
glUibDeviceInfo.tDisabledCount++;
/* Change EPM config to full speed */
CyU3PBusyWait (2);
CyU3PSetUsbCoreClock (2, 2);
CyU3PBusyWait (2);
/* Switch the EPM to USB 2.0 mode, turn off USB 3.0 PHY and remove Rx Termination. */
UIB->otg_ctrl &= ~CY_U3P_UIB_SSDEV_ENABLE;
CyU3PBusyWait (2);
UIB->otg_ctrl &= ~CY_U3P_UIB_SSEPM_ENABLE;
/* For USB 2.0 connections, enable pull-up on D+ pin. */
CyU3PConnectUsbPins ();
}
6.9.4 USB Reset
The following code example implements the USB 3.0 reset handler to handle the USB reset event. The USB 3.0 reset event
is detected by the LTSSM_RESET bit of the LNK_INTR link layer interrupt register.
/* Enable USB 3.0 control eps. */
USB3PROT->prot_epi_cs1[0] |= CY_U3P_UIB_SSEPI_VALID;
UIB->eepm_endpoint[0] = 0x200; /* Control EP transfer size is 512 bytes. */
USB3PROT->prot_epo_cs1[0] |= CY_U3P_UIB_SSEPO_VALID;
UIB->iepm_endpoint[0] = 0x200; /* Control EP transfer size is 512 bytes. */
for (ep = 1; ep < 16; ep++)
{
/* Reset, flush and clear stall condition on all valid endpoints. */
if (glPcktSizeIn[ep].valid == CyTrue)
{
CyU3PUsbFlushEp (ep | 0x80);
CyU3PUsbStall (ep | 0x80, CyFalse, CyTrue);
}
if (glPcktSizeOut[ep].valid == CyTrue)
{
CyU3PUsbFlushEp (ep);
CyU3PUsbStall (ep, CyFalse, CyTrue);
}
}
}
6.9.5 USB Connect
The following code example implements the USB connect handler to handle the USB connect event. This event is detected
by the LTSSM_CONNECT bit of the LNK_INTR link layer interrupt register.
/* hardware specific setting for USB 3.0 Phy */
USB3LNK->lnk_phy_tx_trim = glUsb3TxTrimVal;
CyFx3UsbWritePhyReg (0x1006, 0x180);
CyFx3UsbWritePhyReg (0x1024, 0x0080);
/* If USB 2.0 PHY is enabled, switch it off and take out the USB 2.0 pullup. */
if (UIB->otg_ctrl & CY_U3P_UIB_DEV_ENABLE)
{
state = USB3LNK->lnk_ltssm_state & CY_U3P_UIB_LTSSM_STATE_MASK;
while ((UIB->otg_ctrl & CY_U3P_UIB_SSDEV_ENABLE)
&& (state == CY_U3P_UIB_LNK_STATE_POLLING_LFPS))
{
CyU3PThreadRelinquish ();
state = USB3LNK->lnk_ltssm_state & CY_U3P_UIB_LTSSM_STATE_MASK;
}
if (state == CY_U3P_UIB_LNK_STATE_COMP)
{
if (!glUibDeviceInfo.ssCmdSeen)
{
CyU3PUsbAddToEventLog (CYU3P_USB_LOG_USBSS_DISCONNECT);
CyU3PUsbSSDisConnecthandler ();
}
return;
/* Update the PHY to not send spurious LFPS. */
CyFx3UsbWritePhyReg (0x0030, 0x00C0);
CyFx3UsbWritePhyReg (0x1010, 0x0080);
CyU3PUsbFlushEp (0x00);
CyU3PUsbFlushEp (0x80);
/* Enable USB 3.0 control eps. */
USB3PROT->prot_epi_cs1[0] |= CY_U3P_UIB_SSEPI_VALID;
UIB->eepm_endpoint[0] = 0x200; /* Control EP transfer size is 512 bytes. */
USB3PROT->prot_epo_cs1[0] |= CY_U3P_UIB_SSEPO_VALID;
UIB->iepm_endpoint[0] = 0x200; /* Control EP transfer size is 512 bytes. */
/* Propagate the event to the application. */
if (glUsbEvtCb != NULL)
{
glUsbEvtCb (CY_U3P_USB_EVENT_CONNECT, 0x01);
}
/* Configure the EPs for super-speed operation. */
The following code example implements the USB disconnect handler to handle the USB disconnect event. This event is
detected by the LTSSM_DISCONNECT bit of the LNK_INTR link layer interrupt register.
static void
CyU3PUsbSSDisConnecthandler (
void)
{
/* If we still have VBUS, try to connect in USB 2.0 mode. */
if (CyU3PUsbCanConnect ())
{
if (UIB->otg_ctrl & CY_U3P_UIB_DEV_ENABLE)
{
/* If the 2.0 PHY is already on, simply turn off the USB 3.0 PHY. */
UIB->otg_ctrl &= ~(CY_U3P_UIB_SSDEV_ENABLE | CY_U3P_UIB_SSEPM_ENABLE);
CyU3PBusyWait (2);
The following code example implements the USB control request handler to handle setup commands from the host. Note that
this function is implemented for both USB 3.0 and USB 2.0 functions.
For USB 3.0, the control request event is detected by the SUTOK_EV bit of the PROT_INTR protocol layer interrupt register.
When the control event occurs, setup data from the host is stored in the protocol layer registers PROT_SETUPDAT0 and
PROT_SETUPDAT1.
For USB 2.0, the control request event is detected by the SUDAV bit of the USB 2.0 device register DEV_CTL_INTR. When
the control event occurs, setup data from the host is stored in the USB 2.0 device registers DEV_SETUPDAT0 and
DEV_SETUPDAT1.
Once the function obtains the setup data from the host, it parses the command class and type and executes the command
accordingly.
/* Parses the USB setup command received. */
static void
CyU3PUsbSetupCommand (
void)
{
uint32_t setupdat0;
uint32_t setupdat1;
uint32_t status = 0;
CyBool_t isHandled = CyFalse;
/* For super speed handling. */
if (glUibDeviceInfo.usbSpeed == CY_U3P_SUPER_SPEED)
{
setupdat0 = USB3PROT->prot_setupdat0;
setupdat1 = USB3PROT->prot_setupdat1;
/* Clear the status stage interrupt. We later check for this. */
USB3PROT->prot_intr = CY_U3P_UIB_STATUS_STAGE;
/* If the LTSSM is currently in U1/U2, set an event to trigger a wakeup. */
status = USB3LNK->lnk_ltssm_state & CY_U3P_UIB_LTSSM_STATE_MASK;
if ((status == CY_U3P_UIB_LNK_STATE_U1) || (status == CY_U3P_UIB_LNK_STATE_U2))
CyU3PEventSet (&glUibEvent, CY_U3P_UIB_EVT_TRY_UX_EXIT, CYU3P_EVENT_OR);
status = 0;
}
else
{
/* If USB-SS is enabled, set a flag indicating that the 3.0 PHY should
* be turned on at the next bus reset. */
status = CyU3PDmaChannelWaitForCompletion (&glUibChHandleOut, 100);
if ((status != CY_U3P_SUCCESS) && (status != CY_U3P_ERROR_NOT_STARTED))
{
/* The endpoint needs to be NAKed before the channel is reset. */
CyU3PUsbSetEpNak (0x00, CyTrue);
CyU3PBusyWait (100);
CyU3PDmaChannelReset (&glUibChHandleOut);
CyU3PUsbSetEpNak (0x00, CyFalse);
}
status = CyU3PDmaChannelWaitForCompletion (&glUibChHandle, 100);
if ((status != CY_U3P_SUCCESS) && (status != CY_U3P_ERROR_NOT_STARTED))
{
/* The endpoint needs to be NAKed before the channel is reset. */
CyU3PUsbSetEpNak (0x80, CyTrue);
CyU3PBusyWait (100);
CyU3PDmaChannelReset (&glUibChHandle);
CyU3PUsbFlushEp (0x80);
CyU3PUsbSetEpNak (0x80, CyFalse);
}
status = 0;