Altera Nios II C2H Compiler User Manual

101 Innovation Drive San Jose, CA 95134 www.altera.com
Nios II C2H Compiler
User Guide
Nios II C2H Compiler Version: 9.1 Document Date: November 2009
Copyright © 2009 Altera Corporation. All rights reserved. Altera, The Programmable Solutions Company, the stylized Altera logo, specific device designations, and all other words and logos that are identified as trademarks and/or service marks are, unless noted otherwise, the trademarks and service marks of Altera Corporation in the U.S. and other countries. All other product or service names are the property of their respective holders. Altera products are protected under numerous U.S. and foreign patents and pending applications, maskwork rights, and copyrights. Altera warrants performance of its semiconductor products to current specifications in accordance with Altera's standard warranty, but reserves the right to make changes to any products and services at any time without notice. Altera assumes no responsibility or liability arising out of the application or use of any information, product, or service described herein except as expressly agreed to in writing by Altera Corporation. Altera customers are advised to obtain the latest version of device specifications before relying on any published information and before placing orders for products or services.
UG-N2C2HCMPLR-1.6

Contents

Chapter 1. Introduction to the C2H Compiler
User Guide Overview
Target Audience
Introduction
Features
Design Abstraction and the Rise of C for FPGAs
What to Expect From the C2H Compiler
C2H Support in Nios II Tool Flows
C2H Compiler Concepts
Simplicity and Ease of Use
Rapid Design Iteration to Find Optimal Partitioning of Hardware and Software
Accelerate Performance-Critical Sections of Code
The C2H Compiler Operates at the Function Level
System Architecture
Generation of a Hardware Accelerator
One-to-One Mapping From C Syntax to Hardware Structure
Performance Depends on Memory Access Time
C Code Appropriate for Hardware Acceleration
Ideal Acceleration Candidates
Poor Acceleration Candidates
Understanding Code to Find Opportunities for Acceleration
Next Steps
Chapter 2. Getting Started Tutorial
Introduction
C2H Compiler Design Flow
Starting Point for the C2H Compiler Design Flow
Typical Design Flow
Software Requirements
OpenCore Plus Evaluation
Tutorial
Tutorial Design
Set up the Hardware for the Project
Create the Software Project
Run the Project as Software Only
Create and Configure a Hardware Accelerator
Rebuild the Project
Observe Results in the Report File
Observe the Accelerator in SOPC Builder
Run the Project with the Accelerator
Remove the Accelerator
Next Steps
Chapter 3. C-to-Hardware Mapping Reference
One-to-One C-to-Hardware Mapping
Arithmetic and Logical Operators
Assignments
Iteration Statements
Selection Statements
Subfunction Calls
Macros and Preprocessing Directives
Variable Declarations
Local vs. Non-Local Variables
Scalar Variables
Memory Accesses
Indirection Operator (Pointer Dereference)
Avalon-MM Master Port Signal Generation
Array Subscript Operator
Structure and Union Operators
Scheduling
Scheduling Concepts for Hardware Accelerators
Pointer Aliasing
Read Operations with Latency
Stalling
Loop Pipelining
Subfunction Pipelining
Resource Sharing
Chapter 4. Understanding the C2H View
Introduction
Overview
Generation/Compilation Configurations
Resources
Avalon-MM Master Port Resources
Mathematical Operator Resources
Performance
Source Line Number
Loop Latency
Cycles Per Loop Iteration (CPLI)
Scheduling Information
Further Reading
Chapter 5. Accelerating Code Using the Nios II Software Build Tools Command Line
Creating an Accelerator from the Command Line
C2H Performance Metrics
Chapter 6. Pragma Reference
Introduction
Connection Pragma
Reducing Arbitration Logic
Optimizing Sequential Memory Access with Arbitration Shares
Flow Control Pragma
Interrupt Pragma
Unshare Pointer Pragma
Chapter 7. ANSI C Compliance and Restrictions
Introduction
Language
Declarations
Expressions
Functions
Miscellaneous Unsupported Features
Other Restrictions
Additional Information
Referenced Documents
Revision History
How to Contact Altera
Typographic Conventions

1. Introduction to the C2H Compiler

The Nios® II C-to-Hardware Acceleration (C2H) Compiler is a tool that allows you to create custom hardware accelerators directly from ANSI C source code. A hardware accelerator is a block of logic that implements a C function in hardware, which often improves the execution performance by an order of magnitude. Using the C2H Compiler, you can develop and debug an algorithm in C targeting an Altera® Nios II processor, and then quickly convert the C code to a hardware accelerator implemented in a field programmable gate array (FPGA).
The C2H Compiler improves the performance of Nios II programs by implementing specific C functions as hardware accelerators. The C2H Compiler is not designed to create arbitrary hardware systems from C code. Rather, the C2H Compiler is a tool for generating a hardware accelerator module, functionally identical to the original C function, that offloads and enhances the performance of the Nios II processor.

User Guide Overview

This user guide comprises the following chapters:
Chapter 1, Introduction to the C2H Compiler, provides a detailed background on the C2H Compiler and the concepts required to use it.
Chapter 2, Getting Started Tutorial, provides hands-on instructions that teach you the first steps to begin using the C2H Compiler.
Chapter 3, C-to-Hardware Mapping Reference, provides reference on how the C2H Compiler translates C constructs to hardware structures.
Chapter 4, Understanding the C2H View, helps you use the C2H view to get performance information and to control the compilation of accelerators.
Chapter 5, Accelerating Code Using the Nios II Software Build Tools Command Line, explains how to use the C2H Compiler with the Nios® II software build tools.
Chapter 6, Pragma Reference, summarizes usage of all C2H #pragma directives.
Chapter 7, ANSI C Compliance and Restrictions, documents all sections of the ANSI C specification that the C2H Compiler does not support.

Target Audience

This user guide assumes you have at least a basic understanding of hardware design for field programmable gate arrays (FPGAs). It also assumes you are fluent in the C language and you have experience with software design in C for microprocessors.
The C2H Compiler operates in conjunction with the following Altera tools:
Quartus II software for creating FPGA designs
SOPC Builder system integration tool for creating Nios II processor hardware systems
C programming environments for the Nios II processor:
Nios II integrated development environment (IDE)
Nios II software build tools
To benefit from this user guide, you do not need to be an expert in these tools, and you do not need an understanding of any particular Altera FPGA family. However, at least a basic understanding of each tool is required to use the C2H Compiler practically.

Introduction

This chapter introduces the Nios II C2H Compiler. The sections in this chapter discuss the features, background, and principles of the C2H Compiler, and describe the most appropriate types of C code for acceleration. After reading this chapter, you will understand all the concepts necessary to begin using the C2H Compiler.

Features

The C2H Compiler is founded on the following premises:
ANSI C syntax is sufficient to describe computationally intensive or memory access-intensive tasks.
A C-to-hardware tool must not disrupt existing software and hardware development flows.
Based on these premises, the C2H Compiler's design methodology provides the following features:
ANSI C compliance – The C2H Compiler operates on plain ANSI C code, and supports most C constructs, including pointers, arrays, structures, global and local variables, loops, and subfunction calls. The C2H Compiler does not require special syntax or library functions to specify the structure of the hardware. Unsupported ANSI C constructs are documented.
Straightforward C-to-hardware mapping – The C2H Compiler maps each element of C syntax to a defined hardware structure, giving you control over the structure of your hardware accelerator.
Integration with C language development environments for the Nios II processor, including the Nios II integrated development environment (IDE), and the Nios II software build tools. You control the C2H Compiler with the Nios II C development tools. You do not need to learn a new environment to use the C2H Compiler.
Based on SOPC Builder and Avalon system interconnect fabric – The C2H Compiler uses SOPC Builder as the infrastructure to connect hardware accelerators into Nios II systems. A C2H accelerator becomes a component within an existing Nios II system. SOPC Builder automatically generates system interconnect fabric to connect the accelerator to the system, saving you the time of manually integrating the hardware accelerator.
Reporting of generated results – The C2H Compiler produces a detailed report of hardware structure, resource usage, and throughput.
Hardware accelerators generated by the C2H Compiler have the following characteristics:
Parallel scheduling – The C2H Compiler recognizes events that can occur in parallel. Independent statements are performed simultaneously in hardware.
Direct memory access – Accelerators access the same memories that the Nios II processor does during execution.
Loop pipelining – The C2H Compiler pipelines the logic implemented for loops, based on memory access latency and the amount of code that operates in parallel.
Memory access pipelining – The C2H Compiler pipelines memory accesses to reduce the effects of memory latency.
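As a rough illustration of these characteristics, the following sketch shows the kind of function that might be a candidate for acceleration. The function name sum_of_products and its parameters are hypothetical and do not come from this guide. The two array reads in the loop body do not depend on each other, so they are the kind of operations that can be scheduled in parallel, and the loop is the kind of structure that can be pipelined. The schedule you actually get depends on the memories that the pointers connect to.

```c
/* Hypothetical acceleration candidate: a multiply-accumulate loop.
 * Plain ANSI C, with no special syntax or library calls. */
int sum_of_products(const int *a, const int *b, int len)
{
    int i;
    int sum = 0;

    for (i = 0; i < len; i++) {
        /* The reads of a[i] and b[i] are independent of each other. */
        sum += a[i] * b[i];
    }
    return sum;
}
```

Calling sum_of_products on the arrays {1, 2, 3, 4} and {5, 6, 7, 8} with a length of 4 returns 70, whether the function runs on the processor or as an accelerator.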

Design Abstraction and the Rise of C for FPGAs

There is much interest in “C-to-gates” tools that promise a practical method to create hardware logic directly from C code. However, early attempts have had limited success gaining acceptance in the design community. This section discusses the historical background of the C2H Compiler, and looks at the questions “why is this methodology a good idea?” and “why now?”
C compilers and FPGA design tools have evolved along separate paths, but both are founded on the same premise: Higher levels of design abstraction enable engineers to create designs of greater size and complexity. Simultaneous with this evolution, Moore's law has delivered chips of increasing density and complexity, such as FPGAs capable of implementing entire systems on a chip. As a result, the tools available to FPGA and software designers have undergone continual transformation of design-entry methods and behind-the-scenes optimization techniques. This transformation has enabled designers to create ever-bigger designs to fill ever-growing chip capacity.
Recent years have seen the broad acceptance of FPGA-based microprocessor cores, such as the Nios II processor, and system integration tools, such as SOPC Builder. These tools made it possible, for the first time, to implement C code easily in an FPGA-based system. Optimizing and evolving these tools is a natural next step for C-based design on FPGAs. This background sets the stage for practical advances in C-to-hardware technologies based on an established design methodology.
FPGA-based processors and system integration tools offer new ways to improve the performance of embedded systems. Traditional methods to increase performance of processor systems include:
Increasing clock speed
Upgrading to a processor with higher Dhrystone MIPS-per-megahertz performance
Coding critical sections of software in assembly language
FPGA-based processor systems enable additional optimization techniques capable of achieving much higher performance gains. These techniques include:
The ability to rapidly alter the FPGA design, allowing you to prototype a variety of architectures
The ability to divide and conquer processing tasks by instantiating multiple processor cores
The ability to augment a processor with custom hardware that offloads processor-intensive operations into the FPGA fabric
The ability to adjust memory architecture for memory-intensive operations, such as using high-speed, point-to-point connections to fast memory buffers
The application of these techniques relies on real-world tools to implement them. Consequently, the acceptance of these techniques has grown as system integration tools, such as Altera's SOPC Builder, have matured and gained acceptance. It is a fortunate coincidence that these techniques also directly benefit C-to-gates methodologies. Flexibility of hardware architecture and ease of implementation are at the heart of the appeal of C-to-gates tools.
The Nios II C-to-Hardware Acceleration (C2H) Compiler represents Altera's next step in the evolution of embedded systems design. The C2H Compiler uses the infrastructure provided by SOPC Builder and the Nios II processor, and adds a higher level of abstraction: converting C functions directly to hardware.

What to Expect From the C2H Compiler

The C2H Compiler is not designed to build all types of FPGA systems. It is designed specifically to augment the performance of programs that run on the Nios II processor; it does not replace the processor. Two notable implications are:
The C2H Compiler assumes that your C code runs successfully on a Nios II processor system.
The result of using the C2H Compiler is a program that runs on a Nios II processor system.
The C2H Compiler works best on C code that adheres to certain structural rules. It works well for many types of programs, but not all. Through education and habit, programmers structure C programs with an existing compiler in mind. Experienced designers learn the particular structures that produce optimal compiled results. The C2H Compiler is also a C compiler. It takes ANSI C programs that execute normally on a processor. However, the program structure for producing optimal hardware results with the C2H Compiler often differs from code structured for execution on a processor. You achieve the best results if you have a reasonable understanding of how the C2H Compiler translates C structures to hardware. Refer to Chapter 3, C-to-Hardware Mapping Reference, for details.
The C2H Compiler is not a replacement for traditional HDL-based hardware design. Tasks such as connecting modules together and interfacing to bus protocols are not easily inferred from ANSI C code. In the hands of an experienced user, the C2H Compiler allows considerable control over circuit latency and parallelism. However, it does not provide the ability to define user logic with complex timing requirements. For example, the C2H Compiler does not allow you to create an arbitrary state machine that guarantees a particular operation on a specific clock cycle.
Note: The Nios II processor is little-endian. For Nios II compatibility, C2H accelerators expect to exchange little-endian data with the processor. If your accelerator must handle big-endian data, you can swap the byte order in the accelerated C code. Ensure that the data is in little-endian form when your accelerated function transfers it to any unaccelerated function.
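The byte swap mentioned in the note above could look something like the following sketch. The function names swap32 and convert_be_to_le are hypothetical, and the code assumes that the fixed-width types from <stdint.h> are available in your toolchain. It is one possible way to do the conversion, not a routine supplied by the C2H Compiler.

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit word
 * (big-endian to little-endian, or the reverse). */
uint32_t swap32(uint32_t x)
{
    return ((x & 0x000000FFu) << 24) |
           ((x & 0x0000FF00u) << 8)  |
           ((x & 0x00FF0000u) >> 8)  |
           ((x & 0xFF000000u) >> 24);
}

/* Hypothetical accelerated function: converts a buffer of big-endian
 * words to little-endian in place, so that unaccelerated code running
 * on the processor sees data in its native byte order. */
void convert_be_to_le(uint32_t *buf, int len)
{
    int i;

    for (i = 0; i < len; i++) {
        buf[i] = swap32(buf[i]);
    }
}
```

For example, swap32(0x12345678) returns 0x78563412.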

C2H Support in Nios II Tool Flows

The Nios II IDE is the preferred tool flow for developing Nios II C2H programs. The Nios II IDE allows you to carry out the following important tasks:
Debug your function prior to accelerating
Generate the accelerator and incorporate it into your hardware
Test and profile your software and hardware with the C2H accelerator
Altera recommends creating new C2H systems with the Nios II IDE.
For information about using the Nios II IDE, refer to the Using the Nios II Integrated Development Environment appendix to the Nios II Software Developer's Handbook, or to the Nios II IDE help system. For information about Nios II tool flows, refer to “Development Flows for Creating Nios II Programs” in the Overview chapter of the Nios II Software Developer's Handbook.
The Nios II Software Build Tools also provide command-line support for pre-existing command-line C2H projects.
For information about using C2H on the command line, refer to Chapter 5, Accelerating Code Using the Nios II Software Build Tools Command Line.
Note: The Nios II Software Build Tools for Eclipse do not support the C2H Compiler.

C2H Compiler Concepts

This section describes fundamental concepts underpinning the C2H Compiler. These concepts help you better understand how the C2H Compiler works and how you can produce optimal results.

Simplicity and Ease of Use

The C2H Compiler minimizes interruptions to existing design flows. The flow to generate a hardware accelerator and link software for it uses the familiar Nios II and SOPC Builder design tools. When you create a Nios II software project, you specify which C function (or functions) compiles as a hardware accelerator rather than as instructions on a processor. The C2H Compiler calls other tools in the background to handle the hardware and software integration tasks. Specifically, the C2H Compiler automatically performs the following tasks in the background:
1. Calls SOPC Builder to specify how the accelerator connects to the system, and then generates the system hardware.
2. Calls the Quartus® II software to recompile the hardware design and generate an SRAM object file (.sof).

Rapid Design Iteration to Find Optimal Partitioning of Hardware and Software

The C2H Compiler allows you to move the dividing line between hardware and software easily in C code, without significant additional design effort. As a result, you have the freedom to design iteratively, and explore multiple architectures. By contrast, writing a hardware accelerator by hand in a hardware description language (HDL) would require a significant amount of time to create the logic design and integrate it into the system. Changing the functional or performance requirements of hand-written HDL blocks can significantly impact design time.
With the C2H Compiler, you can accelerate as many functions as necessary to achieve the desired performance. You can balance the trade-off between performance and resource utilization with simple edits to the C source.
With these tools available to you, the process of achieving desired system performance undergoes a profound change: The balance of design time shifts away from creating, interfacing, and debugging hardware in favor of perfecting the algorithm implementation and finding the optimal system architecture.

Accelerate Performance-Critical Sections of Code

The C2H Compiler converts only sections of code that you specify. A typical program contains a mix of performance-critical code and other code. Performance-critical sections are often iterative and simple, but consume the majority of a program's execution time on a processor. They might occupy the processor by computing a value, moving data, or both. The best use of hardware resources is to accelerate only the performance-critical functions of a program, rather than converting an entire program to hardware.

The C2H Compiler Operates at the Function Level

Code you want to accelerate must be expressed as an individual C function. The C2H Compiler converts all code within and below the chosen function to a hardware accelerator block. If the function you are accelerating calls a subfunction, the C2H Compiler also converts the subfunction to a hardware accelerator. Therefore, you must be careful that subfunctions are also good candidates for C2H acceleration.
If the code you want to accelerate is not isolated in a separate function, a good practice is to partition the function to separate the critical section into its own function. The resulting hardware accelerator then replaces only processor-intensive tasks, rather than setup or control tasks which the processor can implement efficiently.
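The partitioning described above might look like the following sketch. The function names scale_buffer and process_frame are hypothetical. The inner loop is isolated in its own function so that it alone could be chosen for acceleration, while argument checking and default selection stay in software on the processor.

```c
/* Critical section isolated in its own function.
 * This is the function you would select for acceleration. */
void scale_buffer(int *dst, const int *src, int len, int gain)
{
    int i;

    for (i = 0; i < len; i++) {
        dst[i] = src[i] * gain;
    }
}

/* Setup and control tasks remain on the processor. */
int process_frame(int *dst, const int *src, int len, int gain)
{
    if (dst == 0 || src == 0 || len <= 0) {
        return -1;              /* argument checking: a control task */
    }
    if (gain == 0) {
        gain = 1;               /* default selection: a setup task */
    }

    /* Only this call would be replaced by the hardware accelerator. */
    scale_buffer(dst, src, len, gain);
    return 0;
}
```

Because scale_buffer calls no subfunctions, nothing beyond the loop itself would be converted to hardware.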

System Architecture

Figure 1–1 shows the architecture of a simple Nios II processor system that includes one hardware accelerator.
Figure 1–1. Example System Topology with Single Hardware Accelerator
[Figure: block diagram showing the Nios II Processor, Hardware Accelerator, Arbitrators, Instruction Memory, Data Memory, and Peripherals connected through the Avalon Switch Fabric. Legend: M = Avalon Master Port, S = Avalon Slave Port.]
SOPC Builder automatically integrates the accelerator logic into the system as an SOPC Builder component. If there is more than one accelerator in the system, multiple accelerators appear in SOPC Builder.
Accelerators are separate from the Nios II processor but can access the same memory devices that the Nios II processor can.
The accelerator's connections are managed by the C2H Compiler. You can manually customize the connections using pragma directives in the accelerated C code. Chapter 6, Pragma Reference, describes C2H Compiler pragma usage. You cannot edit the accelerator's connections in the SOPC Builder GUI.

Generation of a Hardware Accelerator

The C2H compilation flow shares commonalities with a conventional C compiler, but the scheduling of statements, optimization, and object generation is different. When generating a hardware accelerator, the C2H Compiler does the following:
1. Runs the GNU GCC preprocessor to evaluate macros, includes, and other preprocessing directives.
2. Parses code.
3. Creates a graph of data dependencies.
4. Performs some optimizations.
5. Determines the best sequence in which to perform each operation.
6. Generates an object file for the hardware accelerator. This object file is a synthesizable HDL file.
7. Generates a C wrapper function that isolates and hides the details of how the Nios II processor interacts with the hardware accelerator. The wrapper function is a C file that replaces the original C function at software link time.
The generated accelerator logic includes the following:
One or more state machines that manage the sequence of operations defined by the C function. On any clock cycle, an arbitrary number of computations and memory accesses can happen simultaneously, orchestrated by the state machines.
One or more Avalon Memory-Mapped (Avalon-MM) master ports, which fetch and store data as required by the state machines.
An Avalon-MM slave port and a set of memory-mapped registers that allow the processor to set up, start, and stop the accelerator.
The software wrapper, executing on the Nios II processor, controls the accelerator by reading and writing the register interface. From the perspective of the calling function, the result of calling the software wrapper is functionally the same as calling the original C function. The basic operation of the software wrapper is as follows:
1. Sets up parameters for the accelerator, similar to passing variables to the original, unaccelerated function.
2. Optionally flushes the processor's data cache to avoid cache coherency problems. Flushing the data cache might be necessary if the accelerator accesses the same memory that the processor does.
3. Starts the accelerator. Once an accelerator is running, it can return a value, terminate, or run continuously, depending on the design of the C source code.
4. Polls registers in the accelerator hardware to determine when the task completes.
5. If the function returns a result, reads the result value, and returns it to the calling function.
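The wrapper sequence above can be sketched in C. This is a minimal illustration, not the code that the C2H Compiler generates: the register layout (accel_regs_t), the ACCEL_START and ACCEL_DONE bits, and all function names are hypothetical, and the accelerator is simulated in software so that the sketch runs on a host computer.

```c
#include <stdint.h>

/* Hypothetical register map; the real layout is produced by the C2H Compiler. */
typedef struct {
    volatile uint32_t arg0;     /* first function parameter  */
    volatile uint32_t arg1;     /* second function parameter */
    volatile uint32_t control;  /* bit 0: start              */
    volatile uint32_t status;   /* bit 0: done               */
    volatile uint32_t result;   /* return value, if any      */
} accel_regs_t;

#define ACCEL_START 0x1u
#define ACCEL_DONE  0x1u

/* Software stand-in for the hardware: this "accelerator" adds its arguments. */
static void simulate_accelerator(accel_regs_t *regs)
{
    if (regs->control & ACCEL_START) {
        regs->result  = regs->arg0 + regs->arg1;
        regs->control = 0;
        regs->status  = ACCEL_DONE;
    }
}

/* Placeholder for a data cache flush; a no-op in this host simulation. */
static void flush_data_cache(void)
{
}

/* Wrapper that stands in for the original function. The regs pointer is
   passed explicitly only so the sketch can use a simulated register block. */
uint32_t accel_add_wrapper(accel_regs_t *regs, uint32_t a, uint32_t b)
{
    regs->arg0 = a;                  /* 1. set up parameters          */
    regs->arg1 = b;
    flush_data_cache();              /* 2. optionally flush the cache */
    regs->status  = 0;
    regs->control = ACCEL_START;     /* 3. start the accelerator      */
    while (!(regs->status & ACCEL_DONE)) {
        simulate_accelerator(regs);  /* 4. poll until the task is done */
    }
    return regs->result;             /* 5. read and return the result */
}
```

From the caller's point of view, accel_add_wrapper() behaves like an ordinary function that returns the sum of its two arguments, which mirrors the statement that the wrapper is functionally the same as the original C function.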

One-to-One Mapping From C Syntax to Hardware Structure

The C2H Compiler maps each element of C syntax to an equivalent hardware structure using straightforward translation rules that directly instantiate hardware resources based on the input C code. Once familiar with the C2H Compiler mappings, you can control the generated hardware structure with simple changes to your C source.
The following are examples of how the C2H Compiler translates C to hardware:
Mathematical operators (such as +, -, *, >>) become direct hardware equivalent circuits (such as add, subtract, multiply and shift circuits). These circuits might be shared between operations, depending on the degree of parallelism inherent in the C code.
Loops (such as for, while, do-while) become state machines that iterate over the operations inside the loop, until the loop condition is exhausted.
Pointer dereferences and array accesses (such as *p, array[i][j]) become Avalon-MM master ports that access the same memory that the processor does.
Statements not dependent on the result of a previous operation are scheduled as early as possible, allowing parallel execution to the extent possible.
Subfunctions called within an accelerated function are also converted to hardware using the same C-to-hardware mapping rules. The C2H Compiler creates only one hardware instance of the subfunction, regardless of how many times the subfunction is called within the top-level function. Isolating accelerated C code into a subfunction provides a method of creating a shared hardware resource within an accelerator.
The C2H Compiler performs certain optimizations when it can reduce logic utilization based on resource sharing.
Refer to Chapter 3, C-to-Hardware Mapping Reference for complete details of the C2H Compiler mappings.
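As an illustration of the mappings listed above, the following sketch annotates a small function with the hardware structure that each construct would correspond to according to this section. The functions scale() and sum_scaled() are hypothetical names invented for this example; the comments restate the rules above and are not output from the C2H Compiler.

```c
#include <stdint.h>

/* Hypothetical subfunction: per the rules above, one hardware instance
   is created no matter how many times it is called. */
static uint32_t scale(uint32_t x)
{
    return (x * 3u) >> 1;          /* multiply and shift circuits */
}

/* Hypothetical function selected for acceleration. */
uint32_t sum_scaled(const uint32_t *in, uint32_t *out, int n)
{
    uint32_t total = 0;
    int i;

    for (i = 0; i < n; i++) {      /* loop: state machine                 */
        uint32_t v = in[i];        /* array read: Avalon-MM master port   */
        uint32_t s = scale(v);     /* subfunction: shared hardware block  */
        out[i] = s;                /* array write: Avalon-MM master port  */
        total += s;                /* adder; does not depend on the store
                                      above, so both can be scheduled in
                                      parallel */
    }
    return total;
}
```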

Performance Depends on Memory Access Time

Applications that run on a processor are typically compute-bound, which means the performance bottleneck depends on the rate at which the processor executes instructions. Memory access time affects the execution time, but instruction and data caches minimize the time the processor waits for memory accesses.
With C2H hardware accelerators, the performance bottleneck undergoes a profound change: Applications typically become memory bound, which means the performance bottleneck depends on the memory latency and bandwidth. When multiple operations do not have data dependencies that require them to execute sequentially, the C2H Compiler schedules them in parallel. The resulting accelerator logic often must access memory to feed data to each parallel operation. If the hardware does not have fast access to memory, the hardware stalls waiting for data, reducing the performance and efficiency.
Achieving maximum performance from a hardware accelerator often involves examining your system's memory topology and data flow, and making modifications to reduce or eliminate memory bottlenecks. For example, if your C code randomly accesses a large buffer of data stored in slow SDRAM, performance suffers due to constant bank switching in SDRAM. You can alleviate this bottleneck by first copying blocks of data to an on-chip RAM, and allowing the accelerator to access this fast, low-latency RAM. Note that you can also accelerate the copy operation, which creates a direct memory access (DMA) hardware accelerator.
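The block-copy approach can be sketched as follows. This is a host-runnable illustration under stated assumptions: onchip_buf is an ordinary static array standing in for on-chip RAM (placing a buffer in on-chip memory on a real system is not shown here), BLOCK_WORDS is an arbitrary block size, and the function names are hypothetical.

```c
#include <stdint.h>

#define BLOCK_WORDS 256   /* hypothetical block size assumed to fit on-chip */

/* Stand-in for a buffer located in fast on-chip RAM (an assumption). */
static uint32_t onchip_buf[BLOCK_WORDS];

/* Copy one block from slow memory into the fast buffer. This loop is
   itself a candidate for acceleration, giving DMA-like hardware. */
static void copy_block(uint32_t *dst, const uint32_t *src, int words)
{
    int i;
    for (i = 0; i < words; i++) {
        dst[i] = src[i];
    }
}

/* Process a large buffer one block at a time from the fast buffer.
   Summation is used only as an example workload. */
uint32_t sum_via_onchip(const uint32_t *slow_data, int words)
{
    uint32_t acc = 0;
    int done = 0;

    while (done < words) {
        int n = words - done;
        int i;

        if (n > BLOCK_WORDS) {
            n = BLOCK_WORDS;
        }
        copy_block(onchip_buf, slow_data + done, n);  /* sequential read */
        for (i = n - 1; i >= 0; i--) {                /* any order is    */
            acc += onchip_buf[i];                     /* cheap in fast RAM */
        }
        done += n;
    }
    return acc;
}
```

The slow memory is read only sequentially in copy_block(), while the non-sequential accesses are confined to the small fast buffer.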

C Code Appropriate for Hardware Acceleration

This section describes guidelines for identifying code that is appropriate for the C2H Compiler.

Ideal Acceleration Candidates

Sections of C code that consume the most CPU time with the least amount of code are excellent candidates for acceleration. These tend to have the following characteristics:
They contain a relatively small and simple loop or set of nested loops.
They iterate over a set of data, performing one or more operations on the data per iteration, and then store the result.
Examples of such iterative tasks include memory copy-and-modify tasks, checksum calculations, data encryption, decryption, and filtering operations. In each of these cases, the C code iterates over a set of data many times, with either one or more memory reads or writes performed during each iteration.
Example 1–1 demonstrates a routine that performs a checksum calculation. This code excerpt is from a TCP/IP stack, and it calculates the checksum over ranges of data in a network protocol stack. Checksum calculations are typically a time-consuming part of an IP stack, because all data transmitted and received must be validated, which requires the processor to loop through all bytes.
Example 1–1. Checksum Calculation
u16_t standard_chksum(void *dataptr, int len)
{
  u32_t acc;

  /* Checksum loop: iterate over all data in buffer */
  for(acc = 0; len > 1; len -= 2) {
    acc += *(u16_t *)dataptr;
    dataptr = (void *)((u16_t *)dataptr + 1);
  }

  /* Handle odd buffer lengths */
  if (len == 1) {
    acc += htons((u16_t)((*(u8_t *)dataptr)&0xff)<< 8);
  }

  /* Modify result for IP stack needs */
  acc = (acc >> 16) + (acc & 0xffffUL);
  if ((acc & 0xffff0000) != 0) {
    acc = (acc >> 16) + (acc & 0xffffUL);
  }

  return (u16_t)acc;
}
Accelerating this function could have a significant impact on execution time, especially the amount of time spent in the for loop. The remaining code executes once per call to format the result and check boundary cases. Accelerating the code outside the loop has little benefit, unless the entire standard_chksum() function is called from another function that is also a good acceleration candidate. The most efficient hardware accelerator for this code would replace only the for loop. To accelerate only the for loop, you need to refactor the code to isolate the loop in a separate function.
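One possible refactoring of Example 1–1 is sketched below. The helper name chksum_loop() and the type definitions are assumptions added so that the sketch compiles on its own; htons() is taken from the POSIX header arpa/inet.h, which assumes a POSIX host rather than the original IP stack's own definition. Only chksum_loop() would be selected for acceleration.

```c
#include <stdint.h>
#include <arpa/inet.h>   /* htons(); assumes a POSIX host for this sketch */

typedef uint8_t  u8_t;
typedef uint16_t u16_t;
typedef uint32_t u32_t;

/* Hypothetical helper: only the checksum loop, isolated in its own
   function so that it alone can be accelerated. */
u32_t chksum_loop(u16_t *data, int words)
{
    u32_t acc = 0;
    int i;

    for (i = 0; i < words; i++) {
        acc += data[i];
    }
    return acc;
}

/* Boundary cases and result formatting stay in software. */
u16_t standard_chksum(void *dataptr, int len)
{
    int words = len / 2;
    u32_t acc = chksum_loop((u16_t *)dataptr, words);

    /* Handle odd buffer lengths */
    if (len % 2 == 1) {
        u8_t last = *((u8_t *)dataptr + 2 * words);
        acc += htons((u16_t)((last & 0xff) << 8));
    }

    /* Modify result for IP stack needs */
    acc = (acc >> 16) + (acc & 0xffffUL);
    if ((acc & 0xffff0000) != 0) {
        acc = (acc >> 16) + (acc & 0xffffUL);
    }

    return (u16_t)acc;
}
```

The calling code still calls standard_chksum() with the same arguments, so the refactoring does not affect the rest of the program.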

Poor Acceleration Candidates

Accelerating some code can have negative performance impacts, or can unacceptably increase resource utilization, or both. Use the following guidelines to identify functions not to accelerate:
Code that contains many data or control dependencies must perform many sequential operations, and is a poor candidate for acceleration. A large number of dependencies makes it difficult for the C2H Compiler to fully optimize loops. Processors are designed to perform such operations efficiently.
If the code contains C syntax not supported by the C2H Compiler, it cannot be accelerated. Examples are floating point operations and recursive functions. Refer to Chapter 7, ANSI C Compliance and Restrictions.
Code that calls system and runtime library functions is a poor candidate for acceleration. For example, there is little point in accelerating printf() or malloc(). The underlying code contains a complex set of sequential operations and does not contain performance-critical loops.
Code that makes extensive use of global or external variables is a poor candidate for acceleration. Each time the C2H accelerator uses a global or external variable, it must access the Nios II processor’s data memory, which is likely to cause a bottleneck.
There are exceptions to these guidelines. For example:
Experienced C coders often "unroll" iterative algorithms, representing them as a sequential set of operations to work better with an optimizing C compiler. If you can refactor the code and "roll up" the loop, you might be able to create an efficient hardware accelerator.
A critical inner loop might have a complex set of sequential operations which, if accelerated in hardware, consumes a lot of logic resources. This presents a trade-off: If the processor spends an unacceptable amount of time in this loop, it might be worth the hardware cost to accelerate the whole loop.
Some runtime library functions are iterative in nature. Examples include common data movement functions and buffer set functions, such as memcpy() or memset(). If your code calls one of these functions, you might consider writing a simple, custom implementation of the function, which you can then accelerate.
If your code uses global or external variables, it might be easy for you to refactor it to be suitable for acceleration. Refactor your code to copy the global or external variables to local storage, perform the calculation with the local variables, and then copy results back to global or external storage. The C2H Compiler implements local variables as fast, pipelined registers inside of the accelerator.
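The refactoring of global variables described in the last item can be sketched as follows. All names (g_coeff, g_last_result, weighted_sum(), update_result()) are hypothetical and chosen only for illustration; the calculation itself is an arbitrary example.

```c
#include <stdint.h>

#define N_TAPS 4

/* Hypothetical global state owned by the rest of the application. */
int32_t g_coeff[N_TAPS] = { 1, 2, 3, 4 };
int32_t g_last_result;

/* Function to accelerate: it uses only parameters and local variables,
   so the loop never reads or writes a global variable. */
int32_t weighted_sum(const int32_t *samples, int n,
                     int32_t c0, int32_t c1, int32_t c2, int32_t c3)
{
    int32_t acc = 0;
    int i;

    for (i = 0; i + N_TAPS <= n; i += N_TAPS) {
        acc += c0 * samples[i]     + c1 * samples[i + 1]
             + c2 * samples[i + 2] + c3 * samples[i + 3];
    }
    return acc;
}

/* Software-side caller: copy globals to locals, run the calculation,
   then copy the result back to global storage. */
void update_result(const int32_t *samples, int n)
{
    int32_t c0 = g_coeff[0];
    int32_t c1 = g_coeff[1];
    int32_t c2 = g_coeff[2];
    int32_t c3 = g_coeff[3];

    g_last_result = weighted_sum(samples, n, c0, c1, c2, c3);
}
```

In this arrangement only weighted_sum() would be accelerated, while update_result() remains on the processor and handles the global variables once per call.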

Understanding Code to Find Opportunities for Acceleration

The best way to obtain optimal results with the C2H Compiler is to understand your code, and know where the critical loops are. If you wrote the program from scratch, you probably understand where the critical sections of code are. If you are starting with an existing code base that you want to accelerate, the C2H Compiler can benefit you to the extent that you analyze the code and understand it. In either case, the Nios II IDE profiling features can help you determine where the processor spends most of its time.
Examine the structure of the code for processor-specific or compiler-specific optimizations. These sections of code might result in poor performance with the C2H Compiler, and could benefit from refactoring for the C2H Compiler.
It can be difficult to identify the critical loop just by inspecting code, because programs often spend the majority of time iterating on just a few lines of code. The only way to know exactly where the processor spends the most time is to profile the application, and inspect the bottleneck functions.
Refer to AN 391: Profiling Nios II Systems for further information.
Next Steps
Now that you understand the underlying concepts of the Nios II C-to-Hardware Acceleration Compiler, you are ready for hands-on experience accelerating designs. Chapter 2, Getting Started Tutorial, describes the C2H Compiler design flow, and gives step-by-step instructions to accelerate your first design. Altera also provides tutorials and application notes to deepen your understanding of the C2H Compiler.
Refer to the Nios II literature page for further C2H Compiler documentation: www.altera.com/literature/lit-nio2.jsp.

2. Getting Started Tutorial

Introduction

This chapter describes the design flow for the Nios® II C-to-Hardware Acceleration (C2H) Compiler. This chapter provides a design example and gives you a step-by-step tutorial to guide you through the process of creating your first hardware accelerator.
The example software design performs multiple iterations of a data-copy function. By accelerating the data-copy function, you achieve more than a 10-fold improvement in the execution performance. The resulting hardware accelerator resembles a hardware block with direct memory access (DMA) to copy data without processor intervention.
This tutorial assumes that you are familiar with the Nios II processor and the Nios II design flow.
For introductory information on designing with the Nios II processor, refer to the Nios II Hardware Development Tutorial available on the Altera Nios II literature page at http://www.altera.com/literature/lit-nio2.jsp, and to the Nios II Software Development Tutorial available in the Nios II integrated development environment (IDE) help system.

C2H Compiler Design Flow

This section discusses the design flow to create a hardware accelerator with the C2H Compiler.

Starting Point for the C2H Compiler Design Flow

The design flow for the C2H Compiler starts with one or more C files that compile successfully targeting the Nios II processor. Before you accelerate a function with the C2H Compiler, you must:
Identify the functions that require acceleration.
Debug the functions first targeting the Nios II processor. After accelerating a function, you can no longer debug individual C statements within the function.
You might have existing C code that you need to accelerate to improve performance. Alternatively, you might develop and debug a function in C with the explicit purpose of converting it to hardware. In either case, you achieve the best results if the C code is structured for the C2H Compiler. To start with, you can accelerate your code as-is, and determine if the results meet the design requirements.

Typical Design Flow

A typical design flow using the C2H Compiler to accelerate a function involves the following steps:
1. Develop and debug your application or algorithm in C targeting a Nios II processor system.
2. Profile the code to identify the areas that would benefit from hardware acceleration.
3. Isolate the code you want to accelerate into an individual C function.
4. Specify the function you want to accelerate in the Nios II IDE.
5. Rebuild the project in the Nios II IDE.
6. Profile the results in hardware, or observe estimates from the C2H report in the Nios II IDE.
7. If the results do not meet the design requirements, modify the C source code and system architecture (for example, the memory topology).
8. Return to Step 5, and iterate.
The typical C2H Compiler design flow is an iterative process of accelerating a function, comparing the performance to design requirements, and modifying C code to improve results. If you start with C code that is not optimized for the C2H Compiler, the first iteration of acceleration might not dramatically improve performance. Further iterations, modifying the C code for optimal hardware structure, often improve the final results significantly over the first pass results.
This tutorial does not describe techniques for optimizing hardware accelerator performance. For further information on optimizing C2H Compiler results, refer to the Accelerating Nios II Systems with the C2H Compiler Tutorial.

Software Requirements

The C2H Compiler in evaluation mode is installed as part of the Altera® Quartus® II Complete Design Suite. You can download the Quartus II Complete Design Suite free from the Altera website. Visit www.altera.com and click Download.
During the design process with the C2H Compiler, you use the following tools:
Nios II Integrated Development Environment (IDE) – You control acceleration options for individual functions in the Nios II IDE. The results of accelerating functions are reported in the Nios II IDE. The output is an Executable and Linking Format (.elf) file targeting a Nios II CPU. The C2H Compiler also invokes SOPC Builder and optionally the Quartus II software in the background to regenerate the Nios II system and update the SRAM object file (.sof).
SOPC Builder – SOPC Builder manages the generation of C2H logic and Avalon-MM system interconnect fabric to connect hardware accelerators to the processor. During the software build process, the Nios II IDE can invoke SOPC Builder in the background to update the hardware accelerators when necessary and integrate them into the Nios II hardware design. The output is a set of hardware description language (HDL) files (.v or .vhd) and an SOPC Builder system file (.sopcinfo) defining your system: Nios II processor cores, peripherals, accelerators, on-chip memory, and interfaces to off-chip memory.
Quartus II software – The Quartus II software compiles and synthesizes HDL produced by the C2H Compiler and SOPC Builder tools, along with any other custom logic in your Quartus II project. During the software build process, the Nios II IDE can invoke the Quartus II software in the background to recompile the Quartus II project. The output is a .sof file that includes the updated Nios II system with accelerators.

OpenCore Plus Evaluation

Hardware accelerator blocks generated by the C2H Compiler support OpenCore®Plus evaluation. OpenCore Plus evaluation allows you to use the C2H Compiler and evaluate the performance of hardware accelerators in real systems before purchasing a license for the tool. With Altera's free OpenCore Plus evaluation feature, you can:
Verify the functionality of your design, as well as evaluate its size and speed easily
Generate time-limited device programming files for designs that include megafunctions
Program an FPGA and verify your design in hardware
Simulate the behavior of an accelerator in your system

OpenCore Plus hardware evaluation supports the tethered mode of operation for C2H. In tethered mode the accelerator runs indefinitely, as long as the target board remains connected to the host computer by an Altera download cable.
You need to purchase a license for the Nios II C-to-Hardware Acceleration Compiler only when you are completely satisfied with the functionality and performance of your accelerated Nios II system, and want to take your design to production.
For more information on OpenCore Plus hardware evaluation, see AN 320: OpenCore Plus Evaluation of Megafunctions.

Tutorial

This section guides you through the steps to accelerate a function using the C2H Compiler. You create a new software project in the Nios II IDE using the provided example design files, accelerate a function, and observe the performance improvement.
This tutorial guides you through the steps to implement the example design. These steps start with a C source file and end with a running application that includes an accelerated function. The steps you perform are described in the following sections:
1. “Set up the Hardware for the Project”
2. “Create the Software Project”
3. “Run the Project as Software Only”
4. “Create and Configure a Hardware Accelerator”
5. “Rebuild the Project”
6. “Observe Results in the Report File”
7. “Observe the Accelerator in SOPC Builder”

Tutorial Design

The hardware design for this tutorial is based on the standard hardware example design provided with the Nios II EDS. The software design is a C file named dma_c2h_tutorial.c, which is available for download from the Altera website. You can run the tutorial design on any Nios development board available from Altera.
You can download dma_c2h_tutorial.c from the Altera Nios II literature page at http://www.altera.com/literature/lit-nio2.jsp, where the file is located next to this document (Nios II C2H Compiler User Guide).
The file dma_c2h_tutorial.c includes two functions:
do_dma() – This is the function you accelerate. It performs a block memory copy. do_dma() takes a source address pointer, a destination address pointer, and an integer number of bytes to copy. When implemented in hardware, do_dma() resembles DMA copy logic. The prototype for do_dma() is as follows:
int do_dma( int * __restrict__ dest_ptr, int * __restrict__ source_ptr, int length )
The __restrict__ qualifier informs the compiler that the pointers dest_ptr and source_ptr point to mutually exclusive address ranges. For further information about the __restrict__ qualifier, see “Pointer Aliasing” on page 3–32 of Chapter 3, C-to-Hardware Mapping Reference.
main() – main() calls do_dma() and measures the amount of time taken, so that you can compare the software implementation with the hardware accelerator.
main() performs the following actions:
1. Allocates two 1 MB buffers in main memory
2. Fills the source buffer with incrementing values
3. Fills the destination buffer with zeros
4. Calls the do_dma() function 100 times
5. Checks the copied data to ensure there were no errors
6. Frees the two allocated buffers
To measure the time it takes for the copy operations to complete, there are timer routines around the loop that calls the do_dma() function. After the application runs, the number of milliseconds that were spent performing the copy operations is displayed to the Console view in the Nios II IDE.
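The behavior described for dma_c2h_tutorial.c can be approximated on a host computer with the sketch below. It is not the tutorial file: the body of do_dma(), its return value, the run_copy_test() helper, and the use of the standard C clock() function for timing are all assumptions made for illustration. The __restrict__ spelling is a GCC extension, so the sketch assumes a GCC-compatible compiler.

```c
#include <stdlib.h>
#include <time.h>

/* Block copy with the tutorial's prototype. The body and the return
   convention (0 on completion) are assumptions. */
int do_dma(int * __restrict__ dest_ptr, int * __restrict__ source_ptr,
           int length)
{
    int words = length / (int)sizeof(int);   /* length is in bytes */
    int i;

    for (i = 0; i < words; i++) {
        dest_ptr[i] = source_ptr[i];
    }
    return 0;
}

/* Hypothetical driver following the actions listed for main().
   Returns elapsed milliseconds, or -1 on allocation failure or if the
   copied data does not match the source. */
long run_copy_test(int bytes, int iterations)
{
    int words = bytes / (int)sizeof(int);
    int *src = malloc((size_t)bytes);        /* 1. allocate buffers   */
    int *dst = malloc((size_t)bytes);
    long ms = -1;
    int ok = 1;
    int i;
    clock_t start, end;

    if (src != NULL && dst != NULL) {
        for (i = 0; i < words; i++) {
            src[i] = i;                      /* 2. incrementing values */
            dst[i] = 0;                      /* 3. zero destination    */
        }
        start = clock();
        for (i = 0; i < iterations; i++) {   /* 4. repeated copies     */
            do_dma(dst, src, bytes);
        }
        end = clock();
        for (i = 0; i < words; i++) {        /* 5. verify the data     */
            if (dst[i] != src[i]) {
                ok = 0;
            }
        }
        if (ok) {
            ms = (long)((end - start) * 1000 / CLOCKS_PER_SEC);
        }
    }
    free(src);                               /* 6. free the buffers    */
    free(dst);
    return ms;
}
```

A call such as run_copy_test(1048576, 100) corresponds to the buffer size and iteration count described above; a real program would call it from main() and print the returned time.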

Set up the Hardware for the Project

To set up the hardware for the tutorial, perform the following steps:
1. Connect your Nios development board to power, and connect the board to your host computer with an Altera download cable.
2. Set up the hardware project directory
a. Using a file management tool on the host computer, locate the standard hardware example design for your Nios development board. For example, on a Windows PC, use Windows Explorer to find the Verilog HDL design files for the Nios development board, Cyclone® II Edition at <Nios II EDS install path>/examples/verilog/niosII_cycloneII_2c35/standard.
b. Copy the standard directory and name the copied directory c2h_tutorial_hw. This new directory serves as the hardware design for the tutorial.
3. Start the Quartus II software.
4. Open the Quartus II project standard.qpf located in the c2h_tutorial_hw directory.
The Quartus II software might give a warning "Do you want to overwrite the database ... created by Quartus II Version <version>... The database format is compatible..." if the project was created with an earlier version of the software. If so, click Yes to update the database.
5. Configure the FPGA on the Nios development board.
a. On the Tools menu click Programmer. The Programmer appears, with the SRAM object file standard.sof automatically ready to download to the FPGA.
b. Turn on the Program/Configure check box for standard.sof.
c. Click Start. The programmer downloads the configuration data to the FPGA.
If Start is not enabled, click Hardware Setup to configure your JTAG download cable.

Create the Software Project

To set up the software project for the tutorial, perform the following steps.
1. Start the Nios II IDE.
2. If the Workspace Launcher dialog box appears, click OK to accept the default workspace.
3. If the Welcome to the Altera Nios II IDE page displays, close it to view the workbench.
4. Create a new C/C++ Application project.
a. On the File menu, point to New and click C/C++ Application. The New Project wizard appears.
b. In the Name box, type c2h_tutorial_sw.
c. In the Select Project Template list, select Blank Project.
d. Use the Select Target Hardware settings to browse to and select the SOPC Builder system (.ptf) file in your c2h_tutorial_hw directory. After you specify the SOPC Builder system, the IDE automatically sets the CPU setting to cpu, which is the name of the only Nios II processor core available in this SOPC Builder system.
e. Click Finish. The IDE generates a new project c2h_tutorial_sw and a new system library project c2h_tutorial_sw_syslib.
5. Download the software file dma_c2h_tutorial.c from the Nios II literature page and save it to a known location on your host computer. The file is located next to this document (Nios II C2H Compiler User Guide) on the Altera Nios II literature page at http://www.altera.com/literature/lit-nio2.jsp.
6. Import the C file dma_c2h_tutorial.c into the c2h_tutorial_sw project. The easiest way to do this is to use an external file management tool, such as Windows Explorer, and drag the file onto the c2h_tutorial_sw project folder in the C/C++ Projects view of the Nios II IDE.

Run the Project as Software Only

In this section, you build and run the project as a software-only implementation, and observe the time required to run the program. To run the program, perform the following steps:
1. In the C/C++ Projects view, right-click the c2h_tutorial_sw project, point to Run As and click Nios II Hardware. The Nios II IDE takes a few minutes to build and run the program.
2. Observe the execution time in the Console view. Example 2–1 shows results of approximately 86000 milliseconds. The results you see might be different, depending on the memory characteristics of the target board and the clock speed of the example design.
Example 2–1. Execution Results as Software-Only Implementation
This simple program copies 1048576 bytes of data from a source buffer to a destination buffer. The program performs 100 iterations of the copy operation, and calculates the time spent.
Copy beginning
SUCCESS: Source and destination data match. Copy verified.
Total time: 86520 ms

Create and Configure a Hardware Accelerator

In this section, you create an accelerator for the do_dma() function. To create the hardware accelerator, perform the following steps:
1. Open the dma_c2h_tutorial.c source file in the Nios II IDE editor, if it is not already open.
2. In the source file, double-click the name of the do_dma() function to select it.
3. Right-click do_dma and click Accelerate with the Nios II C2H Compiler. The C2H view appears in the bottom pane of Nios II IDE.
In this example, for simplicity, the do_dma() function exists in the same file as the rest of the application code. However, a good practice is to isolate functions for acceleration into a separate C file. The project makefile cannot determine specifically what part of a file has changed. As a result, if an accelerated function coexists in the same file with other unaccelerated code, the C2H Compiler is forced to rebuild the accelerator, even if you edit unrelated code.
4. Set the build options for the new accelerator, as shown in Figure 2–1.
a. Click the + icon to expand c2h_tutorial_sw in the C2H view.