Features
Design Abstraction and the Rise of C for FPGAs
What to Expect From the C2H Compiler
C2H Support in Nios II Tool Flows
Understanding Code to Find Opportunities for Acceleration
Next Steps
Set up the Hardware for the Project
Create the Software Project
Run the Project as Software Only
Create and Configure a Hardware Accelerator
Rebuild the Project
Observe Results in the Report File
Observe the Accelerator in SOPC Builder
Run the Project with the Accelerator
Remove the Accelerator
Next Steps
Cycles Per Loop Iteration (CPLI)
Scheduling Information
Further Reading
Chapter 5. Accelerating Code Using the Nios II Software Build Tools Command Line
Creating an Accelerator from the Command Line
Language
Miscellaneous Unsupported Features
Other Restrictions
Revision History
How to Contact Altera
Nios II C2H Compiler User Guide
1. Introduction to the C2H Compiler
The Nios® II C-to-Hardware Acceleration (C2H) Compiler is a tool that
allows you to create custom hardware accelerators directly from ANSI C
source code. A hardware accelerator is a block of logic that implements a
C function in hardware, which often improves the execution performance
by an order of magnitude. Using the C2H Compiler, you can develop and
debug an algorithm in C targeting an Altera® Nios II processor, and then
quickly convert the C code to a hardware accelerator implemented in a
field programmable gate array (FPGA).
The C2H Compiler improves the performance of Nios II programs by
implementing specific C functions as hardware accelerators. The
C2H Compiler is not designed to create arbitrary hardware systems from
C code. Rather, the C2H Compiler is a tool for generating a hardware
accelerator module, functionally identical to the original C function, that
offloads and enhances the performance of the Nios II processor.
Altera Corporation 9.1
November 2009
User Guide Overview
This user guide comprises the following chapters:
■Chapter 1, Introduction to the C2H Compiler provides a detailed
background on the C2H Compiler and the concepts required to use
it.
■Chapter 2, Getting Started Tutorial provides hands-on instructions
that teach you the first steps to begin using the C2H Compiler.
■Chapter 3, C-to-Hardware Mapping Reference provides reference on
how the C2H Compiler translates C constructs to hardware
structures.
■Chapter 4, Understanding the C2H View helps you use the C2H
view to get performance information and to control the compilation
of accelerators.
■Chapter 5, Accelerating Code Using the Nios II Software Build Tools
Command Line explains how to use the C2H Compiler with the
Nios® II software build tools.
■Chapter 6, Pragma Reference summarizes usage of all C2H #pragma
directives.
■Chapter 7, ANSI C Compliance and Restrictions documents all
sections of the ANSI C specification that the C2H Compiler does not
support.
Target Audience
This user guide assumes you have at least a basic understanding of
hardware design for field programmable gate arrays (FPGAs). It also
assumes you are fluent in the C language and you have experience with
software design in C for microprocessors.
The C2H Compiler operates in conjunction with the following Altera
tools:
■Quartus II software for creating FPGA designs
■SOPC Builder system integration tool for creating Nios II processor
hardware systems
■C programming environments for the Nios II processor:
●Nios II integrated development environment (IDE)
●Nios II software build tools
To benefit from this user guide, you do not need to be an expert in these
tools, and you do not need an understanding of any particular Altera
FPGA family. However, at least a basic understanding of each tool is
required to use the C2H Compiler practically.
Introduction
This chapter introduces the Nios II C2H Compiler. The sections in this
chapter discuss the features, background, and principles of the
C2H Compiler, and describe the most appropriate types of C code for
acceleration. After reading this chapter, you will understand all the
concepts necessary to begin using the C2H Compiler.
Features
The C2H Compiler is founded on the following premises:
■ANSI C syntax is sufficient to describe computationally intensive or
memory access-intensive tasks.
■A C-to-hardware tool must not disrupt existing software and
hardware development flows.
Based on these premises, the C2H Compiler's design methodology
provides the following features:
■ANSI C compliance – The C2H Compiler operates on plain ANSI C
code, and supports most C constructs, including pointers, arrays,
structures, global and local variables, loops, and subfunction calls.
The C2H Compiler does not require special syntax or library
functions to specify the structure of the hardware. Unsupported
ANSI C constructs are documented.
■Straightforward C-to-hardware mapping – The C2H Compiler maps
each element of C syntax to a defined hardware structure, giving you
control over the structure of your hardware accelerator.
■Integration with C language development environments for the
Nios II processor, including the Nios II integrated development
environment (IDE), and the Nios II software build tools. You control
the C2H Compiler with the Nios II C development tools. You do not
need to learn a new environment to use the C2H Compiler.
■Based on SOPC Builder and Avalon system interconnect fabric – The
C2H Compiler uses SOPC Builder as the infrastructure to connect
hardware accelerators into Nios II systems. A C2H accelerator
becomes a component within an existing Nios II system. SOPC
Builder automatically generates system interconnect fabric to
connect the accelerator to the system, saving you the time of
manually integrating the hardware accelerator.
■Reporting of generated results – The C2H Compiler produces a
detailed report of hardware structure, resource usage, and
throughput.
Hardware accelerators generated by the C2H Compiler have the
following characteristics:
■Parallel scheduling – The C2H Compiler recognizes events that can
occur in parallel. Independent statements are performed
simultaneously in hardware.
■Direct memory access – Accelerators access the same memories that
the Nios II processor does during execution.
■Loop pipelining – The C2H Compiler pipelines the logic
implemented for loops, based on memory access latency and the
amount of code that operates in parallel.
■Memory access pipelining – The C2H Compiler pipelines memory
accesses to reduce the effects of memory latency.
Design Abstraction and the Rise of C for FPGAs
There is much interest in “C-to-gates” tools that promise a practical
method to create hardware logic directly from C code. However, early
attempts have had limited success gaining acceptance in the design
community. This section discusses the historical background of the
C2H Compiler, and looks at the questions “why is this methodology a
good idea?” and “why now?”
C compilers and FPGA design tools have evolved along separate paths,
but both are founded on the same premise: Higher levels of design
abstraction enable engineers to create designs of greater size and
complexity. Simultaneous with this evolution, Moore's law has delivered
chips of increasing density and complexity, such as FPGAs capable of
implementing entire systems on a chip. As a result, the tools available to
FPGA and software designers have undergone continual transformation
of design-entry methods and behind-the-scenes optimization techniques.
This transformation has enabled designers to create ever-bigger designs
to fill ever-growing chip capacity.
Recent years have seen the broad acceptance of FPGA-based
microprocessor cores, such as the Nios II processor, and system
integration tools, such as SOPC Builder. These tools made it possible, for
the first time, to implement C code easily in an FPGA-based system.
Optimizing and evolving these tools is a natural next step for C-based
design on FPGAs. This background sets the stage for practical advances
in C-to-hardware technologies based on an established design
methodology.
FPGA-based processors and system integration tools offer new ways to
improve the performance of embedded systems. Traditional methods to
increase performance of processor systems include:
■Increasing clock speed
■Upgrading to a processor with higher Dhrystone MIPS-per-
megahertz performance
■Coding critical sections of software in assembly language
FPGA-based processor systems enable additional optimization
techniques capable of achieving much higher performance gains. These
techniques include:
■The ability to rapidly alter the FPGA design, allowing you to
prototype a variety of architectures
■The ability to divide and conquer processing tasks by instantiating
multiple processor cores
■The ability to augment a processor with custom hardware that off-
loads processor-intensive operations into the FPGA fabric
■The ability to adjust memory architecture for memory-intensive
operations, such as using high-speed, point-to-point connections to
fast memory buffers
The application of these techniques relies on real-world tools to
implement them. Consequently, the acceptance of these techniques has
grown as system integration tools, such as Altera's SOPC Builder, have
matured and gained acceptance. It is a fortunate coincidence that these
techniques also directly benefit C-to-gates methodologies. Flexibility of
hardware architecture and ease of implementation are at the heart of the
appeal of C-to-gates tools.
The Nios II C-to-Hardware Acceleration (C2H) Compiler represents
Altera's next step in the evolution of embedded systems design. The
C2H Compiler uses the infrastructure provided by SOPC Builder and the
Nios II processor, and adds a higher level of abstraction: converting C
functions directly to hardware.
What to Expect From the C2H Compiler
The C2H Compiler is not designed to build all types of FPGA systems. It
is designed specifically to augment the performance of programs that run
on the Nios II processor; it does not replace the processor. Two notable
implications are:
■The C2H Compiler assumes that your C code runs successfully on a
Nios II processor system.
■The result of using the C2H Compiler is a program that runs on a
Nios II processor system.
The C2H Compiler works best on C code that adheres to certain
structural rules. It works well for many types of programs, but not all.
Through education and habit, programmers structure C programs with
an existing compiler in mind. Experienced designers learn the particular
structures that produce optimal compiled results. The C2H Compiler is
also a C compiler. It takes ANSI C programs that execute normally on a
processor. However, the program structure for producing optimal
hardware results with the C2H Compiler often differs from code
structured for execution on a processor. You achieve the best results if you
have a reasonable understanding of how the C2H Compiler translates C
structures to hardware. Refer to Chapter 3, C-to-Hardware
Mapping Reference for details.
The C2H Compiler is not a replacement for traditional HDL-based
hardware design. Tasks such as connecting modules together and
interfacing to bus protocols are not easily inferred from ANSI C code. In
the hands of an experienced user, the C2H Compiler allows considerable
control over circuit latency and parallelism. However, it does not provide
the ability to define user logic with complex timing requirements. For
example, the C2H Compiler does not allow you to create an arbitrary
state machine that guarantees a particular operation on a specific clock
cycle.
Note: The Nios II processor is little-endian. For Nios II compatibility,
C2H accelerators expect to exchange little-endian data with the
processor. If your accelerator must handle big-endian data, you
can swap the byte order in the accelerated C code. Ensure that
the data is in little-endian form when your accelerated function
transfers it to any unaccelerated function.
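The byte swap described in the note above can be sketched as follows. This is a minimal illustration, not code from the C2H Compiler: the function names swap32() and sum_big_endian_words() are hypothetical, and the sketch assumes the buffer holds 32-bit big-endian words that were loaded as-is on a little-endian processor.

```c
#include <stdint.h>

/* Reverse the byte order of one 32-bit word using only shifts and masks. */
static uint32_t swap32(uint32_t w)
{
    return ((w & 0x000000FFu) << 24) |
           ((w & 0x0000FF00u) << 8)  |
           ((w & 0x00FF0000u) >> 8)  |
           ((w & 0xFF000000u) >> 24);
}

/* Hypothetical function to accelerate: sums a buffer of big-endian words.
 * Each word is swapped inside the function, so the value returned to the
 * unaccelerated caller is in the processor's little-endian form. */
uint32_t sum_big_endian_words(const uint32_t *buf, int count)
{
    uint32_t acc = 0;
    int i;

    for (i = 0; i < count; i++) {
        acc += swap32(buf[i]);
    }
    return acc;
}
```

Because the swap uses only shifts, masks, and OR operations, it stays within the plain integer operators that this chapter describes as mapping directly to hardware.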
C2H Support in Nios II Tool Flows
The Nios II IDE is the preferred tool flow for developing Nios II C2H
programs. The Nios II IDE allows you to carry out the following
important tasks:
■Debug your function prior to accelerating
■Generate the accelerator and incorporate it into your hardware
■Test and profile your software and hardware with the C2H
accelerator
Altera recommends creating new C2H systems with the Nios II IDE.
For information about using the Nios II IDE, refer to the Using the Nios II
Integrated Development Environment appendix to the Nios II Software
Developer's Handbook, or to the Nios II IDE help system. For information
about Nios II tool flows, refer to “Development Flows for Creating Nios
II Programs” in the Overview chapter of the Nios II Software Developer's
Handbook.
The Nios II Software Build Tools also provide command-line support for
pre-existing command-line C2H projects.
For information about using C2H on the command line, refer to
Chapter 5, Accelerating Code Using the Nios II Software Build Tools
Command Line.
Note: The Nios II Software Build Tools for Eclipse do not support the
C2H Compiler.
C2H Compiler Concepts
This section describes fundamental concepts underpinning the
C2H Compiler. These concepts help you better understand how the
C2H Compiler works and how you can produce optimal results.
Simplicity and Ease of Use
The C2H Compiler minimizes interruptions to existing design flows. The
flow to generate a hardware accelerator and link software for it uses the
familiar Nios II and SOPC Builder design tools. When you create a Nios II
software project, you specify which C function (or functions) compiles as
a hardware accelerator rather than instructions on a processor. The
C2H Compiler calls other tools in the background to handle the hardware
and software integration tasks. Specifically, the C2H Compiler
automatically performs the following tasks in the background:
1.Calls SOPC Builder to specify how the accelerator connects to the
system, and then generates the system hardware.
2.Calls the Quartus® II software to recompile the hardware design
and generate an SRAM object file (.sof).
Rapid Design Iteration to Find Optimal Partitioning of
Hardware and Software
The C2H Compiler allows you to move the dividing line between
hardware and software easily in C code, without significant additional
design effort. As a result, you have the freedom to design iteratively, and
explore multiple architectures. By contrast, writing a hardware
accelerator by hand in a hardware description language (HDL) would
require a significant amount of time to create the logic design and
integrate it into the system. Changing the functional or performance
requirements of hand-written HDL blocks can significantly impact
design time.
With the C2H Compiler, you can accelerate as many functions as
necessary to achieve the desired performance. You can balance the
trade-off between performance and resource utilization with simple edits
to the C source.
With these tools available to you, the process of achieving desired system
performance undergoes a profound change: The balance of design time
shifts away from creating, interfacing, and debugging hardware in favor
of perfecting the algorithm implementation and finding the optimal
system architecture.
Accelerate Performance-Critical Sections of Code
The C2H Compiler converts only sections of code that you specify. A
typical program contains a mix of performance-critical code and other
code. Performance-critical sections are often iterative and simple, but
consume the majority of a program's execution time on a processor. They
might occupy the processor by either computing a value, moving data, or
both. The best use of hardware resources is to accelerate only the
performance-critical functions of a program, rather than converting an
entire program to hardware.
The C2H Compiler Operates at the Function Level
Code you want to accelerate must be expressed as an individual C
function. The C2H Compiler converts all code within and below the
chosen function to a hardware accelerator block. If the function you are
accelerating calls a subfunction, the C2H Compiler also converts the
subfunction to a hardware accelerator. Therefore, you must be careful that
subfunctions are also good candidates for C2H acceleration.
If the code you want to accelerate is not isolated in a separate function, a
good practice is to partition the function to separate the critical section
into its own function. The resulting hardware accelerator then replaces
only processor-intensive tasks, rather than setup or control tasks which
the processor can implement efficiently.
System Architecture
Figure 1–1 shows the architecture of a simple Nios II processor system
that includes one hardware accelerator.
Figure 1–1. Example System Topology with Single Hardware Accelerator
[Figure: block diagram. Labeled blocks: Nios II Processor (Instruction and Data master ports), Hardware Accelerator (Control port and Data master ports), Instruction Memory, two Data Memory blocks, Peripherals, Arbitrators, and a MUX, joined by the Avalon Switch Fabric. Legend: M = Avalon Master Port, S = Avalon Slave Port; separate line styles mark the Write Data & Control Path and the Read Data path.]
SOPC Builder automatically integrates the accelerator logic into the
system as an SOPC Builder component. If there is more than one
accelerator in the system, multiple accelerators appear in SOPC Builder.
Accelerators are separate from the Nios II processor but can access the
same memory devices that the Nios II processor can.
The accelerator's connections are managed by the C2H Compiler. You can
manually customize the connections using pragma directives in the
accelerated C code. Chapter 6, Pragma Reference, describes
C2H Compiler pragma usage. You cannot edit the accelerator's
connections in the SOPC Builder GUI.
Generation of a Hardware Accelerator
The C2H compilation flow shares commonalities with a conventional C
compiler, but the scheduling of statements, optimization, and object
generation is different. When generating a hardware accelerator, the
C2H Compiler does the following:
1.Runs the GNU GCC preprocessor to evaluate macros, includes, and
other preprocessing directives.
2.Parses code.
3.Creates a graph of data dependencies.
4.Performs some optimizations.
5.Determines the best sequence in which to perform each operation.
6.Generates an object file for the hardware accelerator. This object file
is a synthesizable HDL file.
7.Generates a C wrapper function that isolates and hides the details of
how the Nios II processor interacts with the hardware accelerator.
The wrapper function is a C file that replaces the original C function
at software link time.
The generated accelerator logic includes the following:
■One or more state machines that manage the sequence of operations
defined by the C function. On any clock cycle, an arbitrary number
of computations and memory accesses can happen simultaneously,
orchestrated by the state machines.
■One or more Avalon Memory-Mapped (Avalon-MM) master ports,
which fetch and store data as required by the state machines.
■An Avalon-MM slave port and a set of memory-mapped registers
that allow the processor to set up, start, and stop the accelerator.
The software wrapper, executing on the Nios II processor, controls the
accelerator by reading and writing the register interface. From the
perspective of the calling function, the result of calling the software
wrapper is functionally the same as calling the original C function. The
basic operation of the software wrapper is as follows:
1.Sets up parameters for the accelerator, similar to passing variables to
the original, unaccelerated function.
2.Optionally flushes the processor's data cache to avoid cache
coherency problems. Flushing the data cache might be necessary if
the accelerator accesses the same memory that the processor does.
3.Starts the accelerator. Once an accelerator is running, it can return a
value, terminate, or run continuously, depending on the design of
the C source code.
4.Polls registers in the accelerator hardware to determine when the
task completes.
5.If the function returns a result, reads the result value, and returns it
to the calling function.
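The five wrapper steps above can be sketched in C as follows. This is a host-runnable illustration only: the register layout accel_regs_t, the function add_words(), and the software model fake_accel_tick() are all hypothetical stand-ins. The real wrapper and register map are generated by the C2H Compiler and are not described in this chapter.

```c
#include <stdint.h>

/* Hypothetical register block; the real register map is tool-generated. */
typedef struct {
    volatile uint32_t arg_a;    /* first function parameter */
    volatile uint32_t arg_b;    /* second function parameter */
    volatile uint32_t control;  /* write 1 to start */
    volatile uint32_t status;   /* reads 1 when the task is complete */
    volatile uint32_t result;   /* return value */
} accel_regs_t;

/* Stands in for the memory-mapped accelerator so the sketch runs on a host. */
static accel_regs_t fake_accel;

/* Software model of the hardware: adds the two parameters when started. */
static void fake_accel_tick(accel_regs_t *regs)
{
    if (regs->control == 1u) {
        regs->result = regs->arg_a + regs->arg_b;
        regs->control = 0u;
        regs->status = 1u;
    }
}

/* Wrapper with the same signature as the (hypothetical) original function. */
uint32_t add_words(uint32_t a, uint32_t b)
{
    accel_regs_t *regs = &fake_accel;

    regs->arg_a = a;              /* 1. set up parameters */
    regs->arg_b = b;
    /* 2. on real hardware, an optional data cache flush would go here */
    regs->status = 0u;
    regs->control = 1u;           /* 3. start the accelerator */
    while (regs->status == 0u) {  /* 4. poll until the task completes */
        fake_accel_tick(regs);    /* host-only: advance the software model */
    }
    return regs->result;          /* 5. read the result and return it */
}
```

The calling code sees an ordinary C function, which is the point of the wrapper: a call such as add_words(2, 3) looks the same whether the body runs on the processor or in the accelerator.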
One-to-One Mapping From C Syntax to Hardware Structure
The C2H Compiler maps each element of C syntax to an equivalent
hardware structure using straightforward translation rules that directly
instantiate hardware resources based on the input C code. Once familiar
with the C2H Compiler mappings, you can control the generated
hardware structure with simple changes to your C source.
The following are examples of how the C2H Compiler translates C to
hardware:
■Mathematical operators (such as +, -, *, >>) become direct hardware
equivalent circuits (such as add, subtract, multiply and shift circuits).
These circuits might be shared between operations, depending on
the degree of parallelism inherent in the C code.
■Loops (such as for, while, do-while) become state machines that
iterate over the operations inside the loop, until the loop condition is
exhausted.
■Pointer dereferences and array accesses (such as *p, array[i][j])
become Avalon-MM master ports that access the same memory that
the processor does.
■Statements not dependent on the result of a previous operation are
scheduled as early as possible, allowing parallel execution to the
extent possible.
■Subfunctions called within an accelerated function are also
converted to hardware using the same C-to-hardware mapping
rules. The C2H Compiler creates only one hardware instance of the
subfunction, regardless of how many times the subfunction is called
within the top-level function. Isolating accelerated C code into a
subfunction provides a method of creating a shared hardware
resource within an accelerator.
The C2H Compiler performs certain optimizations when it can reduce
logic utilization based on resource sharing.
Refer to Chapter 3, C-to-Hardware Mapping Reference for complete
details of the C2H Compiler mappings.
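As an illustration of the mappings listed above, the following sketch annotates a small function with the hardware structure each construct would correspond to, according to the rules in this section. The function names scale() and scale_and_offset() are hypothetical, and the comments restate the mapping rules rather than actual compiler output.

```c
#include <stdint.h>

/* Subfunction: per the rules above, one hardware instance is created
 * no matter how many times the top-level function calls it. */
static uint32_t scale(uint32_t x, uint32_t gain)
{
    return (x * gain) >> 8;     /* multiply and shift circuits */
}

/* Hypothetical function to accelerate. */
void scale_and_offset(const uint32_t *in, uint32_t *out, int n,
                      uint32_t gain, uint32_t offset)
{
    int i;

    for (i = 0; i < n; i++) {               /* loop: state machine */
        uint32_t a = in[i];                 /* array read: Avalon-MM master */
        uint32_t scaled = scale(a, gain);   /* shared subfunction hardware */
        uint32_t biased = a + offset;       /* adder; does not depend on
                                               scaled, so both can be
                                               scheduled in parallel */
        out[i] = scaled + biased;           /* array write: Avalon-MM master */
    }
}
```

The statements that compute scaled and biased both depend on a but not on each other, which is the kind of independence the scheduling rule exploits.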
Performance Depends on Memory Access Time
Applications that run on a processor are typically compute-bound, which
means the performance bottleneck depends on the rate at which the processor
executes instructions. Memory access time affects the execution time, but
instruction and data caches minimize the time the processor waits for
memory accesses.
With C2H hardware accelerators, the performance bottleneck undergoes
a profound change: Applications typically become memory-bound,
which means the performance bottleneck depends on the memory
latency and bandwidth. When multiple operations do not have data
dependencies that require them to execute sequentially, the
C2H Compiler schedules them in parallel. The resulting accelerator logic
often must access memory to feed data to each parallel operation. If the
hardware does not have fast access to memory, the hardware stalls
waiting for data, reducing the performance and efficiency.
Achieving maximum performance from a hardware accelerator often
involves examining your system's memory topology and data flow, and
making modifications to reduce or eliminate memory bottlenecks. For
example, if your C code randomly accesses a large buffer of data stored in
slow SDRAM, performance suffers due to constant bank switching in
SDRAM. You can alleviate this bottleneck by first copying blocks of data
to an on-chip RAM, and allowing the accelerator to access this fast,
low-latency RAM. Note that you can also accelerate the copy operation,
which creates a direct memory access (DMA) hardware accelerator.
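The copy-then-process approach can be sketched as follows. This is a minimal example under stated assumptions: copy_block() and lookup_sum() are hypothetical names, and the caller is assumed to pass a fast_table pointer that refers to on-chip RAM. How a buffer is placed in on-chip RAM depends on your system and is not shown.

```c
#include <stdint.h>

/* Word-by-word copy. Accelerating a loop like this is what the text
 * above describes as creating a DMA hardware accelerator. */
void copy_block(uint32_t *dst, const uint32_t *src, int words)
{
    int i;

    for (i = 0; i < words; i++) {
        dst[i] = src[i];
    }
}

/* Sums the table entries selected by indices[]. The table is copied once
 * from slow memory into fast_table, so the random reads that follow hit
 * the fast buffer instead of the slow memory. */
uint32_t lookup_sum(const uint32_t *slow_table, int table_words,
                    const uint16_t *indices, int count,
                    uint32_t *fast_table)
{
    uint32_t acc = 0;
    int i;

    copy_block(fast_table, slow_table, table_words);
    for (i = 0; i < count; i++) {
        if (indices[i] < table_words) {     /* skip out-of-range indices */
            acc += fast_table[indices[i]];
        }
    }
    return acc;
}
```

The one-time sequential copy reads the slow memory in order, and only the fast buffer sees the random access pattern.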
C Code Appropriate for Hardware Acceleration
This section describes guidelines for identifying code that is appropriate
for the C2H Compiler.
Ideal Acceleration Candidates
Sections of C code that consume the most CPU time with the least amount
of code are excellent candidates for acceleration. These tend to have the
following characteristics:
■They contain a relatively small and simple loop or set of nested
loops.
■They iterate over a set of data, performing one or more operations on
the data per iteration, and then store the result.
Examples of such iterative tasks include memory copy-and-modify tasks,
checksum calculations, data encryption, decryption, and filtering
operations. In each of these cases, the C code iterates over a set of data
many times, with either one or more memory reads or writes performed
during each iteration.
Example 1–1 demonstrates a routine that performs a checksum
calculation. This code excerpt is from a TCP/IP stack, and it calculates the
checksum over ranges of data in a network protocol stack. Checksum
calculations are typically a time-consuming part of an IP stack, because all
data transmitted and received must be validated, which requires the
processor to loop through all bytes.
Example 1–1. Checksum Calculation
u16_t standard_chksum(void *dataptr, int len)
{
    u32_t acc;
    /* Checksum loop: iterate over all data in buffer */
    for(acc = 0; len > 1; len -= 2)
    {
        acc += *(u16_t *)dataptr;
        dataptr = (void *)((u16_t *)dataptr + 1);
    }
    /* Handle odd buffer lengths */
    if (len == 1)
    {
        acc += htons((u16_t)((*(u8_t *)dataptr)&0xff)<< 8);
    }
    /* Modify result for IP stack needs */
    acc = (acc >> 16) + (acc & 0xffffUL);
    if ((acc & 0xffff0000) != 0)
    {
        acc = (acc >> 16) + (acc & 0xffffUL);
    }
    return (u16_t)acc;
}
Accelerating this function could have a significant impact on execution
time, especially the amount of time spent in the for loop. The remaining
code executes once per call to format the result and check boundary cases.
Accelerating the code outside the loop has little benefit, unless the entire
standard_chksum() function is called from another function that is
also a good acceleration candidate. The most efficient hardware
accelerator for this code would replace only the for loop. To accelerate
the for loop only, you need to refactor the code to isolate the loop in a
separate function.
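One possible refactoring of Example 1–1 is sketched below. It is an illustration, not code from the original IP stack: chksum_loop() and to_net16() are hypothetical names, the typedefs stand in for the stack's own type definitions, and to_net16() is a local stand-in for htons() so that the sketch has no dependency on network headers.

```c
#include <stdint.h>

typedef uint8_t  u8_t;
typedef uint16_t u16_t;
typedef uint32_t u32_t;

/* Stand-in for htons(): swaps bytes only on a little-endian host. */
static u16_t to_net16(u16_t v)
{
    const u16_t probe = 1;

    if (*(const u8_t *)&probe == 1) {
        return (u16_t)((v << 8) | (v >> 8));
    }
    return v;
}

/* Isolated checksum loop: the only part worth accelerating. */
u32_t chksum_loop(const u16_t *data, int len)
{
    u32_t acc = 0;

    for (; len > 1; len -= 2) {
        acc += *data++;
    }
    return acc;
}

/* Remains on the processor: boundary case and result formatting. */
u16_t standard_chksum(void *dataptr, int len)
{
    u32_t acc = chksum_loop((const u16_t *)dataptr, len);

    /* Handle odd buffer lengths: the last byte was not summed by the loop */
    if (len > 0 && (len & 1)) {
        const u8_t *last = (const u8_t *)dataptr + (len - 1);
        acc += to_net16((u16_t)((*last & 0xff) << 8));
    }
    /* Modify result for IP stack needs */
    acc = (acc >> 16) + (acc & 0xffffUL);
    if ((acc & 0xffff0000) != 0) {
        acc = (acc >> 16) + (acc & 0xffffUL);
    }
    return (u16_t)acc;
}
```

With this split, chksum_loop() is the function you would select for acceleration, while standard_chksum() keeps its original signature for existing callers.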
Poor Acceleration Candidates
Accelerating some code can have negative performance impacts, or can
unacceptably increase resource utilization, or both. Use the following
guidelines to identify functions not to accelerate:
■Code that contains many data or control dependencies must perform
many sequential operations, and is a poor candidate for acceleration.
A large number of dependencies makes it difficult for the
C2H Compiler to fully optimize loops. Processors are designed to
perform such operations efficiently.
■If the code contains C syntax not supported by the C2H Compiler, it
cannot be accelerated. Examples are floating point operations and
recursive functions. Refer to Chapter 7, ANSI C Compliance and
Restrictions.
■Code that calls system and runtime library functions is a poor
candidate for acceleration. For example, there is little point in
accelerating printf() or malloc(). The underlying code contains
a complex set of sequential operations and does not contain
performance-critical loops.
■Code that makes extensive use of global or external variables is a
poor candidate for acceleration. Each time the C2H accelerator uses
a global or external variable, it must access the Nios II processor’s
data memory, which is likely to cause a bottleneck.
There are exceptions to these guidelines. For example:
■Experienced C coders often "unroll" iterative algorithms,
representing them as a sequential set of operations to work better
with an optimizing C compiler. If you can refactor the code and "roll
up" the loop, you might be able to create an efficient hardware
accelerator.
■A critical inner loop might have a complex set of sequential
operations which, if accelerated in hardware, consumes a lot of logic
resources. This presents a trade-off: If the processor spends an
unacceptable amount of time in this loop, it might be worth the
hardware cost to accelerate the whole loop.
■Some runtime library functions are iterative in nature. Examples
include common data movement functions and buffer set functions,
such as memcpy() or memset(). If your code calls one of these
functions, you might consider writing a simple, custom
implementation of the function, which you can then accelerate.
■If your code uses global or external variables, it might be easy for you
to refactor it to be suitable for acceleration. Refactor your code to
copy the global or external variables to local storage, perform the
calculation with the local variables, and then copy results back to
global or external storage. The C2H Compiler implements local
variables as fast, pipelined registers inside of the accelerator.
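The global-variable refactoring described in the last bullet can be sketched as follows. This is a minimal illustration, not code from the tutorial design: the names g_gain, g_checksum, scale_with_globals() and scale_with_locals() are hypothetical.

```c
#include <stddef.h>

/* Hypothetical globals, named for illustration only. */
int g_gain = 3;
int g_checksum = 0;

/* Before: each iteration reads g_gain and updates g_checksum directly,
 * so every pass through the loop touches global storage. */
void scale_with_globals(int *data, size_t count)
{
    size_t i;
    for (i = 0; i < count; i++) {
        data[i] *= g_gain;
        g_checksum += data[i];
    }
}

/* After: copy the globals to locals, compute with the locals,
 * then copy the result back to global storage once. */
void scale_with_locals(int *data, size_t count)
{
    int gain = g_gain;          /* copy in */
    int checksum = g_checksum;  /* copy in */
    size_t i;
    for (i = 0; i < count; i++) {
        data[i] *= gain;
        checksum += data[i];
    }
    g_checksum = checksum;      /* copy out */
}
```

Both versions produce the same result in software. The second form touches global storage only once on entry and once on exit, which is the pattern the guideline above recommends.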
Understanding Code to Find Opportunities for Acceleration
The best way to obtain optimal results with the C2H Compiler is to
understand your code, and know where the critical loops are. If you
wrote the program from scratch, you probably understand where the
critical sections of code are. If you are starting with an existing code base
that you want to accelerate, the C2H Compiler can benefit you to the
extent that you analyze the code and understand it. In either case, the
Nios II IDE profiling features can help you determine where the
processor spends most of its time.
Examine the code for processor-specific or compiler-specific
optimizations written into its structure. These sections of code might
perform poorly with the C2H Compiler, and could benefit from refactoring.
It can be difficult to identify the critical loop just by inspecting code,
because programs often spend the majority of time iterating on just a few
lines of code. The only way to know exactly where the processor spends
the most time is to profile the application, and inspect the bottleneck
functions.
Refer to AN 391: Profiling Nios II Systems for further information.
Next Steps
Now that you understand the underlying concepts of the Nios II
C-to-Hardware Acceleration Compiler, you are ready for hands-on
experience accelerating designs. Chapter 2, Getting Started Tutorial,
describes the C2H Compiler design flow, and gives step-by-step
instructions to accelerate your first design. Altera also provides tutorials
and application notes to deepen your understanding of the
C2H Compiler.
Refer to the Nios II literature page for further C2H Compiler documentation.
2. Getting Started Tutorial
Introduction
This chapter describes the design flow for the Nios® II C-to-Hardware
Acceleration (C2H) Compiler. This chapter provides a design example
and gives you a step-by-step tutorial to guide you through the process of
creating your first hardware accelerator.
The example software design performs multiple iterations of a data-copy
function. By accelerating the data-copy function, you achieve more than
a 10-fold improvement in the execution performance. The resulting
hardware accelerator resembles a hardware block with direct memory
access (DMA) to copy data without processor intervention.
This tutorial assumes that you are familiar with the Nios II processor and
the Nios II design flow.
For introductory information on designing with the Nios II processor,
refer to the Nios II Hardware Development Tutorial available on the Altera
Nios II literature page at http://www.altera.com/literature/lit-nio2.jsp,
and to the Nios II Software Development Tutorial available in the Nios II
integrated development environment (IDE) help system.
C2H Compiler Design Flow
This section discusses the design flow to create a hardware accelerator
with the C2H Compiler.
Starting Point for the C2H Compiler Design Flow
The design flow for the C2H Compiler starts with one or more C files that
compile successfully targeting the Nios II processor. Before you accelerate
a function with the C2H Compiler, you must:
■Identify the functions that require acceleration.
■Debug the functions first targeting the Nios II processor. After
accelerating a function, you can no longer debug individual C
statements within the function.
You might have existing C code that you need to accelerate to improve
performance. Alternatively, you might develop and debug a function in
C with the explicit purpose of converting it to hardware. In either case,
you achieve the best results if the C code is structured for the
C2H Compiler. To start with, you can accelerate your code as-is, and
determine if the results meet the design requirements.
Typical Design Flow
A typical design flow using the C2H Compiler to accelerate a function
involves the following steps:
1.Develop and debug your application or algorithm in C targeting a
Nios II processor system.
2.Profile the code to identify the areas that would benefit from
hardware acceleration.
3.Isolate the code you want to accelerate into an individual C
function.
4.Specify the function you want to accelerate in the Nios II IDE.
5.Rebuild the project in the Nios II IDE.
6.Profile the results in hardware, or observe estimates from the C2H
report in the Nios II IDE.
7.If the results do not meet the design requirements, modify the C
source code and system architecture (for example, the memory
topology).
8.Return to Step 5, and iterate.
The typical C2H Compiler design flow is an iterative process of
accelerating a function, comparing the performance to design
requirements, and modifying C code to improve results. If you start with
C code that is not optimized for the C2H Compiler, the first iteration of
acceleration might not dramatically improve performance. Further
iterations, modifying the C code for optimal hardware structure, often
improve the final results significantly over the first pass results.
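As an illustration of Step 3 above, the sketch below shows a hot loop isolated into its own C function so that it can be selected for acceleration on its own. The function names and the sum-of-squares computation are hypothetical and are not part of the tutorial design.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical critical loop, isolated into an individual function.
 * This is the kind of function you would select for acceleration. */
uint32_t sum_of_squares(const uint16_t *samples, size_t count)
{
    uint32_t acc = 0;
    size_t i;
    for (i = 0; i < count; i++) {
        acc += (uint32_t)samples[i] * samples[i];
    }
    return acc;
}

/* The caller keeps argument checking and other sequential work,
 * which stays on the processor. */
uint32_t process_frame(const uint16_t *samples, size_t count)
{
    if (samples == NULL || count == 0) {
        return 0;
    }
    return sum_of_squares(samples, count);
}
```

Keeping the loop in one small function also makes it easier to debug on the processor first and to compare results before and after acceleration.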
This tutorial does not describe techniques for optimizing hardware
accelerator performance. For further information on optimizing
C2H Compiler results, refer to the Accelerating Nios II Systems with the
C2H Compiler Tutorial.
Software Requirements
The C2H Compiler in evaluation mode is installed as part of the Altera®
Quartus® II Complete Design Suite. You can download the Quartus II
Complete Design Suite free from the Altera website. Visit
www.altera.com and click Download.
During the design process with the C2H Compiler, you use the following
tools:
■Nios II Integrated Development Environment (IDE) – You control
acceleration options for individual functions in the Nios II IDE. The
results of accelerating functions are reported in the Nios II IDE. The
output is an Executable and Linking Format file (.elf) targeting a Nios II CPU. The
C2H Compiler also invokes SOPC Builder and optionally the
Quartus II software in the background to regenerate the Nios II
system and update the SRAM object file (.sof).
■SOPC Builder – SOPC Builder manages the generation of C2H logic
and Avalon-MM system interconnect fabric to connect hardware
accelerators to the processor. During the software build process, the
Nios II IDE can invoke SOPC Builder in the background to update
the hardware accelerators when necessary and integrate them into
the Nios II hardware design. The output is a set of hardware
description language (HDL) files (.v or .vhd) and an SOPC Builder
system file (.sopcinfo) defining your system: Nios II processor cores,
peripherals, accelerators, on-chip memory, and interfaces to off-chip
memory.
■Quartus II software – The Quartus II software compiles and
synthesizes HDL produced by the C2H Compiler and SOPC Builder
tools, along with any other custom logic in your Quartus II project.
During the software build process, the Nios II IDE can invoke the
Quartus II software in the background to recompile the Quartus II
project. The output is a .sof file that includes the updated Nios II
system with accelerators.
OpenCore Plus Evaluation
Hardware accelerator blocks generated by the C2H Compiler support
OpenCore® Plus evaluation. OpenCore Plus evaluation allows you to use
the C2H Compiler and evaluate the performance of hardware
accelerators in real systems before purchasing a license for the tool. With
Altera's free OpenCore Plus evaluation feature, you can:
■Verify the functionality of your design, as well as evaluate its size
and speed easily
■Generate time-limited device programming files for designs that
include megafunctions
■Program an FPGA and verify your design in hardware
■Simulate the behavior of an accelerator in your system
OpenCore Plus hardware evaluation supports the tethered mode of
operation for C2H. In tethered mode the accelerator runs indefinitely, as
long as the target board remains connected to the host computer by an
Altera download cable.
You need to purchase a license for the Nios II C-to-Hardware
Acceleration Compiler only when you are completely satisfied with the
functionality and performance of your accelerated Nios II system, and
want to take your design to production.
For more information on OpenCore Plus hardware evaluation, see
AN 320: OpenCore Plus Evaluation of Megafunctions.
Tutorial
This section guides you through the steps to accelerate a function using
the C2H Compiler. You create a new software project in the Nios II IDE
using the provided example design files, accelerate a function, and
observe the performance improvement.
This tutorial guides you through the steps to implement the example
design. These steps start with a C source file and end with a running
application that includes an accelerated function. The steps you perform
are described in the following sections:
1.“Set up the Hardware for the Project” on page 2–5
2.“Create the Software Project” on page 2–6
3.“Run the Project as Software Only” on page 2–7
4.“Create and Configure a Hardware Accelerator” on page 2–8
5.“Rebuild the Project” on page 2–10
6.“Observe Results in the Report File” on page 2–11
7.“Observe the Accelerator in SOPC Builder” on page 2–14
Tutorial Design
The hardware design for this tutorial is based on the standard hardware
example design provided with the Nios II EDS. The software design is a
C file named dma_c2h_tutorial.c, which is available for download from
the Altera website. You can run the tutorial design on any Nios
development board available from Altera.
You can download dma_c2h_tutorial.c from the Nios II literature page.
The file is located next to this document (Nios II C2H Compiler User
Guide) on the Altera Nios II literature page at
http://www.altera.com/literature/lit-nio2.jsp.
The file dma_c2h_tutorial.c includes two functions:
■do_dma() – This is the function you accelerate. It performs a block
memory copy. do_dma() takes a source address pointer, a
destination address pointer, and an integer number of bytes to copy.
When implemented in hardware, do_dma() resembles DMA copy
logic. The prototype for do_dma() is as follows:
int do_dma( int * __restrict__ dest_ptr,
int * __restrict__ source_ptr, int length )
The __restrict__ qualifier informs the compiler that the
pointers dest_ptr and source_ptr point to mutually exclusive
address ranges. For further information about the __restrict__
qualifier, see “Pointer Aliasing” on page 3–32 of Chapter 3, C-to-
Hardware Mapping Reference.
■main() – main() calls do_dma() and measures the amount of time
taken, so that you can compare the software implementation with the
hardware accelerator.
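The following is a minimal sketch of a block-copy function that uses the same parameter list as the do_dma() prototype above. It is not the body from dma_c2h_tutorial.c, which is not reproduced in this chapter. The name copy_words(), the word-by-word loop, and the return value of 0 are assumptions made for illustration.

```c
/* Hypothetical stand-in for do_dma(). The __restrict__ qualifiers
 * state that the destination and source ranges do not overlap.
 * length is a byte count, as described for do_dma(). */
int copy_words(int * __restrict__ dest_ptr,
               int * __restrict__ source_ptr, int length)
{
    int i;
    int words = length / (int)sizeof(int);  /* convert bytes to words */

    for (i = 0; i < words; i++) {
        dest_ptr[i] = source_ptr[i];
    }
    return 0;  /* assumed success code */
}
```

The caller must honor the non-overlap promise. Passing overlapping buffers to a function whose pointers are qualified this way results in undefined behavior.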
main() performs the following actions:
1.Allocates two 1 MB buffers in main memory
2.Fills the source buffer with incrementing values
3.Fills the destination buffer with zeros
4.Calls the do_dma() function 100 times
5.Checks the copied data to ensure there were no errors
6.Frees the two allocated buffers
To measure the time it takes for the copy operations to complete, there are
timer routines around the loop that calls the do_dma() function. After
the application runs, the number of milliseconds that were spent
performing the copy operations is displayed to the Console view in the
Nios II IDE.
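The sequence that main() performs can be sketched in portable C as shown below. This is an approximation for illustration only: run_copy_test() and copy_block() are hypothetical names, and the standard clock() function stands in for whatever timer routines the tutorial file actually uses.

```c
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical software copy routine standing in for do_dma(). */
static int copy_block(int *dest, const int *src, size_t bytes)
{
    size_t i;
    size_t words = bytes / sizeof(int);
    for (i = 0; i < words; i++) {
        dest[i] = src[i];
    }
    return 0;
}

/* Allocates, fills, copies, verifies, and frees.
 * Returns 0 on success, 1 on a data mismatch, -1 if allocation fails.
 * If elapsed_ms is not NULL, it receives the time spent in the copy loop. */
int run_copy_test(size_t bytes, int iterations, double *elapsed_ms)
{
    size_t words = bytes / sizeof(int);
    size_t i;
    int n;
    int status = 0;
    clock_t start, stop;
    int *src = malloc(bytes);
    int *dst = malloc(bytes);

    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1;
    }

    for (i = 0; i < words; i++) {
        src[i] = (int)i;            /* incrementing values */
    }
    memset(dst, 0, bytes);          /* destination starts as zeros */

    start = clock();                /* timer around the copy loop only */
    for (n = 0; n < iterations; n++) {
        copy_block(dst, src, bytes);
    }
    stop = clock();

    for (i = 0; i < words; i++) {   /* verify the copied data */
        if (dst[i] != src[i]) {
            status = 1;
            break;
        }
    }

    if (elapsed_ms != NULL) {
        *elapsed_ms = 1000.0 * (double)(stop - start) / CLOCKS_PER_SEC;
    }
    free(src);
    free(dst);
    return status;
}
```

Calling run_copy_test(1048576, 100, &ms) would mirror the buffer size and iteration count described above. The timer brackets only the copy loop, so buffer setup and verification do not affect the measurement.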
Set up the Hardware for the Project
To set up the hardware for the tutorial, perform the following steps:
1.Connect your Nios development board to power, and connect the
board to your host computer with an Altera download cable.
2.Set up the hardware project directory.
a.Using a file management tool on the host computer, locate the
standard hardware example design for your Nios development
board. For example, on a Windows PC, use Windows Explorer
to find the Verilog HDL design files for the Nios development
board, Cyclone® II Edition at
<Nios II EDS install path>/examples/verilog/niosII_cycloneII_2c35/standard.
b.Copy the standard directory and name the copied directory
c2h_tutorial_hw. This new directory serves as the hardware
design for the tutorial.
3.Start the Quartus II software.
4.Open the Quartus II project standard.qpf located in the
c2h_tutorial_hw directory.
The Quartus II software might give a warning "Do you want to
overwrite the database ... created by Quartus II Version <version>...
The database format is compatible..." if the project was created with
an earlier version of the software. If so, click Yes to update the
database.
5.Configure the FPGA on the Nios development board.
a.On the Tools menu click Programmer. The Programmer
appears, with the SRAM object file standard.sof automatically
ready to download to the FPGA.
b.Turn on the Program/Configure check box for standard.sof.
c.Click Start. The programmer downloads the configuration data
to the FPGA.
If Start is not enabled, click Hardware Setup to configure
your JTAG download cable.
Create the Software Project
To set up the software project for the tutorial, perform the following steps.
1.Start the Nios II IDE.
2.If the Workspace Launcher dialog box appears, click OK to accept
the default workspace.
3.If the Welcome to the Altera Nios II IDE page displays, close it to
view the workbench.
4.Create a new C/C++ Application project.
a.On the File menu, point to New and click C/C++ Application.
The New Project wizard appears.
b.In the Name box, type c2h_tutorial_sw.
c.In the Select Project Template list, select Blank Project.
d.Use the Select Target Hardware settings to browse to and select
the SOPC Builder system (.ptf) file in your c2h_tutorial_hw
directory. After you specify the SOPC Builder system, the IDE
automatically sets the CPU setting to cpu, which is the name of
the only Nios II processor core available in this SOPC Builder
system.
e.Click Finish. The IDE generates a new project c2h_tutorial_sw
and a new system library project c2h_tutorial_sw_syslib.
5.Download the software file dma_c2h_tutorial.c from the Nios II
literature page and save it to a known location on your host
computer. The file is located next to this document (Nios II C2H
Compiler User Guide) on the Altera Nios II literature page at
http://www.altera.com/literature/lit-nio2.jsp.
6.Import the C file dma_c2h_tutorial.c into the c2h_tutorial_sw
project. The easiest way to do this is to use an external file
management tool, such as Windows Explorer, and drag the file onto
the c2h_tutorial_sw project folder in the C/C++ Projects view of the
Nios II IDE.
Run the Project as Software Only
In this section, you build and run the project as a software-only
implementation, and observe the time required to run the program. To
run the program, perform the following steps:
1.In the C/C++ Projects view, right-click the c2h_tutorial_sw project,
point to Run As and click Nios II Hardware. The Nios II IDE takes a
few minutes to build and run the program.
2.Observe the execution time in the Console view. Example 2–1 shows
results of approximately 86000 milliseconds. The results you see
might be different, depending on the memory characteristics of the
target board and the clock speed of the example design.
Example 2–1. Execution Results as Software-Only Implementation
This simple program copies 1048576 bytes of data from a source buffer to a
destination buffer.
The program performs 100 iterations of the copy operation, and calculates
the time spent.
Copy beginning
SUCCESS: Source and destination data match. Copy verified.
Total time: 86520 ms
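To put the figures in Example 2–1 in perspective, the effective copy throughput can be worked out from the buffer size, iteration count, and total time. The helper below is a hypothetical convenience function, not part of the tutorial file.

```c
/* Effective copy throughput in bytes per second, given the size of one
 * copy in bytes, the number of copies, and the total time in milliseconds. */
double copy_throughput(double bytes_per_copy, int iterations, double elapsed_ms)
{
    return bytes_per_copy * iterations / (elapsed_ms / 1000.0);
}
```

With the values shown in Example 2–1 (1048576 bytes, 100 iterations, 86520 ms), this works out to roughly 1.2 million bytes per second for the software-only implementation. Your own figure depends on the board and clock speed.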
Create and Configure a Hardware Accelerator
In this section, you create an accelerator for the do_dma() function. To
create the hardware accelerator, perform the following steps:
1.Open the dma_c2h_tutorial.c source file in the Nios II IDE editor, if
it is not already open.
2.In the source file, double-click the name of the do_dma() function
to select it.
3.Right-click do_dma and click Accelerate with the Nios II C2H
Compiler. The C2H view appears in the bottom pane of the Nios II
IDE.
In this example, for simplicity, the do_dma() function exists in
the same file as the rest of the application code. However, a good
practice is to isolate functions for acceleration into a separate C
file. The project makefile cannot determine specifically what
part of a file has changed. As a result, if an accelerated function
coexists in the same file with other unaccelerated code, the
C2H Compiler is forced to rebuild the accelerator, even if you
edit unrelated code.
4.Set the build options for the new accelerator, as shown in Figure 2–1.
a.Click the + icon to expand c2h_tutorial_sw in the C2H view.