This document is a getting-started guide to some commonly
used digital signal processing functions available for the
Freescale MSC8156EVM board. It demonstrates the example
projects supplied with these functions. The objective of this
document is to help users integrate various independent
projects that use these kernels.
1 Introduction
The MSC8156EVM is supported by a collection of
commonly used digital signal processing kernels that
function with the SC3850 DSP core. The project described
in this document provides the kernel library consisting of C
and assembly callable kernel applications, as well as their
test harnesses. This tutorial guide demonstrates how to use
several of the most useful and representative kernel
examples such as FIR and IIR filters, FFT, Divide and
Matrix Inverse.
NOTE
Download the kernel software package from
the MSC8156EVM Tool Summary Page on
Running the DSP kernels requires the following devices:
• Personal computer (PC) running the CodeWarrior for StarCore-Based DSP IDE
• MSC8156EVM board connected to the PC
The MSC8156EVM project includes the following kernels:
• FIR_complex_16×16
• Complex Radix-4 FFT/IFFT 16×16
• Complex Radix-4 and Radix-2 FFT/IFFT 16×16
• IIR
• Division
• Ln
• Matrix Inversion complex 2×2
• Matrix Inversion complex 4×4
Figure 1 shows the folder directory of all the kernel example projects.
Figure 1. Kernel Example Project Directory
MSC8156EVM Kernels Starting Guide, Rev. 0
Freescale Semiconductor
3 Test Procedures
Use the following steps to prepare for and run the project:
1. Import the SC3850 DSP kernel library by dragging the .project file in
\fsl_sc3850_kernels\code\cw\sc3850_kernels to the CodeWarrior project window (Figure 2).
Figure 2. Importing the Project Files
2. Build the kernel by clicking on the build icon.
3. After building the kernel project, .elb files are created in the folder fsl_sc3850_kernels\lib.
4. After the kernel is built, you can run one of the test cases in the \fsl_sc3850_kernels\tests\
folder. Import the .project file of the selected test case and build the project. After the
test case is built, .eld files are created in the \fsl_sc3850_kernels\tests\<test_case>\cw
folder.
5. Load the project by clicking on the debug icon and selecting Debug Configurations.
6. Select the appropriate launch configuration, that is, assembly or C test (Figure 3), and click on the
Debug button. Note that not all test cases are available in both assembly (ASM) and C. Some test
cases only have one option.
Figure 3. Launch Configuration
7. Run the project by clicking on the run icon.
NOTE
See Section 4, Common Kernel Example Demonstration, for details on how to run the DSP
kernel test cases.
4 Common Kernel Example Demonstration
After the DSP kernel library is built, the user can run one of the kernel test cases provided with the EVM.
This section provides detailed information for each kernel. For each kernel, the listing includes the
following:
• Location from which to import the file
• Function
• ASM Prototype
• C Prototype
• Inputs
• Outputs
• Data alignment requirements (if applicable)
• Performance Measurement
The following notes apply to all kernels:
1. Import the kernel as described in Section 3, Test Procedures.
2. DPU is a preprocessor symbol that, when defined, enables cycle measurements:
#ifdef DPU
#define INIT_CYCLE InitDPU()
#define GET_CYCLE ReadCountDPU()
#endif
3. The kernel is called twice in the example project. The first call brings the kernel into the
cache, so the performance of the second call is measured with a warm cache.
4. The test results printed in the CodeWarrior console should show the number of cycles used to
complete the kernel and the result of the check against the reference outputs.
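The warm-cache measurement pattern described in notes 2 and 3 can be sketched as follows. The names InitDPU and ReadCountDPU come from the macros above, but their real signatures are not shown in this guide, so the sketch replaces them with a hypothetical host-side counter. kernel_under_test and measure_warm_cycles are also hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the DPU cycle counter (assumption: the real
 * InitDPU/ReadCountDPU are provided by the test harness on the target). */
static uint32_t fake_cycles;
static int kernel_calls;

static void InitDPU(void) { fake_cycles = 0; }
static uint32_t ReadCountDPU(void) { return fake_cycles; }

#define INIT_CYCLE InitDPU()
#define GET_CYCLE  ReadCountDPU()

/* Hypothetical kernel: the first (cold-cache) call costs more "cycles"
 * than later (warm-cache) calls. */
static void kernel_under_test(void)
{
    fake_cycles += (kernel_calls == 0) ? 500u : 100u;
    kernel_calls++;
}

/* Call the kernel once to warm the cache, then measure only the second call. */
static uint32_t measure_warm_cycles(void (*kernel)(void))
{
    assert(kernel != NULL);
    kernel();       /* first call: brings code into the cache, not measured */
    INIT_CYCLE;     /* reset the counter */
    kernel();       /* second call: measured with a warm cache */
    return GET_CYCLE;
}
```

Calling measure_warm_cycles(kernel_under_test) reports only the cost of the second call, which is the figure the test harness prints as the measured cycle count.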
Word32 x[]: 32-bit complex inputs, 16 bits for real part and 16 bits for imaginary part
Word32 h[]: 32-bit complex coefficients, 16 bits for real and 16 bits for imaginary part
Word16 Nr: number of input data samples
Word16 Nh: number of elements in the filter
In the test source code, these inputs are defined as shown in Figure 4.
#define Nr 40
#define Nh 40
Word16 Input[2 * (2 * Nr + Nh + 2)] = {
#include "../vectors/test_in_80.dat"
};
Word16 Coeff[2 * Nh] = {
#include "../vectors/coeff.dat"
};
Input data and coefficients are vectors stored in .dat files. test_in_80.dat has 244 entries
and coeff.dat has 80 entries.
Figure 4. Input Definitions
Output:
Word16 y[]: 16-bit output with interleaved real and imaginary parts
In the test source code, the output is computed and stored as shown in Figure 5.
stream = fopen("../vectors/output_80.dat", "w+");
for (i = 0; i < 2*2*Nr; i++)
{
    fprintf(stream, "%d,\n", (int)Output[i]);
}
fclose(stream);
Figure 5. Output Definition
The output vector is stored in output_80.dat and compared with the reference output. If the
output matches the reference, the CodeWarrior console displays:
No wrong results found
Performance Measurement:
Estimated cycle count: (Nr/2)*Nh + overhead
Measured cycle count: 939 cycles for ASM, 1390 cycles for C
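For checking a complex 16×16 FIR output independently of the kernel, a plain C reference model can be useful. The following is a minimal sketch, not the library implementation: it assumes Q15 fixed-point data, truncation toward zero when scaling the accumulator, saturation to 16 bits, and an input buffer that holds nh - 1 history samples ahead of the first output sample. None of these details is stated in this guide, and the names sat16 and fir_complex_ref are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Saturate a wide accumulator to the 16-bit range. */
static int16_t sat16(int64_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Reference complex FIR on interleaved [real, imag] data (assumed Q15).
 * x: n_out + nh - 1 complex samples, h: nh complex taps,
 * y: n_out complex outputs, y[n] = sum over k of h[k] * x[n + nh - 1 - k]. */
static void fir_complex_ref(const int16_t *x, const int16_t *h,
                            int16_t *y, int n_out, int nh)
{
    assert(x && h && y && n_out > 0 && nh > 0);
    for (int n = 0; n < n_out; n++) {
        int64_t acc_re = 0, acc_im = 0;
        for (int k = 0; k < nh; k++) {
            const int16_t *xs = &x[2 * (n + nh - 1 - k)];
            int64_t hr = h[2 * k], hi = h[2 * k + 1];
            acc_re += hr * xs[0] - hi * xs[1];   /* real part of h*x */
            acc_im += hr * xs[1] + hi * xs[0];   /* imaginary part of h*x */
        }
        /* Scale the Q30 products back to Q15 (assumption: truncation). */
        y[2 * n]     = sat16(acc_re / 32768);
        y[2 * n + 1] = sat16(acc_im / 32768);
    }
}
```

With a single tap of 0.5j (0, 16384) and an input sample of (1000, 2000), the model produces (-1000, 500), which is the expected complex product.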
Radix-4 complex FFT with 16-bit input and 16-bit output. Input and output complex data are
stored as [real][imag] pairs. It supports 64-, 256-, 1024-, and 4096-point FFTs.
Word16 data_buffer[]: Address of the input and output buffer. Input and output share one
memory area pointed to by data_buffer.
Word16 wctwiddles[]: Address of the array of twiddle factor Wc
Word16 wbdtwiddles[]: Address of the array of twiddle factor Wb and Wd
Word16 n: Number of FFT points
Word16 ln: Base-4 logarithm of N, that is, the number of FFT stages
Word16 Shift_down: Scale-down parameter applied at each stage
These inputs are defined or imported by the lines in the test source file shown in Figure 6.
1. This block selects the number of FFT points, as shown in Figure 8.
#define N 64
//#define N 256
//#define N 1024
//#define N 4096
Figure 8. Number of FFT Points
2. FFT and IFFT are both written in the same test file. If an FFT project is built, the test code
runs only the FFT part, and vice versa.
3. WARMCACHE is a macro that calls the kernel twice to bring the code into the cache. Use this macro
only for cycle measurements; when it is defined, the input data is overwritten, resulting in incorrect results.
#define WARMCACHE
Performance Measurement:
Estimated cycle count: 3N/4*log4_N -N/8 + 5*log4_N + 17
Measured cycle count for the ASM test: See Table 1.
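The estimated cycle count can be evaluated for each supported FFT size with a small helper. This is a sketch only: it assumes the formula groups as (3N/4)*log4(N) - N/8 + 5*log4(N) + 17 with integer arithmetic, and the function names log4_int and fft_radix4_estimated_cycles are hypothetical.

```c
#include <assert.h>

/* Number of radix-4 stages: log4(n) when n is a power of 4, otherwise -1. */
static int log4_int(int n)
{
    int stages = 0;
    if (n < 1) return -1;
    while (n > 1) {
        if (n % 4 != 0) return -1;
        n /= 4;
        stages++;
    }
    return stages;
}

/* Estimated cycles, assuming the grouping (3N/4)*log4(N) - N/8 + 5*log4(N) + 17. */
static long fft_radix4_estimated_cycles(int n)
{
    int ln = log4_int(n);
    assert(ln > 0);   /* n must be a power of 4 greater than 1 */
    return 3L * n / 4 * ln - n / 8 + 5L * ln + 17;
}
```

Under this reading, the estimate is 168 cycles for N = 64, 773 for N = 256, 3754 for N = 1024, and 17967 for N = 4096.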
Radix-2 and Radix-4 complex FFT with 16-bit input and 16-bit output. The data structure is a double
word [real][imag]. It supports 32-, 128-, 512-, and 2048-point FFTs.
A Radix-2 loop is used for the first-stage additions and subtractions, and Radix-4 is used for the main
FFT loops.
Word16 data_buffer[]: Address of the input and output buffer. Input and output share one
memory area pointed to by data_buffer.
Word16 wctwiddles[]: Address of the array of twiddle factor Wc
Word16 wbdtwiddles[]: Address of the array of twiddle factor Wb and Wd
Word16 n: Number of FFT points
Word16 ln: Base-4 logarithm of N, that is, the number of FFT stages
Word16 Shift_down: Scale-down parameter applied at each stage
Outputs:
Word16 data_buffer[]: Address of the input and output buffer. Input and output share one
memory area pointed to by data_buffer.
The test example for the Radix-4 and Radix-2 FFT is very similar to the Radix-4 FFT example in the
previous section, although the two kernels use different algorithms. Refer to section 3.2 for a
detailed description of how to implement the kernel.
typedef struct iir_1st_art_t {
    word16 *y;        // Pointer to output buffer
    word16 *x;        // Pointer to input buffer
    word16 *c;        // Pointer to coefficient list
    word16 *s;        // Pointer to state variable list
    unsigned short M; // IO buffer size
};
Inputs:
The structure inputs are defined by the code shown in Table 9.
// Struct p
p.y = Dout;
p.x = Din;
p.c = Coeffes;
p.s = State;
p.M = Nout;
1. The number of data samples must be a multiple of 4.
2. Adjust the data size when changing the input files.
3. WARMCACHE is a macro that calls the kernel twice to bring the code into the cache. Use this macro
only for cycle measurements; when it is defined, the input data is overwritten, resulting in incorrect results.
#define WARMCACHE
Performance Measurement
Estimated cycle count: 8*Nr/4 + 13, Nr is the number of data samples
Measured cycle count: 67 cycles for ASM, 139 cycles for C
4.5 Division 16×16
Location
fsl_sc3850_kernels\tests\div\cw\test_sc3850_div\
Function
Computes y = a/b, where a and b are 16-bit real numbers
ASM Prototype:
Word16 sc3850_div_16x16_asm(div_arg_16x16 *arg)
C Prototype:
Word16 sc3850_div_16x16_c(div_arg_16x16 *arg)
Structure Definition:
typedef struct div_arg_16x16_t { word16 a; word16 b; } div_arg_16x16;
Inputs:
a: an array of numerators
b: an array of denominators
a and b should be the same size
In the test code, the inputs are imported and defined by the code shown in Figure 10.
Word16 in[L*2] =
{
#include "../vectors/div_16x16_in.io"
};
for (i = 0; i < L; i++)
{
    p.a = in[i*2];   // even entries as numerators
    p.b = in[2*i+1]; // odd entries as denominators
    …
}
Figure 10. Input Definitions
Output:
The function will return a Word16 result.
Performance Measurement:
Estimated cycle count: 15 + overhead
Measured cycle count: 27 for ASM, 33 for C
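The pairing of interleaved test-vector entries into numerators and denominators, as in Figure 10, can be sketched on its own. In this illustration the Word16 typedef is an assumption (a 16-bit signed integer), the structure follows the definition given above, and load_div_pair is a hypothetical helper that is not part of the kernel library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int16_t Word16;   /* assumption: Word16 is a 16-bit signed integer */

typedef struct div_arg_16x16_t {
    Word16 a;   /* numerator */
    Word16 b;   /* denominator */
} div_arg_16x16;

/* Build the argument for pair i from an interleaved vector:
 * even entries are numerators, odd entries are denominators. */
static div_arg_16x16 load_div_pair(const Word16 *in, int i)
{
    div_arg_16x16 p;
    assert(in != NULL && i >= 0);
    p.a = in[2 * i];
    p.b = in[2 * i + 1];
    return p;
}
```

In the test loop, each structure built this way would then be passed by address to sc3850_div_16x16_asm or sc3850_div_16x16_c.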
4.6 Ln
Location
fsl_sc3850_kernels\tests\Ln\
Function:
Computes Ln(x) for every x in the input array and returns the results into the output array.
C Prototype:
Word32 sc3850_ln_c(ln_arg_t);
Structure Definition:
typedef struct ln_arg_t {
    Word32 *X;        // The array of input values
    Word32 *Y;        // The array of results after computation
    unsigned short n;
} ln_arg_t;
The structure is defined by the code in the test file shown in Figure 11.
stream = fopen("..\\vectors\\test_in.dat", "r");
for (i = 0; i < M; i++)
{
Computes the inverse of a complex 4×4 matrix, with 16-bit complex input (16-bit real and 16-bit
imaginary) and 32-bit signed output (32-bit real and 32-bit imaginary).
const Complex16* source: Pointer to input matrix, input must be in Complex16 format
Word32 detmin: Determinant threshold used to return an error code
Word32 input_shift: Shift parameter used to scale down the input data to avoid overflow
Output:
Word16 *sf: Pointer to scaling factor
Word32 * output: Pointer to output matrix, the output is in Complex32 format
Performance Measurement:
ASM version: 294 cycles
Optimized C version: 511 cycles