Analog Devices EE215 Application Notes

Engineer-to-Engineer Note EE-215
Technical notes on using Analog Devices DSPs, processors and development tools
Contact our technical support at dsp.support@analog.com and at dsptools.support@analog.com. Or visit our online resources http://www.analog.com/ee-notes and http://www.analog.com/processors
A 16-bit IIR Filter on the ADSP-TS20x TigerSHARC® Processor
Contributed by Rickard Fahlqvist November 6, 2003

Introduction

This document describes an implementation of a 2nd order 16-bit IIR filter on the ADSP-TS20x TigerSHARC® processor. The implementation uses the direct form II structure.

Filter Structure

The structure of the 2nd order filter is shown in Figure 1. To achieve filters of orders higher than two, several 2nd order filters can be cascaded or connected in parallel. Note that the resulting filter will have an order that is a multiple of 2. For an odd-order filter, a 1st order filter is needed in addition to one or several 2nd order filters. This is not implemented in the example code presented in Listing 3.

(Figure: input x[i] feeds the intermediate state w[i], which passes through two z^-1 delays; the feedback coefficients a1 and a2 and the feedforward coefficients b0, b1 and b2 combine the delayed values to produce the output y[i].)
Figure 1. 2nd order direct form II
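A cascade of 2nd order sections can be modeled in portable C to check coefficient sets on a host before moving to fixed point. The sketch below is floating point and is not the TigerSHARC code; biquad_t, biquad_step and cascade_step are hypothetical names. It uses the sign convention of Equation 1, in which the feedback terms are added to the input.

```c
#include <assert.h>

/* Hypothetical floating-point model of one 2nd order direct form II
   section, using the sign convention of Equation 1:
     w[i] = x[i] + a1*w[i-1] + a2*w[i-2]
     y[i] = b0*w[i] + b1*w[i-1] + b2*w[i-2]                       */
typedef struct {
    double a1, a2;      /* feedback coefficients           */
    double b0, b1, b2;  /* feedforward coefficients        */
    double w1, w2;      /* delay line: w[i-1] and w[i-2]   */
} biquad_t;

/* Process one sample through one section and update its delay line. */
static double biquad_step(biquad_t *s, double x)
{
    double w = x + s->a1 * s->w1 + s->a2 * s->w2;
    double y = s->b0 * w + s->b1 * s->w1 + s->b2 * s->w2;
    s->w2 = s->w1;      /* oldest element drops out     */
    s->w1 = w;          /* newest w[i] enters the line  */
    return y;
}

/* Cascade form: each section's output is the next section's input,
   so n sections realize a filter of order 2*n. */
static double cascade_step(biquad_t *sec, int n, double x)
{
    assert(sec != 0 && n > 0);
    for (int k = 0; k < n; k++)
        x = biquad_step(&sec[k], x);
    return x;
}
```

With two identical sections (a1 = 0.5, b0 = 1, all other coefficients 0), a unit impulse produces 1, 1, 0.75, 0.5, which is the (n+1)/2^n response expected from squaring a first-order term.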
The equations that characterize this type of filter structure are given in Equation 1.
$$ w[i] = x[i] + \sum_{k=1}^{2} a_k \, w[i-k] $$
$$ y[i] = \sum_{k=0}^{2} b_k \, w[i-k] $$
Equation 1. Characterizing 2nd order direct form II IIR
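As a floating-point reference for Equation 1, the loops below evaluate both sums literally, with k running from 1 to 2 on the feedback path and from 0 to 2 on the feedforward path. This is a host-side sketch, not the TigerSHARC implementation; the function name df2_iir and its argument layout are hypothetical.

```c
#include <assert.h>

/* Hypothetical reference for Equation 1:
     w[i] = x[i] + sum_{k=1..2} a[k] * w[i-k]
     y[i] =        sum_{k=0..2} b[k] * w[i-k]
   a[0] is unused. On entry state[0] = w[i-1] and state[1] = w[i-2];
   the state is updated in place so successive calls continue. */
static void df2_iir(const double *x, double *y, int len,
                    const double a[3], const double b[3], double state[2])
{
    assert(len >= 0);
    for (int i = 0; i < len; i++) {
        /* wk[k] holds w[i-k] for k = 0, 1, 2 */
        double wk[3] = { x[i], state[0], state[1] };
        for (int k = 1; k <= 2; k++)
            wk[0] += a[k] * wk[k];      /* feedback path     */
        double acc = 0.0;
        for (int k = 0; k <= 2; k++)
            acc += b[k] * wk[k];        /* feedforward path  */
        y[i] = acc;
        state[1] = state[0];            /* shift delay line  */
        state[0] = wk[0];
    }
}
```

For example, with a1 = 0.5, a2 = -0.25 and b = {1, 0.5, 0.25}, a unit impulse gives y = 1, 1, 0.5, 0.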
The first numerator coefficient (b0) can always be set to unity (provided b0 is nonzero) by scaling the input, x[i], by b0 and dividing b1 and b2 by b0. This has been used in the example implementation in Listing 1 below.
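The b0 normalization can be checked numerically. In this sketch (hypothetical names unity_b0_t, normalize_b0 and unity_b0_step), b0 moves into an input scale factor, analogous to the s field of the filter state, and the remaining numerator coefficients are divided by b0. Because the structure is linear, scaling x[i] by b0 scales every w[i] by b0, so the output matches the unnormalized filter.

```c
#include <assert.h>

/* Hypothetical direct form II section with b0 fixed to unity:
     w[i] = scale*x[i] + a1*w[i-1] + a2*w[i-2]
     y[i] = w[i] + b1*w[i-1] + b2*w[i-2]                      */
typedef struct {
    double a1, a2, b1, b2;  /* b0 is implicitly 1.0              */
    double scale;           /* input gain, equal to original b0  */
    double w1, w2;          /* delay line                        */
} unity_b0_t;

/* Fold b0 into the input scale: scale = b0, b1' = b1/b0, b2' = b2/b0. */
static unity_b0_t normalize_b0(double a1, double a2,
                               double b0, double b1, double b2)
{
    assert(b0 != 0.0);
    unity_b0_t s = { a1, a2, b1 / b0, b2 / b0, b0, 0.0, 0.0 };
    return s;
}

static double unity_b0_step(unity_b0_t *s, double x)
{
    double w = s->scale * x + s->a1 * s->w1 + s->a2 * s->w2;
    double y = w + s->b1 * s->w1 + s->b2 * s->w2;
    s->w2 = s->w1;   /* shift delay line */
    s->w1 = w;
    return y;
}
```

For example, normalize_b0(0.5, -0.25, 2.0, 1.0, 0.5) yields scale 2.0, b1 0.5 and b2 0.25, and a unit impulse then produces 2, 2, 1, 0, the same response as the original coefficients.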

Implementation

Implementing small recursive algorithms on pipelined processors is difficult. In this filter, the value of w at iteration i depends on the value of w at iteration i-1. This loop-carried dependence limits the degree to which the operations can be pipelined. In this program, all multiplications required to produce w[i] and y[i] are computed in a single cycle. However, adding the input to the partial results to obtain the final output takes 5 cycles, and during this time there is no available parallelism to keep the multiplier busy. As a result, this program can use only one compute block. The operations used to administer the delay line (as well as the memory fetches) are associated with the ALU, so it is difficult to fill multiple instruction slots per instruction line. As mentioned above, if a higher order filter is required, you can implement it as parallel 2nd order filters. In this case, two filtered outputs are calculated simultaneously, one in each compute block, with a summation of the results at the end.

Copyright 2003, Analog Devices, Inc. All rights reserved. Analog Devices assumes no responsibility for customer product design or the use or application of customers’ products or for any infringements of patents or rights of others which may result from Analog Devices assistance. All trademarks and logos are property of their respective holders. Information furnished by Analog Devices Applications and Development Tools Engineers is believed to be accurate and reliable, however no responsibility is assumed by Analog Devices regarding technical accuracy and topicality of the content provided in Analog Devices’ Engineer-to-Engineer Notes.
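The parallel arrangement can be sketched the same way: both sections receive the same input sample and their outputs are summed. In this host-side model (hypothetical names section_t, section_step and parallel_filter) the two sections run one after the other; on the TigerSHARC they would occupy the two compute blocks. Obtaining parallel section coefficients requires a partial-fraction expansion of the overall transfer function, which this sketch does not perform.

```c
#include <assert.h>

/* Hypothetical 2nd order section using Equation 1's conventions. */
typedef struct {
    double a1, a2, b0, b1, b2;  /* filter coefficients */
    double w1, w2;              /* delay line          */
} section_t;

static double section_step(section_t *s, double x)
{
    double w = x + s->a1 * s->w1 + s->a2 * s->w2;
    double y = s->b0 * w + s->b1 * s->w1 + s->b2 * s->w2;
    s->w2 = s->w1;
    s->w1 = w;
    return y;
}

/* Parallel form: the same x[i] drives both sections and the two
   partial outputs are added, mirroring the final summation step. */
static void parallel_filter(section_t *sx, section_t *sy,
                            const double *x, double *y, int len)
{
    assert(sx != 0 && sy != 0 && len >= 0);
    for (int i = 0; i < len; i++)
        y[i] = section_step(sx, x[i]) + section_step(sy, x[i]);
}
```

With one section using a1 = 0.5 and the other a1 = -0.5 (b0 = 1, all else 0), a unit impulse gives 2, 0, 0.5, 0, since the odd-indexed terms of the two responses cancel.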
The 2-element state is stored in register yR5, and during every iteration each newly computed w[i] is inserted into the state register, which is then shifted left. The delay line is duplicated so that the computation of the 4 multiplications on the feedback and feedforward paths can be performed with one 4-way 16x16-bit multiplication instruction as shown in Figure 2.
(Figure: a 64-bit register holds the duplicated delay line D2 D1 D2 D1 in four 16-bit fields, bits 63-48, 47-32, 31-16 and 15-0; each field is multiplied by one of the coefficients a1, a2, b1 and b2.)
Dx denotes delay line element X.
Figure 2. Multiplication of delay line and coefficients
The partial results are collected with the sideways summation instructions. If the coefficients are a mix of fractional and integer values, the 4-way multiplication must be split into two separate multiplications, which reduces efficiency.
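To make the data layout of Figure 2 concrete, the sketch below emulates one iteration on a host with plain integer arithmetic: four lane-wise 16x16-bit products, two sideways sums, and the shift of the new w[i] into both copies of the delay line. It assumes Q15 coefficients with b0 normalized to 1, a coefficient lane order of a1, a2, b1, b2 (the figure does not fix this order), and round-half-away-from-zero conversion. The names q15_iir_t and q15_iir_step are hypothetical, and the processor's own fractional rounding and saturation behavior may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host model of the quad 16x16-bit multiply in Figure 2.
   Lane 0 is bits 15:0. Delay line lanes follow the figure (D1, D2,
   D1, D2); the coefficient order a1, a2, b1, b2 is an assumption.   */
typedef struct {
    int16_t coef[4];  /* assumed order: a1, a2, b1, b2 (Q15, b0 == 1) */
    int16_t dl[4];    /* duplicated delay line: D1, D2, D1, D2 (Q15)  */
} q15_iir_t;

/* Round a Q30 value to Q15, half away from zero (well defined in C). */
static int64_t q30_to_q15(int64_t v)
{
    return v >= 0 ? (v + 16384) / 32768 : -((-v + 16384) / 32768);
}

static int16_t sat16(int64_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* One iteration: x is the already scaled Q15 input sample.
   Returns y[i] in Q15, held in 32 bits. */
static int32_t q15_iir_step(q15_iir_t *s, int16_t x)
{
    assert(s != 0);
    int64_t p[4];

    /* Lane-wise Q15 x Q15 -> Q30 products (the "4-way" multiply). */
    for (int k = 0; k < 4; k++)
        p[k] = (int64_t)s->coef[k] * s->dl[k];

    /* Sideways sums: lanes 0-1 feedback, lanes 2-3 feedforward. */
    int64_t fb = p[0] + p[1];
    int64_t ff = p[2] + p[3];

    /* w[i] = x[i] + a1*D1 + a2*D2, back to Q15 with saturation. */
    int16_t w = sat16(q30_to_q15((int64_t)x * 32768 + fb));

    /* y[i] = w[i] + b1*D1 + b2*D2 (b0 normalized to 1). */
    int32_t y = (int32_t)q30_to_q15((int64_t)w * 32768 + ff);

    /* Shift the new w[i] in (D2 <- D1, D1 <- w) in both copies. */
    int16_t d1 = s->dl[0];
    s->dl[0] = s->dl[2] = w;
    s->dl[1] = s->dl[3] = d1;
    return y;
}
```

Q15 can only hold values in [-1, 1), while the feedback coefficient a1 of a stable 2nd order section may have a magnitude approaching 2. Such coefficients need an integer or mixed format, which is the case in which, as noted above, the single 4-way multiplication has to be split.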
Because the SDAB (Short Word Data Alignment Buffer) is used, eight 16-bit input words are loaded per iteration, of which only the first is used. Memory bandwidth is not the bottleneck, so discarding the remaining fetched words is acceptable.

Interface

The C-style prototype for the filter in the example implementation is shown in Listing 1.
void iir_16(int2x16 input[], int output[], int input_len, iir_state_t *f_state);
Listing 1. IIR function prototype
The typedef "iir_state_t" is a structure that holds the coefficients, the delay line, and the input scaling factor. The structure definition is shown in Listing 2.
typedef struct {
    const int2x16 c;  /* coefficients */
    int2x16 d;        /* start of delay line */
    int2x16 s;        /* Input value scaling */
} iir_state_t;
Listing 2. Filter state structure