Channel decoding of voice and low bit-rate data channels found in third generation (3G) cellular standards
requires decoding of convolutionally encoded data. The Viterbi-decoder coprocessor 2 (VCP2) provided in
the TCI648x/9x devices performs Viterbi decoding for IS2000 and 3GPP wireless standards and
forward-error correction for 2G and 3G wireless systems. Combined with Texas Instruments (TI) DSPs,
the VCP2 offers a cost-effective solution. The VCP2 supports 762 12.2-kbps 3G AMR channels when
running at 333 MHz. This document describes the operation and programming of the VCP2.
Notational Conventions
This document uses the following conventions.
• Hexadecimal numbers are shown with the suffix h. For example, the following number is 40
hexadecimal (decimal 64): 40h.
• Registers in this document are shown in figures and described in tables.
  – Each register figure shows a rectangle divided into fields that represent the fields of the register.
  Each field is labeled with its bit name, its beginning and ending bit numbers above, and its
  read/write properties below. A legend explains the notation used for the properties.
  – Reserved bits in a register figure designate bits that are reserved for future device expansion.
• The term "word" describes a 32-bit value.
Preface
SPRUE09E–May 2006–Revised December 2009
Read This First
Related Documentation From Texas Instruments
The following documents describe the C6000™ devices and related support tools. Copies of these
documents are available on the Internet at www.ti.com. Tip: Enter the literature number in the search box
provided at www.ti.com.
SPRU189 — TMS320C6000 DSP CPU and Instruction Set Reference Guide. Describes the CPU
architecture, pipeline, instruction set, and interrupts for the TMS320C6000 digital signal processors
(DSPs).
SPRU198 — TMS320C6000 Programmer's Guide. Describes ways to optimize C and assembly code for
the TMS320C6000™ DSPs and includes application program examples.
SPRU301 — TMS320C6000 Code Composer Studio Tutorial. Introduces the Code Composer Studio™
integrated development environment and software tools.
SPRU321 — Code Composer Studio Application Programming Interface Reference Guide.
Describes the Code Composer Studio™ application programming interface (API), which allows you
to program custom plug-ins for Code Composer.
SPRU871 — TMS320C64x+ Megamodule Reference Guide. Describes the TMS320C64x+ digital signal
processor (DSP) megamodule. Included is a discussion on the internal direct memory access
(IDMA) controller, the interrupt controller, the power-down controller, memory protection, bandwidth
management, and the memory and cache.
C6000, TMS320C6000, Code Composer Studio are trademarks of Texas Instruments.
All other trademarks are the property of their respective owners.
Channel decoding of voice and low bit-rate data channels found in cellular standards such as 2.5G, 3G,
and WiMAX requires the decoding of convolutionally encoded data. The Viterbi-decoder coprocessor 2
(VCP2) provided in the TCI648x/9x devices performs Viterbi decoding for IS2000 and 3GPP wireless
standards. The VCP2 coprocessor also performs forward-error correction for 2G and 3G wireless systems.
The VCP2 coprocessor offers a cost-effective solution when combined with Texas Instruments (TI) DSPs.
The VCP2 supports 762 12.2-kbps 3G AMR channels when running at 333 MHz.
1 Features
The VCP2 provides:
• High flexibility:
  – Variable constraint length, K = 5, 6, 7, 8, or 9
  – User-supplied code coefficients
  – Code rates (1/2, 1/3, or 1/4)
  – Configurable traceback settings (convergence distance, frame structure)
  – Branch metrics calculation and depuncturing done in software by the DSP
• System and development cost optimization:
  – The VCP2 releases DSP resources for other processing
  – On-chip decoding reduces board space and power consumption
  – Communication between the DSP and the VCP2 is performed through the high-performance
  EDMA3 engine
  – The VCP2 uses its own optimized working memories
  – Debug capabilities are provided during frame processing
  – Libraries are provided for reduced development time
User's Guide
TMS320TCI648x/9x Viterbi-Decoder Coprocessor 2
A convolutional code is generated by passing the information sequence to be transmitted through a linear
finite-state shift register. The VCP2 can decode only a subset of those codes, known as
single-shift-register, nonrecursive convolutional codes (an example is given in Figure 1). Important
parameters for this type of code are:
• The constraint length K (length of the delay line; the VCP2 supports K values from 5 to 9).
• The rate R, given by R = k/n, where k is the number of information bits needed to produce n output bits,
also known as a codeword (the VCP2 supports codes with rates 1/2, 1/3, and 1/4).
• The generator polynomials Gn, which describe how the outputs are generated from the inputs.
Figure 1. Convolutional Encoder Example Block Diagram
NOTE: K = 3, R = k/n = 1/3, G0 = (100)8, G1 = (101)8, G2 = (111)8. 0/000 means input is 0, output0 is
0, output1 is 0, output2 is 0. There are 2^(K-1) states and 2^k incoming branches per state.
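The encoder structure described above can be sketched in C as follows. This is a minimal illustration, not TI code: the function names are hypothetical, and the tap patterns 100, 101, and 111 from the Figure 1 note are read here as 3-bit binary masks whose leftmost bit taps the current input, which is an assumption about the figure.

```c
#include <stdint.h>

/* XOR of all bits of x (modulo-2 sum of the tapped register cells). */
static unsigned parity(unsigned x)
{
    unsigned p = 0;
    while (x) {
        p ^= x & 1u;
        x >>= 1;
    }
    return p;
}

/*
 * Encode one input bit with a single-shift-register, nonrecursive
 * rate 1/n convolutional code of constraint length K.
 *   gen[i] : assumed K-bit tap mask for output i; bit K-1 taps the
 *            current input, bit 0 taps the oldest stored bit
 *   *state : the previous K-1 input bits (newest in bit K-2)
 * Returns the n output bits, with output 0 in the most significant
 * position of the returned value.
 */
static unsigned conv_encode_bit(unsigned bit, unsigned *state,
                                const unsigned *gen, int n, int K)
{
    unsigned reg = ((bit & 1u) << (K - 1)) | *state;
    unsigned out = 0;
    for (int i = 0; i < n; i++)
        out = (out << 1) | parity(reg & gen[i]);
    *state = reg >> 1;   /* shift the register by one position */
    return out;
}
```

Starting from the all-zero state, an input of 0 produces the output 000, which agrees with the 0/000 branch label in the note.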
From the parameters, we can derive a trellis diagram providing a useful representation of the code, but
whose complexity grows exponentially with the constraint length K. Figure 2 shows the trellis diagram of
the code from Figure 1. The fact that there is a limited number of possible transitions from one state to
another makes the code powerful and will be used in the decoding process.
As a maximum-likelihood sequence estimation (MLSE) decoder, the Viterbi decoder identifies the code
sequence with the highest probability of matching the transmitted sequence based on the received
sequence.
The Viterbi algorithm is composed of a metric update and a traceback routine. The metric update performs
a forward recursion in the trellis over a finite number of symbol periods where probabilities are
accumulated (the VCP2 accumulates on 13 bits) for each individual state based on the current input
symbol (branch metric information). The accumulated metric is known as path metrics or state metrics.
Once a path through the trellis is identified, the traceback routine performs a backward recursion in the
trellis and outputs hard decisions or soft decisions.
The DSP controls the operation of the VCP2 (Figure 3) using memory-mapped registers. The DSP
typically sends and receives data using synchronized EDMA3 transfers through the EDMA3 bus. The
VCP2 sends two synchronization events to the EDMA3: a receive event (VCPREVT) and a transmit event
(VCPXEVT). The VCP2 input data corresponds to the branch metrics and the output data to the hard
decisions or soft decisions.
Figure 3. VCP2 Block Diagram
The branch metrics (BM) are calculated by the DSP and stored in the DSP memory subsystem as 8-bit
signed values. Per symbol interval T, for a rate R = k/n and a constraint length K, there are a total of
2^(K-1+k) branches in the trellis. For rate 1/n codes, only 2^(n-1) branch metrics need to be computed
per symbol period and passed to the VCP2. Moreover, n soft inputs are required to calculate 1 branch metric.
Assuming BPSK modulated bits (0 → 1, 1 → -1), the branch metrics are calculated as follows:
• Rate 1/2: there are 2 branch metrics per symbol period
  – BM0(t) = r0(t) + r1(t)
  – BM1(t) = r0(t) - r1(t)
  where r(t) is the received codeword at time t (2 symbols; r0(t) is the symbol corresponding to the
  encoder upper branch, see Figure 1).
• Rate 1/3: there are 4 branch metrics per symbol period
  – BM0(t) = r0(t) + r1(t) + r2(t)
  where r(t) is the received codeword (3 symbols; r0(t) is the symbol corresponding to the encoder upper
  branch, see Figure 1).
The data must be sent to the VCP2 as described in Table 1, Table 2, and Table 3 for rates 1/2, 1/3, and
1/4, respectively (the base address must be double-word aligned).
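As an illustration of the rate 1/2 case, the following C sketch computes BM0 and BM1 for each symbol period and stores them as consecutive bytes. The function names are hypothetical. The byte order assumes a little-endian DSP, where BM0(t=0) at the lowest byte address ends up in the LSB of the first word as in Table 1; the saturation step is an added safeguard, not something the text prescribes.

```c
#include <stddef.h>
#include <stdint.h>

/* Clamp an intermediate sum to the 8-bit signed branch metric range. */
static int8_t sat8(int v)
{
    if (v > 127)
        return 127;
    if (v < -128)
        return -128;
    return (int8_t)v;
}

/*
 * Rate 1/2 branch metrics.
 *   r0[t], r1[t] : scaled soft inputs for symbol period t
 *   bm           : output buffer of 2 * nsym bytes;
 *                  bm[2t] = BM0(t), bm[2t + 1] = BM1(t)
 * The caller is responsible for double-word alignment of bm.
 */
static void bm_rate_half(const int8_t *r0, const int8_t *r1,
                         size_t nsym, int8_t *bm)
{
    for (size_t t = 0; t < nsym; t++) {
        bm[2 * t]     = sat8(r0[t] + r1[t]);   /* BM0(t) = r0(t) + r1(t) */
        bm[2 * t + 1] = sat8(r0[t] - r1[t]);   /* BM1(t) = r0(t) - r1(t) */
    }
}
```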
The branch metrics can be saved in the DSP memory subsystem in either their native format or packed in
words (user implementation). When working in big-endian mode, the VCP2 endian mode register
(VCPEND) indicates if the data is 32-bit word packed or native 8-bit format and the VCP2 will handle the
endianness byte swapping accordingly (see Section 7).
Table 1. Branch Metrics for Rate 1/2
Address (hex)   Data (MSB to LSB)
Base            BM1(t=T)    BM0(t=T)    BM1(t=0)    BM0(t=0)
Base + 4h       BM1(t=3T)   BM0(t=3T)   BM1(t=2T)   BM0(t=2T)
Base + 8h       ...
Table 2. Branch Metrics for Rate 1/3
Address (hex)   Data (MSB to LSB)
Base            BM3(t=0)    BM2(t=0)    BM1(t=0)    BM0(t=0)
Base + 4h       BM3(t=T)    BM2(t=T)    BM1(t=T)    BM0(t=T)
Base + 8h       ...
Table 3. Branch Metrics for Rate 1/4
Address (hex)   Data (MSB to LSB)
Base            BM3(t=0)    BM2(t=0)    BM1(t=0)    BM0(t=0)
Base + 4h       BM7(t=0)    BM6(t=0)    BM5(t=0)    BM4(t=0)
Base + 8h       BM3(t=T)    BM2(t=T)    BM1(t=T)    BM0(t=T)
Base + Ch       BM7(t=T)    BM6(t=T)    BM5(t=T)    BM4(t=T)
Base + 10h      ...
The state metric accumulation resolution is 13 bits on the VCP2. Consequently, full 8-bit dynamic range is
available for branch metrics on the TCI648x/9x VCP2, for all constraint lengths and all code rates.
4.2 Soft Input Dynamic Ranges
The VCP2 implementation implies that the soft inputs need to be quantized so that the branch metrics
satisfy the following bound B1 (branch metrics upper bound - absolute value):
2^(C-1) - 1 ≥ (2 × (K - 1) + 2) × B1
K is the constraint length and C determines the truncation of state metrics that can be performed without
loss of decoding performance.
The VCP2 is designed with C = 13. The branch metrics can have a maximum dynamic range of 7 + 1 sign
bits [-128; +127]. This gives another branch metrics upper bound B2 ≤ 128.
So for a given constraint length, min (B1, B2) gives the final branch metrics maximum bound B.
To satisfy B in the branch metrics calculation, the soft input values, delivered as 8-bit-signed equalized
values, are linearly scaled with the following formula where 1/n is the rate.
Scaled = min (B1, B2)/n × SoftValue/128
Example
K = 9, then B1 ≤ 227.5 and the branch metrics range B2 is [-128; +127]. So the branch metrics need to be
in [-128; +127] range.
If rate 1/3, 128/3 ≈ 42, so the soft inputs need to be scaled by a factor of 0.333333 and saturated within
the range [-42; +42].
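The bound and scaling rule above can be checked numerically with a short C sketch. The function names are hypothetical, B2 is taken as 128 as stated in the text, and truncation toward zero is an assumed rounding choice that the text does not specify.

```c
#include <stdint.h>

#define VCP2_C 13   /* state metric resolution given in the text */

/* Bound B1 from 2^(C-1) - 1 >= (2 * (K - 1) + 2) * B1. */
static double bm_bound_b1(int K)
{
    return (double)((1 << (VCP2_C - 1)) - 1) / (2.0 * (K - 1) + 2.0);
}

/* Final bound B = min(B1, B2), with B2 = 128. */
static double bm_bound(int K)
{
    double b1 = bm_bound_b1(K);
    return b1 < 128.0 ? b1 : 128.0;
}

/* Saturation limit for one scaled soft input at rate 1/n. */
static int soft_limit(int K, int n)
{
    return (int)(bm_bound(K) / n);
}

/* Scaled = min(B1, B2)/n * SoftValue/128, saturated to +/- limit. */
static int8_t scale_soft(int8_t soft, int K, int n)
{
    int limit = soft_limit(K, n);
    int v = (int)(bm_bound(K) / n * soft / 128.0);   /* truncates toward zero */
    if (v > limit)
        v = limit;
    if (v < -limit)
        v = -limit;
    return (int8_t)v;
}
```

For K = 9 and rate 1/3 this reproduces the example: B1 is 227.5, B is 128, and the saturation limit is 42.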
Table 4 summarizes the calculations for the different constraint length and rate:
The VCP2 can be configured to generate either hard decisions (one bit per decision) or soft decisions
(8-bit value per decision). Ordering of the VCP2 decisions depends on the OUT_ORDER field of VCPIC3
and, if the DSP is set to work in big-endian mode and the results are soft decisions, on the SD field of
VCPEND (see the VCP2 endian mode register, Section 6.3). The decisions buffer start address must be
double-word aligned and the buffer size must be a multiple of 8 bytes.
The soft decisions in the VCP2 are initially computed with the path metrics at 13-bit values. The results
are then clipped to 8-bit signed integer values before being stored in the traceback soft decision memory.
The VCP2 contains several memory-mapped registers accessible by the CPU, the IDMA, the QDMA, and
the EDMA3. A configuration-bus access is faster than an EDMA3-bus access for isolated accesses
(typically when accessing control registers). EDMA3-bus accesses are used for EDMA3 transfers and
provide maximum throughput to/from the VCP2. The registers are listed in Table 5. For the memory map
and full register addresses, see the device-specific data manual.
The branch metric and traceback decision memories contents are not accessible and the memories can
be regarded as FIFOs by the DSP, meaning you do not have to perform any indexing on the addresses.
• Data Transfer Alignment: Normal (non-emulation) mode data transfers to/from the VCP2
must be aligned on a double-word (64-bit) boundary. Alignment can be forced in C using
the DATA_ALIGN pragma. Non-alignment results in data transfer failure.
Example:
#pragma DATA_ALIGN(configIc, 8)   // Should be double-word aligned
VCP_ConfigIc configIc;            // VCP Input Configuration Reg
• Data Transfer Size: Normal (non-emulation) mode data transfers to/from the VCP2 must
be of a length that is an 8-byte (double-word) multiple.
• Emulation mode transfers are performed on 32-bit boundaries and are 4 bytes in length.
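The two normal-mode constraints (64-bit alignment and a length that is a multiple of 8 bytes) can be verified in software before a transfer is programmed. The helpers below are a hypothetical sketch and not part of any TI library.

```c
#include <stddef.h>
#include <stdint.h>

/* Round a byte count up to the next multiple of 8 (double-word). */
static size_t vcp2_round_up8(size_t nbytes)
{
    return (nbytes + 7u) & ~(size_t)7u;
}

/* Nonzero if the address lies on a 64-bit boundary. */
static int vcp2_is_dword_aligned(const void *p)
{
    return ((uintptr_t)p & 7u) == 0;
}

/* Nonzero if a buffer satisfies both normal-mode transfer rules. */
static int vcp2_transfer_ok(const void *p, size_t nbytes)
{
    return vcp2_is_dword_aligned(p) && (nbytes % 8u) == 0;
}
```

A buffer of 13 bytes, for example, would be padded to vcp2_round_up8(13) = 16 bytes before being handed to the EDMA3.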
The VCP2 peripheral identification register (VCPPID) is a constant register that contains the ID and ID
revision number for the peripheral. The PID stores version information used to identify the peripheral. All
bits within this register are read-only (writes have no effect), meaning that the values within this register
should be hard-coded with the appropriate values and must not change from their reset state.
The VCPPID register is shown in Figure 4 and described in Table 7.
Figure 4. VCP2 Peripheral ID Register (VCPPID)
TCI648x DSP
Bits    31-24      23-16     15-8      7-0
Field   Reserved   TYPE      CLASS     REV
Reset   R-0        R-0x01    R-0x11    R-rev
TCI649x DSP
Bits    31-30    29-28      27-16     15-11     10-8        7-6      5-0
Field   SCHEME   Reserved   PID       RTL       MAJOR       CUSTOM   MINOR
Reset   R-1      R-0        R-0x80A   R-<rtl>   R-<major>   R-       R-<minor>
LEGEND: R/W = Read/Write; R = Read only; -n = value after reset
Table 7. VCP2 Peripheral ID Register (VCPPID) Field Descriptions
Bit     Field      Value   Description
TCI648x DSP
31-24   Reserved   0       Reserved
23-16   TYPE       01h     Peripheral Type. Identifies the type of the peripheral.
15-8    CLASS      11h     Peripheral Class. Identifies the class.
7-0     REV        <rev>   Peripheral Revision. Identifies the revision level of the specific instance of
                           the peripheral. This value should begin at 0x01 and be incremented each time
                           the design is revised.
The polynomial generators are 9-bit values defined as
G(z) = b8 z^-8 + b7 z^-7 + b6 z^-6 + b5 z^-5 + b4 z^-4 + b3 z^-3 + b2 z^-2 + b1 z^-1 + b0,
but only 8 bits are passed in the POLYn bitfields so that b1 is the most significant bit and b8 is the least
significant bit (b0 is not passed, but set by the internal VCP hardware). The VCP2 uses the number of
POLYn fields set to zero, starting at POLY3, to determine the code rate. Therefore, POLYn fields not used
by the current code rate must be set to zero. The VCP2 uses the number of least-significant bits that are
zero in POLY0 to determine the constraint length.
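The packing rule and the two inference rules described above can be modelled in C as follows. This is a sketch under stated assumptions: the function names are hypothetical, and the relation K = 9 minus the number of trailing zero bits in POLY0 is derived here from the bit positions (b_(K-1) sits at bit 9-K of the field) on the assumption that the highest-order tap of POLY0 is set. It is not quoted from the hardware specification.

```c
#include <stdint.h>

/*
 * Pack coefficients b[0..8] of G(z) into an 8-bit POLYn field:
 * b1 goes to the MSB (bit 7), b8 to the LSB (bit 0), b0 is dropped.
 */
static uint8_t vcp2_poly_field(const unsigned char b[9])
{
    uint8_t poly = 0;
    for (int i = 1; i <= 8; i++)
        poly |= (uint8_t)((b[i] & 1u) << (8 - i));
    return poly;
}

/* Assumed model: K = 9 - (trailing zero bits of POLY0); -1 if POLY0 is 0. */
static int vcp2_constraint_length(uint8_t poly0)
{
    int tz = 0;
    if (poly0 == 0)
        return -1;
    while ((poly0 & 1u) == 0) {
        poly0 >>= 1;
        tz++;
    }
    return 9 - tz;
}

/*
 * Returns n for rate 1/n by counting zero fields from POLY3 downward,
 * or -1 if fewer than two polynomials are programmed.
 */
static int vcp2_rate_denominator(const uint8_t poly[4])
{
    int zeros = 0;
    for (int i = 3; i >= 0 && poly[i] == 0; i--)
        zeros++;
    return (4 - zeros >= 2) ? 4 - zeros : -1;
}
```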
LEGEND: R/W = Read/Write; R = Read only; -n = value after reset
Table 12. VCP2 Input Configuration Register 4 (VCPIC4) Field Descriptions
Bit     Field      Value     Description
31-29   Reserved   0         Reserved. The reserved bit location is always read as 0. A value written to
                             this field has no effect.
28-16   IMINS      0-1FFFh   Minimum initial state metric (13 bits).
15-13   Reserved   0         Reserved. The reserved bit location is always read as 0. A value written to
                             this field has no effect.
12-0    IMAXS      0-1FFFh   Maximum initial state metric (13 bits).
LEGEND: R/W = Read/Write; R = Read only; -n = value after reset
Table 13. VCP2 Input Configuration Register 5 (VCPIC5) Field Descriptions
Bit     Field      Value    Description
31      SDHD                Output decision type select bit.
                   0        Hard decisions
                   1        Soft decisions
30      OUTF                Output parameters read flag bit.
                   0        VCPREVT is not generated by VCP for output parameters read
                   1        VCPREVT generated by VCP for output parameters read
29-28   TB                  Traceback mode select bits.
                   0        Not allowed
                   1h       Tailed, F ≤ Fmax (1)
                   2h       Convergent (no tail bits)
                   3h       Mixed, F ≥ Fmax and tail bits are used (1)
27-25   Reserved   0        Reserved. The reserved bit location is always read as 0. A value written to
                            this field has no effect.
24-20   SYMR       0-1Fh    Determines decision buffer length in output FIFO. For information on selecting
                            the appropriate SYMR value, see Section 8.4.
19-16   SYMX       0-Fh     Determines branch metrics buffer length in input FIFO. For information on
                            selecting the appropriate SYMX value, see Section 8.3.
15-8    Reserved   0        Reserved. The reserved bit location is always read as 0. A value written to
                            this field has no effect.
7-0     IMAXI      0-FFh    Maximum initial state metric value bits. IMAXI bits determine which state
                            should be initialized with the maximum state metrics value (IMAXS) bits in
                            VCPIC4; all the other states are initialized with the value in the IMINS bits.
(1) For more details on Fmax, see Section 8.1.4.
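The VCPIC5 field positions listed in Table 13 can be assembled into a 32-bit word with a small packing helper. The helper and the TB constant names below are hypothetical and are not taken from any TI library; only the bit positions come from the table.

```c
#include <stdint.h>

/* Traceback mode codes from Table 13 (constant names are illustrative). */
enum {
    VCP2_TB_TAILED     = 1,
    VCP2_TB_CONVERGENT = 2,
    VCP2_TB_MIXED      = 3
};

/*
 * Build a VCPIC5 value: SDHD bit 31, OUTF bit 30, TB bits 29-28,
 * SYMR bits 24-20, SYMX bits 19-16, IMAXI bits 7-0.
 * Reserved bits are left at 0. Arguments are masked to field width.
 */
static uint32_t vcpic5_pack(unsigned sdhd, unsigned outf, unsigned tb,
                            unsigned symr, unsigned symx, unsigned imaxi)
{
    return ((uint32_t)(sdhd & 0x1u)  << 31) |
           ((uint32_t)(outf & 0x1u)  << 30) |
           ((uint32_t)(tb   & 0x3u)  << 28) |
           ((uint32_t)(symr & 0x1Fu) << 20) |
           ((uint32_t)(symx & 0xFu)  << 16) |
            (uint32_t)(imaxi & 0xFFu);
}
```

For example, vcpic5_pack(0, 0, VCP2_TB_CONVERGENT, 0, 7, 0) selects hard decisions in convergent mode with SYMX = 7 and yields 20070000h.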
The VCP2 output register 0 (VCPOUT0) is shown in Figure 11 and described in Table 14.
Figure 11. VCP2 Output Register 0 (VCPOUT0)
Bits    31-29      28-16
Field   Reserved   FMINS
Reset   R/W-0      R/W-0
Bits    15-13      12-0
Field   Reserved   FMAXS
Reset   R/W-0      R/W-0
LEGEND: R/W = Read/Write; R = Read only; -n = value after reset
Table 14. VCP2 Output Register 0 (VCPOUT0) Field Descriptions
Bit     Field      Value     Description
31-29   Reserved   0         Reserved. The reserved bit location is always read as 0. A value written to
                             this field has no effect.
28-16   FMINS      0-1FFFh   Minimum final state metric value (13 bits).
15-13   Reserved   0         Reserved. The reserved bit location is always read as 0. A value written to
                             this field has no effect.
12-0    FMAXS      0-1FFFh   Maximum state metric value for the final trellis stage (at trellis stage
                             R+C) (13 bits).
The VCP2 status register 0 (VCPSTAT0) is shown in Figure 15 and described in Table 18.
Figure 15. VCP2 Status Register 0 (VCPSTAT0)
Bits    31-29      28-16
Field   Reserved   NSYMPROC
Reset   R/W-0      R-0
Bits    15-12      11-8
Field   NSYMPROC   Reserved
Reset   R-0        R/W-0
Bits    7          6         5       4       3     2     1     0
Field   Reserved   EMUHALT   OFFUL   IFEMP   WIC   ERR   RUN   PAUSE
Reset   R/W-0      R-0       R-0     R-1     R-0   R-0   R-0   R-0
LEGEND: R/W = Read/Write; R = Read only; -n = value after reset
Table 18. VCP2 Status Register 0 (VCPSTAT0) Field Descriptions
Bit     Field      Value   Description
31-29   Reserved   0       Reserved. The reserved bit location is always read as 0. A value written to
                           this field has no effect.
28-12   NSYMPROC           Number of symbols processed bits. The NSYMPROC bits indicate how many symbols
                           have been processed in the state metric unit with respect to time.
                           The maximum number of processed stages is equal to f + (k-1) in tailed or mixed
                           mode. The maximum number of processed stages is equal to f + c in convergent
                           mode.
11-7    Reserved   0       Reserved. The reserved bit location is always read as 0. A value written to
                           this field has no effect.
6       EMUHALT            Emulation halt status bit.
                   0       No halt due to emulation.
                   1       Halt due to emulation.
5       OFFUL              Output FIFO buffer full status bit.
                   0       Output FIFO buffer is not full.
                   1       Output FIFO buffer is full.
4       IFEMP              Input FIFO buffer empty status bit.
                   0       Input FIFO buffer is not empty.
                   1       Input FIFO buffer is empty.
3       WIC                Waiting for input configuration bit. The WIC bit indicates that the VCP is
                           waiting for new input control parameters to be written. This bit is always set
                           after decoding of a user channel.
                   0       Not waiting for input configuration words.
                   1       Waiting for input configuration words.
2       ERR                VCP error status bit. The ERR bit is cleared as soon as the DSP reads the VCP
                           error register (VCPERR).
                   0       No error.
                   1       VCP paused due to error.
1       RUN                VCP running status bit.
                   0       VCP is not running.
                   1       VCP is running.
0       PAUSE              VCP pause status bit.
                   0       VCP is not paused. The UNPAUSE command is acknowledged by clearing the PAUSE
                           bit.
                   1       VCP is paused. The PAUSE command is acknowledged by setting the PAUSE bit. The
                           PAUSE bit can also be set if the input FIFO buffer is becoming empty or if the
                           output FIFO buffer is full.
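A status word read from VCPSTAT0 can be split into its fields as sketched below. The structure and function names are hypothetical; the bit positions follow Table 18, and how the register value is obtained (register address, access macro) is left to the device-specific data manual.

```c
#include <stdint.h>

/* Decoded view of VCPSTAT0 (type and member names are illustrative). */
typedef struct {
    uint32_t nsymproc;   /* bits 28-12: number of symbols processed */
    unsigned emuhalt;    /* bit 6 */
    unsigned offul;      /* bit 5: output FIFO full */
    unsigned ifemp;      /* bit 4: input FIFO empty */
    unsigned wic;        /* bit 3: waiting for input configuration */
    unsigned err;        /* bit 2 */
    unsigned run;        /* bit 1 */
    unsigned pause;      /* bit 0 */
} vcpstat0_t;

static vcpstat0_t vcpstat0_decode(uint32_t v)
{
    vcpstat0_t s;
    s.nsymproc = (v >> 12) & 0x1FFFFu;   /* 17-bit field */
    s.emuhalt  = (v >> 6) & 1u;
    s.offul    = (v >> 5) & 1u;
    s.ifemp    = (v >> 4) & 1u;
    s.wic      = (v >> 3) & 1u;
    s.err      = (v >> 2) & 1u;
    s.run      = (v >> 1) & 1u;
    s.pause    = v & 1u;
    return s;
}
```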
The VCP endian mode register (VCPEND) is intended to solve possible big-endian issues and is,
therefore, used only when the DSP is in big-endian mode. Depending on whether the data is saved in the
DSP memory subsystem in its native format or is 32-bit word packed, data interpretation will be different.
7.1 Branch Metrics
When the data are saved in their 8-bit native format (BM = 1), they must be organized in the DSP memory
as described in Table 23. When the data are packed on 32-bit words (BM = 0), they must be organized in
the DSP memory as described in Table 24.
Table 22. Branch Metrics
Little_big_endian   BM   Description (MSB to LSB)
0                   0    3,2,1,0,7,6,5,4 ⇒ 7,6,5,4,3,2,1,0 (bytes)
0                   1    0,1,2,3,4,5,6,7 ⇒ 7,6,5,4,3,2,1,0 (bytes)
1                   0    Endianness manager has no effect
                         7,6,5,4,3,2,1,0 ⇒ 7,6,5,4,3,2,1,0 (bytes)
1                   1    Endianness manager has no effect
                         7,6,5,4,3,2,1,0 ⇒ 7,6,5,4,3,2,1,0 (bytes)
Table 23. Branch Metrics in DSP Memory (BM = 1)
Address (hex bytes)Data
BaseBM0
Base + 1BM1
Base + 2BM2
Base + 3BM3
Base + 4BM4
Base + 5BM5
Base + 6BM6
Base + 7BM7
Data are presented to the EDMA3 as shown in Figure 19. The endianness manager reorders the BMs, as
shown in Figure 20, for processing.
Figure 19. Data Source - VBUSP/DMA (BM = 1)
Bits   63:56   55:48   47:40   39:32   31:24   23:16   15:8   7:0
Data   BM0     BM1     BM2     BM3     BM4     BM5     BM6    BM7
Figure 20. Data Destination - Kernel for Processing Unit (BM = 1)
Bits   63:56   55:48   47:40   39:32   31:24   23:16   15:8   7:0
Data   BM7     BM6     BM5     BM4     BM3     BM2     BM1    BM0
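The byte reordering summarized in Table 22 can be modelled in software, which is useful when checking test vectors on a host. The sketch below is a behavioural model only, with a hypothetical function name. It assumes that Little_big_endian = 0 denotes big-endian operation and 1 denotes little-endian operation, which the table does not state explicitly.

```c
#include <stdint.h>

/*
 * Model of the branch metric reordering for one 64-bit word.
 *   src           : word as presented on the bus (MSB to LSB)
 *   little_endian : assumed meaning of Little_big_endian = 1
 *   bm_native     : BM field of VCPEND (1 = native 8-bit format,
 *                   0 = packed in 32-bit words)
 */
static uint64_t vcp2_bm_reorder(uint64_t src, int little_endian, int bm_native)
{
    if (little_endian)
        return src;                      /* endianness manager has no effect */

    if (bm_native) {
        /* 0,1,2,3,4,5,6,7 => 7,6,5,4,3,2,1,0 : full byte reversal */
        uint64_t out = 0;
        for (int i = 0; i < 8; i++) {
            out = (out << 8) | (src & 0xFFu);
            src >>= 8;
        }
        return out;
    }

    /* 3,2,1,0,7,6,5,4 => 7,6,5,4,3,2,1,0 : swap the two 32-bit halves */
    return (src << 32) | (src >> 32);
}
```

With BM = 1 the model turns the Figure 19 ordering (BM0 in bits 63:56) into the Figure 20 ordering (BM0 in bits 7:0).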
The VCP2 hard-decision bit ordering within the 32-bit hard-decision word is programmable via the
OUT_ORDER field, such that the oldest bit can be either in the MSB or the LSB position.
Figure 23. Trellis Stage Ordering of Hard Decisions in 32-Bit Word (OUT_ORDER = 0)
Bit     63   62      ...   32       31       ...   1        0
Stage   N    N - 1   ...   N - 31   N - 32   ...   N - 62   N - 63
OUT_ORDER = 1 orders the hard-decision data in the order it is calculated in the state metric
computation, from 31 to 0 in the 32-bit word output.
Figure 24. Trellis Stage Ordering of Hard Decisions in 32-Bit Word (OUT_ORDER = 1)
Bit     63       62       ...   32   31       ...   1        0
Stage   N - 31   N - 30   ...   N    N - 63   ...   N - 33   N - 32
7.1.2 Soft Decisions
The VCP2 soft decisions are 8-bit results and are output 64 bits at a time. The soft decisions are
organized as shown in Table 25, based on the CPU's endianness and whether SD is set for native or 32-bit
packed results.
The VCP2 processing unit is shown in Figure 25. The state metrics unit performs the Viterbi forward
recursion using branch metrics as inputs and updates the states metrics for all states (add/compare/select
or ACS operations) at every trellis stage. The state metrics memory is not accessible by the DSP. The
traceback unit performs the Viterbi backward recursion and generates hard decisions or soft decisions.
The traceback memories are not directly accessible by the DSP.
Figure 25. Processing Unit
8.1 Sliding-Windows Processing
The traceback hard-decision memory can store up to 32768 traceback bits and there are 2^(K-1) bits stored
at each trellis stage. Therefore, the traceback hard-decision memory can store decisions of 32768/2^(K-1)
symbols. The traceback soft-decision memory can store up to 8192 traceback soft values and, therefore,
the soft decisions of 8192/2^(K-1) symbols. For a terminated frame of length F (excluding tail bits) and
a constraint length K, F and K determine whether all decisions can be stored in the traceback memories.
If the decisions do not fit in the traceback HD/SD memory, then the convergent or mixed mode is used
and the original frame is segmented into sliding windows (SW); otherwise, the traceback mode is set to
tailed and no segmentation is required.
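The capacity arithmetic above is a simple power-of-two division, sketched here in C with hypothetical function names. These raw capacities are only the memory-size figures from the text; the actual Fmax and (R+C)max limits are device values that should be taken from the limits tables.

```c
/* Trellis stages whose hard decisions fit in the 32768-bit traceback memory. */
static unsigned vcp2_hd_capacity(int K)
{
    return 32768u >> (K - 1);   /* 32768 / 2^(K-1) */
}

/* Trellis stages whose soft values fit in the 8192-entry traceback memory. */
static unsigned vcp2_sd_capacity(int K)
{
    return 8192u >> (K - 1);    /* 8192 / 2^(K-1) */
}
```

For K = 9 this gives 128 symbols for hard decisions and 32 symbols for soft decisions; for K = 5 it gives 2048 and 512.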
8.1.1 Tailed Traceback Mode
This mode is used when the frame is terminated and fits within the coprocessor traceback memory (see
Figure 26). The state metrics are computed over F + K - 1 symbols, the traceback is initialized to the
proper state and executed over F + K - 1 symbols. It should be noted that only F decisions are output.
They are output in reverse order and in blocks of user-defined size.
8.1.2 Mixed Traceback Mode
This mode is used when the frame is terminated and does not fit within the coprocessor traceback
memory. The frame is split into sliding windows (see Figure 27). The state metrics are computed over F +
K - 1 symbols, the traceback is initialized to the proper state and executed over F + K - 1 symbols. It
should be noted that only F decisions are output in blocks of user-defined size (see Section 8.3). The
state metrics computation of sliding window I + 1 is done in parallel with the traceback computation of
sliding window I. Tailed traceback type is used on the last sliding window.
8.1.3 Convergent Traceback Mode
This mode is used with non-terminated frames or when you want to decode a portion of the frame. When
the frame does not fit into the coprocessor traceback memory, then the frame is split into sliding windows
(see Figure 28). The state metrics are computed over F + C symbols, the traceback is initialized to the
proper state and executed over F + C symbols. It should be noted that only F decisions are output in
blocks of user-defined size (see Section 8.4). The state metrics computation of sliding window I + 1 is
done in parallel with the traceback computation of sliding window I.
Figure 27. Mixed Traceback Mode-Example With Five Sliding Windows
Figure 28. Convergent Traceback Mode-Example With Five Sliding Windows
VCP2 has increased the traceback soft-decision memory and output FIFO compared to the memory in
VCP. The traceback soft-decision memory size has been increased from 1024 × 96 bits to 2048 × 64 bits.
The output FIFO memory has been increased from 32 × 64 bits to 64 × 64 bits. In addition, the
soft-decision resolution is increased internally to 13 bits and is clipped to 8 bits when output from VCP2.
These memory sizes have an effect on the Fmax parameter (i.e., the max. frame size for tailed processing),
and the (R+C)max parameter (i.e., the max. sliding window length for mixed/convergent processing).
The differences between VCP and VCP2 related to Fmax performance are as follows:
• Allowed values for C are C = N*(K - 1), where N is a convergence multiplier. Table 26 and Table 27
show the possible values of N for each value of K for hard and soft decisions. The larger the
convergence length is, the better the BER. However, larger convergence lengths require more VCP
clock cycles.
• Soft decisions:
  – R is constrained to be less than or equal to 248.
  – Fmax and (R+C)max for VCP2 have been increased over VCP. This change reduces the number of
  sliding windows and, therefore, decreases the number of VCP2 cycles needed for traceback.
  Overall, there is no major improvement in the processing delay because a larger portion of cycles is
  spent in state metric accumulation, which is largely unaffected by the choice of R and C.
• Hard decisions:
  – (R+C)max for mixed/convergent processing for hard decisions has been increased from 605 to 635
  for K = 6, and from 1020 to 2044 for K = 5. As with soft decisions, this change results in a small
  decrease in VCP2 cycle counts during traceback.
The procedure to calculate the reliability length is as follows [the reliability length cannot be larger than
1920, C ≤ 1920 - (k - 1)]:
1. Determine the convergence length C = N*(K - 1).
2. Determine the number of sliding windows: Nsw = ceil(f/[(r + c)
3. Determine the reliability length: R = m × ceil[f/(Nsw × m)].
4. If hard decisions are being used and R > 1920 or if soft decisions are being used and R > 248, then
increment the number of sliding windows and go back to step 3.
5. For hard decision and soft-decision limits, see Table 26 and Table 27.
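Steps 1, 3, and 4 of the procedure can be sketched as a loop in C. Two things are assumptions here: the initial number of sliding windows is supplied by the caller because the step 2 formula is incomplete in this copy, and m is treated as a caller-supplied granularity because it is not defined in this excerpt. All names are hypothetical.

```c
/* Result of the reliability length calculation (names are illustrative). */
typedef struct {
    unsigned c;     /* convergence length C = N * (K - 1) */
    unsigned r;     /* reliability length R */
    unsigned nsw;   /* number of sliding windows */
} vcp2_sw_t;

/* Integer ceil(a / b) for b > 0. */
static unsigned ceil_div(unsigned a, unsigned b)
{
    return (a + b - 1u) / b;
}

/*
 * f         : frame length
 * K, N      : constraint length and convergence multiplier
 * m         : granularity of R (assumed caller-supplied)
 * nsw_start : initial number of sliding windows from step 2
 * soft      : nonzero for soft decisions (R <= 248), else R <= 1920
 * Returns all-zero fields if m is 0 or larger than the applicable limit.
 */
static vcp2_sw_t vcp2_sliding_window(unsigned f, unsigned K, unsigned N,
                                     unsigned m, unsigned nsw_start, int soft)
{
    vcp2_sw_t sw = {0u, 0u, 0u};
    unsigned limit = soft ? 248u : 1920u;

    if (m == 0u || m > limit)
        return sw;

    sw.c = N * (K - 1u);                          /* step 1 */
    sw.nsw = nsw_start ? nsw_start : 1u;
    for (;;) {
        sw.r = m * ceil_div(f, sw.nsw * m);       /* step 3 */
        if (sw.r <= limit)                        /* step 4 */
            break;
        sw.nsw++;
    }
    return sw;
}
```

With f = 1000, K = 9, N = 6, m = 8, and one initial window, soft decisions end at 5 windows with R = 200, while hard decisions stay at 1 window with R = 1000.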
During the standard forward recursion, an entity called the Yamamoto bit is computed for each state and
updated every symbol interval. The Yamamoto bit was proposed by Hirosuke Yamamoto (Hirosuke
Yamamoto, Viterbi Decoding Algorithm for Convolutional Codes with Repeat Request, IEEE Transactions
on Information Theory, Vol. IT-26, No. 5, September 1980).
Basically, a bit (the Yamamoto bit) is associated with each state in the decoding process. Initially, all the
Yamamoto bits are set (1). During the decoding process, the Yamamoto bit for a particular state comes
from a couple of decisions made on the path metrics and the Yamamoto bit of previous states. The
metrics of all paths leading to a particular state are compared. If the difference between any two metrics is
less than a given threshold (YAMT bits in VCPIC1), then the Yamamoto bit is cleared; otherwise, the
Yamamoto bit is inherited from the previous state of the path with the largest metric. The end result of this
process (YAM bit in VCPOUT1) yields a zero (0) if anywhere along the decoding path there was a point
where the decision between two paths was ambiguous. The YAM bit can therefore be used as a binary
frame quality indicator.
The Yamamoto algorithm can be enabled or disabled by toggling the YAMEN bit in VCPIC1.
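The update rule described above can be illustrated for one state with two incoming paths. This is a software sketch of the rule as worded in this section, not a description of the VCP2 hardware; the type and function names are hypothetical.

```c
/* One candidate path into a state (names are illustrative). */
typedef struct {
    int metric;     /* predecessor state metric plus branch metric */
    unsigned yam;   /* Yamamoto bit carried by the predecessor state */
} path_t;

/*
 * Add/compare/select step with Yamamoto bit update.
 * The survivor is the path with the larger metric. Its Yamamoto bit is
 * inherited unless the two metrics differ by less than the threshold,
 * in which case the decision is ambiguous and the bit is cleared.
 */
static path_t acs_yamamoto(path_t a, path_t b, int threshold)
{
    path_t win = (a.metric >= b.metric) ? a : b;
    int diff = a.metric - b.metric;

    if (diff < 0)
        diff = -diff;
    if (diff < threshold)
        win.yam = 0;
    return win;
}
```

Because a cleared bit is inherited by every later survivor on the same path, a single ambiguous decision along the final path leaves the frame-level quality bit at 0.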
8.3 Input FIFO (Branch Metrics)
The branch metric input FIFO uses a double-buffering scheme to allow the transfer of new input data while
processing the current data. After the VCP2 is initiated, it generates a VCPXEVT EDMA synchronization
event each time one side of the input FIFO is empty (and, thus, ready to accept new data). The value of
SYMX in the VCPIC5 register determines the number of 64-bit transfers of input data expected to be
written into the input FIFO by the EDMA for each VCPXEVT event.
Table 28 lists the valid values for SYMX, along with the corresponding number of expected 64-bit
transfers. As shown for the supported code rates, the VCP2 can be programmed to expect either 8 or 16
64-bit transfers for each VCPXEVT event.
The VCP2 only generates as many VCPXEVT events as needed to transfer all the branch metric input
data required for the current code block. In other words, no excess VCPXEVT events are generated
based on the FIFO being empty at the end of processing.
Table 28. Code Rate versus SYMX
Code Rate   SYMX   Number of 64-Bit Transfers
1/4         3      16
1/4         1      8
1/3         7      16
1/3         3      8
1/2         15     16
1/2         7      8
8.4 Output FIFO (Decisions)
The decoded decision output FIFO uses a double-buffering scheme to allow the EDMA to transfer out
available decoded data while the VCP2 processes and writes more decoded data. The VCP2 generates a
VCPREVT EDMA synchronization event each time one side of the output FIFO is full. In the case that all
the decoded data fits within one side of the output FIFO, only one VCPREVT is generated after all the
data has been written to the FIFO.
The value of SYMR in the VCPIC5 register should be set to one less than the number of 64-bit transfers
of output data expected to be transferred from the output FIFO by the EDMA for each VCPREVT event.
The possible range for SYMR is 1 to 31. SYMR should be calculated as follows:
• For hard decisions:
  – If F ≤ 2048, then SYMR = ceil (F/64) - 1
  – If F > 2048, then SYMR = 15 or 31
• For soft decisions:
  – If F ≤ 256, then SYMR = ceil (F/8) - 1
  – If F > 256, then SYMR = 15 or 31
The number of 64-bit transfers per VCPREVT event is SYMR + 1. Again, when F ≤ 2048, for hard
decision output, or F ≤ 256, for soft decision output, and SYMR is calculated as shown, a single
VCPREVT event is generated once all the output data has been written to the output FIFO.
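The SYMX lookup of Table 28 and the SYMR rules of Section 8.4 can be expressed as two small C helpers. The function names and the large_block flag, which chooses between the two permitted values 15 and 31 for long frames, are hypothetical.

```c
/*
 * SYMX from Table 28.
 *   n         : code rate denominator (2, 3, or 4 for rate 1/n)
 *   transfers : 64-bit transfers per VCPXEVT event (8 or 16)
 * Returns -1 for combinations that are not in the table.
 */
static int vcp2_symx(int n, int transfers)
{
    if (transfers != 8 && transfers != 16)
        return -1;
    switch (n) {
    case 2: return (transfers == 16) ? 15 : 7;
    case 3: return (transfers == 16) ? 7 : 3;
    case 4: return (transfers == 16) ? 3 : 1;
    default: return -1;
    }
}

/*
 * SYMR per Section 8.4.
 *   F           : frame length in decisions
 *   soft        : nonzero for soft decisions (8 per 64-bit transfer),
 *                 zero for hard decisions (64 per 64-bit transfer)
 *   large_block : for long frames, selects 31 instead of 15
 * Returns -1 for F = 0.
 */
static int vcp2_symr(unsigned F, int soft, int large_block)
{
    unsigned per_transfer = soft ? 8u : 64u;
    unsigned single_event_max = soft ? 256u : 2048u;

    if (F == 0u)
        return -1;
    if (F <= single_event_max)
        return (int)((F + per_transfer - 1u) / per_transfer) - 1;
    return large_block ? 31 : 15;
}
```

For a 100-decision hard-decision frame, ceil(100/64) = 2, so SYMR = 1 and a single VCPREVT event transfers two 64-bit words.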
The VCP2 requires setting up the following context per user channel:
•3 to 4 EDMA3 parameters (see Table 29)
•The input configurations parameters
Several user channels can be programmed prior to starting the VCP2. A suggested implementation is to
use the EDMA3 interrupt generation capabilities [see the TMS320C6472/TMS320TCI648x DSP EnhancedDMA (EDMA3) Controller User's Guide (SPRU727)] and program the EDMA3 to generate an interrupt
after the user channel's last VCPREVT synchronized EDMA3 transfer has completed.
Table 29. Required EDMA3 Links Per User Channel
Direction    Data                              Usage                           Required/Optional
Transmit     Input configuration parameters    Send the input configuration    Required
Transmit direction (DSP → VCP), receive direction (VCP → DSP)
9.1 EDMA3 Resources
9.1.1 VCP2 Dedicated EDMA3 Resources
Within the available 64 EDMA3 channel event sources, two are assigned to the VCP2: event 28 and event
29.
•Event 28 is associated to the VCP2 receive event (VCPREVT) and is used as the synchronization
event for EDMA3 transfers from the VCP2 to the DSP (receive). EDMA3 channel 28 is primarily
intended to serve VCP2-to-DSP transfers.
•Event 29 is associated to the VCP2 transmit event (VCPXEVT) and is used as the synchronization
event for EDMA3 transfers from the DSP to the VCP2 (transmit). EDMA3 channel 29 is primarily
intended to serve DSP-to-VCP2 transfers.
The EDMA parameters consist of eight words as shown in Figure 29. All EDMA transfers, in the context of
the VCP, must contain an even number of words, and have source and destination addresses
double-word aligned.
All EDMA transfers must be double-word aligned and the ACNT for the VCP EDMA transfer must be a
multiple of 8. Single-word transfers that are not double-word aligned cause errors in the TCP2/VCP2
memory.
For more information, see the TMS320C6472/TMS320TCI648x DSP Enhanced DMA (EDMA3) Controller User's Guide (SPRU727).
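The alignment and size constraints above can be checked before a transfer is programmed. The following C sketch assumes byte addresses and treats a double word as 8 bytes (two 32-bit words); the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true when a VCP2 EDMA transfer satisfies the stated constraints:
 * source and destination are double-word (8-byte) aligned and ACNT is a
 * nonzero multiple of 8 bytes, that is, an even number of 32-bit words.
 */
bool vcp2_edma_transfer_ok(uintptr_t src, uintptr_t dst, uint32_t acnt)
{
    bool src_aligned = (src % 8u) == 0u;
    bool dst_aligned = (dst % 8u) == 0u;
    bool acnt_ok     = (acnt != 0u) && ((acnt % 8u) == 0u);

    return src_aligned && dst_aligned && acnt_ok;
}
```

A check of this kind is cheap enough to run once per user channel when the transfer parameters are built.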
Figure 29. EDMA3 Parameters Structure
31                                                                                      0
EDMA3 Channel Options Parameter (OPT)
EDMA3 Channel Source Address (SRC)
Number of arrays of length ACNT (BCNT) | Number of bytes in array (ACNT)
EDMA3 Channel Destination Address (DST)
Destination 2nd Dimension Index (DSTBIDX) | Source 2nd Dimension Index (SRCBIDX)
BCNTRLD | LINK
Destination 3rd Dimension Index (DSTCIDX) | Source 3rd Dimension Index (SRCCIDX)
Reserved | Number of frames in block (CCNT)
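The eight-word parameter set can be mirrored by a C structure such as the one below. This is a sketch with hypothetical names, not a TI header. It assumes that each two-field word is split into an upper half (bits 31-16) and a lower half (bits 15-0), as described in the EDMA3 user's guide (SPRU727).

```c
#include <stdint.h>

/* One EDMA3 parameter set: eight 32-bit words, in the order shown in Figure 29. */
typedef struct {
    uint32_t opt;              /* Channel options parameter (OPT)          */
    uint32_t src;              /* Channel source address (SRC)             */
    uint32_t bcnt_acnt;        /* BCNT (upper half) | ACNT (lower half)       */
    uint32_t dst;              /* Channel destination address (DST)        */
    uint32_t dstbidx_srcbidx;  /* DSTBIDX (upper half) | SRCBIDX (lower half) */
    uint32_t bcntrld_link;     /* BCNTRLD (upper half) | LINK (lower half)    */
    uint32_t dstcidx_srccidx;  /* DSTCIDX (upper half) | SRCCIDX (lower half) */
    uint32_t ccnt;             /* Reserved (upper half) | CCNT (lower half)   */
} vcp2_edma3_param_t;

/* Packs two 16-bit fields into one 32-bit parameter word. */
uint32_t vcp2_pack_halves(uint16_t upper, uint16_t lower)
{
    return ((uint32_t)upper << 16) | (uint32_t)lower;
}
```

For instance, `vcp2_pack_halves(2, 128)` builds the word for BCNT = 2 and ACNT = 128.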
•BCNT = CEIL(TNBM/ACNT) (Number of arrays in a frame)
Where ACNT is the number of branch metrics bytes in an array and TNBM is the total number of
branch metrics, in bytes.
To calculate the total number of branch metrics data in bytes:
For mixed and tailed traceback mode, Total number of Branch Metrics = (F + K - 1) × 2^(r - 1)
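As a worked illustration of the branch metrics byte count and BCNT for mixed and tailed traceback modes, consider the C sketch below. The function names are hypothetical. It assumes that r is the denominator of the code rate 1/r and that F, K, and ACNT are already known for the user channel.

```c
#include <stdint.h>

/*
 * Total number of branch metrics in bytes (TNBM) for mixed and tailed
 * traceback modes: (F + K - 1) * 2^(r - 1).
 * Assumption: r >= 1 is the denominator of the code rate 1/r.
 */
uint32_t vcp2_total_branch_metric_bytes(uint32_t f, uint32_t k, uint32_t r)
{
    return (f + k - 1u) * (1u << (r - 1u));
}

/* BCNT = CEIL(TNBM / ACNT): number of arrays in a frame (acnt must be nonzero). */
uint32_t vcp2_branch_metric_bcnt(uint32_t tnbm, uint32_t acnt)
{
    return (tnbm + acnt - 1u) / acnt;
}
```

With F = 100, K = 9, and r = 3, TNBM is 108 × 4 = 432 bytes. With ACNT = 128, BCNT is 4.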
Upon completion, this EDMA3 transfer is linked to one of the following:
•The input configuration transfer parameters of the next user channel, if there is one ready to be
decoded.
•Dummy EDMA3 transfer parameters, if there are no more user channels ready to be decoded [for
information on how to set up a dummy transfer, see the TMS320C6472/TMS320TCI648x DSP Enhanced DMA (EDMA3) Controller User's Guide (SPRU727)]. Do not link to a NULL transfer. The final VCPXEVT is
generated once the decisions and output registers have been read, and it is intended to trigger the
transfer of the input configuration of the next user channel. If a NULL transfer link is in place, this
final VCPXEVT sets the event 29 flag in the secondary event register (SER), and no further VCP2
execution occurs until the flag is cleared.
9.1.2.3 Decisions Transfer
EDMA3 transfers from the decision buffer are VCPREVT frame-synchronized transfers. The programming
of these transfers depends on the decision type and the traceback mode.
Upon completion, this EDMA3 transfer is linked to one of the following:
1. The decisions EDMA3 transfer parameters of the next user channel, if there is one ready to be
decoded and the OUTF bit is 0.
2. Null EDMA3 transfer parameters (with all zeros), if there are no more user channels ready to be
decoded and the OUTF bit is 0.
3. The output parameters EDMA3 transfer parameters, if the OUTF bit is 1.
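The three linking cases listed above reduce to a small decision function. The C sketch below uses hypothetical enumerator and function names. It only selects which kind of parameter set the decisions transfer should link to and does not program the EDMA3 itself.

```c
#include <stdbool.h>

/* Link target for the decisions transfer (illustrative names). */
typedef enum {
    VCP2_LINK_NEXT_CHANNEL_DECISIONS, /* decisions parameters of the next user channel */
    VCP2_LINK_NULL,                   /* null parameters (all zeros)                   */
    VCP2_LINK_OUTPUT_PARAMETERS       /* output parameters transfer                    */
} vcp2_decisions_link_t;

/* Chooses the link target from the OUTF bit and the next-channel readiness. */
vcp2_decisions_link_t vcp2_decisions_link(bool outf, bool next_channel_ready)
{
    if (outf) {
        return VCP2_LINK_OUTPUT_PARAMETERS;
    }
    return next_channel_ready ? VCP2_LINK_NEXT_CHANNEL_DECISIONS : VCP2_LINK_NULL;
}
```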
9.1.2.4 Hard-Decisions Mode
The OPTIONS should be set as:
•ITCCEN = 0 (Intermediate transfer complete chaining is disabled)
•TCCEN = 0 (Transfer complete chaining is disabled)
•ITCINTEN = 0 (Intermediate transfer complete interrupt is disabled)
•TCINTEN = 0 (Transfer complete interrupt is disabled)
•WIMODE = 0 (Normal operation)
•TCC = 1 to 63 (Transfer complete code)
•TCCMODE = 0 (Normal completion)
•FWID = Don't care
•STAT = 0 (Entry is updated as normal)
•SYNCDIM = 0 (A-sync transfer, each event triggers the transfer of ACNT elements)
•DAM = 0 (Dst address within an array increments. Dst is not a FIFO.)
•SAM = 1 (Src Address is fixed. Src is a FIFO.)
•SOURCE ADDRESS: VCPRDECS Decision FIFO address
•ACNT = (SYMR+1) × 8 (Number of hard decision bytes in an array)
•BCNT = CEIL(TNHD/ACNT) (Number of arrays in a frame) Where TNHD is the total number of hard
decisions in bytes (Framelength/8).
•Destination Address: hard-decision array address
•SRCBIDX = 0
•DSTBIDX = ACNT
•SRCCIDX = 0
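The ACNT and BCNT settings for the hard-decisions transfer can be computed as in the following C sketch. The function names are hypothetical. It also assumes that a frame length that is not a multiple of 8 is rounded up to whole bytes, which the text does not state.

```c
#include <stdint.h>

/* ACNT = (SYMR + 1) * 8: number of hard-decision bytes in an array. */
uint32_t vcp2_hd_acnt(uint32_t symr)
{
    return (symr + 1u) * 8u;
}

/*
 * BCNT = CEIL(TNHD / ACNT), where TNHD is the total number of hard
 * decisions in bytes (frame length / 8).
 * Assumption: a partial final byte is rounded up.
 */
uint32_t vcp2_hd_bcnt(uint32_t frame_length, uint32_t symr)
{
    uint32_t acnt = vcp2_hd_acnt(symr);
    uint32_t tnhd = (frame_length + 7u) / 8u;

    return (tnhd + acnt - 1u) / acnt;
}
```

For a 4096-bit frame with SYMR = 15, ACNT is 128 bytes and TNHD is 512 bytes, so BCNT is 4.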
Upon completion, this EDMA3 transfer is linked to one of the following:
•The EDMA3 decisions transfer parameters of the next user channel, if there is one ready to be
decoded.
•Null EDMA3 transfer parameters (with all zeros), if there are no more user channels ready to be
decoded.
9.2 Input Configuration Words
The input configuration words should reflect the parameters of the user channels to be decoded.
The POLYn bits in VCPIC0 correspond to the generator polynomials in the encoder (see Figure 1). The
values in each POLYn bit field must be entered in reverse order. The POLYn least-significant bit is set by
the VCP2 logic.
•For rate 1/2, POLY0 and POLY1 are required. POLY2 and POLY3 must be set to zero.
•For rate 1/3, POLY0, POLY1, and POLY2 are required. POLY3 must be set to zero.
•For rate 1/4, all the POLYn bits are required.
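The rule that unused POLYn fields must be zero can be verified with a check like the C sketch below. The function name and the array representation of POLY0 to POLY3 are assumptions made for illustration. The check covers only the must-be-zero rule and does not validate the polynomial values themselves.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * poly[0..3] hold the POLY0..POLY3 field values.
 * rate_denominator is 2, 3, or 4 for code rates 1/2, 1/3, and 1/4.
 * Returns true when every POLYn that the rate does not use is zero.
 */
bool vcp2_unused_polys_cleared(unsigned rate_denominator, const uint8_t poly[4])
{
    switch (rate_denominator) {
    case 2:
        return (poly[2] == 0u) && (poly[3] == 0u);
    case 3:
        return poly[3] == 0u;
    case 4:
        return true;   /* all POLYn fields are in use */
    default:
        return false;  /* rate not covered by the rules above */
    }
}
```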
The YAMT and YAMEN bits in VCPIC1 are described in Section 8.2.
The F and R bits in VCPIC2, the C bit in VCPIC3, and the TB bits in VCPIC5 are described in Section 8.1.
The IMAXI bits in VCPIC5 determine which state should be initialized with the maximum state metrics
value (IMAXS); all the other states are initialized with the minimum state metrics value (IMINS). The IMAXI
can range from 0 to 2^(K-1) - 1. The IMAXS and IMINS are 13-bit signed values.
The SYMX and SYMR bits in VCPIC5 are described in Section 8.3 and Section 8.4.
The OUTF bit in VCPIC5 indicates whether the VCP should generate a VCPREVT for reading the output
parameters. The OUTF bit setting will impact the EDMA3 programming (see Section 9.1.2.3).
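The IMAXI range and the 13-bit signed range of IMAXS and IMINS can be validated before the input configuration words are written. The C sketch below uses hypothetical function names. It assumes a two's complement representation, so a 13-bit signed value spans -4096 to 4095, and a constraint length K between 1 and 31.

```c
#include <stdbool.h>
#include <stdint.h>

/* IMAXI must lie in 0 .. 2^(K-1) - 1 (assumes 1 <= k <= 31). */
bool vcp2_imaxi_in_range(uint32_t imaxi, uint32_t k)
{
    return imaxi <= ((1u << (k - 1u)) - 1u);
}

/* IMAXS and IMINS are 13-bit signed values (two's complement assumed). */
bool vcp2_state_metric_in_range(int32_t value)
{
    return (value >= -4096) && (value <= 4095);
}
```

For K = 9 there are 256 states, so the largest valid IMAXI is 255.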
10 Output Parameters
The FMAXS and FMINS bits in VCPOUT0 indicate the final maximum and minimum state metric values,
respectively. The FMAXI bit in VCPOUT1 indicates the state index for the state with the final maximum
state metric.
The YAM bit in VCPOUT1 is described in Section 8.2.
A VCP2 transmit event (VCPXEVT) is generated when any of the following conditions occurs:
•A START command is written to VCPEXE.
•All input control words have been received and are correct.
•The top side or bottom side of the input FIFO buffer is empty.
•After all decisions have been read and OUTF is cleared, or if all decisions have been read and the
output registers have been read and OUTF is set. Note that this extra XEVT pulls the next set of input
configurations via EDMA3. If EDMA3 is not set up to link in another set of input configurations, then a
dummy transfer should be set up to avoid an SER event flag being set for the EDMA3 parameter entry.
If the flag is set, it effectively locks the VCP2, and must be cleared before any future events can be
processed for that entry.
11.2 VCPREVT Generation
A VCP2 receive event (VCPREVT) is generated when any of the following conditions occurs:
•The traceback unit has finished writing the top side or bottom side of the output FIFO buffer.
•The traceback is completed (the whole frame has been decoded).
•The OUTF bit in VCPIC5 is 1 and all decisions have been read. This event triggers the read of the
output registers.
•0: Value at reset or value written by the coprocessor when previous instruction is read and its
execution is ongoing. DSP may test the status word in the output control memory to check if the
instruction is being executed.
•1: Start - CPU orders the coprocessor to start a processing block. The first action of the coprocessor is
then to generate the first XEVT to trigger DMA transfer of the input control words.
•2: Pause - CPU orders the coprocessor to pause a processing block at the beginning of traceback.
•3: Unpause (single_traceback) - CPU orders the coprocessor to restart at the beginning of traceback
and halt at next traceback.
•4: Unpause (finish_traceback) - CPU orders the coprocessor to restart at the beginning of traceback
and complete decode.
•5: Stop - CPU orders the coprocessor to reset. The coprocessor resets all VCP2 registers.
12.1 Debugging Features
Visibility into the internal operation of the VCP2 (i.e., the state metric accumulation, traceback memory) is
available to the CPU via a pause command. However, since the pause command is not synchronized with
the internal VCP2 state machine but is rather sent from the CPU at a random moment in time, this feature
is of limited use.
The pause command on the VCP2 is augmented to provide visibility into VCP2 operation on a sliding
window basis. The normal start command tells the VCP2 to perform a complete decode of one frame
(including input/output transfers via EDMA3). Instead of that command, the halt at beginning of
traceback and resume until next traceback commands are used, so that the internal VCP2 memories can be
inspected at various points in the decoding process. The procedure for using these commands is as follows:
•VCP2 configuration and branch metrics are prepared
•A halt at beginning of traceback command is sent
•The VCP2 generates necessary interrupts to the EDMA3 to transfer input configuration and to start
transferring branch metrics. The VCP2 performs state metric accumulation as branch metrics become
available. When it reaches the end of the first sliding window (i.e., the reliability portion and the
convergence portion), the VCP2 halts.
•The CPU polls the VCP2 status register until the VCP2 state changes from running to paused. At that
point, the state metrics memory can be inspected, as well as the traceback memory. To perform an
inspection, halt the CPU via a software breakpoint set at an appropriate point in the code, for instance.
Then, the memory can be inspected visually via the debugger GUI, or, alternatively, the CPU can copy
the relevant internal VCP2 memories to another location for later analysis.
– The CPU sends the resume until next traceback command to the VCP2.
– The VCP2 performs the traceback, generates a portion of hard or soft decisions, and continues with
state metric accumulation until the end of the next sliding window (i.e., another number of R stages,
where R is the reliability length).
– The process continues until the decoding is complete. Alternatively, the decoding process can be
run to completion after any sliding window by sending the resume to completion command instead
of the resume until next traceback command.
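The control flow of this procedure can be sketched in C against a mock coprocessor. Everything below is hypothetical except the command codes 2, 3, and 4, which follow the command list in this document. The mock halts once per sliding window and responds to commands immediately, so it stands in for the status polling and does not model real VCP2 timing, EDMA3 activity, or register access.

```c
/* Command codes as given in the command list of this document. */
enum {
    VCP2_CMD_PAUSE          = 2, /* halt at beginning of traceback */
    VCP2_CMD_UNPAUSE_SINGLE = 3, /* resume until next traceback    */
    VCP2_CMD_UNPAUSE_FINISH = 4  /* resume to completion           */
};

typedef enum { MOCK_IDLE, MOCK_PAUSED, MOCK_DONE } mock_state_t;

/* Hypothetical stand-in for the coprocessor state. */
typedef struct {
    mock_state_t state;
    unsigned windows_total;       /* sliding windows in the frame             */
    unsigned windows_accumulated; /* windows with state metrics accumulated   */
} mock_vcp2_t;

/* Applies one command to the mock; a real driver would write VCPEXE instead. */
void mock_vcp2_command(mock_vcp2_t *dev, int command)
{
    switch (command) {
    case VCP2_CMD_PAUSE:
        dev->windows_accumulated = 1;
        dev->state = MOCK_PAUSED;
        break;
    case VCP2_CMD_UNPAUSE_SINGLE:
        if (dev->windows_accumulated >= dev->windows_total) {
            dev->state = MOCK_DONE;   /* last traceback finished */
        } else {
            dev->windows_accumulated++;
            dev->state = MOCK_PAUSED; /* halted at the next traceback */
        }
        break;
    case VCP2_CMD_UNPAUSE_FINISH:
        dev->windows_accumulated = dev->windows_total;
        dev->state = MOCK_DONE;
        break;
    default:
        break;
    }
}

/*
 * Steps through the decode, inspecting at each halt. After max_inspections
 * halts, the decode is run to completion. Returns the number of inspections.
 */
unsigned vcp2_debug_step_through(mock_vcp2_t *dev, unsigned max_inspections)
{
    unsigned inspections = 0;

    mock_vcp2_command(dev, VCP2_CMD_PAUSE);
    while (dev->state == MOCK_PAUSED) {
        inspections++; /* state metric and traceback memories would be read here */
        mock_vcp2_command(dev, (inspections >= max_inspections)
                                   ? VCP2_CMD_UNPAUSE_FINISH
                                   : VCP2_CMD_UNPAUSE_SINGLE);
    }
    return inspections;
}
```

On real hardware the loop condition would be replaced by polling the VCP2 status register until the state changes from running to paused.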
When the coprocessor detects an error, the coprocessor sets the status and error words, then sends an
interrupt to the CPU. Any coprocessor processing is paused and the DSP must reset or start the
coprocessor. An error occurs if the VCP2 receives an invalid value in the input configuration parameters. If
an error is detected, the VCPERR bit field is set accordingly, the ERR bit in VCPSTAT0 is set, the
VCP2_INT interrupt is generated, and no processing is engaged. The only way to restart the VCP2 is to
read VCPERR and send another START command. VCP2_INT has an interrupt selector value of 32. For
details on how to set up interrupts, see the TMS320C6000 DSP Interrupt Selector Reference Guide
(SPRU646).
The status registers are provided for debugging purposes and are best used when either the processor is
halted or the VCP2 is halted. If an error occurs, the VCP2 is halted and a VCP2_INT interrupt is generated
that can be mapped to a CPU interrupt. There may be cases where you would want to view the status
registers when the VCP2 is still running. One such case is when the VCP2 seems to have taken a long
time in processing the current frame. In such cases, a watchdog timer should be used and set according
to the frame length and VCP2 configuration, in addition to some overhead to allow for EDMA3 usage.