ST AN910 Application note

AN910

APPLICATION NOTE

ST7 AND ST9 PERFORMANCE BENCHMARKING

INTRODUCTION

STMicroelectronics has developed a set of test routines, representative of 8-bit and low-end 16-bit microcontroller applications, to evaluate the computing performance and interrupt processing performance of microcontroller cores. These routines have been implemented on the ST7 and ST9 Microcontroller Units (MCUs) as well as on several other MCUs available on the market.

The routines have been written in assembler language to optimize their implementation and focus on core performance, without being dependent upon compiler code transformation.

For each test, the two parameters of interest are execution time and code size. Timings have been measured whenever possible, and calculated theoretically only when there was no alternative. In most cases, the programs have actually been run and their execution times measured, so the assembly sources should be free of implementation errors and the results can be considered correct and reliable.

The results of this study show that the ST9+ is able to compete with 16-bit MCUs on 8-bit and low-end 16-bit applications, and confirm its position as a high-end 8/16-bit MCU. They also confirm the ST7 as an outstanding 8-bit MCU.

The first four sections provide summary information:

1. Overview of the Test Routines
2. Overview of the MCU cores
3. Benchmark results
4. Result analysis

More detailed information is provided in the appendices:

5. Description of MCU work environments
6. Complete numerical results
7. MCU Core architecture analysis
8. Description of the test routines
9. Measurement proceeding and calculation

Rev. 2.0

AN910/1104


1 OVERVIEW OF THE TEST ROUTINES

Eleven different test routines have been implemented in assembler language.

The first ten routines are designed to measure core computing performance. They are based on well-known algorithms and represent operations commonly used in 8-bit and low-end 16-bit applications. Like many applications, they mix bit, 8-bit and 16-bit operations.

This set of tests is described in Table 1.

Table 1. Test routine overview

| Abbreviated name | Full name | Description | Features stressed |
|---|---|---|---|
| sieve | Eratosthenes sieve | find prime numbers ≥ 3 out of 8189 elements | 16-bit data computation; bit manipulation |
| acker(m,n) (1) | Ackermann function | make recursive function calls; number of calls depending upon two parameters (m,n) | function calls; stack use |
| string | String search | search a 16-byte string in a 128-character array | 8-bit data block manipulation; string manipulation |
| char | Character search | search a byte in a 40-byte array | 8-bit data manipulation; char manipulation |
| bubble(n) (2) | Bubble sort | sort a one-dimensional array of n 16-bit integers | 16-bit data manipulation; integer manipulation |
| blkmov(n) (3) | Block move | move an n-byte block from one place in memory to another | 8-bit data block manipulation; block move |
| convert | Block translation | translate a 121-byte block into a different format | 8-bit data manipulation; use of a lookup table |
| 16mul | 16-bit integer multiplication | multiplication of two unsigned words giving a 32-bit result | 16-bit data computation; integer manipulation |
| shright | 16-bit value right shift | shift a 16-bit value five places to the right | 16-bit data manipulation; bit manipulation |
| bitsrt | Bit manipulation | set, reset, and test of 3 bits in a 128-bit array | bit computation; bit and 8-bit data manipulation |

(1) The value pairs used are (m,n) = (3,5) and (m,n) = (3,6).
(2) The values used are n = 10 (words) and n = 600 (words).
(3) The values used are n = 64 (bytes) and n = 512 (bytes).
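The sketch below shows, in C rather than the assembler actually used, what two of these routines compute. It is a hypothetical reference model for checking results, not the benchmark source: it assumes the classic odd-only sieve layout (element i stands for the odd number 2i + 3, stored one bit per element) and adds an explicit call counter to the Ackermann function. The real routines are described in section 8.

```c
#include <stdint.h>
#include <string.h>

/* sieve: odd-only Eratosthenes sieve. Element i stands for the odd number
 * 2*i + 3, so n elements cover 3 .. 2*n + 1. Each element is one bit of
 * `flags` (1 = still a prime candidate); `flags` must hold (n + 7) / 8 bytes.
 * Returns the number of primes found. */
uint16_t sieve(uint8_t *flags, uint16_t n)
{
    uint16_t count = 0;
    memset(flags, 0xFF, (size_t)((n + 7u) / 8u));
    for (uint16_t i = 0; i < n; i++) {
        if (flags[i >> 3] & (1u << (i & 7u))) {
            uint16_t prime = (uint16_t)(i + i + 3u);
            /* index of 3*prime is i + prime; each further odd multiple
             * lies another `prime` elements away */
            for (uint32_t k = (uint32_t)i + prime; k < n; k += prime)
                flags[k >> 3] &= (uint8_t)~(1u << (k & 7u));
            count++;
        }
    }
    return count;
}

/* acker: Ackermann function. *calls is incremented on every invocation,
 * since the number of calls (and the stack depth) is what the test stresses. */
uint16_t acker(uint16_t m, uint16_t n, uint32_t *calls)
{
    (*calls)++;
    if (m == 0)
        return (uint16_t)(n + 1u);
    if (n == 0)
        return acker((uint16_t)(m - 1u), 1u, calls);
    return acker((uint16_t)(m - 1u), acker(m, (uint16_t)(n - 1u), calls), calls);
}
```

With n = 8189 the sieve covers the odd numbers 3 to 16379. Since A(3,n) = 2^(n+3) − 3, acker(3,5) and acker(3,6) return 253 and 509; the call counter grows much faster than the result, which is why this test stresses function calls and stack use.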

Another test routine, handling a timer interrupt, has been used to measure core interrupt processing performance:

| Abbreviated name | Full name | Description | Features stressed |
|---|---|---|---|
| interrupt | Timer interrupt | standard timer input capture and/or output compare interrupt service routine | interrupt processing |
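As an illustration of what a timer input capture/output compare service routine typically does, the following C sketch measures the period between two captures and reprograms the next output compare. The register structure, the flag names TIMER_ICF and TIMER_OCF, and the reprogramming policy are hypothetical; the routine actually benchmarked is described in section 8, and on a real MCU the fields would be memory-mapped timer registers.

```c
#include <stdint.h>

/* Hypothetical timer register block. On a real MCU these are memory-mapped
 * registers; plain fields are used here so the sketch can run on a host. */
typedef struct {
    uint16_t input_capture;  /* counter value latched on the input edge */
    uint16_t output_compare; /* counter value at which the next compare fires */
    uint8_t  status;         /* pending event flags */
} timer_regs;

#define TIMER_ICF 0x01u /* hypothetical input capture flag */
#define TIMER_OCF 0x02u /* hypothetical output compare flag */

uint16_t isr_last_capture; /* previous input capture value */
uint16_t isr_period;       /* last measured period, in timer ticks */

/* Service both timer events, then acknowledge each one by clearing its flag. */
void timer_isr(timer_regs *t, uint16_t oc_interval)
{
    if (t->status & TIMER_ICF) {
        /* unsigned 16-bit subtraction absorbs counter wrap-around */
        isr_period = (uint16_t)(t->input_capture - isr_last_capture);
        isr_last_capture = t->input_capture;
        t->status &= (uint8_t)~TIMER_ICF;
    }
    if (t->status & TIMER_OCF) {
        /* schedule the next compare oc_interval ticks later */
        t->output_compare = (uint16_t)(t->output_compare + oc_interval);
        t->status &= (uint8_t)~TIMER_OCF;
    }
}
```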

A more precise description of the test routines is available in section 8.
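For reference, the following C sketch models three of the arithmetic and bit routines the way an 8-bit core has to execute them: 16mul built from four 8x8 partial products, shright as five single-bit shift/rotate passes, and bitsrt as byte and mask operations on a 16-byte array. The function names are hypothetical, and the decomposition is an assumption about a typical 8-bit implementation, not the benchmark source.

```c
#include <stdint.h>

/* 8x8 -> 16-bit unsigned multiply, standing in for the MUL instruction of
 * an 8-bit core (the "8x8 multiplication" timed in Tables 3 and 4). */
static uint16_t mul8x8(uint8_t a, uint8_t b)
{
    return (uint16_t)((unsigned)a * b);
}

/* 16mul: 16x16 -> 32-bit unsigned product from four 8x8 partial products. */
uint32_t mul16(uint16_t x, uint16_t y)
{
    uint8_t xl = (uint8_t)x, xh = (uint8_t)(x >> 8);
    uint8_t yl = (uint8_t)y, yh = (uint8_t)(y >> 8);
    uint32_t r = mul8x8(xl, yl);
    r += (uint32_t)mul8x8(xh, yl) << 8;
    r += (uint32_t)mul8x8(xl, yh) << 8;
    r += (uint32_t)mul8x8(xh, yh) << 16;
    return r;
}

/* shright: shift a 16-bit value five places right using 8-bit operations
 * only; each pass shifts the high byte and rotates its carry into the low byte. */
uint16_t shright5(uint16_t v)
{
    uint8_t hi = (uint8_t)(v >> 8), lo = (uint8_t)v;
    for (int pass = 0; pass < 5; pass++) {
        uint8_t carry = (uint8_t)(hi & 1u);
        hi = (uint8_t)(hi >> 1);
        lo = (uint8_t)((lo >> 1) | (carry << 7));
    }
    return (uint16_t)((hi << 8) | lo);
}

/* bitsrt: set, reset and test bit n (0..127) of a 128-bit array of 16 bytes. */
void bit_set(uint8_t a[16], uint8_t n)   { a[n >> 3] |= (uint8_t)(1u << (n & 7u)); }
void bit_reset(uint8_t a[16], uint8_t n) { a[n >> 3] &= (uint8_t)~(1u << (n & 7u)); }
int  bit_test(const uint8_t a[16], uint8_t n) { return (a[n >> 3] >> (n & 7u)) & 1; }
```

A core with a 16-bit multiplier or 16-bit shift instructions replaces each of the first two functions with a single instruction, which is one reason the 16-bit MCUs do well on 16mul and shright.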


2 OVERVIEW OF THE MCU CORES

The set of MCUs evaluated is composed of various 8-bit, 8/16-bit, and 16-bit microcontrollers with accumulator, register file or mixed architectures.

Table 2 is an overview of the MCU cores.

Table 2. MCU cores overview

| MCU name | Architecture | Short core description | Freq (1) |
|---|---|---|---|
| 80C51XA (PHILIPS) | 16-bit; register file | eXtended Architecture (XA) of the 80C51, upward compatible; 8/16-bit register bus, 16-bit data/program memory buses; register file programming model with sixteen 16-bit banked registers | 20 MHz |
| 68HC16 (MOTOROLA) | 16-bit; two accumulators | core architecture superset of the 68HC11, upward compatible; accumulator programming model with two 16-bit accumulators and three 16-bit index registers (all with 4-bit extensions) | 16 MHz |
| 68HC12 (MOTOROLA) | 16-bit; two accumulators | instruction set is a superset of the 68HC11's, upward compatible; programming model identical to the 68HC11's | 8 MHz |
| ST9+ (STMicroelectronics) | 8/16-bit; register file | evolution of the ST9; enhanced clock speed and instruction cycle time; enlarged memory space | 25 MHz |
| ST9 (STMicroelectronics) | 8/16-bit; register file | 8/16-bit architecture; 8-bit register bus, 16-bit memory bus; register file programming model with 14 groups of sixteen 8-bit registers, usable as 16-bit registers; modular paged registers for access to peripheral registers | 12 MHz |
| H8/300 (HITACHI) | 8/16-bit; register file | RISC-like architecture and instruction set; register file programming model with sixteen 8-bit registers | 10 MHz |
| 68HC11 (MOTOROLA) | 8-bit; two accumulators | market standard 8-bit MCU; accumulator programming model with two 8-bit accumulators or one 16-bit accumulator, and two 16-bit index registers | 4 MHz |
| 68HC08 (MOTOROLA) | 8-bit; accumulator | superset of the 68HC05, upward compatible; enhanced performance and instruction set; accumulator programming model with one 8-bit accumulator and one 16-bit index register | 8 MHz |
| ST7 (STMicroelectronics) | 8-bit; accumulator | upward compatible with the 68HC05; accumulator programming model with one 8-bit accumulator and two 8-bit index registers | 4 MHz / 8 MHz |
| 80C51 (INTEL, PHILIPS...) | 8-bit; register file and accumulator | mixed accumulator and register file programming model with four banks of eight 8-bit registers (including the accumulator) and a 16-bit data pointer | 20 MHz |
| KS88 (SAMSUNG) | 8-bit; register file | core architecture superset of the SUPER8; 8-bit register bus; register file programming model with 192 8-bit prime data registers and two register sets with system/peripheral/data registers | 8 MHz |
| 78K0 (NEC) | 8-bit; register file and accumulator | mixed accumulator and register file programming model with four banks of eight 8-bit or four 16-bit registers (including the accumulator) | 10 MHz |

(1) As the goal is to get the best out of each MCU core, the maximum internal frequency (Freq) available for each MCU on its development board has been used (unless otherwise specified). Note that performance results are directly proportional to this frequency (execution times are inversely proportional to it).
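Assuming a core-bound routine whose clock-cycle count N_cycles does not change with frequency (no wait states, no frequency-dependent peripherals), the proportionality in the note above can be written as follows. The 1000-cycle routine is a hypothetical figure chosen only to show the arithmetic for the two ST7 frequencies listed in Table 2.

```latex
t = \frac{N_{\mathrm{cycles}}}{f}
\qquad\Longrightarrow\qquad
t_2 = t_1\,\frac{f_1}{f_2}

% Hypothetical routine, N_cycles = 1000, on the ST7 at 4 MHz and 8 MHz:
t_{4\,\mathrm{MHz}} = \frac{1000}{4\times10^{6}\,\mathrm{Hz}} = 250\ \mu\mathrm{s},
\qquad
t_{8\,\mathrm{MHz}} = \frac{1000}{8\times10^{6}\,\mathrm{Hz}} = 125\ \mu\mathrm{s}
```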

A description of the MCU work environments is available in section 5.


3 BENCHMARK RESULTS

3.1 CORE COMPUTING PERFORMANCE

The following two charts show benchmark results for computing performance. Execution time and code size are presented as global ratios taking the ST9+ as reference.

Preliminary ratios have been calculated for each test. From these, a global execution time ratio and a global code size ratio have been calculated as the average of all the ratios. As not all tests could be implemented on all MCUs (see paragraph 9.2.2, "Memory considerations"), one or two results are presented for each MCU. The first, available for all MCUs, has been calculated with the reduced set of tests performed on all MCUs. The second, available only for some MCUs, has been calculated with the full set of tests.
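The averaging described above can be sketched as follows. The sketch assumes that each per-test ratio is the value measured on the evaluated MCU divided by the ST9+ value, and that the global ratio is their arithmetic mean; the exact procedure is given in section 9. The struct, the function name and any sample numbers are hypothetical.

```c
#include <stddef.h>

/* Hypothetical per-test record: execution time (or code size) measured on
 * the evaluated MCU and on the reference MCU (the ST9+ in this note). */
typedef struct {
    const char *name;   /* test name, e.g. "string" */
    double mcu;         /* value measured on the evaluated MCU */
    double ref;         /* value measured on the reference MCU */
    int in_reduced_set; /* 1 if the test belongs to the reduced set */
} test_result;

/* Global ratio: arithmetic mean of the per-test ratios mcu / ref.
 * When reduced_only is non-zero, only tests of the reduced set are used.
 * Returns -1.0 when no test qualifies. */
double global_ratio(const test_result *r, size_t n, int reduced_only)
{
    double sum = 0.0;
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (reduced_only && !r[i].in_reduced_set)
            continue;
        sum += r[i].mcu / r[i].ref;
        count++;
    }
    return count ? sum / (double)count : -1.0;
}
```

Calling global_ratio twice on the same results, once with reduced_only set and once without, yields the two figures presented for each MCU.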

Refer to section 6 for complete results, and to section 9 for a description of the measurement procedure and calculations.

Figure 1 presents execution time ratios and Figure 2 shows code size ratios.

Notes: The reduced set of tests is composed of:

string, char, bubble(10 words), blkmov(64 bytes), convert, 16mul, shright, bitrst

The full set of tests is composed of:

string, char, bubble(10 words), blkmov(64 bytes), convert, 16mul, shright, bitrst, sieve, acker(3,5), acker(3,6), bubble(600 words), blkmov(512 bytes)

The 80C51 results are preliminary and may change in later versions.


Figure 1. Computing performance global execution time ratios (ST9+ as reference)

[Chart not reproduced: MCUs grouped into 16-bit, 8/16-bit and 8-bit MCUs; the direction of best performance is marked.]


Figure 2. Computing performance global code size ratios (ST9+ as reference)

[Chart not reproduced: MCUs grouped into 16-bit, 8/16-bit and 8-bit MCUs; the direction of best density is marked.]


3.2 CORE INTERRUPT PROCESSING PERFORMANCE

The following three charts show benchmark results for interrupt processing performance. Execution time results are presented as time values (in microseconds) and as ratios taking the ST9+ as reference. Code size results are presented as ratios taking the ST9+ as reference.

Refer to section 6 for complete results and details on calculation.

Figure 3 presents execution time results in microseconds, showing interrupt latency and return time.

Figure 4 presents execution time ratios, and Figure 5 presents code size ratios.


Figure 3. Interrupt processing performance execution time values

[Chart not reproduced: MCUs grouped into 16-bit, 8/16-bit and 8-bit MCUs; the direction of best performance is marked.]


Figure 4. Interrupt processing performance execution time ratios (ST9+ as reference)

[Chart not reproduced: MCUs grouped into 16-bit, 8/16-bit and 8-bit MCUs; the direction of best performance is marked.]


Figure 5. Interrupt processing performance code size ratios (ST9+ as reference)

[Chart not reproduced: MCUs grouped into 16-bit, 8/16-bit and 8-bit MCUs; the direction of best density is marked.]


4 RESULT ANALYSIS

This section is an analysis of computing performance and interrupt processing performance results (for execution time and code size). Based on core architecture analysis (see section 7), two comparisons are presented, pointing out the strong and weak points of each MCU. The first concerns the high-end to medium-end MCUs versus ST9+. The second concerns the medium-end to low-end MCUs versus ST7.

4.1 PRELIMINARY REMARK

The results show that the execution time and code size ratios calculated with the full and reduced sets of tests differ only slightly, and in most cases the ranking of the MCUs is unchanged. The reduced set can therefore be considered sufficient for comparing the MCU cores.

4.2 HIGH-END TO MEDIUM-END MCU ANALYSIS VERSUS ST9+

Table 3 presents the strong and weak points of the high-end to medium-end MCUs, compared with the ST9+ MCU.

Notes: ICT means Instruction Cycle Time and IL means Instruction Length.

Refer to paragraph 7.2.2, "Average ICT/CPI and IL", for details on the calculation.

Refer to paragraph 7.3.4, "ST9+ MCU core", for the main characteristics of the ST9+ MCU core.
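Assuming that ICT is simply the average number of clocks per instruction (CPI) multiplied by the clock period (paragraph 7.2.2 gives the exact method), the ICT ranges quoted in Tables 3 and 4 can be turned back into CPI figures. For the two cores listed at 20 MHz:

```latex
\mathrm{ICT} = \frac{\mathrm{CPI}}{f}
\quad\Longleftrightarrow\quad
\mathrm{CPI} = \mathrm{ICT}\times f

% 80C51XA: f = 20 MHz, ICT = 250 to 300 ns
\mathrm{CPI} = (250\ \text{to}\ 300)\times10^{-9}\,\mathrm{s}\times 20\times10^{6}\,\mathrm{Hz} = 5\ \text{to}\ 6

% 80C51: f = 20 MHz, ICT = 900 to 1000 ns
\mathrm{CPI} = (900\ \text{to}\ 1000)\times10^{-9}\,\mathrm{s}\times 20\times10^{6}\,\mathrm{Hz} = 18\ \text{to}\ 20
```

Since both cores run at the same frequency, under this assumption the factor of roughly 3.3 to 3.6 between their ICTs comes entirely from the number of clocks per instruction.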

4.2.1 Computing performance results

Regarding speed, the ST9+ MCU ranks at the top of the 8/16-bit MCUs. This new version of the ST9 has been improved on several points, including clocks per instruction and clock speed, and these enhancements have considerably reduced its instruction cycle time. A large and powerful register file organized in groups allows the ST9+ to perform intensive computation (with many registers), access peripheral and I/O port registers easily (with paged registers), and manage multitasking (with register group pointers). Addressing modes such as register pair, register indirect with pre/post-increment, and indexed give the ST9+ the ability to perform 16-bit data computation and manipulation, manipulate tables easily and move blocks. A new memory management unit enlarges the memory space up to 4 Mbytes, and new instructions have been added to handle this space and improve C-language support.
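The addressing modes mentioned above map naturally onto pointer post-increment in C. The following sketch of the blkmov and convert routines is a hypothetical C equivalent, not ST9+ code: each `*p++` corresponds to one register-indirect access with post-increment, and the 256-entry translation table used by convert is an assumption (the format actually used by the benchmark is described in section 8).

```c
#include <stddef.h>
#include <stdint.h>

/* blkmov: copy n bytes using post-incremented source and destination
 * pointers. Overlapping blocks are not handled. */
void blkmov(uint8_t *dst, const uint8_t *src, size_t n)
{
    while (n--)
        *dst++ = *src++;
}

/* convert: translate each byte of a block through a 256-entry lookup table. */
void convert(uint8_t *dst, const uint8_t *src, size_t n, const uint8_t table[256])
{
    while (n--)
        *dst++ = table[*src++];
}
```

On a core with this addressing mode, each loop body reduces to a load and a store with automatic pointer update, plus a decrement-and-branch, which is what makes block moves and table manipulation efficient.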


Concerning code efficiency, the ST9+ MCU is also among the best MCUs. The 16-bit MCUs are only slightly better, even though they benefit from true 16-bit computing and data manipulation instructions. Among the 8/16-bit MCUs, the H8/300 has a slight advantage due to its special block move instruction. All the 8-bit MCUs, despite their shorter instruction lengths, produce larger code.

4.2.2 Interrupt processing performance results

Regarding speed, the ST9+ MCU ranks first. The value chart shows that it has the shortest interrupt latency and an interrupt routine execution time among the best. These results show that its interrupt management and instruction cycle time have been considerably enhanced. In addition, the register groups provide fast context switching.

Some 8-bit MCUs, such as the 68HC08, perform quite well in this test. However, their results should be qualified: such MCUs can manage only one interrupt at a time and therefore skip a complex arbitration phase. The interrupt management of the ST9+ is one of the most advanced, allowing nested interrupts with fully software-programmable priorities and program priority level control.
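To make this arbitration phase concrete, the following C sketch models nested interrupt arbitration with software-programmable priorities and a program priority level. It is a generic model under stated assumptions (four sources, a larger number meaning a higher priority, ties resolved in favor of the lowest source number), not the ST9+ interrupt controller, whose priority encoding may differ. A core that handles one interrupt at a time with hardware-fixed priorities skips the comparison against current_level altogether.

```c
#include <stdint.h>

#define NUM_SOURCES 4 /* hypothetical number of interrupt sources */

typedef struct {
    uint8_t priority[NUM_SOURCES]; /* software-programmable level per source */
    uint8_t pending;               /* bit i set = source i requests service */
    uint8_t current_level;         /* priority level of the code now running */
} irq_ctrl;

/* Returns the source to service, or -1 if none may interrupt the current code.
 * The highest-priority pending source wins (lowest index on a tie), and it
 * nests only if its level is strictly above the current program priority level. */
int irq_arbitrate(const irq_ctrl *c)
{
    int best = -1;
    for (int i = 0; i < NUM_SOURCES; i++) {
        if ((c->pending & (1u << i)) &&
            (best < 0 || c->priority[i] > c->priority[best]))
            best = i;
    }
    if (best >= 0 && c->priority[best] > c->current_level)
        return best;
    return -1;
}
```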

Code efficiency results for interrupt processing performance are not significant: the measured code represents only a very small part of a complete interrupt service routine, so no conclusion can be drawn.

4.2.3 Conclusion

The global results and the core's characteristics allow the ST9+ to compete with true 16-bit MCUs on 8-bit and low-end 16-bit applications, and confirm its position as a high-end 8/16-bit MCU.


Table 3. High-end to low-end MCU strong and weak points

80C51XA (20 MHz)
Strong points:
- instruction processing: 7-byte prefetch queue, predecoding
- fast 8/16-bit ALU: 16-bit datapath, 600 ns 8x8 multiplication
- short average ICT: 250 to 300 ns
- special addr. modes: indirect with 8/16-bit offset or auto-increment
- special instructions: compare & branch like, decrement & branch like, memory-to-memory moves
- multitasking: context switching capabilities
- large memory space: up to 16 Mbytes
- interrupt processing: nested mode, 4-bit program priority register, programmable priority levels
Weak points:
- address alignment: even jump/branch addresses, even word operand addresses
- NOP instructions in assembly code
- lacking addr. modes: no indexed addressing

68HC16 (16 MHz)
Strong points:
- instruction processing: 3-stage prefetch queue, predecoding
- fast 8/16/32-bit ALU: 16-bit datapath, 625 ns 8x8 multiplication
- short average ICT: 375 to 440 ns
- special addr. modes: post-modified indexed with 8-bit offset
- special instructions: memory-to-memory moves
- multitasking: context switching capabilities
- large memory space: up to 1 Mbyte; up to 16 Mbytes with memory expansion module
- interrupt processing: nested mode, 3-bit program priority register, programmable priority levels
Weak points:
- address alignment: performance penalty if odd word operand addresses
- instruction lengths: only even
- lacking addr. modes: no direct addressing
- lacking instructions: index register manipulation, compare & branch like, decrement & branch like

68HC12 (8 MHz)
Strong points:
- instruction processing: 2-stage prefetch queue, predecoding
- fast 8/16-bit ALU: 20-bit datapath, 375 ns 8x8 multiplication
- short average ICT: 375 to 500 ns
- special addr. modes: auto-incr/decrement indexed, accumulator offset indexed
- special instructions: memory-to-memory moves, incr/decrement & branch like, test & branch like
- large memory space: up to 4 Mbytes with memory expansion module
Weak points:
- multitasking: need memory expansion module
- interrupt processing: one interrupt at a time recommended, no program priority register, hardware fixed priorities

H8/300 (10 MHz)
Strong points:
- instruction encoding: RISC-like encoding
- short average IL: 2 to 3 bytes
- special addr. modes: register indirect, 16-bit offset or pre/post-increment
- special instructions: block moves, compare & branch like, decrement & branch like
Weak points:
- instruction processing: standard (no prefetch)
- medium 8/16-bit ALU: 1400 ns 8x8 multiplication
- medium average ICT: 500 to 600 ns
- lacking instructions: 16-bit shifts/rotations
- multitasking: no special capabilities
- memory space: 64 kbytes
- interrupt processing: one interrupt at a time recommended, no program priority register, hardware fixed priorities

68HC11 (4 MHz)
Strong points: none listed
Weak points:
- instruction processing: standard (no prefetch)
- medium 8/16-bit ALU: 2500 ns 8x8 multiplication
- long average ICT: 1500 to 1750 ns
- lacking instructions: compare & branch like, decrement & branch like
- multitasking: no special capabilities
- memory space: 64 kbytes
- interrupt processing: one interrupt at a time recommended, no program priority register, hardware fixed priorities

68HC08 (8 MHz)
Strong points:
- instruction processing: 1-byte prefetch queue
- fast 8-bit ALU: 8-bit datapath, 625 ns 8x8 multiplication
- special addr. modes: indexed with 8-bit offset or post-increment
- special instructions: memory-to-memory moves, compare & branch like, decrement & branch like
- large memory space: up to 4 Mbytes with memory expansion module
Weak points:
- medium average ICT: 500 to 625 ns
- lacking addr. modes: no indirect addressing
- multitasking: no special capabilities
- interrupt processing: one interrupt at a time recommended, no program priority register, hardware fixed priorities


4.3 MEDIUM-END TO LOW-END MCU ANALYSIS VERSUS ST7

Table 4 presents the strong and weak points of the medium-end to low-end MCUs, compared with the ST7 MCU.

Notes: ICT means Instruction Cycle Time and IL means Instruction Length.

Refer to paragraph 7.2.2, "Average ICT/CPI and IL", for details on the calculation.

Refer to paragraph 7.3.9, "ST7 MCU core", for the main characteristics of the ST7 MCU core.

4.3.1 Computing performance results

Regarding speed, the ST7 MCU takes second position, just behind the newly introduced 68HC08. Even without a prefetch mechanism, it comes ahead of all the other MCUs in this group. Its short instruction cycle time, and hence its favorable position, results from a low number of clocks per instruction combined with a standard frequency. The two index registers and the indirect addressing mode allow the ST7 to perform data manipulation such as table manipulation and block moves easily. A direct addressing mode in a 256-byte zero page gives fast access to important data and peripheral registers.
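The benefit of index registers for the char and string routines can be sketched in C: each loop index below is what a core such as the ST7 could keep in an index register, and the string search needs two such indices at once. These functions are a hypothetical reference model, not the benchmark code; sizes are limited to 255 bytes to match 8-bit indexing.

```c
#include <stdint.h>

/* char: index of the first occurrence of c in a[0..n-1], or -1 if absent.
 * The loop index x plays the role of a single 8-bit index register. */
int char_search(const uint8_t *a, uint8_t n, uint8_t c)
{
    for (uint8_t x = 0; x < n; x++)
        if (a[x] == c)
            return x;
    return -1;
}

/* string: offset of the first occurrence of pattern p (length m) in text t
 * (length n), or -1 if absent. x walks the text and y walks the pattern,
 * so two index registers can hold both positions. */
int string_search(const uint8_t *t, uint8_t n, const uint8_t *p, uint8_t m)
{
    for (uint8_t x = 0; (uint16_t)x + m <= n; x++) {
        uint8_t y = 0;
        while (y < m && t[x + y] == p[y])
            y++;
        if (y == m)
            return x;
    }
    return -1;
}
```

With only one index register, the inner loop of string_search would have to save and reload one of the two positions on every comparison, which is the extra cost the second index register avoids.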

Concerning code efficiency, the ST7 MCU ranks in the middle of the 8-bit MCUs, just above the 68HC08. Its standard instruction length explains this average position.

4.3.2 Interrupt processing performance results

Regarding speed, the ST7 MCU ranks very close to the 68HC08; its longer instruction cycle time explains this small gap. The strong point of its interrupt management is the automatic stacking of the CPU state, accumulator and index register. This eliminates software stacking, and so saves both time and code space.

Code efficiency results for interrupt processing performance are not significant: the measured code represents only a very small part of a complete interrupt service routine, so no conclusion can be drawn.

4.3.3 Conclusion

The global results and the core's characteristics confirm the ST7 as an outstanding 8-bit MCU.


Table 4. Medium-end to low-end MCU strong and weak points

68HC11 (4 MHz)
Strong points: none listed
Weak points:
- medium 8/16-bit ALU: 2500 ns 8x8 multiplication
- long average ICT: 1500 to 1750 ns
- lacking instructions: compare & branch like, decrement & branch like
- multitasking: no special capabilities

68HC08 (8 MHz)
Strong points:
- instruction processing: 1-byte prefetch queue
- fast 8-bit ALU: 8-bit datapath, 625 ns 8x8 multiplication
- short average ICT: 500 to 625 ns
- special addr. modes: indexed with 8-bit offset or post-increment
- special instructions: compare & branch like, decrement & branch like, memory-to-memory moves
- large memory space: up to 4 Mbytes with memory expansion module
Weak points:
- lacking addr. modes: no indirect addressing
- multitasking: no special capabilities

80C51 (20 MHz)
Strong points:
- short average IL: 1 to 2 bytes
- special addr. modes: register indirect, stack pointer relative
- special instructions: compare & branch like, decrement & branch like, bit test & bit clear & jump, memory-to-memory moves
- multitasking: context switching capabilities
Weak points:
- slow 8-bit ALU: 2400 ns 8x8 multiplication
- long average ICT: 900 to 1000 ns

KS88 (8 MHz)
Strong points:
- special addr. modes: register pair, indirect register/address, indexed (short/long)
- special instructions: compare & increment & branch like, decrement & branch like
- multitasking: context switching capabilities
- interrupt processing: nested mode, level priority control register
Weak points:
- slow 8-bit ALU: 3000 ns 8x8 multiplication
- long average ICT: 1250 to 1500 ns
- data memory location: off-chip only

78K0 (10 MHz)
Strong points:
- special addr. modes: register indirect, stack pointer relative, indexed with 8-bit offset
- special instructions: decrement & branch like
- multitasking: context switching capabilities
Weak points:
- mixed architecture: only accumulator oriented
- slow 8-bit ALU: 3200 ns 8x8 multiplication
- long average ICT: 1400 to 1600 ns
