STMicroelectronics has developed a set of test routines, representative of 8-bit and low-end
16-bit microcontroller applications, to evaluate the computing performance and interrupt
processing performance of microcontroller cores. These routines have been implemented on the
ST7 and ST9 Microcontroller Units (MCUs) as well as on several other MCUs available on the market.
The routines are written in assembly language to optimize their implementation and to
focus on core performance, independently of compiler code transformations.
For each test, the two parameters of interest are execution time and code size. Timings were
measured whenever possible and calculated theoretically only when measurement was not
possible. In most cases the programs were actually run and their execution times measured,
so the assembly sources are unlikely to contain implementation errors and the results can
be considered reliable.
The results of this study show that the ST9+ can compete with 16-bit MCUs in 8-bit and
low-end 16-bit applications, and confirm its position as a high-end 8/16-bit MCU. They
also confirm the ST7 as an outstanding 8-bit MCU.
The first four sections provide summary information:
1. Overview of the test routines
2. Overview of the MCU cores
3. Benchmark results
4. Result analysis
More detailed information is provided in the appendixes:
5. Description of MCU work environments
6. Complete numerical results
7. MCU core architecture analysis
8. Description of the test routines
9. Measurement procedure and calculation
Rev. 2.0
AN910/1104
ST7 AND ST9 PERFORMANCE BENCHMARKING
1 OVERVIEW OF THE TEST ROUTINES
Eleven different test routines have been implemented in assembly language.
The first ten routines are aimed at measuring core computing performance. They are
based on well-known algorithms and represent operations commonly used in 8-bit and low-end
16-bit applications. They mix bit, 8-bit and 16-bit operations, as many applications do.
A more detailed description of the test routines is available in section 8.
2 OVERVIEW OF THE MCU CORES
The set of MCUs evaluated is composed of various 8-bit, 8/16-bit, and 16-bit
microcontrollers with accumulator, register file or mixed architectures.
Table 2 is an overview of the MCU cores.
Table 2. MCU cores overview
MCU name (vendor) | Architecture | Short core description | Freq 1)
80C51XA (PHILIPS) | 16-bit; register file | eXtended Architecture (XA) of the 80C51, upward compatible; 8/16-bit register bus, 16-bit data/program memory buses; register file programming model with sixteen 16-bit banked registers | 20 MHz
68HC16 (MOTOROLA) | 16-bit; two accumulators | core architecture superset of the 68HC11, upward compatible; accumulator programming model with two 16-bit accumulators and three 16-bit index registers (all with 4-bit extensions) | 16 MHz
68HC12 (MOTOROLA) | 16-bit; two accumulators | instruction set is a superset of the 68HC11, upward compatible; programming model identical to the 68HC11 | 8 MHz
ST9+ (STMicroelectronics) | 8/16-bit; register file | evolution of the ST9; enhanced clock speed, instruction cycle time; enlarged memory space | 25 MHz
ST9 (STMicroelectronics) | 8/16-bit; register file | 8/16-bit architecture; 8-bit register bus, 16-bit memory bus; register file programming model with 14 groups of sixteen 8-bit registers, usable as 16-bit registers; modular paged registers for access to peripheral registers | 12 MHz
H8/300 (HITACHI) | 8/16-bit; register file | RISC-like architecture and instruction set; register file programming model with sixteen 8-bit registers | 10 MHz
68HC11 (MOTOROLA) | 8-bit; two accumulators | market standard 8-bit MCU; accumulator programming model with two 8-bit accumulators or one 16-bit accumulator, and two 16-bit index registers | 4 MHz
68HC08 (MOTOROLA) | 8-bit; accumulator | superset of the 68HC05, upward compatible; enhanced performance and instruction set; accumulator programming model with one 8-bit accumulator and one 16-bit index register | 8 MHz
ST7 (STMicroelectronics) | 8-bit; accumulator | upward compatible with the 68HC05; accumulator programming model with one 8-bit accumulator and two 8-bit index registers | 4 MHz, 8 MHz
80C51 (INTEL, PHILIPS...) | 8-bit; register file and accumulator | mixed accumulator and register file programming model with four banks of eight 8-bit registers (including the accumulator), and a 16-bit data pointer | 20 MHz
KS88 (SAMSUNG) | 8-bit; register file | core architecture superset of the SUPER8; 8-bit register bus; register file programming model with 192 8-bit prime data registers, and two register sets with system/peripheral/data registers | 8 MHz
78K0 (NEC) | 8-bit; register file and accumulator | mixed accumulator and register file programming model with four banks of eight 8-bit or four 16-bit registers (including the accumulator) | 10 MHz
1) As the goal is to obtain the best of each MCU core, the maximum internal frequency (Freq) available on the
development board has been used for each MCU (unless otherwise specified). Note that results are directly proportional to this frequency.
A description of the MCU work environments is available in section 5.
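Because footnote 1 states that results scale with the internal frequency, a measured time can be roughly rescaled to another clock. The sketch below is only an illustration: it assumes that a routine takes the same number of clock cycles at both frequencies (no wait states), and the function name and sample values are hypothetical, not taken from the study.

```python
def rescale_time_us(measured_us: float, measured_mhz: float, target_mhz: float) -> float:
    """Rescale an execution time to another internal frequency.

    Assumes the routine takes the same number of clock cycles at both
    frequencies, so execution time is inversely proportional to frequency.
    """
    if measured_mhz <= 0 or target_mhz <= 0:
        raise ValueError("frequencies must be positive")
    return measured_us * measured_mhz / target_mhz


# Hypothetical example: a routine measured at 100 us with a 4 MHz internal clock
print(rescale_time_us(100.0, 4.0, 8.0))  # 50.0
```

Such rescaling is only a first-order estimate; memory access times and prefetch behaviour may not scale with the core clock.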
3 BENCHMARK RESULTS
3.1 CORE COMPUTING PERFORMANCE
The following two charts show benchmark results for computing performance. Execution time
and code size are presented as global ratios, taking the ST9+ as reference.
Preliminary ratios have been calculated for each test. From those results, a global execution
time ratio and a global code size ratio have been calculated as the average of all ratios. As
not all the tests could be implemented on all MCUs (see 9.2.2 Memory considerations), one or
two different results are presented for each MCU. The first one, available for all the MCUs,
has been calculated with the reduced set of tests performed on all the MCUs. The second one,
only available for some MCUs, has been calculated with the full set of tests.
Refer to section 6 for complete results. Refer to section 9 for the measurement procedure and
calculation description.
Figure 1 presents execution time ratios and Figure 2 shows code size ratios.
The 80C51 results are preliminary and may change in later versions.
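The ratio calculation described above can be sketched as follows. This is a minimal illustration, not the procedure of section 9: it assumes that each ratio is the MCU result divided by the reference result and that the "average" is an arithmetic mean. All names and numbers are hypothetical.

```python
from statistics import mean


def per_test_ratios(mcu, reference):
    """Ratio of each result to the reference result, for tests both have."""
    return {t: mcu[t] / reference[t] for t in mcu if t in reference}


def global_ratio(mcu, reference, tests):
    """Arithmetic mean of the per-test ratios over the given set of tests."""
    ratios = per_test_ratios(mcu, reference)
    return mean(ratios[t] for t in tests)


# Hypothetical execution times in microseconds (not taken from the study).
results = {
    "REF":   {"test_a": 10.0, "test_b": 20.0, "test_c": 40.0},
    "MCU_X": {"test_a": 20.0, "test_b": 30.0, "test_c": 100.0},
    "MCU_Y": {"test_a": 15.0, "test_b": 50.0},  # test_c not implemented
}

# Reduced set: tests implemented on every MCU. Full set: all reference tests.
reduced = sorted(set.intersection(*(set(r) for r in results.values())))
full = sorted(results["REF"])

reduced_x = global_ratio(results["MCU_X"], results["REF"], reduced)
full_x = global_ratio(results["MCU_X"], results["REF"], full)
reduced_y = global_ratio(results["MCU_Y"], results["REF"], reduced)

print(reduced, reduced_x, full_x, reduced_y)
```

Here MCU_Y can only be given a reduced-set ratio, while MCU_X gets both, which mirrors the "one or two different results" per MCU.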
Figure 1. Computing performance global execution time ratios (ST9+ as reference)
(chart groups: 8-bit MCUs, 16-bit MCUs, 8/16-bit MCUs; marker: best performance)
Figure 2. Computing performance global code size ratios (ST9+ as reference)
(chart groups: 8-bit MCUs, 16-bit MCUs, 8/16-bit MCUs; marker: best density)
3.2 CORE INTERRUPT PROCESSING PERFORMANCE
The following three charts show benchmark results for interrupt processing performance.
Execution time results are presented as time values (in microseconds) and also as ratios,
taking the ST9+ as reference. Code size results are presented as ratios, taking the ST9+ as reference.
Refer to section 6 for complete results and details on calculation.
Figure 3 presents execution time results in microseconds, showing interrupt latency and
return time.
Figure 4 presents execution time ratios, and Figure 5 presents code size ratios.
Figure 3. Interrupt processing performance execution time values
(chart groups: 8-bit MCUs, 16-bit MCUs, 8/16-bit MCUs; marker: best performance)
Figure 4. Interrupt processing performance execution time ratios (ST9+ as reference)
(chart groups: 8-bit MCUs, 16-bit MCUs, 8/16-bit MCUs; marker: best performance)
4 RESULT ANALYSIS
This section analyzes the computing performance and interrupt processing performance
results (for execution time and code size). Based on the core architecture analysis
(see section 7), two comparisons are presented, pointing out the strong and weak points of
each MCU. The first compares the high-end to medium-end MCUs with the ST9+. The
second compares the medium-end to low-end MCUs with the ST7.
4.1 PRELIMINARY REMARK
Results show that the two ratios for execution time and code size, calculated with the full
and reduced sets of tests, do not differ much. In most cases, the ranking of the
MCUs is preserved. The reduced set can therefore be considered sufficient for comparing the
MCU cores.
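One way to check that the ranking is preserved between the two sets is to sort the MCUs by ratio for each set and compare the resulting orders. The sketch below assumes that a lower ratio is better; the MCU names and ratio values are hypothetical.

```python
def ranking(ratios):
    """MCU names ordered from best (lowest ratio) to worst."""
    return sorted(ratios, key=ratios.get)


# Hypothetical global ratios for MCUs that have both results.
reduced_set = {"MCU_A": 1.0, "MCU_B": 1.8, "MCU_C": 2.4}
full_set = {"MCU_A": 1.0, "MCU_B": 1.7, "MCU_C": 2.6}

same_order = ranking(reduced_set) == ranking(full_set)
print(ranking(reduced_set), same_order)
```

The values differ slightly between the two sets, but the order does not, which is the situation described above.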
4.2 HIGH-END TO MEDIUM-END MCU ANALYSIS VERSUS ST9+
Table 3 presents the strong and weak points of the high-end to medium-end MCUs,
compared to the ST9+ MCU.
Notes: ICT means Instruction Cycle Time and IL means Instruction Length.
Refer to paragraph 7.2.2 Average ICT/CPI and IL for details on calculation.
Refer to paragraph 7.3.4 ST9+ MCU core for the main characteristics of
the ST9+ MCU core.
4.2.1 Computing performance results
Regarding speed, the ST9+ MCU ranks at the top of the 8/16-bit MCUs. This new version of the
ST9 has been improved on several points, including clocks per instruction and clock speed.
These enhancements have considerably reduced its instruction cycle time. A large and
powerful register file organized in groups allows the ST9+ to perform heavy computation
(with many registers), to access peripheral and I/O port registers easily (with paged
registers), and to manage multitasking (with register group pointers). Addressing modes such as
register pair, register indirect with pre/post-increment, and indexed give the ST9+ the ability to
perform 16-bit data computation and manipulation, and to manipulate tables and move blocks
easily. A new memory management unit enlarges the memory space up to 4 Mbytes. New
instructions have been added to handle this new space and to improve C-language support.
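The link between clocks per instruction, clock speed and instruction cycle time can be illustrated with a short conversion. This sketch assumes that the average ICT values in Table 3 are expressed against the internal frequency quoted next to each MCU; the function name is hypothetical.

```python
def clocks_per_instruction(ict_ns: float, freq_mhz: float) -> float:
    """Convert an instruction cycle time in ns to clock cycles at freq_mhz."""
    return ict_ns * freq_mhz / 1000.0


# Average ICT range quoted in Table 3 for the 68HC08 at 8 MHz: 500 to 625 ns
print(clocks_per_instruction(500, 8))  # 4.0
print(clocks_per_instruction(625, 8))  # 5.0
```

Read the other way, either fewer clocks per instruction or a higher clock frequency shortens the instruction cycle time.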
Concerning code efficiency, the ST9+ MCU is also among the best MCUs.
The 16-bit MCUs are only slightly better, although they are favoured by their true 16-bit
computing and data manipulation instructions. Among the 8/16-bit MCUs, the H8/300 has a
slight advantage due to its special block move instruction. All the 8-bit MCUs, even with
shorter instruction lengths, have larger code sizes.
4.2.2 Interrupt processing performance results
Regarding speed, the ST9+ MCU ranks first. The value chart shows that it
has the shortest interrupt latency and also an interrupt routine execution time that is
among the best. These results show that its interrupt management and instruction cycle
time have been considerably enhanced. In addition, the register groups provide fast context
switching capabilities.
Some 8-bit MCUs, such as the 68HC08, perform quite well in this test. But their performance
must be put into perspective, because such MCUs can manage only one interrupt at a time and so
avoid a complex arbitration phase. The interrupt management of the ST9+ is one of the most
advanced, allowing nested interrupts with fully software-programmable priorities
and program priority level control.
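The difference between nested and one-at-a-time interrupt handling can be sketched with a small decision function. This is a simplified model, not the ST9+ arbitration logic; the convention that a lower number means a higher priority is an assumption made for the example, and all names are hypothetical.

```python
from typing import Optional


def request_accepted(request_level: int,
                     running_level: Optional[int],
                     nested: bool) -> bool:
    """Return True if a new interrupt request is serviced immediately.

    Assumed convention (hypothetical): a lower number is a higher priority.
    running_level is None when no interrupt handler is executing.
    """
    if running_level is None:
        return True            # core is idle: any request is taken
    if not nested:
        return False           # one interrupt at a time: request must wait
    return request_level < running_level  # nested: only higher priority preempts


print(request_accepted(2, 5, nested=True))   # True
print(request_accepted(2, 5, nested=False))  # False
print(request_accepted(6, 5, nested=True))   # False
```

In the non-nested case no comparison is needed while a handler runs, which is why such cores avoid the arbitration cost, at the price of making urgent requests wait.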
Code efficiency results for interrupt processing performance are not really significant. The
code represents only a very small part of an entire interrupt service routine, so no
conclusion can be drawn.
4.2.3 Conclusion
The overall results and the characteristics of the ST9+ allow it to compete with true 16-bit
MCUs in 8-bit and low-end 16-bit applications, and confirm its position as a high-end 8/16-bit
MCU.
Table 3. High-end to low-end MCU strong and weak points
MCU | Strong points | Weak points
7-byte prefetch queue
predecoding
16-bit datapath
600 ns 8x8 multiplication
250 to 300 ns
indirect with 8/16 offset or
auto-increment
compare & branch like
decrement & branch like
memory-to-memory moves
context switching capabilities
up to 16 Mbytes
nested mode
4-bit program priority register
programmable priority levels
2-stage prefetch queue
predecoding
20-bit datapath
375 ns 8x8 multiplication
375 to 500 ns
auto-incr/decrement indexed
accumulator offset indexed
memory-to-memory moves
incr/decrement & branch like
test & branch like
up to 4 Mbytes with memory
expansion module
risc-like encoding
2 to 3 bytes
register indirect, 16-bit offset
instruction processing:
medium 8/16-bit ALU:
medium average ICT:
lacking instructions:
multitasking:
memory space:
interrupt processing:
80C51XA
(20 MHz)
68HC16
(16 MHz)
68HC12
(8 MHz)
H8/300
(10 MHz)
instruction processing:
fast 8/16-bit ALU:
short average ICT:
special addr. modes:
special instructions:
multitasking:
large memory space:
interrupt processing:
instruction processing:
fast 8/16/32-bit ALU:
short average ICT:
special addr. modes:
special instructions:
multitasking:
large memory space:
interrupt processing:
instruction processing:
fast 8/16-bit ALU:
short average ICT:
special addr. modes:
special instructions:
large memory space:
instruction encoding:
short average IL:
special addr. modes:
special instructions:
even jump/branch address
even word operand address
NOP instructions in assembly
code
no indexed addressing
performance penalty if odd
word operand addresses
only even
no direct addressing
index register manipulation
compare & branch like
decrement & branch like
need memory expansion
module
one interrupt at a time
recommended
no program priority register
hardware fixed priorities
standard (no prefetch)
1400 ns 8x8 multiplication
500 to 600 ns
16-bit shifts/rotations
compare & branch like
decrement & branch like
no special capabilities
64 kbytes
one interrupt at a time
recommended
no program priority register
hardware fixed priorities
Table 3. High-end to low-end MCU strong and weak points (cont’d)
MCU | Strong points | Weak points
68HC11 (4 MHz) | | instruction processing: standard (no prefetch); medium 8/16-bit ALU: 2500 ns 8x8 multiplication; long average ICT: 1500 to 1750 ns; lacking instructions: compare & branch like, decrement & branch like; multitasking: no special capabilities; memory space: 64 kbytes; interrupt processing: one interrupt at a time recommended, no program priority register, hardware fixed priorities
68HC08 (8 MHz) | instruction processing: 1-byte prefetch queue, 8-bit datapath; fast 8-bit ALU: 625 ns 8x8 multiplication; special addr. modes: indexed with 8-bit offset or post-increment; special instructions: memory-to-memory moves, compare & branch like, decrement & branch like; large memory space: up to 4 Mbytes with memory expansion module | medium average ICT: 500 to 625 ns; lacking addr. modes: no indirect addressing; multitasking: no special capabilities; interrupt processing: one interrupt at a time recommended, no program priority register, hardware fixed priorities
4.3 MEDIUM-END TO LOW-END MCU ANALYSIS VERSUS ST7
Table 4 presents the strong and weak points of the medium-end to low-end MCUs,
compared to the ST7 MCU.
Notes: ICT means Instruction Cycle Time and IL means Instruction Length.
Refer to paragraph 7.2.2 Average ICT/CPI and IL for details on calculation.
Refer to paragraph 7.3.9 ST7 MCU core for the main characteristics of
the ST7 MCU core.
4.3.1 Computing performance results
Regarding speed, the ST7 MCU takes second position, just below the newly introduced
68HC08. Even with no prefetch mechanism, it comes ahead of all the other MCUs. A low number
of clocks per instruction combined with a standard frequency explains its short instruction
cycle time and its advantageous position. The two index registers and the indirect addressing
mode allow the ST7 to perform data manipulation such as table manipulation and block moves
easily. A direct addressing mode in a 256-byte zero page gives rapid access to important data
and peripheral registers.
Concerning code efficiency, the ST7 MCU ranks among the 8-bit MCUs, just above
the 68HC08. A standard instruction length explains its average position.
4.3.2 Interrupt processing performance results
Regarding speed, the ST7 MCU ranks very close to the 68HC08. A longer instruction cycle
time explains this tiny gap. The strong point of its interrupt management is the automatic
stacking of the CPU state, accumulator and index register. This eliminates software
stacking, and so saves time and code space.
Code efficiency results for interrupt processing performance are not really significant. The
code represents only a very small part of an entire interrupt service routine, so no
conclusion can be drawn.
4.3.3 Conclusion
The overall results and the characteristics of the ST7 confirm it as an outstanding 8-bit MCU.
Table 4. Medium-end to low-end MCU strong and weak points
MCU | Strong points | Weak points
medium 8/16-bit ALU:
68HC11
(4 MHz)
68HC08
(8 MHz)
80C51
(20 MHz)
KS88
(8 MHz)
78K0
(10 MHz)
instruction processing:
fast 8-bit ALU:
short average ICT:
special addr. modes:
special instructions:
large memory space:
short average IL:
special addr. modes:
special instructions:
multitasking:
special addr. modes:
special instructions:
multitasking:
interrupt processing:
special addr. modes:
special instructions:
multitasking:
1-byte prefetch queue
8-bit datapath
625 ns 8x8 multiplication
500 to 625 ns
indexed with 8-bit offset or
post-increment
compare & branch like
decrement & branch like
memory-to-memory moves
up to 4 Mbytes with memory
expansion module
1 to 2 bytes
register indirect
stack pointer relative
compare & branch like
decrement & branch like
bit test & bit clear & jump
memory-to-memory moves
context switching capabilities