The STM8 family of HCMOS microcontrollers is designed and built around an enhanced
industry standard 8-bit core and a library of peripheral blocks, which include ROM, Flash,
RAM, EEPROM, I/O, Serial Interfaces (SPI, USART, I2C,...), 16-bit Timers, A/D converters,
comparators, power supervisors etc. These blocks may be assembled in various
combinations in order to provide cost-effective solutions for application-specific products.
The STM8 family forms a part of the STMicroelectronics 8-bit MCU product line, which finds
its place in a wide variety of applications such as automotive systems, remote controls,
video monitors, car radio and numerous other consumer, industrial, telecom, and multimedia
products.
The 8-bit STM8 Core is designed for high code efficiency. It contains 6 internal registers, 20
addressing modes and 80 instructions. The 6 internal registers include two 16-bit Index
registers, an 8-bit Accumulator, a 24-bit Program Counter, a 16-bit Stack Pointer and an 8-
bit Condition Code register. The two Index registers X and Y enable Indexed Addressing
modes with or without offset, along with read-modify-write type data manipulation. These
registers simplify branching routines and data/arrays modifications.
The 24-bit Program Counter is able to address up to 16-Mbyte of RAM, ROM or Flash
memory. The 16-bit Stack Pointer provides access to a 64K-level Stack. The Core also
includes a Condition Code register providing 7 Condition flags that indicate the result of the
last instruction executed.
The 20 Addressing modes, including Indirect Relative and Indexed addressing, allow
sophisticated branching routines or CASE-type functions. The Indexed Indirect Addressing
mode, for instance, permits look-up tables to be located anywhere in the address space,
thus enabling very flexible programming and compact C-based code. The stack pointer
relative addressing mode permits optimized C compiler stack model for local variables and
parameter passing.
The Instruction Set is 8-bit oriented with a 2-byte average instruction size. This Instruction
Set offers, in addition to standard data movement and logic/arithmetic functions, 8-bit by 8-
bit multiplication, 16-bit by 8-bit and 16-bit by 16-bit division, bit manipulation, data transfer
between Stack and Accumulator (Push / Pop) with direct stack access, as well as data
transfer using the X and Y registers or direct memory-to-memory transfers.
The number of Interrupt vectors can vary up to 32, and the interrupt priority level may be
managed by software providing hardware controlled nested capability. Some peripherals
include Direct Memory Access (DMA) between serial interfaces and memory. Support for
slow memories allows easy external code execution through serial or parallel interface
(ROMLESS products for instance).
The STM8 has a highly energy-efficient architecture, based on a Harvard architecture and
pipelined execution. A 32-bit wide program memory bus allows most of the instructions to be
fetched in 1 CPU cycle. Moreover, as the average instruction length is 2 bytes, the program
memory is accessed only half of the time on average, which reduces power consumption.
Pipelined execution minimizes the execution time, ensuring high system performance when
needed, while the different power-saving operating modes make it possible to reduce the
overall energy consumption. Power saving can be managed under program control by placing
the device in SLOW, WAIT, SLOW-WAIT, ACTIVE-HALT or HALT mode (see product datasheet
for more details).
Doc ID 13590 Rev 3
PM0044: STM8 architecture
Additional blocks
The additional blocks take the form of integrated hardware peripherals arranged around the
central processor core. The following (non-exhaustive) list details the features of some of the
currently available blocks:
Boot ROM: Memory area containing the bootloader code.
Flash: Flash-based devices.
RAM: Sizes up to several Kbytes.
Data EEPROM: Sizes up to several Kbytes. Erase/programming operations do not require
additional external power sources.
Timers: Different versions based on 8/16-bit free running or autoreload timer/counter are
available. They can be coupled with either input captures, output compares or
PWM facilities. PWM functions can have software programmable duty cycle
between 0% and 100% in up to 256/65536 steps. The outputs can be filtered to
provide D/A conversion.
A/D converter: The Analog to Digital Converter uses a sample and hold technique. It has
12-bit resolution.
I2C: Multi-master, single master, single slave modes, DMA or 1-byte transfer, standard
and fast I2C modes, 7 and 10-bit addressing.
SPI: The Serial Peripheral Interface is a fully synchronous 3/4 wire interface ideal for
Master and Slave applications such as driving devices with input shift register
(LCD driver, external memory,...).
USART: The USART is a fast synchronous/asynchronous interface which features
duplex transmission, NRZ format, programmable baud rates and standard error
detection. The USART can also emulate RS232 protocol.
Watchdog: It has the ability to induce a full reset of the MCU if its counter counts down to
zero prior to being reset by the software. This feature is especially useful in noisy
applications.
I/O ports: They are programmable by software to act in several input or output
configurations on an individual line basis, including high current and interrupt
generation. The basic block has eight bit lines.
1.1 STM8 development support
The STM8 family of MCUs is supported by a comprehensive range of development tools.
This family presently comprises hardware tools (emulators, programmers), a software
package (assembler-linker, debugger, archiver) and a C-compiler development tool.
STM8 and ST7 CPUs are supported by a single toolchain allowing easy reuse and
portability of the applications between product lines.
1.2 Enhanced STM8 features
● 16-Mbyte linear program memory space with 3 FAR instructions (CALLF, RETF, JPF)
● 16-Mbyte linear data memory space with 1 FAR instruction (LDF)
● Up to 32 24-bit interrupt vectors with optimized context save management
● 16-bit Stack Pointer (SP=SH:S) with stack manipulation instructions and addressing
modes
● New register and memory access instructions (EXG, MOV)
● New arithmetic instructions: DIV 16/8 and DIVW 16/16
● New bit handling instructions (CCF, BCPL, BCCM)
● 2 x 16-bit index registers (X=XH:XL, Y=YH:YL). 8-bit data transfers address the low
byte. The high byte is not affected, with a reset value of 0. This allows the use of X/Y as
8-bit values.
● Fast interrupt handling through alternate register files (up to 4 contexts) with standard
stack compatible mode (for real time OS kernels)
● 16-bit/8-bit stack operations (X, Y, A, CC stacking)
● 16-bit pointer direct update with 16-bit relative offset (ADDW/SUBW for X/Y/SP)
● 8-bit & 16-bit arithmetic and signed arithmetic support
2 Glossary
mnem    mnemonic
src     source
dst     destination
cy      duration of the instruction in CPU clock cycles (internal clock)
The global configuration register is a memory mapped register. It controls the configuration
of the processor. It contains the AL control bit:
AL: Activation level
If the AL bit is 0 (main), the IRET will cause the context to be retrieved from stack and the
main program will continue after the WFI instruction.
If the AL bit is 1 (interrupt only active), the IRET will cause the CPU to go back to WFI/HALT
mode without restoring the context.
This bit is used to control the low power modes of the MCU. In a very low power application,
the MCU spends most of the time in WFI/HALT mode and is woken up (through interrupts)
at specific moments in order to execute a specific task. Some of these recurring tasks are
short enough to be treated directly in an ISR, rather than going back to the main program. In
this case, by programming the AL bit to 1 before going to low power (by executing WFI/HALT
instruction), the run time/ISR execution is reduced due to the fact that the register context is
not saved/restored each time.
Condition Code register (CC)
The Condition Code register is an 8-bit register which indicates the result of the instruction
just executed as well as the state of the processor. These bits can be individually tested by a
program and specified action taken as a result of their state. The following paragraphs
describe each bit.
● V: Overflow
When set, V indicates that an overflow occurred during the last signed arithmetic
operation, on the MSB operation result bit. See INC, INCW, DEC, DECW, NEG, NEGW,
ADD, ADC, SUB, SUBW, SBC, CP, CPW instructions.
● I1: Interrupt mask level 1
The I1 flag works in conjunction with the I0 flag to define the current interruptability level
as shown in the following table. These flags can be set and cleared by software through
the RIM, SIM, HALT, WFI, IRET, TRAP and POP instructions and are automatically set
by hardware when entering an interrupt service routine.
Table 1. Interruptability levels
Interruptability         Priority   I1   I0
Interruptable Main       Lowest     1    0
Interruptable Level 1       ↕       0    1
Interruptable Level 2       ↕       0    0
Non Interruptable        Highest    1    1
● H: Half carry bit
The H bit is set to 1 when a carry occurs between bits 3 and 4 of the ALU during an
ADD or ADC instruction. The H bit is useful in BCD arithmetic subroutines.
For ADDW and SUBW, it is set when a carry occurs from bit 7 to bit 8, which makes it
possible to implement byte arithmetic on 16-bit index registers.
● I0: Interrupt mask level 0
See Flag I1
● N: Negative
When set to 1, this bit indicates that the result of the last arithmetic, logical or data
manipulation is negative (i.e. the most significant bit is a logic 1).
● Z: Zero
When set to 1, this bit indicates that the result of the last arithmetic, logical or data
manipulation is zero.
● C: Carry
When set, C indicates that a carry or borrow out of the ALU occurred during the last
arithmetic operation on the MSB operation result bit (bit 7 for 8-bit result/destination or
bit 15 for 16-bit result). This bit is also affected during bit test, branch, shift, rotate and
load instructions. See ADD, ADC, SUB, SBC instructions.
In bit test operations, C is the copy of the tested bit. See BTJF, BTJT instructions.
In shift and rotate operations, the carry is updated. See RRC, RLC, SRL, SLL, SRA
instructions.
This bit can be set, reset or complemented by software using SCF, RCF, CCF
instructions.
Example: Addition
$B5 + $94 = "C" + $49 = $149
          C  7      0
  $B5     0  10110101
+ $94     0  10010100
= $149    1  01001001
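The addition example can be checked with a small model of the flags. The following Python sketch is illustrative only: the function name `add8` and the dictionary layout are hypothetical, and the V flag is computed with the usual two's complement overflow rule, which is an assumption rather than a transcription of the STM8 logic equations.

```python
def add8(a, b, carry_in=0):
    """Model an 8-bit ADD/ADC and return (result, flags).

    Hypothetical helper: flag formulas follow the textual descriptions
    above (C, H, N, Z) and the standard signed-overflow rule (V).
    """
    total = a + b + carry_in
    result = total & 0xFF
    flags = {
        # C: carry out of bit 7
        "C": int(total > 0xFF),
        # H: carry between bits 3 and 4
        "H": int((a & 0x0F) + (b & 0x0F) + carry_in > 0x0F),
        # N: most significant bit of the result
        "N": (result >> 7) & 1,
        # Z: result is zero
        "Z": int(result == 0),
        # V: both operands have the same sign and the result sign differs
        "V": int(((a ^ result) & (b ^ result) & 0x80) != 0),
    }
    return result, flags


result, flags = add8(0xB5, 0x94)
print(f"${result:02X}", flags)
```

With the operands of the example, the model yields the result $49 with C set, which matches the $149 shown above.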
The results of each instruction on the Condition Code register are shown by tables in
Section 7: STM8 instruction set. The following table is an example:
VI1HI0NZC
V00NZ1
where
Nothing   = Flag not affected
Flag name = Flag affected
0         = Flag cleared
1         = Flag set
4 STM8 memory interface
4.1 Program space
The program space is 16-Mbyte and linear. To distinguish the 1, 2 and 3 byte wide
addressing modes, naming has been defined as shown in Figure 3:
● "Page" [0xXXXX00 to 0xXXXXFF]: 256-byte wide memory space with the same two
most significant address bytes (XXXX defines the page number).
● "Section" [0xXX0000 to 0xXXFFFF]: 64-Kbyte wide memory space with the same most
significant address byte (XX defines the section number).
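The page and section naming can be expressed as simple bit operations on a 24-bit address. The sketch below is a minimal illustration; the function names `locate` and `same_section` are hypothetical and not part of any ST tool.

```python
def locate(address):
    """Split a 24-bit address into its section and page numbers.

    Hypothetical helper following the naming above:
    section = most significant byte, page = two most significant bytes.
    """
    if not 0 <= address <= 0xFFFFFF:
        raise ValueError("address must fit in 24 bits")
    return {
        "section": address >> 16,           # XX in 0xXX0000..0xXXFFFF
        "page": address >> 8,               # XXXX in 0xXXXX00..0xXXXXFF
        "offset_in_section": address & 0xFFFF,
        "offset_in_page": address & 0xFF,
    }


def same_section(a, b):
    """True when both addresses share the same most significant byte."""
    return (a >> 16) == (b >> 16)


print(locate(0x01807F))
print(same_section(0x00FFFF, 0x010000))
```

A check such as `same_section` reflects the condition under which the 2-byte (long) addressing forms can reach a target.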
The reset and interrupt vector table are placed at address 0x8000 for the STM8 family.
(Note: the base address may be different for later implementations.) The table has 32 4-byte
entries: RESET, Trap, NMI and up to 29 normal user interrupts. Each entry consists of the
reserved op-code 0x82, followed by a 24-bit value: PCE, PCH, PCL address of the
respective Interrupt Service Routine. The main program and ISRs can be mapped
anywhere in the 16 Mbyte memory space.
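As an illustration of the vector table layout, the following Python sketch computes the address of an entry and the four bytes it would contain. The names are hypothetical, the base address 0x8000 is the one quoted above (it may differ on other implementations), and the entry order (RESET, Trap, NMI, then user interrupts) is assumed to follow the order in which the text lists them.

```python
VECTOR_BASE = 0x8000   # base address quoted in the text above
ENTRY_SIZE = 4         # one op-code byte plus a 24-bit address
ENTRY_COUNT = 32
VECTOR_OPCODE = 0x82   # reserved op-code that starts every entry


def vector_address(index):
    """Address of vector table entry `index` (0 assumed to be RESET)."""
    if not 0 <= index < ENTRY_COUNT:
        raise ValueError("vector index out of range")
    return VECTOR_BASE + ENTRY_SIZE * index


def vector_entry(isr_address):
    """Bytes of one entry: 0x82 followed by PCE, PCH, PCL."""
    if not 0 <= isr_address <= 0xFFFFFF:
        raise ValueError("ISR address must fit in 24 bits")
    return bytes([
        VECTOR_OPCODE,
        (isr_address >> 16) & 0xFF,  # PCE
        (isr_address >> 8) & 0xFF,   # PCH
        isr_address & 0xFF,          # PCL
    ])


print(hex(vector_address(31)))
print(vector_entry(0x012345).hex())
```

With 32 entries of 4 bytes, the table spans 128 bytes starting at the base address.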
CALL/CALLR and RET must be used only in the same section. The effective address for the
CALL/RET is used as an offset to the current PCE register value. For JP, the effective
address, 16 or 17 bits long (17 bits for indexed addressing), is added to the current PCE
value. In order to reach any address in the program space, the JPF jump and CALLF call
instructions are provided with a three-byte extended addressing mode, while RETF also
pops three bytes from the stack.
As the memory space is linear, sections can be crossed by the following CPU actions: next
instruction byte fetch (PC+1), relative jumps and, in some cases, JP (for indexed
addressing mode).
Note: For safe memory usage, a function which crosses sections MUST:
- be called by a CALLF
- include only far instructions for code operation (CALLF & JPF)
All label pointers are located in section 0 (JP [ptr.w] example: ptr.w is located in section 0
and the jump address in current section)
Any illegal op-code read from the program space triggers an MCU reset.
4.2 Data space
The data space is 16-Mbyte and linear. As the stack must be located in section 0 and as
data access outside section 0/1 can be managed only with LDF instructions, frequently used
data should be located in section 0 to get the optimum code efficiency.
All data pointers are located in section 0 only.
Indexed addressing (with 16-bit index registers and long offset) allows data access over
section 0 and 1.
All the peripherals are memory mapped in the data space.
Figure 3. Address spaces
[Figure: program space and data space, each running from 0x000000 to 0xFFFFFF and divided
into 64-Kbyte sections (section 0: 0x000000 to 0x00FFFF, section 1: 0x010000 to 0x01FFFF).
Page 0 (0x000000 to 0x0000FF) is labelled with the 1-byte addressing mode, and the diagram
also marks the 2-byte and 3-byte addressing modes, the stack area and the pointers. In the
program space, the vector table occupies 0x008000 to 0x00807F and holds the RESET, TRAP,
NMI and INT0 to INT28 entries, each made of op-code 0x82 followed by the E, H and L
address bytes.]
[Figure 4: memory interfaces. The CPU program counter (PCE, PCH, PCL) drives a 24-bit
address bus. The RAM memory interface uses address lines A15..0, data lines D7..0, R/W and
STALL. The Flash memory interface uses address lines A23..0, data lines D31..0 and STALL.
Instruction fetches from RAM and LDF data accesses are routed between the two buses.]
4.3 Memory interface architecture
The STM8 uses a Harvard architecture, with separate program and data memory buses.
However, the logical address space is unified: all memories share the same 16-Mbyte
space without overlapping. The memory interfaces are shown in Figure 4. Each interface
consists of an address bus and a data bus, together with a read/write control signal (R/W)
and a memory acknowledge signal (STALL).
The STALL acknowledge signal makes the CPU compatible with slow serial or parallel
memory interfaces. When the memory interface is slow, the CPU waits for the memory
acknowledge before executing the instruction. In such a case, the instruction CPU cycle
time is longer than the value given in this manual.
The program memory bus is 32 bits wide, allowing most instructions to be fetched in one
cycle.
As the address space is unified, the architecture also allows data to be stored in the Flash
memory and program code to be fetched from RAM (data bus). In the latter case,
performance is reduced: data accesses and fetch operations share the same bus, and
the instructions are fetched one byte at a time, thus taking longer (1 cycle/byte).
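The difference between the two buses can be quantified with a rough estimate. The sketch below is a simplified model under stated assumptions: the fetch starts at an aligned address, no wait state or bus conflict occurs, and the function name `fetch_cycles` is hypothetical.

```python
def fetch_cycles(num_bytes, bus_width_bytes):
    """Minimum number of bus accesses needed to fetch `num_bytes`.

    Simplified model: aligned start, one access per cycle, no stalls.
    """
    if num_bytes < 0 or bus_width_bytes <= 0:
        raise ValueError("invalid size")
    return -(-num_bytes // bus_width_bytes)  # ceiling division


FLASH_BUS_BYTES = 4  # 32-bit program memory bus
RAM_BUS_BYTES = 1    # 8-bit data bus

# Fetching 12 bytes of code over each bus
print(fetch_cycles(12, FLASH_BUS_BYTES))
print(fetch_cycles(12, RAM_BUS_BYTES))
```

Under these assumptions, the same amount of code takes four times as many accesses from RAM as from Flash.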
5 Pipelined execution
The STM8 family uses a 3-stage pipeline to increase the speed of the flow of instructions
sent to the processor. Pipelined execution allows several operations to be performed
simultaneously, rather than serially:
● Fetch
● Decode and address
● Execute
The Program Counter (PC) always points to the instruction in the decode stage, as shown in
Figure 5.
Figure 5. Pipelined execution principle
5.1 Description of pipelined execution stages
Figure 6 and Section 5.1.1, Section 5.1.2, and Section 5.1.3 provide a detailed description
of the pipelined execution stages.
5.1.1 Fetch stage
The first pipeline stage includes a 64-bit fetch buffer and a 32-bit prefetch buffer, totalling 3
words named F1, F2 and F3. This buffer structure allows any instruction code (up to 5 bytes)
to be available for decoding immediately after F1 (and F2 when needed) is/are loaded.
The instruction access from Flash Program memory is 32 bits wide and is performed from
an aligned address, i.e. 0xXXX0, 0xXXX4, 0xXXX8, or 0xXXXC.
Unlike the decode and execute stages, which are performed at every cycle, the fetch stage
accesses the program memory only when needed, and stops memory access when the
buffer is full. This reduces the core power consumption.
Reading program code from RAM is similar to reading it from ROM. However, since the
RAM data bus is 8 bits wide, 4 consecutive read operations have to be performed to load one
word, which makes RAM execution slower than Flash execution.
5.1.2 Decoding and addressing stage
The decoding stage includes an instruction alignment unit. The alignment unit uses the
64-bit input from the fetch unit and feeds an instruction (from 1 to 5 bytes depending on the
instruction) to the decoding unit.
The instruction code consists of 2 parts (see examples in Table 2):
● The op-code itself (1 or 2 bytes)
● and a data/address part (0 to 3 bytes).
The op-code is decoded in this stage. When present, the instruction address is used for
address computation, whilst the immediate operand is forwarded to the execution stage.
Table 2. Data/address decoding examples
Instruction                               Syntax               Op-code   Data/address
Register to register move                 LD A, XH             0x95      -
Register load                             LD A,($12,SP)        0x7B      0x12
Register store                            LD ($12,SP),A        0x6B      0x12
Data load / store with extended address   LDF A,($123456,Y)    0x90 AF   0x12 34 56
Long/unaligned instructions
For long instructions (i.e. 5-byte instructions), the fetch may need 2 program memory
accesses to be completed. In this case, the decoding stage (after decoding the op-code
part) is stalled, waiting for the fetch stage to complete the 2nd fetch.
For shorter instructions, this may also happen when they cross a 32-bit boundary.
Indirect addressing
For indirect addressing, the CPU is stalled in this stage to read the pointer from the data
memory (i.e. RAM). The number of cycles during which the CPU is stalled depends on the
pointer size (short, long or extended addressing mode).
5.1.3 Execution stage
In the execution stage, the operation is executed and the result is stored in the accumulator,
index register or RAM.
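The remark on long and unaligned instructions in Section 5.1.2 can be illustrated by counting how many aligned 32-bit words an instruction occupies. This is a minimal sketch with hypothetical function names; it only models alignment and says nothing about the actual stall duration.

```python
WORD_BYTES = 4  # program memory is read as aligned 32-bit words


def aligned_fetch_address(address):
    """Aligned word address (0xXXX0, 0xXXX4, 0xXXX8 or 0xXXXC) containing `address`."""
    return address & ~(WORD_BYTES - 1)


def words_spanned(address, length):
    """Number of aligned 32-bit words touched by an instruction of `length` bytes."""
    if length < 1:
        raise ValueError("length must be at least 1")
    first = address // WORD_BYTES
    last = (address + length - 1) // WORD_BYTES
    return last - first + 1


# A 3-byte instruction that fits inside one word
print(words_spanned(0xC005, 3))
# A 2-byte instruction that crosses a 32-bit boundary
print(words_spanned(0xC003, 2))
# A 5-byte instruction always needs more than one word
print(words_spanned(0xC000, 5))
```

An instruction for which `words_spanned` returns 2 is a candidate for the second program memory access described above.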
5.2 Data memory conflicts
Three types of operations perform accesses to the data memory:
● Effective address computation in case of indirect addressing
● Data read: source operand
● Data write: destination for store or read-modify-write operations
In case of simultaneous accesses to the same memory area in both the execution stage (write)
and the decoding stage (read), the decode stage is stalled until the execution stage releases
the resource.
5.3 Pipelined execution examples
A few pipelined execution examples are given below. The numbers of cycles for the
decoding and execution stages correspond to the minimum number of cycles needed by the
instruction itself. In some cases, depending on the instruction sequence, the number of
cycles taken could be higher.
5.4 Conventions
Although the decode and/or execute stage of some instructions may take a different number
of cycles, a simplified convention providing a good match with reality has been used in this
section:
● The decode stage of each instruction takes one cycle only
● The execution stage takes a number of cycles equal to

$$C_y = \mathrm{DecCy} + \mathrm{ExeCy} - 1$$

Where
C_y is the number of execution cycles. In case of decode and execute cycles, it
corresponds to the minimum number of cycles needed by the instruction itself, and
does not take into account the impact of the instruction sequence.
DecCy is the exact number of decode cycles.
ExeCy is the exact number of execute cycles.
The decode stage of the next instruction starts during the last execution cycle. In
instructions performing pipeline flush, the convention is that, in case the branch is taken, the
next fetch is performed during the last instruction execution cycle.
The exact number of cycles (see Table 3) and the number of cycles obtained using this
convention (see Table 4) are identical.
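The convention can be checked numerically. The sketch below applies the formula of Section 5.4 to the exact decode and execute cycle counts listed in Table 3; the function name `conventional_cycles` is hypothetical.

```python
def conventional_cycles(dec_cy, exe_cy):
    """Execution cycles under the convention: Cy = DecCy + ExeCy - 1."""
    if dec_cy < 1 or exe_cy < 1:
        raise ValueError("cycle counts must be at least 1")
    return dec_cy + exe_cy - 1


# (instruction, exact decode cycles, exact execute cycles) from Table 3
table3 = [
    ("LDW X,[$50.w]", 4, 1),
    ("ADDW X,#20", 2, 2),
    ("LD A,[$30.w]", 3, 1),
]

for name, dec_cy, exe_cy in table3:
    cy = conventional_cycles(dec_cy, exe_cy)
    # One conventional decode cycle plus Cy equals the exact total
    assert 1 + cy == dec_cy + exe_cy
    print(name, cy)
```

Since the convention assigns one decode cycle to every instruction, the sum of that cycle and the computed value equals the exact total, which is why both countings give the same overall duration.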
Table 3. Example with exact number of cycles
Address   Instruction       Decode cycles   Execute cycles   lgth   Stage sequence
0xC000    LDW X,[$50.w]     4               1                3      D D D D E
0xC003    ADDW X,#20        2               2                3      D D D D D E E
0xC006    LD A,[$30.w]      3               1                3      D D D D D D E
0xC009    ….
Time axis: cycles 1 to 14. Fetch words: F1, F2 and F3. In the stage sequence, D stands for
either a decode cycle or a stalled decode cycle (see Table 5).
Table 4. Example with conventional number of cycles
Address   Instruction       Decode cycles   Execute cycles   lgth   Stage sequence
0xC000    LDW X,[$50.w]     4               3                3      D E E E E
0xC003    ADDW X,#20        3               3                3      D D D D E E E
0xC006    LD A,[$30.w]      3               3                3      D D D D E E E
0xC009    ….
Time axis: cycles 1 to 14. Fetch words: F1, F2 and F3. In the stage sequence, D stands for
either a decode cycle or a stalled decode cycle (see Table 5).
Table 5. Legend
Symbol/Color   Definition
F              Fetch
D              Decode stalled
D              Decode
E              Execute
5.4.1 Optimized pipeline example - execution from Flash Program memory
In the example shown in Table 6, the code is stored in the Flash Program memory (32-bit
bus). As a result, 3 cycles are needed to fill the 96-bit prefetch buffer. At each cycle, one
word is loaded and stored in F1, F2 and F3. The next fetch operation can start only when all
the instructions contained in one of the Fx words are decoded. In fact, at cycle 9, the last
instruction contained in F3 (SWAP A) is decoded, and a fetch operation can start to fill the
F3 word.
Table 6. Optimized pipeline example - execution from Flash
Add.     Instruction      Decod. cycles   Exec. cycles   lgth   Stage sequence
0xC000   NEG A            1               1              1      D E
0xC001   XOR A,$10        1               1              2      D E
0xC003   LD A,#20         1               1              2      D E
0xC005   SUB A,$1000      1               1              3      D E
0xC008   INC A            1               1              1      D E
0xC009   LD XL,A          1               1              1      D E
0xC00A   SRL A            1               1              1      D E
0xC00B   SWAP A           1               1              1      D E
0xC00C   SLA $15          1               1              2      D E
0xC00E   CP A,#$FE        1               1              2      D E
0xC010   MOV $100,#11     1               1              4      D E
0xC014   MOV $101,#22     1               1              4      D E
Time axis: cycles 1 to 14. Fetch words F1, F2 and F3 are each loaded twice in the diagram.
Table 7. Legend
Symbol/Color   Definition
F              Fetch
D              Decode
E              Execute
5.4.2 Optimized pipeline example - execution from RAM
In the example shown in Table 8, the RAM is accessed through an 8-bit bus. As a result, 12
cycles are required to fill the 96-bit pre-fetch buffer. Every 4 cycles, one word is loaded and
stored in Fx. The decoding of the first instruction of a word can start only when the Fx word
is filled. This occurs for example until the 4th cycle, and the first instruction (NEG A) can be
decoded only at the 5th cycle.
In case of read/write access to the RAM, the fetch is stalled. This occurs during the 6th cycle
since RAM address $10 is read during the decode stage of XOR A,$10.
Table 8. Optimized pipeline example - execution from RAM
Add.     Instruction      Decode cycles   Execute cycles   lgth
0xC000   NEG A            1               1                1
0xC001   XOR A,$10        1               1                2
0xC003   LD A,#20         1               1                2
0xC005   SUB A,$1000      1               1                3
0xC008   INC A            1               1                1
0xC009   LD XL,A          1               1                1
0xC00A   SRL A            1               1                1
0xC00B   SWAP A           1               1                1
0xC00C   SLA $15          1               1                2
0xC00E   CP A,#$FE        1               1                2
[Pipeline diagram: cycles 1 to 21; each word is fetched in four byte accesses, F1_1 to F1_4,
F2_1 to F2_4 and F3_1 to F3_4, with fetch-stalled (FS) cycles.]
Table 9. Legend
Symbol/Color   Definition
F              Fetch
FS             Fetch stalled
D              Decode
D              Decode stalled
E              Execute
5.4.3 Pipeline with Call/Jump
In the example shown in Table 10, a branch is taken after the JP/CALL instruction, and the
fetched instruction(s) are lost (flush). New instructions must be fetched. 3 fetch sequences
are required to refill the pre-fetch buffer. The fetch start depends on the instruction being
executed.
For a JP instruction, the fetch can start during the first cycle of the "dummy" execution.
For the CALL instruction, it starts after the last cycle of the CALL execution.
Table 10. Example of pipeline with Call/Jump
Add.     Instruction            Decode cycles   Execute cycles   lgth
0xC000   INC A                  1               1                1
0xC001   JP label               1               1                3
0xC004   LDW X,[$5432.w]        X               X                4
0xD010   label: NEG A           1               1                1
0xD011   CALL label2            1               2                3
0xD014   LDW X,[$5432.w]        X               X                4
0xD018   LDW X,[$7895.w]        X               X                4
0xE030   label2: INCW X         1               1                1
[Pipeline diagram: cycles 1 to 11, showing the fetches, the flush after each taken branch and
a fetch-stalled cycle.]
Table 11. Legend
Symbol/Color   Definition
F              Fetch
FS             Fetch stalled
D              Decode
E              Execute
5.4.4 Pipeline stalled
The decode stage can be stalled when the execution lasts more than one cycle.
The flush is due to the branch. Fetching the branch address is performed during the second
execution cycle of the BTJF instruction.
The Decode operation can also be stalled when the memory target is modified during the
previous instruction. In the example given in Table 12, the INCW Y instruction writes the X
register during the first execution cycle. As a result, in this cycle, the next instruction
(LD A,(X)) cannot be decoded since it reads the X register.
Table 12. Example of stalled pipeline
Address   Instruction            Decode cycles   Execute cycles   lgth
0xC000    SUB SP,#20             1               1                2
0xC002    LD A,#20               1               1                2
0xC004    BTJT 0x10,#5,to        1               2                5
0xC009    INC A                  1               1                1
0xC00A    BTJF 0x20,#3,to        1               2                5
0xC00F    NOP                    X               X                1
0xC010    LDW X,[$5432.w]        X               X                4
0xC014    LDW X,[$1234.w]        X               X                4
0xD020    to: INCW Y             1               1                2
0xD023    LD A,(X)               1               1                2
[Pipeline diagram: time in cycles, up to cycle 14, showing the fetches, the stalled decode
cycles and the flush after the taken branch.]
Table 13. Legend
Symbol/Color   Definition
F              Fetch
D              Decode stalled
D              Decode
E              Execute
5.4.5 Pipeline with 1 wait state
In the example given in Table 14, performing the fetch takes 2 cycles, and there is no
overlap between the 2 fetch cycles.
If the instruction is decoded/executed during the last 2 fetch cycles, then the wait state is
transparent compared to the no-wait state execution.
Table 14. Pipeline with 1 wait state
Address   Instruction      Decode cycles   Execute cycles   lgth
0xC000    NEG A            1               1                1
0xC001    DEC ($10,X)      1               1                3
0xC004    LDW X,#20        1               1                3
0xC007    LD (X),A         1               1                1
0xC008    INC A            1               1                1
0xC009    NEG ($5A,Y)      1               1                1
[Pipeline diagram: cycles 1 to 10, showing fetch words F1, F2 and F3, each preceded by a
memory-stalled (MS) cycle.]
Table 15. Legend
Symbol/Color   Definition
F              Fetch
D              Decode stalled
D              Decode
MS             Memory stalled
E              Execute
6 STM8 addressing modes
The STM8 core features 18 different addressing modes which can be classified in 8 main
groups:
Table 16. STM8 core addressing modes
Addressing mode groups   Example
Inherent                 NOP
Immediate                LD A,#$55
Direct                   LD A,$55
Indexed                  LD A,($55,X)
SP Indexed               LD A,($55,SP)
Indirect                 LD A,([$55],X)
Relative                 JRNE loop
Bit operation            BSET byte,#5
The STM8 Instruction set is designed to minimize the number of required bytes per
instruction. To do so, most of the addressing modes can be split into three sub-modes called
extended, long and short:
● The extended addressing mode ("e") can reach any byte in the 16-Mbyte addressing
space, but the instruction size is bigger than in the short and long addressing modes.
Moreover, the number of instructions with this addressing mode (far) is limited (CALLF,
RETF, JPF and LDF)
● The long addressing mode ("w") is the most powerful for program management, when
the program is executed in the same section (same PCE value). The long addressing
mode is optimized for data management in the first 64-Kbyte addressing space (from
0x000000 to 0x00FFFF) with a complete set of instructions, but the instruction size is
bigger than in the short addressing mode.
● The short addressing mode ("b") is less powerful because it can only access page
zero (from 0x000000 to 0x0000FF), but the instruction size is more compact.
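The choice between the three sub-modes can be sketched as a range check on the target address. The function name `smallest_submode` is hypothetical; an assembler or compiler also has to take into account which instructions actually support each sub-mode, which this sketch ignores.

```python
def smallest_submode(address):
    """Most compact sub-mode able to reach a data address.

    Ranges taken from the list above:
    short ("b")    0x000000..0x0000FF
    long ("w")     0x000000..0x00FFFF
    extended ("e") 0x000000..0xFFFFFF
    """
    if not 0 <= address <= 0xFFFFFF:
        raise ValueError("address must fit in 24 bits")
    if address <= 0xFF:
        return "short"
    if address <= 0xFFFF:
        return "long"
    return "extended"


for addr in (0x10, 0x1000, 0x100000):
    print(f"${addr:06X}", smallest_submode(addr))
```

Placing frequently used data in page zero or section 0 keeps the generated code on the shorter forms.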
Table 17. STM8 addressing mode overview
Mode                           Syntax           Destination address   Pointer address   Pointer size
Inherent                       NOP
Immediate                      LD A,#$55
Short      Direct              LD A,$10         000000..0000FF
Long       Direct              LD A,$1000       000000..00FFFF
Extended   Direct              LDF A,$100000    000000..FFFFFF
No Offset  Direct   Indexed    LD A,(X)         000000..00FFFF
Short      Direct   Indexed    LD A,($10,X)     000000..0100FE
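The destination range of the short direct indexed mode follows from adding an 8-bit offset to a 16-bit index register. The sketch below assumes that the sum is not truncated to 16 bits, which is what the upper bound 0100FE in Table 17 suggests; the function name is hypothetical.

```python
def short_indexed_address(offset, x):
    """Effective address of a short direct indexed access such as LD A,($10,X).

    Assumption: the 8-bit offset and the 16-bit index are added without
    wrap-around, so the result can reach into section 1.
    """
    if not 0 <= offset <= 0xFF:
        raise ValueError("short offset must fit in 8 bits")
    if not 0 <= x <= 0xFFFF:
        raise ValueError("index register must fit in 16 bits")
    return offset + x


# Largest reachable address: maximum offset plus maximum index value
print(f"{short_indexed_address(0xFF, 0xFFFF):06X}")
```

The maximum value produced by this model equals the upper bound given in the table for this mode.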