RISCOS

Cloe 2

Cloe 2 is a much improved version of Cloe, offering performance approaching that of native code! Although it uses many of the ideas of Cloe 1, it performs as much of the translation as possible before it actually runs the code.

Note that where this document mentions Cloe, it means this new version. The earlier design is always referred to as Cloe 1.

How it works

Cloe requires a start address, just like any application. This address could be the start of the application workspace (&8000), a Relocatable module's initialisation routine, or a vector entry point.

The process has two stages: conversion and execution.

Conversion stage

Starting at the start address, Cloe looks at the first instruction and sorts it into one of six categories:

1. An instruction that does not need to be emulated
2. An instruction that changes the PC, does not need to be emulated, and has a known destination address
3. An instruction that changes the PC, needs to be emulated, and has a known destination address
4. An instruction that changes the PC, but whose destination address is unknown
5. An instruction that does not change the PC, but needs to be emulated
6. An instruction that causes execution to terminate

Examples of each of these follow:

Do not need to be emulated

  MOV     r0,r2
  ADDEQ   r7,r8,r0
  ADD     r0,pc,#32        ; ADR instruction!
  LDMNVFD r13!,{r4-r7,pc}^ ; NV - could translate into MOV r0,r0
  STMFD   r13!,{r4-r7,lr}

PC changes, destination known
```
  B       &9038
  BEQ     &1145c
```
PC changes, destination known, needs emulation
```
  BL      &20148
  BLNE    &13004
```

PC changes, destination unknown

  MOVS    pc,lr
  LDMFD   r13!,{r4-r7,r11,pc}^
  LDR     pc,[r4,#0]

Emulated instructions

  MOV     r2,pc
  ADD     r0,r1,pc
  TEQP    pc,#1<<28

Instructions which terminate
```
  SWI     XOS_Exit
```
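
The six categories can be sketched as a small decoder. The Python below is only an illustration under several assumptions that are not part of the original design: the names `InsnClass` and `classify` are hypothetical, only the common ARM instruction formats are decoded, and a PC operand is treated as needing emulation only when it is the second (register) operand or the instruction is a TEQP-style compare. That rule matches the examples above, but it is a guess at the exact test Cloe would use.

```python
from enum import IntEnum


class InsnClass(IntEnum):
    NATIVE = 1          # runs unmodified
    BRANCH_KNOWN = 2    # B: destination known, no emulation needed
    BRANCH_LINK = 3     # BL: destination known, needs emulation
    BRANCH_UNKNOWN = 4  # PC loaded from a register or from memory
    EMULATED = 5        # reads PC+PSR or writes the PSR, PC unchanged
    TERMINATE = 6       # OS_Exit


OS_EXIT = 0x11          # SWI number of OS_Exit
X_BIT = 1 << 17         # set in the error-returning "X" form of a SWI
PC = 15


def classify(word):
    """Classify one 32-bit ARM instruction word (simplified sketch)."""
    if word >> 28 == 0xF:                      # NV condition: never executed
        return InsnClass.NATIVE
    if (word >> 24) & 0xF == 0xF:              # SWI
        swi = word & 0xFFFFFF & ~X_BIT
        return InsnClass.TERMINATE if swi == OS_EXIT else InsnClass.NATIVE
    top3 = (word >> 25) & 0x7
    is_load = bool(word & (1 << 20))
    rd = (word >> 12) & 0xF
    if top3 == 0b101:                          # B or BL
        return InsnClass.BRANCH_LINK if word & (1 << 24) else InsnClass.BRANCH_KNOWN
    if top3 == 0b100:                          # LDM or STM
        loads_pc = is_load and bool(word & (1 << PC))
        return InsnClass.BRANCH_UNKNOWN if loads_pc else InsnClass.NATIVE
    if top3 in (0b010, 0b011):                 # LDR or STR
        return InsnClass.BRANCH_UNKNOWN if is_load and rd == PC else InsnClass.NATIVE
    if top3 in (0b000, 0b001):                 # data processing
        immediate = top3 == 0b001
        if not immediate and (word & 0x90) == 0x90:
            return InsnClass.NATIVE            # multiply/swap space: not modelled
        opcode = (word >> 21) & 0xF
        is_compare = 8 <= opcode <= 11         # TST, TEQ, CMP, CMN
        if rd == PC:                           # TEQP-style, or a write to the PC
            return InsnClass.EMULATED if is_compare else InsnClass.BRANCH_UNKNOWN
        if not immediate and (word & 0xF) == PC:
            return InsnClass.EMULATED          # PC read as the second operand
        return InsnClass.NATIVE
    return InsnClass.NATIVE                    # coprocessor etc.: not modelled
```

For instance, `classify(0xE1A0200F)` (the encoding of `MOV r2,pc`) falls into class 5, while `classify(0xE28F0020)` (`ADD r0,pc,#32`) stays in class 1.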

For each of these cases, it performs the following tasks:

1. Skips the instruction and goes on to the next.
2. Changes the conversion address to the one the instruction branches to. If the instruction is conditional, it pushes the address of one path on a stack, and continues down the other.
3. Changes the instruction to a SWI which it will recognise, pushes the pointer to the next instruction on the stack, and continues conversion from the instruction which it is linking to.
4. Changes the instruction to a different SWI, and this conversion path stops at this point. If the instruction is conditional, it continues with the instruction afterwards; otherwise, it pops an address off the stack and continues from there. If the stack is empty, it moves on to the execution stage.
5. Changes the instruction to a different SWI, and then continues with the next instruction.
6. Pops an address from the stack or, if there isn't one, moves on to the execution stage.

If, at any point during conversion, Cloe reaches an instruction it has already converted, it stops following that path and pops an address off the stack. This also happens if the path runs beyond the end of the program.
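
The conversion pass described above is essentially a worklist traversal. The following Python is a minimal sketch of it, not the real implementation: the program is modelled as a hypothetical table mapping each address to an already-classified `(class, target, conditional)` entry, the replacement SWI names are placeholders, and for a conditional class 2 branch it assumes the branch target is followed first while the fall-through address is pushed (the text does not say which path is taken first).

```python
def convert(program, start):
    """Walk the program from `start`, returning (converted, patched).

    program   : dict of address -> (cls, target, conditional); hypothetical model
    converted : set of addresses that have been visited
    patched   : dict of address -> placeholder name of the SWI written there
    """
    converted = set()
    patched = {}
    pending = [start]                       # stack of addresses still to follow
    while pending:
        addr = pending.pop()
        # Follow one path until it ends, leaves the program,
        # or meets an instruction that was already converted.
        while addr in program and addr not in converted:
            cls, target, conditional = program[addr]
            converted.add(addr)
            following = addr + 4            # ARM instructions are 4 bytes
            if cls == 1:                    # nothing to do
                addr = following
            elif cls == 2:                  # B: follow the branch
                if conditional:
                    pending.append(following)
                addr = target
            elif cls == 3:                  # BL: patch, come back later
                patched[addr] = "SWI_branch_link"
                pending.append(following)
                addr = target
            elif cls == 4:                  # unknown destination: path ends
                patched[addr] = "SWI_unknown_branch"
                if not conditional:
                    break
                addr = following
            elif cls == 5:                  # emulated, PC unchanged
                patched[addr] = "SWI_emulate"
                addr = following
            else:                           # class 6: execution terminates
                break
    return converted, patched
```

Because every address is marked as converted before its successors are followed, loops in the program terminate the walk rather than repeating it.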

For example, take the following code:

code
  BL      init
  BL      main
  BL      close
  SWI     XOS_Exit
main
  MOV     r1,#'A'
  MOV     r9,#7
  STMFD   r13!,{lr}        ; Save the return address once, outside the loop
loop
  TST     r9,#1
  BLEQ    print_lowercase
  BLNE    print_uppercase
  SUBS    r9,r9,#1
  ADD     r1,r1,#1
  BGT     loop
  LDMFD   r13!,{pc}^
init
  SWI     &20100+22
  SWI     &20100+0 ; Go to mode 0
  MOVS    pc,lr
close
  SWI     &20004 ; Wait for a key
  MOVS    pc,lr
print_lowercase
  ORR     r0,r1,#32
  SWI     XOS_WriteC       ; Write the character in r0
  MOVS    pc,lr
print_uppercase
  BIC     r0,r1,#32
  SWI     XOS_WriteC       ; Write the character in r0
  MOVS    pc,lr

For this code, Cloe will convert it into:

code
  SWI     Cloe_BL          ; Class 3 - converts 'init', and then continues
  SWI     Cloe_BL          ; Class 3 - converts 'main', and then continues
  SWI     Cloe_BL          ; Class 3 - converts 'close', and then continues
  SWI     XOS_Exit         ; Class 6 - left unchanged; this path ends
main
  MOV     r1,#'A'          ; Ignored
  MOV     r9,#7            ; Ignored
  STMFD   r13!,{lr}        ; Ignored
loop
  TST     r9,#1            ; Ignored
  SWIEQ   Cloe_BL          ; Class 3 - converts 'print_lowercase' and continues
  SWINE   Cloe_BL          ; Class 3 - converts 'print_uppercase' and continues
  SUBS    r9,r9,#1         ; Ignored
  ADD     r1,r1,#1         ; Ignored
  BGT     loop             ; Class 2 - 'loop' already converted, so continues
  SWI     Cloe_PullStack   ; Class 4 - destination unknown
init
  SWI     &20100+22        ; Ignored
  SWI     &20100+0         ; Ignored
  SWI     Cloe_ALU         ; Class 4 - destination unknown
close
  SWI     &20004           ; Ignored
  SWI     Cloe_ALU         ; Class 4 - destination unknown
print_lowercase
  ORR     r0,r1,#32        ; Ignored
  SWI     XOS_WriteC       ; Ignored
  SWI     Cloe_ALU         ; Class 4 - destination unknown
print_uppercase
  BIC     r0,r1,#32        ; Ignored
  SWI     XOS_WriteC       ; Ignored
  SWI     Cloe_ALU         ; Class 4 - destination unknown

Execution

Once Cloe has gone through all the reachable instructions working out what needs to be emulated, it starts executing at the start of the program. When execution reaches one of the class 3 or 5 instructions, Cloe emulates the instruction and then resumes native ARM execution: at the branch target for class 3, or at the instruction afterwards for class 5. Class 2 branches are left unchanged and run natively.

When it reaches one of the class 4 instructions, it emulates it, and then works out where the program counter would continue. Using this address, it starts the conversion process again. Once the conversion is complete, it starts executing the code at that address.

This allows the following to occur:

; ...
  ADR     r0,other_routine
  MOV     lr,pc
  MOV     pc,r0
  LDMFD   r13!,{pc}^
; ...
other_routine
  SWI     &20120
  MOVS    pc,lr
; ...

When it reaches the MOV pc,r0 (which is a class 4 instruction), it is able to establish that r0 points to other_routine (which it has yet to convert), so Cloe starts its conversion at other_routine, and continues until it reaches the end of that path. In this example, this is at the MOVS pc,lr instruction. When it has done this, it starts executing at other_routine.

Of course, if Cloe has already converted other_routine, then it does not need to do it again, and so it will just execute at other_routine.
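
The behaviour at a class 4 instruction can be sketched as a small handler. This is only an illustration, assuming a hypothetical `convert_from` callback that converts the code reachable from an address and returns the set of addresses it covered; the class and method names are invented for the example.

```python
class LazyConverter:
    """Converts code on demand when an unknown-destination branch is hit."""

    def __init__(self, convert_from):
        self.convert_from = convert_from   # hypothetical conversion callback
        self.converted = set()             # addresses converted so far
        self.passes = 0                    # number of conversion passes run

    def on_unknown_branch(self, destination):
        """Called after a class 4 instruction has been emulated.

        `destination` is the address the emulated instruction computed.
        Returns the address at which native execution should resume.
        """
        if destination not in self.converted:
            self.passes += 1
            self.converted.add(destination)
            self.converted |= self.convert_from(destination)
        return destination
```

A second branch to the same routine finds the destination already marked, so no further conversion pass is run and execution resumes immediately.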

Overheads

There are two main overheads - memory, and processor bandwidth.

In order to find out whether it has already converted an instruction, Cloe requires 1 bit per ARM instruction, that is, 1 bit per 32 bits (4 bytes) of program memory. This works out at 1K per 32K of program. It is rare to find a program that is larger than 600K, and even that would require only 19K. This memory is required throughout the life of the program, but only needs to be paged in when the code is being emulated.
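
As a rough check of those figures, the bitmap can be sketched as follows. The names `bitmap_bytes` and `ConvertedBitmap` are hypothetical, and the layout (least significant bit first within each byte) is an assumption made for the example.

```python
WORD_BYTES = 4  # every ARM instruction is one 32-bit word


def bitmap_bytes(program_bytes):
    """Bytes of bitmap needed at 1 bit per instruction (rounded up)."""
    words = -(-program_bytes // WORD_BYTES)
    return -(-words // 8)


class ConvertedBitmap:
    """One 'already converted' flag per instruction word of a program."""

    def __init__(self, base, program_bytes):
        self.base = base                               # load address of the program
        self.bits = bytearray(bitmap_bytes(program_bytes))

    def _locate(self, addr):
        index = (addr - self.base) // WORD_BYTES       # instruction number
        return index >> 3, 1 << (index & 7)            # byte offset, bit mask

    def is_converted(self, addr):
        byte, mask = self._locate(addr)
        return bool(self.bits[byte] & mask)

    def mark(self, addr):
        byte, mask = self._locate(addr)
        self.bits[byte] |= mask
```

Here `bitmap_bytes(32 * 1024)` gives 1024 bytes (1K), and `bitmap_bytes(600 * 1024)` gives 19200 bytes, which is 18.75K and rounds to the 19K quoted above.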

Cloe would also require some stack in order to perform the conditional conversion. This stack is only required during the conversion phase. The stack requirement would depend on the complexity of the program and the number of procedures it uses.

In addition, Cloe needs to store the original instructions and their addresses in some form of memory. This memory would also be required during the life of the program.

In terms of processor overhead, the two main costs are when Cloe is performing the conversion stage, and when it is actually emulating instructions.

Initial tests show that the conversion code can be very fast - up to around 16MBytes/second, or 4MIPS. This means that a 200K program would be converted in about 0.012 seconds, under ideal conditions.

The emulation would be considerably slower, possibly at around 1MIPS.

Furthermore, a typical 'C' program of 83K requires 2300 emulated instructions, or around 10% of instructions need to be emulated.

Note that these tests were only performed on a bulk system (i.e. no code-following was employed), and with code-following the number of emulated instructions should be much lower. Also, a 200MHz StrongARM processor was used.

Using these figures, a 500MHz processor would convert a 200K program in about 0.005 seconds, and would run at an average speed of 452MIPS, or 90.4% of real processor speed!

I shall have some more up to date timings when I get the chance to write the system, and emulator, properly.

Date last modified: 2019-10-19 17:11:13