Cloe 2
Cloe 2 is a much improved Cloe, offering performance approaching that of native code! Although it uses many of the ideas of Cloe, it performs as much of the translation as possible before it actually calls the code.
Note that where this document mentions Cloe, it refers to this new version. Where it mentions Cloe 1, it means the earlier design.
How it works
Cloe requires a start address, just like any application. This address could be the start of the application workspace (&8000), a relocatable module's initialisation routine, or a vector entry point.
The process takes two stages - the conversion, and the execution.
Conversion stage
Starting at the start address, Cloe would look at the first instruction, and organise it into one of six categories:
- An instruction that does not need to be emulated
- An instruction that causes the PC to change, where the destination address is known and the instruction does not need to be emulated
- An instruction that causes the PC to change, where the destination address is known but the instruction needs to be emulated
- An instruction that causes the PC to change but the destination address is unknown
- An instruction that doesn't cause the PC to change, but needs to be emulated
- An instruction which causes execution to terminate
Examples of each of these follow:
- Do not need to be emulated
    MOV r0,r2
    ADDEQ r7,r8,r0
    ADD r0,pc,#32               ; ADR instruction!
    LDMNVFD r13!,{r4-r7,pc}^    ; NV - could translate into MOV r0,r0
    STMFD r13!,{r4-r7,lr}
- PC changes, destination known
    B &9038
    BEQ &1145c
- PC changes, destination known, needs emulation
    BL &20148
    BLNE &13004
- PC changes, destination unknown
    MOVS pc,lr
    LDMFD r13!,{r4-r7,r11,pc}^
    LDR pc,[r4,#0]
- Emulated instructions
    MOV r2,pc
    ADD r0,r1,pc
    TEQP pc,#1<<28
- Instructions which terminate
    SWI XOS_Exit
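As a rough illustration of how such a classifier might look, here is a minimal Python sketch that sorts a 32-bit ARM instruction word into the six categories. It is not Cloe's actual code. The names `InsnClass` and `classify` are hypothetical, and the rule used for class 5 (PC used as the second operand, or a TEQP-style PSR write) is an assumption inferred from the examples above, on the basis that on 26-bit ARMs those are the cases that expose or alter the PSR. Multiply, swap, coprocessor and store-of-PC cases are deliberately treated as class 1 to keep the sketch short.

```python
from enum import IntEnum

class InsnClass(IntEnum):
    NO_EMULATION = 1        # runs natively, left untouched
    PC_KNOWN = 2            # B with a known destination
    PC_KNOWN_EMULATED = 3   # BL with a known destination
    PC_UNKNOWN = 4          # PC written from a register or from memory
    EMULATED = 5            # PC is not written, but the PSR is read or written
    TERMINATE = 6           # SWI OS_Exit / XOS_Exit

OS_EXIT = 0x11              # RISC OS SWI number of OS_Exit
X_BIT = 0x20000             # set in the error-returning (X) form of a SWI

def classify(word: int) -> InsnClass:
    """Classify one ARM instruction word (simplified, 26-bit era encodings)."""
    if (word >> 28) == 0xF:                      # NV condition: never executes
        return InsnClass.NO_EMULATION
    top2 = (word >> 26) & 0x3
    if top2 == 0:                                # data processing space
        is_imm = bool(word & (1 << 25))
        if not is_imm and (word & 0x90) == 0x90:
            return InsnClass.NO_EMULATION        # multiply/swap: simplification
        opcode = (word >> 21) & 0xF
        rd = (word >> 12) & 0xF
        if 8 <= opcode <= 11:                    # TST, TEQ, CMP, CMN
            if rd == 15:                         # P suffix: writes the PSR
                return InsnClass.EMULATED
        elif rd == 15:                           # e.g. MOVS pc,lr
            return InsnClass.PC_UNKNOWN
        if not is_imm and (word & 0xF) == 15:    # Rm = pc: assumed to need emulation
            return InsnClass.EMULATED
        return InsnClass.NO_EMULATION
    is_load = bool(word & (1 << 20))
    if top2 == 1:                                # LDR / STR
        if is_load and ((word >> 12) & 0xF) == 15:
            return InsnClass.PC_UNKNOWN
        return InsnClass.NO_EMULATION
    group = (word >> 25) & 0x7
    if group == 0b100:                           # LDM / STM
        if is_load and (word & 0x8000):          # pc is in the register list
            return InsnClass.PC_UNKNOWN
        return InsnClass.NO_EMULATION
    if group == 0b101:                           # B / BL; bit 24 is the link bit
        if word & (1 << 24):
            return InsnClass.PC_KNOWN_EMULATED
        return InsnClass.PC_KNOWN
    if ((word >> 24) & 0xF) == 0xF:              # SWI
        if ((word & 0xFFFFFF) & ~X_BIT) == OS_EXIT:
            return InsnClass.TERMINATE
    return InsnClass.NO_EMULATION
```

For instance, `classify(0xE1B0F00E)` (the encoding of MOVS pc,lr) falls into class 4, while `classify(0xE28F0020)` (ADD r0,pc,#32) stays in class 1 because the PC is the first operand rather than the second.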
For each of these cases, it performs the following tasks:
- Skip the instruction; go on to the next
- Changes the emulation address to that pointed to by the instruction. If the instruction is conditional, then it pushes the address of the other path onto a stack, and continues down one path.
- Changes the instruction to a SWI which it will recognise, pushes the pointer to the next instruction on the stack, and continues emulation from the instruction which it is linking to.
- Changes the instruction to a different SWI, and this execution path stops at this point. If the instruction is conditional, then it continues with the instruction afterwards, otherwise, it pops an address off the stack and continues from there. If the stack is empty, then it moves on to the execution stage
- Changes the instruction to a different SWI, and then continues with the next instruction
- Pulls an address from the stack, or if there isn't one, moves on to the execution stage
If, at any point during the conversion, it reaches an instruction it has already converted, it will stop following that path, and pop an address off the stack. This also happens if the path takes it beyond the end of the program.
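The walk described above can be sketched as a small work-list loop. The following Python is only an illustration under stated assumptions: instructions are modelled abstractly by their class number rather than decoded from memory, `Insn` and `convert` are hypothetical names, and apart from `Cloe_BL` the patch names (`Cloe_SetPC`, `Cloe_Emulate`) are placeholders. For a conditional class 2 branch, the sketch follows the branch target and pushes the fall-through address; the text does not say which path is taken first.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple

@dataclass
class Insn:
    cls: int                       # category 1..6, in the order listed above
    target: Optional[int] = None   # known destination (classes 2 and 3 only)
    conditional: bool = False

def convert(program: Dict[int, Insn], start: int,
            converted: Optional[Set[int]] = None,
            patches: Optional[Dict[int, str]] = None
            ) -> Tuple[Set[int], Dict[int, str]]:
    """Follow the code from `start`, recording visited addresses and patches."""
    converted = set() if converted is None else converted
    patches = {} if patches is None else patches
    stack = []                     # addresses of paths still to be followed
    addr: Optional[int] = start
    while True:
        # A path ends when it stops, leaves the program, or meets converted code.
        if addr is None or addr not in program or addr in converted:
            if not stack:
                return converted, patches      # nothing left: go and execute
            addr = stack.pop()
            continue
        converted.add(addr)
        insn = program[addr]
        following = addr + 4                   # ARM instructions are 4 bytes
        if insn.cls == 1:                      # nothing to do
            addr = following
        elif insn.cls == 2:                    # B: follow it, remember fall-through
            if insn.conditional:
                stack.append(following)
            addr = insn.target
        elif insn.cls == 3:                    # BL: patch, follow the call
            patches[addr] = "SWI Cloe_BL"
            stack.append(following)
            addr = insn.target
        elif insn.cls == 4:                    # destination unknown: path ends
            patches[addr] = "SWI Cloe_SetPC"   # placeholder name
            addr = following if insn.conditional else None
        elif insn.cls == 5:                    # emulate in place, carry on
            patches[addr] = "SWI Cloe_Emulate" # placeholder name
            addr = following
        else:                                  # class 6: execution terminates
            addr = None
```

Passing in an existing `converted` set lets the same routine be re-entered later from a new address without redoing work already done.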
For example, take the following code:
    code
        BL      init
        BL      main
        BL      close
        SWI     XOS_Exit

    main
        MOV     r1,#'A'
        MOV     r9,#7
    loop
        STMFD   r13!,{lr}
        TST     r9,#1
        BLEQ    print_lowercase
        BLNE    print_uppercase
        SUBS    r9,r9,#1
        ADD     r1,r1,#1
        BGT     loop
        LDMFD   r13!,{pc}^

    init
        SWI     &20100+22
        SWI     &20100+0            ; Go to mode 0
        MOVS    pc,lr

    close
        SWI     &20004              ; Wait for a key
        MOVS    pc,lr

    print_lowercase
        ORR     r0,r1,#32
        SWI     XOS_Write0
        MOVS    pc,lr

    print_uppercase
        BIC     r0,r0,#32
        SWI     XOS_Write0
        MOVS    pc,lr
For this code, Cloe will convert it into:
    code
        SWI     Cloe_BL             ; Class 3 - emulates 'init', and then continues
        SWI     Cloe_BL             ; Class 3 - emulates 'main', and then continues
        SWI     Cloe_BL             ; Class 3 - emulates 'close', and then continues
        SWI     XOS_Exit            ; Class 6 - ignored

    main
        MOV     r1,#'A'             ; Ignored
        MOV     r9,#7               ; Ignored
    loop
        STMFD   r13!,{lr}           ; Ignored
        TST     r9,#1               ; Ignored
        SWIEQ   Cloe_BL             ; Class 3 - emulates 'print_lowercase' and continues
        SWINE   Cloe_BL             ; Class 3 - emulates 'print_uppercase' and continues
        SUBS    r9,r9,#1            ; Ignored
        ADD     r1,r1,#1            ; Ignored
        BGT     loop                ; Class 2 - already emulated 'loop', so continues
        SWI     Cloe_PullStack      ; Class 4 - destination unknown

    init
        SWI     &20100+22           ; Ignored
        SWI     &20100+0            ; Ignored
        SWI     Cloe_ALU            ; Class 4 - destination unknown

    close
        SWI     &20004              ; Ignored
        SWI     Cloe_ALU            ; Class 4 - destination unknown

    print_lowercase
        ORR     r0,r1,#32           ; Ignored
        SWI     XOS_Write0          ; Ignored
        SWI     Cloe_ALU            ; Class 4 - destination unknown

    print_uppercase
        BIC     r0,r0,#32           ; Ignored
        SWI     XOS_Write0          ; Ignored
        SWI     Cloe_ALU            ; Class 4 - destination unknown
Execution
Once Cloe has gone through all the instructions working out what needs to be emulated, it starts executing at the start of the program. When it reaches one of the class 3 or 5 instructions, it emulates the instruction, and then resumes native ARM execution (at the following instruction for class 5, or at the branch destination for class 3). Class 2 branches were left unchanged by the conversion, so they simply run natively.
When it reaches one of the class 4 instructions, it emulates it, and then works out where the program counter would continue. Using this address, it starts the conversion process again. After emulating, it starts executing the code at that address.
This allows the following to occur:
    ; ...
        ADR     r0,other_routine
        MOV     lr,pc
        MOV     pc,r0
        LDMFD   r13!,{pc}^
    ; ...
    other_routine
        SWI     &20120
        MOVS    pc,lr
    ; ...
When it reaches the MOV pc,r0 (which is a class 4 instruction), it is able to establish that r0 points to other_routine (which it has yet to emulate), so Cloe starts its conversion at other_routine, and continues until it reaches the end. In this example, this is at the MOVS pc,lr instruction. When it has done this, it starts executing at other_routine.
Of course, if Cloe has already converted other_routine, then it does not need to do it again, and so it will just execute at other_routine.
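The "convert only on first visit" behaviour can be sketched as follows. This is a hypothetical outline, not Cloe's implementation: `LazyConverter`, `resolve` and the `convert_from` callback are invented names, and the callback simply stands in for the conversion stage by returning the set of addresses it converted.

```python
from typing import Callable, Set

class LazyConverter:
    """Runs the conversion stage on demand when a class 4 target is reached."""

    def __init__(self, convert_from: Callable[[int], Set[int]]):
        # convert_from(addr) is an assumed hook: it converts the code reachable
        # from addr and returns the addresses it handled.
        self.convert_from = convert_from
        self.converted: Set[int] = set()
        self.passes = 0            # how many conversion passes were needed

    def resolve(self, target: int) -> int:
        """Called after emulating a class 4 instruction; returns where to run."""
        if target not in self.converted:
            self.converted |= self.convert_from(target)
            self.passes += 1
        return target              # native execution continues from here
```

In the example above, the first MOV pc,r0 that lands on other_routine would trigger one conversion pass; later calls to the same routine would go straight to execution.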
Overheads
There are two main overheads - memory, and processor bandwidth.
In order to find out if it has already converted an instruction, Cloe will require 1 bit per ARM instruction, i.e. 1 bit per 32 bits (4 bytes) of program memory. This works out at 1K per 32K of program. It is rare to find a program that is larger than 600K, and even that would require only 19K. This memory is required throughout the life of the program, but only needs to be paged in when the code is being converted.
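A minimal sketch of such a bitmap, assuming one bit per 4-byte instruction word and a contiguous program starting at a known base address, might look like this. The class name `ConvertedBitmap` and its methods are hypothetical.

```python
class ConvertedBitmap:
    """One bit per 4-byte ARM instruction, packed into a bytearray."""

    def __init__(self, base: int, size_bytes: int):
        self.base = base
        self.words = (size_bytes + 3) // 4             # number of instructions
        self.bits = bytearray((self.words + 7) // 8)   # 8 instructions per byte

    def _locate(self, addr: int):
        word = (addr - self.base) >> 2
        if not 0 <= word < self.words:
            raise ValueError("address outside the program")
        return word >> 3, 1 << (word & 7)              # byte index, bit mask

    def mark(self, addr: int) -> None:
        index, mask = self._locate(addr)
        self.bits[index] |= mask

    def is_converted(self, addr: int) -> bool:
        index, mask = self._locate(addr)
        return bool(self.bits[index] & mask)
```

With this layout a 32K program needs `len(ConvertedBitmap(0x8000, 32 * 1024).bits)` = 1024 bytes, and a 600K program needs 19200 bytes, which matches the 1K and roughly 19K figures above.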
Cloe would also require some stack in order to perform the conditional conversion. This stack is only required during the conversion phase. The stack requirement would depend on the complexity (and number of procedures) the program uses.
In addition, Cloe also needs to store the original instructions and their addresses to some form of memory. This memory would also be required during the life of the program.
In terms of processor overhead, the two main overheads are when Cloe is performing the conversion section, and when it is actually emulating instructions.
Initial tests show that the conversion code can be very fast - up to around 16MBytes/second, or 4MIPS. This means that a 200K program would be converted in 0.06 seconds, under ideal conditions.
The emulation would be considerably slower, possibly at around 1MIPS.
Furthermore, a typical 'C' program of 83K contains around 2300 instructions that need to be emulated, or around 10% of its instructions.
Note that these tests were only performed on a bulk system (i.e. no code-following was employed), so with code-following the number of emulated instructions should be much lower. Also, a 200MHz StrongARM processor was used.
Using these figures, a 500MHz processor would convert a 200K program in 0.02 seconds, and would run at an average speed of 452MIPS, or 90.4% of real processor speed!
I shall have some more up to date timings when I get the chance to write the system, and emulator, properly.