RISC OS
and the 26-bit question
What's the problem? (a brief history)
RISC OS was originally developed on the ARM 2 processor, whose program counter also doubled as the processor status, as shown in the table below:
Bit | 31 | 30 | 29 | 28 | 27 | 26 | 25..2 | 1..0
Use | N | Z | C | V | I | F | 24-bit program counter (word aligned) | Mode
With this arrangement, it was possible to have a 26-bit address space (24 bits of word-aligned address), capable of accessing up to 64MBytes of memory. At the time, this was considered a lot, but by the advent of the ARM6, memory had become a much cheaper commodity, and so a 32-bit mode was introduced, with a separate register to hold the processor flags. Existing applications could still run because the processor could be put into a special 26-bit mode, where the program counter also held the processor status bits, as in the ARM 2.
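As a rough illustration of the layout in the table, the combined register can be pulled apart with a few masks. This is only a Python sketch; the function name `split_r15` and the dictionary keys are made up for the example.

```python
# Bit positions of the status flags in the ARM 2 combined register (R15).
FLAG_BITS = {"N": 31, "Z": 30, "C": 29, "V": 28, "I": 27, "F": 26}
PC_MASK = 0x03FFFFFC    # bits 25..2: word-aligned program counter
MODE_MASK = 0x00000003  # bits 1..0: processor mode

# 24 bits of word address give 2**26 addressable bytes.
ADDRESS_SPACE = 1 << 26


def split_r15(r15):
    """Split a 26-bit style R15 value into flags, address and mode."""
    fields = {name: (r15 >> bit) & 1 for name, bit in FLAG_BITS.items()}
    fields["pc"] = r15 & PC_MASK
    fields["mode"] = r15 & MODE_MASK
    return fields


fields = split_r15(0x60008003)  # Z and C set, address 0x8000, mode 3
```

Because the flags occupy the top six bits and the mode the bottom two, nothing above bit 25 is left for addressing, which is where the 64MByte limit comes from.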
Since RISC OS uses the 26-bit program counter, applications also use this mode of operation, as RISC OS does not support any kind of mode switching. Only the FIQ vector operates in 32-bit mode.
As 32-bit mode is generally more desirable (you get up to 4GBytes of memory space), future processors will not have the 26-bit mode as an option. This means that RISC OS will not work on those future processors.
What's the solution?
There are five possible solutions:
- Use another OS
  This would mean that the current applications would not run on the new OS, and development for RISC OS would just dwindle.
- Rewrite RISC OS
  This would be a mammoth job - and the existing applications would have to be rewritten (or possibly just recompiled).
- Create some code convertor
  This would convert 26-bit applications into 32-bit applications. There are some problems with this, as discussed below.
- Emulate 26-bit operation
  This would be slow, but as processors get faster, the emulation speed would increase.
- Have two processors
  There would be one fast processor which would perform 32-bit mode operations, and one slower processor that could do some 32-bit mode operations, but also 26-bit mode operations. This would increase the cost of machines, and timing would have to be carefully judged, but it is not impossible. This is not discussed here, but can be later on...
A code converter
This would be the neatest solution - simply pop your existing code into a 'magic' convertor, and out pops a 32-bit application. However, such a task is not easy, and may indeed not be possible.
Consider the ARM instruction:
        MOVS    pc,lr
This becomes the hexadecimal sequence 0xE1B0F00E. If an automatic convertor saw this instruction, it would convert it into whatever instruction performs the same function as it did when running under 26-bit (a possible method is discussed below).
Now, if you had a block of raw data, and one of the words in the raw data just so happened to be 0xE1B0F00E, then that word would also be changed.
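To make the problem concrete, here is a small Python sketch of the kind of naive pattern match such a convertor might start from. The function name `find_candidates` and the three-word test image are invented for the example; the image is assumed to be little-endian.

```python
import struct

MOVS_PC_LR = 0xE1B0F00E  # encoding of MOVS pc,lr


def find_candidates(image):
    """Return the byte offset of every word that equals MOVS pc,lr."""
    count = len(image) // 4
    words = struct.unpack("<%dI" % count, image[:count * 4])
    return [index * 4 for index, word in enumerate(words)
            if word == MOVS_PC_LR]


# MOV r0,#0, then a genuine MOVS pc,lr, then a data word with the same value.
image = struct.pack("<3I", 0xE3A00000, 0xE1B0F00E, 0xE1B0F00E)
matches = find_candidates(image)
```

Both offset 4 (the real instruction) and offset 8 (the data word) are reported, and nothing in the word itself says which is which.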
There is the possibility of checking around that instruction, to see if it looks like it is code, but even that can fail:
init    STMFD   r13!,{r14}
        MOV     r0,#0
        LDR     r1,count
loop    BL      debug_routine
        SUBS    r1,r1,#1
        BPL     loop
        LDMFD   r13!,{pc}^
count   DCD     128
debug_routine
        [ debug=1
        SWI     XDebug_Code
        ]
        MOVS    pc,lr
data    DCD     0
In this example, the convertor may correctly identify the LDMFD r13!,{pc}^, but if the debug variable was not one, then the MOVS pc,lr would be surrounded by words which 'look' more like data than ARM instructions. If you were to disassemble it, it would become:
        ANDEQ   r0,r0,r0,LSL#1
        MOVS    pc,lr
        ANDEQ   r0,r0,r0
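The disassembly above can be reproduced with a very small decoder. The Python sketch below is deliberately incomplete: it only handles data-processing instructions whose second operand is a register with an immediate shift, and returns None for anything else. The names `disassemble`, `CONDS`, `OPS` and `REGS` are invented for the example.

```python
CONDS = ["EQ", "NE", "CS", "CC", "MI", "PL", "VS", "VC",
         "HI", "LS", "GE", "LT", "GT", "LE", "", "NV"]
OPS = ["AND", "EOR", "SUB", "RSB", "ADD", "ADC", "SBC", "RSC",
       "TST", "TEQ", "CMP", "CMN", "ORR", "MOV", "BIC", "MVN"]
SHIFTS = ["LSL", "LSR", "ASR", "ROR"]
REGS = ["r%d" % n for n in range(14)] + ["lr", "pc"]


def disassemble(word):
    """Decode a register-operand data-processing word, or return None."""
    if (word >> 25) & 7 != 0 or (word >> 4) & 1 != 0:
        return None                    # not this instruction form
    opcode = (word >> 21) & 15
    amount, kind = (word >> 7) & 31, (word >> 5) & 3
    if 8 <= opcode <= 11:
        return None                    # comparisons are left out of the sketch
    if amount == 0 and kind != 0:
        return None                    # LSR #32, ASR #32 and RRX are left out
    name = OPS[opcode] + CONDS[word >> 28] + ("S" if (word >> 20) & 1 else "")
    rn, rd, rm = (word >> 16) & 15, (word >> 12) & 15, word & 15
    operand = REGS[rm]
    if amount:
        operand += ",%s#%d" % (SHIFTS[kind], amount)
    if OPS[opcode] in ("MOV", "MVN"):  # these take no first operand
        return "%s %s,%s" % (name, REGS[rd], operand)
    return "%s %s,%s,%s" % (name, REGS[rd], REGS[rn], operand)


# The words around the routine when debug is not 1: DCD 128, MOVS pc,lr, DCD 0
listing = [disassemble(word) for word in (0x00000080, 0xE1B0F00E, 0x00000000)]
```

Small constants such as 128 and 0 decode as perfectly legal (if pointless) ANDEQ instructions, so "does it disassemble?" is a weak test of whether a word is code.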
You could, of course, have part of the convertor check where program branches jump to. However, that would fail if the following occurs:
jump_table
        DCD     t_code1-t_start
        DCD     t_code2-t_start

init    ADR     r0,jump_table         ; r0 is pointer to the vector table
        ADR     r1,t_start            ; r1 is the pointer to the code start
        MOV     r2,#1                 ; r2 is the vector number
        BL      jump_vector
        ; ...

jump_vector
        STMFD   r13!,{r14}
        LDR     r0,[r0,r2,LSL#2]      ; read the vector offset
        MOV     lr,pc                 ; 'Fake' a BL-type instruction
        ADD     pc,r0,r1              ; and jump to it
vector_return                         ; This is where the vector returns
        MOVVC   r0,#0                 ; Clear r0 if no error
        LDMFD   r13!,{pc}^
        ; ...

t_start
        ; ...
data1   DCD     0
t_code2 MOVS    pc,lr
data2   DCD     0
        ;...
Here, the automatic convertor wouldn't know what to do with it, unless it had some form of built-in emulator to work out what is happening. This may seem like a contrived example, but there could be many more complicated ways of doing the same thing.
The emulation option
In theory, any computer can emulate any other computer - it may not be as fast, but it'll still emulate it.
This solution is to emulate 26-bit mode from within a 32-bit mode. There are two ways of doing this: one is a standard emulation, the other I'm calling the "Code Lookahead Optimal Emulator" (or CLOE), which will be discussed later.
In standard emulation, the emulator would 'pretend' to have 15 registers, and one program counter. Some of these may have a direct relationship with the actual registers (i.e. they would not be virtual registers), but others may be virtual registers - the program counter being one of those.
For each instruction, the emulator would work out what the instruction did, and would perform it on its set of registers. This would be quite slow (it would be good performance if a 20:1 ratio could be achieved), but the 26-bit programs would work. When a SWI is called, control is passed from the emulator to the OS, and when the SWI returns, the emulator resumes. However, parts of the OS may still be 26-bit and hence use the emulator...
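A standard emulator of this kind is essentially a fetch, decode and execute loop over a set of virtual registers. The Python sketch below is a toy under heavy assumptions: it understands only MOV and SUB with an immediate operand and a plain B, models only the N and Z flags, and knows only the AL, EQ and NE conditions. The class name `Emulator26` and its methods are invented for the example.

```python
FLAG_N, FLAG_Z = 1 << 31, 1 << 30
PC_MASK = 0x03FFFFFC  # 26-bit address space, word aligned


class Emulator26:
    """Toy interpreter: MOV/SUB with an immediate operand, and B."""

    def __init__(self, words):
        self.mem = list(words)  # program image as 32-bit words
        self.regs = [0] * 15    # virtual r0-r14
        self.pc = 0             # virtual program counter
        self.flags = 0          # virtual status bits (only N and Z modelled)

    def condition_passed(self, cond):
        if cond == 0xE:                      # AL
            return True
        if cond == 0x0:                      # EQ
            return bool(self.flags & FLAG_Z)
        if cond == 0x1:                      # NE
            return not self.flags & FLAG_Z
        raise NotImplementedError("condition %X" % cond)

    def data_immediate(self, word):
        opcode, set_flags = (word >> 21) & 15, (word >> 20) & 1
        rn, rd = (word >> 16) & 15, (word >> 12) & 15
        rotate = 2 * ((word >> 8) & 15)
        imm = word & 0xFF
        value = ((imm >> rotate) | (imm << (32 - rotate))) & 0xFFFFFFFF
        if rn == 15 or rd == 15:
            raise NotImplementedError("R15 operands")
        if opcode == 0b1101:                 # MOV
            result = value
        elif opcode == 0b0010:               # SUB
            result = (self.regs[rn] - value) & 0xFFFFFFFF
        else:
            raise NotImplementedError("opcode %d" % opcode)
        self.regs[rd] = result
        if set_flags:
            self.flags = (result & FLAG_N) | (FLAG_Z if result == 0 else 0)

    def step(self):
        word = self.mem[self.pc // 4]
        next_pc = (self.pc + 4) & PC_MASK
        if self.condition_passed(word >> 28):
            kind = (word >> 24) & 0xF
            if kind == 0b1010:               # B (no link)
                offset = word & 0x00FFFFFF
                if offset & 0x00800000:      # sign-extend the 24-bit offset
                    offset -= 1 << 24
                next_pc = (self.pc + 8 + offset * 4) & PC_MASK
            elif kind >> 1 == 0b001:         # data processing, immediate
                self.data_immediate(word)
            else:
                raise NotImplementedError("instruction %08X" % word)
        self.pc = next_pc

    def run(self):
        steps = 0
        while self.pc // 4 < len(self.mem):  # stop when we run off the end
            self.step()
            steps += 1
        return steps


# MOV r7,#3 / loop: SUBS r7,r7,#1 / BNE loop
emulator = Emulator26([0xE3A07003, 0xE2577001, 0x1AFFFFFD])
steps = emulator.run()
```

Even in this cut-down form, every emulated instruction costs a fetch, several shifts and masks and a chain of comparisons on the host, which gives a feel for why a large slowdown is to be expected.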
Code Lookahead Optimal Emulator
Since only a small subset of the ARM instruction set needs to be emulated, it would be preferable if the code is emulated where it needs to be emulated, and run naturally when it does not. In order to describe CLOE, it is best to give an example:
init    MOV     r0,#0
        MOV     r1,#8
        MOV     r7,#32
loop    ADR     r2,text
        BL      print_text
        SUBS    r7,r7,#1
        BGT     loop
        SWI     XOS_Exit
print_text
        STMFD   r13!,{r0-r2,lr}
        MOV     r0,r2
        SWI     XOS_Write0
        LDMFD   r13!,{r0-r2,pc}^
CLOE starts off at the first instruction. It looks at it, and decides whether or not it needs to be emulated. In this case, it doesn't, and so CLOE looks at the next instruction. This continues until it finds an instruction in one of the following classes:
- Branch
- Branch and link
- Any instruction which reads the PC's status bits
- Any instruction which modifies the PC
- One of the CLOE SWI calls
- Any SWI that would cause the program to exit
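The classes listed above could be recognised with a classifier along the following lines. This Python sketch covers only the common encodings and is not a complete decoder: it treats a register operand of R15 as "reads the status bits", treats any write to R15 (including the TEQP family) as "modifies the PC", and ignores multiply and swap. The function name `classify`, the returned strings and the SWI chunk number `CLOE_SWI_BASE` are all made up; OS_Exit is SWI &11 and the X bit is bit 17.

```python
X_BIT = 0x20000
OS_EXIT = 0x11
CLOE_SWI_BASE = 0x5C100  # hypothetical SWI chunk for CLOE's own calls


def classify(word):
    """Return the class that makes CLOE stop scanning, or None."""
    top = (word >> 25) & 7
    load = bool(word & (1 << 20))
    if top == 0b101:                          # B / BL
        return "branch_link" if word & (1 << 24) else "branch"
    if (word >> 24) & 0xF == 0xF:             # SWI
        number = word & 0x00FFFFFF & ~X_BIT
        if number == OS_EXIT:
            return "exit_swi"
        if CLOE_SWI_BASE <= number < CLOE_SWI_BASE + 64:
            return "cloe_swi"
        return None
    if top in (0b000, 0b001):                 # data processing
        if top == 0 and (word & 0x90) == 0x90:
            return None                       # multiply or swap, not handled
        if (word >> 12) & 0xF == 15:
            return "writes_pc"                # Rd is R15
        if top == 0 and word & 0xF == 15:
            return "reads_psr"                # Rm is R15: PC plus status bits
        return None
    if top in (0b010, 0b011):                 # LDR / STR
        if (word >> 12) & 0xF == 15:
            return "writes_pc" if load else "reads_psr"
        return None
    if top == 0b100 and word & (1 << 15):     # LDM / STM with R15 in the list
        return "writes_pc" if load else "reads_psr"
    return None


results = [classify(word) for word in (
    0xE3A00000,  # MOV r0,#0
    0xE28F2020,  # ADD r2,pc,#32 (an ADR)
    0xEB000002,  # BL
    0xCAFFFFFB,  # BGT
    0xEF020002,  # SWI XOS_Write0
    0xEF020011,  # SWI XOS_Exit
    0xE8FD8007,  # LDMFD r13!,{r0-r2,pc}^
    0xE1B0F00E,  # MOVS pc,lr
)]
```

Note that ADR is not flagged: using R15 as the first operand yields only the address bits, so it behaves the same way in both modes.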
When it gets to an instruction in one of the first four of the above classes, it saves the instruction at that location in some form of storage, and replaces it in memory with a SWI. It then performs a branch to the start of the code being emulated. In this example, the code becomes:
init    MOV     r0,#0
        MOV     r1,#8
        MOV     r7,#32
loop    ADR     r2,text
        SWI     CLOE_branch_link      ; This is the new instruction
;       BL      print_text            ; This was there
        SUBS    r7,r7,#1
        BGT     loop
        SWI     XOS_Exit
print_text
        STMFD   r13!,{r0-r2,lr}
        MOV     r0,r2
        SWI     XOS_Write0
        LDMFD   r13!,{r0-r2,pc}^
Since CLOE knows that the processor will always reach the SWI (because there is no opportunity for the program counter to change without CLOE's knowledge), the code up to that point can execute natively in 32-bit mode.
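The scan-and-patch step might look something like the Python sketch below. It is a simplification that stops only at B and BL; the SWI numbers and the names `scan_and_patch`, `saved` and `CLOE_SWI_BASE` are assumptions, not part of any real system. The word values are hand-assembled encodings of the example program, with the address of text chosen arbitrarily.

```python
CLOE_SWI_BASE = 0x5C100  # hypothetical: +0 is CLOE_branch, +1 is CLOE_branch_link


def scan_and_patch(memory, start, saved):
    """Scan forward from byte address `start`; patch the first B or BL found.

    The original word is kept in `saved`, keyed by address, and replaced
    in `memory` by an unconditional SWI. Returns the patched address, or
    None if the end of memory is reached first.
    """
    index = start // 4
    while index < len(memory):
        word = memory[index]
        if (word >> 25) & 7 == 0b101:        # B or BL, any condition
            link = (word >> 24) & 1
            saved[index * 4] = word
            memory[index] = 0xEF000000 | (CLOE_SWI_BASE + link)
            return index * 4
        index += 1
    return None


program = [
    0xE3A00000,  # init  MOV  r0,#0
    0xE3A01008,  #       MOV  r1,#8
    0xE3A07020,  #       MOV  r7,#32
    0xE28F2020,  # loop  ADR  r2,text
    0xEB000002,  #       BL   print_text
    0xE2577001,  #       SUBS r7,r7,#1
    0xCAFFFFFB,  #       BGT  loop
    0xEF020011,  #       SWI  XOS_Exit
]
saved = {}
first = scan_and_patch(program, 0, saved)
second = scan_and_patch(program, first + 4, saved)
```

The replacement always uses the "always" condition, whatever the condition on the original instruction, so control is guaranteed to come back to the emulator at that address.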
CLOE then looks up the original instruction, and then decides that the following needs to take place:
- R14 becomes the old program counter, with the status register
- The program counter becomes the location of print_text
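Those two steps can be sketched in Python as follows. The function name `emulate_branch_link` is invented, and the addresses in the usage lines assume the example is loaded at &8000, so that the BL sits at &8010 and print_text at &8020.

```python
PSR_MASK = 0xFC000003  # flags in bits 31..26, mode in bits 1..0
PC_MASK = 0x03FFFFFC   # word-aligned 26-bit address


def emulate_branch_link(word, address, psr):
    """Return (new_pc, new_r14) for the saved BL `word` found at `address`."""
    offset = word & 0x00FFFFFF
    if offset & 0x00800000:                  # sign-extend the 24-bit offset
        offset -= 1 << 24
    # The offset is in words, relative to the instruction address plus 8.
    new_pc = (address + 8 + offset * 4) & PC_MASK
    # 26-bit style return address: next instruction combined with the status.
    new_r14 = ((address + 4) & PC_MASK) | (psr & PSR_MASK)
    return new_pc, new_r14


# BL print_text at &8010, with the carry flag set, in user mode.
new_pc, new_r14 = emulate_branch_link(0xEB000002, 0x8010, 0x20000000)
```

Building R14 in the 26-bit format matters because the called routine may later return with MOVS pc,lr or LDMFD ...,{pc}^ and expect the flags to be restored from it.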
CLOE then starts the emulation again at the new program counter. Here, the first instruction it finds which falls in the above category is LDMFD r13!,{r0-r2,pc}^, so it marks that with a SWI:
init    MOV     r0,#0
        MOV     r1,#8
        MOV     r7,#32
loop    ADR     r2,text
        SWI     CLOE_branch_link
        SUBS    r7,r7,#1
        BGT     loop
        SWI     XOS_Exit
print_text
        STMFD   r13!,{r0-r2,lr}
        MOV     r0,r2
        SWI     XOS_Write0
        SWI     CLOE_pull_stack       ; Another new instruction
;       LDMFD   r13!,{r0-r2,pc}^      ; This was the old one
As before, CLOE starts executing in normal mode from print_text and, after the call to SWI XOS_Write0, CLOE is called again. It emulates the instruction which it has stored, and finds that execution continues after the earlier SWI. So it starts the emulation again, this time reaching BGT loop. The code becomes:
init    MOV     r0,#0
        MOV     r1,#8
        MOV     r7,#32
loop    ADR     r2,text
        SWI     CLOE_branch_link
        SUBS    r7,r7,#1
        SWI     CLOE_branch           ; Note that it's not SWIGT
;       BGT     loop                  ; This is the old instruction
        SWI     XOS_Exit
print_text
        STMFD   r13!,{r0-r2,lr}
        MOV     r0,r2
        SWI     XOS_Write0
        SWI     CLOE_pull_stack
As CLOE needs to know exactly where execution continues, it has to emulate every form of branch, so that it can continue emulating where the code left off. After the first pass, R7 is 31, so the loop runs 32 times in total.
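Because the replacement SWI is unconditional, it falls to the emulator to test the condition of the saved branch against the flags. A Python sketch of that test, applied to the SUBS/BGT pair from the example, is shown below; the names `condition_passed` and `subs_flags` are invented for the illustration.

```python
def condition_passed(cond, psr):
    """Evaluate a 4-bit ARM condition code against the N, Z, C, V flags."""
    n = bool(psr & (1 << 31))
    z = bool(psr & (1 << 30))
    c = bool(psr & (1 << 29))
    v = bool(psr & (1 << 28))
    table = [
        z, not z,                  # EQ, NE
        c, not c,                  # CS, CC
        n, not n,                  # MI, PL
        v, not v,                  # VS, VC
        c and not z, not c or z,   # HI, LS
        n == v, n != v,            # GE, LT
        not z and n == v,          # GT
        z or n != v,               # LE
        True, False,               # AL, NV
    ]
    return table[cond]


def subs_flags(a, b):
    """Return (result, flags) for SUBS on unsigned 32-bit values a and b."""
    result = (a - b) & 0xFFFFFFFF
    n = result >> 31
    z = int(result == 0)
    c = int(a >= b)                             # carry set means no borrow
    v = (((a ^ b) & (a ^ result)) >> 31) & 1    # signed overflow
    return result, (n << 31) | (z << 30) | (c << 29) | (v << 28)


GT = 0xC
r7, passes = 32, 0
while True:
    passes += 1                        # one trip through the loop body
    r7, psr = subs_flags(r7, 1)        # SUBS r7,r7,#1
    if not condition_passed(GT, psr):  # BGT loop
        break
```

When the condition fails, the emulator simply carries on scanning from the instruction after the patched branch.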
After the final CLOE_branch has failed, CLOE recognises SWI XOS_Exit, and this would return to the 32-bit OS.
There is one problem with CLOE - code which checks itself against modification. This would have to be addressed...
OS considerations
In order to reduce the amount of emulation, it is vital that a 32-bit kernel be in place as soon as possible, as well as an active encouragement to get developers to write 32-bit applications.
There are four main ways ARM code can get executed:
- Applications (filetype 0xff8)
- Utilities (filetype 0xffc)
- Relocatable modules (filetype 0xffa)
- Vectors (hardware and software)
To a lesser extent, BASIC programs can contain assembler, but since BASIC is currently 26-bit, any ARM assembled code would still be running under the emulator...
In order to allow 32-bit operations of the OS, the kernel would have to be written in 32-bit, and switch between 32-bit and 26-bit modes on the processor. To distinguish between 26-bit versions of the above, and 32-bit versions, different file-types/SWIs could be used:
- App32
- Util32
- RMA32
  *RMLoad32 etc.
- SWI XOS_Claim32, XOS_CallAfter32 etc.
The kernel would normally operate in 32-bit, but when it called a 26-bit module, utility, vector or application, it would jump into 26-bit mode (either emulated, or using 26-bit mode on the processor). Calling a SWI, 32-bit vector, or 32-bit utility from the program would cause the kernel to jump into 32-bit mode (or out of the emulator), and when returning, 26-bit mode would be restored. Exiting the program would make the kernel enter 32-bit mode, and carry on where it left off.
Finally...
Just because new processors won't have the 26-bit mode, I don't believe that RISC OS has to die; it just needs to evolve slightly...