RISC OS and the 26-bit question

What's the problem? (a brief history)

RISC OS was originally developed on the ARM 2 processor, whose program counter also doubled as the processor status, as shown in the table below:

Bit:    31  30  29  28  27  26  25..2                                  1..0
Field:  N   Z   C   V   I   F   24-bit program counter (word aligned)  Mode

With this arrangement, it was possible to have a 26-bit address space, capable of accessing up to 64MBytes of memory. At the time, this was considered a lot, but by the time the ARM6 arrived, memory had become a much cheaper commodity, and so a 32-bit mode was introduced, with a separate register to store the processor flags in. Existing applications could still run because the processor could be put into a special 26-bit mode, where the program counter also carried the processor status bits, as in the ARM 2.
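The layout in the table can be sketched in a few lines of Python. This is only an illustration of the bit positions given above; the function name split_r15 and the sample value are made up for the example.

```python
PC_MASK = 0x03FFFFFC  # bits 25..2: the word-aligned program counter


def split_r15(r15):
    """Split a 26-bit-mode R15 value into its flags, program counter and mode."""
    r15 &= 0xFFFFFFFF
    return {
        "N": (r15 >> 31) & 1,
        "Z": (r15 >> 30) & 1,
        "C": (r15 >> 29) & 1,
        "V": (r15 >> 28) & 1,
        "I": (r15 >> 27) & 1,
        "F": (r15 >> 26) & 1,
        "pc": r15 & PC_MASK,   # kept as a byte address
        "mode": r15 & 0x3,     # bits 1..0
    }


# Illustrative value: Z and C set, PC = 0x8000, mode bits = 3
fields = split_r15(0x60008003)

# The highest word address plus one word gives the size of the address space
address_space = PC_MASK + 4
```

Here address_space works out at 64MBytes, which is the limit mentioned above.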

Since RISC OS uses the 26-bit program counter, and does not support any kind of mode switching, applications also have to run in this mode. Only the FIQ vector operates in 32-bit mode.

As 32-bit mode is generally more desirable (you get up to 4GBytes of memory space), future processors will not have the 26-bit mode as an option. This means that RISC OS, as it stands, will not work on those processors.

What's the solution?

There are five possible solutions:

Use another OS
This would mean that the current applications would not run on the new OS, and development for RISC OS would just dwindle.
Rewrite RISC OS
This would be a mammoth job - and the existing applications would have to be rewritten (or possibly just recompiled).
Create some code convertor
This would convert 26-bit applications into 32-bit applications. There are some problems with this, as discussed below.
Emulate 26-bit operation
This would be slow, but as processors get faster, the emulation speed would increase.
Have two processors
There would be one fast processor, which would only run 32-bit mode code, and one slower processor which could run some 32-bit mode code as well as 26-bit mode code. This would increase the cost of machines, and the timing between the two would have to be carefully judged, but it is not impossible. This option is not discussed here, but may be later on...

A code converter

This would be the neatest solution - simply pop your existing code into a 'magic' convertor, and out pops a 32-bit application. However, such a task is not easy, and may indeed not be possible.

Consider the ARM instruction:
MOVS pc,lr
This assembles to the word 0xE1B0F00E. If an automatic convertor saw this instruction, it would replace it with whatever 32-bit code performs the same function as the original did under 26-bit mode (a possible method is discussed below).

Now, if you had a block of raw data, and one of the words in the raw data just so happened to be 0xE1B0F00E, then that word would also be changed, corrupting the data.
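A minimal sketch of such a naive scan, in Python, shows the problem. The four-word image is made up for the example, and find_movs_pc_lr is a hypothetical helper, not part of any real convertor.

```python
import struct

MOVS_PC_LR = 0xE1B0F00E  # encoding of MOVS pc,lr, as given in the text


def find_movs_pc_lr(image):
    """Return the byte offset of every little-endian word equal to MOVS pc,lr."""
    count = len(image) // 4
    words = struct.unpack("<%dI" % count, image[:count * 4])
    return [index * 4 for index, word in enumerate(words) if word == MOVS_PC_LR]


# Made-up image: one genuine return instruction, and one data word
# that happens to have the same bit pattern.
image = struct.pack(
    "<4I",
    0xE3A00000,  # MOV r0,#0       (code)
    0xE1B0F00E,  # MOVS pc,lr      (code)
    0x00000080,  # DCD 128         (data)
    0xE1B0F00E,  # DCD &E1B0F00E   (data that merely looks like the instruction)
)

hits = find_movs_pc_lr(image)
```

The scan reports both offsets, and nothing in the bit pattern itself says which of the two is really an instruction.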

There is the possibility of checking around that instruction, to see if it looks like it is code, but even that can fail:

init
  STMFD  r13!,{r14}
  MOV    r0,#0
  LDR    r1,count
loop
  BL     debug_routine
  SUBS   r1,r1,#1
  BPL    loop
  LDMFD  r13!,{pc}^
count
  DCD    128
debug_routine
 [ debug=1
  SWI    XDebug_Code
 ]
  MOVS   pc,lr
data
  DCD    0

In this example, the convertor may correctly identify the LDMFD r13!,{pc}^, but if the debug variable was not set to one, then the MOVS pc,lr would be surrounded by words which 'look' more like data than ARM instructions. If you were to disassemble that part of the file, you would get:

  ANDEQ   r0,r0,r0,LSL#1
  MOVS    pc,lr
  ANDEQ   r0,r0,r0
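To see why the data words disassemble this way, here is a small Python sketch that decodes just one instruction form (a data-processing instruction whose second operand is a register shifted left by a constant). It is deliberately incomplete, and the function name disassemble is only for this example.

```python
CONDITIONS = ["EQ", "NE", "CS", "CC", "MI", "PL", "VS", "VC",
              "HI", "LS", "GE", "LT", "GT", "LE", "", "NV"]
OPCODES = ["AND", "EOR", "SUB", "RSB", "ADD", "ADC", "SBC", "RSC",
           "TST", "TEQ", "CMP", "CMN", "ORR", "MOV", "BIC", "MVN"]
REGISTERS = ["r%d" % n for n in range(14)] + ["lr", "pc"]


def disassemble(word):
    """Disassemble a data-processing instruction with an LSL-shifted register operand.

    Anything else (immediates, other shift types, comparisons) is rejected.
    """
    if (word >> 25) & 7 != 0 or (word >> 4) & 1 != 0 or (word >> 5) & 3 != 0:
        raise NotImplementedError("only LSL-by-constant register operands are handled")
    opcode = (word >> 21) & 0xF
    if 8 <= opcode <= 11:
        raise NotImplementedError("comparison opcodes are not handled")
    set_flags = "S" if (word >> 20) & 1 else ""
    mnemonic = OPCODES[opcode] + CONDITIONS[word >> 28] + set_flags
    rn = REGISTERS[(word >> 16) & 0xF]
    rd = REGISTERS[(word >> 12) & 0xF]
    rm = REGISTERS[word & 0xF]
    # MOV and MVN have no first operand register
    operands = [rd, rm] if opcode in (13, 15) else [rd, rn, rm]
    shift = (word >> 7) & 0x1F
    if shift:
        operands.append("LSL#%d" % shift)
    return mnemonic + " " + ",".join(operands)


listing = [disassemble(word) for word in (0x00000080, 0xE1B0F00E, 0x00000000)]
```

Because condition code 0 is EQ and opcode 0 is AND, any small data value decodes as some form of ANDEQ, which is exactly what the listing above shows for DCD 128 and DCD 0.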

You could, of course, have part of the convertor that checks to see where program branches jump to. However, this would fail if the code is reached through a computed jump, as in the following:

jump_table
  DCD    t_code1-t_start
  DCD    t_code2-t_start
init
  ADR    r0,jump_table		; r0 is pointer to the vector table
  ADR	 r1,t_start		; r1 is the pointer to the code start
  MOV    r2,#1			; r2 is the vector number
  BL	 jump_vector
; ...
jump_vector
  STMFD  r13!,{r14}
  LDR	 r0,[r0,r2,LSL#2] 	; read the vector offset
  MOV    lr,pc			; 'Fake' a BL-type instruction
  ADD	 pc,r0,r1		; and jump to it
vector_return			; This is where the vector returns
  MOVVC  r0,#0                  ; Clear r0 if no error
  LDMFD	 r13!,{pc}^
; ...
t_start
; ...
data1
  DCD	 0
t_code2
  MOVS   pc,lr
data2
  DCD	 0
;...

Here, the automatic convertor wouldn't know that t_code2 is code rather than data, unless it had some form of built-in emulator to work out what is happening. This may seem like a contrived example, but there could be many more complicated ways of doing the same thing.

The emulation option

In theory, any computer can emulate any other computer - it may not be as fast, but it'll still emulate it.

This solution is to emulate 26-bit mode from within 32-bit mode. There are two ways of doing this: one is a standard emulation, and the other is what I'm calling the "Code Lookahead Optimal Emulator" (or CLOE), which is discussed later.

In standard emulation, the emulator would 'pretend' to have 15 registers, and one program counter. Some of these may have a direct relationship with the actual registers (i.e. they would not be virtual registers), but others may be virtual registers - the program counter being one of those.

For each instruction, the emulator would work out what the instruction did, and would perform it on its set of registers. This would be quite slow (it would be good performance if a 20:1 ratio could be achieved), but the 26-bit programs would work. When a SWI is called, control is passed from the emulator to the OS, and when the SWI returns, the emulator resumes. However, parts of the OS may still be 26-bit and hence use the emulator...
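As a rough illustration of the fetch, decode and execute cycle described above, here is a toy interpreter in Python. It is a sketch only: it understands MOV and SUB with an unrotated 8-bit immediate, B with a condition of AL or GT, and treats any SWI as the end of the run instead of passing control to the OS. The name run and the four-word test program are invented for the example.

```python
def run(program, max_steps=10000):
    """Interpret a tiny ARM subset held as a list of 32-bit words."""
    regs = [0] * 15                      # virtual r0-r14
    pc = 0                               # virtual program counter (byte address)
    flags = {"N": 0, "Z": 0, "C": 0, "V": 0}
    executed = 0
    while executed < max_steps:
        word = program[pc // 4]
        executed += 1
        cond = word >> 28
        if cond == 0xE:                  # AL: always
            passed = True
        elif cond == 0xC:                # GT: Z clear and N equal to V
            passed = flags["Z"] == 0 and flags["N"] == flags["V"]
        else:
            raise NotImplementedError("condition %X" % cond)
        next_pc = pc + 4
        top = (word >> 24) & 0xF
        if not passed:
            pass                         # skipped instruction, just move on
        elif top == 0xF:                 # SWI: stop (a real emulator would call the OS)
            return regs, flags, executed
        elif top == 0xA:                 # B
            offset = word & 0xFFFFFF
            if offset & 0x800000:        # sign-extend the 24-bit offset
                offset -= 0x1000000
            next_pc = pc + 8 + offset * 4
        elif top in (0x2, 0x3) and (word >> 8) & 0xF == 0:
            opcode = (word >> 21) & 0xF
            rn, rd = (word >> 16) & 0xF, (word >> 12) & 0xF
            imm = word & 0xFF
            if rd == 15 or rn == 15:
                raise NotImplementedError("r15 operands are not handled")
            if opcode == 0xD:            # MOV
                result = imm
            elif opcode == 0x2:          # SUB
                a = regs[rn]
                result = (a - imm) & 0xFFFFFFFF
            else:
                raise NotImplementedError("opcode %X" % opcode)
            if (word >> 20) & 1:         # S bit: update the virtual flags
                flags["N"] = result >> 31
                flags["Z"] = int(result == 0)
                if opcode == 0x2:
                    flags["C"] = int(a >= imm)   # carry set means no borrow
                    flags["V"] = ((a ^ imm) & (a ^ result)) >> 31 & 1
            regs[rd] = result
        else:
            raise NotImplementedError("instruction %08X" % word)
        pc = next_pc
    raise RuntimeError("step limit reached")


program = [
    0xE3A07020,  # MOV  r7,#32
    0xE2577001,  # SUBS r7,r7,#1
    0xCAFFFFFD,  # BGT  back to the SUBS
    0xEF000011,  # SWI  OS_Exit
]
regs, flags, executed = run(program)
```

Even this toy needs dozens of host operations for every emulated instruction, which gives some feel for where the slowdown comes from.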

Code Lookahead Optimal Emulator

Since only a small subset of the ARM instruction set needs to be emulated, it would be preferable if the code is emulated where it needs to be emulated, and run naturally when it does not. In order to describe CLOE, it is best to give an example:

init
  MOV	 r0,#0
  MOV	 r1,#8
  MOV	 r7,#32
loop
  ADR    r2,text
  BL	 print_text
  SUBS   r7,r7,#1
  BGT	 loop
  SWI	 XOS_Exit
print_text
  STMFD  r13!,{r0-r2,lr}
  MOV	 r0,r2
  SWI	 XOS_Write0
  LDMFD	 r13!,{r0-r2,pc}^

CLOE starts off at the first instruction. It looks at it, and decides whether or not it needs to be emulated. In this case, it doesn't, and so CLOE looks at the next instruction. This continues until it finds an instruction in one of the following classes:

Branch
Branch and link
Any instruction which reads the PC's status bits
Any instruction which modifies the PC
One of the CLOE SWI calls
Any SWI that would cause the program to exit
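The test for these classes could look something like the following Python sketch. It makes several assumptions: the CLOE SWI numbers are invented, only OS_Exit is treated as an exiting SWI, and "reads the PC's status bits" is taken to mean r15 used as the shifted register operand of a data-processing instruction (where 26-bit mode supplies the flags along with the PC), which is why ADR is not caught.

```python
OS_EXIT = 0x11                           # OS_Exit SWI number
X_BIT = 0x20000                          # set in the error-returning ("X") form of a SWI
CLOE_SWIS = {0xC0000, 0xC0001, 0xC0002}  # hypothetical numbers for CLOE's own SWIs


def needs_trap(word):
    """Return True if CLOE would have to stop scanning at this instruction."""
    kind = (word >> 25) & 7
    rd = (word >> 12) & 0xF
    if kind == 0b101:                            # B or BL
        return True
    if kind == 0b111 and (word >> 24) & 1:       # SWI
        number = word & 0xFFFFFF & ~X_BIT
        return number == OS_EXIT or number in CLOE_SWIS
    if kind in (0b000, 0b001):                   # data processing
        if kind == 0b000 and (word & 0xF0) == 0x90:
            return False                         # MUL, MLA or SWP
        reads_psr = kind == 0b000 and (word & 0xF) == 15
        return rd == 15 or reads_psr
    if kind in (0b010, 0b011):                   # LDR or STR involving r15
        return rd == 15
    if kind == 0b100:                            # LDM or STM with r15 in the list
        return bool(word & 0x8000)
    return False


samples = {
    "MOV r0,#0": 0xE3A00000,
    "ADD r2,pc,#8": 0xE28F2008,              # the kind of thing ADR assembles to
    "BL": 0xEB000002,
    "STMFD r13!,{r0-r2,lr}": 0xE92D4007,
    "SWI XOS_Write0": 0xEF020002,
    "LDMFD r13!,{r0-r2,pc}^": 0xE8FD8007,
    "MOVS pc,lr": 0xE1B0F00E,
    "MOV lr,pc": 0xE1A0E00F,
    "SWI XOS_Exit": 0xEF020011,
}
trapped = [name for name, word in samples.items() if needs_trap(word)]
```

With these samples, trapped contains the BL, the LDMFD that loads pc, MOVS pc,lr, MOV lr,pc and SWI XOS_Exit, while the other four are left to run directly.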

When it gets to one of the first four of the above classes of instruction, it saves the original instruction from that location in some form of storage, and replaces it with a SWI. It then branches to the start of the code it has just scanned, which runs directly on the processor. In this example, the code becomes:

init
  MOV	 r0,#0
  MOV	 r1,#8
  MOV	 r7,#32
loop
  ADR    r2,text
  SWI    CLOE_branch_link	; This is the new instruction
;  BL     print_text		; This was there
  SUBS   r7,r7,#1
  BGT	 loop
  SWI	 XOS_Exit
print_text
  STMFD  r13!,{r0-r2,lr}
  MOV	 r0,r2
  SWI	 XOS_Write0
  LDMFD	 r13!,{r0-r2,pc}^

Since CLOE knows that the processor will always reach the SWI (because there is no opportunity for the program counter to change without CLOE's knowledge), the code up to the SWI can be executed directly by the processor in standard 32-bit mode.
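The 'some form of storage' could be as simple as a table keyed by address. The Python sketch below assumes memory is modelled as a list of words; the PatchTable class and the CLOE_BRANCH_LINK number are invented for illustration.

```python
CLOE_BRANCH_LINK = 0xC0001  # hypothetical SWI number


def swi(number):
    """Encode an unconditional SWI instruction with the given number."""
    return 0xEF000000 | (number & 0xFFFFFF)


class PatchTable:
    """Remembers the original instruction at every address CLOE has patched."""

    def __init__(self):
        self.saved = {}

    def patch(self, memory, address, swi_number):
        """Replace the word at a byte address with a SWI, keeping the original."""
        if address not in self.saved:        # never save one of our own SWIs
            self.saved[address] = memory[address // 4]
        memory[address // 4] = swi(swi_number)

    def original(self, address):
        """Look up the instruction that used to be at a patched address."""
        return self.saved[address]


memory = [
    0xE28F2008,  # ADD r2,pc,#8 (stands in for ADR r2,text)
    0xEB000002,  # BL  print_text
    0xE2577001,  # SUBS r7,r7,#1
]
table = PatchTable()
table.patch(memory, 4, CLOE_BRANCH_LINK)
```

When the SWI handler is entered, the address of the SWI is all it needs in order to look up which instruction it has to emulate.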

CLOE then looks up the original instruction, and then decides that the following needs to take place:

R14 becomes the return address (the address of the instruction after the BL), combined with the status bits as it would be in 26-bit mode
The program counter becomes the location of print_text
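These two steps can be sketched in Python as follows. The example assumes the program is loaded at 0x8000, which puts the BL at 0x8010 and print_text at 0x8020; the function name emulate_bl and the sample flag value are made up.

```python
PC_MASK = 0x03FFFFFC  # bits 25..2 of a 26-bit-mode R15


def emulate_bl(address, word, psr_bits):
    """Emulate a BL found at a byte address, as 26-bit mode would perform it.

    psr_bits holds the flag and mode bits in their R15 positions.
    Returns the new R14 and the new program counter.
    """
    offset = word & 0xFFFFFF
    if offset & 0x800000:                 # sign-extend the 24-bit word offset
        offset -= 0x1000000
    target = (address + 8 + offset * 4) & PC_MASK
    r14 = ((address + 4) & PC_MASK) | (psr_bits & 0xFC000003)
    return r14, target


# BL print_text at 0x8010, with Z and C set and user mode (mode bits 0)
r14, target = emulate_bl(0x8010, 0xEB000002, 0x60000000)
```

The returned R14 has the same form that a real 26-bit BL would have produced, so a later LDMFD r13!,{...,pc}^ can be emulated from the stacked value without any extra bookkeeping.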

CLOE then starts the emulation again at the new program counter. Here, the first instruction it finds which falls in the above category is LDMFD r13!,{r0-r2,pc}^, so it marks that with a SWI:

init
  MOV	 r0,#0
  MOV	 r1,#8
  MOV	 r7,#32
loop
  ADR    r2,text
  SWI    CLOE_branch_link
  SUBS   r7,r7,#1
  BGT	 loop
  SWI	 XOS_Exit
print_text
  STMFD  r13!,{r0-r2,lr}
  MOV	 r0,r2
  SWI	 XOS_Write0
  SWI    CLOE_pull_stack	; Another new instruction
;  LDMFD  r13!,{r0-r2,pc}^	; This was the old one

As before, the processor executes the code from print_text directly, and once the call to SWI XOS_Write0 has returned, CLOE is entered again through CLOE_pull_stack. It emulates the instruction which it has stored, and finds that execution continues at the instruction after the earlier CLOE_branch_link. So, it starts scanning again from there, this time reaching BGT loop. The code becomes:

init
  MOV	 r0,#0
  MOV	 r1,#8
  MOV	 r7,#32
loop
  ADR    r2,text
  SWI    CLOE_branch_link
  SUBS   r7,r7,#1
  SWI    CLOE_branch		; Note that it's not SWIGT
;  BGT    loop                  ; This is the old instruction
  SWI	 XOS_Exit
print_text
  STMFD  r13!,{r0-r2,lr}
  MOV	 r0,r2
  SWI	 XOS_Write0
  SWI    CLOE_pull_stack

As CLOE needs to know exactly where execution continues, it has to emulate every form of branch, including conditional ones, so that it can carry on scanning from wherever the code goes next. After the first pass, R7 is 31, so the loop body would run 32 times in total.

When the final CLOE_branch is not taken, CLOE recognises SWI XOS_Exit, and this returns control to the 32-bit OS.

There is one problem with CLOE - code which checks itself against modification. This would have to be addressed...

OS considerations

In order to reduce the amount of emulation, it is vital that a 32-bit kernel be in place as soon as possible, as well as an active encouragement to get developers to write 32-bit applications.

There are four main ways ARM code can get executed:

Applications (filetype 0xff8)
Utilities (filetype 0xffc)
Relocatable modules (filetype 0xffa)
Vectors (hardware and software)

To a lesser extent, BASIC programs can contain assembler, but since BASIC is currently 26-bit, any ARM assembled code would still be running under the emulator...

In order to allow 32-bit operations of the OS, the kernel would have to be written in 32-bit, and switch between 32-bit and 26-bit modes on the processor. To distinguish between 26-bit versions of the above, and 32-bit versions, different file-types/SWIs could be used:

App32
Util32
RMA32
*RMLoad32 etc.
SWI XOS_Claim32, XOS_CallAfter32 etc.

The kernel would normally operate in 32-bit mode, but when it called a 26-bit module, utility, vector or application, it would jump into 26-bit mode (either emulated, or using 26-bit mode on the processor). If the program called a SWI, a 32-bit vector, or a 32-bit utility, the kernel would jump back into 32-bit mode (or out of the emulator), and 26-bit mode would be restored on return. Exiting the program would make the kernel enter 32-bit mode, and carry on where it left off.

Finally...

Even though new processors won't have the 26-bit mode, I don't believe that RISC OS has to die; it just needs to evolve slightly...

Date last modified: 2019-10-18 18:44:40