litwr (litwr) wrote,

Emotional stories about processors for first computers: part 9 (Acorn ARM)

The first ARM processors

The ARM-1 processor was an astonishing development. It continued the 6502 ideology (namely, to make a processor that is simpler, cheaper and better) and was released by Acorn in 1985, at the same time as Intel's technological miracle, the 80386. ARM consisted of an order of magnitude fewer transistors, therefore consumed significantly less energy, and was at the same time much faster on average. Admittedly, ARM had no MMU and not even multiply or divide instructions, so the 80386 could be faster in some division-heavy calculations. However, the advantages of ARM were so great that today it is the most widely produced processor architecture: more than 100 billion such processors have been manufactured.

ARM's development began in 1983, after Acorn had evaluated the 32016 processor. The tests showed that a 6502 running at half the clock frequency of the 32016 could perform many calculations faster than this seemingly much more powerful processor. At that time, the 80286 was already available. It showed very good performance, but Intel, perhaps sensing the potential of Acorn, refused to provide its processor for testing. Unlike the 80386, the 80286 technology was not closed and was transferred to many firms, so history is still waiting for the details of this somewhat unusual refusal to be disclosed. Perhaps, if Intel had allowed Acorn to use its processor, Acorn would have used it and would not have developed ARM.

ARM was developed by only a few people, and they tested the instruction set using BBC Micro's Basic. The development itself took place in a former barn. The debut of the processor was rather unsuccessful. In 1986, an ARM-based second processor for the BBC Micro was released under the name ARM Evaluation System. In addition to the processor it contained 4 MB of memory (a great deal for those years), which made this add-on a very expensive product. However, compared with other computers of that time with comparable performance, this second processor was one or even almost two orders of magnitude cheaper. But there were very few programs for the new system. This was a bit strange because it was quite possible to port Unix to it: many Unix variants available at that time did not require an MMU, including variants for the PDP-11, 68000, 80186 and even 8088. Linux was ported to the Acorn Archimedes only in the 90s. Perhaps the delay in the appearance of a real Unix for ARM was caused by Acorn's reluctance to transfer ARM technology to other firms.

The first ARM based system

Acorn's somewhat unsuccessful marketing policy led to a very difficult financial situation in 1985. In addition to ARM, Acorn was also attempting an expensive development of business computers, which failed, in particular because of the shortcomings of the 32016 processor chosen for them. The Acorn Communicator was also not very successful. The development of the relatively successful, but not quite IBM PC compatible, Master 512 was very costly. In addition, a lot of money was spent on an unsuccessful attempt to enter the US market. The Italian company Olivetti, with its rather successful Intel 8086 and 80286-based computers, was allowed to enter that market, hypothetically as part of a big game aimed at absorbing Acorn itself. By the way, after the absorption of Acorn, Olivetti's role in the US market quickly faded away.

As part of Olivetti, Acorn developed the improved ARM2 chip with built-in multiplication instructions, on which the Archimedes personal computers were based. Their speed was stunning for the time. The first models of those computers became available in 1987. However, Olivetti's management was focused on IBM PC compatible computers and did not want to use its resources to sell Acorn products.

ARM provides 16 32-bit registers (there are actually more of them, counting the registers for system needs). One of the registers, R15, is the program counter, as in the PDP-11 architecture. Almost all operations are performed in 1 clock cycle. More cycles are needed, in particular, for jumps, multiplications and memory accesses. Unlike popular processors of those years, ARM has no dedicated hardware stack. If a stack is needed, it is implemented through one of the registers. Subroutine calls do not use a stack; instead, the return address is stored in a register reserved for it. Such a scheme obviously does not work by itself for nested calls, for which a stack has to be organized. A unique feature of ARM is that the program counter, which is 26-bit and therefore allows addressing up to 64 MB, is combined with the status register in R15. Eight bits of this register hold status information: the six upper bits, plus two more that are available because the lower two bits of the address are not used, since instructions must be aligned on a 4-byte word boundary. The processor can access bytes and 4-byte words; it cannot directly access 16-bit data. ARM's data-processing instructions are 3-address. As is characteristic of RISC architectures, memory is accessed only by load and store instructions. ARM has a built-in fast bit shifter (barrel shifter) that can shift the value of one of the registers in an instruction by any number of bits at no cost in clock cycles. For example, multiplying the value of register R0 by 65 and placing the result in register R1 can be written as one single-cycle addition, ADD R1, R0, R0 shl 6, and multiplying by 63 as one instruction, RSB R1, R0, R0 shl 6.
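
The arithmetic behind the two multiplication examples above can be checked with a short sketch. It is written in Python rather than ARM assembly, and the names MASK32, add_shifted and rsb_shifted are invented here for illustration. The sketch only models the 32-bit wrap-around of the registers, not the timing.

```python
MASK32 = 0xFFFFFFFF  # ARM registers are 32 bits wide


def add_shifted(rn, rm, shift):
    """Model of ADD Rd, Rn, Rm shl shift: Rd = Rn + (Rm << shift)."""
    return (rn + (rm << shift)) & MASK32


def rsb_shifted(rn, rm, shift):
    """Model of RSB Rd, Rn, Rm shl shift: Rd = (Rm << shift) - Rn."""
    return ((rm << shift) - rn) & MASK32


r0 = 1000
print(add_shifted(r0, r0, 6))  # R0 + 64*R0 = 65*R0
print(rsb_shifted(r0, r0, 6))  # 64*R0 - R0 = 63*R0
```

Reverse subtraction is what makes the second case a single instruction: an ordinary SUB would compute R0 - 64*R0, because only the last operand passes through the shifter.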
The instruction set includes reverse subtraction, which in particular gives unary minus as a special case and speeds up the division procedure. ARM has another unique feature: all its instructions are conditional. One of 16 conditions (flag combinations) is attached to each instruction, and the instruction is executed only if the current flags satisfy that condition. In processors of other architectures, such conditional execution is, as a rule, available only for jumps. This feature of ARM makes it possible in many cases to avoid a slow jump operation. This is also helped by the fact that arithmetic operations can optionally leave the status flags unchanged. As with the 6809 processor, ARM has both fast and regular interrupts. In addition, in the interrupt modes the higher-numbered registers are replaced with system ones, which makes interrupt handlers more compact and faster.
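
A rough Python model may help to show how conditional execution replaces a jump. The 16 two-letter condition names and their flag tests are the standard ARM ones (they are not listed in the text above). The helpers cmp_flags and max_branchless are hypothetical names used only in this sketch, which models the pair CMP R0, R1 followed by a conditional MOVLT R0, R1.

```python
# Each condition code is a predicate over the four status flags N, Z, C, V.
CONDITIONS = {
    "EQ": lambda n, z, c, v: z,
    "NE": lambda n, z, c, v: not z,
    "CS": lambda n, z, c, v: c,
    "CC": lambda n, z, c, v: not c,
    "MI": lambda n, z, c, v: n,
    "PL": lambda n, z, c, v: not n,
    "VS": lambda n, z, c, v: v,
    "VC": lambda n, z, c, v: not v,
    "HI": lambda n, z, c, v: c and not z,
    "LS": lambda n, z, c, v: (not c) or z,
    "GE": lambda n, z, c, v: n == v,
    "LT": lambda n, z, c, v: n != v,
    "GT": lambda n, z, c, v: (not z) and n == v,
    "LE": lambda n, z, c, v: z or n != v,
    "AL": lambda n, z, c, v: True,   # always
    "NV": lambda n, z, c, v: False,  # never
}


def cmp_flags(a, b):
    """Flags produced by CMP a, b (that is, a - b) on 32-bit values."""
    ua, ub = a & 0xFFFFFFFF, b & 0xFFFFFFFF
    result = (ua - ub) & 0xFFFFFFFF
    n = bool(result >> 31)                       # sign bit of the result
    z = result == 0
    c = ua >= ub                                 # carry set means "no borrow"
    v = bool(((ua ^ ub) & (ua ^ result)) >> 31)  # signed overflow
    return n, z, c, v


def max_branchless(r0, r1):
    """Model of: CMP R0, R1 then MOVLT R0, R1 (no jump needed)."""
    flags = cmp_flags(r0, r1)
    if CONDITIONS["LT"](*flags):  # MOVLT executes only when R0 < R1 (signed)
        r0 = r1
    return r0


print(max_branchless(-5, 3))
```

On a processor where only jumps are conditional, the same maximum needs a compare, a conditional jump over the move, and the move itself.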

The ARM instruction set contains significantly fewer basic instructions than that of the x86 processors, but the ARM instructions themselves are very flexible and powerful. Several very convenient and powerful ARM instructions have no analogues on the 80386, for example RSB (reverse subtraction), BIC (AND with inversion; the PDP-11 has such an instruction), the 4-address MLA (multiply-accumulate), and LDM and STM (loading or storing multiple registers from or to memory, similar to the MOVEM instruction of the 68k processors). Almost all ARM instructions are 3-address, while almost all 80386 instructions have no more than 2 operands. The ARM instruction set is more orthogonal: all registers are interchangeable, with some exceptions for R14 and R15. Most ARM instructions may require 3-4 80386 instructions to emulate, while most 80386 instructions can be emulated by only 2-3 ARM instructions. Interestingly, the IBM PC XT emulator on an Acorn Archimedes with an 8 MHz processor runs even faster than a real PC XT. On a Commodore Amiga with a 68000 @7 MHz, the emulator reaches no more than 10-15% of the speed of a real PC XT. It is also fascinating that the first NeXT computers, with a 25 MHz 68030, showed the same integer performance as the 8 MHz ARM. Apple was going to make a successor to the Apple ][ in the Möbius project, but when it turned out that the prototype of this computer in emulation mode outperformed not only the Apple ][ but also the Macintosh based on 68k processors, the project was closed!

Among the shortcomings of ARM, we can highlight the problem of loading an immediate constant into a register. Only 8 bits can be loaded at a time, although the constant can be inverted and shifted. Therefore, loading a full 32-bit constant can take up to 4 instructions. You can, of course, load a constant from memory with one instruction, but then the problem arises of specifying the address of this value, since the offset can only be 12-bit. Another shortcoming of ARM is its relatively low code density, which makes programs somewhat large and, most importantly, reduces the efficiency of the processor cache. However, this is probably the result of the low quality of the compilers for this platform. The multiplication instructions produce only the lower 32 bits of the product. For a long time, a significant drawback of ARM was the lack of built-in support for memory management (an MMU); Apple, for example, demanded this support in the early 90s. Floating-point coprocessors for the ARM architecture also came into use with a significant delay. ARM did not have such advanced debugging features as x86 had. There is also an oddity in the standard ARM assembly language: barrel shifter operations are written separated by a comma. Thus, instead of the simple form R1 shl 7 (shift the contents of register R1 left by 7 bits), you need to write R1, shl 7.
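
The immediate-constant restriction is easy to explore with a small sketch. It assumes the classic ARM encoding, in which an immediate operand is an 8-bit value rotated right by an even number of bits; this detail goes beyond what is stated above. The function names are invented for this illustration.

```python
MASK32 = 0xFFFFFFFF


def rotate_left(value, bits):
    """Rotate a 32-bit value left by the given number of bits."""
    bits %= 32
    return ((value << bits) | (value >> (32 - bits))) & MASK32


def fits_immediate(value):
    """True if value is an 8-bit constant rotated right by an even amount."""
    value &= MASK32
    # Undo every possible even rotation and see if 8 bits are enough.
    return any(rotate_left(value, r) < 256 for r in range(0, 32, 2))


def single_instruction_load(value):
    """Return 'MOV', 'MVN' or None for loading value in one instruction."""
    if fits_immediate(value):
        return "MOV"
    if fits_immediate(~value):
        return "MVN"  # MVN loads the bitwise inverse of its operand
    return None


for constant in (0xFF, 0xFF000000, 0xFFFFFFFF, 0x104, 0x102, 0x12345678):
    print(hex(constant), single_instruction_load(constant))
```

A constant for which this returns None has to be built from several instructions or loaded from memory, which is the inconvenience described above.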

ARM3, with a built-in cache, became available in 1989. In 1990, the ARM development team separated from Acorn and, with the help of Apple and VLSI, created ARM Holdings. One of the reasons for the separation was that, in the opinion of Acorn-Olivetti management, ARM development cost too much. Ironically, Acorn subsequently ceased its independent existence while ARM Holdings grew into a large company. However, the separation was also driven by Apple's desire to have ARM processors in its Newton computers without depending on another computer manufacturer.

The further development of the ARM architecture is also very interesting; it affected, in particular, the interests of such well-known companies as DEC and Microsoft, but that is another story. It can be mentioned, though, that thanks to its share in ARM Holdings, Apple was able to avoid bankruptcy in the 90s.

Many thanks to jms2 and BigEd, who helped to improve the style and content.

Tags: 32016, 68000, 68030, 6809, 68k, 80186, 80286, 80386, 8086, 8088, acorn, acorn archimedes, acorn communicator, acorn master, apple, arm, arm evaluation system, computer, cpu, dec, hardware, history, intel, next computer, olivetti, pdp-11, processor, x86

November 22 2018, 12:15:26 UTC

Meynaf wrote on page 44:
'No, address registers aren't "rather poor", and no, it's not useful to use the same register for data and address, especially not "often" and not "very". Give concrete examples if you think otherwise; pointer types and data types don't share the same arithmetic. What's the meaning of or'ing something to a pointer? A compiler will not let you do that, and rightly so.

The one and only case that exists for using address for data, is when running out of data registers but in this situation x86 is already out of it.'
and it is just insane!
Removing freedom from the programmer in assembly language is heresy!
If you want to code in Ada, do so, and leave ASM to people who know what they are doing.
He is the one who should (try to) demonstrate the benefits of having a distinction between address and data registers.
'pointer types and data types don't share the same arithmetic'? Really?
Or'ing to a pointer is interesting for graphics operations, as ANDing can be too.
Any ARM compiler will let you do that, of course!
ASM is not Ada or any high-level language, fortunately.
Having multipurpose registers allows the ARM to do fast blit memory transfers by loading as many registers as you can.
It is also one of the reasons ARM code density is in fact excellent, with code never more than 15% larger
than similar 68000 code.