Emotional stories about processors for first computers: part 12 (Preface and Postface)
2x2=4, mathematics

Prologue and Epilogue

I have had occasion to program in the assembly languages of various processors; the most recent one was the Xilinx MicroBlaze. I decided to set down some of my observations on the features of these almost magical pieces of hardware, which, like a magic key, opened for us the doors to the magical land of virtual reality and mass creativity. I will not write about the features of modern systems (x86, x86-64, ARM, ARM-64, etc.), maybe another time: the topic is very large and complex. Therefore, I stop at the Intel 80486 and Motorola 68040. I also wanted to include the IBM/370, which I had to deal with, in this review. These systems were quite far from the mass of users but had a huge impact on computer technology. However, preparing material about them takes a lot of time, they did not use single-chip processors, and it seems that none of these machines has survived, so they are not included. I really hope that my materials will also attract the attention of experts who will be able to add something I have not thought of or did not know.

As illustrative material, I attach my own small Rosetta stone: tiny programs that calculate the number π on different processors and systems using a spigot algorithm, and that claim to be the fastest implementations of it.
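The author's own programs are written in assembly and are not reproduced here. As a hedged illustration of what a spigot algorithm does, the sketch below is a Python transliteration of a well-known compact C spigot program from the Rabinowitz-Wagon family, which emits four decimal digits per pass using only integer arithmetic. It is not the author's code; the function name `pi_spigot` is ours.

```python
def pi_spigot(n_digits=800):
    """Return the first n_digits decimal digits of pi (n_digits divisible by 4)."""
    assert n_digits % 4 == 0
    a = 10000                   # base: four decimal digits per output group
    c = n_digits // 4 * 14      # number of series terms (2800 for 800 digits)
    f = [a // 5] * c + [0]      # mixed-radix remainders; f[c] starts at 0 as in the C original
    e = 0                       # remainder carried over from the previous group
    out = []
    while c > 0:
        d = 0
        g = 2 * c
        b = c
        while True:
            d += f[b] * a
            g -= 1              # denominator 2*b - 1
            f[b] = d % g        # keep the remainder for the next pass
            d //= g
            g -= 1
            b -= 1
            if b == 0:
                break
            d *= b              # numerator b
        out.append("%04d" % (e + d // a))
        e = d % a
        c -= 14                 # each pass needs about 14 fewer terms
    return "".join(out)


print(pi_spigot(800)[:20])
```

Every operation is an integer multiply, divide, or remainder on small numbers, which is why such programs suit 8-bit and 16-bit processors: the main cost is the division in the inner loop.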

In conclusion, here are several remarks that occurred to me while writing these articles.

It is hard to shake the feeling that 8-bit processors were only an unwelcome necessity for the main actors on the stage of computer history in the 70s and 80s. Development of the most powerful 8-bit processor, the 6502, was effectively frozen. Intel and Motorola, if anything, slowed down the development of their own small processors and held back other developers.

I'm pretty sure that the Amiga or Atari ST would have worked better and faster with a 4 MHz 6502-compatible processor with a 20- or 24-bit address bus than with the 68000. Bill Mensch recently said that it would be easy to make a 10 GHz 6502 today.

If the Amstrad PCW series, whose success the Commodore CBM II could have shared, had moved to an optimized z80 at higher clock frequencies, it is quite possible that this series would still have been relevant 10 years ago.

What would the world be like if ARM had been made in 1982 or 1983, which was quite possible?

What would computers made in the USSR be like if they had copied and developed not the most expensive but the most promising technologies?

Emotional stories about processors for first computers: part 11 (Intel 8080)

Intel 8080 and 8085

The first real single-chip processor, introduced in the first half of 1974, is still manufactured and used today. It was cloned many times around the world; in the USSR it was designated KR580VM80A. Modern Intel PC processors still readily reveal their kinship with this, in some sense, relic product. I have not written code for this processor myself, but being well acquainted with the z80 architecture, I venture to offer some comments.

The i8080 instruction set, like those of other Intel PC processors, can hardly be called ideal, but it is universal, quite flexible, and has some very attractive features. The 8080 compared favorably with its competitors, the Motorola 6800 and MOS Technology 6502, in offering a large number of registers, even if somewhat clumsy ones: one 8-bit accumulator; HL, a 16-bit semi-accumulator that is at the same time a fast index register; a 16-bit stack pointer; and two more 16-bit registers, BC and DE. The BC, DE, and HL registers can also be used as six 8-bit registers. In addition, the 8080 supports a rich set of status flags: carry, sign, zero, and even parity and auxiliary carry. Some instructions of the 8080 remained speed champions for a long time. For example, XCHG exchanges the contents of the 16-bit DE and HL registers in just 4 clock cycles, which was extremely fast! A number of other instructions, although they did not set such bright records, were also among the best for a long time:

  • XTHL – exchanges the contents of HL with the data at the top of the stack in 18 cycles. That seems like a lot, but even on the genuinely 16-bit 8086 the equivalent takes at least 26 cycles, and on the 6800 or 6502 such an instruction is hard to imagine;

  • DAD – adds the value of another 16-bit register (BC, DE, or even SP) to the semi-accumulator HL in 10 cycles. This is a true 16-bit addition that sets the carry flag. Adding HL to itself gives a fast 16-bit left shift, i.e. multiplication by 2, a key operation for implementing both full multiplication and division;

  • PUSH and POP – push a 16-bit value from a register onto the stack and pop a 16-bit value from the stack into a register, respectively. They take 11 and 10 cycles. These are the fastest 8080 operations for working with memory, and SP is automatically decremented or incremented as they execute. PUSH can be used, for example, to quickly fill memory with a pattern of values from three registers (BC, DE, HL). There are no instructions for pushing or popping 8-bit values at all;

  • LXI – loads a 16-bit constant into a register (HL, DE, BC, SP) in 10 cycles;

  • RNZ, RZ, RNC, RC, RPO, RPE, RP, RM – conditional returns from a subroutine; they make code cleaner by eliminating extra conditional jumps. These instructions were dropped in the x86 architecture, but they probably should have been kept: code that uses them turns out nicer.
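To make the DAD remark concrete, here is a minimal Python sketch that emulates DAD and uses it the way 8080 code typically multiplies: DAD H doubles HL, and DAD D adds the multiplicand whenever the current bit of the multiplier is set. The function names `dad` and `mul_16x8` are hypothetical, and the loop structure is an assumed, common shift-and-add scheme, not code from this article.

```python
MASK16 = 0xFFFF


def dad(hl, rp):
    """Emulate 8080 DAD rp: HL <- HL + rp (mod 2**16); return (HL, carry flag)."""
    total = hl + rp
    return total & MASK16, total > MASK16


def mul_16x8(de, a):
    """Shift-and-add multiply of DE by the 8-bit value A, keeping the low 16 bits in HL."""
    hl = 0
    for bit in range(7, -1, -1):      # scan A from its most significant bit
        hl, _ = dad(hl, hl)           # DAD H: HL <- 2*HL, a 16-bit shift left
        if (a >> bit) & 1:
            hl, _ = dad(hl, de)       # DAD D: HL <- HL + DE
    return hl


print(mul_16x8(1000, 200))
```

The loop performs eight doublings and at most eight additions, so the whole multiplication reduces to at most sixteen 10-cycle DAD instructions plus the bit-testing overhead.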

This processor was used in the first almost-personal computer, the Altair 8800, which became very popular after it was published in a magazine in early 1975. By the way, in the USSR a similar publication appeared only in 1980, and one of comparable relevance only in 1986.

The first almost PC

The Intel 8080 became the basis for the first mass-market professional operating system, CP/M, which dominated microcomputers for professional work until the mid-1980s.

Now about the shortcomings. The 8080 required three supply voltages: -5, +5, and +12 volts. Interrupt handling is clumsy and slow. And in general, the 8080 is rather leisurely compared with the competitors that soon appeared: the 6502 could be up to 3 times faster at the same clock frequency. However, the 8080 architecture embodied what turned out to be the correct vision of the future, one that was not obvious in the 70s: that processors would become faster than memory. The 8080's DE and BC registers are less general-purpose registers than a prototype of modern caches, with manual control. The 8080 started at 2 MHz and its competitors only at 1 MHz, which smoothed out the performance difference.

It's hard to call the 8080 a 100% 8-bit processor. Its ALU is indeed 8 bits wide, but there are many 16-bit instructions that work faster than their 8-bit equivalents would, and some instructions have no 8-bit analogs at all. The XCHG instruction is 100% 16-bit, both in essence and in timing. There are real 16-bit registers. Therefore, I venture to call the 8080 partially 16-bit. It would be interesting to calculate a processor's "bitness index" from its set of features, but as far as the author knows, no one has done such work so far.

The author does not know why Intel abandoned direct support of 8-bit desktops with its processors. Intel's policy has always been complex and ambiguous. Its connection with politics is illustrated, in particular, by the fact that Intel has long had fabs in Israel, and until the end of the 90s this was secret. Intel made practically no attempt to improve the 8080; only the clock frequency was raised, up to 3 MHz. In effect, the 8-bit computer market was handed over to Zilog with its 8080-related z80 processor, which was able to withstand the main competitor, The Terminator 6502, quite successfully.

In the USSR and Russia, the domestic clone of the 8080 became the basis of many computers that remained popular until the early 90s. These are, of course, the Radio-86RK, Mikrosha, the multicolor Orion-128, Vector, and Corvette. Eventually, cheap and improved z80-based ZX Spectrum clones won the clone wars.

This is a real PC

In early 1976 Intel introduced the 8085 processor, compatible with the 8080 but significantly superior to its predecessor. The -5 and +12 volt supplies became unnecessary, the connection scheme was simplified, interrupt handling was improved, and clock frequencies ranged from 3 to a very solid 6 MHz. The instruction set was extended with very useful instructions: 16-bit subtraction, a 16-bit right shift in only 7 cycles (which was very fast), a 16-bit rotate left through the carry flag, loading the DE register with HL or SP plus an 8-bit offset, storing the contents of HL at the address held in DE, and the analogous loading of HL from the address in DE. All the instructions mentioned above, except the right shift, execute in 10 cycles, which is sometimes significantly faster than their counterparts or emulation on the z80. A few more instructions and even a new processor status flag were added. In addition, many instructions working with byte data became 1 clock cycle faster. This was very significant, as many systems with the 8080 or z80 used wait states, which, because of the extra cycles on the 8080, could almost double execution time. For example, in the Vector computer mentioned above, register-register instructions took 8 cycles, while with an 8085 or z80 the same instructions would take only 4 cycles. XTHL even became two cycles faster. However, some instructions, for example 16-bit increment and decrement, PUSH, and conditional returns, became a cycle slower.
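The register effects of several of these extra 8085 instructions can be sketched in Python. The mnemonics DSUB, ARHL, RDEL, and LDHI are taken from the commonly published descriptions of the undocumented 8085 opcodes, not from Intel's official manual; the functions below are hypothetical helpers that model only the register result and the carry flag, leaving all other flags out.

```python
MASK16 = 0xFFFF


def dsub(hl, bc):
    """DSUB: HL <- HL - BC; return (HL, borrow)."""
    diff = hl - bc
    return diff & MASK16, diff < 0


def arhl(hl):
    """ARHL: arithmetic shift right of HL; bit 15 is kept and bit 0 goes to carry."""
    return (hl >> 1) | (hl & 0x8000), bool(hl & 1)


def rdel(de, carry):
    """RDEL: rotate DE left through carry; bit 15 goes to carry, the old carry enters bit 0."""
    return ((de << 1) & MASK16) | int(carry), bool(de & 0x8000)


def ldhi(hl, offset):
    """LDHI d8: DE <- HL + unsigned 8-bit offset (LDSI does the same with SP)."""
    return (hl + (offset & 0xFF)) & MASK16


print(hex(arhl(0x8002)[0]), dsub(0, 1))
```

Emulating ARHL or RDEL on the 8080 or z80 takes a sequence of 8-bit rotates through the accumulator, which is why single 7- and 10-cycle instructions were such a gain.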

The 8085 also has built-in interrupt support, which in many cases eliminates the need for a separate interrupt controller in a system, and a serial I/O port.

However, I can only repeat the formula "for unknown reasons": Intel declined to promote the 8085 as the main processor for computers. Only a few 8085-based computers are known: the TRS-80 Model 100 and, of course, the predecessor and almost competitor of the IBM PC, the IBM System/23 Datamaster. In the USSR/Russia in the early 90s, there were attempts to upgrade some systems, for example the Vector computer, using the domestic clone IM1821VM85A. In effect, Intel gave way to the z80 in the 70s. A few years later, in the battle for the 16-bit market, Intel behaved quite differently, filing a lawsuit to ban sales of the V20 and V30 processors in the United States. Interestingly, these processors from the Japanese company NEC could switch to full binary compatibility with the 8080, which made them the fastest processors of the 8080 architecture.

Another Intel secret is its refusal to publish the extended instruction set. However, one of the official manufacturers of these processors published the complete instruction set. What were the reasons for this strange refusal? One can only guess. Could Zilog have played the role that AMD may once have played, creating the appearance of competition, while the 8085 could have brought Zilog down? Maybe Intel wanted to keep the instruction set closer to that of the 8086, which was then being designed? The latter seems doubtful. The 8086 was released more than 2 years after the 8085, and it is hard to believe that its instruction set was already known in 1975. In any case, compatibility with both the 8080 and 8085 on the 8086 is achievable only with a macro processor, which sometimes replaces one 8080/8085 instruction with several 8086 ones. It is especially difficult to explain why Intel did not publish information about the new instructions after the 8086 was released.

Emotional stories about processors for first computers: part 10 (MOS Technology 6502)

6502 and 65816

This is a processor with a very dramatic fate; no other processor can compare with it. Its appearance and adoption were accompanied by events of great scope and consequence. I will list some of them:

  1. the weakening of the giant Motorola, which for some time had surpassed Intel in its capabilities;

  2. the destruction of the independent company MOS Technology;

  3. the cessation of 6502 development and its stagnant production with little or no modernization.

It all started when Motorola, for unknown reasons, refused to support an initiative of young engineers who proposed improving the rather mediocre 6800 processor. They had to leave the company and continue their work at a small but promising company, MOS Technology, where they soon prepared two processors, the 6501 and 6502, made with NMOS technology. The 6501 was pin-compatible with the 6800; otherwise the two new chips were identical. The 6501/6502 team successfully introduced a new chip production technology, which radically reduced the cost of the new processors. In 1975, MOS Technology could offer the 6502 for $25, while the starting price of the Intel 8080 and Motorola 6800 in 1974 was $360. In 1975, Motorola and Intel lowered their prices, but they were still close to $100. MOS Technology specialists claimed that their processor was up to 4 times faster than the 6800. I find this questionable: the 6502 can work with memory much faster, but the 6800's second accumulator greatly sped up many calculations. By my estimate, the 6502 was on average no more than 2 times faster. Motorola sued its former employees, claiming that they had used many of the company's trade secrets. During the trial, it was established that one of the engineers who had left Motorola had taken some confidential 6800 documents, contrary to the wishes of his colleagues. Whether he acted on his own or there were some guiding forces behind him is still unknown. For this and other unclear reasons, Motorola effectively won the case, and MOS Technology, whose financial resources were very small, was forced to pay a substantial $200,000 and to abandon production of the 6501. Intel, in a similar situation with Zilog, acted quite differently. Although it must be admitted that MOS Technology was sometimes too daring in trying to exploit, for its own purposes, the big money Motorola spent on promoting the 6800.

Next, the legendary Commodore firm and its no less legendary founder Jack Tramiel enter the story, and in his shadow stood the company's chief financier, who determined its policy: a man named Irving Gould. Jack got a loan from Irving and with this money, using several, to put it mildly, unscrupulous tactics, forced MOS Technology to become part of Commodore. After that, possibly against the wishes of Tramiel, who was forced to give in to Gould, development of the 6502 almost stopped, even though as early as 1976 it was possible to produce 6502 prototypes running at up to 10 MHz. However, this information appeared only many years later, from Bill Mensch (he was part of the team that left Motorola), who sometimes made loud but by and large empty statements and played a rather ambiguous role in the fate of the 6502. The main developer of the 6502, Chuck Peddle, was removed from processor development for good. The 6502 continued to be produced not only by Commodore but also by the Western Design Center (WDC), founded by Bill Mensch. Remarkably, none of the former 6502 team ever worked with him again.

The drama around the 6502 did not end there. In 1980, a short anonymous article appeared in Rockwell's AIM 65 Interactive magazine, stating that every 6502 carries a dangerous bug called JMP (xxFF). The tone of the article suggests something completely out of the ordinary. Later, Apple adopted this attitude, and it became a kind of mainstream view. Strictly speaking, though, there was no "bug". Of course, to a specialist accustomed to the comfortable processors of the large systems of those years, one of the features that are quite appropriate and even useful in microprocessors could seem annoying, a bug. But in fact, this behavior, which hurt someone's feelings, was described in the official documentation from 1976 and in programming textbooks published before the article appeared. The "bug" was eliminated by Bill Mensch, who made the 65C02 (CMOS 6502), supposedly by 1983, i.e. after the release of the 65816. While Intel, Motorola, and others had already made new-generation 16-bit processors, the 6502 was only microscopically improved and artificially made partially incompatible with itself. Besides eliminating the "bug", a number of changes were made which, in particular, altered how several instructions execute: they became a clock cycle slower, but more correct in some far-fetched academic sense. It must be admitted, though, that several new instructions turned out to be expected and useful. On the other hand, the vast majority of the new instructions only occupied opcode space, adding almost nothing to the capabilities of the 6502 and leaving fewer free opcodes for possible further upgrades. Commodore and the Japanese Ricoh (maker of the processor for the very popular NES game console) did not adopt these changes. I myself ran into this "bug" several times. Knowing nothing about it, I was writing programs for Commodores. There was an incompatibility, and I had to change the code and use conditional assembly. The code for the 65C02 turned out to be more cumbersome and slower. Then I raised this question on the 6502.org forum, where some participants have connections with Apple computers. I asked whether anyone could give an example of the aforementioned "bug" crashing a program. I received only emotional and general comments; a specific example was never offered.
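The JMP (xxFF) behavior can be shown with a short Python model of how the two chips fetch the target of an indirect jump. The function name `jmp_indirect_target` and the 64 KB `bytearray` memory model are assumptions made for illustration. On the NMOS 6502 only the low byte of the pointer is incremented when fetching the high byte of the target, so a pointer at $xxFF wraps within its page; the 65C02 increments the full 16-bit pointer.

```python
def jmp_indirect_target(mem, ptr, nmos=True):
    """Return the address that JMP (ptr) jumps to on an NMOS 6502 or a 65C02."""
    lo = mem[ptr]
    if nmos:
        # NMOS 6502: the carry out of the low pointer byte is not propagated,
        # so the high byte of the target comes from the start of the same page.
        hi_addr = (ptr & 0xFF00) | ((ptr + 1) & 0x00FF)
    else:
        # 65C02: the whole 16-bit pointer is incremented.
        hi_addr = (ptr + 1) & 0xFFFF
    return (mem[hi_addr] << 8) | lo


mem = bytearray(0x10000)
mem[0x30FF], mem[0x3100], mem[0x3000] = 0x34, 0x12, 0x56
print(hex(jmp_indirect_target(mem, 0x30FF)), hex(jmp_indirect_target(mem, 0x30FF, nmos=False)))
```

A pointer that does not sit at the last byte of a page gives the same target on both chips, so code that simply avoids such pointers runs identically on the NMOS 6502 and the 65C02.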


The 65C02 was licensed to many firms, in particular NCR, GTE, Rockwell, Synertek, and Sanyo. The 6512 is a 65C02 variant; it was used in later BBC Micro models. Atari used the NMOS 6502. Synertek and Rockwell produced the NMOS 6502 in addition to the CMOS one. By the way, the NMOS 6502 has its own set of undocumented instructions, quite different in nature from the "secret" instructions of the 8085. In the 6502, these instructions appeared as a side effect of the design technique used, so most of them are rather useless. But several of them, for example loading or storing two registers with a single instruction, and some others, can make code faster and more compact.

There were other attempts to modernize the 6502. In 1979, an article reported that the 6509 processor (not to be confused with the later Commodore processor of the same name) was being prepared for production for Atari computers, with instruction execution expected to be 25% faster and many new instructions. For unknown reasons, this processor never went into production. Commodore made only microscopic upgrades. In particular, it switched to HMOS technology and to static cores, which made it possible to slow the processors down. From a programming point of view, the most interesting is the 6509, which, albeit in a very primitive form, can address up to 1 MB of memory with the help of just two instructions specially allocated for this purpose. The super-popular Commodore 64 and 128 use the 6510/8510 processors, and the less successful 264 series uses the 7501/8501. These processors have 6-bit and 7-bit built-in I/O ports, respectively, and the 7501/8501 do not support non-maskable interrupts. Rockwell produced a version of the 65C02 whose instruction set was extended with 32 bit-manipulation instructions (similar to the z80 bit instructions). However, as far as I know, such processors were not used in computers, and these bit instructions were mostly used only in embedded systems. By the way, this extension was made by Bill Mensch.

The last scene of the 6502 drama was the exclusion of computers based on a 2 MHz 6502 from the US market in the first half of the 80s. This affected the English BBC Micro: its manufacturer, Acorn, made a large batch of computers for the United States, but, as it turned out, in vain. Some kind of lock was triggered, and the computers had to be urgently reworked to European standards. The semi-American but formally Canadian Commodore CBM II computers, despite some problems (in particular, with compliance with the standards for electrical equipment), were nevertheless admitted, perhaps because they had no graphics modes or even color text; even the stylish Porsche design could not compensate for this. The last on the list of losers was the 100% American Apple III: it is known that Steve Jobs did a lot to prevent this computer from being successful by demanding obviously impracticable specifications. Will we ever know his motives? Only in 1985, when the era of 8-bit technology was passing, did the Commodore 128 appear, which in one of its modes could run its 6502 at 2 MHz. But even this turned out to be more of a joke, since this mode was practically unsupported and there are practically no programs for it. Only in the second half of the 80s did accelerators for the Apple II begin to be produced in the United States, followed in 1988 by the Apple IIc+ model with a 4 MHz processor. Why did this happen? Perhaps because a 6502 at 2 or 3 MHz (and such chips were already produced at the very beginning of the 80s) could successfully compete with systems based on the Intel 8088 or Motorola 68000 in a number of tasks, especially games. In 1991, a willful decision by Commodore closed an interesting, albeit belated, project: the C65, based on the 4510 processor running at 3.54 MHz.

The 4510, made only in 1988, is the fastest 6502: it finally implemented the cycle optimization mentioned earlier, which gave a 25% increase in speed. Thus, the processor in the C65 is close in speed to a 6502 system at 4.5 MHz. Surprisingly, this fastest 6502, with its extended instruction set (in some details more convenient than that of the 65816), was never used anywhere else.

Anti-advertising: multiple Porsche PETs in the apartment of the villain of The Jewel of the Nile. The Apple-only era in Hollywood had not yet come

Now a few words about the 6502 instruction set. The main feature of this processor is that it was made almost as fast as possible, with almost none of the extra clock cycles that are especially numerous in the 8080/8085/z80/8088/68000. In fact, this was the same ideology as that of the RISC architecture, which appeared later under the direct influence of the 6502. The same ideology has also dominated Intel processors, starting with the Pentium series. In addition, the 6502 responded to interrupts very quickly, which made it very useful in some embedded systems. The 6502 has one accumulator and two index registers; in addition, the first 256 bytes of memory can be used by dedicated instructions either as faster memory or as a set of 16-bit registers (almost identical in functionality to the BC and DE registers of the 8080/z80) for rather powerful addressing modes. Some arithmetic instructions (shifts, rotates, increment, and decrement) can operate on memory directly, without using registers. There are no 16-bit instructions: this is a 100% 8-bit processor. It supports all the basic flags except parity, which is typical only of Intel's architecture. There is also a special flag for the not very useful decimal mode. Intel and Motorola processors use special correction instructions for working with decimal numbers, whereas the 6502 can switch into decimal mode, which makes its speed advantage with decimal numbers even greater than with binary ones. Very impressive is table-based multiplication of 8-bit operands with a 16-bit result on the 6502 in fewer than 30 cycles, using an auxiliary table of 2048 bytes. One of the slowest 6502 operations is block memory copying, which can take more than 14 clock cycles per byte.
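The article does not say which table method is meant, but a widely used one that matches the 2 KB figure is quarter-square multiplication: a·b = ⌊(a+b)²/4⌋ − ⌊(a−b)²/4⌋, which is exact because a+b and a−b have the same parity. The Python sketch below assumes that method and an assumed layout of four 512-byte tables (low and high bytes of the quarter squares), so that 6502 code could fetch each byte with a single indexed load. The names `SQ1_LO`, `SQ2_HI`, `mul8`, etc. are hypothetical.

```python
def quarter_square(x):
    """Return floor(x*x / 4)."""
    return x * x // 4


# Four 512-byte tables, 2048 bytes in total, split into low and high bytes.
SQ1_LO = bytes(quarter_square(i) & 0xFF for i in range(512))        # indexed by a + b
SQ1_HI = bytes(quarter_square(i) >> 8 for i in range(512))
SQ2_LO = bytes(quarter_square(i - 255) & 0xFF for i in range(512))  # indexed by b - a + 255
SQ2_HI = bytes(quarter_square(i - 255) >> 8 for i in range(512))


def mul8(a, b):
    """Multiply two 8-bit values into a 16-bit result using only lookups and a subtraction."""
    i = a + b                   # 0..510
    j = (a ^ 0xFF) + b          # equals b - a + 255, also in 0..510
    plus = SQ1_HI[i] << 8 | SQ1_LO[i]
    minus = SQ2_HI[j] << 8 | SQ2_LO[j]
    return (plus - minus) & 0xFFFF


print(mul8(200, 123))
```

The inversion `a ^ 0xFF` mirrors the cheap EOR #$FF a 6502 would use instead of a real negation, and the remaining work is four table reads and one 16-bit subtraction.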

The 65816 was released by WDC in 1983. Interestingly, Bill Mensch received some of the new processor's specifications from Apple. Of course, it was a big step forward, but clearly belated and with large architectural flaws. Nobody considered the 65816 a competitor to the main Intel or Motorola processors: it was already a minor outsider, seemingly resigned to losing further ground. The 65816 had two important advantages: it was relatively cheap and almost compatible with the still very popular 6502. In subsequent years, Bill Mensch did not even try to improve his brainchild in any way: to optimize cycles, to replace zero page addressing with an extended form using a Z register (this was done in the 4510), to add at least multiplication, ... WDC only raised the maximum clock speed, reaching 14 MHz by the mid-90s (the processor was used at 20 MHz in SuperCPU, a popular accelerator for the C64). However, even now (2018!) WDC for some reason offers the 65816 only at the same 14 MHz. The 65816 can use up to 16 MB of memory, but the addressing methods used for this look far from optimal. For example, index registers can be only 8- or 16-bit, the stack can be placed only in the first 64 KB of memory, the convenient short direct page addressing (a generalization of zero page addressing) is available only there too, working with memory above 64 KB is comparatively awkward, ... The 65816 has a 16-bit ALU but an 8-bit data bus, so it is only about 50% faster than the 6502 at arithmetic operations. Nevertheless, more than a billion 65816s have been produced. Indeed, a number of 65816 instructions clearly fill gaps in the 6502 architecture, for example the block memory copy instructions (MVN and MVP), which take 7 clock cycles per byte.

The 65802 is another version of the 65816, which has a 16-bit address bus and is pin-compatible with the 6502. An upgrade for the Apple II based on this processor was offered, but it gave only a slight speedup, and only with specially written programs.

The 6502 was used in a large number of computer systems, the most popular of which are the 8-bit Commodore, Atari, and Apple machines and the NES. Interestingly, a 6502 was also used in the keyboard controller of the Commodore Amiga. The 65816 was used in the rather popular Apple IIgs computer, in the Super NES game console, and in the rare Acorn Communicator computer.

In 1984, Byte magazine published an article about the Agat, a poor copy of the Apple ][ made in the USSR, set against pictures of red banners, Lenin, and marching soldiers. The article cited a curious price for this computer, $17,000 (an absurd figure; the real price was about 4000 rubles), and ironically noted that Soviet manufacturers would have to cut the price dramatically if they wanted to sell their product in the West. The Agat was used mainly in school education. The higher-end Agat models were almost 100% compatible with the Apple ][ and had some quite useful extensions.

One can only fantasize about what would have happened if the 6502 had developed at the same pace as its competitors. It seems to me that gradually moving zero-page memory into registers and gradually expanding the instruction set while optimizing cycles would have allowed The Terminator 6502 to remain the performance leader until the early 90s. Introducing a 16-bit and then a 32-bit mode would have allowed more memory and faster instructions to be used. Would its competitors have been able to counter this?

I would like to finish with some general philosophical reflections. Why was the 6502 held back and denied a much brighter future? Perhaps because it really could have put great pressure on large companies and created a completely new reality. Was the 6502 team aiming for this? Rather not; they just wanted to make a better processor.

Emotional stories about processors for first computers: part 9 (Acorn ARM)

The first ARM processors

The ARM-1 processor was an astonishing development. It continued the 6502 ideology (namely, to make a processor that is simpler, cheaper, and better) and was released by Acorn in 1985, at the same time as Intel's technological miracle, the 80386 processor, appeared. The ARM had an order of magnitude fewer transistors and therefore consumed significantly less power, while being much faster on average. Admittedly, the ARM had no MMU and not even multiply or divide instructions, so in some division-heavy calculations the 80386 could be faster. However, the advantages of ARM were so great that today it is the most widespread processor architecture: more than 100 billion such processors have been produced.

Development of ARM began in 1983, after Acorn conducted research with the 32016 processor which showed that many calculations on a 6502 running at half the clock frequency of the 32016 could be faster than on this seemingly much more powerful processor. At that time, the 80286 was already available. It showed very good performance, but Intel, perhaps sensing Acorn's potential, refused to provide its processor for testing. Unlike the 80386, the 80286 technology was not closed and was licensed to many firms, so the details of this somewhat unusual refusal still await disclosure. Perhaps, if Intel had allowed its processor to be used, Acorn would have used it and would not have developed ARM.

ARM was developed by only a few people, who tested the instruction set using BBC Micro Basic. The development itself took place in a former barn. The processor's debut was rather unsuccessful. In 1986, an ARM second processor for the BBC Micro was released under the name ARM Evaluation System; in addition to the processor, it contained 4 MB of memory (a great deal for those years), which made this add-on a very expensive product. Admittedly, compared with computers of that time with comparable performance, this second processor was an order of magnitude or even almost two orders cheaper. But there were very few programs for the new system. This was a bit strange, because it was quite possible to port Unix to it: at that time there were many Unix variants that did not require an MMU, for the PDP-11, 68000, 80186, and even the 8088. Only in the 90s was Linux ported to the Acorn Archimedes. Perhaps the delay in the appearance of a real Unix for ARM was caused by Acorn's reluctance to transfer ARM technology to other firms.

The first ARM based system

Acorn's somewhat unsuccessful marketing policy led to a very difficult financial situation in 1985. Besides ARM, Acorn also pursued expensive development of business computers, which failed, in particular because of the shortcomings of the 32016 processor chosen for them. The Acorn Communicator was also not very successful. Development of the relatively successful, but not fully IBM PC compatible, Master 512 computer was very costly. In addition, a lot of money was spent on an unsuccessful attempt to enter the US market, which the Italian company Olivetti, with its rather successful Intel 8086- and 80286-based computers, was allowed to enter, perhaps as part of a big game aimed at absorbing Acorn itself. By the way, after Acorn was absorbed, Olivetti's role in the US market quickly faded away.

As part of Olivetti, Acorn developed the improved ARM2 chip with built-in multiplication instructions, on which the Archimedes personal computers were based. Their speed was stunning at the time. The first models of these computers became available in 1987. However, Olivetti's management was focused on IBM PC compatible computers and did not want to spend its resources on selling Acorn products.

ARM provides 16 32-bit registers (there are actually more of them if the registers reserved for system use are counted). One of the registers, R15, is the program counter, as in the PDP-11 architecture. Almost all operations are performed in 1 clock cycle. More cycles are needed, in particular, for jumps, multiplications and memory accesses. Unlike other popular processors of those years, ARM had no dedicated stack. When needed, a stack is implemented through one of the registers. Subroutine calls do not use a stack; instead, the return address is stored in a register reserved for it (R14). On its own, this scheme does not work for nested calls, for which a stack has to be set up. A unique feature of ARM is that the 26-bit program counter, which can address up to 64 MB, shares one register with the status register. Eight bits of this register are allocated to flags and mode bits: six are the upper bits left free by the 26-bit address, and two more are the lower two bits of the address, which are not needed because instructions must be aligned on 4-byte word boundaries. The processor can access bytes and 4-byte words, but it cannot directly access 16-bit data. ARM's data-processing instructions are 3-address. A characteristic feature of the RISC architecture is that only load and store instructions access memory. ARM has a built-in fast barrel shifter that can shift the value of one of the registers in an instruction by any number of bits without spending an extra clock cycle. For example, multiplying the value of register R0 by 65 and placing the result in register R1 can be written as one single-cycle addition instruction, ADD R1, R0, R0 shl 6, and multiplying by 63 as one instruction, RSB R1, R0, R0 shl 6.
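The two multiplications above can be checked with a small Python model of what the instructions compute. This is only a sketch: it assumes 32-bit wraparound arithmetic, the helper names (`lsl`, `add_shifted`, `rsb_shifted`) are hypothetical, and it models results, not clock cycles.

```python
MASK32 = 0xFFFFFFFF  # ARM registers are 32 bits wide


def lsl(value, amount):
    """Logical shift left, as the barrel shifter applies it to an operand."""
    return (value << amount) & MASK32


def add_shifted(rn, rm, shift):
    """ADD Rd, Rn, Rm shl shift: Rd = Rn + (Rm << shift)."""
    return (rn + lsl(rm, shift)) & MASK32


def rsb_shifted(rn, rm, shift):
    """RSB Rd, Rn, Rm shl shift: Rd = (Rm << shift) - Rn (reverse subtract)."""
    return (lsl(rm, shift) - rn) & MASK32


r0 = 1000
times_65 = add_shifted(r0, r0, 6)  # R0 + 64*R0 = 65*R0
times_63 = rsb_shifted(r0, r0, 6)  # 64*R0 - R0 = 63*R0
print(times_65, times_63)
```

Both results come from a single add or reverse subtract; the shift itself is folded into the operand.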
The instruction set includes reverse subtraction, which, in particular, gives the unary minus as a special case (RSB R0, R0, #0) and speeds up division routines. ARM has another unique feature: all its instructions are conditional. Each instruction carries one of 16 conditions (flag combinations) and is executed only if the current flags match that condition. In processors of other architectures, such conditional execution is, as a rule, available only for jumps. This feature of ARM often makes it possible to avoid a slow jump. This is also helped by the fact that arithmetic instructions can be told not to set the status flags. As with the 6809, ARM supports both fast and regular interrupts. In addition, in the interrupt modes some of the higher-numbered registers are replaced with banked system copies, which makes interrupt handlers more compact and faster.
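A minimal Python sketch of conditional execution, assuming ARMv2-style flag rules for CMP (carry set means "no borrow"). The helper names are hypothetical; the two functions at the end mimic the branch-free sequences CMP R0, #0 / RSBLT R0, R0, #0 (absolute value) and CMP R0, R1 / MOVLT R0, R1 (signed maximum).

```python
MASK32 = 0xFFFFFFFF


def cmp_flags(a, b):
    """N, Z, C, V flags as set by CMP a, b, i.e. by computing a - b."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    return {
        "N": result >> 31,                    # sign bit of the result
        "Z": int(result == 0),
        "C": int(a >= b),                     # ARM: carry set means no borrow
        "V": ((a ^ b) & (a ^ result)) >> 31,  # signed overflow
    }


# The 16 values of the 4-bit condition field, with ARMv2-era names.
CONDITIONS = {
    "EQ": lambda f: f["Z"] == 1,
    "NE": lambda f: f["Z"] == 0,
    "CS": lambda f: f["C"] == 1,
    "CC": lambda f: f["C"] == 0,
    "MI": lambda f: f["N"] == 1,
    "PL": lambda f: f["N"] == 0,
    "VS": lambda f: f["V"] == 1,
    "VC": lambda f: f["V"] == 0,
    "HI": lambda f: f["C"] == 1 and f["Z"] == 0,
    "LS": lambda f: f["C"] == 0 or f["Z"] == 1,
    "GE": lambda f: f["N"] == f["V"],
    "LT": lambda f: f["N"] != f["V"],
    "GT": lambda f: f["Z"] == 0 and f["N"] == f["V"],
    "LE": lambda f: f["Z"] == 1 or f["N"] != f["V"],
    "AL": lambda f: True,
    "NV": lambda f: False,  # "never" on early ARMs
}


def arm_abs(r0):
    """CMP R0, #0 ; RSBLT R0, R0, #0: absolute value without a branch."""
    if CONDITIONS["LT"](cmp_flags(r0, 0)):
        r0 = (0 - r0) & MASK32  # RSB with #0 acts as a unary minus
    return r0 & MASK32


def arm_max(r0, r1):
    """CMP R0, R1 ; MOVLT R0, R1: signed maximum without a branch."""
    if CONDITIONS["LT"](cmp_flags(r0, r1)):
        r0 = r1
    return r0 & MASK32
```

Neither sequence needs a jump: on a real ARM the second instruction simply does nothing when its condition fails.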

The ARM instruction set contains significantly fewer basic instructions than the x86 instruction set, but the ARM instructions themselves are very flexible and powerful. Several very convenient and powerful ARM instructions have no analogues in the 80386, for example, RSB (reverse subtraction), BIC (AND with the inverted operand; such an instruction exists on the PDP-11), the 4-address MLA (multiply with accumulate), and LDM and STM (loading or storing multiple registers from or to memory, similar to the MOVEM instruction of the 68k processors). Almost all ARM instructions are 3-address, while almost all 80386 instructions have no more than 2 operands. The ARM instruction set is more orthogonal: all registers are interchangeable, with the exception of R14 and R15. Emulating most ARM instructions may take 3-4 80386 instructions, while most 80386 instructions can be emulated with only 2-3 ARM instructions. Interestingly, the IBM PC XT emulator on an Acorn Archimedes with an 8 MHz processor runs even faster than a real PC XT. On a Commodore Amiga with a 7 MHz 68000, the emulator reaches no more than 10-15% of the speed of a real PC XT. It is also fascinating that the first NeXT computers, with a 25 MHz 68030, showed the same integer performance as the 8 MHz ARM. Apple was going to make a successor to the Apple ][ in the Möbius project, but when it turned out that the prototype of this computer in emulation mode outran not only the Apple ][ but also the 68k-based Macintosh, the project was closed!

Among the shortcomings of ARM is the problem of loading an immediate constant into a register. An instruction can hold only an 8-bit value, although it can be inverted (with MVN) and rotated by an even number of bits. Therefore, loading an arbitrary 32-bit constant can take up to 4 instructions. You can, of course, load a constant from memory with one instruction, but then the problem is specifying the address of this value, since the offset can only be 12-bit. Another shortcoming of ARM is its relatively low code density, which makes programs somewhat large and, most importantly, reduces the efficiency of the processor cache. However, this is probably the result of the low quality of the compilers for this platform. Multiplication instructions give only the lower 32 bits of the product. For a long time, a significant drawback of ARM was the lack of built-in memory management (MMU) support; Apple, for example, required it in the early 90s. Floating-point coprocessors for the ARM architecture also came into use with a significant delay. ARM did not have the advanced debugging features that the x86 had. There is also an oddity in the standard ARM assembly language: the barrel shifter operation is written after a comma. Thus, instead of the simple form R1 shl 7 (shift the contents of register R1 left by 7 bits), you have to write R1, LSL #7.
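The immediate-constant limitation can be made concrete. In the data-processing encoding, an immediate is an 8-bit value rotated right by an even number of bits (twice a 4-bit rotate field). The hypothetical helpers below check whether a 32-bit value fits that form and split an arbitrary constant greedily into at most four such pieces, one MOV plus up to three ORR instructions. This is a sketch, not an optimal constant synthesizer: it ignores MVN, BIC and other tricks an assembler might use.

```python
MASK32 = 0xFFFFFFFF


def rol32(value, amount):
    """Rotate a 32-bit value left by amount bits."""
    amount %= 32
    return ((value << amount) | (value >> (32 - amount))) & MASK32


def encode_immediate(value):
    """Return (imm8, rotate_field) with value == ROR(imm8, 2 * rotate_field),
    or None if value cannot be a single data-processing immediate."""
    value &= MASK32
    for rotate_field in range(16):
        imm8 = rol32(value, 2 * rotate_field)  # undo a right rotation
        if imm8 <= 0xFF:
            return imm8, rotate_field
    return None


def split_constant(value):
    """Greedily split a 32-bit constant into 8-bit chunks at even bit
    positions; each chunk is a valid immediate (the first for MOV, the
    rest for ORR). Zero gives an empty list: a single MOV Rd, #0 suffices."""
    value &= MASK32
    chunks = []
    while value:
        lowest = (value & -value).bit_length() - 1  # index of lowest set bit
        pos = lowest & ~1                           # round down to an even position
        chunk = value & (0xFF << pos) & MASK32
        chunks.append(chunk)
        value &= ~chunk
    return chunks
```

For example, 0xFFFFFF00 does not fit, but its inverse 0xFF does, so a single MVN loads it, while 0x12345678 needs all four pieces.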

ARM3, with a built-in cache, became available in 1989. In 1990, the ARM development team separated from Acorn and created ARM Holding with the help of Apple and VLSI. One of the reasons for the separation was that Acorn-Olivetti management considered ARM development too expensive. Ironically, Acorn later ceased to exist as an independent company, while ARM Holding turned into a large company. However, the separation of Acorn and ARM Holding was also driven by Apple's desire to use ARM processors in its Newton computers without depending on another computer manufacturer.

Further development of the ARM architecture is also very interesting; it affected, in particular, the interests of such well-known companies as DEC and Microsoft, but that is another story. It is worth mentioning, though, that thanks to its share in ARM Holding, Apple was able to avoid bankruptcy in the 90s.

Many thanks to jms2 and BigEd, who helped to improve the style and content.

Emotional stories about processors for first computers: part 8 (DEC VAX-11)

Processor for DEC VAX-11

VAX-11 systems were quite popular in the 80s, especially in higher education. Today it is difficult to understand some of the concepts described in books of those years without knowing the features of this architecture. VAX-11s were more expensive than PDP-11s, but more oriented towards general-purpose programming, and still significantly cheaper than IBM/370 systems. The V-11 processor for the VAX architecture was made by the mid-80s; before that, processors assembled from many chips were used.

The VAX-11 architecture is 32-bit and uses 16 registers, one of which, as in the PDP-11, is the program counter. It assumes the use of two stacks, one of which stores subroutine frames. In addition, one of the registers is dedicated to the arguments of called functions. Thus, 3 of the 16 registers are reserved for stacks. The VAX-11 instruction set cannot fail to amaze with its vastness and its very rare and often unique instructions, for example, for working with bit fields or several types of queues, for calculating CRCs, multiplying decimal strings, ... Many instructions have both three-address variants (like ARM) and two-address variants (like x86), and there are even four-address instructions, for example, the extended division EDIV. Of course, floating-point numbers are supported as well.

But the VAX-11 is a very slow system for its class and price. Even the super-simple 6502 at 4 MHz could outrun the slowest family member, the VAX-11/730, and the fastest VAX-11 systems, huge cabinets and "whole furniture sets", are at about the same level of speed as the first PC ATs. When the 80286 appeared, it became clear that the days of the VAX-11 were numbered, and even the slow development of 80286-based systems could not change anything fundamentally. The straightforward people from Acorn, having made ARM in 1985, said openly that ARM was much cheaper and much faster. The VAX-11, however, remained relevant until the early 90s, still having some advantages over the PC, in particular, faster disk subsystems.

The VAX-11 is probably the last mass-market computer system in which the convenience of assembly language programming was considered more important than speed. In a sense, this approach has passed on to today's popular scripting languages.

The VAX-11/785 (1984), the fastest of the VAX-11 computers, with processor speed comparable to the IBM PC AT or the ARM Evaluation System

Surprisingly, very little literature on VAX-11 systems is openly available, as if some strange law of oblivion were at work. Several politically tinged episodes connected with the history of the USSR are associated with this architecture. It is possible that DEC's de facto abandonment of PDP-11 development was caused by the architecture's low cost and the success of its cloning in the Soviet Union, while cloning the VAX-11 cost an order of magnitude more resources and led to a dead end. Interest in the VAX-11 was fueled, for example, by hoaxes such as the famous Kremvax of April 1, 1984, in which the then Soviet leader Konstantin Chernenko supposedly proposed drinking vodka to celebrate connecting to the Usenet network. Another joke was that some VAX-11 chips were imprinted with a message in broken Russian about how good the VAX-11 is.

Some VAX-11 models were cloned in the USSR by the end of the 80s, but very few of these clones were produced and they found almost no use.

Several VAX-11 systems are available for use over the network, which distinguishes them favorably from the IBM/370 systems they competed with.

Emotional stories about processors for first computers: part 7 (NS 32016)

The first 32-bit CPU – National Semiconductor 32016

This was the first true 32-bit processor offered for use in computers, back in 1982. It was originally planned as a VAX-11 on a chip, but since no agreement could be reached with DEC, National Semiconductor had to make a processor that resembles the VAX-11 architecture only in some details.

With this processor begins the use of paged virtual memory, today's dominant technology. Virtual memory support is not built into the processor, however; it is provided by a separate coprocessor. A separate coprocessor is also required for floating-point arithmetic.

The NS32016 instruction set is huge and resembles that of the VAX-11, in particular in having a separate stack for subroutine frames. The 24-bit address bus allows up to 16 MB of memory to be used. A distinguishing feature of the 32016 is its somewhat unusual set of status flags. In addition to the standard carry (which can be used as the condition for a conditional jump), overflow, sign and equality (or zero) flags, there is also the L flag, meaning "less", which is a carry flag for comparisons only. The situation with the carry flag is similar to that of the Motorola 68k processors. For some reason the overflow flag is called F. There are also flags for single-step mode and privileged mode and, uniquely, a flag that selects the current stack. Arithmetic instructions do not set the sign, zero and less flags; only comparison instructions set them.

There are eight 32-bit general-purpose registers. In addition, there are a program counter, two stack pointers, a frame pointer for subroutine frames, a static base pointer (something unique), a module base pointer (also something very rare), a pointer to the interrupt vector table, a configuration register and a processor status register. The performance of the NS32016 was comparable to that of the 68000, perhaps slightly better.

As far as I know, the 32016 was used only as a second processor for BBC Micro personal computers. It could be ordered clocked at 6, 8 or 10 MHz. This second processor was a very expensive and prestigious device for 1984. There was very little software for it, all produced by Acorn, including the Unix-like Panos operating system and Acorn's ever-present BASIC. The BBC Micro version did not use the MMU chip: although one could be connected, no software used it. The arithmetic coprocessor was not even meant to be connected.

This very complex processor is known to have had serious hardware bugs that took years to fix.

Emotional stories about processors for first computers: part 6 (TI TMS9900)

Texas Instruments TMS9900

I have never written code for this very special processor, though it was the first 16-bit processor available for use in personal computers. It was produced from 1976. It uses the much rarer big-endian byte order. Among the processors in this review, this order is used only by the Motorola 6800 and 68000 series and by the giant IBM/370 architecture. All the other processors in this review use little-endian byte order.

The TMS9900 has only three 16-bit registers: the program counter, the status register and the base register for pseudo-registers (the workspace pointer). This processor uses a 32-byte block of memory as sixteen 16-bit registers. This use of memory is somewhat like the zero page in the 6502 architecture. By changing the base register, the TMS9900 can switch context very quickly. The status flags are original: along with the typical carry, equality (zero), overflow and parity flags, there are two unique flags, logical greater than and arithmetic greater than. Working with the stack and subroutines resembles the RISC processors of the future. There is simply no ready-made stack, but you can build one using one of the pseudo-registers. When a subroutine is called with BLWP, new values are loaded into the program counter and the base register, and the old values of all three registers are saved in pseudo-registers of the new context. Thus, calling a subroutine is more like calling a software interrupt. The TMS9900 has a built-in interrupt controller that handles up to 16 hardware interrupt levels.
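A toy Python model of the TMS9900 workspace scheme. Register Rn lives in memory at WP + 2n, and BLWP saves the old WP, PC and status in R13, R14 and R15 of the new workspace, while RTWP restores them. The class, the memory dictionary and the addresses in the usage example are hypothetical, and timing is ignored.

```python
class Tms9900Context:
    """Toy model of TMS9900 context switching. Registers R0..R15 are not
    on-chip: Rn is the 16-bit word at address WP + 2*n."""

    def __init__(self):
        self.mem = {}  # even byte address -> 16-bit word
        self.pc = 0    # assumed to already point past the current instruction
        self.wp = 0    # workspace pointer (the "base register")
        self.st = 0    # status register

    def read(self, addr):
        return self.mem.get(addr & 0xFFFE, 0)

    def write(self, addr, word):
        self.mem[addr & 0xFFFE] = word & 0xFFFF

    def reg(self, n):
        return self.read(self.wp + 2 * n)

    def set_reg(self, n, word):
        self.write(self.wp + 2 * n, word)

    def blwp(self, vector_addr):
        """BLWP: take the new WP and PC from a two-word vector and save the
        old WP, PC and ST in R13, R14, R15 of the new workspace."""
        old_wp, old_pc, old_st = self.wp, self.pc, self.st
        self.wp = self.read(vector_addr)
        self.pc = self.read(vector_addr + 2)
        self.set_reg(13, old_wp)
        self.set_reg(14, old_pc)
        self.set_reg(15, old_st)

    def rtwp(self):
        """RTWP: restore ST, PC and WP from R15, R14 and R13."""
        self.st = self.reg(15)
        self.pc = self.reg(14)
        self.wp = self.reg(13)  # changed last, after reading R14 and R15


# Hypothetical addresses chosen for illustration only.
cpu = Tms9900Context()
cpu.wp, cpu.pc, cpu.st = 0x8300, 0x6010, 0x2000
cpu.set_reg(0, 0x1234)     # caller's R0
cpu.write(0x0100, 0x8320)  # vector: new WP
cpu.write(0x0102, 0x7000)  # vector: new PC
cpu.blwp(0x0100)
cpu.set_reg(0, 0xBEEF)     # callee's R0 is a different memory word
cpu.rtwp()
```

Nothing is copied except three words, which is why the context switch is so fast; the price is that every register access is a memory access.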

The first 16-bit home computer: it even has color sprites!

The instruction set looks very impressive. It even includes multiplication and division. The unique X instruction executes a single instruction located at any memory address and then moves on to the instruction following X. Instructions execute rather slowly: the fastest require 8 cycles and arithmetic instructions 14, but multiplication (16*16=32) in 52 cycles and especially division (32/16=16,16) in 124 cycles were probably the fastest among processors of the 70s.
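The division notation used in these articles, for example 32/16=16,16, means a 32-bit dividend, a 16-bit divisor, and a 16-bit quotient and 16-bit remainder. A minimal unsigned sketch in Python, with hypothetical function names: an instruction of the 16,16 kind must report overflow when the quotient does not fit in 16 bits, and a full 32/16 = 32,16 result can be built from two such steps, as a software routine might do.

```python
def div_32_16_16_16(dividend, divisor):
    """Unsigned 32/16 = 16,16: 16-bit quotient and 16-bit remainder.
    Returns None on division by zero or when the quotient would not fit
    in 16 bits, i.e. when the divisor is not larger than the high word."""
    if divisor == 0 or (dividend >> 16) >= divisor:
        return None
    return dividend // divisor, dividend % divisor


def div_32_16_32_16(dividend, divisor):
    """Unsigned 32/16 = 32,16 built from two 16,16 steps (long division
    by 16-bit "digits"), the way a software routine could do it."""
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    high, low = dividend >> 16, dividend & 0xFFFF
    q_high, rem = div_32_16_16_16(high, divisor)              # high < 2**16: no overflow
    q_low, rem = div_32_16_16_16((rem << 16) | low, divisor)  # rem < divisor: no overflow
    return (q_high << 16) | q_low, rem
```

The second step can never overflow because the remainder carried into it is always smaller than the divisor.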

The TMS9900 requires three supply voltages, -5, 5 and 12 volts, and a four-phase clock signal; these are anti-records among the processors I know. In 1979, this processor was demonstrated to IBM specialists, who were then looking for a processor for the IBM PC prototype. The obvious drawbacks of the TMS9900 (only 64 KB of addressable memory, lack of the necessary support controllers, relative slowness) made the expected impression, and the Intel 8088 was chosen for the future PC leader. To deal with the lack of controllers, Texas Instruments also produced a version of the TMS9900 with an 8-bit bus, the TMS9980, which is 33% slower.

The TMS9900 was used in the TI99/4 and TI99/4A computers, fairly popular in the USA, which were "crushed" in a price war by the Commodore VIC-20 by 1983. Curiously, as a result of this war, Texas Instruments was forced to cut the price of its computer to $49, incredible for 1983 (in 1979 the price was $1150!), and to sell them at a big loss. For comparison, the price of the relatively unpopular Commodore +4, discontinued in 1986, fell to $49 only in 1989. The TI99/4A was discontinued in 1984, just when it began to gain popularity because of its ultra-low price. This computer can be called 16-bit only conditionally: only its 256 bytes (!) of RAM and its ROM are addressed through the 16-bit bus. The rest of the memory and the I/O devices work over a slow 8-bit bus. Therefore, it is more correct to consider the BK0010 the first 16-bit home computer. Interestingly, the TI99/4 and TI99/4A run their processor at 3 MHz, exactly the same frequency as the BK0010.

The TI-99/4 and TI99/4A used the rather successful TMS9918 video controller, which became the basis of the MSX standard, very popular worldwide, as well as of some other computers and game consoles. The Japanese company Yamaha significantly improved this video chip, and the improved version was later used, in particular, to upgrade the TI-99/4 and TI99/4A themselves!

The TI99/4 series is a rare example of computers whose processor and computer were made by the same manufacturer.

Emotional stories about processors for first computers: part 5 (Motorola 6800 family)

Motorola 6800 and close relatives

Motorola's processors have always been distinguished by several very attractive "zests", combined with some absurd abstraction and poor practicality in their architectural solutions. The main "zest" of all the processors considered here is a second full-featured and very fast accumulator register.

Having only one cumbersome 16-bit index register in an 8-bit architecture, the 6800 turned out to be inconvenient to program and use. It was released in 1974, not much later than the 8080, but it did not become the basis of any well-known computer system. Interestingly, the 6502 developers, Chuck Peddle and Bill Mensch, considered the 6800 wrong: "too big." However, it and its variants were widely used as microcontrollers. It is perhaps worth noting here that Intel had been producing processors since 1971, which put Motorola, for which the 6800 was the very first processor, in the position of catching up. And if you compare the 6800 not with the 8080 but with its predecessor, the 8008, the 6800 is clearly preferable. Motorola almost caught up with Intel with the 68000/20/30/40. We can also note that in the 70s, Motorola was a significantly larger company than Intel.

The 6809 was released in 1978, when the 16-bit era began with the 8086. It has a highly developed instruction set, including multiplication of the two accumulators to obtain a 16-bit result in 11 clock cycles (for comparison, the 8086 needs 70 clock cycles for such an operation). In many cases, the two accumulators can be combined into one 16-bit accumulator, which gives fast 16-bit instructions. The 6809 has two index registers and 12 addressing modes, a record among 8-bit processors. Some of these modes are unique for 8-bit chips, such as indexed with auto-increment or auto-decrement, program-counter-relative, and indexed with offset. The 6809 offers an interesting choice of two types of interrupts: fast interrupts that automatically save only some of the registers, and interrupts that save all registers. It has three interrupt inputs: FIRQ (fast maskable), IRQ (maskable) and NMI (non-maskable). It is also sometimes convenient to use the fast instructions that read or set all the flags at once.

However, memory operations take one clock cycle more than on the 6502. The index registers remained bulky 16-bit "dinosaurs" in the 8-bit world. Some operations are shockingly slow; for example, transferring a byte from one accumulator to another takes 6 clock cycles, and exchanging their contents takes 8 (compare with the 8080, where a 16-bit exchange takes 4 clock cycles)! For some reason, two stack pointers are offered at once, perhaps under the influence of the dead-end VAX-11 architecture; in an 8-bit architecture with 64 KB of memory this looks very awkward. And even an instruction with the intriguing name SEX cannot solve all the problems of the 6809. In general, the 6809 is still somewhat faster than the 6502 at the same frequency, but it requires the same memory speed. I managed to write a division routine for the 6809 with a 32-bit dividend and a 16-bit divisor (32/16 = 32,16) taking just over 520 cycles; for the 6502 I could not get below 650 clock cycles. The second accumulator is a big advantage, but other 6502 features, in particular its inverted carry flag, reduce this advantage to the roughly 25% these figures show. But multiplication by a 16-bit constant on the 6809 turned out to be slower than table multiplication on the 6502 with a 768-byte table. The 6809 allows you to write quite compact and fast code using the direct page addressing mode, but this mode makes the code somewhat tangled. The essence of this addressing is to put the high byte of the data address in a special register and specify only the low byte of the address in instructions. The same scheme, but with a fixed high byte, is used in the 6502, where it is called zero page addressing. Direct page addressing is a close analogue of using the DS segment register in the x86, only with segments of 256 bytes rather than 64 KB.
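A sketch of how the effective address is formed in the three schemes just compared. The function names are hypothetical; the x86 variant is the 8086 real-mode rule (segment * 16 + offset, wrapped to 20 bits), which is where the analogy is only approximate: x86 segments start on 16-byte boundaries and overlap, while 6809 direct pages are fixed 256-byte blocks.

```python
def direct_page_address(dp, offset8):
    """6809 direct mode: the DP register supplies the high byte of the
    address, the instruction supplies only the low byte."""
    return ((dp & 0xFF) << 8) | (offset8 & 0xFF)


def zero_page_address(offset8):
    """6502 zero page: the same idea with the high byte fixed at 0."""
    return direct_page_address(0, offset8)


def real_mode_address(segment, offset16):
    """8086 real mode with DS: segment * 16 + 16-bit offset, wrapped to 20 bits."""
    return (((segment & 0xFFFF) << 4) + (offset16 & 0xFFFF)) & 0xFFFFF
```

In all three cases the instruction carries only a short offset, and a register (or a fixed zero) supplies the rest of the address.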
Another artificial feature of the 6800 architecture is its big-endian byte order (most significant byte first), which slows down 16-bit addition and subtraction. The 6809 is not fully compatible with the 6800 at the instruction code level. The 6809 became Motorola's last 8-bit processor. For later designs, it was decided to use the 68008 instead.
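Why byte order matters for multi-byte arithmetic on an 8-bit CPU: addition must start at the least significant byte so that the carry can propagate. In little-endian order that byte sits at the lowest address, so a pointer can simply move upward; in big-endian order it sits at the highest address, so the walk has to run downward. The Python sketch below, with a hypothetical helper name, only shows the order in which bytes are visited; it says nothing about actual 6800 cycle counts.

```python
def add_bytewise(a, b, big_endian):
    """Add two equal-length numbers stored as bytes, one byte at a time,
    as an 8-bit CPU would: start from the least significant byte so the
    carry can propagate. Returns the sum bytes and the visited offsets."""
    n = len(a)
    # Little-endian: LSB at offset 0, walk upward. Big-endian: LSB at n-1, walk downward.
    offsets = range(n - 1, -1, -1) if big_endian else range(n)
    out = bytearray(n)
    carry = 0
    visited = []
    for i in offsets:
        total = a[i] + b[i] + carry
        out[i] = total & 0xFF
        carry = total >> 8
        visited.append(i)
    return bytes(out), visited


x = (0x12FF).to_bytes(2, "big")
y = (0x0001).to_bytes(2, "big")
total, visited = add_bytewise(x, y, big_endian=True)
```

With big-endian storage the first byte touched is the one at the higher address (offset 1), the opposite of the natural direction of an auto-incrementing pointer.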

We can assume that Motorola spent a lot of resources promoting the 6809, and this is still felt whenever the processor is mentioned. There are many favorable reviews of the 6809, notable for some vagueness, generalizations and inconsistency. The 6809 was positioned as an 8-bit super-processor for micro-mainframes. Near-Unix systems, OS-9 and UniFlex, were even made for it. It was chosen as the main processor for the Apple Macintosh and, as the films about Steve Jobs suggest, only his emotional intervention decided the switch to the more promising 68000. Indeed, the 6809 is a good processor, but overall only slightly better than its competitors that appeared much earlier: the 6502 (three years earlier) and the z80 (two). One can only guess what would have happened if Motorola had spent even half of the effort it put into developing and promoting the 6809 on developing the 6502.

The 6809 has been used in several fairly well-known computer systems. The best known among them is the American Tandy Color Computer (Tandy CoCo), along with its British, or more precisely Welsh, clone, the Dragon 32/64. The computer markets of the 1980s were notably opaque: the Tandy CoCo was sold mainly in the US, and the Dragons, besides Britain itself, gained some popularity in Spain. In France, the 6809 for some reason became the basis of the mass-market Thomson computers of the 80s, which remained virtually unknown outside France. The 6809 was also used as a second processor in at least two systems: the Commodore SuperPET 9000 series and a now almost forgotten TUBE-interface add-on for BBC Micro computers, produced in very limited numbers. This processor was also used in other systems less well known to me, in particular, Japanese ones. It has also gained some popularity in the world of game consoles. One of these consoles worth mentioning is the Vectrex, which uses a unique technology: a vector display.

Tandy CoCo 3

The 6800 and 6809 have undocumented instructions with the intriguing name Halt and Catch Fire (HCF), which are used for testing at the electronics level, for example, with an oscilloscope. Executing one hangs the processor, which can then be recovered only by a reset. These processors also have other undocumented instructions. The 6800, for example, has instructions that mirror immediate register loads, i.e. instructions that store a register "immediately", into the address following the instruction itself!

Like the 8080, 8085 or z80, the 6809 can hardly be called purely 8-bit. And the 6309 is hard to call 8-bit even formally. It was produced by the Japanese company Hitachi (I was not able to find the exact year production began, but some data point to 1982) as a processor fully compatible with the 6809. However, this processor could be switched to a new mode which, while keeping almost full compatibility with the 6809, provided almost an order of magnitude more capabilities. These features were not disclosed in the official documentation but were published on Usenet in 1988. Two more accumulators were added, but instructions using them are much slower than those using the first two. The execution time of most instructions is greatly reduced. A number of instructions were added, among them a truly fantastic division for processors of this class: signed division of a 32-bit dividend by a 16-bit divisor (32/16 = 16,16) in 34 cycles, with the divisor taken from memory. A 16-bit multiplication with a 32-bit result in 28 clock cycles also appeared. Very useful instructions were also added for quickly copying blocks of memory in 6 + 3n cycles, where n is the number of bytes copied; copying can proceed with either increasing or decreasing addresses. The same instructions can also be used to quickly fill memory with a specified byte. Interrupts can occur while they execute. New bit operations, a zero register, etc. also appeared. Interrupts are now raised when an unknown instruction is executed and on division by 0. In a sense, the 6309 is the pinnacle of technological achievement among 8-bit processors, or more precisely, among processors with 64 KB of addressable memory.
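A Python sketch of the block transfer instruction TFM, assuming its four operand forms: r0+,r1+ (copy upward), r0-,r1- (copy downward), r0+,r1 (source advances, destination fixed, e.g. an I/O port) and r0,r1+ (fixed source, i.e. fill). The mode strings and function name are hypothetical, the cycle estimate simply uses the 6 + 3n figure given above, and interrupt handling is not modeled. The usage shows why both directions are needed when source and destination overlap.

```python
def tfm(mem, src, dst, count, mode):
    """Move count bytes according to one of the four TFM forms and return
    the cycle estimate 6 + 3n quoted in the text.
      "++" : r0+,r1+  copy with increasing addresses
      "--" : r0-,r1-  copy with decreasing addresses
      "+." : r0+,r1   source advances, destination fixed (e.g. an I/O port)
      ".+" : r0,r1+   source fixed, destination advances (memory fill)"""
    src_step, dst_step = {"++": (1, 1), "--": (-1, -1),
                          "+.": (1, 0), ".+": (0, 1)}[mode]
    for _ in range(count):
        mem[dst] = mem[src]
        src += src_step
        dst += dst_step
    return 6 + 3 * count


# Overlapping move of "ABCDE" two bytes up.
forward = bytearray(b"ABCDE\0\0")
tfm(forward, 0, 2, 5, "++")   # source is overwritten before it is read
backward = bytearray(b"ABCDE\0\0")
tfm(backward, 4, 6, 5, "--")  # start at the last bytes and walk down

fill = bytearray(8)
fill[0] = 0x55
cycles = tfm(fill, 0, 1, 7, ".+")  # replicate byte 0 into bytes 1..7
```

The upward copy smears the pattern ("ABABABA"), while the downward copy produces the intended "ABABCDE"; the fill of 7 bytes is estimated at 27 cycles.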

The 6309 is electrically fully compatible with the 6809, making it a popular upgrade for the Tandy Color Computer and the Dragons. There are also special OS versions that use the new features of the 6309.

Emotional stories about processors for first computers: part 4 (Zilog Z80)

Zilog Z80

Along with the 6502, this processor became the main processor of the first personal computers. There were no dramatic events in the history of its appearance and use. There is only some mystery in the departure of Zilog's founder, Federico Faggin, from Intel and in the subsequent relations between these firms. There is also some intrigue in Zilog's failure to make the next generation of processors. The Z80 went into production in 1976, and its variants are still being produced. At one point, even Bill Gates himself announced support for z80-based systems.

The Z80 is easier to build into computer systems than the 8080. It requires only one supply voltage and has built-in support for refreshing dynamic memory. In addition, while fully compatible with the 8080, it has many new instructions, a second set of main registers and several completely new registers. Interestingly, Zilog refused to use the 8080 assembler mnemonics and introduced its own, better suited to the extended z80 instruction set. Something similar happened to the Intel x86 assembler in the GNU software world, which for some reason also uses its own conventions for assembly programs by default.

Among the new z80 instructions, the block memory copy instructions, taking 21 cycles per byte, are especially impressive, as is an interesting instruction that searches memory for a byte. The EXX instruction is the most interesting, however: it swaps 48 bits of register memory, the registers BC, DE and HL, with their counterparts in just 4 cycles! Even the 32-bit ARM needs at least 6 cycles for the same operation. The remaining additional instructions are less impressive, although they can sometimes be useful. The following instructions were also added:

  • 16-bit subtraction with borrow and 16-bit addition with carry for 15 clocks;
  • unary minus for the accumulator for 8 clocks;
  • the ability to read from and write to memory via BC, DE, SP, IX and IY, not just HL;
  • shifts, rotates and input-output for all 8-bit registers;
  • instructions to test, set and reset a bit by its number;
  • relative jumps (JR);
  • a loop instruction (DJNZ).
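A Python sketch of what the block copy instruction LDIR does and where its cost comes from: each step copies the byte at (HL) to (DE), increments both pointers and decrements BC, repeating until BC reaches zero. On the Z80, each repeated step takes 21 T-states and the final one 16. The function signature and the flat 64 KB bytearray are assumptions for illustration.

```python
def ldir(mem, hl, de, bc):
    """Z80 LDIR: repeat (DE) <- (HL), HL += 1, DE += 1, BC -= 1 until BC == 0.
    BC == 0 on entry means 65536 bytes. Returns (hl, de, bc, t_states),
    counting 21 T-states per repeated step and 16 for the final one."""
    t_states = 0
    while True:
        mem[de] = mem[hl]
        hl = (hl + 1) & 0xFFFF
        de = (de + 1) & 0xFFFF
        bc = (bc - 1) & 0xFFFF
        if bc == 0:
            return hl, de, bc, t_states + 16
        t_states += 21


ram = bytearray(0x10000)  # flat 64 KB address space
ram[0x4000:0x4004] = b"Z80!"
hl, de, bc, t = ldir(ram, 0x4000, 0x5000, 4)
```

So a 256-byte block costs 255 * 21 + 16 = 5371 T-states, which is where the "21 cycles per byte" figure comes from.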

Most of the new instructions are rather slow, but using them well can still make code somewhat faster and significantly more compact. This applies particularly to the new 16-bit registers IX and IY, which enable new addressing modes.

Many 8080 instructions became one clock faster on the z80, which is a very noticeable acceleration. But ADD, the basic instruction for 16-bit arithmetic, became one clock slower, which makes arithmetic in general faster only slightly, if at all.

The interrupt system became much more interesting than that of the 8080. The z80 supports non-maskable interrupts and three modes (one of them 8080-compatible) for maskable ones. Maskable interrupt mode 2 is the most interesting, as it makes it easy to change the address of the interrupt handling code.
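In mode 2, the interrupting device supplies only the low byte of a pointer; the high byte comes from the I register, and the CPU reads the handler address, a little-endian word, from that location. A minimal Python sketch with a hypothetical function name; changing the table entry at run time is what makes the handler address so flexible.

```python
def im2_handler(mem, i_reg, bus_byte):
    """Z80 interrupt mode 2: I supplies the high byte and the device's data
    bus byte the low byte of a table address; the handler address is the
    little-endian word stored there."""
    entry = ((i_reg & 0xFF) << 8) | (bus_byte & 0xFF)
    low = mem[entry]
    high = mem[(entry + 1) & 0xFFFF]
    return (high << 8) | low


ram = bytearray(0x10000)
ram[0x38FE], ram[0x38FF] = 0x34, 0x12  # table entry for I=0x38, bus byte 0xFE
before = im2_handler(ram, 0x38, 0xFE)
ram[0x38FE], ram[0x38FF] = 0x00, 0x90  # software repoints the same vector
after = im2_handler(ram, 0x38, 0xFE)
```

The device and the I register stay the same; only the word in memory changes, and the next interrupt goes to the new handler.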

The z80 has a number of undocumented instructions, some of which were documented by some firms and effectively became part of the standard instruction set. Especially useful are the instructions that allow you to work with the individual bytes of the clumsy 16-bit registers IX and IY.

Of course, the z80 has even more right than the 8080 to be called somewhat 16-bit. The hypothetical "bitness index" of the z80 is clearly slightly higher than that of the 8080, but paradoxically, the ALU of the z80 is actually 4-bit! At the electronic level, the z80 and 8080 are completely different chips.
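How an 8-bit addition can be done by a 4-bit ALU: the low nibbles are added first, and their carry (the half carry, which the z80 reports as its H flag) feeds the addition of the high nibbles. This Python sketch, with a hypothetical function name, illustrates only the arithmetic, not the actual z80 circuitry or its timing.

```python
def add8_with_4bit_alu(a, b, carry_in=0):
    """Add two bytes (0..255) in two 4-bit passes. Returns
    (result, carry, half_carry); the carry out of the low nibble is what
    the z80 exposes as its H flag."""
    low = (a & 0x0F) + (b & 0x0F) + carry_in   # first pass: low nibbles
    half_carry = low >> 4
    high = (a >> 4) + (b >> 4) + half_carry    # second pass: high nibbles
    carry = high >> 4
    return ((high & 0x0F) << 4) | (low & 0x0F), carry, half_carry
```

For example, 0x0F + 0x01 gives 0x10 with a half carry but no carry, and 0xFF + 0x01 gives 0x00 with both.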

Much has been written comparing the performance of the z80 and the 6502, as these processors were very widely used in the first mass-market computers. This topic has several subtle points, and without understanding them it is very difficult to stay objective. Thanks to its rather large number of registers, the z80 is naturally run at a clock frequency higher than that of memory. Therefore, a z80 at 4 MHz can use the same memory as a 6502 or 6809 at 1.3 MHz. According to many experienced programmers who wrote code for both processors, at the same frequency the 6502 is on average about 2.4 to 2.6 times faster than the z80. I agree with this. I would only add that writing good, fast code for the z80 is very difficult: you need to optimize register use again and again and do as much of the memory access as possible through the stack. If you really try, then, in my opinion, you can reduce the difference between the z80 and 6502 to about 2.2 times. And if you do not try and ignore timings, the difference can easily reach 4 times. In some particular cases, the z80 can be very fast. When filling memory with the PUSH instruction, the z80 can be even slightly faster than the 6502, but at the cost of disabling interrupts. When copying memory blocks, the z80 is only 1.5 times slower. It is especially impressive that in dividing a 32-bit dividend by a 16-bit divisor the z80 is only 1.7 times slower. By the way, this remarkable division routine was written by a programmer from Russia. Thus, a ZX Spectrum with a z80 at 3.5 MHz is about 1.5 times faster than a C64 with a 6502 at 1 MHz. It should also be noted that in most systems with a z80 or 6502, some clock cycles are taken from the processor by the video signal generation circuitry; for example, because of this, the popular Amstrad CPC/PCW computers have an effective processor frequency of 3.2 MHz, not the full 4.
On 6502 systems, the screen can usually be turned off for maximum processor performance. If we take the memory frequency as the basis rather than the processor frequency, the z80 turns out to be 25 to 40% faster than the 6502. This can be illustrated by the fact that with 2 MHz memory a z80 can run at up to 6 MHz, while a 6502 can run only at up to 2 MHz.
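The frequency comparisons above come down to simple division. Here is a minimal sketch, assuming the author's estimate that the 6502 does about 2.2 to 2.6 times more work per clock than the z80. The helper name is hypothetical.

```python
def z80_as_6502_mhz(z80_mhz: float, ratio: float) -> float:
    """Clock a 6502 would need to match a z80 running at z80_mhz, given the per-clock ratio."""
    return z80_mhz / ratio


# ZX Spectrum (z80 at 3.5 MHz) versus C64 (6502 at about 1 MHz)
for ratio in (2.6, 2.4, 2.2):
    print(f"ratio {ratio}: Spectrum ~ 6502 at {z80_as_6502_mhz(3.5, ratio):.2f} MHz")

# The same 2 MHz memory: z80 at up to 6 MHz, 6502 at up to 2 MHz
for ratio in (2.4, 2.2):
    gain = z80_as_6502_mhz(6.0, ratio) / 2.0 - 1
    print(f"ratio {ratio}: z80 ahead by {gain:.0%}")
```

With ratios of 2.4 and 2.2, the memory-bound comparison gives about 25% and 36%, in line with the 25 to 40% quoted. The Spectrum lands at roughly 1.35 to 1.6 times a 1 MHz C64.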

The z80 was used in a very large number of computer systems. In the USA, the Tandy TRS-80 was very popular, and in Europe the ZX Spectrum, and later the Amstrad CPC and PCW. Interestingly, the Amstrad PCW computers remained relevant until the mid-90s and were widely used for their intended purpose until the late 90s. Japan and other countries produced MSX computers, which were quite successful around the world. The rather popular C128 could also use the z80, but here users were in for an embarrassment: in this 8-bit computer, released in late 1985, the z80 is officially clocked at 2 MHz but actually runs at only 1.6 MHz. That is slower even than the first 8080-based systems of the mid-70s. The range of computers that run the CP/M operating system includes at least three dozen fairly well-known systems.

Such a PC looked decent even in the mid-90's, but its z80 is slower than that in the ZX Spectrum

The fastest z80-based computer system known to me is the BBC Micro with a 6 MHz Z80B second processor attached through the Tube interface. It was produced from 1984. The processor in this system runs at full speed, "without brakes", so to speak.

In the second half of the 80s, work began on optimizing the z80's cycles and adding new instructions to it. In Japan, the R800 processor was made for MSX TurboR systems. It was compatible with the z80 but had no delays in its cycles. In a sense, the R800 managed to execute the z80 instruction set with 6502 timings. The R800 also added hardware 16-bit multiplication with a 32-bit result. However, when multiplying by a 16-bit constant, table multiplication with a 768-byte table is one clock faster. Surprisingly, no one tried to use this fantastic processor of the late 80s, comparable in speed to the 80386, in European or American systems. In the 21st century, z80 clones with timings almost like those of the R800 are produced. They are used in various equipment, in particular in network cards.
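The remark about table multiplication can be read as follows. This is my interpretation, not a description of any specific routine. For a fixed 16-bit constant C, precompute b·C for every byte b as a 3-byte (24-bit) entry, which takes 256 × 3 = 768 bytes. The product x·C for a 16-bit x then needs only two lookups and one shifted addition. The names below are hypothetical.

```python
def build_table(constant: int) -> list[int]:
    """b * constant for every byte b; each entry fits in 24 bits (3 bytes)."""
    assert 0 <= constant <= 0xFFFF
    table = [b * constant for b in range(256)]
    assert all(entry < 1 << 24 for entry in table)  # 256 entries x 3 bytes = 768 bytes
    return table


def mul_by_constant(x: int, table: list[int]) -> int:
    """16-bit x times the table's constant, giving a 32-bit product."""
    low_part = table[x & 0xFF]       # low byte of x times C
    high_part = table[x >> 8] << 8   # high byte of x times C, shifted into place
    return low_part + high_part


table = build_table(0x1234)
print(hex(mul_by_constant(0xABCD, table)))
```

On a z80-class CPU the shift by 8 costs nothing: it only means adding the second entry one byte higher.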

Zilog worked on improving the Z80 very inconsistently and extremely slowly. The first Z80 ran at frequencies up to 2.5 MHz. The Z80A, which appeared soon after, had a limit of 4 MHz, and these processors became the basis for most popular Z80 computers. The Z80B appeared by 1980 but was used relatively rarely, for example in the BBC Micro second processor card mentioned above or in the late (1989) Sam Coupé computer. The Z80H appeared by the mid-80s and could run at up to 8 MHz. It was not used in well-known computers.

A deeper upgrade of the z80 was hampered by Zilog's desire to create processors that could compete with Intel's 16-bit processors. In 1978, a little later than the 8086, the Z8000 was released. It was not compatible with the z80. This processor could not withstand the competition from Intel and especially Motorola: the 68000 surpassed the Z8000 in almost every parameter. Still, the Z8000 was used in about a dozen different low-cost systems, usually running Unix variants. Interestingly, IBM did not even consider the Z8000 as a possible processor for the IBM PC, since Zilog was funded by Exxon, which intended to compete with IBM. Perhaps because of the Z8000's lack of success, Zilog became an Exxon subsidiary by 1980. There was also an attempt to create a competitive 32-bit processor: the Z80000, compatible with the Z8000, appeared in 1986 but was never used anywhere.

Creating a new processor based on the Z80 was postponed until 1985, when the Z800 was made. However, Zilog's main efforts were then directed at the Z80000, and very few Z800s were released. In 1986, after the failure of the Z80000, Zilog released the Z280, an insignificantly improved version of the Z800. Among other things, it could run internally at a frequency several times higher than the bus frequency, an idea that later brought great success to the Intel 486DX2 and 486DX4. But this processor also found no use anywhere, perhaps because of poor performance: despite many technological innovations, the Z280 could use only relatively low clock frequencies. The Z280 is considered to have roughly matched the capabilities of the Intel 80286, but to have been significantly slower, by at least 50%, at the same clock speed. Perhaps, had the Z280 appeared 5 years earlier, it could have been very successful.

The greatest success came from cooperation with the Japanese company Hitachi, which in 1985 released its super-Z80, the HD64180. It was similar in capabilities to the Intel 80186, could use 512 KB of memory and added a dozen new instructions, but it did not support some almost standard undocumented Z80 instructions. Zilog obtained a license for the HD64180 and began producing it under the name Z64180. Zilog managed to improve this processor slightly, in particular by adding support for 1 MB of memory, and released it by the end of 1986. The new processor was called the Z180 and became the basis for a family of processors and controllers with clock frequencies up to 33 MHz. It was used in some rare MSX2 computers, but more as a controller. It is a curious coincidence that the Z280 and Z180 appeared in the same year, just as their approximate counterparts, the 80286 and 80186, had four years earlier. In 1994, the 32-bit Z380 was made on the basis of the Z180. It retained compatibility with the z80 and roughly corresponds to the capabilities of the Intel 80386 or Motorola 68020, which shows that Zilog lagged almost 10 years behind its competitors. In the 21st century, again on the basis of the Z180, the successful eZ80 controller-processors have been manufactured, with timings almost like those of the R800. They are used in various equipment, in particular in network cards, DVD drives, calculators, ...

Emotional stories about first processors for computers: part 3 (Motorola 68k)

Motorola: from 68000 to 68040

For some time, Motorola was the only company that could successfully compete with Intel in making processors for personal computers.

The 68000 was released in 1979 and at first glance looked much more impressive than the 8086. It could address 16 MB of memory directly, which imposed no restrictions, for example, on large arrays. However, a careful analysis of the 68000's features shows that not everything is as good as it seems. In those years, having more than 1 MB of memory was an unattainable luxury even for medium-sized organizations. The 68000's code density is worse than the 8086's, which means that 68000 code with the same functionality takes up more space. This is partly because 68k instruction lengths must be multiples of 2 bytes, while x86 instructions can be any whole number of bytes long. But the information about code density is controversial: there is some evidence that in certain cases the 68000 could have better code density. Of its 16 registers, 8 are address registers, which in some respects are slightly more advanced analogues of the x86 segment registers. The ALU and data bus are 16-bit, so operations on 32-bit data are slower than one might expect. A register-register operation takes 4 cycles, versus only 2 on the 8086.

As always with Motorola products, the architecture of the 68000 shows some clumsiness and contrived oddities. For example, there are two stacks and two carry flags (one for condition checks and another for arithmetic operations). Some operations are irritatingly unoptimized. For example, the CLR instruction, which writes zero to memory, is slower than writing a constant 0 to memory with the MOVE instruction, and a left shift is slower than adding an operand to itself. There are some almost unnecessary instructions, for example both arithmetic and logical left shifts. Even the address registers, while seemingly superior to the 8086 segment registers, have a number of annoying disadvantages. For example, loading one requires as many as 4 bytes instead of 2 on the 8086, and one of those four bytes is superfluous.
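The point about the two left shifts can be checked with a small model. This is a simplified sketch that assumes the 68000 flag rules as given in Motorola's programmer's reference. LSL and ASL produce the same result and the same carry, and differ only in V: ASL sets V if the sign bit changes at any point during the shift, while LSL always clears it. Adding a register to itself matches a one-bit ASL, flags included. Only word size and the C and V flags are modelled.

```python
MASK = 0xFFFF
SIGN = 0x8000


def lsl_w(x: int, count: int) -> tuple[int, int, int]:
    """LSL.W: returns (result, C, V); V is always cleared."""
    carry = 0
    for _ in range(count):
        carry = (x & SIGN) >> 15
        x = (x << 1) & MASK
    return x, carry, 0


def asl_w(x: int, count: int) -> tuple[int, int, int]:
    """ASL.W: same result and C as LSL, but V records any change of the sign bit."""
    carry = overflow = 0
    for _ in range(count):
        old_sign = x & SIGN
        carry = old_sign >> 15
        x = (x << 1) & MASK
        if (x & SIGN) != old_sign:
            overflow = 1
    return x, carry, overflow


def add_self_w(x: int) -> tuple[int, int, int]:
    """ADD.W Dn,Dn: doubling with unsigned carry and signed overflow."""
    result = (x + x) & MASK
    carry = (x + x) >> 16
    overflow = int((x & SIGN) != (result & SIGN))  # both operands share x's sign
    return result, carry, overflow


print(lsl_w(0x4000, 1), asl_w(0x4000, 1), add_self_w(0x4000))
# (32768, 0, 0) (32768, 0, 1) (32768, 0, 1)
```

So the only reason to have both shifts is the V flag, which is rarely needed after a shift.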

Motorola code looks somewhat more cumbersome and clumsy than x86 or ARM code. On the other hand, the 68000 is faster than the 8086, by about 20 to 30% according to my estimates. The 680x0 code, however, has a special beauty and elegance and is less mechanical than x86 code. Additionally, as experts at eab.abime.net have shown, the code density of 68k is often better than that of x86.

Overall, the 68000 is a good processor with a large instruction set. It was used in many of the now legendary personal computers: the first Apple Macintosh computers, which were produced until the early 90s, the first Commodore Amiga multimedia computers, and the relatively inexpensive, high-quality Atari ST computers. The 68000 was also used in relatively inexpensive computers running Unix variants, in particular the rather popular Tandy 16B. Interestingly, IBM, at the same time as it was developing the PC, was developing the 68000-based System 9000 computer, which was released less than a year after the PC.

The 68010 appeared clearly late, only in 1982, the same year Intel released the 80286, which raised personal computers to the level of minicomputers. The 68010 is pin-compatible with the 68000, but its instruction set is slightly different, so replacing a 68000 with a 68010 never became popular. The incompatibility had a contrived cause: bringing the 68000 closer to the ideal theory of virtualization (for example, MOVE from SR became a privileged instruction). The 68010 is only slightly faster than the 68000, by no more than 10%. Obviously, the 68010 lost badly to the 80286 and was even weaker than the 80186, which appeared in the same year. Like the 80186, the 68010 almost never found use in personal computers.

The 68008 was also released in 1982, probably in the hope of repeating the success of the 8088. It is a 68000 with an 8-bit data bus, which allowed it to be used in cheaper systems. But the 68008, like the 68000, has no instruction queue, which makes it about 50% slower than the 68000. Thus, the 68008 may even be a little slower than the 8088, which, thanks to its instruction queue, is only about 20% slower than the 8086.

Based on it, Sir Clive Sinclair made the Sinclair QL, a very interesting computer that, thanks to its lower price, could compete with the Atari ST and similar computers. But Clive, in parallel and clearly prematurely, began investing heavily in the development of electric vehicles and left the QL (Quantum Leap) as a rather secondary task. Together with some unsuccessful design decisions, this led the computer and the whole company to a premature end (the company became part of Amstrad, which refused to produce the QL).

It would be interesting to calculate the bit index of the 68000. It seems to me to be clearly higher than 16, though probably not higher than 24.

Appearing in 1984, the 68020 once again put Motorola in the lead. Many very interesting and promising innovations were realized in this processor. The strongest effect certainly comes from the instruction pipeline, which sometimes allows up to three instructions to execute at once! The 32-bit address bus looked a little premature in those years, so a cheaper version of the processor (the 68EC020) with a 24-bit bus was available. But the 32-bit data bus was quite appropriate and sped the processor up significantly. The built-in cache was an innovation, even though it held only 256 bytes. It improved performance significantly, because the main dynamic memory could not keep up with the processor. Fairly fast division (64/32 = 32,32) and multiplication (32*32 = 64) instructions were added, taking approximately 80 and up to 45 cycles respectively. Instruction timings generally improved: for example, division (32/16 = 16,16) took approximately 45 cycles (more than 140 cycles on the 68000). In the most favorable cases, some instructions can execute without taking any clock cycles at all! New addressing modes were added, in particular with scaling, which appeared on x86 only a year later, in the 80386. Other new addressing modes allow double indirect addressing with several offsets, in which the PDP-11 was remarkably outdone.
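The new addressing modes can be summarised as address arithmetic. Below is a simplified sketch, assuming the usual 68020 definitions. Indexed mode with scaling computes base + displacement + index·scale. Memory-indirect postindexed ([bd,An],Xn*scale,od) first fetches a 32-bit pointer from An + bd and then adds the scaled index and the outer displacement. Preindexed ([bd,An,Xn*scale],od) adds the index before the fetch. Memory is a Python dict, index-register sizing (word or long) is ignored, and the function names are hypothetical.

```python
SCALES = (1, 2, 4, 8)
MASK32 = 0xFFFFFFFF


def ea_indexed(an: int, xn: int, scale: int, disp: int) -> int:
    """(d8,An,Xn*scale): base register + displacement + scaled index."""
    assert scale in SCALES
    return (an + disp + xn * scale) & MASK32


def ea_memory_postindexed(mem: dict[int, int], an: int, bd: int,
                          xn: int, scale: int, od: int) -> int:
    """([bd,An],Xn*scale,od): fetch a pointer, then index from it."""
    assert scale in SCALES
    pointer = mem[(an + bd) & MASK32]
    return (pointer + xn * scale + od) & MASK32


def ea_memory_preindexed(mem: dict[int, int], an: int, bd: int,
                         xn: int, scale: int, od: int) -> int:
    """([bd,An,Xn*scale],od): index first, then fetch the pointer."""
    assert scale in SCALES
    pointer = mem[(an + bd + xn * scale) & MASK32]
    return (pointer + od) & MASK32


# A table of two 32-bit pointers at 0x1000
memory = {0x1000: 0x8000, 0x1004: 0x9000}
print(hex(ea_indexed(0x1000, 3, 4, 8)))                        # 0x1014
print(hex(ea_memory_postindexed(memory, 0x1000, 0, 3, 4, 0)))  # 0x800c
print(hex(ea_memory_preindexed(memory, 0x1000, 0, 1, 4, 2)))   # 0x9002
```

Each memory-indirect form hides an extra memory read, which is one reason these modes run slowly.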

But some of the new instructions, for example the bulky bit-field operations or the new decimal operations (of little use once fast division and multiplication were available), looked more like a fifth wheel than something genuinely useful. Addressing modes with double indirection look interesting in theory, but in practice they are needed quite rarely and execute very slowly. Unlike the 80286, the 68020 spends time computing the operand address, the so-called effective address. Division on the 68020 is still almost twice as slow as the fantastic division of the 80286. Multiplication and some other operations are also slower. The 68020 has no built-in memory management unit, and the rather exotic ability to connect up to eight coprocessors could not make up for this.

The 68020 was widely used in the mass-market Apple Macintosh II, Macintosh LC and Commodore Amiga 1200 computers. It was also used in several Unix systems.

The appearance of the 80386, with its built-in, very well-made MMU and its 32-bit buses and registers, again pushed Motorola into second place. The 68030, which appeared in 1987, briefly returned the lead to Motorola for the last time. The 68030 has a built-in memory management unit and a doubled cache, divided into separate instruction and data caches, which was a very promising novelty. In addition, the 68030 could use a faster memory access interface, which can speed up memory operations by almost a third. Despite all these innovations, the 68030 turned out to be somewhat slower than the 80386 at the same frequency. However, the 68030 was available at frequencies up to 50 MHz and the 80386 only up to 40 MHz, which made top 68030 systems slightly faster.

68030 was used in computers of the Apple Macintosh II series, Commodore Amiga 3000, Atari TT, Atari Falcon and some others.

With the 68040, Motorola once again tried to outperform Intel. This processor appeared a year after the 80486, but it never managed to surpass it in overall useful qualities. In fact, Motorola, with its more overloaded instruction set, could not keep up and in a sense dropped out of the race. Only a very truncated floating-point coprocessor could be fitted into the 68040, and the chip itself ran significantly hotter than the 80486. The 68040 found almost no use in popular computers. Only its cheaper version, the 68LC040, which has no built-in coprocessor, saw some noticeable use. However, the first versions of this chip had a serious hardware defect that made even software emulation of the coprocessor impossible!

Motorola always had problems with math coprocessors. Motorola never released such a coprocessor for the 68000/68010, while Intel had been shipping its very successful 8087 since 1980. Even the 68881 and its successor, the 68882, had a catch: to get a significant performance boost from the 68882, code had to be compiled differently than for the 68881.

It is worth noting that Intel still has problems with its math coprocessor: some functions, for example the sine of certain arguments, are computed with very low accuracy, sometimes no more than 4 correct digits. Therefore, modern compilers often compute such functions without using the coprocessor.
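Much of this inaccuracy comes from argument reduction. Here is a minimal sketch, assuming (as is commonly cited for the x87) that the FPU reduces arguments using an internal π with 66 significant bits. For x equal to the double closest to π, sin(x) ≈ π − x, a tiny number, so any error in the stored π becomes a large relative error. Only the reduction step is modelled, not the FPU hardware, and the helper names are hypothetical.

```python
from fractions import Fraction
import math

# pi to 50 decimal places: precise enough to serve as the "true" value here
PI_EXACT = Fraction("3.14159265358979323846264338327950288419716939937510")


def pi_with_bits(bits: int) -> Fraction:
    """pi rounded to `bits` significant binary digits (pi lies in [2, 4))."""
    scale = 2 ** (bits - 2)  # the leading bit of pi is worth 2**1
    return Fraction(round(PI_EXACT * scale), scale)


def sin_near_pi(x: Fraction, pi_approx: Fraction) -> Fraction:
    """sin(x) for x just below pi, via sin(x) = sin(pi - x) ~ pi - x."""
    return pi_approx - x  # the error of sin(d) ~ d is about d**3 / 6, negligible here


x = Fraction(math.pi)  # the double closest to pi, taken exactly
exact = sin_near_pi(x, PI_EXACT)
reduced_66 = sin_near_pi(x, pi_with_bits(66))
relative_error = float(abs(reduced_66 - exact) / exact)
print(float(exact), float(reduced_66), relative_error)
```

Under this assumption the relative error comes out at roughly 3e-5, leaving only about four correct digits of an 18-digit-capable format.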

Emotional stories about first processors for computers: part 2 (DEC PDP-11)

Processors of DEC PDP-11

In the early 70s, a 10-year era of DEC's dominance began. DEC computers were significantly cheaper than IBM's and therefore attracted small organizations for which IBM systems were unaffordable. These computers also mark the beginning of the era of mass professional programming. The PDP-11 series was very successful. Various PDP-11 models were produced from the early 70s to the early 90s. They were successfully cloned in the SU and became the first mass-market computer systems there. Some SU-made PDP-11-compatible computers have several unique traits. For example, several models like the DVK are personal computers rather than minicomputers, and several models like the UKNC and BK are pure personal computers. By the way, the BK became the first PC that ordinary people in the SU could buy, from 1985.

DEC also promoted the more expensive and complex computers of the VAX-11 family, around which the situation was somewhat politicized. From the second half of the 70s, DEC practically stopped developing the PDP-11 line. In particular, support for hexadecimal numbers was never introduced into its assembler. The performance of PDP-11 systems also remained virtually unchanged from the mid-70s.

PDP-11 computers used various processors compatible with the base instruction set, for example the LSI-11, F-11 and J-11. In the late 70s, DEC made the cheap T-11 processor for microcomputers. However, for unclear reasons, computer manufacturers ignored it, despite the seemingly large body of high-quality software that could eventually have been ported to systems using it. The only exception was one model of Atari gaming machine. The T-11 found mass use only in embedded equipment, although its capabilities were slightly above those of the z80. The SU produced the K1801VM1, K1801VM2, K1801VM3, ... processors, similar to DEC's processors, and also exact copies of DEC processors. The latter were much more expensive and were produced in small quantities.

The PDP-11 instruction set is almost completely orthogonal. This is a pleasant quality, but taken to the extreme it can produce ridiculous instructions. The PDP-11 instruction set influenced many architectures, in particular the Motorola 68000.

The PDP-11 instruction set is strictly 16-bit. All 8 general-purpose registers are 16-bit (in this architecture the program counter is the ordinary register R7). The processor status word, which holds the usual flags, is 16-bit too, and instructions are 1 to 3 16-bit words long. Any operand of an instruction can be of any type (with a few exceptions, for example the XOR instruction): this is orthogonality. The types include registers and memory locations. SU programmers in the 80s sometimes did not understand why the Intel x86 instruction set lacks memory-to-memory instructions. This was the influence of the PDP-11 school, where you can easily write a full address for each operand. This is indeed slow, especially on systems with the relatively slow RAM typical since the early 90s. A memory address can be formed from a register, a register with an offset, or a register with autoincrement or autodecrement. A particular feature of the PDP-11 instruction set is the ability to use double indirect memory access through a register. For example, MOV @(R0)+,@-(R1) means the same as the C/C++ statement **--r1 = **r0++;, where r0 and r1 are declared as signed short **r0, **r1;.
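The orthogonality is visible in the instruction encoding, which is also why PDP-11 programmers think in octal. A double-operand instruction word is a 4-bit opcode followed by two 6-bit operand fields, each made of a 3-bit mode and a 3-bit register. Here is a minimal decoder sketch (hypothetical names; only MOV and MOVB are listed, and the special meanings of modes with R7 are ignored):

```python
MODE_SYNTAX = {
    0: "R{r}", 1: "(R{r})", 2: "(R{r})+", 3: "@(R{r})+",
    4: "-(R{r})", 5: "@-(R{r})", 6: "X(R{r})", 7: "@X(R{r})",
}
OPCODES = {0o01: "MOV", 0o11: "MOVB"}


def operand(field: int) -> str:
    """Render a 6-bit operand field: high 3 bits = mode, low 3 bits = register."""
    mode, reg = field >> 3, field & 0o7
    return MODE_SYNTAX[mode].format(r=reg)


def decode_double_operand(word: int) -> str:
    """Decode a MOV/MOVB word, laid out in octal as OOSSDD."""
    opcode = word >> 12        # top 4 bits: 01 = MOV, 11 = MOVB (bit 15 = byte op)
    src = (word >> 6) & 0o77   # SS: source mode and register
    dst = word & 0o77          # DD: destination mode and register
    return f"{OPCODES[opcode]} {operand(src)},{operand(dst)}"


print(decode_double_operand(0o010001))  # MOV R0,R1
print(decode_double_operand(0o013051))  # MOV @(R0)+,@-(R1)
print(decode_double_operand(0o117273))  # MOVB @X(R2),@X(R3), index words follow
```

Because every field is a whole number of octal digits, the addressing modes can be read straight off an octal dump.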

Another example: the instruction MOVB @11(R2),@-20(R3) corresponds to **(r3-20) = **(r2+11);, where r2 and r3 are declared as char **r2, **r3; and the offsets are counted in bytes rather than in pointer-sized steps.
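The two deferred modes of the first example can be modelled with a simple memory map. This is a simplified Python sketch, assuming 16-bit words, byte addresses that step by 2 for word operations, and no alignment or bus-error checks. The helper names are hypothetical. It executes MOV @(R0)+,@-(R1).

```python
WORD = 0xFFFF


def ea_autoinc_deferred(mem: dict[int, int], regs: list[int], r: int) -> int:
    """@(Rn)+: Rn points at a pointer; use that pointer, then advance Rn by one word."""
    pointer_addr = regs[r]
    regs[r] = (regs[r] + 2) & WORD
    return mem[pointer_addr]


def ea_autodec_deferred(mem: dict[int, int], regs: list[int], r: int) -> int:
    """@-(Rn): step Rn back one word first, then Rn points at a pointer."""
    regs[r] = (regs[r] - 2) & WORD
    return mem[regs[r]]


# Addresses are octal, as in DEC documentation.
memory = {
    0o1000: 0o2000,  # pointer to the source word
    0o2000: 1234,    # the source word itself
    0o3000: 0o4000,  # pointer to the destination word
    0o4000: 0,       # the destination word
}
regs = [0] * 8
regs[0] = 0o1000     # R0 -> the source pointer
regs[1] = 0o3002     # R1 -> one word past the destination pointer

# MOV @(R0)+,@-(R1): the source operand is evaluated before the destination
src = ea_autoinc_deferred(memory, regs, 0)
dst = ea_autodec_deferred(memory, regs, 1)
memory[dst] = memory[src]

print(memory[0o4000], oct(regs[0]), oct(regs[1]))  # 1234 0o1002 0o3000
```

One instruction performs four memory reads besides the instruction fetch (two pointers and the source word, plus the write), which is why such forms take so many clocks.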

In today's popular architectures, a single instruction is not enough for such cases: it may take up to 10 instructions. It is also possible to form an address relative to the current value of the program counter. Here is another example with simpler addressing: the x86 instruction ADD [BX+11],16 corresponds to ADD #16,11(R4). DEC assemblers write the source operand first and the destination second, whereas Intel's convention is the reverse. There is reason to believe that the GNU assembler for x86 was made under the influence of the PDP-11 assembler.

Division and multiplication instructions are signed only and are not available on all processors. Decimal arithmetic is optional too; in DEC terminology it is called commercial arithmetic. As a curiosity of full orthogonality, consider the instruction MOV #11,#22, which after execution turns into MOV #11,#11: an example of using an immediate constant as the destination operand. Another curiosity is the unique MARK instruction, whose code must be placed on the stack and which can never be used explicitly. Calling subroutines on the PDP-11 is also somewhat peculiar. The call instruction (JSR) first saves the chosen register (any register can be used) on the stack, then copies the program counter into that register, and only then writes the new value to the program counter. The return instruction (RTS) must do the reverse and must know which register was used in the call. Using the program counter as an ordinary register can sometimes produce strange effects.
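Both curiosities follow from the program counter being an ordinary register. Immediate mode #n is really (PC)+, so in MOV #11,#22 the source fetch moves PC onto the word holding 22, and the destination (PC)+ then writes 11 into that very word. Here is a minimal Python sketch of just these mechanisms, followed by the JSR/RTS linkage described above. It is a hypothetical toy model with word addresses stepping by 2, not a PDP-11 emulator.

```python
PC, SP = 7, 6
WORD = 0xFFFF


def autoinc(regs: list[int], r: int) -> int:
    """Mode (Rn)+: return the address held in Rn, then advance Rn by one word."""
    addr = regs[r]
    regs[r] = (addr + 2) & WORD
    return addr


def mov_imm_imm(mem: dict[int, int], regs: list[int]) -> None:
    """MOV #a,#b, i.e. MOV (PC)+,(PC)+, with PC already past the opcode word."""
    src = autoinc(regs, PC)  # address of the word holding a
    dst = autoinc(regs, PC)  # address of the word holding b
    mem[dst] = mem[src]      # so b is overwritten with a


def jsr(mem: dict[int, int], regs: list[int], r: int, target: int) -> None:
    """JSR Rn,target: push Rn, copy the return address (PC) into Rn, jump."""
    regs[SP] = (regs[SP] - 2) & WORD
    mem[regs[SP]] = regs[r]
    regs[r] = regs[PC]
    regs[PC] = target


def rts(mem: dict[int, int], regs: list[int], r: int) -> None:
    """RTS Rn: return to the address in Rn, then restore Rn from the stack."""
    regs[PC] = regs[r]
    regs[r] = mem[regs[SP]]
    regs[SP] = (regs[SP] + 2) & WORD


# MOV #11,#22 at octal address 1000: opcode 012727, then the words 11 and 22
mem = {0o1000: 0o012727, 0o1002: 0o11, 0o1004: 0o22}
regs = [0] * 8
regs[PC] = 0o1002                        # the opcode word has been fetched
mov_imm_imm(mem, regs)
print(oct(mem[0o1004]), oct(regs[PC]))   # 0o11 0o1006: now reads MOV #11,#11

# JSR R5 / RTS R5 round trip; PC already points past the JSR instruction
regs[SP], regs[5], regs[PC] = 0o600, 0o777, 0o2004
jsr(mem, regs, 5, 0o3000)                # R5 now holds the return address 2004
rts(mem, regs, 5)                        # PC back to 2004, R5 restored to 777
print(oct(regs[PC]), oct(regs[5]), oct(regs[SP]))  # 0o2004 0o777 0o600
```

Keeping the return address in a register lets a subroutine read inline arguments placed after the JSR with (R5)+, a common PDP-11 idiom.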

Interestingly, PDP-11 programmers developed a culture of working directly with machine code. They could, for example, debug without a disassembler, or even write small programs directly into memory without assembling them!

Indeed, instruction timings are not very fast. It was surprising to find out that on the BK home computer a register-to-register move takes as many as 12 clocks (10 clocks when running code from ROM), and two-operand instructions with double indirect addressing take more than 100 clocks. The z80 needs only 8 clocks for a 16-bit register transfer. However, the slowness of the BK is caused not so much by the processor as by the poor quality of SU-made RAM, whose peculiarities the BK had to be adapted to. With fast enough memory, the BK would also move 16 bits between registers in 8 clock cycles. There was once much controversy over which is faster, the BK or the Sinclair ZX Spectrum. I must say that the Spectrum is one of the fastest mass-market 8-bit personal computers when using the top 32 KB of memory. So it is not surprising that the Spectrum is faster than the BK, but not by much. And if the BK had worked with fast enough memory, it could have been even a bit faster.

Code density is also a rather weak point of the PDP-11 architecture. Instruction lengths must be multiples of the machine word, 2 bytes, which is especially frustrating when working with byte arguments or with simple instructions like setting or clearing a flag.

There were interesting attempts to build a personal computer on the PDP-11 architecture. One of the first PCs in the world was the Terak 8510/a, which appeared only a little later than the Apple ][ and Commodore PET and probably a little earlier than the Tandy TRS-80. It had black-and-white graphics and could load an incomplete variant of Unix. This computer was quite expensive and, as far as I know, was used only in higher education in the USA. DEC itself also tried to make its own PC, but very inconsistently. For example, DEC produced PCs based on the z80 and 8088, plainly competing against its own main developments. The PDP-11-based DEC PRO-325/350/380 PCs had some rather contrived incompatibilities with the underlying architecture that hindered the use of some software. Minicomputer technology was turned into personal computers most successfully in the USSR, which produced the BK, DVK, UKNC, ... By the way, the Electronica-85 is a quite accurate clone of the DEC PRO-350.

A USSR-made 16-bit home computer (1987 model); it is almost PDP-11 compatible

The K1801VM2 processor used in the DVK is about twice as fast as the K1801VM1, and the K1801VM3 is faster still, with performance close to the Intel 8086.

The processors of top PDP-11 computers can address up to 4 MB of memory, but no more than 64 KB can be allocated to a single program. The performance of these processors per megahertz is also close to that of the Intel 8086, though still somewhat lower.

Emotional stories about first processors for computers: part 1 (Intel x86)

Intel: from 8086 to 80486

Of course, one of the best processors made in the 70s is the 8086, together with its cheaper near-twin, the 8088. The architecture of these processors is pleasantly free of mechanical borrowings and of adherence to abstract theories. It is thoughtful and balanced, consistent, and oriented toward further development. Among the drawbacks of the x86 architecture are a certain bulkiness and a tendency toward an ever-growing number of instructions.

One of the brilliant design decisions of the 8086 was the invention of segment registers. They achieved two goals at once: the "free" ability to relocate program code up to 64 KB in size (a decent amount of memory for one program until the mid-80s), and the ability to address up to 1 MB. Note also that the 8086, like the 8080 and z80, has a separate address space for I/O ports, 64 K ports in size (only 256 ports on the 8080 and 8085). There are only four segment registers: one for code, one for the stack, and two for data. Thus 64 × 4 = 256 KB of memory is available for fast access, which was a lot even in the mid-80s. In fact, code size is not a problem, since long subroutine calls can be used, which load and save a full address made of two registers. The only limit is 64 KB for the size of a single subroutine, which is enough even for many modern applications. A certain problem is that data arrays larger than 64 KB cannot be addressed quickly: with such arrays, a segment register and the address itself must be loaded on every access, which slows work with them several times.
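The segment mechanism is just a shift and an add: on the 8086, the physical address is segment × 16 + offset, truncated to 20 bits. This short Python sketch (the function name is hypothetical) also shows why many segment:offset pairs refer to the same byte. It also shows why relocating a program only requires a different segment value: the offsets inside the program stay the same.

```python
def physical_8086(segment: int, offset: int) -> int:
    """20-bit physical address from a 16-bit segment and a 16-bit offset (wraps at 1 MB)."""
    assert 0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF
    return ((segment << 4) + offset) & 0xFFFFF


print(hex(physical_8086(0x1234, 0x0010)))  # 0x12350
print(hex(physical_8086(0x1235, 0x0000)))  # 0x12350, the same byte
print(hex(physical_8086(0xFFFF, 0x0010)))  # 0x0: wraps around on the 8086
```

Each segment register thus opens a 64 KB window that can start at any 16-byte boundary in the 1 MB space.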

The segment registers are implemented in such a way that their presence is almost invisible in the machine code, so when the time came it was easy to abandon them.

The architecture of the 8086 stayed close to that of the 8080, which made it possible to port programs from the 8080 to the 8086 with relatively little effort, especially when the source code was available.

The 8086's instructions are not very fast, but they are comparable to those of its competitors, for example the Motorola 68000, which appeared a year later. One of the innovations that somewhat sped up the rather slow 8086 was the instruction queue.

The 8086 has eight 16-bit general-purpose registers. Some of them can be used as two one-byte registers, and some as index registers. The 8086 registers are therefore somewhat heterogeneous, but the set is well balanced and very convenient to use. This heterogeneity, by the way, allows denser code. The 8086 uses the same flags as the 8080, plus a few new ones. For example, it gained a flag typical of the PDP-11 architecture: single-step execution.

The 8086 offers very interesting addressing modes. For example, an address can be the sum of two registers and a constant 16-bit offset, with the value of one of the segment registers applied on top. Any two of these summands, or even just one, may be used. A single PDP-11 instruction cannot do this. Most 8086 instructions do not allow both operands to be in memory: one of the operands must be a register. But there are string instructions that work with memory directly at addresses held in registers. String instructions allow fast block copying (17 cycles per byte or word), searching, filling, loading and comparing. In addition, string instructions can be used with I/O ports, although the string I/O instructions INS and OUTS appeared only with the 80186. The idea of 8086 instruction prefixes is very interesting: they add often very useful functionality without significantly complicating the instruction encoding.
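Here is a sketch of the 8086 effective-address rule described above. It assumes the standard 8086 scheme: an optional base (BX or BP), an optional index (SI or DI) and a displacement, summed modulo 64 KB, with BP-based forms defaulting to the stack segment SS and the rest to DS. The function name is hypothetical. The real encoding cannot express BP alone without a displacement (that pattern means a direct address), a detail this sketch ignores.

```python
from typing import Optional

BASES = ("BX", "BP")
INDEXES = ("SI", "DI")


def effective_address(regs: dict[str, int], base: Optional[str] = None,
                      index: Optional[str] = None, disp: int = 0) -> tuple[str, int]:
    """8086-style effective address: optional base + optional index + displacement.

    Returns (default_segment, 16-bit offset).
    """
    assert base in (None, *BASES) and index in (None, *INDEXES)
    offset = disp
    if base:
        offset += regs[base]
    if index:
        offset += regs[index]
    segment = "SS" if base == "BP" else "DS"
    return segment, offset & 0xFFFF


regs = {"BX": 0x1000, "BP": 0xFFF0, "SI": 0x0020, "DI": 0x0002}
seg, off = effective_address(regs, "BX", "SI", 0x10)
print(seg, hex(off))  # DS 0x1030
seg, off = effective_address(regs, "BP", "DI", 0x20)
print(seg, hex(off))  # SS 0x12: the offset wraps within the segment
seg, off = effective_address(regs, disp=0x0500)
print(seg, hex(off))  # DS 0x500: a plain direct address
```

The physical address is then formed from the chosen segment register exactly as for any other 8086 access.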

The 8086 has one of the best stack designs of any computer system. Using only two registers (BP and SP), the 8086 handles everything needed to organize subroutine calls with parameters.

The instruction set includes signed and unsigned multiplication and division. There are even unique decimal-adjustment instructions for multiplication and division (AAM and AAD). It is hard to say that anything is clearly missing from the 8086 instruction set; quite the contrary. Dividing a 32-bit dividend by a 16-bit divisor to obtain a 32-bit quotient and a 16-bit remainder may take up to 300 clock cycles. That is not particularly fast, but it is several times faster than such a division on any 8-bit processor (except the 6309) and comparable to the 68000. Division on x86 has one unexpected feature: it leaves all arithmetic flags undefined.
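A single unsigned DIV r16 divides DX:AX by a 16-bit operand but returns only a 16-bit quotient in AX (remainder in DX). It raises a divide error if the quotient does not fit. A 32-bit quotient is usually obtained by chaining two DIVs, which is presumably where a figure of up to about 300 clocks comes from; that reading is my assumption. Here is a Python sketch of the semantics, with hypothetical names:

```python
class DivideError(Exception):
    """Raised where the 8086 would take interrupt 0 (divide error)."""


def div16(dx: int, ax: int, divisor: int) -> tuple[int, int]:
    """Unsigned DIV r16: DX:AX / divisor -> (AX = quotient, DX = remainder)."""
    dividend = (dx << 16) | ax
    if divisor == 0 or dividend // divisor > 0xFFFF:
        raise DivideError("quotient does not fit in 16 bits")
    return dividend // divisor, dividend % divisor


def div32_by16(dividend: int, divisor: int) -> tuple[int, int]:
    """32-bit quotient and 16-bit remainder from two chained DIVs."""
    high, low = dividend >> 16, dividend & 0xFFFF
    q_high, rem = div16(0, high, divisor)  # first DIV: high word, with DX = 0
    q_low, rem = div16(rem, low, divisor)  # second DIV: the remainder goes into DX
    return (q_high << 16) | q_low, rem


print(div32_by16(0x12345678, 0x0003))
```

The second DIV can never overflow, because its DX (the first remainder) is always smaller than the divisor.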

It is worth adding that the x86 architecture improved the XCHG instruction inherited from the 8080. In addition, later processors gained the XADD, CMPXCHG and CMPXCHG8B instructions, which can also perform atomic exchanges of arguments. Such instructions are one of the distinguishing features of x86 and are rarely found on processors of other architectures.
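The following Python sketch shows the semantics of LOCK XADD and LOCK CMPXCHG and the classic retry loop built on CMPXCHG. It is only a model: a mutex stands in for the bus lock, and the class and function names are hypothetical. CMPXCHG8B, the 64-bit variant, is not modelled.

```python
import threading


class LockedWord:
    """One 32-bit memory word; a mutex stands in for the bus LOCK."""

    def __init__(self, value: int = 0) -> None:
        self.value = value
        self._bus = threading.Lock()

    def xadd(self, addend: int) -> int:
        """LOCK XADD: store old + addend and hand the old value back."""
        with self._bus:
            old = self.value
            self.value = (old + addend) & 0xFFFFFFFF
            return old

    def cmpxchg(self, expected: int, new: int) -> tuple[bool, int]:
        """LOCK CMPXCHG: store new only if the word equals expected (EAX).

        Returns (ZF, current value); on failure x86 loads that value into EAX.
        """
        with self._bus:
            current = self.value
            if current == expected:
                self.value = new & 0xFFFFFFFF
                return True, current
            return False, current


def increment_with_cas(word: LockedWord) -> None:
    """Read, compute, try to publish; on conflict retry with the fresh value."""
    expected = word.value
    while True:
        ok, current = word.cmpxchg(expected, expected + 1)
        if ok:
            return
        expected = current


counter = LockedWord()


def worker() -> None:
    for _ in range(1000):
        increment_with_cas(counter)
        counter.xadd(1)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.value)  # 8000: no update is lost
```

XADD suits simple counters. CMPXCHG is more general, because any read-compute-write update can be wrapped in the retry loop.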

In summary, the 8086 is a very good processor that combines ease of programming with a design suited to the memory limits of its time. The 8086 itself was used comparatively rarely. It left to the cheaper 8088 the honor of powering the first mainstream PC, the ancestor of today's personal computers. The 8088 uses an 8-bit data bus, which made it somewhat slower but allowed systems built on it to be more affordable.

Interestingly, Intel on principle refused to improve its existing processors, preferring instead to develop their next generations. One of Intel's largest second sources, the Japanese corporation NEC, which was much larger than Intel in the early 80s, decided to upgrade the 8088 and 8086 and launched the V20 and V30 processors, pin-compatible with them and about 30% faster. NEC even offered Intel the chance to become its second source! Intel instead filed a lawsuit against NEC, which, however, it could not win. For some reason, this big clash between Intel and NEC is still completely ignored by Wikipedia.

The 80186 and 80286 appeared in 1982, so it can be assumed that Intel had two almost independent development teams. The 80186 is an 8086 improved with several new instructions and shorter timings. It also integrates several chips typical of an x86 system: a clock generator, timers, DMA, an interrupt controller, a wait-state generator, etc. Such a processor, it would seem, could greatly simplify the production of computers based on it. But because the embedded interrupt controller was, for some reason, not compatible with the IBM PC, it was almost never used in PCs. The only such machine known to the author is the BBC Master 512, based on the BBC Micro computer, which did not use the built-in circuits, not even a timer, though there were several other systems using the 80186. The 80186 could address 1 MB of memory, the same as the 8086. The Japanese corporation NEC produced 80186 analogues that were compatible with the IBM PC.

The 80286 had even better timings than the 80186. Among them, a simply fantastic division (32/16 = 16,16) in 22 clock cycles stands out; division has not been made faster since! The 80286 supports all the new instructions of the 80186 plus many instructions for a new, protected mode. The 80286 became the first x86 processor with built-in support for protected mode, which made memory protection, proper handling of privileged instructions, and virtual memory possible. Although the new mode created many problems (protected mode was rather unsuccessful) and was relatively rarely used, it was a big breakthrough. In this new mode, the segment registers took on a new role, allowing up to 16 MB of addressable memory and up to 1 GB of virtual memory per task. A big problem with the 80286 was that it could not switch from protected mode back to real mode, in which most programs ran. Using the "secret" undocumented LOADALL instruction, it was possible to use 16 MB of memory while in real mode.
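The 16 MB and 1 GB figures follow from the 286 design, assuming the standard descriptor scheme. 24 address lines give 2^24 bytes. A selector carries a 13-bit descriptor index plus a table-indicator bit that chooses the global or the local descriptor table, and each segment is at most 64 KB, so one task can see 2 × 8192 × 64 KB = 1 GB. Here is a short Python check (names hypothetical), including the selector layout (index, TI, RPL):

```python
SEGMENT_LIMIT = 64 * 1024  # a 286 segment is at most 64 KB
INDEX_BITS = 13            # descriptor index field of a selector
TABLES = 2                 # TI bit: global (GDT) or local (LDT) table
ADDRESS_LINES = 24


def decode_selector(selector: int) -> tuple[int, int, int]:
    """Split a 16-bit selector into (descriptor index, TI, RPL)."""
    return selector >> 3, (selector >> 2) & 1, selector & 3


physical_bytes = 2 ** ADDRESS_LINES
virtual_bytes = TABLES * 2 ** INDEX_BITS * SEGMENT_LIMIT

print(physical_bytes // 2 ** 20, "MB physical")  # 16 MB physical
print(virtual_bytes // 2 ** 30, "GB virtual")    # 1 GB virtual
print(decode_selector(0x000F))                   # (1, 1, 3): entry 1 of the LDT, ring 3
```

The 64 KB segment limit is the same as in real mode; only the way a segment register is interpreted changes.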

In the 80286, operand address calculation was moved into a separate unit and stopped slowing down instruction execution. This opened up interesting possibilities: for example, LEA AX, [BX + SI + 4000] performs two additions in just 3 cycles and puts the result in the AX register!
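
The trick works because LEA only computes the effective address: it never touches memory and does not change the flags. A rough C model of `LEA AX, [BX + SI + disp16]` in 16-bit code, under that assumption (the function name is made up for illustration):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of LEA AX, [BX + SI + disp16] in 16-bit code:
   only the effective address is computed (no memory access, flags
   untouched), and the sum wraps around modulo 64 KB. */
static uint16_t lea_bx_si_disp(uint16_t bx, uint16_t si, uint16_t disp)
{
    return (uint16_t)(bx + si + disp);
}
```

With BX = 0x1000, SI = 0x0200 and a displacement of 4000, the result is 8608; like the address arithmetic it imitates, the sum wraps around at 64 KB.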

The number of manufacturers and specific systems using the 80286 is huge, but the first of them was the IBM PC AT, with almost fantastic speed for a personal computer of that time. With these computers, memory began to lag behind the processor and wait states appeared, but back then this still seemed temporary.

In some 80286 chips, as in the 8086/8088, interrupt handling was not implemented 100% correctly, which in very rare cases could lead to very unpleasant consequences. For example, POPF on the 80286 could let an interrupt through during its execution even when interrupts were disabled, and when an instruction with two prefixes (for example, REP ES:MOVSB) was interrupted on the 8086/8088/80286, one of the prefixes was lost after the return from the interrupt. The POPF bug was present only in early 286 processors.

The protected mode of the 80286 was extremely inconvenient: it divided all memory into segments of at most 64 KB and required complicated software support for virtual memory. The 80386, which appeared in 1985, made protected mode quite comfortable to work in, allowed up to 4 GB of addressable memory, and made switching between modes easy. In addition, virtual 8086 mode was introduced to support multitasking of 8086 programs, and virtual memory could now use relatively easy-to-manage paging. For all its innovations, the 80386 remained fully compatible with programs written for the 80286. Its innovations also include registers extended to 32 bits and two new segment registers, FS and GS. The timings changed, but not uniformly for the better. A barrel shifter was added, so a shift by several bits took as long as a shift by one. However, for some reason, this innovation greatly slowed down rotate instructions. Multiplication became slightly slower than on the 80286. Memory access, on the contrary, became a little faster, except for string instructions, which remained faster on the 80286. The author has often come across the view that, in real mode with 16-bit code, the 80286 is in the end still slightly faster than the 80386 at the same clock frequency.
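
The relatively easy-to-manage paging boils down to a fixed split of each 32-bit linear address. The sketch assumes the classic 80386 layout with 4 KB pages and a two-level table (10 + 10 + 12 bits); the struct and function names are hypothetical, and the code only does the arithmetic, without walking real page tables.

```c
#include <assert.h>
#include <stdint.h>

/* Parts of a 32-bit linear address under 80386 paging (4 KB pages). */
struct linear_parts {
    uint32_t dir;     /* bits 31..22: page directory index, 0..1023 */
    uint32_t table;   /* bits 21..12: page table index, 0..1023 */
    uint32_t offset;  /* bits 11..0: byte offset within the page */
};

/* Hypothetical helper: split a linear address into its three fields. */
static struct linear_parts split_linear(uint32_t linear)
{
    struct linear_parts p;
    p.dir = linear >> 22;
    p.table = (linear >> 12) & 0x3FFu;
    p.offset = linear & 0xFFFu;
    return p;
}

/* Hypothetical helper: the inverse of split_linear. */
static uint32_t join_linear(struct linear_parts p)
{
    assert(p.dir < 1024 && p.table < 1024 && p.offset < 4096);
    return (p.dir << 22) | (p.table << 12) | p.offset;
}
```

For instance, the linear address 0x12345678 selects directory entry 0x48, page table entry 0x345 and offset 0x678 within the page.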

Several new instructions were added to the 80386; most of them merely offered new ways of working with data, in effect duplicating, in optimized form, instructions that already existed. For example, the following were added:

  • bit test, set, and reset by bit number (BT, BTS, BTR, plus BTC to complement a bit), similar to those made for the Z80;

  • bit scans, BSF and BSR;

  • copying a value with sign or zero extension, MOVSX and MOVZX;

  • setting a byte depending on the flags, SETxx;

  • double-precision shifts, SHLD and SHRD.
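
To show what these instructions actually compute, here are rough C models of a few of them, assuming 32-bit register operands. All function names are hypothetical; the flags are mostly ignored, and the "undefined result" cases of BSF/BSR are reported as false rather than imitating any particular chip.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* BSF: index of the lowest set bit. The CPU sets ZF and leaves the
   destination undefined for a zero source; this model returns false. */
static bool bsf32(uint32_t src, unsigned *index)
{
    assert(index != NULL);
    if (src == 0)
        return false;
    unsigned i = 0;
    while ((src & 1u) == 0) {
        src >>= 1;
        i++;
    }
    *index = i;
    return true;
}

/* BSR: index of the highest set bit, same convention for zero. */
static bool bsr32(uint32_t src, unsigned *index)
{
    assert(index != NULL);
    if (src == 0)
        return false;
    unsigned i = 31;
    while ((src & 0x80000000u) == 0) {
        src <<= 1;
        i--;
    }
    *index = i;
    return true;
}

/* MOVSX / MOVZX from a byte into a 32-bit register. */
static uint32_t movsx8(uint8_t b)
{
    return (b & 0x80u) ? (0xFFFFFF00u | b) : b;   /* copy the sign bit up */
}

static uint32_t movzx8(uint8_t b)
{
    return b;                                      /* upper bits become 0 */
}

/* BTS with a register destination: the bit number is taken modulo 32,
   the old bit value (CF on the CPU) is returned, then the bit is set. */
static bool bts32(uint32_t *dst, unsigned bit)
{
    assert(dst != NULL);
    uint32_t mask = 1u << (bit & 31u);
    bool old = (*dst & mask) != 0;
    *dst |= mask;
    return old;
}

/* SHLD dst, src, count: shift dst left, filling the vacated low bits
   from the high bits of src. Count is masked to 5 bits; 0 is a no-op. */
static uint32_t shld32(uint32_t dst, uint32_t src, unsigned count)
{
    count &= 31u;
    if (count == 0)
        return dst;
    return (dst << count) | (src >> (32u - count));
}
```

Each of these loops or shift pairs is a single instruction on the 80386; on the 8086 or 80286 the same effect needed a short instruction sequence.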

Before the 80386, x86 processors had only short conditional jumps with a one-byte offset, which was often not enough. The 80386 added offsets of two or four bytes, and although the new jump instructions were two or three times longer, they executed as fast as the old short jumps.
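
The size difference is easy to see in the encodings. The following hypothetical C encoder (the names are illustrative, not taken from any real assembler) picks the 2-byte short form when the displacement fits in a signed byte, and otherwise the 80386 near form with a 16-bit or 32-bit displacement, depending on the code size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoder for a conditional jump Jcc.
   cc:     4-bit condition code (e.g. 0x4 = JE/JZ, 0x5 = JNE/JNZ)
   rel:    displacement relative to the end of the instruction
   code32: nonzero for 32-bit code, zero for 16-bit code
   Returns the instruction length in bytes:
     70+cc rel8        -> 2 bytes (8086 and later)
     0F 80+cc rel16    -> 4 bytes (80386 and later, 16-bit code)
     0F 80+cc rel32    -> 6 bytes (80386 and later, 32-bit code) */
static size_t encode_jcc(uint8_t out[6], unsigned cc, int32_t rel, int code32)
{
    assert(out != NULL && cc < 16);
    uint32_t u = (uint32_t)rel;             /* two's complement bytes */
    if (rel >= -128 && rel <= 127) {
        out[0] = (uint8_t)(0x70u + cc);
        out[1] = (uint8_t)u;
        return 2;
    }
    assert(code32 || (rel >= -32768 && rel <= 32767));
    size_t n = code32 ? 4 : 2;              /* displacement size */
    out[0] = 0x0F;
    out[1] = (uint8_t)(0x80u + cc);
    for (size_t i = 0; i < n; i++)          /* little-endian order */
        out[2 + i] = (uint8_t)(u >> (8 * i));
    return 2 + n;
}
```

A real assembler must also account for the displacement being measured from the end of the instruction, so switching between forms changes it; this sketch leaves that to the caller.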

Debugging support was radically improved by four hardware breakpoints, which made it possible to stop a program even at memory addresses that cannot be modified, such as code in ROM, where a software breakpoint instruction cannot be written.
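
The four breakpoint addresses go into the debug registers DR0 to DR3, and DR7 says which of them are enabled and on what kind of access they trigger. As a sketch, assuming the DR7 bit layout from Intel's 80386 documentation, the value for one breakpoint can be composed as follows; the enum and function names are hypothetical, and actually loading DR7 requires privileged MOV instructions that are not shown.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 80386 DR7 layout, for breakpoint n (0..3):
     bit 2n          Ln    local enable
     bit 2n+1        Gn    global enable
     bits 16+4n..17  R/Wn  condition (2 bits)
     bits 18+4n..19  LENn  length (2 bits) */
enum dr7_rw  { RW_EXEC = 0, RW_WRITE = 1, RW_READWRITE = 3 };
enum dr7_len { LEN_1 = 0, LEN_2 = 1, LEN_4 = 3 };

/* Hypothetical helper: return dr7 with breakpoint n locally enabled.
   Execution breakpoints must use LEN_1. */
static uint32_t dr7_enable(uint32_t dr7, unsigned n,
                           enum dr7_rw rw, enum dr7_len len)
{
    assert(n < 4);
    assert(rw != RW_EXEC || len == LEN_1);
    unsigned shift = 16u + 4u * n;
    dr7 &= ~(0xFu << shift);                        /* clear R/W and LEN */
    dr7 |= ((uint32_t)rw | ((uint32_t)len << 2)) << shift;
    dr7 |= 1u << (2u * n);                          /* set Ln */
    return dr7;
}
```

A write breakpoint of this kind fires on data access, which no software breakpoint inserted into the code can do.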

Protected mode became much easier to manage than on the 80286, which turned a number of inherited instructions into unnecessary vestiges. In the main protected-mode model, the so-called flat mode, segments up to 4 GB in size are used, which turns all segment registers into an unobtrusive formality. The semi-documented unreal mode even made it possible to use all memory as in flat mode while staying in real mode, which is easy to set up and control.

Starting with the 80386, Intel refused to share its technology, becoming de facto the monopoly processor manufacturer for the IBM PC architecture and, as Motorola's position weakened, for other personal computer architectures too. Systems based on the 80386 were very expensive until the early 90s, when they finally became affordable for mass consumers, at clock frequencies of 25 to 40 MHz. With the 80386, IBM began to lose its position as the leading manufacturer of IBM PC compatibles; tellingly, the first 80386-based PC, in 1986, was made by Compaq.

It's hard to hold back admiration for the amount of work done by the creators of the 80386 and for its results. I would even dare to suggest that the 80386 embodies more achievements than all of mankind's technology before 1970, and maybe even before 1980.

The bugs of the 80386 are quite an interesting topic; I will write about two. The first chips had some instructions that later disappeared from the manuals for this processor and stopped working on later chips: IBTS and XBTS. All 80386DX/SX chips produced by both AMD and Intel (which reveals their curious internal identity) have a very strange and unpleasant bug that destroys the value of the EAX register if an instruction addressing memory through the BX register is executed after PUSHAD or POPAD (which push all registers onto the stack or pop them from it). In some situations, the processor could even hang. A truly nightmarish and very widespread bug, yet Wikipedia still does not even mention it. There were other bugs too, of course.

The emergence of ARM changed the landscape of computing. Despite its problems, ARM kept developing. Intel's answer was the 80486. In the race for speed and for first place in advanced technology, Intel even decided to use a cooling fan, which has spoiled the look of the PC to this day.

In the 80486, timings for most instructions were improved, and some instructions began to execute in a single clock, as on ARM processors. Multiplication and division, however, for some reason became slightly slower. Especially strange is that a single-bit shift or rotation of a register became even slower than on the 8088! There is an 8 KB built-in cache, quite large for those years. There were also new instructions, for example CMPXCHG, which took the place of the quietly vanished IBTS and XBTS (interestingly, this instruction was already secretly available on late 80386 chips). There are very few new instructions, only six; of these, BSWAP, a very useful instruction for reversing the byte order of a 32-bit word, deserves mention. A big, useful innovation was the built-in floating-point coprocessor, something no one else had done at the time.
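
What BSWAP does in one instruction takes several shifts and masks in portable C. A minimal sketch (the name `bswap32` is hypothetical; GCC and Clang also offer the builtin `__builtin_bswap32` for the same operation):

```c
#include <assert.h>
#include <stdint.h>

/* Portable equivalent of the 80486 BSWAP r32: reverse the four bytes
   of a 32-bit word, e.g. to convert between big- and little-endian. */
static uint32_t bswap32(uint32_t x)
{
    return (x >> 24)
         | ((x >> 8) & 0x0000FF00u)
         | ((x << 8) & 0x00FF0000u)
         | (x << 24);
}
```

On x86 targets, optimizing compilers commonly recognize this pattern and emit a single BSWAP, though that depends on the compiler and its options.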

The first systems based on the 80486 were incredibly expensive. Rather unusually, the first 80486-based computers, the VX FT model, were made by the British firm Apricot: in 1989 they cost from 18 to 40 thousand dollars, and the system unit weighed over 60 kg! IBM released its first 80486-based computer, the PS/2 Model 90, in 1990 at a cost of $17,000.

It's hard to imagine Intel processors without secret, officially undocumented features. Some of them have been hidden from users since the very first 8086. One example is the admittedly useless fact that the second byte of the decimal adjustment instructions AAD and AAM is significant and can hold a value other than 10, that is, a non-decimal base (this was documented only for the Pentium, 15 years later!).

More unpleasant was the silence about the short forms of AND/OR/XOR with a sign-extended byte constant operand, for example AND BX, 7, encoded in three bytes (83 E3 07). These instructions, which make code more compact (especially important on the first PCs), were quietly added to the documentation only with the 80386. Interestingly, Intel's manuals for the 8086 and 80286 hint at these instructions but give no specific opcodes for them, unlike the similar ADD/ADC/SBB/SUB instructions, for which full information was provided. As a result, many (all?) assemblers could not generate these shorter encodings.

Another group of secrets is rather strange: a number of instructions have two opcodes, for example SAL/SHL (D0 E0 and D0 F0, or D1 E0 and D1 F0). Usually, and perhaps always, only the first opcode is used; the second, secret one is almost never used. One can only wonder why Intel so carefully preserves these superfluous duplicate instructions that clutter the opcode space.

The SALC instruction waited almost 20 years, until 1995, for official documentation! The ICEBP debugging instruction officially did not exist for 10 years, from 1985 to 1995. The most has been written about the secret instructions LOADALL and LOADALLD; they will remain secret forever, since they were useful for easy access to large amounts of memory only on the 80286 and 80386, respectively.
Until recently, there was some intrigue around the UD1 (0F B9) instruction, which was unofficially an example of an invalid opcode. Recently, the unofficial became official.
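
Two of these secrets can be illustrated in a few lines of C. The sketch assumes the commonly described semantics of AAM imm8 (AH = AL / imm8, AL = AL mod imm8) and AAD imm8 (AL = AL + AH * imm8, AH = 0), and the standard ModRM layout of the 83 /ext ib form; all function names are hypothetical, and flags are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* AAM imm8 (D4 ib): split AL into two digits of the given base.
   The documented form is D4 0A (base 10); base 0 raises a divide
   error on the CPU, reported here as false. */
static bool aam(uint8_t *ah, uint8_t *al, uint8_t base)
{
    if (base == 0)
        return false;
    uint8_t v = *al;
    *ah = (uint8_t)(v / base);
    *al = (uint8_t)(v % base);
    return true;
}

/* AAD imm8 (D5 ib): combine AH and AL back into AL; AH becomes 0. */
static void aad(uint8_t *ah, uint8_t *al, uint8_t base)
{
    *al = (uint8_t)(*al + *ah * base);   /* result truncated to 8 bits */
    *ah = 0;
}

/* Short form 83 /ext ib with a 16-bit register destination and a
   sign-extended byte immediate.
   ext: 0 ADD, 1 OR, 2 ADC, 3 SBB, 4 AND, 5 SUB, 6 XOR, 7 CMP
   reg: 0 AX, 1 CX, 2 DX, 3 BX, 4 SP, 5 BP, 6 SI, 7 DI */
static void encode_grp1_r16_imm8(uint8_t out[3], unsigned ext,
                                 unsigned reg, int8_t imm)
{
    assert(ext < 8 && reg < 8);
    out[0] = 0x83;
    out[1] = (uint8_t)(0xC0u | (ext << 3) | reg);  /* ModRM, mod = 11 */
    out[2] = (uint8_t)imm;
}
```

For example, AAM with base 16 splits 47 (0x2F) into AH = 2 and AL = 15, and AAD with base 16 puts them back together; encoding AND (ext = 4) with BX (reg = 3) and the constant 7 yields exactly 83 E3 07, one byte shorter than the 81 /4 iw form with a full 16-bit constant.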

In the USSR, production of 8088 and 8086 clones was mastered, but the 80286 could not be fully reproduced.