2x2=4, mathematics

Emotional stories about processors for first computers: part 12 (Preface and Postface)

Prologue and Epilogue

I have happened to program in assembly language for a number of different processors; the latest on this list is the Xilinx MicroBlaze. I decided to write down some of my observations on the features of these almost magical pieces of hardware which, like a magic key, opened the doors for us to the land of virtual reality and mass creativity. I will not write about modern systems such as the x86, x86-64, ARM, ARM-64, etc., at least not this time: the topic is very large and complex. Therefore I finish with the Intel 80486 and Motorola 68040. I also wanted to include the IBM/370, which I had to deal with. These systems were quite far from mass users but had a huge impact on computer technology. However, preparing material about them requires much time, they didn't use single-chip processors, and somehow none of these machines seems to be left in existence, so they aren't included. I really hope that my materials will also attract the attention of experts, who will be able to add something I have not thought about or didn't know.

As illustrative material I attach my own small Rosetta stone: tiny programs that calculate the number π on different processors and systems using a spigot algorithm, each claiming to be the fastest implementation of it.
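To give an idea of what such a program does, here is a minimal Python sketch of one classic spigot scheme, which produces four decimal digits per pass using only integer arithmetic. It is an illustration under assumptions, not the author's code: the name `pi_spigot` is made up here, and the choice of 14 array cells per 4-digit group follows a widely circulated compact C program, so the actual assembly programs may differ in their details.

```python
def pi_spigot(ndigits):
    """Return roughly the first `ndigits` decimal digits of pi as a string.

    Digits come out in groups of four; `ndigits` is rounded down to a
    multiple of 4. Only integer operations are used, which is why the
    method suits small processors.
    """
    groups = ndigits // 4
    n = groups * 14                 # assumed: 14 cells per 4-digit group
    f = [2000] * (n + 1)            # mixed-radix digits, f[0] is unused
    e = 0                           # low part kept back from the last pass
    out = []
    for c in range(n, 0, -14):
        d = 0                       # running carry for this pass
        g = 2 * c                   # becomes the odd denominator 2*b - 1
        b = c
        while True:
            d += f[b] * 10000
            g -= 1
            f[b] = d % g            # remainder stays in the cell
            d //= g                 # quotient is carried to the next cell
            g -= 1
            b -= 1
            if b == 0:
                break
            d *= b
        out.append("%04d" % (e + d // 10000))
        e = d % 10000
    return "".join(out)


if __name__ == "__main__":
    print(pi_spigot(32))
```

The inner loop is dominated by a multiplication by 10000 and a division with remainder on numbers wider than 16 bits, so on processors without hardware division most of the tuning effort goes into the division routine.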

In conclusion, here are several remarks that occurred to me in the course of writing these articles.

It is difficult to get rid of the feeling that 8-bit processors were only an undesirable necessity for the main characters acting in the 70's and 80's on the stage of computer history. The development of the most powerful 8-bit 6502 was actually frozen. Intel and Motorola rather slowed down their own development of small processors and restrained other developers.

I'm pretty sure that the Amiga or Atari ST would have worked better and faster using a 4 MHz processor compatible with the 6502 with a 20- or 24-bit address bus than with the 68000. Bill Mensch said recently that it's easy to make the 6502 at 10 GHz today.

If the Amstrad PCW series (the success of which the Commodore CBM II could have shared) had begun to use the upgraded Z80 at higher frequencies, then it is quite possible that this series would have been relevant 10 years ago.

What would the world be like if the ARM had been made in 1982 or 1983? In my humble opinion it was quite possible.

What would computers made in the USSR have been like if they had copied and developed not the most expensive but the most promising technologies?

Emotional stories about processors for first computers: part 11 (Intel 8080)

Intel 8080 and 8085

The first real processor on a chip, made in the first half of 1974, is still being manufactured and used today. It was repeatedly cloned around the world; in the USSR it had the designation KP580BM80A. Modern Intel processors for the PC still readily reveal their kinship with this product, which is in some sense a relic. I haven't written code for this processor myself, but being well acquainted with the architecture of the Z80, I will venture to offer some comments.

The 8080 instruction set, like those of other Intel processors for the PC, can hardly be called ideal, but it is universal, quite flexible, and has some very attractive features. The 8080 compared favorably with its competitors, the Motorola 6800 and the MOS Technology 6502, thanks to its large number of registers, even if they were somewhat clumsy. The 8080 provided the user with one 8-bit accumulator, the 16-bit HL register that served both as a semi-accumulator and as a fast index register, a 16-bit stack pointer, and two more 16-bit registers, BC and DE. The BC, DE, and HL registers could also be used as six 8-bit registers. In addition, the 8080 supported an almost full set of status flags: carry, sign, zero, and even parity and auxiliary carry. Some commands from the 8080 instruction set were speed champions for a long time. For example, the XCHG command exchanges the contents of the 16-bit DE and HL registers in just 4 clock cycles, which was extremely fast! A number of other commands, although they did not set such bright records, were also among the best for a long time:

  • XTHL – exchange of HL register contents and data at the top of the stack, 18 cycles – it seems like a lot, but even on the real 16-bit 8086 an equivalent of such a command takes at least 26 cycles, and for the 6800 or 6502 such a command is hard to imagine;
  • DAD – add to the semi-accumulator HL the value of another 16-bit register (BC, DE or even SP), 10 cycles. This is a true 16-bit addition that sets the carry flag. If you add HL to itself, you get a quick 16-bit shift left, that is a multiplication by 2, which is a key operation for programming both full multiplication and division;
  • PUSH and POP – respectively put a 16-bit value from a register onto the stack and remove one from the stack into a register. They execute in 11 and 10 cycles. These are the 8080's fastest operations for working with memory, and SP is automatically decremented or incremented when they are executed. PUSH can be used, for example, to quickly fill memory with a pattern taken from 3 registers (BC, DE, HL). There are no stack instructions for 8-bit values at all;
  • LXI – load a 16-bit constant into a register (HL, DE, BC, SP) in 10 cycles;
  • RNZ, RZ, RNC, RC, RPO, RPE, RP, RM – conditional returns from a subroutine; they make code cleaner by eliminating the need to write extra conditional jumps. These commands were abandoned in the x86 architecture, but they probably should have been kept, since code written with them turns out nicer.
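The remark about DAD being a key operation for multiplication can be illustrated with a small model. The Python sketch below is not 8080 code; it only imitates the register behavior, and the helper names `dad` and `mul8x8` are invented for this illustration. It assumes the usual shift-and-add scheme in which the multiplier is examined from its most significant bit.

```python
MASK16 = 0xFFFF


def dad(hl, rp):
    """Model of DAD: HL <- HL + rp. Returns (new HL, carry flag)."""
    total = hl + rp
    return total & MASK16, total > MASK16


def mul8x8(a, b):
    """Multiply two 8-bit values into a 16-bit product by shift-and-add."""
    hl = 0           # running product, plays the role of HL
    de = b & 0xFF    # multiplicand, plays the role of DE
    a &= 0xFF        # multiplier, plays the role of the accumulator
    for _step in range(8):
        hl, _carry = dad(hl, hl)        # "DAD H": product shifted left by 1
        a <<= 1                         # shift the multiplier left ...
        if a & 0x100:                   # ... and test the bit that fell out
            hl, _carry = dad(hl, de)    # "DAD D": add the multiplicand
        a &= 0xFF
    return hl


if __name__ == "__main__":
    print(mul8x8(13, 17))
```

Each of the eight steps needs one doubling of HL and at most one further 16-bit addition, which is why a fast DAD matters so much for the total time of a multiplication routine.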

This processor was used in the first 'almost personal computer', the Altair 8800, which became very popular after a magazine publication in early 1975. By the way, in the USSR a similar publication appeared only in 1980, and one of comparable significance only in 1986.

The first almost PC

Intel's 8080 became the basis for the development of the first mass professional operating system, CP/M, which occupied a dominant position among microcomputers for professional work until the mid-80's.

Now about the shortcomings. The 8080 required three supply voltages of -5, 5, and 12 volts. Working with interrupts was clumsy and slow. In general the 8080 was rather leisurely if you compare it with competitors which soon appeared. The 6502 could be up to 3 times faster when working on the same frequency as the 8080.

However, the architecture of the 8080 embodied what turned out to be a correct vision of the future, namely a fact unknown in the 70's: that processors would become faster than memory. The 8080's DE and BC registers are a prototype of modern caches with manual control rather than general-purpose registers. The 8080 could run at 2 MHz, while competitors could only run at 1 MHz, which reduced the performance difference between them.

It's hard to call the 8080 a 100% 8-bit processor. Indeed, its ALU is 8 bits wide, but there are many 16-bit commands that work faster than 8-bit counterparts used in their place, and for some instructions there are no 8-bit analogs at all. The XCHG instruction is 100% 16-bit both in essence and in timing, and there are real 16-bit registers. Therefore I venture to call the 8080 partially 16-bit. It would be interesting to calculate this processor's bit index based on the set of its features, but as far as the author knows, no one has done such work yet.

The author of this text does not know the reasons why Intel abandoned direct support of 8-bit PC's with its processors. Intel's policy has always been notable for its complexity and ambiguity. Its connection with politics in particular is illustrated by the fact that Intel has long had fabs in Israel, and until the end of the 90's this was secret. Intel made practically no attempt to improve the 8080; only the clock frequency was raised to 3 MHz. In effect, the 8-bit computer market was given to Zilog with its Z80 processor, which was related to the 8080, and the Z80 was able to withstand its main competitor, The Terminator 6502, quite successfully.

In the USSR and Russia the domestic clone of the 8080 became the basis of many computers that remained popular until the early 90's. These are, of course, the Radio-86RK, Mikrosha, the multicolor Orion-128, Vector, and Corvette. Eventually, cheap and improved ZX Spectrum clones based on the Z80 won the clone wars.

This is a real PC

In early 1976 Intel introduced the 8085 processor, compatible with the 8080 but significantly superior to its predecessor. The -5 and 12 volt supplies became unnecessary and the connection scheme was simplified, work with interrupts was improved, and clock frequencies from 3 up to a very solid 6 MHz became available. The command system was expanded with very useful instructions: 16-bit subtraction, a 16-bit shift right in only 7 cycles (this was very fast), a 16-bit rotate left through the carry flag, loading a 16-bit register with an 8-bit offset (this instruction can be used with the stack pointer too), writing the contents of the HL register to the address held in the DE register, and the analogous reading of HL from the address in DE. All the instructions mentioned above, except for the shift right, execute in 10 cycles, which is sometimes significantly faster than their counterparts or emulation on the Z80. Some more instructions and even two new processor status flags were added. Among the new flags the overflow flag is worth noting, although there was almost no support for working with it. In addition, many instructions for working with byte data were accelerated by 1 clock cycle. This was very significant, as many systems with the 8080 or Z80 used wait states which, because of the extra cycles on the 8080, could stretch the execution time to almost twice as long. For example, in the Vector computer mentioned above, register-to-register instructions took 8 cycles, whereas with an 8085 or Z80 the same instructions would have executed in only 4 cycles. The XTHL instruction became faster by as much as two cycles. With the new instructions you can write code to copy a block of memory that runs faster than the Z80's LDI/LDD commands! However, some instructions, for example the 16-bit increment and decrement, PUSH, and the conditional returns, became slower by a cycle.

The 8085 has built-in interrupt support, which in many cases eliminates the need for a separate interrupt controller in a system, as well as a serial I/O port. As already noted, full support for the overflow flag was not added in the 8085, so the arithmetic of signed numbers remained somewhat incomplete.

However, I can only repeat the phrase "for unknown reasons": Intel declined to promote the 8085 as the main processor for PC's. It was only in the 80's that some fairly successful 8085-based systems appeared. The IBM System/23 Datamaster appeared in 1981; it was a predecessor of and almost a competitor to the IBM PC. Then in 1982 a very fast computer with excellent graphics, the Zenith Z-100, was released, in which the 8085 ran at 5 MHz. In 1983 the Japanese company Kyocera created the very successful Kyotronic KC-85 laptop, versions of which were also produced by other companies: Tandy produced the TRS-80 Model 100, NEC the PC-8201a, and Olivetti the M-10. In total, perhaps more than 10 million of these computers were released! In Russia in the early 90's there were attempts to improve some systems, for example the Vector computer, on the basis of the domestic clone, the ИM1821BM85A. Surprisingly, the main processor of the Sojourner rover, which reached the surface of Mars in 1997, was an 8085 at 2 MHz!

In fact Intel gave way to the Z80 in the 70's. A few years later, in the battle for the 16-bit market, Intel behaved quite differently, starting a lawsuit to ban sales of the V20 and V30 processors in the United States. Interestingly, these processors from the Japanese company NEC could switch to full binary compatibility with the 8080, which made them the fastest processors of the 8080 architecture.

Another Intel secret is its refusal to publish the extended command system, including the support for the two new flags. However, one of the official manufacturers of these processors did publish the entire instruction set. What are the reasons for this strange refusal? One can only guess. Could Zilog have played then the role that AMD may once have played, creating the ostensible appearance of competition, while the 8085 could have brought Zilog down? Or was it perhaps a wish to keep the instruction set closer to that of the 8086, then being designed? The latter seems doubtful. The 8086 was released more than 2 years after the 8085, and it is hard to believe that its command system was already known in 1975. And in any case, compatibility with both the 8080 and the 8085 is achievable on the 8086 only with a macro processor, which sometimes replaces one 8080 or 8085 instruction with several 8086 instructions. Moreover, the two published new instructions of the 8085 cannot be implemented on the 8086 at all. It is especially difficult to explain why Intel did not publish information about the new instructions after the release of the 8086. We can also assume that the reason most likely lay in marketing: with the specifications of the 8085 artificially worsened, the 8086 looked more spectacular against that background.

Edited by Richard BN

Emotional stories about processors for first computers: part 10 (MOS Technology 6502)

6502 and 65816

This is a processor with a very dramatic fate; no other processor can compare with it in this respect. Its appearance and introduction were accompanied by events very large in scope and consequences. I will list some of them:

  1. the weakening of the giant Motorola company, which for some time exceeded the capabilities of Intel;
  2. the destruction of the independent company MOS Technology;
  3. the cessation of the 6502 development and its stagnant production with little or no modernization.

It all started when Motorola, for unknown reasons, refused to support the initiative of young engineers who proposed to improve the overall rather mediocre 6800 processor. They had to leave Motorola and continue their work at a small but promising company, MOS Technology, where they soon prepared two processors, the 6501 and 6502, both of them (like almost all processors of that time) fabricated using NMOS technology. The first one was pin-compatible with the 6800, but otherwise the two were identical. The 6501/6502 team was able to introduce a new chip production technology, which radically reduced the cost of the new processors. In 1975 MOS Technology could offer the 6502 for $25, while the starting price for the Intel 8080 and Motorola 6800 had been $360 in 1974. In 1975 Motorola and Intel lowered their prices, but they were still close to $100. MOS Technology specialists claimed that their processor was up to 4 times faster than the 6800. I find this questionable: the 6502 can work much faster with memory, but the 6800's second accumulator greatly accelerated many calculations. My own estimate is that the 6502 was on average no more than 2 times faster. Motorola launched a lawsuit against its former employees, alleging that they had used many of the company's technological secrets. During the trial it was established that one of the engineers who had left Motorola took some confidential documents on the 6800, acting against the wishes of his colleagues. Whether this was his own act or there were some guiding forces behind him is still unknown. For this and other unclear reasons, Motorola indirectly won the case, and MOS Technology, whose financial capabilities were very small, was forced to pay a substantial amount of $200,000 and to abandon production of the 6501. Intel in a similar situation with Zilog acted quite differently. It must be admitted, though, that MOS Technology sometimes took too many risks when trying to use for its own purposes the big money that Motorola spent on promoting the 6800.

Further, the legendary Commodore company and its no less legendary founder Jack Tramiel appeared in the 6502 story, and in their shadow stood the figure of the company's chief financier, who determined its policy: a man named Irving Gould. Jack got a loan from Irving and with this money, using a few, to put it mildly, unscrupulous tactics, forced MOS Technology to become a part of Commodore. After that, possibly against the wishes of Tramiel, who was forced to give in to Gould, the development of the 6502 almost stopped, despite the fact that even in 1976 it was possible to produce prototypes of the 6502 with operating frequencies up to 10 MHz. The report of this, however, appeared only many years later and came from Bill Mensch, who was on the team that left Motorola, sometimes made loud but by and large empty statements, and played a rather ambiguous role in the fate of the 6502. The main developer of the 6502, Chuck Peddle, was removed from processor development forever. The 6502 continued to be produced not only at Commodore but also at the Western Design Center (WDC), created by Bill Mensch. It is striking that none of the former 6502 team worked with him afterwards.

The drama around the 6502 was not over. In 1980 a short anonymous article appeared in Rockwell's AIM65 Interactive magazine stating that all 6502's carry a dangerous bug called the JMP (xxFF). The tone of the article suggested something completely out of the ordinary. Subsequently this attitude to the issue spread to Apple and became a kind of mainstream, although strictly speaking it was not a "bug". Of course, to a specialist accustomed to the comfortable processors of the large systems of those years, a feature that is quite reasonable and even useful in a microprocessor could seem annoying, a bug. But in fact this behavior, which hurt someone's feelings, was described in the official documentation from 1976 and in programming textbooks published before the article appeared. The "bug" was eliminated by Bill Mensch, who made the 65C02 (CMOS 6502) supposedly by 1983, i.e. after the official release of the 65816. While Intel, Motorola and others had already made 16-bit processors of new generations, the 6502 was only microscopically improved and made artificially partially incompatible with itself. In addition to eliminating the "bug", a number of changes were made which, in particular, changed the way several instructions execute. These instructions became slower by a cycle, but at the same time more correct in some far-fetched academic sense. It must be admitted that several of the new instructions were long awaited and useful. On the other hand, the absolute majority of the new instructions only occupied code space, adding almost nothing to the capabilities of the 6502 and leaving fewer free codes for possible further upgrades. Commodore and the Japanese company Ricoh (manufacturer of the processor for the very popular NES game console) did not accept these changes.

I myself encountered the problem of this "bug" several times when, knowing nothing about it, I was writing programs for Commodore computers. There was an incompatibility, and I had to change the code and use conditional assembly. The code for the 65C02 turned out to be more cumbersome and slower. Later I raised this question on the 6502.org forum, where some participants were familiar with the Apple ][ computers. I asked if anyone could give an example of the aforementioned "bug" crashing a program. I received only emotional and general comments; a specific example was never offered.
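For readers who have not met the JMP (xxFF) behavior, the following Python sketch models it. On the NMOS 6502 the indirect jump fetches the high byte of the target without carrying into the page number, so a pointer stored at xxFF takes its high byte from xx00 of the same page; the 65C02 reads it from the next page. The function name `jmp_indirect_target` and the use of a flat `bytearray` as memory are assumptions made only for this illustration.

```python
def jmp_indirect_target(mem, ptr, cmos=False):
    """Return the address that JMP (ptr) would jump to.

    mem  -- 64 KB memory image (bytes or bytearray)
    ptr  -- address of the 16-bit pointer
    cmos -- False models the NMOS 6502, True models the 65C02
    """
    lo = mem[ptr]
    if cmos:
        # 65C02: the high byte comes from the next address, crossing pages
        hi_addr = (ptr + 1) & 0xFFFF
    else:
        # NMOS 6502: only the low byte of the pointer address is incremented
        hi_addr = (ptr & 0xFF00) | ((ptr + 1) & 0x00FF)
    return (mem[hi_addr] << 8) | lo


if __name__ == "__main__":
    mem = bytearray(0x10000)
    mem[0x10FF] = 0x34   # low byte of the pointer
    mem[0x1100] = 0x12   # byte a programmer might expect to be the high byte
    mem[0x1000] = 0x56   # byte the NMOS 6502 actually reads
    print(hex(jmp_indirect_target(mem, 0x10FF)))
    print(hex(jmp_indirect_target(mem, 0x10FF, cmos=True)))
```

For any pointer that does not sit on the last byte of a page the two models give the same result, which is why code that keeps its jump vectors away from xxFF runs identically on both processors.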


The 65C02 was licensed to many companies, in particular NCR, GTE, Rockwell, Synertek, and Sanyo. The 6512 was a 65C02 variant which was used in later BBC Micro models. Atari used the NMOS 6502. Synertek and Rockwell, in addition to the CMOS 6502, also produced the NMOS 6502. By the way, the NMOS 6502 has its own set of undocumented instructions, the nature of which is completely different from that of the secret commands of the 8085. In the 6502 these instructions appeared as a side effect of the technology used, so most of them are rather useless. But several of them, for example those that load or store two registers at once with one command, and some others, can make code faster and more compact.

There were other attempts to modernize the 6502. In 1979 an article reported that a 6509 processor (not to be confused with Commodore's later processor of the same name) was being prepared for production for the Atari computers, and that it was expected to execute commands 25% faster and to offer many new instructions. For unknown reasons the production of this processor never took place. Commodore made only microscopic upgrades. In particular, they switched to HMOS technology and to static cores, which allowed the processor clock to be slowed down. From the programming point of view the most interesting was the Commodore 6509, which can address up to 1 MB of memory, albeit in a very primitive way, with the help of only two instructions specially allocated for this purpose. The super-popular Commodore 64 and 128 had the 6510/8510 processors, and the less successful 264 series had the 7501/8501. These processors had 6 and 7 embedded I/O port bits respectively, while the 7501/8501 did not support non-maskable interrupts. Rockwell produced a version of the 65C02 extended with 32 instructions of their own for one-bit values (similar to the Z80's bit instructions). However, as far as I know such processors were not used in computers, and the bit instructions themselves were more likely to be used only in embedded systems. This extension was made by Bill Mensch.

The last scene of the drama involving the 6502 was the exclusion from the US market, in the first half of the 80's, of computers based on the 6502 running at 2 MHz. This affected the English BBC Micro: its manufacturer, Acorn, made a large batch of computers for the United States but, as it turned out, in vain. Some kind of lock was triggered, and the computers had to be urgently reworked to European standards. The almost American but formally Canadian Commodore CBM II computers (1982) were nevertheless admitted, despite some problems (in particular with compliance with the standards for electrical equipment). Perhaps this was because they had no graphics modes and not even color text, which made them little threat to the mainstream of the American market, and even the stylish Porsche design could not compensate for this. The last in the list of losers was the 100% American Apple III (1980): it is known that Steve Jobs, like Apple's management in general, did a lot to prevent this computer from being successful. Steve demanded obviously impracticable specifications, and the management set unrealistic deadlines. Will we ever know their motives? The flaws of the Apple III were eliminated in the Apple III Plus (1983), but Apple's management quietly closed the project in 1984 because of their reluctance to have a competitor to the Macintosh. Only in 1985, when the era of 8-bit technology was beginning to pass, did the Commodore 128 appear, which could run the 6502 at 2 MHz in one of its modes. But even here it turned out to be more of a joke, since this mode was practically unsupported and there are practically no programs for it. Only in the second half of the 80's did production of accelerators for the Apple II begin in the United States, followed in 1988 by the Apple IIc+ model with a 4 MHz processor.

Why did it happen that way? Perhaps because the 6502 at 2 or 3 MHz (and these were already being produced at the very beginning of the 80's) could successfully compete with systems based on the Intel 8088 or Motorola 68000 in a number of tasks, especially games. In 1991 a willful decision by Commodore closed an interesting, albeit belated, project: the C65, based on the 4510 processor at 3.54 MHz. The 4510, made only in 1988, is the fastest 6502; it finally carried out the previously mentioned optimization of cycles, which gave a 25% increase in speed. Thus the processor in the C65 is close in speed to a 6502 at 4.5 MHz. Surprisingly, this fastest 6502 with an extended instruction set (in some details this extension turned out to be more convenient than that of the 65816) has never been used anywhere else.

The Commodore C128 and Apple III Plus had an MMU that allowed them to use several stacks and zero pages, to address more than 64 KB of memory, etc. The C128's MMU was artificially limited to work with only 128 KB of memory. For the BBC Micro computers, second-processor boards were produced with the 6502 at 3 MHz (1984) and 4 MHz (1986).

Anti-advertising – multiple Porsche PETs in the apartment of the villain of The Jewel of the Nile – The Apple only era in Hollywood had not yet come

Now a few words about the instruction system of the 6502. The main feature of this processor is that it was made almost as fast as possible, with almost none of the extra clock cycles that are especially numerous in the 8080/8085/Z80/8088/68000 processors. In fact, this was the main concept of the RISC-architecture processors that appeared later under the direct influence of the 6502. The same concept has dominated among Intel processors too, starting with the 80486. In addition, the 6502 responded very quickly to interrupts, which made it very useful in some embedded systems. The 6502 has one accumulator and two index registers. In addition, the first 256 bytes of memory can be used in dedicated commands either as faster memory or as a set of 16-bit registers (almost identical in functionality to the BC and DE registers of the 8080/Z80) for pretty powerful ways of addressing memory. Some arithmetic commands (shifts, rotations, increment, and decrement) can operate on memory directly, without using registers. There are no 16-bit instructions: this is a 100% 8-bit processor. It supports all the basic flags except the parity flag, which is typical only of Intel's architecture. There is one more special flag, for the rarely useful decimal mode. Intel and Motorola processors use special corrective instructions for working with decimal numbers, whereas the 6502 can switch to decimal mode, which makes its speed advantage with decimal numbers even more significant than with binary ones. Very impressive for the 6502 is that table-based multiplication of 8-bit operands with a 16-bit result takes less than 30 cycles, with an auxiliary table of 2048 bytes. One of the slowest operations on the 6502 is block memory copy, which can take more than 14 cycles per byte.
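The text does not say which table method is meant. One well-known technique that fits the description is quarter-square multiplication, which replaces a multiplication by two table lookups and a subtraction, using the identity a·b = ⌊(a+b)²/4⌋ − ⌊(a−b)²/4⌋. The Python sketch below shows only the arithmetic; the names `QUARTER_SQUARES` and `mul8_table` are hypothetical, and how the 2048 bytes are laid out on a real 6502 (for example as separate low-byte and high-byte tables) is an assumption not covered here.

```python
# Table of floor(x*x/4) for every possible sum of two 8-bit operands.
QUARTER_SQUARES = [(x * x) // 4 for x in range(511)]


def mul8_table(a, b):
    """Multiply two 8-bit values using only table lookups and a subtraction.

    The identity is exact because a+b and a-b always have the same parity,
    so the two rounding errors cancel.
    """
    return QUARTER_SQUARES[a + b] - QUARTER_SQUARES[abs(a - b)]


if __name__ == "__main__":
    print(mul8_table(200, 123))
```

On the 6502 the two lookups map naturally onto indexed addressing, which is what makes a multiplication in a few dozen cycles plausible on a processor with no multiply instruction.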

The 6502 can work in parallel with another device, for example another 6502. As far as I know, such dual-processor systems have never been produced; instead of a second processor, a video controller that shared memory with the 6502 was usually used.

The 65816 was released by WDC in 1983. Interestingly, Bill Mensch received some specifications of the new processor from Apple. Of course this was a big step forward, but it was clearly belated and had large architectural flaws. Nobody considered the 65816 a competitor to the main processors of Intel or Motorola: it was already a minor outsider, somehow programmed in advance to keep losing ground. The 65816 had two important advantages: it was relatively cheap and almost compatible with the still very popular 6502. In subsequent years Bill Mensch didn't even try to improve his brainchild: to optimize cycles, to replace zero page addressing with an extended mode using the Z register (this was done in the 4510), to add at least multiplication, etc. WDC only increased the maximum clock speeds, reaching 14 MHz by the mid-90's (this processor was used at 20 MHz in the SuperCPU, the popular accelerator for the C64). However, even now (2019!) WDC for some reason offers the 65816 only at the same 14 MHz. The 65816 can use up to 16 MB of memory, but the addressing methods used for this look far from optimal. For example, index registers can be only 8 or 16 bits wide, the stack can be placed only in the first 64 KB of memory, only there can you use the convenient short addressing of the direct page (the generalization of zero page addressing), working with memory above 64 KB is comparatively awkward, etc. The 65816 has a 16-bit ALU but an 8-bit data bus, so it is only about 50% faster than the 6502 in arithmetic operations. Nevertheless, more than a billion 65816 units have been produced. Indeed, some instructions of the 65816 clearly fill gaps in the 6502 architecture, for example the commands for block copying of memory at 7 clock cycles per byte. I can also add that the 65816 uses almost all instruction codes, 255 out of 256. The last unused code is reserved for future long instructions that have not yet appeared.

The Apple IIx, in whose development Steve Wozniak was actively involved, was supposed to use the 65816. However, mass production of this processor could only start in 1984, and its first batches were defective, which caused excessive delays and eventually the closure of the entire project.

The 65802 is another version of the 65816, with a 16-bit address bus and a pin layout compatible with the 6502. An upgrade for the Apple II based on this processor was offered, but even a slight speedup from such an upgrade can be obtained only with specially written programs.

The 6502 was used in a large number of computer systems, the most popular of which were the 8-bit Commodore, Atari, Apple, and NES. It is interesting that the 6502 was also used in the keyboard controller of the Commodore Amiga, and two 6502's at 10 MHz were used in the high-performance Apple Macintosh IIfx. Here it is impossible not to mention the Atari game consoles, produced from 1977 to 1996 – about 35 million of them were sold! The 65816 was used in the rather popular Apple IIgs computer, in the Super NES gaming console, and also in the rare Acorn Communicator computer.

In 1984 an article in Byte magazine about the Agat, a bad copy of the Apple ][ computer made in the USSR, appeared against a background of pictures with red banners, Lenin and marching soldiers. The article cited a curious price of $17,000 for this computer (an absurd amount; the real price was about 4000 rubles) and ironically noted that Soviet manufacturers would have to lower the price dramatically if they wanted to sell their product in the West. The Agat was used mainly in school education. The later Agat models were almost 100% compatible with the Apple ][ and had some pretty useful extensions.

One can only fantasize about what would have happened if the 6502 had developed at the same pace as its competitors. It seems to me that gradually moving zero-page memory into registers and gradually expanding the command system, with simultaneous optimization of cycles, would have allowed The Terminator 6502 to remain in the lead in terms of performance until the early 90's. Introducing a 16-bit mode and then a 32-bit one would have allowed more memory and faster commands to be used. Would its competitors have been able to oppose this?

I would like to finish with some general philosophical reflections. Why was the 6502 slowed down in its development and deprived of a much brighter future? Maybe because this development really could have put great pressure on the large firms and created a completely new reality. Was the 6502 team aiming at this? In my humble opinion, probably not; they just wanted to make a better processor.

Much later, at the beginning of the 21st century, the Lexra company, which had produced various innovative processors for 5 years, was crushed by lawsuits brought on far-fetched grounds. This sad story is somewhat reminiscent of what happened to MOS Technology.

Edited by Richard BN and Dr Jefyll

Emotional stories about processors for first computers: part 9 (Acorn ARM)

The first ARM processors

The ARM-1 processor was an astonishing development. It continued the ideology of the 6502 (namely, to make a processor that is simpler, cheaper and better) and was released by Acorn in 1985, at the same time as Intel's technological miracle, the 80386. The ARM consisted of about ten times fewer transistors, therefore consumed significantly less energy, and was at the same time much faster on average. Admittedly, the ARM had no MMU and not even divide and multiply instructions, so in some calculations based on division the 80386 could be faster. However, the advantages of the ARM were so great that today it is the most widely produced processor architecture: more than 100 billion such processors have been made.

The development of the ARM began in 1983, after Acorn had conducted research with the 32016 processor. This research showed that a 6502 running at half the clock frequency of the 32016 could perform many calculations faster than what seemed to be a much more powerful processor. The 80286 was already available at that time and showed very good performance, but Intel, perhaps sensing the potential of Acorn, refused to provide its processor for testing. The technology of the 80286 was not restricted, as that of the 80386 was, and it was transferred to many companies, so history is still waiting for the details of this somewhat unusual refusal to be disclosed. Perhaps if Intel had allowed the use of its processor, Acorn would have used it and would not have developed the ARM.

The ARM was developed by only a few people, who tested the instruction system using the BBC Micro's BASIC. The development itself took place in a former barn. The debut of the processor was rather unsuccessful. In 1986 an ARM-based second processor for the BBC Micro was released under the name ARM Evaluation System. In addition to the processor it contained 4 MB of memory (a great deal for those years), which made this add-on a very expensive product (above 4000 pounds, about $6000). Yet compared with computers of that time with comparable performance, this second processor was an order of magnitude, or even almost two orders of magnitude, cheaper. There were very few programs for the new system. This was a bit strange, because it was quite possible to port Unix to it: many Unix variants that did not require an MMU were available at that time, for the 68000, PDP-11, 80186 and even the 8088. Linux was ported to the Acorn Archimedes only in the 90's. Perhaps the delay in the appearance of a real Unix for the ARM was caused by Acorn's reluctance to transfer ARM technology to other companies.

The first ARM based system

Acorn's somewhat unsuccessful marketing policy led to a very difficult financial situation in 1985. In addition to the ARM, Acorn also tried to carry out the expensive development of business computers, which failed, in particular because of the shortcomings of the 32016 processor chosen for them. The Acorn Communicator computer was not very successful either. The development of the Master 512, a relatively successful but not quite IBM PC compatible computer, was very costly. In addition, a lot of financial resources were spent on an unsuccessful attempt to enter the US market. The Italian company Olivetti, with its rather successful Intel 8086 and 80286-based computers, was allowed to enter that market, as part of a hypothetical big game of absorbing Acorn itself. By the way, after the absorption of Acorn the role of Olivetti in the US market quickly faded away.

As part of Olivetti, Acorn developed an improved ARM2 chip with built-in multiplication instructions, on the basis of which the Archimedes personal computers were made. Their speed was stunning for the time. The first models became available in 1987. However, Olivetti's management was focused on IBM PC compatible computers and did not want to use its resources to sell Acorn products.

The ARM provides 16 32-bit registers. There are actually more of them if we count the registers for system needs. One of the registers, R15, is (as in the PDP-11 architecture) the program counter. Almost all operations are performed in 1 clock cycle; more cycles are needed in particular for jumps, multiplications and memory accesses. Unlike the popular processors of those years, the ARM has no such typical structure as a hardware stack. If necessary, a stack is implemented through one of the registers. When calling subprograms the stack is not used; instead the return address is stored in the register allocated for it. Such a scheme obviously does not work for nested calls, for which a stack has to be organized. A unique feature of the ARM is the combination of the program counter (which is 26-bit and therefore allows up to 64 MB of memory to be addressed) with the status register. Eight bits of this register hold status information: six flag bits above the 26-bit address, and two more bits obtained from the fact that the lower two bits of the address are not used, since instructions must be aligned on a 4-byte word boundary. The processor can access bytes and 4-byte words; it cannot directly access 16-bit data. The ARM's data processing instructions are 3-address.
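
The combined program counter and status register can be illustrated with a small model. This is only a sketch: the exact bit layout used here (flags N, Z, C, V, I, F in bits 31 to 26, the word address in bits 25 to 2, a two-bit mode field in bits 1 to 0) is my assumption based on the usual description of the early 26-bit ARM processors, and the function names are made up for illustration.

```python
# Sketch of the combined PC/status word (R15) of the early 26-bit ARM.
# Assumed layout: bits 31..26 = flags N, Z, C, V, I, F;
# bits 25..2 = word-aligned address; bits 1..0 = processor mode.

FLAG_NAMES = ("N", "Z", "C", "V", "I", "F")  # bit 31 down to bit 26
PC_MASK = 0x03FFFFFC                         # bits 25..2 of the register


def pack_r15(address, flags, mode):
    """Combine an aligned address, a set of flag names and a 2-bit mode."""
    if address & ~PC_MASK:
        raise ValueError("address must be 4-byte aligned and below 64 MB")
    word = address | (mode & 0b11)
    for position, name in enumerate(FLAG_NAMES):
        if name in flags:
            word |= 1 << (31 - position)
    return word


def unpack_r15(word):
    """Split the 32-bit register back into (address, flags, mode)."""
    address = word & PC_MASK
    flags = {name for position, name in enumerate(FLAG_NAMES)
             if word & (1 << (31 - position))}
    return address, flags, word & 0b11
```

For instance, `pack_r15(0x8000, {"Z", "C"}, 3)` puts the address, two flags and the mode into one 32-bit word, and `unpack_r15` recovers all three parts from it.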

A characteristic feature of the RISC architecture is the use of register-memory commands only for loading and storing data. The ARM has a built-in fast bit shifter (barrel shifter) that allows the value of one of the registers in an instruction to be shifted by any number of bits at no cost in clock cycles. For example, multiplying the value of register R0 by 65 and placing the result in register R1 can be written as a single one-cycle addition, ADD R1, R0, R0 shl 6, and multiplying by 63 as the single instruction RSB R1, R0, R0 shl 6. The instruction system includes a reverse subtraction, which in particular gives a unary minus as a special case and speeds up the division procedure. The ARM has another unique feature: all its instructions are conditional. There are 16 conditions (flag combinations), one of which is attached to each instruction; the instruction is executed only if the current flags satisfy that condition. In processors of other architectures such conditional execution is, as a rule, available only for jumps. This feature of the ARM makes it possible to avoid slow jump operations in many cases. This is also helped by the fact that arithmetic operations can optionally leave the status flags unchanged. With the ARM, as with the 6809 processor, you can use both fast and regular interrupts. In addition, in the interrupt modes the higher-numbered registers are replaced with system ones, which makes interrupt handlers more compact and fast.
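
The multiplications by 65 and 63 mentioned above can be checked with a small model of the two instructions. This is a minimal sketch in Python rather than ARM assembly; the helper names are hypothetical and only the 32-bit wrap-around arithmetic is modelled, not flags or timing.

```python
MASK32 = 0xFFFFFFFF  # ARM registers are 32 bits wide


def add_shifted(rn, rm, shift):
    """Model of ADD Rd, Rn, Rm shl shift: Rd = Rn + (Rm << shift)."""
    return (rn + (rm << shift)) & MASK32


def rsb_shifted(rn, rm, shift):
    """Model of RSB Rd, Rn, Rm shl shift: Rd = (Rm << shift) - Rn."""
    return ((rm << shift) - rn) & MASK32


def times_65(r0):
    # ADD R1, R0, R0 shl 6  ->  R0 + 64*R0
    return add_shifted(r0, r0, 6)


def times_63(r0):
    # RSB R1, R0, R0 shl 6  ->  64*R0 - R0
    return rsb_shifted(r0, r0, 6)


def negate(r0):
    # Reverse subtraction from zero gives the unary minus: 0 - R0
    return rsb_shifted(r0, 0, 0)
```

Here `times_65(3)` gives 195 and `times_63(3)` gives 189, each corresponding to one instruction in the model.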

The ARM instruction system contains significantly fewer basic instructions than that of the x86 processors, but the ARM instructions themselves are very flexible and powerful. Several very convenient and powerful ARM instructions have no analogues in the 80386: for example, RSB (reverse subtraction), BIC (AND with inversion; such a command exists in the PDP-11), the 4-address MLA (multiplication with accumulation), and LDM and STM (loading or storing multiple registers, similar to the MOVEM command of the 68k processors). Almost all of the ARM instructions are 3-address, while almost all of the 80386 instructions have no more than 2 operands. The ARM command system is more orthogonal, meaning that all registers are interchangeable; registers R14 and R15 are partial exceptions. Most ARM commands may require 3-4 80386 commands to emulate them, while most 80386 commands can be emulated by only 2-3 ARM commands. Interestingly, the IBM PC XT emulator on an Acorn Archimedes with an 8 MHz processor runs even faster than a real PC XT. On the Commodore Amiga with the 68000 at 7 MHz, the emulator can only reach 10-15% of the speed of a real PC XT. It is also fascinating that the first NeXT computers with the 25 MHz 68030 showed the same integer performance as the 8 MHz ARM. Apple was going to make a successor to the Apple ][ in the Möbius project, but when it turned out that the prototype of this computer in emulation mode overtook not only the Apple ][ but also the 68k-based Macintosh, the project was closed!

Among the shortcomings of the ARM we can highlight the problem of loading an immediate constant into a register. Only 8 bits can be loaded at a time, although the constant can be inverted and shifted. Loading a full 32-bit constant can therefore take up to 4 instructions. You can of course load a constant from memory with one instruction, but then the problem arises of specifying the address of this value, since the offset can only be 12-bit. Another shortcoming of the ARM is its relatively low code density, which makes programs somewhat large and, most importantly, reduces the efficiency of the processor cache. However, this is probably the result of the low quality of the compilers for this platform. The multiplication instructions give only the lower 32 bits of the product. For a long time a significant drawback of the ARM was the lack of built-in support for memory management (an MMU); Apple, for example, demanded this support in the early 90's. Coprocessors for working with real numbers also came into use on the ARM architecture with a significant delay. The ARM did not have debugging features as advanced as those of the x86. There is also an oddity in the standard assembly language for the ARM: operations for the barrel shifter are conventionally written separated by a comma. Thus, instead of the simple form R1 shl 7 (shift the contents of register R1 by 7 bits to the left), you need to write R1, shl 7.
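
The restriction on immediate constants can be made concrete with a small checker. It is a sketch under one assumption that the text above does not spell out: in the classic ARM encoding the "shift" is a rotation of the 8-bit value to the right by an even number of bits. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def rotate_left(value, amount):
    """Rotate a 32-bit value left by amount bits."""
    amount %= 32
    return ((value << amount) | (value >> (32 - amount))) & MASK32


def is_arm_immediate(constant):
    """True if constant is an 8-bit value rotated right by an even amount."""
    constant &= MASK32
    # Rotating left by the same amount undoes the right rotation.
    return any(rotate_left(constant, rot) <= 0xFF for rot in range(0, 32, 2))


def single_instruction_load(constant):
    """Return 'MOV', 'MVN' or None for a constant needing more instructions."""
    if is_arm_immediate(constant):
        return "MOV"
    if is_arm_immediate(~constant):  # MVN loads the inverted constant
        return "MVN"
    return None
```

In this model 0xFF000000 and 0x3FC can be loaded directly, 0xFFFFFF00 needs the inverting form, and a value such as 0x12345678 cannot be loaded with a single instruction at all.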

The ARM3, with a built-in cache, became available in 1989. In 1990 the ARM development team separated from Acorn and, with the help of Apple and VLSI, created ARM Holdings. One of the reasons for the separation was the cost of ARM development, which Acorn-Olivetti management considered excessive. It is ironic that Acorn subsequently ceased its independent existence while ARM Holdings became a large company. However, the separation was also driven by Apple's desire to have ARM processors in its Newton computers without depending on another computer manufacturer.

The ARM showed integer performance exceeding that of the 80486 at the same frequency by approximately 10-20%! Intel was able to gain the advantage by using clock multiplication technology, and later firmly secured it with the Pentium. The StrongARM (developed by DEC) briefly regained the lead for the ARM in 1996, after which the technology was purchased by Intel, which has since been a large manufacturer of ARM-architecture processors. Thus, there are two centers of development of this architecture.

The further development of the ARM architecture is also very interesting, but that is another story. It is worth mentioning, though, that thanks to its share in ARM Holdings Apple was able to avoid bankruptcy in the 90's.

Many thanks to jms2 and BigEd, who helped to improve the style and content. Edited by Richard BN

Emotional stories about processors for first computers: part 8 (DEC VAX-11)

Processor for DEC VAX-11

The VAX-11 systems were quite popular in the 80's, especially in higher education. Now it is difficult to understand some of the concepts described in the books from those years, without knowing the features of the architecture of those systems.

The VAX-11 was more expensive than the PDP-11. However it was more oriented towards universal programming than the PDP-11. Additionally the VAX-11 was significantly cheaper than the IBM/370 systems.

The V-11 processor for the VAX architecture was produced by the mid-80s; before that time, processor assemblies were the only option.

The VAX-11 architecture is 32-bit and uses 16 registers, among which, as in the PDP-11, there is a program counter. It assumes the use of two stacks, one of which is used to store subroutine frames. In addition, one of the registers is assigned to work with the arguments of called functions. Thus, 3 of the 16 registers are allocated for stacks.

The instruction system of the VAX-11 cannot fail to amaze with its vastness and the presence of very rare and often unique commands. For example it has commands for working with bit fields, for working with several types of queues, for calculating the CRC, for multiplying decimal strings, etc. Many instructions have both three-address variants (like the ARM) and two-address variants (like the x86), but there are also four-address instructions, for example, the extended division – EDIV. Of course, there is support for working with floating point numbers.

However, the VAX-11 is a very slow system for its class and price. Even the super-simple 6502 at 4 MHz could outrun the slowest family member, the VAX-11/730. The fastest VAX-11 systems, huge cabinets and “whole furniture sets”, were at the same level of speed as the first PC ATs. When the 80286 appeared, it became clear that the days of the VAX-11 were numbered, and even the slowdown in the development of 80286-based systems could not change anything fundamentally. The straightforward people from Acorn, having made the ARM in 1985, said without hiding anything that the ARM was much cheaper and much faster. The VAX-11 nevertheless remained relevant until the early 90's, still having some advantages over the PC, in particular faster disk systems.

The VAX-11 is probably the last mass computer system in which the convenience of working in assembly language was considered more important than its performance. In a sense this approach has moved to modern popular scripting languages.

The VAX-11/785 (1984), the fastest computer in the VAX-11 series, with a processor speed comparable to that of the IBM PC AT or the ARM Evaluation System

Surprisingly, very little literature on the VAX-11 systems is openly available, as if there were some strange law of oblivion. Several episodes close to politics and connected with the history of the USSR are associated with this architecture. It is possible that the effective abandonment of the development of the PDP-11 architecture was caused by its low cost and the success of its cloning in the Soviet Union. Cloning the VAX-11 cost an order of magnitude more in resources and led to a dead end. Interest in the VAX-11 was stirred up by, for example, hoaxes like the famous Kremvax of April 1, 1984, in which the then USSR leader Konstantin Chernenko offered to drink vodka on the occasion of connecting to the Usenet network. Another joke was that some VAX-11 chips were imprinted with a message in broken Russian about how good the VAX-11 was.

Some models of the VAX-11 were cloned in the USSR by the end of the 80's, but these clones were produced in very small numbers and found almost no use.

Several VAX-11 systems are available for use over the network and this distinguishes them favorably from the IBM/370 systems with which they competed.

Edited by Ralph Kernbach and Richard BN

Emotional stories about processors for first computers: part 7 (NS 32016)

The first 32-bit CPU – National Semiconductor 32016

This is the first true 32-bit processor proposed for use in computers, back in 1982. It was originally planned as a VAX-11 on a chip, but because no agreement could be reached with DEC, National Semiconductor had to make a processor that resembled the VAX-11 architecture only in some details.

The use of paged virtual memory begins with this processor – it is the dominant technology today. Though the virtual memory support is not built into the processor, it is available through a separate coprocessor. A separate coprocessor is also required for working with real numbers.

The instruction system of the NS32016 is huge and similar to that of the VAX-11, in particular in having a separate stack for subprogram frames. The address bus is 24-bit, which allows up to 16 MB of memory to be used. A distinguishing feature of the 32016 is its slightly unusual set of status flags. In addition to the standard flags of carry (which can be used as a flag for a conditional jump), overflow, sign, and equality (or zero), there is also the L flag, which means 'less'; it is a carry flag for comparisons only. The situation around the carry flag is similar to that of the Motorola 68k processors. The overflow flag is for some reason called F. There are flags for single-step mode and privileged mode, and a unique flag that selects the current stack. Arithmetic instructions do not set the sign, zero, and less flags; these are set only by comparison commands.

You can use eight 32-bit general purpose registers. In addition there is also a program counter, two stack pointers, a stack pointer of the subroutine frames, a program database pointer (this is something unique), a module base pointer (also something very rare), a pointer to the interrupt vector table, a configuration register, and a processor status register. The performance of the NS32016 was comparable to the 68000, it was maybe only a bit faster.

As far as I know, the 32016 was used only as a second processor for the BBC Micro personal computers. It could be ordered with a frequency of 6, 8 or 10 MHz. This second processor was a very expensive and prestigious device for 1984. Very little software was available for it, all of it made by Acorn itself. It included the Unix-like Panos operating system and BASIC, Acorn's constant companion. The MMU chip was not used with the BBC Micro: there were no programs that used it, although it could be plugged in. Connecting an arithmetic coprocessor was not even provided for.

It is known that this very complex processor had serious hardware errors that were being fixed for years.

Edited by Richard BN

Emotional stories about processors for first computers: part 6 (TI TMS9900)

Texas Instruments TMS9900

I have never written code for this very special processor, even though it is the first 16-bit processor available for use in personal computers. It was produced from 1976. It used the much rarer big-endian byte order. Among the systems in this review, this order was used only in Motorola's 6800 and 68000 series and in the architecture of the gigantic IBM mainframes. All the other processors in this review used little-endian byte order.

The TMS9900 has only three 16-bit registers: the program counter, the status register and the base register for pseudo-registers. The processor uses a dedicated 32-byte memory area (a context) as 16 two-byte registers. This use of memory is somewhat like the zero page in the 6502 architecture. By changing the base register, the TMS9900 can switch contexts very quickly. This is similar to the Z80, which has two register contexts. The processor status flags are rather original: along with the typical flags of carry, zero (equality), overflow and parity, there are two more unique flags, logical less and arithmetic less. Working with the stack and subroutines resembles the RISC processors of the then future. There is simply no ready-to-use stack, but you can create one using one of the pseudo-registers. When a subroutine is called, new values are loaded into the program counter and the base register, and all three registers are stored in pseudo-registers of the new context. Thus, calling a subroutine is more like calling a software interrupt. The TMS9900 has a built-in interrupt controller designed to work with up to 16 hardware interrupts.
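
The idea of registers living in memory and of a subroutine call as a context switch can be sketched with a toy model. It is not an emulator: the class and method names are hypothetical, and two details are my assumptions rather than statements from the text above, namely that the call takes its new base and counter values from a two-word vector, and that the old base register, program counter and status register are saved in pseudo-registers 13, 14 and 15 of the new context.

```python
class TMS9900Sketch:
    """Toy model of the TMS9900 register file kept in ordinary memory."""

    def __init__(self):
        self.memory = bytearray(0x10000)  # 64 KB address space
        self.pc = 0  # program counter
        self.wp = 0  # base register of the current context
        self.st = 0  # status register

    def read_word(self, address):
        address &= 0xFFFE  # words are aligned, big-endian
        return (self.memory[address] << 8) | self.memory[address + 1]

    def write_word(self, address, value):
        address &= 0xFFFE
        self.memory[address] = (value >> 8) & 0xFF
        self.memory[address + 1] = value & 0xFF

    def get_reg(self, n):
        # Pseudo-register n is simply the word at base + 2*n
        return self.read_word(self.wp + 2 * n)

    def set_reg(self, n, value):
        self.write_word(self.wp + 2 * n, value)

    def call_with_new_context(self, vector):
        """Load new base and counter from a vector, save the old state."""
        old_state = (self.wp, self.pc, self.st)
        self.wp = self.read_word(vector)
        self.pc = self.read_word(vector + 2)
        for n, value in zip((13, 14, 15), old_state):
            self.set_reg(n, value)  # assumed save slots: R13, R14, R15

    def return_to_old_context(self):
        """Restore the state saved by call_with_new_context."""
        self.st = self.get_reg(15)
        self.pc = self.get_reg(14)
        self.wp = self.get_reg(13)
```

Switching the context changes only the three real registers; the sixteen pseudo-registers of the caller stay untouched in memory and become visible again after the return.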

The first 16-bit home computer – it has even color sprites!

The instruction set looks very impressive: there are even multiplication and division. The unique X instruction executes a single instruction located at any memory address and then moves on to the next one. The execution of instructions is rather slow: the fastest instructions require 8 cycles, and arithmetic instructions 14 cycles. However, multiplication (16*16=32) in 52 cycles and division (32/16=16,16) in only 124 cycles were probably the fastest among processors of the 70's.

The TMS9900 requires three supply voltages of -5, 5 and 12 volts and four phases of the clock signal; these are the worst specifications among the processors known to me. In 1979 this processor was demonstrated to IBM specialists, who were then looking for a processor for the IBM PC prototype. The obvious drawbacks of the TMS9900 (addressability of only 64 KB of memory, lack of the necessary controllers, relative slowness) made the corresponding impression, and the Intel 8088 was chosen for the future leader among PCs. To deal with the lack of controllers, Texas Instruments also produced a variant of the TMS9900 with an 8-bit bus, the TMS-9980, which worked 33% slower.

The TMS9900 was used in the TI-99/4 and TI-99/4A computers, which were fairly popular in the USA. By 1983 they had been "crushed" in the price war by the Commodore VIC-20. Curiously, as a result of this war Texas Instruments was forced to cut the price of its computer to $49, incredible for 1983 (in 1979 the price was $1150!), and sold it at a big loss. For comparison, the price of the relatively unpopular Commodore +4, which went out of production in 1986, fell to $49 only in 1989. Production of the TI-99/4A stopped in 1984, just when, because of the ultra-low prices, it began to gain popularity. This computer can only conditionally be called 16-bit: only 256 bytes (very little) of its RAM and all of its ROM are addressable through a 16-bit bus. The rest of the memory and the I/O devices work over a slow 8-bit bus. It is therefore more correct to consider the BK0010 the first 16-bit home computer. It is an interesting coincidence that the TI-99/4 and TI-99/4A run their processor at 3 MHz, exactly the same frequency as the BK0010.

The TI-99/4 and TI-99/4A used a rather successful video controller, the TMS9918 chip, which became the basis for the worldwide and very popular MSX standard, as well as for some other computers and game consoles. The Japanese company Yamaha significantly improved this video chip, and the improved version was subsequently used, in particular, to upgrade the TI-99/4 and TI-99/4A themselves!

The TI-99/4 series is a rare example of computers whose processor and the computer itself were made by the same manufacturer.

Edited by Richard BN

Emotional stories about processors for first computers: part 5 (Motorola 6800 family)

Motorola 6800 and close relatives

Motorola's processors have always been distinguished by several very attractive features, combined with a certain absurd abstractness and poor practicality of their architectural solutions. The main attractive feature of all the processors under consideration is the second full-fledged and very fast accumulator register.

The 6800 was the first microprocessor to require a single 5-volt power supply, a very useful feature. However, because its only index register is a cumbersome 16-bit one in an 8-bit architecture, the 6800 turned out to be inconvenient to program and use. It was released in 1974, not much later than the 8080, but it did not become the basis for any well-known computer system. Interestingly, the 6502 developers, Chuck Peddle and Bill Mensch, called the 6800 not right and too big. However, it and its variants were widely used as microcontrollers. It is perhaps worth noting here that Intel had been producing processors since 1971, which put Motorola, for which the 6800 was the very first processor, in the position of catching up. Compared not with the 8080 but with its predecessor, the 8008, the 6800 would be much preferable. Motorola almost caught up with Intel with the 68000/20/30/40. I can also note that in the 70s Motorola was a significantly larger company than Intel.

Numerous variants of the 6800 were also produced: 6801, 6802, 6803, 6805, ... Most of them are microcontrollers with built-in memory and I/O ports. The 6803 is a simplified 6801; it was used in the Tandy TRS-80 MC-10 computer and in its French clone, the Matra Alice. These computers came very late (1983) for their class and were comparable to the Commodore VIC-20 (1980) or Sinclair ZX81 (1981). The command system of the 6801/6803 was significantly improved with 16-bit commands, multiplication, and several others. An unusual unconditional branch instruction (BRN, branch never) appeared, which actually never takes its branch! Some instructions became a little faster.

The 680x range fully supports signed integers; the Z80 and 6502 support them worse, and the 8080 and 8085 have almost no such support at all. However, in 8-bit software such support was needed very rarely.

The 6809 was released in 1978, when the 16-bit era was beginning with the 8086. It has a highly developed command system, including the multiplication of the two accumulators with a 16-bit result in 11 clock cycles (for comparison, the 8086 requires 70 clock cycles for such an operation). The two accumulators can in several cases be combined into one 16-bit accumulator, which gives fast 16-bit instructions. The 6809 has two index registers and a record number of addressing modes among 8-bit processors: 12. Some of the addressing modes are unique for 8-bit chips, such as indexing with auto-increment or auto-decrement, addressing relative to the program counter, and indexing with an offset. The 6809 has an interesting ability to use two types of interrupts: fast interrupts with automatic saving of only some registers, and interrupts that save all registers. The 6809 has three interrupt inputs: FIRQ (fast maskable), IRQ (maskable) and NMI (non-maskable). It is also sometimes convenient to use the fast instructions for reading and setting all flags at once.

However, memory operations require one clock cycle more than on the 6502. The index registers remained bulky 16-bit "dinosaurs" in the 8-bit world. Some operations are simply shocking in their slowness: for example, transferring one byte from one accumulator to the other takes 6 clock cycles, and exchanging their contents takes 8 clock cycles (compare with the 8080, where a 16-bit exchange takes 4 clock cycles)! For some reason two stack pointers are offered at once; perhaps this was the influence of the dead-end VAX-11 architecture, and in an 8-bit architecture with 64 KB of memory it looks very awkward. And even the existence of an instruction with the interesting name SEX cannot eliminate all the problems of the 6809. In general, the 6809 is still somewhat faster than the 6502 at the same frequency, but it requires the same memory speed. I managed to make a division procedure for the 6809 with a 32-bit dividend and a 16-bit divisor (32/16 = 32,16) in just over 520 cycles; for the 6502 I could not get below 650 clock cycles. The second accumulator is a big advantage, but other features of the 6502, in particular the inverted carry flag, reduce this advantage to only about 25% (650 against 520 cycles). Multiplication by a 16-bit constant, however, turned out to be slower than table multiplication on the 6502 with a table of 768 bytes. The 6809 allows you to write quite compact and fast code using the direct page addressing mode, but this mode makes the code a bit tangled. The essence of this addressing is to set the high byte of the data address in a special register and specify only the low byte of the address in the commands. The same scheme, but with a fixed high byte value, is used in the 6502, where it is called zero page addressing. Direct page addressing is an exact analogue of the use of the DS segment register in the x86, only with segments of just 256 bytes instead of 64 KB.
Another artificiality of the 6800 architecture is its big-endian byte order (most significant byte first), which slows down 16-bit addition and subtraction. The 6809 is not compatible with the 6800 at the instruction code level. The 6809 became the last 8-bit processor from Motorola. In further developments, it was decided to use the 68008 instead.
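
The direct page addressing described above comes down to one line of arithmetic, shown here as a minimal sketch. The function names are hypothetical; the only thing modelled is how the 16-bit effective address is formed.

```python
def direct_page_address(dp, low_byte):
    """6809 direct page mode: the DP register supplies the high byte."""
    return ((dp & 0xFF) << 8) | (low_byte & 0xFF)


def zero_page_address(low_byte):
    """6502 zero page mode: the same scheme with the high byte fixed at 0."""
    return direct_page_address(0x00, low_byte)
```

With the page register set to 0x20, an instruction carrying the single address byte 0x34 accesses address 0x2034, while the 6502 equivalent can only reach 0x0034.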

We can assume that Motorola spent a lot of resources on promoting the 6809. The effect is still felt whenever this processor is mentioned. There are many favorable reviews of the 6809, notable for a certain vagueness, generalization and inconsistency. The 6809 was positioned as an 8-bit super-processor for micro-mainframes. Several Unix-like operating systems were made for it: OS-9 and UniFlex. It was chosen as the main processor for the Apple Macintosh and, as follows from the films about Steve Jobs, only his emotional intervention brought about the switch to the more promising 68000. Indeed, the 6809 is a good processor, but overall only slightly better than its competitors that appeared much earlier: the 6502 (three years earlier) and the Z80 (two). One can only guess what would have happened if Motorola had spent at least half of the effort that went into developing and promoting the 6809 on developing the 6502 instead.

The 6809 was used in several fairly well-known computer systems. The most famous among them are the American Tandy Color Computer, or Tandy CoCo, and its British, or more precisely Welsh, clone, the Dragon-32/64. The computer markets of the 1980's were notably non-transparent, and the Tandy CoCo was distributed mainly in the US. The Dragons, at first popular only in Britain, also gained some popularity in Spain. In France, the 6809 for some reason became the basis for the mass-market computers of the 80s, the Thomson series, which remained virtually unknown anywhere else. The 6809 was also used as a second processor in at least two systems: in the Commodore SuperPET 9000 series and in an extremely rare TUBE-interface device for the BBC Micro. This processor was used in other systems less well known to me, in particular Japanese ones. It also gained some popularity in the world of game consoles. One of these consoles is worth mentioning: the Vectrex, which uses a unique technology, a vector display.

Tandy CoCo 3

All the 680x have interesting undocumented instructions with the fascinating name Halt and Catch Fire (HCF), which are used for testing at the electronics level, for example with an oscilloscope. Executing one of them hangs the processor, and the only way out is a reset. These processors also have other undocumented instructions. In the 6800 there are, for example, instructions that are the opposite of the immediate register load commands, i.e. instructions for storing a register value into the immediate constant!

Like the 8080, 8085 or Z80, the 6809 can hardly be called a pure 8-bit processor. It is even more difficult to call the 6309 8-bit. The 6309 was produced by the Japanese company Hitachi as a processor fully compatible with the 6809. I was not able to find the exact year when its production began, but there is some evidence pointing to 1982. This processor could be switched to a new mode, which, while maintaining almost full compatibility with the 6809, provided many more capabilities. These capabilities were hidden in the official documentation but were published in 1988 on Usenet. Two additional accumulators were added, but the instructions using them were much slower than those using the first two. The execution time of most instructions was greatly shortened. A number of commands were added, among which was a division that was really fantastic for processors of this class: a signed division of a 32-bit dividend by a 16-bit divisor (32/16 = 16,16) in 34 cycles, with the divisor taken from memory. Furthermore, a 16-bit multiplication with a 32-bit result in 28 cycles appeared. Very useful instructions were also added for quickly copying blocks of memory with a run time of 6 + 3n cycles, where n is the number of bytes to be copied; copying was possible with both decreasing and increasing addresses. The same instructions could also be used to quickly fill memory with a specified byte. Interrupts could occur while they were executing. New bit operations, a zero register, etc., appeared too. Interrupts were now also raised on the execution of an unknown instruction and on division by 0. In a sense, the 6309 was the pinnacle of technological achievement among 8-bit processors, or more precisely among processors with an addressable memory size of 64 KB.

The 6309 is electrically fully compatible with the 6809, making it a popular upgrade for the Tandy Color Computer or the Dragon. There are also special OS versions that use the new features of the 6309.

Edited by Jim Tickner and Ralph Kernbach.

Emotional stories about processors for first computers: part 4 (Zilog Z80)

Zilog Z80

Along with the 6502, this processor became the main processor of the first personal computers. There are no dramatic events in the history of its appearance and use. There is only some intrigue in Zilog's failure to make the next generation of processors. The Z80 was first produced in 1976 and its variants are still in production. Once even Bill Gates himself announced support for systems based on the Z80.

A number of coincidences are interesting. As in the case of the 6502, the main developer of the Z80, Federico Faggin, had left a large company, Intel. After working on the Z80, Faggin had almost no involvement in the next-generation processor, the Z8000. He left Zilog (which he had founded) in the early 80's and never dealt with processors again. He then created several relatively successful startups, which worked on communication systems, touchpads and digital cameras. It can be mentioned that, in addition to the Z80, while at Zilog he also developed the successful and still-produced Z8 microcontroller.

The Z80 is more convenient for inclusion in computer systems than the 8080. It requires only one power supply voltage and has built-in support for refreshing dynamic memory. In addition, although it is fully compatible with the 8080, it has a lot of new commands, a second set of basic registers and several completely new registers. It is interesting that Zilog refused to use the 8080 assembler mnemonics and began to use its own mnemonics, better suited to the extended command system of the Z80. A similar story happened with the Intel x86 assembler in the GNU software world: for some reason the GNU tools also use their own conventions for writing assembler programs by default. The Z80 added support for the overflow flag; Intel officially added support for this flag only in the 8086. However, in the Z80 this flag is combined with the parity flag, so you cannot use both flags at the same time as you can in the 8086. In the Z80, as in the 6502, a conditional instruction can check the value of only one flag, i.e. there are no checks of two or three flags at once, which are necessary for the comparisons "greater" and "less or equal", as well as for all signed comparisons. In such cases several checks are needed, while with the 8086, 6800 or PDP-11 one is enough.
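To show why one flag is not enough, here is a small Python sketch that models the flags produced by an 8-bit compare (a subtraction whose result is discarded). The function names are hypothetical and the model is a simplification that covers only the four flags discussed here; it is not a full emulation of the Z80 flag register.

```python
def compare_flags(a, b):
    """Flags after the 8-bit subtraction a - b, as a compare would set them."""
    result = (a - b) & 0xFF
    return {
        "Z": result == 0,                          # zero
        "C": a < b,                                # carry (borrow)
        "S": bool(result & 0x80),                  # sign
        "V": bool((a ^ b) & (a ^ result) & 0x80),  # signed overflow
    }


def unsigned_greater(a, b):
    flags = compare_flags(a, b)
    return not flags["C"] and not flags["Z"]  # two flags are needed


def signed_less(a, b):
    flags = compare_flags(a, b)
    return flags["S"] != flags["V"]           # again two flags


def to_signed(value):
    """Interpret an 8-bit value as a two's complement number."""
    return value - 256 if value & 0x80 else value


print(unsigned_greater(200, 100), signed_less(200, 100))
```

On a processor that can branch on only one flag at a time, each of the two-flag conditions above turns into two conditional jumps, while a processor with combined conditions needs a single branch.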

Among the Z80's new commands, the block memory copy commands at 21 cycles per byte are especially impressive, as is an interesting instruction for searching for a byte in memory. However, the EXX instruction is the most interesting: it swaps the contents of 48 bits of register memory (registers BC, DE and HL) with their counterparts in just 4 cycles! Even the 32-bit ARM will need at least 6 cycles for the same operation. The remaining additional instructions are not so impressive, although they can sometimes be useful. The following commands were also added:

  • 16-bit subtraction with borrow and 16-bit addition with carry in 15 clocks;
  • unary minus for the accumulator in 8 clocks;
  • the ability to load the registers BC, DE, SP, IX and IY directly from memory and to store them to it, not just HL;
  • shifts, rotates and input-output for all 8-bit registers;
  • instructions to check, set and reset a bit by its number;
  • jumps with offsets (JR);
  • a loop instruction.

Most of the new commands are rather slow, but using them well can still make the code somewhat faster and significantly more compact. This particularly applies to the new 16-bit registers IX and IY, which provide new addressing modes. Interestingly, the index registers IX and IY appeared in the Z80 in order to attract 6800 users to the Z80! But I dare say that operations with the Z80's index registers are rather inefficient, because commands using these registers carry an almost useless byte offset.

Many of the 8080's commands became one clock faster in the Z80, which is a very noticeable acceleration. But the basic command for 16-bit arithmetic, the ADD instruction, became one clock slower, so arithmetic as a whole became only slightly faster, if at all.

The interrupt system became much more interesting than that of the 8080. With the Z80 you can use both non-maskable interrupts and three modes (one of them compatible with the 8080) for handling maskable ones. Maskable interrupt mode 2 is the most interesting, as it gives you the flexibility to change the address of the interrupt handling code.
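The flexibility of mode 2 comes from an extra level of indirection: the processor combines the I register (high byte) with a byte supplied by the interrupting device (low byte) into a pointer, and reads the 16-bit handler address from memory at that pointer. A minimal Python sketch of this lookup follows; the function name and the sample addresses are hypothetical and serve only as an illustration.

```python
def im2_handler_address(memory, i_register, bus_byte):
    """Return the handler address used in maskable interrupt mode 2.

    memory     -- 64 KB of memory as a bytearray
    i_register -- value of the I register (high byte of the table pointer)
    bus_byte   -- byte placed on the data bus by the interrupting device
    """
    pointer = ((i_register & 0xFF) << 8) | (bus_byte & 0xFF)
    low = memory[pointer]
    high = memory[(pointer + 1) & 0xFFFF]
    return (high << 8) | low  # little-endian 16-bit address


memory = bytearray(65536)
memory[0x8010] = 0x34  # low byte of the handler address
memory[0x8011] = 0x12  # high byte of the handler address
print(hex(im2_handler_address(memory, 0x80, 0x10)))  # 0x1234
```

To redirect the interrupt it is enough to rewrite two bytes in the table or to change the I register; no code at a fixed address has to be patched.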

The Z80 has quite a few undocumented instructions. Many of them disappeared during the transition to CMOS technology, but those that survived have become virtually standard and have been documented by some firms. Especially useful are the instructions that allow you to work with the individual bytes of the clumsy 16-bit registers IX and IY. In addition to undocumented instructions, the Z80 also has other undocumented properties, for example two extra flags in the status register.

Of course, the Z80 has even more right than the 8080 to be called slightly 16-bit. The hypothetical bit index of the Z80 is clearly somewhat higher than that of the 8080, but paradoxically the ALU of the Z80 is actually 4-bit! At the electronic level the Z80 and 8080 are completely different chips.

Much has been written comparing the performance of the Z80 and 6502, as these processors were very widely used in the first mass computers. The topic has several difficult points, and without understanding them it is very hard to remain objective. Because it has a rather large number of registers, the Z80 is naturally run at a clock frequency higher than that of the memory. Therefore the Z80 at 4 MHz can use the same memory as the 6502 or 6809 at 1.3 MHz. According to many experienced programmers who wrote code for both processors, at the same frequency the 6502 is on average about 2.4 to 2.6 times faster than the Z80. The author of this material agrees with this. I just need to add that writing fast code for the Z80 is very difficult: you need to repeatedly optimize the use of the registers and to access memory through the stack as much as possible. If you really try then, in my opinion, you can reduce the difference between the Z80 and 6502 to about 2.2 times. If you do not try and ignore timings, you can easily push the difference up to 4 times. In some individual cases the Z80 can show very good timings. On the task of filling memory using the PUSH instruction the Z80 can be even slightly faster than the 6502, but at the cost of disabling interrupts. On copying memory blocks the Z80 is only 1.5 times slower. It is especially impressive that in the division of a 32-bit dividend by a 16-bit divisor the Z80 is slower by only 1.7 times. By the way, this notable division was implemented by a programmer from Russia. Thus the ZX Spectrum with the Z80 at 3.5 MHz is about 1.5 times faster than the C64 with the 6502 at 1 MHz. It should also be noted that in most systems with the Z80 or 6502 some clock cycles are taken from the processor by the circuits that generate the video signal. Because of this, for example, the popular Amstrad CPC/PCW computers have an effective processor frequency of 3.2 MHz, not the full 4. On 6502 systems you can usually turn off the screen for maximum processor performance. If we take the frequency of the memory rather than of the processor as the basis, it turns out that the Z80 is 25-40% faster than the 6502. This last result can be illustrated by the fact that with 2 MHz memory the Z80 can operate at up to 6 MHz, and the 6502 only at up to 2 MHz.
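The arithmetic behind these estimates is simple enough to write down. The Python sketch below only recombines the figures quoted in the text (the clock frequencies and the per-clock advantage of the 6502); the function name is hypothetical and the results are rough estimates, not measurements.

```python
def z80_speed_vs_6502(z80_mhz, m6502_mhz, per_clock_ratio):
    """How many times faster the Z80 system is than the 6502 system.

    per_clock_ratio -- how many times faster the 6502 is at the same clock
    """
    return z80_mhz / (m6502_mhz * per_clock_ratio)


# ZX Spectrum (3.5 MHz) against the C64 (1 MHz), typical code
spectrum_vs_c64 = z80_speed_vs_6502(3.5, 1.0, 2.4)

# The same 2 MHz memory: Z80 at 6 MHz against the 6502 at 2 MHz
same_memory_typical = z80_speed_vs_6502(6.0, 2.0, 2.4)
same_memory_tuned = z80_speed_vs_6502(6.0, 2.0, 2.2)

print(round(spectrum_vs_c64, 2))      # 1.46
print(round(same_memory_typical, 2))  # 1.25
print(round(same_memory_tuned, 2))    # 1.36
```

With a per-clock ratio between 2.4 and 2.2 the Z80 comes out roughly 25% to 36% ahead on equal memory, which is where the upper estimate of about 40% for carefully optimized code comes from.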

The Z80 was used in a very large number of computer systems. In the USA the Tandy TRS-80 was very popular; in Europe it was the ZX Spectrum, and later the Amstrad CPC and PCW. Interestingly, the Amstrad PCW computers kept their importance until the mid-90's and remained in active, widespread use for their intended purpose until the late 90's. Japan and other countries produced the MSX computers, which were quite successful around the world. The rather popular C128 could also use the Z80, but in this case the users were left in a rather embarrassing situation: in this late 8-bit computer, released in 1985, the Z80, officially clocked at 2 MHz, really worked at only 1.6 MHz. It was slower even than the first 8080-based systems of the mid-70's. At least three dozen fairly well-known computers were able to run the CP/M operating system.

Such a PC looked decent even in the mid-90's, but its z80 was slower than that in the ZX Spectrum

The fastest Z80-based computer system known to me is the BBC Micro with a second processor connected via the Tube, a 6 MHz Z80B, which was produced from 1984. The processor in this system runs at full speed, one might say "without brakes". Similar devices had been produced for the Apple ][ since 1979. Some such Z80 cards later used the Z80H at 8 MHz and even higher. Interestingly, in 1980 the sale of such devices was Microsoft's greatest source of revenue. We can also mention the Amstrad PcW16, produced since 1994, which uses a CMOS Z80 at a frequency of 16 MHz.

In Japan the R800 processor, compatible with the Z80, was made for the MSX TurboR systems (1990). The R800 added hardware 16-bit multiplication with a 32-bit result, although when multiplying by a 16-bit constant, table multiplication using a 768-byte table is one clock faster. There are opinions that the R800 is just a simplified Z800 running at four times the bus frequency, which is about 7.16 MHz. So the R800's internal clock is about 28.64 MHz!

Zilog worked on improving the Z80 very inconsistently and extremely slowly. The first Z80 worked at frequencies up to 2.5 MHz; the Z80A, which soon appeared, had a maximum frequency of 4 MHz. The latter became the basis for most popular computers using the Z80. The Z80B appeared by 1980 but was used relatively rarely, for example in the aforementioned second processor card for the BBC Micro or in the late (1989) Sam Coupé computer. The Z80H appeared by the mid-80's and could operate at frequencies up to 8 MHz, but it was not used in well-known computers. Interestingly, Zilog products had special traps on all chips for those who tried to copy them. For example, the base Z80 had 9 traps, and according to those who tried copying it, they slowed the process down by almost a year.

A deeper upgrade of the Z80 was hampered by Zilog's desire to create processors competitive with Intel's 16-bit processors. In 1978, a little later than the 8086, the Z8000 was released; it was not compatible with the Z80. This processor was unable to withstand competition from Intel and especially Motorola: the 68000 surpassed the Z8000 in almost all parameters, although the Z8000 was used in about a dozen different low-cost systems, usually running Unix variants. Interestingly, IBM did not even consider the Z8000 as a possible processor for the IBM PC, since Zilog was funded by Exxon, which was going to compete with IBM. Perhaps due to the Z8000's lack of success, Zilog became an Exxon subsidiary by 1980. There was also an attempt to create a competitive 32-bit processor. In 1986 the Z80000 appeared, compatible with the Z8000, but it was never used anywhere. Some circumstances, in particular very strange complaints from Zilog's team about excessive financing, suggest that Zilog (as part of Exxon) may for some obscure reason have sabotaged its own work.

One can only wonder why Zilog abandoned the approach that had shown super-successful results with the Z80, namely making processors that are software-compatible with Intel processors but better than them, and at the same time completely different at the hardware level. Subsequently this approach was successfully used by many firms, in particular AMD, Cyrix and VIA.

The creation of a new processor based on the Z80 was postponed until 1985, when the Z800 was produced. However, Zilog's main efforts were then directed at the Z80000, and the Z800 was released in very small numbers. In 1986, after the failure of the Z80000, the Z280 was released, an insignificantly improved version of the Z800 (maybe it was just a rebranding). The Z800/Z280 in particular could run at an internal frequency several times higher than the bus frequency. This new idea later brought great success to the Intel 486DX2 and 486DX4 processors. But, perhaps because of poor performance (despite its many technological innovations the Z280 could use only relatively low clock frequencies), this processor too was never used anywhere. It is considered that the Z280 roughly matched the capabilities of the Intel 80286 but was significantly slower, by at least 50%, at the same clock speed. Perhaps if the Z280 had appeared 5 years earlier it could have been very successful.

The greatest success came from cooperation with the Japanese company Hitachi, which in 1985 released its super-Z80, the HD64180, similar in capabilities to the Intel 80186. The HD64180 allowed the use of 512 KB of memory and added a dozen new instructions, but at the same time some almost standard undocumented Z80 instructions were not supported. This processor was used in some computer systems. Zilog received a license for the HD64180 and began to produce it under the marking Z64180. Zilog managed to improve this processor slightly, in particular by adding support for 1 MB of memory, and released the result by the end of 1986. This new processor was called the Z180 and became the basis for a family of processors and controllers with clock frequencies up to 33 MHz. It was used in some rare models of the MSX2 computers, but more as a controller. It is a curious coincidence that the Z280 and Z180 appeared in the same year, as had their approximate counterparts the 80286 and 80186 four years before. In 1994 the 32-bit Z380 was made on the basis of the Z180; it retained compatibility with the Z80 and roughly corresponds in capabilities to the Intel 80386 or Motorola 68020. In effect, Zilog lagged behind its competitors by almost 10 years. In the 21st century, again on the basis of the Z180, the successful eZ80 controller-processors have been manufactured, with timings almost like the 6502's. They are used in various equipment, in particular in network cards, DVD drives, calculators, etc.

Edited by Richard BN

Emotional stories about first processors for computers: part 3 (Motorola 68k)

Motorola: from 68000 to 68040

For some time Motorola was the only company that could successfully compete with Intel in the production of processors for personal computers.

The 68000 was released in 1979 and at first glance looked much more impressive than the 8086. It had 16 32-bit registers (more accurately, even 17), a separate program counter and a status register. It could address 16 MB of memory directly, which imposed no restrictions on, for example, large arrays. However, careful analysis of the 68000's features shows that not everything was as good as it seemed. In those years more than 1 MB of memory was an unattainable luxury even for medium-sized organizations. The 68000's code density was worse than the 8086's, which means that 68000 code with the same functionality occupied more space. This is partly because the length of any 68k instruction must be a multiple of 2 bytes, while for the x86 it is a multiple of 1 byte. But the information about code density is controversial, as there is evidence that in some cases the 68000 could have the better code density. Of the 16 registers of the 68k, 8 are address registers, which in some respects are slightly more advanced analogues of the x86 segment registers. The ALU and data bus are 16-bit, so operations with 32-bit data are slower than one might expect. The execution time of register-register operations is 4 cycles for the 68000 and only 2 for the 8086.

As always with products from Motorola, the architecture of the 68000 shows some clumsiness and contrived oddities. For example, there are two stacks and two carry flags (one for condition checks and another for operations). The oddities with the flags do not end there. For some reason many instructions, including even MOVE, zero the carry and overflow flags. Another oddity is that the command for saving the state of the arithmetic flags, which worked normally on the 68000, was made privileged in all processors starting with the 68010. Some operations are irritatingly unoptimized: for example, the CLR instruction, which writes zero to memory, is slower than writing a constant 0 to memory with the MOVE instruction, and a shift to the left is slower than adding an operand to itself. There are some almost unnecessary commands; for example, there are both arithmetic and logical shifts to the left. Even the address registers, while seemingly superior to the 8086 segment registers, have a number of annoying disadvantages. For example, loading one takes as many as 4 bytes instead of 2 for the 8086, and of these four one is superfluous. The 68000 command system reveals many similarities with the PDP-11 command system developed back in the 60's.

Code for Motorola processors looks somewhat more cumbersome and clumsy compared to the x86 or ARM. On the other hand, the 68000 is faster than the 8086, by about 20-30% according to my estimates. The 680x0's code, however, has its own special beauty and elegance, and is less mechanical than the x86's. Additionally, as shown by experts at eab.abime.net, the code density of the 68k is often better than that of the x86.

Overall the 68000 is a good processor with a large instruction set. It was used in many now legendary personal computers: the first Apple Macintosh computers, which were produced until the early 90's; the first Commodore Amiga multimedia computers; and the relatively inexpensive and high-quality Atari ST computers. The 68000 was also used in relatively inexpensive computers running Unix variants, in particular in the rather popular Tandy 16B. Interestingly, IBM developed the PC and the 68000-based System 9000 computer simultaneously; the latter was released less than a year after the PC.

The 68010 appeared clearly belatedly, only in 1982, at the same time as Intel released the 80286, which put personal computers on the same level as minicomputers. The 68010 is pin-compatible with the 68000, but its instruction set is slightly different, so replacing the 68000 with the 68010 did not become popular. This incompatibility was caused by a contrived reason: to bring the 68000 into closer correspondence with the ideal theory of virtualization. The 68010 is only slightly faster than the 68000, by no more than 10%. Obviously the 68010 was badly losing to the 80286 and was even weaker than the 80186, which appeared in the same year. Like the 80186, the 68010 found almost no use in personal computers.

The 68008 was also released in 1982, probably in the hope of repeating the success of the 8088. It is a 68000 with an 8-bit data bus, which allowed it to be used in cheaper systems. But the 68008, like the 68000, does not have an instruction queue, which makes it about 50% slower than the 68000. Thus the 68008 may even be a little slower than the 8088, which thanks to its instruction queue is only about 20% slower than the 8086.

Based on it, Sir Clive Sinclair made the Sinclair QL, a very interesting computer that, because of its lower price, could compete with the Atari ST and similar computers. But in parallel, and clearly prematurely, Clive began to invest heavily in the development of electric vehicles, leaving the QL (Quantum Leap) as a rather secondary task. Together with some unsuccessful design decisions, this led the computer and the whole company to a premature end. The company became part of Amstrad, which declined to produce the QL.

It would be interesting to calculate the bit index for the 68000, which seems to me clearly higher than 16 although maybe it is not higher than 24.

Appearing in 1984, the 68020 returned Motorola to first position. Many very interesting and promising innovations were realized in this processor. The strongest effect certainly comes from the instruction pipeline, which sometimes allows up to three instructions to be executed at once! The 32-bit address bus looked a little premature in those years, and therefore a cheaper version of the processor (the 68EC020) with a 24-bit address bus was available, but the 32-bit data bus looked quite appropriate and significantly sped up the processor. The built-in cache was an innovation; even though its capacity was only 256 bytes, it significantly improved performance, because the main dynamic memory could not keep up with the processor. Reasonably fast division (64/32 = 32,32) and multiplication (32*32 = 64) were added, taking approximately 80 and up to 45 cycles respectively. Instruction timings were generally improved; for example, the division (32/16 = 16,16) now took approximately 45 cycles (more than 140 cycles on the 68000). Some instructions, in the most favorable cases, can be performed without taking any clocks at all! New addressing modes were added, in particular with scaling; in the x86 this mode appeared only the next year, with the 80386. Other new addressing modes allow double indirect addressing with several offsets; the PDP-11 has been remarkably outdone here.

Some new instructions, for example the bulky bit-field operations or the new operations with decimal numbers, which had become of little use given fast division and multiplication, looked more like a fifth wheel than something essentially useful. Addressing modes with double indirect addressing look interesting in theory but in practice are needed quite rarely and are executed very slowly. Unlike the 80286, the 68020 takes time to compute the address of the operand, the so-called effective address. Division on the 68020 is still almost twice as slow as the fantastic division of the 80286. Multiplication and some other operations are also slower. The 68020 does not have a built-in memory management unit, and the rather exotic ability to connect up to eight coprocessors could not fix this.

The 68020 was widely used in mass-market computers such as the Apple Macintosh II, Macintosh LC and Commodore Amiga 1200; it was also used in several Unix systems.

The appearance of the 80386, with its built-in and very well-made MMU and its 32-bit buses and registers, again put Motorola in second position. The 68030, which appeared in 1987, briefly returned the leadership to Motorola for the last time. The 68030 has a built-in memory management unit and a doubled cache, divided into separate caches for instructions and data, which was a very promising innovation. In addition, the 68030 could use a faster memory access interface, which can speed up memory operations by almost a third. Despite all the innovations, the 68030 turned out to be somewhat slower than the 80386 at the same frequency. However, the 68030 was available at frequencies up to 50 MHz and the 80386 only up to 40 MHz, which made top systems based on the 68030 slightly faster.

The 68030 was used in computers of the Apple Macintosh II series, Commodore Amiga 3000, Atari TT, Atari Falcon and some others.

With the 68040 Motorola once again tried to outperform Intel; this processor appeared a year after the 80486. However, the 68040's set of useful qualities was never able to surpass the 80486's. In fact Motorola, having a more overloaded instruction set, was not able to support it, and in a sense dropped out of the race. Only a very truncated floating-point coprocessor could be fitted into the 68040, and the chip itself ran significantly hotter than the 80486. According to the results at lowendmac.com/benchmarks, the 68040 is only about 2.1 times faster than the 68030, which means that the 68040 is slightly slower than the 80486 at the same frequency. The 68040 found almost no applications in popular computers. Only its cheaper version, the 68LC040, which does not have a built-in coprocessor, saw some noticeable use. However, the first versions of this chip had a serious hardware defect that made even software emulation of the coprocessor impossible!

Motorola always had problems with mathematical coprocessors. As was mentioned above, Motorola never released such a coprocessor for the 68000/68010, while Intel had been producing its very successful 8087 since 1980. To get a significant performance boost, code for the 68882 needs to be compiled differently than for the 68881.

It is appropriate to say that the Intel x86 still has problems with its mathematical coprocessor. The accuracy of some functions, for example the sine for some arguments, is very low, sometimes no more than 4 digits. Therefore modern compilers often calculate such functions without using the coprocessor.
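One commonly cited explanation for such a loss of accuracy is argument reduction with an approximation of π that is too short. The Python sketch below does not run the coprocessor; it only imitates that effect with exact rational arithmetic for an argument very close to π. The assumption that the reduction constant keeps 66 significant bits (64 bits after the binary point) is mine for this illustration, and all names in the code are hypothetical.

```python
import math
from fractions import Fraction

# pi to 50 decimal places, held as an exact rational number
PI = Fraction("3.14159265358979323846264338327950288419716939937510")

# Assumption of this sketch: the reduction constant has 64 fractional bits
PI_SHORT = Fraction(round(PI * 2**64), 2**64)

# The double-precision number nearest to pi, taken exactly
x = Fraction(math.pi)

# So close to pi, sin(x) = sin(pi - x), which equals pi - x
# up to a cubic term far below double precision
exact = float(PI - x)
reduced = float(PI_SHORT - x)
relative_error = abs(reduced - exact) / exact

print(exact)           # the accurate value of the sine
print(reduced)         # the value after reduction with the short pi
print(relative_error)  # a few units of 1e-5
```

With a relative error of a few units of 1e-5, only the first four or five significant digits of the two results agree, which is consistent with the accuracy mentioned above.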

Edited by Richard BN