Emotional stories about processors for first computers: part 12 (Preface and Postface)

Prologue and Epilogue



I have had occasion to program in assembly language for various processors; the most recent on that list is the Xilinx MicroBlaze. I decided to write down some of my observations on the features of these almost magical pieces of hardware which, like a magic key, opened the door for us into the magical land of virtual reality and mass creativity. I will not write about the features of modern systems (x86, x86-64, ARM, ARM-64, etc.) – maybe another time, as the topic is very large and complex. Therefore I stop at the Intel 80486 and Motorola 68040. I also wanted to include the IBM/370 in this review, since I had to deal with it. Those systems were quite remote from mass users but had a huge impact on computer technology. However, they require much time to prepare material about, they did not use single-chip processors, and hardly any of these machines are left in existence, so they are not included. I really hope that my material will also attract the attention of experts who will be able to add something I have not thought about or did not know.

As illustrative material, I attach my own small Rosetta stone: tiny programs for calculating the number π on different processors and systems using a spigot algorithm, which I claim to be the fastest of its implementations.
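For readers unfamiliar with the approach, here is a minimal Python sketch of the classic Rabinowitz–Wagon spigot scheme that such π programs are typically based on. It is only an illustration of the idea; the author's optimized assembly versions are not reproduced here.

```python
def pi_digits(n):
    """Rabinowitz-Wagon spigot: return roughly the first n decimal digits of pi."""
    length = 10 * n // 3 + 1
    a = [2] * length                         # pi in a mixed-radix form, every "digit" starts as 2
    out, predigit, nines = [], 0, 0
    for _ in range(n):
        carry = 0
        for i in range(length - 1, 0, -1):   # multiply by 10 and renormalize right to left
            x = 10 * a[i] + carry
            a[i] = x % (2 * i + 1)
            carry = (x // (2 * i + 1)) * i
        x = 10 * a[0] + carry
        a[0] = x % 10
        q = x // 10                          # the next predigit
        if q == 9:
            nines += 1
        elif q == 10:                        # a carry ripples into the held digits
            out.append(str(predigit + 1) + '0' * nines)
            predigit, nines = 0, 0
        else:
            out.append(str(predigit) + '9' * nines)
            predigit, nines = q, 0
    out.append(str(predigit) + '9' * nines)
    return ''.join(out)[1:]                  # drop the leading placeholder zero

print(pi_digits(30))                         # -> 314159265358979...
```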

In conclusion, I offer several observations that came to me in the course of writing these articles.

It is difficult to shake off the feeling that 8-bit processors were only an unwelcome necessity for the main characters acting on the stage of computer history in the 70's and 80's. The development of the most powerful 8-bit chip, the 6502, was effectively frozen. Intel and Motorola rather slowed down their own development of small processors and held back other developers.

I'm pretty sure that the Amiga or Atari ST would have worked better and faster with a 4 MHz 6502-compatible processor having a 20- or 24-bit address bus than with the 68000. Bill Mensch said recently that it would be easy to make a 6502 running at 10 GHz today.

If the Amstrad PCW series (whose success the Commodore CBM II could have shared) had started to use upgraded Z80s at higher frequencies, it is quite possible that this series would still have been relevant as recently as 10 years ago.

What would the world be like if the ARM had been made in 1982 or 1983? In my humble opinion it was quite possible.

What would computers made in the USSR have been like if they had copied and developed not the most expensive but the most promising technologies?


Emotional stories about processors for first computers: part 11 (Intel 8080)

Intel 8080 and 8085



The first real single-chip processor, made in the first half of 1974, is still being manufactured and used today. It was repeatedly cloned around the world; in the USSR it was designated the KR580VM80A. Modern Intel processors for the PC still easily reveal their kinship with this, in some sense, relic product. I haven't written code for this processor myself, but being well acquainted with the architecture of the Z80, I will venture a few comments.

The 8080 instruction set, like that of other Intel processors for the PC, can hardly be called ideal, but it is universal, quite flexible and has some very attractive features. The 8080 compared favorably with its competitors, the Motorola 6800 and the MOS Technology 6502, in its large number of, albeit somewhat clumsy, registers. The 8080 gave the user one 8-bit accumulator, the 16-bit HL register serving as both a semi-accumulator and a fast index register, a 16-bit stack pointer, and two more 16-bit registers, BC and DE. The BC, DE, and HL registers could also be used as six byte-wide registers. In addition, the 8080 supported an almost full set of status flags: carry, sign, zero and even parity and auxiliary carry. Some commands from the 8080 instruction set remained speed champions for a long time. For example, the XCHG instruction exchanges the contents of the 16-bit DE and HL registers in just 4 clock cycles – extremely fast! A number of other instructions, although they did not set such bright records, were also among the best for a long time:


  • XTHL – exchange the contents of the HL register with the word at the top of the stack, 18 cycles – it seems like a lot, but even on the truly 16-bit 8086 an equivalent of this operation takes at least 26 cycles, and for the 6800 or 6502 such an instruction is hard to imagine;
  • DAD – add the value of another 16-bit register (BC, DE or even SP) to the semi-accumulator HL, 10 cycles. This is a true 16-bit addition which sets the carry flag. If you add HL to itself you get a fast 16-bit left shift, i.e. multiplication by 2, which is the key operation for programming both full multiplication and division (see the sketch just after this list);
  • PUSH and POP – push a 16-bit register onto the stack and pop a 16-bit value from the stack into a register, in 11 and 10 cycles respectively. These are the fastest 8080 operations for working with memory, and SP is automatically decremented or incremented as they execute. PUSH can be used, for example, to quickly fill memory with a pattern held in three registers (BC, DE, HL). There are no stack instructions for working with 8-bit values at all;
  • LXI – load a 16-bit constant into a register (HL, DE, BC, SP), 10 cycles;
  • RNZ, RZ, RNC, RC, RPO, RPE, RP, RM – conditional returns from a subroutine, which allow cleaner code by eliminating the need to write extra conditional jumps. These instructions were abandoned in the x86 architecture, but they should probably have been kept; code with them turns out nicer.
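To illustrate why a fast 16-bit left shift matters, here is a minimal Python model of the shift-and-add multiplication scheme that an 8080 routine typically builds from DAD. This is only a sketch of the idea, not actual 8080 code: the partial product is doubled with DAD H, and the multiplicand is added with DAD D when the current multiplier bit is set.

```python
def mul16(a, b):
    """Shift-and-add 16x16 multiply, truncated to 16 bits, mirroring the
    role DAD plays on the 8080: doubling (DAD H) and adding (DAD D)."""
    product = 0
    for i in range(15, -1, -1):               # scan multiplier bits, high to low
        product = (product << 1) & 0xFFFF     # DAD H: double the partial product
        if (b >> i) & 1:
            product = (product + a) & 0xFFFF  # DAD D: add the multiplicand
    return product

assert mul16(1234, 47) == (1234 * 47) & 0xFFFF
```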

This processor was used in the first 'almost personal computer', the Altair 8800, which became very popular after a magazine publication in early 1975. By the way, in the USSR a similar publication appeared only in 1980, and one corresponding to it in relevance only in 1986.



The first almost PC



Intel's 8080 became the basis for the development of CP/M, the first mass professional operating system, which occupied a dominant position among microcomputers for professional work until the mid-80's.

Now for the shortcomings. The 8080 required three supply voltages: -5, 5, and 12 volts. Working with interrupts was clumsy and slow. In general the 8080 was rather leisurely compared with the competitors which soon appeared: the 6502 could be up to 3 times faster at the same clock frequency.

However, the architecture of the 8080 embodied what turned out to be a correct vision of the future, namely of a fact unknown in the 70's: that processors would become faster than memory. The 8080's DE and BC registers are a prototype of modern manually controlled caches rather than general-purpose registers. The 8080 could run at 2 MHz while its competitors could only manage 1 MHz, which reduced the performance difference between them.

It's hard to call the 8080 a 100% 8-bit processor. Its ALU is indeed 8 bits wide, but there are many 16-bit instructions that work faster than using only their 8-bit counterparts, and for some instructions there are no 8-bit analogues at all. The XCHG instruction is, both in essence and in timing, 100% 16-bit, and there are real 16-bit registers. Therefore I venture to call the 8080 partially 16-bit. It would be interesting to calculate this processor's 'bit index' based on the set of its features, but as far as the author knows, no one has done such work yet.

The author of this text does not know the reasons why Intel abandoned direct support of 8-bit PC's with its processors. Intel has always been distinguished by the complexity and ambiguity of its policies. Its connection with politics is illustrated in particular by the fact that Intel has long had fabs in Israel, and until the end of the 90's this was kept secret. Intel made practically no attempt to improve the 8080; only the clock frequency was raised, to 3 MHz. In effect the 8-bit computer market was handed to Zilog with its Z80 processor, a relative of the 8080, and the Z80 was able to withstand its main competitor, the Terminator 6502, quite successfully.

In the USSR and Russia the domestic clone of the 8080 became the basis of many computers that remained popular until the early 90's: the Radio-86RK, Mikrosha, the multicolor Orion-128, Vector, and Corvette. Eventually cheap and improved ZX Spectrum clones based on the Z80 won the clone wars.




This is a real PC



In early 1976 Intel introduced the 8085 processor, compatible with the 8080 but significantly superior to its predecessor. The -5 and 12 volt supplies became unnecessary and the connection scheme was simplified, work with interrupts was improved, clock frequencies from 3 up to a very solid 6 MHz became available, and the instruction set was extended with very useful instructions: 16-bit subtraction, a 16-bit shift right in only 7 cycles (very fast), 16-bit rotate left through the carry flag, loading a 16-bit register with an 8-bit offset (this instruction can be used with the stack pointer too), writing the contents of HL to the address in DE, and the analogous reading of HL from the address in DE. All the instructions just mentioned, except the shift right, execute in 10 cycles – sometimes significantly faster than their counterparts or their emulation on the Z80. A few more instructions and even two new processor status flags were added. Among the new flags it is worth noting the overflow flag, although support for it was minimal. In addition, many instructions working with byte data were sped up by 1 clock cycle. This was very significant, as many systems with the 8080 or Z80 used wait states, which, due to the extra cycles of the 8080, could stretch execution times almost twofold. For example, in the aforementioned Vector computer register-register instructions took 8 cycles, whereas with an 8085 or Z80 the same instructions would execute in only 4 cycles. The XTHL instruction even became faster by two cycles. With the new instructions you can write code to copy a block of memory that runs faster than the Z80's LDI/LDD commands! However, some instructions, for example the 16-bit increment and decrement, PUSH and the conditional returns, became slower by a cycle.

The 8085 has built-in interrupt support, which in many cases eliminates the need for a separate interrupt controller in a system, and a serial I/O port. As already noted, full support for the overflow flag was not added to the 8085, so signed arithmetic remained somewhat incomplete.

However, I can only repeat the phrase "for unknown reasons": Intel declined to promote the 8085 as a main processor for PC's. It was only in the 80's that some fairly successful 8085-based systems appeared. The IBM System/23 Datamaster appeared in 1981; it was a predecessor of, and almost a competitor to, the IBM PC. Then in 1982 a very fast computer with excellent graphics, the Zenith Z-100, was released, in which the 8085 ran at 5 MHz. In 1983 the Japanese company Kyocera created the very successful Kyotronic KC-85 laptop, versions of which were also produced by other companies: Tandy as the TRS-80 Model 100, NEC as the PC-8201a, and Olivetti as the M-10. In total perhaps more than 10 million of these computers were sold! In Russia in the early 90's there were attempts to improve some systems, for example the Vector computer, on the basis of the domestic clone IM1821VM85A. Surprisingly, the main processor of the Sojourner rover, which reached the surface of Mars in 1997, was an 8085 at 2 MHz!

In fact, Intel ceded the field to the Z80 in the 70's. A few years later, in the battle for the 16-bit market, Intel behaved quite differently, filing a lawsuit to ban sales of the V20 and V30 processors in the United States. Interestingly, these processors from the Japanese company NEC could switch into full binary compatibility with the 8080, which made them the fastest processors of the 8080 architecture.

Another of Intel's secrets is its refusal to publish the extended instruction set, including support for the two new flags. However, one of the official manufacturers of these processors published the complete instruction set. What were the reasons for this strange refusal? One can only guess. Could Zilog have then played the role that AMD arguably played later, creating an ostensible appearance of competition, while the 8085 might have brought Zilog down? Or was it perhaps a wish to keep the instruction set closer to the 8086 then being designed? The latter seems doubtful: the 8086 was released more than two years after the 8085, and it is hard to believe that its instruction set was already known in 1975. And in any case, compatibility of the 8086 with the 8080 and 8085 is achievable only with the use of a macro processor, sometimes replacing one 8080 or 8085 instruction with several of its own. Moreover, the two published new instructions of the 8085 cannot be implemented on the 8086 at all. It is especially difficult to explain why Intel did not publish the information about the new instructions after the release of the 8086. Most likely, we may assume, it was about marketing: by artificially worsening the specifications of the 8085, Intel got a more spectacular-looking 8086 against this background.

Edited by Richard BN


Emotional stories about processors for first computers: part 10 (MOS Technology 6502)

6502 and 65816



This is a processor with a very dramatic fate; no other processor can compare with it in this respect. Its appearance and introduction were accompanied by events very large in scope and consequences. I will list some of them:


  1. the weakening of the giant Motorola company, which for some time exceeded the capabilities of Intel;
  2. the destruction of the independent company MOS Technology;
  3. the cessation of the 6502 development and its stagnant production with little or no modernization.

It all started with Motorola, for unknown reasons, refusing to support the initiative of a group of young engineers who proposed to improve the overall rather mediocre 6800 processor. They had to leave Motorola and continue their work at the small but promising MOS Technology company, where they soon prepared two processors, the 6501 and 6502, both (like almost all processors of that time) fabricated using NMOS technology. The first of these was pin-compatible with the 6800; in all other respects the two were identical. The 6501/6502 team managed to introduce a new chip production technology, which radically reduced the cost of the new processors. In 1975 MOS Technology could offer the 6502 for $25, while the starting price of the Intel 8080 and Motorola 6800 in 1974 was $360. In 1975 Motorola and Intel lowered their prices, but they were still close to $100. MOS Technology specialists claimed that their processor was up to 4 times faster than the 6800. I find this questionable: the 6502 can work much faster with memory, but the 6800's second accumulator greatly accelerated many calculations. My own estimate is that the 6502 was on average no more than 2 times faster. Motorola launched a lawsuit against its former employees, who had allegedly used many of the company's technological secrets. During the trial it was established that one of the engineers who left Motorola had taken some confidential documents on the 6800, acting against the wishes of his colleagues. Whether this was his own act or there were guiding forces behind him is still unknown. For this and other unclear reasons Motorola indirectly won the case, and MOS Technology, whose financial resources were very small, was forced to pay a substantial $200,000 and abandon production of the 6501. Intel, in a similar situation with Zilog, acted quite differently. Although it must be admitted that MOS Technology was sometimes too reckless in trying to use, for its own purposes, the big money Motorola spent on promoting the 6800.

Next, the legendary Commodore company and its no less legendary founder Jack Tramiel entered the 6502 story; in his shadow stood the figure of the company's chief financier, who determined its policy – a man named Irving Gould. Jack got a loan from Irving and with this money, using some, to put it mildly, unscrupulous tactics, forced MOS Technology to become part of Commodore. After that, possibly against the wishes of Tramiel, who was forced to give in to Gould, the development of the 6502 almost stopped, despite the fact that as early as 1976 it was possible to produce prototypes of the 6502 with operating frequencies up to 10 MHz. However, the report of this appeared only many years later, from a man named Bill Mensch, who had been part of the team that left Motorola, sometimes made loud but by and large empty statements, and played a rather ambiguous role in the fate of the 6502. The chief developer of the 6502, Chuck Peddle, was removed from processor development forever. The 6502 continued to be produced not only at Commodore but also at the Western Design Center (WDC) created by Bill Mensch. It is telling that none of the former 6502 team ever worked with him afterwards.

The drama around the 6502 did not end there. In 1980 a short anonymous article appeared in Rockwell's AIM65 Interactive magazine stating that all 6502's carry a dangerous bug called JMP (xxFF). The tone of the article suggested something completely out of the ordinary. Subsequently this attitude to the issue migrated to Apple circles and became a kind of mainstream. Strictly speaking, though, it was not a bug. Of course, for a specialist accustomed to the comfortable processors of the large systems of those years, a feature that was quite relevant and even useful among microprocessors could seem something annoying – a bug. But in fact this offending behavior was described in the official documentation as early as 1976, and in programming textbooks published before the appearance of the mentioned article. The "bug" was eliminated by Bill Mensch, who made the 65C02 (a CMOS 6502) supposedly by 1983, i.e. after the official release of the 65816. While Intel, Motorola and others had already made 16-bit processors of new generations, the 6502 was only microscopically improved and made artificially, partially incompatible with itself. In addition to eliminating the "bug", a number of changes were made which, in particular, changed the course of execution of several instructions. These instructions became slower by a cycle, but at the same time more correct in some far-fetched academic sense. It must be admitted, though, that several of the new instructions turned out to be expected and useful. On the other hand, the absolute majority of the new instructions only occupied code space, adding almost nothing to the capabilities of the 6502 and leaving fewer free opcodes for possible further upgrades. Commodore and the Japanese company Ricoh (manufacturer of the processors for the very popular NES game console) did not accept these changes. The author of this material has himself run into this "bug" several times: knowing nothing about it, he was writing programs for Commodore computers; there was an incompatibility, and he had to change the code and use conditional assembly. The code for the 65C02 turned out to be more cumbersome and slower. Later I raised this question on the 6502.org forum, where some participants were familiar with the Apple ][ computers. I asked whether anyone could give an example of the aforementioned "bug" actually crashing a program. I received only emotional and general comments; a specific example was never offered.
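For readers unfamiliar with the issue, here is a small Python sketch (my own illustration, not from the original article) of what the fuss was about: on the NMOS 6502 an indirect JMP whose pointer ends in $FF fetches the high byte of the target from the beginning of the same page instead of the next one.

```python
def jmp_indirect_target(mem, ptr, cmos=False):
    """Resolve JMP (ptr) as a 6502 does.  On the NMOS 6502 the high byte of
    the pointer does not carry into the next page, so JMP ($xxFF) reads its
    second byte from $xx00; the 65C02 reads the expected next address."""
    lo = mem[ptr]
    if cmos:
        hi = mem[(ptr + 1) & 0xFFFF]                      # 65C02: carry into the high byte
    else:
        hi = mem[(ptr & 0xFF00) | ((ptr + 1) & 0x00FF)]   # NMOS: wrap within the page
    return lo | (hi << 8)

mem = bytearray(0x10000)
mem[0x10FF], mem[0x1100], mem[0x1000] = 0x34, 0x12, 0x80
print(hex(jmp_indirect_target(mem, 0x10FF)))              # 0x8034 on the NMOS 6502
print(hex(jmp_indirect_target(mem, 0x10FF, cmos=True)))   # 0x1234 on the 65C02
```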



Bug!!!



The 65C02 was licensed to many companies, in particular NCR, GTE, Rockwell, Synertek, and Sanyo. A 65C02 variant, the 65SC12, was used in later BBC Micro models. Atari used the NMOS 6502. Synertek and Rockwell, in addition to the CMOS 6502, also produced the NMOS 6502. By the way, the NMOS 6502 has its own set of undocumented instructions, whose nature is completely different from that of the secret commands of the 8085. In the 6502 these instructions appeared as a side effect of the technology used, so most of them are rather useless. But several of them, for example loading or storing two registers at once with one command, and some others, can make code faster and more compact.

There were other attempts to modernize the 6502. In 1979 an article appeared saying that a 6509 processor was being prepared for production for the Atari computers (not to be confused with Commodore's later processor of the same name), which was expected to bring a 25% acceleration of instruction execution and many new instructions. For unknown reasons this processor never went into production. Commodore carried out only microscopic upgrades: in particular, they switched to HMOS technology and to a static core, which allowed the processors to be slowed down. From the programming point of view the most interesting was Commodore's 6509, which, albeit in a very primitive form, with only two instructions specially allocated for the purpose, allows up to 1 MB of memory to be addressed. The super-popular Commodore 64 and 128 used the 6510/8510 processors, and the less successful 264 series used the 7501/8501. These processors had built-in 6- and 7-bit I/O ports respectively, and the 7501/8501 did not support non-maskable interrupts. Rockwell produced a version of the 65C02 extended with 32 of its own operations on one-bit values (similar to the Z80's bit instructions). However, as far as I know such processors were not used in computers; these bit instructions were more likely used only in embedded systems. This extension was made by Bill Mensch.

The last scene of the drama involving the 6502 was the barring of computers based on a 2 MHz 6502 from the US market in the first half of the 80's. This affected the English BBC Micro: its manufacturer Acorn made a large batch of computers for the United States, but, as it turned out, in vain. Some kind of lock was triggered, and the computers had to be urgently reworked to European standards. The almost American, but formally Canadian, Commodore CBM II computers (1982), despite some problems (in particular with compliance with electrical equipment standards), were nevertheless admitted. Perhaps this was because they had no graphics modes or even color text, which made them little threat to the American market mainstream, and even the stylish Porsche design could not compensate for this. The last on the list of losers was the 100% American Apple III (1980) – it is known that Steve Jobs, like Apple's management in general, did a lot to prevent this computer from being successful. Steve demanded obviously impracticable specifications and the management set unrealistic deadlines. Will we ever know their motives? The flaws of the Apple III were eliminated in the Apple III Plus (1983), but Apple's management quietly closed the project in 1984 because of its reluctance to let it compete with the Macintosh. Only in 1985, when the era of 8-bit technology was beginning to pass, did the Commodore 128 appear, which in one of its modes could run its 6502 core at a 2 MHz clock. But even this turned out to be something of a joke, since that mode was practically unsupported and there are almost no programs for it. Only in the second half of the 80's did accelerators for the Apple II begin to be produced in the United States, and from 1988 the Apple IIc+ model with a 4 MHz processor. Why did it happen that way? Perhaps because a 6502 at 2 or 3 MHz (and these were already being produced at the very beginning of the 80's) could successfully compete with systems based on the Intel 8088 or Motorola 68000 in a number of tasks, especially games. In 1991 a wilful decision by Commodore closed an interesting, albeit belated, project: the C65, based on the 4510 processor at 3.54 MHz. The 4510, made only in 1988, is the fastest 6502; it finally carried out the previously mentioned cycle optimization, which gave a 25% increase in speed. Thus the processor in the C65 is close in speed to a 6502 at 4.5 MHz. Surprisingly, this fastest 6502 with an extended instruction set (in some details this extension turned out more convenient than that of the 65816) has never been used anywhere else.

The Commodore C128 and Apple III Plus had an MMU that allowed them to use several stacks and zero pages, to address more than 64 KB of memory, etc. The C128's MMU was artificially limited to working with only 128 KB of memory. For the BBC Micro, second-processor boards were produced with the 6502 at 3 MHz (1984) and 4 MHz (1986).




Anti-advertising: multiple Porsche-designed PETs in the apartment of the villain in The Jewel of the Nile – the Apple-only era in Hollywood had not yet come



Now a few words about the instruction set of the 6502. The main feature of this processor is that it was made almost as fast as possible, with almost none of the extra clock cycles that are especially numerous in the 8080/8085/Z80/8088/68000 processors. In fact, this was the main concept of the RISC-architecture processors that appeared later under the direct influence of the 6502; the same concept has dominated among Intel processors too, starting with the 80486. In addition, the 6502 responded very quickly to interrupts, which made it very useful in some embedded systems. The 6502 has one accumulator and two index registers; in addition, the first 256 bytes of memory can be used by dedicated instructions either as faster memory or as a set of 16-bit registers (almost identical in functionality to the BC and DE registers of the 8080/Z80), giving quite powerful ways to address memory. Some arithmetic instructions (shifts, rotates, increment, and decrement) can work on memory directly, without using registers. There are no 16-bit instructions – this is a 100% 8-bit processor. It supports all the basic flags except the parity flag, which is typical only of Intel's architecture. There is one more special flag, for the not particularly useful decimal mode. Intel and Motorola processors use special corrective instructions for working with decimal numbers, while the 6502 can switch into decimal mode, which makes its speed advantage with decimal numbers even more significant than with binary ones. Very impressive for the 6502 is the possibility of table-based multiplication of 8-bit operands with a 16-bit result in under 30 cycles, using auxiliary tables of 2048 bytes in total. One of the slowest of the 6502's operations is a block memory copy, which can take more than 14 cycles per byte.
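The article does not say which table method is meant, but a common scheme that fits the quoted ~2 KB table budget is the quarter-square identity a·b = ⌊(a+b)²/4⌋ − ⌊(a−b)²/4⌋; a 6502 routine then does two table lookups and one 16-bit subtraction. Here is a Python sketch of the idea, an illustration under that assumption rather than the author's routine:

```python
# Quarter-square tables: f(x) = floor(x*x/4) for x = 0..510, split into low
# and high bytes - two 511-entry byte tables, just under 2 KB in total.
FLO = [((x * x) // 4) & 0xFF for x in range(511)]
FHI = [((x * x) // 4) >> 8 for x in range(511)]

def mul8x8(a, b):
    """8x8 -> 16-bit multiply via a*b = f(a+b) - f(a-b)."""
    if a < b:
        a, b = b, a                     # keep a - b non-negative
    s, d = a + b, a - b
    return ((FHI[s] << 8) | FLO[s]) - ((FHI[d] << 8) | FLO[d])

assert all(mul8x8(a, b) == a * b for a in range(256) for b in range(256))
```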

The 6502 can work in parallel with another device, for example another 6502. As far as I know, such dual-processor systems were never produced; instead of a second processor, a video controller was usually used, sharing memory with the 6502.

The 65816 was released by WDC in 1983. Interestingly, Bill Mensch received some of the specifications for the new processor from Apple. Of course, this was a big step forward, but a clearly belated one and with large architectural flaws. Nobody considered the 65816 a competitor to the main processors from Intel or Motorola – it was already a minor outsider, somehow predestined to keep losing ground. The 65816 had two important advantages: it was relatively cheap and almost compatible with the still very popular 6502. In subsequent years Bill Mensch didn't even try to improve his brainchild in any way – to optimize cycles, to replace zero-page addressing with extended addressing through a Z register (this was done in the 4510), to add at least multiplication, etc. WDC only raised the maximum clock frequency, reaching 14 MHz by the mid-90's (this processor was used in the popular SuperCPU accelerator for the C64 at a frequency of 20 MHz). However, even now (2019!) WDC for some reason offers the 65816 only at the same 14 MHz. The 65816 can use up to 16 MB of memory, but the addressing methods used for this look far from optimal. For example: the index registers can only be 8 or 16 bits wide, the stack can be placed only in the first 64 KB of memory, only there can the convenient short direct-page addressing (a generalization of zero-page addressing) be used, working with memory above 64 KB is comparatively awkward, etc. The 65816 has a 16-bit ALU but an 8-bit data bus, so it is only about 50% faster than the 6502 in arithmetic operations. Nevertheless, more than a billion units of the 65816 have been produced. Indeed, some 65816 instructions clearly fill gaps in the 6502 architecture, for example the instructions for block copying of memory at 7 clock cycles per byte. I can also add that the 65816 uses almost all instruction codes, 255 out of 256; the last unused code is reserved for future long instructions that have never appeared.

The Apple IIx, in the development of which Steve Wozniak was actively involved, was to use the 65816. However, mass production of this processor only became possible in 1984, and its first batches were defective, which caused excessive delays and eventually the closure of the entire project.

The 65802 is another version of the 65816, with a 16-bit address bus and a pin layout compatible with the 6502. An upgrade for the Apple II based on this processor was offered, but even a slight acceleration from such an upgrade can only be obtained with specially written programs.

The 6502 was used in a large number of computer systems, the most popular of which were the 8-bit Commodore, Atari and Apple machines and the NES. Interestingly, the 6502 was also used in the keyboard controller of the Commodore Amiga, and two 6502's at 10 MHz were used in the high-performance Apple Macintosh IIfx. It is impossible not to mention here the Atari game consoles produced from 1977 to 1996 – about 35 million of them were sold! The 65816 was used in the fairly popular Apple IIgs computer, in the Super NES game console, and also in the rare Acorn Communicator computer.

In 1984 an article appeared in Byte magazine about the Agat, a poor copy of the Apple ][ made in the USSR, against a background of pictures with red banners, Lenin and marching soldiers. The article cited a curious price for this computer, $17,000 (an absurd figure – the real price was about 4000 roubles), and ironically pointed out that Soviet manufacturers would have to lower the price dramatically if they wanted to sell their product in the West. The Agat was used mainly in school education. The later Agat models were almost 100% compatible with the Apple ][ and had some quite useful extensions.

One can only fantasize about what would have happened if the 6502 had developed at the same pace as its competitors. It seems to me that gradually moving zero-page memory into registers and gradually expanding the instruction set while optimizing cycles would have allowed the Terminator 6502 to remain the performance leader until the early 90's. Introducing a 16-bit and then a 32-bit mode would have allowed more memory and faster instructions to be used. Would its competitors have been able to oppose this?

I would like to finish with some general philosophical musings. Why was the 6502 slowed down in its development and deprived of a much brighter future? Maybe because this development really could have pressed the large firms hard and created a completely new reality. Was the 6502 team aiming for that? In my humble opinion, rather not – they just wanted to make a better processor.

Edited by Richard BN and Dr Jefyll


Emotional stories about processors for first computers: part 9 (Acorn ARM)

The first ARM processors



The ARM-1 processor was an astonishing development; it continued the 6502 ideology (namely, to make a processor that is simpler, cheaper and better) and was released by Acorn in 1985 – at the same time as Intel's technological miracle, the 80386, appeared. The ARM consisted of about ten times fewer transistors, therefore consumed significantly less energy, and was at the same time much faster on average. True, the ARM had no MMU and not even divide or multiply instructions, so in some calculations based on division the 80386 could be faster. But the advantages of the ARM were so great that today it is the most mass-produced processor architecture: more than 100 billion such processors have been made.

Development of the ARM began in 1983, after Acorn's research with the 32016 processor showed that many calculations on a 6502 running at half the clock frequency of a 32016 could be faster than on that seemingly much more powerful processor. At that time the 80286, which showed very good performance, was already available, but Intel, perhaps sensing Acorn's potential, refused to provide its processor for testing. The 80286 technology was not restricted in the way the 80386's was and had been transferred to many companies, so history is still waiting for the details of this somewhat unusual refusal to be disclosed. Perhaps if Intel had allowed the use of its processor, Acorn would have used it and would not have developed the ARM.

The ARM was developed by only a few people, who tested the instruction set using BBC BASIC on the BBC Micro. The development itself took place in a converted barn. The processor's debut was rather unsuccessful. In 1986 a second-processor unit for the BBC Micro called the ARM Evaluation System was released, which contained 4 MB of memory in addition to the processor (a great deal for those years), making this attachment a very expensive product (over 4000 pounds, about $6000). And yet, compared with computers of that time of comparable performance, this second processor was an order of magnitude or even almost two orders of magnitude cheaper. There was very little software for the new system. This was a bit strange, since it would have been quite possible to port Unix to it: plenty of Unix variants available at that time did not require an MMU, and there were such variants for the 68000, PDP-11, 80186 and even the 8088. Linux was ported to the Acorn Archimedes only in the 90's. Perhaps the delay in the appearance of a real Unix for the ARM was caused by Acorn's reluctance to transfer ARM technology to other companies.



The first ARM based system



Acorn's somewhat unsuccessful marketing policy led to a very difficult financial situation in 1985. In addition to the ARM, Acorn was also conducting expensive development of business computers, which failed, in particular because of the shortcomings of the 32016 processor chosen for them. The Acorn Communicator computer was also not very successful. The development of the relatively successful but not quite IBM PC compatible Master 512 was very costly. In addition, a lot of money was spent on an unsuccessful attempt to enter the US market – a market which the Italian company Olivetti, with its rather successful Intel 8086 and 80286-based computers, was allowed to enter, as part of a hypothetical big game of absorbing Acorn itself. By the way, after the absorption of Acorn, Olivetti's role in the US market quickly faded.

As part of Olivetti, Acorn developed the improved ARM2 chip with built-in multiplication instructions, on the basis of which the Archimedes personal computers were made. They were stunning at the time for their speed. The first models became available in 1987. However, Olivetti's management was focused on IBM PC compatible computers and did not want to use its resources to sell Acorn's products.

The ARM provides 16 32-bit registers; there are actually more of them if we count the registers for system needs. One of the registers, R15, is (as in the PDP-11 architecture) the program counter. Almost all operations are performed in 1 clock cycle; more cycles are needed in particular for jumps, multiplications and memory accesses. Unlike the popular processors of those years, the ARM is notable for the absence of such a typical structure as a hardware stack: a stack, if needed, is implemented through one of the registers. When calling subroutines the stack is not used; instead the return address is stored in a register allocated for this purpose. Such a scheme obviously does not work for nested calls, for which a stack has to be organized. A unique feature of the ARM is the combination of the program counter (which is 26-bit and therefore allows up to 64 MB of memory to be addressed) with the status register. Eight bits of this register are allocated to flags, and two more bits are gained from the fact that the lower two bits of the address are not used, since code must be aligned on a 4-byte word boundary. The processor can access bytes and 4-byte words; it cannot directly access 16-bit data. The ARM's data-processing instructions are 3-address.

A characteristic feature of RISC architecture is that memory is accessed only by load and store instructions. The ARM has a built-in fast shifter (the barrel shifter) that allows the value of one of the registers in an instruction to be shifted by any number of bits at no cost in clock cycles. For example, multiplying the value of register R0 by 65 and placing the result in register R1 can be written as one single-cycle addition, ADD R1, R0, R0 shl 6, and multiplying by 63 as one instruction, RSB R1, R0, R0 shl 6. The instruction set includes reverse subtraction, which in particular gives a unary minus as a special case and speeds up division routines. The ARM has another unique feature: all its instructions are conditional. There are 16 cases (flag combinations) that can be attached to each instruction; an instruction is executed only if the current flags match the condition encoded in it. In processors of other architectures such conditional execution exists, as a rule, only for conditional jumps. This feature of the ARM allows slow jump instructions to be avoided in many cases. This is also helped by the fact that arithmetic instructions can be told not to set the status flags. With the ARM, as with the 6809 processor, both fast and ordinary interrupts can be used. In addition, in the interrupt modes the higher-numbered registers are replaced with system ones, which makes interrupt handlers more compact and fast.
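The two examples above rest on simple identities; a couple of lines of Python (just an illustration of the arithmetic, not ARM code) make them explicit:

```python
def mul65(x):       # ADD R1, R0, R0 shl 6  ->  R1 = R0 + (R0 << 6) = 65 * R0
    return x + (x << 6)

def mul63(x):       # RSB R1, R0, R0 shl 6  ->  R1 = (R0 << 6) - R0 = 63 * R0
    return (x << 6) - x

assert mul65(12345) == 65 * 12345 and mul63(12345) == 63 * 12345
```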

The ARM instruction set contains significantly fewer basic instructions than the x86 instruction set, but the ARM instructions themselves are very flexible and powerful. Several very convenient and powerful ARM instructions have no analogues on the 80386, for example RSB (reverse subtraction), BIC (AND with inversion – such an instruction exists on the PDP-11), the 4-address MLA (multiplication with accumulation), and LDM and STM (loading or storing multiple registers from or to memory, both similar to the MOVEM instruction of the 68k processors). Almost all ARM instructions are 3-address, while almost all 80386 instructions have no more than 2 operands. The ARM instruction set is more orthogonal, meaning that all registers are interchangeable, with registers R14 and R15 as partial exceptions. Most ARM instructions would require 3-4 of the 80386's instructions to emulate them, while most 80386 instructions can be emulated with only 2-3 ARM instructions. Interestingly, the IBM PC XT emulator running on Acorn Archimedes hardware with an 8 MHz processor is even faster than a real PC XT. On the Commodore Amiga with a 68000 at 7 MHz, an emulator can run at no more than 10-15% of the speed of a real PC XT. It is also fascinating that the first NeXT computers with a 25 MHz 68030 showed the same integer performance as an 8 MHz ARM. Apple was going to make a successor to the Apple ][ in the Möbius project, but when it turned out that the prototype of this computer in emulation mode outran not only the Apple ][ but also the 68k-based Macintosh, the project was closed!

Among the shortcomings of the ARM we can point to the problem of loading an immediate constant into a register. Only 8 bits can be loaded at a time, although the constant can be inverted or shifted. Loading an arbitrary 32-bit constant can therefore take up to 4 instructions. You can of course load a constant from memory with one instruction, but then the problem arises of specifying the address of this value, since the offset can only be 12-bit. Another shortcoming of the ARM is its relatively low code density, which makes programs somewhat larger and, most importantly, reduces the efficiency of the processor cache; however, this is probably a result of the low quality of the compilers for this platform. The multiplication instructions give only the lower 32 bits of the product. For a long time a significant drawback of the ARM was the lack of built-in memory management support (an MMU) – Apple, for example, demanded this support in the early 90's. Floating-point coprocessors for the ARM architecture also arrived with a significant delay. The ARM did not have debugging features as advanced as those of the x86. There is also an oddity in the standard ARM assembly language: operands for the barrel shifter are written after a comma. Thus, instead of the simple form R1 shl 7 (shift the contents of register R1 left by 7 bits) you have to write R1, shl 7.
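The "up to 4 instructions" figure follows from how ARM data-processing immediates are encoded: an 8-bit value rotated right by an even amount. A small Python sketch (my illustration of the encoding rule, not a real assembler) shows the check and a naive byte-by-byte estimate of how many MOV/ORR instructions a constant costs:

```python
def is_arm_immediate(value):
    """True if value fits an ARM data-processing immediate:
    an 8-bit constant rotated right by an even amount within 32 bits."""
    value &= 0xFFFFFFFF
    for rot in range(0, 32, 2):
        rotated = ((value << rot) | (value >> (32 - rot))) & 0xFFFFFFFF
        if rotated < 256:
            return True
    return False

def instructions_to_load(value):
    """Naive estimate: build the constant byte by byte with MOV + ORRs
    (a real assembler often does better, e.g. using MVN or a literal pool)."""
    chunks = sum(1 for i in range(0, 32, 8) if (value >> i) & 0xFF)
    return max(1, chunks)

print(is_arm_immediate(0xFF000000))      # True: 0xFF rotated right by 8
print(is_arm_immediate(0x12345678))      # False
print(instructions_to_load(0x12345678))  # 4
```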

In 1989 the ARM3, with a built-in cache, became available. In 1990 the ARM development team separated from Acorn and, with the help of Apple and VLSI, created ARM Holdings. One of the reasons for the separation was the excessive cost of ARM development, in the opinion of the Acorn-Olivetti management. It is ironic that Acorn subsequently ceased its independent existence while ARM Holdings became a large company. The separation was, however, also driven by Apple's desire to have ARM processors in its Newton computers without depending on another computer manufacturer.

The further development of the ARM architecture is also very interesting – it touched the interests of such well-known companies as Intel, DEC and Microsoft – but that is another story. It can be mentioned, though, that thanks to its share in ARM Holdings, Apple was able to avoid bankruptcy in the 90's.

A lot of thanks to jms2 and BigEd who helped to improve the style and content. Edited by Richard BN


Emotional stories about processors for first computers: part 8 (DEC VAX-11)

Processor for DEC VAX-11



The VAX-11 systems were quite popular in the 80's, especially in higher education. Now it is difficult to understand some of the concepts described in the books from those years, without knowing the features of the architecture of those systems.

The VAX-11 was more expensive than the PDP-11. However it was more oriented towards universal programming than the PDP-11. Additionally the VAX-11 was significantly cheaper than the IBM/370 systems.

A single-chip processor for the VAX architecture, the V-11, was produced only by the mid-80's; before that time multi-chip processor assemblies were the only option.

The VAX-11 architecture is 32-bit; it uses 16 registers, among which, as in the PDP-11, is the program counter. It assumes the use of two stacks, one of which is used to store subroutine frames. In addition, one register is assigned to working with the arguments of called functions. Thus 3 of the 16 registers are allocated to stack-related purposes.

The instruction set of the VAX-11 cannot fail to amaze with its vastness and its very rare, often unique instructions. For example, it has instructions for working with bit fields, for working with several types of queues, for calculating a CRC, for multiplying decimal strings, etc. Many instructions have both three-address variants (like the ARM) and two-address variants (like the x86), but there are also four-address instructions, for example the extended division EDIV. Of course, there is support for working with floating-point numbers.

However, the VAX-11 is a very slow system for its class and price. Even the super-simple 6502 at 4 MHz could outrun the slowest member of the family, the VAX-11/730. The fastest VAX-11 systems – huge cabinets and "whole furniture sets" – were at the same speed level as the first PC ATs. When the 80286 appeared it became clear that the days of the VAX-11 were numbered, and even the slowdown in the development of 80286-based systems could not change anything fundamentally. The straightforward people from Acorn, having made the ARM in 1985, said without hiding anything that the ARM was much cheaper and much faster. The VAX-11 nevertheless remained relevant until the early 90's, still having some advantages over the PC, in particular faster disk subsystems.

The VAX-11 is probably the last mass computer system in which the convenience of working in assembly language was considered more important than its performance. In a sense this approach has moved to modern popular scripting languages.



The VAX-11/785 (1984) is also a computer – the fastest of the VAX-11 series, with processor speed comparable to the IBM PC AT or the ARM Evaluation System



Surprisingly, there is very little literature on the VAX-11 systems available in open access, as if some strange law of oblivion were at work. Several episodes close to politics and correlated with the history of the USSR are associated with the history of this architecture. It is possible that the effective abandonment of the development of the PDP-11 architecture was caused by its low cost and the success of its cloning in the Soviet Union. Cloning the VAX-11 cost an order of magnitude more in resources and led to a dead end. Interest in the VAX-11 was stoked using, for example, hoaxes like the famous kremvax of April 1, 1984, in which the then leader of the USSR, Konstantin Chernenko, offered to drink vodka on the occasion of connecting to the Usenet network. Another joke was that some VAX-11 chips were imprinted with a message in broken Russian about how good the VAX-11 was.

Some models of the VAX-11 were cloned in the USSR by the end of the 80's, but such clones were produced in very small numbers and hardly found any use.

Several VAX-11 systems are available for use over the network and this distinguishes them favorably from the IBM/370 systems with which they competed.

Edited by Ralph Kernbach and Richard BN


Emotional stories about processors for first computers: part 7 (NS 32016)

The first 32-bit CPU – National Semiconductor 32016



This is the first true 32-bit processor proposed for use in computers, back in 1982. It was originally planned as a VAX-11 on a chip, but since it proved impossible to come to an agreement with DEC, National Semiconductor had to make a processor that resembled the VAX-11 architecture in only some details.

The use of paged virtual memory – the dominant technology today – begins with this processor, although the virtual memory support is not built into the processor itself but is provided by a separate coprocessor. A separate coprocessor is also required for working with real numbers.

The instruction set of the NS32016 is huge and similar to that of the VAX-11, in particular in having a separate stack for subroutine frames. The address bus is 24-bit, which allows up to 16 MB of memory to be used. A distinguishing feature of the 32016 is its slightly unusual set of status flags. In addition to the standard flags of carry (which can be used as a condition for a jump), overflow, sign, and equality (i.e. zero), there is also the L flag, meaning 'less' – a carry flag for comparisons only. The situation with the carry flag is similar to that of the Motorola 68k processors. The overflow flag is for some reason called F. There are flags for single-step mode and privileged mode, and a unique flag for the current stack selection. Arithmetic instructions do not set the sign, zero, and 'less' flags; they are set only by comparison instructions.

Eight 32-bit general-purpose registers are available. In addition there are a program counter, two stack pointers, a pointer for subroutine stack frames, a pointer to the program's data base (something unique), a module base pointer (also something very rare), a pointer to the interrupt vector table, a configuration register, and a processor status register. The performance of the NS32016 was comparable to the 68000, perhaps only slightly higher.

The 32016, as far as I know, was used only with the BBC Micro personal computer, as a second processor. It could be ordered at 6, 8 or 10 MHz. This second processor was a very expensive and prestigious device for 1984. The software for it was very limited in quantity and was produced solely by the efforts of Acorn. It included the Panos operating system, somewhat similar to Unix, and Acorn's permanent companion, BASIC. This BBC Micro attachment did not use an MMU chip – there were no programs that used it, although one could be plugged in. Connecting an arithmetic coprocessor was not even envisaged.

It is known that this very complex processor had serious hardware errors that took years to fix.

Edited by Richard BN


Emotional stories about processors for first computers: part 6 (TI TMS9900)

Texas Instruments TMS9900



I have never written code for this very special processor, even though it was the first 16-bit processor available for use in personal computers. It was produced from 1976. It used the much rarer big-endian byte order, used otherwise only in the Motorola 6800 and 68000 series and in the architecture of the gigantic IBM mainframes. All the other processors in this review use little-endian byte order.

The TMS9900 has only three 16-bit registers: the program counter, the status register and a base register (the workspace pointer) for its pseudo-registers. The processor uses a dedicated 32-byte area of memory (a workspace, or context) as 16 two-byte registers. This use of memory is somewhat like the zero page in the 6502 architecture. By changing the base register the TMS9900 can switch context very quickly – similar to the Z80 with its two register banks. The processor status flags are notable for their originality: along with the typical flags of carry, zero (equality), overflow and parity, there are also two unique flags of logical and arithmetic 'less than'. Working with the stack and subroutines resembles the RISC processors of the future to come. There is simply no ready-to-use stack, but you can build one using one of the pseudo-registers. When a subroutine is called, new values are loaded into the program counter and the base register, and all three registers are saved into pseudo-registers of the new context; thus calling a subroutine is more like invoking a software interrupt. The TMS9900 has a built-in interrupt controller designed to handle up to 16 hardware interrupts.
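A rough Python model of this workspace mechanism may make the idea clearer. It is only my sketch, following the usual TMS9900 convention that a BLWP-style call saves the old workspace pointer, program counter and status into R13-R15 of the new workspace:

```python
class TMS9900Context:
    """Toy model: the 16 'registers' are just 16 consecutive words of RAM
    starting at the workspace pointer WP."""
    def __init__(self):
        self.mem = [0] * 32768          # 64 KB of memory as 16-bit words
        self.wp = self.pc = self.st = 0

    def reg(self, n):                   # word index of pseudo-register Rn
        return self.wp // 2 + n

    def blwp(self, new_wp, new_pc):
        """Branch and load workspace pointer: switch to a new context and
        save the old WP/PC/ST in R13..R15 of the new workspace."""
        old_wp, old_pc, old_st = self.wp, self.pc, self.st
        self.wp, self.pc = new_wp, new_pc
        self.mem[self.reg(13)] = old_wp
        self.mem[self.reg(14)] = old_pc
        self.mem[self.reg(15)] = old_st

    def rtwp(self):
        """Return with workspace pointer: restore the caller's context."""
        self.st = self.mem[self.reg(15)]
        self.pc = self.mem[self.reg(14)]
        self.wp = self.mem[self.reg(13)]
```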



The first 16-bit home computer – it has even color sprites!



The instruction set looks very impressive; there are even multiplication and division. The unique X instruction allows one instruction taken from any memory address to be executed, after which execution continues with the next instruction. Instruction execution is rather slow: the fastest instructions require 8 cycles and arithmetic instructions 14 cycles. However, multiplication (16*16=32) in 52 cycles and division (32/16=16,16) in only 124 cycles were probably the fastest among the processors of the 70's.

The TMS9900 requires three supply voltages of -5, 5 and 12 volts and a four-phase clock signal – the worst such specifications among the processors known to me. In 1979 this processor was demonstrated to IBM specialists, who were then looking for a processor for the IBM PC prototype. The obvious drawbacks of the TMS9900 (only 64 KB of addressable memory, the lack of the necessary support chips, relative slowness) made the appropriate impression, and the Intel 8088 was chosen for the future leader among PC's. To deal with the lack of support chips, Texas Instruments also produced a TMS9900 variant with an 8-bit bus, the TMS9980, which worked 33% slower.

The TMS9900 was used in the TI-99/4 and TI-99/4A computers, which were fairly popular in the USA. By 1983 they had been "crushed" in the price war with the Commodore VIC-20. Curiously, as a result of this war Texas Instruments was forced to cut the price of its computer to the incredible, for 1983, figure of $49 (in 1979 the price was $1150!) and sold it at a big loss to itself. For comparison, the price of the relatively unpopular Commodore +4, which ceased production in 1986, fell to $49 only in 1989. The TI-99/4A ceased production in 1984, just when, thanks to the ultra-low prices, it had begun to gain popularity. This computer can only conditionally be called 16-bit: only 256 bytes (very little) of its RAM and all of its ROM are accessed through a 16-bit bus; the rest of the memory and the I/O devices work over a slow 8-bit bus. It may therefore be more correct to consider the BK0010 the first 16-bit home computer. It is an interesting coincidence that the TI-99/4 and TI-99/4A use a processor at 3 MHz – exactly the same frequency as the BK0010.

In the TI-99/4 and TI-99/4A the rather successful TMS9918 chip was used as the video controller; it became the basis of the worldwide-popular MSX standard, as well as of some other computers and game consoles. The Japanese company Yamaha significantly improved this video chip, and it was subsequently used, in particular, to upgrade the TI-99/4 and TI-99/4A themselves!

The TI-99/4 series is a rare example of computers whose processor and the computer itself were made by the same manufacturer.

Edited by Richard BN


Emotional stories about processors for first computers: part 5 (Motorola 6800 family)

Motorola 6800 and close relatives


Motorola's processors have always been distinguished by several very attractive features, combined at the same time with some absurd abstractions and architecturally impractical decisions. The main attractive feature of all the processors considered here is the second complete and very fast accumulator register.

The 6800 was the first microprocessor to require only a single 5-volt power supply – a very useful feature. However, because it had only one cumbersome 16-bit index register in an 8-bit architecture, the 6800 turned out to be inconvenient to program and use. It was released in 1974, not much later than the 8080, but it did not become the basis of any well-known computer system. Interestingly, the 6502 developers, Chuck Peddle and Bill Mensch, called the 6800 wrong-headed and too big. Nevertheless, it and its variants were widely used as microcontrollers. Perhaps it is worth noting here that Intel had been producing processors since 1971, which put Motorola, for which the 6800 was the very first processor, in the position of playing catch-up. If you compare the 6800 not with the 8080 but with its predecessor the 8008, the 6800 looks much preferable. Motorola almost caught up with Intel with the 68000/20/30/40. I can also note that in the 70's Motorola was a significantly larger company than Intel.

Numerous variants of the 6800 were also produced: the 6801, 6802, 6803, 6805, ... Most of them are microcontrollers with built-in memory and I/O ports. The 6803 is a simplified 6801; it was used in the Tandy TRS-80 MC-10 computer and in its French clone, the Matra Alice. These computers appeared very late (1983) for their class and were comparable to the Commodore VIC-20 (1980) or Sinclair ZX81 (1981). The instruction set of the 6801/6803 was significantly improved with 16-bit instructions, multiplication, and several others. An unusual branch instruction appeared, BRN (branch never), which never takes its branch! Some instructions became a little faster.

The 680x range fully supports signed integers; the Z80 and 6502 support them less well, and the 8080 and 8085 have almost no such support at all. However, such support was needed very rarely in 8-bit software.

The 6809 was released in 1978, when the 16-bit era was beginning with the 8086, and has a highly developed instruction set, including multiplication of the two accumulators to obtain a 16-bit result in 11 clock cycles (for comparison, the 8086 requires 70 clock cycles for such an operation). The two accumulators can in several cases be combined into one 16-bit accumulator, which gives fast 16-bit instructions. The 6809 has two index registers and a record number of addressing modes among 8-bit processors – 12. Some of these modes are unique for 8-bit chips, such as indexing with auto-increment or auto-decrement, addressing relative to the program counter, and indexing with an offset. The 6809 offers the interesting possibility of using two types of interrupts: fast interrupts with automatic saving of only part of the registers, and interrupts that save all the registers. The 6809 has three interrupt inputs: FIRQ (fast maskable), IRQ (maskable), and NMI (non-maskable). It is also sometimes convenient to use the fast instructions for reading and setting all the flags at once.

However, memory operations take one clock cycle more than on the 6502. The index registers remained bulky 16-bit "dinosaurs" in the 8-bit world. Some operations simply shock with their slowness: for example, sending one byte from one accumulator to another takes 6 clock cycles, and exchanging their contents takes 8 (compare with the 8080, where a 16-bit exchange takes 4 clock cycles)! For some reason two stack pointers are provided at once – perhaps this was the influence of the dead-end VAX-11 architecture – which looks very awkward in an 8-bit architecture with 64 KB of memory. And even the existence of an instruction with the interesting name SEX cannot eliminate all the problems of the 6809. In general, the 6809 is still somewhat faster than the 6502 at the same frequency, but it requires the same memory speed. I managed to write a division routine for the 6809 with a 32-bit dividend and a 16-bit divisor (32/16 = 32,16) in just over 520 cycles; for the 6502 I could not achieve less than 650 clock cycles. The second accumulator is a big advantage, but other 6502 features, in particular the inverted carry flag, reduce this advantage to only about 25%, as the figures above show. Multiplication by a 16-bit constant, though, turned out to be slower than table multiplication on the 6502 with a 768-byte table. The 6809 allows quite compact and fast code to be written using the direct page addressing mode, but this mode makes the code a bit tangled. The essence of this addressing is to set the high byte of the data address in a special register and to specify only the low byte of the address in the instructions. The same scheme, only with a fixed high byte of zero, is used in the 6502, where it is called zero page addressing. Direct page addressing is an exact analogue of the use of the DS segment register in the x86, only not for 64 KB segments but for segments of only 256 bytes. Another artificiality of the 6800 architecture is the use of the big-endian byte order (most significant byte first), which slows down 16-bit addition and subtraction. The 6809 is not compatible with the 6800 at the machine-code level. The 6809 became the last 8-bit processor from Motorola; in further developments it was decided to use the 68008 instead.
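
As a small illustration (in C, which the author also uses for analogies elsewhere in this series), here is a minimal sketch of how a direct page address is formed; the memory array and the register values are of course invented for the example:

    /* A minimal C model of 6809 direct page addressing (illustration only):
       the DP register supplies the high byte of the effective address,
       the instruction supplies only the low byte. */
    #include <stdint.h>
    #include <stdio.h>

    static uint8_t memory[65536];

    /* effective address of a direct-page operand */
    static uint16_t direct_addr(uint8_t dp, uint8_t low_byte)
    {
        return (uint16_t)(dp << 8) | low_byte;
    }

    int main(void)
    {
        uint8_t dp = 0x20;                  /* DP register set to $20 */
        memory[direct_addr(dp, 0x35)] = 42; /* "STA <$35" writes to $2035 */
        printf("%d\n", memory[0x2035]);     /* prints 42 */
        /* on the 6502, zero page addressing is the same scheme with the
           high byte fixed at 0: operand $35 refers to address $0035 */
        return 0;
    }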

We can assume that Motorola spent a lot of resources promoting the 6809, and this has had a lasting effect on how the processor is remembered. There are many favorable reviews of the 6809, notable for a certain fuzziness, generalization, and inconsistency. The 6809 was positioned as an 8-bit super-processor for micro-mainframes. Several Unix-like operating systems were made for it: OS-9 and UniFlex. It was chosen as the main processor for the Apple Macintosh and, as the films about Steve Jobs suggest, only his emotional intervention forced the switch to the more promising 68000. Indeed, the 6809 is a good processor, but in general only slightly better than its competitors that appeared much earlier: the 6502 (three years earlier) and the Z80 (two). One can only guess what would have happened if Motorola had spent at least half of the effort it put into developing and promoting the 6809 on developing the 6502 instead.

The 6809 was used in several fairly well-known computer systems. The most famous among them is the American Tandy Color Computer (Tandy CoCo), along with its British, or more precisely Welsh, clone, the Dragon 32/64. The computer markets of the 1980s were notably opaque, and the Tandy CoCo was distributed mainly in the US. The Dragons, at first popular only in Britain, also gained some popularity in Spain. In France the 6809 for some reason became the basis of mass-market computers of the 80s, the Thomson series, which remained virtually unknown anywhere else. The 6809 was also used as a second processor in at least two systems: in the Commodore SuperPET 9000 series and in an extremely rare Tube-interface device for BBC Micro computers. This processor was used in other systems less well known to me, in particular Japanese ones. It also gained some popularity in the world of gaming consoles. It is worth mentioning one of these consoles, the Vectrex, which uses a unique technology – a vector display.



Tandy CoCo 3



All the 680x processors have interesting undocumented instructions with the fascinating name Halt and Catch Fire (HCF), which are used for testing at the electronics level, for example with an oscilloscope. Using one causes the processor to hang; only a reset can bring it back. These processors also have other undocumented instructions. In the 6800 there are, for example, instructions that are the opposite of the immediate register-load instructions, i.e. instructions that store a register value into the immediate constant itself!

Like the 8080, 8085 or Z80, the 6809 is very difficult to call a pure 8-bit processor. It is even more difficult to call the 6309 8-bit. The 6309 was produced by the Japanese company Hitachi as a processor fully compatible with the 6809. I was not able to find the exact year when its production began, but there is some evidence pointing to 1982. This processor could be switched into a new mode which, while maintaining almost full compatibility with the 6809, provided many more capabilities. These capabilities were absent from the official documentation and were only published in 1988 on Usenet. Two additional accumulators were added, but instructions using them were much slower than with the first two. The execution time of most instructions was greatly shortened. A number of instructions were added, among them a truly fantastic division for processors of this class – signed division of a 32-bit dividend by a 16-bit divisor (32/16 = 16,16) in 34 cycles, with the divisor taken from memory. Furthermore, 16-bit multiplication with a 32-bit result in 28 clocks appeared. Very useful instructions were also added for quickly copying blocks of memory, with a run time of 6 + 3n, where n is the number of bytes copied; you could copy with either decreasing or increasing addresses. The same instructions could also be used to quickly fill memory with a specified byte, and interrupts could occur while they were executing. New bit operations, a zero register, etc. appeared too. Interrupts were also raised on executing an unknown instruction and on division by 0. In a sense the 6309 was the pinnacle of technological achievement among 8-bit processors, or more precisely among processors with a 64 KB address space.

The 6309 is electrically fully compatible with the 6809, making it a popular upgrade for the Tandy CoCo or the Dragons. There are also special OS versions that use the new features of the 6309.

Edited by Jim Tickner and Ralph Kernbach.


Emotional stories about processors for first computers: part 4 (Zilog Z80)
2x2=4, mathematics
litwr

Zilog Z80


This processor, along with the 6502, became the main processor of the first personal computers. There are no dramatic events in the history of its appearance and use; there is only some intrigue in Zilog's failure to make the next generation of processors. The Z80 was first produced in 1976 and its variants are still in production. At one point even Bill Gates himself announced support for systems based on the Z80.

A number of coincidences are interesting. As in the case of the 6502, the main developer of the Z80, Federico Faggin, had left a large company – Intel. After working on the Z80, Federico hardly worked on the next-generation Z8000 processor. He left Zilog (which he had founded) in the early 80s and never worked on processors again. He then created several relatively successful startups in communication systems, touchpads and digital cameras. It can be mentioned that, while at Zilog, in addition to the Z80 he also developed the successful and still-produced Z8 microcontroller.

The Z80 is more convenient to include in computer systems than the 8080. It requires only one power supply voltage and has built-in support for dynamic memory refresh. In addition, although it is fully compatible with the 8080, it has many new instructions, a second set of the basic registers, and several completely new registers. It is interesting that Zilog refused to use the 8080 assembler mnemonics and introduced its own, better suited to the extended instruction set of the Z80. A similar story happened to the Intel x86 assembler in the GNU software world, where for some reason they also use their own conventions for writing assembly programs by default. The Z80 added support for an overflow flag; Intel officially added this flag only in the 8086. However, in the Z80 this flag is combined with the parity flag, so you cannot use both flags at the same time as you can on the 8086. In the Z80, as in the 6502, there are only basic checks of a single flag value, i.e. there are no checks of two or three flags at once, which are needed for the "greater" or "less or equal" comparisons, as well as for all signed comparisons. In such cases several checks must be done, while on the 8086, 6800 or PDP-11 one is enough.

Among the new Z80 instructions, the block memory copy instructions, at 21 cycles per byte, are especially impressive, as is an interesting instruction for searching memory for a byte (a C sketch of the block copy semantics follows the list below). However, the EXX instruction is the most interesting: it swaps the contents of 48 bits of register memory – the BC, DE and HL registers – with their counterparts in just 4 cycles! Even the 32-bit ARM needs at least 6 cycles for the same operation. The remaining additional instructions are not so impressive, although they can sometimes be useful. The following instructions were also added:


  • 16-bit subtraction with borrow and 16-bit addition with carry for 15 clocks;
  • unary minus for the accumulator for 8 clocks;
  • possibility to read from memory and write to it, using registers BC, DE, SP, IX, IY – not just HL;
  • shifts, rotates and input-output for all 8-bit registers;
  • instructions to check, set and reset a bit by its number;
  • jumps with offsets (JR);
  • a loop instruction.
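
As promised above, here is a rough C model of what the LDIR block copy does. It is only an illustrative sketch of the usual description of the instruction (HL = source, DE = destination, BC = byte count), not Z80 code; the addresses in the example are made up:

    /* A rough C model of the Z80 LDIR block copy (illustration only):
       each byte is copied from (HL) to (DE), both pointers advance,
       BC counts down; each iteration costs 21 cycles, the last one 16. */
    #include <stdint.h>
    #include <stdio.h>

    static uint8_t mem[65536];

    static void ldir(uint16_t hl, uint16_t de, uint16_t bc)
    {
        do {
            mem[de++] = mem[hl++];   /* copy one byte, advance both pointers */
            bc--;                    /* decrement the counter */
        } while (bc != 0);           /* repeat until BC reaches zero */
    }

    int main(void)
    {
        for (int i = 0; i < 5; i++)
            mem[0x8000 + i] = (uint8_t)('A' + i);
        ldir(0x8000, 0x9000, 5);
        printf("%c%c\n", mem[0x9000], mem[0x9004]); /* prints "AE" */
        return 0;
    }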

Most of the new instructions are rather slow, but using them well can still make code somewhat faster and significantly more compact. This particularly applies to the new 16-bit registers IX and IY, which can be used in new addressing modes. Interestingly, the index registers IX and IY appeared in the Z80 in order to attract 6800 users to the Z80! But I dare to express the opinion that operations with the Z80's index registers were made rather inefficient, due to the presence of an almost useless byte offset in the instructions that use these registers.

Many of the 8080's instructions became one clock faster on the Z80, and this is a very noticeable acceleration. But the basic instruction for 16-bit arithmetic, ADD, became one clock slower, so 16-bit arithmetic as a whole became only slightly faster, if at all.

The interrupt system became much more interesting than that of the 8080. With the Z80 you can use both non-maskable interrupts and three modes (one of them compatible with the 8080) for maskable ones. Maskable interrupt mode 2 is the most interesting, as it allows you to flexibly change the address of the interrupt-handling code.
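
A hedged sketch of the mode 2 mechanism, as it is usually described: the I register supplies the high byte of a table address, the interrupting device supplies the low byte, and the CPU jumps through the 16-bit pointer found there. The C below only models this lookup; the addresses are invented for the example:

    /* A simplified C model of the Z80 interrupt mode 2 vector lookup. */
    #include <stdint.h>
    #include <stdio.h>

    static uint8_t mem[65536];

    static uint16_t im2_handler_address(uint8_t i_reg, uint8_t device_byte)
    {
        uint16_t entry = (uint16_t)(i_reg << 8) | device_byte;
        /* the handler address is stored little-endian in the table */
        return (uint16_t)(mem[entry] | (mem[(uint16_t)(entry + 1)] << 8));
    }

    int main(void)
    {
        mem[0x3B00] = 0x34;   /* table at I = 0x3B, device supplies 0x00 */
        mem[0x3B01] = 0x12;   /* handler placed at 0x1234 */
        printf("%04X\n", im2_handler_address(0x3B, 0x00)); /* prints 1234 */
        return 0;
    }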

The Z80 has quite a few undocumented instructions. Many of them disappeared during the transition to CMOS technology, but those that survived have become virtually standard and have been documented by some firms. Especially useful are the instructions that allow you to work with the individual bytes of the clumsy 16-bit registers IX and IY. In addition to undocumented instructions, the Z80 also has other undocumented properties, for example two extra flags in the status register.

Of course the z80 even more so than the 8080 has the right to be called slightly 16-bit. The hypothetical bit index of the z80 is clearly slightly higher than for the 8080, but it is paradoxical that the ALU of the z80 is actually 4-bit! At the electronic level the z80 and 8080 are completely different chips.

Much has been written comparing the performance of the Z80 and the 6502, as these processors were very widely used in the first mass-market computers. There are several difficult points in this topic, and without understanding them it is very hard to remain objective. Thanks to its rather large number of registers, the Z80 is naturally run at a frequency higher than that of the memory. Therefore the Z80 at 4 MHz can use the same memory as the 6502 or 6809 at 1.3 MHz. According to many experienced programmers who wrote code for both processors, at the same frequency the 6502 is on average about 2.4 to 2.6 times faster than the Z80. The author of this material agrees with this. I only need to add that writing fast code for the Z80 is very difficult: you need to repeatedly optimize the use of the registers and to work with memory as much as possible through the stack. If you really try then, in my opinion, you can reduce the difference between the Z80 and 6502 to about 2.2. If you do not try and ignore timings, you can easily get a difference of up to 4. In some individual cases the Z80 can show very good timings. On the task of filling memory using the PUSH instruction the Z80 can even be slightly faster than the 6502, but at the cost of disabling interrupts. On copying memory blocks the Z80 is only 1.5 times slower. It is especially impressive that in dividing a 32-bit dividend by a 16-bit divisor the Z80 is slower by only a factor of 1.7. By the way, that notable division routine was implemented by a programmer from Russia. Thus we get that the ZX Spectrum with its Z80 at 3.5 MHz is about 1.5 times faster than the C64 with its 6502 at 1 MHz. It should also be noted that in most systems with the Z80 or 6502 some clock cycles are taken from the processor by the video signal generation circuits. For example, because of this the popular Amstrad CPC/PCW computers have an effective processor frequency of 3.2 MHz, not the full 4. On 6502 systems you can usually turn off the screen for maximum processor performance. If we take the memory frequency rather than the processor frequency as the basis, it turns out that the Z80 is 25-40% faster than the 6502. The last result can be illustrated by the fact that with 2 MHz memory the Z80 can operate at frequencies up to 6 MHz, while the 6502 only up to 2 MHz.

The Z80 was used in a very large number of computer systems. In the USA the Tandy TRS-80 was very popular; in Europe it was the ZX Spectrum and, later, the Amstrad CPC and PCW. Interestingly, the Amstrad PCW computers maintained their importance until the mid-90s and were massively and actively used for their intended purpose until the late 90s. Japan and other countries produced the MSX computers, which were quite successful around the world. The rather popular C128 could also use the Z80, but in this case the users were left in a rather embarrassing situation. This computer, released in late 1985, had a Z80 officially clocked at 2 MHz that really worked only at about 1.6 MHz. It was slower even than the first 8080-based systems of the mid-70s. The range of computers able to run the CP/M operating system includes at least three dozen fairly well-known systems.



Such a PC looked decent even in the mid-90's, but its z80 was slower than that in the ZX Spectrum



The fastest Z80-based computer system known to me is the BBC Micro with the 6 MHz Z80B Tube second processor, which was produced from 1984. The processor in this system runs at full speed, so to speak "without brakes". Similar devices had been produced for the Apple ][ since 1979. Some such Z80 cards later used the Z80H at 8 MHz and even higher. Interestingly, in 1980 Microsoft received its greatest revenue from the sale of such devices. We can also mention the Amstrad PcW16, produced from 1994, which uses a CMOS Z80 at a frequency of 16 MHz.

In Japan the R800 processor, compatible with the Z80, was made for the MSX TurboR systems (1990). The R800 added hardware 16-bit multiplication with a 32-bit result – although, when multiplying by a 16-bit constant, table multiplication with a 768-byte table is one clock faster. There are opinions that the R800 is just a simplified Z800 running at four times the bus frequency, which is about 7.16 MHz; so the R800's internal clock is about 28.64 MHz!

Zilog worked on improving the Z80 very inconsistently and extremely slowly. The first Z80 worked at frequencies up to 2.5 MHz; the Z80A, which soon appeared, had a frequency limit of 4 MHz. The latter became the basis of most popular Z80 computers. The Z80B appeared by 1980 but was used relatively rarely, for example in the aforementioned second-processor card for the BBC Micro or in the late (1989) Sam Coupé computer. The Z80H appeared by the mid-80s and could operate at frequencies up to 8 MHz; it was not used in well-known computers. Interestingly, Zilog products had special traps on all chips for those who tried to copy them. For example, the base Z80 had 9 traps and, according to accounts from those who tried copying it, they slowed the process down by almost a year.

A deeper upgrade of the Z80 was hampered by Zilog's desire to create processors competitive with 16-bit Intel processors. In 1978, a little later than the 8086, the Z8000 was released; it was not compatible with the Z80. This processor could not withstand competition from Intel and especially Motorola – the 68000 surpassed the Z8000 in almost all parameters – although the Z8000 was used in about a dozen different low-cost systems, usually running Unix variants. Interestingly, IBM did not even consider the Z8000 as a possible processor for the IBM PC, since Zilog was funded by Exxon, which was going to compete with IBM. Perhaps due to the lack of success of the Z8000, Zilog became an Exxon subsidiary by 1980. There was also an attempt to create a competitive 32-bit processor: in 1986 the Z80000 appeared, compatible with the Z8000, but it was never used anywhere.

One can only wonder why Zilog abandoned the approach that had shown super-successful results with the Z80, namely making processors software-compatible with Intel's, but better than them and at the same time completely different at the hardware level. Subsequently this approach was successfully used by many firms, in particular AMD, Cyrix and VIA.

Creating a new processor based on the Z80 was postponed until 1985, when the Z800 was produced. However, at that time Zilog's main efforts were directed at the Z80000, and the Z800 was released in very small numbers. In 1986, after the failure of the Z80000, the Z280 was released, an only slightly improved version of the Z800 (maybe it was just a rebranding). The Z800/Z280 could, in particular, run at an internal frequency several times higher than the bus frequency – a new idea that later brought great success to the Intel 486DX2 and 486DX4 processors. But, perhaps because of poor performance, the Z280, despite many technological innovations, could use only relatively low clock frequencies; this processor has also not been used anywhere. It is considered that the Z280 roughly matched the capabilities of the Intel 80286, but was significantly – at least 50% – slower at the same clock speed as the 80286. Perhaps if the Z280 had appeared 5 years earlier it could have been very successful.

The greatest success was achieved thanks to cooperation with the Japanese company Hitachi, which in 1985 released its super-Z80 (the HD64180), similar in capabilities to the Intel 80186. The HD64180 allowed the use of 512 KB of memory and added a dozen new instructions, but at the same time some almost-standard undocumented Z80 instructions were not supported. This processor was used in some computer systems. Zilog received a license for the HD64180 and began to produce it under the Z64180 marking. Zilog managed to slightly improve this processor, in particular adding support for 1 MB of memory, and released it by the end of 1986. This new processor was called the Z180 and became the basis of a family of processors and controllers with clock frequencies up to 33 MHz. It was used in some rare models of MSX2 computers, but more as a controller. It is a curious coincidence that the Z280 and Z180 appeared in the same year, as was the case with their approximate counterparts, the 80286 and 80186, four years before. In 1994 the 32-bit Z380 was made on the basis of the Z180; it retained compatibility with the Z80 and roughly corresponds to the capabilities of the Intel 80386 or Motorola 68020. In fact, Zilog lagged behind its competitors by almost 10 years. In the 21st century, again on the basis of the Z180, the successful eZ80 processor-controllers have been manufactured, with timings almost like those of the 6502. They are used in various equipment, in particular in network cards, DVD drives, calculators, etc.

Edited by Richard BN


Emotional stories about first processors for computers: part 3 (Motorola 68k)
2x2=4, mathematics
litwr

Motorola: from 68000 to 68040


For some time Motorola was the only company that could successfully compete with Intel in producing processors for personal computers.

The 68000 was released in 1979 and at first glance looked much more impressive than the 8086. It had 16 32-bit registers (more accurately, even 17), a separate program counter and a status register. It could address 16 MB of memory directly, which did not create any restrictions, for example, for large arrays. However, careful analysis of the 68000's features shows that not everything was as good as it seemed. In those years, having more than 1 MB of memory was an unattainable luxury even for medium-sized organizations. The 68000's code density was worse than the 8086's, which means that 68000 code with the same functionality occupied more space. The latter is also due to the fact that any instruction of the 68k processors must be a multiple of 2 bytes in length, while for the x86 the unit is 1 byte. But the information about code density is controversial, as there is evidence that in some cases the 68000 could have the better code density. Of the 16 registers of the 68k, 8 are address registers, which in some respects are slightly more advanced analogues of the x86 segment registers. The ALU and data bus are 16-bit, so operations with 32-bit data are slower than one might expect. The execution time of register-register operations on the 68000 is 4 cycles, while on the 8086 it is only 2.

As always with Motorola products, the architecture of the 68000 shows some clumsiness and contrived oddities. For example, there are two stacks and two carry flags (one for condition checks and another for operations). The oddities with the flags do not end there: for some reason many instructions, including even MOVE, zero the carry and overflow flags. Another oddity is that the instruction to read the status register with the arithmetic flags, which worked normally on the 68000, was made privileged in all processors starting with the 68010. Some operations are irritatingly unoptimized: for example, the CLR instruction, which writes zero to memory, is slower than writing a constant 0 with the MOVE instruction, and a shift left is slower than adding an operand to itself. There are some almost unnecessary instructions, for example there are both arithmetic and logical shifts to the left. Even the address registers, while seemingly superior to the 8086 segment registers, have a number of annoying disadvantages. For example, they need to be loaded with as many as 4 bytes, instead of 2 for the 8086, and of those four bytes one was superfluous. The 68000 instruction set reveals many similarities with the PDP-11 instruction set developed back in the 60s.

Code for the Motorola looks somewhat more cumbersome and clumsy compared to the x86 or ARM. On the other hand, the 68000 is faster than the 8086 – by my estimate by about 20-30%. The 680x0's code, however, has its own special beauty and elegance, and is less mechanical than the x86's. Additionally, as shown by the eab.abime.net experts, the code density of the 68k is often better than that of the x86.

Overall the 68000 is a good processor with a large instruction set. It was used in many now-legendary personal computers: the first Apple Macintosh computers, which were produced until the early 90s, the first Commodore Amiga multimedia computers, and the relatively inexpensive and high-quality Atari ST computers. The 68000 was also used in relatively inexpensive computers running Unix variants, in particular the rather popular Tandy 16B. Interestingly, IBM simultaneously developed the PC and the System 9000 computer based on the 68000, which was released less than a year after the PC.

The 68010 appeared clearly belatedly, only in 1982, at the same time as Intel released the 80286, which put personal computers on the same level as minicomputers. The 68010 is pin-compatible with the 68000, but its instruction set is slightly different, so replacing the 68000 with the 68010 never became popular. This incompatibility was caused by a rather contrived desire to bring the 68000 into better correspondence with the ideal theory of virtualization. The 68010 is only slightly – no more than 10% – faster than the 68000. Obviously the 68010 was losing badly to the 80286 and was even weaker than the 80186, which appeared in the same year. Like the 80186, the 68010 almost never found a place in personal computers.

The 68008 was also released in 1982, probably in the hope of repeating the success of the 8088. It is the 68000 but with an 8-bit data bus, which allowed it to be used in cheaper systems. But the 68008, like the 68000, does not have an instruction queue, which makes it about 50% slower than the 68000. Thus the 68008 may even be a little slower than the 8088, which is only about 20% slower than the 8086 thanks to the presence of the instruction queue.

Based on it, Sir Clive Sinclair made the Sinclair QL, a very interesting computer that, because of its lower price, could compete with the Atari ST and similar computers. But Clive, in parallel and clearly prematurely, began to invest heavily in the development of electric vehicles, leaving the QL (Quantum Leap) as a rather secondary task; this, combined with some unsuccessful design decisions, led the computer and the whole company to a premature end. The company became part of Amstrad, which refused to continue producing the QL.

It would be interesting to calculate the bit index for the 68000, which seems to me clearly higher than 16 although maybe it is not higher than 24.

Appearing in 1984, the 68020 returned Motorola to the first position again. Many very interesting and promising innovations were realized in this processor. The strongest effect certainly comes from the instruction pipeline, which sometimes allows up to three instructions to be executed at once! The 32-bit address bus looked a little premature in those years, and therefore a cheaper version of the processor (the 68020EC) with a 24-bit bus was available, but the 32-bit data bus looked quite appropriate and significantly sped up the processor. The built-in cache was an innovation; even though its capacity was a small 256 bytes, it significantly improved performance, because the main dynamic memory could not keep up with the processor. Fast enough operations for division (64/32 = 32,32) and multiplication (32*32 = 64) were added, at approximately 80 and up to 45 cycles respectively. Instruction timings were generally improved; for example, the division (32/16 = 16,16) began to execute in approximately 45 cycles (more than 140 cycles on the 68000). Some instructions in the most favorable cases can execute without occupying any clocks at all! New addressing modes were added, in particular with scaling – in the x86 this mode appeared only the next year with the 80386. Other new addressing modes allow double indirect addressing with several offsets; here the PDP-11 has been remarkably outdone.

Some new instructions, for example the bulky bit-field operations or the new decimal operations, which became of little use given fast division and multiplication, looked more like a fifth wheel than something essentially useful. The addressing modes with double indirection look interesting in theory, but in practice they are needed quite rarely and execute very slowly. Unlike the 80286, the 68020 takes time to compute the address of an operand, the so-called effective address. Division on the 68020 is still almost twice as slow as the fantastic division of the 80286. Multiplication and some other operations are also slower. The 68020 does not have a built-in memory management unit, and the rather exotic ability to connect up to eight coprocessors could not fix this.

The 68020 was widely used in mass-produced computers – the Apple Macintosh II, the Macintosh LC and the Commodore Amiga 1200 – and it was also used in several Unix systems.

The appearance of the 80386, with a built-in and very well-made MMU and 32-bit buses and registers, again put Motorola in position number 2. The 68030, appearing in 1987, briefly returned the leadership to Motorola for the last time. The 68030 has a built-in memory management unit and a doubled cache, split into separate instruction and data caches – a very promising novelty. In addition, the 68030 could use a faster memory access interface, which can speed up memory operations by almost a third. Despite all the innovations, the 68030 turned out to be somewhat slower than the 80386 at the same frequency. However, the 68030 was available at frequencies up to 50 MHz, and the 80386 only up to 40 MHz, which made top systems based on the 68030 slightly faster.

The 68030 was used in computers of the Apple Macintosh II series, Commodore Amiga 3000, Atari TT, Atari Falcon and some others.

With the 68040 Motorola once again tried to outperform Intel; this processor appeared a year after the 80486. However, the 68040's set of useful qualities was never able to surpass the 80486's. In fact, Motorola, with its more overloaded instruction set, was not able to keep supporting it and in a sense dropped out of the race. Only a very cut-down coprocessor for real numbers could be fitted into the 68040, and the chip itself ran significantly hotter than the 80486. According to the results on lowendmac.com/benchmarks, the 68040 is only about 2.1 times faster than the 68030, which means that the 68040 is slightly slower than the 80486 at the same frequency. The 68040 found almost no applications in popular computers. Some noticeable use was found only by its cheaper version, the 68LC040, which does not have a built-in coprocessor. However, the first versions of this chip had a serious hardware defect which did not allow even software emulation of the coprocessor to be used!

Motorola always had problems with mathematical coprocessors. As was mentioned above, Motorola never released such a coprocessor for the 68000/68010, while Intel had been selling its very successful 8087 since 1980. To get a significant performance boost, code for the 68882 needs to be compiled differently than for the 68881.

It is appropriate to add that the Intel x86 still has problems with its mathematical coprocessor. The accuracy of calculating some functions, for example the sine of certain arguments, is very low, sometimes no more than 4 digits. Therefore modern compilers often calculate such functions without using the services of the coprocessor.

Edited by Richard BN


Emotional stories about first processors for computers: part 2 (DEC PDP-11)
2x2=4, mathematics
litwr

Processors of DEC PDP-11


In the early 70s a roughly ten-year era of DEC's dominance began. DEC computers were significantly cheaper than those produced by IBM and therefore attracted the attention of small organizations for which IBM systems were unaffordable. The era of mass professional programming also begins with these computers. The PDP-11 computer series was very successful; various PDP-11 models were produced from the early 70s to the early 90s. They were successfully cloned in the SU and became the first popular mass-market computer systems there. Some of the SU-made PDP-11 compatible computers have several unique traits. For example, several models, like the DVK, are personal computers rather than minicomputers, and several, like the UKNC and BK, are pure personal computers. By the way, the mentioned BK became the first PC that ordinary people in the SU could buy, starting in 1985.

DEC also promoted the more expensive and complex computers of the VAX-11 family, and the situation around them was somewhat politicized. From the second half of the 70s DEC practically stopped development of the PDP-11 series. In particular, support for hexadecimal numbers was never introduced into the assembler. The performance of PDP-11 systems also remained virtually unchanged from the mid-70s.

The PDP-11 used various processors compatible with the main instruction set, for example the LSI-11, F-11 and J-11. In the late 70s DEC made a cheap processor, the T-11, for microcomputers. However, for unclear reasons, and despite the seemingly large amount of high-quality software that could eventually have been ported to systems using it, it was not picked up by the manufacturers of any computer systems. The only exception was one model of Atari gaming console. The T-11 found mass application only in the world of embedded equipment, although in terms of capabilities it was slightly above the Z80. The SU produced the K1801VM1, K1801VM2, K1801VM3, etc. processors, similar to DEC's, and also exact copies of DEC processors. The latter were much more expensive and were produced in small quantities.

The PDP-11 instruction set is almost completely orthogonal – a pleasant quality, but when taken to the extreme it can create ridiculous instructions. The instruction set of the PDP-11 processors has influenced many architectures, in particular the Motorola 68000.

The PDP-11 instruction set is strictly 16-bit. All 8 general-purpose registers (and the program counter in this architecture is the ordinary register R7) are 16-bit, the processor status word (which contains the typical flags) is 16-bit too, and instructions are from 1 to 3 16-bit words long. Any operand of an instruction can be of any type (although there are exceptions, for example the XOR instruction) – this is orthogonality. Operand types include registers and memory locations. Programmers in the SU in the 80s sometimes didn't understand why the Intel x86 instruction set lacks memory-to-memory instructions. This was the influence of the PDP-11 school, where you can easily write the full address of each operand. This is indeed slow, and especially slow on systems with the typically slow RAM in use since the early 90s. A memory address can be formed from a register, a register with an offset, or a register with autoincrement or autodecrement. A peculiarity of the PDP-11 instruction set is the possibility of double indirect access to memory through a register; for example, MOV @(R0)+,@-(R1) means the same as the statement **--r1 = **r0++; in C/C++, where r0 and r1 are declared as signed short **r0, **r1;.

Another example: the instruction MOVB @11(R2),@-20(R3) corresponds to **(r3-20) = **(r2+11);, where r2 and r3 are declared as char **r2, **r3;.

On modern popular architectures one instruction can be insufficient for such cases; at least 10 instructions may be required. It is also possible to form an address relative to the current value of the program counter. I will give another example, with simpler addressing. The x86 instruction ADD [BX+11],16 corresponds to ADD #16,11(R4). In DEC assemblers it is customary to write the source operand first and the destination second, unlike Intel assemblers, where it is the other way around. There is reason to believe that the GNU assembler for the x86 was made under the influence of the PDP-11 assembler.

Division and multiplication instructions are signed only and are not available on all processors. Decimal arithmetic is optional too – so-called commercial arithmetic in DEC terminology. As an oddity of full orthogonality I will give the example of the instruction MOV #11,#22, which after execution turns into MOV #11,#11 – an example of using an immediate constant as a destination operand. Another curious instruction is the unique MARK, whose code is meant to be placed on the stack and which may never appear explicitly in a program. Calling subroutines in the PDP-11 architecture is also somewhat peculiar. The corresponding instruction first saves the chosen register (it can be any) on the stack, then saves the program counter in this register, and only then writes the new value to the program counter. The return-from-subroutine instruction must do the reverse and must know which register was used in the call. Strange effects can sometimes be obtained by using the program counter as a normal register.

It is interesting that among PDP-11 programmers there is a culture of working directly with machine code. Programmers could, for example, debug without a disassembler, or even write small programs directly into memory, without an assembler!

Indeed, the instruction timings were not very fast. It was surprising to find out that on the BK home computer the instruction to move from register to register takes as many as 12 clocks (10 clocks when the code runs from ROM), and two-operand instructions with double indirect addressing execute in more than 100 clocks. The Z80 does a 16-bit register transfer in 8 clocks. However, the slowness of the BK is caused not so much by the processor as by the poor quality of the SU-made RAM, to whose peculiarities the BK had to be adapted. With fast enough memory the BK would also move 16 bits between registers in 8 clock cycles. At one time there was a lot of controversy over which is faster, the BK or the Sinclair ZX Spectrum. I must say that the Spectrum is one of the fastest mass-market 8-bit personal computers when using the top 32 KB of memory. Therefore it is not surprising that the Spectrum is faster than the BK, but not by much. And if the BK worked with fast enough memory, it could even be a bit faster.

Code density is also rather a weak point of the PDP-11 architecture. Instruction lengths must be multiples of the machine word – 2 bytes – which is especially wasteful when working with byte arguments or with simple instructions like setting or clearing a flag.

There were interesting attempts to make a personal computer based on the PDP-11 architecture. One of the first PCs in the world, which appeared only a bit later than the Apple ][ and Commodore PET and a bit earlier than the Tandy TRS-80, was the Terak 8510/a, which had black-and-white graphics and the ability to load an incomplete variant of Unix. This computer was quite expensive and, as far as I know, was used only in higher education in the USA. The Heathkit H11, a kit computer, was produced from 1978. DEC itself also tried to make its own PC, but very inconsistently: DEC, for example, produced PCs based on the Z80 and 8088, explicitly playing against its own main developments. The PDP-11-based PCs DEC PRO-325/350/380 have some rather contrived incompatibilities with the underlying architecture, which impeded the use of some software. The personalization of minicomputer technology turned out best of all in the USSR, which produced the BK, DVK, UKNC, ... By the way, the Electronica-85 was a fairly accurate clone of the DEC PRO-350. In addition, the CP1600 processor, akin to the PDP-11 architecture, was used in the Intellivision game consoles, which were popular in the early 80s.



A 16-bit home computer made in the USSR (1987 model) – it is almost PDP-11 compatible



The K1801VM2 processor, which was used in the DVK, is about two times faster than the K1801VM1. The K1801VM3 is even faster and has performance close to the Intel 8086.

The processors of the top PDP-11 computers can address up to 4 MB of memory, but no more than 64 KB can be allocated to one program. In operations per megahertz the performance of these processors is also close to the 8086, although still lower.

Edited by Richard BN


Emotional stories about first processors for computers: part 1 (Intel x86)
2x2=4, mathematics
litwr


Intel: from 8086 to 80486


One of the best processors made in the 70s is definitely the 8086, along with its cheaper near-analogue, the 8088. The architecture of these processors is pleasantly distinguished by the absence of any notable copying of the other processors developed and in use at the time. It is also distinguished by its adherence to abstract theory, the thoughtfulness and balance of the architecture, its solidity, and its focus on further development. Among the drawbacks of the x86 architecture one can name a certain cumbersomeness and a tendency toward an extensive growth in the number of instructions.

One of the brilliant design decisions of the 8086 was the invention of segment registers. This simultaneously achieved two goals: the "free" ability to relocate program code up to 64 KB in size (a decent amount of memory for one program right up to the mid-80s), and access to up to 1 MB of address space. Note also that the 8086, like the 8080 or Z80, has a special address space of 64 KB for I/O ports (256 bytes for the 8080 and 8085). There are only four segment registers: one for code, one for the stack, and two for data. Thus 64*4 = 256 KB of memory is available for quick use, and that was a lot even in the mid-80s. In fact, there is no problem with code size, since long (far) subroutine calls, which load and store a full two-register address, can be used. There is only a 64 KB limit on the size of one subroutine – enough even for many modern applications. Some problem is created by the impossibility of fast addressing of data arrays larger than 64 KB: when using such arrays, a segment register and the address itself must be loaded on each access, which slows down work with such large arrays several times.
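
A minimal C model of this address formation (an illustration, not a description of the hardware itself): the 20-bit physical address is segment * 16 + offset, so relocating a program only means changing the segment value, not the code:

    /* 8086 real-mode address formation, modelled in C (illustrative). */
    #include <stdint.h>
    #include <stdio.h>

    static uint32_t physical(uint16_t segment, uint16_t offset)
    {
        return ((uint32_t)segment << 4) + offset;   /* 16 * segment + offset */
    }

    int main(void)
    {
        /* the same offset inside a 64 KB segment... */
        printf("%05X\n", physical(0x1000, 0x0100)); /* prints 10100 */
        /* ...maps to a different physical area after "relocating" the segment */
        printf("%05X\n", physical(0x2345, 0x0100)); /* prints 23550 */
        return 0;
    }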

The segment registers are implemented in such a way that their presence is almost invisible in the machine code, so when the time came it was easy to abandon them.

The architecture of the 8086 retained its proximity to the architecture of the 8080, which made it possible to transfer programs from the 8080 (or even the Z80) to the 8086 with relatively little effort, especially if the source code was available.

The 8086's instructions are not very fast, but they are comparable to its competitors', for example the Motorola 68000, which appeared a year later. One of the innovations, which somewhat accelerated the rather slow 8086, was the instruction queue.

The 8086 uses eight 16-bit general-purpose registers, some of which can be used as pairs of one-byte registers, and some as index registers. Thus the 8086's registers show some heterogeneity, but it is well balanced and the registers are very convenient to use. This heterogeneity, by the way, allows denser code. The 8086 uses the same flags as the 8080, plus a few new ones. For example, a flag typical of the PDP-11 architecture appeared – single-step execution.

The 8086 allows very interesting addressing modes: for example, the address can be made up of the sum of two registers and a constant 16-bit offset, onto which the value of one of the segment registers is superimposed. Of the three components that make up the address, you can keep only two or even one. This is a case where the PDP-11 cannot manage with a single instruction. Most 8086 instructions do not allow both operands to be in memory; one of the operands must be a register. But there are string instructions that work precisely memory-to-memory. The string instructions allow fast block copying (17 cycles per byte or word), searching, filling, loading and comparing. In addition, string instructions can be used when working with I/O ports. The idea of the 8086 instruction prefixes is very interesting: it provides often very useful additional functionality without significantly complicating the instruction encoding scheme.

The 8086 has one of the best designs for working with the stack among all computer systems. Using only two registers (BP and SP), the 8086 allows all the problems of organizing subroutine calls with parameters to be solved.

Among the instructions there are signed and unsigned multiplication and division. There are even unique instructions for decimal correction of multiplication and division results. It is hard to say that anything is clearly missing from the 8086 instruction set – quite the contrary. The division of a 32-bit dividend by a 16-bit divisor to obtain a 32-bit quotient and a 16-bit remainder may require up to 300 clock cycles – not particularly fast, but several times faster than such a division on any 8-bit processor (except the 6309) and comparable in speed with the 68000. Division on the x86 has one unexpected feature – it corrupts all the arithmetic flags.

It is worth adding that in the x86 architecture the XCHG instruction inherited from the 8080 was improved. In addition, later processors introduced the XADD, CMPXCHG and CMPXCHG8B instructions, which can also perform an atomic exchange of arguments. Such instructions are one of the distinctive features of the x86; they are hard to find on processors of other architectures.

It can be summarized that the 8086 is a very good processor, combining ease of programming with adaptation to the memory size limitations of its time. The 8086 itself was used comparatively rarely, giving way to the cheaper 8088, which became the first processor of the mainstream personal computer architecture – the IBM PC compatibles. The 8088 used an 8-bit data bus, which made it somewhat slower, but allowed systems built on it to be more affordable to customers.

Interestingly, Intel fundamentally refused to make improvements to its existing processors, preferring instead to develop their next generations. One of Intel's largest second sources, the Japanese corporation NEC, which was much larger than Intel in the early 80s, decided to upgrade the 8088 and 8086, launching the V20 and V30 processors, which were pin-compatible with them and about 30% faster. NEC even offered to let Intel become its second source! Intel instead launched a lawsuit against NEC, which, however, it could not win. For some reason this big clash between Intel and NEC is still completely ignored by Wikipedia.

The 80186 and 80286 appeared in 1982; thus it can be assumed that Intel had two almost independent development teams. The 80186 was the 8086 improved with several new instructions and shortened timings, plus several support chips typical of the x86 architecture integrated onto the same chip: a clock generator, timers, DMA, an interrupt controller, a delay generator, etc. Such a processor, it would seem, could greatly simplify the production of computers based on it, but because the embedded interrupt controller was for some reason not compatible with the IBM PC, it was almost never used in any PC. The author knows only of the BBC Master 512, based on the BBC Micro, which did not use the built-in circuits, not even the timer; there were, however, several other systems using the 80186. The addressable memory of the 80186 remained, as with the 8086, at 1 MB. The Japanese corporation NEC produced analogues of the 80186 that were compatible with the IBM PC.

The 80286 had even better timings than the 80186, among which a simply fantastic division (32/16 = 16,16) in 22 clock cycles stands out – since then nobody has learned how to do this division any faster! The 80286 supports all the new instructions of the 80186 plus many instructions for working in the new, protected mode. The 80286 became Intel's first processor with built-in support for protected mode, which made it possible to organize memory protection, proper use of privileged instructions, and access to virtual memory. Although the new mode created many problems (protected mode was rather unsuccessful) and was used relatively rarely, it was a big breakthrough. In this new mode the segment registers acquired a new quality, allowing up to 16 MB of addressable memory and up to 1 GB of virtual memory per task. The big problem with the 80286 was the inability to switch from protected mode back to real mode, in which most programs worked. Using the "secret" undocumented instruction LOADALL, it was possible to use 16 MB of memory while remaining in real mode.
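
A very simplified C sketch of this new quality of the segment registers, following the usual description of the 80286: in protected mode the value in a segment register is a selector that indexes a descriptor table, and the descriptor supplies a 24-bit base and a limit. The access-rights fields are omitted here, and the table contents are invented for the example:

    /* 80286 protected-mode address translation, heavily simplified. */
    #include <stdint.h>
    #include <stdio.h>

    struct descriptor {
        uint32_t base;    /* 24-bit segment base address */
        uint16_t limit;   /* segment size minus 1 */
    };

    static struct descriptor gdt[8192];   /* descriptor table */

    static uint32_t translate(uint16_t selector, uint16_t offset)
    {
        const struct descriptor *d = &gdt[selector >> 3]; /* bits 15..3 index the table */
        if (offset > d->limit)
            return 0xFFFFFFFF;            /* would raise a protection fault */
        return d->base + offset;          /* up to 16 MB of physical memory */
    }

    int main(void)
    {
        gdt[1].base = 0x200000;           /* a segment placed at 2 MB */
        gdt[1].limit = 0xFFFF;
        printf("%06X\n", translate(1 << 3, 0x0010)); /* prints 200010 */
        return 0;
    }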

In the 80286 the calculation of an operand's address became a separate circuit and stopped slowing down instruction execution. This added interesting possibilities: for example, with the instruction LEA AX,[BX+SI+4000] it became possible, in just 3 cycles, to perform two additions and transfer the result to the AX register!

The segment registers in protected mode became part of a complete memory management unit. In real mode these registers only partially provided the functionality of the MMU.

The number of manufacturers and specific systems using the 80286 is huge, but the first computers were, of course, the IBM PC ATs, with speed figures that were almost fantastic for a personal computer. With these computers memory began to lag behind the speed of the processor and wait states appeared, but this still seemed like something temporary.

In early versions of the 80286, as in the 8086/8088, interrupt handling was not implemented 100% correctly, which in very rare cases could lead to very unpleasant consequences. For example, the POPF instruction on the first 80286s always enabled interrupts during its execution, and on the 8086/8088, when an instruction with two prefixes (take, for example, REP ES:MOVSB) was interrupted, one of the prefixes was lost on return from the interrupt.

The protected mode of the 80286 (segmented) was rather inconvenient: it divided all memory into segments of no more than 64 KB and required complicated software support for virtual memory. The 80386, which appeared in 1985, made working in protected mode quite comfortable; it allowed the use of up to 4 GB of addressable memory and easy switching between modes. In addition, to support multitasking of programs written for the 8086, the virtual 8086 mode was introduced. For virtual memory it became possible to use a relatively easy-to-manage page mode. The 80386, for all its innovations, remained fully compatible with programs written for the 80286. Among the innovations of the 80386 are also the extension of the registers to 32 bits and the addition of two new segment registers. The timings changed, but not uniformly. A barrel shifter was added, which allowed multi-bit shifts with the timing of a single shift. However, for some reason this innovation considerably slowed down the rotate instructions. Multiplication became slightly slower than on the 80286. Working with memory, on the contrary, became a little faster, but this does not apply to the string instructions, which remained faster on the 80286. The author of this material has often come across the view that in real mode, with 16-bit code, the 80286 is in the end still a little faster than the 80386 at the same frequency.

Several new instructions were added to the 80386, most of which just gave new ways of working with data, essentially duplicating, with some optimization, instructions that already existed. For example, the following instructions were added:


  • to check, set and reset a bit by number, similar to those that were made for the z80;
  • bit-scan BSF and BSR (see the sketch after this list);
  • copy a value with a signed or zero bit extension, MOVSX and MOVZX;
  • setting a value depending on the values of operation flags by SETxx;
  • shifts of double values by SHLD, SHRD.
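
For the bit-scan instructions from the list above, here is a C model of what they compute (illustrative only; on the real instructions a zero operand sets ZF and leaves the result undefined):

    /* BSF returns the index of the lowest set bit, BSR of the highest. */
    #include <stdint.h>
    #include <stdio.h>

    static int bsf(uint32_t x)
    {
        for (int i = 0; i < 32; i++)
            if (x & (1u << i)) return i;
        return -1;                      /* corresponds to the ZF=1 case */
    }

    static int bsr(uint32_t x)
    {
        for (int i = 31; i >= 0; i--)
            if (x & (1u << i)) return i;
        return -1;
    }

    int main(void)
    {
        printf("%d %d\n", bsf(0x48), bsr(0x48)); /* prints "3 6" */
        return 0;
    }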

Before the 80386, x86 processors could use only short conditional jumps, with a one-byte offset – this was often not enough. With the 80386 it became possible to use offsets of two or four bytes, and although the code of the new jumps became two or three times longer, their execution time remained the same as for the old short jumps.

Debugging support was radically improved by the introduction of 4 hardware breakpoints; using them it became possible to stop a program even at memory addresses that cannot be modified.

Protected mode became much easier to manage than on the 80286, which turned a number of inherited instructions into unnecessary rudiments. In the main protected mode, the so-called flat mode, segments of up to 4 GB are used, which turns all the segment registers into an unobtrusive formality. The semi-documented unreal mode even allowed all the memory to be used as in flat mode while staying in real mode, which is easy to set up and control.

Starting with the 80386, Intel refused to share its technology, becoming in fact the monopoly processor manufacturer for the IBM PC architecture and, with the weakening of Motorola's position, for other personal computer architectures as well. Systems based on the 80386 were very expensive until the early 90s, when they finally became available to mass consumers at frequencies from 25 to 40 MHz. With the 80386, IBM began to lose its position as the leading manufacturer of IBM PC compatible computers. This showed, in particular, in the fact that the first PC based on the 80386 was made by Compaq in 1986.

It is hard to hold back admiration for the volume of work done by the creators of the 80386, and for its results. I even dare to suggest that the 80386 contains more achievements than all the technological achievements of mankind before 1970, and maybe even before 1980.

Quite interesting is the topic of bugs in the 80386; I will write about two. The first chips had some instructions which later disappeared from the manuals for this processor and stopped executing on later chips – the IBTS and XBTS instructions. All 80386DX/SX chips produced by both AMD and Intel (which reveals their curious internal identity) have a very strange and unpleasant bug: the value of the EAX register is destroyed after saving all the registers to the stack or restoring them with PUSHAD or POPAD, when this is followed by an instruction that uses an address involving the BX register. In some situations the processor could even hang. A nightmare bug, and a very widespread one, but Wikipedia still does not even mention it. There were other bugs too, of course.

The emergence of the ARM changed the situation in the world of computer technology. Despite their problems, the ARM processors continued to develop. Intel's answer was the 80486. In the struggle for speed and for first place in the world of advanced technology, Intel even decided to use a cooling fan, which has spoiled the look of the PC to this day.

In the 80486 the timings of most instructions were improved, and some began to execute, as on the ARM processors, in one clock cycle – although multiplication and division for some reason became slightly slower. What was especially strange was that a single shift or rotate of a register began to run even slower than on the 8088! There was a built-in cache that was quite big for those years, 8 KB in size. There were also new instructions, for example CMPXCHG, which took the place of the quietly vanished IBTS and XBTS (interestingly, as a secret, this instruction was already available on late 80386s). There were very few new instructions – only six – of which it is worth mentioning BSWAP, a very useful instruction for changing the byte order in a 32-bit word. A big and useful innovation was the built-in arithmetic coprocessor – no other manufacturer had made anything similar.

The first systems based on the 80486 were incredibly expensive. Quite unusually, the first computers based on the 80486, the VX FT model, were made by the English firm Apricot – their price in 1989 ranged from 18 to 40 thousand dollars, and the system unit weighed over 60 kg! IBM released its first computer based on the 80486 in 1990; it was the PS/2 Model 90, costing $17,000.

It is hard to imagine Intel processors without secret, officially undocumented features. Some of these features have been hidden from users since the very first 8086. For example, the fact – albeit a useless one – that the second byte of the decimal correction instructions (AAD and AAM) matters and can take values other than 10 was documented only for the Pentium, after 15 years! More unpleasant was the silence about the shortened AND/OR/XOR instructions with a byte immediate operand, for example AND BX,7 with a three-byte encoding (83 E3 07). These instructions, which make code more compact – especially important on the first PCs – were quietly inserted into the documentation only for the 80386. It is interesting that Intel's manuals for the 8086 and 80286 hint at these instructions, but give no specific opcodes for them, unlike the similar ADD/ADC/SBB/SUB instructions, for which full information was provided. This, in particular, meant that many assemblers (all of them?) could not produce the shorter encodings. Another group of secrets might simply be called strange: a number of instructions have two opcodes. For example, the instructions SAL and SHL (opcodes D0 E0, D0 F0 or D1 E0, D1 F0). Usually, and maybe always, only the first opcode is used; the second, secret one is almost never used. One can only wonder why Intel so carefully preserves these superfluous duplicate opcodes cluttering the opcode space. The SALC instruction waited for official documentation until 1995 – almost 20 years! The debugging instruction ICEBP officially did not exist for 10 years, from 1985 to 1995. Most has been written about the secret instructions LOADALL and LOADALLD, although they will remain forever secret, as they could be used for easy access to large amounts of memory only on the 80286 and 80386 respectively. Until recently there was some intrigue around the UD1 (0F B9) instruction, which unofficially served as an example of an invalid opcode. The unofficial has recently become official.
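
As a purely illustrative C model of this "secret" generality of AAM and AAD (the register names here are just function parameters, not real CPU state):

    /* The documented forms of AAM and AAD hard-code 10 as the second byte,
       but any base works: AAM imm8 sets AH = AL / imm8, AL = AL % imm8;
       AAD imm8 sets AL = AL + AH * imm8, AH = 0. */
    #include <stdint.h>
    #include <stdio.h>

    static void aam(uint8_t *ah, uint8_t *al, uint8_t base)
    {
        *ah = *al / base;
        *al = *al % base;
    }

    static void aad(uint8_t *ah, uint8_t *al, uint8_t base)
    {
        *al = (uint8_t)(*al + *ah * base);
        *ah = 0;
    }

    int main(void)
    {
        uint8_t ah = 0, al = 0x3F;      /* 63 */
        aam(&ah, &al, 16);              /* "AAM 16" splits AL into nibbles */
        printf("%X %X\n", ah, al);      /* prints "3 F" */
        aad(&ah, &al, 16);              /* "AAD 16" packs them back */
        printf("%X\n", al);             /* prints "3F" */
        return 0;
    }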

In the USSR the production of clones of the 8088 and 8086 processors was mastered, but the 80286 could not be fully reproduced there.

Edited by Jim Tickner, BigEd and Richard BN.