
Emotional stories about processors for first computers: part 13 (IBM 370)

IBM System/370



This part primarily focuses on comparing the mainframe machine language with other systems that were popular from the 70s through the 90s – primarily the x86, 68k, VAX, and ARM. The System/390 and especially the System/Z are covered only in fragments; the main attention is paid to the System/370.

The first System/360 began shipping to customers in 1965, and the more advanced System/370 followed in 1970. IBM maintains software compatibility with these systems to this day! Surprisingly, until the System/390, which, as you can guess, began shipping in 1990, these mainframes worked with 24-bit addresses, i.e. they could address no more than 16 megabytes of memory – the same amount as, for example, the 68000, released in 1979, or the 65816 and 32016, released in 1982. The VAX supported 32-bit addressing from the start, and popular processors such as the 68020 and 80386, which appeared in the mid-80s, also supported 32-bit addresses. In fact, 16 MB of memory was already not enough for the top systems of the second half of the 80s. However, from 1983 IBM produced 370-compatible computers that could, as an extension, use 31 bits for an address, which removed the memory-capacity problem for its best computers. Unusually and uniquely, these extensions and the System/390 used 31-bit rather than full 32-bit addressing. In 2000, IBM announced the first System/Z, which uses 64-bit addresses and data. The System/Z has been built on single-chip processors since 2008. Since 2007, IBM has been trying to combine the Z architecture with the POWER architecture in a single chip, but so far without success. So far, only Intel has managed to combine CISC and RISC in one chip – the Pentium Pro of 1995 was the first chip of this kind.



The IBM System/370-145 with the 2401 tape unit and a printer instead of a display, 1971. It may be surprising that there was no display in this very expensive system, given that TV sets had been mass-produced for over 20 years by then



By the way, some computer authorities believe that the first serially produced personal computer was the IBM 5100, first manufactured in 1975, which could execute System/360 instructions via a hardware emulator. Improved versions of it were produced until the mid-80s, although most likely the first was actually the Wang 2200. At around $10,000, the first personal computers were clearly not for home use. Surprisingly, the IBM 5100 running Basic was several times slower than the first cheap personal computers, such as the Apple II.




The IBM 5100, a version with APL support



With the advent of the IBM PC architecture, which, as it turned out, determined the mainstream of computing for decades, IBM tried in 1983 to combine almost all the best computer technologies in a single product. The PC XT/370 combined elements of the System/370, the IBM PC XT, the Motorola 68000, and the Intel 8087. The XT/370 could be used as a smart terminal for working with a mainframe, as a regular IBM XT, or to run mainframe software directly. Interestingly, the XT/370 supported virtual memory, which required two 68000s. In 1984, with the introduction of the PC AT, an improved version of the personal mainframe, the AT/370, was released; in mainframe mode it was about twice as fast as the XT/370. This was not the end of the history of such systems: from the 90s, similar products corresponding to the System/390 were produced. As far as I know, no such hardware has been made for the System/Z.

For its mainframes, IBM uses a business model that is rather unusual today: the computers are not sold but leased. One advantage of this model is that it guarantees constant hardware upgrades – outdated equipment is automatically replaced with updated equipment of the corresponding class. The model also has drawbacks. A disadvantage particularly noticeable to those who study the history of computing is that computers that have served their time are almost always disposed of, and are therefore almost impossible to find in any museum.

It was amazing to find a live IBM 4361 at the LCM! However, there is reason to believe that it may not be real hardware. For some reason, museum visitors have no access to this computer. It is also unclear which model is supposedly represented there, even though the museum's other computers are identified very accurately. Three models of the IBM 4361 are known – 3, 4, and 5, with model 3 appearing later than models 4 and 5 – but the system in the museum identifies itself as model 1. It may be a prototype. However, the museum staff did not answer a direct question asking for help with identification, even though they answer other, often rather more complex, questions quite quickly. Some peculiarities of code execution timings give grounds – though not absolutely firm ones – to suspect that it is actually an emulator accessed over the network. This summer (2020), due to the Covid-19 pandemic, the museum overtly switched to an emulator... There is still a chance to reach real mainframes via the HNET network, but I have not yet succeeded.

But be that as it may, everyone can connect and try working in much the same environment that highly paid professionals used from the mid-70s. The prices were such that today they are difficult to believe. For example, an hour of computer time cost more than $20 in the mid-80s, and you still had to pay extra for disk space! True, this was for mainframe processor time, not time at the terminal through which the work was done; that is why, for example, when editing a text someone might pay for only 5 minutes of mainframe time during an hour of actual work. The prices of the mainframes themselves were also fantastic. For example, Intel employees recall that in the early 80s they were given only one mainframe to work with; its performance was 10 MIPS, and its price was about 20 million dollars – at a time when the dollar was worth three times more than today! This price does seem to be something of an exaggeration, though: typical mainframe prices were in the hundreds of thousands of dollars. The cheapest mainframes could cost tens of thousands, the most expensive up to several million. For example, the Cray-1 supercomputer cost 8 million in 1978, the IBM 4361 model 4 mainframe about 130 thousand in 1985, the IBM 3081 model QX mainframe more than 6 million in the same 1985, and the IBM 4321 mini-mainframe more than 80 thousand in 1982. Now even a card-sized Raspberry Pi costing a few dollars can easily deliver over 1000 MIPS. By the way, on a Raspberry Pi, or almost any modern computer, you can run an IBM/370 emulator that will work much faster than any IBM system from the 80s or even 90s. However, the emulator needs to be configured, and not all useful programs for the IBM/370 are freely available, so free access to a well-tuned system is often the best way to work with a mainframe. Surprisingly, such access programs as 3270 terminal emulators are available even on mobile phones!
By the way, I managed to set up my own VM/CMS system on the Hercules emulator and work out file transfer, but it took at least a week of my time.

The Hercules emulator can also emulate the later IBM/390 and IBM/Z, but this is much more difficult to do because of software-licensing problems. As an illustration of such problems, I will cite a well-known case when IBM insisted on removing the section on emulation from an already published book! In modern electronic versions of this book the section does not exist; it can only be found in the printed edition or as a separate file on sites dedicated to free software. The fact is that from the early 2000s, emulation on ordinary personal computers could be noticeably faster than execution on much more expensive mainframes. IBM therefore changed the licenses for its software so that it could only legally be used on hardware purchased from IBM. Of course, it is not that emulators are faster than the best mainframes – they merely demonstrate a markedly better ratio of performance to cost.

One way to work with the Z or 390 systems is to install Linux into an emulator of these systems; at least the Ubuntu and Debian distributions are available for the 390 and Z. It is worth noting here that the rapid development of Linux is largely due to significant support from IBM – in particular, IBM invested a billion dollars in Linux development in 2001.

Let's now look at the features of the machine language of systems compatible with the 360. The basic assembler of these systems is called BAL, Basic Assembly Language. Surprisingly, if the rumors about IBM are to be believed, assembler is still one of its main working programming languages.

The assembler of the mainframes in question has a number of archaic features that were no longer present in later well-known architectures. For example, BAL mnemonics encode the types of their arguments. Consider the x86 instructions MOV EAX,EBX and MOV EAX,address – both use the mnemonic MOV. In BAL, different mnemonics are used for such cases: LR and L, as in the commands LR 0,1 and L 0,address respectively. On the other hand, distinct mnemonics make it possible to name registers with plain numbers, although for programming convenience macros R0, R1, ... for the numbers 0, 1, ... are usually the first thing defined in macro packages. Another archaism is the use of jumps to labels in conditional assembly constructs, although in my humble opinion this is sometimes more convenient than block structures. But the most famous archaism is the use of the EBCDIC encoding for character data. In this encoding, strange even for its own day, the letters of the English alphabet are not encoded consecutively: for example, the letter I has code 201, while the next letter, J, has code 209! The encoding derives from punched-card technologies that originated in the pre-computer era. The System/360 also supported ASCII in hardware, but in an ancient and long-forgotten variant in which the character for the digit 0 has code 80, not 48 as now. As far as I know, it was better not even to try to use ASCII on IBM mainframes; ASCII support was removed in the System/370 and reintroduced at a new level in the System/390. Some BAL mnemonics are striking in their extreme brevity, even non-mnemonicity: N means AND, O – OR, X – XOR, A – ADD, S – SUBTRACT, M – MULTIPLY, ...
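
The non-contiguous alphabet is easy to see from Python, whose standard codec library ships cp037, one common EBCDIC code page (a sketch for illustration; the helper name is mine):

```python
def ebcdic_code(ch: str) -> int:
    # cp037 is an EBCDIC code page bundled with Python's codecs
    return ch.encode("cp037")[0]

# Letters A-I, J-R, S-Z sit in three separate "islands":
print(ebcdic_code("I"))                  # 201 (0xC9)
print(ebcdic_code("J"))                  # 209 (0xD1) - a gap of 8, not 1!
print([ebcdic_code(c) for c in "RST"])   # [217, 226, 227] - another gap
```

This is exactly why naive "add 1 to get the next letter" tricks from the ASCII world break on EBCDIC machines.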

BAL works with three basic data types: binary integers, decimal numbers, and real numbers. The System/390 added another format for real numbers, and some Z systems also support a completely unique data type – decimal floating-point numbers. The instructions for each type form a special, rather isolated class. With very few exceptions, all IBM 360-compatible systems support the decimal and real arithmetic instructions. As is well known, in the x86 and 68k architectures support for real numbers did not appear immediately and remained an optional extra for a long time, while work with decimal numbers was not something completely separate from binary arithmetic – it was rather an extension of it.

Different register sets are used for real and binary numbers, and no registers at all are used for decimal numbers. The System/370 provides 16 32-bit general-purpose registers for binary integers, with the program counter being part of the processor status word. There is no dedicated stack; one can be organized using any register – this is how the stack was later implemented in the ARM. Subroutine calls are also made as in the ARM, via a link register. The registers are almost always interchangeable; exceptions are very rare. If you compare BAL's binary registers with the competing VAX architecture, you will notice that the VAX has one register fewer, since its program counter occupies one of the 16 – and the same is true of the ARM.

The structure of operands in instructions will seem quite familiar to those who know x86 assembler. For binary numbers, operands have a register-register or register-memory structure, and in the latter case both 32-bit values and sign-extended 16-bit values can be loaded from memory. For example, the analog of the x86 instruction ADD EAX,EBX is AR 0,1; of ADD EAX,address – A 0,address; of ADD EAX,address[EBX] – A 0,address(1); and of ADD EAX,address[EBX][EDX] – A 0,address(1,3). However, the System/360 and even its later developments cannot scale an index: for example, ADD EAX,address[8*EBX] cannot be written in BAL as a single instruction. On the other hand, the x86 usually cannot sign-extend a 16-bit number on the fly: the BAL instruction AH 0,address, which takes a 16-bit signed number from memory and adds it to register 0, requires two instructions on the x86.
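
A Python sketch of how a register-memory operand's effective address is formed (base + index + 12-bit displacement, written disp(index,base) in BAL), and of the sign extension that AH performs; the function names are mine, not BAL's:

```python
def effective_address(disp: int, index: int, base: int, regs: dict) -> int:
    """Address of an operand written as disp(index,base) in BAL.
    Register 0 as index or base means 'no register' on these machines."""
    assert 0 <= disp <= 4095          # the displacement is only 12 bits
    ea = disp
    if index:
        ea += regs[index]
    if base:
        ea += regs[base]
    return ea & 0xFFFFFF              # 24-bit addressing on the System/370

def add_halfword(reg: int, halfword: bytes) -> int:
    """AH: fetch a 16-bit value, sign-extend it, add to a 32-bit register."""
    h = int.from_bytes(halfword, "big", signed=True)
    return (reg + h) & 0xFFFFFFFF

regs = {1: 0x1000, 3: 0x20}
print(hex(effective_address(8, 1, 3, regs)))   # 0x1028
print(hex(add_halfword(10, b"\xff\xff")))      # 0x9: adding -1
```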

A rare peculiarity of BAL is the presence of separate addition and subtraction instructions for signed and unsigned numbers; unsigned operations in BAL are called "logical". This oddity is caused by the lack of the flags familiar from most other architectures: instead, the 360 has only a two-bit condition code, which different instructions set differently! The only difference between signed and unsigned operations is how they set these two status bits. After signed operations you can find out whether the result was zero, positive, or negative, and whether an overflow occurred; after unsigned operations, whether the result was zero and whether there was a carry or borrow. Conditional jump instructions can test all 16 subsets of the cases expressible in 2 bits. Because of this way of working with operation flags, unusual today, conditional jump instructions are hard to read quickly, although BAL extensions usually add easy-to-understand macros for conditional jumps so that you do not have to decode each of the 4 mask bits. To be fair, separate commands for signed and unsigned addition and subtraction also exist, for example, in the MIPS architecture, which has no flags at all!
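
The difference can be modeled in a few lines of Python – a sketch of the condition-code rules as described above, not of the real microarchitecture:

```python
def cc_add(a: int, b: int) -> int:
    """Condition code after signed addition (A/AR):
    0 = zero, 1 = negative, 2 = positive, 3 = overflow."""
    sa = a - (1 << 32) if a >> 31 else a      # reinterpret as signed
    sb = b - (1 << 32) if b >> 31 else b
    s = sa + sb
    if not (-(1 << 31) <= s < (1 << 31)):
        return 3                              # overflow
    return 0 if s == 0 else (1 if s < 0 else 2)

def cc_add_logical(a: int, b: int) -> int:
    """Condition code after unsigned ('logical') addition (AL/ALR):
    0 = zero/no carry, 1 = nonzero/no carry, 2 = zero/carry, 3 = nonzero/carry."""
    r = a + b
    carry = r >> 32
    zero = (r & 0xFFFFFFFF) == 0
    return (2 if carry else 0) + (0 if zero else 1)

print(cc_add(0x7FFFFFFF, 1))          # 3: signed overflow
print(cc_add_logical(0xFFFFFFFF, 1))  # 2: result zero, with carry
```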

Another rare peculiarity is the separate instructions for signed and unsigned comparisons. I have met similar ones not only on the MIPS, but also on the MicroBlaze – in the latter, by the way, the carry is the only status flag supported.

Systems compatible with the IBM 360 have no arithmetic operations with the carry flag, so, for example, in a 128-bit addition we must check for a carry after each 32-bit operation and use a jump if necessary to propagate it. This is, of course, very cumbersome compared to the x86, ARM, 68k, or even the 6502, although on the much later MIPS it is even more cumbersome. Normal work with the carry appeared only in the System/Z.
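
The pattern a BAL program has to follow can be sketched in Python: after each word-sized add, the carry is detected explicitly and fed into the next word (function name and word layout are mine):

```python
def add128(a_words, b_words):
    """Add two 128-bit numbers given as four 32-bit words each,
    least significant word first, without an add-with-carry instruction."""
    out, carry = [], 0
    for a, b in zip(a_words, b_words):
        s = a + b + carry
        carry = 1 if s >> 32 else 0   # the explicit check a BAL program needs
        out.append(s & 0xFFFFFFFF)
    return out, carry

print(add128([0xFFFFFFFF, 0, 0, 0], [1, 0, 0, 0]))  # ([0, 1, 0, 0], 0)
```

On a machine with add-with-carry (x86 ADC, ARM ADCS) the whole loop body collapses to one instruction per word with no branches.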

There are no rotate instructions in BAL, but ordinary shifts, as on the x86, can be single (32-bit) or double (64-bit, over a register pair). However, BAL has separate shift instructions for unsigned and signed numbers, and only the latter set the condition code – obviously, the results of left shifts in the two cases differ only in the flags. Rotates were added only in the System/390.

Among BAL's register-loading commands there are some that are most likely unique: you can load the absolute value of an integer, minus the absolute value, or the number with its sign inverted – I have encountered anything even remotely similar only in the ARM architecture. It is worth noting here that the entire System/360 architecture leans towards signed arithmetic, with unsigned arithmetic being rather secondary: BAL originally had no unsigned division or multiplication, which were added only in the System/390. Loading a register does not change the flags, as on the x86, but there is a special load instruction that does set them – again reminiscent of the ARM, where the setting of flags can be controlled.

All signed arithmetic operations, including shifts, can raise an overflow exception; whether an exception is generated is determined by a special mask flag in the status register. Interestingly, binary division and multiplication in BAL do not affect the flags at all – compare the x86, where division simply spoils the flags.

Bitwise logical operations in BAL are the usual AND, OR, and exclusive OR, i.e. there is no separate NOT. Logical operations can have not only register-register and register-memory forms, but also memory-constant and memory-memory forms – the latter addressing method is similar to that used for decimal numbers. Memory-constant addressing works only with single bytes, and, unlike arithmetic, logical operations obviously cannot use 16-bit numbers. With memory-memory addressing you can process data up to 256 bytes long! Thus, logical operations have three data types – bytes, 32-bit words, and byte strings – with special instructions for each, which is rather non-orthogonal.

Adjacent to the logical operations in BAL are the byte-transfer operations. In addition to the usual transfer of up to 256 bytes with a single instruction, there are also unique instructions for transferring nibbles: you can copy only the high or only the low halves of bytes, while the other halves retain their values after the copy! These strange operations are needed to support BAL's handling of character and decimal data. The System/370 also added interruptible transfer and comparison instructions that can handle over 16 million bytes at a time. Surprisingly, the slower commands for blocks of up to 256 bytes cannot be interrupted, which can create an unpleasant delay in response to an interrupt request. Transfer commands can also be used to fill memory with a given byte, and in addition to memory-to-memory transfers, an individual byte can be set to a given value. Obviously, if we do not count the new instructions of the 390 and Z, the byte-transfer commands of the x86 are more advanced.
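
The nibble-transfer instructions are MVN ("move numerics", the low halves) and MVZ ("move zones", the high halves); a Python sketch of their effect on a destination buffer:

```python
def mvn(dst: bytearray, src: bytes) -> None:
    """MVN: copy only the low nibbles ('numerics'); the high nibbles
    of the destination keep their old values."""
    for i in range(len(src)):
        dst[i] = (dst[i] & 0xF0) | (src[i] & 0x0F)

def mvz(dst: bytearray, src: bytes) -> None:
    """MVZ: copy only the high nibbles ('zones')."""
    for i in range(len(src)):
        dst[i] = (dst[i] & 0x0F) | (src[i] & 0xF0)

d = bytearray(b"\xab\xcd")
mvn(d, b"\x12\x34")
print(d.hex())   # a2c4 - low nibbles replaced, high nibbles kept
```

As the article notes below, these operations exist mainly to manipulate the sign and zone nibbles of decimal data.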

You can load into a register not only the value at a given address but also the address itself, as with the LEA instruction of the x86 or 68k. This also lets you load a constant directly into a register, although its maximum value is 4095, and likewise increment a register by up to 4095; a decrement, however, can only be done by 1. Both the increment and the decrement work through address arithmetic, so they do not change the flags. It is also possible to load individual bytes, and even groups of bytes, from a word in memory into a register – for example, only the first and third bytes. On every other 32-bit architecture known to me such a trick takes a series of four instructions. Likewise, BAL allows parts of a register to be stored into memory.
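
The instruction behind this byte-group trick is ICM (Insert Characters under Mask); here is a sketch of its semantics in Python, with the 4-bit mask selecting register bytes from the most significant down (the function name and argument layout are mine):

```python
def icm(reg: int, mask: int, data: bytes) -> int:
    """ICM sketch: insert successive bytes from memory into the register
    bytes selected by the 4-bit mask (leftmost mask bit = leftmost byte)."""
    out, di = reg, 0
    for i in range(4):
        if mask & (0b1000 >> i):
            shift = 8 * (3 - i)
            out = (out & ~(0xFF << shift)) | (data[di] << shift)
            di += 1
    return out & 0xFFFFFFFF

# Replace only the first and third bytes of the register:
print(hex(icm(0x11223344, 0b1010, b"\xaa\xbb")))   # 0xaa22bb44
```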

A number of BAL instructions are very specialized – in other architectures they correspond to a series of simpler instructions. For example, the TR instruction recodes a character string: one argument specifies the string to recode, the other the address of a translation table. A special variant, TRT, can be used to scan a string, for example to skip blanks or find a delimiter – roughly the functionality of the standard C strpbrk() call. The ED and EDMK instructions are absolutely unique – they provide the functionality of a primitive sprintf()! However, almost all string operations are limited to a maximum string length of 256 bytes, which significantly reduces their power.
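
In Python, TR corresponds almost exactly to bytes.translate, and TRT can be sketched as a scan through a table of "interesting" bytes (the tables and the helper name here are mine):

```python
# TR: recode a string through a 256-byte translation table.
upcase = bytes((b - 32) if 97 <= b <= 122 else b for b in range(256))
print(b"hello, world".translate(upcase))   # b'HELLO, WORLD'

# TRT: find the first byte whose table entry is non-zero,
# roughly like C's strpbrk().
def trt(data: bytes, table: bytes):
    for i, b in enumerate(data):
        if table[b]:
            return i, table[b]   # the real TRT also sets the condition code
    return None

delims = bytes(1 if b in b",;" else 0 for b in range(256))
print(trt(b"abc;def", delims))   # (3, 1)
```

On the mainframe both are single instructions; the table lookup per byte is done by the hardware.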

In BAL it was rather difficult to work with 16-bit unsigned values because of the absence of rotate or SWAP-type commands; the situation improved with the System/390. Some BAL instructions are almost deprecated: for example, the MVO nibble-shift instruction has been supplanted by the more convenient SRP. For block transfers and comparisons it is better to use the new instructions, although since they use a different addressing method this may be suboptimal in some rare cases.

Examples of the four basic BAL addressing modes have already been given; there is also a fifth, for three-address instructions. BAL has no modes with auto-increment or auto-decrement, typical of the VAX, 68k, PDP-11, or even the 6809, and no double-indirect memory access as on the VAX, 68020, or PDP-11; and of course BAL, unlike the VAX or PDP-11 assemblers, is completely non-orthogonal. BAL is closest to the x86 and ARM assemblers – the most successful modern architectures. The operand order in BAL is destination first, as in Intel's x86 assembler or the ARM assembler, and thus not as in the VAX, PDP-11, or 68k. The byte order for data in BAL, though, is from high to low (big-endian), unlike the x86, ARM, or VAX, but matching the 68k and MIPS.

Operations on decimal numbers are implemented in BAL only via memory-to-memory addressing. Decimal numbers can occupy memory chunks up to 16 bytes long, which allows numbers of up to 31 decimal digits – corresponding to the precision of roughly a 103-bit binary number. Thus only the most modern programming systems working with binary integers can handle larger values than the System/360 of almost 60 years ago! Of course, binary arithmetic can be used to implement arbitrarily large numbers, but for some reason until recently there were no popular programming languages supporting numbers larger than those of the ancient System/360. Even now, 128-bit integer support on the x86 usually comes only through unofficial extensions, such as GCC's.

Decimal numbers in BAL are represented in a unique way: they must always carry a sign – not the case on the VAX, x86, 68k, and perhaps others. Moreover, the sign is stored in the last byte of the number! For decimal numbers BAL directly supports all the basic operations: addition, subtraction, multiplication, and even division – this too is unavailable in any other architecture I know. In addition, BAL provides instructions for copying, comparing, and shifting decimal numbers – the MVO and SRP instructions mentioned above serve for such shifts. Operations can only be performed on packed decimal numbers, which must be unpacked for printing; unpacked digits in BAL also need a sign, which takes no extra space since it is placed in the high nibble of the last byte, but that nibble requires special handling before printing. It is strange that the pack and unpack operations can only handle up to 16 bytes of the unpacked decimal number, which limits them to numbers of no more than 15 digits. This unpleasant problem can be worked around by unpacking with the ED or EDMK instructions, but packing a large unpacked number takes a not-so-simple sequence of instructions; new instructions were added in the System/390 to solve this. Unexpectedly, the pack and unpack instructions work with any binary data, not just decimal.
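
The packed layout with its trailing sign nibble (0xC for plus, 0xD for minus, by common convention) can be sketched in Python; this models only the data format, not the decimal instructions themselves, and the function names are mine:

```python
def pack_decimal(n: int, length: int) -> bytes:
    """Packed decimal: two digits per byte, sign in the LOW nibble
    of the LAST byte (0xC = plus, 0xD = minus)."""
    sign = 0xD if n < 0 else 0xC
    nibbles = [int(d) for d in str(abs(n))] + [sign]
    assert len(nibbles) <= 2 * length, "number does not fit"
    nibbles = [0] * (2 * length - len(nibbles)) + nibbles  # left-pad with 0s
    return bytes((nibbles[i] << 4) | nibbles[i + 1]
                 for i in range(0, len(nibbles), 2))

def unpack_decimal(p: bytes) -> int:
    nibbles = [n for b in p for n in (b >> 4, b & 0xF)]
    sign = -1 if nibbles[-1] == 0xD else 1
    return sign * int("".join(map(str, nibbles[:-1])))

print(pack_decimal(-123, 3).hex())               # 00123d
print(unpack_decimal(bytes.fromhex("00123d")))   # -123
```

A 16-byte field holds 32 nibbles, one of which is the sign – hence the 31-digit limit mentioned above.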

BAL has special, unique instructions that convert a binary number to a packed decimal number, and vice versa, with a single instruction. For the decimal number these instructions always allocate 8 bytes, i.e. 15 digits and a sign. However, a 32-bit register is only sufficient for a signed number of up to 9 decimal digits, so not every decimal number in the correct BAL format can be converted to binary with a single command. The System/Z has extended instructions for such conversions.

Jump instructions in BAL are distinguished by coming, as a rule, in pairs: the jump address can be given either explicitly or by the contents of a register – in many other architectures, jumps through a register are available only as unconditional jumps. By the way, there are no pure unconditional jumps in BAL: they are implemented by specifying an always-true condition, much as in the ARM architecture. Conditional branching in BAL, as noted, has a unique syntax. Consider for example the instruction BC 9,address, which means: jump if condition 0 or condition 3 holds – and the conditions mean different things after different commands. After a signed addition these conditions mean "the result is 0, or an overflow occurred"; after an unsigned addition, "the result is 0 and there was no carry, or the result is not 0 and there was a carry". Despite the clumsiness and some redundancy, one cannot help admitting that this system of jump conditions is probably the most flexible known. The nine in the example is taken as the binary mask 1001, i.e. it selects condition numbers – a similar scheme encoding all combinations of conditions in 4 bits is used in the ARM. Besides conditional jumps, BAL also has decrement-and-branch jumps, roughly as in the assemblers of the Z80, x86, 68k, PDP-11, ... But BAL also has two completely unique jump instructions which, depending on the number of a register operand, can be three- or four-address! In them, one register is added to another, the resulting sum is compared with the contents of a third register, and the result of the comparison determines whether to jump. These unusual instructions are believed to be useful for working with jump tables.
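
The mask mechanics can be shown in a few lines of Python (a sketch: mask bit i, counted from the left, selects condition code i):

```python
def branch_taken(mask: int, cc: int) -> bool:
    """Does a conditional branch with this 4-bit mask fire for
    condition code cc (0..3)?"""
    return bool(mask & (0b1000 >> cc))

# Mask 9 = 1001b: branch on condition code 0 or 3
assert branch_taken(9, 0) and branch_taken(9, 3)
assert not branch_taken(9, 1) and not branch_taken(9, 2)

# Mask 15 = 1111b: always branch - BAL's "unconditional" jump
assert all(branch_taken(15, cc) for cc in range(4))
# Mask 0: never branch - effectively a no-op
assert not any(branch_taken(0, cc) for cc in range(4))
```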

As already noted, subroutine calls in BAL are implemented without a stack, by simply storing the return address in a register. However, the BAL call instructions – one of which is itself called BAL – store not only the return address but also part of the status word, in particular the condition code, the length of the current instruction, and even the mask for optional exceptions such as integer or decimal overflow, mentioned above. This unusual extended save happens because the program counter in the mainframe architecture occupies part of the machine status word, and the call instructions mechanically preserve the adjacent status bits along with the address. There are no special instructions for returning from a subroutine – a normal jump to the address in the register is used. In the System/390, new call and even return instructions were added in connection with the transition to the 31-bit architecture; they allow code executing in different modes to be used flexibly within one program.

To quickly call single-instruction routines, BAL has a unique EX instruction, which executes another instruction at a specified address and then proceeds to the next command. EX can modify the called instruction, which allows using any desired register in it, or setting the parameters of a mass byte transfer. A similar but simpler instruction also exists in the TMS9900 instruction set.

Initially, BAL had no relative, relocatable jumps like those of the Z80 or x86; they were added only in the System/390.

The SPM, TM, TS, STCK, and STPT instructions are also somewhat unusual. The first sets all the condition flags and the optional-exception mask with a single command. The TM instruction checks a group of bits and distinguishes three cases: all zeros, all ones, or a mixture – a check that cannot be performed by a single command in other architectures; however, TM works only with individual bytes in memory. TS is used when working with multiple processors – the 68k has a similar command. The STCK instruction reads the value of an external (!) timer, and STPT reads an internal timer built into the processor circuitry. Strangely, STPT is privileged, but STCK is not.
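
A sketch of TM's three-way outcome in Python; the condition-code values 0, 1, and 3 follow the description above (code 2 is not used):

```python
def tm(byte: int, mask: int) -> int:
    """Condition code after TM: 0 = selected bits all zero,
    1 = mixed, 3 = all selected bits one."""
    sel = byte & mask
    if sel == 0:
        return 0
    if sel == mask:
        return 3
    return 1

print(tm(0b1010, 0b1010))  # 3: all selected bits are one
print(tm(0b1010, 0b0101))  # 0: all selected bits are zero
print(tm(0b1010, 0b1100))  # 1: a mixture
```

On most other architectures the "mixed" case needs two separate tests and a branch between them.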

It is also worth mentioning the CS and CDS instructions, designed to support multiprocessing. They were implemented in the System/370, i.e. they have been available since the early 70s. In the x86, the CS analog, the CMPXCHG instruction, appeared no earlier than 1987, and the CDS analog, CMPXCHG8B, only in 1994!
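
The semantics of CS (compare and swap) can be sketched in Python, with a lock standing in for the hardware's atomicity guarantee; the class and method names are mine:

```python
import threading

class Word:
    """A 32-bit word in shared memory with a CS-style operation."""
    def __init__(self, value: int = 0):
        self.value = value
        self._lock = threading.Lock()

    def cs(self, old: int, new: int):
        """Atomically: if the word still equals `old`, store `new`.
        Returns (success, observed_value), like CS setting CC 0 or 1."""
        with self._lock:
            if self.value == old:
                self.value = new
                return True, old
            return False, self.value

w = Word(5)
print(w.cs(5, 7))   # (True, 5): the swap succeeded
print(w.cs(5, 9))   # (False, 7): someone changed the word first
```

This compare-and-swap primitive is the basis of lock-free updates: on failure the caller re-reads the observed value and retries.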

The processor identification instruction STIDP was introduced with the System/370; it is privileged and not very informative. The analogous x86 instruction, CPUID, is significantly more powerful. One can also notice that the IBM 4361 at the LCM allows any user to execute STIDP – obviously emulation triggered through an exception.

Four of BAL's addressing modes specify two operands per instruction, and the fifth specifies three-operand commands. However, ignoring part of the information gives one-operand commands, and the use of implicit information gives four-operand commands. Register 0 has a special role in addressing: it is simply ignored there, which allows the base or index to be omitted when the address is calculated. All BAL instructions occupy exactly 2, 4, or 6 bytes – similar to the 68000 or PDP-11 but not to the x86, VAX, or ARM.

Several more addressing modes were added in the System/390, bringing their number to 18, and the number of instructions also grew significantly – among the new ones there are even instructions supporting Unicode, something the x86 still lacks! The System/390 has other unique new instructions as well. The System/Z added several more addressing modes, and the total number of instructions in the modern Z is very large – probably even larger than the instruction count of the modern x86-64!

In the 360, 370, and 390 systems, the offset used when accessing data in memory is 12-bit, as in the ARM, i.e. at most 4095, which is not very convenient: in large programs there may not be enough registers for basing. In the x86 this offset is 16-bit in real mode, which is of course much more convenient, and the System/Z added support for addressing with a 20-bit offset, which is better still – though it is worth noting that in the protected mode of the x86, or on the 68020, the offset can be 32-bit. As already noted, in systems before the 390, as in the ARM, large constants could not be used directly with registers; the x86 architecture was much more flexible here. Therefore, with the 360 and 370 one often had to use literals or pseudo-constants in assembler, which is somewhat slower.

Systems compatible with the IBM/360 have always had good performance. My experiments with the LCM's 4361-1, in particular a project computing the number π with a spigot algorithm, showed quite good timings: the 4361-1 executes instructions almost without delay, like the ARM and other modern processors. However, due to the somewhat awkward instruction set inherited from the 60s – in particular the lack of division by a 16-bit divisor – the efficiency of the processor electronics came out at the level of the 80186, about 80% slower than the result shown by the then-best computer of the VAX family, the model 785. The mainframe at the LCM, though, is clearly not the best of the IBM mainframes available at the time. It is also worth noting that mainframes used channels – specialized processors that made I/O very fast, much faster than on most other computers of those years.
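
For reference, here is one member of the spigot family – Gibbons' streaming algorithm – in Python; this is only an illustration of the approach, not the program used in the experiment:

```python
def pi_digits(n: int) -> list:
    """First n decimal digits of pi, by Gibbons' streaming spigot.
    It uses only integer arithmetic, which is what makes the approach
    attractive on machines without fast floating point."""
    q, r, t, j = 1, 180, 60, 2
    digits = []
    while len(digits) < n:
        u = 3 * (3 * j + 1) * (3 * j + 2)
        y = (q * (27 * j - 12) + 5 * r) // (5 * t)
        digits.append(y)
        q, r, t, j = (10 * q * j * (2 * j - 1),
                      10 * u * (q * (5 * j - 2) + r - y * t),
                      t * u, j + 1)
    return digits

print(pi_digits(10))   # [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
```

The intermediate values grow without bound, so on a 32-bit machine like the 4361 the bulk of the work is exactly the kind of multi-word arithmetic discussed earlier in this article.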

As a student, I happened to work with a Soviet IBM/370 clone, the ES-1045: in 1987 in batch mode and in 1989 in dialog mode. For batch mode, we had to prepare punch cards. At that time, I was already using a home computer, so the use of archaic punch cards did not leave the best impression. The dialog mode was not bad, but it often broke down when there were a large number of users. Therefore, some students came to work at 4 am! Since then, I had not dealt with mainframes anymore; only recently did I decide to use emulation to sort out this technology, a landmark in the history of computers.

The cloning of the IBM/360 was very popular. Such clones were made in England, Germany, Japan, and by other companies in the United States. In the USSR, this cloning took on a very dramatic connotation. For its sake, almost all domestic developments in the field of IT, some of which were quite promising, were curtailed. In particular, the Ural line of computers was cut off, about which the well-known computer scientist Charles Simonyi later spoke with warmth. The BESM-10 project was also closed, although machines of the preceding BESM-6 class were comparable to the IBM/360 in performance. The development of the inexpensive and promising MIR computers, one of which was even bought by IBM in 1967, received low priority. Also for the sake of this cloning, an almost concluded contract with ICL was cancelled; perhaps with this contract, the British IT industry would have acquired a new dynamic and would not have fallen into decline. Only the Elbrus supercomputers, perhaps because of their connection to the defense industry, survived the "clone invasion", which Dijkstra called the greatest US victory in the Cold War.

As people who worked with mainframes in the USSR recall, the domestic clones were distinguished by extremely low reliability and required constant attention from the maintenance staff, while the original American IBM mainframes were among the most reliable computers of their time. Sometimes more than a dozen (typically above 5) kilograms of precious metals (gold, platinum, palladium and silver) went into a Soviet clone, but this did not help fix the reliability problem. Because of such a large quantity of highly liquid assets, it is difficult to imagine that a working domestic clone could have been preserved anywhere. It is not hard to suppose that if all the precious assets used in the production of a Soviet mainframe had simply been sold, the proceeds could have bought a reliable American or English mainframe.

One of the main arguments in favor of switching to cloning was that the Soviet economy was not able to produce the necessary software, and cloning made it possible to use programs for free – or, to put it bluntly, it was recommended to just steal them! However, practice showed that some domestic programs for mainframes turned out very successful and almost completely replaced the branded ones. As one example of such programs, I can name the Primus dialog monitor. Of course, it should also be noted that the branded programs were not obtained absolutely for free: they required quite a lot of effort to adapt, in particular localization. With the latter, things were sometimes taken too far, for example when a programming language had all its keywords replaced with their translations – this happened with Cobol.

Interestingly, the chief architect of the IBM/360 left IBM and founded Amdahl Corporation, which for more than two decades specialized in the production of systems compatible with IBM mainframes and at the same time slightly superior in performance and reliability at lower prices. As a result, due to major changes in the mainframe market, Amdahl, like ICL, became part of the Japanese company Fujitsu.

There were other mainframes besides computers of the IBM/360 architecture. In the 60s, the American mainframe manufacturers unofficially received the sonorous name of Snow White and the Seven Dwarfs. It's probably not hard to guess that Snow White was IBM. Mainframes of original architectures were also produced in other countries. The British ICL 1900 architecture is especially worth mentioning.

As already written above, I managed to set up a working configuration of VM/CMS 6. However, it turned out that the XEDIT editor is not freely available, and plain EDIT is too peculiar and inconvenient, so I had to edit my texts on the host. It also turned out that the standard program for transferring files between a terminal emulator and the mainframe was unavailable, which required the use of virtual punch cards for such transfers. Another unpleasant surprise concerned debugging. The DEBUG command does not support step-by-step execution, while this feature was available even in the DDT debugger for the 8080 processor! It is also surprising, though less critical, that DEBUG cannot disassemble, something that was often built even into the simplest monitors for the processors of the 70s. Under CMS, long-line wrapping and line-break control characters are not supported at the low level! Therefore, when printing from an assembly language program, you need to format lines manually so that they do not disappear beyond the right edge of the screen, and also take care of padding the last line with trailing spaces. The lack of automatic vertical scrolling is also unusual.

Those who want to work with mainframes for the first time should keep in mind that mainframes are a huge "ecosystem" where many familiar concepts have a different interpretation. For example, there is no simple concept of a file. One of the key attributes of a file is its record size; there is nothing like this for Linux or Microsoft Windows files. The files themselves differ in their access methods, and entire, far from thin, books have been written about this. It is also unusual that in CMS the disk name is written at the end of the full file name, that the name, extension, and disk are separated by spaces, and that the disk name itself is for some reason called the file mode. I would also like to study the multitasking MVS more; as far as I know, it was never used in the USSR.
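The record-size attribute can be mimicked in a few lines of Python (my own sketch; it only shows how a fixed-record dataset is split into rows, ignoring real access-method details):

```python
def read_records(data: bytes, lrecl: int) -> list[bytes]:
    """Split a flat byte stream into fixed-length records, the way a
    dataset with a fixed record format and the given record length
    (LRECL) is read: no newlines, the length alone defines the rows."""
    return [data[i:i + lrecl] for i in range(0, len(data), lrecl)]

# Two 8-byte records, blank-padded as mainframe text usually is:
print(read_records(b"HELLO   WORLD   ", 8))  # [b'HELLO   ', b'WORLD   ']
```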

In general, it is somewhat unexpected that some well-known operating systems used on very expensive computers did not support file directories, which put them on a par with the very first, primitive OS for microcomputers, such as CP/M or Commodore DOS. This is why CMS was sometimes called CP/M for mainframes. Surprisingly, as far as I know, support for directories was never introduced into CMS, although the last release of the system dates back to 2018. For some reason, support for directories on expensive computers before the 80's was often poor. For example, there was no such support in DEC RT-11, and even one of the best OS for the PDP-11, RSX-11, supported only two-level directories. The most popular IBM operating system until the 2000s was MVS (1974), and even there directories were only partially implemented, as in Apple MFS (1984). Meanwhile in Unix (1973), MS-DOS (since 1983), or even the 8-bit Apple ProDOS (1983), this was all right from the start. The most advanced file handling was offered by VAX/VMS (1977), which, in addition to directories, even has built-in support for file versioning.

Interestingly, Rexx, the scripting language of CMS, MVS and some other IBM operating systems, in a reduced form became the language of batch files for the Commodore Amiga. It may have been a kind of compensation, as Commodore had been a firm supporter of the IBM PC architecture since 1984.

Mainframe software usually uses only two colors. Color-enabled terminals were used relatively rarely and therefore there were few programs using colors. There are also few programs with dynamic graphics: frequent screen updates lead to noticeable unpleasant flickering.



A dynamic demo running on the IBM 4381 emulator at the LCM; a 3270-3 terminal emulator is used



In conclusion, I can't help but express my admiration for IBM's technologies. They have always been distinguished by their unique originality and high quality. I would especially like to note the very good quality of the documentation, which is publicly available even for modern systems. IBM demonstrates tremendous dynamism in technology development, even though it is one of the largest companies in the world: in terms of the number of employees, it is almost equal to Microsoft, Google and Intel put together!

The theme of mainframes is really vast. Of course I was able to write only a small part of what this theme might contain. I would be very grateful for any clarification and additional information.

Edited by Richard BN


Emotional stories about processors for first computers: part 12 (Preface and Postface)

Prologue and Epilogue



I have happened to program in the assembly languages of various processors; the latest on this list is the Xilinx MicroBlaze. I decided to write down some of my observations on the features of these almost magical pieces of hardware which, like a magic key, opened for us the doors to the magical land of virtual reality and mass creativity. I will not write about the features of modern systems (the x86, x86-64, ARM, ARM-64, etc.) – maybe another time, as the topic is very large and complex. Therefore, I finish with the Intel 80486 and Motorola 68040. I also wanted to include in the review the IBM/370, which I had dealt with. These systems were quite far from mass users but had a huge impact on computer technology. However, they would require much time to prepare materials about, they did not use single-chip processors, and somehow none of these machines are left in existence, so they are not included. I really hope that my materials will also attract the attention of experts, who will be able to add something I have not thought about or did not know.

As illustrative material, I attach my small Rosetta stone: tiny programs for calculating the number π on different processors and systems using a spigot algorithm, claiming to be the fastest of its implementations.

In conclusion, I will make several remarks that have occurred to me in the course of writing these articles.

It is difficult to get rid of the feeling that 8-bit processors were only an undesirable necessity for the main characters acting in the 70's and 80's on the stage of computer history. The development of the most powerful 8-bit 6502 was actually frozen. Intel and Motorola rather slowed down their own development of small processors and restrained other developers.

I'm pretty sure that the Amiga or Atari ST would work better and faster using a 4 MHz processor compatible with the 6502 with a 20 or 24 bit address bus than with the 68000. Bill Mensch said recently that it’s easy to make the 6502 at 10 GHz today.

If the Amstrad PCW series (the success of which the Commodore CBM II could have shared) had begun to use the upgraded Z80 at higher frequencies, then it is quite possible that this series would have been relevant 10 years ago.

What would the world be like if the ARM had been made in 1982 or 1983? In my humble opinion it was quite possible.

What would computers made in the SU be like if they had copied and developed not the most expensive but the most promising technologies?

Edited by Richard BN


Emotional stories about processors for first computers: part 11 (Intel 8080)

Intel 8080 and 8085



The first real single-chip processor, made in the first half of 1974, is still being manufactured and used. It was repeatedly cloned around the world; in the USSR it had the designation KR580VM80A. Modern Intel processors for the PC still easily reveal their kinship to this, in some sense, relic product. I myself have not written code for this processor, but being well acquainted with the architecture of the Z80, I venture to offer some comments.

The 8080 instruction set, like those of other Intel processors for the PC, can hardly be called ideal, but it is universal, quite flexible, and has some very attractive features. The 8080 favorably differed from its competitors, the Motorola 6800 and the MOS Technology 6502, in its large number of, albeit somewhat clumsy, registers. The 8080 provided the user with one 8-bit accumulator; the register HL, a 16-bit semi-accumulator and at the same time a fast index register; a 16-bit stack pointer; and two more 16-bit registers, BC and DE. The BC, DE, and HL registers could also be used as six byte registers. In addition, the 8080 supported an almost full set of status flags: carry, sign, zero, and even parity and auxiliary carry. Some commands from the 8080 instruction set were speed champions for a long time. For example, the XCHG command exchanges the contents of the 16-bit DE and HL registers in just 4 clock cycles – extremely fast! A number of other commands, although they did not set such bright records, were also among the best for a long time:


  • XTHL – exchange of HL register contents and data at the top of the stack, 18 cycles – it seems like a lot, but even on the real 16-bit 8086 an equivalent of such a command takes at least 26 cycles, and for the 6800 or 6502 such a command is hard to imagine;
  • DAD – add to the semi-accumulator HL the value of another 16-bit register (BC, DE or even SP), 10 cycles. This is a true 16-bit addition with a carry flag set. If you add HL to itself you will get a quick 16-bit shift left or multiplication by 2, which is a key operation for programming both full multiplication and division;
  • PUSH and POP – put in the stack and remove from the stack a 16-bit value respectively from a register or in a register. They perform in 11 and 10 cycles. These are the fastest 8080's operations for working with memory, and when they are executed SP is automatically incremented or decremented. The PUSH can be used for example to quickly fill memory with a pattern with values from 3 registers (BC, DE, HL). There are no stack instructions for working with 8-bit values at all;
  • LXI – a loading of a 16-bit constant into a register (HL, DE, BC, SP) for 10 cycles;
  • RNZ, RZ, RNC, RC, RPO, RPE, RP, RM – conditional returns from a subroutine; they make the code cleaner, eliminating the need to write extra conditional jumps. These commands were abandoned in the x86 architecture, but they should probably have been kept – the code with them turns out nicer. The 8080 also has the corresponding conditional subroutine calls (CNZ, CZ, CNC, ...), although these usually provide no benefit because a subroutine call, as a rule, needs parameter setup inserted before it.
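The remark above about DAD being the key to multiplication can be sketched as follows (a Python illustration of the shift-and-add idea, not actual 8080 code; the names are mine):

```python
def mul16(a: int, b: int) -> int:
    """Shift-and-add multiplication built from the 8080's one 16-bit
    primitive, DAD: 'DAD H' doubles HL (a left shift by one), while
    'DAD D' adds the multiplicand. The product is truncated to 16 bits,
    as it would be in HL."""
    hl = 0                       # product accumulator (register HL)
    de = a & 0xFFFF              # multiplicand (register DE)
    for i in range(15, -1, -1):  # scan multiplier bits, MSB first
        hl = (hl + hl) & 0xFFFF  # DAD H: shift the product left by 1
        if (b >> i) & 1:
            hl = (hl + de) & 0xFFFF  # DAD D: add the multiplicand
    return hl

print(mul16(123, 45))  # 5535
```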

This processor was used in the first "almost personal computer", the Altair 8800, which became very popular after the journal publication in early 1975. By the way, in the USSR a similar publication happened only in 1983, and one comparable in relevance only in 1986.



The first almost PC



The Intel 8080 became the basis for the development of CP/M, the first mass professional operating system, which occupied a dominant position among microcomputers for professional work until the mid-80's.

Now about the shortcomings. The 8080 required three supply voltages: -5, 5, and 12 volts. Working with interrupts was rather clumsy: it required a dedicated controller, and non-maskable interrupts were not supported at all. In general, the 8080 was rather slow compared with the competitors that soon appeared: the 6502 could be up to 3 times faster when working at the same frequency as the 8080. In the instruction set, the presence of 6 senseless instructions (of the MOV A,A kind) is slightly irritating – they could have been left undocumented, saving opcode space for new operations. The instruction for decimal correction can only be used after addition; after subtraction, special code must be used for decimal correction, usually consisting of 4 instructions. There are no non-rotating shifts.
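The decimal-correction step after addition can be shown with a simplified Python model of the 8080's DAA (my own sketch; the silicon's exact half-carry handling is simplified):

```python
def daa(a: int, carry: int, aux: int) -> tuple[int, int]:
    """Decimal adjust after addition: fix each 4-bit nibble so the
    byte reads as two BCD digits, propagating a decimal carry."""
    if (a & 0x0F) > 9 or aux:   # low digit overflowed past 9
        a += 0x06
    if a > 0x9F or carry:       # high digit overflowed past 9
        a += 0x60
        carry = 1
    return a & 0xFF, carry

# 0x19 + 0x28 gives 0x41 in binary, but 19 + 28 = 47 in decimal
# (the low nibbles 9 + 8 overflowed, setting the auxiliary carry):
print(hex(daa(0x19 + 0x28, 0, 1)[0]))  # 0x47
```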

However, as it turned out, the architecture of the 8080 embodied a correct vision of the future, namely of a fact unclear in the 70's: that processors would become faster than memory. The 8080's DE and BC registers are a prototype of modern manually controlled caches rather than general-purpose registers. The 8080 could run at 2 MHz, while its competitors could only use 1 MHz, which reduced the performance difference between them.

At first, the 8080 was sold at a very high price of $360. This was a kind of reference to the large IBM/360 computers. Intel seemed to say that if you buy the 8080, you can get something similar to a very expensive mainframe.

It is hard to call the 8080 a 100% 8-bit processor. Indeed, its ALU is 8 bits wide, but there are many 16-bit commands that work faster than their 8-bit counterparts used in sequence, and for some instructions there are no 8-bit analogs at all. The XCHG instruction is, in essence and in timing, 100% 16-bit, and there are real 16-bit registers. Therefore, I venture to call the 8080 partially 16-bit. It would be interesting to calculate this processor's bit index based on the set of its features, but as far as the author knows, no one has done such work yet.

The author of this text does not know the reasons why Intel abandoned direct support of 8-bit PCs with its processors. Intel has always been distinguished by the complexity and ambiguity of its policies. Its connection with politics is illustrated, in particular, by the fact that Intel has long had fabs in Israel, and until the end of the 90's this was secret. Intel practically did not try to improve the 8080; only the clock frequency was raised, to 3 MHz. In fact, the 8-bit computer market was ceded to Zilog with its Z80 processor, which was related to the 8080, and the Z80 was able to quite successfully withstand the main competitor, the Terminator 6502. By the end of the 70s, Zilog was a company with huge capabilities, with almost unlimited funding from Exxon and even two brand-new fabs – this was really a lot: Motorola, with a billion-dollar business, also had only two chip factories at the time. Interestingly, in the mid 80's, when the importance of the 6502 became rather insignificant, Zilog also rapidly lost its own significance. The 8080 and 8085 were usually used as controllers and, as such, could be successfully sold at a higher price. The presence of the Z80 allowed Intel to distance itself from the competition among 8-bit processors for computers, where the 6502 strongly drove prices down.

In the USSR and Russia, the domestic clone of the 8080 became the basis of many computers that remained popular until the early 90s: the Radio-86RK, Mikrosha, the multicolor Orion-128, Vector, and Corvette. Eventually, cheap and improved ZX Spectrum clones based on the Z80 won the clone wars.




This is a real PC



In early 1976, Intel introduced the 8085 processor, compatible with the 8080 but significantly superior to its predecessor. The -5 and 12 volt supplies became unnecessary, clock frequencies from 3 to a very solid 6 MHz were used, and the instruction set was expanded with very useful instructions: 16-bit subtraction; 16-bit shift right in only 7 cycles (very fast); 16-bit rotate left through the carry flag; loading of a 16-bit register with an 8-bit offset (this instruction can be used with the stack pointer too); writing of the HL register contents to the address in the DE register; and the analogous reading of HL via the address in DE. All the instructions mentioned, except the shift right, execute in 10 cycles – sometimes significantly faster than their counterparts or their emulation on the Z80. Some more instructions and even two new processor status flags were added: the overflow flag and a flag that XORs the overflow and sign flags. The exact purpose of the second flag, typical for signed arithmetic, became known only in 2013, 37 years after the appearance of the 8085! This flag allows checking the "greater than or equal" and "less than" relations in a single step, but checks for the paired relations will also require an additional check of the zero flag. Many instructions for working with byte data were accelerated by 1 clock cycle. This was very significant, as many systems with the 8080 or Z80 used wait states, which, due to the presence of extra cycles on the 8080, could stretch execution time by almost a factor of two. For example, in the aforementioned Vector computer, register-register instructions took 8 cycles, while on the 8085 or Z80 the same instructions would execute in only 4 cycles. The XTHL instruction became faster by two cycles, and jump instructions even by three. With the new instructions, you can write memory-block-copy code that runs faster than the Z80's LDI/LDD commands! The 8085 also usually executes 8080 programs somewhat faster than the Z80 does. However, some instructions, for example the subroutine call, 16-bit increment and decrement, loading of SP, PUSH, and the conditional returns, became slower by a cycle.
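The single-check signed comparison that the hidden flag enables can be illustrated in Python (a sketch with my own names; it models an 8-bit subtraction producing sign and overflow flags, then XORs them):

```python
def sub_flags(a: int, b: int) -> tuple[int, int]:
    """8-bit subtraction a - b; return the (sign, overflow) flags."""
    r = (a - b) & 0xFF
    sign = r >> 7
    # signed overflow: the operands differ in sign and the result's
    # sign differs from the minuend's
    overflow = ((a ^ b) & (a ^ r) & 0x80) >> 7
    return sign, overflow

def signed_less(a: int, b: int) -> bool:
    """Signed 'a < b' in one check: sign XOR overflow, which is
    what the 8085's undocumented flag computes."""
    s, v = sub_flags(a, b)
    return bool(s ^ v)

print(signed_less(0x80, 0x7F))  # True: -128 < 127 despite 0x80 > 0x7F
```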

The 8085 has a built-in serial I/O port and improved support for interrupts: in addition to the external-controller-dependent method inherited from the 8080, support for one non-maskable and three maskable interrupts was added – this allows doing without a separate interrupt controller in the system if necessary. Work with the port and with interrupt management was implemented via the SIM and RIM instructions – only these two new instructions were officially documented. However, interrupt handling itself remained the same as on the 8080, very minimalistic: on an interrupt, the processor does not even save the status word; this saving must be written explicitly in the code. As already noted, in the 8085 the support for signed arithmetic remained somewhat incomplete, but it was more complete than in the Z80. The 8085's 16-bit arithmetic also did not get several very desirable commands, including addition with carry and subtraction with borrow, which were added to the Z80. On the 8085, when adding, for example, 32-bit integers, you need to use a conditional branch to account for the carry – this, by the way, also resembles the IBM mainframes.

However, I can repeat the statement: "for unknown reasons" Intel refused to promote the 8085 as a main processor for PCs. It was only in the 80's that some fairly successful 8085-based systems appeared. The IBM System/23 Datamaster appeared in 1981; it was a predecessor and almost a competitor of the IBM PC. Then, in 1982, a very fast computer with excellent graphics, the Zenith Z-100, was released, in which the 8085 ran at 5 MHz. In 1983, the Japanese company Kyocera created the very successful Kyotronic 85 (KC-85) laptop, versions of which were also produced by other companies: Tandy produced the TRS-80 Model 100, NEC the PC-8201a, and Olivetti the M-10. In total, perhaps more than 10 million of these computers were released! In Russia in the early 90's, there were attempts to improve some systems, for example the Vector computer, on the basis of the domestic clone, the ИM1821BM85A. Surprisingly, the main processor of the Sojourner rover, which reached the surface of Mars in 1997, was an 8085 at 2 MHz! Such success of the 8085 in the 1980s is largely due to the fact that in early 1979 the 80C85, a low-power variant of the 8085, was ready. The aforementioned first real pocket computer, the TRS-80 Model 100, could work up to 20 hours on a single charge! It is possible that without the ARM, the 80C85 would have been actively used in mobile computers in the 90s.

In fact, Intel gave way to the Z80 in the 70's. A few years later, in the battle for the 16-bit market, Intel behaved quite differently, starting a lawsuit to ban sales of the V20 and V30 processors in the United States. Interestingly, these processors of the Japanese company NEC could switch into full binary compatibility with the 8080, which made them the fastest processors of the 8080 architecture.

Another Intel secret is the refusal to publish the extended instruction set, which included support for the two new flags. It was first published in an article in 1979, and then by some of the manufacturers of these processors. However, the published information about the new flags was rather incomplete. What are the reasons for this strange refusal? One can only guess. Could Zilog have then played the role that AMD might once have played, creating the appearance of competition, while the 8085 could have brought Zilog down? Was it maybe about wanting to keep the instruction set closer to that of the 8086, then being designed? The latter seems doubtful: the 8086 was released more than 2 years after the 8085, and it is hard to believe that in 1975 its instruction set was already known. In any case, compatibility with both the 8080 and 8085 on the 8086 is achievable only with the use of a macro processor, sometimes replacing one of the 8080's or 8085's instructions (POP/PUSH PSW, Jcc, Ccc, Rcc, XTHL, LDHI, LDSI, LHLX, SHLX) with several of its own. Moreover, the two published new instructions of the 8085 (SIM and RIM) are not implementable on the 8086 at all. It is believed that the refusal occurred only because of difficulties in supporting the transfer of the new 8085 flags to 8086 code. Indeed, such a transfer, with support for bitwise operations on the processor status word, turns out to be extremely cumbersome. But 7 out of the 10 new instructions have no direct relation to the new flags, and they could have been published without creating difficulties for compatibility with the 8086. We can also assume that Intel was dissatisfied with the implementation of signed arithmetic in the 8085 and decided that it was better to hide it than to prepare the ground for constant criticism. Although in this case, the seven new instructions could have been published, and only the flags and three instructions hidden.

It is especially difficult to explain why Intel did not publish information about the new instructions after the release of the 8086. Most likely, it was marketing: by artificially worsening the specifications of the 8085, they got a more spectacular-looking 8086 against this background.

I would venture to suggest yet another version. The 8085 was very difficult, almost impossible, to expand into a real 16-bit processor. The 6502, on the contrary, with almost half of its opcodes unused, could easily be expanded to 16 bits. Therefore, it was important for Intel to create a trend of switching to a 16-bit architecture without binary compatibility with the 8-bit one. Rejecting the new useful functionality of the 8085 was like saying that 8-bit is bad and no longer important, and that you need to switch to 16-bit. Something similar happened around the 32-bit architecture, when Intel created a false trend by developing the complex and unpromising Intel 8800, a.k.a. iAPX 432.

Edited by Richard BN


Emotional stories about processors for first computers: part 10 (MOS Technology 6502)

6502 and 65816



This is a processor with a very dramatic fate; no other processor can compare with it. Its appearance and introduction were accompanied by events very large in scope and consequences. I will list some of them:


  1. the weakening of the giant Motorola company, which for some time exceeded the capabilities of Intel;
  2. the destruction of the independent company MOS Technology;
  3. the cessation of the 6502 development and its stagnant production with little or no modernization.

It all started when Motorola, for unknown reasons, refused to support an enterprising engineer, Chuck Peddle, who offered to improve the overall rather mediocre 6800 processor, making it faster and, most importantly, much cheaper. He had to leave the company and continue his work in a small but promising company called MOS Technology. He was able to persuade seven other engineers to follow him, and only one of them subsequently returned to Motorola. At MOS Technology, they soon prepared two processors, the 6501 and 6502, both (like almost all processors of that time) fabricated using NMOS technology. The first was pin-compatible with the 6800; in all other respects, the two were identical to each other. The 6501/6502 team was able to successfully introduce a new chip production technology, which radically reduced the cost of the new processors. In 1975, MOS Technology could offer the 6502 for $25, while the starting price of the Intel 8080 and Motorola 6800 in 1974 was $360. In 1975, Motorola and Intel lowered their prices, but they were still close to $100. MOS Technology specialists claimed that their processor was up to 4 times faster than the 6800. I find this questionable: the 6502 can work much faster with memory, but the 6800's second accumulator greatly accelerated many calculations. I estimate that the 6502 was on average no more than 2 times faster. As follows from some publications, as far back as 1975 MOS Technology had plans to expand the 6502 to the 16-bit level...


MOS 6501 tears the competitors apart: Intel and Motorola are already crossed out!



But Motorola launched a lawsuit against its former employees – they had allegedly used some of the company's technological secrets. During the trial, it was established that one of the engineers who had left Motorola took some confidential documents on the 6800, acting contrary to the attitude of his colleagues. Whether it was his own act, or there were some guiding forces behind him, is still unknown. Eventually, Motorola indirectly won the case, and MOS Technology, whose financial capabilities were limited, was forced to pay a substantial amount of $200,000 and abandon production of the 6501. Intel, in a similar situation with Zilog, acted quite differently. Although it must be admitted that MOS Technology was sometimes too risky in trying to use, for its own purposes, the big money that Motorola spent on promoting the 6800. The irony of the situation is that, as the leader of the 6502 development team noted, with the 6501 they just wanted to fire "a shot across the bow" and observe what happened; there was no goal of suggesting the 6501 be used instead of the 6800. The 6501 was never actually sold. It should also be pointed out that the 6501 was still not completely compatible with equipment for the 6800: in particular, the 6501 and 6502 did not support three-state outputs and therefore could not be used directly with, for example, direct memory access hardware.

We still do not know exactly why Motorola won the lawsuit against MOS Technology. Perhaps MOS Technology just ran out of money: they spent up to $800,000 on the lawsuit, and money then was worth several times more than now. Although there were other factors: the mentioned removal of Motorola's documents, and it is also known that, for example, the 6520 parallel interface chip manufactured by MOS Technology was an exact copy of the Motorola 6820. Everyone who left Motorola had to sign their consent to the decision of the lawsuit, since before the start of the process they had signed a paper acknowledging in advance any outcome.

Next, the legendary Commodore company and its no less legendary founder Jack Tramiel appeared in the 6502 story; in his shadow stood the figure of the company's chief financier, who determined its policy – a man named Irving Gould. Jack got a loan from Irving and, with this money, using a few, to put it mildly, unscrupulous tactics, forced MOS Technology to become part of Commodore. After that, possibly against the wishes of Tramiel, who was forced to give in to Gould, the development of the 6502 almost stopped, despite the fact that even in 1976 it was possible to produce prototypes of the 6502 with operating frequencies up to 10 MHz. The message about this, though, appeared only many years later from a man named Bill Mensch, who was in the team that left Motorola, sometimes made loud but by and large empty statements, and played a rather ambiguous role in the fate of the 6502. Chuck Peddle was forever removed from the development of processors. Work on the 6502 continued not only at Commodore but also at the Western Design Center (WDC), created by Bill Mensch. It is fascinating that none of the former 6502 team worked with him later. By the way, it was Bill Mensch who developed the aforementioned 6820, which became the 6520. Perhaps, if the development of the 6502 had not been stopped, it could have turned into the main processor for personal computers. IBM, for example, had plans to make its first mainstream PC based on the 6502. As a first step in this direction, they were going to buy Atari, which was one of the first to use this processor.

However, the continuing drama around the 6502 was not over. In 1980, a short anonymous article appeared in Rockwell's AIM65 Interactive magazine stating that all 6502s carry a dangerous bug known as JMP (xxFF). The tone of the article suggests something completely out of the ordinary. Subsequently this attitude toward the issue spread to Apple and became a kind of mainstream. Strictly speaking, though, it was not a bug. Of course, to a specialist accustomed to the comfortable processors of large systems of those years, a feature that is quite relevant and even useful among microprocessors could seem like something annoying – a bug. But in fact this offending behavior was described in the official documentation from 1976, and in programming textbooks published before the appearance of the mentioned article. The "bug" was eliminated by Bill Mensch, who made the 65C02 (CMOS 6502) by 1983. I myself ran into this "bug" several times while writing programs for Commodore computers, knowing nothing about it. There was an incompatibility; I had to change the code and use conditional assembly. The code for the 65C02 turned out to be more cumbersome and slower. I then raised this question on the 6502.org forum, where some participants were familiar with the Apple ][ computers, and asked if anyone could give an example of the "bug" crashing a program. I received only emotional and general comments; a specific example was never offered. Ironically, in the official WDC documentation the "bug" is not directly called a bug, but rather a quirk.
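
To make the quirk concrete, here is a minimal Python sketch of mine (not from the original article) of how the indirect JMP fetches its target on the NMOS 6502 versus the 65C02, following the behavior described in the documentation mentioned above:

```python
# NMOS 6502 JMP (addr) quirk: when the pointer lies at the end of a page
# (e.g. $10FF), the high byte of the target is fetched from the START of
# the same page ($1000), not from the next page ($1100).

def jmp_indirect_nmos(memory, pointer):
    """Return the jump target the NMOS 6502 actually uses."""
    lo = memory[pointer]
    # The page number is kept; only the low byte of the pointer wraps
    # around -- this is the famous "bug".
    hi = memory[(pointer & 0xFF00) | ((pointer + 1) & 0x00FF)]
    return (hi << 8) | lo

def jmp_indirect_cmos(memory, pointer):
    """Return the jump target on the 65C02, which increments normally."""
    lo = memory[pointer]
    hi = memory[(pointer + 1) & 0xFFFF]
    return (hi << 8) | lo

memory = bytearray(0x10000)
memory[0x10FF] = 0x34   # low byte of the vector
memory[0x1100] = 0x12   # where a naive reading expects the high byte
memory[0x1000] = 0x56   # where the NMOS 6502 really takes it from

assert jmp_indirect_nmos(memory, 0x10FF) == 0x5634
assert jmp_indirect_cmos(memory, 0x10FF) == 0x1234
```

A program only hits this when it places an indirect jump vector exactly on a page boundary, which is easy to avoid once you know about it – hence "quirk" rather than "bug".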


Bug!!!



While Intel, Motorola and others had already made 16-bit processors of new generations, the 6502 was only microscopically improved, and made artificially, partially incompatible with itself. Even the improvements made in the Motorola 6801 over the 6800, or in the Intel 8085 over the 8080, are gigantic compared to those made in the 65C02 – and Intel and Motorola made them much earlier. A number of tiny changes were made in the 65C02, which in particular changed the way several instructions execute. These instructions became a cycle slower, but at the same time more correct in some far-fetched academic sense. We are talking about the mentioned "bug" and the instructions for decimal arithmetic. The latter were "adjusted" so that the oVerflow, Negative, and Zero flags started working "correctly". However, when working with decimal numbers on the 6502 (and other microprocessors), the sign is not supported and, accordingly, the N and V flags do not make any sense. Only the correction of the Z flag makes some sense, but it is extremely insignificant. Dozens of new instructions were also added, the absolute majority of which only occupied code space, adding almost nothing to the capabilities of the 6502 and leaving fewer opcodes for possible further upgrades. It must be admitted, though, that several new instructions turned out to be expected and useful – for instance, the new addressing modes for BIT or the instruction JMP (ABS,X) – and that the new instructions allow slightly faster and more compact code. Besides this, four relatively rare instructions became a clock cycle faster on the 65C02. Additionally, the 65C02 began to clear the decimal mode flag on interrupt, which sometimes allows an interrupt handler to be 2 cycles faster and 1 byte shorter – this tiny improvement illustrates the overall scale of the improvements made in the 65C02.

The 65C02 was licensed to many companies, in particular NCR, GTE, Rockwell, Synertek, and Sanyo. It was used in the Apple II starting with the IIe model, although many IIe units used the NMOS 6502. The 6512, a 65C02 variant, was used in later BBC Micro models. Atari used the NMOS 6502; there was an attempt to switch to the CMOS 6502, but due to discovered problems – in particular, the famous game Asteroids did not run on the 65C02 – these attempts were abandoned. Commodore never released a CMOS 6502-based computer, although some prototypes used it. Synertek and Rockwell, in addition to producing the CMOS 6502, also continued to produce the NMOS 6502. By the way, the NMOS 6502 has its own set of undocumented instructions, whose nature is completely different from the secret commands of the 8085. In the 6502 these instructions appeared as a side effect, so most of them are rather useless. But several of them – for example, loading or storing two registers with one command at once, and some others – can make code faster and more compact.

Interestingly, an NMOS 6502 compatible with the 65C02 was never made, even though in the early 80s CMOS technology had no obvious advantages (except for reduced power consumption) over NMOS/HMOS and was noticeably more expensive. It is worth noting, of course, that WDC was able to create a CMOS processor only a few years after Intel and Motorola made CMOS versions of their 8085 and 680x processors. In this it was significantly ahead of Zilog, where the CMOS version of the Z80 was created only by 1987. However, while the CMOS 8085 and Z80 immediately found wide use in mobile computers, the low-power 65C02 found its way into computers relatively late. I can only name the Atari Lynx game console, produced from 1989. It is also worth noting that the introduction of CMOS technology itself is quite a routine process, which other processors (the x86, 68k, ...) passed through almost unnoticed.

There were other attempts to modernize the 6502. In 1979, an article appeared saying that the 6509 processor (not to be confused with Commodore's later processor of the same name) was being prepared for production for Atari computers, promising command execution 25% faster and many new instructions. For unknown reasons this processor never went into production. Commodore made only tiny upgrades. From the point of view of programming, the most interesting was the 6509 processor, which – albeit in a very primitive form, with the help of only two instructions specially allocated for this purpose and two bytes of the zero page – allows addressing up to 1 MB of memory. The super-popular Commodore 64 and 128 used the 6510/8510 processors, and the less successful 264 series the 7501/8501. These processors had embedded 6- and 7-bit I/O ports respectively, while the 7501/8501 did not support non-maskable interrupts. In addition, these processors implemented support for tri-state logic, which was necessary for the video controllers in the C64 and C264 to work. Rockwell produced a version of the 65C02 extended with their own 32 operations on one-bit values (similar to the Z80's bit instructions). However, as far as I know, such processors were not used in computers, and these bit instructions themselves were more likely to be used only in embedded systems. This extension was also made by Bill Mensch. It turns out that Bill worked on the 6502 only from specifications he received, and never tried to improve this processor on his own initiative.

The last scene of the drama involving the 6502 was the blocking of 2 MHz 6502-based computers from the US market in the first half of the 80s. This affected the English BBC Micro: its manufacturer Acorn made a large batch of computers for the United States, but, as it turned out, in vain. Some kind of lock was triggered and the computers had to be urgently reworked to European standards. The almost American, but formally Canadian, Commodore CBM II computers (1982), which had a stylish Porsche design, were nevertheless admitted despite some problems (in particular, with compliance with electrical equipment standards). However, for some reason, Commodore sold these computers, as well as the later Amiga computers, mainly outside the United States... The last in the list of losers was the 100% American Apple III (1980) – it is known that Steve Jobs, like Apple's management in general, did a lot to prevent this computer from being successful. Steve demanded obviously impracticable specifications, and the management asked for unrealistic deadlines. Surprisingly, according to one of the management's requirements, the Apple III had to be only limitedly compatible with the Apple II! Some flaws of the Apple III were eliminated in the Apple III Plus (1983), but Apple's management quietly closed the project in 1984, possibly because of their reluctance to compete with the Macintosh. Only in 1985, when the era of 8-bit technology was beginning to fade, did the Commodore 128 appear, which in one of its modes could run the 6502 at a 2 MHz clock. But even this turned out to be more of a joke, since this mode was practically unsupported and there are almost no programs for it. In the same year, the promising Commodore LCD laptop based on the 65C02 at 2 MHz was due to be released. However, despite a successful public demonstration, the project was closed.
Only in the second half of the 80s did production of accelerators for the Apple II begin in the United States, and from 1988 the Apple IIc+ model with a 4 MHz processor. Why did it happen that way? Perhaps because the 6502 at 2 or 3 MHz (and such chips were already being produced at the very beginning of the 80s) could successfully compete with systems based on the Intel 8088 or Motorola 68000 in a number of tasks, and especially in games. In 1991, a willful decision by Commodore closed an interesting, albeit belated, project: the C65 based on the 4510 processor with a frequency of 3.54 MHz. The 4510 chip was based on the 65CE02 processor, which in turn is based on the WDC 65C02. The 65CE02 is the fastest 6502; made only in 1988, it finally carried out the previously mentioned cycle optimization, which gave a 25% increase in speed. Thus, the processor in the C65 is close in speed to 6502 systems at 4.5 MHz. Surprisingly, this fastest 6502 with an extended instruction set (in some details this extension turned out to be more convenient than that of the 65816) was never used anywhere else.

The Commodore C128 and Apple III Plus had an MMU that allowed them to use several stacks and zero pages, to address more than 64 KB of memory, etc. The C128's MMU was artificially limited to working with only 128 KB of memory. For the BBC Micro computers, second-processor boards were produced with the 6502 at 3 MHz (1984) and 4 MHz (1986).


Anti-advertising – multiple Porsche PETs in the apartment of the villain of The Jewel of the Nile – the Apple-only era in Hollywood had not yet come



Now a few words about the instruction system of the 6502. The main feature of this processor is that it was made almost as fast as possible, with almost no extra clock cycles, which are especially numerous in the 8080/8085/Z80/8088/68000 processors. In fact, this was the main concept of the ARM architecture, whose processors appeared later under the direct influence of the 6502. The same concept dominates among Intel processors starting with the 80486. In addition, the 6502 responded very quickly to interrupts, which made it very useful in some embedded systems. The 6502 has one accumulator and two index registers; in addition, the first 256 bytes of memory can be used in dedicated commands either as faster memory or as a set of 16-bit registers (almost identical in functionality to the BC and DE registers in the 8080/Z80) for pretty powerful ways of addressing memory locations. Some arithmetic commands (shifts, rotations, increment, and decrement) can be used with memory directly, without using registers. There are no 16-bit instructions – this is a 100% 8-bit processor. It supports all the basic flags except the parity flag, which is typical only of Intel's architecture. There is one more special flag for the little-used decimal mode, which replaced the half-carry flag used by most other processors. Intel, Zilog and Motorola processors use special corrective instructions for working with decimal numbers, but the 6502 can switch to decimal mode, which makes its speed advantage with decimal numbers even more significant than with binary ones. Very impressive for the 6502 is the availability of table-based multiplication of 8-bit operands with a 16-bit result in less than 30 cycles, with an auxiliary table size of 2048 bytes. The 6502 uses a simple instruction pipeline that speeds up the execution of many instructions by 1 clock cycle. One of the 6502's slowest operations is a block memory copy; it can take more than 14 cycles per byte.
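
The table-based multiplication just mentioned is usually done with quarter squares: a*b = f(a+b) - f(a-b), where f(x) = x*x/4, so only lookups, additions and subtractions are needed. A Python sketch of mine of the idea (on the real 6502 the table of f is split into separate low-byte and high-byte halves indexed 0..510, which is where the roughly 2 KB of tables comes from – treat that layout detail as my assumption):

```python
# Quarter-square multiplication: a*b = f(a+b) - f(a-b) with f(x) = x*x//4.
# The identity is exact, because (a+b) and (a-b) always have the same
# parity, so the floor in f cancels out in the subtraction.

QUARTER_SQUARES = [x * x // 4 for x in range(511)]  # a+b can reach 510

def table_mul(a, b):
    """Multiply two 8-bit values using only table lookups, one addition
    and two subtractions, the way a 6502 routine would."""
    if a < b:
        a, b = b, a            # ensure a - b >= 0 for the table index
    return QUARTER_SQUARES[a + b] - QUARTER_SQUARES[a - b]

assert table_mul(0xFF, 0xFF) == 0xFF * 0xFF
assert all(table_mul(a, b) == a * b for a in range(256) for b in range(256))
```

On the 6502 each lookup is just an abs,X-indexed load, which is why the whole 8x16-bit product fits in under 30 cycles.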
The 6502 instruction system is unusually asymmetric in some particulars: for example, there is an instruction to load register Y, LDY addr,X, but no paired instruction to store it. There is an instruction to clear the overflow flag, but no paired instruction to set it. Instead, the 6502 allows the overflow flag to be set via a hardware signal. This way of working with the overflow flag allows a very fast input port to be built, but for programming arithmetic, both setting this flag and clearing it are useless. Therefore, in the 6510 and 7501/8501, the special method of setting the overflow flag was abandoned – but the now completely useless instruction for clearing it remained!

The main drawback of the 6502 is the small stack size, only 256 bytes. However, for a system with a memory capacity of 64 KB this, as practice has shown, is usually quite enough. The 6502 has few registers, and therefore the pressure on its stack is less than, for example, on the 8080, 6809 or Z80. Besides that, the 6502 architecture naturally supports the organization of an additional stack on the zero page – such a stack is especially good when working with pointers, since (zp,X) addressing is ideal for such cases. Of course, the size of such an additional stack is very limited and on many systems cannot be more than a few dozen bytes. In addition, you can organize stacks of any size based on abs,X/Y addressing. Additional stacks can be used only for data, for example, subroutine parameters; there is no alternative to the main stack for storing return addresses.

Support for hardware interrupts in the 6502 is implemented plainly and efficiently. For maskable and non-maskable interrupts, two fixed addresses are allocated in memory, where the addresses of the corresponding handlers are written. Later, the most popular interrupt mode on the Z80, mode 1, was implemented similarly but even more simply. Software interrupts in the 6502, however, are implemented quite primitively: they use the address for maskable interrupts, which requires a cumbersome additional software check to distinguish them. This is why there is a unique software interrupt flag among the 6502 flags. In addition, the software interrupt instruction has no argument, although such an argument can be added at the cost of complicating the handler procedure. Because the ability to handle software interrupts significantly slows down the processing of hardware interrupts, support for software interrupts is often simply not implemented.
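
The "cumbersome additional check" amounts to this: both BRK and a hardware IRQ jump through the same vector, and the shared handler must inspect the software interrupt flag (B) in the status byte that was pushed on the stack. A small Python sketch of mine of that dispatch logic (the handler-name strings are of course hypothetical):

```python
# Both BRK (software interrupt) and hardware IRQ on the 6502 vector through
# the same address ($FFFE/$FFFF). The only way to tell them apart is the
# B flag in the status byte the processor pushed on the stack: it is set
# for BRK and clear for a hardware IRQ.

FLAG_B = 0x10  # "break" flag bit in the pushed status byte

def dispatch(pushed_status):
    """Decide, as a shared $FFFE handler must, which event occurred."""
    if pushed_status & FLAG_B:
        return "software (BRK) handler"
    return "hardware IRQ handler"

assert dispatch(0b00110000) == "software (BRK) handler"
assert dispatch(0b00100000) == "hardware IRQ handler"
```

On the real chip this check costs pulling the status byte back off the stack and testing a bit on every hardware interrupt, which is exactly why handlers that need minimal latency simply skip BRK support.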

The 6502 can work in parallel with another device, for example another 6502. Such dual-processor systems were extremely rare; as examples, I know of only a few very rare models of Commodore disk drives. Instead of a second processor, a video controller sharing memory with the 6502 was usually used.

The 6502 is pretty good for simulating other processors. Back in the 70s, an 8080 simulator was written for it. I know of cases of porting code from the Z80, PDP-11, and 8086 to the 6502 platform.

The 65816 was released by WDC in 1983. This was the first time a 16-bit processor compatible with its 8-bit predecessor was made – for the Z80, similar developments (the Z800, Z180, Z380, eZ80, ...) began to appear only from 1985. In addition, it was one of the first 16-bit processors manufactured using CMOS technology! Interestingly, Bill Mensch received some of the new processor's specifications from Apple. Of course, this was a big step forward, but a clearly belated one, and with large architectural flaws. The 65816 was not considered by anyone a competitor for the main processors of Intel or Motorola – it was already a minor outsider, seemingly destined to keep losing ground. The 65816 had two important advantages: it was relatively cheap and almost compatible with the still very popular 6502. In subsequent years, Bill Mensch didn't even try to somehow improve his brainchild: to do cycle optimization, to replace zero-page addressing with extended addressing using the Z register (this was done in the 65CE02), to add at least multiplication, etc. WDC only increased the maximum clock speeds, reaching 14 MHz by the mid-90s (this processor was used in the popular accelerator for the C64, the SuperCPU, at a frequency of 20 MHz). However, even now (2020!) WDC for some reason offers the 65816 only at the same 14 MHz. The 65816 can use up to 16 MB of memory, but the addressing methods used for this look far from optimal. For example: index registers can only be 8 or 16 bit, the stack can be placed only in the first 64 KB of memory, only there can you use the convenient short addressing of the direct page (the generalization of zero-page addressing), working with memory above 64 KB is comparatively awkward, etc. The 65816 has a 16-bit ALU but an 8-bit data bus, so it is only about 50% faster than the 6502 in arithmetic operations. Nevertheless, the 65816, according to Bill Mensch, was produced in a quantity of more than a billion units.
Indeed, some instructions of the 65816 clearly fill gaps in the 6502 architecture – for example, the commands for block copying of memory at 7 clock cycles per byte, and the addressing modes for working with the stack. I can also add that the 65816 uses almost all instruction codes, 255 out of 256; the last unused code is reserved for future long instructions that have never appeared. Interestingly, the 65816 was made by relatives: Bill himself and his sister.

The Apple IIx, in the development of which Steve Wozniak was actively involved, was intended to use the 65816. However, mass production of this processor could only be started in 1984, and its first batches were defective, which caused excessive delays and eventually the closure of the entire project.

The 65802 is another version of the 65816, which uses a 16-bit address bus and a pin layout compatible with the 6502. An upgrade for the Apple II based on this processor was offered, but acceleration with such an upgrade could only be obtained with specially written programs.

The 6502 was used in a large number of computer systems, the most popular of which were the 8-bit Commodore, Atari, Apple, and Acorn computers and the NES. The Commodore PET, in the development of which Chuck Peddle was actively involved, appeared on sale half a year earlier than the Apple ][, although its mass production started only half a year later. It and its variants were the first computers widely used in schools in the United States and Canada – one can only wonder why Commodore so easily lost its position in this business to Apple. It is also surprising that Commodore easily gave up its good position in the text editing business, where Amstrad later achieved impressive success. Earlier, Commodore had left the calculator market for still unclear reasons. Also, as mentioned before, production of the very promising Commodore LCD laptop computer was abandoned, likewise for unclear reasons. Thus, the rapidly growing mobile device market was ignored. Nevertheless, the Commodore VIC-20 was the first home computer to sell more than a million units, and the Commodore 64 became the most popular PC in history, with up to 17 million units sold. Atari game consoles produced from 1977 to 1996 sold approximately 35 million units! But this is not a record: about 62 million NES game consoles were sold between 1983 and 2003. By the way, the NES used Ricoh's version of the 6502 without decimal mode support, perhaps because of its almost complete uselessness, or maybe simply because of a reluctance to get involved with MOS Technology's patent for this mode. We can probably say that for most users, until the end of the 80s, the door to the world of digital technology was based precisely on the 6502. The 6502 was also used in the keyboard controller of the Commodore Amiga, and two 6502s at 10 MHz were used in the high-performance Apple Macintosh IIfx.
A 6502-based processor is used in the famous Tamagotchi digital pet, probably released in more than a hundred million copies. The 65816 was used in the rather popular Apple IIgs computer, and also in the rare Acorn Communicator. The Super NES game consoles based on the 65816 were produced from 1990 to 2003; approximately 50 million were sold. The 65816 was also used in some early e-book models.

Interestingly, of the three mass-market PCs that appeared in 1977 – "the holy trinity" – two were based on the 6502 and only one on the Z80. Unfortunately, the most important American computer manufacturers (Apple, Tandy RadioShack, IBM) stopped publishing information about the number of PCs they produced from the 80s on.

In 1984, an article appeared in Byte magazine about the Agat, a bad copy of the Apple ][ made in the USSR, against a background of pictures with red banners, Lenin, and marching soldiers. The article cited a curious price for this computer of $17,000 (an absurd amount – the real price of the first serial Agats, complete with a monitor and a printer, was about 4000 rubles) and ironically pointed out that Soviet manufacturers would have to lower the price dramatically if they wanted to sell their product in the West. This was material about the Agat prototype, which used a hardware emulator of the 6502 instead of the real chip. The Agat was used mainly in school education. The later Agat models could be almost 100% compatible with the Apple ][ and had some pretty useful extensions. Surprisingly, processors for Agats and for the Bulgarian clones of the Apple II were purchased en masse in the United States – an exceptional case of processors being purchased in large quantities rather than cloned. In Bulgaria they still managed to establish small-scale production of their own 6502 clone by the mid-80s; in the USSR a 65C02 clone was made only by the end of the 80s. Interestingly, the Soviet 6502 clone could be overclocked up to 5 MHz by increasing the supply voltage from 5 to 15 volts – there was an accelerator card with it for the Agat.

One can only fantasize about what would have happened if the 6502 had developed at the same pace as its competitors. It seems to me that gradually moving zero-page memory into registers, improving the instruction pipelining, and gradually expanding the instruction set while optimizing cycles would have allowed The Terminator 6502 to remain the performance leader until the early 90s. Introducing a 16-bit mode, and then a 32-bit one, would have allowed more memory and faster commands to be used. Would its competitors have been able to oppose this?

Bill Mensch was able to provide some support for the development of the 6502. However, the capabilities of one person are clearly not enough to keep a processor competitive. Bill, as an excellent electronics engineer, could execute orders for 6502 upgrades, but ensuring the independent development of a successful processor required a team: someone had to develop upgrades of the instruction system, someone had to develop new marketing strategies, etc. In addition, at least the years 1976-78 were lost for development, and one person was no longer able to catch up. In a sense, WDC created an illusion of well-being around the 6502's development, and this had a rather negative effect on the real development.

Chuck Peddle himself, though, saw the future of the 6502 more as a cheap controller and a competitor not to the Z80 but rather to microcontrollers like the Intel 8048 and processors like the 6800, which were usually used only as controllers. In 2014, he was working on a solid-state drive that used 10 of the 6502-based controllers. In the early 80s, he and a partner created a company where he developed the Victor 9000 computer in 1981. The 8088 was used as the processor in this system – Chuck didn't think the 6502 would be the best choice for his personal computer.

Interestingly, Chuck himself, like some other key IT figures in the USA of the 1970s and 1980s, went into high technology with the idea of preventing the USA from falling behind the USSR in the development of IT – an idea that was very popular after the launch of Sputnik. In this connection, we can also mention a key figure for Intel, Bill Davidow, whom Chuck greatly respected and kept in touch with.

Chuck passed away at the end of 2019. He studied under Claude Shannon and was the first to suggest using group coding when working with disks. Although he failed to patent it, in the late 70s and early 80s he developed disk drives that stored up to twice as much data as typical drives. This, in particular, was used in some of the best Commodore disk drives and in the Victor computer. Then he made the first portable hard drive – a drive that didn't break if it fell on the floor! Subsequently he worked on the production of low-cost RAM, and after that he developed the ultra-fast solid-state drive already mentioned.

I would like to finish with some general philosophical reflections. Why was the 6502 slowed down in its development and deprived of a much brighter future? Maybe because this development really could have seriously pressured large firms and created a completely new reality. Was the 6502 team aiming for this? In my humble opinion, rather not – they just wanted to make a better processor. Perhaps they themselves did not understand how good their processor was, and that speed is the main feature of any processor. It is very unusual that the leading manufacturers of computers based on the 6502 (Apple, Commodore, Atari) clearly slowed down the development of systems based on this processor in the 80s. There was nothing like this for other processors. Perhaps hidden regulatory mechanisms protected other computer and processor manufacturers in this way.

Much later, at the beginning of the 21st century, with the help of lawsuits filed on far-fetched grounds, the Lexra company, which had produced various innovative processors for 5 years, was crushed. This sad story is somewhat reminiscent of what happened to MOS Technology.

Edited by Richard BN and Dr Jefyll

Emotional stories about processors for first computers: part 9 (Acorn ARM)

The first ARM processors



The ARM-1 processor was an astonishing development: it continued the 6502 ideology (namely, to make a processor that is simpler, cheaper and better) and was released by Acorn in 1985 – at the same time as Intel's technological miracle, the 80386 processor, appeared. The ARM consisted of about ten times fewer transistors and therefore consumed significantly less energy, while at the same time being much faster on average. Admittedly, the ARM did not have an MMU, or even multiply and divide instructions, so in some division-heavy calculations the 80386 could be faster. However, the advantages of the ARM were so great that today it is the most widely produced processor architecture: more than 100 billion such processors have been made.

The ARM's development began in 1983, after Acorn conducted research with the 32016 processor which showed that in many calculations the 6502, at half the operating frequency of the 32016, could be faster than what seemed to be a much more powerful processor. At that time the 80286 was already available and showed very good performance, but Intel, perhaps sensing the potential of Acorn, refused to provide its processor for testing. The 80286's technology was not restricted the way the 80386's was, and was transferred to many companies, so history is still waiting for the disclosure of the details of this somewhat unusual refusal. Perhaps if Intel had allowed the use of its processor, Acorn would have used it and would not have developed the ARM.

The ARM was developed by only a few people, who tested the instruction system using the BBC Micro's Basic. The development itself took place in the building of a former barn. Interestingly, one of the main developers of the 6502, Bill Mensch, was the first to be offered the opportunity to make the ARM electronics. But he immediately realized that the ARM was a competitor to the best developments of large companies and decided not to get involved, perhaps fearing that otherwise his company WDC would face the fate of MOS Technology. The processor was eventually made by VLSI. The ARM's debut turned out rather unsuccessful. In 1986, a second processor for the BBC Micro was released under the name ARM Evaluation System, which contained, in addition to the processor, 4 MB of memory (a very large amount for those years), which made this attachment a very expensive product (above 4000 pounds, about $6000). True, if you compare it with computers of that time with comparable performance, this second processor turned out to be one or even almost two orders of magnitude cheaper. There were very few programs for the new system. This was a bit strange, because it was quite possible to port Unix to this system: there were many Unix variants available at that time which didn't require an MMU – such variants existed for the 68000, PDP-11, 80186 and even the 8088. Linux was ported to the Acorn Archimedes only in the 90s. Perhaps the delay in the appearance of a real Unix for the ARM was caused by Acorn's reluctance to transfer ARM technology to other companies.



The first ARM-based system



Acorn's somewhat unsuccessful marketing policy led to a very difficult financial situation in 1985. In addition to the ARM, Acorn also pursued an expensive development of business computers which failed, in particular due to the shortcomings of the 32016 processor chosen for them. The Acorn Communicator computer was also not very successful. The development of the relatively successful, but not quite IBM PC compatible, Master 512 computer was very costly. In addition, a lot of financial resources were spent on an unsuccessful attempt to enter the US market – a market which the Italian company Olivetti, with its rather successful Intel 8086 and 80286-based computers, was allowed to enter, perhaps as part of a hypothetical big game of absorbing Acorn itself. By the way, after the absorption of Acorn, Olivetti's role in the US market quickly faded away.

As part of Olivetti, Acorn developed the improved ARM2 chip with built-in multiplication instructions, on the basis of which the Archimedes personal computers were made. They were stunning for their speed at the time. The first models of those computers became available in 1987. However, Olivetti's management was focused on working with IBM PC compatible computers and did not want to use its resources to sell Acorn products. It is also surprising that the Archimedes did not replace the BBC Micro in English schools; perhaps this happened due to a failed deal with the USSR for the Memotech MTX computers. Memotech had received a million pounds from the British government, and after the failure of the deal declared itself bankrupt. After that, the government stopped the practice of supporting its computer manufacturers, including Acorn.

The ARM provides 16 32-bit registers – actually more, if we take into account the registers for system needs. One of the registers, R15, is (as in the PDP-11 architecture) the program counter. Almost all operations are performed in 1 clock cycle; more cycles are needed, in particular, for jumps, multiplications and memory accesses. Unlike popular processors of those years, the ARM was distinguished by the absence of such a typical structure as a hardware stack. A stack is implemented, if necessary, through one of the registers. When calling subprograms the stack is not used; instead the return address is stored in a register allocated for it. Such a scheme obviously does not work for nested calls, for which a stack has to be organized. A unique feature of the ARM is the combination of the program counter (which is 26-bit and therefore allows addressing up to 64 MB of memory) with the status register. Eight bits in this register are allocated for flags; two of these bits are obtained thanks to the fact that the lower two bits of the address are not used, since code must be aligned on a 4-byte word boundary. The processor can access bytes and 4-byte words; it cannot directly access 16-bit data. The ARM's instructions for working with data are 3-address.
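
A Python sketch of mine of how the 26-bit ARM packs everything into R15 (the layout follows the early ARM documentation as I understand it; treat the exact bit positions as an assumption): the four condition flags and two interrupt-disable bits occupy the top six bits, the word-aligned program counter the middle, and the two free low bits hold the processor mode.

```python
# R15 on the 26-bit ARM, approximately:
#   bits 31..28  N, Z, C, V condition flags
#   bits 27..26  I, F interrupt-disable bits
#   bits 25..2   program counter (word-aligned, so bits 1..0 are free)
#   bits  1..0   processor mode
# 24 bits of word address x 4 bytes per word = the 64 MB address space.

def unpack_r15(r15):
    """Split a combined PC/status word into its fields."""
    return {
        "flags_nzcv": (r15 >> 28) & 0xF,
        "irq_fiq_disable": (r15 >> 26) & 0x3,
        "pc": r15 & 0x03FFFFFC,     # word-aligned 26-bit address
        "mode": r15 & 0x3,
    }

# N and C flags set, PC = 0x8000, mode 3 (supervisor in this sketch)
r15 = (0b1010 << 28) | 0x8000 | 0b11
state = unpack_r15(r15)
assert state["pc"] == 0x8000
assert state["flags_nzcv"] == 0b1010
assert state["mode"] == 3
```

The payoff of this packing is that a subroutine return can restore the PC and all the flags with a single register move – and the price is the 64 MB ceiling that later ARMs had to break by separating the PSR from the PC.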

A characteristic feature of the RISC architecture is that memory is accessed only through load and store instructions. The ARM has a built-in fast bit shifter (barrel shifter) that allows the value of one of the registers in an instruction to be shifted by any number of bit positions at no cost in clock cycles. For example, multiplying the value of register R0 by 65 and placing the result in register R1 can be written as one single-cycle addition command ADD R1, R0, R0 shl 6, and multiplying by 63 – as the single instruction RSB R1, R0, R0 shl 6. The instruction set includes reverse subtraction, which allows, in particular, having a unary minus as a special case of this instruction and speeding up the division procedure. The ARM has another unique feature: all its instructions are conditional. There are 16 cases (flag combinations), one of which is attached to each instruction; an instruction is executed only if the current set of flags matches the set encoded in it. In processors of other architectures such conditional execution applies, as a rule, only to conditional jumps. This feature of the ARM allows slow jump operations to be avoided in many cases. The latter is also helped by the fact that arithmetic operations can decline to set the status flags. With the ARM, as with the 6809 processor, you can use both fast and regular interrupts. In addition, in the interrupt modes the higher-numbered registers are replaced with system ones, which makes interrupt handlers more compact and fast.
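The arithmetic behind these two shift-and-add tricks is easy to verify (a Python check of the identities, not ARM code; the function names are mine):

```python
def mul65(r0):
    # ADD R1, R0, R0 shl 6  ->  R1 = R0 + (R0 << 6) = R0 * 65
    return (r0 + (r0 << 6)) & 0xFFFFFFFF

def mul63(r0):
    # RSB R1, R0, R0 shl 6  ->  R1 = (R0 << 6) - R0 = R0 * 63  (reverse subtraction)
    return ((r0 << 6) - r0) & 0xFFFFFFFF

for x in (0, 1, 7, 1000, 0x00FFFFFF):
    assert mul65(x) == (x * 65) & 0xFFFFFFFF
    assert mul63(x) == (x * 63) & 0xFFFFFFFF
```

The same pattern covers multiplication by any constant of the form 2^n ± 1 in a single cycle, which is why the early ARMs got away without a hardware multiplier.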

The ARM instruction set contains significantly fewer basic instructions than the x86 instruction set, but the ARM instructions themselves are very flexible and powerful. Several very convenient and powerful ARM instructions have no analogues on the 80386, for example, RSB (reverse subtraction), BIC (AND with inversion – such a command exists for the PDP-11), the 4-address MLA (multiplication with accumulation), and LDM and STM (loading or storing multiple registers from or to memory, both similar to the MOVEM command of the 68k processors). Almost all ARM instructions are 3-address, while almost all 80386 instructions have no more than 2 operands. The ARM instruction set is also more orthogonal, meaning that all registers are interchangeable; the few exceptions are registers R14 and R15. Most of the ARM's commands require 3-4 of the 80386's commands to emulate them, while most of the 80386's commands can be emulated by only 2-3 ARM commands. Interestingly, the IBM PC XT emulator on the hardware of the Acorn Archimedes with an 8 MHz processor runs even faster than a real PC XT computer. On the Commodore Amiga with the 68000 @7 MHz, such an emulator can only work at no more than 10-15% of the speed of a real PC XT. It is also fascinating that the first NeXT computers with the 25 MHz 68030 showed the same integer performance as the 8 MHz ARM. Apple was going to make the Apple ]['s successor in the Möbius project, but when it turned out that the prototype of this computer in emulation mode overtook not only the Apple ][ but also the Macintosh based on the 68k processors, the project was closed!

Among the shortcomings of the ARM we can highlight the problem of loading an immediate constant into a register. You can load only 8 bits at a time, although the constant can be inverted or rotated. Therefore, loading a full 32-bit constant can take up to 4 instructions. You can, of course, load a constant from memory with one instruction, but then the problem arises of specifying the address of this value, since the offset can only be 12-bit. Another shortcoming of the ARM is its relatively low code density, which makes programs somewhat large and, most importantly, reduces the efficiency of the processor cache. However, this is probably the result of the low quality of the compilers for this platform. Multiplication instructions give only the lower 32 bits of the product. For a long time a significant drawback of the ARM was the lack of built-in support for memory management (MMU) – Apple, for example, demanded this support in the early 90's. Coprocessors for working with real numbers also came to the ARM architecture with a significant delay. The ARM did not have such advanced debugging features as the x86 had. There is also an oddity in the standard ARM assembly language: operations for the barrel shifter are written separated by commas. Thus, instead of the simple form R1 shl 7 (shift the contents of register R1 by 7 bits to the left) you need to write R1, shl 7.
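How many instructions a given 32-bit constant costs can be estimated with a greedy split into 8-bit chunks at even bit positions, which is the shape of the ARM immediate (a sketch of the idea only – it ignores the inverted and wrap-around forms a real assembler would also try; the function name is mine):

```python
def split_constant(value):
    """Split a 32-bit constant into chunks loadable as ARM immediates
    (8 bits of value, rotated by an even amount)."""
    parts, v = [], value & 0xFFFFFFFF
    while v:
        low = ((v & -v).bit_length() - 1) & ~1   # lowest set bit, aligned down to an even position
        chunk = v & (0xFF << low)                # grab up to 8 bits starting there
        parts.append(chunk)
        v ^= chunk
    return parts                                 # one MOV, then one ORR per extra part

parts = split_constant(0x12345678)
assert len(parts) == 4 and sum(parts) == 0x12345678   # worst case: 4 instructions
assert split_constant(0xFF000000) == [0xFF000000]     # fits a single MOV
```

Constants like 0xFF000000 cost one instruction, while a "random" value like 0x12345678 needs the full MOV-plus-three-ORRs sequence the text mentions.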

Since 1989 the ARM3, with a built-in cache, has been available. In 1990 the ARM development team separated from Acorn and created ARM Holdings with the help of Apple and VLSI. One of the reasons for the separation was the excessive cost of ARM development in the opinion of the Acorn-Olivetti management. It is ironic that Acorn subsequently ceased its independent existence, while ARM Holdings became a large company. However, the separation of Acorn and ARM Holdings was also driven by Apple's desire to have ARM processors in its Newton computers without being dependent on another computer manufacturer. By the way, in 1999 VLSI lost its independence, becoming part of Philips.

The ARM showed integer performance exceeding that of the 80486 at the same frequency by approximately 10-20%! Intel was able to gain the advantage only by using clock multiplication technology, and later firmly secured it with the Pentium. The StrongARM (developed by DEC) briefly regained the lead for the ARM in 1996, after which the technology was purchased by Intel, which for some years was itself a large manufacturer of ARM-architecture processors. Thus, there are two centers of development of this architecture.

Further development of the ARM architecture is also very interesting, but this is another story. Although it can be mentioned that thanks to a share in ARM Holdings Apple was able to avoid bankruptcy in the 90's.

Many thanks to jms2 and BigEd, who helped to improve the style and content. Edited by Richard BN


Emotional stories about processors for first computers: part 8 (DEC VAX)

Processor for DEC VAX



The VAX-11 systems were quite popular in the 80's, especially in higher education. They could emulate the popular PDP-11. Today it is difficult to understand some of the concepts described in books from those years without knowing the features of the architecture of those systems.

The VAX-11 was more expensive than the PDP-11, but it was more oriented towards general-purpose programming. At the same time, the VAX-11 was significantly cheaper than the IBM/370 systems.

The V-11 processor, produced for the VAX architecture by the mid-80s; before that time, multi-board processor assemblies were the only option

The VAX-11 architecture is 32-bit and uses 16 registers, among which, as in the PDP-11, is the program counter. It assumes the use of two stacks, one of which stores subroutine frames. In addition, one of the registers is assigned to work with the arguments of called functions. Thus, 3 of the 16 registers are dedicated to stack handling.

The instruction system of the VAX-11 cannot fail to amaze with its vastness and the presence of very rare and often unique commands. For example it has commands for working with bit fields, for working with several types of queues, for calculating the CRC, for multiplying decimal strings, etc. Many instructions have both three-address variants (like the ARM) and two-address variants (like the x86), but there are also four-address instructions, for example, the extended division – EDIV. Of course, there is support for working with floating point numbers.

However, the VAX-11 is a very slow system for its class and price. Even the super-simple 6502 at 4 MHz could outrun the slowest family member, the VAX-11/730. The fastest VAX-11 systems – huge cabinets and "whole furniture sets" – were at the same level of speed as the first PC ATs. When the 80286 appeared it became clear that the days of the VAX-11 were numbered, and even the slowdown in the development of systems based on the 80286 could not change anything fundamentally. The straightforward people from Acorn, having made the ARM in 1985, said without concealment that the ARM was much cheaper and much faster. The VAX-11 nevertheless remained relevant until the early 90's, still having some advantages over the PC, in particular faster disk subsystems.

There were known compatibility issues among the cheapest VAX-11s. In particular, it was problematic to port Unix to the first VAX-11/730's due to the peculiarities of the implementation of privileged instructions on them.

The VAX is probably the last mass computer system in which the convenience of working in assembly language was considered more important than its performance. In a sense this approach has moved to modern popular scripting languages.



The VAX-11/785 (1984) is also a computer – the fastest among the VAX-11 series, with processor speed comparable to the IBM PC AT or the ARM Evaluation System



Surprisingly, there is very little literature on the VAX systems available in open access, as if some strange law of oblivion were at work. Several episodes close to politics and correlated with the history of the USSR are associated with the history of this architecture. It is possible that the effective abandonment of the development of the PDP-11 architecture was caused by its low cost and the success of its cloning in the Soviet Union. Cloning the VAX cost an order of magnitude more resources and led to a dead end. Interest in the VAX was stoked using, for example, hoaxes like the famous Kremvax of April 1, 1984, in which the then USSR leader Konstantin Chernenko offered to drink vodka on the occasion of connecting to the Usenet network. Another joke was that some VAX-11 chips were imprinted with a message in broken Russian about how good the VAX was.

Some early models of the VAX were cloned in the USSR by the end of the 80's, but such clones were produced in very small numbers and found almost no use.

The latest VAX-11 models were made in the mid-80s. Of course, the history of VAX computers at DEC did not end with the VAX-11: it was replaced by the VAX 8000 models. In parallel, the MicroVAX, VAXstation and VAXserver lines were being developed. The VAX 8000 was replaced by the cheaper and somewhat faster VAX 6000. Subsequently, in the early 90's, the MicroVAX was replaced by the VAX 4000 models. The VAX processors from the early 90's showed performance at the level of the 80486, but had slightly higher clock speeds. I can assume that the 80486DX4 at 100 MHz and the first Pentium processors began to overtake the best VAX 7000 models in performance. After that, DEC had to abandon support for the VAX instruction set and switch to emulating it on the DEC Alpha systems. There were also VAX 9000 supercomputers and multiprocessor variants, such as the VAX 7000, but these were very expensive systems. We can also mention the high-reliability VAXft systems, in which the processor functions were duplicated, allowing them to survive the failure of any one processor. After the V-11 came the CVAX, Rigel, NVAX, and NVAX+ processors.

Several VAX systems are available for use over the network and this distinguishes them favorably from the IBM/370 systems with which they competed.

Edited by Ralph Kernbach and Richard BN


Emotional stories about processors for first computers: part 7 (NS 32016)

The first 32-bit CPU – National Semiconductor 32016



This is the first true 32-bit processor proposed for use in computers, back in 1982. It was originally planned as a VAX-11 on a chip, but after failing to reach an agreement with DEC, National Semiconductor had to make a processor that resembled the VAX-11 architecture only in some details.

The use of paged virtual memory – the dominant technology today – begins with this processor. However, virtual memory support is not built into the processor itself; it is provided by a separate coprocessor. A separate coprocessor is also required for working with real numbers.

The instruction set of the NS32016 is huge and similar to that of the VAX-11, in particular in having a separate stack for subroutine frames. The address bus is 24-bit, which allows up to 16 MB of memory to be used. A distinguishing feature of the 32016 is its slightly unusual set of status flags. In addition to the standard flags of carry (which can be used for a conditional jump), overflow, sign, and equality (or zero), there is also the L flag, which means 'less' – a carry flag for comparisons only. The situation around the carry flag is similar to that of the Motorola 68k processors. The overflow flag is for some reason called F. There are flags for step-by-step mode and privileged mode, and a unique flag for current stack selection. Arithmetic instructions do not set the sign, zero and less flags; they are set only by comparison commands.

Eight 32-bit general-purpose registers are available. In addition there are a program counter, two stack pointers, a frame pointer for subroutine frames, a static base pointer (something unique), a module base pointer (also something very rare), a pointer to the interrupt vector table, a configuration register, and a processor status register. The performance of the NS32016 was comparable to the 68000, perhaps only a bit faster.

It is known that this very complex processor had serious hardware errors that were being fixed for years.

The 32016, as far as I know, was used only with the BBC Micro personal computers as a second processor. It could be ordered at frequencies of 6, 8 and 10 MHz. This second processor was a very expensive and prestigious device for 1984. The software for it was very limited in quantity and was produced only through Acorn's efforts. It included the Unix-like Panos operating system and Acorn's perennial companion, BASIC. The BBC Micro did not use an MMU chip – there were no programs that used it, although it could be plugged in. Connecting an arithmetic coprocessor was not even provided for.

The most ambitious plans associated with this processor may have been with the founder of Commodore, Jack Tramiel, who was called the king of low cost computers. After buying Atari, he announced that he was going to make a low-cost personal computer with the capabilities of the VAX superminicomputers. It is believed that he was referring to the use of the 32016 processor or its full 32-bit variant 32032.

Edited by Richard BN


Emotional stories about processors for first computers: part 6 (TI TMS9900)

Texas Instruments TMS9900



This is the first 16-bit processor available for use in personal computers, produced from 1976. It used the much rarer big-endian byte order, employed only in the processors of Motorola's 6800 and 68000 series and in the architecture of the IBM mainframes. All the other processors in this review use little-endian byte order.

The TMS9900 has only three internal 16-bit registers: the program counter, the status register and the base register for external registers. The processor uses a 32-byte area of memory (the workspace), pointed to by the base register, as 16 double-byte general-purpose registers. This use of memory is somewhat like the zero page in the 6502 architecture, so the external registers are more a form of addressing than real registers. By changing the value of the base register, the TMS9900 can switch its set of external registers very quickly. This is similar to the Z80 with its two register contexts. The processor status flags are notable for their originality: along with the typical flags of carry, zero (equality), overflow and parity, there are also two unique flags, logical greater-than and arithmetic greater-than. The latter compensates for the absence of a sign flag, but the logical greater-than is a logical consequence of the carry and zero flags and is therefore theoretically redundant. The next table explains the redundant role of the logical > flag.


TMS9900                      x86
op     flags                 op     flags
JL     L> = 0 and Z = 0      JB     C = 1
JLE    L> = 0                JBE    C = 1 or Z = 1
JH     L> = 1                JA     C = 0 and Z = 0
JHE    L> = 1 or Z = 1       JAE    C = 0
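The equivalences in the table can be brute-force checked over all unsigned byte pairs (a Python verification of the logic, my own sketch, not processor code; after comparing a with b, L> = a>b and Z = a==b on the TMS9900, while C = a<b after an x86 CMP):

```python
for a in range(256):
    for b in range(256):
        Lgt = a > b          # TMS9900 logical > flag after comparing a with b
        Z = a == b           # zero/equality flag (same on both machines)
        C = a < b            # x86 carry (borrow) after CMP a, b
        assert (not Lgt and not Z) == C              # JL  <-> JB
        assert (not Lgt) == (C or Z)                 # JLE <-> JBE
        assert Lgt == (not C and not Z)              # JH  <-> JA
        assert (Lgt or Z) == (not C)                 # JHE <-> JAE
```

Each TMS9900 condition is thus expressible from carry and zero alone, which is what makes the L> flag theoretically redundant.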


However, the TMS9900 does not set the carry flag in comparison operations, and therefore the logical > flag may play a role more similar to the carry flag of the 68k architecture. There is no ready-to-use stack, but you can create one using one of the registers. When calling subroutines the stack is not used; instead the return address is stored in a register allocated to it – this is how subroutine calls work on the ARM or the IBM/370. You can also call subroutines with context switching, where the call saves not only the return address but also the current set of external registers and the other two internal registers. Such an extended call is more like a software interrupt. There is also an instruction for explicitly calling a software interrupt; atypically, it takes a parameter selecting the interrupt handler. The TMS9900 has a built-in interrupt controller designed to work with up to 16 levels of maskable hardware interrupts. In addition, there is support for non-maskable interrupts. The TMS9900 has a built-in serial interface that allows working with 4096 single-bit ports in a separate address space; there are 5 special instructions for working with this interface. Unusually, the TMS9900 also has 5 external instructions that can be executed by external circuits – the architecture thus implies the presence of instructions defined by the user's hardware.
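The workspace mechanism behind these fast context switches can be pictured with a toy model (Python used purely as illustration; the names, addresses and simplifications are mine, not real TMS9900 behaviour):

```python
memory = [0] * 0x8000                 # 64 KB of RAM modelled as 16-bit words

def reg_read(wp, n):
    """'Register' n is just the word at byte address WP + 2*n."""
    return memory[(wp + 2 * n) // 2]

def reg_write(wp, n, value):
    memory[(wp + 2 * n) // 2] = value & 0xFFFF

reg_write(0x8300, 0, 1234)            # write R0 of the workspace at address >8300
assert reg_read(0x8300, 0) == 1234
assert reg_read(0x8320, 0) == 0       # moving WP instantly yields a fresh register set
```

A context-switching call then amounts to loading a new WP and saving the old WP, PC and status into the new workspace – three stores instead of pushing sixteen registers, at the price of every register access being a memory access.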




The first 16-bit home computer – it has even color sprites!



The instruction set looks very impressive; there are even multiplication and division. The unique X instruction (a similar instruction exists only on the IBM mainframes) allows executing one instruction at any memory address and then moving on to the next one. In other words, this instruction executes another instruction as a subroutine. Instruction execution is rather slow: the fastest instructions require 8 cycles and arithmetic instructions 14. However, multiplication (16*16=32) in 52 cycles and division (32/16=16,16) in only 124 cycles were probably the fastest among processors of the 70's. Interestingly, multiplication does not change the flags at all, and division only sets the overflow flag. The latter is very convenient – on the x86, such a division overflow immediately causes an exception-crash. Increment and decrement can operate in steps of 1 or 2 and set all the arithmetic flags, but they can only be applied to words – addition/subtraction must be used to increment/decrement a byte. Constants cannot be loaded into bytes either, only into words; in general, all operations with constants are available only for words. The instruction set is almost completely orthogonal, although some conditional jumps are missing, for instance on overflow, on parity, and signed jumps on <= and >=. There are other gaps in orthogonality, such as the absence of some operations with an immediate value. Operands are usually taken in left-to-right order, but some instructions use the reverse order, which is somewhat confusing. It is also unusual that byte operations on a register use the most significant byte of the register.

The TMS9900's addressing methods are quite diverse. You can, for example, even use indirect addressing via a register or via a register with an offset. Interestingly, register 0 may have a special role when used in addressing, which again resembles the IBM/370 architecture. The TMS9900 has auto-increment addressing, but not auto-decrement addressing. The latter can cause some asymmetry in the codes for implementing, for example, the stack. It's pretty natural for the TMS9900 to generate relocatable code. However, the instructions for relocatable jumps are only short, with offsets from -256 to 254 bytes. Although, for comparison, on the 8086 and even 80286, the offsets for such jumps are even smaller, only from -128 to 127.

The lack of addition and subtraction operations with the carry flag is unpleasant, as is the absence of rotations through the carry flag. There is no rotation to the left; it must be replaced by a rotation to the right. All this makes long integer arithmetic slower. In addition to a fairly typical operation for changing a sign, there is also an operation for discarding a sign – taking the absolute value of a number. Also, in addition to the fairly typical zeroing operation, there is an operation for filling a given value with ones. All these operations (rotations, shifts, sign changes, ...) can only be applied to words.
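Without add-with-carry, multi-word arithmetic has to recover the lost carry by comparison, roughly like this (a Python sketch of the technique, not TMS9900 assembly; the function name is mine):

```python
def add32(a, b):
    """Add two 32-bit numbers held as (hi, lo) pairs of 16-bit words."""
    lo = (a[1] + b[1]) & 0xFFFF
    carry = 1 if lo < a[1] else 0          # unsigned wrap-around reveals the carry out
    hi = (a[0] + b[0] + carry) & 0xFFFF
    return hi, lo

assert add32((0x0001, 0xFFFF), (0x0000, 0x0001)) == (0x0002, 0x0000)
assert add32((0x1234, 0x8000), (0x0000, 0x8000)) == (0x1235, 0x0000)
```

The comparison and conditional increment cost extra instructions and cycles on every word, which is why long integer arithmetic on the TMS9900 is slower than the raw instruction timings suggest.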

On the TMS9900, in addition to the rather typical ability to check all specified bits for equality to zero (in the x86, the TEST instruction is used for this), there is also an opportunity to check all specified bits for equality to one. Instead of a bitwise multiplication (AND), as in the DEC PDP-11 architecture, a bitwise multiplication operation with pre-inversion is used.

The TMS9900 assembly instruction mnemonics are often unique. Although the non-cyclic shift commands have the same names as the corresponding instructions on the Z80. My favorite is the AI (Add Immediate) mnemonic, which also corresponds to the words Artificial Intelligence. It is interesting that there is no subtraction with a constant, it must always be replaced by AI, which, although always possible, requires some intelligence. It is also interesting that by default the registers are named only by their number – this is also possible due to the influence of the IBM/370 architecture. It may be also noticed that in the TMS9900 assembler, the > sign is used to prefix hex-numbers – I am not aware of other systems that use this sign in the same way.

The TMS9900 requires three supply voltages (-5, 5 and 12 volts) and four phases of the clock signal – the worst such specifications among the processors known to me. In 1979 this processor was demonstrated to IBM specialists, who were then looking for a processor for the IBM PC prototype. The obvious drawbacks of the TMS9900 (addressability of only 64 KB of memory, cumbersome connection, lack of the necessary support chips, relative slowness) made the appropriate impression, and the Intel 8088 was chosen for the future leader among PCs. To deal with the lack of support chips, Texas Instruments also produced a TMS9900 variant with an 8-bit bus, the TMS9980, which worked 33% slower.

The TMS9900 was used in the TI-99/4A computers, which were fairly popular in the USA until they were "crushed" in the price war with the Commodore VIC-20 by 1983. Curiously, as a result of this war Texas Instruments was forced to cut the price of its computer to the incredible (for 1983) figure of $49 (in 1979 the price was $1150!) and to sell it at a big loss. For comparison, we can mention the relatively unpopular Commodore +4, which ceased production in 1986 and whose price fell to $49 only in 1989. Production of the TI-99/4A was stopped in 1984, just when, thanks to the ultra-low prices, it had begun to gain popularity. Interestingly, back in 1982 the TI-99/4A accounted for 34% of the US market for computers with an average retail price of $500, ahead of the Commodore VIC-20 (33%), Atari 400 (20%) and Tandy CoCo (13%). This computer can only conditionally be called 100% 16-bit: only 256 bytes (very little) of its RAM and its system ROM are addressable through a 16-bit bus. The rest of the memory and the I/O devices work over a slow 8-bit bus. Moreover, most of its ROM works through a very unusual and slow serial interface. On the other hand, the TI-99/4A is more 16-bit than the IBM PC XT. Perhaps the BK0010 can more correctly be considered the first 100% 16-bit home computer. It is an interesting coincidence that the TI-99/4A uses a processor at a frequency of 3 MHz – exactly the same as the BK0010.

The TI-99/4 series is a rare example of computers where the processor and the computer were made by the same manufacturer. By the way, it was Texas Instruments that once developed the basis of all personal computers – the integrated circuit.

In its popular calculators series that began with the TI-81, Texas Instruments chose to use the Z80 instead of their somewhat more advanced processors. Although once it was Texas Instruments that developed both the first processor for calculators and the first hand-held calculator.

In the TI-99/4A a quite successful TMS9918 chip was used as the video controller; it became the basis for the very popular worldwide MSX standard, as well as for some other computers and game consoles. The Japanese company Yamaha significantly improved this video chip, and it was subsequently used, in particular, to upgrade the TI-99/4A itself! It is strange that Texas Instruments failed to persuade manufacturers to use its processor in products that used the TMS9918. The only exception was the Panasonic Tomy Tutor computer, which used the TMS9995 processor, compatible with the TMS9900. By the way, Texas Instruments made prototypes of the TI-99/2 and TI-99/8 computers based on this processor, which did not go into production for some not entirely clear reasons. The TMS9995 was also used in the late (1987) Geneve 9640 computer, which is compatible with the TI-99/4A and therefore became the best-known system based on this processor.

The TMS9995 deserves a few words of its own. It is a very unusual processor. The external data bus is 8-bit, but there are 256 bytes of internal memory, located at two fixed addresses, working through an internal 16-bit bus. The TMS9995 uses only one supply voltage and one clock signal, which made systems based on it simpler and cheaper. There is a built-in timer. Compared to the TMS9900, it has only 4 new instructions, which merely optimize some operations rather than introduce anything fundamentally new. The handling of external maskable interrupts became more primitive: only 3 of the 16 levels remained. However, support for internal interrupt exceptions appeared: on illegal instruction, on timer, and on overflow – the latter was implemented with a bug that may never have been fixed. Only 6 of the TMS9900's 17 interrupt vectors are left in the TMS9995. Interestingly, the TMS9995 actually divides its clock frequency by 4 – all instruction timings are based on a frequency 4 times lower than the input clock. Instructions on the TMS9995 execute much faster. However, if we take the external clock frequency as the base, then even with the internal memory the TMS9995 is slower than the TMS9900 at the same frequency.

The instruction set of both the TMS9900 and TMS9995 is a subset of the instruction set used in the TI-990 series of minicomputers. Interestingly, in the first TI-990 manual the instructions had no mnemonics, only opcodes! In conclusion, we can say that the first 16-bit home computers were a side branch of the development of minicomputers.

Edited by Richard BN


Emotional stories about processors for first computers: part 5 (Motorola 6800 family)

Motorola 6800 and close relatives


Motorola's processors have always been distinguished by several very attractive features, alongside some absurd abstractions and architecturally impractical solutions. The main attractive feature of all the processors under consideration here is the second complete and very fast register-accumulator.

The 6800 was the first microprocessor to require only a single 5-volt power supply – a very useful feature. In addition, the 6800 introduced support for non-maskable interrupts for the first time. However, because of its single, cumbersome 16-bit index register in an 8-bit architecture, the 6800 turned out to be inconvenient for programming and use. It was released in 1974, not much later than the 8080, but it did not become the basis for any well-known computer system. Interestingly, the 6502 developers, Chuck Peddle and Bill Mensch, called the 6800 "not right" and too big. However, it and its variants were widely used as microcontrollers. It is worth noting that Intel had been producing processors since 1971, which put Motorola – for which the 6800 was the very first processor – in the position of playing catch-up. If the 6800 were compared not with the 8080 but with its predecessor the 8008, it would be much preferable. Motorola almost caught up with Intel with the 68000/20/30/40. I can also note that in the 70s Motorola was a significantly larger company than Intel.

Numerous variants of the 6800 were produced: 6801, 6802, 6803, 6805, ... Most of them are microcontrollers with built-in memory and I/O ports. The 6803 is a simplified 6801; it was used in the Tandy TRS-80 MC-10 computer and its French clone, the Matra Alice. These computers were very late (1983) for their class and were comparable to the Commodore VIC-20 (1980) or Sinclair ZX81 (1981). The instruction set of the 6801/6803 was significantly improved with 16-bit instructions, multiplication, and several others. An unusual branch instruction appeared (BRN – branch never), which never takes its branch! Some instructions became a little faster. It is worth noting that the 680x architecture was heavily influenced by the PDP-11. Some PDP-11 concepts were copied rather mechanically, such as the useless CLV and SEV instructions.

The 680x range fully supports signed integers; the Z80 and 6502 support them less well, and the 8080 has almost no such support at all. However, in 8-bit software such support was very rarely needed.

The 6809 was released in 1978, when the 16-bit era was beginning with the 8086, and has a highly developed instruction set, including multiplication of two accumulators into a 16-bit result in 11 clock cycles (for comparison, the 8086 requires 70 clock cycles for such an operation). The two accumulators can in several cases be combined into one 16-bit accumulator, which gives fast 16-bit instructions. The 6809 has two index registers and a record number of addressing modes among 8-bit processors – 12. Some of the addressing modes are unique for 8-bit chips, such as indexing with auto-increment or auto-decrement, addressing relative to the program counter (this allows writing relocatable code – there is no such possibility on the 6502, Z80 or even x86), and indexing with an offset. Of particular note is indirect addressing, which can be combined with almost all the basic addressing modes. You can, for example, write the instruction LDD [,X++], which corresponds to the C code short D, **X; D = **X++;, or LDA [B,U], which corresponds to the C code char A, B, **U; A = **(B + U);. The 6809 has a load-effective-address instruction, which is sorely missed on the PDP-11 and is often very useful; for example, LEAX ,--Y corresponds to the C code X = --Y;. The stack instructions are very powerful: a single instruction can push or pop any set of registers – similar instructions appeared later on the 68k and ARM, but are absent on x86. Relocatable branches may use both 8-bit and 16-bit offsets – the latter is not available even on the 80286! The 6809 also offers an interesting choice of two types of interrupts: fast interrupts with automatic partial register saving, and ordinary interrupts that save all registers. It has three interrupt inputs: FIRQ (fast maskable), IRQ (maskable), and NMI (non-maskable). In general, its support for interrupts, both hardware and software, is of very high quality.
It is also sometimes convenient to use its fast instructions for reading and setting all the flags at once.
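The indirect auto-increment mode can be illustrated with a small model (a toy Python sketch, not 6809 code; the memory layout and values are invented for illustration):

```python
# Toy model of the 6809's indirect addressing with auto-increment.
# LDD [,X++]: read the 16-bit pointer stored at address X, load D
# through that pointer, then advance X past the pointer.

def read16(mem, addr):
    # the 6809 is big-endian: high byte first
    return (mem[addr] << 8) | mem[addr + 1]

def ldd_indirect_autoinc(mem, x):
    """Return (d, new_x) modeling LDD [,X++]."""
    ptr = read16(mem, x)      # fetch the pointer stored at X
    d = read16(mem, ptr)      # load D through that pointer
    return d, x + 2           # X advances past the 16-bit pointer

mem = {0x1000: 0x20, 0x1001: 0x00,   # pointer 0x2000 stored at X
       0x2000: 0x12, 0x2001: 0x34}   # the data itself
d, x = ldd_indirect_autoinc(mem, 0x1000)
# d == 0x1234, x == 0x1002
```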

However, memory operations take one clock cycle more than on the 6502 or 6800. The index registers remained bulky 16-bit "dinosaurs" in the 8-bit world. Some operations are shockingly slow: moving one byte from one accumulator to another takes 6 clock cycles, and exchanging their contents takes 8 (compare with the 8080, where a 16-bit exchange completes in 4 cycles)! Relocatable addressing is slower than non-relocatable, and indirect addressing is needed very rarely. For some reason two stack pointers are offered at once – perhaps under the influence of the dead-end VAX-11 architecture – which looks rather awkward in an 8-bit architecture with 64 KB of memory. There are no instructions for comparing registers. And even the existence of an instruction with the interesting name SEX cannot eliminate all the 6809's problems. In general, the 6809 is still somewhat faster than the 6502 at the same frequency, but it requires the same memory speed. I managed to write a division routine for the 6809 with a 32-bit dividend and a 16-bit divisor (32/16 = 32,16) taking approximately 480 cycles on average; for the 6502 I could not get below 530 cycles. The second accumulator is a big advantage, but other 6502 features, in particular its inverted carry flag, reduce this advantage to roughly the aforementioned 10%. Multiplication by a 16-bit constant, however, turned out to be slower than table multiplication on the 6502 with a 768-byte table. The 6809 allows writing quite compact and fast code using the direct page addressing mode, although this mode makes the code a bit tangled. The essence of this addressing is to set the high byte of the data address in a special register and specify only the low byte of the address in instructions. The same scheme, only with a fixed high byte of zero, is used in the 6502, where it is called zero page addressing.
Direct page addressing is an exact analogue of using the DS segment register on the x86, only not for 64 KB segments but for segments of just 256 bytes. Another artifact inherited from the 6800 architecture is the big-endian byte order, which slows down 16-bit addition and subtraction. The 6809 is not binary compatible with the 6800: you can only translate 6800 sources into 6809 code – similar to the case of the 8080 and 8086. The 6809 became the last 8-bit processor from Motorola; in further developments it was decided to use the 68008 instead.
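The address arithmetic of direct page mode is simple enough to sketch (Python; the DP values below are arbitrary examples):

```python
# The 6809's DP register supplies the high byte of the effective
# address; the instruction supplies only the low byte. With the high
# byte fixed at 0 this degenerates into the 6502's zero page mode.

def direct_page_address(dp, low_byte):
    return (dp << 8) | low_byte

assert direct_page_address(0x20, 0x34) == 0x2034  # 6809, DP = 0x20
assert direct_page_address(0x00, 0x34) == 0x0034  # 6502-style zero page
```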

We can assume that Motorola spent considerable resources promoting the 6809, and this has had a lasting effect on how the processor is remembered. There are many favorable reviews of the 6809, notable for a certain fuzziness, generality, and inconsistency. The 6809 was positioned as an 8-bit super-processor for micro-mainframes. Several Unix-like operating systems were made for it: OS-9 and UniFlex. It was chosen as the main processor for the Apple Macintosh and, as follows from the films about Steve Jobs, only his emotional intervention brought about the switch to the more promising 68000. Indeed, the 6809 is a good processor, but on the whole only slightly better than its competitors that appeared much earlier: the 6502 (three years earlier) and the Z80 (two years earlier). One can only guess what would have happened if Motorola had spent even half of the effort it put into developing and promoting the 6809 on developing the 6502 instead.

The 6809 was used in several fairly well-known computer systems. The most famous among them is the American Tandy Color Computer, or CoCo, along with its British – more precisely, Welsh – clone, the Dragon 32/64. Interestingly, on these computers the frequency of the 6809 was clearly artificially lowered, to just 0.89 MHz. The computer markets of the 1980s were notably opaque: the CoCo was distributed mainly in the US, while the Dragons, at first popular only in Britain, also gained some popularity in Spain. In France, the 6809 for some reason became the basis of the mass-market Thomson series of the 80s, which remained virtually unknown anywhere else. The 6809 was also used as a second processor in at least two systems: the Commodore SuperPET 9000 series and an extremely rare TUBE-interface device for BBC Micro computers. The processor was used in other systems less well known to me, in particular Japanese ones. It also gained some popularity in the world of gaming consoles; one of them, the Vectrex, is worth mentioning for its unique technology – a vector display.



Tandy CoCo 3



All the 680x processors have an interesting undocumented instruction with the fascinating name Halt and Catch Fire (HCF), used for testing at the electronics level, for example with an oscilloscope. Executing it hangs the processor, and only a reset can bring it back. These processors have other undocumented instructions as well. The 6800, for example, has instructions that are the opposite of immediate register loads, i.e. instructions for storing a register value into the immediate constant!

Like the 8080, 8085 or Z80, the 6809 is very hard to call a pure 8-bit processor. It is even harder to call the 6309 8-bit. The 6309 was produced by the Japanese company Hitachi as a processor fully compatible with the 6809. I was not able to find the exact year its production began, but there is some evidence pointing to 1982. This processor could be switched into a new mode which, while maintaining almost full compatibility with the 6809, provided many more capabilities. These capabilities were hidden in the official documentation and were only published in 1988 on Usenet. Two additional accumulators were added, although instructions using them are much slower than with the first two. The execution time of most instructions was greatly shortened. A number of instructions were added, among them a truly fantastic division for processors of this class: signed division of a 32-bit dividend by a 16-bit divisor (32/16 = 16,16) in 34 cycles, with the divisor taken from memory. Furthermore, 16-bit multiplication with a 32-bit result in 28 cycles appeared. Very useful instructions were also added for fast copying of memory blocks, with a run time of 6 + 3n cycles, where n is the number of bytes copied: you can copy with both increasing and decreasing addresses. The same instructions can also be used to quickly fill memory with a given byte, and interrupts can occur while they execute. New bit operations, a zero register, register comparisons, etc. appeared too. Interrupts were additionally raised on executing an unknown instruction and on division by 0. In a sense, the 6309 was the pinnacle of technological achievement among 8-bit processors – or, more precisely, among processors with a 64 KB address space.
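The quoted block-transfer timing can be put into a small model (Python; the per-byte cost of the naive load/store loop used for comparison is a rough assumption, not a measured figure):

```python
# The 6309 block transfer costs 6 + 3n cycles for n bytes (per the
# figure quoted above). A naive per-byte copy loop on a 6809-class
# CPU is assumed here to cost ~13 cycles/byte, for rough comparison.

def tfm_cycles(n):
    return 6 + 3 * n          # 6309 block-transfer cost

def loop_cycles(n, per_byte=13):
    return per_byte * n       # assumed naive-loop cost

print(tfm_cycles(256))        # 774 cycles for a 256-byte copy
print(loop_cycles(256))       # 3328 cycles for the assumed loop
```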

The 6309 is electrically fully compatible with the 6809, which made it a popular upgrade for the Tandy CoCo or the Dragons. There are also special OS versions that use the new features of the 6309.

Edited by Jim Tickner and Ralph Kernbach.


Emotional stories about processors for first computers: part 4 (Zilog Z80)

Zilog Z80


Along with the 6502, this processor became the main processor of the first personal computers. There are no dramatic events in the history of its appearance and use; there is only some intrigue in Zilog's failure to make the next generation of processors. The Z80 was first produced in 1976 and its variants are still in production. At one point even Bill Gates himself announced support for Z80-based systems.

A number of coincidences are interesting. As with the 6502, the main developer of the Z80, Federico Faggin, had left a large company – Intel. After working on the Z80, Federico had almost no involvement with the next-generation Z8000 processor. He left Zilog (which he had founded) in the early 80s and never worked on processors again, going on to create several relatively successful startups in communication systems, touchpads and digital cameras. It is worth mentioning that in addition to the Z80, while at Zilog he also developed the successful and still-produced Z8 microcontroller.

The Z80 is more convenient to design into computer systems than the 8080: it requires only one supply voltage and has built-in support for dynamic memory refresh. In addition, although it is fully compatible with the 8080, it has many new instructions, a second set of basic registers, and several completely new registers. Interestingly, Zilog refused to use the 8080 assembler mnemonics and introduced its own, better suited to the Z80's extended instruction set. A similar story happened with Intel x86 assembler in the GNU software world, where for some reason they also use their own conventions by default. The Z80 added support for an overflow flag; Intel officially added this flag only in the 8086. However, on the Z80 this flag is combined with the parity flag, so you cannot use both flags at the same time as on the 8086. On the Z80, as on the 6502, there are only checks of individual flags, i.e. no checks of two or three flags at once, which are needed for the unsigned comparisons "greater" and "less or equal", as well as for all signed comparisons. In such cases several checks have to be done, whereas on the 8086, 6800 or PDP-11 one is enough.
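The missing combined-flag conditions can be illustrated with a sketch (Python; a toy model of the flag logic, not Z80 code): after a subtraction, signed "less than" is the exclusive-or of the sign and overflow flags, a condition the 8086, 6800 and PDP-11 can branch on directly but the Z80 cannot.

```python
# Flags after an 8-bit subtraction A - B, and the signed "A < B"
# condition S xor V that needs two separate branches on the Z80.

def sub_flags(a, b):
    r = (a - b) & 0xFF
    s = r >> 7                                   # sign flag
    # overflow: operands differ in sign and the result sign differs from a's
    v = ((a ^ b) & (a ^ r) & 0x80) >> 7
    return s, v

def signed_less(a, b):
    s, v = sub_flags(a, b)
    return s != v                                # S xor V

assert signed_less(0x80, 0x01)      # -128 < 1
assert not signed_less(0x01, 0xFF)  # 1 < -1 is false
```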

Among the Z80's new instructions, the block memory copy commands at 21 cycles per byte are especially impressive, as well as an interesting instruction that searches for a byte in memory. Similar block instructions for input and output were also added. However, the most interesting instruction is EXX: it swaps the contents of 48 bits of register memory – registers BC, DE, HL and their counterparts – in just 4 cycles! Even the 32-bit ARM needs at least 6 cycles for a similar operation. The remaining additional instructions are not as impressive, although they can sometimes be useful. The following commands were also added:


  • 16-bit subtraction with borrow and 16-bit addition with carry for 15 clocks;
  • unary minus for the accumulator for 8 clocks;
  • possibility to read from memory and write to it, using registers BC, DE, SP, IX, IY – not just HL;
  • shifts, rotates and input-output for all 8-bit registers;
  • instructions to check, set and reset a bit by its number;
  • relocatable jumps with offsets (JR): an unconditional jump and conditional jumps on the zero and carry flags;
  • a loop instruction;
  • very unusual instructions for decimal rotations, RLD and RRD, analogues of which existed only on IBM mainframes;
  • instructions for input and output using an index register.

In total, 458 new instructions were added to the 244 instructions of the 8080, and if you also count those later recognized as almost official, you get about fifty more.
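The EXX exchange mentioned above amounts to swapping the two register banks wholesale, which a toy model shows clearly (Python; register names as in the text):

```python
# Toy model of the Z80's EXX: the six 8-bit registers BC, DE, HL
# (48 bits in total) are exchanged with their shadow set in one
# 4-cycle instruction - here modeled as a single bank swap.

class Z80Regs:
    def __init__(self):
        self.main = {'BC': 0, 'DE': 0, 'HL': 0}
        self.alt  = {'BC': 0, 'DE': 0, 'HL': 0}

    def exx(self):
        # one cheap swap instead of six register moves
        self.main, self.alt = self.alt, self.main

r = Z80Regs()
r.main['HL'] = 0x1234
r.exx()
assert r.main['HL'] == 0 and r.alt['HL'] == 0x1234
r.exx()
assert r.main['HL'] == 0x1234
```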

Most of the new commands are rather slow, but used properly they can still make code somewhat faster and significantly more compact. This particularly applies to the new 16-bit registers IX and IY, which enable new addressing modes. Interestingly, the index registers IX and IY appeared in the Z80 in order to attract 6800 users to the Z80! Operations with these index registers must always use a byte offset, which in theory is not always necessary but always slows down these already slow instructions.

Many of the 8080's instructions became faster by one clock cycle on the Z80, and this is a very noticeable acceleration. But the basic instruction of 16-bit arithmetic, ADD, became slower by one clock, as did LD SP,HL and EX (SP),HL, increment and decrement on memory and on 16-bit registers, and the input and output operations. Therefore, 8080 code executes on the Z80 perhaps only slightly faster. However, if you implement the same algorithm separately for the 8080 and the Z80, the Z80 version can be much faster: for example, calculating the digits of π with a spigot algorithm turned out to be almost 40 percent faster on the Z80 than on the 8080.

The Z80 got a unique flag not found in any other system: the subtraction flag. It is used only by the decimal correction instruction DAA, which on the Z80 can therefore be used even after subtraction.

Another unique feature of the Z80 is its freely accessible memory refresh register, which can, for example, be used to generate random numbers.

The system of working with interrupts became more varied than on the 8080. The Z80 offers both non-maskable interrupts and three modes for working with maskable ones:


  1. the same as for the 8080 – it requires an interrupt controller compatible with the 8080;
  2. at a fixed address – this is the easiest, it does not require external circuit support;
  3. the processor supplies the high byte of the vector table address, and the controller supplies the low byte – this method requires hardware support from additional Zilog chips.

However, mode 2 interrupts can be used without controller support by simply reserving 257 bytes in memory for the interrupt address. This is done on computers whose hardware does not support mode 2, for example the Amstrad CPC/PCW, MSX, ZX Spectrum, ... This is, of course, somewhat costly, but the need arises in cases like the ZX Spectrum, where the mode 1 interrupt vector is in ROM. Interestingly, on some MSX computers you can use mode 0 interrupts without external controller support.
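Why exactly 257 bytes can be shown with a small model (Python; the table base value 0x81 is an arbitrary illustrative choice): in mode 2 the high byte of the table address comes from the I register and the low byte from the data bus, after which a two-byte pointer is read – so if 257 consecutive bytes hold the same value, any bus byte yields the same handler address.

```python
# Mode 2 without a controller: fill 257 bytes with 0x81 starting at
# 0x8100; whichever low byte appears on the bus, the two bytes read
# are 0x81 0x81 and the handler address is always 0x8181.

I_REG = 0x81                                     # illustrative choice
mem = {I_REG * 256 + i: 0x81 for i in range(257)}  # 257 bytes, not 256!

def mode2_vector(i_reg, bus_byte, mem):
    addr = (i_reg << 8) | bus_byte
    return mem[addr] | (mem[addr + 1] << 8)      # little-endian pointer

# bus_byte = 0xFF reads bytes at 0x81FF and 0x8200 - the 257th byte
assert all(mode2_vector(I_REG, b, mem) == 0x8181 for b in range(256))
```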

Working with interrupts on the Z80 has other unique features. As noted, the Z80, like the 8080, does not save a processor status word during interrupts, yet the Z80 has no fewer than two unique instructions for returning from them. The instruction for returning from a maskable interrupt, RETI, works like a normal return from a subroutine, but it is meant to be recognized by external control circuitry, which then resets the interrupt request. In other architectures (the 6502, x86, ...) the interrupt handler must reset the controller explicitly, with commands such as MOV AL,20H and OUT 20H,AL on the x86. However, RETI must be supported by external Zilog chips, and no such chips are present in well-known computers (in particular, the ZX Spectrum, MSX, Amstrad CPC/PCW, Tandy TRS-80 models 1/3/4). The other unique instruction, RETN, returns from a non-maskable interrupt; it also works like a normal subroutine return, but with a small additional effect: it restores the maskable interrupt flag that was saved at the start of the non-maskable interrupt. That is what the official documentation says, and it cannot but raise questions: what happens, for example, if a nested non-maskable interrupt occurs? According to the latest data, the Z80 behaves more simply and more logically, so there is an error in the official documentation: the Z80 restores the maskable interrupt flag after exiting any interrupt with RETN or RETI (although for RETI this does not matter), but it does not save this flag when a non-maskable interrupt starts.

For reasons unknown to me, Z80-based computers usually did not use interrupt controller circuits with full support for the processor's capabilities – maybe because of the price. It looks somewhat odd: a cheap processor with expensive support chips, as if Zilog aimed at producing only the Z80 itself. There were proprietary timer, DMA, parallel and serial interface chips and other similar support chips for the Z80, but none of them were used in popular computers. They appeared only in some models of relatively rare and expensive office systems, for example the Tandy TRS-80 models 2/12/16, the Tatung Einstein, or the Robotron 1715.

The Z80 has quite a few undocumented instructions; some disappeared in the transition to CMOS technology, but those that survived became virtually standard and were documented by some firms. Especially useful are the instructions that let you work with the individual bytes of the clumsy 16-bit registers IX and IY. Besides undocumented instructions, the Z80 also has other undocumented properties, for example two extra flags in the status register.

Of course the Z80, even more than the 8080, has the right to be called slightly 16-bit. The hypothetical bit index of the Z80 is clearly slightly higher than the 8080's, yet paradoxically the ALU of the Z80 is actually 4-bit! At the electronic level the Z80 and 8080 are completely different chips.

The Z80's 16-bit operations are not fully complete, although this may be due to the need to maintain compatibility with the 8080. In particular, it is very inconvenient that 16-bit increment and decrement do not change the flags at all.

Of course, the Z80 gives plenty of other reasons for criticism. In addition to the empty instructions of the LD B,B kind inherited from the 8080, the Z80 introduced even more meaningless duplicate instructions, i.e. larger and slower clones of existing ones: the long forms of LD HL,(nn) and LD (nn),HL, as well as RLC A, RRC A, RL A, RR A. If you also count the semi-official instructions, even more such ugly duplicates turn up. There is a rather strange error in the official documentation of the input and output commands: it states that register C is used as the port address, whereas actually the pair BC is used. Perhaps this error was intentional, since during block input or output instructions register B is used as a counter. If you use only 256 port addresses, the error disappears by itself, but some systems, for example the Amstrad CPC/PCW, use all 64 KB of address space for I/O ports, and block I/O instructions cannot be used there. It is also interesting that even when using an 8-bit port address in the IN and OUT operations the Z80 still generates a 16-bit address, using the accumulator for the high byte – this was exploited, for example, in the ZX81. In my opinion, too many input and output instructions were added; most of them are rather redundant.
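The port address formation described above can be sketched as follows (Python; the example values are arbitrary):

```python
# How the Z80 drives the 16-bit address bus during I/O:
# IN A,(n) / OUT (n),A put the accumulator on the high half and the
# immediate byte on the low half; IN r,(C) / OUT (C),r put the whole
# BC pair on the bus, not just C.

def io_addr_immediate(a, n):   # IN A,(n) / OUT (n),A
    return (a << 8) | n

def io_addr_bc(b, c):          # IN r,(C) / OUT (C),r
    return (b << 8) | c

assert io_addr_immediate(0xFE, 0x7F) == 0xFE7F
assert io_addr_bc(0x7F, 0xFD) == 0x7FFD
```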

In the official documentation for the Z80, in addition to those noted, there are other inaccuracies.

Much has been written comparing the performance of the Z80 and 6502, since these processors were very widely used in the first mass-market computers. There are several difficult points in this topic, and without understanding them it is very hard to stay objective. Thanks to its rather large number of registers, the Z80 is naturally run at a frequency higher than that of memory: a 4 MHz Z80 can use the same memory as a 6502 or 6809 at 1.3 MHz. According to many experienced programmers who wrote code for both processors, at the same clock frequency the 6502 is on average about 2.4 to 2.6 times faster than the Z80, and the author of this material agrees. I only need to add that writing fast code for the Z80 is very difficult: you have to repeatedly optimize register usage and work with memory as much as possible through the stack. If you really try, in my opinion you can reduce the gap between the Z80 and 6502 to about 2.1 times; if you do not try and ignore timings, the difference can easily exceed 4 times. In some individual cases the Z80 shows very good timings. Filling memory with the PUSH instruction, the Z80 can even be slightly faster than the 6502, though at the cost of disabling interrupts. Copying memory blocks, the Z80 is only 1.5 times slower. It is especially impressive that in dividing a 32-bit dividend by a 16-bit divisor the Z80 is slower by only a factor of 1.7 – such a notable division routine was, by the way, implemented by a programmer from Russia. Thus, a ZX Spectrum with a 3.5 MHz Z80 is about 1.5 times faster than a C64 with a 1 MHz 6502. It should also be noted that in most Z80 or 6502 systems some cycles are taken from the processor by the video signal generation circuits: because of this, for example, the popular Amstrad CPC/PCW computers have an effective processor frequency of 3.2 MHz rather than the full 4.
On 6502 systems you can usually turn off the screen for maximum processor performance. If we take the memory frequency rather than the processor frequency as the basis, it turns out that the Z80 is 25-40% faster than the 6502. This last result can be illustrated by the fact that with 2 MHz memory the Z80 can run at up to 6 MHz, while the 6502 only at up to 2 MHz.
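The arithmetic behind the Spectrum-versus-C64 claim, using only the figures quoted above (Python):

```python
# Quoted figures: the 6502 is ~2.4x faster per clock (lower end of the
# 2.4-2.6 range), the Spectrum's Z80 runs at 3.5 MHz, the C64's 6502
# at 1 MHz.

z80_mhz, c64_mhz = 3.5, 1.0
per_clock_6502 = 2.4
ratio = z80_mhz / (c64_mhz * per_clock_6502)
print(round(ratio, 2))   # 1.46, i.e. "about 1.5 times"
```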

It would also be interesting to compare the performance of the Z80 and the 8088. Of course, in general the 8088 is the more powerful processor, but in many important cases it is slower than the Z80. Consider the following table, which shows some typical cases where the Z80 is faster. The correspondence between registers is the same as in the standard program that converts 8080 code to 8086 code.


Z80 code           cycles    8088 code                  cycles    gain

JP word              10      JMP word                     15        5
CALL word            17      CALL word                    19        2
RET                  10      RETN                         20       10
RST byte             11      INT byte                     71       60
JP (HL)               4      JMP BX                       11        7
JP (IX)               8      JMP BP                       11        3
LD A,(HL)             7      MOV AL,[BX]                  10        3
LD (HL),A             7      MOV [BX],AL                  10        3
LD r,(HL)             7      MOV r,[BX]                   13        6
LD (HL),r             7      MOV [BX],r                   14        7
LD (HL),byte         10      MOV [BX],byte                15        5
LD A,(BC)             7      MOV SI,CX; MOV AL,[SI]       12        5
LD (BC),A             7      MOV SI,CX; MOV [SI],AL       12        5
LD HL,(word)         16      MOV BX,[word]                18        2
LD (word),HL         16      MOV [word],BX                19        3
EX (SP),HL           19      MOV SI,SP; XCHG [SI],BX      24        5
EX (SP),IY           23      MOV SI,SP; XCHG [SI],DI      24        1
PUSH BC              11      PUSH CX                      15        4
POP DE               10      POP DX                       12        2
INC (HL)             11      INC [BX]                     20        9
DEC (HL)             11      DEC [BX]                     20        9
SET 0,(HL)           15      OR [BX],1                    22        7
RES 1,(HL)           15      AND [BX],0xFD                22        7
RLC (HL)             15      ROL [BX],1                   20        5
RR (HL)              15      RCR [BX],1                   20        5

It is easy to notice that the Z80 is faster on the 8080-style instructions that set the program counter or access memory. Especially noticeable is the Z80's advantage in executing far conditional jumps. On the 8088, the offset of a conditional jump is only one byte, so when a farther jump is needed you have to write two instructions. The Z80 has no such problem – two bytes are always allocated for the jump target – and so in such cases a conditional jump on the Z80 is much faster, by 6 or 9 clock cycles. Almost all instructions using the HL register execute a little faster on the Z80, including addition, subtraction, comparison, BIT and other logical operations. Although the Z80 and 8088 returns from interrupt are architecturally very different, they perform identical functions; the Z80's RETI is 30 clock cycles faster than IRET, and that is not counting the commands the 8088 needs to execute to reset the interrupt controller. On the Z80, subroutine calls and returns can be conditional, which makes such code more compact and faster than on the 8088. And of course, emulating the EXX and EX commands for the alternate register set, the block I/O commands, or RLD/RRD would take quite a long time on the 8088. In addition, the 8088 works through an instruction queue that needs 4 clock cycles for each instruction byte, which often adds cycles to instruction execution; the Z80 has no such drag. However, the main advantage of the Z80 over the 8086/8088 is its faster memory access: on the Z80 an access takes 3 cycles, on the 8086/88 it takes 4. But despite all this, code for the 8088 is usually slightly faster than code for the Z80 at the same frequency – after all, the 8088 has more powerful instructions and its registers are more versatile than the Z80's.

The Z80 was used in a very large number of computer systems. In the USA the Tandy TRS-80 was very popular – interestingly, the HALT instruction causes a reset on this computer. In Europe it was the ZX Spectrum, and later the Amstrad CPC and PCW; interestingly, the Amstrad PCW computers maintained their importance until the mid-90s and were massively and actively used for their intended purpose until the late 90s. Japan and several other countries produced the MSX computers, which were quite successful around the world. The rather popular C128 could also use the Z80, but there its users were left in a rather embarrassing situation: this 8-bit computer released in late 1985, with a Z80 officially clocked at 2 MHz, really ran it at only 1.6 MHz – slower even than the first 8080-based systems of the mid-70s! The range of computers built to run the CP/M operating system includes at least three dozen fairly well-known systems. The Z80 was also used in game consoles: the Master System, Game Gear, Mega Drive, and others. The total number of Z80-based systems produced is probably less than 100 million, which is less than the number of 6502-based systems – but that is not counting calculators. In schools and universities in the United States, from the 90s to the present, almost all students are required to have a TI-83 calculator or one compatible with it! Therefore, taking calculators into account, it is possible that in the 21st century the number of Z80-based systems has surpassed that of the 6502.




Such a PC looked decent even in the mid-90's, but its z80 was slower than that in the ZX Spectrum


The fastest Z80-based computer system known to me is the BBC Micro with a 6 MHz Z80B second processor attached via the TUBE interface, produced from 1984. The processor in this system runs at full speed – "without brakes", so to speak. Similar devices had been produced for the Apple ][ since 1979; some of these Z80 cards later used the Z80H at 8 MHz and even higher. Interestingly, in 1980 Microsoft received its greatest revenue from the sale of such devices. We can also mention the Amstrad PcW16, produced from 1994, which uses a CMOS Z80 at 16 MHz.

In Japan, for the MSX TurboR systems (1990), the R800 processor was made compatible with the Z80. The R800 added hardware 16-bit multiplication with a 32-bit result – although when multiplying by a 16-bit constant, table multiplication with a 768-byte table is one clock faster. There are opinions that the R800 is just a simplified Z800, running at four times the bus frequency of about 7.16 MHz – so the R800's internal clock is about 28.64 MHz!
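The 768-byte table layout is not described in the text; one classical basis for such table-multiplication routines is the quarter-square identity, sketched below as an assumption (Python):

```python
# Quarter-square multiplication: a*b = floor((a+b)^2/4) - floor((a-b)^2/4).
# Since a+b and a-b always have the same parity, the floors cancel
# exactly, so the identity holds for all integers. A lookup table of
# quarter squares replaces multiplication with two reads and a subtract.

QSQ = [x * x // 4 for x in range(512)]   # quarter squares for 0..511

def mul8(a, b):
    """8-bit by 8-bit multiply via table lookups."""
    return QSQ[a + b] - QSQ[abs(a - b)]

assert all(mul8(a, b) == a * b
           for a in range(0, 256, 17) for b in range(0, 256, 23))
```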

Zilog worked on improving the Z80 very inconsistently and extremely slowly. The first Z80 ran at frequencies up to 2.5 MHz; the Z80A, which soon appeared, had a limit of 4 MHz, and these processors became the basis of most popular Z80 computers. The Z80B appeared by 1980 but was used relatively rarely, for example in the second-processor card for the BBC Micro mentioned above or in the late (1989) Sam Coupé computer. The Z80H appeared by the mid-80s and could operate at up to 8 MHz; it was not used in well-known computers. Interestingly, Zilog put special traps into all its chips for those who tried to copy them: the base Z80 had 9 such traps and, according to those who tried copying it, they slowed the process down by almost a year.

A deeper upgrade of the Z80 was hampered by Zilog's desire to create processors competitive with Intel's 16-bit ones. In 1978, a little later than the 8086, the Z8000 was released; it was not compatible with the Z80. This processor could not withstand competition from Intel, and especially from Motorola – the 68000 surpassed the Z8000 in almost every parameter – although the Z8000 was used in about a dozen different low-cost systems, usually running Unix variants. Interestingly, IBM did not even consider the Z8000 as a possible processor for the IBM PC, since Zilog was funded by Exxon, which was going to compete with IBM. Perhaps due to the Z8000's lack of success, Zilog became an Exxon subsidiary by 1980. There was also an attempt to create a competitive 32-bit processor: in 1986 the Z80000 appeared, compatible with the Z8000, and it was never used anywhere. Some circumstances – in particular, very strange complaints from Zilog's team about excessive financing – suggest that Zilog (as part of Exxon) may, for some obscure reason, have been half-sabotaging its own work. Here we can also recall the strange revelations of Zilog's lead engineer Masatoshi Shima, who claimed that Zilog artificially lowered the frequency of the Z8000 so that it would not be faster than the 8086.

One can only wonder why Zilog abandoned the approach that had shown super-successful results with the Z80 – making processors software-compatible with Intel's, but better than them and completely different at the hardware level. This approach was later used successfully by many firms, in particular AMD, Cyrix, and VIA.

Creating a new processor based on the Z80 was postponed until 1985, when the Z800 was produced. But by then Zilog's main efforts were directed at the Z80000, and the Z800 was released in very small numbers. In 1986, after the failure of the Z80000, the Z280 was released – an insignificantly improved version of the Z800 (maybe just a rebranding). The Z800/Z280 could, in particular, run at an internal frequency several times higher than the bus frequency – an idea that later brought great success to the Intel 486DX2 and 486DX4. But perhaps because of poor performance the Z280, despite many technological innovations, could only use relatively low clock frequencies, and this processor was likewise never used anywhere. It is considered that the Z280 roughly matched the capabilities of the Intel 80286 but was significantly – at least 50% – slower at the same clock speed. Perhaps if the Z280 had appeared 5 years earlier it could have been very successful.

The greatest success came through cooperation with the Japanese company Hitachi, which in 1985 released its super-Z80, the HD64180, similar in capabilities to the Intel 80186. The HD64180 allowed the use of 512 KB of memory and added a dozen new instructions, but at the same time some of the Z80's almost-standard undocumented instructions were not supported. This processor was used in some computer systems. Zilog received a license for the HD64180 and began producing it under the marking Z64180. Zilog managed to slightly improve this processor – in particular, to add support for 1 MB of memory – and released it by the end of 1986. This new processor was called the Z180 and became the basis of a family of processors and controllers with clock frequencies up to 33 MHz. It was used in some rare MSX2 models, though more as a controller. It is a curious coincidence that the Z280 and Z180 appeared in the same year, just as their approximate counterparts, the 80286 and 80186, had four years earlier. In 1994 the 32-bit Z380 was made on the basis of the Z180; it retained compatibility with the Z80 and roughly corresponds in capabilities to the Intel 80386 or Motorola 68020. In effect, Zilog lagged behind its competitors by almost 10 years. In the 21st century, the successful eZ80 processor-controllers, again based on the Z180, have been manufactured, with timings almost like the 6502's. They are used in various equipment, in particular network cards, DVD drives, calculators, etc. A processor compatible with the 6502 and comparable to the eZ80 has never appeared – perhaps simply because Zilog has always had better finances than WDC.

Edited by Richard BN