From: firstname.lastname@example.org (Jim Gettys)
Subject: Alpha Architecture Technical Summary
Keywords: Alpha microprocessor, Digital, 64 bit
Sender: email@example.com (USENET News System)
Date: Tue, 25 Feb 1992 17:17:38 GMT
Dick Sites is in Tokyo for the Alpha rollout today; he asked me to
post this architectural summary of Alpha for your interest.
Alpha Architecture Handbooks with complete information are available
in some quantity today; a much larger printing of the handbooks
should be available from your local sales people sometime next month, from
what I understand.
The current Alpha processor chip is called a 21064. I believe samples
are available for customer evaluation now, according to the sales
information I've seen.
I don't want to be a salesman on the net here, but to avoid the
flood of mail I'd otherwise receive, I include a phone number
for more information.
To learn more about pricing and availability of the 21064
microprocessor in its 150 MHz or faster clock rate versions, contact
your local Digital sales representative. Or, in the United States, call
1-800-DEC-2717; 1-800-DEC-2515 (TTY).
Some specs on the chip follow the Architecture summary below.
- Jim Gettys
ALPHA ARCHITECTURE TECHNICAL SUMMARY
Dick Sites, Rich Witek
[NOTE: "Alpha" is an internal code name. An official name will be announced
WHAT IS ALPHA?
Alpha is a 64-bit RISC architecture, designed with particular emphasis on
speed, multiple instruction issue, multiple processors, software migration
from VAX VMS and MIPS ULTRIX, and long lifetime. The architects rejected
any feature that did not appear to be usable for at least 25 years.
The first chip implementation runs at up to 200 MHz. The speed of Alpha
implementations is expected to scale up from this by at least a factor of
1000 over the next 25 years.
Alpha is a load/store RISC architecture with all operations done between
registers. Alpha has 32 integer registers and 32 floating registers, each
64 bits. Integer register R31 and floating register F31 are always zero.
Longword (32-bit) and quadword (64-bit) integers are supported. Four
floating datatypes are supported: VAX F-float, VAX G-float, IEEE single
(32-bit), and IEEE double (64-bit). Memory is accessed via 64-bit virtual
little-endian byte addresses.
Alpha instructions are all 32 bits, in four different instruction formats
specifying 0, 1, 2, or 3 register fields. All formats have a 6-bit opcode.
    | OP |          number           |  PALcall
    | OP | RA |         disp         |  Branch
    | OP | RA | RB |      disp       |  Memory
    | OP | RA | RB |  func.   |  RC  |  Operate
PALcalls specify one of a few dozen complex operations to be performed.
Conditional branches test register RA and specify a signed 21-bit
PC-relative longword target displacement. Subroutine calls put the return
address in RA.
Loads and stores move longwords or quadwords between RA and memory, using
RB plus a signed 16-bit displacement as the memory address.
Operates use source registers RA and RB, writing result register RC. There
is an extended opcode in the 11-bit function field. Integer operates can use
the RB field and part of the function field to specify an 8-bit literal.
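
To make the four formats concrete, here is a rough Python sketch that
splits a 32-bit instruction word into its fields and forms the load/store
and branch addresses described above. The field widths come from the text;
the exact bit positions (fields packed left to right from bit 31, as drawn
in the format table) and the choice of base PC for branches are my
assumptions, and the function names are made up for illustration.

```python
MASK64 = (1 << 64) - 1

def bits(word, hi, lo):
    """Return bits hi..lo (inclusive) of a 32-bit instruction word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)

def sext(value, width):
    """Sign-extend a width-bit field to a Python int."""
    sign = 1 << (width - 1)
    return (value ^ sign) - sign

def decode(word, fmt):
    """Split a word into named fields for one of the four formats.

    Bit positions are assumed: opcode in 31..26, RA in 25..21,
    RB in 20..16, function in 15..5, RC in 4..0.
    """
    op = bits(word, 31, 26)  # 6-bit opcode, common to all formats
    if fmt == "palcall":
        return {"op": op, "number": bits(word, 25, 0)}
    ra = bits(word, 25, 21)
    if fmt == "branch":
        return {"op": op, "ra": ra, "disp": sext(bits(word, 20, 0), 21)}
    rb = bits(word, 20, 16)
    if fmt == "memory":
        return {"op": op, "ra": ra, "rb": rb,
                "disp": sext(bits(word, 15, 0), 16)}
    if fmt == "operate":
        return {"op": op, "ra": ra, "rb": rb,
                "func": bits(word, 15, 5), "rc": bits(word, 4, 0)}
    raise ValueError("unknown format: " + fmt)

def memory_address(rb_value, disp):
    """Load/store address: RB plus the signed 16-bit displacement."""
    return (rb_value + disp) & MASK64

def branch_target(base_pc, disp):
    """Branch target: the displacement counts longwords (4 bytes each).

    Which PC value serves as the base is left to the caller (assumption).
    """
    return (base_pc + 4 * disp) & MASK64
```

Because every instruction is one longword, counting the branch
displacement in longwords lets 21 bits reach four times further than a
byte displacement would.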
The Privileged Architecture Library call instructions specify one of a few
dozen complex functions to be performed. These functions deal with
interrupts and exceptions, task switching, virtual memory, and other
complex operations that must be done atomically. PALcall instructions
vector to a privileged library of software subroutines (using the same Alpha
instruction set) that implement an operating-system-specific set of these
functions.
Conditional branch instructions can test a register for positive/negative
or for zero/nonzero. They can also test integer registers for even/odd.
Unconditional branch instructions can write a return address into a
register. There is also a calculated jump instruction that branches to an
arbitrary 64-bit address in a register.
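
The register tests used by conditional branches can be sketched in a few
lines of Python. The condition names below are my own labels, not
mnemonics from the handbook, and a register is modeled as a plain 64-bit
integer.

```python
MASK64 = (1 << 64) - 1

def as_signed(reg):
    """Interpret a 64-bit register value as a signed integer."""
    reg &= MASK64
    return reg - (1 << 64) if reg >> 63 else reg

# Hypothetical condition names; each test looks only at one register.
CONDITIONS = {
    "zero":        lambda r: (r & MASK64) == 0,
    "nonzero":     lambda r: (r & MASK64) != 0,
    "negative":    lambda r: as_signed(r) < 0,
    "nonnegative": lambda r: as_signed(r) >= 0,
    "even":        lambda r: (r & 1) == 0,   # low bit clear
    "odd":         lambda r: (r & 1) == 1,   # low bit set
}

def next_pc(condition, reg, fallthrough, target):
    """Pick the next PC: the target if the test on reg holds."""
    return target if CONDITIONS[condition](reg) else fallthrough
```

Note that nothing here consults a condition-code register; the test reads
an ordinary register, which is the point of the design.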
Load and store instructions can move either 32- or 64-bit aligned
quantities. The VAX floating-point load/store instructions swap words to
give a consistent register format for floats. Memory addresses are flat
64-bit virtual addresses, with no segmentation. A 32-bit integer datum is
placed in a register in a canonical form that makes 33 copies of the high
bit of the datum. A 32-bit floating datum is placed in a register in a
canonical form that extends the exponent by 3 bits and extends the fraction
with 29 low-order zeros. 32-bit operates preserve these canonical forms.
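
The two canonical forms can be sketched as follows. The integer case
follows directly from the text (bit 31 ends up in bits 63 through 31, 33
copies in all). For the floating case I assume IEEE single and assume
that the 3 extra exponent bits are filled so the register image matches
the IEEE double layout of the same value; the text only gives the bit
counts, so treat that mapping as an assumption. VAX F-float is not
covered.

```python
import struct

def canonical_int32(value):
    """64-bit register image of a 32-bit integer: sign bit replicated."""
    low = value & 0xFFFFFFFF
    return low | (0xFFFFFFFF00000000 if low >> 31 else 0)

def canonical_ieee_single(bits32):
    """64-bit register image of an IEEE single (assumed mapping)."""
    sign = (bits32 >> 31) & 1
    exp8 = (bits32 >> 23) & 0xFF
    frac = bits32 & 0x7FFFFF
    if exp8 == 0:
        exp11 = 0                # zero or denormal: exponent stays zero
    elif exp8 == 0xFF:
        exp11 = 0x7FF            # infinity or NaN: exponent stays all ones
    else:
        exp11 = exp8 + 896       # rebias from 127 to 1023 (assumption)
    # 8-bit exponent grows to 11 bits; 23-bit fraction gains 29 low zeros.
    return (sign << 63) | (exp11 << 52) | (frac << 29)

def single_bits(x):
    """Bit pattern of x as an IEEE single."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def double_bits(x):
    """Bit pattern of x as an IEEE double."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]
```

Under these assumptions a normal single such as 1.5 lands in the register
with the same bits as the double 1.5, which is why 32-bit and 64-bit
floating operates can share one register format.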
There are no 8- or 16-bit load/store instructions, but there are facilities
for doing byte manipulation in registers.
Alpha has no 32/64 mode bit or other such device. Compilers, as directed by
user declarations, can generate any mixture of 32- and 64-bit operations.
Integer Operate Instructions
The integer operate instructions manipulate full 64-bit values, and include
the usual assortment of arithmetic, compare, logical, and shift
instructions. There are just three 32-bit integer operates: add, subtract,
and multiply. These differ from their 64-bit counterparts ONLY in overflow
detection and in producing 32-bit canonical results.
There is no integer divide instruction.
In addition to the operations found in conventional RISC architectures,
there are scaled add/subtract for quick subscript calculation, 128-bit
multiply for division by a constant and multiprecision arithmetic,
conditional moves for avoiding branches, and an extensive set of
in-register byte manipulation instructions for avoiding single-byte writes.
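
A small Python sketch of three of these operations may help. The helper
names are mine, the scale factors are an assumption (4 and 8 match
longword and quadword array elements), and the divide-by-10 constant is
the usual reciprocal trick rather than anything taken from the handbook.

```python
MASK64 = (1 << 64) - 1

def scaled_add(index, base, scale=4):
    """Subscript calculation: index * scale + base in one operation."""
    return (index * scale + base) & MASK64

def mul_high(a, b):
    """Upper 64 bits of the unsigned 128-bit product of a and b."""
    return ((a & MASK64) * (b & MASK64)) >> 64

# ceil(2**67 / 10): multiplying by this and shifting right by 67 bits
# (64 via mul_high, then 3 more) divides an unsigned 64-bit value by 10.
DIV10_MAGIC = 0xCCCCCCCCCCCCCCCD

def div10(n):
    """Unsigned divide by the constant 10 with no divide instruction."""
    return mul_high(n, DIV10_MAGIC) >> 3

def move_if_zero(test, new, old):
    """Conditional move: result is new when test is zero, else old."""
    return new if (test & MASK64) == 0 else old
```

For example, div10(12345) yields 1234 using one multiply and one shift,
which is how a compiler can cover for the missing integer divide when the
divisor is known at compile time.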
Rather than keeping a global state bit for integer overflow trap enable,
the enable is encoded in the function field of each instruction. Thus, both
ADDQ/V and ADDQ opcodes exist for specifying 64-bit add with and without
overflow checking. This makes pipelined implementations easier.
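
As a rough model of the per-instruction enable, the Python sketch below
passes the overflow check as an argument of each add instead of reading a
global flag. It also covers the 32-bit add, which differs only in the
overflow range and in producing a canonical 32-bit result. The function
and exception names are invented, and raising a Python exception at once
is a simplification of how a real trap would be delivered.

```python
MASK64 = (1 << 64) - 1

class OverflowTrap(ArithmeticError):
    """Stands in for an integer overflow trap (simplified model)."""

def signed(value, width):
    """Interpret the low width bits of value as a signed integer."""
    value &= (1 << width) - 1
    return value - (1 << width) if value >> (width - 1) else value

def add(a, b, width=64, trap_on_overflow=False):
    """Add two registers as width-bit signed integers.

    trap_on_overflow plays the role of the enable bit that is encoded
    in each instruction's function field.
    """
    exact = signed(a, width) + signed(b, width)
    limit = 1 << (width - 1)
    if trap_on_overflow and not -limit <= exact < limit:
        raise OverflowTrap("signed %d-bit overflow" % width)
    # Wrap to width bits, then sign-extend to the 64-bit register.
    return signed(exact, width) & MASK64
```

Since the check travels with each instruction, two adds in flight at the
same time never have to agree on, or wait for, a shared mode bit.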
Floating-point Operate Instructions
The floating operate instructions include four complete sets of VAX and
IEEE arithmetic, plus conversions between float and integer.
There is no floating square root instruction.
In addition to the operations found in conventional RISC architectures,
there are conditional moves for avoiding branches, and merge sign/exponent
instructions for simple field manipulation.
Rather than keeping global state bits for arithmetic trap enables and
rounding mode, these enable and mode bits are encoded in the function field
of each instruction.
SIGNIFICANT DIFFERENCES BETWEEN ALPHA AND CONVENTIONAL RISC PROCESSORS
First, Alpha is a true 64-bit architecture, with a minimal number of 32-bit
instructions. It is not a 32-bit architecture that was later expanded to 64
bits.
Second, Alpha was designed to allow very high-speed implementations. The
instructions are very simple (no load-four-registers-unaligned-and-check-
for-bytes-of-zero). There are no special registers that would prevent
pipelining multiple instances of the same operations (no MQ register and no
condition codes). The instructions interact with each other ONLY by one
instruction writing a register or memory, and another one reading from the
same place. This makes it particularly easy to build implementations that
issue multiple instructions every CPU cycle. (The first implementation
in fact issues two instructions every cycle.) There are no
implementation-specific pipeline timing hazards, no load-delay slots, and
no branch-delay slots. These features would make it difficult to maintain
binary compatibility across multiple implementations and difficult to
maintain full speed on multiple-issue implementations.
Alpha is unconventional in the approach to byte manipulation. Single-byte
stores found in conventional RISC architectures force cache and memory
implementations to include byte shift-and-mask logic, and sequencer logic
to perform read-modify-write on memory words. This approach is awkward to
implement quickly, and tends to slow down cache access to normal 32- or
64-bit aligned quantities. It also makes it awkward to build a high-speed
error-correcting write-back cache, which is often needed to keep a very
fast RISC implementation busy. It also can make it difficult to pipeline
multiple byte operations.
Instead, the byte shifting and masking is done in Alpha with normal 64-bit
register-to-register instructions, crafted to keep the sequences short.
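
The idea can be sketched in Python: a byte store becomes a quadword load,
a mask and insert done in registers, and a quadword store. The helper
names are hypothetical and only loosely modeled on that style of
instruction; memory is a dict of aligned 64-bit quadwords.

```python
MASK64 = (1 << 64) - 1

def extract_byte(quad, lane):
    """Pull byte number lane (0 = least significant) out of a quadword."""
    return (quad >> (8 * lane)) & 0xFF

def mask_byte(quad, lane):
    """Clear byte number lane in a quadword."""
    return quad & ~(0xFF << (8 * lane)) & MASK64

def insert_byte(byte, lane):
    """Shift a byte into position lane of an otherwise zero quadword."""
    return (byte & 0xFF) << (8 * lane)

def load_byte(memory, address):
    """Byte load: aligned quadword load, then extract in a register."""
    quad = memory.get(address & ~7, 0)
    return extract_byte(quad, address & 7)   # little-endian lane number

def store_byte(memory, address, value):
    """Byte store: load quadword, mask and insert, store quadword."""
    aligned = address & ~7
    lane = address & 7
    quad = memory.get(aligned, 0)
    memory[aligned] = mask_byte(quad, lane) | insert_byte(value, lane)
```

The memory system in this model only ever sees aligned 64-bit reads and
writes, so the cache needs no byte shift-and-mask logic of its own.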
Alpha is also unconventional in the approach to arithmetic traps. In
contrast to conventional RISC architectures, Alpha arithmetic traps
(overflow, underflow, etc.) are imprecise -- they can be delivered an
arbitrary number of instructions after the instruction that triggered the
trap, and traps from many different instructions can be reported at once.
This makes implementations that use pipelining and multiple issue
substantially easier to build.
If precise arithmetic exceptions are desired, trap barrier instructions can
be explicitly inserted in the program to force traps to be delivered at
Alpha is also unconventional in the approach to multiprocessor shared
memory. As viewed from a second processor (including an I/O device), a
sequence of reads and writes issued by one processor may be arbitrarily
reordered by an implementation. This allows implementations to use
multi-bank caches, bypassed write buffers, write merging, pipelined writes
with retry on error, etc. If strict ordering between two accesses must be
maintained, memory barrier instructions can be explicitly inserted in the
program.
The basic multiprocessor interlocking primitive is a RISC-style
load_locked, modify, store_conditional sequence. If the sequence runs
without interrupt, exception, or an interfering write from another
processor, then the conditional store succeeds. Otherwise, the store fails
and the program eventually must branch back and retry the sequence. This
style of interlocking scales well with very fast caches, and makes Alpha an
especially attractive architecture for building multiple-processor systems.
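
Here is a toy Python model of the load_locked / store_conditional retry
loop, just to show the control flow. The Memory class, its one lock per
processor, and the interfere hook that stands in for a second processor
are all assumptions of this sketch; real hardware tracks the lock in its
own way.

```python
class Memory:
    """Toy shared memory with one lock address per processor."""

    def __init__(self):
        self.cells = {}
        self.locks = {}          # cpu id -> locked address

    def load_locked(self, cpu, addr):
        """Read a cell and remember that this cpu is watching it."""
        self.locks[cpu] = addr
        return self.cells.get(addr, 0)

    def store(self, cpu, addr, value):
        """Ordinary write; breaks other processors' locks on addr."""
        self.cells[addr] = value
        for other, locked in list(self.locks.items()):
            if other != cpu and locked == addr:
                del self.locks[other]

    def store_conditional(self, cpu, addr, value):
        """Write only if this cpu's lock on addr is still intact."""
        if self.locks.pop(cpu, None) != addr:
            return False
        self.store(cpu, addr, value)
        return True

def atomic_increment(mem, cpu, addr, interfere=None):
    """Retry load_locked/modify/store_conditional until it succeeds.

    Returns the number of attempts. interfere(attempt) is a test hook
    that lets a simulated second processor run mid-sequence.
    """
    attempt = 0
    while True:
        attempt += 1
        old = mem.load_locked(cpu, addr)
        if interfere is not None:
            interfere(attempt)
        if mem.store_conditional(cpu, addr, old + 1):
            return attempt
```

If another processor writes the cell between the load and the conditional
store, the store fails and the loop simply tries again with the fresh
value; no bus lock is held at any point.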
Alpha includes a number of HINTS for implementations, all aimed at allowing
higher speed. Calculated jumps have a target hint that can allow much
faster subroutine calls and returns. There are prefetching hints for the
memory system that can allow much higher cache hit rates. There are also
granularity hints for the virtual-address mapping that can allow much more
effective use of translation lookaside buffers for big contiguous
Alpha includes a very flexible privileged library of software for operating-
system-specific operations, invoked with PALcalls. This library allows Alpha
to run full VMS using one version of this software library that mirrors many
of the VAX operating-system features, and to run OSF/1 using a different
version that mirrors many of the MIPS operating-system features, and
similarly for NT. Other versions could be tailored for real-time, teaching,
etc. The PALcalls allow Alpha to run VMS with hardly more hardware than
a conventional RISC machine has (the PAL mode bit itself, plus 4 extra
protection bits in each TB entry). This library makes Alpha an especially
attractive architecture for multiple operating systems.
Finally, Alpha is not strongly biased toward only one or two programming
languages. It is an attractive architecture for compiling at least a dozen
languages.
Alpha is designed to be a leadership 64-bit architecture.
Specifications (150MHz version).
Process Technology      .75 micron CMOS
Cycle Time              150 MHz (6.6 ns)
Die Size                13.9mm x 16.8mm
Transistor Count        1.68 million
Package                 431 pin PGA
Number of Signal Pins   291
Power Dissipation       23 W at 6.6 ns cycle
Power Supply            3.3 volts
Clocking Input          300 MHz differential
On-chip D-cache         8 Kbyte, physical, direct-mapped,
                        write-through, 32-byte line, 32-byte fill
On-chip I-cache         8 Kbyte, physical, direct-mapped,
                        32-byte line, 32-byte fill, 64 ASNs
On-chip DTB             32-entry; fully-associative; 8-Kbyte,
                        64-Kbyte, 256-Kbyte, 4-Mbyte page sizes
On-chip ITB             8-entry, fully associative, 8-Kbyte page
                        plus 4-entry, fully-associative, 4-Mbyte page
Floating Point Unit     On-chip FPU supports both IEEE and VAX
Bus                     Separate data and address bus.
                        128-bit/64-bit data bus
Serial ROM Interface    Allows the chip to directly
                        access serial ROM
Virtual Address Size    64 bits checked; 43 bits
Physical Address Size   34 bits implemented
Page Size               8 Kbytes
Issue Rate              2 instructions per cycle to A-box,
                        E-box, or F-box
Integer Pipeline        7-stage pipeline
Floating Pipeline       10-stage pipeline