"The third structural feature of BESM-6 is a method to utilise
super-operating memory of a little size, non-addressable in software, which is
aimed to reduce automatically request load imposed on the primary operating
memory device. This super-operating memory is driven in such a way that operands
used the most frequently as well as small internal command cycles appear in
quick registers and become ready for immediate use in an arithmetical device or
machine controlling system. In many cases, the quick registers allow to
eliminate up to 60% of the total memory access requests, thus reducing delivery
times for figures and commands due from the primary memory..."
"The fourth structural feature of BESM-6, which is important very much for
development of operating systems and machine functionality in multitasking mode,
is an approach implemented in hardware to translate mathematical, also known as
virtual, addresses into physical ones. There is a clear separation for
physical and mathematical memory space with page-based organisation in
BESM-6..."
L. N. Korolyov, "Computer Structures and Their Software", 1974.
BESM-6 (a Russian abbreviation of "Big Electronic Calculating Machine", model
6) was the last Soviet mainframe of the BESM family. It was developed at the
Institute of Precision Mechanics and Computer Technology (part of the Academy
of Sciences of the USSR) under the supervision of S. A. Lebedev and
V. A. Melnikov. The design was completed by 1966, and its peak calculating
power reached a record 1 MIPS (million instructions per second).
[Photo: BESM-6 deployed at the Institute of Precision Mechanics and Computer Technology]
Foreword
People with an interest in computer hardware are often curious about how
exactly cache memory works and what it consists of. Nevertheless, in-depth
answers on this matter are hard to find without working through numerous books
and articles. When conversation turns to cache memory, most people are used to
discussing size and clock speed while leaving its other characteristics beyond
their attention. However, that approach is as absurd as judging different
processors exclusively by their clock speed. The same applies to comparing
memory modules by nothing but their size (in bytes, of course, not in
centimetres or inches, though as far as correctness goes it makes little
difference). This article is intended to remedy such disregard for cache
memory. The reader is assumed to have at least intermediate knowledge of
mathematics and computer architecture to follow the matters covered.
Briefing
Cache memory is a temporary storage place for the information requested most
often. It consists of relatively small areas of fast local memory, so the
information it holds can be delivered much more quickly than it would be from
slower external devices such as operating memory or the disk subsystem. Hence,
cache memory reduces undesired stalls, up to eliminating them completely, and
thereby increases performance. It is almost always built of static memory
cells, because this approach allows for maximal performance. Usually it takes
6 field-effect transistors to implement a single static memory cell (in other
words, a storage place for one bit of information), though other
implementations exist as well, for example with 8 or 12 transistors per cell.
In the past, roughly until 0.5 µm process technologies went into production,
4-transistor cells were also popular; however, they required an additional
polysilicon layer and offered lower performance. In any case, no matter how
many transistors it takes, the logical value of a single static memory cell
("0" or "1") is determined strictly by the state of the transistors' channels,
which open or close according to the voltage applied to the transistors'
gates. A modern processor usually accommodates built-in cache memory, though
it may also access external cache memory through high-speed bus interfaces.
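To make the idea of "a small, fast copy of frequently used data" concrete,
here is a minimal sketch of how a direct-mapped cache could decide whether a
requested address is already stored locally. This is a generic textbook model,
not any particular processor's design; the names (cache_lookup, cache_fill)
and the sizes (64-byte lines, 1024 lines) are illustrative assumptions only.

#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Illustrative parameters: 64-byte lines, 1024 lines (64 KB total). */
#define LINE_SIZE  64u
#define NUM_LINES  1024u

struct cache_line {
    bool     valid;           /* has this line been filled yet?          */
    uint32_t tag;             /* high address bits identifying the line  */
    uint8_t  data[LINE_SIZE]; /* copy of one line of operating memory    */
};

static struct cache_line cache[NUM_LINES];

/* Split an address into offset-within-line, line index, and tag,
 * then report whether the access hits in the cache. */
static bool cache_lookup(uint32_t address)
{
    uint32_t index = (address / LINE_SIZE) % NUM_LINES;
    uint32_t tag   = address / (LINE_SIZE * NUM_LINES);

    return cache[index].valid && cache[index].tag == tag;
}

/* On a miss, the line is fetched from operating memory and installed,
 * evicting whatever line previously occupied that slot. */
static void cache_fill(uint32_t address, const uint8_t *memory_line)
{
    uint32_t index = (address / LINE_SIZE) % NUM_LINES;

    cache[index].valid = true;
    cache[index].tag   = address / (LINE_SIZE * NUM_LINES);
    memcpy(cache[index].data, memory_line, LINE_SIZE);
}

A real cache adds associativity, replacement policies and write policies on
top of this skeleton, but the tag/index/offset split shown in cache_lookup is
the heart of every cache access.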
Operating memory is installed in much larger quantities than all the cache
memory available in a particular machine, because it has to provide storage
space for every active software task (processes and threads). However, it is
economically unattractive to build operating memory chips of static memory
cells, so dynamic memory cells are almost always preferred for that purpose.
A single cell of this type consists of only 1 transistor and 1 capacitor, so
it is no wonder that dynamic memory allows for lower manufacturing costs. The
logical value of such a cell is indicated by the charge voltage of its
capacitor. Nevertheless, the charge/discharge time of a dynamic memory cell's
capacitor is longer than the switching time of a static memory cell's
field-effect transistor, and this factor alone places dynamic memory behind
static memory in terms of performance. It should also be considered that
capacitors leak charge, so they have to be refreshed on a regular basis,
i.e. the information must be read from memory and written back immediately.
Finally, every read operation inevitably discharges the capacitors involved,
so the data must be restored afterwards as well. In short, dynamic memory
compared to static memory is cheaper, but also slower and more demanding to
maintain. As a matter of fact, the choice of dynamic cells for operating
memory and static cells for cache memory delivers the best price/performance
ratio, i.e. exactly what consumers and manufacturers expect.
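As a toy illustration of that maintenance burden (purely a sketch with made-up
names and timings such as LEAK_FACTOR, not a model of any real DRAM
controller), the following code treats each cell as a leaking charge level
that must be periodically read and rewritten, and restores the charge after
every destructive read:

#include <stdio.h>

#define CELLS           8
#define LOGIC_THRESHOLD 0.5   /* charge above this reads as "1"       */
#define LEAK_FACTOR     0.9   /* illustrative per-tick charge loss    */

static double charge[CELLS];  /* capacitor charge level, 0.0 .. 1.0   */

/* Capacitors leak: every tick each cell loses part of its charge. */
static void tick(void)
{
    for (int i = 0; i < CELLS; i++)
        charge[i] *= LEAK_FACTOR;
}

/* Reading discharges the capacitor, so the value must immediately
 * be written back (a destructive read). */
static int read_cell(int i)
{
    int value = charge[i] > LOGIC_THRESHOLD;
    charge[i] = 0.0;                    /* the read destroys the charge */
    charge[i] = value ? 1.0 : 0.0;      /* ...so restore it at once     */
    return value;
}

/* Refresh = read every cell and write it back before the charge decays
 * past the point where "1" becomes indistinguishable from "0". */
static void refresh_all(void)
{
    for (int i = 0; i < CELLS; i++)
        (void)read_cell(i);
}

int main(void)
{
    charge[3] = 1.0;                    /* store a "1" in cell 3        */
    for (int t = 0; t < 6; t++) {
        tick();
        refresh_all();                  /* without this, the "1" fades  */
    }
    printf("cell 3 reads as %d\n", read_cell(3)); /* still 1 */
    return 0;
}

In real DRAM the refresh runs over whole rows at fixed intervals (typically on
the order of 64 ms), and the restore after a destructive read is handled by
the sense amplifiers, but the principle is the same.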
Cache memory is not a universal substitute for high bandwidth of system and
memory buses, for a simple reason: different software tasks exhibit different
degrees of cacheability. Only tasks that follow the principles of temporal and
spatial locality can use caches efficiently. The principle of temporal
locality supposes that information processed in the recent past is likely to
be requested again in the near future. The principle of spatial locality
supposes that information adjacent to what has already been used is likely to
be needed soon. On the other hand, tasks operating on streaming data (for
instance, video and audio processing) fare poorly with caching because they
tend to fill caches with information that will not be used again any time
soon, though special non-temporal read and write instructions may help to
minimise cache pollution in such cases. The same applies to tasks operating on
data sets too large to fit in the cache.
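To illustrate spatial locality at work (a generic textbook demonstration, not
tied to any particular machine; the matrix size and function names are
illustrative), the two loops below sum the same matrix. The row-major
traversal touches memory sequentially, so every element of a fetched cache
line is used before the line is evicted; the column-major traversal jumps a
whole row ahead on each step, wasting most of every fetched line, and
typically runs several times slower on cached hardware.

#include <stddef.h>

#define N 1024

/* Cache-friendly: walks the array in the order it is laid out in
 * memory, so every element of a fetched cache line gets used. */
long sum_row_major(const int matrix[N][N])
{
    long sum = 0;
    for (size_t row = 0; row < N; row++)
        for (size_t col = 0; col < N; col++)
            sum += matrix[row][col];
    return sum;
}

/* Cache-hostile: strides N * sizeof(int) bytes between accesses, so
 * each fetched cache line delivers only one useful element before it
 * is likely evicted. */
long sum_col_major(const int matrix[N][N])
{
    long sum = 0;
    for (size_t col = 0; col < N; col++)
        for (size_t row = 0; row < N; row++)
            sum += matrix[row][col];
    return sum;
}

For the streaming workloads mentioned above, instruction sets offer
non-temporal accesses (for example, the SSE2 _mm_stream_si128 store intrinsic
on x86) that bypass the cache so that one-shot data does not evict useful
lines.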
It should also be mentioned that cache memory has not always been a standard
component of computer processors. For many years processors ran at the same
clock speed as their system buses and operating memory. They lacked pipelining
as well, and thus executed very few instructions per cycle on average, so
there was no apparent reason to add cache memory. Even in those days, however,
cache memory chips were sometimes installed on mainboards to compensate for
the high access latencies of operating memory, especially until page mode came
into common practice. Only in the early 1990s, when processors outran their
system buses and operating memory in clock speed and basic pipelining
principles came into fashion, did cache memory become a regular feature of
processor cores.
Apart from that, a performance increase could be achieved by laying out wider
system and memory buses. Even nowadays there are high-end systems with 256-bit
or 512-bit memory bus data channels, while regular computers accommodate
64-bit or 128-bit ones. There are several reasons why wide data channels of
system and memory buses have never become popular. First of all, wide data
exchange paths require advanced integrated circuits with large silicon dies
and high pin counts to service them, which increases manufacturing costs
considerably. Secondly, it takes complicated multilayer printed circuit board
designs to accommodate so many data traces. Every trace runs from a
controller's pin to a memory module's contact, and it is highly desirable for
all traces to have the same length and impedance, low noise sensitivity, and
so forth. Obviously, the more traces there are to lay out, the more troubles
arise, which again drives up manufacturing costs. On the other hand, it is a
significantly easier task to provide cache memory built into the processor
core with a dedicated on-chip data bus 256 or even 512 bits wide. In
conclusion, there is no commercially attractive alternative to cache memory so
far.