Functional Principles of Cache Memory

Paul V. Bolotoff
Release date: 20th of April 2007
Last modified: 20th of April 2007

"The third structural feature of BESM-6 is a method to utilise super-operating memory of a little size, non-addressable in software, which is aimed to reduce automatically request load imposed on the primary operating memory device. This super-operating memory is driven in such a way that operands used the most frequently as well as small internal command cycles appear in quick registers and become ready for immediate use in an arithmetical device or machine controlling system. In many cases, the quick registers allow to eliminate up to 60% of the total memory access requests, thus reducing delivery times for figures and commands due from the primary memory..."
"The fourth structural feature of BESM-6, which is important very much for development of operating systems and machine functionality in multitasking mode, is an approach implemented in hardware to translate mathematical, also known as virtual, addresses into physical ones. There is a clear separation for physical and mathematical memory space with page-based organisation in BESM-6..."

L. N. Korolyov, "Structures of Mainframes and Their Mathematical Supplement", 1974.

BESM-6 (the Russian abbreviation stands for "Big Electronic Calculating Machine 6") was the last Soviet mainframe of the BESM family. It was developed at the Institute of Precision Mechanics and Computer Technology (affiliated with the Academy of Sciences of the USSR) under the supervision of S. A. Lebedev and V. A. Melnikov. The design was ready by 1966, and its peak computing power reached a record 1 MIPS (million instructions per second).
Figure: BESM-6 deployed at the Institute of Precision Mechanics and Computer Technology.


People who take an interest in computer hardware often want to know how exactly cache memory works and what it consists of. Nevertheless, it isn't easy to find in-depth answers on this matter without having to work through numerous books and articles. When a conversation about cache memory starts, most people are used to discussing size and clock speed while leaving other characteristics beyond their attention. However, this approach is as absurd as judging different processors exclusively by their clock speed. The same applies to comparing different memory modules by their size alone (in bytes of course, not in centimetres or inches, though as far as correctness goes it makes little difference). This article is intended to correct such neglect of cache memory. At the same time, the reader is expected to have at least intermediate knowledge of mathematics and computer architecture to understand the matters covered.


Cache memory is temporary storage for the information requested most often. It consists of relatively small areas of fast local memory. As a result, the information stored may be delivered much more quickly than from slower external devices such as operating memory or the disk subsystem. Hence, cache memory helps to reduce undesired stalls, sometimes eliminating them completely, and thereby increases performance. It is almost always built of static memory cells because this approach allows for maximum performance. It usually takes 6 field-effect transistors to build a single cache memory cell (in other words, storage for one bit of information), though other implementations exist as well, for example those with 8 or 12 transistors per cell. In the past, approximately until 0.5 µm manufacturing processes went into production, 4-transistor cells were also popular. However, they required an additional layer of polysilicon and offered lower performance. In any case, no matter how many transistors it takes, the logical value of a single static memory cell ("0" or "1") depends strictly on the state of the transistors' channels, which are either open or closed according to the voltage applied to the transistors' gates. A modern processor usually accommodates built-in cache memory, though it may also access external cache memory through high-speed bus interfaces.
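To get a feeling for what these cell sizes mean in silicon, here is a rough sketch in Python. It counts only the transistors of the data array and assumes that tags, parity or ECC bits, address decoders and sense amplifiers are ignored. The function name and the 512 KB cache size are hypothetical and chosen only for illustration.

```python
# Rough transistor budget for the data array of a static-memory cache.
# Assumption: only data bits are counted; tags, parity/ECC bits, decoders
# and sense amplifiers are left out, so real figures will be higher.

def data_array_transistors(cache_bytes: int, transistors_per_cell: int = 6) -> int:
    """Return the number of transistors needed to store cache_bytes of data."""
    bits = cache_bytes * 8              # one static cell stores one bit
    return bits * transistors_per_cell


size = 512 * 1024                       # hypothetical 512 KB cache
for per_cell in (4, 6, 8, 12):          # cell types mentioned in the text
    print(f"{per_cell}-transistor cells: {data_array_transistors(size, per_cell)}")
```

Under these assumptions, the data array of a 512 KB cache built of 6-transistor cells alone takes about 25 million transistors.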
Operating memory is installed in a much larger size than all the cache memory available within a particular machine because it is designed to provide storage space for all active software tasks (processes and threads). However, it is economically unattractive to build operating memory chips of static memory cells, so dynamic memory cells are almost always preferred for that purpose. A single cell of this type consists of only 1 transistor and 1 capacitor, so it is no wonder that dynamic memory allows for lower manufacturing costs. The logical value of such a cell is indicated by the voltage of its capacitor's charge. However, the charge/discharge time of a dynamic memory capacitor is longer than the switching time of a static memory field-effect transistor, so this factor alone places dynamic memory behind static memory in terms of performance. It should also be considered that capacitors leak charge, so they have to be recharged on a regular basis, i. e. information must be read from memory and written back immediately. Finally, every read operation inevitably discharges the capacitors involved, so the value read has to be written back as well. So, compared to static memory, dynamic memory is cheaper, but also slower and more complicated to maintain. As a matter of fact, the choice of dynamic cells for operating memory and of static cells for cache memory delivers the best price/performance ratio, i. e. what consumers and manufacturers expect.
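The maintenance burden of dynamic memory described above can be illustrated with a toy model. This is only a sketch: the class name, the leak rate and the sensing threshold are hypothetical numbers picked for the example, not parameters of any real memory chip.

```python
class ToyDramCell:
    """Toy model of a 1-transistor, 1-capacitor cell (all numbers hypothetical)."""

    FULL = 1.0        # charge level right after a "1" is written or refreshed
    THRESHOLD = 0.5   # the cell is sensed as "1" at or above this level
    LEAK = 0.9        # fraction of charge that survives one time step

    def __init__(self, bit: int) -> None:
        self.charge = self.FULL if bit else 0.0

    def tick(self, steps: int = 1) -> None:
        """Let the capacitor leak for the given number of time steps."""
        self.charge *= self.LEAK ** steps

    def read(self) -> int:
        """Destructive read: sense the bit, then write it back."""
        bit = 1 if self.charge >= self.THRESHOLD else 0
        self.charge = 0.0                         # sensing drains the capacitor
        self.charge = self.FULL if bit else 0.0   # write-back restores the value
        return bit

    def refresh(self) -> None:
        """A refresh is simply a read whose result is discarded."""
        self.read()


neglected = ToyDramCell(1)
neglected.tick(10)                # no refresh for 10 steps
print("without refresh:", neglected.read())

maintained = ToyDramCell(1)
for _ in range(4):
    maintained.tick(5)            # refresh every 5 steps
    maintained.refresh()
print("with refresh:", maintained.read())
```

With these made-up parameters, a stored "1" decays below the threshold after 7 steps, so the neglected cell loses its value while the regularly refreshed one keeps it. A static cell needs no such upkeep, which is part of what the extra transistors pay for.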
Cache memory isn't a universal substitute for high bandwidth of system and memory buses. The reason is simple: different software tasks are characterised by different levels of cacheability. Only those tasks which follow the principles of temporal and spatial locality are able to utilise caches efficiently. To elaborate, the principle of temporal locality supposes that information processed in the recent past is likely to be requested again in the near future. The principle of spatial locality supposes that information adjacent to what has already been used is likely to be needed soon. On the other hand, tasks operating on streaming data (for instance, video and audio) fail miserably with caching because they tend to fill caches with information that will not be used again anytime soon, though special non-temporal read and write instructions may help to minimise cache pollution in this case. The same applies to tasks operating on large data sets which cannot fit in cache because of their size.
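The effect of locality on cacheability can be sketched with a very small simulation. The model below is an assumption made for illustration only: a fully associative cache of 64 lines, 64 bytes each, with least-recently-used replacement. Real caches differ in organisation and replacement policy, and the function name is hypothetical.

```python
from collections import OrderedDict


def hit_ratio(addresses, cache_lines=64, line_bytes=64):
    """Replay byte addresses through a fully associative LRU cache model."""
    cache = OrderedDict()     # keys are line numbers, order tracks recency
    hits = 0
    total = 0
    for addr in addresses:
        total += 1
        line = addr // line_bytes
        if line in cache:
            hits += 1
            cache.move_to_end(line)         # mark as most recently used
        else:
            cache[line] = True              # fill the line on a miss
            if len(cache) > cache_lines:
                cache.popitem(last=False)   # evict the least recently used
    return hits / total if total else 0.0


# Temporal locality: 32 lines touched 10 times over, everything fits.
small_loop = list(range(0, 32 * 64, 64)) * 10
# Working set too large: 128 lines cycled through a 64-line cache.
large_loop = list(range(0, 128 * 64, 64)) * 10
# Spatial locality only: every byte of 4 KB is read once, in order.
sequential = range(4096)

print(hit_ratio(small_loop))    # 0.9
print(hit_ratio(large_loop))    # 0.0
print(hit_ratio(sequential))    # 0.984375
```

In this model the small loop misses only on its first pass, while the large loop never hits at all because each line is evicted before it is requested again. The sequential pass hits only thanks to neighbouring bytes sharing a line, and none of its lines is ever reused, which is exactly why streaming data pollutes a cache.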
It should also be mentioned that cache memory hasn't always been a standard component of computer processors. For many years processors ran at the same clock speed as their system buses and operating memory. They lacked pipelining as well, and thus executed very few instructions per cycle on average. Hence, there was no apparent reason to include cache memory. However, even in those days cache memory chips were sometimes installed on mainboards to compensate for high access latencies of operating memory, especially before page mode came into common practice. Only in the early 1990s, when processors outran their system buses and operating memory in terms of clock speed and basic pipelining principles came into fashion, did cache memory become a regular feature of processor cores.
Apart from that, a performance increase could be achieved by laying out wider system and memory buses. Even nowadays there are high-end systems with 256-bit or 512-bit memory bus data channels, while regular computers accommodate 64-bit or 128-bit ones. There are several reasons why wide data channels of system and memory buses aren't popular and never have been. First of all, wide data paths require advanced integrated circuits with large silicon dies and high pin counts to service them, which increases manufacturing costs considerably. Secondly, it takes complicated multilayer printed circuit board designs to accommodate so many data traces. Every trace runs from a controller's pin to a memory module's contact, and it is highly desirable for all traces to have the same length and impedance, low noise sensitivity and so forth. Obviously, the more traces you have to lay out, the more trouble you are about to face, which again drives manufacturing costs up. On the other hand, it is significantly easier to provide cache memory built into a processor core with a dedicated on-chip data bus of 256 or even 512 bits.
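The appeal of wide data channels comes down to simple arithmetic: peak bandwidth is the channel width in bytes multiplied by the number of transfers per second. The sketch below assumes a hypothetical rate of 400 million transfers per second for every width; the function name and that rate are illustrative only and do not describe any particular bus.

```python
def peak_bandwidth_bytes(bus_width_bits: int, transfers_per_second: int) -> int:
    """Theoretical peak bandwidth of a data channel in bytes per second."""
    return bus_width_bits // 8 * transfers_per_second


RATE = 400_000_000                       # hypothetical transfers per second
for width in (64, 128, 256, 512):        # channel widths mentioned in the text
    gigabytes = peak_bandwidth_bytes(width, RATE) / 1e9
    print(f"{width}-bit channel: {gigabytes:.1f} GB/s")
```

At this assumed rate a 64-bit channel peaks at 3.2 GB/s and a 512-bit one at 25.6 GB/s. The eightfold gain is what a wide on-chip cache bus gets almost for free and what an off-chip memory bus has to pay for in pins and board traces.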
In conclusion, there is so far no commercially attractive alternative to cache memory.

Copyright (c) Paul V. Bolotoff, 2007. All rights reserved.
A full or partial reprint without a permission received from the author is prohibited.
Designed and maintained by Alasir Enterprises, 1999-2007