As it has been mentioned in the previous chapter, cache memory of any kind
is subdivided for so-called lines. Well, it's unreasonable very much to provide
every single byte in cache with an address field indicating its location in
operating memory because it would lead to terrible consequences in means of
manufacturing costs. If to consider a 32-bit linear physical address space which
allows for up to 4Gb of operating memory, every single cache byte must be
supplied with 4 bytes for addressing needs this way! Additionally, such a cache
would be poor very much in means of performance. Thus, it's much wiser to
address certain sets of adjacent bytes which would form, yes, cache lines. In
real life, these lines are sized at 32 or 64 bytes each usually, however could
be found as large as 1024 bytes each. Of course, any line size must be a
positive integer power of two. Nevertheless, even if size requirement has been
met, not any set of adjacent bytes may be cached because of an additional
limitation known as address alignment. In other words, a set of adjacent bytes
may be placed into a cache line if and only if its starting address is aligned
at the line size boundary. To throw an example in, a 32-byte cache line may only
be filled with information from operating memory located at hexadecimal
(decimal) addresses of 00-1F (00-31), 20-3F (32-63), 40-5F (64-95) and so on.
Apart of other benefits, this simple rule allows to reduce number of address
bits required per line. To be absolutely correct, by 5 in the example above,
i. e. log
2(32) = 5. Finally, a cache line must either
be loaded fully with some valid information or be empty completely which means
the same to be loaded with invalid information. So, it cannot carry any partial
content. There is only one exception to this rule: if two cache lines share one
address field, they may be considered as sublines of one large line, and they're
independent to some extent from each other.
Every address field consists of two primary parts: a dynamical (tag) which
contains the higher address bits, and a statical (index) which contains the
lower address bits
(nota bene: do not associate with dynamic and static
memory cells, that's different). The first one may be modified during
run-time while the second one is fixed. To explain what these parts are for, it
needs to introduce logical memory segmentation. Imagine that there is some
physical memory space which is organised in form of M memory segments sized
equally. Suppose that those memory segments and cache segments are exactly of
the same size. Every memory segment contains N memory lines, of course, sized
equally. Once again, suppose that those memory lines and cache lines are exactly
of the same size. Therefore, to obtain an address of a memory line, it needs to
determine the number of its memory segment first and the number of the memory
line inside of that segment second, then to merge both numbers. Substitute the
segment number with the tag and the line number with the index, and you should
have realised the idea in general.
Therefore, cache line's tag size depends on 3 factors:
- size of cache memory;
- associativity of cache memory;
- cacheable range of operating memory.
It could be calculated using the following formula:
Stag — size of cache tag, in bits;
Smemory — cacheable range of operating memory, in bytes;
Scache — size of cache memory, in bytes;
A — associativity of cache memory, in ways.
Consider some abstract machine with 1Gb maximum of operating memory and 1Mb
of cache memory (doesn't matter at all of what level) with 2-way set associative
policy which requires exactly 11 bits per tag. In other words, there are
1Gb / 512Kb = 2048 memory segments to be addressed by
any particular tag, thus log
2(2048) = 11 bits. However, one
must say that it takes 11 bits per tag to maintain cacheability of any memory
location over strictly 1Gb range with such a cache design given. If there are
fewer bits per tag available, such a cache is still effective though with a
reduced operating range. For example, if there are only 8 bits per tag
implemented, they allow to operate with 2
8 = 256 memory
segments of 512Kb each, hence delivering only 128Mb of cacheable range. Anything
above this boundary will be never cached. Although undesirable, such a situation
was common in times of 486 and Pentium processors. Their system logic sets
supported large amounts of operating memory for those old days, from 256Mb to
1Gb for Pentium-class systems typically, but their B-caches were capable of
dealing with just 64Mb usually. Many of those systems were designed upon regular
8-bit SRAM chip used for tag array of B-cache and eight 8x32Kbit SRAM chips to
produce a direct-mapped data bank of 256Kb. Those chips were often installed in
sockets, thus could be removed for upgrade without much trouble. Later, they
disappeared in favour of less expensive derivatives with higher pin counts,
which were soldered directly on mainboards. Well, it's much cheaper to produce
and install two 32x32Kbit SRAM chips instead of eight 8x32Kbit ones plus eight
sockets. The largest B-cache to appear on a Pentium-class mainboard is of 2Mb.
The idea of such discrete caches was abandoned in the following generations of
mainstream x86 hardware with an exception of a few processor families which
featured cache chips soldered on processor modules — Intel Pentium II
(Klamath, Deschutes) and Pentium II Xeon (Drake), Intel
Pentium III (Katmai) and Pentium III Xeon (Tanner), AMD Athlon
(K7, K75).
Index size depends only on cache segment size and line size. Actually, it
must be big enough to enumerate all lines within any particular segment. For
instance, if there is 512Kb cache segment with 32-byte line size, index size is
log
2(512Kb / 32b) = 14 bits. In a matter of fact,
every cache line within a particular segment has a dedicated number assigned and
hardwired. It cannot be changed and there is no need to. If there are N cache
segments, there must always be N cache lines carrying a similar index. If to
keep segment size intact, line size increase imposes index size decrease as well
as reduction on total number of index fields because of reduction on total number
of lines involved. However, larger lines feature higher load/store latencies
while transferring from/to other caches or operating memory since bandwidth of
any bus interface is well finite. Additionally, larger lines decrease caching
efficiency because of higher level of pollution with unnecessary information.
After all, every cache line is provided usually with some redundant
information for proper error detection. Data fields are protected usually with
either parity checking on byte level or one of error checking and correcting
(ECC) protocols on field level, most of which are based upon the Hamming
algorithm. Tags may be protected with single-bit parity checking, though it
isn't nearly as mandatory as in case of data fields. Whatever a particular
implementation of error detection is, any fault protecting logic adds some
complexity to an original design and gets transaction latencies affected
inevitably. That is, a control sum of data field has to be recalculated and
compared to the stored one on every read. The worst case is partial write into a
valid line because a control sum has to be calculated twice: before and after
the line change.