Associativity is a characteristic of cache memory related directly to its
logical segmentation: there are as many segments as many ways defined. In
other words, N-way set associative cache memory means that information stored
at some address in operating memory could be placed (cached) in N locations
(lines) of this cache memory. The basic principle of logical segmentation says
that there is only one line within any particular segment to be capable of
caching information located at some memory address. Thereby, once the address is
known, there is only one line in a whole segment to be investigated upon. When a
cache controller receives a read or write request, it queries appropriate lines
of all cache segments on information availability (cache hit, i. e. valid
data) or absense (cache miss, i. e. invalid data or valid but related to
another memory address). Either all segments return misses or all but one which
returns a hit indeed. If there is a hit, then it just has to be serviced
accordingly, no need to perform any further investigation. If there are all
misses on a read request, the controller reissues the original request to a
lower cache controller and so forth. If all of them fail to complete the
operation, then it must be routed to a memory controller. On a write request,
the cache controller has to choose a segment to post a record to. In order to do
so, it checks condition and other identifiers of those lines to choose from and
arrives at a decision accordingly to one of the following popular algorithms:
-
LRU (Least Recently Used) — a line containing information with the
oldest last hit shall be chosen for a replacement;
-
LRR (Least Recently Replaced), also known as FIFO (First In, First Out)
— a line containing information with the oldest record shall be chosen
for a replacement;
-
LFU (Least Frequently Used) — a line containing information which has been
requested least often shall be chosen for a replacement;
-
Random — a casual line to be chosen for a replacement.
In a matter of fact, the more advanced algorithms compared to Random are
more effective, but require more complex logic designs to service them. LFU
supposes every cache line to be supported by a special counter which should be
increased on every hit. LRU and LRR imply some kind of a time stamp counter
instead of a hit counter, though LRU also imposes a reset on every hit. Random
has no need in any line counters and could be well implemented with a single
number generator or something like that.
There are two particular representatives of cache associativity policy which
are opposite in fact. If information located at some address in operating memory
can be placed in any cache line, such cache memory is called fully associative.
On the other hand, if this information can be placed in only one cache line,
such cache memory is called direct mapped. To elaborate these statements a
little, there are as many cache segments as cache lines available in case of
full associativity, and there is only one cache segment present in case of
direct mapping. Hence, direct mapped cache memory may be referenced as 1-way set
associative, and number of ways possessed by fully associative one equals to
number of cache lines available. Direct mapped cache is faster apparently than
fully associative one because it makes an easy task for controller to decide on
a line for writing/reading once a memory address is known. However, this
approach is the least efficient because controller has no choice while writing
because there is only one line available to choose from. So, significant
performance degradations occur if there are two or more memory addresses
challenging each other for a single cache line. In contrary, fully associative
cache is the slowest yet the most efficient one. Nevertheless, other variants of
cache associativity policy do exist between those two described above because
there is plenty of room for trade-offs in means of performance and efficiency
indeed. For instance, 2-, 4- and 8-way set associative caches tend to be the
most popular.
The following drawing illustrates how caches with different
implementations of associativity policy do their job. Consider some abstract
model with 8 cache lines and physical memory space equivalent in size to 64
cache lines. It needs to store the 10th so-called memory line in this cache
(nota bene: all lines are numbered from 0 and up the way, so this line
carries number 9). In case of direct-mapped cache this memory line may be
written in the only one place: 9 mod 8 = 1 (remainder on
division), i. e. cache line number 1 of the only 8-line cache segment.
2-way set associative organisation offers two possible places:
9 mod 4 = 1, i. e. cache line number 1 of any 4-line
cache segment. Fully associative approach delivers the maximum of options:
9 mod 1 = 0, i. e. cache line number 0 of any 1-line
cache segment, basically, any cache line available. Actions taken on a read
request are related fairly. In case of direct-mapped cache the only line that
needs to be probed is number 1. 2-way set associative mode implies lines number
1 and 5 to be queried, and fully associative mode requires all lines to be
investigated upon.
|
Although rarely and of a little size, fully associative cache memory may
be found in real-life processors. For instance, those of Cyrix 6x86 family
contain 256 bytes of such cache called instruction line buffer. This
instruction-only "zero level" cache is backed up by dual-ported U-cache of
either 16Kb (6x86 and 6x86L) or 64Kb (6x86MX and 6x86MII). The primary purpose
of this tiny little cache is to reduce load imposed on U-cache and to minimise
execution stalls possible when U-cache is overloaded with data. Fully
associative approach is popular while designing TLBs (shall be told in depth
about them later), branch target caches, input/output buffers and so on. In a
matter of fact, I-cache and D-cache feature low associativities (2 or 4 ways
usually) because if increased they would affect access latencies and lead to
serious performance decrease. To address this issue, associativities of S-cache
are set higher (8 or 16 ways usually) because access latencies of this cache
memory aren't of the primary importance. However, B-cache doesn't feature many
associativities as regular: 1-way for caches located on mainboards, 2- or 4-way
for those located on processor modules. It's just complicated to build
high-associativity caches out of standard discrete components.