|
Access and Write Policies
There are two popular ways for a processor's functional units to access cache
memory: Look-Through and Look-Aside. When a Look-Through cache is accessed,
the controller has to receive a response from it before taking any further
action on that request. In other words, the controller may query a lower cache
memory only after it receives a negative response from the upper cache memory.
When a Look-Aside cache is accessed, a copy of the request is dispatched to a
lower cache memory immediately. Both policies have certain benefits and
drawbacks. Consider a typical situation in which the requested data are missing
from D-cache (say, with a 2-cycle access latency) and available in S-cache (say,
with a 4-cycle access latency). Both caches are integrated into some processor
and driven by its C-box. If D-cache were Look-Through, C-box would have no
choice but to query S-cache after D-cache, so it would take 6 cycles to find out
where the data are. If D-cache were Look-Aside, it would take just 4 cycles to
arrive at the same conclusion. Additional cycles are required to deliver the
data from S-cache to D-cache and the register file, but there is no way to avoid
that.
At the same time, in our case the Look-Aside policy seriously increases the
load imposed on the tag servicing logic of S-cache, and that should be taken
into consideration when designing a device. In general, the choice of one access
policy over the other should be made after estimating the cache memory's
potential hit rate, which depends mostly on the cache memory's size and the
associativity policy chosen. Look-Aside may be preferred when the expected hit
rate is not so high, and Look-Through in the opposite case.
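The cycle counts in the example above can be reproduced with a small model. The sketch below is only an illustration under simplifying assumptions: the function names `cycles_to_locate` and `lower_level_lookups` are hypothetical, every probe is assumed to take exactly its level's access latency, and the time needed to deliver the data is ignored.

```python
def cycles_to_locate(latencies, hit_level, look_aside):
    """Cycles until the controller knows which level holds the data.

    latencies  -- access latency of each cache level, upper level first
    hit_level  -- index of the first level that holds the data
    look_aside -- True if the request is forwarded to lower levels at once
    """
    probed = latencies[:hit_level + 1]
    # Look-Aside probes the levels in parallel, so the slowest probe
    # dominates; Look-Through probes them one after another.
    return max(probed) if look_aside else sum(probed)


def lower_level_lookups(requests, upper_misses, look_aside):
    """Tag lookups the lower cache has to service."""
    # Look-Aside forwards every request; Look-Through forwards misses only.
    return requests if look_aside else upper_misses


# D-cache (2 cycles) misses, S-cache (4 cycles) hits.
print(cycles_to_locate([2, 4], hit_level=1, look_aside=False))  # 6
print(cycles_to_locate([2, 4], hit_level=1, look_aside=True))   # 4

# 1000 requests, 100 of which miss in D-cache.
print(lower_level_lookups(1000, 100, look_aside=False))  # 100
print(lower_level_lookups(1000, 100, look_aside=True))   # 1000
```

The second pair of numbers shows the price of Look-Aside in this model: the lower cache services a tag lookup for every request, not only for the misses of the upper cache.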
There are two popular cache write policies, called Write-Through and
Write-Back. When a cache write occurs, the first policy requires two identical
store transactions: one to the current cache memory and one to the lower cache
memory or operating memory. The second policy requires only a single
transaction to the current cache memory. The
Write-Through policy is easier to implement because it requires only a validity
bit per line to function properly. However, this approach produces more write
traffic, which may be undesirable in many cases. The Write-Back policy requires
a modify (dirty) bit per line in addition to the validity bit, as well as some
additional snooping logic to address memory coherence issues. In theory,
nothing prevents a Write-Back cache from operating in the Write-Through mode,
but not vice versa. There are, however, some interesting issues regarding the
Write-Back policy. If there is a new
line to be cached, but its place is occupied by a dirty line which has to be
evicted somehow, what should be done? One way is to maintain this cache memory
as a subset of a lower cache, which has to be larger and have sufficient
associativity to qualify for this job. If the smaller cache is D-cache and the
larger cache is S-cache, this means that S-cache always holds a complete copy of
D-cache. So any line in D-cache may be overwritten easily, and S-cache receives
a copy of a new line if it hasn't got one already. There is one serious
disadvantage of this approach: the effective size of S-cache is smaller than its
actual size by the size of D-cache. As a matter of fact, this solution is a
hybrid of the Write-Back and Write-Through policies, and it introduces the term
inclusive cache. By the way, all x86 processors by Intel from the Pentium Pro
onward follow this model. Another way is to maintain a small
intercache buffer, also known as a copy-back or victim buffer. It is usually
capable of holding 8 or 16 cache lines. When a dirty line is evicted from
D-cache, it goes into this buffer and waits for S-cache to become ready. D-cache
and S-cache are decoupled this way, which makes for a true Write-Back solution,
and this is where the term exclusive cache comes from. All x86 processors by AMD
from the Athlon onward follow this approach. On the other hand, nothing prevents
a particular design from featuring both inclusive and exclusive caches. For
instance, D-cache may be kept as a subset of S-cache, while S-cache may be
completely independent of T-cache. It should also be mentioned that inclusive
caches are easier to maintain in multiprocessor environments because each
request takes only one coherence check rather than two.
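The difference in write traffic between the two write policies described above can be illustrated with a toy model. This is a minimal sketch, not a description of any real controller: the `Cache` class and the `run` helper are hypothetical, the cache is direct-mapped, every store is assumed to write a whole line (so nothing has to be fetched on a miss), and a dirty victim is copied straight to the lower level without a victim buffer.

```python
class Cache:
    """Tiny direct-mapped cache model that counts writes sent downstream."""

    def __init__(self, num_lines, write_back):
        self.num_lines = num_lines
        self.write_back = write_back
        self.lines = {}        # index -> (tag, dirty); presence means valid
        self.lower_writes = 0  # store transactions sent to the lower level

    def store(self, line_addr):
        index = line_addr % self.num_lines
        tag = line_addr // self.num_lines
        if self.write_back:
            old = self.lines.get(index)
            # A dirty line with a different tag has to be copied back
            # before its slot can be reused.
            if old is not None and old[0] != tag and old[1]:
                self.lower_writes += 1
            self.lines[index] = (tag, True)    # set the modify bit
        else:
            self.lines[index] = (tag, False)   # the line never gets dirty
            self.lower_writes += 1             # second, identical store

    def flush(self):
        """Copy back all remaining dirty lines."""
        for index, (tag, dirty) in list(self.lines.items()):
            if dirty:
                self.lower_writes += 1
                self.lines[index] = (tag, False)


def run(trace, write_back, num_lines=4):
    """Replay a list of line addresses as stores, then flush the cache."""
    cache = Cache(num_lines, write_back)
    for addr in trace:
        cache.store(addr)
    cache.flush()
    return cache.lower_writes


trace = [0, 0, 0, 4, 0]               # lines 0 and 4 share one slot
print(run(trace, write_back=False))   # 5
print(run(trace, write_back=True))    # 3
```

In this trace Write-Through sends every store downstream, while Write-Back sends only the two evicted dirty lines plus one line left dirty at the end. The saving depends entirely on how often the same line is rewritten before it is evicted.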
A write allocation policy exists as well. From time to time a write has to be
performed on a non-cached memory line. It may be a single byte, a word, a
double-word or even a quad-word. In general, C-box generates a single write
cycle directly to operating memory in this case, thus bypassing the complete
cache hierarchy. However, this information may be needed in the near future.
That is where write allocation occurs: C-box fetches the respective clean line
from operating memory, updates it and writes it to some cache memory. There are
no clear advantages or disadvantages of this policy. It may deliver a minor
performance improvement in some cases, but may also decrease performance a
little in other cases because of cache pollution.
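Both outcomes of write allocation mentioned above can be observed in a toy model. The sketch below is an assumption-laden illustration: the function `count_misses` is hypothetical, the cache is fully associative with LRU replacement, and only the number of accesses that reach operating memory is counted, not their cost in cycles.

```python
from collections import OrderedDict


def count_misses(accesses, capacity, write_allocate):
    """Count accesses that have to go to operating memory.

    accesses -- list of (op, line_address) pairs, op is "r" or "w"
    capacity -- number of lines the fully associative LRU cache holds
    """
    cache = OrderedDict()            # line address -> None, oldest first
    misses = 0
    for op, line in accesses:
        if line in cache:
            cache.move_to_end(line)  # hit: refresh the LRU position
            continue
        misses += 1
        if op == "w" and not write_allocate:
            continue                 # the write bypasses the cache
        cache[line] = None           # allocate the line
        if len(cache) > capacity:
            cache.popitem(last=False)  # evict the least recently used line
    return misses


# The written line is read back soon: allocation saves a miss.
reuse = [("w", 1), ("r", 1), ("r", 1)]
print(count_misses(reuse, 2, write_allocate=True))    # 1
print(count_misses(reuse, 2, write_allocate=False))   # 2

# The written lines are never reused: allocation pollutes the cache.
pollution = [("r", 1), ("r", 2), ("w", 3), ("w", 4), ("r", 1), ("r", 2)]
print(count_misses(pollution, 2, write_allocate=True))    # 6
print(count_misses(pollution, 2, write_allocate=False))   # 4
```

In the second trace the allocated lines 3 and 4 push the useful lines 1 and 2 out of the two-line cache, which is the cache pollution effect.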
It is also worth mentioning that there are local and remote caches. Local ones
are either integrated into the processor's core or placed close to it, i.e.
within the same processor package. Remote caches are usually located somewhere
on the mainboard. A local cache is generally driven by C-box through dedicated
buses. A remote cache is controlled by system logic, and its access paths are
usually multiplexed with the system bus. Local caches are much faster, but
remote caches are easier to maintain in multiprocessor environments.
|