It would be quite logical to suppose that any cache line should be given
with some service information describing its status. For instance, how to find
out whether a particular cache line is valid (i. e. contains some useful
information) or invalid (i. e. full of junk)? Of course, a special policy
must be developed to address such a common situation. Say, to consider a line
invalid if all tag and data bits are set high, i. e. to all ones. However,
this approach would turn life of any cache controller into a neverending
nightmare. When a read or write request arrives, the controller activates all
lines carrying an appropriate index and starts to check their tag and data bits
on per line basis (a consequent logical multiplication, AND). If the result is
equal to one, such a cache line should be considered as invalid. To throw an
example in, if a cache memory of 256Kb is 16-way set associative with 64-byte
line size and operational over 4Gb memory range, it takes 288 tag bits and 8192
data bits to be read and compared against validity on every transaction at the
worst! The situation gets even worse if to involve data integrity algorithms.
Of course, this approach is of no practical use. Even if to simplify it by
paying no attention to data bits and checking tag bits only, thus leaving the
last memory segment uncacheable, it's viable for limited use only. For that
matter, B-caches of 486 and Pentium times were designed this way sometimes, but
bear in mind that they were direct-mapped and their tag size was small
relatively. So, this approach is rather an exception than a rule. However, it
becomes a completely different matter if to add an extra bit per line to
describe its validity. Let's suppose that a line is considered valid if this bit
is set high. Life gets simpler: cache controller doesn't need to check anything
else to decide whether a particular line is junky or not.
While an excellent idea itself, validity bit may lead to serious reliability
issues if gets inverted accidentally. In real life, it's highly unlikely though
still possible for a junk line to become valid this way. There are several ways
to prevent this, but the simplest one is to establish another accompanying bit
to carry an inverted value of the actual validity bit. If both bits happen to be
of the same value, a respective cache line must be considered invalid. It isn't
really possible to change values of two neighborous bits to their opposites
because any kind of electromagnetic interference known on this planet is
unipolar.
There is modify bit also known as dirty bit. Its purpose is to indicate
whether contents of a particular cache line are different to what is stored in
operating memory. If there is no such a bit per line implemented, then all lines
must be kept unmodified/clean, what means that any line change must be forwarded
to memory immediately. Such a cache is called memory coherent. Well, it isn't
attractive very much because of significant memory bandwidth waste. On the other
hand, it allows for faster cache writes because any unmodified line may be
overwritten with another one easily, no need to take any additional actions.
Unlike that, no modified line may be overwritten without prior eviction to a
lower cache or memory because of apparent data loss. In other words, cache memory
controller should keep an eye on these lines somehow, and that's what modify bit
is exactly for. It should also be mentioned that D-caches and U-caches may or
may not feature support for modified lines while I-caches are memory coherent
always. Although it is possible to store dirty lines in cache memory with no
modify bits implemented, this approach isn't popular. It simply treats all cache
lines as dirty right from the start, no matter whether they actually are.
However, this trick has a side effect of very large write traffic to a lower
cache memory or operating memory. Every line loaded into such a cache must be
unloaded back inescapably at some moment in the future. Some system logic sets
of the past, such as VIA Apollo VP2/97 (also known as AMD-640) for Pentium-class
processors, supported this mode as an option for B-cache aiming larger
cacheability region of operating memory, but performance shown in the real life
was rather disappointing.
The third important cache line condition identifier is shared bit. Its
purpose is to indicate whether a particular line is contained in other caches
including those which belong to other processors. For instance, if one processor
updates its local copy, other processors should be aware of this event and treat
their copies accordingly. There are two methods for that purpose,
Write
Update (also known as
Write Broadcast) and
Write Invalidate.
The first one obligates the modifying processor to broadcast changes made to
other processors, so they can update their copies. The second one differs from
the previous one in means that no data transferred but a special message, and
the other processors have got to invalidate their copies in accordance with it.
In general,
Write Update produces significantly more system bus traffic
than
Write Invalidate, so this approach works well for systems with
star-based processor topology (also known as point to point topology) and
especially well for those ones with some kind of direct interprocessor
connections. On the other hand,
Write Invalidate does the best for
systems with bus-based topology.
Given the information above, it's the right time to start with logical
condition and coherence protocols which form basis of almost all physical
implementations. The most popular one is MESI (
Modified,
Exclusive,
Shared,
Invalid):
| Condition |
bit 0 |
bit 1 |
| Modified |
1 |
0 |
| Exclusive |
0 |
1 |
| Shared |
1 |
1 |
| Invalid |
0 |
0 |
This protocol requires physical availability of two condition bits per line
to describe all four possible states. Obviously,
Invalid means that a
particular line contains no useful information, so it may be considered as
empty.
Exclusive implies that contents of this line are identical
absolutely to memory, so it may be called a clean line, also there are no copies
of it in other caches, i. e. this one is exclusive literally.
Modified states that contents of this line are different to memory, so it
may be called a dirty line, also there are still no copies of it in other caches.
Shared defines a clean line which may have copies in other caches (they
were for sure at the moment of condition writing, but could be invalidated
later).
The MESI protocol isn't perfect though simple fairly and well useful for
uniprocessor configurations with one or two levels of cache memory. Here is a
simple example to illustrate its behaviour. For a good start, a fresh line has
been loaded into D-cache from the operating memory, so it becomes
Exclusive. A while later, another line gets loaded into the same place,
thus forcing the original line for eviction to S-cache where it remains
Exclusive. Upon a cache hit, this line gets loaded back from S-cache to
D-cache, and both lines become
Shared. If a clone located in D-cache has
to get dirty, there are two reasonable solutions possible: a) the changes will
be issued to both S-cache and the operating memory, so both clones will stay
Shared; b) the changes will not be issued anywhere, a clone in S-cache
will be labelled as
Invalid to allow that one in D-cache for
Modified. Any other arcane approach is not to be reviewed here.
There are several relatives of the MESI protocol such as MSI
(
Modified,
Shared,
Invalid), MOSI (
Modified,
Owned,
Shared,
Invalid), MOESI (
Modified,
Owned,
Exclusive,
Shared,
Invalid) and MOWESI
(
Modified,
Owned,
Written,
Exclusive,
Shared,
Invalid). The last two are the most advanced, so about them in detail.
MOESI requires 3 physical condition bits per line to be implemented, and it's
primary purpose is to eliminate that drawback of MESI which doesn't allow to
keep dirty lines in more than one cache. A new state has been developed to
address this issue,
Owned, which has something to do with ownership
rights to a particular cache line, all clones of it, and a respective memory
line as well. These rights belong to system logic by default, so it must keep
track of all system agents
(nota bene: these aren't processors only, but also
all bus master devices) to receive valid information at any moment. The
first violation attempt of these rights was the condition of
Modified
which made system logic's life more complicated considerably. Every time a
system agent asks for any information from a cacheable memory space, the system
logic has to terminate the request and issue an inquiry to all processors
whether any of them has an appropriate line dirty in caches of its own. On a
positive response, the processor holding the line is obligated to write it back
to the operating memory and change the line status to
Exclusive. Upon
completion, the system logic restarts the original request and delivers the
information. If there were no condition of
Modified, all caches of all
processors would be memory coherent, so the system logic would have no need in
performing any further investigation and could just deliver what has been
requested directly from the operating memory. Let's get back to
Owned.
By granting a line with this condition, a respective processor takes all rights
and duties regarding it from the system logic. If any system agent requests this
line, the owning processor must respond and supply it with a valid copy. No
operating memory is updated on this transaction even if the system logic gets
involved. In a matter of fact, while the state of
Modified may be
elaborated as
Exclusive Modified, the state of
Owned is nothing
else but
Shared Modified. If any line becomes
Owned, all copies of
it must remain
Shared including those in caches of other processors.
However, the MOESI protocol isn't perfect as well. In a matter of fact, the
condition of
Owned gives all power to master processor, but places other
processors in read only position. So, if any of them needs to perform a write to
its
Shared line, it has to be done to operating memory as well, thus
terminating the master processor's state of
Owned. That's all right for
most multiprocessor environments featuring only one active writer and several
readers usually, but it's a different story for multithreaded environments where
several processors or processor cores may execute different threads of the same
process concurrently. An additional condition,
Written, has been
introduced in order to improve performance in this area, and it has resulted in
creation of the MOWESI protocol. When a write to a
Shared line occurs, it
changes the status to
Written, all other processors either invalidate or
update their obsolete copies, after that it needs to change the line status to
Modified or
Owned respectively. An operating memory update has
been avoided successfully either way. This protocol has been introduced in the
2-core IBM POWER5 processor.
There are numerous low-level condition & coherence protocols documented
which feature different combinations of condition identifiers, write methods,
bus signaling asf. To name a few the oldest: Illinois, Write Once, Berkeley, DEC
Firefly, Xerox Dragon. For instance, the protocol implemented in most DEC Alpha
systems is MOESI-based with
Write Invalidate support. It takes three
physical condition bits per line to operate: tagCtlV (
Valid), tagCtlS
(
Shared) and tagCtlD (
Dirty). An additional bit, tagCtlP, is
designed to handle even parity on them. The following table illustrates all
combinations of the condition bits and their actual meaning, where "x" character
means that a respective bit value is of no difference
(nota bene: there is
also a simpler protocol supported on uniprocessor systems only, which doesn't
utilise tagCtlS, but it isn't to be described here).
| Condition |
tagCtlV |
tagCtlS |
tagCtlD |
| Invalid |
0 |
x |
x |
| Clean |
1 |
0 |
0 |
| Dirty |
1 |
0 |
1 |
| Clean Shared |
1 |
1 |
0 |
| Dirty Shared |
1 |
1 |
1 |
The conditions of
Clean and
Dirty are fully equivalent to
Exclusive and
Modified respectively from the MOESI protocol as
well as
Invalid. However,
Clean Shared and
Dirty Shared
behave not exactly as
Shared and
Owned respectively. Operating
memory gets updated always on a write into such a line while other processors
invalidate their obsolete copies, and the line written gets
Clean. In
other words, any line labelled as
Clean or
Dirty may become
Clean Shared or
Dirty Shared respectively, but
Clean
Shared may never be turned into
Dirty Shared or vice versa.