main memory (and sometimes even sharing caches). In other words, the model of
shared-memory multiprocessors may be implemented using physically separate
CPUs, multiple cores on a single CPU, or a combination of the above.
While this model, illustrated in Fig. 8-1(a), sounds simple, actually implementing
it is not really so simple and usually involves considerable message passing
under the covers, as we will explain shortly. However, this message passing is
invisible to the programmers.
Figure 8-1. (a) A shared-memory multiprocessor. (b) A message-passing multicomputer. (c) A wide area distributed system.
Next comes the system of Fig. 8-1(b) in which the CPU-memory pairs are connected
by a high-speed interconnect. This kind of system is called a message-passing
multicomputer. Each memory is local to a single CPU and can be accessed only by
that CPU.
The CPUs communicate by sending multiword messages over
the interconnect. With a good interconnect, a short message can be sent in 10–50
μsec, which is still far longer than the memory access time of Fig. 8-1(a). There is no
shared global memory in this design. Multicomputers (i.e., message-passing sys-
tems) are much easier to build than (shared-memory) multiprocessors, but they are
harder to program. Thus each genre has its fans.
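The message-passing discipline can be mimicked in a few lines. The following is a minimal simulation, not how a real multicomputer is built: it assumes Python threads stand in for CPUs and one `queue.Queue` per node stands in for that node's port on the interconnect. The `Node` class, its `send` and `serve` methods, and the `"write"`/`"read"` message formats are all hypothetical. The property that matters is that `_memory` is touched only by its owner, so node 0 can learn a value held by node 1 only by exchanging messages.

```python
import queue
import threading


class Node:
    """Hypothetical CPU-memory pair. Its memory is private: other nodes reach it only by messages."""

    def __init__(self, node_id):
        self.node_id = node_id
        self._memory = {}            # local memory, touched only by this node's own thread
        self.inbox = queue.Queue()   # stands in for this node's port on the interconnect

    def send(self, dest, *words):
        # A multiword message: the tuple is copied into the destination's inbox.
        dest.inbox.put((self, *words))

    def serve(self, n_requests):
        # Handle requests against local memory; the inbox is the only way in.
        for _ in range(n_requests):
            sender, op, key, *rest = self.inbox.get()
            if op == "write":
                self._memory[key] = rest[0]
            elif op == "read":
                self.send(sender, "value", key, self._memory.get(key))


def remote_read_demo():
    a, b = Node(0), Node(1)
    server = threading.Thread(target=b.serve, args=(2,))
    server.start()
    a.send(b, "write", "x", 42)              # ask node 1 to store x = 42 in its own memory
    a.send(b, "read", "x")                   # ask node 1 to send x back
    _sender, kind, key, value = a.inbox.get()  # node 0 blocks until the reply arrives
    server.join()
    return kind, key, value


print(remote_read_demo())  # ('value', 'x', 42)
```

The round trip in `remote_read_demo` is the kind of exchange whose cost the 10–50 μsec figure describes; a shared-memory machine would have satisfied the same read with a single load.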
The third model, which is illustrated in Fig. 8-1(c), connects complete com-
puter systems over a wide area network, such as the Internet, to form a distributed
system. Each of these has its own memory and the systems communicate by mes-
sage passing. The only real difference between Fig. 8-1(b) and Fig. 8-1(c) is that in
the latter, complete computers are used and message times are often 10–100 msec.
This long delay forces these systems to be used in different ways than the
systems of Fig. 8-1(b). The three types of systems differ
in their delays by something like three orders of magnitude. That is the difference
between a day and three years.
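The "three orders of magnitude" claim can be checked against the message times quoted above. This short calculation assumes the lower end of each range (10 μsec for Fig. 8-1(b), 10 msec for Fig. 8-1(c)); the variable names are illustrative only.

```python
import math

# Lower ends of the message-time ranges quoted in the text, in microseconds (an assumption;
# the upper ends, 50 μsec and 100 msec, would give a ratio of 2000 instead).
MULTICOMPUTER_MSG_US = 10        # Fig. 8-1(b): 10 μsec
WAN_MSG_US = 10 * 1000           # Fig. 8-1(c): 10 msec

ratio = WAN_MSG_US // MULTICOMPUTER_MSG_US   # how many times slower a WAN message is
orders = math.log10(ratio)                   # the same ratio in orders of magnitude
days_scaled = 1 * ratio                      # one day stretched by the same factor
years = days_scaled / 365.25

print(ratio, orders, round(years, 1))  # 1000 3.0 2.7
```

A factor of 1000 turns one day into 1000 days, about 2.7 years, which the text rounds to "three years."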
This chapter has three major sections, corresponding to each of the three mod-
els of Fig. 8-1. In each model discussed in this chapter, we start out with a brief
introduction to the relevant hardware. Then we move on to the software, especially
the operating system issues for that type of system.
As we will see, in each case
different issues are present and different approaches are needed.
A shared-memory multiprocessor (or just multiprocessor henceforth) is a
computer system in which two or more CPUs share full access to a common RAM.
A program running on any of the CPUs sees a normal (usually paged) virtual ad-
dress space. The only unusual property this system has is that the CPU can write
some value into a memory word and then read the word back and get a different
value (because another CPU has changed it).
When organized correctly, this prop-
erty forms the basis of interprocessor communication: one CPU writes some data
into memory and another one reads the data out.
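Both properties just described can be seen with threads, which share one address space much as the CPUs of a multiprocessor share one RAM. In this sketch (an illustration, not a multiprocessor implementation) Python threads stand in for CPUs, a one-element list stands in for a memory word, and a `threading.Event` supplies the "organized correctly" part: the reader does not look at the data until the writer has announced that it is complete. The names `read_back_demo`, `producer`, and `consumer` are hypothetical.

```python
import threading


def read_back_demo():
    word = [0]                       # one shared memory word
    word[0] = 1                      # this "CPU" writes 1 ...
    other = threading.Thread(target=word.__setitem__, args=(0, 2))
    other.start()
    other.join()                     # ... another "CPU" overwrites the same word ...
    return word[0]                   # ... so reading it back yields 2, not 1


shared = {"data": None}              # a region of common memory
ready = threading.Event()            # set once the data is completely written


def producer():
    shared["data"] = [3, 1, 4, 1, 5]     # one CPU writes some data into memory
    ready.set()                          # and only then announces it


def consumer(out):
    ready.wait()                         # the other CPU waits for the announcement
    out.append(sum(shared["data"]))      # and then reads the data out


result = []
threads = [threading.Thread(target=consumer, args=(result,)),
           threading.Thread(target=producer)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(read_back_demo(), result)      # 2 [14]
```

Without the `Event`, the consumer could run first and find `None` where the data should be; that race is one reason process synchronization appears among the areas where multiprocessor operating systems differ.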
For the most part, multiprocessor operating systems are normal operating sys-
tems. They handle system calls, do memory management, provide a file system,
and manage I/O devices. Nevertheless, there are some areas in which they have
unique features. These include process synchronization, resource management,
and scheduling. Below we will first take a brief look at multiprocessor hardware
and then move on to these operating system issues.
8.1.1 Multiprocessor Hardware
Although all multiprocessors have the property that every CPU can address all
of memory, some multiprocessors have the additional property that every memory
word can be read as fast as every other memory word. These machines are called
UMA (Uniform Memory Access) multiprocessors. In contrast, NUMA (Nonuniform
Memory Access) multiprocessors do not have this property. Why this difference
exists will become clear later.
We will first examine UMA multiprocessors
and then move on to NUMA multiprocessors.
UMA Multiprocessors with Bus-Based Architectures
The simplest multiprocessors are based on a single bus. Two or more CPUs and
one or more memory modules all use the same
bus for communication. When a CPU wants to read a memory word, it first checks
to see if the bus is busy.
If the bus is idle, the CPU puts the address of the word it
wants on the bus, asserts a few control signals, and waits until the memory puts the
desired word on the bus.
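The read protocol just described, including the wait while the bus is busy, can be modeled in software. This is a rough sketch under stated assumptions: a `threading.Lock` stands in for bus arbitration, a Python list stands in for the memory modules, and the `Bus` class and `cpu` function are hypothetical. Holding the lock corresponds to having the address and control signals on the bus; every other CPU that wants the bus blocks until it is released.

```python
import threading


class Bus:
    """Hypothetical single shared bus: at most one transaction at a time."""

    def __init__(self, memory):
        self._busy = threading.Lock()    # held while a transaction occupies the bus
        self._memory = memory            # a list standing in for the memory modules
        self.transactions = 0            # completed bus transactions

    def read(self, address):
        with self._busy:                 # wait here while the bus is busy
            self.transactions += 1       # address and control signals are now on the bus
            return self._memory[address] # memory puts the requested word on the bus

    def write(self, address, value):
        with self._busy:
            self.transactions += 1
            self._memory[address] = value


def cpu(bus, n_reads, out):
    # Each simulated CPU reads words 0..n_reads-1 and records their sum.
    total = 0
    for addr in range(n_reads):
        total += bus.read(addr)
    out.append(total)


bus = Bus(list(range(100)))
results = []
cpus = [threading.Thread(target=cpu, args=(bus, 100, results)) for _ in range(4)]
for t in cpus:
    t.start()
for t in cpus:
    t.join()
print(results, bus.transactions)  # [4950, 4950, 4950, 4950] 400
```

All 400 reads complete correctly, but they are strictly serialized: only one CPU is ever using the bus at a time, which is exactly the bottleneck the following paragraph describes.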
If the bus is busy when a CPU wants to read or write memory, the CPU just
waits until the bus becomes idle. Herein lies the problem with this design.
With two or three CPUs, contention for the bus will be manageable; with 32 or 64 it will
be unbearable. The system will be totally limited by the bandwidth of the bus, and
most of the CPUs will be idle most of the time.
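The effect of bus saturation can be put in numbers with a simple bandwidth bound. This is a back-of-the-envelope model, not a measurement: it assumes that a CPU running alone would keep the bus busy a fixed fraction of the time (the hypothetical parameter `bus_fraction`, set to 10% here purely for illustration), so the bus can carry at most `1 / bus_fraction` CPUs' worth of traffic. Queueing delays below saturation are ignored, which makes the bound optimistic.

```python
def bus_bound(n_cpus, bus_fraction):
    """Optimistic upper bound on useful work when n_cpus share one bus.

    bus_fraction is a hypothetical fraction of time a lone CPU keeps the bus
    busy. The bus can carry at most 1 / bus_fraction CPUs' worth of traffic,
    so useful throughput saturates there.
    """
    effective = min(n_cpus, 1.0 / bus_fraction)   # CPU-equivalents of useful work
    idle = 1.0 - effective / n_cpus               # average fraction of time each CPU stalls
    return effective, idle


for n in (2, 3, 32, 64):
    eff, idle = bus_bound(n, 0.1)
    print(f"{n:3d} CPUs: {eff:5.1f} CPUs of useful work, each idle {idle:.0%} of the time")
```

Under these assumptions two or three CPUs never saturate the bus, while 32 CPUs do only 10 CPUs' worth of work and each sits idle about 69% of the time (84% with 64 CPUs).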