Another way to reduce bus traffic is to use the well-known Ethernet binary exponential backoff algorithm (Anderson, 1990). Instead of polling continuously, as in Fig. 2-25, a delay loop can be inserted between polls. Initially the delay is one instruction. If the lock is still busy, the delay is doubled to two instructions, then four, and so on up to some maximum. A low maximum gives a fast response when the lock is released but wastes more bus cycles on cache thrashing. A high maximum reduces cache thrashing at the expense of noticing less quickly that the lock is free. Binary exponential backoff can be used with or without the pure reads preceding the TSL instruction.
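As a concrete illustration, here is a minimal sketch of a test-and-set lock with binary exponential backoff, assuming C11 atomics; atomic_flag_test_and_set stands in for the TSL instruction, and MAX_DELAY is an arbitrary illustrative bound, not a tuned value.

#include <stdatomic.h>

#define MAX_DELAY 1024                 /* illustrative cap on the backoff */

static atomic_flag lock = ATOMIC_FLAG_INIT;

void acquire(void)
{
    unsigned delay = 1;                              /* initial delay: one iteration */
    while (atomic_flag_test_and_set(&lock)) {        /* TSL: returns the old value */
        for (volatile unsigned i = 0; i < delay; i++)
            ;                                        /* delay loop between polls */
        if (delay < MAX_DELAY)
            delay *= 2;                              /* double the delay, up to the maximum */
    }
}

void release(void)
{
    atomic_flag_clear(&lock);
}

A real implementation would issue a pause instruction or similar hint inside the delay loop; the volatile counter here merely keeps the compiler from deleting it.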
An even better idea is to give each CPU wishing to acquire the mutex its own private lock variable to test, as illustrated in Fig. 8-11 (Mellor-Crummey and Scott, 1991). The variable should reside in an otherwise unused cache block to avoid conflicts. The algorithm works by having a CPU that fails to acquire the lock allocate a lock variable and attach itself to the end of a list of CPUs waiting for the lock. When the current lock holder exits the critical region, it frees the private lock that the first CPU on the list is testing (in its own cache). This CPU then enters the critical region. When it is done, it frees the lock its successor is using, and so on. Although the protocol is somewhat complicated (to avoid having two CPUs attach themselves to the end of the list simultaneously), it is efficient and starvation free. For all the details, readers should consult the paper.
Figure 8-11. Use of multiple locks to avoid cache thrashing. [Figure: CPUs 2, 3, and 4 each spin on a private lock of their own in shared memory while CPU 1 holds the real lock; when CPU 1 is finished with the real lock, it releases it and also releases the private lock CPU 2 is spinning on.]
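In code, the list-based queue lock of Fig. 8-11 might look roughly as follows. This is a minimal sketch assuming C11 atomics and one qnode per CPU; the names mcs_lock, qnode, mcs_acquire, and mcs_release are ours for illustration and do not come from the paper.

#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct qnode {
    _Alignas(64) _Atomic(struct qnode *) next;   /* next waiter in the list; 64-byte
                                                    alignment keeps each node in its
                                                    own cache block */
    atomic_bool locked;                          /* the private lock this CPU spins on */
};

typedef _Atomic(struct qnode *) mcs_lock;        /* points to the tail of the waiter list */

void mcs_acquire(mcs_lock *lock, struct qnode *me)
{
    atomic_store(&me->next, NULL);
    struct qnode *pred = atomic_exchange(lock, me);   /* atomically join the end of the list */
    if (pred != NULL) {                               /* lock is held; wait our turn */
        atomic_store(&me->locked, true);
        atomic_store(&pred->next, me);                /* link in behind our predecessor */
        while (atomic_load(&me->locked))
            ;                                         /* spin only on our private flag */
    }
}

void mcs_release(mcs_lock *lock, struct qnode *me)
{
    struct qnode *succ = atomic_load(&me->next);
    if (succ == NULL) {
        struct qnode *expected = me;
        if (atomic_compare_exchange_strong(lock, &expected, NULL))
            return;                                   /* no one was waiting; lock is now free */
        while ((succ = atomic_load(&me->next)) == NULL)
            ;                                         /* a CPU is attaching itself; wait for the link */
    }
    atomic_store(&succ->locked, false);               /* free the successor's private lock */
}

The compare-and-swap in mcs_release is the subtlety the text alludes to: it resolves the race with a CPU that has swapped itself onto the tail of the list but has not yet set its predecessor's next pointer.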
Spinning vs. Switching
So far we have assumed that a CPU needing a locked mutex just waits for it, by polling continuously, polling intermittently, or attaching itself to a list of waiting CPUs. Sometimes the requesting CPU has no alternative but to wait. For example, suppose that some CPU is idle and needs to access the shared
ready list to pick a process to run. If the ready list is locked, the CPU cannot just decide to suspend what it is doing and run another process, as doing that would require reading the ready list. It must wait until it can acquire the ready list.
However, in other cases there is a choice. For example, if some thread on a CPU needs to access the file system buffer cache and it is currently locked, the CPU can decide to switch to a different thread instead of waiting. The issue of whether to spin or to do a thread switch has been a matter of much research, some of which will be discussed below. Note that this issue does not occur on a uniprocessor because spinning does not make much sense when there is no other CPU to release the lock. If a thread tries to acquire a lock and fails, it is always blocked to give the lock owner a chance to run and release the lock.
Assuming that spinning and doing a thread switch are both feasible options, the trade-off is as follows. Spinning wastes CPU cycles directly; testing a lock repeatedly is not productive work. Switching, however, also wastes CPU cycles, since the current thread's state must be saved, the lock on the ready list must be acquired, a thread must be selected, its state must be loaded, and it must be started. Furthermore, the CPU cache will contain all the wrong blocks, so many expensive cache misses will occur as the new thread starts running. TLB faults are also likely. Eventually, a switch back to the original thread must take place, with more cache misses following it. The cycles spent doing these two context switches plus all the cache misses are wasted.
If it is known that mutexes are generally held for, say, 50 μsec, and it takes 1 msec to switch away from the current thread and 1 msec to switch back later, it is more efficient just to spin on the mutex: spinning wastes at most about 50 μsec, whereas the two switches cost 2 msec. On the other hand, if the average mutex is held for 10 msec, it is worth the trouble of making the two context switches. The trouble is that critical regions can vary considerably in their duration, so which approach is better?
One design is to always spin. A second design is to always switch. But a third design is to make a separate decision each time a locked mutex is encountered. At the time the decision has to be made, it is not known whether it is better to spin or to switch, but for any given system it is possible to make a trace of all activity and analyze it later offline. Then it can be said in retrospect which decision was the best one and how much time was wasted in the best case. This hindsight algorithm then becomes a benchmark against which feasible algorithms can be measured.
This problem has been studied by researchers for decades (Ousterhout, 1982). Most work uses a model in which a thread failing to acquire a mutex spins for some period of time. If this threshold is exceeded, it switches. In some cases the threshold is fixed, typically the known overhead for switching to another thread and then switching back. In other cases it is dynamic, depending on the observed history of the mutex being waited on.
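A fixed-threshold version is simple to sketch. Assuming POSIX threads, with pthread_mutex_trylock standing in for polling the lock, and with SPIN_LIMIT as an illustrative (not measured) stand-in for the round-trip switch cost:

#include <pthread.h>

#define SPIN_LIMIT 1000   /* tune to roughly the cost of switching out and back */

void spin_then_switch(pthread_mutex_t *m)
{
    for (int i = 0; i < SPIN_LIMIT; i++)
        if (pthread_mutex_trylock(m) == 0)
            return;                 /* acquired the lock while spinning */
    pthread_mutex_lock(m);          /* threshold exceeded: block, letting the CPU switch threads */
}

A dynamic variant would instead adjust SPIN_LIMIT from the recently observed hold times of this particular mutex, as described next.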
The best results are achieved when the system keeps track of the last few observed spin times and assumes that this one will be similar to the previous ones. For example, assuming a 1-msec context switch time again, a thread will spin for at most 2 msec (the round-trip cost of the two switches) before giving up and switching.