CASE STUDY 1: UNIX, LINUX, AND ANDROID
Per CPU runqueue in the
Linux O(1) scheduler
(b) Per CPU red-black tree in
the CFS scheduler
Illustration of Linux runqueue data structures for (a) the Linux
O(1) scheduler, and (b) the Completely Fair Scheduler.
slow it down enormously.
It is far better to let it run immediately after each re-
quest is completed, so that it can make the next one quickly. Similarly, if a process
was blocked waiting for keyboard input, it is clearly an interactive process, and as
such should be given a high priority as soon as it is ready in order to ensure that
interactive processes get good service.
In this light, CPU-bound processes basical-
ly get any service that is left over when all the I/O bound and interactive processes
Since Linux (or any other OS) does not know a priori whether a task is I/O- or
CPU-bound, it relies on continuously maintaining interactivity heuristics.
manner, Linux distinguishes between static and dynamic priority. The threads’ dy-
namic priority is continuously recalculated, so as to (1) reward interactive threads,
and (2) punish CPU-hogging threads. In the O(1) scheduler, the maximum priority
5, since lower-priority values correspond to higher priority received by
the scheduler. The maximum priority penalty is +5.
The scheduler maintains a
variable associated with each task. Whenever a task is awakened, this
variable is incremented. Whenever a task is preempted or when its quantum ex-
pires, this variable is decremented by the corresponding value. This value is used
PROCESSES IN LINUX
to dynamically map the task’s bonus to values from
5 to +5. The scheduler recal-
culates the new priority level as a thread is moved from the active to the expired
The O(1) scheduling algorithm refers to the scheduler made popular in the
early versions of the 2.6 kernel, and was first introduced in the unstable 2.5 kernel.
Prior algorithms exhibited poor performance in multiprocessor settings and did not
scale well with an increased number of tasks. Since the description presented in the
above paragraphs indicates that a scheduling decision can be made through access
to the appropriate active list, it can be done in constant O(1) time, independent of
the number of processes in the system.
However, in spite of the desirable property
of constant-time operation, the O(1) scheduler had significant shortcomings.
notably, the heuristics used to determine the interactivity of a task, and therefore its
priority level, were complex and imperfect, and resulted in poor performance for
To address this issue, Ingo Molnar, who also created the O(1) scheduler, pro-
posed a new scheduler called
Completely Fair Scheduler
based on ideas originally developed by Con Kolivas for an earlier scheduler, and
was first integrated into the 2.6.23 release of the kernel. It is still the default sched-
uler for the non-real-time tasks.
The main idea behind CFS is to use a
as the runqueue data struc-
ture. Tasks are ordered in the tree based on the amount of time they spend running
on the CPU, called
CFS accounts for the tasks’ running time with
As shown in Fig. 10-10(b), each internal node in the tree
corresponds to a task. The children to the left correspond to tasks which had less
time on the CPU, and therefore will be scheduled sooner, and the children to the
right on the node are those that have consumed more CPU time thus far.
leaves in the tree do not play any role in the scheduler.
The scheduling algorithm can be summarized as follows. CFS always sched-
ules the task which has had least amount of time on the CPU, typically the leftmost
node in the tree. Periodically, CFS increments the task’s
value based on
the time it has already run, and compares this to the current leftmost node in the
tree. If the running task still has smaller
, it will continue to run. Other-
wise, it will be inserted at the appropriate place in the red-black tree, and the CPU
will be given to task corresponding to the new leftmost node.
To account for differences in task priorities and ‘‘niceness,’’ CFS changes the
effective rate at which a task’s virtual time passes when it is running on the CPU.
For lower-priority tasks, time passes more quickly, their
value will in-
crease more rapidly, and, depending on other tasks in the system, they will lose the
CPU and be reinserted in the tree sooner than if they had a higher priority value. In
this manner, CFS avoids using separate runqueue structures for different priority
In summary, selecting a node to run can be done in constant time, whereas
inserting a task in the runqueue is done in O(log(
)) time, where
is the number