CASE STUDY 1: UNIX, LINUX, AND ANDROID
[Figure: The relation between the file-descriptor table, the open-file-description table, and the i-node table. Recoverable label: "Pointer to i-node".]
even this is not enough, the i-node has space for a triple indirect block pointer.
Its pointers point to many double indirect blocks. This addressing scheme can
handle file sizes of 2^24 1-KB blocks (16 GB). For 8-KB block sizes, the addressing
scheme can support file sizes up to 64 TB.
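The arithmetic behind these two limits can be checked with a short calculation. The sketch below assumes 4-byte block addresses (so a 1-KB block holds 256 pointers), an assumption that makes the figures above come out exactly. The function name is illustrative, and it counts only the blocks reachable through the triple indirect pointer, as the text does.

```python
ADDRESS_SIZE = 4  # bytes per disk-block address (assumption, consistent with the figures above)

KB, GB, TB = 2**10, 2**30, 2**40


def triple_indirect_capacity(block_size):
    """Return (blocks, bytes) reachable through one triple indirect pointer."""
    pointers_per_block = block_size // ADDRESS_SIZE
    blocks = pointers_per_block ** 3  # three levels of indirection
    return blocks, blocks * block_size


blocks_1k, bytes_1k = triple_indirect_capacity(1 * KB)
blocks_8k, bytes_8k = triple_indirect_capacity(8 * KB)

print(blocks_1k == 2**24, bytes_1k // GB)  # True 16
print(bytes_8k // TB)                      # 64
```

Going from 1-KB to 8-KB blocks multiplies the pointers per block by 8 at each of the three levels and the block size by 8 as well, which is why the limit grows by a factor of 8^4 = 4096 (16 GB to 64 TB).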
The Linux Ext4 File System
In order to prevent all data loss after system crashes and power failures, the
ext2 file system would have to write out each data block to disk as soon as it was
created. The latency incurred during the required disk-head seek operation would
be so high that the performance would be intolerable. Therefore, writes are delay-
ed, and changes may not be committed to disk for up to 30 sec, which is a very
long time interval in the context of modern computer hardware.
To improve the robustness of the file system, Linux relies on journaling file
systems. Ext3, a successor of the ext2 file system, is an example of a journaling
file system. Ext4, a follow-on of ext3, is also a journaling file system, but unlike
ext3, it changes the block addressing scheme used by its predecessors, thereby sup-
porting both larger files and larger overall file-system sizes. We will describe some
of its features next.
The basic idea behind a journaling file system is to maintain a journal, which
describes all file-system operations in sequential order. By sequentially writing out
changes to the file-system data or metadata (i-nodes, superblock, etc.), the opera-
tions do not suffer from the overheads of disk-head movement during random disk
accesses. Eventually, the changes will be written out, committed, to the appropriate
disk location, and the corresponding journal entries can be discarded.
If a system
crash or power failure occurs before the changes are committed, during restart the
system will detect that the file system was not unmounted properly, traverse the
journal, and apply the file-system changes described in the journal log.
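The write, commit, and replay cycle described above can be illustrated with a toy model. This is a minimal sketch, not how ext4 is implemented: the class `ToyJournalingDisk`, its `clean` flag, and the use of an in-memory list as the journal are all hypothetical names and simplifications introduced here.

```python
class ToyJournalingDisk:
    """Toy model: `blocks` is the final on-disk location, `journal` the sequential log."""

    def __init__(self):
        self.blocks = {}    # block number -> contents at its home location
        self.journal = []   # append-only list of (block number, new contents)
        self.clean = True   # False while mounted; stays False after a crash

    def mount(self):
        if not self.clean:  # not unmounted properly: replay the journal
            self.checkpoint()
        self.clean = False

    def write(self, block_no, data):
        # Sequential append only; the home location is updated later.
        self.journal.append((block_no, data))

    def checkpoint(self):
        # Apply logged changes in order, then discard the journal entries.
        for block_no, data in self.journal:
            self.blocks[block_no] = data
        self.journal.clear()

    def unmount(self):
        self.checkpoint()
        self.clean = True


disk = ToyJournalingDisk()
disk.mount()
disk.write(7, "i-node update")
disk.write(42, "file data")
# Simulated crash: no checkpoint and no unmount, but the journal survives.
disk.mount()  # restart detects the unclean state and replays the journal
print(disk.blocks)  # {7: 'i-node update', 42: 'file data'}
```

The point of the design is that `write` only ever appends, so the expensive random updates in `checkpoint` can be deferred without risking the loss of logged changes.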
Ext4 is designed to be highly compatible with ext2 and ext3, although its core
data structures and disk layout are modified. Regardless, a file system which has
been unmounted as an ext2 system can be subsequently mounted as an ext4 system
and offer the journaling capability.
The journal is a file managed as a circular buffer. The journal may be stored on
the same or a separate device from the main file system. Since the journal opera-
tions are not "journaled" themselves, these are not handled by the same ext4 file
system. Instead, a separate JBD (Journaling Block Device) is used to perform the
journal read/write operations.
JBD supports three main data structures: log record, atomic operation handle,
and transaction. A log record describes a low-level file-system operation, typically
resulting in changes within a block. Since a system call such as write includes
changes at multiple places (i-nodes, existing file blocks, new file blocks, list of
free blocks, etc.), related log records are grouped in atomic operations. Ext4
notifies JBD of the start and end of system-call processing, so that JBD can ensure that
either all log records in an atomic operation are applied, or none of them. Finally,
primarily for efficiency reasons, JBD treats collections of atomic operations as
transactions. Log records are stored consecutively within a transaction.
JBD will allow portions of the journal file to be discarded only after all log
records belonging to a transaction are safely committed to disk.
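The relationship among the three structures, and the all-or-nothing rule for atomic operations, can be sketched as follows. The class and field names (`LogRecord`, `AtomicOperation`, `Transaction`, `complete`) are hypothetical and chosen for illustration; they are not the kernel's JBD interface, and the replay logic is a simplification.

```python
from dataclasses import dataclass, field


@dataclass
class LogRecord:
    """One low-level change, typically confined to a single block."""
    block_no: int
    new_contents: str


@dataclass
class AtomicOperation:
    """All log records produced by one system call."""
    records: list = field(default_factory=list)
    complete: bool = False  # set once the end of the system call is signaled


@dataclass
class Transaction:
    """A batch of atomic operations whose records are stored consecutively."""
    operations: list = field(default_factory=list)

    def replay(self, blocks):
        # Apply only complete atomic operations; an incomplete one is skipped whole.
        for op in self.operations:
            if op.complete:
                for rec in op.records:
                    blocks[rec.block_no] = rec.new_contents
        return blocks


# One call that touched an i-node, a data block, and the free list.
op1 = AtomicOperation(
    [LogRecord(1, "i-node v2"), LogRecord(9, "data"), LogRecord(2, "free list v2")],
    complete=True,
)
# A second call that was interrupted by a crash before it finished.
op2 = AtomicOperation([LogRecord(1, "i-node v3")], complete=False)

blocks = Transaction([op1, op2]).replay({})
print(blocks)  # {1: 'i-node v2', 9: 'data', 2: 'free list v2'}
```

Block 1 keeps the contents from the completed call, because none of the records of the interrupted call are applied.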
Since writing out a log entry for each disk change may be costly, ext4 may be
configured to keep a journal of all disk changes, or only of changes related to the
file-system metadata (the i-nodes, superblocks, etc.).
Journaling only metadata
gives less system overhead and results in better performance but does not make any
guarantees against corruption of file data. Several other journaling file systems
maintain logs of only metadata operations (e.g., SGI’s XFS). In addition, the
reliability of the journal can be further improved via checksumming.
A key modification in ext4 compared to its predecessors is the use of extents.
Extents represent contiguous blocks of storage, for instance 128 MB of contiguous
4-KB blocks vs. individual storage blocks, as referenced in ext2. Unlike its prede-
cessors, ext4 does not require metadata operations for each block of storage.
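To make the saving concrete, the sketch below maps file-relative block numbers to disk blocks through a list of extents. It is an illustration under assumptions, not the ext4 on-disk format: the `Extent` fields, the `lookup` helper, and the starting disk block 500000 are all hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass

BLOCK_SIZE = 4 * 1024  # 4-KB blocks, as in the example above


@dataclass
class Extent:
    logical_start: int   # first file-relative block covered by this extent
    physical_start: int  # first disk block of the contiguous run
    length: int          # number of contiguous blocks


def lookup(extents, logical_block):
    """Map a file-relative block to a disk block; `extents` is sorted by logical_start."""
    starts = [e.logical_start for e in extents]
    i = bisect_right(starts, logical_block) - 1
    if i >= 0:
        e = extents[i]
        if logical_block < e.logical_start + e.length:
            return e.physical_start + (logical_block - e.logical_start)
    raise KeyError(f"logical block {logical_block} is not mapped")


blocks_in_128mb = (128 * 1024 * 1024) // BLOCK_SIZE
extents = [Extent(0, 500_000, blocks_in_128mb)]  # one record for the whole run

print(blocks_in_128mb)         # 32768
print(lookup(extents, 0))      # 500000
print(lookup(extents, 32767))  # 532767
```

A single extent record here stands in for 32768 individual block pointers, which is the metadata that a per-block scheme would have to read and update.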