- CPUs keep getting faster, disks are becoming much bigger and cheaper (but not much faster), and memories are growing exponentially in size.
- The one parameter that is not improving by leaps and bounds is disk seek time, and the combination of these factors means that a performance bottleneck is arising in many file systems.
- The idea that drove the design of LFS (the Log-structured File System) is that as CPUs get faster and RAM memories get larger, disk caches are also growing rapidly. Consequently, it is now possible to satisfy a very substantial fraction of all read requests directly from the file system cache, with no disk access needed.
- Most disk accesses will therefore be writes. In most file systems, writes are done in very small chunks. Small writes are highly inefficient, since a 50-µsec disk write is often preceded by a 10-msec seek and a 4-msec rotational delay. With these parameters, disk efficiency drops to a fraction of 1 percent.
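- As a rough worked example with these figures: the useful transfer takes 0.05 msec out of about 10 + 4 + 0.05 = 14.05 msec in total, so the efficiency is roughly 0.05/14.05 ≈ 0.36 percent, i.e., a fraction of 1 percent.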
- While the writes can be delayed, doing so exposes the file system to serious consistency problems if a crash occurs before the writes are done.
- From this reasoning, the LFS designers decided to re-implement the UNIX file system in such a way as to achieve the full bandwidth of the disk. The basic idea is to structure the entire disk as a log.
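As a rough illustration of this write path (a minimal sketch only, with invented names and sizes, not the actual LFS code): pending writes are buffered in an in-memory segment and flushed to the end of the on-disk log with one large sequential write, amortizing the seek and rotational delay over the whole segment.

```c
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK_SIZE 4096
#define SEG_BLOCKS 256                        /* 1 MB segment (illustrative) */

struct segment {
    char  data[SEG_BLOCKS][BLOCK_SIZE];       /* buffered blocks             */
    int   used;                               /* blocks filled so far        */
    off_t next_disk_addr;                     /* where the on-disk log ends  */
};

/* Append one block's worth of data to the current segment; flush the whole
 * segment with a single sequential write when it fills up. */
static void log_write_block(struct segment *seg, int disk_fd, const void *buf)
{
    memcpy(seg->data[seg->used++], buf, BLOCK_SIZE);
    if (seg->used == SEG_BLOCKS) {
        /* One large sequential write amortizes the seek and rotational
         * delay over SEG_BLOCKS blocks instead of paying it per block. */
        pwrite(disk_fd, seg->data, sizeof seg->data, seg->next_disk_addr);
        seg->next_disk_addr += sizeof seg->data;
        seg->used = 0;
    }
}
```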
- Logging algorithms have also been applied successfully to the problem of consistency checking.
- The resulting implementations are known as log-based transaction-oriented (or journaling) file systems.
- Such file systems are actually in use (NTFS, ext3, ReiserFS).
- Recall that a system crash can cause inconsistencies among on-disk file system data structures, such as directory structures, free-block pointers, and free FCB pointers.
- A typical operation, such as file create, can involve many structural changes within the file system on the disk:
- Directory structures are modified;
- FCBs are allocated;
- Data blocks are allocated;
- The free counts for all of these blocks are decreased.
- These changes can be interrupted by a crash, and inconsistencies among the structures can result.
- For example, the free FCB count might indicate that an FCB had been allocated, but the directory structure might not point to the FCB.
- The consistency check may not be able to recover the structures, resulting in loss of files and even entire directories.
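The following toy, in-memory model (purely illustrative; the structures and names are invented, not real UFS code) makes the non-atomicity concrete: file creation updates several structures one after another, and a crash between the steps yields exactly the anomaly described above.

```c
#include <string.h>

#define NFCB 64

/* A toy model of the on-disk structures touched by a file create. */
struct toy_fs {
    int  fcb_in_use[NFCB];       /* FCB allocation map            */
    int  free_fcb_count;         /* free-FCB counter              */
    char dir_name[NFCB][32];     /* directory: name -> FCB index  */
    int  dir_fcb[NFCB];
    int  dir_entries;
};

int create_file(struct toy_fs *fs, const char *name)
{
    int i;
    for (i = 0; i < NFCB && fs->fcb_in_use[i]; i++)
        ;                                        /* find a free FCB           */
    if (i == NFCB)
        return -1;                               /* no FCB available          */
    fs->fcb_in_use[i] = 1;                       /* step 1: allocate the FCB  */
    fs->free_fcb_count--;                        /* step 2: update free count */
    /* A crash at this point reproduces the anomaly in the text: the free-FCB
     * count says an FCB is taken, but no directory entry points to it.       */
    strcpy(fs->dir_name[fs->dir_entries], name); /* step 3: directory entry   */
    fs->dir_fcb[fs->dir_entries++] = i;
    return i;
}
```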
- The solution to this problem is to apply log-based recovery techniques to file-system metadata updates.
- Both NTFS and the Veritas file system use this method, and it is available as an optional addition to UFS on Solaris 7 and beyond.
- Fundamentally, all metadata changes are written sequentially to a log. Each set of operations for performing a specific task is a transaction.
- Once the changes are written to this log, they are considered to be committed, and the system call can return to the user process, allowing it to continue execution.
- Meanwhile, these log entries are replayed across the actual file system structures.
- As the changes are made, a pointer is updated to indicate which actions have completed and which are still incomplete.
- When an entire committed transaction is completed, it is removed from the log file, which is actually a circular buffer.
- The log may be in a separate section of the file system or even on a separate disk.
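A minimal sketch of this write-ahead journaling scheme, assuming an invented record format and function names (no real file system's code is quoted here): each metadata change is appended to a circular log area and made durable, at which point the transaction counts as committed; a later pass applies the records to the real structures and advances the tail pointer, freeing log space.

```c
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

struct log_record {
    uint64_t txn_id;        /* which transaction this record belongs to   */
    uint64_t disk_addr;     /* metadata block this change applies to      */
    uint32_t len;           /* number of valid bytes in data[]            */
    uint8_t  committed;     /* set on the last record of the transaction  */
    char     data[488];     /* new contents of the metadata (sketch size) */
};

struct journal {
    int   log_fd;           /* log area: separate region or separate disk */
    int   fs_fd;            /* the real file-system structures            */
    off_t head;             /* where the next record is appended          */
    off_t tail;             /* oldest record not yet applied (the pointer)*/
    off_t size;             /* the log is a circular buffer of this size  */
};

/* Append one record to the circular log; the system call can return to the
 * user process as soon as the committing record is durable. */
static void journal_append(struct journal *j, const struct log_record *r)
{
    pwrite(j->log_fd, r, sizeof *r, j->head);
    j->head = (j->head + sizeof *r) % j->size;
    fsync(j->log_fd);                      /* committed once this returns */
}

/* Later (e.g., in the background) replay a record against the real
 * structures and advance the tail pointer, removing it from the log. */
static void journal_apply(struct journal *j, const struct log_record *r)
{
    pwrite(j->fs_fd, r->data, r->len, r->disk_addr);
    j->tail = (j->tail + sizeof *r) % j->size;
}
```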
- If the system crashes, the log file will contain zero or more transactions.
- Any transactions it contains were not completed to the file system, even though they were committed by the OS, so they must now be completed.
- The transactions can be executed from the pointer until the work is complete so that the file-system structures remain consistent.
- The only problem occurs when a transaction was aborted; that is, it was not committed before the system crashed.
- Any changes from such a transaction that were applied to the file system must be undone, again preserving the consistency of the file system.
- This recovery is all that is needed after a crash, eliminating any problems with consistency checking.
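Recovery can then be sketched as two passes over the same assumed log format (building on the journal and log_record types from the previous sketch; again purely illustrative): first find the transactions whose committing record made it to disk, then redo only those, so nothing from an uncommitted (aborted) transaction is ever applied. In a scheme that applied changes before commit, undo records would be needed as well.

```c
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_TXNS 1024

static void journal_recover(struct journal *j)
{
    struct log_record r;
    uint64_t committed[MAX_TXNS];
    int ncommitted = 0, i;
    off_t pos;

    /* Pass 1: note which transactions reached their committing record. */
    for (pos = j->tail; pos != j->head; pos = (pos + sizeof r) % j->size) {
        pread(j->log_fd, &r, sizeof r, pos);
        if (r.committed && ncommitted < MAX_TXNS)
            committed[ncommitted++] = r.txn_id;
    }

    /* Pass 2: redo the records of committed transactions; records of
     * uncommitted (aborted) transactions are simply never applied. */
    for (pos = j->tail; pos != j->head; pos = (pos + sizeof r) % j->size) {
        int ok = 0;
        pread(j->log_fd, &r, sizeof r, pos);
        for (i = 0; i < ncommitted; i++)
            if (committed[i] == r.txn_id)
                ok = 1;
        if (ok)
            pwrite(j->fs_fd, r.data, r.len, r.disk_addr);
    }
    j->tail = j->head;        /* log is now empty; structures are consistent */
}
```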
Cem Ozdogan
2010-05-11