Log-Structured File System (LFS)
Log-Structured File Systems were introduced by Rosenblum and Ousterhout in the early 1990s to address the following issues.
- Growing system memories:
With growing memory sizes, the amount of data that can be cached also increases. Since most reads are then serviced by the cache, file system performance comes to depend largely on write performance.
- Sequential I/O performance trumps random I/O performance:
Over the years, the bandwidth of transferring bits off the hard drive has increased because more bits can be packed into the same area. However, seek and rotational delays have improved far more slowly, since it is physically difficult to make the small motors move the disk arm and spin the platters more quickly. Therefore, sequential access can improve disk performance significantly.
- Inefficiency of existing file systems:
Existing file systems perform a large number of writes for something as simple as creating a new file, including inode, bitmap, and data block writes and subsequent updates. The short seeks and rotational delays incurred reduce bandwidth.
- File systems are not RAID-aware:
Further, file systems do not have any mechanism to counter the small-write problem in RAID-4 and RAID-5.
Even though processor speeds and main memory sizes have increased at an exponential rate, disk access costs have improved much more slowly. This calls for a file system that focuses on write performance, makes use of sequential bandwidth, and handles both data writes and metadata updates efficiently. This is the motivation behind the Log-Structured File System (LFS).
While reads cannot all be carried out sequentially (since any file may be accessed at any time), we can exploit the efficiency of sequential writes. A log is simply a data structure that is written only at the head (one could think of the entire disk as a log). LFS buffers all writes in an in-memory segment. New data and metadata (inodes, directories) are accumulated in this buffer, and once the segment is full, it is written sequentially to an unused part of the disk in one large transfer (for example, segments of 0.5 MB or 1 MB).
The following are the data structures used in the LFS implementation.
- Inode:
As in Unix, an inode contains pointers to the physical blocks of a file.
- Inode Map:
This table indicates the location of each inode on the disk. The inode map is written in the segment itself.
- Segment Summary:
This maintains information about each block in the segment.
- Segment Usage Table:
This records the amount of live data in each segment.
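To make the role of the inode map concrete, here is a minimal sketch in Python. It is a toy model under stated assumptions, not the actual LFS layout: the "disk" is a plain list whose indices stand in for disk addresses, and the names `log`, `inode_map`, `write_file_block`, and `read_file_block` are hypothetical. It shows that an update never overwrites anything in place; it appends a new data block and a new copy of the inode, and the inode map is then pointed at the newest inode.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Inode:
    # Log addresses of the file's data blocks, in file order.
    block_addrs: List[int] = field(default_factory=list)


log: List[Union[bytes, Inode]] = []  # toy "disk": list index = disk address (assumption)
inode_map: Dict[int, int] = {}       # inode number -> address of the newest inode copy


def append(item) -> int:
    """Append one block to the end of the log and return its address."""
    log.append(item)
    return len(log) - 1


def write_file_block(inode_no: int, block_index: int, data: bytes) -> None:
    """Append the data block, then a fresh inode copy, then update the inode map."""
    old = log[inode_map[inode_no]] if inode_no in inode_map else Inode()
    addrs = list(old.block_addrs)
    while len(addrs) <= block_index:
        addrs.append(-1)             # -1 marks a block that was never written
    addrs[block_index] = append(data)
    inode_map[inode_no] = append(Inode(addrs))


def read_file_block(inode_no: int, block_index: int) -> bytes:
    """Read path: inode map -> inode -> data block."""
    inode = log[inode_map[inode_no]]
    return log[inode.block_addrs[block_index]]


write_file_block(7, 0, b"D")         # data lands at address 0, inode at address 1
write_file_block(7, 0, b"D2")        # overwrite: new data at 2, new inode at 3
print(read_file_block(7, 0))         # reads the newest version
print(log[0])                        # the old copy is still in the log
```

The stale data block and stale inode left behind at addresses 0 and 1 are exactly the kind of dead data that later has to be reclaimed.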
Sequential Writing to Disk:
Consider the following figure, showing a data block D written onto the disk at location A0. Along with the data block is the inode, which points to the data block D. Usually, data blocks are 4 KB in size, while inodes are about 128 bytes.
Efficient Sequential Writing to Disk:
However, simply writing sequentially to a disk is not enough to achieve efficiency. To see the problem, suppose we write a data block D to address A0 at time T. When the next data block arrives at time T+t, to be written to address A0+1, the disk has already rotated past that position. Since the disk takes a fixed time to complete one rotation, we must wait for the remainder of that rotation (the rotation time minus t) before writing the second block so that the two addresses (A0, A0+1) are contiguous.
The solution to this problem is simple: instead of waiting between every two consecutive data block writes, we can group a number of consecutive writes, store them temporarily in a segment, and then write them all together onto the disk. So, instead of waiting for the disk to reposition after every data block, we wait for it to reposition after every x data blocks, where x is the capacity of the segment. The figure below illustrates this concept.
In the figure, the four blocks belonging to file j are 4 updates to the same file, which are written onto the disk at once. This is one of the sets of updates buffered by the LFS. The block belonging to file k is an update to a different file, which is written to the disk in the next rotation.
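A rough calculation shows why the segment size matters. Assume each disk write pays a fixed positioning cost (seek plus rotation) before data is transferred at the peak rate. Writing a segment of a given size then takes the positioning time plus size divided by peak rate, so the effective rate is size / (positioning time + size / peak rate). The function name `effective_rate` and the figures of 10 ms and 100 MB/s below are illustrative assumptions, not measurements from the text.

```python
def effective_rate(segment_mb: float, t_position_s: float, peak_mb_s: float) -> float:
    """Effective write bandwidth in MB/s when every write pays one positioning delay."""
    return segment_mb / (t_position_s + segment_mb / peak_mb_s)


T_POSITION = 0.010   # assumed positioning time per write, in seconds
PEAK = 100.0         # assumed peak transfer rate, in MB/s

# 0.004 MB is roughly a single 4 KB block; the others are segment-sized writes.
for size_mb in (0.004, 0.5, 1.0, 9.0):
    rate = effective_rate(size_mb, T_POSITION, PEAK)
    print(f"{size_mb:>6} MB per write -> {rate:6.1f} MB/s ({rate / PEAK:.0%} of peak)")
```

Under these assumed numbers, writing one 4 KB block at a time achieves well under 1% of peak bandwidth, a 1 MB segment reaches 50%, and a 9 MB segment reaches 90%. The larger the segment, the better the positioning cost is amortized.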
The Process in a Nutshell:
The LFS read process is the same as in Unix file systems once the inode for the file has been found (its location is stored in the inode map). The write process can be summarized as follows:
- Every write causes new blocks to be added to the current segment buffer in memory.
- When the segment is full, it is written onto the disk.
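The two steps above can be sketched as follows. This is a simplified illustration, not the real implementation: the class name `SegmentWriter`, the 4-block capacity, and the list standing in for the disk are all assumptions made for the example.

```python
from typing import List

SEGMENT_BLOCKS = 4  # toy segment capacity in blocks (assumption; real segments are far larger)


class SegmentWriter:
    def __init__(self, capacity: int = SEGMENT_BLOCKS):
        self.capacity = capacity
        self.buffer: List[bytes] = []      # current segment, held in memory
        self.disk: List[List[bytes]] = []  # segments already written, in log order
        self.disk_writes = 0               # number of large sequential writes issued

    def write(self, block: bytes) -> None:
        """Add a block to the in-memory segment; flush when the segment is full."""
        self.buffer.append(block)
        if len(self.buffer) == self.capacity:
            self.flush()

    def flush(self) -> None:
        """Write the whole buffered segment to the end of the log in one go."""
        if self.buffer:
            self.disk.append(self.buffer)
            self.disk_writes += 1
            self.buffer = []


w = SegmentWriter()
for i in range(10):
    w.write(f"block-{i}".encode())
print(w.disk_writes, len(w.buffer))  # 2 segments on disk, 2 blocks still buffered
```

Ten block writes result in only two disk writes here, with the remaining two blocks waiting in memory until the segment fills.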
LFS also avoids the aforementioned small-write problem in RAID-4 and RAID-5, since entire segments are written instead of small data blocks.
One issue that arises is that segments in the log tend to become fragmented as old blocks of files are replaced with new ones. Since LFS leaves old copies of data scattered across various segments on the disk, these need to be reclaimed periodically. For this, a cleaner process "cleans" old segments. The cleaner takes multiple partially live segments and compacts their live blocks into one full segment, thereby freeing up space.
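A minimal sketch of the compaction step is shown below. Several things here are assumptions made for illustration: each block carries its (inode number, block index) pair, playing the role of a segment summary entry, and a dictionary named `latest` stands in for the inode map and inode lookup that decides whether a block is still live. The function name `clean` and the choice of victim segments are hypothetical; a real cleaner also needs a policy for choosing which segments to clean.

```python
from typing import Dict, List, Tuple

Entry = Tuple[int, int, bytes]  # (inode_no, block_index, data): summary info plus payload
Addr = Tuple[int, int]          # (segment_no, offset within the segment)


def clean(segments: Dict[int, List[Entry]],
          latest: Dict[Tuple[int, int], Addr],
          victims: List[int],
          new_seg_no: int) -> None:
    """Copy live blocks from the victim segments into one new segment, then free the victims."""
    compacted: List[Entry] = []
    for seg_no in victims:
        for offset, (inode_no, block_index, data) in enumerate(segments[seg_no]):
            # A block is live only if the file still points at this exact address.
            if latest.get((inode_no, block_index)) == (seg_no, offset):
                latest[(inode_no, block_index)] = (new_seg_no, len(compacted))
                compacted.append((inode_no, block_index, data))
        del segments[seg_no]    # the whole victim segment is now free for reuse
    segments[new_seg_no] = compacted


segments = {
    0: [(1, 0, b"a0-old"), (1, 1, b"a1"), (2, 0, b"b0-old"), (2, 1, b"b1-old")],
    1: [(1, 0, b"a0-new"), (2, 0, b"b0-new"), (2, 1, b"b1-new"), (3, 0, b"c0-old")],
    2: [(3, 0, b"c0-new")],
}
latest = {(1, 0): (1, 0), (1, 1): (0, 1), (2, 0): (1, 1), (2, 1): (1, 2), (3, 0): (2, 0)}

clean(segments, latest, victims=[0, 1], new_seg_no=3)
print(sorted(segments))                    # segments 0 and 1 are gone
print([data for _, _, data in segments[3]])
```

Segments 0 and 1 together hold eight blocks but only four live ones, so after cleaning they are replaced by a single full segment and one segment's worth of space is freed.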