Solaris UFS file system overview

Solaris UFS has its roots in the Berkeley FFS of the 1980s, although today's file system is the result of more than 20 years of enhancement, evolution, and stabilization. FFS was designed to overcome the problems inherent in the original UNIX® file system, principally poor performance due to a small, fixed block size and the separation of metadata from data. These factors led to a mode of operation that required a great number of lengthy disk seeks.

For more detailed information on the components that make up a UFS file system, see the article "Components of a UFS file system."

As a UNIX file system aged on disk, this seek problem grew worse. The original UNIX file system also had only one superblock, so corruption of that single structure caused irretrievable damage and necessitated a full file system re-creation and data restoration. However, FFS was simple and lightweight, and its source code is still instructive today.

Work at Berkeley aimed to improve both reliability and throughput of the file system with the development of variable block sizes, a file system check program (fsck), and multiple superblocks.

The fundamental core of the Solaris OS is the original SunOS™ product. SunOS 4.x software was based on the 4.3 BSD UNIX distribution; SunOS 5.x software, which is the heart of the Solaris OS today, was first released in 1992, the result of a merge of BSD UNIX and AT&T UNIX System V by AT&T and Sun. UFS is a fundamental BSD technology that was introduced into System V UNIX as a result of this work.

Since then, development of UFS has continued. In parallel, the virtual file cache has replaced the traditional UNIX buffer cache. The principal difference between the two is that the virtual file cache caches files rather than disk blocks. This yields a performance gain, because retrieving cached data does not require the entire file system code path to be traversed. These developments are covered in detail in Solaris Internals: Core Kernel Architecture.

Logging in UFS

A major enhancement to UFS was implemented in 1994: file system metadata logging, which provides better reliability and faster reboot times following a system crash or outage. Logging enables the file system to be mounted and used immediately, without lengthy file system checking. Originally, logging was provided as part of a volume management add-on product, Online DiskSuite (later Solstice DiskSuite) software. In 1998, logging functionality and volume management were incorporated into the base Solaris Operating System. Solstice DiskSuite logging used a separate disk partition to store the log; UFS logging now embeds the log in the file system.

Since its inclusion in the Solaris OS, UFS logging has undergone continuous improvement, and its performance now often exceeds nonlogging performance. For example, under the metadata-intensive PostMark benchmark used for these tests, logging provides a 300-percent improvement over nonlogging transaction rates. Users with sizable file systems enable logging regardless of performance considerations, so it makes sense for it to be the default option; logging is enabled by default for file systems of less than one terabyte starting with the Solaris 9 9/04 release.
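
Where logging is not the default, or where it needs to be controlled explicitly, it is set with a mount option or in /etc/vfstab. The following is a minimal sketch; the device path and mount point are hypothetical examples, not taken from this article.

    # Mount an existing UFS file system with logging enabled
    # (device and mount point are hypothetical examples)
    mount -F ufs -o logging /dev/dsk/c0t0d0s6 /export

    # To make the setting persistent, add the logging option to the
    # corresponding /etc/vfstab entry, for example:
    # /dev/dsk/c0t0d0s6  /dev/rdsk/c0t0d0s6  /export  ufs  2  yes  logging

Running mount -v afterward lists the options in effect for each mounted file system and confirms whether logging is active.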

UFS Logging Details

UFS logging does not log the contents of user files; it logs metadata, the structure of the file system. Metadata is the data that describes user file data: inodes, cylinder groups, block bitmaps, directories, and so on. After a crash, the file system structure remains intact, but the contents of individual files may be affected. In common with most journaling file systems, UFS does not log file data, because of the negative effect doing so would have on performance.

The UFS log is a persistent, circular, append-only data store, occupying disk blocks within the file system. It is not visible to the user. The space used by the log shows up in df output but not in du output. Typically the log consumes one megabyte per gigabyte of file system space, up to a maximum of 64 megabytes.
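
One way to see this difference is to compare df and du on the same file system: df accounts for the space held by the embedded log (along with other file system overhead), while du counts only visible files and directories. The mount point below is a hypothetical example.

    # Space accounted for by the file system itself, including the log
    df -k /export

    # Space consumed by visible files and directories only
    du -sk /export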

UFS logging, in common with other logging file systems, borrows from database technology and adds the concept of transactions and two-phase commit to the file system. Changes to metadata - the transactions - are first journaled to the intent log. Then, and only then, they are committed to the file system. Upon reboot following a crash, the log is replayed and the file system rolls back (that is, ignores) incomplete transactions and applies complete transactions.

The advantage is that a file system that has previously suffered a crash can be made available for use far sooner, because there is no need to wait for a traditional fsck to complete. fsck must scan the entire file system to check its consistency, so its running time is proportional to the size of the file system. In contrast, the time to replay the log is proportional to the size of the log, which generally amounts to a few seconds.

Effect of UFS Logging on Performance

Logging was implemented in UFS to provide faster file system recovery times. A by-product of its implementation is faster processing of small files and metadata operations. This situation is somewhat counterintuitive; surely writing to a log and then writing to the file system should take longer than a single file system write?

Not necessarily. Performance can improve because some physical metadata operations are canceled. This occurs when metadata updates are issued very rapidly, such as when expanding a tar file or other archive, or when recursively deleting directories and their contents.

Without logging, the system is required to force the directory to disk after every file is processed (this is what the phrase writing metadata synchronously means); the effect is to write 512 or 2048 bytes every time 14 bytes change. When the file system is logging, the log record is pushed to disk only when it fills, often when a complete 512-byte block has accumulated. This results in roughly a 512/14, or 35-fold, reduction in writes.
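
On a test system, this effect can be observed by timing the same metadata-intensive workload with logging disabled and then enabled. The sketch below assumes a scratch file system; the device path, mount point, and archive name are hypothetical examples, and the measured difference will vary with hardware and file sizes.

    # Time an archive extraction with logging disabled
    mount -F ufs -o nologging /dev/dsk/c0t0d0s6 /export
    cd /export && /usr/bin/time tar xf /var/tmp/archive.tar

    # Clean up, remount with logging, and repeat the measurement
    cd /; rm -rf /export/*; umount /export
    mount -F ufs -o logging /dev/dsk/c0t0d0s6 /export
    cd /export && /usr/bin/time tar xf /var/tmp/archive.tar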

Essentially, logging not only provides faster file system recovery times, it also delivers significant performance gains in the specific areas of small file and metadata-intensive operations.

Space Management in UFS

UFS uses block allocation sizes of 4 KB and 8 KB, which provide significantly higher performance than the 512-byte blocks used in the System V file system. To overcome the potential disadvantage of wasting space in unused blocks, UFS uses the notion of file system fragments. Fragments allow a single block to be broken up into two, four, or eight fragments when necessary. The choice is made through the fragsize parameter to mkfs (see mkfs_ufs(1M)). If the block size is 4 KB, possible values are 512 bytes, 1 KB, 2 KB, and 4 KB. When the block size is 8 KB, legal values are 1 KB, 2 KB, 4 KB, and 8 KB. The default value is 1 KB.
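
As an illustrative sketch, block and fragment sizes are chosen when the file system is created; newfs, the administrative front end to mkfs_ufs, accepts them directly (the raw device path below is a hypothetical example).

    # Create a UFS file system with an 8 KB block size and 1 KB fragments
    newfs -b 8192 -f 1024 /dev/rdsk/c0t0d0s6

    # Verify the resulting sizes by dumping the superblock
    fstyp -v /dev/rdsk/c0t0d0s6 | egrep 'bsize|fsize'

Because these values are fixed at creation time, changing the block or fragment size later requires re-creating the file system and restoring its data.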

References

  • Performance Benchmark Report: UNIX File System and VERITAS File System 3.5 on Solaris 9 Operating System 12/02 Release, Dominic Kay (2003), wwws.sun.com/software/whitepapers/solaris9/filesystem_benchmark.pdf
  • The Design and Implementation of the 4.3BSD UNIX Operating System: Answer Book, Samuel J. Leffler and Marshall Kirk McKusick, Addison-Wesley (1991)
  • Solaris Internals: Core Kernel Architecture, Jim Mauro and Richard McDougall, Sun Microsystems Press, Prentice Hall (2000)
  • UNIX Filesystems: Evolution, Design, and Implementation, Steve D. Pate, Wiley (2003), doc.lagout.org
  • Sun QFS, Sun SAM-FS, and Sun SAM-QFS File System Administrator's Guide, Sun Microsystems (2002), docs.sun.com/db/doc/816-2542-10?q=QFS
  • Design, Features, and Applicability of Solaris File Systems, Brian Wong (Sun Microsystems, 2004), www.sun.com/blueprints/0104/817-4971.pdf