UNIT-V

File System Interface and Operations - Access methods, Directory Structure, Protection,
File System Structure, Allocation methods, kernel support for files, system calls for file I/O
operations: open, create, read, write, close, lseek, stat, ioctl
Disk Management: Disk Scheduling Algorithms - FCFS, SSTF, SCAN, C-SCAN

FILE-SYSTEM INTERFACE

File Attributes

 Different OSes keep track of different file attributes, including:


o Name - Some systems give special significance to names, and particularly extensions (
.exe, .txt, etc. ), and some do not. Some extensions may be of significance to the OS ( .exe ),
and others only to certain applications ( .jpg )
o Identifier ( e.g. inode number )
o Type - Text, executable, other binary, etc.
o Location - on the hard drive.
o Size
o Protection
o Time & Date
o User ID
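
Most of these attributes can be retrieved on a UNIX-like system with the stat( ) system call. The following is a minimal sketch ( the file name is supplied on the command line; error handling is reduced to the essentials ):

/* Print a few of the attributes kept for a file, using stat(2). */
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>

int main(int argc, char *argv[])
{
    struct stat sb;

    if (argc != 2 || stat(argv[1], &sb) == -1) {
        perror("stat");
        return 1;
    }
    printf("identifier (inode): %ld\n", (long) sb.st_ino);
    printf("size in bytes:      %lld\n", (long long) sb.st_size);
    printf("protection (mode):  %o\n", sb.st_mode & 07777);
    printf("owner user ID:      %ld\n", (long) sb.st_uid);
    printf("last modification:  %s", ctime(&sb.st_mtime));
    return 0;
}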

File Operations

 The file ADT supports many common operations:


o Creating a file
o Writing a file
o Reading a file
o Repositioning within a file
o Deleting a file
o Truncating a file.
 Most OSes require that files be opened before access and closed after all access is
complete. Normally the programmer must open and close files explicitly, but some rare
systems open the file automatically at first access. Information about currently open files is
stored in an open file table, containing for example:
o File pointer - records the current position in the file, for the next read or write access.
o File-open count - How many times has the current file been opened ( simultaneously
by different processes ) and not yet closed? When this counter reaches zero the file can be
removed from the table.
o Disk location of the file.
o Access rights
 Some systems provide support for file locking.
o A shared lock is for reading only.
o An exclusive lock is for writing as well as reading.
o An advisory lock is informational only, and not enforced. ( A "Keep Out" sign, which may be ignored. )
o A mandatory lock is enforced. ( A truly locked door. )
o UNIX uses advisory locks, and Windows uses mandatory locks.
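
The system calls named in the unit outline ( open, write, lseek, read, close ) map directly onto these operations, and on UNIX an advisory lock can be requested with fcntl( ). A minimal sketch follows; the file name is illustrative and error handling is kept brief:

/* Create a file, write to it, reposition, and read it back, holding an
   advisory exclusive lock while the file is in use. */
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>

int main(void)
{
    char buf[32];
    struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET,
                        .l_start = 0, .l_len = 0 };      /* lock the whole file */

    int fd = open("demo.txt", O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd == -1) { perror("open"); return 1; }

    fcntl(fd, F_SETLKW, &fl);                 /* acquire the advisory lock */

    write(fd, "hello, file system\n", 19);    /* write a file */
    lseek(fd, 0, SEEK_SET);                   /* reposition within the file */
    ssize_t n = read(fd, buf, sizeof(buf) - 1);   /* read the file back */
    if (n > 0) { buf[n] = '\0'; printf("read back: %s", buf); }

    fl.l_type = F_UNLCK;                      /* release the lock */
    fcntl(fd, F_SETLK, &fl);
    close(fd);
    return 0;
}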

File Types

 Windows ( and some other systems ) use special file extensions to indicate the type of each file:

Common file types.

 Macintosh stores a creator attribute for each file, according to the program that first created it with the
create( ) system call.
 UNIX stores magic numbers at the beginning of certain files. ( Experiment with the "file" command,
especially in directories such as /bin and /dev )

File Structure

 Some files contain an internal structure, which may or may not be known to the OS.
 Having the OS support particular file formats increases its size and complexity.
 UNIX treats all files as sequences of bytes, with no further consideration of the internal structure. ( With
the exception of executable binary programs, which it must know how to load and find the first executable
statement, etc. )
 Macintosh files have two forks - a resource fork, and a data fork. The resource fork contains information
relating to the UI, such as icons and button images, and can be modified independently of the data fork,
which contains the code or data as appropriate.

Internal File Structure

 Disk files are accessed in units of physical blocks, typically 512 bytes or some power-of-two multiple
thereof. ( Larger physical disks use larger block sizes, to keep the range of block numbers within the range
of a 32-bit integer. )

 Internally, files are organized in logical units, which may be as small as a single byte, or may
be a larger size corresponding to some data record or structure size.
 The number of logical units which fit into one physical block determines its packing, and has an
impact on the amount of internal fragmentation ( wasted space ) that occurs.
 As a general rule, half a physical block is wasted for each file, and the larger the block sizes the more
space is lost to internal fragmentation.

Access Methods

Sequential Access

 A sequential access file emulates magnetic tape operation, and generally supports a few operations:
o read next - read a record and advance the tape to the next position.
o write next - write a record and advance the tape to the next position.
o rewind
o skip n records - May or may not be supported. N may be limited to positive numbers, or may be
limited to +/- 1.

Sequential-access file.

Direct Access

 Jump to any record and read that record. Operations supported include:
o read n - read record number n. ( Note an argument is now required. )
o write n - write record number n. ( Note an argument is now required. )
o jump to record n - could be 0 or the end of file.
o Query current record - used to return back to this record later.
o Sequential access can be easily emulated using direct access. The inverse is complicated and inefficient.

Simulation of sequential access on a direct-access file.
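
On UNIX the "read n" operation for fixed-size records can be sketched with lseek( ), seeking to n times the record size before reading; the record size and file name below are illustrative assumptions:

/* "read n" for a file of fixed-size records, implemented with lseek(2). */
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>

#define RECORD_SIZE 128     /* illustrative record size */

ssize_t read_record(int fd, long n, char *buf)
{
    if (lseek(fd, (off_t) n * RECORD_SIZE, SEEK_SET) == (off_t) -1)
        return -1;
    return read(fd, buf, RECORD_SIZE);        /* read record number n */
}

int main(void)
{
    char record[RECORD_SIZE];
    int fd = open("records.dat", O_RDONLY);   /* illustrative file name */
    if (fd == -1) { perror("open"); return 1; }

    if (read_record(fd, 5, record) == RECORD_SIZE)
        printf("record 5 read successfully\n");
    close(fd);
    return 0;
}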

Other Access Methods

 An indexed access scheme can be easily built on top of a direct access system. Very large files may require
a multi-tiered indexing scheme, i.e. indexes of indexes.

Example of index and relative files.

Directory Structure

Storage Structure

 A disk can be used in its entirety for a file system.


 Alternatively a physical disk can be broken up into multiple partitions, slices, or mini-disks, each of which
becomes a virtual disk and can have its own filesystem. ( or be used for raw storage, swap space, etc. )
 Or, multiple physical disks can be combined into one volume, i.e. a larger virtual disk, with its own
filesystem spanning the physical disks.

A typical file-system organization.

Directory Overview

 Directory operations to be supported include:


o Search for a file
o Create a file - add to the directory
o Delete a file - erase from the directory
o List a directory - possibly ordered in different ways.
o Rename a file - may change sorting order
o Traverse the file system.
Single-Level Directory

 Simple to implement, but each file must have a unique name.

Single-level directory.

Two-Level Directory

 Each user gets their own directory space.


 File names only need to be unique within a given user's directory.
 A master file directory is used to keep track of each user's directory, and must be maintained when users are
added to or removed from the system.
 A separate directory is generally needed for system ( executable ) files.
 Systems may or may not allow users to access other directories besides their own
o If access to other directories is allowed, then provision must be made to specify the directory being
accessed.
o If access is denied, then special consideration must be made for users to run programs located in system
directories. A search path is the list of directories in which to search for executable programs, and can be
set uniquely for each user.

Two-level directory structure.

Tree-Structured Directories

 An obvious extension to the two-tiered directory structure, and the one with which we are all most
familiar.
 Each user / process has the concept of a current directory from which all ( relative ) searches take place.
 Files may be accessed using either absolute pathnames ( relative to the root of the tree ) or relative
pathnames ( relative to the current directory. )
 Directories are stored the same as any other file in the system, except there is a bit that identifies them as
directories, and they have some special structure that the OS understands.
 One question for consideration is whether or not to allow the removal of directories that are not empty -
Windows requires that directories be emptied first, and UNIX provides an option for deleting entire sub-
trees.

Tree-structured directory structure.

Acyclic-Graph Directories

 When the same files need to be accessed in more than one place in the directory structure ( e.g.
because they are being shared by more than one user / process ), it can be useful to provide an acyclic-
graph structure. ( Note the directed arcs from parent to child. )
 UNIX provides two types of links for implementing the acyclic-graph structure. ( See "man ln" for
more details. )
o A hard link ( usually just called a link ) involves multiple directory entries that both refer to the same
file. Hard links are only valid for ordinary files in the same filesystem.
o A symbolic link, that involves a special file, containing information about where to find the linked
file. Symbolic links may be used to link directories and/or files in other filesystems, as well as ordinary
files in the current filesystem.
 Windows shortcuts behave like symbolic links; NTFS itself also supports hard links, and newer versions support true symbolic links as well.
 Hard links require a reference count, or link count for each file, keeping track of how many directory
entries are currently referring to this file. Whenever one of the references is removed the link count is
reduced, and when it reaches zero, the disk space can be reclaimed.
 For symbolic links there is some question as to what to do with the symbolic links when the original
file is moved or deleted:
o One option is to find all the symbolic links and adjust them also.
o Another is to leave the symbolic links dangling, and discover that they are no longer valid the next
time they are used.
o What if the original file is removed, and replaced with another file having the same name before the
symbolic link is next used?
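
The two kinds of links correspond to the link( ) and symlink( ) system calls on UNIX ( what "ln" and "ln -s" invoke ); a minimal sketch with illustrative file names:

/* Create a hard link and a symbolic link to an existing file. */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* Hard link: a second directory entry referring to the same inode. */
    if (link("original.txt", "hardlink.txt") == -1)
        perror("link");

    /* Symbolic link: a small special file holding the target's name. */
    if (symlink("original.txt", "symlink.txt") == -1)
        perror("symlink");

    return 0;
}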

Acyclic-graph directory structure.

General Graph Directory

 If cycles are allowed in the graphs, then several problems can arise:
o Search algorithms can go into infinite loops. One solution is to not follow links in search algorithms. (
Or not to follow symbolic links, and to allow only symbolic links, never hard links, to refer to directories. )
o Sub-trees can become disconnected from the rest of the tree and still not have their reference counts
reduced to zero. Periodic garbage collection is required to detect and resolve this problem. ( chkdsk in DOS
and fsck in UNIX search for these problems, among others, even though cycles are not supposed to be
allowed in either system. Disconnected disk blocks that are not marked as free are added back to the file
systems with made-up file names, and can usually be safely deleted. )

General graph directory.

File-System Mounting

 The basic idea behind mounting file systems is to combine multiple file systems into one large tree
structure.
 The mount command is given a filesystem to mount and a mount point ( directory ) on which to attach it.
 Once a file system is mounted onto a mount point, any further references to that directory actually refer to
the root of the mounted file system.
 Any files ( or sub-directories ) that had been stored in the mount point directory prior to mounting the new
filesystem are now hidden by the mounted filesystem, and are no longer available. For this reason some
systems only allow mounting onto empty directories.

 Filesystems can only be mounted by root, unless root has previously configured certain filesystems to be
mountable onto certain pre-determined mount points. ( E.g. root may allow users to mount floppy
filesystems to /mnt or something like it. ) Anyone can run the mount command to see what filesystems are
currently mounted.
 Filesystems may be mounted read-only, or have other restrictions imposed.
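
On Linux the mount command is itself built on the mount( ) system call; a minimal sketch ( the device, mount point, and filesystem type are illustrative, and the call requires root privileges ):

/* Attach an ext4 filesystem on /dev/sdb1 to /mnt, read-only, then detach it. */
#include <stdio.h>
#include <sys/mount.h>

int main(void)
{
    if (mount("/dev/sdb1", "/mnt", "ext4", MS_RDONLY, "") == -1) {
        perror("mount");
        return 1;
    }
    /* ... references to /mnt now refer to the root of the mounted filesystem ... */
    if (umount("/mnt") == -1)
        perror("umount");
    return 0;
}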

File system. (a) Existing system. (b) Unmounted volume.

Mount point.

 The traditional Windows OS uses an extended two-tier directory structure, where the first tier of the
structure separates volumes by drive letters, and a tree structure is implemented below that level.
 Macintosh uses a similar system, where each new volume that is found is automatically mounted and added
to the desktop.
 More recent Windows systems allow filesystems to be mounted to any directory in the filesystem, much
like UNIX.

File Sharing

Multiple Users

 On a multi-user system, more information needs to be stored for each file:


o The owner ( user ) who owns the file, and who can control its access.
o The group of other user IDs that may have some special access to the file.
o What access rights are afforded to the owner ( User ), the Group, and to the rest of the world ( the universe,
a.k.a. Others. )
o Some systems have more complicated access control, allowing or denying specific accesses to specifically
named users or groups.

Remote File Systems

 The advent of the Internet introduces issues for accessing files stored on remote computers
o The original method was ftp, allowing individual files to be transported across systems as needed. Ftp can
be either account and password controlled, or anonymous, not requiring any user name or password.
o Various forms of distributed file systems allow remote file systems to be mounted onto a local directory
structure, and accessed using normal file access commands. ( The actual files are still transported across the
network as needed, possibly using ftp as the underlying transport mechanism. )
o The WWW has made it easy once again to access files on remote systems without mounting their
filesystems, generally using ( anonymous ) ftp as the underlying file transport mechanism.

The Client-Server Model

 When one computer system remotely mounts a filesystem that is physically located on another system, the
system which physically owns the files acts as a server, and the system which mounts them is the client.
 User IDs and group IDs must be consistent across both systems for the system to work properly. ( I.e. this
is most applicable across multiple computers managed by the same organization, shared by a common
group of users. )
 The same computer can be both a client and a server. ( E.g. cross-linked file systems. )
 There are a number of security concerns involved in this model:
o Servers commonly restrict mount permission to certain trusted systems only. Spoofing ( a computer
pretending to be a different computer ) is a potential security risk.
o Servers may restrict remote access to read-only.
o Servers restrict which filesystems may be remotely mounted. Generally the information within those
subsystems is limited, relatively public, and protected by frequent backups.
 The NFS ( Network File System ) is a classic example of such a system.

Distributed Information Systems

 The Domain Name System, DNS, provides for a unique naming system across all of the Internet.
 User and group information on local networks has historically been maintained by Sun's Network Information
System, NIS, which unfortunately has several security issues. NIS+ is a more secure version, but has not yet
gained the same widespread acceptance as NIS.
 Microsoft's Common Internet File System, CIFS, establishes a network login for each user on a
networked system with shared file access. Older Windows systems used domains, and newer systems ( XP,
2000 ) use Active Directory. User names must match across the network for this system to be valid.
 A newer approach is the Lightweight Directory-Access Protocol, LDAP, which provides a secure single
sign-on for all users to access all resources on a network. This is a secure system which is gaining in
popularity, and which has the maintenance advantage of combining authorization information in one
central location.

Failure Modes

 When a local disk file is unavailable, the result is generally known immediately, and is generally non-
recoverable. The only reasonable response is for the access to fail.
 However when a remote file is unavailable, there are many possible reasons, and whether or not it is
unrecoverable is not readily apparent. Hence most remote access systems allow for blocking or delayed
response, in the hopes that the remote system ( or the network ) will come back up eventually.

Consistency Semantics

 Consistency Semantics deals with the consistency between the views of shared files on a networked
system. When one user changes the file, when do other users see the changes?

UNIX Semantics

 The UNIX file system uses the following semantics:


o Writes to an open file are immediately visible to any other user who has the file open.
o One implementation uses a shared location pointer, which is adjusted for all sharing users.
 The file is associated with a single exclusive physical resource, which may delay some accesses.

Session Semantics

 The Andrew File System, AFS uses the following semantics:


o Writes to an open file are not immediately visible to other users.
o When a file is closed, any changes made become available only to users who open the file at a later time.
 According to these semantics, a file can be associated with multiple ( possibly different ) views. Almost no
constraints are imposed on scheduling accesses. No user is delayed in reading or writing their personal
copy of the file.
 AFS file systems may be accessible by systems around the world. Access control is maintained through
( somewhat ) complicated access control lists, which may grant access to the entire world ( literally ) or to
specifically named users accessing the files from specifically named remote environments.

Immutable-Shared-Files Semantics

 Under this system, when a file is declared as shared by its creator, it becomes immutable and the name
cannot be re-used for any other resource. Hence it becomes read-only, and shared access is simple.

Protection

 Files must be kept safe for reliability ( against accidental damage ), and protection ( against deliberate
malicious access. ) The former is usually managed with backup copies. This section discusses the latter.
 One simple protection scheme is to remove all access to a file. However this makes the file unusable, so
some sort of controlled access must be arranged.

Types of Access

 The following low-level operations are often controlled:


o Read - View the contents of the file
o Write - Change the contents of the file.
o Execute - Load the file onto the CPU and follow the instructions contained therein.
o Append - Add to the end of an existing file.
o Delete - Remove a file from the system.
o List -View the name and other attributes of files on the system.
 Higher-level operations, such as copy, can generally be performed through combinations of the
above.

Access Control
 One approach is to have complicated Access Control Lists, ACL, which specify exactly what access
is allowed or denied for specific users or groups.
o The AFS uses this system for distributed access.
o Control is very finely adjustable, but may be complicated, particularly when the specific users
involved are unknown. ( AFS allows some wild cards, so for example all users on a certain
remote system may be trusted, or a given username may be trusted when accessing from any
remote system. )
 UNIX uses a set of 9 access control bits, in three groups of three. These correspond to R, W, and X
permissions for each of the Owner, Group, and Others. ( See "man chmod" for full details. ) The
RWX bits control the following privileges for ordinary files and directories:

o R - Files: read ( view ) file contents. Directories: read directory contents; required to get a listing of the
directory.
o W - Files: write ( change ) file contents. Directories: change directory contents; required to create or delete
files.
o X - Files: execute file contents as a program. Directories: access detailed directory information; required to
get a long listing, or to access any specific file in the directory. Note that if a user has X but not R
permission on a directory, they can still access specific files, but only if they already know the name of
the file they are trying to access.

 In addition there are some special bits that can also be applied:
o The set user ID ( SUID ) bit and/or the set group ID ( SGID ) bits applied to executable files
temporarily change the identity of whoever runs the program to match that of the owner /
group of the executable program. This allows users running specific programs to have access
to files ( while running that program ) to which they would normally be unable to access.
Setting of these two bits is usually restricted to root, and must be done with caution, as it
introduces a potential security leak.
o The sticky bit on a directory modifies write permission, allowing users to only delete files for
which they are the owner. This allows everyone to create files in /tmp, for example, but to
only delete files which they have created, and not anyone else's.
o The SUID, SGID, and sticky bits are indicated with an s, s, and t in the positions for execute
permission for the user, group, and others, respectively. If the letter is lower case ( s, s, t ),
then the corresponding execute permission is also given. If it is upper case ( S, S, T ), then the
special bit is set but the corresponding execute permission is NOT given.
o These advanced bits can be set with the numeric form of chmod ( e.g. 4755 ) or with its symbolic form ( e.g. u+s, +t ).
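
As a minimal sketch with illustrative file names, the chmod( ) system call sets these modes numerically; 4755 is SUID plus rwxr-xr-x, and 1777 ( sticky bit plus rwx for everyone ) is the usual mode for a shared directory such as /tmp:

/* Set the SUID bit on an executable and the sticky bit on a directory. */
#include <stdio.h>
#include <sys/stat.h>

int main(void)
{
    /* 04755 = SUID + rwxr-xr-x: the program runs with its owner's identity. */
    if (chmod("myprogram", 04755) == -1)
        perror("chmod myprogram");

    /* 01777 = sticky + rwxrwxrwx: anyone may create files here, but only
       a file's owner may delete it ( as in /tmp ). */
    if (chmod("shared_dir", 01777) == -1)
        perror("chmod shared_dir");

    return 0;
}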

Sample permissions in a UNIX system.

 Windows adjusts file access through a simple GUI:

Windows 7 access-control list management.

Other Protection Approaches and Issues

 Some systems can apply passwords, either to individual files, or to specific sub-directories, or to the
entire system. There is a trade-off between the number of passwords that must be maintained ( and
remembered by the users ) and the amount of information that is vulnerable to a lost or forgotten
password.
 Older systems which did not originally have multi-user file access permissions ( DOS and older
versions of Mac ) must now be retrofitted if they are to share files on a network.
 Access to a file requires access to all the files along its path as well. In a cyclic directory structure,
users may have different access to the same file accessed through different paths.
 Sometimes just the knowledge of the existence of a file of a certain name is a security ( or privacy )
concern. Hence the distinction between the R and X bits on UNIX directories.

FILE-SYSTEM STRUCTURE

 Hard disks have two important properties that make them suitable for secondary storage of files in
file systems: (1) Blocks of data can be rewritten in place, and (2) they are direct access, allowing any
block of data to be accessed with only ( relatively ) minor movements of the disk heads and rotational
latency.
 Disks are usually accessed in physical blocks, rather than a byte at a time. Block sizes may range
from 512 bytes to 4K or larger.
 File systems organize storage on disk drives, and can be viewed as a layered design:
o At the lowest layer are the physical devices, consisting of the magnetic media, motors & controls, and
the electronics connected to them and controlling them. Modern disks put more and more of the electronic
controls directly on the disk drive itself, leaving relatively little work for the disk controller card to
perform.
o I/O Control consists of device drivers, special software programs ( often written in assembly ) which
communicate with the devices by reading and writing special codes directly to and from memory
addresses corresponding to the controller card's registers. Each controller card ( device ) on a system has
a different set of addresses ( registers, a.k.a. ports ) that it listens to, and a unique set of command codes
and results codes that it understands.
o The basic file system level works directly with the device drivers in terms of retrieving and storing
raw blocks of data, without any consideration for what is in each block. Depending on the system, blocks
may be referred to with a single block number, ( e.g. block # 234234 ), or with head-sector-cylinder
combinations.
o The file organization module knows about files and their logical blocks, and how they map to
physical blocks on the disk. In addition to translating from logical to physical blocks, the file
organization module also maintains the list of free blocks, and allocates free blocks to files as needed.
o The logical file system deals with all of the meta data associated with a file ( UID, GID, mode, dates,
etc ), i.e. everything about the file except the data itself. This level manages the directory structure and
the mapping of file names to file control blocks, FCBs, which contain all of the meta data as well as
block number information for finding the data on the disk.
 The layered approach to file systems means that much of the code can be used uniformly for a wide
variety of different file systems, and only certain layers need to be filesystem-specific. Common file
systems in use include the UNIX file system, UFS; the Berkeley Fast File System, FFS; the Windows file
systems FAT, FAT32, and NTFS; the CD-ROM file system ISO 9660; and for Linux the extended file systems
ext2 and ext3 ( among some 40 others supported. )

Figure 4.13- Layered file system.

FILE-SYSTEM IMPLEMENTATION
Overview

 File systems store several important data structures on the disk:


o A boot-control block, ( per volume ) a.k.a. the boot block in UNIX or the partition boot sector in
Windows contains information about how to boot the system off of this disk. This will generally be the
first sector of the volume if there is a bootable system loaded on that volume, or the block will be left
vacant otherwise.
o A volume control block, ( per volume ) a.k.a. the superblock in UNIX or the master file table in
Windows ( NTFS ), which contains information such as partition details, the number of blocks in the filesystem,
and pointers to free blocks and free FCB blocks.
o A directory structure ( per file system ), containing file names and pointers to corresponding FCBs.
UNIX uses inode numbers, and NTFS uses a master file table.
o The File Control Block, FCB, ( per file ) containing details about ownership, size, permissions,
dates, etc. UNIX stores this information in inodes, and NTFS in the master file table as a relational
database structure.

A typical file-control block.

 There are also several key data structures stored in memory:


o An in-memory mount table.
o An in-memory directory cache of recently accessed directory information.
o A system-wide open file table, containing a copy of the FCB for every currently open file in the
system, as well as some other related information.
o A per-process open file table, containing a pointer to the system open file table as well as some other
information. ( For example the current file position pointer may be either here or in the system file table,
depending on the implementation and whether the file is being shared or not. )
 Figure below illustrates some of the interactions of file system components when files are created
and/or used:
o When a new file is created, a new FCB is allocated and filled out with important information
regarding the new file. The appropriate directory is modified with the new file name and FCB
information.
o When a file is accessed during a program, the open( ) system call reads in the FCB information from
disk, and stores it in the system-wide open file table. An entry is added to the per-process open file table
referencing the system-wide table, and an index into the per-process table is returned by the open( )
system call. UNIX refers to this index as a file descriptor, and Windows refers to it as a file handle.
o If another process already has a file open when a new request comes in for the same file, and it is
sharable, then a counter in the system-wide table is incremented and the per-process table is adjusted to
point to the existing entry in the system-wide table.
o When a file is closed, the per-process table entry is freed, and the counter in the system-wide table is
decremented. If that counter reaches zero, then the system wide table is also freed. Any data currently
stored in memory cache for this file is written out to disk if necessary.

- In-memory file-system structures. (a) File open. (b) File read.

Partitions and Mounting

 Physical disks are commonly divided into smaller units called partitions. They can also be combined
into larger units, but that is most commonly done for RAID installations and is left for later chapters.
 Partitions can either be used as raw devices ( with no structure imposed upon them ), or they can be
formatted to hold a filesystem ( i.e. populated with FCBs and initial directory structures as appropriate. )
Raw partitions are generally used for swap space, and may also be used for certain programs such as
databases that choose to manage their own disk storage system. Partitions containing filesystems can
generally only be accessed using the file system structure by ordinary users, but can often be accessed as
a raw device also by root.

 The boot block is accessed as part of a raw partition, by the boot program prior to any operating
system being loaded. Modern boot programs understand multiple OSes and filesystem formats, and can
give the user a choice of which of several available systems to boot.
 The root partition contains the OS kernel and at least the key portions of the OS needed to complete
the boot process. At boot time the root partition is mounted, and control is transferred from the boot
program to the kernel found there. ( Older systems required that the root partition lie completely within
the first 1024 cylinders of the disk, because that was as far as the boot program could reach. Once the
kernel had control, then it could access partitions beyond the 1024 cylinder boundary. )
 Continuing with the boot process, additional filesystems get mounted, adding their information into
the appropriate mount table structure. As a part of the mounting process the file systems may be checked
for errors or inconsistencies, either because they are flagged as not having been closed properly the last
time they were used, or just on general principle. Filesystems may be mounted either automatically or
manually. In UNIX a mount point is indicated by setting a flag in the in-memory copy of the inode, so all
future references to that inode get re-directed to the root directory of the mounted filesystem.

Virtual File Systems

 Virtual File Systems, VFS, provide a common interface to multiple different filesystem types. In
addition, it provides for a unique identifier ( vnode ) for files across the entire space, including across all
filesystems of different types. ( UNIX inodes are unique only across a single filesystem, and certainly do
not carry across networked file systems. )
 The VFS in Linux is based upon four key object types:
o The inode object, representing an individual file
o The file object, representing an open file.
o The superblock object, representing a filesystem.
o The dentry object, representing a directory entry.
 Linux VFS provides a set of common functionalities for each filesystem, using function pointers
accessed through a table. The same functionality is accessed through the same table position for all
filesystem types, though the actual functions pointed to by the pointers may be filesystem-specific. See
/usr/include/linux/fs.h for full details. Common operations provided include open( ), read( ), write( ), and
mmap( ).
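
The idea of a per-filesystem table of function pointers can be sketched in C as follows; this is a simplified illustration, not the actual Linux struct file_operations:

/* Simplified sketch of a VFS-style operations table: each filesystem
   registers its own function pointers, and the generic layer calls
   through the table without knowing which filesystem is underneath. */
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

struct vfile { const char *name; };          /* toy open-file object */

struct file_ops {
    ssize_t (*read)(struct vfile *f, char *buf, size_t len);
};

/* A toy "ramfs" read routine standing in for a real filesystem's code. */
static ssize_t ramfs_read(struct vfile *f, char *buf, size_t len)
{
    const char *data = "contents of a ramfs file\n";
    size_t n = strlen(data);
    if (n > len) n = len;
    memcpy(buf, data, n);
    return (ssize_t) n;
}

static struct file_ops ramfs_ops = { .read = ramfs_read };

/* Generic read path: dispatch through whichever table the file uses. */
static ssize_t vfs_read(struct vfile *f, const struct file_ops *ops,
                        char *buf, size_t len)
{
    return ops->read(f, buf, len);
}

int main(void)
{
    char buf[64];
    struct vfile f = { "demo" };
    ssize_t n = vfs_read(&f, &ramfs_ops, buf, sizeof(buf));
    if (n > 0) fwrite(buf, 1, (size_t) n, stdout);
    return 0;
}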

Schematic view of a virtual file system.

DIRECTORY IMPLEMENTATION

 Directories need to be fast to search, insert, and delete, with a minimum of wasted disk space.

Linear List

 A linear list is the simplest and easiest directory structure to set up, but it does have some drawbacks.
 Finding a file ( or verifying one does not already exist upon creation ) requires a linear search.
 Deletions can be done by moving all entries, flagging an entry as deleted, or by moving the last entry
into the newly vacant position.
 Sorting the list makes searches faster, at the expense of more complex insertions and deletions.
 A linked list makes insertions and deletions into a sorted list easier, with overhead for the links.
 More complex data structures, such as B-trees, could also be considered.

Hash Table

 A hash table can also be used to speed up searches.


 Hash tables are generally implemented in addition to a linear or other structure

ALLOCATION METHODS

 There are three major methods of storing files on disks: contiguous, linked, and indexed.

Contiguous Allocation

 Contiguous Allocation requires that all blocks of a file be kept together contiguously.
 Performance is very fast, because reading successive blocks of the same file generally requires no
movement of the disk heads, or at most one small step to the next adjacent cylinder.
 Storage allocation involves the same issues discussed earlier for the allocation of contiguous blocks
of memory ( first fit, best fit, fragmentation problems, etc. ) The distinction is that the high time penalty
required for moving the disk heads from spot to spot may now justify the benefits of keeping files
contiguously when possible.
 ( Even file systems that do not by default store files contiguously can benefit from certain utilities that
compact the disk and make all files contiguous in the process. )
 Problems can arise when files grow, or if the exact size of a file is unknown at creation time:
o Over-estimation of the file's final size increases external fragmentation and wastes disk space.
o Under-estimation may require that a file be moved or a process aborted if the file grows beyond its
originally allocated space.
o If a file grows slowly over a long time period and the total final space must be allocated initially, then
a lot of space becomes unusable before the file fills the space.
 A variation is to allocate file space in large contiguous chunks, called extents. When a file outgrows
its original extent, then an additional one is allocated. ( For example an extent may be the size of a
complete track or even cylinder, aligned on an appropriate track or cylinder boundary. ) The high-
performance file system Veritas uses extents to optimize performance.

Contiguous allocation of disk space.

Linked Allocation

 Disk files can be stored as linked lists, at the expense of the storage space consumed by each link. (
E.g. only 508 bytes of a 512-byte block may be available for data, the rest holding the pointer to the next block. )
 Linked allocation involves no external fragmentation, does not require pre-known file sizes, and
allows files to grow dynamically at any time.
 Unfortunately linked allocation is only efficient for sequential access files, as random access requires
starting at the beginning of the list for each new location access.
 Allocating clusters of blocks reduces the space wasted by pointers, at the cost of internal
fragmentation.
 Another big problem with linked allocation is reliability if a pointer is lost or damaged. Doubly linked
lists provide some protection, at the cost of additional overhead and wasted space.

Linked allocation of disk space.

The File Allocation Table, FAT, used by DOS is a variation of linked allocation, where all the links are
stored in a separate table at the beginning of the disk. The benefit of this approach is that the FAT table
can be cached in memory, greatly improving random access speeds.
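
Following a FAT chain can be sketched as a small function: given the table in memory, the blocks of a file are found by repeatedly indexing the table, starting from the file's first block, until an end-of-file marker is reached. The block numbers and marker below are illustrative ( they follow the usual textbook figure ):

/* Follow a FAT chain from a file's first block, printing each block number. */
#include <stdio.h>

#define FAT_EOF (-1)

void print_chain(const int *fat, int start)
{
    /* fat[i] holds the number of the block that follows block i. */
    for (int block = start; block != FAT_EOF; block = fat[block])
        printf("block %d\n", block);
}

int main(void)
{
    static int fat[1024];
    fat[217] = 618;        /* a file occupying blocks 217 -> 618 -> 339 */
    fat[618] = 339;
    fat[339] = FAT_EOF;

    print_chain(fat, 217);
    return 0;
}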

File-allocation table.

Indexed Allocation

 Indexed Allocation combines all of the indexes for accessing each file into a common block ( for that
file ), as opposed to spreading them all over the disk or storing them in a FAT table.

Indexed allocation of disk space.

 Some disk space is wasted ( relative to linked lists or FAT tables ) because an entire index block must
be allocated for each file, regardless of how many data blocks the file contains. This leads to questions of
how big the index block should be, and how it should be implemented. There are several approaches:
o Linked Scheme - An index block is one disk block, which can be read and written in a single disk
operation. The first index block contains some header information, the first N block addresses, and if
necessary a pointer to additional linked index blocks.
o Multi-Level Index - The first index block contains a set of pointers to secondary index blocks, which
in turn contain pointers to the actual data blocks.
o Combined Scheme - This is the scheme used in UNIX inodes, in which the first 12 or so data block
pointers are stored directly in the inode, and then singly, doubly, and triply indirect pointers provide
access to more data blocks as needed. ( See below. ) The advantage of this scheme is that for small files (
which many are ), the data blocks are readily accessible ( up to 48K with 4K block sizes ); files up to
about 4144K ( using 4K blocks ) are accessible with only a single indirect block ( which can be cached ),
and huge files are still accessible using a relatively small number of disk accesses ( larger in theory than
can be addressed by a 32-bit address, which is why some systems have moved to 64-bit file pointers. )

The UNIX inode.
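
The arithmetic behind those figures can be sketched directly. Assuming 4K blocks, 4-byte block pointers ( 1024 pointers per index block ), and 12 direct pointers, the level of indirection needed for a given logical block number is:

/* Classify a logical block number under a UNIX-style combined scheme:
   12 direct pointers, then single, double, and triple indirect blocks. */
#include <stdio.h>

#define NDIRECT 12L
#define PTRS    1024L          /* 4096-byte block / 4-byte pointers */

const char *level(long lblock)
{
    if (lblock < NDIRECT)                              return "direct";
    if (lblock < NDIRECT + PTRS)                       return "single indirect";
    if (lblock < NDIRECT + PTRS + PTRS * PTRS)         return "double indirect";
    if (lblock < NDIRECT + PTRS + PTRS * PTRS
                         + PTRS * PTRS * PTRS)         return "triple indirect";
    return "out of range";
}

int main(void)
{
    /* Blocks 0-11 are direct ( 48K ); blocks 12-1035 need only one indirect
       block ( files up to about 4144K ), matching the figures quoted above. */
    long samples[] = { 0, 11, 12, 1035, 1036, 2000000 };
    for (int i = 0; i < 6; i++)
        printf("logical block %7ld -> %s\n", samples[i], level(samples[i]));
    return 0;
}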

Performance

 The optimal allocation method is different for sequential access files than for random access files, and
is also different for small files than for large files.
 Some systems support more than one allocation method, which may require specifying how the file is
to be used ( sequential or random access ) at the time it is allocated. Such systems also provide conversion
utilities.
 Some systems have been known to use contiguous access for small files, and automatically switch to
an indexed scheme when file sizes surpass a certain threshold.
 And of course some systems adjust their allocation schemes ( e.g. block sizes ) to best match the
characteristics of the hardware for optimum performance.

FREE-SPACE MANAGEMENT

 Another important aspect of disk management is keeping track of and allocating free space.

Bit Vector

 One simple approach is to use a bit vector, in which each bit represents a disk block, set to 1 if free or
0 if allocated.
 Fast algorithms exist for quickly finding contiguous blocks of a given size
 The down side is that a 40GB disk requires over 5MB just to store the bitmap. ( For example. )
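
Finding a free block in a bit vector amounts to scanning for the first word that is not all zeros, then locating a set bit within it. A minimal sketch, using the convention that a 1 bit means the block is free:

/* Return the number of the first free block, or -1 if none is free. */
#include <stdio.h>
#include <stdint.h>

long first_free_block(const uint32_t *map, long nwords)
{
    for (long w = 0; w < nwords; w++) {
        if (map[w] != 0) {                      /* some bit in this word is set */
            for (int b = 0; b < 32; b++)
                if (map[w] & (1u << b))
                    return w * 32 + b;          /* block number */
        }
    }
    return -1;
}

int main(void)
{
    /* Blocks 0-63 allocated; block 69 ( bit 5 of word 2 ) is free. */
    uint32_t map[4] = { 0, 0, 1u << 5, 0 };
    printf("first free block: %ld\n", first_free_block(map, 4));
    return 0;
}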

Linked List

 A linked list can also be used to keep track of all free blocks.
 Traversing the list and/or finding a contiguous block of a given size are not easy, but fortunately are
not frequently needed operations. Generally the system just adds and removes single blocks from the
beginning of the list.
 The FAT table keeps track of the free list as just one more linked list on the table.

Linked free-space list on disk.

Grouping

A variation on linked list free lists is to use links of blocks of indices of free blocks. If a block holds up to N
addresses, then the first block in the linked-list contains up to N-1 addresses of free blocks and a pointer to
the next block of free addresses.

Counting

 When there are multiple contiguous blocks of free space then the system can keep track of the starting
address of the group and the number of contiguous free blocks. As long as the average length of a
contiguous group of free blocks is greater than two this offers a savings in space needed for the free list. (
Similar to compression techniques used for graphics images when a group of pixels all the same color is
encountered. )

Space Maps

 Sun's ZFS file system was designed for HUGE numbers and sizes of files, directories, and even file
systems.
 The resulting data structures could be VERY inefficient if not implemented carefully. For example,
freeing up a 1 GB file on a 1 TB file system could involve updating thousands of blocks of free list bit maps
if the file was spread across the disk.
 ZFS uses a combination of techniques, starting with dividing the disk up into ( hundreds of )
metaslabs of a manageable size, each having their own space map.
 Free blocks are managed using the counting technique, but rather than write the information to a
table, it is recorded in a log-structured transaction record. Adjacent free blocks are also coalesced into a
larger single free block.
 An in-memory space map is constructed using a balanced tree data structure, constructed from the log
data.
 The combination of the in-memory tree and the on-disk log provide for very fast and efficient
management of these very large files and free blocks.

EFFICIENCY AND PERFORMANCE

Efficiency

 UNIX pre-allocates inodes, which occupies space even before any files are created.
 UNIX also distributes inodes across the disk, and tries to store data files near their inode, to reduce
the distance of disk seeks between the inodes and the data.
 Some systems use variable size clusters depending on the file size.
 The more data that is stored in a directory ( e.g. last access time ), the more often the directory blocks
have to be re-written.
 As technology advances, addressing schemes have had to grow as well.
o Sun's ZFS file system uses 128-bit pointers, which should theoretically never need to be expanded. (
The mass required to store 2^128 bytes with atomic storage would be at least 272 trillion kilograms! )
 Kernel table sizes used to be fixed, and could only be changed by rebuilding the kernels. Modern
tables are dynamically allocated, but that requires more complicated algorithms for accessing them.

Performance

 Disk controllers generally include on-board caching. When a seek is requested, the heads are moved
into place, and then an entire track is read, starting from whatever sector is currently under the heads (
reducing latency. ) The requested sector is returned and the unrequested portion of the track is cached in the
disk's electronics.
 Some OSes cache disk blocks they expect to need again in a buffer cache.
 A page cache connected to the virtual memory system is actually more efficient as memory addresses
do not need to be converted to disk block addresses and back again.
 Some systems ( Solaris, Linux, Windows 2000, NT, XP ) use page caching for both process pages
and file data in a unified virtual memory.
 Figures below show the advantages of the unified buffer cache found in some versions of UNIX and
Linux - Data does not need to be stored twice, and problems of inconsistent buffer information are avoided.

Figure - I/O without a unified buffer cache.

Figure I/O using a unified buffer cache.

 Page replacement strategies can be complicated with a unified cache, as one needs to decide whether
to replace process or file pages, and how many pages to guarantee to each category of pages. Solaris, for
example, has gone through many variations, resulting in priority paging giving process pages priority over
file I/O pages, and setting limits so that neither can knock the other completely out of memory.
 Another issue affecting performance is the question of whether to implement synchronous writes or
asynchronous writes. Synchronous writes occur in the order in which the disk subsystem receives them,
without caching; Asynchronous writes are cached, allowing the disk subsystem to schedule writes in a more
efficient order ( See Chapter 12. ) Metadata writes are often done synchronously. Some systems support
flags to the open call requiring that writes be synchronous, for example for the benefit of database systems
that require their writes be performed in a required order.
 The type of file access can also have an impact on optimal page replacement policies. For example,
LRU is not necessarily a good policy for sequential access files. For these types of files progression
normally goes in a forward direction only, and the most recently used page will not be needed again until
after the file has been rewound and re-read from the beginning, ( if it is ever needed at all. ) On the other
hand, we can expect to need the next page in the file fairly soon. For this reason sequential access files often
take advantage of two special policies:
o Free-behind frees up a page as soon as the next page in the file is requested, with the assumption that
we are now done with the old page and won't need it again for a long time.
o Read-ahead reads the requested page and several subsequent pages at the same time, with the
assumption that those pages will be needed in the near future. This is similar to the track caching that is
already performed by the disk controller, except it saves the future latency of transferring data from the disk
controller memory into motherboard main memory.
 The caching system and asynchronous writes speed up disk writes considerably, because the disk
subsystem can schedule physical writes to the disk to minimize head movement and disk seek times. ( See
Chapter 12. ) Reads, on the other hand, must be done more synchronously in spite of the caching system,
with the result that disk writes can counter-intuitively be much faster on average than disk reads.

MASS-STORAGE STRUCTURE

Overview of Mass-Storage Structure

Magnetic Disks

 Traditional magnetic disks have the following basic structure:
o One or more platters in the form of disks covered with magnetic media. Hard disk platters are made
of rigid metal, while "floppy" disks are made of more flexible plastic.
o Each platter has two working surfaces. Older hard disk drives would sometimes not use the very top
or bottom surface of a stack of platters, as these surfaces were more susceptible to potential damage.
o Each working surface is divided into a number of concentric rings called tracks. The collection of all
tracks that are the same distance from the edge of the platter, ( i.e. all tracks immediately above one another
in the following diagram ) is called a cylinder.
o Each track is further divided into sectors, traditionally containing 512 bytes of data each, although
some modern disks occasionally use larger sector sizes. ( Sectors also include a header and a trailer,
including checksum information among other things. Larger sector sizes reduce the fraction of the disk
consumed by headers and trailers, but increase internal fragmentation and the amount of disk that must be
marked bad in the case of errors. )
o The data on a hard drive is read by read-write heads. The standard configuration ( shown below )
uses one head per surface, each on a separate arm, and controlled by a common arm assembly which moves
all heads simultaneously from one cylinder to another. ( Other configurations, including independent read-
write heads, may speed up disk access, but involve serious technical difficulties. )
o The storage capacity of a traditional disk drive is equal to the number of heads ( i.e. the number of
working surfaces ), times the number of tracks per surface, times the number of sectors per track, times the
number of bytes per sector. A particular physical block of data is specified by providing the head-sector-
cylinder number at which it is located.

-Moving-head disk mechanism.

 In operation the disk rotates at high speed, such as 7200 rpm ( 120 revolutions per second. ) The rate
at which data can be transferred from the disk to the computer is composed of several steps:
o The positioning time, a.k.a. the seek time or random access time is the time required to move the
heads from one cylinder to another, and for the heads to settle down after the move. This is typically the
slowest step in the process and the predominant bottleneck to overall transfer rates.
o The rotational latency is the amount of time required for the desired sector to rotate around and come
under the read-write head.This can range anywhere from zero to one full revolution, and on the average will
equal one-half revolution. This is another physical step and is usually the second slowest step behind seek
time. ( For a disk rotating at 7200 rpm, the average rotational latency would be 1/2 revolution / 120
revolutions per second, or just over 4 milliseconds, a long time by computer standards. )

o The transfer rate, which is the time required to move the data electronically from the disk to the
computer. ( Some authors may also use the term transfer rate to refer to the overall transfer rate, including
seek time and rotational latency as well as the electronic data transfer rate. )
 Disk heads "fly" over the surface on a very thin cushion of air. If they should accidentally contact the
disk, then a head crash occurs, which may or may not permanently damage the disk or even destroy it
completely. For this reason it is normal to park the disk heads when turning a computer off, which means to
move the heads off the disk or to an area of the disk where there is no data stored.
 Floppy disks are normally removable. Hard drives can also be removable, and some are even hot-
swappable, meaning they can be removed while the computer is running, and a new hard drive inserted in
their place.
 Disk drives are connected to the computer via a cable known as the I/O bus. Some of the common
interface formats include Enhanced Integrated Drive Electronics, EIDE; Advanced Technology Attachment,
ATA; Serial ATA, SATA; Universal Serial Bus, USB; Fiber Channel, FC; and Small Computer Systems
Interface, SCSI.
 The host controller is at the computer end of the I/O bus, and the disk controller is built into the disk
itself. The CPU issues commands to the host controller via I/O ports. Data is transferred between the
magnetic surface and onboard cache by the disk controller, and then the data is transferred from that cache
to the host controller and the motherboard memory at electronic speeds.

Solid-State Disks

 As technologies improve and economics change, old technologies are often used in different ways.
One example of this is the increasing use of solid-state disks, or SSDs.
 SSDs use memory technology as a small fast hard disk. Specific implementations may use either
flash memory or DRAM chips protected by a battery to sustain the information through power cycles.
 Because SSDs have no moving parts they are much faster than traditional hard drives, and certain
problems such as the scheduling of disk accesses simply do not apply.
 However SSDs also have their weaknesses: They are more expensive than hard drives, generally not
as large, and may have shorter life spans.
 SSDs are especially useful as a high-speed cache of hard-disk information that must be accessed
quickly. One example is to store filesystem meta-data, e.g. directory and inode information, that must be
accessed quickly and often. Another variation is a boot disk containing the OS and some application
executables, but no vital user data. SSDs are also used in laptops to make them smaller, faster, and lighter.
 Because SSDs are so much faster than traditional hard disks, the throughput of the bus can become a
limiting factor, causing some SSDs to be connected directly to the system PCI bus for example.

Magnetic Tapes

 Magnetic tapes were once used for common secondary storage before the days of hard disk drives, but
today are used primarily for backups.

 Accessing a particular spot on a magnetic tape can be slow, but once reading or writing commences,
access speeds are comparable to disk drives.
 Capacities of tape drives can range from 20 to 200 GB, and compression can double that capacity.

Disk Structure

 The traditional head-sector-cylinder, HSC numbers are mapped to linear block addresses by
numbering the first sector on the first head on the outermost track as sector 0. Numbering proceeds with the
rest of the sectors on that same track, and then the rest of the tracks on the same cylinder before proceeding
through the rest of the cylinders to the center of the disk. In modern practice these linear block addresses are
used in place of the HSC numbers for a variety of reasons:
1. The linear length of tracks near the outer edge of the disk is much longer than for those tracks located
near the center, and therefore it is possible to squeeze many more sectors onto outer tracks than onto inner
ones.
2. All disks have some bad sectors, and therefore disks maintain a few spare sectors that can be used in
place of the bad ones. The mapping of spare sectors to bad sectors is managed internally by the disk
controller.
3. Modern hard drives can have thousands of cylinders, and hundreds of sectors per track on their
outermost tracks. These numbers exceed the range of HSC numbers for many ( older ) operating systems,
and therefore disks can be configured for any convenient combination of HSC values that falls within the
total number of sectors physically on the drive.
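
The mapping between a head-sector-cylinder triple and a linear block address is a simple computation; the geometry below is purely illustrative:

/* Convert a cylinder/head/sector triple to a linear block address and back,
   for an assumed geometry of 16 heads and 63 sectors per track. */
#include <stdio.h>

#define HEADS             16L
#define SECTORS_PER_TRACK 63L

long chs_to_lba(long cyl, long head, long sector)
{
    /* Sector numbering here starts at 0 within each track. */
    return (cyl * HEADS + head) * SECTORS_PER_TRACK + sector;
}

int main(void)
{
    long lba = chs_to_lba(2, 3, 10);
    printf("cylinder 2, head 3, sector 10 -> LBA %ld\n", lba);

    /* Inverse mapping, recovering the triple from the block address. */
    long sector = lba % SECTORS_PER_TRACK;
    long head   = (lba / SECTORS_PER_TRACK) % HEADS;
    long cyl    =  lba / (SECTORS_PER_TRACK * HEADS);
    printf("LBA %ld -> cylinder %ld, head %ld, sector %ld\n",
           lba, cyl, head, sector);
    return 0;
}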
 There is a limit to how closely individual bits can be packed on physical media, but that limit keeps
increasing as technological advances are made.
 Modern disks pack many more sectors into outer cylinders than inner ones, using one of two
approaches:
o With Constant Linear Velocity, CLV, the density of bits is uniform from cylinder to cylinder.
Because there are more sectors in outer cylinders, the disk spins slower when reading those cylinders,
causing the rate of bits passing under the read-write head to remain constant. This is the approach used by
modern CDs and DVDs.
o With Constant Angular Velocity, CAV, the disk rotates at a constant angular speed, with the bit
density decreasing on outer cylinders. ( These disks would have a constant number of sectors per track on
all cylinders. )

DISK ATTACHMENT

Disk drives can be attached either directly to a particular host ( a local disk ) or to a network.

Host-Attached Storage

 Local disks are accessed through I/O Ports as described earlier.


 The most common interfaces are IDE or ATA, each of which allow up to two drives per host
controller.
 SATA is similar with simpler cabling.
 High-end workstations or other systems in need of a larger number of disks typically use SCSI disks:
o The SCSI standard supports up to 16 targets on each SCSI bus, one of which is generally the host
adapter and the other 15 of which can be disk or tape drives.
o A SCSI target is usually a single drive, but the standard also supports up to 8 units within each target.
These would generally be used for accessing individual disks within a RAID array. ( See below. )
o The SCSI standard also supports multiple host adapters in a single computer, i.e. multiple SCSI
busses.
o Modern advancements in SCSI include "fast" and "wide" versions, as well as SCSI-2.
o SCSI cables may be either 50 or 68 conductors. SCSI devices may be external as well as internal.
o See wikipedia for more information on the SCSI interface.
 FC is a high-speed serial architecture that can operate over optical fiber or four-conductor copper
wires, and has two variants:
o A large switched fabric having a 24-bit address space. This variant allows for multiple devices and
multiple hosts to interconnect, forming the basis for the storage-area networks, SANs, to be discussed in a
future section.
o The arbitrated loop, FC-AL, that can address up to 126 devices ( drives and controllers. )

Network-Attached Storage

 Network attached storage connects storage devices to computers using a remote procedure call, RPC,
interface, typically with something like NFS filesystem mounts. This is convenient for allowing several
computers in a group common access and naming conventions for shared storage.
 NAS can also be implemented with iSCSI, which carries the SCSI protocol over Internet protocols and
standard network connections, allowing long-distance remote access to shared storage.
 NAS allows computers to easily share data storage, but tends to be less efficient than standard host-
attached storage.

Figure 4.26- Network-attached storage.

Storage-Area Network

 A Storage-Area Network, SAN, connects computers and storage devices in a network, using storage
protocols instead of network protocols.
 One advantage of this is that storage access does not tie up regular networking bandwidth.
 SAN is very flexible and dynamic, allowing hosts and devices to attach and detach on the fly.
 SAN is also controllable, allowing restricted access to certain hosts and devices.

Figure - Storage-area network.

DISK SCHEDULING

 As mentioned earlier, disk transfer speeds are limited primarily by seek times and rotational latency.
When multiple requests are to be processed there is also some inherent delay in waiting for other requests to
be processed.
 Bandwidth is measured by the amount of data transferred divided by the total amount of time from
the first request being made to the last transfer being completed, ( for a series of disk requests. )
 Both bandwidth and access time can be improved by processing requests in a good order.
 Disk requests include the disk address, memory address, number of sectors to transfer, and whether
the request is for reading or writing.

FCFS Scheduling

 First-Come First-Serve is simple and intrinsically fair, but not very efficient. Consider in the
following sequence the wild swing from cylinder 122 to 14 and then back to 124:

Figure -FCFS disk scheduling.

SSTF Scheduling

 Shortest Seek Time First scheduling is more efficient, but may lead to starvation if a constant stream
of requests arrives for the same general area of the disk.
 SSTF reduces the total head movement to 236 cylinders, down from 640 required for the same set of
requests under FCFS. Note, however that the distance could be reduced still further to 208 by starting
with 37 and then 14 first before processing the rest of the requests.

Figure - SSTF disk scheduling.
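
Those totals can be checked with a short program. The classic request queue 98, 183, 37, 122, 14, 124, 65, 67 with the head initially at cylinder 53 is assumed here, since those are the numbers the figures above are based on:

/* Total head movement for FCFS and SSTF on the classic request queue;
   with the head starting at cylinder 53 this prints 640 and 236. */
#include <stdio.h>
#include <stdlib.h>

#define NREQ 8

long fcfs(const int *req, int n, int head)
{
    long total = 0;
    for (int i = 0; i < n; i++) {
        total += abs(req[i] - head);          /* serve in arrival order */
        head = req[i];
    }
    return total;
}

long sstf(const int *req, int n, int head)
{
    int done[NREQ] = { 0 };
    long total = 0;
    for (int served = 0; served < n; served++) {
        int best = -1;
        for (int i = 0; i < n; i++)           /* pick the closest pending request */
            if (!done[i] && (best == -1 ||
                abs(req[i] - head) < abs(req[best] - head)))
                best = i;
        total += abs(req[best] - head);
        head = req[best];
        done[best] = 1;
    }
    return total;
}

int main(void)
{
    int req[NREQ] = { 98, 183, 37, 122, 14, 124, 65, 67 };
    printf("FCFS total head movement: %ld\n", fcfs(req, NREQ, 53));
    printf("SSTF total head movement: %ld\n", sstf(req, NREQ, 53));
    return 0;
}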

SCAN Scheduling

 The SCAN algorithm, a.k.a. the elevator algorithm moves back and forth from one end of the disk to
the other, similarly to an elevator processing requests in a tall building.

Figure - SCAN disk scheduling.

 Under the SCAN algorithm, if a request arrives just ahead of the moving head then it will be
processed right away, but if it arrives just after the head has passed, then it will have to wait for the
head to pass going the other way on the return trip. This leads to a fairly wide variation in access
times which can be improved upon.
 Consider, for example, when the head reaches the high end of the disk: Requests with high cylinder
numbers just missed the passing head, which means they are all fairly recent requests, whereas
requests with low numbers may have been waiting for a much longer time. Making the return scan
from high to low then ends up accessing recent requests first and making older requests wait that
much longer.

C-SCAN Scheduling

 The Circular-SCAN algorithm improves upon SCAN by treating all requests in a circular queue
fashion - Once the head reaches the end of the disk, it returns to the other end without processing any
requests, and then starts again from the beginning of the disk:

Figure - C-SCAN disk scheduling.
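
The difference between the two sweeps can be seen by generating the service order. The same request queue and starting cylinder are assumed, with the head moving toward higher cylinder numbers first:

/* Service order under SCAN and C-SCAN for a sorted request queue,
   head starting at cylinder 53 and moving toward higher numbers first. */
#include <stdio.h>
#include <stdlib.h>

static int cmp(const void *a, const void *b)
{
    return *(const int *) a - *(const int *) b;
}

int main(void)
{
    int req[] = { 98, 183, 37, 122, 14, 124, 65, 67 };
    int n = 8, head = 53;
    qsort(req, n, sizeof(int), cmp);

    int split = 0;                       /* first request at or above the head */
    while (split < n && req[split] < head)
        split++;

    printf("SCAN:   ");
    for (int i = split; i < n; i++) printf("%d ", req[i]);      /* upward pass  */
    for (int i = split - 1; i >= 0; i--) printf("%d ", req[i]); /* then reverse */
    printf("\n");

    printf("C-SCAN: ");
    for (int i = split; i < n; i++) printf("%d ", req[i]);      /* upward pass     */
    for (int i = 0; i < split; i++) printf("%d ", req[i]);      /* wrap to low end */
    printf("\n");
    return 0;
}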


LOOK Scheduling

 LOOK scheduling improves upon SCAN by looking ahead at the queue of pending requests, and not
moving the heads any farther towards the end of the disk than is necessary. The following diagram
illustrates the circular form of LOOK:

Figure 4.32- C-LOOK disk scheduling.

Selection of a Disk-Scheduling Algorithm

 With very low loads all algorithms are equal, since there will normally only be one request to process
at a time.
 For slightly larger loads, SSTF offers better performance than FCFS, but may lead to starvation when
loads become heavy enough.
 For busier systems, SCAN and LOOK algorithms eliminate starvation problems.
 The actual optimal algorithm may be something even more complex than those discussed here, but
the incremental improvements are generally not worth the additional overhead.
 Some improvement to overall filesystem access times can be made by intelligent placement of
directory and/or inode information. If those structures are placed in the middle of the disk instead of
at the beginning of the disk, then the maximum distance from those structures to data blocks is
reduced to only one-half of the disk size. If those structures can be further distributed and furthermore
have their data blocks stored as close as possible to the corresponding directory structures, then that
reduces still further the overall time to find the disk block numbers and then access the corresponding
data blocks.
 On modern disks the rotational latency can be almost as significant as the seek time; however it is not
within the OS's control to account for that, because modern disks do not reveal their internal sector
mapping schemes, ( particularly when bad blocks have been remapped to spare sectors. )
o Some disk manufacturers provide for disk scheduling algorithms directly on their disk
controllers, ( which do know the actual geometry of the disk as well as any remapping ), so
that if a series of requests are sent from the computer to the controller then those requests can
be processed in an optimal order.
o Unfortunately there are some considerations that the OS must take into account that are
beyond the abilities of the on-board disk-scheduling algorithms, such as priorities of some
requests over others, or the need to process certain requests in a particular order. For this
reason OSes may elect to spoon-feed requests to the disk controller one at a time in certain
situations.
