7269IV - 5th Semester - Computer Science and Engineering
7269IV - 5th Semester - Computer Science and Engineering
MODULE-IV
TEXT BOOK:
DISCLAIMER:
“THIS DOCUMENT DOES NOT CLAIM ANY ORIGINALITY AND CANNOT BE USED AS A
SUBSTITUTE FOR PRESCRIBED TEXTBOOKS. THE INFORMATION PRESENTED HERE IS
MERELY A COLLECTION FROM DIFFERENT REFERENCE BOOKS AND INTERNET
CONTENTS. THE OWNERSHIP OF THE INFORMATION LIES WITH THE RESPECTIVE
AUTHORS OR INSTITUTIONS.”
UNIT-IV
FILE SYSTEM
File concept:
Part of the OS dealing with the files is known as file system. The File system takes
care of the following issues
o File Structure
We have seen various data structures in which the file can be stored. The task
of the file system is to maintain an optimal file structure.
Whenever a file gets deleted from the hard disk, there is a free space created in
the disk. There can be many such spaces which need to be recovered in order
to reallocate them to other files.
The major concern about the file is deciding where to store the files on the hard
disk. There are various disks scheduling algorithm which will be covered later.
A File may or may not be stored within only one block. It can be stored in the
non contiguous blocks on the disk. We need to keep track of all the blocks on
which the part of the files reside.
byte
Record
Operation on a directory:
Search for a file: We need to be able to search a directory for a particular file.
Create a file: New files are created & added to the directory.
Delete a file: When a file is no longer needed, we may remove it from the
directory.
List a directory: We should be able to list the files of the directory.
Rename a file: The name of a file is changed when the contents of the file
changes.
Traverse the file system: It is useful to be able to access every directory &
every file within a directory.
Structure of a directory: The most common schemes for defining the
structure of the directory are:
1. Single level directory: It is the simplest directory structure. All files are present
Advantages
1. Implementation is very simple.
2. If the sizes of the files are very small then the searching becomes faster.
3. File creation, searching, deletion is very simple since we have only one
directory.
Disadvantages
1. We cannot have two files with the same name.
2. The directory may be very big therefore searching for a file may take so much
time.
3. Protection cannot be implemented for multiple users.
4. There are no ways to group same kind of files.
5. Choosing the unique name for every file is a bit complex and limits the number
of files in the system because most of the Operating System limits the number
of characters used to construct the file name.
2. Two level directories: The solution to the name collision problem in single
level directory is to create a separate directory for each user. In a two level
directory structure, each user has its own user file directory. When a user logs
in, then master file directory is searched. It is indexed by user name & each
entry points to the UFD of that user.
Limitation: It solves name collision problem. But it isolates one user from
another. It is an advantage when users are completely independent. But it is a
disadvantage when the users need to access each other‘s files & co-operate
among themselves on a particular task.
level directory is a two level tree. So, the generalization is to extend the
directory structure to a tree of arbitrary height. It allows users to create their
own subdirectories & organize their files. Every file in the system has a unique
path name. It is the path from the root through all the sub-directories to a
specified file. Each user has a current directory. It contains most of the files that
are of current interest to the user. Path names can be of two types: An absolute
path name begins from the root directory & follows the path down to the
specified files. A relative path name defines the path from the current directory.
EX: If the current directory is root/spell/mail, then the relative path name is
prt/first & the absolute path name is root/ spell/ mail/ prt/ first. Here users can
access the files of other users also by specifying their path names.
Advantage: Searching is more efficient in this directory structure.
A tree structured directory system may consist of various levels therefore there is a
set of permissions assigned to each file and directory.
The permissions are R W X which are regarding reading, writing and the execution
of the files or directory. The permissions are assigned to three types of users: owner,
group and others.
There is a identification bit which differentiate between directory and file. For a
directory, it is d and for a file, it is dot (.)
Limitation: Now a file may have multiple absolute path names. So, distinct file
names may refer to the same file. Another problem occurs during deletion of a
shared file. When a file is removed by any one user. It may leave dangling
pointer to the non existing file. One serious problem in a cyclic graph structure
is ensuring that there are no cycles. To avoid these problems, some systems do
not allow shared directories or files. E.g. MS-DOS uses a tree structure rather
than a cyclic to avoid the problems associated with deletion. One approach for
deletion is to preserve the file until all references to it are deleted.
5. General graph directory: When links are added to an existing tree structured
directory, the tree structure is destroyed, resulting in a simple graph structure.
Linking is a technique that allows a file to appear in more than one directory.
The advantage is the simplicity of algorithm to transverse the graph &
determines when there are no more references to a file. But a similar problem
exists when we are trying to determine when a file can be deleted. Here also a
value zero in the reference count means that there are no more references to the
file or directory & the file can be deleted. But when cycle exists, the reference
count may be non-zero even when there are no references to the directory or file.
This occurs due to the possibility of self referencing (cycle) in the structure. So,
here we have to use garbage collection scheme to determine when the last
references to a file has been deleted & the space can be reallocated. It involves
two steps:
Transverse the entire file system & mark everything that can be accessed.
Everything that isn‘t marked is added to the list of free space.
But this process is extremely time consuming. It is only necessary due to
presence of cycles in the graph. So, a cyclic graph structure is easier to work
than this.
Read
Write
Execute
Append
Delete
List
Access lists and groups:
Various users may need different types of access to a file or directory. So,
we can associate an access lists with each file and directory to implement
identity dependent access. When a user access requests access to a
particular file, the OS checks the access list associated with that file. If that
user is granted the requested access, then the access is allowed. Otherwise, a
protection violation occurs & the user is denied access to the file. But the
main problem with access lists is their length. It is very tedious to construct
such a list. So, we use a condensed version of the access list by classifying
the users into 3 categories:
Owners: The user who created the file.
Group: A set of users who are sharing the files.
Others: All other users in the system.
Here only 3 fields are required to define protection. Each field is a
collection of bits each of which either allows or prevents the access. E.g.
The UNIX file system defines 3 fields of 3 bits each: r w x
r(read
access)
w(write
access)
x(execute
access)
Separate fields are kept for file owners, group & other users. So, a bit is
needed to record protection information for each file.
Allocation methods:
There are various methods which can be used to allocate disk space to the files.
Selection of an appropriate allocation method will significantly affect the
performance and efficiency of the system. Allocation method provides a way in
which the disk will be utilized and the files will be accessed.
We have mainly discussed 3 methods of allocating disk space which are widely
used.
1. Contiguous allocation:
If the blocks are allocated to the file in such a way that all the logical blocks of
the file get the contiguous physical block in the hard disk then such allocation
scheme is known as contiguous allocation.
In the image shown below, there are three files in the directory. The starting
block and the length of each file are mentioned in the table. We can check in the
table that the contiguous blocks are assigned to each file as per its need.
a. It requires each file to occupy a set of contiguous blocks on the disk.
b. Number of disk seeks required for accessing contiguously allocated file is
minimum.
c. The IBM VM/CMS OS uses contiguous allocation. Contiguous
allocation of a file is defined by the disk address and length (in terms
of block units).
d. If the file is ‘n‘ blocks long and starts at location ‘b‘, then it occupies blocks
b, b+1, b+2,----,b+ n-1.
e. The directory for each file indicates the address of the starting block
sequential access, the file system remembers the disk address of the
last block referenced and reads the next block when necessary.
g. For direct access to block i of a file that starts at block b we can immediately
access block b+i
Problems: One difficulty with contiguous allocation is finding space for a
new file. It also suffers from the problem of external fragmentation. As
files are deleted and allocated, the free disk space is broken into small
pieces. A major problem in contiguous allocation is how much space is
needed for a file. When a file is created, the total amount of space it will
need must be found and allocated. Even if the total amount of space
needed for a file is known in advances, pre-allocation is inefficient.
Because a file that grows very slowly must be allocated enough space for
its final size even though most of that space is left unused for a long period
time. Therefore, the file has a large amount of internal fragmentation.
2. Linked List Allocation
Linked List allocation solves all problems of contiguous allocation. In linked list
allocation, each file is considered as the linked list of disk blocks. However, the disks
blocks allocated to a particular file need not to be contiguous on the disk. Each disk
block allocated to a file contains a pointer which points to the next disk block
allocated to the same file.
a. Linked allocation solves all problems of contiguous allocation.
b. In linked allocation, each file is linked list of disk blocks, which are
management system and this new block is written to & linked to the
end of the file.
g. To read a file, we read blocks by following the pointers from block to block.
h. There is no external fragmentation with linked allocation & any
free block can be used to satisfy a request.
i. Also there is no need to declare the size of a file when that file is
created. A file can continue to grow as long as there are free blocks.
Disadvantages
1. Random Access is not provided.
2. Pointers require some space in the disk blocks.
3. Any of the pointers in the linked list must not be broken otherwise the file will
get corrupted.
4. Need to traverse each block.
3. Indexed Allocation:
Instead of maintaining a file allocation table of all the disk pointers, Indexed
allocation scheme stores all the disk pointers in one of the blocks called as indexed
block. Indexed block doesn't hold the file data, but it holds the pointers to all the
disk blocks allocated to that particular file. Directory entry will only contain the
index block address.
a. Indexed allocation solves the problem of linked allocation by bringing all
addresses. The ith entry in the index block points to the ith block of the file.
c. The directory contains the address of the index block. The read the ith block,
we use the pointer in the ith index block entry and read the desired block.
d. To write into the ith block, a free block is obtained from the free space
manager and its address is put in the ith index block entry.
e. Indexed allocation supports direct access without suffering external
fragmentation.
Operating System Concepts – 9th Edition 10.5 Silberschatz, Galvin and Gagne ©2013
As we know, a process needs two type of time, CPU time and IO time. For I/O, it
requests the Operating system to access the disk.
However, the operating system must be fare enough to satisfy each request and at
the same time, operating system must maintain the efficiency and speed of process
execution.
The technique that operating system uses to determine the request which is to be
satisfied next is called disk scheduling.
Seek time is the time taken in locating the disk arm to a specified track where the
read/write request will be satisfied.
Rotational Latency
It is the time taken by the desired sector to rotate itself to the position from where it
can access the R/W heads.
Transfer Time
It is the average of time spent by each request waiting for the IO operation.
The main purpose of disk scheduling algorithm is to select a disk request from the
queue of IO requests and decide the schedule when this request will be processed.
The list of various disks scheduling algorithm is given below. Each algorithm is
carrying some advantages and disadvantages. The limitation of each algorithm
leads to the evolution of a new algorithm.
It is the simplest Disk Scheduling algorithm. It services the IO requests in the order
in which they arrive. There is no starvation in this algorithm, every request is
serviced.
Disadvantages
o The scheme does not optimize the seek time.
o The request may come from different processes therefore there is the
possibility of inappropriate movement of the head.
Example
Consider the following disk request sequence for a disk with 100 tracks 45, 21, 67,
90, 4, 50, 89, 52, 61, 87, 25
Head pointer starting at 50 and moving in left direction. Find the number of head
movements in cylinders using FCFS scheduling.
Number of cylinders moved by the head
= (50-45)+(45-21)+(67-21)+(90-67)+(90-4)+(50-4)+(89-50)+(61-52)+(87-
61)+(87-25)
= 5 + 24 + 46 + 23 + 86 + 46 + 49 + 9 + 26 + 62
= 376
SSTF Scheduling Algorithm
Shortest seek time first (SSTF) algorithm selects the disk I/O
request which requires the least disk arm movement from its
current position regardless of the direction. It reduces the total
seek time as compared to FCFS.
Disadvantages
o It may cause starvation for some requests.
o Switching direction on the frequent basis slows the working
of algorithm.
o It is not the most optimal algorithm.
Example
Consider the following disk request sequence for a disk with 100
tracks
Solution:
Number of cylinders = (50-45)+(52-45)+(61-52)+(67-61)+(87-
67)+(89-87)+(90-89)+(90-25)+(25-21)+(21-4)
5 + 7 + 9 + 6 + 20 + 2 + 1 + 65 + 4 + 17 = 136
SCAN Algorithm
It is also called as Elevator Algorithm. In this algorithm, the disk arm moves into a
particular direction till the end, satisfying all the requests coming in its path,and
then it turns backand moves in the reverse direction satisfying requests coming in
its path.
It works in the way an elevator works, elevator moves in a direction completely till
the last floor of that direction and then turns back.
Example
Consider the following disk request sequence for a disk with 200 tracks
Head pointer starting at 54 and moving in left direction. Find the number of head
movements in cylinders using SCAN scheduling.
In C-SCAN algorithm, the arm of the disk moves in a particular direction servicing
requests until it reaches the last cylinder, then it jumps to the last cylinder of the
opposite direction without servicing any request then it turns back and start moving
in that direction servicing the remaining requests.
Example
Consider the following disk request sequence for a disk with 200 tracks
Head pointer starting at 54 and moving in left direction. Find the number of head
movements in cylinders using C-SCAN scheduling.
It is like SCAN scheduling Algorithm to some extant except the difference that, in
this scheduling algorithm, the arm of the disk stops moving inwards (or outwards)
when no more request in that direction exists. This algorithm tries to overcome the
overhead of SCAN algorithm which forces disk arm to move in one direction till
the end regardless of knowing if any request exists in the direction or not.
Example
Consider the following disk request sequence for a disk with 200 tracks
98, 137, 122, 183, 14, 133, 65, 78. Head pointer starting at 54 and moving in left
direction. Find the number of head movements in cylinders using LOOK
scheduling.
It is different from C SCAN algorithm in the sense that, C SCAN force the disk
arm to move till the last cylinder regardless of knowing whether any request is to
be serviced on that cylinder or not.
Example
Consider the following disk request sequence for a disk with 200
tracks
1. A Hard Disk has 63 sectors per track, 10 platters each with two recording
surfaces and 1000 cylinder. The address of a sector is given as a triple <C,
H, S> where C is cylinder number, H is the surface number and S is the
sector number. Thus the 0th sector address will be <0, 0, 0> and 1st sector as
<0, 0, 1> and so on. What is the total sector number corresponding to the
address<400,16,29>
Ans:
#sector=63
#platter =10
#surface=2
Then to skip to the 16th surface of the cylinder numbered 400, we need
to skip another= 16*63=1,008 sector.
2. An application loads 100 libraries at the startup loading each library requires
exactly one disk access. The seek time of the disk to random location is
given as 10ms.Rotational speed of the Disk is 6000rpm.If all 100 libraries
are loaded from random locations on the disk, how long does it take to load
all libraries?
(The time to transfer data from the disk block once the head has taken
positioned at the start up the block may be neglected)
Ans:
Seek time=10msec.
Problem-3
Consider a disk queue with requests for I/O to blocks on cylinders 98, 183,
41, 122, 14, 124, 65, 67. The FCFS scheduling algorithm is used. The head
is initially at cylinder number 53. The cylinders are numbered from 0 to 199.
The total head movement (in number of cylinders) incurred while servicing
these requests is _______.
Consider a disk queue with requests for I/O to blocks on cylinders 98, 183,
41, 122, 14, 124, 65, 67. The SSTF scheduling algorithm is used. The head
is initially at cylinder number 53 moving towards larger cylinder numbers on
its servicing pass. The cylinders are numbered from 0 to 199. The total
head movement (in number of cylinders) incurred while servicing these
requests is _______.
Consider a disk queue with requests for I/O to blocks on cylinders 98, 183,
41, 122, 14, 124, 65, 67. The SCAN scheduling algorithm is used. The
head is initially at cylinder number 53 moving towards larger cylinder
numbers on its servicing pass. The cylinders are numbered from 0 to 199.
The total head movement (in number of cylinders) incurred while servicing
these requests is _______.
= 12 + 2 + 31 + 24 + 2 + 59 + 16 + 158 + 27
= 331
Problem-6
Consider a disk queue with requests for I/O to blocks on cylinders 98, 183,
41, 122, 14, 124, 65, 67. The C-SCAN scheduling algorithm is used. The
head is initially at cylinder number 53 moving towards larger cylinder
numbers on its servicing pass. The cylinders are numbered from 0 to 199.
The total head movement (in number of cylinders) incurred while servicing
these requests is _______.
Consider a disk queue with requests for I/O to blocks on cylinders 98, 183,
41, 122, 14, 124, 65, 67. The LOOK scheduling algorithm is used. The
head is initially at cylinder number 53 moving towards larger cylinder
numbers on its servicing pass. The cylinders are numbered from 0 to 199.
The total head movement (in number of cylinders) incurred while servicing
these requests is _______.
Consider a disk queue with requests for I/O to blocks on cylinders 98, 183,
41, 122, 14, 124, 65, 67. The C-LOOK scheduling algorithm is used. The
head is initially at cylinder number 53 moving towards larger cylinder
numbers on its servicing pass. The cylinders are numbered from 0 to 199.
The total head movement (in number of cylinders) incurred while servicing
these requests is _______.