Osy 6
File management
Course Outcome:Apply File Management Techniques
File: It is necessary to store information on disks and other secondary storage in units called files.
A file can be defined as a data structure that stores a sequence of records. Files are stored in a file system, which may exist on a disk or in main memory. Files can be simple (plain text) or complex (specially formatted).
A collection of files is known as a directory, and the collection of directories at different levels is known as a file system.
File Attributes:
1. Name
Every file carries a name by which it is recognized in the file system. One directory cannot have two files with the same name.
2. Identifier
Along with the name, each file has an identifier: a unique tag, usually a number, by which the file system recognizes the file internally. A file's extension (for example, .txt for a text file or .mp4 for a video file) also helps identify its type to the user.
3. Type
In a file system, files are classified into different types such as video files, audio files, text files, executable files, etc.
4. Location
In the file system, there are several locations at which files can be stored. Each file carries its location as an attribute.
5. Size
The size of the file is one of its most important attributes. By the size of the file, we mean the number of bytes occupied by the file on storage.
6. Protection
The administrator of the computer may want different protections for different files. Therefore each file carries its own set of permissions for different groups of users.
7. Time and date
Every file carries a time stamp that records the time and date at which the file was last modified.
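Most of the attributes above can be inspected from a program. The sketch below uses Python's standard os and stat modules; the file name notes.txt is a hypothetical example created only for this illustration.

```python
import os
import stat
import time

# Create a small file so there is something to inspect
# ("notes.txt" is a hypothetical name used only for this sketch).
with open("notes.txt", "w") as f:
    f.write("hello file system")

info = os.stat("notes.txt")              # fetch the file's attribute record
size = info.st_size                      # Size attribute, in bytes
modified = time.ctime(info.st_mtime)     # time stamp of last modification
mode = stat.filemode(info.st_mode)       # Protection attribute, e.g. -rw-r--r--

os.remove("notes.txt")                   # clean up the temporary file
```

The stat record also carries the file's identifier (st_ino on Unix) and its location on the device, matching the attribute list above.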
Operations on the File:
A file is a collection of logically related data recorded on secondary storage in the form of a sequence of bits or records. The content of a file is defined by its creator. The various operations that can be performed on a file, such as read, write, open and close, are called file operations. These operations are performed by the user through commands provided by the operating system. Some common operations are as follows:
1. Create operation:
This operation creates a file in the file system. To create a new file of a particular type, the associated application program calls the file system, which allocates space to the file. Since the file system knows the format of the directory structure, an entry for the new file is made in the appropriate directory.
2. Open operation:
This is the most common operation performed on a file. Once a file is created, it must be opened before any file processing operations can be performed on it. To open a particular file, the user provides its name; the operating system invokes the open system call and passes the file name to the file system.
3. Write operation:
This operation writes information into a file. A write system call is issued that specifies the name of the file and the length of the data to be written. The file length is increased by the specified amount and the file pointer is repositioned after the last byte written.
4. Read operation:
This operation reads the contents of a file. A read pointer is maintained by the OS, pointing to the position up to which the data has been read.
5. Seek operation:
The seek system call repositions the file pointer from the current position to a specific place in the file, forward or backward, depending upon the user's requirement. This operation is generally performed on file systems that support direct-access files.
6. Delete operation:
Deleting a file not only removes all the data stored inside it but also frees the disk space it occupied. To delete a specified file, the directory is searched; when the directory entry is located, the associated file space and the directory entry are released.
7. Truncate operation:
Truncating deletes the contents of a file while keeping its attributes. The file is not completely deleted; only the information stored inside it is discarded and its length is reset.
8. Close operation:
When processing of a file is complete, it should be closed so that all changes are made permanent and all occupied resources are released. On closing, the OS deallocates all the internal descriptors that were created when the file was opened.
9. Append operation:
This operation adds data at the end of an existing file without overwriting its current contents; the file pointer is positioned at the end of the file before writing.
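The operations above can be exercised in sequence from a short program. The sketch below uses Python's built-in file interface; demo.dat is a hypothetical file name used only for this illustration.

```python
import os

# Create and open a file for reading and writing (open creates it if absent).
f = open("demo.dat", "w+")
f.write("operating systems")  # write: the pointer moves past the last byte
f.seek(0)                     # seek: reposition the pointer to the start
first = f.read(9)             # read: consume 9 bytes from the pointer onward
f.truncate(9)                 # truncate: keep the first 9 bytes, drop the rest
f.close()                     # close: flush changes, release descriptors

with open("demo.dat", "a") as g:  # append: pointer starts at end of file
    g.write(" rock")

final = open("demo.dat").read()
os.remove("demo.dat")         # delete: free the directory entry and disk space
```

Note how truncate keeps the file (and its attributes) in the directory while discarding data past the given length, exactly as described above.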
File Types:
1) Regular files
2) Directories
3) Character special files
4) Block special files
1)Regular File:
In computing, a regular file is a type of file that contains user-created data in a specific
format. Regular files can be text files, binary files, or any other type containing data that is
not a directory or a special file. These files are typically identified by their file extensions or
by their content. For example, a text file may have a .txt extension, and a JPEG image file
might have a .jpg extension.
2) Directory:
Directories store both special and ordinary files. For users familiar with Mac or Windows operating systems, Unix directories are equivalent to folders. A directory file includes an entry for each file and subdirectory it houses; if there are 5 files in a directory, there will be 5 entries in it. Each entry comprises two components: the file name and its inode number.
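These directory entries can be listed programmatically. The sketch below assumes a Unix-like system where each entry pairs a name with an inode number; demo_dir and its files are hypothetical names created only for this illustration.

```python
import os

# Build a small directory with two files in it (hypothetical names).
os.makedirs("demo_dir", exist_ok=True)
for name in ("a.txt", "b.txt"):
    open(os.path.join("demo_dir", name), "w").close()

# Each directory entry pairs a file name with an inode number.
with os.scandir("demo_dir") as it:
    entries = {entry.name: entry.inode() for entry in it}

# Clean up the temporary directory.
for name in entries:
    os.remove(os.path.join("demo_dir", name))
os.rmdir("demo_dir")
```

With two files present, the directory holds exactly two entries, matching the description above.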
3) Character special files:
These are related to input/output and are used to model serial I/O devices such as terminals, printers and networks.
4) Block special files: If we use a block special file for device input/output (I/O), the data is moved in fixed-size blocks. This kind of access is known as block device access.
For character devices such as terminals, raw access transfers one character at a time. For disk devices, however, raw access means reading or writing whole data blocks that are native to the disk.
File System:
A file system provides efficient access to the disk by allowing data to be stored, located and retrieved in a convenient way. A file system must be able to store a file, locate it and retrieve it.
Most operating systems use a layering approach for every task, including file systems. Every layer of the file system is responsible for some of its activities.
The image shown below elaborates how the file system is divided into different layers, and also the functionality of each layer.
o When an application program asks for a file, the request is first directed to the logical file system. The logical file system contains the metadata of the file and the directory structure. If the application program doesn't have the required permissions for the file, this layer throws an error. The logical file system also verifies the path to the file.
o Generally, files are divided into various logical blocks. Files are stored on and retrieved from the hard disk, which is divided into tracks and sectors. Therefore, to store and retrieve files, the logical blocks must be mapped to physical blocks. This mapping is done by the file organization module, which is also responsible for free space management.
o Once the file organization module has decided which physical block the application program needs, it passes this information to the basic file system. The basic file system is responsible for issuing commands to the I/O control in order to fetch those blocks.
o I/O control contains the code through which the hard disk can be accessed. This code is known as a device driver. I/O control is also responsible for handling interrupts.
File Access Methods:
1) Sequential Access
Most operating systems access files sequentially; in other words, most files need to be accessed sequentially by the operating system.
In sequential access, the OS reads the file word by word. A pointer is maintained which initially points to the base address of the file. If the user wants to read the first word of the file, the pointer provides that word to the user and advances by one word. This process continues till the end of the file.
Modern operating systems do provide the concepts of direct access and indexed access, but the most used method is sequential access, because most files, such as text files, audio files and video files, need to be accessed sequentially.
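The advancing read pointer can be mimicked in a few lines. The sketch below writes and then consumes a small file word by word; seq.txt is a hypothetical file name used only for this illustration.

```python
import os

# Write a small file, then read it back "word by word", the way the
# sequential read pointer advances through a file.
with open("seq.txt", "w") as f:
    f.write("files are read in order")

words = []
with open("seq.txt") as f:
    for word in f.read().split():  # each step moves past one word
        words.append(word)

os.remove("seq.txt")  # clean up the temporary file
```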
2)Direct Access
Direct access is mostly required in the case of database systems. In most cases, we need filtered information from the database, and sequential access can be very slow and inefficient for that.
Suppose every block of storage stores 4 records and we know that the record we need is stored in the 10th block. In that case, sequential access will be inefficient, because it would traverse all the preceding blocks in order to reach the needed record.
Direct access gives the required result directly, even though the operating system has to perform some complex tasks such as determining the desired block number. It is generally implemented in database applications.
Swapping:
The purpose of swapping in an operating system is to access data present on the hard disk and bring it into RAM so that application programs can use it. The thing to remember is that swapping is used only when the data is not present in RAM.
Although the process of swapping affects the performance of the system, it helps to run larger processes, and more than one process. This is the reason swapping is also referred to as memory compaction.
The concept of swapping is divided into two further concepts: swap-in and swap-out.
o Swap-out is a method of removing a process from RAM and adding it to the hard
disk.
o Swap-in is a method of removing a program from a hard disk and putting it back into
the main memory or RAM.
Advantages of Swapping
1. It helps the CPU to manage multiple processes within a single main memory.
2. It helps to create and use virtual memory.
3. Swapping allows the CPU to perform multiple tasks simultaneously. Therefore,
processes do not have to wait very long before they are executed.
4. It improves the main memory utilization.
Disadvantages of Swapping
1. If the computer system loses power, the user may lose all information related to the
program in case of substantial swapping activity.
2. If the swapping algorithm is not good, it can increase the number of page faults and decrease the overall processing performance.
File Allocation Methods:
There are various methods which can be used to allocate disk space to files. Selection of an appropriate allocation method significantly affects the performance and efficiency of the system. An allocation method provides a way in which the disk will be utilized and the files will be accessed.
1. Contiguous Allocation.
2. Linked Allocation
3. Indexed Allocation
1) Contiguous Allocation
If the blocks are allocated to the file in such a way that all the logical blocks of the file get contiguous physical blocks on the hard disk, then such an allocation scheme is known as contiguous allocation.
In the image shown below, there are three files in the directory. The starting block and the length of each file are mentioned in the table. We can see in the table that contiguous blocks are assigned to each file as per its need.
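Under contiguous allocation, mapping a logical block to its physical block is a single addition. The sketch below uses a hypothetical directory table (the file names and block numbers are invented for illustration, in the spirit of the table described above).

```python
# Hypothetical directory table: file -> (start block, length in blocks).
directory = {
    "mail": (19, 6),
    "list": (28, 4),
    "f":    (6, 2),
}


def physical_block(name, logical_block):
    """Contiguous allocation: physical block = start + logical offset."""
    start, length = directory[name]
    if logical_block >= length:
        raise IndexError("logical block out of range")
    return start + logical_block


block = physical_block("mail", 3)  # 4th block of "mail" is at 19 + 3
```

This simple base-plus-offset mapping is why contiguous allocation gives excellent read performance and cheap random access.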
Advantages
1. It is simple to implement.
2. We will get Excellent read performance.
3. Supports Random Access into files.
Disadvantages
1. The disk suffers from external fragmentation.
2. It is difficult to grow a file, since the blocks adjacent to it may already be allocated.
2) Linked Allocation
Linked list allocation solves the problems of contiguous allocation. In linked list allocation, each file is considered as a linked list of disk blocks. The disk blocks allocated to a particular file need not be contiguous on the disk; instead, each disk block allocated to a file contains a pointer to the next disk block of the same file.
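The chain of block pointers can be modelled with a small table. In the sketch below, the block numbers are hypothetical and -1 marks the end of a file's chain.

```python
# Hypothetical pointer table: each block stores the number of the next
# block of the same file; -1 marks the end of the file.
next_block = {9: 16, 16: 1, 1: 10, 10: 25, 25: -1}


def blocks_of(start):
    """Follow the pointer chain from the file's first block."""
    chain, block = [], start
    while block != -1:
        chain.append(block)
        block = next_block[block]
    return chain


chain = blocks_of(9)  # all blocks of the file whose first block is 9
```

Note that reaching the nth block requires walking the whole chain from the start, which is why linked allocation handles sequential access well but direct access poorly.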
Advantages
1. There is no external fragmentation.
2. A file can grow easily, since any free block can be linked onto its chain.
Disadvantages
1. Random access is very slow, because the chain of pointers must be followed from the beginning of the file.
2. Pointers consume space in every disk block, and the loss of a single pointer breaks the rest of the chain.
3) Indexed Allocation
Instead of maintaining a file allocation table of all the disk pointers, the indexed allocation scheme stores all the disk pointers in one block called the index block. The index block doesn't hold the file data, but it holds the pointers to all the disk blocks allocated to that particular file. The directory entry contains only the index block address.
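The lookup then takes one extra step through the index block and is direct after that. The sketch below uses hypothetical block numbers and a hypothetical file name for illustration.

```python
# Hypothetical on-disk layout: the directory maps a file name to its
# index block, and the index block lists all of the file's data blocks.
index_blocks = {19: [9, 16, 1, 10, 25]}  # index block 19 for one file
directory = {"jeep": 19}


def data_block(name, logical_block):
    """One extra lookup into the index block, then direct access."""
    index = index_blocks[directory[name]]
    return index[logical_block]


block = data_block("jeep", 2)  # 3rd block of "jeep"
```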
Advantages
1. It supports direct access without external fragmentation.
2. The directory entry stays small, since it holds only the index block address.
Disadvantages
1. One extra disk access is needed to read the index block.
2. The maximum file size is limited by the number of pointers an index block can hold, and for small files the index block wastes space.
What is a directory?
A directory can be defined as a listing of the related files on the disk. The directory may store some or all of the file attributes.
To get the benefit of different file systems on different operating systems, a hard disk can be divided into a number of partitions of different sizes. The partitions are also called volumes or minidisks.
Each partition must have at least one directory in which, all the files of the partition can be
listed. A directory entry is maintained for each file in the directory which stores all the
information related to that file.
Types of Directory:
1. Single Level Directory
The simplest method is to have one big list of all the files on the disk. The entire system contains only one directory, which is supposed to list all the files present in the file system. The directory contains one entry per file.
Advantages
1. It is simple to implement.
2. Searching is fast when the number of files is small.
Disadvantages
1. We cannot have two files with the same name.
2. The directory may be very big therefore searching for a file may take so much time.
3. Protection cannot be implemented for multiple users.
4. There is no way to group files of the same kind.
5. Choosing a unique name for every file is complex and limits the number of files in the system, because most operating systems limit the number of characters that can be used in a file name.
2. Two Level Directory
In a two level directory system, we can create a separate directory for each user. There is one master directory which contains separate directories dedicated to each user. For each user, there is a different directory at the second level, containing that user's files. The system doesn't let a user enter another user's directory without permission.
Every operating system maintains a variable, PWD, which contains the present directory name (the present user's name) so that searching can be done appropriately.
Each user has its own directory and it cannot enter in the other user's directory. However, the
user has the permission to read the root's data but he cannot write or modify this. Only
administrator of the system has the complete access of root directory.
3. Tree Structured Directory
Searching is more efficient in this directory structure. The concept of the current working directory is used. A file can be accessed by two types of path: relative or absolute. An absolute path is the path of the file with respect to the root directory of the system, while a relative path is the path with respect to the current working directory. In tree structured directory systems, the user is given the privilege to create files as well as directories.
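The relationship between the two kinds of path can be shown with the standard library. The sketch below assumes a Unix-style path layout; the directory names are hypothetical examples.

```python
import posixpath  # Unix-style path rules, independent of the host OS

# Hypothetical current working directory in a tree-structured file system.
cwd = "/home/user/docs"
relative = "reports/summary.txt"  # path relative to the working directory

# Joining the relative path onto the working directory yields the
# absolute path, i.e. the path from the root of the tree.
absolute = posixpath.normpath(posixpath.join(cwd, relative))
```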
RAID or redundant array of independent disks is a data storage virtualization technology that
combines multiple physical disk drive components into one or more logical units for data
redundancy, performance improvement, or both.
It is a way of storing the same data in different places on multiple hard disks or solid-state drives (SSDs) to protect data in the case of a drive failure. A RAID system consists of two or more drives working in parallel. These can be hard disks, but there is a trend toward using SSD technology.
RAID combines several independent and relatively small disks into a single storage unit of large size. The disks included in the array are called array members. The disks can be combined into the array in different ways, which are known as RAID levels. Each RAID level has its own characteristics of fault tolerance, performance, and capacity.
Levels of RAID
Many different ways of distributing data have been standardized into various RAID levels.
Each RAID level is offering a trade-off of data protection, system performance, and storage
space. The number of levels has been broken into three categories, standard, nested, and non-
standard RAID levels.
Below are the most popular and standard RAID levels.
1. RAID 0 (Striped disks)
RAID 0 takes any number of disks and merges them into one large volume. It increases speed, as you are reading from and writing to multiple disks at a time, so an individual file can use the speed and capacity of all the drives of the array. The downside of RAID 0, though, is that it is NOT redundant: the loss of any individual disk causes complete data loss. This RAID type is much less reliable than having a single disk.
There is rarely a situation where you should use RAID 0 in a server environment. You can
use it for cache or other purposes where speed is essential, and reliability or data loss does not
matter at all.
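The round-robin striping of RAID 0 can be sketched in a few lines. The chunk size and data below are invented for illustration; real controllers stripe at the block level.

```python
def stripe(data, num_disks, chunk=4):
    """RAID 0: distribute consecutive chunks round-robin across the disks."""
    disks = [bytearray() for _ in range(num_disks)]
    for i in range(0, len(data), chunk):
        disks[(i // chunk) % num_disks].extend(data[i:i + chunk])
    return disks


# 16 bytes striped over 2 disks: chunks alternate disk 0, disk 1, disk 0, ...
disks = stripe(b"ABCDEFGHIJKLMNOP", 2)
```

Because every file's chunks are spread over all members, losing one disk destroys part of every file, which is why RAID 0 offers no redundancy.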
2. RAID 1 (Mirroring)
RAID 1 duplicates data across two disks in the array, providing full redundancy. Both disks store exactly the same data, at the same time, and at all times. Data is not lost as long as one disk survives. The total capacity of the array equals the capacity of the smallest disk in the array; at any given instant, the contents of both disks are identical.
4. RAID 5 (Striped disks with distributed parity)
RAID 5 requires the use of at least three drives. It combines these disks to protect data against the loss of any one disk; the array's storage capacity is reduced by one disk. It stripes data across multiple drives to increase performance, and it also adds redundancy by distributing parity information across the disks.
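The parity used by RAID 5 is a bytewise XOR of the data chunks, which is what lets the array rebuild any single lost chunk. The sketch below uses three invented data chunks for illustration.

```python
def xor_parity(chunks):
    """XOR equal-sized chunks together; the result is the parity chunk.
    XOR-ing the parity with all surviving chunks rebuilds a lost one."""
    parity = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, b in enumerate(chunk):
            parity[i] ^= b
    return bytes(parity)


data = [b"AAAA", b"BBBB", b"CCCC"]  # chunks on three data disks
parity = xor_parity(data)

# Simulate losing the second disk and rebuilding its chunk from the rest.
rebuilt = xor_parity([data[0], data[2], parity])
```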
5. RAID 6 (Striped disks with double parity)
RAID 6 is similar to RAID 5, but the parity data are written to two drives. The use of
additional parity enables the array to continue to function even if two disks fail
simultaneously. However, this extra protection comes at a cost. RAID 6 has a slower
write performance than RAID 5.