OS-Chapter 5 - File Management
Chapter 5
File Systems
5.1. Fundamental Concepts:
Data: Facts and statistics collected together for reference or analysis.
The quantities, characters or symbols on which a computer performs operations are known
as data. Data may be stored and transmitted in the form of electrical signals and recorded on
magnetic, optical or mechanical recording media.
A file management system is a type of software that manages data files in a computer
system. It has limited capabilities and is designed to manage individual or group files, such as
special office documents and records.
The data may be numbers, characters or binary information. Data are things known or
assumed as facts, forming the basis of reasoning or calculation.
Metadata: means “data about data”
Metadata is data that provides information about other data. Three distinct types of
metadata exist: descriptive metadata, structural metadata and administrative metadata.
Descriptive metadata describes a resource for purposes such as discovery and identification.
It can include elements such as title, abstract, author and keywords.
Structural metadata is metadata about containers of data; it indicates how compound objects
are put together, for example how pages are ordered to form chapters. It describes the types,
versions, relationships and other characteristics of digital materials.
Administrative metadata provides information to help manage a resource, such as when
and how it was created, its file type and other technical information, and who can access it.
Metadata is traditionally used in the card catalogs of libraries and museums, and also in
digital audio files, websites, traffic analysis etc.
File: A collection of data or information that has a name, called the file name. Almost all
information stored in a computer must be in a file. Alternatively, a file is an object on a
computer that stores data, information, settings or commands used with a computer program.
All computer applications need to store and retrieve information. While a process is
running, it can store a limited amount of information within its own address space. However, the
storage capacity is restricted to the size of the virtual address space.
A second problem with keeping information within a process address space is that when
the process terminates, the information is lost. For many applications the information must be
retained for weeks, months or even forever.
A third problem is that it is frequently necessary for multiple processes to access the
information at the same time.
Thus we have three essential requirements for long-term information storage:
1. It must be possible to store a very large amount of information.
2. The information must survive the termination of the process using it.
3. Multiple processes must be able to access the information concurrently.
Magnetic disks have been used for years for this long-term storage. A disk supports two
basic operations:
Read block k
Write block k
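The two block operations can be pictured with a small sketch. This is only a toy model held in memory, not how a real disk driver works; the 512-byte block size and the names BlockDevice, read_block and write_block are assumptions made for illustration.

```python
BLOCK_SIZE = 512  # bytes per block (assumed value for illustration)


class BlockDevice:
    """Toy in-memory disk that only understands whole-block reads and writes."""

    def __init__(self, num_blocks: int) -> None:
        self.num_blocks = num_blocks
        self._storage = bytearray(num_blocks * BLOCK_SIZE)

    def read_block(self, k: int) -> bytes:
        """Return the contents of block k."""
        self._check(k)
        start = k * BLOCK_SIZE
        return bytes(self._storage[start:start + BLOCK_SIZE])

    def write_block(self, k: int, data: bytes) -> None:
        """Overwrite block k; short data is padded with zero bytes."""
        self._check(k)
        if len(data) > BLOCK_SIZE:
            raise ValueError("data does not fit in one block")
        start = k * BLOCK_SIZE
        self._storage[start:start + BLOCK_SIZE] = data.ljust(BLOCK_SIZE, b"\x00")

    def _check(self, k: int) -> None:
        if not 0 <= k < self.num_blocks:
            raise IndexError(f"block {k} out of range")


disk = BlockDevice(num_blocks=8)
disk.write_block(3, b"hello")
first_bytes = disk.read_block(3)[:5]
```

A file system hides this block-level view and lets processes work with named files instead of block numbers.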
Prepared by Ande, Lecturer, Dept of Computer Science, Gambella University, Gambella. Page 1
Operating Systems
Files are logical units of information created by processes. A disk usually contains
thousands or even millions of them, each one independent of the others. Processes can read
existing files and create new ones if need be.
File Operations: Files exist to store information and allow it to be retrieved later. Different
systems provide different operations to allow storage and retrieval. Below is a discussion of the
most common system calls relating to files.
1. Create: The file is created with no data. The purpose of the call is to announce that the file
is coming and to set some of the attributes.
2. Delete: When the file is no longer needed, it has to be deleted to free up the disk space.
3. Open: Before using a file, a process must open it. The purpose of the open call is to allow
the system to fetch the attributes into memory for rapid access on later calls.
4. Close: When all the accesses are finished, the attributes and disk address are no longer needed,
so the file should be closed to free up internal table space.
5. Read: Data are read from file. Usually, the bytes come from the current position. The caller
must specify how much data are needed and must also provide a buffer to put them in.
6. Write: Data are written to the file, again usually at the current position. If the current
position is the end of the file, the file size increases.
7. Append: This call is a restricted form of write. It can only add data to the end of the file.
Systems that provide a minimal set of system calls do not generally have append, but many
systems provide multiple ways of doing the same thing, and these systems sometimes have
append.
8. Seek: For random access files, a method is needed to specify from where to take the data. One
common approach is a system call, seek, that repositions the file pointer to a specific place in
the file. After this call has completed, data can be read from, or written to, that position.
9. Rename: It frequently happens that a user needs to change the name of an existing file. This
system call makes that possible. It is not always strictly necessary, because the file can usually be
copied to a new file with the new name, and the old file then deleted.
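The operations above can be tried out through Python's os module, which wraps the corresponding system calls on most systems. The following is a minimal sketch; the file names notes.txt and renamed.txt are hypothetical, and the file is created in a temporary directory so nothing else is touched.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "notes.txt")        # hypothetical file name
new_path = os.path.join(workdir, "renamed.txt")  # hypothetical file name

# Create + open: O_CREAT makes an empty file, O_RDWR opens it for read/write.
fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)

# Write: bytes go to the current position, which starts at 0.
os.write(fd, b"hello world")

# Seek: move the file pointer back to byte 6.
os.lseek(fd, 6, os.SEEK_SET)

# Read: the caller states how many bytes it wants.
data = os.read(fd, 5)

# Close: release the descriptor and the table space behind it.
os.close(fd)

# Append: O_APPEND forces every write to the end of the file.
fd = os.open(path, os.O_WRONLY | os.O_APPEND)
os.write(fd, b"!")
os.close(fd)

# Rename, inspect the result, then delete.
os.rename(path, new_path)
with open(new_path, "rb") as f:
    final_contents = f.read()
os.remove(new_path)
os.rmdir(workdir)
```

Here data holds the five bytes starting at position 6, and final_contents shows that the appended byte landed at the end of the file.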
File Organization: File organization refers to the way data is stored in a file. File organization
is very important because it determines the methods of access, efficiency, flexibility and storage
devices to use.
There are four methods of organizing files on storage media:
1. Sequential
2. Random or Direct
3. Serial
4. Indexed- Sequential
1. Sequential File Organization:
Records are stored and accessed in a particular order, sorted using a key field. In the
simplest case, retrieval requires searching sequentially through the file, record by record, until
the record is found or the end is reached. Because the records in the file are stored in a
particular order, better searching methods such as the binary search technique can be used to
reduce the time spent searching. For example, if the file has records with the key fields 20, 30,
40, 50 and 60 and the computer is searching for the record with key field 50, it starts at 40 and
ignores the first half of the set.
Advantages:
1. The sorting makes it easy to access records.
2. The binary chop (binary search) technique can be used to reduce record search time,
because each comparison halves the number of records that remain to be examined.
Disadvantages:
1. The sorting does not remove the need to access other records as the search looks for
particular records.
2. Sequential records cannot support modern technologies that require fast access to stored
records.
2. Random or Direct File Organization:
Records are stored randomly but accessed directly. To access a record in a file stored
randomly, a record key is used to determine where the record is stored on the storage media.
Magnetic and optical disks allow data to be stored and accessed randomly.
Advantages:
1. Quick retrieval of records.
2. The records can be of different sizes.
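As a rough sketch of how a record key can determine where a record is stored, the example below maps each key to a byte offset and seeks straight to it. For simplicity it assumes fixed-size slots and ignores key collisions; the record layout, NUM_SLOTS and the function names are all assumptions made for illustration.

```python
import struct
import tempfile

RECORD = struct.Struct("<I20s")   # key (4 bytes) + name (20 bytes); assumed layout
NUM_SLOTS = 10                    # assumed file capacity


def slot_offset(key: int) -> int:
    """Map a record key to a byte offset (collisions are not handled here)."""
    return (key % NUM_SLOTS) * RECORD.size


def put(f, key: int, name: str) -> None:
    """Store one record at the position computed from its key."""
    f.seek(slot_offset(key))
    f.write(RECORD.pack(key, name.encode()))


def get(f, key: int) -> str:
    """Fetch one record with a single seek, without scanning other records."""
    f.seek(slot_offset(key))
    stored_key, raw = RECORD.unpack(f.read(RECORD.size))
    if stored_key != key:
        raise KeyError(key)
    return raw.rstrip(b"\x00").decode()


with tempfile.TemporaryFile() as f:
    f.write(bytes(NUM_SLOTS * RECORD.size))   # pre-allocate empty slots
    put(f, 42, "Abebe")
    put(f, 17, "Sara")
    found = get(f, 42)
```

A real direct-access file would also need a rule for two keys that map to the same slot, which this sketch leaves out.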
3. Serial File Organization:
Records in a file are stored and accessed one after another. The records are not sorted
in any way on the storage medium. This type of organization is mainly used on magnetic tapes.
Advantages:
1. It is simple.
2. It is cheap.
Disadvantages:
1. It is cumbersome to access because you have to access all preceding records before
retrieving the one being searched for.
2. Space on the medium is wasted in the form of inter-record gaps.
3. It cannot support modern high-speed requirements for quick record access.
4. Indexed-Sequential File Organization:
This method is similar to the sequential method, except that an index is used to enable the
computer to locate individual records on the storage media. For example, on a magnetic drum,
records are stored sequentially on the tracks. However, each record is assigned an index that can
be used to access it directly.
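The idea can be sketched as follows: records are written in key order, and a separate index remembers where each record starts so that it can also be reached directly. The record contents, the in-memory BytesIO object standing in for a disk file, and the function names are assumptions made for illustration.

```python
import io

records = [(20, "alpha"), (30, "bravo"), (40, "charlie"), (50, "delta"), (60, "echo")]

data_file = io.BytesIO()      # stands in for a file on disk
index = {}                    # key -> byte offset of the record

# Write records in key order and remember where each one starts.
for key, value in sorted(records):
    index[key] = data_file.tell()
    data_file.write(f"{key},{value}\n".encode())


def read_direct(key: int) -> str:
    """Use the index to jump straight to one record."""
    data_file.seek(index[key])
    line = data_file.readline().decode().rstrip("\n")
    return line.split(",", 1)[1]


def read_sequential():
    """Ignore the index and walk the records in key order."""
    data_file.seek(0)
    return [int(line.decode().split(",", 1)[0]) for line in data_file]


direct_hit = read_direct(50)
all_keys = read_sequential()
```

The same file therefore supports both styles of access: read_sequential visits every record in order, while read_direct touches only the one that was asked for.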
Buffering: Buffering means that the operating system stores its own copy of data in memory
while the data is being transferred to or from devices.
The following are the uses of buffering:
To store multiple copies of files.
To access file very fast.
To maintain copy semantics.
To make searching easy.
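The effect of a buffer can be observed with a small experiment. Note that this sketch shows the buffer kept by Python's I/O library inside the process rather than the operating system's own buffer, but the principle of holding data in memory before it is transferred is the same. The 4096-byte buffer size is an assumed value.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)

with open(path, "wb", buffering=4096) as f:   # 4096-byte buffer (assumed size)
    f.write(b"hello")                         # held in the in-memory buffer
    size_before_flush = os.path.getsize(path)
    f.flush()                                 # buffer contents are handed on
    size_after_flush = os.path.getsize(path)

os.remove(path)
```

Comparing size_before_flush with size_after_flush shows that the written bytes reach the file only once the buffer is flushed.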
Sequential vs Non-sequential Files:
Function | Sequential | Non-Sequential
Storage space allocation and tracking | Volumes | Disk blocks
Aggregate reconstruction | Not available | Available
Available for use as copy storage pools or active data pools | Available | Not available
File location | Volume location is limited by the trigger prefix or by manual specification | File volumes use directories
Migration | Performed by volumes | Performed by node
Storage pool backup | Performed by volume | Performed by node and file
Efficient searching
Grouping capability
Current directory (working directory), e.g. spell/mail/prog/obj.
as several smaller sections to improve efficiency, although it reduces usable space on the hard
disk because of additional overhead from multiple operating systems.
A disk partition manager allows system administrators to create, resize, delete and
manipulate partitions, while a partition table records the location and size of each partition.
Each partition appears to the operating system as a distinct logical disk, and the operating
system reads the partition table before any other part of the disk.
Once a partition is created, it is formatted with a file system such as:
NTFS on Windows drives
BSD partition
FAT32 and exFAT for removable drives
Solaris x86
HFS Plus on Mac computers
DOS partition
Ext4 on Linux etc.
Data and files are then written to the file system on the partition. When users boot the
operating system in a computer, a critical part of the process is to give control to the first sector
on the hard disk.
This sector includes the partition table that defines how many partitions are formatted on
the hard disk, the size of each partition and the address where each disk partition begins. The sector
also contains a program that reads the boot sector for the operating system and gives it control so
that the rest of the operating system can be loaded into RAM.
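To make the contents of that first sector more concrete, the sketch below decodes a partition table laid out in the classic PC master boot record (MBR) format: four 16-byte entries starting at byte 446 and a two-byte signature at the end of the 512-byte sector. The text above does not name a specific format, so the choice of MBR is an assumption, and the sector used here is built by the example itself rather than read from a real disk.

```python
import struct

SECTOR_SIZE = 512
TABLE_OFFSET = 446                      # partition table position in a classic MBR
# Fields: status, CHS start, type, CHS end, first sector (LBA), sector count
ENTRY = struct.Struct("<B3sB3sII")


def parse_mbr(sector: bytes):
    """Return (type, first_sector, num_sectors) for each used partition entry."""
    if len(sector) != SECTOR_SIZE or sector[510:512] != b"\x55\xaa":
        raise ValueError("not a valid MBR sector")
    partitions = []
    for i in range(4):
        start = TABLE_OFFSET + i * ENTRY.size
        raw = sector[start:start + ENTRY.size]
        _status, _chs_start, ptype, _chs_end, lba_start, count = ENTRY.unpack(raw)
        if ptype != 0:                  # type 0 marks an unused entry
            partitions.append((ptype, lba_start, count))
    return partitions


# Build a synthetic sector holding one bootable Linux (type 0x83) partition.
sector = bytearray(SECTOR_SIZE)
sector[TABLE_OFFSET:TABLE_OFFSET + ENTRY.size] = ENTRY.pack(
    0x80, b"\x00\x00\x00", 0x83, b"\x00\x00\x00", 2048, 204800
)
sector[510:512] = b"\x55\xaa"
layout = parse_mbr(bytes(sector))
```

Each tuple in layout gives the partition type, the sector where the partition begins and its size in sectors, which is the information the text says the partition table holds.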
Disk Partitioning
Mounting: Mounting is the process by which the operating system makes files and directories
on a storage device (such as a hard disk drive, CD-ROM or network share) available for users to
access via the computer's file system.
In general, the process of mounting consists of the operating system acquiring access to
the storage medium, then recognizing, reading and processing the file system structure and
metadata on it, before registering them with the virtual file system (VFS) component.
The location in the VFS at which the newly mounted medium is registered is called the
mount point. When the mounting process is completed, the user can access files and directories
on the medium from there.
Unmounting: The opposite process of mounting is called unmounting, in which the operating
system cuts off all user access to files and directories on the mount point, writes the remaining
queue of user data to the storage device, refreshes file system metadata, then relinquishes access
to the device, making the storage safe for removal.
Normally, when the computer is shutting down, every mounted storage device undergoes
an unmounting process to ensure that all queued data is written, and to preserve the integrity of
the file system structure on the media.
Virtual File System: A virtual file system (VFS) is software that forms an interface between
an operating system's kernel and a more concrete file system. The VFS serves as an abstraction
layer that gives applications access to different types of file systems and to local and network
storage devices.
VFS on UNIX provides an object-oriented way of implementing file systems. VFS allows
the same system call interface (the API) to be used for different types of file systems. The API is
to the VFS interface, rather than to any specific type of file system.
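The combination of mount points and a single interface over different file system types can be illustrated with a toy model. This is only a sketch of the idea, not how any real kernel implements its VFS; the classes MemoryFS, UppercaseFS and VFS and all the paths are hypothetical.

```python
class MemoryFS:
    """Hypothetical concrete file system that keeps files in a dict."""

    def __init__(self, files):
        self._files = dict(files)

    def read(self, path: str) -> bytes:
        return self._files[path]


class UppercaseFS(MemoryFS):
    """A second, deliberately different file system type."""

    def read(self, path: str) -> bytes:
        return super().read(path).upper()


class VFS:
    """Routes one read() call to whichever file system owns the path."""

    def __init__(self):
        self._mounts = {}                      # mount point -> file system

    def mount(self, mount_point: str, fs) -> None:
        self._mounts[mount_point.rstrip("/") or "/"] = fs

    def unmount(self, mount_point: str) -> None:
        del self._mounts[mount_point.rstrip("/") or "/"]

    def read(self, path: str) -> bytes:
        # Pick the longest mount point that is a prefix of the path.
        for mp in sorted(self._mounts, key=len, reverse=True):
            prefix = mp if mp.endswith("/") else mp + "/"
            if path == mp or path.startswith(prefix):
                relative = "/" + path[len(prefix):]
                return self._mounts[mp].read(relative)
        raise FileNotFoundError(path)


vfs = VFS()
vfs.mount("/", MemoryFS({"/etc/motd": b"welcome"}))
vfs.mount("/mnt/usb", UppercaseFS({"/a.txt": b"hello"}))
root_data = vfs.read("/etc/motd")
usb_data = vfs.read("/mnt/usb/a.txt")
```

The caller uses the same read call for both paths; the VFS object decides from the mount table which concrete file system should answer.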
Memory-mapped files offer a unique memory management feature that allows
applications to access files on disk in the same way they access dynamic memory, through
pointers. With this capability you can map a view of all or part of a file on disk to a specific
range of addresses within your process's address space.
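Python exposes this feature through its standard mmap module, so a short sketch can show a file being read and changed through ordinary slicing instead of read and write calls. The file contents are made up for the example, and the file lives in a temporary location.

```python
import mmap
import os
import tempfile

fd, path = tempfile.mkstemp()
os.write(fd, b"hello memory-mapped world")
os.close(fd)

with open(path, "r+b") as f:
    # Length 0 maps the whole file into this process's address space.
    with mmap.mmap(f.fileno(), 0) as view:
        first_word = bytes(view[0:5])   # read through slicing, no read() call
        view[0:5] = b"HELLO"            # write through slicing, no write() call
        view.flush()                    # push the change back to the file

with open(path, "rb") as f:
    on_disk = f.read()

os.remove(path)
```

After the mapping is closed, on_disk reflects the change that was made through the mapped view.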
The components required to identify a file vary across operating systems, as do the
syntax and format for a valid filename.
Example: c:\directory\mufile.txt
5.7. Searching:
Searching means trying to find the information you need. Searching for a file means finding
where the file is stored in the computer. Searching can be done in two ways:
Linear search
Binary search
Linear Search: This is the simplest method of searching. In this method, the element to be
found is sequentially searched in the list. This method can be applied to a sorted or an unsorted
list.
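Binary Search: The binary search method is very fast and efficient. This method requires that
the list of elements be in sorted order. To search for an element, we compare it with the element
present at the center of the list. If it matches, the search is successful. Otherwise the search
continues in the same way in the lower half of the list if the element is smaller than the center
element, or in the upper half if it is larger.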
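Both methods can be written in a few lines. The sketch below is one straightforward way to code them, using the key values 20 to 60 from the sequential file organization example; the function names are chosen for illustration.

```python
def linear_search(items, target):
    """Check each element in turn; works on sorted or unsorted lists."""
    for i, item in enumerate(items):
        if item == target:
            return i
    return -1


def binary_search(sorted_items, target):
    """Repeatedly compare with the middle element of the remaining range."""
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            low = mid + 1      # target can only be in the upper half
        else:
            high = mid - 1     # target can only be in the lower half
    return -1


keys = [20, 30, 40, 50, 60]
pos_linear = linear_search(keys, 50)
pos_binary = binary_search(keys, 50)
```

Both calls find the key at the same position, but the linear search inspects four elements to get there while the binary search inspects only two (40, then 50).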
5.8. Access:
A file access mechanism refers to the manner in which the records of a file may be
accessed. There are several ways to access files:
Sequential access
Direct/Random access
Indexed Sequential access
5.9. Backup Strategies:
In information technology, a backup, or the process of backing up, refers to copying
computer data into an archive file so that it may be used to restore the original after a data loss
event. Backups have two distinct purposes:
1. The primary purpose is to recover data after its loss, be it by data deletion or corruption.
2. The secondary purpose of backups is to recover data from an earlier time, according to a
user defined data retention policy, typically configured within a backup application for
how long copies of data are required.
Data backup is an essential part of data center operations, but it is important to
understand what makes a backup strategy successful. Most people say that it is necessary to have
a second copy of data in case the original copy fails.
A good backup strategy is obviously going to create that second copy, but it is more
crucial that, when file recovery is needed, the data can actually be found quickly.
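One simple way to keep copies easy to find is to store each backup under a sortable label and restore the most recent one on demand. The sketch below illustrates this under stated assumptions: the helper names back_up and restore_latest, the file name report.txt and the date labels are all made up for the example, and real backup software does far more than this.

```python
import os
import shutil
import tempfile


def back_up(source: str, backup_dir: str, label: str) -> str:
    """Copy source into backup_dir under a name that includes the label."""
    os.makedirs(backup_dir, exist_ok=True)
    target = os.path.join(backup_dir, f"{label}_{os.path.basename(source)}")
    shutil.copy2(source, target)          # copy2 also preserves timestamps
    return target


def restore_latest(original: str, backup_dir: str) -> str:
    """Restore the copy whose label sorts last (assumes sortable labels)."""
    suffix = "_" + os.path.basename(original)
    candidates = sorted(n for n in os.listdir(backup_dir) if n.endswith(suffix))
    latest = os.path.join(backup_dir, candidates[-1])
    shutil.copy2(latest, original)
    return latest


workdir = tempfile.mkdtemp()
doc = os.path.join(workdir, "report.txt")        # hypothetical file
backups = os.path.join(workdir, "backups")

with open(doc, "w") as f:
    f.write("v1")
back_up(doc, backups, "2024-01-01")              # labels are made-up dates
with open(doc, "w") as f:
    f.write("v2")
back_up(doc, backups, "2024-01-02")

with open(doc, "w") as f:
    f.write("corrupted")                         # simulate data loss
restore_latest(doc, backups)
with open(doc) as f:
    restored = f.read()

shutil.rmtree(workdir)
```

Because the labels sort in date order, the most recent good copy can be located without inspecting the contents of every backup.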
CDs or DVDs
Flash memory
Hard Disk Drive
Backup software
Cloud storage
Compression
Duplication
Cache etc.