OSY Notes Vol 2 (6th Chapter) - Ur Engineering Friend
File Management
Concept of File
1. File Name: This is the human-readable name of the file, which is used to
identify and reference it.
2. File Extension: The file extension is typically a part of the file name and
indicates the file type or format. For example, ".txt" is often used for plain text
files, ".jpg" for image files, and ".exe" for executable programs.
3. File Size: This attribute specifies the size of the file in bytes or another
appropriate unit of measurement, indicating how much storage space the file
occupies.
4. File Location: The file's path or directory location specifies where the file is
stored within the file system's hierarchy. It includes information about the
folder(s) in which the file is contained.
5. Date Created: This attribute indicates the date and time when the file was
initially created.
6. Date Modified: This attribute indicates the date and time when the file was
last modified or updated.
7. Date Accessed: Some operating systems track the date and time when the file
was last accessed, although this attribute is often disabled by default due to
performance considerations.
8. File Permissions: File permissions define who can access, read, write, or
execute the file. These permissions are usually categorized into read, write, and
execute permissions for the owner, group, and others.
9. File Type: This attribute provides information about the type or format of the
file, which can be used by the operating system and associated applications to
determine how to handle the file.
10. File Owner: Every file is associated with an owner, typically the user
account that created the file. The owner has special privileges and control over
the file's permissions.
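As an illustration, most of the attributes listed above can be read from a running system. The following is a minimal sketch that assumes Python's standard os and stat modules; the helper name describe_file and the demo file notes.txt are made up for this example. Creation date is left out because it is not reported consistently across operating systems.

```python
import os
import stat
import tempfile
import time


def describe_file(path):
    """Collect a few of the attributes listed above for one file."""
    info = os.stat(path)
    name = os.path.basename(path)
    return {
        "name": name,
        "extension": os.path.splitext(name)[1],
        "size_bytes": info.st_size,
        "location": os.path.dirname(os.path.abspath(path)),
        "modified": time.ctime(info.st_mtime),
        "accessed": time.ctime(info.st_atime),
        # Read/write/execute flags for owner, group, and others.
        "permissions": stat.filemode(info.st_mode),
        # Numeric id of the owning user (always 0 on Windows).
        "owner_id": info.st_uid,
    }


# Hypothetical demo file, created in a temporary directory.
with tempfile.TemporaryDirectory() as folder:
    demo_path = os.path.join(folder, "notes.txt")
    with open(demo_path, "w") as f:
        f.write("hello")
    attributes = describe_file(demo_path)
    for key, value in attributes.items():
        print(key, "=", value)
```

The exact permission string and timestamps depend on the system the sketch runs on.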
File Operations
In an operating system, file operations are a set of actions that can be performed
on files to create, read, write, update, delete, and manage them. These
operations are essential for managing and manipulating data within the file
system. Here are some of the most common file operations:
1. File Creation: Creating a file involves specifying a file name and optionally
choosing a location within the file system. The operating system reserves space
for the file and assigns initial attributes such as creation date and permissions.
2. File Reading: Reading from a file allows you to retrieve data from an
existing file. Reading can be done sequentially or randomly, depending on the
file access method. The operating system provides system calls and APIs for
reading data from files.
5. File Updating: Updating a file means modifying specific parts of its content,
typically within the file's structure. For instance, updating a database file might
involve changing a record's data without affecting the rest of the file.
6. File Deletion: Deleting a file removes it from the file system, freeing up
storage space. Care should be taken when deleting files, as they may not always
be recoverable from the recycle bin or trash.
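The operations described above can be tried out with a few lines of code. This is only a sketch using Python's built-in open() and the os module; the file name demo.txt and its contents are invented for the example.

```python
import os
import tempfile

folder = tempfile.mkdtemp()
path = os.path.join(folder, "demo.txt")  # hypothetical file name

# Creation: mode "xb" creates a new file and fails if it already exists.
with open(path, "xb") as f:
    f.write(b"AAAA-BBBB-CCCC")

# Reading: retrieve the data of the existing file.
with open(path, "rb") as f:
    original = f.read()

# Updating: mode "r+b" allows writing without erasing the file,
# so only the bytes at offsets 5..8 are changed.
with open(path, "r+b") as f:
    f.seek(5)
    f.write(b"XXXX")

with open(path, "rb") as f:
    updated = f.read()

# Deletion: the file is removed directly, it does not go to a recycle bin.
os.remove(path)
os.rmdir(folder)
still_exists = os.path.exists(path)

print(original, updated, still_exists)
```

Note that the update step changes four bytes in the middle of the file and leaves the rest untouched, which is the difference between updating and rewriting a file.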
File System Structure
A record sequence file system stores data as records, where each record
consists of a fixed-size block or entry.
This structure is well-suited for applications where data is organized into
records, such as databases.
Records are typically identified by a record number or key and can be
read, written, or updated individually.
Record sequence file systems provide efficient access to specific data
within a file without the need to read or write the entire file.
This structure is commonly used in database management systems
(DBMS) and other data-centric applications.
File Contents:
Direct access, also known as random access, allows you to read or write data at
a specific location within the file without the need to traverse the entire file
sequentially.
Each data item within the file is associated with an address or index,
allowing you to directly access the desired item.
This method is suitable for tasks that require quick access to specific data
within a file, such as database systems or indexed data structures.
Example: Accessing a specific record in a database file by its record
number. You don't need to read through all the records to find the one
you're interested in.
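The record-number example above can be sketched in code. The sketch assumes fixed-size records of 20 bytes (a 4-byte id and a 16-byte name), a layout chosen only for illustration, and uses an in-memory io.BytesIO object as a stand-in for a file on disk. The helper names write_record and read_record are hypothetical.

```python
import io
import struct

# Assumed record layout: unsigned 4-byte id + 16-byte name = 20 bytes.
RECORD = struct.Struct("<I16s")


def write_record(f, number, rec_id, name):
    """Write one record directly at its position in the file."""
    f.seek(number * RECORD.size)
    f.write(RECORD.pack(rec_id, name.encode("ascii")))


def read_record(f, number):
    """Jump to record `number` without reading the records before it."""
    f.seek(number * RECORD.size)
    rec_id, raw = RECORD.unpack(f.read(RECORD.size))
    return rec_id, raw.rstrip(b"\x00").decode("ascii")


f = io.BytesIO()  # stands in for a database file
for n, name in enumerate(["alice", "bob", "carol", "dave"]):
    write_record(f, n, 100 + n, name)

third = read_record(f, 2)          # direct read of record number 2
write_record(f, 1, 999, "bobby")   # update a single record in place
second = read_record(f, 1)
print(third, second)
```

Because every record has the same size, the byte offset of record n is simply n times the record size, which is what makes direct access possible.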
The allocation methods define how the files are stored in the disk blocks.
There are three main disk space or file allocation methods.
Contiguous Allocation
Linked Allocation
Indexed Allocation
1. Contiguous Allocation
In this scheme, each file occupies a contiguous set of blocks on the disk. For
example, if a file requires n blocks and is given a block b as the starting
location, then the blocks assigned to the file will be: b, b+1, b+2, ..., b+n-1.
This means that given the starting block address and the length of the
file (in terms of blocks required), we can determine the blocks occupied
by the file.
The directory entry for a file with contiguous allocation contains
Address of starting block
Length of the allocated portion.
The file ‘mail’ in the following figure starts at block 19 with length = 6
blocks. Therefore, it occupies blocks 19, 20, 21, 22, 23 and 24.
Advantages:
Both sequential and direct access are supported. For direct access, the
address of the kth block of a file that starts at block b (counting k from 0)
can easily be obtained as (b+k).
This is extremely fast since the number of seeks is minimal because of
contiguous allocation of file blocks.
Disadvantages:
This method suffers from both internal and external fragmentation. This
makes it inefficient in terms of disk space utilization.
Increasing file size is difficult because it depends on the availability of
contiguous free blocks at that particular instant.
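The address arithmetic of contiguous allocation can be written out as a short sketch. The function names blocks_of and kth_block are invented for this example, and k is counted from 0.

```python
def blocks_of(start, length):
    """All disk blocks of a file stored contiguously from `start`."""
    return list(range(start, start + length))


def kth_block(start, length, k):
    """Disk address of the k-th file block, counting k from 0."""
    if not 0 <= k < length:
        raise IndexError("block index outside the file")
    return start + k


# The file 'mail' from the notes: starting block 19, length 6.
mail_blocks = blocks_of(19, 6)
fourth = kth_block(19, 6, 3)
print(mail_blocks, fourth)
```

Only the starting block and the length are needed, which is why the directory entry stores exactly those two values.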
2. Linked List Allocation
In this scheme, each file is a linked list of disk blocks which need not
be contiguous. The disk blocks can be scattered anywhere on the disk.
The directory entry contains a pointer to the starting and the ending file block.
Each block contains a pointer to the next block occupied by the file.
The file ‘jeep’ in the following image shows how the blocks are randomly
distributed. The last block (25) contains -1 indicating a null pointer and
does not point to any other block.
Advantages:
This is very flexible in terms of file size. File size can be increased easily
since the system does not have to look for a contiguous chunk of disk space.
This method does not suffer from external fragmentation. This makes it
relatively better in terms of disk space utilization.
Disadvantages:
Because the file blocks are distributed randomly on the disk, a large number
of seeks is needed to access every block individually. This makes linked
allocation slower.
It does not support random or direct access. We cannot directly access the
blocks of a file. Block k of a file can only be reached by traversing k blocks
sequentially (sequential access) from the starting block of the file via block
pointers.
Pointers required in the linked allocation incur some extra overhead.
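The pointer chasing described above can be sketched with a small table of next-block pointers. Only the final block (25, holding -1) comes from the notes; the other block numbers in next_block are assumed for illustration, and the function names are hypothetical.

```python
def file_blocks(next_block, start):
    """Follow the chain of pointers from the starting block to the end."""
    blocks = []
    current = start
    while current != -1:  # -1 marks the null pointer
        blocks.append(current)
        current = next_block[current]
    return blocks


def kth_block(next_block, start, k):
    """Reach block k (counted from 0) by following k pointers in turn."""
    if k < 0:
        raise IndexError("block index outside the file")
    current = start
    for _ in range(k):
        current = next_block[current]
        if current == -1:
            raise IndexError("block index outside the file")
    return current


# Assumed layout for 'jeep': each block stores the number of the next one.
next_block = {9: 16, 16: 1, 1: 10, 10: 25, 25: -1}
jeep = file_blocks(next_block, 9)
last = kth_block(next_block, 9, 4)
print(jeep, last)
```

Reaching block k costs k pointer lookups, each of which may be a separate disk read, which is why direct access is not practical with this scheme.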
3. Indexed Allocation
In this scheme, a special block known as the Index block contains the pointers
to all the blocks occupied by a file. Each file has its own index block. The ith
entry in the index block contains the disk address of the ith file block. The
directory entry contains the address of the index block as shown in the image:
Advantages:
This supports direct access to the blocks occupied by the file and therefore
provides fast access to the file blocks.
It overcomes the problem of external fragmentation.
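A small sketch shows why the index block gives direct access. All block numbers, the file name, and the function name ith_block are assumed for illustration; the disk is modelled as a dictionary from block address to block contents.

```python
# Assumed example: the index block lives at disk address 19 and lists
# the disk addresses of the file's blocks in order.
disk = {19: [9, 16, 1, 10, 25]}

# The directory entry stores only the address of the index block.
directory = {"jeep": 19}


def ith_block(directory, disk, name, i):
    """Disk address of the i-th file block (i counted from 0)."""
    index_block = disk[directory[name]]
    if not 0 <= i < len(index_block):
        raise IndexError("block index outside the file")
    return index_block[i]


third = ith_block(directory, disk, "jeep", 2)
print(third)
```

Any block is found with one lookup in the index block, regardless of its position in the file, in contrast to following a chain of pointers.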
Disadvantages:
Directory Structure
1) Single-level directory:
The single-level directory is the simplest directory structure. In it, all files
are contained in the same directory which makes it easy to support and
understand.
A single-level directory has a significant limitation, however, when the
number of files increases or when the system has more than one user. Since all
the files are in the same directory, they must have unique names. If two users
both call their dataset test, then the unique name rule is violated.
Advantages:
Disadvantages:
There is a chance of name collision because two files cannot have the same
name.
Searching becomes time-consuming if the directory is large.
Files of the same type cannot be grouped together.
2) Two-level directory:
Advantages:
The main advantage is that different users can have files with the same name,
which is very helpful when there are multiple users.
It provides security by preventing a user from accessing other users’
files.
Searching for files becomes easier, since only one user's directory has to be searched.
Disadvantages:
The same isolation that provides security also means that a user
cannot share files with other users.
Although users can create their own files, they do not have the
ability to create subdirectories.
It does not scale well because a user cannot group files of the same type
together.
3) Tree-structured directory:
Disadvantages:
As a user isn’t allowed to access another user’s directory, this prevents
file sharing among users.
As the user has the capability to make subdirectories, searching may become
complicated if the number of subdirectories increases.
Users cannot modify the root directory data.
If files do not fit in one directory, they might have to be placed in other directories.
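The name collision problem of the single-level directory, and the way a per-user level avoids it, can be sketched with dictionaries. The class names and the user names alice and bob are invented for this example; this is a toy model, not how any particular file system stores its directories.

```python
class SingleLevelDirectory:
    """One flat table: every file name must be unique."""

    def __init__(self):
        self.files = {}

    def create(self, name, data):
        if name in self.files:
            raise FileExistsError(name)
        self.files[name] = data


class TwoLevelDirectory:
    """One private flat directory per user."""

    def __init__(self):
        self.users = {}

    def create(self, user, name, data):
        user_dir = self.users.setdefault(user, SingleLevelDirectory())
        user_dir.create(name, data)

    def read(self, user, name):
        return self.users[user].files[name]


# Single level: the second file called "test" is rejected.
flat = SingleLevelDirectory()
flat.create("test", "data of first user")
try:
    flat.create("test", "data of second user")
    collision = False
except FileExistsError:
    collision = True

# Two level: both users can keep their own "test".
two_level = TwoLevelDirectory()
two_level.create("alice", "test", "alice's data")
two_level.create("bob", "test", "bob's data")
print(collision, two_level.read("alice", "test"), two_level.read("bob", "test"))
```

In the two-level model a file is identified by the pair (user, name), so the same name can appear once per user.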
RAID Levels
RAID combines several independent and relatively small disks into a single
large storage unit. The disks included in the array are called array
members. The disks can be combined into the array in different ways, which are
known as RAID levels.
RAID arrays appear to the operating system as a single logical drive. RAID
employs the techniques of disk mirroring or disk striping.
o Disk Mirroring will copy identical data onto more than one drive.
o Disk Striping partitions data and spreads it over multiple disk drives. Each
drive's storage space is divided into units ranging from 512 bytes up to
several megabytes. The stripes of all the disks are interleaved and
addressed in order.
o Disk mirroring and disk striping can also be combined in a RAID array.
Levels of RAID
Many different ways of distributing data have been standardized into various
RAID levels. Each RAID level offers a trade-off between data protection, system
performance, and storage space. The levels fall into three categories:
standard, nested, and non-standard RAID levels.
RAID 0 takes any number of disks and merges them into one large volume.
It increases speed because you are reading from and writing to multiple disks
at a time. An individual file can then use the speed and capacity of all the
drives of the array. The downside to RAID 0, though, is that it is NOT
redundant: the loss of any individual disk will cause complete data loss. This
makes RAID 0 much less reliable than a single disk.
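The interleaving used by striping can be sketched as a simple mapping from a logical stripe unit to a disk and a position on that disk. This assumes the plain round-robin layout; the function name locate and the choice of three disks are for illustration only.

```python
def locate(logical_unit, num_disks):
    """Map a logical stripe unit to (disk index, unit offset on that disk)."""
    return logical_unit % num_disks, logical_unit // num_disks


# Six consecutive stripe units spread over an assumed array of 3 disks.
layout = [locate(unit, 3) for unit in range(6)]
print(layout)
```

Consecutive units land on different disks, so a large read or write keeps all disks busy at once. It also shows why one failed disk damages every large file: each file has units on every disk.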
RAID 1 duplicates data across two disks in the array, providing full
redundancy. Both disks store exactly the same data at all times, so data is
not lost as long as one disk survives. The total capacity of the array equals
the capacity of the smallest disk in the array.
If either drive fails, you can then replace the broken drive with little to no
downtime. RAID 1 also gives you the additional benefit of increased read
performance, as data can be read from any of the drives in the array. The
downsides are slightly higher write latency, since the data needs to be
written to both drives in the array, and that you only get a single drive's
capacity while needing two drives.
RAID 5 requires the use of at least three drives. It combines these disks to
protect data against loss of any one disk; the array's storage capacity is reduced
by one disk. It stripes data across multiple drives to increase performance, and
it adds redundancy by distributing parity information across the
disks.
RAID 6 is similar to RAID 5, but it stores two sets of parity data instead of
one. The additional parity enables the array to continue to function even if
two disks fail simultaneously. However, this extra protection comes at a cost:
RAID 6 has slower write performance than RAID 5.
The chances that two drives break down at the same moment are minimal.
However, if a drive in a RAID 5 system dies and is replaced by a new drive,
rebuilding the replacement drive takes a lot of time. If another drive dies
during that time, you lose all of your data. With RAID 6, the array survives
that second failure as well.
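The trade-off between capacity and protection for the four levels discussed above can be summarised in a small sketch. It assumes all disks in the array have the same size, and that RAID 6 needs at least four drives; the function name raid_summary is invented for this example.

```python
def raid_summary(level, disks):
    """Return (usable capacity in disks, disk failures survived).

    Assumes every disk in the array has the same size.
    """
    if level == 0:
        return disks, 0            # striping only, no redundancy
    if level == 1:
        return 1, disks - 1        # every disk holds the same data
    if level == 5:
        if disks < 3:
            raise ValueError("RAID 5 needs at least three drives")
        return disks - 1, 1        # one disk's worth of parity
    if level == 6:
        if disks < 4:
            raise ValueError("RAID 6 needs at least four drives")
        return disks - 2, 2        # two disks' worth of parity
    raise ValueError("level not covered in this sketch")


for level, disks in [(0, 4), (1, 2), (5, 4), (6, 4)]:
    print("RAID", level, "with", disks, "disks:", raid_summary(level, disks))
```

With four disks, RAID 5 gives three disks of usable space and survives one failure, while RAID 6 gives two disks of usable space and survives two.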