
FILE CONCEPT

A file is a named collection of related information that is recorded on secondary storage. From a user's
perspective, a file is the smallest allotment of logical secondary storage; that is, data cannot be written to
secondary storage unless they are within a file. A file is a sequence of bits, bytes, lines, or records, the
meaning of which is defined by the file's creator and user. The information in a file is defined by its
creator. Many different types of information may be stored in a file: source programs, object programs,
executable programs, numeric data, text, payroll records, graphic images, sound recordings, and so on. A
file has a certain defined structure, which depends on its type.

File Attributes

A file is named, for the convenience of its human users, and is referred to by its name. A name is usually
a string of characters, such as example.c.

A file's attributes vary from one operating system to another but typically consist of these:

Name. The symbolic file name is the only information kept in human-readable form.

Identifier. This unique tag, usually a number, identifies the file within the file system; it is the non-human-readable name for the file.

Type. This information is needed for systems that support different types of files.

Location. This information is a pointer to a device and to the location of the file on that device.

Size. The current size of the file (in bytes, words, or blocks) and possibly the maximum allowed size are
included in this attribute.

Protection. Access-control information determines who can do reading, writing, executing, and so on.

Time, date, and user identification. This information may be kept for creation, last modification, and last
use. These data can be useful for protection, security, and usage monitoring.
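These attributes are typically kept together in the directory entry or a per-file control block. As a rough sketch of such a record (the field names here are illustrative, not any real OS's on-disk layout):

```python
from dataclasses import dataclass, field
import time

@dataclass
class FileAttributes:
    # Hypothetical file-control-block fields mirroring the list above.
    name: str            # human-readable symbolic name
    identifier: int      # unique, non-human-readable tag
    ftype: str           # e.g. "text", "executable"
    location: tuple      # (device, starting block)
    size: int            # current size in bytes
    protection: str      # access-control information
    created: float = field(default_factory=time.time)  # time/date info

attrs = FileAttributes("example.c", 4711, "text", ("disk0", 19), 1024, "rw-r--r--")
print(attrs.name, attrs.size)
```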

File Operations

A file is an abstract data type. To define a file properly, we need to consider the operations that can be
performed on files. The operating system can provide system calls to create, write, read, reposition,
delete, and truncate files. Let's examine what the operating system must do to perform each of these six
basic file operations.

Creating a file. Two steps are necessary to create a file. First, space in the file system must be found for
the file. Second, an entry for the new file must be made in the directory.

Writing a file. To write a file, we make a system call specifying both the name of the file and the
information to be written to the file. Given the name of the file, the system searches the directory to find
the file's location. The system must keep a write pointer to the location in the file where the next write is
to take place. The write pointer must be updated whenever a write occurs.

Reading a file. To read from a file, we use a system call that specifies the name of the file and where (in
memory) the next block of the file should be put. Again, the directory is searched for the associated entry,
and the system needs to keep a read pointer to the location in the file where the next read is to take place.

Repositioning within a file. The directory is searched for the appropriate entry, and the current-file-
position pointer is repositioned to a given value. Repositioning within a file need not involve any actual
I/O. This file operation is also known as a file seek.

Deleting a file. To delete a file, we search the directory for the named file. Having found the associated
directory entry, we release all file space, so that it can be reused by other files, and erase the directory
entry.

Truncating a file. The user may want to erase the contents of a file but keep its attributes. Rather than
forcing the user to delete the file and then recreate it, this function allows all attributes to remain
unchanged, except for the file length.
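The six operations above map directly onto ordinary file-API calls. A minimal sketch using Python's built-in file interface (the file name and contents are arbitrary):

```python
import os, tempfile

d = tempfile.mkdtemp()
path = os.path.join(d, "demo.txt")

# Create: find space for the file and make a directory entry for it.
with open(path, "w") as f:
    # Write: the write pointer advances after every write.
    f.write("hello file systems")

# Read: a read pointer tracks where the next read takes place.
with open(path, "r") as f:
    first = f.read(5)     # reads "hello"
    # Reposition (file seek): no actual data I/O is needed.
    f.seek(6)
    rest = f.read()       # reads "file systems"

# Truncate: contents are cut, but the entry and attributes remain.
with open(path, "r+") as f:
    f.truncate(5)
size_after = os.path.getsize(path)   # 5

# Delete: release the file's space and erase the directory entry.
os.remove(path)
print(first, rest, size_after, os.path.exists(path))
```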

File Types:

A common technique for implementing file types is to include the type as part of the file name. The name
is split into two parts: a name and an extension, usually separated by a period character (Figure 10.2). In
this way, the user and the operating system can tell from the name alone what the type of a file is. For
example, most operating systems allow users to specify a file name as a sequence of characters followed
by a period and terminated by an extension of additional characters. File name examples include
resume.doc, Server.java, and ReaderThread.c.

The system uses the extension to indicate the type of the file and the type of operations that can be done
on that file. Only a file with a .com, .exe, or .bat extension can be executed, for instance. The .com
and .exe files are two forms of binary executable files, whereas a .bat file is a batch file containing, in
ASCII format, commands to the operating system.
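The name/extension split itself is mechanical; Python's standard library, for instance, exposes it directly:

```python
import os.path

# Split each name into (name, extension); the user and OS can judge the
# file's type from the extension alone.
examples = ["resume.doc", "Server.java", "ReaderThread.c"]
split = [os.path.splitext(n) for n in examples]
print(split)
```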
Access Methods

Files store information. When it is used, this information must be accessed and read into computer
memory. The information in the file can be accessed in several ways.

Sequential Access

The simplest access method is sequential access: information in the file is processed in order, one record
after the other. This mode of access is by far the most common; for example, editors and compilers
usually access files in this fashion.
Reads and writes make up the bulk of the operations on a file. A read operation (read next) reads the next
portion of the file and automatically advances a file pointer, which tracks the I/O location. Similarly, the
write operation (write next) appends to the end of the file and advances the pointer to the end of the newly
written material.
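Sequential access can be pictured as a loop of read next calls; this sketch uses an in-memory buffer with fixed 4-byte records as a stand-in for a real open file:

```python
import io

f = io.BytesIO(b"rec1rec2rec3")   # stands in for an open file
RECORD = 4
records = []
while True:
    rec = f.read(RECORD)   # "read next": returns a record, advances the pointer
    if not rec:            # end of file reached
        break
    records.append(rec.decode())
print(records)
```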

Direct Access

A file is made up of fixed-length logical records that allow programs to read and write records rapidly
in no particular order. The direct-access method is based on a disk model of a file, since disks allow
random access to any file block. For direct access, the file is viewed as a numbered sequence of blocks or
records. Thus, we may read block 14, then read block 53, and then write block 7. There are no restrictions
on the order of reading or writing for a direct-access file.

Direct-access files are of great use for immediate access to large amounts of information. Databases are
often of this type. When a query concerning a particular subject arrives, we compute which block
contains the answer and then read that block directly to provide the desired information.

For the direct-access method, the file operations must be modified to include the block number as a
parameter. Thus, we have read n, where n is the block number, rather than read next, and write n rather
than write next. An alternative approach is to retain read next and write next, as with sequential access,
and to add an operation position file to n, where n is the block number. Then, to effect a read n, we would
position to n and then read next.
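The position-then-read-next scheme can be sketched as follows, with an in-memory buffer standing in for the disk and an assumed block size of 4 bytes:

```python
import io

BLOCK = 4
# A fake "disk": 60 numbered blocks of 4 bytes each ("BL00" .. "BL59").
disk = io.BytesIO(b"".join(f"BL{n:02d}".encode() for n in range(60)))

def read_block(f, n):
    # "position file to n", then "read next"
    f.seek(n * BLOCK)
    return f.read(BLOCK)

# No restriction on the order of access: block 14, then 53, then 7.
print(read_block(disk, 14), read_block(disk, 53), read_block(disk, 7))
```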

Allocation Methods

The direct-access nature of disks allows us flexibility in the implementation of files. In almost every case,
many files are stored on the same disk. The main problem is how to allocate space to these files so that
disk space is utilized effectively and files can be accessed quickly. Three major methods of allocating
disk space are in wide use: contiguous, linked, and indexed. Each method has advantages and
disadvantages. Some systems (such as Data General's RDOS for its Nova line of computers) support all
three.
1. Contiguous Allocation

In this scheme, each file occupies a contiguous set of blocks on the disk. For example, if a file requires
n blocks and is given a block b as the starting location, then the blocks assigned to the file will be: b,
b+1, b+2,……b+n-1. This means that given the starting block address and the length of the file (in
terms of blocks required), we can determine the blocks occupied by the file.
The directory entry for a file with contiguous allocation contains
 Address of starting block
 Length of the allocated portion.
The file ‘mail’ in the following figure starts at block 19 with a length of 6 blocks; therefore, it
occupies blocks 19, 20, 21, 22, 23, and 24.

Advantages:
 Both sequential and direct access are supported. For direct access, the address of the k-th block of a
file that starts at block b can easily be obtained as b + k.
 Access is extremely fast, since the number of seeks is minimal thanks to the contiguous allocation of
file blocks.

Disadvantages:
 This method suffers from both internal and external fragmentation, which makes it inefficient in
terms of disk utilization.
 Increasing the file size is difficult, because it depends on the availability of a contiguous chunk of
free space at a particular instant.
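The block arithmetic above is simple enough to state in a few lines; using the ‘mail’ example (start block 19, length 6):

```python
def contiguous_blocks(b, n):
    # A file starting at block b with length n occupies blocks b .. b+n-1.
    return list(range(b, b + n))

def kth_block(b, k):
    # Direct access: the address of the k-th block (0-indexed) is b + k.
    return b + k

print(contiguous_blocks(19, 6))   # the blocks of 'mail'
print(kth_block(19, 3))
```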
2. Linked List Allocation
In this scheme, each file is a linked list of disk blocks which need not be contiguous. The disk blocks can
be scattered anywhere on the disk.
The directory entry contains a pointer to the starting and the ending file block. Each block contains a
pointer to the next block occupied by the file.
The file ‘jeep’ in the following figure shows how the blocks can be scattered. The last block (25)
contains -1, indicating a null pointer; it does not point to any other block.

Advantages:
 It is very flexible in terms of file size: a file can grow easily, since the system does not have to look
for a contiguous chunk of free space.
 This method does not suffer from external fragmentation, which makes it relatively better in terms of
disk utilization.
Disadvantages:
 Because the file blocks are distributed randomly on the disk, a large number of seeks is needed to
access every block individually, which makes linked allocation slower.
 It does not support random or direct access; we cannot directly access the blocks of a file. Block k of
a file can be reached only by traversing k blocks sequentially (sequential access) from the starting
block, via the block pointers.
 The pointers required by linked allocation incur some extra overhead.
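Traversal of a linked-allocation file can be sketched with a pointer table; the block numbers below loosely follow the ‘jeep’ example and are otherwise hypothetical:

```python
# next_block[b] holds the pointer stored in block b; -1 is the null pointer.
next_block = {9: 16, 16: 1, 1: 10, 10: 25, 25: -1}   # hypothetical layout

def walk(start):
    # Sequential traversal: reaching block k requires k pointer hops.
    order, b = [], start
    while b != -1:
        order.append(b)
        b = next_block[b]
    return order

print(walk(9))
```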

3. Indexed Allocation
In this scheme, a special block known as the index block contains the pointers to all the blocks
occupied by a file. Each file has its own index block. The i-th entry in the index block contains the disk
address of the i-th file block. The directory entry contains the address of the index block, as shown in the
figure:

Advantages:
 This supports direct access to the blocks occupied by the file and therefore provides fast access to
the file blocks.
 It overcomes the problem of external fragmentation.
Disadvantages:
 The pointer overhead of indexed allocation is greater than that of linked allocation.
 For very small files, say files spanning only 2-3 blocks, indexed allocation dedicates one entire
block (the index block) to pointers, which is inefficient in terms of space; in linked allocation, we
lose the space of only one pointer per block.
For very large files, a single index block may not be able to hold all the pointers.
The following mechanisms can be used to resolve this:
1. Linked scheme: This scheme links two or more index blocks together for holding the pointers.
Every index block would then contain a pointer or the address to the next index block.
2. Multilevel index: In this policy, a first-level index block points to second-level index blocks, which
in turn point to the disk blocks occupied by the file. This can be extended to three or more levels,
depending on the maximum file size.
3. Combined scheme: In this scheme, a special block called the inode (index node) contains all the
information about the file, such as its name, size, and access rights, and the remaining space in the
inode stores the addresses of the disk blocks that hold the actual file data, as shown in the image
below. The first few of these pointers in the inode are direct pointers, i.e., they contain the addresses
of disk blocks holding file data. The next few pointers point to indirect blocks, which may be single,
double, or triple indirect. A single indirect block contains not file data but the disk addresses of
blocks that do contain file data. Similarly, a double indirect block contains the disk addresses of
blocks that in turn contain the addresses of the blocks holding the file data.
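Leaving aside the multilevel variants, the basic index-block lookup is a single table access; the addresses below are illustrative:

```python
# index_block[i] is the disk address of the i-th file block;
# -1 marks an unused entry, as in the example figures.
index_block = [9, 16, 1, 10, 25, -1, -1, -1]

def block_address(i):
    # Direct access to file block i is one table lookup, no traversal.
    addr = index_block[i]
    if addr == -1:
        raise IndexError("block %d not allocated" % i)
    return addr

print(block_address(0), block_address(2))
```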

Indexed File Allocation

Example

As shown in the diagram below, block 19 is the index block, which contains all the block addresses of the
file named text1. In order, the first storage block is 9, followed by 16, 1, 10, and 25. The value -1
denotes an unused index entry, as the file text1 is still too small to fill more blocks.
Linked File Allocation

Example

Here we have one file stored using linked file allocation.
In the figure, the memory diagram on the right shows the disk blocks. On the left is a directory entry
holding information such as the addresses of the first and last blocks.

In this allocation, the starting block given is 0 and the ending block is 15, so the OS searches for empty
blocks between 0 and 15 and stores the file in the available ones; along with the data, each block also
stores a pointer to the next block. Hence some extra space is required to store these links.

Contiguous File Allocation

Example

We have three different files stored in a contiguous manner on the hard disk.
In the figure, the memory diagram on the left shows the disk blocks. First, a text file named file1.txt is
allocated using contiguous allocation: it starts at block 0 and has a length of 4, so it occupies the four
contiguous blocks 0, 1, 2, and 3. Similarly, an image file sun.jpg and a video file mov.mp4 are stored, as
the directory shows, in the contiguous blocks 5, 6, 7 and 9, 10, 11 respectively.

FREE SPACE MANAGEMENT TECHNIQUE

Since disk space is limited, we need to reuse the space from deleted files for new files, if possible. (Write-
once optical disks only allow one write to any given sector, and thus such reuse is not physically
possible.) To keep track of free disk space, the system maintains a free space list. The free-space list
records all free disk blocks-those not allocated to some file or directory. To create a file, we search the
free-space list for the required amount of space and allocate that space to the new file. This space is then
removed from the free-space list. When a file is deleted, its disk space is added to the free-space list.
Bit Vector

Frequently, the free-space list is implemented as a bit map or bit vector. Each block is represented by 1
bit. If the block is free, the bit is 1; if the block is allocated, the bit is 0. For example, consider a disk
where blocks 2, 3, 4, 5, 8, 9, 10, 11, 12, 13, 17, 18, 25, 26, and 27 are free and the rest of the blocks are
allocated. The free-space bit map would be

001111001111110001100000011100000 ...

The main advantage of this approach is its relative simplicity and its efficiency in finding the first free
block or n consecutive free blocks on the disk.
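The bit-vector operations are straightforward scans. Using the example bitmap above (1 = free, 0 = allocated):

```python
bitmap = "001111001111110001100000011100000"  # 1 = free, 0 = allocated

def first_free(bm):
    # Finding the first free block is a scan for the first 1 bit.
    return bm.index("1")

def free_blocks(bm):
    # All free block numbers, recovering the list from the text's example.
    return [i for i, bit in enumerate(bm) if bit == "1"]

print(first_free(bitmap))
print(free_blocks(bitmap))
```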

Linked List

Another approach to free-space management is to link together all the free disk blocks, keeping a pointer
to the first free block in a special location on the disk and caching it in memory. This first block contains
a pointer to the next free disk block, and so on. Recall our earlier example, in which blocks 2, 3, 4, 5, 8, 9,
10, 11, 12, 13, 17, 18, 25, 26, and 27 were free and the rest of the blocks were allocated. In this situation,
we would keep a pointer to block 2 as the first free block. Block 2 would contain a pointer to block 3,
which would point to block 4, which would point to block 5, which would point to block 8, and so on
(Figure 11.10). This scheme is not efficient; to traverse the list, we must read each block, which requires
substantial I/O time. Fortunately, however, traversing the free list is not a frequent action. Usually, the
operating system simply needs a free block so that it can allocate that block to a file, so the first block in
the free list is used.
Single-Level Directory
Directories are the structures in which users store and organize their files, and they come in several
forms; the single-level directory is the simplest. As the name suggests, a single-level directory stores
all files in one directory, without any subdirectories inside it. It might seem the best way to store files
because it is the simplest form of file organization, but the main problem arises when the number of files
grows large: since everything is stored in one folder, finding a particular file takes a long time.

Understanding the Single-Level Structure


The single-level directory structure is the simplest way to store any number of files: no subdirectories
are created, and every file lives in the same directory (folder). The approach is straightforward, but the
files stored in the directory must have unique names; no two files with the same name can reside in the
same directory. A user can still keep multiple types of files in a single directory: files may have the
same or different extensions and coexist, as long as each full name is unique.
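The uniqueness rule can be sketched with a flat name table; this toy class is illustrative, not any real file system's code:

```python
class SingleLevelDirectory:
    def __init__(self):
        self.entries = {}         # one flat namespace: name -> file contents

    def create(self, name, data=b""):
        if name in self.entries:  # full names must be unique in the directory
            raise FileExistsError(name)
        self.entries[name] = data

d = SingleLevelDirectory()
d.create("notes.txt")
d.create("notes.doc")             # same base name, different extension: allowed
try:
    d.create("notes.txt")         # same full name: rejected
    rejected = False
except FileExistsError:
    rejected = True
print(sorted(d.entries), rejected)
```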

Advantages of Single-Level Directory


 Simplicity: The single-level directory is the simplest of all file organization structures: all files
are stored in one directory.
 Ease of access: Due to its simplicity, accessing files is very easy; as everything is inside the
same folder, there is no need to navigate through subdirectories to find the desired file.
 Simple file manipulation: Creating, deleting, renaming, and finding a file are all easy, as
everything resides in the same directory.

Disadvantages of Single-Level Directory


 Scalability: A single-level directory becomes a nightmare when the number of files residing inside it
grows large. With everything in a single directory, searching for a file is hard, the listing looks
cluttered, and the lack of subdirectories leaves the files unorganized.
 Naming problem: No two files in the directory can share the same full name; files may share an
extension, but their names must be unique. Files with the same base name can coexist only if their
extensions differ.
 Security: Protection is weak, as anyone who has access to the directory can see all the files.
Two Level Directory

This kind of structure overcomes the problem of assigning unique names to files, so there need not be any
naming conflicts among users.
In this kind of structure each user has a user file directory (UFD). The UFDs have similar structures, but
each lists the files of a single user. When a user logs in, the system's master file directory (MFD) is
searched. The MFD is indexed by account number or user name, and each entry points to the UFD for that
user. When a user searches for a particular file, only his own UFD is searched, and so different users may
use the same file names, as long as all the file names within each UFD are unique.
A two-level directory can be thought of as an inverted tree of height 2. The root of the tree is the MFD,
its descendants are the UFDs, and their descendants are the files.
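The two-step MFD-then-UFD lookup can be sketched with nested tables; the user and file names here are made up:

```python
# The MFD maps user -> UFD; each UFD maps file name -> directory entry.
mfd = {
    "alice": {"report.txt": "blocks 3-7"},
    "bob":   {"report.txt": "blocks 12-14"},  # same name, different UFD: fine
}

def lookup(user, filename):
    ufd = mfd[user]       # step 1: the MFD entry points to the user's UFD
    return ufd[filename]  # step 2: only that UFD is searched

print(lookup("alice", "report.txt"))
print(lookup("bob", "report.txt"))
```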
This structure isolates one user from another. That is an advantage when the users are completely
independent of each other, but a drawback when the users want to cooperate on some task and to access
one another's files.

Advantages:
 There can be multiple files with the same name, as long as they belong to different users; this is very
helpful when there are many users.
 A measure of security exists, preventing one user from accessing another user's files.
 Searching for files is easy in this directory structure, since only one UFD needs to be scanned.
Disadvantages:
 The flip side of that security is that a user cannot share a file with other users.
 Although users can create their own files, they cannot create subdirectories.
 Grouping is limited: a user cannot collect files of the same type into their own directory, which hurts
scalability.
Tree Structured Directory

In a tree-structured directory system, any directory entry can be either a file or a subdirectory. The
tree-structured directory overcomes the drawbacks of the two-level directory system: similar kinds of
files can now be grouped in one directory.

Each user has their own directory and cannot enter another user's directory. However, a user has
permission to read the root's data but cannot write to or modify it; only the system administrator has
complete access to the root directory.

Searching is more efficient in this directory structure. The concept of a current working directory is
used, and a file can be accessed by two types of path, either relative or absolute.

An absolute path is the path of a file with respect to the root directory of the system, while a relative
path is the path with respect to the current working directory. In tree-structured directory systems, the
user is given the privilege to create directories as well as files.
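Path resolution in a tree of directories can be sketched with nested tables; the tree below is hypothetical:

```python
# Directories are nested dicts; files are strings. Path components use "/".
tree = {"home": {"user": {"notes.txt": "data"}}, "etc": {"hosts": "data2"}}

def resolve(path, cwd=None):
    # Absolute path: start at the root; relative path: start at cwd.
    node = tree if path.startswith("/") else cwd
    for part in path.strip("/").split("/"):
        node = node[part]
    return node

print(resolve("/home/user/notes.txt"))                  # absolute
print(resolve("notes.txt", cwd=tree["home"]["user"]))   # relative
```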

Advantages:
 This directory structure allows subdirectories inside a directory.
 Searching is easier.
 Sorting files into important and unimportant groups becomes easier.
 This structure is more scalable than the other two directory structures explained above.
Disadvantages:
 As a user is not allowed to access another user's directory, file sharing among users is prevented.
 As users can create subdirectories, searching may become complicated if the number of
subdirectories grows large.
 Users cannot modify the root directory's data.
 If files do not fit in one directory, they may have to be placed in other directories.
RAID (Redundant Arrays of Independent Disks)
RAID is a technique that makes use of a combination of multiple disks instead of using a single disk for
increased performance, data redundancy, or both.

Data redundancy, although it takes up extra space, adds to disk reliability. In case of a disk failure, if
the same data is also stored on another disk, we can retrieve the data and carry on with the operation. On
the other hand, if data is simply spread across multiple disks without the RAID technique, the loss of a
single disk can affect all of it.

RAID is transparent to the host system: the array appears as a single big disk presenting itself as a
linear array of blocks. This allows older storage to be replaced by RAID without making many changes to
the existing code.

Different RAID Levels


1. RAID-0 (Striping)
2. RAID-1 (Mirroring)
3. RAID-2 (Bit-Level Striping with Dedicated Parity)
4. RAID-3 (Byte-Level Striping with Dedicated Parity)
5. RAID-4 (Block-Level Striping with Dedicated Parity)
6. RAID-5 (Block-Level Striping with Distributed Parity)
7. RAID-6 (Block-Level Striping with Two Parity Blocks)

1. RAID-0 (Striping)
 Blocks are "striped" across the disks.
 In the figure, blocks 0, 1, 2, 3 form a stripe.
 Instead of placing just one block on a disk at a time, we can place two (or more) consecutive blocks
on a disk before moving on to the next one (i.e., use a larger chunk size).

Advantages
1. It is easy to implement.
2. It utilizes the storage capacity in a better way.
Disadvantages
1. A single drive loss can result in the complete failure of the system.
2. Not a good choice for a critical system.
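The striping rule ("logical block i goes to disk i mod n") can be stated in one line; this sketch assumes one block per disk per stripe:

```python
def raid0_place(block, n_disks):
    # Logical block i lands on disk i % n, at stripe (row) i // n.
    return block % n_disks, block // n_disks

# Blocks 0..3 form stripe 0 across 4 disks; blocks 4..7 form stripe 1.
for b in range(8):
    print(b, raid0_place(b, 4))
```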

2. RAID-1 (Mirroring)
 More than one copy of each block is stored, each on a separate disk. Thus, every block has two (or
more) copies lying on different disks.
 The figure shows a RAID-1 system with mirroring level 2.
 RAID-0 cannot tolerate any disk failure, but RAID-1 provides reliability: if one disk fails, its data
survives on the mirror.
Advantages
1. It provides complete redundancy.
2. It can increase data security and read speed.
Disadvantages
1. It is expensive, since every block is duplicated.
2. Usable storage capacity is reduced to half (or less).

3. RAID-2 (Bit-Level Striping with Dedicated Parity)
 In RAID-2, errors in the data are checked at the bit level, using a Hamming-code parity method to
detect and correct them.
 Designated drives are used to store the parity information.
 The structure of RAID-2 is complex: the bits of each word are striped across the data disks, while
additional disks store the error-correction code for that word.
 It is not commonly used.
Advantages
1. It uses a Hamming code for error correction.
2. It uses designated drives to store parity.
Disadvantages
1. It has a complex structure and a high cost due to the extra drives.
2. It requires extra drives for error detection and correction.

4. RAID-3 (Byte-Level Striping with Dedicated Parity)
 It consists of byte-level striping with dedicated parity.
 At this level, parity information for each stripe is computed and written to a dedicated parity drive.
 Whenever a drive fails, the parity drive is used to reconstruct the lost data.
 Here Disk 3 contains the parity for Disk 0, Disk 1, and Disk 2. If data loss occurs, it can be
reconstructed using Disk 3.
Advantages
1. Data can be transferred in bulk.
2. Data can be accessed in parallel.
Disadvantages
1. It requires an additional drive for parity.
2. In the case of small-size files, it performs slowly.

5. RAID-4 (Block-Level Striping with Dedicated Parity)
 Instead of duplicating data, this level adopts a parity-based approach.
 In the figure, we can observe one column (disk) dedicated to parity.
 Parity is calculated using a simple XOR function. If the data bits are 0, 0, 0, 1, the parity bit is
XOR(0, 0, 0, 1) = 1. If the data bits are 0, 1, 1, 0, the parity bit is XOR(0, 1, 1, 0) = 0. In short, an
even number of ones yields parity 0, and an odd number of ones yields parity 1.
 Assume that in the figure, C3 is lost due to a disk failure. We can then recompute the data bit stored
in C3 from the values of all the other columns and the parity bit. This allows us to recover lost data.
Advantages
1. It can reconstruct the data if at most one drive is lost.
Disadvantages
1. It cannot reconstruct the data when more than one drive is lost.
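The XOR parity and recovery described above can be checked directly; the data blocks below are small arbitrary bit patterns:

```python
def parity(blocks):
    # XOR of all blocks: each bit position with an even count of 1s gives 0.
    p = 0
    for b in blocks:
        p ^= b
    return p

data = [0b0001, 0b0110, 0b1010]   # three data blocks (one per disk)
p = parity(data)                   # the dedicated parity block

# Lose data[1] to a "disk failure", then recover it from the
# surviving blocks plus the parity block.
lost = data[1]
recovered = parity([data[0], data[2], p])
print(recovered == lost)
```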

6. RAID-5 (Block-Level Striping with Distributed Parity)
 This is a slight modification of RAID-4; the only difference is that the parity rotates among the
drives.
 In the figure, we can notice how the parity block "rotates".
 This was introduced to improve random write performance, since parity updates are no longer
bottlenecked on a single dedicated drive.

Advantages
1. Data can be reconstructed using the parity blocks.
2. Random write performance is better than RAID-4's.
Disadvantages
1. Its technology is complex, and extra space is required.
2. If two disks fail simultaneously, the data is lost forever.

7. RAID-6 (Block-Level Striping with Two Parity Blocks)
 RAID-6 helps when there is more than one disk failure: a pair of independent parities is generated
and stored on multiple disks. At least four disk drives are needed for this level.
 There are also hybrid RAIDs, which nest more than one RAID level one after the other to fulfill
specific requirements.

Advantages
1. Very high data availability; it tolerates two simultaneous disk failures.
2. Fast read transactions.
Disadvantages
1. Due to the double parity, write transactions are slow.
2. Extra space is required.

Advantages of RAID
1. Increased data reliability: RAID provides redundancy, which means that if one disk fails, the data
can be recovered from the remaining disks in the array. This makes RAID a reliable storage solution
for critical data.
2. Improved performance: RAID can improve performance by spreading data across multiple disks.
This allows multiple read/write operations to occur in parallel, which can speed up data access.
3. Scalability: RAID can be scaled by adding more disks to the array. This means that storage capacity
can be increased without having to replace the entire storage system.
4. Cost-effective: Some RAID configurations, such as RAID 0, can be implemented with low-cost
hardware. This makes RAID a cost-effective solution for small businesses or home users.

Disadvantages of RAID
1. Cost: Some RAID configurations, such as RAID 5 or RAID 6, can be expensive to implement. This
is because they require additional hardware or software to provide redundancy.
2. Performance limitations: Some RAID configurations, such as RAID 1 or RAID 5, can have
performance limitations. For example, RAID 1 can only read data as fast as a single drive, while
RAID 5 can have slower write speeds due to the parity calculations required.
3. Complexity: RAID can be complex to set up and maintain. This is especially true for more advanced
configurations, such as RAID 5 or RAID 6.
4. Increased risk of data loss: While RAID provides redundancy, it is not a substitute for proper
backups. If multiple drives fail simultaneously, data loss can still occur.
