13 Filesystems Slides
Paul Krzyzanowski
Rutgers University
Spring 2015
• More abstract
– A way to access information by name
• Devices
• System configuration, process info, random numbers
Terms
• Disk
– Non-volatile block-addressable storage.
• Block = sector
– Smallest chunk of I/O on a disk
– Common block sizes = 512 or 4096 (4K) bytes
E.g., a WD Black series 4 TB drive has 7,814,037,168 512-byte sectors
• Partition
– Set of contiguous blocks on a disk. A disk has one or more partitions
• Volume
– Disk, disks, or partition that contains a file system
– A volume may span disks
More terms
• Track
– Blocks are stored on concentric tracks on a disk
• Cylinder
– The set of all tracks at the same position (radius) on every platter surface
(obsolete now since we don’t know what’s where)
• Seek
– The movement of a disk head from track to track
File Terms
• File
– A unit of data managed by the file system
• Data (contents)
– The user data associated with a file
– Unstructured (byte stream) or structured (records)
• Name
– A textual name that identifies the file
File Terms
• Metadata
– Information about the file (creation time, permissions, length of file
data, location of file data, etc.)
• Attribute
– A form of metadata – a textual name and associated value (e.g.,
source URL, author of document, checksum)
• Directory (folder)
– A container for file names
– Directories within directories provide a hierarchical name system
File System Terms
• Superblock
– Area on the volume that contains key file system information
• Cluster
– Logical block size used in the file system that is equivalent to N
blocks
• Extent
– Group of contiguous clusters identified by a starting block number
and a block count
Design Choices
• Namespace
• Multiple volumes
• File types
• Support one type of file system or multiple types (iso9660, NTFS, ext3)?
• What kind of attributes should the file system have?
• How is the data laid out on the disk?
Working with the Operating System
File System Operations
Formatting
• Formatting
– Low-level formatting
Identify sectors, CRC regions on the disk
Done at manufacturing time; user can reinitialize disk
– Partitioning
Divide a disk into one or more regions
Each can hold a separate file system
– High-level formatting
Initialize a file system for use
• Initializing a file system
– Set the size of the volume
– Determine where various data structures live:
• Free block bitmaps, inode lists, data blocks
• Initialize structures to show an empty file system
Mounting
• Make file system available for use
• mount system call
– Pass the file system type, block device & mount point
• Steps
– Access the raw disk (block device)
– Read superblock and file system metadata (free block bitmaps, root
directory, etc.)
– Check to see if the file system was properly unmounted (clean?)
• If not, validate the structure of the file system
– Prepare in-memory data structures to access the volume
• In-memory version of the superblock
• References to the root directory
• Free block bitmaps
– Mark the superblock as “dirty”
Unmounting
• Ensure there are no processes with open files in the file
system
• Remove file system from the OS name space
• Flush all in-memory file system state to disk
• Mark the superblock as “clean” (unmount took place)
File System Validation
• OS performs file system operations in memory first
– Block I/O goes to the buffer cache
Mounting: building up a name space
[Figure: volume /dev/sda1, containing directories bin and include, attached to the name space]
Union mounts
Mounted file system merges with the existing namespace
[Figure: merged name space showing bin and include]
Considerations:
• Search path: which file wins if the same name exists in both file systems?
• Where do writes (new or modified files) go?
Create a file
• Create an inode to hold info (metadata) about the file
– Initialize timestamps
– Set permissions/modes
– Set size = 0
Create a directory
• Steps
– Create a new inode (& initialize)
– Initialize contents to contain
• A directory entry to the parent (name = “..”)
• A directory entry to itself (name = “.”) – on POSIX systems
Links to files
• Symbolic link
– A special file whose contents are the path name of another file or directory
ln -s current_file new_file
– If you delete current_file, new_file becomes a dangling (broken) link
3. May need to read additional blocks to get the block map to find
the desired block numbers
4. Increment the current file offset by the amount that was read
Delete a file
• Remove the file’s name (its directory entry) from its directory
– This stops other programs from opening it – they won’t see it
Rename a file
• Either
– Link the destination name into the destination directory
– Link the source file name to the destination file name
Read a directory
• Operations:
– opendir: open a directory for reading
– readdir: iterate through the contents of the directory
– closedir: close the directory stream
Read & Write metadata
• Read inode information
– stat system call
• Write metadata: calls to change specific fields
– chown: change owner (and group)
– chgrp: change group (a command built on the chown call)
– chmod: change permissions
– utime: change access & modification times
• Extended attributes (name-value pairs)
– listxattr: list extended attributes
– getxattr: get the value of a given extended attribute
– setxattr: set an extended attribute
– removexattr: remove an extended attribute
Operating System Interfaces for File Systems
Virtual File System (VFS) Interface
• Abstract interface for a file system object
• Each real file system implements this common interface
[Figure: the Virtual File System layer with its inode cache and directory cache]
Keeping track of file system types
Like drivers, file systems can be built into the kernel or compiled as
loadable modules (loaded at mount)
• Each file system registers itself with VFS
• Kernel maintains a list of file systems
struct file_system_type {
    const char *name;                    /* name of file system type */
    int fs_flags;                        /* requires device, fs handles moves, kernel-only mount, … */
    int (*get_sb)(struct file_system_type *, int,
                  const char *, void *, struct vfsmount *);   /* set up superblock */
    void (*kill_sb)(struct super_block *);                    /* call to clean up at unmount */
    struct module *owner;                /* module that owns this */
    struct file_system_type *next;       /* next file system type in list */
    struct list_head fs_supers;          /* list of all superblocks of this type */
    struct lock_class_key s_lock_key;    /* used for lock validation (optional) */
    struct lock_class_key s_umount_key;  /* used for lock validation (optional) */
};
Keeping track of mounted file systems
• Before mounting a file system, first check if we know the file system
type: look through the file_systems list
– If not found, the kernel daemon will load the file system module
/lib/modules/3.13.0-46-generic/kernel/fs/ntfs/ntfs.ko
/lib/modules/3.13.0-46-generic/kernel/fs/hfsplus/hfsplus.ko
/lib/modules/3.13.0-46-generic/kernel/fs/jffs2/jffs2.ko
/lib/modules/3.13.0-46-generic/kernel/fs/minix/minix.ko
…
VFS: Common set of objects
• Superblock: Describes the file system
– Block size, max file size, mount point
– One per mounted file system
VFS superblock
• Structure that represents info about the file system
• Includes
– File system name
– Size
– State
– Reference to the block device
– List of operations for managing inodes within the file system:
• alloc_inode, destroy_inode, read_inode, write_inode, sync_fs, …
VFS inode
• Uniquely identifies a file in a file system
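• Holds the metadata (attributes) of the file (everything except its name)
struct inode {
    unsigned long i_ino;              /* inode number */
    umode_t i_mode;                   /* file type and permission bits */
    uid_t i_uid;                      /* owner */
    gid_t i_gid;                      /* group */
    kdev_t i_rdev;                    /* device number, if this is a device file */
    loff_t i_size;                    /* file size in bytes */
    struct timespec i_atime;          /* last access time */
    struct timespec i_ctime;          /* last inode (metadata) change time */
    struct timespec i_mtime;          /* last data modification time */
    struct super_block *i_sb;         /* superblock of the containing file system */
    struct inode_operations *i_op;    /* inode operations */
    struct address_space *i_mapping;  /* cached pages of the file's data */
    struct list_head i_dentry;        /* directory entries (names) for this inode */
    /* ... */
};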
VFS inode operations
Functions that operate on file & directory names and attributes
struct inode_operations {
    int (*create) (struct inode *, struct dentry *, int);
    struct dentry * (*lookup) (struct inode *, struct dentry *);
    int (*link) (struct dentry *, struct inode *, struct dentry *);
    int (*unlink) (struct inode *, struct dentry *);
    int (*symlink) (struct inode *, struct dentry *, const char *);
    int (*mkdir) (struct inode *, struct dentry *, int);
    int (*rmdir) (struct inode *, struct dentry *);
    int (*mknod) (struct inode *, struct dentry *, int, dev_t);
    int (*rename) (struct inode *, struct dentry *, struct inode *, struct dentry *);
    int (*readlink) (struct dentry *, char *, int);
    int (*follow_link) (struct dentry *, struct nameidata *);
    void (*truncate) (struct inode *);
    int (*permission) (struct inode *, int);
    int (*setattr) (struct dentry *, struct iattr *);
    int (*getattr) (struct vfsmount *mnt, struct dentry *, struct kstat *);
    int (*setxattr) (struct dentry *, const char *, const void *, size_t, int);
    ssize_t (*getxattr) (struct dentry *, const char *, void *, size_t);
    ssize_t (*listxattr) (struct dentry *, char *, size_t);
    int (*removexattr) (struct dentry *, const char *);
};
VFS File operations
Functions that operate on file & directory data
struct file_operations {
    struct module *owner;
    loff_t (*llseek) (struct file *, loff_t, int);
    ssize_t (*read) (struct file *, char *, size_t, loff_t *);
    ssize_t (*aio_read) (struct kiocb *, char *, size_t, loff_t);
    ssize_t (*write) (struct file *, const char *, size_t, loff_t *);
    ssize_t (*aio_write) (struct kiocb *, const char *, size_t, loff_t);
    int (*readdir) (struct file *, void *, filldir_t);
    unsigned int (*poll) (struct file *, struct poll_table_struct *);
    int (*ioctl) (struct inode *, struct file *, unsigned int, unsigned long);
    int (*mmap) (struct file *, struct vm_area_struct *);
    int (*open) (struct inode *, struct file *);
    int (*flush) (struct file *);
    int (*release) (struct inode *, struct file *);
    int (*fsync) (struct file *, struct dentry *, int datasync);
    int (*aio_fsync) (struct kiocb *, int datasync);
    int (*fasync) (int, struct file *, int);
    int (*lock) (struct file *, int, struct file_lock *);
    ssize_t (*readv) (struct file *, const struct iovec *, unsigned long, loff_t *);
    ssize_t (*writev) (struct file *, const struct iovec *, unsigned long, loff_t *);
    ssize_t (*sendfile) (struct file *, loff_t *, size_t, read_actor_t, void *);
    ssize_t (*sendpage) (struct file *, struct page *, int, size_t, loff_t *, int);
    unsigned long (*get_unmapped_area) (struct file *, unsigned long, unsigned long,
                                        unsigned long, unsigned long);
};
VFS File operations
Not all functions need to be implemented!
Example: character device drivers use the same file_operations structure, filling in only the operations the device supports
The End