
Thursday, February 27, 2020

Disk Attachment



Disk drives can be attached either directly to a particular host (a local disk) or to a network.

a. Host-Attached Storage

Local disks are accessed through I/O ports. The most common interfaces are IDE (Integrated Drive Electronics) and ATA (Advanced Technology Attachment), each of which allows up to two drives per host controller. SATA (Serial ATA) is similar but uses simpler cabling.
High-end workstations and other systems that need a larger number of disks typically use SCSI disks:
·         The SCSI standard supports up to 16 targets on each SCSI bus, one of which is generally the host adapter and the other 15 of which can be disk or tape drives.
·         A SCSI target is usually a single drive, but the standard also supports up to 8 logical units within each target. These would generally be used for addressing individual disks within a RAID array.

FC (Fibre Channel) is a high-speed serial architecture that can operate over optical fiber or four-conductor copper wires, and has two variants:


·         A large switched fabric having a 24-bit address space. This variant allows for multiple devices and multiple hosts to interconnect, forming the basis for the storage-area networks, SANs, to be discussed in a future section.
·         The arbitrated loop, FC-AL, which can address up to 126 devices (drives and controllers).

b. Network-Attached Storage

Network-attached storage connects storage devices to computers over a network using a remote procedure call, RPC, interface, typically via NFS file-system mounts. This conveniently gives a group of computers common access and naming conventions for shared storage.
NAS runs over standard network connections; for example, iSCSI carries the SCSI protocol over Internet protocols, allowing long-distance remote access to shared storage.
NAS allows computers to easily share data storage, but it tends to be less efficient than standard host-attached storage.

c. Storage-Area Network

A Storage-Area Network, SAN, connects computers and storage devices in a network, using storage protocols rather than network protocols. One advantage of this is that storage access does not tie up regular networking bandwidth.
SAN is very flexible and dynamic, allowing hosts and devices to attach and detach on the fly. SAN is also controllable, allowing restricted access to certain hosts and devices.




Disk Structure



The traditional head-sector-cylinder, HSC, numbers are mapped to linear block addresses by numbering the first sector on the first head on the outermost track as sector 0. Numbering proceeds through the rest of the sectors on that same track, then the rest of the tracks on the same cylinder, and then through the rest of the cylinders toward the center of the disk.
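As a concrete illustration, here is a minimal sketch of this mapping in Python. The geometry constants are hypothetical (real drives report their own), and the sketch assumes a constant number of sectors per track, which, as point 1 below explains, is not true of real drives.

    # Classic CHS-to-LBA mapping, assuming a fixed (hypothetical) geometry.
    HEADS_PER_CYLINDER = 16      # working surfaces
    SECTORS_PER_TRACK = 63       # assumed constant for this illustration

    def chs_to_lba(cylinder: int, head: int, sector: int) -> int:
        """Map a (cylinder, head, sector) triple to a linear block address.

        Sector 0 on head 0 of cylinder 0 is block 0; numbering then runs
        through the rest of that track, then the rest of the cylinder,
        then inward cylinder by cylinder.
        """
        return (cylinder * HEADS_PER_CYLINDER + head) * SECTORS_PER_TRACK + sector

    print(chs_to_lba(0, 0, 0))   # 0    (first sector on the outermost track)
    print(chs_to_lba(0, 1, 0))   # 63   (first sector on the next surface)
    print(chs_to_lba(1, 0, 0))   # 1008 (first sector of the next cylinder)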

In modern practice linear block addresses are used in place of the HSC numbers for a variety of reasons:

1.      The linear length of tracks near the outer edge of the disk is much longer than for those tracks located near the center, and therefore it is possible to squeeze many more sectors onto outer tracks than onto inner ones.

2.      All disks have some bad sectors, and therefore disks maintain a few spare sectors that can be used in place of the bad ones. The mapping of spare sectors to bad sectors is managed internally by the disk controller.
3.      Modern hard drives can have thousands of cylinders, and hundreds of sectors per track on their outermost tracks. These numbers exceed the range of HSC numbers for many (older) operating systems, and therefore disks can be configured for any convenient combination of HSC values that falls within the total number of sectors physically on the drive.

Modern disks pack many more sectors into outer cylinders than inner ones, using one of two approaches:

·         With Constant Linear Velocity, CLV, the density of bits is uniform from cylinder to cylinder. Because there are more sectors in outer cylinders, the disk spins slower when reading those cylinders, causing the rate of bits passing under the read-write head to remain constant. This is the approach used by modern CDs and DVDs.
·         With Constant Angular Velocity, CAV, the disk rotates at a constant angular speed, with the bit density decreasing on outer cylinders. ( These disks would have a constant number of sectors per track on all cylinders. )


Mass-Storage Structure - Introduction

Secondary and tertiary storage structures are the lowest level of the file system.

1. Magnetic Disks

Traditional magnetic disks have the following basic structure:

·         One or more platters in the form of disks covered with magnetic media. Hard disk platters are made of rigid metal, while "floppy" disks are made of more flexible plastic.
·         Each platter has two working surfaces. Older hard disk drives would sometimes not use the very top or bottom surface of a stack of platters, as these surfaces were more susceptible to potential damage.
·         Each working surface is divided into a number of concentric rings called tracks. The collection of all tracks that are the same distance from the edge of the platter (i.e. all tracks immediately above one another across the stack of platters) is called a cylinder.
·         Each track is further divided into sectors, traditionally containing 512 bytes of data each, although some modern disks occasionally use larger sector sizes. ( Sectors also include a header and a trailer, including checksum information among other things. Larger sector sizes reduce the fraction of the disk consumed by headers and trailers, but increase internal fragmentation and the amount of disk that must be marked bad in the case of errors. )

The data on a hard drive is read by read-write heads. The standard configuration uses one head per surface, each on a separate arm, and controlled by a common arm assembly which moves all heads simultaneously from one cylinder to another. ( Other configurations, including independent read-write heads, may speed up disk access, but involve serious technical difficulties. )

The storage capacity of a traditional disk drive is equal to the number of heads ( i.e. the number of working surfaces ), times the number of tracks per surface, times the number of sectors per track, times the number of bytes per sector. A particular physical block of data is specified by providing the head-sector-cylinder number at which it is located.
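A quick back-of-the-envelope check of this formula, using an assumed (entirely hypothetical) geometry:

    heads = 16                    # working surfaces
    tracks_per_surface = 50_000
    sectors_per_track = 500       # an average; real drives vary this by zone
    bytes_per_sector = 512

    capacity = heads * tracks_per_surface * sectors_per_track * bytes_per_sector
    print(capacity)                # 204800000000 bytes
    print(capacity / 10**9, "GB")  # 204.8 GB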

In operation the disk rotates at high speed, such as 7200 rpm (120 revolutions per second). The overall time to transfer data from the disk to the computer is composed of several steps:

The positioning time, a.k.a. the seek time or random-access time, is the time required to move the heads from one cylinder to another, and for the heads to settle down after the move. This is typically the slowest step in the process and the predominant bottleneck to overall transfer rates.

The rotational latency is the amount of time required for the desired sector to rotate around and come under the read-write head. This can range anywhere from zero to one full revolution, and on average will equal one-half revolution. This is another physical step, usually the second slowest behind seek time. (For a disk rotating at 7200 rpm, the average rotational latency is half a revolution at 120 revolutions per second, or just over 4 milliseconds, a long time by computer standards.)

The transfer time is the time required to move the data electronically from the disk to the computer. (Some authors also use the term transfer rate to refer to the overall transfer rate, including seek time and rotational latency as well as the electronic data transfer rate.)
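Putting the three components together, a rough average access time for a single small request can be estimated as follows; the seek time and transfer rate used here are assumed figures for illustration, not measurements of any particular drive.

    rpm = 7200
    avg_seek_ms = 8.0                        # assumed average seek time
    avg_rotational_ms = (60_000 / rpm) / 2   # half a revolution = ~4.17 ms
    transfer_mb_per_s = 150.0                # assumed sustained transfer rate
    block_kb = 4                             # size of the request

    transfer_ms = block_kb / 1024 / transfer_mb_per_s * 1000   # ~0.026 ms
    total_ms = avg_seek_ms + avg_rotational_ms + transfer_ms
    print(round(total_ms, 2), "ms")          # 12.19 ms, dominated by seek + rotation

Note how the mechanical steps (seek and rotation) dwarf the electronic transfer, which is why disk scheduling matters so much for overall performance.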

Floppy disks are normally removable. Hard drives can also be removable, and some are even hot swappable, meaning they can be removed while the computer is running, and a new hard drive inserted in their place.

Disk drives are connected to the computer via a cable known as the I/O bus. Some of the common interface formats include Enhanced Integrated Drive Electronics, EIDE; Advanced Technology Attachment, ATA; Serial ATA, SATA; Universal Serial Bus, USB; Fibre Channel, FC; and Small Computer Systems Interface, SCSI.

The host controller is at the computer end of the I/O bus, and the disk controller is built into the disk itself. The CPU issues commands to the host controller via I/O ports. Data is transferred between the magnetic surface and onboard cache by the disk controller, and then the data is transferred from that cache to the host controller and the motherboard memory at electronic speeds.

2. Magnetic Tapes


Magnetic tapes were once used as common secondary storage before the days of hard disk drives, but today they are used primarily for backups.
Accessing a particular spot on a magnetic tape can be slow, but once reading or writing commences, access speeds are comparable to disk drives.
Capacities of tape drives can range from 20 to 200 GB, and compression can double that capacity.

Thrashing



If a process does not have the number of frames it needs to support its pages in active use, it will quickly page fault. It must then replace some page that will be needed again immediately; consequently it quickly page faults again, and again, and again, each time replacing a page that it must bring back in right away.

This high paging activity is called thrashing. A process is thrashing if it is spending more time paging than executing.

Cause of Thrashing

Early process-scheduling schemes would control the degree of multiprogramming based on CPU utilization, adding more processes when utilization was low. The problem is that when memory filled up and processes started spending most of their time waiting for their pages to be paged in, CPU utilization dropped, causing the scheduler to add even more processes and worsen the problem. Eventually the system would essentially grind to a halt.


To limit the effects of thrashing, a local replacement algorithm can be used. With local replacement, if one process starts thrashing it cannot steal frames from another process and cause it to thrash as well. However, the problem is not entirely solved: the effective access time increases even for processes that are not thrashing.


To prevent thrashing, a process needs to be provided with as many frames as it needs. There are several techniques to do so.

The working-set strategy is an approach based on the locality model of process execution.
The locality model notes that processes typically make many references to the same general area of memory (a locality) before periodically moving on to a new locality. If we could keep just as many frames as the current locality requires, then page faults would occur primarily on switches from one locality to another.

Working-Set Model

The working-set model is based on the concept of locality, and defines a working-set window of length Δ (delta). Whatever pages are touched by the most recent Δ page references are said to be in the process's working-set window, and comprise its current working set.


The selection of Δ is critical to the success of the working-set model: if it is too small, it does not encompass all of the pages of the current locality, and if it is too large, it encompasses pages that are no longer being frequently accessed.

The total demand, D, is the sum of the sizes of the working sets of all processes. If D exceeds the total number of available frames, then at least one process is thrashing, because there are not enough frames available to satisfy its minimum working set. If D is significantly less than the number of available frames, additional processes can be launched.

The working-set strategy prevents thrashing while keeping the degree of multi-programming as high as possible.

The hard part of the working-set model is keeping track of which pages are in the current working set, since every reference adds a page to the window and drops the oldest one out. An approximation can be made using reference bits and a timer that goes off after a set interval of memory references.
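The definitions above are easy to express directly. The sketch below computes working-set sizes from hypothetical reference strings and checks the thrashing condition D > available frames; the strings, Δ, and frame count are all made-up values for illustration.

    def working_set(references, t, delta):
        """Pages touched in the most recent `delta` references ending at time t."""
        start = max(0, t - delta + 1)
        return set(references[start:t + 1])

    # Hypothetical reference strings for two processes:
    p1 = [1, 2, 1, 3, 1, 2, 4, 4, 4, 4]
    p2 = [7, 7, 7, 8, 9, 7, 8, 9, 7, 7]
    delta = 5

    wss1 = len(working_set(p1, len(p1) - 1, delta))   # |{2, 4}| = 2
    wss2 = len(working_set(p2, len(p2) - 1, delta))   # |{7, 8, 9}| = 3
    D = wss1 + wss2                                   # total demand = 5
    available_frames = 8
    print("thrashing risk" if D > available_frames else "enough frames")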

Allocation of Frames



Two important tasks in virtual memory management are: a page-replacement strategy and a frame-allocation strategy. 

1. Minimum Number of Frames
The absolute minimum number of frames that a process must be allocated depends on the system architecture. One reason for allocating at least a minimum number of frames involves performance: as the number of frames allocated decreases, the page-fault rate increases, slowing process execution.

Enough frames must be allocated to hold all the different pages that any single instruction can reference.

The worst case involves indirect addressing, particularly where multiple levels of indirect addressing are allowed. Left unchecked, a pointer to a pointer to a pointer to a . . . could theoretically touch every page in the virtual address space in a single machine instruction, requiring every virtual page to be loaded into physical memory simultaneously. For this reason, architectures place a limit (say, 16) on the number of levels of indirection allowed in an instruction, enforced with a counter initialized to the limit and decremented at every level of indirection; if the counter reaches zero, an "excessive indirection" trap occurs.

2. Allocation Algorithms

Equal Allocation - If there are m frames available and n processes to share them, each process gets m / n frames, and the leftover frames are kept in a free-frame buffer pool.

Proportional Allocation - Allocate the frames proportionally to the size of the process, relative to the total size of all processes. 

So if the size of process i is Si, and S is the sum of all Si, then the allocation for process Pi is ai = m * Si / S, where m is the total number of available frames.

Variations on proportional allocation could consider the priority of each process rather than just its size.
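A minimal sketch of proportional allocation, using made-up process sizes; integer division loses a few frames to rounding, which can simply be left in the free-frame pool.

    def proportional_allocation(sizes, m):
        """ai = m * Si / S, rounded down; also return the leftover frames."""
        S = sum(sizes)
        alloc = [m * s // S for s in sizes]
        return alloc, m - sum(alloc)

    # Two processes of size 10 and 127 pages sharing 62 frames:
    print(proportional_allocation([10, 127], 62))   # ([4, 57], 1) -- 1 frame left over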

3. Global versus Local Allocation

Page replacement algorithms can be classified into two broad categories: global replacement and local replacement.

·         With local replacement, the number of pages allocated to a process is fixed, and page replacement occurs only amongst the pages allocated to this process.
·         With global replacement, any page may be a potential victim, whether it currently belongs to the process seeking a free frame or not.
·         Local page replacement allows processes to better control their own page fault rates, and leads to more consistent performance of a given process over different system load levels.
·         Global page replacement is overall more efficient, and is the more commonly used approach.

4. Non-Uniform Memory Access

The discussion so far has assumed that all main memory is created equal, or at least that it is accessed equally. On many systems this is not the case; in many multiple-processor systems, for instance, each CPU is physically located on a separate circuit board that also holds some portion of the overall system memory. The performance differences are caused by how CPUs and memory are interconnected in the system.

Systems in which memory access times vary significantly are collectively known as non-uniform memory access (NUMA) systems. Memory-allocation algorithms have to be modified to take NUMA into account, and similar changes have to be made to the scheduling system. The goal of these changes is to have memory frames allocated as close as possible to the CPU on which the process is running.

Counting Based Page replacement, Page-Buffering Algorithms and Applications and Page replacement


Counting Based Page replacement:

There are several algorithms based on counting the number of references that have been made to a given page, such as:

o    Least Frequently Used, LFU: Replace the page with the lowest reference count. A problem can occur if a page is used frequently initially and then never again, as its reference count remains high. A solution to this problem is to right-shift the counters periodically, yielding a time-decaying average reference count.

o    Most Frequently Used, MFU: Replace the page with the highest reference count. The logic behind this idea is that pages that have already been referenced a lot have been in the system a long time, and we are probably done with them, whereas pages referenced only a few times have only recently been loaded, and we still need them.

In general, counting-based algorithms are not commonly used, as their implementation is expensive and they do not approximate OPT well.
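For illustration, here is a small sketch of LFU replacement with the periodic right-shift decay described above; the class name, frame count, and decay interval are arbitrary choices, not a standard API.

    class LFUReplacer:
        """Counting-based (LFU) replacement with periodic count decay."""

        def __init__(self, num_frames, decay_every=100):
            self.num_frames = num_frames
            self.decay_every = decay_every
            self.counts = {}        # page -> reference count
            self.ticks = 0

        def access(self, page):
            """Reference a page; return True if this access page-faults."""
            self.ticks += 1
            if self.ticks % self.decay_every == 0:
                for p in self.counts:           # time-decaying average:
                    self.counts[p] >>= 1        # halve every count periodically
            if page in self.counts:
                self.counts[page] += 1
                return False                    # hit
            if len(self.counts) >= self.num_frames:
                victim = min(self.counts, key=self.counts.get)  # lowest count
                del self.counts[victim]
            self.counts[page] = 1
            return True                         # fault

Swapping min for max in the victim selection turns this sketch into MFU.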

Page-Buffering Algorithms:

There are a number of page-buffering techniques often used in addition to a specific page-replacement algorithm.

For example, maintain a certain minimum number of free frames at all times. When a page fault occurs, allocate one of the frames from the free list first, to get the requesting process up and running again as quickly as possible, and then select a victim page to write to disk and free up a frame as a second step.

Keep a list of modified pages, and when the I/O system is otherwise idle, have it write these pages out to disk, and then clear the modify bits, thereby increasing the chance of finding a "clean" page for the next potential victim.

Keep a pool of free frames, but remember which page was in each frame before it was freed. Since the data in the page is not actually cleared when the page is freed, it can be made an active page again without loading any new data from disk. This is useful when an algorithm mistakenly replaces a page that is in fact needed again soon.

Applications and Page replacement:

·         Some applications (most notably database programs) understand their data-access and caching needs better than the general-purpose OS, and should therefore be given free rein to do their own memory management.
·         Sometimes such programs are given a raw disk partition to work with, containing raw data blocks and no file system structure. It is then up to the application to use this disk partition as extended memory or for whatever other reasons it sees fit.


LRU-Approximation Page Replacement




i. Additional reference bits algorithm:

Ordering information can be gained by recording the reference bits at regular intervals: store the most recent 8 reference bits for each page in an 8-bit byte in the page-table entry, interpreted as an unsigned integer.

At periodic intervals ( clock interrupts ), the OS takes over, and right-shifts each of the reference bytes by one bit.

The high-order ( leftmost ) bit is then filled in with the current value of the reference bit, and the reference bits are cleared.

At any given time, the page with the smallest value for the reference byte is the LRU page.

Ex:
00000000 indicates a page that has not been used in any of the last eight time periods.
11000100 indicates a page that has been referenced recently.

If the eight bits are interpreted as unsigned integers, the page with the lowest number is the page to be replaced.
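A sketch of the scheme, with the 8-bit history kept as a small integer per page; the class and method names are illustrative only.

    class AgingLRU:
        """LRU approximation using 8 reference bits per page."""

        def __init__(self, pages):
            self.history = {p: 0 for p in pages}         # 8-bit shift register
            self.referenced = {p: False for p in pages}  # hardware reference bit

        def access(self, page):
            self.referenced[page] = True    # set by hardware on each reference

        def clock_tick(self):
            """On each timer interrupt: shift right, fill the high bit, clear."""
            for p in self.history:
                high = 0x80 if self.referenced[p] else 0
                self.history[p] = (self.history[p] >> 1) | high
                self.referenced[p] = False

        def lru_page(self):
            """The page with the smallest reference byte is the LRU page."""
            return min(self.history, key=self.history.get)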

ii. Second chance algorithm:
The basic second-chance algorithm is a FIFO replacement algorithm. When a page has been selected, its reference bit is inspected. If the value is 0, the page is replaced; if the reference bit is set to 1, the page is given a second chance and we move on to select the next FIFO page.
When a page gets a second chance, its reference bit is cleared and its arrival time is reset to the current time. Thus a page that is given a second chance will not be replaced until all other pages have been replaced or given second chances.
One way to implement the second-chance algorithm is with a circular queue. Second-chance replacement degenerates to FIFO replacement if the reference bits of all pages are set.
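The circular-queue ("clock") implementation can be sketched as follows; the frame count and method names are illustrative.

    class SecondChance:
        """Second-chance (clock) replacement over a fixed set of frames."""

        def __init__(self, num_frames):
            self.frames = [None] * num_frames   # page held by each frame
            self.ref = [False] * num_frames     # reference bit per frame
            self.hand = 0                       # circular-queue pointer

        def access(self, page):
            """Reference a page; return True if this access page-faults."""
            if page in self.frames:
                self.ref[self.frames.index(page)] = True
                return False                    # hit
            # Fault: sweep the hand, giving referenced pages a second chance.
            while self.frames[self.hand] is not None and self.ref[self.hand]:
                self.ref[self.hand] = False     # clear the bit and move on
                self.hand = (self.hand + 1) % len(self.frames)
            self.frames[self.hand] = page       # replace the victim (or fill a free frame)
            self.ref[self.hand] = True
            self.hand = (self.hand + 1) % len(self.frames)
            return True                         # fault

If every reference bit is set, the sweep clears them all and returns to where it started, which is exactly the FIFO degeneration noted above.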

iii. Enhanced second-chance algorithm:
The enhanced second-chance algorithm looks at the reference bit and the modify bit (dirty bit) as an ordered pair, and classifies pages into one of four classes:

  1. ( 0, 0 ) - Neither recently used nor modified.
  2. ( 0, 1 ) - Not recently used, but modified.
  3. ( 1, 0 ) - Recently used, but clean.
  4. ( 1, 1 ) - Recently used and modified.
This algorithm searches the page table in a circular fashion (in as many as four passes), looking for the first page it can find in the lowest-numbered class; i.e. it first makes a pass looking for a (0, 0) page, and if it can't find one, it makes another pass looking for a (0, 1) page, etc.

The main difference between this algorithm and the previous one is the preference for replacing clean pages if possible.
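The class-by-class scan can be sketched as a simple function; a real implementation would also clear reference bits between passes, which is omitted here, and the page tuples are made up for illustration.

    def choose_victim(pages):
        """Enhanced second chance: first page found in the lowest (ref, dirty) class.

        `pages` is a list of (name, referenced, modified) tuples in FIFO order.
        """
        for target in [(0, 0), (0, 1), (1, 0), (1, 1)]:   # four classes, four passes
            for name, ref, dirty in pages:
                if (ref, dirty) == target:
                    return name
        return None

    print(choose_victim([("A", 1, 1), ("B", 0, 1), ("C", 1, 0)]))  # "B": unused, though dirty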





Saturday, February 22, 2020

LRU Page replacement


Least Recently Used (LRU) page replacement algorithm

The idea behind LRU page replacement is to use the recent past as an approximation of the near future: replace the page that has not been used for the longest period of time.




LRU is considered a good replacement policy, and is often used. The problem is how exactly to implement it. There are two simple approaches commonly used:

  1. Counters. Every memory access increments a counter, and the current value of this counter is stored in the page-table entry for the page accessed. Finding the LRU page then involves simply searching the table for the page with the smallest counter value. Note that counter overflow must be considered.
  2. Stack. Another approach is to keep a stack of page numbers: whenever a page is accessed, pull that page from the middle of the stack and place it on top. The LRU page will then always be at the bottom of the stack. Because this requires removing entries from the middle of the stack, a doubly linked list is the recommended data structure.
Note that both implementations of LRU require hardware support, either for incrementing the counter or for managing the stack, as these operations must be performed for every memory access.
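As a sketch, the stack approach maps naturally onto Python's OrderedDict, which is backed by a doubly linked list; the reference string below is a common textbook example, and with 3 frames it produces 12 faults.

    from collections import OrderedDict

    def lru_faults(references, num_frames):
        """Count page faults under LRU replacement."""
        frames = OrderedDict()      # ordered from LRU (front) to MRU (back)
        faults = 0
        for page in references:
            if page in frames:
                frames.move_to_end(page)        # pull from the middle, put on top
            else:
                faults += 1
                if len(frames) >= num_frames:
                    frames.popitem(last=False)  # evict the LRU page at the bottom
                frames[page] = True
        return faults

    refs = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1]
    print(lru_faults(refs, 3))   # 12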

Neither LRU nor OPT exhibits Belady's anomaly. Both belong to a class of page-replacement algorithms called stack algorithms, which can never exhibit Belady's anomaly.

A stack algorithm is one in which the pages kept in memory for a frame set of size N will always be a subset of the pages kept for a frame size of N + 1.

Optimal Page Replacement



One result of Belady's anomaly was the search for an optimal page-replacement algorithm: the algorithm with the lowest page-fault rate of all algorithms, and one that never suffers from Belady's anomaly.

This algorithm is simple: "Replace the page that will not be used for the longest time in the future."


Optimal replacement is better than the FIFO algorithm. Unfortunately, the optimal page-replacement algorithm is difficult to implement because it requires future knowledge of the reference string, so it is used mainly for comparison studies.
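Although unusable online, OPT is easy to simulate offline for comparison studies, since the whole reference string is available up front. A sketch, reusing the same textbook reference string as above:

    def opt_faults(references, num_frames):
        """Simulate OPT: evict the page whose next use lies farthest in the future."""
        frames, faults = set(), 0
        for i, page in enumerate(references):
            if page in frames:
                continue
            faults += 1
            if len(frames) >= num_frames:
                def next_use(p):
                    future = references[i + 1:]
                    return future.index(p) if p in future else float("inf")
                frames.remove(max(frames, key=next_use))   # farthest next use
            frames.add(page)
        return faults

    refs = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1]
    print(opt_faults(refs, 3))   # 9 faults, versus 12 for LRU on the same string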

First In First Out Page replacement


FIFO page replacement

First-In First-Out page replacement is the simplest page-replacement algorithm. This algorithm associates with each page the time when that page was brought into memory. When a page must be replaced, the oldest page is chosen.




Although FIFO is simple and easy to implement, it is not always optimal, or even efficient. An interesting effect that can occur with FIFO is Belady's anomaly, in which increasing the number of frames available can actually increase the number of page faults. Consider, for example, the reference string (1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5): the number of page faults with four frames is 10, while with three frames it is only 9.
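A short simulation reproduces these numbers:

    from collections import deque

    def fifo_faults(references, num_frames):
        """Count page faults under FIFO replacement."""
        frames, queue, faults = set(), deque(), 0
        for page in references:
            if page in frames:
                continue
            faults += 1
            if len(frames) >= num_frames:
                frames.remove(queue.popleft())   # evict the oldest page
            frames.add(page)
            queue.append(page)
        return faults

    belady = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
    print(fifo_faults(belady, 3))   # 9
    print(fifo_faults(belady, 4))   # 10 -- more frames, yet more faults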


Thursday, February 20, 2020

Basic page replacement



If no frame is free, find a page that is not currently being used and free it. A frame is freed by writing its contents to swap space and changing the page table to indicate that the page is no longer in memory. The page-fault service routine is modified to include page replacement:
  1. Find the location of the desired page on the disk, either in swap space or in the file system.
  2. Find a free frame:
    1. If there is a free frame, use it.
    2. If there is no free frame, use a page-replacement algorithm to select an existing frame to be replaced, known as the victim frame.
    3. Write the victim frame to disk. Change all related page tables to indicate that this page is no longer in memory.
  3. Read in the desired page and store it in the frame. Adjust all related page and frame tables to indicate the change.
  4. Restart the process that was waiting for this page.
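The steps above can be sketched as a toy routine, with memory and disk modeled as plain dictionaries; every name here is illustrative, not a real kernel API.

    disk = {"A": "data-A", "B": "data-B", "C": "data-C"}   # backing store
    memory = {}            # frame number -> (page, contents)
    page_table = {}        # page -> frame number (None if not resident)
    free_frames = [0, 1]   # a toy system with only two physical frames

    def handle_page_fault(page):
        if free_frames:                        # step 2: a free frame exists
            frame = free_frames.pop()
        else:                                  # step 2: no free frame; replace a victim
            frame = min(memory)                # trivial victim choice for the sketch
            victim_page, contents = memory[frame]
            disk[victim_page] = contents       # write victim out (skippable if clean)
            page_table[victim_page] = None     # victim is no longer in memory
        memory[frame] = (page, disk[page])     # step 3: read the desired page in
        page_table[page] = frame               # adjust the tables
        # step 4: restart the faulting process

    handle_page_fault("A"); handle_page_fault("B"); handle_page_fault("C")
    print(page_table)   # {'A': 1, 'B': None, 'C': 0}

The "skippable if clean" comment is exactly where the dirty bit described below pays off.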

A modify bit or dirty bit is used to reduce the overhead of disk writes. The dirty bit of each page indicates whether or not the page has been changed since it was last loaded in from disk. If the dirty bit has not been set, then the page is unchanged and does not need to be written out to disk; otherwise the page write is required. It should come as no surprise that many page-replacement strategies specifically look for pages that do not have their dirty bit set, and preferentially select clean pages as victim pages.


Page replacement is basic to demand paging. There are two major requirements for a successful demand-paging system: a frame-allocation algorithm and a page-replacement algorithm. The former decides how many frames are allocated to each process (and to other needs), and the latter deals with how to select a page for replacement when no free frames are available.

The overall goal in selecting and tuning these algorithms is to generate the fewest number of overall page faults. 

Algorithms are evaluated by running them against a given string of memory accesses, known as a reference string, and computing the number of page faults. Determining the number of page faults for a particular reference string and page-replacement algorithm also requires knowing the number of page frames available. In general, as the number of available page frames increases, the number of page faults decreases.



Page Replacement



While a user process is executing, a page fault occurs, and the operating system determines where the desired page resides on disk. If there are no free frames on the free-frame list, the operating system has several options:

  1. It could terminate the user process; this option is not the best choice.
  2. Swap some process out of memory completely, freeing up its page frames.
  3. Find some page in memory that isn't being used right now, and swap that page out to disk, freeing up a frame that can be allocated to the process requesting it. This is known as page replacement, and is the most common solution.

There are many different algorithms for page replacement:
a. Basic page replacement
b. FIFO page replacement
c. Optimal page replacement
d. LRU page replacement
e. LRU approximation page replacement
f. Counting based page replacement
g. Page-buffering algorithms

Copy on Write


The fork() system call creates a child process that is a duplicate of its parent. Traditionally fork() creates a copy of the parent’s address space for the child, duplicating the pages belonging to the parent.

Instead of copying the parent's address space, a technique known as copy-on-write is used. It works by allowing the parent and child processes initially to share the same pages. These shared pages are marked as copy-on-write pages, meaning that if either process writes to a shared page, a private copy of the shared page is created.



Only pages that can be modified need to be marked copy-on-write; code segments, for example, can simply be shared. Pages used to satisfy copy-on-write duplication are typically allocated using zero-fill-on-demand, meaning that their previous contents are zeroed out before the copy proceeds.
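A toy simulation of the idea, with page objects standing in for physical frames; the names and structure are illustrative only, and real kernels track sharing with reference counts rather than a single flag.

    class Page:
        def __init__(self, data):
            self.data = bytearray(data)
            self.cow = False        # marked copy-on-write while shared

    def fork_address_space(parent_pages):
        """Child's page table points at the parent's pages, all marked COW."""
        for page in parent_pages:
            page.cow = True
        return list(parent_pages)

    def write(page_table, index, offset, value):
        page = page_table[index]
        if page.cow:                # first write to a shared page:
            page = Page(page.data)  # copy it (a zero-fill-on-demand frame in a real OS)
            page.cow = False
            page_table[index] = page   # only the writing process sees the copy
        page.data[offset] = value

    parent = [Page(b"hello")]
    child = fork_address_space(parent)
    write(child, 0, 0, ord("H"))            # child gets a private copy
    print(parent[0].data, child[0].data)    # bytearray(b'hello') bytearray(b'Hello')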