
Red Hat Enterprise Linux 4: Introduction to System Administration
Copyright © 2005 Red Hat, Inc.
rhel-isa(EN)-4-Print-RHI (2004-08-25T17:11) Copyright © 2005 by Red Hat, Inc. This material may be distributed only subject to the terms and conditions set forth in the Open Publication License, V1.0 or later (the latest version is presently available at http://www.opencontent.org/openpub/). Distribution of substantively modified versions of this document is prohibited without the explicit permission of the copyright holder. Distribution of the work or derivative of the work in any standard (paper) book form for commercial purposes is prohibited unless prior permission is obtained from the copyright holder. Red Hat and the Red Hat "Shadow Man" logo are registered trademarks of Red Hat, Inc. in the United States and other
countries. All other trademarks referenced herein are the property of their respective owners. The GPG fingerprint of the security@redhat.com key is: CA 20 86 86 2B D6 9D FC 65 F6 EC C4 21 91 80 CD DB 42 A6 0E
Table of Contents
Introduction
  1. Architecture-specific Information
  2. Document Conventions
  3. Activate Your Subscription
    3.1. Provide a Red Hat Login
    3.2. Provide Your Subscription Number
    3.3. Connect Your System
  4. More to Come
    4.1. Send in Your Feedback
1. The Philosophy of System Administration
  1.1. Automate Everything
  1.2. Document Everything
  1.3. Communicate as Much as Possible
    1.3.1. Tell Your Users What You Are Going to Do
    1.3.2. Tell Your Users What You Are Doing
    1.3.3. Tell Your Users What You Have Done
  1.4. Know Your Resources
  1.5. Know Your Users
  1.6. Know Your Business
  1.7. Security Cannot be an Afterthought
    1.7.1. The Risks of Social Engineering
  1.8. Plan Ahead
  1.9. Expect the Unexpected
  1.10. Red Hat Enterprise Linux-Specific Information
    1.10.1. Automation
    1.10.2. Documentation and Communication
    1.10.3. Security
  1.11. Additional Resources
    1.11.1. Installed Documentation
    1.11.2. Useful Websites
    1.11.3. Related Books
2. Resource Monitoring
  2.1. Basic Concepts
  2.2. System Performance Monitoring
  2.3. Monitoring System Capacity
  2.4. What to Monitor?
    2.4.1. Monitoring CPU Power
    2.4.2. Monitoring Bandwidth
    2.4.3. Monitoring Memory
    2.4.4. Monitoring Storage
  2.5. Red Hat Enterprise Linux-Specific Information
    2.5.1. free
    2.5.2. top
    2.5.3. vmstat
    2.5.4. The Sysstat Suite of Resource Monitoring Tools
    2.5.5. OProfile
  2.6. Additional Resources
    2.6.1. Installed Documentation
    2.6.2. Useful Websites
    2.6.3. Related Books
3. Bandwidth and Processing Power
  3.1. Bandwidth
    3.1.1. Buses
    3.1.2. Datapaths
    3.1.3. Potential Bandwidth-Related Problems
    3.1.4. Potential Bandwidth-Related Solutions
    3.1.5. In Summary...
  3.2. Processing Power
    3.2.1. Facts About Processing Power
    3.2.2. Consumers of Processing Power
    3.2.3. Improving a CPU Shortage
  3.3. Red Hat Enterprise Linux-Specific Information
    3.3.1. Monitoring Bandwidth on Red Hat Enterprise Linux
    3.3.2. Monitoring CPU Utilization on Red Hat Enterprise Linux
  3.4. Additional Resources
    3.4.1. Installed Documentation
    3.4.2. Useful Websites
    3.4.3. Related Books
4. Physical and Virtual Memory
  4.1. Storage Access Patterns
  4.2. The Storage Spectrum
    4.2.1. CPU Registers
    4.2.2. Cache Memory
    4.2.3. Main Memory — RAM
    4.2.4. Hard Drives
    4.2.5. Off-Line Backup Storage
  4.3. Basic Virtual Memory Concepts
    4.3.1. Virtual Memory in Simple Terms
    4.3.2. Backing Store — the Central Tenet of Virtual Memory
  4.4. Virtual Memory: The Details
    4.4.1. Page Faults
    4.4.2. The Working Set
    4.4.3. Swapping
  4.5. Virtual Memory Performance Implications
    4.5.1. Worst Case Performance Scenario
    4.5.2. Best Case Performance Scenario
  4.6. Red Hat Enterprise Linux-Specific Information
  4.7. Additional Resources
    4.7.1. Installed Documentation
    4.7.2. Useful Websites
    4.7.3. Related Books
5. Managing Storage
  5.1. An Overview of Storage Hardware
    5.1.1. Disk Platters
    5.1.2. Data Reading/Writing Device
    5.1.3. Access Arms
  5.2. Storage Addressing Concepts
    5.2.1. Geometry-Based Addressing
    5.2.2. Block-Based Addressing
  5.3. Mass Storage Device Interfaces
    5.3.1. Historical Background
    5.3.2. Present-Day Industry-Standard Interfaces
  5.4. Hard Drive Performance Characteristics
    5.4.1. Mechanical/Electrical Limitations
    5.4.2. I/O Loads and Performance
  5.5. Making the Storage Usable
    5.5.1. Partitions/Slices
    5.5.2. File Systems
    5.5.3. Directory Structure
    5.5.4. Enabling Storage Access
  5.6. Advanced Storage Technologies
    5.6.1. Network-Accessible Storage
    5.6.2. RAID-Based Storage
    5.6.3. Logical Volume Management
  5.7. Storage Management Day-to-Day
    5.7.1. Monitoring Free Space
    5.7.2. Disk Quota Issues
    5.7.3. File-Related Issues
    5.7.4. Adding/Removing Storage
  5.8. A Word About Backups...
  5.9. Red Hat Enterprise Linux-Specific Information
    5.9.1. Device Naming Conventions
    5.9.2. File System Basics
    5.9.3. Mounting File Systems
    5.9.4. Network-Accessible Storage Under Red Hat Enterprise Linux
    5.9.5. Mounting File Systems Automatically with /etc/fstab
    5.9.6. Adding/Removing Storage
    5.9.7. Implementing Disk Quotas
    5.9.8. Creating RAID Arrays
    5.9.9. Day to Day Management of RAID Arrays
    5.9.10. Logical Volume Management
  5.10. Additional Resources
    5.10.1. Installed Documentation
    5.10.2. Useful Websites
    5.10.3. Related Books
6. Managing User Accounts and Resource Access
  6.1. Managing User Accounts
    6.1.1. The Username
    6.1.2. Passwords
    6.1.3. Access Control Information
    6.1.4. Managing Accounts and Resource Access Day-to-Day
  6.2. Managing User Resources
    6.2.1. Who Can Access Shared Data
    6.2.2. Where Users Access Shared Data
    6.2.3. What Barriers Are in Place To Prevent Abuse of Resources
  6.3. Red Hat Enterprise Linux-Specific Information
    6.3.1. User Accounts, Groups, and Permissions
    6.3.2. Files Controlling User Accounts and Groups
    6.3.3. User Account and Group Applications
  6.4. Additional Resources
    6.4.1. Installed Documentation
    6.4.2. Useful Websites
    6.4.3. Related Books
7. Printers and Printing
  7.1. Types of Printers
    7.1.1. Printing Considerations
  7.2. Impact Printers
    7.2.1. Dot-Matrix Printers
    7.2.2. Daisy-Wheel Printers
    7.2.3. Line Printers
    7.2.4. Impact Printer Consumables
  7.3. Inkjet Printers
    7.3.1. Inkjet Consumables
  7.4. Laser Printers
    7.4.1. Color Laser Printers
    7.4.2. Laser Printer Consumables
  7.5. Other Printer Types
  7.6. Printer Languages and Technologies
  7.7. Networked Versus Local Printers
  7.8. Red Hat Enterprise Linux-Specific Information
  7.9. Additional Resources
    7.9.1. Installed Documentation
    7.9.2. Useful Websites
    7.9.3. Related Books
8. Planning for Disaster
  8.1. Types of Disasters
    8.1.1. Hardware Failures
    8.1.2. Software Failures
    8.1.3. Environmental Failures
    8.1.4. Human Errors
  8.2. Backups
    8.2.1. Different Data: Different Backup Needs
    8.2.2. Backup Software: Buy Versus Build
    8.2.3. Types of Backups
    8.2.4. Backup Media
    8.2.5. Storage of Backups
    8.2.6. Restoration Issues
  8.3. Disaster Recovery
    8.3.1. Creating, Testing, and Implementing a Disaster Recovery Plan
    8.3.2. Backup Sites: Cold, Warm, and Hot
    8.3.3. Hardware and Software Availability
    8.3.4. Availability of Backups
    8.3.5. Network Connectivity to the Backup Site
    8.3.6. Backup Site Staffing
    8.3.7. Moving Back Toward Normalcy
  8.4. Red Hat Enterprise Linux-Specific Information
    8.4.1. Software Support
    8.4.2. Backup Technologies
  8.5. Additional Resources
    8.5.1. Installed Documentation
    8.5.2. Useful Websites
    8.5.3. Related Books
Index
Colophon
Introduction
Welcome to the Red Hat Enterprise Linux Introduction to System Administration.
The Red Hat Enterprise Linux Introduction to System Administration contains introductory information for new Red Hat Enterprise Linux system administrators. It does not teach you how to perform a particular task under Red Hat Enterprise Linux; rather, it provides you with the background knowledge that more experienced system administrators have learned over time.
This guide assumes you have a limited amount of experience as a Linux user, but no Linux system administration experience. If you are completely new to Linux in general (and Red Hat Enterprise Linux in particular), you should start by purchasing an introductory book on Linux.
Each chapter in the Red Hat Enterprise Linux Introduction to System Administration has the following structure:
Generic overview material — This section discusses the topic of the chapter without going into details about a specific operating system, technology, or methodology.
Red Hat Enterprise Linux-specific material — This section addresses aspects of the topic related to Linux in general and Red Hat Enterprise Linux in particular.
Additional resources for further study — This section includes pointers to other Red Hat Enterprise Linux manuals, helpful websites, and books containing information applicable to the topic.
This consistent structure allows readers to approach the Red Hat Enterprise Linux Introduction to System Administration in whatever way they choose. For example, an experienced system administrator with little Red Hat Enterprise Linux experience could skim only the sections that specifically focus on Red Hat Enterprise Linux, while a new system administrator could start by reading only the generic overview sections, using the Red Hat Enterprise Linux-specific sections as an introduction to more in-depth resources.
While on the subject of more in-depth resources, the Red Hat Enterprise Linux System Administration Guide is an excellent resource for performing specific tasks in a Red Hat Enterprise Linux environment. Administrators requiring more in-depth, factual information should refer to the Red Hat Enterprise Linux Reference Guide.
HTML, PDF, and RPM versions of the manuals are available on the Red Hat Enterprise Linux Documentation CD and online at http://www.redhat.com/docs/.
Note
Although this manual reflects the most current information possible, read the Red Hat Enterprise Linux Release Notes for information that may not have been available prior to our documentation being finalized. They can be found on the Red Hat Enterprise Linux CD #1 and online at http://www.redhat.com/docs/.
1. Architecture-specific Information
Unless otherwise noted, all information contained in this manual applies only to the x86 processor and processors featuring the Intel® Extended Memory 64 Technology (Intel® EM64T) and AMD64 technologies. For architecture-specific information, refer to the Red Hat Enterprise Linux Installation Guide for your respective architecture.
2. Document Conventions
When you read this manual, certain words are represented in different fonts, typefaces, sizes, and weights. This highlighting is systematic; different words are represented in the same style to indicate their inclusion in a specific category. The types of words that are represented this way include the following:
command
Linux commands (and other operating system commands, when used) are represented this way. This style should indicate to you that you can type the word or phrase on the command line and press [Enter] to invoke a command. Sometimes a command contains words that would be displayed in a different style on their own (such as file names). In these cases, they are considered to be part of the command, so the entire phrase is displayed as a command. For example:
Use the cat testfile command to view the contents of a file, named testfile, in the current working directory.
file name
File names, directory names, paths, and RPM package names are represented this way. This style should indicate that a particular file or directory exists by that name on your system. Examples:
The .bashrc file in your home directory contains bash shell definitions and aliases for your own use.
The /etc/fstab file contains information about different system devices and file systems.
Install the webalizer RPM if you want to use a Web server log file analysis program.
application
This style indicates that the program is an end-user application (as opposed to system software). For example:
Use Mozilla to browse the Web.
[key]
A key on the keyboard is shown in this style. For example:
To use [Tab] completion, type in a character and then press the [Tab] key. Your terminal displays the list of files in the directory that start with that letter.
[key]-[combination]
A combination of keystrokes is represented in this way. For example:
The [Ctrl]-[Alt]-[Backspace] key combination exits your graphical session and returns you to the graphical login screen or the console.
text found on a GUI interface
A title, word, or phrase found on a GUI interface screen or window is shown in this style. Text shown in this style is being used to identify a particular GUI screen or an element on a GUI screen (such as text associated with a checkbox or field). Example:
Select the Require Password checkbox if you would like your screensaver to require a password before stopping.
top level of a menu on a GUI screen or window
A word in this style indicates that the word is the top level of a pulldown menu. If you click on the word on the GUI screen, the rest of the menu should appear. For example:
Under File on a GNOME terminal, the New Tab option allows you to open multiple shell prompts in the same window.
If you need to type in a sequence of commands from a GUI menu, they are shown like the following example:
Go to Main Menu Button (on the Panel) => Programming => Emacs to start the Emacs text editor.
button on a GUI screen or window
This style indicates that the text can be found on a clickable button on a GUI screen. For example:
Click on the Back button to return to the webpage you last viewed.
computer output
Text in this style indicates text displayed to a shell prompt such as error messages and responses to commands. For example:
The ls command displays the contents of a directory. For example:
Desktop    about.html    logs    paulwesterberg.png
Mail       backupfiles   mail    reports
The output returned in response to the command (in this case, the contents of the directory) is shown in this style.
prompt
A prompt, which is a computer’s way of signifying that it is ready for you to input something, is shown in this style. Examples:
$
#
[stephen@maturin stephen]$
leopard login:
user input
Text that the user has to type, either on the command line, or into a text box on a GUI screen, is displayed in this style. In the following example, text is displayed in this style:
To boot your system into the text based installation program, you must type in the text command at the boot: prompt.
replaceable
Text used for examples, which is meant to be replaced with data provided by the user, is displayed in this style. In the following example, <version-number> is displayed in this style:
The directory for the kernel source is /usr/src/<version-number>/, where <version-number> is the version of the kernel installed on this system.
Additionally, we use several different strategies to draw your attention to certain pieces of information. In order of how critical the information is to your system, these items are marked as a note, tip, important, caution, or warning. For example:
Note
Remember that Linux is case sensitive. In other words, a rose is not a ROSE is not a rOsE.
Tip
The directory /usr/share/doc/ contains additional documentation for packages installed on your system.
Important
If you modify the DHCP configuration file, the changes do not take effect until you restart the DHCP daemon.
Caution
Do not perform routine tasks as root — use a regular user account unless you need to use the root account for system administration tasks.
Warning
Be careful to remove only the necessary Red Hat Enterprise Linux partitions. Removing other parti­tions could result in data loss or a corrupted system environment.
3. Activate Your Subscription
Before you can access service and software maintenance information, and the support documentation included in your subscription, you must activate your subscription by registering with Red Hat. Registration includes these simple steps:
Provide a Red Hat login
Provide a subscription number
Connect your system
The first time you boot your installation of Red Hat Enterprise Linux, you are prompted to register with Red Hat using the Setup Agent. If you follow the prompts during the Setup Agent, you can complete the registration steps and activate your subscription.
If you cannot complete registration during the Setup Agent (which requires network access), you can alternatively complete the Red Hat registration process online at http://www.redhat.com/register/.
3.1. Provide a Red Hat Login
If you do not have an existing Red Hat login, you can create one when prompted during the Setup Agent or online at:
https://www.redhat.com/apps/activate/newlogin.html
A Red Hat login enables your access to:
Software updates, errata and maintenance via Red Hat Network
Red Hat technical support resources, documentation, and Knowledgebase
If you have forgotten your Red Hat login, you can search for your Red Hat login online at:
https://rhn.redhat.com/help/forgot_password.pxt
3.2. Provide Your Subscription Number
Your subscription number is located in the package that came with your order. If your package did not include a subscription number, your subscription was activated for you and you can skip this step.
You can provide your subscription number when prompted during the Setup Agent or by visiting http://www.redhat.com/register/.
3.3. Connect Your System
The Red Hat Network Registration Client helps you connect your system so that you can begin to get updates and perform systems management. There are three ways to connect:
1. During the Setup Agent — Check the Send hardware information and Send system package list options when prompted.
2. After the Setup Agent has been completed — From the Main Menu, go to System Tools, then select Red Hat Network.
3. After the Setup Agent has been completed — Enter the following command from the command line as the root user:
/usr/bin/up2date --register
4. More to Come
The Red Hat Enterprise Linux Introduction to System Administration is part of Red Hat’s growing commitment to provide useful and timely support to Red Hat Enterprise Linux users. As new releases of Red Hat Enterprise Linux are made available, we make every effort to include both new and improved documentation for you.
4.1. Send in Your Feedback
If you spot a typo in the Red Hat Enterprise Linux Introduction to System Administration, or if you have thought of a way to make this manual better, we would love to hear from you. Please submit a report in Bugzilla (http://bugzilla.redhat.com/bugzilla) against the component rhel-isa.
Be sure to mention the manual’s identifier:
rhel-isa(EN)-4-Print-RHI (2004-08-25T17:11)
If you mention this manual’s identifier, we will know exactly which version of the guide you have.
If you have a suggestion for improving the documentation, try to be as specific as possible. If you have found an error, please include the section number and some of the surrounding text so we can find it easily.
Chapter 1.
The Philosophy of System Administration
Although the specifics of being a system administrator may change from platform to platform, there are underlying themes that do not. These themes make up the philosophy of system administration.
The themes are:
Automate everything
Document everything
Communicate as much as possible
Know your resources
Know your users
Know your business
Security cannot be an afterthought
Plan ahead
Expect the unexpected
The following sections explore each theme in more detail.
1.1. Automate Everything
Most system administrators are outnumbered — either by their users, their systems, or both. In many cases, automation is the only way to keep up. In general, anything done more than once should be examined as a possible candidate for automation.
Here are some commonly automated tasks:
Free disk space checking and reporting
Backups
System performance data collection
User account maintenance (creation, deletion, etc.)
Business-specific functions (pushing new data to a Web server, running monthly/quarterly/yearly reports, etc.)
This list is by no means complete; the functions automated by system administrators are only limited by an administrator’s willingness to write the necessary scripts. In this case, being lazy (and making the computer do more of the mundane work) is actually a good thing.
Automation also gives users the extra benefit of greater predictability and consistency of service.
Tip
Keep in mind that if you have a task that should be automated, it is likely that you are not the first system administrator to have that need. Here is where the benefits of open source software really shine — you may be able to leverage someone else’s work to automate the manual procedure that is currently eating up your time. So always make sure you search the Web before writing anything more complex than a small Perl script.
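As a concrete example, consider the first task in the list above: free disk space checking and reporting. A minimal sketch might look like the following; the script name, the 90% threshold, and the mail recipient are assumptions to adjust for your site.

#!/bin/bash
# check-diskspace.sh -- mail a warning for any file system over a usage threshold.
# Hypothetical example; the threshold and recipient are site-specific assumptions.
THRESHOLD=90
REPORT=$(df -P | awk -v limit="$THRESHOLD" 'NR > 1 {
    used = $5
    sub(/%/, "", used)
    if (used + 0 >= limit)
        printf "%s is %s%% full (mounted on %s)\n", $1, used, $6
}')
if [ -n "$REPORT" ]; then
    echo "$REPORT" | mail -s "Disk space warning on $(hostname)" root
fi

Run the script by hand to verify its output, then schedule it to run unattended (Section 1.10.1 introduces the cron and at commands used for such scheduling).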
1.2. Document Everything
If given the choice between installing a brand-new server and writing a procedural document on performing system backups, the average system administrator would install the new server every time. While this is not at all unusual, you must document what you do. Many system administrators put off doing the necessary documentation for a variety of reasons:
"I will get around to it later."
Unfortunately, this is usually not true. Even if a system administrator is not kidding themselves, the nature of the job is such that everyday tasks are usually too chaotic to "do it later." Even worse, the longer it is put off, the more that is forgotten, leading to a much less detailed (and therefore, less useful) document.
"Why write it up? I will remember it."
Unless you are one of those rare individuals with a photographic memory, no, you will not remember it. Or worse, you will remember only half of it, not realizing that you are missing the whole story. This leads to wasted time either trying to relearn what you had forgotten or fixing what you had broken due to your incomplete understanding of the situation.
"If I keep it in my head, they will not fire me — I will have job security!"
While this may work for a while, invariably it leads to less — not more — job security. Think for a moment about what may happen during an emergency. You may not be available; your documentation may save the day by letting someone else resolve the problem in your absence. And never forget that emergencies tend to be times when upper management pays close attention. In such cases, it is better to have your documentation be part of the solution than it is for your absence to be part of the problem.
In addition, if you are part of a small but growing organization, eventually there will be a need for another system administrator. How can this person learn to back you up if everything is in your head? Worse yet, not documenting may make you so indispensable that you might not be able to advance your career. You could end up working for the very person that was hired to assist you.
Hopefully you are now sold on the benefits of system documentation. That brings us to the next question: What should you document? Here is a partial list:
Policies
Policies are written to formalize and clarify the relationship you have with your user community. They make it clear to your users how their requests for resources and/or assistance are handled. The nature, style, and method of disseminating policies to your community varies from organization to organization.
Procedures
Procedures are any step-by-step sequence of actions that must be taken to accomplish a certain task. Procedures to be documented can include backup procedures, user account management procedures, problem reporting procedures, and so on. Like automation, if a procedure is followed more than once, it is a good idea to document it.
Changes
A large part of a system administrator’s career revolves around making changes — configuring systems for maximum performance, tweaking scripts, modifying configuration files, and so on.
All of these changes should be documented in some fashion. Otherwise, you could find yourself being completely confused about a change you made several months earlier.
Some organizations use more complex methods for keeping track of changes, but in many cases a simple revision history at the start of the file being changed is all that is necessary. At a minimum, each entry in the revision history should contain:
The name or initials of the person making the change
The date the change was made
The reason the change was made
This results in concise, yet useful entries:
ECB, 12-June-2002 — Updated entry for new Accounting printer (to support the replacement printer’s ability to print duplex)
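Kept as comments at the top of the file being changed, such a revision history might look like the following; the file name and the older entry are hypothetical:

# Revision history for /etc/printcap
#
# ECB, 12-June-2002 -- Updated entry for new Accounting printer
#                      (to support the replacement printer's ability to print duplex)
# JDR, 03-April-2002 -- Added entry for the Accounting printer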
1.3. Communicate as Much as Possible
When it comes to your users, you can never communicate too much. Be aware that small system changes you might think are practically unnoticeable could very well completely confuse the administrative assistant in Human Resources.
The method by which you communicate with your users can vary according to your organization. Some organizations use email; others, an internal website. Still others may rely on Usenet news or IRC. A sheet of paper tacked to a bulletin board in the breakroom may even suffice at some places. In any case, use whatever method(s) that work well at your organization.
In general, it is best to follow this paraphrased approach used in writing newspaper stories:
1. Tell your users what you are going to do
2. Tell your users what you are doing
3. Tell your users what you have done
The following sections look at these steps in more depth.
1.3.1. Tell Your Users What You Are Going to Do
Make sure you give your users sufficient warning before you do anything. The actual amount of warning necessary varies according to the type of change (upgrading an operating system demands more lead time than changing the default color of the system login screen), as well as the nature of your user community (more technically adept users may be able to handle changes more readily than users with minimal technical skills).
At a minimum, you should describe:
The nature of the change
When it will take place
Why it is happening
Approximately how long it should take
The impact (if any) that the users can expect due to the change
Contact information should they have any questions or concerns
Here is a hypothetical situation. The Finance department has been experiencing problems with their database server being very slow at times. You are going to bring the server down, upgrade the CPU module to a faster model, and reboot. Once this is done, you will move the database itself to faster, RAID-based storage. Here is one possible announcement for this situation:
System Downtime Scheduled for Friday Night
Starting this Friday at 6pm (midnight for our associates in Berlin), all financial applications will be unavailable for a period of approximately four hours.
During this time, changes to both the hardware and software on the Finance database server will be performed. These changes should greatly reduce the time required to run the Accounts Payable and Accounts Receivable applications, and the weekly Balance Sheet report.
Other than the change in runtime, most people should notice no other change. However, those of you that have written your own SQL queries should be aware that the layout of some indices will change. This is documented on the company intranet website, on the Finance page.
Should you have any questions, comments, or concerns, please contact System Administration at extension 4321.
A few points are worth noting:
Effectively communicate the start and duration of any downtime that might be involved in the change.
Make sure you give the time of the change in such a way that it is useful to all users, no matter where they may be located.
Use terms that your users understand. The people impacted by this work do not care that the new CPU module is a 2GHz unit with twice as much L2 cache, or that the database is being placed on a RAID 5 logical volume.
1.3.2. Tell Your Users What You Are Doing
This step is primarily a last-minute warning of the impending change; as such, it should be a brief repeat of the first message, though with the impending nature of the change made more apparent ("The system upgrade will take place TONIGHT."). This is also a good place to publicly answer any questions you may have received as a result of the first message.
Continuing our hypothetical example, here is one possible last-minute warning:
System Downtime Scheduled for Tonight
Reminder: The system downtime announced this past Monday will take place as scheduled tonight at 6pm (midnight for the Berlin office). You can find the original announcement on the company intranet website, on the System Administration page.
Several people have asked whether they should stop working early tonight to make sure their work is backed up prior to the downtime. This will not be necessary, as the work being done tonight will not impact any work done on your personal workstations.
Remember, those of you that have written your own SQL queries should be aware that the layout of some indices will change. This is documented on the company intranet website, on the Finance page.
Your users have been alerted; now you are ready to actually do the work.
1.3.3. Tell Your Users What You Have Done
After you have finished making the changes, you must tell your users what you have done. Again, this should be a summary of the previous messages (invariably someone will not have read them). Be sure to send this message out as soon as the work is done, before you leave for home; once you have left the office, it is much too easy to forget, leaving your users in the dark as to whether they can use the system or not.
However, there is one important addition you must make. It is vital that you give your users the current status. Did the upgrade not go as smoothly as planned? Was the new storage server only able to serve the systems in Engineering, and not in Finance? These types of issues must be addressed here.
Of course, if the current status differs from what you communicated previously, you should make this point clear and describe what will be done (if anything) to arrive at the final solution.
In our hypothetical situation, the downtime had some problems. The new CPU module did not work; a call to the system’s manufacturer revealed that a special version of the module is required for in-the-field upgrades. On the plus side, the migration of the database to the RAID volume went well (even though it took a bit longer than planned due to the problems with the CPU module).
Here is one possible announcement:
System Downtime Complete
The system downtime scheduled for Friday night (refer to the System Administration page on the company intranet website) has been completed. Unfortunately, hardware issues prevented one of the tasks from being completed. Due to this, the remaining tasks took longer than the originally-scheduled four hours. Instead, all systems were back in production by midnight (6am Saturday for the Berlin office).
Because of the remaining hardware issues, performance of the AP, AR, and the Balance Sheet report will be slightly improved, but not to the extent originally planned. A second downtime will be announced and scheduled as soon as the issues that prevented completion of the task have been resolved.
Please note that the downtime did change some database indices; people that have written their own SQL queries should consult the Finance page on the company intranet website.
Please contact System Administration at extension 4321 with any questions.
With this kind of information, your users will have sufficient background knowledge to continue their work, and to understand how the changes impact them.
1.4. Know Your Resources
System administration is mostly a matter of balancing available resources against the people and programs that use those resources. Therefore, your career as a system administrator will be a short and stress-filled one unless you fully understand the resources you have at your disposal.
Some of the resources are ones that seem pretty obvious:
System resources, such as available processing power, memory, and disk space
Network bandwidth
Available money in the IT budget
But some may not be so obvious:
The services of operations personnel, other system administrators, or even an administrative assistant
Time (often of critical importance when the time involves things such as the amount of time during which system backups may take place)
Knowledge (whether it is stored in books, system documentation, or the brain of a person that has worked at the company for the past twenty years)
It is important to note that it is highly valuable to take a complete inventory of those resources available to you and to keep it current — a lack of "situational awareness" when it comes to available resources can often be worse than no awareness at all.
1.5. Know Your Users
Although some people bristle at the term "users" (perhaps due to some system administrators’ use of the term in a derogatory manner), it is used here with no such connotation implied. Users are those people that use the systems and resources for which you are responsible — no more, and no less. As such, they are central to your ability to successfully administer your systems; without understanding your users, how can you understand the system resources they require?
For example, consider a bank teller. A bank teller uses a strictly-defined set of applications and requires little in the way of system resources. A software engineer, on the other hand, may use many different applications and always welcomes more system resources (for faster build times). Two entirely different users with two entirely different needs.
Make sure you learn as much about your users as you can.
1.6. Know Your Business
Whether you work for a large, multinational corporation or a small community college, you must still understand the nature of the business environment in which you work. This can be boiled down to one question:
What is the purpose of the systems you administer?
The key point here is to understand your systems’ purpose in a more global sense:
Applications that must be run within certain time frames, such as at the end of a month, quarter, or year
The times during which system maintenance may be done
New technologies that could be used to resolve long-standing business problems
By taking into account your organization’s business, you will find that your day-to-day decisions will be better for your users, and for you.
1.7. Security Cannot be an Afterthought
No matter what you might think about the environment in which your systems are running, you cannot take security for granted. Even standalone systems not connected to the Internet may be at risk (although obviously the risks will be different from a system that has connections to the outside world).
Therefore, it is extremely important to consider the security implications of everything you do. The following list illustrates the different kinds of issues you should consider:
The nature of possible threats to each of the systems under your care
The location, type, and value of the data on those systems
The type and frequency of authorized access to the systems
While you are thinking about security, do not make the mistake of assuming that possible intruders will only attack your systems from outside of your company. Many times the perpetrator is someone within the company. So the next time you walk around the office, look at the people around you and ask yourself this question:
What would happen if that person were to attempt to subvert our security?
Note
This does not mean that you should treat your coworkers as if they are criminals. It just means that you should look at the type of work that each person performs and determine what types of security breaches a person in that position could perpetrate, if they were so inclined.
1.7.1. The Risks of Social Engineering
While most system administrators’ first reaction when they think about security is to concentrate on the technological aspects, it is important to maintain perspective. Quite often, security breaches do not have their origins in technology, but in human nature.
People interested in breaching security often use human nature to entirely bypass technological access controls. This is known as social engineering. Here is an example:
The second shift operator receives an outside phone call. The caller claims to be your organization’s CFO (the CFO’s name and background information was obtained from your organization’s website, on the "Management Team" page).
The caller claims to be calling from some place halfway around the world (maybe this part of the story is a complete fabrication, or perhaps your organization’s website has a recent press release that makes mention of the CFO attending a tradeshow).
The caller tells a tale of woe; his laptop was stolen at the airport, and he is with an important customer and needs access to the corporate intranet to check on the customer’s account status. Would the operator be so kind as to give him the necessary access information?
Do you know what your operator would do? Unless your operator has guidance (in the form of policies and procedures), you very likely do not know for sure.
Like traffic lights, the goal of policies and procedures is to provide unambiguous guidance as to what is and is not appropriate behavior. However, just as with traffic lights, policies and procedures only work if everyone follows them. And there is the crux of the problem — it is unlikely that everyone will adhere to your policies and procedures. In fact, depending on the nature of your organization, it is possible that you do not even have sufficient authority to define policies, much less enforce them. What then?
Unfortunately, there are no easy answers. User education can help; do everything you can to help make your user community aware of security and social engineering. Give lunchtime presentations about security. Post pointers to security-related news articles on your organization’s mailing lists. Make yourself available as a sounding board for users’ questions about things that do not seem quite right.
In short, get the message out to your users any way you can.
1.8. Plan Ahead
System administrators that took all this advice to heart and did their best to follow it would be fantastic system administrators — for a day. Eventually, the environment will change, and one day our fantastic administrator would be caught flat-footed. The reason? Our fantastic administrator failed to plan ahead.
Certainly no one can predict the future with 100% accuracy. However, with a bit of awareness it is easy to read the signs of many changes:
An offhand mention of a new project gearing up during that boring weekly staff meeting is a sure sign that you will likely need to support new users in the near future
Talk of an impending acquisition means that you may end up being responsible for new (and possibly incompatible) systems in one or more remote locations
Being able to read these signs (and to respond effectively to them) makes life easier for you and your users.
1.9. Expect the Unexpected
While the phrase "expect the unexpected" is trite, it reflects an underlying truth that all system administrators must understand:
There will be times when you are caught off-guard.
After becoming comfortable with this uncomfortable fact of life, what can a concerned system administrator do? The answer lies in flexibility; by performing your job in such a way as to give you (and your users) the most options possible. Take, for example, the issue of disk space. Given that never having sufficient disk space seems to be as much a physical law as the law of gravity, it is reasonable to assume that at some point you will be confronted with a desperate need for additional disk space right now.
What would a system administrator who expects the unexpected do in this case? Perhaps it is possible to keep a few disk drives sitting on the shelf as spares in case of hardware problems [2]. A spare of this type could be quickly deployed [3] on a temporary basis to address the short-term need for disk space, giving time to more permanently resolve the issue (by following the standard procedure for procuring additional disk drives, for example).
By trying to anticipate problems before they occur, you will be in a position to respond more quickly and effectively than if you let yourself be surprised.
1.10. Red Hat Enterprise Linux-Specific Information
This section describes information related to the philosophy of system administration that is specific to Red Hat Enterprise Linux.
[2] And of course, a system administrator that expects the unexpected would naturally use RAID (or related technologies) to lessen the impact of a critical disk drive failing during production.
[3] Again, system administrators that think ahead configure their systems to make it as easy as possible to quickly add a new disk drive to the system.
1.10.1. Automation
Automation of frequently-performed tasks under Red Hat Enterprise Linux requires knowledge of several different types of technologies. First are the commands that control the timing of command or script execution. The cron and at commands are most commonly used in these roles.
Incorporating an easy-to-understand yet powerfully flexible time specification system, cron can schedule the execution of commands or scripts for recurring intervals ranging in length from minutes to months. The crontab command is used to manipulate the files controlling the cron daemon that actually schedules each cron job for execution.
The at command (and the closely-related command batch) are more appropriate for scheduling the execution of one-time scripts or commands. These commands implement a rudimentary batch subsystem consisting of multiple queues with varying scheduling priorities. The priorities are known as niceness levels (due to the name of the command — nice). Both at and batch are perfect for tasks that must start at a given time but are not time-critical in terms of finishing.
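For illustration, here is what such entries might look like; the script path is hypothetical. The first line is a user crontab entry (edited with crontab -e) that runs a backup script at 02:30 every day, while the second runs the same script once at 23:00 tonight:

# minute hour day-of-month month day-of-week command
30 2 * * * /usr/local/bin/nightly-backup.sh

echo "/usr/local/bin/nightly-backup.sh" | at 23:00

The crontab entry recurs until it is removed; the at job runs once and is then discarded.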
Next are the various scripting languages. These are the "programming languages" that the average system administrator uses to automate manual operations. There are many scripting languages (and each system administrator tends to have a personal favorite), but the following are currently the most common:
The bash command shell
The perl scripting language
The python scripting language
Over and above the obvious differences between these languages, the biggest difference is in the way in which these languages interact with other utility programs on a Red Hat Enterprise Linux system. Scripts written with the bash shell tend to make more extensive use of the many small utility programs (for example, to perform character string manipulation), while perl scripts perform more of these types of operations using features built into the language itself. A script written using python can fully exploit the language’s object-oriented capabilities, making complex scripts more easily extensible.
This means that, in order to truly master shell scripting, you must be familiar with the many utility programs (such as grep and sed) that are part of Red Hat Enterprise Linux. Learning perl (and python), on the other hand, tends to be a more "self-contained" process. However, many perl language constructs are based on the syntax of various traditional UNIX utility programs, and as such are familiar to those Red Hat Enterprise Linux system administrators with shell scripting experience.
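As a small illustration of this difference (assuming the goal is simply to list the user names defined in /etc/passwd), the shell version leans on the external cut utility, while the perl version does the field splitting in the language itself:

cut -d: -f1 /etc/passwd | sort
perl -F: -lane 'print $F[0]' /etc/passwd | sort

Both commands produce the same sorted list of user names.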
1.10.2. Documentation and Communication
In the areas of documentation and communication, there is little that is specific to Red Hat Enterprise Linux. Since documentation and communication can consist of anything from adding comments to a text-based configuration file to updating a webpage or sending an email, a system administrator using Red Hat Enterprise Linux must have access to text editors, HTML editors, and mail clients.
Here is a small sample of the many text editors available under Red Hat Enterprise Linux:
The gedit text editor
The Emacs text editor
The Vim text editor
The gedit text editor is a strictly graphical application (in other words, it requires an active X Window System environment), while Vim and Emacs are primarily text-based in nature.
The subject of the best text editor has sparked debate for nearly as long as computers have existed and will continue to do so. Therefore, the best approach is to try each editor for yourself, and use what works best for you.
For HTML editors, system administrators can use the Composer function of the Mozilla Web browser. Of course, some system administrators prefer to hand-code their HTML, making a regular text editor a perfectly acceptable tool as well.
As far as email is concerned, Red Hat Enterprise Linux includes the Evolution graphical email client, the Mozilla email client (which is also graphical), and mutt, which is text-based. As with text editors, the choice of an email client tends to be a personal one; therefore, the best approach is to try each client for yourself, and use what works best for you.
1.10.3. Security
As stated earlier in this chapter, security cannot be an afterthought, and security under Red Hat Enterprise Linux is more than skin-deep. Authentication and access controls are deeply integrated into the operating system and are based on designs gleaned from long experience in the UNIX community.
For authentication, Red Hat Enterprise Linux uses PAM — Pluggable Authentication Modules. PAM makes it possible to fine-tune user authentication via the configuration of shared libraries that all PAM-aware applications use, all without requiring any changes to the applications themselves.
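As a simplified sketch (not a copy of any file shipped with Red Hat Enterprise Linux), a service's file under /etc/pam.d/ stacks modules in this general form:

# /etc/pam.d/<service> -- illustrative only
auth      required    pam_securetty.so
auth      required    pam_unix.so
account   required    pam_unix.so
password  required    pam_unix.so
session   required    pam_unix.so

Changing how a PAM-aware application authenticates users is then a matter of editing its configuration file, not the application itself.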
Access control under Red Hat Enterprise Linux uses traditional UNIX-style permissions (read, write, execute) against user, group, and "everyone else" classifications. Like UNIX, Red Hat Enterprise Linux also makes use of setuid and setgid bits to temporarily confer expanded access rights to processes running a particular program, based on the ownership of the program file. Of course, this makes it critical that any program to be run with setuid or setgid privileges be carefully audited to ensure that no exploitable vulnerabilities exist.
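One common way to perform such an audit (the exact options can be adjusted to taste) is to search the system for files with the setuid or setgid bit set:

find / -type f -perm -4000 -ls    # list all setuid files
find / -type f -perm -2000 -ls    # list all setgid files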
Red Hat Enterprise Linux also includes support for access control lists. An access control list (ACL) is a construct that allows extremely fine-grained control over what users or groups may access a file or directory. For example, a file’s permissions may restrict all access by anyone other than the file’s owner, yet the file’s ACL can be configured to allow only user bob to write and group finance to read the file.
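A sketch of how the ACL in that example might be applied (the file name is hypothetical, and the file system must be mounted with ACL support):

chmod 600 report.txt               # restrict everyone but the owner
setfacl -m u:bob:w report.txt      # ...then allow user bob to write
setfacl -m g:finance:r report.txt  # ...and group finance to read
getfacl report.txt                 # review the resulting ACL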
Another aspect of security is being able to keep track of system activity. Red Hat Enterprise Linux makes extensive use of logging, both at a kernel and an application level. Logging is controlled by the system logging daemon syslogd, which can log system information locally (normally to files in the /var/log/ directory) or to a remote system (which acts as a dedicated log server for multiple computers).
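A hedged sketch of what the relevant /etc/syslog.conf lines might look like (the log server name is a placeholder):

# log informational messages locally, and copy authentication
# messages to a dedicated log server as well
*.info;authpriv.none        /var/log/messages
authpriv.*                  /var/log/secure
authpriv.*                  @logserver.example.com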
Intrusion detection systems (IDS) are powerful tools for any Red Hat Enterprise Linux system administrator. An IDS makes it possible for system administrators to determine whether unauthorized changes were made to one or more systems. The overall design of the operating system itself includes IDS-like functionality.
Because Red Hat Enterprise Linux is installed using the RPM Package Manager (RPM), it is possible to use RPM to verify whether any changes have been made to the packages comprising the operating system. However, because RPM is primarily a package management tool, its abilities as an IDS are somewhat limited. Even so, it can be a good first step toward monitoring a Red Hat Enterprise Linux system for unauthorized modifications.
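For example, rpm prints a line for each file whose size, checksum, permissions, or other attributes have changed since its package was installed:

rpm -Va             # verify all installed packages
rpm -V coreutils    # verify a single package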
1.11. Additional Resources
This section includes various resources that can be used to learn more about the philosophy of system administration and the Red Hat Enterprise Linux-specific subject matter discussed in this chapter.
1.11.1. Installed Documentation
The following resources are installed in the course of a typical Red Hat Enterprise Linux installation and can help you learn more about the subject matter discussed in this chapter.
crontab(1) and crontab(5) man pages — Learn how to schedule commands and scripts for automatic execution at regular intervals.
at(1) man page — Learn how to schedule commands and scripts for execution at a later time.
bash(1) man page — Learn more about the default shell and shell script writing.
perl(1) man page — Review pointers to the many man pages that make up perl's online documentation.
python(1) man page — Learn more about options, files, and environment variables controlling the Python interpreter.
gedit(1) man page and Help menu entry — Learn how to edit text files with this graphical text editor.
emacs(1) man page — Learn more about this highly-flexible text editor, including how to run its online tutorial.
vim(1) man page — Learn how to use this powerful text editor.
Mozilla Help Contents menu entry — Learn how to edit HTML files, read mail, and browse the Web.
evolution(1) man page and Help menu entry — Learn how to manage your email with this graphical email client.
mutt(1) man page and files in /usr/share/doc/mutt-<version> — Learn how to manage your email with this text-based email client.
pam(8) man page and files in /usr/share/doc/pam-<version> — Learn how authentication takes place under Red Hat Enterprise Linux.
1.11.2. Useful Websites
http://www.kernel.org/pub/linux/libs/pam/ — The Linux-PAM project homepage.
http://www.usenix.org/ — The USENIX homepage. A professional organization dedicated to bringing together computer professionals of all types and fostering improved communication and innovation.
http://www.sage.org/ — The System Administrators Guild homepage. A USENIX special technical group that is a good resource for all system administrators responsible for Linux (or Linux-like) operating systems.
http://www.python.org/ — The Python Language Website. An excellent site for learning more about Python.
http://www.perl.org/ — The Perl Mongers Website. A good place to start learning about Perl and connecting with the Perl community.
http://www.rpm.org/ — The RPM Package Manager homepage. The most comprehensive website for learning about RPM.
1.11.3. Related Books
Most books on system administration do little to cover the philosophy behind the job. However, the following books do have sections that give a bit more depth to the issues that were discussed here:
The Red Hat Enterprise Linux Reference Guide; Red Hat, Inc. — Provides an overview of locations of key system files, user and group settings, and PAM configuration.
The Red Hat Enterprise Linux Security Guide; Red Hat, Inc. — Contains a comprehensive discussion of many security-related issues for Red Hat Enterprise Linux system administrators.
The Red Hat Enterprise Linux System Administration Guide; Red Hat, Inc. — Includes chapters on managing users and groups, automating tasks, and managing log files.
Linux Administration Handbook by Evi Nemeth, Garth Snyder, and Trent R. Hein; Prentice Hall — Provides a good section on the policies and politics side of system administration, including several "what-if" discussions concerning ethics.
Linux System Administration: A User's Guide by Marcel Gagne; Addison Wesley Professional — Contains a good chapter on automating various tasks.
Solaris System Management by John Philcox; New Riders Publishing — Although not specifically written for Red Hat Enterprise Linux (or even Linux in general), and using the term "system manager" instead of "system administrator," this book provides a 70-page overview of the many roles that system administrators play in a typical organization.
Chapter 2.
Resource Monitoring
As stated earlier, a great deal of system administration revolves around resources and their efficient use. By balancing various resources against the people and programs that use those resources, you waste less money and make your users as happy as possible. However, this leaves two questions:
What are resources?
And:
How is it possible to know what resources are being used (and to what extent)?
The purpose of this chapter is to enable you to answer these questions by helping you to learn more about resources and how they can be monitored.
2.1. Basic Concepts
Before you can monitor resources, you first have to know what resources there are to monitor. All systems have the following resources available:
CPU power
Bandwidth
Memory
Storage
These resources are covered in more depth in the following chapters. However, for the time being all you need to keep in mind is that these resources have a direct impact on system performance, and therefore, on your users’ productivity and happiness.
At its simplest, resource monitoring is nothing more than obtaining information concerning the utilization of one or more system resources.
However, it is rarely this simple. First, one must take into account the resources to be monitored. Then it is necessary to examine each system to be monitored, paying particular attention to each system’s situation.
The systems you monitor fall into one of two categories:
The system is currently experiencing performance problems at least part of the time, and you would like to improve its performance.
The system is currently running well, and you would like it to stay that way.
The first category means you should monitor resources from a system performance perspective, while the second category means you should monitor system resources from a capacity planning perspective.
Because each perspective has its own unique requirements, the following sections explore each category in more depth.
2.2. System Performance Monitoring
As stated above, system performance monitoring is normally done in response to a performance problem. Either the system is running too slowly, or programs (and sometimes even the entire system) fail to run at all. In either case, performance monitoring is normally done as the first and last steps of a three-step process:
1. Monitoring to identify the nature and scope of the resource shortages that are causing the performance problems
2. The data produced from monitoring is analyzed and a course of action (normally performance tuning and/or the procurement of additional hardware) is taken to resolve the problem
3. Monitoring to ensure that the performance problem has been resolved
Because of this, performance monitoring tends to be relatively short-lived in duration and more detailed in scope.
Note
System performance monitoring is often an iterative process, with these steps being repeated several times to arrive at the best possible system performance. The primary reason for this is that system resources and their utilization tend to be highly interrelated, meaning that often the elimination of one resource bottleneck uncovers another one.
2.3. Monitoring System Capacity
Monitoring system capacity is done as part of an ongoing capacity planning program. Capacity planning uses long-term resource monitoring to determine rates of change in the utilization of system resources. Once these rates of change are known, it becomes possible to conduct more accurate long-term planning regarding the procurement of additional resources.
Monitoring done for capacity planning purposes is different from performance monitoring in two ways:
The monitoring is done on a more-or-less continuous basis
The monitoring is usually not as detailed
The reason for these differences stems from the goals of a capacity planning program. Capacity planning requires a "big picture" viewpoint; short-term or anomalous resource usage is of little concern. Instead, data is collected over a period of time, making it possible to categorize resource utilization in terms of changes in workload. In more narrowly-defined environments (where only one application is run, for example), it is possible to model the application's impact on system resources. This can be done with sufficient accuracy to make it possible to determine, for example, the impact of five more customer service representatives running the customer service application during the busiest time of the day.
2.4. What to Monitor?
As stated earlier, the resources present in every system are CPU power, bandwidth, memory, and storage. At first glance, it would seem that monitoring would need only consist of examining these four different things.
Unfortunately, it is not that simple. For example, consider a disk drive. What things might you want to know about its performance?
How much free space is available?
How many I/O operations on average does it perform each second?
How long on average does it take each I/O operation to be completed?
How many of those I/O operations are reads? How many are writes?
What is the average amount of data read/written with each I/O?
There are more ways of studying disk drive performance; these points have only scratched the surface. The main concept to keep in mind is that there are many different types of data for each resource.
The following sections explore the types of utilization information that would be helpful for each of the major resource types.
2.4.1. Monitoring CPU Power
In its most basic form, monitoring CPU power can be no more difficult than determining if CPU utilization ever reaches 100%. If CPU utilization stays below 100%, no matter what the system is doing, there is additional processing power available for more work.
However, it is a rare system that does not reach 100% CPU utilization at least some of the time. At that point it is important to examine more detailed CPU utilization data. By doing so, it becomes possible to start determining where the majority of your processing power is being consumed. Here are some of the more popular CPU utilization statistics:
User Versus System
The percentage of time spent performing user-level processing versus system-level processing can point out whether a system's load is primarily due to running applications or due to operating system overhead. High user-level percentages tend to be good (assuming users are not experiencing unsatisfactory performance), while high system-level percentages tend to point toward problems that will require further investigation.
Context Switches
A context switch happens when the CPU stops running one process and starts running another. Because each context switch requires the operating system to take control of the CPU, excessive context switches and high levels of system-level CPU consumption tend to go together.
Interrupts
As the name implies, interrupts are situations where the processing being performed by the CPU is abruptly changed. Interrupts generally occur due to hardware activity (such as an I/O device completing an I/O operation) or due to software (such as software interrupts that control application processing). Because interrupts must be serviced at a system level, high interrupt rates lead to higher system-level CPU consumption.
Runnable Processes
A process may be in different states. For example, it may be:
Waiting for an I/O operation to complete
Waiting for the memory management subsystem to handle a page fault
In these cases, the process has no need for the CPU.
However, eventually the process state changes, and the process becomes runnable. As the name implies, a runnable process is one that is capable of getting work done as soon as it is scheduled to receive CPU time. However, if more than one process is runnable at any given time, all but one [1] of the runnable processes must wait for their turn at the CPU. By monitoring the number of runnable processes, it is possible to determine how CPU-bound your system is.
Other performance metrics that reflect an impact on CPU utilization tend to include different services the operating system provides to processes. They may include statistics on memory management, I/O processing, and so on. These statistics also reveal that, when system performance is monitored, there are no boundaries between the different statistics. In other words, CPU utilization statistics may end up pointing to a problem in the I/O subsystem, or memory utilization statistics may reveal an application design flaw.
Therefore, when monitoring system performance, it is not possible to examine any one statistic in complete isolation; only by examining the overall picture is it possible to extract meaningful information from any performance statistics you gather.
2.4.2. Monitoring Bandwidth
Monitoring bandwidth is more difficult than monitoring the other resources described here. This is because performance statistics tend to be device-based, while most of the places where bandwidth is important tend to be the buses that connect devices. In those instances where more than one device shares a common bus, you might see reasonable statistics for each device, but the aggregate load those devices place on the bus would be much greater.
Another challenge to monitoring bandwidth is that there can be circumstances where statistics for the devices themselves may not be available. This is particularly true for system expansion buses and datapaths [2]. However, even though 100% accurate bandwidth-related statistics may not always be available, there is often enough information to make some level of analysis possible, particularly when related statistics are taken into account.
Some of the more common bandwidth-related statistics are:
Bytes received/sent
Network interface statistics provide an indication of the bandwidth utilization of one of the more visible buses — the network.
Interface counts and rates
These network-related statistics can give indications of excessive collisions, transmit and receive errors, and more. Through the use of these statistics (particularly if the statistics are available for more than one system on your network), it is possible to perform a modicum of network troubleshooting even before the more common network diagnostic tools are used; a brief sketch of commands that expose these counters appears after this list.
Transfers per Second
Normally collected for block I/O devices, such as disk and high-performance tape drives, this statistic is a good way of determining whether a particular device’s bandwidth limit is being reached. Due to their electromechanical nature, disk and tape drives can only perform so many I/O operations every second; their performance degrades rapidly as this limit is reached.
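As promised above, a brief sketch of commands that expose these counters (the interface name eth0 is an example):

netstat -i       # per-interface packet, error, and drop counts
ifconfig eth0    # RX/TX statistics, errors, and collisions for one interface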
[1] Assuming a single-processor computer system.
[2] More information on buses, datapaths, and bandwidth is available in Chapter 3, Bandwidth and Processing Power.
2.4.3. Monitoring Memory
If there is one area where a wealth of performance statistics can be found, it is in the area of monitoring memory utilization. Due to the inherent complexity of today's demand-paged virtual memory operating systems, memory utilization statistics are many and varied. It is here that the majority of a system administrator's work with resource management takes place.
The following statistics represent a cursory overview of commonly-found memory management statistics:
Page Ins/Page Outs
These statistics make it possible to gauge the flow of pages from system memory to attached mass storage devices (usually disk drives). High rates for both of these statistics can mean that the system is short of physical memory and is thrashing, or spending more system resources on moving pages into and out of memory than on actually running applications.
Active/Inactive Pages
These statistics show how heavily memory-resident pages are used. A lack of inactive pages can point toward a shortage of physical memory.
Free, Shared, Buffered, and Cached Pages
These statistics provide additional detail over the more simplistic active/inactive page statistics. By using these statistics, it is possible to determine the overall mix of memory utilization.
Swap Ins/Swap Outs
These statistics show the system’s overall swapping behavior. Excessive rates here can point to physical memory shortages.
Successfully monitoring memory utilization requires a good understanding of how demand-paged virtual memory operating systems work. While such a subject alone could take up an entire book, the basic concepts are discussed in Chapter 4 Physical and Virtual Memory. That chapter, along with time spent actually monitoring a system, gives you the necessary building blocks to learn more about this subject.
2.4.4. Monitoring Storage
Monitoring storage normally takes place at two different levels:
Monitoring for sufficient disk space
Monitoring for storage-related performance problems
The reason for this is that it is possible to have dire problems in one area and no problems whatsoever in the other. For example, it is possible to cause a disk drive to run out of disk space without once causing any kind of performance-related problems. Likewise, it is possible to have a disk drive that has 99% free space, yet is being pushed past its limits in terms of performance.
However, it is more likely that the average system experiences varying degrees of resource shortages in both areas. Because of this, it is also likely that — to some extent — problems in one area impact the other. Most often this type of interaction takes the form of poorer and poorer I/O performance as a disk drive nears 0% free space, although in cases of extreme I/O loads it might be possible to slow I/O throughput to such a level that applications no longer run properly.
In any case, the following statistics are useful for monitoring storage:
Free Space
Free space is probably the one resource all system administrators watch closely; it would be a rare administrator that never checks on free space (or has some automated way of doing so). A minimal sketch of doing so appears after this list.
File System-Related Statistics
These statistics (such as number of files/directories, average file size, etc.) provide additional detail over a single free space percentage. As such, these statistics make it possible for system administrators to configure the system to give the best performance, as the I/O load imposed by a file system full of many small files is not the same as that imposed by a file system filled with a single massive file.
Transfers per Second
This statistic is a good way of determining whether a particular device’s bandwidth limitations are being reached.
Reads/Writes per Second
A slightly more detailed breakdown of transfers per second, these statistics allow the system administrator to more fully understand the nature of the I/O loads a storage device is experiencing. This can be critical, as some storage technologies have widely different performance characteristics for read versus write operations.
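As promised above, a minimal sketch of checking free space and file-count detail with df (the options shown are common ones, not an exhaustive list):

df -h    # free space per mounted file system, in human-readable units
df -i    # inode utilization, useful when many small files are involved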
2.5. Red Hat Enterprise Linux-Specific Information
Red Hat Enterprise Linux comes with a variety of resource monitoring tools. While there are more than those listed here, these tools are representative in terms of functionality. The tools are:
free
top (and GNOME System Monitor, a more graphically oriented version of top)
vmstat
The Sysstat suite of resource monitoring tools
The OProfile system-wide profiler
Let us examine each one in more detail.
2.5.1. free
The free command displays system memory utilization. Here is an example of its output:
             total       used       free     shared    buffers     cached
Mem:        255508     240268      15240          0       7592      86188
-/+ buffers/cache:     146488     109020
Swap:       530136      26268     503868
The Mem: row displays physical memory utilization, while the Swap: row displays the utilization of the system swap space, and the -/+ buffers/cache: row displays the amount of physical memory currently devoted to system buffers.
Since free by default only displays memory utilization information once, it is only useful for very short-term monitoring, or quickly determining if a memory-related problem is currently in progress. Although free has the ability to repetitively display memory utilization figures via its -s option, the output scrolls, making it difficult to easily detect changes in memory utilization.
Tip
A better solution than using free -s would be to run free using the watch command. For example, to display memory utilization every two seconds (the default display interval for watch), use this command:
watch free
The watch command issues the free command every two seconds, updating by clearing the screen and writing the new output to the same screen location. This makes it much easier to determine how memory utilization changes over time, since watch creates a single updated view with no scrolling. You can control the delay between updates by using the -n option, and can cause any changes between updates to be highlighted by using the -d option, as in the following command:
watch -n 1 -d free
For more information, refer to the watch man page.
The watch command runs until interrupted with [Ctrl]-[C]. The watch command is something to keep in mind; it can come in handy in many situations.
2.5.2. top
While free displays only memory-related information, the top command does a little bit of everything. CPU utilization, process statistics, memory utilization — top monitors it all. In addition, unlike the free command, top's default behavior is to run continuously; there is no need to use the watch command. Here is a sample display:
14:06:32  up 4 days, 21:20,  4 users,  load average: 0.00, 0.00, 0.00
77 processes: 76 sleeping, 1 running, 0 zombie, 0 stopped
CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
           total   19.6%    0.0%    0.0%   0.0%     0.0%    0.0%  180.2%
           cpu00    0.0%    0.0%    0.0%   0.0%     0.0%    0.0%  100.0%
           cpu01   19.6%    0.0%    0.0%   0.0%     0.0%    0.0%   80.3%
Mem:  1028548k av,  716604k used,  311944k free,       0k shrd,  131056k buff
                    324996k actv,  108692k in_d,   13988k in_c
Swap: 1020116k av,    5276k used, 1014840k free                  382228k cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
17578 root      15   0 13456  13M  9020 S    18.5  1.3  26:35   1 rhn-applet-gu
19154 root      20   0  1176 1176   892 R     0.9  0.1   0:00   1 top
    1 root      15   0   168  160   108 S     0.0  0.0   0:09   0 init
    2 root      RT   0     0    0     0 SW    0.0  0.0   0:00   0 migration/0
    3 root      RT   0     0    0     0 SW    0.0  0.0   0:00   1 migration/1
    4 root      15   0     0    0     0 SW    0.0  0.0   0:00   0 keventd
    5 root      34  19     0    0     0 SWN   0.0  0.0   0:00   0 ksoftirqd/0
    6 root      35  19     0    0     0 SWN   0.0  0.0   0:00   1 ksoftirqd/1
    9 root      15   0     0    0     0 SW    0.0  0.0   0:07   1 bdflush
    7 root      15   0     0    0     0 SW    0.0  0.0   1:19   0 kswapd
    8 root      15   0     0    0     0 SW    0.0  0.0   0:14   1 kscand
   10 root      15   0     0    0     0 SW    0.0  0.0   0:03   1 kupdated
   11 root      25   0     0    0     0 SW    0.0  0.0   0:00   0 mdrecoveryd
The display is divided into two sections. The top section contains information related to overall system status — uptime, load average, process counts, CPU status, and utilization statistics for both memory and swap space. The lower section displays process-level statistics. It is possible to change what is displayed while top is running. For example, top by default displays both idle and non-idle processes. To display only non-idle processes, press [i]; a second press returns to the default display mode.
Warning
Although top appears to be a simple display-only program, this is not the case. top uses single-character commands to perform various operations; if you are logged in as root, it is possible to change the priority of (and even kill) any process on your system. Therefore, until you have reviewed top's help screen (type [?] to display it), it is safest to only type [q] (which exits top).
2.5.2.1. The GNOME System Monitor — A Graphical top
If you are more comfortable with graphical user interfaces, the GNOME System Monitor may be more to your liking. Like top, the GNOME System Monitor displays information related to overall system status, process counts, memory and swap utilization, and process-level statistics.
However, the GNOME System Monitor goes a step further by also including graphical representations of CPU, memory, and swap utilization, along with a tabular disk space utilization listing. An example of the GNOME System Monitor's Process Listing display appears in Figure 2-1.
Figure 2-1. The GNOME System Monitor Process Listing Display
Additional information can be displayed for a specific process by first clicking on the desired process and then clicking on the More Info button.
To display the CPU, memory, and disk usage statistics, click on the System Monitor tab.
2.5.3. vmstat
For a more concise understanding of system performance, try vmstat. With vmstat, it is possible to get an overview of process, memory, swap, I/O, system, and CPU activity in one line of numbers:
procs                      memory      swap          io     system         cpu
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 0  0   5276 315000 130744 380184    1    1     2    24   14    50  1  1 47  0
The first line divides the fields into six categories: process, memory, swap, I/O, system, and CPU-related statistics. The second line further identifies the contents of each field, making it easy to quickly scan data for specific statistics.
The process-related fields are:
r — The number of runnable processes waiting for access to the CPU
b — The number of processes in an uninterruptible sleep state
The memory-related fields are:
swpd — The amount of virtual memory used
free — The amount of free memory
buff — The amount of memory used for buffers
cache — The amount of memory used as page cache
The swap-related fields are:
si — The amount of memory swapped in from disk
so — The amount of memory swapped out to disk
The I/O-related fields are:
bi — Blocks sent to a block device
bo — Blocks received from a block device
The system-related fields are:
in — The number of interrupts per second
cs — The number of context switches per second
The CPU-related fields are:
us — The percentage of the time the CPU ran user-level code
sy — The percentage of the time the CPU ran system-level code
id — The percentage of the time the CPU was idle
wa — The percentage of the time the CPU was waiting for I/O to complete
When vmstat is run without any options, only one line is displayed. This line contains averages, calculated from the time the system was last booted.
However, most system administrators do not rely on the data in this line, as the time over which it was collected varies. Instead, most administrators take advantage of vmstat’s ability to repetitively display resource utilization data at set intervals. For example, the command vmstat 1 displays one new line of utilization data every second, while the command vmstat 1 10 displays one new line per second, but only for the next ten seconds.
In the hands of an experienced administrator, vmstat can be used to quickly determine resource utilization and performance issues. But to gain more insight into those issues, a different kind of tool is required — a tool capable of more in-depth data collection and analysis.
2.5.4. The Sysstat Suite of Resource Monitoring Tools
While the previous tools may be helpful for gaining more insight into system performance over very short time frames, they are of little use beyond providing a snapshot of system resource utilization. In addition, there are aspects of system performance that cannot be easily monitored using such simplistic tools.
Therefore, a more sophisticated tool is necessary. Sysstat is such a tool.
Sysstat contains the following tools related to collecting I/O and CPU statistics:
iostat
Displays an overview of CPU utilization, along with I/O statistics for one or more disk drives.
mpstat
Displays more in-depth CPU statistics.
Sysstat also contains tools that collect system resource utilization data and create daily reports based on that data. These tools are:
sadc
Known as the system activity data collector, sadc collects system resource utilization information and writes it to a file.
sar
sar produces reports from the files created by sadc; these reports can be generated interactively or written to a file for more intensive analysis.
The following sections explore each of these tools in more detail.
2.5.4.1. The iostat command
The iostat command at its most basic provides an overview of CPU and disk I/O statistics:
Linux 2.4.20-1.1931.2.231.2.10.ent (pigdog.example.com) 07/11/2003
avg-cpu:  %user   %nice    %sys   %idle
           6.11    2.56    2.15   89.18

Device:    tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
dev3-0    1.68        15.69        22.42   31175836   44543290
Below the first line (which contains the system’s kernel version and hostname, along with the current date), iostat displays an overview of the system’s average CPU utilization since the last reboot. The CPU utilization report includes the following percentages:
Percentage of time spent in user mode (running applications, etc.)
Percentage of time spent in user mode for processes that have altered their scheduling priority using nice(2)
Percentage of time spent in kernel mode
Percentage of time spent idle
Below the CPU utilization report is the device utilization report. This report contains one line for each active disk device on the system and includes the following information:
The device specification, displayed as dev<major-number>-<sequence-number>, where <major-number> is the device's major number [3], and <sequence-number> is a sequence number starting at zero.
The number of transfers (or I/O operations) per second.
The number of 512-byte blocks read per second.
The number of 512-byte blocks written per second.
The total number of 512-byte blocks read.
The total number of 512-byte blocks written.
This is just a sample of the information that can be obtained using iostat. For more information, refer to the iostat(1) man page.
2.5.4.2. The mpstat command
The mpstat command at first appears no different from the CPU utilization report produced by iostat:
Linux 2.4.20-1.1931.2.231.2.10.ent (pigdog.example.com) 07/11/2003
07:09:26 PM  CPU   %user   %nice %system   %idle    intr/s
07:09:26 PM  all    6.40    5.84    3.29   84.47    542.47
With the exception of an additional column showing the interrupts per second being handled by the CPU, there is no real difference. However, the situation changes if mpstat’s -P ALL option is used:
Linux 2.4.20-1.1931.2.231.2.10.ent (pigdog.example.com) 07/11/2003
07:13:03 PM  CPU   %user   %nice %system   %idle    intr/s
07:13:03 PM  all    6.40    5.84    3.29   84.47    542.47
07:13:03 PM    0    6.36    5.80    3.29   84.54    542.47
07:13:03 PM    1    6.43    5.87    3.29   84.40    542.47
On multiprocessor systems, mpstat allows the utilization for each CPU to be displayed individually, making it possible to determine how effectively each CPU is being used.
2.5.4.3. The sadc command
As stated earlier, the sadc command collects system utilization data and writes it to a file for later analysis. By default, the data is written to files in the /var/log/sa/ directory. The files are named sa<dd>, where <dd> is the current day's two-digit day of the month.
sadc is normally run by the sa1 script. This script is periodically invoked by cron via the file sysstat, which is located in /etc/cron.d/. The sa1 script invokes sadc for a single one-second measuring interval. By default, cron runs sa1 every 10 minutes, adding the data collected during each interval to the current /var/log/sa/sa<dd> file.
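As a sketch, the entries in /etc/cron.d/sysstat typically look something like the following (the exact paths can vary between releases):

# run sa1 (which invokes sadc) every ten minutes
*/10 * * * * root /usr/lib/sa/sa1 1 1
# produce the daily report with sa2 at 23:53
53 23 * * * root /usr/lib/sa/sa2 -A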
[3] Device major numbers can be found by using ls -l to display the desired device file in /dev/. The major number appears after the device's group specification.
2.5.4.4. The sar command
The sar command produces system utilization reports based on the data collected by sadc. As configured in Red Hat Enterprise Linux, sar is automatically run to process the files collected by sadc. The report files are written to /var/log/sa/ and are named sar<dd>, where <dd> is the previous day's two-digit day of the month.
sar is normally run by the sa2 script. This script is periodically invoked by cron via the file sysstat, which is located in /etc/cron.d/. By default, cron runs sa2 once a day at 23:53, allowing it to produce a report for the entire day's data.
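sar can also be run by hand, either against a saved data file or sampling the system live; for example (the file name assumes data collected on the eleventh of the month):

sar -u -f /var/log/sa/sa11    # CPU utilization report from a saved data file
sar -u 1 5                    # live CPU utilization, one sample per second, five samples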
2.5.4.4.1. Reading sar Reports
The format of a sar report produced by the default Red Hat Enterprise Linux configuration consists of multiple sections, with each section containing a specific type of data, ordered by the time of day that the data was collected. Since sadc is configured to perform a one-second measurement interval every ten minutes, the default sar reports contain data in ten-minute increments, from 00:00 to 23:50 [4].
Each section of the report starts with a heading describing the data contained in the section. The heading is repeated at regular intervals throughout the section, making it easier to interpret the data while paging through the report. Each section ends with a line containing the average of the data reported in that section.
Here is a sample section of a sar report, with the data from 00:30 through 23:40 removed to save space:
00:00:01          CPU     %user     %nice   %system     %idle
00:10:00          all      6.39      1.96      0.66     90.98
00:20:01          all      1.61      3.16      1.09     94.14
...
23:50:01          all     44.07      0.02      0.77     55.14
Average:          all      5.80      4.99      2.87     86.34
In this section, CPU utilization information is displayed. This is very similar to the data displayed by iostat.
Other sections may have more than one line's worth of data per time, as shown by this section generated from CPU utilization data collected on a dual-processor system:
00:00:01          CPU     %user     %nice   %system     %idle
00:10:00            0      4.19      1.75      0.70     93.37
00:10:00            1      8.59      2.18      0.63     88.60
00:20:01            0      1.87      3.21      1.14     93.78
00:20:01            1      1.35      3.12      1.04     94.49
...
23:50:01            0     42.84      0.03      0.80     56.33
23:50:01            1     45.29      0.01      0.74     53.95
Average:            0      6.00      5.01      2.74     86.25
Average:            1      5.61      4.97      2.99     86.43
There are a total of seventeen different sections present in reports generated by the default Red Hat Enterprise Linux sar configuration; some are explored in upcoming chapters. For more information about the data contained in each section, refer to the sar(1) man page.
[4] Due to changing system loads, the actual time at which the data was collected may vary by a second or two.
2.5.5. OProfile
The OProfile system-wide profiler is a low-overhead monitoring tool. OProfile makes use of the processor's performance monitoring hardware [5] to determine the nature of performance-related problems.
Performance monitoring hardware is part of the processor itself. It takes the form of a special counter, incremented each time a certain event (such as the processor not being idle or the requested data not being in cache) occurs. Some processors have more than one such counter and allow the selection of different event types for each counter.
The counters can be loaded with an initial value and produce an interrupt whenever the counter over­flows. By loading a counter with different initial values, it is possible to vary the rate at which in­terrupts are produced. In this way it is possible to control the sample rate and, therefore, the level of detail obtained from the data being collected.
At one extreme, setting the counter so that it generates an overflow interrupt with every event provides extremely detailed performance data (but with massive overhead). At the other extreme, setting the counter so that it generates as few interrupts as possible provides only the most general overview of system performance (with practically no overhead). The secret to effective monitoring is the selection of a sample rate sufficiently high to capture the required data, but not so high as to overload the system with performance monitoring overhead.
Warning
You can configure OProfile so that it produces sufficient overhead to render the system unusable. Therefore, you must exercise care when selecting counter values. For this reason, the opcontrol command supports the --list-events option, which displays the event types available for the currently-installed processor, along with suggested minimum counter values for each.
It is important to keep the tradeoff between sample rate and overhead in mind when using OProfile.
2.5.5.1. OProfile Components
OProfile consists of the following components:
Data collection software
Data analysis software
Administrative interface software
The data collection software consists of the oprofile.o kernel module, and the oprofiled daemon.
The data analysis software includes the following programs:
op_time
Displays the number and relative percentages of samples taken for each executable file
oprofpp
Displays the number and relative percentage of samples taken, broken down by function or by individual instruction, or in gprof-style output
[5] OProfile can also use a fallback mechanism (known as TIMER_INT) for those system architectures that lack performance monitoring hardware.
op_to_source
Displays annotated source code and/or assembly listings
op_visualise
Graphically displays collected data
These programs make it possible to display the collected data in a variety of ways.
The administrative interface software controls all aspects of data collection, from specifying which events are to be monitored to starting and stopping the collection itself. This is done using the opcontrol command.
2.5.5.2. A Sample OProfile Session
This section shows an OProfile monitoring and data analysis session from initial configuration to final data analysis. It is only an introductory overview; for more detailed information, consult the Red Hat Enterprise Linux System Administration Guide.
Use opcontrol to configure the type of data to be collected with the following command:
opcontrol \
    --vmlinux=/boot/vmlinux-`uname -r` \
    --ctr0-event=CPU_CLK_UNHALTED \
    --ctr0-count=6000
The options used here direct opcontrol to:
Direct OProfile to a copy of the currently running kernel (--vmlinux=/boot/vmlinux-`uname -r`)
Specify that the processor's counter 0 is to be used and that the event to be monitored is the time when the CPU is executing instructions (--ctr0-event=CPU_CLK_UNHALTED)
Specify that OProfile is to collect samples every 6000th time the specified event occurs (--ctr0-count=6000)
Next, check that the oprofile kernel module is loaded by using the lsmod command:
Module                  Size  Used by    Not tainted
oprofile               75616       1
...
Confirm that the OProfile file system (located in /dev/oprofile/) is mounted with the ls /dev/oprofile/ command:
0  buffer            buffer_watershed  cpu_type  enable       stats
1  buffer_size       cpu_buffer_size   dump      kernel_only
(The exact number of files varies according to processor type.)
At this point, the /root/.oprofile/daemonrc file contains the settings required by the data collection software:
CTR_EVENT[0]=CPU_CLK_UNHALTED
CTR_COUNT[0]=6000
CTR_KERNEL[0]=1
CTR_USER[0]=1
CTR_UM[0]=0
CTR_EVENT_VAL[0]=121
CTR_EVENT[1]=
CTR_COUNT[1]=
CTR_KERNEL[1]=1
CTR_USER[1]=1
CTR_UM[1]=0
CTR_EVENT_VAL[1]=
one_enabled=1
SEPARATE_LIB_SAMPLES=0
SEPARATE_KERNEL_SAMPLES=0
VMLINUX=/boot/vmlinux-2.4.21-1.1931.2.349.2.2.entsmp
Next, use opcontrol to actually start data collection with the opcontrol --start command:
Using log file /var/lib/oprofile/oprofiled.log
Daemon started.
Profiler running.
Verify that the oprofiled daemon is running with the command ps x | grep -i oprofiled:
32019 ?        S      0:00 /usr/bin/oprofiled --separate-lib-samples=0 ...
32021 pts/0    S      0:00 grep -i oprofiled
(The actual oprofiled command line displayed by ps is much longer; however, it has been truncated here for formatting purposes.)
The system is now being monitored, with the data collected for all executables present on the system. The data is stored in the /var/lib/oprofile/samples/ directory. The files in this directory follow a somewhat unusual naming convention. Here is an example:
}usr}bin}less#0
The naming convention uses the absolute path of each file containing executable code, with the slash (/) characters replaced by right curly brackets (}), and ending with a pound sign (#) followed by a number (in this case, 0). Therefore, the file used in this example represents data collected while /usr/bin/less was running.
Once data has been collected, use one of the analysis tools to display it. One nice feature of OProfile is that it is not necessary to stop data collection before performing a data analysis. However, you must wait for at least one set of samples to be written to disk, or use the opcontrol --dump command to force the samples to disk.
In the following example, op_time is used to display (in reverse order — from highest number of samples to lowest) the samples that have been collected:
3321080  48.8021  0.0000  /boot/vmlinux-2.4.21-1.1931.2.349.2.2.entsmp
 761776  11.1940  0.0000  /usr/bin/oprofiled
 368933   5.4213  0.0000  /lib/tls/libc-2.3.2.so
 293570   4.3139  0.0000  /usr/lib/libgobject-2.0.so.0.200.2
 205231   3.0158  0.0000  /usr/lib/libgdk-x11-2.0.so.0.200.2
 167575   2.4625  0.0000  /usr/lib/libglib-2.0.so.0.200.2
 123095   1.8088  0.0000  /lib/libcrypto.so.0.9.7a
 105677   1.5529  0.0000  /usr/X11R6/bin/XFree86
...
Using less is a good idea when producing a report interactively, as the reports can be hundreds of lines long. The example given here has been truncated for that reason.
The format for this particular report is that one line is produced for each executable file for which samples were taken. Each line follows this format:
<sample-count> <sample-percent> <unused-field> <executable-name>
Where:
<sample-count> represents the number of samples collected
<sample-percent> represents the percentage of all samples collected for this specific executable
<unused-field> is a field that is not used
<executable-name> represents the name of the file containing executable code for which samples were collected.
This report (produced on a mostly-idle system) shows that nearly half of all samples were taken while the CPU was running code within the kernel itself. Next in line was the OProfile data collection daemon, followed by a variety of libraries and the X Window System server, XFree86. It is worth noting that for the system running this sample session, the counter value of 6000 used represents the minimum value recommended by opcontrol --list-events. This means that — at least for this particular system — OProfile overhead at its highest consumes roughly 11% of the CPU.
2.6. Additional Resources
This section includes various resources that can be used to learn more about resource monitoring and the Red Hat Enterprise Linux-specific subject matter discussed in this chapter.
2.6.1. Installed Documentation
The following resources are installed in the course of a typical Red Hat Enterprise Linux installation.
free(1) man page — Learn how to display free and used memory statistics.
top(1) man page — Learn how to display CPU utilization and process-level statistics.
watch(1) man page — Learn how to periodically execute a user-specified program, displaying fullscreen output.
GNOME System Monitor Help menu entry — Learn how to graphically display process, CPU, memory, and disk space utilization statistics.
vmstat(8) man page — Learn how to display a concise overview of process, memory, swap, I/O, system, and CPU utilization.
iostat(1) man page — Learn how to display CPU and I/O statistics.
mpstat(1) man page — Learn how to display individual CPU statistics on multiprocessor systems.
sadc(8) man page — Learn how to collect system utilization data.
sa1(8) man page — Learn about a script that runs sadc periodically.
sar(1) man page — Learn how to produce system resource utilization reports.
sa2(8) man page — Learn how to produce daily system resource utilization report files.
nice(1) man page — Learn how to change process scheduling priority.
oprofile(1) man page — Learn how to profile system performance.
op_visualise(1) man page — Learn how to graphically display OProfile data.
2.6.2. Useful Websites
http://people.redhat.com/alikins/system_tuning.html — System Tuning Info for Linux Servers. A stream-of-consciousness approach to performance tuning and resource monitoring for servers.
http://www.linuxjournal.com/article.php?sid=2396 — Performance Monitoring Tools for Linux. This Linux Journal page is geared more toward the administrator interested in writing a customized performance graphing solution. Written several years ago, some of the details may no longer apply, but the overall concept and execution are sound.
http://oprofile.sourceforge.net/ — OProfile project website. Includes valuable OProfile resources, including pointers to mailing lists and the #oprofile IRC channel.
2.6.3. Related Books
The following books discuss various issues related to resource monitoring and are good resources for Red Hat Enterprise Linux system administrators:
The Red Hat Enterprise Linux System Administration Guide; Red Hat, Inc. — Includes information on many of the resource monitoring tools described here, including OProfile.
Linux Performance Tuning and Capacity Planning by Jason R. Fink and Matthew D. Sherer; Sams — Provides more in-depth overviews of the resource monitoring tools presented here and includes others that might be appropriate for more specific resource monitoring needs.
Red Hat Linux Security and Optimization by Mohammed J. Kabir; Red Hat Press — Approximately the first 150 pages of this book discuss performance-related issues. This includes chapters dedicated to performance issues specific to network, Web, email, and file servers.
Linux Administration Handbook by Evi Nemeth, Garth Snyder, and Trent R. Hein; Prentice Hall — Provides a short chapter similar in scope to this book, but includes an interesting section on diagnosing a system that has suddenly slowed down.
Linux System Administration: A User's Guide by Marcel Gagne; Addison Wesley Professional — Contains a small chapter on performance monitoring and tuning.
Chapter 3.
Bandwidth and Processing Power
Of the two resources discussed in this chapter, one (bandwidth) is often hard for the new system administrator to understand, while the other (processing power) is usually a much easier concept to grasp.
Additionally, it may seem that these two resources are not that closely related — why group them together?
The reason for addressing both resources together is that they are based on the hardware that ties directly into a computer's ability to move and process data. As such, the two are closely interrelated.
3.1. Bandwidth
At its most basic, bandwidth is the capacity for data transfer — in other words, how much data can be moved from one point to another in a given amount of time. Having point-to-point data communication implies two things:
• A set of electrical conductors used to make low-level communication possible
• A protocol to facilitate the efficient and reliable communication of data
There are two types of system components that meet these requirements:
• Buses
• Datapaths
The following sections explore each in more detail.
3.1.1. Buses
As stated above, buses enable point-to-point communication and use some sort of protocol to ensure that all communication takes place in a controlled manner. However, buses have other distinguishing features:
• Standardized electrical characteristics (such as the number of conductors, voltage levels, signaling speeds, etc.)
• Standardized mechanical characteristics (such as the type of connector, card size, physical layout, etc.)
• Standardized protocol
The word "standardized" is important because buses are the primary way in which different system components are connected together.
In many cases, buses allow the interconnection of hardware made by multiple manufacturers; without standardization, this would not be possible. However, even in situations where a bus is proprietary to one manufacturer, standardization is important because it allows that manufacturer to more easily implement different components by using a common interface — the bus itself.
3.1.1.1. Examples of Buses
No matter where in a computer system you look, there are buses. Here are a few of the more common ones:
• Mass storage buses (ATA and SCSI)
• Networks¹ (Ethernet and Token Ring)
• Memory buses (PC133 and Rambus®)
• Expansion buses (PCI, ISA, USB)
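As a brief, hedged illustration: on a Linux system you can enumerate the devices attached to one of these buses directly. The sketch below assumes the pciutils package is installed (it is part of a typical installation):

# List every device attached to the PCI expansion bus; each line shows
# the device's bus:slot.function address followed by a description.
/sbin/lspci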
3.1.2. Datapaths
Datapaths can be harder to identify but, like buses, they are everywhere. Also like buses, datapaths enable point-to-point communication. However, unlike buses, datapaths:
• Use a simpler protocol (if any)
• Have little (if any) mechanical standardization
The reason for these differences is that datapaths are normally internal to some system component and are not used to facilitate the ad-hoc interconnection of different components. As such, datapaths are highly optimized for a particular situation, preferring speed and low cost over the slower, more expensive flexibility of a general-purpose design.
3.1.2.1. Examples of Datapaths
Here are some typical datapaths:
• CPU to on-chip cache datapath
• Graphics processor to video memory datapath
3.1.3. Potential Bandwidth-Related Problems
There are two ways in which bandwidth-related problems may occur (for either buses or datapaths):
1. The bus or datapath may represent a shared resource. In this situation, high levels of contention for the bus reduce the effective bandwidth available for all devices on the bus.
A SCSI bus with several highly-active disk drives would be a good example of this. The highly-active drives saturate the SCSI bus, leaving little bandwidth available for any other device on the same bus. The end result is that all I/O to any of the devices on this bus is slow, even for devices that are not themselves overly active.
2. The bus or datapath may be a dedicated resource with a fixed number of devices attached to it.
In this case, the electrical characteristics of the bus (and to some extent the nature of the protocol being used) limit the available bandwidth. This is usually more the case with datapaths than with buses. This is one reason why graphics adapters tend to perform more slowly when operating at higher resolutions and/or color depths — for every screen refresh, there is more data that must be passed along the datapath connecting video memory and the graphics processor.
1. Instead of an intra-system bus, networks can be thought of as an inter-system bus.
3.1.4. Potential Bandwidth-Related Solutions
Fortunately, bandwidth-related problems can be addressed. In fact, there are several approaches you can take:
• Spread the load
• Reduce the load
• Increase the capacity
The following sections explore each approach in more detail.
3.1.4.1. Spread the Load
The first approach is to more evenly distribute the bus activity. In other words, if one bus is overloaded and another is idle, perhaps the situation would be improved by moving some of the load to the idle bus.
As a system administrator, this is the first approach you should consider, as often there are additional buses already present in your system. For example, most PCs include at least two ATA channels (which is just another name for a bus). If you have two ATA disk drives and two ATA channels, why should both drives be on the same channel?
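As a hedged sketch of how you might check this on a Linux system: the IDE device names themselves encode the channel assignment — hda and hdb share the first channel, while hdc and hdd share the second. Assuming a 2.4-era IDE driver and that the boot messages are still in the kernel ring buffer, you can see which name each drive received:

# hda/hdb live on the first ATA channel, hdc/hdd on the second; two
# drives named hda and hdb are therefore sharing a single channel.
dmesg | grep '^hd[a-d]:'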
Even if your system configuration does not include additional buses, spreading the load might still be a reasonable approach. The hardware required to do so would likely cost less than replacing an existing bus with higher-capacity hardware.
3.1.4.2. Reduce the Load
At first glance, reducing the load and spreading the load appear to be two sides of the same coin. After all, when one spreads the load, it acts to reduce the load (at least on the overloaded bus), correct?
While this viewpoint is correct, it is not the same as reducing the load globally. The key here is to determine if there is some aspect of the system load that is causing this particular bus to be overloaded. For example, is a network heavily loaded due to activities that are unnecessary? Perhaps a small temporary file is the recipient of heavy read/write I/O. If that temporary file resides on a networked file server, a great deal of network traffic could be eliminated by working with the file locally.
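If the application honors the common TMPDIR convention, redirecting its scratch files to a local disk is a one-line change. This is only a sketch; the program name is hypothetical, and not every application respects TMPDIR:

# Keep heavy temporary-file I/O off the NFS-mounted directory by
# pointing it at local disk (report_job is a hypothetical program).
TMPDIR=/tmp
export TMPDIR
report_job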
3.1.4.3. Increase the Capacity
The obvious solution to insufficient bandwidth is to increase it somehow. However, this is usually an expensive proposition. Consider, for example, a SCSI controller and its overloaded bus. To increase its bandwidth, the SCSI controller (and likely all devices attached to it) would need to be replaced with faster hardware. If the SCSI controller is a separate card, this would be a relatively straightforward process, but if the SCSI controller is part of the system’s motherboard, it becomes much more difficult to justify the economics of such a change.
3.1.5. In Summary...
All system administrators should be aware of bandwidth, and how system configuration and usage impacts available bandwidth. Unfortunately, it is not always apparent what is a bandwidth-related problem and what is not. Sometimes, the problem is not the bus itself, but one of the components attached to the bus.
For example, consider a SCSI adapter that is connected to a PCI bus. If there are performance problems with SCSI disk I/O, it might be the result of a poorly-performing SCSI adapter, even though the SCSI and PCI buses themselves are nowhere near their bandwidth capabilities.
3.2. Processing Power
Often known as CPU power, CPU cycles, and various other names, processing power is the ability of a computer to manipulate data. Processing power varies with the architecture (and clock speed) of the CPU — usually CPUs with higher clock speeds and those supporting larger word sizes have more processing power than slower CPUs supporting smaller word sizes.
3.2.1. Facts About Processing Power
Here are the two main facts about processing power that you should keep in mind:
• Processing power is fixed
• Processing power cannot be stored
Processing power is fixed, in that the CPU can only go so fast. For example, if you need to add two numbers together (an operation that takes only one machine instruction on most architectures), a particular CPU can do it at one speed, and one speed only. With few exceptions, it is not even possible to slow the rate at which a CPU processes instructions, much less increase it.
Processing power is also fixed in another way: it is finite. That is, there are limits to the types of CPUs that can be plugged into any given computer. Some systems are capable of supporting a wide range of CPUs of differing speeds, while others may not be upgradeable at all².
Processing power cannot be stored for later use. In other words, if a CPU can process 100 million instructions in one second, one second of idle time equals 100 million instructions worth of processing that have been wasted.
If we take these facts and examine them from a slightly different perspective, a CPU "produces" a stream of executed instructions at a fixed rate. And if the CPU "produces" executed instructions, that means that something else must "consume" them. The next section defines these consumers.
3.2.2. Consumers of Processing Power
There are two main consumers of processing power:
• Applications
• The operating system itself
3.2.2.1. Applications
The most obvious consumers of processing power are the applications and programs you want the computer to run for you. From a spreadsheet to a database, applications are the reason you have a computer.
2. This situation leads to what is humorously termed a forklift upgrade, which means a complete replacement of a computer.
A single-CPU system can only do one thing at any given time. Therefore, if your application is running, everything else on the system is not. And the opposite is, of course, true — if something other than your application is running, then your application is doing nothing.
But how is it that many different applications can seemingly run at once under a modern operating system? The answer is that these are multitasking operating systems. In other words, they create the illusion that many different things are going on simultaneously when in fact that is not possible. The trick is to give each process a fraction of a second’s worth of time running on the CPU before giving the CPU to another process for the next fraction of a second. If these context switches happen frequently enough, the illusion of multiple applications running simultaneously is achieved.
Of course, applications do other things besides manipulate data using the CPU. They may wait for user input as well as perform I/O to devices such as disk drives and graphics displays. When these events take place, the application no longer needs the CPU. At these times, the CPU can be used for other processes running other applications without slowing the waiting application at all.
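You can see this effect with the time command. In the hedged example below, a command that spends much of its life waiting on disk I/O shows a "real" (elapsed) time larger than the CPU time it actually consumed ("user" plus "sys"); the difference is time during which the CPU was free for other work (the search string and directory are illustrative):

# Elapsed time vs. CPU time for an I/O-heavy command; the gap between
# "real" and "user" + "sys" is time spent waiting, not computing.
time grep -r somestring /usr/share/doc > /dev/null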
In addition, the CPU can be used by another consumer of processing power: the operating system itself.
3.2.2.2. The Operating System
It is difficult to determine how much processing power is consumed by the operating system. The reason for this is that operating systems use a mixture of process-level and system-level code to perform their work. While, for example, it is easy to use a process monitor to determine what the process running a daemon or service is doing, it is not so easy to determine how much processing power is being consumed by system-level I/O-related processing (which is normally done within the context of the process requesting the I/O).
In general, it is possible to divide this kind of operating system overhead into two types:
• Operating system housekeeping
• Process-related activities
Operating system housekeeping includes activities such as process scheduling and memory management, while process-related activities include any processes that support the operating system itself, such as processes handling system-wide event logging or I/O cache flushing.
3.2.3. Improving a CPU Shortage
When there is insufficient processing power available for the work needing to be done, you have two options:
• Reducing the load
• Increasing the capacity
3.2.3.1. Reducing the Load
Reducing the CPU load is something that can be done with no expenditure of money. The trick is to identify those aspects of the system load under your control that can be cut back. There are three areas to focus on:
• Reducing operating system overhead
• Reducing application overhead
• Eliminating applications entirely
3.2.3.1.1. Reducing Operating System Overhead
To reduce operating system overhead, you must examine your current system load and determine what aspects of it result in inordinate amounts of overhead. These areas could include:
• Reducing the need for frequent process scheduling
• Reducing the amount of I/O performed
Do not expect miracles; in a reasonably well-configured system, you are unlikely to notice much of a performance increase by trying to reduce operating system overhead. This is because a reasonably well-configured system, by definition, already incurs a minimal amount of overhead. However, if your system is running with too little RAM, for instance, you may be able to reduce overhead by alleviating the RAM shortage.
3.2.3.1.2. Reducing Application Overhead
Reducing application overhead means making sure the application has everything it needs to run well. Some applications exhibit wildly different behaviors under different environments — an application may become highly compute-bound while processing certain types of data, but not others, for example.
The point to keep in mind here is that you must understand the applications running on your system if you are to enable them to run as efficiently as possible. Often this entails working with your users, and/or your organization’s developers, to help uncover ways in which the applications can be made to run more efficiently.
3.2.3.1.3. Eliminating Applications Entirely
Depending on your organization, this approach might not be available to you, as it often is not a system administrator’s responsibility to dictate which applications will and will not be run. However, if you can identify any applications that are known "CPU hogs", you might be able to influence the powers-that-be to retire them.
Doing this will likely involve more than just yourself. The affected users should certainly be a part of this process; in many cases they may have the knowledge and the political power to make the necessary changes to the application lineup.
Tip
Keep in mind that an application may not need to be eliminated from every system in your organization. You might be able to move a particularly CPU-hungry application from an overloaded system to another system that is nearly idle.
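If retiring or moving such an application is not possible, a smaller, hedged measure is to lower its scheduling priority with nice, so it only consumes CPU cycles no other process wants. The program name below is hypothetical:

# Start a known CPU hog at the lowest priority (nice level 19) so
# that interactive work on the system is not starved.
nice -n 19 ./report_generator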
3.2.3.2. Increasing the Capacity
Of course, if it is not possible to reduce the demand for processing power, you must find ways of increasing the processing power that is available. To do so costs money, but it can be done.
3.2.3.2.1. Upgrading the CPU
The most straightforward approach is to determine if your system’s CPU can be upgraded. The first step is to determine if the current CPU can be removed. Some systems (primarily laptops) have CPUs that are soldered in place, making an upgrade impossible. The rest, however, have socketed CPUs, making upgrades possible — at least in theory.
Next, you must do some research to determine if a faster CPU exists for your system configuration. For example, if you currently have a 1GHz CPU, and a 2GHz unit of the same type exists, an upgrade might be possible.
Finally, you must determine the maximum clock speed supported by your system. To continue the example above, even if a 2GHz CPU of the proper type exists, a simple CPU swap is not an option if your system only supports processors running at 1GHz or below.
Should you find that you cannot install a faster CPU in your system, your options may be limited to changing motherboards or even the more expensive forklift upgrade mentioned earlier.
However, some system configurations make a slightly different approach possible. Instead of replacing the current CPU, why not just add another one?
3.2.3.2.2. Is Symmetric Multiprocessing Right for You?
Symmetric multiprocessing (also known as SMP) makes it possible for a computer system to have more than one CPU sharing all system resources. This means that, unlike a uniprocessor system, an SMP system may actually have more than one process running at the same time.
At first glance, this seems like any system administrator’s dream. First and foremost, SMP makes it possible to increase a system’s CPU power even if CPUs with faster clock speeds are not available — just by adding another CPU. However, this flexibility comes with some caveats.
The first caveat is that not all systems are capable of SMP operation. Your system must have a motherboard designed to support multiple processors. If it does not, a motherboard upgrade (at the least) would be required.
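A quick way to see what you already have: on a running Linux system, counting the processor entries the kernel reports tells you how many CPUs are currently in use.

# Count the processors the kernel has detected; a value greater than
# one means the system is already running an SMP kernel on SMP hardware.
grep -c '^processor' /proc/cpuinfo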
The second caveat is that SMP increases system overhead. This makes sense if you stop to think about it; with more CPUs to schedule work for, the operating system requires more CPU cycles for overhead. Another aspect to this is that with multiple CPUs, there can be more contention for system resources. Because of these factors, upgrading a dual-processor system to a quad-processor unit does not result in a 100% increase in available CPU power. In fact, depending on the actual hardware, the workload, and the processor architecture, it is possible to reach a point where the addition of another processor could actually reduce system performance.
Another point to keep in mind is that SMP does not help workloads consisting of one monolithic application with a single stream of execution. In other words, if a large compute-bound simulation program runs as one process and without threads, it will not run any faster on an SMP system than on a single-processor machine. In fact, it may even run somewhat slower, due to the increased overhead SMP brings. For these reasons, many system administrators feel that when it comes to CPU power, single stream processing power is the way to go. It provides the most CPU power with the fewest restrictions on its use.
While this discussion seems to indicate that SMP is never a good idea, there are circumstances in which it makes sense. For example, environments running multiple highly compute-bound applications are good candidates for SMP. The reason for this is that applications that do nothing but compute for long periods of time keep contention between active processes (and therefore, the operating system overhead) to a minimum, while the processes themselves keep every CPU busy.
One other thing to keep in mind about SMP is that the performance of an SMP system tends to degrade more gracefully as the system load increases. This does make SMP systems popular in server and multi-user environments, as the ever-changing process mix can impact the system-wide load less on a multi-processor machine.
3.3. Red Hat Enterprise Linux-Specific Information
Monitoring bandwidth and CPU utilization under Red Hat Enterprise Linux entails using the tools discussed in Chapter 2 Resource Monitoring; therefore, if you have not yet read that chapter, you should do so before continuing.
3.3.1. Monitoring Bandwidth on Red Hat Enterprise Linux
As stated in Section 2.4.2 Monitoring Bandwidth, it is difficult to directly monitor bandwidth utilization. However, by examining device-level statistics, it is possible to roughly gauge whether insufficient bandwidth is an issue on your system.
By using vmstat, it is possible to determine if overall device activity is excessive by examining the bi and bo fields; in addition, taking note of the si and so fields gives you a bit more insight into how much disk activity is due to swap-related I/O:
   procs                      memory      swap          io     system         cpu
 r  b  w   swpd   free   buff  cache   si   so    bi    bo   in    cs  us  sy  id
 1  0  0      0 248088 158636 480804    0    0     2     6  120   120  10   3  87
In this example, the bi field shows two blocks/second read from block devices (primarily disk drives), while the bo field shows six blocks/second written to block devices. We can determine that none of this activity was due to swapping, as the si and so fields both show a swap-related I/O rate of zero kilobytes/second.
By using iostat, it is possible to gain a bit more insight into disk-related activity:
Linux 2.4.21-1.1931.2.349.2.2.entsmp (raptor.example.com) 07/21/2003
avg-cpu: %user %nice %sys %idle
5.34 4.60 2.83 87.24
Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
dev8-0            1.10         6.21        25.08     961342    3881610
dev8-1            0.00         0.00         0.00         16          0
This output shows us that the device with major number 8 (which is /dev/sda, the first SCSI disk) averaged slightly more than one I/O operation per second (the tps field). Most of the I/O activity for this device was writes (the Blk_wrtn field), with slightly more than 25 blocks written each second (the Blk_wrtn/s field).
If more detail is required, use iostat’s -x option:
Linux 2.4.21-1.1931.2.349.2.2.entsmp (raptor.example.com) 07/21/2003
avg-cpu: %user %nice %sys %idle
5.37 4.54 2.81 87.27
Device:     rrqm/s  wrqm/s   r/s   w/s  rsec/s  wsec/s   rkB/s   wkB/s  avgrq-sz
/dev/sda     13.57    2.86  0.36  0.77   32.20   29.05   16.10   14.53     54.52
/dev/sda1     0.17    0.00  0.00  0.00    0.34    0.00    0.17    0.00    133.40
/dev/sda2     0.00    0.00  0.00  0.00    0.00    0.00    0.00    0.00     11.56
/dev/sda3     0.31    2.11  0.29  0.62    4.74   21.80    2.37   10.90     29.42
/dev/sda4     0.09    0.75  0.04  0.15    1.06    7.24    0.53    3.62     43.01
Over and above the longer lines containing more fields, the first thing to keep in mind is that this iostat output is now displaying statistics on a per-partition level. By using df to associate mount points with device names, it is possible to use this report to determine if, for example, the partition containing /home/ is experiencing an excessive workload.
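As a brief sketch of that technique:

# Map a mount point back to its device name; the device and partition
# reported here can then be matched against the iostat output above.
df /home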
Actually, each line output from iostat -x is longer and contains more information than this; here is the remainder of each line (with the device column added for easier reading):
Device:    avgqu-sz   await  svctm  %util
/dev/sda       0.24   20.86   3.80   0.43
/dev/sda1      0.00  141.18 122.73   0.03
/dev/sda2      0.00    6.00   6.00   0.00
/dev/sda3      0.12   12.84   2.68   0.24
/dev/sda4      0.11   57.47   8.94   0.17
In this example, it is interesting to note that /dev/sda2 is the system swap partition; it is obvious from the many fields reading 0.00 for this partition that swapping is not a problem on this system.
Another interesting point to note is /dev/sda1. The statistics for this partition are unusual; the overall activity seems low, but why are the average I/O request size (the avgrq-sz field), average wait time (the await field), and the average service time (the svctm field) so much larger than the other partitions? The answer is that this partition contains the /boot/ directory, which is where the kernel and initial ramdisk are stored. When the system boots, the read I/Os (notice that only the rsec/s and rkB/s fields are non-zero; no writing is done here on a regular basis) used during the boot process are for large numbers of blocks, resulting in the relatively long wait and service times iostat displays.
It is possible to use sar for a longer-term overview of I/O statistics; for example, sar -b displays a general I/O report:
Linux 2.4.21-1.1931.2.349.2.2.entsmp (raptor.example.com) 07/21/2003
12:00:00 AM       tps      rtps      wtps   bread/s   bwrtn/s
12:10:00 AM      0.51      0.01      0.50      0.25     14.32
12:20:01 AM      0.48      0.00      0.48      0.00     13.32
...
06:00:02 PM      1.24      0.00      1.24      0.01     36.23
Average:         1.11      0.31      0.80     68.14     34.79
Here, like iostat’s initial display, the statistics are grouped for all block devices.
Another I/O-related report is produced using sar -d:
Linux 2.4.21-1.1931.2.349.2.2.entsmp (raptor.example.com) 07/21/2003
12:00:00 AM       DEV       tps    sect/s
12:10:00 AM    dev8-0      0.51     14.57
12:10:00 AM    dev8-1      0.00      0.00
12:20:01 AM    dev8-0      0.48     13.32
12:20:01 AM    dev8-1      0.00      0.00
...
06:00:02 PM    dev8-0      1.24     36.25
06:00:02 PM    dev8-1      0.00      0.00
Average:       dev8-0      1.11    102.93
Average:       dev8-1      0.00      0.00
This report provides per-device information, but with little detail.
While there are no explicit statistics showing bandwidth utilization for a given bus or datapath, we can at least determine what the devices are doing and use their activity to indirectly determine the bus loading.
3.3.2. Monitoring CPU Utilization on Red Hat Enterprise Linux
Unlike bandwidth, monitoring CPU utilization is much more straightforward. From a single percentage of CPU utilization in GNOME System Monitor, to the more in-depth statistics reported by sar, it is possible to accurately determine how much CPU power is being consumed and by what.
Moving beyond GNOME System Monitor, top is the first resource monitoring tool discussed in Chapter 2 Resource Monitoring to provide a more in-depth representation of CPU utilization. Here is a top report from a dual-processor workstation:
 9:44pm  up 2 days, 2 min,  1 user,  load average: 0.14, 0.12, 0.09
90 processes: 82 sleeping, 1 running, 7 zombie, 0 stopped
CPU0 states:  0.4% user,  1.1% system,  0.0% nice, 97.4% idle
CPU1 states:  0.5% user,  1.3% system,  0.0% nice, 97.1% idle
Mem:  1288720K av, 1056260K used,  232460K free,       0K shrd,  145644K buff
Swap:  522104K av,       0K used,  522104K free                  469764K cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
30997 ed        16   0  1100 1100   840 R     1.7  0.0   0:00 top
 1120 root       5 -10  249M 174M 71508 S <   0.9 13.8 254:59 X
 1260 ed        15   0 54408  53M  6864 S     0.7  4.2  12:09 gnome-terminal
  888 root      15   0  2428 2428  1796 S     0.1  0.1   0:06 sendmail
 1264 ed        15   0 16336  15M  9480 S     0.1  1.2   1:58 rhn-applet-gui
    1 root      15   0   476  476   424 S     0.0  0.0   0:05 init
    2 root      0K   0     0    0     0 SW    0.0  0.0   0:00 migration_CPU0
    3 root      0K   0     0    0     0 SW    0.0  0.0   0:00 migration_CPU1
    4 root      15   0     0    0     0 SW    0.0  0.0   0:01 keventd
    5 root      34  19     0    0     0 SWN   0.0  0.0   0:00 ksoftirqd_CPU0
    6 root      34  19     0    0     0 SWN   0.0  0.0   0:00 ksoftirqd_CPU1
    7 root      15   0     0    0     0 SW    0.0  0.0   0:05 kswapd
    8 root      15   0     0    0     0 SW    0.0  0.0   0:00 bdflush
    9 root      15   0     0    0     0 SW    0.0  0.0   0:01 kupdated
   10 root      25   0     0    0     0 SW    0.0  0.0   0:00 mdrecoveryd
The first CPU-related information is present on the very first line: the load average. The load average is a number corresponding to the average number of runnable processes on the system. The load average is often listed as three sets of numbers (as top does), representing the load average for the past 1, 5, and 15 minutes; the low values here indicate that the system in this example was not very busy.
The next line, although not strictly related to CPU utilization, has an indirect relationship, in that it shows the number of runnable processes (here, only one; remember this number, as it means something special in this example). The number of runnable processes is a good indicator of how CPU-bound a system might be.
Next are two lines displaying the current utilization for each of the two CPUs in the system. The utilization statistics show whether the CPU cycles were expended for user-level or system-level processing; also included is a statistic showing how much CPU time was expended by processes with altered scheduling priorities. Finally, there is an idle time statistic.
Moving down into the process-related section of the display, we find that the process using the most CPU power is top itself; in other words, the one runnable process on this otherwise-idle system was top taking a "picture" of itself.
Tip
It is important to remember that the very act of running a system monitor affects the resource utilization statistics you receive. All software-based monitors do this to some extent.
To gain more detailed knowledge regarding CPU utilization, we must change tools. If we examine output from vmstat, we obtain a slightly different understanding of our example system:
   procs                      memory      swap          io     system         cpu
 r  b  w   swpd   free   buff  cache   si   so    bi    bo   in    cs  us  sy  id
 1  0  0      0 233276 146636 469808    0    0     7     7   14    27  10   3  87
 0  0  0      0 233276 146636 469808    0    0     0     0  523   138   3   0  96
 0  0  0      0 233276 146636 469808    0    0     0     0  557   385   2   1  97
 0  0  0      0 233276 146636 469808    0    0     0     0  544   343   2   0  97
 0  0  0      0 233276 146636 469808    0    0     0     0  517    89   2   0  98
 0  0  0      0 233276 146636 469808    0    0     0    32  518   102   2   0  98
 0  0  0      0 233276 146636 469808    0    0     0     0  516    91   2   1  98
 0  0  0      0 233276 146636 469808    0    0     0     0  516    72   2   0  98
 0  0  0      0 233276 146636 469808    0    0     0     0  516    88   2   0  97
 0  0  0      0 233276 146636 469808    0    0     0     0  516    81   2   0  97
Here we have used the command vmstat 1 10 to sample the system once a second, ten times. At first, the CPU-related statistics (the us, sy, and id fields) seem similar to what top displayed, and maybe even appear a bit less detailed. However, unlike top, we can also gain a bit of insight into how the CPU is being used.
If we examine the system fields, we notice that the CPU is handling about 500 interrupts per second on average and is switching between processes anywhere from 80 to nearly 400 times a second. If you think this seems like a lot of activity, think again, because the user-level processing (the us field) is only averaging 2%, while system-level processing (the sy field) is usually under 1%. Again, this is an idle system.
Reviewing the tools Sysstat offers, we find that iostat and mpstat provide little additional information over what we have already experienced with top and vmstat. However, sar produces a number of reports that can come in handy when monitoring CPU utilization.
The first report is obtained by the command sar -q, which displays the run queue length, total number of processes, and the load averages for the past one and five minutes. Here is a sample:
Linux 2.4.21-1.1931.2.349.2.2.entsmp (falcon.example.com) 07/21/2003
12:00:01 AM   runq-sz  plist-sz   ldavg-1   ldavg-5
12:10:00 AM         3       122      0.07      0.28
12:20:01 AM         5       123      0.00      0.03
...
09:50:00 AM         5       124      0.67      0.65
Average:            4       123      0.26      0.26
In this example, the system is always busy (given that more than one process is runnable at any given time), but is not overly loaded (because this particular system has more than one processor).
The next CPU-related sar report is produced by the command sar -u:
Linux 2.4.21-1.1931.2.349.2.2.entsmp (falcon.example.com) 07/21/2003
12:00:01 AM       CPU     %user     %nice   %system     %idle
12:10:00 AM       all      3.69     20.10      1.06     75.15
12:20:01 AM       all      1.73      0.22      0.80     97.25
...
10:00:00 AM       all     35.17      0.83      1.06     62.93
Average:          all      7.47      4.85      3.87     83.81
The statistics contained in this report are no different from those produced by many of the other tools. The biggest benefit here is that sar makes the data available on an ongoing basis and is therefore more useful for obtaining long-term averages, or for the production of CPU utilization graphs.
On multiprocessor systems, the sar -U command can produce statistics for an individual processor or for all processors. Here is an example of output from sar -U ALL:
Linux 2.4.21-1.1931.2.349.2.2.entsmp (falcon.example.com) 07/21/2003
12:00:01 AM       CPU     %user     %nice   %system     %idle
12:10:00 AM         0      3.46     21.47      1.09     73.98
12:10:00 AM         1      3.91     18.73      1.03     76.33
12:20:01 AM         0      1.63      0.25      0.78     97.34
12:20:01 AM         1      1.82      0.20      0.81     97.17
...
10:00:00 AM         0     39.12      0.75      1.04     59.09
10:00:00 AM         1     31.22      0.92      1.09     66.77
Average:            0      7.61      4.91      3.86     83.61
Average:            1      7.33      4.78      3.88     84.02
The sar -w command reports on the number of context switches per second, making it possible to gain additional insight into where CPU cycles are being spent:
Linux 2.4.21-1.1931.2.349.2.2.entsmp (falcon.example.com) 07/21/2003
12:00:01 AM   cswch/s
12:10:00 AM    537.97
12:20:01 AM    339.43
...
10:10:00 AM    319.42
Average:      1158.25
It is also possible to produce two different sar reports on interrupt activity. The first (produced using the sar -I SUM command) displays a single "interrupts per second" statistic:
Linux 2.4.21-1.1931.2.349.2.2.entsmp (falcon.example.com) 07/21/2003
12:00:01 AM      INTR    intr/s
12:10:00 AM       sum    539.15
12:20:01 AM       sum    539.49
...
10:40:01 AM       sum    539.10
Average:          sum    541.00
By using the command sar -I PROC, it is possible to break down interrupt activity by processor (on multiprocessor systems) and by interrupt level (from 0 to 15):
Linux 2.4.21-1.1931.2.349.2.2.entsmp (pigdog.example.com) 07/21/2003
12:00:00 AM  CPU  i000/s  i001/s  i002/s  i008/s  i009/s  i011/s  i012/s
12:10:01 AM    0  512.01    0.00    0.00    0.00    3.44    0.00    0.00

12:10:01 AM  CPU  i000/s  i001/s  i002/s  i008/s  i009/s  i011/s  i012/s
12:20:01 AM    0  512.00    0.00    0.00    0.00    3.73    0.00    0.00
...
10:30:01 AM  CPU  i000/s  i001/s  i002/s  i003/s  i008/s  i009/s  i010/s
10:40:02 AM    0  512.00    1.67    0.00    0.00    0.00   15.08    0.00
Average:       0  512.00    0.42    0.00     N/A    0.00    6.03     N/A
This report (which has been truncated horizontally to fit on the page) includes one column for each interrupt level (for example, the i002/s field illustrating the rate for interrupt level 2). If this were a multiprocessor system, there would be one line per sample period for each CPU.
Another important point to note about this report is that sar adds or removes specific interrupt fields if no data is collected for that field. The example report above shows this: the end of the report includes interrupt levels (3 and 10) that were not present at the start of the sampling period.
Note
There are two other interrupt-related sar reports — sar -I ALL and sar -I XALL. However, the default configuration for the sadc data collection utility does not collect the information necessary for these reports. This can be changed by editing the file /etc/cron.d/sysstat, and changing this line:
*/10 * * * * root /usr/lib/sa/sa1 1 1
to this:
*/10 * * * * root /usr/lib/sa/sa1 -I 1 1
Keep in mind this change does cause additional information to be collected by sadc, and results in larger data file sizes. Therefore, make sure your system configuration can support the additional space consumption.
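A hedged way to keep an eye on that consumption, assuming the default sysstat data directory:

# The sadc data files live here by default (one file per day of the
# month); compare their sizes before and after enabling -I collection.
ls -lh /var/log/sa/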
3.4. Additional Resources
This section includes various resources that can be used to learn more about the Red Hat Enterprise Linux-specific subject matter discussed in this chapter.
3.4.1. Installed Documentation
The following resources are installed in the course of a typical Red Hat Enterprise Linux installation and can help you learn more about the subject matter discussed in this chapter.
• vmstat(8) man page — Learn how to display a concise overview of process, memory, swap, I/O, system, and CPU utilization.
• iostat(1) man page — Learn how to display CPU and I/O statistics.
• sar(1) man page — Learn how to produce system resource utilization reports.
• sadc(8) man page — Learn how to collect system utilization data.
• sa1(8) man page — Learn about a script that runs sadc periodically.
• top(1) man page — Learn how to display CPU utilization and process-level statistics.
3.4.2. Useful Websites
• http://people.redhat.com/alikins/system_tuning.html — System Tuning Info for Linux Servers. A stream-of-consciousness approach to performance tuning and resource monitoring for servers.
• http://www.linuxjournal.com/article.php?sid=2396 — Performance Monitoring Tools for Linux. This Linux Journal page is geared more toward the administrator interested in writing a customized performance graphing solution. Written several years ago, some of the details may no longer apply, but the overall concept and execution are sound.
3.4.3. Related Books
The following books discuss various issues related to resource monitoring, and are good resources for Red Hat Enterprise Linux system administrators:
• The Red Hat Enterprise Linux System Administration Guide; Red Hat, Inc. — Includes a chapter on many of the resource monitoring tools described here.
• Linux Performance Tuning and Capacity Planning by Jason R. Fink and Matthew D. Sherer; Sams — Provides more in-depth overviews of the resource monitoring tools presented here, and includes others that might be appropriate for more specific resource monitoring needs.
• Linux Administration Handbook by Evi Nemeth, Garth Snyder, and Trent R. Hein; Prentice Hall — Provides a short chapter similar in scope to this book, but includes an interesting section on diagnosing a system that has suddenly slowed down.
• Linux System Administration: A User’s Guide by Marcel Gagne; Addison Wesley Professional — Contains a small chapter on performance monitoring and tuning.
Chapter 4.
Physical and Virtual Memory
All present-day, general-purpose computers are of the type known as stored program computers. As the name implies, stored program computers load instructions (the building blocks of programs) into some type of internal storage, where they subsequently execute those instructions.
Stored program computers also use the same storage for data. This is in contrast to computers that use their hardware configuration to control their operation (such as older plugboard-based computers).
The place where programs were stored on the first stored program computers went by a variety of names and used a variety of different technologies, from spots on a cathode ray tube, to pressure pulses in columns of mercury. Fortunately, present-day computers use technologies with greater storage capacity and much smaller size than ever before.
4.1. Storage Access Patterns
One thing to keep in mind throughout this chapter is that computers tend to access storage in certain ways. In fact, most storage access tends to exhibit one (or both) of the following attributes:
• Access tends to be sequential
• Access tends to be localized
Sequential access means that, if address N is accessed by the CPU, it is highly likely that address N +1 will be accessed next. This makes sense, as most programs consist of large sections of instructions that execute — in order — one after the other.
Localized access means that, if address X is accessed, it is likely that other addresses surrounding X will also be accessed in the future.
These attributes are crucial, because they allow smaller, faster storage to effectively buffer larger, slower storage. This is the basis for implementing virtual memory. But before we can discuss virtual memory, we must examine the various storage technologies currently in use.
4.2. The Storage Spectrum
Present-day computers actually use a variety of storage technologies. Each technology is geared toward a specific function, with speeds and capacities to match.
These technologies are:
• CPU registers
• Cache memory
• RAM
• Hard drives
• Off-line backup storage (tape, optical disk, etc.)
In terms of capabilities and cost, these technologies form a spectrum. For example, CPU registers are:
• Very fast (access times of a few nanoseconds)
• Low capacity (usually less than 200 bytes)
• Very limited expansion capabilities (a change in CPU architecture would be required)
• Expensive (more than one dollar/byte)
However, at the other end of the spectrum, off-line backup storage is:
• Very slow (access times may be measured in days, if the backup media must be shipped long distances)
• Very high capacity (tens to hundreds of gigabytes)
• Essentially unlimited expansion capabilities (limited only by the floorspace needed to house the backup media)
• Very inexpensive (fractional cents/byte)
By using different technologies with different capabilities, it is possible to fine-tune system design for maximum performance at the lowest possible cost. The following sections explore each technology in the storage spectrum.
4.2.1. CPU Registers
Every present-day CPU design includes registers for a variety of purposes, from storing the address of the currently-executed instruction to more general-purpose data storage and manipulation. CPU registers run at the same speed as the rest of the CPU; otherwise, they would be a serious bottleneck to overall system performance. The reason for this is that nearly all operations performed by the CPU involve the registers in one way or another.
The number of CPU registers (and their uses) is strictly dependent on the architectural design of the CPU itself. There is no way to change the number of CPU registers, short of migrating to a CPU with a different architecture. For these reasons, the number of CPU registers can be considered a constant, as they are changeable only with great pain and expense.
4.2.2. Cache Memory
The purpose of cache memory is to act as a buffer between the very limited, very high-speed CPU registers and the relatively slower and much larger main system memory — usually referred to as RAM¹. Cache memory has an operating speed similar to the CPU itself so, when the CPU accesses data in cache, the CPU is not kept waiting for the data.
Cache memory is configured such that, whenever data is to be read from RAM, the system hardware first checks to determine if the desired data is in cache. If the data is in cache, it is quickly retrieved, and used by the CPU. However, if the data is not in cache, the data is read from RAM and, while being transferred to the CPU, is also placed in cache (in case it is needed again later). From the perspective of the CPU, all this is done transparently, so that the only difference between accessing data in cache and accessing data in RAM is the amount of time it takes for the data to be returned.
In terms of storage capacity, cache is much smaller than RAM. Therefore, not every byte in RAM can have its own unique location in cache. As such, it is necessary to split cache up into sections that can be used to cache different areas of RAM, and to have a mechanism that allows each area of cache to cache different areas of RAM at different times. Even with the difference in size between cache and RAM, given the sequential and localized nature of storage access, a small amount of cache can effectively speed access to a large amount of RAM.
1. While "RAM" is an acronym for "Random Access Memory," and a term that could easily apply to any storage technology allowing the non-sequential access of stored data, when system administrators talk about RAM they invariably mean main system memory.
When writing data from the CPU, things get a bit more complicated. There are two different approaches that can be used. In both cases, the data is first written to cache. However, since the purpose of cache is to function as a very fast copy of the contents of selected portions of RAM, any time a piece of data changes its value, that new value must be written to both cache memory and RAM. Otherwise, the data in cache and the data in RAM would no longer match.
The two approaches differ in how this is done. One approach, known as write-through caching, immediately writes the modified data to RAM. Write-back caching, however, delays the writing of modified data back to RAM. The reason for doing this is to reduce the number of times a frequently-modified piece of data must be written back to RAM.
Write-through cache is a bit simpler to implement; for this reason it is most common. Write-back cache is a bit trickier to implement; in addition to storing the actual data, it is necessary to maintain some sort of mechanism capable of flagging the cached data as clean (the data in cache is the same as the data in RAM), or dirty (the data in cache has been modified, meaning that the data in RAM is no longer current). It is also necessary to implement a way of periodically flushing dirty cache entries back to RAM.
4.2.2.1. Cache Levels
Cache subsystems in present-day computer designs may be multi-level; that is, there might be more than one set of cache between the CPU and main memory. The cache levels are often numbered, with lower numbers being closer to the CPU. Many systems have two cache levels:
• L1 cache is often located directly on the CPU chip itself and runs at the same speed as the CPU
• L2 cache is often part of the CPU module, runs at CPU speeds (or nearly so), and is usually a bit larger and slower than L1 cache
Some systems (normally high-performance servers) also have L3 cache, which is usually part of the system motherboard. As might be expected, L3 cache would be larger (and most likely slower) than L2 cache.
In either case, the goal of all cache subsystems — whether single- or multi-level — is to reduce the average access time to the RAM.
4.2.3. Main Memory — RAM
RAM makes up the bulk of electronic storage on present-day computers. It is used as storage for both data and programs while those data and programs are in use. The speed of RAM in most systems today lies between the speed of cache memory and that of hard drives, and is much closer to the former than the latter.
The basic operation of RAM is actually quite straightforward. At the lowest level, there are the RAM chips — integrated circuits that do the actual "remembering." These chips have four types of connections to the outside world:
• Power connections (to operate the circuitry within the chip)
• Data connections (to enable the transfer of data into or out of the chip)
• Read/Write connections (to control whether data is to be stored into or retrieved from the chip)
• Address connections (to determine where in the chip the data should be read/written)
Here are the steps required to store data in RAM:
1. The data to be stored is presented to the data connections.
2. The address at which the data is to be stored is presented to the address connections.
3. The read/write connection is set to write mode.
Retrieving data is just as straightforward:
1. The address of the desired data is presented to the address connections.
2. The read/write connection is set to read mode.
3. The desired data is read from the data connections.
While these steps seem simple, they take place at very high speeds, with the time spent on each step measured in nanoseconds.
Nearly all RAM chips created today are sold as modules. Each module consists of a number of individual RAM chips attached to a small circuit board. The mechanical and electrical layout of the module adheres to various industry standards, making it possible to purchase memory from a variety of vendors.
Note
The main benefit to a system using industry-standard RAM modules is that it tends to keep the cost of RAM low, due to the ability to purchase the modules from more than just the system manufacturer.
Although most computers use industry-standard RAM modules, there are exceptions. Most notable are laptops (and even here some standardization is starting to take hold) and high-end servers. However, even in these instances, it is likely that third-party RAM modules are available, assuming the system is relatively popular and is not a completely new design.
4.2.4. Hard Drives
All the technologies discussed so far are volatile in nature. In other words, data contained in volatile storage is lost when the power is turned off.
Hard drives, on the other hand, are non-volatile — the data they contain remains there, even after the power is removed. Because of this, hard drives occupy a special place in the storage spectrum. Their non-volatile nature makes them ideal for storing programs and data for longer-term use. Another unique aspect to hard drives is that, unlike RAM and cache memory, it is not possible to execute programs directly when they are stored on hard drives; instead, they must first be read into RAM.
Also different from cache and RAM is the speed of data storage and retrieval; hard drives are at least an order of magnitude slower than the all-electronic technologies used for cache and RAM. The difference in speed is due mainly to their electromechanical nature. There are four distinct phases taking place during each data transfer to or from a hard drive. The following list illustrates these phases, along with the time it would take a typical high-performance drive, on average, to complete each:
• Access arm movement (5.5 milliseconds)
• Disk rotation (.1 milliseconds)
• Heads reading/writing data (.00014 milliseconds)
• Data transfer to/from the drive’s electronics (.003 milliseconds)
Of these, only the last phase is not dependent on any mechanical operation.
Note
Although there is much more to learn about hard drives, disk storage technologies are discussed in more depth in Chapter 5 Managing Storage. For the time being, it is only necessary to keep in mind the huge speed difference between RAM and disk-based technologies and that their storage capacity usually exceeds that of RAM by a factor of at least 10, and often by 100 or more.
4.2.5. Off-Line Backup Storage
Off-line backup storage takes a step beyond hard drive storage in terms of capacity (higher) and speed (slower). Here, capacities are effectively limited only by your ability to procure and store the removable media.
The actual technologies used in these devices vary widely. Here are the more popular types:
• Magnetic tape
• Optical disk
Of course, having removable media means that access times become even longer, particularly when the desired data is on media not currently loaded in the storage device. This situation is alleviated somewhat by the use of robotic devices capable of automatically loading and unloading media, but the media storage capacities of such devices are still finite. Even in the best of cases, access times are measured in seconds, which is much longer than the relatively slow multi-millisecond access times typical for a high-performance hard drive.
Now that we have briefly studied the various storage technologies in use today, let us explore basic virtual memory concepts.
4.3. Basic Virtual Memory Concepts
While the technology behind the construction of the various modern-day storage technologies is truly impressive, the average system administrator does not need to be aware of the details. In fact, there is really only one fact that system administrators should always keep in mind:
There is never enough RAM.
While this truism might at first seem humorous, many operating system designers have spent a great deal of time trying to reduce the impact of this very real shortage. They have done so by implementing virtual memory — a way of combining RAM with slower storage to give a system the appearance of having more RAM than is actually installed.
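On a Linux system, a quick look at both halves of that combination is available with the free command; a minimal sketch:

# Installed RAM and configured swap, in megabytes; together these
# roughly bound what the virtual memory subsystem can hand out.
free -m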
4.3.1. Virtual Memory in Simple Terms
Let us start with a hypothetical application. The machine code making up this application is 10000 bytes in size. It also requires another 5000 bytes for data storage and I/O buffers. This means that, to run this application, there must be 15000 bytes of RAM available; even one byte less, and the application would not be able to run.
This 15000 byte requirement is known as the application’s address space. It is the number of unique addresses needed to hold both the application and its data. In the first computers, the amount of available RAM had to be greater than the address space of the largest application to be run; otherwise, the application would fail with an "out of memory" error.
A later approach known as overlaying attempted to alleviate the problem by allowing programmers to dictate which parts of their application needed to be memory-resident at any given time. In this way, code only required once for initialization purposes could be written over (overlayed) with code that would be used later. While overlays did ease memory shortages, it was a very complex and error-prone process. Overlays also failed to address the issue of system-wide memory shortages at runtime. In other words, an overlayed program may require less memory to run than a program that is not overlayed, but if the system still does not have sufficient memory for the overlayed program, the end result is the same — an out of memory error.
With virtual memory, the concept of an application’s address space takes on a different meaning. Rather than concentrating on how much memory an application needs to run, a virtual memory operating system continually attempts to find the answer to the question, "how little memory does an application need to run?"
While it at first appears that our hypothetical application requires the full 15000 bytes to run, think back to our discussion in Section 4.1 Storage Access Patterns — memory access tends to be sequential and localized. Because of this, the amount of memory required to execute the application at any given time is less than 15000 bytes — usually a lot less. Consider the types of memory accesses required to execute a single machine instruction:
• The instruction is read from memory.
• The data required by the instruction is read from memory.
• After the instruction completes, the results of the instruction are written back to memory.
The actual number of bytes necessary for each memory access varies according to the CPU’s architecture, the actual instruction, and the data type. However, even if one instruction required 100 bytes of memory for each type of memory access, the 300 bytes required is still much less than the application’s entire 15000-byte address space. If a way could be found to keep track of an application’s memory requirements as the application runs, it would be possible to keep the application running while using less memory than its address space would otherwise dictate.
But that leaves one question:
If only part of the application is in memory at any given time, where is the rest of it?
4.3.2. Backing Store — the Central Tenet of Virtual Memory
The short answer to this question is that the rest of the application remains on disk. In other words, disk acts as the backing store for RAM; a slower, larger storage medium acting as a "backup" for a much faster, smaller storage medium. This might at first seem to be a very large performance problem in the making — after all, disk drives are so much slower than RAM.
While this is true, it is possible to take advantage of the sequential and localized access behavior of applications and eliminate most of the performance implications of using disk drives as backing store for RAM. This is done by structuring the virtual memory subsystem so that it attempts to ensure that those parts of the application currently needed — or likely to be needed in the near future — are kept in RAM only for as long as they are actually needed.
In many respects this is similar to the relationship between cache and RAM: a small amount of fast storage combined with a large amount of slow storage acts just like a large amount of fast storage.
With this in mind, let us explore the process in more detail.
4.4. Virtual Memory: The Details
First, we must introduce a new concept: virtual address space. Virtual address space is the maximum amount of address space available to an application. The virtual address space varies according to the system’s architecture and operating system. Virtual address space depends on the architecture because it is the architecture that defines how many bits are available for addressing purposes. Virtual address space also depends on the operating system because the manner in which the operating system was implemented may introduce additional limits over and above those imposed by the architecture.
The word "virtual" in virtual address space means this is the total number of uniquely-addressable memory locations available to an application, but not the amount of physical memory either installed in the system, or dedicated to the application at any given time.
In the case of our example application, its virtual address space is 15000 bytes.
To implement virtual memory, it is necessary for the computer system to have special memory management hardware. This hardware is often known as an MMU (Memory Management Unit). Without an MMU, when the CPU accesses RAM, the actual RAM locations never change — memory address 123 is always the same physical location within RAM.
However, with an MMU, memory addresses go through a translation step prior to each memory access. This means that memory address 123 might be directed to physical address 82043 at one time, and physical address 20468 another time. As it turns out, the overhead of individually tracking the virtual to physical translations for billions of bytes of memory would be too great. Instead, the MMU divides RAM into pages — contiguous sections of memory of a set size that are handled by the MMU as single entities.
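The page size is fixed by the architecture and is easy to check; on most x86 Linux systems the command below reports 4096 bytes, though other architectures differ:

# Display the size, in bytes, of the pages the MMU manages.
getconf PAGESIZE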
Keeping track of these pages and their address translations might sound like an unnecessary and confusing additional step. However, it is crucial to implementing virtual memory. For that reason, consider the following point.
Taking our hypothetical application with the 15000 byte virtual address space, assume that the application’s first instruction accesses data stored at address 12374. However, also assume that our computer only has 12288 bytes of physical RAM. What happens when the CPU attempts to access address 12374?
What happens is known as a page fault.
4.4.1. Page Faults
A page fault is the sequence of events occurring when a program attempts to access data (or code) that is in its address space, but is not currently located in the system’s RAM. The operating system must handle page faults by somehow making the accessed data memory resident, allowing the program to continue operation as if the page fault had never occurred.
In the case of our hypothetical application, the CPU first presents the desired address (12374) to the MMU. However, the MMU has no translation for this address. So, it interrupts the CPU and causes software, known as a page fault handler, to be executed. The page fault handler then determines what must be done to resolve this page fault. It can:
Find where the desired page resides on disk and read it in (this is normally the case if the page fault is for a page of code)
Determine that the desired page is already in RAM (but not allocated to the current process) and reconfigure the MMU to point to it
Point to a special page containing only zeros, and allocate a new page for the process only if the process ever attempts to write to the special page (this is called a copy on write page, and is often used for pages containing zero-initialized data)
Get the desired page from somewhere else (which is discussed in more detail later)
While the first three actions are relatively straightforward, the last one is not. For that, we need to cover some additional topics.
4.4.2. The Working Set
The group of physical memory pages currently dedicated to a specific process is known as the working set for that process. The number of pages in the working set can grow and shrink, depending on the overall availability of pages on a system-wide basis.
The working set grows as a process page faults. The working set shrinks as fewer and fewer free pages exist. To keep from running out of memory completely, pages must be removed from processes' working sets and turned into free pages, available for later use. The operating system shrinks processes' working sets by:
Writing modified pages to a dedicated area on a mass storage device (usually known as swapping or paging space)
Marking unmodified pages as being free (there is no need to write these pages out to disk as they have not changed)
To determine appropriate working sets for all processes, the operating system must track usage information for all pages. In this way, the operating system determines which pages are actively being used (and must remain memory resident) and which pages are not (and therefore can be removed from memory). In most cases, some sort of least-recently-used algorithm determines which pages are eligible for removal from process working sets.
4.4.3. Swapping
While swapping (writing modified pages out to the system swap space) is a normal part of a system's operation, it is possible to experience too much swapping. The reason to be wary of excessive swapping is that the following situation can easily occur, over and over again:
Pages from a process are swapped out
The process becomes runnable and attempts to access a swapped page
The page is faulted back into memory (most likely forcing some other processes' pages to be swapped out)
A short time later, the page is swapped out again
If this sequence of events is widespread, it is known as thrashing and is indicative of insufficient RAM for the present workload. Thrashing is extremely detrimental to system performance, as the CPU and I/O loads that can be generated in such a situation quickly outweigh the load imposed by a system’s real work. In extreme cases, the system may actually do no useful work, spending all its resources moving pages to and from memory.
4.5. Virtual Memory Performance Implications
While virtual memory makes it possible for computers to more easily handle larger and more complex applications, as with any powerful tool, it comes at a price. The price in this case is one of performance — a virtual memory operating system has a lot more to do than an operating system incapable of supporting virtual memory. This means that performance is never as good with virtual memory as it is when the same application is 100% memory-resident.
However, this is no reason to throw up one's hands and give up. The benefits of virtual memory are too great for that. And, with a bit of effort, good performance is possible. The key is to examine those system resources impacted by heavy use of the virtual memory subsystem.
4.5.1. Worst Case Performance Scenario
For a moment, take what you have read in this chapter and consider what system resources are used by extremely heavy page fault and swapping activity:
RAM — It stands to reason that available RAM is low (otherwise there would be no need to page fault or swap).
Disk — While disk space might not be impacted, I/O bandwidth (due to heavy paging and swapping) would be.
CPU — The CPU is expending cycles doing the processing required to support memory management and setting up the necessary I/O operations for paging and swapping.
The interrelated nature of these loads makes it easy to understand how resource shortages can lead to severe performance problems.
All it takes is a system with too little RAM, heavy page fault activity, and a system running near its limit in terms of CPU or disk I/O. At this point, the system is thrashing, with poor performance the inevitable result.
4.5.2. Best Case Performance Scenario
At best, the overhead from virtual memory support presents a minimal additional load to a well-configured system:
RAM — Sufficient RAM for all working sets, with enough left over to handle any page faults [2]
Disk — Because of the limited page fault activity, disk I/O bandwidth would be minimally impacted
CPU — The majority of CPU cycles are dedicated to actually running applications, instead of running the operating system's memory management code
From this, the overall point to keep in mind is that the performance impact of virtual memory is minimal when it is used as little as possible. This means the primary determinant of good virtual memory subsystem performance is having enough RAM.
Next in line (but much lower in relative importance) are sufficient disk I/O and CPU capacity. However, keep in mind that these resources only help the system performance degrade more gracefully from heavy faulting and swapping; they do little to help the virtual memory subsystem performance (although they obviously can play a major role in overall system performance).
4.6. Red Hat Enterprise Linux-Specific Information
Due to the inherent complexity of being a demand-paged virtual memory operating system, monitoring memory-related resources under Red Hat Enterprise Linux can be confusing. Therefore, it is best to start with the more straightforward tools, and work from there.
2. A reasonably active system always experiences some level of page fault activity, due to page faults incurred as newly-launched applications are brought into memory.
Using free, it is possible to get a concise (if somewhat simplistic) overview of memory and swap utilization. Here is an example:
total used free shared buffers cached
Mem: 1288720 361448 927272 0 27844 187632
-/+ buffers/cache:     145972    1142748
Swap:                  522104          0     522104
We note that this system has 1.2GB of RAM, of which only about 350MB is actually in use. As expected for a system with this much free RAM, none of the 500MB swap partition is in use.
Contrast that example with this one:
total used free shared buffers cached
Mem: 255088 246604 8484 0 6492 111320
-/+ buffers/cache:     128792     126296
Swap:                  530136     111308     418828
This system has about 256MB of RAM, the majority of which is in use, leaving only about 8MB free. Over 100MB of the 512MB swap partition is in use. Although this system is certainly more limited in terms of memory than the first system, to determine if this memory limitation is causing performance problems we must dig a bit deeper.
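Before moving on, note that free itself can do a bit more than the default display. The following options are standard in the procps version of free shipped with Red Hat Enterprise Linux; consult the free(1) man page to confirm their availability:
free -m      # report in megabytes instead of kilobytes
free -t      # add a line totaling RAM and swap together
free -s 5    # redisplay the statistics every five seconds (Ctrl-C to stop)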
Although more cryptic than free, vmstat has the benefit of displaying more than memory utilization statistics. Here is the output from vmstat 1 10:
procs                     memory      swap          io     system          cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo    in    cs  us  sy  id
 2  0  0 111304   9728   7036 107204   0   0     6    10   120    24  10   2  89
 2  0  0 111304   9728   7036 107204   0   0     0     0   526  1653  96   4   0
 1  0  0 111304   9616   7036 107204   0   0     0     0   552  2219  94   5   1
 1  0  0 111304   9616   7036 107204   0   0     0     0   624   699  98   2   0
 2  0  0 111304   9616   7052 107204   0   0     0    48   603  1466  95   5   0
 3  0  0 111304   9620   7052 107204   0   0     0     0   768   932  90   4   6
 3  0  0 111304   9440   7076 107360  92   0   244     0   820  1230  85   9   6
 2  0  0 111304   9276   7076 107368   0   0     0     0   832  1060  87   6   7
 3  0  0 111304   9624   7092 107372   0   0    16     0   813  1655  93   5   2
 2  0  2 111304   9624   7108 107372   0   0     0   972  1189  1165  68   9  23
During this 10-second sample, the amount of free memory (the free field) varies somewhat, and there is a bit of swap-related I/O (the si and so fields), but overall this system is running well. It is questionable, however, how much additional workload the system could handle, given the current memory utilization.
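Because sustained swap-in and swap-out activity is the clearest early warning of memory pressure, it can be useful to watch just those two fields over time. Here is a minimal sketch (an illustration only; it assumes GNU awk and the column layout shown above, where si and so are the eighth and ninth fields):
#!/bin/bash
# Report a timestamped warning whenever vmstat shows swap activity.
vmstat 5 | awk '$1 ~ /^[0-9]+$/ && ($8 > 0 || $9 > 0) {
    print strftime("%H:%M:%S"), "swap activity: si=" $8, "so=" $9
}'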
When researching memory-related issues, it is often necessary to determine how the Red Hat Enterprise Linux virtual memory subsystem is making use of system memory. By using sar, it is possible to examine this aspect of system performance in much more detail.
By reviewing the sar -r report, we can examine memory and swap utilization more closely:
Linux 2.4.20-1.1931.2.231.2.10.ent (pigdog.example.com) 07/22/2003
12:00:01 AM kbmemfree kbmemused  %memused kbmemshrd kbbuffers  kbcached
12:10:00 AM    240468   1048252     81.34         0    133724    485772
12:20:00 AM    240508   1048212     81.34         0    134172    485600
...
08:40:00 PM    934132    354588     27.51         0     26080    185364
Average:       324346    964374     74.83         0     96072    467559
The kbmemfree and kbmemused fields show the typical free and used memory statistics, with the percentage of memory used displayed in the %memused field. The kbbuffers and kbcached fields show how many kilobytes of memory are allocated to buffers and the system-wide data cache.
The kbmemshrd field is always zero for systems (such as Red Hat Enterprise Linux) using the 2.4 Linux kernel.
The lines for this report have been truncated to fit on the page. Here is the remainder of each line, with the timestamp added to the left to make reading easier:
12:00:01 AM kbswpfree kbswpused  %swpused
12:10:00 AM    522104         0      0.00
12:20:00 AM    522104         0      0.00
...
08:40:00 PM    522104         0      0.00
Average:       522104         0      0.00
For swap utilization, the kbswpfree and kbswpused fields show the amount of free and used swap space, in kilobytes, with the %swpused field showing the swap space used as a percentage.
To learn more about the swapping activity taking place, use the sar -W report. Here is an example:
Linux 2.4.20-1.1931.2.231.2.10.entsmp (raptor.example.com) 07/22/2003
12:00:01 AM  pswpin/s pswpout/s
12:10:01 AM      0.15      2.56
12:20:00 AM      0.00      0.00
...
03:30:01 PM      0.42      2.56
Average:         0.11      0.37
Here we notice that, on average, only about one third as many pages were brought in from swap (pswpin/s) as were written out to swap (pswpout/s).
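Note that sar is not limited to the current day's data. The sysstat package's data collector stores its history under /var/log/sa/, one file per day of the month, and any report can be replayed from those files with the -f option (the file name below is illustrative; the two-digit suffix is the day of the month):
sar -W -f /var/log/sa/sa22    # swapping statistics recorded on the 22nd
sar -r -f /var/log/sa/sa22    # memory and swap utilization for the same day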
To better understand how pages are being used, refer to the sar -B report:
Linux 2.4.20-1.1931.2.231.2.10.entsmp (raptor.example.com) 07/22/2003
12:00:01 AM  pgpgin/s pgpgout/s  activepg  inadtypg  inaclnpg  inatarpg
12:10:00 AM      0.03      8.61    195393     20654     30352     49279
12:20:00 AM      0.01      7.51    195385     20655     30336     49275
...
08:40:00 PM      0.00      7.79     71236      1371      6760     15873
Average:       201.54    201.54    169367     18999     35146     44702
Here we can determine how many blocks per second are paged in from disk (pgpgin/s) and paged out to disk (pgpgout/s). These statistics serve as a barometer of overall virtual memory activity.
However, more knowledge can be gained by examining the other fields in this report. The Red Hat Enterprise Linux kernel marks all pages as either active or inactive. As the names imply, active pages are currently in use in some manner (as process or buffer pages, for example), while inactive pages are not. This example report shows that the list of active pages (the activepg field) averages approximately 660MB [3].
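The conversion from pages to bytes is simple arithmetic. Using the 4096-byte x86 page size noted in footnote 3, the average of 169367 active pages works out to roughly 660MB:
# 169367 pages x 4096 bytes per page, expressed in megabytes
echo $(( 169367 * 4096 / 1024 / 1024 ))MB    # prints 661MB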
The remainder of the fields in this report concentrate on the inactive list — pages that, for one reason or another, have not recently been used. The inadtypg field shows how many inactive pages are dirty (modified) and may need to be written to disk. The inaclnpg field, on the other hand, shows how many inactive pages are clean (unmodified) and do not need to be written to disk.
3. The page size under Red Hat Enterprise Linux on the x86 system used in this example is 4096 bytes. Systems based on other architectures may have different page sizes.
The inatarpg field represents the desired size of the inactive list. This value is calculated by the Linux kernel and is sized such that the inactive list remains large enough to act as a pool for page replacement purposes.
For additional insight into page status (specifically, how often pages change status), use the sar -R report. Here is a sample report:
Linux 2.4.20-1.1931.2.231.2.10.entsmp (raptor.example.com) 07/22/2003
12:00:01 AM   frmpg/s   shmpg/s   bufpg/s   campg/s
12:10:00 AM     -0.10      0.00      0.12     -0.07
12:20:00 AM      0.02      0.00      0.19     -0.07
...
08:50:01 PM     -3.19      0.00      0.46      0.81
Average:         0.01      0.00     -0.00     -0.00
The statistics in this particular sar report are unique, in that they may be positive, negative, or zero. When positive, the value indicates the rate at which pages of this type are increasing. When negative, the value indicates the rate at which pages of this type are decreasing. A value of zero indicates that pages of this type are neither increasing nor decreasing.
In this example, the last sample shows slightly over three pages per second being allocated from the list of free pages (the frmpg/s field) and nearly one page per second added to the page cache (the campg/s field). The list of pages used as buffers (the bufpg/s field) gained approximately one page every two seconds, while the shared memory page list (the shmpg/s field) neither gained nor lost any pages.
4.7. Additional Resources
This section includes various resources that can be used to learn more about resource monitoring and the Red Hat Enterprise Linux-specific subject matter discussed in this chapter.
4.7.1. Installed Documentation
The following resources are installed in the course of a typical Red Hat Enterprise Linux installation and can help you learn more about the subject matter discussed in this chapter.
free(1) man page — Learn how to display free and used memory statistics.
vmstat(8) man page — Learn how to display a concise overview of process, memory, swap, I/O, system, and CPU utilization.
sar(1) man page — Learn how to produce system resource utilization reports.
sa2(8) man page — Learn how to produce daily system resource utilization report files.
4.7.2. Useful Websites
http://people.redhat.com/alikins/system_tuning.html — System Tuning Info for Linux Servers. A stream-of-consciousness approach to performance tuning and resource monitoring for servers.
http://www.linuxjournal.com/article.php?sid=2396 — Performance Monitoring Tools for Linux. This Linux Journal page is geared more toward the administrator interested in writing a customized performance graphing solution. Written several years ago, some of the details may no longer apply, but the overall concept and execution are sound.
4.7.3. Related Books
The following books discuss various issues related to resource monitoring, and are good resources for Red Hat Enterprise Linux system administrators:
The Red Hat Enterprise Linux System Administration Guide; Red Hat, Inc. — Includes a chapter on many of the resource monitoring tools described here.
Linux Performance Tuning and Capacity Planning by Jason R. Fink and Matthew D. Sherer; Sams — Provides more in-depth overviews of the resource monitoring tools presented here and includes others that might be appropriate for more specific resource monitoring needs.
Red Hat Linux Security and Optimization by Mohammed J. Kabir; Red Hat Press — Approximately the first 150 pages of this book discuss performance-related issues. This includes chapters dedicated to performance issues specific to network, Web, email, and file servers.
Linux Administration Handbook by Evi Nemeth, Garth Snyder, and Trent R. Hein; Prentice Hall — Provides a short chapter similar in scope to this book, but includes an interesting section on diagnosing a system that has suddenly slowed down.
Linux System Administration: A User's Guide by Marcel Gagne; Addison Wesley Professional — Contains a small chapter on performance monitoring and tuning.
Essential System Administration (3rd Edition) by Aeleen Frisch; O'Reilly & Associates — The chapter on managing system resources contains good overall information, with some Linux specifics included.
System Performance Tuning (2nd Edition) by Gian-Paolo D. Musumeci and Mike Loukides; O'Reilly & Associates — Although heavily oriented toward more traditional UNIX implementations, there are many Linux-specific references throughout the book.
Chapter 5. Managing Storage
If there is one thing that takes up the majority of a system administrator’s day, it would have to be storage management. It seems that disks are always running out of free space, becoming overloaded with too much I/O activity, or failing unexpectedly. Therefore, it is vital to have a solid working knowledge of disk storage in order to be a successful system administrator.
5.1. An Overview of Storage Hardware
Before managing storage, it is first necessary to understand the hardware on which data is stored. Unless you have at least some knowledge about mass storage device operation, you may find yourself in a situation where you have a storage-related problem, but you lack the background knowledge necessary to interpret what you are seeing. By gaining some insight into how the underlying hardware operates, you should be able to more easily determine whether your computer’s storage subsystem is operating properly.
The vast majority of all mass-storage devices use some sort of rotating media and support the random access of data on that media. This means that the following components are present in some form within nearly every mass storage device:
Disk platters
Data reading/writing device
Access arms
The following sections explore each of these components in more detail.
5.1.1. Disk Platters
The rotating media used by nearly all mass storage devices are in the form of one or more flat, circularly-shaped platters. The platter may be composed of any number of different materials, such as aluminum, glass, and polycarbonate.
The surface of each platter is treated in such a way as to enable data storage. The exact nature of the treatment depends on the data storage technology to be used. The most common data storage technology is based on the property of magnetism; in these cases the platters are covered with a compound that exhibits good magnetic characteristics.
Another common data storage technology is based on optical principles; in these cases, the platters are covered with materials whose optical properties can be modified, thereby allowing data to be stored optically [1].
No matter what data storage technology is in use, the disk platters are spun, causing their entire surface to sweep past another component — the data reading/writing device.
1. Some optical devices — notably CD-ROM drives — use somewhat different approaches to data storage; these differences are pointed out at the appropriate points within the chapter.
5.1.2. Data Reading/Writing Device
The data reading/writing device is the component that takes the bits and bytes on which a computer system operates and turns them into the magnetic or optical variations necessary to interact with the materials coating the surface of the disk platters.
Sometimes the conditions under which these devices must operate are challenging. For instance, in magnetically-based mass storage the read/write devices (known as heads) must be very close to the surface of the platter. However, if the head and the surface of the disk platter were to touch, the resulting friction would do severe damage to both the head and the platter. Therefore, the surfaces of both the head and the platter are carefully polished, and the head uses air pressure developed by the spinning platters to float over the platter’s surface, "flying" at an altitude less than the thickness of a human hair. This is why magnetic disk drives are sensitive to shock, sudden temperature changes, and any airborne contamination.
The challenges faced by optical heads are somewhat different than those for magnetic heads — here, the head assembly must remain at a relatively constant distance from the surface of the platter. Otherwise, the lenses used to focus on the platter do not produce a sufficiently sharp image.
In either case, the heads use a very small amount of the platter’s surface area for data storage. As the platter spins below the heads, this surface area takes the form of a very thin circular line.
If this was how mass storage devices worked, it would mean that over 99% of the platter’s surface area would be wasted. Additional heads could be mounted over the platter, but to fully utilize the platter’s surface area more than a thousand heads would be necessary. What is required is some method of moving the head over the surface of the platter.
5.1.3. Access Arms
By using a head attached to an arm that is capable of sweeping over the platter’s entire surface, it is possible to fully utilize the platter for data storage. However, the access arm must be capable of two things:
Moving very quickly
Moving very precisely
The access arm must move as quickly as possible, because the time spent moving the head from one position to another is wasted time. That is because no data can be read or written until the access arm stops moving [2].
The access arm must be able to move with great precision because, as stated earlier, the surface area used by the heads is very small. Therefore, to efficiently use the platter's storage capacity, it is necessary to move the heads only enough to ensure that any data written in the new position does not overwrite data written at a previous position. This has the effect of conceptually dividing the platter's surface into a thousand or more concentric "rings" or tracks. Movement of the access arm from one track to another is often referred to as seeking, and the time it takes the access arms to move from one track to another is known as the seek time.
Where there are multiple platters (or one platter with both surfaces used for data storage), the arms for each surface are stacked, allowing the same track on each surface to be accessed simultaneously. If the tracks for each surface could be visualized with the access arm stationary over a given track, they would appear to be stacked one on top of another, making up a cylindrical shape; therefore, the set of tracks accessible at a certain position of the access arms is known as a cylinder.
2. In some optical devices (such as CD-ROM drives), the access arm is continually moving, causing the head assembly to describe a spiral path over the surface of the platter. This is a fundamental difference in how the storage medium is used and reflects the CD-ROM's origins as a medium for music storage, where continuous data retrieval is a more common operation than searching for a specific data point.
5.2. Storage Addressing Concepts
The configuration of disk platters, heads, and access arms makes it possible to position the head over any part of any surface of any platter in the mass storage device. However, this is not sufficient; to use this storage capacity, we must have some method of giving addresses to uniform-sized parts of the available storage.
There is one final aspect to this process. Consider all the tracks in the many cylinders present in a typical mass storage device. Because the tracks have varying diameters, their circumference also varies. Therefore, if storage was addressed only to the track level, each track would have different amounts of data — track #0 (being near the center of the platter) might hold 10,827 bytes, while track #1,258 (near the outside edge of the platter) might hold 15,382 bytes.
The solution is to divide each track into multiple sectors or blocks: consistently-sized segments of storage, often 512 bytes each. The result is that each track contains a set number of sectors [3].
A side effect of this is that every track contains unused space — the space between the sectors. Despite the constant number of sectors in each track, the amount of unused space varies — relatively little unused space in the inner tracks, and a great deal more unused space in the outer tracks. In either case, this unused space is wasted, as data cannot be stored on it.
However, the advantage offsetting this wasted space is that effectively addressing the storage on a mass storage device is now possible. In fact, there are two methods of addressing — geometry-based addressing and block-based addressing.
5.2.1. Geometry-Based Addressing
The term geometry-based addressing refers to the fact that mass storage devices actually store data at a specific physical spot on the storage medium. In the case of the devices being described here, this refers to three specific items that define a specific point on the device’s disk platters:
Cylinder
Head
Sector
The following sections show how a hypothetical address can identify a specific physical location on the storage medium.
5.2.1.1. Cylinder
As stated earlier, the cylinder denotes a specific position of the access arm (and therefore, the read/write heads). By specifying a particular cylinder, we are eliminating all other cylinders, reducing our search to only one track for each surface in the mass storage device.
Cylinder Head Sector
1014 X X
Table 5-1. Storage Addressing
In Table 5-1, the first part of a geometry-based address has been filled in. Two more components to this address — the head and sector — remain undefined.
3. While early mass storage devices used the same number of sectors for every track, later devices divided the range of cylinders into different "zones," with each zone having a different number of sectors per track. The reason for this is to take advantage of the outer cylinders, where there is more unused space between sectors.
5.2.1.2. Head
Although in the strictest sense we are selecting a particular disk platter, because each surface has a read/write head dedicated to it, it is easier to think in terms of interacting with a specific head. In fact, the device's underlying electronics actually select one head and — deselecting the rest — interact only with the selected head for the duration of the I/O operation. All other tracks that make up the current cylinder have now been eliminated.
Cylinder Head Sector
1014 2 X
Table 5-2. Storage Addressing
In Table 5-2, the first two parts of a geometry-based address have been filled in. One final component to this address — the sector — remains undefined.
5.2.1.3. Sector
By specifying a particular sector, we have completed the addressing, and have uniquely identified the desired block of data.
Cylinder Head Sector
1014 2 12
Table 5-3. Storage Addressing
In Table 5-3, the complete geometry-based address has been filled in. This address identifies the location of one specific block out of all the other blocks on this device.
5.2.1.4. Problems with Geometry-Based Addressing
While geometry-based addressing is straightforward, there is an area of ambiguity that can cause problems. The ambiguity is in numbering the cylinders, heads, and sectors.
It is true that each geometry-based address uniquely identifies one specific data block, but that only applies if the numbering scheme for the cylinders, heads, and sectors is not changed. If the numbering scheme changes (such as when the hardware/software interacting with the storage device changes), then the mapping between geometry-based addresses and their corresponding data blocks can change, making it impossible to access the desired data.
Because of this potential for ambiguity, a different approach to addressing was developed. The next section describes it in more detail.
5.2.2. Block-Based Addressing
Block-based addressing is much more straightforward than geometry-based addressing. With block-based addressing, every data block is given a unique number. This number is passed from the computer to the mass storage device, which then internally performs the conversion to the geometry-based address required by the device's control circuitry.
Because the conversion to a geometry-based address is always done by the device itself, it is always consistent, eliminating the problem inherent in giving the device geometry-based addresses.
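Conceptually, the conversion the drive performs is simple arithmetic. The sketch below shows the classic mapping from a cylinder/head/sector address to a linear block number; the geometry figures are hypothetical, and modern drives with zoned recording use more elaborate internal mappings:
#!/bin/bash
# Classic CHS-to-block conversion; sectors are conventionally numbered from 1.
HEADS=16       # hypothetical heads per cylinder
SECTORS=63     # hypothetical sectors per track
chs_to_block() {
    local cyl=$1 head=$2 sector=$3
    echo $(( (cyl * HEADS + head) * SECTORS + (sector - 1) ))
}
chs_to_block 1014 2 12    # the address from Table 5-3; prints 1022249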
5.3. Mass Storage Device Interfaces
Every device used in a computer system must have some means of attaching to that computer system. This attachment point is known as an interface. Mass storage devices are no different — they have interfaces too. It is important to know about interfaces for two main reasons:
There are many different (mostly incompatible) interfaces
Different interfaces have different performance and price characteristics
Unfortunately, there is no single universal device interface and not even a single mass storage device interface. Therefore, system administrators must be aware of the interface(s) supported by their organization's systems. Otherwise, there is a real risk of purchasing the wrong hardware when a system upgrade is planned.
Different interfaces have different performance capabilities, making some interfaces more suitable for certain environments than others. For example, interfaces capable of supporting high-speed devices are more suitable for server environments, while slower interfaces would be sufficient for light desktop usage. Such differences in performance also lead to differences in price, meaning that — as always — you get what you pay for. High-performance computing does not come cheaply.
5.3.1. Historical Background
Over the years there have been many different interfaces created for mass storage devices. Some have fallen by the wayside, and some are still in use today. However, the following list is provided to give an idea of the scope of interface development over the past thirty years and to provide perspective on the interfaces in use today.
FD-400
An interface originally designed for the original 8-inch floppy disk drives in the mid-70s. Used a 44-conductor cable with a circuit board edge connector that supplied both power and data.
SA-400
Another floppy disk drive interface (this time originally developed in the late-70s for the then-new 5.25 inch floppy drive). Used a 34-conductor cable with a standard socket connector. A slightly modified version of this interface is still used today for 5.25 inch floppy and 3.5 inch diskette drives.
IPI
Standing for Intelligent Peripheral Interface, this interface was used on the 8 and 14-inch disk drives deployed on minicomputers of the 1970s.
SMD
A successor to IPI, SMD (short for Storage Module Device) was used on 8 and 14-inch minicomputer hard drives in the 70s and 80s.
ST506/412
A hard drive interface dating from the early 80s. Used in many personal computers of the day, this interface used two cables — one 34-conductor and one 20-conductor.
ESDI
Standing for Enhanced Small Device Interface, this interface was considered a successor to ST506/412 with faster transfer rates and larger supported drive sizes. Dating from the mid-80s, ESDI used the same two-cable connection scheme of its predecessor.
There were also proprietary interfaces from the larger computer vendors of the day (IBM and DEC, primarily). The intent behind the creation of these interfaces was to attempt to protect the extremely lucrative peripherals business for their computers. However, due to their proprietary nature, the devices compatible with these interfaces were more expensive than equivalent non-proprietary devices. Because of this, these interfaces failed to achieve any long-term popularity.
While proprietary interfaces have largely disappeared, and the interfaces described at the start of this section no longer have much (if any) market share, it is important to know about these no-longer-used interfaces, as they prove one point — nothing in the computer industry remains constant for long. Therefore, always be on the lookout for new interface technologies; one day you might find that one of them may prove to be a better match for your needs than the more traditional offerings you currently use.
5.3.2. Present-Day Industry-Standard Interfaces
Unlike the proprietary interfaces mentioned in the previous section, some interfaces were more widely adopted, and turned into industry standards. Two interfaces in particular have made this transition and are at the heart of today’s storage industry:
IDE/ATA
SCSI
5.3.2.1. IDE/ATA
IDE stands for Integrated Drive Electronics. This interface originated in the late 80s, and uses a 40-pin connector.
Note
Actually, the proper name for this interface is the "AT Attachment" interface (or ATA), but the term "IDE" (which actually refers to an ATA-compatible mass storage device) is still used to some extent. However, the remainder of this section uses the interface's proper name — ATA.
ATA implements a bus topology, with each bus supporting two mass storage devices. These two devices are known as the master and the slave. These terms are misleading, as they imply some sort of relationship between the devices; that is not the case. The selection of which device is the master and which is the slave is normally made through the use of jumper blocks on each device.
Note
A more recent innovation is the introduction of cable select capabilities to ATA. This innovation requires the use of a special cable, an ATA controller, and mass storage devices that support cable select (normally through a "cable select" jumper setting). When properly configured, cable select eliminates the need to change jumpers when moving devices; instead, the device's position on the ATA cable denotes whether it is master or slave.
A variation of this interface illustrates the unique ways in which technologies can be mixed and also introduces our next industry-standard interface. ATAPI is a variation of the ATA interface and stands for AT Attachment Packet Interface. Used primarily by CD-ROM drives, ATAPI adheres to the electrical and mechanical aspects of the ATA interface but uses the communication protocol from the next interface discussed — SCSI.
5.3.2.2. SCSI
Formally known as the Small Computer System Interface, SCSI as it is known today originated in the early 80s and was declared a standard in 1986. Like ATA, SCSI makes use of a bus topology. However, there the similarities end.
Using a bus topology means that every device on the bus must be uniquely identified somehow. While ATA supports only two different devices for each bus and gives each one a specific name, SCSI does this by assigning each device on a SCSI bus a unique numeric address or SCSI ID. Each device on a SCSI bus must be configured (usually by jumpers or switches [4]) to respond to its SCSI ID.
Before continuing any further in this discussion, it is important to note that the SCSI standard does not represent a single interface, but a family of interfaces. There are several areas in which SCSI varies:
Bus width
Bus speed
Electrical characteristics
The original SCSI standard described a bus topology in which eight lines in the bus were used for data transfer. This meant that the first SCSI devices could transfer data one byte at a time. In later years, the standard was expanded to permit implementations where sixteen lines could be used, doubling the amount of data that devices could transfer. The original "8-bit" SCSI implementations were then referred to as narrow SCSI, while the newer 16-bit implementations were known as wide SCSI.
Originally, the bus speed for SCSI was set to 5MHz, permitting a 5MB/second transfer rate on the original 8-bit SCSI bus. However, subsequent revisions to the standard doubled that speed to 10MHz, resulting in 10MB/second for narrow SCSI and 20MB/second for wide SCSI. As with the bus width, the changes in bus speed received new names, with the 10MHz bus speed being termed fast. Subsequent enhancements pushed bus speeds to ultra (20MHz), fast-40 (40MHz), and fast-80 [5]. Further increases in transfer rates led to several different versions of the ultra160 bus speed.
By combining these terms, various SCSI configurations can be concisely named. For example, "ultra-wide SCSI" refers to a 16-bit SCSI bus running at 20MHz.
The original SCSI standard used single-ended signaling; this is an electrical configuration where only one conductor is used to pass an electrical signal. Later implementations also permitted the use of differential signaling, where two conductors are used to pass a signal. Differential SCSI (which was later renamed to high voltage differential or HVD SCSI) had the benefit of reduced sensitivity to electrical noise and allowed longer cable lengths, but it never became popular in the mainstream computer market. A later implementation, known as low voltage differential (LVD), has finally broken through to the mainstream and is a requirement for the higher bus speeds.
4. Some storage hardware (usually those that incorporate removable drive "carriers") is designed so that the act of plugging a module into place automatically sets the SCSI ID to an appropriate value.
5. Fast-80 is not technically a change in bus speed; instead the 40MHz bus was retained, but data was clocked at both the rising and falling of each clock pulse, effectively doubling the throughput.
The width of a SCSI bus not only dictates the amount of data that can be transferred with each clock cycle, but it also determines how many devices can be connected to a bus. Regular SCSI supports 8 uniquely-addressed devices, while wide SCSI supports 16. In either case, you must make sure that all devices are set to use a unique SCSI ID. Two devices sharing a single ID cause problems that could lead to data corruption.
One other thing to keep in mind is that every device on the bus uses an ID. This includes the SCSI controller. Quite often system administrators forget this and unwittingly set a device to use the same SCSI ID as the bus’s controller. This also means that, in practice, only 7 (or 15, for wide SCSI) devices may be present on a single bus, as each bus must reserve an ID for the controller.
Tip
Most SCSI implementations include some means of scanning the SCSI bus; this is often used to confirm that all the devices are properly configured. If a bus scan returns the same device for every single SCSI ID, that device has been incorrectly set to the same SCSI ID as the SCSI controller. To resolve the problem, reconfigure the device to use a different (and unique) SCSI ID.
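Under Red Hat Enterprise Linux, one quick way to review the devices the kernel found on each SCSI bus is to read /proc/scsi/scsi; every properly configured device should appear exactly once, with its own ID:
cat /proc/scsi/scsi    # lists each device's host adapter, channel, SCSI ID, and LUN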
Because of SCSI’s bus-oriented architecture, it is necessary to properly terminate both ends of the bus. Termination is accomplished by placing a load of the correct electrical impedance on each conductor comprising the SCSI bus. Termination is an electrical requirement; without it, the various signals present on the bus would be reflected off the ends of the bus, garbling all communication.
Many (but not all) SCSI devices come with internal terminators that can be enabled or disabled using jumpers or switches. External terminators are also available.
One last thing to keep in mind about SCSI — it is not just an interface standard for mass storage devices. Many other devices (such as scanners, printers, and communications devices) use SCSI. Although these are much less common than SCSI mass storage devices, they do exist. However, it is likely that, with the advent of USB and IEEE-1394 (often called Firewire), these interfaces will be used more for these types of devices in the future.
Tip
The USB and IEEE-1394 interfaces are also starting to make inroads in the mass storage arena; however, no native USB or IEEE-1394 mass-storage devices currently exist. Instead, the present-day offerings are based on ATA or SCSI devices with external conversion circuitry.
No matter what interface a mass storage device uses, the inner workings of the device have a bearing on its performance. The following section explores this important subject.
5.4. Hard Drive Performance Characteristics
Hard drive performance characteristics have already been introduced in Section 4.2.4 Hard Drives; this section discusses the matter in more depth. This is important for system administrators to understand, because without at least basic knowledge of how hard drives operate, it is possible to unwittingly make changes to your system configuration that could negatively impact its performance.
The time it takes for a hard drive to respond to and complete an I/O request is dependent on two things:
The hard drive’s mechanical and electrical limitations
The I/O load imposed by the system
The following sections explore these aspects of hard drive performance in more depth.
5.4.1. Mechanical/Electrical Limitations
Because hard drives are electro-mechanical devices, they are subject to various limitations on their speed and performance. Every I/O request requires the various components of the drive to work together to satisfy the request. Because each of these components has different performance characteristics, the overall performance of the hard drive is determined by the sum of the performance of the individual components.
However, the electronic components are at least an order of magnitude faster than the mechanical components. Therefore, it is the mechanical components that have the greatest impact on overall hard drive performance.
Tip
The most effective way to improve hard drive performance is to reduce the drive’s mechanical activity as much as possible.
The average access time of a typical hard drive is roughly 8.5 milliseconds. The following sections break this figure down in more detail, showing how each component impacts the hard drive’s overall performance.
5.4.1.1. Command Processing Time
All hard drives produced today have sophisticated embedded computer systems controlling their operation. These computer systems perform the following tasks:
Interacting with the outside world via the hard drive's interface
Controlling the operation of the rest of the hard drive's components, recovering from any error conditions that might arise
Processing the raw data read from and written to the actual storage media
Even though the microprocessors used in hard drives are relatively powerful, the tasks assigned to them take time to perform. On average, this time is in the range of .003 milliseconds.
5.4.1.2. Heads Reading/Writing Data
The hard drive’s read/write heads only work when the disk platters over which they "fly" are spinning. Because it is the movement of the media under the heads that allows the data to be read or written, the time that it takes for media containing the desired sector to pass completely underneath the head is the sole determinant of the head’s contribution to total access time. This averages .0086 milliseconds for a 10,000 RPM drive with 700 sectors per track.
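That figure is easy to verify with a quick calculation (using bc, since shell arithmetic is integer-only): a 10,000 RPM platter completes one revolution in 6 milliseconds, and each of the 700 sectors occupies 1/700th of that time:
echo "scale=4; (60000 / 10000) / 700" | bc    # prints .0085 milliseconds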
5.4.1.3. Rotational Latency
Because a hard drive’s disk platters are continuously spinning, when the I/O request arrives it is highly unlikely that the platter will be at exactly the right point in its rotation necessary to access the desired sector. Therefore, even if the rest of the drive is ready to access that sector, it is necessary for everything to wait while the platter rotates, bringing the desired sector into position under the read/write head.
This is the reason why higher-performance hard drives typically rotate their disk platters at higher speeds. Today, speeds of 15,000 RPM are reserved for the highest-performing drives, while 5,400 RPM is considered adequate only for entry-level drives. This averages approximately 3 milliseconds for a 10,000 RPM drive.
5.4.1.4. Access Arm Movement
If there is one component of a hard drive that can be considered its Achilles' Heel, it is the access arm. The reason for this is that the access arm must move very quickly and accurately over relatively long distances. In addition, the access arm movement is not continuous — it must rapidly accelerate as it approaches the desired cylinder and then just as rapidly decelerate to avoid overshooting. Therefore, the access arm must be strong (to survive the violent forces caused by the need for quick movement) but also light (so that there is less mass to accelerate/decelerate).
Achieving these conflicting goals is difficult, a fact that is shown by how much time the access arm movement takes relative to the other components. Therefore, the movement of the access arm is the primary determinant of a hard drive's overall performance, averaging 5.5 milliseconds.
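Summing the averages quoted in the preceding sections confirms the roughly 8.5 millisecond overall access time cited at the start of Section 5.4.1:
# command processing + sector passing under head + rotational latency + seek
echo "scale=4; .003 + .0086 + 3 + 5.5" | bc    # prints 8.5116 milliseconds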
5.4.2. I/O Loads and Performance
The other thing that controls hard drive performance is the I/O load to which a hard drive is subjected. Some of the specific aspects of the I/O load are:
The amount of reads versus writes
The number of current readers/writers
The locality of reads/writes
These are discussed in more detail in the following sections.
5.4.2.1. Reads Versus Writes
For the average hard drive using magnetic media for data storage, the number of read I/O operations versus the number of write I/O operations is not of much concern, as reading and writing data take the same amount of time [6]. However, other mass storage technologies take different amounts of time to process reads and writes [7].
6. Actually, this is not entirely true. All hard drives include some amount of on-board cache memory that is used to improve read performance. However, any I/O request to read data must eventually be satisfied by physically reading the data from the storage medium. This means that, while cache may alleviate read I/O performance problems, it can never totally eliminate the time required to physically read the data from the storage medium.
7. Some optical disk drives exhibit this behavior, due to the physical constraints of the technologies used to implement optical data storage.
The impact of this is that devices that take longer to process write I/O operations (for example) are able to handle fewer write I/Os than read I/Os. Looked at another way, a write I/O consumes more of the device’s ability to process I/O requests than does a read I/O.
5.4.2.2. Multiple Readers/Writers
A hard drive that processes I/O requests from multiple sources experiences a different load than a hard drive that services I/O requests from only one source. The main reason for this is that multiple I/O requesters have the potential to bring higher I/O loads to bear on a hard drive than a single I/O requester.
This is because the I/O requester must perform some amount of processing before an I/O can take place. After all, the requester must determine the nature of the I/O request before it can be performed. Because the processing necessary to make this determination takes time, there is an upper limit on the I/O load that any one requester can generate — only a faster CPU can raise it. This limitation becomes more pronounced if the requester requires human input before performing an I/O.
However, with multiple requesters, higher I/O loads may be sustained. As long as sufficient CPU power is available to support the processing necessary to generate the I/O requests, adding more I/O requesters increases the resulting I/O load.
However, there is another aspect to this that also has a bearing on the resulting I/O load. This is discussed in the following section.
5.4.2.3. Locality of Reads/Writes
Although not strictly constrained to a multi-requester environment, this aspect of hard drive performance does tend to show itself more in such an environment. The issue is whether the I/O requests being made of a hard drive are for data that is physically close to other data that is also being requested.
The reason why this is important becomes apparent if the electromechanical nature of the hard drive is kept in mind. The slowest component of any hard drive is the access arm. Therefore, if the data being accessed by the incoming I/O requests requires no movement of the access arm, the hard drive is able to service many more I/O requests than if the data being accessed was spread over the entire drive, requiring extensive access arm movement.
This can be illustrated by looking at hard drive performance specifications. These specifications often include adjacent cylinder seek times (where the access arm is moved a small amount — only to the next cylinder), and full-stroke seek times (where the access arm moves from the very first cylinder to the very last one). For example, here are the seek times for a high-performance hard drive:
Adjacent Cylinder Full-Stroke
0.6 8.2
Table 5-4. Adjacent Cylinder and Full-Stroke Seek Times (in Milliseconds)
5.5. Making the Storage Usable
Once a mass storage device is in place, there is little that it can be used for. True, data can be written to it and read back from it, but without any underlying structure, data access is only possible by using sector addresses (either geometry-based or block-based).
What is needed are methods of making the raw storage a hard drive provides more easily usable. The following sections explore some commonly-used techniques for doing just that.
5.5.1. Partitions/Slices
The first thing that often strikes a system administrator is that the size of a hard drive may be much larger than necessary for the task at hand. As a result, many operating systems have the capability of dividing a hard drive’s space into various partitions or slices.
Because they are separate from each other, partitions can have different amounts of space utilized, and that space in no way impacts the space utilized by other partitions. For example, the partition holding the files comprising the operating system is not affected even if the partition holding the users’ files becomes full. The operating system still has free space for its own use.
Although it is somewhat simplistic, you can think of partitions as being similar to individual disk drives. In fact, some operating systems actually refer to partitions as "drives". However, this viewpoint is not entirely accurate; therefore, it is important that we look at partitions more closely.
5.5.1.1. Partition Attributes
Partitions are defined by the following attributes:
Partition geometry
Partition type
Partition type field
These attributes are explored in more detail in the following sections.
5.5.1.1.1. Geometry
A partition’s geometry refers to its physical placement on a disk drive. The geometry can be specified in terms of starting and ending cylinders, heads, and sectors, although most often partitions start and end on cylinder boundaries. A partition’s size is then defined as the amount of storage between the starting and ending cylinders.
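Under Red Hat Enterprise Linux, a drive's geometry and the cylinder boundaries of its partitions can be reviewed with fdisk (the device name below is hypothetical):
fdisk -l /dev/hda    # prints the drive geometry plus each partition's starting
                     # and ending cylinders, size in blocks, and type code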
5.5.1.1.2. Partition Type
The partition type refers to the partition’s relationship with the other partitions on the disk drive. There are three different partition types:
Primary partitions
Extended partitions
Logical partitions
The following sections describe each partition type.
5.5.1.1.2.1. Primary Partitions
Primary partitions are partitions that take up one of the four primary partition slots in the disk drive’s partition table.
5.5.1.1.2.2. Extended Partitions
Extended partitions were developed in response to the need for more than four partitions per disk drive. An extended partition can itself contain multiple partitions, greatly extending the number of partitions possible on a single drive. The introduction of extended partitions was driven by the ever-increasing capacities of new disk drives.
5.5.1.1.2.3. Logical Partitions
Logical partitions are those partitions contained within an extended partition; in terms of use they are no different than a non-extended primary partition.
5.5.1.1.3. Partition Type Field
Each partition has a type field that contains a code indicating the partition’s anticipated usage. The type field may or may not reflect the computer’s operating system. Instead, it may reflect how data is to be stored within the partition. The following section contains more information on this important point.
5.5.2. File Systems
Even with the proper mass storage device, properly configured, and appropriately partitioned, we would still be unable to store and retrieve information easily — we are missing a way of structuring and organizing that information. What we need is a file system.
The concept of a file system is so fundamental to the use of mass storage devices that the average computer user often does not even make the distinction between the two. However, system administrators cannot afford to ignore file systems and their impact on day-to-day work.
A file system is a method of representing data on a mass storage device. File systems usually include the following features:
File-based data storage
Hierarchical directory (sometimes known as "folder") structure
Tracking of file creation, access, and modification times
Some level of control over the type of access allowed for a specific file
Some concept of file ownership
Accounting of space utilized
Not all file systems possess every one of these features. For example, a file system constructed for a single-user operating system could easily use a more simplified method of access control and could conceivably do away with support for file ownership altogether.
One point to keep in mind is that the file system used can have a large impact on the nature of your daily workload. By ensuring that the file system you use in your organization closely matches your organization’s functional requirements, you can ensure that not only is the file system up to the task, but that it is more easily and efficiently maintainable.
With this in mind, the following sections explore these features in more detail.
5.5.2.1. File-Based Storage
While file systems that use the file metaphor for data storage are so nearly universal as to be considered a given, there are still some aspects that should be considered here.
First is to be aware of any restrictions on file names. For instance, what characters are permitted in a file name? What is the maximum file name length? These questions are important, as they dictate those file names that can be used and those that cannot. Older operating systems with more primitive file systems often allowed only alphanumeric characters (and only uppercase at that), and only traditional 8.3 file names (meaning an eight-character file name, followed by a three-character file extension).
5.5.2.2. Hierarchical Directory Structure
While the file systems used in some very old operating systems did not include the concept of directories, all commonly-used file systems today include this feature. Directories are themselves usually implemented as files, meaning that no special utilities are required to maintain them.
Furthermore, because directories are themselves files, and directories contain files, directories can therefore contain other directories, making a multi-level directory hierarchy possible. This is a powerful concept with which all system administrators should be thoroughly familiar. Using multi-level directory hierarchies can make file management much easier for you and for your users.
5.5.2.3. Tracking of File Creation, Access, Modification Times
Most file systems keep track of the time at which a file was created; some also track modification and access times. Over and above the convenience of being able to determine when a given file was created, accessed, or modified, these dates are vital for the proper operation of incremental backups.
More information on how backups make use of these file system features can be found in Section 8.2 Backups.
5.5.2.4. Access Control
Access control is one area where file systems differ dramatically. Some file systems have no clear-cut access control model, while others are much more sophisticated. In general terms, most modern day file systems combine two components into a cohesive access control methodology:
User identification
Permitted action list
User identification means that the file system (and the underlying operating system) must first be capable of uniquely identifying individual users. This makes it possible to have full accountability with respect to any operations on the file system level. Another often-helpful feature is that of user groups — creating ad-hoc collections of users. Groups are most often used by organizations where users may be members of one or more projects. Another feature that some file systems support is the creation of generic identifiers that can be assigned to one or more users.
Next, the file system must be capable of maintaining lists of actions that are permitted (or not permitted) against each file. The most commonly-tracked actions are:
• Reading the file
• Writing the file
• Executing the file
Various file systems may extend the list to include other actions such as deleting, or even the ability to make changes related to a file’s access control.
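On Linux and other UNIX-like systems, for example, these permitted actions take the form of the familiar read, write, and execute permission bits, which can be viewed and changed with standard commands. A brief sketch (the file and group names are hypothetical):

    ls -l report.txt              # show owner, group, and permission bits
    chgrp accounting report.txt   # change the file's group ownership
    chmod g+rw,o-rwx report.txt   # group may read/write; others get no access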
5.5.2.5. Accounting of Space Utilized
One constant in a system administrator’s life is that there is never enough free space, and even if there is, it will not remain free for long. Therefore, a system administrator should at least be able to easily determine the level of free space available for each file system. In addition, file systems with well-defined user identification capabilities often include the ability to display the amount of space a particular user has consumed.
This feature is vital in large multi-user environments, as it is an unfortunate fact of life that the 80/20 rule often applies to disk space — 20 percent of your users will be responsible for consuming 80 percent of your available disk space. By making it easy to determine which users are in that 20 percent, you can more effectively manage your storage-related assets.
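On a Linux system, one quick way to see which users fall into that 20 percent is to summarize the space consumed by each home directory (a conventional /home layout is assumed here):

    # Per-user disk usage in kilobytes, largest first
    du -s /home/* | sort -rn | head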
Taking this a step further, some file systems include the ability to set per-user limits (often known as disk quotas) on the amount of disk space that can be consumed. The specifics vary from file system to file system, but in general each user can be assigned a specific amount of storage. Beyond that, various file systems differ. Some file systems permit the user to exceed their limit one time only, while others implement a "grace period" during which a second, higher limit is applied.
5.5.3. Directory Structure
Many system administrators give little thought to how the storage they make available to users today is actually going to be used tomorrow. However, a bit of thought spent on this matter before handing over the storage to users can save a great deal of unnecessary effort later on.
The main thing that system administrators can do is to use directories and subdirectories to structure the storage available in an understandable way. There are several benefits to this approach:
• More easily understood
• More flexibility in the future
Enforcing some level of structure on your storage makes it more easily understood. For example, consider a large multi-user system. Instead of placing all user directories in one large directory, it might make sense to use subdirectories that mirror your organization’s structure. In this way, people that work in accounting have their directories under a directory named accounting, people that work in engineering would have their directories under engineering, and so on.
The benefit of such an approach is that it is easier, on a day-to-day basis, to keep track of the storage needs (and usage) for each part of your organization. Obtaining a listing of the files used by everyone in human resources is straightforward. Backing up all the files used by the legal department is easy.
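Creating such a structure costs almost nothing; a minimal sketch, assuming hypothetical department and user names:

    mkdir -p /home/accounting /home/engineering /home/legal
    # Each user's directory then goes under their department:
    mkdir /home/engineering/jsmith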
With the appropriate structure, flexibility is increased. To continue using the previous example, assume for a moment that the engineering department is due to take on several large new projects. Because of this, many new engineers are to be hired in the near future. However, there is currently not enough free storage available to support the expected additions to engineering.
However, since every person in engineering has their files stored under the engineering directory, it would be a straightforward process to:
• Procure the additional storage necessary to support engineering
• Back up everything under the engineering directory
• Restore the backup onto the new storage
• Rename the engineering directory on the original storage to something like engineering-archive (before deleting it entirely after running smoothly with the new configuration for a month)
• Make the necessary changes so that all engineering personnel can access their files on the new storage
Of course, such an approach does have its shortcomings. For example, if people frequently move between departments, you must have a way of being informed of such transfers, and you must modify the directory structure appropriately. Otherwise, the structure no longer reflects reality, which makes more work — not less — for you in the long run.
5.5.4. Enabling Storage Access
Once a mass storage device has been properly partitioned, and a file system written to it, the storage is available for general use.
For some operating systems, this is true; as soon as the operating system detects the new mass storage device, it can be formatted by the system administrator and may be accessed immediately with no additional effort.
Other operating systems require an additional step. This step — often referred to as mounting — directs the operating system as to how the storage may be accessed. Mounting storage normally is done via a special utility program or command, and requires that the mass storage device (and possibly the partition as well) be explicitly identified.
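On Linux, for instance, this is done with the mount command, which is given both the device (and partition) to access and the directory — the mount point — through which the storage is to be accessed; the names below are purely illustrative:

    mount /dev/sdb1 /mnt/newdisk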
5.6. Advanced Storage Technologies
Although everything presented in this chapter so far has dealt only with single hard drives directly attached to a system, there are other, more advanced options that you can explore. The following sections describe some of the more common approaches to expanding your mass storage options.
5.6.1. Network-Accessible Storage
Combining network and mass storage technologies can result in a great deal more flexibility for system administrators. There are two benefits that are possible with this type of configuration:
• Consolidation of storage
• Simplified administration
Storage can be consolidated by deploying high-performance servers with high-speed network connectivity, configured with large amounts of fast storage. Given an appropriate configuration, it is possible to provide storage access at speeds comparable to locally-attached storage. Furthermore, the shared nature of such a configuration often makes it possible to reduce costs, as the expenses associated with providing centralized, shared storage can be less than providing the equivalent storage for each and every client. In addition, free space is consolidated, instead of being spread out (and not widely usable) across many clients.
Centralized storage servers can also make many administrative tasks easier. For instance, monitoring free space is much easier when the storage to be monitored exists on a centralized storage server. Backups can be vastly simplified using a centralized storage server. Network-aware backups for multiple clients are possible, but require more work to configure and maintain.
There are a number of different networked storage technologies available; choosing one can be difficult. Nearly every operating system on the market today includes some means of accessing network-accessible storage, but the different technologies are incompatible with each other. What is the best approach to determining which technology to deploy?
The approach that usually provides the best results is to let the built-in capabilities of the client decide the issue. There are a number of reasons for this:
• Minimal client integration issues
• Minimal work on each client system
• Low per-client cost of entry
Keep in mind that any client-related issues are multiplied by the number of clients in your organization. By using the clients’ built-in capabilities, you have no additional software to install on each client (incurring zero additional cost in software procurement). And you have the best chance for good support and integration with the client operating system.
There is a downside, however: the server environment must be up to the task of providing good support for the network-accessible storage technologies required by the clients. In cases where the server and client operating systems are one and the same, there is normally no issue. Otherwise, it is necessary to invest time and effort in making the server "speak" the clients’ language. However, often this trade-off is more than justified.
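As an example of using a client's built-in capabilities, a Linux client can access storage shared by an NFS server with nothing more than a mount command (the server name and export path here are hypothetical):

    mount -t nfs fileserver:/export/home /mnt/home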
5.6.2. RAID-Based Storage
One skill that a system administrator should cultivate is the ability to look at complex system configurations, and observe the different shortcomings inherent in each configuration. While this might, at first glance, seem to be a rather depressing viewpoint to take, it can be a great way to look beyond the shiny new boxes and visualize some future Saturday night with all production down due to a failure that could easily have been avoided with a bit of forethought.
With this in mind, let us use what we now know about disk-based storage and see if we can determine the ways that disk drives can cause problems. First, consider an outright hardware failure:
• A disk drive with four partitions on it dies completely: what happens to the data on those partitions?
It is immediately unavailable (at least until the failing unit can be replaced, and the data restored from a recent backup).
• A disk drive with a single partition on it is operating at the limits of its design due to massive I/O loads: what happens to applications that require access to the data on that partition?
The applications slow down because the disk drive cannot process reads and writes any faster.
• You have a large data file that is slowly growing in size; soon it will be larger than the largest disk drive available for your system. What happens then?
The disk drive fills up, the data file stops growing, and its associated applications stop running.
Just one of these problems could cripple a data center, yet system administrators must face these kinds of issues every day. What can be done?
Fortunately, there is one technology that can address each one of these issues. The name for that technology is RAID.
5.6.2.1. Basic Concepts
RAID is an acronym standing for Redundant Array of Independent Disks [8]. As the name implies, RAID is a way for multiple disk drives to act as if they were a single disk drive.
[8] When early RAID research began, the acronym stood for Redundant Array of Inexpensive Disks, but over time the "standalone" disks that RAID was intended to supplant became cheaper and cheaper, rendering the price comparison meaningless.
RAID techniques were first developed by researchers at the University of California, Berkeley in the mid-1980s. At the time, there was a large gap in price between the high-performance disk drives used on the large computer installations of the day, and the smaller, slower disk drives used by the still-young personal computer industry. RAID was viewed as a method of having several less expensive disk drives fill in for one higher-priced unit.
More importantly, RAID arrays can be constructed in different ways, resulting in different characteristics depending on the final configuration. Let us look at the different configurations (known as RAID levels) in more detail.
5.6.2.1.1. RAID Levels
The Berkeley researchers originally defined five different RAID levels and numbered them "1" through "5." In time, additional RAID levels were defined by other researchers and members of the storage industry. Not all RAID levels were equally useful; some were of interest only for research purposes, and others could not be economically implemented.
In the end, there were three RAID levels that ended up seeing widespread usage:
• Level 0
• Level 1
• Level 5
The following sections discuss each of these levels in more detail.
5.6.2.1.1.1. RAID 0
The name of the disk configuration known as RAID level 0 is a bit misleading, as this is the only RAID level that employs absolutely no redundancy. However, even though RAID 0 has no advantages from a reliability standpoint, it does have other benefits.
A RAID 0 array consists of two or more disk drives. The available storage capacity on each drive is divided into chunks, which represent some multiple of the drives’ native block size. Data written to the array is written, chunk by chunk, to each drive in the array. The chunks can be thought of as forming stripes across each drive in the array; hence the other term for RAID 0: striping.
For example, with a two-drive array and a 4KB chunk size, writing 12KB of data to the array would result in the data being written in three 4KB chunks to the following drives:
• The first 4KB would be written to the first drive, into the first chunk
• The second 4KB would be written to the second drive, into the first chunk
• The last 4KB would be written to the first drive, into the second chunk
Compared to a single disk drive, the advantages to RAID 0 include:
• Larger total size — RAID 0 arrays can be constructed that are larger than a single disk drive, making it easier to store larger data files
• Better read/write performance — The I/O load on a RAID 0 array is spread evenly among all the drives in the array (assuming all the I/O is not concentrated on a single chunk)
• No wasted space — All available storage on all drives in the array is available for data storage
Compared to a single disk drive, RAID 0 has the following disadvantage:
• Less reliability — Every drive in a RAID 0 array must be operative for the array to be available; a single drive failure in an N-drive RAID 0 array results in the removal of 1/Nth of all the data, rendering the array useless
Tip
If you have trouble keeping the different RAID levels straight, just remember that RAID 0 has zero percent redundancy.
5.6.2.1.1.2. RAID 1
RAID 1 uses two (although some implementations support more) identical disk drives. All data is written to both drives, making them mirror images of each other. That is why RAID 1 is often known as mirroring.
Whenever data is written to a RAID 1 array, two physical writes must take place: one to the first drive, and one to the second drive. Reading data, on the other hand, needs to take place only once, and either drive in the array can be used.
Compared to a single disk drive, a RAID 1 array has the following advantages:
• Improved redundancy — Even if one drive in the array were to fail, the data would still be accessible
• Improved read performance — With both drives operational, reads can be evenly split between them, reducing per-drive I/O loads
When compared to a single disk drive, a RAID 1 array has some disadvantages:
• Maximum array size is limited to the largest single drive available
• Reduced write performance — Because both drives must be kept up-to-date, all write I/Os must be performed by both drives, slowing the overall process of writing data to the array
• Reduced cost efficiency — With one entire drive dedicated to redundancy, the cost of a RAID 1 array is at least double that of a single drive
Tip
If you have trouble keeping the different RAID levels straight, just remember that RAID 1 has one hundred percent redundancy.
5.6.2.1.1.3. RAID 5
RAID 5 attempts to combine the benefits of RAID 0 and RAID 1, while minimizing their respective disadvantages.
Like RAID 0, a RAID 5 array consists of multiple disk drives, each divided into chunks. This allows a RAID 5 array to be larger than any single drive. Like a RAID 1 array, a RAID 5 array uses some disk space in a redundant fashion, improving reliability.
However, the way RAID 5 works is unlike either RAID 0 or 1.
A RAID 5 array must consist of at least three identically-sized disk drives (although more drives may be used). Each drive is divided into chunks and data is written to the chunks in order. However, not every chunk is dedicated to data storage as it is in RAID 0. Instead, in an array with n disk drives in it, every nth chunk is dedicated to parity.
Chunks containing parity make it possible to recover data should one of the drives in the array fail. The parity in chunk x is calculated by mathematically combining the data from each chunk x stored on all the other drives in the array. If the data in a chunk is updated, the corresponding parity chunk must be recalculated and updated as well.
This also means that every time data is written to the array, at least two drives are written to: the drive holding the data, and the drive containing the parity chunk.
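In practice, the parity calculation is a bitwise XOR of the corresponding data chunks, which is what makes reconstruction possible. A minimal sketch using shell arithmetic on two single-byte "chunks":

    d1=$(( 0x5A )); d2=$(( 0x3C ))   # data chunks on two drives
    parity=$(( d1 ^ d2 ))            # parity chunk stored on a third drive
    # If the drive holding d1 fails, its contents can be rebuilt
    # from the surviving data and the parity:
    rebuilt=$(( parity ^ d2 ))
    echo $d1 $rebuilt                # prints: 90 90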
One key point to keep in mind is that the parity chunks are not concentrated on any one drive in the array. Instead, they are spread evenly across all the drives. Even though dedicating a specific drive to contain nothing but parity is possible (in fact, this configuration is known as RAID level 4), the constant updating of parity as data is written to the array would mean that the parity drive could become a performance bottleneck. By spreading the parity information evenly throughout the array, this impact is reduced.
However, it is important to keep in mind the impact of parity on the overall storage capacity of the array. Even though the parity information is spread evenly across all the drives in the array, the amount of available storage is reduced by the size of one drive.
Compared to a single drive, a RAID 5 array has the following advantages:
• Improved redundancy — If one drive in the array fails, the parity information can be used to reconstruct the missing data chunks, all while keeping the array available for use [9]
• Improved read performance — Due to the RAID 0-like way data is divided between drives in the array, read I/O activity is spread evenly between all the drives
• Reasonably good cost efficiency — For a RAID 5 array of n drives, only 1/nth of the total available storage is dedicated to redundancy
Compared to a single drive, a RAID 5 array has the following disadvantage:
• Reduced write performance — Because each write to the array results in at least two writes to the physical drives (one write for the data and one for the parity), write performance is worse than that of a single drive [10]
[9] I/O performance is reduced while operating with one drive unavailable, due to the overhead involved in reconstructing the missing data.
[10] There is also an impact from the parity calculations required for each write. However, depending on the specific RAID 5 implementation (specifically, where in the system the parity calculations are performed), this impact can range from sizable to nearly nonexistent.
5.6.2.1.1.4. Nested RAID Levels
As should be obvious from the discussion of the various RAID levels, each level has specific strengths and weaknesses. It was not long after RAID-based storage began to be deployed that people began to wonder whether different RAID levels could somehow be combined, producing arrays with all of the strengths and none of the weaknesses of the original levels.
For example, what if the disk drives in a RAID 0 array were themselves actually RAID 1 arrays? This would give the advantages of RAID 0’s speed, with the reliability of RAID 1.
This is just the kind of thing that can be done. Here are the most commonly-nested RAID levels:
• RAID 1+0
• RAID 5+0
• RAID 5+1
Because nested RAID is used in more specialized environments, we will not go into greater detail here. However, there are two points to keep in mind when thinking about nested RAID:
• Order matters — The order in which RAID levels are nested can have a large impact on reliability. In other words, RAID 1+0 and RAID 0+1 are not the same.
• Costs can be high — If there is any disadvantage common to all nested RAID implementations, it is one of cost; for example, the smallest possible RAID 5+1 array consists of six disk drives (and even more drives are required for larger arrays).
Now that we have explored the concepts behind RAID, let us see how RAID can be implemented.
5.6.2.1.2. RAID Implementations
It is obvious from the previous sections that RAID requires additional "intelligence" over and above the usual disk I/O processing for individual drives. At the very least, the following tasks must be performed:
• Dividing incoming I/O requests among the individual disks in the array
• For RAID 5, calculating parity and writing it to the appropriate drive in the array
• Monitoring the individual disks in the array and taking the appropriate action should one fail
• Controlling the rebuilding of an individual disk in the array, when that disk has been replaced or repaired
• Providing a means to allow administrators to maintain the array (removing and adding drives, initiating and halting rebuilds, etc.)
There are two major methods that may be used to accomplish these tasks. The next two sections describe them in more detail.
5.6.2.1.2.1. Hardware RAID
A hardware RAID implementation usually takes the form of a specialized disk controller card. The card performs all RAID-related functions and directly controls the individual drives in the arrays attached to it. With the proper driver, the arrays managed by a hardware RAID card appear to the host operating system just as if they were regular disk drives.
Most RAID controller cards work with SCSI drives, although there are some ATA-based RAID controllers as well. In any case, the administrative interface is usually implemented in one of three ways:
• Specialized utility programs that run as applications under the host operating system, presenting a software interface to the controller card
• An on-board interface using a serial port that is accessed using a terminal emulator
• A BIOS-like interface that is only accessible during the system’s power-up testing
Some RAID controllers have more than one type of administrative interface available. For obvious reasons, a software interface provides the most flexibility, as it allows administrative functions while the operating system is running. However, if you are booting an operating system from a RAID controller, an interface that does not require a running operating system is a requirement.
Because there are so many different RAID controller cards on the market, it is impossible to go into further detail here. The best course of action is to read the manufacturer’s documentation for more information.
5.6.2.1.2.2. Software RAID
Software RAID is RAID implemented as kernel- or driver-level software for a particular operating system. As such, it provides more flexibility in terms of hardware support — as long as the hardware is supported by the operating system, RAID arrays can be configured and deployed. This can dramatically reduce the cost of deploying RAID by eliminating the need for expensive, specialized RAID hardware.
Often the CPU power available on the host for software RAID parity calculations greatly exceeds the processing power present on a RAID controller card. Therefore, some software RAID implementations actually have the capability for higher performance than hardware RAID implementations.
However, software RAID does have limitations not present in hardware RAID. The most important one to consider is support for booting from a software RAID array. In most cases, only RAID 1 arrays can be used for booting, as the computer’s BIOS is not RAID-aware. Since a single drive from a RAID 1 array is indistinguishable from a non-RAID boot device, the BIOS can successfully start the boot process; the operating system can then change over to software RAID operation once it has gained control of the system.
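Under Red Hat Enterprise Linux, for example, software RAID arrays are created and managed with the mdadm utility (the device names below are illustrative):

    # Build a two-drive RAID 1 (mirrored) array from two partitions
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
    # Review the status of all software RAID arrays
    cat /proc/mdstat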
5.6.3. Logical Volume Management
One other advanced storage technology is that of logical volume management (LVM). LVM makes it possible to treat physical mass storage devices as low-level building blocks on which different storage configurations are built. The exact capabilities vary according to the specific implementation, but can include physical storage grouping, logical volume resizing, and data migration.
5.6.3.1. Physical Storage Grouping
Although the name given to this capability may differ, physical storage grouping is the foundation for all LVM implementations. As the name implies, the physical mass storage devices can be grouped together in such a way as to create one or more logical mass storage devices. The logical mass storage devices (or logical volumes) can be larger in capacity than the capacity of any one of the underlying physical mass storage devices.
For example, given two 100GB drives, a 200GB logical volume can be created. However, a 150GB and a 50GB logical volume could also be created. Any combination of logical volumes equal to or less than the total capacity (200GB in this example) is possible. The choices are limited only by your organization’s needs.
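Under the LVM implementation shipped with Red Hat Enterprise Linux, for instance, the grouping described above is done by initializing the drives as physical volumes, collecting them into a volume group, and carving logical volumes out of the group (device and volume names are illustrative):

    pvcreate /dev/sdb1 /dev/sdc1            # prepare the two 100GB drives
    vgcreate storage /dev/sdb1 /dev/sdc1    # pool them into one 200GB group
    lvcreate -L 150G -n projects storage    # carve out a 150GB logical volume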
This makes it possible for a system administrator to treat all storage as being part of a single pool, available for use in any amount. In addition, drives can be added to the pool at a later time, making it a straightforward process to stay ahead of your users’ demand for storage.
5.6.3.2. Logical Volume Resizing
The feature that most system administrators appreciate about LVM is its ability to easily direct storage where it is needed. In a non-LVM system configuration, running out of space means — at best — moving files from the full device to one with available space. Often it can mean actual reconfiguration of your system’s mass storage devices, a task that would have to take place after normal business hours.
However, LVM makes it possible to easily increase the size of a logical volume. Assume for a moment that our 200GB storage pool was used to create a 150GB logical volume, with the remaining 50GB held in reserve. If the 150GB logical volume became full, LVM makes it possible to increase its size (say, by 10GB) without any physical reconfiguration. Depending on the operating system environment, it may be possible to do this dynamically or it might require a short amount of downtime to actually perform the resizing.
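Under Red Hat Enterprise Linux this is done with lvextend, followed by a file-system-specific step to grow the file system into the new space (the volume name is carried over from the earlier sketch; whether the resize can be done while mounted depends on the file system):

    lvextend -L +10G /dev/storage/projects   # grow the logical volume by 10GB
    resize2fs /dev/storage/projects          # grow an ext2/ext3 file system to match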
5.6.3.3. Data Migration
Most seasoned system administrators would be impressed by LVM capabilities so far, but they would also be asking themselves this question:
What happens if one of the drives making up a logical volume starts to fail?
The good news is that most LVM implementations include the ability to migrate data off of a particular physical drive. For this to work, there must be sufficient reserve capacity left to absorb the loss of the failing drive. Once the migration is complete, the failing drive can then be replaced and added back into the available storage pool.
5.6.3.4. With LVM, Why Use RAID?
Given that LVM has some features similar to RAID (the ability to dynamically replace failing drives, for instance), and some features providing capabilities that cannot be matched by most RAID implementations (such as the ability to dynamically add more storage to a central storage pool), many people wonder whether RAID is no longer important.
Nothing could be further from the truth. RAID and LVM are complementary technologies that can be used together (in a manner similar to nested RAID levels), making it possible to get the best of both worlds.
5.7. Storage Management Day-to-Day
System administrators must pay attention to storage in the course of their day-to-day routine. There are various issues that should be kept in mind:
• Monitoring free space
• Disk quota issues
• File-related issues
• Directory-related issues
• Backup-related issues
• Performance-related issues
• Adding/removing storage
The following sections discuss each of these issues in more detail.
5.7.1. Monitoring Free Space
Making sure there is sufficient free space available should be at the top of every system administrator’s daily task list. The reason why regular, frequent free space checking is so important is because free space is so dynamic; there can be more than enough space one moment, and almost none the next.
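On Linux systems, the df command provides this overview, and a short pipeline can flag file systems nearing capacity (the 90 percent threshold is arbitrary):

    df -hP                                              # free space, human-readable
    df -P | awk '0+$5 >= 90 { print $6, $5, "full" }'   # flag nearly-full file systems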
In general, there are three reasons for insufficient free space:
• Excessive usage by a user
• Excessive usage by an application
• Normal growth in usage
These reasons are explored in more detail in the following sections.
5.7.1.1. Excessive Usage by a User
Different people have different levels of neatness. Some people would be horrified to see a speck of dust on a table, while others would not think twice about having a collection of last year’s pizza boxes stacked by the sofa. It is the same with storage:
• Some people are very frugal in their storage usage and never leave any unneeded files hanging around.
• Some people never seem to find the time to get rid of files that are no longer needed.
Many times, when a user is consuming large amounts of storage, it is the second type of person that is found to be responsible.
5.7.1.1.1. Handling a User’s Excessive Usage
This is one area in which a system administrator needs to summon all the diplomacy and social skills they can muster. Quite often discussions over disk space become emotional, as people may feel that enforcement of disk usage restrictions makes their job more difficult (or impossible), that the restrictions are unreasonably small, or that they just do not have the time to clean up their files.
The best system administrators take many factors into account in such a situation. Are the restrictions equitable and reasonable for the type of work being done by this person? Does the person seem to be using their disk space appropriately? Can you help the person reduce their disk usage in some way (by creating a backup CD-ROM of all emails over one year old, for example)? Your job during the conversation is to attempt to discover whether the person has a legitimate need for the storage, while making sure that someone that has no real need for that much storage cleans up their act.
In any case, the thing to do is to keep the conversation on a professional, factual level. Try to address the user’s issues in a polite manner ("I understand you are very busy, but everyone else in your department has the same responsibility to not waste storage, and their average utilization is less than half of yours.") while moving the conversation toward the matter at hand. Be sure to offer assistance if a lack of knowledge/experience seems to be the problem.
Approaching the situation in a sensitive but firm manner is often better than using your authority as system administrator to force a certain outcome. For example, you might find that sometimes a compromise between you and the user is necessary. This compromise can take one of three forms:
• Provide temporary space
• Make archival backups
• Give up
You might find that the user can reduce their usage if they have some amount of temporary space that they can use without restriction. People that take advantage of this often find that it allows them to work without worrying about space until they get to a logical stopping point, at which time they can perform some housekeeping and determine which files in temporary storage are really needed.
Warning
If you offer this arrangement to a user, do not fall into the trap of allowing this temporary space to become permanent space. Make it very clear that the space being offered is temporary, and that no guarantees can be made as to data retention; no backups of any data in temporary space are ever made.
In fact, many administrators often underscore this fact by automatically deleting any files in temporary storage that are older than a certain age (a week, for example).
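A nightly cron job running a command such as the following accomplishes this (the /scratch path and one-week cutoff are examples):

    # Remove files under /scratch not modified in over seven days
    find /scratch -type f -mtime +7 -exec rm -f {} \;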
Other times, the user may have many files that are so obviously old that it is unlikely they will need continued access to them. Make sure you determine that this is, in fact, the case. Sometimes individual users are responsible for maintaining an archive of old data; in these instances, you should make a point of assisting them in that task by providing multiple backups that are treated no differently from your data center’s archival backups.
However, there are times when the data is of dubious value. In these instances you might find it best to offer to make a special backup for them. You then back up the old data, and give the user the backup media, explaining that they are responsible for its safekeeping, and if they ever need access to any of the data, to ask you (or your organization’s operations staff — whatever is appropriate for your organization) to restore it.
There are a few things to keep in mind so that this does not backfire on you. First and foremost is to not include files that are likely to need restoring; do not select files that are too new. Next, make sure that you are able to perform a restoration if one ever is requested. This means that the backup media should be of a type that you are reasonably sure will be used in your data center for the foreseeable future.
Tip
Your choice of backup media should also take into consideration those technologies that can enable the user to handle data restoration themselves. For example, even though backing up several gigabytes onto CD-R media is more work than issuing a single command and spinning it off to a 20GB tape cartridge, consider that the user will then be able to access the data on CD-R whenever they want — without ever involving you.
5.7.1.2. Excessive Usage by an Application
Sometimes an application is responsible for excessive usage. The reasons for this can vary, but can include:
• Enhancements in the application’s functionality require more storage
• An increase in the number of users using the application
• The application fails to clean up after itself, leaving no-longer-needed temporary files on disk
• The application is broken, and the bug is causing it to use more storage than it should
Your task is to determine which of the reasons from this list apply to your situation. Being aware of the status of the applications used in your data center should help you eliminate several of these reasons, as should your awareness of your users’ processing habits; this should narrow down the field substantially. What remains is often a bit of detective work into where the storage has gone.
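The du command is well suited to this kind of detective work; for example, to find the largest directories under an application's data area (the path is hypothetical):

    du -k /var/app-data | sort -rn | head -10   # ten biggest directories, in KB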
At this point you must then take the appropriate steps, be it the addition of storage to support an increasingly-popular application, contacting the application’s developers to discuss its file handling characteristics, or writing scripts to clean up after the application.
5.7.1.3. Normal Growth in Usage
Most organizations experience some level of growth over the long term. Because of this, it is normal to expect storage utilization to increase at a similar pace. In nearly all circumstances, ongoing monitoring can reveal the average rate of storage utilization at your organization; this rate can then be used to determine the time at which additional storage should be procured before your free space actually runs out.
If you are in the position of unexpectedly running out of free space due to normal growth, you have not been doing your job.
However, sometimes large additional demands on your systems’ storage can come up unexpectedly. Your organization may have merged with another, necessitating rapid changes in the IT infrastructure (and therefore, storage). A new high-priority project may have literally sprung up overnight. Changes to an existing application may have resulted in greatly increased storage needs.
No matter what the reason, there are times when you will be taken by surprise. To plan for these instances, try to configure your storage architecture for maximum flexibility. Keeping spare storage on-hand (if possible) can alleviate the impact of such unplanned events.
5.7.2. Disk Quota Issues
Many times the first thing most people think of when they think about disk quotas is using them to force users to keep their directories clean. While there are sites where this may be the case, it also helps to look at the problem of disk space usage from another perspective. What about applications that, for one reason or another, consume too much disk space? It is not unheard of for applications to fail in ways that cause them to consume all available disk space. In these cases, disk quotas can help limit the damage caused by such errant applications, forcing them to stop before no free space is left on the disk.
The hardest part of implementing and managing disk quotas revolves around the limits themselves. What should they be? A simplistic approach would be to divide the disk space by the number of users and/or groups using it, and use the resulting number as the per-user quota. For example, if the system has a 100GB disk drive and 20 users, each user should be given a disk quota of no more than 5GB. That way, each user would be guaranteed 5GB (although the disk would be 100% full at that point).
For those operating systems that support it, temporary quotas could be set somewhat higher — say 7.5GB, with a permanent quota remaining at 5GB. This would have the benefit of allowing users to permanently consume no more than their percentage of the disk, but still permitting some flexibility when a user reaches (and exceeds) their limit. When using disk quotas in this manner, you are actually over-committing the available disk space. The temporary quota is 7.5GB. If all 20 users exceeded their permanent quota at the same time and attempted to approach their temporary quota, that 100GB disk would actually have to be 150GB to allow everyone to reach their temporary quota at the same time.
However, in practice not everyone exceeds their permanent quota at the same time, making some amount of overcommitment a reasonable approach. Of course, the selection of permanent and temporary quotas is up to the system administrator, as each site and user community is different.
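On Red Hat Enterprise Linux, such limits can be set with the setquota command on a quota-enabled file system. Block counts are in 1KB units, so 5GB is roughly 5242880 blocks and 7.5GB roughly 7864320 (the user name is hypothetical; here the soft limit serves as the permanent quota and the hard limit as the temporary one):

    # soft (permanent) 5GB, hard (temporary) 7.5GB, no limits on file counts
    setquota -u jsmith 5242880 7864320 0 0 /home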
5.7.3. File-Related Issues
System administrators often have to deal with file-related issues. The issues include:
• File Access
• File Sharing
5.7.3.1. File Access
Issues relating to file access typically revolve around one scenario — a user is not able to access a file they feel they should be able to access.
Often this is a case of user #1 wanting to give a copy of a file to user #2. In most organizations, the ability for one user to access another user’s files is strictly curtailed, leading to this problem.
There are three approaches that could conceivably be taken:
• User #1 makes the necessary changes to allow user #2 to access the file wherever it currently exists.
• A file exchange area is created for such purposes; user #1 places a copy of the file there, which can then be copied by user #2.
• User #1 uses email to give user #2 a copy of the file.
There is a problem with the first approach — depending on how access is granted, user #2 may have full access to all of user #1’s files. Worse, it might have been done in such a way as to permit all users in your organization access to user #1’s files. Still worse, this change may not be reversed after user #2 no longer requires access, leaving user #1’s files permanently accessible by others. Unfortunately, when users are in charge of this type of situation, security is rarely their highest priority.
The second approach eliminates the problem of making all of user #1’s files accessible to others. However, once the file is in the file exchange area, the file is readable (and depending on the permissions, even writable) by all other users. This approach also raises the possibility of the file exchange area becoming filled with files, as users often forget to clean up after themselves.
The third approach, while seemingly an awkward solution, may actually be the preferable one in most cases. With the advent of industry-standard email attachment protocols and more intelligent email programs, sending all kinds of files via email is a mostly foolproof operation, requiring no system administrator involvement. Of course, there is the chance that a user will attempt to email a 1GB database file to all 150 people in the finance department, so some amount of user education (and possibly limitations on email attachment size) would be prudent. Still, none of these approaches deals with the situation of two or more users needing ongoing access to a single file. In these cases, other methods are required.
5.7.3.2. File Sharing
When multiple users need to share a single copy of a file, allowing access by making changes to file permissions is not the best approach. It is far preferable to formalize the file’s shared status. There are several reasons for this:
• Files shared out of a user’s directory are vulnerable to disappearing unexpectedly when the user either leaves the organization or does nothing more unusual than rearranging their files.
• Maintaining shared access for more than one or two additional users becomes difficult, leading to the longer-term problem of unnecessary work required whenever the sharing users change responsibilities.
Therefore, the preferred approach is to:
• Have the original user relinquish direct ownership of the file
• Create a group that will own the file
• Place the file in a shared directory that is owned by the group
• Make all users needing access to the file part of the group
Of course, this approach works equally well with multiple files as it does with single files, and can be used to implement shared storage for large, complex projects.
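On Red Hat Enterprise Linux, these steps translate into a handful of commands (the group, directory, and user names are illustrative):

    groupadd finance-project                   # create the owning group
    mkdir /shared/finance-project              # create the shared directory
    chgrp finance-project /shared/finance-project
    chmod 2770 /shared/finance-project         # setgid: new files inherit the group
    gpasswd -a jsmith finance-project          # add each user needing access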
5.7.4. Adding/Removing Storage
Because the need for additional disk space is never-ending, a system administrator often needs to add disk space, while sometimes also removing older, smaller drives. This section provides an overview of the basic process of adding and removing storage.
Note
On many operating systems, mass storage devices are named according to their physical connection to the system. Therefore, adding or removing mass storage devices can result in unexpected changes to device names. When adding or removing storage, always make sure you review (and update, if necessary) all device name references used by your operating system.
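On Red Hat Enterprise Linux, one way to insulate yourself from such name changes is to label each ext2/ext3 file system and refer to the label in /etc/fstab (the label and mount point are illustrative):

    e2label /dev/sdb1 /data
    # /etc/fstab can then name the label instead of the device:
    # LABEL=/data   /data   ext3   defaults   1 2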
5.7.4.1. Adding Storage
The process of adding storage to a computer system is relatively straightforward. Here are the basic steps:
1. Installing the hardware
2. Partitioning
3. Formatting the partition(s)
4. Updating system configuration
5. Modifying backup schedule
The following sections look at each step in more detail.
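As a compact preview, on a Linux system steps 3 and 4 might look like the following for a hypothetical second SCSI drive (partitioning with fdisk is interactive and is not shown):

    mke2fs -j /dev/sdb1     # create an ext3 file system on the new partition
    mkdir /data             # create a mount point
    mount /dev/sdb1 /data   # make the storage accessible
    # ...and add a matching line to /etc/fstab so it is mounted at boot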
5.7.4.1.1. Installing the Hardware
Before anything else can be done, the new disk drive has to be in place and accessible. While there are many different hardware configurations possible, the following sections go through the two most common situations — adding an ATA or SCSI disk drive. Even with other configurations, the basic steps outlined here still apply.
Tip
No matter what storage hardware you use, you should always consider the load a new disk drive adds to your computer’s I/O subsystem. In general, you should try to spread the disk I/O load over all available channels/buses. From a performance standpoint, this is far better than putting all disk drives on one channel and leaving another one empty and idle.
5.7.4.1.1.1. Adding ATA Disk Drives
ATA disk drives are mostly used in desktop and lower-end server systems. Nearly all systems in these classes have built-in ATA controllers with multiple ATA channels — normally two or four.
Each channel can support two devices — one master, and one slave. The two devices are connected to the channel with a single cable. Therefore, the first step is to see which channels have available space for an additional disk drive. One of three situations is possible:
• There is a channel with only one disk drive connected to it
• There is a channel with no disk drive connected to it
• There is no space available
The first situation is usually the easiest, as it is very likely that the cable already in place has an unused connector into which the new disk drive can be plugged. However, if the cable in place only has two connectors (one for the channel and one for the already-installed disk drive), then it is necessary to replace the existing cable with a three-connector model.
Before installing the new disk drive, make sure that the two disk drives sharing the channel are appropriately configured (one as master and one as slave).
The second situation is a bit more difficult, if only for the reason that a cable must be procured so that it can connect a disk drive to the channel. The new disk drive may be configured as master or slave (although traditionally the first disk drive on a channel is normally configured as master).
In the third situation, there is no space left for an additional disk drive. You must then make a decision. Do you:
• Acquire an ATA controller card, and install it
• Replace one of the installed disk drives with the newer, larger one
Adding a controller card entails checking hardware compatibility, physical capacity, and software compatibility. Basically, the card must be compatible with your computer’s bus slots, there must be an open slot for it, and it must be supported by your operating system. Replacing an installed disk drive presents a unique problem: what to do with the data on the disk? There are a few possible approaches:
• Write the data to a backup device and restore it after installing the new disk drive
• Use your network to copy the data to another system with sufficient free space, restoring the data after installing the new disk drive
• Use the space physically occupied by a third disk drive by:
1. Temporarily removing the third disk drive
2. Temporarily installing the new disk drive in its place
3. Copying the data to the new disk drive
4. Removing the old disk drive
5. Replacing it with the new disk drive
6. Reinstalling the temporarily-removed third disk drive
• Temporarily install the original disk drive and the new disk drive in another computer, copy the data to the new disk drive, and then install the new disk drive in the original computer
As you can see, sometimes a bit of effort must be expended to get the data (and the new hardware) where it needs to go.
5.7.4.1.1.2. Adding SCSI Disk Drives
SCSI disk drives normally are used in higher-end workstations and server systems. Unlike ATA-based systems, SCSI systems may or may not have built-in SCSI controllers; some do, while others use a separate SCSI controller card.
The capabilities of SCSI controllers (whether built-in or not) also vary widely. A controller may supply a narrow or wide SCSI bus. The bus speed may be normal, fast, ultra, ultra2, or ultra160.
If these terms are unfamiliar to you (they were discussed briefly in Section 5.3.2.2 SCSI), you must determine the capabilities of your hardware configuration and select an appropriate new disk drive. The best resource for this information would be the documentation for your system and/or SCSI adapter.
You must then determine how many SCSI buses are available on your system, and which ones have available space for a new disk drive. The number of devices supported by a SCSI bus varies according to the bus width:
• Narrow (8-bit) SCSI bus — 7 devices (plus controller)
• Wide (16-bit) SCSI bus — 15 devices (plus controller)
The first step is to see which buses have available space for an additional disk drive. One of three situations is possible:
• There is a bus with less than the maximum number of disk drives connected to it
• There is a bus with no disk drives connected to it
• There is no space available on any bus
The first situation is usually the easiest, as it is likely that the cable in place has an unused connector into which the new disk drive can be plugged. However, if the cable in place does not have an unused connector, it is necessary to replace the existing cable with one that has at least one more connector.
The second situation is a bit more difficult, if only for the reason that a cable must be procured so that it can connect a disk drive to the bus.
If there is no space left for an additional disk drive, you must make a decision. Do you:
• Acquire and install a SCSI controller card
• Replace one of the installed disk drives with the new, larger one
Adding a controller card entails checking hardware compatibility, physical capacity, and software compatibility. Basically, the card must be compatible with your computer’s bus slots, there must be an open slot for it, and it must be supported by your operating system.
Replacing an installed disk drive presents a unique problem: what to do with the data on the disk? There are a few possible approaches:
• Write the data to a backup device, and restore it after installing the new disk drive