Novell®
AUTHORIZED DOCUMENTATION

SUSE® Linux Enterprise Server 11
SLES 11: Storage Administration Guide

February 23, 2010
www.novell.com
Legal Notices
Novell, Inc., makes no representations or warranties with respect to the contents or use of this documentation, and specifically disclaims any express or implied warranties of merchantability or fitness for any particular purpose. Further, Novell, Inc., reserves the right to revise this publication and to make changes to its content, at any time, without obligation to notify any person or entity of such revisions or changes.
Further, Novell, Inc., makes no representations or warranties with respect to any software, and specifically disclaims any express or implied warranties of merchantability or fitness for any particular purpose. Further, Novell, Inc., reserves the right to make changes to any and all parts of Novell software, at any time, without any obligation to notify any person or entity of such changes.
Any products or technical information provided under this Agreement may be subject to U.S. export controls and the trade laws of other countries. You agree to comply with all export control regulations and to obtain any required licenses or classification to export, re-export or import deliverables. You agree not to export or re-export to entities on the current U.S. export exclusion lists or to any embargoed or terrorist countries as specified in the U.S. export laws. You agree to not use deliverables for prohibited nuclear, missile, or chemical biological weaponry end uses. See the
Novell International Trade Services Web page (http://www.novell.com/info/exports/) for more information on
exporting Novell software. Novell assumes no responsibility for your failure to obtain any necessary export approvals.
Copyright © 2009–2010 Novell, Inc. All rights reserved. No part of this publication may be reproduced, photocopied, stored on a retrieval system, or transmitted without the express written consent of the publisher.
Novell, Inc., has intellectual property rights relating to technology embodied in the product that is described in this document. In particular, and without limitation, these intellectual property rights may include one or more of the U.S. patents listed on the Novell Legal Patents Web page (http://www.novell.com/company/legal/patents/) and one or more additional patents or pending patent applications in the U.S. and in other countries.
Novell, Inc. 404 Wyman Street, Suite 500 Waltham, MA 02451 U.S.A. www.novell.com
Online Documentation: To access the latest online documentation for this and other Novell products, see the Novell Documentation Web page (http://www.novell.com/documentation).
Novell Trademarks
For Novell trademarks, see the Novell Trademark and Service Mark list (http://www.novell.com/company/legal/trademarks/tmlist.html).
Third-Party Materials
All third-party trademarks and copyrights are the property of their respective owners.
Contents
About This Guide 11
1 Overview of File Systems in Linux 13
1.1 Terminology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.2 Major File Systems in Linux . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.2.1 Ext2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.2.2 Ext3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.2.3 Oracle Cluster File System 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.2.4 ReiserFS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.2.5 XFS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.3 Other Supported File Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.4 Large File Support in Linux . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.5 Additional Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2 What’s New 21
2.1 EVMS2 Is Deprecated . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.2 Ext3 as the Default File System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.3 JFS File System Is Deprecated . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.4 OCFS2 File System Is in the High Availability Release . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.5 /dev/disk/by-name Is Deprecated. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.6 Device Name Persistence in the /dev/disk/by-id Directory. . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.7 Filters for Multipathed Devices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.8 User-Friendly Names for Multipathed Devices. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.9 Advanced I/O Load-Balancing Options for Multipath . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.10 Location Change for Multipath Tool Callouts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.11 Change from mpath to multipath for mkinitrd -f Option . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3 Planning a Storage Solution 25
3.1 Partitioning Devices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.2 Multipath Support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.3 Software RAID Support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.4 File System Snapshots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.5 Backup and Antivirus Support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.5.1 Open Source Backup. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.5.2 Commercial Backup and Antivirus Support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4 LVM Configuration 27
4.1 Understanding the Logical Volume Manager . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.2 Creating LVM Partitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4.3 Creating Volume Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4.4 Configuring Physical Volumes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
4.5 Configuring Logical Volumes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.6 Direct LVM Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
4.7 Resizing an LVM Partition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
5 Resizing File Systems 35
5.1 Guidelines for Resizing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
5.1.1 File Systems that Support Resizing. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
5.1.2 Increasing the Size of a File System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
5.1.3 Decreasing the Size of a File System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
5.2 Increasing an Ext2 or Ext3 File System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
5.3 Increasing the Size of a Reiser File System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
5.4 Decreasing the Size of an Ext2 or Ext3 File System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
5.5 Decreasing the Size of a Reiser File System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
6 Using UUIDs to Mount Devices 39
6.1 Naming Devices with udev. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
6.2 Understanding UUIDs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
6.2.1 Using UUIDs to Assemble or Activate File System Devices . . . . . . . . . . . . . . . . . . . 40
6.2.2 Finding the UUID for a File System Device . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
6.3 Using UUIDs in the Boot Loader and /etc/fstab File (x86) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
6.4 Using UUIDs in the Boot Loader and /etc/fstab File (IA64) . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
6.5 Additional Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
7 Managing Multipath I/O for Devices 43
7.1 Understanding Multipathing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
7.1.1 What Is Multipathing? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
7.1.2 Benefits of Multipathing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
7.2 Planning for Multipathing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
7.2.1 Guidelines for Multipathing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
7.2.2 Using Multipathed Devices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
7.2.3 Using LVM2 on Multipath Devices. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
7.2.4 Using mdadm with Multipath Devices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
7.2.5 Using --noflush with Multipath Devices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
7.2.6 Partitioning Multipath Devices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
7.2.7 Supported Architectures for Multipath I/O . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
7.2.8 Supported Storage Arrays for Multipathing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
7.3 Multipath Management Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
7.3.1 Device Mapper Multipath Module . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
7.3.2 Multipath I/O Management Tools. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
7.3.3 Using mdadm for Multipathed Devices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
7.3.4 The Linux multipath(8) Command . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
7.4 Configuring the System for Multipathing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
7.4.1 Preparing SAN Devices for Multipathing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
7.4.2 Partitioning Multipathed Devices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
7.4.3 Configuring the Server for Multipathing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
7.4.4 Adding multipathd to the Boot Sequence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
7.4.5 Creating and Configuring the /etc/multipath.conf File . . . . . . . . . . . . . . . . . . . . . . . . 56
7.5 Enabling and Starting Multipath I/O Services. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
7.6 Configuring Path Failover Policies and Priorities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
7.6.1 Configuring the Path Failover Policies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
7.6.2 Configuring Failover Priorities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
7.6.3 Using a Script to Set Path Priorities. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
7.6.4 Configuring ALUA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
7.6.5 Reporting Target Path Groups. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
7.7 Tuning the Failover for Specific Host Bus Adapters. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
7.8 Configuring Multipath I/O for the Root Device . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
7.9 Configuring Multipath I/O for an Existing Software RAID . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
7.10 Scanning for New Devices without Rebooting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
7.11 Scanning for New Partitioned Devices without Rebooting . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
7.12 Viewing Multipath I/O Status . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
7.13 Managing I/O in Error Situations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
7.14 Resolving Stalled I/O . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
7.15 Additional Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
7.16 What’s Next . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
8 Software RAID Configuration 79
8.1 Understanding RAID Levels. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
8.1.1 RAID 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
8.1.2 RAID 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
8.1.3 RAID 2 and RAID 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
8.1.4 RAID 4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
8.1.5 RAID 5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
8.1.6 Nested RAID Levels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
8.2 Soft RAID Configuration with YaST . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
8.3 Troubleshooting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
8.4 For More Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
9 Configuring Software RAID for the Root Partition 83
9.1 Prerequisites for the Software RAID . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
9.2 Enabling iSCSI Initiator Support at Install Time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
9.3 Enabling Multipath I/O Support at Install Time. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
9.4 Creating a Software RAID Device for the Root (/) Partition . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
10 Managing Software RAIDs 6 and 10 with mdadm 89
10.1 Creating a RAID 6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
10.1.1 Understanding RAID 6. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
10.1.2 Creating a RAID 6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
10.2 Creating Nested RAID 10 Devices with mdadm. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
10.2.1 Understanding Nested RAID Devices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
10.2.2 Creating Nested RAID 10 (1+0) with mdadm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
10.2.3 Creating Nested RAID 10 (0+1) with mdadm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
10.3 Creating a Complex RAID 10 with mdadm. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
10.3.1 Understanding the mdadm RAID10. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
10.3.2 Creating a RAID 10 with mdadm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
10.4 Creating a Degraded RAID Array. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
11 Resizing Software RAID Arrays with mdadm 99
11.1 Understanding the Resizing Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
11.1.1 Guidelines for Resizing a Software RAID . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
11.1.2 Overview of Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
11.2 Increasing the Size of a Software RAID. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
11.2.1 Increasing the Size of Component Partitions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
11.2.2 Increasing the Size of the RAID Array . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
11.2.3 Increasing the Size of the File System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
11.3 Decreasing the Size of a Software RAID . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
11.3.1 Decreasing the Size of the File System. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
11.3.2 Decreasing the Size of Component Partitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
11.3.3 Decreasing the Size of the RAID Array . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
12 iSNS for Linux 109
12.1 How iSNS Works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
12.2 Installing iSNS Server for Linux . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
12.3 Configuring iSNS Discovery Domains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
12.3.1 Creating iSNS Discovery Domains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
12.3.2 Creating iSNS Discovery Domain Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
12.3.3 Adding iSCSI Nodes to a Discovery Domain. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
12.3.4 Adding Discovery Domains to a Discovery Domain Set . . . . . . . . . . . . . . . . . . . . . 115
12.4 Starting iSNS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
12.5 Stopping iSNS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
12.6 For More Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
13 Mass Storage over IP Networks: iSCSI 117
13.1 Installing iSCSI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
13.1.1 Installing iSCSI Target Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
13.1.2 Installing the iSCSI Initiator Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
13.2 Setting Up an iSCSI Target . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
13.2.1 Preparing the Storage Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
13.2.2 Creating iSCSI Targets with YaST . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
13.2.3 Configuring an iSCSI Target Manually . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
13.2.4 Configuring Online Targets with ietadm. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
13.3 Configuring iSCSI Initiator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
13.3.1 Using YaST for the iSCSI Initiator Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
13.3.2 Setting Up the iSCSI Initiator Manually . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
13.3.3 The iSCSI Client Databases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
13.3.4 For More Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
14 Volume Snapshots 131
14.1 Understanding Volume Snapshots. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
14.2 Creating Linux Snapshots with LVM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
14.3 Monitoring a Snapshot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
14.4 Deleting Linux Snapshots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
15 Troubleshooting Storage Issues 133
15.1 Is DM-MPIO Available for the Boot Partition? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
A Documentation Updates 135
A.1 February 23, 2010 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
A.1.1 Configuring Software RAID for the Root Partition . . . . . . . . . . . . . . . . . . . . . . . . . . 135
A.1.2 Managing Multipath I/O . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
A.2 January 20, 2010 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
A.2.1 Managing Multipath I/O . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
A.3 December 1, 2009 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
A.3.1 Managing Multipath I/O for Devices. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
A.3.2 Resizing File Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
A.3.3 What’s New . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
A.4 October 20, 2009 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
A.4.1 LVM Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
A.4.2 Managing Multipath I/O for Devices. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
A.4.3 What’s New . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
A.5 August 3, 2009 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
A.5.1 Managing Multipath I/O . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
A.6 June 22, 2009 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
A.6.1 Managing Multipath I/O . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
A.6.2 Managing Software RAIDs 6 and 10 with mdadm . . . . . . . . . . . . . . . . . . . . . . . . . . 139
A.6.3 Mass Storage over IP Networks: iSCSI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
A.7 May 21, 2009 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
A.7.1 Managing Multipath I/O . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
About This Guide
This guide provides information about how to manage storage devices on a SUSE® Linux Enterprise Server 11 server.
Audience
This guide is intended for system administrators.
Feedback
We want to hear your comments and suggestions about this manual and the other documentation included with this product. Please use the User Comments feature at the bottom of each page of the online documentation, or go to www.novell.com/documentation/feedback.html and enter your comments there.
Documentation Updates
For the most recent version of the SUSE Linux Enterprise Server 11 Storage Administration Guide, visit the Novell Documentation Web site for SUSE Linux Enterprise Server 11 (http://www.novell.com/documentation/sles11).
Additional Documentation
For information about partitioning and managing devices, see the SUSE Linux Enterprise Server 11
Installation and Administration Guide (http://www.novell.com/documentation/sles11).
Documentation Conventions
In Novell documentation, a greater-than symbol (>) is used to separate actions within a step and items in a cross-reference path.
A trademark symbol (®, TM, etc.) denotes a Novell trademark. An asterisk (*) denotes a third-party trademark.
1 Overview of File Systems in Linux
SUSE® Linux Enterprise Server ships with a number of different file systems from which to choose, including Ext3, Ext2, ReiserFS, and XFS. Each file system has its own advantages and disadvantages. Professional high-performance setups might require a different choice of file system than a home user’s setup. To meet the requirements of high-performance clustering scenarios, SUSE Linux Enterprise Server includes OCFS2 (Oracle Cluster File System 2) in the High-Availability Storage Infrastructure (HASI) release.
Section 1.1, “Terminology,” on page 13
Section 1.2, “Major File Systems in Linux,” on page 13
Section 1.3, “Other Supported File Systems,” on page 18
Section 1.4, “Large File Support in Linux,” on page 19
Section 1.5, “Additional Information,” on page 19
1.1 Terminology
metadata
A data structure that is internal to the file system. It assures that all of the on-disk data is properly organized and accessible. Essentially, it is “data about the data.” Almost every file system has its own structure of metadata, which is one reason that file systems show different performance characteristics. It is extremely important to maintain metadata intact, because otherwise all data on the file system could become inaccessible.
inode
A data structure on a file system that contains various information about a file, including size, number of links, pointers to the disk blocks where the file contents are actually stored, and date and time of creation, modification, and access.
journal
In the context of a file system, a journal is an on-disk structure containing a type of log in which the file system stores what it is about to change in the file system’s metadata. Journaling greatly reduces the recovery time of a file system because it has no need for the lengthy search process that checks the entire file system at system startup. Instead, only the journal is replayed.
1.2 Major File Systems in Linux
SUSE Linux Enterprise Server offers a variety of file systems from which to choose. This section contains an overview of how these file systems work and which advantages they offer.
It is very important to remember that no file system best suits all kinds of applications. Each file system has its particular strengths and weaknesses, which must be taken into account. In addition, even the most sophisticated file system cannot replace a reasonable backup strategy.
The terms data integrity and data consistency, when used in this section, do not refer to the consistency of the user space data (the data your application writes to its files). Whether this data is consistent must be controlled by the application itself.
IMPORTANT: Unless stated otherwise in this section, all the steps required to set up or change partitions and file systems can be performed by using YaST.
Section 1.2.1, “Ext2,” on page 14
Section 1.2.2, “Ext3,” on page 15
Section 1.2.3, “Oracle Cluster File System 2,” on page 16
Section 1.2.4, “ReiserFS,” on page 16
Section 1.2.5, “XFS,” on page 17
1.2.1 Ext2
The origins of Ext2 go back to the early days of Linux history. Its predecessor, the Extended File System, was implemented in April 1992 and integrated in Linux 0.96c. The Extended File System underwent a number of modifications and, as Ext2, became the most popular Linux file system for years. With the creation of journaling file systems and their short recovery times, Ext2 became less important.
A brief summary of Ext2’s strengths might help explain why it was, and in some areas still is, the favorite Linux file system of many Linux users.
“Solidity and Speed” on page 14
“Easy Upgradability” on page 14
Solidity and Speed
Being quite an “old-timer,” Ext2 underwent many improvements and was heavily tested. This might be the reason why people often refer to it as rock-solid. After a system outage when the file system could not be cleanly unmounted, e2fsck starts to analyze the file system data. Metadata is brought into a consistent state and pending files or data blocks are written to a designated directory (called lost+found). In contrast to journaling file systems, e2fsck analyzes the entire file system and not just the recently modified bits of metadata. This takes significantly longer than checking the log data of a journaling file system. Depending on file system size, this procedure can take half an hour or more. Therefore, it is not desirable to choose Ext2 for any server that needs high availability. However, because Ext2 does not maintain a journal and uses significantly less memory, it is sometimes faster than other file systems.
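For example, a minimal sketch of forcing such a check on an Ext2 partition after an unclean shutdown might look like this; /dev/sda3 is only an assumed device name, and the file system must be unmounted first:

  umount /dev/sda3        # the file system must not be mounted during the check
  e2fsck -f /dev/sda3     # force a full check of the entire file system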
Easy Upgradability
Because Ext3 is based on the Ext2 code and shares its on-disk format as well as its metadata format, upgrades from Ext2 to Ext3 are very easy.
1.2.2 Ext3
Ext3 was designed by Stephen Tweedie. Unlike all other next-generation file systems, Ext3 does not follow a completely new design principle. It is based on Ext2. These two file systems are very closely related to each other. An Ext3 file system can be easily built on top of an Ext2 file system. The most important difference between Ext2 and Ext3 is that Ext3 supports journaling. In summary, Ext3 has three major advantages to offer:
“Easy and Highly Reliable Upgrades from Ext2” on page 15
“Reliability and Performance” on page 15
“Converting an Ext2 File System into Ext3” on page 15
Easy and Highly Reliable Upgrades from Ext2
The code for Ext2 is the strong foundation on which Ext3 could become a highly-acclaimed next­generation file system. Its reliability and solidity are elegantly combined in Ext3 with the advantages of a journaling file system. Unlike transitions to other journaling file systems, such as ReiserFS or XFS, which can be quite tedious (making backups of the entire file system and recreating it from scratch), a transition to Ext3 is a matter of minutes. It is also very safe, because re-creating an entire file system from scratch might not work flawlessly. Considering the number of existing Ext2 systems that await an upgrade to a journaling file system, you can easily see why Ext3 might be of some importance to many system administrators. Downgrading from Ext3 to Ext2 is as easy as the upgrade. Just perform a clean unmount of the Ext3 file system and remount it as an Ext2 file system.
Reliability and Performance
Some other journaling file systems follow the “metadata-only” journaling approach. This means your metadata is always kept in a consistent state, but this cannot be automatically guaranteed for the file system data itself. Ext3 is designed to take care of both metadata and data. The degree of “care” can be customized. Enabling Ext3 in the data=journal mode offers maximum security (data integrity), but can slow down the system because both metadata and data are journaled. A relatively new approach is to use the data=ordered mode, which ensures both data and metadata integrity, but uses journaling only for metadata. The file system driver collects all data blocks that correspond to one metadata update. These data blocks are written to disk before the metadata is updated. As a result, consistency is achieved for metadata and data without sacrificing performance. A third option is data=writeback, which allows data to be written into the main file system after its metadata has been committed to the journal. This option is often considered the best in performance. It can, however, allow old data to reappear in files after crash and recovery while internal file system integrity is maintained. Ext3 uses the data=ordered option as the default.
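As a sketch, assuming a data partition /dev/sda7 mounted at /data (both names are placeholders), the journaling mode is selected with the data= mount option, either on the command line or persistently in /etc/fstab:

  mount -o data=journal /dev/sda7 /data

  # or in /etc/fstab:
  /dev/sda7  /data  ext3  data=journal  1 2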
Converting an Ext2 File System into Ext3
To convert an Ext2 file system to Ext3:
1 Create an Ext3 journal by running tune2fs -j as the root user.
This creates an Ext3 journal with the default parameters.
To specify how large the journal should be and on which device it should reside, run tune2fs -J instead together with the desired journal options size= and device=. More information about the tune2fs program is available in the tune2fs man page.
2 Edit the file /etc/fstab as the root user to change the file system type specified for the corresponding partition from ext2 to ext3, then save the changes.
This ensures that the Ext3 file system is recognized as such. The change takes effect after the next reboot.
3 To boot a root file system that is set up as an Ext3 partition, include the modules ext3 and jbd in the initrd.
3a Edit /etc/sysconfig/kernel as root, adding ext3 and jbd to the INITRD_MODULES variable, then save the changes.
3b Run the mkinitrd command.
This builds a new initrd and prepares it for use.
4 Reboot the system.
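Consolidated as a sketch, the whole conversion for an assumed partition /dev/sda6 that currently holds an Ext2 file system looks like this:

  tune2fs -j /dev/sda6      # step 1: add an Ext3 journal with default parameters
  vi /etc/fstab             # step 2: change the file system type for /dev/sda6 from ext2 to ext3
  # steps 3a and 3b are only needed if /dev/sda6 is the root file system:
  vi /etc/sysconfig/kernel  # step 3a: add ext3 and jbd to the INITRD_MODULES variable
  mkinitrd                  # step 3b: build a new initrd
  reboot                    # step 4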
1.2.3 Oracle Cluster File System 2
OCFS2 is a journaling file system that has been tailor-made for clustering setups. In contrast to a standard single-node file system like Ext3, OCFS2 is capable of managing several nodes. OCFS2 allows you to spread a file system across shared storage, such as a SAN or multipath setup.
Every node in an OCFS2 setup has concurrent read and write access to all data. This requires OCFS2 to be cluster-aware, meaning that OCFS2 must include a means to determine which nodes are in the cluster and whether these nodes are actually alive and available. To compute a cluster’s membership, OCFS2 includes a node manager. To monitor the availability of the nodes in a cluster, OCFS2 includes a simple heartbeat implementation. To avoid problems arising from various nodes directly accessing the file system, OCFS2 also contains a distributed lock manager. Communication between the nodes is handled via a TCP-based messaging system.
Major features and benefits of OCFS2 include:
Metadata caching and journaling
Asynchronous and direct I/O support for database files for improved database performance
Support for multiple block sizes (where each volume can have a different block size) up to 4 KB, for a maximum volume size of 16 TB
Cross-node file data consistency
Support for up to 255 cluster nodes
For more in-depth information about OCFS2, refer to the High Availability Storage Infrastructure Administration Guide.
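As an illustration only (the details belong in the High Availability documentation), creating and mounting an OCFS2 volume from the command line might look like the following. The device name and label are assumptions, and the o2cb cluster stack must already be configured and running on every node:

  mkfs.ocfs2 -b 4K -L shared_vol /dev/sdb1   # format the shared device with a 4 KB block size
  mount -t ocfs2 /dev/sdb1 /mnt/shared       # mount the volume on each cluster node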
1.2.4 ReiserFS
Officially one of the key features of the 2.4 kernel release, ReiserFS has been available as a kernel patch for 2.2.x SUSE kernels since version 6.4. ReiserFS was designed by Hans Reiser and the Namesys development team. It has proven itself to be a powerful alternative to Ext2. Its key assets are better disk space utilization, better disk access performance, faster crash recovery, and reliability through data journaling.
“Better Disk Space Utilization” on page 17
“Better Disk Access Performance” on page 17
“Fast Crash Recovery” on page 17
“Reliability through Data Journaling” on page 17
Better Disk Space Utilization
In ReiserFS, all data is organized in a structure called a B*-balanced tree. The tree structure contributes to better disk space utilization because small files can be stored directly in the B* tree leaf nodes instead of being stored elsewhere and just maintaining a pointer to the actual disk location. In addition to that, storage is not allocated in chunks of 1 or 4 KB, but in portions of the exact size needed. Another benefit lies in the dynamic allocation of inodes. This keeps the file system more flexible than traditional file systems, like Ext2, where the inode density must be specified at file system creation time.
Better Disk Access Performance
For small files, file data and “stat_data” (inode) information are often stored next to each other. They can be read with a single disk I/O operation, meaning that only one access to disk is required to retrieve all the information needed.
Fast Crash Recovery
Using a journal to keep track of recent metadata changes makes a file system check a matter of seconds, even for huge file systems.
Reliability through Data Journaling
ReiserFS also supports data journaling and ordered data modes similar to the concepts outlined in “Ext3” on page 15. The default mode is data=ordered, which ensures both data and metadata integrity, but uses journaling only for metadata.
1.2.5 XFS
Originally intended as the file system for their IRIX OS, SGI started XFS development in the early 1990s. The idea behind XFS was to create a high-performance 64-bit journaling file system to meet extreme computing challenges. XFS is very good at manipulating large files and performs well on high-end hardware. However, even XFS has a drawback. Like ReiserFS, XFS takes great care of metadata integrity, but less care of data integrity.
A quick review of XFS’s key features explains why it might prove to be a strong competitor for other journaling file systems in high-end computing.
“High Scalability through the Use of Allocation Groups” on page 17
“High Performance through Efficient Management of Disk Space” on page 18
“Preallocation to Avoid File System Fragmentation” on page 18
High Scalability through the Use of Allocation Groups
At the creation time of an XFS file system, the block device underlying the file system is divided into eight or more linear regions of equal size. Those are referred to as allocation groups. Each allocation group manages its own inodes and free disk space. Practically, allocation groups can be seen as file systems in a file system. Because allocation groups are rather independent of each other, more than one of them can be addressed by the kernel simultaneously. This feature is the key to XFS’s great scalability. Naturally, the concept of independent allocation groups suits the needs of multiprocessor systems.
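As a sketch with an assumed device name, the number of allocation groups can be set explicitly when the file system is created; by default, mkfs.xfs picks a value based on the size of the device:

  mkfs.xfs -d agcount=8 /dev/sdc1    # create an XFS file system with eight allocation groups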
High Performance through Efficient Management of Disk Space
Free space and inodes are handled by B+ trees inside the allocation groups. The use of B+ trees greatly contributes to XFS’s performance and scalability. XFS uses delayed allocation, which handles allocation by breaking the process into two pieces. A pending transaction is stored in RAM and the appropriate amount of space is reserved. XFS still does not decide where exactly (in file system blocks) the data should be stored. This decision is delayed until the last possible moment. Some short-lived temporary data might never make its way to disk, because it is obsolete by the time XFS decides where actually to save it. In this way, XFS increases write performance and reduces file system fragmentation. Because delayed allocation results in less frequent write events than in other file systems, it is likely that data loss after a crash during a write is more severe.
Preallocation to Avoid File System Fragmentation
Before writing the data to the file system, XFS reserves (preallocates) the free space needed for a file. Thus, file system fragmentation is greatly reduced. Performance is increased because the contents of a file are not distributed all over the file system.
1.3 Other Supported File Systems
Table 1-1 summarizes some other file systems supported by Linux. They are supported mainly to
ensure compatibility and interchange of data with different kinds of media or foreign operating systems.
Table 1-1 File System Types in Linux

File System Type  Description

cramfs    Compressed ROM file system: A compressed read-only file system for ROMs.
hpfs      High Performance File System: The IBM* OS/2* standard file system. Only supported in read-only mode.
iso9660   Standard file system on CD-ROMs.
minix     This file system originated from academic projects on operating systems and was the first file system used in Linux. Today, it is used as a file system for floppy disks.
msdos     fat, the file system originally used by DOS, is today used by various operating systems.
ncpfs     File system for mounting Novell® volumes over networks.
nfs       Network File System: Here, data can be stored on any machine in a network and access might be granted via a network.
smbfs     Server Message Block is used by products such as Windows* to enable file access over a network.
sysv      Used on SCO UNIX*, Xenix, and Coherent (commercial UNIX systems for PCs).
ufs       Used by BSD*, SunOS*, and NextStep*. Only supported in read-only mode.
umsdos    UNIX on MS-DOS*: Applied on top of a standard fat file system, achieves UNIX functionality (permissions, links, long filenames) by creating special files.
vfat      Virtual FAT: Extension of the fat file system (supports long filenames).
ntfs      Windows NT file system; read-only.
1.4 Large File Support in Linux
Originally, Linux supported a maximum file size of 2 GB. This was enough before the explosion of multimedia and as long as no one tried to manipulate huge databases on Linux. As server computing became more and more important, the kernel and the C library were modified to support file sizes larger than 2 GB through a new set of interfaces that applications must use. Today, almost all major file systems offer LFS (Large File Support), allowing you to perform high-end computing. Table 1-2 offers an overview of the current limitations of Linux files and file systems.
Table 1-2 Maximum Sizes of File Systems (On-Disk Format)

File System                                                          File Size (Bytes)   File System Size (Bytes)
Ext2 or Ext3 (1 KB block size)                                       2^34 (16 GB)        2^41 (2 TB)
Ext2 or Ext3 (2 KB block size)                                       2^38 (256 GB)       2^43 (8 TB)
Ext2 or Ext3 (4 KB block size)                                       2^41 (2 TB)         2^44 - 4096 (16 TB - 4096 bytes)
Ext2 or Ext3 (8 KB block size; systems with 8 KB pages, like Alpha)  2^46 (64 TB)        2^45 (32 TB)
ReiserFS v3                                                          2^46 (64 TB)        2^45 (32 TB)
XFS                                                                  2^63 (8 EB)         2^63 (8 EB)
NFSv2 (client side)                                                  2^31 (2 GB)         2^63 (8 EB)
NFSv3 (client side)                                                  2^63 (8 EB)         2^63 (8 EB)

IMPORTANT: Table 1-2 describes the limitations regarding the on-disk format. The 2.6 Linux kernel imposes its own limits on the size of files and file systems handled by it. These are as follows:

File Size
On 32-bit systems, files cannot exceed 2 TB (2^41 bytes).

File System Size
File systems can be up to 2^73 bytes in size. However, this limit is still out of reach for the currently available hardware.
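A quick way to confirm LFS support on a mounted file system, sketched here with an assumed path, is to create a sparse file larger than 2 GB and list it:

  dd if=/dev/zero of=/data/lfs-test bs=1 count=0 seek=4G   # create a 4 GB sparse file
  ls -lh /data/lfs-test                                    # verify that the size is reported correctly
  rm /data/lfs-test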
1.5 Additional Information
The File System Primer (http://wiki.novell.com/index.php/File_System_Primer) on the Novell Web site describes a variety of file systems for Linux. It discusses the file systems, why there are so many, and which ones are the best to use for which workloads and data.
Each of the file system projects described above maintains its own home page on which to find mailing list information, further documentation, and FAQs:
E2fsprogs: Ext2/3/4 Filesystem Utilities (http://e2fsprogs.sourceforge.net/)
Introducing Ext3 (http://www.ibm.com/developerworks/linux/library/l-fs7.html)
ReiserFSprogs (http://chichkin_i.zelnet.ru/namesys/)
XFS: A High-Performance Journaling Filesystem (http://oss.sgi.com/projects/xfs/)
OCFS2 Project (http://oss.oracle.com/projects/ocfs2/)
A comprehensive multipart tutorial about Linux file systems can be found at IBM developerWorks in the Advanced Filesystem Implementor’s Guide (http://www-106.ibm.com/developerworks/library/l-fs.html). An in-depth comparison of file systems (not only Linux file systems) is available from the Wikipedia project in Comparison of File Systems (http://en.wikipedia.org/wiki/Comparison_of_file_systems#Comparison).
2 What’s New
The features and behavior changes noted in this section were made for the SUSE® Linux Enterprise Server 11 release.
Section 2.1, “EVMS2 Is Deprecated,” on page 21
Section 2.2, “Ext3 as the Default File System,” on page 21
Section 2.3, “JFS File System Is Deprecated,” on page 21
Section 2.4, “OCFS2 File System Is in the High Availability Release,” on page 22
Section 2.5, “/dev/disk/by-name Is Deprecated,” on page 22
Section 2.6, “Device Name Persistence in the /dev/disk/by-id Directory,” on page 22
Section 2.7, “Filters for Multipathed Devices,” on page 22
Section 2.8, “User-Friendly Names for Multipathed Devices,” on page 23
Section 2.9, “Advanced I/O Load-Balancing Options for Multipath,” on page 23
Section 2.10, “Location Change for Multipath Tool Callouts,” on page 23
Section 2.11, “Change from mpath to multipath for mkinitrd -f Option,” on page 23
2.1 EVMS2 Is Deprecated
The Enterprise Volume Management Systems (EVMS2) storage management solution is deprecated. All EVMS management modules have been removed from the SUSE Linux Enterprise Server 11 packages. Your EVMS-managed devices should be automatically recognized and managed by Linux Volume Manager 2 (LVM2) when you upgrade your system. For more information, see Evolution of
Storage and Volume Management in SUSE Linux Enterprise (http://www.novell.com/linux/volumemanagement/strategy.html).
For information about managing storage with EVMS2 on SUSE Linux Enterprise Server 10, see the
SUSE Linux Enterprise Server 10 SP3: Storage Administration Guide (http://www.novell.com/documentation/sles10/stor_admin/data/bookinfo.html).
2.2 Ext3 as the Default File System
The Ext3 file system has replaced ReiserFS as the default file system recommended by the YaST tools at installation time and when you create file systems. ReiserFS is still supported. For more information, see File System Future Directions (http://www.novell.com/linux/techspecs.html?tab=0) on the SUSE Linux Enterprise 10 File System Support Web page.
2.3 JFS File System Is Deprecated
The JFS file system is no longer supported. The JFS utilities were removed from the distribution.
2.4 OCFS2 File System Is in the High Availability Release
The OCFS2 file system is fully supported as part of the SUSE Linux Enterprise High Availability Extension.
2.5 /dev/disk/by-name Is Deprecated
The /dev/disk/by-name path is deprecated in SUSE Linux Enterprise Server 11 packages.
2.6 Device Name Persistence in the /dev/disk/by-id Directory

In SUSE Linux Enterprise Server 11, the default multipath setup relies on udev to overwrite the existing symbolic links in the /dev/disk/by-id directory when multipathing is started. Before you start multipathing, the link points to the SCSI device with its scsi-xxx name. When multipathing is running, the symbolic link points to the device using its dm-uuid-xxx name. This ensures that the symbolic links in the /dev/disk/by-id path persistently point to the same device regardless of whether multipath is started or not. The configuration files (such as lvm.conf and md.conf) do not need to be modified because they automatically point to the correct device.

See the following sections for more information about how this behavior change affects other features:
Section 2.7, “Filters for Multipathed Devices,” on page 22
Section 2.8, “User-Friendly Names for Multipathed Devices,” on page 23
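To see what a given link currently resolves to, you can list the directory or follow an individual link. This is only an illustrative sketch; dm-uuid-xxx stands for the actual UUID-based name on your system:

  ls -l /dev/disk/by-id/
  readlink -f /dev/disk/by-id/dm-uuid-xxx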
2.7 Filters for Multipathed Devices
The deprecation of the /dev/disk/by-name directory (as described in Section 2.5, “/dev/disk/by-name Is Deprecated,” on page 22) affects how you set up filters for multipathed devices in the configuration files. If you used the /dev/disk/by-name device name path for the multipath device filters in the /etc/lvm/lvm.conf file, you need to modify the file to use the /dev/disk/by-id path. Consider the following when setting up filters that use the by-id path:

The /dev/disk/by-id/scsi-* device names are persistent and created for exactly this purpose.
Do not use the /dev/disk/by-id/dm-* name in the filters. These are symbolic links to the Device-Mapper devices, and result in reporting duplicate PVs in response to a pvscan command. The names appear to change from LVM-pvuuid to dm-uuid and back to LVM-pvuuid.

For information about setting up filters, see Section 7.2.3, “Using LVM2 on Multipath Devices,” on page 46.
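For example, a sketch of a filter entry in the devices section of /etc/lvm/lvm.conf that accepts only the persistent scsi-* names and rejects everything else might look like the following; adjust the patterns to the devices you actually want LVM to scan:

  filter = [ "a|/dev/disk/by-id/scsi-.*|", "r|.*|" ]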
2.8 User-Friendly Names for Multipathed Devices
A change in how multipathed device names are handled in the /dev/disk/by-id directory (as described in Section 2.6, “Device Name Persistence in the /dev/disk/by-id Directory,” on page 22) affects your setup for user-friendly names because the two names for the device differ. You must modify the configuration files to scan only the device mapper names after multipathing is configured.

For example, you need to modify the lvm.conf file to scan using the multipathed device names by specifying the /dev/disk/by-id/dm-uuid-.*-mpath-.* path instead of /dev/disk/by-id.
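A corresponding sketch of an lvm.conf filter that scans only the device mapper multipath names (again, adapt it to your environment):

  filter = [ "a|/dev/disk/by-id/dm-uuid-.*-mpath-.*|", "r|.*|" ]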
2.9 Advanced I/O Load-Balancing Options for Multipath
The following advanced I/O load-balancing options are available in addition to round-robin for Device Mapper Multipath:
Least-pending
Length-load-balancing
Service-time
For information, see path_selector in “Understanding Priority Groups and Attributes” on page 62.
2.10 Location Change for Multipath Tool Callouts
The mpath_* prio_callouts for the Device Mapper Multipath tool have been moved to shared libraries in /lib/libmultipath/lib*. By using shared libraries, the callouts are loaded into memory on daemon startup. This helps avoid a system deadlock on an all paths down scenario where the programs would have to be loaded from the disk, which might not be available at this point.
2.11 Change from mpath to multipath for mkinitrd -f Option
The option for adding Device Mapper Multipath services to the initrd has changed from -f mpath to -f multipath.

To make a new initrd, the command is now:

mkinitrd -f multipath
3 Planning a Storage Solution
Consider what your storage needs are and how you can effectively manage and divide your storage space to best meet your needs. Use the information in this section to help plan your storage deployment for file systems on your SUSE® Linux Enterprise Server 11 server.

Section 3.1, “Partitioning Devices,” on page 25
Section 3.2, “Multipath Support,” on page 25
Section 3.3, “Software RAID Support,” on page 25
Section 3.4, “File System Snapshots,” on page 25
Section 3.5, “Backup and Antivirus Support,” on page 25
3.1 Partitioning Devices
For information about using the YaST Expert Partitioner, see “Using the YaST Partitioner” in the SUSE Linux Enterprise Server 11 Installation and Administration Guide.
3.2 Multipath Support
Linux supports using multiple I/O paths for fault-tolerant connections between the server and its storage devices. Linux multipath support is disabled by default. If you use a multipath solution that is provided by your storage subsystem vendor, you do not need to configure the Linux multipath separately.
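Multipath configuration is covered in detail in Chapter 7, “Managing Multipath I/O for Devices,” on page 43. As a brief sketch only, enabling the services on SUSE Linux Enterprise Server 11 typically involves commands such as the following (the init script names are assumed from the standard packages):

  insserv boot.multipath multipathd     # add the services to the boot sequence
  /etc/init.d/boot.multipath start      # set up the multipath device maps
  /etc/init.d/multipathd start          # start the multipath daemon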
3.3 Software RAID Support
Linux supports hardware and software RAID devices. If you use hardware RAID devices, software RAID devices are unnecessary. You can use both hardware and software RAID devices on the same server.
To maximize the performance benefits of software RAID devices, partitions used for the RAID should come from different physical devices. For software RAID 1 devices, the mirrored partitions cannot share any disks in common.
3.4 File System Snapshots
Linux supports file system snapshots.
3.5 Backup and Antivirus Support
Section 3.5.1, “Open Source Backup,” on page 26
Section 3.5.2, “Commercial Backup and Antivirus Support,” on page 26
3.5.1 Open Source Backup
Open source tools for backing up data on Linux include tar, cpio, and rsync. See the man pages for these tools for more information.

PAX: POSIX File System Archiver. It supports cpio and tar, which are the two most common forms of standard archive (backup) files. See the man page for more information.

Amanda: The Advanced Maryland Automatic Network Disk Archiver. See www.amanda.org (http://www.amanda.org/).
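As a minimal sketch with assumed paths and host names, a simple backup of /home could combine these tools as follows:

  tar czf /backup/home-$(date +%Y%m%d).tar.gz /home         # compressed archive of /home
  rsync -a --delete /home/ backupserver:/srv/backup/home/   # mirror /home to a remote host over SSH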
3.5.2 Commercial Backup and Antivirus Support
Novell® Open Enterprise Server (OES) 2 Support Pack 1 for Linux is a product that includes SUSE Linux Enterprise Server (SLES) 10 Support Pack 2. Antivirus and backup software vendors who support OES 2 SP1 also support SLES 10 SP2. You can visit the vendor Web sites to find out about their scheduled support of SLES 11.
For a current list of possible backup and antivirus software vendors, see Novell Open Enterprise
Server Partner Support: Backup and Antivirus Support (http://www.novell.com/products/openenterpriseserver/partners_communities.html). This list is updated quarterly.
4 LVM Configuration
This section briefly describes the principles behind Logical Volume Manager (LVM) and its basic features that make it useful under many circumstances. The YaST LVM configuration can be reached from the YaST Expert Partitioner. This partitioning tool enables you to edit and delete existing partitions and create new ones that should be used with LVM.
WARNING: Using LVM might be associated with increased risk, such as data loss. Risks also include application crashes, power failures, and faulty commands. Save your data before implementing LVM or reconfiguring volumes. Never work without a backup.
Section 4.1, “Understanding the Logical Volume Manager,” on page 27
Section 4.2, “Creating LVM Partitions,” on page 28
Section 4.3, “Creating Volume Groups,” on page 28
Section 4.4, “Configuring Physical Volumes,” on page 29
Section 4.5, “Configuring Logical Volumes,” on page 30
Section 4.6, “Direct LVM Management,” on page 32
Section 4.7, “Resizing an LVM Partition,” on page 32
4.1 Understanding the Logical Volume Manager
LVM enables flexible distribution of hard disk space over several file systems. It was developed because the need to change the segmentation of hard disk space might arise only after the initial partitioning has already been done during installation. Because it is difficult to modify partitions on a running system, LVM provides a virtual pool (volume group or VG) of disk space from which logical volumes (LVs) can be created as needed. The operating system accesses these LVs instead of the physical partitions. Volume groups can span more than one disk, so that several disks or parts of them can constitute one single VG. In this way, LVM provides a kind of abstraction from the physical disk space that allows its segmentation to be changed in a much easier and safer way than through physical repartitioning.
Figure 4-1 Physical Partitioning versus LVM
Figure 4-1 compares physical partitioning (left) with LVM segmentation (right). On the left side,
one single disk has been divided into three physical partitions (PART), each with a mount point (MP) assigned so that the operating system can access them. On the right side, two disks have been divided into two and three physical partitions each. Two LVM volume groups (VG 1 and VG 2) have been defined. VG 1 contains two partitions from DISK 1 and one from DISK 2. VG 2 contains the remaining two partitions from DISK 2. In LVM, the physical disk partitions that are incorporated in a volume group are called physical volumes (PVs). Within the volume groups, four logical volumes (LV 1 through LV 4) have been defined, which can be used by the operating system via the associated mount points. The border between different logical volumes need not be aligned with any partition border. See the border between LV 1 and LV 2 in this example.
LVM features:

Several hard disks or partitions can be combined in a large logical volume.

Provided the configuration is suitable, an LV (such as /usr) can be enlarged when the free space is exhausted.

Using LVM, it is possible to add hard disks or LVs in a running system. However, this requires hot-swappable hardware that is capable of such actions.

It is possible to activate a striping mode that distributes the data stream of a logical volume over several physical volumes. If these physical volumes reside on different disks, this can improve the reading and writing performance just like RAID 0.

The snapshot feature enables consistent backups (especially for servers) in the running system.
With these features, using LVM already makes sense for heavily used home PCs or small servers. If you have a growing data stock, as in the case of databases, music archives, or user directories, LVM is especially useful. It allows file systems that are larger than the physical hard disk. Another advantage of LVM is that up to 256 LVs can be added. However, keep in mind that working with LVM is different from working with conventional partitions. Instructions and further information about configuring LVM are available in the official LVM HOWTO (http://tldp.org/HOWTO/LVM-HOWTO/).
Starting from kernel version 2.6, LVM version 2 is available, which is downward-compatible with the previous LVM and enables the continued management of old volume groups. When creating new volume groups, decide whether to use the new format or the downward-compatible version. LVM 2 does not require any kernel patches. It makes use of the device mapper integrated in kernel
2.6. This kernel only supports LVM version 2. Therefore, when talking about LVM, this section always refers to LVM version 2.
4.2 Creating LVM Partitions
You create an LVM partition by first clicking Create > Do not format, then selecting 0x8E Linux LVM as the partition identifier. After creating all the partitions to use with LVM, click LVM to start the LVM configuration.
4.3 Creating Volume Groups
If no volume group exists on your system yet, you are prompted to add one (see Figure 4-2). It is possible to create additional groups with Add group, but usually one single volume group is sufficient. system is suggested as a name for the volume group in which the SUSE® Linux Enterprise Server system files are located. The physical extent size defines the size of a physical block in the volume group. All the disk space in a volume group is handled in chunks of this size. This value is normally set to 4 MB and allows for a maximum size of 256 GB for physical and logical volumes. The physical extent size should only be increased, for example, to 8, 16, or 32 MB, if you need logical volumes larger than 256 GB.
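If you prefer the command line over YaST, the following is a minimal sketch of creating a volume group with a larger physical extent size; the volume group name (system) and the device names are placeholders, and the -s option of vgcreate sets the physical extent size:

pvcreate /dev/sdb1 /dev/sdc1
vgcreate -s 32M system /dev/sdb1 /dev/sdc1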
Figure 4-2 Creating a Volume Group
4.4 Configuring Physical Volumes
After a volume group has been created, the next dialog (see Figure 4-3) lists all partitions with either the “Linux LVM” or “Linux native” type. No swap or DOS partitions are shown. If a partition is already assigned to a volume group, the name of the volume group is shown in the list. Unassigned partitions are indicated with “--”.
If there are several volume groups, set the current volume group in the selection box to the upper left. The buttons in the upper right enable creation of additional volume groups and deletion of existing volume groups. Only volume groups that do not have any partitions assigned can be deleted. All partitions that are assigned to a volume group are also referred to as physical volumes.
Figure 4-3 Physical Volume Setup
To add a previously unassigned partition to the selected volume group, first click the partition, then click Add Volume. At this point, the name of the volume group is entered next to the selected partition. Assign all partitions reserved for LVM to a volume group. Otherwise, the space on the partition remains unused. Before exiting the dialog, every volume group must be assigned at least one physical volume. After assigning all physical volumes, click Next to proceed to the configuration of logical volumes.
4.5 Configuring Logical Volumes
After the volume group has been filled with physical volumes, use the next dialog (see Figure 4-4) to define the logical volumes the operating system should use. Set the current volume group in a selection box to the upper left. Next to it, the free space in the current volume group is shown. The list below contains all logical volumes in that volume group. All normal Linux partitions to which a mount point is assigned, all swap partitions, and all already existing logical volumes are listed here. You can use Add, Edit, and Remove options to manage the logical volumes as needed until all space in the volume group has been exhausted. Assign at least one logical volume to each volume group.
Figure 4-4 Logical Volume Management
To create a new logical volume (see Figure 4-5), click Add and fill out the pop-up that opens. For partitioning, specify the size, file system, and mount point. Normally, a file system, such as Reiserfs or Ext2, is created on a logical volume and is then designated a mount point. The files stored on this logical volume can be found at this mount point on the installed system. Additionally, it is possible to distribute the data stream in the logical volume among several physical volumes (striping). If these physical volumes reside on different hard disks, this generally results in a better reading and writing performance (like RAID 0). However, a striping LV with n stripes can only be created correctly if the hard disk space required by the LV can be distributed evenly to n physical volumes. If, for example, only two physical volumes are available, a logical volume with three stripes is impossible.
WARNING: YaST cannot verify the correctness of your entries concerning striping at this point. Any mistake made here becomes apparent only later, when the LVM is implemented on disk.
Figure 4-5 Creating Logical Volumes
If you have already configured LVM on your system, the existing logical volumes can be specified now. Before continuing, assign appropriate mount points to these logical volumes too. Click Next to return to the YaST Expert Partitioner and finish your work there.
4.6 Direct LVM Management
If you already have configured LVM and only want to change something, there is an alternative method available. In the YaST Control Center, select System > Partitioner. You can manage your LVM system by using the methods already described.
4.7 Resizing an LVM Partition
You can increase the size of a logical volume by using the YaST partitioner, or by using the lvextend command. YaST uses parted(8) to grow the partition.

To extend an LV, there must be enough unallocated space available in the VG.

LVs can be extended or shrunk while they are being used, but this might not be true for a file system on them. Extending or shrinking the LV does not automatically modify the size of file systems in the volume. You must use a different command to grow the file system afterwards. For information about resizing file systems, see Chapter 5, “Resizing File Systems,” on page 35.

Make sure you use the right sequence:

If you extend an LV, you must extend the LV before you attempt to grow the file system.

If you shrink an LV, you must shrink the file system before you attempt to shrink the LV.
To extend the size of a logical volume:
1 Open a terminal console, then log in as the root user.

2 If the logical volume contains file systems that are hosted for a virtual machine (such as a Xen VM), shut down the VM.

3 Dismount the file systems on the logical volume.

4 At the terminal console prompt, enter the following command to grow the size of the logical volume:

lvextend -L +size /dev/vgname/lvname

For size, specify the amount of space you want to add to the logical volume, such as 10GB. Replace /dev/vgname/lvname with the Linux path to the logical volume, such as /dev/vg1/v1. For example:

lvextend -L +10GB /dev/vg1/v1

For example, to extend an LV with a (mounted and active) ReiserFS on it by 10 GB:

lvextend -L +10G /dev/vgname/lvname
resize_reiserfs -s +10GB -f /dev/vgname/lvname

For example, to shrink an LV with a ReiserFS on it by 5 GB:

umount /mountpointofLV
resize_reiserfs -s -5GB /dev/vgname/lvname
lvreduce -L -5G /dev/vgname/lvname
mount /dev/vgname/lvname /mountpointofLV
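Similarly, a minimal sketch for an LV that carries an Ext3 file system; the path /dev/vg1/data is a placeholder, and the file system is grown with resize2fs after the LV has been extended:

lvextend -L +10G /dev/vg1/data
resize2fs /dev/vg1/data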
5 Resizing File Systems
When your data needs grow for a volume, you might need to increase the amount of space allocated to its file system.
Section 5.1, “Guidelines for Resizing,” on page 35
Section 5.2, “Increasing an Ext2 or Ext3 File System,” on page 36
Section 5.3, “Increasing the Size of a Reiser File System,” on page 37
Section 5.4, “Decreasing the Size of an Ext2 or Ext3 File System,” on page 37
Section 5.5, “Decreasing the Size of a Reiser File System,” on page 38
5.1 Guidelines for Resizing
Resizing any partition or file system involves some risks that can potentially result in losing data.
WARNING: To avoid data loss, make sure to back up your data before you begin any resizing task.
Consider the following guidelines when planning to resize a file system.
Section 5.1.1, “File Systems that Support Resizing,” on page 35
Section 5.1.2, “Increasing the Size of a File System,” on page 35
Section 5.1.3, “Decreasing the Size of a File System,” on page 36
5.1.1 File Systems that Support Resizing
The file system must support resizing in order to take advantage of increases in available space for the volume. In SUSE® Linux Enterprise Server 11, file system resizing utilities are available for the Ext2, Ext3, and ReiserFS file systems. The utilities support increasing and decreasing the size as follows:

Table 5-1 File System Support for Resizing

File System Utility Increase Size (Grow) Decrease Size (Shrink)
Ext2 resize2fs Yes, offline only Yes, offline only
Ext3 resize2fs Yes, online or offline Yes, online or offline
ReiserFS resize_reiserfs Yes, online or offline Yes, offline only
5.1.2 Increasing the Size of a File System
You can grow a file system to the maximum space available on the device, or specify an exact size. Make sure to grow the size of the device or logical volume before you attempt to increase the size of the file system.
When specifying an exact size for the file system, make sure the new size satisfies the following conditions:
The new size must be greater than the size of the existing data; otherwise, data loss occurs.
The new size must be equal to or less than the current device size because the file system size
cannot extend beyond the space available.
5.1.3 Decreasing the Size of a File System
When decreasing the size of the file system on a device, make sure the new size satisfies the following conditions:
The new size must be greater than the size of the existing data; otherwise, data loss occurs.
The new size must be equal to or less than the current device size because the file system size
cannot extend beyond the space available.
If you plan to also decrease the size of the logical volume that holds the file system, make sure to decrease the size of the file system before you attempt to decrease the size of the device or logical volume.
5.2 Increasing an Ext2 or Ext3 File System
Ext2 and Ext3 file systems can be resized when mounted or unmounted with the resize2fs command.

1 Open a terminal console, then log in as the root user or equivalent.

2 Increase the size of the file system using one of the following methods:

To extend the file system size to the maximum available size of the device called /dev/sda1, enter

resize2fs /dev/sda1

If a size parameter is not specified, the size defaults to the size of the partition.

To extend the file system to a specific size, enter

resize2fs /dev/sda1 size

The size parameter specifies the requested new size of the file system. If no units are specified, the unit of the size parameter is the block size of the file system. Optionally, the size parameter can be suffixed by one of the following unit designators: s for 512-byte sectors; K for kilobytes (1 kilobyte is 1024 bytes); M for megabytes; or G for gigabytes.

Wait until the resizing is completed before continuing.

3 If the file system is not mounted, mount it now.

For example, to mount an Ext2 file system for a device named /dev/sda1 at mount point /home, enter

mount -t ext2 /dev/sda1 /home

4 Check the effect of the resize on the mounted file system by entering

df -h

The Disk Free (df) command shows the total size of the disk, the number of blocks used, and the number of blocks available on the file system. The -h option prints sizes in human-readable format, such as 1K, 234M, or 2G.
5.3 Increasing the Size of a Reiser File System
A ReiserFS file system can be increased in size while mounted or unmounted.
1 Open a terminal console, then log in as the root user or equivalent.

2 Increase the size of the file system on the device called /dev/sda2, using one of the following methods:

To extend the file system size to the maximum available size of the device, enter

resize_reiserfs /dev/sda2

When no size is specified, this increases the volume to the full size of the partition.

To extend the file system to a specific size, enter

resize_reiserfs -s size /dev/sda2

Replace size with the desired size in bytes. You can also specify units on the value, such as 50000K (kilobytes), 250M (megabytes), or 2G (gigabytes). Alternatively, you can specify an increase to the current size by prefixing the value with a plus (+) sign. For example, the following command increases the size of the file system on /dev/sda2 by 500 MB:

resize_reiserfs -s +500M /dev/sda2

Wait until the resizing is completed before continuing.

3 If the file system is not mounted, mount it now.

For example, to mount a ReiserFS file system for device /dev/sda2 at mount point /home, enter

mount -t reiserfs /dev/sda2 /home

4 Check the effect of the resize on the mounted file system by entering

df -h

The Disk Free (df) command shows the total size of the disk, the number of blocks used, and the number of blocks available on the file system. The -h option prints sizes in human-readable format, such as 1K, 234M, or 2G.
5.4 Decreasing the Size of an Ext2 or Ext3 File System
The Ext2 and Ext3 file systems can be resized when mounted or unmounted.
1 Open a terminal console, then log in as the root user or equivalent.

2 Decrease the size of the file system on the device, such as /dev/sda1, by entering

resize2fs /dev/sda1 <size>

Replace size with an integer value in kilobytes for the desired size. (A kilobyte is 1024 bytes.)

Wait until the resizing is completed before continuing.

3 If the file system is not mounted, mount it now. For example, to mount an Ext2 file system for a device named /dev/sda1 at mount point /home, enter

mount -t ext2 /dev/sda1 /home

4 Check the effect of the resize on the mounted file system by entering
df -h
The Disk Free (df) command shows the total size of the disk, the number of blocks used, and the number of blocks available on the file system. The -h option prints sizes in human-readable format, such as 1K, 234M, or 2G.
5.5 Decreasing the Size of a Reiser File System
Reiser file systems can be reduced in size only if the volume is unmounted.
1 Open a terminal console, then log in as the root user or equivalent.
2 Unmount the device by entering
umount /mnt/point
If the partition you are attempting to decrease in size contains system files (such as the root (/) volume), unmounting is possible only when booting from a bootable CD or floppy.
3 Decrease the size of the file system on the device, such as /dev/sda2, by entering

resize_reiserfs -s size /dev/sda2

Replace size with the desired size in bytes. You can also specify units on the value, such as 50000K (kilobytes), 250M (megabytes), or 2G (gigabytes). Alternatively, you can specify a decrease to the current size by prefixing the value with a minus (-) sign. For example, the following command reduces the size of the file system on /dev/sda2 by 500 MB:

resize_reiserfs -s -500M /dev/sda2
Wait until the resizing is completed before continuing.
4 Mount the file system by entering
mount -t reiserfs /dev/sda2 /mnt/point
5 Check the effect of the resize on the mounted file system by entering
df -h
The Disk Free (df) command shows the total size of the disk, the number of blocks used, and the number of blocks available on the file system. The -h option prints sizes in human-readable format, such as 1K, 234M, or 2G.
6 Using UUIDs to Mount Devices
This section describes the optional use of UUIDs instead of device names to identify file system devices in the boot loader file and the /etc/fstab file.

Section 6.1, “Naming Devices with udev,” on page 39
Section 6.2, “Understanding UUIDs,” on page 39
Section 6.3, “Using UUIDs in the Boot Loader and /etc/fstab File (x86),” on page 40
Section 6.4, “Using UUIDs in the Boot Loader and /etc/fstab File (IA64),” on page 41
Section 6.5, “Additional Information,” on page 42
6.1 Naming Devices with udev
In the Linux 2.6 and later kernel, udev provides a userspace solution for the dynamic /dev directory, with persistent device naming. As part of the hotplug system, udev is executed if a device is added to or removed from the system.

A list of rules is used to match against specific device attributes. The udev rules infrastructure (defined in the /etc/udev/rules.d directory) provides stable names for all disk devices, regardless of their order of recognition or the connection used for the device. The udev tools examine every appropriate block device that the kernel creates to apply naming rules based on certain buses, drive types, or file systems. For information about how to define your own rules for udev, see Writing udev Rules (http://reactivated.net/writing_udev_rules.html).

Along with the dynamic kernel-provided device node name, udev maintains classes of persistent symbolic links pointing to the device in the /dev/disk directory, which is further categorized by the by-id, by-label, by-path, and by-uuid subdirectories.

NOTE: Other programs besides udev, such as LVM or md, might also generate UUIDs, but they are not listed in /dev/disk.
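For example, you can inspect these persistent names on a running system by listing the subdirectories; the output differs from system to system:

ls -l /dev/disk/by-id/
ls -l /dev/disk/by-uuid/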
6.2 Understanding UUIDs
A UUID (Universally Unique Identifier) is a 128-bit number for a file system that is unique on both the local system and across other systems. It is randomly generated with system hardware information and time stamps as part of its seed. UUIDs are commonly used to uniquely tag devices.
Section 6.2.1, “Using UUIDs to Assemble or Activate File System Devices,” on page 40
Section 6.2.2, “Finding the UUID for a File System Device,” on page 40
6.2.1 Using UUIDs to Assemble or Activate File System Devices
The UUID is always unique to the partition and does not depend on the order in which it appears or where it is mounted. With certain SAN devices attached to the server, the system partitions are renamed and moved to be the last device. For example, if root (/) is assigned to /dev/sda1 during the install, it might be assigned to /dev/sdg1 after the SAN is connected. One way to avoid this problem is to use the UUID in the boot loader and /etc/fstab files for the boot device.

The device ID assigned by the manufacturer for a drive never changes, no matter where the device is mounted, so it can always be found at boot. The UUID is a property of the file system and can change if you reformat the drive. In a boot loader file, you typically specify the location of the device (such as /dev/sda1) to mount it at system boot. The boot loader can also mount devices by their UUIDs and administrator-specified volume labels. However, if you use a label and file location, you cannot change the label name when the partition is mounted.

You can use the UUID as criterion for assembling and activating software RAID devices. When a RAID is created, the md driver generates a UUID for the device and stores the value in the md superblock.
6.2.2 Finding the UUID for a File System Device
You can find the UUID for any block device in the /dev/disk/by-uuid directory. For example, a UUID looks like this:

e014e482-1c2d-4d09-84ec-61b3aefde77a
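Alternatively, the blkid utility reports the UUID directly for a given block device; /dev/sda1 is a placeholder for the partition of interest:

blkid /dev/sda1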
6.3 Using UUIDs in the Boot Loader and /etc/fstab File (x86)
After the install, you can optionally use the following procedure to configure the UUID for the system device in the boot loader and /etc/fstab files for your x86 system.

Before you begin, make a copy of the /boot/grub/menu.lst file and the /etc/fstab file.

1 Install the SUSE® Linux Enterprise Server for x86 with no SAN devices connected.

2 After the install, boot the system.

3 Open a terminal console as the root user or equivalent.

4 Navigate to the /dev/disk/by-uuid directory to find the UUID for the device where you installed /boot, /root, and swap.

4a At the terminal console prompt, enter

cd /dev/disk/by-uuid

4b List all partitions by entering

ll

4c Find the UUID, such as

e014e482-1c2d-4d09-84ec-61b3aefde77a —> /dev/sda1

5 Edit the /boot/grub/menu.lst file, using the Boot Loader option in YaST2 or using a text editor.
For example, change
kernel /boot/vmlinuz root=/dev/sda1
to
kernel /boot/vmlinuz root=/dev/disk/by-uuid/e014e482-1c2d-4d09-84ec-61b3aefde77a
IMPORTANT: If you make a mistake, you can boot the server without the SAN connected, and fix the error by using the backup copy of the /boot/grub/menu.lst file as a guide.
If you use the Boot Loader option in YaST, there is a defect where it adds some duplicate lines to the boot loader file when you change a value. Use an editor to remove the following duplicate lines:
color white/blue black/light-gray
default 0
timeout 8
gfxmenu (sd0,1)/boot/message
When you use YaST to change the way that the root (/) device is mounted (such as by UUID or by label), the boot loader configuration needs to be saved again to make the change effective for the boot loader.
6 As the root user or equivalent, do one of the following to place the UUID in the /etc/fstab file:
Open YaST to System > Partitioner, select the device of interest, then modify Fstab
Options.
Edit the /etc/fstab file to modify the system device from the location to the UUID.

For example, if the root (/) volume has a device path of /dev/sda1 and its UUID is e014e482-1c2d-4d09-84ec-61b3aefde77a, change the line entry from

/dev/sda1 / reiserfs acl,user_xattr 1 1

to

UUID=e014e482-1c2d-4d09-84ec-61b3aefde77a / reiserfs acl,user_xattr 1 1
IMPORTANT: Do not leave stray characters or spaces in the file.
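Before rebooting, you can optionally run a fake mount pass to catch obvious mistakes in the edited /etc/fstab; the -f option parses the file without actually mounting anything:

mount -fav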
6.4 Using UUIDs in the Boot Loader and /etc/fstab File (IA64)
After the install, use the following procedure to configure the UUID for the system device in the boot loader and /etc/fstab files for your IA64 system. IA64 uses the EFI BIOS. Its file system configuration file is /boot/efi/SuSE/elilo.conf instead of /etc/fstab.

Before you begin, make a copy of the /boot/efi/SuSE/elilo.conf file.

1 Install the SUSE Linux Enterprise Server for IA64 with no SAN devices connected.

2 After the install, boot the system.

3 Open a terminal console as the root user or equivalent.
4 Navigate to the /dev/disk/by-uuid directory to find the UUID for the device where you installed /boot, /root, and swap.
4a At the terminal console prompt, enter
cd /dev/disk/by-uuid
4b List all partitions by entering
ll
4c Find the UUID, such as
e014e482-1c2d-4d09-84ec-61b3aefde77a —> /dev/sda1
5 Edit the boot loader file, using the Boot Loader option in YaST2.
For example, change
root=/dev/sda1
to
root=/dev/disk/by-uuid/e014e482-1c2d-4d09-84ec-61b3aefde77a
6 Edit the /boot/efi/SuSE/elilo.conf file to modify the system device from the location to the UUID.
For example, change
/dev/sda1 / reiserfs acl,user_xattr 1 1
to
UUID=e014e482-1c2d-4d09-84ec-61b3aefde77a / reiserfs acl,user_xattr 1 1
IMPORTANT: Do not leave stray characters or spaces in the file.
6.5 Additional Information
For more information about using udev(8) for managing devices, see “Dynamic Kernel Device Management with udev” (http://www.novell.com/documentation/sles11/sles_admin/data/cha_udev.html) in the SUSE® Linux Enterprise Server 11 Installation and Administration Guide.

For more information about udev(8) commands, see its man page. Enter the following at a terminal console prompt:

man 8 udev
7 Managing Multipath I/O for Devices
This section describes how to manage failover and path load balancing for multiple paths between the servers and block storage devices.
Section 7.1, “Understanding Multipathing,” on page 43
Section 7.2, “Planning for Multipathing,” on page 44
Section 7.3, “Multipath Management Tools,” on page 50
Section 7.4, “Configuring the System for Multipathing,” on page 54
Section 7.5, “Enabling and Starting Multipath I/O Services,” on page 60
Section 7.6, “Configuring Path Failover Policies and Priorities,” on page 61
Section 7.7, “Tuning the Failover for Specific Host Bus Adapters,” on page 69
Section 7.8, “Configuring Multipath I/O for the Root Device,” on page 69
Section 7.9, “Configuring Multipath I/O for an Existing Software RAID,” on page 70
Section 7.10, “Scanning for New Devices without Rebooting,” on page 71
Section 7.11, “Scanning for New Partitioned Devices without Rebooting,” on page 73
Section 7.12, “Viewing Multipath I/O Status,” on page 74
Section 7.13, “Managing I/O in Error Situations,” on page 75
Section 7.14, “Resolving Stalled I/O,” on page 76
Section 7.15, “Additional Information,” on page 76
Section 7.16, “What’s Next,” on page 77
7.1 Understanding Multipathing
Section 7.1.1, “What Is Multipathing?,” on page 43
Section 7.1.2, “Benefits of Multipathing,” on page 43
7.1.1 What Is Multipathing?
Multipathing is the ability of a server to communicate with the same physical or logical block storage device across multiple physical paths between the host bus adapters in the server and the storage controllers for the device, typically in Fibre Channel (FC) or iSCSI SAN environments. You can also achieve multiple connections with direct attached storage when multiple channels are available.
7.1.2 Benefits of Multipathing
Linux multipathing provides connection fault tolerance and can provide load balancing across the active connections. When multipathing is configured and running, it automatically isolates and identifies device connection failures, and reroutes I/O to alternate connections.
Typical connection problems involve faulty adapters, cables, or controllers. When you configure multipath I/O for a device, the multipath driver monitors the active connection between devices. When the multipath driver detects I/O errors for an active path, it fails over the traffic to the device’s designated secondary path. When the preferred path becomes healthy again, control can be returned to the preferred path.
7.2 Planning for Multipathing
Section 7.2.1, “Guidelines for Multipathing,” on page 44
Section 7.2.2, “Using Multipathed Devices,” on page 45
Section 7.2.3, “Using LVM2 on Multipath Devices,” on page 46
Section 7.2.4, “Using mdadm with Multipath Devices,” on page 46
Section 7.2.5, “Using --noflush with Multipath Devices,” on page 46
Section 7.2.6, “Partitioning Multipath Devices,” on page 47
Section 7.2.7, “Supported Architectures for Multipath I/O,” on page 47
Section 7.2.8, “Supported Storage Arrays for Multipathing,” on page 47
7.2.1 Guidelines for Multipathing
Use the guidelines in this section when planning your multipath I/O solution.
“Prerequisites” on page 44
“Vendor-Provided Multipath Solutions” on page 44
“Disk Management Tasks” on page 45
“Software RAIDs” on page 45
“High-Availability Solutions” on page 45
“Volume Managers” on page 45
“Virtualization Environments” on page 45
Prerequisites
Multipathing is managed at the device level.
The storage array you use for the multipathed device must support multipathing. For more
information, see Section 7.2.8, “Supported Storage Arrays for Multipathing,” on page 47.
You need to configure multipathing only if multiple physical paths exist between host bus
adapters in the server and host bus controllers for the block storage device. You configure multipath for the logical device as seen by the server.
Vendor-Provided Multipath Solutions
For some storage arrays, the vendor provides its own multipathing software to manage multipathing for the array’s physical and logical devices. In this case, you should follow the vendor’s instructions for configuring multipathing for those devices.
Disk Management Tasks
Perform the following disk management tasks before you attempt to configure multipathing for a physical or logical device that has multiple paths:
Use third-party tools to carve physical disks into smaller logical disks.
Use third-party tools to partition physical or logical disks. If you change the partitioning in the
running system, the Device Mapper Multipath (DM-MP) module does not automatically detect and reflect these changes. DM-MP must be reinitialized, which usually requires a reboot.
Use third-party SAN array management tools to create and configure hardware RAID devices.
Use third-party SAN array management tools to create logical devices such as LUNs. Logical
device types that are supported for a given array depend on the array vendor.
Software RAIDs
The Linux software RAID management software runs on top of multipathing. For each device that has multiple I/O paths and that you plan to use in a software RAID, you must configure the device for multipathing before you attempt to create the software RAID device. Automatic discovery of multipathed devices is not available. The software RAID is not aware of the multipathing management running underneath.
High-Availability Solutions
High-availability solutions for clustering typically run on top of the multipathing server. For example, the Distributed Replicated Block Device (DRBD) high-availability solution for mirroring devices across a LAN runs on top of multipathing. For each device that has multiple I/O paths and that you plan to use in a DRBD solution, you must configure the device for multipathing before you configure DRBD.
Volume Managers
Volume managers such as LVM2 and EVMS run on top of multipathing. You must configure multipathing for a device before you use LVM2 or EVMS to create segment managers and file systems on it.
Virtualization Environments
When using multipathing in a virtualization environment, the multipathing is controlled in the host server environment. Configure multipathing for the device before you assign it to a virtual guest machine.
7.2.2 Using Multipathed Devices
If you want to use the entire LUN directly (for example, if you are using the SAN features to partition your storage), you can use the /dev/disk/by-id/xxx names for mkfs, fstab, your application, etc.

If the user-friendly names option is enabled in the /etc/multipath.conf file, you can use the /dev/disk/by-id/dm-uuid-.*-mpath-.* device name because this name is aliased to the device ID. For information, see “Configuring User-Friendly Names or Alias Names in /etc/multipath.conf” on page 58.
7.2.3 Using LVM2 on Multipath Devices
By default, LVM2 does not recognize multipathed devices. To make LVM2 recognize the multipathed devices as possible physical volumes, you must modify /etc/lvm/lvm.conf. It is important to modify it so that it does not scan and use the physical paths, but only accesses the multipath I/O storage through the multipath I/O layer. If you are using user-friendly names, make sure to specify the path so that it scans only the device mapper names for the device (/dev/disk/by-id/dm-uuid-.*-mpath-.*) after multipathing is configured.
To modify /etc/lvm/lvm.conf for multipath use:

1 Open the /etc/lvm/lvm.conf file in a text editor.

2 Change the filter and types entry in /etc/lvm/lvm.conf as follows:

filter = [ "a|/dev/disk/by-id/.*|", "r|.*|" ]

This allows LVM2 to scan only the by-id paths and reject everything else.

If you are using user-friendly names, specify the path as follows so that only the device mapper names are scanned after multipathing is configured:

filter = [ "a|/dev/disk/by-id/dm-uuid-.*-mpath-.*|", "r|.*|" ]

3 If you are also using LVM2 on non-multipathed devices, make the necessary adjustments to suit your setup.

filter = [ "a|/dev/disk/by-id/.*|", "r|.*|" ]

4 Save the file.

5 Add dm-multipath to /etc/sysconfig/kernel:INITRD_MODULES.

6 Make a new initrd to ensure that the Device Mapper Multipath services are loaded with the changed settings. Enter

mkinitrd -f multipath

7 Reboot the server to apply the changes.
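After the reboot, a quick way to check that LVM2 now sees its physical volumes through the multipath layer is to list them; with the filter above in place, the reported device paths should no longer be the individual /dev/sdX path names:

pvs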
7.2.4 Using mdadm with Multipath Devices
The mdadm tool requires that the devices be accessed by the ID rather than by the device node path. Therefore, the DEVICE entry in /etc/mdadm.conf should be set as follows:

DEVICE /dev/disk/by-id/*

If you are using user-friendly names, specify the path as follows so that only the device mapper names are scanned after multipathing is configured:

DEVICE /dev/disk/by-id/dm-uuid-.*-mpath-.*

7.2.5 Using --noflush with Multipath Devices

The option --noflush should always be used when running on multipath devices.

For example, in scripts where you perform a table reload, you use the --noflush option on resume to ensure that any outstanding I/O is not flushed, because you need the multipath topology information.

load
resume --noflush
7.2.6 Partitioning Multipath Devices
Behavior changes for how multipathed devices are handled might affect your configuration if you are upgrading.
“SUSE Linux Enterprise Server 11” on page 47
“SUSE Linux Enterprise Server 10” on page 47
“SUSE Linux Enterprise Server 9” on page 47
SUSE Linux Enterprise Server 11
In SUSE Linux Enterprise Server 11, the default multipath setup relies on udev to overwrite the existing symbolic links in the /dev/disk/by-id directory when multipathing is started. Before you start multipathing, the link points to the SCSI device with its scsi-xxx name. When multipathing is running, the symbolic link points to the device using its dm-uuid-xxx name. This ensures that the symbolic links in the /dev/disk/by-id path persistently point to the same device regardless of whether multipath is started or not. The configuration files (such as lvm.conf and md.conf) do not need to be modified because they automatically point to the correct device.
SUSE Linux Enterprise Server 10
In SUSE Linux Enterprise Server 10, the kpartx software is used in the /etc/init.d/boot.multipath to add symlinks to the /dev/dm-* line in the multipath.conf configuration file for any newly created partitions without requiring a reboot. This triggers udevd to fill in the /dev/disk/by-* symlinks. The main benefit is that you can call kpartx with the new parameters without rebooting the server.
SUSE Linux Enterprise Server 9
In SUSE Linux Enterprise Server 9, it is not possible to partition multipath I/O devices themselves. If the underlying physical device is already partitioned, the multipath I/O device reflects those partitions and the layer provides
/dev/disk/by-id/<name>p1 ... pN
devices so you can access the partitions through the multipath I/O layer. As a consequence, the devices need to be partitioned prior to enabling multipath I/O. If you change the partitioning in the running system, DM-MP does not automatically detect and reflect these changes. The device must be reinitialized, which usually requires a reboot.
7.2.7 Supported Architectures for Multipath I/O
The multipathing drivers and tools support all seven of the supported processor architectures: IA32, AMD64/EM64T, IPF/IA64, p-Series (32-bit/64-bit), z-Series (31-bit and 64-bit).
7.2.8 Supported Storage Arrays for Multipathing
The multipathing drivers and tools support most storage arrays. The storage array that houses the multipathed device must support multipathing in order to use the multipathing drivers and tools. Some storage array vendors provide their own multipathing management tools. Consult the vendor’s hardware documentation to determine what settings are required.
“Storage Arrays That Are Automatically Detected for Multipathing” on page 48
“Tested Storage Arrays for Multipathing Support” on page 49
“Storage Arrays that Require Specific Hardware Handlers” on page 49
Storage Arrays That Are Automatically Detected for Multipathing
The
multipath-tools
package automatically detects the following storage arrays:
3PARdata VV
Compaq* HSV110
Compaq MSA1000
DDN SAN MultiDirector
DEC* HSG80
EMC* CLARiiON* CX
EMC Symmetrix*
FSC CentricStor*
Hewlett Packard* (HP*) A6189A
HP HSV110
HP HSV210
HP Open
Hitachi* DF400
Hitachi DF500
Hitachi DF600
IBM* 3542
IBM ProFibre 4000R
NetApp*
SGI* TP9100
SGI TP9300
SGI TP9400
SGI TP9500
STK OPENstorage DS280
Sun* StorEdge 3510
Sun T4
In general, most other storage arrays should work. When storage arrays are automatically detected, the default settings for multipathing apply. If you want non-default settings, you must manually create and configure the /etc/multipath.conf file. For information, see Section 7.4.5, “Creating and Configuring the /etc/multipath.conf File,” on page 56.

Testing of the IBM zSeries* device with multipathing has shown that the dev_loss_tmo parameter should be set to 90 seconds, and the fast_io_fail_tmo parameter should be set to 5 seconds. If you are using zSeries devices, you must manually create and configure the /etc/multipath.conf file to specify the values. For information, see “Configuring Default Settings for zSeries in /etc/multipath.conf” on page 59.

Hardware that is not automatically detected requires an appropriate entry for configuration in the DEVICES section of the /etc/multipath.conf file. In this case, you must manually create and configure the configuration file. For information, see Section 7.4.5, “Creating and Configuring the /etc/multipath.conf File,” on page 56.
Consider the following caveats:
Not all of the storage arrays that are automatically detected have been tested on SUSE Linux
Enterprise Server. For information, see “Tested Storage Arrays for Multipathing Support” on
page 49.
Some storage arrays might require specific hardware handlers. A hardware handler is a kernel
module that performs hardware-specific actions when switching path groups and dealing with I/O errors. For information, see “Storage Arrays that Require Specific Hardware Handlers” on
page 49.
After you modify the /etc/multipath.conf file, you must run mkinitrd to re-create the INITRD on your system, then reboot in order for the changes to take effect.
Tested Storage Arrays for Multipathing Support
The following storage arrays have been tested with SUSE Linux Enterprise Server:
EMC
Hitachi
Hewlett-Packard/Compaq
IBM
NetApp
SGI
Most other vendors’ storage arrays should also work. Consult your vendor’s documentation for guidance. For a list of the default storage arrays recognized by the multipath-tools package, see “Storage Arrays That Are Automatically Detected for Multipathing” on page 48.
Storage Arrays that Require Specific Hardware Handlers
Storage arrays that require special commands on failover from one path to the other or that require special nonstandard error handling might require more extensive support. Therefore, the Device Mapper Multipath service has hooks for hardware handlers. For example, one such handler for the EMC CLARiiON CX family of arrays is already provided.
IMPORTANT: Consult the hardware vendor’s documentation to determine if its hardware handler must be installed for Device Mapper Multipath.
The multipath -t command shows an internal table of storage arrays that require special handling with specific hardware handlers. The displayed list is not an exhaustive list of supported storage arrays. It lists only those arrays that require special handling and that the multipath-tools developers had access to during the tool development.
IMPORTANT: Arrays with true active/active multipath support do not require special handling, so they are not listed for the multipath -t command.

A listing in the multipath -t table does not necessarily mean that SUSE Linux Enterprise Server was tested on that specific hardware. For a list of tested storage arrays, see “Tested Storage Arrays for Multipathing Support” on page 49.
7.3 Multipath Management Tools
The multipathing support in SUSE Linux Enterprise Server 10 and later is based on the Device Mapper Multipath module of the Linux 2.6 kernel and the multipath-tools userspace package. You can use mdadm to view the status of multipathed devices.

Section 7.3.1, “Device Mapper Multipath Module,” on page 50
Section 7.3.2, “Multipath I/O Management Tools,” on page 51
Section 7.3.3, “Using mdadm for Multipathed Devices,” on page 52
Section 7.3.4, “The Linux multipath(8) Command,” on page 53
7.3.1 Device Mapper Multipath Module
The Device Mapper Multipath (DM-MP) module provides the multipathing capability for Linux. DM-MP is the preferred solution for multipathing on SUSE Linux Enterprise Server 11. It is the only multipathing option shipped with the product that is completely supported by Novell® and SUSE.
DM-MP features automatic configuration of the multipathing subsystem for a large variety of setups. Configurations of up to 8 paths to each device are supported. Configurations are supported for active/passive (one path active, others passive) or active/active (all paths active with round-robin load balancing).
The DM-MP framework is extensible in two ways:
Using specific hardware handlers. For information, see “Storage Arrays that Require Specific
Hardware Handlers” on page 49.
Using more sophisticated load-balancing algorithms than round-robin
The user-space component of DM-MP takes care of automatic path discovery and grouping, as well as automated path retesting, so that a previously failed path is automatically reinstated when it becomes healthy again. This minimizes the need for administrator attention in a production environment.
DM-MP protects against failures in the paths to the device, and not failures in the device itself. If one of the active paths is lost (for example, a network adapter breaks or a fiber-optic cable is removed), I/O is redirected to the remaining paths. If the configuration is active/passive, then the path fails over to one of the passive paths. If you are using the round-robin load-balancing configuration, the traffic is balanced across the remaining healthy paths. If all active paths fail, inactive secondary paths must be woken up, so failover occurs with a delay of approximately 30 seconds.
If a disk array has more than one storage processor, make sure that the SAN switch has a connection to the storage processor that owns the LUNs you want to access. On most disk arrays, all LUNs belong to both storage processors, so both connections are active.
NOTE: On some disk arrays, the storage array manages the traffic through storage processors so that it presents only one storage processor at a time. One processor is active and the other one is passive until there is a failure. If you are connected to the wrong storage processor (the one with the passive path) you might not see the expected LUNs, or you might see the LUNs but get errors when trying to access them.
Table 7-1 Multipath I/O Features of Storage Arrays
Features of Storage Arrays Description
Active/passive controllers One controller is active and serves all LUNs. The second controller acts as
a standby. The second controller also presents the LUNs to the multipath component so that the operating system knows about redundant paths. If the primary controller fails, the second controller takes over, and it serves all LUNs.
In some arrays, the LUNs can be assigned to different controllers. A given LUN is assigned to one controller to be its active controller. One controller does the disk I/O for any given LUN at a time, and the second controller is the standby for that LUN. The second controller also presents the paths, but disk I/O is not possible. Servers that use that LUN are connected to the LUN’s assigned controller. If the primary controller for a set of LUNs fails, the second controller takes over, and it serves all LUNs.
Active/active controllers Both controllers share the load for all LUNs, and can process disk I/O for
any given LUN. If one controller fails, the second controller automatically handles all traffic.
Load balancing The Device Mapper Multipath driver automatically load balances traffic
across all active paths.
Controller failover When the active controller fails over to the passive, or standby, controller,
the Device Mapper Multipath driver automatically activates the paths between the host and the standby, making them the primary paths.
Boot/Root device support Multipathing is supported for the root (/) device in SUSE Linux Enterprise Server 10 and later. The host server must be connected to the currently active controller and storage processor for the boot device.

Multipathing is supported for the /boot device in SUSE Linux Enterprise Server 11 and later.
Device Mapper Multipath detects every path for a multipathed device as a separate SCSI device. The SCSI device names take the form /dev/sdN, where N is an autogenerated letter for the device, beginning with a and issued sequentially as the devices are created, such as /dev/sda, /dev/sdb, and so on. If the number of devices exceeds 26, the letters are duplicated so that the next device after /dev/sdz will be named /dev/sdaa, /dev/sdab, and so on.

If multiple paths are not automatically detected, you can configure them manually in the /etc/multipath.conf file. The multipath.conf file does not exist until you create and configure it. For information, see Section 7.4.5, “Creating and Configuring the /etc/multipath.conf File,” on page 56.
7.3.2 Multipath I/O Management Tools
The multipath-tools user-space package takes care of automatic path discovery and grouping. It automatically tests the path periodically, so that a previously failed path is automatically reinstated when it becomes healthy again. This minimizes the need for administrator attention in a production environment.
Table 7-2 Tools in the multipath-tools Package
Tool Description
multipath Scans the system for multipathed devices and assembles them.
multipathd Waits for maps events, then executes multipath.
devmap-name Provides a meaningful device name to udev for device maps (devmaps).
kpartx Maps linear devmaps to partitions on the multipathed device, which makes it possible to create multipath monitoring for partitions on the device.
For a list of files included in this package, see the multipath-tools Package Description (http://www.novell.com/products/linuxpackages/suselinux/multipath-tools.html).
1 Ensure that the multipath-tools package is installed by entering the following at a terminal console prompt:
rpm -q multipath-tools
If it is installed, the response repeats the package name and provides the version information, such as:
multipath-tools-04.7-34.23
If it is not installed, the response reads:
package multipath-tools is not installed
7.3.3 Using mdadm for Multipathed Devices
Udev is the default device handler, and devices are automatically known to the system by the Worldwide ID instead of by the device node name. This resolves problems in previous releases where mdadm.conf and lvm.conf did not properly recognize multipathed devices.

Just as for LVM2, mdadm requires that the devices be accessed by the ID rather than by the device node path. Therefore, the DEVICE entry in /etc/mdadm.conf should be set as follows:

DEVICE /dev/disk/by-id/*

If you are using user-friendly names, specify the path as follows so that only the device mapper names are scanned after multipathing is configured:

DEVICE /dev/disk/by-id/dm-uuid-.*-mpath-.*
To verify that mdadm is installed:

1 Ensure that the mdadm package is installed by entering the following at a terminal console prompt:
rpm -q mdadm
If it is installed, the response repeats the package name and provides the version information. For example:
mdadm-2.6-0.11
If it is not installed, the response reads:
package mdadm is not installed
For information about modifying the /etc/lvm/lvm.conf file, see Section 7.2.3, “Using LVM2 on Multipath Devices,” on page 46.
7.3.4 The Linux multipath(8) Command
Use the Linux multipath(8) command to configure and manage multipathed devices.

General syntax for the multipath(8) command:

multipath [-v verbosity] [-d] [-h|-l|-ll|-f|-F] [-p failover | multibus | group_by_serial | group_by_prio| group_by_node_name ]

General Examples

Configure multipath devices:

multipath

Configure a specific multipath device:

multipath devicename

Replace devicename with the device node name such as /dev/sdb (as shown by udev in the $DEVNAME variable), or in the major:minor format.

Selectively suppress a multipath map, and its device-mapped partitions:

multipath -f
Display potential multipath devices, but do not create any devices and do not update device maps (dry run):
multipath -d
Configure multipath devices and display multipath map information:
multipath -v2 <device>
multipath -v3
The -v2 option in multipath -v2 -d shows only local disks. Use the -v3 option to show the full path list. For example:
multipath -v3 -d
Display the status of all multipath devices, or a specified multipath device:
multipath -ll
multipath -ll <device>
Flush all unused multipath device maps (unresolves the multiple paths; it does not delete the device):
multipath -F
multipath -F <device>
Set the group policy:
multipath -p [failover|multibus|group_by_serial|group_by_prio|group_by_node_name]
Specify one of the group policy options that are described in Table 7-3:
Table 7-3 Group Policy Options for the multipath -p Command
Policy Option Description
failover One path per priority group. You can use only one path at a time.
multibus All paths in one priority group.
group_by_serial One priority group per detected SCSI serial number (the controller node
worldwide number).
group_by_prio One priority group per path priority value. Paths with the same priority are in the same priority group. Priorities are determined by callout programs specified as a global, per-controller, or per-multipath option in the /etc/multipath.conf configuration file.
group_by_node_name One priority group per target node name. Target node names are fetched in the /sys/class/fc_transport/target*/node_name location.
7.4 Configuring the System for Multipathing
Section 7.4.1, “Preparing SAN Devices for Multipathing,” on page 54
Section 7.4.2, “Partitioning Multipathed Devices,” on page 55
Section 7.4.3, “Configuring the Server for Multipathing,” on page 55
Section 7.4.4, “Adding multipathd to the Boot Sequence,” on page 56
Section 7.4.5, “Creating and Configuring the /etc/multipath.conf File,” on page 56
7.4.1 Preparing SAN Devices for Multipathing
Before configuring multipath I/O for your SAN devices, prepare the SAN devices, as necessary, by doing the following:
Configure and zone the SAN with the vendor’s tools.
Configure permissions for host LUNs on the storage arrays with the vendor’s tools.
Install the Linux HBA driver module. Upon module installation, the driver automatically scans
the HBA to discover any SAN devices that have permissions for the host. It presents them to the host for further configuration.
NOTE: Ensure that the HBA driver you are using does not have native multipathing enabled.
See the vendor’s specific instructions for more details.
After the driver module is loaded, discover the device nodes assigned to specific array LUNs or
partitions.
If the LUNs are not seen by the HBA driver, lsscsi can be used to check whether the SCSI devices are seen correctly by the operating system. When the LUNs are not seen by the HBA driver, check the zoning setup of the SAN. In particular, check whether LUN masking is active and whether the LUNs are correctly assigned to the server.
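For example, running lsscsi with no arguments prints one line per SCSI device that the kernel currently knows about (assuming the lsscsi package is installed):

lsscsi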
If the LUNs are seen by the HBA driver, but there are no corresponding block devices, additional kernel parameters are needed to change the SCSI device scanning behavior, such as to indicate that LUNs are not numbered consecutively. For information, see Options for SCSI Device Scanning
(http://support.novell.com/techcenter/sdb/en/2005/06/drahn_scsi_scanning.html) in the Novell
Support Knowledgebase.
7.4.2 Partitioning Multipathed Devices
Partitioning devices that have multiple paths is not recommended, but it is supported.
SUSE Linux Enterprise Server 10
In SUSE Linux Enterprise Server 10, you can use the kpartx tool to create partitions on multipathed devices without rebooting. You can also partition the device before you attempt to configure multipathing by using the Partitioner function in YaST2 or by using a third-party partitioning tool.

SUSE Linux Enterprise Server 9
In SUSE Linux Enterprise Server 9, if you want to partition the device, you should configure its partitions before you attempt to configure multipathing by using the Partitioner function in YaST2 or by using a third-party partitioning tool. This is necessary because partitioning an existing multipathed device is not supported. Partitioning operations on multipathed devices fail if attempted.
If you configure partitions for a device, DM-MP automatically recognizes the partitions and indicates them by appending p1 to pn to the device’s ID, such as
/dev/disk/by-id/26353900f02796769p1
To partition multipathed devices, you must disable the DM-MP service, partition the normal device node (such as
/dev/sdc
), then reboot to allow the DM-MP service to see the new partitions.
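A minimal kpartx sketch for SUSE Linux Enterprise Server 10; the map name under /dev/mapper is a placeholder taken from the example ID above, so check your own map names first:

# List the partition mappings that kpartx would create (no changes made)
kpartx -l /dev/mapper/26353900f02796769
# Create the device-mapper mappings for those partitions
kpartx -a /dev/mapper/26353900f02796769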
7.4.3 Configuring the Server for Multipathing
The system must be manually configured to automatically load the device drivers for the controllers to which the multipath I/O devices are connected within the INITRD. You need to add the necessary driver module to the variable INITRD_MODULES in the file /etc/sysconfig/kernel.

For example, if your system contains a RAID controller accessed by the cciss driver and multipathed devices connected to a QLogic controller accessed by the driver qla2xxx, this entry would look like:

INITRD_MODULES="cciss"

Because the QLogic driver is not automatically loaded on startup, add it here:

INITRD_MODULES="cciss qla2xxx"
After changing /etc/sysconfig/kernel, you must re-create the INITRD on your system with the mkinitrd command, then reboot in order for the changes to take effect.

When you are using LILO as a boot manager, reinstall it with the /sbin/lilo command. No further action is required if you are using GRUB.
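A minimal command sketch of this step. The driver names are carried over from the example above and are assumptions about your hardware; adjust the INITRD_MODULES value to match your controllers.

# Append the QLogic driver to INITRD_MODULES in /etc/sysconfig/kernel
sed -i 's/^INITRD_MODULES="cciss"/INITRD_MODULES="cciss qla2xxx"/' /etc/sysconfig/kernel
# Re-create the INITRD, then reboot
mkinitrd
# Only if LILO is the boot manager (GRUB needs no extra step):
# /sbin/lilo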
7.4.4 Adding multipathd to the Boot Sequence
Use either of the methods in this section to add multipath I/O services (multipathd) to the boot sequence.

"YaST" on page 56
"Command Line" on page 56

YaST

1 In YaST, click System > System Services (Runlevel) > Simple Mode.
2 Select multipathd, then click Enable.
3 Click OK to acknowledge the service startup message.
4 Click Finish, then click Yes.

The changes do not take effect until the server is restarted.
Command Line
1 Open a terminal console, then log in as the root user or equivalent.
2 At the terminal console prompt, enter
insserv multipathd
7.4.5 Creating and Configuring the /etc/multipath.conf File
The /etc/multipath.conf file does not exist unless you create it. The /usr/share/doc/packages/multipath-tools/multipath.conf.synthetic file contains a sample /etc/multipath.conf file that you can use as a guide for multipath settings. See /usr/share/doc/packages/multipath-tools/multipath.conf.annotated for a template with extensive comments for each of the attributes and their options.

"Creating the multipath.conf File" on page 57
"Verifying the Setup in the etc/multipath.conf File" on page 57
"Configuring User-Friendly Names or Alias Names in /etc/multipath.conf" on page 58
"Blacklisting Non-Multipathed Devices in /etc/multipath.conf" on page 58
"Configuring Default Multipath Behavior in /etc/multipath.conf" on page 59
"Configuring Default Settings for zSeries in /etc/multipath.conf" on page 59
"Applying Changes Made to the /etc/multipath.conf File" on page 60
Creating the multipath.conf File
If the /etc/multipath.conf file does not exist, copy the example to create the file:

1 In a terminal console, log in as the root user.
2 Enter the following command (all on one line, of course) to copy the template:
cp /usr/share/doc/packages/multipath-tools/multipath.conf.synthetic /etc/multipath.conf
3 Use the /usr/share/doc/packages/multipath-tools/multipath.conf.annotated file as a reference to determine how to configure multipathing for your system.
4 Make sure there is an appropriate device entry for your SAN. Most vendors provide documentation on the proper setup of the device section. A sketch of a device section appears after this procedure.
The /etc/multipath.conf file requires a different device section for different SANs. If you are using a storage subsystem that is automatically detected (see "Tested Storage Arrays for Multipathing Support" on page 49), the default entry for that device can be used; no further configuration of the /etc/multipath.conf file is required.
5 Save the file.
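A minimal sketch of what a vendor-specific device section can look like. The vendor and product strings and the option values are placeholders, not settings recommended by this guide; always use the entry documented by your array vendor, or rely on the defaults for automatically detected arrays.

devices {
  device {
    vendor  "VENDOR"               # placeholder: vendor string reported by the array
    product "PRODUCT"              # placeholder: product string reported by the array
    path_grouping_policy group_by_prio
    path_checker tur
    failback immediate
  }
}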
Verifying the Setup in the etc/multipath.conf File
After setting up the configuration, you can perform a “dry run” by entering
multipath -v3 -d
This command scans the devices, then displays what the setup would look like. The output is similar to the following:
26353900f02796769
[size=127 GB][features="0"][hwhandler="1 emc"]
\_ round-robin 0 [first]
  \_ 1:0:1:2 sdav 66:240 [ready ]
  \_ 0:0:1:2 sdr  65:16  [ready ]
\_ round-robin 0
  \_ 1:0:0:2 sdag 66:0   [ready ]
  \_ 0:0:0:2 sdc  8:32   [ready ]
Paths are grouped into priority groups. Only one priority group is ever in active use. To model an active/active configuration, all paths end up in the same group. To model active/passive configuration, the paths that should not be active in parallel are placed in several distinct priority groups. This normally happens automatically on device discovery.
The output shows the order, the scheduling policy used to balance I/O within the group, and the paths for each priority group. For each path, its physical address (host:bus:target:lun), device node name, major:minor number, and state are shown.
Configuring User-Friendly Names or Alias Names in /etc/multipath.conf
A multipath device can be identified by either its WWID or an alias that you assign for it. The WWID (Worldwide Identifier) is an identifier for the multipath device that is guaranteed to be globally unique and unchanging. The default name used in multipathing is the ID of the logical unit as found in the /dev/disk/by-id directory. Because device node names in the form of /dev/dm-n and /dev/sdn can change on reboot, referring to multipath devices by their ID is preferred.

The multipath device names in the /dev/mapper directory reference the ID of the LUN and are always consistent because they use the /var/lib/multipath/bindings file to track the association. These device names are user-friendly names such as /dev/disk/by-id/dm-uuid-.*-mpath-.*.

You can specify your own device names to use via the ALIAS directive in the /etc/multipath.conf file. Alias names override the use of ID and /dev/disk/by-id/dm-uuid-.*-mpath-.* names.
IMPORTANT: We recommend that you do not use aliases for the root device; if you do, the ability to seamlessly switch off multipathing via the kernel command line is lost because the device name differs.
For an example of multipath.conf settings, see the /usr/share/doc/packages/multipath-tools/multipath.conf.synthetic file.

1 In a terminal console, log in as the root user.
2 Open the /etc/multipath.conf file in a text editor.
3 Uncomment the Defaults directive and its ending bracket.
4 Uncomment the user_friendly_names option, then change its value from No to Yes.
For example:

## Use user friendly names, instead of using WWIDs as names.
defaults {
  user_friendly_names yes
}

5 Optionally specify your own user-friendly names for devices using the alias directive in the multipath section.
For example:

multipath {
  wwid 26353900f02796769
  alias sdd4l0
}
6 Save your changes, then close the file.
Blacklisting Non-Multipathed Devices in /etc/multipath.conf
The /etc/multipath.conf file should contain a blacklist section where all non-multipathed devices are listed. For example, local IDE hard drives and floppy drives are not normally multipathed. If you have single-path devices that multipath is trying to manage and you want multipath to ignore them, put them in the blacklist section to resolve the problem.
NOTE: The keyword devnode_blacklist has been deprecated and replaced with the keyword blacklist.
For example, to blacklist local devices and all arrays from the cciss driver from being managed by multipath, the blacklist section looks like this:

blacklist {
  wwid 26353900f02796769
  devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st|sda)[0-9]*"
  devnode "^hd[a-z][0-9]*"
  devnode "^cciss!c[0-9]d[0-9].*"
}

You can also blacklist only the partitions from a driver instead of the entire array. For example, using the following regular expression blacklists only partitions from the cciss driver and not the entire array:

^cciss!c[0-9]d[0-9]*[p[0-9]*]

After you modify the /etc/multipath.conf file, you must run mkinitrd to re-create the INITRD on your system, then reboot in order for the changes to take effect.

Afterwards, the local devices should no longer be listed in the multipath maps when you issue the multipath -ll command.
Configuring Default Multipath Behavior in /etc/multipath.conf
The /etc/multipath.conf file should contain a defaults section where you can specify default behaviors. If the field is not otherwise specified in a device section, the default setting is applied for that SAN configuration.

The following defaults section specifies a simple failover policy:

defaults {
  multipath_tool "/sbin/multipath -v0"
  udev_dir /dev
  polling_interval 10
  default_selector "round-robin 0"
  default_path_grouping_policy failover
  default_getuid "/sbin/scsi_id -g -u -s /block/%n"
  default_prio_callout "/bin/true"
  default_features "0"
  rr_min_io 100
  failback immediate
}

NOTE: In the above example, instead of the sample path /sbin/scsi_id that is shown in the default_getuid command line (and that appears in the default and annotated sample files, such as /usr/share/doc/packages/multipath-tools/multipath.conf.synthetic), use the path /lib/udev/scsi_id.
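Applying that note, the corrected line in the defaults section would look like the following sketch; the remaining options stay as shown in the example above.

defaults {
  # SLES 11 location of scsi_id (see the note above)
  default_getuid "/lib/udev/scsi_id -g -u -s /block/%n"
  # ... other settings as in the example above ...
}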
Configuring Default Settings for zSeries in /etc/multipath.conf
Testing of the IBM zSeries device with multipathing has shown that the dev_loss_tmo parameter should be set to 90 seconds, and the fast_io_fail_tmo parameter should be set to 5 seconds. If you are using zSeries devices, modify the /etc/multipath.conf file to specify the values as follows:

defaults {
  dev_loss_tmo 90
  fast_io_fail_tmo 5
}
The dev_loss_tmo parameter sets the number of seconds to wait before marking a multipath link as bad. When the path fails, any current I/O on that failed path fails. The default value varies according to the device driver being used. The valid range of values is 0 to 600 seconds. To use the driver’s internal timeouts, set the value to zero (0) or to any value greater than 600.
The fast_io_fail_tmo parameter sets the length of time to wait before failing I/O when a link problem is detected. I/O that reaches the driver fails. If I/O is in a blocked queue, the I/O does not fail until the dev_loss_tmo time elapses and the queue is unblocked.
Applying Changes Made to the /etc/multipath.conf File
Changes to the /etc/multipath.conf file cannot take effect when multipathd is running. After you make changes, save and close the file, then do the following to apply the changes:

1 Stop the multipathd service.
2 Clear old multipath bindings by entering
/sbin/multipath -F
3 Create new multipath bindings by entering
/sbin/multipath -v2 -l
4 Start the multipathd service.
5 Run mkinitrd to re-create the INITRD on your system, then reboot in order for the changes to take effect.
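A minimal shell sketch of the sequence above, assuming the init scripts used elsewhere in this chapter:

/etc/init.d/multipathd stop     # 1 stop the multipathd service
/sbin/multipath -F              # 2 clear old multipath bindings
/sbin/multipath -v2 -l          # 3 create new multipath bindings
/etc/init.d/multipathd start    # 4 start the multipathd service
mkinitrd                        # 5 re-create the INITRD, then reboot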
7.5 Enabling and Starting Multipath I/O Services
To start multipath services and enable them to start at reboot:
1 Open a terminal console, then log in as the root user or equivalent.
2 At the terminal console prompt, enter
chkconfig multipathd on
chkconfig boot.multipath on

If the boot.multipath service does not start automatically on system boot, do the following:

1 Open a terminal console, then log in as the root user or equivalent.
2 Enter
/etc/init.d/boot.multipath start
/etc/init.d/multipathd start
7.6 Configuring Path Failover Policies and Priorities
In a Linux host, when there are multiple paths to a storage controller, each path appears as a separate block device, and results in multiple block devices for a single LUN. The Device Mapper Multipath service detects multiple paths with the same LUN ID, and creates a new multipath device with that ID. For example, a host with two HBAs attached to a storage controller with two ports via a single unzoned Fibre Channel switch sees four block devices: /dev/sda, /dev/sdb, /dev/sdc, and /dev/sdd. The Device Mapper Multipath service creates a single block device, /dev/mpath/mpath1, that reroutes I/O through those four underlying block devices.

This section describes how to specify policies for failover and configure priorities for the paths.

Section 7.6.1, "Configuring the Path Failover Policies," on page 61
Section 7.6.2, "Configuring Failover Priorities," on page 61
Section 7.6.3, "Using a Script to Set Path Priorities," on page 67
Section 7.6.4, "Configuring ALUA," on page 67
Section 7.6.5, "Reporting Target Path Groups," on page 69
7.6.1 Configuring the Path Failover Policies
Use the multipath command with the -p option to set the path failover policy:

multipath devicename -p policy

Replace policy with one of the following policy options:

Table 7-4 Group Policy Options for the multipath -p Command

failover: One path per priority group.
multibus: All paths in one priority group.
group_by_serial: One priority group per detected serial number.
group_by_prio: One priority group per path priority value. Priorities are determined by callout programs specified as a global, per-controller, or per-multipath option in the /etc/multipath.conf configuration file.
group_by_node_name: One priority group per target node name. Target node names are fetched in the /sys/class/fc_transport/target*/node_name location.
7.6.2 Configuring Failover Priorities
You must manually enter the failover priorities for the device in the /etc/multipath.conf file. Examples for all settings and options can be found in the /usr/share/doc/packages/multipath-tools/multipath.conf.annotated file.

"Understanding Priority Groups and Attributes" on page 62
"Configuring for Round-Robin Load Balancing" on page 66
"Configuring for Single Path Failover" on page 66
"Grouping I/O Paths for Round-Robin Load Balancing" on page 66
Understanding Priority Groups and Attributes
A priority group is a collection of paths that go to the same physical LUN. By default, I/O is distributed in a round-robin fashion across all paths in the group. The multipath command automatically creates priority groups for each LUN in the SAN based on the path_grouping_policy setting for that SAN. The multipath command multiplies the number of paths in a group by the group's priority to determine which group is the primary. The group with the highest calculated value is the primary. When all paths in the primary group are failed, the priority group with the next highest value becomes active.
A path priority is an integer value assigned to a path. The higher the value, the higher is the priority. An external program is used to assign priorities for each path. For a given device, the paths with the same priorities belong to the same priority group.
Table 7-5 Multipath Attributes
user_friendly_names: Specifies whether to use IDs or to use the /var/lib/multipath/bindings file to assign a persistent and unique alias to the multipath devices in the form of /dev/mapper/mpathN.
yes: Autogenerate user-friendly names as aliases for the multipath devices instead of the actual ID.
no: Default. Use the WWIDs shown in the /dev/disk/by-id/ location.

blacklist: Specifies the list of device names to ignore as non-multipathed devices, such as cciss, fd, hd, md, dm, sr, scd, st, ram, raw, loop.
For an example, see "Blacklisting Non-Multipathed Devices in /etc/multipath.conf" on page 58.

blacklist_exceptions: Specifies the list of device names to treat as multipath devices even if they are included in the blacklist.
For an example, see the /usr/share/doc/packages/multipath-tools/multipath.conf.annotated file.

getuid: The default program and arguments to call to obtain a unique path identifier. Should be specified with an absolute path.
/sbin/scsi_id -g -u -s: This is the default location and arguments.
Example: getuid "/sbin/scsi_id -g -u -d /dev/%n"
path_grouping_policy: Specifies the path grouping policy for a multipath device hosted by a given controller.
failover: One path is assigned per priority group so that only one path at a time is used.
multibus: (Default) All valid paths are in one priority group. Traffic is load-balanced across all active paths in the group.
group_by_prio: One priority group exists for each path priority value. Paths with the same priority are in the same priority group. Priorities are assigned by an external program.
group_by_serial: Paths are grouped by the SCSI target serial number (controller node WWN).
group_by_node_name: One priority group is assigned per target node name. Target node names are fetched in /sys/class/fc_transport/target*/node_name.

path_checker: Determines the state of the path.
directio: (Default in multipath-tools version 0.4.8 and later) Reads the first sector that has direct I/O. This is useful for DASD devices. Logs failure messages in /var/log/messages.
readsector0: (Default in multipath-tools version 0.4.7 and earlier) Reads the first sector of the device. Logs failure messages in /var/log/messages.
tur: Issues a SCSI test unit ready command to the device. This is the preferred setting if the LUN supports it. The command does not fill up /var/log/messages with messages on failure.
Some SAN vendors provide custom path_checker options:
emc_clariion: Queries the EMC Clariion EVPD page 0xC0 to determine the path state.
hp_sw: Checks the path state (Up, Down, or Ghost) for HP storage arrays with Active/Standby firmware.
rdac: Checks the path state for the LSI/Engenio RDAC storage controller.
Multipath Attribute Description Values
novdocx (en) 7 January 2010
path_selector Specifies the path-selector
algorithm to use for load balancing.
round-robin 0: (Default) The load-balancing algorithm used to balance traffic across all active paths in a priority group.
Beginning in SUSE Linux Enterprise Server 11, the following additional I/O balancing options are available:
least-pending: Provides a least-pending-I/O dynamic load balancing policy for bio based device mapper multipath. This load balancing policy considers the number of unserviced requests pending on a path and selects the path with least count of pending service requests.
This policy is especially useful when the SAN environment has heterogeneous components. For example, when there is one 8GB HBA and one 2GB HBA connected to the same server, the 8GB HBA could be utilized better with this algorithm.
length-load-balancing: A dynamic load balancer that balances the number of in-flight I/O on paths similar to the least-pending option.
service-time: A service-time oriented load balancer that balances I/O on paths according to the latency.
pg_timeout Specifies path group timeout
handling.
NONE (internal default)
64 SLES 11: Storage Administration Guide
prio_callout: Specifies the program and arguments to use to determine the layout of the multipath map.
Multipath prio_callouts are located in shared libraries in /lib/libmultipath/lib*. By using shared libraries, the callouts are loaded into memory on daemon startup.
When queried by the multipath command, the specified mpath_prio_* callout program returns the priority for a given path in relation to the entire multipath layout.
When it is used with the path_grouping_policy of group_by_prio, all paths with the same priority are grouped into one multipath group. The group with the highest aggregate priority becomes the active group.
When all paths in a group fail, the group with the next highest aggregate priority becomes active. Additionally, a failover command (as determined by the hardware handler) might be sent to the target.
The mpath_prio_* program can also be a custom script created by a vendor or administrator for a specified setup.
A %n in the command line expands to the device name in the /dev directory.
A %b expands to the device number in major:minor format in the /dev directory.
A %d expands to the device ID in the /dev/disk/by-id directory.
If devices are hot-pluggable, use the %d flag instead of %n. This addresses the short time that elapses between the time when devices are available and when udev creates the device nodes.
If no prio_callout attribute is used, all paths are equal. This is the default.
/bin/true: Use this value when the group_by_priority is not being used.
The prioritizer programs generate path priorities when queried by the multipath command. The program names must begin with mpath_prio_ and are named by the device type or balancing method used. Current prioritizer programs include the following:
mpath_prio_alua %n: Generates path priorities based on the SCSI-3 ALUA settings.
mpath_prio_balance_units: Generates the same priority for all paths.
mpath_prio_emc %n: Generates the path priority for EMC arrays.
mpath_prio_hds_modular %b: Generates the path priority for Hitachi HDS Modular storage arrays.
mpath_prio_hp_sw %n: Generates the path priority for Compaq/HP controllers in active/standby mode.
mpath_prio_netapp %n: Generates the path priority for NetApp arrays.
mpath_prio_random %n: Generates a random priority for each path.
mpath_prio_rdac %n: Generates the path priority for the LSI/Engenio RDAC controller.
mpath_prio_tpc %n: You can optionally use a script created by a vendor or administrator that gets the priorities from a file where you specify priorities to use for each path.
mpath_prio_spec.sh %n: Provides the path of a user-created script that generates the priorities for multipathing based on information contained in a second data file. (This path and filename are provided as an example. Specify the location of your script instead.) The script can be created by a vendor or administrator. The script's target file identifies each path for all multipathed devices and specifies a priority for each path. For an example, see Section 7.6.3, "Using a Script to Set Path Priorities," on page 67.
Multipath Attribute Description Values
novdocx (en) 7 January 2010
rr_min_io Specifies the number of I/O
transactions to route to a path before switching to the next path in the same path group, as determined by the specified algorithm in the
path_selector
rr_weight Specifies the weighting
method to use for paths.
no_path_retry Specifies the behaviors to use
on path failure.
failback Specifies whether to monitor
the failed path recovery, and indicates the timing for group failback after failed paths return to service.
When the failed path recovers, the path is added back into the multipath enabled path list based on this setting. Multipath evaluates the priority groups, and changes the active priority group when the priority of the primary path exceeds the secondary group.
setting.
n (>0): Specify an integer value greater than 0.
1000: Default.
uniform: Default. All paths have the same
round-robin weightings.
priorities: Each path’s weighting is determined by the path’s priority times the rr_min_io setting.
n (> 0): Specifies the number of retries until
multipath
Specify an integer value greater than 0.
fail: Specified immediate failure (no queuing).
queue : Never stop queuing (queue forever until
the path comes alive).
immediate: When a path recovers, enable the path immediately.
n (> 0): When the path recovers, wait n seconds before enabling the path. Specify an integer value greater than 0.
manual: (Default) The failed path is not monitored for recovery. The administrator runs
multipath
the paths and priority groups.
stops the queuing and fails the path.
command to update enabled
Configuring for Round-Robin Load Balancing
All paths are active. I/O is configured for some number of seconds or some number of I/O transactions before moving to the next open path in the sequence.
Configuring for Single Path Failover
A single path with the highest priority (lowest value setting) is active for traffic. Other paths are available for failover, but are not used unless failover occurs.
Grouping I/O Paths for Round-Robin Load Balancing
Multiple paths with the same priority fall into the active group. When all paths in that group fail, the device fails over to the next highest priority group. All paths in the group share the traffic load in a round-robin load balancing fashion.
7.6.3 Using a Script to Set Path Priorities
You can create a script that interacts with Device Mapper - Multipath (DM-MP) to provide priorities for paths to the LUN when set as a resource for the prio_callout setting.

First, set up a text file that lists information about each device and the priority values you want to assign to each path. For example, name the file /usr/local/etc/primary-paths. Enter one line for each path in the following format:

host_wwpn target_wwpn scsi_id priority_value

Return a priority value for each path on the device. Make sure that the variable FILE_PRIMARY_PATHS resolves to a real file with appropriate data (host wwpn, target wwpn, scsi_id and priority value) for each device.

The contents of the primary-paths file for a single LUN with eight paths might look like this:

0x10000000c95ebeb4 0x200200a0b8122c6e 2:0:0:0 sdb 3600a0b8000122c6d00000000453174fc 50
0x10000000c95ebeb4 0x200200a0b8122c6e 2:0:0:1 sdc 3600a0b80000fd6320000000045317563 2
0x10000000c95ebeb4 0x200200a0b8122c6e 2:0:0:2 sdd 3600a0b8000122c6d0000000345317524 50
0x10000000c95ebeb4 0x200200a0b8122c6e 2:0:0:3 sde 3600a0b80000fd6320000000245317593 2
0x10000000c95ebeb4 0x200300a0b8122c6e 2:0:1:0 sdi 3600a0b8000122c6d00000000453174fc 5
0x10000000c95ebeb4 0x200300a0b8122c6e 2:0:1:1 sdj 3600a0b80000fd6320000000045317563 51
0x10000000c95ebeb4 0x200300a0b8122c6e 2:0:1:2 sdk 3600a0b8000122c6d0000000345317524 5
0x10000000c95ebeb4 0x200300a0b8122c6e 2:0:1:3 sdl 3600a0b80000fd6320000000245317593 51

To continue the example mentioned in Table 7-5 on page 62, create a script named /usr/local/sbin/path_prio.sh. You can use any path and filename. The script does the following (a sketch of such a script follows this list):

On query from multipath, grep the device and its path from the /usr/local/etc/primary-paths file.
Return to multipath the priority value in the last column for that entry in the file.
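A minimal sketch of such a callout script, built on the hypothetical data file above. The file path, the FILE_PRIMARY_PATHS variable, and the lookup-by-device-name approach are illustrative assumptions rather than the exact script from this guide.

#!/bin/bash
# Hypothetical prio_callout helper: look up a path's priority in a data file.
# Usage: path_prio.sh <device-name>, for example: path_prio.sh sdb
FILE_PRIMARY_PATHS=/usr/local/etc/primary-paths

dev="$1"
# Find the line for this device name and print its last column (the priority).
prio=$(grep -w "$dev" "$FILE_PRIMARY_PATHS" | awk '{print $NF}' | head -n 1)
# Fall back to a priority of 1 if the device is not listed.
echo "${prio:-1}"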
7.6.4 Configuring ALUA
The mpath_prio_alua(8) command is used as a priority callout for the Linux multipath(8) command. It returns a number that is used by DM-MP to group SCSI devices with the same priority together. This path priority tool is based on ALUA (Asymmetric Logical Unit Access).

"Syntax" on page 68
"Prerequisite" on page 68
“Options” on page 68
“Return Values” on page 68
Syntax
mpath_prio_alua [-d directory] [-h] [-v] [-V] device [device...]
Prerequisite
SCSI devices
Options
-d directory
Specifies the Linux directory path where the listed device node names can be found. The default directory is /dev. When used, specify the device node name only (such as sda) for the device or devices you want to manage.

-h
Displays help for this command, then exits.

-v
Turns on verbose output to display status in human-readable format. Output includes information about which port group the specified device is in and its current state.

-V
Displays the version number of this tool, then exits.

device
Specifies the SCSI device you want to manage. The device must be a SCSI device that supports the Report Target Port Groups (sg_rtpg(8)) command. Use one of the following formats for the device node name:

The full Linux directory path, such as /dev/sda. Do not use with the -d option.
The device node name only, such as sda. Specify the directory path by using the -d option.
The major and minor number of the device separated by a colon (:) with no spaces, such as 8:0. This creates a temporary device node in the /dev directory with a name in the format of tmpdev-<major>:<minor>-<pid>. For example, /dev/tmpdev-8:0-<pid>.
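A usage sketch; the device name is a placeholder for one of your own SCSI devices:

# Query the ALUA-based priority for a device, with verbose status output
mpath_prio_alua -v /dev/sda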
Return Values
On success, returns a value of 0 and the priority value for the group. Table 7-6 shows the priority values returned by the mpath_prio_alua command.
Table 7-6 ALUA Priorities for Device Mapper Multipath
Priority Value Description
50 The device is in the active, optimized group.
10 The device is in an active but non-optimized group.
1 The device is in the standby group.
0 All other groups.
Values are widely spaced because of the way the multipath command handles them. It multiplies the number of paths in a group with the priority value for the group, then selects the group with the highest result. For example, if a non-optimized path group has six paths (6 x 10 = 60) and the optimized path group has a single path (1 x 50 = 50), the non-optimized group has the highest score, so multipath chooses the non-optimized group. Traffic to the device uses all six paths in the group in a round-robin fashion.

On failure, returns a value of 1 to 5 indicating the cause for the command's failure. For information, see the man page for mpath_prio_alua.
7.6.5 Reporting Target Path Groups
Use the SCSI Report Target Port Groups (sg_rtpg(8)) command. For information, see the man page for sg_rtpg(8).
7.7 Tuning the Failover for Specific Host Bus Adapters
When using multipath I/O, you want any host bus adapter (HBA) failure or cable failure to be reported faster than when multipathing is not in use. Configure the time-out settings for your HBA to disable failover at the HBA level, so that the failure can propagate up to the multipath I/O layer as quickly as possible, where the I/O can be redirected to another, healthy path.

To disable the HBA handling of failover, modify the driver's options in the /etc/modprobe.conf.local file. Refer to the HBA vendor's documentation for information about how to disable failover settings for your driver.

For example, for the QLogic qla2xxx family of host bus adapters, the following setting is recommended:

options qla2xxx qlport_down_retry=1
7.8 Configuring Multipath I/O for the Root Device
IMPORTANT: In the SUSE Linux Enterprise Server 10 SP1 initial release and earlier, the root (/) partition on multipath is supported only if the /boot partition is on a separate, non-multipathed partition. Otherwise, no boot loader is written.

DM-MP is now available and supported for /boot and /root in SUSE Linux Enterprise Server 11.

To enable multipathing on the existing root device:

1 Install Linux with only a single path active, preferably one where the by-id symlinks are listed in the partitioner.
2 Mount the devices by using the /dev/disk/by-id path used during the install.
3 After installation, add dm-multipath to /etc/sysconfig/kernel:INITRD_MODULES.
4 For System Z, before running mkinitrd, edit the /etc/zipl.conf file to change the by-path information in zipl.conf with the same by-id information that was used in /etc/fstab.
5 Re-run /sbin/mkinitrd to update the initrd image.
6 For System Z, after running mkinitrd, run zipl.
7 Reboot the server.
To disable multipathing on the root device:
1 Add multipath=off to the kernel command line.
This affects only the root device. All other devices are not affected.
7.9 Configuring Multipath I/O for an Existing Software RAID
Ideally, you should configure multipathing for devices before you use them as components of a software RAID device. If you add multipathing after creating any software RAID devices, the DM-MP service might be starting after the multipath service on reboot, which makes multipathing appear not to be available for RAIDs. You can use the procedure in this section to get multipathing running for a previously existing software RAID.

For example, you might need to configure multipathing for devices in a software RAID under the following circumstances:
If you create a new software RAID as part of the Partitioning settings during a new install or
upgrade.
If you did not configure the devices for multipathing before using them in the software RAID
as a member device or spare.
If you grow your system by adding new HBA adapters to the server or expanding the storage
subsystem in your SAN.
NOTE: The following instructions assume the software RAID device is /dev/mapper/mpath0, which is its device name as recognized by the kernel. Make sure to modify the instructions for the device name of your software RAID.
1 Open a terminal console, then log in as the root user or equivalent.
Except where otherwise directed, use this console to enter the commands in the following steps.
2 If any software RAID devices are currently mounted or running, enter the following commands
for each device to dismount the device and stop it.
umount /dev/mapper/mpath0
mdadm --misc --stop /dev/mapper/mpath0
3 Stop the boot.md service by entering
/etc/init.d/boot.md stop
4 Start the boot.multipath and multipathd services by entering the following commands:
/etc/init.d/boot.multipath start
/etc/init.d/multipathd start
5 After the multipathing services are started, verify that the software RAID's component devices are listed in the /dev/disk/by-id directory. Do one of the following:
Devices Are Listed: The device names should now have symbolic links to their Device Mapper Multipath device names, such as /dev/dm-1.
Devices Are Not Listed: Force the multipath service to recognize them by flushing and rediscovering the devices. To do this, enter the following commands:
multipath -F
multipath -v0
The devices should now be listed in /dev/disk/by-id, and have symbolic links to their Device Mapper Multipath device names. For example:
lrwxrwxrwx 1 root root 10 Jun 15 09:36 scsi-mpath1 -> ../../dm-1
6 Restart the boot.md service and the RAID device by entering
/etc/init.d/boot.md start
7 Check the status of the software RAID by entering
mdadm --detail /dev/mapper/mpath0
The RAID's component devices should match their Device Mapper Multipath device names that are listed as the symbolic links of devices in the /dev/disk/by-id directory.
8 Make a new initrd to ensure that the Device Mapper Multipath services are loaded before the RAID services on reboot. Enter
mkinitrd -f multipath
9 Reboot the server to apply these post-install configuration settings.
10 Verify that the software RAID array comes up properly on top of the multipathed devices by
checking the RAID status. Enter
mdadm --detail /dev/mapper/mpath0
For example:
Number Major Minor RaidDevice State
0 253 0 0 active sync /dev/dm-0
1 253 1 1 active sync /dev/dm-1
2 253 2 2 active sync /dev/dm-2
7.10 Scanning for New Devices without Rebooting
If your system has already been configured for multipathing and you later need to add more storage to the SAN, you can use the rescan-scsi-bus.sh script to scan for the new devices. By default, this script scans all HBAs with typical LUN ranges.

Syntax

rescan-scsi-bus.sh [options] [host [host ...]]

You can specify hosts on the command line (deprecated), or use the --hosts=LIST option (recommended).
Options
For most storage subsystems, the script can be run successfully without options. However, some special cases might need to use one or more of the following parameters for the rescan-scsi-bus.sh script:

-l: Activates scanning for LUNs 0-7. [Default: 0]
-L NUM: Activates scanning for LUNs 0 to NUM. [Default: 0]
-w: Scans for target device IDs 0 to 15. [Default: 0 to 7]
-c: Enables scanning of channels 0 or 1. [Default: 0]
-r, --remove: Enables removing of devices. [Default: Disabled]
-i, --issueLip: Issues a Fibre Channel LIP reset. [Default: Disabled]
--forcerescan: Rescans existing devices.
--forceremove: Removes and re-adds every device. (DANGEROUS)
--nooptscan: Don't stop looking for LUNs if 0 is not found.
--color: Use colored prefixes OLD/NEW/DEL.
--hosts=LIST: Scans only hosts in LIST, where LIST is a comma-separated list of single values and ranges (--hosts=A[-B][,C[-D]]). No spaces are allowed.
--channels=LIST: Scans only channels in LIST, where LIST is a comma-separated list of single values and ranges (--channels=A[-B][,C[-D]]). No spaces are allowed.
--ids=LIST: Scans only target IDs in LIST, where LIST is a comma-separated list of single values and ranges (--ids=A[-B][,C[-D]]). No spaces are allowed.
--luns=LIST: Scans only LUNs in LIST, where LIST is a comma-separated list of single values and ranges (--luns=A[-B][,C[-D]]). No spaces are allowed.
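For example, a hedged invocation that rescans only the first two SCSI hosts for LUNs 0 through 7; the host numbers are assumptions, so check /sys/class/scsi_host on your system first:

rescan-scsi-bus.sh --hosts=0-1 --luns=0-7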
Procedure
Use the following procedure to scan the devices and make them available to multipathing without rebooting the system.
1 On the storage subsystem, use the vendor’s tools to allocate the device and update its access
control settings to allow the Linux system access to the new storage. Refer to the vendor’s documentation for details.
2 Scan all targets for a host to make its new device known to the middle layer of the Linux
kernel’s SCSI subsystem. At a terminal console prompt, enter
rescan-scsi-bus.sh [options]
3 Check for scanning progress in the system log (the /var/log/messages file). At a terminal console prompt, enter
tail -30 /var/log/messages
This command displays the last 30 lines of the log. For example:
# tail -30 /var/log/messages
. . .
Feb 14 01:03 kernel: SCSI device sde: 81920000
Feb 14 01:03 kernel: SCSI device sdf: 81920000
Feb 14 01:03 multipathd: sde: path checker registered
Feb 14 01:03 multipathd: sdf: path checker registered
Feb 14 01:03 multipathd: mpath4: event checker started
Feb 14 01:03 multipathd: mpath5: event checker started
Feb 14 01:03 multipathd: mpath4: remaining active paths: 1
Feb 14 01:03 multipathd: mpath5: remaining active paths: 1
4 Repeat Step 2 through Step 3 to add paths through other HBA adapters on the Linux system
that are connected to the new device.
5 Run the multipath command to recognize the devices for DM-MP configuration. At a terminal console prompt, enter
multipath
You can now configure the new device for multipathing.
7.11 Scanning for New Partitioned Devices without Rebooting
Use the example in this section to detect a newly added multipathed LUN without rebooting.

1 Open a terminal console, then log in as the root user.
2 Scan all targets for a host to make its new device known to the middle layer of the Linux kernel's SCSI subsystem. At a terminal console prompt, enter
rescan-scsi-bus.sh [options]
For syntax and options information for the rescan-scsi-bus.sh script, see Section 7.10, "Scanning for New Devices without Rebooting," on page 71.
3 Verify that the device is seen (the link has a new time stamp) by entering
ls -lrt /dev/dm-*
4 Verify the new WWN of the device appears in the log by entering
tail -33 /var/log/messages
5 Use a text editor to add a new alias definition for the device in the /etc/multipath.conf file, such as oradata3.
6 Create a partition table for the device by entering
fdisk /dev/dm-8
7 Trigger udev by entering
echo 'add' > /sys/block/dm-8/uevent
This generates the device-mapper devices for the partitions on dm-8.
8 Create a file system and label for the new partition by entering
mke2fs -j /dev/dm-9
tune2fs -L oradata3 /dev/dm-9
9 Restart DM-MP to let it read the aliases by entering
/etc/init.d/multipathd restart
10 Verify that the device is recognized by multipathd by entering
multipath -ll
11 Use a text editor to add a mount entry in the /etc/fstab file.
At this point, the alias you created in Step 5 is not yet in the /dev/disk/by-label directory. Add the mount entry for the /dev/dm-9 path, then change the entry to LABEL=oradata3 before the next time you reboot. (A sketch of these fstab entries follows this procedure.)
12 Create a directory to use as the mount point, then mount the device by entering
mkdir /oradata3
mount /oradata3
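A minimal sketch of the two fstab entries described in Step 11; the file system type and mount options are assumptions, so adjust them for your setup:

# Initial entry, added while the by-label link does not yet exist:
/dev/dm-9         /oradata3   ext3   defaults   1 2
# Replacement entry to switch to before the next reboot:
LABEL=oradata3    /oradata3   ext3   defaults   1 2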
7.12 Viewing Multipath I/O Status
Querying the multipath I/O status outputs the current status of the multipath maps.
The multipath -l option displays the current path status as of the last time that the path checker was run. It does not run the path checker.

The multipath -ll option runs the path checker, updates the path information, then displays the current status information. This option always displays the latest information about the path status.
1 At a terminal console prompt, enter
multipath -ll
This displays information for each multipathed device. For example:
3600601607cf30e00184589a37a31d911
[size=127 GB][features="0"][hwhandler="1 emc"]
\_ round-robin 0 [active][first]
  \_ 1:0:1:2 sdav 66:240 [ready ][active]
  \_ 0:0:1:2 sdr  65:16  [ready ][active]
\_ round-robin 0 [enabled]
  \_ 1:0:0:2 sdag 66:0   [ready ][active]
  \_ 0:0:0:2 sdc  8:32   [ready ][active]
For each device, it shows the device’s ID, size, features, and hardware handlers.
Paths to the device are automatically grouped into priority groups on device discovery. Only one priority group is active at a time. For an active/active configuration, all paths are in the same group. For an active/passive configuration, the passive paths are placed in separate priority groups.
The following information is displayed for each group:
Scheduling policy used to balance I/O within the group, such as round-robin
Whether the group is active, disabled, or enabled
Whether the group is the first (highest priority) group
Paths contained within the group
The following information is displayed for each path:
The physical address as host:bus:target:lun, such as 1:0:1:2
Device node name, such as sda
Major:minor numbers
Status of the device
7.13 Managing I/O in Error Situations
You might need to configure multipathing to queue I/O if all paths fail concurrently by enabling queue_if_no_path. Otherwise, I/O fails immediately if all paths are gone. In certain scenarios, where the driver, the HBA, or the fabric experiences spurious errors, it is advisable that DM-MP be configured to queue all I/O where those errors lead to a loss of all paths, and never propagate errors upwards.
When using multipathed devices in a cluster, you might choose to disable queue_if_no_path. This automatically fails the path instead of queuing the I/O, and escalates the I/O error to cause a failover of the cluster resources.

Because enabling queue_if_no_path leads to I/O being queued indefinitely unless a path is reinstated, make sure that multipathd is running and works for your scenario. Otherwise, I/O might be stalled indefinitely on the affected multipathed device until reboot or until you manually return to failover instead of queuing.
To test the scenario:

1 In a terminal console, log in as the root user.
2 Activate queuing instead of failover for the device I/O by entering:
dmsetup message device_ID 0 queue_if_no_path
Replace the device_ID with the ID for your device. For example, enter:
dmsetup message 3600601607cf30e00184589a37a31d911 0 queue_if_no_path
3 Return to failover for the device I/O by entering:
dmsetup message device_ID 0 fail_if_no_path
This command immediately causes all queued I/O to fail.
Replace the device_ID with the ID for your device. For example, enter:
dmsetup message 3600601607cf30e00184589a37a31d911 0 fail_if_no_path
To set up queuing I/O for scenarios where all paths fail:
1 In a terminal console, log in as the root user.
2 Open the /etc/multipath.conf file in a text editor.
3 Uncomment the defaults section and its ending bracket, then add the default_features setting, as follows:

defaults {
  default_features "1 queue_if_no_path"
}
4 After you modify the /etc/multipath.conf file, you must run mkinitrd to re-create the INITRD on your system, then reboot in order for the changes to take effect.
5 When you are ready to return to failover for the device I/O, enter:
dmsetup message mapname 0 fail_if_no_path
Replace mapname with the mapped alias name or the device ID for the device.
This command immediately causes all queued I/O to fail and propagates the error to the calling application.
7.14 Resolving Stalled I/O
If all paths fail concurrently and I/O is queued and stalled, do the following:

1 Enter the following command at a terminal console prompt:
dmsetup message mapname 0 fail_if_no_path
Replace mapname with the correct device ID or mapped alias name for the device. This causes all queued I/O to fail and propagates the error to the calling application.
2 Reactivate queueing by entering the following command at a terminal console prompt:
dmsetup message mapname 0 queue_if_no_path
7.15 Additional Information
For more information about configuring and using multipath I/O on SUSE Linux Enterprise Server, see the following additional resources in the Novell Support Knowledgebase:
How to Setup/Use Multipathing on SLES (http://support.novell.com/techcenter/sdb/en/2005/
04/sles_multipathing.html)
Troubleshooting SLES Multipathing (MPIO) Problems (Technical Information Document
3231766) (http://www.novell.com/support/ search.do?cmd=displayKC&docType=kc&externalId=3231766&sliceId=SAL_Public)
Dynamically Adding Storage for Use with Multipath I/O (Technical Information Document
3000817) (https://secure-support.novell.com/KanisaPlatform/Publishing/911/ 3000817_f.SAL_Public.html)
DM MPIO Device Blacklisting Not Honored in multipath.conf (Technical Information
Document 3029706) (http://www.novell.com/support/ search.do?cmd=displayKC&docType=kc&externalId=3029706&sliceId=SAL_Public&dialogI D=57872426&stateId=0%200%2057878058)
Static Load Balancing in Device-Mapper Multipathing (DM-MP) (Technical Information
Document 3858277) (http://www.novell.com/support/ search.do?cmd=displayKC&docType=kc&externalId=3858277&sliceId=SAL_Public&dialogI D=57872426&stateId=0%200%2057878058)
Troubleshooting SCSI (LUN) Scanning Issues (Technical Information Document 3955167)
(http://www.novell.com/support/ search.do?cmd=displayKC&docType=kc&externalId=3955167&sliceId=SAL_Public&dialogI D=57868704&stateId=0%200%2057878206)
7.16 What’s Next
If you want to use software RAIDs, create and configure them before you create file systems on the devices. For information, see the following:
Chapter 8, “Software RAID Configuration,” on page 79
Chapter 10, “Managing Software RAIDs 6 and 10 with mdadm,” on page 89
8
Software RAID Configuration
The purpose of RAID (redundant array of independent disks) is to combine several hard disk partitions into one large virtual hard disk to optimize performance, data security, or both. Most RAID controllers use the SCSI protocol because it can address a larger number of hard disks in a more effective way than the IDE protocol and is more suitable for parallel processing of commands. There are some RAID controllers that support IDE or SATA hard disks. Software RAID provides the advantages of RAID systems without the additional cost of hardware RAID controllers. However, this requires some CPU time and has memory requirements that make it unsuitable for real high performance computers.
IMPORTANT: Software RAID is not supported underneath clustered file systems such as OCFS2, because RAID does not support concurrent activation. If you want RAID for OCFS2, you need the RAID to be handled by the storage subsystem.
SUSE® Linux Enterprise offers the option of combining several hard disks into one soft RAID system. RAID implies several strategies for combining several hard disks in a RAID system, each with different goals, advantages, and characteristics. These variations are commonly known as RAID levels.
Section 8.1, “Understanding RAID Levels,” on page 79
Section 8.2, “Soft RAID Configuration with YaST,” on page 80
Section 8.3, “Troubleshooting,” on page 82
Section 8.4, “For More Information,” on page 82
8.1 Understanding RAID Levels
This section describes common RAID levels 0, 1, 2, 3, 4, 5, and nested RAID levels.
Section 8.1.1, “RAID 0,” on page 79
Section 8.1.2, “RAID 1,” on page 80
Section 8.1.3, “RAID 2 and RAID 3,” on page 80
Section 8.1.4, “RAID 4,” on page 80
Section 8.1.5, “RAID 5,” on page 80
Section 8.1.6, “Nested RAID Levels,” on page 80
8.1.1 RAID 0
This level improves the performance of your data access by spreading out blocks of each file across multiple disk drives. Actually, this is not really a RAID, because it does not provide data backup, but the name RAID 0 for this type of system has become the norm. With RAID 0, two or more hard disks are pooled together. The performance is very good, but the RAID system is destroyed and your data lost if even one hard disk fails.
8.1.2 RAID 1
This level provides adequate security for your data, because the data is copied to another hard disk 1:1. This is known as hard disk mirroring. If a disk is destroyed, a copy of its contents is available on another mirrored disk. All disks except one could be damaged without endangering your data. However, if damage is not detected, damaged data might be mirrored to the correct disk, and the data becomes corrupted that way. The writing performance suffers a little in the copying process compared to using single disk access (10 to 20% slower), but read access is significantly faster in comparison to any one of the normal physical hard disks, because the data is duplicated and can therefore be scanned in parallel. RAID 1 generally provides nearly twice the read transaction rate of single disks and almost the same write transaction rate as single disks.
8.1.3 RAID 2 and RAID 3
These are not typical RAID implementations. Level 2 stripes data at the bit level rather than the block level. Level 3 provides byte-level striping with a dedicated parity disk and cannot service simultaneous multiple requests. Both levels are rarely used.
8.1.4 RAID 4
Level 4 provides block-level striping just like Level 0 combined with a dedicated parity disk. If a data disk fails, the parity data is used to create a replacement disk. However, the parity disk might create a bottleneck for write access. Nevertheless, Level 4 is sometimes used.
8.1.5 RAID 5
RAID 5 is an optimized compromise between Level 0 and Level 1 in terms of performance and redundancy. The hard disk space equals the number of disks used minus one. The data is distributed over the hard disks as with RAID 0. Parity blocks, created on one of the partitions, are there for security reasons. They are linked to each other with XOR, enabling the contents to be reconstructed by the corresponding parity block in case of system failure. With RAID 5, no more than one hard disk can fail at the same time. If one hard disk fails, it must be replaced as soon as possible to avoid the risk of losing data.
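To make the parity idea concrete, here is a tiny illustrative calculation using shell arithmetic; the byte values are invented for the example and are not taken from this guide.

# Two data blocks and their parity block:
d1=0x5A; d2=0x3C
parity=$(( d1 ^ d2 ))            # parity = d1 XOR d2
# If the disk holding d2 fails, its contents can be rebuilt from the rest:
rebuilt_d2=$(( parity ^ d1 ))    # equals the original d2 (0x3C)
printf 'rebuilt d2 = 0x%X\n' "$rebuilt_d2"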
8.1.6 Nested RAID Levels
Several other RAID levels have been developed, such as RAIDn, RAID 10, RAID 0+1, RAID 30, and RAID 50. Some of them are proprietary implementations created by hardware vendors. These levels are not very widespread and are not explained here.
8.2 Soft RAID Configuration with YaST
The YaST soft RAID configuration can be reached from the YaST Expert Partitioner. This partitioning tool enables you to edit and delete existing partitions and create new ones that should be used with soft RAID.
You can create RAID partitions by first clicking Create > Do not format then selecting 0xFD Linux RAID as the partition identifier. For RAID 0 and RAID 1, at least two partitions are needed—for RAID 1, usually exactly two and no more. If RAID 5 is used, at least three partitions are required. It is recommended to use only partitions of the same size because each segment can contribute only
the same amount of space as the smallest sized partition. The RAID partitions should be stored on different hard disks to decrease the risk of losing data if one is defective (RAID 1 and 5) and to optimize the performance of RAID 0. After creating all the partitions to use with RAID, click RAID > Create RAID to start the RAID configuration.
In the next dialog, choose among RAID levels 0, 1, and 5, then click Next. The following dialog (see Figure 8-1) lists all partitions with either the Linux RAID or Linux native type. No swap or DOS partitions are shown. If a partition is already assigned to a RAID volume, the name of the RAID device (for example, /dev/md0) is shown in the list. Unassigned partitions are indicated with "--".

Figure 8-1 RAID Partitions
To add a previously unassigned partition to the selected RAID volume, first select the partition then click Add. At this point, the name of the RAID device is displayed next to the selected partition. Assign all partitions reserved for RAID. Otherwise, the space on the partition remains unused. After assigning all partitions, click Next to proceed to the settings dialog where you can fine-tune the performance (see Figure 8-2).
Figure 8-2 File System Settings
As with conventional partitioning, set the file system to use as well as encryption and the mount point for the RAID volume. After completing the configuration with Finish, see the /dev/md0 device and others indicated with RAID in the Expert Partitioner.
8.3 Troubleshooting
Check the /proc/mdstat file to find out whether a RAID partition has been damaged. In the event of a system failure, shut down your Linux system and replace the defective hard disk with a new one partitioned the same way. Then restart your system and enter the command mdadm /dev/mdX --add /dev/sdX. Replace X with your particular device identifiers. This integrates the hard disk automatically into the RAID system and fully reconstructs it.

Although you can access all data during the rebuild, you might encounter some performance issues until the RAID has been fully rebuilt.
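A command sketch of that recovery flow; the array and partition names /dev/md0 and /dev/sdb1 are assumptions for illustration:

# Check whether any array is degraded
cat /proc/mdstat
# Add the replacement disk (or partition) back into the array
mdadm /dev/md0 --add /dev/sdb1
# Watch the rebuild progress
watch cat /proc/mdstat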
8.4 For More Information
Configuration instructions and more details for soft RAID can be found in the HOWTOs at:

The Software RAID HOWTO (http://en.tldp.org/HOWTO/Software-RAID-HOWTO.html)
The Software RAID HOWTO in the /usr/share/doc/packages/mdadm/Software-RAID.HOWTO.html file

Linux RAID mailing lists are also available, such as linux-raid (http://marc.theaimsgroup.com/?l=linux-raid).
9
Configuring Software RAID for the Root Partition

In SUSE® Linux Enterprise Server 11, the Device Mapper RAID tool has been integrated into the YaST Partitioner. You can use the partitioner at install time to create a software RAID for the system device that contains your root (/) partition.

Section 9.1, "Prerequisites for the Software RAID," on page 83
Section 9.2, "Enabling iSCSI Initiator Support at Install Time," on page 83
Section 9.3, "Enabling Multipath I/O Support at Install Time," on page 84
Section 9.4, "Creating a Software RAID Device for the Root (/) Partition," on page 84
9.1 Prerequisites for the Software RAID
Make sure your configuration meets the following requirements:
You need two or more hard drives, depending on the type of software RAID you plan to create.
RAID 0 (Striping): RAID 0 requires two or more devices. RAID 0 offers no fault
tolerance benefits, and it is not recommended for the system device.
RAID 1 (Mirroring): RAID 1 requires two devices.
RAID 5 (Redundant Striping): RAID 5 requires three or more devices.
The hard drives should be similarly sized. The RAID assumes the size of the smallest drive.
The block storage devices can be any combination of local (in or directly attached to the
machine), Fibre Channel storage subsystems, or iSCSI storage subsystems.
If you are using hardware RAID devices, do not attempt to run software RAIDs on top of them.
If you are using iSCSI target devices, enable the iSCSI initiator support before you create the
RAID device.
If your storage subsystem provides multiple I/O paths between the server and its directly
attached local devices, Fibre Channel devices, or iSCSI devices that you want to use in the software RAID, you must enable the multipath support before you create the RAID device.
9.2 Enabling iSCSI Initiator Support at Install Time
If there are iSCSI target devices that you want to use for the root (/) partition, you must enable the iSCSI Initiator software to make those devices available to you before you create the software RAID device.
1 Proceed with the YaST install of SUSE Linux Enterprise 11 until you reach the Installation
Settings page.
2 Click Partitioning to open the Preparing Hard Disk page, click Custom Partitioning (for
experts), then click Next.
3 On the Expert Partitioner page, expand Hard Disks in the System View panel to view the default
proposal.
4 On the Hard Disks page, select Configure > Configure iSCSI, then click Continue when
prompted to continue with initializing the iSCSI initiator configuration.
9.3 Enabling Multipath I/O Support at Install Time
If there are multiple I/O paths to the devices you want to use to create a software RAID device for the root (/) partition, you must enable multipath support before you create the software RAID device.
1 Proceed with the YaST install of SUSE Linux Enterprise 11 until you reach the Installation
Settings page.
2 Click Partitioning to open the Preparing Hard Disk page, click Custom Partitioning (for
experts), then click Next.
3 On the Expert Partitioner page, expand Hard Disks in the System View panel to view the default
proposal.
4 On the Hard Disks page, select Configure > Configure Multipath, then click Yes when
prompted to activate multipath.
This re-scans the devices and resolves the multiple paths so that each device is listed only once in the list of hard disks.
9.4 Creating a Software RAID Device for the Root (/) Partition
1 Proceed with the YaST install of SUSE Linux Enterprise 11 until you reach the Installation
Settings page.
2 Click Partitioning to open the Preparing Hard Disk page, click Custom Partitioning (for
experts), then click Next.
3 On the Expert Partitioner page, expand Hard Disks in the System View panel to view the default
proposal, select the proposed partitions, then click Delete.
4 Create a swap partition.
4a On the Expert Partitioner page under Hard Disks, select the device you want to use for the
swap partition, then click Add on the Hard Disk Partitions tab.
4b Under New Partition Type, select Primary Partition, then click Next.
4c Under New Partition Size, specify the size to use, then click Next.
4d Under Format Options, select Format partition, then select Swap from the drop-down list.
4e Under Mount Options, select Mount partition, then select swap from the drop-down list.
4f Click Finish.
5 Set up the 0xFD Linux RAID format for each of the devices you want to use for the software
RAID.
5a On the Expert Partitioner page under Hard Disks, select the device you want to use in the
RAID, then click Add on the Hard Disk Partitions tab.
5b Under New Partition Type, select Primary Partition, then click Next.
5c Under New Partition Size, specify to use the maximum size, then click Next.
5d Under Format Options, select Do not format partition, then select 0xFD Linux RAID from
the drop-down list.
5e Under Mount Options, select Do not mount partition.
5f Click Finish.
5g Repeat Step 5a to Step 5f for each device that you plan to use in the software RAID.
6 Create the RAID device.
6a In the System View panel, select RAID, then click Add RAID on the RAID page.
The devices that you prepared in Step 5 are listed in Available Devices.
6b Under RAID Type, select RAID 0 (Striping), RAID 1 (Mirroring), or RAID 5 (Redundant
Striping).
For example, select RAID 1 (Mirroring).
6c In the Available Devices panel, select the devices you want to use for the RAID, then click
Add to move the devices to the Selected Devices panel.
Specify two devices for a RAID 1, two or more devices for a RAID 0, or at least three devices for a RAID 5.
To continue the example, two devices are selected for RAID 1.
6d Click Next.
6e Under RAID Options, select the chunk size from the drop-down list.
The default chunk size for a RAID 1 (Mirroring) is 4 KB.
The default chunk size for a RAID 0 (Striping) is 32 KB.
Available chunk sizes are 4 KB, 8 KB, 16 KB, 32 KB, 64 KB, 128 KB, 256 KB, 512 KB, 1 MB, 2 MB, or 4 MB.
6f Under Formatting Options, select Format partition, then select the file system type (such as Ext3) from the File system drop-down list.
6g Under Mounting Options, select Mount partition, then select / from the Mount Point drop-down list.
6h Click Finish.
The software RAID device is managed by Device Mapper, and creates a device under the /dev/md0 path.
7 On the Expert Partitioner page, click Accept.
The new proposal appears under Partitioning on the Installation Settings page.
For example, the proposal shows the setup for the software RAID 1 device created in Step 6.
8 Continue with the install.
Whenever you reboot your server, Device Mapper is started at boot time so that the software RAID is automatically recognized, and the operating system on the root (/) partition can be started.
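After the first reboot, you can verify the state of the software RAID from a terminal console. This is a quick check, assuming the RAID device was created as /dev/md0 as in the example above:
cat /proc/mdstat           # lists the active md arrays and their component devices
mdadm --detail /dev/md0    # shows the RAID level, state, and members of the root RAID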
10 Managing Software RAIDs 6 and 10 with mdadm
This section describes how to create software RAID 6 and 10 devices, using the Multiple Devices Administration (mdadm(8)) tool. You can also use mdadm to create RAIDs 0, 1, 4, and 5. The mdadm tool provides the functionality of legacy programs mdtools and raidtools.
Section 10.1, “Creating a RAID 6,” on page 89
Section 10.2, “Creating Nested RAID 10 Devices with mdadm,” on page 90
Section 10.3, “Creating a Complex RAID 10 with mdadm,” on page 93
Section 10.4, “Creating a Degraded RAID Array,” on page 96
10.1 Creating a RAID 6
Section 10.1.1, “Understanding RAID 6,” on page 89
Section 10.1.2, “Creating a RAID 6,” on page 90
10.1.1 Understanding RAID 6
RAID 6 is essentially an extension of RAID 5 that allows for additional fault tolerance by using a second independent distributed parity scheme (dual parity). Even if two of the hard disk drives fail during the data recovery process, the system continues to be operational, with no data loss.
RAID 6 provides for extremely high data fault tolerance by sustaining multiple simultaneous drive failures. It handles the loss of any two devices without data loss. Accordingly, it requires N+2 drives to store N drives worth of data. It requires a minimum of 4 devices.
The performance for RAID 6 is slightly lower but comparable to RAID 5 in normal mode and single disk failure mode. It is very slow in dual disk failure mode.
Table 10-1 Comparison of RAID 5 and RAID 6

Feature             RAID 5                            RAID 6
Number of devices   N+1, minimum of 3                 N+2, minimum of 4
Parity              Distributed, single               Distributed, dual
Performance         Medium impact on write and        More impact on sequential write
                    rebuild                           than RAID 5
Fault-tolerance     Failure of one component device   Failure of two component devices
10.1.2 Creating a RAID 6
The procedure in this section creates a RAID 6 device /dev/md0 with four devices: /dev/sda1, /dev/sdb1, /dev/sdc1, and /dev/sdd1. Make sure to modify the procedure to use your actual device nodes.
1 Open a terminal console, then log in as the root user or equivalent.
2 Create a RAID 6 device. At the command prompt, enter
mdadm --create /dev/md0 --run --level=raid6 --chunk=128 --raid-devices=4 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
The default chunk size is 64 KB.
3 Create a file system on the RAID 6 device /dev/md0, such as a Reiser file system (reiserfs).
For example, at the command prompt, enter
mkfs.reiserfs /dev/md0
Modify the command if you want to use a different file system.
4 Edit the /etc/fstab file to add an entry for the RAID 6 device /dev/md0.
5 Edit the /etc/mdadm.conf file to add entries for the component devices and the RAID device /dev/md0. (Example entries are shown after this procedure.)
6 Reboot the server.
The RAID 6 device is mounted to /local.
7 (Optional) Add a hot spare to service the RAID array. For example, at the command prompt
enter:
mdadm /dev/md0 -a /dev/sde1
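The entries added in Step 4 and Step 5 depend on your devices and file system. As a rough sketch, assuming the example component partitions above, a reiserfs file system, and the /local mount point mentioned in Step 6 (the mdadm --detail --scan command can also generate a suitable ARRAY line):
# /etc/fstab
/dev/md0   /local   reiserfs   defaults   0 2

# /etc/mdadm.conf
DEVICE /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
ARRAY /dev/md0 level=raid6 num-devices=4 devices=/dev/sda1,/dev/sdb1,/dev/sdc1,/dev/sdd1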
10.2 Creating Nested RAID 10 Devices with mdadm
Section 10.2.1, “Understanding Nested RAID Devices,” on page 90
Section 10.2.2, “Creating Nested RAID 10 (1+0) with mdadm,” on page 91
Section 10.2.3, “Creating Nested RAID 10 (0+1) with mdadm,” on page 92
10.2.1 Understanding Nested RAID Devices
A nested RAID device consists of a RAID array that uses another RAID array as its basic element, instead of using physical disks. The goal of this configuration is to improve the performance and fault tolerance of the RAID.
Linux supports nesting of RAID 1 (mirroring) and RAID 0 (striping) arrays. Generally, this combination is referred to as RAID 10. To distinguish the order of the nesting, this document uses the following terminology:
RAID 1+0: RAID 1 (mirror) arrays are built first, then combined to form a RAID 0 (stripe)
array.
RAID 0+1: RAID 0 (stripe) arrays are built first, then combined to form a RAID 1 (mirror)
array.
The following table describes the advantages and disadvantages of RAID 10 nesting as 1+0 versus 0+1. It assumes that the storage objects you use reside on different disks, each with a dedicated I/O capability.
Table 10-2 Nested RAID Levels
10 (1+0): RAID 0 (stripe) built with RAID 1 (mirror) arrays.
Performance and fault tolerance: RAID 1+0 provides high levels of I/O performance, data redundancy, and disk fault tolerance. Because each member device in the RAID 0 is mirrored individually, multiple disk failures can be tolerated and data remains available as long as the disks that fail are in different mirrors.
You can optionally configure a spare for each underlying mirrored array, or configure a spare to serve a spare group that serves all mirrors.

10 (0+1): RAID 1 (mirror) built with RAID 0 (stripe) arrays.
Performance and fault tolerance: RAID 0+1 provides high levels of I/O performance and data redundancy, but slightly less fault tolerance than a 1+0. If multiple disks fail on one side of the mirror, then the other mirror is available. However, if disks are lost concurrently on both sides of the mirror, all data is lost.
This solution offers less disk fault tolerance than a 1+0 solution, but if you need to perform maintenance or maintain the mirror on a different site, you can take an entire side of the mirror offline and still have a fully functional storage device. Also, if you lose the connection between the two sites, either site operates independently of the other. That is not true if you stripe the mirrored segments, because the mirrors are managed at a lower level.
If a device fails, the mirror on that side fails because RAID 1 is not fault-tolerant. Create a new RAID 0 to replace the failed side, then resynchronize the mirrors.
10.2.2 Creating Nested RAID 10 (1+0) with mdadm
A nested RAID 1+0 is built by creating two or more RAID 1 (mirror) devices, then using them as component devices in a RAID 0.
IMPORTANT: If you need to manage multiple connections to the devices, you must configure multipath I/O before configuring the RAID devices. For information, see Chapter 7, “Managing
Multipath I/O for Devices,” on page 43.
The procedure in this section uses the device names shown in the following table. Make sure to modify the device names with the names of your own devices.
Table 10-3 Scenario for Creating a RAID 10 (1+0) by Nesting

Raw Devices             RAID 1 (mirror)    RAID 1+0 (striped mirrors)
/dev/sdb1, /dev/sdc1    /dev/md0           /dev/md2
/dev/sdd1, /dev/sde1    /dev/md1

1 Open a terminal console, then log in as the root user or equivalent.
2 Create 2 software RAID 1 devices, using two different devices for each RAID 1 device. At the
command prompt, enter these two commands:
mdadm --create /dev/md0 --run --level=1 --raid-devices=2 /dev/sdb1 /dev/sdc1
mdadm --create /dev/md1 --run --level=1 --raid-devices=2 /dev/sdd1 /dev/sde1
3 Create the nested RAID 1+0 device. At the command prompt, enter the following command
using the software RAID 1 devices you created in Step 2:
mdadm --create /dev/md2 --run --level=0 --chunk=64 --raid-devices=2 /dev/md0 /dev/md1
The default chunk size is 64 KB.
4 Create a file system on the RAID 1+0 device /dev/md2, such as a Reiser file system (reiserfs).
For example, at the command prompt, enter
mkfs.reiserfs /dev/md2
Modify the command if you want to use a different file system.
5 Edit the /etc/fstab file to add an entry for the RAID 1+0 device /dev/md2.
6 Edit the /etc/mdadm.conf file to add entries for the component devices and the RAID device /dev/md2.
7 Reboot the server.
The RAID 1+0 device is mounted to /local.
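To confirm that the nesting came up as intended, you can inspect the arrays from the console. A small sketch, using the device names from Table 10-3:
cat /proc/mdstat          # the two RAID 1 mirrors and the RAID 0 stripe should all be active
mdadm --detail /dev/md2   # shows /dev/md0 and /dev/md1 as the component devices of the stripe
mdadm --detail --scan     # prints ARRAY lines that can be appended to /etc/mdadm.conf for Step 6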
10.2.3 Creating Nested RAID 10 (0+1) with mdadm
A nested RAID 0+1 is built by creating two to four RAID 0 (striping) devices, then mirroring them as component devices in a RAID 1.
IMPORTANT: If you need to manage multiple connections to the devices, you must configure multipath I/O before configuring the RAID devices. For information, see Chapter 7, “Managing
Multipath I/O for Devices,” on page 43.
In this configuration, spare devices cannot be specified for the underlying RAID 0 devices because RAID 0 cannot tolerate a device loss. If a device fails on one side of the mirror, you must create a replacement RAID 0 device, then add it into the mirror, as shown in the sketch that follows.
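For example, replacing a failed RAID 0 on one side of the mirror might look like the following sketch. The md device names match Table 10-4 below; /dev/sdx1 and /dev/sdy1 stand for hypothetical replacement partitions:
mdadm /dev/md2 --fail /dev/md0 --remove /dev/md0    # drop the failed stripe from the mirror
mdadm --stop /dev/md0                               # deactivate the old RAID 0 before rebuilding it
mdadm --create /dev/md0 --run --level=0 --chunk=64 --raid-devices=2 /dev/sdx1 /dev/sdy1
mdadm /dev/md2 --add /dev/md0                       # add the rebuilt stripe back and let the mirror resynchronize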
The procedure in this section uses the device names shown in the following table. Make sure to modify the device names with the names of your own devices.
Table 10-4 Scenario for Creating a RAID 10 (0+1) by Nesting
Raw Devices             RAID 0 (stripe)    RAID 0+1 (mirrored stripes)
/dev/sdb1, /dev/sdc1    /dev/md0           /dev/md2
/dev/sdd1, /dev/sde1    /dev/md1
1 Open a terminal console, then log in as the root user or equivalent.
2 Create two software RAID 0 devices, using two different devices for each RAID 0 device. At
the command prompt, enter these two commands:
mdadm --create /dev/md0 --run --level=0 --chunk=64 --raid-devices=2 /dev/sdb1 /dev/sdc1
mdadm --create /dev/md1 --run --level=0 --chunk=64 --raid-devices=2 /dev/sdd1 /dev/sde1
The default chunk size is 64 KB.
3 Create the nested RAID 0+1 device. At the command prompt, enter the following command
using the software RAID 0 devices you created in Step 2:
mdadm --create /dev/md2 --run --level=1 --raid-devices=2 /dev/md0 /dev/md1
4 Create a file system on the RAID 0+1 device /dev/md2, such as a Reiser file system (reiserfs).
For example, at the command prompt, enter
mkfs.reiserfs /dev/md2
Modify the command if you want to use a different file system.
5 Edit the /etc/fstab file to add an entry for the RAID 0+1 device /dev/md2.
6 Edit the /etc/mdadm.conf file to add entries for the component devices and the RAID device /dev/md2.
7 Reboot the server.
The RAID 0+1 device is mounted to /local.
10.3 Creating a Complex RAID 10 with mdadm
Section 10.3.1, “Understanding the mdadm RAID10,” on page 93
Section 10.3.2, “Creating a RAID 10 with mdadm,” on page 96
10.3.1 Understanding the mdadm RAID10
In mdadm, the RAID10 level creates a single complex software RAID that combines features of both RAID 0 (striping) and RAID 1 (mirroring). Multiple copies of all data blocks are arranged on multiple drives following a striping discipline. Component devices should be the same size.
“Comparing the Complex RAID10 and Nested RAID 10 (1+0)” on page 94
“Number of Replicas in the mdadm RAID10” on page 94
“Number of Devices in the mdadm RAID10” on page 94
“Near Layout” on page 95
“Far Layout” on page 95
Comparing the Complex RAID10 and Nested RAID 10 (1+0)
The complex RAID 10 is similar in purpose to a nested RAID 10 (1+0), but differs in the following ways:
Table 10-5 Complex vs. Nested RAID 10
Feature: Number of devices
mdadm RAID10 Option: Allows an even or odd number of component devices
Nested RAID 10 (1+0): Requires an even number of component devices

Feature: Component devices
mdadm RAID10 Option: Managed as a single RAID device
Nested RAID 10 (1+0): Managed as a nested RAID device

Feature: Striping
mdadm RAID10 Option: Striping occurs in the near or far layout on component devices. The far layout provides sequential read throughput that scales by number of drives, rather than number of RAID 1 pairs.
Nested RAID 10 (1+0): Striping occurs consecutively across component devices

Feature: Multiple copies of data
mdadm RAID10 Option: Two or more copies, up to the number of devices in the array
Nested RAID 10 (1+0): Copies on each mirrored segment

Feature: Hot spare devices
mdadm RAID10 Option: A single spare can service all component devices
Nested RAID 10 (1+0): Configure a spare for each underlying mirrored array, or configure a spare to serve a spare group that serves all mirrors.
Number of Replicas in the mdadm RAID10
When configuring an mdadm RAID10 array, you must specify the number of replicas of each data block that are required. The default number of replicas is 2, but the value can range from 2 up to the number of devices in the array.
Number of Devices in the mdadm RAID10
You must use at least as many component devices as the number of replicas you specify. However, the number of component devices in a RAID10 array does not need to be a multiple of the number of replicas of each data block. The effective storage size is the number of devices divided by the number of replicas.
For example, if you specify 2 replicas for an array created with 5 component devices, a copy of each block is stored on two different devices. The effective storage size for one copy of all data is 5/2 or
2.5 times the size of a component device.
Near Layout
With the near layout, copies of a block of data are striped near each other on different component devices. That is, multiple copies of one data block are at similar offsets in different devices. Near is the default layout for RAID10. For example, if you use an odd number of component devices and two copies of data, some copies are perhaps one chunk further into the device.
The near layout for the mdadm RAID10 yields read and write performance similar to RAID 0 over half the number of drives.
Near layout with an even number of disks and two replicas:

sda1 sdb1 sdc1 sde1
  0    0    1    1
  2    2    3    3
  4    4    5    5
  6    6    7    7
  8    8    9    9

Near layout with an odd number of disks and two replicas:

sda1 sdb1 sdc1 sde1 sdf1
  0    0    1    1    2
  2    3    3    4    4
  5    5    6    6    7
  7    8    8    9    9
 10   10   11   11   12
Far Layout
The far layout stripes data over the early part of all drives, then stripes a second copy of the data over the later part of all drives, making sure that all copies of a block are on different drives. The second set of values starts halfway through the component drives.
With a far layout, the read performance of the mdadm RAID10 is similar to a RAID 0 over the full number of drives, but write performance is substantially slower than a RAID 0 because there is more seeking of the drive heads. It is best used for read-intensive operations such as for read-only file servers.
The speed of the RAID10 for writing is similar to other mirrored RAID types, like RAID 1 and RAID10 using the near layout, because the elevator of the file system schedules the writes in a more optimal way than raw writing. Using RAID10 in the far layout is well suited for mirrored writing applications.
Far layout with an even number of disks and two replicas:

sda1 sdb1 sdc1 sde1
  0    1    2    3
  4    5    6    7
  . . .
  3    0    1    2
  7    4    5    6

Far layout with an odd number of disks and two replicas:

sda1 sdb1 sdc1 sde1 sdf1
  0    1    2    3    4
  5    6    7    8    9
  . . .
  4    0    1    2    3
  9    5    6    7    8
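The procedure in the next section uses the default near layout. If you want to select the layout and the number of replicas explicitly when creating the array, mdadm accepts a --layout value of n (near) or f (far) followed by the replica count. A sketch, reusing the device names from the procedure in Section 10.3.2:
# Near layout with two replicas (equivalent to the default)
mdadm --create /dev/md3 --run --level=10 --layout=n2 --chunk=64 --raid-devices=4 /dev/sdf1 /dev/sdg1 /dev/sdh1 /dev/sdi1
# Far layout with two replicas; favors read-intensive workloads such as read-only file servers
mdadm --create /dev/md3 --run --level=10 --layout=f2 --chunk=64 --raid-devices=4 /dev/sdf1 /dev/sdg1 /dev/sdh1 /dev/sdi1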
10.3.2 Creating a RAID 10 with mdadm
The RAID10 option for mdadm creates a RAID 10 device without nesting. For information about the mdadm RAID10 option, see Section 10.3.1, “Understanding the mdadm RAID10,” on page 93.
The procedure in this section uses the device names shown in the following table. Make sure to modify the device names with the names of your own devices.
Table 10-6 Scenario for Creating a RAID 10 Using the mdadm RAID10 Option
Raw Devices                                   RAID10 (near or far striping scheme)
/dev/sdf1, /dev/sdg1, /dev/sdh1, /dev/sdi1    /dev/md3
1 In YaST, create a 0xFD Linux RAID partition on the devices you want to use in the RAID, such as /dev/sdf1, /dev/sdg1, /dev/sdh1, and /dev/sdi1.
2 Open a terminal console, then log in as the root user or equivalent.
3 Create a RAID 10 device. At the command prompt, enter (all on the same line):
mdadm --create /dev/md3 --run --level=10 --chunk=4 --raid-devices=4 /dev/sdf1 /dev/sdg1 /dev/sdh1 /dev/sdi1
4 Create a Reiser file system on the RAID 10 device /dev/md3. At the command prompt, enter
mkfs.reiserfs /dev/md3
5 Edit the /etc/fstab file to add an entry for the RAID 10 device /dev/md3.
6 Edit the /etc/mdadm.conf file to add entries for the component devices and the RAID device /dev/md3. For example:
DEVICE /dev/sdf1 /dev/sdg1 /dev/sdh1 /dev/sdi1
ARRAY /dev/md3 devices=/dev/sdf1,/dev/sdg1,/dev/sdh1,/dev/sdi1
7 Reboot the server.
The RAID10 device is mounted to /raid10.
10.4 Creating a Degraded RAID Array
A degraded array is one in which some devices are missing. Degraded arrays are supported only for RAID 1, RAID 4, RAID 5, and RAID 6. These RAID types are designed to withstand some missing devices as part of their fault-tolerance features. Typically, degraded arrays occur when a device fails. It is possible to create a degraded array on purpose.
RAID Type Allowable Number of Slots Missing
RAID 1 All but one device
RAID 4 One slot
RAID 5 One slot
RAID 6 One or two slots
To create a degraded array in which some devices are missing, simply give the word missing in place of a device name. This causes mdadm to leave the corresponding slot in the array empty.
When creating a RAID 5 array, mdadm automatically creates a degraded array with an extra spare drive. This is because building the spare into a degraded array is generally faster than resynchronizing the parity on a non-degraded, but not clean, array. You can override this feature with the --force option.
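For instance, a sketch with hypothetical device names (/dev/md4 and the sdX2 partitions are placeholders): the first command leaves one RAID 5 slot empty by using the missing keyword, and the second supplies all devices but uses --force so that mdadm initializes every member directly instead of building the array through a spare:
# Leave one slot empty on purpose
mdadm --create /dev/md4 --level=5 --raid-devices=3 /dev/sdb2 /dev/sdc2 missing
# Create a complete RAID 5 without the automatic spare-based build
mdadm --create /dev/md4 --force --level=5 --raid-devices=3 /dev/sdb2 /dev/sdc2 /dev/sdd2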
Creating a degraded array might be useful if you want to create a RAID, but one of the devices you want to use already has data on it. In that case, you create a degraded array with other devices, copy data from the in-use device to the RAID that is running in degraded mode, add the device into the RAID, then wait while the RAID is rebuilt so that the data is now across all devices. An example of this process is given in the following procedure:
1 To create a degraded RAID 1 device /dev/md0 using the single drive /dev/sda1, enter the following at the command prompt:
mdadm --create /dev/md0 -l 1 -n 2 /dev/sda1 missing
The device should be the same size or larger than the device you plan to add to it.
2 If the device you want to add to the mirror contains data that you want to move to the RAID
array, copy it now to the RAID array while it is running in degraded mode.
3 Add a device to the mirror. For example, to add
/dev/sdb1
to the RAID, enter the following at
the command prompt:
mdadm /dev/md0 -a /dev/sdb1
You can add only one device at a time. You must wait for the kernel to build the mirror and bring it fully online before you add another mirror.
4 Monitor the build progress by entering the following at the command prompt:
cat /proc/mdstat
To see the rebuild progress while being refreshed every second, enter
watch -n 1 cat /proc/mdstat
11 Resizing Software RAID Arrays with mdadm
This section describes how to increase or reduce the size of a software RAID 1, 4, 5, or 6 device with the Multiple Device Administration (mdadm(8)) tool.
WARNING: Before starting any of the tasks described in this section, make sure that you have a valid backup of all of the data.
Section 11.1, “Understanding the Resizing Process,” on page 99
Section 11.2, “Increasing the Size of a Software RAID,” on page 100
Section 11.3, “Decreasing the Size of a Software RAID,” on page 104
11.1 Understanding the Resizing Process
Resizing an existing software RAID device involves increasing or decreasing the space contributed by each component partition.
Section 11.1.1, “Guidelines for Resizing a Software RAID,” on page 99
Section 11.1.2, “Overview of Tasks,” on page 100
11.1.1 Guidelines for Resizing a Software RAID
The mdadm(8) tool supports resizing only for software RAID levels 1, 4, 5, and 6. These RAID levels provide disk fault tolerance so that one component partition can be removed at a time for resizing. In principle, it is possible to perform a hot resize for RAID partitions, but you must take extra care for your data when doing so.
The file system that resides on the RAID must also be able to be resized in order to take advantage of the changes in available space on the device. In SUSE® Linux Enterprise Server 11, file system resizing utilities are available for the Ext2, Ext3, and ReiserFS file systems. The utilities support increasing and decreasing the size as follows:
Table 11-1 File System Support for Resizing

File System    Utility            Increase Size             Decrease Size
Ext2 or Ext3   resize2fs          Yes, offline only         Yes, offline only
ReiserFS       resize_reiserfs    Yes, online or offline    Yes, offline only

Resizing any partition or file system involves some risks that can potentially result in losing data.
WARNING: To avoid data loss, make sure to back up your data before you begin any resizing task.
11.1.2 Overview of Tasks
Resizing the RAID involves the following tasks. The order in which these tasks are performed depends on whether you are increasing or decreasing the size.
Table 11-2 Tasks Involved in Resizing a RAID

Task: Resize each of the component partitions. (Order if increasing size: 1; order if decreasing size: 2)
Increase or decrease the active size of each component partition. You remove only one component partition at a time, modify its size, then return it to the RAID.

Task: Resize the software RAID itself. (Order if increasing size: 2; order if decreasing size: 3)
The RAID does not automatically know about the increases or decreases you make to the underlying component partitions. You must inform it about the new size.

Task: Resize the file system. (Order if increasing size: 3; order if decreasing size: 1)
You must resize the file system that resides on the RAID. This is possible only for file systems that provide tools for resizing, such as Ext2, Ext3, and ReiserFS.
11.2 Increasing the Size of a Software RAID
Before you begin, review the guidelines in Section 11.1, “Understanding the Resizing Process,” on
page 99.
Section 11.2.1, “Increasing the Size of Component Partitions,” on page 100
Section 11.2.2, “Increasing the Size of the RAID Array,” on page 101
Section 11.2.3, “Increasing the Size of the File System,” on page 102
11.2.1 Increasing the Size of Component Partitions
Apply the procedure in this section to increase the size of a RAID 1, 4, 5, or 6. For each component partition in the RAID, remove the partition from the RAID, modify its size, return it to the RAID, then wait until the RAID stabilizes to continue. While a partition is removed, the RAID operates in degraded mode and has no or reduced disk fault tolerance. Even for RAIDs that can tolerate multiple concurrent disk failures, do not remove more than one component partition at a time.
WARNING: If a RAID does not have disk fault tolerance, or it is simply not consistent, data loss results if you remove any of its partitions. Be very careful when removing partitions, and make sure that you have a backup of your data available.
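The step-by-step procedures follow in Section 11.2.1 through Section 11.2.3. As a compressed sketch of the cycle described above, assuming a RAID 1 named /dev/md0 with the component partition /dev/sda1 and an Ext3 file system (substitute your own names):
mdadm /dev/md0 --fail /dev/sda1 --remove /dev/sda1   # take one component out; the RAID runs degraded
# enlarge the partition with YaST, parted, or fdisk, then return it to the RAID
mdadm /dev/md0 --add /dev/sda1                       # wait for the resync before touching the next partition
# after every component partition has been enlarged:
mdadm --grow /dev/md0 --size=max                     # inform the RAID about the new component size
# unmount the file system (Ext2/Ext3 resizing is offline only per Table 11-1), then grow it:
resize2fs /dev/md0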
The procedure in this section uses the device names shown in the following table. Make sure to modify the names to use the names of your own devices.