Nvidia DGX Station User Manual

DGX STATION
DU-08255-001 _v2.1 | May 2018
User Guide
TABLE OF CONTENTS
About this Guide
Chapter 1. Introduction to the NVIDIA® DGX Station™
1.1. What's in the Box
1.2. DGX OS Desktop Software Summary
1.3. DGX Station Hardware Summary
Chapter 2. Setting Up the NVIDIA DGX Station
2.1. Siting the DGX Station
2.2. Removing or Replacing the Packing Inside the DGX Station
2.3. Connecting and Powering on the DGX Station
2.4. Completing the Initial Ubuntu OS Configuration
2.5. Adding Support for Additional Languages to the DGX Station
2.6. Registering Your DGX Station
2.7. Configuring the DGX Station To Use Multiple Displays
2.8. Enabling Multiple Users to Access the DGX Station Remotely
2.9. Preparing the DGX Station for Use with Docker
2.9.1. Enabling Users To Run Docker Containers
2.9.2. Preventing IP Address Conflicts Between Docker and the DGX Station
Chapter 3. Updating DGX Station Software
3.1. Updating DGX Station Software from the Details Window
3.2. Updating DGX Station Software from the Command Line
3.3. Available DGX Station Software Updates
3.3.1. Updates to Docker and Software Exclusive to the DGX Station
3.3.2. Updates to the Ubuntu Software on the DGX Station
3.4. Checking for Updates to DGX Station Software
3.5. Getting Release Information for DGX Station
3.6. Updating Software on an Air-Gapped DGX Station System
3.6.1. Providing DGX Station Software Updates from a Private Repository
3.6.2. Loading a Container Image onto an Air-Gapped DGX Station System
Chapter 4. Maintaining and Servicing the NVIDIA DGX Station
4.1. Problem Resolution and Customer Care
4.2. Cleaning the Mesh Filter Under the DGX Station
4.3. Collecting Information for Troubleshooting the DGX Station
4.4. Checking the Health of the DGX Station
4.5. Replacing the System and Components
4.5.1. Replacing the System
4.5.2. Repacking the DGX Station for Shipment
4.6. Maintaining the DGX Station Persistent Storage
4.6.1. Changing the RAID Level of the RAID Array
4.6.2. Checking the Status of the DGX Station RAID Array
4.6.3. Checking the Status of the DGX Station SSDs
4.6.4. Replacing an SSD
4.6.5. Rebuilding the DGX Station RAID Array
4.7. Restoring the DGX Station Software Image
4.7.1. Obtaining the DGX Station Software ISO Image and Checksum File
4.7.2. Creating a Bootable Installation Medium
4.7.2.1. Creating a Bootable USB Flash Drive by Using Startup Disk Creator
4.7.2.2. Creating a Bootable USB Flash Drive by Using Akeo Rufus
4.7.3. Verifying the Bootable Installation Medium
4.7.3.1. Verifying a Bootable USB Flash Drive
4.7.3.2. Verifying a Bootable DVD-ROM
4.7.4. Installing the DGX Station Software Image from a USB Flash Drive or DVD-ROM
4.8. Updating the DGX Station System BIOS
4.9. Maintaining the GPU Liquid Cooling System
4.9.1. Monitoring GPU Temperatures
4.9.2. Checking the Level of the Liquid in the GPU Cooling System
4.9.3. Replenishing the Liquid in the GPU Cooling System
Appendix A. Safety
A.1. Intended Application Uses
A.2. General Precautions
A.3. Electrical Precautions
A.4. Communications Cable Precautions
A.5. Other Hazards
Appendix B. Connections, Controls, and Indicators
B.1. Front-Panel Connections and Controls
B.2. Rear-Panel Connections and Controls
B.3. LAN Port Indicators
B.4. Audio I/O Connections
Appendix C. Compliance
C.1. DGX Station Model Number
C.2. Argentina
C.3. Australia/New Zealand
C.4. Brazil
C.5. Canada
C.6. China
C.7. European Union
C.8. India
C.9. Israel
C.10. Japan
C.11. Russia
C.12. South Africa
C.13. South Korea
C.14. Taiwan
C.15. United States
C.16. United States/Canada
C.17. Vietnam
Appendix D. DGX Station Hardware Specifications
D.1. Environmental Conditions
D.2. Component Specifications
D.3. Mechanical Specifications
D.4. Power Specifications
ABOUT THIS GUIDE
DGX Station User Guide explains how to install, set up, and maintain the NVIDIA® DGX Station™.
This guide is aimed at users and administrators who are familiar with the Ubuntu Desktop Linux OS, including use of the command line and the sudo command. For information about how to use the Ubuntu Desktop Linux OS, refer to Ubuntu Desktop Guide (https://help.ubuntu.com/16.04/ubuntu-help/index.html).
For details about the DGX OS Desktop software for the DGX Station, refer to DGX OS Desktop Release Notes.
For information about how to use the DGX Station to download and run containers for deep learning frameworks, refer to DGX Container Registry User Guide.
Chapter1. INTRODUCTION TO THE NVIDIA® DGX
STATION
The NVIDIA DGX Station is a fast, multi-GPU workstation for deep learning and AI analytics. You can use the DGX Station to run neural networks, and deploy deep learning models. Because the DGX Station is software compatible with the NVIDIA DGX-1 server, you can also use the DGX Station to optimize applications to run on a production DGX-1 cluster.
1.1. What's in the Box
DGX Station
Accessory boxes containing:
Quick Start Guide
AC power cable
3 DisplayPort™ 1.2 to HDMI 2.0 adapters
USB recovery flash drive containing a backup copy of the operating system image and CUDA toolkit
DVD-ROM containing source code of open-source software installed on the DGX Station
Toxic Substance Notice and Safety Instructions
Declaration of Conformity
Repacking Instructions/Intra-Transit
Inspect each piece of equipment in the packing box. If anything is missing or damaged, contact your supplier.
1.2. DGX OS Desktop Software Summary
The DGX OS Desktop software that is supplied with the DGX Station includes the software that you need for downloading and running containers for deep learning frameworks. The software is already installed on the DGX Station, except where licensing requirements mandate that the software be supplied separately. Any software that must be supplied separately is installed automatically when the DGX Station is first powered on.
For details about the DGX OS Desktop software, refer to DGX OS Desktop Release Notes.
1.3. DGX Station Hardware Summary
Processors
Component            Qty  Description
CPU                  1    Intel Xeon E5-2698 v4 2.2 GHz (20-Core)
GPU - current units  4    NVIDIA Tesla® V100-DGXS-32GB with 32 GB per GPU (128 GB total) of GPU memory
GPU - earlier units  4    NVIDIA Tesla V100-DGXS-16GB with 16 GB per GPU (64 GB total) of GPU memory
System Memory and Storage
Component      Qty  Unit Capacity  Total Capacity  Description
System memory  8    32 GB          256 GB          ECC Registered LRDIMM DDR4 SDRAM
Data storage   3    1.92 TB        5.76 TB         2.5" 6 Gb/s SATA III SSD in RAID 0 configuration
OS storage     1    1.92 TB        1.92 TB         2.5" 6 Gb/s SATA III SSD
Chapter2. SETTING UP THE NVIDIA DGX STATION
Before using the DGX Station, ensure that its initial set-up is complete.
2.1.Siting the DGX Station
Caution
The DGX Station weighs 88 lbs (40 kg). Do not attempt to lift the DGX Station. Instead, remove the DGX Station from its packaging and move it into position by rolling it on its fitted casters.
To prevent damage to components inside the DGX Station, do not subject the DGX Station to excessive vibration or mechanical shock. After moving or transporting the DGX Station, visually inspect the NVLINK bridge, which connects the GPUs, and the drive trays in the drive cage to see if they have shifted out of position. If any of these components has shifted, reseat the component before operating the DGX Station.
Site the DGX Station in a location that is clean, dust-free, well ventilated, and near an appropriately rated, grounded AC power outlet.
Leave approximately 5" (12.5 cm) of clearance behind and at the sides of the DGX Station to allow sufficient airflow for cooling the unit.
When operating the DGX Station, keep the ambient temperature and relative humidity within the following ranges:
Ambient temperature: 10°C to 30°C (50°F to 86°F)
Relative humidity: 10% to 80% (non-condensing)
Always keep the DGX Station upright. Do not lay the unit on its side.
2.2. Removing or Replacing the Packing Inside the DGX Station
To prevent damage to components inside the DGX Station during transit, a foam packing piece is packed inside the DGX Station. Before you connect and power on the DGX Station, you must remove this packing piece from inside the DGX Station. If you are returning the DGX Station to NVIDIA under a return merchandise authorization (RMA), replace this packing piece before repacking the DGX Station.
Before you begin, ensure that:
The DGX Station is shut down and powered off.
The power cable, all communications cables, and any peripheral devices such as displays and keyboards are disconnected from the DGX Station.
1. Push the button on the right side of the DGX Station back panel to release the side panel on the right of the DGX Station when viewed from the rear.
2. Lift the panel to remove it.
Caution
To prevent damage from electrostatic discharge, avoid touching any of the components inside the DGX Station.
3. Remove or replace the foam packing piece that surrounds the GPU cards inside the DGX Station.
To remove the foam packing piece, gently grasp it and pull it towards you.
If you are unpacking an advance-shipped replacement for a unit that you are returning to NVIDIA under an RMA, retain this foam packing piece with all other DGX Station packaging. You will need the packaging to repack your original DGX Station for shipment to NVIDIA.
To replace the foam packing piece, gently push it into position around the GPU cards inside the DGX Station.
4. Align the bottom edge of the side panel with the bottom edge of the DGX Station.
5. Firmly push the panel back into place to re-engage the latches.
2.3. Connecting and Powering on the DGX Station
To complete this task you need the following items, which are not supplied with the DGX Station:
Display with power cable and connector cable terminated in a DisplayPort connector or HDMI connector
If your display connector cable is terminated in an HDMI connector, you can use one of the supplied adapters to connect the cable to the DGX Station.
USB keyboard
USB mouse
Ethernet cable
1. Connect a display to any DisplayPort connector and a keyboard and mouse to any two USB ports.
For initial setup, connect only one display to the DGX Station. After you complete the initial Ubuntu OS configuration, you can configure the DGX Station to use multiple displays. For details, see Configuring the DGX Station To Use Multiple Displays.
2. Use either of the two Ethernet ports to connect the DGX Station to your LAN with Internet connectivity.
Connect only one Ethernet port on the DGX Station to the Internet unless you plan to configure the ports manually and disable DHCP on at least one of the ports.
By default, both Ethernet ports on the DGX Station are configured for DHCP. If both ports are connected simultaneously, each port will get its own IP address. The IP address that the Linux operating system (OS) uses will then alternate between these addresses, causing the OS and applications to malfunction.
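To confirm which ports have obtained an address, you can list the IPv4 addresses that the OS currently reports. The following is a minimal sketch, not part of the DGX OS Desktop software: the function name list_ipv4 is illustrative, and the sketch assumes that the ip command from the iproute2 package is available, as it normally is on Ubuntu.

```shell
# Print "interface address/prefix" for every non-loopback IPv4 address.
# Reads the one-line-per-address output of `ip -o -4 addr show` on stdin.
list_ipv4() {
    awk '$2 != "lo" && $3 == "inet" { print $2, $4 }'
}

# Run against the live system when the ip command is present.
if command -v ip >/dev/null 2>&1; then
    ip -o -4 addr show | list_ipv4
fi
```

If two Ethernet interfaces each appear with an address, disconnect one cable or configure the ports manually. Interface names vary from system to system, and the docker0 bridge, if present, is also listed and can be ignored here.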
3. Make sure that the power supply rocker switch is in the OFF position.
[Figures: current units and earlier units]
4. Connect the supplied power cable from the power socket at the back of the unit to an appropriately rated, grounded AC outlet.
For details of the power consumption, input voltage, and current rating of the DGX Station, see Power Specifications.
[Figures: current units and earlier units]
Caution
Use only the supplied power cable and do not use this power cable with any other products or for any other purpose. Not all power cables have the same current ratings.
Do not use household extension cables with your product. Household extension cables do not have overload protection and are not intended for use with computer systems.
5. Connect the display to a suitable AC outlet and power on the display.
6. Move the DGX Station power supply rocker switch to the ON position.
[Figures: current units and earlier units]
7. Push the Power button on the front of the unit to power on the DGX Station.
2.4. Completing the Initial Ubuntu OS Configuration
When you power on the DGX Station for the first time, you are prompted to accept end user license agreements for NVIDIA software. You are then guided through the process for completing the initial Ubuntu OS configuration. As part of this process, you are prompted to create your user name and password for logging in to the DGX Station.
To protect the DGX Station from unauthorized access, choose a strong password. The strength of the password you choose is indicated as you type it.
After the Ubuntu OS configuration is complete, you can log in to the DGX Station to access your Ubuntu desktop.
Updates to the DGX Station software might have been made available after your DGX Station was manufactured. To ensure that you have the latest DGX Station software, including security updates, check for updates and install any available updates before using your DGX Station. For more information, see Updating DGX Station Software.
2.5. Adding Support for Additional Languages to the DGX Station
During the initial Ubuntu OS configuration, you are prompted to select the default language on the DGX Station. If the language that you select is included in the DGX OS Desktop software image, it is installed in addition to English and you will see that language after you log in to access your desktop. If the language that you select is not included, you will still see English after logging in and you will need to install the language separately.
The following languages are included in the DGX OS Desktop software image:
English
Chinese (Simplified)
French
German
Italian
Portuguese
Russian
Spanish
For information about how to install languages, see Install languages (https://help.ubuntu.com/16.04/ubuntu-help/prefs-language-install.html) in the Ubuntu Official Documentation.
2.6. Registering Your DGX Station
Be sure to register your DGX Station with NVIDIA as soon as you receive your purchase confirmation e-mail. By registering your DGX Station, you will be entitled to receive technical support, warranty services, and software updates. You will also be able to set up an NVIDIA DGX Container Registry account.
To register your DGX Station, you will need information provided in your purchase confirmation e-mail. If you do not have the information, send an e-mail to NVIDIA Enterprise Support at enterprisesupport@nvidia.com.
1. From a browser, go to the NVIDIA DGX Product Registration (http://www.nvidia.com/object/dgx-product-registration) page.
2. Enter all required information and then click SUBMIT to complete the registration process and receive all warranty entitlements and, if applicable, DGX Station support services entitlements.
2.7. Configuring the DGX Station To Use Multiple Displays
One of the NVIDIA Tesla V100 GPU cards in the DGX Station provides three DisplayPort connectors, enabling you to connect up to three displays to the DGX Station. If you want to use more than one display with the DGX Station, configure it to use multiple displays after you complete the initial Ubuntu OS configuration.
1. Connect the displays that you want to use to the DisplayPort connectors at the rear of the DGX Station.
Each display is automatically detected as you connect it.
2. Optional: If necessary, adjust the display configuration, such as switching the primary display, or changing monitor positions or orientation.
a) From the Ubuntu system menu at the right of the desktop menu bar, choose System Settings and in the System Settings window that opens, click Displays.
b) In the Displays window that opens, make the changes to the display settings that you want and click Apply.
High-resolution displays consume a large quantity of GPU memory. If you have connected three 4K displays to the DGX Station, they may consume most of the GPU memory on the NVIDIA Tesla V100 GPU card to which they are connected, especially if you are running graphics-intensive applications.
If you are running memory-intensive compute workloads on the DGX Station and are experiencing performance issues, consider conserving GPU memory by reducing or minimizing the graphics workload.
To reduce the graphics workload, disconnect any additional displays you connected and use only one display with the DGX Station.
If you disconnect a display from the DGX Station, the disconnection is automatically detected and the display settings are automatically adjusted for the remaining displays.
To minimize the graphics workload, shut down the LightDM display manager and use secure shell (SSH) to log in to the DGX Station remotely.
To shut down the LightDM display manager, type the following command:
$ sudo service lightdm stop
To start the LightDM display manager, log in to the DGX Station remotely and type the following command:
$ sudo service lightdm start
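To judge whether the graphics workload is consuming a significant share of GPU memory, you can compare memory use before and after you change the display setup. The following sketch assumes that the nvidia-smi utility supplied with the NVIDIA driver is on the path; the helper name gpu_mem_percent is illustrative and is not part of the DGX OS Desktop software.

```shell
# Summarize GPU memory use from lines of "index, used_MiB, total_MiB" on stdin.
gpu_mem_percent() {
    awk -F', *' '$3 > 0 {
        printf "GPU %s: %d MiB of %d MiB (%.0f%%)\n", $1, $2, $3, 100 * $2 / $3
    }'
}

# Query the live GPUs when nvidia-smi is present.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=index,memory.used,memory.total \
               --format=csv,noheader,nounits | gpu_mem_percent
fi
```

Run the sketch once with the displays connected and again after you disconnect them or stop LightDM. The GPU that drives the displays should report lower memory use the second time.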
2.8. Enabling Multiple Users to Access the DGX Station Remotely
To enable multiple users to access the DGX Station remotely, a secure shell (SSH) server is installed and enabled on the DGX Station.
Add other Ubuntu OS users to the DGX Station to allow them to log in remotely to the DGX Station through SSH.
For information about how to add a user, see Add a new user account (https://help.ubuntu.com/16.04/ubuntu-help/user-add.html) in the Ubuntu Official Documentation. For information about how to log in remotely through SSH, see Connecting to an OpenSSH Server (https://help.ubuntu.com/community/SSH/OpenSSH/ConnectingTo) on the Ubuntu Community Help Wiki.
The DGX Station does not provide any additional isolation guarantees between users beyond the guarantees that the Ubuntu OS offers. For guidelines about how to secure access to the DGX Station over SSH, see Configuring an OpenSSH Server (https://help.ubuntu.com/community/SSH/OpenSSH/Configuring) on the Ubuntu Community Help Wiki.
2.9. Preparing the DGX Station for Use with Docker
Some initial setup of the DGX Station is required to ensure that users have the required privileges to run Docker containers and to prevent IP address conflicts between Docker and the DGX Station.
2.9.1. Enabling Users To Run Docker Containers
To prevent the docker daemon from running without protection against escalation of privileges, the Docker software requires sudo privileges to run containers. Meeting this requirement involves enabling users who will run Docker containers to run commands with sudo privileges. Therefore, you should ensure that only users whom you trust and who are aware of the potential risks to the DGX Station of running commands with sudo privileges are able to run Docker containers.
Before allowing multiple users to run commands with sudo privileges, consult your IT department to determine whether you would be violating your organization's security policies. For the security implications of enabling users to run Docker containers, see Docker daemon attack surface.
You can enable users to run the Docker containers in one of the following ways:
Add each user as an administrator user with sudo privileges.
Add each user as a standard user without sudo privileges and then add the user to the docker group. This approach is inherently insecure because any user who can send commands to the docker engine can escalate privilege and run root-user operations.
To add an existing user to the docker group, run this command:
$ sudo usermod -aG docker user-login-id
user-login-id
The user login ID of the existing user that you are adding to the docker group.
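After you run usermod, you may want to confirm that the change took effect. The following sketch uses only the standard id, tr, and grep utilities; the function name in_group is illustrative.

```shell
# Return success if user $1 is listed as a member of group $2.
in_group() {
    id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qxF "$2"
}

# Check the current user; substitute another login ID as needed.
user=$(id -un)
if in_group "$user" docker; then
    echo "$user is in the docker group"
else
    echo "$user is not in the docker group"
fi
```

A user who was already logged in when the group was changed typically needs to log out and log back in before the new membership applies to that session.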
2.9.2. Preventing IP Address Conflicts Between Docker and the DGX Station
To ensure that the DGX Station can access the network interfaces for Docker containers, configure the containers to use a subnet distinct from other network resources used by the DGX Station. By default, Docker uses the 172.17.0.0/16 subnet. If addresses within this range are already used on the DGX Station network, change the Docker network to specify the bridge IP address range and container IP address range to be used by Docker containers.
This task requires sudo privileges.
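Before you edit the configuration, you may want to check whether any address already assigned to the DGX Station falls inside the Docker default subnet. The following sketch is an illustration only: the function name is hypothetical, and the check covers only the addresses on the local interfaces, not other hosts or routes on your network.

```shell
# Return success if IPv4 address $1 lies in Docker's default 172.17.0.0/16 subnet.
in_docker_default_subnet() {
    case "$1" in
        172.17.*.*) return 0 ;;
        *)          return 1 ;;
    esac
}

# Check every local address, skipping the docker0 bridge itself.
if command -v ip >/dev/null 2>&1; then
    ip -o -4 addr show |
    awk '$2 != "docker0" { sub("/.*", "", $4); print $4 }' |
    while read -r addr; do
        if in_docker_default_subnet "$addr"; then
            echo "conflict: $addr is inside 172.17.0.0/16"
        fi
    done
fi
```

No output means that none of the local interface addresses overlaps the default Docker subnet. Your network administrator can tell you whether the range is used elsewhere on the LAN.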
1. Open the /etc/systemd/system/docker.service.d/docker-override.conf file in a plain-text editor, such as vi.
$ sudo vi /etc/systemd/system/docker.service.d/docker-override.conf
2. Append the following options to the line that begins ExecStart=/usr/bin/dockerd, which specifies the command to start the dockerd daemon:
--bip=bridge-ip-address-range
--fixed-cidr=container-ip-address-range
bridge-ip-address-range
The bridge IP address range to be used by Docker containers, for example, 192.168.127.1/24.
container-ip-address-range
The container IP address range to be used by Docker containers, for example, 192.168.127.128/25.
This example shows a complete /etc/systemd/system/docker.service.d/docker-override.conf file that has been edited to specify the bridge IP address range and container IP address range to be used by Docker containers.
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd -H fd:// -s overlay2 --default-shm-size=1G --bip=192.168.127.1/24 --fixed-cidr=192.168.127.128/25
LimitMEMLOCK=infinity
LimitSTACK=67108864
Starting with DGX OS Desktop release 3.1.4, the option --disable-legacy-registry=false is removed from the Docker CE service configuration file docker-override.conf. The option is removed for compatibility with Docker CE 17.12 and later.
3. Save and close the /etc/systemd/system/docker.service.d/docker-override.conf file.
4. Reload the Docker settings for the systemd daemon.
$ sudo systemctl daemon-reload
5. Restart the docker service.
$ sudo systemctl restart docker
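The container IP address range is expected to be a subset of the bridge network, as it is in the example values above. The following sketch checks that relationship with shell arithmetic before you commit the values to the file. The function names ip_to_int and cidr_within are illustrative and are not part of Docker or the DGX OS Desktop software.

```shell
# Convert dotted-quad IPv4 address $1 to an integer.
ip_to_int() {
    old_ifs=$IFS; IFS=.
    set -- $1
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Return success if CIDR range $1 lies entirely inside the network of CIDR $2.
cidr_within() {
    inner_ip=${1%/*}; inner_len=${1#*/}
    outer_ip=${2%/*}; outer_len=${2#*/}
    # A shorter prefix describes a larger range, which cannot fit.
    [ "$inner_len" -ge "$outer_len" ] || return 1
    mask=$(( (0xFFFFFFFF << (32 - outer_len)) & 0xFFFFFFFF ))
    [ $(( $(ip_to_int "$inner_ip") & mask )) -eq $(( $(ip_to_int "$outer_ip") & mask )) ]
}

# Check the example values: --fixed-cidr against --bip.
if cidr_within 192.168.127.128/25 192.168.127.1/24; then
    echo "fixed-cidr fits inside the bridge network"
else
    echo "fixed-cidr is outside the bridge network"
fi

# After restarting Docker, the active bridge settings can be reviewed with:
#   docker network inspect bridge
```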
Chapter 3. UPDATING DGX STATION SOFTWARE
Updates to DGX Station software are available from several sources. These updates may contain important security vulnerability fixes. You are responsible for updating the software on the DGX Station from these sources. For details about the available updates, see Available DGX Station Software Updates.
You can use any of the standard means provided by the Ubuntu Desktop OS to update this software. For examples, see:
Updating DGX Station Software from the Details Window
Updating DGX Station Software from the Command Line
Caution
When you use these means to update software on the DGX Station, you update all software for which updates are available from your configured software sources, including applications that you installed yourself. If you want to prevent an application from being updated, you can instruct the Ubuntu package manager to keep the current version. For more information, see Introduction to Holding Packages (https://help.ubuntu.com/community/PinningHowto#Introduction_to_Holding_Packages) on the Ubuntu Community Help Wiki.
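As one way to keep a package at its current version, the apt-mark command can place it on hold. The following sketch wraps the command so that you can preview it first. The function names run and hold_packages, the DRY_RUN variable, and the package name my-app are placeholders chosen for illustration.

```shell
# Run a command, or only print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hold the named packages at their current version.
hold_packages() {
    run sudo apt-mark hold "$@"
}

# Preview the command without changing anything.
DRY_RUN=1 hold_packages my-app
# prints: sudo apt-mark hold my-app
```

To list the packages that are currently held, use apt-mark showhold. To release a hold, use sudo apt-mark unhold followed by the package name.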
3.1. Updating DGX Station Software from the Details Window
When you open the Details window to get information about your DGX Station, the system checks for updates and, if any updates are available, gives you the option to install them.
Ensure that you are logged in to your Ubuntu desktop on the DGX Station as an administrator user.
1. From the Ubuntu system menu at the top right of the desktop, choose About This Computer.
The Details window opens and the system checks for updates.
2. In the Details window, click Install Updates.
3. In the Software Updater window that opens, review the available updates and click Install Now.