DGX Station User Guide explains how to install, set up, and maintain the NVIDIA® DGX
Station™.
This guide is aimed at users and administrators who are familiar with the Ubuntu
Desktop Linux OS, including use of the command line and the sudo command. For
information about how to use the Ubuntu Desktop Linux OS, refer to Ubuntu Desktop
For details about the DGX OS Desktop software for the DGX Station, refer to DGX OS
Desktop Release Notes.
For information about how to use the DGX Station to download and run containers for
deep learning frameworks, refer to DGX Container Registry User Guide.
www.nvidia.com
DGX StationDU-08255-001 _v2.1|v
About this Guide
Chapter 1.
INTRODUCTION TO THE NVIDIA® DGX STATION™
The NVIDIA DGX Station is a fast, multi-GPU workstation for deep learning and
AI analytics. You can use the DGX Station to run neural networks and deploy deep
learning models. Because the DGX Station is software compatible with the NVIDIA
DGX-1 server, you can also use the DGX Station to optimize applications to run on a
production DGX-1 cluster.
1.1. What's in the Box
‣ DGX Station
‣ Accessory boxes containing:
  ‣ Quick Start Guide
  ‣ AC power cable
  ‣ 3 DisplayPort™ 1.2 to HDMI 2.0 adapters
  ‣ USB recovery flash drive containing a backup copy of the operating system
    image and CUDA toolkit
  ‣ DVD-ROM containing source code of open-source software installed on the
    DGX Station
  ‣ Toxic Substance Notice and Safety Instructions
  ‣ Declaration of Conformity
  ‣ Repacking Instructions/Intra-Transit
Inspect each piece of equipment in the packing box. If anything is missing or damaged,
contact your supplier.
1.2. DGX OS Desktop Software Summary
The DGX OS Desktop software that is supplied with the DGX Station includes the
software that you need for downloading and running containers for deep learning
frameworks. The software is already installed on the DGX Station, except where
licensing requirements mandate that the software be supplied separately. Any software
that must be supplied separately is installed automatically when the DGX Station is first
powered on.
For details about the DGX OS Desktop software, refer to DGX OS Desktop Release
Notes.
1.3. DGX Station Hardware Summary
Processors
Component            Qty  Description
CPU                  1    Intel Xeon E5-2698 v4 2.2 GHz (20-Core)
GPU - current units  4    NVIDIA Tesla® V100-DGXS-32GB with 32 GB per GPU (128 GB total) of GPU memory
GPU - earlier units  4    NVIDIA Tesla V100-DGXS-16GB with 16 GB per GPU (64 GB total) of GPU memory
System Memory and Storage
Component      Qty  Unit Capacity  Total Capacity  Description
System memory  8    32 GB          256 GB          ECC Registered LRDIMM DDR4 SDRAM
Data storage   3    1.92 TB        5.76 TB         2.5" 6 Gb/s SATA III SSD in RAID 0 configuration
OS storage     1    1.92 TB        1.92 TB         2.5" 6 Gb/s SATA III SSD
Chapter 2.
SETTING UP THE NVIDIA DGX STATION
Before using the DGX Station, ensure that its initial set-up is complete.
2.1. Siting the DGX Station
Caution
The DGX Station weighs 88 lbs (40 kg). Do not attempt to lift the DGX Station.
Instead, remove the DGX Station from its packaging and move it into position by
rolling it on its fitted casters.
To prevent damage to components inside the DGX Station, do not subject the DGX
Station to excessive vibration or mechanical shock. After moving or transporting the
DGX Station, visually inspect the NVLINK bridge, which connects the GPUs, and the
drive trays in the drive cage to see if they have shifted out of position. If any of these
components has shifted, reseat the component before operating the DGX Station.
Site the DGX Station in a location that is clean, dust-free, well ventilated, and near an
appropriately rated, grounded AC power outlet.
Leave approximately 5" (12.5 cm) of clearance behind and at the sides of the DGX Station
to allow sufficient airflow for cooling the unit.
When operating the DGX Station, keep the ambient temperature and relative humidity
within the following ranges:
‣ Ambient temperature: 10°C to 30°C (50°F to 86°F)
‣ Relative humidity: 10% to 80% (non-condensing)
Always keep the DGX Station upright. Do not lay the unit on its side.
2.2. Removing or Replacing the Packing Inside the
DGX Station
To prevent damage to components inside the DGX Station during transit, a foam
packing piece is packed inside the DGX Station. Before you connect and power on the
DGX Station, you must remove this packing piece from inside the DGX Station. If you
are returning the DGX Station to NVIDIA under a return merchandise authorization
(RMA), replace this packing piece before repacking the DGX Station.
Before you begin, ensure that:
‣ The DGX Station is shut down and powered off.
‣ The power cable, all communications cables, and any peripheral devices such as
  displays and keyboards are disconnected from the DGX Station.
1. Push the button on the right side of the DGX Station back panel to release the side
   panel on the right of the DGX Station when viewed from the rear.
2. Lift the panel to remove it.
   Caution To prevent damage from electrostatic discharge, avoid touching any of
   the components inside the DGX Station.
3. Remove or replace the foam packing piece that surrounds the GPU cards inside the
   DGX Station.
   ‣ To remove the foam packing piece, gently grasp it and pull it towards you.
     If you are unpacking an advance-shipped replacement for a unit that you are
     returning to NVIDIA under an RMA, retain this foam packing piece with all
     other DGX Station packaging. You will need the packaging to repack your
     original DGX Station for shipment to NVIDIA.
   ‣ To replace the foam packing piece, gently push it into position around the GPU
     cards inside the DGX Station.
4. Align the bottom edge of the side panel with the bottom edge of the DGX Station.
5. Firmly push the panel back into place to re-engage the latches.
2.3. Connecting and Powering on the DGX Station
To complete this task you need the following items, which are not supplied with the
DGX Station:
‣ Display with power cable and connector cable terminated in a DisplayPort
  connector or HDMI connector
  If your display connector cable is terminated in an HDMI connector, you can use one
  of the supplied adapters to connect the cable to the DGX Station.
‣ USB keyboard
‣ USB mouse
‣ Ethernet cable
1. Connect a display to any DisplayPort connector and a keyboard and mouse to any
   two USB ports.
   For initial setup, connect only one display to the DGX Station. After you
   complete the initial Ubuntu OS configuration, you can configure the DGX Station
   to use multiple displays. For details, see Configuring the DGX Station To Use
   Multiple Displays.
2. Use either of the two Ethernet ports to connect the DGX Station to your LAN with
   Internet connectivity.
   Connect only one Ethernet port on the DGX Station to the Internet unless you
   plan to configure the ports manually and disable DHCP on at least one of the
   ports.
   By default, both Ethernet ports on the DGX Station are configured for DHCP.
   If both ports are connected simultaneously, each port will get its own
   IP address. The IP address that the Linux operating system (OS) uses will
   then alternate between these addresses, causing the OS and applications to
   malfunction.
3. Make sure that the power supply rocker switch is in the OFF position.
   [Figure: current units and earlier units]
4. Connect the supplied power cable from the power socket at the back of the unit to
   an appropriately rated, grounded AC outlet.
   For details of the power consumption, input voltage, and current rating of the DGX
   Station, see Power Specifications.
   [Figure: current units and earlier units]
   Caution
   Use only the supplied power cable and do not use this power cable with any other
   products or for any other purpose. Not all power cables have the same current
   ratings.
   Do not use household extension cables with your product. Household extension
   cables do not have overload protection and are not intended for use with
   computer systems.
5. Connect the display to a suitable AC outlet and power on the display.
6. Move the DGX Station power supply rocker switch to the ON position.
   [Figure: current units and earlier units]
7. Push the Power button on the front of the unit to power on the DGX Station.
2.4. Completing the Initial Ubuntu OS
Configuration
When you power on the DGX Station for the first time, you are prompted to accept end
user license agreements for NVIDIA software. You are then guided through the process
for completing the initial Ubuntu OS configuration. As part of this process, you are
prompted to create your user name and password for logging in to the DGX Station.
To protect the DGX Station from unauthorized access, choose a strong password. The
strength of the password you choose is indicated as you type it.
After the Ubuntu OS configuration is complete, you can log in to the DGX Station to
access your Ubuntu desktop.
Updates to the DGX Station software might have been made available after your
DGX Station was manufactured. To ensure that you have the latest DGX Station
software, including security updates, check for updates and install any available
updates before using your DGX Station. For more information, see Updating DGX
Station Software.
2.5. Adding Support for Additional Languages to
the DGX Station
During the initial Ubuntu OS configuration, you are prompted to select the default
language on the DGX Station. If the language that you select is included in the DGX
OS Desktop software image, it is installed in addition to English and you will see that
language after you log in to access your desktop. If the language that you select is not
included, you will still see English after logging in and you will need to install the
language separately.
The following languages are included in the DGX OS Desktop software image:
‣ English
‣ Chinese (Simplified)
‣ French
‣ German
‣ Italian
‣ Portuguese
‣ Russian
‣ Spanish
For information about how to install languages, see Install languages (https://
help.ubuntu.com/16.04/ubuntu-help/prefs-language-install.html) in the Ubuntu Official
Documentation.
2.6. Registering Your DGX Station
Be sure to register your DGX Station with NVIDIA as soon as you receive your purchase
confirmation e-mail. By registering your DGX Station, you will be entitled to receive
technical support, warranty services, and software updates. You will also be able to set
up an NVIDIA DGX Container Registry account.
To register your DGX Station, you will need information provided in your purchase
confirmation e-mail. If you do not have the information, send an e-mail to NVIDIA
Enterprise Support at enterprisesupport@nvidia.com.
1. From a browser, go to the NVIDIA DGX Product Registration (http://
2. Enter all required information and then click SUBMIT to complete the registration
   process and receive all warranty entitlements and, if applicable, DGX Station
   support services entitlements.
2.7. Configuring the DGX Station To Use Multiple
Displays
One of the NVIDIA Tesla V100 GPU cards in the DGX Station provides three
DisplayPort connectors, enabling you to connect up to three displays to the DGX Station.
If you want to use more than one display with the DGX Station, configure it to use
multiple displays after you complete the initial Ubuntu OS configuration.
1. Connect the displays that you want to use to the DisplayPort connectors at the rear
   of the DGX Station.
   Each display is automatically detected as you connect it.
2. Optional: If necessary, adjust the display configuration, such as switching the
   primary display, or changing monitor positions or orientation.
   a) From the Ubuntu system menu at the right of the desktop menu bar, choose
      System Settings and in the System Settings window that opens, click Displays.
   b) In the Displays window that opens, make the changes to the display settings that
      you want and click Apply.
High-resolution displays consume a large quantity of GPU memory. If you have
connected three 4K displays to the DGX Station, they may consume most of the GPU
memory on the NVIDIA Tesla V100 GPU card to which they are connected, especially if
you are running graphics-intensive applications.
If you are running memory-intensive compute workloads on the DGX Station and are
experiencing performance issues, consider conserving GPU memory by reducing or
minimizing the graphics workload.
‣ To reduce the graphics workload, disconnect any additional displays you connected
  and use only one display with the DGX Station.
  If you disconnect a display from the DGX Station, the disconnection is automatically
  detected and the display settings are automatically adjusted for the remaining
  displays.
‣ To minimize the graphics workload, shut down the LightDM display manager and
  use secure shell (SSH) to log in to the DGX Station remotely.
To shut down the LightDM display manager, type the following command:
$ sudo service lightdm stop
To start the LightDM display manager, log in to the DGX Station remotely and type
the following command:
$ sudo service lightdm start
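To judge whether reducing the graphics workload actually frees GPU memory, it can help to compare memory use before and after the change. The following is a minimal sketch, assuming that the nvidia-smi utility supplied with the NVIDIA driver is on the PATH. The helper name gpu_mem_percent is hypothetical and is not part of the DGX OS Desktop software.

```shell
#!/bin/sh
# Hypothetical helper: reads "index, used_MiB, total_MiB" CSV rows on stdin
# and prints the percentage of memory in use for each GPU.
gpu_mem_percent() {
    awk -F', *' '{ printf "GPU %s: %d%% used (%s of %s MiB)\n", $1, ($2 * 100) / $3, $2, $3 }'
}

# On the DGX Station, feed the helper live data (assumes nvidia-smi is installed).
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=index,memory.used,memory.total \
               --format=csv,noheader,nounits | gpu_mem_percent
fi
```

Running the script once with the displays attached and once after stopping LightDM gives a rough before-and-after comparison for the GPU that drives the displays.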
2.8. Enabling Multiple Users to Access the DGX
Station Remotely
To enable multiple users to access the DGX Station remotely, a secure shell (SSH) server is
installed and enabled on the DGX Station.
Add other Ubuntu OS users to the DGX Station to allow them to log in remotely to the
DGX Station through SSH.
For information about how to add a user, see Add a new user account (https://
help.ubuntu.com/16.04/ubuntu-help/user-add.html) in the Ubuntu Official
Documentation. For information about how to log in remotely through SSH, see
Connecting to an OpenSSH Server (https://help.ubuntu.com/community/SSH/OpenSSH/
ConnectingTo) on the Ubuntu Community Help Wiki.
The DGX Station does not provide any additional isolation guarantees between users
beyond the guarantees that the Ubuntu OS offers. For guidelines about how to secure
access to the DGX Station over SSH, see Configuring an OpenSSH Server (https://
help.ubuntu.com/community/SSH/OpenSSH/Configuring) on the Ubuntu Community
Help Wiki.
2.9. Preparing the DGX Station for Use with
Docker
Some initial setup of the DGX Station is required to ensure that users have the required
privileges to run Docker containers and to prevent IP address conflicts between Docker
and the DGX Station.
2.9.1. Enabling Users To Run Docker Containers
To prevent the docker daemon from running without protection against escalation of
privileges, the Docker software requires sudo privileges to run containers. Meeting this
requirement involves enabling users who will run Docker containers to run commands
with sudo privileges. Therefore, you should ensure that only users whom you trust
and who are aware of the potential risks to the DGX Station of running commands with
sudo privileges are able to run Docker containers.
Before allowing multiple users to run commands with sudo privileges, consult your IT
department to determine whether you would be violating your organization's security
policies. For the security implications of enabling users to run Docker containers, see
Docker daemon attack surface.
You can enable users to run the Docker containers in one of the following ways:
‣ Add each user as an administrator user with sudo privileges.
‣ Add each user as a standard user without sudo privileges and then add the user
  to the docker group. This approach is inherently insecure because any user who
  can send commands to the docker engine can escalate privilege and run root-user
  operations.
  To add an existing user to the docker group, run this command:
$ sudo usermod -aG docker user-login-id
user-login-id
The user login ID of the existing user that you are adding to the docker group.
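After changing group membership, it can be useful to confirm that the change was recorded. The following is a minimal sketch that uses the standard id command. The helper name user_in_group is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the named user belongs to the named group.
user_in_group() {
    id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qx "$2"
}

# Example: report whether the current user is in the docker group.
if user_in_group "$(id -un)" docker; then
    echo "$(id -un) is in the docker group"
else
    echo "$(id -un) is not in the docker group"
fi
```

When id is given a user name, it reads the group database, so the check reflects the change immediately. Sessions that the user already has open pick up the new group only after the user logs out and logs in again.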
2.9.2. Preventing IP Address Conflicts Between Docker
and the DGX Station
To ensure that the DGX Station can access the network interfaces for Docker containers,
configure the containers to use a subnet distinct from other network resources used
by the DGX Station. By default, Docker uses the 172.17.0.0/16 subnet. If addresses
within this range are already used on the DGX Station network, change the Docker
network to specify the bridge IP address range and container IP address range to be
used by Docker containers.
This task requires sudo privileges.
1. Open the /etc/systemd/system/docker.service.d/docker-override.conf file in a
   plain-text editor, such as vi.
   $ sudo vi /etc/systemd/system/docker.service.d/docker-override.conf
2. Append the following options to the line that begins ExecStart=/usr/bin/dockerd,
   which specifies the command to start the dockerd daemon:
   ‣ --bip=bridge-ip-address-range
   ‣ --fixed-cidr=container-ip-address-range
   bridge-ip-address-range
      The bridge IP address range to be used by Docker containers, for example,
      192.168.127.1/24.
   container-ip-address-range
      The container IP address range to be used by Docker containers, for example,
      192.168.127.128/25.
   This example shows a complete
   /etc/systemd/system/docker.service.d/docker-override.conf file that has been
   edited to specify the bridge IP address range and container IP address range to
   be used by Docker containers.
   Starting with DGX OS Desktop release 3.1.4, the option
   --disable-legacy-registry=false is removed from the Docker CE service
   configuration file docker-override.conf. The option is removed for
   compatibility with Docker CE 17.12 and later.
3. Save and close the /etc/systemd/system/docker.service.d/docker-override.conf file.
4. Reload the Docker settings for the systemd daemon.
   $ sudo systemctl daemon-reload
5. Restart the docker service.
$ sudo systemctl restart docker
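Docker expects the range given to --fixed-cidr to be a subset of the bridge subnet given to --bip, so it can be worth checking the two values before editing the file. The following is a minimal sketch in POSIX shell arithmetic. The helper names ip_to_int and cidr_within are hypothetical, and the addresses are the example values from this section.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS; IFS=.
    set -- $1
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeeds if the range $2 (address/prefix) lies inside the subnet of $1.
cidr_within() {
    outer_ip=$(ip_to_int "${1%/*}"); outer_len=${1#*/}
    inner_ip=$(ip_to_int "${2%/*}"); inner_len=${2#*/}
    # A contained range cannot have a shorter prefix than its parent.
    [ "$inner_len" -ge "$outer_len" ] || return 1
    mask=$(( (0xFFFFFFFF << (32 - outer_len)) & 0xFFFFFFFF ))
    [ $(( outer_ip & mask )) -eq $(( inner_ip & mask )) ]
}

# Values taken from the examples in this section.
if cidr_within 192.168.127.1/24 192.168.127.128/25; then
    echo "--fixed-cidr range fits inside the --bip subnet"
fi
```

The same function can be pointed at candidate ranges to confirm that they do not fall inside a subnet that is already in use on your network.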
Chapter 3.
UPDATING DGX STATION SOFTWARE
Updates to DGX Station software are available from several sources. These updates
may contain important security vulnerability fixes. You are responsible for updating the
software on the DGX Station from these sources. For details about the available updates,
see Available DGX Station Software Updates.
You can use any of the standard means provided by the Ubuntu Desktop OS to update
this software. For examples, see:
‣ Updating DGX Station Software from the Details Window
‣ Updating DGX Station Software from the Command Line
Caution When you use these means to update software on the DGX Station,
you update all software for which updates are available from your configured
software sources, including applications that you installed yourself. If you
want to prevent an application from being updated, you can instruct the
Ubuntu package manager to keep the current version. For more information,
see Introduction to Holding Packages (https://help.ubuntu.com/community/
PinningHowto#Introduction_to_Holding_Packages) on the Ubuntu Community Help
Wiki.
3.1. Updating DGX Station Software from the
Details Window
When you open the Details window to get information about your DGX Station, the
system checks for updates and, if any updates are available, gives you the option to
install them.
Ensure that you are logged in to your Ubuntu desktop on the DGX Station as an
administrator user.
1. From the Ubuntu system menu at the top right of the desktop, choose About This
   Computer.
   The Details window opens and the system checks for updates.
2. In the Details window, click Install Updates.
3. In the Software Updater window that opens, review the available updates and click
   Install Now.
   If no updates are available, the Software Updater informs you that your software is
   up to date.
   If an update requires the removal of obsolete packages, you will be warned that not
   all updates can be installed. To continue with the update, perform these steps:
   a) Click Partial Upgrade.
   b) Review the list of packages that will be removed.
      To identify obsolete DGX OS Desktop packages, see the lists of obsolete packages
      in the DGX OS Desktop Release Notes for all releases after your current release.
   c) If the list contains only packages that you want to remove, click Start Upgrade.
4. When prompted to authenticate, type your password into the Password field and
   click Authenticate.
5. If necessary, restart your DGX Station when prompted to complete the updates.
3.2. Updating DGX Station Software from the
Command Line
Use the apt (http://manpages.ubuntu.com/manpages/xenial/en/man8/apt.8.html)
command to update DGX Station software from the command line.
Ensure that you are logged in to your Ubuntu desktop on the DGX Station as an
administrator user.
1. Download information from all configured sources about the latest versions of the
   packages.
   $ sudo apt update
2. Review the available updates by simulating an upgrade of the packages.
   $ sudo apt full-upgrade -s
3. Upgrade the packages to the latest version.
$ sudo apt full-upgrade
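The simulation in step 2 can also be summarized in a script. The following is a minimal sketch, not part of the documented procedure. It uses apt-get rather than apt, because the output of apt is intended for interactive use, and apt-get dist-upgrade is the counterpart of apt full-upgrade. The helper name count_pending is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: counts the packages that a simulated upgrade would
# install or upgrade. Simulation output marks each such package with a line
# that starts with "Inst ".
count_pending() {
    grep -c '^Inst ' || true
}

# On the DGX Station (assumes apt-get is available; a simulated run does not
# change the system):
if command -v apt-get >/dev/null 2>&1; then
    apt-get --simulate dist-upgrade 2>/dev/null | count_pending
fi
```

Run sudo apt update first so that the count reflects the latest package information from the configured sources.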
3.3. Available DGX Station Software Updates
Updates to DGX Station are made available through standard Ubuntu repositories.
DGX Station is preset to obtain from these repositories updates to the following
software:
‣ Docker
‣ Software that is exclusive to the DGX Station, including the CUDA Toolkit and
  CUDA Drivers packages
‣ Ubuntu software
For more information about repositories, see Repositories/Ubuntu (https://
help.ubuntu.com/community/Repositories/Ubuntu) on the Ubuntu Community Help
Wiki.
3.3.1. Updates to Docker and Software Exclusive to the
DGX Station
Updates to Docker and to software that is exclusive to the DGX Station, including the
CUDA Toolkit and CUDA Drivers packages, are available from a repository maintained
by NVIDIA.
Caution
‣ Do not obtain updates to the CUDA Toolkit and CUDA Drivers packages from the
  public CUDA package repository for Ubuntu. Updates from the public repository
  may be incompatible with the DGX Optimized Frameworks that are available from
  the NVIDIA® DGX™ Container Registry.
‣ Do not obtain updates to Docker from Docker's repositories. NVIDIA Container
  Runtime for Docker has strict dependencies on the Docker CE version and updates
  from Docker's repository may cause NVIDIA Container Runtime for Docker to be
  removed.
The repository maintained by NVIDIA is enabled by default in Ubuntu Software &
Updates, Other Software on the DGX Station, as shown in the following screen capture.
Although a Docker repository is also enabled, DGX Station no longer uses this
repository to obtain updates to Docker because the repository maintained by NVIDIA
takes precedence over the Docker repository.
3.3.2. Updates to the Ubuntu Software on the DGX
Station
Updates to the Ubuntu software on the DGX Station are available from the Canonical
repositories.
The repositories that are enabled by default in Ubuntu Software & Updates, Ubuntu
Software on the DGX Station are shown in the following screen capture.
By default, the DGX Station does not notify you of available updates or automatically
install any updates, including important security updates. To minimize the risk to your
DGX Station from security vulnerabilities, you must ensure that it is kept up to date
with the latest important security updates.
Updates to another LTS base OS version are blocked because they can disrupt the DGX
Station software and disable the NVIDIA graphics drivers.
3.4. Checking for Updates to DGX Station
Software
To check for software updates and to configure updates from the Ubuntu software
repositories, use System Settings, Software & Updates. You can configure your DGX
Station to notify you of important security updates more frequently than other updates.
In the following example, the DGX Station is configured to check for updates daily, to
display important security updates immediately, and to display other updates every two
weeks.
3.5. Getting Release Information for DGX Station
The file /etc/dgx-release provides release information for the DGX Station, such as
the product name and serial number.
This file also tracks the history of DGX OS Desktop software updates by providing the
following information:
‣ The version number and installation date of the last version to be installed from an
  ISO image
‣ The version number and update date of each over-the-network update applied since
  the software was last installed from an ISO image
You can use this information to determine if your DGX Station is running the current
version of the DGX OS Desktop software.
To get release information for the DGX Station, view the content of the file /etc/dgx-release.
DGX_OTA_VERSION="3.1.3"
DGX_OTA_DATE="Wed Nov 15 15:35:25 PST 2017"
DGX_OTA_VERSION="3.1.4"
DGX_OTA_DATE="Fri Jan 19 13:49:06 PST 2018"
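To pick out the most recent over-the-network update from this file in a script, something like the following may be enough. It is a minimal sketch that assumes the entries are appended in chronological order, as in the excerpt above, so that the last DGX_OTA_VERSION line is the current one. The helper name latest_ota_version is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: prints the most recent over-the-network update version
# recorded in a release file (the last DGX_OTA_VERSION entry).
latest_ota_version() {
    sed -n 's/^DGX_OTA_VERSION="\(.*\)"$/\1/p' "$1" | tail -n 1
}

release_file=/etc/dgx-release
if [ -r "$release_file" ]; then
    latest_ota_version "$release_file"
fi
```

If the file contains no DGX_OTA_VERSION entries, the helper prints nothing, which suggests that no over-the-network update has been applied since the last installation from an ISO image.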
3.6. Updating Software on an Air-Gapped DGX
Station System
For security purposes, some installations require that the DGX Station be an air-gapped
system. An air-gapped system is not connected to any unsecured networks, such as
the public Internet or an unsecured LAN, or to any other computers connected to an
unsecured network. The default mechanisms for updating software on the DGX Station
and loading container images from the NVIDIA DGX Container Registry require an
Internet connection. On an air-gapped system, which is isolated from the Internet, you
must provide alternative mechanisms for updating software and loading container
images.
3.6.1. Providing DGX Station Software Updates from a
Private Repository
The public NVIDIA and Canonical repositories that provide software updates to the
DGX Station are Ubuntu repositories. Access to these repositories requires an Internet
connection. On an air-gapped system, which is isolated from the Internet, you must
provide these updates from a private repository that mirrors the public repositories.
1. Identify the sources corresponding to the public NVIDIA and Canonical repositories
   that provide updates to the DGX Station software.
   You can identify these sources from the /etc/apt/sources.list file and the
   contents of the /etc/apt/sources.list.d/ directory, or by using System Settings,
   Software & Updates.
2. Create and maintain a private repository that mirrors the sources that you identified
   in the previous step.
   For detailed instructions, refer to Debian Repository Setup (https://wiki.debian.org/
   DebianRepository/Setup) on the Debian wiki.
3. Update the sources that provide updates to the DGX Station to use your private
   repository instead of the public repositories.
   You can update these sources by modifying the /etc/apt/sources.list file
   and the contents of the /etc/apt/sources.list.d/ directory, or by using System
   Settings, Software & Updates.
Future updates to the DGX Station software will be obtained from your private
repository.
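For step 1, the active sources can also be listed from the command line. The following is a minimal sketch that assumes the standard APT file locations on Ubuntu. The helper name list_deb_sources is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: prints the active (uncommented) "deb" entries found in
# the given files. Commented-out lines and "deb-src" entries are skipped.
list_deb_sources() {
    grep -hE '^[[:space:]]*deb[[:space:]]' "$@" 2>/dev/null
}

# On the DGX Station, inspect the standard APT locations.
if [ -d /etc/apt ]; then
    list_deb_sources /etc/apt/sources.list /etc/apt/sources.list.d/*.list || true
fi
```

Each line that is printed names one repository URL, distribution, and component set that the private repository needs to mirror.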
3.6.2. Loading a Container Image onto an Air-Gapped
DGX Station System
Loading a container image from the NVIDIA DGX Container Registry requires an
Internet connection. On an air-gapped system, which is isolated from the Internet, you
must use a removable medium to copy the container image from a system with an
Internet connection to the air-gapped system.
1. On a system with an Internet connection, log in to the NVIDIA DGX Container
   Registry and load the container image that you want.
   For instructions, refer to DGX Container Registry User Guide.
2. Save the container image as a tar archive.
   $ docker save nvcr.io/registry-space/repository:tag > archive-file.tar
   registry-space
      The name of the space within the registry that contains the container image. For
      container images provided by NVIDIA, the registry space is nvidia.
   repository
      The repository that contains the container image. A repository is a collection of
      all versions of a container image with the same name. The repository name is the
      main container image name.
   tag
      A tag that identifies the version of the container image.
   archive-file
      Your choice of name for the archive file to which you are saving the container
      image.
3. Transfer the image to the air-gapped system by using a removable medium such as a
   USB flash drive or DVD-ROM.
4. On the air-gapped system, load the container image from the local copy of the
   archive file that contains the image.
   $ docker load -i archive-file.tar
5. Confirm that the image is loaded on the air-gapped system.
$ docker images
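Because the archive travels on removable media, you may want to confirm that it arrived intact before loading it. The following is a minimal sketch, not part of the documented procedure. It assumes that the sha256sum utility from GNU coreutils is available on both systems. The helper names write_checksum and verify_checksum are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for checking that an archive survived the transfer.

# write_checksum: run on the connected system; writes <archive>.sha256 next
# to the archive.
write_checksum() {
    ( cd "$(dirname "$1")" && sha256sum "$(basename "$1")" > "$(basename "$1").sha256" )
}

# verify_checksum: run on the air-gapped system after copying both files;
# succeeds only if the archive matches the recorded checksum.
verify_checksum() {
    ( cd "$(dirname "$1")" && sha256sum -c "$(basename "$1").sha256" >/dev/null 2>&1 )
}
```

For example, run write_checksum archive-file.tar after step 2, copy both files to the removable medium, and on the air-gapped system run verify_checksum archive-file.tar && docker load -i archive-file.tar.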
Chapter 4.
MAINTAINING AND SERVICING THE NVIDIA
DGX STATION
Be sure to familiarize yourself with the NVIDIA Terms & Conditions documents before
attempting to perform any modification or repair to the DGX Station. These Terms
& Conditions for the DGX Station can be found through the NVIDIA DGX Systems
Support (http://www.nvidia.com/object/dgxsystems-support.html) page.
Caution The DGX Station is designed as an integrated system and does not support the
installation of additional PCIe devices such as GPU cards. Any attempt to modify the
DGX Station by installing additional PCIe devices is an unauthorized modification and
will void the DGX Station hardware warranty. Any such modification will also impair
the performance of the system, may overload the system’s electrical circuits, and
may cause it to overheat.
4.1. Problem Resolution and Customer Care
Log on to the NVIDIA Enterprise Services (https://nvid.nvidia.com/dashboard/) site
for assistance with troubleshooting and diagnostics, or to report problems with your DGX
Station.
4.2. Cleaning the Mesh Filter Under the DGX
Station
To prevent dust from entering the DGX Station through the ventilation holes under the
unit, a mesh filter is fitted to the underside of the DGX Station. Clean this mesh filter
periodically to prevent the accumulation of dust on the filter from impeding the flow of
air through the DGX Station.
1. Reach under the front of the DGX Station and grasp the mesh filter by its handle.
2. Pull the mesh filter towards you to slide it out from the front of the unit.
3. Use compressed air to blow the dust from the mesh filter.
4. Line up the mesh filter with the runners under the DGX Station and slide it back
   into position under the unit.
4.3. Collecting Information for Troubleshooting
the DGX Station
To help diagnose and resolve issues, the DGX Station provides a tool to collect
troubleshooting information for NVIDIA Support Enterprise Services.
The tool verifies basic functionality and performance of the DGX Station and collects the
following information:
‣ Log files
‣ Hardware inventory
‣ Software inventory
To collect information for troubleshooting the DGX Station, run the following command:
$ sudo nvsysinfo [-o output-file]
For DGX OS Desktop releases 3.1.1 through 3.1.3, the command to run is as follows:
$ sudo nvidia-sysinfo [-o output-file]
output-file is the name and the path of the file in which the information is
written. If you omit the output file, the information is written to the file /tmp/nvsysinfo-timestamp.random-number.out.
For DGX OS Desktop releases 3.1.1 through 3.1.3, the file name is /tmp/nvidia-
sys-info-timestamp.random-number.out.
Use any method that is convenient for you to send the file to NVIDIA Support
Enterprise Services. For example, send the file as an e-mail attachment.
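Because the name of the command depends on the DGX OS Desktop release, a small wrapper can pick whichever tool is installed. The following is a minimal sketch and is not part of the DGX Station software. The function names first_available and collect_dgx_info and the default output path are hypothetical; only the two command names and the -o option come from this guide.

```shell
# Print the first command from the argument list that exists on this system.
first_available() {
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            return 0
        fi
    done
    return 1
}

# Run whichever collection tool is installed and write its report to $1.
# The default path below is an arbitrary choice made for this sketch.
collect_dgx_info() {
    out=${1:-"$HOME/dgx-sysinfo.out"}
    tool=$(first_available nvsysinfo nvidia-sysinfo) || {
        echo "no collection tool found" >&2
        return 1
    }
    sudo "$tool" -o "$out" && echo "Report written to $out"
}
```

After the functions are defined in your shell, running collect_dgx_info with a path of your choice should produce a single file that you can send to NVIDIA Support Enterprise Services.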
4.4.Checking the Health of the DGX Station
The DGX Station provides the NVIDIA System Health Checker (nvhealth) tool to
exercise the system and verify its health. The output of nvhealth is an itemized list of
checks and their status, typically Healthy or Unhealthy. On a healthy system, all checks
should return Healthy. You should investigate any checks that return Unhealthy to
determine their root cause and resolve them.
To check the health of the DGX Station, run the following command:
$ sudo nvhealth [-k output-file]
output-file is the name and the path of the file in which the raw state of the system
is written. If you omit the output file, the information is written to the file
/tmp/nvhealth-log.random-string.jsonl, for example, /tmp/nvhealth-log.6wf3WriAC3.jsonl.
The nvhealth command displays this file name at the end of the output from the command.
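When the list of checks is long, it can help to filter a saved copy of the report for checks that need attention. This guide does not show the exact layout of the report, so the following sketch assumes only that each check appears on a line that contains its status word. The function names and the sample lines in the comments are hypothetical.

```shell
# Read a saved nvhealth report on standard input and print only the lines
# that mention the status word "Unhealthy".
# Assumption: each check is reported on a line that contains its status,
# for example "Check B: Unhealthy" (made-up sample, not real nvhealth output).
unhealthy_checks() {
    grep 'Unhealthy' || true
}

# Count the lines reported by unhealthy_checks.
count_unhealthy() {
    unhealthy_checks | wc -l | tr -d ' '
}
```

If you saved the report to a text file, here called nvhealth-report.txt as an example name, count_unhealthy < nvhealth-report.txt prints 0 on a system where no check is reported as Unhealthy.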
4.5.Replacing the System and Components
Be sure to familiarize yourself with the NVIDIA Terms & Conditions documents before
attempting to perform any modification or repair to the DGX Station. These Terms
& Conditions for the DGX Station can be found through the NVIDIA DGX Systems
Support (http://www.nvidia.com/object/dgxsystems-support.html) page.
Contact NVIDIA Enterprise Customer support to obtain an RMA number for any
system or component that needs to be returned for repair or replacement.
The only components that are customer-replaceable are the Solid State Drives (SSDs).
Return the failed components to NVIDIA.
4.5.1.Replacing the System
When returning a DGX Station under RMA, consider the following points.
Packaging
To prevent damage during shipping, repack the DGX Station in the packaging in which
the replacement unit was shipped in advance, following the instructions in Repacking
the DGX Station for Shipment.
SSDs
If necessary, you can remove and keep the SSDs prior to shipping the system back for
replacement. If you already received a replacement system and you want to keep the
original SSDs, install the new SSDs into the defective system when shipping it back.
AC Power Cable
Do not return the AC power cable when returning the DGX Station.
Accessories
Include all supplied accessories except the AC power cable when returning the DGX
Station.
4.5.2.Repacking the DGX Station for Shipment
If you are returning the DGX Station to NVIDIA under an RMA, repack it in the
packaging in which the replacement unit was shipped in advance to prevent damage
during shipment.
Caution The DGX Station weighs 88 lbs (40 kg). Do not attempt to lift the DGX Station.
Instead, move it into position by rolling it on its fitted casters.
Before you begin, ensure that the foam packing piece that surrounds the GPU cards
inside the DGX Station has been replaced. For detailed instructions, see Removing or
Replacing the Packing Inside the DGX Station.
1.
Place the bottom tray of the DGX Station shipping carton on the floor and ensure
that the flap at the front of the tray is pulled down to form a ramp.
2.
Roll the DGX Station up the ramp into the bottom tray of its shipping carton.
Caution Ensure that you have a second person to help you roll the DGX Station
into position.
3.
Insert the front packing piece into the tray, ensuring that the lip of the packing piece
is under the DGX Station.
4.
Insert the side packing pieces into the tray, ensuring that the lip of each piece is
under the DGX Station.
5.
Pack all supplied accessories in the accessory boxes except the AC power cable.
Keep the AC power cable to use with your replacement DGX Station.
6.
Place both accessory boxes in the slots in the tray on each side of the DGX Station.
Ensure that the lugs that protrude from the edges of each accessory box are facing
away from the DGX Station.
The accessory boxes are required to help hold the DGX Station in place in its
packaging during shipment. Be sure to place both accessory boxes in the slots in the
tray, even if one or both boxes are empty.
7.
Pull up the flap at the front of the bottom tray of the DGX Station shipping carton.
8.
Lower the top cover of the shipping carton into position so that the holes in the top
cover and the holes in the bottom tray are aligned.
9.
Insert the packing clasps into the cutouts in the top cover of the shipping carton and
engage the clasps to secure the top cover in place.
To prevent the packing clasps from becoming jammed inside the shipping carton, do
not use excessive force when inserting them into the cutouts.
4.6.Maintaining the DGX Station Persistent
Storage
The DGX Station persistent storage consists of SSDs for data storage and the operating
system. As supplied from the factory, these SSDs are configured as described in System
Memory and Storage.
4.6.1.Changing the RAID Level of the RAID Array
As supplied from the factory, the RAID level of the DGX Station RAID array is RAID 0.
RAID 0 provides the maximum storage capacity, but does not provide any redundancy.
If a single SSD in the array fails, all data stored on the array is lost. If you are willing to
accept reduced capacity in return for some level of protection against failure of a single
SSD, you can change the level of the RAID array to RAID 5. If you change the RAID level
from RAID 0 to RAID 5, the total storage capacity of the RAID array is reduced from
5.76 TB to 3.84 TB.
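The two capacities follow from the number of SSDs in the array. RAID 0 stripes data across all three 1.92-terabyte SSDs, whereas RAID 5 uses the equivalent of one SSD for parity. The following sketch reproduces the arithmetic in whole gigabytes; the function name raid_capacity_gb is hypothetical.

```shell
# Usable capacity of an array of identical drives, in whole gigabytes.
#   $1 = RAID level (raid0 or raid5), $2 = number of drives, $3 = GB per drive
raid_capacity_gb() {
    case $1 in
        raid0) echo $(( $2 * $3 )) ;;          # striping: every drive holds data
        raid5) echo $(( ($2 - 1) * $3 )) ;;    # one drive's worth holds parity
        *) echo "unsupported level: $1" >&2; return 1 ;;
    esac
}

raid_capacity_gb raid0 3 1920   # prints 5760, that is, 5.76 TB
raid_capacity_gb raid5 3 1920   # prints 3840, that is, 3.84 TB
```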
Before changing the RAID level of the DGX Station RAID array, back up all data on the
array that you want to preserve. Changing the RAID level of the DGX Station RAID
array erases all data stored on the array.
The DGX Station software includes the custom script configure_raid_array.py,
which you can use to change the level of the RAID array without unmounting the RAID
volume.
‣ To change the RAID level to RAID 5, run the following command:
  $ sudo configure_raid_array.py -m raid5
‣ To change the RAID level to RAID 0, run the following command:
  $ sudo configure_raid_array.py -m raid0
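Because changing the RAID level erases the array, you might prefer to put a confirmation step in front of the command. The following is a minimal sketch, not a feature of configure_raid_array.py; the function names raid_change_command and change_raid_level are hypothetical, and only the two -m values shown in this guide are accepted.

```shell
# Print the command that would change the array to the requested level.
# Only the two modes shown in this guide are accepted.
raid_change_command() {
    case $1 in
        raid0|raid5) printf 'sudo configure_raid_array.py -m %s\n' "$1" ;;
        *) echo "expected raid0 or raid5, got: $1" >&2; return 1 ;;
    esac
}

# Ask for confirmation before running the command, because changing the
# RAID level erases all data stored on the array.
change_raid_level() {
    cmd=$(raid_change_command "$1") || return 1
    printf 'This erases all data on the array. Type "yes" to run: %s\n' "$cmd"
    read -r answer
    [ "$answer" = "yes" ] && sh -c "$cmd"
}
```

For example, change_raid_level raid5 prints the command and runs it only after you type yes.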
To confirm that the RAID level was changed as required, run the lsblk command. The
entry in the TYPE column for each SSD in the RAID array indicates the RAID level of the
array.
The following example shows that the RAID level of the array is RAID 0. The name of
the RAID volume is md0 and the mount point of the volume is /raid.
~$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 1.8T 0 disk
|_sda1 8:1 0 487M 0 part /boot/efi
|_sda2 8:2 0 1.8T 0 part /
sdb 8:16 0 1.8T 0 disk
|_md0 9:0 0 5.2T 0 raid0 /raid
sdc 8:32 0 1.8T 0 disk
|_md0 9:0 0 5.2T 0 raid0 /raid
sdd 8:48 0 1.8T 0 disk
|_md0 9:0 0 5.2T 0 raid0 /raid
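If you only need the RAID level, you can filter the lsblk output for the TYPE column instead of reading the whole listing. The following sketch assumes the default lsblk column layout shown in the example, in which TYPE is the sixth column; the function name raid_types is hypothetical.

```shell
# Read lsblk output on standard input and print each distinct RAID type
# found in the TYPE column (the sixth column in the default layout).
raid_types() {
    awk '$6 ~ /^raid/ { print $6 }' | sort -u
}
```

Running lsblk | raid_types on a system configured as in the example would print raid0.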
4.6.2.Checking the Status of the DGX Station RAID
Array
Use the mdadm command to print details of the md0 device.
$ sudo mdadm -D /dev/md0
This example shows the status of a RAID array that is functioning properly.
Total Devices : 3
Persistence : Superblock is persistent
Update Time : Mon Jun 5 17:40:48 2017
State : clean
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0
Chunk Size : 512K
Name : lab-VirtualBox:0 (local to host lab-VirtualBox)
UUID : c8ba911a:8634bd99:2ebeea3d:c9a7db4c
Events : 0
Number Major Minor RaidDevice State
0 8 16 0 active sync /dev/sdb
1 8 32 1 active sync /dev/sdc
2 8 48 2 active sync /dev/sdd
This example shows the status of a RAID array in which one SSD has failed or is
missing. The failed or missing SSD is identified by the row that has no device name
after the State column.
$ sudo mdadm -D /dev/md0
...
Number Major Minor RaidDevice State
0 8 16 0 active sync /dev/sdb
1 8 32 1 active sync
2 8 48 2 active sync /dev/sdd
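For a quick check, you can reduce the mdadm output to the two fields that matter most: the array state and the number of failed devices. The following is a minimal sketch that relies on the "Field : value" layout shown in the examples; the function name md_summary is hypothetical.

```shell
# Read "mdadm -D" output on standard input and print the array state and
# the number of failed devices as key=value pairs.
md_summary() {
    awk -F' : ' '
        /^[ \t]*State :/          { sub(/[ \t]+$/, "", $2); print "state=" $2 }
        /^[ \t]*Failed Devices :/ { sub(/[ \t]+$/, "", $2); print "failed=" $2 }
    '
}
```

For example, sudo mdadm -D /dev/md0 | md_summary should print state=clean and failed=0 for an array like the one in the first example. Any other state, or a nonzero count of failed devices, is a reason to examine the full output.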
4.6.3.Checking the Status of the DGX Station SSDs
LEDs on the DGX Station SSDs indicate the status of the SSDs. The SSDs are mounted
inside the DGX Station and are visible only when the side panel that covers the SSDs is
removed.
1.
Remove the side panel on the left of the DGX Station when viewed from the rear.
a) Push the button on the left side of the DGX Station back panel to release the
panel.
b) Lift the panel to remove it.
Caution To prevent damage from electrostatic discharge, avoid touching any
of the components inside the DGX Station other than any components that you
are replacing or servicing.
2.
Examine each SSD to determine its status from the state of the LED on the SSD.
On (steady)
The SSD is operational but is idle.
On (blinking)
The SSD is being read from or written to.
Off
The SSD has failed and must be replaced.
3.
Replace the side panel of the DGX Station.
a) Align the bottom edge of the side panel with the bottom edge of the DGX Station.
b) Firmly push the panel back into place to re-engage the latches.
If an SSD has failed, you must replace it as explained in Replacing an SSD.
4.6.4.Replacing an SSD
If an SSD in the DGX Station fails, replace the SSD to return the system to operation.
Caution The default RAID level of the array in the DGX Station is RAID 0, which does
not provide any redundancy. If a single SSD in the array fails, all data stored on the
array is lost. To prevent the failure of an SSD from causing a loss of data, ensure that
any data on the array that you want to preserve is backed up.
1.
Remove the side panel on the left of the DGX Station when viewed from the rear.
a) Push the button on the left side of the DGX Station back panel to release the
panel.
b) Lift the panel to remove it.
Caution To prevent damage from electrostatic discharge, avoid touching any
of the components inside the DGX Station other than any components that you
are replacing or servicing.
2.
On the SSD that you want to replace, press the drive-tray eject button to loosen the
drive-tray latch.
3.
Pull the drive-tray latch upwards to unseat the drive tray.
4.
Slide the drive tray upwards to completely remove it from the unit.
5.
Using a Phillips screwdriver, remove the four screws attaching the SSD to the drive
tray.
Save the screws for the replacement SSD.
6.
Slide the SSD out of the drive tray.
7.
Slide the replacement SSD into the drive tray.
Make sure that the connector is on the open edge side of the tray.
8.
Secure the replacement SSD to the drive tray using the four screws.
9.
With the drive-tray eject button at the right, insert the drive tray into the appropriate
drive bay, then slide the drive tray all the way into the drive bay.
10.
Press the drive-tray latch downwards until you hear a click to completely seat the
drive tray.
11.
Replace the side panel of the DGX Station.
a) Align the bottom edge of the side panel with the bottom edge of the DGX Station.
b) Firmly push the panel back into place to re-engage the latches.
What you need to do to return the DGX Station to service depends on whether you
replaced an SSD in the RAID array or the OS SSD.
‣ If you replaced an SSD in the RAID array, rebuild the RAID array as explained in
  Rebuilding the DGX Station RAID Array.
‣ If you replaced the OS SSD, restore the software image as explained in Restoring the
  DGX Station Software Image.
4.6.5.Rebuilding the DGX Station RAID Array
If the DGX Station RAID array is degraded because an SSD failed, replace the SSD as
explained in Replacing an SSD.
After replacing a failed SSD in the RAID array, you must rebuild the array to
add the new SSD to a RAID 0 array or to regenerate the lost data on the new
SSD in a RAID 5 array. The DGX Station software includes the custom script
configure_raid_array.py for this purpose.
To rebuild the array, run the following command:
$ sudo configure_raid_array.py -r
The time required to rebuild a RAID 5 array depends on factors such as system load,
SSD capacity, and the number of SSDs in the array. Rebuilding the array of three
1.92-terabyte SSDs in the DGX Station may require several hours.
You can monitor the progress of a long-running rebuild by examining the contents of the
In this example, the rebuild is 4.0% complete and the rebuild is estimated to finish in
438.3 minutes.
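On Linux, the md software RAID driver reports the progress of a rebuild in the file /proc/mdstat. The following sketch, which is not part of the DGX Station software, extracts the percentage complete and the estimated time remaining from that text. The function name rebuild_progress is hypothetical, and the sample status line is illustrative only, written to match the figures quoted above.

```shell
# Read the text of /proc/mdstat on standard input and report the progress
# of a recovery or resync that is under way, if any.
rebuild_progress() {
    sed -nE 's/.*(recovery|resync) *= *([0-9.]+)%.*finish=([0-9.]+)min.*/\2% complete, about \3 minutes remaining/p'
}

# Illustrative status line only; it is not captured from a DGX Station.
sample='      [>....................]  recovery =  4.0% (75045120/1875243008) finish=438.3min speed=68454K/sec'
printf '%s\n' "$sample" | rebuild_progress
```

To check a running rebuild, pass the live file to the function: rebuild_progress < /proc/mdstat. The function prints nothing when no rebuild is in progress.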
The RAID array is rebuilt with its existing RAID level.
‣ If the array is a RAID 0 array, all data that was on the array is erased after the
  array is rebuilt.
‣ If the array is a RAID 5 array, the data on the array is preserved after the array
  is rebuilt.
If you have rebuilt a RAID 0 array and have a backup of data on the array that you want
to preserve, restore the data from the backup.
4.7.Restoring the DGX Station Software Image
If the DGX Station software image becomes corrupted or the OS SSD was replaced after
a failure, restore the DGX Station software image to its original factory condition from a
pristine copy of the image.
A USB flash drive is supplied from which you can restore the DGX Station software
image. Before using this USB drive to restore the DGX Station software image, contact
NVIDIA Support Enterprise Services to see if a later version of the software image is
available. If a later version of the image is available, prepare a bootable installation
medium that contains the current software image as explained in the following topics:
‣ Obtaining the DGX Station Software ISO Image and Checksum File
‣ Creating a Bootable Installation Medium
When you have a bootable installation medium that contains the current software image,
install the image as explained in Installing the DGX Station Software Image from a USB
Flash Drive or DVD-ROM.
Updates to the DGX Station software might have been made available after the
latest available ISO image file was created. To ensure that you have the latest DGX
Station software, including security updates, check for updates and install any
available updates after you restore the software image. For more information, see
Updating DGX Station Software.
4.7.1.Obtaining the DGX Station Software ISO Image
and Checksum File
To ensure that you restore the latest available version of the DGX Station software
image, obtain the current ISO image file from NVIDIA Support Enterprise Services. A
checksum file is provided for the image to enable you to verify the bootable installation
medium that you create from the image file.
1.
Log on to the NVIDIA Enterprise Services (https://nvid.nvidia.com/dashboard/) site.
2.
Click the Announcements tab to locate the download links for the DGX Station
software image.
3.
Download the ISO image and its checksum file and save them to your local disk.
The ISO image is also available in an archive file. If you download the archive file, be
sure to extract the ISO image before proceeding.
4.7.2.Creating a Bootable Installation Medium
After obtaining an ISO file that contains the software image from NVIDIA Support
Enterprise Services, create a bootable installation medium, such as a USB flash drive or
DVD-ROM, that contains the image.
‣ If you are creating a bootable USB flash drive, follow the instructions for the
  platform that you are using:
  ‣ On Ubuntu Desktop, see Creating a Bootable USB Flash Drive by Using Startup
    Disk Creator.
  ‣ On Windows, see Creating a Bootable USB Flash Drive by Using Akeo Rufus.
‣ If you are creating a bootable DVD-ROM, you can use any of the methods
  described in Burning the ISO on to a DVD
  (https://help.ubuntu.com/community/BurningIsoHowto#Burning_the_ISO_on_to_a_DVD)
  on the Ubuntu Community Help Wiki.
4.7.2.1.Creating a Bootable USB Flash Drive by Using Startup Disk
Creator
On an Ubuntu Desktop system, you can use Startup Disk Creator to create a bootable
USB flash drive that contains the DGX Station software image.
Ensure that the following prerequisites are met:
‣ The correct DGX Station software image is saved to your local disk. For more
  information, see Obtaining the DGX Station Software ISO Image and Checksum File.
‣ The USB flash drive has a capacity of at least 4 GB.
1.
Plug the USB flash drive into one of the USB ports of your Ubuntu Desktop system.
2.
Open the Dash, search for Startup Disk Creator, and click the Startup Disk
Creator icon.
3.
In the Make Startup Disk window that opens, from the Source disc image (.iso) list,
select the DGX Station software image file.
If the DGX Station software image file is not listed, click Other and in the window
that opens, navigate to the file, select the file, and click Open.
4.
From the Disk to use list, select the USB flash drive and click Make Startup Disk.
4.7.2.2.Creating a Bootable USB Flash Drive by Using Akeo Rufus
On a Windows system, you can use the Akeo Reliable USB Formatting Utility (Rufus)
(https://rufus.akeo.ie/) to create a bootable USB flash drive that contains the DGX Station
software image.
Ensure that the following prerequisites are met:
‣ The correct DGX Station software image is saved to your local disk. For more
  information, see Obtaining the DGX Station Software ISO Image and Checksum File.
‣ The USB flash drive has a capacity of at least 4 GB.
1.
Plug the USB flash drive into one of the USB ports of your Windows system.
2.
Download and launch the Akeo Reliable USB Formatting Utility (Rufus)
(https://rufus.akeo.ie/).
3.
Under Partition scheme and target system type, select GPT partition scheme for
UEFI.
4.
Select the Create a bootable disk using option and from the dropdown menu, select
ISO image.
5.
Click the optical drive icon and open the DGX Station software ISO image.
6.
Click Start.
Because the image is a hybrid ISO file, you are prompted to select whether to write
the image in ISO Image (file copy) mode or DD Image (disk image) mode.
7.
Select Write in ISO Image mode and click OK.
4.7.3.Verifying the Bootable Installation Medium
On a Linux system, you can use the checksum file provided for the DGX Station
software image to verify the installation medium that you created from the image.
Ensure that the following prerequisites are met:
‣ The checksum file for the DGX Station software image is saved to your local disk.
  For more information, see Obtaining the DGX Station Software ISO Image and
  Checksum File.
‣ You have created a bootable installation medium from the image. For more
  information, see Creating a Bootable Installation Medium.
How to verify a bootable installation medium depends on whether it is a USB flash drive
or a DVD-ROM.
4.7.3.1.Verifying a Bootable USB Flash Drive
1.
Plug the USB flash drive into one of the USB ports of your Linux system.
2.
Obtain the device ID of the USB flash drive by running the lsblk (http://
You can identify the USB flash drive from its size, which is much smaller than the
size of the SSDs in the DGX Station, and from the mount points of any partitions on
the drive, which are under /media.
In the following example, the device ID of the USB flash drive is sde1.
$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 1.8T 0 disk
|_sda1 8:1 0 487M 0 part /boot/efi
|_sda2 8:2 0 1.8T 0 part /
sdb 8:16 0 1.8T 0 disk
|_md0 9:0 0 5.2T 0 raid0 /raid
sdc 8:32 0 1.8T 0 disk
|_md0 9:0 0 5.2T 0 raid0 /raid
sdd 8:48 0 1.8T 0 disk
|_md0 9:0 0 5.2T 0 raid0 /raid
sde 8:64 1 3.7G 0 disk
|_sde1 8:65 1 3.2G 0 part /media/deepl/DGXSTATION
|_sde2 8:66 1 2.3M 0 part
$
3.
Compute the checksum of the image on the USB flash drive.
$ sudo dd if=device-id bs=block-size | cksum
device-id
The device ID of the USB flash drive, for example, /dev/sde1.
block-size
The block size to be used by the dd command, for example, 1M.
This example computes the checksum of an image on the USB flash drive with
device ID /dev/sde1 using a block size of 1 MB.
$ sudo dd if=/dev/sde1 bs=1M | cksum
3299+1 records in
3299+1 records out
3459317760 bytes (3.5 GB, 3.2 GiB) copied, 164.369 s, 21.0 MB/s
3992706625 3459317760
4.
Obtain the checksum value from the checksum file.
$ cat checksum-file
checksum-file
The path, including the file name, to the checksum file.
This example obtains the checksum value for the image
DGXStation-3.1.2_56d4a9.iso from the checksum file
DGXStation-3.1.2_56d4a9.crc in the current working directory.
If the value obtained from the checksum file matches the value computed from the
image, the integrity of the installation medium has been successfully verified.
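The identification and comparison steps in this procedure can be scripted. The following is a minimal sketch with two loudly stated assumptions: lsblk prints its default columns as in the example (RM is the third column and TYPE is the sixth), and the checksum file holds a line in the same format that cksum prints, that is, a CRC followed by a byte count. The layout of the checksum file is not shown in this guide, so check it before relying on the comparison. All function names are hypothetical.

```shell
# Read lsblk output on standard input and print the name and size of each
# whole disk whose RM (removable) column is 1.
removable_disks() {
    awk '$3 == 1 && $6 == "disk" { print $1, $4 }'
}

# Print the first two fields of a cksum-style line: the CRC and the byte count.
cksum_fields() {
    printf '%s\n' "$1" | awk '{ print $1, $2 }'
}

# Succeed only if two cksum-style lines carry the same CRC and byte count.
#   $1 = line computed from the medium, $2 = line read from the checksum file
same_cksum() {
    [ "$(cksum_fields "$1")" = "$(cksum_fields "$2")" ]
}
```

For example, lsblk | removable_disks lists candidate USB flash drives. With the device ID and checksum file from the examples in this procedure, the comparison could be run as follows:

same_cksum "$(sudo dd if=/dev/sde1 bs=1M 2>/dev/null | cksum)" "$(cat DGXStation-3.1.2_56d4a9.crc)" && echo "Medium verified"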
4.7.3.2.Verifying a Bootable DVD-ROM
1.
Load the DVD-ROM into an optical drive connected to your Linux system.
2.
Compute the checksum of the image on the DVD-ROM.
$ cksum < /dev/sr0
This example computes the checksum of an image on a DVD-ROM.
$ cksum < /dev/sr0
3992706625 3459317760
3.
Obtain the checksum value from the checksum file.
$ cat checksum-file
checksum-file
The path, including the file name, to the checksum file.
This example obtains the checksum value for the image
DGXStation-3.1.2_56d4a9.iso from the checksum file
DGXStation-3.1.2_56d4a9.crc in the current working directory.
If the value obtained from the checksum file matches the value computed from the
image, the integrity of the installation medium has been successfully verified.
4.7.4.Installing the DGX Station Software Image from a
USB Flash Drive or DVD-ROM
Before installing the DGX Station software image, ensure that you have a bootable USB
flash drive or DVD-ROM that contains the current DGX Station software image.
Caution Installing the DGX Station software image erases all data stored on the OS
SSD. The /home partition, where all users' documents, software settings, bookmarks,
and other personal files are stored, resides on the OS SSD and will be erased.
However, if you choose to install the DGX Station software and preserve the RAID array
contents, persistent data stored in the RAID array is unaffected.
1.
Shut down the DGX Station.
2.
Load the USB flash drive or DVD-ROM into the DGX Station.
‣ If you are using a USB flash drive, plug it into one of the USB ports of the DGX
  Station.
‣ If you are using a DVD-ROM, connect an external optical drive to the DGX
  Station and load the DVD-ROM into the drive.
3.
Power on the DGX Station.
4.
At the first NVIDIA screen to appear, press F8 to select the boot device.
5.
In the menu for selecting the boot device, use the arrow keys to select UEFI: usbkey-or-dvd-rom-name, Partition n (size) and press Enter.
6.
When the GNU GRUB menu appears, select the option you want for installing the
DGX Station software and press Enter.
‣ To install the software while preserving persistent data stored in the RAID array,
  select Install DGX OS Desktop release and preserve RAID contents.
‣ To install the software and re-initialize the RAID array, select Install DGX OS
  Desktop release and re-initialize RAID0 volume.
  Caution If you choose this option, all data stored in the RAID array will be
  erased.
The installation requires several minutes to complete.
Licensing requirements prevent some DGX Station software, such as the NVIDIA
Graphics Drivers, from being supplied in the software image. The DGX Station
automatically installs this software when installation from the software image is
complete.
7.
When the installation is complete, respond to the prompts to accept end user
license agreements for NVIDIA software and to configure the Ubuntu OS, including
creating your user name and password for logging in to the DGX Station.
8.
After the Ubuntu OS configuration is complete, log in to the DGX Station to access
your Ubuntu desktop.
9.
Eject the USB flash drive or DVD-ROM.
10.
Unplug the USB flash drive or optical drive from the DGX Station.
4.8.Updating the DGX Station System BIOS
If you need to update the DGX Station system BIOS, you can obtain the current version
of it from NVIDIA Support Enterprise Services.
Caution
Update the system BIOS only if required to resolve an issue with the DGX Station.
If your DGX Station is operating normally, do not update the system BIOS. An error
during an attempt to update the system BIOS may leave your DGX Station unable to
boot.
If you must update the system BIOS, be sure to obtain the BIOS file from NVIDIA
Enterprise Services. Do not obtain a BIOS file from the motherboard manufacturer or
any other source.
To complete this task, you need a USB flash drive formatted with a single FAT16 or
FAT32 partition.
1.
Obtain the system BIOS file.
a) Log on to NVIDIA Enterprise Services (https://nvid.nvidia.com/dashboard/).
b) Click the Announcements tab to locate the download links for the archive file
containing the DGX Station system BIOS file.
c) Download the archive file and extract the system BIOS file.
2.
Copy the system BIOS file to the USB flash drive.
3.
Shut down the DGX Station.
4.
Plug the USB flash drive into one of the USB ports of the DGX Station.
5.
Power on the DGX Station.
6.
At the first NVIDIA screen to appear, press Delete or F2 to enter the UEFI BIOS
setup.
7.
In the UEFI BIOS Utility - EZ Mode screen, click Advanced Mode.
8.
From the Tool menu, choose EZ 3 Flash Utility and press Enter.
9.
In the EZ 3 Flash Update screen, select via Storage Device(s) as the BIOS update
method and press Enter.
10.
In the Drive list, use the up arrow and down arrow keys to select the USB flash
drive that contains the BIOS file and press Enter.
11.
In the Folder list, use the up arrow and down arrow keys to select the BIOS file.
12.
Press Enter to start the BIOS update process.
Caution To avoid the risk of leaving your DGX Station unable to boot, do not shut
down or reset the DGX Station during the BIOS update process.
13.
When the BIOS update process is complete, reboot the DGX Station.
4.9.Maintaining the GPU Liquid Cooling System
A liquid cooling system keeps the GPUs in the DGX Station within their required
operating temperature range. To ensure reliable operation of the cooling system, you
must maintain it periodically.
4.9.1.Monitoring GPU Temperatures
1.
Open the Dash, search for NVIDIA X Server Settings, and click the NVIDIA
X Server Settings icon.
2.
Under each GPU in the list of GPUs in the NVIDIA X Server Settings window, click
Thermal Settings.
Thermal sensor information for the GPU is displayed, including its current
temperature and an indication of whether the temperature is within the GPU's
operating range.
If the GPUs are running too hot, check the level of the liquid in the GPU cooling system
as explained in Checking the Level of the Liquid in the GPU Cooling System.
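If you prefer the command line, the nvidia-smi utility that is installed with the NVIDIA driver can report the same temperatures. The following sketch filters its CSV output for GPUs above a limit that you choose. The function name gpus_above is hypothetical, and this guide does not state a temperature threshold, so the value 80 in the usage line is only a placeholder.

```shell
# Read lines of the form "index, temperature" on standard input and print
# the GPUs whose temperature exceeds the limit given as $1 (degrees Celsius).
gpus_above() {
    awk -F', *' -v limit="$1" '$2 + 0 > limit + 0 { print "GPU " $1 ": " $2 " C" }'
}
```

A possible invocation is:

nvidia-smi --query-gpu=index,temperature.gpu --format=csv,noheader,nounits | gpus_above 80

The command prints nothing when every GPU is at or below the limit.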
4.9.2.Checking the Level of the Liquid in the GPU
Cooling System
In normal operation, some coolant liquid may be lost from the system. Every 12 months,
check the level of the liquid in the cooling system to ensure that it remains at the
required level.
1.
Remove the side panel on the right of the DGX Station when viewed from the rear.
a) Push the button on the right side of the DGX Station back panel to release the
panel.
b) Lift the panel to remove it.
Caution To prevent damage from electrostatic discharge, avoid touching any
of the components inside the DGX Station other than any components that you
are replacing or servicing.
2.
Look at the gauge on the side of the cooling system pump to determine the level of
the liquid in the cooling system.
‣ If the level of the liquid in the cooling system is at or above the Minimum Level in
  the reservoir, go to the next step.
‣ If the liquid has fallen below the Minimum Level in the reservoir, replenish it as
  explained in Replenishing the Liquid in the GPU Cooling System.
3.
Replace the side panel of the DGX Station.
a) Align the bottom edge of the side panel with the bottom edge of the DGX Station.
b) Firmly push the panel back into place to re-engage the latch.
4.9.3.Replenishing the Liquid in the GPU Cooling
System
Replenish the liquid in the GPU cooling system if the liquid is below the required level
or to refill the cooling system after draining it to renew the cooling liquid.
To complete this task, you need the following tools and materials:
‣ Torx T20 Allen wrench
‣ 1 bottle of EK-CryoFuel Clear Premix coolant
  Caution Use only EK-CryoFuel Clear coolant. Do not use any other type of
  coolant. Use of other types of coolant will void the DGX Station hardware
  warranty and may cause damage to or impair the performance of the system.
‣ Flexible plastic filling bottle with delivery tube
Before you begin, ensure that the DGX Station is powered off.
1.
Fill the plastic filling bottle with the EK-CryoFuel Clear Premix coolant.
2.
Use the Torx T20 Allen wrench to loosen the filler cap at the top of the cooling system
pump and, when the cap is loose, remove it.
3.
Insert the delivery tube of the filling bottle into the open filler cap at the top of the
pump.
4.
Gently squeeze the filling bottle to dispense the coolant liquid into the pump until the
liquid reaches the Maximum Level in the reservoir.
5.
Replace the filler cap at the top of the pump and use the Torx T20 Allen wrench to
tighten the cap until it is finger tight.
Do not overtighten the filler cap.
6.
Power on the DGX Station and let it run for one minute.
If the pump makes a grinding noise, power off and power on the DGX Station four
times.
7.
Ensure that the level of the liquid in the cooling system is at the Maximum Level in
the reservoir.
If the liquid has fallen below the Maximum Level in the reservoir, repeat the
following sequence of steps until the level of the liquid in the cooling system remains
at the Maximum Level.
a) Remove the filler cap at the top of the cooling system pump.
b) Dispense more coolant liquid into the pump until the liquid reaches the
Maximum Level in the reservoir again.
c) Replace the filler cap at the top of the pump.
d) Power on the DGX Station and let it run for one minute.
e) Check the level of the liquid in the cooling system.
8.
Power off the DGX Station.
9.
Replace the side panel of the DGX Station.
a) Align the bottom edge of the side panel with the bottom edge of the DGX Station.
www.nvidia.com
DGX StationDU-08255-001 _v2.1|55
Maintaining and Servicing the NVIDIA DGX Station
b) Firmly push the panel back into place to re-engage the latch.
www.nvidia.com
DGX StationDU-08255-001 _v2.1|56
AppendixA.
SAFETY
To reduce the risk of bodily injury, electrical shock, fire, and equipment damage, read
this document and observe all warnings and precautions in this guide before installing
or maintaining your product. NVIDIA products are designed to operate safely when
installed and used according to the product instructions and general safety practices.
The guidelines included in this document explain the potential risks associated with
computer operation and provide important safety practices designed to minimize these
risks.
The product is designed and tested to meet IEC 60950-1, the Standard for the Safety of
Information Technology Equipment. This also covers the national implementation of
IEC 60950-1 based safety standards around the world, for example, UL 60950-1. These
standards reduce the risk of injury from the following hazards:
‣ Electric shock: Hazardous voltage levels contained in parts of the product
‣ Fire: Overload, temperature, material flammability
Retain and follow all product safety and operating instructions. Always refer to the
documentation supplied with your equipment. Observe all warnings on the product and
in the operating instructions.
WARNING: FAILURE TO FOLLOW THESE SAFETY INSTRUCTIONS COULD RESULT IN FIRE,
ELECTRIC SHOCK OR OTHER INJURY OR DAMAGE. ELECTRICAL EQUIPMENT CAN BE
HAZARDOUS IF MISUSED. OPERATION OF THIS PRODUCT, OR SIMILAR PRODUCTS, MUST
ALWAYS BE SUPERVISED BY AN ADULT. DO NOT ALLOW CHILDREN ACCESS TO THE INTERIOR
OF ANY ELECTRICAL PRODUCT AND DO NOT PERMIT THEM TO HANDLE ANY CABLES.
A.1.Intended Application Uses
This product was evaluated as Information Technology Equipment (ITE), which may
be installed in offices, schools, computer rooms, and similar commercial type locations.
The suitability of this product for other product categories and environments (such as
medical, industrial, residential, alarm systems, and test equipment), other than an ITE
application, may require further evaluation.
A.2.General Precautions
To reduce the risk of personal injury or damage to the equipment:
‣ Shut down the product and disconnect all AC power cables before installation.
‣ Do not connect or disconnect any cables when performing installation, maintenance,
  or reconfiguration of this product during an electrical storm.
‣ Never turn on any equipment when there is evidence of fire, water, or structural
  damage.
‣ Place the product away from radiators, heat registers, stoves, amplifiers, or other
  products that produce heat.
‣ Never use the product in a wet location.
‣ Avoid inserting foreign objects through openings in the product.
‣ Do not use conductive tools that could bridge live parts.
‣ Do not make mechanical or electrical modifications to the equipment.
‣ Use the product only with approved equipment.
‣ Follow all cautions and instructions marked on the equipment. Do not attempt to
  defeat safety interlocks (where provided).
‣ Operate the DGX Station in a place where the temperature is always in the range
  10°C to 30°C (50°F to 86°F).
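The operating temperature limit in the last precaution can be checked against a room reading with a few lines of Python. This is only a minimal sketch: the function names and the sample readings are hypothetical, and treating the limits as inclusive is an assumption, since the guide states only that the temperature must always be in the range 10°C to 30°C.

```python
# Operating range quoted in the guide: 10°C to 30°C (50°F to 86°F).
OPERATING_RANGE_C = (10.0, 30.0)


def celsius_to_fahrenheit(celsius: float) -> float:
    """Convert a Celsius reading to Fahrenheit."""
    return celsius * 9.0 / 5.0 + 32.0


def is_within_operating_range(ambient_c: float) -> bool:
    """Return True if the reading is inside the range (limits assumed inclusive)."""
    low, high = OPERATING_RANGE_C
    return low <= ambient_c <= high


if __name__ == "__main__":
    # Hypothetical room readings, for illustration only.
    for reading in (8.5, 22.0, 31.0):
        status = "OK" if is_within_operating_range(reading) else "out of range"
        print(f"{reading:.1f}°C ({celsius_to_fahrenheit(reading):.1f}°F): {status}")
```

The conversion also confirms the Fahrenheit figures given in the guide: 10°C corresponds to 50°F and 30°C to 86°F.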
A.3.Electrical Precautions
Power Cable
To reduce the risk of electric shock, fire, or damage to the equipment:
‣ Use only the supplied power cable and do not use this power cable with any other
  products or for any other purpose. Not all power cables have the same current
  ratings.
‣ Do not use household extension cables with your product. Household extension
  cables do not have overload protection and are not intended for use with computer
  systems.
‣ If you lose or damage the supplied power cable, or have to change the power cable
  for any reason, use a cable rated for your product and for the voltage and current
  marked on the electrical ratings label of the product. The voltage and current rating
  of the cable must be greater than the voltage and current rating marked on the
  product.
‣ Plug the power cable into a grounded (earthed) electrical outlet that is easily
  accessible at all times. The product is equipped with a three-wire electrical
  grounding-type plug which has a third pin for ground. This plug fits only into a
  grounded electrical power outlet.
‣ Do not disable the power cable grounding plug. The grounding plug is an important
  safety feature.
‣ Do not place objects on power cables. Arrange them so that no one may accidentally
  step on or trip over them.
‣ Do not pull on a cable. When unplugging the product from the electrical outlet,
  grasp the plug.
‣ When possible, use one hand only to connect or disconnect cables.
‣ Do not modify power cables or plugs. Consult a licensed electrician or your power
  company for site modifications.
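The replacement-cable rule above (the cable's voltage and current ratings must both be greater than the ratings marked on the product) can be written as a simple comparison. The sketch below is illustrative only: the `Rating` type, the function name, and the numeric values are hypothetical and are not taken from the DGX Station ratings label. Always read the actual values from the label on the product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rating:
    """A voltage/current rating pair, as printed on a label."""
    volts: float
    amps: float


def cable_is_suitable(cable: Rating, product: Rating) -> bool:
    """Both cable ratings must be strictly greater than the product's ratings."""
    return cable.volts > product.volts and cable.amps > product.amps


if __name__ == "__main__":
    # Placeholder numbers for illustration; not the DGX Station's ratings.
    product_label = Rating(volts=240.0, amps=8.0)
    candidate_cable = Rating(volts=250.0, amps=10.0)
    print(cable_is_suitable(candidate_cable, product_label))
```

A cable that exceeds the product on only one of the two ratings fails the check, which matches the wording of the rule.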
Power Supply
‣ Ensure that the voltage and frequency of your power source match the voltage and
  frequency inscribed on the equipment’s electrical rating label. If you have a question
  about the type of power source to use, contact your authorized service provider.
‣ Connect the equipment to a properly wired and grounded electrical outlet and
  always follow your local or national wiring rules.
‣ Ensure that the socket outlet is near the equipment and is readily accessible for
  disconnection.
‣ To help protect your system from sudden, transient increases and decreases in
  electrical power, consider using a surge suppressor or line conditioner.
‣ Never force a connector into a port. Check for obstructions on the port. If the
  connector and port don’t join with reasonable ease, they probably don’t match. Make
  sure that the connector matches the port and that you have positioned the connector
  correctly in relation to the port.
‣ Do not open the power supply. Hazardous voltage, current and energy levels are
  present inside the power supply. The power supply in this product contains no
  user-serviceable parts. Return to manufacturer for servicing.
A.4.Communications Cable Precautions
To reduce the risk of exposure to electrical shock hazards from communications cables:
‣ Do not connect communications cables during an electrical storm. There may be a
  risk of electric shock from lightning.
‣ Do not connect or use communications cables in a wet location.
‣ Disconnect the communications cables before opening a product enclosure, or
  touching or installing internal components.
A.5.Other Hazards
Proposition 65 Warning
This product contains chemicals known to the State of California to cause cancer and
birth defects or other reproductive harm.
California Department of Toxic Substances Control
Perchlorate Material – special handling may apply. See www.dtsc.ca.gov/
The decorative metal foam on the DGX Station casework contains some nickel. The
metal foam is not intended for direct and prolonged skin contact. While nickel exposure
is unlikely to be a problem, you should be aware of the possibility in case you’re
susceptible to nickel-related reactions.
AppendixB.
CONNECTIONS, CONTROLS, AND
INDICATORS
B.1.Front-Panel Connections and Controls
ID  Type          Qty  Description
1   Power Button  1    Press to turn the DGX Station on or off
B.2.Rear-Panel Connections and Controls
Current Units
ID  Type                 Qty  Description
1   USB 3.1 Type-C       1    USB 3.1 Type-C port
2   Ethernet             2    10G LAN ports (see LAN Port Indicators):
                              ‣ Lower port: LAN 1
                              ‣ Upper port: LAN 2
3   USB 3.0              4    USB 3.0 ports
4   S/PDIF Audio Output  1    Optical S/PDIF out port
5   eSATA                2    eSATA ports for connecting external storage devices, such as
                              hard drives or optical drives, with an external power supply
6   AC Input             1    Power supply input
7   Reset Button         1    Press to reboot the system without turning off the system
                              power
8   USB 3.1 Type-A       1    USB 3.1 Type-A port
9   Audio I/O            5    3.5 mm I/O ports for 2-, 4-, 6-, or 8-channel audio (see
                              Audio I/O Connections)
10  DisplayPort          3    Ports for connecting up to 3 displays
11  Power Supply Switch  1    Turn the power supply on and off
Earlier Units
ID  Type                 Qty  Description
1   USB 3.1 Type-C       1    USB 3.1 Type-C port
2   Ethernet             2    10G LAN ports (see LAN Port Indicators):
                              ‣ Lower port: LAN 1
                              ‣ Upper port: LAN 2
3   USB 3.0              4    USB 3.0 ports
4   S/PDIF Audio Output  1    Optical S/PDIF out port
5   eSATA                2    eSATA ports for connecting external storage devices, such as
                              hard drives or optical drives, with an external power supply
6   Power Supply Switch  1    Turn the power supply on and off
7   Reset Button         1    Press to reboot the system without turning off the system
                              power
8   USB 3.1 Type-A       1    USB 3.1 Type-A port
9   Audio I/O            5    3.5 mm I/O ports for 2-, 4-, 6-, or 8-channel audio (see
                              Audio I/O Connections)
10  DisplayPort          3    Ports for connecting up to 3 displays
11  AC Input             1    Power supply input
B.3.LAN Port Indicators
LEDs on each Ethernet LAN port indicate the connection status as illustrated in the
following figure and described in the following tables.
AppendixC.
COMPLIANCE
The NVIDIA DGX Station is compliant with the regulations listed in this section.
C.1.DGX Station Model Number
Model: P2587
C.2.Argentina
S-Mark
C.3.Australia/New Zealand
RCM
C.4.Brazil
INMETRO
C.5.Canada
Innovation, Science and Economic Development Canada (ISED)
Compliance
CAN ICES-3(A)/NMB-3(A)
The Class A digital apparatus meets all requirements of the Canadian
Interference-Causing Equipment Regulation.
Cet appareil numérique de la classe A respecte toutes les exigences du Règlement sur le
matériel brouilleur du Canada.
C.6.China
RoHS Material Content
C.7.European Union
European Conformity; Conformité Européenne (CE)
This is a Class A product. In a domestic environment this product may cause radio
frequency interference in which case the user may be required to take adequate
measures.
The product has been marked with the CE Mark to illustrate its compliance.
This device complies with the following Directives:
‣ EMC Directive (2014/30/EU) for Class A, I.T.E equipment.
‣ Low Voltage Directive (2014/35/EU) for electrical safety.
‣ RoHS Directive (2011/65/EU) for hazardous substances.
‣ ErP Directive (2009/125/EC) for European Ecodesign.
A copy of the Declaration of Conformity to the essential requirements may be obtained
directly from NVIDIA GmbH (Floessergasse 2, 81369 Munich, Germany).
C.8.India
BIS
Self Declaration - Conforming to IS13252:2010, R-41078743
C.9.Israel
C.10.Japan
VCCI
C.11.Russia
CU-TR
C.12.South Africa
LOA
Compliant with SANS IEC 60950
SABS
Compliant with SANS 222 CISPR 22
C.13.South Korea
KC
C.14.Taiwan
BSMI
C.15.United States
Federal Communications Commission (FCC)
FCC Marking (Class A)
This device complies with part 15 of the FCC Rules. Operation is subject to the following
two conditions: (1) this device may not cause harmful interference, and (2) this device
must accept any interference received, including any interference that may cause
undesired operation of the device.
NOTE: This equipment has been tested and found to comply with the limits for a Class
A digital device, pursuant to part 15 of the FCC Rules. These limits are designed to
provide reasonable protection against harmful interference when the equipment is
operated in a commercial environment. This equipment generates, uses, and can radiate
radio frequency energy and, if not installed and used in accordance with the instruction
manual, may cause harmful interference to radio communications. Operation of this
equipment in a residential area is likely to cause harmful interference in which case the
user will be required to correct the interference at his own expense.
C.16.United States/Canada
cULus Listing Mark
C.17.Vietnam
ICT
AppendixD.
DGX STATION HARDWARE SPECIFICATIONS
D.1.Environmental Conditions
Condition            Operating Range              Nonoperating Range
Ambient temperature  10°C to 30°C (50°F to 86°F)  5°C to 40°C (41°F to 104°F)
Relative humidity    10% to 80% (non-condensing)  8% to 90% (non-condensing)