The NVIDIA® DGX-1™ Deep Learning System is the world’s first purpose-built system
for deep learning with fully integrated hardware and software that can be deployed
quickly and easily.
1.1. Using the DGX-1: Overview
The NVIDIA DGX-1 comes with a base operating system consisting of an Ubuntu OS,
Docker, Docker Engine Utility for NVIDIA GPUs, and NVIDIA drivers. The system is
designed to run a number of NVIDIA-optimized deep learning framework applications
packaged in Docker containers. You can use your own scheduling and management
software to run jobs, and also build and run your own applications on the DGX-1.
www.nvidia.com
NVIDIA DGX-1DU-08033-001 _v13.1|1
Introduction to the NVIDIA DGX-1 Deep Learning System
ID | Type | Qty | Description
4 | USB | 2 | USB 3.0 ports are available to connect a keyboard.
5 | VGA | 1 | The VGA port connects to a VGA-capable monitor for local viewing of the DGX-1 setup console or base OS.
6 | DB9 | 1 | RS232 serial port for internal debugging
7 | AC input | 4 | Power supply inputs
8 | Ethernet (RJ45) | 2 | 10GBASE-T dual-port network adapter mezzanine
9 | IPMI (RJ45) | 1 | 10/100BASE-T Intelligent Platform Management Interface (IPMI) port
1.2.5. Rear Panel Power Controls
ID | Type | Qty | Description
1 | Power button | 1 | Press and immediately release the power button for a graceful shutdown of the host OS.
 |  |  | Press and hold the power button for at least four seconds to shut down the system immediately. The BMC remains live.
2 | Power LED | 1 | Off: Power off
 |  |  | Blue (steady): Power on
 |  |  | Blue (blinking): BMC reports system health fault.
3 | Main Board Status LED | 1 | Off: Normal
 |  |  | Amber (blinking): BMC reports system health fault.
1.2.6. LAN LEDs
LEDs next to each Ethernet port indicate the connection status as described in the table
below:
LED | Status | Description
1 (Port 1 Link/Activity) | Amber (steady) | LAN link
 | Amber (blinking) | LAN access (off when there is traffic)
 | Off | Disconnected
2 (Port 1 Speed) | Green | 10 Gb/s
 | Amber | 1 Gb/s
 | Off | 100 Mb/s
3 (Port 0 Link/Activity) | Amber (steady) | LAN link
 | Amber (blinking) | LAN access (off when there is traffic)
 | Off | Disconnected
4 (Port 0 Speed) | Green | 10 Gb/s
 | Amber | 1 Gb/s
 | Off | 100 Mb/s
1.2.7. IPMI Port LEDs
LEDs on the IPMI port indicate the connection status as described in the table below:
Link | Activity | Description
Off | Off | Unplugged
Green (steady) | Green (blinking) | 100M active link
Off | Green (blinking) | 10M active link
1.2.8. Hard Disk Indicators
ID | Feature | Description
1 | Button and release lever for removing the HDD |
2 | HDD present LED | Blue (steady): Drive present
 |  | Blue (blinking twice/sec): Identification (such as when initializing or locating through the SBIOS)
 |  | Blue (blinking once/sec): Rebuilding (such as when creating a RAID array)
 |  | Amber (steady): Warning/failure
 |  | Off: Slot empty
3 | HDD activity LED | Blue: Access
1.2.9. Power Supply Unit (PSU) LED
The PSU LED indicates the operation status of the PSU as described in the table below:
Activity | Description
Green | Normal operation
Amber (blinking) | Power off; Fault
Green (blinking) | Power on; Standby mode
Chapter 2.
INSTALLATION AND SETUP
This chapter provides the basic instructions for installing and setting up the NVIDIA
DGX-1.
2.1. Registering Your DGX-1
Be sure to register your DGX-1 with NVIDIA as soon as you receive your purchase
confirmation e-mail. Registration enables your hardware warranty and allows you to set
up an NVIDIA DGX Container Registry account.
To register your DGX-1, you will need information provided in your purchase
confirmation e-mail. If you do not have the information, send an e-mail to NVIDIA
Enterprise Support at enterprisesupport@nvidia.com.
1.
From a browser, go to the NVIDIA DGX Product Registration (http://
2.
Enter all required information and then click SUBMIT to complete the registration
process and receive all warranty entitlements and, if applicable, DGX-1 support
services entitlements.
Refer to the Customer Support chapter for customer support contact information.
2.2. Obtaining Software and Software Updates
You must register your DGX-1 in order to receive software updates. Once registered,
you will receive an email notification whenever a new software update is available.
You can access software update instructions as well as software downloads through the
Enterprise Support site as follows:
‣ From your browser, go to NVIDIA Enterprise Services (https://nvid.nvidia.com/enterpriselogin/), and log in.
‣ Click the Announcements tab, which contains download links and supplemental documentation.
Installation and Setup
‣ Refer to the DGX OS Server Software Release Notes for instructions on how to perform a software update.
2.3. Choosing a Setup Location / Site Preparation
Decide on a suitable location for setting up and operating the DGX-1. The location
should be clean, dust-free, and well ventilated.
General Conditions
‣ Prepare a sufficiently wide aisle to accommodate the unboxed chassis (chassis dimensions: 5.16"H x 17.5"W x 34.1"D).
‣ The rack must accommodate a 134 lb, 3U rack mount system (chassis dimensions: 5.16"H x 17.5"W x 34.1"D).
‣ The rack must have square mounting holes.
‣ Leave 36" (91.4 cm) of clearance in front of the rack to enable you to install the unit into the rack.
‣ Leave approximately 30" (76.2 cm) of clearance in the back of the rack to allow for sufficient airflow and ease in servicing.
‣ Always make sure the rack is secured and stable before adding or removing the appliance or any other component.
‣ Prepare adequate sound-proofing: the equipment fans can generate 72-100 dBA.
Environmental Conditions
‣ Operating environment
  ‣ Temperature: 5 °C to 35 °C (41 °F to 95 °F)
  ‣ Relative humidity: 20% to 85% noncondensing
‣ Air flow
  ‣ The chassis fans can produce a maximum of 340 CFM of air flow.
  ‣ Do not block the ventilation areas at the front and rear of the chassis.
  ‣ Minimize any restrictions on air flow around the chassis.
Connections
‣ Power:
  ‣ The DGX-1 is powered through four 1600 W power supply units, each rated at 200-240 VAC, 8 A, 50/60 Hz. Total system power requirement: 3500 W.
  ‣ C13/C14 cables are provided for each power supply to connect to a compatible PDU.
IMPORTANT: Use only the supplied power cables and do not use the cables
with any other product or for any other purpose.
‣ Network: Dual 10GBASE-T RJ45 connection
  ‣ Use industry standard CAT6 Ethernet cables for connecting to the network ports. (Cables not included.)
‣ IPMI: 10/100BASE-T RJ45 connection
  ‣ Use industry standard CAT6 Ethernet cables for connecting to the IPMI port. (Cables not included.)
‣ InfiniBand: Four QSFP28 ports, InfiniBand and Ethernet compliant
  ‣ Use Mellanox-compliant InfiniBand cables for connecting to the InfiniBand ports. (Cables not included.)
Preparing for Network Access
‣ The IPMI port and Ethernet ports can be connected to your local LAN. These ports are configured for DHCP by default.
  ‣ To use DHCP, connect the port to a network with a DHCP server, which should provide an IP address and DNS configuration to the DGX-1.
  ‣ If DHCP is not available, you will need to set up a static IP address for each Ethernet port.
‣ NVIDIA recommends that customers follow best security practices for BMC management (IPMI port). These include, but are not limited to, such measures as:
  ‣ Restricting the DGX-1 IPMI port to an isolated, dedicated management network
  ‣ Using a separate, firewalled subnet
  ‣ Configuring a separate VLAN for BMC traffic if a dedicated network is not available
‣ Make sure your network can connect to the following:
  ‣ http://us.archive.ubuntu.com/ubuntu/
  ‣ http://security.ubuntu.com/ubuntu
  ‣ http://international.download.nvidia.com/dgx1/repos/ (Base OS Software 2.x or earlier)
  ‣ http://international.download.nvidia.com/dgx/repos/ (Base OS Software 3.1 or later)
  ‣ https://apt.dockerproject.org/repo
If access to those URLs requires use of a proxy, refer to Configuring a System Proxy
for setup instructions.
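Before the first software update, it can help to confirm that the DGX-1 reaches the repositories listed above. The following is a minimal sketch, assuming curl is installed; check_url and check_dgx_urls are hypothetical helper names, not part of the DGX software. Any HTTP response, even an error status, counts as reachable, because the goal is only to detect a blocked network path or a missing proxy setting.

```shell
# Hypothetical helpers (not shipped with DGX OS): test whether each
# repository URL answers an HTTP request. Assumes curl is installed.
# curl honors the http_proxy/https_proxy environment variables, so run
# this with the same proxy settings that apt will use.

check_url() {
    # --head sends a HEAD request. Without --fail, any HTTP status
    # (even 404) exits 0, so only network-level failures such as a
    # refused connection, a DNS error, or a timeout are reported.
    curl --silent --head --max-time 10 --output /dev/null "$1"
}

check_dgx_urls() {
    local status=0 url
    for url in "$@"; do
        if check_url "$url"; then
            echo "OK $url"
        else
            echo "FAILED $url"
            status=1
        fi
    done
    return "$status"
}
```

For example: $ check_dgx_urls http://us.archive.ubuntu.com/ubuntu/ http://security.ubuntu.com/ubuntu http://international.download.nvidia.com/dgx/repos/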
2.4. Unpacking the DGX-1
1.
Remove the shrinkwrap.
2.
Collapse the yellow "Do not stack" cone, if included.
3.
Open the main DGX-1 box, then remove the accessory and rail kit boxes.
CAUTION: At least four people, or a mechanical assist, are required to remove
the DGX-1 from the box. To reduce the risk of personal injury or damage to the
equipment, always observe local occupational health and safety requirements and
guidelines for material handling.
DO NOT use the handles at the front of the DGX-1 to lift the unit. The handles are
designed for sliding the unit out of a rack, and not for carrying the full weight of the
DGX-1.
4.
Remove the protective plastic sheet from the top of the DGX-1.
5.
Preserve and retain packaging.
6.
Be sure to inspect each piece of equipment shipped in the packing box. If anything is
missing or damaged, contact your supplier.
2.5. What's In the Box
The NVIDIA DGX-1 shipping box includes the following:
‣ NVIDIA DGX-1
‣ Bezel
‣ Rail hardware kit
‣ Accessory Box
‣ AC Power Cables (qty 4, IEC 60320 C13/C14, compatible with data center PDUs)
  IMPORTANT: Use only the supplied power cables and do not use the cables
  with any other product or for any other purpose.
‣ Hard disk bay screws
‣ Toxic Substance Notice & Safety Instructions
‣ Quick Start Guide
‣ DVD containing source files for open source software
The four power cables included in the box are not optional. All power cables are
necessary and must be plugged into individual 10 A capable sockets for optimal DGX-1
operation. Failure to do so can result in a reduction in power redundancy, a reduction
in performance, or a complete system failure.
2.6. Installing the DGX-1 Into a Rack
CAUTION: To prevent bodily injury when mounting or servicing the DGX-1 in a rack, you must
take special precautions to ensure that the system remains stable. The following guidelines
are provided to ensure your safety.
• The DGX-1 should be mounted at the bottom of the rack if it is the only unit in the rack.
• When mounting the DGX-1 in a partially filled rack, load the rack from the bottom to the
top with the heaviest component at the bottom of the rack.
• If the rack is provided with stabilizing devices, install the stabilizers before mounting or
servicing the DGX-1 in the rack.
• The DGX-1 weighs approximately 134 lbs, so an equipment lift is required to safely lift the
unit and then accurately align the chassis rails with the rack rails.
• DO NOT use the handles at the front of the DGX-1 to lift the unit. The handles are designed
for sliding the unit out of a rack, and not for carrying the full weight of the DGX-1.
2.6.1. Installing the Rails
The rail assemblies shipped with the appliance fit a standard 19" rack between
26 and 33.5 inches (66 cm to 85 cm) deep. The outer rail is adjustable from
approximately 23.5" to 34" (59.7 cm to 86.4 cm).
Refer to the instructions in the rail packaging for details on installing the rails onto the
rack and chassis.
The following are supplemental instructions:
1.
Use a Phillips screwdriver to assist in mounting the rails to the rack.
2.
If necessary, detach the inner rails from the outer slide rails.
3.
Follow any designations on the inner rail (or its outer rail mate) to determine the
proper orientation and positioning to connect to the chassis, then secure to the
chassis.
IMPORTANT: Make sure that the reinforced hole at the front end of the rail is
positioned on the bottom side of the rail, and that it aligns with the thumbscrew on
the front of the DGX-1. If the hole is positioned on the top side, then the rail is on the
wrong side of the DGX-1 and the DGX-1 will not fit properly in the rack.
4.
Follow any designations on the outer slide rail to determine front/back and left-side/
right-side positioning against the rack.
5.
Secure the back of one of the slide rails to the rack, then extend the rail until it fits
securely to the front of the rack.
6.
Secure the slide rail to the front of the rack.
7.
Repeat steps 4-6 for the other slide rail.
2.6.2. Mounting the DGX-1
CAUTION: Stability hazard — The rack stabilizing mechanism must be in place, or the
rack must be bolted to the floor before you slide the DGX-1 out for servicing. Failure
to stabilize the rack can cause the rack to tip over.
1.
Confirm that the DGX-1 has the inner rails attached and that you have already
mounted the outer rails into the rack.
2.
With the front of the unit facing away from the rack, use an equipment lift to assist
in sliding the unit into the rack as follows:
CAUTION: The DGX-1 weighs approximately 134 lbs, so an equipment lift is required to
safely lift the unit and then accurately align the chassis rails with the rack rails.
a) Align the inner chassis rails with the front of the outer rack rails.
b) Slide the inner rails into the outer rails, keeping the pressure even on both sides
(you may have to depress the locking tabs when inserting).
When the DGX-1 has been pushed completely into the rack, you should hear the
locking tabs "click" into the locked position.
3.
Lock the unit in place using the thumb screws located on the front of the unit.
2.7. Attaching the Bezel
The bezel is designed to attach easily to the front of the DGX-1.
1.
Prepare the DGX-1 by making sure that the power supply handles (located at the
power supply fans) are flipped up.
2.
Move any other obstructions, such as cable ties, away from the outer edge of the
DGX-1.
3.
With the bezel positioned so that the NVIDIA logo is visible from the front and is on
the left-hand side, line up the pins near the corners of the DGX-1 with the holes in
the back of the bezel, then gently press the bezel against the DGX-1.
CAUTION: Be careful not to accidentally press the power button that is on the
right edge of the DGX-1 when removing or installing the bezel.
The bezel is held in place magnetically.
2.8. Connecting the Power Cables
1.
Open the accessory box and remove the four C13/C14 power cables.
2.
Use the cables to connect each of the four plugs at the right-rear of the DGX-1 to a
PDU.
a) Secure each cable to the DGX-1, using the power cable retention clips attached to
the power plugs.
b) Connect each cable to the PDU.
Ensure that the cables are distributed over at least two circuits and, if using three-phase
PDUs, that they are balanced across all phases as much as possible. Ideally,
each cable should connect to a different PDU.
c) Verify that each cable is firmly inserted into the PDU.
There is usually a click to indicate full insertion.
2.9. Connecting the Network Cables
1.
Using an Ethernet cable, connect one of the dual Ethernet ports (em1 or em2) to your
LAN for internet access to the NVIDIA Cloud Portal, remote access to launched
application containers on the DGX-1, or to connect to the DGX-1 using SSH.
The left-side/right-side Ethernet port designation depends on the Base OS software
version installed on the DGX-1, as listed in the table below.
Ethernet Port Position | Port Designation: Base OS Software 2.x and earlier | Port Designation: Base OS Software 3.x and later
Right Side | em1 | enp1s0f0
Left Side | em2 | enp1s0f1
NVIDIA recommends connecting only one of the Ethernet ports to your LAN. If
you connect both Ethernet ports, they must each be connected to separate
networks; the DGX-1 is not configured from the factory to have multiple Ethernet
interfaces on the same network.
2.
Using an Ethernet cable, connect the IPMI (BMC) port to your LAN for remote
access to the baseboard management controller (BMC).
Verify that all network cables are firmly inserted into the DGX-1 and the associated
network switch.
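The port designation table above can also be applied in a script, for example to check the link on the port you cabled. This is a minimal sketch; dgx_port_name and dgx_link_state are hypothetical helper names, and the link check assumes the standard Linux sysfs attribute /sys/class/net/<interface>/operstate.

```shell
# Hypothetical helper: print the Ethernet interface name for a rear-panel
# position ("right" or "left"), given the DGX OS Server major version,
# following the port designation table above.
dgx_port_name() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;   # major version must be a number
    esac
    if [ "$1" -le 2 ]; then
        # Base OS software 2.x and earlier
        case $2 in
            right) echo em1 ;;
            left)  echo em2 ;;
            *)     return 1 ;;
        esac
    else
        # Base OS software 3.x and later
        case $2 in
            right) echo enp1s0f0 ;;
            left)  echo enp1s0f1 ;;
            *)     return 1 ;;
        esac
    fi
}

# Hypothetical helper: print the kernel's operational state ("up",
# "down", ...) for an interface, read from sysfs.
dgx_link_state() {
    cat "/sys/class/net/$1/operstate"
}
```

For example, on a system running Base OS software 3.x, dgx_link_state "$(dgx_port_name 3 right)" should print up once the cable and switch port are active.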
2.10. Setting Up the DGX-1
These instructions describe the setup process that occurs the first time the DGX-1 is
powered on after delivery. Be prepared to accept all EULAs and to set up your username
and password.
1.
Connect a display to the VGA connector, and a keyboard to any of the USB ports.
For best display results, use a monitor with a native resolution of 1024x768 or lower.
2.
Power on the DGX-1.
The system will take a few minutes to boot.
You may be presented with end user license agreements (EULAs) for the NVIDIA
software at this point in the setup, depending on the DGX-1 software version.
Accept all EULAs to proceed with the installation.
You are prompted to configure the DGX-1 software.
3.
Perform the steps to configure the DGX-1 software.
‣ Select your time zone and keyboard layout.
‣ Create a user account with your name, username, and password.
  You will need these credentials to log in to the DGX-1 as well as to log in to the
  BMC remotely. When logging in to the BMC, enter your username for both the
  User ID and the password. Be sure to create a unique BMC password at
  the first opportunity.
  The BMC software will not accept "sysadmin" for a user name. If you create
  this user name for the system login, "sysadmin" will not be available for
  logging in to the BMC.
‣ Choose a primary network interface for the DGX-1.
  After you select the primary network interface, the system attempts to
  configure the interface for DHCP and then asks you to enter a hostname for
  the system. If DHCP is not available, you will have the option to configure
  the network manually. If you need to configure a static IP address on a
  network interface connected to a DHCP network, select Cancel at the
  "Network configuration – Please enter the hostname for the system" screen.
  The system will then present a screen with the option to configure the
  network manually.
‣ Choose a host name for the DGX-1.
‣ Choose to install predefined software.
  Press the space bar to select or deselect the software to install.
  By default, the DGX-1 installs only the minimal software packages necessary
  to ensure system functionality. You can deselect the OpenSSH package;
  however, NVIDIA recommends that you keep this package selected, and
  uninstall it only if required by your IT security policy.
4.
Select OK to continue.
You may be presented with end user license agreements (EULAs) for the NVIDIA
software at this point in the setup, depending on the DGX-1 software version.
Accept all EULAs to complete the installation.
The system completes the installation, reboots, then presents the system login
prompt:
<hostname> login:
Password:
5.
Log in.
Refer to the DGX OS Server release notes for information on available over-the-network
software updates.
2.11. Post Setup Instructions for DGX OS Server Software Version 2.x and Earlier
These instructions apply if your DGX-1 is installed with software version 2.x or earlier.
To determine the DGX OS Server software version on your system, enter the following
command.
$ grep VERSION /etc/dgx-release
DGX_SWBUILD_VERSION="3.1.1"
1.
If your network is configured for DHCP, then make sure that dynamic DNS updates
are enabled.
Check whether /etc/resolv.conf is a link to /run/resolvconf/resolv.conf.
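The resolv.conf check can be scripted as follows. This is a sketch assuming GNU readlink (with -f) is available, as it is on Ubuntu; resolv_is_linked is a hypothetical helper name. Both paths are canonicalized before comparing, so a chain of symbolic links still matches.

```shell
# Hypothetical helper: succeed if FILE (default /etc/resolv.conf) is a
# symbolic link that resolves to TARGET (default
# /run/resolvconf/resolv.conf).
resolv_is_linked() {
    local file="${1:-/etc/resolv.conf}"
    local target="${2:-/run/resolvconf/resolv.conf}"
    # -L: FILE itself must be a symbolic link, not a regular file.
    [ -L "$file" ] || return 1
    # readlink -f follows every link in the chain to the final path.
    [ "$(readlink -f "$file")" = "$(readlink -f "$target")" ]
}
```

For example: $ resolv_is_linked && echo linked || echo "not linked"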
c) Repeat step 2 to confirm that the nvidia-peer-memory module has been added.
Chapter 3.
PREPARING FOR USING DOCKER
CONTAINERS
This chapter presents an overview of the prerequisites for accessing NVIDIA Docker
containers from the Docker command line for use on the NVIDIA® DGX-1™ in base
OS mode. These containers include NVIDIA DGX-1 specific software to ensure the
best performance for your applications. Using these containers as a basis for your
applications should provide the best single-GPU performance and multi-GPU scaling.
‣ Installing Docker and NVIDIA Docker on DGX OS Server Software 2.x or Earlier
‣ Configuring Docker IP Addresses
‣ Letting Users Issue Docker Commands
‣ Configuring a System Proxy
‣ Configuring NFS Mount and Cache
3.1. Installing Docker and NVIDIA Docker on DGX OS Server Software 2.x or Earlier
To enable portability in Docker images that leverage GPUs, NVIDIA® developed
nvidia-docker, an open-source project that provides a command line tool to mount
the user mode components of the NVIDIA driver and the GPUs into the Docker
container at launch.
Starting with DGX OS Server software version 3.1.1, Docker and nvidia-docker are part
of the base software installation, and you do not need to perform the steps in this section.
However, if your DGX-1 is installed with software version 2.x or earlier, then follow
these instructions to install Docker and nvidia-docker on the system.
To determine the DGX OS Server software version on your system, enter the following
command.
$ grep VERSION /etc/dgx-release
DGX_SWBUILD_VERSION="3.1.1"
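The version check can be automated when the same decision has to be made on several systems. This is a minimal sketch, assuming /etc/dgx-release contains a DGX_SWBUILD_VERSION="..." line like the example output above; dgx_sw_version and needs_manual_docker_install are hypothetical helper names.

```shell
# Hypothetical helper: print the value of DGX_SWBUILD_VERSION, e.g. 3.1.1,
# from FILE (default /etc/dgx-release).
dgx_sw_version() {
    sed -n 's/^DGX_SWBUILD_VERSION="\([^"]*\)".*/\1/p' "${1:-/etc/dgx-release}"
}

# Hypothetical helper. Exit status:
#   0 - major version is 2 or lower (follow the steps in this section)
#   1 - major version is 3 or higher (this section does not apply)
#   2 - no usable version could be read
needs_manual_docker_install() {
    local version major
    version=$(dgx_sw_version "$1")
    [ -n "$version" ] || return 2
    major=${version%%.*}          # "3.1.1" -> "3"
    case $major in
        ''|*[!0-9]*) return 2 ;;
    esac
    [ "$major" -le 2 ]
}
```

Called with no argument, needs_manual_docker_install reads /etc/dgx-release.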
Preparing for Using Docker Containers
Ensure your environment meets the prerequisites before installing Docker. For more
information, see Getting Started with Docker.
1.
Install Docker.
$ sudo apt-key adv --keyserver hkp://p80.pool.sks-keyservers.net:80 \
    --recv-keys 58118E89F3A912897C070ADBF76221572C52609D
$ echo "deb https://apt.dockerproject.org/repo ubuntu-trusty main" \
    | sudo tee /etc/apt/sources.list.d/docker.list
2.
Edit the /etc/default/docker file to use the overlay2 storage driver.
a) Open the /etc/default/docker file for editing.
$ sudo vi /etc/default/docker
b) Add the following line:
DOCKER_OPTS="--storage-driver=overlay2"
If there is already a DOCKER_OPTS line, then add the parameters (text between
the quote marks) to the DOCKER_OPTS environment variable.
c) Save and close the /etc/default/docker file when done.
d) Restart Docker with the new configuration.
$ sudo service docker restart
3.
Install NVIDIA Docker.
The following example installs both nvidia-docker and the nvidia-docker-plugin.
3.2. Configuring Docker IP Addresses
To ensure that the DGX-1 can access the network interfaces for nvidia-docker containers,
the nvidia-docker containers should be configured to use a subnet distinct from other
network resources used by the DGX-1.
By default, Docker uses the 172.17.0.0/16 subnet. Consult your network
administrator to find out which IP addresses are used by your network. If your network
does not conflict with the default Docker IP address range, then no changes are needed
and you can skip this section.
However, if your network uses addresses within this range for the DGX-1,
you should change the default nvidia-docker network addresses. The method for
accomplishing this depends on the Base OS software version installed on the DGX-1.
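To check a candidate range against the Docker default (172.17.0.0/16) before editing any configuration, the comparison can be done with integer arithmetic on IPv4 addresses. This is a minimal sketch; ip_to_int, ip_in_cidr, and cidrs_overlap are hypothetical helper names, and the code assumes well-formed dotted-quad IPv4 addresses without leading zeros (a leading zero would be read as octal by shell arithmetic).

```shell
# Hypothetical helper: convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
    local a b c d rest
    a=${1%%.*};    rest=${1#*.}
    b=${rest%%.*}; rest=${rest#*.}
    c=${rest%%.*}; d=${rest#*.}
    echo $(( (a << 24) + (b << 16) + (c << 8) + d ))
}

# Hypothetical helper: succeed if address $1 lies inside CIDR block $2.
ip_in_cidr() {
    local bits="${2#*/}" ip net mask
    ip=$(ip_to_int "$1")
    net=$(ip_to_int "${2%/*}")
    # Netmask with the top $bits bits set, truncated to 32 bits.
    mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
    [ $(( ip & mask )) -eq $(( net & mask )) ]
}

# Hypothetical helper: succeed if two CIDR blocks share any address.
# Two blocks overlap exactly when they agree on the shorter prefix.
cidrs_overlap() {
    local a_bits="${1#*/}" b_bits="${2#*/}" bits
    bits=$a_bits
    if [ "$b_bits" -lt "$bits" ]; then bits=$b_bits; fi
    ip_in_cidr "${1%/*}" "${2%/*}/$bits"
}
```

For example, cidrs_overlap 192.168.0.0/24 172.17.0.0/16 fails (no conflict), while cidrs_overlap 172.16.0.0/12 172.17.0.0/16 succeeds, meaning a network that uses 172.16.0.0/12 would conflict with the Docker default.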
1.
If you don't know the Base OS software version installed on the DGX-1, then enter
the following and inspect the VERSION entry.
$ grep VERSION /etc/dgx-release
2.
Follow the instructions in the section appropriate for the software version installed.
‣ Configuring Docker IP Addresses for DGX OS Server Software Version 2.x and Earlier
‣ Configuring Docker IP Addresses for DGX OS Server Software Version 3.1.1 and Later
3.2.1. Configuring Docker IP Addresses for DGX OS Server Software Version 2.x and Earlier
1.
Open the /etc/default/docker file for editing.
$ sudo vi /etc/default/docker
2.
Modify the /etc/default/docker file, specifying the correct bridge IP address
and IP address ranges for your network. Consult your IT administrator for the
correct addresses.
For example, if your DNS server exists at IP address 10.10.254.254, and the
192.168.0.0/24 subnet is not otherwise needed by the DGX-1, you can add the
If there is already a DOCKER_OPTS line, then add the parameters (text between the
quote marks) to the DOCKER_OPTS environment variable.
3.
Save and close the /etc/default/docker file when done.
4.
Restart Docker with the new configuration.
$ sudo service docker restart
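Both /etc/default/docker edits in this chapter follow the same rule: if a DOCKER_OPTS line already exists, the new parameters go inside its quotes; otherwise a new line is added. The following is a minimal sketch of that rule, assuming GNU sed (for -i) and parameters that contain no | or & characters; add_docker_opts is a hypothetical helper name. It only touches an uncommented DOCKER_OPTS= line and skips parameters that are already present.

```shell
# Hypothetical helper: merge OPTS ($2) into the DOCKER_OPTS variable in
# FILE ($1), following the rule described above. Requires GNU sed (-i).
add_docker_opts() {
    local file="$1" opts="$2"
    if grep -q '^DOCKER_OPTS=' "$file"; then
        # Already present on the active DOCKER_OPTS line: nothing to do.
        if grep '^DOCKER_OPTS=' "$file" | grep -qF -- "$opts"; then
            return 0
        fi
        # Append OPTS just before the closing quote of the existing value.
        sed -i "s|^DOCKER_OPTS=\"\(.*\)\"|DOCKER_OPTS=\"\1 $opts\"|" "$file"
    else
        # No active DOCKER_OPTS line (commented examples are ignored).
        printf 'DOCKER_OPTS="%s"\n' "$opts" >> "$file"
    fi
}
```

For example, in a root shell (such as one opened with sudo -i): add_docker_opts /etc/default/docker "--storage-driver=overlay2", then restart Docker with service docker restart.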
3.2.2. Configuring Docker IP Addresses for DGX OS Server Software Version 3.1.1 and Later
You can change the default Docker network addresses by modifying either the
/etc/docker/daemon.json file or the
/etc/systemd/system/docker.service.d/docker-override.conf file. These instructions
provide an example of modifying
/etc/systemd/system/docker.service.d/docker-override.conf to override the default
nvidia-docker network addresses.
Make the changes indicated in bold below, setting the correct bridge IP address and
IP address ranges for your network. Consult your IT administrator for the correct
addresses.
Save and close the /etc/systemd/system/docker.service.d/docker-override.conf
file when done.
3.
Reload the systemctl daemon.
$ sudo systemctl daemon-reload
4.
Restart Docker.
$ sudo systemctl restart docker
3.3. Letting Users Issue Docker Commands
To prevent the docker daemon from running without protection against escalation of
privileges, the NVIDIA Docker software requires sudo privileges to run containers.
You can grant the required privileges to users who will run containers on the DGX-1 in
one of the following ways:
‣ Add each user as an administrator user with sudo privileges.
‣ Add each user as a standard user without sudo privileges and then add the user to the docker group.
This section provides instructions for adding users to the docker group.
WARNING: Only add users whom you would trust with root privileges to the docker
group. These instructions make it more convenient for users to access Docker
containers; however, the resulting docker group is equivalent to the root user,
because once a user is able to send commands to the Docker engine, they are able to
escalate privilege and run root-level operations. This may violate your organization's
security policies. See the Docker Daemon Attack Surface for information on how this
can impact security in your system. Always consult your IT department to make sure
the installation is in accordance with the security policies of your data center.
The commands in this section require sudo access, and should be performed by a
system administrator.
3.3.1. Checking if a User is in the Docker Group
To check whether a user is already part of the docker group, enter the following:
$ groups username
The output shows all the groups of which that user is a member. If docker is not listed,
then add that user.
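For scripts that need the same check, here is a sketch using id -nG, which prints a user's group names separated by spaces; has_group and in_docker_group are hypothetical helper names. Matching whole words avoids false positives from similarly named groups (for example, a hypothetical dockerusers group).

```shell
# Hypothetical helper: succeed if GROUP ($2) appears as a whole word in
# the space-separated group list $1.
has_group() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Hypothetical helper: succeed if USER ($1) is a member of the docker
# group. Fails for unknown users because id exits with a non-zero status.
in_docker_group() {
    local groups_list
    groups_list=$(id -nG "$1" 2>/dev/null) || return 1
    has_group "$groups_list" docker
}
```

For example, with a hypothetical user alice: $ in_docker_group alice || sudo usermod -a -G docker alice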
3.3.2. Creating a User
To create a new user in order to add them to the docker group, perform the following:
1.
Add the user.
$ sudo useradd username
2.
Set up the password.
$ sudo passwd username
Enter a password at the prompts:
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
3.3.3. Adding a User to the Docker Group
For each user you want to add to the docker group, enter the following command:
$ sudo usermod -a -G docker username
3.4. Configuring a System Proxy
If you will be using the DGX-1 in base OS mode, and your network requires use of a
proxy, then edit the file /etc/apt/apt.conf.d/proxy.conf and make sure the following lines
are present, using the parameters that apply to your network:
This is to ensure that Docker is able to access the DGX-1 Container Registry through the
proxy. For best practice recommendations on configuring proxies for Docker, see
https://docs.docker.com/engine/admin/systemd/#http-proxy.
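The exact lines depend on your network, but as a sketch, apt reads its proxy from the standard Acquire::http::Proxy and Acquire::https::Proxy options. The helper apt_proxy_conf is hypothetical, and proxy.example.com:3128 in the usage line is a placeholder for your own proxy. The Docker daemon has its own proxy settings, covered by the Docker page linked above.

```shell
# Hypothetical helper: print apt proxy settings suitable for
# /etc/apt/apt.conf.d/proxy.conf. $1 is the HTTP proxy URL; $2 is an
# optional HTTPS proxy URL (defaults to $1).
apt_proxy_conf() {
    printf 'Acquire::http::Proxy "%s";\n' "$1"
    printf 'Acquire::https::Proxy "%s";\n' "${2:-$1}"
}
```

For example: $ apt_proxy_conf http://proxy.example.com:3128/ | sudo tee /etc/apt/apt.conf.d/proxy.conf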
3.5. Configuring NFS Mount and Cache
The DGX-1 includes four SSDs in a RAID 0 configuration. These SSDs are intended for
application caching, so you must set up your own NFS drives for long term data storage.
The following instructions describe how to mount the NFS onto the DGX-1, and how to
cache the NFS using the DGX-1 SSDs for improved performance.
Make sure your DGX-1 is set up in Base OS mode, that you have an NFS server with one
or more exports with data to be accessed by the DGX-1, and that there is network access
between the DGX-1 and the NFS server.
Skip this section if you are going to use the DGX-1 in cloud-managed mode. The
DGX-1 Cloud Services software will set up the NFS cache for you as part of the
cloud-managed mode configuration. Similarly, in cloud-managed mode, the person setting
up the job will specify any NFS mount requirements for the job at that time.
1.
Check if the cache daemon is installed and configured.
$ service cachefilesd status
If the output indicates that cachefilesd is disabled, continue with the following steps.
Otherwise, skip to step 7.
2.
Install the cache daemon.
$ sudo apt-get install cachefilesd
3.
Edit the cache daemon startup file.
$ sudo vi /etc/default/cachefilesd
Uncomment the "RUN=yes" line in the startup file and then save the file.
4.
Configure the cache daemon for the DGX-1.
a) Open the cache daemon configuration file.
$ sudo vi /etc/cachefilesd.conf
b) Edit the contents to match the following, then save the file.
dir /raid
tag dgx1cache
brun 25%
bcull 15%
bstop 5%
frun 10%
fcull 7%
fstop 3%
These settings are optimized for Deep Learning workloads, and provide the best
throughput for training from large datasets.
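Before starting the daemon, the limits can be sanity-checked. The cachefilesd.conf(5) manual page requires each group of limits to satisfy 0 <= stop < cull < run < 100 (for example, bstop < bcull < brun). This is a minimal sketch using awk; check_cachefilesd_conf is a hypothetical helper name, and a group that is not fully set in the file is skipped, because cachefilesd then uses its built-in defaults.

```shell
# Hypothetical helper: check the culling limits in FILE (default
# /etc/cachefilesd.conf). Prints a message and exits 1 if any fully
# specified group violates 0 <= stop < cull < run < 100.
check_cachefilesd_conf() {
    awk '
        # Record limits such as "brun 25%"; "25%" + 0 evaluates to 25.
        $1 ~ /^[bf](run|cull|stop)$/ { val[$1] = $2 + 0; seen[$1] = 1 }
        END {
            ok = 1
            split("b f", groups, " ")
            for (i = 1; i <= 2; i++) {
                g = groups[i]
                kr = g "run"; kc = g "cull"; ks = g "stop"
                # Skip a group unless all three of its limits are set.
                if (!((kr in seen) && (kc in seen) && (ks in seen)))
                    continue
                r = val[kr]; c = val[kc]; s = val[ks]
                if (!(0 <= s && s < c && c < r && r < 100)) {
                    printf "invalid %s limits: %s=%s %s=%s %s=%s\n", g, ks, s, kc, c, kr, r
                    ok = 0
                }
            }
            exit (ok ? 0 : 1)
        }' "${1:-/etc/cachefilesd.conf}"
}
```

Run check_cachefilesd_conf with no argument to check /etc/cachefilesd.conf; the settings shown above pass.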
5.
Start the cache daemon.
$ sudo service cachefilesd start
6.
Verify the cache daemon started properly.
$ service cachefilesd status
Expected output.
Checking status of FilesCache daemon cachefilesd
7.
Configure an NFS mount for the DGX-1.
a) Edit the filesystem tables configuration.
$ sudo vi /etc/fstab
b) Add a new line for the NFS mount, using the local mount point of /mnt.