Dell EMC Isilon, PowerSwitch and NVIDIA DGX-2 Systems for Deep Learning | H18079
Table of Contents
Revisions
Executive summary
Audience
Introduction
Deep learning dataflow
Solution architecture
OVERVIEW
STORAGE: DELL EMC ISILON F800
Storage tiering
OneFS caching
Locks and concurrency
NETWORKING: DELL EMC POWERSWITCH S5232F-ON SWITCH
COMPUTE: NVIDIA DGX-2 SYSTEM
BILL OF MATERIALS
SOFTWARE VERSIONS
Deep learning training performance and analysis
BENCHMARK METHODOLOGY
BENCHMARK RESULTS
SYSTEM METRICS
MEASUREMENT OF NETWORK I/O BETWEEN DGX-2 SYSTEMS
UNDERSTANDING FILE CACHING
UNDERSTANDING THE TRAINING PIPELINE
NVIDIA COLLECTIVE COMMUNICATION LIBRARY (NCCL)
Storage-only performance
STORAGE NETWORK PERFORMANCE USING IPERF
STORAGE-ONLY PERFORMANCE USING FIO
STORAGE-ONLY PERFORMANCE USING TENSORFLOW
Solution sizing guidance
Conclusions
Acknowledgements
Appendix – System configuration
ISILON
Configuration
Configuring automatic storage tiering
Testing automatic storage tiering
DELL EMC POWERSWITCH S5232F-ON DATA SWITCHES
NVIDIA DGX-2 SYSTEM
INSTALL AI BENCHMARK UTILITIES
ISILON VOLUME MOUNTING
Appendix – Benchmark setup