Nvidia Data Parallelism: How to Train Deep Learning Models on Multiple GPUs (DPHTDLM)
Classroom training | German | Level
Training duration: 1 day
Objectives
This workshop teaches you techniques for data-parallel deep learning training on multiple GPUs to shorten the training time required for data-intensive applications. Working with deep learning tools, frameworks, and workflows to perform neural network training, you’ll learn how to decrease model training time by distributing data to multiple GPUs, while retaining the accuracy of training on a single GPU.
Please note that once a booking has been confirmed, it is non-refundable. This means that after you have confirmed your seat for an event, it cannot be cancelled and no refund will be issued, regardless of attendance.
Target audience
This course is designed for machine learning engineers, deep learning practitioners, and data scientists who want to optimize and scale training workloads across multiple GPUs. It is ideal for professionals working in AI research, computer vision, natural language processing, and high-performance computing (HPC) who seek to improve model training efficiency using PyTorch Distributed Data Parallel (DDP). Participants should have experience with Python, deep learning frameworks (especially PyTorch), and basic knowledge of GPU acceleration.
Prerequisites
- Experience with deep learning training using Python
Agenda
Stochastic Gradient Descent and the Effects of Batch Size
- Learn the significance of stochastic gradient descent when training on multiple GPUs.
- Understand the issues with sequential single-thread data processing and the theory behind speeding up applications with parallel processing.
- Understand loss function, gradient descent, and stochastic gradient descent (SGD).
- Understand the effect of batch size on accuracy and training time with an eye towards its use on multi-GPU systems.
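As a rough illustration of the batch-size trade-off named above, the following sketch runs one epoch of mini-batch SGD on a toy one-parameter model in plain Python. The function name `sgd_epoch`, the toy data, and the learning rate are all assumptions made for this example and are not taken from the workshop material. It shows that a larger batch means fewer parameter updates per epoch, which is one reason batch size interacts with both training time and accuracy.

```python
import random

def sgd_epoch(w, data, batch_size, lr):
    """Run one epoch of mini-batch SGD on the model y = w * x with squared error.

    Returns the updated weight and the number of parameter updates performed.
    """
    random.shuffle(data)  # shuffle in place so batches differ between epochs
    updates = 0
    for start in range(0, len(data), batch_size):
        batch = data[start:start + batch_size]
        # Mean gradient of 0.5 * (w*x - y)^2 with respect to w over the batch.
        grad = sum((w * x - y) * x for x, y in batch) / len(batch)
        w -= lr * grad
        updates += 1
    return w, updates

random.seed(0)
# Hypothetical toy data: y = 3x plus a little noise.
xs = [random.uniform(-1.0, 1.0) for _ in range(256)]
data = [(x, 3.0 * x + random.gauss(0.0, 0.01)) for x in xs]

for batch_size in (8, 64, 256):
    w, updates = sgd_epoch(0.0, list(data), batch_size, lr=0.1)
    print(f"batch_size={batch_size}  updates={updates}  w after one epoch={w:.3f}")
```

With the learning rate held fixed, the small-batch run takes many more steps toward the true weight within one epoch than the full-batch run, which takes only one.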
Training on Multiple GPUs with PyTorch Distributed Data Parallel (DDP)
- Learn to convert single-GPU training to multi-GPU training using PyTorch Distributed Data Parallel.
- Understand how DDP coordinates training among multiple GPUs.
- Refactor single-GPU training programs to run on multiple GPUs with DDP.
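The coordination that data-parallel training relies on can be sketched without any GPU: each worker computes a gradient on its own shard of the batch, the gradients are averaged across workers, and every replica applies the same update. The code below is a plain-Python simulation under that assumption, not the PyTorch API; the helper names (`local_gradient`, `all_reduce_mean`, `data_parallel_step`) and the toy batch are hypothetical. In PyTorch itself, `torch.nn.parallel.DistributedDataParallel` performs the gradient averaging across processes during the backward pass.

```python
def local_gradient(w, shard):
    """Mean gradient of 0.5 * (w*x - y)^2 over one worker's shard."""
    return sum((w * x - y) * x for x, y in shard) / len(shard)

def all_reduce_mean(values):
    """Stand-in for an all-reduce: every worker receives the mean of all values."""
    mean = sum(values) / len(values)
    return [mean for _ in values]

def data_parallel_step(w, batch, world_size, lr):
    """One synchronous step: shard the batch, compute local gradients, average, update."""
    assert len(batch) % world_size == 0, "sketch assumes equal-sized shards"
    shard_size = len(batch) // world_size
    shards = [batch[r * shard_size:(r + 1) * shard_size] for r in range(world_size)]
    # On real hardware these local gradients are computed in parallel, one per GPU.
    grads = [local_gradient(w, shard) for shard in shards]
    synced = all_reduce_mean(grads)
    # Every replica applies the same averaged gradient, so the weights stay identical.
    return [w - lr * g for g in synced]

# Hypothetical toy batch of (x, y) pairs.
batch = [(0.5, 1.5), (-1.0, -3.0), (2.0, 6.5), (1.0, 2.5),
         (0.25, 1.0), (-0.5, -1.0), (1.5, 4.0), (-2.0, -6.0)]

replica_weights = data_parallel_step(w=0.0, batch=batch, world_size=4, lr=0.1)
single_device_weight = 0.0 - 0.1 * local_gradient(0.0, batch)
print(replica_weights, single_device_weight)
```

Because the shards are equal in size, the average of the per-shard mean gradients matches the mean gradient over the whole batch, so the simulated replicas end up with the same weight a single device would compute, up to floating-point rounding.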
Maintaining Model Accuracy when Scaling to Multiple GPUs
- Understand and apply key algorithmic considerations to retain accuracy when training on multiple GPUs.
- Understand what might cause accuracy to decrease when parallelizing training on multiple GPUs.
- Learn techniques for maintaining accuracy when scaling training to multiple GPUs.
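The agenda does not say which techniques the workshop covers. As one commonly cited heuristic, offered here only as an assumption, the learning rate can be scaled linearly with the number of GPUs (since the global batch grows by the same factor) and ramped up gradually over an initial warmup period. The function `scaled_learning_rate` and all of its parameter values below are hypothetical.

```python
def scaled_learning_rate(base_lr, world_size, step, warmup_steps):
    """Linear scaling rule with linear warmup (one heuristic among several)."""
    # The global batch is world_size times larger, so the target rate grows to match.
    target_lr = base_lr * world_size
    if warmup_steps > 0 and step < warmup_steps:
        # Ramp linearly from base_lr up to target_lr over the warmup period.
        fraction = step / warmup_steps
        return base_lr + fraction * (target_lr - base_lr)
    return target_lr

for step in (0, 50, 100, 500):
    lr = scaled_learning_rate(base_lr=0.1, world_size=8, step=step, warmup_steps=100)
    print(step, lr)
```

Starting at the single-GPU rate and only reaching the scaled rate after warmup is meant to avoid unstable early updates; whether it helps for a given model is something to verify empirically.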
Workshop Assessment
- Apply what you have learned during the workshop: complete the workshop assessment to earn a certificate of competency.
Final Review
- Review key learnings and answer any remaining questions.
- Take the workshop survey.