BE STATE-OF-THE-ART
TRAINING
Maximize the performance of your parallel applications

CAPS Professional Services is dedicated to helping you get the most from your investment and succeed with your development projects. This set of courses, supplemented by practical exercises that test key points, lets participants gain the expertise needed to build high-performance applications.
For detailed information about each course, please read more below or contact training@caps-entreprise.com

Register now for a 3-day CUDA training session!
Abstract: In this 3-day CUDA training, participants will learn the CUDA programming & streaming model and how to handle multi-GPU applications. The training includes hands-on practical labs where participants will be able to progressively build streamed and multi-GPU applications.
Price: 1,500€ / participant (1,000€ / participant when joining only for the 'Advanced part', days 2 and 3)
Location: CAPS Headquarters - 4 Allée Marie Berhaut - 35000 Rennes, France
Download program

- Duration: 2 days
- Prerequisites: knowledge of C or Fortran and Linux; knowledge of CUDA is a plus
- Training limited to 14 attendees
Download datasheet
Objectives
In this 2-day HMPP training, participants will learn the HMPP programming model and techniques for achieving high performance. The training includes hands-on practical labs where participants will progressively program high-performance computations.
Training content
Day 1
Morning - CUDA Basics
- Introduction to GPU computing
- CUDA architecture and programming model
- CUDA API
- CUDA tools: debugging
- Introduction to CUDA performance
Afternoon - HMPP Basics
- Introduction to parallel hybrid programming
- HMPP overview
- Basics of HMPP programming
- HMPP compilation model
Day 2
Morning - Advanced HMPP programming
- Managing data transfers
- Grouping GPU computations
- Data movement optimizations
Afternoon - Optimizing CUDA code generation with HMPP
- Advanced CUDA performance
- Driving gridification
- Applying loop transformations: unrolling, splitting, jamming

- Duration: 3 days
- Prerequisites: knowledge of C and CUDA basics; knowledge of OpenGL is a plus
- Training limited to 14 attendees
Download datasheet
Objectives
In this 3-day CUDA training, participants will learn how to visualize their physical unit. The training includes hands-on practical labs where participants will progressively build a graphic view of their results.
Training content
Day 1
Morning - CUDA Basics
- Introduction to GPU computing
- CUDA architecture and programming model
- CUDA API
- CUDA debugging
- Lab session: getting familiar with the CUDA device
- Lab session: programming a basic addition
Afternoon - CUDA Kernel performance (1/2)
- Using a 2D CUDA grid for large computations
- Lab session: programming a 2D CUDA grid computation
- CUDA warps
- Data alignment & coalescing
- Lab session: ensuring coalescing
Day 2
Morning - CUDA Kernel performance (2/2)
- Texture memory & constant memory
- Shared memory
- Lab session: use CUDA shared memory as a cache
- Maximizing occupancy
- Interpreting profiler counters
- CUDA performance tools: Visual Profiler, ...
Afternoon - HMPP Basics
- Introduction to parallel hybrid programming
- HMPP overview
- Lab session: HMPP Hello World
- Basics of HMPP programming
- Lab session: offloading a computation into a GPU
- HMPP compilation model
- Lab session: compiling an HMPP application
Day 3
Morning - HMPP transfer optimization
- Managing data transfers
- Lab session: programming data transfers
- Grouping GPU computations
- Optimizing data movement
Afternoon - Optimizing GPU code generation with HMPP
- Advanced kernel performance
- Driving the code generation and gridification
- Automatic loop transformations: unrolling, splitting, jamming, ...
- Lab session: optimizing Sgemm code generation

- Duration: 2 days
- Prerequisites: knowledge of C
- Training limited to 14 attendees
Download datasheet
Objectives
In this 2-day training, participants will learn the fundamentals of parallel programming. An overview of OpenMP and MPI is provided, with hands-on practical labs where participants will progressively come to understand the concepts of parallel programming.
Training content
Day 1
- Parallel architectures and programming models (talk)
- OpenMP (talks & practicals)
Day 2
- Introduction to MPI
- Point-to-point communication (+exercises)
- Non-blocking communication (+exercises)
- Collective communication (+exercises)
- Other MPI-1 and MPI-2 features (talk)
- Hybrid programming (talk)
- Tools & libraries: debugging, performance analysis, other libraries (short talk)

- Duration: 2 days
- Prerequisites: knowledge of C
- Training limited to 14 attendees
Download datasheet
Objectives
In this 2-day CUDA training, participants will learn the CUDA programming model and techniques for achieving high performance. The training includes hands-on practical labs where participants will progressively program high-performance computations.
Training content
Day 1
Morning - CUDA Basics
- Introduction to GPU computing
- CUDA architecture and programming model
- CUDA API
- CUDA debugging
Afternoon - CUDA kernel performance
- CUDA warps
- Data alignment & coalescing
Day 2
Morning - CUDA kernel performance
- Texture memory & constant memory
- Shared memory
Afternoon - CUDA grids & kernels optimization
- Maximizing occupancy
- Interpreting profiler counters
- CUDA performance tools: Visual Profiler, ...

- Duration: 3 days
- Prerequisites: knowledge of C and CUDA basics
- Training limited to 14 attendees
Download datasheet
Objectives
In this 3-day CUDA training, participants will learn the CUDA programming & streaming model and how to handle multi-GPU applications. The training includes hands-on practical labs where participants will progressively build streamed and multi-GPU applications.
Training content
Day 1
Morning - CUDA Basics
- Introduction to GPU computing
- CUDA architecture and programming model
- CUDA API
- CUDA debugging
Afternoon - CUDA Kernel performance
- CUDA warps
- Data alignment & coalescing
Day 2
Morning - CUDA Kernel performance
- Texture memory & constant memory
- Shared memory
Afternoon - CUDA grids & kernels optimization
- Maximizing occupancy
- Interpreting profiler counters
- CUDA performance tools: Visual Profiler, ...
Day 3
Morning - CUDA streaming
- Asynchronism & overlapping
- CUDA streams
Afternoon - CUDA with MPI
- MPI introduction
- Mixing CUDA and MPI
- Multi-GPU computing with CUDA and MPI