Introduction to OpenMP Offloading with AMD GPUs

All communication will be done through Zoom, Slack and email.

OpenMP is one of the major options for using GPUs to accelerate computations on today's heterogeneous computer systems. This course will give an introduction to the AMD Instinct™ GPU architecture to lay the foundations of how GPUs work and how they can be used as offload targets in OpenMP. New features of recent OpenMP versions and GPUs, such as the unified memory programming model, will be introduced; this model makes writing HPC applications much easier across a wide range of GPU programming models. In addition, tools for performance analysis and optimization will be introduced.

This workshop is ideal for developers and researchers looking to deepen their expertise in parallel computing and performance optimization. Join us to unlock the full potential of your computing resources!

In this course, participants will

  • Gain foundational knowledge about Graphics Processing Units (GPUs) and Accelerated Processing Units (APUs), and their roles in high-performance computing.
  • Learn how to utilize OpenMP offloading with unified shared memory to simplify data management and improve performance.
  • Explore techniques for explicit data management in OpenMP offloading, enabling more control over data movement and optimization.
  • Understand the principles and benefits of asynchronous offloading to enhance computational efficiency and overlap computation with data transfer.
  • Discover various tools and methodologies for analyzing and optimizing the performance of your applications.
  • Apply your knowledge in a practical session where you’ll port a small application, reinforcing the concepts learned throughout the workshop.

Location

Online course
Organizer: HLRS, University of Stuttgart, Germany

Start date

Oct 22, 2024
09:00

End date

Oct 22, 2024
15:30

Language

English

Entry level

Intermediate

Course subject areas

Hardware Accelerators

Parallel Programming

Performance Optimization & Debugging

Topics

Code Optimization

GPU Programming

MPI+OpenMP

OpenMP


Prerequisites and content levels

Prerequisites

Basic experience in OpenMP programming, e.g., gained by attending the Parallel Programming Workshop. Participants should have an application developer's general knowledge of computer hardware and operating systems, and be familiar with C/C++ or Fortran.

See also the suggested prereading below (resources and public videos).

Content levels

Basic: 2 hours
Intermediate: 2.5 hours
Advanced: 1 hour

Learn more about course curricula and content levels

Instructors

Michael Klemm, Paul Bauer, Luka Stanisic, Johanna Potyka, Igor Pasichnyk, and Bob Robey (AMD).

Agenda (preliminary)

All times are CEST.

08:45 - 09:00 Drop in to Zoom

09:00 - 15:30 Lectures and exercises on the following topics

  • Introduction by HLRS and AMD
  • Introduction to GPU and APU
  • Introduction to OpenMP offload using unified shared memory
  • Introduction to OpenMP offload with explicit data management
  • Asynchronous offloading
  • Tools for performance analysis and optimization
  • Hands-on with porting a small app

Registration information

Register via the button at the top of this page (will be available soon).

Registration closes on October 7, 2024.

Fees

This course is free of charge.

Resources for additional reading

  • Book on HIP programming and porting CUDA
    • Accelerated Computing with HIP, Yifan Sun, Trinayan Baruah, and David R. Kaeli,
      ISBN-13: 979-8218107444
  • Book on OpenMP GPU programming
    • Programming Your GPU with OpenMP, Tom Deakin and Tim Mattson,
      ISBN-13: 978-0262547536
  • Book on parallel and high performance computing topics
    • Parallel and High Performance Computing, Robert Robey and Yuliana Zamora, Manning Publications,
      ISBN-13: 978-1617296468
  • ENCCS resources
  • AMD Lab Notes series on GPUOpen.com

    • Finite difference method - Laplacian part 1
    • Finite difference method - Laplacian part 2
    • Finite difference method - Laplacian part 3
    • Finite difference method - Laplacian part 4
    • AMD matrix cores
    • Introduction to profiling tools for AMD hardware
    • AMD ROCm™ installation
    • AMD Instinct™ MI200 GPU memory space overview 
    • Register pressure in AMD CDNA2™ GPUs
    • GPU-Aware MPI with ROCm
    • Creating a PyTorch/TensorFlow Code Environment on AMD GPUs
    • Jacobi Solver with HIP and OpenMP offloading
    • Sparse matrix vector multiplication - part 1
  • Quick start guides at Oak Ridge National Laboratory

Contact

Khatuna Kakhiani phone 0711 685 65796, training(at)hlrs.de
Tobias Haas phone 0711 685 87223, training(at)hlrs.de

HLRS Training Collaborations in HPC

HLRS is part of the Gauss Centre for Supercomputing (GCS), together with JSC in Jülich and LRZ in Garching near Munich. EuroCC@GCS is the German National Competence Centre (NCC) for High-Performance Computing. HLRS is also a member of the Baden-Württemberg initiative bwHPC.

Further courses

See the training overview and the Supercomputing Academy pages.
