Introduction to OpenMP Offloading with AMD GPUs

All communication will be done through Zoom, Slack and email.

OpenMP is one of the major options for using GPUs to accelerate computations through offloading on today's heterogeneous computer systems. This course gives an introduction to the AMD Instinct™ GPU and Accelerated Processing Unit (APU) architectures to lay the foundations of how GPUs work and how they can be used for offloading with OpenMP. New features of recent OpenMP versions and GPUs, such as the unified memory programming model, will be introduced; they make writing HPC applications much easier for a wide range of GPU programming models. In addition, tools for performance analysis and optimization will be presented.
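As a rough illustration of the unified memory programming model mentioned above, the following minimal sketch offloads a loop without any explicit data mapping. The function name `scale` is hypothetical and not taken from the course material. The sketch assumes a compiler and device that support the OpenMP `requires unified_shared_memory` directive; without OpenMP enabled, the pragmas are ignored and the loop simply runs on the host.

```c
#include <assert.h>

/* Assumption: compiler and hardware support unified shared memory,
   so host pointers can be dereferenced directly on the device. */
#pragma omp requires unified_shared_memory

/* Hypothetical helper: scales x in place, x[i] = a * x[i]. */
void scale(int n, double a, double *x)
{
    assert(n >= 0);

    /* No map clauses: the device works on the host allocation directly. */
    #pragma omp target teams distribute parallel for
    for (int i = 0; i < n; ++i)
        x[i] = a * x[i];
}
```

Because no data movement is spelled out, the offloaded loop looks almost identical to its CPU-only OpenMP counterpart, which is what makes this model attractive for a first port.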

This course targets beginners in GPU programming with basic knowledge of parallelization with OpenMP and/or MPI on CPUs. After this course, you will have learned the basics needed to confidently start porting your application from a CPU-only system to systems with discrete GPU accelerators or APUs.

In this course, participants will:

  • Gain foundational knowledge about GPUs and APUs, and their roles in high-performance computing.
  • Learn how to utilize OpenMP offloading with unified shared memory to simplify data management and improve performance.
  • Explore techniques for explicit data management in OpenMP offloading, enabling more control over data movement and optimization.
  • Understand the principles and benefits of asynchronous offloading to enhance computational efficiency and overlap computation with data transfer.
  • Discover various tools and methodologies for analyzing and optimizing the performance of your applications.
  • Apply your knowledge in a practical session where you’ll port a small application, reinforcing the concepts learned throughout the workshop.
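The explicit data management listed among the goals above can be sketched as follows. This is an illustrative example under stated assumptions, not course material: the function names `saxpy_explicit` and `saxpy_repeated` are hypothetical. The first function maps its arrays for a single kernel; the second keeps them resident on the device across several kernels with a `target data` region, so that the data is transferred only once in each direction.

```c
#include <assert.h>

/* Hypothetical helper: y[i] = a * x[i] + y[i], with explicit maps
   for a single offloaded kernel. */
void saxpy_explicit(int n, float a, const float *x, float *y)
{
    assert(n >= 0);

    /* x is only read on the device (to); y is read and written (tofrom). */
    #pragma omp target teams distribute parallel for \
        map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* Hypothetical helper: applies the same update `steps` times while the
   arrays stay on the device for the whole sequence of kernels. */
void saxpy_repeated(int n, float a, const float *x, float *y, int steps)
{
    assert(n >= 0 && steps >= 0);

    /* Copy x and y to the device once; copy y back once at the end. */
    #pragma omp target data map(to: x[0:n]) map(tofrom: y[0:n])
    {
        for (int s = 0; s < steps; ++s) {
            /* The arrays are already present, so no transfer happens here. */
            #pragma omp target teams distribute parallel for
            for (int i = 0; i < n; ++i)
                y[i] = a * x[i] + y[i];
        }
    }
}
```

On a system with discrete GPUs, the second variant typically matters most when a kernel is launched many times on the same data, because repeated host-device copies can otherwise dominate the run time.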

Location

Online course
Organizer: HLRS, University of Stuttgart, Germany

Start

Oct 22, 2024
09:00

End

Oct 22, 2024
15:30

Language

English

Entry level

Basic

Topic areas

Bootcamp/Hackathon

Topics

Code optimization

GPU programming

MPI+OpenMP

OpenMP


Prerequisites and content levels

Prerequisites

Basic experience in OpenMP programming, e.g. by attending the Parallel Programming Workshop. Participants should have an application developer's general knowledge of computer hardware and operating systems, and be familiar with C/C++ or Fortran.

See also the suggested prereading below (resources and public videos).

Content levels

Basic: 2 hours
Intermediate: 2.5 hours
Advanced: 1 hour

Learn more about course curricula and content levels

Instructors

Michael Klemm, Paul Bauer, Luka Stanisic, Johanna Potyka, Igor Pasichnyk, and Bob Robey (AMD).

Agenda (updated October 21)

All times are CEST.

  • 08:45 - 09:00 Drop in to Zoom
  • 09:00 - 11:45 Introduction to OpenMP offload with and without unified shared memory (with exercises)
  • 11:45 - 12:45 Lunch break
  • 12:45 - 15:30 Real-world OpenMP porting: app porting examples and tools (with exercises)

Lectures and exercises will cover the following topics:

  • Introduction to GPU and APU
  • OpenMP offload using unified shared memory
  • OpenMP offload with explicit data management
  • Asynchronous offloading
  • Tools for performance analysis and optimizations
  • Hands-on with porting a small app
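The asynchronous offloading topic in the list above can be illustrated with a minimal sketch. The function name `square_async_and_sum` is hypothetical and chosen only for this example. It assumes the `nowait` clause on a `target` construct, which turns the offloaded region into a deferred task so that the host may continue with independent work; whether the two actually overlap depends on the OpenMP runtime and the hardware.

```c
#include <assert.h>

/* Hypothetical helper: squares a[] on the device while the host sums b[].
   Returns the sum of b; a[] holds the squared values on return. */
double square_async_and_sum(int n, double *a, const double *b)
{
    assert(n >= 0);
    double host_sum = 0.0;

    /* nowait makes the target region a deferred task, so the
       encountering host thread does not have to wait for the kernel. */
    #pragma omp target teams distribute parallel for nowait \
        map(tofrom: a[0:n])
    for (int i = 0; i < n; ++i)
        a[i] = a[i] * a[i];

    /* Independent host work that may overlap with the device kernel.
       It must not touch a[] until the target task has completed. */
    for (int i = 0; i < n; ++i)
        host_sum += b[i];

    /* Wait for the deferred target task before a[] is used on the host. */
    #pragma omp taskwait

    return host_sum;
}
```

The key design point is that the host work and the device work touch disjoint data; the `taskwait` marks the point after which the results of the offloaded region may safely be read.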

Registration information

Register via the button at the top of this page.
We encourage you to register for the waiting list if the course is full. Places might become available.

Please be aware that the talks and Q&A sessions will be recorded. By registering, you declare that you are aware of and consent to the recording.

Registration closes on October 17, 2024. Late registration might still be possible if capacity allows.

Fees

This course is free of charge.

Resources for additional reading

  • Book on OpenMP GPU programming
    • Programming Your GPU with OpenMP, Tom Deakin and Tim Mattson,
      ISBN-13: 978-0262547536
  • Book on parallel and high performance computing topics
    • Parallel and High Performance Computing, Robert Robey and Yuliana Zamora, Manning Publications,
      ISBN-13: 978-1617296468
  • ENCCS resources
  • AMD Lab Notes series on GPUOpen.com

    • Finite difference method - Laplacian part 1
    • Finite difference method - Laplacian part 2
    • Finite difference method - Laplacian part 3
    • Finite difference method - Laplacian part 4
    • AMD matrix cores
    • Introduction to profiling tools for AMD hardware
    • AMD ROCm™ installation
    • AMD Instinct™ MI200 GPU memory space overview 
    • Register pressure in AMD CDNA2™ GPUs
    • GPU-Aware MPI with ROCm
    • Jacobi Solver with HIP and OpenMP offloading
    • Sparse matrix vector multiplication - part 1

Contact

Khatuna Kakhiani phone 0711 685 65796, training(at)hlrs.de
Tobias Haas phone 0711 685 87223, training(at)hlrs.de

HLRS Training Collaborations in HPC

HLRS is part of the Gauss Centre for Supercomputing (GCS), together with JSC in Jülich and LRZ in Garching near Munich. EuroCC@GCS is the German National Competence Centre (NCC) for High-Performance Computing. HLRS is also a member of the Baden-Württemberg initiative bwHPC.

Further courses

See the training overview and the Supercomputing Academy pages.
