Parallel Programming - Introduction to OpenMP
Event description
Many scientific modelling programs rely on iterative numerical methods, such as the finite difference and conjugate gradient methods, or on stochastic methods such as Monte Carlo. These methods are dominated by iteration, and their loops are often the bottleneck of a code's performance.
OpenMP is a directive-based API (application programming interface) for writing parallel programs on shared-memory systems. It provides parallelism by running multiple threads concurrently. The most common use case is accelerating nested loops by sharing their iterations among threads, which was OpenMP's main focus before version 3.0.
This workshop introduces scientists to some of the most common yet powerful OpenMP practices for quickly turning a serial, iterative C code into a parallel one.
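As a minimal sketch of the kind of transformation covered in the session, the example below parallelises a simple loop with a worksharing directive. The array size and arithmetic are purely illustrative, and the code assumes an OpenMP-capable compiler (e.g. gcc with the -fopenmp flag).

```c
#include <stdio.h>
#include <omp.h>

#define N 1000000   /* illustrative problem size */

/* Compile with an OpenMP-capable compiler, e.g. gcc -fopenmp example.c */
int main(void)
{
    static double a[N], b[N];

    /* Serial initialisation of the input array. */
    for (int i = 0; i < N; i++)
        b[i] = (double)i;

    /* The worksharing-loop directive splits the iterations of the
       following loop across a team of threads. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        a[i] = 2.0 * b[i] + 1.0;

    printf("a[%d] = %f\n", N - 1, a[N - 1]);
    return 0;
}
```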
Check the Common HPC and Accelerator Tools section below for more information.
If you have any questions regarding this training, please contact training.nci@anu.edu.au.
Prerequisites
Knowledge of C preprocessor directives, functions, pointers and arrays.
Basic experience with C/C++.
Basic experience with CLI and Git.
A valid NCI account.
The training session is run on the NCI ARE service. You can find the relevant documentation here: ARE User Guide.
Learning Outcomes
At the completion of this training session, you will be able to
know when to use OpenMP,
create a parallel construct,
create a team of threads,
identify potential data race conditions (see the sketch after this list),
distinguish data storage attributes,
understand how to split loop iterations to improve efficiency,
understand the limitations of multithreaded programming,
feel confident to advance to the next level of parallel programming.
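To make two of these outcomes concrete, the sketch below shows a loop that would contain a data race if the shared accumulator were updated directly, and how a reduction clause removes it. The summed series is only an illustration, not course material.

```c
#include <stdio.h>
#include <omp.h>

#define N 1000000

int main(void)
{
    double sum = 0.0;

    /* Without the reduction clause, every thread would update the shared
       variable `sum` concurrently, which is a data race. The reduction
       clause gives each thread a private partial sum and combines the
       partial sums safely at the end of the loop. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += 1.0 / (i + 1.0);

    printf("sum = %f (computed by up to %d threads)\n",
           sum, omp_get_max_threads());
    return 0;
}
```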
Topics Covered
Threading in OpenMP
Shared-memory systems vs. distributed-memory systems
Loop parallelism methodologies
Parallel construct
Worksharing-loop construct
Reduction
Data race condition
OpenMP Library routines
Synchronisations
Data storage attributes
Loop scheduling (see the sketch after this list)
Profiling OpenMP
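As a small sketch of the loop-scheduling and library-routine topics above, the example below times an unevenly loaded loop with the omp_get_wtime() routine and balances it with a dynamic schedule. The work() function is a made-up uneven workload, and the chunk size of 1000 is just an illustrative choice.

```c
#include <stdio.h>
#include <omp.h>

#define N 100000

/* Hypothetical uneven workload: the cost varies with the iteration index. */
static double work(int i)
{
    double x = 0.0;
    for (int k = 0; k < i % 100; k++)
        x += k * 0.5;
    return x;
}

int main(void)
{
    double total = 0.0;
    double t0 = omp_get_wtime();   /* library routine: wall-clock timer */

    /* schedule(dynamic, 1000): threads grab chunks of 1000 iterations as
       they finish, which balances uneven per-iteration cost better than
       the default static split. */
    #pragma omp parallel for schedule(dynamic, 1000) reduction(+:total)
    for (int i = 0; i < N; i++)
        total += work(i);

    printf("total = %f, elapsed = %f s\n", total, omp_get_wtime() - t0);
    return 0;
}
```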
Common HPC and Accelerator Tools
| Tool | Category | Language/API | Parallelism Type | Target Hardware | Typical Use Case |
|---|---|---|---|---|---|
| CuPy | Python library | Python (NumPy-compatible API) | Data-parallel GPU | NVIDIA GPUs | GPU-accelerated array and matrix operations as a drop-in replacement for NumPy |
| CUDA | GPU programming platform | C/C++ API (also Python via PyCUDA, Numba) | Data-parallel GPU | NVIDIA GPUs | Writing custom GPU kernels and fine-grained GPU code |
| OpenACC | Compiler directives | C/C++, Fortran pragmas | Data-parallel GPU/CPU | GPUs & other accelerators | Annotating loops to offload work to accelerators |
| OpenMP | Compiler directives | C/C++, Fortran pragmas | Shared-memory CPU | Multi-core CPUs | Parallelizing loops and regions on a single node |
| MPI | Library & standard | C/C++, Fortran, Python (via mpi4py) | Distributed-memory | Clusters & networks | Decomposing work across processes/nodes with message passing |