Parallel Programming - Introduction to OpenMP
Event description
Many scientific modelling programs rely on iterative numerical methods, such as the finite difference and conjugate gradient methods, or on stochastic methods such as Monte Carlo. These methods are dominated by iteration, and their loops are often the bottleneck of a code's performance.
OpenMP is a directive-based API (application programming interface) for writing parallel programs on shared-memory systems. It provides parallelism by running multiple threads concurrently. The most common use case is accelerating nested loops by sharing their iterations among threads, which was OpenMP's main focus before version 3.0.
This workshop introduces scientists to some of the most common yet powerful OpenMP practices for quickly turning a serial, iterative C code into a parallel one.
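As a minimal sketch of the kind of transformation covered in the session, the example below parallelises a simple loop with a worksharing directive. The array size and arithmetic are purely illustrative, and the code assumes an OpenMP-capable compiler (e.g. gcc with the -fopenmp flag).

```c
#include <stdio.h>
#include <omp.h>

#define N 1000000   /* illustrative problem size */

/* Compile with an OpenMP-capable compiler, e.g. gcc -fopenmp example.c */
int main(void)
{
    static double a[N], b[N];

    /* Serial initialisation of the input array. */
    for (int i = 0; i < N; i++)
        b[i] = (double)i;

    /* The worksharing-loop directive splits the iterations of the
       following loop across a team of threads. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        a[i] = 2.0 * b[i] + 1.0;

    printf("a[%d] = %f\n", N - 1, a[N - 1]);
    return 0;
}
```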
Check the Common HPC and Accelerator Tools section below for more information.
If you have any questions regarding this training, please contact training.nci@anu.edu.au.
Prerequisites
Knowledge of C preprocessor directives, functions, pointers and arrays.
Basic experience with C/C++.
Basic experience with CLI and Git.
A valid NCI account.
The training session is run on the NCI ARE service. You can find the relevant documentation here: ARE User Guide.
Learning Outcomes
At the completion of this training session, you will be able to
know when to use OpenMP,
create a parallel construct,
create a team of threads,
identify potential data race conditions (see the sketch after this list),
distinguish data storage attributes,
understand how to split loop iterations to improve efficiency,
understand the limitations of multithreaded programming,
feel confident to advance to the next level of parallel programming.
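To make two of these outcomes concrete, the sketch below shows a loop that would contain a data race if the shared accumulator were updated directly, and how a reduction clause removes it. The summed series is only an illustration, not course material.

```c
#include <stdio.h>
#include <omp.h>

#define N 1000000

int main(void)
{
    double sum = 0.0;

    /* Without the reduction clause, every thread would update the shared
       variable `sum` concurrently, which is a data race. The reduction
       clause gives each thread a private partial sum and combines the
       partial sums safely at the end of the loop. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += 1.0 / (i + 1.0);

    printf("sum = %f (computed by up to %d threads)\n",
           sum, omp_get_max_threads());
    return 0;
}
```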
Topics Covered
Threading in OpenMP
Shared-memory systems vs. distributed-memory systems
Loop parallelism methodologies
Parallel construct
Worksharing-loop construct
Reduction
Data race condition
OpenMP Library routines
Synchronisations
Data storage attributes
Loop scheduling (see the sketch after this list)
Profiling OpenMP
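As a small sketch of the loop-scheduling and library-routine topics above, the example below times an unevenly loaded loop with the omp_get_wtime() routine and balances it with a dynamic schedule. The work() function is a made-up uneven workload, and the chunk size of 1000 is just an illustrative choice.

```c
#include <stdio.h>
#include <omp.h>

#define N 100000

/* Hypothetical uneven workload: the cost varies with the iteration index. */
static double work(int i)
{
    double x = 0.0;
    for (int k = 0; k < i % 100; k++)
        x += k * 0.5;
    return x;
}

int main(void)
{
    double total = 0.0;
    double t0 = omp_get_wtime();   /* library routine: wall-clock timer */

    /* schedule(dynamic, 1000): threads grab chunks of 1000 iterations as
       they finish, which balances uneven per-iteration cost better than
       the default static split. */
    #pragma omp parallel for schedule(dynamic, 1000) reduction(+:total)
    for (int i = 0; i < N; i++)
        total += work(i);

    printf("total = %f, elapsed = %f s\n", total, omp_get_wtime() - t0);
    return 0;
}
```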
Common HPC and Accelerator Tools
| Tool | Category | Language/API | Parallelism Type | Target Hardware | Typical Use Case |
|---|---|---|---|---|---|
| CuPy | Python library | Python (NumPy-compatible API) | Data-parallel GPU | NVIDIA GPUs | GPU-accelerated array and matrix operations as a drop-in replacement for NumPy |
| CUDA | GPU programming platform | C/C++ API (also Python via PyCUDA, Numba) | Data-parallel GPU | NVIDIA GPUs | Writing custom GPU kernels and fine-grained GPU code |
| OpenACC | Compiler directives | C/C++, Fortran pragmas | Data-parallel GPU/CPU | GPUs & other accelerators | Annotating loops to offload work to accelerators |
| OpenMP | Compiler directives | C/C++, Fortran pragmas | Shared-memory CPU | Multi-core CPUs | Parallelizing loops and regions on a single node |
| MPI | Library & standard | C/C++, Fortran, Python (via mpi4py) | Distributed-memory | Clusters & networks | Decomposing work across processes/nodes with message passing |