Parallel Python - Introduction to Dask
Event description
The "Introduction to Dask" workshop offers a comprehensive guide to utilising Dask, a parallel computing library for Python designed to scale data analysis from single machines to clusters. Participants will explore Dask's core components, including task graphs, arrays, dataframes, delayed, and futures. The workshop also covers distributed computing with Dask and its integration with machine learning through Dask-ML. A Python virtual environment is provided for hands-on exercises.
Prerequisites
Experience with Python.
Experience with bash or similar Unix shells.
A valid NCI account and vp91 membership (instructions will be sent out before the event).
The training session runs on the NCI ARE service. Relevant documentation is available in the ARE User Guide.
Learning Outcomes
Describe what Dask is and when to use it for large data or parallel tasks.
Work with Dask Arrays and DataFrames to handle datasets that don’t fit in memory.
Use dask.delayed to turn regular Python code into parallel tasks.
Run Dask on a laptop or a cluster using the Dask Distributed scheduler.
Monitor computations using the Dask Dashboard.
Combine Dask with familiar tools such as NumPy and pandas.
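As a taste of the dask.delayed outcome listed above, here is a minimal sketch of wrapping ordinary Python functions so that Dask builds a task graph and runs it on request. The function names square and total are illustrative assumptions and are not taken from the workshop material.

```python
from dask import delayed


@delayed
def square(x):
    # An ordinary Python function; the decorator makes each call lazy.
    return x * x


@delayed
def total(values):
    # Receives the computed results of the upstream tasks as a plain list.
    return sum(values)


# Calling the functions only builds a task graph; nothing runs yet.
squares = [square(i) for i in range(5)]
result = total(squares)

# compute() executes the graph; independent square() tasks may run in parallel.
answer = result.compute()
print(answer)
```

The five square() calls do not depend on one another, so the scheduler is free to run them concurrently before total() combines the results.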
Topics Covered
Foundation Topics
Task Graphs
Dask Arrays
Dask DataFrame
Dask Delayed
Dask Futures
Distributed Dask
Machine Learning Topics
Dask-ML
Hyperparameter Search
Parallel Prediction
Incremental Learning
Distributed Learning
Contacts
If you have any questions regarding the training content, please contact training.nci@anu.edu.au.