Introduction to NCI’s Data Catalogue and Indexing Schemes

Online Event

Wed, 11 Jun, 12am - 3am EDT

Event description

We’re hosting a tutorial to introduce the NCI data catalogue and its two indexing schemes: Intake-ESM and Intake-Spark.

A data catalogue helps users discover and access datasets through structured metadata, while indexing improves performance by enabling fast, targeted searches. Built on the Python Intake package, these tools support scalable, memory-efficient access to large datasets. At NCI, Intake-Spark uses Parquet-based indexes for high-performance querying with Spark, while Intake-ESM uses lightweight CSV-based indexes ideal for climate data workflows. 
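To make the idea of a lightweight CSV-based index concrete, here is a minimal sketch using only the Python standard library. It is not the Intake-ESM API and does not reflect NCI's actual index schema: the column names (`variable`, `frequency`, `path`), the file paths, and the `search_index` helper are all hypothetical, chosen only to show how metadata rows can be filtered without opening any data files.

```python
import csv
import io

# Hypothetical index contents: one row of metadata per data file.
# The column names and paths are invented for illustration and are
# not NCI's actual schema.
INDEX_CSV = """variable,frequency,path
tas,mon,/g/data/example/tas_mon_2000.nc
tas,day,/g/data/example/tas_day_2000.nc
pr,mon,/g/data/example/pr_mon_2000.nc
"""


def search_index(index_text, **criteria):
    """Return the index rows whose columns match every keyword criterion."""
    reader = csv.DictReader(io.StringIO(index_text))
    return [
        row for row in reader
        if all(row.get(column) == value for column, value in criteria.items())
    ]


# Find monthly 'tas' files by metadata alone; no data file is read.
matches = search_index(INDEX_CSV, variable="tas", frequency="mon")
paths = [row["path"] for row in matches]
print(paths)
```

Only the small index is scanned during the search, and the matching paths can then be passed to whichever loader suits the workflow. Intake-ESM offers a comparable keyword-based search on its catalogue objects, though its real interface and index columns differ from this sketch.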

This session will include hands-on Jupyter Notebook examples showing how to use the catalogue in data analysis and machine learning workflows. You’ll learn how to search, load, and filter datasets efficiently from the /g/data collections. 

The tutorial is ideal for researchers working with large-scale data or looking to streamline their pipelines.

If you have any questions regarding this training, please contact training.nci@anu.edu.au.

Prerequisites

    1. Experience with Python.
    2. Experience with bash or a similar Unix shell.
    3. A valid NCI account.
    4. Experience using the NCI ARE service is recommended. Relevant documentation is available in the ARE User Guide.

Learning Outcomes

After this training session, you will be able to:

  • Describe NCI's data services
  • Explain the NCI data catalogue and its indexing schemes
  • Search, load, and filter datasets efficiently from the /g/data collections
  • Use the data catalogue in data analysis and machine learning workflows


Topics Covered

  • Welcome and Introduction to NCI’s Intake-Spark and Intake-ESM Indexing Schemes
  • Overview of NCI’s Data Catalogue Services
  • Working with the Intake-ESM Indexing Scheme
  • Applying the Intake-ESM Scheme in AI/ML Workflows
  • Using the Intake-Spark Indexing Scheme
