The Helmholtz AI team at Forschungszentrum Jülich invites you to a Prologue Day on campus, with a full range of satellite events that go beyond the program of the main conference. We aim to offer an interactive platform where you can delve deeper into specific topics of applied AI, collaborate with peers, and gain hands-on experience. Pick your favourite workshop, tutorial, or hackathon!

What’s more, you can join our lab tours, explore the FZJ facilities, and learn more about our AI-related initiatives. It is our pleasure to handle the logistics for you, providing transportation from Düsseldorf to Jülich in the morning and back in the evening, ensuring a seamless transition to the Helmholtz AI conference the following day.

Where: Forschungszentrum Jülich
When: June 11, 2024 (full day)

EXTRA REGISTRATION NEEDED

Please note: To allow proper organization of all satellite events and access to the FZJ campus, a separate registration for the Prologue Day is mandatory! Non-registered individuals cannot access the FZJ campus.
Registration for the Prologue Day and the individual events: https://go.fzj.de/haicon24_prologue_day
Please register by May 7, 2024.

All satellite events are submitted on GitHub. Some are still open for discussion. Feel free to indicate your interest or join the discussion.

PRELIMINARY PROGRAM

  • 8:30 AM | Shuttle from Düsseldorf to FZJ
  • Morning:
    • Start of full-day satellite events
    • Lab Tours:
      • Supercomputers at Jülich Supercomputing Centre
      • Quantum Computing at Jülich Supercomputing Centre
      • Ion-Beam Analysis Lab for material analysis
      • Neuroscience Lab Tour: The Journey from Brain to HPC
      • more to come
  • Lunch
  • Afternoon: Satellite Events (see below)

Ethics of Artificial Intelligence in Research Contexts

Leads: Bert Heinrichs, Jan-Hendrik Heinrichs, Charles Rathkopf
AG Neuroethics and Ethics of AI, Institute for Neuroscience and Medicine 7: Brain and Behaviour, Forschungszentrum Jülich

Duration: half-day

The workshop will discuss current issues in the ethics of Artificial Intelligence in research contexts. Topics will include the ethical dimensions of AI systems in general as well as particular issues arising in specialized contexts of application and particular types of AI systems.
Amongst them:

  • Dual use potential
  • AI in medical diagnostics
  • Use of unethical or illegal training data

Foundation Models for Topological Data – Challenges and Opportunities

Leads: Timo Dickscheid1,2,3, Christian Schiffer1,2, Susanne Wenzel1,2, Martin Schultz4
1 Helmholtz AI, Forschungszentrum Jülich
2 Institute of Neuroscience and Medicine (INM-1), Forschungszentrum Jülich
3 Institute of Computer Science, Heinrich-Heine-University Düsseldorf
4 Jülich Supercomputing Centre, Forschungszentrum Jülich

OPEN CALL FOR CONTRIBUTIONS!

Duration: half-day

Foundation models have started to revolutionise the field of artificial intelligence, and with it many scientific and industrial applications. So far, research on these powerful generalist models revolves around models that operate on images or written language, or sometimes both combined.

In comparison, topological data (e.g., attributed graphs) has received relatively little attention with respect to the development of foundation models, despite its central role in many scientific domains, including climate modelling, neuroscience, molecular science, and remote sensing.

With this workshop, we will present and discuss applications from different scientific domains which rely on analysing topological data and aim to identify common challenges and potential solutions towards developing foundation models for topological data.

Helmholtz Foundation Model Initiative Get-together

Leads: Stefan Kesselheim1, Stefan Bauer2
1 Jülich Supercomputing Centre, Forschungszentrum Jülich, Helmholtz AI
2 Helmholtz Munich, Helmholtz AI

Duration: half-day

As the Helmholtz Foundation Model Initiative projects start in May 2024, we want to bring members of all funded projects and the Synergy Unit together. Each project will be asked to give an introductory presentation about its plans, followed by an open discussion so that participants can get to know each other. The aim is to create a productive and constructive working environment for the initiative, built on mutual support and cooperation.

Bringing Deep Learning Workloads to JSC supercomputers

Leads: Sabrina Benassou, Alexandre Strube
JSC – Helmholtz AI Jülich

Duration: half-day

Fancy using High Performance Computing machines for AI? Fancy learning how to run your code on one of Europe’s fastest computers, JUWELS Booster at FZJ?

In this tutorial, we will guide you through the first steps of using the supercomputers for your own AI application. The tutorial will be tailored to your needs, and our team will guide you through questions like:

  • How do I get access to the machines?
  • How do I use the pre-installed, optimized software?
  • How can I run my own code?
  • How can I store data so I can access it fast in training?
  • How can I parallelize my training and use more than one GPU?

In this tutorial, we will try to get your code and your workflow running and would like to make the start on a supercomputer as smooth as possible. After this course, you will not only be ready to use HAICORE, but you will also have taken your first step towards unlocking compute resources at the largest scale with a compute time application at the Gauss Centre for Supercomputing.
FIND DETAILS AND REQUIREMENTS HERE.

Uncertainty Quantification of ML models

Leads: Peter Steinbach
HZDR, Helmholtz AI

Duration: half-day

In this tutorial, we will give a hands-on introduction to uncertainty quantification for ML models. We will focus on MC Dropout and Deep Ensembles as the traditional methods used in the field. We hope to attract speakers who can provide the lay of the land in more recent methods beyond the ones just mentioned. We hope to conclude the workshop with practical advice for participants.
FIND DETAILS AND REQUIREMENTS HERE.
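
To give a flavour of the ensemble idea, here is a minimal sketch using only the Python standard library. It is not the tutorial material: instead of deep networks, each ensemble member is a tiny least-squares line fitted on a bootstrap resample (an assumption made to keep the example short; Deep Ensembles typically rely on different random initialisations of neural networks). The function names and the toy data are hypothetical. The spread of the members' predictions serves as the uncertainty estimate.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx


def train_ensemble(xs, ys, n_members, rng):
    """Fit each member on a bootstrap resample so that members can disagree."""
    members = []
    for _ in range(n_members):
        idx = [rng.randrange(len(xs)) for _ in xs]
        members.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return members


def predict(members, x):
    """Return the ensemble mean and standard deviation (the uncertainty) at x."""
    preds = [a * x + b for a, b in members]
    return statistics.fmean(preds), statistics.stdev(preds)


rng = random.Random(0)
xs = [rng.uniform(0.0, 1.0) for _ in range(40)]       # toy inputs in [0, 1]
ys = [2.0 * x + rng.gauss(0.0, 0.2) for x in xs]      # noisy toy targets
ensemble = train_ensemble(xs, ys, n_members=20, rng=rng)

mean_in, std_in = predict(ensemble, 0.5)    # inside the training range
mean_out, std_out = predict(ensemble, 5.0)  # far outside the training range
print(f"x=0.5: {mean_in:.2f} +/- {std_in:.2f}")
print(f"x=5.0: {mean_out:.2f} +/- {std_out:.2f}")
```

The members agree closely where training data is available and diverge far away from it, which is the behaviour an uncertainty estimate should show.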

Efficient Hyperparameter Optimization for Machine Learning

Leads: Marcel Aach1,2, Xin Liu1
1 JSC, Forschungszentrum Jülich
2 University of Iceland

Duration: half-day

The performance of machine learning models depends strongly on their hyperparameters, which the user sets before training. The hyperparameters define the general architecture of the model (e.g., via the number of layers or the neurons per layer in a neural network) and control the learning process (e.g., via the learning rate of the optimizer or the weight decay). However, searching for optimal hyperparameter values is a long and resource-intensive process, as many different combinations need to be evaluated and the final performance of a combination can usually only be measured after a machine learning model is fully trained.

This tutorial presents a systematic introduction to the field of Hyperparameter Optimization (HPO) and demonstrates how to make use of resource-efficient methods to reduce the runtime of HPO in small and large settings on High-Performance Computing systems. Two HPO libraries (Ray Tune and DeepHyper) are introduced, making use of evolutionary, Bayesian, and early stopping-based algorithms. As HPO is a general method and can be adapted to any machine learning model, it is useful for scientists from many different domains.
FIND DETAILS AND REQUIREMENTS HERE.
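
As an illustration of why early stopping saves resources, the following is a minimal sketch of successive halving in plain Python. It does not use Ray Tune or DeepHyper, and the "training" is a hypothetical stand-in (gradient descent on a one-dimensional quadratic); all function names are assumptions made for this example. Every candidate learning rate gets a small budget first, and only the better half continues with a doubled budget.

```python
import random


def validation_loss(lr, epochs):
    """Toy stand-in for training: gradient descent on f(w) = w**2 from w = 1."""
    w = 1.0
    for _ in range(epochs):
        w -= lr * 2.0 * w  # the gradient of w**2 is 2w
    return w * w


def successive_halving(configs, min_epochs=1, eta=2):
    """Rank configs on a small budget, keep the best 1/eta, repeat with eta times the budget."""
    survivors = list(configs)
    epochs = min_epochs
    while len(survivors) > 1:
        ranked = sorted(survivors, key=lambda lr: validation_loss(lr, epochs))
        survivors = ranked[: max(1, len(ranked) // eta)]
        epochs *= eta
    return survivors[0]


rng = random.Random(0)
candidates = [rng.uniform(0.01, 0.99) for _ in range(16)]  # hypothetical search space
best_lr = successive_halving(candidates)
print(f"selected learning rate: {best_lr:.3f}")
```

With 16 candidates, only two of them ever receive the largest budget of 8 epochs, whereas a full evaluation would train all 16 to completion. For this toy objective the optimum is a learning rate of 0.5, so the selected value is the candidate closest to it.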

HPC for Researchers

Leads: Vytautas Jančauskas, Daniela Espinoza Molina, Antony Zappacosta, Roman Zitlau
DLR, Helmholtz AI

Duration: full-day

An introduction to HPC for researchers using Python. We will be using the HAICORE platform as an example. The “Helmholtz AI computing resources” (HAICORE) provide easy and low-barrier GPU access to the entire AI community within the Helmholtz Association. In this tutorial you will learn to:

  • Gain access to the platform, set up 2FA and log in.
  • Understand basic HPC concepts (distributed computing, etc.).
  • Set up your own software environment using conda.
  • Request and use GPU and CPU resources through SLURM.
  • Set up and use Dask to distribute your data science workflows.
  • Accelerate your software with Numba.
  • Write custom CUDA kernels in Python.
FIND DETAILS AND REQUIREMENTS HERE.

Introduction to Simulation Based Inference: enhancing synthetic models with Artificial Intelligence

Leads: Alina Bazarova, Stefan Kesselheim
Helmholtz AI, Jülich Supercomputing Centre, Forschungszentrum Jülich

Duration: half-day

Artificial intelligence (AI) techniques are constantly changing scientific research, but their potential to enhance simulation pipelines is not widely recognised. Conversely, Bayesian inference is a well-established method in the research community, offering distributional estimates of model parameters and the ability to update models with new data. However, traditional Bayesian inference often faces computational challenges and limited parallelisation capabilities.

Simulation Based Inference (SBI) presents a comprehensive solution by combining simulations, AI techniques, and Bayesian methods. SBI utilizes AI-driven approximate Bayesian computation to significantly reduce inference times and produce reliable estimates, even with sparse observed data. This approach allows any representative simulation model to inform parameter constraints, leading to approximate posterior distributions. Furthermore, SBI enables workload distribution across high-performance computing clusters, further decreasing runtime.
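
The core idea of approximate Bayesian computation can be sketched in a few lines of standard-library Python. This is classical rejection ABC without any neural network, so it is a simplification of the AI-driven methods covered in the tutorial; the simulator, prior, summary statistic and tolerance below are hypothetical choices made for illustration.

```python
import random
import statistics


def simulator(theta, rng, n=50):
    """Hypothetical simulator: n noisy measurements centred on the parameter theta."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def summary(data):
    """Summary statistic used to compare simulated and observed data."""
    return statistics.fmean(data)


def rejection_abc(observed, prior_sampler, rng, n_draws=5000, tolerance=0.1):
    """Keep prior draws whose simulated summary lies within `tolerance` of the observed one."""
    target = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sampler(rng)
        if abs(summary(simulator(theta, rng)) - target) < tolerance:
            accepted.append(theta)
    return accepted


rng = random.Random(42)
observed = simulator(2.0, rng)  # pretend measurement, generated with theta = 2.0
posterior = rejection_abc(observed, lambda r: r.uniform(-5.0, 5.0), rng)
print(f"accepted {len(posterior)} of 5000 draws")
print(f"approximate posterior mean: {statistics.fmean(posterior):.2f}")
```

Most prior draws are rejected, which is the inefficiency that neural SBI methods aim to reduce. Each simulation is independent of the others, which is why this kind of workload distributes well across compute nodes.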

This tutorial explores the theoretical foundations and provides hands-on training for constructing tailored SBI frameworks for specific models. Through practical examples, participants will gain insights into different levels of model granularity, ranging from a simple black box approach to a highly customizable design. By participating in this tutorial, attendees will develop the skills necessary to implement Simulation Based Inference in their own research projects.

Topics to be covered:

  • Key features of Bayesian Inference and examples where the classical Bayesian approach fails;
  • Typical SBI pipelines: one-liner, flexible interface, summary statistics;
  • Different SBI methods: SNLE, SNRE, SNPE;
  • Neural network architectures behind SBI;
  • Parallelising and distributing SBI over multiple nodes.
FIND DETAILS AND REQUIREMENTS HERE.

Accelerating massive data processing in Python with Heat

Leads: Fabian Hoppe1, Kai Krajsek2, Claudia Comito2
1 Deutsches Zentrum für Luft- und Raumfahrt e.V., Institut für Softwaretechnologie, High-Performance Computing, Köln
2 Forschungszentrum Jülich GmbH, Institute for Advanced Simulation, Jülich Supercomputing Centre

Duration: half-day

Many data processing workflows in science and technology build on Python libraries such as NumPy, SciPy, and scikit-learn, which are easy to learn and easy to use. In addition, these libraries are based on highly optimized computational backends and thus achieve competitive performance, at least as long as GPU acceleration is not taken into account and the memory of a single workstation or cluster node is sufficient for all required tasks.

However, with steadily growing data sets, the limitation to the RAM of a single machine may pose a severe obstacle. At the same time, the step from a workstation to a (GPU) cluster can be challenging for domain experts without prior HPC experience.

This group of users is targeted by our Python library Heat (“Helmholtz Analytics Toolkit”), to which we want to give a brief hands-on introduction in this tutorial. Our library builds on PyTorch and mpi4py and simplifies porting NumPy/SciPy-based code to GPUs (CUDA, ROCm), including multi-GPU, multi-node clusters. On the surface, Heat implements a NumPy-like API, is largely interoperable with the Python array ecosystem, and can be employed seamlessly as a backend to accelerate existing single-CPU pipelines, as well as to develop new HPC applications from scratch. Under the hood, Heat distributes memory-intensive operations and algorithms via MPI communication and thus avoids some of the overhead that is often introduced by other, task-parallelism-based libraries for scaling NumPy/SciPy/scikit-learn applications.

In this tutorial you will get an overview of:

  • Heat’s basics: getting started with distributed I/O, data decomposition scheme, array operations;
  • Existing functionalities: multi-node linear algebra, statistics, signal processing, machine learning…;
  • DIY how-to: using existing Heat infrastructure to build your own multi-node, multi-GPU research software.

We will also touch upon Heat’s implementation roadmap, and possible paths to collaboration.

Understanding SHAP for Interpretable Machine Learning

Leads: Nicolás Nieto1,2, Federico Raimondo1, Vera Komeyer1,2,3
1 Institute of Neuroscience and Medicine (INM-7: Brain and Behaviour), Research Centre Jülich, Jülich, Germany
2 Institute of Systems Neuroscience, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
3 Department of Biology, Faculty of Mathematics and Natural Sciences, Heinrich Heine University Düsseldorf, Düsseldorf, Germany

Duration: 2 hours (tutorial) + 4 hours (hands-on)

This tutorial and hands-on session aim to provide participants with a comprehensive understanding of SHAP for interpreting machine learning models. The scope includes an explanation and exploration of SHAP principles, practical implementations, and considerations for model interpretation. Participants will gain proficiency in leveraging SHAP values to enhance the explainability of machine learning models across various scenarios, including dealing with unbalanced data, collinear features and the nuanced relationship between causality and correlation.
FIND TIMELINE AND DETAILS HERE.
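
SHAP builds on Shapley values from cooperative game theory. As a minimal sketch of that underlying principle, the following standard-library Python computes exact Shapley values for a tiny hypothetical model by enumerating all feature coalitions. It does not use the SHAP library, and it makes a simplifying assumption: features outside a coalition are replaced by a single fixed baseline value, whereas SHAP implementations typically average over background data and use approximations to stay tractable. The model, input and baseline are invented for this example.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values; features outside a coalition take their baseline value."""
    n = len(x)

    def value(coalition):
        return model([x[i] if i in coalition else baseline[i] for i in range(n)])

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def model(features):
    """Hypothetical model with an interaction between the first two features."""
    a, b, c = features
    return 2.0 * a + a * b + 0.5 * c


x = [1.0, 2.0, 4.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(model, x, baseline)
print([round(v, 6) for v in phi])
print(round(sum(phi), 6), model(x) - model(baseline))
```

The attributions sum to the difference between the prediction for `x` and the prediction for the baseline, and the interaction term `a * b` is split equally between the first two features. The enumeration grows exponentially with the number of features, which is why practical tools rely on approximations.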

Model generalization hackathon: Projecting future climate impacts on crop yields

Leads: Lily-Belle Sweet1, Daniel Klotz1, Brian Groenke2
1 Department of Compound Environmental Risks, Helmholtz Centre for Environmental Research (UFZ)
2 Alfred Wegener Institute

Duration: full-day

Can we learn from the recent past to predict climate impacts in the future?

Machine learning models are frequently trained on observed data from the last decades and then used to make projections of future climate change impacts. However, the ability of such models to generalise to these unseen conditions outside of the observed distribution is not guaranteed. How far into the future can we make good predictions? Which types of models or training methods do better or worse? Can domain generalisation strategies help, and if so, how much? We have created a benchmark dataset to help answer these questions, using simulated agricultural maize and wheat yields from biophysical crop models.

Participants will train models to predict end-of-season annual gridded yields for current global cropping areas, based on data from 1980-2020: daily growing-season climate data (precipitation, solar radiation, temperature), soil texture and CO2 concentration at 0.5-degree spatial resolution. The models will then be evaluated based on their ability to predict yields from 2020 to 2100 under a high-emissions climate change scenario (RCP8.5).

No climate/agricultural knowledge is needed to participate, and the data will be processed to be very straightforward to work with. It’s a large, multivariate, and high-dimensional dataset, so there is a lot of scope to experiment with interesting model architectures.

Timeline: The day will start with an introduction to the problem, including a quick explainer of the life cycle of maize and wheat and how their growth is affected by weather, followed by a short hands-on tutorial for downloading the data (probably via Kaggle) and using it to train a simple model. After this, participants are free to work individually or in teams to train their models and submit their predictions. At the end of the day, we invite each team/participant to briefly present what they did and share their experiences. We will show not just an overall leaderboard, but also a breakdown of scores for each decade.

Further outcomes: This event would be part of a broader activity organised by the AgMIP Machine Learning team AgML, and the challenge will remain online for participation for about six months outside of this satellite event. We aim to then publish the dataset as a benchmark for model generalisation, along with some analysis of the results (by breaking down the test scores based on different types of model architectures, modelling methodology choices, etc.). Hackathon participants may be able to contribute to this if interested but would also be free (and encouraged) to publish their results independently if they are able to develop particularly high-performing models.
FIND DETAILS AND REQUIREMENTS HERE.