Running Dask with SLURM and MPI

Install dask-mpi

Before you begin, you need to install dask-mpi and mpi4py. These packages let Dask create its cluster through MPI.

Due to several issues with the dask-mpi and mpi4py builds provided by the conda-forge repository, please install them with pip instead.

# Activate your conda environment
# Replace `dask-demo` with the name of your conda environment
conda activate dask-demo

# Install with pip
pip install dask-mpi mpi4py
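To confirm the installation succeeded, a quick check like the following can be run inside the activated environment. The `missing_packages` helper is purely illustrative, not part of dask-mpi:

```python
import importlib.util

def missing_packages(names=("dask_mpi", "mpi4py")):
    """Return the packages from `names` that cannot be imported."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# An empty list means both packages are ready to use.
print(missing_packages())
```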

Launching Dask

Run the following commands from a login shell to launch a Dask cluster. The commands below specify only the required parameters; you may need extra parameters to customize your job submission.

# Launch Dask
# Replace `--cpus-per-task=24` with the number of CPU threads needed
# Replace `--mem=20480` with the amount of RAM needed (in MB)
# Replace `-np 10` with the number of workers needed plus one (1 scheduler, n-1 workers)
sbatch -A ssg --cpus-per-task=24 --mem=20480 --wrap="mpirun -np 10 $(which dask-mpi) --no-nanny --scheduler-file ~/scheduler.json"

# Add extra workers
# Replace `--cpus-per-task=24` with the number of CPU threads needed
# Replace `--mem=20480` with the amount of RAM needed (in MB)
# Replace `-n 5` with the number of workers needed
sbatch -A ssg --cpus-per-task=24 --mem=20480 --wrap="mpirun -n 5 $(which dask-mpi) --no-scheduler --no-nanny --scheduler-file ~/scheduler.json"
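Once the job starts, dask-mpi writes the scheduler file as plain JSON whose `address` field holds the scheduler's TCP endpoint. A small helper like the one below (a sketch; `scheduler_address` is not part of dask-mpi) can be used to verify the file has been written and see where the scheduler is listening:

```python
import json
import os

def scheduler_address(path="~/scheduler.json"):
    """Read a dask-mpi scheduler file and return the scheduler's address."""
    # expanduser resolves the leading "~" to the user's home directory
    with open(os.path.expanduser(path)) as f:
        return json.load(f)["address"]
```

The returned value looks like `tcp://<node-ip>:<port>`; if the file does not exist yet, the job has not started.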

Connect from Jupyter Notebook

To connect to the Dask cluster you just launched, use the following Python code in your Jupyter Notebook.

import os
from dask.distributed import Client, progress

# Load the scheduler config file (generated by dask-mpi) from the home directory
# os.path.expanduser resolves "~" to an absolute path for the Client
client = Client(scheduler_file=os.path.expanduser('~/scheduler.json'))
client
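Once connected, work can be submitted to the cluster through the client. The helper below is a minimal sketch (the name `cluster_sum` is illustrative): it maps a function across the workers and gathers the results back.

```python
from dask.distributed import Client

def cluster_sum(client, values):
    """Square each value on the cluster's workers, then sum the results."""
    # client.map schedules one task per value; gather collects the results
    futures = client.map(lambda v: v * v, values)
    return sum(client.gather(futures))

# With the client from above:
# cluster_sum(client, range(10))  # -> 285
```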