Running Dask with SLURM and MPI
Install dask-mpi
Before you begin, you need to install dask-mpi and mpi4py. These packages facilitate the creation of a Dask cluster using MPI. Due to several issues with the dask-mpi and mpi4py builds provided by the conda-forge repository, please install them with pip.
# Activate your conda environment
# Replace `dask-demo` with the name of your conda environment
conda activate dask-demo
# Install with pip
pip install dask-mpi mpi4py
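After installing, it can help to confirm that both packages are importable from the environment you activated. A minimal check — the helper name is illustrative, not part of dask-mpi:

```python
import importlib.util

def check_importable(mods=("dask_mpi", "mpi4py")):
    """Report whether each module can be found on the current Python path."""
    return {m: importlib.util.find_spec(m) is not None for m in mods}

# Both entries should be True after a successful pip install
print(check_importable())
```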
Launching Dask
Running these commands in a login shell will launch a Dask cluster. The commands below specify only the required parameters; you may need additional parameters to customize your job submission.
# Launch Dask
# Replace `--cpus-per-task=24` with the number of CPU threads needed
# Replace `--mem=20480` with the amount of RAM needed (in MB)
# Replace `-np 10` with the number of workers needed plus one (1 scheduler, n-1 workers)
sbatch -A ssg --cpus-per-task=24 --mem=20480 --wrap "mpirun -np 10 $(which dask-mpi) --no-nanny --scheduler-file ~/scheduler.json"
# Add extra workers
# Replace `--cpus-per-task=24` with the number of CPU threads needed
# Replace `--mem=20480` with the amount of RAM needed (in MB)
# Replace `-n 5` with the number of workers needed
sbatch -A ssg --cpus-per-task=24 --mem=20480 --wrap "mpirun -n 5 $(which dask-mpi) --no-scheduler --no-nanny --scheduler-file ~/scheduler.json"
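The same launch can also be submitted as a SLURM batch script rather than a single command line. This is a sketch only; the account name, resource values, and task count mirror the examples above and should be adjusted for your job.

```shell
#!/bin/bash
#SBATCH -A ssg
#SBATCH --ntasks=10              # 1 scheduler + 9 workers
#SBATCH --cpus-per-task=24       # CPU threads per task
#SBATCH --mem=20480              # RAM in MB

mpirun -np 10 $(which dask-mpi) --no-nanny --scheduler-file ~/scheduler.json
```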
Connect from Jupyter Notebook
To connect to the Dask cluster you just launched, use the following Python code in your Jupyter Notebook.
from dask.distributed import Client, progress
# Load scheduler config file (generated by dask-mpi) from home directory
client = Client(scheduler_file='~/scheduler.json')
client
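Once the client is connected, computations submitted through it run on the MPI-launched workers. A small sketch of a cluster-side computation — the array sizes and the helper name are illustrative, not part of dask-mpi:

```python
import dask.array as da
from dask.distributed import Client

def mean_of_random(client: Client, shape=(2_000, 2_000), chunks=(500, 500)) -> float:
    """Build a chunked random array and compute its mean via the given client."""
    x = da.random.random(shape, chunks=chunks)
    future = client.compute(x.mean())  # ship the task graph to the scheduler
    return future.result()

# On the cluster:
# client = Client(scheduler_file='~/scheduler.json')
# mean_of_random(client)  # mean of uniform samples, close to 0.5
```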