6 Array Submission of Multiple Independent R Jobs
Consider the need to generate normally distributed random numbers for every combination of sample size N and mean μ:
\[N \in \{250,\ 500,\ 750\}, \qquad \mu \in \{0,\ 1.5\}\]
6.1 Sample Job Script
sim_job.R
# Expect command line args at the end.
args = commandArgs(trailingOnly = TRUE)

# Skip args[1] to prevent getting --args
# Extract and cast as numeric from character
rnorm(n = as.numeric(args[2]), mean = as.numeric(args[3]))
Download a copy onto the cluster with:
wget https://hpc.thecoatlessprofessor.com/slurm/scripts/sim_job.R
chmod +x sim_job.R
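Before queueing the full array, it can help to test the script once by hand on the cluster; the module line mirrors the launch script below, though the available R version may differ on your system.
# Load R (same version as in the launch script; adjust if needed)
module load R/3.6.2
# Run a single case with N = 250 and mu = 0; prints 250 random draws
Rscript sim_job.R --args 250 0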
6.2 Sample Parameter Inputs
inputs.txt
250 0
500 0
750 0
250 1.5
500 1.5
750 1.5
Download a copy onto the cluster with:
# Download a pre-made inputs.txt onto the cluster
wget https://hpc.thecoatlessprofessor.com/slurm/scripts/inputs.txt
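Each line of inputs.txt supplies the parameters for one array task. The launch script in the next section pulls out a single line by its line number with sed; for example, assuming inputs.txt is in the current directory:
# Print line 3 of inputs.txt (the parameters task 3 will receive)
sed -n 3p inputs.txt
# 750 0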
Note: Parameters are best generated using expand.grid().
N_vals  = c(250, 500, 750)
mu_vals = c(0, 1.5)

sim_frame = expand.grid(N = N_vals, mu = mu_vals)
sim_frame
# 250 0.0
# 500 0.0
# 750 0.0
# 250 1.5
# 500 1.5
# 750 1.5
Write the simulation parameter configuration to inputs.txt with:
write.table(sim_frame, file = "inputs.txt",
            col.names = FALSE, row.names = FALSE)
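As a quick sanity check, the generated file can be printed back out on the command line; it should reproduce the six space-separated, header-free lines shown above:
cat inputs.txt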
6.3 Array Job Launch
sim_array_launch.slurm
#!/bin/bash
## Describe requirements for computing ----
## Name the job to ID it in squeue -u $USER
#SBATCH --job-name=myjobarray
## Send email on any change in job status (NONE, BEGIN, END, FAIL, ALL)
## Note: To be notified on each task on the array use: ALL,ARRAY_TASKS
#SBATCH --mail-type=ALL
## Email address of where the notification should be sent.
#SBATCH --mail-user=netid@illinois.edu
## Amount of time the job should run
## Note: specified in hour:min:sec, e.g. 01:30:00 is a 1 hour and 30 min job.
#SBATCH --time=00:10:00
## Request a single task (each array task runs independently on one node)
#SBATCH --ntasks=1
## Specify number of CPU cores for parallel jobs
## Note: Leave at 1 if not running in parallel.
#SBATCH --cpus-per-task=1
## Request a maximum amount of RAM per CPU core
## Note: For memory intensive work, set to a higher amount of ram.
#SBATCH --mem-per-cpu=5gb
## Standard output and error log
#SBATCH --output=myjobarray_%A-%a.out
## Array range: launch tasks 1 through 6, one per line of inputs.txt
#SBATCH --array=1-6
## Setup computing environment for job ----
## Create a directory for the data output based on the SLURM_ARRAY_JOB_ID
mkdir -p ${SLURM_SUBMIT_DIR}/${SLURM_ARRAY_JOB_ID}
## Switch directory into job ID (puts all output here)
cd ${SLURM_SUBMIT_DIR}/${SLURM_ARRAY_JOB_ID}
## Run simulation ----
## Load a pre-set version of R
module load R/3.6.2
## Grab the appropriate line from the input file.
## Put that in a shell variable named "PARAMS"
export PARAMS=$(sed -n "${SLURM_ARRAY_TASK_ID}p" ${HOME}/inputs.txt)
## Run R script in batch mode without file output
Rscript $HOME/sim_job.R --args $PARAMS
Download a copy and run it on the cluster with:
# Download script file
wget https://hpc.thecoatlessprofessor.com/slurm/scripts/sim_array_launch.slurm
# Queue the job on the Cluster
sbatch sim_array_launch.slurm
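Once submitted, the array can be monitored with the usual SLURM commands; array tasks appear in the queue as jobid_taskid entries, and <jobid> below stands in for whatever number sbatch reports.
# Check the status of your jobs, including each array task
squeue -u $USER
# Cancel the whole array if something went wrong (replace <jobid> with the reported ID)
scancel <jobid>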
Note: %A will be replaced by the value of the SLURM_ARRAY_JOB_ID environment variable and %a by the value of the SLURM_ARRAY_TASK_ID environment variable. SLURM_ARRAY_JOB_ID is the number assigned to the overall array job in the queue, while SLURM_ARRAY_TASK_ID is the index of the individual task within the array. In this example, SLURM_ARRAY_TASK_ID takes on values from 1 to 6.
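For instance, if the scheduler assigned a (hypothetical) array job ID of 123456, the --output pattern above would produce one log file per task:
myjobarray_123456-1.out
myjobarray_123456-2.out
...
myjobarray_123456-6.out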