6 Job Arrays
When you have many independent samples to process — dozens or hundreds of files — submitting one job per sample is tedious and error-prone. A SLURM job array lets you submit once and run many sub-jobs, each automatically assigned a unique index.
- Submit once, run hundreds of times
- One parent job ID tracks the entire batch
- Control how many sub-jobs run concurrently
- Trivially scalable from 10 to 1,000 samples
6.1 The Core Concept: $SLURM_ARRAY_TASK_ID
SLURM sets this environment variable automatically in each sub-job. Your script uses it to select a different input for each task.
| Variable | Description |
|---|---|
| $SLURM_ARRAY_JOB_ID | Parent job ID (same for all sub-jobs) |
| $SLURM_ARRAY_TASK_ID | Index for this specific sub-job (1, 2, 3…) |
| $SLURM_JOB_ID | Unique job ID for this individual sub-job |
Array jobs appear in squeue as 12345_1, 12345_2, etc.
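As a minimal sketch of how a script might read these variables, the helper below builds the same `parent_index` label that squeue displays and refuses to continue when it is not running inside an array sub-job. The function name `task_label` and its error message are illustrative assumptions, not part of SLURM.

```shell
#!/bin/bash
# task_label: hypothetical helper, not a SLURM command.
# Prints "<parent>_<index>", the form squeue shows for array sub-jobs.
task_label() {
    # SLURM sets both variables only inside an array sub-job.
    if [ -z "${SLURM_ARRAY_JOB_ID:-}" ] || [ -z "${SLURM_ARRAY_TASK_ID:-}" ]; then
        echo "ERROR: not running inside a job array" >&2
        return 1
    fi
    echo "${SLURM_ARRAY_JOB_ID}_${SLURM_ARRAY_TASK_ID}"
}
```

A job script could call `LABEL=$(task_label) || exit 1` near the top, so that running the script by hand on a login node stops early instead of processing nothing.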
6.2 Three-Script Architecture
A two-file layout was the best practice we saw earlier for a single job script. A job array adds enough complexity that a three-file layout is cleaner:
- config.sh: all paths and resource parameters in one place
- run_fastqc.sh (launcher): calculates the array size and submits
- fastqc.sh (job script): does the actual work using $SLURM_ARRAY_TASK_ID
6.2.1 Component 1: config.sh
Centralize every variable that might change between runs. The launcher and job script both source this file.
# config.sh
export ID="MY_PROJECT"
export JOB_NAME="fastqc"   # job name and log-file prefix read by the launcher
export IN_LIST="/rs1/researchers/s/smith/project/sample_list.txt"
export IN_DIR="/rs1/researchers/s/smith/project/raw_reads"
export FASTQC_SIF="/rs1/shares/brc/admin/containers/images/quay.io_biocontainers_fastqc:0.12.1--hdfd78af_0.sif"
mkdir -p /share/$GROUP/$USER/01_fastqc_results
export OUT_DIR="/share/$GROUP/$USER/01_fastqc_results"
# Resource parameters (used by launcher)
export CPUS=2
export MEM="4G"
export TIME="2:00:00"
export PARTITION="shared"
Note that the config file exports its variables (export VARIABLE="value") instead of setting them directly as we did before. The launcher sources this file and then submits jobs, and only exported variables are passed on to child processes such as sbatch and the jobs it starts.
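The difference between a plain assignment and an exported one can be seen in a small experiment. This is only an illustration: `DEMO_PLAIN`, `DEMO_SHARED`, and `child_view` are made-up names, and the child process here is a plain `sh`, not a SLURM job.

```shell
#!/bin/bash
# DEMO_PLAIN and DEMO_SHARED are made-up names for this illustration.
DEMO_PLAIN="set only in this shell"
export DEMO_SHARED="exported to children"

# child_view starts a separate process and reports which of the two
# variables that process can see.
child_view() {
    sh -c 'echo "DEMO_PLAIN=${DEMO_PLAIN:-<unset>} | DEMO_SHARED=${DEMO_SHARED:-<unset>}"'
}

child_view
# prints: DEMO_PLAIN=<unset> | DEMO_SHARED=exported to children
```

Only the exported variable reaches the child process, which is why config.sh uses export throughout.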
Always validate that critical files exist before launching. Jobs that fail immediately waste queue time and your fair-share allocation.
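One possible pre-flight check is sketched below. It assumes the sample list holds one sample name per line and that input files are named `<sample>_*.fastq*` inside the input directory. The function name `check_inputs` is hypothetical and the naming pattern should be adjusted to match your data.

```shell
#!/bin/bash
# check_inputs LIST DIR: hypothetical pre-flight helper.
# Assumes one sample name per line in LIST and input files named
# DIR/<sample>_*.fastq*.
check_inputs() {
    local list="$1" dir="$2" missing=0 sample
    [ -s "$list" ] || { echo "ERROR: sample list missing or empty: $list" >&2; return 1; }
    [ -d "$dir" ]  || { echo "ERROR: input dir not found: $dir" >&2; return 1; }
    # The "|| [ -n ... ]" part keeps a last line that has no trailing newline.
    while IFS= read -r sample || [ -n "$sample" ]; do
        [ -n "$sample" ] || continue      # skip blank lines
        # ls exits non-zero when the glob matches no file
        if ! ls "$dir/${sample}_"*.fastq* >/dev/null 2>&1; then
            echo "MISSING: no FASTQ files for $sample" >&2
            missing=$((missing + 1))
        fi
    done < "$list"
    # Succeed only if every sample had at least one input file.
    [ "$missing" -eq 0 ]
}
```

A launcher could run `check_inputs "$IN_LIST" "$IN_DIR" || exit 1` before calling sbatch, so that a misspelled sample name is caught on the login node and not after the job has waited in the queue.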
6.2.2 Component 2: run_fastqc.sh (launcher)
Runs on the login node. Sources the config, calculates array size from the sample list, and submits.
#!/bin/bash
source ./config.sh
# Validate that the input files exist
[[ -f "$IN_LIST" ]] || { echo "ERROR: sample list not found: $IN_LIST"; exit 1; }
# In this example, every line in "$IN_LIST" is one sample to be processed
NUM_SAMPLES=$(wc -l < "$IN_LIST")
echo "Submitting ${JOB_NAME} array for ${NUM_SAMPLES} samples..."
# --parsable makes sbatch print only the job ID, so JOB_ID can be passed to squeue
JOB_ID=$(sbatch \
--parsable \
--job-name="${JOB_NAME}" \
--array="1-${NUM_SAMPLES}%${NUM_SAMPLES}" \
--ntasks=1 \
--cpus-per-task="${CPUS}" \
--mem="${MEM}" \
--partition="${PARTITION}" \
--time="${TIME}" \
--output="logs/${JOB_NAME}.%A_%a.out" \
--error="logs/${JOB_NAME}.%A_%a.err" \
./fastqc.sh)
echo "Submitted job array ID: ${JOB_ID}"
echo "Monitor with: squeue -j ${JOB_ID}"
Array syntax explained:
| Part | Example | Meaning |
|---|---|---|
| --array | --array=1-50 | Create sub-jobs with indices 1 through 50 |
| %N suffix | --array=1-50%10 | Limit to 10 running concurrently |
Output tokens for array jobs:
- %A = array job ID (parent)
- %a = array task index
- %j = unique job ID for each sub-job
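The array value itself can be built by a small helper, sketched here under the assumption that the sample list has one sample per line with no blank lines in the middle. The name `array_spec` and its optional concurrency argument are hypothetical. Counting with `grep -c .` where the launcher uses `wc -l` is a swapped-in choice: it ignores a trailing blank line and still counts a final line that lacks a newline.

```shell
#!/bin/bash
# array_spec LIST [MAX_CONCURRENT]: hypothetical helper.
# Prints a value for --array, such as "1-50" or "1-50%10".
array_spec() {
    local list="$1" max="${2:-}" n
    [ -f "$list" ] || { echo "ERROR: sample list not found: $list" >&2; return 1; }
    # Count non-empty lines. Blank lines in the middle of the list would
    # still shift line numbers, so the list should not contain any.
    n=$(grep -c . "$list" || true)
    [ "$n" -gt 0 ] || { echo "ERROR: no samples in $list" >&2; return 1; }
    # Add the %N throttle only when it is lower than the number of tasks.
    if [ -n "$max" ] && [ "$max" -lt "$n" ]; then
        echo "1-${n}%${max}"
    else
        echo "1-${n}"
    fi
}
```

A launcher could then pass `--array="$(array_spec "$IN_LIST" 10)"`, where 10 is an arbitrary example limit that depends on what your cluster allows.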
6.2.3 Component 3: fastqc.sh (job script)
Contains the actual work. Note there are no #SBATCH resource directives here — they’re all passed by the launcher. The critical line is how $SLURM_ARRAY_TASK_ID selects one sample from the list.
#!/bin/bash
source ./config.sh
# Select the sample for this sub-job: sed prints the line of "$IN_LIST" whose number equals ${SLURM_ARRAY_TASK_ID}
SAMPLE=$(sed -n "${SLURM_ARRAY_TASK_ID}p" "$IN_LIST")
echo "Task ${SLURM_ARRAY_TASK_ID}: processing ${SAMPLE}"
echo "Job ID: ${SLURM_JOB_ID} | Host: $(hostname) | Started: $(date)"
# Validate inputs
[[ -f "${FASTQC_SIF}" ]] || { echo "ERROR: container not found"; exit 1; }
[[ -d "${IN_DIR}" ]] || { echo "ERROR: input dir not found"; exit 1; }
# Execute
module load apptainer
apptainer exec "${FASTQC_SIF}" fastqc \
--threads "${SLURM_CPUS_PER_TASK}" \
--outdir "${OUT_DIR}" \
"${IN_DIR}/${SAMPLE}_"*.fastq*
echo "Completed: $(date)"
The line SAMPLE=$(sed -n "${SLURM_ARRAY_TASK_ID}p" "$IN_LIST") is what makes each sub-job unique:
- Task 1 → line 1 → sample_001
- Task 2 → line 2 → sample_002
- Task 3 → line 3 → sample_003
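The mapping above can be tried outside SLURM with a small wrapper around the same sed call. The wrapper name `sample_for_task` is hypothetical, and the check for an empty result is an addition for illustration: it catches an index that points past the end of the list.

```shell
#!/bin/bash
# sample_for_task LIST INDEX: prints line INDEX of LIST (1-based),
# using the same sed call as fastqc.sh. Hypothetical wrapper that
# also fails when no sample is found on that line.
sample_for_task() {
    local list="$1" idx="$2" sample
    sample=$(sed -n "${idx}p" "$list")
    if [ -z "$sample" ]; then
        echo "ERROR: no sample on line $idx of $list" >&2
        return 1
    fi
    echo "$sample"
}
```

For example, `sample_for_task sample_list.txt 2` prints the second line of the list, which is the sample that array task 2 would process.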
6.3 Running the Pipeline
$ mkdir -p logs
$ ./run_fastqc.sh
Monitor progress:
$ squeue -u $USER # all your jobs
$ squeue -j 12345 # specific array
$ squeue -j 12345 --format="%i %T %r" # index, state, reason
6.4 Advanced: Non-Sequential and Failed-Job Resubmission
# Run only odd-numbered tasks
$ sbatch --array=1-100:2 fastqc.sh
# Run specific indices
$ sbatch --array=5,12,23 fastqc.sh # resubmit only failed tasks
# Skip certain indices (submit 1-100 except 7 and 15)
$ sbatch --array=1-6,8-14,16-100 fastqc.sh
6.5 Identifying Failed Sub-Jobs
# Check exit codes for all sub-jobs
$ sacct -j 12345 --format=JobID,State,ExitCode
# List only failed sub-jobs
$ sacct -j 12345 --format=JobID,State --noheader \
| awk '$2 != "COMPLETED" {print $1}'
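The list of failed sub-jobs can be turned into a value for --array with a short filter, sketched below. It assumes every task has already finished, that the first column holds job IDs of the form `<parent>_<index>`, and that step lines such as `12345_2.batch` should be ignored. The function name `failed_array_spec` is hypothetical.

```shell
#!/bin/bash
# failed_array_spec: hypothetical helper.
# Reads "JobID State" lines on stdin (for example from
# sacct --format=JobID,State --noheader) and prints the indices of
# sub-jobs that did not complete, joined by commas.
failed_array_spec() {
    # Skip step lines (JobID contains a dot), keep states other than
    # COMPLETED, and print the part of the JobID after the underscore.
    awk '$1 !~ /\./ && $2 != "COMPLETED" { split($1, p, "_"); print p[2] }' |
        sort -n |
        paste -sd, -
}
```

Piping the sacct command above into `failed_array_spec` would give a string such as `2,3`, which can be passed to sbatch as `--array=2,3`. Tasks that are still pending or running would also be listed, so this only makes sense once the whole array has finished.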