6 Job Arrays

When you have many independent samples to process — dozens or hundreds of files — submitting one job per sample is tedious and error-prone. A SLURM job array lets you submit once and run many sub-jobs, each automatically assigned a unique index.

Why Use Job Arrays?

Submit once, run hundreds of times
One parent job ID tracks the entire batch
Control how many sub-jobs run concurrently
Trivially scalable from 10 to 1,000 samples

6.1 The Core Concept: `$SLURM_ARRAY_TASK_ID`

SLURM sets this environment variable automatically in each sub-job. Your script uses it to select a different input for each task.

Variable	Description
`$SLURM_ARRAY_JOB_ID`	Parent job ID (same for all sub-jobs)
`$SLURM_ARRAY_TASK_ID`	Index for this specific sub-job (1, 2, 3…)
`$SLURM_JOB_ID`	Unique job ID assigned to this individual sub-job

Array jobs appear in squeue as 12345_1, 12345_2, etc.
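
As a minimal sketch of how a script might read these variables, the helper below builds the same `<array job ID>_<task index>` label that squeue displays. The function name task_label and the fallback values are assumptions made for illustration, so the snippet can also be tried outside SLURM, where the variables are unset.

```shell
#!/bin/bash
# task_label (hypothetical helper): print "<array job ID>_<task index>".
# Outside SLURM the variables are unset, so placeholder defaults are used.
task_label() {
    local array_id="${SLURM_ARRAY_JOB_ID:-0}"
    local task_id="${SLURM_ARRAY_TASK_ID:-1}"
    echo "${array_id}_${task_id}"
}

task_label
```

Inside a real sub-job the defaults are never used, because SLURM sets both variables before the script starts.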

6.2 Three-Script Architecture

Earlier we saw that a two-file layout is best practice for job scripts. A job array adds enough complexity that a three-file layout handles it more cleanly:

config.sh — all paths and resource parameters in one place
run_fastqc.sh (launcher) — calculates array size and submits
fastqc.sh (job script) — does the actual work using $SLURM_ARRAY_TASK_ID

6.2.1 Component 1: `config.sh`

Centralize every variable that might change between runs. The launcher and job script both source this file.

# config.sh
export JOB_NAME="MY_PROJECT"   # read by the launcher for --job-name and the log file names
export IN_LIST="/rs1/researchers/s/smith/project/sample_list.txt"
export IN_DIR="/rs1/researchers/s/smith/project/raw_reads"
export FASTQC_SIF="/rs1/shares/brc/admin/containers/images/quay.io_biocontainers_fastqc:0.12.1--hdfd78af_0.sif"

export OUT_DIR="/share/$GROUP/$USER/01_fastqc_results"
mkdir -p "${OUT_DIR}"

# Resource parameters (used by launcher)
export CPUS=2
export MEM="4G"
export TIME="2:00:00"
export PARTITION="shared"

Notice that the variables in the config file are exported (export VARIABLE="value") rather than simply assigned, as we did before. The launcher sources this file and then calls sbatch, and only exported variables are copied into the environment of child processes. Exporting therefore ensures that every process started from the launcher sees the same values.
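
The difference between assigning and exporting can be seen in a small experiment. This is only an illustrative sketch: the variable names DEMO_PLAIN and DEMO_SHARED are made up, and `bash -c` stands in for any child process the launcher might start.

```shell
#!/bin/bash
DEMO_PLAIN="assigned only"       # visible in this shell only
export DEMO_SHARED="exported"    # copied into the environment of child processes

# bash -c starts a child process, much as the launcher starts sbatch
child_view=$(bash -c 'echo "PLAIN=${DEMO_PLAIN:-<unset>} SHARED=${DEMO_SHARED:-<unset>}"')
echo "${child_view}"
```

The child prints `PLAIN=<unset> SHARED=exported`: the plain assignment never leaves the parent shell.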

Warning

Always validate that critical files exist before launching. Jobs that fail immediately waste queue time and your fair-share allocation.
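
One way to act on this warning is a pre-flight check that runs on the login node before anything is submitted. The sketch below assumes the naming scheme used by the job script in this chapter (input files matching `<sample>_*.fastq*`). The function name check_samples is hypothetical.

```shell
#!/bin/bash
# check_samples LIST DIR (hypothetical helper):
# print every sample in LIST that has no matching FASTQ file in DIR.
# Returns 0 if all samples have input files, 1 otherwise.
check_samples() {
    local list="$1" dir="$2" missing=0 sample
    # "|| [[ -n ... ]]" also handles a final line without a trailing newline
    while IFS= read -r sample || [[ -n "$sample" ]]; do
        [[ -z "$sample" ]] && continue          # skip blank lines
        # compgen -G succeeds only if the glob matches at least one path
        if ! compgen -G "${dir}/${sample}_*.fastq*" > /dev/null; then
            echo "MISSING: ${sample}"
            missing=1
        fi
    done < "$list"
    return "$missing"
}
```

In a launcher this could be called as `check_samples "$IN_LIST" "$IN_DIR" || exit 1`, so that a typo in the sample list is caught before any sub-job enters the queue.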

6.2.2 Component 2: `run_fastqc.sh` (launcher)

Runs on the login node. Sources the config, calculates array size from the sample list, and submits.

#!/bin/bash
source ./config.sh

# Validate that the input files exist
[[ -f "$IN_LIST" ]] || { echo "ERROR: sample list not found: $IN_LIST"; exit 1; }

# In this example, every line in the "$IN_LIST" file is a sample to be processed
NUM_SAMPLES=$(wc -l < "$IN_LIST")

echo "Submitting ${JOB_NAME} array for ${NUM_SAMPLES} samples..."

# --parsable makes sbatch print only the job ID, so it can be reused below
JOB_ID=$(sbatch \
    --parsable \
    --job-name="${JOB_NAME}" \
    --array="1-${NUM_SAMPLES}%${NUM_SAMPLES}" \
    --ntasks=1 \
    --cpus-per-task="${CPUS}" \
    --mem="${MEM}" \
    --partition="${PARTITION}" \
    --time="${TIME}" \
    --output="logs/${JOB_NAME}.%A_%a.out" \
    --error="logs/${JOB_NAME}.%A_%a.err" \
    ./fastqc.sh)

echo "Submitted job array ID: ${JOB_ID}"
echo "Monitor with: squeue -j ${JOB_ID}"

Array syntax explained:

Part	Example	Meaning
`--array`	`--array=1-50`	Create sub-jobs with indices 1 through 50
`%N` suffix	`--array=1-50%10`	Limit to 10 running concurrently
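
The two parts of the array value can be assembled by a small helper. This is a sketch under stated assumptions: array_spec is a hypothetical function, the default limit of 10 concurrent sub-jobs is arbitrary, and lines are counted with `grep -c ''` instead of `wc -l` so that a last line without a trailing newline is still counted.

```shell
#!/bin/bash
# array_spec LIST [MAX_CONCURRENT] (hypothetical helper):
# print a value for --array of the form "1-N%M", where N is the number of
# lines in LIST and M is the concurrency limit, capped at N.
array_spec() {
    local list="$1" max="${2:-10}" n
    [[ -f "$list" ]] || { echo "ERROR: sample list not found: $list" >&2; return 1; }
    # grep -c '' counts every line, including an unterminated final line
    n=$(grep -c '' "$list" || true)
    if (( n == 0 )); then
        echo "ERROR: sample list is empty: $list" >&2
        return 1
    fi
    if (( max > n )); then
        max=$n
    fi
    echo "1-${n}%${max}"
}
```

A launcher could then pass `--array="$(array_spec "$IN_LIST" 10)"` to sbatch. For a 50-line list this yields `1-50%10`, matching the second row of the table.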

Output filename tokens for array jobs:

%A = array job ID (parent)
%a = array task index
%j = unique job ID for each sub-job

6.2.3 Component 3: `fastqc.sh` (job script)

Contains the actual work. Note there are no #SBATCH resource directives here — they’re all passed by the launcher. The critical line is how $SLURM_ARRAY_TASK_ID selects one sample from the list.

#!/bin/bash
source ./config.sh

# Select the sample for this sub-job: sed prints only the line of "$IN_LIST" whose number equals ${SLURM_ARRAY_TASK_ID}
SAMPLE=$(sed -n "${SLURM_ARRAY_TASK_ID}p" "$IN_LIST")

echo "Task ${SLURM_ARRAY_TASK_ID}: processing ${SAMPLE}"
echo "Job ID: ${SLURM_JOB_ID} | Host: $(hostname) | Started: $(date)"

# Validate inputs
[[ -f "${FASTQC_SIF}" ]] || { echo "ERROR: container not found"; exit 1; }
[[ -d "${IN_DIR}" ]] || { echo "ERROR: input dir not found"; exit 1; }

# Execute
module load apptainer

apptainer exec "${FASTQC_SIF}" fastqc \
    --threads "${SLURM_CPUS_PER_TASK}" \
    --outdir "${OUT_DIR}" \
    "${IN_DIR}/${SAMPLE}_"*.fastq*

echo "Completed: $(date)"

Important

The line SAMPLE=$(sed -n "${SLURM_ARRAY_TASK_ID}p" "$IN_LIST") is what makes each sub-job unique:

Task 1 → line 1 → sample_001
Task 2 → line 2 → sample_002
Task 3 → line 3 → sample_003
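
The mapping above can be tried without submitting anything. The sketch below wraps the same sed call in a hypothetical helper, select_sample, which additionally fails when the requested line does not exist. That check is an addition for illustration and is not part of the job script above.

```shell
#!/bin/bash
# select_sample LIST INDEX (hypothetical helper):
# print line INDEX of LIST, or fail if that line is missing or empty.
select_sample() {
    local list="$1" index="$2" sample
    sample=$(sed -n "${index}p" "$list")
    if [[ -z "$sample" ]]; then
        echo "ERROR: no sample on line ${index} of ${list}" >&2
        return 1
    fi
    echo "$sample"
}
```

Inside a job script it could be used as `SAMPLE=$(select_sample "$IN_LIST" "$SLURM_ARRAY_TASK_ID") || exit 1`, which stops a sub-job early if the array range is larger than the sample list.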

6.3 Running the Pipeline

$ mkdir -p logs
$ ./run_fastqc.sh

Monitor progress:

$ squeue -u $USER                    # all your jobs
$ squeue -j 12345                    # specific array
$ squeue -j 12345 --format="%i %T %r"  # job ID (with array index), state, reason

6.4 Advanced: Non-Sequential and Failed-Job Resubmission

# Run only odd-numbered tasks
$ sbatch --array=1-100:2 fastqc.sh

# Run specific indices
$ sbatch --array=5,12,23 fastqc.sh    # resubmit only failed tasks

# Skip certain indices (submit 1-100 except 7 and 15)
$ sbatch --array=1-6,8-14,16-100 fastqc.sh

6.5 Identifying Failed Sub-Jobs

# Check exit codes for all sub-jobs
$ sacct -j 12345 --format=JobID,State,ExitCode

# List only failed sub-jobs (-X reports one row per sub-job and hides the .batch/.extern steps)
$ sacct -X -j 12345 --format=JobID,State --noheader \
  | awk '$2 != "COMPLETED" {print $1}'
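
The list of failed sub-jobs can be turned directly into a value for `--array`. The sketch below assumes its input is two whitespace-separated columns (JobID and State) with job IDs of the form `12345_7`. The function name failed_array_spec is hypothetical. Rows for job steps such as `12345_7.batch` are ignored.

```shell
#!/bin/bash
# failed_array_spec (hypothetical helper):
# read "JobID State" rows on stdin and print the task indices of all
# sub-jobs whose state is not COMPLETED, as a comma-separated list.
failed_array_spec() {
    awk '$2 != "COMPLETED" && $1 ~ /^[0-9]+_[0-9]+$/ {
             split($1, parts, "_")   # parts[2] is the array task index
             print parts[2]
         }' \
        | sort -n \
        | paste -sd, -
}
```

A possible use, once the whole array has finished, is `sacct -X -j 12345 --format=JobID,State --noheader | failed_array_spec`. The result (for example `5,12,23`) can be passed to `sbatch --array=...` to resubmit only those tasks. Sub-jobs that are still pending or running are also not COMPLETED, so they would be included if the helper is run too early.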
