6  Job Arrays

When you have many independent samples to process — dozens or hundreds of files — submitting one job per sample is tedious and error-prone. A SLURM job array lets you submit once and run many sub-jobs, each automatically assigned a unique index.

Note: Why Use Job Arrays?
  • Submit once, run hundreds of times
  • One parent job ID tracks the entire batch
  • Control how many sub-jobs run concurrently
  • Trivially scalable from 10 to 1,000 samples

6.1 The Core Concept: $SLURM_ARRAY_TASK_ID

SLURM sets this environment variable automatically in each sub-job. Your script uses it to select a different input for each task.

Variable               Description
$SLURM_ARRAY_JOB_ID    Parent job ID (same for all sub-jobs)
$SLURM_ARRAY_TASK_ID   Index for this specific sub-job (1, 2, 3…)
$SLURM_JOB_ID          Unique job ID for this individual sub-job

Array jobs appear in squeue as 12345_1, 12345_2, etc.
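
As a rough illustration of how these variables relate, the sketch below rebuilds the squeue label from the two array variables. Outside a running array job the variables are unset, so the fallback values 12345 and 3 are placeholders chosen for this example, and TASK_LABEL is a name introduced here.

```shell
# Fall back to placeholder values when not running inside a SLURM array job.
ARRAY_JOB_ID="${SLURM_ARRAY_JOB_ID:-12345}"
TASK_ID="${SLURM_ARRAY_TASK_ID:-3}"

# squeue shows each sub-job as <parent job ID>_<task index>.
TASK_LABEL="${ARRAY_JOB_ID}_${TASK_ID}"
echo "This sub-job is listed in squeue as ${TASK_LABEL}"
```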

6.2 Three-Script Architecture

We previously saw that a two-file system was best practice for ordinary job scripts. To handle the added complexity of a job array cleanly, use a three-file system:

  1. config.sh — all paths and resource parameters in one place
  2. run_fastqc.sh (launcher) — calculates array size and submits
  3. fastqc.sh (job script) — does the actual work using $SLURM_ARRAY_TASK_ID

6.2.1 Component 1: config.sh

Centralize every variable that might change between runs. The launcher and job script both source this file.

# config.sh
export ID="MY_PROJECT"
export JOB_NAME="${ID}"   # job name used by the launcher for --job-name and log files
export IN_LIST="/rs1/researchers/s/smith/project/sample_list.txt"
export IN_DIR="/rs1/researchers/s/smith/project/raw_reads"
export FASTQC_SIF="/rs1/shares/brc/admin/containers/images/quay.io_biocontainers_fastqc:0.12.1--hdfd78af_0.sif"

export OUT_DIR="/share/$GROUP/$USER/01_fastqc_results"
mkdir -p "$OUT_DIR"

# Resource parameters (used by launcher)
export CPUS=2
export MEM="4G"
export TIME="2:00:00"
export PARTITION="shared"

You may notice that the variables in the config file are exported (export VARIABLE="") rather than set directly, as we saw before. The launcher sources the config file and then submits other jobs, so exporting ensures that every child process it starts, including sbatch, receives the same variables.
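
The difference between a plain assignment and an exported one can be checked with a child shell. This is a minimal sketch; PLAIN_VAR, SHARED_VAR, and CHILD_VIEW are names made up for the demonstration.

```shell
PLAIN_VAR="set but not exported"
export SHARED_VAR="exported"

# A child process inherits only exported variables.
CHILD_VIEW=$(sh -c 'echo "PLAIN_VAR=${PLAIN_VAR:-UNSET} SHARED_VAR=${SHARED_VAR:-UNSET}"')
echo "$CHILD_VIEW"
```

The child shell sees SHARED_VAR but not PLAIN_VAR, which is the behavior the config file relies on.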

Warning

Always validate that critical files exist before launching. Jobs that fail immediately waste queue time and your fair-share allocation.

6.2.2 Component 2: run_fastqc.sh (launcher)

Runs on the login node. Sources the config, calculates array size from the sample list, and submits.

#!/bin/bash
source ./config.sh

# Validate that the input files exist
[[ -f "$IN_LIST" ]] || { echo "ERROR: sample list not found: $IN_LIST"; exit 1; }

# In this example, every line in the "$IN_LIST" file is a sample to be processed
NUM_SAMPLES=$(wc -l < "$IN_LIST")

echo "Submitting ${JOB_NAME} array for ${NUM_SAMPLES} samples..."

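# --parsable makes sbatch print only the job ID, so JOB_ID holds a bare number.
# The value after % caps concurrent sub-jobs; set to NUM_SAMPLES it imposes no real limit.
JOB_ID=$(sbatch \
    --parsable \
    --job-name="${JOB_NAME}" \
    --array="1-${NUM_SAMPLES}%${NUM_SAMPLES}" \
    --ntasks=1 \
    --cpus-per-task="${CPUS}" \
    --mem="${MEM}" \
    --partition="${PARTITION}" \
    --time="${TIME}" \
    --output="logs/${JOB_NAME}.%A_%a.out" \
    --error="logs/${JOB_NAME}.%A_%a.err" \
    ./fastqc.sh)
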
echo "Submitted job array ID: ${JOB_ID}"
echo "Monitor with: squeue -j ${JOB_ID}"
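
One detail worth checking in the launcher is the line count. wc -l counts newline characters, so a sample list whose last line lacks a trailing newline is undercounted by one, and the final sample would never get a task. The sketch below reproduces this with a temporary file. Counting non-empty lines with grep -c . is one possible alternative, assuming the list has no blank lines between samples (otherwise line numbers and task indices would no longer match).

```shell
# Temporary three-sample list written WITHOUT a trailing newline.
LIST=$(mktemp)
printf 'sample_001\nsample_002\nsample_003' > "$LIST"

WC_COUNT=$(wc -l < "$LIST")        # counts newline characters
GREP_COUNT=$(grep -c . "$LIST")    # counts non-empty lines

echo "wc -l: $((WC_COUNT))  grep -c: ${GREP_COUNT}"
rm -f "$LIST"
```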

Array syntax explained:

Part        Example            Meaning
--array     --array=1-50       Create sub-jobs with indices 1 through 50
%N suffix   --array=1-50%10    Limit to 10 running concurrently
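
A sketch of how a concurrency cap could be built into the array specification follows. MAX_CONCURRENT is a hypothetical setting that is not part of the config.sh shown earlier, and the numbers are illustrative.

```shell
NUM_SAMPLES=50
MAX_CONCURRENT=10   # hypothetical cap; not defined in the original config.sh

# Never request a cap larger than the array itself.
if [ "$MAX_CONCURRENT" -gt "$NUM_SAMPLES" ]; then
    MAX_CONCURRENT=$NUM_SAMPLES
fi

ARRAY_SPEC="1-${NUM_SAMPLES}%${MAX_CONCURRENT}"
echo "--array=${ARRAY_SPEC}"
```

Keeping the cap in one variable would let it be tuned alongside the other resource parameters without editing the sbatch call.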

Output tokens for array jobs:

  • %A = array job ID (parent)
  • %a = array task index
  • %j = unique job ID for each sub-job

6.2.3 Component 3: fastqc.sh (job script)

Contains the actual work. Note there are no #SBATCH resource directives here — they’re all passed by the launcher. The critical line is how $SLURM_ARRAY_TASK_ID selects one sample from the list.

#!/bin/bash
source ./config.sh

# Select the sample for this sub-job: sed prints the line of "$IN_LIST" whose number equals ${SLURM_ARRAY_TASK_ID}
SAMPLE=$(sed -n "${SLURM_ARRAY_TASK_ID}p" "$IN_LIST")

echo "Task ${SLURM_ARRAY_TASK_ID}: processing ${SAMPLE}"
echo "Job ID: ${SLURM_JOB_ID} | Host: $(hostname) | Started: $(date)"

# Validate inputs
[[ -f "${FASTQC_SIF}" ]] || { echo "ERROR: container not found"; exit 1; }
[[ -d "${IN_DIR}" ]] || { echo "ERROR: input dir not found"; exit 1; }

# Execute
module load apptainer

apptainer exec "${FASTQC_SIF}" fastqc \
    --threads "${SLURM_CPUS_PER_TASK}" \
    --outdir "${OUT_DIR}" \
    "${IN_DIR}/${SAMPLE}_"*.fastq*

echo "Completed: $(date)"

Important

The line SAMPLE=$(sed -n "${SLURM_ARRAY_TASK_ID}p" "$IN_LIST") is what makes each sub-job unique:

  • Task 1 → line 1 → sample_001
  • Task 2 → line 2 → sample_002
  • Task 3 → line 3 → sample_003
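
To see the mapping without submitting anything, the selection can be simulated locally. The sketch below writes a three-line list to a temporary file and sets SLURM_ARRAY_TASK_ID by hand; inside a real array job SLURM sets it. The sample names follow the example above. Index 4 is included to show that an out-of-range index yields an empty SAMPLE, which a job script may want to treat as an error.

```shell
LIST=$(mktemp)
printf '%s\n' sample_001 sample_002 sample_003 > "$LIST"

SELECTED=""
for SLURM_ARRAY_TASK_ID in 1 2 3 4; do
    # Same selection command as in fastqc.sh, pointed at the temporary list.
    SAMPLE=$(sed -n "${SLURM_ARRAY_TASK_ID}p" "$LIST")
    if [ -z "$SAMPLE" ]; then
        echo "Task ${SLURM_ARRAY_TASK_ID}: no sample on that line"
        continue
    fi
    echo "Task ${SLURM_ARRAY_TASK_ID} -> ${SAMPLE}"
    SELECTED="${SELECTED}${SAMPLE} "
done
rm -f "$LIST"
```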

6.3 Running the Pipeline

$ mkdir -p logs
$ ./run_fastqc.sh

Monitor progress:

$ squeue -u $USER                    # all your jobs
$ squeue -j 12345                    # specific array
$ squeue -j 12345 --format="%i %T %r"  # job ID (with array index), state, reason

6.4 Advanced: Non-Sequential Indices and Resubmitting Failed Tasks

# Run only odd-numbered tasks
$ sbatch --array=1-100:2 fastqc.sh

# Run specific indices
$ sbatch --array=5,12,23 fastqc.sh    # resubmit only failed tasks

# Skip certain indices (submit 1-100 except 7 and 15)
$ sbatch --array=1-6,8-14,16-100 fastqc.sh
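
Before submitting a stepped range, it can help to preview which indices it covers. The sketch below uses seq, whose start, step, and end arguments mirror the start-end:step form of --array. It is a local approximation for checking the arithmetic, not a SLURM command, and the short range 1-9:2 is used only to keep the output small.

```shell
# --array=1-9:2 corresponds to: start at 1, step by 2, stop at or before 9.
INDICES=$(seq 1 2 9 | paste -sd, -)
echo "--array=1-9:2 covers tasks: ${INDICES}"
```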

6.5 Identifying Failed Sub-Jobs

# Check exit codes for all sub-jobs
$ sacct -j 12345 --format=JobID,State,ExitCode

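# List sub-jobs that did not complete (run this after the array has finished;
# -X prints one line per sub-job and omits the .batch/.extern steps)
$ sacct -X -j 12345 --format=JobID,State --noheader \
  | awk '$2 != "COMPLETED" {print $1}'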
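
The list of unfinished sub-jobs can be turned into an --array specification for resubmission. The sketch below is one way to do it, under stated assumptions: SACCT_LINES is made-up text in the shape of sacct's JobID,State output (not real accounting records), every sub-job has reached a final state so each JobID has the form parent_index, and RESUBMIT is a name introduced here. In practice the printf would be replaced by the sacct command itself.

```shell
# Made-up example input shaped like sacct JobID,State output (illustrative only).
SACCT_LINES='12345_1 COMPLETED
12345_5 FAILED
12345_12 TIMEOUT
12345_23 FAILED'

# Keep tasks that did not complete, strip the parent ID, join with commas.
RESUBMIT=$(printf '%s\n' "$SACCT_LINES" \
    | awk '$2 != "COMPLETED" { sub(/^[0-9]+_/, "", $1); print $1 }' \
    | paste -sd, -)

echo "sbatch --array=${RESUBMIT} fastqc.sh"
```

Because fastqc.sh carries no #SBATCH directives, a resubmission built this way would also need the resource options that the launcher passes.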