5 Best Practices for Job Scripts

The previous chapter showed the mechanics of writing a job script. This chapter is about writing one well — so that it runs reliably the first time, fails loudly when something is wrong, and remains useful six months from now when you (or a collaborator) need to rerun it on new data.

Most of the practices here will feel like extra work the first time you apply them. They pay for themselves the first time a job dies three hours into a run because of a typo in a path, or the first time you can’t remember which version of a script produced a result you need to reproduce.

5.1 Separate Your Config from Your Job Script

The single most useful habit you can build is splitting a job into two files: a config file that holds all paths and parameters, and a job script that holds the SLURM directives and execution logic. The job script sources the config at runtime.

This separation pays off quickly:

When you run the same analysis on a new dataset, you copy and edit the config — the job script stays untouched.
When a collaborator runs your code on a different cluster, they update one file rather than hunting for hardcoded paths scattered through your script.
When something goes wrong six months later, the config file is a snapshot of exactly what inputs and settings produced a given result.

The rule of thumb: if a value would change between runs or between machines, it belongs in the config.

5.1.1 The Config File

The config file is a plain shell script that defines variables. It contains no SLURM directives and no executable logic, only variable assignments. In this example, the path to the FastQC software container is stored in FASTQC_SIF, and the input and output data directories are stored in their own variables.

# fastqc_config.sh
# -------------------------------------------------------
# Edit these values before each run.
# -------------------------------------------------------

# --- Container ---
FASTQC_SIF="/rs1/shares/brc/admin/containers/images/quay.io_biocontainers_fastqc:0.12.1--hdfd78af_0.sif"

# --- Data paths ---
IN_DIR="/rs1/shares/brc/trainings/hazel_hpc/data"
OUT_DIR="fastqc_results"   # relative to the submit directory; an absolute path is safer (see 5.4)
SAMPLE="sample_001"
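The config does nothing on its own until a script sources it. The short demo below writes a throwaway stand-in for fastqc_config.sh to a temporary file and shows why the job script uses source rather than running the config as a separate program: a child bash process gets its own variables, and they disappear when it exits.

```bash
#!/bin/bash
# Demo: why the job script uses `source` instead of running the config.
# The config written here is a throwaway stand-in for fastqc_config.sh.
set -euo pipefail

demo_config="$(mktemp)"
cat > "${demo_config}" <<'EOF'
IN_DIR="/tmp/example_input"
SAMPLE="sample_001"
EOF

# Running the file starts a child shell; its variables vanish when it exits.
bash "${demo_config}"
echo "after bash:   SAMPLE='${SAMPLE:-}'"

# Sourcing runs the assignments in the current shell, so they persist.
source "${demo_config}"
echo "after source: SAMPLE='${SAMPLE}'"

rm -f "${demo_config}"
```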

5.1.2 The Job Script

The job script starts with SLURM resource directives, then turns on strict mode, sources the config, loads modules, validates its inputs, and finally runs the analysis. Paths and parameters come entirely from the config; nothing is hardcoded here. The line source fastqc_config.sh is what makes the config variables available inside the job.

#!/bin/bash
# -------------------------------------------------------
# FastQC job script — pairs with fastqc_config.sh
# -------------------------------------------------------

# --- Resources ---
#SBATCH --job-name=fastqc
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
#SBATCH --mem=4G
#SBATCH --partition=shared
#SBATCH --output=logs/fastqc.%j.out
#SBATCH --error=logs/fastqc.%j.err
#SBATCH --time=1:00:00

set -euo pipefail

# --- Load config ---
source fastqc_config.sh

# --- Environment ---
module load apptainer

# --- Validate inputs ---
[[ -f "${FASTQC_SIF}" ]] || { echo "ERROR: container not found: ${FASTQC_SIF}" >&2; exit 1; }
[[ -d "${IN_DIR}"     ]] || { echo "ERROR: input dir not found: ${IN_DIR}"     >&2; exit 1; }
[[ -n "${SAMPLE:-}"   ]] || { echo "ERROR: SAMPLE not set in config"           >&2; exit 1; }

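# logs/ is not created here: SLURM opens the --output/--error files before
# this script runs, so logs/ must already exist when you submit.
mkdir -p "${OUT_DIR}"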

# --- Provenance ---
echo "Job ${SLURM_JOB_ID} started: $(date) on $(hostname)"
echo "Config: $(realpath fastqc_config.sh)"
echo "IN_DIR=${IN_DIR}  OUT_DIR=${OUT_DIR}  SAMPLE=${SAMPLE}"

# --- Execute ---
apptainer exec "${FASTQC_SIF}" fastqc \
    --threads "${SLURM_CPUS_PER_TASK}" \
    --outdir "${OUT_DIR}" \
    "${IN_DIR}/${SAMPLE}_"*.fastq*

echo "Job completed: $(date) — results in ${OUT_DIR}"

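Create the log directory first (SLURM opens the --output and --error files before the script starts, so it cannot create logs/ for you), then submit:

$ mkdir -p logs
$ sbatch fastqc_job.sh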

The rest of this chapter unpacks the practices baked into that script.
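One optional extension, assuming you want a single job script to serve several configs: sbatch passes any arguments after the script name through to the script as $1, $2, and so on. A sketch of loading whichever config was passed, where load_config and CONFIG_PATH are hypothetical names introduced for illustration:

```bash
#!/bin/bash
# Sketch: choose the config at submit time instead of hardcoding its name.
# load_config and CONFIG_PATH are hypothetical names for illustration.

load_config() {
    # Fall back to fastqc_config.sh when no argument is given.
    local config="${1:-fastqc_config.sh}"
    [[ -f "${config}" ]] || { echo "ERROR: config not found: ${config}" >&2; return 1; }
    # Assignments in a sourced file are global, even when sourced in a function.
    source "${config}"
    CONFIG_PATH="$(realpath "${config}")"
}

# In the job script, in place of `source fastqc_config.sh`:
#   load_config "$@"
```

With this in place, sbatch fastqc_job.sh configs/sample_002_config.sh runs the unchanged job script against a different config, and the provenance line can echo ${CONFIG_PATH} instead of a hardcoded file name.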

5.2 Fail Fast with Strict Bash Mode

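Add this line to the top of every job script, right after the #SBATCH directives. The directives have to come first because sbatch stops reading #SBATCH lines once it reaches the first executable command: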

set -euo pipefail

It changes the default bash behavior in three important ways:

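set -e: exit immediately if a command returns a non-zero exit status. Commands tested by if or while, or placed on the left of || or &&, are exempt, which is why the validation checks in 5.3 still work.
set -u: treat references to unset variables as an error (catches typos in variable names).
set -o pipefail: a pipeline fails if any command in it fails, not only the last one.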

By default, bash plows on after errors. A typo in a path or a failed wget will keep the script running, often producing empty output files or — worse — silently using stale data from a previous run. Strict mode turns those silent failures into loud, immediate ones.
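Each of the three settings can be seen in isolation by running a tiny script in a child shell, so that one failure does not stop the demo. The misspelled IN_DIRR is deliberate.

```bash
#!/bin/bash
# Each case runs in a child shell so one failure doesn't stop the demo.

# set -e: the child stops at `false`, so "reached" never prints.
bash -c 'set -e; false; echo reached' || echo "set -e: exited with status $?"

# set -u: a typo'd variable name is an error instead of an empty string.
bash -c 'set -u; IN_DIR=/data; echo "${IN_DIRR}"' 2>/dev/null || echo "set -u: exited with status $?"

# Without pipefail, `false | true` reports the status of `true`.
bash -c 'false | true'; echo "no pipefail: status $?"

# With pipefail, the failing `false` makes the whole pipeline fail.
bash -c 'set -o pipefail; false | true' || echo "pipefail: exited with status $?"
```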

Tip

For temporary debugging, also add set -x, which prints each command, with its variables expanded, just before it runs. This shows exactly where a script went off the rails.
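A sketch of tracing only the section under suspicion, assuming bash: PS4 is the prefix bash prints before each traced line (the default is "+ "), and including ${LINENO} in it tags every traced command with its line number. The variable values here are placeholders.

```bash
#!/bin/bash
# Sketch: trace only one section, with file:line prefixes on each trace line.
# PS4 is single-quoted so it is re-expanded for every traced command.
PS4='+ ${BASH_SOURCE:-bash}:${LINENO}: '

echo "untraced setup"

set -x                        # start tracing; trace lines go to stderr
sample="sample_001"
echo "processing ${sample}"
set +x                        # stop tracing (this line itself is traced)

echo "untraced cleanup"
```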

5.3 Validate Inputs Before Doing Real Work

A SLURM job might wait hours in the queue before it runs. The worst time to discover that an input file is missing is three hours into a four-hour run. Check everything you depend on at the very start of the job script:

[[ -f "${FASTQC_SIF}" ]] || { echo "ERROR: container not found: ${FASTQC_SIF}" >&2; exit 1; }
[[ -d "${IN_DIR}"     ]] || { echo "ERROR: input dir not found: ${IN_DIR}"     >&2; exit 1; }
[[ -n "${SAMPLE:-}"   ]] || { echo "ERROR: SAMPLE not set in config"           >&2; exit 1; }

These checks are dense, so here is what each part does, step by step:

[[ -f "${FASTQC_SIF}" ]] is a bash conditional test rather than a full if statement. It prints nothing; it simply succeeds (exit status 0) when the expression is true and fails otherwise. Here, -f "${FASTQC_SIF}" is true when the .sif file named in the config exists and is a regular file.
The || operator runs the command on its right only when the command on its left fails. So if the file does NOT exist, the commands in braces run: they print an error message and exit the script with status 1.

The next two lines follow the same pattern: -d tests that the input directory exists, and -n tests that $SAMPLE is a non-empty string. Writing ${SAMPLE:-} makes an unset SAMPLE expand to an empty string, so this check can report the problem itself instead of set -u aborting with a generic "unbound variable" error. Each check fails the job in seconds with a message that points at exactly what went wrong, instead of letting the underlying tool produce a confusing error mid-run.
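If the same checks appear in many job scripts, they can be pulled into small functions. The sketch below also covers a gap the three checks leave open: if no file matches "${IN_DIR}/${SAMPLE}_"*.fastq*, bash passes the unmatched pattern to FastQC literally. die, require_file, require_dir, collect_fastqs, and FASTQS are hypothetical names, and the messages go to stderr so they land in the --error log.

```bash
#!/bin/bash
# Sketch of reusable input checks. die, require_file, require_dir,
# collect_fastqs, and FASTQS are hypothetical names for illustration.

# Print to stderr (so the message lands in the --error log) and stop.
die() { echo "ERROR: $*" >&2; exit 1; }

require_file() { [[ -f "$1" ]] || die "file not found: $1"; }
require_dir()  { [[ -d "$1" ]] || die "directory not found: $1"; }

# Fill the global FASTQS array with the sample's FASTQ files, or fail.
# nullglob makes an unmatched pattern expand to nothing instead of being
# passed through literally (e.g. as ".../sample_001_*.fastq*").
collect_fastqs() {
    local dir="$1" sample="$2"
    shopt -s nullglob
    FASTQS=( "${dir}/${sample}_"*.fastq* )
    shopt -u nullglob
    (( ${#FASTQS[@]} > 0 )) || die "no FASTQ files match ${dir}/${sample}_*.fastq*"
}

# In the job script, after sourcing the config:
#   require_file "${FASTQC_SIF}"
#   require_dir  "${IN_DIR}"
#   collect_fastqs "${IN_DIR}" "${SAMPLE}"
#   apptainer exec "${FASTQC_SIF}" fastqc ... "${FASTQS[@]}"
```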

5.4 Use Absolute Paths

Inside a job script, always use absolute paths for inputs, outputs, and software. Relative paths look harmless but break in ways that are hard to debug:

sbatch runs the job from the directory where you submitted it, so if a coworker submits the same script from a different directory, its relative paths resolve to different files.
Any cd inside the script (for example, into a scratch directory) changes what every later relative path points to.
Six months later you may not remember which directory you submitted from.

Absolute paths (/rs1/researchers/s/smith/data/...) work the same regardless of where you submit from.
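A config can still slip a relative path past you; the chapter's own OUT_DIR="fastqc_results" is one. A minimal sketch of a check that rejects relative values before any work runs, where require_absolute is a hypothetical helper that takes variable names rather than values so it can report which setting is wrong:

```bash
#!/bin/bash
# Sketch: reject relative paths from the config before doing real work.
# require_absolute is a hypothetical helper name.

require_absolute() {
    local name value
    for name in "$@"; do
        value="${!name:-}"   # indirect expansion: value of the variable named in $name
        [[ "${value}" == /* ]] || {
            echo "ERROR: ${name} must be an absolute path, got '${value}'" >&2
            return 1
        }
    done
}

# Example with values like those in fastqc_config.sh; OUT_DIR is flagged.
IN_DIR="/rs1/shares/brc/trainings/hazel_hpc/data"
OUT_DIR="fastqc_results"
if require_absolute IN_DIR OUT_DIR; then echo "all paths absolute"; fi
```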

5.5 Use SLURM Environment Variables Inside Your Script

When you run a tool that takes a thread or memory argument, use the SLURM environment variable rather than hardcoding the number a second time:

# Good — directives and the tool can never disagree
fastqc --threads "${SLURM_CPUS_PER_TASK}" ...

# Bad — change one and forget the other, and you waste cores or oversubscribe
fastqc --threads 2 ...

The most useful variables inside a job:

| Variable | Use it for |
|---|---|
| `$SLURM_CPUS_PER_TASK` | `--threads`, `-t`, `-p` flags on multi-threaded tools |
| `$SLURM_MEM_PER_NODE` | Memory flags; the value is in megabytes and is set only when `--mem` is requested |
| `$SLURM_JOB_ID` | Naming output files or scratch directories uniquely |
| `$SLURM_SUBMIT_DIR` | Returning to the submit directory if the job changes directories |
| `$SLURM_ARRAY_TASK_ID` | Picking which sample to process in a job array (see the Job Arrays chapter) |

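If you ever need to bump --cpus-per-task from 2 to 16, you change it in one place and the tool picks up the new value automatically. Note that SLURM sets $SLURM_CPUS_PER_TASK only when the job requests --cpus-per-task, so under set -u a script that references it without requesting it will stop with an "unbound variable" error.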
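When a script may also run outside SLURM (for example, during the interactive testing described in 5.7), a default value keeps set -u from stopping it. A sketch, assuming a 1-thread and 2048 MB fallback, both arbitrary choices; the 80% headroom for a tool's own memory setting is also an assumption, not a SLURM rule:

```bash
#!/bin/bash
# Sketch: read resources from SLURM, with fallbacks for running outside a job.
set -euo pipefail

# SLURM sets SLURM_CPUS_PER_TASK only when --cpus-per-task is requested,
# and SLURM_MEM_PER_NODE (in megabytes) only when --mem is requested.
THREADS="${SLURM_CPUS_PER_TASK:-1}"
MEM_MB="${SLURM_MEM_PER_NODE:-2048}"   # 2048 is an assumed default

# Tools with their own memory setting (e.g. a JVM heap size) should stay
# below the SLURM limit; 80% is an assumed margin, not a SLURM requirement.
HEAP_MB=$(( MEM_MB * 80 / 100 ))

echo "threads=${THREADS} mem=${MEM_MB}M heap=${HEAP_MB}M"
```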

5.6 Log What Produced This Result

When you come back to a result months later — or hand it to a collaborator — the first question is always “what produced this?” Make every job answer that question by printing key context to stdout at the start:

echo "Job ${SLURM_JOB_ID} on $(hostname) at $(date)"
echo "Working dir: $(pwd)"
echo "Submit dir:  ${SLURM_SUBMIT_DIR}"
echo "Config:      $(realpath fastqc_config.sh)"
echo "Git commit:  $(git -C "${SLURM_SUBMIT_DIR}" rev-parse --short HEAD 2>/dev/null || echo 'not a git repo')"
echo "Tool version:"
apptainer exec "${FASTQC_SIF}" fastqc --version

That handful of lines, captured automatically into your --output log, turns a result file into something you can trace back to the exact code, container, and inputs that produced it.
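The --output log can be deleted or separated from the results it describes. One way to make provenance travel with the results, sketched here with a hypothetical write_provenance helper, is to write the same details into the output directory and keep an exact copy of the config next to them. The ${VAR:-...} fallbacks let it run outside SLURM as well.

```bash
#!/bin/bash
# Sketch: save provenance next to the results. write_provenance is a
# hypothetical helper; fallbacks let it run outside a SLURM job too.
set -euo pipefail

write_provenance() {
    local config="$1" out_dir="$2"
    local job_id="${SLURM_JOB_ID:-interactive}"
    mkdir -p "${out_dir}"
    # Keep an exact copy of the config used for this run.
    cp "${config}" "${out_dir}/config_${job_id}.sh"
    {
        echo "job_id:  ${job_id}"
        echo "host:    ${HOSTNAME}"
        echo "date:    $(date)"
        echo "config:  $(realpath "${config}")"
        echo "commit:  $(git -C "${SLURM_SUBMIT_DIR:-.}" rev-parse --short HEAD 2>/dev/null || echo 'not a git repo')"
    } > "${out_dir}/provenance_${job_id}.txt"
}

# In the job script, after validation:
#   write_provenance fastqc_config.sh "${OUT_DIR}"
```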

5.7 Test Interactively Before You `sbatch`

A 10-second typo can waste a 4-hour queue wait. Before submitting a new or modified job script, run the same commands by hand in an interactive session on a small subset of the data:

$ srun --pty -n 1 --cpus-per-task=2 --mem=4G --time=0:30:00 bash

# inside the interactive session:
$ source fastqc_config.sh
$ module load apptainer
$ apptainer exec "${FASTQC_SIF}" fastqc --version
$ mkdir -p /tmp/test    # FastQC will not create --outdir for you
$ apptainer exec "${FASTQC_SIF}" fastqc --threads 2 --outdir /tmp/test "${IN_DIR}/${SAMPLE}_"*.fastq* | head

If those work, your sbatch is much more likely to succeed.
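Two cheap checks can run before you even open an interactive session. bash -n parses a script without executing any of it, which catches unbalanced quotes, a missing fi, and similar syntax errors (though not wrong paths). sbatch also accepts --test-only, which validates the script and reports an estimated start time without submitting anything. A sketch wrapping the first check in a hypothetical check_syntax function:

```bash
#!/bin/bash
# Sketch: syntax-check scripts before submitting. check_syntax is a
# hypothetical helper. `bash -n` parses but never executes the script.

check_syntax() {
    local script
    for script in "$@"; do
        if bash -n "${script}"; then
            echo "OK:    ${script}"
        else
            echo "FAIL:  ${script}" >&2
            return 1
        fi
    done
}

# Usage before submitting:
#   check_syntax fastqc_config.sh fastqc_job.sh && sbatch fastqc_job.sh
```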

5.8 Lay Out Projects Consistently

A predictable directory layout makes scripts portable across projects and makes life easier for anyone (including future you) reading the code:

my_project/
├── README.md           ← what this project does, how to run it
├── configs/            ← *_config.sh files, one per analysis
├── scripts/            ← *_job.sh files
├── logs/               ← --output/--error from SLURM (gitignored)
├── data/               ← inputs (often a symlink to /rs1)
└── results/            ← outputs (often a symlink to /rs1)

A consistent layout means the paths in every config file follow the same pattern under the project directory, so reusing a config for a new project is mostly a matter of changing where that project lives.
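A sketch of a config written for the layout above, assuming a single absolute project root. PROJECT_DIR and its value are hypothetical; the container path and sample name are taken from fastqc_config.sh, and the data paths are rebuilt from the root so that moving the project means editing one line.

```bash
# configs/fastqc_config.sh (sketch)
# PROJECT_DIR and its value are hypothetical; every data path derives from it.
PROJECT_DIR="/rs1/researchers/s/smith/my_project"

# --- Container ---
FASTQC_SIF="/rs1/shares/brc/admin/containers/images/quay.io_biocontainers_fastqc:0.12.1--hdfd78af_0.sif"

# --- Data paths (absolute, because PROJECT_DIR is absolute) ---
IN_DIR="${PROJECT_DIR}/data"
OUT_DIR="${PROJECT_DIR}/results/fastqc"
SAMPLE="sample_001"
```

One limit: #SBATCH lines are read by SLURM, not by bash, so variables such as ${PROJECT_DIR} are not expanded there. The --output and --error paths in the job script must be written out literally (SLURM's own filename patterns, such as %j, still work).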

5.9 Quick Checklist

Before every sbatch, scan your script for these:

set -euo pipefail near the top
All paths are absolute
All paths and parameters live in a config file, not hardcoded in the job script
Inputs are validated before the real work runs
Tool thread/memory flags use $SLURM_CPUS_PER_TASK etc., not duplicated numbers
Log directory exists before you submit (mkdir -p logs); SLURM will not create it
Provenance lines (job id, host, date, config path, versions) print at job start
You ran the commands interactively on a small subset and they worked
The latest version of both files is committed to git
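Some of these items can be checked mechanically. A sketch of a hypothetical preflight function that covers strict mode, the logs/ directory, syntax, and uncommitted changes; the remaining items still need a human look:

```bash
#!/bin/bash
# Sketch: automate the mechanical checklist items. preflight is a
# hypothetical helper; it cannot tell whether your paths are right.

preflight() {
    local job_script="$1" config="$2" problems=0

    grep -q '^set -euo pipefail' "${job_script}" \
        || { echo "MISSING: set -euo pipefail in ${job_script}" >&2; problems=$((problems + 1)); }

    [[ -d logs ]] \
        || { echo "MISSING: logs/ directory (run: mkdir -p logs)" >&2; problems=$((problems + 1)); }

    bash -n "${job_script}" && bash -n "${config}" \
        || { echo "SYNTAX: bash -n failed" >&2; problems=$((problems + 1)); }

    # Uncommitted changes mean the logged git commit won't describe what ran.
    if git rev-parse --is-inside-work-tree >/dev/null 2>&1 \
        && [[ -n "$(git status --porcelain -- "${job_script}" "${config}")" ]]; then
        echo "UNCOMMITTED: ${job_script} or ${config}" >&2
        problems=$((problems + 1))
    fi

    (( problems == 0 ))
}

# Usage, from the submit directory:
#   preflight fastqc_job.sh fastqc_config.sh && sbatch fastqc_job.sh
```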
