5 Best Practices for Job Scripts
The previous chapter showed the mechanics of writing a job script. This chapter is about writing one well — so that it runs reliably the first time, fails loudly when something is wrong, and remains useful six months from now when you (or a collaborator) need to rerun it on new data.
Most of the practices here will feel like extra work the first time you apply them. They pay for themselves the first time a job dies three hours into a run because of a typo in a path, or the first time you can’t remember which version of a script produced a result you need to reproduce.
5.1 Separate Your Config from Your Job Script
The single most useful habit you can build is splitting a job into two files: a config file that holds all paths and parameters, and a job script that holds the SLURM directives and execution logic. The job script sources the config at runtime.
This separation pays off quickly:
- When you run the same analysis on a new dataset, you copy and edit the config — the job script stays untouched.
- When a collaborator runs your code on a different cluster, they update one file rather than hunting for hardcoded paths scattered through your script.
- When something goes wrong six months later, the config file is a snapshot of exactly what inputs and settings produced a given result.
The rule of thumb: if a value would change between runs or between machines, it belongs in the config.
5.1.1 The Config File
The config file is a plain shell script that defines variables. It has no SLURM directives and no executable logic — only assignments. In this example, the path to the fastqc software container is stored in FASTQC_SIF, and the input/output data directories are stored in their own variables.
# fastqc_config.sh
# -------------------------------------------------------
# Edit these values before each run.
# -------------------------------------------------------
# --- Container ---
FASTQC_SIF="/rs1/shares/brc/admin/containers/images/quay.io_biocontainers_fastqc:0.12.1--hdfd78af_0.sif"
# --- Data paths ---
IN_DIR="/rs1/shares/brc/trainings/hazel_hpc/data"
OUT_DIR="fastqc_results"
SAMPLE="sample_001"
5.1.2 The Job Script
The job script starts with SLURM resource directives, turns on strict mode, sources the config, loads modules, validates its inputs, then runs the analysis. Data paths and parameters come entirely from the config; none of them are hardcoded here. The line source fastqc_config.sh is what makes the config variables available inside the job.
#!/bin/bash
# -------------------------------------------------------
# FastQC job script — pairs with fastqc_config.sh
# -------------------------------------------------------
# --- Resources ---
#SBATCH --job-name=fastqc
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
#SBATCH --mem=4G
#SBATCH --partition=shared
#SBATCH --output=logs/fastqc.%j.out
#SBATCH --error=logs/fastqc.%j.err
#SBATCH --time=1:00:00
set -euo pipefail
# --- Load config ---
source fastqc_config.sh
# --- Environment ---
module load apptainer
# --- Validate inputs ---
[[ -f "${FASTQC_SIF}" ]] || { echo "ERROR: container not found: ${FASTQC_SIF}"; exit 1; }
[[ -d "${IN_DIR}" ]] || { echo "ERROR: input dir not found: ${IN_DIR}"; exit 1; }
[[ -n "${SAMPLE:-}" ]] || { echo "ERROR: SAMPLE not set in config"; exit 1; }
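# Note: SLURM opens the --output/--error files before this script starts and does
# not create missing directories, so logs/ must already exist when you run sbatch.
mkdir -p "${OUT_DIR}" logs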
# --- Provenance ---
echo "Job ${SLURM_JOB_ID} started: $(date) on $(hostname)"
echo "Config: $(realpath fastqc_config.sh)"
echo "IN_DIR=${IN_DIR} OUT_DIR=${OUT_DIR} SAMPLE=${SAMPLE}"
# --- Execute ---
apptainer exec "${FASTQC_SIF}" fastqc \
--threads "${SLURM_CPUS_PER_TASK}" \
--outdir "${OUT_DIR}" \
"${IN_DIR}/${SAMPLE}_"*.fastq*
echo "Job completed: $(date) - results in ${OUT_DIR}"
Submit it with:
$ sbatch fastqc_job.sh
The rest of this chapter unpacks the practices baked into that script.
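One optional variation on the script above, sketched under assumptions: instead of hardcoding the name fastqc_config.sh, the job script can take the config path as its first argument, so the same script serves every run. The helper name load_config and the fallback filename are illustrative and not part of the original script.

```bash
#!/bin/bash
# Sketch: source a config file whose path is passed in as an argument.
# load_config is a hypothetical helper name.
load_config() {
    local config="${1:-fastqc_config.sh}"   # fall back to the default name
    if [[ ! -f "${config}" ]]; then
        echo "ERROR: config not found: ${config}" >&2
        return 1
    fi
    # Assignments in the sourced file become ordinary shell variables here.
    source "${config}"
    echo "Loaded config: ${config}"
}

# Inside a job script this would be called once, near the top:
#   load_config "${1:-}"
# where $1 is the first argument given after the script name.
```

With that in place, a second run would be submitted as sbatch fastqc_job.sh followed by the path to the new config, since sbatch passes any arguments after the script name through to the script.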
5.2 Fail Fast with Strict Bash Mode
Add this line to the top of every job script, right after the SBATCH directives:
set -euo pipefail
It changes the default bash behavior in three important ways:
- set -e: exit immediately if any command returns a non-zero exit status.
- set -u: treat references to unset variables as an error (catches typos in variable names).
- set -o pipefail: if any command in a pipeline fails, the whole pipeline fails.
By default, bash plows on after errors. A typo in a path or a failed wget will keep the script running, often producing empty output files or — worse — silently using stale data from a previous run. Strict mode turns those silent failures into loud, immediate ones.
For temporary debugging, add set -x to also print every command before it runs. Useful for tracing exactly where a script went off the rails.
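To see the three options in action without risking a real job, the following sketch runs small throwaway commands in child bash processes and records what happens. The misspelled variable OUT_DRI and the variable names that hold the results are made up for the demonstration.

```bash
#!/bin/bash
# Each case runs in a child bash so a failure there does not stop this demo.

# pipefail: a pipeline whose first command fails, without and with the option.
default_status=$(bash -c 'false | cat; echo $?')
pipefail_status=$(bash -c 'set -o pipefail; false | cat; echo $?')
echo "pipeline status by default:    ${default_status}"
echo "pipeline status with pipefail: ${pipefail_status}"

# set -u: a misspelled variable name (OUT_DRI instead of OUT_DIR).
if bash -c 'set -u; echo "writing to ${OUT_DRI}/report"' 2>/dev/null; then
    nounset_result="continued"
else
    nounset_result="stopped"
fi
echo "misspelled variable under set -u: script ${nounset_result}"

# set -e: the echo after the failing command never runs.
errexit_output=$(bash -c 'set -e; false; echo "still running"' || true)
echo "output after a failed command under set -e: '${errexit_output}'"
```

Without pipefail the pipeline reports the status of its last command (cat), which hides the failure of the first one.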
5.3 Validate Inputs Before Doing Real Work
A SLURM job might wait hours in the queue before it runs. The worst time to discover that an input file is missing is three hours into a four-hour run. Check everything you depend on at the very start of the job script:
[[ -f "${FASTQC_SIF}" ]] || { echo "ERROR: container not found: ${FASTQC_SIF}"; exit 1; }
[[ -d "${IN_DIR}" ]] || { echo "ERROR: input dir not found: ${IN_DIR}"; exit 1; }
[[ -n "${SAMPLE:-}" ]] || { echo "ERROR: SAMPLE not set in config"; exit 1; }
The syntax is dense, so here is what the first line does, step by step:
- [[ -f "${FASTQC_SIF}" ]] is a conditional test. The double brackets evaluate the expression inside them and succeed or fail accordingly; -f "${FASTQC_SIF}" succeeds only if the .sif file named in the config exists.
- If the file DOES NOT exist, the test fails and the code on the right side of the || operator runs, printing an error message and exiting the script.
The next lines are similar checks: the -d flag checks that the directory exists, and the -n flag checks that the $SAMPLE variable is non-empty. Writing ${SAMPLE:-} substitutes an empty string when SAMPLE is unset, so the check prints its own message instead of tripping set -u first. Each check fails the job in seconds with a message that points at exactly what went wrong, instead of letting the underlying tool produce a confusing error mid-run.
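If the same checks recur across many job scripts, they can be wrapped in small helpers. The sketch below is one possible shape, not part of the original script: the names die, require_file, require_dir, and require_match are hypothetical. The last helper covers a case the three checks above do not, namely an input glob that matches no files.

```bash
#!/bin/bash
# Hypothetical validation helpers; each one exits the script on failure.

die() {
    echo "ERROR: $*" >&2   # send errors to stderr so they land in the --error log
    exit 1
}

require_file()  { [[ -f "$1" ]] || die "file not found: $1"; }
require_dir()   { [[ -d "$1" ]] || die "directory not found: $1"; }

# Pass the glob in quotes; compgen -G expands it and fails if nothing matches.
require_match() { compgen -G "$1" > /dev/null || die "no files match: $1"; }

# Example calls, using the variable names from the config in this chapter:
#   require_file  "${FASTQC_SIF}"
#   require_dir   "${IN_DIR}"
#   require_match "${IN_DIR}/${SAMPLE}_*.fastq*"
```

The glob check matters because bash passes an unmatched pattern through unchanged, so the tool would otherwise receive a literal filename ending in *.fastq* and fail with a less obvious message.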
5.4 Use Absolute Paths
Inside a job script, always use absolute paths for inputs, outputs, and software. Relative paths look harmless but break in ways that are hard to debug:
- sbatch runs the job from the directory where you submitted it, so if a coworker submits the same script from a different directory, the relative paths resolve differently.
- Job arrays and srun subprocesses don’t always inherit the working directory you expect.
- Six months later you may not remember which directory you submitted from.
Absolute paths (/rs1/researchers/s/smith/data/...) work the same regardless of where you submit from.
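A job script can also enforce this rule rather than rely on memory. The following is a minimal sketch under assumptions: require_absolute and BASE_DIR are hypothetical names, and anchoring a relative name such as fastqc_results to the submit directory is one possible convention, not something the original script does.

```bash
#!/bin/bash
# Hypothetical helper: reject any path that does not start with "/".
require_absolute() {
    local name="$1" value="$2"
    case "${value}" in
        /*) return 0 ;;
        *)  echo "ERROR: ${name} must be an absolute path, got: ${value}" >&2
            return 1 ;;
    esac
}

# Turn a relative directory name into an absolute one by anchoring it to the
# submit directory; outside a SLURM job, fall back to the current directory.
BASE_DIR="${SLURM_SUBMIT_DIR:-${PWD}}"
OUT_DIR="${BASE_DIR}/fastqc_results"

require_absolute OUT_DIR "${OUT_DIR}" && echo "OUT_DIR=${OUT_DIR}"
```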
5.5 Use SLURM Environment Variables Inside Your Script
When you run a tool that takes a thread or memory argument, use the SLURM environment variable rather than hardcoding the number a second time:
# Good — directives and the tool can never disagree
fastqc --threads "${SLURM_CPUS_PER_TASK}" ...
# Bad — change one and forget the other, and you waste cores or oversubscribe
fastqc --threads 2 ...
The most useful variables inside a job:
| Variable | Use it for |
|---|---|
| $SLURM_CPUS_PER_TASK | --threads, -t, -p flags on multi-threaded tools |
| $SLURM_JOB_ID | Naming output files or scratch directories uniquely |
| $SLURM_SUBMIT_DIR | Returning to the submit directory if the job changes dirs |
| $SLURM_ARRAY_TASK_ID | Picking which sample to process in a job array (see Job Arrays chapter) |
If you ever need to bump --cpus-per-task from 2 to 16, you change it in one place and the tool picks up the new value automatically.
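One caveat: SLURM_CPUS_PER_TASK is only set when --cpus-per-task was requested, and none of these variables exist outside a job, so a script that runs under set -u will stop on them during interactive testing. A possible workaround, sketched here with hypothetical helper names (job_threads, job_tag) and an assumed scratch location, is to read them through defaults.

```bash
#!/bin/bash
# Hypothetical helpers that read SLURM variables with safe fallbacks.

# Thread count: the SLURM allocation if present, otherwise a single thread.
job_threads() { echo "${SLURM_CPUS_PER_TASK:-1}"; }

# Unique tag for naming outputs: the job ID, or the shell PID when run by hand.
job_tag() { echo "${SLURM_JOB_ID:-interactive_$$}"; }

THREADS="$(job_threads)"
# Assumed scratch location for illustration; use whatever your cluster provides.
SCRATCH_DIR="${TMPDIR:-/tmp}/fastqc_$(job_tag)"

echo "threads=${THREADS} scratch=${SCRATCH_DIR}"
```

The tool invocation then uses --threads "${THREADS}", which still tracks the directive inside a job and degrades to one thread elsewhere.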
5.6 Log What Produced This Result
When you come back to a result months later — or hand it to a collaborator — the first question is always “what produced this?” Make every job answer that question by printing key context to stdout at the start:
echo "Job ${SLURM_JOB_ID} on $(hostname) at $(date)"
echo "Working dir: $(pwd)"
echo "Submit dir: ${SLURM_SUBMIT_DIR}"
echo "Config: $(realpath fastqc_config.sh)"
echo "Git commit: $(git -C "${SLURM_SUBMIT_DIR}" rev-parse --short HEAD 2>/dev/null || echo 'not a git repo')"
echo "Tool version:"
apptainer exec "${FASTQC_SIF}" fastqc --version
That handful of lines, captured automatically into your --output log, turns a result file into something you can trace back to the exact code, container, and inputs that produced it.
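Printing the config path only helps if the config is not edited afterwards. One way to guard against that, offered as a sketch rather than as part of the original script, is to copy the config next to the results and log its checksum. The helper name snapshot_config and the file naming are hypothetical, and sha256sum is assumed to be available (it ships with GNU coreutils).

```bash
#!/bin/bash
# Hypothetical helper: keep a copy of the config with the results.
snapshot_config() {
    local config="$1" out_dir="$2"
    local tag="${SLURM_JOB_ID:-manual}"     # job ID, or "manual" outside SLURM

    mkdir -p "${out_dir}"
    cp "${config}" "${out_dir}/config.${tag}.sh"

    # Print the checksum so the log records exactly which config contents ran.
    sha256sum "${config}" | awk '{print $1}'
}

# Example call inside the provenance section of a job script:
#   echo "Config sha256: $(snapshot_config fastqc_config.sh "${OUT_DIR}")"
```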
5.7 Test Interactively Before You sbatch
A 10-second typo can waste a 4-hour queue wait. Before submitting a new or modified job script, run the same commands by hand in an interactive session on a small subset of the data:
$ srun --pty -n 1 --cpus-per-task=2 --mem=4G --time=0:30:00 bash
# inside the interactive session:
$ source fastqc_config.sh
$ module load apptainer
$ apptainer exec "${FASTQC_SIF}" fastqc --version
$ mkdir -p /tmp/test    # fastqc does not create the output directory itself
$ apptainer exec "${FASTQC_SIF}" fastqc --threads 2 --outdir /tmp/test "${IN_DIR}/${SAMPLE}_"*.fastq* | head
If those work, your sbatch is much more likely to succeed.
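For the "small subset of the data", one simple approach is to keep only the first few reads of a compressed FASTQ file. The sketch below assumes gzip-compressed input with the usual four lines per record; subset_fastq is a hypothetical helper name, and the default of 1000 reads is arbitrary.

```bash
#!/bin/bash
# Hypothetical helper: write the first N reads of a gzipped FASTQ to a new file.
subset_fastq() {
    local in_file="$1" out_file="$2" n_reads="${3:-1000}"

    # A FASTQ record is four lines, so N reads correspond to 4 * N lines.
    # Caveat: on large inputs head exits early, which can make the first gzip
    # end with a broken-pipe status; under "set -o pipefail" that counts as a
    # failure even though the subset file is complete.
    gzip -cd "${in_file}" | head -n "$(( n_reads * 4 ))" | gzip -c > "${out_file}"
}

# Example call:
#   subset_fastq "${IN_DIR}/${SAMPLE}_R1.fastq.gz" /tmp/test/subset_R1.fastq.gz 1000
```

The file name in the example call is illustrative; substitute whatever naming your data uses.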
5.8 Lay Out Projects Consistently
A predictable directory layout makes scripts portable across projects and makes life easier for anyone (including future you) reading the code:
my_project/
├── README.md ← what this project does, how to run it
├── configs/ ← *_config.sh files, one per analysis
├── scripts/ ← *_job.sh files
├── logs/ ← --output/--error from SLURM (gitignored)
├── data/ ← inputs (often a symlink to /rs1)
└── results/ ← outputs (often a symlink to /rs1)
A consistent layout means your job scripts can use predictable relative-to-project paths in the config file.
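The layout can be created with a few lines of bash. This is a sketch under assumptions: init_project is a hypothetical helper, it creates data/ and results/ as ordinary directories (replace them with symlinks if your storage lives elsewhere), and the PROJECT_DIR value in the commented config lines is a placeholder.

```bash
#!/bin/bash
# Hypothetical helper: create the standard project skeleton.
init_project() {
    local project_dir="$1"

    mkdir -p "${project_dir}"/{configs,scripts,logs,data,results}

    # Add a stub README only if the project does not have one yet.
    if [[ ! -f "${project_dir}/README.md" ]]; then
        echo "# $(basename "${project_dir}")" > "${project_dir}/README.md"
    fi
}

# A config can then derive every path from one absolute project root, which
# keeps paths predictable while still satisfying the absolute-path rule:
#   PROJECT_DIR="/path/to/my_project"        # placeholder, set per machine
#   IN_DIR="${PROJECT_DIR}/data"
#   OUT_DIR="${PROJECT_DIR}/results/fastqc"
```

Creating logs/ up front also means the directory named in --output and --error exists before the first sbatch.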
5.9 Quick Checklist
Before every sbatch, scan your script for these: