7 Job Performance

7.1 Why Resource Estimation Matters

Requesting too little	Requesting too much
Job fails or times out	Longer queue wait
Must resubmit, wasting time	Cluster resources sit idle
Wastes your fair-share allocation	Reduces your future priority

The goal is to request what you need with a reasonable buffer — not to over-provision.

7.2 Estimating Resources Before Your First Run

7.2.1 Cores

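Serial code: request 1 core
Multi-threaded (OpenMP, pthreads): typically 4–16 cores on a single node, since threads share memory; check the tool’s documentation
MPI: can span multiple nodes; test scaling before committing to large core counts
GPU: request a node in the GPU partition and the GPUs themselves (in Slurm, typically with `--gres=gpu:N`); check the Hazel documentation for the partition name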

Always start small: run on 1–4 cores, measure actual runtime, then decide whether more parallelism helps.

7.2.2 Memory

Method	How
Check documentation	Many tools list minimum/recommended RAM
Look at past jobs	`sacct` shows actual peak memory usage
Run a small test	Use a subset of data, then check with `seff`
Estimate from data	If a tool loads input as a matrix and holds N copies in memory: memory ≈ N × in-memory data size (often larger than the on-disk size for compressed files)
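The data-based estimate in the last row can be sketched as a small shell function. This is only an illustration: the name `estimate_mem_gb` and the default 20% safety margin are assumptions, not Hazel policy.

```shell
# Rough memory estimate: copies x in-memory data size, plus a safety margin.
# estimate_mem_gb and the default 20% margin are illustrative assumptions.
estimate_mem_gb() {
  local data_gb="$1" copies="$2" margin_pct="${3:-20}"
  # Integer arithmetic, rounded up to the next whole GB.
  echo $(( (data_gb * copies * (100 + margin_pct) + 99) / 100 ))
}

# 12 GB of data held as 3 copies, default 20% margin: prints 44
estimate_mem_gb 12 3
```

Treat the result as a starting point for a test run, then correct it with the peak usage that `seff` reports.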

Available node memory configurations on Hazel (GB): 64 / 128 / 192 / 256 / 512* / 1024**

* Limited outside partner queues · ** Rare; expect long waits

Request slightly below the node maximum, because the OS also needs RAM. To target a 128 GB node, request `--mem=120G`.
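As a sketch of how the node sizes and the headroom rule combine, the function below picks the smallest listed configuration that still fits a request. The helper name `pick_mem_request` and the fixed 8 GB headroom are assumptions, chosen only to match the 128 GB to 120G example; the real usable memory per node may differ.

```shell
# Pick the smallest listed node memory size (GB) that fits a job while
# leaving headroom for the OS. The 8 GB headroom is an assumption chosen to
# match the "128 GB node -> --mem=120G" example; pick_mem_request is hypothetical.
pick_mem_request() {
  local need_gb="$1" headroom_gb="${2:-8}" node
  for node in 64 128 192 256 512 1024; do
    if [ "$need_gb" -le $(( node - headroom_gb )) ]; then
      echo "--mem=${need_gb}G (fits a ${node} GB node)"
      return 0
    fi
  done
  echo "no node configuration fits ${need_gb} GB" >&2
  return 1
}

pick_mem_request 120   # prints: --mem=120G (fits a 128 GB node)
pick_mem_request 40    # prints: --mem=40G (fits a 64 GB node)
```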

7.2.3 Time

Run a small test job and extrapolate:

# Run a quick test
$ sbatch --time=0:30:00 -n 1 --cpus-per-task=4 ./test_run.sh

# Check how long it actually took after it finishes
$ seff JOBID

Add a 20–30% buffer over your measured time. For data-dependent runtimes, linear scaling is a reasonable first assumption (2× data ≈ 2× time), but some algorithms scale super-linearly, so confirm with a second, larger test before committing to a long run.
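The extrapolation is simple arithmetic. A minimal sketch, assuming linear scaling with data size: `walltime_request` is a hypothetical helper, and the 25% default buffer is just one value inside the 20–30% range.

```shell
# Extrapolate a --time request from a small test run.
# Assumes runtime scales linearly with data size; walltime_request and the
# 25% default buffer (within the 20-30% range) are illustrative choices.
walltime_request() {
  local test_secs="$1" scale="$2" buffer_pct="${3:-25}"
  local total=$(( test_secs * scale * (100 + buffer_pct) / 100 ))
  printf '%02d:%02d:%02d\n' $(( total / 3600 )) $(( total % 3600 / 60 )) $(( total % 60 ))
}

# Hypothetical numbers: a 10% sample took 504 s, so the full data set is 10x.
walltime_request 504 10   # prints: 01:45:00
```

The output can be passed straight to `sbatch --time=...`. If the tool scales super-linearly, this will underestimate the runtime.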

7.3 Monitoring Running Jobs

$ squeue -u $USER                          # all your jobs and their state
$ squeue -j JOBID                          # status of one job
$ squeue -j JOBID -o "%.10i %.8T %r"       # why a pending job is waiting (%r = reason)
$ scontrol show job JOBID                  # full details: nodes, resources, time left

7.3.1 Job States

Code	Meaning
`PD`	Pending — waiting for resources
`R`	Running
`CG`	Completing — cleaning up
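`CD`	Completed successfully (`COMPLETED` in `sacct`)
`F`	Failed with a non-zero exit code (`FAILED` in `sacct`)
`TO`	Timed out at the `--time` limit (`TIMEOUT` in `sacct`)
`OOM`	Killed for exceeding the memory request (`OUT_OF_MEMORY` in `sacct`)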

7.4 Analyzing a Finished Job

7.4.1 `seff` — Quick Efficiency Report

`seff JOBID` is the fastest way to see whether your resource requests were appropriate:

Job ID: 948851
Cluster: hazel
Nodes: 1
Cores per node: 4
CPU Utilized: 00:28:43
CPU Efficiency: 85.5% of 00:33:36 core-walltime
Job Wall-clock time: 00:08:24
Memory Utilized: 3.52 GB
Memory Efficiency: 88.0% of 4.00 GB

What to look for:

CPU Efficiency < 50%: the job probably has more cores than the tool can use, or it spent much of its time waiting on I/O; reduce `--cpus-per-task`
Memory Efficiency < 50%: halve your `--mem` for future runs
Memory Efficiency > 95%: you were close to an OOM kill; increase `--mem`
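These checks can be automated by filtering the report text. A minimal sketch: `check_seff` is a hypothetical helper, it assumes the `CPU Efficiency:` and `Memory Efficiency:` line format shown in the sample report, and the sample input below is made up.

```shell
# Flag resource requests from seff-style output read on stdin.
# check_seff is a hypothetical helper; thresholds follow the list above.
check_seff() {
  awk '
    /^CPU Efficiency:/    { cpu = $3 + 0 }   # "42.0%" -> 42
    /^Memory Efficiency:/ { mem = $3 + 0 }
    END {
      if (cpu < 50) print "CPU: low efficiency, consider fewer cores"
      if (mem < 50) print "Memory: low efficiency, consider halving --mem"
      if (mem > 95) print "Memory: close to the limit, increase --mem"
      if (cpu >= 50 && mem >= 50 && mem <= 95) print "OK"
    }'
}

# Typical use would be: seff JOBID | check_seff
# Made-up sample input; prints one CPU warning and one memory warning.
check_seff <<'EOF'
CPU Efficiency: 42.0% of 01:20:00 core-walltime
Memory Efficiency: 97.1% of 8.00 GB
EOF
```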

7.4.2 `sacct` — Detailed Accounting

# Basic resource summary for one job
$ sacct -j JOBID \
  --format=JobID,Elapsed,CPUTime,MaxRSS,AveRSS,ReqMem,AllocCPUs

# CSV output for scripting (--delimiter only takes effect with --parsable or --parsable2)
$ sacct -j JOBID \
  --format=JobID,Partition,Elapsed,CPUTime,MaxRSS,AllocCPUs \
  --parsable2 --delimiter=',' --noheader

# All your jobs from the past week
$ sacct --starttime=$(date -d '7 days ago' +%Y-%m-%d) \
      --format=JobID,JobName,State,Elapsed,MaxRSS,CPUTime

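Key `sacct` fields:

Field	Meaning
`Elapsed`	Wall-clock time (actual runtime)
`CPUTime`	Elapsed × AllocCPUs (total CPU time charged)
`MaxRSS`	Peak RAM (resident set size) of the largest task; reported per job step, so look at the `.batch` or numbered step lines
`AveRSS`	Average RAM (resident set size) across all tasks in a step
`ReqMem`	Memory you requested
`AllocCPUs`	Number of CPUs allocated to the job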
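Because `MaxRSS` is reported per step and carries a unit suffix, comparing it with your request takes a small conversion. The sketch below finds the largest value in GB. `peak_rss_gb` is a hypothetical helper; it assumes pipe-delimited `JobID|MaxRSS` input, K/M/G/T suffixes in binary units, and the sample lines are made up.

```shell
# Report the largest MaxRSS (in GB) from "JobID|MaxRSS" lines on stdin,
# e.g. from: sacct -j JOBID --format=JobID,MaxRSS --parsable2 --noheader
# peak_rss_gb is a hypothetical helper. Assumes binary units (1 G = 1024 M).
peak_rss_gb() {
  awk -F'|' '
    $2 != "" {
      n = $2 + 0                       # numeric part, e.g. 3691012
      u = substr($2, length($2))       # unit suffix: K, M, G or T
      if      (u == "K") gb = n / 1024 / 1024
      else if (u == "M") gb = n / 1024
      else if (u == "G") gb = n
      else if (u == "T") gb = n * 1024
      else next                        # unknown unit: skip the line
      if (gb > max) max = gb
    }
    END { printf "%.2f\n", max }'
}

# Made-up sample: the job line has no MaxRSS, the steps do. Prints 3.52
peak_rss_gb <<'EOF'
948851|
948851.batch|3691012K
948851.extern|1024K
EOF
```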

7.5 Performance Red Flags

Symptom	Likely Cause	Fix
CPU efficiency < 30%	Tool isn’t using all cores	Reduce `--cpus-per-task`
Memory efficiency < 20%	Way over-requested	Halve `--mem`
Job OOM-killed	Under-requested memory	Double `--mem`, then tune with `seff`
Job timed out	Under-estimated runtime	Increase `--time`; extrapolate from a small test and add a 20–30% buffer
High I/O wait	Too many tiny files on scratch	Bundle files into archives; use local node scratch `/tmp`
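The fix in the last row, bundling many small files into one archive and unpacking it on node-local storage, can be sketched as follows. `bundle_dir` and `unpack_to` are hypothetical helpers, and the demo works in a `mktemp` directory so that it runs anywhere; in a real job the destination would be a directory under `/tmp` on the compute node.

```shell
# Bundle a directory of small files into one archive, then unpack it on
# node-local scratch. bundle_dir and unpack_to are hypothetical helpers; the
# demo uses mktemp so it can run anywhere, not only on a cluster.
bundle_dir() {           # usage: bundle_dir SRC_DIR ARCHIVE
  tar -cf "$2" -C "$1" .
}
unpack_to() {            # usage: unpack_to ARCHIVE DEST_DIR
  mkdir -p "$2" && tar -xf "$1" -C "$2"
}

demo=$(mktemp -d)
mkdir "$demo/inputs"
for i in 1 2 3; do echo "sample $i" > "$demo/inputs/part_$i.txt"; done

bundle_dir "$demo/inputs" "$demo/inputs.tar"   # one file on shared storage
unpack_to  "$demo/inputs.tar" "$demo/local"    # in a job: a directory under /tmp
ls "$demo/local"                               # lists the three unpacked files
```

The shared file system then serves one large read instead of thousands of small ones.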

7.6 Exercise: Profile a Job

# 1. Submit a test job
$ sbatch --time=0:15:00 --ntasks=1 --cpus-per-task=4 --mem=8G ./test_program.sh

# 2. Note the job ID from the output, then monitor
$ squeue -u $USER

# 3. After it finishes, get the efficiency report
$ seff JOBID

# 4. Get detailed accounting
$ sacct -j JOBID --format=JobID,Elapsed,MaxRSS,AllocCPUs

Analyze:

Was CPU efficiency above 70%?
Was memory efficiency above 60%?
What would you change for the production run?

7.7 Workflow for Job Optimization

1. Start small: test with a single sample and conservative resources
2. Measure: use `seff` and `sacct` to see actual usage
3. Identify the bottleneck: CPU, memory, I/O, or scaling limit
4. Scale: increase resources or sample count incrementally
5. Document: record optimal settings in your `config.sh`
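One possible shape for the recorded settings is sketched below. Every name and value is a hypothetical placeholder (`ALIGN_CPUS`, `ALIGN_MEM`, `ALIGN_TIME`, `sbatch_opts`, `align.sh`), and the layout of your own `config.sh` may differ.

```shell
# Sketch of a config.sh that records tuned settings for one tool.
# All variable names and values are hypothetical placeholders; replace
# them with what seff and sacct reported for your own jobs.
ALIGN_CPUS=8           # core count where scaling flattened out
ALIGN_MEM="24G"        # peak MaxRSS plus a buffer
ALIGN_TIME="02:00:00"  # measured runtime plus 20-30%

# Build the sbatch options from the recorded settings.
sbatch_opts() {
  echo "--ntasks=1 --cpus-per-task=${ALIGN_CPUS} --mem=${ALIGN_MEM} --time=${ALIGN_TIME}"
}

# Print the command instead of submitting it, so the sketch runs anywhere.
echo "sbatch $(sbatch_opts) ./align.sh"
```

Keeping the numbers in one sourced file means a later re-tune changes one place rather than every submission script.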

7.8 Common Job Profiles

CPU-bound (aligners, variant callers): Scale well with cores up to the tool’s thread limit. Find that limit with a core-scaling test (1, 2, 4, 8 cores) and plot runtime vs. cores.
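If you prefer numbers to a plot, the scaling test reduces to speedup (1-core runtime divided by N-core runtime) and parallel efficiency (speedup divided by N). A minimal sketch: `scaling_table` is a hypothetical helper and the timings are made up for illustration.

```shell
# Speedup and parallel efficiency from a core-scaling test.
# Input: "cores runtime_seconds" per line, with the 1-core run first.
# scaling_table is a hypothetical helper; the timings below are made up.
scaling_table() {
  awk '
    NR == 1 { base = $2 }              # 1-core runtime is the baseline
    {
      speedup = base / $2
      printf "%d cores: speedup %.2f, efficiency %.0f%%\n", $1, speedup, 100 * speedup / $1
    }'
}

scaling_table <<'EOF'
1 1200
2 630
4 340
8 300
EOF
# prints:
# 1 cores: speedup 1.00, efficiency 100%
# 2 cores: speedup 1.90, efficiency 95%
# 4 cores: speedup 3.53, efficiency 88%
# 8 cores: speedup 4.00, efficiency 50%
```

In this made-up data, going from 4 to 8 cores barely shortens the run, so 4 cores would be the sensible request.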

Memory-bound (de novo assembly, large reference loading): RAM is the limiting factor. Cores help less; choose a high-memory node.

I/O-bound (many small file operations): More cores don’t help. Minimize file open/close operations; stage data locally if your cluster provides `/tmp` on compute nodes; use scratch storage for intermediates.

7.9 Practical Tips

Test with subsets of data — a 10% sample usually exposes errors and gives usable timing data
Share resource findings with your research group — optimal settings for common tools rarely change
Don’t over-optimize — spending 2 hours tuning a job that runs in 20 minutes has diminishing returns
Check output files for built-in timing summaries — many tools (BWA, STAR, GATK) print runtime stats