Instructor Notes

The Why: Analysis Reproducibility


Introduction to Snakemake


Instructor Note

If Snakemake was installed with conda or pip rather than pixi, drop the pixi run prefix and run the command directly:

BASH

snakemake --cores 1 counts.txt

This applies to most learners who follow the tutorial outside the pixi environment.



Chaining Rules (The DAG)


Scaling with Wildcards


Instructor Note

  • Parallelism: This is the best moment to explain why the --cores flag matters. In HEP, we are used to sending 100 jobs to Condor. Here, learners see that independent jobs can run 4 (or 8, or 16) at a time on their own laptop, with no change to the workflow beyond a larger --cores value.
  • The “Pattern Matching” Warning: Students often put wildcards in the input that do not appear in the output. I would emphasize that Snakemake works backwards: it starts from the file it is asked to produce (the output), matches that file name against a rule's output pattern to fix the wildcard values, and only then works out what the input should be.
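
If learners struggle with the "works backwards" idea, a short plain-Python sketch can make it concrete. This is not Snakemake's actual implementation, only an illustration of the logic; the function names infer_wildcards and resolve_input and the file paths are hypothetical, and the sketch assumes each wildcard appears once in the output pattern and matches the regular expression .+ (Snakemake's default).

```python
import re


def infer_wildcards(output_pattern, target):
    """Match a requested file against a rule's output pattern.

    Returns a dict of wildcard values, or None if this rule cannot
    produce the target. Simplification: each wildcard may appear only
    once in the pattern.
    """
    # re.split with a capturing group alternates literal text and
    # wildcard names: "counts/{sample}.txt" -> ["counts/", "sample", ".txt"]
    parts = re.split(r"\{(\w+)\}", output_pattern)
    regex = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            regex += re.escape(part)      # literal text
        else:
            regex += f"(?P<{part}>.+)"    # wildcard becomes a named group
    match = re.fullmatch(regex, target)
    return match.groupdict() if match else None


def resolve_input(input_pattern, wildcards):
    """Fill the input pattern using values taken from the output.

    Raises KeyError if the input uses a wildcard the output never defined.
    """
    return input_pattern.format(**wildcards)


# Hypothetical paths, chosen only for illustration.
wildcards = infer_wildcards("counts/{sample}.txt", "counts/run1.txt")
print(wildcards)                                       # {'sample': 'run1'}
print(resolve_input("data/{sample}.root", wildcards))  # data/run1.root

# The classic mistake: {year} is in the input but not in the output,
# so there is nothing to fill it with.
try:
    resolve_input("data/{sample}_{year}.root", wildcards)
except KeyError as missing:
    print("cannot resolve wildcard:", missing)
```

The last call fails for the same reason the students' rules fail: the wildcard values come only from the output file name, so an input wildcard with no counterpart in the output can never be determined.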


Visualizing the Workflow


Containerized Execution


My Opinions on this Episode

  • The “LPC/LXPLUS” connection: This is where you should mention that Singularity or Apptainer is already installed on most HEP clusters, so the workflow built in this local tutorial should carry over to the big machines largely unchanged.
  • Binding directories: Students often ask how the container sees their files. It is worth a short note that Snakemake automatically “binds” the project directory into the container, so the code and data are visible inside it.


Bonus: The CMSDAS Challenge