Summary and Setup
In this tutorial, we will move beyond the limitations of manual Bash scripts and discover how to build reproducible, scalable, and portable analysis workflows using Snakemake. We will start by installing the necessary tools with Pixi, learn the “grammar” of Snakemake rules, and master the art of running jobs inside isolated containers. Finally, we will put it all together by converting a real CMSDAS analysis into a fully automated pipeline that runs on your laptop just as easily as it does on the grid. By the end of this session, you will have the skills to turn your complex physics ideas into robust, one-command workflows.
Basic Knowledge
This tutorial assumes you have a basic understanding of the following packages and concepts:
- git: Version control system for tracking changes in code.
- containers: Lightweight, portable environments for running applications (e.g., Docker, Singularity).
- conda: Package and environment management system.
- coffea: A Python library for high-energy physics data analysis. (You won’t program anything in coffea during this tutorial, but it’s good to know what it is.)
Setup and Installation
Most of this tutorial can be run on your local machine, but we will also demonstrate how to run a heavy-duty physics workflow on a remote cluster.
We will prepare our local environment using Pixi to manage Snakemake. While the heavy-duty physics code will eventually run inside CMSSW, conda environments, or containers, we need a reliable “orchestrator” on our laptops to manage the workflow.
Why Pixi instead of Conda?
For years, conda (and mamba) has been the standard for HEP environment management. However, Pixi is a modern alternative built on the same foundations (conda-forge) but with several key advantages:
- Speed: It is significantly faster at resolving dependencies than standard Conda.
- Reproducibility: It creates a pixi.lock file automatically, ensuring that every student in this tutorial has the exact same version of every package.
- Project-Centric: Pixi keeps dependencies local to your project folder rather than burying them in a global /envs/ directory.
- Single Tool: It handles environment creation, package installation, and task execution (like make) in one binary.
More information about Pixi can be found in the Pixi documentation.
1. Installing Pixi
First, we need to install the pixi binary itself. Open
your terminal and run the command appropriate for your system:
macOS and Linux:
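The Pixi documentation publishes a one-line installer for these systems. A minimal sketch, assuming curl is available and that the installer URL below (taken from the Pixi documentation, not from this tutorial) is still current; the variable name PIXI_INSTALL_URL is only illustrative:

```shell
# Assumption: installer URL as published in the Pixi documentation.
PIXI_INSTALL_URL="https://pixi.sh/install.sh"

# Skip the download if pixi is already on PATH.
if command -v pixi >/dev/null 2>&1; then
    echo "pixi is already installed"
else
    # Download the installer script and run it with sh.
    curl -fsSL "$PIXI_INSTALL_URL" | sh || echo "download failed; check your network connection"
fi
```

The guard makes the snippet safe to re-run: a second invocation only reports that pixi is already present.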
(Note: You may need to restart your terminal or source your .bashrc / .zshrc after installation.)
Verify the installation:
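One way to check, assuming the installer has placed pixi on your PATH, is to ask the binary for its version. The fallback message is illustrative and is not Pixi output:

```shell
# Store the installed version, or a hint if the shell cannot find pixi yet.
if command -v pixi >/dev/null 2>&1; then
    pixi_check=$(pixi --version)
else
    pixi_check="pixi not found on PATH: restart your terminal or source your shell profile"
fi
echo "$pixi_check"
```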
2. Creating the Project and Installing Snakemake
Now, we will initialize a new project directory and install Snakemake. This ensures that our workflow is self-contained.
BASH
# 1. Create a new directory for the tutorial
mkdir snakemake-cms-tutorial
cd snakemake-cms-tutorial
# 2. Initialize a pixi project
pixi init .
# Snakemake is hosted in the bioconda channel, so we need to add it to our project
pixi project channel add bioconda
# 3. Add Snakemake and Graphviz
# (Graphviz is used to visualize our workflow diagrams)
pixi add snakemake graphviz
3. Verification
To ensure everything is working correctly, we will run a simple command through the pixi environment. Pixi uses the run command to execute software inside the environment it just created.
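A sketch of that check, assuming you are still inside the project directory created in the previous step. The fallback message and the variable name snakemake_check are illustrative:

```shell
# pixi run executes a command inside the project's environment.
if command -v pixi >/dev/null 2>&1 && [ -f pixi.toml ]; then
    snakemake_check=$(pixi run snakemake --version)
else
    snakemake_check="not ready: run this from the project directory after installing pixi"
fi
echo "$snakemake_check"
```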
If you see a version number (e.g., 8.x.x), you are ready
to go!
Check your environment
Look inside your snakemake-cms-tutorial folder. Can you find the pixi.toml file? Open it with a text editor and identify where snakemake is listed.
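If you would rather stay in the terminal, a small sketch like the following searches the manifest for the dependency entry. It assumes you are in the project directory and that dependency entries start at the beginning of a line; the variable name manifest is only illustrative:

```shell
manifest="pixi.toml"

if [ -f "$manifest" ]; then
    # Anchor the match to the start of the line so the project name
    # (which also contains "snakemake") is not reported.
    grep -n "^snakemake" "$manifest" || echo "snakemake is not listed in $manifest"
else
    echo "$manifest not found: cd into the project directory first"
fi
```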
If you prefer to use conda instead of pixi,
you can create a conda environment and install Snakemake there. However,
keep in mind that this tutorial is designed around Pixi’s workflow
management, so you may need to adjust some commands accordingly.