Summary and Schedule
This lesson guides CMS analysts on creating workflows. Specifically, we will use Snakemake to develop these workflows. They can be executed either locally on your personal or university machine, or within CERN’s REANA.
Setup Instructions | Download files required for the lesson | |
Duration: 00h 00m | 1. Let’s talk about workflows |
What are the common challenges faced in CMS analysis? How can workflow orchestration tools help in capturing the intricate steps involved in producing results in CMS analysis? |
Duration: 00h 10m | 2. Snakemake |
How can I automate complex computational pipelines? How can I visualize and understand the structure of my Snakemake workflow? How can I create flexible and scalable Snakemake workflows to handle diverse datasets? |
Duration: 00h 50m | 3. A simple SUSY analysis |
Are there more useful Snakemake flags that one can use? How does Snakemake ensure a consistent and isolated environment for each rule’s execution when using containerized environments? |
Duration: 01h 10m | 4. Running workflows in REANA |
What is REANA and how does it enhance the reproducibility of scientific research? How does REANA leverage containerization technology and cloud computing resources to simplify the management of computational environments and data dependencies? |
Duration: 01h 40m | 5. Expanding the SUSY analysis |
Can we use Snakemake and REANA in more complex examples? |
Duration: 02h 10m | Finish |
The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.
This tutorial is your comprehensive guide to workflow automation. You’ll learn:
- The fundamentals of Snakemake and its role in streamlining data analysis pipelines
- How to construct and execute basic Snakemake workflows locally
- The benefits of cloud-based workflow execution with the REANA platform
- How to migrate your local Snakemake workflows to the REANA environment
Important Information about the tutorial
Please note that some steps in this tutorial are intentionally designed to produce errors. These errors will highlight specific features and capabilities of the tools involved. If you encounter any errors, please continue following the tutorial to understand the underlying concepts.
Basic knowledge
This tutorial assumes basic knowledge of git, singularity or docker containers, python, and CMSSW.
REANA setup
REANA is a platform where you can submit your jobs; think of it as an analysis facility. To create an account, you can follow the official documentation or go directly to https://reana.cern.ch to create your access token. This token is important because you need it every time you submit jobs to REANA. This step can take a few minutes, depending on how busy the REANA team is with approving requests.
You can submit jobs to REANA from any machine. If you want to use lxplus, you just need to activate the following environment:
After this, you can test that your REANA account works by running:
BASH
export REANA_SERVER_URL=https://reana.cern.ch
export REANA_ACCESS_TOKEN=xxxxxxxxxxxxxxxxxxx
reana-client ping
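A failed ping is often caused by a variable that was never exported in the current shell. The following is a minimal sketch of a pre-flight check, not part of `reana-client`; the function name `check_reana_env` is hypothetical and introduced only for illustration.

```shell
# Hypothetical helper (not provided by reana-client): verify that the two
# environment variables used above are set and non-empty.
check_reana_env () {
    local missing=0
    if [ -z "${REANA_SERVER_URL:-}" ]; then
        echo "REANA_SERVER_URL is not set" >&2
        missing=1
    fi
    if [ -z "${REANA_ACCESS_TOKEN:-}" ]; then
        echo "REANA_ACCESS_TOKEN is not set" >&2
        missing=1
    fi
    # Return 0 only when both variables are present.
    return "${missing}"
}
```

With this loaded, `check_reana_env && reana-client ping` only contacts the server when both variables are defined.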
In my opinion, using a container is the easiest way to interact with the REANA cluster from any machine. To do that you can use:
BASH
apptainer run --env REANA_SERVER_URL=https://reana.cern.ch --env REANA_ACCESS_TOKEN=xxxxxxxxxxxxxxxxx --bind ${PWD}:/srv --pwd /srv docker://docker.io/reanahub/reana-client:0.9.3 ping
You can save this in a bash script for convenience, or in your .bashrc as:
BASH
reana_client ()
{
    # Print a usage message when no reana-client command is given
    if [[ $# -eq 0 ]]; then
        echo "Usage: reana_client <command> [arguments]";
        return 1;
    fi;
    # Forward all arguments to reana-client inside the container.
    # Passing "$@" directly (instead of building a string for eval) keeps arguments with spaces intact.
    apptainer run --env REANA_SERVER_URL=https://reana.cern.ch --env REANA_ACCESS_TOKEN=xxxxxxxxxxxxxxxx --bind "${PWD}":/srv --pwd /srv docker://docker.io/reanahub/reana-client:0.9.3 "$@"
}
After reloading your .bashrc, simply call reana_client followed by the reana-client command you want to run.
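Writing the access token into .bashrc keeps it in plain text in that file. As a sketch of one alternative, the wrapper below reads the token from the environment at call time. The function name `reana_client_env` and the `REANA_DRY_RUN` switch are hypothetical, introduced only for illustration; the apptainer options are the same ones used in the command above.

```shell
# Sketch of a wrapper that takes the token from the environment instead of
# hardcoding it. REANA_DRY_RUN is a hypothetical switch: when set to 1 the
# command is printed (including the token) instead of being executed.
reana_client_env () {
    if [ "$#" -eq 0 ]; then
        echo "Usage: reana_client_env <command> [arguments]" >&2
        return 1
    fi
    if [ -z "${REANA_ACCESS_TOKEN:-}" ]; then
        echo "REANA_ACCESS_TOKEN is not set" >&2
        return 1
    fi
    # Prepend the container invocation to the caller's arguments.
    set -- apptainer run \
        --env "REANA_SERVER_URL=${REANA_SERVER_URL:-https://reana.cern.ch}" \
        --env "REANA_ACCESS_TOKEN=${REANA_ACCESS_TOKEN}" \
        --bind "${PWD}:/srv" --pwd /srv \
        docker://docker.io/reanahub/reana-client:0.9.3 "$@"
    if [ "${REANA_DRY_RUN:-0}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}
```

Running `REANA_DRY_RUN=1 reana_client_env ping` shows the full command that would be executed, which is a quick way to check the bind path and image tag before contacting the server.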
If you are using lxplus
If you are only using lxplus, activating the REANA environment is enough to use REANA and Snakemake. If you are not on lxplus, please follow the instructions below to use Snakemake.
Snakemake setup
To install Snakemake on your machine, follow the official documentation. Since we will run some jobs locally, Snakemake can be used as follows:
Recommended
We can use singularity and the official snakemake container to run our jobs. This is the simplest solution if you do not want to, or cannot, install the official package on your machine. Another advantage is that it keeps the analysis environment clean.
An example in lxplus:
BASH
export APPTAINER_BINDPATH=/afs,/eos,/cvmfs,/cvmfs/grid.cern.ch/etc/grid-security:/etc/grid-security ## this is optional (if needed)
apptainer shell -B ${PWD}:/srv --pwd /srv docker://snakemake/snakemake /bin/bash
One can include these lines in a bash script.
Another, simpler solution is to include a bash function in your .bashrc or .bash_aliases. For .bash_aliases:
BASH
run_snakemake ()
{
    # Print a usage message when no command is given
    if [[ $# -eq 0 ]]; then
        echo "Usage: run_snakemake <command> [arguments]";
        return 1;
    fi;
    # Optional (if needed): make the CERN file systems and grid certificates visible inside the container
    export APPTAINER_BINDPATH=/afs,/eos,/cvmfs,/cvmfs/grid.cern.ch/etc/grid-security:/etc/grid-security;
    # Run the given command inside the official Snakemake container.
    # Passing "$@" directly (instead of building a string for eval) keeps quoted arguments intact.
    apptainer exec -B "${PWD}":/srv --pwd /srv docker://snakemake/snakemake "$@"
}
Reload your .bashrc and then you can run it from anywhere as
run_snakemake snakemake --help
To test that everything works, you can run the following command:
snakemake --help
The rest of the tutorial assumes that this bash function exists, so make sure to define it before continuing.
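Before moving on, it can be useful to confirm which of the routes described above is actually available on your machine. The following is a minimal sketch; the function name `snakemake_backend` is hypothetical, and it only checks whether the executables are on your PATH, not whether they work.

```shell
# Hypothetical helper: report how Snakemake could be run on this machine.
# Prints "native" if snakemake is installed, otherwise the name of the
# container runtime found, or "none" (with a non-zero exit status).
snakemake_backend () {
    if command -v snakemake >/dev/null 2>&1; then
        echo "native"
    elif command -v apptainer >/dev/null 2>&1; then
        echo "apptainer"
    elif command -v singularity >/dev/null 2>&1; then
        echo "singularity"
    else
        echo "none"
        return 1
    fi
}
```

If the result is `apptainer`, the run_snakemake function should work as written; `none` means neither Snakemake nor a container runtime was found.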