Content from Introduction to CERN GitLab CI/CD


Last updated on 2025-12-20

Overview

Questions

  • What is GitLab CI and why should I use it?
  • How does a GitLab pipeline work in practice?

Objectives

  • Understand the purpose and benefits of using GitLab CI/CD.
  • Be able to create and explain a simple .gitlab-ci.yml file for automating tasks.

Before getting into details, here are a few links to useful documentation on GitLab CI/CD, as well as CERN-specific information:

These pages serve as a good entry point in case of problems and questions.

Callout

If you haven’t followed the setup instructions to add your SSH key to your CERN GitLab account, please do so before continuing with this lesson.

Caution

If you are familiar with GitLab CI/CD, you can skip ahead to the next lesson.

Why Use GitLab CI/CD?


GitLab CI/CD (Continuous Integration/Continuous Deployment) helps you automate tasks like testing, building, or deploying your code every time you make changes. This keeps your code working, makes your results reliable, and saves you the time you would otherwise spend catching errors and repeating tasks by hand.

GitLab CI/CD is especially useful in collaborative projects, where multiple people contribute code. It helps maintain code quality and consistency across the team.

How Does It Work?

The set of steps and instructions (pipeline) is defined in a file called .gitlab-ci.yml in your repository. This file specifies what tasks to run, when to run them, and how to run them. When you push code, GitLab reads this file and runs the jobs as described.

What is a GitLab Pipeline?

A pipeline is a sequence of jobs that run automatically when you push changes to your repository. Each job performs a specific task, such as checking your code or building your project.

Key concepts:

  • Job: A single task (e.g., run analyzer, create container, check cutflow).
  • Stage: A group of jobs that run in order (e.g., test, build, deploy).
  • Pipeline: The full set of stages and jobs, triggered by changes to your code.

Example 1: The Simplest GitLab CI Pipeline


Let’s start with the most basic pipeline. This example just prints a message to show that the pipeline is working.

Create a file called .gitlab-ci.yml in your project folder with the following content:

YAML

# .gitlab-ci.yml
stages:
  - test

test_job:
  stage: test
  script:
    - echo "Hello from GitLab CI!"

Now, push this file to your GitLab repository.

  1. Create the .gitlab-ci.yml file as above.

  2. Add, commit, and push it to your repository:

    BASH

    # inside your local git repository ~/cmsdas/cmsdas-gitlab-cms/
    git add .gitlab-ci.yml
    git commit -m "Add simple GitLab CI example"
    git push
  3. Go to your project’s Build > Pipelines page on GitLab to see it run!

Challenge

What happens if you follow these instructions?

If you go to the GitLab website, and to your project’s Build > Pipelines page, you will see a new pipeline has been created and is running. Once it finishes, you can click on the job to see the logs, which will show the message “Hello from GitLab CI!”.

GitLab CI pipeline example

Every time you push to GitLab, it will run the test_job and print a message in the pipeline logs.

Make sure to explore the Pipelines page to see how it works!


Example 2: A Two-Step GitLab CI Pipeline with Dependency


Now let’s make it a bit more interesting. In this example, the first step creates a file, and the second step uses that file. This shows how jobs can depend on each other.

  1. Prepare: Create a file called message.txt with some text.
  2. Show: Display the contents of message.txt (created by the previous step).

Update your .gitlab-ci.yml file to the following:

YAML

# .gitlab-ci.yml
stages:
  - prepare
  - show

prepare_job:
  stage: prepare
  script:
    - echo "Hello from the pipeline!" > message.txt
  artifacts:
    paths:
      - message.txt

show_job:
  stage: show
  script:
    - echo "The message is:"
    - cat message.txt

Try the previous steps again:

  1. Update the .gitlab-ci.yml file as above.

  2. Add, commit, and push it to your repository:

    BASH

    git add .gitlab-ci.yml
    git commit -m "Add two-step pipeline with dependency"
    git push
  3. Go to your project’s Build > Pipelines page on GitLab to watch the jobs run in order!

Challenge

What happens if you follow these instructions?

  • When you push this file to GitLab, it will first run prepare_job (creates message.txt).
  • The file message.txt is saved as an artifact and passed to the next job.
  • Then, show_job runs and prints the contents of message.txt.

Understanding the .gitlab-ci.yml File


The .gitlab-ci.yml file defines your pipeline. Here are the main keywords and their purpose:

  • stages:
    Lists the steps of your pipeline, in order. Each job is assigned to a stage.

    YAML

    stages:
      - test
      - build
      - deploy
  • job:
    Each job is a set of instructions to run. The job name is user-defined (e.g., test_job, build_job).

    YAML

    test_job:
      stage: test
      script:
        - echo "Running tests"
  • stage:
    Specifies which stage the job belongs to.

  • script:
    The commands to execute for the job. You can list one or more shell commands.

  • artifacts:
    Files or directories to pass from one job to another (between stages).

    YAML

    artifacts:
      paths:
        - result.txt

Basic Syntax Rules:

  • Indentation matters: use spaces, not tabs.
  • It is good practice to declare the list of stages at the top of the file.
  • Each job must define a script and should be assigned to a stage.
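Before pushing, you can check the file for syntax errors without triggering a pipeline by sending it to GitLab’s CI Lint API. A minimal sketch, assuming jq is installed and using placeholder values for your numeric project ID and a personal access token with api scope:

BASH

# Validate .gitlab-ci.yml against GitLab's CI Lint endpoint (no pipeline is started).
# PROJECT_ID and GITLAB_TOKEN are placeholders you must set yourself.
jq -Rs '{content: .}' .gitlab-ci.yml |
  curl --silent --request POST \
       --header "PRIVATE-TOKEN: ${GITLAB_TOKEN}" \
       --header "Content-Type: application/json" \
       --data @- \
       "https://gitlab.cern.ch/api/v4/projects/${PROJECT_ID}/ci/lint"
# The JSON response contains "valid": true/false plus any error messages.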

For more details, see the GitLab CI/CD documentation.

Key Points
  • GitLab CI/CD automates repetitive tasks and helps ensure code quality.
  • Pipelines are defined in .gitlab-ci.yml and consist of stages and jobs that run automatically on each push.

Content from A simple CMSSW example


Last updated on 2025-12-20

Overview

Questions

  • How can I compile my CMSSW analysis code?
  • What is the correct way to add and organize my analysis code in a CMSSW work area?
  • How can I verify that my code changes produce the expected results?

Objectives

  • Successfully set up and compile example analysis code in a CMSSW environment.
  • Understand how to organize and add your analysis code to the CMSSW work area.
  • Learn how to test and compare analysis results after making code or selection changes.
Callout

If you are familiar with analysis development in CMSSW, you can skip ahead to the next lesson.

A Simple Example with CMSSW


Let’s walk through a basic example for CMSSW-based analyses. We’ll use a simple analysis that selects pairs of electrons and muons, compile it in a CMSSW environment, and run it on a small dataset. This workflow is typical for HEP analyses at CERN. To understand the workflow, we will first run the analysis code on your “local” machine (cmslpc, lxplus, or a university machine).

Checklist

What You’ll Do:

Step 1: Set Up Your CMSSW Environment

For this local test, create a new CMSSW work area outside your GitLab repository:

BASH

# Go to a folder outside your gitlab repository
cmsrel CMSSW_15_1_0
cd CMSSW_15_1_0/src
cmsenv

This sets up a fresh CMSSW work area.

If your analysis depends on other CMSSW packages, add them using:

BASH

git cms-addpkg PhysicsTools/PatExamples

This ensures all necessary packages are available.

If git cannot find your details, you will see an error like this:

OUTPUT

Cannot find your details in the git configuration.
Please set up your full name via:
    git config --global user.name '<your name> <your last name>'
Please set up your email via:
    git config --global user.email '<your e-mail>'
Please set up your GitHub user name via:
    git config --global user.github <your github username>

There are a couple of options to make things work:

  • set the config as described above,
  • alternatively, create a .gitconfig in your repository and use it as described here,
  • run git cms-init --upstream-only before git cms-addpkg to disable setting up a user remote.
Callout

Always add CMSSW packages before compiling or adding analysis code

Adding CMSSW packages has to happen before adding or compiling analysis code in the repository, since git cms-addpkg will call git cms-init for the $CMSSW_BASE/src directory, and git init doesn’t work if the directory already contains files.


Step 2: Add Analysis Code to Your CMSSW Work Area

Your analysis code must be placed inside the work area’s src directory, following the standard CMSSW structure (e.g., AnalysisCode/MyAnalysis). Usually, your Git repository contains only your analysis code, not the full CMSSW work area.

In this example, we’ll use a sample analysis that selects pairs of electrons and muons. The code is provided as a zip file, containing a ZPeakAnalysis directory with plugins (C++ code) and test (Python config).

Download and extract the code:

BASH

# Go to your project folder from lesson 1
cd ~/nobackup/cmsdas/cmsdas-gitlab-cms/   # Use ~/nobackup for cmslpc users
wget https://alefisico.github.io/cmsdas-cat-gitlab-cms/files/ZPeakAnalysis.zip
unzip ZPeakAnalysis.zip

Copy the analysis code into your CMSSW work area:

BASH

cd ${CMSSW_BASE}/src/
mkdir -p AnalysisCode
cp -r ~/cmsdas/cmsdas-gitlab-cms/ZPeakAnalysis AnalysisCode/   # adjust the path if your repository lives under ~/nobackup

Now your code is in the right place for CMSSW to find and compile it.


Step 3: Compile the Code

Compile the analysis code:

BASH

cd ${CMSSW_BASE}/src/
scram b -j 4

The -j 4 flag uses 4 CPU cores for faster compilation.
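If you are unsure how many cores are available, you can let the shell figure it out (nproc is part of GNU coreutils and available on lxplus and cmslpc):

BASH

# Use all available cores for the build.
scram b -j "$(nproc)"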


Step 4: Run the Analysis Code Locally

Now that the code is compiled, run it on a small dataset to test it:

BASH

cmsenv   # Good practice to ensure the environment is set
cd ${CMSSW_BASE}/src/AnalysisCode/ZPeakAnalysis/
cmsRun test/MyZPeak_cfg.py

OUTPUT

19-Dec-2025 15:46:09 CST  Initiating request to open file root://cmseos.fnal.gov//store/user/cmsdas/2026/short_exercises/cat/datasets/MINIAODSIM/RunIISummer20UL17MiniAODv2-106X_mc2017_realistic_v9-v2/DYJetsToLL_M-50_TuneCP5_13TeV-amcatnloFXFX-pythia8/2C5565D7-ADE5-2C40-A0E5-BDFCCF40640E.root
19-Dec-2025 15:46:15 CST  Successfully opened file root://cmseos.fnal.gov//store/user/cmsdas/2026/short_exercises/cat/datasets/MINIAODSIM/RunIISummer20UL17MiniAODv2-106X_mc2017_realistic_v9-v2/DYJetsToLL_M-50_TuneCP5_13TeV-amcatnloFXFX-pythia8/2C5565D7-ADE5-2C40-A0E5-BDFCCF40640E.root
Begin processing the 1st record. Run 1, Event 15930998, LumiSection 5897 on stream 0 at 19-Dec-2025 15:46:15.905 CST
Begin processing the 1001st record. Run 1, Event 15933554, LumiSection 5897 on stream 0 at 19-Dec-2025 15:46:32.266 CST
Begin processing the 2001st record. Run 1, Event 15936188, LumiSection 5898 on stream 0 at 19-Dec-2025 15:46:32.534 CST
Begin processing the 3001st record. Run 1, Event 130736047, LumiSection 48385 on stream 0 at 19-Dec-2025 15:46:32.809 CST
....
Begin processing the 54001st record. Run 1, Event 1743339, LumiSection 646 on stream 0 at 19-Dec-2025 15:46:48.177 CST
19-Dec-2025 15:46:48 CST  Closed file root://cmseos.fnal.gov//store/user/cmsdas/2026/short_exercises/cat/datasets/MINIAODSIM/RunIISummer20UL17MiniAODv2-106X_mc2017_realistic_v9-v2/DYJetsToLL_M-50_TuneCP5_13TeV-amcatnloFXFX-pythia8/2C5565D7-ADE5-2C40-A0E5-BDFCCF40640E.root

If everything went well, you should see an output file called myZPeak.root. You can open it with ROOT to check the histograms.
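For a quick look without writing any code, you can browse the file from the ROOT prompt. A minimal sketch; the directory layout inside the file is an assumption, so check it with .ls first:

BASH

# Open the output file interactively.
root -l myZPeak.root
# Then, at the ROOT prompt:
#   .ls               # list the contents of the file
#   mumuMass->Draw()  # draw the dimuon mass histogram (histogram names match
#                     # number_of_events.txt; they may sit inside a subdirectory)
#   .q                # quit ROOT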


Step 5: Check the Number of Events

To check the number of events in your histograms, run:

BASH

python3 test/check_number_events.py

This creates a text file called number_of_events.txt with output like:

OUTPUT

muonMult: 54856.0
eleMult:  54856.0
mumuMass: 16324.0

Step 6: Modify the Selection and Compare Results

Now, let’s modify the selection criteria and compare the results:

  1. Rerun the analysis with a modified selection:

    BASH

    cmsRun test/MyZPeak_cfg.py minPt=40

    This changes the minimum muon transverse momentum to 40 GeV.

  2. Check the number of events again:

    BASH

    cp number_of_events.txt number_of_events_old.txt
    python3 test/check_number_events.py
  3. Compare the two results:

    BASH

    python3 test/check_cutflows.py number_of_events.txt test/number_of_expected_events.txt

    You should see a difference in the number of selected events due to the modified selection. For reference, expected results are provided in test/number_of_expected_events.txt.
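Since step 2 saved the previous counts to number_of_events_old.txt, a plain diff also makes the change easy to spot:

BASH

# Line-by-line comparison of the counts before and after the tighter selection.
diff number_of_events_old.txt number_of_events.txt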


Testimonial

Why Compare results?

Creating a test to compare results is vital in analysis development!

As you develop your analysis, you’ll often modify selection criteria and code. This can lead to unintended changes in the number of selected events. By creating tests to compare results after each modification, you ensure your changes have the intended effect and do not introduce errors. This practice is crucial for maintaining the integrity of your analysis as it evolves.

Key Points
  • CMSSW analysis code must be placed inside the src directory of your CMSSW work area to be compiled.
  • Always add required CMSSW packages before copying or compiling your analysis code.
  • Running and testing your analysis locally helps catch issues before automating with CI.
  • Comparing results after code or selection changes is essential for reliable and reproducible analyses.

Content from Setting up CMSSW in GitLab CI/CD


Last updated on 2025-12-23

Overview

Questions

  • How do I run CMSSW in GitLab CI instead of on LXPLUS or cmslpc?
  • Which runner tags provide CVMFS access, and how do I select them?
  • What minimal steps must go into .gitlab-ci.yml to set up CMSSW?

Objectives

  • Select and use a CVMFS-enabled runner (e.g., k8s-cvmfs) and verify /cvmfs is mounted.
  • Write a minimal .gitlab-ci.yml that sets up CMSSW (cmsset_default.sh, SCRAM_ARCH, cmsrel, cmsenv) and validates the setup with cmsRun --help.
  • Understand CI isolation and ensure all environment setup is captured in the repository.
Prerequisite

If you skipped the previous lessons, ensure you have:

  • A GitLab repository at CERN GitLab (named cmsdas-gitlab-cms).
  • The repository cloned to your local machine with an empty .gitlab-ci.yml file.
  • The example analysis code downloaded and added to your repository.
Discussion

What Do We Want to Achieve?

In the previous lesson, you ran a simple analysis workflow locally on your machine (cmslpc, lxplus, or university machine) with three steps:

  1. Compile the analysis code in a CMSSW environment.
  2. Run the analysis code on a small dataset.
  3. Check the number of events in the output.

These are steps you’ll repeat often during analysis development. Therefore, it’s a best practice to automate them using GitLab CI/CD, ensuring consistency and catching errors early.

Setting Up CMSSW in GitLab CI/CD


Understanding the CI Environment

When you use GitLab CI, your code is not executed on your own computer, but on a separate machine (called a runner) provided by CERN GitLab. Every time you push changes to your repository, the runner downloads your code and executes the pipeline steps in a clean environment.

Because of this isolation, you must ensure that all necessary setup—installing dependencies, configuring the environment, and preparing required files—is included in your CI configuration. This approach guarantees reproducibility, but also means you cannot rely on files or settings from your local machine: everything the pipeline needs must be version-controlled in your repository.

Choosing the Correct GitLab Runner

To run a GitLab CI pipeline, you need a .gitlab-ci.yml file in your repository root. The simple pipeline from the previous lesson had minimal software requirements, but running CMSSW analysis requires special setup.

To use CMSSW in GitLab CI, you need access to CVMFS (CERN Virtual Machine File System), a network file system optimized for delivering software in high-energy physics. CVMFS provides access to CMSSW, grid proxies, and other tools from /cvmfs/.

Standard GitLab CI runners at CERN do not mount CVMFS by default. To get a runner with CVMFS access, add a tags section to your .gitlab-ci.yml:

YAML

tags:
  - k8s-cvmfs

Here’s a minimal example to verify CVMFS is mounted:

YAML

cvmfs_test:
  tags:
    - k8s-cvmfs
  script:
    - ls /cvmfs/cms.cern.ch/

The job cvmfs_test simply lists the contents of /cvmfs/cms.cern.ch/. If CVMFS is not mounted, this will fail.

BASH

git add .gitlab-ci.yml
git commit -m "added a CI"
git push

Then go to your GitLab project page and navigate to Build > Pipelines to watch the job run.

Discussion

Challenge: Exploring Different Runners

Try these variations and observe what happens:

  1. Run the job without the tags section. What happens?
  2. Run the job with the tag cvmfs instead of k8s-cvmfs. What happens?

Discuss the differences. (Hint: Check the CERN GitLab CI runners documentation for details.)

After pushing, check your pipeline on the GitLab project page. You’ll see the job logs and the k8s-cvmfs label indicating the runner type.

Setting Up CMSSW in Your Pipeline

Now you’ll learn how to configure a GitLab CI job that properly sets up and uses CMSSW. This example is applicable to any CI job requiring CVMFS access and CMS-specific tools.

Callout

Important:

The default user in the runner container is not your username, and the container has no CMS-related environment setup. Unlike LXPLUS (where CMS members get automatic environment configuration via the zh group), everything must be set up manually in CI.

Steps to Set Up CMSSW

In a typical workflow on LXPLUS, you would run:

BASH

cmssw-el7
source /cvmfs/cms.cern.ch/cmsset_default.sh
cmsrel CMSSW_10_6_30
cd CMSSW_10_6_30/src
cmsenv

Let’s break down what each command does:

  • cmssw-el7: Starts a CentOS 7 container. (CMSSW_10_6_30 is old and doesn’t have builds for newer LXPLUS systems.)
  • source /cvmfs/cms.cern.ch/cmsset_default.sh: Sets up environment variables (adds /cvmfs/cms.cern.ch/common to ${PATH}) and defines helper functions like cmsrel and cmsenv.
  • cmsrel CMSSW_10_6_30: Creates a new CMSSW release. (We are using an old version of CMSSW on purpose for this exercise.)
  • cmsenv: Sets up the runtime environment for the release.

At cmslpc, jobs typically run in your nobackup area (/uscms_data/d3/USERNAME). You may need to bind your directories when starting a container:

BASH

cmssw-el7 -p --bind /uscms/homes/${USER:0:1}/${USER} --bind /uscms_data/d3/${USER} --bind /uscms_data --bind /cvmfs -- /bin/bash -l
Discussion

What are the actual commands behind cmsenv and cmsrel?

The most important aliases are in the table below:

Alias    Command
cmsenv   eval `scramv1 runtime -sh`
cmsrel   scramv1 project CMSSW

The meaning of eval: the arguments are read and concatenated into a single command, which the shell then reads and executes; the exit status of that command is returned as the value of eval. If there are no arguments, or only null arguments, eval returns 0.
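You can see this expansion for yourself inside a CMSSW work area, from ${CMSSW_BASE}/src:

BASH

# Inspect the commands that cmsenv would execute, then execute them.
scramv1 runtime -sh | head       # print the first few generated export statements
eval "$(scramv1 runtime -sh)"    # this is exactly what cmsenv does
echo "${CMSSW_BASE}"             # now points to your release area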

Writing the .gitlab-ci.yml File

Now, translate the manual setup into a .gitlab-ci.yml configuration:

YAML

cmssw_setup:
  image: registry.cern.ch/docker.io/cmssw/el7:x86_64
  tags:
    - k8s-cvmfs
  variables:
    CMS_PATH: /cvmfs/cms.cern.ch
  script:
    - set +u && source ${CMS_PATH}/cmsset_default.sh; set -u
    - export SCRAM_ARCH=slc7_amd64_gcc700
    - cmsrel CMSSW_10_6_30
    - cd CMSSW_10_6_30/src
    - cmsenv
    - cmsRun --help

Key points:

  • image: Specifies a CentOS 7 container, equivalent to running cmssw-el7 locally.
  • tags: Ensures the job runs on a runner with CVMFS access.
  • variables: Defines CMS_PATH, mirroring the LXPLUS environment.
  • set +u and set -u: Temporarily allows unset variables (some setup scripts reference optional variables). This is a defensive coding practice.
  • cmsRun --help: A simple test to verify CMSSW is properly configured.
Discussion

Exercise: Verify CMSSW Setup

Update your .gitlab-ci.yml with the configuration above and push it to GitLab. Check the pipeline logs to confirm:

  1. The job runs on a k8s-cvmfs runner.
  2. The cmsRun --help command executes successfully.

What do the logs tell you about the CMSSW installation?

Callout

A common pitfall when setting up CMSSW in GitLab CI is that a job fails because a setup script does not follow shell best practices (for example, it returns a non-zero exit code even though setup succeeded, or it references unset variables). Even if the script exits without a visible error message, the job can still be marked as failed. To avoid such spurious failures, temporarily disable strict checks (set +u) before running the setup command and re-enable them afterwards (set -u).
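The failure mode is easy to reproduce locally. A minimal sketch you can paste into any bash shell (NOT_DEFINED is just a placeholder for a variable that was never set):

BASH

# With strict checks on, expanding an unset variable aborts the (sub)shell.
set -u
( echo "value: ${NOT_DEFINED}" ) || echo "failed: unbound variable"
# With checks relaxed, the same expansion silently yields an empty string.
set +u
echo "value: ${NOT_DEFINED}"
set -u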

Key Points
  • GitLab CI runs on remote runners; capture all environment setup in .gitlab-ci.yml and do not rely on local machine files.
  • Use a CVMFS-enabled Kubernetes runner tag (e.g., k8s-cvmfs); the image: keyword is only honored on Kubernetes runners; verify /cvmfs/cms.cern.ch is accessible.
  • Set up CMSSW with cmsset_default.sh, SCRAM_ARCH, cmsrel, and cmsenv, then validate with cmsRun --help.
  • Temporarily relax shell strict mode (set +u … set -u) around the setup commands to avoid failures from unset variables or non-standard exit codes; inspect job logs to diagnose issues.

Content from Running a CMSSW analysis in GitLab CI/CD


Last updated on 2025-12-23

Overview

Questions

  • How do I run my CMSSW analysis with GitLab CI instead of locally?
  • How can I avoid re-downloading and rebuilding CMSSW on every CI job?
  • How do artifacts help share build outputs between pipeline stages?

Objectives

  • Run a CMSSW analysis job in GitLab CI using a CVMFS-enabled runner.
  • Configure .gitlab-ci.yml to reuse a built CMSSW release via artifacts instead of rebuilding each time, and handle write-protection in downstream jobs.
  • Validate the analysis output (ROOT file and event counts) within the CI pipeline.

After successfully setting up CMSSW in GitLab CI/CD in the previous lesson, you are now ready to run the next steps of the analysis workflow from Episode 2: run the analysis on a single file stored on EOS (accessed via XRootD), apply some selections, and check the number of events in the output.

Running a CMSSW Analysis in GitLab CI/CD


A Simple Job

First, ensure the analysis code is committed to your GitLab repository:

BASH

# Inside your gitlab repository cmsdas-gitlab-cms/
# Download and unzip the analysis code (https://alefisico.github.io/cmsdas-cat-gitlab-cms/files/ZPeakAnalysis.zip)
git add ZPeakAnalysis/
git commit -m "Add analysis code"
git push

Then extend your .gitlab-ci.yml to set up CMSSW and run the analysis with cmsRun:

YAML

cmssw_setup:
  image: registry.cern.ch/docker.io/cmssw/el7:x86_64
  tags:
    - k8s-cvmfs
  variables:
    CMS_PATH: /cvmfs/cms.cern.ch
  script:
    - echo "Setting up CMSSW environment"
    - set +u && source ${CMS_PATH}/cmsset_default.sh; set -u
    - export SCRAM_ARCH=slc7_amd64_gcc700
    - cmsrel CMSSW_10_6_30
    - cd CMSSW_10_6_30/src
    - cmsenv
    - echo "Copying analysis code"
    - mkdir -p AnalysisCode
    - cp -r $CI_PROJECT_DIR/ZPeakAnalysis AnalysisCode/
    - echo "Compiling analysis code"
    - scram b -j 4
    - cd AnalysisCode/ZPeakAnalysis/
    - echo "Running analysis code"
    - cmsRun test/MyZPeak_cfg.py
    - ls -l myZPeak.root
    - echo "Checking number of events"
    - python3 test/check_number_events.py
    - echo "Testing output"
    - python3 test/check_cutflows.py number_of_events.txt test/number_of_expected_events.txt
Discussion

Do you see any potential issues with this job definition?

The job above will work, but it’s inefficient:

  • CMSSW setup (release download, build) runs every time.
  • If you want multiple datasets, you’d duplicate the job and repeat the setup.

Using Artifacts for Efficiency

Use artifacts so you set up and build CMSSW once, then reuse it:

YAML

stages:
  - compile
  - run
  - check

cmssw_compile:
  image: registry.cern.ch/docker.io/cmssw/el7:x86_64
  stage: compile
  tags:
    - k8s-cvmfs
  variables:
    CMS_PATH: /cvmfs/cms.cern.ch
    CMSSW_RELEASE: CMSSW_10_6_30
    SCRAM_ARCH: slc7_amd64_gcc700
  script:
    - set +u && source ${CMS_PATH}/cmsset_default.sh; set -u
    - export SCRAM_ARCH=${SCRAM_ARCH}
    - cmsrel ${CMSSW_RELEASE}
    - cd ${CMSSW_RELEASE}/src
    - cmsenv
    - mkdir -p AnalysisCode
    - cp -r $CI_PROJECT_DIR/ZPeakAnalysis AnalysisCode/
    - cd ${CMSSW_BASE}/src
    - scram b -j 4
  artifacts:
    untracked: true
    expire_in: 1 hour
    paths:
      - ${CMSSW_RELEASE}

cmssw_run:
  needs:
    - cmssw_compile
  stage: run
  image: registry.cern.ch/docker.io/cmssw/el7:x86_64
  tags:
    - k8s-cvmfs
  variables:
    CMS_PATH: /cvmfs/cms.cern.ch
    CMSSW_RELEASE: CMSSW_10_6_30
    SCRAM_ARCH: slc7_amd64_gcc700
  script:
    - set +u && source ${CMS_PATH}/cmsset_default.sh; set -u
    - export SCRAM_ARCH=${SCRAM_ARCH}
    - mkdir -p run
    - cp -r ${CMSSW_RELEASE} run/
    - chmod -R +w run/${CMSSW_RELEASE}/
    - cd run/${CMSSW_RELEASE}/src
    - cmsenv
    - mkdir -p AnalysisCode
    - cp -r $CI_PROJECT_DIR/ZPeakAnalysis AnalysisCode/
    - cd AnalysisCode/ZPeakAnalysis/
    - cmsRun test/MyZPeak_cfg.py
    - ls -l myZPeak.root
  artifacts:
    untracked: true
    expire_in: 1 hour
    paths:
      - run/${CMSSW_RELEASE}/src/AnalysisCode/ZPeakAnalysis/myZPeak.root

check_events:
  needs:
    - cmssw_compile
    - cmssw_run
  stage: check
  image: registry.cern.ch/docker.io/cmssw/el7:x86_64
  tags:
    - k8s-cvmfs
  variables:
    CMS_PATH: /cvmfs/cms.cern.ch
    CMSSW_RELEASE: CMSSW_10_6_30
    SCRAM_ARCH: slc7_amd64_gcc700
  script:
    - set +u && source ${CMS_PATH}/cmsset_default.sh; set -u
    - export SCRAM_ARCH=${SCRAM_ARCH}
    - mkdir -p run
    - cp -r ${CMSSW_RELEASE} run/
    - chmod -R +w run/${CMSSW_RELEASE}/
    - cd run/${CMSSW_RELEASE}/src
    - cmsenv
    - mkdir -p AnalysisCode
    - cp -r $CI_PROJECT_DIR/ZPeakAnalysis AnalysisCode/
    - cd AnalysisCode/ZPeakAnalysis/
    - python3 test/check_number_events.py --input ${CI_PROJECT_DIR}/run/${CMSSW_RELEASE}/src/AnalysisCode/ZPeakAnalysis/myZPeak.root
    - python3 test/check_cutflows.py number_of_events.txt test/number_of_expected_events.txt

For the compiled code to be available in subsequent steps, the artifacts must be explicitly defined. In the cmssw_compile job, we specify:

YAML

artifacts:
  untracked: true
  expire_in: 1 hour
  paths:
    - ${CMSSW_RELEASE}

Key options:

  • untracked: true: Includes files not tracked by git (ignores .gitignore), ensuring the full CMSSW build area is captured.
  • expire_in: Specifies how long artifacts are kept before automatic deletion. Use 1 hour for testing or 1 week for longer workflows.
  • paths: Lists directories/files to preserve. Here, ${CMSSW_RELEASE} captures the entire CMSSW work area.
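Artifacts can also be retrieved outside CI, which is handy when debugging a build on your own machine. A hedged sketch using GitLab’s job artifacts API; PROJECT_ID, GITLAB_TOKEN, and the branch name are placeholders you must set yourself:

BASH

# Download the artifacts of the latest successful cmssw_compile job on `master`.
curl --location --output artifacts.zip \
     --header "PRIVATE-TOKEN: ${GITLAB_TOKEN}" \
     "https://gitlab.cern.ch/api/v4/projects/${PROJECT_ID}/jobs/artifacts/master/download?job=cmssw_compile"
unzip -q artifacts.zip -d compile-artifacts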
Callout

Artifacts Are Write-Protected

Artifacts are write-protected by default. You cannot modify files directly in the artifact directory in downstream jobs.

To work around this, copy the artifact to a new directory and add write permissions:

YAML

script:
  - set +u && source ${CMS_PATH}/cmsset_default.sh; set -u
  - export SCRAM_ARCH=${SCRAM_ARCH}
  - mkdir -p run
  - cp -r ${CMSSW_RELEASE} run/
  - chmod -R +w run/${CMSSW_RELEASE}/
  - cd run/${CMSSW_RELEASE}/src
  - cmsenv
  - mkdir -p AnalysisCode
  - cp -r $CI_PROJECT_DIR/ZPeakAnalysis AnalysisCode/
  - cd AnalysisCode/ZPeakAnalysis/
  - cmsRun test/MyZPeak_cfg.py

This ensures cmssw_run and check_events can reuse the built CMSSW area without rebuilding, while still being able to write output files.

Using needs vs dependencies

In the pipeline above, we use the needs keyword to define job dependencies:

YAML

cmssw_run:
  needs:
    - cmssw_compile
  stage: run
  # ...

Differences between needs and dependencies:

Keyword Behavior
needs The job starts as soon as the listed jobs finish, regardless of stage order, and fetches artifacts only from those jobs. Enables parallelism.
dependencies The job still follows normal stage order (it waits for earlier stages to complete); the keyword only controls which jobs’ artifacts are downloaded.

Best practice: Use needs for faster pipelines—it allows jobs to start as soon as their dependencies are met, rather than waiting for an entire stage to finish.

Discussion

How does this updated pipeline improve efficiency?

  • cmssw_compile sets up and builds CMSSW once and stores it as an artifact.
  • cmssw_run reuses that artifact and skips re-building.
  • Global variables can avoid repetition:

YAML

...
variables:
  CMS_PATH: /cvmfs/cms.cern.ch
  CMSSW_RELEASE: CMSSW_10_6_30
  SCRAM_ARCH: slc7_amd64_gcc700

cmssw_compile:
  image: registry.cern.ch/docker.io/cmssw/el7:x86_64
  stage: compile
  tags:
    - k8s-cvmfs
  script:
    - set +u && source ${CMS_PATH}/cmsset_default.sh; set -u
    - export SCRAM_ARCH=${SCRAM_ARCH}
...

If your jobs succeed, you don’t need the rest of the lessons. If they fail (most likely while accessing the input data), continue with the next lesson.

Key Points
  • GitLab CI runs on remote runners; every dependency and setup step must be in .gitlab-ci.yml.
  • Use CVMFS-enabled Kubernetes runners (k8s-cvmfs); the image: keyword is honored only there.
  • Split the pipeline into stages and pass the built CMSSW area via artifacts to avoid rebuilding in each job.
  • Validate outputs in CI (e.g., myZPeak.root, event counts) to catch issues early; inspect job logs for failures.
  • Artifacts are write-protected by default; downstream jobs must copy them to a writable directory using mkdir, cp -r, and chmod -R +w.
  • Define artifacts with untracked: true to capture build outputs, and set expire_in to control automatic cleanup (e.g., 1 hour for testing, 1 week for production).
  • Use the needs keyword for faster pipelines—jobs start immediately when dependencies complete, rather than waiting for entire stages to finish. dependencies is an alternative that strictly respects stage order.

Content from Securely adding passwords and files to GitLab


Last updated on 2025-12-23

Overview

Questions

  • How can I securely store my grid certificate and password in GitLab without exposing them?
  • Why can’t I simply copy my grid certificate files into my GitLab repository?
  • How do I restore and use grid credentials in a CI job?

Objectives

  • Understand why grid credentials must be encoded and stored as protected CI/CD variables.
  • Encode grid certificates and passwords using base64 and add them securely to GitLab.
  • Restore grid certificate files and obtain a VOMS proxy in a CI job to access CMS data.
Prerequisite
  • A valid grid certificate issued by CERN, and installed in ~/.globus/usercert.pem and ~/.globus/userkey.pem.

Unless you are very familiar with CMSSW and CMS data access, the previous pipeline will likely fail when trying to access data on EOS via XRootD. This is because accessing CMS data typically requires a valid grid proxy.

In your local environment (e.g., LXPLUS or cmslpc), you obtain a grid proxy using the voms-proxy-init command, which relies on your grid certificate files (usercert.pem and userkey.pem) stored in ~/.globus/. However, GitLab CI jobs run in isolated environments and have no access to your local grid proxy.

Additionally, you must never store grid certificate files or passwords directly in your GitLab repository—this would expose sensitive information. Instead, securely add them as CI/CD variables in GitLab.


Obtaining a Grid Certificate


To access CMS data, you need a grid certificate, from which you can obtain a Virtual Organization Membership Service (VOMS) proxy. If you don’t have one yet, request it from the CERN Grid CA, or follow these instructions.

Your grid certificate consists of two files, usercert.pem and userkey.pem. For example:

BASH

cat ~/.globus/usercert.pem

Output (example):

OUTPUT

Bag Attributes
    localKeyID: 95 A0 95 B0 1e AB BD 13 59 D1 D2 BB 35 5A EA 2E CD 47 BA F7
subject=/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=username/CN=123456/CN=Anonymous Nonamious
issuer=/DC=ch/DC=cern/CN=CERN Grid Certification Authority
-----BEGIN CERTIFICATE-----
TH1s1SNT4R34lGr1DC3rt1f1C4t3But1Th4s4l3NgtH0F64CH4r4ct3rSP3rL1N3
...
-----END CERTIFICATE-----

Keep your secrets secret


Caution

Never expose your grid certificate files or passwords!

  • Never store certificates or passwords in version control. Even if you delete them from HEAD, they remain in the commit history.
  • Storing secrets in a shared repository violates grid policy and can lead to access revocation.
  • If you accidentally commit sensitive data, immediately follow GitHub or GitLab removal guides—but treat the data as compromised.

For more details, see GitLab CI/CD variables documentation.

Encoding Certificates with base64


GitLab variables must be plain strings without line breaks. To preserve the structure of your certificate files, encode them using base64.

Example: Encoding a Certificate

BASH

base64 -i ~/.globus/usercert.pem -w 0
base64 -i ~/.globus/userkey.pem -w 0

(On Linux, add -w 0 to disable line wrapping. On macOS, omit -w 0, since the output is not wrapped by default, and use -D instead of -d when decoding.)

Output (example):

OUTPUT

QmFnIEF0dHJpYnV0ZXMKICAgIGxvY2FsS2V5SUQ6IDk1IEEwIDk1IEIwIDFlIEFCIEJEIDEzIDU5IEQxIEQyIEJCIDM1IDVBIEVBIDJFIENEIDQ3IEJBIEY3CnN1YmplY3Q9L0RDPWNoL0RDPWNlcm4vT1U9T3JnYW5pYyBVbml0cy9PVT1Vc2Vycy9DTj11c2VybmFtZS9DTj0xMjM0NTYvQ049QW5vbnltb3VzIE5vbmFtaW91cwppc3N1ZXI9L0RDPWNoL0RDPWNlcm4vQ049Q0VSTiBHcmlkIENlcnRpZmljYXRpb24gQXV0aG9yaXR5Ci0tLS0tQkVHSU4gQ0VSVElGSUNBVEUtLS0tLQpUSDFzMVNOVDRSMzRsR3IxREMzcnQxZjFDNHQzQnV0MVRoNHM0bDNOZ3RIMEY2NENINHI0Y3QzclNQM3JMMU4zCjFhbVQwMExhMllUMHdSMVQzNDVtMHIzbDFOM1MwZm4wbnMzTlMzUzAxL2xMU3QwcEgzcjNBbmRBRGRzUEFjM1MKLi4uNDUgbW9yZSBsaW5lcyBvZiBsMzN0IGRpYWxlY3QuLi4KKzRuZCtoZUw0UyszOGNINHI0YytlcnNCZWYwckUrSEUxK2VuRHM9PQotLS0tLUVORCBDRVJUSUZJQ0FURS0tLS0tCg==

To verify encoding works, decode it back:

BASH

base64 -i ~/.globus/usercert.pem -w 0 | base64 -d

This should return your original certificate.

Discussion

Decode a Secret Message

Try decoding this base64-encoded message:

SSB3aWxsIG5ldmVyIHB1dCBteSBzZWNyZXRzIHVuZGVyIHZlcnNpb24gY29udHJvbAo=

Use the base64 command with the -d (Linux) or -D (macOS) flag.

BASH

echo "SSB3aWxsIG5ldmVyIHB1dCBteSBzZWNyZXRzIHVuZGVyIHZlcnNpb24gY29udHJvbAo=" | base64 -d

Output:

OUTPUT

I will never put my secrets under version control


Adding Grid Certificate and Password to GitLab


Now that we know how to encode our grid certificate files, we can add them as CI/CD variables in GitLab. These variables are passed to the job environment when a job runs, and can be used in the job scripts. This way, we avoid storing sensitive information directly in the repository.

There are a couple of important things to keep in mind when adding passwords and certificates as variables to GitLab:

  • Variables should always be set to Protected state. This ensures that they are only available in protected branches, e.g., your master or main branch. This is important when collaborating with others, since anyone with access could simply echo the variables in a merge request if automated tests run on merge requests.
  • As an additional safety measure, set them as Masked as well if possible. This will prevent your password from appearing in job logs. (Note: masking may not work perfectly for certificates due to their length, but should work for your grid password.)

For more details, see the GitLab CI/CD variables documentation. If you upload sensitive data by mistake, immediately follow the guidance on removing sensitive data on GitHub or GitLab.

Encoding Your Credentials

For your grid proxy password, encode it using base64 (make sure nobody’s peeking at your screen):

BASH

printf 'mySecr3tP4$$w0rd' | base64 -w 0

Use single quotes ('), not double quotes ("). On Linux, add -w 0 to disable line wrapping (by default, base64 wraps after 76 characters).

For the two certificates, use them directly as input:

BASH

base64 -i ~/.globus/usercert.pem -w 0
base64 -i ~/.globus/userkey.pem -w 0

Copy the full output into GitLab.

Caution

Every Equal Sign Counts!

Make sure to copy the full string including trailing equal signs.

Adding Variables in GitLab UI

Go to Settings > CI/CD > Variables in your GitLab project. Add the following three variables:

Variable Value
GRID_PASSWORD (base64-encoded password)
GRID_USERCERT (base64-encoded usercert.pem)
GRID_USERKEY (base64-encoded userkey.pem)

For each variable:

  • Check Protected (so it’s only available on protected branches like master/main).
  • Check Masked (so values won’t appear in job logs—especially important for the password).

The Settings > CI/CD > Variables section should look like this:

CI/CD Variables section with grid secrets added

Protect Your Main Branch

To reduce the risk of leaking passwords and certificates to others, go to Settings > Repository > Protected Branches and protect your master or main branch. This prevents collaborators from pushing directly and accidentally exposing secrets in job logs.

Protecting branches to prevent password leaks

With the Protected option enabled for variables, they are only available to those protected branches (though Maintainers can still push to them).

Using the Grid Proxy in CI


With secrets stored, you can now restore the certificate files and obtain a grid proxy in your CI job:

BASH

mkdir -p ${HOME}/.globus
printf "${GRID_USERCERT}" | base64 -d > ${HOME}/.globus/usercert.pem
printf "${GRID_USERKEY}" | base64 -d > ${HOME}/.globus/userkey.pem
chmod 400 ${HOME}/.globus/userkey.pem
printf "${GRID_PASSWORD}" | base64 -d | voms-proxy-init --voms cms --pwstdin
voms-proxy-info --all

Example GitLab CI Job

The standard CentOS 7 CMSSW image lacks CMS-specific certificates. Use the CMS-specific image instead:

YAML

voms_proxy_test:
  image:
    name: gitlab-registry.cern.ch/cms-cloud/cmssw-docker/cc7-cms:latest
    entrypoint: [""]
  tags:
    - k8s-cvmfs
  script:
    - mkdir -p ${HOME}/.globus
    - printf "${GRID_USERCERT}" | base64 -d > ${HOME}/.globus/usercert.pem
    - printf "${GRID_USERKEY}" | base64 -d > ${HOME}/.globus/userkey.pem
    - chmod 400 ${HOME}/.globus/userkey.pem
    - printf "${GRID_PASSWORD}" | base64 -d | voms-proxy-init --voms cms --pwstdin
    - voms-proxy-info --all
    - voms-proxy-destroy

The entrypoint: [""] override is needed to ensure the job runs your custom script instead of a default entrypoint.

Test Your Setup

Before moving to the next section, commit and push your .gitlab-ci.yml, navigate to Build > Pipelines, and verify:

  1. The job runs on a cvmfs runner.
  2. voms-proxy-init succeeds without errors.
  3. voms-proxy-info displays your proxy details.

If proxy creation fails, check the job logs and verify all three variables are set correctly (especially trailing equal signs in base64 strings).
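Before pasting the strings into GitLab, you can verify locally that the encoded certificate survives a round trip. A small check, assuming openssl is available (drop -w 0 on macOS):

BASH

# Encode, decode, and let openssl confirm the certificate still parses.
base64 -w 0 ~/.globus/usercert.pem | base64 -d | openssl x509 -noout -subject -dates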

Callout

Once your grid proxy works, you can extend the pipeline to run your analysis with cmsRun and access CMS data on EOS!

Even with the voms_proxy_test job added, you still need to export your grid proxy as an artifact and pick it up in the analysis job to access data on EOS.

Here is an example of how to do this:

YAML

stages:
  - compile
  - run
  - check

variables:
  CMS_PATH: /cvmfs/cms.cern.ch
  CMSSW_RELEASE: CMSSW_10_6_30
  SCRAM_ARCH: slc7_amd64_gcc700

cmssw_compile:
  image: registry.cern.ch/docker.io/cmssw/el7:x86_64
  stage: compile
  tags:
    - k8s-cvmfs
  script:
    - set +u && source ${CMS_PATH}/cmsset_default.sh; set -u
    - export SCRAM_ARCH=${SCRAM_ARCH}
    - cmsrel ${CMSSW_RELEASE}
    - cd ${CMSSW_RELEASE}/src
    - cmsenv
    - mkdir -p AnalysisCode
    - cp -r $CI_PROJECT_DIR/ZPeakAnalysis AnalysisCode/
    - cd ${CMSSW_BASE}/src
    - scram b -j 4
  artifacts:
    untracked: true
    expire_in: 1 hour
    paths:
      - ${CMSSW_RELEASE}

voms_proxy_test:
  stage: compile
  image:
    name: gitlab-registry.cern.ch/cms-cloud/cmssw-docker/cc7-cms:latest
    entrypoint: [""]
  tags:
    - k8s-cvmfs
  script:
    - mkdir -p ${HOME}/.globus
    - printf "${GRID_USERCERT}" | base64 -d > ${HOME}/.globus/usercert.pem
    - printf "${GRID_USERKEY}" | base64 -d > ${HOME}/.globus/userkey.pem
    - chmod 400 ${HOME}/.globus/userkey.pem
    # write the proxy inside the project directory so it can be exported as an artifact
    - export X509_USER_PROXY=${CI_PROJECT_DIR}/proxy
    - printf "${GRID_PASSWORD}" | base64 -d | voms-proxy-init --voms cms --pwstdin --out ${X509_USER_PROXY}
    - voms-proxy-info --all
  artifacts:
    expire_in: 1 hour
    paths:
      - proxy

cmssw_run:
  needs:
    - cmssw_compile
    - voms_proxy_test
  stage: run
  image: registry.cern.ch/docker.io/cmssw/el7:x86_64
  tags:
    - k8s-cvmfs
  script:
    - set +u && source ${CMS_PATH}/cmsset_default.sh; set -u
    - export SCRAM_ARCH=${SCRAM_ARCH}
    # point the grid tools at the proxy restored from the voms_proxy_test artifact
    - export X509_USER_PROXY=${CI_PROJECT_DIR}/proxy
    - chmod 600 ${X509_USER_PROXY}
    - voms-proxy-info --all
    # artifacts are restored write-protected; make the work area writable
    - chmod -R +w ${CMSSW_RELEASE}
    - cd ${CMSSW_RELEASE}/src
    - cmsenv
    - cd AnalysisCode/ZPeakAnalysis/
    - cmsRun test/MyZPeak_cfg.py
    - ls -l myZPeak.root
  artifacts:
    untracked: true
    expire_in: 1 hour
    paths:
      - ${CMSSW_RELEASE}/src/AnalysisCode/ZPeakAnalysis/myZPeak.root

check_events:
  needs:
    - cmssw_compile
    - cmssw_run
  stage: check
  image: registry.cern.ch/docker.io/cmssw/el7:x86_64
  tags:
    - k8s-cvmfs
  script:
    - set +u && source ${CMS_PATH}/cmsset_default.sh; set -u
    - export SCRAM_ARCH=${SCRAM_ARCH}
    # artifacts are restored write-protected; make the work area writable
    - chmod -R +w ${CMSSW_RELEASE}
    - cd ${CMSSW_RELEASE}/src
    - cmsenv
    - cd AnalysisCode/ZPeakAnalysis/
    - python3 test/check_number_events.py --input ${CI_PROJECT_DIR}/${CMSSW_RELEASE}/src/AnalysisCode/ZPeakAnalysis/myZPeak.root
    - python3 test/check_cutflows.py number_of_events.txt test/number_of_expected_events.txt
Key Points
  • Never store grid certificates or passwords in version control; they remain in commit history and violate grid policy.
  • Encode grid certificates and passwords with base64 and store them as protected CI/CD variables in GitLab.
  • Set variables to Protected (available only on protected branches) and Masked (hidden in logs) for maximum security.
  • Restore grid certificate files and obtain a VOMS proxy in CI jobs using base64 -d and voms-proxy-init --pwstdin.

Content from CAT services for GitLab CI


Last updated on 2025-12-23

Overview

Questions

  • How can I access CMS data stored on EOS from GitLab CI without managing my own grid certificates?
  • What are CAT services and why do they only work in the cms-analysis namespace?
  • How do I request authentication tokens for EOS access and VOMS proxies in CI jobs?

Objectives

  • Understand the purpose and security model of CAT services for GitLab CI.
  • Use the CAT EOS file service to access CMS data on EOS via JWT-authenticated tokens.
  • Use the CAT VOMS proxy service to obtain a grid proxy for the CMS VO without storing personal certificates.
Discussion

Did you find it difficult to access CMS data in GitLab CI/CD?

Accessing CMS data stored on EOS or using grid resources in GitLab CI/CD can be challenging, especially for newcomers. You must manage authentication, permissions, and complex data access methods. The Common Analysis Tools (CAT) group in CMS provides services to simplify these tasks.

Discussion

Why CERN GitLab Instead of GitHub?

CERN GitLab is tailored for the CERN community and offers seamless integration with CERN infrastructure: CVMFS, EOS storage, and grid computing resources. This makes it ideal for CMS analysts working with large datasets and complex workflows.

If you host your analysis on GitHub, you must manually set up access to CMS resources—a cumbersome and error-prone process. CERN GitLab lets you leverage CAT services that simplify data access and authentication, so you can focus on analysis rather than infrastructure.

The cms-analysis user code space


The Common Analysis Tools (CAT) group maintains a dedicated CERN GitLab namespace called cms-analysis, where any CMS member can store their analysis code. It is documented in the CAT documentation pages.

The namespace is organized into groups and subgroups following the CMS Physics Coordination structure. You can request an area in the PAG-specific group that best matches your analysis.

Request an Area Anytime

Keep your analysis code under version control from the start. You can request an area at any stage (in fact we invite you to do so):

  • You can create an area with a temporary name initially, then rename it to match your CADI line when the analysis matures.
  • Request an area for your analysis at cms-analysis.

The services described here only work in cms-analysis

For security reasons, the services described here only work if your project is in the cms-analysis namespace.

To move your current project:

  1. Go to Settings > General > Advanced > Transfer project.
  2. Select cms-analysis/CMSDAS/CAT-tutorials as the new namespace (for testing).
  3. For production work, select the relevant POG/PAG subgroup.

Using the CAT EOS file service


CAT maintains a service account, cmscat, that has CMS VO membership and can access EOS. CAT provides a service to request short-lived EOS tokens in CI jobs, allowing you to access CMS files on behalf of the cmscat service account.

Files accessible: /eos/cms/store/group/cat

For detailed documentation, see the CAT CI dataset service.

The file you need is not there?

If the file you need is not in /eos/cms/store/group/cat, request it by creating a merge request to cms-analysis/services/ci-dataset-files.

How the Service Works

Two steps are required:

Step 1: Configure your job to create a JWT (JSON Web Token) for authentication:

YAML

id_tokens:
  MY_JOB_JWT:
    aud: "cms-cat-ci-datasets.app.cern.ch"

Step 2: Query the CAT service to get a short-lived EOS token:

BASH

XrdSecsssENDORSEMENT=$(curl -H "Authorization: ${MY_JOB_JWT}" "https://cms-cat-ci-datasets.app.cern.ch/api?eospath=${EOSPATH}" | tr -d \")

Then access the file using:

BASH

root://eoscms.cern.ch/${EOSPATH}?authz=${XrdSecsssENDORSEMENT}&xrd.wantprot=unix

where EOSPATH is a variable holding the path of a file on EOS; the token is appended to the URL as the authz query parameter.
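Put together, the two steps look like this inside a job’s script (MY_JOB_JWT is injected by GitLab from the id_tokens section, and EOSPATH is the variable defined above):

BASH

# Exchange the job's JWT for a short-lived EOS token, then copy the file with xrdcp.
XrdSecsssENDORSEMENT=$(curl -H "Authorization: ${MY_JOB_JWT}" \
  "https://cms-cat-ci-datasets.app.cern.ch/api?eospath=${EOSPATH}" | tr -d \")
xrdcp "root://eoscms.cern.ch/${EOSPATH}?authz=${XrdSecsssENDORSEMENT}&xrd.wantprot=unix" test.root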

Challenge

Exercise: Copy a File Using the CAT EOS Service

Try copying this file from the CAT storage:

BASH

/eos/cms/store/group/cat/datasets/MINIAODSIM/RunIISummer20UL17MiniAODv2-106X_mc2017_realistic_v9-v2/DYJetsToLL_M-50_TuneCP5_13TeV-amcatnloFXFX-pythia8/2C5565D7-ADE5-2C40-A0E5-BDFCCF40640E.root

YAML

test_eos_service:
  image:
    name: registry.cern.ch/docker.io/cmssw/el7:x86_64
  tags:
    - cvmfs
  id_tokens:
    MY_JOB_JWT:
      aud: "cms-cat-ci-datasets.app.cern.ch"
  variables:
    EOSPATH: '/eos/cms/store/group/cat/datasets/MINIAODSIM/RunIISummer20UL17MiniAODv2-106X_mc2017_realistic_v9-v2/DYJetsToLL_M-50_TuneCP5_13TeV-amcatnloFXFX-pythia8/2C5565D7-ADE5-2C40-A0E5-BDFCCF40640E.root'
    EOS_MGM_URL: root://eoscms.cern.ch
  before_script:
    - 'XrdSecsssENDORSEMENT=$(curl -H "Authorization: ${MY_JOB_JWT}" "https://cms-cat-ci-datasets.app.cern.ch/api?eospath=${EOSPATH}" | tr -d \")'
  script:
    - xrdcp "${EOS_MGM_URL}/${EOSPATH}?authz=${XrdSecsssENDORSEMENT}&xrd.wantprot=unix" test.root
    - ls -l test.root

Using the CAT VOMS Proxy Service


The cmscat service account is a CMS VO member and can request VOMS proxies. If your project is in cms-analysis, it can request a VOMS proxy from the service at cms-cat-grid-proxy-service.app.cern.ch, in much the same way as the EOS token was requested from cms-cat-ci-datasets.app.cern.ch above.

The proxy is returned as a base64-encoded string with a lifetime matching your CI job duration.

Exercise: Set up a CI Job to Obtain a Grid Proxy

Set up a CI job that retrieves and validates a VOMS proxy from the CAT service.

Step 1: Configure your job to create a JWT:

YAML

id_tokens:
  MY_JOB_JWT:
    aud: "cms-cat-grid-proxy-service.app.cern.ch"

Step 2: Query the CAT service to get a proxy:

BASH

proxy=$(curl --fail-with-body -H "Authorization: ${MY_JOB_JWT}" "https://cms-cat-grid-proxy-service.app.cern.ch/api" | tr -d \")

Step 3: Decode and store the proxy:

YAML

  - printf '%s' "$proxy" | base64 -d > myproxy
  - export X509_USER_PROXY=$(pwd)/myproxy

Important: CVMFS and Environment Variables

Your job must run on a runner with CVMFS mounted (e.g., tags: [cvmfs]). Depending on the image environment, you may also need to export grid security variables:

YAML

  - export X509_VOMS_DIR=/cvmfs/grid.cern.ch/etc/grid-security/vomsdir/
  - export VOMS_USERCONF=/cvmfs/grid.cern.ch/etc/grid-security/vomses/
  - export X509_CERT_DIR=/cvmfs/grid.cern.ch/etc/grid-security/certificates/

YAML

test_proxy_service:
  image:
    name: registry.cern.ch/docker.io/cmssw/el7:x86_64
  tags:
    - cvmfs
  id_tokens:
    MY_JOB_JWT:
      aud: "cms-cat-grid-proxy-service.app.cern.ch"
  before_script:
    - 'proxy=$(curl -H "Authorization: ${MY_JOB_JWT}" "https://cms-cat-grid-proxy-service.app.cern.ch/api" | tr -d \")'
  script:
    - printf '%s' "$proxy" | base64 -d > myproxy
    - export X509_USER_PROXY=$(pwd)/myproxy
    - export X509_CERT_DIR=/cvmfs/grid.cern.ch/etc/grid-security/certificates/
    - voms-proxy-info

Callout

With both CAT services in place, you now have automatic, secure access to CMS data and grid resources in your CI pipeline—no manual certificate management required!

Key Points
  • CAT services simplify CMS resource access in GitLab CI by eliminating manual certificate and credential management.
  • CAT services only work in the cms-analysis namespace for security; move your project there via Settings > General > Advanced > Transfer project.
  • The CAT EOS file service provides JWT-authenticated access to datasets in /eos/cms/store/group/cat via the cmscat service account.
  • The CAT VOMS proxy service provides short-lived grid proxies without storing personal certificates; proxies are returned as base64-encoded strings.
  • Both services require configuring id_tokens in .gitlab-ci.yml and querying CAT endpoints to retrieve authentication tokens.

Content from Putting It All Together: Final Run with CAT Services


Last updated on 2025-12-23

Overview

Questions

  • How do I combine CMSSW compilation, data access, and authentication into a single CI/CD pipeline?
  • What are the trade-offs between using personal grid credentials versus CAT services?
  • How can I use both CAT EOS and VOMS proxy services in the same pipeline?

Objectives

  • Create a complete .gitlab-ci.yml pipeline that compiles CMSSW, obtains authentication, and runs analysis on EOS data.
  • Implement data access using either CAT EOS file service or CAT VOMS proxy service.
  • Compare and choose between personal credentials and CAT services based on your workflow.

Putting It All Together: Final Run with CAT Services


In this final episode, you will combine everything you’ve learned to build a complete GitLab CI/CD pipeline that runs a CMSSW analysis using CAT services for data access and authentication. Your pipeline will include:

  1. Set up the CMSSW environment — Configure CMSSW version and environment variables.
  2. Obtain authentication — Use CAT services or personal grid credentials for CMS resource access.
  3. Access CMS data on EOS — Read input datasets via XRootD.
  4. Run the CMSSW analysis — Execute analysis code on input data.
  5. Validate output — Store and review analysis results.

Accessing Files on EOS via XRootD


Most analyses run on centrally produced files stored on EOS. To access them, you need a valid grid proxy for the CMS Virtual Organisation (VO).

We have already used EOS files in previous lessons. For this final example, we’ll explicitly define and use a single file from the DYJetsToLL dataset.

A copy is stored permanently on EOS at:

OUTPUT

/eos/cms/store/group/cat/datasets/MINIAODSIM/RunIISummer20UL17MiniAODv2-106X_mc2017_realistic_v9-v2/DYJetsToLL_M-50_TuneCP5_13TeV-amcatnloFXFX-pythia8/2C5565D7-ADE5-2C40-A0E5-BDFCCF40640E.root

Test Locally First

Before running in CI, test file access on your local machine:

BASH

cd ${CMSSW_BASE}/src/AnalysisCode/ZPeakAnalysis/
cmsRun test/MyZPeak_cfg.py inputFiles=/store/group/cat/datasets/MINIAODSIM/RunIISummer20UL17MiniAODv2-106X_mc2017_realistic_v9-v2/DYJetsToLL_M-50_TuneCP5_13TeV-amcatnloFXFX-pythia8/2C5565D7-ADE5-2C40-A0E5-BDFCCF40640E.root
ls -l myZPeak.root

This verifies your grid proxy and data access setup before committing to CI.


Three Approaches: Complete Examples


We show three different approaches to running your analysis in CI: using your personal grid credentials stored as CI/CD variables (previous lesson), using the CAT EOS file service, or using the CAT VOMS proxy service. Choose the one that best fits your use case.

Challenge: Create Your Complete Pipeline


Discussion

Challenge

Choose one of the three approaches above and implement a complete .gitlab-ci.yml file that:

  1. Compiles your CMSSW analysis code.
  2. Obtains authentication (using your chosen method).
  3. Runs the analysis on the example file.
  4. Validates the output.

Commit and push to GitLab, then check the Build > Pipelines page to see it run. What happens? Did it succeed?

Callout

Congratulations! You have now built a complete, production-ready CI/CD pipeline for CMS analysis using GitLab. You can extend this pattern to multiple datasets, batch processing, and complex workflows.

Content from EXTRA: Building a Container Image in GitLab CI/CD


Last updated on 2025-12-23

Overview

Questions

  • How can I build a container image with my own code in a GitLab CI/CD pipeline?
  • How can I reuse the same container building configuration across multiple jobs?

Objectives

  • Build and push a container image to the GitLab container registry using buildah.
  • Create a reusable .buildah template for container building.
  • Use container images built in CI/CD pipelines in your analysis workflows.

Building Container Images with Buildah


Buildah is a container build tool that works well in CI/CD environments without requiring Docker. In this lesson, you’ll learn to build and store container images in the GitLab container registry within your CI/CD pipeline.

What You’ll Need

  • A Dockerfile in your repository that defines your container image.
  • If unfamiliar with Docker, see the HSF Docker tutorial.

Creating a Reusable Buildah Template


Instead of repeating buildah commands in every job, define a reusable .buildah template that other jobs can extend:

YAML

.buildah:
  stage: build
  image: quay.io/buildah/stable
  variables:
    DOCKER_FILE_NAME: "Dockerfile"
    REGISTRY_IMAGE_PATH: ${CI_REGISTRY_IMAGE}:latest
    EXTRA_TAGS: ""
  script:
    - echo "$CI_REGISTRY_PASSWORD" | buildah login -u "$CI_REGISTRY_USER" --password-stdin $CI_REGISTRY
    - export BUILDAH_FORMAT=docker
    - export STORAGE_DRIVER=vfs
    - buildah build --storage-driver=${STORAGE_DRIVER} -f ${DOCKER_FILE_NAME} -t ${REGISTRY_IMAGE_PATH}
    - |
      if [ -n "${EXTRA_TAGS}" ]; then
        for tag in ${EXTRA_TAGS}; do
          buildah tag ${REGISTRY_IMAGE_PATH} ${CI_REGISTRY_IMAGE}:${tag}
        done
      fi
    - buildah push --storage-driver=${STORAGE_DRIVER} ${REGISTRY_IMAGE_PATH}
    - |
      if [ -n "${EXTRA_TAGS}" ]; then
        for tag in ${EXTRA_TAGS}; do
          buildah push --storage-driver=${STORAGE_DRIVER} ${CI_REGISTRY_IMAGE}:${tag}
        done
      fi

Template variables:

  • DOCKER_FILE_NAME: Path to your Dockerfile (default: Dockerfile).
  • REGISTRY_IMAGE_PATH: Full image path and tag (e.g., registry.example.com/user/repo:tag).
  • EXTRA_TAGS: Space-separated list of additional tags (optional, e.g., "latest v1.0").
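You can rehearse the template’s core commands on your own machine before wiring them into CI. A minimal sketch, assuming a recent buildah is installed locally and a Dockerfile exists in the current directory:

BASH

# Build the image locally with the same storage driver the template uses (no push).
export BUILDAH_FORMAT=docker
export STORAGE_DRIVER=vfs
buildah build --storage-driver="${STORAGE_DRIVER}" -f Dockerfile -t myimage:test
buildah images   # confirm the image exists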

Example 1: Simple Image Build and Push


Extend the .buildah template to build an image tagged with your commit hash:

YAML

stages:
  - build

build_image:
  extends: .buildah
  variables:
    DOCKER_FILE_NAME: "Dockerfile"
    REGISTRY_IMAGE_PATH: "${CI_REGISTRY_IMAGE}:${CI_COMMIT_SHORT_SHA}"
    EXTRA_TAGS: "latest"

What happens:

  1. The image is tagged with your commit hash (e.g., a1b2c3d).
  2. It’s also tagged as latest via EXTRA_TAGS.
  3. Both tags are pushed to your GitLab container registry.

Find your image at Deploy > Container registry in your GitLab project.


Example 2: Conditional Builds with Rules


Build the container image only when the Dockerfile changes or on a schedule:

YAML

stages:
  - build

build_image:
  extends: .buildah
  variables:
    DOCKER_FILE_NAME: "Dockerfile"
    REGISTRY_IMAGE_PATH: "${CI_REGISTRY_IMAGE}:${CI_COMMIT_SHORT_SHA}"
    EXTRA_TAGS: "latest"
  rules:
    - if: $CI_PIPELINE_SOURCE == "schedule"
      variables:
        EXTRA_TAGS: "latest scheduled"
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
      changes:
        - Dockerfile
    - if: $CI_COMMIT_BRANCH == "main"
      changes:
        - Dockerfile

This ensures containers are only rebuilt when necessary, saving CI/CD time.


Using Your Built Container Image


Once the image is pushed to the registry, use it in subsequent CI jobs or elsewhere:

In GitLab CI/CD

YAML

test_with_image:
  stage: test
  image: ${CI_REGISTRY_IMAGE}:latest
  script:
    - echo "Running tests in the custom container"
    - echo "Add your test commands here"

Locally (on LXPLUS or your machine)

BASH

apptainer shell docker://${CI_REGISTRY_IMAGE}:latest

Or with Docker:

BASH

docker run ${CI_REGISTRY_IMAGE}:latest

Replace ${CI_REGISTRY_IMAGE} with your actual registry path (e.g., gitlab-registry.cern.ch/username/repo).
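Note that pulling from the CERN registry usually requires logging in first, for example:

BASH

# Authenticate with your CERN account (or a project deploy token), then pull.
docker login gitlab-registry.cern.ch
docker pull gitlab-registry.cern.ch/username/repo:latest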


Discussion

Challenge: Build Your Own Container

  1. Create a simple Dockerfile in your repository that compiles CMSSW like the previous example.
  2. Add a build_image job to your .gitlab-ci.yml that extends .buildah.
  3. Push your code and check CI/CD > Pipelines to watch the build.
  4. Find your image in Deploy > Container registry.
  5. Use it in a downstream CI job or locally with apptainer.

Callout

Tip: The commit hash tag (e.g., a1b2c3d) ensures a one-to-one correspondence between your code and the built image, making it easy to reproduce analysis runs from a specific commit.

Build an image that includes the compiled CMSSW area (from cmssw_compile)


If you already compile CMSSW in a cmssw_compile job and publish the full ${CMSSW_RELEASE} as an artifact, you can bake that compiled area into a reusable runtime image. This lets downstream jobs (or local users) run cmsRun without rebuilding.

1) Ensure cmssw_compile exports the area as an artifact

YAML

variables:
  CMS_PATH: /cvmfs/cms.cern.ch
  CMSSW_RELEASE: CMSSW_10_6_30
  SCRAM_ARCH: slc7_amd64_gcc700

cmssw_compile:
  image: registry.cern.ch/docker.io/cmssw/el7:x86_64
  stage: compile
  tags: [k8s-cvmfs]
  script:
    - set +u && source ${CMS_PATH}/cmsset_default.sh; set -u
    - export SCRAM_ARCH=${SCRAM_ARCH}
    - cmsrel ${CMSSW_RELEASE}
    - cd ${CMSSW_RELEASE}/src && cmsenv
    - mkdir -p AnalysisCode && cp -r $CI_PROJECT_DIR/ZPeakAnalysis AnalysisCode/
    - cd ${CMSSW_BASE}/src && scram b -j 4
  artifacts:
    untracked: true
    expire_in: 1 hour
    paths:
      - ${CMSSW_RELEASE}

2) Add a Dockerfile that copies the compiled area

Create Dockerfile in your repository:

DOCKERFILE

# Dockerfile
FROM registry.cern.ch/docker.io/cmssw/el7:x86_64

ARG CMSSW_RELEASE
ENV CMS_PATH=/cvmfs/cms.cern.ch \
    SCRAM_ARCH=slc7_amd64_gcc700 \
    CMSSW_RELEASE=${CMSSW_RELEASE} \
    CMSSW_BASE=/opt/${CMSSW_RELEASE}

# Copy the compiled CMSSW area from the CI workspace (artifact) into the image
COPY ${CMSSW_RELEASE} /opt/${CMSSW_RELEASE}

SHELL ["/bin/bash", "-lc"]
RUN echo 'source ${CMS_PATH}/cmsset_default.sh && cd ${CMSSW_BASE}/src && cmsenv' >> /etc/profile.d/cmssw.sh
WORKDIR /opt/${CMSSW_RELEASE}/src

Notes:

  • The COPY ${CMSSW_RELEASE} ... works because the build job (below) uses needs: cmssw_compile with artifacts: true, so the compiled directory is present in the build context.
  • This image contains your compiled code, so downstream jobs can run cmsRun directly.

3) Build the image by extending the reusable .buildah template

Assuming you already defined the reusable .buildah template from earlier in this lesson, add the image build job:

YAML

stages: [compile, build, test]

build_cmssw_image:
  stage: build
  extends: .buildah
  needs:
    - job: cmssw_compile
      artifacts: true
  variables:
    DOCKER_FILE_NAME: "Dockerfile"
    REGISTRY_IMAGE_PATH: "${CI_REGISTRY_IMAGE}:${CI_COMMIT_SHORT_SHA}"
    EXTRA_TAGS: "latest"
    # Pass the release name as a build-arg (the Dockerfile above requires it)
    BUILDAH_ARGS: "--build-arg CMSSW_RELEASE=${CMSSW_RELEASE}"
  before_script:
    - echo "$CI_REGISTRY_PASSWORD" | buildah login -u "$CI_REGISTRY_USER" --password-stdin $CI_REGISTRY
    - export BUILDAH_FORMAT=docker
    - export STORAGE_DRIVER=vfs
  script:
    - buildah build --storage-driver=${STORAGE_DRIVER} ${BUILDAH_ARGS} -f ${DOCKER_FILE_NAME} -t ${REGISTRY_IMAGE_PATH}
    - |
      if [ -n "${EXTRA_TAGS}" ]; then
        for tag in ${EXTRA_TAGS}; do
          buildah tag ${REGISTRY_IMAGE_PATH} ${CI_REGISTRY_IMAGE}:${tag}
        done
      fi
    - buildah push --storage-driver=${STORAGE_DRIVER} ${REGISTRY_IMAGE_PATH}
    - |
      if [ -n "${EXTRA_TAGS}" ]; then
        for tag in ${EXTRA_TAGS}; do
          buildah push --storage-driver=${STORAGE_DRIVER} ${CI_REGISTRY_IMAGE}:${tag}
        done
      fi

You can now use the image ${CI_REGISTRY_IMAGE}:${CI_COMMIT_SHORT_SHA} (or :latest) in later jobs:

YAML

run_with_baked_image:
  stage: test
  image: ${CI_REGISTRY_IMAGE}:${CI_COMMIT_SHORT_SHA}
  tags: [k8s-cvmfs]
  script:
    - source /cvmfs/cms.cern.ch/cmsset_default.sh
    - cd /opt/${CMSSW_RELEASE}/src && cmsenv
    - cmsRun AnalysisCode/ZPeakAnalysis/test/MyZPeak_cfg.py

Tips:

  • Artifacts are read-only in downstream jobs, but Buildah only needs to read them (no extra chmod is required).
  • Tag images by commit for exact reproducibility; add latest for a moving pointer.