A set of quick instructions to get started
If you need tutorials or other material to get started with some basic languages and programs, here is a (non-exhaustive and ongoing) list of suggestions:
python and how to create plots using python tools.
C++ and python.
CMS has a series of schools with introductory topics for newcomers. A list of the latest schools:
As a first introduction to what CMS is doing with ML, it is strongly recommended to follow the CMSDAS tutorial. Link.
I suggest you watch the lecture and follow the slides.
As a first step towards getting used to ML tools, I suggest you use the CMSDAS material in CERN SWAN. SWAN is a hub created at CERN that provides many of the tools needed for CERN data analysis. For more information about SWAN, follow this link. Contact me if you don't have access to SWAN.
In SWAN you will be asked which environment you need to create. For now, you can use the default settings and click on Start my Session.
Once you access your SWAN projects, you can import GitHub repositories directly by clicking on the button next to the plus button.
There you can add the GitHub link https://github.com/FNALLPC/machine-learning-das.git from the repository. After this, you will have all the Jupyter notebooks in SWAN ready to run. You must start with the 0-setup-libraries.ipynb notebook, which creates the environment you need for the rest of the notebooks.
Jupyter notebooks are a wonderful tool for teaching and learning to code. However, since the code there already works, it is easy to just run it once without understanding what it does. I strongly suggest you take your time to see what each example is doing. Play with it as much as you can.
This notebook is optional; you can continue to notebook 4 without losing any major information. If you have a problem with import torch, you need to modify the first cell. From:
!{sys.executable} -m pip install torch torchvision root_pandas --user
to
!{sys.executable} -m pip install torch --user
Then you can run it without problems.
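To confirm that the modified install cell worked, you can try the import in a fresh cell before running the rest of the notebook. The helper below is only a minimal sketch; the name `is_importable` is hypothetical and is not part of the tutorial notebooks.

```python
import importlib


def is_importable(module_name):
    """Return True if module_name can be imported in the current kernel."""
    try:
        importlib.import_module(module_name)
        return True
    except ImportError:
        # The package is not installed (or not visible) in this environment
        return False
```

For example, `is_importable("torch")` should return `True` once the install cell has run. If it still returns `False`, restarting the kernel may help, since packages installed with `--user` are not always picked up by a session that is already running.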
This notebook relies on the skopt package, which is recommended to use with python3. The current environment has python2.7. We can fix this issue, but for now it is not necessary.
This only means that you might need to change your environment. Go back to your main SWAN folder and click on the three dots in the top right. There, find the option change configuration, which brings you back to the settings. In my test, it worked perfectly with 16 GB.
In this notebook, you need to access some files that exist only in the CERN storage area. For that, you need to set up your certificate in lxplus. If you have never installed your certificate in lxplus, please follow these instructions.
Then, in lxplus, run:
voms-proxy-init -rfc -voms cms --valid 168:00
It will print a message containing some information and a line that looks like this:
Created proxy in /tmp/x509up_u99999.
This file contains a certificate that you need to copy to your CERNBox. To do that, run:
cp -p /tmp/x509up_u99999 /eos/user/X/USER/tmp/x509up_u99999
where X and USER depend on your user. For instance, if your CERN user is agomez, then X=a and USER=agomez.
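The X and USER substitution can be sketched in a few lines of Python. This is only an illustration of the naming rule described above; the function `cernbox_proxy_path` is a hypothetical helper, and the proxy file name must be the one printed by your own voms-proxy-init call.

```python
def cernbox_proxy_path(cern_user, proxy_name):
    """Build the CERNBox destination path for the proxy file.

    X is the first letter of the CERN user name and USER is the full name.
    """
    if not cern_user:
        raise ValueError("cern_user must not be empty")
    return "/eos/user/{0}/{1}/tmp/{2}".format(cern_user[0], cern_user, proxy_name)


# Example with the user name from the text and the placeholder proxy name
print(cernbox_proxy_path("agomez", "x509up_u99999"))
# prints /eos/user/a/agomez/tmp/x509up_u99999
```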
Then, in the notebook 4-preprocessing.ipynb, in the first cell right after import numpy as np, copy:
##### REMEMBER TO MANUALLY COPY THE PROXY TO YOUR CERNBOX FOLDER AND TO MODIFY THE NEXT LINE
import os
os.environ['X509_USER_PROXY'] = '/eos/home-X/USER/tmp/x509up_u99999'
# Warn if the proxy file is not where the line above says it should be
if not os.path.isfile(os.environ['X509_USER_PROXY']):
    print("Proxy file not found: " + os.environ['X509_USER_PROXY'])
os.environ['X509_CERT_DIR'] = '/cvmfs/cms.cern.ch/grid/etc/grid-security/certificates'
os.environ['X509_VOMS_DIR'] = '/cvmfs/cms.cern.ch/grid/etc/grid-security/vomsdir'
where X and USER follow the same notation as before.
Once you have modified this line correctly, you only need to run the cell once; it downloads many of the files needed and takes a long time.
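Because this step takes a long time, it may be worth checking first that the copied proxy has not expired. The sketch below estimates the remaining validity from the file's modification time. It assumes the proxy was created with a validity of 168 hours and copied with `cp -p`, which preserves the original timestamp. The function name `proxy_hours_left` is hypothetical, and the result is only an estimate; running `voms-proxy-info` on lxplus remains the authoritative check.

```python
import os
import time

PROXY_VALIDITY_HOURS = 168  # assumption: matches --valid 168:00 used when creating the proxy


def proxy_hours_left(proxy_path, now=None):
    """Estimate the hours of validity left, based on the file's modification time.

    Returns None if the file does not exist.
    """
    if not os.path.isfile(proxy_path):
        return None
    if now is None:
        now = time.time()
    age_hours = (now - os.path.getmtime(proxy_path)) / 3600.0
    return PROXY_VALIDITY_HOURS - age_hours


# Check the proxy that the notebook cell points to, if the variable is set
hours = proxy_hours_left(os.environ.get("X509_USER_PROXY", ""))
if hours is None:
    print("Proxy file not found")
elif hours <= 0:
    print("Proxy has probably expired; create and copy a new one")
else:
    print("Proxy should be valid for about {:.0f} more hours".format(hours))
```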
If you have any technical problems, don't hesitate to contact me on Mattermost.
More to come