Configuring new clients
New production clients will need access to Ganga (no longer available via CVMFS - must be installed manually; see below) and a dirac_ui if submitting to the Dirac WMS. If a grid UI has been set up then you should already have CVMFS access.
Software needed
- gridui: Accessible via /cvmfs/grid.cern.ch
- ganga:
  - Recommended to copy the ganga-7.1.9 directory from Liverpool or another site where the screens already work, to avoid compatibility issues with Dirac. A fresh installation of the same version can give problems too.
  - Otherwise, create a virtual environment for Python 2.7 as follows (a consolidated sketch is given after this list):
    - Run virtualenv ganga-7.1.9 in the home directory
    - Run source ganga-7.1.9/bin/activate to enable the virtual Python installation
    - Run python -m pip install ganga==7.1.9 to install Ganga into this virtual environment
    - Run deactivate to leave the virtual environment
  - Ensure data-flow/env.sh points to this location ($HOME/ganga-7.1.9) for the Ganga location
- data-flow: Accessible via GitHub
  - Needs a pip install to generate libraries
  - Can also scp the data-flow/lib directory from Cedar
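For reference, the virtualenv steps above can be run as one sequence. This is a minimal sketch, assuming Python 2.7 and virtualenv are already available on the client:
cd $HOME
virtualenv ganga-7.1.9                # create the virtual environment
source ganga-7.1.9/bin/activate       # enable the virtual Python installation
python -m pip install ganga==7.1.9    # install Ganga into the virtualenv
deactivate                            # leave the virtual environment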
Configuration files needed
- .gangarc: Configure for your local batch system and directory structure; be sure RUNTIME_PATH points to your Ganga install.
  - Typically, the .gangarc is a default one; we have two customized ones on each site (for processing and production), usually labelled as .gangarc_[backend]_[processing|production]
  - Copy the ones from Cedar or another site as a reference
- .gangasnoplus: Contains ratdb passwords. May be deprecated.
- data-flow/gasp/config/[sitename].cfg: Make sure this is passed to gasp_client. This contains information about the passwords to access databases.
- data-flow/gasp/sites/[sitename].py: This will define the qualities of the site, including the name, the number of jobs and the RAT versions.
Environment
A few directories need to be accessible from all nodes:
- ganga_jobs: Path set in postexecute and preexecute in the corresponding .gangarc (processing and production).
- gangadir: Ganga information directories, the path is set in .gangarc.
- $TMPDIR: Set in an environment file.
- Valid grid certificate: Stored in .globus (should probably just copy the .globus from Cedar); referenced by $X509_USER_PROXY
- data-flow/env.sh: May need tweaks to match your environment (e.g. if CVMFS is not mounted); see the sketch below.
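As an illustration, the overrides env.sh may need on a site without CVMFS could look like the following. All paths here are assumptions for illustration; use your site's actual locations:
# Hypothetical overrides for data-flow/env.sh on a site without CVMFS
export TMPDIR=$HOME/tmp                          # must be visible from all nodes
export PATH=$HOME/ganga-7.1.9/bin:$PATH          # the virtualenv Ganga installed above
export X509_USER_PROXY=$HOME/grid/proxy/dirac_production_proxy   # proxy location, as in the Dirac section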
Cron jobs
You will typically need two cron jobs. First, once a day, clear out anything older than a week in $TMPDIR. This can be done with a command like find /home/snoprod/tmp/ -mtime +7 -delete > /dev/null
Second, the grid certificates need to be refreshed every hour. See the instructions here or look at what is on Cedar as an active example.
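Put together, the crontab might look something like this. The cleanup path matches the example above; the proxy-refresh script name is an assumption (see the linked instructions or Cedar for the real one):
# Daily cleanup of week-old files in $TMPDIR (here /home/snoprod/tmp)
0 3 * * * find /home/snoprod/tmp/ -mtime +7 -delete > /dev/null 2>&1
# Hourly refresh of the grid certificates (script name is hypothetical)
0 * * * * $HOME/bin/refresh_proxy.sh > /dev/null 2>&1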
Screen Sessions
Basically, the screen sessions just run loops that check and submit jobs. Checking can be done at any time, but submitting can only happen at one site; the script enqueue_command.py synchronizes submission. For an active example, refer to the setup on Cedar. A rough sketch of such a loop follows.
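As an illustration of the pattern only, a checking loop inside a screen session could look like the following. The script name check_jobs.py and the sleep interval are assumptions; the real loops are the ones running on Cedar:
# Inside a screen session started with: screen -S checking
cd data-flow
source env.sh
while true; do
    python check_jobs.py          # hypothetical job-checking entry point
    sleep 600                     # wait ten minutes between passes
done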
Dirac
Dirac is used to interface with the Grid; it provides submission and monitoring tools. To set up the dirac_ui, first follow the instructions here.
It is recommended to copy the dirac_ui directory from Liverpool or another site where the screens already work, to avoid compatibility issues with Ganga. A fresh installation of the same version can give problems too.
# You will need a script that checks dirac proxy validity and reminds operators
# to renew proxies (the proxies created here will have a lifetime of 1000 hours)
dirac-proxy-init -g snoplus.snolab.ca_production -v 1000:00 -M -u $HOME/grid/proxy/dirac_production_proxy
dirac-proxy-init -g snoplus.snolab.ca_user -v 1000:00 -M -u $HOME/grid/proxy/dirac_user_proxy
export X509_USER_PROXY=$HOME/grid/proxy/dirac_production_proxy
# This environment file will be used later by Ganga for job submission
env > /path/to/dirac_production_env
export X509_USER_PROXY=$HOME/grid/proxy/dirac_user_proxy
# This environment file will be used later by Ganga for job submission
env > /path/to/dirac_user_env
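A minimal sketch of such a validity check is shown below. It reads the remaining lifetime with voms-proxy-info (one way to inspect a proxy file; your site may use a DIRAC tool instead), and the threshold and notification address are assumptions:
#!/bin/bash
# Warn operators when the production proxy has less than ~48 hours left.
PROXY=$HOME/grid/proxy/dirac_production_proxy
LEFT=$(voms-proxy-info --file "$PROXY" --timeleft 2>/dev/null)   # seconds remaining
if [ -z "$LEFT" ] || [ "$LEFT" -lt 172800 ]; then
    echo "Proxy $PROXY expires soon (timeleft: ${LEFT:-unknown}s); please renew." \
        | mail -s "Proxy renewal needed" operators@example.org   # hypothetical address
fi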
Batch
Batch systems (e.g. PBS, SLURM, HTCondor) work directly with a head submission node. If you are using a batch system, you simply need to create a normal VOMS proxy and ensure that your Ganga is set up for that backend.
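For example, a standard VOMS proxy can be created as follows. The VO name matches the DIRAC groups above; the validity is an arbitrary choice:
# Create a normal VOMS proxy for the snoplus VO (adjust validity as needed)
voms-proxy-init --voms snoplus.snolab.ca --valid 168:00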
Ganga
If ganga has not been run before, you will need to run it for the first time:
cd data-flow
source env.sh
ganga
This will prompt you to create a new .gangarc; say yes!
Now that you have a .gangarc, you need to edit it so that, for example, batch submission correctly submits jobs or Dirac submission knows where your environment file is located. For batch submission this may involve edits along these lines:
[Configuration]
Batch = SGE
[SGE]
submit_res_pattern = Your job (?P<id>\d+) (.+) has been submitted
[defaults_SGE]
queue = snoplus
while for Dirac it will probably involve:
[Configuration]
RUNTIME_PATH = GangaDirac
[DIRAC]
DiracEnvFile = $HOME/dirac_ui/dirac_env_file
You can see an example of a batch submission on Cedar and a Dirac submission on Liverpool.
Most locations will typically have a .gangarc_production and a .gangarc_processing that point to two separate gangadirs. This is a way of distinguishing between production and processing jobs which usually operate on slightly different timescales.
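One way to switch between the two is to point Ganga at the variant you want before launching. This assumes your Ganga version honours the GANGA_CONFIG_FILE environment variable; confirm against your installation:
# Select the production configuration for this session (assumption: Ganga reads GANGA_CONFIG_FILE)
export GANGA_CONFIG_FILE=$HOME/.gangarc_production
ganga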