fix_incomplete_paths.py

This script corrects the grid paths of production data files recorded in the database. In some cases, the correct path at the host site (usually RAL, part of GridPP) has been replaced by the string INCOMPLETE.

python validation/fix_incomplete_paths.py [config] [ratv] --modules [module] --format [fileType]
usage: fix_incomplete_paths.py [-h] [--ask] [--modules MODULES]
                               [--format FORMAT] [--fearless] [--alldocs]
                               [--startrun STARTRUN | --runrange RUNRANGE RUNRANGE]
                               [--docrange DOCSTART DOCEND]
                               config ratv

positional arguments:
  config                Configuration file for database access credentials
  ratv                  The version of rat used to select passes

optional arguments:
  -h, --help                    show this help message and exit
  --ask                         Prompts for each matching job instead of assuming fix.
  --modules MODULES             Restrict to modules matching the (shell-style) pattern
  --format FORMAT               Specify the file format (ntuple, ratds, soc or all)
  --fearless                    Do not prompt the user to continue - run without fear!
  --alldocs                     Set the script to run over all docs in the view
  --startrun STARTRUN           Apply to all runs after this run
  --runrange RUNRANGE RUNRANGE  Restrict to a specific run range
  --docrange DOCSTART DOCEND    Restrict --alldocs to run between a specific range in the view
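To show how the options fit together, the help text above can be mirrored with Python's standard argparse module. This is a minimal sketch, not the script's actual source: the option names and help strings are taken from the help text, the --startrun/--runrange pair is mutually exclusive as in the usage line, and the "all" default for --format comes from the description of the script. The "*" default for --modules and the integer types for run and document numbers are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Sketch of a parser matching the documented usage and help text."""
    parser = argparse.ArgumentParser(prog="fix_incomplete_paths.py")
    parser.add_argument("config",
                        help="Configuration file for database access credentials")
    parser.add_argument("ratv", help="The version of rat used to select passes")
    parser.add_argument("--ask", action="store_true",
                        help="Prompts for each matching job instead of assuming fix.")
    # Assumption: with no pattern given, every module matches.
    parser.add_argument("--modules", default="*",
                        help="Restrict to modules matching the (shell-style) pattern")
    # Documented default: "all", i.e. ntuple, ratds and soc files.
    parser.add_argument("--format", default="all",
                        choices=["ntuple", "ratds", "soc", "all"],
                        help="Specify the file format (ntuple, ratds, soc or all)")
    parser.add_argument("--fearless", action="store_true",
                        help="Do not prompt the user to continue - run without fear!")
    parser.add_argument("--alldocs", action="store_true",
                        help="Set the script to run over all docs in the view")
    parser.add_argument("--docrange", nargs=2, type=int, metavar=("DOCSTART", "DOCEND"),
                        help="Restrict --alldocs to run between a specific range in the view")
    # The usage line shows --startrun and --runrange as alternatives.
    runs = parser.add_mutually_exclusive_group()
    runs.add_argument("--startrun", type=int, help="Apply to all runs after this run")
    runs.add_argument("--runrange", nargs=2, type=int,
                      help="Restrict to a specific run range")
    return parser
```

For example, build_parser().parse_args(["db.cfg", "6.0.0", "--runrange", "600", "700"]) (placeholder arguments) gives runrange == [600, 700] and format == "all", while passing both --startrun and --runrange makes argparse exit with a usage error.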

In affected documents, the data file URL contains INCOMPLETE where the path at the host site should be, for example:

srm://srm-snoplus.gridpp.rl.ac.uk/INCOMPLETE/production/TeLoadedAlphan_Telab_Avin_Av_18o/r600/TeLoadedAlphan_Telab_Avin_Av_18o_r629_s0_p2.ntuple.root
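The rewrite of a URL like this one can be sketched with urllib.parse from the Python standard library, using the RAL replacement path /castor/ads.rl.ac.uk/prod/snoplus/ given in the description of the script. The sketch assumes that the whole /INCOMPLETE/ segment, slashes included, is swapped for that path so that no doubled slashes appear; how the real script handles the slashes is not stated here. The function name fix_ral_url is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

RAL_HOST = "srm-snoplus.gridpp.rl.ac.uk"
RAL_PATH = "/castor/ads.rl.ac.uk/prod/snoplus/"
# Assumption: the placeholder is replaced together with its surrounding slashes.
PLACEHOLDER = "/INCOMPLETE/"


def fix_ral_url(url: str) -> str:
    """Return url with the INCOMPLETE placeholder replaced by the RAL path.

    Raises ValueError for URLs that are not at RAL or that lack the
    placeholder, so nothing unexpected is ever rewritten.
    """
    parts = urlsplit(url)
    if parts.hostname != RAL_HOST:
        raise ValueError(f"not a RAL URL: {url}")
    if not parts.path.startswith(PLACEHOLDER):
        raise ValueError(f"no INCOMPLETE placeholder: {url}")
    # Keep everything after the placeholder, e.g. "production/...".
    new_path = RAL_PATH + parts.path[len(PLACEHOLDER):]
    return urlunsplit(parts._replace(path=new_path))
```

Applied to the example URL above, it returns srm://srm-snoplus.gridpp.rl.ac.uk/castor/ads.rl.ac.uk/prod/snoplus/production/TeLoadedAlphan_Telab_Avin_Av_18o/r600/TeLoadedAlphan_Telab_Avin_Av_18o_r629_s0_p2.ntuple.root. Raising an error for anything unexpected, rather than passing the URL through unchanged, keeps a mistaken call from writing a bad path back to the database.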

The script loops over the data documents matching the user's selection (RAT version, modules, file formats and run range) and looks for the substring “INCOMPLETE” in each data path. Since all the issues seen so far have been at RAL (srm-snoplus.gridpp.rl.ac.uk), each “INCOMPLETE” is replaced with “/castor/ads.rl.ac.uk/prod/snoplus/”. These paths are hard-coded as of February 2019, but this may change if other sites show the same issue.

To avoid corrupting the database, the script does not change the path of any data document that points to a host other than RAL, hepgrid11, susx or qmul.

Note that the default file format is “all”: unless told otherwise, the script loops over all the ntuple, ratds and soc files that satisfy the user's conditions.
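The selection and the host safety check described above can be sketched as a generator over the data documents. Everything about the document layout is hypothetical: each document is taken to be a dict with "module", "run", "format" and "url" keys, and selection by RAT version is assumed to happen in the database query, so it is not shown. Matching the allowed sites by fragments of the host name, treating --startrun as exclusive and --runrange as inclusive are also assumptions; the names select_documents and host_is_allowed are illustrative only.

```python
import fnmatch
from typing import Iterable, Iterator, Optional, Tuple
from urllib.parse import urlsplit

# Assumption: sites are recognised by these fragments of the host name.
# Only the RAL host name is spelled out in the text.
ALLOWED_HOST_FRAGMENTS = ("srm-snoplus.gridpp.rl.ac.uk", "hepgrid11", "susx", "qmul")
FORMATS = ("ntuple", "ratds", "soc")


def host_is_allowed(url: str) -> bool:
    """True if the URL's host matches one of the known sites."""
    host = urlsplit(url).hostname or ""
    return any(fragment in host for fragment in ALLOWED_HOST_FRAGMENTS)


def select_documents(docs: Iterable[dict], modules: str = "*", fmt: str = "all",
                     startrun: Optional[int] = None,
                     runrange: Optional[Tuple[int, int]] = None) -> Iterator[dict]:
    """Yield documents whose URL contains INCOMPLETE and that are safe to fix."""
    wanted = FORMATS if fmt == "all" else (fmt,)
    for doc in docs:
        # Shell-style pattern, matched case-sensitively.
        if not fnmatch.fnmatchcase(doc["module"], modules):
            continue
        if doc["format"] not in wanted:
            continue
        # Assumption: "after this run" excludes the run itself.
        if startrun is not None and doc["run"] <= startrun:
            continue
        # Assumption: both ends of the run range are included.
        if runrange is not None and not runrange[0] <= doc["run"] <= runrange[1]:
            continue
        if "INCOMPLETE" not in doc["url"]:
            continue
        if not host_is_allowed(doc["url"]):
            # Unknown host: leave the document untouched.
            continue
        yield doc
```

Keeping the host check inside the selection means an unknown site can never reach the rewriting step, which matches the cautious behaviour described above.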