storage_site_file_scan.py

This script flags and unregisters the zdabs on a given grid storage site and removes their corresponding CouchDB references. Example usage:

python validation/storage_site_file_scan.py -c [config] --start [runStart] --end [runEnd] -f [fileType] -o [output text file] --site [Storage site of interest] --dryRun --verbose

# Required arguments:
#  -c CONFIG         CouchDB configuration file
#  -f FILETYPE       File type to check [L1], [L2]
#  -o OUTTEXT        Output file [files_on_site.txt]
#  --start RUNSTART  First run to check
#  --end RUNEND      Last run to check
#  --site SITE       Storage site to check
# Optional arguments:
#  -h, --help        show this help message and exit
#  --dryRun          Dry run, produces an output of files that will be modified and unregistered.
#  --verbose         Increase output verbosity
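
For reference, the option list above could be declared with argparse roughly as follows. This is a minimal sketch and not the script's actual source: the `dest` names, the integer type for run numbers, the restriction of -f to L1/L2, and treating files_on_site.txt as the default for -o are all assumptions read off the help text.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Flag and unregister zdabs on a grid storage site."
    )
    required = parser.add_argument_group("Required arguments")
    required.add_argument("-c", dest="config", required=True,
                          help="CouchDB configuration file")
    # Assumption: the bracketed values in the help text are the allowed choices.
    required.add_argument("-f", dest="file_type", required=True,
                          choices=["L1", "L2"], help="File type to check")
    # Assumption: the bracketed file name in the help text is a default.
    required.add_argument("-o", dest="out_text", default="files_on_site.txt",
                          help="Output file")
    # Assumption: run numbers are integers.
    required.add_argument("--start", dest="run_start", type=int, required=True,
                          help="First run to check")
    required.add_argument("--end", dest="run_end", type=int, required=True,
                          help="Last run to check")
    required.add_argument("--site", dest="site", required=True,
                          help="Storage site to check")
    parser.add_argument("--dryRun", dest="dry_run", action="store_true",
                        help="List files without modifying or unregistering")
    parser.add_argument("--verbose", action="store_true",
                        help="Increase output verbosity")
    return parser


# Example: parse an explicit argument list (all values here are placeholders).
example_args = build_parser().parse_args(
    ["-c", "couch.cfg", "-f", "L2", "--start", "100", "--end", "110",
     "--site", "EXAMPLE_SITE", "--dryRun"]
)
```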

In dryRun mode, the script only flags the files on the storage site of interest; it does not modify any CouchDB documents or unregister any files. The output file contains the document ID, run number, file type, subfile number, GUID, and SRM path of each zdab. Before unregistering files and modifying CouchDB documents, always do a dry run on a small number of runs and check that the output text file contains the expected information for those runs. Once everything looks fine, remember to set up the grid_production proxy before a wet run (one without --dryRun).
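
The dry-run behaviour described above can be pictured with the sketch below. It is illustrative only and does not reproduce the script's code: `FlaggedFile`, `format_record`, `process_flagged`, and the `unregister` callback are hypothetical names, and the space-separated record layout is an assumption (only the list of fields comes from the description above).

```python
from typing import Callable, Iterable, NamedTuple


class FlaggedFile(NamedTuple):
    """One flagged zdab, with the fields listed for the output file."""
    doc_id: str
    run: int
    file_type: str
    subfile: int
    guid: str
    srm_path: str


def format_record(entry: FlaggedFile) -> str:
    # Assumption: one space-separated line per file.
    return " ".join(str(value) for value in entry)


def process_flagged(flagged: Iterable[FlaggedFile], out_path: str,
                    dry_run: bool,
                    unregister: Callable[[FlaggedFile], None]) -> int:
    """Write every flagged file to out_path; act on it only in a wet run.

    Returns the number of files passed to `unregister`.
    """
    handled = 0
    with open(out_path, "w") as out:
        for entry in flagged:
            out.write(format_record(entry) + "\n")
            if not dry_run:
                # Hypothetical hook: unregister the file and update CouchDB.
                unregister(entry)
                handled += 1
    return handled
```

With `dry_run=True` the output file is still written in full, so it can be inspected before anything is changed.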