Add calibration to provenance tracking #74

Merged
merged 33 commits into from
Jan 14, 2022
33 commits
c05b37f
remove not used log level in handler
Bultako Jan 4, 2022
aaf4358
improve code comments
Bultako Jan 4, 2022
4e9a1e0
add drs4_pedestal definition in the model
Bultako Dec 2, 2021
fe48311
refactor utils.py
Bultako Jan 4, 2022
261ccfc
add drs4_pedestal instance vars
Bultako Jan 4, 2022
3a82cf1
adapt provenance/capture.py to calibration
Bultako Dec 2, 2021
9a91a34
trace drs4_pedestal
Bultako Jan 4, 2022
5f7cefa
add drs4_pedestal entities to model
Bultako Jan 4, 2022
8512671
fix paths
Bultako Jan 4, 2022
52f4fa7
add pedestal and calibration runs as session tag
Bultako Jan 4, 2022
21ec791
fix provenance/capture.py modifs for calibration
Bultako Dec 2, 2021
301c7a7
add calibrate_charge definition in the model
Bultako Jan 4, 2022
6302436
add drs4_pedestal_run_id, pedcal_run_id as params for both calib func…
Bultako Jan 7, 2022
1bd0d1b
comment entities not used in osa but in lstchain
Bultako Jan 7, 2022
6c959e6
add calibrate_charge instance vars
Bultako Jan 7, 2022
77c8370
refactor provenance/capture.py
Bultako Dec 2, 2021
39ad393
trace calibrate_charge
Bultako Jan 7, 2022
1869f64
do not copy DL1Check files
Bultako Dec 2, 2021
350bc99
fix int conversion in get_time_calibration_file call
Bultako Jan 7, 2022
088ff3d
fix CI tests
Bultako Jan 7, 2022
dc59182
add pedestal/pedcalib run ids as arguments for provprocess.py
Bultako Jan 8, 2022
7244a20
make whole session starts with calibration
Bultako Jan 8, 2022
d590109
add calibrationcheckplot and timecalibrationfile to model
Bultako Jan 8, 2022
1eca05d
use os.path.realpath when recording filepaths
Bultako Jan 8, 2022
ae3d595
use run number as the session name
Bultako Jan 8, 2022
3976461
process calibration info
Bultako Jan 8, 2022
2d9c319
create only calibration_to_dl1 and calibration_to_dl2 prov files
Bultako Jan 8, 2022
4554fef
remove hash and has_type as properties of entities in graph
Bultako Jan 8, 2022
b767a60
black scripts
Bultako Jan 8, 2022
742a9ee
fix tests
Bultako Jan 8, 2022
14a0038
fix codacy issues
Bultako Jan 10, 2022
4f9dedf
remove time_calib file since not produced in calibrate_charge
Bultako Jan 11, 2022
217fc17
pedestal file in HDF5 format
Bultako Jan 12, 2022
8 changes: 4 additions & 4 deletions osa/job.py
@@ -289,13 +289,13 @@ def sequence_calibration_filenames(sequence_list):

         if not sequence.parent_list:
             drs4_pedestal_run_id = sequence.previousrun
-            calibration_run_id = sequence.run
+            pedcal_run_id = sequence.run
         else:
             drs4_pedestal_run_id = sequence.parent_list[0].previousrun
-            calibration_run_id = sequence.parent_list[0].run
+            pedcal_run_id = sequence.parent_list[0].run

         drs4_pedestal_file = f"drs4_pedestal.Run{drs4_pedestal_run_id:05d}.0000.h5"
-        calibration_file = f"calibration_filters_52.Run{calibration_run_id:05d}.0000.h5"
+        calibration_file = f"calibration_filters_52.Run{pedcal_run_id:05d}.0000.h5"

         # Assign the calibration and drive files to the sequence object
         sequence.drive = drive_file
@@ -305,7 +305,7 @@ def sequence_calibration_filenames(sequence_list):
         sequence.calibration = (
             Path(cfg.get("LST1", "CALIB_DIR")) / nightdir / "pro" / calibration_file
         )
-        sequence.time_calibration = get_time_calibration_file(calibration_run_id)
+        sequence.time_calibration = get_time_calibration_file(pedcal_run_id)


 def plot_job_statistics(sacct_output: pd.DataFrame, directory: Path):
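For readers following the rename in osa/job.py, here is a minimal sketch of how the two calibration file names are derived from the run ids. The helper name `calibration_filenames` is hypothetical and not part of osa; only the two f-string patterns are taken from the diff, and the run numbers are the ones that appear in the example paths of definition.yaml.

```python
from typing import Tuple


def calibration_filenames(drs4_pedestal_run_id: int, pedcal_run_id: int) -> Tuple[str, str]:
    """Return (pedestal file name, calibration file name) for the given run ids.

    Hypothetical helper: only the two f-string patterns come from osa/job.py.
    """
    drs4_pedestal_file = f"drs4_pedestal.Run{drs4_pedestal_run_id:05d}.0000.h5"
    calibration_file = f"calibration_filters_52.Run{pedcal_run_id:05d}.0000.h5"
    return drs4_pedestal_file, calibration_file


pedestal_name, calibration_name = calibration_filenames(6268, 6274)
print(pedestal_name)     # drs4_pedestal.Run06268.0000.h5
print(calibration_name)  # calibration_filters_52.Run06274.0000.h5
```

Note that the `05d` format spec only accepts integers, so run ids held as strings need an `int()` conversion before they reach these f-strings.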
83 changes: 50 additions & 33 deletions osa/provenance/capture.py
@@ -15,7 +15,6 @@
 import psutil
 import yaml

-from osa.provenance.io import read_prov
 from osa.provenance.utils import get_log_config, parse_variables

 # gammapy specific
@@ -52,9 +51,9 @@
 PROV_PREFIX = provconfig["PREFIX"]
 SUPPORTED_HASH_METHOD = ["md5"]
 SUPPORTED_HASH_BUFFER = ["content", "path"]
+REDUCTION_TASKS = ["r0_to_dl1", "dl1ab", "dl1_datacheck", "dl1_to_dl2"]

 # global variables
-sessions = set()
 traced_entities = {}
 session_name = ""
 session_tag = ""
@@ -99,8 +98,16 @@ def wrapper(*args, **kwargs):
         # variables parsing
         global session_name, session_tag
         class_instance = parse_variables(class_instance)
-        session_tag = f"{activity}:{class_instance.ObservationRun}"
-        session_name = f"{class_instance.ObservationRun}"
+        if class_instance.__name__ in REDUCTION_TASKS:
+            session_tag = f"{activity}:{class_instance.ObservationRun}"
+            session_name = f"{class_instance.ObservationRun}"
+        else:
+            session_tag = (
+                f"{activity}:{class_instance.PedestalRun}-{class_instance.CalibrationRun}"
+            )
+            session_name = f"{class_instance.PedestalRun}-{class_instance.CalibrationRun}"
+        # OSA specific
+        # variables parsing

         # provenance capture before execution
         derivation_records = get_derivation_records(class_instance, activity)
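The branch added to `wrapper()` can be read in isolation as follows. This is only a sketch: `session_labels` is a hypothetical stand-alone function, and the traced task is imitated by a plain function carrying the attributes that `parse_variables` would normally set.

```python
REDUCTION_TASKS = ["r0_to_dl1", "dl1ab", "dl1_datacheck", "dl1_to_dl2"]


def session_labels(activity, task):
    """Return (session_tag, session_name) for a traced task.

    Hypothetical version of the branch in wrapper(); `task` only needs
    a __name__ and the run attributes read below.
    """
    if task.__name__ in REDUCTION_TASKS:
        # Data reduction sessions are identified by the observation run
        name = f"{task.ObservationRun}"
    else:
        # Calibration sessions are identified by the pedestal/calibration run pair
        name = f"{task.PedestalRun}-{task.CalibrationRun}"
    return f"{activity}:{name}", name


def drs4_pedestal():
    """Stand-in for a traced calibration task."""


drs4_pedestal.PedestalRun = "06268"
drs4_pedestal.CalibrationRun = "06274"


def r0_to_dl1():
    """Stand-in for a traced data reduction task."""


r0_to_dl1.ObservationRun = "06275"

print(session_labels("drs4_pedestal", drs4_pedestal))
print(session_labels("r0_to_dl1", r0_to_dl1))
```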
@@ -205,7 +212,7 @@ def get_entity_id(value, item):
         entity_name = item["entityName"]
         entity_type = definition["entities"][entity_name]["type"]
     except Exception as ex:
-        logger.warning(f"{ex} in {item}")
+        logger.warning(f"Not found in model {ex} in {item}")
         entity_name = ""
         entity_type = ""

@@ -220,7 +227,8 @@ def get_entity_id(value, item):
         # osa specific hash path
         # async calls does not allow for hash content
         return get_file_hash(value, buffer="path")
-
+        # osa specific hash path
+        # async calls does not allow for hash content
     try:
         entity_id = abs(hash(value) + hash(str(value)))
         if hasattr(value, "entity_version"):
@@ -331,7 +339,9 @@ def get_python_packages():

 def log_prov_info(prov_dict):
     """Write a dictionary to the logger."""
-    prov_dict["session_tag"] = session_tag  # OSA specific session tag
+    # OSA specific session tag used in merging prov from parallel sessions
+    prov_dict["session_tag"] = session_tag
+    #
     record_date = datetime.datetime.now().isoformat()
     logger.info(f"{PROV_PREFIX}{record_date}{PROV_PREFIX}{prov_dict}")

@@ -341,32 +351,39 @@ def log_session(class_instance, start):
     # OSA specific
     # prov session is outside scripting and is run-wise
     # we may have different sessions/runs in the same log file
-    session_id = abs(hash(class_instance))
-    lines = read_prov(filename=LOG_FILENAME)
-    for line in lines:
-        if line.get("observation_run", 0) == class_instance.ObservationRun:
-            session_id = lines[0]["session_id"]
-            sessions.add(session_id)
-
-    if session_id not in sessions:
-        sessions.add(session_id)
-        system = get_system_provenance()
-        log_record = {
-            "session_id": session_id,
-            "name": session_name,
-            "startTime": start,
-            "system": system,
-            # OSA specific
-            "software_version": class_instance.SoftwareVersion,
-            "observation_date": class_instance.ObservationDate,
-            "observation_run": class_instance.ObservationRun,  # a session is run-wise
-            "config_file": class_instance.ProcessingConfigFile,
-            "config_file_hash": get_file_hash(
-                class_instance.ProcessingConfigFile, buffer="path"
-            ),
-            "config_file_hash_type": get_hash_method(),
-        }
-        log_prov_info(log_record)
+    # session_id = abs(hash(class_instance))
+    if class_instance.__name__ in REDUCTION_TASKS:
+        session_id = f"{class_instance.ObservationDate}{class_instance.ObservationRun}"
+    else:
+        session_id = f"{class_instance.PedestalRun}{class_instance.CalibrationRun}"
+    # OSA specific
+    # prov session is outside scripting and is run-wise
+    # we may have different sessions/runs in the same log file
+
+    system = get_system_provenance()
+    log_record = {
+        "session_id": session_id,
+        "name": session_name,
+        "startTime": start,
+        "system": system,
+        # OSA specific
+        "observation_date": class_instance.ObservationDate,
+        # OSA specific
+        "software_version": class_instance.SoftwareVersion,
+        "config_file": class_instance.ProcessingConfigFile,
+        "config_file_hash": get_file_hash(
+            class_instance.ProcessingConfigFile, buffer="path"
+        ),
+        "config_file_hash_type": get_hash_method(),
+    }
+    if class_instance.__name__ in REDUCTION_TASKS:
+        log_record[
+            "observation_run"
+        ] = class_instance.ObservationRun  # a session is run-wise
+    else:
+        log_record["pedestal_run"] = class_instance.PedestalRun
+        log_record["calibration_run"] = class_instance.CalibrationRun
+    log_prov_info(log_record)
     return session_id


88 changes: 86 additions & 2 deletions osa/provenance/config/definition.yaml
@@ -52,6 +52,74 @@
#

activities:
drs4_pedestal:
description:
"Create pedestal file"
parameters:
usage:
- role: "Subrun for pedestal"
description: "Raw observation file for pedestal"
entityName: RawObservationFile
value: RawObservationFilePedestal
# filepath: /fefs/aswg/data/real/R0/20210913/LST-1.1.Run06268.0000.fits.fz
generation:
- role: "Pedestal"
description: "Pedestal calibration file"
entityName: PedestalFile
value: PedestalFile
# filepath: /fefs/aswg/data/real/calibration/20210913/v0.7.5/drs4_pedestal.Run06268.0000.fits
- role: "Check plot for pedestal"
description: "Pedestal check plot"
entityName: PedestalCheckPlot
value: PedestalCheckPlot
# filepath: /fefs/aswg/data/real/calibration/20210913/v0.7.5/log/drs4_pedestal.Run02068.0000.pdf

calibrate_charge:
description:
"Create charge calibration file"
parameters:
usage:
- role: "Subrun for calibration"
description: "Raw observation file for calibration"
entityName: RawObservationFile
value: RawObservationFileCalibration
# filepath: /fefs/aswg/data/real/R0/20210913/LST-1.1.Run06274.0000.fits.fz
- role: "Pedestal file"
description: "Pedestal file used"
entityName: PedestalFile
value: PedestalFile
# filepath: /fefs/aswg/data/real/calibration/20210913/v0.7.5/drs4_pedestal.Run06268.0000.fits
# - role: "Run summary"
# description: "Run summary configuration"
# entityName: RunSummaryFile
# value: RunSummaryFile
# filepath: /fefs/aswg/data/real/monitoring/RunSummary/RunSummary_20210913.ecsv
# - role: "Configuration file"
# description: "Configuration file for camera"
# entityName: AnalysisConfigFile
# value: CalibrationConfigurationFile
# filepath: /fefs/aswg/software/virtual_env/ctasoft/cta-lstchain/lstchain/data/onsite_camera_calibration_param.json
# - role: "Systematics correction file"
# description: "Systematics correction file"
# entityName: SystematicsCorrectionFile
# value: SystematicsCorrectionFile
# filepath: /path/to/ff_systematics_file.h5
# - role: "Time calibration file"
# description: "Time calibration file"
# entityName: TimeCalibrationFile
# value: TimeCalibrationFile
# filepath: /fefs/aswg/data/real/calibration/20210913/v0.7.5/time_calibration.Run06274.0000.hdf5
generation:
- role: "Coefficients calibration file"
description: "Coefficients calibration file"
entityName: CoefficientsCalibrationFile
value: CoefficientsCalibrationFile
# filepath: /fefs/aswg/data/real/calibration/20210913/v0.7.5/calibration.Run06274.0000.hdf5
- role: "Check plot for calibration"
description: "Calibration check plot"
entityName: CalibrationCheckPlot
value: CalibrationCheckPlot
# filepath: /fefs/aswg/data/real/calibration/20210913/v0.7.5/log/calibration.Run06274.0000.pedestal.Run06268.0000.pdf
r0_to_dl1:
description:
"Create DL1 files for an observation run and subrun"
@@ -280,6 +348,10 @@ entities:
PythonObject:
description: "Python variable in memory"
type: PythonObject
RawObservationFile:
description: "Raw observation compressed FITS file"
type: File
contentType: application/fits
R0SubrunDataset:
description: "R0 subrun file in FITS format on the disk"
type: File
@@ -293,17 +365,29 @@ entities:
type: File
contentType: text/plain
PedestalFile:
description: "Pedestal file in FITS format on the disk"
description: "Pedestal file in HDF5 format on the disk"
type: File
contentType: application/fits
contentType: application/x-hdf
PedestalCheckPlot:
description: "Pedestal check plot PDF file"
type: File
contentType: application/pdf
CoefficientsCalibrationFile:
description: "Coefficients calibration file in HDF5 format on the disk"
type: File
contentType: application/x-hdf
# SystematicsCorrectionFile:
# description: "Systematics correction file in HDF5 format on the disk"
# type: File
# contentType: application/x-hdf
TimeCalibrationFile:
description: "Time calibration file in HDF5 format on the disk"
type: File
contentType: application/x-hdf
CalibrationCheckPlot:
description: "Calibration check plot PDF file"
type: File
contentType: application/pdf
AnalysisConfigFile:
description: "LSTChain analysis configuration file in JSON format on the disk"
type: File
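The `entities` section above is what `get_entity_id()` consults through `definition["entities"][entity_name]["type"]`. A minimal sketch of that lookup follows, with a dict literal standing in for the parsed YAML. The function name `entity_type` is hypothetical, and it catches `KeyError` only, whereas the code in capture.py catches a broad `Exception` and logs a warning.

```python
# Dict literal standing in for the parsed definition.yaml (two entries only)
definition = {
    "entities": {
        "PedestalFile": {
            "description": "Pedestal file in HDF5 format on the disk",
            "type": "File",
            "contentType": "application/x-hdf",
        },
        "PythonObject": {
            "description": "Python variable in memory",
            "type": "PythonObject",
        },
    }
}


def entity_type(item):
    """Return the model type for a usage/generation item, or "" if not in the model."""
    try:
        return definition["entities"][item["entityName"]]["type"]
    except KeyError:
        return ""


print(entity_type({"entityName": "PedestalFile", "value": "PedestalFile"}))
print(repr(entity_type({"entityName": "NotInModel"})))
```

An entity whose name is missing from the model falls back to an empty type, which is the situation the new "Not found in model" warning reports.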
1 change: 0 additions & 1 deletion osa/provenance/config/logger.yaml
@@ -7,7 +7,6 @@ formatters:
 handlers:
     provHandler:
         class: logging.handlers.WatchedFileHandler
-        level: INFO
         formatter: simple
         filename: prov.log
 loggers:
2 changes: 1 addition & 1 deletion osa/provenance/io.py
@@ -186,7 +186,7 @@ def provlist2provdoc(provlist):
                 records[progen_id] = progen
                 ent.wasDerivedFrom(progen)
             for k, v in provdict.items():
-                if k != "session_tag":
+                if k not in ["session_tag", "hash", "hash_type"]:
                     ent.add_attributes({k: str(v)})
         # agent
     return pdoc
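The effect of the change in `provlist2provdoc()` is that `hash` and `hash_type` no longer end up as entity attributes in the graph, alongside the already excluded `session_tag`. A small sketch of the same filter, with a hypothetical helper name and a made-up record, is:

```python
EXCLUDED_KEYS = {"session_tag", "hash", "hash_type"}


def entity_attributes(provdict):
    """Return the key/value pairs that would be attached to an entity.

    Hypothetical helper mirroring the loop in provlist2provdoc(): values
    are stringified and bookkeeping keys are dropped.
    """
    return {k: str(v) for k, v in provdict.items() if k not in EXCLUDED_KEYS}


# Made-up record, for illustration only
record = {
    "entity_id": "abc123",
    "name": "PedestalFile",
    "hash": "abc123",
    "hash_type": "md5",
    "session_tag": "drs4_pedestal:06268-06274",
}
print(entity_attributes(record))
```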