-
Hi there, I have got the DQM running (though using @vmarandon's script to get the runs). Some questions (for @hashkar or ...):
Good Luck,
-
Hi @mdpunch !
I can at least try to answer some of your points, below.
Very weird indeed, it does not happen on my side (tested on a laptop, a few desktops, all running Ubuntu, and a couple of servers, including EGI nodes, on EL7-like systems). Do you have a log of what happened by any chance, e.g. by redirecting stderr and stdout to file(s) when running the DQM? Also, what is your desktop environment (Gnome, KDE, XFCE, ...)? And your terminal emulator (Gnome terminal, terminator, xterm, ...)?
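In case it helps with collecting such a log, here is a minimal sketch (not part of nectarchain) that runs an arbitrary command from Python and writes its stdout and stderr to two files. The helper name `run_and_log` and the file names `log_file` and `err_file` are only illustrative. It also returns the exit code, which on POSIX systems is negative when the process was terminated by a signal, so it may hint at whether the job was killed.

```python
import subprocess
from pathlib import Path


def run_and_log(command, log_dir="."):
    """Run `command` (a list of strings), sending stdout and stderr to files.

    Hypothetical helper: the output file names are arbitrary choices.
    """
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    out_path = log_dir / "log_file"
    err_path = log_dir / "err_file"
    with open(out_path, "w") as out, open(err_path, "w") as err:
        result = subprocess.run(command, stdout=out, stderr=err)
    # On POSIX, a return code of -N means the child was terminated by signal N.
    return result.returncode, out_path, err_path


# Example usage (the command is whatever you normally type in the terminal):
# code, out_path, err_path = run_and_log(["python", "start_dqm.py", "--help"])
# print("exit code:", code)
```

Running the job this way, rather than directly in the terminal emulator, should also keep the output on disk even if the terminal window closes.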
It may have been related to the previous mechanism to automatically download run files from the grid, which uses web cookies to authenticate to the NectarCAM Elog under the hood. With
The DQM is actually meant to optionally use the NectarCAM daily sqlite files as input, in order to process monitoring data. If not downloaded first, the DQM tries to read it anyway, which produces an empty file, safe to be deleted.
Yes, this is a minor bug -> @hashkar ?
Sure thing!
-
I'm running Kubuntu, so it's closing the "Konsole". I've tried it now in "Terminator" and despite the name, that one doesn't close! But still, the job finishes with
There is nothing particularly strange in the
On my machine (two weeks old, with 32 GB of RAM, after the last one was lifted at CdG), all the CPUs start to be occupied, then the fan spins up strongly, then the process ends (producing some plots, as mentioned). So, something is going haywire, but I am not sure what.
I will try updating my dev version to
Right, it's an empty file. Will ignore.
There is also a FITS file created, with: Are there some scripts to view interesting things in this, or should I make them?
-
Yes, there is a Bokeh app that I developed for that: https://github.com/cta-observatory/nectarchain/tree/master/src/nectarchain/dqm/bokeh_app, or did you mean something else?
Yes, of course!
python /scr/punch/CTA/nectarchain/src/nectarchain/dqm/start_dqm.py --help
usage: start_dqm.py [-h] [-p] [--write-db] [-n] [-r RUNNB] [-i INPUT_FILES [INPUT_FILES ...]] input_paths output_paths
NectarCAM Data Quality Monitoring tool
positional arguments:
  input_paths           Input paths
  output_paths          Output paths
options:
  -h, --help            show this help message and exit
  -p, --plot            Enables plots to be generated
  --write-db            Write DQM output in DQM ZODB data base
  -n, --noped           Enables pedestal subtraction in charge integration
  -r RUNNB, --runnb RUNNB
                        Optional run number, automatically found on DIRAC
  -i INPUT_FILES [INPUT_FILES ...], --input-files INPUT_FILES [INPUT_FILES ...]
                        Local input files
so you can chain run files like that:
python /scr/punch/CTA/nectarchain/src/nectarchain/dqm/start_dqm.py $NECTARCAMDATA $NECTARDIR -i Beamer_runs/2023/20230512/NectarCAM.Run4333.0000.fits.fz Beamer_runs/2023/20230512/NectarCAM.Run4333.0001.fits.fz Beamer_runs/2023/20230512/NectarCAM.Run4333.0002.fits.fz Beamer_runs/2023/20230512/NectarCAM.Run4333.0003.fits.fz Beamer_runs/2023/20230512/NectarCAM.Run4333.0004.fits.fz Beamer_runs/2023/20230512/NectarCAM.Run4333.0005.fits.fz -p -n >> log_file 2>> err_file
or just (easier):
python /scr/punch/CTA/nectarchain/src/nectarchain/dqm/start_dqm.py $NECTARCAMDATA $NECTARDIR -r 4333 -p -n >> log_file 2>> err_file
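Since typing every sub-file of a run by hand is error-prone, the long `-i` form could be generated instead. The following is only a sketch under assumptions: the helper `build_dqm_command` is hypothetical, and it assumes the sub-files follow the `NectarCAM.Run<number>.<index>.fits.fz` naming seen above and sit somewhere below a local data directory. It only uses the `-i`, `-p` and `-n` options listed in the help output.

```python
from pathlib import Path


def build_dqm_command(script, input_path, output_path, data_dir, run_number,
                      plot=True, noped=True):
    """Build a start_dqm.py command line listing every sub-file of a run.

    Hypothetical helper; assumes files are named
    NectarCAM.Run<run_number>.<index>.fits.fz somewhere under data_dir.
    """
    pattern = f"NectarCAM.Run{run_number}.*.fits.fz"
    # Sorting the paths puts the sub-files in index order (.0000, .0001, ...).
    files = sorted(Path(data_dir).rglob(pattern))
    if not files:
        raise FileNotFoundError(f"no file matching {pattern} under {data_dir}")
    command = ["python", str(script), str(input_path), str(output_path), "-i"]
    command += [str(f) for f in files]
    if plot:
        command.append("-p")
    if noped:
        command.append("-n")
    return command


# Example usage (paths are placeholders):
# cmd = build_dqm_command("start_dqm.py", "input_dir", "output_dir",
#                         "Beamer_runs", 4333)
# print(" ".join(cmd))
```

Listing all sub-files this way also makes it harder to pass only part of a run by accident.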
-
I'll check that out, thanks!
...
I guess I meant if we could have a But, running this with all the parts of the run lets the process go to completion without getting killed! So, DQM has a problem if only given part of a file?
That is indeed easier, and allows me to check out the downloading. Unfortunately, in the
2024-01-26 13:39:34,869 nectarchain.data.management WARNING run 4333 is not present in ./runs/
NoneType: None
Traceback (most recent call last):
File "/scr/punch/CTA/nectarchain/src/nectarchain/dqm/start_dqm.py", line 56, in <module>
_, filelist = dm.findrun(args.runnb)
^^^^^^^^^^^^^^^^^^^^^^
File "/scr/punch/CTA/nectarchain/src/nectarchain/data/management.py", line 43, in findrun
lfns = DataManagement.get_GRID_location(run_number)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/scr/punch/CTA/nectarchain/src/nectarchain/data/management.py", line 121, in get_GRID_location
return __class__.__get_GRID_location_DIRAC(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/scr/punch/CTA/nectarchain/src/nectarchain/data/management.py", line 145, in __get_GRID_location_DIRAC
fccli.do_find("-q " + basepath)
File "/home/punch/miniconda3/envs/nectar-dev/lib/python3.11/site-packages/DIRAC/DataManagementSystem/Client/FileCatalogClientCLI.py", line 1965, in do_find
result = self.fc.findFilesByMetadata(metaDict, path)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/punch/miniconda3/envs/nectar-dev/lib/python3.11/site-packages/DIRAC/Resources/Catalog/FileCatalog.py", line 158, in __getattr__
raise AttributeError
AttributeError
I am in nectarchain version '0.1.7', and if I go into ipython, then after
from nectarchain.data.management import DataManagement
DataManagement.findrun(4333)
... I get the same errors as above:
In [2]: DataManagement.findrun(4333)
2024-01-26 14:13:32,099 nectarchain.data.management WARNING run 4333 is not present in ./runs/
NoneType: None
2024-01-26 13:13:32 UTC Framework/FileCatalog ERROR: FileCatalog._getEligibleCatalogs: Failed to get file catalog configuration. Path /Resources/FileCatalogs does not exist or it's not a section
2024-01-26 13:13:32 UTC Framework/FileCatalog ERROR: Failed to create catalog objects
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[2], line 1
----> 1 DataManagement.findrun(4333)
File /scr/punch/CTA/nectarchain/src/nectarchain/data/management.py:43, in DataManagement.findrun(run_number, search_on_GRID)
41 log.warning(e, exc_info=True)
42 log.info("will search files on GRID and fetch them")
---> 43 lfns = DataManagement.get_GRID_location(run_number)
44 DataManagement.getRunFromDIRAC(lfns)
45 list = glob.glob(
46 basepath + "**/*" + str(run_number) + "*.fits.fz", recursive=True
47 )
File /scr/punch/CTA/nectarchain/src/nectarchain/data/management.py:121, in DataManagement.get_GRID_location(run_number, output_lfns, basepath, fromElog, username, password)
114 return __class__.__get_GRID_location_ELog(
115 run_number=run_number,
116 output_lfns=output_lfns,
117 username=username,
118 password=password,
119 )
120 else:
--> 121 return __class__.__get_GRID_location_DIRAC(
122 run_number=run_number, basepath=basepath
123 )
File /scr/punch/CTA/nectarchain/src/nectarchain/data/management.py:145, in DataManagement.__get_GRID_location_DIRAC(run_number, basepath)
143 fccli = FileCatalogClientCLI(catalog.catalog)
144 sys.stdout = StdoutRecord(keyword=f"Run{run_number}")
--> 145 fccli.do_find("-q " + basepath)
146 lfns = sys.stdout.output
147 sys.stdout = sys.__stdout__
File ~/miniconda3/envs/nectar-dev/lib/python3.11/site-packages/DIRAC/DataManagementSystem/Client/FileCatalogClientCLI.py:1965, in FileCatalogClientCLI.do_find(self, args)
1962 if verbose:
1963 print("Query:", metaDict)
-> 1965 result = self.fc.findFilesByMetadata(metaDict, path)
1966 if not result["OK"]:
1967 print(f"Error: {result['Message']}")
File ~/miniconda3/envs/nectar-dev/lib/python3.11/site-packages/DIRAC/Resources/Catalog/FileCatalog.py:158, in FileCatalog.__getattr__(self, name)
156 return self.r_execute
157 else:
--> 158 raise AttributeError
AttributeError:
... but if I run it a second time, it works!
In [3]: DataManagement.findrun(4333)
2024-01-26 13:13:58,557 nectarchain.data.management WARNING run 4333 is not present in ./runs/
NoneType: None
/vo.cta.in2p3.fr/nectarcam/2023/20230512/NectarCAM.Run4333.0000.fits.fz
/vo.cta.in2p3.fr/nectarcam/2023/20230512/NectarCAM.Run4333.0001.fits.fz
/vo.cta.in2p3.fr/nectarcam/2023/20230512/NectarCAM.Run4333.0002.fits.fz
/vo.cta.in2p3.fr/nectarcam/2023/20230512/NectarCAM.Run4333.0003.fits.fz
/vo.cta.in2p3.fr/nectarcam/2023/20230512/NectarCAM.Run4333.0004.fits.fz
/vo.cta.in2p3.fr/nectarcam/2023/20230512/NectarCAM.Run4333.0005.fits.fz
2024-01-26 13:14:01 UTC Framework/GFAL2_HTTPSStorage/GFAL2_StorageBase._getSingleFile INFO: Trying to download https://eos.grif.fr:11000/eos/grif/cta/vo.cta.in2p3.fr/nectarcam/2023/20230512/NectarCAM.Run4333.0000.fits.fz to /scr/punch/CTA/NectarCAM_muons/runs/NectarCAM.Run4333.0000.fits.fz
.0000.fits.fz
{'Failed': {},
'Successful': {'/vo.cta.in2p3.fr/nectarcam/2023/20230512/NectarCAM.Run4333.0000.fits.fz': '/scr/punch/CTA/NectarCAM_muons/runs/NectarCAM.Run4333.0000.fits.fz'}}
2024-01-26 13:23:46 UTC Framework/GFAL2_HTTPSStorage/GFAL2_StorageBase._getSingleFile INFO: Trying to download https://eos.grif.fr:11000/eos/grif/cta/vo.cta.in2p3.fr/nectarcam/2023/20230512/NectarCAM.Run4333.0001.fits.fz to /scr/punch/CTA/NectarCAM_muons/runs/NectarCAM.Run4333.0001.fits.fz
...
-
I tried waiting 10 minutes, but still had to give the command twice in ipython! Strange...
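Until the cause of the failing first call is understood, a stopgap could be to retry automatically. This is a generic sketch, not a fix and not part of nectarchain: `call_with_retry` is a hypothetical helper that calls a function again when it raises `AttributeError`, which is the exception seen in the traceback above.

```python
import time


def call_with_retry(func, *args, retries=1, delay=0.0,
                    exceptions=(AttributeError,), **kwargs):
    """Call func, retrying up to `retries` extra times on the given exceptions.

    Hypothetical workaround: it hides the symptom, it does not explain why
    the first call fails.
    """
    for attempt in range(retries + 1):
        try:
            return func(*args, **kwargs)
        except exceptions:
            if attempt == retries:
                # Out of retries: let the original exception propagate.
                raise
            time.sleep(delay)


# Example usage, mirroring the ipython session above:
# from nectarchain.data.management import DataManagement
# call_with_retry(DataManagement.findrun, 4333)
```

Since waiting 10 minutes made no difference, the default delay is zero here; the retry itself seems to be what matters.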
-
Concerning the on-the-fly identification of trigger patches, I thought I had pretty much optimized this. Then @maxnoe came along for the review, and made it 1500 times faster! (see cta-observatory/ctapipe_io_nectarcam#40 (comment)). So, I don't think it's this, but maybe we could comment out that part, and compare the timing? I'm not sure how to run a profiler on this, but that would be the ideal... (I see you've already tried that... my message was too slow... but good to know my contribution, as improved by Max, isn't the cause)
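On running a profiler: one option from the Python standard library is `cProfile`. A whole script can be profiled from the command line with `python -m cProfile -o dqm.prof start_dqm.py ...` (the output name `dqm.prof` is arbitrary), or a single function can be profiled from Python. The sketch below shows the second approach; `profile_call` is a hypothetical helper, not something provided by nectarchain or ctapipe.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func under cProfile.

    Returns (result, report), where report is a text table of the `top`
    entries sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()


# Example usage with any callable suspected of being slow:
# result, report = profile_call(some_function, some_argument)
# print(report)
```

Comparing two such reports, with and without the suspected part, would show where the time goes rather than only the total.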
-
Otherwise, I think I found the origin of the slow-down. I do not yet understand what the exact culprit is, but at least the following works:
-
Here I'm also looking directly in my own script. But you're correct that it's another question, so I'll open another thread!
-
Maybe, it's working now.
-
Closed since @jlenain found how to speed it up again, but there is still the bug that it closes the terminal emulator.
Otherwise, I think I found the origin of the slow-down. I do not yet understand what the exact culprit is, but at least the following works:
ctapipe_io_nectarcam concerning gain selection, to be back to the HEAD
max_events in EventSource
dqm_patch.txt