StatisticsSenderService working in "recent" releases #41975
A new Issue was created by @davidlange6 David Lange. @Dr15Jones, @perrotta, @dpiparo, @rappoccio, @makortel, @smuzaffar can you please review it and eventually sign/assign? Thanks. cms-bot commands are listed here |
assign core |
New categories assigned: core @Dr15Jones,@smuzaffar,@makortel you have been requested to review this Pull request/Issue and eventually sign? Thanks |
and second, could we add release name or some other unique identifier that can trace back the release being used to open the file to make understanding future problems easier? |
#35505 (since 12_1_X) added a |
The CMSSW version was actually added in #37220 (12_4_X, backported to earlier release cycles down to 8_0_X) |
#37220 seems to do what I ask above... |
I did some checking with an example release, 12_3_0: the JSON seems sane to me, and the socket interactions appear to go OK according to some printouts. It seems functional on the CMSSW side. I opened some files with various releases, so I can check the monitoring tomorrow to see whether they show up.
vs something in 10_6_0
|
I also checked that the number of bytes successfully sent on the socket makes sense (e.g., it matches the size of the JSON string). |
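As a rough illustration of that kind of check, the sketch below sends a JSON string as a single UDP datagram and compares the byte count returned by the socket call with the length of the encoded payload. It is Python rather than the CMSSW C++ code, and `send_json_udp`, the host, the port, and the payload fields are hypothetical names chosen for the example.

```python
import json
import socket
from typing import Tuple


def send_json_udp(payload: dict, host: str, port: int) -> Tuple[int, int]:
    """Send payload as one UDP datagram; return (bytes_sent, expected_bytes)."""
    # Serialize once so the size we compare against is the size we send.
    data = json.dumps(payload).encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # sendto() returns the number of bytes handed to the network stack.
        sent = sock.sendto(data, (host, port))
    return sent, len(data)
```

A matching count only shows that the datagram left the sender. It says nothing about whether the receiving side accepts the record.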
Looking at the monitoring this morning, I see the file reads from CMSSW_10_6_0, CMSSW_11_3_0 and CMSSW_12_0_0, but not those from CMSSW_12_3_0. I somehow missed doing CMSSW_12_1_0, so I will add that to today's tests. Second, if the cmssw_version was backported to all releases, either no one is using those releases or all such releases are failing: I see no records with the CMSSW release version in the JSON (in 2023). |
The earlier-than-12_4_X releases where the
(in many release cycles where the backport was done, no release has been built since then). |
For example, I tried using 10_6_33 yesterday, and it doesn't look like it worked. (Nothing I did yesterday showed up in the monitoring, so I will redo the tests with some releases that I expect to work.) |
results from my tests of yesterday:
|
From the CMSSW side, the list of "ok" and "not ok" would be easiest to explain with #35362 + #35505 somehow breaking things. @davidlange6 Would you be able to test 10_6_29 and 10_6_29_patch1? |
my tests suggest that 10_6_29 is ok and 10_6_29_patch1 is not working. |
It looks like the problem is that "type" is not a legal key in the JSON: it is used for something else in the monitoring data stream and confuses things. We should rename it to "read_type" or something like that. |
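A minimal sketch of the renaming idea, in Python rather than the CMSSW C++ code. The helper `rename_reserved_keys`, the `read_` prefix as a default, and the example field values are assumptions for illustration. The reserved set passed in here contains only "type", the one key named in this thread.

```python
def rename_reserved_keys(record, reserved, prefix="read_"):
    """Return a copy of record with reserved top-level keys renamed to prefix + key."""
    renamed = {}
    for key, value in record.items():
        new_key = prefix + key if key in reserved else key
        # Refuse to silently overwrite a field that already uses the new name.
        if new_key != key and new_key in record:
            raise ValueError(f"cannot rename {key!r}: {new_key!r} already exists")
        renamed[new_key] = value
    return renamed


# Hypothetical record; only the "type" key is taken from the discussion above.
record = {"type": "primary", "file_lfn": "/store/example.root"}
print(rename_reserved_keys(record, reserved={"type"}))
```

Raising on a collision rather than overwriting keeps a rename from quietly dropping an existing field.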
Can we have a list of all other key names that are not allowed in the monitoring system? |
Just to document here, David pointed to https://monit-docs.web.cern.ch/metrics/amq/ as a hint. |
This document was confirmed in https://mattermost.web.cern.ch/cms-o-and-c/pl/6fqtne7uytfpby3rui9as38w7e to be the correct documentation, and the list of key names has been stable over the years (there are also many other users, so on AMQ side they would hopefully avoid potentially breaking changes). The list of reserved keywords is then
|
dmwm/udp-collector#3 makes a "hot fix" in the UDP message receiving end to change
|
Fix is in #42060 |
The fix and backports have been merged |
+core |
This issue is fully signed and ready to be closed. |
Thanks to a question in Mattermost, I noticed that the cmssw file-open monitoring for recent upgrade samples did not see reads that were found by xrootd or classad based monitoring.
Unfortunately the JSON kept by the monitoring does not include the CMSSW release used. Instead, I looked at which file usages are being captured by the CMSSW monitoring (an approximation of the release used to open them), and there are no Run 3 files. There are some files whose LFN contains CMSSW_12_0_1, but nothing newer in 2023 (well, unless there is a CMSSW_12_0_111, but I assume that's a bug in the LFN somehow).
@makortel summarized recent changes as being
#35362 (in CMSSW_12_1_0)
#35505 (in CMSSW_12_1_0)
#36570 (in CMSSW_12_3_0)
Is there an easy way to see the JSON being produced by CMSSW? I can also try to work out how to trigger a file open in various releases and see what shows up in the monitoring (not sure how foolproof that is; maybe that's an interesting test too).
@vkuznet @Dr15Jones @makortel