Re-implementation of some of the ATLAS Collider DY Datasets #1866

Radonirinaunimi · 2023-11-26T21:28:21Z

This PR implements some of the ATLAS collider DY datasets in the new commondata format. The status is summarized in the table below.

☑️ in Comparison vs. Old means that the results are fully identical while 🔴 means that comparisons are available but noticeable differences are perceived.

| Dataset Name | Comparison vs. Old | General Comments | Status |
|---|---|---|---|
| ATLAS_DY_7TEV_EMUON_Y | ☑️ | The old implementation used a luminosity uncertainty of 3.5% while in HepData it is 3.4% | ☑️ |
| ATLAS_DY_7TEV_DILEPTON_Y-CRAP | ☑️ | The new implementation (from HepData) is missing one source of uncorrelated systematics | ☑️ |
| ATLAS_DY_7TEV_DILEPTON_Y-FRAP | ☑️ | The new implementation (from HepData) is missing one source of uncorrelated systematics | ☑️ |
| ATLAS_WPWM_8TEV_MUON_Y | | FK tables are missing but old commondata exists | ☑️ |
| ATLAS_Z0_8TEV_LOWMASS_2D | ☑️ | | ☑️ |
| ATLAS_Z0_8TEV_HIGHMASS_2D | 🔴 | Slight differences in treatment of asymmetric systematic correlated uncertainties | ☑️ |
| ATLAS_Z0_8TEV_3D_CRAP | | Could not find dataset and FK tables to compare the implementation to | ☑️ |
| ATLAS_Z0_8TEV_3D_FRAP | | Could not find dataset and FK tables to compare the implementation to | ☑️ |
| ATLAS_DY_13TEV_FID | ☑️ | Just needs to fix the plotting to be as a function of Gauge bosons | ☑️ |
| ATLAS_Z0_8TEV_20FB_PT-INVDIST | | | ☑️ |
| ATLAS_Z0_8TEV_20FB_PT-RAPDIST | | | ☑️ |

Remains TODO:

Define the plotting entries to be exactly the same as before

Radonirinaunimi · 2023-11-26T21:35:47Z

@scarlehoff, is something maybe wrong with the loading when the treatment of the systematics is set to MULT?

Comparing the old:

In [25]: ds = API.dataset(dataset_input={"dataset": "ATLASWZRAP11CC", "cfac": ["QCD"]}, theoryid=600, use_cuts="internal")

In [26]: ds.load_commondata().systematics_table
Out[26]:
               ADD      MULT         ADD   MULT         ADD   MULT         ADD   MULT        ADD   MULT  ...         ADD   MULT         ADD   MULT         ADD   MULT          ADD  MULT          ADD      MULT
entry                                                                                                    ...
1       760.914560  0.131840 -126.973000 -0.022    5.771500  0.001    5.771500  0.001   0.000000  0.000  ...   23.086000  0.004   23.086000  0.004  409.776500  0.071  10388.70000   1.8   715.666000  0.124000
2       845.576046  0.146580  -92.299200 -0.016   11.537400  0.002   11.537400  0.002   5.768700  0.001  ...   40.380900  0.007  -63.455700 -0.011  184.598400  0.032  10383.66000   1.8   663.400500  0.115000
3       719.275700  0.123640 -215.247500 -0.037   23.270000  0.004   17.452500  0.003  -5.817500 -0.001  ...   -5.817500 -0.001  319.962500  0.055  459.582500  0.079  10471.50000   1.8   605.020000  0.104000
4       673.101395  0.114850  -41.024900 -0.007   52.746300  0.009   23.442800  0.004   5.860700  0.001  ...   35.164200  0.006   46.885600  0.008  298.895700  0.051  10549.26000   1.8   750.169600  0.128000
5       847.481382  0.144540   23.453200  0.004   76.222900  0.013   29.316500  0.005   5.863300  0.001  ...  205.215500  0.035 -134.855900 -0.023 -222.805400 -0.038  10553.94000   1.8   738.775800  0.126000
6       766.929414  0.128020  113.823300  0.019  137.786100  0.023   53.916300  0.009   5.990700  0.001  ...  -41.934900 -0.007  281.562900  0.047 -299.535000 -0.050  10783.26000   1.8   623.032800  0.104000
7      1951.014450  0.326940 -143.220000 -0.024  310.310000  0.052  113.382500  0.019 -53.707500 -0.009  ...   83.545000  0.014   95.480000  0.016  -41.772500 -0.007  10741.50000   1.8   853.352500  0.143000
8       784.152243  0.129790  -36.250200 -0.006   84.583800  0.014   24.166800  0.004  -6.041700 -0.001  ...  132.917400  0.022   36.250200  0.006  -48.333600 -0.008  10875.06000   1.8   815.629500  0.135000
9      1071.656301  0.176570   -6.069300 -0.001   66.762300  0.011   18.207900  0.003   6.069300  0.001  ...  121.386000  0.020  182.079000  0.030   78.900900  0.013  10924.74000   1.8  1019.642400  0.168000
10      854.792700  0.144050  -23.736000 -0.004   35.604000  0.006    0.000000  0.000  -5.934000 -0.001  ...  124.614000  0.021  -29.670000 -0.005  183.954000  0.031  10681.20000   1.8   884.166000  0.149000
...

with the new implementation:

In [13]: ds_new = API.dataset(dataset_input={"dataset": "ATLAS_DY_7TEV_DILEPTON_Y", "cfac": ["QCD"]}, theoryid=600, use_cuts="internal")

In [14]: ds_new.load_commondata().systematics_table
Out[14]:
               MULT          MULT          MULT          MULT          MULT          MULT          MULT      MULT  ...          MULT          MULT      MULT          MULT          MULT      MULT      MULT      MULT
entry                                                                                                              ...
1     -3.811834e-06  1.732652e-07  1.732652e-07  0.000000e+00 -1.732652e-07  8.663259e-07 -2.079182e-06  0.000025  ...  6.930607e-07 -3.465304e-07 -0.000003  6.930607e-07  6.930607e-07  0.000012  0.000023  0.000312
2     -2.773589e-06  3.466986e-07  3.466986e-07  1.733493e-07 -0.000000e+00  8.667464e-07 -2.080191e-06  0.000020  ...  1.386794e-06  0.000000e+00 -0.000005  1.213445e-06 -1.906842e-06  0.000006  0.000026  0.000312
3     -6.360120e-06  6.875806e-07  5.156854e-07 -1.718951e-07  3.437903e-07  2.234637e-06 -3.781693e-06  0.000023  ...  5.328749e-06  1.718951e-07  0.000005 -1.718951e-07  9.454233e-06  0.000014  0.000021  0.000309
4     -1.194397e-06  1.535653e-06  6.825123e-07  1.706281e-07  1.706281e-07  1.706281e-06 -2.559421e-06  0.000022  ...  4.095074e-06  3.412562e-07  0.000010  1.023768e-06  1.365025e-06  0.000009  0.000019  0.000307
5      6.822097e-07  2.217181e-06  8.527621e-07  1.705524e-07  1.705524e-07  2.046629e-06 -2.728839e-06  0.000026  ...  1.364419e-05  3.411048e-07 -0.000016  5.969335e-06 -3.922706e-06 -0.000006  0.000024  0.000307
6      3.171583e-06  3.839284e-06  1.502329e-06  1.669254e-07 -1.669254e-07  3.839284e-06 -3.505433e-06  0.000020  ...  4.506986e-06  8.346270e-07  0.000018 -1.168478e-06  7.845494e-06 -0.000008  0.000022  0.000300
7     -4.021785e-06  8.713867e-06  3.183913e-06 -1.508169e-06 -8.378718e-07  1.139506e-05 -9.719313e-06  0.000021  ...  5.194805e-06  1.005446e-06  0.000009  2.346041e-06  2.681190e-06 -0.000001  0.000055  0.000302
8     -9.930980e-07  2.317229e-06  6.620653e-07 -1.655163e-07 -3.310327e-07  2.979294e-06 -2.648261e-06  0.000021  ...  3.475843e-06  9.930980e-07  0.000005  3.641359e-06  9.930980e-07 -0.000001  0.000022  0.000298
9     -1.647636e-07  1.812400e-06  4.942909e-07  1.647636e-07  3.295273e-07  1.647636e-06 -1.482873e-06  0.000021  ...  6.425782e-06  1.153346e-06  0.000010  3.295273e-06  4.942909e-06  0.000002  0.000030  0.000297
10    -6.740816e-07  1.011122e-06  0.000000e+00 -1.685204e-07 -5.055612e-07  8.426020e-07 -8.426020e-07  0.000022  ...  1.196495e-05  1.516684e-06  0.000013  3.538928e-06 -8.426020e-07  0.000005  0.000024  0.000303

while you can see that dumped values are exactly the same (modulo the first column ADD and MULT in the old):

In [15]: ds_new.load_commondata().systematic_errors()
Out[15]:
       ATLASWZRAP11_1001  ATLASWZRAP11_1002  ATLASWZRAP11_1003  ATLASWZRAP11_1004  ATLASWZRAP11_1005  ...  ATLASWZRAP11_1128  ATLASWZRAP11_1129  ATLASWZRAP11_1130  UNCORR  ATLASLUMI11
entry                                                                                                 ...
1                 -0.022              0.001              0.001              0.000             -0.001  ...              0.004              0.004              0.071    0.13          1.8
2                 -0.016              0.002              0.002              0.001             -0.000  ...              0.007             -0.011              0.032    0.15          1.8
3                 -0.037              0.004              0.003             -0.001              0.002  ...             -0.001              0.055              0.079    0.12          1.8
4                 -0.007              0.009              0.004              0.001              0.001  ...              0.006              0.008              0.051    0.11          1.8
5                  0.004              0.013              0.005              0.001              0.001  ...              0.035             -0.023             -0.038    0.14          1.8
6                  0.019              0.023              0.009              0.001             -0.001  ...             -0.007              0.047             -0.050    0.13          1.8
7                 -0.024              0.052              0.019             -0.009             -0.005  ...              0.014              0.016             -0.007    0.33          1.8
8                 -0.006              0.014              0.004             -0.001             -0.002  ...              0.022              0.006             -0.008    0.13          1.8
9                 -0.001              0.011              0.003              0.001              0.002  ...              0.020              0.030              0.013    0.18          1.8
10                -0.004              0.006              0.000             -0.001             -0.003  ...              0.021             -0.005              0.031    0.14          1.8

scarlehoff · 2023-11-27T07:21:22Z

It does look wrong, especially since there's nothing that would justify a 10^-7, is there? The data is all > 1 so it cannot be an ADD vs MULT problem, let me have a look.

Radonirinaunimi · 2023-11-27T09:13:43Z

> It does look wrong, especially since there's nothing that would justify a 10^-7, is there? The data is all > 1 so it cannot be an ADD vs MULT problem, let me have a look.

By just converting the percentage (MULT) into the absolute value (ADD), that is representing the systematics as additive instead, the entries are exactly the same (omitting the first column of ds).

In [3]: ds_new = API.dataset(dataset_input={"dataset": "ATLAS_DY_7TEV_DILEPTON_Y", "cfac": ["QCD"]}, theoryid=600, use_cuts="internal")

In [4]: ds_new.load_commondata().systematics_table
Out[4]:
              ADD         ADD         ADD        ADD        ADD         ADD         ADD  ...        ADD         ADD         ADD        ADD         ADD         ADD         ADD
entry                                                                                    ...
1     -126.973000    5.771500    5.771500   0.000000  -5.771500   28.857500  -69.258000  ... -11.543000  -86.572500   23.086000   23.08600  409.776500   750.29500  10388.7000
2      -92.299200   11.537400   11.537400   5.768700  -0.000000   28.843500  -69.224400  ...   0.000000 -155.754900   40.380900  -63.45570  184.598400   865.30500  10383.6600
3     -215.247500   23.270000   17.452500  -5.817500  11.635000   75.627500 -127.985000  ...   5.817500  168.707500   -5.817500  319.96250  459.582500   698.10000  10471.5000
4      -41.024900   52.746300   23.442800   5.860700   5.860700   58.607000  -87.910500  ...  11.721400  357.502700   35.164200   46.88560  298.895700   644.67700  10549.2600
5       23.453200   76.222900   29.316500   5.863300   5.863300   70.359600  -93.812800  ...  11.726600 -551.150200  205.215500 -134.85590 -222.805400   820.86200  10553.9400
6      113.823300  137.786100   53.916300   5.990700  -5.990700  137.786100 -125.804700  ...  29.953500  641.004900  -41.934900  281.56290 -299.535000   778.79100  10783.2600
7     -143.220000  310.310000  113.382500 -53.707500 -29.837500  405.790000 -346.115000  ...  35.805000  316.277500   83.545000   95.48000  -41.772500  1969.27500  10741.5000
8      -36.250200   84.583800   24.166800  -6.041700 -12.083400  108.750600  -96.667200  ...  36.250200  181.251000  132.917400   36.25020  -48.333600   785.42100  10875.0600
9       -6.069300   66.762300   18.207900   6.069300  12.138600   60.693000  -54.623700  ...  42.485100  382.365900  121.386000  182.07900   78.900900  1092.47400  10924.7400
10     -23.736000   35.604000    0.000000  -5.934000 -17.802000   29.670000  -29.670000  ...  53.406000  445.050000  124.614000  -29.67000  183.954000   830.76000  10681.2000

So I think it is really a difference between how the systematics are represented (unless I am doing something stupid here).
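
The conversion between the two representations can be sketched as follows. This is only an illustration, not the validphys implementation: the helper name `percent_to_absolute` is hypothetical, and the central value is not read from the commondata file but inferred from the first row of the old table (ADD = -126.973 for MULT = -0.022%).

```python
def percent_to_absolute(percent_values, central):
    """Convert percentage uncertainties into absolute ones (illustrative helper)."""
    return [p * central / 100.0 for p in percent_values]


# Central value inferred from the first row of the old table (an assumption):
# |ADD| / |MULT| * 100 = 126.973 / 0.022 * 100
central = 577150.0

# First three MULT entries (in percent) of entry 1, as dumped by systematic_errors()
mult_percent = [-0.022, 0.001, 0.001]

add_absolute = percent_to_absolute(mult_percent, central)
print(add_absolute)  # close to the first ADD entries of entry 1: -126.973, 5.7715, 5.7715
```

Up to floating-point rounding, this reproduces the first entries of the ADD table above.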

scarlehoff · 2023-11-27T14:59:26Z

The values returned by the systematic_errors method are all absolute.

I think the difference might be that you are implementing the multiplicative uncertainties as % or relative, while in the new commondata format they should be implemented always as absolute.

#1679 (comment)

I thought that we had added this to the documentation but it seems we didn't. Let me update it!
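
A small numerical check of this hypothesis, as a sketch only. It assumes that the parser divides the stored values by the central value and expresses the result as a percentage (the actual parser code is not shown in this thread), and it uses a central value inferred from the old table rather than one read from the file.

```python
def absolute_to_percent(absolute_values, central):
    """Express absolute uncertainties as a percentage of the central value."""
    return [a / central * 100.0 for a in absolute_values]


# Central value inferred from the old table (an assumption): 126.973 / 0.022 * 100
central = 577150.0

# Values written in the uncertainty file as percentages
stored_percent = [-0.022, 0.001]

# If they are read as absolute values, the MULT table ends up tiny
misread = absolute_to_percent(stored_percent, central)
print(misread)  # of order 1e-6 and 1e-7

# Writing absolute values instead gives back the intended percentages
stored_absolute = [p * central / 100.0 for p in stored_percent]
recovered = absolute_to_percent(stored_absolute, central)
print(recovered)
```

The misread values agree with the first two MULT entries of entry 1 in the new table (-3.811834e-06 and 1.732652e-07), which supports the explanation.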

Radonirinaunimi · 2023-11-27T15:16:08Z

> The values returned by the systematic_errors method are all absolute.

This definitely explains why using absolute $\oplus$ ADD works.

> I think the difference might be that you are implementing the multiplicative uncertainties as % or relative, while in the new commondata format they should be implemented always as absolute.
>
> #1679 (comment)
>
> I thought that we had added this to the documentation but it seems we didn't. Let me update it!

Is this actually correct (I don't think so!)? If the values are quoted as absolute then their treatment has to be ADD, and conversely if the values are quoted as percentages then their treatment has to be MULT. I don't think one can have absolute values treated as MULT, or percentages treated as ADD.

scarlehoff · 2023-11-27T15:23:13Z

Regardless of how they are given in hepdata (they could tell you it's a relative value but give you a table with the absolute values) you can convert them to absolute.

I honestly don't remember why we went for everything absolute, I guess it is more consistent this way.

Radonirinaunimi · 2023-11-27T15:29:22Z

> Regardless of how they are given in hepdata (they could tell you it's a relative value but give you a table with the absolute values) you can convert them to absolute.
>
> I honestly don't remember why we went for everything absolute, I guess it is more consistent this way.

Right. I just want to emphasize that if everything now is given as absolute, then only the treatment ADD is allowed (not MULT).

Re everything absolute, we might want to keep in mind the following sentence from the docs:

> While it may seem at first that the multiplicative error is spurious given the presence of the additive error and data central value, this may not be the case. For example, in a closure test scenario, the data central values may have been replaced in the CommonData file by theoretical predictions. Therefore if you wish to use a covariance matrix generated with the original multiplicative uncertainties via the method, you must also store the original multiplicative (percentage) error. For flexibility and ease of I/O this is therefore done in the CommonData file itself.

scarlehoff · 2023-11-27T15:35:09Z

> I just want to emphasize that if everything now is given as absolute, then only the treatment ADD is allowed (not MULT)

Why? The first thing the parser does is to make it relative to the central values. The way it is written in the actual file doesn't really matter that much.

(that said... it makes it unreliable in closure tests? we need @enocera here!)

Radonirinaunimi · 2023-11-28T09:37:55Z

> Why? The first thing the parser does is to make it relative to the central values. The way it is written in the actual file doesn't really matter that much.

But such an extra operation is not needed at all if everything is defined as Absolute $\oplus$ ADD. At the end of the day (modulo the CT business), the treatments ADD and MULT (and the representation thereof) carry exactly the same information.

scarlehoff · 2023-11-28T09:39:26Z

Not for the t0 covmat.
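
To illustrate the point, here is a toy sketch, assuming the usual t0 prescription in which multiplicative uncertainties are rescaled by the t0 predictions instead of the data central values, while additive ones are left untouched. The function `covmat` and all numbers are made up for illustration, every systematic is taken as fully correlated, and this is not the validphys code.

```python
def covmat(stat, add_abs, mult_rel, scale):
    """Toy covariance matrix with fully correlated systematics.

    stat:     absolute statistical uncertainty per data point
    add_abs:  add_abs[i] lists the absolute additive systematics of point i
    mult_rel: mult_rel[i] lists the relative multiplicative systematics of point i
    scale:    values multiplying the relative systematics
              (data for the experimental covmat, t0 predictions for the t0 covmat)
    """
    n = len(stat)
    cov = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            add = sum(a_i * a_j for a_i, a_j in zip(add_abs[i], add_abs[j]))
            mult = sum(
                m_i * scale[i] * m_j * scale[j]
                for m_i, m_j in zip(mult_rel[i], mult_rel[j])
            )
            cov[i][j] = add + mult
        cov[i][i] += stat[i] ** 2
    return cov


data = [100.0, 200.0]  # made-up central values
t0 = [110.0, 190.0]    # made-up t0 predictions
stat = [1.0, 2.0]
none = [[], []]        # no systematics of the given type

# The same 2% systematic, declared either as MULT or as ADD
as_mult = [[0.02], [0.02]]
as_add = [[2.0], [4.0]]

exp_mult = covmat(stat, none, as_mult, data)
exp_add = covmat(stat, as_add, none, data)
t0_mult = covmat(stat, none, as_mult, t0)
t0_add = covmat(stat, as_add, none, t0)

print(exp_mult, exp_add)  # same matrix up to rounding
print(t0_mult, t0_add)    # different: only the MULT version follows the t0 predictions
```

In this toy setup the experimental covariance matrix does not depend on the ADD/MULT label, but the t0 one does.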

Radonirinaunimi · 2023-12-04T08:04:23Z

@scarlehoff, @enocera, this is also ready for review. Here is the report: https://vp.nnpdf.science/kuWT56KBSlai3_-XqLZxeA==/

For some datasets, I couldn't find the commondata and/or FK tables to compare to.

scarlehoff · 2023-12-04T08:46:23Z

The ones you didn't find the commondata for have no corresponding old dataset, right?

enocera · 2023-12-04T10:17:30Z

I understand that these are the 3D ATLAS distributions, of which we implemented only the 2D version.

enocera · 2023-12-04T10:17:48Z

So let's forget about the 3D distributions, for the moment.

scarlehoff · 2023-12-04T11:37:11Z

Ok! Thanks. First comments, then I'll start going through all the old-new datasets one by one:

What about these ones? Did you forget about them or are they part of another set (or maybe they have a different name in your list?)

- ATLASZHIGHMASS49FB
- ATLASLOMASSDY11EXT
- ATLASWZRAP11CF (I see you do have the CC version so this might actually be forgotten!)
- ATLAS_WZ_TOT_13TEV (maybe this one is the one you call ATLASWZTOT13TEV81PB ??)

And these four I think I already asked you about, so I know you were not taking care of them, but just for completeness:

- ATLAS_WP_JET_8TEV_PT
- ATLAS_WM_JET_8TEV_PT
- ATLASZPT8TEVMDIST
- ATLASZPT8TEVYDIST

Radonirinaunimi · 2023-12-04T15:08:39Z

So the status is then the following:

- ATLASZHIGHMASS49FB, ATLASLOMASSDY11EXT: these datasets I haven't touched on purpose because as far as I understood @cschwan was/has been looking into them (?).
- ATLASWZRAP11CF, ATLASZPT8TEVMDIST, ATLASZPT8TEVYDIST: I genuinely missed these datasets. I will implement them in this PR.
- ATLAS_WZ_TOT_13TEV: this is indeed an updated version of ATLASWZTOT13TEV81PB (I implemented the outdated one) in that the correct one should include the experimental correlation coefficients. I will fix the currently implemented one. This is now Done.
- As for the _JET_: If no one is looking into them yet, I can also implement them in this PR.

All in all, still a few to be done before this PR is complete 😅

scarlehoff · 2023-12-04T15:35:09Z

buildmaster/ATLAS_DY_13TEV/metadata.yaml

+  kinematic_coverage: [_zero, mu2, sqrt_s]
+  kinematics:
+    variables:
+      _zero: {description: "", label: "", units: ""}


Suggested change

_zero: {description: "", label: "", units: ""}

_Zero: {description: "", label: "", units: ""}

(maybe I should automatically fill a column with zero when one is missing... the constraint of having k1, k2, k3 is silly in both directions...)

This would be best indeed! In this case we don't overcrowd the implementation with spurious variables.

Since you are at it, how can the results be plotted as a function of the gauge bosons? Right now, it seems that plot_x can only accept one of the kinematic variables. Here is an example using the corrected ATLAS_WZ_TOT_13TEV: https://vp.nnpdf.science/B4_E9eBKRjadW8gBoZZZRw==/

I'm not doing this yet! (but I will)

For the plot_x, just do whatever was done in the previous plotting file. If things are not working it just means I had not encountered that situation before and I need to fix it.

(which might mean you have to use some specific kinematic override or transformation)

In some of the previous files, plot_x are not defined, as is the case for the particular example above for instance https://github.com/NNPDF/nnpdf/blob/master/nnpdfcpp/data/commondata/PLOTTING_ATLAS_WZ_TOT_13TEV.yaml (not sure how exactly it works in such a case).

Yes, for now it will complain!

Ok, so for the benefit of producing report comparisons I leave its value to some reasonable kinematic while waiting for the parser to accommodate this. I added a TODO in the description to not forget about this.

ok, but don't worry for the comparisons for now, I'll try to fix by today! (I hope it's nothing supercomplicated)

(let me know when I should look at this again btw)

You can look into this now. This is the only ATLAS dataset that not only has one kinematic variable that is zero but also should not have plot_x (in the same way as before), as it should be plotted as a function of the gauge bosons.

In 1976617, I have both removed the _zero from the kinematics and the plot_x in the plotting.

PS: As for the remaining 4 datasets (inc the JETS), I will finish them by early next week.

scarlehoff

I've updated the parser so that it automatically repeats a column if one is missing.
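
As a rough sketch of what "repeating a column" could look like (the real parser change is not shown in this thread): the helper `pad_kinematics` is hypothetical, the choice of repeating the last column is an assumption, and the kinematic values are placeholders.

```python
def pad_kinematics(kinematics, n_required=3):
    """Pad the kinematic columns to n_required by repeating the last one.

    kinematics: dict mapping variable name -> list of values (insertion ordered).
    Returns a list of (name, values) pairs, since a dict cannot hold a repeated key.
    """
    if not kinematics:
        raise ValueError("at least one kinematic variable is required")
    columns = list(kinematics.items())
    last_name, last_values = columns[-1]
    while len(columns) < n_required:
        columns.append((last_name, last_values))
    return columns


# Two kinematic variables only, i.e. without the spurious _zero entry
# (placeholder values, not taken from the dataset)
kinematics = {"mu2": [6463.8], "sqrts": [13000.0]}

padded = pad_kinematics(kinematics)
print([name for name, _ in padded])
```

With this, the metadata only needs to declare the variables that are physically meaningful, and the three-column requirement is satisfied downstream.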

Here's the report for the one with the weird plot_x option: https://vp.nnpdf.science/fkjMDKmrSC6XpBXv7-mhHA==

I've repeated the tests and now I have:

old: ATLASWZRAP36PB vs new: ATLAS_DY_7TEV_EMUON_Y
# Differences in the computation of chi2 32.119931215298024 vs 32.21855155561812
    The covmats are different
    even the diagonal

old: ATLAS_DY_2D_8TEV_LOWMASS vs new: ATLAS_Z0_8TEV_LOWMASS_2D
 > Everything ok

 > old: ATLAS_WZ_TOT_13TEV vs new: ATLAS_DY_13TEV_FID
The t0 chi2 is different: 10934.218358793676 vs 81138.4387008005

> old: ATLASDY2D8TEV vs new: ATLAS_Z0_8TEV_HIGHMASS_2D
% difference in the data 
 Differences in the computation of chi2  80.2445870631963 vs 76.30021056788038
    The covmats are different
    even the diagonal

In the last one I've noticed the data itself is different at the level of a few per mille, which could be driving the difference (since a difference in the data will also modify the covmat through the multiplicative uncertainties).

For the one that has a very different t0 (but nothing else was different) I guess the MULT and ADD uncertainties are very wrong? Or I've done something else wrong...

The one that is only t0, might be a problem with MULT and ADD?
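
For reference, the kind of old-vs-new check reported above can be sketched like this. The script actually used is not shown in this thread, so the function `compare`, the input layout, and the tolerance are assumptions, and the chi2 and t0 comparisons are left out to keep the example short. The messages mimic the wording of the report.

```python
def compare(old, new, rtol=1e-6):
    """Return messages describing the differences between two implementations.

    old, new: dicts with keys "data" (list of floats) and "covmat" (list of lists).
    """

    def close(a, b):
        return abs(a - b) <= rtol * max(abs(a), abs(b))

    n = len(old["data"])
    if len(new["data"]) != n:
        return ["The number of data points is different"]

    messages = []
    if not all(close(a, b) for a, b in zip(old["data"], new["data"])):
        messages.append("The data are different")

    old_cov, new_cov = old["covmat"], new["covmat"]
    same_cov = all(
        close(old_cov[i][j], new_cov[i][j]) for i in range(n) for j in range(n)
    )
    if not same_cov:
        messages.append("The covmats are different")
        if not all(close(old_cov[i][i], new_cov[i][i]) for i in range(n)):
            messages.append("even the diagonal")

    return messages or ["Everything ok"]


# Made-up inputs: the new data differ at the per-mille level in the first point
old = {"data": [100.0, 200.0], "covmat": [[5.0, 8.0], [8.0, 20.0]]}
new = {"data": [100.3, 200.0], "covmat": [[5.02, 8.02], [8.02, 20.0]]}

print(compare(old, old))
print(compare(old, new))
```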

Radonirinaunimi · 2023-12-08T09:24:04Z

As usual, thanks a lot for the detailed checks! For the one with different t0, I am a bit surprised that this is the case. I thought that I had checked that the treatment of the systematics was the same as before. I will check again.

As for the rest, the differences are exactly understood. Before implementing the legacy versions, maybe I am just missing something from the new hepdata (?), for which we'd need @enocera.

PS: I will also check the boson plotting now.

scarlehoff · 2023-12-08T12:32:27Z

Thanks @Radonirinaunimi, your last commit fixes the t0 issue.

Radonirinaunimi · 2024-01-09T12:42:04Z

This is also now ready for review.

All of the datasets (except one) have been implemented in the same way as in the old commondata (for legacy purposes), and comments are left in the table above to describe what I've found to be different wrt the hepdata. Nevertheless, the numerical values of the correlated systematics (and even the central values) are sometimes not exactly equal, because the numerical values quoted in the hepdata tables can differ slightly from the rawdata used in the old commondata.

PS: only the ATLAS_Z0_8TEV_20FB_PT-* datasets raise some weird indexing errors when computing data vs theory comparisons, although the data can be loaded properly and the entries of the tables are exactly the same.

scarlehoff · 2024-01-09T12:44:34Z

When the results are different you can implement the hepdata one and then a legacy variant with the different version (that is compatible with the old one). This is preferred.

Btw, did you check that when loading the entire set of datasets the associated covariance matrix is the same as the old (same for the datasets in the other PRs)?

Radonirinaunimi · 2024-01-09T14:11:18Z

> When the results are different you can implement the hepdata one and then a legacy variant with the different version (that is compatible with the old one). This is preferred.

The issue that I am struggling with at the moment is that I am not sure whether it makes sense to have legacy versions for some particular datasets or not. This is really one of the things we should discuss (cc @enocera). Let me provide two explicit examples:

- Take CMS_WP_7TEV_MUON_ASY, for example. When one downloads the full record from hepdata, there are two different types of files: the usual hepdata tables (as shown on the HepData interface) and the rawdata (usually in txt or dat format, not following any convention/structure). Most of the old implementations used the rawdata. However, the numerical values in the two are not always the same, and thus the covariance matrices differ slightly. If we always resort to the rawdata, then some of the entries in the metadata (such as tables) will be deprecated.
- Then there are the cases in which maybe some conscious decisions were made (?), such as ATLAS_DY_7TEV_EMUON_Y. In the paper, it is mentioned that the luminosity uncertainty is about $3.5$% (and this was the value used in the old implementation), but in the hepdata entries the value is $3.4$%.

> Btw, did you check that when loading the entire set of datasets the associated covariance matrix is the same as the old (same for the datasets in the other PRs)?

Yes, for the datasets listed here that have a checkmark in the column Comparison vs. Old. For some of the CMS datasets in #1869, it is a bit tricky because of the numerical differences mentioned in the first point, as I tried to use the hepdata files as much as possible instead.

scarlehoff · 2024-02-20T10:47:27Z

I'm going to rebase these datasets on top of the ones currently in master.

@Radonirinaunimi I'll leave this as a PR and not merge immediately in case you want to roll back the changes that you made for legacy purposes. We now have the legacy version, copied from the old one, for reproduction, but I think it is better in general to have the proper hepdata one as well.

scarlehoff

I think I need some feedback here, some of the keys in the plotting dictionary refer to the wrong datasets. I guess you copied part of it since it is shared, but could you have a second look to make sure the rest is ok?

(if it is only the labels I can change those)

scarlehoff · 2024-02-20T11:50:04Z

buildmaster/ATLAS_Z0_8TEV_HIGHMASS/metadata.yaml

+  npoints: [48]
+  plotting:
+    kinematics_override: ewk_rap_sqrt_scale
+    dataset_label: "ATLAS DY 2D 8 TeV low mass"


This is the high mass dataset, is the rest of the plotting data equal between the two?

scarlehoff · 2024-02-20T11:52:59Z

buildmaster/ATLAS_DY_7TEV_EMUON/metadata.yaml

+  npoints: [8, 11, 11]
+  plotting:
+    kinematics_override: ewk_rap_sqrt_scale
+    dataset_label: "LHCb $W,Z \\to \\mu$ 8 TeV"


This one also contains data from another ds.

scarlehoff · 2024-02-20T11:53:21Z

buildmaster/ATLAS_DY_7TEV_DILEPTON/metadata.yaml

+  npoints: [9, 6]
+  plotting:
+    kinematics_override: ewk_rap_sqrt_scale
+    dataset_label: "LHCb $W,Z \\to \\mu$ 7 TeV"


and this one

Radonirinaunimi · 2024-02-20T13:00:45Z

> I'm going to rebase these datasets on top of the ones currently in master.
>
> @Radonirinaunimi I'll leave this as a PR and not merge immediately in case you want to roll back the changes that you made for legacy purposes. We now have the legacy version, copied from the old one, for reproduction, but I think it is better in general to have the proper hepdata one as well.

That sounds good! I will revert to the state before the legacy versions were produced. I guess in doing so, I will need to rename the uncertainty files?

Thanks for the comments on the plotting metadata, I will have a second look at them and make sure they are fully correct.

scarlehoff

Since there are quite some changes to be made to the metadata of these files, I will move it to the right folder, with the right names, etc, and I'll leave you to review the labels / plotting options / etc @Radonirinaunimi

data/kinematics/uncertainties should be ok: I've used your data when it was compatible with the legacy data to within 10^-3, and the legacy data when the difference was between 10^-3 and 10^-2 (every difference was sub-% anyway).
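
The selection rule just described can be written down as a short sketch. The function name `choose_source` is hypothetical, and applying the rule to the largest relative difference over a whole dataset (rather than point by point) is an assumption; differences above 10^-2 did not occur, so they are simply flagged.

```python
def choose_source(new_data, legacy_data, tight=1e-3, loose=1e-2):
    """Decide which central values to keep, based on the worst relative difference."""
    worst = 0.0
    for new, legacy in zip(new_data, legacy_data):
        scale = max(abs(new), abs(legacy))
        if scale > 0:
            worst = max(worst, abs(new - legacy) / scale)

    if worst <= tight:
        return "new"     # compatible to within 10^-3: keep the hepdata values
    if worst <= loose:
        return "legacy"  # between 10^-3 and 10^-2: fall back to the legacy values
    raise ValueError(f"relative difference {worst:.2e} exceeds {loose:.0e}")


# Made-up numbers for illustration
print(choose_source([100.0, 200.0], [100.05, 200.0]))
print(choose_source([100.0, 200.0], [100.5, 200.0]))
```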

scarlehoff · 2024-02-20T12:59:41Z

buildmaster/ATLAS_Z0_8TEV_20FB/metadata.yaml

+  ndata: 64
+  npoints: [8, 8, 8, 20, 20]
+  plotting:
+    kinematics_override: ewk_ptrap_sqrt_scale


This has also changed with respect to what's in the old commondata files (which use jet_sqrt_scale).

Radonirinaunimi added the data toolchain label Nov 26, 2023

Radonirinaunimi marked this pull request as draft November 26, 2023 21:28

Radonirinaunimi linked an issue Nov 26, 2023 that may be closed by this pull request

Complete porting old Collider DY into the new format #1846

Open

scarlehoff mentioned this pull request Nov 29, 2023

Status of the new commondata format implementation #1709

Closed

77 tasks

Radonirinaunimi marked this pull request as ready for review December 4, 2023 08:02

scarlehoff reviewed Dec 4, 2023

scarlehoff reviewed Dec 7, 2023

Radonirinaunimi mentioned this pull request Dec 18, 2023

Polarized proton-PDF fits #1893

Closed

scarlehoff force-pushed the new_commondata_collected branch from e64d092 to 929b692 Compare February 1, 2024 14:26

Base automatically changed from new_commondata_collected to master February 16, 2024 09:57

Radonirinaunimi added 21 commits February 20, 2024 11:32

Remove extra files

bcda798

Set everything to absolute

b6e7484

fixed slight inconsistencies

73cb029

Add Z0 low mass

9fa30eb

Add Z0 high mass

023b97e

Add triple differential Z 3D

d9a74f8

fix typos in metadata

9cc6dec

Add DY fiducial cross section at 13 TeV

87787b1

Add ATLAS_WPWM_8TEV_MUON

48bafd9

Use updated version of the Fiducial ATLAS DY a 13 TeV

d3679dd

remove theory key when FK tables do not exist

81fa4f4

Add CF version of ATLAS_DY_7TEV_DILEPTON

5ba3c72

Remove non-necessary entries

5918d68

fix treatment of the systematics in ATLAS_DY_13TEV_FID

84508bd

replace sqrt_s -> sqrts

9ee7526

fall back luminosity uncertainty to legacy 3.5%

6bb6f29

add uncertainties present in the legacy

3d4f5bb

start adding Z0 8TEV dpT

23f0781

fix ATLAS_Z0_8TEV_20FB_PT-INVDIST

5133502

Add ATLAS_Z0_8TEV_20FB_PT-RAPDIST

39b8559

fixed metadata plotting in ATLAS_DY_13TEV

baaeadb

scarlehoff force-pushed the cdy_atlas_ncd branch from 276c240 to baaeadb Compare February 20, 2024 10:47

scarlehoff reviewed Feb 20, 2024

move datasets around and merge them with legacy versions

afd579f

scarlehoff added closure tests and removed closure tests labels Jul 17, 2024
