Proposal for an experiment to include native histograms in OpenMetrics #247

Closed
beorn7 opened this issue Jun 29, 2022 · 7 comments

@beorn7
Member

beorn7 commented Jun 29, 2022

Prometheus's new Native Histograms AKA Sparse Histograms somehow need to be represented in OM 2.x.

I have described the various trade-offs here. As an outsider, it's hard for me to sketch out a concrete design for how to deal with all of those, but I would like to propose an experiment: Let's create a makeshift way of including a Native Histogram representation in OpenMetrics that is very easy to generate and parse but ignores, for now, efficiency concerns, OM design philosophies etc. It will, however, allow instrumentation libraries to expose Native Histograms in an experimental way and Prometheus to ingest those. We can then study exposition and ingestion in practice, iterate on it, and get a better idea about the trade-offs for the actual specification of Native Histograms in OM 2.x.

This experiment should be hidden behind a feature flag or even in a separate branch, depending on the release philosophy of the affected repository.

Here's the idea:

Example for the exposition of a (fairly complex) pure Native Histogram including a timestamp:

# TYPE foo histogram
foo {"schema":0,"zero_threshold":0.001,"zero_count":4,"count":24,"sum":100,"positive_spans":[{"offset":0,"length":2},{"offset":1,"length":2}],"negative_spans":[{"offset":0,"length":2},{"offset":1,"length":2}],"positive_buckets":[2,1,-2,3],"negative_buckets":[2,1,-2,3]} 1520430042.123
foo_created 1520430000.123
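
To illustrate how little parsing work such a line needs, here is a minimal sketch in Python. It is not part of the proposal: the function name `parse_native_histogram_line` is hypothetical, and the sketch assumes a sample line without a label set, i.e. metric name, one JSON object, and an optional timestamp separated by spaces.

```python
import json

def parse_native_histogram_line(line):
    """Split 'name {json} [timestamp]' into its three parts (no labels assumed)."""
    name, rest = line.split(" ", 1)
    # raw_decode parses one JSON value and reports where it ended,
    # so whatever follows can be treated as the optional timestamp.
    histogram, end = json.JSONDecoder().raw_decode(rest)
    tail = rest[end:].strip()
    timestamp = float(tail) if tail else None
    return name, histogram, timestamp

line = (
    'foo {"schema":0,"zero_threshold":0.001,"zero_count":4,"count":24,"sum":100,'
    '"positive_spans":[{"offset":0,"length":2},{"offset":1,"length":2}],'
    '"negative_spans":[{"offset":0,"length":2},{"offset":1,"length":2}],'
    '"positive_buckets":[2,1,-2,3],"negative_buckets":[2,1,-2,3]} 1520430042.123'
)
name, histogram, timestamp = parse_native_histogram_line(line)
print(name, histogram["count"], timestamp)
```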

Note that there is no name collision with any of the conventional Histogram fields. Therefore, a conventional and a Native Histogram representation can be exposed side by side:

# TYPE foo histogram
foo {"schema":0,"zero_threshold":0.001,"zero_count":2,"count":12,"sum":123.4,"positive_spans":[{"offset":0,"length":2},{"offset":1,"length":2}],"positive_buckets":[2,1,-2,3]}
foo_bucket{le="0.0001"} 2
foo_bucket{le="1.0"} 2
foo_bucket{le="2.0"} 5
foo_bucket{le="4.0"} 6
foo_bucket{le="8.0"} 10
foo_bucket{le="+Inf"} 12
foo_count 12
foo_sum 123.4
foo_created 1520430000.123
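
As a sanity check on the side-by-side example, the following Python sketch resolves the delta-encoded bucket values into absolute counts and confirms that they add up to the same total as `foo_count`. It assumes that each entry in `positive_buckets` is the difference to the previous bucket's count (with the first entry relative to zero); the helper name `absolute_counts` is made up for illustration.

```python
from itertools import accumulate

def absolute_counts(deltas):
    """Turn delta-encoded bucket values into absolute per-bucket counts."""
    return list(accumulate(deltas))

# Values copied from the native histogram line of the example above.
native = {"zero_count": 2, "count": 12, "sum": 123.4,
          "positive_buckets": [2, 1, -2, 3]}

positive = absolute_counts(native["positive_buckets"])
total = native["zero_count"] + sum(positive)
print(positive, total)  # [2, 3, 1, 4] 12
```

The total of 12 matches both the `count` field and the conventional `foo_count` sample.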

Following thoughts:

  • The JSON snippet makes it easy to quickly create generators and parsers.
  • To test ideal text parsing performance, a hand-coded highly-optimized parser can still be written.
  • The layout corresponds to what the Prometheus server has to create when ingesting a Native Histogram into TSDB. Therefore, this experiment will illustrate the "best case" (from the server's perspective) of doing minimal work for decoding.
  • The format is also fairly compact: it avoids spelling out bucket boundaries explicitly (which are, in the general case, very long floating point numbers, e.g. 0.0008955117609420616) and puts all buckets on one row without repeating labels for each bucket.
  • With a bit of squinting (mostly removing the double quotes), the format is close to what a "machine-friendly" OM format could actually look like. It is, of course, not very "human-friendly", but a human-readable format would be both much more verbose and much more expensive to decode. See the trade-offs mentioned earlier.
  • While this implicitly also covers the case of a Gauge Histogram, it does not work for a Float Histogram (which looks significantly different Prometheus-internally). I think that's fine for this experiment. Exposing Float Histograms is a very rare use case (currently not even covered by OM for conventional Histograms).
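
To make the "easy to generate" point concrete, here is a rough Python sketch of a generator. The function name `format_native_histogram` is hypothetical and label sets are not handled. It only relies on `json.dumps` with compact separators.

```python
import json

def format_native_histogram(name, histogram, timestamp=None):
    """Render one exposition line: name, compact JSON, optional timestamp."""
    payload = json.dumps(histogram, separators=(",", ":"))
    line = f"{name} {payload}"
    if timestamp is not None:
        line += f" {timestamp}"
    return line

histogram = {
    "schema": 0,
    "zero_threshold": 0.001,
    "zero_count": 2,
    "count": 12,
    "sum": 123.4,
    "positive_spans": [{"offset": 0, "length": 2}, {"offset": 1, "length": 2}],
    "positive_buckets": [2, 1, -2, 3],
}
print(format_native_histogram("foo", histogram))
```

With the dictionary keys in this order, the output should be identical to the native histogram line of the second example.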

WRT a human-readable representation, you might want to have a look at the String method of the histogram.Histogram type. For the 2nd example above, the string representation is {count:12, sum:123.4, [-0.001,0.001]:2, (0.5,1]:2, (1,2]:3, (2,4]:1, (4,8]:4}. This looks very benign, but it's also a very simplistic example with only a few buckets and very simple bucket boundaries. For reference, I paste a more typical histogram below. In addition to the verbosity, Prometheus has to "guess" a schema from the representation, sort all the buckets into it, generate the span descriptions, and calculate the deltas between buckets. In total, that is quite a decoding effort.
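
The last two steps of that decoding effort (generating span descriptions and calculating deltas) can be sketched roughly as follows in Python. This is a simplified illustration, not Prometheus's actual code: it assumes the buckets have already been sorted into integer bucket indices for a known schema, and that a span's offset is the starting index for the first span and the number of skipped empty buckets for every later span. The name `encode_buckets` is hypothetical.

```python
def encode_buckets(counts_by_index):
    """Build spans and delta-encoded values from {bucket_index: count}."""
    spans, deltas = [], []
    prev_index = None
    prev_count = 0
    for index in sorted(counts_by_index):
        count = counts_by_index[index]
        if prev_index is not None and index == prev_index + 1:
            # Directly adjacent to the previous bucket: extend the current span.
            spans[-1]["length"] += 1
        else:
            # Start a new span; the offset is the gap to the previous span
            # (or the absolute starting index for the very first span).
            offset = index if prev_index is None else index - prev_index - 1
            spans.append({"offset": offset, "length": 1})
        deltas.append(count - prev_count)
        prev_index, prev_count = index, count
    return spans, deltas

spans, deltas = encode_buckets({0: 2, 1: 3, 3: 1, 4: 4})
print(spans)   # [{'offset': 0, 'length': 2}, {'offset': 1, 'length': 2}]
print(deltas)  # [2, 1, -2, 3]
```

Guessing the schema and mapping arbitrary boundaries to indices would come on top of this.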

Here the string representation of a "normal" Native Histogram:

{ count:252719 sum:2.588417777086236 [-0.0011613350732448448,-0.0010649489576809157):1 [-0.0008211879055212055,-0.0007530326295937211):9 [-0.0007530326295937211,-0.0006905339660024878):50 [-0.0006905339660024878,-0.0006332224387944383):121 [-0.0006332224387944383,-0.0005806675366224224):213 [-0.0005806675366224224,-0.0005324744788404579):392 [-0.0005324744788404579,-0.00048828125):727 [-0.00048828125,-0.0004477558804710308):1171 [-0.0004477558804710308,-0.00041059395276060273):1672 [-0.00041059395276060273,-0.00037651631479686053):2193 [-0.00037651631479686053,-0.0003452669830012439):2903 [-0.0003452669830012439,-0.00031661121939721915):3469 [-0.00031661121939721915,-0.0002903337683112112):3785 [-0.0002903337683112112,-0.00026623723942022893):4245 [-0.00026623723942022893,-0.000244140625):4615 [-0.000244140625,-0.0002238779402355154):4920 [-0.0002238779402355154,-0.00020529697638030136):4996 [-0.00020529697638030136,-0.00018825815739843027):5091 [-0.00018825815739843027,-0.00017263349150062194):5122 [-0.00017263349150062194,-0.00015830560969860958):4900 [-0.00015830560969860958,-0.0001451668841556056):4876 [-0.0001451668841556056,-0.00013311861971011446):4707 [-0.00013311861971011446,-0.0001220703125):4435 [-0.0001220703125,-0.0001119389701177577):4087 [-0.0001119389701177577,-0.00010264848819015068):3994 [-0.00010264848819015068,-0.00009412907869921513):3730 [-0.00009412907869921513,-0.00008631674575031097):3425 [-0.00008631674575031097,-0.00007915280484930479):3157 [-0.00007915280484930479,-0.0000725834420778028):2944 [-0.0000725834420778028,-0.00006655930985505723):2782 [-0.00006655930985505723,-0.00006103515625):2536 [-0.00006103515625,-0.00005596948505887885):2440 [-0.00005596948505887885,-0.00005132424409507534):2224 [-0.00005132424409507534,-0.00004706453934960757):2042 [-0.00004706453934960757,-0.000043158372875155485):1948 [-0.000043158372875155485,-0.000039576402424652394):1759 [-0.000039576402424652394,-0.0000362917210389014):1628 
[-0.0000362917210389014,-0.000033279654927528616):1439 [-0.000033279654927528616,-0.000030517578125):1282 [-0.000030517578125,-0.000027984742529439426):1184 [-0.000027984742529439426,-0.00002566212204753767):1185 [-0.00002566212204753767,-0.000023532269674803783):1064 [-0.000023532269674803783,-0.000021579186437577742):966 [-0.000021579186437577742,-0.000019788201212326197):889 [-0.000019788201212326197,-0.0000181458605194507):788 [-0.0000181458605194507,-0.000016639827463764308):732 [-0.000016639827463764308,-0.0000152587890625):638 [-0.0000152587890625,-0.000013992371264719713):602 [-0.000013992371264719713,-0.000012831061023768835):587 [-0.000012831061023768835,-0.000011766134837401892):534 [-0.000011766134837401892,-0.000010789593218788871):494 [-0.000010789593218788871,-0.000009894100606163098):420 [-0.000009894100606163098,-0.00000907293025972535):398 [-0.00000907293025972535,-0.000008319913731882154):359 [-0.000008319913731882154,-0.00000762939453125):332 [-0.00000762939453125,-0.0000069961856323598564):338 [-0.0000069961856323598564,-0.000006415530511884418):280 [-0.000006415530511884418,-0.000005883067418700946):255 [-0.000005883067418700946,-0.000005394796609394436):232 [-0.000005394796609394436,-0.000004947050303081549):228 [-0.000004947050303081549,-0.000004536465129862675):208 [-0.000004536465129862675,-0.000004159956865941077):179 [-0.000004159956865941077,-0.000003814697265625):166 [-0.000003814697265625,-0.0000034980928161799282):177 [-0.0000034980928161799282,-0.000003207765255942209):138 [-0.000003207765255942209,-0.000002941533709350473):149 [-0.000002941533709350473,-0.000002697398304697218):99 [-0.000002697398304697218,-0.0000024735251515407746):101 [-0.0000024735251515407746,-0.0000022682325649313374):96 [-0.0000022682325649313374,-0.0000020799784329705385):101 [-0.0000020799784329705385,-0.0000019073486328125):94 [-0.0000019073486328125,-0.0000017490464080899641):74 [-0.0000017490464080899641,-0.0000016038826279711044):79 
[-0.0000016038826279711044,-0.0000014707668546752365):62 [-0.0000014707668546752365,-0.000001348699152348609):66 [-0.000001348699152348609,-0.0000012367625757703873):73 [-0.0000012367625757703873,-0.0000011341162824656687):53 [-0.0000011341162824656687,-0.0000010399892164852693):50 [-0.0000010399892164852693,-9.5367431640625e-07):37 [-9.5367431640625e-07,-8.745232040449821e-07):47 [-8.745232040449821e-07,-8.019413139855522e-07):32 [-8.019413139855522e-07,-7.353834273376182e-07):30 [-7.353834273376182e-07,-6.743495761743044e-07):28 [-6.743495761743044e-07,-6.183812878851937e-07):25 [-6.183812878851937e-07,-5.670581412328344e-07):25 [-5.670581412328344e-07,-5.199946082426346e-07):29 [-5.199946082426346e-07,-4.76837158203125e-07):18 [-4.76837158203125e-07,-4.3726160202249103e-07):15 [-4.3726160202249103e-07,-4.009706569927761e-07):17 [-4.009706569927761e-07,-3.676917136688091e-07):20 [-3.676917136688091e-07,-3.371747880871522e-07):15 [-3.371747880871522e-07,-3.0919064394259683e-07):17 [-3.0919064394259683e-07,-2.835290706164172e-07):15 [-2.835290706164172e-07,-2.599973041213173e-07):9 [-2.599973041213173e-07,-2.384185791015625e-07):12 [-2.384185791015625e-07,-2.1863080101124551e-07):15 [-2.1863080101124551e-07,-2.0048532849638805e-07):9 [-2.0048532849638805e-07,-1.8384585683440456e-07):4 [-1.8384585683440456e-07,-1.685873940435761e-07):8 [-1.685873940435761e-07,-1.5459532197129841e-07):8 [-1.5459532197129841e-07,-1.417645353082086e-07):2 [-1.417645353082086e-07,-1.2999865206065866e-07):4 [-1.1920928955078125e-07,-1.0931540050562276e-07):2 [-1.0024266424819403e-07,-9.192292841720228e-08):3 [-9.192292841720228e-08,-8.429369702178806e-08):3 [-8.429369702178806e-08,-7.729766098564921e-08):2 [-7.08822676541043e-08,-6.499932603032933e-08):4 [-6.499932603032933e-08,-5.960464477539063e-08):1 [-5.960464477539063e-08,-5.465770025281138e-08):3 [-5.465770025281138e-08,-5.012133212409701e-08):4 [-5.012133212409701e-08,-4.596146420860114e-08):1 
[-4.596146420860114e-08,-4.214684851089403e-08):2 [-4.214684851089403e-08,-3.8648830492824603e-08):1 [-3.8648830492824603e-08,-3.544113382705215e-08):1 [-3.544113382705215e-08,-3.2499663015164664e-08):1 [-3.2499663015164664e-08,-2.9802322387695312e-08):2 [-2.732885012640569e-08,-2.5060666062048506e-08):2 [-1.9324415246412302e-08,-1.7720566913526073e-08):2 [-1.6249831507582332e-08,-1.4901161193847656e-08):1 [-1.4901161193847656e-08,-1.3664425063202845e-08):1 [-8.860283456763037e-09,-8.124915753791166e-09):1 [-7.450580596923828e-09,-6.832212531601422e-09):1 [-6.265166515512127e-09,-5.7451830260751424e-09):1 [-5.7451830260751424e-09,-5.2683560638617535e-09):1 [-5.2683560638617535e-09,-4.8311038116030754e-09):2 [-2.215070864190759e-09,-2.0312289384477915e-09):1 [-1.4362957565187856e-09,-1.3170890159654384e-09):1 (1.7080531329003556e-09,1.862645149230957e-09]:1 (2.6341780319308768e-09,2.8725915130375712e-09]:2 (5.7451830260751424e-09,6.265166515512127e-09]:1 (6.265166515512127e-09,6.832212531601422e-09]:1 (1.1490366052150285e-08,1.2530333031024253e-08]:1 (1.3664425063202845e-08,1.4901161193847656e-08]:1 (1.6249831507582332e-08,1.7720566913526073e-08]:1 (1.7720566913526073e-08,1.9324415246412302e-08]:2 (1.9324415246412302e-08,2.1073424255447014e-08]:3 (2.9802322387695312e-08,3.2499663015164664e-08]:1 (3.2499663015164664e-08,3.544113382705215e-08]:3 (3.544113382705215e-08,3.8648830492824603e-08]:4 (3.8648830492824603e-08,4.214684851089403e-08]:2 (4.596146420860114e-08,5.012133212409701e-08]:4 (5.012133212409701e-08,5.465770025281138e-08]:1 (5.465770025281138e-08,5.960464477539063e-08]:8 (5.960464477539063e-08,6.499932603032933e-08]:5 (6.499932603032933e-08,7.08822676541043e-08]:3 (7.08822676541043e-08,7.729766098564921e-08]:2 (7.729766098564921e-08,8.429369702178806e-08]:2 (8.429369702178806e-08,9.192292841720228e-08]:2 (9.192292841720228e-08,1.0024266424819403e-07]:4 (1.0024266424819403e-07,1.0931540050562276e-07]:6 (1.0931540050562276e-07,1.1920928955078125e-07]:8 
(1.1920928955078125e-07,1.2999865206065866e-07]:9 (1.2999865206065866e-07,1.417645353082086e-07]:6 (1.417645353082086e-07,1.5459532197129841e-07]:10 (1.5459532197129841e-07,1.685873940435761e-07]:10 (1.685873940435761e-07,1.8384585683440456e-07]:7 (1.8384585683440456e-07,2.0048532849638805e-07]:8 (2.0048532849638805e-07,2.1863080101124551e-07]:7 (2.1863080101124551e-07,2.384185791015625e-07]:14 (2.384185791015625e-07,2.599973041213173e-07]:9 (2.599973041213173e-07,2.835290706164172e-07]:14 (2.835290706164172e-07,3.0919064394259683e-07]:9 (3.0919064394259683e-07,3.371747880871522e-07]:10 (3.371747880871522e-07,3.676917136688091e-07]:11 (3.676917136688091e-07,4.009706569927761e-07]:21 (4.009706569927761e-07,4.3726160202249103e-07]:18 (4.3726160202249103e-07,4.76837158203125e-07]:22 (4.76837158203125e-07,5.199946082426346e-07]:24 (5.199946082426346e-07,5.670581412328344e-07]:23 (5.670581412328344e-07,6.183812878851937e-07]:20 (6.183812878851937e-07,6.743495761743044e-07]:22 (6.743495761743044e-07,7.353834273376182e-07]:26 (7.353834273376182e-07,8.019413139855522e-07]:30 (8.019413139855522e-07,8.745232040449821e-07]:34 (8.745232040449821e-07,9.5367431640625e-07]:35 (9.5367431640625e-07,0.0000010399892164852693]:35 (0.0000010399892164852693,0.0000011341162824656687]:66 (0.0000011341162824656687,0.0000012367625757703873]:51 (0.0000012367625757703873,0.000001348699152348609]:60 (0.000001348699152348609,0.0000014707668546752365]:54 (0.0000014707668546752365,0.0000016038826279711044]:74 (0.0000016038826279711044,0.0000017490464080899641]:67 (0.0000017490464080899641,0.0000019073486328125]:74 (0.0000019073486328125,0.0000020799784329705385]:74 (0.0000020799784329705385,0.0000022682325649313374]:87 (0.0000022682325649313374,0.0000024735251515407746]:110 (0.0000024735251515407746,0.000002697398304697218]:107 (0.000002697398304697218,0.000002941533709350473]:113 (0.000002941533709350473,0.000003207765255942209]:133 (0.000003207765255942209,0.0000034980928161799282]:144 
(0.0000034980928161799282,0.000003814697265625]:176 (0.000003814697265625,0.000004159956865941077]:195 (0.000004159956865941077,0.000004536465129862675]:189 (0.000004536465129862675,0.000004947050303081549]:206 (0.000004947050303081549,0.000005394796609394436]:221 (0.000005394796609394436,0.000005883067418700946]:234 (0.000005883067418700946,0.000006415530511884418]:307 (0.000006415530511884418,0.0000069961856323598564]:266 (0.0000069961856323598564,0.00000762939453125]:309 (0.00000762939453125,0.000008319913731882154]:341 (0.000008319913731882154,0.00000907293025972535]:373 (0.00000907293025972535,0.000009894100606163098]:408 (0.000009894100606163098,0.000010789593218788871]:450 (0.000010789593218788871,0.000011766134837401892]:512 (0.000011766134837401892,0.000012831061023768835]:569 (0.000012831061023768835,0.000013992371264719713]:564 (0.000013992371264719713,0.0000152587890625]:649 (0.0000152587890625,0.000016639827463764308]:692 (0.000016639827463764308,0.0000181458605194507]:729 (0.0000181458605194507,0.000019788201212326197]:847 (0.000019788201212326197,0.000021579186437577742]:890 (0.000021579186437577742,0.000023532269674803783]:942 (0.000023532269674803783,0.00002566212204753767]:1069 (0.00002566212204753767,0.000027984742529439426]:1193 (0.000027984742529439426,0.000030517578125]:1294 (0.000030517578125,0.000033279654927528616]:1334 (0.000033279654927528616,0.0000362917210389014]:1494 (0.0000362917210389014,0.000039576402424652394]:1571 (0.000039576402424652394,0.000043158372875155485]:1817 (0.000043158372875155485,0.00004706453934960757]:1969 (0.00004706453934960757,0.00005132424409507534]:2043 (0.00005132424409507534,0.00005596948505887885]:2286 (0.00005596948505887885,0.00006103515625]:2459 (0.00006103515625,0.00006655930985505723]:2670 (0.00006655930985505723,0.0000725834420778028]:2950 (0.0000725834420778028,0.00007915280484930479]:3145 (0.00007915280484930479,0.00008631674575031097]:3372 (0.00008631674575031097,0.00009412907869921513]:3538 
(0.00009412907869921513,0.00010264848819015068]:3968 (0.00010264848819015068,0.0001119389701177577]:4280 (0.0001119389701177577,0.0001220703125]:4478 (0.0001220703125,0.00013311861971011446]:4828 (0.00013311861971011446,0.0001451668841556056]:5014 (0.0001451668841556056,0.00015830560969860958]:5149 (0.00015830560969860958,0.00017263349150062194]:5350 (0.00017263349150062194,0.00018825815739843027]:5540 (0.00018825815739843027,0.00020529697638030136]:5457 (0.00020529697638030136,0.0002238779402355154]:5503 (0.0002238779402355154,0.000244140625]:5523 (0.000244140625,0.00026623723942022893]:5350 (0.00026623723942022893,0.0002903337683112112]:5075 (0.0002903337683112112,0.00031661121939721915]:4607 (0.00031661121939721915,0.0003452669830012439]:4015 (0.0003452669830012439,0.00037651631479686053]:3309 (0.00037651631479686053,0.00041059395276060273]:2770 (0.00041059395276060273,0.0004477558804710308]:1996 (0.0004477558804710308,0.00048828125]:1457 (0.00048828125,0.0005324744788404579]:989 (0.0005324744788404579,0.0005806675366224224]:577 (0.0005806675366224224,0.0006332224387944383]:298 (0.0006332224387944383,0.0006905339660024878]:132 (0.0006905339660024878,0.0007530326295937211]:53 (0.0007530326295937211,0.0008211879055212055]:20 (0.0008211879055212055,0.0008955117609420616]:7 }
@brian-brazil
Contributor

brian-brazil commented Jun 29, 2022

OM 2.x doesn't exist yet in any form, so I think any experiment should be done under a different name and content type for now to avoid any potential future confusion. We already have enough people thinking OM and Prometheus text format are the same thing.

From an OM 1.x standpoint, as long as it can always gracefully negotiate and degrade to OM 1.0, it's still compliant with OM. That is to say, it must produce at least a +Inf bucket, and any other buckets must be essentially static.
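
As a rough illustration of the minimal degraded form, the Python sketch below derives the conventional samples from the `count` and `sum` fields of the JSON representation proposed above. The function name `degrade_to_classic` is made up, and the sketch only emits the +Inf bucket, not any additional static buckets.

```python
def degrade_to_classic(name, histogram):
    """Emit the minimal conventional histogram samples for an OM 1.0 consumer."""
    return [
        f'{name}_bucket{{le="+Inf"}} {histogram["count"]}',
        f'{name}_count {histogram["count"]}',
        f'{name}_sum {histogram["sum"]}',
    ]

for sample in degrade_to_classic("foo", {"count": 12, "sum": 123.4}):
    print(sample)
```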

In terms of the minutiae of the format itself I do have some thoughts, though with 2.x we can be less constrained than for 1.x considering that a 2.x implementation would still have to be able to produce a degraded 1.0. So for example that your proposal requires parsers noticing that "foo" is associated with the TYPE just above is reasonable here, whereas it isn't for 1.0.
Without the double quotes would be my main thought, and if you want a JSON parser to be able to handle it, ensure that you have a plan for NaN/Inf.

@beorn7
Member Author

beorn7 commented Jun 29, 2022

WRT content type: Yes, sure, there should be a very specific content type just for the experiment.

> Without the double quotes would be my main thought,

To clarify: If we want to use a JSON parser for the experiment, we need the double quotes during the experiment.

> ensure that you have a plan for NaN/Inf.

Ah right. My thought here was that, for the experiment, we require instrumentation libraries to never emit NaN/Inf. That's anyway a weird corner case. We need to handle it for the real thing, of course. But for the real thing, we won't use a JSON snippet in the first place.
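
For a library written in Python, the restriction could be enforced at encoding time. The sketch below uses only the standard `json` module; the wrapper name `encode_strict` is hypothetical. By default `json.dumps` writes `NaN` and `Infinity` tokens, which strict JSON parsers reject, while `allow_nan=False` raises a `ValueError` instead.

```python
import json
import math

def encode_strict(histogram):
    """Encode to compact JSON, refusing NaN and +/-Inf values."""
    return json.dumps(histogram, separators=(",", ":"), allow_nan=False)

# Default behaviour: emits a token that is not valid JSON.
print(json.dumps({"sum": math.nan}))  # {"sum": NaN}

try:
    encode_strict({"sum": math.inf})
except ValueError as exc:
    print("rejected:", exc)
```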

@brian-brazil
Contributor

> To clarify: If we want to use a JSON parser for the experiment, we need the double quotes during the experiment.

The whole point is to experiment, so that sounds fine to me anyway.

@beorn7
Member Author

beorn7 commented Jun 29, 2022

/cc @fstab Would this match your expectation for an experiment in client_java?

@beorn7
Member Author

beorn7 commented Sep 6, 2022

FYI: @fstab has now added protobuf support to client_java temporarily. One of the reasons for this experiment (to add native histogram support to client_java in a simple way) is therefore not relevant anymore. This might still be useful to play with a text representation of native histograms and how it behaves during generation and parsing etc.

@beorn7
Member Author

beorn7 commented Oct 11, 2022

Also note #256 for a draft of Native Histogram support in the OpenMetrics protobuf format.

@beorn7
Member Author

beorn7 commented Mar 14, 2023

Given the reaction to a brainstorming doc, I think we should not pursue this "embedded JSON" idea any longer (but we can, of course, change our minds again). Of all the ideas discussed, "embedded JSON" (idea 1 in the doc) was the least liked, notably also by @csmarchbanks, who maintains client_python, which will probably be the first instrumentation library to implement a text format for native histograms.

Therefore, I'm retracting this proposal (for now).
