Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

logictest: compare floating point values approximately on s390x #63244

Merged

Conversation

jonathan-albrecht-ibm
Copy link
Contributor

Overview

On s390x in the std math package and some c-deps, floating point calculations can produce results that differ from the values calculated on amd64. This patch adds a function to compare logictest floating point and decimal values within a small relative margin on s390x. The existing behavior on all other platforms remains the same.

On s390x, there are three main reasons that floating point calculations sometimes give different results:

  • the go compiler generates the s390x "fused multiply and add" (FMA) instruction where possible,
  • the go math package uses s390x optimized versions of some functions,
  • some c libs eg. libgeos, libproj also have platform specific floating point calculation differences.

Proposal

The motivation for this work is so that users building CRDB on s390x do not need to diagnose tests that fail because of platform dependent floating point differences.

This PR proposes one possible approach to dealing with platform dependent floating point differences. Since development, testing and CI are done on amd64 it keeps the current logic for determining float equality exactly the same. On s390x, it determines values of decimal and float column types (R and F) in query tests to be equal if they are within a tolerance. See the new pkg/testutils/floatcmp package for the implementation of the approximate equality logic and changes in logictest.go to see how it is applied to only s390x.

There are probably other approaches I haven't thought of that would also work. I'd like to use this proposal to start a conversation on how all tests in CRDB that currently fail due to expected floating point differences could eventually be made to pass.

Of course platforms other than s390x may also have differences but I haven't looked at any other platforms. The changes should be easily extendable to other platforms if needed.

Future Work

The changes in this PR allow the following tests to pass on s390x:

  • TestLogic/fakedist-disk/builtin_function/extra_float_digits_3
  • TestLogic/fakedist-metadata/builtin_function/extra_float_digits_3
  • TestLogic/fakedist-vec-off/builtin_function/extra_float_digits_3
  • TestLogic/fakedist/builtin_function/extra_float_digits_3
  • TestLogic/local-spec-planning/builtin_function/extra_float_digits_3
  • TestLogic/local-vec-off/builtin_function/extra_float_digits_3
  • TestLogic/local/builtin_function/extra_float_digits_3

There are about 70 more tests that currently fail due to platform floating point differences on s390x, many are tests of geospatial functions. Assuming we can come up with a good approach, I'd like to continue working on fixes to be submitted in future PRs.

Release note: None

@blathers-crl
Copy link

blathers-crl bot commented Apr 7, 2021

Thank you for contributing to CockroachDB. Please ensure you have followed the guidelines for creating a PR.

My owl senses detect your PR is good for review. Please keep an eye out for any test failures in CI.

I was unable to automatically find a reviewer. You can try CCing one of the following members:

  • A person you worked with closely on this PR.
  • The person who created the ticket, or a CRDB organization member involved with the ticket (author, commenter, etc.).
  • Join our community slack channel and ask on #contributors.
  • Try find someone else from here.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@blathers-crl blathers-crl bot added O-community Originated from the community X-blathers-untriaged blathers was unable to find an owner labels Apr 7, 2021
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@jonathan-albrecht-ibm
Copy link
Contributor Author

cc @yuzefovich for visibility

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the contribution! I've left a few minor comments but I'm sure others will want to chime in as well.

Reviewed 7 of 7 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jonathan-albrecht-ibm)


pkg/sql/logictest/logic.go, line 2839 at r1 (raw file):

			if !resultMatches && (colT == 'F' || colT == 'R') {
				var err error
				// On s390x, check that both floats and decimals are approximately equal

I didn't see any tests for decimals -- would be good to add some


pkg/sql/logictest/logic.go, line 2840 at r1 (raw file):

				var err error
				// On s390x, check that both floats and decimals are approximately equal
				// to take into account platform differences in floating point calculations

nit: comment should end in a period.


pkg/sql/logictest/logic.go, line 2903 at r1 (raw file):

}

func parseFloats(expectedString, actualString string) (float64, float64, error) {

nit: I'd name this something less generic since it's designed for a specific case. Maybe parseExpectedAndActualFloats?


pkg/sql/logictest/logic.go, line 2915 at r1 (raw file):

}

func floatsMatchApprox(expectedString, actualString string) (bool, error) {

this function needs a comment


pkg/testutils/floatcmp/floatcmp.go, line 26 at r1 (raw file):

	// the close tolerances in go's math package.
	CloseFraction float64 = 1e-14
	// CloseMargin can be used to set a "close" tolerance for the margin

nit: add a blank line above this comment


pkg/testutils/floatcmp/floatcmp.go, line 45 at r1 (raw file):

// • If both expected and actual are not NaN or infinate, they are equal within
// the larger of the relative fraction or absolute margin calculated from the
// fraction and margin arguments.

could you give a concrete example of when fraction or margin would be used?


pkg/testutils/floatcmp/floatcmp_test.go, line 40 at r1 (raw file):

		args structArgs
		want bool
	}

these structs could use some comments explaining their purpose


pkg/testutils/floatcmp/floatcmp_test.go, line 116 at r1 (raw file):

}

func toStructTests(floatTestCases []floatTestCase) []structTestCase {

this needs a comment

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jonathan-albrecht-ibm)


pkg/sql/logictest/logic.go, line 2837 at r1 (raw file):

			// by the number of coltypes.
			colT := query.colTypes[i%len(query.colTypes)]
			if !resultMatches && (colT == 'F' || colT == 'R') {

nit: I think I'd move the second part of AND to be ANDed with the architecture checks below, to something like:

			if !resultMatches {
				var err error
				// On s390x, check that both floats and decimals are approximately equal
				// to take into account platform differences in floating point calculations
				if runtime.GOARCH == "s390x" && (colT == 'F' || colT == 'R') {
					resultMatches, err = floatsMatchApprox(expected, actual)
				} else if colT == 'F' {
					resultMatches, err = floatsMatch(expected, actual)
				}
				if err != nil {
					return errors.CombineErrors(makeError(), err)
				}
			}

pkg/testutils/floatcmp/floatcmp.go, line 30 at r1 (raw file):

	// the CloseFraction constant for the fraction argument.
	//
	// It is set to the square of CloseFraction so it only used when the smaller

nit: s/it only used/ it is only used/.

@jonathan-albrecht-ibm jonathan-albrecht-ibm force-pushed the floating_point_test_tolerance branch from adc2efc to b3237ed Compare April 8, 2021 16:16
@blathers-crl
Copy link

blathers-crl bot commented Apr 8, 2021

Thank you for updating your pull request.

My owl senses detect your PR is good for review. Please keep an eye out for any test failures in CI.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@jonathan-albrecht-ibm
Copy link
Contributor Author

@rytaft @yuzefovich thanks for reviewing.

Reviewed 7 of 7 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jonathan-albrecht-ibm)

pkg/sql/logictest/logic.go, line 2839 at r1 (raw file):

			if !resultMatches && (colT == 'F' || colT == 'R') {
				var err error
				// On s390x, check that both floats and decimals are approximately equal

I didn't see any tests for decimals -- would be good to add some

By 'floats and decimals' I meant the float 'F' and decimal 'R' coltypes. At this point the expected and actual values coming in for both those coltypes should be strings that are convertible to float64 to do the floatsMatch* comparisons. I don't think there are different tests that can be done but let me know if I've misunderstood.

The comment in the code was vague anyway so I changed it to make it a bit clearer.

pkg/sql/logictest/logic.go, line 2840 at r1 (raw file):

				var err error
				// On s390x, check that both floats and decimals are approximately equal
				// to take into account platform differences in floating point calculations

nit: comment should end in a period.

Done.

pkg/sql/logictest/logic.go, line 2903 at r1 (raw file):

}

func parseFloats(expectedString, actualString string) (float64, float64, error) {

nit: I'd name this something less generic since it's designed for a specific case. Maybe parseExpectedAndActualFloats?

Done.

pkg/sql/logictest/logic.go, line 2915 at r1 (raw file):

}

func floatsMatchApprox(expectedString, actualString string) (bool, error) {

this function needs a comment

Done.

pkg/testutils/floatcmp/floatcmp.go, line 26 at r1 (raw file):

	// the close tolerances in go's math package.
	CloseFraction float64 = 1e-14
	// CloseMargin can be used to set a "close" tolerance for the margin

nit: add a blank line above this comment

Done.

pkg/testutils/floatcmp/floatcmp.go, line 45 at r1 (raw file):

// • If both expected and actual are not NaN or infinate, they are equal within
// the larger of the relative fraction or absolute margin calculated from the
// fraction and margin arguments.

could you give a concrete example of when fraction or margin would be used?

Done, let me know if it looks ok.

pkg/testutils/floatcmp/floatcmp_test.go, line 40 at r1 (raw file):

		args structArgs
		want bool
	}

these structs could use some comments explaining their purpose

Done.

pkg/testutils/floatcmp/floatcmp_test.go, line 116 at r1 (raw file):

}

func toStructTests(floatTestCases []floatTestCase) []structTestCase {

this needs a comment

Done.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jonathan-albrecht-ibm)

pkg/sql/logictest/logic.go, line 2837 at r1 (raw file):

			// by the number of coltypes.
			colT := query.colTypes[i%len(query.colTypes)]
			if !resultMatches && (colT == 'F' || colT == 'R') {

nit: I think I'd move the second part of AND to be ANDed with the architecture checks below, to something like:

			if !resultMatches {
				var err error
				// On s390x, check that both floats and decimals are approximately equal
				// to take into account platform differences in floating point calculations
				if runtime.GOARCH == "s390x" && (colT == 'F' || colT == 'R') {
					resultMatches, err = floatsMatchApprox(expected, actual)
				} else if colT == 'F' {
					resultMatches, err = floatsMatch(expected, actual)
				}
				if err != nil {
					return errors.CombineErrors(makeError(), err)
				}
			}

Done.

pkg/testutils/floatcmp/floatcmp.go, line 30 at r1 (raw file):

	// the CloseFraction constant for the fraction argument.
	//
	// It is set to the square of CloseFraction so it only used when the smaller

nit: s/it only used/ it is only used/.

Done.

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 3 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jonathan-albrecht-ibm)


pkg/sql/logictest/logic.go, line 2839 at r1 (raw file):

Previously, jonathan-albrecht-ibm (Jonathan Albrecht) wrote…

By 'floats and decimals' I meant the float 'F' and decimal 'R' coltypes. At this point the expected and actual values coming in for both those coltypes should be strings that are convertible to float64 to do the floatsMatch* comparisons. I don't think there are different tests that can be done but let me know if I've misunderstood.

The comment in the code was vague anyway so I changed it to make it a bit clearer.

Makes sense - thanks!


pkg/testutils/floatcmp/floatcmp.go, line 45 at r1 (raw file):

Previously, jonathan-albrecht-ibm (Jonathan Albrecht) wrote…

Done, let me know if it looks ok.

Great explanation! Thank you!


pkg/testutils/floatcmp/floatcmp_test.go, line 40 at r1 (raw file):

Previously, jonathan-albrecht-ibm (Jonathan Albrecht) wrote…

Done.

This is better, but I was hoping for more of an explanation about why you would want two different types of tests. Something along the lines of "EqualApprox takes an interface, allowing it to compare equality of both primitive data types and structs. We want to test both cases."


pkg/testutils/floatcmp/floatcmp_test.go, line 125 at r2 (raw file):

}

// toStructTests transforms an array of floatTestsCases into an array of structTestCases by

nit: floatTestsCases -> floatTestCases

On s390x in the std math package and some c-deps, floating point
calculations can produce results that differ from the values calculated
on amd64. This patch adds a function to compare logictest floating point
and decimal values within a small relative margin on s390x.

Release note: None
@jonathan-albrecht-ibm jonathan-albrecht-ibm force-pushed the floating_point_test_tolerance branch from b3237ed to 688583e Compare April 9, 2021 21:34
Copy link
Contributor Author

@jonathan-albrecht-ibm jonathan-albrecht-ibm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @rytaft)


pkg/testutils/floatcmp/floatcmp_test.go, line 40 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

This is better, but I was hoping for more of an explanation about why you would want two different types of tests. Something along the lines of "EqualApprox takes an interface, allowing it to compare equality of both primitive data types and structs. We want to test both cases."

I added your suggested comment to the beginning of the types. I think it helps make sense of everything after but let me know if it could be moved somewhere better.

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: thanks for the contribution! I'll give other folks at CRL a couple of days to chime in and then I'll merge this.

Reviewed 1 of 1 files at r3.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @jonathan-albrecht-ibm)


pkg/testutils/floatcmp/floatcmp_test.go, line 40 at r1 (raw file):

Previously, jonathan-albrecht-ibm (Jonathan Albrecht) wrote…

I added your suggested comment to the beginning of the types. I think it helps make sense of everything after but let me know if it could be moved somewhere better.

SGTM

@jonathan-albrecht-ibm
Copy link
Contributor Author

@rytaft @yuzefovich no rush but would like to check if there are there any more comments?

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for following up, @jonathan-albrecht-ibm! Sorry I lost track of this. I'll go ahead and merge this. Thanks again for the contribution!

bors r+

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @jonathan-albrecht-ibm)

@craig
Copy link
Contributor

craig bot commented Apr 19, 2021

Build failed:

@jonathan-albrecht-ibm
Copy link
Contributor Author

@rytaft IIUC the build failure is just a hiccup. Can it be rerun?

[20:02:17][Prepare environment] curl: (35) OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to register.cockroachdb.com:443 
[20:02:17][Prepare environment] Process exited with code 35

@RaduBerinde
Copy link
Member

bors r+

@craig
Copy link
Contributor

craig bot commented Apr 20, 2021

Already running a review

@craig
Copy link
Contributor

craig bot commented Apr 20, 2021

This PR was included in a batch that was canceled, it will be automatically retried

craig bot pushed a commit that referenced this pull request Apr 20, 2021
63238: roachtest: update libpq blocklist to ignore TestCopyInBinaryError r=rafiss a=RichardJCai

roachtest: update libpq blocklist to ignore TestCopyInBinaryError

TestCopyInBinary's behaviour was incorrect in the test since we were not receiving an expected error (`pq: only text format supported for COPY`). 
Furthermore the test would sporadically panic causing the following tests to fail.

Release note: None

Resolves #57855 

63244: logictest: compare floating point values approximately on s390x r=ajwerner a=jonathan-albrecht-ibm

### Overview
On s390x in the std math package and some c-deps, floating point calculations can produce results that differ from the values calculated on amd64. This patch adds a function to compare logictest floating point and decimal values within a small relative margin on s390x. The existing behavior on all other platforms remains the same.

On s390x, there are three main reasons that floating point calculations sometimes give different results:
* the go compiler generates the s390x "fused multiply and add" (FMA) instruction where possible,
* the go math package uses s390x optimized versions of some functions,
* some c libs eg. libgeos, libproj also have platform specific floating point calculation differences.

### Proposal
The motivation for this work is so that users building CRDB on s390x do not need to diagnose tests that fail because of platform dependent floating point differences.

This PR proposes one possible approach to dealing with platform dependent floating point differences. Since development, testing and CI are done on amd64 it keeps the current logic for determining float equality exactly the same. On s390x, it determines values of decimal and float column types (R and F) in query tests to be equal if they are within a tolerance. See the new pkg/testutils/floatcmp package for the implementation of the approximate equality logic and changes in logictest.go to see how it is applied to only s390x.

There are probably other approaches I haven't thought of that would also work. I'd like to use this proposal to start a conversation on how all tests in CRDB that currently fail due to expected floating point differences could eventually be made to pass.

Of course platforms other than s390x may also have differences but I haven't looked at any other platforms. The changes should be easily extendable to other platforms if needed.

### Future Work
The changes in this PR allow the following tests to pass on s390x:
* TestLogic/fakedist-disk/builtin_function/extra_float_digits_3
*  TestLogic/fakedist-metadata/builtin_function/extra_float_digits_3
*  TestLogic/fakedist-vec-off/builtin_function/extra_float_digits_3
*  TestLogic/fakedist/builtin_function/extra_float_digits_3
*  TestLogic/local-spec-planning/builtin_function/extra_float_digits_3
*  TestLogic/local-vec-off/builtin_function/extra_float_digits_3
*  TestLogic/local/builtin_function/extra_float_digits_3

There are about 70 more tests that currently fail due to platform floating point differences on s390x, many are tests of geospatial functions. Assuming we can come up with a good approach, I'd like to continue working on fixes to be submitted in future PRs.

Release note: None

63802: colbuilder: optimize IS DISTINCT FROM NULL when null is casted r=yuzefovich a=yuzefovich

We have an optimized operator for `Is{Not}DistinctFrom` operation which
we can plan currently only if the right side is a constant NULL. In some
cases the optimizer might create a cast expression on the right in order
to propagate the type of the null, and previously we would fallback to
the default comparison operator in such scenario. This is suboptimal,
and this commit fixes the issue by special casing the scenario of
casting NULL to some type.

Fixes: #63792.

Release note: None

63903: sql: mark planNodeToRowSource as streaming intelligently r=yuzefovich a=yuzefovich

Previously, out of abundance of caution (and some laziness) we marked
all `planNodeToRowSource` processors as of "streaming" nature. This
marker influences whether we wrap it with a streaming or buffering
columnarizer into the vectorized flow. However, doing so is unnecessary
in most cases and kills some of the benefits of the vectorized model.
The only special planNode is `hookFnNode` which must be streaming, all
others are safe to have buffering around them. This commit implements
that idea. This required adding another method to `Processor` interface.

Release note: None

Co-authored-by: richardjcai <[email protected]>
Co-authored-by: Jonathan Albrecht <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
@craig
Copy link
Contributor

craig bot commented Apr 20, 2021

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Apr 20, 2021

Build succeeded:

@craig craig bot merged commit b88bd7f into cockroachdb:master Apr 20, 2021
@jonathan-albrecht-ibm
Copy link
Contributor Author

@rytaft @yuzefovich TFTR!

@jonathan-albrecht-ibm jonathan-albrecht-ibm deleted the floating_point_test_tolerance branch April 21, 2021 13:07
@rytaft
Copy link
Collaborator

rytaft commented Apr 21, 2021

Thank you! Apologies again for the delay in merging.

rharding6373 added a commit to rharding6373/cockroach that referenced this pull request Jul 10, 2023
Before this change unoptimized query oracle tests would compare results
using simple string comparison. However, due to floating point precision
limitations, it's possible for results with floating point to diverge
during the course of normal computation. This results in test failures
that are difficult to reproduce or determine whether they are expected
behavior.

This change utilizes existing floating point comparison functions used
by logic tests to match float values only to a specific precision. Like
the logic tests, we also have special handling for floats and decimals
under the s390x architecture (see cockroachdb#63244). In order to avoid costly
comparisons, we only check floating point precision if the naiive string
comparison approach fails and there are float or decimal types in the
result.

Epic: None
Fixes: cockroachdb#95665

Release note: None
rharding6373 added a commit to rharding6373/cockroach that referenced this pull request Jul 17, 2023
Before this change unoptimized query oracle tests would compare results
using simple string comparison. However, due to floating point precision
limitations, it's possible for results with floating point to diverge
during the course of normal computation. This results in test failures
that are difficult to reproduce or determine whether they are expected
behavior.

This change utilizes existing floating point comparison functions used
by logic tests to match float values only to a specific precision. Like
the logic tests, we also have special handling for floats and decimals
under the s390x architecture (see cockroachdb#63244). In order to avoid costly
comparisons, we only check floating point precision if the naiive string
comparison approach fails and there are float or decimal types in the
result.

Epic: None
Fixes: cockroachdb#95665

Release note: None
rharding6373 added a commit to rharding6373/cockroach that referenced this pull request Jul 18, 2023
Before this change unoptimized query oracle tests would compare results
using simple string comparison. However, due to floating point precision
limitations, it's possible for results with floating point to diverge
during the course of normal computation. This results in test failures
that are difficult to reproduce or determine whether they are expected
behavior.

This change utilizes existing floating point comparison functions used
by logic tests to match float values only to a specific precision. Like
the logic tests, we also have special handling for floats and decimals
under the s390x architecture (see cockroachdb#63244). In order to avoid costly
comparisons, we only check floating point precision if the naiive string
comparison approach fails and there are float or decimal types in the
result.

Epic: None
Fixes: cockroachdb#95665

Release note: None
rharding6373 added a commit to rharding6373/cockroach that referenced this pull request Jul 22, 2023
Before this change unoptimized query oracle tests would compare results
using simple string comparison. However, due to floating point precision
limitations, it's possible for results with floating point to diverge
during the course of normal computation. This results in test failures
that are difficult to reproduce or determine whether they are expected
behavior.

This change utilizes existing floating point comparison functions used
by logic tests to match float values only to a specific precision. Like
the logic tests, we also have special handling for floats and decimals
under the s390x architecture (see cockroachdb#63244). In order to avoid costly
comparisons, we only check floating point precision if the naiive string
comparison approach fails and there are float or decimal types in the
result.

Epic: None
Fixes: cockroachdb#95665

Release note: None
craig bot pushed a commit that referenced this pull request Jul 24, 2023
106552: tests: support float approximation in roachtest query comparison utils r=rharding6373 a=rharding6373

tests, logictest, floatcmp: refactor comparison test util functions
    
This commit moves some float comparison test util functions from
logictest into the floatcmp package. It also moves a query result
comparison function from the tlp file to query_comparison_util in the
tests package.
    
This commit also marks roachtests as testonly targets.
    
Epic: none
    
Release note: None


tests: support float approximation in roachtest query comparison utils
    
Before this change unoptimized query oracle tests would compare results
using simple string comparison. However, due to floating point precision
limitations, it's possible for results with floating point to diverge
during the course of normal computation. This results in test failures
that are difficult to reproduce or determine whether they are expected
behavior.
    
This change utilizes existing floating point comparison functions used
by logic tests to match float values only to a specific precision. Like
the logic tests, we also have special handling for floats and decimals
under the s390x architecture (see #63244). In order to avoid costly
comparisons, we only check floating point precision if the naiive string
comparison approach fails and there are float or decimal types in the
result.
    
Epic: None
Fixes: #95665
    
Release note: None

106607: pkg/util/log: introduce `fluent.sink.conn.errors` metric to log package r=knz a=abarganier

Logging is a critical subsystem within CRDB, but as things
are today, we have very little observability into logging
itself. For starters, we have no metrics in the logging
package at all!

This makes it difficult to observe things within the log
package. For example, if a fluent-server log sink fails
to connect to FluentBit, how can we tell? We get some STDOUT
message, but that's about it.

It's time to get some metrics into the log package.

Doing so is a bit of a tricky dance, because pretty much every
package in CRDB imports the logging package, meaning you
almost always get a circular dependency when trying to make
use of any library within pkg/util/log. Therefore, this PR 
injects metrics functionality into the logging package.
It does so via a new interface called `LogMetrics` with
functions that enable its users to modify metrics.

The implementation of the interface can live elsewhere,
such as the metrics package itself, whe circular dependencies
aren't such a pain. We can then inject the implementation
into the log package.

With all that plumbing done, the PR also introduces a new metric,
`fluent.sink.conn.errors` representing fluentbit connection errors 
whenever a fluent-server log sink fails to establish a connection.

We can see the metric represented below in a test, where no
fluent-server was running for a moment, before starting it.
Note that the connection errors level off once the server was 
started:
<img width="1018" alt="Screenshot 2023-07-11 at 1 32 57 PM" src="https://github.com/cockroachdb/cockroach/assets/8194877/ccacf84e-e585-4a76-98af-ed145629f9ef">

Release note (ops change): This patch introduces the counter
metric `fluent.sink.conn.errors` to the CockroachDB tsdb,
which is exposed to `/_status/vars` clients as
`fluent_sink_conn_errors`. The metric is incremented whenever
a `fluent-server` log sink fails to establish a connection to
the log sink pointed to by the `address` for the sink in the
provided log config.

Epic: CC-9681

Addresses: #102753

107453: Revert "github-pull-request-make: temporary workaround" r=postamar a=postamar

This reverts commit 2bd61c0.

Relates to #106920.

Co-authored-by: rharding6373 <[email protected]>
Co-authored-by: Alex Barganier <[email protected]>
Co-authored-by: Marius Posta <[email protected]>
rharding6373 added a commit to rharding6373/cockroach that referenced this pull request Sep 8, 2023
Before this change unoptimized query oracle tests would compare results
using simple string comparison. However, due to floating point precision
limitations, it's possible for results with floating point to diverge
during the course of normal computation. This results in test failures
that are difficult to reproduce or determine whether they are expected
behavior.

This change utilizes existing floating point comparison functions used
by logic tests to match float values only to a specific precision. Like
the logic tests, we also have special handling for floats and decimals
under the s390x architecture (see cockroachdb#63244). In order to avoid costly
comparisons, we only check floating point precision if the naiive string
comparison approach fails and there are float or decimal types in the
result.

Epic: None
Fixes: cockroachdb#95665

Release note: None
rharding6373 added a commit to rharding6373/cockroach that referenced this pull request Sep 8, 2023
Before this change unoptimized query oracle tests would compare results
using simple string comparison. However, due to floating point precision
limitations, it's possible for results with floating point to diverge
during the course of normal computation. This results in test failures
that are difficult to reproduce or determine whether they are expected
behavior.

This change utilizes existing floating point comparison functions used
by logic tests to match float values only to a specific precision. Like
the logic tests, we also have special handling for floats and decimals
under the s390x architecture (see cockroachdb#63244). In order to avoid costly
comparisons, we only check floating point precision if the naiive string
comparison approach fails and there are float or decimal types in the
result.

Epic: None
Fixes: cockroachdb#95665

Release note: None
rharding6373 added a commit to rharding6373/cockroach that referenced this pull request Sep 13, 2023
Before this change unoptimized query oracle tests would compare results
using simple string comparison. However, due to floating point precision
limitations, it's possible for results with floating point to diverge
during the course of normal computation. This results in test failures
that are difficult to reproduce or determine whether they are expected
behavior.

This change utilizes existing floating point comparison functions used
by logic tests to match float values only to a specific precision. Like
the logic tests, we also have special handling for floats and decimals
under the s390x architecture (see cockroachdb#63244). In order to avoid costly
comparisons, we only check floating point precision if the naiive string
comparison approach fails and there are float or decimal types in the
result.

Epic: None
Fixes: cockroachdb#95665

Release note: None
rharding6373 added a commit to rharding6373/cockroach that referenced this pull request Sep 13, 2023
Before this change unoptimized query oracle tests would compare results
using simple string comparison. However, due to floating point precision
limitations, it's possible for results with floating point to diverge
during the course of normal computation. This results in test failures
that are difficult to reproduce or determine whether they are expected
behavior.

This change utilizes existing floating point comparison functions used
by logic tests to match float values only to a specific precision. Like
the logic tests, we also have special handling for floats and decimals
under the s390x architecture (see cockroachdb#63244). In order to avoid costly
comparisons, we only check floating point precision if the naiive string
comparison approach fails and there are float or decimal types in the
result.

Epic: None
Fixes: cockroachdb#95665

Release note: None
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
O-community Originated from the community X-blathers-untriaged blathers was unable to find an owner
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants