Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cli,server: disable latency jump detection with start-single-node and/or when using Docker on macOS #98066

Closed
knz opened this issue Mar 6, 2023 · 0 comments · Fixed by #98580
Labels
A-logging In and around the logging infrastructure. A-observability-inf B-os-macos Issues specific to macOS. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. S-3-ux-surprise Issue leaves users wondering whether CRDB is behaving properly. Likely to hurt reputation/adoption.

Comments

@knz
Copy link
Contributor

knz commented Mar 6, 2023

Describe the problem

When running inside a Docker container on macOS, the TCP stack has very irregular latencies. This causes spurious log messages for folk who are exploring / testing using cockroach start-single-node.

Like this:

2023-03-06 10:56:36 W230306 15:56:34.770789 623 2@rpc/clock_offset.go:226 ⋮ [n1,rnode=1,raddr=‹b17287e17171:26257›,class=default,heartbeat] 40  latency jump (prev avg 64.54ms, current 108.40ms)
2023-03-06 10:56:45 W230306 15:56:44.095660 623 2@rpc/clock_offset.go:226 ⋮ [n1,rnode=1,raddr=‹b17287e17171:26257›,class=default,heartbeat] 41  latency jump (prev avg 70.86ms, current 131.61ms)
2023-03-06 10:57:01 W230306 15:56:59.634686 623 2@rpc/clock_offset.go:226 ⋮ [n1,rnode=1,raddr=‹b17287e17171:26257›,class=default,heartbeat] 42  latency jump (prev avg 72.80ms, current 242.88ms)

To Reproduce

run cockroach start-single-node on macOS via the Docker image.

Expected behavior

We should avoid the spurious warnings in that case.

Note that this logging mechanism is otherwise desirable in production clusters. A latency jump from, say, 70ms to 242ms between nodes, or on a node connecting to itself via the loopback interface, is a MAJOR operational event and should be reported.

The anomalous situation here is caused by macOS. So if we make the situation better in that case, that should not be at the expense of proper network observability for everyone else.

Jira issue: CRDB-25060

@knz knz added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. S-3-ux-surprise Issue leaves users wondering whether CRDB is behaving properly. Likely to hurt reputation/adoption. A-logging In and around the logging infrastructure. T-observability-inf labels Mar 6, 2023
@knz knz added the B-os-macos Issues specific to macOS. label Mar 6, 2023
tbg added a commit to tbg/cockroach that referenced this issue Mar 14, 2023
For months I've seen this misfire in nearly every single log line I've
looked at, and I've had to grep it out in many L2 incidents.
Maybe it works better when we suppress it for latencies <=50ms.

Touches cockroachdb#96262.
Fixes cockroachdb#98066.

Epic: none
Release note: None
craig bot pushed a commit that referenced this issue Mar 15, 2023
98517: multitenant: add AdminMerge capability r=knz a=ecwall

Fixes #95138

AdminMerge is currently only called by the system tenant even though it is
named similarly to other Admin* functions so it does not need its own
capability for now.

This changes its required capability from noCapCheckNeeded to onlySystemTenant
to prevent secondary tenants from calling it.

Release note: None

98580: rpc: bump threshold for latency jump reporting r=erikgrinaker a=tbg

For months I've seen this misfire in nearly every single log line I've
looked at, and I've had to grep it out in many L2 incidents.
Maybe it works better when we suppress it for latencies <=50ms.

Touches #96262.
Fixes #98066.

Epic: none
Release note: None


98688: colexecbase: fix a recently introduced bug with identity cast r=yuzefovich a=yuzefovich

This commit fixes a recently introduced bug that can occur when we're randomizing `coldata.BatchSize()` (which we do in tests). In particular, we capped a global singleton at one batch size value, but later we can change it to a higher value, which could lead to index out of bounds. This is now fixed by always using the max batch size.

Fixes: #98660.

Release note: None

98700: opt: fix hoist of ANY comparison with tuples r=mgartner a=mgartner

#### opt: fix hoist of ANY comparison with tuples

Prior to this commit, when hoisting Any expressions like
`<left> = ANY (SELECT <right> ...)`, we constructed
`(IsNot <left|right> Null)` expressions which are equivalent to
`<left|right> IS DISTINCT FROM NULL`. As discovered in #46675, these
expressions have different behavior than `<left> IS NOT NULL` when
`<left>` is a tuple. As a result, the hoisting transformations could
construct invalid plans that cause incorrect results. This commit fixes
the issue by using `IsTupleNotNull` expressions when `<left>` and
`<right> are tupleq.

Fixes #98691

Release note (bug fix): A bug has been fixes that caused incorrect
results of ANY comparisons of tuples. For example, an expression like
`(x, y) = ANY (SELECT a, b FROM t WHERE ...)` could return `true`
instead of the correct result of `NULL` when `x` and `y` were `NULL`, or
`a` and `b` were `NULL`. This could only occur if the subquery was
correlated, i.e., it references columns from the outer part of the
query. This bug was present since the cost-based optimizer was
introduced in version 2.1.


Co-authored-by: Evan Wall <[email protected]>
Co-authored-by: Tobias Grieger <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: Marcus Gartner <[email protected]>
@craig craig bot closed this as completed in 3fed3d4 Mar 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-logging In and around the logging infrastructure. A-observability-inf B-os-macos Issues specific to macOS. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. S-3-ux-surprise Issue leaves users wondering whether CRDB is behaving properly. Likely to hurt reputation/adoption.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant