Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

roachtest: sqllitelogic test no longer files GitHub issues upon failure #79403

Closed
jordanlewis opened this issue Apr 5, 2022 · 3 comments · Fixed by #79532 or #80284
Closed

roachtest: sqllitelogic test no longer files GitHub issues upon failure #79403

jordanlewis opened this issue Apr 5, 2022 · 3 comments · Fixed by #79532 or #80284
Assignees
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-dev-inf

Comments

@jordanlewis
Copy link
Member

jordanlewis commented Apr 5, 2022

The Bazel sqllitelogic test has been failing since at least 21 Feb, before which no history is recorded: https://teamcity.cockroachdb.com/buildConfiguration/Cockroach_Nightlies_SQLiteLogicTestsBazel?branch=%3Cdefault%3E&buildTypeTab=overview&mode=builds

However, no issues were filed for the failures. We rely on these issues to detect the failures, so it's very important that they get filed.

This was last fixed here: #47566

Here is an example of a correctly filed SQLiteLogicTest issue: #77929

Jira issue: CRDB-14807

@jordanlewis jordanlewis added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Apr 5, 2022
@jordanlewis
Copy link
Member Author

I wonder if the switch to Bazel might have caused this regression, but I have no idea. For reference, here is the pre-bazel test history, which wasn't failing: https://teamcity.cockroachdb.com/buildConfiguration/Cockroach_SqlLogicTests?branch=%3Cdefault%3E&buildTypeTab=overview&mode=builds

craig bot pushed a commit that referenced this issue Apr 6, 2022
74005: server: hash tenantID into ClusterID used in SQL exec r=knz a=dt

Fixes #74480.

Prior to this patch, in a multi-tenant deployment the ClusterID value
visible to tenans was the same as the ClusterID for the storage
cluster.

This original design was defective because it prevented the
backup/restore subsystem from distinguishing backups made for
different tenants.

In this commit, we change the ClusterID for tenants to become a new
UUID that is the storage cluster's ClusterID with a hash of the
tenant's ID added to the lower 64 bits. This makes it so that two
tenants on the same storage cluster will observe different values when
calling ClusterID().

Release note (bug fix): Separate tenants now use unique values for the
internal field ClusterID. Previously separate tenants that
were hosted on the same storage cluster would be assigned the same
ClusterID.


Jira issue: CRDB-14750

77794: sql: implement SHOW CLUSTER SETTING FOR TENANT r=rafiss a=knz

All commits but the last 2 from #77742.
(Reviewers: only the last 2 commits belong to this PR.)

Fixes #77471

Release justification: fixes to high-priority bugs

79503: backupccl: avoid over-shrinking memory monitor r=adityamaru a=stevendanna

This change updates DecryptFile to optionally use a memory monitor and
adjusts the relevant calling functions.

Previously, the readManifest function that used DecryptFile could
result in shrinking too many bytes from the bound account:

```
ERROR: a panic has occurred!
root: no bytes in account to release, current 29571, free 30851
(1) attached stack trace
  -- stack trace:
  | runtime.gopanic
  | 	GOROOT/src/runtime/panic.go:1038
  | [...repeated from below...]
Wraps: (2) attached stack trace
  -- stack trace:
  | github.com/cockroachdb/cockroach/pkg/util/log/logcrash.ReportOrPanic
  | 	github.com/cockroachdb/cockroach/pkg/util/log/logcrash/crash_reporting.go:374
  | github.com/cockroachdb/cockroach/pkg/util/mon.(*BoundAccount).Shrink
  | 	github.com/cockroachdb/cockroach/pkg/util/mon/bytes_usage.go:709
  | github.com/cockroachdb/cockroach/pkg/ccl/backupccl.(*restoreResumer).doResume.func1
  | 	github.com/cockroachdb/cockroach/pkg/ccl/backupccl/pkg/ccl/backupccl/restore_job.go:1244
```

This was the result of us freeing `cap(descBytes)` despite the fact
that we never accounted for the fact that `descBytes` had been
re-assigned after the decryption without accounting for changes in the
capacity used by the old vs new buffer. That is, we called mem.Grow
with the capcity of the old buffer, but mem.Shrink with the capacity
of the new buffer.

We generally expect that the size of encrypted data to be larger than
the size of the plaintext data, so in most cases, the existing code
would work without error because it was freeing an amount smaller than
the original allocation.

However, while the plaintext data is smaller than the encrypted data,
that doesn't mean that the _capacity of the slice holding the
plaintext data_ is smaller than the _capacity of the slice holding the
encrypted data_.

The slice holding the encrypted data was created with mon.ReadAll
which has slice growth strategy of doubling the size until it reaches
8MB. The slice holding the plaintext data was created by
ioutil.ReadAll that defers to appends slice growth strategy, which
differs from that used in mon.ReadAll.

In the current implementations, for byte sizes around 30k, the
capacity of the buffer create by append's strategy is larger than that
created by mon.ReadAll despite the len of the buffer being smaller:

    before decrypt: len(descBytes) = 27898, cap(descBytes) = 32768
     after decrypt: len(descBytes) = 27862, cap(descBytes) = 40960

We could have fixed this by simply adjusting the memory account by the
difference. Instead, I've opted to thread the memory monitor into
DecryptFile to better account for the fact that we technically do hold
2 copies of the data during this decryption.

Fixes #79488

Release note: None

79532: ci: fix some janky ci scripts that prevent posting issues to github r=mari-crl a=rickystewart

In a couple places in `82e0b121c715c59cebbb8d53e29edf7952d6913f` I
accidentally did `$exit_status=$?` instead of `exit_status=$?`, which is
not a proper variable assignment. This was causing these jobs to fail
before they could post issues to GitHub.

Closes #79403.

Release note: None

Co-authored-by: David Taylor <[email protected]>
Co-authored-by: Raphael 'kena' Poss <[email protected]>
Co-authored-by: Steven Danna <[email protected]>
Co-authored-by: Ricky Stewart <[email protected]>
@craig craig bot closed this as completed in 6a88d40 Apr 6, 2022
blathers-crl bot pushed a commit that referenced this issue Apr 6, 2022
In a couple places in `82e0b121c715c59cebbb8d53e29edf7952d6913f` I
accidentally did `$exit_status=$?` instead of `exit_status=$?`, which is
not a proper variable assignment. This was causing these jobs to fail
before they could post issues to GitHub.

Closes #79403.

Release note: None
@cucaroach
Copy link
Contributor

cucaroach commented Apr 20, 2022

Unless I'm missing something we're stilling failing sqllite logic tests and not generating github issues. @rickystewart I poked around but didn't see anything obvious, can you or someone on your team take another look? Thanks!

@cucaroach cucaroach reopened this Apr 20, 2022
@rickystewart
Copy link
Collaborator

Looking at https://teamcity.cockroachdb.com/buildConfiguration/Cockroach_Nightlies_SQLiteLogicTestsBazel/4935103?buildTab=log&focusLine=7027&linesState=4751.4756:

build/teamcity-bazel-support.sh: line 39: TC_BUILD_BRANCH: unbound variable

I know what's causing this, give me a bit to get a PR together.

rickystewart added a commit to rickystewart/cockroach that referenced this issue Apr 20, 2022
Without these environment variables, these nightlies won't be able to
run `process_test_json` or post GitHub issues.

Closes cockroachdb#79403.

Release note: None
@craig craig bot closed this as completed in fc58be8 Apr 20, 2022
blathers-crl bot pushed a commit that referenced this issue Apr 20, 2022
Without these environment variables, these nightlies won't be able to
run `process_test_json` or post GitHub issues.

Closes #79403.

Release note: None
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-dev-inf
Projects
None yet
4 participants