-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
logictest: improve observability of TeamCity timeouts #78185
Comments
Until this is fixed, you can use
Import CREATE TABLE running_tests (status TEXT, name TEXT) And run the query to find tests that started but did not finish: SELECT name
FROM running_tests t
WHERE status = 'RUN' AND NOT EXISTS (
SELECT * FROM running_tests s
WHERE status = 'DONE' AND t.name = s.name
) |
I mislead you a bit - in the above failure, bazel actually timed out the test. So maybe one ask is for bazel to send a signal before killing, and then the signal can dump stacks and print auxiliary information. |
It'd be amazing to get stack traces. |
Previously, bazel will kill the test process if it goes beyond the timeout duration set for its size. This prevented us from knowing which tests timed out and also prevented us from getting their stack traces. This patch causes the `go test` process to timeout before bazel kills it to allow us to know which test timed out and get its stack trace. Closes cockroachdb#78185 Release justification: Non-production code changes Release note: None
Previously, bazel will kill the test process if it goes beyond the timeout duration set for its size. This prevented us from knowing which tests timed out and also prevented us from getting their stack traces. This patch causes the `go test` process to timeout before bazel kills it to allow us to know which test timed out and get its stack trace. Closes cockroachdb#78185 Release justification: Non-production code changes Release note: None
86363: bazel: make `go test` process timeout before bazel kills it r=rickystewart a=healthy-pod Previously, bazel will kill the test process if it goes beyond the timeout duration set for its size. This prevented us from knowing which tests timed out and also prevented us from getting their stack traces. This patch causes the `go test` process to timeout before bazel kills it to allow us to know which test timed out and get its stack trace. Closes #78185 Release justification: Non-production code changes Release note: None 87067: descs,catkv: rewrite Collection backend r=postamar a=postamar This rewrite was motivated by the upcoming need to validate descriptors up to different levels depending on why they were being read. Doing so requires tracking the validation level in a more granular fashion within the Collection's caches. Trying to make this work within the existing scheme where the uncommitted and stored descriptors are all in the same layer turned out to be painful and hard to reason about. This commit does several things: 1. The uncommitted descriptors are spun off into their own distinct layer again. 2. The stored descriptors now mirror what's in storage and so that whole layer could effectively be moved to catkv. 3. Execution flow through the layers in the Collection is now straightforwardly implemented and easy to reason about. Validation and hydration now take place in unique, well-defined steps at the very end of a lookup, be it by name or by ID. 3. Instances of this new stored descriptors layer are now also used by the lease manager, the Direct() interface provided by the Collection, etc. 4. As a result the logic in catkv could be significantly cleaned up. In particular, lookups for descriptor validation are significantly less convoluted now. 5. The SystemNamespaceCache object was also moved to catkv and now supports caching descriptors, and so was renamed to SystemCatalogCache. Despite significant rewriting of the inner workings of the descriptors collection, this commit should not functionally change anything. Informs #85263. Release justification: low-risk, high-benefit change. Release note: None 87103: storage: fix mvcc stats on gc of range tombstone over del r=erikgrinaker a=aliher1911 Previously if range tombstone was placed over delete, GC would not correctly update GC bytes age incorrectly using range tombstone ts as age of underlying tombstone. Release justification: Bugfix Release note: None Fixing: #87042 87212: ci: set appropriate test tmpdir for compose test r=srosenberg a=rickystewart Otherwise the tmpdir defaults to `/artifacts` which is non-existent outside of a Docker container. Release justification: Non-production code changes Release note: None 87229: storage: fix leaked iterators in unit tests r=erikgrinaker a=jbowens Fix a couple instances of leaked iterators in unit tests in the storage package. This should also resolve the goroutine leak observed in #86256. Informs #71481. Release justification: Non-production code changes Release note: None Co-authored-by: healthy-pod <[email protected]> Co-authored-by: Marius Posta <[email protected]> Co-authored-by: Andrew Werner <[email protected]> Co-authored-by: Oleg Afanasyev <[email protected]> Co-authored-by: Ricky Stewart <[email protected]> Co-authored-by: Jackson Owens <[email protected]>
It's possible for logic tests to run longer than the time allotted for them in TC, in which case TC abruptly kills the tests and prints a timeout message to the log:
Example Log
From the logs, it's unclear which test(s) were running when the timeout occurred. We should ensure that
go test
timeouts first, so we get an informative stack trace that will help track down the issue.Jira issue: CRDB-14009
Epic CRDB-8349
The text was updated successfully, but these errors were encountered: