roachtest: disk-stalled/cgroup/read-write/logs-too=false failed #98904

Closed
cockroach-teamcity opened this issue Mar 17, 2023 · 4 comments
Labels
branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. T-storage Storage Team X-infra-flake the automatically generated issue was closed due to an infrastructure problem not a product issue
Milestone

Comments

@cockroach-teamcity
Member

cockroach-teamcity commented Mar 17, 2023

roachtest.disk-stalled/cgroup/read-write/logs-too=false failed with artifacts on master @ f0489334a0ee6980a9d365b361d2fce4b2cdc05b:

test artifacts and logs in: /artifacts/disk-stalled/cgroup/read-write/logs-too=false/run_1
(test_runner.go:990).runTest: test timed out (20m0s)
(cluster.go:1969).Run: output in run_104035.607708578_n4_cockroach-workload-r: ./cockroach workload run kv --read-percent 50 --duration 10m --concurrency 256 --max-rate 2048 --tolerate-errors  --min-block-bytes=512 --max-block-bytes=512 {pgurl:1-3} returned: COMMAND_PROBLEM: ssh verbose log retained in ssh_104036.374671798_n4_cockroach-workload-r.log: exit status 137
(disk_stall.go:206).runDiskStalledDetection: context canceled
(cluster.go:1969).Run: cluster.RunE: context canceled

Parameters: ROACHTEST_cloud=gce , ROACHTEST_cpu=4 , ROACHTEST_encrypted=false , ROACHTEST_fs=ext4 , ROACHTEST_localSSD=true , ROACHTEST_ssd=0

Help

See: roachtest README

See: How To Investigate (internal)

/cc @cockroachdb/storage

This test on roachdash | Improve this report!

Jira issue: CRDB-25580

@cockroach-teamcity cockroach-teamcity added branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Mar 17, 2023
@cockroach-teamcity cockroach-teamcity added this to the 23.1 milestone Mar 17, 2023
@blathers-crl blathers-crl bot added the T-storage Storage Team label Mar 17, 2023
@jbowens jbowens added A-kv-replication Relating to Raft, consensus, and coordination. T-kv-replication labels Mar 20, 2023
@blathers-crl

blathers-crl bot commented Mar 20, 2023

cc @cockroachdb/replication

@jbowens
Collaborator

jbowens commented Mar 20, 2023

This test timed out because it took 10 minutes (~10x longer than typical) to upreplicate the kv ranges after

	c.Run(ctx, c.Node(4), `./cockroach workload init kv --splits 1000 {pgurl:1}`)

@cockroachdb/replication — Is this amount of variance in upreplication latency expected? If so we can close this out (and I'll bump the timeout).

jbowens added a commit to jbowens/cockroach that referenced this issue Mar 20, 2023
This commit sets a new 30m timeout for all disk stall roachtests. Previously,
the FUSE filesystem variants had no timeout and inherited the default 10h
timeout. The other variants had a 20m timeout, which has been observed to be
too short due to upreplication latency.

Informs cockroachdb#98904.
Informs cockroachdb#98886.
Epic: None
Release note: None
@erikgrinaker
Contributor

erikgrinaker commented Mar 20, 2023

I see that the test waits for upreplication via WaitFor3XReplication after running workload init kv. This is bad: if the system ranges haven't finished upreplicating from RF=1 to RF=3 before we start splitting, we'll be splitting new ranges off of an RF=1 range and then have to upreplicate all 1000 of them (while ingesting data) from RF=1 to RF=3 afterwards. If we instead wait for upreplication before splitting, we're splitting RF=3 ranges from the get-go and don't have to upreplicate.
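
The effect of the ordering can be illustrated with a toy model. The Go sketch below is not roachtest or CockroachDB code: the `rangeState` type, the `split` and `upreplicate` helpers, and the figure of 50 initial ranges are assumptions made up for illustration. It relies only on the point made above, that a range split off an RF=1 range starts at RF=1, and counts how many ranges need upreplication under each ordering.

```go
package main

import "fmt"

// rangeState is a hypothetical stand-in for a range; it tracks only the
// replication factor (RF).
type rangeState struct{ rf int }

// newCluster returns n initial ranges, all at RF=1.
func newCluster(n int) []rangeState {
	rs := make([]rangeState, n)
	for i := range rs {
		rs[i].rf = 1
	}
	return rs
}

// split models splitting n new ranges off the last range in rs. Each new
// range inherits the replication factor of its parent.
func split(rs []rangeState, n int) []rangeState {
	parent := rs[len(rs)-1]
	for i := 0; i < n; i++ {
		rs = append(rs, rangeState{rf: parent.rf})
	}
	return rs
}

// upreplicate raises every range below target to target and returns how
// many ranges it had to touch.
func upreplicate(rs []rangeState, target int) int {
	touched := 0
	for i := range rs {
		if rs[i].rf < target {
			rs[i].rf = target
			touched++
		}
	}
	return touched
}

// splitThenWait models the original test order: split first, then wait
// for RF=3. It returns the number of ranges that needed upreplication.
func splitThenWait(initial, splits int) int {
	rs := split(newCluster(initial), splits)
	return upreplicate(rs, 3)
}

// waitThenSplit models the proposed order: wait for RF=3, then split.
func waitThenSplit(initial, splits int) int {
	rs := newCluster(initial)
	touched := upreplicate(rs, 3)
	rs = split(rs, splits)
	return touched + upreplicate(rs, 3)
}

func main() {
	fmt.Println("split then wait:", splitThenWait(50, 1000)) // prints 1050
	fmt.Println("wait then split:", waitThenSplit(50, 1000)) // prints 50
}
```

The model ignores data ingestion and rebalancing entirely; it only shows that waiting first keeps the upreplication work proportional to the initial range count rather than to the number of splits.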

@erikgrinaker erikgrinaker removed release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. A-kv-replication Relating to Raft, consensus, and coordination. T-kv-replication labels Mar 20, 2023
@jbowens
Collaborator

jbowens commented Mar 20, 2023

Thanks for looking—I'll switch the order.

jbowens added a commit to jbowens/cockroach that referenced this issue Mar 20, 2023
This reduces the amount of work.

Informs cockroachdb#98904.
Epic: None
Release note: None
craig bot pushed a commit that referenced this issue Mar 21, 2023
99075: roachtest: update disk-stall roachtests to wait for uprepl first r=nicktrav a=jbowens

This reduces the amount of work.

Informs #98904.
Epic: None
Release note: None

Co-authored-by: Jackson Owens <[email protected]>
blathers-crl bot pushed a commit that referenced this issue Mar 21, 2023
This reduces the amount of work.

Informs #98904.
Epic: None
Release note: None
@jbowens jbowens closed this as completed Mar 21, 2023
craig bot pushed a commit that referenced this issue Mar 21, 2023
97685: sql: add default_text_search_config r=jordanlewis a=jordanlewis

Updates: #41288
Epic: CRDB-22357

All but the last commit are from #92966 and #97677.


    This commit adds the default_text_search_config variable for the tsearch
    package, which allows the user to set a default configuration for the
    text search builtin functions that take configurations, such as
    to_tsvector and to_tsquery. The default for this configuration variable
    is 'english', as it is in Postgres.

    Release note (sql change): add the default_text_search_config variable
    for compatibility with the single-argument variants of the text search
    functions to_tsvector, to_tsquery, phraseto_tsquery, and
    plainto_tsquery, which use the value of default_text_search_config
    instead of expecting one to be included as in the two-argument variants.
    The default value of this setting is 'english'.

99045: roachtest: set 30m timeout for all disk stall roachtests r=nicktrav a=jbowens

This commit sets a new 30m timeout for all disk stall roachtests. Previously,
the FUSE filesystem variants had no timeout and inherited the default 10h
timeout. The other variants had a 20m timeout, which has been observed to be
too short due to upreplication latency.

Informs #98904.
Informs #98886.
Epic: None
Release note: None


99057: sql: check replace view columns earlier r=rharding6373 a=rharding6373

Before this change, we could encounter internal errors while attempting to add result columns during a `CREATE OR REPLACE VIEW` if the number of columns in the new view was less than the number of columns in the old view. This led to an inconsistency with postgres, which would only return the error `cannot drop columns from view`.

This PR moves the check comparing the number of columns before and after the view replacement earlier so that the correct error returns.

Co-authored-by: [email protected]

Fixes: #99000
Epic: None

Release note (bug fix): Fixes an internal error that can occur when `CREATE OR REPLACE VIEW` replaces a view with fewer columns and another entity depended on the view.

Co-authored-by: Jordan Lewis <[email protected]>
Co-authored-by: Jackson Owens <[email protected]>
Co-authored-by: craig[bot] <[email protected]>
blathers-crl bot pushed a commit that referenced this issue Mar 21, 2023
This commit sets a new 30m timeout for all disk stall roachtests. Previously,
the FUSE filesystem variants had no timeout and inherited the default 10h
timeout. The other variants had a 20m timeout, which has been observed to be
too short due to upreplication latency.

Informs #98904.
Informs #98886.
Epic: None
Release note: None
jbowens added a commit to jbowens/cockroach that referenced this issue Mar 21, 2023
This commit sets a new 30m timeout for all disk stall roachtests. Previously,
the FUSE filesystem variants had no timeout and inherited the default 10h
timeout. The other variants had a 20m timeout, which has been observed to be
too short due to upreplication latency.

Informs cockroachdb#98904.
Informs cockroachdb#98886.
Epic: None
Release note: None
@jbowens jbowens added the X-infra-flake the automatically generated issue was closed due to an infrastructure problem not a product issue label Mar 22, 2023
@jbowens jbowens moved this to Done in [Deprecated] Storage Jun 4, 2024

3 participants