Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql/importer: flake in TestCSVImportCanBeResumed [ingestKVs hangs with BulkAdderFlushesEveryBatch] #91828

Closed
knz opened this issue Nov 14, 2022 · 8 comments · Fixed by #94010
Closed
Assignees
Labels
C-test-failure Broken test (automatically or manually discovered). skipped-test T-disaster-recovery

Comments

@knz
Copy link
Contributor

knz commented Nov 14, 2022

Found here: https://teamcity.cockroachdb.com/buildConfiguration/Cockroach_BazelExtendedCi/7502297?showRootCauses=false&expandBuildChangesSection=true&expandBuildProblemsSection=true&expandBuildTestsSection=true

Failed
=== RUN   TestCSVImportCanBeResumed
    test_log_scope.go:161: test logs captured to: /artifacts/tmp/_tmp/375538c3f706ce59f4e97af17139af4f/logTestCSVImportCanBeResumed2019049709
    test_log_scope.go:79: use -show-logs to present logs inline
    import_processor_test.go:739: Resume pos: 10
    import_processor_test.go:745: not enough memory available to create a BulkAdder: root: memory budget exceeded: 33554432 bytes requested, 101984256 currently allocated, 134217728 bytes in budget
    panic.go:522: -- test log scope end --
test logs left over in: /artifacts/tmp/_tmp/375538c3f706ce59f4e97af17139af4f/logTestCSVImportCanBeResumed2019049709
--- FAIL: TestCSVImportCanBeResumed (1.16s)

cc @rafiss for triage

Jira issue: CRDB-21446

@knz knz added C-test-failure Broken test (automatically or manually discovered). T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) labels Nov 14, 2022
craig bot pushed a commit that referenced this issue Nov 14, 2022
91635: jobs: add a comment about the Resumer contract r=andreimatei a=andreimatei

Release note: None
Epic: None

91754: zip: move `--redact-logs` deprecation warning to end of zip output r=dhartunian a=abarganier

This patch simply moves the deprecation notice for the `--redact-logs` flag to the bottom of the debug zip output. Previously, the message was logged at the beginning of the output, which was quickly drowned out by the rest of the output indicating the debug zip progress.

Release note: none

Addresses #91685

Epic: CRDB-12732

91830: sql/importer: skip a flaky test r=rafiss a=knz

Informs #91828.
Informs  #91845.
Informs #91850.

Co-authored-by: Andrei Matei <[email protected]>
Co-authored-by: Alex Barganier <[email protected]>
Co-authored-by: Raphael 'kena' Poss <[email protected]>
@rafiss rafiss added the E-quick-win Likely to be a quick win for someone experienced. label Nov 17, 2022
@rafiss
Copy link
Collaborator

rafiss commented Nov 29, 2022

I applied a memory limit in #92689, but after that this test fails under stress with a new error

--- FAIL: TestCSVImportCanBeResumed (6.96s)
    test_log_scope.go:161: test logs captured to: /tmp/_tmp/73dfce30b9a5630b1b4dabed3c94b32c/logTestCSVImportCanBeResumed3015423266
    test_log_scope.go:79: use -show-logs to present logs inline
    import_processor_test.go:741: Resume pos: 10
    import_processor_test.go:755: -- test log scope end --
    import_processor_test.go:755: Leaked goroutine: goroutine 1657 [semacquire]:
        sync.runtime_Semacquire(0x11c41c5?)
                GOROOT/src/runtime/sema.go:62 +0x25
        sync.(*WaitGroup).Wait(0xc0035cddf0?)
                GOROOT/src/sync/waitgroup.go:139 +0x52
        golang.org/x/sync/errgroup.(*Group).Wait(0xc0041d34c0)
                golang.org/x/sync/errgroup/external/org_golang_x_sync/errgroup/errgroup.go:53 +0x27
        github.com/cockroachdb/cockroach/pkg/util/ctxgroup.Group.Wait({0xc0041d34c0?, {0x6a60950?, 0xc0041d3480?}})
                github.com/cockroachdb/cockroach/pkg/util/ctxgroup/ctxgroup.go:144 +0x4a
        github.com/cockroachdb/cockroach/pkg/sql/importer.runImport({0x6a609f8, 0xc000f14780}, 0xc001b60200, 0xc0041bdd50, 0xc004f94f60, 0xc0041d33c0?)
                github.com/cockroachdb/cockroach/pkg/sql/importer/read_import_base.go:127 +0x7c6
        github.com/cockroachdb/cockroach/pkg/sql/importer.(*readImportDataProcessor).Start.func1()
                github.com/cockroachdb/cockroach/pkg/sql/importer/import_processor.go:183 +0x8b
        created by github.com/cockroachdb/cockroach/pkg/sql/importer.(*readImportDataProcessor).Start
                github.com/cockroachdb/cockroach/pkg/sql/importer/import_processor.go:181 +0xeb
        Leaked goroutine: goroutine 1659 [semacquire]:
        sync.runtime_Semacquire(0x11c41c5?)
                GOROOT/src/runtime/sema.go:62 +0x25
        sync.(*WaitGroup).Wait(0xc003565808?)
                GOROOT/src/sync/waitgroup.go:139 +0x52
        golang.org/x/sync/errgroup.(*Group).Wait(0xc0041d35c0)
                golang.org/x/sync/errgroup/external/org_golang_x_sync/errgroup/errgroup.go:53 +0x27
        github.com/cockroachdb/cockroach/pkg/util/ctxgroup.Group.Wait({0xc0041d35c0?, {0x6a60950?, 0xc0041d3580?}})
                github.com/cockroachdb/cockroach/pkg/util/ctxgroup/ctxgroup.go:144 +0x4a
        github.com/cockroachdb/cockroach/pkg/sql/importer.ingestKvs({0x6a60950?, 0xc0041d3480?}, 0xc001b60200, 0xc0041bdd50, 0xc004f94f60, 0xc004680600)
                github.com/cockroachdb/cockroach/pkg/sql/importer/import_processor.go:553 +0x1028
        github.com/cockroachdb/cockroach/pkg/sql/importer.runImport.func2({0x6a60950, 0xc0041d3480})
                github.com/cockroachdb/cockroach/pkg/sql/importer/read_import_base.go:108 +0x6a
        github.com/cockroachdb/cockroach/pkg/util/ctxgroup.Group.GoCtx.func1()
                github.com/cockroachdb/cockroach/pkg/util/ctxgroup/ctxgroup.go:168 +0x25
        golang.org/x/sync/errgroup.(*Group).Go.func1()
                golang.org/x/sync/errgroup/external/org_golang_x_sync/errgroup/errgroup.go:75 +0x64
        created by golang.org/x/sync/errgroup.(*Group).Go
                golang.org/x/sync/errgroup/external/org_golang_x_sync/errgroup/errgroup.go:72 +0xa5
        Leaked goroutine: goroutine 1661 [chan send]:
        github.com/cockroachdb/cockroach/pkg/sql/importer.ingestKvs.func3()
                github.com/cockroachdb/cockroach/pkg/sql/importer/import_processor.go:468 +0x311
        github.com/cockroachdb/cockroach/pkg/sql/importer.ingestKvs.func5({0x6a60950, 0xc0041d3580})
                github.com/cockroachdb/cockroach/pkg/sql/importer/import_processor.go:547 +0x2cf
        github.com/cockroachdb/cockroach/pkg/util/ctxgroup.Group.GoCtx.func1()
                github.com/cockroachdb/cockroach/pkg/util/ctxgroup/ctxgroup.go:168 +0x25
        golang.org/x/sync/errgroup.(*Group).Go.func1()
                golang.org/x/sync/errgroup/external/org_golang_x_sync/errgroup/errgroup.go:75 +0x64
        created by golang.org/x/sync/errgroup.(*Group).Go

The ingestKvs function is getting stuck in the BulkAdderFlushesEveryBatch testing knob:

if flowCtx.Cfg.TestingKnobs.BulkAdderFlushesEveryBatch {
_ = pkIndexAdder.Flush(ctx)
_ = indexAdder.Flush(ctx)
pushProgress()
}

@adityamaru or @dt - do you have any idea why this might happen?

@rafiss
Copy link
Collaborator

rafiss commented Dec 2, 2022

Since it seems related to the BulkAdder behavior, I'm putting this in the DR team board. Let us know if you want to give it back.

@rafiss rafiss removed E-quick-win Likely to be a quick win for someone experienced. T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) labels Dec 2, 2022
@blathers-crl
Copy link

blathers-crl bot commented Dec 2, 2022

cc @cockroachdb/disaster-recovery

@rafiss rafiss changed the title sql/importer: flake in TestCSVImportCanBeResumed sql/importer: flake in TestCSVImportCanBeResumed [ingestKVs hangs with BulkAdderFlushesEveryBatch] Dec 5, 2022
@adityamaru adityamaru self-assigned this Dec 6, 2022
@adityamaru
Copy link
Contributor

I have a patch that fixes the flake that looks very similar to the bigger change @stevendanna has around import processor shutdown in #91615. We plan on merging this soon, so I'm going to hold off on sending an independent patch.

@stevendanna
Copy link
Collaborator

Should be closed by #93782

@rafiss
Copy link
Collaborator

rafiss commented Dec 20, 2022

@stevendanna it seems like #93782 does not unskip the test. are you planning to do that? the test also needs to have memory limits set as per #91828 (comment)

@rafiss rafiss reopened this Dec 20, 2022
@stevendanna
Copy link
Collaborator

@rafiss Yikes, looks like I closed this too soon during triage, thanks! I'll unskip it and take a quick look at the memory limits.

stevendanna added a commit to stevendanna/cockroach that referenced this issue Dec 20, 2022
This unskips the test and adds a memory limit. The test completed
1500+ runs under stress. I believe cockroachdb#93782 resolved the main source of
flakiness here.

Fixes cockroachdb#91828

Release note: None
@stevendanna
Copy link
Collaborator

The stack from the error you hit makes me believe that it will be directly addressed by the fix in #93782. I've opened a PR to unskip the test here: #94010

@adityamaru adityamaru assigned stevendanna and unassigned adityamaru Dec 27, 2022
craig bot pushed a commit that referenced this issue Jan 3, 2023
94010: sql/importer: unskip TestCSVImportCanBeResumed r=rafiss a=stevendanna

This unskips the test and adds a memory limit. The test completed 1500+ runs under stress. I believe #93782 resolved the main source of flakiness here.

Fixes #91828

Release note: None

94345: storage: include range key stats in ScanStats r=erikgrinaker a=jbowens

Include the new, low-level Pebble range key iterator stats (introduced in cockroachdb/pebble#1871) in roachpb.ScanStats.

Informs #77580.
Close #93326.

Epic: None
Release note: None

94458: logictest: fix flakes from mixed version testing r=ZhouXing19 a=rafiss

fixes #92637

The main fix was to wait for each node to be reachable before beginning the upgrade process. This also includes a version bump that has better logging to make it easier to debug.

Release note: None

Co-authored-by: Steven Danna <[email protected]>
Co-authored-by: Jackson Owens <[email protected]>
Co-authored-by: Rafi Shamim <[email protected]>
@craig craig bot closed this as completed in d522dab Jan 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-test-failure Broken test (automatically or manually discovered). skipped-test T-disaster-recovery
Projects
No open projects
Archived in project
4 participants