sql: creating an index, user experienced OOM #36381

Closed
roncrdb opened this issue Apr 1, 2019 · 10 comments · Fixed by #36765
Labels
A-schema-changes A-sql-execution Relating to SQL execution. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. S-2-temp-unavailability Temp crashes or other availability problems. Can be worked around or resolved by restarting.

Comments

@roncrdb

roncrdb commented Apr 1, 2019

Describe the problem

A user ran into an OOM while creating an index; the cluster was also experiencing some slight load from tests.

[10782031.209476] Out of memory: Kill process 25243 (cockroach) score 915 or sacrifice child
[10782031.210847] Killed process 25243 (cockroach) total-vm:118637456kB, anon-rss:62702344kB, file-rss:0kB, shmem-rss:0kB

Here are the top sources of Memory allocation from the node:

Type: inuse_space
Time: Apr 1, 2019 at 5:41am (EDT)
Showing nodes accounting for 4534.31MB, 96.66% of 4691.17MB total
Dropped 406 nodes (cum <= 23.46MB)
      flat  flat%   sum%        cum   cum%
 4352.75MB 92.79% 92.79%  4352.75MB 92.79%  github.com/cockroachdb/cockroach/vendor/github.com/golang/leveldb/memfs.(*file).Write
      64MB  1.36% 94.15%       64MB  1.36%  github.com/cockroachdb/cockroach/vendor/github.com/andy-kimball/arenaskl.NewArena (inline)
   27.03MB  0.58% 94.73%    49.53MB  1.06%  github.com/cockroachdb/cockroach/pkg/storage.newReplica
   26.54MB  0.57% 95.29%    26.54MB  0.57%  github.com/cockroachdb/cockroach/pkg/storage/raftentry.realloc
      24MB  0.51% 95.80%    27.50MB  0.59%  github.com/cockroachdb/cockroach/pkg/sql/sqlbase.EncodeSecondaryIndex
   20.99MB  0.45% 96.25%    48.49MB  1.03%  github.com/cockroachdb/cockroach/pkg/sql/backfill.(*IndexBackfiller).BuildIndexEntriesChunk
      18MB  0.38% 96.64%    54.89MB  1.17%  github.com/cockroachdb/cockroach/vendor/go.etcd.io/etcd/raft.newRaft
    0.50MB 0.011% 96.65%    55.39MB  1.18%  github.com/cockroachdb/cockroach/vendor/go.etcd.io/etcd/raft.NewRawNode
    0.50MB 0.011% 96.66%    59.58MB  1.27%  github.com/cockroachdb/cockroach/pkg/storage.(*Replica).withRaftGroupLocked

DEBUG ZIP

Expected behavior
Index creation should complete without an OOM.

Additional data / screenshots
The user has been trying to create an index and has run into multiple issues; this may be related to #34878.

Environment:

  • CockroachDB v19.1.0-beta.20190318
@roncrdb roncrdb added the A-sql-execution Relating to SQL execution. label Apr 1, 2019
@awoods187 awoods187 added S-2-temp-unavailability Temp crashes or other availability problems. Can be worked around or resolved by restarting. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. labels Apr 1, 2019
@awoods187
Contributor

cc @vivekmenezes

@jordanlewis
Member

@roncrdb can you attach just the memory profile to this issue? Thanks!

@roncrdb
Author

roncrdb commented Apr 1, 2019

@jordanlewis are you referring to this?

@jordanlewis
Member

Yes, thanks!

Pasting the major allocation source along with its call stack for reference. @dt, this seems like the same thing you were looking at earlier, right?

[Screenshot: memory profile showing the major allocation source and its call stack]

@roncrdb
Author

roncrdb commented Apr 1, 2019

Looks like the link above to the debug zip isn't working; here's a new link to the zip, @jordanlewis.

@dt
Member

dt commented Apr 1, 2019

I've been sitting on a change for a long time that I think would fix that specific allocation; I just opened a PR for it in #36394.

That said, I didn't pursue merging it sooner because I never actually saw that allocation feature significantly in any profiles, so I'm curious whether there's another issue happening here. If all that memory really does belong to addSplitSSTable iterators, I'm not sure how you get that high: the initial request uses, e.g., 32 MB; it hits a split and passes all 32 MB to the splitting helper, which makes an iterator that copies the 32 MB and then makes the recursive call. Say they keep hitting splits... you might make 64 MB of copies before the first call returns, but at that point they should be freed, right?

I'll get #36394 out for review now and see about backporting it, but I still want to dig in a bit more to understand how this happens / confirm that removing the copy really will fix it.
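A rough way to check the estimate in the comment above is a toy Go model (not CockroachDB code). The hypothetical `addWithSplits` holds a copy of the data it was given, recurses on whatever did not land before the simulated split, and frees its copy only when the recursive call returns. How much each split ingests is an assumption passed in as `next`. If each split ingests half of the remainder, peak memory stays under 2× the original batch, in line with the 32 MB / 64 MB estimate; if each split ingests only a small piece, every frame holds a nearly full-size copy and peak memory grows with the square of the number of splits.

```go
package main

import "fmt"

const mib = 1 << 20

// tracker records the bytes held by live simulated SST copies.
type tracker struct {
	live, peak int
}

func (t *tracker) alloc(n int) {
	t.live += n
	if t.live > t.peak {
		t.peak = t.live
	}
}

func (t *tracker) free(n int) { t.live -= n }

// addWithSplits is a hypothetical model of a recursive split helper: it
// copies the data it was given, recurses on the part that did not land
// before the simulated split, and frees its copy only after the recursive
// call returns. next(size) returns the bytes left after one split.
func addWithSplits(t *tracker, size int, next func(int) int) {
	t.alloc(size) // this frame's copy
	defer t.free(size)
	if rem := next(size); rem > 0 {
		addWithSplits(t, rem, next)
	}
}

// peakFor reports the peak live bytes for one top-level call.
func peakFor(size int, next func(int) int) int {
	t := &tracker{}
	addWithSplits(t, size, next)
	return t.peak
}

func main() {
	halve := func(n int) int { return n / 2 }         // each split ingests half
	smallBite := func(n int) int { return n - 1*mib } // each split ingests 1 MiB
	fmt.Printf("halving:    peak %d MiB\n", peakFor(32*mib, halve)/mib)
	fmt.Printf("small bite: peak %d MiB\n", peakFor(32*mib, smallBite)/mib)
}
```

With a 32 MiB batch this reports a peak of 63 MiB for the halving case and 528 MiB when each split ingests only 1 MiB.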

@jordanlewis
Member

Here's a stack trace from the unhappy node:

github.com/cockroachdb/cockroach/pkg/storage/bulk.addSplitSSTable(0x39d8080, 0xc02664e960, 0xc00064e580, 0xc1e0bb0000, 0xa5ee2a, 0xa5ee2a, 0xc1350368a0, 0x54, 0x60, 0xc1bf1bd3e0, ...)
    /go/src/github.com/cockroachdb/cockroach/pkg/storage/bulk/sst_batcher.go:285 +0x916
github.com/cockroachdb/cockroach/pkg/storage/bulk.AddSSTable(0x39d8080, 0xc02664e960, 0xc00064e580, 0xc1350368a0, 0x54, 0x60, 0xc135036960, 0x54, 0x54, 0xc1e0bb0000, ...)
    /go/src/github.com/cockroachdb/cockroach/pkg/storage/bulk/sst_batcher.go:210 +0x6a3
github.com/cockroachdb/cockroach/pkg/storage/bulk.addSplitSSTable(0x39d8080, 0xc02664e960, 0xc00064e580, 0xc1df6e8000, 0xa637eb, 0xa637eb, 0xc0fe357f80, 0x54, 0x60, 0xc1bf1bd320, ...)
    /go/src/github.com/cockroachdb/cockroach/pkg/storage/bulk/sst_batcher.go:285 +0x916
github.com/cockroachdb/cockroach/pkg/storage/bulk.AddSSTable(0x39d8080, 0xc02664e960, 0xc00064e580, 0xc0fe357f80, 0x54, 0x60, 0xc174ee8060, 0x54, 0x54, 0xc1df6e8000, ...)
    /go/src/github.com/cockroachdb/cockroach/pkg/storage/bulk/sst_batcher.go:210 +0x6a3
github.com/cockroachdb/cockroach/pkg/storage/bulk.addSplitSSTable(0x39d8080, 0xc02664e960, 0xc00064e580, 0xc1de214000, 0xa6897e, 0xa6897e, 0xc11278ec60, 0x53, 0x60, 0xc1bf1bd260, ...)
    /go/src/github.com/cockroachdb/cockroach/pkg/storage/bulk/sst_batcher.go:285 +0x916
github.com/cockroachdb/cockroach/pkg/storage/bulk.AddSSTable(0x39d8080, 0xc02664e960, 0xc00064e580, 0xc11278ec60, 0x53, 0x60, 0xc11278ed20, 0x54, 0x54, 0xc1de214000, ...)
    /go/src/github.com/cockroachdb/cockroach/pkg/storage/bulk/sst_batcher.go:210 +0x6a3
github.com/cockroachdb/cockroach/pkg/storage/bulk.addSplitSSTable(0x39d8080, 0xc02664e960, 0xc00064e580, 0xc1dcd38000, 0xa6d903, 0xa6d903, 0xc1c1e4e840, 0x53, 0x60, 0xc1bf1bd1a0, ...)
    /go/src/github.com/cockroachdb/cockroach/pkg/storage/bulk/sst_batcher.go:285 +0x916
github.com/cockroachdb/cockroach/pkg/storage/bulk.AddSSTable(0x39d8080, 0xc02664e960, 0xc00064e580, 0xc1c1e4e840, 0x53, 0x60, 0xc1c1e4e900, 0x54, 0x54, 0xc1dcd38000, ...)
    /go/src/github.com/cockroachdb/cockroach/pkg/storage/bulk/sst_batcher.go:210 +0x6a3
github.com/cockroachdb/cockroach/pkg/storage/bulk.addSplitSSTable(0x39d8080, 0xc02664e960, 0xc00064e580, 0xc1db850000, 0xa72d12, 0xa72d12, 0xc1cfbc3620, 0x53, 0x60, 0xc0dbc05040, ...)
    /go/src/github.com/cockroachdb/cockroach/pkg/storage/bulk/sst_batcher.go:285 +0x916
github.com/cockroachdb/cockroach/pkg/storage/bulk.AddSSTable(0x39d8080, 0xc02664e960, 0xc00064e580, 0xc1cfbc3620, 0x53, 0x60, 0xc1cfbc36e0, 0x54, 0x54, 0xc1db850000, ...)
    /go/src/github.com/cockroachdb/cockroach/pkg/storage/bulk/sst_batcher.go:210 +0x6a3
github.com/cockroachdb/cockroach/pkg/storage/bulk.addSplitSSTable(0x39d8080, 0xc02664e960, 0xc00064e580, 0xc1da360000, 0xa775dc, 0xa775dc, 0xc1cf9b3200, 0x52, 0x60, 0xc1bf1bd0e0, ...)
    /go/src/github.com/cockroachdb/cockroach/pkg/storage/bulk/sst_batcher.go:285 +0x916
github.com/cockroachdb/cockroach/pkg/storage/bulk.AddSSTable(0x39d8080, 0xc02664e960, 0xc00064e580, 0xc1cf9b3200, 0x52, 0x60, 0xc1cf9b32c0, 0x54, 0x54, 0xc1da360000, ...)
    /go/src/github.com/cockroachdb/cockroach/pkg/storage/bulk/sst_batcher.go:210 +0x6a3
github.com/cockroachdb/cockroach/pkg/storage/bulk.addSplitSSTable(0x39d8080, 0xc02664e960, 0xc00064e580, 0xc1d8e64000, 0xa7c3ea, 0xa7c3ea, 0xc1bef0a540, 0x51, 0x60, 0xc1bf1bd020, ...)
    /go/src/github.com/cockroachdb/cockroach/pkg/storage/bulk/sst_batcher.go:285 +0x916
github.com/cockroachdb/cockroach/pkg/storage/bulk.AddSSTable(0x39d8080, 0xc02664e960, 0xc00064e580, 0xc1bef0a540, 0x51, 0x60, 0xc1bef0a600, 0x54, 0x54, 0xc1d8e64000, ...)
    /go/src/github.com/cockroachdb/cockroach/pkg/storage/bulk/sst_batcher.go:210 +0x6a3
github.com/cockroachdb/cockroach/pkg/storage/bulk.addSplitSSTable(0x39d8080, 0xc02664e960, 0xc00064e580, 0xc1d7960000, 0xa80bce, 0xa80bce, 0xc0cea76240, 0x53, 0x60, 0xc0a6173b60, ...)
    /go/src/github.com/cockroachdb/cockroach/pkg/storage/bulk/sst_batcher.go:285 +0x916
github.com/cockroachdb/cockroach/pkg/storage/bulk.AddSSTable(0x39d8080, 0xc02664e960, 0xc00064e580, 0xc0cea76240, 0x53, 0x60, 0xc0cea76300, 0x54, 0x54, 0xc1d7960000, ...)
    /go/src/github.com/cockroachdb/cockroach/pkg/storage/bulk/sst_batcher.go:210 +0x6a3
github.com/cockroachdb/cockroach/pkg/storage/bulk.addSplitSSTable(0x39d8080, 0xc02664e960, 0xc00064e580, 0xc1d6454000, 0xa8592e, 0xa8592e, 0xc176a86900, 0x53, 0x60, 0xc0a6173a40, ...)
    /go/src/github.com/cockroachdb/cockroach/pkg/storage/bulk/sst_batcher.go:285 +0x916
github.com/cockroachdb/cockroach/pkg/storage/bulk.AddSSTable(0x39d8080, 0xc02664e960, 0xc00064e580, 0xc176a86900, 0x53, 0x60, 0xc1767e6b40, 0x54, 0x54, 0xc1d6454000, ...)
    /go/src/github.com/cockroachdb/cockroach/pkg/storage/bulk/sst_batcher.go:210 +0x6a3
github.com/cockroachdb/cockroach/pkg/storage/bulk.addSplitSSTable(0x39d8080, 0xc02664e960, 0xc00064e580, 0xc1d4f3c000, 0xa8af1a, 0xa8af1a, 0xc0e143c540, 0x52, 0x60, 0xc0a6173980, ...)
    /go/src/github.com/cockroachdb/cockroach/pkg/storage/bulk/sst_batcher.go:285 +0x916
github.com/cockroachdb/cockroach/pkg/storage/bulk.AddSSTable(0x39d8080, 0xc02664e960, 0xc00064e580, 0xc0e143c540, 0x52, 0x60, 0xc0e143c600, 0x54, 0x54, 0xc1d4f3c000, ...)
    /go/src/github.com/cockroachdb/cockroach/pkg/storage/bulk/sst_batcher.go:210 +0x6a3
created by github.com/cockroachdb/cockroach/pkg/sql/distsqlrun.(*Flow).startInternal
    /go/src/github.com/cockroachdb/cockroach/pkg/sql/distsqlrun/flow.go:563 +0x342

... except there were about 30 recursive calls.
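The frames above also hint at how much each level was holding. Go tracebacks print a `[]byte` argument as three words (pointer, length, capacity); assuming the fifth word of each `addSplitSSTable` frame (for example `0xa5ee2a`) is the length of the SST being ingested, the sketch below decodes the ten frames shown, outermost first. Under that assumption each frame holds roughly 10.4 to 10.5 MiB and is only about 20 KB smaller than its caller, which suggests each split peeled off only a sliver of the data while the remainder was copied again one level deeper. With about 30 such frames, that would be on the order of 300 MiB for a single call chain.

```go
package main

import (
	"fmt"
	"strconv"
)

// Hex words copied from the addSplitSSTable frames above, outermost frame
// first. Assumption: the 5th argument word is the length of a []byte
// holding the SST (Go tracebacks print a slice as ptr, len, cap).
var frameLens = []string{
	"0xa8af1a", "0xa8592e", "0xa80bce", "0xa7c3ea", "0xa775dc",
	"0xa72d12", "0xa6d903", "0xa6897e", "0xa637eb", "0xa5ee2a",
}

// decode parses the hex words into byte counts.
func decode(words []string) ([]int64, error) {
	out := make([]int64, 0, len(words))
	for _, w := range words {
		n, err := strconv.ParseInt(w, 0, 64) // base 0 accepts the 0x prefix
		if err != nil {
			return nil, err
		}
		out = append(out, n)
	}
	return out, nil
}

func main() {
	lens, err := decode(frameLens)
	if err != nil {
		panic(err)
	}
	var total int64
	for i, n := range lens {
		total += n
		if i == 0 {
			fmt.Printf("frame %d: %.2f MiB\n", i, float64(n)/(1<<20))
			continue
		}
		fmt.Printf("frame %d: %.2f MiB (%d bytes smaller than its caller)\n",
			i, float64(n)/(1<<20), lens[i-1]-n)
	}
	fmt.Printf("live across %d frames: %.1f MiB\n", len(lens), float64(total)/(1<<20))
}
```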

@vivekmenezes
Contributor

looks like AddSSTable has some retry logic that should only be applied at the highest level of the recursion
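To illustrate why retries belong only at the top of the recursion, here is a toy Go comparison with hypothetical names (`nestedRetry`, `topLevelRetry`, `helper`, `op`); it is not the actual AddSSTable code. When every level retries its child up to `maxAttempts` times and the operation keeps failing, attempts multiply by `maxAttempts` per level; with a single retry loop at the top and a helper that only propagates errors, they stay at `maxAttempts`.

```go
package main

import (
	"errors"
	"fmt"
)

// errTransient is a hypothetical retryable error.
var errTransient = errors.New("transient")

// op is a hypothetical ingestion attempt that always fails, so the number
// of times it runs can be counted.
type op struct{ calls int }

func (o *op) try() error {
	o.calls++
	return errTransient
}

// nestedRetry retries at every level: each of the depth+1 levels runs its
// child up to maxAttempts times.
func nestedRetry(o *op, depth, maxAttempts int) error {
	var err error
	for i := 0; i < maxAttempts; i++ {
		if depth == 0 {
			err = o.try()
		} else {
			err = nestedRetry(o, depth-1, maxAttempts)
		}
		if err == nil {
			return nil
		}
	}
	return err
}

// helper recurses without retrying and just propagates errors upward.
func helper(o *op, depth int) error {
	if depth == 0 {
		return o.try()
	}
	return helper(o, depth-1)
}

// topLevelRetry owns the only retry loop.
func topLevelRetry(o *op, depth, maxAttempts int) error {
	var err error
	for i := 0; i < maxAttempts; i++ {
		if err = helper(o, depth); err == nil {
			return nil
		}
	}
	return err
}

func main() {
	a, b := &op{}, &op{}
	_ = nestedRetry(a, 3, 3)
	_ = topLevelRetry(b, 3, 3)
	fmt.Println(a.calls, b.calls)
}
```

With a recursion depth of 3 and three attempts per level, this prints `81 3`.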

@dt
Member

dt commented Apr 2, 2019

@vivekmenezes we can/should restructure the helpers here a bit, but with #36394 all the calls should be using the same []byte which should help at least avoid the OOM.
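Having every call use the same `[]byte` can be sketched in Go as recursing on sub-slices instead of fresh copies. This is a hypothetical model rather than the code in #36394: `withCopies` allocates a new buffer for the remainder at each level, while `shared` passes `data[chunk:]`, which points into the caller's backing array. The `chunk` size is an arbitrary assumption.

```go
package main

import "fmt"

// chunk is a hypothetical number of bytes that land before each split.
const chunk = 4

// withCopies recurses on a fresh copy of the remainder at every level and
// returns the total number of bytes copied.
func withCopies(data []byte) int {
	if len(data) <= chunk {
		return 0
	}
	rest := make([]byte, len(data)-chunk)
	copy(rest, data[chunk:])
	return len(rest) + withCopies(rest)
}

// shared recurses on sub-slices of one buffer. It returns the number of
// levels and whether every level's last byte is the caller's last byte,
// i.e. whether all levels share one backing array. data must be non-empty.
func shared(data []byte, last *byte) (levels int, allShared bool) {
	allShared = &data[len(data)-1] == last
	if len(data) <= chunk {
		return 1, allShared
	}
	n, ok := shared(data[chunk:], last)
	return n + 1, allShared && ok
}

func main() {
	buf := make([]byte, 32)
	fmt.Println("bytes copied:", withCopies(buf))
	levels, ok := shared(buf, &buf[len(buf)-1])
	fmt.Println("levels:", levels, "all share buf:", ok)
}
```

For a 32-byte buffer split 4 bytes at a time, `withCopies` copies 112 bytes in total, while all 8 levels of `shared` point into the original buffer and copy nothing.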

@dt
Member

dt commented Apr 2, 2019

OOM fixed in #36394.

@dt dt closed this as completed Apr 2, 2019
@vivekmenezes vivekmenezes reopened this Apr 12, 2019
craig bot pushed a commit that referenced this issue Apr 14, 2019
36765: bulk: change AddSSTable to not be recursive r=vivekmenezes a=vivekmenezes

AddSSTable was recursive to deal with range splits.
Unfortunately, the recursive call would create new SSTs
without freeing the older ones, creating a memory buildup
that was quadratic. We've seen memory buildup on the order
of GBs due to this recursion.

fixes #36769
fixes #36381

Release note: None

Co-authored-by: Vivek Menezes <[email protected]>
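The quadratic buildup described in the commit message, and why making AddSSTable iterative removes it, can be seen in a toy Go model. It is an illustration under stated assumptions, not the code from #36765: each split is assumed to ingest a fixed `bite`, the recursive version keeps every level's SST alive until the deepest call returns, and the iterative version drops each round's SST before building the next.

```go
package main

import "fmt"

// recursivePeak models a recursive helper in which every level keeps its SST
// alive until the deepest call returns, so the peak is the sum over levels.
// bite is the assumed amount ingested before each split (must be > 0).
func recursivePeak(size, bite int) int {
	if size <= 0 {
		return 0
	}
	return size + recursivePeak(size-bite, bite)
}

// iterativePeak models a loop that builds one SST per round for the
// remaining data and drops it before the next round, so only one is live.
func iterativePeak(size, bite int) int {
	peak := 0
	for rem := size; rem > 0; rem -= bite {
		if rem > peak {
			peak = rem
		}
	}
	return peak
}

func main() {
	for _, splits := range []int{10, 20, 40} {
		// One unit of data per split.
		fmt.Printf("%2d splits: recursive peak %3d units, iterative peak %2d units\n",
			splits, recursivePeak(splits, 1), iterativePeak(splits, 1))
	}
}
```

For 10, 20 and 40 splits of one unit each, the recursive peak is 55, 210 and 820 units (n(n+1)/2), while the iterative peak is 10, 20 and 40.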
@craig craig bot closed this as completed in #36765 Apr 14, 2019