Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: SSTable to be imported has 6 live copies in memory #36347

Closed
andreimatei opened this issue Mar 29, 2019 · 3 comments
Closed

storage: SSTable to be imported has 6 live copies in memory #36347

andreimatei opened this issue Mar 29, 2019 · 3 comments
Labels
A-kv-replication Relating to Raft, consensus, and coordination. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) S-2 Medium-high impact: many users impacted, risks of availability and difficult-to-fix data errors X-stale

Comments

@andreimatei
Copy link
Contributor

andreimatei commented Mar 29, 2019

While investigating a cluster with a stuck AddSSTable request that's getting constantly reproposed and rejected by Raft (to be filed elsewhere) and looking at heap profiles, it would appear that an SSTable is present 6 times in memory. It's hard for me to say how many copies of the thing there should ideally be (I'm rooting for one), but 6 seems too damn much.

Check out the allocation flame graph:
flame

All the highlighted regions are ~33MB slots, all presumably pertaining to one AddSSTable that's constantly being reproposed.

1: AddSSTable request in a BatchRequest
2: defaultSubmitProposalLocked makes a copy, held alive by the proposals map in the replica
3: refreshProposalsLocked makes a copy to repropose. reproposing because of RaftReady
4: raft.maybeSendAppend reads the log entry (from disk?) to decide if it needs to send proposals. This guy doesn't seem to need to deal with sideloaded data.
5: another reproposal, this time because of ReasonTick
6: protouil.Marshall. This is a copy of #5?

raw heap profile
heap.zip

cc @bdarnell @petermattis @dt @ajwerner @tbg @nvanbenschoten

@andreimatei andreimatei added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-kv-replication Relating to Raft, consensus, and coordination. S-2 Medium-high impact: many users impacted, risks of availability and difficult-to-fix data errors labels Mar 29, 2019
@dt
Copy link
Member

dt commented Mar 29, 2019

on the same topic, though I don't see it in that profile: I think there might be another copy made during EvalAddSSTable when we make a memory-backed "file" for the sst iterator and then "write" the content to said file. I've been meaning to dust off a branch that implements a leveldb "file" directly on top of a []byte to avoid that copy.

nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Apr 1, 2019
Informs cockroachdb#36347.

This change avoids the unnecessary allocation and memory copy present in
Raft command encoding. This extra work is expensive for large commands
like `AddSSTable` requests. Even for smaller requests, this work was
still a serious problem because it took place under heavily contended
locks. For instance, the encoding in `defaultSubmitProposalLocked` took
place under the Replica mutex, which serializes all Raft proposals for
a Range. The other two locations fixed took place under the Raft mutex.
While less heavily contended, this was still slowing down the Raft
processing goroutine.

This is a less dramatic version of a change I've been working on. The
full change lifts the slice allocation and most of the RaftCommand proto
marshalling all the way out of `defaultSubmitProposalLocked` and out of
the `Replica.mu` critical section. This commit gets us part of the way
there sets the stage for the rest of the change.

Release note: None
@nvanbenschoten
Copy link
Member

I've had a change to remove the copy in defaultSubmitProposalLocked floating around for a little while. I put part of it up in a PR here: #36392. This should address some of these highlighted regions.

:plause: for this issue. We should open more like it!

@andreimatei
Copy link
Contributor Author

:plause:

nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Apr 1, 2019
Informs cockroachdb#36347.

This change avoids the unnecessary allocation and memory copy present in
Raft command encoding. This extra work is expensive for large commands
like `AddSSTable` requests. Even for smaller requests, this work was
still a serious problem because it took place under heavily contended
locks. For instance, the encoding in `defaultSubmitProposalLocked` took
place under the Replica mutex, which serializes all Raft proposals for
a Range. The other two locations fixed took place under the Raft mutex.
While less heavily contended, this was still slowing down the Raft
processing goroutine.

This is a less dramatic version of a change I've been working on. The
full change lifts the slice allocation and most of the RaftCommand proto
marshalling all the way out of `defaultSubmitProposalLocked` and out of
the `Replica.mu` critical section. This commit gets us part of the way
there sets the stage for the rest of the change.

Release note: None
craig bot pushed a commit that referenced this issue Apr 2, 2019
36392: storage: avoid copying marshalled RaftCommand when encoding r=nvanbenschoten a=nvanbenschoten

Informs #36347.

This change avoids the unnecessary allocation and memory copy present in
Raft command encoding. This extra work is expensive for large commands
like `AddSSTable` requests. Even for smaller requests, this work was
still a serious problem because it took place under heavily contended
locks. For instance, the encoding in `defaultSubmitProposalLocked` took
place under the Replica mutex, which serializes all Raft proposals for
a Range. The other two locations fixed took place under the Raft mutex.
While less heavily contended, this was still slowing down the Raft
processing goroutine.

This is a less dramatic version of a change I've been working on. The
full change lifts the slice allocation and most of the RaftCommand proto
marshalling all the way out of `defaultSubmitProposalLocked` and out of
the `Replica.mu` critical section. This commit gets us part of the way
there sets the stage for the rest of the change.

Release note: None

Co-authored-by: Nathan VanBenschoten <[email protected]>
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Apr 9, 2019
Informs cockroachdb#36347.

This change avoids the unnecessary allocation and memory copy present in
Raft command encoding. This extra work is expensive for large commands
like `AddSSTable` requests. Even for smaller requests, this work was
still a serious problem because it took place under heavily contended
locks. For instance, the encoding in `defaultSubmitProposalLocked` took
place under the Replica mutex, which serializes all Raft proposals for
a Range. The other two locations fixed took place under the Raft mutex.
While less heavily contended, this was still slowing down the Raft
processing goroutine.

This is a less dramatic version of a change I've been working on. The
full change lifts the slice allocation and most of the RaftCommand proto
marshalling all the way out of `defaultSubmitProposalLocked` and out of
the `Replica.mu` critical section. This commit gets us part of the way
there sets the stage for the rest of the change.

Release note: None
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-kv-replication Relating to Raft, consensus, and coordination. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) S-2 Medium-high impact: many users impacted, risks of availability and difficult-to-fix data errors X-stale
Projects
None yet
Development

No branches or pull requests

4 participants