sideloading: 2x disk usage after a restore #18077

Closed
danhhz opened this issue Aug 31, 2017 · 6 comments

danhhz commented Aug 31, 2017

When restoring X bytes of data, you need 2X (plus some headroom) of available space on the cluster. This is because (until the new tables get some traffic) the raft log for the new ranges is not truncated, so an entire copy of the restored data ends up sitting in raft logs.

We fixed this once by making the raft log truncation work by both number of entries and total size, but it seems there's a regression when it comes to sideloading. Perhaps the truncation threshold logic doesn't take into account the fully hydrated command size?
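
For a bit of intuition, here's a minimal, self-contained sketch of truncation driven by both entry count and total log size; the shouldTruncate helper and its thresholds are illustrative assumptions, not the actual raft log queue code:

	package main

	import "fmt"

	// shouldTruncate is a hypothetical stand-in for the truncation threshold
	// logic: truncate once the log has too many entries OR has grown too large.
	// Names and constants are assumptions for illustration only.
	func shouldTruncate(numEntries, logSizeBytes int64) bool {
		const maxEntries = 10000    // count-based threshold
		const maxLogSize = 64 << 20 // size-based threshold (64 MiB)
		return numEntries > maxEntries || logSizeBytes > maxLogSize
	}

	func main() {
		// If logSizeBytes is computed from the thin (post-sideloading) entries,
		// a log holding gigabytes of restored data looks tiny and never trips
		// the size threshold -- the suspected regression.
		fmt.Println(shouldTruncate(100, 1<<10)) // false: thin size looks small
		fmt.Println(shouldTruncate(100, 2<<30)) // true: hydrated size is 2 GiB
	}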

danhhz added this to the 1.2 milestone Aug 31, 2017
petermattis commented:

Is there an easy way to reproduce this without running a large restore? That is, can you give me a small test that uses sideloaded data? It shouldn't be difficult to track down where the lack of truncation is occurring.

danhhz commented Aug 31, 2017

I could provide you with a 1GB restore. Anything smaller than that would probably require some work on my end to set up.

petermattis commented Aug 31, 2017 via email

danhhz commented Aug 31, 2017

TestDBAddSSTable

petermattis commented:

This is the code which adjusts the Raft log size when appending entries:

	if len(rd.Entries) > 0 {
		// All of the entries are appended to distinct keys, returning a new
		// last index.
		thinEntries, err := r.maybeSideloadEntriesRaftMuLocked(ctx, rd.Entries)
		if err != nil {
			return stats, err
		}
		if lastIndex, lastTerm, raftLogSize, err = r.append(
			ctx, writer, lastIndex, lastTerm, raftLogSize, thinEntries,
		); err != nil {
			return stats, err
		}
	}

Because the size is adjusted when appending the thinEntries, we don't account for the size of the side-loaded data. It seems easy to perform this accounting in maybeSideloadEntriesRaftMuLocked. We'd also have to fix Replica.applySnapshot, which has a similar bit of code. @tschottdorf Am I missing anything here? I'm guessing this was a simple oversight.
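
For concreteness, here's a minimal, self-contained sketch of that accounting change; the Entry type and maybeSideloadEntries helper are hypothetical stand-ins for the real maybeSideloadEntriesRaftMuLocked machinery:

	package main

	import "fmt"

	// Entry is a hypothetical stand-in for a raft log entry whose large
	// payload (e.g. an ingested SSTable) is stored out of band.
	type Entry struct {
		Data     []byte // the "thin" bytes that land in the raft log
		Sideload []byte // the payload written to a separate sideloaded file
	}

	// maybeSideloadEntries strips sideloaded payloads and, crucially, reports
	// how many bytes it stripped so the caller can still charge them against
	// the raft log size used by the truncation heuristics.
	func maybeSideloadEntries(entries []Entry) (thin []Entry, sideloadedBytes int64) {
		thin = make([]Entry, len(entries))
		for i, e := range entries {
			sideloadedBytes += int64(len(e.Sideload))
			e.Sideload = nil // in the real system: written to its own file
			thin[i] = e
		}
		return thin, sideloadedBytes
	}

	func main() {
		entries := []Entry{{Data: []byte("cmd"), Sideload: make([]byte, 1<<20)}}

		thin, extra := maybeSideloadEntries(entries)
		var raftLogSize int64
		for _, e := range thin {
			raftLogSize += int64(len(e.Data)) // what append-time accounting sees
		}
		raftLogSize += extra // the fix: count the side-loaded bytes too

		fmt.Println(raftLogSize) // 1048579, not just 3
	}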

tbg commented Sep 1, 2017 via email
