Support for additional compressors/decompressors #7978
Conversation
Signed-off-by: Renan Rangel <[email protected]>
(force-pushed from 66c7b9c to 4becf1b)
Let's get @deepthi's input on this.
Sorry for the delay in responding. I somehow missed the notification for reviewing this PR.
To start with we only need one flag - let us call it […]
Things have changed a little bit now (see detail in #8175), so the design will need to change to accommodate a choice of builtin decompressors also.
@deepthi I saw that, I will update this PR to take it into account, thanks!
hi @deepthi, I have updated the code with support for the two classes of compressors and decompressors: […]
This sounds good. Let us not add cgo right now, we want to avoid that if possible.
@deepthi It should be ready for review now 👍, I would love to get your feedback on it
go/vt/mysqlctl/compression.go (Outdated)
return "", fmt.Errorf("%w \"%s\"", errUnsupportedCompressionExtension, extension)
Anywhere you have \"%s\" in a format string, you can replace it with %q, which adds the quotes automatically and escapes the contents of the string so they always look proper inside the quotes. 👌
neat, I will change that!
Hiiii @rvrangel! Been looking at this PR. I'm afraid that your usage of […]. From the […]:
You can see that this API in the stdlib is very nuanced. I would recommend refactoring this implementation so that all […]. Let me know if you get stuck!
go/vt/mysqlctl/compression.go (Outdated)
func prepareExternalCompressionCmd(ctx context.Context, cmdStr string) (*exec.Cmd, error) {
	cmdArgs := strings.Split(cmdStr, " ")
This isn't perfectly cromulent; it will corrupt argument strings with quotes or escaped whitespace. I would suggest bringing https://pkg.go.dev/github.com/google/shlex as a (tiny) dependency to parse the commandline properly.
Sounds good, I will take a look at it and make the changes 👍
hi @vmg, thanks for the feedback. It is correct that we should not call […]
I see now, I was a bit confused with the […]
Signed-off-by: Renan Rangel <[email protected]>
@vmg I updated the PR, let me know if that covers most of what you thought and if you have any additional feedback
Very nice! Looks like a good start! Once @deepthi approves this we'll merge and I'll run some profiles to see how it performs in practice. Thank you!
Signed-off-by: Renan Rangel <[email protected]>
and thanks for the feedback @vmg! As a heads up, one limiting factor we encountered is the download speed for the restores when using S3 (see vitess/go/vt/mysqlctl/s3backupstorage/s3.go, lines 191 to 208 at 440d2e6).
I have some code that implements multipart download as a buffer in memory; it still needs some polishing, but I will soon create another PR. We saw improvements in the 2-4x range when using other algorithms since […]
just need the copyright notices, then good from my end.
@@ -0,0 +1,281 @@
package mysqlctl |
this file is missing the copyright notice
@@ -0,0 +1,199 @@
package mysqlctl |
ditto on copyright notice
Signed-off-by: Renan Rangel <[email protected]>
Is there a reason the same support is not being added to the builtin engine? The compression options should be independent of the engine being used.
Also, more tests are needed. It should be fine to overload existing backup tests so that different tests use different options.
unit tests: https://github.com/vitessio/vitess/blob/main/go/vt/wrangler/testlib/backup_test.go
endtoend: https://github.com/vitessio/vitess/tree/main/go/test/endtoend/backup
go/vt/mysqlctl/xtrabackupengine.go
Outdated
// switch which compressor/decompressor to use
builtinCompressor = flag.String("xtrabackup_builtin_compressor", "pgzip", "which builtin compressor engine to use")
builtinDecompressor = flag.String("xtrabackup_builtin_decompressor", "auto", "which builtin decompressor engine to use")
// use and external command to decompress the backups
typo: and -> an
While adding support to builtinbackupengine, all these flags should be renamed so that they are not xtrabackup specific.
hi @deepthi thanks for the feedback. I will add support for it this week in the builtin engine and for the tests 👍
go/vt/mysqlctl/xtrabackupengine.go
Outdated
@@ -63,6 +61,13 @@ var (
// striping mode
xtrabackupStripes = flag.Uint("xtrabackup_stripes", 0, "If greater than 0, use data striping across this many destination files to parallelize data transfer and decompression")
xtrabackupStripeBlockSize = flag.Uint("xtrabackup_stripe_block_size", 102400, "Size in bytes of each block that gets sent to a given stripe before rotating to the next stripe")
// switch which compressor/decompressor to use
builtinCompressor = flag.String("xtrabackup_builtin_compressor", "pgzip", "which builtin compressor engine to use")
builtinDecompressor = flag.String("xtrabackup_builtin_decompressor", "auto", "which builtin decompressor engine to use")
It would be nice to add some help text on what "auto" means.
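One plausible meaning of "auto" is picking the decompressor from the backup file's extension. A hedged sketch of that idea (the helper name, extension set, and engine names are illustrative assumptions, not Vitess's actual implementation):

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// resolveDecompressor illustrates what an "auto" setting could do:
// select a builtin decompression engine from the backup file's
// extension instead of requiring an explicit flag value.
func resolveDecompressor(filename string) (string, error) {
	ext := filepath.Ext(filename)
	switch strings.TrimPrefix(ext, ".") {
	case "gz":
		return "pgzip", nil
	case "lz4":
		return "lz4", nil
	case "zst":
		return "zstd", nil
	default:
		return "", fmt.Errorf("unsupported compression extension %q", ext)
	}
}

func main() {
	engine, err := resolveDecompressor("backup.xbstream.gz")
	fmt.Println(engine, err) // → pgzip <nil>
}
```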
Signed-off-by: Renan Rangel <[email protected]>
@rvrangel when you get back to this, we will also need a companion website PR to document the new options.
Superseded by #10558 which has been merged.
Signed-off-by: Renan Rangel [email protected]
Description
This change will allow us to use an external decompressor when available, as opposed to the internal pgzip currently used. We ran some tests using pigz -d -c and we get a nice speed bump on decompression, around 30% or more faster when restoring from a backup in S3. While decompression is still single-threaded in pigz, it uses separate threads for reading, writing, and checksumming, which results in faster performance.
I wanted to get initial feedback on this change, as we would also like to plug in our own compressor to use other compression algorithms (like lz4 or zstd) with an external binary, as this is a more flexible setup for users, but also work on the change to add built-in support for one of these compression algorithms.
Related Issue(s)
#7802
Checklist
Deployment Notes
This should not have any effect on current deployments. To take advantage of this, one needs to pass the correct flags to vttablet, and the command also needs to exist in the OS; otherwise it will revert to the builtin decompressor.