Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

importccl: support option to compress output files using gzip #45978

Merged
merged 1 commit into from
Apr 2, 2020

Conversation

C0rWin
Copy link
Contributor

@C0rWin C0rWin commented Mar 11, 2020

Fix #45579

This commit extends EXPORT functionality by enabling compression of the
exported stream as suggested in #45579. Currently only gzip is supported
and the export clause to use compression looks as following:

export into csv 's3://export.csv' with compression = gzip from select * from foo;

Signed-off-by: Artem Barger [email protected]

Release note (sql change): support option to compress output files using
gzip

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@C0rWin
Copy link
Contributor Author

C0rWin commented Mar 11, 2020

Currently still work in progress

@C0rWin C0rWin force-pushed the export branch 2 times, most recently from 22a6005 to 713ca1d Compare March 11, 2020 10:35
@dt dt self-requested a review March 11, 2020 13:34
Copy link
Member

@dt dt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! Thanks for picking it up!

A few comments inline.

pkg/ccl/importccl/exportcsv.go Show resolved Hide resolved
pkg/ccl/importccl/exportcsv.go Show resolved Hide resolved
pkg/sql/execinfrapb/processors_bulk_io.proto Outdated Show resolved Hide resolved
pkg/sql/execinfrapb/processors_bulk_io.proto Outdated Show resolved Hide resolved
pkg/sql/execinfrapb/processors_bulk_io.proto Outdated Show resolved Hide resolved
pkg/sql/export.go Outdated Show resolved Hide resolved
pkg/sql/distsql_physical_planner.go Outdated Show resolved Hide resolved
@dt
Copy link
Member

dt commented Mar 11, 2020

As far as testing, i think you can just duplicate one of the existing tests, for example:
https://github.com/cockroachdb/cockroach/blob/master/pkg/ccl/importccl/exportcsv_test.go#L181
but just confirm that a) the EXPORT runs with your option and b) you then open a file named "n1.0.csv*.gz*" and that that file decompresses to the expected content via a gzip stream.

@C0rWin C0rWin force-pushed the export branch 2 times, most recently from 1d20e51 to 01cafe2 Compare March 12, 2020 21:45
@C0rWin
Copy link
Contributor Author

C0rWin commented Mar 12, 2020

As far as testing, i think you can just duplicate one of the existing tests, for example:
https://github.com/cockroachdb/cockroach/blob/master/pkg/ccl/importccl/exportcsv_test.go#L181
but just confirm that a) the EXPORT runs with your option and b) you then open a file named "n1.0.csv*.gz*" and that that file decompresses to the expected content via a gzip stream.

thanks will add UT based on that example.

Copy link
Member

@dt dt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! Just a few final nits inline.

I think you can ignore the red CI for the moment: the testrace failure is known flake (#45919) and the lint failure is just enforcing the 2 week code freeze.

pkg/ccl/importccl/exportcsv.go Show resolved Hide resolved
@@ -160,12 +245,16 @@ func (sp *csvWriter) Run(ctx context.Context) {
}
defer es.Close()

size := buf.Len()
size := writer.Len()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't we need to move this after .Close() below to include the flushed buf / trailer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, yes, it has to be after close to count on footers as well.

pkg/ccl/importccl/exportcsv.go Outdated Show resolved Hide resolved
pkg/ccl/importccl/exportcsv.go Outdated Show resolved Hide resolved
pkg/sql/execinfrapb/processors_bulk_io.proto Outdated Show resolved Hide resolved
pkg/sql/export.go Show resolved Hide resolved
pkg/ccl/importccl/exportcsv.go Show resolved Hide resolved
pkg/ccl/importccl/exportcsv.go Outdated Show resolved Hide resolved
@C0rWin C0rWin force-pushed the export branch 4 times, most recently from 1783eff to 2c303b8 Compare March 13, 2020 17:57
@C0rWin
Copy link
Contributor Author

C0rWin commented Mar 13, 2020

I think you can ignore the red CI for the moment: the testrace failure is known flake (#45919) and the lint failure is just enforcing the 2 week code freeze.

understood and make sense 👍

@dt dt self-assigned this Mar 13, 2020
@dt dt changed the title wip/importccl support option to compress output files using gzip importccl: support option to compress output files using gzip Mar 13, 2020
@dt
Copy link
Member

dt commented Mar 13, 2020

Thank for taking this on!

Code looks good to go once master re-opens, though it looks like you might need to make again and amend to check in the generated proto diffs.

While you're at it, if you were so inclined, could you amend the commit message so the initial line has a colon after the package, and so it mentions EXPORT e.g. importccl: support option to compress EXPORT files using gzip.

Other than that though, I think this can merge once we re-open master in the next week or two to new features. Thanks again!

@C0rWin C0rWin force-pushed the export branch 2 times, most recently from 2cabe80 to 0f03079 Compare March 13, 2020 19:14
@C0rWin C0rWin closed this Mar 20, 2020
@C0rWin C0rWin deleted the export branch March 21, 2020 21:39
@C0rWin
Copy link
Contributor Author

C0rWin commented Mar 21, 2020

it seems that I've mistakenly forced push from master rather than development branch, which causes to auto-close pull-request. at any rate that wasn't intentional, pushed alternative one here: #46405

managed to update and reopen

@C0rWin C0rWin reopened this Mar 21, 2020
@dt
Copy link
Member

dt commented Mar 31, 2020

@C0rWin with the 20.1 branch cut today, we're about ready to re-open master to merging feature PRs. If you are some inclined, want to give this a quick rebase (and run make to ensure the protos are re-generated since it looks like there are some diffs there right now) and we can merge it?

@C0rWin
Copy link
Contributor Author

C0rWin commented Apr 1, 2020

@C0rWin with the 20.1 branch cut today, we're about ready to re-open master to merging feature PRs. If you are some inclined, want to give this a quick rebase (and run make to ensure the protos are re-generated since it looks like there are some diffs there right now) and we can merge it?

let me rebase and try.

@C0rWin
Copy link
Contributor Author

C0rWin commented Apr 1, 2020

@dt running make leaves me with generated

        modified:   pkg/sql/execinfrapb/processors_bulk_io.pb.go
        modified:   pkg/storage/enginepb/mvcc3.pb.go
        modified:   pkg/util/hlc/timestamp.pb.go

also seems after rebasing UT fails in TeamCity due to some issue with protos.

@dt
Copy link
Member

dt commented Apr 1, 2020

yeah, I think you want to check in those diffs as well -- a change to one proto can change all the generated ones and you just need to check in the generated diff as well. The CI is failing in the check that re-running generation produces no diffs.

@C0rWin C0rWin force-pushed the export branch 3 times, most recently from 39886d5 to 9a981cc Compare April 2, 2020 00:32
@C0rWin
Copy link
Contributor Author

C0rWin commented Apr 2, 2020

Unit test fails due to:

panic: event RetriableErr{CanAutoRetry:false, IsCommit:false} inappropriate in current state NoTxn{} [recovered]
	panic: panic while executing 1 statements: UPDATE _._ SET _ = $1 WHERE _ = $2; caused by event RetriableErr{CanAutoRetry:false, IsCommit:false} inappropriate in current state NoTxn{}

test in failure TestBackupRestoreWithConcurrentWrites, wonder how and whenever it actually related to current PR? fyi @dt

does it make sense to just re-run UT?

This commit extends EXPORT functionality by enabling compression of the
exported stream as suggested in cockroachdb#45579. Currently only gzip is supported
and the export clause to use compression looks as following:

```
export into csv 's3://export.csv' with compression = gzip from select * from foo;
```

Signed-off-by: Artem Barger <[email protected]>

Release note (sql change): support option to compress output files using
gzip

Release justification: none
@dt
Copy link
Member

dt commented Apr 2, 2020

Whoops, sorry, the TestBackupRestoreWithConcurrentWrites failure is an unrelated flake recently introduced and fixed yesterday in #46829 -- might need one more rebase.

@dt
Copy link
Member

dt commented Apr 2, 2020

Oh, nevermind, looks like it passed!

Thanks again for sticking with this though a rough couple weeks in our dev cycle.

bors r=dt

@craig
Copy link
Contributor

craig bot commented Apr 2, 2020

Build succeeded

@craig craig bot merged commit 6b4b508 into cockroachdb:master Apr 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

EXPORT: support option to compress output files using gzip
3 participants