[1229] Do not write RemoveFile.numRecords to the delta log checkpoint #1230

scottsand-db
Collaborator

@scottsand-db scottsand-db commented Jun 23, 2022

Description

Resolves #1229.

This PR ensures that the `RemoveFile.numRecords` field is not written to the Delta checkpoint. It does so by removing `numRecords` from the `RemoveFile` constructor.
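The idea can be sketched outside of Delta's Scala code base. The Python analogy below assumes a serializer that derives its output schema from constructor fields; the class names `RemoveFileBefore` and `RemoveFileAfter`, the helper `derived_schema`, and the reduced field list are hypothetical and only illustrate why moving a field out of the constructor keeps it out of the written schema. It is not the actual patch.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class RemoveFileBefore:
    """Hypothetical 'before' shape: numRecords is a constructor field."""
    path: str
    deletionTimestamp: Optional[int] = None
    # Constructor parameter, so it becomes part of the derived schema.
    numRecords: Optional[int] = None


@dataclass
class RemoveFileAfter:
    """Hypothetical 'after' shape: numRecords lives outside the constructor."""
    path: str
    deletionTimestamp: Optional[int] = None

    def __post_init__(self) -> None:
        # Plain instance attribute: still usable in memory, but invisible to
        # anything that derives a schema from the declared fields.
        self.numRecords = None


def derived_schema(cls) -> list:
    """Return the field names a constructor-driven serializer would write."""
    return [f.name for f in fields(cls)]


print(derived_schema(RemoveFileBefore))
print(derived_schema(RemoveFileAfter))
```

In this sketch the value remains available on the in-memory object, while the schema derived from the constructor no longer mentions it.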

How was this patch tested?

The test writes a checkpoint, reads it back as Parquet, and asserts that its schema does not contain `numRecords`.
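The schema assertion can be sketched as a recursive search over field names. In the example below, the nested dict `checkpoint_schema` is a hand-written stand-in for the schema a Parquet reader would return; its contents and the helper names `field_paths` and `contains_field` are assumptions for illustration, not the test added in this PR.

```python
def field_paths(schema, prefix=""):
    """Yield dotted paths for every field in a nested {name: type-or-dict} schema."""
    for name, value in schema.items():
        path = f"{prefix}.{name}" if prefix else name
        yield path
        if isinstance(value, dict):
            # Descend into struct-like fields.
            yield from field_paths(value, path)


def contains_field(schema, field_name):
    """Return True if any field at any nesting level is named field_name."""
    return any(p.split(".")[-1] == field_name for p in field_paths(schema))


# Hypothetical, simplified checkpoint schema (assumed for this sketch).
checkpoint_schema = {
    "add": {"path": "string", "size": "long"},
    "remove": {
        "path": "string",
        "deletionTimestamp": "long",
        "dataChange": "boolean",
    },
}

assert not contains_field(checkpoint_schema, "numRecords")
```

Searching every nesting level, rather than only the top-level columns, matters here because the field in question would sit inside the `remove` struct.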

Does this PR introduce any user-facing changes?

No.

@scottsand-db scottsand-db self-assigned this Jun 23, 2022
@scottsand-db scottsand-db changed the title [1229] Filter which RemoveFile fields are written to the checkpoint; add test [1229] Do not write RemoveFile.numRecords to the delta log checkpoint Jun 23, 2022
@scottsand-db scottsand-db changed the title [1229] Do not write RemoveFile.numRecords to the delta log checkpoint [WIP] [1229] Do not write RemoveFile.numRecords to the delta log checkpoint Jun 23, 2022
@scottsand-db scottsand-db changed the title [WIP] [1229] Do not write RemoveFile.numRecords to the delta log checkpoint [1229] Do not write RemoveFile.numRecords to the delta log checkpoint Jun 29, 2022
@tdas
Contributor

tdas commented Jul 1, 2022

Shouldn't this be in 2.0 as well?

vkorukanti pushed a commit to vkorukanti/delta that referenced this pull request Jul 14, 2022
(Cherry-pick of delta-io#1230)

This PR avoids persisting the `numRecords` field of the `RemoveFile` action to the checkpoint by removing it from the `RemoveFile` constructor.

Resolves delta-io#1229.

This PR ensures that the `RemoveFile.numRecords` field is not written to the Delta checkpoint.

The test writes a checkpoint, reads it back as Parquet, and asserts that its schema does not contain `numRecords`.

No.

Closes delta-io#1230.

GitOrigin-RevId: 518e46c0622cca4277729e9e6e7ebb08452619f3
@vkorukanti vkorukanti added this to the 2.0.0 milestone Jul 19, 2022
mmengarelli pushed a commit to mmengarelli/delta.io that referenced this pull request Jul 26, 2022
Development

Successfully merging this pull request may close these issues.

[BUG] Do not persist RemoveFile.numRecords to the checkpoint
5 participants