DATAFU-177 Add dedupByAllExcept #46

Merged
merged 1 commit into from
Dec 9, 2024

Conversation

eyala
Contributor

@eyala eyala commented Nov 19, 2024

A new method for when you want to de-duplicate records without losing any "real" data.

For example, suppose a server creates events with an autogenerated event id, and events are sometimes duplicated. You don't want duplicate rows that differ only in their event ids, but if any of the other fields are distinct you want to keep the rows (with their original event ids). If you didn't need the event ids, you could just drop the event id column and de-duplicate. To keep at least one event id per set of duplicates, though, you currently need to tediously list all the other columns.
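To make the intended semantics concrete, here is a minimal sketch in plain Python. It is not the DataFu implementation and does not use Spark; the function name `dedup_by_all_except`, the column names, and the sample rows are all hypothetical and chosen only for illustration. It also assumes the first row seen in each group is the one kept, which the description above does not specify.

```python
def dedup_by_all_except(rows, ignored_columns):
    """Keep one row per distinct combination of the non-ignored columns.

    Illustration only: rows are dicts with hashable values, and the first
    row seen in each group is the one that survives (an assumption).
    """
    ignored = set(ignored_columns)
    seen = set()
    result = []
    for row in rows:
        # Build the grouping key from every column except the ignored ones.
        key = tuple(sorted((k, v) for k, v in row.items() if k not in ignored))
        if key not in seen:
            seen.add(key)
            # The surviving row keeps its original value for the ignored column.
            result.append(row)
    return result


events = [
    {"event_id": "a1", "user": "alice", "action": "login"},
    {"event_id": "b2", "user": "alice", "action": "login"},   # same apart from event_id
    {"event_id": "c3", "user": "alice", "action": "logout"},  # differs in a real field
]

deduped = dedup_by_all_except(events, ["event_id"])
for row in deduped:
    print(row)
```

The caller names only the column to ignore (`event_id`) instead of listing every other column, which is the convenience the description is after.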

JIRA: https://issues.apache.org/jira/browse/DATAFU-177

@eyala eyala self-assigned this Dec 9, 2024
@eyala eyala merged commit 6fd6fc4 into apache:main Dec 9, 2024
3 checks passed