DATAFU-177 Add dedupByAllExcept #46

eyala · 2024-11-19T12:39:02Z

A new method for when you want to de-duplicate records, but not lose any "real" data.

For example, suppose a server creates events with an autogenerated event id, and sometimes events are duplicated. You don't want double rows that differ only in their event ids, but if any of the other fields are distinct you want to keep the rows (with their original event ids). If you didn't care about the ids, you could just drop the event id column and de-duplicate. But in order to keep at least one event id value per group, you currently need to tediously list all the other columns.
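To illustrate the intended semantics, here is a minimal plain-Python sketch. It is not the DataFu implementation or its API: the function name `dedup_by_all_except`, its parameters, and the sample `events` data are all hypothetical, and the choice to keep the first row seen per group is an assumption made for the example.

```python
def dedup_by_all_except(rows, ignored_column):
    """Keep one row per distinct combination of all columns except ignored_column.

    Hypothetical helper for illustration only; rows are dicts with hashable values.
    """
    kept = {}
    for row in rows:
        # Group key: every (column, value) pair except the ignored column.
        key = tuple(sorted((c, v) for c, v in row.items() if c != ignored_column))
        # Assumption: keep the first row seen for each key, including its
        # original value of the ignored column.
        kept.setdefault(key, row)
    return list(kept.values())


# Made-up sample data: the first two events differ only in event_id.
events = [
    {"event_id": "a1", "user": "alice", "action": "login"},
    {"event_id": "a2", "user": "alice", "action": "login"},
    {"event_id": "b1", "user": "alice", "action": "logout"},
]

result = dedup_by_all_except(events, "event_id")
for row in result:
    print(row)
```

The caller names only the column to ignore, rather than listing every other column as the grouping key, which is the tedium described above.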

JIRA: https://issues.apache.org/jira/browse/DATAFU-177

DATAFU-177 Add dedupByAllExcept

5219af6

eyala self-assigned this Dec 9, 2024

eyala merged commit 6fd6fc4 into apache:main Dec 9, 2024
3 checks passed
