Add support for multi-column --key values #17
These modifications allow users to pass multiple comma-separated columns as the `--key`, for scenarios in which rows are uniquely identified by a combination of columns, such as the county and the state. An arbitrary number of columns can be used. These scenarios are fairly common, in my experience.
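To make the idea concrete, here is a minimal sketch of how a comma-separated key could be turned into a composite row identifier. It is not the code from this pull request: the helper names `composite_key` and `index_rows` are hypothetical, and the sample data is made up for illustration.

```python
import csv
import io


def composite_key(row, key):
    """Build a row identifier from one or more comma-separated column names.

    Hypothetical helper: `key` is a string such as "county,state".
    """
    columns = key.split(",")
    # A tuple keeps the column values distinct, so ("a", "bc") != ("ab", "c").
    return tuple(row[column] for column in columns)


def index_rows(text, key):
    """Index the rows of a CSV document by their (possibly composite) key."""
    reader = csv.DictReader(io.StringIO(text))
    return {composite_key(row, key): row for row in reader}


# Made-up sample: the county name alone is not unique, county plus state is.
sample = (
    "county,state,population\n"
    "Washington,OR,100\n"
    "Washington,PA,200\n"
)

by_county_and_state = index_rows(sample, "county,state")
by_county_only = index_rows(sample, "county")
```

With the single-column key, the second `Washington` row overwrites the first, which is exactly the collision a multi-column key avoids.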
I aimed to make this implementation as simple as possible. As such, it doesn't handle one particular edge case: columns whose names contain a comma. My instinct is that this could be handled by adding a `--key-sep` option, through which the user could pass an arbitrary string to serve as the separator, and then passing that argument to `load_csv`/`load_json`. But I figured I'd raise the possibility here first before mucking around too much in the code.
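As a rough sketch of what the proposed separator option might look like on the parsing side, the snippet below splits the key on a caller-supplied string. This is only an assumption about one possible shape: `split_key` and its `key_sep` parameter are hypothetical names, and nothing here reflects how `load_csv` or `load_json` would actually accept the argument.

```python
def split_key(key, key_sep=","):
    """Split a key specification into column names.

    Hypothetical helper: `key_sep` mirrors the proposed --key-sep option
    and defaults to a comma, matching the behavior described above.
    """
    if not key_sep:
        # str.split raises on an empty separator; fail with a clearer message.
        raise ValueError("key_sep must be a non-empty string")
    return key.split(key_sep)


# Default separator: plain comma-separated column names.
default_columns = split_key("county,state")

# Custom separator: one column name itself contains a comma.
custom_columns = split_key("last, first|date_of_birth", key_sep="|")
```

Defaulting the separator to a comma would keep existing invocations working unchanged, while a different separator lets a column such as `last, first` survive intact.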