implements add_column_transform_regexp feature #1690
This allows one to add column transforms by patterns that match the table name and column name, using `re.fullmatch`. It adds a new function `add_column_transform_regexp()`, so the original API does not change. It also adds an internal helper function `_realize_regexp_transforms()` that converts transform patterns into explicit transform dicts, which are then applied as before. Because this helper is only called during schema generation, there is no speed impact on writing compared to the original implementation.
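The mechanism described above can be sketched as follows. This is a toy, self-contained model, not ctapipe's actual writer: the class `ToyTableWriter` and its `setup_table()` and `write_row()` methods are hypothetical, the argument order of `add_column_transform_regexp()` (table pattern, column pattern, transform) is assumed, and so is the rule that an explicitly added transform takes precedence over a matching pattern. It shows patterns being stored unresolved, expanded into the explicit per-table dict once per table, and rows then transformed with plain dict lookups.

```python
import re
from typing import Callable, Dict, List, Tuple

Transform = Callable[[object], object]


class ToyTableWriter:
    """Hypothetical stand-in for a table writer; not the ctapipe implementation."""

    def __init__(self) -> None:
        # explicit transforms: table name -> column name -> transform
        self._transforms: Dict[str, Dict[str, Transform]] = {}
        # pattern transforms, stored unresolved until a table's schema is built
        self._regexp_transforms: List[Tuple[str, str, Transform]] = []

    def add_column_transform(self, table_name: str, col_name: str, transform: Transform) -> None:
        self._transforms.setdefault(table_name, {})[col_name] = transform

    def add_column_transform_regexp(
        self, table_regexp: str, col_regexp: str, transform: Transform
    ) -> None:
        self._regexp_transforms.append((table_regexp, col_regexp, transform))

    def _realize_regexp_transforms(self, table_name: str, col_names: List[str]) -> None:
        """Expand matching patterns into explicit entries for one table."""
        explicit = self._transforms.setdefault(table_name, {})
        for table_regexp, col_regexp, transform in self._regexp_transforms:
            # re.fullmatch: the whole name must match, not just a prefix
            if re.fullmatch(table_regexp, table_name) is None:
                continue
            for col_name in col_names:
                if re.fullmatch(col_regexp, col_name):
                    # assumption: an explicit transform (or an earlier pattern) wins
                    explicit.setdefault(col_name, transform)

    def setup_table(self, table_name: str, col_names: List[str]) -> None:
        # schema generation: the only place where patterns are evaluated
        self._realize_regexp_transforms(table_name, col_names)

    def write_row(self, table_name: str, row: Dict[str, object]) -> Dict[str, object]:
        # per-row path: plain dict lookups only, no regexp matching
        transforms = self._transforms.get(table_name, {})
        return {
            col: transforms[col](value) if col in transforms else value
            for col, value in row.items()
        }


writer = ToyTableWriter()
writer.add_column_transform_regexp(r"tel_\d+", r".*_energy", lambda x: round(x, 2))
writer.setup_table("tel_001", ["obs_id", "reco_energy"])
print(writer.write_row("tel_001", {"obs_id": 1, "reco_energy": 1.23456}))
# {'obs_id': 1, 'reco_energy': 1.23}
```

Because `re.fullmatch` is used, `tel_\d+` matches `tel_001` but not `tel_001_extra`; a plain `re.match` would accept both.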
Codecov Report
```diff
@@            Coverage Diff             @@
##           master    #1690      +/-   ##
==========================================
+ Coverage   90.52%   90.57%   +0.04%
==========================================
  Files         186      186
  Lines       13998    14065      +67
==========================================
+ Hits        12672    12739      +67
  Misses       1326     1326
```
LGTM
This feature is needed in order to make the DL2 output of #1673 work if we split tables by `tel_id`, since it allows one to add column transforms to all tables for each algorithm without a huge and complex loop over all possible names.
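To illustrate the split-by-`tel_id` use case, here is a hedged sketch comparing an explicit loop over every table name with one pattern per algorithm. The table paths, column names, and the `realize_regexp_transforms()` function below are hypothetical stand-ins, not ctapipe's actual DL2 layout or helper; only the `re.fullmatch` semantics come from the description above.

```python
import re
from typing import Callable, Dict, List, Tuple

Transform = Callable[[float], float]
Pattern = Tuple[str, str, Transform]


def realize_regexp_transforms(
    patterns: List[Pattern], table_columns: Dict[str, List[str]]
) -> Dict[str, Dict[str, Transform]]:
    """Expand (table_regexp, col_regexp, transform) patterns into explicit dicts."""
    explicit: Dict[str, Dict[str, Transform]] = {}
    for table_name, col_names in table_columns.items():
        for table_regexp, col_regexp, transform in patterns:
            if re.fullmatch(table_regexp, table_name) is None:
                continue
            for col_name in col_names:
                if re.fullmatch(col_regexp, col_name):
                    explicit.setdefault(table_name, {}).setdefault(col_name, transform)
    return explicit


def round_energy(value: float) -> float:
    return round(value, 3)


ALGORITHMS = ["HillasReconstructor", "ImPACT"]
TEL_IDS = [1, 2, 5]

# Hypothetical layout: one table per (algorithm, telescope)
table_columns = {
    f"dl2/{algo}/tel_{tel_id:03d}": ["obs_id", "event_id", f"{algo}_energy"]
    for algo in ALGORITHMS
    for tel_id in TEL_IDS
}

# Without patterns: one explicit entry per table name, which requires knowing TEL_IDS
by_loop = {
    f"dl2/{algo}/tel_{tel_id:03d}": {f"{algo}_energy": round_energy}
    for algo in ALGORITHMS
    for tel_id in TEL_IDS
}

# With patterns: one registration per algorithm, independent of TEL_IDS
patterns = [(rf"dl2/{algo}/tel_\d+", rf"{algo}_energy", round_energy) for algo in ALGORITHMS]
by_pattern = realize_regexp_transforms(patterns, table_columns)

print(by_pattern == by_loop)  # True
```

Both produce the same explicit mapping, but the pattern version does not need the list of telescope IDs when the transforms are registered; names are only resolved against the tables that actually exist.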
Technical note: we could replace `add_column_transform()` with this implementation, i.e. always treat all transforms as regexps and hide this as an implementation detail. I decided instead to add a separate API function for adding transforms by regexp, since otherwise schema generation could become slow when there are many tables (e.g. DL1 files).
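A rough operation count (not a benchmark) shows why treating every transform as a regexp could hurt when there are many tables. The layout below (100 per-telescope tables of 5 columns, one transform per column, with strings standing in for transform objects) and both counting functions are hypothetical.

```python
import re
from typing import Dict, List, Tuple

TableColumns = Dict[str, List[str]]


def explicit_lookups(exact: Dict[str, Dict[str, str]], table_columns: TableColumns) -> int:
    """Count dict lookups when transforms are keyed by exact table/column name."""
    lookups = 0
    for table_name, col_names in table_columns.items():
        per_table = exact.get(table_name, {})
        for col_name in col_names:
            per_table.get(col_name)  # hash lookup, independent of how many transforms exist
            lookups += 1
    return lookups


def regexp_matches(patterns: List[Tuple[str, str, str]], table_columns: TableColumns) -> int:
    """Count re.fullmatch calls when every transform is treated as a pattern."""
    matches = 0
    for table_name, col_names in table_columns.items():
        for table_regexp, col_regexp, _ in patterns:
            matches += 1
            if re.fullmatch(table_regexp, table_name) is None:
                continue
            for col_name in col_names:
                matches += 1
                re.fullmatch(col_regexp, col_name)  # result unused; only counting calls
    return matches


# Hypothetical DL1-like layout: 100 tables with 5 columns, one transform per column
table_columns = {f"dl1/tel_{i:03d}": [f"col_{j}" for j in range(5)] for i in range(100)}
exact = {t: {c: "transform" for c in cols} for t, cols in table_columns.items()}
# Exact names turned into patterns; re.escape keeps them literal
patterns = [
    (re.escape(t), re.escape(c), "transform") for t, cols in table_columns.items() for c in cols
]

print(explicit_lookups(exact, table_columns))  # 500
print(regexp_matches(patterns, table_columns))  # 52500
```

With one exact-name transform per column, the explicit version does one lookup per column, while the all-regexp version makes every table test every registered pattern, so the count grows roughly quadratically with the number of tables.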