-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement DataValidator.apply()
#368
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, some smaller comments in line and one file to delete (I think).
After that, good to merge.
error_list = [] | ||
|
||
with adjust_log_level(): | ||
for item in self.criteria_items: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could turn this into a single list comprehension and use the walrus operator:
if error_list := [
" Criteria: "
+ ", ".join([f"{key}: {value}" for key, value in item.criteria.items()])
+ "\n"
+ textwrap.indent(str(df.validate(**item.criteria)), prefix=" ")
+ "\n"
for item in self.criteria_items
if df.validate(**item.criteria) is not None
]:
logger.error(
"Failed data validation (file %s):\n%s",
get_relative_path(self.file),
"\n".join(error_list),
)
not sure if that's more readable though.
Feel free to keep whatever is most readable to you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems less readable than the current implementation, so suggest to get it unless we make a utility-function that moves the string-concat somewhere else - maybe together with a refactor of the RequiredDataValidator
?
This is the implementation for the
apply
method of theDataValidator
.The current implementation writes each validation-item to the log separately, together with the failing data rows.
I initially had all failing validations (per yaml file) concatenated to one DataFrame, but I then thought that it would not be helpful if users can't see which criteria they specifically do not pass.
If you agree with this approach in principle, I can make the error-message a bit nicer (indentation, more concise criteria representation).