fix/warnings_sort_QualityEngine.report #11

Merged: 11 commits, Sep 1, 2021

Conversation

jfsantos-ds
Contributor

Core changes

  • QualityEngine warnings are now stored in a list
  • store_warning now calls append (the list method) instead of add
  • Updated the report method, which now properly sorts warnings by priority
  • Added warning counts by priority
  • Reproduced these changes in the DataQuality class
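
For illustration, the sorting and per-priority counting could look roughly like the sketch below. It is not the code from this PR: the `QualityWarning` stand-in, the `summarize` helper, and the assumption that a lower priority number means a more severe warning are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityWarning:
    """Hypothetical stand-in for the library's warning class."""
    test: str
    priority: int  # assumption: lower number = more severe
    description: str


def summarize(warnings):
    """Return the warnings sorted by priority plus a count per priority."""
    # sorted() is stable, so warnings with equal priority keep insertion order.
    ordered = sorted(warnings, key=lambda w: w.priority)
    counts = Counter(w.priority for w in ordered)
    return ordered, dict(sorted(counts.items()))


warnings = [
    QualityWarning("Duplicates", 2, "Found 5 duplicate rows."),
    QualityWarning("Missing Values", 1, "Column 'age' has 30% missing values."),
    QualityWarning("Label Noise", 2, "Found 3 suspicious labels."),
]

ordered, counts = summarize(warnings)
for w in ordered:
    print(f"[priority {w.priority}] {w.test}: {w.description}")
print(counts)
```

Keeping the storage as a list and sorting only when reporting preserves insertion order among warnings that share a priority.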

Minor changes

  • Updated all engine scripts (changed every engine._warnings.add call to engine.store_warning)
  • Updated all notebooks; engine.warnings no longer needs to be converted to a list
  • Added missing examples of single-warning inspection to the notebooks
  • Fixed LabelInspector engine warning descriptions

jfsantos-ds added the "fix" (A bug fix) label Aug 19, 2021
jfsantos-ds self-assigned this Aug 19, 2021
@@ -68,15 +67,14 @@ def dtypes(self, dtypes: dict):

     def store_warning(self, warning: QualityWarning):
         "Adds a new warning to the internal 'warnings' storage."
-        self._warnings.add(warning)
+        self._warnings.append(warning)
Contributor

We should maintain uniqueness of warnings; otherwise, running the same failing test multiple times will generate duplicate warnings, which are meaningless. This is solved for the .report method with the set, but not for the .warnings property. Wdyt of the strategy below?

    def store_warning(self, warning: QualityWarning):
        "Adds a new warning to the internal 'warnings' storage."
        if warning not in self._warnings:
            self._warnings.append(warning)

Contributor Author

Seems right, if the following holds true. When we do:

warning not in self._warnings

we are running the warning's __eq__ method against each element of _warnings, right? Building a set of objects also relies on __eq__ to filter unique items, right?

The user might raise the same warning, in the same test, but with different parameters. We would not filter it out in that case using your proposal, right?

Besides that, I wonder whether self.warnings should be a publicly accessible property rather than private. I think we should probably only give access through the report and get_warnings methods. ATM get_warnings does not filter unique warnings, but it should, right at the beginning, just like report. What do you think?
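
The membership question can be checked with a small experiment. The sketch below uses a hypothetical two-field `QualityWarning` stand-in rather than the library's class. List membership (`in` / `not in`) checks identity and then `__eq__`, while a set hashes each object first and only then falls back to `__eq__`, so a class that defines `__eq__` also needs a consistent `__hash__` to be usable in a set.

```python
class QualityWarning:
    """Hypothetical stand-in; the real class has more fields."""

    def __init__(self, test, description):
        self.test = test
        self.description = description

    def __eq__(self, other):
        if not isinstance(other, QualityWarning):
            return NotImplemented
        return (self.test, self.description) == (other.test, other.description)

    # Defining __eq__ without __hash__ makes instances unhashable,
    # so a set needs a hash that is consistent with __eq__.
    def __hash__(self):
        return hash((self.test, self.description))


a = QualityWarning("Missing Values", "30% missing in 'age'")
b = QualityWarning("Missing Values", "30% missing in 'age'")  # same test, same parameters
c = QualityWarning("Missing Values", "45% missing in 'age'")  # same test, different parameters

stored = []
for w in (a, b, c):
    if w not in stored:  # list membership: identity check, then __eq__
        stored.append(w)

unique = {a, b, c}  # set: __hash__ first, then __eq__

print(len(stored), len(unique))
```

In this sketch the rerun `b` is dropped, while `c` is kept because its description differs.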

Contributor Author

jfsantos-ds, Aug 24, 2021

Removing warnings as an accessible property would mean switching 3 occurrences of self.warnings to self._warnings and removing the warnings property definition in the core engine.
Across the different tutorials, the sample warning demo would also have to change to engine.get_warnings()[x].
Wdyt?

Contributor

UrbanoFonseca, Aug 30, 2021

  1. I'm not 100% sure how Python implements this, but from local testing it seems that defining __eq__ is enough to test for presence in lists (as we do with warning not in self._warnings). Our __eq__ compares the {category, test, description, priority} attributes, so if two warnings of the same test have different parameters, {category, test} are the same but at least {description} should differ (we often add success/failure metrics to the warning description).
  2. Given the necessary sort (by priority) and the optional filtering, we can remove warnings as a property and keep only the get_warnings method.
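
A minimal sketch of both points, under stated assumptions: the `Engine` class, the `priority` filter argument, and the idea that a lower priority number is more severe are hypothetical, and the dataclass-generated `__eq__` stands in for the hand-written comparison over {category, test, description, priority}.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class QualityWarning:
    """Hypothetical stand-in; the generated __eq__ compares all four fields."""
    category: str
    test: str
    description: str
    priority: int  # assumption: lower number = more severe


class Engine:
    """Hypothetical engine exposing warnings only through get_warnings."""

    def __init__(self):
        self._warnings: List[QualityWarning] = []

    def store_warning(self, warning: QualityWarning) -> None:
        "Adds a new warning unless an equal one is already stored."
        if warning not in self._warnings:  # relies on __eq__
            self._warnings.append(warning)

    def get_warnings(self, priority: Optional[int] = None) -> List[QualityWarning]:
        "Returns stored warnings sorted by priority, optionally filtered."
        selected = [w for w in self._warnings
                    if priority is None or w.priority == priority]
        return sorted(selected, key=lambda w: w.priority)


engine = Engine()
engine.store_warning(QualityWarning("Duplicates", "Exact Duplicates", "Found 5 duplicates.", 2))
engine.store_warning(QualityWarning("Missing", "High Missings", "Found 30% missing values.", 1))
# Rerunning the same test with the same parameters yields an equal warning: skipped.
engine.store_warning(QualityWarning("Missing", "High Missings", "Found 30% missing values.", 1))
# Different parameters change the description, so this one is kept.
engine.store_warning(QualityWarning("Missing", "High Missings", "Found 45% missing values.", 1))

for w in engine.get_warnings():
    print(w.priority, w.test, w.description)
```

With the property removed, a single warning would be inspected through the accessor, for example `engine.get_warnings()[0]`.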

Contributor Author

jfsantos-ds left a comment

Seems like nothing odd remains.

Contributor

UrbanoFonseca left a comment

Nice work! 👍 🚀

UrbanoFonseca merged commit ffac9f2 into master Sep 1, 2021
portellaa deleted the fix/report_method_qualityengine branch September 23, 2021 14:53
2 participants