fix/warnings_sort_QualityEngine.report #11

jfsantos-ds · 2021-08-19T13:40:28Z

Core changes

QualityEngine warnings is now a list
Changed store_warning to call append (list method) instead of add
Updated report method, now properly sorts warnings based on priority
Added warning counts by priority
Reproduced changes in the DataQuality class
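A minimal sketch of the sorting and counting described above, not the actual report code. SketchWarning and summarize are hypothetical names, and the integer priority where a lower number means more severe is an assumption made for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchWarning:
    """Hypothetical stand-in for QualityWarning (not the library class)."""
    test: str
    description: str
    priority: int  # assumption: lower number = more severe


def summarize(warnings):
    """Return the warnings sorted by priority and a per-priority count."""
    # sorted() is stable, so warnings of equal priority keep insertion order.
    ordered = sorted(warnings, key=lambda w: w.priority)
    counts = Counter(w.priority for w in ordered)
    return ordered, dict(sorted(counts.items()))


warnings = [
    SketchWarning("Duplicate Columns", "Found 2 duplicate columns.", 1),
    SketchWarning("Flatlines", "Found 3 flatline events.", 2),
    SketchWarning("Exact Duplicates", "Found 10 duplicate rows.", 2),
    SketchWarning("High Missings", "Column x is 90% missing.", 0),
]
ordered, counts = summarize(warnings)
for priority, count in counts.items():
    print(f"Priority {priority}: {count} warning(s)")
```

Because a list keeps insertion order and sorted() is stable, the report order is deterministic, which a set cannot guarantee.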

Minor changes

Updated all engine scripts (changed all engine._warnings.add to engine.store_warning)
Updated all notebooks, engine.warnings no longer needs to be converted to a list
Added missing examples of single warning inspection to notebooks
Fixed LabelInspector engine warning descriptions

UrbanoFonseca · 2021-08-24T10:02:30Z

src/ydata_quality/core/engine.py

@@ -68,15 +67,14 @@ def dtypes(self, dtypes: dict):

    def store_warning(self, warning: QualityWarning):
        "Adds a new warning to the internal 'warnings' storage."
-        self._warnings.add(warning)
+        self._warnings.append(warning)


We should maintain uniqueness of warnings; otherwise, running the same failing test multiple times will generate duplicated warnings, which are meaningless. This is solved for the .report method with the set, but not for the .warnings property. Wdyt of the strategy below?

    def store_warning(self, warning: QualityWarning):
        "Adds a new warning to the internal 'warnings' storage."
        if warning not in self._warnings:
            self._warnings.append(warning)
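To make the effect of this proposal concrete, here is a small self-contained sketch. SketchWarning and SketchEngine are hypothetical stand-ins, not the library classes; the dataclass-generated __eq__ compares all fields, which is only an approximation of how QualityWarning compares.

```python
from dataclasses import dataclass


@dataclass
class SketchWarning:
    """Hypothetical stand-in; the dataclass generates __eq__ over all fields."""
    test: str
    description: str


class SketchEngine:
    """Hypothetical stand-in for the engine's warning storage."""

    def __init__(self):
        self._warnings = []

    def store_warning(self, warning):
        "Adds a new warning unless an equal one is already stored."
        if warning not in self._warnings:
            self._warnings.append(warning)


engine = SketchEngine()
# Running the same failing test three times stores a single warning.
for _ in range(3):
    engine.store_warning(SketchWarning("Duplicates", "Found 10 duplicate rows."))
# A different description makes the warning unequal, so it is kept.
engine.store_warning(SketchWarning("Duplicates", "Found 4 duplicate rows."))
print(len(engine._warnings))
```

The membership check scans the list on every call, so each store costs time linear in the number of stored warnings; for the handful of warnings an engine produces, that is unlikely to matter.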

Seems right if this holds true. When we do:

warning not in self._warnings

we are running the warning's __eq__ method against each element of _warnings, right? Building a set of objects also resorts to __eq__ to filter uniques, right?

The user might raise the same warning, in the same test but with different parameters. We would not filter in that case using your proposal, right?

Besides that, I wonder whether self.warnings should be an accessible property rather than private. I think we should probably only give access to the report and get_warnings methods. ATM get_warnings does not filter unique warnings, but it should, right at the beginning, just like report. What do you think?

Removing warnings as an accessible property would mean switching 3 occurrences of self.warnings to self._warnings and removing the warnings property definition in core engine.
Across the different tutorials the sample warning demo should be changed to engine.get_warnings()[x] too.
Wdyt?

I'm not 100% sure what the Python implementation is, but from local testing it seems that defining __eq__ is enough to test presence in lists (as we do with warning not in self._warnings). In our implementation, __eq__ compares the {category, test, description, priority} attributes, so if two warnings from the same test have different parameters, the {category, test} are the same but at least the {description} should differ (we often add some success/failure metrics to the warning description).
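On the language question raised in this thread: in Python 3, list membership (in / not in) checks identity and then falls back to __eq__, while a set also needs __hash__, and a class that defines __eq__ without __hash__ becomes unhashable. The sketch below illustrates this with hypothetical classes; it makes no claim about whether QualityWarning itself defines __hash__.

```python
class EqOnlyWarning:
    """Hypothetical class that defines __eq__ but not __hash__."""

    def __init__(self, test, description):
        self.test = test
        self.description = description

    def _key(self):
        return (self.test, self.description)

    def __eq__(self, other):
        if not isinstance(other, EqOnlyWarning):
            return NotImplemented
        return self._key() == other._key()


class HashableWarning(EqOnlyWarning):
    """Adds a __hash__ consistent with __eq__, so instances work in sets."""

    def __hash__(self):
        return hash(self._key())


a = EqOnlyWarning("Duplicates", "Found 10 duplicate rows.")
b = EqOnlyWarning("Duplicates", "Found 10 duplicate rows.")
c = EqOnlyWarning("Duplicates", "Found 4 duplicate rows.")

same_found = b in [a]       # list membership uses __eq__
different_found = c in [a]  # different description, so not equal

try:
    {a, b}                  # sets require hashable elements
    set_ok = True
except TypeError:
    set_ok = False

unique = {
    HashableWarning("Duplicates", "Found 10 duplicate rows."),
    HashableWarning("Duplicates", "Found 10 duplicate rows."),
}
```

So the list-based check only needs __eq__, whereas deduplicating through a set needs __eq__ and __hash__ to agree on which attributes identify a warning.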

Given the necessary sort (by priority) and the optional filtering, we can remove warnings as a property and keep only the get_warnings method.
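A rough sketch of what a get_warnings-only interface could look like, assuming optional filters on category, test and priority and an integer priority where lower means more severe. The class names, the filter parameters and the priority encoding are assumptions for illustration, not the merged implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SketchWarning:
    """Hypothetical stand-in for QualityWarning."""
    category: str
    test: str
    description: str
    priority: int  # assumption: lower number = more severe


class SketchEngine:
    """Hypothetical engine exposing warnings only through get_warnings."""

    def __init__(self):
        self._warnings: List[SketchWarning] = []

    def store_warning(self, warning: SketchWarning) -> None:
        # Uniqueness is enforced at storage time.
        if warning not in self._warnings:
            self._warnings.append(warning)

    def get_warnings(self, category: Optional[str] = None,
                     test: Optional[str] = None,
                     priority: Optional[int] = None) -> List[SketchWarning]:
        "Returns stored warnings sorted by priority, optionally filtered."
        selected = [
            w for w in self._warnings
            if (category is None or w.category == category)
            and (test is None or w.test == test)
            and (priority is None or w.priority == priority)
        ]
        return sorted(selected, key=lambda w: w.priority)


engine = SketchEngine()
engine.store_warning(SketchWarning("Duplicates", "Exact Duplicates", "Found 10 duplicate rows.", 2))
engine.store_warning(SketchWarning("Labels", "Missing labels", "Found 5 missing labels.", 1))
engine.store_warning(SketchWarning("Duplicates", "Duplicate Columns", "Found 2 duplicate columns.", 2))
first = engine.get_warnings()[0]  # single warning inspection, as in the tutorials
```

Keeping the sort and filters inside one method means every caller, including the notebooks, sees the same ordering.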


jfsantos-ds

Seems like nothing odd remains.

UrbanoFonseca

Nice work! 👍 🚀

Francisco Santos added 2 commits August 19, 2021 11:46

fixed warnings sorting in qualityengine.report

dc1eb25

Warning store, reporting and view in full suite

01780f7

jfsantos-ds added the fix (A bug fix) label Aug 19, 2021

jfsantos-ds requested a review from UrbanoFonseca August 19, 2021 13:40

jfsantos-ds self-assigned this Aug 19, 2021

UrbanoFonseca reviewed Aug 24, 2021


jfsantos-ds requested a review from UrbanoFonseca August 24, 2021 16:03

Francisco Santos and others added 4 commits August 31, 2021 10:28

Drop and chang warnings prop to get_warnings

ccedee1

Merge branch 'master' into fix/report_method_qualityengine

5545a63

Reset previous commit. Keep ipynb changes

d70d021

Merge branch 'fix/report_method_qualityengine' of https://github.com/…

2a28299

…ydataai/ydata-quality into fix/report_method_qualityengine

jfsantos-ds commented Aug 31, 2021


Francisco Santos added 5 commits August 31, 2021 14:10

Removing lingering warnings property calls

1e4bcee

remove warnings property from DataQuality class

4615afe

add __clean_warnings method to base classes

c9144b3

Fix warnings list reset on expectations module

789c3bf

Fix expectations ipynb sample warning

9e73bb4

UrbanoFonseca approved these changes Sep 1, 2021


UrbanoFonseca merged commit ffac9f2 into master Sep 1, 2021

portellaa deleted the fix/report_method_qualityengine branch September 23, 2021 14:53
