Great Expectations Plugin #495
Signed-off-by: Samhita Alla <[email protected]>
…d of same dataclass name in both task and schema Signed-off-by: Samhita Alla <[email protected]>
e08b0f0 to 2ac96ab
@kumare3 Added support for FlyteSchema and FlyteFile. Please review.
Thanks for these updates, @samhita-alla! This looks great.
…ling on the newer versions (will have to fix this) Signed-off-by: Samhita Alla <[email protected]>
Just a heads up: I pinned the Great Expectations version to 0.13.23. With the latest version, test collection fails with the following error:
_______ ERROR collecting tests/scripts/test_flytekit_sagemaker_runner.py _______
ImportError while importing test module '/home/runner/work/flytekit/flytekit/tests/scripts/test_flytekit_sagemaker_runner.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/hostedtoolcache/Python/3.7.11/x64/lib/python3.7/importlib/__init__.py:127: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/scripts/test_flytekit_sagemaker_runner.py:5: in <module>
    from scripts.flytekit_sagemaker_runner import run as _flyte_sagemaker_run
E   ModuleNotFoundError: No module named 'scripts.flytekit_sagemaker_runner'
Hi @samhita-alla, thanks for letting us know. That error doesn't look like it would be due to Great Expectations. Do you have any ideas about why it might be occurring? I think pinning will be OK temporarily, but we're in the process of rolling out some important fixes and powerful features, so it would be good to unpin in the near future.
I'm not sure why it's happening. We will have to dig deeper to find out. I'm sure the problem is on our end.
Signed-off-by: Eduardo Apolinario <[email protected]>
I was able to repro this. This was caused by great-expectations/great_expectations#3003, which ends up defining a package called […]. I opened #598 to fix the issue.
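For anyone debugging a similar import failure: when an installed distribution defines a top-level package with the same name as a local one, the import system may resolve that name to the wrong location. Below is a minimal sketch for checking where a top-level name resolves from, using only the standard library. The helper name `resolve_top_level` is hypothetical and is not part of flytekit or Great Expectations.

```python
import importlib.util
from typing import List, Optional


def resolve_top_level(name: str) -> Optional[List[str]]:
    """Return the locations a top-level import name resolves to.

    Returns None when the name cannot be found at all. The name is only
    looked up, not imported.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    if spec.submodule_search_locations is not None:
        # A package: report the directories searched for its submodules.
        return list(spec.submodule_search_locations)
    # A plain module: report its origin (a file path, or e.g. "built-in").
    return [spec.origin] if spec.origin else []


if __name__ == "__main__":
    # "scripts" is the top-level name from the traceback above.
    print(resolve_top_level("scripts"))
```

If the printed location points into `site-packages` rather than the repository checkout, another distribution is likely shadowing the local package.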
…o great-expectations-plugin
@samhita-alla this is great. I have a few nits, and we can handle them in two ways. I am OK with merging and then revisiting the nits, especially refactoring the extremely large functions and handling other types, either by raising an error or by supporting them.
}

# Great Expectations' RuntimeBatchRequest
if self._batch_request_config and (self._batch_request_config.runtime_parameters or is_runtime):
Is this always JSON, not a typed structure?
_batch_request_config is a dataclass, whereas final_batch_request is a dictionary.
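To illustrate the distinction, here is a minimal sketch of a typed dataclass config that is flattened into a plain dictionary request. It is not the plugin's actual code: the class `BatchRequestConfig`, the field `data_connector_query`, and both helper functions are assumptions made for illustration. Only `runtime_parameters` and `is_runtime` appear in the snippet under review.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class BatchRequestConfig:
    """Hypothetical typed config; only runtime_parameters comes from the PR."""

    data_connector_query: Optional[Dict[str, Any]] = None
    runtime_parameters: Optional[Dict[str, Any]] = None


def wants_runtime_request(config: Optional[BatchRequestConfig], is_runtime: bool) -> bool:
    """Mirror the condition in the reviewed snippet."""
    return bool(config and (config.runtime_parameters or is_runtime))


def to_batch_request_dict(
    base: Dict[str, Any], config: Optional[BatchRequestConfig]
) -> Dict[str, Any]:
    """Merge the fields that were set on the dataclass into a plain dict."""
    final_batch_request = dict(base)  # copy so the caller's dict is untouched
    if config is not None:
        set_fields = {k: v for k, v in asdict(config).items() if v is not None}
        final_batch_request.update(set_fields)
    return final_batch_request
```

The dataclass gives callers a typed, validated surface, while the dictionary is the loosely typed shape that can be passed on as keyword arguments.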
Signed-off-by: Samhita Alla [email protected]
TL;DR
Add a Great Expectations plugin to validate data.
Type
Are all requirements met?
Complete description
TASK
Define GETask and validate the dataset by passing it as an argument to the task.
Valid data example:
Invalid data example:
Output:
TYPE
Attach GEType to your dataset in a task, which then automatically validates your data.
Examples concerning FlyteSchema and FlyteFile are available in the tests.
Tracking Issue
https://github.com/lyft/flyte/issues/
Follow-up issue
NA
OR
https://github.com/lyft/flyte/issues/