[Bug] facet dataQualityAssertions is not displayed since 0.33.0 #2503

Closed
YLibert opened this issue Jun 7, 2023 · 5 comments · Fixed by #2528

Comments

YLibert commented Jun 7, 2023

Since 0.33.0, the dataQualityAssertions facet is no longer displayed in Marquez.

How to reproduce

The cURL command I'm using:
curl -X POST http://localhost:9091/api/v1/lineage \
  -H 'Content-Type: application/json' \
  -d '{
    "eventTime": "2023-06-06T18:00:00",
    "eventType": "COMPLETE",
    "inputs": [
      {
        "namespace": "Healthcheck",
        "name": "public.delivery_7_days",
        "facets": {
          "dataQualityAssertions": {
            "_producer": "https://github.com/MarquezProject/marquez/blob/main/docker/metadata.json",
            "_schemaURL": "https://openlineage.io/spec/facets/1-0-0/DataQualityAssertionsDatasetFacet.json",
            "assertions": [
              {"assertion": "not_null", "success": false, "column": "driver_id"},
              {"assertion": "is_string", "success": true, "column": "customer_address"}
            ]
          }
        }
      }
    ],
    "job": {"facets": {}, "name": "CheckIsAlive", "namespace": "Healthcheck"},
    "outputs": [],
    "producer": "https://github.com/OpenLineage/OpenLineage/tree/0.17.0/client/python",
    "run": {"facets": {}, "runId": "test003"}
  }'
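
For anyone who prefers to script the reproduction, here is a minimal Python sketch that builds the same event as the cURL command above and prepares the POST request using only the standard library. The function names (`build_event`, `build_request`, `send_event`) are hypothetical helpers for this example, not part of Marquez or the OpenLineage client; the URL and payload are copied from the cURL command.

```python
import json
import urllib.request

# Endpoint taken from the cURL command; adjust host/port for your setup.
MARQUEZ_LINEAGE_URL = "http://localhost:9091/api/v1/lineage"


def build_event(run_id="test003"):
    """Build the same OpenLineage COMPLETE event as the cURL command."""
    assertions_facet = {
        "_producer": "https://github.com/MarquezProject/marquez/blob/main/docker/metadata.json",
        "_schemaURL": "https://openlineage.io/spec/facets/1-0-0/DataQualityAssertionsDatasetFacet.json",
        "assertions": [
            {"assertion": "not_null", "success": False, "column": "driver_id"},
            {"assertion": "is_string", "success": True, "column": "customer_address"},
        ],
    }
    return {
        "eventTime": "2023-06-06T18:00:00",
        "eventType": "COMPLETE",
        "inputs": [
            {
                "namespace": "Healthcheck",
                "name": "public.delivery_7_days",
                # The facet is attached to an *input* dataset, as in the report.
                "facets": {"dataQualityAssertions": assertions_facet},
            }
        ],
        "job": {"facets": {}, "name": "CheckIsAlive", "namespace": "Healthcheck"},
        "outputs": [],
        "producer": "https://github.com/OpenLineage/OpenLineage/tree/0.17.0/client/python",
        "run": {"facets": {}, "runId": run_id},
    }


def build_request(event, url=MARQUEZ_LINEAGE_URL):
    """Prepare (but do not send) the POST request for the event."""
    return urllib.request.Request(
        url,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_event(event, url=MARQUEZ_LINEAGE_URL):
    """Send the event; requires a running Marquez API at `url`."""
    with urllib.request.urlopen(build_request(event, url)) as response:
        return response.status
```

Calling `send_event(build_event())` against a local instance should be equivalent to running the cURL command.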

Expected behavior (until 0.32.0):

image

Current behavior (for 0.33.0 and later):

image

YLibert commented Jun 7, 2023

What I've checked so far:

  • later versions still write the facets to the database correctly
  • this bug only affects marquez (API): later versions of marquez-web still display the facet when used with marquez-api 0.32.0

YLibert commented Jun 7, 2023

Looking deeper into this issue, I found that sending dataQualityAssertions this way (as a facet on an input dataset) produces a dataset facet stored with the type INPUT:
image

This type is not compatible with the latest version of the DAO:

image

If I change the type to DATASET, it works again.
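
To make the symptom concrete, here is a small in-memory Python model of the behavior described above. It is not Marquez's DAO code: `FacetRow`, `visible_facets`, and the idea that the reader filters on an allowed set of types are assumptions made for illustration, and the `schema` row is invented sample data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FacetRow:
    """Hypothetical stand-in for one stored dataset facet row."""
    dataset: str
    name: str
    type: str  # "DATASET" or "INPUT", the two types mentioned in this thread


def visible_facets(rows, allowed_types=("DATASET",)):
    """Return the facet names a reader restricted to `allowed_types` would see."""
    return [row.name for row in rows if row.type in allowed_types]


rows = [
    # Stored as INPUT because the facet was sent on an input dataset.
    FacetRow("public.delivery_7_days", "dataQualityAssertions", "INPUT"),
    # Invented row, only here to show a facet that stays visible.
    FacetRow("public.delivery_7_days", "schema", "DATASET"),
]

print(visible_facets(rows))  # ['schema']
print(visible_facets(rows, allowed_types=("DATASET", "INPUT")))  # ['dataQualityAssertions', 'schema']
```

Under this assumption, the facet is still stored but is filtered out on read, which would match both observations: the data is in the database, and switching the type to DATASET makes it reappear.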

YLibert commented Jun 7, 2023

I also tested from scratch with a fresh Marquez instance, using ./docker/up.sh with the --seed option provided in the repo.
In the food_delivery namespace, the dataset public.delivery_7_days should have a quality assertion (it is indeed in the db, with the INPUT type), but it doesn't appear in the Marquez UI.
As before, if I manually change the type of the dataQualityAssertions dataset facet to DATASET, it reappears in the UI.

YLibert commented Jun 16, 2023

For information, here is the corresponding run fetched from the getRun endpoint:

{
    "id": "d5a2a4c4-fc78-428d-ae85-08c942ed8371",
    "createdAt": "2020-02-22T22:42:42Z",
    "updatedAt": "2020-02-22T22:48:12Z",
    "nominalStartTime": "2020-02-22T22:00:00Z",
    "nominalEndTime": "2020-02-22T22:00:00Z",
    "state": "COMPLETED",
    "startedAt": "2020-02-22T22:42:42Z",
    "endedAt": "2020-02-22T22:48:12Z",
    "durationMs": 330000,
    "args": {
        "nominal_start_time": "2020-02-22T22:00Z[UTC]",
        "nominal_end_time": "2020-02-22T22:00Z[UTC]"
    },
    "jobVersion": {
        "namespace": "food_delivery",
        "name": "etl_delivery_7_days",
        "version": "c50792dd-7657-31b5-8e33-3ea014a8096b"
    },
    "inputDatasetVersions": [
        {
            "datasetVersionId": {
                "namespace": "food_delivery",
                "name": "public.orders_7_days",
                "version": "d09633c4-4412-36de-bce6-8002c662e18a"
            },
            "facets": {}
        },
        {
            "datasetVersionId": {
                "namespace": "food_delivery",
                "name": "public.customers",
                "version": "68c1e307-f6bb-36f9-8596-14609c7f022b"
            },
            "facets": {}
        },
        {
            "datasetVersionId": {
                "namespace": "food_delivery",
                "name": "public.order_status",
                "version": "676ac323-c8e3-3cea-b172-b468827afb51"
            },
            "facets": {}
        },
        {
            "datasetVersionId": {
                "namespace": "food_delivery",
                "name": "public.drivers",
                "version": "93ae26cc-87d8-3eae-9cdd-f9b6fd71f1f7"
            },
            "facets": {}
        },
        {
            "datasetVersionId": {
                "namespace": "food_delivery",
                "name": "public.restaurants",
                "version": "4db26821-7966-390f-9cf9-ac775fe9182b"
            },
            "facets": {}
        }
    ],
    "outputDatasetVersions": [
        {
            "datasetVersionId": {
                "namespace": "food_delivery",
                "name": "public.delivery_7_days",
                "version": "6f8f52f5-0230-31ce-a138-08b79e671b33"
            },
            "facets": {}
        }
    ],
    "facets": {
        "nominalTime": {
            "_producer": "https://github.com/MarquezProject/marquez/blob/main/docker/metadata.json",
            "_schemaURL": "https://openlineage.io/spec/facets/1-0-0/NominalTimeRunFacet.json",
            "nominalEndTime": "2020-02-22T22:00:00Z",
            "nominalStartTime": "2020-02-22T22:00:00Z"
        }
    }
}

This is taken directly from the data seeded by the ./docker/up.sh --seed command within the repo. As you can see, the dataQualityAssertions facet of public.delivery_7_days is missing.
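
A quick way to check a getRun response for the facet is sketched below in Python. `datasets_with_facet` is a hypothetical helper written for this example; the field names (`inputDatasetVersions`, `outputDatasetVersions`, `datasetVersionId`, `facets`) are taken from the response above, and `sample_run` is a trimmed copy of it.

```python
def datasets_with_facet(run, facet_name):
    """Return names of input/output dataset versions whose facets include facet_name."""
    names = []
    for key in ("inputDatasetVersions", "outputDatasetVersions"):
        for dataset_version in run.get(key, []):
            if facet_name in dataset_version.get("facets", {}):
                names.append(dataset_version["datasetVersionId"]["name"])
    return names


# Trimmed copy of the getRun response (one input and one output kept).
sample_run = {
    "inputDatasetVersions": [
        {
            "datasetVersionId": {
                "namespace": "food_delivery",
                "name": "public.orders_7_days",
                "version": "d09633c4-4412-36de-bce6-8002c662e18a",
            },
            "facets": {},
        }
    ],
    "outputDatasetVersions": [
        {
            "datasetVersionId": {
                "namespace": "food_delivery",
                "name": "public.delivery_7_days",
                "version": "6f8f52f5-0230-31ce-a138-08b79e671b33",
            },
            "facets": {},
        }
    ],
}

# No dataset version carries the facet in the affected response.
print(datasets_with_facet(sample_run, "dataQualityAssertions"))  # []
```

Run against a full response from a version that shows the facet, the same call should list the datasets that carry it, which makes it easy to compare versions.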

@sophiely (Contributor)

Hi all!

This should be fixed by PR #2528.
Let me know if it works for you, or if I need to add or change anything :)
