Hotfix: address missing data #10

Merged: 4 commits, merged on Dec 23, 2024

jairus-m (Owner)

Summary

Production nightly runs have been successful so far. Early on, I added logging to output metadata at every step of the pipeline; however, outside of dbt, that metadata isn't tied to any asset checks, so a bad value is only visible by reading the logs. When I checked the logs, I saw that the cycling data used for model training (the cycling_data asset) contained fewer than 600 activities. Since this is my personal data, I know to expect 1,000+ activities, so the data is clearly wrong. This PR fixes the issue.

[Screenshot: 2024-12-22 at 10:16:28 AM]

Note: this also points to a future need for backfill support in the dltHub pipeline when it runs under Dagster.

Details

Source of the issue

  • Incorrect DuckDB schema config in __src_schema.yml

Additions:

  • Minor updates to the dbt models and the DuckDB query

  • Add custom dbt tests in analytics_dbt/tests/:

    • analytics_dbt/tests/test_activity_counts.sql
    • analytics_dbt/tests/test_cycling_counts.sql
  • Add a Dagster asset check for cycling_data:

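    from dagster import AssetCheckResult, asset_check

    # cycling_data is the project's upstream Dagster asset, defined elsewhere in the code base.

    @asset_check(asset=cycling_data)
    def check_cycling_data_size(cycling_data):
        """Check that the cycling data has more than 1500 rows."""
        # The parameter is named after the asset, so Dagster loads the asset's
        # materialized value (a DataFrame-like object) through its IO manager.
        num_rows = cycling_data.shape[0]
        passed = num_rows > 1500
        # Surface the observed row count in the Dagster UI alongside pass/fail.
        return AssetCheckResult(
            passed=passed,
            metadata={"num_rows": num_rows},
        )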
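The pass/fail rule in the check above does not depend on Dagster itself, so one option is to pull it into a plain function that can be unit-tested without materializing the asset. A minimal sketch, assuming a hypothetical helper named evaluate_row_count that keeps the 1500-row threshold from check_cycling_data_size:

```python
from typing import Any, Dict, Tuple

# Threshold taken from check_cycling_data_size; the constant name is hypothetical.
MIN_CYCLING_ROWS = 1500


def evaluate_row_count(
    num_rows: int, min_rows: int = MIN_CYCLING_ROWS
) -> Tuple[bool, Dict[str, Any]]:
    """Hypothetical helper: return (passed, metadata) for a row-count check.

    Mirrors the asset check: it passes only when num_rows is strictly greater
    than min_rows, and reports the observed count as metadata.
    """
    passed = num_rows > min_rows
    metadata = {"num_rows": num_rows, "min_rows": min_rows}
    return passed, metadata


if __name__ == "__main__":
    # A run like the bad production one (fewer than 600 activities) should fail.
    print(evaluate_row_count(599))
    # A run comfortably above the threshold should pass.
    print(evaluate_row_count(1800))
```

With this split, the body of the asset check would reduce to computing cycling_data.shape[0], calling the helper, and passing its two results to AssetCheckResult(passed=..., metadata=...).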
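The custom tests added under analytics_dbt/tests/ are dbt singular tests: each .sql file is a query that dbt runs against the warehouse, and the test fails if the query returns any rows. The sketch below emulates that rule with Python's built-in sqlite3 instead of DuckDB. The activities table, its sport_type column, the 'Ride' value, and the 1000-activity floor are all hypothetical stand-ins, since the contents of test_activity_counts.sql and test_cycling_counts.sql are not shown in this PR.

```python
import sqlite3

# Hypothetical singular-test query: it returns a row only when the cycling count
# is too low, so an empty result means the test passes (dbt's singular-test rule).
TEST_CYCLING_COUNTS_SQL = """
SELECT n_activities
FROM (
    SELECT COUNT(*) AS n_activities
    FROM activities
    WHERE sport_type = 'Ride'
)
WHERE n_activities < 1000
"""


def run_singular_test(conn: sqlite3.Connection, sql: str) -> bool:
    """Return True (pass) when the test query yields zero rows."""
    failing_rows = conn.execute(sql).fetchall()
    return len(failing_rows) == 0


def make_db(n_rides: int) -> sqlite3.Connection:
    """Build an in-memory table holding n_rides hypothetical cycling activities."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE activities (id INTEGER PRIMARY KEY, sport_type TEXT)")
    conn.executemany(
        "INSERT INTO activities (sport_type) VALUES (?)",
        [("Ride",)] * n_rides,
    )
    return conn


if __name__ == "__main__":
    print(run_singular_test(make_db(599), TEST_CYCLING_COUNTS_SQL))   # too few rides
    print(run_singular_test(make_db(1200), TEST_CYCLING_COUNTS_SQL))  # enough rides
```

The count is wrapped in a subquery rather than using HAVING without GROUP BY, which older SQLite versions reject; the "return rows only on failure" shape is the part that carries over to a real dbt singular test.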

jairus-m added the bug ("Something isn't working") label on Dec 22, 2024
jairus-m merged commit a4dc220 into main on Dec 23, 2024
3 checks passed