Show publication date for each dataset result #61

Open
kirahowe opened this issue Apr 15, 2021 · 0 comments
Labels: app (related to the application itself), data (related to some underlying/upstream data issue), enhancement (new feature or request), etl (related to the etl/pipelines)

Comments

@kirahowe
Description copied from the end of #18. I'm opening a new issue here to put this on the back burner, since it depends on a couple of upstream issues being resolved first.

The question "is this the latest data?" is tricky to answer!

  1. Upstream publication lags behind reality: it can take more than a year to collect, analyse and publish data, and you can't get data about a whole year until that year is over!
  2. Some publications lag further behind than others: ONS trade data is derived from HMRC data, so it necessarily lags behind the HMRC release.
  3. The cubes on IDP lag behind publications, because it takes some time to write and check the transformation.
  4. The OOK index lags behind IDP, because we have to run the indexing manually at the moment (until we implement #17, "ETL Process only that data which has changed since the last run").
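
To make the compounding of these four lags concrete, here is a rough sketch in Python. The stage names mirror the list above, but every date is made up for illustration, and `stage_lags` and `total_lag_days` are hypothetical helpers rather than anything that exists in OOK.

```python
from datetime import date

# Hypothetical dates for a single dataset; none of these come from real metadata.
stages = [
    ("observation period ends", date(2019, 12, 31)),
    ("upstream publication", date(2020, 11, 5)),
    ("cube loaded on IDP", date(2021, 1, 20)),
    ("OOK index rebuilt", date(2021, 3, 1)),
]


def stage_lags(stages):
    """Return (stage name, days since the previous stage) for every stage after the first."""
    return [
        (name, (when - prev_when).days)
        for (_, prev_when), (name, when) in zip(stages, stages[1:])
    ]


def total_lag_days(stages):
    """Days between the first and the last stage."""
    return (stages[-1][1] - stages[0][1]).days


for name, days in stage_lags(stages):
    print(f"{name}: +{days} days")
print(f"total: {total_lag_days(stages)} days")
```

The point is only that a user looking at an OOK result sees the sum of all four delays, so no single date answers "is this the latest data?".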

We should aim to show the publication dates and observation reference dates side-by-side. We can show these once the upstream dataset tracking (https://github.com/Swirrl/cogs-issues/issues/35) and coverage metadata (https://github.com/Swirrl/cogs-issues/issues/92) are loaded.
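
As a sketch of what "side-by-side" could look like for a single result, assuming the two upstream issues eventually give us a publication date and an observation reference period per dataset: the `DatasetResult` class, its field names, and the example values below are all hypothetical and not taken from the OOK codebase or from real data.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DatasetResult:
    # All field names are assumptions about what the metadata might provide.
    title: str
    published: date   # upstream publication date (not the IDP "modified" date)
    ref_start: date   # first observation reference date covered
    ref_end: date     # last observation reference date covered


def describe(result: DatasetResult) -> str:
    """Render the publication date and the observation period on one line."""
    return (
        f"{result.title} | published {result.published.isoformat()} | "
        f"observations {result.ref_start.isoformat()} to {result.ref_end.isoformat()}"
    )


def recency_gap_days(result: DatasetResult) -> int:
    """Days between the end of the observation period and publication."""
    return (result.published - result.ref_end).days


example = DatasetResult(
    title="Example trade dataset",  # made-up title
    published=date(2020, 11, 5),
    ref_start=date(2019, 1, 1),
    ref_end=date(2019, 12, 31),
)
print(describe(example))
```

A value like `recency_gap_days` might also be useful for sorting or flagging results, but that goes beyond what this issue asks for.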

This will be really useful for OOK as we can show how robustness comes at the expense of recency - a key dataset-choosing criterion.

We could load the metadata showing when the cubes were modified on IDP, but I think it would be misleading: just because a cube was re-loaded, it doesn't follow that the data itself is any newer.

I suggest we leave this issue open until those upstream ones are resolved.

@Robsteranium added the enhancement label Apr 22, 2021
@kirahowe added the app, data and etl labels Apr 23, 2021