Show more dataset info #18

Closed
kirahowe opened this issue Mar 2, 2021 · 5 comments

Comments

@kirahowe
Contributor

kirahowe commented Mar 2, 2021

Right now, selecting some codes just fetches the dataset id and the count of matching observations. We should also fetch the labels, descriptions, and whatever other information is relevant. We should also generate the right URLs so the "View data" buttons actually link to PMD.

@Robsteranium
Contributor

I think we need at least:

@benjystanton was wondering about publication date too. Indeed something like "latest reference period" (i.e. date of last observation) might be really useful for comparison (which is most up-to-date?). We can possibly wait for the coverage metadata.

@benjystanton

benjystanton commented Mar 5, 2021

Yeah, I think publication date is really useful. Users need to know "is this the latest data?", so we often need a few bits of information to help them understand that. E.g. if something was published 11 months ago but it's only published annually, then it's still the latest data. So anything we can do to help them answer this question would be great.
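
To make the "11 months ago but published annually" example concrete, here is a minimal sketch in Python. It is not code from this project: the function names and the idea of storing a publication interval in months are assumptions for illustration only, and it says nothing about lags further upstream.

```python
from datetime import date


def months_between(earlier: date, later: date) -> int:
    """Whole calendar months elapsed from `earlier` to `later`."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    # Don't count the final month if it hasn't fully elapsed yet.
    if later.day < earlier.day:
        months -= 1
    return months


def probably_latest(published: date, interval_months: int, today: date) -> bool:
    """Hypothetical heuristic: a release is probably still the latest one
    if less than one publication interval has passed since it came out."""
    return months_between(published, today) < interval_months


# Published 11 months ago, released annually: probably still the latest.
print(probably_latest(date(2020, 4, 1), 12, date(2021, 3, 5)))
# Published 11 months ago, released quarterly: probably superseded.
print(probably_latest(date(2020, 4, 1), 3, date(2021, 3, 5)))
```

The point of the sketch is only that publication date is not enough on its own; it needs to be read alongside how often the dataset is published.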

@Robsteranium Robsteranium self-assigned this Apr 12, 2021
@Robsteranium
Contributor

We've since added the description, and today I've added the publisher label and altlabel (which typically holds the acronym).

@Robsteranium
Contributor

The question "is this the latest data?" is tricky to answer!

  1. Upstream publication lags behind reality - i.e. it can take more than a year to collect, analyse and publish data - indeed you can't get data about a whole year until it's over!
  2. Some publications lag further behind than others - ONS trade data is derived from HMRC data so it necessarily lags further behind.
  3. The cubes on IDP lag behind publications - because it takes some time to write and check the transformation.
  4. The OOK index lags behind IDP - because we have to run this manually at the moment (until we implement #17, "ETL Process only that data which has changed since the last run").

We should aim to show the publication-dates and observation-ref-dates side-by-side. We can reveal this once the upstream dataset tracking (https://github.com/Swirrl/cogs-issues/issues/35) and coverage metadata (https://github.com/Swirrl/cogs-issues/issues/92) are loaded.

This will be really useful for OOK as we can show how robustness comes at the expense of recency - a key dataset-choosing criterion.

We could load the metadata showing when the cubes were modified on IDP but I think it would be misleading (just because the cube was re-loaded, it doesn't follow that the data itself was any newer).

I suggest we leave this issue open until those upstream ones are resolved.

@kirahowe
Contributor Author

Publisher has been shown in the dataset results table since 962f087, and the links to PMD now work. I'm going to close this in favour of #61: the original problems this issue describes are fixed, and the remaining things we'd like to add depend on upstream issues being resolved first.
