
/data and /data/:series route should return dh:hasRelease summary for releases #369

Open
RickMoynihan opened this issue Feb 6, 2024 · 5 comments
Labels
enhancement New feature or request


@RickMoynihan
Member

RickMoynihan commented Feb 6, 2024

We should adjust the granularity of requests a little, so you can follow your nose when walking data in the API.

Currently, if you hit /data or /data/:series-slug, the response tells you nothing about the releases within each series:

curl -X 'GET' \
  'https://dluhc-pmd5-prototype.publishmydata.com/data' \
  -H 'accept: application/ld+json'
{
  "contents": [
    {
      "dcterms:title": "Permanent dwellings completed, England, District By Tenure",
      "@type": "dh:DatasetSeries",
      "dcterms:modified": "2024-02-01T14:48:16.422002278Z",
      "dcterms:issued": "2024-02-01T14:48:16.422002278Z",
      "dcterms:description": "House building data are collected at local authority district level, but it is important to treat figures at this level with care. House building is unevenly distributed both geographically and over time and patterns of housing development can produce clusters of new homes which make the figures at a low geographic level volatile and difficult to interpret. For detailed definitions of all tenures, see definitions of housing terms on Housing Statistics The district level and county figures are as reported by local authorities and the NHBC. Where a local authority has not submitted a quarterly return to DCLG, no figure has been presented for this local authority (and when relevant its county) for any 12-month period that includes the missing quarter. England total figures include estimates for missing data returns from independent Approved Inspectors and Local Authorities, so the sum of district values may be slightly less than the England totals. *House building completion* – In principle, a dwelling is regarded as complete when it becomes ready for occupation or when a completion certificate is issued whether it is in fact occupied or not. In practice, the reporting of some completions may be delayed and some completions may be missed if no completion certificate was requested by the developer or owner, although this is unusual. *Tenure* – For the purposes of these statistics, the term tenure refers to the nature of the organisation responsible for the development of a new housing start or completion. It does not necessarily describe the terms of occupancy for the dwelling on completion.",
      "dh:baseEntity": "https://ldapi-prototype.gss-data.org.uk/data/Permanent-dwellings-completed",
      "@id": "Permanent-dwellings-completed"
    }
...]}

It would be helpful if these documents contained the property dh:hasRelease, with a subset of the properties of each nested release entity (at a minimum dcterms:title and @id):

"dh:hasRelease": [{"@id": "2019", "dcterms:title": "2019"}, ...]

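However, the above isn't quite sufficient, because the URIs for those @ids will come out wrong: resolved against the /data/ base, "2019" would expand to .../data/2019 rather than to a URI under its series. Instead we need to do something like the following to fix #349: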

{
  "@context": {
    "@base": "https://services-base-url-goes-here.org/data/" ;; 1
  },
  "contents": [
    {
      // ...
      "@id": "English-Indices-of-Deprivation",
      "dh:hasRelease": {
        "@context":{"@base": "./English-Indices-of-Deprivation/release/"}, ;; 2
        "@set": [{"@id":"2019", "dcterms:title": "2019"}]}
    }
  ]
}

The @contexts at 1 and 2 ensure, within their respective scopes, that the @id of each dataset series in contents expands to https://services-base-url-goes-here.org/data/:id, whilst each release expands to https://services-base-url-goes-here.org/data/:series-id/release/:release-id.

This approach also satisfies the requirement that @ids are relative to your position in the graph/tree, so you can feed them back into the API without having to parse them.
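To make the expansion rules concrete, here is a minimal sketch that approximates how the relative @base values at 1 and 2 resolve, using Python's standard `urllib.parse.urljoin` (RFC 3986 reference resolution) as a stand-in for a JSON-LD processor. It is an illustration only; the base URL is the placeholder from the example, and `series_iri`/`release_iri` are hypothetical helpers.

```python
from urllib.parse import urljoin

# Placeholder service base from the example; a real deployment would differ.
SERVICE_BASE = "https://services-base-url-goes-here.org/data/"


def series_iri(series_id: str) -> str:
    """Approximate how a series @id expands under the top-level @base (context 1)."""
    return urljoin(SERVICE_BASE, series_id)


def release_iri(series_id: str, release_id: str) -> str:
    """Approximate how a release @id expands under the nested @base (context 2)."""
    # Context 2's relative @base is itself resolved against context 1's @base.
    release_base = urljoin(SERVICE_BASE, f"./{series_id}/release/")
    return urljoin(release_base, release_id)


print(series_iri("English-Indices-of-Deprivation"))
# https://services-base-url-goes-here.org/data/English-Indices-of-Deprivation
print(release_iri("English-Indices-of-Deprivation", "2019"))
# https://services-base-url-goes-here.org/data/English-Indices-of-Deprivation/release/2019

# Without the nested @base, "2019" resolves against the /data/ base instead:
print(urljoin(SERVICE_BASE, "2019"))
# https://services-base-url-goes-here.org/data/2019
```

The trailing slash on each @base matters: under RFC 3986 the last path segment of a base without a trailing slash is discarded, so a base ending in .../release would resolve "2019" to .../2019.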

Unfortunately, the nested @context does trade off a deeper path to the release data (contents -> dh:hasRelease -> @set), via the less descriptive @set key.
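Following your nose with relative @ids could look something like this minimal sketch. It assumes each release is reachable at /data/:series-id/release/:release-id, mirroring the IRI layout described above; `release_entries` and `release_path` are hypothetical helpers, not part of the existing API. It accepts both the @set-wrapped shape proposed here and a plain array.

```python
from urllib.parse import quote


def release_entries(series: dict) -> list:
    """Return the release summaries of a series, whichever shape they arrive in."""
    releases = series.get("dh:hasRelease", [])
    # The proposal nests releases under "@set"; also accept a plain array.
    return releases["@set"] if isinstance(releases, dict) else releases


def release_path(series_id: str, release_id: str) -> str:
    """Build an API path from two relative @id slugs, with no IRI parsing."""
    # Assumed route layout: /data/:series-id/release/:release-id
    return f"/data/{quote(series_id, safe='')}/release/{quote(release_id, safe='')}"


series = {
    "@id": "English-Indices-of-Deprivation",
    "dh:hasRelease": {
        "@context": {"@base": "./English-Indices-of-Deprivation/release/"},
        "@set": [{"@id": "2019", "dcterms:title": "2019"}],
    },
}

for release in release_entries(series):
    print(release_path(series["@id"], release["@id"]))
# /data/English-Indices-of-Deprivation/release/2019
```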

(From a UX perspective, a nicer alternative would be to generate a portion of the @context itself dynamically. This would mean a dynamically generated context per entity (series/release/revision) that includes the @id slugs in its paths; the static portion of the context could be held in a separate static context document, and we could import/cascade the contexts appropriately. This would allow the more succinct syntax whilst ensuring the IRIs expand identically.)

RickMoynihan changed the title from "/data route should return dh:hasRelease summary for releases" to "/data and /data/:series route should return dh:hasRelease summary for releases" Feb 6, 2024
RickMoynihan added the enhancement label (New feature or request), and added then removed the bug label (Something isn't working), Feb 12, 2024
@xdrcft8000
Contributor

xdrcft8000 commented Mar 6, 2024

I'm not sure whether I've misinterpreted what's required, but here's where I got to.

@RickMoynihan
Member Author

RickMoynihan commented Mar 6, 2024

The issue we're really talking about here is #349.

Thanks for this, @xdrcft8000; it's a really helpful step in the right direction.

I think we need to refine it further into something like this, though:

The important thing I'm trying to do is separate the @context into a static and dynamic part:

{
  "@context": [
    # static part
    "https://cdn.jsdelivr.net/gh/Swirrl/datahost-prototypes@1282114/datahost-ld-openapi/resources/jsonld-context.json",
    # dynamic part
    {
      "@base": "https://dluhc-pmd5-prototype.publishmydata.com/data/",
      "dh": "http://example.org/vocab#",
      "dh:hasRelease": {
        "@context": {
          "@base": "English-Indices-of-Deprivation/release/"
        }
      }
    }
  ],
  "@id": "English-Indices-of-Deprivation",
  "@type": "dh:DatasetSeries",
  "dh:hasRelease": [
    {
      "@id": "2019",
      "dcterms:title": "2019"
    },
    {
      "@id": "2020",
      "dcterms:title": "2020"
    }
  ]
}

The static part is probably the bulk of our vocabulary's JSON-LD context; we should try to keep that a dumb flat file as much as we can.

The dynamic part, however, we'll need to inject programmatically into the documents the application renders, as the dataset series slug forms part of the release @base path.
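Rendering a series with the static and dynamic parts might look something like this minimal sketch. `series_document` is a hypothetical helper, and it assumes the series and its releases are already available as plain dicts; only `dynamic_context` varies per request, so the static file stays a flat document.

```python
import json

# Static context document, pinned to the commit referenced in the example above.
STATIC_CONTEXT = (
    "https://cdn.jsdelivr.net/gh/Swirrl/datahost-prototypes@1282114/"
    "datahost-ld-openapi/resources/jsonld-context.json"
)


def series_document(service_base: str, series: dict, releases: list) -> dict:
    """Render one series with a static + dynamic @context (hypothetical helper)."""
    dynamic_context = {
        "@base": service_base,
        # Scoped context: release @ids resolve under "<series @id>/release/".
        "dh:hasRelease": {"@context": {"@base": f"{series['@id']}/release/"}},
    }
    return {
        **series,
        "@context": [STATIC_CONTEXT, dynamic_context],
        # Summarise each release: at a minimum its @id and dcterms:title.
        "dh:hasRelease": [
            {"@id": r["@id"], "dcterms:title": r["dcterms:title"]} for r in releases
        ],
    }


doc = series_document(
    "https://dluhc-pmd5-prototype.publishmydata.com/data/",
    {"@id": "English-Indices-of-Deprivation", "@type": "dh:DatasetSeries"},
    [{"@id": "2019", "dcterms:title": "2019"}, {"@id": "2020", "dcterms:title": "2020"}],
)
print(json.dumps(doc, indent=2))
```

The sketch leaves out the "dh" prefix declaration from the example, on the assumption that the static context already defines it; if it does not, it would need adding to `dynamic_context`.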

@andrewmcveigh
Contributor

So, I thought about doing what you're suggesting there, @RickMoynihan, but I don't think it works correctly for this issue. It probably can for /data/:series, but not for /data: a /data response can contain more than one series, so a single @base in the context will only be correct when there is exactly one series.

Same issue (ish) for #370
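A quick way to see the /data problem: the response has one document-level @context, so a scoped release @base inside it can only name one series. The minimal sketch below uses Python's `urllib.parse.urljoin` as a stand-in for JSON-LD's relative IRI resolution, with illustrative slugs; `misresolved` is a hypothetical helper.

```python
from urllib.parse import urljoin

BASE = "https://example.org/data/"


def misresolved(series_ids: list, release_id: str = "release-1") -> dict:
    """Map each series whose release IRI comes out wrong to the IRI it gets instead.

    Models a /data response whose single document-level context can only
    carry one scoped release @base, taken here from the first series.
    """
    shared_release_base = urljoin(BASE, f"{series_ids[0]}/release/")
    wrong = {}
    for sid in series_ids:
        expected = urljoin(BASE, f"{sid}/release/{release_id}")
        actual = urljoin(shared_release_base, release_id)
        if actual != expected:
            wrong[sid] = actual
    return wrong


print(misresolved(["differentdummy1709133210"]))
# {}
print(misresolved(["differentdummy1709133210", "dummy1709133210"]))
# {'dummy1709133210': 'https://example.org/data/differentdummy1709133210/release/release-1'}
```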

@andrewmcveigh
Contributor

So, I think we can produce this:

{
  "contents": [
    {
      "dcterms:modified": "2024-02-28T15:13:30.592145398Z",
      "dcterms:description": "A very simple test",
      "dcterms:issued": "2024-02-28T15:13:30.592145398Z",
      "@index": "https://example.org/data/differentdummy1709133210",
      "dh:baseEntity": "https://example.org/data/differentdummy1709133210",
      "@id": "differentdummy1709133210",
      "dh:hasRelease": [
        {
          "dcterms:title": "Test Release",
          "@type": "dh:Release",
          "@id": "release-1",
          "@context": {"@base": "./differentdummy1709133210/release/"}
        }
      ],
      "@type": "dh:DatasetSeries",
      "dcterms:title": "Test Dataset"
    },
    {
      "dcterms:modified": "2024-02-28T15:13:30.214140032Z",
      "dcterms:description": "A very simple test",
      "dcterms:issued": "2024-02-28T15:13:30.214140032Z",
      "@index": "https://example.org/data/dummy1709133210",
      "dh:baseEntity": "https://example.org/data/dummy1709133210",
      "@id": "dummy1709133210",
      "dh:hasRelease": {
        "@context": {"@base": "./dummy1709133210/release/"},
        "@set": [
          {
            "dcterms:title": "Test Release",
            "@type": "dh:Release",
            "@id": "release-1"
          }
        ]
      },
      "@type": "dh:DatasetSeries",
      "dcterms:title": "Test Dataset"
    }
  ],
  "@context": [
    "https://cdn.jsdelivr.net/gh/Swirrl/datahost-prototypes@1282114/datahost-ld-openapi/resources/jsonld-context.json",
    {
      "@base": "https://example.org/data/",
      "dh:hasRelease": {
        "@container": "@set",
        "@id": "dh:hasRelease"
      }
    }
  ]
}

Which appears to work (playground)

The catch is that we'd need to add the "@context" and "@set" wrappers after compaction, which is a bit of a PITA.
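The post-compaction step could be a plain tree rewrite over the compacted document. A minimal sketch, assuming the compacted /data response is available as a Python dict with each series' releases under "dh:hasRelease"; `wrap_releases` is a hypothetical name, and it uses no JSON-LD library. It produces the @set-wrapped shape used for the second series in the example above and leaves the top-level @context untouched.

```python
import copy


def wrap_releases(compacted: dict) -> dict:
    """Wrap each series' dh:hasRelease in a set object with a scoped @base.

    Mirrors the second series in the example: the releases move under "@set"
    and gain {"@base": "./<series @id>/release/"}.
    """
    doc = copy.deepcopy(compacted)  # leave the caller's document untouched
    for series in doc.get("contents", []):
        releases = series.get("dh:hasRelease")
        if releases is None:
            continue
        # Without an @set container, compaction emits a lone value as a bare object.
        if not isinstance(releases, list):
            releases = [releases]
        series["dh:hasRelease"] = {
            "@context": {"@base": f"./{series['@id']}/release/"},
            "@set": releases,
        }
    return doc
```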

@RickMoynihan
Member Author

Hmm, OK, good point!

In that case I think we should descope doing it for /data and only do it for /data/:series, as it's better if the extra cruft doesn't affect the structure of the data.
