feat: Managed cached endpoint #1599

Merged: 11 commits, Jul 2, 2024

Conversation

bprusinowski
Collaborator

@bprusinowski bprusinowski commented Jun 11, 2024

Closes #1596

This PR:

  • adds support for querying the cached endpoint, with caching per cube IRI (only applicable to the PROD endpoint),
  • adds a GitHub Action that runs once a day and opens the 25 most recent and the 25 most viewed PROD charts to populate the cache.
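
As a rough illustration of the per-cube caching in the first bullet, the sketch below builds a URL of the form ${endpoint}/${cubeIri}. The helper name getCubeEndpointUrl, its encode option, and the example.org endpoint are assumptions made for this example, not code from the PR; whether the cube IRI has to be encoded is not settled here, so both variants are supported.

```typescript
// Hypothetical helper, not taken from the PR: names and options are illustrative.
function getCubeEndpointUrl(
  endpoint: string,
  cubeIri: string,
  options: { encode?: boolean } = {}
): string {
  // Drop trailing slashes so the result never contains "//" before the cube IRI.
  const base = endpoint.replace(/\/+$/, "");
  // Encoding is optional because the requirement is an open question.
  const suffix = options.encode ? encodeURIComponent(cubeIri) : cubeIri;
  return `${base}/${suffix}`;
}

// Example values only; the endpoint URL is an assumption.
const cubeEndpointUrl = getCubeEndpointUrl(
  "https://example.org/query/",
  "https://environment.ld.admin.ch/foen/ubd000503bis/2",
  { encode: true }
);
console.log(cubeEndpointUrl);
```

Keeping the URL construction in one place would make it easy to switch encoding on or off once the endpoint's expectations are known.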

@bprusinowski bprusinowski requested a review from ptbrowne as a code owner June 11, 2024 15:02

vercel bot commented Jun 11, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
visualization-tool ✅ Ready (Inspect) Visit Preview 💬 Add feedback Jul 2, 2024 10:20am

@bprusinowski
Collaborator Author

cc @Rdataflow, waiting for the PROD endpoint to be fixed so we can properly test the change and implement any potential additional requirements (e.g. encoding the cube iri) 👀

@ptbrowne
Collaborator

Maybe you could extract all possible variable types with some typescript-fu:

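// Unwrap the object type wrapped by ResolversObject.
type ExtractResolversObject<O> = O extends ResolversObject<infer S> ? S : never;
type B = ExtractResolversObject<QueryResolvers>;
// Extract the fourth type parameter of a single Resolver.
type ExtractResolver<O> = O extends Resolver<any, any, any, infer S> ? S : never;
// Union of that parameter across all query resolvers.
type Vars = ExtractResolver<B[keyof B]>;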

Then you can cast the variableInfos from the GraphQL context to Vars with "as"?
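
To make the suggestion above concrete, here is a self-contained sketch of the same conditional-type extraction. Resolver, ResolversObject, and QueryResolvers below are simplified stand-ins written for this example (the real ones come from generated code and may be shaped differently), and the two resolvers and the variableInfos value are hypothetical.

```typescript
// Simplified stand-ins for the generated types; assumptions for this example only.
type Resolver<TResult, TParent, TContext, TArgs> = (
  parent: TParent,
  args: TArgs,
  context: TContext
) => TResult;
// Identity alias: the generated wrapper may add more, this is enough for the sketch.
type ResolversObject<T> = T;

// Hypothetical query resolvers with two differently shaped argument types.
type QueryResolvers = ResolversObject<{
  cubeMetadata: Resolver<string, unknown, unknown, { iri: string; locale: string }>;
  searchCubes: Resolver<string[], unknown, unknown, { query: string }>;
}>;

type ExtractResolversObject<O> = O extends ResolversObject<infer S> ? S : never;
type B = ExtractResolversObject<QueryResolvers>;
type ExtractResolver<O> = O extends Resolver<any, any, any, infer S> ? S : never;
// The conditional type distributes over the union of resolvers, so Vars is
// { iri: string; locale: string } | { query: string }.
type Vars = ExtractResolver<B[keyof B]>;

// Hypothetical value as it might arrive from the GraphQL context.
const variableInfos: unknown = { iri: "https://example.org/cube/1", locale: "en" };
const vars = variableInfos as Vars;

// Narrowing with "in" keeps property access type-safe after the cast.
const label = "iri" in vars ? `cube ${vars.iri}` : `search ${vars.query}`;
console.log(label);
```

The cast itself is unchecked at runtime, so this only helps if the value really matches one of the resolver argument shapes.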

@bprusinowski
Collaborator Author

Thanks for the idea @ptbrowne, I will check it out tomorrow 💯

@ptbrowne
Collaborator

ptbrowne commented Jun 12, 2024

Ah, this time the end-to-end tests seem to have found something: I do not recognize the usual suspects among the failed E2E tests.

@bprusinowski
Collaborator Author

Yes, I think it might be related to the fact that with this PR, the PROD endpoint is broken, so any test relying on it won't work. I'll re-check once it's fixed by Zazuko 👀

@ptbrowne ptbrowne self-requested a review June 12, 2024 11:35
@bprusinowski changed the title from "feat: Managed cached endpoint" to "feat: Managed cached endpoint / basic analytics" on Jun 12, 2024
@bprusinowski bprusinowski force-pushed the feat/managed-cached-endpoint branch 2 times, most recently from edf9179 to 377f82b Compare June 12, 2024 15:14
@Rdataflow
Contributor

@bprusinowski can you add statistics on visit frequency to address case b)? (i.e. visits per chart key)

@bprusinowski
Collaborator Author

@Rdataflow yes, that's the plan to try to add a new table to our database that would store information on views per chart config 👍

@bprusinowski bprusinowski force-pushed the feat/managed-cached-endpoint branch from 9234548 to fdf74ac Compare June 13, 2024 08:09
@bprusinowski changed the title from "feat: Managed cached endpoint / basic analytics" to "feat: Managed cached endpoint" on Jun 13, 2024
@Rdataflow
Contributor

> @Rdataflow yes, that's the plan to try to add a new table to our database that would store information on views per chart config 👍

@bprusinowski curious: is there already some PR around ?

@bprusinowski
Collaborator Author

@Rdataflow yes, this was already implemented in #1613 😄

@Rdataflow
Contributor

@bprusinowski the new varnish config is now on PROD 👍 and unblocks this PR 🚀

@bprusinowski
Collaborator Author

Thanks for the hint @Rdataflow! In that case I'll also take a look at pre-populating the cache for the most viewed charts and then merge the PR. Will do it tomorrow 👍
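
One way the selection step for pre-populating the cache could look is sketched below: take the most recent and the most viewed charts and merge them without duplicates. The ChartInfo shape, its field names, and pickChartsToWarm are assumptions for illustration and do not reflect the project's database schema; actually opening the charts is left out.

```typescript
// Hypothetical chart record; field names are assumptions, not the project's schema.
type ChartInfo = { key: string; createdAt: Date; viewCount: number };

function pickChartsToWarm(charts: ChartInfo[], limit = 25): string[] {
  // Copy before sorting so the caller's array is left untouched.
  const mostRecent = [...charts]
    .sort((a, b) => b.createdAt.getTime() - a.createdAt.getTime())
    .slice(0, limit);
  const mostViewed = [...charts]
    .sort((a, b) => b.viewCount - a.viewCount)
    .slice(0, limit);
  // A Set drops charts present in both lists while keeping insertion order.
  const keys = [...mostRecent, ...mostViewed].map((chart) => chart.key);
  return Array.from(new Set(keys));
}

// Made-up sample data with a limit of 2 instead of 25.
const chartKeys = pickChartsToWarm(
  [
    { key: "a", createdAt: new Date("2024-06-01"), viewCount: 5 },
    { key: "b", createdAt: new Date("2024-06-10"), viewCount: 50 },
    { key: "c", createdAt: new Date("2024-05-01"), viewCount: 500 },
  ],
  2
);
console.log(chartKeys);
```

Deduplicating matters because a popular chart can also be a recent one, and opening it twice would not add anything to the cache.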

@bprusinowski bprusinowski merged commit dc92b19 into main Jul 2, 2024
5 of 6 checks passed
@bprusinowski bprusinowski deleted the feat/managed-cached-endpoint branch July 2, 2024 10:35
@bprusinowski
Collaborator Author

@Rdataflow the changes should soon be on TEST :)

@Rdataflow
Contributor

@bprusinowski by inspecting the traffic I spotted three types of queries that should also be sent to ${endpoint}/${cubeIri} (currently they are sent to the default endpoint).

The rationale is to harden the cached charts against a DB outage and to make the most of the long-term cache 😄

  • query 1: cube upgrade queries (Ludovic will take this into account so that all versions are purged 👍)
PREFIX cube: <https://cube.link/>
PREFIX schema: <http://schema.org/>

SELECT ?iri WHERE {
  {
    # Versioned cube.
    SELECT ?iri ?version WHERE {
      VALUES ?oldIri { <https://environment.ld.admin.ch/foen/ubd000503bis/2> }
      ?versionHistory schema:hasPart ?oldIri .
      ?versionHistory schema:hasPart ?iri .
      ?iri schema:version ?version .
      ?iri schema:creativeWorkStatus ?status .
      ?oldIri schema:creativeWorkStatus ?oldStatus .
      FILTER(NOT EXISTS { ?iri schema:expires ?expires . } && ?status IN (?oldStatus, <https://ld.admin.ch/vocabulary/CreativeWorkStatus/Published>))
    }
    ORDER BY DESC(?version)
  } UNION {
    {
      # Version history of a cube.
      SELECT ?iri ?status ?version WHERE {
        VALUES ?versionHistory { <https://environment.ld.admin.ch/foen/ubd000503bis/2> }
        ?versionHistory schema:hasPart ?iri .
        ?iri schema:version ?version .
        ?iri schema:creativeWorkStatus ?status .
        FILTER(NOT EXISTS { ?iri schema:expires ?expires . })
      }
      ORDER BY DESC(?status) DESC(?version)
    }
  } UNION {
    {
      # Non-versioned cube.
      SELECT ?iri ?status WHERE {
        VALUES ?iri { <https://environment.ld.admin.ch/foen/ubd000503bis/2> }
        ?iri cube:observationConstraint ?shape .
        ?iri schema:creativeWorkStatus ?status .
        FILTER(NOT EXISTS { ?iri schema:expires ?expires . } && NOT EXISTS { ?versionHistory schema:hasPart ?iri . })
      }
      ORDER BY DESC(?status)
    }
  }
}
LIMIT 1
  • query 2: some dimension version query pattern
PREFIX cube: <https://cube.link/>
PREFIX schema: <http://schema.org/>
PREFIX sh: <http://www.w3.org/ns/shacl#>

SELECT ?dimensionIri ?version ?nodeKind WHERE {
  <https://environment.ld.admin.ch/foen/ubd000503bis/2> cube:observationConstraint/sh:property ?dimension .
  ?dimension sh:path ?dimensionIri .
  OPTIONAL { ?dimension schema:version ?version . }
  OPTIONAL { ?dimension sh:nodeKind ?nodeKind . }
  FILTER(?dimensionIri IN (<https://environment.ld.admin.ch/foen/ubd000503bis/treibstoffe>))
}
  • query 3: possible filters query
PREFIX cube: <https://cube.link/>
PREFIX schema: <http://schema.org/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?dimension0_v WHERE {
  <https://environment.ld.admin.ch/foen/ubd000503bis/2> cube:observationSet/cube:observation ?observation .
  ?observation <https://environment.ld.admin.ch/foen/ubd000503bis/treibstoffe> ?dimension0 .
  ?dimension0 schema:sameAs ?dimension0_v .
  VALUES ?dimension0_v { <https://environment.ld.admin.ch/foen/ubd000503bis/Treibstoffe/treib1> }
}

LIMIT 1
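
Each of the three queries above is parameterised by a cube IRI, which is what would allow routing it to a per-cube endpoint. As a minimal sketch of that idea, query 2 could be generated from its parameters as follows; buildDimensionVersionQuery is a hypothetical name and this is not the project's actual query builder.

```typescript
// Hypothetical builder mirroring the text of query 2 with the IRIs as parameters.
// Assumes the IRIs are trusted; no validation or escaping is done here.
function buildDimensionVersionQuery(cubeIri: string, dimensionIris: string[]): string {
  const dimensions = dimensionIris.map((iri) => `<${iri}>`).join(", ");
  return [
    "PREFIX cube: <https://cube.link/>",
    "PREFIX schema: <http://schema.org/>",
    "PREFIX sh: <http://www.w3.org/ns/shacl#>",
    "",
    "SELECT ?dimensionIri ?version ?nodeKind WHERE {",
    `  <${cubeIri}> cube:observationConstraint/sh:property ?dimension .`,
    "  ?dimension sh:path ?dimensionIri .",
    "  OPTIONAL { ?dimension schema:version ?version . }",
    "  OPTIONAL { ?dimension sh:nodeKind ?nodeKind . }",
    `  FILTER(?dimensionIri IN (${dimensions}))`,
    "}",
  ].join("\n");
}

const dimensionQuery = buildDimensionVersionQuery(
  "https://environment.ld.admin.ch/foen/ubd000503bis/2",
  ["https://environment.ld.admin.ch/foen/ubd000503bis/treibstoffe"]
);
console.log(dimensionQuery);
```

Since the cube IRI is already an explicit argument at the point where the query is built, the same value could be used to choose the endpoint the query is sent to.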

@bprusinowski
Collaborator Author

😱 thanks for spotting this @Rdataflow, I will investigate why this is the case 👍

Development

Successfully merging this pull request may close these issues.

Managed Cached Endpoint
3 participants