future server API #687

Open
jameshadfield opened this issue Jan 2, 2019 · 5 comments
Labels
proposal Proposals that warrant further discussion

Comments

@jameshadfield
Member

This issue is to discuss the design of the server API employed by auspice. To briefly recap, auspice needs to know which datasets / narratives are available, and it needs to obtain the dataset or narrative to view.

Here are @tsibley's thoughts, taken from #683 (comment)

Current

The Charon API, as described in the white-labelling docs, relies on a dynamic server able to respond to the following endpoints:

/charon/getAvailable
/charon/getAvailable?prefix=flu_seasonal

/charon/getDataset?prefix=flu_seasonal_h3n2_ha_2y
/charon/getDataset?prefix=flu_seasonal_h3n2_ha_2y&type=tip-frequencies

/charon/getNarrative?prefix=flu-report-2018-Q3.md

Pros:

  • Server allows for custom logic to dynamically respond with different datasets for different people (e.g. logged in users).

Cons:

  • Requires a dynamic server.

  • Statically-hosted builds (e.g. for GitHub Pages) and S3-served builds (e.g. for Nextstrain) must be special-cased since no dynamic server is possible.

  • Paths in the user-facing URL require transformation back and forth, which historically has been a source of bugs.

Proposed

My proposed "data API" is to use standard HTTP access for what it's good at by changing Auspice to request these URLs instead:

/datasets.json
/narratives.json

/dataset/${datasetPath}/${type}.json
/dataset/flu/seasonal/h3n2/ha/2y/{tree,meta,tip-frequencies}.json

/narrative/${name}.json
/narrative/flu-report/2018/Q3.md
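
As a rough illustration of the dataset URLs listed above, here is a minimal sketch of how a client might build them. The helper names `datasetUrls` and `listingUrls` and the `baseUrl` parameter are hypothetical and not part of Auspice.

```javascript
// Hypothetical helper: build the fetch URLs for one dataset under the
// proposed layout. `baseUrl` may be "" for same-origin requests.
function datasetUrls(baseUrl, datasetPath, types = ["tree", "meta"]) {
  // Strip leading/trailing slashes so "/flu/seasonal/" and "flu/seasonal" agree.
  const path = datasetPath.replace(/^\/+|\/+$/g, "");
  return types.map((type) => `${baseUrl}/dataset/${path}/${type}.json`);
}

// Hypothetical helper: the two listing endpoints from the proposal.
function listingUrls(baseUrl) {
  return [`${baseUrl}/datasets.json`, `${baseUrl}/narratives.json`];
}

console.log(datasetUrls("", "flu/seasonal/h3n2/ha/2y", ["tree", "meta", "tip-frequencies"]));
console.log(listingUrls(""));
```

Because the user-facing path is used unchanged inside the fetch URL, the only processing needed is trimming slashes.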

Pros:

  • Endpoints can be provided either

    • by a set of files on a static host (e.g. GitHub Pages, S3), or
    • by a dynamic server in order to customize responses for different people (e.g. logged in users).
  • Statically-hosted builds (e.g. for GitHub Pages) and S3-served builds (e.g. for Nextstrain) just work with no special-casing or additional server logic in Auspice.

  • Paths in the user-facing URL match those behind the scenes without any transformation.

  • Standard HTTP caching strategies work without a custom server complicating it.

Cons:

I don't see any downsides to this approach, but maybe you do? Looking for feedback!

@jameshadfield jameshadfield added the proposal Proposals that warrant further discussion label Jan 2, 2019
@jameshadfield
Member Author

jameshadfield commented Jan 2, 2019

In general I think @tsibley's proposal is probably the better direction. I've sketched some further details out here. I've used the v2.0 schema which we should start using ASAP (no more meta & tree). I've also used the underscore separator as we've had a bunch of discussions about this and have always preferred it over hierarchical structures. (FWIW a change here would be better suited as an augur PR, as augur creates flat file structures.)

If we let the extension customisation define the fetch prefixes ${dataFetchPrefix} and ${narrativeFetchPrefix}, which may be relative or absolute, then the URLs would translate into these fetches:

${auspiceDomain}/zika -> ${dataFetchPrefix}/zika.json
${auspiceDomain}/flu/seasonal/h3n2/ha/2y -> ${dataFetchPrefix}/flu_seasonal_h3n2_ha_2y.json
${auspiceDomain}/narrative/flu-report/2018/Q3 -> ${narrativeFetchPrefix}/flu-report_2018_Q3.md
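
The translation above could be sketched roughly as follows. The function name `urlPathToFetch` and the prefix values are placeholders for illustration; only the mapping itself comes from the examples above.

```javascript
// Hypothetical sketch of the URL -> fetch translation. The prefixes would
// come from the extension customisation; the values below are placeholders.
function urlPathToFetch(urlPath, { dataFetchPrefix, narrativeFetchPrefix }) {
  const parts = urlPath.split("/").filter((part) => part.length > 0);
  if (parts[0] === "narrative") {
    // /narrative/flu-report/2018/Q3 -> flu-report_2018_Q3.md
    return `${narrativeFetchPrefix}/${parts.slice(1).join("_")}.md`;
  }
  // /flu/seasonal/h3n2/ha/2y -> flu_seasonal_h3n2_ha_2y.json
  return `${dataFetchPrefix}/${parts.join("_")}.json`;
}

const examplePrefixes = { dataFetchPrefix: "/data", narrativeFetchPrefix: "/narratives" };
console.log(urlPathToFetch("/zika", examplePrefixes));
console.log(urlPathToFetch("/flu/seasonal/h3n2/ha/2y", examplePrefixes));
console.log(urlPathToFetch("/narrative/flu-report/2018/Q3", examplePrefixes));
```

One thing to keep in mind with this mapping is that joining with underscores is not reversible if a path segment itself contains an underscore.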

This would alleviate a lot of the complexity currently involved in writing a server for auspice.

Serverless builds

My understanding of serverless builds is that ${auspiceDomain}/flu/seasonal/h3n2/ha/2y needs to have the ${auspiceDomain}/flu/seasonal/h3n2/ha/2y/index.html file, which won't exist. (I could be wrong about this.) I've seen solutions such as ${auspiceDomain}?dataset=/flu/seasonal/h3n2/ha/2y. Currently the GitHub Pages builds hardcode a single data fetch path; they don't rely on the URL. It'd still be easier using this proposal, but wouldn't allow for datasets defined by the URL.

Path completion

Currently path completion is done by the server -- nextstrain.org/flu is compared with the manifest and changed to nextstrain.org/flu/seasonal/h3n2/ha/3y.

This proposal would result in a fetch to ${dataFetchPrefix}/flu.json which would 404 unless intercepted (similar to how it works now). A possible solution would be to shift the URL completion logic to auspice, as it has access to the available datasets.
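
A minimal sketch of what client-side completion might look like. It assumes (this is not specified anywhere in this thread) that the list of available datasets is an ordered array of full paths with the preferred default listed first; `completePath` is a hypothetical name.

```javascript
// Hypothetical client-side completion. `available` is assumed to be an
// ordered list of full dataset paths, with the preferred default first.
function completePath(partialPath, available) {
  const wanted = partialPath.split("/").filter(Boolean);
  // Pick the first available dataset whose leading segments match.
  const match = available.find((candidate) => {
    const parts = candidate.split("/");
    return wanted.every((segment, i) => parts[i] === segment);
  });
  return match === undefined ? null : match;
}

const exampleAvailable = [
  "flu/seasonal/h3n2/ha/3y",
  "flu/seasonal/h3n2/ha/2y",
  "zika",
];
console.log(completePath("flu", exampleAvailable)); // flu/seasonal/h3n2/ha/3y
console.log(completePath("ebola", exampleAvailable)); // null
```

Matching whole segments rather than string prefixes avoids "flu" accidentally completing to a dataset such as "fluB/...".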

Two trees

URLs with multiple trees are currently parsed by the server, which gets the appropriate JSONs and combines them. However auspice could be configured to make the two fetches:

${auspiceDomain}/flu/seasonal/h3n2/ha:na/2y ->
    ${dataFetchPrefix}/flu_seasonal_h3n2_ha_2y.json AND
    ${dataFetchPrefix}/flu_seasonal_h3n2_na_2y.json
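
One way auspice might expand such a URL into two fetches, as a sketch only. The name `twoTreeFetches` is hypothetical, and the colon syntax is taken from the example above.

```javascript
// Hypothetical sketch: expand a URL path in which one segment names two
// trees ("ha:na") into one fetch URL per tree.
function twoTreeFetches(urlPath, dataFetchPrefix) {
  const parts = urlPath.split("/").filter(Boolean);
  const splitAt = parts.findIndex((part) => part.includes(":"));
  if (splitAt === -1) {
    // No colon: a single tree, so a single fetch.
    return [`${dataFetchPrefix}/${parts.join("_")}.json`];
  }
  return parts[splitAt].split(":").map((choice) => {
    const expanded = [...parts];
    expanded[splitAt] = choice;
    return `${dataFetchPrefix}/${expanded.join("_")}.json`;
  });
}

console.log(twoTreeFetches("/flu/seasonal/h3n2/ha:na/2y", "/data"));
```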

Additional files (frequencies etc)

These will be specified in the main dataset JSON, so auspice can make an additional fetch. E.g.

${auspiceDomain}/flu/seasonal/h3n2/ha/2y ->
    ${dataFetchPrefix}/flu_seasonal_h3n2_ha_2y.json AND
    ${dataFetchPrefix}/flu_seasonal_h3n2_ha_2y_tip-frequencies.json
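
A sketch of how the follow-up fetch URLs might be derived once the main JSON has loaded. The field name `additional_files` is purely an assumption for illustration; how the main dataset JSON lists the extra files is not defined here.

```javascript
// Hypothetical: `additional_files` is an assumed field name, not part of any
// published schema. It is taken to be a list such as ["tip-frequencies"].
function additionalFetches(mainUrl, mainJson) {
  const base = mainUrl.replace(/\.json$/, "");
  const extras = mainJson.additional_files || [];
  return extras.map((name) => `${base}_${name}.json`);
}

console.log(
  additionalFetches("/data/flu_seasonal_h3n2_ha_2y.json", {
    additional_files: ["tip-frequencies"],
  })
);
```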

Community builds

This functionality shouldn't be part of the auspice repo.

One option is to follow the above logic and require a server to interpret the request and deliver the appropriate JSON. That is, the URL ${auspiceDomain}/community/blab/zika-colombia results in a fetch for ${dataFetchPrefix}/community_blab_zika-colombia.json, which is intercepted by the server and the appropriate JSON returned.

Alternatively, the extension interface could expose a function such as:

function constructFetch(browserURL) {
  // interpret URLs
  return fetchURL;
}
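
To make the idea concrete, here is one possible shape for such a function. The host `data.example.org` and the file layout are placeholders rather than a real service, and returning `null` to mean "use the default fetch" is an assumed convention.

```javascript
// Hypothetical extension hook. The host and file layout are placeholders.
function constructFetch(browserURL) {
  const { pathname } = new URL(browserURL);
  const parts = pathname.split("/").filter(Boolean);
  if (parts[0] === "community" && parts.length === 3) {
    const [, org, build] = parts;
    return `https://data.example.org/${org}/${build}.json`;
  }
  // Not a community URL: signal that the default fetch should be used.
  return null;
}

console.log(constructFetch("https://auspice.example.org/community/blab/zika-colombia"));
```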

@tsibley
Member

tsibley commented Jan 25, 2019

Good observations, thank you for spending the time to consider this in detail! My comments are below.

I've also used the underscore separator as we've had a bunch of discussions about this and have always preferred it over hierarchical structures. (FWIW a change here would be better suited as an augur PR, as augur creates flat file structures.)

Hmm, I don't think anything fundamental about augur requires flat structures. All of the output files are user-provided and can be any arbitrary path. I expect moving the logically-hierarchical file names (with underscores) to actually-hierarchical names (with slashes) would require approximately no changes to augur.

This would alleviate a lot of the complexity currently involved in writing a server for auspice.

I'm glad you agree! This is exactly my goal with the proposed API. :-)

My understanding of serverless builds is that ${auspiceDomain}/flu/seasonal/h3n2/ha/2y needs to have the ${auspiceDomain}/flu/seasonal/h3n2/ha/2y/index.html file, which won't exist. […] Currently the github pages builds hardcode a single data fetch path, they don't rely on the URL. It'd still be easier using this proposal, but wouldn't allow for datasets defined by the URL.

I don't agree; I think it is possible to have this type of static build use datasets defined by the URL. It seems like the auspice build process, for a static site only, could without much trouble make the appropriate index.html files, e.g. flu/seasonal/h3n2/ha/2y/index.html. These files would be almost identical to the normal top-level index.html, with some small differences in asset paths.
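
As a sketch of what that build step would need to compute (the helper name `indexPagesFor` and the returned shape are hypothetical; nothing here is existing Auspice build code):

```javascript
// Hypothetical build helper: list the index.html files a static build would
// need so that each dataset URL resolves without a dynamic server.
function indexPagesFor(datasetPaths) {
  return datasetPaths.map((path) => {
    const parts = path.split("/").filter(Boolean);
    return {
      file: [...parts, "index.html"].join("/"),
      // Relative prefix from the page's directory back to the site root,
      // which is the "small difference in asset paths" between pages.
      assetPrefix: parts.map(() => "..").join("/") || ".",
    };
  });
}

console.log(indexPagesFor(["flu/seasonal/h3n2/ha/2y", "zika"]));
```

Writing the files to disk is left out; the point is only that each page differs from the top-level index.html by its relative asset prefix.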

On a related note: I think the term "serverless" is not appropriate for this feature. The feature is comparable to a static-site generator like Jekyll or Gatsby, not a utility computing platform like AWS Lambda.

Path completion […] A possible solution would be to shift the URL completion logic to auspice, as it has access to the available datasets.

Moving this URL manipulation into Auspice makes the most sense to me. There are alternative solutions like symlinking flu.json → flu/seasonal/h3n2/ha/3y.json, but I think the consequences get a little weird without any additional benefit.

Two trees […] However auspice could be configured to make the two fetches

Yep, moving this into Auspice makes the most sense to me.

Community builds […] One option is to follow the above logic and require a server to interpret the request and deliver the appropriate JSON. […] Alternatively, the extension interface could expose a function

I like both approaches, and I don't think they are mutually exclusive: in the absence of an extension-provided browser dataset URL → fetch URL transformation function, the standard request will be made and a custom server can exist to handle it.
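
Roughly, the fallback could look like this. It is a sketch under assumptions: `resolveFetch` and the `extension` object are hypothetical names, and the default mapping uses the underscore form from earlier in the thread.

```javascript
// Hypothetical: use the extension-provided transformation when there is one,
// otherwise fall back to the standard request (which a custom server may handle).
function resolveFetch(urlPath, dataFetchPrefix, extension = {}) {
  if (typeof extension.constructFetch === "function") {
    const custom = extension.constructFetch(urlPath);
    if (custom) return custom;
  }
  const name = urlPath.split("/").filter(Boolean).join("_");
  return `${dataFetchPrefix}/${name}.json`;
}

// No extension: the standard request is made.
console.log(resolveFetch("/community/blab/zika-colombia", "/data"));
```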

Note that in the GitHub /community/… case, the server doesn't need to be complicated; it can encode the transformation function server-side as HTTP redirects to raw.githubusercontent.com.

@trvrb
Member

trvrb commented Jan 25, 2019

I really like the proposed API on the auspice fetch side. ${auspiceDomain}/flu/seasonal/h3n2/ha/2y is really nice.

I think I'm historically the big proponent of flat JSON structures, i.e. flu_seasonal_h3n2_ha_2y.json. I had mainly liked this in the same way as I generally prefer to name my files with relevant information when doing bioinformatics rather than relying on path. This follows others' advice.

Regardless, at the very least, we should be including in the combined JSON the relevant data fields, so that flu_seasonal_h3n2_ha_2y.json would include:

dataset: ['flu', 'seasonal', 'h3n2', 'ha', '2y']

(This is an augur issue)
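
For illustration only, the relationship between the flat file name, the `dataset` field, and the URL path could be expressed like this. The helper names are hypothetical, and this is JavaScript rather than augur code.

```javascript
// Hypothetical: derive the `dataset` field from a flat file name.
function datasetFieldFromFilename(filename) {
  return filename.replace(/\.json$/, "").split("_");
}

// Hypothetical: turn the `dataset` field back into a user-facing URL path.
function datasetFieldToUrlPath(dataset) {
  return `/${dataset.join("/")}`;
}

const exampleField = datasetFieldFromFilename("flu_seasonal_h3n2_ha_2y.json");
console.log(exampleField); // [ 'flu', 'seasonal', 'h3n2', 'ha', '2y' ]
console.log(datasetFieldToUrlPath(exampleField)); // /flu/seasonal/h3n2/ha/2y
```

With the field stored explicitly in the JSON, consumers would not need to parse file names at all.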

I had also assumed that eventually we would have a database for these JSONs and that they would be requested dynamically. Rather than staying fixed to directory hierarchies forever. This seemed like the way to get to proper versioning etc... But maybe we never leave S3-style blob storage...

One question, if we went with the nested structure, what does the augur output JSON become in the example of flu_seasonal_h3n2_ha_2y.json? Something like flu/seasonal/h3n2/ha/2y/build.json? Or flu/seasonal/h3n2/ha/2y/auspice.json? I don't like flu/seasonal/h3n2/ha/2y.json.

@jameshadfield
Member Author

Thank you for your comments. I think we are agreed this is the better direction so I've added it to the nextstrain roadmap.

@tsibley
Member

tsibley commented Jan 28, 2019

I think I'm historically the big proponent of flat JSON structures, i.e. flu_seasonal_h3n2_ha_2y.json. I had mainly liked this in the same way as I generally prefer to name my files with relevant information when doing bioinformatics rather than relying on path. This follows others' advice.

Nod. I guess I tend to consider the path (from the root of whatever project dir) part of the file's name rather than some disconnected, independent thing. Paths are a ubiquitous method of establishing a naming hierarchy with support for traversal and globbing operations, and it's a little weird to me to re-create all of that on disk and in our code using underscores.

I won't push on this more for now; it's not super-important, just persistently strange to me.

Regardless, at the very least, we should be including in the combined JSON the relevant data fields

Yes, agreed!

I had also assumed that eventually we would have a database for these JSONs and that they would be requested dynamically. Rather than staying fixed to directory hierarchies forever. This seemed like the way to get to proper versioning etc... But maybe we never leave S3-style blob storage...

Maybe! Though versioning can be done lots of ways "properly", with or without a traditional database, and the manner in which the data is stored doesn't obviate our hierarchical access patterns (i.e. what seasonal flu HA builds do we have?) or the subsetting nature of the dataset generation.

One question, if we went with the nested structure, what does the augur output JSON become in the example of flu_seasonal_h3n2_ha_2y.json? Something like flu/seasonal/h3n2/ha/2y/build.json? Or flu/seasonal/h3n2/ha/2y/auspice.json? I don't like flu/seasonal/h3n2/ha/2y.json.

With the v1 schema it'd be flu/seasonal/h3n2/ha/2y/tree.json, …/meta.json, …/tip-frequencies.json, etc. With the v2 schema it'd be the same, except that the tree and meta files are combined into …/auspice.json or …/main.json or …/tree-meta.json or something TBD.

jameshadfield added a commit to nextstrain/nextstrain.org that referenced this issue Feb 28, 2019
Previously the dataset selectors were not in the desired order due to the default value appearing first. This commit fixes this for the nextstrain.org server. Note that this functionality will be moved to the client when the new server API is implemented (see nextstrain/auspice#687).

This commit closes Auspice issue nextstrain/auspice#696.

4 participants