Include original GISAID strain name in metadata #192

Merged: 4 commits, Oct 31, 2024

@huddlej (Contributor) commented Oct 24, 2024

Description of proposed changes

  • Add the GISAID strain name to the list of fields fetched from fauna and parsed from the raw sequences FASTA in the "upload" workflow. This change includes these original names in the metadata in S3.
  • Export GISAID strain names as a metadata column in Auspice JSONs
  • Set GISAID strain names as default tip label
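
As a rough illustration of the first bullet, the sketch below maps a delimited FASTA header onto named metadata fields, with `gisaid_strain` as one of them. The field list, its order, the `|` separator, and the example header are all assumptions made for illustration; the actual fields are defined in the workflow's own configuration.

```python
# Hypothetical field order; the real list lives in the workflow config.
FASTA_FIELDS = ["strain", "virus", "accession", "collection_date", "region", "gisaid_strain"]


def parse_header(header, fields=FASTA_FIELDS, separator="|"):
    """Map a delimited FASTA header line onto named metadata fields."""
    values = header.lstrip(">").strip().split(separator)
    if len(values) != len(fields):
        raise ValueError(
            f"expected {len(fields)} fields, got {len(values)}: {header!r}"
        )
    return dict(zip(fields, values))


# Made-up header: the first field is the curated strain name,
# the last is the original name as submitted to GISAID.
record = parse_header(
    ">A/Example/1/2024|flu|EPI_ISL_0000|2024-01-15|north_america|A/EXAMPLE/1/2024"
)
print(record["strain"], "->", record["gisaid_strain"])
```

If a new field is added to the header but not to the field list (or the reverse), this sketch fails loudly rather than silently shifting columns.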

Related issue(s)

Checklist

  • Checks pass

Adds the GISAID strain name to the list of fields fetched from fauna and
parsed from the raw sequences FASTA in the "upload" workflow. This change
includes these original names in the metadata in S3.

Adds `gisaid_strain` to Auspice config JSONs for public and private
Nextstrain builds, allowing users to display those original strain names
as tip labels or search for samples by those names.

Sets the default tip label in public Nextstrain builds to the GISAID
strain name. This change should reduce confusion for users who want to
cross-check Nextstrain analyses with the original data in GISAID.

To minimize confusion when comparing genetic data to serological data by
strain name, this commit does not change the default label in private
builds.
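
The difference between public and private builds described in these commits can be sketched as follows. This is a minimal sketch, not the configs from this PR: it assumes the Auspice config keys `metadata_columns` and `display_defaults.tip_label` are the ones involved, and the helper `add_gisaid_strain` and the example titles are hypothetical.

```python
import json


def add_gisaid_strain(config, set_default_tip_label=False):
    """Return a copy of an Auspice config dict that exports gisaid_strain."""
    updated = json.loads(json.dumps(config))  # deep copy via JSON round trip

    # Export the column so it can be shown as a tip label or searched.
    columns = updated.setdefault("metadata_columns", [])
    if "gisaid_strain" not in columns:
        columns.append("gisaid_strain")

    # Only public builds switch the default tip label.
    if set_default_tip_label:
        updated.setdefault("display_defaults", {})["tip_label"] = "gisaid_strain"

    return updated


public = add_gisaid_strain({"title": "example public build"}, set_default_tip_label=True)
private = add_gisaid_strain({"title": "example private build"})

print(json.dumps(public, indent=2))
print(json.dumps(private, indent=2))
```

Under these assumptions, both configs export the column, but only the public one changes what users see on the tree by default.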
@huddlej (Contributor, Author) commented Oct 25, 2024

Below is an example view before and after setting the default tip label to the GISAID strain.

Before:
[image]

After:
[image]

@huddlej huddlej requested review from trvrb and rneher October 25, 2024 19:32
Minimize confusion for ourselves by removing lists of FASTA fields from
builds that download parsed metadata and sequences from S3 and don't
need these fields. We could easily get confused into thinking that new
fields like `gisaid_strain` need to be added to each of these build
configs when that isn't the case.
@huddlej (Contributor, Author) commented Oct 31, 2024

I'm going to merge this to include GISAID strain names in the weekly ingest, but if we decide that the default tip label change here is a bad idea, it is easy to revert for future builds.

@huddlej huddlej merged commit efaf11b into master Oct 31, 2024
3 checks passed
@huddlej huddlej deleted the export-gisaid-strain-names branch October 31, 2024 16:05
Linked issue that merging this pull request may close: Export original GISAID strain name in metadata for each seasonal flu build