ENH(nextclade cli): nextclade dataset list: indicate whether clades can be assigned #1458

Closed
AngieHinrichs opened this issue May 29, 2024 · 7 comments · Fixed by #1473
Labels
t:feat Type: request of a new feature, functionality, enhancement

Comments

@AngieHinrichs

In the output of nextclade dataset list it would be very helpful to have an indication of whether clades can be assigned using each dataset. For example, dataset nextstrain/flu/h3n2/ha/EPI1857216 can assign clades, but nextstrain/flu/h3n2/pb1 cannot (it has no tree.json). Currently, in order to determine that, I need to download each dataset and look for tree.json.

Does the presence of tree.json in a dataset always mean that clades can be assigned? If so, then hopefully it would be straightforward for nextclade dataset list to report whether pathogen.json includes treeJson.

AngieHinrichs added the labels good first issue, help wanted, needs triage, and t:feat on May 29, 2024
@ivan-aksamentov
Member

ivan-aksamentov commented May 29, 2024

@AngieHinrichs

Hi Angie,

> Does the presence of tree.json in a dataset always mean that clades can be assigned?

We released 3.6.0 earlier today, in which clades are optional even if the tree is present. Before that, our folks used an empty string in place of the clade_membership tree field as a workaround when clades were missing from the tree for one reason or another (most of the time due to unclear nomenclature or lack of time).

Currently, I'd say downloading the tree and checking whether it contains at least one .node_attrs.clade_membership is a safe bet.
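As a rough illustration of that check, here is a minimal Python sketch. It is not code from Nextclade itself. It assumes the tree file follows the Auspice v2 JSON layout, with the root node (or a list of root nodes) under a top-level "tree" key and each clade stored as node_attrs.clade_membership.value. The helper names iter_nodes, has_clades, and load_tree are hypothetical. Because an empty string has been used as a placeholder for missing clades, empty values are treated as "no clade".

```python
import json
from typing import Any, Iterator


def iter_nodes(tree: Any) -> Iterator[dict]:
    """Yield every node of the tree, depth-first, using an explicit stack."""
    # Assumption: "tree" is either a single root node or a list of root nodes.
    stack = list(tree) if isinstance(tree, list) else [tree]
    while stack:
        node = stack.pop()
        yield node
        stack.extend(node.get("children", []))


def has_clades(tree_json: dict) -> bool:
    """Return True if at least one node carries a non-empty clade_membership."""
    for node in iter_nodes(tree_json.get("tree", {})):
        attr = node.get("node_attrs", {}).get("clade_membership")
        # Assumption: the attribute is an object such as {"value": "3C.2a"}.
        value = attr.get("value") if isinstance(attr, dict) else attr
        if isinstance(value, str) and value.strip():
            return True
    return False


def load_tree(path: str) -> dict:
    """Read a downloaded tree.json from disk."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

For a downloaded dataset, `has_clades(load_tree("tree.json"))` would then report whether any clade is present on the tree.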

In the official datasets in the data repo, when rebuilding the dataset index, we could enumerate each dataset's "capabilities". I already have some basics emitted into the index.json of the dataset server, but not clade assignment. It might be a good addition.

Do you have any other such capabilities in mind that we could add? I have difficulty imagining how that would look from the user's perspective, as I don't use Nextclade often myself :)

Once we have a list of capabilities in the index, the --json flag of the dataset list command should show it as it appears in the index. The list can then be pretty-printed in the CLI and rendered in Web in some way. Any preferences here?

ivan-aksamentov removed the labels needs triage, good first issue, and help wanted on May 29, 2024
@ivan-aksamentov
Member

We should also not forget about clade-like attributes which may also be present on the tree in .meta.extensions.nextclade.clade_node_attrs, e.g. lineages in SC2 trees.
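A small Python sketch of how those clade-like attributes could be listed from a tree file. The path meta.extensions.nextclade.clade_node_attrs comes from the comment above. The assumption that each entry is an object with a "name" key is mine, and the function name clade_like_attr_names is hypothetical.

```python
def clade_like_attr_names(tree_json: dict) -> list:
    """List the names of clade-like attributes declared in the tree metadata."""
    # Each level may be absent or null, so fall back to an empty container.
    meta = tree_json.get("meta") or {}
    extensions = meta.get("extensions") or {}
    nextclade = extensions.get("nextclade") or {}
    attrs = nextclade.get("clade_node_attrs") or []
    # Assumption: each entry looks like {"name": "...", "displayName": "..."}.
    return [a["name"] for a in attrs if isinstance(a, dict) and a.get("name")]
```

An empty list means the tree declares no clade-like attributes, which is separate from whether clade_membership itself is present.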

@ivan-aksamentov
Member

The tree-related capabilities could be computed in the rebuild script somewhere around here, I guess
https://github.com/nextstrain/nextclade_data/blob/403e2574654daacc40b0face461965da41e953d2/scripts/rebuild#L43-L45

@AngieHinrichs
Author

> The tree-related capabilities could be computed in the rebuild script somewhere around here, I guess https://github.com/nextstrain/nextclade_data/blob/403e2574654daacc40b0face461965da41e953d2/scripts/rebuild#L43-L45

Yes, if you could add "clades" there, the way you add "customClades", and include the capabilities in the CLI list output, that would be great! At the moment, clades are what I'm keen to see, but I would not mind seeing other special capabilities listed.

@ivan-aksamentov
Member

Implemented in #1473 and nextstrain/nextclade_data#205

@ivan-aksamentov
Member

Released in 3.7.0

@AngieHinrichs
Author

Fantastic, thanks! The types and counts are really helpful!

2 participants