Ability to export branch labels #720

Closed
jameshadfield opened this issue Apr 29, 2021 · 1 comment · Fixed by #728

@jameshadfield
Member

Currently there is no ability for augur export v2 to export custom branch labels.

In general, node-data.json defines traits for nodes via the following structure,

{
    "nodes": {
        "NODE_NAME": {
            "TRAIT_NAME": "VALUE"
        }
    }
}

which augur export v2 maps onto nodes as such:

"NODE_NAME": {
    "node_attrs": {
        "TRAIT_NAME": {
            "value": "VALUE"
        }
    }
}
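
To make the mapping above concrete, here is a minimal sketch in Python. The function name `traits_to_node_attrs` is hypothetical and is not taken from the augur codebase; it only reproduces the two structures shown in this issue.

```python
def traits_to_node_attrs(node_data):
    """Map {"nodes": {name: {trait: value}}} onto
    {name: {"node_attrs": {trait: {"value": value}}}}.

    Illustrative only: this mirrors the structures quoted in the issue,
    not the actual implementation inside `augur export v2`.
    """
    exported = {}
    for node_name, traits in node_data.get("nodes", {}).items():
        exported[node_name] = {
            "node_attrs": {
                trait: {"value": value} for trait, value in traits.items()
            }
        }
    return exported


example = {"nodes": {"NODE_NAME": {"TRAIT_NAME": "VALUE"}}}
result = traits_to_node_attrs(example)
print(result)
```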

There are two "special-case" situations which are relevant here:

  1. if TRAIT_NAME == "clade_annotation" then augur will export this as a branch label rather than a node_attr. "clade_annotation" is typically produced by augur clades and is how we get clade labelling in most of our datasets.

  2. augur export v2 automatically creates a branch label for "aa" if a node-data file with "aa_muts" is provided.

These two cases are the only times that augur export adds information to the branch_attrs of a node. This means that there is no ability for augur export v2 to set custom branch labels for (internal) nodes.
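
The two special cases can be sketched roughly as follows. This is an illustration under assumptions rather than augur's real code: the helper name `split_traits`, the `branch_attrs -> labels` layout, and the way the "aa" label string is formatted from `aa_muts` are all assumed here.

```python
def split_traits(traits):
    """Route one node's traits into node_attrs or branch labels.

    Assumptions (not confirmed by this issue): labels live under
    branch_attrs -> labels, and the "aa" label is a joined string
    of per-gene mutations.
    """
    node_attrs, labels = {}, {}
    for name, value in traits.items():
        if name == "clade_annotation":
            # Special case 1: exported as the "clade" branch label.
            labels["clade"] = value
        elif name == "aa_muts":
            # Special case 2: an "aa" branch label derived from mutations.
            parts = [
                f"{gene}: {', '.join(muts)}"
                for gene, muts in sorted(value.items())
                if muts
            ]
            if parts:
                labels["aa"] = "; ".join(parts)
        else:
            # Everything else becomes an ordinary node attribute.
            node_attrs[name] = {"value": value}
    exported = {"node_attrs": node_attrs}
    if labels:
        exported["branch_attrs"] = {"labels": labels}
    return exported


node = split_traits({
    "clade_membership": "20A",
    "clade_annotation": "20A",
    "aa_muts": {"S": ["D614G"], "N": []},
})
print(node)
```

Note that nothing in this flow lets a node-data file contribute a label under any other name, which is the gap this issue describes.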

@jameshadfield jameshadfield transferred this issue from nextstrain/ncov Apr 29, 2021
@jameshadfield
Member Author

jameshadfield commented Apr 29, 2021

One implementation would be to look for a top-level "branch_labels" key in the node-data JSON. For instance, we could modify augur clades to produce something like

"nodes": {
    "NODE_0000549": {
        "clade_membership": "20A"
    }
},
"branch_labels": {
    "NODE_0000549": {
        "clade": "20A"
    }
}

This would imply that augur clades had additional arguments along the lines of

augur clades --name clade_membership --root-label clade

This would remove the need for augur export v2 to special-case "clade_annotation", and would also allow custom branch labels from arbitrary node-data inputs.
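
A rough sketch of how an exporter could consume the proposed top-level "branch_labels" key is below. It assumes the output nests labels under `branch_attrs -> labels`; the function name `export_with_branch_labels` is hypothetical.

```python
def export_with_branch_labels(node_data):
    """Combine "nodes" traits and the proposed "branch_labels" key per node.

    Sketch of the proposal only: no "clade_annotation" special case is
    needed because labels arrive under their own top-level key.
    """
    exported = {}
    for node_name, traits in node_data.get("nodes", {}).items():
        exported.setdefault(node_name, {})["node_attrs"] = {
            trait: {"value": value} for trait, value in traits.items()
        }
    for node_name, labels in node_data.get("branch_labels", {}).items():
        # Copy the labels so the input structure is left untouched.
        exported.setdefault(node_name, {})["branch_attrs"] = {
            "labels": dict(labels)
        }
    return exported


proposal = {
    "nodes": {"NODE_0000549": {"clade_membership": "20A"}},
    "branch_labels": {"NODE_0000549": {"clade": "20A"}},
}
exported = export_with_branch_labels(proposal)
print(exported)
```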

@jameshadfield jameshadfield self-assigned this May 27, 2021
jameshadfield added a commit that referenced this issue May 27, 2021
Previously branch labels could not be specified in data passed to
`augur export v2` except for two "special cases":
(1) Mutations (stored in node-data-json -> nodes) would create branch
labels "aa", if applicable.
(2) `clade_annotation` (stored in node-data-json -> nodes) was
interpreted to be the "clade" branch label, and exported as such.

Here we extend the allowed node-data structure to include a top-level
key `branch_labels` as described in [1]. This data is exported
in the appropriate format for Auspice. This provides a way for
analyses to export custom branch labels other than the two special
cases described above.

(Currently no augur commands can produce such data, but this will
change. See [1] for more.)

[1] #720
jameshadfield added a commit that referenced this issue May 27, 2021
Previously the `augur clades` command produced a node-data JSON
which stored clade membership as the node-trait "clade_membership"
and defined the basal nodes of each clade with the node-trait
"clade_annotation". `augur export v2` interpreted the latter as a
special-case and produced a branch label with the same name.

The previous commit allowed `augur export` to be supplied node-data
JSONs with a `branch_labels` structure. Here we update `augur clades`
to export data in this structure, which allows the user to specify
the keys to use.

To preserve backwards compatibility if neither key is specified, we
use the previously hardcoded key names, thus allowing workflows
to complete without needing to be updated.

Closes #720
jameshadfield added a commit that referenced this issue May 28, 2021
Previously the `augur clades` command produced a node-data JSON
which stored clade membership as the node-trait "clade_membership"
and defined the basal nodes of each clade with the node-trait
"clade_annotation". `augur export v2` interpreted the latter as a
special-case and produced a branch label with the same name.

The previous commit allowed `augur export` to be supplied node-data
JSONs with a `branch_labels` structure. Here we update `augur clades`
to export data in this structure, which allows the user to specify
the keys to use.

To preserve backwards compatibility if neither key is specified, we
default to trait-name="clade_membership" and label-name="clade", which
will be exported from `augur export v2` correctly without needing
any configuration changes.

Closes #720
jameshadfield added a commit that referenced this issue Jun 10, 2021
Previously the `augur clades` command produced a node-data JSON
which stored clade membership as the node-attr "clade_membership"
and defined the basal nodes of each clade with the node-attr
"clade_annotation". `augur export v2` interpreted the latter as a
special-case and turned it into a branch label of the same name.

The previous commit allowed `augur export` to be supplied node-data
JSONs with a `branch_labels` structure. Here we update `augur clades`
to export data in this structure, which allows the user to specify
the keys to use via the `--attribute-name` arg.

This commit breaks backwards compatibility for pipelines as the default
attribute name is "clade". This will result in dataset (auspice) JSONs
with the same branch labelling as before, but with a different node-attr
(was "clade_membership", now "clade"). As `augur export v2` will make
colorings for all node-attrs in node-data JSONs, this will be
exported as a "clade" coloring with no changes needed; however, auspice
config JSONs may now refer to a non-existent "clade_membership" key.

`augur export v2` has been updated to no longer special-case
`clade_membership` or `clade_annotation` node attrs. We print a
warning if an auspice config JSON refers to `clade_membership` to
help users update their configs.

Functional tests for `augur clades` have been added.

Closes #720
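
As an illustration of the warning described in this commit message, the check could look something like the sketch below. The config keys inspected (`colorings`, `filters`, `display_defaults.color_by`) are assumed locations where an auspice config might name a coloring; the function name and the warning text are hypothetical, not augur's actual output.

```python
import sys


def find_stale_clade_keys(config, old_key="clade_membership"):
    """Return the places in an auspice config that still name `old_key`.

    Hypothetical helper: the set of config locations checked here is an
    assumption, not a statement of what `augur export v2` inspects.
    """
    hits = []
    for coloring in config.get("colorings", []):
        if coloring.get("key") == old_key:
            hits.append("colorings")
    if old_key in config.get("filters", []):
        hits.append("filters")
    if config.get("display_defaults", {}).get("color_by") == old_key:
        hits.append("display_defaults.color_by")
    return hits


config = {
    "colorings": [{"key": "clade_membership", "title": "Clade"}],
    "filters": ["country", "clade_membership"],
    "display_defaults": {"color_by": "clade_membership"},
}
stale = find_stale_clade_keys(config)
for location in stale:
    print(
        f"WARNING: auspice config refers to 'clade_membership' in {location}; "
        "the default attribute name is now 'clade'.",
        file=sys.stderr,
    )
```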
jameshadfield added a commit that referenced this issue Jun 15, 2021
Previously branch labels could not be specified in data passed to
`augur export v2` except for two "special cases":
(1) Mutations (stored in node-data-json -> nodes) would create branch
labels "aa", if applicable.
(2) `clade_annotation` (stored in node-data-json -> nodes) was
interpreted to be the "clade" branch label, and exported as such.

Here we extend the allowed node-data structure to include a top-level
key `branch_labels` as described in [1]. This data is exported
in the appropriate format for Auspice. This provides a way for
analyses to export custom branch labels other than the two special
cases described above.

(Currently no augur commands can produce such data, but this will
change. See [1] for more.)

[1] #720
jameshadfield added a commit that referenced this issue Jun 15, 2021
Previously the `augur clades` command produced a node-data JSON
which stored clade membership as the node-attr "clade_membership"
and defined the basal nodes of each clade with the node-attr
"clade_annotation". `augur export v2` interpreted the latter as a
special-case and turned it into a branch label of the same name.

The previous commit allowed `augur export` to be supplied node-data
JSONs with a `branch_labels` structure. Here we update `augur clades`
to export data in this structure, which allows the user to specify
the keys to use via the `--attribute-name` arg.

This commit breaks backwards compatibility for pipelines as the default
attribute name is "clade". This will result in dataset (auspice) JSONs
with the same branch labelling as before, but with a different node-attr
(was "clade_membership", now "clade"). As `augur export v2` will make
colorings for all node-attrs in node-data JSONs, this will be
exported as a "clade" coloring with no changes needed; however, auspice
config JSONs may now refer to a non-existent "clade_membership" key.

`augur export v2` has been updated to no longer special-case
`clade_membership` or `clade_annotation` node attrs. We print a
warning if an auspice config JSON refers to `clade_membership` to
help users update their configs.

Functional tests for `augur clades` have been added.

Closes #720
jameshadfield added a commit that referenced this issue Jun 15, 2021
Previously branch labels could not be specified in data passed to
`augur export v2` except for two "special cases":
(1) Mutations (stored in node-data-json -> nodes) would create branch
labels "aa", if applicable.
(2) `clade_annotation` (stored in node-data-json -> nodes) was
interpreted to be the "clade" branch label, and exported as such.

Here we extend the allowed node-data structure to include a top-level
key `branches` as described in [1] and the test data added here [2].
This data is exported in the appropriate format for Auspice (unchanged).
This provides a way for analyses to export custom branch labels other
than the two special cases described above. Note that currently no augur
commands can produce such data, but this will change - see [1] for more.

This work also induced two smaller changes. Firstly, the auspice config JSON
schema is extended to allow the default branch label displayed to be any value.
Secondly, the requirement for node-data JSONs to specify "nodes" has
been relaxed (see [2] for an example); if neither "nodes" nor "branches"
are defined then we raise a validation error.

[1] #720
[2] ./tests/functional/export_v2/branch-labels.json
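
The relaxed validation rule described above (either "nodes" or "branches" is enough, but at least one must be present) can be sketched as follows. The exception class and function name are hypothetical; augur's own validation code and error type may differ.

```python
class NodeDataValidationError(ValueError):
    """Hypothetical error type used only for this illustration."""


def validate_node_data(node_data):
    """Accept node-data with "nodes", "branches", or both.

    Raises NodeDataValidationError when neither top-level key is defined,
    matching the rule stated in the commit message.
    """
    if "nodes" not in node_data and "branches" not in node_data:
        raise NodeDataValidationError(
            'node-data JSON must define "nodes" and/or "branches"'
        )
    return True


# A file with only "branches" is now acceptable.
branches_only_ok = validate_node_data({"branches": {"NODE_0000549": {}}})

# A file with neither key is rejected.
try:
    validate_node_data({"generated_by": {"program": "example"}})
    rejected = False
except NodeDataValidationError:
    rejected = True
print(branches_only_ok, rejected)
```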
jameshadfield added a commit that referenced this issue Jun 15, 2021
Previously the `augur clades` command produced a node-data JSON
which stored clade membership as the node-attr "clade_membership"
and defined the basal nodes of each clade with the node-attr
"clade_annotation". `augur export v2` interpreted the latter as a
special-case and turned it into a branch label of the same name.

The previous commit allowed `augur export` to be supplied node-data
JSONs with a `branches` dictionary. Here we update `augur clades`
to export data in this structure, which allows the user to specify
the keys to use via the `--attribute-name` arg.

This commit breaks backwards compatibility for pipelines as the default
attribute name is "clade". This will result in dataset (auspice) JSONs
with the same branch labelling as before, but with a different node-attr
(was "clade_membership", now "clade"). As `augur export v2` will make
colorings for all node-attrs in node-data JSONs, this will be
exported as a "clade" coloring with no changes needed; however, auspice
config JSONs may now refer to a non-existent "clade_membership" key.

`augur export v2` has been updated to no longer special-case
`clade_membership` or `clade_annotation` node attrs. We print a
warning if an auspice config JSON refers to `clade_membership` to
help users update their configs.

Functional tests for `augur clades` have been added.

Closes #720
@huddlej huddlej moved this from New to In Review in Nextstrain planning (archived) Jan 27, 2022
jameshadfield added a commit that referenced this issue Sep 9, 2022
Previously branch labels could not be specified in data passed to
`augur export v2` except for two "special cases":
(i) AA mutations (stored in node-data-json -> nodes) would create branch
labels "aa", if applicable.
(ii) `clade_annotation` (stored in node-data-json -> nodes) was
interpreted to be the "clade" branch label, and exported as such.

Here we extend the allowed node-data structure to include a top-level
key `branches` as described in [1] and the test data added here [2].
This data is exported in the appropriate format for Auspice (unchanged).
This paves the way for pipelines to define a range of branch labels for
export. Currently the only usable key in this dict is 'labels'.

If a branch label (via node-data-json -> branches -> node_name -> label)
is provided for 'aa' or 'clade' then this will overwrite the values
generated above (i, ii).

A side-effect of this work is that the requirement for node-data JSONs
to specify "nodes" has been relaxed (see [2] for an example); however
if neither "nodes" nor "branches" are defined then we raise a validation
error.

[1] #720
[2] ./tests/functional/export_v2/branch-labels.json
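
The override behaviour described in this commit message can be illustrated with a small sketch. It assumes the node-data layout is `branches -> node_name -> labels -> label_name`, which is my reading of the message above, and the function name `merge_branch_labels` is hypothetical.

```python
def merge_branch_labels(generated, node_data):
    """Overlay labels from node-data "branches" onto generated labels.

    `generated` maps node name -> {label_name: value} and stands in for
    the 'aa' and 'clade' labels produced by special cases (i) and (ii).
    Explicit labels in `node_data` win when the names collide.
    """
    # Copy so the caller's generated labels are not mutated.
    merged = {name: dict(labels) for name, labels in generated.items()}
    for node_name, branch in node_data.get("branches", {}).items():
        merged.setdefault(node_name, {}).update(branch.get("labels", {}))
    return merged


generated = {"NODE_0000549": {"clade": "20A", "aa": "S: D614G"}}
node_data = {
    "branches": {
        "NODE_0000549": {"labels": {"clade": "20A (custom)"}},
        "NODE_0000001": {"labels": {"emerging_lineage": "X"}},
    }
}
merged = merge_branch_labels(generated, node_data)
print(merged)
```

Here the custom "clade" label replaces the generated one, the generated "aa" label is kept, and a label with an arbitrary name is added on another node.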
jameshadfield added a commit that referenced this issue Sep 12, 2022
Previously branch labels could not be specified in data passed to
`augur export v2` except for two "special cases":
(i) AA mutations (stored in node-data-json -> nodes) would create branch
labels "aa", if applicable.
(ii) `clade_annotation` (stored in node-data-json -> nodes) was
interpreted to be the "clade" branch label, and exported as such.

Here we extend the allowed node-data structure to include a top-level
key `branches` as described in [1] and the test data added here [2].
This data is exported in the appropriate format for Auspice (unchanged).
This paves the way for pipelines to define a range of branch labels for
export. Currently the only usable key in this dict is 'labels'.

If a branch label (via node-data-json -> branches -> node_name -> label)
is provided for 'aa' or 'clade' then this will overwrite the values
generated above (i, ii).

A side-effect of this work is that the requirement for node-data JSONs
to specify "nodes" has been relaxed (see [2] for an example); however
if neither "nodes" nor "branches" are defined then we raise a validation
error.

[1] #720
[2] ./tests/functional/export_v2/branch-labels.json
jameshadfield added a commit that referenced this issue Apr 11, 2023
Previously branch labels could not be specified in data passed to
`augur export v2` except for two "special cases":
(i) AA mutations (stored in node-data-json -> nodes) would create branch
labels "aa", if applicable.
(ii) `clade_annotation` (stored in node-data-json -> nodes) was
interpreted to be the "clade" branch label, and exported as such.

Here we extend the allowed node-data structure to include a top-level
key `branches` as described in [1] and the test data added here [2].
This data is exported in the appropriate format for Auspice (unchanged).
This paves the way for pipelines to define a range of branch labels for
export. Currently the only usable key in this dict is 'labels'.

If a branch label (via node-data-json -> branches -> node_name -> label)
is provided for 'aa' or 'clade' then this will overwrite the values
generated above (i, ii).

A side-effect of this work is that the requirement for node-data JSONs
to specify "nodes" has been relaxed (see [2] for an example); however
if neither "nodes" nor "branches" are defined then we raise a validation
error.

[1] #720
[2] ./tests/functional/export_v2/branch-labels.json
@github-project-automation github-project-automation bot moved this from In Review to Done in Nextstrain planning (archived) May 4, 2023