
Inputs and outputs for mapping #22

Closed
samriddhi99 opened this issue Aug 8, 2023 · 1 comment · Fixed by #25

Comments

@samriddhi99
Collaborator

Currently, the inputs are entire scoresets. Of all the data in a scoreset, only the URN, target sequence, UniProt ID, and target type are required for mapping. To make this more efficient, is there a better way to obtain this data than requiring an entire scoreset as input?

Additionally, in line with new and anticipated changes in MaveDB, the TaxID could be taken as an input to obtain additional required data.
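One possible answer to the question above is to pass a slim record holding only the required fields instead of a whole scoreset. The sketch below assumes a hypothetical tab-separated file with the columns `urn`, `target_sequence`, `target_type`, `uniprot_id`, and `tax_id`; these column names, the `MappingMetadata` class, and `load_metadata` are illustrative only and are not part of MaveDB's API.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterator, Optional, TextIO


@dataclass(frozen=True)
class MappingMetadata:
    """Hypothetical slim record holding only the fields the mapper needs."""
    urn: str
    target_sequence: str
    target_type: str
    uniprot_id: Optional[str]  # may be empty for some targets
    tax_id: Optional[int]      # assumed optional until MaveDB provides it


def load_metadata(handle: TextIO) -> Iterator[MappingMetadata]:
    """Yield one record per row of a tab-separated file (assumed layout)."""
    for row in csv.DictReader(handle, delimiter="\t"):
        yield MappingMetadata(
            urn=row["urn"],
            target_sequence=row["target_sequence"].upper(),
            target_type=row["target_type"],
            uniprot_id=row.get("uniprot_id") or None,
            tax_id=int(row["tax_id"]) if row.get("tax_id") else None,
        )


# Made-up example row; the URN and UniProt accession are placeholders.
example = io.StringIO(
    "urn\ttarget_sequence\ttarget_type\tuniprot_id\ttax_id\n"
    "urn:mavedb:00000000-a-1\tatgcat\tProtein coding\tP12345\t9606\n"
)
records = list(load_metadata(example))
```

A frozen dataclass keeps each record immutable once loaded, and an optional `tax_id` field lets the same loader work whether or not that column is present.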

@ahwagner
Collaborator

This software should take as input a target sequence, a set of variants represented on that sequence, and the sequence alphabet type (nucleic acid or amino acid).

You can choose the format for these inputs.
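For illustration, one possible input format bundles the three requested pieces and checks the sequence against its declared alphabet. This is a sketch under assumptions: `MappingInput` and `Alphabet` are hypothetical names, the allowed characters are limited to unambiguous codes (no `N`, gaps, or stop symbols), and variants are kept as plain HGVS-like strings relative to the target.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Alphabet(Enum):
    """Sequence alphabet type requested in the issue."""
    NUCLEIC_ACID = "nucleic_acid"
    AMINO_ACID = "amino_acid"


# Assumed: only unambiguous one-letter codes are accepted.
_ALLOWED = {
    Alphabet.NUCLEIC_ACID: set("ACGT"),
    Alphabet.AMINO_ACID: set("ACDEFGHIKLMNPQRSTVWY"),
}


@dataclass
class MappingInput:
    """Hypothetical input: a target sequence plus variants described on it."""
    target_sequence: str
    alphabet: Alphabet
    variants: List[str] = field(default_factory=list)  # HGVS-like strings

    def __post_init__(self) -> None:
        # Normalize case, then reject characters outside the declared alphabet.
        self.target_sequence = self.target_sequence.upper()
        bad = set(self.target_sequence) - _ALLOWED[self.alphabet]
        if bad:
            raise ValueError(
                f"characters {sorted(bad)} not valid for {self.alphabet.value}"
            )


inp = MappingInput("atgaaa", Alphabet.NUCLEIC_ACID, ["c.1A>G", "c.4A>T"])
```

Validating at construction time means a mismatched alphabet (for example, a protein sequence labeled as nucleic acid) fails before any mapping work starts.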

You can also choose the output format, but it should include:

  • the mapped sequence and associated metadata (minimally the refget and RefSeq sequence identifiers)
  • the mapping relationship (e.g. "homologous_to")
  • each original variant and its mapped variant
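As one possible output layout, the sketch below holds the three required pieces and serializes them to JSON. It computes the refget identifier from the GA4GH `sha512t24u` digest (base64url encoding of the first 24 bytes of a SHA-512 hash of the uppercase sequence). The `SQ.` prefix, the class and field names, and the placeholder variant pair are assumptions. The RefSeq accession is left for the caller to supply, because finding it requires an external lookup.

```python
import base64
import hashlib
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional, Tuple


def sha512t24u(sequence: str) -> str:
    """GA4GH truncated digest: base64url of the first 24 bytes of SHA-512."""
    digest = hashlib.sha512(sequence.encode("ascii")).digest()[:24]
    return base64.urlsafe_b64encode(digest).decode("ascii")


@dataclass
class MappedSequence:
    sequence: str
    refseq_id: Optional[str] = None  # supplied by the caller if one is found
    refget_id: str = field(init=False)

    def __post_init__(self) -> None:
        self.sequence = self.sequence.upper()
        # Assumed "SQ." prefix, following the GA4GH sequence identifier style.
        self.refget_id = "SQ." + sha512t24u(self.sequence)


@dataclass
class MappingOutput:
    mapped_sequence: MappedSequence
    relation: str = "homologous_to"
    # (original variant, mapped variant) pairs, kept in input order.
    variants: List[Tuple[str, str]] = field(default_factory=list)

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)


# Placeholder pair; a real mapped variant would be described on the mapped reference.
out = MappingOutput(
    MappedSequence("MKTAYIAK"),
    variants=[("p.Lys2Arg", "p.Lys2Arg")],
)
```

JSON is used here only because it maps directly onto nested dataclasses via `asdict`; any format that keeps the sequence metadata, the relation, and the variant pairs together would satisfy the requirements above.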
