Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

default nucleotide and amino acid sequence #510

Merged

Conversation

fengelniederhammer
Copy link
Contributor

@fengelniederhammer fengelniederhammer commented Jul 16, 2024

resolves #454

Summary

Changes in this PR:

  • when no default nucleotide sequence is set in the config, set it to absent (instead of defaulting to "main")
  • read the default amino acid sequence from the config (instead of hardcoding it to absent)
  • validate that the values are actually present in the reference genomes (if set)
  • if there is only one sequence of a type in the reference genomes, also set the default sequence in the database config
  • add a unit test that checks that relevant mutation queries work without a sequenceName field, if the config has the default sequences.

PR Checklist

Copy link
Contributor

github-actions bot commented Jul 16, 2024

This is a preview of the changelog of the next release:

0.2.9 (2024-07-18)

Features

  • have no default sequence by default, implement default amino acid sequence from config (448a6de), closes #454
  • set the default sequence when there is only a single sequence in the reference genomes (092c65b), closes #454
  • throw error in preprocessing when default sequence is not in reference genomes (c739ad6), closes #454

Bug Fixes

  • streaming: report an error when the unimplemented offset or limit matters (3be9c06), closes #511

@fengelniederhammer fengelniederhammer force-pushed the 454-default-nucleotide-and-amino-acid-sequence branch from 1666205 to f92a472 Compare July 16, 2024 20:35
Copy link
Collaborator

@Taepper Taepper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was quick! A few minor comments

src/silo/config/database_config.cpp Outdated Show resolved Hide resolved
@@ -126,6 +128,42 @@ Database Preprocessor::preprocess() {
);
}

void Preprocessor::checkConfig() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this function name a little confusing

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because of the name or because of the content? I'm also not super happy with the name. Suggestions are welcome ;)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name made me think that it would not mutate the config
Maybe finalize / resolve?

Otherwise an extra class ValidatedDatabaseConfig which also takes the ReferenceGenomes in its constructor would be viable

or we can leave it as a TODO for #426 , where we want to specify the unaligned sequences (and I would also say the aligned nuc and aa sequences) in the database config to reduce this dependency. Then the reference genomes is no longer a config for the sequences in the database but only a map for looking up the references

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok good point. check doesn't imply mutating. I renamed it to finalizeConfig.

ValidatedDatabaseConfig is only partially true, because you also need to validate it against the data (e.g. that all metadata fields are present etc.). So I'm not sure whether that would be a good name either.
If we want to redesign this anyway, then I'd propose to leave the rest to #426.

@fengelniederhammer fengelniederhammer force-pushed the 454-default-nucleotide-and-amino-acid-sequence branch 2 times, most recently from 05a167e to 245bea9 Compare July 17, 2024 10:16
@fengelniederhammer fengelniederhammer force-pushed the 454-default-nucleotide-and-amino-acid-sequence branch from 245bea9 to f15a7fb Compare July 18, 2024 07:15
@fengelniederhammer fengelniederhammer merged commit fa77c5a into main Jul 18, 2024
10 checks passed
@fengelniederhammer fengelniederhammer deleted the 454-default-nucleotide-and-amino-acid-sequence branch July 18, 2024 08:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Default nucleotide and amino acid sequence
2 participants