Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Yaml configuration input #19

Merged
merged 24 commits into from
Dec 4, 2024
Merged

Yaml configuration input #19

merged 24 commits into from
Dec 4, 2024

Conversation

MichaelBarnett
Copy link
Collaborator

Background

  • There was a need for better control on parsing database locations, as well as a system to easily add additional parameterisation into the commec screen program, that was not limited to CLI, thus removing the need for clunky shell/bash scripts.
  • Commec now expects a mandatory default "commec-config.yaml" file as input, found in the repository.

Dependencies:

  • pyyaml is required now, and has been added to the environment.yml python dependencies.

New features

  • The location of the file can be specified as cli input to Commec Screen, however by default it will search for commec-config.yaml, which is also now part of the repository.
  • Commec Setup CLI will update the base path of config file when databases are installed.
  • Database location no longer a mandatory input, as it can be derived from config yaml, however yaml behaviour can be overridden by forcing a database location in CLI when called Commec Screen.
  • Yaml file will always override the blast/diamond protein search tool string - consider removing blast/diamond cli input for screen - kept to make this a non-breaking change for now.
  • By parsing custom yaml files when calling Commec Screen, a user can easily automate the testing of different databases.
  • Allows end user ease of use when installing commec dbs using setup CLI.

Breaking changes

nil, commec screen should still function identically using old calling methods, where the database directory is set via screen CLI.

Copy link
Member

@alexanian alexanian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still need to run some more tests, but here's a few comments!

commec/setup.py Outdated
@@ -617,6 +619,9 @@ def do_setup(self):

os.remove(filename_zipped)

# We update the default path of the configuration file to point to the newly installed databases.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two questions here:

  1. Do we want to allow users to supply a configuration filepath (vs using the default one installed with the package)?
  2. Should we move the name of the file (also referenced in ScreenConfig defaults) to a constants module so we can import it and it only lives in one place?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved as:

  1. No, it is nice to overwrite this (though there may be a touch of confusion when users update the package and overwrite the default; hopefully resolvable, though).
  2. Yes; proposed in 8a3ac8d

@@ -0,0 +1,20 @@
base_paths:
Copy link
Member

@alexanian alexanian Nov 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

High-level suggestion: could we put all the defaults in here (vs. tracking some here and some in ScreenConfig)?

Could turn into something like:

base_paths:
  default: commec-dbs/
databases:
  benign:
    cm:
      path: '{default}benign_db/benign.cm'
    fasta:
      path: '{default}benign_db/benign.fasta'
    hmm:
      path: '{default}benign_db/benign.hmm'
  biorisk_hmm:
    path: '{default}biorisk_db/biorisk.hmm'
  regulated_nt:
    path: '{default}nt_blast/nt'
  regulated_protein:
    blast:
      path: '{default}nr_blast/nr'
    diamond:
      path: '{default}nr_dmnd/nr.dmnd'
screen_logic:
    in_fast_mode: false
    skip_nt_search: false
    protein_search_tool: blastx
parallelisation:
    threads: 1
output_handling:
   force: false
   resume: false

We could then throw an error if we can't find the default config.yaml and are missing overrides from the CLI or a user-supplied config? We already raise a FileNotFoundError if the default yaml can't be found. This wouldn't require users to change their CLI calls (we're just moving around where the defaults are stored) I think.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is 100% the end-game purpose of the yaml, I just didn't add all the other commands from screen yet , as just wanted to get the core of it (yaml system and db directories) up and running.
Could certainly update the parsing to include all relevant cli args in this PR though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We decided that we want to do this eventually, but this should be backlogged, and for now we just want to remove the protein_search_tool piece (since then the config file is more cleanly just about specifying databases, to a level of specificity that was not possible before), proposed change in: 059ad43

"but was invalid as a yaml file:\n",
e,
)
return {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this return value is unused? I'm a bit confused about this vs line 172

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at it more, I think we should decide if the desired behavior here is "raise an exception" (would match with raising an exception when the file is not found) or "no YAML overrides" (probably not viable now that we made the databases argument optional)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now raises an exception as of 8409257

commec/config/screen_tools.py Show resolved Hide resolved
Copy link
Member

@alexanian alexanian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested out; clearly going to be very useful!

I'm hoping @MichaelBarnett you can take a quick look over before squash + merge, since I made a few of the changes we talked about this afternoon (e.g. removing the protein_search_tool key in the default config).

commec/setup.py Outdated
@@ -617,6 +619,9 @@ def do_setup(self):

os.remove(filename_zipped)

# We update the default path of the configuration file to point to the newly installed databases.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved as:

  1. No, it is nice to overwrite this (though there may be a touch of confusion when users update the package and overwrite the default; hopefully resolvable, though).
  2. Yes; proposed in 8a3ac8d

"but was invalid as a yaml file:\n",
e,
)
return {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now raises an exception as of 8409257

@@ -0,0 +1,20 @@
base_paths:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We decided that we want to do this eventually, but this should be backlogged, and for now we just want to remove the protein_search_tool piece (since then the config file is more cleanly just about specifying databases, to a level of specificity that was not possible before), proposed change in: 059ad43

@MichaelBarnett
Copy link
Collaborator Author

Ran some tests, and is working as intended, nicely done on the updates here, looking forward to working with adding in the rest of the settings soon too!

@MichaelBarnett MichaelBarnett merged commit bb75e5b into main Dec 4, 2024
@MichaelBarnett MichaelBarnett deleted the yaml-input branch December 4, 2024 21:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants