Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pan draft #205

Merged
merged 19 commits into from
Mar 28, 2024
Merged

Pan draft #205

merged 19 commits into from
Mar 28, 2024

Conversation

nicola-debernardini
Copy link
Contributor

Dear Johannes,

I hope this email finds you well.
It's been a while since our first meeting. I wanted to reach out regarding the new feature implemented in gapseq. Silvio mentioned that you would be interested in checking it out.

I'm getting close to the final version of the manuscript and thought it would be a good moment to make the pull request. I would like to submit within the next week if the last implementations works smoothly. I've also shared the manuscript with you via email ([email protected]).
The added code doesn't interfere much with the original one, except for the "home" script. I've added a new script ("src/pan-draft.R") along with its supporting functions in "src/pan-draft_functions.R". While I tried to follow your layout, there may be some differences in the structure. Please let me know if there's anything that needs fixing.

One aspect I'm unsure about is the implementation of plots using the functions in "src/pan-draft_plotting.R". By default, graphs are not generated, but they can be useful for visualizing the results. I implemented them month ago when I was checking the features of the individual pan-Draft that I was generating. However, depending on the number of genomes in the input, the results may not scale well on the axis and could be slow. There are two point in the script that make plots, one of them is related to the "micropan" library. If you have the time, I'd appreciate if you could take a look and provide your thoughts on whether to exclude them or not.

Additionally, I couldn't visualize the documentation section because of access limitation. If there are any issues with it, please let me know, and I'll make the necessary adjustments.

Thank you for taking the time to review the code and documentation.
I will be available for anything at [email protected].

Best regards,
Nicola

@jotech
Copy link
Owner

jotech commented Mar 22, 2024

Dear Nicola

thank you very much for your work and pull request!
I'm happy to merge. Could you think perhaps about some
smaller changes?

  • I get why you added txt files as list for input files, I think this is a bit confusing because it hides the nature of the input files. One alternative would be to allow the user to provide either comma-separated file names or a generic file name with an asterisk (*)
  • I would suggest putting all files into the toy folder without a subfolder
  • Should the plotting script be part of gapseq, or is it rather further analysis that could be put somewhere else? I'm asking because the other modules also do not provide plotting, and your script would introduce new dependencies (ggplot, etc) that are not taken care of
  • Is the gapseq version stored in the models? (there is a TODO at the beginning of pan-draft.R script)
  • Could you add an example to the documentation on continuing the pipeline after gapseq pan?

@nicola-debernardini
Copy link
Contributor Author

Dear Johannes,

Thank you for your suggestion. I agree and have already implemented them. Let me know what you think about them.

To provide an overview, I made the following changes:

  • Removed the .txt files as input options and added three alternatives: comma-separated file names, file names with wildcards, paths to folders;
  • Removed the subfolder from toy/ for better organization;
  • Improved the documentation by adding input examples to the modules and further examples for connecting the output to the pipeline;
  • Modified pan-draft.R to include the gapseq version in the species-level model and removed the plotting options and plotting code.

Please let me know if you have any further feedback or suggestions.

- input files defined by wildcards was not working correctly
- simplified code
- update example in help
@jotech
Copy link
Owner

jotech commented Mar 27, 2024

Dear Nicola
The additional commits look very good! I modified the input file handling a bit. Could you check it out?
Many thanks for your efforts!

@jotech jotech merged commit a354bbc into jotech:master Mar 28, 2024
@jotech
Copy link
Owner

jotech commented Mar 28, 2024

great thank you for the additional fix!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants