Initial Python Version of Genotyping Script-Not Tested Yet #609
Initial Python Version of Genotyping Script
Description:
This pull request was created in response to issue #403. It introduces the initial Python version of our genotyping script. The logic was previously implemented in Bash, and this PR aims to move it to a more maintainable and readable Python implementation.
Please note: I have not thoroughly checked or tested this script. This submission is intended to serve as a starting point for further refinement and optimization. Feedback, suggestions, and thorough reviews are highly encouraged to ensure the quality and functionality of the code.
Additional Note:
I have divided the original Bash script into sections to facilitate the transition from the Bash script to Python. I've added the corresponding line numbers from the Bash script as comments within the Python script for reference and easier tracking. This should aid in understanding the structure and mapping the Python code back to its Bash counterpart.
Room for improvement:
External Command Sanitization: Please ensure that all inputs to os.system() calls are sanitized and validated. This is crucial to prevent potential command injection vulnerabilities.
Variable Initialization: The comments in the script mention that certain variables (like GTDIR and cleaned_output_vcf) should be defined elsewhere. Ensure these variables are initialized correctly in the relevant parts of the code.
Code Repetition: The current version has repeated lines, such as prepare_sample_lists(args.FAMFILE, GTDIR) and setup_genotype_counts_header(GTDIR).
Code Optimization: The logic and methods in the script can likely be optimized.
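For the command sanitization point, one possible direction is sketched below. It is not taken from the script in this PR: the helper names run_command and validated_input_file are hypothetical, and the sketch assumes the external tools can be invoked with an argument list. Passing a list to subprocess.run (without shell=True) means no shell parses the arguments, so characters such as ; or $(...) in a filename are treated as literal text, unlike with os.system().

```python
import subprocess
import sys
from pathlib import Path


def run_command(args, cwd=None):
    """Run an external program without a shell and return its stdout.

    Hypothetical helper. `args` is a list, so each element reaches the
    program as one literal argument and shell metacharacters are not
    interpreted. A non-zero exit status raises CalledProcessError.
    """
    result = subprocess.run(
        [str(a) for a in args],
        cwd=cwd,
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def validated_input_file(path):
    """Return a resolved Path, or raise if it is not an existing regular file.

    Hypothetical helper for checking user-supplied paths before use.
    """
    p = Path(path).resolve()
    if not p.is_file():
        raise FileNotFoundError(f"Input file not found: {p}")
    return p


if __name__ == "__main__":
    # The second argument looks like an injection attempt, but it is
    # passed through as a single literal string.
    out = run_command(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", "a; echo injected"]
    )
    print(out.strip())
```

Validating paths up front (for example the FAMFILE argument) also lets the script fail early with a clear error instead of partway through a pipeline of external calls.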