Separate output #43

Merged
merged 16 commits into from
Aug 20, 2021

Conversation

rnmitchell
Contributor

This PR adds an option to split the final output by sample, writing a separate file for each sample. This is useful when feeding the output directly into LLAMAS.
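
To illustrate the idea of per-sample output, here is a minimal sketch using pandas. It is not the code from this PR: the function name `write_per_sample`, the `SampleID` column, and the file naming are all assumptions made for the example.

```python
import os
import tempfile

import pandas as pd


def write_per_sample(df, output_dir, sample_col="SampleID"):
    """Write one tab-separated file per sample; return the paths written.

    Hypothetical helper: the name, column, and file layout are illustrative.
    """
    os.makedirs(output_dir, exist_ok=True)
    paths = []
    # groupby yields (sample name, sub-table) pairs, one per distinct sample
    for sample, subset in df.groupby(sample_col, sort=True):
        path = os.path.join(output_dir, f"{sample}.txt")
        subset.to_csv(path, sep="\t", index=False)
        paths.append(path)
    return paths


# Toy data; the column names are assumptions, not lusSTR's actual schema
table = pd.DataFrame({
    "SampleID": ["S1", "S1", "S2"],
    "Locus": ["D3S1358", "TH01", "D3S1358"],
    "Reads": [120, 0, 87],
})
outdir = os.path.join(tempfile.mkdtemp(), "separated")
written = write_per_sample(table, outdir)
```

Each output file holds only the rows for one sample, so a downstream tool that expects one sample per file can read them without further filtering.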

@rnmitchell
Contributor Author

In addition to separating files, this PR changes how lusSTR deals with missing data. Previously, it dropped any allele with 0 reads. However, EuroForMix needs this data, so lusSTR no longer drops any allele from the output.

@rnmitchell rnmitchell requested a review from standage August 20, 2021 13:40
@rnmitchell
Contributor Author

This is ready for review @standage. I also plan to release the next version of lusSTR... it's been a minute since I last did that. :)

- data = uas_load(infile, snp_type_arg)
- data_filt = data.loc[data['Reads'] != 0].reset_index(drop=True)
+ data_filt = uas_load(infile, snp_type_arg).reset_index(drop=True)
Member

Ok, this is where you're retaining alleles with 0 reads.
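
For readers following along, the effect of that change can be sketched on a toy table. The DataFrame below stands in for whatever `uas_load` returns; its columns other than `Reads` are assumptions made for the example.

```python
import pandas as pd

# Stand-in for the loaded table; only the 'Reads' column appears in the diff
data = pd.DataFrame({
    "Locus": ["D3S1358", "D3S1358", "TH01"],
    "Allele": ["15", "16", "9.3"],
    "Reads": [250, 0, 0],
})

# Previous behaviour: rows with 0 reads are filtered out
dropped = data.loc[data["Reads"] != 0].reset_index(drop=True)

# New behaviour: no filter, so zero-read alleles stay in the table
kept = data.reset_index(drop=True)
```

With the filter, only the single allele with reads survives; without it, all three rows are retained, including the two with 0 reads.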

Comment on lines +76 to +77
if data_filt.loc[j, 'Typed Allele?'] == 'No':
    flag = 'Contains untyped allele'
Member

Does the Typed Allele? column refer to whether there were any reads for that allele?

In any case, spaces and punctuation in column names can be problematic. If you have just added the column in this PR, I'd recommend using IsTyped instead, and boolean values (True/False) rather than "Yes"/"No" strings.

Member

Or AlleleIsTyped or something.

Contributor Author

The Typed Allele? column is from the Sample Details Report. It doesn't necessarily indicate an allele with 0 reads, but an allele whose reads fall below the various thresholds, so it can have a low number of reads as well as 0 (i.e. it says whether the allele is considered a real allele). The Yes/No values are read directly from the Sample Details Report, so I'd prefer to leave them as is, but I can change the column name.

Member

> The Typed Allele? column is from the Sample Details Report

I see. Maybe worth just leaving it in then...
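
As a side note on the trade-off discussed in this thread, a boolean view can be derived at the point of use without altering the Yes/No strings read from the report. This is only a sketch on toy data; the `is_typed` and `flags` names are hypothetical and the per-row flag logic in lusSTR may differ.

```python
import pandas as pd

# Toy table; 'Typed Allele?' keeps the Yes/No strings as read from the report
data_filt = pd.DataFrame({
    "Allele": ["15", "16", "9.3"],
    "Typed Allele?": ["Yes", "No", "Yes"],
})

# Derived boolean mask; the original column is left untouched
is_typed = data_filt["Typed Allele?"].eq("Yes")

# Per-row flag text computed from the mask (empty string means no flag)
flags = is_typed.map({True: "", False: "Contains untyped allele"})
```

This keeps the column faithful to its source while still allowing boolean indexing, e.g. `data_filt[is_typed]`.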

lusSTR/snps.py Outdated
Comment on lines 340 to 343
try:
    os.mkdir(output_dir)
except FileExistsError:
    pass
Member

I'd suggest:

os.makedirs(output_dir, exist_ok=True)

Contributor Author

Done!
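
A short sketch of how the suggested call behaves, using a temporary directory so it can be run anywhere. The paths are made up for the example. One difference from the original try/except is worth noting: `os.makedirs` also creates missing parent directories, whereas `os.mkdir` does not.

```python
import os
import tempfile

base = tempfile.mkdtemp()
output_dir = os.path.join(base, "results", "by_sample")  # hypothetical path

# Creates 'results' and 'by_sample' in one call
os.makedirs(output_dir, exist_ok=True)
# Calling it again is a no-op rather than an error
os.makedirs(output_dir, exist_ok=True)

# os.mkdir, by contrast, fails when the parent directory is missing
try:
    os.mkdir(os.path.join(base, "missing_parent", "child"))
    mkdir_failed = False
except FileNotFoundError:
    mkdir_failed = True
```

Note that `exist_ok=True` only suppresses the error when the existing path is a directory; if a regular file already has that name, `FileExistsError` is still raised.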

@rnmitchell rnmitchell merged commit 991761c into master Aug 20, 2021