ConvperPos isn't working #4

Open
jhs7574 opened this issue Sep 4, 2019 · 15 comments
Labels
enhancement New feature or request


@jhs7574

jhs7574 commented Sep 4, 2019

Hi, I set up the NASC-seq analysis pipeline on our lab's Ubuntu system.

I set up config.py with gencode.v31.primary_assembly.annotation.gtf, and placed strandedness.csv and NASCseqModel.stan in the NASC-seq/data folder.

I am using the data from GSE128273, which is the dataset from your paper.

It worked well up to the annotate step (annotated_sorted_bam is created). However, the conversion tag step yields an empty BAM file and a PosTag.csv that is empty except for the header, so I can't go on to the next step.

I think the addTags function in ConvperPos.py is causing this problem, specifically this line:
read.set_tag('ST',strandedness.loc[read.get_tag('XT')][1])

How can I solve this problem?

Thanks.

@gjhendriks
Collaborator

Thanks for getting in touch.
Could you perhaps paste (some lines of) the samtools view output for your annotated_sorted_bam file?

@jhs7574
Author

jhs7574 commented Sep 4, 2019

Here are 10 lines from the annotated_sorted_bam file:

10lines.txt

@gjhendriks
Collaborator

There is no ST tag in the BAM file that you sent. If the gene IDs in the GTF do not match the gene IDs in the strandedness.csv file, that would cause this problem. Can you confirm that they are the same gene IDs?
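One way to check is to compare the gene_id values in the GTF with the IDs in strandedness.csv. The sketch below is a hypothetical helper (Python 3, standard library only), not part of NASC-seq. It assumes the gene ID sits in the first column of strandedness.csv and that the file starts with a header row; adjust `id_column` and `has_header` to match the example file in NASC-seq/data. Judging by the file names, featureCounts produced the XT tags here, and it writes the assigned gene_id to XT, so these are the IDs the lookup has to find.

```python
import csv
import re

# Pulls the value of the gene_id attribute out of GTF column 9,
# e.g. 'gene_id "ENSG00000223972.5"; gene_type ...' -> 'ENSG00000223972.5'.
GENE_ID_RE = re.compile(r'gene_id "([^"]+)"')


def gtf_gene_ids(gtf_lines):
    """Collect gene IDs from the 'gene' records of a GTF."""
    ids = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue
        match = GENE_ID_RE.search(fields[8])
        if match:
            ids.add(match.group(1))
    return ids


def strandedness_ids(csv_lines, id_column=0, has_header=True):
    """Collect gene IDs from a comma-separated strandedness table.

    ASSUMPTION: IDs are in column `id_column` and the first row is a header;
    change both to match the example strandedness.csv in NASC-seq/data.
    """
    reader = csv.reader(csv_lines)
    if has_header:
        next(reader, None)  # skip the header row
    return {row[id_column] for row in reader if row}


def compare_ids(gtf_ids, table_ids):
    """Return (shared, only_in_gtf, only_in_table) counts."""
    return (len(gtf_ids & table_ids),
            len(gtf_ids - table_ids),
            len(table_ids - gtf_ids))
```

Pass open file handles for the GTF and strandedness.csv to `gtf_gene_ids` and `strandedness_ids`. A small shared count with large "only in" counts points to an ID mismatch, for instance versioned GENCODE IDs against unversioned Ensembl IDs.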

@jhs7574
Author

jhs7574 commented Sep 4, 2019

I see. The gene IDs in the GTF do not match the gene IDs in the strandedness.csv file.
Should I then make a new strandedness.csv file for our GTF file?

I saw there is a CreateStrandinfo.py in the scripts folder, but it doesn't work with our GTF file.

@gjhendriks
Collaborator

Yes, you should make a new strandedness.csv file with the same format. I will update this in the next few days to be done automatically from the provided gtf file. Thanks for the feedback!
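Until then, a new table can be generated from the GTF itself. The following is a minimal sketch (Python 3, standard library only), not CreateStrandinfo.py. It takes the strand from column 7 of each `gene` record and writes a comma-separated file. The `gene,strand` header and the column order are hypothetical and should be changed to mirror the example strandedness.csv.

```python
import csv
import re

# Extracts the gene_id attribute value from GTF column 9.
GENE_ID_RE = re.compile(r'gene_id "([^"]+)"')


def gene_strands(gtf_lines):
    """Map gene_id -> strand ('+' or '-') using the 'gene' records of a GTF."""
    strands = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue
        match = GENE_ID_RE.search(fields[8])
        if match:
            # Column 7 (index 6) of a GTF record is the strand.
            strands[match.group(1)] = fields[6]
    return strands


def write_strandedness(strands, out_handle, header=("gene", "strand")):
    """Write one comma-separated row per gene, sorted by gene ID.

    HYPOTHETICAL layout: check the header and column order against the
    example strandedness.csv in NASC-seq/data before using the output.
    """
    writer = csv.writer(out_handle)  # comma-delimited by default
    writer.writerow(header)
    for gene_id in sorted(strands):
        writer.writerow([gene_id, strands[gene_id]])
```

In Python 3, open the output file with `open("strandedness.csv", "w", newline="")` so the csv module controls the line endings.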

@gjhendriks gjhendriks self-assigned this Sep 4, 2019
@gjhendriks gjhendriks added the enhancement New feature or request label Sep 4, 2019
@jhs7574
Author

jhs7574 commented Sep 11, 2019

Hi, I created a new strandedness.csv file for my GTF file, based on your example strandedness.csv file.
However, I cannot find the ST tag in the annotated_sorted_bam file.
I checked the XT tags against the gene IDs in the strandedness file and could not find any problem (they match).

I have attached my strandedness file:
strandedness.csv.txt

Thanks.

@gjhendriks
Collaborator

Exactly which parts of the analysis did you rerun (i.e. with which flags)? Could you again send me the top ~100 lines of the new BAM file that still fails in ConvperPos?

@jhs7574
Author

jhs7574 commented Sep 11, 2019

I reran the entire process from the start.

Here is my annotated_sorted_bam file:

sorted.bam.txt

@gjhendriks
Collaborator

Did you try to run ConvperPos? Sorry for the confusion, but the ST tag is actually added by ConvperPos in its first step, after which it annotates the conversions. The problem you had before was the mismatch between the strandedness IDs and the gene IDs, not the missing ST tag itself. After this step, you should see an ST tag, as well as the conversion tags in the header.
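To see which tags are present before and after ConvperPos, one could count tagged reads with pysam. This is a hypothetical helper, not part of NASC-seq, and it assumes pysam is installed. Following the explanation above, reads in annotated_sorted_bam would be expected to carry XT but not ST, while the BAM written by ConvperPos should carry both.

```python
import itertools


def tag_summary(reads, tags=("XT", "ST")):
    """Count how many reads carry each of the given BAM tags.

    `reads` is any iterable of objects with a pysam-style has_tag() method,
    such as an open pysam.AlignmentFile.
    """
    counts = {tag: 0 for tag in tags}
    total = 0
    for read in reads:
        total += 1
        for tag in tags:
            if read.has_tag(tag):
                counts[tag] += 1
    return total, counts


def summarize_bam(path, max_reads=100000, tags=("XT", "ST")):
    """Summarize tags over the first `max_reads` records of a BAM file."""
    import pysam  # imported here so tag_summary() works without pysam

    with pysam.AlignmentFile(path, "rb") as bam:
        return tag_summary(itertools.islice(bam, max_reads), tags)
```

For example, `summarize_bam("annotated_sorted.bam")` returns the number of reads inspected plus a per-tag count; the path is a placeholder.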

@jhs7574
Author

jhs7574 commented Sep 11, 2019

Okay, I will run ConvperPos.py directly on my BAM file and then let you know the result.

Thanks

@gjhendriks
Collaborator

How did it go? Did you manage to add the conversion tags to the bam file in the end?

@jhs7574
Author

jhs7574 commented Sep 17, 2019

Yes. It was my fault: I had made strandedness.csv tab-separated instead of comma-separated.
After fixing that, I successfully created the PosTag.csv and the tagged BAM file.
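A delimiter mix-up like this can be detected, and the file converted, with Python's standard csv module. A minimal sketch (Python 3); the function names are hypothetical.

```python
import csv


def sniff_delimiter(sample):
    """Guess whether a text sample is comma- or tab-separated."""
    # Restricting the candidates keeps Sniffer from picking characters
    # such as '+' or '.' that also appear in gene IDs and strands.
    return csv.Sniffer().sniff(sample, delimiters=",\t").delimiter


def tsv_to_csv(in_handle, out_handle):
    """Rewrite a tab-separated table as comma-separated, row by row."""
    writer = csv.writer(out_handle)  # comma-delimited by default
    for row in csv.reader(in_handle, delimiter="\t"):
        writer.writerow(row)
```

Passing the first few lines of strandedness.csv to `sniff_delimiter` should return `","` for a file the pipeline can read.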

During the tagging process, it printed this message:

/ssd-data/workspace/support/tool/anaconda3/envs/python2.7/lib/python2.7/site-packages/pandas/core/frame.py:6692: FutureWarning: Sorting because non-concatenation axis is not aligned. A future version of pandas will change to not sort by default.

To accept the future behavior, pass 'sort=False'.

To retain the current behavior and silence the warning, pass 'sort=True'.

sort=sort)

I think it is just a warning that can be ignored, but I wanted to let you know.
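For reference, pandas raises this warning when frames whose columns are not aligned are combined. Passing `sort` explicitly silences it, as the message says. The sketch below assumes pandas is installed and uses toy frames; where exactly ConvperPos.py combines frames is not shown in this thread.

```python
import pandas as pd

# Two small frames whose columns are not aligned (different names/order).
left = pd.DataFrame({"b": [1], "a": [2]})
right = pd.DataFrame({"c": [3], "a": [4]})

# sort=False keeps columns in order of first appearance (the future default);
# sort=True reproduces the old, alphabetically sorted column order.
# Passing either value explicitly avoids the FutureWarning in the pandas
# versions that emitted it.
unsorted_cols = pd.concat([left, right], sort=False)
sorted_cols = pd.concat([left, right], sort=True)
```

Only the column order differs between the two results, so either choice is harmless for a lookup by column name.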

After tagging, I ran vcfFilter, but it failed with this error:

Error in seq.default(min(x, na.rm = na.rm), max(x, na.rm = na.rm), length = breaks) :
'from' must be a finite number
Calls: heatmap.2 -> seq -> seq.default
In addition: Warning messages:
1: In min(x, na.rm = na.rm) :
no non-missing arguments to min; returning Inf
2: In max(x, na.rm = na.rm) :
no non-missing arguments to max; returning -Inf
Execution halted

No output files are created.

Is there anything I should check? I have attached my PosTag.csv file.

Thanks for your help.

SRR8724279_Aligned.sortedByCoord.out.bam_removeDupl.bam.featureCounts.bam_PosTag.csv.txt

@gjhendriks
Collaborator

Do I understand correctly that you are running this with only a single file? The VCF filter step checks how often a given conversion occurs across different cells and reads, so it needs multiple cells to compare. If you want to run this on a single cell, you can use './data/posfile.csv', which contains the SNPs that we detected in Jurkat cells. Alternatively, I would suggest running this step with a few (or all) cells; you can then leave some of the cells out of the later steps, which take more computing time.

Also note that this step outputs a single PDF file showing the top converted positions and the top detected positions (sorted in that order). The next step then filters the conversion tags in the headers for these positions.
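The idea of comparing positions across cells can be illustrated with a toy filter. This is not the vcfFilter implementation, which is not shown here, and the data layout and `min_fraction` threshold are hypothetical. It flags positions converted in a large fraction of cells as likely SNPs. It also shows why a single cell gives no usable comparison.

```python
from collections import defaultdict


def flag_positions(cell_conversions, min_fraction=0.5):
    """Flag positions whose conversion is seen in many cells.

    cell_conversions maps a cell name to the set of positions, e.g.
    (chrom, pos) tuples, where that cell shows a conversion. A position
    converted in at least `min_fraction` of cells is more likely a SNP or
    systematic artefact than a labelling event. With a single cell, every
    observed position has fraction 1.0, so nothing can be told apart.
    """
    n_cells = len(cell_conversions)
    if n_cells == 0:
        return set()
    cells_per_position = defaultdict(int)
    for positions in cell_conversions.values():
        for pos in positions:
            cells_per_position[pos] += 1
    return {pos for pos, n in cells_per_position.items()
            if n / n_cells >= min_fraction}
```

With three cells, a position converted in all three is flagged, while one seen in a single cell is kept.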

@jhs7574
Author

jhs7574 commented Sep 17, 2019

I am running with 2 files for testing (one stimulated, the other not).

Do you mean that the error can occur when running with just 1 file?

@gjhendriks
Collaborator

gjhendriks commented Nov 12, 2019

Did you solve this issue in the end or are you still stuck at this step?
