Question about fragment size distribution and error (NestedPathAbundanceEstimator) when using short-read Illumina RNA-seq data. #63

Open
jjuhyunkim opened this issue Oct 7, 2024 · 3 comments

Comments

@jjuhyunkim

Hi,

I have a question about fragment size with short-read data. When I analyzed paired-end Illumina RNA-seq data, rpvg estimated a fragment length distribution with a mean of 181.201 and a standard deviation of 54.2619. Should this estimate represent the whole paired-end fragment, which corresponds to the cDNA library length, rather than the length of a single read? If so, I would expect the mean fragment size to be around 300-500, and I plan to use a mean of 500.
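
To make the terminology concrete, here is a minimal sketch of how a fragment length is usually derived from a mate pair, as opposed to the read length. Everything in it is hypothetical and for illustration only: the coordinates, the 150 bp read length, and the helper names `fragment_length` and `mate_overlap` are not taken from rpvg or from this dataset.

```python
from statistics import mean, stdev

READ_LENGTH = 150  # hypothetical read length in bp (not stated in this thread)


def fragment_length(left_start, right_end):
    """Outer distance of a mate pair on the transcript (0-based, end-exclusive)."""
    return right_end - left_start


def mate_overlap(frag_len, read_length=READ_LENGTH):
    """Bases covered by both mates; 0 when an unsequenced gap separates them."""
    return max(0, min(frag_len, 2 * read_length - frag_len))


# Hypothetical pairs: (start of the left mate, end of the right mate).
pairs = [(100, 290), (40, 215), (500, 760), (10, 160)]
lengths = [fragment_length(start, end) for start, end in pairs]

print(lengths)                              # fragment length per pair
print(mean(lengths), stdev(lengths))        # summary of the distribution
print([mate_overlap(n) for n in lengths])   # overlap between the two mates
```

A fragment mean below twice the read length simply means that the two mates overlap on average; it does not by itself indicate that the read length was estimated instead of the fragment length.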

Additionally, I encountered this error message: NestedPathAbundanceEstimator::inferPathSubsetAbundance const: Assertion `path_group.second.size() <= group_size' failed. Could this error be related to the fragment size question above?

Thank you!

@zwh82

zwh82 commented Oct 12, 2024

Additionally, I encountered an error message: NestedPathAbundanceEstimator::inferPathSubsetAbundance const: Assertion `path_group.second.size() <= group_size' failed.

I have encountered the same problem. Have you solved it? My fragment distribution estimate is correct, so it should be unrelated to this error.

I checked every path_group. The size limit is exceeded only twice, so this should not be a common occurrence. I'm not sure what exactly causes it.

Path group ID: 2092, Size: 3, Group size limit: 2, Paths: 0 1 2 
Skipping path group due to size limit.
Path group ID: 2093, Size: 1, Group size limit: 2, Paths: 3 
Path group ID: 2094, Size: 3, Group size limit: 2, Paths: 4 5 6 
Skipping path group due to size limit.
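
The groups that violate the assertion's condition can be pulled out of output like this automatically. The following is a minimal sketch, assuming the exact line format shown above; that format comes from this output and is not a documented rpvg log format, and the function name `oversized_groups` is hypothetical.

```python
import re

LOG = """\
Path group ID: 2092, Size: 3, Group size limit: 2, Paths: 0 1 2
Skipping path group due to size limit.
Path group ID: 2093, Size: 1, Group size limit: 2, Paths: 3
Path group ID: 2094, Size: 3, Group size limit: 2, Paths: 4 5 6
Skipping path group due to size limit.
"""

# One capture group per field: ID, size, limit, space-separated path indices.
PATTERN = re.compile(
    r"Path group ID: (\d+), Size: (\d+), Group size limit: (\d+), Paths: ([\d ]+)"
)


def oversized_groups(log_text):
    """Return (group_id, paths) for every group whose size exceeds the limit."""
    found = []
    for line in log_text.splitlines():
        match = PATTERN.match(line)
        if match is None:
            continue  # e.g. the "Skipping path group ..." lines
        group_id, size, limit = (int(match.group(i)) for i in (1, 2, 3))
        paths = [int(p) for p in match.group(4).split()]
        if size > limit:
            found.append((group_id, paths))
    return found


print(oversized_groups(LOG))  # [(2092, [0, 1, 2]), (2094, [4, 5, 6])]
```

Listing the offending groups this way makes it easier to check whether they share anything, such as the same transcripts or the same region of the graph.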

@jjuhyunkim
Author

I haven't been able to resolve this yet. What was your read length estimate? Did you use Illumina short-read data?

@jjuhyunkim
Author

Hi @jeizenga,
Kind reminder regarding this issue😁 Could you please leave a comment if you have any ideas?

This is what the error message states:

Running rpvg (commit: 301f553412a7f3b3c3dccad74e845868da4f0468)
Random number generator seed: 1730338817
Fragment length distribution parameters found in alignment (mean: 185.155, standard deviation: 61.6908)
Loaded graph, GBWT and r-index (6.56369 seconds, 2.55947 GB)
Fragment length distribution parameters re-estimated from alignment paths (location: 150.217, scale: 90.2194, shape: 78.2476)
Found alignment paths (3086.88 seconds, 2.55947 GB)
Clustered alignment paths (0.913452 seconds, 2.55947 GB)
rpvg: /home/rpvg/src/path_abundance_estimator.cpp:715: void NestedPathAbundanceEstimator::inferPathSubsetAbundance(PathClusterEstimates*, const std::vector<ReadPathProbabilities>&, std::mt19937*, const spp::sparse_hash_map<std::vector<unsigned int>, double>&) const: Assertion `path_group.second.size() <= group_size' failed.
/var/spool/slurm/slurmd/job39745670/slurm_script: line 4: 3779436 Aborted                 (core dumped) rpvg -t 50 --graph pantranscriptome.xg --paths pantranscriptome.gbwt --alignments aligned.gamp --output-prefix test -f pantranscriptome.txt --inference-model haplotype-transcripts

My alignment data is paired-end Illumina short-read RNA-seq data, and the graph was generated by cactus-pangenome from assemblies (two haplotypes from one sample and one haplotype of the CHM13 reference).
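
The two fragment length lines in the log use different parameterizations (mean and standard deviation versus location, scale and shape), which makes them hard to compare directly. The sketch below converts the re-estimated triple to a mean and standard deviation, assuming it parameterizes a skew-normal distribution. That family is an assumption: the log does not name it. The function name `skew_normal_moments` is also hypothetical.

```python
import math


def skew_normal_moments(location, scale, shape):
    """Mean and standard deviation of a skew-normal distribution.

    Assumes (location, scale, shape) are the usual skew-normal parameters;
    the log line does not state which distribution family is used.
    """
    delta = shape / math.sqrt(1.0 + shape * shape)
    mean = location + scale * delta * math.sqrt(2.0 / math.pi)
    sd = scale * math.sqrt(1.0 - 2.0 * delta * delta / math.pi)
    return mean, sd


# Values copied from the "re-estimated from alignment paths" log line.
mean, sd = skew_normal_moments(150.217, 90.2194, 78.2476)
print(f"mean = {mean:.1f}, sd = {sd:.1f}")
```

Under that assumption the re-estimated distribution has a mean of roughly 222 and a standard deviation of roughly 54, which is still well below the 300-500 range mentioned at the start of this thread.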
