Assembled contigs can not be circularized #233

Open
taha-lang opened this issue Aug 21, 2024 · 12 comments

@taha-lang

Hello, for some samples, when assembling the chloroplast genome, the two contigs do not merge into a circular contig. I'm looking for a solution, or for a way to assemble them manually if it cannot be done automatically. Here's the message from NOVOPlasty:
LINKS BETWEEN CONTIGS

01----> 02 OR 1122512
1122512----> END_REVERSE
START----> 01

------Assembled contigs can not be circularized, some sequence is probably missing!------n

@parksonpurity

I have the same issue with mitogenomes. Does anyone have a solution?

@taha-lang
Author

Assembled contigs can not be circularized, some sequence is probably missing!------n
Hello friends, regarding the problem of the chloroplast assembly not being circularized, I'd like to share the solution that worked in my case: set the "reduce ambiguous N's" parameter to yes. The contigs are then merged.

@parksonpurity

Hi taha-lang, thanks for your reply. I tried this, but the same issue still popped up.

@ndierckx
Owner

Sorry for the late reply. If you haven't resolved it yet, you can send me the extended log file (set the option to 1) and I can have a look at it.

@taha-lang
Author

Thank you very much, Mr. Nicolas. Most of the issues have been resolved. I only have three cases left:

The first case is a sequence of 173,000 to 200,035 bp that is not merged or circularized.
The second case involves assemblies of 164,069 to 164,500 bp that are merged but not circularized; this size is too large compared with the circular genome of 162,930 bp.
The third case is an assembly that is circularized but contains 3 contigs that are not merged.
You will find the logs below.
log_Res114_chloro.txt
log_ResE98_chloro.txt
log_ResE110_chloro.txt
Many thanks for your help

@ndierckx
Owner

ndierckx commented Oct 7, 2024

Res114_chloro stopped because the maximum length was set to 200000. If the genome should be shorter, I am not sure why it didn't circularize. Have you checked whether it is complete?
The other two probably have low-quality repeat regions that couldn't be merged.

But if you want me to get a better idea of the problem, I need the extended log, not the normal log.
The extended log is created by setting that option to 1.

@taha-lang
Author

Thank you very much, Mr. Dierckxsens. I am very grateful for your response, and I apologize for the delayed reply. Please find below the extended log for sample 114.
log_extended_Res114b_chloro.txt

@parksonpurity

Hi taha-lang,

May I ask whether you have solved the other two, log_ResE98_chloro and log_ResE110_chloro? I am struggling with the same issues but don't know how to fix them.

Hi Dierckxsens, I have attached the extended log. Could you please have a look?

log_extended_B63.txt

@ndierckx
Owner

@taha-lang
I checked the log you sent. It didn't circularize, but the assembly is complete; it is just repeating itself.
So you can simply remove the repeated part, and then you have a fully circularized sequence.
It is 162,940 bp long. Just search for the first 20 bp in a text editor and remove the repeated sequence.
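
The manual fix described above can also be scripted. The following is a minimal Python sketch, not part of NOVOPlasty: `trim_terminal_repeat` is a hypothetical helper that assumes the assembly has already been read into a single string, and that the repeated part at the end is an exact copy of the start of the sequence.

```python
def trim_terminal_repeat(seq: str, probe_len: int = 20) -> str:
    """Cut a linear assembly at the point where it starts repeating itself.

    Looks for later occurrences of the first `probe_len` bases. The cut is
    made at the first occurrence from which the rest of the sequence is an
    exact copy of the beginning. If no such position exists, the sequence
    is returned unchanged (upper-cased).
    """
    seq = seq.upper()
    probe = seq[:probe_len]
    pos = seq.find(probe, 1)  # start at 1 to skip the probe itself
    while pos != -1:
        # Accept the cut only if the whole tail repeats the start.
        if seq.startswith(seq[pos:]):
            return seq[:pos]
        pos = seq.find(probe, pos + 1)
    return seq


# Toy example: a 50 bp "genome" followed by a copy of its first 25 bp.
genome = "ATGCGTACGTTAGCCTAGGATCCGATTACAGGCTTAACGGTACCGATAGC"
linear = genome + genome[:25]
trimmed = trim_terminal_repeat(linear)
print(len(linear), len(trimmed))
```

If the tail contains even one mismatch relative to the start, this sketch leaves the sequence untouched, in which case the overlap is better inspected by hand or with an alignment tool.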

The assembly was successful, so you don't need to worry about it much, but it seems it was not able to pair the reads, so it assembled them as single-end. I already saw that in your normal log file (Forward reads without pair: 4967748).
If you see such a high number of reads that couldn't be paired, there is an issue. What is the format of the IDs in both files? Have they been altered? Again, this is not a problem for this assembly, but some assemblies could be incomplete without the paired-end information.
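
One quick way to answer the question about read IDs is to compare the headers of the two FASTQ files record by record. The sketch below is only an illustration under stated assumptions: plain 4-line FASTQ records, both files in the same order, and mates identified either by an identical ID before the first space or by a trailing `/1` and `/2`. How NOVOPlasty itself matches IDs is not described in this thread, and the function names are hypothetical.

```python
import io


def read_ids(handle, limit=1000):
    """Yield the ID (text after '@' up to the first whitespace) of each record."""
    for i, line in enumerate(handle):
        if i >= 4 * limit:
            break
        if i % 4 == 0:  # header line of a 4-line FASTQ record
            yield line[1:].split()[0]


def strip_mate_suffix(read_id):
    """Drop a trailing /1 or /2 so that both mates compare equal."""
    if read_id.endswith(("/1", "/2")):
        return read_id[:-2]
    return read_id


def count_mismatched_ids(fwd, rev, limit=1000):
    """Count records (among the first `limit`) whose forward and reverse IDs differ."""
    return sum(
        strip_mate_suffix(a) != strip_mate_suffix(b)
        for a, b in zip(read_ids(fwd, limit), read_ids(rev, limit))
    )


# Toy data standing in for the forward and reverse files.
fwd = io.StringIO("@r1/1\nACGT\n+\nIIII\n@r2/1\nACGT\n+\nIIII\n")
rev = io.StringIO("@r1/2\nTTTT\n+\nIIII\n@x9/2\nTTTT\n+\nIIII\n")
print(count_mismatched_ids(fwd, rev))
```

For real data, pass two handles from `open(...)` (or `gzip.open(..., "rt")` for compressed files) instead of the `StringIO` objects. A non-zero count on the first few thousand records suggests that the IDs were altered or that the files are out of order.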

@taha-lang
Author

Thank you very much, Mr. Dierckxsens, for your help. I would like to congratulate you on your NOVOPlasty tool, which is simple and gives very precise results.

@taha-lang
Author

Hi Mr. Tao
In my case, I set the K-mer to 31 instead of 33 and changed the "reduce ambiguous N's" setting from no to yes. This solved a few cases and gave me a complete chloroplast genome.
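
When the same settings have to be changed for many samples, the edit can be scripted. The following Python sketch is an illustration only: `set_option` is a hypothetical helper for a plain `key = value` text file, and the key spellings in the sample string are assumptions modelled on the settings named in this thread. Copy the exact key text from your own config file before using it.

```python
import re


def set_option(config_text: str, key: str, value: str) -> str:
    """Return config_text with the single 'key = ...' line set to `value`.

    Raises KeyError unless exactly one line starts with `key` followed by '='.
    """
    pattern = re.compile(rf"^({re.escape(key)}\s*=)[^\n]*$", re.MULTILINE)
    new_text, count = pattern.subn(lambda m: f"{m.group(1)} {value}", config_text)
    if count != 1:
        raise KeyError(f"expected exactly one line for {key!r}, found {count}")
    return new_text


# Sample fragment; the key names here are assumptions, not verified defaults.
sample = (
    "K-mer                 = 33\n"
    "Extended log          = 0\n"
    "Reduce ambigious N's  = no\n"
)

updated = set_option(sample, "K-mer", "31")
updated = set_option(updated, "Extended log", "1")
updated = set_option(updated, "Reduce ambigious N's", "yes")
print(updated)
```

Raising an error when the key is missing or duplicated is a deliberate choice: a silently unchanged config would otherwise produce the same assembly as before and hide the typo.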

@ndierckx
Owner

@parksonpurity

Hi, I checked your log file. What kind of data do you have, which sample is it, and is it just a whole genome protocol?
I ask because the dataset looks quite odd: the coverage starts out very low and then increases, but the assembly diverges into many different sequences that it can't resolve.
