finding non-productive sequences (original/no hypermutation/no insertion/no deletions) #43

Open
decenwang opened this issue Mar 29, 2019 · 2 comments

@decenwang

Hi Quentin,

According to your published article, how did you identify and extract the non-productive sequences (used for new model construction) from the raw data? Do you have any good ideas?

In your article (page 3), you mentioned: "By contrast, V and J usage varied moderately but significantly across individuals, ..., suggesting possible primer-dependent biases." How should this be understood? After selection, the surviving T cells are MHC-dependent, and MHC alleles differ substantially between individuals.
Thanks!

Cheers,

Decen

@qmarcou
Owner

qmarcou commented Apr 1, 2019

Hi @decenwang

Non-coding sequences are defined as sequences known not to code for a viable receptor because they contain either a frameshift or a stop codon within the CDR3 region. There is a slight difference with non-productive sequences, a term designating any sequence that does not code for a viable receptor (e.g. a sequence producing an in-frame receptor that does not fold). We hypothesize that non-coding sequences are non-productive; the reverse is a priori not true.
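As a minimal sketch of this definition, assuming the CDR3 nucleotide sequence has already been extracted and spans from the first base of the conserved V-gene cysteine codon to the last base of the conserved J-gene Phe/Trp codon, a sequence can be flagged as non-coding when its CDR3 length is not a multiple of 3 (frameshift) or when it contains an in-frame stop codon. The function name `is_non_coding` is hypothetical and not part of IGoR.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_non_coding(cdr3_nt: str) -> bool:
    """Return True if a CDR3 nucleotide sequence is out of frame or has an
    in-frame stop codon.

    Assumes cdr3_nt starts at the first base of the conserved V cysteine codon
    and ends at the last base of the conserved J Phe/Trp codon, so the reading
    frame is set by the V anchor.
    """
    seq = cdr3_nt.upper()
    # Frameshift: a length not divisible by 3 breaks the V-to-J reading frame.
    if len(seq) % 3 != 0:
        return True
    # Stop codon: scan codons in the frame defined by the V anchor.
    codons = (seq[i:i + 3] for i in range(0, len(seq), 3))
    return any(codon in STOP_CODONS for codon in codons)


print(is_non_coding("TGTGCCAGCAGTTTC"))  # → False (in frame, no stop)
print(is_non_coding("TGTGCCAGCAGTTT"))   # → True (frameshift)
print(is_non_coding("TGTGCCTAGAGTTTC"))  # → True (in-frame TAG stop)
```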

Non-coding sequences can be found by aligning the genomic templates to the sequences, which locates the CDR3. For now there is no built-in way of doing this in IGoR, and you may want to use different software for this preprocessing step.
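The alignment step can be illustrated with a toy sketch. It uses a naive ungapped (Hamming-distance) alignment instead of a real aligner, and the template sequences, anchor positions, and the helper names `best_offset` and `extract_cdr3` are all hypothetical; a real pipeline would use proper gapped alignment against the germline V and J genes.

```python
def best_offset(read: str, template: str, start: int = 0) -> int:
    """Ungapped alignment: return the offset (>= start) in `read` where
    `template` has the fewest mismatches, or -1 if it does not fit."""
    best_off, best_mm = -1, len(template) + 1
    for off in range(start, len(read) - len(template) + 1):
        window = read[off:off + len(template)]
        mm = sum(a != b for a, b in zip(window, template))
        if mm < best_mm:
            best_off, best_mm = off, mm
    return best_off


def extract_cdr3(read, v_anchor, cys_pos, j_anchor, fw_pos):
    """Locate the CDR3 of `read` from two template alignments.

    v_anchor: V-gene segment ending with the conserved Cys codon, whose first
              base is at index `cys_pos` in v_anchor.
    j_anchor: J-gene segment containing the conserved Phe/Trp codon, whose
              first base is at index `fw_pos` in j_anchor.
    Returns the CDR3 from the first base of the Cys codon to the last base of
    the Phe/Trp codon, or None if either template cannot be placed.
    """
    v_off = best_offset(read, v_anchor)
    if v_off < 0:
        return None
    # Search for the J segment only downstream of the aligned V segment.
    j_off = best_offset(read, j_anchor, start=v_off + len(v_anchor))
    if j_off < 0:
        return None
    return read[v_off + cys_pos : j_off + fw_pos + 3]


# Hypothetical toy templates (not real TRBV/TRBJ sequences).
V_ANCHOR, CYS_POS = "ACAGCCGTGTATTTCTGT", 15
J_ANCHOR, FW_POS = "GAGCAGTTCTTCGGG", 9
read = "GG" + V_ANCHOR + "CCAGC" + J_ANCHOR + "AC"
cdr3 = extract_cdr3(read, V_ANCHOR, CYS_POS, J_ANCHOR, FW_POS)
print(cdr3, len(cdr3) % 3 != 0)  # → TGTCCAGCGAGCAGTTCTTC True (out of frame)
```

A CDR3 length that is not a multiple of 3 signals a frameshift; an in-frame CDR3 can then still be scanned for stop codons.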

As for your third point, I'm not sure I understand the question, since you abbreviated the original sentence of the paper:

By contrast, V and J gene usage varied moderately but significantly across individuals, and even more across sequencing technologies, suggesting possible primer-dependent biases.

Just to make sure we're on the same page: this sentence means that primer-dependent biases are likely because the differences are much larger across sequencing technologies than across individuals sequenced with the same technology. Although I agree with your MHC point for productive sequences, please bear in mind that a model learned on non-coding sequences a priori only reflects the statistics of V(D)J recombination, not central or peripheral selection. Differences in gene usage among individuals in models learned on non-coding sequences therefore only reflect the properties of each individual's V(D)J genes (positions, number of copies, etc.).
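One way to make the within- vs between-technology comparison concrete is to compute a divergence between gene-usage distributions and compare its average among individuals sequenced with the same technology against its average across technologies. The sketch below uses the Jensen-Shannon divergence; this choice, the helper names, and the toy frequencies are assumptions for illustration, not the measure or data used in the paper.

```python
import math
from itertools import combinations, product


def jensen_shannon(p, q):
    """Jensen-Shannon divergence (log base 2, so in [0, 1]) between two
    gene-usage distributions given as {gene: frequency or count}."""
    genes = set(p) | set(q)
    sp, sq = sum(p.values()), sum(q.values())
    # Normalize each distribution; genes absent from one sample get 0.
    pn = {g: p.get(g, 0.0) / sp for g in genes}
    qn = {g: q.get(g, 0.0) / sq for g in genes}
    m = {g: 0.5 * (pn[g] + qn[g]) for g in genes}

    def kl_to_m(a):
        # KL(a || m); zero terms are skipped, and m[g] > 0 whenever a[g] > 0.
        return sum(a[g] * math.log2(a[g] / m[g]) for g in genes if a[g] > 0)

    return 0.5 * kl_to_m(pn) + 0.5 * kl_to_m(qn)


def within_vs_between(groups):
    """Mean JSD between samples of the same group vs of different groups.

    groups: {group label (e.g. sequencing technology): [usage dict per individual]}.
    Needs at least one group with >= 2 samples and at least 2 non-empty groups.
    """
    within = [jensen_shannon(a, b)
              for samples in groups.values()
              for a, b in combinations(samples, 2)]
    between = [jensen_shannon(a, b)
               for ga, gb in combinations(groups.values(), 2)
               for a, b in product(ga, gb)]
    return sum(within) / len(within), sum(between) / len(between)


# Toy, made-up V-usage frequencies for two hypothetical technologies.
toy = {
    "tech_A": [{"V1": 0.50, "V2": 0.50}, {"V1": 0.55, "V2": 0.45}],
    "tech_B": [{"V1": 0.90, "V2": 0.10}, {"V1": 0.85, "V2": 0.15}],
}
w, b = within_vs_between(toy)
print(f"within={w:.3f} between={b:.3f}")
```

A between-technology average clearly above the within-technology average is the kind of pattern the quoted sentence attributes to primer-dependent biases.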

Best,
Quentin

@decenwang
Author

Hi Quentin, @qmarcou
Thanks for your reply. I think you are right. Still, I have another idea: primer-dependent bias is inherent to the primer pairs (bias 1); different sequencing technologies amplify this bias (bias 2); and, since templates are diverse across individuals, how well a primer pair matches its template may vary between individuals (bias 3). However, I have spent more than a month learning IGoR and still haven't made much progress. I will post as few questions as possible. Hopefully, you can respond when you are available.

Thanks a lot!

Best regards,

Decen
