
Maaps #52

Merged
merged 44 commits into from
Oct 30, 2015

averagehat
Contributor

#22

necrolyte2 added this to the v1.2.0 milestone Aug 21, 2015
necrolyte2 mentioned this pull request Aug 21, 2015
4 tasks
@averagehat
Contributor Author

@demis001 can you help me understand the code here? A set will not maintain order, so I can't tell what the if statements are meant to accomplish or what indexm = m.index(list(items)[0]) is for.

Also, I can't tell if the output is consistent (it is different in Python 3 and Python 2).

1909_Den4/AY618992_1/Thailand/2001/Den4_1           3181          36233  RTD               I/M/V
1909_Den4/AY618992_1/Thailand/2001/Den4_1           3179          36234  RTD               I/M/V

This corresponds to a place in the sequence which has:

WRTD starting at position 3180. How is the output supposed to look: should it generate one line per degenerate base, or one line per affected amino acid? (But how can you know the affected amino acid without knowing the reading frame?)

Understanding this will help me to get the tests working.

@demis001
Contributor

@mike,

I have tested with Python 2, not with Python 3. The input fasta file is a CDS,
which means it is in frame: it starts with "M" and ends with a stop codon
somewhere at the end. There is no need to find the open reading frame. The
purpose of the for loop is to find whether the codon has a degenerate
nucleotide. The input is something like this:
key: 972 codon: GAA / key: 993 codon: WCA. "Key" is the last base index
of the codon on the CDS string.

The purpose of "indexm = m.index(list(items)[0])" is to get the position
of the degenerate nucleotide in the codon (it is either position 0, 1, or 2);
in this case "W" is at indexm = 0.

Since the last position of the codon is 993 on the CDS string, the exact
position of "W", which has index 0, should be 993 - 2 = 991.

I don't know whether the "set" behaves the same under Python 2 and 3.

Dereje
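
The position arithmetic described above can be sketched as follows. This is an illustration only, not the code in ctleptop.py. It assumes, as in the explanation, that the key is the CDS position of the codon's last base. The names `degenerate_positions` and `AMBIGUOUS` are hypothetical. It walks the codon with `enumerate` rather than taking the first element of a set, because set iteration order is not guaranteed and may differ between interpreters and runs.

```python
# IUPAC ambiguity codes for nucleotides (everything except A, C, G, T).
AMBIGUOUS = set("RYSWKMBDHVN")


def degenerate_positions(codon, last_index):
    """Return the CDS position of every degenerate base in `codon`.

    `last_index` is the position of the codon's third base on the CDS
    (the "key" in the explanation above), so the first base sits at
    last_index - 2.
    """
    first_index = last_index - 2
    return [
        first_index + offset
        for offset, base in enumerate(codon)
        if base in AMBIGUOUS
    ]


print(degenerate_positions("WCA", 993))  # W is at offset 0, so 993 - 2
print(degenerate_positions("GAA", 972))  # no degenerate bases
```

Because the result is built in codon order, a codon with two degenerate bases yields both positions in a stable order regardless of the Python version.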

On Fri, Aug 21, 2015 at 4:23 PM, Mike Panciera wrote:

@demis001 can you help me understand the code here?
https://github.com/VDBWRAIR/bio_pieces/blob/0793813139fcf9c4759348bcdaedd8e6a701928e/bio_pieces/ctleptop.py#L166-L189
A set will not maintain order so I can't tell what the if statements are
meant to accomplish or what indexm = m.index(list(items)[0]) is for.


@averagehat
Contributor Author

Could you explain how AA position is different from nt position?

@demis001
Contributor

"AA position" is the amino acid position after translation; "nt position" is
the position of the nucleotide in nucleotide space.
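
For an in-frame CDS, the relationship between the two coordinates can be sketched as below. This assumes 1-based positions for both nucleotides and amino acids, which is consistent with the earlier example where 993 is the last base of a codon; the function name `aa_position` is hypothetical and does not come from the pull request.

```python
def aa_position(nt_position):
    """Map a 1-based nucleotide position on an in-frame CDS to the
    1-based position of the amino acid whose codon contains it."""
    if nt_position < 1:
        raise ValueError("nt_position must be 1 or greater")
    # Nucleotides 1-3 belong to amino acid 1, 4-6 to amino acid 2, etc.
    return (nt_position - 1) // 3 + 1


# All three bases of the codon ending at nt 993 map to the same residue.
print([aa_position(p) for p in (991, 992, 993)])
```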


@averagehat
Contributor Author

This is working well, but the output is inconsistent: for example, ARR is printed once while WCR is printed twice, even though they both contain two degenerate NTs and code for exactly two different AAs. This is not exactly incorrect but is definitely misleading.

Additionally, the Amino Acid positions are incorrect.

Everything else looks good. I am working on fixing these things.

@InaMBerry, do you want one line per codon, or one line per non-degenerate base, or do you not care? (Either way you get the same amount of information; it just changes how it's presented.)
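
To make the "one line per codon" option concrete, here is a minimal sketch that expands a degenerate codon into the set of amino acids it can encode. It is not the implementation in this pull request: the names `CODON_TABLE`, `IUPAC`, and `possible_amino_acids` are hypothetical, and it assumes the standard genetic code and the standard IUPAC nucleotide ambiguity codes.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# IUPAC nucleotide codes mapped to the concrete bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def possible_amino_acids(codon):
    """Return the sorted amino acids a (possibly degenerate) codon can encode."""
    expansions = product(*(IUPAC[base] for base in codon))
    return sorted({CODON_TABLE["".join(bases)] for bases in expansions})


# One output line per codon, however many degenerate bases it contains.
for codon in ("ARR", "WCR", "RTD"):
    print(codon, "/".join(possible_amino_acids(codon)))
```

Reporting per codon in this way would give ARR and WCR one line each, since both expand to two amino acids.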

# print ambi_codon["Y"]
for key, codon in sorted(codon_list.items()):
    # print "key: ", key , "codon:", codon
    if list_overlap(codon, ambi_nucl):
Member


It seems that there is some redundancy here.

I think this can replace the if statement, and it should be quite a bit faster than the current implementation:

items = set(m).intersection(ambi_nucl)
if items:
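
A self-contained sketch of the suggestion above: an empty set is falsy in Python, so the intersection can serve both as the overlap test and as the collection of degenerate bases found. The names `AMBI_NUCL`, `ambiguous_bases`, and `has_ambiguous` are hypothetical stand-ins; the actual contents of `ambi_nucl` and `m` in ctleptop.py are not shown in this conversation.

```python
# Assumed contents of the ambiguity set: the IUPAC degenerate codes.
AMBI_NUCL = set("RYSWKMBDHVN")


def ambiguous_bases(codon):
    """Return the set of degenerate bases present in `codon`."""
    return set(codon).intersection(AMBI_NUCL)


def has_ambiguous(codon):
    """True if the codon contains at least one degenerate base."""
    # An empty set is falsy, so no separate overlap check is needed.
    return bool(ambiguous_bases(codon))


for codon in ("GAA", "WCA"):
    items = ambiguous_bases(codon)
    if items:
        print(codon, "has degenerate bases:", sorted(items))
```

Sorting the set before printing keeps the output stable, since set iteration order should not be relied on.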

@necrolyte2
Member

Wait, is this pull request already merged? Can we close it?

@necrolyte2
Member

Sorry, ignore that. I'm confused about why the pull request comparison is not showing the latest commit.

${in_genbank} = tests/testinput/sequence.gb

*** Test Cases ***
TestAmos2Fastq
Member


I think this should be TestCtleptop.

@averagehat
Contributor Author

The build checks out; you can review it again if you like, although not much has changed.

necrolyte2 added a commit that referenced this pull request Oct 30, 2015
necrolyte2 merged commit 2f52f46 into dev Oct 30, 2015
necrolyte2 deleted the maaps branch January 6, 2016 13:12