Limit of 25 proteins for UniProt mapping. #244

Closed
ArnaudBelcour opened this issue Oct 26, 2022 · 3 comments

Comments

@ArnaudBelcour
Contributor

Hello,

First, thank you for this package; it is very useful.

I have an issue when using UniProt with bioservices 1.10.4 on Python 3.8.10. I am trying to retrieve protein annotations from UniProt using the mapping function:

from bioservices import UniProt
uniprot_bioservices = UniProt(verbose=False)

protein_queries = ['Q89B22', 'P57224', 'Q89B11', 'P57601', 'P25749', 'P59526', 'P59491',
        'P57659', 'P57263', 'P57337', 'P57411', 'Q89A85', 'P57576', 'Q89AJ8',
        'P57524', 'Q89AK2', 'Q89B42', 'P57362', 'P59460', 'P57559', 'P57226',
        'P57463', 'P57529', 'P57213', 'P57525', 'P57230']

data = uniprot_bioservices.mapping(fr='UniProtKB_AC-ID', to='UniProtKB', query=protein_queries)

# Compare the number of input IDs with the number of mapped and failed IDs.
print('Number of input proteins: {0}'.format(len(protein_queries)))
print('Number of mapped proteins: {0}'.format(len(data['results'])))
# 'failedIds' may be absent from the response; treat that as zero failures.
print('Number of failed mapped proteins: {0}'.format(len(data.get('failedIds', []))))

The mapping query contains 26 proteins, but the results contain only 25, even though the last protein has annotations on UniProt. I have tested other numbers of proteins, and there seems to be a limit of 25 results for this mapping function.

I cannot find any information about this limit. Is it an error on my side, such as an option that I have missed?
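To see which inputs were silently dropped rather than only counting them, a small check along these lines may help. It is a minimal sketch: the helper name unaccounted_ids is hypothetical, the sample dictionary is hand-written rather than real UniProt output, and it assumes the 'results' entries carry a 'from' key as in the snippet above, while 'failedIds' entries may be either plain ID strings or dictionaries.

```python
def unaccounted_ids(queries, data):
    """Return the query IDs found neither in 'results' nor in 'failedIds'."""
    mapped = {entry['from'] for entry in data.get('results', [])}
    failed = set()
    # Assumption: a failed entry is either a plain ID string or a dict
    # with a 'from' key; both shapes are accepted here.
    for entry in data.get('failedIds', []):
        failed.add(entry['from'] if isinstance(entry, dict) else entry)
    return [q for q in queries if q not in mapped and q not in failed]


# Hand-written stand-in for a truncated response (not real UniProt output).
example_queries = ['Q89B22', 'P57224', 'P57230']
example_data = {'results': [{'from': 'Q89B22'}, {'from': 'P57224'}]}
print(unaccounted_ids(example_queries, example_data))  # ['P57230']
```

An ID returned by this helper was neither mapped nor reported as failed, which is the symptom described above.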

@joaosegurilho

I have also had this issue using u.search(). I was trying to query 200 proteins, but the results always contained 25 proteins.

prot_search = u.search(query="+OR+".join(proteins), columns='id,annotation_score,lineage', frmt='tsv')

I am using Python 3.8.11 and bioservices 1.10.4.

Also, I used u.search() because u.get_df() doesn't allow specific columns.

Thanks.

@cokelaer
Owner

@joaosegurilho Sorry for the late answer. I believe that in your case, using the limit parameter should help. For instance, for 200 proteins, call the method with limit=200; if unsure, you can set a limit that is large enough, e.g. 1000:

prot_search = u.search(query="+OR+".join(proteins), columns='id,annotation_score,lineage', frmt='tsv', limit=200)

By default, all UniProt calls have a limit of 25 results according to their API. I'll try to update the code to make it automatic in the future.
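Until that is automatic, the limit can be derived from the input list instead of being hard-coded. The following is only a sketch: build_search_kwargs is a hypothetical helper, not part of bioservices, and sizing the limit to len(proteins) assumes each accession in the query matches at most one entry.

```python
def build_search_kwargs(proteins, columns, frmt='tsv'):
    """Build keyword arguments for u.search() with limit sized to the input."""
    return {
        'query': '+OR+'.join(proteins),
        'columns': columns,
        'frmt': frmt,
        # Assumption: one result per accession, so the limit equals the input size.
        'limit': len(proteins),
    }


proteins = ['P57224', 'P57601', 'P25749']
kwargs = build_search_kwargs(proteins, 'id,annotation_score,lineage')
print(kwargs['query'], kwargs['limit'])
```

The resulting dictionary would then be passed as u.search(**kwargs), which uses the same parameters as the call shown above.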

@ArnaudBelcour It looks like there is also a limit of 25 on the mapping functionality, but there I did not manage to implement the limit parameter. I am not sure whether this is a bug in the UniProt API. For now, the only solution is to split the input list into chunks of 25. As with the previous comment, I will try to update the code to make this process automatic in a future release.
Best
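The chunking workaround could look roughly like this. It is a sketch under stated assumptions: chunked and mapping_in_chunks are hypothetical helpers, not bioservices functions; the mapping callable is assumed to accept a query keyword and to return a dictionary with 'results' and optionally 'failedIds', as in the snippet from the original report; and fake_mapping is a stand-in that mimics the 25-result cap so the example runs without network access.

```python
def chunked(items, size=25):
    """Yield successive slices of `items` holding at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def mapping_in_chunks(mapping_func, ids, size=25, **kwargs):
    """Call `mapping_func` once per chunk and merge the returned dictionaries."""
    merged = {'results': [], 'failedIds': []}
    for chunk in chunked(ids, size):
        data = mapping_func(query=chunk, **kwargs)
        merged['results'].extend(data.get('results', []))
        merged['failedIds'].extend(data.get('failedIds', []))
    return merged


# Stand-in for uniprot_bioservices.mapping: returns at most 25 results per call.
def fake_mapping(query, **kwargs):
    return {'results': [{'from': q, 'to': q} for q in query[:25]]}


ids = ['ID{0}'.format(i) for i in range(60)]
merged = mapping_in_chunks(fake_mapping, ids, fr='UniProtKB_AC-ID', to='UniProtKB')
print(len(merged['results']))  # 60
```

With the real client, the call would presumably be mapping_in_chunks(uniprot_bioservices.mapping, protein_queries, fr='UniProtKB_AC-ID', to='UniProtKB'). Passing the mapping function as an argument keeps the helper testable offline.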

cokelaer added a commit that referenced this issue Dec 28, 2022
Fixing #244

Also big cleanup (black), refactoring of REST parent class into
attribute.
@cokelaer
Owner

Should be fixed in v1.11.0 now available on pypi.
