PEPMatch trims sequence ID #19

Open
HA1-biocopy opened this issue Nov 26, 2024 · 5 comments

@HA1-biocopy

HA1-biocopy commented Nov 26, 2024

Hi,
I'm testing PEPMatch with my custom FASTA file (--proteome_file). However, I noticed that PEPMatch trims the protein ID. For example, in my file the ID follows this syntax: xx|GENE-ID|genome
What PEPMatch reports is GENE-ID.1. Why? How can I make it return the full ID?
Thank you

@dmx2
Collaborator

dmx2 commented Nov 27, 2024

PEPMatch is mostly built around UniProt proteomes, as that is what most of our users use and the protein ID is typically between pipe (|) characters.

I think there should be a way to just use the entire FASTA header for output. I'll consider this as a feature to add.
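To illustrate the behaviour described above, here is a minimal sketch of pipe-delimited ID extraction. It is not PEPMatch's actual parsing code: the function name `extract_pipe_id` is hypothetical, and the fallback to the first whitespace-delimited token is an assumption made for the example.

```python
def extract_pipe_id(header: str) -> str:
    """Return the field between the first two '|' characters of a FASTA header.

    Hypothetical helper for illustration only; not taken from PEPMatch.
    """
    header = header.lstrip(">").strip()
    parts = header.split("|")
    if len(parts) >= 3:
        return parts[1]
    # Assumption: with fewer than two pipes, fall back to the first token.
    return header.split()[0] if header else ""


uniprot_style = ">sp|P69905|HBA_HUMAN Hemoglobin subunit alpha OS=Homo sapiens"
custom_style = ">xx|GENE-ID|genome"

print(extract_pipe_id(uniprot_style))  # P69905
print(extract_pipe_id(custom_style))   # GENE-ID
```

With a UniProt-style header the middle field is the accession, which is why a custom header such as xx|GENE-ID|genome ends up reported by its middle field only.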

@HA1-biocopy
Author

Hi,

I noticed that adding '.1' as a suffix to the name can introduce complications downstream, especially since the name may already contain other symbols. I’ve tried changing the ID delimiter, but it still gets stripped and the suffix is added.

Do you have any suggestions on how to resolve this issue in the meantime, until the feature is implemented?

For context: we’re testing the tool as part of our evaluation process, with plans to purchase a license if it meets our internal needs.

Thanks in advance for your help!

@dmx2
Collaborator

dmx2 commented Nov 27, 2024

I need to update the README, but if you pass sequence_version=False into the Matcher class (or -v if using the command line), this will drop the '.#' suffix from the IDs. Let me know if that works for you.
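The option above only removes the suffix. For recovering the full header downstream in the meantime, one possible stopgap is to map reported IDs back to the original FASTA headers. The sketch below uses only the Python standard library and does not call PEPMatch. The helper names `build_header_map` and `restore_full_id` are hypothetical, and it assumes the reported ID is the middle pipe-delimited field, optionally followed by a '.N' suffix, as described in this thread.

```python
import re
from typing import Dict, Iterable

# Assumption: a version suffix looks like ".<digits>" at the end of the ID.
_VERSION_SUFFIX = re.compile(r"\.\d+$")


def build_header_map(fasta_lines: Iterable[str]) -> Dict[str, str]:
    """Map the pipe-delimited ID of each FASTA record to its full header."""
    mapping: Dict[str, str] = {}
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        header = line[1:].strip()
        if not header:
            continue  # skip empty headers
        parts = header.split("|")
        short_id = parts[1] if len(parts) >= 3 else header.split()[0]
        # Keep the first header seen if two records share a short ID.
        mapping.setdefault(short_id, header)
    return mapping


def restore_full_id(reported_id: str, header_map: Dict[str, str]) -> str:
    """Return the full header for a reported ID, tolerating a '.N' suffix."""
    # Try the ID as-is first, since an ID may legitimately end in ".<digits>".
    if reported_id in header_map:
        return header_map[reported_id]
    stripped = _VERSION_SUFFIX.sub("", reported_id)
    # Fall back to the reported ID unchanged when nothing matches.
    return header_map.get(stripped, reported_id)


fasta = [">xx|GENE-ID|genome", "MKTAYIAKQR", ">xx|OTHER.2|genome", "MSTNPKPQRK"]
header_map = build_header_map(fasta)
print(restore_full_id("GENE-ID.1", header_map))  # xx|GENE-ID|genome
print(restore_full_id("OTHER.2.1", header_map))  # xx|OTHER.2|genome
```

Looking up the unmodified ID before stripping keeps the mapping correct for IDs that already contain a dot followed by digits.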

@HA1-biocopy
Author

HA1-biocopy commented Dec 9, 2024

Thank you again for your help.
As I'm still actively testing PEPMatch, I noticed that the ID of the query sequences is not included in the output. Could you please add that as well? Or at least offer a format similar to BLAST, where we can specify the columns we need in the output? Thank you

@dmx2
Collaborator

dmx2 commented Dec 11, 2024

Yes, I will make that part of the output as well.

I will post an update when I make the changes.
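Until that lands, query IDs could be attached after the fact by joining on the peptide sequence. This is only a sketch under assumptions: it takes result rows as dictionaries (for example from `csv.DictReader`), assumes each row carries the query peptide in a column that is here named "Query Sequence", and uses the hypothetical helpers `read_query_ids` and `add_query_ids`. None of this is part of PEPMatch.

```python
from typing import Dict, Iterable, List


def read_query_ids(fasta_lines: Iterable[str]) -> Dict[str, str]:
    """Map each query sequence to its FASTA header (first header wins)."""
    seq_to_id: Dict[str, str] = {}
    header, chunks = None, []
    # A sentinel header at the end flushes the last record.
    for line in list(fasta_lines) + [">"]:
        line = line.strip()
        if line.startswith(">"):
            if header is not None and chunks:
                seq_to_id.setdefault("".join(chunks), header)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    return seq_to_id


def add_query_ids(
    rows: Iterable[Dict[str, str]],
    seq_to_id: Dict[str, str],
    sequence_column: str = "Query Sequence",  # assumed column name
) -> List[Dict[str, str]]:
    """Return copies of the result rows with a 'Query ID' field added."""
    return [
        {**row, "Query ID": seq_to_id.get(row[sequence_column], "")}
        for row in rows
    ]


queries = [">pep1", "YLLDLHSYL", ">pep2", "GLCTLVAML"]
rows = [{"Query Sequence": "GLCTLVAML", "Protein ID": "GENE-ID"}]
annotated = add_query_ids(rows, read_query_ids(queries))
print(annotated[0]["Query ID"])  # pep2
```

Joining on the sequence means two queries with identical sequences cannot be told apart; the sketch keeps the first header it sees in that case.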
