
Added pdb_selb to filter by B-factor values #164

Open: wants to merge 7 commits into master

ahmedselim2017

Hi,

I was working on some AlphaFold predictions, needed a tool to filter by B-factor, and saw #163. So I wrote a script called pdb_selb that filters atoms by their B-factor values.
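For reference, a minimal sketch of per-atom B-factor filtering, assuming atoms are kept when their B-factor is at least the threshold and that non-coordinate records pass through unchanged. `bfactor` and `filter_atoms` are hypothetical names, not the code in this PR; the column slice follows the fixed-width PDB format, where the temperature factor occupies columns 61-66.

```python
from typing import Iterable, Iterator


def bfactor(line: str) -> float:
    """B-factor of an ATOM/HETATM record: columns 61-66 (1-based), i.e. line[60:66]."""
    return float(line[60:66])


def filter_atoms(lines: Iterable[str], threshold: float) -> Iterator[str]:
    """Yield atoms with B-factor >= threshold (hypothetical rule); other records pass through."""
    for line in lines:
        if line.startswith(("ATOM", "HETATM")) and bfactor(line) < threshold:
            continue  # drop this atom
        yield line
```

Being a generator, it processes one line at a time and never holds the whole file in memory.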

However, since the < and > characters can be interpreted by the shell as redirection operators, I added a separate option for selecting the comparison operator, instead of passing the operator and the threshold value together in a single option.
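A sketch of the separate-operator idea, assuming word names such as `gt` and `le` (hypothetical spellings; the option values actually used in pdb_selb may differ). Python's `operator` module already provides the comparison functions, so a dictionary lookup is enough; `OPERATORS` and `passes` are hypothetical names.

```python
import operator

# Hypothetical operator names: spelling the comparison as a word means the
# shell never sees a bare "<" or ">" that it could treat as a redirection.
OPERATORS = {
    "gt": operator.gt,
    "ge": operator.ge,
    "lt": operator.lt,
    "le": operator.le,
    "eq": operator.eq,
}


def passes(value: float, op_name: str, threshold: float) -> bool:
    """Return True if `value <op> threshold` holds for the named operator."""
    try:
        op = OPERATORS[op_name]
    except KeyError:
        raise ValueError(
            f"unknown operator {op_name!r}; choose from {sorted(OPERATORS)}"
        ) from None
    return op(value, threshold)
```

Quoting the symbol (for example `'>'`) also protects it from the shell, but a named operator does not depend on users remembering to quote it.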

I have also added tests and documentation for pdb_selb. As this is my first contribution to this project, please let me know if you have any feedback or ideas for improvement!

@ahmedselim2017 changed the title from "Pdb selb" to "Added pdb_selb to filter by B-factor values" on Apr 9, 2024
Member

@joaomcteixeira left a comment


Hi @ahmedselim2017 !

Thanks for your contribution. I like the way you presented the PR; everything follows the pdb-tools strategy and architecture. I left some comments that I think you should address.

@amjjbonvin @JoaoRodrigues The new tool here follows the pdb-tools "one script one job" paradigm and is something we don't have yet. Personally, I like it. Once the comments are addressed, I agree with merging it. It should be [FEATURE].

Cheers,

Review comment on pdbtools/pdb_selb.py (resolved)
Review comment on pdbtools/pdb_selb.py (outdated, resolved)
@amjjbonvin
Member

PS: One more question/comment: the selection should act on a per-residue basis, meaning that whole residues should be kept or removed, not just some of the atoms of a residue. This is not an issue for pLDDT, but B-factors are atom-specific.
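A minimal sketch of residue-level selection, assuming a residue is identified by chain ID, residue number and insertion code (PDB columns 22, 23-26 and 27) and that the keep/remove rule is supplied as a function of all of that residue's B-factors. `residue_key` and `select_residues` are hypothetical helpers, not pdb-tools API.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

ResidueKey = Tuple[str, str, str]
COORD_RECORDS = ("ATOM", "HETATM")


def residue_key(line: str) -> ResidueKey:
    """Chain ID (col 22), residue number (cols 23-26), insertion code (col 27)."""
    return (line[21], line[22:26], line[26])


def select_residues(lines: List[str], keep: Callable[[List[float]], bool]) -> List[str]:
    """Keep or drop whole residues; `keep` receives every B-factor of one residue."""
    # Pass 1: collect B-factors (cols 61-66) per residue, wherever its records are.
    bfactors: Dict[ResidueKey, List[float]] = defaultdict(list)
    for line in lines:
        if line.startswith(COORD_RECORDS):
            bfactors[residue_key(line)].append(float(line[60:66]))
    kept = {key for key, values in bfactors.items() if keep(values)}
    # Pass 2: emit all atoms of kept residues; other records pass through untouched.
    return [
        line
        for line in lines
        if not line.startswith(COORD_RECORDS) or residue_key(line) in kept
    ]
```

For example, `select_residues(lines, lambda values: min(values) >= 70.0)` keeps only residues in which every atom has a B-factor of at least 70. Because this version holds all lines in memory and collects B-factors by key, it does not require a residue's records to be consecutive; the trade-off is memory proportional to the file size instead of a single line-by-line pass.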

@ahmedselim2017
Author

I have implemented a new option, filtering_mode, that selects whether the mean, minimum, or maximum B-factor of a residue is used to filter residues. For the mean, I used Python's statistics.fmean instead of sum()/len() for better precision.
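A sketch of how a filtering_mode value could map to an aggregate function, assuming the mode names `mean`, `min` and `max` (hypothetical spellings; the PR's actual values may differ). `statistics.fmean` always returns a float and is documented as faster than `statistics.mean`; `AGGREGATES` and `residue_score` are hypothetical names.

```python
import statistics
from typing import Callable, Dict, Sequence

# Hypothetical filtering_mode values mapped to the function that reduces
# a residue's per-atom B-factors to a single number.
AGGREGATES: Dict[str, Callable[[Sequence[float]], float]] = {
    "mean": statistics.fmean,  # float mean; faster than statistics.mean
    "min": min,
    "max": max,
}


def residue_score(bfactors: Sequence[float], filtering_mode: str = "mean") -> float:
    """Aggregate one residue's B-factors according to filtering_mode."""
    if not bfactors:
        raise ValueError("a residue needs at least one atom")
    if filtering_mode not in AGGREGATES:
        raise ValueError(f"unknown filtering_mode {filtering_mode!r}")
    return AGGREGATES[filtering_mode](bfactors)
```

The residue as a whole is then kept or removed by comparing this single score against the threshold.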

While testing, I noticed that the code fails when the records of a residue are not consecutive in the PDB file. The current code assumes that all records of a residue are consecutive: it groups consecutive records with the same chain and residue ID and filters each group.
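A sketch of the consecutive-grouping approach, assuming `itertools.groupby` keyed on chain ID, residue number and insertion code; `consecutive_residues` is a hypothetical name, not the PR's code. It reproduces the failure mode: `groupby` only merges adjacent items, so a residue interrupted by another residue comes out as two separate groups.

```python
from itertools import groupby
from typing import Iterable, Iterator, List, Tuple

ResidueKey = Tuple[str, str, str]


def residue_key(line: str) -> ResidueKey:
    """Chain ID (col 22), residue number (cols 23-26), insertion code (col 27)."""
    return (line[21], line[22:26], line[26])


def consecutive_residues(lines: Iterable[str]) -> Iterator[Tuple[ResidueKey, List[str]]]:
    """Group runs of adjacent ATOM/HETATM records that share a residue key."""
    atoms = (line for line in lines if line.startswith(("ATOM", "HETATM")))
    for key, group in groupby(atoms, key=residue_key):
        # Materialise the group before groupby advances to the next run.
        yield key, list(group)
```

With records ordered residue 1, residue 2, residue 1, this yields three groups, and the two halves of residue 1 would be scored and filtered independently.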

We could mitigate this by sorting the PDB file with pdb_sort before filtering. Alternatively, since that would also re-sort files that are already sorted, we could keep a list of the chain and residue IDs already processed; if a later record has one of those IDs, we would raise an error and direct the user to sort their PDB file with pdb_sort.
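A sketch of the error-raising alternative, assuming a residue is identified by chain ID, residue number and insertion code, and assuming (not stated in the thread) that the set of finished residues should be reset at each MODEL record, since every model repeats the same residues. `check_contiguous` and `UnsortedResidueError` are hypothetical names. It still streams line by line and only remembers the keys of residues it has finished.

```python
from typing import Iterable, Iterator, Optional, Set, Tuple

ResidueKey = Tuple[str, str, str]


class UnsortedResidueError(ValueError):
    """Hypothetical error for a residue whose records are not contiguous."""


def check_contiguous(lines: Iterable[str]) -> Iterator[str]:
    """Yield lines unchanged; raise if a residue reappears after another one started."""
    finished: Set[ResidueKey] = set()
    current: Optional[ResidueKey] = None
    for line in lines:
        if line.startswith("MODEL"):
            # Assumption: each model repeats the same residues, so start afresh.
            finished.clear()
            current = None
        elif line.startswith(("ATOM", "HETATM")):
            # Chain ID (col 22), residue number (cols 23-26), insertion code (col 27).
            key = (line[21], line[22:26], line[26])
            if key != current:
                if current is not None:
                    finished.add(current)
                if key in finished:
                    raise UnsortedResidueError(
                        f"residue {key} has non-consecutive records; "
                        "sort the file first, e.g. with pdb_sort"
                    )
                current = key
        yield line
```

Wrapping the input in this generator in front of the residue grouping leaves sorted files untouched and fails fast on unsorted ones, instead of re-sorting every file.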

3 participants