Micro-PR to add `alt_atom_id` to Biotite's internal CCD #668

Croydon-Brixton · 2024-10-04T10:18:05Z

Micro-PR to add alt_atom_id to Biotite's internal CCD:

Why? This is useful for resolving atom names in PDB files that do not use the standard atom ids, but use the alternative atom ids instead.

Croydon-Brixton · 2024-10-04T10:26:49Z

@padix-key the Test interfaces to databases and applications test fails due to

Error: Unable to locate executable file: /usr/share/miniconda/condabin/mamba. Please verify either the file path exists or the file can be found within a directory specified by the PATH environment variable. Also check the file mode to verify the file is executable.

This sounds like a CI/CD issue, I'm not sure there's anything I can do about this?

codspeed-hq · 2024-10-04T10:45:44Z

CodSpeed Performance Report

Merging #668 will not alter performance

_{Comparing Croydon-Brixton:feat/alt_atom_id (b3708ab) with main (231eefe)}

Summary

✅ 44 untouched benchmarks

padix-key · 2024-10-06T13:59:43Z

Hi, I checked the increased file size of the components.bcif: It went from 35.6 MB to 40.8 MB, which is tolerable in my opinion, when there is reasonable benefit.

However, I am not sure in which cases the alt_atom_id is actually used by the PDB: I had the impression that the PDB only uses the atom_id to name atoms in its entries. Do you have an example where alt_atom_id is used instead as atom name?

If alt_atom_id is actually used, one would probably need to adapt downstream functionality, e.g. connect_via_residue_names() to be able to from bonds between atoms with alternate names as well. Do you have other scenarios in mind where 'resolving atom names' is necessary?

padix-key

Apart from the conceptual discussion above, the change looks good to me, thank you. There is only the small outlined nitpick.

To make the changes in the internal CCD take effect, we need the built structure/info/ccd directory as well. It is built as artifact in the CI. I will add it to this PR when the CI finishes successfully.

setup_ccd.py

padix-key · 2024-10-06T14:25:51Z

This sounds like a CI/CD issue, I'm not sure there's anything I can do about this?

~~I deem this a temporary CI hickup as well. If this persists in a few days, I will look deeper into this.~~

Update: This error seems to be persistent, but I added a fix in #661.

nscorley · 2024-10-08T21:41:51Z

Hi,

Piling-on here: We've come across a number of entries where the atom_ids are annotated with the alternative atom ID's rather than the standard atom ID's, and even a few where a single residue mixes naming conventions.

An egregious example:

6q9t, where the ligand QUJ uses "C10" (an alternative atom name) rather than "CA" (the standard name), but uses the "standard" names for the other atoms

Providing the flexibility to return the alternative names enables us to write code handling these edge cases without significantly increasing overhead

nscorley · 2024-10-08T21:45:36Z

Hi, I checked the increased file size of the components.bcif: It went from 35.6 MB to 40.8 MB, which is tolerable in my opinion, when there is reasonable benefit.

However, I am not sure in which cases the alt_atom_id is actually used by the PDB: I had the impression that the PDB only uses the atom_id to name atoms in its entries. Do you have an example where alt_atom_id is used instead as atom name?

If alt_atom_id is actually used, one would probably need to adapt downstream functionality, e.g. connect_via_residue_names() to be able to from bonds between atoms with alternate names as well. Do you have other scenarios in mind where 'resolving atom names' is necessary?

In regards to connect_via_residue_names() -- will need to double-check our examples, but I'm 99% certain this problem is isolated to ligands; amino acids and other monomeric residues tend to be named correctly. Thus, when connecting polymers via canonical bonds, I don't think it'll matter. Broadly speaking, the most foolproof approach would be to check whether any of the provided atoms are non-standard at loading time, and to then map them to their standard equivalents; it's less clear to me whether this behavior should be the default given the additional overhead.

Croydon-Brixton · 2024-10-09T15:10:57Z

I think this is especially an issue as the PDB keeps making updates. From some older cif files I've got lying around I can see the alt_atom_id being used where the fresh download from the rcsb PDB now uses a standard atom name. I'm not sure how it's decided which naming to use, but it might be that the alt atom id is used to reference legacy versions of ligands from the CCD?

Croydon-Brixton · 2024-10-09T15:11:23Z

In any case -- I addressed the ordering of the blocks now ✅

nscorley · 2024-10-09T17:22:18Z

Hi,

Piling-on here: We've come across a number of entries where the atom_ids are annotated with the alternative atom ID's rather than the standard atom ID's, and even a few where a single residue mixes naming conventions.

An egregious example:

6q9t, where the ligand QUJ uses "C10" (an alternative atom name) rather than "CA" (the standard name), but uses the "standard" names for the other atoms

Providing the flexibility to return the alternative names enables us to write code handling these edge cases without significantly increasing overhead

Double-checked this example and it seems in the most recent PDB update that was fixed, as @Croydon-Brixton mentioned -- will look for others

padix-key · 2024-10-12T15:48:29Z

@nscorley Thanks for the pointer to the example entry. I will look into it. @Croydon-Brixton If you do not mind, I would also add the required changes to connect_via_residue_names() in this PR.

padix-key · 2024-10-15T07:38:56Z

I merged #661. After you rebase on the main branch, the test should work.

Croydon-Brixton · 2024-10-15T08:06:50Z

Amazing, thank you @padix-key -- I'm aiming to have these PR's updated for you later today or tomorrow. Will ping you when ready!

padix-key · 2024-10-19T08:05:03Z

And its merged. Thanks again!

padix-key requested changes Oct 6, 2024

View reviewed changes

setup_ccd.py Outdated Show resolved Hide resolved

Croydon-Brixton requested a review from padix-key October 9, 2024 15:57

padix-key mentioned this pull request Oct 12, 2024

Allow loading computationally predicted cif files, which may not have certain custom annotations #670

Merged

Croydon-Brixton added 4 commits October 16, 2024 10:00

feat: add alt_atom_id to biotite's internal CCD

769c74a

fix: remove accidental duplicate block

15df7de

chore: ruff format

8a1750b

fix: move alt_atom_id block below atom_id

98edf59

Croydon-Brixton force-pushed the feat/alt_atom_id branch from b3708ab to 98edf59 Compare October 16, 2024 17:00

padix-key approved these changes Oct 19, 2024

View reviewed changes

padix-key merged commit 3a7437d into biotite-dev:main Oct 19, 2024
25 of 26 checks passed

padix-key mentioned this pull request Oct 23, 2024

Bonds cannot be found for alternative atom names #684

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Micro-PR to add `alt_atom_id` to Biotite's internal CCD #668

Micro-PR to add `alt_atom_id` to Biotite's internal CCD #668

Croydon-Brixton commented Oct 4, 2024

Croydon-Brixton commented Oct 4, 2024

codspeed-hq bot commented Oct 4, 2024 •

edited

Loading

padix-key commented Oct 6, 2024

padix-key left a comment

padix-key commented Oct 6, 2024 •

edited

Loading

nscorley commented Oct 8, 2024

nscorley commented Oct 8, 2024

Croydon-Brixton commented Oct 9, 2024

Croydon-Brixton commented Oct 9, 2024

nscorley commented Oct 9, 2024

padix-key commented Oct 12, 2024

padix-key commented Oct 15, 2024

Croydon-Brixton commented Oct 15, 2024

padix-key commented Oct 19, 2024

Micro-PR to add alt_atom_id to Biotite's internal CCD #668

Micro-PR to add alt_atom_id to Biotite's internal CCD #668

Conversation

Croydon-Brixton commented Oct 4, 2024

Croydon-Brixton commented Oct 4, 2024

codspeed-hq bot commented Oct 4, 2024 • edited Loading

CodSpeed Performance Report

Merging #668 will not alter performance

Summary

padix-key commented Oct 6, 2024

padix-key left a comment

Choose a reason for hiding this comment

padix-key commented Oct 6, 2024 • edited Loading

nscorley commented Oct 8, 2024

nscorley commented Oct 8, 2024

Croydon-Brixton commented Oct 9, 2024

Croydon-Brixton commented Oct 9, 2024

nscorley commented Oct 9, 2024

padix-key commented Oct 12, 2024

padix-key commented Oct 15, 2024

Croydon-Brixton commented Oct 15, 2024

padix-key commented Oct 19, 2024

Micro-PR to add `alt_atom_id` to Biotite's internal CCD #668

Micro-PR to add `alt_atom_id` to Biotite's internal CCD #668

codspeed-hq bot commented Oct 4, 2024 •

edited

Loading

padix-key commented Oct 6, 2024 •

edited

Loading