Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add missing crosslink attributes #118

Closed
wants to merge 1 commit into from
Closed

Conversation

aozalevsky
Copy link

Some mandatory attributes are missing from the interface in the current version:

id
group_id
seq_id_1
seq_id_2
comp_id_1
comp_id_2

@codecov
Copy link

codecov bot commented May 25, 2023

Codecov Report

Merging #118 (8f06848) into main (104640e) will decrease coverage by 2.76%.
The diff coverage is 100.00%.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #118      +/-   ##
==========================================
- Coverage   99.76%   97.01%   -2.76%     
==========================================
  Files          29       28       -1     
  Lines        7296     6728     -568     
  Branches     1749     1121     -628     
==========================================
- Hits         7279     6527     -752     
- Misses         11      180     +169     
- Partials        6       21      +15     
Impacted Files Coverage Δ
ihm/reader.py 99.47% <100.00%> (-0.35%) ⬇️

... and 5 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 104640e...8f06848. Read the comment docs.

Copy link
Contributor

@benmwebb benmwebb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a general rule, python-ihm does not store IDs (instead, we use pointers to other Python objects) and does not read redundant information (which this is). All of this information is already in the cross link object.

@aozalevsky
Copy link
Author

aozalevsky commented May 25, 2023

From a user perspective, it's a somewhat unintuitive behavior. mmCIF is a table-based format, so i see a table in the mmcif file, i see the field names and expect to be able to access this information.

For instance, now you can get asym and atom from xl, but not the residue (neither residue number nor residue name). You have to guess, that residue is linked inside the experimental_crosslink, but the Residue object doesn't have a comp_id attribute. In other words, you have to make multiple hops to get a) the data, that typically goes and accessed together b) the data that is already in the table.

I don't see a reason why one would want to impose a non-redundant scheme (doing essentially manual filtering/separation of the information ) if redundancy is a part of the format.

Linking to instances is completely ok unless it obscures the information. Again, as in the example with Residue, which doesn't have a comp_id attribute, it essentially removes the information that was already accessible and forces you to do several additional hops to get a residue name. The same goes to id and group_id for a specific restraint. The id is useful when, for instance, you want to select a specific restraint from a file. And for some classes ids are preserved:

https://github.com/ihmwg/python-ihm/blob/104640eb7ddd96d99099c10fb0b2e43b773cd16f/ihm/reader.py#L1243-L1244C13

why not be consistent?

@benmwebb
Copy link
Contributor

From a user perspective, it's a somewhat unintuitive behavior. mmCIF is a table-based format, so i see a table in the mmcif file, i see the field names and expect to be able to access this information.

That is not how python-ihm is designed. The internal representation is a hierarchy of Python objects, not a bunch of tables.

For instance, now you can get asym and atom from xl, but not the residue (neither residue number nor residue name). You have to guess, that residue is linked inside the experimental_crosslink, but the Residue object doesn't have a comp_id attribute. In other words, you have to make multiple hops to get a) the data, that typically goes and accessed together b) the data that is already in the table.

No guessing is required. We can certainly add convenience properties where necessary to reduce the number of hops though.

Linking to instances is completely ok unless it obscures the information. Again, as in the example with Residue, which doesn't have a comp_id attribute

Nothing has a comp_id attribute. The only place this information is stored is ChemComp.id. Everything else must reference a ChemComp by design. In your example it would be trivial to add a comp property to Residue if that's what you need. Then you can just say r.comp.id, no duplication needed.

@benmwebb
Copy link
Contributor

I've added convenience accessors so this information should be available for a given ResidueCrossLink object xl:

  • id = xl._id
  • group_id = xl.experimental_cross_link._id
  • seq_id_1 = xl.residue1.seq_id
  • seq_id_2 = xl.residue2.seq_id
  • comp_id_1 = xl.residue1.comp.id
  • comp_id_2 = xl.residue2.comp.id

This should work for the majority of depositions. If memory serves there are one or two where they elected to enforce cross-links on different residues from those identified experimentally (these are easy to see because the comp_ids in the mmCIF file are not all LYS, for example). python-ihm doesn't currently handle that; see #119.

(If your intention is to preserve data exactly as read from the mmCIF file, python-ihm probably isn't the best tool for the job because it is not designed to do that. Although you can use its low-level classes if you want to read the file as a bunch of tables, there are other tools such as Biopython which can do that too.)

@benmwebb benmwebb closed this May 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants