Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix key in _ma_model_list #23

Closed
gtauriello opened this issue Nov 15, 2024 · 7 comments
Closed

Fix key in _ma_model_list #23

gtauriello opened this issue Nov 15, 2024 · 7 comments
Assignees

Comments

@gtauriello
Copy link

gtauriello commented Nov 15, 2024

As discussed in other meetings, there is an issue with _ma_model_list since the primary key is ordinal_id but everything points to model_id. The possible solutions are to either move the primary key to model_id and deprecate ordinal_id or move parent-child relations to ordinal_id and deprecate model_id.

Practically the situation for current ModelCIF files (in ModelArchive, AlphaFold DB, AlphaPulldown, D-I-TASSER, ModBase and SWISS-MODEL) and tool support (python-modelcif) is as follows:

  • Nobody has yet used ModelCIF with multiple models in the same file or with multiple groups of models. In all examples, ordinal_id and model_id (and model_group_id) are identical.
  • The dumper for python-modelcif, assigns model_id numbering models within the same group and can hence produce non-unique model_ids if multiple groups are in the list (see example in tests)

Suggested solution:

  1. Change all child data items of _ma_model_list.model_id to be child data items of ordinal_id instead.
  2. Make _ma_model_list.model_id non-mandatory and change description to match how it is used in python-modelcif: "An identifier for the structural model within a model group. If the models do not need to be grouped into collections, this can be dropped or set to be the same as ordinal_id."
  3. Change last sentence of description of _ma_model_list.model_group_id to: "If the models do not need to be grouped into collections, this can be dropped or set to be the same as ordinal_id."
@benmwebb
Copy link
Contributor

Nobody has yet used ModelCIF with multiple models in the same file or with multiple groups of models. In all examples, ordinal_id and model_id (and model_group_id) are identical.

That's not quite true. ModBase does support multiple models in one file if you select multiple sequences in the Sequence Overview (e.g. https://modbase.compbio.ucsf.edu/modbase-cgi/model_search.cgi?dataset=dengue) and then request coordinate files in mmCIF format. But you'll get each model in a separate data block.

The dumper for python-modelcif, assigns model_id numbering models within the same group and can hence produce non-unique model_ids if multiple groups are in the list (see example in tests)

python-modelcif uses python-ihm to assign model IDs. They are globally unique. You can see duplicate model IDs in the output (as above) if a given model is placed in multiple groups, but there are not multiple models with the same ID.

@gtauriello
Copy link
Author

Ah ok. Thanks @benmwebb for the clarification.

In terms of multiple models in ModBase, I think that having them in separate data block is about the same as having them in separate files and so that doesn't change the conclusions.

For the case of having the same model in multiple groups: was there any practical use case for this in IHM? I cannot think of a situation where I would have that for computed structure models and so I would be ok with simply not allowing this type of duplication.
Conversely, there may also be no use case to have model_id to number models within a group which would make it fully redundant with respect to ordinal_id. In that case, we can simply deprecate the use of that item and describe it as a redundant field which is only kept for backward compatibility.

@benmwebb
Copy link
Contributor

In terms of multiple models in ModBase, I think that having them in separate data block is about the same as having them in separate files and so that doesn't change the conclusions.

Agreed (internally it does indeed just concatenate multiple files).

For the case of having the same model in multiple groups: was there any practical use case for this in IHM?

I've never used it but the dictionary allows for it.

@gtauriello
Copy link
Author

Ok in that case I would suggest that we deprecate model_id altogether and disallow duplicated models to show up in ma_model_list. For the dictionary this means (updated suggestion compared to above):

  1. Change all child data items of _ma_model_list.model_id to be child data items of ordinal_id instead.
  2. Make _ma_model_list.model_id non-mandatory and change description to: "A unique identifier for the structural model being deposited. Was practically a duplicate of ordinal_id and has been deprecated with dictionary version 1.4.7."
  3. Change description of _ma_model_list.ordinal_id to: "A unique identifier for the structural model being deposited."
  4. Change last sentence of description of _ma_model_list.model_group_id to: "If the models do not need to be grouped into collections, this can be dropped or set to be the same as ordinal_id."

@brindakv
Copy link
Contributor

In addition to the changes discussed above, I also suggest the following.

  1. Deprecate _ma_model_list.model_group_id and _ma_model_list.model_group_name
  2. Add ma_model_group, ma_model_group_link, ma_model_representative

The changes will allow for many-to-many relationships between models and model groups and follow definitions similar to IHMCIF.

All updates are implemented in #25 but can be rolled back if needed.

@gtauriello
Copy link
Author

Ok. Sounds reasonable. I will check in detail in #25 . One related comment though: the examples about _ihm_model_list in https://github.com/ihmwg/IHMCIF/blob/master/dictionary_documentation/documentation.md seem to be outdated as they use _ihm_model_list.model_group_id and _ihm_model_list.model_group_name...

@brindakv
Copy link
Contributor

brindakv commented Dec 6, 2024

Thanks for pointing out the outdated documentation in IHMCIF. We have not updated/maintained it after IHMCIF was included in https://mmcif.wwpdb.org/dictionaries/mmcif_ihm_ext.dic/Index/. Perhaps we should remove it completely.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants