Simplification and storage saving potential for associations? #232

Open
Zehvogel opened this issue Oct 25, 2023 · 1 comment
Labels
enhancement New feature or request

Comments

@Zehvogel
Contributor

This might be more of a podio technicality but I think this is the better place to discuss this...
In most cases, associations between edm4hep objects, such as the MCRecoParticleAssociation, will always link objects of one collection A to objects of one collection B, e.g. PandoraPFOs to MCParticles. However, we store the collection ID separately for every single entry in the association collection, even though all entries should have the same collection ID.
It might be worthwhile to consider saving the collection ID only once and checking for consistency when the association collection is created... This would also have caught something like key4hep/k4MarlinWrapper#113 earlier.
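
To make the proposal concrete, here is a minimal sketch of what "store the ID once and check consistency" could look like. It is plain Python and not podio code: `ObjectID` and `factor_out_collection_id` are hypothetical names invented for illustration, and the collection ID 42 is made up.

```python
from typing import NamedTuple


class ObjectID(NamedTuple):
    """Hypothetical stand-in for a (collection ID, index) pair; not the podio type."""
    collection_id: int
    index: int


def factor_out_collection_id(ids: list[ObjectID]) -> tuple[int, list[int]]:
    """Return (shared collection ID, indices); raise if the IDs are inconsistent."""
    if not ids:
        raise ValueError("an empty association collection has no collection ID")
    shared = ids[0].collection_id
    for pos, oid in enumerate(ids):
        if oid.collection_id != shared:
            raise ValueError(
                f"entry {pos} points to collection {oid.collection_id}, expected {shared}"
            )
    return shared, [oid.index for oid in ids]


# One side of an association: every entry refers to the same (made-up) collection 42.
sim_side = [ObjectID(42, 0), ObjectID(42, 3), ObjectID(42, 1)]
shared_id, indices = factor_out_collection_id(sim_side)
```

With this layout only `shared_id` and `indices` would need to be written out, and an entry pointing into a different collection would be rejected at creation time.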

@Zehvogel Zehvogel added the enhancement New feature or request label Oct 25, 2023
@tmadlener
Contributor

This is probably more of a podio technicality indeed ;) I am not sure if this is worth the effort, but maybe you have a more specific use case (or an example of where this would benefit things in general) in mind?

A few considerations from my side:

  • The space savings (on disk) are probably very small, since compression will kick in and have a very easy job of doing essentially what you described.
  • In its current form, relations (and by extension associations and subset collections) require only one mechanism to handle effectively everything. Hence, storing only the indices and keeping the collection IDs separately would require implementing a separate mechanism just to handle associations, and I would rather not complicate the podio backend(s) further unless there is a (very) compelling use case to do so.
  • k4MarlinWrapper#113 ("Broken associations when using edm4hep input") is (more or less) unrelated to this. We could have checked for invalid collection IDs, but simply checking that all IDs are the same would have passed, since all collection IDs are initialized to the same value.
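
The first and third points can be illustrated with a small, self-contained sketch. It rests on several assumptions: the collection IDs are treated as one column of 32-bit integers, `zlib` stands in for whichever compression the I/O backend actually applies, and both `COLLECTION_ID` and the "unset" value `UNSET_ID` are made-up numbers rather than podio's actual ones.

```python
import struct
import zlib

N_ENTRIES = 10_000
COLLECTION_ID = 0x1234ABCD  # made-up ID shared by all entries
UNSET_ID = 0xFFFFFFFF  # hypothetical "not set" default, not podio's actual value

# A column holding the same 32-bit collection ID for every association entry.
id_column = struct.pack("<I", COLLECTION_ID) * N_ENTRIES
compressed_ids = zlib.compress(id_column)
print(f"ID column: {len(id_column)} bytes raw, {len(compressed_ids)} bytes compressed")


def all_same(ids: list[int]) -> bool:
    """True if every entry carries the same collection ID."""
    return len(set(ids)) <= 1


def all_valid(ids: list[int]) -> bool:
    """True if no entry still carries the (hypothetical) unset default."""
    return all(cid != UNSET_ID for cid in ids)


# IDs that were never filled in are all identical, so a pure
# "same ID" check passes even though none of them is usable.
never_set = [UNSET_ID] * 5
print("same ID check passes:", all_same(never_set))
print("validity check passes:", all_valid(never_set))
```

The repeated ID column shrinks to a small fraction of its raw size, and the "same ID" check accepts the unset entries, so only an explicit validity check would flag them.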
