Consolidate atom collection classes into a single class #378
Replies: 4 comments 1 reply
-
Hi. Indeed this would be quite a large endeavour to implement your proposed change or a similar consolidation of all containers into a single one. Hence, I think this thread is a good place to compare the pros and cons of the proposed change. Here are my initial thoughts:
Based on these considerations I am currently in favor of the current data model, though I am open to a discussion about this topic. Nevertheless I think that we could have a look again at |
Beta Was this translation helpful? Give feedback.
-
These are all good points, but some things to keep in mind include:
# ...
coords = atoms.coords
if coors.ndim == 2:
coords = coords[np.newaxis, ...]
transformed = []
for mol in coords:
# this loop is called once for 2D coordinates, but in exchange the same code works
# for single molecules or collections
transformed.append(func_taking_2darray_returning_2darray(mol))
atoms.coords = np.array(transformed).squeeze()
# ...
If we could agree what an acceptable API for a single atom collection might look like, I think it would be possible to transition the code without introducing a disruptive change. |
Beta Was this translation helpful? Give feedback.
-
Due to the impact of such change, I think such a change would need a longer discussion preferably with additional input of other Biotite users. In my opinion both, the current implementation and your proposition, especially differ in how the user (and also library code) separates single-model and multi-model structures. Currently, this is done on the class level ( Here are some replies to your comments:
The common functionality of the two classes is already consolidated into
This behavior was implemented to give the user a defined output type (or coordinate shape in your approach) independent of the input file.
In all of Biotite's file classes
This is a good point in principle. However, I think the use case of this is quite small.
This is a little bit inconsistent indeed. At the time it was implemented, this type of indexing was chosen to be consistent with the model naming in PDB and CIF files. However, this inconsistency is not mitigated with a new data model.
I think for future code this is already OK, since it makes the same separation as
It is possible to reuse code for both |
Beta Was this translation helpful? Give feedback.
-
Yes, I think you're right. Having worked with the library a bit more over the past few days, working with multiple atom collection classes is not as cumbersome as I initially thought. Plus, since my use cases are small compared to the scope of the library, I'm sure I'm not fully appreciating the utility of some of the design choices. |
Beta Was this translation helpful? Give feedback.
-
Hello,
I noticed a very minor bug in
.pdb
file output and investigated how much work it would be to fix and was struck at the complexity having three atom containers introduces in the code, especially in the structure io module. To me, the only difference between the three classes is the shape of the coords array, since an Atom is an AtomArray of coordinate shape (1, 3) or (3,), and an AtomArrayStack is, or could be, an AtomArray of shape (m, n, 3). I implemented an example of a single atom container in this gist.For simplicity, I consolidated the non-coordinate atom data in a numpy structured array and ignored for now boxes and bonds. There is an internal
_is_stack
attribute to simplify indexing. I addedto_pdb()
andfrom_pdb()
for the sake of illustration. Theto_pdb
method supports putting multiple models in a single file, adding TER lines, and aligns atom names.Is this of any interest? I understand it would be a massive undertaking to implement even if it was.
Beta Was this translation helpful? Give feedback.
All reactions