Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ProteinStructureStore #10

Merged
merged 5 commits into from
Oct 10, 2024
Merged

Add ProteinStructureStore #10

merged 5 commits into from
Oct 10, 2024

Conversation

AntonOresten
Copy link
Member

@AntonOresten AntonOresten commented Oct 7, 2024

This PR implements the Deserializer type (see #9). It's currently a lazy AbstractDict{String,ProteinStructure}, and deviates slightly from the original suggestion in that you iterate through the structures using values(deserializer), defined implicitly by explicitly defining Base.keys and Base.iterate.

As mentioned in #9, I'm also considering some kind of Serializer type for appending structures to files, but this can already be done with serialize(path, structures; mode="cw"). Still, the current interface provides no way to e.g. delete an entry without importing HDF5. A Serializer type could implement Base.delete!(serializer, key) method -- or should serializers strictly write, in the same way deserializers strictly read?

Should also reconcile the overlap between this and my make shift ProteinDataset type from #9. The deserialize function could be implemented as collect(values(Deserializer(filename))).

The commit also includes some minor package reformatting.

@AntonOresten AntonOresten self-assigned this Oct 7, 2024
Copy link
Collaborator

@bicycle1885 bicycle1885 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! This looks good if you could add some tests to prevent regression in the future.

@AntonOresten AntonOresten changed the title Add Deserializer Add ProteinStructureStore Oct 9, 2024
@AntonOresten
Copy link
Member Author

I've replaced Deserializer and ProteinDataset with a more comprehensive ProteinStructureStore. ProteinDataset was really just a glorified Dict. I figured in-memory structures are best handled manually.

I also again made a lot of miscellaneous changes. Please excuse the clutter.😅

The tests should cover most of the ProteinStructureStore code. I am not sure if by "regression" you meant performance regression or just failing code. I saw BioStructures.jl has a suite of benchmarks. A similar setup could be used to automatically report performance regressions of pull requests.

Copy link
Collaborator

@bicycle1885 bicycle1885 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. By regression I meant avoiding functionality failures, not performance issues.

@bicycle1885 bicycle1885 merged commit a5fc925 into main Oct 10, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants