BrainGlobe Atlas API Version 2 #141
This would be great! I completely agree: the monolithic atlas design can only produce growing problems in the future, and we do need more flexibility.
I like this concept overall. If I get it correctly, the idea would be to have objects describing all parts of the atlas, where each can have independent versioning (as per a later point), storage location, and download time. The only difficulty here is the risk of overdoing it; we'd have to find the sweet spot in the simplicity/flexibility trade-off. I still think minimal overhead over the data is a major strength of
I think this is feasible, given some requirements on availability (e.g. the host has to mint a valid DOI) and some validation tools (see below). Hosting-wise, I do not like GIN very much because of its current limitations (e.g. the suboptimal zip-file downloads), and I don't think there's any major ongoing development effort there. I feel that DVC could be something very interesting to consider here and for the versioning.
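(To make the DVC idea concrete: fetching a pinned atlas element through DVC's Python API could look roughly like the sketch below. The repository URL, file path and tag are placeholders, not real resources.)

```python
# Hypothetical sketch: read a specific, versioned atlas element from a
# DVC-tracked repository without cloning the whole dataset.
import dvc.api

annotation_bytes = dvc.api.read(
    "atlases/allen_mouse/annotation_25um.tiff",      # hypothetical path
    repo="https://github.com/example-org/atlas-data",  # hypothetical repo
    rev="annotation-v1.2",  # any git tag/branch/commit pins the version
    mode="rb",
)
```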
I agree, this would give a lot of room for optimisation.
Agreed, a form could be a great starting point, and it could happen even before the rest of the points, just to get in touch!
This would be nice. The good thing is that while the grouping would pertain to the atlas semantics, the transformation would not, so one could develop it totally independently of the new atlas structure, just as an additional layer/tool.
Although nice in principle, this would stretch the normative effort a bit too much imo. It would be better to give people clear ways to autonomously distribute data in a BrainGlobe-compatible way without imposing too many constraints a priori.
I think this could be possible to do, but keeping very stringent criteria on what you said - ensuring validity. As per my point above, this would require: 1) DOIs to guarantee accessibility, and 2) tools for runtime (or first-download) validation, with solid fallback options if validation of a new version fails (we do not want the API blamed for atlas developers' inconsistencies, which can happen).
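(Purely as an illustration of the fallback idea, with hypothetical function names:)

```python
# Illustrative sketch only: validate a newly downloaded atlas version and
# fall back to the last known-good version if validation fails. None of
# these functions exist in the current API; `download` and `validate` are
# injected callables.
def get_atlas_safely(name, latest_version, last_good_version,
                     download, validate):
    """Download `latest_version`; if it fails validation, fall back."""
    atlas_dir = download(name, latest_version)
    if validate(atlas_dir):
        return atlas_dir
    # Keep the API usable even if an atlas developer ships a broken release.
    print(f"Validation of {name} {latest_version} failed, "
          f"falling back to {last_good_version}")
    return download(name, last_good_version)
```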
I don't know how much of a priority this is; you have a better idea of the current situation in the community. If it is, maybe it is conceivable to allow people to specify a different data backend for an atlas, as long as it provides a
@vigji thanks for your feedback!
Yep. My idea was to split up the atlas into groups of files that are likely to be edited as one, namely:
As an example (mostly just because I'd already made the figure), two similar atlases (
Yes, I think we should probably "outsource" some of the versioning to the many tools that already do it so well with a git-like system. DataLad is an option too.
👍
This is definitely low priority and may not see the light of day. I like the idea of expanding the concept of an atlas though. To me (as an example), connectivity between regions is as much of an "atlas" as labels associated with voxels.
Yep, for example OME-Zarr looks like it could be a common interface for this type of thing. The question would be how to deliver an atlas in a chunked way. Again, low priority.
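As a rough sketch of what chunked, lazy access could look like for a user, assuming a (hypothetical) large reference volume published as OME-Zarr over HTTP (zarr-python v2 API, with fsspec/aiohttp installed):

```python
# Only the chunks overlapping the requested region are fetched, so the full
# volume never needs to be downloaded. URL and array layout are made up.
import zarr

store = zarr.storage.FSStore(
    "https://example.org/atlases/big_em_atlas.ome.zarr/0"
)
volume = zarr.open(store, mode="r")

# Reading a small sub-volume pulls only the chunks it touches.
roi = volume[100:164, 2000:2256, 2000:2256]
print(roi.shape)
```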
In general this looks like a good start. Some initial impressions (mostly questions) from someone with minimal experience working with atlases (so far).
We do currently, but I'm looking for a better solution for v2 if possible. TBH I don't think the current solution has caused any problems (yet), but it could be improved.
My idea is that if any constituent component of an atlas (e.g. annotation) changes, then that would trigger the generation of a new version of the atlas as a whole. However, if we use semantic versioning, it could be a patch/minor/major change.
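To illustrate one possible (not agreed-upon) mapping from element changes to semantic-version bumps of the whole atlas:

```python
# Sketch of one possible policy: map the type of change in a constituent
# element to a semver bump for the atlas as a whole. The mapping is made up.
BUMP_FOR_ELEMENT_CHANGE = {
    "metadata": "patch",     # e.g. fixed typo in region names
    "meshes": "minor",       # regenerated meshes, same annotation
    "annotation": "major",   # region boundaries/IDs changed
    "reference": "major",    # new reference image
}

def bump_version(version, changed_element):
    major, minor, patch = (int(x) for x in version.split("."))
    level = BUMP_FOR_ELEMENT_CHANGE[changed_element]
    if level == "major":
        return f"{major + 1}.0.0"
    if level == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"

# e.g. bump_version("1.2.3", "annotation") -> "2.0.0"
```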
I think there are a lot of things we won't be able to test in CI, as it would take too long. We could have a stand-alone "BrainGlobe Atlas Validator" tool though. I think this is going to be one of those things where the bulk of the value comes from testing a small number of things: are the URLs valid, is the data the right size/shape, are there the same number of meshes as brain regions, etc.
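The kind of cheap checks such a validator could run might look roughly like this (field names and the config structure are hypothetical):

```python
# Very rough sketch of a stand-alone validator: reachable URLs, expected
# image shape, one mesh per brain region.
import requests

def validate_atlas(config, annotation, meshes, structures):
    """Return a list of human-readable validation errors (empty if OK)."""
    errors = []
    # Are the download URLs reachable?
    for url in config.get("urls", []):
        if requests.head(url, allow_redirects=True).status_code >= 400:
            errors.append(f"Unreachable URL: {url}")
    # Is the annotation image the expected shape?
    if tuple(annotation.shape) != tuple(config["expected_shape"]):
        errors.append("Annotation image has unexpected shape")
    # Is there exactly one mesh per brain region?
    if set(meshes) != {s["id"] for s in structures}:
        errors.append("Mesh set does not match brain region list")
    return errors
```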
There are a few with different levels of "curation". Some initiatives I know of:
Some ideas of where to start (e.g. what could be v1.5)
Resurrecting this. Even in the last year, the complexity of available atlases has increased considerably. There are multiple:
So while I think the general principle (store files separately & define via config) is a good one, we need to decide what an atlas is. Certainly there should be some merging of what we currently refer to as separate atlases (e.g. resolution should be a parameter of a single atlas), but beyond that: is an atlas a coordinate space, an annotation, etc.? Do we stick with the atlases as published/released, or make our own "mega atlas" that combines data in the same space from multiple sources? The latter seems difficult for the community to adopt/report.
Following discussion with @PolarBean, @aeidi89, @alessandrofelder, @IgorTatarnikov, @niksirbi and others, it seems as if the most promising way forward is to basically give up on the idea of defining "an atlas" in the rigid way we have been trying to do. Instead I propose we define an atlas as something like: "the collection of reference neuroanatomical data and metadata used in a specific analysis workflow". This means that an atlas is the collection of files used by the researcher, and could be unique to their study. In this comment I will summarise my understanding of the best way forward.

General concept
The way I imagine users accessing the data they need is essentially a decision tree, starting with species:
I'm not sure of the order of these necessarily, and of course only the relevant "decisions" should be presented to the user:
The API should support accessing these files individually (e.g. query for a specific annotation image) and together (define an atlas, and return a BrainGlobe atlas object). The second would allow backwards compatibility.

Hosting

To enable this "mix and match" approach, as outlined above, data should not be packaged up into an "atlas", but be hosted separately. We would need to decide how to store the meshes, because we probably don't want to store them all individually (there will be hundreds of thousands eventually). However, there is a lot of overlap between the meshes for multiple annotation sets. Maybe we could store them in "batches" of ~10 meshes? This would make the API more complicated, however.

Metadata

Unlike the proposed solution above where an atlas is defined by a config file, users would define what an atlas is for themselves. However, would we still want to define some pre-set "standard" combinations for the user (e.g. the Allen STPT atlas or the Waxholm MRI)? We will need metadata to define the atlas components (as defined above, but using openMINDS_SANDS, see #356). We will also need metadata to define the "tree" above, i.e. these reference images are in this coordinate space.

Versioning

As above, every element should have its own version.

Credit/reproducibility

To ensure appropriate credit for those who create these resources and to ensure reproducibility, the API should enable user-facing tools to create:
Coordinate spaces

Currently in BrainGlobe, we have different resolutions of atlases. I think this should be preserved for registration, but all results should be defined in the coordinate space, in physical units, not voxels. This means that data registered to two resolutions of the same reference image should produce (approximately) the same result.

Mapping between coordinate spaces

A logical conclusion of this approach is to allow data to be moved between coordinate systems. I propose this should be shelved until version 3.

Hosting

It should be possible to host the data in multiple places (i.e. mirrors) to optimise performance and reduce downtime (cc @dbirman).

I'm sure I've overlooked many elements, so anyone feel free to chime in.
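Purely as an illustration of the "individual vs. assembled" access described above (every name below is invented, not an agreed interface):

```python
# Hypothetical sketch of user-facing access; the stubs only show the shape
# of the calls, they are not a real API.

def query(species, coordinate_space, kind, name):
    """Return a single component (e.g. one annotation image) - stub."""
    ...

def assemble_atlas(reference, annotation, meshes):
    """Return a BrainGlobe-style atlas object from chosen components - stub."""
    ...

# Access a single component, walking the species -> space -> data "tree":
annotation_image = query(
    species="mouse",
    coordinate_space="allen_ccf_v3",
    kind="annotation",
    name="allen_2017",
)

# Or assemble a full atlas object, for backwards compatibility with tools
# that expect a single atlas:
atlas = assemble_atlas(
    reference="allen_stpt_25um",
    annotation="allen_2017",
    meshes="allen_2017_meshes",
)
```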
+1 for coordinate spaces in physical units, that's a known big source of confusion/frustration!
This issue has been mentioned on Image.sc Forum. There might be relevant details there: https://forum.image.sc/t/mri-template-for-ccfv3-space/101219/12
We should start to turn this general idea into smaller, actionable issues; otherwise this will never get done. One idea is to start with a version "1.5":
Then we can start adding the new features of V2.
This is essentially a reply to #96, but I'm starting a new issue to track this idea. Sorry for the long post, but I'm interested in your ideas, @brainglobe/maintainers @brainglobe/swc-neuroinformatics @brainglobe/czi_eoss5.
After 2.5 years, and based on conversations with various users and atlas creators, I think it's time for `bg-atlasapi` version 2. Version 1 works very well for "classical" anatomical atlases (i.e. one reference image and one annotation), but it doesn't cater well (or at all) for:

- atlases with more than one reference image or annotation (e.g. `kim_dev_mouse` and `mpin_zfish`)
- related atlases that share data or a coordinate space (e.g. `allen_mouse` and `perens_lsfm_mouse`)

The atlas generation process also needs streamlining.
My idea for V2:
Move away from the monolithic atlas structure
The atlas could be defined by a config file, specifying atlas "elements" (an element being a reference image, a set of meshes, etc.):
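For example, something along these lines (element names, keys and URLs are all placeholders, shown as a Python dict rather than any particular on-disk format):

```python
# Hypothetical element-based atlas definition; the real config format and
# keys are not decided.
allen_mouse_25um = {
    "name": "allen_mouse_25um",
    "elements": {
        "reference": {"name": "allen_stpt_25um", "url": "https://example.org/reference"},
        "annotation": {"name": "allen_ccf_2017_25um", "url": "https://example.org/annotation"},
        "meshes": {"name": "allen_ccf_2017_meshes", "url": "https://example.org/meshes"},
        "metadata": {"name": "allen_mouse_metadata", "url": "https://example.org/metadata"},
    },
}
```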
When an atlas was downloaded, the API would check which of these files already exist locally, and only download those still required. The idea is that there would be a lot of overlap between atlases (same meshes at different resolutions, same reference image for multiple annotations, etc.), and this would reduce download times and save disk space.
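A minimal sketch of that element-level check, assuming a shared local cache keyed by element name (the cache path and `fetch` helper are hypothetical):

```python
from pathlib import Path

CACHE_DIR = Path.home() / ".brainglobe" / "elements"  # hypothetical location

def download_atlas(config, fetch):
    """Download only the atlas elements not already cached locally.

    `fetch` is any callable taking (url, destination_path).
    """
    for element in config["elements"].values():
        local_path = CACHE_DIR / element["name"]
        if local_path.exists():
            continue  # already fetched, possibly for another atlas
        fetch(element["url"], local_path)
```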
This would also allow data to be stored somewhere other than GIN. I'm not sure whether we want to do this, but it may be necessary for e.g. larger atlases (see below).
Improve versioning
Essentially as per #96. We could version the elements individually, and a versioned atlas could specify these, e.g.:
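For instance (version numbers and element names are made up):

```python
# Hypothetical versioned atlas: the atlas version pins specific versions of
# each constituent element.
allen_mouse_25um_v2_1_0 = {
    "name": "allen_mouse_25um",
    "version": "2.1.0",
    "elements": {
        "reference": {"name": "allen_stpt_25um", "version": "1.0.0"},
        "annotation": {"name": "allen_ccf_2017_25um", "version": "1.2.0"},
        "meshes": {"name": "allen_ccf_2017_meshes", "version": "1.2.0"},
    },
}
```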
Improve the atlas generation process
I think the PR to `bg-atlasgen` has worked ok, but the repo itself needs a lot of refactoring to improve it. Submitting a new atlas could become more complex though, if the user is supposed to select which pre-existing atlas elements can be re-used. We end up spending a lot of time on these pull requests, so maybe we could:

Introduce "relationships" between atlases
Lots of atlases are related in some way, e.g.:
It would be useful to introduce two concepts:
Include additional data
There are different types of atlas other than just brain regions (e.g. cell atlases). There is also a lot of publicly available data that is registered to an atlas (e.g. tracing data, gene expression). These data are as much an "atlas" as the brain-region ones. I propose adding additional "elements" to cater for this. These elements could either be added to an existing atlas, or a new atlas could be created (without necessarily any annotation image, just the reference image to define the coordinate space). In some cases, this may mean duplicating some functionality of `morphapi`. There are a lot of questions here about exactly what data to support and how to standardise it.

Questions
Do we want to support data stored elsewhere?
My gut feeling is that in general BrainGlobe should ensure the validity of all atlases. However, for some atlases (e.g. bigger ones) maybe we want to allow hosting of files elsewhere, and maybe mark them with a `community` tag or similar? This could also simplify support for lab/project-specific atlases that we may not want to become "proper" BG atlases.

Should we support lazy loading for large atlases?
Some atlases are becoming very large (e.g. EM). We don't want to re-package these ourselves, and we definitely don't want to download them in full locally. We could support N APIs for lazy loading to support these types of atlases. I assume such atlases will only become more common, but we may not want to support them at all.