Conversation
@stothe2 do you think it makes more sense to have many (>12) separate assemblies (@kohitij-kar suggested this), or to merge those assemblies into one but then have many NaN entries (which can later be filtered out again by splitting into separate assemblies)?
@mschrimpf and @stothe2 — Just to directly register the conversation, I am pasting it below. Also, I don't understand assemblies etc. very well; I don't think of data that way anyway. So I actually never made any specific suggestions, I just explained what I thought was the case with this dataset. All I mean is: please don't weigh my opinion very strongly when making decisions for this. Please do what you think is best here.

Jon: Each monkey looked at a completely different set of synthetic images, is that correct? If I made synthetic into one single DataAssembly, it would be mostly NaNs. So, would it be appropriate to make three separate assemblies? I assume I need to package curvature and naturalistic as one assembly each, too, to give context to the synthetic. So that's five assemblies total. Until now the data I've packaged had already been chopped into tidy rectangular blocks. Where there are differing numbers of reps in a single assembly, do I chop off the excess, or leave some NaNs in?

Ko: Here are my thoughts on the issues you mention.
thanks Ko! This data is actually a really good case for us to more thoroughly define what we mean by an "assembly" -- i.e. whether it is a recording session or rather an entire experiment.
I agree. I will never advocate for an assembly per recording session. That's too much. But, as you will see, beyond that things will quickly get more complicated as we do different types of experiments (which I believe will be a marker of the lab actually making SoTA experimental progress and not just churning out the same old stuff at higher quantity). For now, assemblies per dataset grouped by purpose of collection, e.g. a paper or a specific analysis, sounds reasonable to me.
@jjpr-mit Looking at the code you pushed, Jon, one thing that could be done to reduce the number of assemblies is to combine the sessions (at least). As for the rest of the discussion (single vs. multiple assemblies), I'm ambivalent. I see both as a short-term solution (like Ko said, we need to start thinking about the experiments people will be doing in the next ~2 years -- amygdala, muscimol, stimulation -- and come up with a more intuitive solution for the heterogeneous data).
@stothe2 Yes, I thought about combining each synth/nat pair, since they align on the neuroid axis. I'd have to combine the synth and nat StimulusSets to do that, though, and they've got completely different metadata, so I settled for separating the stages of the experiment.
Fair point. It shouldn't matter for the assembly (since you would only include those columns which are common to both StimulusSets, for example,
The metadata from the StimulusSet are automatically merged with the assembly on load. Trying to combine two StimulusSets with different metadata into one just so we can combine pairs of assemblies introduces complexity, and reducing complexity is the point of combining. So I think it defeats the purpose. Until we upgrade DataAssembly to handle heterogeneous non-aligned datasets, we can't accommodate combining these assemblies.
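For concreteness, here is a minimal sketch of the NaN-padding that combining the two StimulusSets would introduce. It uses plain pandas and made-up column names (StimulusSets are DataFrame-like); the actual BashivanKar2019 metadata columns are not shown in this thread.

```python
import pandas as pd

# Hypothetical metadata for the two stages; only 'image_id' is shared.
naturalistic_meta = pd.DataFrame({
    "image_id": ["nat_1", "nat_2"],
    "category": ["face", "object"],   # column only present in the naturalistic set
})
synthetic_meta = pd.DataFrame({
    "image_id": ["synth_1", "synth_2"],
    "generation_step": [100, 200],    # column only present in the synthetic set
})

# Concatenating keeps the union of columns and pads every non-shared
# column with NaN -- the complexity being argued against above.
combined = pd.concat([naturalistic_meta, synthetic_meta], ignore_index=True)
# combined has NaN in 'category' for synthetic rows and NaN in
# 'generation_step' for naturalistic rows.
```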
@stothe2 @mschrimpf @kohitij-kar https://stackoverflow.com/questions/53754243/xarray-hierarchical-data-organization
also: I don't think we should hold up this PR to refactor BrainIO with a hierarchical data solution.
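For reference, the kind of lightweight hierarchy discussed in threads like the one linked above can be sketched without any BrainIO changes. This is plain xarray plus a dict with hypothetical names, not an existing BrainIO API:

```python
import xarray as xr

# Keep each stage of the experiment as its own DataArray and group them
# in a plain dict (or an xr.Dataset when dimensions line up), instead of
# forcing everything into one NaN-padded array.
experiment = {
    "naturalistic": xr.DataArray([[0.1, 0.2]], dims=("neuroid", "presentation")),
    "synthetic":    xr.DataArray([[0.3]],      dims=("neuroid", "presentation")),
}

# Consumers pick the stage they need without any cross-stage alignment.
synthetic_responses = experiment["synthetic"]
```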
yes, agreed. I thought what @stothe2 proposed was to have one assembly per session (with a bunch of NaNs from non-overlap -- i.e. no refactoring, just NaNs).
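A minimal sketch of that "no refactoring, just NaNs" option, using plain xarray with hypothetical neuroid/presentation labels rather than the actual BashivanKar2019 data:

```python
import numpy as np
import xarray as xr

# Two per-monkey assemblies: each monkey saw a completely different set of
# synthetic images, so the arrays only share the presentation *dimension*,
# not its coordinate values.
monkey1 = xr.DataArray(
    np.random.rand(3, 4), dims=("neuroid", "presentation"),
    coords={"neuroid": ["m1_n1", "m1_n2", "m1_n3"],
            "presentation": ["img_a", "img_b", "img_c", "img_d"]})
monkey2 = xr.DataArray(
    np.random.rand(2, 3), dims=("neuroid", "presentation"),
    coords={"neuroid": ["m2_n1", "m2_n2"],
            "presentation": ["img_e", "img_f", "img_g"]})

# Outer-join concatenation along neuroid keeps the union of presentations;
# entries a monkey never saw are filled with NaN (most of the merged array).
merged = xr.concat([monkey1, monkey2], dim="neuroid")

# Splitting back out later: select one monkey's neuroids and drop the
# presentations that are all-NaN for that monkey.
monkey1_again = (merged.sel(neuroid=["m1_n1", "m1_n2", "m1_n3"])
                       .dropna("presentation", how="all"))
```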
…y two assemblies.
Also fix tests for dicarlo.BashivanKar2019 as only two assemblies

* master:
  Rust305 (#51)
  Fix Kuzovkin 2018 (#50)
  Inplace (#47)
  dicarlo.Seibert2019 (#48)
  add ImageNet stimulus set (#45)
  Created exceptions in fetch and packaging for PropertyAssembly class that do not merge responses with stimulus_set (#42)
  Update lookup.csv (#44)
  Update Rajalingham2020 lookup (#43)
  Update lookup.csv (#41)
  use image_file_name without .png if present (#40)

# Conflicts:
#   tests/test_assemblies.py
#   tests/test_stimuli.py
… Added NaNs to tests.
From what it looks like, there are two NeuronAssemblies (and corresponding StimulusSets) -- one for naturalistic and one for synthetic. This works too!
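For context, loading the two packaged assemblies would then look roughly like the following. `brainio_collection.get_assembly` is the usual lookup entry point, but the exact identifiers below are an assumption, not taken from this thread:

```python
import brainio_collection

# Assumed identifiers for the two assemblies registered in lookup.csv
# (the ".naturalistic"/".synthetic" suffixes are hypothetical).
naturalistic = brainio_collection.get_assembly("dicarlo.BashivanKar2019.naturalistic")
synthetic = brainio_collection.get_assembly("dicarlo.BashivanKar2019.synthetic")

# Each assembly has its own StimulusSet metadata merged on load,
# so neither stage needs NaN padding from the other.
print(naturalistic.dims, synthetic.dims)
```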