Skip to content
This repository has been archived by the owner on Jul 15, 2021. It is now read-only.

dicarlo.BashivanKar2019 #39

Merged
merged 5 commits into from
Feb 4, 2021
Merged

dicarlo.BashivanKar2019 #39

merged 5 commits into from
Feb 4, 2021

Conversation

jjpr-mit
Copy link
Contributor

No description provided.

@jjpr-mit jjpr-mit requested a review from mschrimpf August 24, 2020 06:01
@mschrimpf mschrimpf requested a review from stothe2 August 24, 2020 13:35
@mschrimpf
Copy link
Member

@stothe2 do you think it makes more sense to have many (>12) separate assemblies (@kohitij-kar suggested this) or merge those assemblies into one, but then having many NaN entries (that can again be filtered out by later splitting into separate assemblies)?
My intuition is that we should keep an assembly more self-contained, i.e. have one or two assemblies for this paper, so that we don't flood the assembly lookup.

@kohitij-kar
Copy link

kohitij-kar commented Aug 24, 2020

@mschrimpf and @stothe2 — Just to directly register the conversation, I am pasting it below. Also, I don’t understand assemblies etc very well. I don’t think of data that way anyways. So I actually never made any specific suggestions: just explained what I thought is the case with this dataset. All I mean is please don’t consider my opinion very strongly during making decisions for this. Please do what you think is best here.

Jon: Each monkey looked a a completely different set of synthetic images, is that correct? If I made synthetic into one single DataAssembly, it would be mostly NANs. So, would it be appropriate to make three separate assemblies?

I assume I need to package curvature and naturalistic as one assembly each, too, to give context to the synthetic. So that's five assemblies total.

Until now the data I've packaged had already been chopped into tidy rectangular blocks. Where there are differing numbers of reps in a single assembly, do I chop off the excess, or leave some NANs in?

Ko: Here are my thoughts on the issues you mention.

  1. Yes I think it’s best to treat each monkey, each session as separate.
  2. Naturalistic images are same for both monkeys. But the curvature stimulus was different dependent on what receptive field we had for the neurons. I would either leave that out for now (and push this to later discussions) or include them as separate for each monkey.
  3. I would use nans and not chop off any data
    The problem may be that assemblies etc had been discussed so far with our usual data collection in mind. This was a very different type of experiment. So the data that exists is not completely consistent with everything else (yet I believe there is a way to make it such).

@mschrimpf
Copy link
Member

thanks Ko! This data is actually a really good case for us to more thoroughly define what we mean by an "assembly" -- i.e. whether it is a recording session or rather an entire experiment.
I'm leaning towards the latter because I think that users will mostly want to retrieve data for an experiment rather than for individual sessions. (And having all the data for a paper in one package feels more clean)

@kohitij-kar
Copy link

I agree. I will never advocate for an assembly per recording session. That’s too much. But I think as you will see that beyond that — things will quickly get more complicated as we do different types of experiments (which I believe will be a marker of the lab actually making SoTA experimental progress and not just churning out the same old stuff at more quantity). For now, assemblies per dataset grouped by purpose of collection, e.g paper or specific analysis, sounds reasonable to me.

@stothe2
Copy link
Collaborator

stothe2 commented Aug 24, 2020

@jjpr-mit Looking at the code you pushed, Jon, one thing that could be done to reduce the number of assemblies is to combine the sessions (at least). So, Monkey M/OHP/Session1/Nat - Monkey M/OHP/Session1/Synth (since the naturalistic stimuli in a given monkey and session were used for synthesis for same monkey and session), etc.

As for rest of the discussion (single vs multiple assemblies), I'm ambivalent. I see both as a short term solution (like Ko said, we need to start thinking of experiments people will be doing in the next ~2 years -- amygdala, muscimol, stimulation -- and come up with a more intuitive solution for the heterogeneous data).

@jjpr-mit
Copy link
Contributor Author

So, Monkey M/OHP/Session1/Nat - Monkey M/OHP/Session1/Synth

@stothe2 Yes, I thought about combining each synth/nat pair, since they align on the neuroid axis. I'd have to combine the synth and nat StimulusSets to do that, though, and they've got completely different metadata, so I settled for separating the stages of the experiment.

@stothe2
Copy link
Collaborator

stothe2 commented Aug 24, 2020

I'd have to combine the synth and nat StimulusSets to do that, though, and they've got completely different metadata

Fair point. It shouldn't matter for the assembly (since you would only include those columns which are common to both StimulusSets, for example, image_id), but I guess you're now dealing with NaNs in the StimulusSet (like dicarlo.BOLD5000). Even so, I would prefer having them combined since it makes it obvious to the user that they're meant to go together (i.e., we used responses to these naturalistic images for mapping, and then used the resulting mapped model for synthesising these other images).

@jjpr-mit
Copy link
Contributor Author

jjpr-mit commented Sep 2, 2020

only include those columns which are common to both StimulusSets

The metadata from the StimulusSet are automatically merged with the assembly on load.

Trying to combine two StimulusSets with different metadata into one just so we can combine pairs of assemblies introduces complexity, and reducing complexity is the point of combining. So I think it defeats the purpose. Until we upgrade DataAssembly to handle heterogeneous non-aligned datasets, we can't accommodate combining these assemblies.

@jjpr-mit
Copy link
Contributor Author

jjpr-mit commented Sep 2, 2020

@stothe2 @mschrimpf @kohitij-kar
I'm trying to figure out the right way to collect non-aligned chunks of data together. The authors of xarray say that DataArray and Dataset are not intended for that:

https://stackoverflow.com/questions/53754243/xarray-hierarchical-data-organization

Finally, note that xarray (currently) does not have any version of a hierarchical data structure for non-aligned axes. Aligned axes are an intentional constraint of the data models for xarray.Dataset and xarray.DataArray. If you have sub-groups that are not aligned along common axes, you'll need to keep track of them in some separate data structure.
answered Dec 18 '18 at 18:52 shoyer

also:

I don't think we should hold up this PR to refactor BrainIO with a hierarchical data solution.

@mschrimpf
Copy link
Member

yes agree, I thought what @stothe2 proposed was to have one assembly per session (with a bunch of NaNs from non-overlap -- i.e. no refactoring, just nans)

@jjpr-mit
Copy link
Contributor Author

The shapes of the source data:

image

@jjpr-mit
Copy link
Contributor Author

How synthetic sessions will be arranged as a single assembly (black is NaNs):

image

The image is reduced to a single repetition from each session for clarity.

Also fix tests for dicarlo.BashivanKar2019 as only two assemblies

* master:
  Rust305 (#51)
  Fix Kuzovkin 2018 (#50)
  Inplace (#47)
  dicarlo.Seibert2019 (#48)
  add ImageNet stimulus set (#45)
  Created exceptions in fetch and packaging for PropertyAssembly class that do not merge responses with stimulus_set (#42)
  Update lookup.csv (#44)
  Update Rajalingham2020 lookup (#43)
  Update lookup.csv (#41)
  use image_file_name without .png if present (#40)

# Conflicts:
#	tests/test_assemblies.py
#	tests/test_stimuli.py
@stothe2
Copy link
Collaborator

stothe2 commented Jan 28, 2021

yes agree, I thought what @stothe2 proposed was to have one assembly per session (with a bunch of NaNs from non-overlap -- i.e. no refactoring, just nans)

From what it looks like, there are two NeuronAssemblies (and corresponding StimulusSets) -- one for naturalistic and one for synthetic. This works too!

@jjpr-mit jjpr-mit requested review from stothe2 and removed request for mschrimpf and stothe2 February 3, 2021 22:48
@jjpr-mit jjpr-mit merged commit ff93098 into master Feb 4, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants