This repository has been archived by the owner on Oct 25, 2018. It is now read-only.

Rest of my plan #5

Closed
6 of 7 tasks
mih opened this issue May 20, 2018 · 9 comments

Comments

@mih
Contributor

mih commented May 20, 2018

Just FYI, stop me if this is all wrong.

  • create basic analysis script to use as payload for showing reproducibility
  • create analysis container image (Singularity recipe in this repo) with everything needed to go from DICOMs to results (potentially split into multiple images: one for DICOM->BIDS, one for BIDS->results):
    • heudiconv + dcm2niix
    • FSL (something free would be better, but somebody else would need to draft the script for that)
  • new demo script with a containerized data processing workflow
  • create "workspace" image with just enough software to be able to conduct an analysis using the analysis image (this could be the datalad-core singularity image that already has the needed pieces, but is presently lacking datalad-containers)
  • write the whole thing up as documentation
    • link to OHBM posters
      • 2046: which has everything about conducting analyses this way
      • possibly 2031: for creating/describing environments
      • 2000: all about ReproIn
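For the DICOM->BIDS analysis image above, a minimal Singularity recipe might look like the following. This is only a sketch; the base image and package choices are assumptions, not what this repo actually ships:

```
Bootstrap: docker
From: neurodebian:stretch

%post
    # dcm2niix for DICOM conversion, pip to pull in heudiconv
    apt-get update && apt-get install -y dcm2niix python-pip
    pip install heudiconv

%runscript
    # make the container act as a heudiconv executable
    exec heudiconv "$@"
```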

Stuff is coming in via #4

@satra
Contributor

satra commented May 20, 2018

perhaps we can chat about this early this week. specifically with respect to the examples in the following plan:

Introduction to reproducible neuroimaging: motivations
David Kennedy, University of Massachusetts, United States
8:30-10:00
FAIR Data - BIDS datasets
 Jeffrey Grethe [presenting] and Maryann Martone, UCSD, United States

talk 1: Intro to FAIR
exercise: 16 attributes of FAIR - e.g. Is there a clear license, what is a PID, What is meant by metadata, … 
link attributes for 2 modules below 
talk 2: Standardization and BIDS
exercise: dicom to BIDS conversion exercise: basic conversion (tie in w/ ReproIn in next section)
talk 3: FAIR Metadata: searching and using Scicrunch
exercise: BIDS metadata - participants.tsv and semantic annotation
talk 4: Brief Intro to NIDM
exercise: NIDM conversion tool to create sidecar file

10:00-10:15 coffee break
10:15-11:45
Computational basis
Yaroslav Halchenko, Dartmouth College, United States and Michael Hanke, Magdeburg Germany
talk 1: ReproIn : More on this?
Exercise: 
talk 2: Git/GitAnnex/DataLad: 
Exercise: 
talk 3: Everything Else 
Exercise: 
12:00-13:00  Lunch

13:00-14:30  Neuroimaging Workflows
Dorota Jarecka and Satrajit Ghosh, MIT, United States, Camille Maumet, INRIA, France 
talk 1: ReproFlow: Reusable scripts and environments, PROV
Exercise: Run, rinse, and repeat
talk 2: ReproEnv: Virtual machines/Containers, ReproPaper, NIDM components
Exercise: Create different environments 
[talk 3: ReproTest: Variability sources (analysis models, operating systems, software versions)]
Exercise: Run analysis with different environments
14:30-14:45  Break

14:45-16:00  Statistics for reproducibility
Celia Greenwood, McGill University, Canada and Jean-Baptiste Poline, McGill University, Canada
Assumes we have a csv file with say 100 subjects and columns like: “age, sex, pheno1, pheno2… “
talk 1: evil p-value : what they are - and are not
Exercise: test with   
talk 2: 
Exercise: 
talk 3: 
Exercise: 
16:00-16:30  Conclusion & Getting Feedback
Nina Preuss, Preuss Enterprises, United States

@satra
Contributor

satra commented May 20, 2018

i think we need to clarify and enhance each exercise within the next week or two, and have multiple people go through the exercises well before the session.

@satra
Contributor

satra commented May 20, 2018

with respect to images, perhaps we can do either:

a. several small images for each task (the granularity of the task can be established separately)
b. one single image for everything

(a) is my current preference, since it associates a small reusable component with each task and allows easier maintenance of the images as software pieces change.

@mih
Contributor Author

mih commented May 20, 2018

@satra I was aiming to fill the void of Yarik's and my exercises first.

re images: I am going for small ones (i.e. A). I see no advantage of B.

@satra
Contributor

satra commented May 20, 2018

@mih - sounds good to me. i think there is some amount of redoing across exercises. just wanted to make sure we have a coherent picture.

we will try to finish the exercises for section 3 this coming week together with the talk outlines.

@djarecka
Member

I see one disadvantage of A - we might end up with people running multiple containers at the same time and executing things in the wrong one.

@mih
Contributor Author

mih commented May 21, 2018

@djarecka If you take a look at the latest demo script, you can see how much container selection people would have to do in the datalad world:

https://github.com/mih/ohbm2018-training/blob/master/fsl_glm_w_amazing_datalad.sh

Pretty much none. One step, one dataset, one container.
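The datalad-containers workflow keeps the selection implicit by binding each step to a named container. A hypothetical sketch (the container name, image URL, and file paths are made up for illustration, not taken from the demo script):

```
# register a container image in the dataset under a name
datalad containers-add fsl --url shub://some-org/some-image:fsl

# run one step; datalad records the command, the container used,
# and the declared inputs/outputs in the dataset history
datalad containers-run -n fsl \
    --input sub-01/func/bold.nii.gz --output results \
    mcflirt -in sub-01/func/bold.nii.gz -out results/bold_mc
```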

@djarecka
Member

@mih - ok, I didn't realize that people will run one script with docker run inside. I will read carefully and test it this week!

@mih
Contributor Author

mih commented Jun 8, 2018

@mih mih closed this as completed Jun 8, 2018
satra added a commit that referenced this issue Jun 10, 2018
Accumulator for development outlined in #5
yarikoptic pushed a commit that referenced this issue Jun 17, 2018
'bids' (as a dataset name) -> localizer_scans