This repository has been archived by the owner on Oct 25, 2018. It is now read-only.

VirtualBox list #15

Open
24 of 28 tasks
satra opened this issue Jun 5, 2018 · 56 comments

Comments

@satra
Contributor

satra commented Jun 5, 2018

@mjtravers - replacing issue #12 - just check off all the things you already have.

Core VM

  • memory 2G (needs to be tested)
  • cores 2
  • xenial LTS desktop
    • neurodebian sources
  • autologin full-screen
  • latest guest additions

general (see each section for additional local installs, listed under install)

  • git
  • git-annex-standalone
  • editors: vim emacs pico nano gedit
  • docker 18.03.1-ce
  • Singularity 2.5.1-dist
  • build-essential
  • wget curl

FAIR Data - BIDS datasets

Computational basis

Neuroimaging Workflows

  • containers
    • neurodocker (docker pull kaczmarj/neurodocker:master) - will ask @kaczmarj to cut a new release
  • fsl heudiconv (build using instructions below)
  • Reprozip/Reprounzip (installed via miniconda above into environment: section3)

Statistics for reproducibility

  • requirements installed into separate environment (environment: section4)

Others

  • Vagrantfile
  • All the neurodocker scripts for containers
  • niceman into containers?? (@yarikoptic @mjtravers - should this be done?)
  • instructions for building the singularity containers (testing these now - will add soon)
@djarecka
Member

djarecka commented Jun 5, 2018

@satra - where are the instructions for the fsl & heudiconv container? I was thinking that building an image might be nice (and not only creating a Dockerfile), but I can "cheat" and use existing layers.

@satra
Contributor Author

satra commented Jun 5, 2018

@djarecka - here you go:

# section 1
docker run --rm kaczmarj/neurodocker:master generate singularity \
  --base neurodebian:latest --pkg-manager apt \
  --install graphviz git wget \
  --miniconda \
    conda_install="python=3 pytest graphviz pip reprozip reprounzip \
       requests rdflib fuzzywuzzy python-levenshtein pygithub pandas" \
    pip_install="owlready2 pybids duecredit \
     https://github.com/incf-nidash/PyNIDM/archive/a90b3f47dbdafb9504f13a3a8d85fdff931cc45c.zip" \
    create_env="section1" \
    activate=true \
  --run-bash "cd /opt && \
    git clone https://github.com/incf-nidash/PyNIDM.git" > Singularity

# section 2/3
docker run --rm kaczmarj/neurodocker:master generate singularity  \
  --base neurodebian:stretch-non-free   --pkg-manager apt   \
  --install fsl-5.0-core fsl-mni152-templates \
  --install make gcc sqlite3 libsqlite3-dev python3-dev \
    libc6-dev python3-pip python3-setuptools python3-wheel \
  --run "pip3 install --system reprozip reprounzip" \
  --add-to-entrypoint "source /etc/fsl/5.0/fsl.sh" > Singularity


docker run --rm kaczmarj/neurodocker:master generate singularity \
  --base neurodebian:latest --pkg-manager apt \
  --install pigz python3-pip python3-traits python3-scipy  \
     python3-setuptools python3-wheel python3-networkx dcm2niix \
  --install make gcc sqlite3 libsqlite3-dev python3-dev libc6-dev \
  --run "pip3 install --system nipype \
    https://github.com/mvdoc/dcmstack/archive/bf/importsys.zip \
    https://github.com/nipy/heudiconv/archive/master.zip \
    reprozip reprounzip" > Singularity

@djarecka
Member

djarecka commented Jun 5, 2018

I would add one Docker image to compare and use in the container lesson, e.g. the second image, so the neurodocker command is:

docker run --rm kaczmarj/neurodocker:master generate docker  \
  --base neurodebian:stretch-non-free   --pkg-manager apt   \
  --install fsl-5.0-core fsl-mni152-templates \
  --install make gcc sqlite3 libsqlite3-dev python3-dev \
    libc6-dev python3-pip python3-setuptools python3-wheel \
  --run "pip3 install --system reprozip reprounzip" \
  --add-to-entrypoint "source /etc/fsl/5.0/fsl.sh" > Dockerfile

And one T1w image would be great, e.g. ds000114/sub-01/ses-test/anat/sub-01_ses-test_T1w.nii.gz, but it could really be any image; I just want to use it as an example for the bet command.

@kaczmarj

kaczmarj commented Jun 6, 2018

i will be cutting a new neurodocker release this week after i add more examples.

by the way, in general i recommend running neurodocker with docker without -i/--interactive or -t/--tty. i run it with docker run --rm kaczmarj/neurodocker:master ....
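
One practical reason to leave out -t when the output is redirected to a file: with a TTY allocated, docker typically emits CRLF line endings, and those can end up in the generated Dockerfile or Singularity recipe. A minimal sketch of a check for that, where has_crlf is a hypothetical helper name and not part of neurodocker:

```shell
# Hypothetical helper (not part of neurodocker): succeed if the given file
# contains carriage returns, e.g. because docker was run with -t.
has_crlf() {
  cr=$(printf '\r')        # a literal carriage-return character
  grep -q "$cr" "$1"       # exit status 0 if any line contains it
}

# Usage, assuming the recipe was written to ./Dockerfile:
#   has_crlf Dockerfile && echo "CRLF found: regenerate without -t"
```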

@kaczmarj

kaczmarj commented Jun 6, 2018

another minor point, pre-compiled reprozip wheels can be installed with pip now (see VIDA-NYU/reprozip#224).

@satra
Contributor Author

satra commented Jun 6, 2018

@kaczmarj - it did not work day before yesterday when i tried. pip complained about compiling which is why a bunch of those additional dependencies were added.

@mjtravers, @yarikoptic - any chance you can take a look at this issue today? it would be good to cut a VM today or tomorrow if possible to have people play with it before we ask students to download.

@satra
Contributor Author

satra commented Jun 6, 2018

@kaczmarj - i've updated the commands above without the -i (@djarecka - it may be useful to add the utility of -i and -t for docker in additional slides).

@mjtravers
Contributor

I am running a VM build now that incorporates the section 1, 3, and 4 instructions above and from Al.
Will send out a notice when it is posted for downloading and review.

@mjtravers
Contributor

I have posted an updated VM:
https://training.repronim.org/repronim-training-v0.2.ova

There are 2 conda environments set up named: "section1" and "section4":

  • section1 has the python env specified by Al's instructions
  • section4 has the python env specified by JB's requirement.txt file (with a couple of version tweaks and minus the rpy2 package... was it decided to install R or is that still TBD?)

For section3, the kaczmarj/neurodocker:master image has been pulled into Docker and the following files are in the home directory:

  • Singularity.fsl
  • Singularity.heudiconv
  • Singularity.PyNIDM
  • Dockerfile.fsl
  • neurodocker.sh (Contains the scripts posted above to create the Singularity and Dockerfile files)

@djarecka
Member

djarecka commented Jun 6, 2018

Thanks @mjtravers. I could have misunderstood @satra, but I thought that we would include the singularity/docker images inside, so people don't spend time building them

@djarecka
Member

djarecka commented Jun 6, 2018

[screenshot: screen shot 2018-06-06 at 13 13 25]

@djarecka
Member

djarecka commented Jun 6, 2018

@mjtravers - I'll wait for the next version of VM and will test the new conda environments

@mjtravers
Contributor

The next version will be out later tonight. I discovered a couple of issues after I ran the build. The build is also taking a bit longer. I will message when ready

@mjtravers
Contributor

The updated VM is now available. I believe this version has everything for sections 1, 3, and 4.
Section 2 is still in the works.

Download: https://training.repronim.org/repronim-training.ova
The VM size is now ~10GB

@satra
Contributor Author

satra commented Jun 7, 2018

@mjtravers - that seems large for what it contains. i'll see if i can download and check it out.

for conda are you clearing out the environments post install?
example: https://github.com/kaczmarj/neurodocker/blob/master/examples/nipype_tutorial/Dockerfile#L117

also for any apt-get are you using eatmydata or some such?
example: https://github.com/kaczmarj/neurodocker/blob/master/examples/nipype_tutorial/Dockerfile#L26

also is the vagrant file somewhere? it may be slightly easier for me to build it than download it on my flight :)

@mjtravers
Contributor

@satra No, I'm not doing any slimming of the file so I am sure there are a few GBs we could shave off.
I am using packer and building off a baseline Ubuntu Desktop ova. I can post the files to this git repo and the baseline ova to the training.repronim.org. Let me set it up

@djarecka
Member

djarecka commented Jun 7, 2018

don't know if that helps, but just checked the size of the environments:

(section4) vagrant@nitrcce:~$ du -hs /home/vagrant/miniconda2/envs/*
361M	/home/vagrant/miniconda2/envs/section1
65M	/home/vagrant/miniconda2/envs/section3
1.3G	/home/vagrant/miniconda2/envs/section4

so together around 1.7
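
du can also print the grand total itself with -c, which avoids adding the numbers by hand. A small sketch; env_sizes is a hypothetical wrapper, and the path in the usage line is the one from the output above:

```shell
# Hypothetical wrapper: per-environment sizes plus a grand total line.
env_sizes() {
  # -s: one line per argument, -h: human-readable sizes, -c: append a "total" line
  du -shc "$1"/*
}

# Usage on the VM:
#   env_sizes /home/vagrant/miniconda2/envs
```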

@satra
Contributor Author

satra commented Jun 7, 2018

as soon as @jbpoline shares the notebooks, we can bring down the size of section 4. i'm sure all he needs is jupyter pandas seaborn scipy (and their dependencies) and maybe statsmodels :)

let me know when the packer file is available. i'll try to build it on our cluster remotely. it's going to take the rest of my flight to download that ova!

@kaczmarj

kaczmarj commented Jun 7, 2018

you can also save about 200 MB by installing jupyter-notebook as notebook instead of the entire jupyter package. and if you need jupyterlab, you can install that from conda-forge as jupyterlab.

the jupyter package installs big dependencies like qt5, which probably are not necessary for the vm.

@djarecka
Member

djarecka commented Jun 7, 2018

I've tested the section1 by running:

cd workspace/Indiv_Diffs_ReadingSkill/
~/PyNIDM/bin/BIDSMRI2NIDM.py -d ~/workspace/Indiv_Diffs_ReadingSkill
cd ~/nidm-training/
python rdf-age-query.py -nidm ~/workspace/Indiv_Diffs_ReadingSkill/nidm.ttl

and I got:

sub-12 - 2.096 - http://purl.org/nidash/nidm#_998c5d57-6a64-11e8-9b22-080027d6419f
sub-14 - 3.176 - http://purl.org/nidash/nidm#_998c5d5f-6a64-11e8-9b22-080027d6419f
sub-01 - 1.726 - http://purl.org/nidash/nidm#_998c5d2b-6a64-11e8-9b22-080027d6419f
sub-21 - -2.364 - http://purl.org/nidash/nidm#_998c5d7b-6a64-11e8-9b22-080027d6419f
...

So I didn't get any error, but I'm not sure if this is the proper output; it should be "subject IDs, age of each subject, and the assessment ID" (the age looks pretty low to me for reading skills, but I didn't read anything about the experiment)

For the section 4, I only tested a few things: importing pandas, numpy, opening jupyter notebook and lab. It seems to be working fine now.

@mjtravers
Contributor

I have a pull request in containing the VM build scripts.

I will take a look at those tutorials later today and see how much we can slim down the file.

There is a cleanup.sh script in place to add size-reducing code. Now that the code is out there, feel free to edit.
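
For reference, the kind of size-reducing steps discussed above (clearing the conda and apt caches) could look like the sketch below. This is not the actual cleanup.sh from the pull request; the DRY_RUN switch and the run helper are assumptions added so the commands can be previewed without touching the system.

```shell
#!/bin/sh
# Sketch of a size-reducing cleanup step; NOT the real cleanup.sh.
# DRY_RUN=1 (the default here) only prints the commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

cleanup() {
  run conda clean --all --yes                   # drop conda package caches and tarballs
  run apt-get clean                             # drop downloaded .deb archives
  run sh -c 'rm -rf /var/lib/apt/lists/*'       # drop apt package index files
}

cleanup
```

Running it with DRY_RUN=0 (as root, inside the VM build) would execute the three commands for real.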

@dbkeator

dbkeator commented Jun 7, 2018

@djarecka: My last pull request for PyNIDM had some changes to BIDSMRI2NIDM but I didn't explicitly copy the tool from its development location (https://github.com/incf-nidash/PyNIDM/tree/master/nidm/experiment/tools) to the bin folder so those copies may be out of date.

I'll take a look...

@djarecka
Member

djarecka commented Jun 7, 2018

@dbkeator - so the output should be different?

@jgrethe
Contributor

jgrethe commented Jun 7, 2018 via email

@dbkeator

dbkeator commented Jun 7, 2018

@djarecka @mjtravers
Hi Folks, so I tried to replicate what Dorota did with the latest OVA:

   cd workspace/Indiv_Diffs_ReadingSkill/
   ~/PyNIDM/bin/BIDSMRI2NIDM.py -d ~/workspace/Indiv_Diffs_ReadingSkill
   cd ~/nidm-training/
   python rdf-age-query.py -nidm ~/workspace/Indiv_Diffs_ReadingSkill/nidm.ttl

In the OVA, first, I received an error that PyNIDM wasn't installed. So I issued the following command:
python ~/PyNIDM/setup.py install

Then I received an error that pybids wasn't installed. So, I issued the following command:
cd ~
git clone https://github.com/INCF/pybids.git
cd pybids
python setup.py install

Then, I received an error that urllib.parse doesn't have a module named quote. This was a curious error because urllib comes with python....which led me to the biggest problem: We have installed python2.7 via miniconda. PyNIDM was written for python 3.x and thus the current problem with the urllib package but likely many other downstream problems.

So, I couldn't test the things Dorota did.

@djarecka How did you test section1 with the current OVA file given the python 2.7 install?

Thanks!

@djarecka
Member

djarecka commented Jun 7, 2018

@dbkeator - I did everything using the conda environment created by Matt, so first thing I did was source activate section1. Sorry, should have included it in my post.

That is the environment that is purely for this part, so should have everything, and if not @mjtravers should know.
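
A quick guard against the Python 2.7 mix-up described above is to check the interpreter before running the PyNIDM tools. A minimal sketch; require_python3 is a hypothetical helper, and the usage lines assume the section1 environment exists as described:

```shell
# Hypothetical helper: succeed only if the given interpreter (default: python)
# reports major version 3.
require_python3() {
  interp="${1:-python}"
  major=$("$interp" -c 'import sys; print(sys.version_info[0])') || return 1
  [ "$major" = "3" ]
}

# Usage:
#   source activate section1
#   require_python3 || echo "python is not Python 3; activate section1 first"
```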

@djarecka
Member

djarecka commented Jun 7, 2018

@jgrethe - thank you for the confirmation!

@dbkeator

dbkeator commented Jun 7, 2018

@djarecka Got it, that worked. Appears the query is also working. I didn't realize the ages were funky....

@yarikoptic
Member

BTW one personally annoying issue for me is that the Win+number keys are configured as shortcuts to various heavy applications such as libreoffice

@satra
Contributor Author

satra commented Jun 8, 2018

i'm having some trouble importing the ova on our older virtualbox on our cluster. can someone verify the md5sum below?

$ md5sum repronim-training.ova 
bad86aace872ed38c46f9e30c9d86d62  repronim-training.ova

(i can't test on osx as it will take me 4 days to download under my current connection)
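
For anyone checking their own download, md5sum -c can do the comparison instead of eyeballing the hex string. A sketch assuming GNU md5sum; verify_md5 is a hypothetical helper, and the checksum in the usage line is the one quoted above:

```shell
# Hypothetical helper: compare a file against an expected MD5 checksum.
# Exit status is 0 on a match and non-zero otherwise.
verify_md5() {
  file="$1"
  expected="$2"
  # md5sum -c reads "<checksum>  <filename>" lines (two spaces) from stdin;
  # --quiet suppresses the "OK" line for files that match
  printf '%s  %s\n' "$expected" "$file" | md5sum -c --quiet -
}

# Usage:
#   verify_md5 repronim-training.ova bad86aace872ed38c46f9e30c9d86d62
```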

@satra
Contributor Author

satra commented Jun 8, 2018

@djarecka verified that the above md5sum is correct.

@satra
Contributor Author

satra commented Jun 8, 2018

and in case others have the same import appliance issue on some flavor of linux/virtualbox combo, this link helps solve it: http://installfights.blogspot.com/2018/05/how-to-fix-virtualbox-error-when-you.html

@jbpoline
Contributor

jbpoline commented Jun 9, 2018

@satra @mjtravers @kaczmarj
yes - absolutely - there is way too much stuff there - working on this right now and should have an update on what exactly is needed (Satra's list sounds right). @cmtgreenwood has uploaded two R scripts that we should test in the VM as well

@djarecka
Member

djarecka commented Jun 9, 2018

@jbpoline - i'm not really an R user, so I might be doing something wrong, but I tried to open the script multiTesting.Rmd in R studio and run it, and it returns errors starting with: there is no package called ‘knitr’. I believe @mjtravers didn't have the list of required R packages.

@jbpoline
Contributor

jbpoline commented Jun 9, 2018

@djarecka you are right : I think we need the R libraries
library(knitr)
library(rmarkdown)
library(mvtnorm)
library(ggplot2)

@cmtgreenwood can you confirm? I suppose we can always extract the R code from the Rmd files, which should run on the VM - but we do need mvtnorm and ggplot2, right?
@mjtravers: would it be hard to include these R libraries in the VM?

@djarecka
Member

@jbpoline - my understanding was that r-studio (which is installed) can open Rmd and can run the specific script cells. This is what I tried and that's how I got the package errors.

@jbpoline
Contributor

hum - I am no R person but it looks like we need these R "libraries" (equivalent of python packages)

@djarecka
Member

@jbpoline yes, i only tried to say that we don't need to "extract the R code from Rmd".

@mjtravers
Contributor

@djarecka @jbpoline I am able to load those R packages onto the system and have them added to the VM build process. Kinda sure I am loading them right.

I have a build going right now that will include the above R packages in r-base... plus the section 3 stuff for Yarik.

The build will be done and posted for download later this evening.

@mjtravers
Contributor

@satra .... plus I added the clean up for apt and conda referred to above. Will see if it reduces the VM size any with the in-progress build

@satra
Contributor Author

satra commented Jun 10, 2018

@mjtravers - thank you - let's see what this does. it would be nice if packer did some kind of pre-post step assessment of size. may give us an indication of where things are piling up. my internal calculations indicate this VM should not exceed much more than 5-6G as an ova.

my wifi here is quite insufficient, so i'm trying to figure out how to run things remotely on our cluster.

@mjtravers
Contributor

@satra Success on compacting the OVA. Your calculations are correct, the size of the file is 5.25GB.

I have posted it to the training website. Note the new name (resulting from needing to clone the original OVA file as part of the compaction process):

https://training.repronim.org/reprotraining.ova

This file has the R libraries and section 2 python packages included.

This file does not have the section 3 edits posted this morning. I have pulled those changes and they'll go in the next build.

@djarecka
Member

Thanks @mjtravers! This image is still not expected to have datalad inside, is that right?

@mjtravers
Contributor

mjtravers commented Jun 10, 2018

@djarecka Datalad is installed in conda env section2

version 0.10.0-rc5

@djarecka
Member

djarecka commented Jun 10, 2018

@jbpoline @cmtgreenwood
I opened again rstudio and tried to run scripts.
multiTesting.Rmd returns error since it has some path set to C:/CeliaFiles....
The type1 script didn't return any errors, but I did not even try to validate if the output/plots are good.

@kaczmarj

@mjtravers @djarecka @satra - i released neurodocker version 0.4.0.

docker pull kaczmarj/neurodocker:0.4.0

@cmtgreenwood
Collaborator

yes knitr and markdown packages are needed to assemble a nice report.
However the R script parts could be run without this so may be much easier if I rewrite the scripts to be plain text.

So I will fix the path information probably tomorrow, and upload another version.

I will create plain text versions (i.e. *.R files) at the same time that do not need knitr and markdown

@jbpoline
Contributor

@cmtgreenwood Celia: I moved your scripts into section4/section41 and my notebook in section4/section42
I think Matt has included Rstudio and the libraries needed, so I'm not sure we need to extract the R code, but I haven't checked yet!

@mjtravers
Contributor

mjtravers commented Jun 12, 2018

@jbpoline @cmtgreenwood
Yes, the following R packages were installed on the VM:

  • knitr
  • rmarkdown
  • mvtnorm
  • ggplot2

... and R-Studio

@jbpoline
Contributor

@mjtravers Awesome - thanks !
@cmtgreenwood : let us know if that is fine - best would be to try on the VM - will try on my side

@cmtgreenwood
Collaborator

managed to launch Rstudio successfully inside the updated VM. Looking for where the scripts are located in the VM. Will be back on this only tomorrow morning

@djarecka
Member

djarecka commented Jun 12, 2018

@cmtgreenwood for the scripts you have to clone this github repository. They are not part of the VM (but during training it will be cloned during git part)

after you clone the repository, your scripts are in ohbm2018-training/section4/section41

@cmtgreenwood
Collaborator

Good news and bad news:

  • successful clone onto my virtual box of ohbm-training
  • rstudio starts fine
  • sections of the code (run separately) inside the Rmd file run fine, with one small change needed to the code (the path to the dataset needs modification)

However, knitr fails. It requested updates of two packages (catools, bitops) and apparently installed them OK, but then I received this message:

Error in yaml::yaml.load(string, ...) :
Parser error: while parsing a block mapping at line 1, column 1 did not find expected key at line 1, column 45
Calls: ... parse_yaml_front_matter -> yaml_load_utf8 -> -> .Call
Execution halted

This is not a major problem, since knitr just compiles everything into one file. Even without this compilation, it is possible to see the results.
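
The traceback goes through parse_yaml_front_matter, so the parser is choking on the Rmd header (the block between the first two --- lines). Printing just that header is a quick way to look at line 1, column 45. A sketch, assuming the script starts with a standard front-matter block; front_matter is a hypothetical helper and the actual cause is not confirmed here:

```shell
# Hypothetical helper: print the YAML front matter of an Rmd file, i.e. the
# lines between the first and second '---' delimiter lines.
front_matter() {
  awk '/^---[[:space:]]*$/ { n++; next }   # count delimiter lines, do not print them
       n == 1 { print }                    # inside the header: print the line
       n >= 2 { exit }                     # past the closing delimiter: stop
      ' "$1"
}

# Usage (path is a placeholder):
#   front_matter path/to/script.Rmd
```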

@jbpoline
Contributor

Ok - so - room for improvement, but in the worst case we should survive :) looks like a knitr "front" file is missing or something ... maybe try re-installing knitr in the VM?

@mjtravers
Contributor

@cmtgreenwood If you send me the script that is having the issue, I can see if I can sort out a fix.

9 participants