Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add NGSPlotDB-hg19-data recipe #9157

Merged
merged 31 commits into from
Jun 11, 2018
Merged

Add NGSPlotDB-hg19-data recipe #9157

merged 31 commits into from
Jun 11, 2018

Conversation

apeltzer
Copy link
Member

@apeltzer apeltzer commented Jun 5, 2018

  • I have read the guidelines for bioconda recipes.
  • This PR adds a new recipe.
  • AFAIK, this recipe is directly relevant to the biological sciences (otherwise, please submit to the more general purpose conda-forge channel).
  • This PR updates an existing recipe.
  • This PR does something else (explain below).

The GDrive URL is somewhat a bit crazy - submitted a PR to get this to Galaxy Cargo-Port too. Maybe that eases things...

Referencing nf-core/chipseq#2
and nf-core/chipseq#15 to keep things related.

If this one is fine, I'm trying to do the same for hg38 and mm10 to cover the most used DBs for ngsplot.

@apeltzer
Copy link
Member Author

apeltzer commented Jun 6, 2018

@bioconda/core The r-ngsplot package seems to be missing for this? I submitted that yesterday, @bgruening merged it in - so it should be there right?

@mbargull
Copy link
Member

mbargull commented Jun 6, 2018

You can remove the skip: True line from the r-ngsplot recipe (see https://github.com/bioconda/bioconda-recipes/pull/9149/files#r193333485) and add that change to this PR.

@apeltzer
Copy link
Member Author

apeltzer commented Jun 6, 2018

Whoopsies!

@apeltzer
Copy link
Member Author

apeltzer commented Jun 6, 2018

Hmm, doesn't build that way. I will probably resubmit in a different PR then?! @mbargull

sha256: a3ad6daceec383f88faf3d3ee899f2bef37b3be2658ee9afa01e86404c0c92bd
build:
number: 0
rpaths:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can remove rpaths for a data package.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

- lib/R/lib/
- lib/
requirements:
build:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No build requirement needed for a data package.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, removed that


# Platform-specific md5sum checks.
if [[ $(uname -s) == "Linux" ]]; then
if md5sum -c <<<"$MD5 $TARBALL"; then
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you use a sha256 hash, you'll have to use sha256sum here.
But note that on macOS you'll likely have to use another command for this. Can't remember what, though.
The safe route might be to add a dependency on openssl and use openssl sha256? (E.g.,

openssl sha256 libStatGen-v$VERSION.tar.gz | grep 70a504c5cc4838c6ac96cdd010644454615cc907df4e3794c999baf958fa734b
)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed back to md5

break
fi
else if [[ $(uname -s) == "Darwin" ]]; then
if [[ $(md5 $TARBALL | cut -f4 -d " ") == "$MD5" ]]; then
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sha256, see above

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to md5

package:
name: 'ngsplotdb-{{ name|lower }}'
version: '{{ version }}'
source:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No source needed since post-link.sh does the job.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, removed that - but the md5 is still required / good practice right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nah, the md5 in meta.yaml is only tied to the one in source, hence can be removed, too.

fi

# Install and clean up
ngsplotdb.py install ngsplotdb_hg19_75_3.00.tar.gz
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You want to use "${TARBALL}" here.
Also, if this outputs anything (progress status and such), you'll want to write that to .messages.txt, see https://conda.io/docs/user-guide/tasks/build-packages/link-scripts.html, i.e.,

ngsplotdb.py install "${TARBALL}" > "${PREFIX}/.messages.txt"

maybe even

ngsplotdb.py install "${TARBALL}" > "${PREFIX}/.messages.txt" 2>&1

to also capture stderr.

@@ -0,0 +1 @@
ngsplotdb.py remove hg19
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Redirect output to .messages.txt, see above.


if [[ $SUCCESS != 1 ]]; then
echo "ERROR: post-link.sh was unable to download any of the following URLs with the md5sum $MD5:"
printf '%s\n' "${URLS[@]}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think even this should write to .messages.txt (see below), i.e.,

  echo "ERROR: post-link.sh was unable to download any of the following URLs with the md5sum $MD5:" > "${PREFIX}/.messages.txt" 2>&1
  printf '%s\n' "${URLS[@]}" > "${PREFIX}/.messages.txt" 2>&1

@mbargull
Copy link
Member

mbargull commented Jun 6, 2018

Hmm, doesn't build that way. I will probably resubmit in a different PR then?! @mbargull

You can use this one, no problem.

+#!/bin/bash
+outdir=$PREFIX/share/$PKG_NAME-$PKG_VERSION-$PKG_BUILDNUM
+mkdir -p $outdir
+mkdir -p $PREFIX/bin
+cp -R * $outdir 

^-- please don't copy everything unconditionally but rather selectively the files you want.

+#Set up links for 
+for f in $outdir/*; do
+    ln -s $outdir/* $PREFIX/bin

^-- you'll want to use ln -s "${f}" "${PREFIX}/bin" here. But also, you should not loop over $outdir/* but just the files in $outdir that are the executables you want to link to.

+    fbname=$(basename "$f")
+    chmod 0755 ${PREFIX}/bin/$fbname
+done 

@mbargull
Copy link
Member

mbargull commented Jun 6, 2018

I took a quick peek at the file list in https://github.com/shenlab-sinai/ngsplot/archive/2.63.tar.gz. There is a bin folder in there => I assume you want to only link the files (or possibly only a subset of them?) in "${outdir}"/bin/* then, right?

@mbargull
Copy link
Member

mbargull commented Jun 6, 2018

And the files in bin seem to have the executable mode set already, to the chmod is likely not needed.

@apeltzer
Copy link
Member Author

apeltzer commented Jun 6, 2018

Hi @mbargull !
thanks for all the tips / hints. The problem is, that the respective tools I need in ngsplot.py use system calls to other tools in the bin folder and I don't want to try & error each of them until the recipe works. Honestly, I also feel like this package is rather not too well prepared in terms of keeping things easy to install for end users, which is also the main reason for packaging this nicely (or as nice as it gets at least).

Therefore, I'd rather say I restrict to simply add all tools in bin to the path, similar to what the authors suggest in their readme . What do you think?

Copy link
Member

@mbargull mbargull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few changes needed, but it looks nearly done 🎉.

r-ngsplot built fine, but I'm not sure whether the tests actually succeeded. By piping to true, you're masking the failures, I think (that is, I don't believe conda-build runs the tests with -o pipefail). Please remove all | true.
If the programs return non-zero exit codes on success, a workaround is to pipe to grep -q 'some expression you know the program outputs.

Therefore, I'd rather say I restrict to simply add all tools in bin to the path, similar to what the authors suggest in their readme . What do you think?

Seems fine to me.

- wget
test:
commands:
- ngsplotdb.py install "${TARBALL}" > "${PREFIX}/.messages.txt" 2>&1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test is run after the package is installed, i.e., after the post-link.sh is run.
So, that ngsplotdb.py install is already done at that point. Rather than executing it again, you should run a test that checks if the database has been successfully installed (either with some (short-running!) command from r-gnsplot or if that's not available, at least something like test -f "${PREFIX}/some-file-path".)

package:
name: 'ngsplotdb-{{ name|lower }}'
version: '{{ version }}'
md5: 172a6a93de3d1ae8a3ad6d6d79e911f7
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can remove that md5 here.

URLS=(
"https://drive.google.com/uc?export=download&id=0B5hDZ2BucCI6SURYWW5XdUxnbW8"
)
sha256="a3ad6daceec383f88faf3d3ee899f2bef37b3be2658ee9afa01e86404c0c92bd"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MD5='172a6a93de3d1ae8a3ad6d6d79e911f7'

instead of sha256


SUCCESS=0
for URL in ${URLS[@]}; do
wget -O- -q $URL > $TARBALL
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CI choked on the unquoted $URL, I think.

wget -qO "${TARBALL}" "${URL}"

should do the trick.

done

if [[ $SUCCESS != 1 ]]; then
echo "ERROR: post-link.sh was unable to download any of the following URLs with the md5sum $MD5:" > "${PREFIX}/.messages.txt" 2>&1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad, should be (appending to .messages.txt):

  echo "ERROR: post-link.sh was unable to download any of the following URLs with the md5sum $MD5:" >> "${PREFIX}/.messages.txt" 2>&1

@@ -0,0 +1 @@
ngsplotdb.py remove hg19 > "${PREFIX}/.messages.txt" 2>&1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, missing new line on EOF 😉

# Install and clean up
ngsplotdb.py install "${TARBALL}" > "${PREFIX}/.messages.txt" 2>&1
rm $TARBALL
rmdir $STAGING
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing new line on EOF 😉

for f in $outdir/*; do
ln -s $outdir/* $PREFIX/bin
for f in ${outdir}/bin/*; do
ln -s ${f} ${PREFIX}/bin
fbname=$(basename "$f")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be removed now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get it right now - why?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I've been imprecise. Just meant the fbname=$(basename "$f") part 😉

fbname=$(basename "$f")
chmod 0755 ${PREFIX}/bin/$fbname
done
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, missing new line on EOF 😉

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -4,8 +4,7 @@ mkdir -p $outdir
mkdir -p $PREFIX/bin
cp -R * $outdir
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if it's still the case, but I know conda-build sometimes wrote files to the cwd during building (I think conda-build.sh or run-test.sh or something). Those files will/might get copied too if you cp -R * here.

@@ -1,7 +1,7 @@
#!/bin/bash
FN="ngsplotdb_hg19_75_3.00.tar.gz"
URLS=(
"https://doc-0g-a0-docs.googleusercontent.com/docs/securesc/ha0ro937gcuc7l7deffksulhg5h7mbp1/jiksn2m41cm3dbspq0nqt102h6m6eo1o/1528221600000/01382619737792242945/*/0B5hDZ2BucCI6SURYWW5XdUxnbW8?e=download"
"https://drive.google.com/uc?export=download&id=0B5hDZ2BucCI6SURYWW5XdUxnbW8"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 looks much saner 😄. galaxyproject/cargo-port#148 is probably still preferred, I think, but if that isn't merged in a timely manner, we can always package a new build for r-ngsplot-hg19 updating that URL later.

@apeltzer
Copy link
Member Author

apeltzer commented Jun 8, 2018

So setting this environment variable as suggested in the README won't work. Therefore, the second test fails and probably also the recipe can't be called properly.

Using something like this:

#Setting environment variable NGSPLOT as requested by authors readme
ln -s ${outdir} ${PREFIX}/bin/NGSPLOT

Doesn't work unfortunately... Setting something via export NGSPLOT=/path/to... doesn't work either.

@mbargull any idea? This is the last thing required for the recipe...

@apeltzer
Copy link
Member Author

apeltzer commented Jun 8, 2018

Thanks @ewels for your hint with https://github.com/bioconda/bioconda-recipes/blob/master/recipes/bcftools/1.2/build.sh, this seems to work locally now at least!

@apeltzer
Copy link
Member Author

apeltzer commented Jun 8, 2018

Ok, I don't have a clue what causes this test fail...

@bioconda/core anybody an idea?

for f in $outdir/*; do
ln -s $outdir/* $PREFIX/bin
for f in ${outdir}/bin/*; do
ln -s ${f} ${PREFIX}/bin
fbname=$(basename "$f")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I've been imprecise. Just meant the fbname=$(basename "$f") part 😉

mkdir -p $DEACTIVATE_DIR
cp $RECIPE_DIR/scripts/activate.sh $ACTIVATE_DIR/ngsplot-activate.sh
cp $RECIPE_DIR/scripts/deactivate.sh $DEACTIVATE_DIR/ngsplot-deactivate.sh
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid conflicts, you should rename those to the exact name of the package, i.e., r-ngsplot-.... You can use the ${PKG_NAME} variable for this, see, e.g.: https://github.com/conda-forge/toolchain-feedstock/blob/bf0bfa1f06f6592eb3e6ef8989339a005cfd93ef/recipe/install-toolchain.sh#L8

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done :-)


if [[ $SUCCESS != 1 ]]; then
echo "ERROR: post-link.sh was unable to download any of the following URLs with the md5sum $MD5:" >> "${PREFIX}/.messages.txt" 2>&1
printf '%s\n' "${URLS[@]}" > "${PREFIX}/.messages.txt" 2>&1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix my > mistake 😉

printf '%s\n' "${URLS[@]}" >> "${PREFIX}/.messages.txt" 2>&1

fi

# Install and clean up
ngsplotdb.py install "${TARBALL}" > "${PREFIX}/.messages.txt" 2>&1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix my > mistake 😉

ngsplotdb.py install "${TARBALL}" >> "${PREFIX}/.messages.txt" 2>&1

@@ -0,0 +1 @@
ngsplotdb.py remove hg19 > "${PREFIX}/.messages.txt" 2>&1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix my > mistake 😉

ngsplotdb.py remove hg19 >> "${PREFIX}/.messages.txt" 2>&1

@mbargull
Copy link
Member

mbargull commented Jun 8, 2018

Ok, I don't have a clue what causes this test fail...

That's because of a shortcoming on how the containers are built/run, see galaxyproject/galaxy-lib#81.
Essentially, the activate scripts are not run for the containers.

Choices you have for this:

  • "don't care" approach: Selectively disable the failing tests for the container by extracting them into run_tests.sh
  • "hero" approach: Fix activate.sh scripts should be converted into .bashrc during container build galaxyproject/galaxy-lib#81 😉
  • "patch" approach: If NGS_PLOT always points to that path, patch the scripts to use that path instead (probably not by using source/patches in meta.yaml but patching them in build.sh to be able to use the build-time environment variables the path consists of)
  • "defensive patch" approach: Just like "patch approach" but not replace NGS_PLOT with the path directly but replace it by ${NGS_PLOT:-SOME_PATH} (for shell scripts -- no idea what the R equivalent is).

@@ -0,0 +1,3 @@
#!/bin/bash
export NGSPLOT=$PREFIX/share/$PKG_NAME-$PKG_VERSION-$PKG_BUILDNUM
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, and using $PKG_NAME-$PKG_VERSION-$PKG_BUILDNUM should not work here, I believe.
Those variable are only defined during the build/test phase or during post-link (and others), but not when activating an environment. If the tests passed for this, that's only because they were set in that case.
You'll want to create/modify the activation script in build.sh.

mkdir -p $ACTIVATE_DIR
mkdir -p $DEACTIVATE_DIR
cp $RECIPE_DIR/scripts/activate.sh $ACTIVATE_DIR/ngsplot-activate.sh
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In regards to the comment below for scripts/activate.sh:
Replace

cp $RECIPE_DIR/scripts/activate.sh $ACTIVATE_DIR/ngsplot-activate.sh

by something like

echo > "${ACTIVATE_DIR}/${PKG_NAME}-activate.sh" <<EOF
#!/bin/bash
export NGSPLOT="${outdir}"
export _CONDA_SET_NGSPLOT_ENV=1
EOF

. ${outdir} will then have been evaluated at build time (becaus of using <<EOF with EOF being unquoted) and replaced by the correct output path (which conda will in turn adjust to the environment's path at install time).

@mbargull
Copy link
Member

mbargull commented Jun 8, 2018

Ah, I feel the urge to clarify the

Choices you have for this

The different approaches are of course not necessarily exclusive -- "hero" is always something you can try to achieve 😄.

The "don't care" approach is not really preferable since it leaves the container in an incomplete/not directly usable state: The user would have to "guess" the path and set the variable themselves to make use of the container.

The "defensive patch" approach might be overly wary, but it retain's the software's functionality to let the user choose/set an alternative path. Probably that variable is just used as a workaround to achieve relocatability so that functionality isn't needed if you patch and let conda do this job -- BUT I don't know the program/scripts/uses at all and might be wrong about this. (As a general note, patching should always be done carefully, especially if it's changing the sofware's behavior. But it could be the best solution in this case.)

@apeltzer
Copy link
Member Author

apeltzer commented Jun 8, 2018

Thanks for all these hints - never had that much trouble with a single recipe than this one (which is tbh in a bad shape for packaging...).

I'll patch the call in the script I'm calling here in build.sh and the tool should then be able to function.

progpath <- Sys.getenv('NGSPLOT')
if(progpath == "") {

As far as I see it, this is the best way other than the hero approach, which I don't feel qualified to do.

@apeltzer
Copy link
Member Author

apeltzer commented Jun 8, 2018

So replaced the NGSPLOT ENV call in the R script. Lets hope that works fine then :-)

@mbargull
Copy link
Member

mbargull commented Jun 9, 2018

Did a quick grep for NGSPLOT and it seems the code pattern for that is quite uniform. If the commit I added work, we can leave out the activation script entirely. Let's see how it goes

@mbargull
Copy link
Member

mbargull commented Jun 9, 2018

Let's see how it goes

Didn't quite work yet... debugging...

Thanks for all these hints - never had that much trouble with a single recipe than this one (which is tbh in a bad shape for packaging...).

(I must've missed that comment before.) You're welcome, happy to help. Yeah it's a bit troublesome that it's not packaged as usual, but hey, you got to learn a few new tricks which is never a bad things 😉 .

@mbargull
Copy link
Member

mbargull commented Jun 9, 2018

I think the last changes might make this pass. If it indeed does and you still think the patching is fine, feel free to merge!

- (ngsplotdb.py list | grep -qF "hg19" || ngsplotdb.py list)
about:
home: 'https://github.com/shenlab-sinai/ngsplot'
license: GPL-2.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder: does the database have another home URL? Is the database itself also licensed under the GPL-2.0?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, they are on GDrive and linked from the same Github repository - so I think its fine to use the same licence (as they didn't state explicitly another one).

Thanks for all the help - learned a lot here and will add two more databases today to finalize things here :-)

@apeltzer apeltzer merged commit f36bfb7 into master Jun 11, 2018
@mbargull mbargull deleted the r-ngsplot-hg19 branch June 11, 2018 07:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants