diff --git a/README.md b/README.md
index e1c9fd9..da6a31d 100644
--- a/README.md
+++ b/README.md
@@ -8,6 +8,35 @@ A database of completed assemblies for metagenomics-related tasks
 Kalamari is a database of completed and public assemblies, backed by trusted institutions.
 These assemblies can be further used in formatted databases such as Kraken or Blast.
 
+### Prerequisites & Recommendations
+
+Requirements:
+- clone this repo locally: `git clone https://github.com/lskatz/Kalamari.git`
+- NCBI Entrez Direct (`edirect`) set of tools: `esearch`, etc.
+  - install via your package manager
+  - Debian/Ubuntu: `apt install ncbi-entrez-direct`
+
+Optional, but recommended:
+- `NCBI_API_KEY` environment variable
+- `EMAIL` environment variable
+
+Ensure that you have an [NCBI API key](https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities).
+This key associates your edirect requests with your username.
+Without it, NCBI applies a lower request rate limit and edirect requests might fail intermittently.
+After obtaining an NCBI API key, add it to your environment with
+
+    export NCBI_API_KEY=unique_api_key_goes_here
+
+where `unique_api_key_goes_here` is your unique hexadecimal key, made up of the characters 0-9 and a-f.
+
+You should also set your email address in the
+`EMAIL` environment variable, because edirect otherwise tries to guess it, which is an error-prone process.
+Add this variable to your environment with
+
+    export EMAIL=my@email.address
+
+using your own email address instead of `my@email.address`.
+
 ## Download instructions
 
 For usage, run `perl bin/downloadKalamari.pl --help`
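
The prerequisites added in this diff could be checked before a download with a small preflight function. The following is only a sketch: `check_kalamari_prereqs` is a hypothetical helper name that is not part of Kalamari, and treating `esearch` and `efetch` as the required Entrez Direct commands is an assumption, since the README only names `esearch` explicitly.

```shell
# Preflight sketch for the prerequisites listed in the README section.
# check_kalamari_prereqs is a hypothetical name, not part of Kalamari.
check_kalamari_prereqs() {
    missing=0

    # Assumption: esearch and efetch stand in for the Entrez Direct tools.
    for tool in esearch efetch; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "required: $tool not found in PATH" >&2
            missing=1
        fi
    done

    # These variables are optional but recommended, so only warn.
    if [ -z "${NCBI_API_KEY:-}" ]; then
        echo "recommended: NCBI_API_KEY is not set" >&2
    fi
    if [ -z "${EMAIL:-}" ]; then
        echo "recommended: EMAIL is not set" >&2
    fi

    # Non-zero only when a required tool is missing.
    return "$missing"
}
```

Calling `check_kalamari_prereqs` before `perl bin/downloadKalamari.pl` would stop early when Entrez Direct is missing, while a missing key or email address only produces a warning on standard error.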