AlexanderLabWHOI · akrinos · Nov 17, 2020
diff --git a/docs/source/about.rst b/docs/source/about.rst
@@ -9,7 +9,7 @@ A variety of curated protein databases are available to use with `EUKulele`, whi
 Functionality
 ====================================
 
-``EUKulele`` :cite:`eukulele` is an open-source ``Python``-based package designed to simplify the process of taxonomic identification of marine eukaryotes in meta-omic samples. User-provided metatranscriptomic or metagenomic samples are aligned against a database of the user's choosing, with an aligner of the user's choice (``BLAST`` :cite:`kent2002blat` or ``DIAMOND`` :cite:`buchfink2015fast`). The "blastx" utility is used by default if metatranscriptomic samples are only provided in nucleotide format, while the "blastp" utility is used for metagenomic samples and metatranscriptomic samples available as translated protein sequences. Optionally, the user may indicate a preference to translate nucleotide input sequences using the ``TransDecoder`` software :cite:`haastransdecoder`, with the output provided to "blastp". Any consistently-formatted database may be used, but three published microbial eukaryotic database options are provided by default: MMETSP :cite:`keeling2014marine;@caron2017probing`, PhyloDB :cite:`phylodb`, and EukProt :cite:`richter2020eukprot`. The package returns comma-separated files containing all of the contig matches from the metatranscriptome or metagenome, as well as the total number of transcripts that matched, at each taxonomic level, from supergroup to species. If a quantification tool has been used to estimate the number of counts associated with each transcript ID, counts may also be returned. Additionally, the software returns barplots displaying the relative composition of each sample at each taxonomic level, according to the number of transcripts or number of estimated counts if provided from ``Salmon`` (an external transcript quantification tool :cite:`patro2017salmon`).
+``EUKulele`` :cite:`eukulele` is an open-source ``Python``-based package designed to simplify the taxonomic identification of marine eukaryotes in meta-omic samples. User-provided metatranscriptomic or metagenomic samples are aligned against a database of the user's choosing, using either ``BLAST`` :cite:`kent2002blat` or ``DIAMOND`` :cite:`buchfink2015fast` as the aligner. The "blastx" utility is used by default if metatranscriptomic samples are provided only as nucleotide sequences, while the "blastp" utility is used for metagenomic samples and for metatranscriptomic samples available as translated protein sequences. Optionally, the user may choose to translate nucleotide input sequences with the ``TransDecoder`` software :cite:`haastransdecoder`, in which case the translated output is provided to "blastp". Any consistently formatted database may be used, but three published microbial eukaryotic databases are provided by default: MMETSP :cite:`keeling2014marine,caron2017probing`, PhyloDB :cite:`phylodb`, and EukProt :cite:`richter2020eukprot`. The package returns comma-separated files containing all of the contig matches from the metatranscriptome or metagenome, as well as the total number of transcripts that matched at each taxonomic level, from supergroup to species. If a quantification tool has been used to estimate the number of counts associated with each transcript ID, counts may also be returned. Additionally, the software returns barplots displaying the relative composition of each sample at each taxonomic level, based on the number of transcripts or, if provided by ``Salmon`` (an external transcript quantification tool :cite:`patro2017salmon`), the number of estimated counts.
 
 ``EUKulele`` will assess the relative 'completeness' of a given taxonomic group by taking a user-inputted list of names at some taxonomic level to determine BUSCO completeness and redundancy :cite:`simao2015busco`. For example, if the user was interested whether there was a set of relatively complete contigs available for genus *Phaeocystis* within their metagenomic sample, they could pass *Phaeocystis*, along with its taxonomic level, "genus", to ``EUKulele``. By default, ``EUKulele`` will assess the BUSCO completeness of the most commonly encountered classifications at each taxonomic level. 
 
@@ -25,4 +25,4 @@ The alignment output is compared to an accompanying phylogenetic reference speci
 Subsequently, ``BUSCO`` :cite:`simao2015busco` is used to identify the core eukaryotic genes present in each sample. Using the list of genes identified as "core", a secondary taxonomic estimation step (and consensus assignment step, for MAGs) is performed to compare the taxonomic assignment predicted using all of the genes in comparison to the assignment made using only the genes that would be expected to be found in most reference transcriptomes. This approach is particularly useful for MAGs, and offers a method for avoiding conflicting or spurious matches made due to strain-level inconsistencies. For metatranscriptome samples, BUSCO completeness can be used to estimate the completeness of taxonomic groups to better inform their downstream interpretation. 
 
 .. bibliography:: refs.bib
-   :cited:
+   :cited:
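The aligner-selection rule described in the ``about.rst`` change above can be summarized as a small decision function. This is a hypothetical sketch for illustration only: ``choose_alignment_utility`` and its parameters are not part of ``EUKulele``'s API. ``mets`` is the shorthand used in the documentation for metatranscriptomes; ``mags`` for metagenomic samples is an assumption made here by analogy.

```python
def choose_alignment_utility(sample_type: str,
                             is_protein: bool,
                             use_transdecoder: bool = False) -> str:
    """Return "blastx" or "blastp" following the rules described in about.rst.

    Hypothetical helper: sample_type is "mets" (metatranscriptome) or
    "mags" (metagenome; this label is an assumption).
    """
    if sample_type not in ("mets", "mags"):
        raise ValueError(f"unknown sample type: {sample_type!r}")
    if sample_type == "mags":
        # Metagenomic samples are searched with blastp.
        return "blastp"
    if is_protein:
        # Metatranscriptome contigs already translated to protein use blastp.
        return "blastp"
    if use_transdecoder:
        # Nucleotide contigs are first translated by TransDecoder, then
        # the translated output is searched with blastp.
        return "blastp"
    # Default for nucleotide-only metatranscriptomes: translated search.
    return "blastx"
```

For example, ``choose_alignment_utility("mets", is_protein=False)`` returns ``"blastx"``, while passing ``use_transdecoder=True`` switches the same input to ``"blastp"``.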
diff --git a/docs/source/databaseandconfig.rst b/docs/source/databaseandconfig.rst
@@ -6,7 +6,7 @@ Installing Databases and Creating Configuration Files
 Default Databases
 -----------------
 
-Three databases can be downloaded and formatted automatically when invoking ``EUKulele``. Currently the supported databases are:
+Four databases can be downloaded and formatted automatically when invoking ``EUKulele``. Currently the supported databases are:
 
 - `PhyloDB <https://drive.google.com/drive/u/0/folders/0B-BsLZUMHrDQfldGeDRIUHNZMEREY0g3ekpEZFhrTDlQSjQtbm5heC1QX2V6TUxBeFlOejQ>`_
 - `EukProt <https://figshare.com/articles/EukProt_a_database_of_genome-scale_predicted_proteins_across_the_diversity_of_eukaryotic_life/12417881/2>`_
@@ -21,7 +21,7 @@ A database (for example ``phylodb``) can be setup prior to running by using::
 
     EUKulele setup --database phylodb
 
-If a database is not found automatically by ``EUKuele`` it will automatically download the database specified by the flag. If you downloaded a database previously you can specify the ``--reference_dir`` flag indicating the path to the previously downloaded database. If no reference database is specified with ```--reference_dir```, EUKulele will automatically download and use the MMETSP database. You can also (1) download the other databases and use the flag ```reference_dir``` to point EUKulele to the location of already downloaded databases or (2) use your own databases.
+If a database is not found automatically by ``EUKulele``, it will automatically download the database specified by the ``--database`` flag. If you downloaded a database previously, you can use the ``--reference_dir`` flag to indicate the path to the previously downloaded database. If no reference database is specified with ``--reference_dir``, ``EUKulele`` will automatically download and use the MMETSP database. You can also (1) download the other databases and use the ``--reference_dir`` flag to point ``EUKulele`` to the location of the already downloaded databases, or (2) use your own databases.
 
 Composition of Default Databases
 --------------------------------

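The reference-database behavior described in the ``databaseandconfig.rst`` change above amounts to a simple resolution order. The sketch below is one reading of that text, not ``EUKulele``'s implementation: ``resolve_reference`` is a hypothetical helper, ``"mmetsp"`` is assumed to be the identifier for the MMETSP database, and falling back to a download when the ``--reference_dir`` path does not exist is an assumption.

```python
from pathlib import Path
from typing import Optional, Tuple

# Assumed identifier for the default database (MMETSP, per the docs).
DEFAULT_DATABASE = "mmetsp"


def resolve_reference(database: Optional[str] = None,
                      reference_dir: Optional[str] = None) -> Tuple[str, str]:
    """Return (action, target) describing where reference data comes from.

    Hypothetical helper: "use" means reuse an existing directory,
    "download" means fetch the named database.
    """
    if reference_dir is not None:
        path = Path(reference_dir)
        if path.is_dir():
            # A previously downloaded database was supplied: reuse it.
            return ("use", str(path))
    # Nothing usable on disk: download the requested database, else MMETSP.
    return ("download", database or DEFAULT_DATABASE)
```

With no arguments the sketch returns ``("download", "mmetsp")``, mirroring the documented default; supplying an existing directory short-circuits any download.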
diff --git a/docs/source/outputstructure.rst b/docs/source/outputstructure.rst
@@ -30,7 +30,7 @@ Below is what you should expect to see when you run ``EUKulele``. ``output-folde
 Taxonomy Estimation Folders
 ---------------------------
 
-Inside each of the taxonomy estimation folders (``core_taxonomy_estimation``, for exclusively transcripts annotated as core genes, and ``taxonomy_estimation``), there are files labeled *sample_name* ``-estimated-taxonomy.out``. Each of these files has the following columns:
+Inside each of the taxonomy estimation folders (``core_taxonomy_estimation``, which contains only transcripts annotated as core genes, and ``taxonomy_estimation``), there are files named ``<sample_name>-estimated-taxonomy.out``. Each of these files has the following columns:
 
 - transcript_name
     - The name of the matched transcript/contig from this sample file
@@ -63,7 +63,7 @@ Inside each of the taxonomy counts folders (``core_taxonomy_counts`` and ``taxon
 - Sample
     - The original metagenomic/metatranscriptomic sample that this count is from (a separate row would be provided if the match is found in multiple samples)
 
-The taxonomic count files are named according to the convention *output-folder-name* ``_all_`` *taxonomic-level* ``_counts.csv``.
+The taxonomic count files are named according to the convention ``<output-folder-name>_all_<taxonomic-level>_counts.csv``.
 
 Taxonomy Visualization Folders
 ------------------------------
@@ -76,4 +76,4 @@ Inside each of the taxonomy visualization folders (``core_taxonomy_visualization
 - y-axis, right subplot (if using counts): relative number of counts
 - bars, right subplot (if using counts): each of the top represented taxonomic groups (must represent >= 5% of total counts)
 
-The right subplot is only generated if counts from a quantification tool (namely, ``Salmon``) are provided.
+The right subplot is only generated if counts from a quantification tool (namely, ``Salmon``) are provided.
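The count-file naming convention and the 5% plotting threshold described in the ``outputstructure.rst`` changes above can be expressed in a few lines. This is an illustrative sketch, not code from ``EUKulele``: the helper names ``count_file_name`` and ``groups_to_plot`` are hypothetical, and the sketch assumes the threshold is applied to each group's share of the total counts within a single sample.

```python
from typing import Dict, List


def count_file_name(output_folder_name: str, level: str) -> str:
    """Build a name following <output-folder-name>_all_<taxonomic-level>_counts.csv."""
    return f"{output_folder_name}_all_{level}_counts.csv"


def groups_to_plot(counts: Dict[str, float], threshold: float = 0.05) -> List[str]:
    """Return groups whose share of the total is >= threshold, largest first."""
    total = sum(counts.values())
    if total == 0:
        # Nothing to plot for an empty or all-zero sample.
        return []
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, n in ranked if n / total >= threshold]
```

For example, ``count_file_name("my_run", "genus")`` gives ``"my_run_all_genus_counts.csv"``, and with made-up counts ``{"Haptophyta": 60, "Ochrophyta": 36, "Other": 4}`` only the first two groups clear the 5% cut.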
diff --git a/docs/source/running-eukulele.rst b/docs/source/running-eukulele.rst
@@ -6,7 +6,7 @@ Using EUKulele
 
 Metatranscriptomes (METs)
 =========================
-In the first case, metatranscriptomes (shortened in ``EUKulele`` to ``mets``), are assumed to be contigs generated from shotgun-style sequencing and assembly of metatranscriptomic data (RNA) from a mixed community. These contigs can be provide to ``EUKulele`` as either nucleotide sequences (such as those output by `Trinity <https://github.com/trinityrnaseq/trinityrnaseq/wiki>`_) or predicted protein sequences from these contigs (such as those output by `Transdecoder <https://github.com/transdecoder>`_). 
+In the first case, metatranscriptomes (shortened in ``EUKulele`` to ``mets``) are assumed to be contigs generated from shotgun-style sequencing and assembly of metatranscriptomic data (RNA) from a mixed community. These contigs can be provided to ``EUKulele`` either as nucleotide sequences (such as those output by `Trinity <https://github.com/trinityrnaseq/trinityrnaseq/wiki>`_) or as predicted protein sequences from these contigs (such as those output by `TransDecoder <https://github.com/transdecoder>`_).
 
 The most basic running of ``EUKulele`` on metatranscriptome samples  would be::