IRIDA Species Abundance Pipeline Plugin

This project contains a pipeline implemented as a plugin for the IRIDA bioinformatics analysis system. This can be used to estimate the relative abundance of sequence reads originating from different species in a sample.

Installation

Installing Galaxy Dependencies

To use this pipeline, you must also install the following Galaxy tools and data managers within your Galaxy instance:

Name	Version	Owner	Metadata Revision	Galaxy Toolshed Link
fastp	`0.23.2+galaxy0`	`iuc`	10 (2022-02-03)	fastp-10:65b93b623c77
fastp_json_to_tabular	`0.1.0`	`public-health-bioinformatics`	0 (2022-03-10)	fastp_json_to_tabular-0:091a2fb2e7ad
kraken2	`2.1.1+galaxy1`	`iuc`	4 (2021-02-17)	kraken2-4:e674066930b2
bracken	`2.6.1+galaxy0`	`iuc`	4 (2021-06-07)	bracken-4:b08ac10aed96
adjust_bracken_for_unclassified_reads	`0.1.0`	`public-health-bioinformatics`	1 (2021-03-10)	adjust_bracken_for_unclassified_reads-1:3cde438eb222
data_manager_build_kraken2_database	`2.1.2+galaxy0`	`iuc`	6 (2022-06-24)	data_manager_build_kraken2_database-6:9002633b4737
data_manager_build_bracken_database	`2.5.1+galaxy1`	`iuc`	3 (2021-11-08)	data_manager_build_bracken_database-3:3c7d2c84cb09

Preparing Databases

This pipeline requires databases for kraken2 and bracken to be installed in Galaxy. The Galaxy admin can do this using the data_manager_build_kraken2_database and data_manager_build_bracken_database tools that are listed above.

In the Galaxy 'Admin' panel, select 'Local Data' from the left-side menu.

Preparing the Kraken2 Database

On the 'Local Data' page, select 'Kraken2 database builder' from the 'Installed Data Managers' list.

Choose the type of Kraken2 database to install. For most analyses, the 'Standard' database is recommended. For reproducibility and standardization, using a 'pre-built' database is recommended. Pre-built databases are downloaded from Ben Langmead's 'Index Zone'. To get the very latest sequences from RefSeq, a Standard database can be built locally. Note that building a standard kraken2 database is a computationally resource-intensive job. Consult the kraken2 docs for details.

If a pre-built database type is selected, choose the size of database to download. Larger databases contain more detailed information and are able to correctly assign reads to a greater variety of species. Note that the entire database will be loaded into system RAM during analysis. Ensure that your system can support the database before downloading.
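The memory check described above can be sketched in Python. This is a minimal illustration, not part of the plugin: it assumes the Kraken2 index files in the database directory use the `.k2d` extension, that the `os.sysconf` names shown are available (they are on Linux), and that an 80% headroom factor is acceptable. The function names and the headroom value are hypothetical.

```python
import os
from pathlib import Path


def kraken2_db_size_bytes(db_dir):
    """Sum the sizes of the Kraken2 index files (*.k2d) found in db_dir."""
    return sum(p.stat().st_size for p in Path(db_dir).glob("*.k2d"))


def total_ram_bytes():
    """Total physical memory, using sysconf names available on Linux."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def fits_in_ram(db_bytes, ram_bytes, headroom=0.8):
    """Return True if the database fits within a fraction of total RAM.

    headroom is an illustrative safety margin chosen for this sketch;
    it is not a Kraken2 or Galaxy setting.
    """
    return db_bytes <= ram_bytes * headroom
```

On a Galaxy server, calling `fits_in_ram(kraken2_db_size_bytes(path), total_ram_bytes())` with the path of an existing database gives a rough yes/no answer. For a database that has not been downloaded yet, pass its advertised size in bytes as the first argument instead.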

If a pre-built database is selected, choose the build date for the database. The most recent build date is generally preferred.

Click the 'Execute' button to begin downloading (or building) the Kraken2 database. The download or build process may take significant time, depending on system resources. When complete, the Kraken2 job in the Galaxy History panel will turn green.

Preparing the Bracken Database

On the 'Local Data' page, select 'Bracken database builder' from the 'Installed Data Managers' list.

Each bracken database corresponds to a specific Kraken2 database. Select the Kraken2 database that was installed in the previous section.

If the Kraken2 database selected in the step above is a pre-built database, select 'Yes'. If it was locally built, select 'No'.

Each bracken database is configured for a specific read length. All pre-built Kraken2 databases from the Index Zone come bundled with a set of Bracken databases for a variety of read lengths. Select the read length that is appropriate for your dataset.

If necessary, additional bracken databases can be built from the same kraken2 database with different read lengths. This may be necessary if, for example, some of your samples were sequenced with a read length of 150 bp and others with a read length of 250 bp.
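When samples come from runs with different read lengths, it can help to work out which Bracken database each sample should use. The sketch below picks the nearest available read length for each sample. The function name, the sample names, the list of available lengths, and the tie-break rule are all assumptions made for illustration.

```python
def closest_read_length(observed, available):
    """Pick the available Bracken read length nearest to the observed one.

    Ties go to the shorter length; that tie-break is an arbitrary choice
    made for this sketch.
    """
    if not available:
        raise ValueError("no Bracken read lengths available")
    return min(available, key=lambda length: (abs(length - observed), length))


# Hypothetical read lengths for which Bracken databases have been built.
available_lengths = [100, 150, 250]

# Hypothetical per-sample read lengths from two sequencing runs.
samples = {"sample_A": 151, "sample_B": 251, "sample_C": 150}

assignments = {
    name: closest_read_length(length, available_lengths)
    for name, length in samples.items()
}
print(assignments)
```

If a sample's read length is far from every available value, building an additional Bracken database for that length is likely the better option.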

Give your bracken database a name. This is a free-text field, and it will be presented to the IRIDA user when they are asked to select a bracken database to use for their analysis. Give the bracken database a name that clearly indicates which kraken2 database it corresponds to, and which read length it is configured for.
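One way to keep names consistent is to derive them from the two pieces of information the user needs. The helper below is a hypothetical naming convention, not something Galaxy or IRIDA requires, and the example Kraken2 database name is made up.

```python
def bracken_db_name(kraken2_db, read_length):
    """Build a display name recording the parent Kraken2 database and read length."""
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    return f"{kraken2_db}_bracken_{read_length}bp"


# Made-up Kraken2 database name, for illustration only.
print(bracken_db_name("standard_8gb_2022-06-07", 150))
```

The resulting string can be pasted into the free-text name field.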

Click the 'Execute' button to begin building the bracken database. If a pre-built Kraken2 database was selected, this step should complete quickly. When complete, the Bracken Database Builder job in the Galaxy History panel will turn green.

Installing to IRIDA

Download the provided irida-plugin-species-abundance-[version].jar from the releases page and copy it to your /etc/irida/plugins directory. When you next start IRIDA, the pipeline should appear in your list of pipelines.

Note: This plugin requires you to be running IRIDA version >= 21.01. Please see the IRIDA documentation for more details.

Usage

The plugin should now show up in the Analyses > Pipelines section of IRIDA.

Analysis Results

You should be able to run a pipeline with this plugin and get analysis results. The results include a kraken2 taxonomic classification report, and a bracken estimate of the relative abundance of reads from each species in your sample.
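To inspect a downloaded abundance table outside IRIDA, a short script can rank species by their estimated proportion. The sketch below assumes the file has the standard Bracken columns `name`, `taxonomy_id` and `fraction_total_reads`; the files produced by this pipeline may differ, and the example data is made up.

```python
import csv
import io


def top_species(bracken_tsv, n=5):
    """Return the n most abundant taxa from Bracken-style tabular output.

    Assumes the columns 'name', 'taxonomy_id' and 'fraction_total_reads';
    the pipeline's actual output files may differ.
    """
    rows = list(csv.DictReader(bracken_tsv, delimiter="\t"))
    rows.sort(key=lambda row: float(row["fraction_total_reads"]), reverse=True)
    return [
        (row["name"], row["taxonomy_id"], float(row["fraction_total_reads"]))
        for row in rows[:n]
    ]


# Made-up example data, for illustration only.
example = io.StringIO(
    "name\ttaxonomy_id\ttaxonomy_lvl\tkraken_assigned_reads\t"
    "added_reads\tnew_est_reads\tfraction_total_reads\n"
    "Salmonella enterica\t28901\tS\t300\t100\t400\t0.04\n"
    "Escherichia coli\t562\tS\t9000\t500\t9500\t0.95\n"
    "Klebsiella pneumoniae\t573\tS\t80\t20\t100\t0.01\n"
)
for name, taxid, fraction in top_species(example, n=2):
    print(f"{name}\t{taxid}\t{fraction:.2%}")
```

For a real file, pass an open file handle, for example `top_species(open("bracken_abundances.tsv", newline=""))`, where the filename is a placeholder.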

Metadata Table

You should also be able to save and view these results in the IRIDA metadata table. The following fields are written to the IRIDA 'Line List':

Field Name	Description
`species-abundance/taxonomy_level`	The taxonomic level at which reads were aggregated ('S' for species)
`species-abundance/taxon_name`	The scientific name of the most abundant species in the sample
`species-abundance/taxonomy_id`	The NCBI taxonomy ID for the most abundant species in the sample
`species-abundance/proportion`	The proportion of reads in this sample assigned to the most abundant species
`species-abundance/taxon_name_2`	The scientific name of the second-most abundant species in the sample
`species-abundance/taxonomy_id_2`	The NCBI taxonomy ID for the second-most abundant species in the sample
`species-abundance/proportion_2`	The proportion of reads in this sample assigned to the second-most abundant species
`species-abundance/taxon_name_3`	The scientific name of the third-most abundant species in the sample
`species-abundance/taxonomy_id_3`	The NCBI taxonomy ID for the third-most abundant species in the sample
`species-abundance/proportion_3`	The proportion of reads in this sample assigned to the third-most abundant species
`species-abundance/taxon_name_4`	The scientific name of the fourth-most abundant species in the sample
`species-abundance/taxonomy_id_4`	The NCBI taxonomy ID for the fourth-most abundant species in the sample
`species-abundance/proportion_4`	The proportion of reads in this sample assigned to the fourth-most abundant species
`species-abundance/taxon_name_5`	The scientific name of the fifth-most abundant species in the sample
`species-abundance/taxonomy_id_5`	The NCBI taxonomy ID for the fifth-most abundant species in the sample
`species-abundance/proportion_5`	The proportion of reads in this sample assigned to the fifth-most abundant species
`species-abundance/proportion_unclassified`	The proportion of unclassified reads in the sample

Note that by default, these fields will not appear in sorted order in the line list. Refer to the IRIDA Documentation on metadata management to create a customized view of these fields.
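When setting up a customized view, it can be convenient to have the field names in rank order. The sketch below generates the names listed in the table above in that order. It only builds strings and does not talk to IRIDA; the function name and the `PREFIX` constant are hypothetical.

```python
PREFIX = "species-abundance/"


def ordered_fields(n_species=5):
    """List the plugin's metadata field names in rank order."""
    fields = [PREFIX + "taxonomy_level"]
    for rank in range(1, n_species + 1):
        # The most abundant species has no suffix; later ranks use _2, _3, ...
        suffix = "" if rank == 1 else f"_{rank}"
        for stem in ("taxon_name", "taxonomy_id", "proportion"):
            fields.append(f"{PREFIX}{stem}{suffix}")
    fields.append(PREFIX + "proportion_unclassified")
    return fields


for field in ordered_fields():
    print(field)
```

The printed list can serve as a checklist when arranging columns in the line list.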

Building

Building and packaging this code is accomplished using Apache Maven. However, you will first need to install IRIDA to your local Maven repository. The version of IRIDA you install must correspond to the irida.version.compiletime property in this project's pom.xml file. That property was last documented here as 19.01.3; because the plugin requires IRIDA >= 21.01 and the checkout step below uses 21.01, confirm the current value in pom.xml before proceeding.
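The property can be read from pom.xml programmatically rather than by eye. The sketch below uses Python's standard XML parser and assumes the property sits in the top-level `<properties>` element of a POM that declares the usual Maven 4.0.0 namespace. The sample POM and its version value are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Standard Maven POM namespace; assumed to be declared by the project's pom.xml.
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def irida_compiletime_version(pom_xml):
    """Return the irida.version.compiletime property from POM text, or None."""
    root = ET.fromstring(pom_xml)
    node = root.find("m:properties/m:irida.version.compiletime", POM_NS)
    return node.text.strip() if node is not None and node.text else None


# Made-up minimal POM; the version value is illustrative only.
sample_pom = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <properties>
    <irida.version.compiletime>21.01</irida.version.compiletime>
  </properties>
</project>"""
print(irida_compiletime_version(sample_pom))
```

To check the real file, read it first, for example `irida_compiletime_version(open("pom.xml", encoding="utf-8").read())`, and use the returned value as the tag to check out in the next step.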

Installing IRIDA to local Maven repository

To install IRIDA to your local Maven repository please do the following:

Clone the IRIDA project

git clone https://github.com/phac-nml/irida.git
cd irida

Checkout appropriate version of IRIDA

git checkout -b 21.01 21.01

Install IRIDA to local repository

mvn clean install -DskipTests

Building the plugin

Once you've installed IRIDA as a dependency, you can proceed to building this plugin. Please run the following commands:

cd irida-plugin-species-abundance

mvn clean package

Once complete, you should end up with a file target/irida-plugin-species-abundance-0.1.0.jar which can be installed as a plugin to IRIDA.

Dependencies

The following dependencies are required in order to make use of this plugin.

IRIDA >= 21.01
Java >= 1.8 and Maven (for building)
