This project contains a pipeline implemented as a plugin for the IRIDA bioinformatics analysis system. The pipeline estimates the relative abundance of sequence reads originating from different species in a sample.
To use this pipeline, you must also install the following Galaxy tools, including kraken2, bracken and their data managers, in your Galaxy instance. They can be found in the Galaxy Toolshed:
Name | Version | Owner | Metadata Revision | Galaxy Toolshed Link |
---|---|---|---|---|
fastp | 0.23.2+galaxy0 | iuc | 10 (2022-02-03) | fastp-10:65b93b623c77 |
fastp_json_to_tabular | 0.1.0 | public-health-bioinformatics | 0 (2022-03-10) | fastp_json_to_tabular-0:091a2fb2e7ad |
kraken2 | 2.1.1+galaxy1 | iuc | 4 (2021-02-17) | kraken2-4:e674066930b2 |
bracken | 2.6.1+galaxy0 | iuc | 4 (2021-06-07) | bracken-4:b08ac10aed96 |
adjust_bracken_for_unclassified_reads | 0.1.0 | public-health-bioinformatics | 1 (2021-03-10) | adjust_bracken_for_unclassified_reads-1:3cde438eb222 |
data_manager_build_kraken2_database | 2.1.2+galaxy0 | iuc | 6 (2022-06-24) | data_manager_build_kraken2_database-6:9002633b4737 |
data_manager_build_bracken_database | 2.5.1+galaxy1 | iuc | 3 (2021-11-08) | data_manager_build_bracken_database-3:3c7d2c84cb09 |
This pipeline requires databases for kraken2 and bracken to be installed in Galaxy. The Galaxy admin can do this using the `data_manager_build_kraken2_database` and `data_manager_build_bracken_database` tools that are listed above.
In the Galaxy 'Admin' panel, select 'Local Data' from the left-side menu:
On the 'Local Data' page, select 'Kraken2 database builder' from the 'Installed Data Managers' list:
Choose the type of Kraken2 database to install. For most analyses, the 'Standard' database is recommended. For reproducibility and standardization, using a 'pre-built' database is recommended. Pre-built databases are downloaded from Ben Langmead's 'Index Zone'. To get the very latest sequences from RefSeq, a Standard database can be built locally. Note that building a standard kraken2 database is a computationally resource-intensive job. Consult the kraken2 docs for details.
If a pre-built database type is selected, choose the size of database to download. Larger databases contain more detailed information and are able to correctly assign reads to a greater variety of species. Note that the entire database will be loaded into system RAM during analysis. Ensure that your system can support the database before downloading.
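Because the whole database is loaded into memory, it can help to compare the database size against the machine's RAM before downloading. The following is a minimal sketch and not part of the plugin, Galaxy or kraken2: the function names `total_ram_gb` and `db_fits_in_ram` and the 2 GB default headroom are assumptions made for illustration, and `total_ram_gb` reads `/proc/meminfo`, so it works on Linux only.

```shell
# Hypothetical helpers (not part of kraken2, Galaxy or this plugin).

# Print total system RAM in whole GB. Linux only: reads /proc/meminfo,
# where MemTotal is reported in kB.
total_ram_gb() {
    awk '/^MemTotal:/ { print int($2 / 1024 / 1024) }' /proc/meminfo
}

# Succeed (exit 0) when a database of $1 GB fits in $2 GB of RAM while
# leaving $3 GB of headroom (assumed default: 2 GB) for everything else.
db_fits_in_ram() (
    db_gb=$1
    ram_gb=$2
    headroom_gb=${3:-2}
    [ $((db_gb + headroom_gb)) -le "$ram_gb" ]
)

# Example: would a 16 GB database fit on this machine?
# if db_fits_in_ram 16 "$(total_ram_gb)"; then echo "ok"; else echo "too large"; fi
```

The sizes are whole gigabytes because POSIX shell arithmetic is integer-only; round the database size up when in doubt.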
If a pre-built database is selected, choose the build date for the database. The most recent build date is generally preferred.
Click the 'Execute' button to begin downloading (or building) the Kraken2 database. The download or build process may take significant time, depending on system resources. When complete, the Kraken2 job in the Galaxy History panel will turn green:
On the 'Local Data' page, select 'Bracken database builder' from the 'Installed Data Managers' list:
Each bracken database corresponds to a specific Kraken2 database. Select the Kraken2 database that was installed in the previous section.
If the Kraken2 database selected in the step above is a pre-built database, select 'Yes'. If it was locally-built, select 'No':
Each bracken database is configured for a specific read length. All pre-built Kraken2 databases from the Index Zone come bundled with a set of Bracken databases for a variety of read lengths. Select the read length that is appropriate for your dataset:
If necessary, additional bracken databases can be built based on the same kraken2 database, but with different read lengths. This may be necessary if some of your samples were sequenced with read length of 150, and others with read length of 250, for example.
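If you are unsure which read length a dataset was sequenced with, you can inspect the reads directly. The sketch below is one way to do that and is not part of the pipeline: the function name `modal_read_length` is hypothetical, and it assumes an uncompressed FASTQ file with exactly four lines per record.

```shell
# Hypothetical helper: print the most common read length in a FASTQ file ($1).
# Assumes uncompressed input with 4 lines per record (no wrapped sequences).
modal_read_length() {
    awk 'NR % 4 == 2 { count[length($0)]++ }   # line 2 of each record is the sequence
         END {
             best_len = 0; best_count = 0
             for (len in count)
                 # Keep the most frequent length; break ties with the longer one.
                 if (count[len] > best_count ||
                     (count[len] == best_count && len + 0 > best_len)) {
                     best_len = len + 0
                     best_count = count[len]
                 }
             print best_len
         }' "$1"
}

# Example (the file name is a placeholder):
# modal_read_length sample_R1.fastq
```

Trimmed reads are often shorter than the nominal run length, so treat the result as a guide for choosing the closest available bracken read length rather than an exact match.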
Give your bracken database a name. This is a free-text field, and it will be presented to the IRIDA user when they are asked to select a bracken database to use for their analysis. Give the bracken database a name that clearly indicates which kraken2 database it corresponds to, and which read length it is configured for.
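One way to keep names consistent across several bracken databases is to derive them from the kraken2 database label and the read length. This is only a suggested convention: the `bracken_db_name` function, the `_readlen` pattern and the example label are all assumptions, and neither Galaxy nor IRIDA requires any particular format.

```shell
# Hypothetical naming helper; the "<kraken2 label>_readlen<N>" pattern is an
# assumed convention, not something Galaxy or IRIDA requires.
# $1: label of the kraken2 database, $2: read length in bases
bracken_db_name() {
    printf '%s_readlen%s\n' "$1" "$2"
}

# Example with a made-up kraken2 label and two read lengths.
bracken_db_name my_kraken2_standard 150
bracken_db_name my_kraken2_standard 250
```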
Click the 'Execute' button to begin building the bracken database. If a pre-built Kraken2 database was selected, this step should complete quickly. When complete, the Bracken Database Builder job in the Galaxy History panel will turn green:
Please download the provided `irida-plugin-species-abundance-[version].jar` from the releases page and copy it to your `/etc/irida/plugins` directory. Now you may start IRIDA and you should see the pipeline appear in your list of pipelines.
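The copy step can be scripted. The following is a minimal sketch under stated assumptions: `install_irida_plugin` is a hypothetical helper, the default destination is the `/etc/irida/plugins` directory named above, and writing there will usually require appropriate permissions.

```shell
# Hypothetical helper: copy a plugin JAR ($1) into the IRIDA plugins
# directory ($2, defaulting to /etc/irida/plugins).
install_irida_plugin() (
    jar=$1
    plugins_dir=${2:-/etc/irida/plugins}
    # Refuse to continue when the JAR does not exist.
    [ -f "$jar" ] || { echo "JAR not found: $jar" >&2; exit 1; }
    mkdir -p "$plugins_dir" && cp "$jar" "$plugins_dir/"
)

# Example (replace the file name with the release you downloaded):
# install_irida_plugin irida-plugin-species-abundance-0.1.0.jar
```

The function body runs in a subshell, so the `exit 1` only ends the function call and not the calling shell.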
Note: This plugin requires you to be running IRIDA version >= `21.01`. Please see the IRIDA documentation for more details.
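If you script your deployment, the version requirement can be checked before installing the plugin. The sketch below compares two dot-separated version strings numerically; `version_at_least` is a hypothetical helper, and how you obtain the version of your running IRIDA instance is left out, so the version is passed in by hand.

```shell
# Hypothetical helper: succeed (exit 0) when version $1 >= version $2.
# Fields are split on "." and compared numerically, left to right;
# missing fields count as 0 (so 21.01 equals 21.01.0).
version_at_least() (
    awk -v have="$1" -v want="$2" 'BEGIN {
        n = split(have, h, ".")
        m = split(want, w, ".")
        max = (n > m) ? n : m
        for (i = 1; i <= max; i++) {
            a = (i <= n) ? h[i] + 0 : 0
            b = (i <= m) ? w[i] + 0 : 0
            if (a > b) exit 0
            if (a < b) exit 1
        }
        exit 0
    }'
)

# Example: the first argument is whatever version your IRIDA instance reports.
# version_at_least 22.03 21.01 && echo "IRIDA is new enough for this plugin"
```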
The plugin should now show up in the Analyses > Pipelines section of IRIDA.
You should be able to run a pipeline with this plugin and get analysis results. The results include a `kraken2` taxonomic classification report, and a `bracken` estimate of the relative abundance of reads from each species in your sample.
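To inspect a bracken abundance table outside IRIDA, you can rank taxa by their estimated fraction of reads. The sketch below assumes the standard tab-separated Bracken output layout (`name`, `taxonomy_id`, `taxonomy_lvl`, `kraken_assigned_reads`, `added_reads`, `new_est_reads`, `fraction_total_reads`) with a single header line. The function name `top_bracken_taxa` is hypothetical, and this is not necessarily how the plugin itself selects the most abundant species.

```shell
# Hypothetical helper: print the top N (default 5) taxa from a Bracken
# abundance table ($1) as "name<TAB>taxonomy_id<TAB>fraction_total_reads".
# Assumes a one-line header and the fraction in column 7.
top_bracken_taxa() {
    tail -n +2 "$1" |
        LC_ALL=C sort -t "$(printf '\t')" -k7,7nr |   # numeric, descending, on column 7
        head -n "${2:-5}" |
        cut -f1,2,7
}

# Example (the file name is a placeholder):
# top_bracken_taxa sample_bracken_abundances.tsv 5
```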
You should also be able to save and view these results in the IRIDA metadata table. The following fields are written to the IRIDA 'Line List':
Field Name | Description |
---|---|
species-abundance/taxonomy_level | The taxonomic level at which reads were aggregated ('S' for species) |
species-abundance/taxon_name | The scientific name of the most abundant species in the sample |
species-abundance/taxonomy_id | The NCBI taxonomy ID for the most abundant species in the sample |
species-abundance/proportion | The proportion of reads in this sample assigned to the most abundant species |
species-abundance/taxon_name_2 | The scientific name of the second-most abundant species in the sample |
species-abundance/taxonomy_id_2 | The NCBI taxonomy ID for the second-most abundant species in the sample |
species-abundance/proportion_2 | The proportion of reads in this sample assigned to the second-most abundant species |
species-abundance/taxon_name_3 | The scientific name of the third-most abundant species in the sample |
species-abundance/taxonomy_id_3 | The NCBI taxonomy ID for the third-most abundant species in the sample |
species-abundance/proportion_3 | The proportion of reads in this sample assigned to the third-most abundant species |
species-abundance/taxon_name_4 | The scientific name of the fourth-most abundant species in the sample |
species-abundance/taxonomy_id_4 | The NCBI taxonomy ID for the fourth-most abundant species in the sample |
species-abundance/proportion_4 | The proportion of reads in this sample assigned to the fourth-most abundant species |
species-abundance/taxon_name_5 | The scientific name of the fifth-most abundant species in the sample |
species-abundance/taxonomy_id_5 | The NCBI taxonomy ID for the fifth-most abundant species in the sample |
species-abundance/proportion_5 | The proportion of reads in this sample assigned to the fifth-most abundant species |
species-abundance/proportion_unclassified | The proportion of unclassified reads in the sample |
Note that by default, these fields will not appear in sorted order in the line list. Refer to the IRIDA Documentation on metadata management to create a customized view of these fields.
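When building a customized view, it helps to have the field names listed in the order you want them to appear. The sketch below prints the field names from the table above in a logical order (taxonomy level, then name, ID and proportion for each rank, then the unclassified proportion). The function name `species_abundance_fields` is hypothetical, and the output is only a checklist; the view itself is still created through IRIDA.

```shell
# Hypothetical helper: print the line-list field names in a readable order.
species_abundance_fields() (
    prefix='species-abundance'
    echo "$prefix/taxonomy_level"
    for rank in 1 2 3 4 5; do
        # Rank 1 fields carry no numeric suffix; ranks 2 to 5 use _2 ... _5.
        if [ "$rank" -eq 1 ]; then suffix=''; else suffix="_$rank"; fi
        for field in taxon_name taxonomy_id proportion; do
            echo "$prefix/$field$suffix"
        done
    done
    echo "$prefix/proportion_unclassified"
)

species_abundance_fields
```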
Building and packaging this code is accomplished using Apache Maven. However, you will first need to install IRIDA to your local Maven repository. The version of IRIDA you install will have to correspond to the version found in the `irida.version.compiletime` property in the `pom.xml` file of this project. Right now, this is IRIDA version `19.01.3`.
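Rather than relying on the version quoted in this README, you can read the property straight from `pom.xml`. The following is a minimal sketch: `irida_compile_version` is a hypothetical helper, and it assumes the property's opening and closing tags sit on a single line.

```shell
# Hypothetical helper: print the value of the irida.version.compiletime
# property from a pom.xml file ($1). Assumes the element is on one line.
irida_compile_version() {
    sed -n 's:.*<irida\.version\.compiletime>\(.*\)</irida\.version\.compiletime>.*:\1:p' "$1"
}

# Example, run from the root of this project:
# irida_compile_version pom.xml
```

The printed value is the IRIDA version to check out and install into your local Maven repository.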
To install IRIDA to your local Maven repository, please do the following:

- Clone the IRIDA project

      git clone https://github.com/phac-nml/irida.git
      cd irida

- Check out the appropriate version of IRIDA

      git checkout -b 21.01 21.01

- Install IRIDA to your local repository

      mvn clean install -DskipTests

Once you've installed IRIDA as a dependency, you can proceed to building this plugin. Please run the following commands:

    cd irida-plugin-species-abundance
    mvn clean package

Once complete, you should end up with a file `target/irida-plugin-species-abundance-0.1.0.jar` which can be installed as a plugin to IRIDA.
The following dependencies are required in order to make use of this plugin.