-
Notifications
You must be signed in to change notification settings - Fork 3
/
Copy pathaxiome.1.in
287 lines (266 loc) · 21 KB
/
axiome.1.in
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
.\" Authors: Andre Masella, Michael Hall
.TH axiome 1 "May 2013" "1.7" "USER COMMANDS"
.SH NAME
axiome \- Automation toolkit for QIIME
.SH SYNOPSIS
.B axiome
.I input.ax
.SH DESCRIPTION
AXIOME is a way of automating tedious and error-prone analysis with the QIIME toolkit. Given a file of input data and analyses, it generates a \fBMakefile\fR to perform the analyses. Some analyses make use of
.BR R (1)
and additional R packages may need to be installed from CRAN. This can be done using
.BR aq-inst-cran (1).
Formerly known as AutoQIIME.
.SH OPTIONS
.TP
input.ax
The input XML configuration file.
.SH WORKFLOW
Normally, one creates a input file with a \fB.ax\fR extension. After running AXIOME, a directory with a \fB.qiime\fR extension will be created with \fBMakefile\fR, \fBmapping.txt\fR, \fBmapping.extra\fR and possibly others depending on the analyses selected. In this directory, invoking \fBmake\fR will start the analyses. Normally, each analysis is done in the order specified in the original file, but this can be overriden by invoking \fBmake \fItarget\fR where \fItarget\fR is the output file desired (e.g., \fBotu_table.txt\fR).
If \fBmake\fR is interrupted, invoking it again will continue, repeating only the last command. Furthermore, if input files are changed, \fBmake\fR will attempt to return the minimum number of commands needed to update the analysis. Sometimes, this does not work as expected and deleting \fBseq.fasta\fR will guarantee all files will be rebuilt.
.SH CONFIGURATION FILE
Included is a sample configuration file with all possible analyses enabled. The configuration file has four components: pipeline, definitions, assemblies, and analyses.
.B <?xml version="1.0"?>
.br
.br
.I Pipeline
.br
.I Definitions
.br
.I Assemblies
.br
.I Analyses
.br
.B </axiome>
The pipeline definitions differ between QIIME and mothur. If the pipeline parameter shown below is omitted, QIIME will be used.
QIIME pipeline definition:
.br
.B <axiome version="@VERSION@"
.br
\fBpipeline="\fIqiime\fB"\fR
.br
\fR[\fBalign-method="\fImethod\fB"\fR]
.br
\fR[\fBclassification-method="\fImethod\fB"\fR]
.br
\fR[\fBcluster-identity="\fIvalue\fB"\fR]
.br
\fR[\fBotu-blastdb="\fI/path/to/blastdb\fB"\fR]
.br
\fR[\fBotu-flags="\fIflags\fB"\fR]
.br
\fR[\fBotu-method="\fImethod\fB"\fR]
.br
\fR[\fBotu-refseqs="\fI/path/to/refseqs\fB"\fR]
.br
\fR[\fBphylogeny-method="\fImethod\fB"\fR]
.br
\fR[\fBverbose="\fIvalue\fB"\fR]
.br
\fB>\fR
mothur pipeline definition:
.B <axiome version="@VERSION@"
.br
\fBpipeline="\fImothur\fB"\fR
.br
\fBalignment-template="\fI/path/to/alignment-model\fB"\fR
.br
\fBclassification-taxa="\fI/path/to/classifier-taxonomy\fB"\fR
.br
\fBclassification-seqs="\fI/path/to/classifier-sequences\fB"\fR
.br
\fR[\fBcluster-identity="\fIvalue\fB"\fR]
.br
\fR[\fBotu-method="\fImethod\fB"\fR]
.br
\fR[\fBverbose="\fIvalue\fB"\fR]
.br
\fB>\fR
The version is used to check that a file is not newer than the version of AXIOME being used.
By default, AXIOME will try to be quiet and hide the (ugly) commands used to do the analysis. If you wish to see them, set \fBverbose="true"\fR or run \fBmake V=\fR.
A mothur analysis must have an alignment model specified with \fBalignment-model\fR, a classification taxonomy file specified with \fBclassification-taxa\fR, and a classification sequences file specified with \fBclassification-seqs\fR.
For QIIME, the taxonomic classification method can be selected with the \fBclassification-method\fR parameter. There is only one option for mothur (its built-in classifier), configured with the options mentioned in the previous paragraph. Options for QIIME are: \fBrdp\fR (default), \fBblast\fR, \fBrtax\fR. For each of these, there is a plugin to configure the classifier. These are called rdp, blast-classifier, and rtax, respectively (see below for details). For RDP and BLAST, if the configuration plugins are not used, it will default to using the information in the qiime_config file. If you are using Rtax, the rtax plugin must be used.
The method used to pick OTUs can be defined by the otu-method parameter. For QIIME, options are: \fBusearch\fR, \fBusearch_ref\fR, \fBprefix_suffix\fR, \fBmothur\fR, \fBtrie\fR, \fBblast\fR, \fBuclust_ref\fR, \fBcdhit\fR, \fBraw-cdhit\fR, \fBuclust\fR and \fBraw-uclust\fR. The \fBraw\fR options make direct calls to the tools rather than using QIIME's \fBpick_otus.py\fR script. Multicore support is used for \fBraw-cdhit\fR, \fBuclust_ref\fR and \fBblast\fR methods when the multicore plugin is used. \fBusearch_ref\fR and \fBuclust_ref\fR require the \fBotu-refseqs\fR parameter be set to the filepath of reference sequences. \fBblast\fR requires that either the \fBotu-refseqs\fR parameter or the \fBotu-blastdb\fR parameter is set to a valid reference file (but not both). The \fBotu-flags\fR option allows passing the QIIME scripts any arbitary flags that the \fBpick_otus.py\fR script accepts (but note that for each method, some flags may already be set).
For mothur, otu-method options are: \fBaverage\fR (default), \fBnearest\fR, \fBfurthest\fR.
The OTU clustering identity (similarity) can be set with \fBcluster-identity\fR with a value between 0 and 1; default is 0.97 if left blank.
The alignment method used by QIIME can be specified with \fBalign-method\fR as one of \fBinfernal\fR, \fBmuscle\fR, and \fBpynast\fR; the default is \fBpynast\fR. The phylogeny method can be set with \fBphylogeny-method\fR, with valid options: \fBfasttree\fR, \fBfasttree_v1\fR, \fBraw-fasttree\fR, \fBraw-fasttreemp\fR (uses multicore FastTreeMP, which must be installed manually), \fBclustalw\fR, \fBclearcut\fR, \fBraxm\fRl, \fBraxml_v730\fR, or \fBmuscle\fR. If not included, AXIOME defaults to \fBfasttree\fR. The \fBraw-fasttree\fR option calls FastTree without using QIIME's \fBmake_phylogeny.py\fR script.
Definitions are a list of properties associated with the samples. These are usual physical or chemical information such as sampling time, pH, temperature, humidity, and location. As many properties as needed can be specified and each property can have a type so that incorrect information cannot be entered (e.g., enforcing that "year" be a whole number).
Assemblies are descriptions of the samples. AXIOME can automatically assemble pair-end Illumina FASTQ data using
.BR pandaseq (1)
or read data from FASTA files. In either case, a single file might contain multiple samples multiplexed. The format of this section allows all the input files to be specified and provide information about how to separate samples and the properties, defined above, to be associated with each sample.
Finally, the file lists the analyses to be performed. AXIOME will do the analyses in the order specified. Some analyses can be repeated with different parameters and some can be specified only once.
.TP
\fB<!-- \fIcomment\fB -->\fR
This is a comment and will be ignored.
.TP
\fB<include>\fIfilename\fB</include>\fR
Includes the contents of another file as if they were typed in the location where this command is. This may be used anywhere in the configuration file and may be used multiple times. A file must not include itself, directly or indirectly.
.TP
\fB<def name="\fIname\fB" \fR[\fBtype="\fItype\fB"\fR]\fB/>\fR
The configration file must specify all the types of data we intend to associate with the samples. For instance, if we collected the collected the samples from different times, or different environments, or different substrates, then we define those here. Each definition can have a type. The type \fBd\fR is a decimal number, the type \fBi\fR is an integer, the type \fBs\fR is for text. If absent, the type is assumed to be \fBs\fR. The names \fBColour\fR and \fBDescription\fR are used by certain analyses. For each definition supplied, each sample must have a part \fIname\fB="\fIvalue\fB"\fR where \fIvalue\fR is specific to each sample.
.TP
\fB<fasta file="\fIfile.fasta\fB">\fR...\fB</fasta>\fR
Existing FASTA files can be used as sources for sequences. This file may be compressed with
.BR gzip "(1) or"
.BR bzip2 (1).
Sequences must be renamed to be compliant with QIIME's naming conventions. A single FASTA file can contain many samples, if desired. For each sample, include \fB<sample regex="\fIregex\fB" \fR[\fBlimit="\fIlimit\fB"\fR]\fB \fIdefs\fB/>\fR where \fIdefs\fR includes a value for each definition noted above. To only allow some of the sequences, set \fIlimit\fR to the desired number of sequences.
To separate out multiple samples in each file, \fIregex\fR specifies a regular expression, described in
.BR regex (7),
and any sequences with FASTA headers matching that regular expression will be associated with that sample. To match all sequences, specify \fBregex="."\fR. If the sequences have a keyword, specifying it should be sufficient. For instance, all the sequences starting with \fBSALT\fR by specifying \fBregex="^SALT"\fR or only sequences with a numeric identifier by saying \fBregex="^[0-9]*$"\fR.
.TP
\fB<panda forward="\fIforward.fastq\fB" reverse="\fIreverse.fastq\fB" version="\fIversion\fB" \fR[\fBfprimer="\fIACT...\fB"\fR]\fB \fR[\fBrprimer="\fITAG...\fB"\fR]\fB \fR[\fBfprimer="\fIACT...\fB"\fR]\fB \fR[\fBthreshold="\fIthreshold\fB"\fR]\fB>\fR...\fB</panda>\fR
Assemble sequences using PANDAseq, if installed. Two files, the forward and reverse reads, must be supplied and they may be compressed with
.BR gzip "(1) or"
.BR bzip2 (1).
Optionally, the forward and reverse primers can be specified as either a nucleotide string, which can contain degenerate nucleotides (e.g., \fBW\fR), or, the number bases to trim from the read. Primer names can also be specified from a database, stored in \fB@prefix@/share/@PACKAGE@/primers.lst\fR, either name or \fB#\fR followed by the name to use the length of that primer.
The version of the CASAVA pipeline that generated the sequences must be specified. If the sequences are in the “old” Illumina format (i.e., not FASTQ), specify \fBversion="1.3"\fR and they will be converted. The newest sequences with PHRED+33-style quality scores are \fBversion="1.8"\fR. Versions 1.4 through 1.7 have identical data formats. If unsure, examine the data files. If every entry is one line, then it is version 1.3. If the quality scores have of many sequences have long stretches of \fBB\fR at the end, the it is version 1.4. If they quality scores have many stretches of \fB#\fR, then it is 1.8.
Since read files are often multiplexed, multiple samples can be specified. For each sample, include \fB<sample tag="\fItag\fB" \fR[\fBlimit="\fIlimit\fB"\fR]\fB \fIdefs\fB/>\fR where \fIdefs\fR includes a value for each definition noted above. Specify the Illumina sequencing bar code as \fItag\fR. If tag is set to '*', all sequences in the input files will be used, but only one sample per forward/reverse file pair may be defined. To only allow some of the sequences, set \fIlimit\fR to the desired number of sequences.
.TP
\fB<alpha/>\fR
Do a basic alpha diversity analysis (i.e., QIIME's Chao1 curves). Available pipelines: QIIME, mothur
.TP
\fB<beta \fR[\fBlevel="\fIlevel\fB"\fR]\fB \fR[\fBsize="\fIsize\fB"\fR]\fB \fR[\fBtaxa="\fIdepth\fB"\fR]\fB \fR[\fBbackground="\fIcolour\fB"\fR]\fB/>\fR
Do a QIIME beta diversity analysis and produce biplots and bubble plots. QIIME normally uses summarised taxa for the plots, so the taxonomic level can be specified; if it is omitted, OTUs are used instead. The library may be rarefied to a particular size by specifying \fIsize\fR or it may be set to \fBauto\fR to use the smallest sample size; if not specified, the library is not rarefied, which is probably incorrect. Specifying \fItaxa\fR can limit the number of taxa that appear in the biplot; the default is 10 and \fBall\fR can be specified if desired. The background colour can also be specified using the \fBbackground\fR attribute. The colour is interpreted by QIIME. The default is \fBwhite\fR, despite \fBblack\fR seeming to be the popular choice. Available pipeline: QIIME
.TP
\fB<blast \fR[\fBtitle="\fItitle\fB"\fR]\fB \fR[\fBcommand="\fIblastdbcommand\fB"\fR]\fB/>\fR
Build a BLAST database over the sequences. This requires
.BR formatdb (1)
or
.BR makeblastdb (1)
to be installed. A script called \fBblast\fR will be produced that will invoke
.BR blastall (1)
pointing to the appropriate database. The database may optionally be given a friendly name, \fItitle\fR. If one is not provided, it will be inferred from the file name. Valid command options are: formatdb, makeblastdb. Default is formatdb. Available pipeline: QIIME
.TP
\fB<blast-classifier \fR[\fBdb="\fI/path/to/database\fB"\fR] [\fBseqfile="\fI/path/to/seqfile\fB"\fR] [\fBtaxfile="\fI/path/to/taxfile\fB"\fR]\fB/>\fR
Configuration plugin when using BLAST as the classification method through QIIME. Only one of the \fBdb\fR or \fBseqfile\fR parameters can be used. If \fBseqfile\fR is used, QIIME will create a temporary BLAST database. The taxfile parameter specifies a tab delineated file which maps sequences to their taxonomy. Available pipelines: QIIME
.TP
\fB<compare level="\fIlevel\fB"/>\fR
Compare libraries on a plot to see if one is a superset of another. This is useful if one sample which is (possibly) a subsample of another. This uses summarised taxa, not OTUs, so the taxonomic level for the analysis must be specified. Available pipeline: QIIME
.TP
\fB<duleg \fR[\fBp="\fIplimit\fB"\fR]\fB/>\fR
Compute Dufrene-Legendre indicator species analysis in R using optional parameter \fIp\fR as p-value cut-off (default is 0.05). For more information, see
.BR aq-duleg (1), aq-otudulegmerge (1).
Available pipelines: QIIME, mothur
.TP
\fB<heatmap/>\fR
Create an OTU heatmap based on the OTU table using QIIME's make_otu_heatmap_html.py script. Available pipeline: QIIME
.TP
\fB<jackknife [size="\fIsize\fB"]/>\fR
Create 2D and 3D PCoA plots based on jackknifing (repeated subsampling) of the OTU table using QIIME's jackknifed_beta_diversity.py script. Size must be a positive number that is no larger than the largest number of sequences in a sample. Available pipeline: QIIME
.TP
\fB<mrpp [method="\fImethod\fB"]/>\fR
Compute Multi Response Permutation Procedure of within-versus among-group dissimilarities in R using one of the following options: "manhattan", "euclidean", "canberra", "bray", "kulczynski", "jaccard", "gower", "altGower", "morisita", "horn", "mountford", "raup", "binomial", "chao" or "cao". If no method supplied, defaults to Bray-Curtis ("bray"). For more information, see
.BR aq-mrpp (1).
Available pipelines: QIIME, mothur
.TP
\fB<multicore num-cores="\fInumber\fB"/>\fR
Specifies that the analysis is being run on a multicore system. A value for the number of cores must be specified, and all multicore capable analyses will use this number of cores. It must be less than or equal to the number of cores available to the system. This definition must be placed above all other analysis definitions that you wish to run using multiple cores. Please note that not all steps support multiple cores. Available pipelines: QIIME, mothur
.TP
\fB<nmf-concordance/>\fR
Create a concordance plot for non-negative matrix factorization. The actual NMF analysis, described below, requires a degree and a concordance plot will show the suitability of a particular degree for the data. The degrees that appear as local maxima in the concordance plot are suitable to try for NMF analysis. See
.BR aq-nmf-concordance (1)
for more information. Available pipelines: QIIME, mothur
.TP
\fB<nmf degree="\fIdegree\fB"/>\fR
Do a non-negative matrix factorization of the data creating \fIdegree\fR bases, where this degree has been selected from the concordance plot generated above. For more information, see
.BR aq-nmf (1).
Available pipelines: QIIME, mothur
.TP
\fB<nmf-auto/>\fR
First, plots a concordance plot for NMF as with nmf-concordance, and using this it tries to find local maxima to run the main NMF degree on. You should use this instead of (and not in conjunction with) nmf and nmf-concordance. Note that it may not find a maxima from the concordance plot, and thus may not create a NMF plot. Available pipelines: QIIME, mothur
.TP
\fB<pca/>\fR
Perform principal components in R using the numeric variables supplied in the definitions. While QIIME's biplots use Unifrac distances, this PCA uses the taxa information and any numeric supplemental information. Available pipelines: QIIME, mothur
.TP
\fB<pcoa [method="\fImethod\fB"] [ellipsoid-confidence="\fIvalue\fB"/>\fR
Perform principal coordinate analysis in R using one of the following options: "manhattan", "euclidean", "canberra", "bray", "kulczynski", "jaccard", "gower", "altGower", "morisita", "horn", "mountford", "raup", "binomial", "chao" or "cao". If no method supplied, defaults to Bray-Curtis ("bray"). If a value between 0 and 1 is supplied to ellipsoid-confidence, it will plot ellipsoids around each of the a priori groups supplied in the metadata, using the specified value as a confidence interval. Plots a MDS plot, and if there are 2 or more numerical properties, it will plot a biplot. See
.BR aq-pcoa (1).
Available pipelines: QIIME, mothur
.TP
\fB<qualityanal/>\fR
Produce some data on the quality of the raw Illumina reads. This only applies to reads specified to PANDAseq. For more information, see
.BR aq-qualityanal (1).
Available pipelines: QIIME, mothur
.TP
\fB<rankabundance/>\fR
Make a rank-abundance plot using QIIME. Available pipeline: QIIME
.TP
\fB<rdp [confidence="\fIvalue\fB"] [taxfile="\fIfilepath\fB"] [seqfile="\fIfilepath\fB"] [max-memory="\fIvalue\fB"]/>\fR
Modifies the RDP classifier flags. Confidence alters the RDP confidence cutoff value. Requires a value between 0 and 1, and default is 0.8. Taxfile and seqfile arguments require full filepaths to the RDP taxonomy file and its related sequence file. If left blank, it will use the default values specified in the QIIME config file (likely GreenGenes). This could be used to train RDP on 18s reference sequences, for example, and these classifications will be used in all downstream analyses. The "max-memory" option changes the amount of memory given to RDP. Set to a high enough value to prevent RDP from silently thrashing (particularly when retraining). Available pipeline: QIIME
.TP
\fB<rtax \fR\fBread-1="\fI/path/to/forward_read.fa\fB"\fR \fR\fBread-2="\fI/path/to/reverse_read.fa\fB"\fR \fBseqfile="\fI/path/to/seqfile\fB"\fR \fBtaxfile="\fI/path/to/taxfile\fB"\fR\fB/>\fR
Configuration plugin when using Rtax as the classification method through QIIME. Each of the parameters is required. The \fBread-1\fR and \fBread-2\fR parameters are the filepath to the pre-clustering sequences in FASTA format. The \fBtaxfile\fR parameter specifies a tab delineated file which maps the sequences in \fBseqfile\fR to their taxonomy. Available pipelines: QIIME
.TP
\fB<taxaplot/>\fR
Make bar and area charts of the taxa. Results placed in taxaplot directory of .qiime folder. Available pipelines: QIIME, mothur (requires QIIME to be installed)
.TP
\fB<uchime/>\fR
Do de novo chimera checking with UCHIME (requires USEARCH v6.1). This will automatically remove any chimeras after clustering has completed. Available pipeline: QIIME
.TP
\fB<unifrac-mrpp \fR[\fBsize="\fIsize\fB"\fR]\fB/>\fR
Compute Multi Response Permutation Procedure of within- versus among-group dissimilarities in R. For more information, see
.BR aq-mrpp-unifrac (1).
The library may be rarefied to a particular size by specifying \fIsize\fR or it may be set to \fBauto\fR to use the smallest sample size; if not specified, the library is not rarefied, which is probably incorrect. Available pipeline: QIIME
.TP
\fB<venn/>\fR
Create Venn diagrams that plots abundance of sequences in all metadata groups where the number of clusters is less than 5 and greater than 1, see
.BR aq-venn (1).
.TP
\fB<withseqs/>\fR
Produce an OTU table with the representatives sequences in a final column. Available pipeline: QIIME
.SH ENVIRONMENT VARIABLES
.TP
QIIME_PREFIX
Prefix before QIIME commands, used for special installations such as MacQIIME or BioLinux. (e.g., \fBQIIME_PREFIX="macqiime "\fR for MacQIIME, or \fBQIIME_PREFIX="qiime "\fR for BioLinux)
.TP
INFERNAL_MODEL
Full path to the alignment Infernal model, usually \fBseed.16s.reference_model.sto\fR.
.TP
PANDA_FLAGS
Extra command line arguments for
.BR pandaseq (1).
This is where \fB-T\fR should be specified if desired.
.SH SEE ALSO
\fB@prefix@/share/doc/@PACKAGE@/sample.ax\fR,
\fB@prefix@/share/@PACKAGE@/primers.lst\fR,
.BR aqxs (1),
.BR aq-inst-cran (1),
.BR pandaseq (1),
.BR make (1).
The following are components used by AXIOME or supplemental tools:
.BR aq-base (1),
.BR aq-biplot (1),
.BR aq-bubbleplot (1),
.BR aq-cmplibs (1),
.BR aq-count-n (1),
.BR aq-estimateq (1),
.BR aq-duleg (1),
.BR aq-demux-illumina (1),
.BR aq-fasta-length (1),
.BR aq-fastq2oldillumina (1),
.BR aq-filter-fastq-known (1),
.BR aq-joinn (1),
.BR aq-libcontrib (1),
.BR aq-marry-illumina-index (1),
.BR aq-mkrepset (1),
.BR aq-mrpp (1),
.BR aq-nmf (1),
.BR aq-nmf-concordance (1),
.BR aq-oldillumina2fastq (1),
.BR aq-otu2lnotu (1),
.BR aq-otu2pcord (1),
.BR aq-otubinner (1),
.BR aq-otuhistogram (1),
.BR aq-otunolineage (1),
.BR aq-otuplothistogram (1),
.BR aq-otusum (1),
.BR aq-otutop (1),
.BR aq-otuwithseqs (1),
.BR aq-pca (1),
.BR aq-pcoa (1),
.BR aq-pretendsummarize (1),
.BR aq-qualhisto (1),
.BR aq-qualityanal (1),
.BR aq-rareotuwithlineage (1),
.BR aq-sort-fasta (1),
.BR aq-syntheticfastq (1).