Commit

Showing 10 changed files with 136 additions and 235 deletions.
15 changes: 0 additions & 15 deletions CITATION.cff

This file was deleted.

Binary file removed DEV-MANUAL.pdf
Binary file not shown.
217 changes: 0 additions & 217 deletions USER-MANUAL.md

This file was deleted.

9 changes: 9 additions & 0 deletions pyproject.toml
@@ -38,6 +38,15 @@ dependencies = [
]
dynamic = ["version"]

[tool.hatch.build.targets.sdist]
exclude = [
"/.github",
"/docs",
]

[tool.hatch.build.targets.wheel]
packages = ["src/seismicrna"]

[tool.hatch.version]
path = "src/seismicrna/__init__.py"

2 changes: 1 addition & 1 deletion src/seismicrna/__init__.py
@@ -24,7 +24,7 @@

warnings.simplefilter(action='ignore', category=FutureWarning)

- __version__ = "0.7.0"
+ __version__ = "0.7.1"


########################################################################
File renamed without changes
File renamed without changes.
70 changes: 70 additions & 0 deletions src/userdocs/manuals/inputs.rst
@@ -0,0 +1,70 @@

Input file patterns
========================================================================

For commands that take a list of input files as positional arguments,
you can specify the files in several ways and choose whichever is most
convenient.

Example output directory
------------------------------------------------------------------------

Assume that the output directory has these contents::

    sample_1/
        align/
            ref_1.bam
            ref_2.bam
        relate/
            ref_1/
                batch-relate-0.parquet
                report-relate.json
            ref_2/
                batch-relate-0.parquet
                report-relate.json
        mask/
            ref_1/
                batch-mask-0.csv.gz
                report-mask.json
            ref_2/
                batch-mask-0.csv.gz
                report-mask.json
    sample_2/
        align/
            ref_1.bam
            ref_2.bam
        relate/
            ref_1/
                batch-relate-0.parquet
                report-relate.json
            ref_2/
                batch-relate-0.parquet
                report-relate.json
        mask/
            ref_1/
                batch-mask-0.csv.gz
                report-mask.json
            ref_2/
                batch-mask-0.csv.gz
                report-mask.json

List of files
------------------------------------------------------------------------

The simplest way is to list every input file explicitly. For example,
two report files could be processed with the ``table`` command like this::

    seismic table sample_1/relate/ref_1/report-relate.json sample_2/mask/ref_2/report-mask.json

Glob patterns
------------------------------------------------------------------------

Listing many input files explicitly would be tedious. `Glob patterns`_
use wildcard characters to match many paths with a single expression.
This method is especially useful for matching files that have the same
names and are located in different directories, such as all the report
files for the reference ``ref_2``.
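
As a sketch of how such a pattern behaves, assume the example output directory and standard glob semantics, in which ``*`` matches any characters within a single path component. Under those assumptions, a pattern such as ``*/*/ref_2/report-*.json`` would be expected to match the four report files for ``ref_2``. The Python snippet below rebuilds the example tree in a temporary directory and checks this with the standard ``glob`` module. The pattern and the helper names ``build_example_tree`` and ``match`` are illustrative assumptions and are not taken from SEISMIC-RNA.

```python
import glob
import os
import tempfile

SAMPLES = ("sample_1", "sample_2")
REFS = ("ref_1", "ref_2")


def build_example_tree(root):
    """Create empty files that mirror the example output directory."""
    paths = []
    for sample in SAMPLES:
        for ref in REFS:
            paths.append(os.path.join(sample, "align", f"{ref}.bam"))
            for step, batch_ext in (("relate", "parquet"), ("mask", "csv.gz")):
                step_dir = os.path.join(sample, step, ref)
                paths.append(os.path.join(step_dir, f"batch-{step}-0.{batch_ext}"))
                paths.append(os.path.join(step_dir, f"report-{step}.json"))
    for rel in paths:
        full = os.path.join(root, rel)
        os.makedirs(os.path.dirname(full), exist_ok=True)
        open(full, "w").close()


def match(root, pattern):
    """Return the sorted paths (relative to root) that match a glob pattern."""
    hits = glob.glob(os.path.join(root, pattern))
    return sorted(os.path.relpath(hit, root).replace(os.sep, "/") for hit in hits)


with tempfile.TemporaryDirectory() as root:
    build_example_tree(root)
    # Hypothetical pattern: any sample, any step, reference ref_2, any report.
    ref_2_reports = match(root, "*/*/ref_2/report-*.json")

for path in ref_2_reports:
    print(path)
```

On the command line, the shell normally expands an unquoted pattern before the program starts, so the program receives the matched paths as ordinary positional arguments.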


.. _glob patterns: https://en.wikipedia.org/wiki/Glob_(programming)
5 changes: 3 additions & 2 deletions src/userdocs/manuals/steps/align.rst
@@ -290,8 +290,9 @@ then try the following steps (in this order):
``samtools view -f 4 temp/sample/align/align-2_align/refs.sam -o x``
where ``sample``, ``refs``, and ``x`` are replaced with the name of
the sample, name of the FASTA file, and name of the SAM file into
- which to write the unaligned reads, respectively. Use `BLAST`_ to
- identify the source of the unaligned reads, to infer the problem.
+ which to write the unaligned reads, respectively. Open the SAM file,
+ select several unaligned reads randomly, and use `BLAST`_ to discern
+ their origins, which can help in deducing what went wrong.


.. _FastQC: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/
53 changes: 53 additions & 0 deletions src/userdocs/manuals/steps/relate.rst
@@ -0,0 +1,53 @@

Relate each read to every reference position
------------------------------------------------------------------------

Input files for relate
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Relate requires exactly one FASTA file containing one or more reference
sequences and any number of alignment map files in BAM format.

.. note::
    The references in the FASTA file must match those to which the reads
    were aligned to produce the BAM file(s); both the names and the
    sequences must be identical. If the names differ, then BAM files
    that use the old names will be ignored; if the sequences differ,
    then reads can yield erroneous relation vectors or fail outright.
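
A quick way to catch mismatched names before running relate is to compare the reference names in the FASTA file with the BAM file names. The sketch below is only an illustration, not SEISMIC-RNA's own check. It assumes that a reference name is the text after ``>`` up to the first whitespace, and that each BAM file is named after its reference, as described under the BAM file path conventions. The function names ``read_fasta_names`` and ``find_unmatched_bams`` are hypothetical. It does not compare sequences.

```python
from pathlib import Path


def read_fasta_names(lines):
    """Collect reference names from FASTA header lines.

    Assumption: the name is the text after '>' up to the first whitespace.
    """
    names = []
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.append(fields[0])
    return names


def find_unmatched_bams(bam_paths, fasta_names):
    """Return the BAM paths whose file name (minus '.bam') is not a FASTA name."""
    known = set(fasta_names)
    return [path for path in bam_paths if Path(path).stem not in known]


# Made-up FASTA content in which ref_2 was renamed after alignment.
fasta_lines = [">ref_1", "GGATCC", ">ref_2_renamed", "ACGTAC"]
bam_paths = ["sample_1/align/ref_1.bam", "sample_1/align/ref_2.bam"]

names = read_fasta_names(fasta_lines)
unmatched = find_unmatched_bams(bam_paths, names)
print(unmatched)
```

Any path reported here points to a BAM file whose reference name no longer appears in the FASTA file.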

One BAM file
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""

A single BAM file (``sample_1/align/ref_1.bam``) can be run as follows::

    seismic relate refs.fa sample_1/align/ref_1.bam

Multiple BAM files
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""

Multiple BAM files can be run in parallel by giving multiple paths::

    seismic relate refs.fa sample_1/align/ref_1.bam sample_1/align/ref_2.bam

and/or by using `glob patterns`_::

    seismic relate refs.fa sample_*/align/ref_1.bam sample_*/align/ref_2.bam

.. _glob patterns: https://en.wikipedia.org/wiki/Glob_(programming)

BAM file content conventions
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""

Generally, a BAM file can contain reads that have aligned to any number
of references, as well as unaligned reads. However, SEISMIC-RNA requires
that each BAM file contain reads aligned to exactly one reference. This
restriction enables the relate step to process BAM files in parallel,
which increases the speed. If the BAM files were created using ``seismic
align``, then they will already follow this convention.

BAM file path conventions
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""

The name of the BAM file (minus the extension ``.bam``) must be the name
of the reference to which it was aligned. It must be inside a directory
named ``align``, which must be inside a directory named after the sample
from which the reads came. If the BAM files were created using ``seismic
align``, then they will already follow this convention.
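
The path convention can be expressed as a small check. The following Python sketch is an illustration under the rules stated above, not SEISMIC-RNA's actual path parser. The function name ``parse_bam_path`` is hypothetical. It extracts the sample and reference names from a path shaped like ``<sample>/align/<ref>.bam`` and raises an error otherwise.

```python
from pathlib import Path


def parse_bam_path(path):
    """Return (sample, reference) for a path like <sample>/align/<ref>.bam."""
    p = Path(path)
    if p.suffix != ".bam":
        raise ValueError(f"{path!r} does not end with '.bam'")
    if p.parent.name != "align":
        raise ValueError(f"{path!r} is not inside a directory named 'align'")
    sample = p.parent.parent.name
    if not sample:
        raise ValueError(f"{path!r} has no sample directory above 'align'")
    # The file name minus '.bam' is taken as the reference name.
    return sample, p.stem


sample, ref = parse_bam_path("sample_1/align/ref_1.bam")
print(sample, ref)
```

A path such as ``sample_1/ref_1.bam`` would be rejected by this sketch because the file is not inside a directory named ``align``.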

0 comments on commit ddaff9c