Update docs on +addclust
matthewfallan committed Jan 14, 2024
1 parent 6026002 commit 4e6c281
Showing 57 changed files with 945 additions and 95 deletions.
Binary file modified docs/.doctrees/api/seismicrna.cluster.doctree
Binary file not shown.
Binary file modified docs/.doctrees/api/seismicrna.core.doctree
Binary file not shown.
Binary file modified docs/.doctrees/api/seismicrna.core.io.doctree
Binary file not shown.
Binary file not shown.
Binary file modified docs/.doctrees/api/seismicrna.core.mu.doctree
Binary file not shown.
Binary file modified docs/.doctrees/api/seismicrna.wf.doctree
Binary file not shown.
Binary file modified docs/.doctrees/environment.pickle
Binary file not shown.
Binary file added docs/.doctrees/howto/addclust.doctree
Binary file not shown.
Binary file modified docs/.doctrees/howto/index.doctree
Binary file not shown.
Binary file modified docs/.doctrees/howto/run/cluster.doctree
Binary file not shown.
12 changes: 12 additions & 0 deletions docs/_sources/api/seismicrna.cluster.rst.txt
@@ -64,12 +64,24 @@ Submodules
   :show-inheritance:


.. automodule:: seismicrna.cluster.save
   :members:
   :undoc-members:
   :show-inheritance:


.. automodule:: seismicrna.cluster.uniq
   :members:
   :undoc-members:
   :show-inheritance:


.. automodule:: seismicrna.cluster.update
   :members:
   :undoc-members:
   :show-inheritance:


.. automodule:: seismicrna.cluster.write
   :members:
   :undoc-members:
8 changes: 8 additions & 0 deletions docs/_sources/api/seismicrna.core.io.rst.txt
@@ -6,6 +6,14 @@ seismicrna.core.io package
   :undoc-members:
   :show-inheritance:

Subpackages
-----------

.. toctree::
   :maxdepth: 4

   seismicrna.core.io.tests

Submodules
----------

16 changes: 16 additions & 0 deletions docs/_sources/api/seismicrna.core.io.tests.rst.txt
@@ -0,0 +1,16 @@
seismicrna.core.io.tests package
================================

.. automodule:: seismicrna.core.io.tests
   :members:
   :undoc-members:
   :show-inheritance:

Submodules
----------


.. automodule:: seismicrna.core.io.tests.file_test
   :members:
   :undoc-members:
   :show-inheritance:
57 changes: 57 additions & 0 deletions docs/_sources/howto/addclust.rst.txt
@@ -0,0 +1,57 @@

Add Clusters to an Already-Clustered Dataset
--------------------------------------------------------------------------------

Background about adding clusters
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Purpose of adding clusters
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""

After running the Cluster step, you may want to try more clusters than the
maximum you originally allowed.
You could do so by rerunning the Cluster step with a larger ``--max-clusters``
(``-k``).
But because Cluster always begins with 1 cluster, it would first repeat every
number of clusters that you had already run before reaching the new ones, which
would waste computing resources.
The Add Clusters tool keeps your existing clustering results and only appends
the additional clusters, which is faster.

How to add clusters to a Cluster dataset
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Command line for adding clusters
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""

Suppose you have already run clustering with a maximum of 2 clusters (``-k 2``)
on a dataset::

    seismic cluster -k 2 {out}/{sample}/mask/{ref}/{sect}

To add more clusters, use ``seismic +addclust`` with the new maximum number of
clusters (e.g. 4) and the Cluster report as the input file::

    seismic +addclust -k 4 {out}/{sample}/cluster/{ref}/{sect}

This command resumes clustering at one more than the previous maximum number of
clusters (in this case, the previous maximum was 2, so ``+addclust`` begins
at 3).
The maximum number of clusters follows the same rules as in ``cluster`` (see
:ref:`cluster_max` for more information).
You do not need to specify any other settings for clustering, such as the number
of runs or the threshold for convergence: ``+addclust`` automatically uses the
same settings that you used originally in ``cluster``.
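
As a rough illustration of the resume rule above, the numbers of clusters that
remain to be run can be sketched in Python.
The function ``orders_to_add`` is hypothetical and is not part of SEISMIC-RNA;
the sketch assumes that a new maximum no larger than the previous one leaves
nothing to add, and it ignores any early stopping described in
:ref:`cluster_max`.

```python
def orders_to_add(prev_max: int, new_max: int) -> list:
    """Hypothetical helper: numbers of clusters still to be run.

    prev_max is the maximum already clustered (e.g. 2);
    new_max is the new value given with -k (e.g. 4).
    """
    if prev_max < 1:
        raise ValueError("prev_max must be at least 1")
    # Assumption: if the new maximum does not exceed the previous one,
    # there is nothing left to run.
    if new_max <= prev_max:
        return []
    # Resume at one more than the previous maximum.
    return list(range(prev_max + 1, new_max + 1))


print(orders_to_add(2, 4))  # [3, 4]
```

By contrast, rerunning ``cluster`` with ``-k 4`` would start again from 1
cluster.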

You can give any number of Cluster report files as inputs for Add Clusters.
See :doc:`./inputs` for ways to list multiple files.
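
If you launch SEISMIC-RNA from a script, a command with several inputs can be
assembled as a list of arguments.
The sketch below only builds and prints the command; the function name
``addclust_command`` and the two input paths are hypothetical placeholders, and
only the ``-k`` option described above is used.

```python
import shlex


def addclust_command(max_clusters: int, inputs: list) -> list:
    """Build (but do not run) a ``seismic +addclust`` command."""
    if not inputs:
        raise ValueError("at least one input is required")
    # One -k value applies to every input listed after it.
    return ["seismic", "+addclust", "-k", str(max_clusters), *inputs]


# Hypothetical input paths, used only for illustration.
command = addclust_command(4, ["out/sample1/cluster/ref/sect",
                               "out/sample2/cluster/ref/sect"])
print(shlex.join(command))
```

The resulting list could be passed to ``subprocess.run`` if you want to execute
it.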

The only new files produced are ``mus`` and ``props`` files for each additional
clustering run.
The Cluster report file, the batch files, and the counts file are all updated
in-place, as if you had run ``cluster`` the first time with the new ``-k``.
Because an error while writing could corrupt files that are updated in-place,
the original files are all first backed up in a temporary directory, which you
can specify with ``--temp-dir`` (``-t``).
If a fatal error occurs while updating the files, the original files are all
restored from the backup, as if you had never run ``+addclust``.
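
The general back-up-then-restore pattern described above can be sketched as
follows.
This is a minimal illustration of the idea, not the code that ``+addclust``
uses: the function ``update_in_place``, its parameters, and the demonstration
file are all hypothetical.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Iterable, Optional


def update_in_place(files: Iterable[Path],
                    update: Callable[[Path], None],
                    temp_dir: Optional[Path] = None) -> None:
    """Back up every file, apply `update`, and restore all on failure."""
    files = list(files)
    # The backup directory is created inside temp_dir (or the system
    # default location) and is deleted when the block exits.
    with tempfile.TemporaryDirectory(dir=temp_dir) as backup_dir:
        backups = {}
        for number, file in enumerate(files):
            # Prefix with a number so equal file names cannot collide.
            backup = Path(backup_dir) / f"{number}-{file.name}"
            shutil.copy2(file, backup)
            backups[file] = backup
        try:
            for file in files:
                update(file)
        except Exception:
            # Restore every original file, then re-raise the error.
            for file, backup in backups.items():
                shutil.copy2(backup, file)
            raise


# Demonstration with a throwaway file and an update that fails midway.
def failing_update(file: Path) -> None:
    file.write_text("partially written")
    raise RuntimeError("simulated fatal error")


with tempfile.TemporaryDirectory() as work_dir:
    report = Path(work_dir) / "report.txt"
    report.write_text("original")
    try:
        update_in_place([report], failing_update)
    except RuntimeError:
        pass
    restored_text = report.read_text()

print(restored_text)  # original
```

Copying all files before modifying any of them means that a failure partway
through can be undone for every file, including those already updated.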
1 change: 1 addition & 0 deletions docs/_sources/howto/index.rst.txt
@@ -12,4 +12,5 @@ How To
    normalize
    parallel
    logging
    addclust
    cleanfa
11 changes: 11 additions & 0 deletions docs/_sources/howto/run/cluster.rst.txt
@@ -26,6 +26,8 @@ To cluster all masked relation vectors in ``{out}``, you can use the command ::
Cluster: Settings
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. _cluster_max:

Cluster setting: Maximum number of clusters
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""

@@ -57,6 +59,10 @@ This process continues until one of two limits is reached:
the number that yielded the best BIC (field "Optimal Number of Clusters"),
which is less than or equal to the maximum you specified.

.. note::
    If you decide after clustering that you want to try more clusters, then
    use the tool ``+addclust`` (see :doc:`../addclust`).

Cluster setting: Expectation-maximization iterations
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""

@@ -175,6 +181,11 @@ In your cluster report:
Cluster: Troubleshoot and optimize
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Run Cluster with more clusters, without repeating the work already done
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""

The tool ``+addclust`` exists for this purpose: see :doc:`../addclust`.

Cluster takes too long to finish
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
