Feature/docs #1011

Merged: 8 commits, Apr 29, 2024
2 changes: 1 addition & 1 deletion docs/cookbooks/analysis.rst
@@ -181,7 +181,7 @@ Visualization

If a ``name`` is input into a non-linear search, all results are output to hard-disk in a folder.

By overwriting the ``Visualizer`` object of an ``Analysis`` class with a custom `Visualizer` class, custom results of the
By overwriting the ``Visualizer`` object of an ``Analysis`` class with a custom ``Visualizer`` class, custom results of the
model-fit can be visualized during the model-fit.

The ``Visualizer`` below has the methods ``visualize_before_fit`` and ``visualize``, which perform model specific
2 changes: 1 addition & 1 deletion docs/cookbooks/database.rst
@@ -3,7 +3,7 @@
Database
========

The default behaviour of model-fitting results output is to be written to hard-disc in folders. These are simple to
The default behaviour of model-fitting results output is to be written to hard-disk in folders. These are simple to
navigate and manually check.

For small model-fitting tasks this is sufficient, however it does not scale well when performing many model fits to
5 changes: 5 additions & 0 deletions docs/cookbooks/multiple_datasets.rst
@@ -164,6 +164,11 @@ To fit multiple datasets via a non-linear search we use this summed analysis obj

result_list = search.fit(model=model, analysis=analysis)

In the example above, the same ``Analysis`` class was used twice (to set up ``analysis_0`` and ``analysis_1``) and summed.

**PyAutoFit** supports the summing together of different ``Analysis`` classes, which may take as input completely different
datasets and fit the model to them (via the ``log_likelihood_function``) following a completely different procedure.
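The idea behind summing analyses can be illustrated with a toy sketch in plain Python. None of the class names below are PyAutoFit API; they only mimic the pattern the text describes: each analysis owns its own dataset and its own ``log_likelihood_function``, which may follow a completely different procedure, and the summed object adds the per-dataset log likelihoods together.

```python
# Toy sketch (plain Python, not PyAutoFit) of summing analyses:
# each analysis holds its own dataset and evaluates its own log likelihood,
# and the summed object adds the per-dataset log likelihoods together.


class ToyAnalysisA:
    def __init__(self, data):
        self.data = data

    def log_likelihood_function(self, instance):
        # A squared-residual penalty for dataset A.
        return -sum((d - instance) ** 2 for d in self.data)


class ToyAnalysisB:
    def __init__(self, data):
        self.data = data

    def log_likelihood_function(self, instance):
        # A completely different procedure (absolute residuals) for dataset B.
        return -sum(abs(d - instance) for d in self.data)


class ToySummedAnalysis:
    def __init__(self, analyses):
        self.analyses = analyses

    def log_likelihood_function(self, instance):
        # The summed log likelihood is the sum over every analysis.
        return sum(a.log_likelihood_function(instance) for a in self.analyses)


summed = ToySummedAnalysis([ToyAnalysisA([1.0, 2.0]), ToyAnalysisB([3.0])])
print(summed.log_likelihood_function(2.0))  # -1.0 + -1.0 = -2.0
```

A non-linear search fitting this summed toy object would therefore maximise the joint likelihood of both datasets at once, which is the behaviour the summed ``Analysis`` object provides.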

Result List
-----------

62 changes: 62 additions & 0 deletions docs/cookbooks/result.rst
@@ -12,6 +12,7 @@ This cookbook provides an overview of using the results.

- **Model Fit**: Perform a simple model-fit to create a ``Result`` object.
- **Info**: Print the ``info`` attribute of the ``Result`` object to display a summary of the model-fit.
- **Loading From Hard-disk**: Loading results from hard-disk to Python variables via the aggregator.
- **Samples**: The ``Samples`` object contained in the ``Result``, containing all non-linear samples (e.g. parameters, log likelihoods, etc.).
- **Maximum Likelihood**: The maximum likelihood model instance.
- **Posterior / PDF**: The median PDF model instance and PDF vectors of all model parameters via 1D marginalization.
@@ -98,6 +99,67 @@ The output appears as follows:
normalization 24.79 (24.65, 24.94)
sigma 9.85 (9.78, 9.90)

Loading From Hard-disk
----------------------

When performing fits which output results to hard-disk, a ``files`` folder is created containing .json / .csv files of
the model, samples, search, etc.

These files can be loaded from hard-disk to Python variables via the aggregator, making them accessible in a
Python script or Jupyter notebook.

Below, we will access these results using the aggregator's ``values`` method. A full list of what can be loaded is
as follows:

- ``model``: The ``model`` defined above and used in the model-fit (``model.json``).
- ``search``: The non-linear search settings (``search.json``).
- ``samples``: The non-linear search samples (``samples.csv``).
- ``samples_info``: Additional information about the samples (``samples_info.json``).
- ``samples_summary``: A summary of key results of the samples (``samples_summary.json``).
- ``info``: The info dictionary passed to the search (``info.json``).
- ``covariance``: The inferred covariance matrix (``covariance.csv``).
- ``data``: The 1D noisy data that is fitted (``data.json``).
- ``noise_map``: The 1D noise-map fitted (``noise_map.json``).

The ``samples`` and ``samples_summary`` results contain a lot of repeated information. The ``samples`` result contains
the full non-linear search samples, for example every parameter sample and its log likelihood. The ``samples_summary``
contains a summary of the results, for example the maximum log likelihood model and error estimates on parameters
at 1 and 3 sigma confidence.

Accessing results via the ``samples_summary`` is much faster, because it does not repeat calculations using the full
list of samples. Therefore, if the result you want is accessible via the ``samples_summary`` you should use it,
but if not you can revert to the ``samples``.

.. code-block:: python

from autofit.aggregator.aggregator import Aggregator

agg = Aggregator.from_directory(
directory=path.join("output", "cookbook_result"),
)

Before using the aggregator to inspect results, let's discuss Python generators.

A generator is an object that produces its entries one at a time, only when they are requested. The aggregator
returns all of the objects it loads from the database as generators (as opposed to a list, dictionary or another
Python type).

This is because generators are memory efficient, as they do not store the entries of the database in memory
simultaneously. This contrasts with objects like lists and dictionaries, which store all entries in memory at once.
If you fit a large number of datasets, lists and dictionaries will use a lot of memory and could crash your computer!

Once a generator has been iterated over, it cannot be used again. To perform the same task twice, the
generator must be remade. This cookbook therefore rarely stores generators as variables and instead uses the
aggregator to create each generator at the point of use.
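The generator behaviour described above can be made concrete with a minimal plain-Python sketch (the function and string names below are illustrative, not PyAutoFit API): entries are yielded one at a time, and after a single pass the generator is exhausted and must be remade.

```python
# Plain-Python illustration of generator behaviour (names are illustrative,
# not part of the PyAutoFit API).


def result_generator(n_datasets):
    # Entries are produced one at a time, so only one is in memory at once.
    for i in range(n_datasets):
        yield f"samples_of_dataset_{i}"


gen = result_generator(3)
print(list(gen))  # ['samples_of_dataset_0', 'samples_of_dataset_1', 'samples_of_dataset_2']
print(list(gen))  # [] : the generator is exhausted after one pass

gen = result_generator(3)  # remaking the generator allows a second pass
print(list(gen)[0])  # samples_of_dataset_0
```

This is why the aggregator is used to create a fresh generator each time a result is needed, rather than storing one generator in a variable and reusing it.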

To create a generator of a specific set of results, we use the ``values`` method. This takes the ``name`` of the
object we want to create a generator of, for example inputting ``name="samples"`` will return the results ``Samples``
object (which is illustrated in detail below).

.. code-block:: python

for samples in agg.values("samples"):
print(samples.parameter_lists[0])

Samples
-------

20 changes: 13 additions & 7 deletions docs/cookbooks/search.rst
@@ -60,24 +60,30 @@ Output To Hard-Disk
-------------------

By default, a non-linear search does not output its results to hard-disk and its results can only be inspected
in Python via the ``result`` object.
in a Jupyter Notebook or Python script via the ``result`` object.

However, the results of any non-linear search can be output to hard-disk by passing the ``name`` and / or ``path_prefix``
attributes, which are used to name files and output the results to a folder on your hard-disk.

The benefits of doing this include:

- Inspecting results via folders on your computer can be more efficient than using a Jupyter Notebook.
- Results are output on-the-fly, making it possible to check that a fit i progressing as expected mid way through.
- Additional information about a fit (e.g. visualization) is output.
- Inspecting results via folders on your computer is more efficient than using a Jupyter Notebook for multiple datasets.
- Results are output on-the-fly, making it possible to check that a fit is progressing as expected mid way through.
- Additional information about a fit (e.g. visualization) can be output.
- Unfinished runs can be resumed from where they left off if they are terminated.
- On high performance super computers which use a batch system, results must be output in this way.
- On high-performance supercomputers, results often must be output in this way.

These outputs are fully described in the scientific workflow example.
The code below shows how to enable outputting of results to hard-disk:

.. code-block:: python

search = af.Emcee(path_prefix=path.join("folder_0", "folder_1"), name="example_mcmc")
search = af.Emcee(
path_prefix=path.join("folder_0", "folder_1"),
name="example_mcmc"
)


These outputs are fully described in the scientific workflow example.

Output Customization
--------------------
1 change: 1 addition & 0 deletions docs/index.rst
@@ -144,6 +144,7 @@ model and marginalized probability density functions.

overview/the_basics
overview/scientific_workflow
overview/statistical_methods

.. toctree::
:caption: Cookbooks: