docs: updated contributor guide #627

Merged: 13 commits, Jun 12, 2024

4 changes: 1 addition & 3 deletions docs/contribute/core.md
@@ -3,9 +3,7 @@
Core contributions are changes to AutoRA which aren't experimentalists, (synthetic) experiment runners, or theorists.
The primary purpose of the core is to provide utilities for:

- describing experiments (in the [`autora-core` package](https://github.com/autoresearch/autora-core))
- handle workflows for automated experiments
(currently in the [`autora-workflow` package](https://github.com/autoresearch/autora-workflow))
- describing experiments and handling workflows (in the [`autora-core` package](https://github.com/autoresearch/autora-core))
- run synthetic experiments (currently in the [`autora-synthetic` package](https://github.com/autoresearch/autora-synthetic). Synthetic experiment runners may be submitted as pull requests to the
[`autora-synthetic`](https://github.com/AutoResearch/autora-synthetic/blob/main/CONTRIBUTING.md) package, providing they
require no additional dependencies. However, if your contribution requires additional dependencies, you can submit it as a full package following
12 changes: 4 additions & 8 deletions docs/contribute/index.md
@@ -18,8 +18,7 @@ as well as external contributors.
![image](../img/package_overview.png)

[`autora`](https://github.com/autoresearch/autora) is the parent package which end users are expected to install. The
parent depends on core packages, such as [`autora-core`](https://github.com/autoresearch/autora-core),
[`autora-workflow`](https://github.com/autoresearch/autora-workflow), and
parent depends on core packages, such as [`autora-core`](https://github.com/autoresearch/autora-core) and
[`autora-synthetic`](https://github.com/autoresearch/autora-synthetic). It also includes vetted modules (child packages) as optional dependencies which users can choose
to install.

@@ -64,19 +63,16 @@ The following packages are considered core packages, and are actively maintained
[Autonomous Empirical Research Group](https://musslick.github.io/AER_website/Team.html):

- **autora-core** [`https://github.com/autoresearch/autora-core`](https://github.com/autoresearch/autora-core) This package includes fundamental utilities
and building blocks for all the other packages. This is always installed when a user installs `autora` and can be
a dependency of other child packages.


- **autora-workflow** [`https://github.com/autoresearch/autora-workflow`](https://github.com/autoresearch/autora-workflow): The workflow package includes basic utilities for managing the workflow of closed-loop research processes, e.g., coordinating workflows between the theorists, experimentalists, and experiment runners. Though it currently stands alone, this package will ultimately be merged into autora-core.
and building blocks for all the other packages. This includes basic utilities for managing the workflow of closed-loop research processes, e.g., coordinating workflows between the theorists, experimentalists, and experiment runners. The `autora-core` package is always installed when a user installs `autora` and can be
a dependency of other child packages.


- **autora-synthetic** [`https://github.com/autoresearch/autora-synthetic`](https://github.com/autoresearch/autora-synthetic): This package includes a number of ground-truth models from different scientific disciplines that can be used for benchmarking automated scientific discovery. If you seek to contribute a scientific model, please see the [core contributor guide](core.md) for details.


We welcome contributions to
these packages in the form of pull requests, bug reports, and feature requests. For more details, see the
[core contributor guide](core.md). Feel free to ask any questions or provide any feedback regarding core contributions on the
[core contributor guide](core.md). Feel free to ask any questions or provide any feedback regarding core contributions in the
[AutoRA forum](https://github.com/orgs/AutoResearch/discussions/categories/core-contributions).

For core contributions, including contributions to [`autora-synthetic`](https://github.com/autoresearch/autora-synthetic), it is possible to set up your python environment in many different ways.
43 changes: 27 additions & 16 deletions docs/contribute/modules/experimentalist.md
@@ -23,19 +23,21 @@ Make sure to select the `experimentalist` option when prompted. You can skip all
## Implementation

For an experimentalist, you should implement a function that returns a set of experimental conditions. This set may be
a numpy array, iterator variable or other data format.
a `pandas` data frame, `numpy` array, iterator variable or other data format.

!!! hint
We generally **recommend using 2-dimensional numpy arrays as outputs** in which
each row represents a set of experimental conditions. The columns of the array correspond to the independent variables.
We generally **recommend using pandas data frames as outputs** in which
columns correspond to the independent variables of an experiment.
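As a hedged illustration of the recommended output format, the sketch below builds a small pool of conditions as a `pandas` data frame with one column per independent variable and one row per condition. The variable names `intensity` and `duration` and their values are hypothetical placeholders.

```python
import itertools

import pandas as pd

# Hypothetical independent variables and their candidate values
intensity_levels = [0.1, 0.5, 1.0]
duration_levels = [100, 200]

# One row per experimental condition, one column per independent variable
conditions = pd.DataFrame(
    list(itertools.product(intensity_levels, duration_levels)),
    columns=["intensity", "duration"],
)
print(conditions.shape)  # (6, 2)
```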

Once you've created your repository, you can implement your experimentalist by editing the `init.py` file in
Once you've created your repository, you can implement your experimentalist by editing the
`__init__.py` file in
``src/autora/experimentalist/name_of_your_experimentalist/``.
You may also add additional files to this directory if needed.
It is important that the `init.py` file contains a function called `name_of_your_experimentalist`
It is important that the `__init__.py` file contains a function called
`name_of_your_experimentalist`
which returns a set of experimental conditions (e.g., as a numpy array).
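A minimal skeleton of `src/autora/experimentalist/name_of_your_experimentalist/__init__.py` might look like the following. The function name mirrors the placeholder above; the `pool` and `num_samples` parameters and the "take the first rows" logic are hypothetical and only illustrate the expected shape of the function: conditions in, conditions out.

```python
"""
Skeleton experimentalist (hypothetical placeholder logic).
"""

import pandas as pd


def name_of_your_experimentalist(pool: pd.DataFrame, num_samples: int = 1) -> pd.DataFrame:
    """
    Return `num_samples` experimental conditions selected from `pool`.

    Placeholder strategy: take the first rows. Replace this with the
    selection strategy of your experimentalist.
    """
    return pool.head(num_samples)


# Usage: select two conditions from a pool of three candidates
pool = pd.DataFrame({"x": [0.0, 0.5, 1.0]})
selected = name_of_your_experimentalist(pool, num_samples=2)
```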

The following example ``init.py`` illustrates the implementation of a simple experimentalist
The following example ``__init__.py`` illustrates the implementation of a simple experimentalist
that uniformly samples without replacement from a pool of candidate conditions.

```python
@@ -44,25 +46,34 @@ Example Experimentalist
"""

import random
from typing import Iterable, Sequence, Union
import pandas as pd
import numpy as np
from typing import Iterable, Union

random_sample(conditions: Union[Iterable, Sequence], n: int = 1):
def random_sample(conditions: Union[pd.DataFrame, np.ndarray],
                  num_samples: int = 1) -> Union[pd.DataFrame, np.ndarray]:
    """
    Uniform random sampling without replacement from a pool of conditions.
    Args:
        conditions: Pool of conditions
        n: number of samples to collect
        num_samples: number of samples to collect

    Returns: Sampled pool
    Returns: Sampled pool of conditions

    """

    if isinstance(conditions, Iterable):
        conditions = list(conditions)
    random.shuffle(conditions)
    samples = conditions[0:n]

    return samples
    if isinstance(conditions, pd.DataFrame):
        # Randomly sample N rows from DataFrame
        sampled_data = conditions.sample(n=num_samples)
        return sampled_data

    elif isinstance(conditions, np.ndarray):
        # Randomly sample N rows from NumPy array
        if num_samples > conditions.shape[0]:
            raise ValueError("num_samples cannot be greater than the number of rows in the array.")
        indices = np.random.choice(conditions.shape[0], size=num_samples, replace=False)
        sampled_conditions = conditions[indices]
        return sampled_conditions
```
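Two gaps in the example above are worth noting: inputs that are neither a data frame nor an array fall through and return `None`, and the explicit size check only guards the array branch. A hedged variant that validates once and fails loudly for unsupported types could look like this; the optional `random_state` parameter is an addition for reproducibility and is not part of the original example.

```python
from typing import Optional, Union

import numpy as np
import pandas as pd


def random_sample(conditions: Union[pd.DataFrame, np.ndarray],
                  num_samples: int = 1,
                  random_state: Optional[int] = None) -> Union[pd.DataFrame, np.ndarray]:
    """Uniform random sampling without replacement from a pool of conditions."""
    if not isinstance(conditions, (pd.DataFrame, np.ndarray)):
        raise TypeError("conditions must be a pandas DataFrame or a numpy array")

    # One size check shared by both input types (len() counts rows for both)
    if num_samples > len(conditions):
        raise ValueError("num_samples cannot be greater than the number of conditions.")

    if isinstance(conditions, pd.DataFrame):
        return conditions.sample(n=num_samples, random_state=random_state)

    # numpy arrays: draw distinct row indices with a seeded generator
    rng = np.random.default_rng(random_state)
    indices = rng.choice(conditions.shape[0], size=num_samples, replace=False)
    return conditions[indices]


pool = pd.DataFrame({"x": np.linspace(0, 1, 10)})
samples = random_sample(pool, num_samples=3, random_state=42)
```

Returning the same type that was passed in keeps the experimentalist usable with either input format.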

## Next Steps: Testing, Documentation, Publishing
24 changes: 14 additions & 10 deletions docs/contribute/modules/index.md
@@ -24,12 +24,13 @@ After setting up your repository and linking it to your GitHub account, you can

### Implement Your Code

You may implement your code in the ``init.py`` located in the respective feature folder in ``src/autora``.
You may implement your code in the ``__init__.py`` located in the respective feature folder in ``src/autora``.

Please refer to the following guides on implementing
- [theorists](theorist.md)
- [experimentalists](experimentalist.md)
- [experiment runners](experiment-runner.md)

* [theorists](theorist.md)
* [experimentalists](experimentalist.md)
* [experiment runners](experiment-runner.md)

If the feature you seek to implement does not fit in any of these categories, then
you can create folders for new categories. If you are unsure how to proceed, you are always welcome
@@ -98,13 +99,16 @@ Once you've published your module, you should take some time to celebrate and an
Once your package is working and published, you can **make a pull request** on [`autora`](https://github.com/autoresearch/autora) to have it vetted and added to the "parent" package. Note, if you are not a member of the AutoResearch organization on GitHub, you will need to create a fork of the repository for the parent package and submit your pull request via that fork. If you are a member, you can create a pull request from a branch created directly from the parent package repository. Steps for creating a new branch to add your module are specified below.

!!! success

In order for your package to be included in the parent package, it must meet the following criteria:
- have basic documentation in ``docs/index.md``
- have a basic python notebook exposing how to use the module in ``docs/Basic Usage.ipynb``
- have basic tests in ``tests/``
- be published via PyPI or Conda
- be compatible with the current version of the parent package
- follow standard python coding guidelines including PEP8

* have basic documentation in ``docs/index.md``
* have a basic Python notebook demonstrating how to use the module in ``docs/Basic Usage.ipynb``
* have basic tests in ``tests/``
* be published via PyPI or Conda
* be compatible with the current version of the parent package
* follow standard Python coding guidelines, including PEP 8
* be hosted in a public repository
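For the ``tests/`` criterion, a basic test can be as small as one `pytest`-style function. The sketch below checks the output size, the no-duplicates property, and the preserved columns of an experimentalist. The file name and the inline `random_sample` stand-in are hypothetical, defined here only so the snippet runs on its own; in a real package you would import your own function from your module instead.

```python
# tests/test_experimentalist.py (hypothetical file name)
import numpy as np
import pandas as pd


def random_sample(conditions: pd.DataFrame, num_samples: int = 1) -> pd.DataFrame:
    # Stand-in for the function under test; replace with an import from your package
    return conditions.sample(n=num_samples)


def test_returns_requested_number_of_conditions():
    pool = pd.DataFrame({"x": np.linspace(0, 1, 10)})
    samples = random_sample(pool, num_samples=4)
    assert len(samples) == 4
    # Sampling is without replacement, so no condition appears twice
    assert samples.index.is_unique
    # Columns (independent variables) are preserved
    assert list(samples.columns) == ["x"]
```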

The following demonstrates how to add a package published under `autora-theorist-example` in PyPI in the GitHub
repository `example-contributor/contributor-theorist`.
76 changes: 64 additions & 12 deletions docs/contribute/modules/theorist.md
@@ -2,7 +2,8 @@

AutoRA theorists are meant to return scientific models describing the relationship between experimental conditions
and observations. Such models may take the form of a simple linear regression, non-linear equations, causal graphs,
a more complex neural network, or other models which
a more complex neural network, or other models which

- can be identified based on data (and prior knowledge)
- can be used to make novel predictions about observations given experimental conditions.

@@ -26,16 +27,19 @@ Make sure to select the `theorist` option when prompted. You can skip all other

## Implementation

Once you've created your repository, you can implement your theorist by editing the `init.py` file in
Once you've created your repository, you can implement your theorist by editing the `__init__.py`
file in
``src/autora/theorist/name_of_your_theorist/``. You may also add additional files to this directory if needed.
It is important that the `init.py` file contains a class called `NameOfYourTheorist` which inherits from
It is important that the `__init__.py` file contains a class called `NameOfYourTheorist` which
inherits from
`sklearn.base.BaseEstimator` and implements the following methods:

- `fit(self, conditions, observations)`
- `predict(self, conditions)`
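Inheriting from `sklearn.base.BaseEstimator` is what gives a theorist `get_params` and `set_params`, which in turn let utilities such as `sklearn.base.clone` copy it. These rely on `__init__` storing each constructor argument unchanged under an attribute of the same name. The `MinimalTheorist` class, its `shift` parameter, and its mean-prediction logic below are hypothetical and exist only to show that contract.

```python
import numpy as np
from sklearn.base import BaseEstimator, clone


class MinimalTheorist(BaseEstimator):
    """Hypothetical theorist that always predicts the (shifted) mean observation."""

    def __init__(self, shift: float = 0.0):
        # Store constructor arguments unchanged so get_params() can find them
        self.shift = shift

    def fit(self, conditions, observations):
        # Learned state gets a trailing underscore by scikit-learn convention
        self.mean_ = float(np.mean(observations)) + self.shift
        return self  # returning self allows theorist.fit(...).predict(...)

    def predict(self, conditions):
        return np.full(len(conditions), self.mean_)


theorist = MinimalTheorist(shift=1.0)
print(theorist.get_params())       # {'shift': 1.0}
theorist_copy = clone(theorist)    # unfitted copy with the same parameters
predictions = theorist.fit(np.arange(4), np.array([1.0, 2.0, 3.0, 4.0])).predict(np.arange(3))
```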

See the [sklearn documentation](https://scikit-learn.org/stable/developers/develop.html) for more information on
how to implement the methods. The following example ``init.py`` illustrates the implementation of a simple theorist
how to implement the methods. The following example ``__init__.py`` illustrates the implementation
of a simple theorist
that fits a polynomial function to the data:

```python
@@ -45,6 +49,43 @@ Example Theorist
"""

import numpy as np
import pandas as pd
from typing import Union
from sklearn.base import BaseEstimator


class ExampleRegressor(BaseEstimator):
    """
    This theorist fits a polynomial function to the data.
    """

    def __init__(self, degree: int = 2):
        self.degree = degree

    def fit(self, conditions: Union[pd.DataFrame, np.ndarray],
            observations: Union[pd.DataFrame, np.ndarray]):

        # fit polynomial function: observations ~ conditions
        self.coeff = np.polyfit(conditions, observations, deg=self.degree)
        self.polynomial = np.poly1d(self.coeff)
        return self

    def predict(self, conditions):

        return self.polynomial(conditions)
```
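The version above passes `conditions` straight to `np.polyfit`, which accepts only a 1-D `x`. Conditions stored as a single-column, 2-D array therefore raise an error. The snippet below reproduces this with plain NumPy on made-up quadratic data and shows that flattening the input resolves it.

```python
import numpy as np

# A single independent variable stored as an (n, 1) column
x_column = np.linspace(-1.0, 1.0, 5).reshape(-1, 1)
observations = 1.0 + 2.0 * x_column[:, 0] + 3.0 * x_column[:, 0] ** 2

try:
    np.polyfit(x_column, observations, deg=2)
except TypeError as err:
    print("polyfit rejected 2-D input:", err)

# Flattening to 1-D works; coefficients are ordered from highest degree down
coeff = np.polyfit(x_column.ravel(), observations, deg=2)
```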

Note, however, that it is best practice to make sure the conditions are compatible with `np.polyfit`, which expects a 1-D vector. In this case, we add some checks:

```python

"""
Example Theorist
"""

import numpy as np
import pandas as pd
from typing import Union
from sklearn.base import BaseEstimator


@@ -56,21 +97,32 @@ class ExampleRegressor(BaseEstimator):
    def __init__(self, degree: int = 2):
        self.degree = degree

    def fit(self, conditions, observations):
    def fit(self, conditions: Union[pd.DataFrame, np.ndarray],
            observations: Union[pd.DataFrame, np.ndarray]):

        # polyfit expects a 1D array
        if conditions.ndim > 1:
        # polyfit expects a 1D array, convert pandas data frame to 1D vector
        if isinstance(conditions, pd.DataFrame):
            conditions = conditions.squeeze()

        # polyfit expects a 1D array, flatten nd array
        if isinstance(conditions, np.ndarray) and conditions.ndim > 1:
            conditions = conditions.flatten()

        # convert to an array first: data frames have no flatten() method
        if observations.ndim > 1:
            observations = np.asarray(observations).flatten()

        # fit polynomial
        self.coeff = np.polyfit(conditions, observations, 2)
        # fit polynomial function: observations ~ conditions
        self.coeff = np.polyfit(conditions, observations, deg=self.degree)
        self.polynomial = np.poly1d(self.coeff)
        return self

    def predict(self, conditions):

        # polyfit expects a 1D array, convert pandas data frame to 1D vector
        if isinstance(conditions, pd.DataFrame):
            conditions = conditions.squeeze()

        # polyfit expects a 1D array, flatten nd array
        if isinstance(conditions, np.ndarray) and conditions.ndim > 1:
            conditions = conditions.flatten()

        return self.polynomial(conditions)
```
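A self-contained usage sketch of a checked polynomial theorist follows. It restates `ExampleRegressor` with the input checks folded into a small `_to_1d` helper (a hypothetical refactoring, not part of the guide), takes the degree from `self.degree`, and returns `self` from `fit`, following scikit-learn conventions. It assumes a single independent variable and rejects multi-column conditions rather than silently flattening them. The data are made up: an exact quadratic, so the fitted coefficients can be checked.

```python
from typing import Union

import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator


class ExampleRegressor(BaseEstimator):
    """Fits a polynomial of degree `degree` to one independent variable."""

    def __init__(self, degree: int = 2):
        self.degree = degree

    @staticmethod
    def _to_1d(values: Union[pd.DataFrame, np.ndarray]) -> np.ndarray:
        array = np.asarray(values, dtype=float)
        # This sketch supports a single variable only: refuse to flatten several columns
        if array.ndim == 2 and array.shape[1] != 1:
            raise ValueError("ExampleRegressor expects exactly one column")
        return array.ravel()

    def fit(self, conditions, observations):
        self.coeff = np.polyfit(self._to_1d(conditions), self._to_1d(observations), deg=self.degree)
        self.polynomial = np.poly1d(self.coeff)
        return self

    def predict(self, conditions):
        return self.polynomial(self._to_1d(conditions))


conditions = pd.DataFrame({"x": np.linspace(-2.0, 2.0, 9)})
observations = pd.DataFrame({"y": 1.0 + 2.0 * conditions["x"] - 0.5 * conditions["x"] ** 2})

theorist = ExampleRegressor(degree=2).fit(conditions, observations)
predictions = theorist.predict(pd.DataFrame({"x": [0.0, 1.0]}))
```

Because the data are an exact quadratic, `theorist.coeff` should be close to `[-0.5, 2.0, 1.0]` (highest degree first).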
