refactor: update README with Contribution Guidelines (#1062)
* fix: update readme

* fix: update readme

* refactor: style_guide.md

* refactor: update readme

* refactor: style_guide.md

* refactor: remove Note from docstring of a task

* refactor: readme

* refactor: readme
JohannesWesch authored Oct 8, 2024
1 parent 40ef3c1 commit 4ad2b8e
Showing 2 changed files with 176 additions and 141 deletions.
39 changes: 29 additions & 10 deletions README.md
@@ -29,7 +29,7 @@ The key features of the Intelligence Layer are:
- [References](#references)
- [License](#license)
- [For Developers](#for-developers)
- [Python: Naming Conventions](#python-naming-conventions)
- [How to contribute](#how-to-contribute)
- [Executing tests](#executing-tests)

# Installation
@@ -212,17 +212,36 @@ This project can only be used after signing the agreement with Aleph Alpha®. Pl

# For Developers

## Python: Naming Conventions
For further information, check out our guides and documentation:
- [Concepts.md](https://github.com/Aleph-Alpha/intelligence-layer-sdk/blob/main/Concepts.md) for an overview of what the Intelligence Layer is and how it works.
- [style_guide.md](https://github.com/Aleph-Alpha/intelligence-layer-sdk/blob/main/style_guide.md) for how we write and document code.
- [RELEASE.md](https://github.com/Aleph-Alpha/intelligence-layer-sdk/blob/main/RELEASE.md) for the release process of the IL.
- [CHANGELOG.md](https://github.com/Aleph-Alpha/intelligence-layer-sdk/blob/main/CHANGELOG.md) for the latest changes.

## How to contribute
:warning: **Warning:** This repository is open-source. Any contributions and MR discussions will be publicly accessible.


1. Share the details of your problem with us.
2. Write your code according to our [style guide](https://github.com/Aleph-Alpha/intelligence-layer-sdk/blob/main/style_guide.md).
3. Add docstrings to your code as described [here](https://github.com/Aleph-Alpha/intelligence-layer-sdk/blob/main/style_guide.md#docstrings).
4. Write tests for new features ([Executing Tests](#executing-tests)).
5. Add a how-to and/or notebook as documentation (check out [this](https://github.com/Aleph-Alpha/intelligence-layer-sdk/blob/main/style_guide.md#documentation) for guidance).
6. Update the [Changelog](https://github.com/Aleph-Alpha/intelligence-layer-sdk/blob/main/CHANGELOG.md) with your changes.
7. Request a review for the MR so that it can be merged.


We follow the [PEP 8 – Style Guide for Python Code](https://peps.python.org/pep-0008/).
In addition, there are the following naming conventions:
* Class method names:
  * Use only substantives (nouns) for a method that has no side effects and returns an object.
    * E.g., `evaluation_overview`, which returns an evaluation overview object.
  * Use a verb for a method that has side effects and returns nothing.
    * E.g., `store_evaluation_overview`, which saves a given evaluation overview (and returns nothing).

## Executing tests
If you want to execute all tests, you first need to spin up the Docker containers and run the following commands with your own `GITLAB_TOKEN`.

```bash
export GITLAB_TOKEN=...
echo $GITLAB_TOKEN | docker login registry.gitlab.aleph-alpha.de -u your_email@for_gitlab --password-stdin
docker compose pull  # update the containers
```

Afterwards, simply run `docker compose up --build`. You can then run the tests either in your IDE or via the terminal.

**In VSCode**
1. Sidebar > Testing
@@ -232,7 +251,7 @@ In addition, there are the following naming conventions:
You can then run the tests from the sidebar.

**In a terminal**
In order to run a local proxy w.r.t. to the CI pipeline (required to merge) you can run
In order to run a local proxy of the CI pipeline (required to merge) you can run
> scripts/all.sh
This will run linters and all tests.
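
If you only want to run the tests, a minimal sketch (this assumes `pytest` is the configured test runner; the file path is illustrative):

```bash
# run the entire test suite
pytest

# or a single, illustrative test file
pytest tests/core/test_task.py
```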
278 changes: 147 additions & 131 deletions style_guide.md
@@ -3,6 +3,147 @@
Welcome to the project's style guide, a foundational document that ensures consistency, clarity, and quality in our collaborative efforts.
As we work together, adhering to the guidelines outlined here will streamline our process, making our code more readable and maintainable for all team members.

## Folder Structure
The source directory is organized into four distinct folders, each with a specific responsibility.

| **Folder** | **Description** |
|----------------|---------------------------------------------------------------------------------|
| Core | The main components of the IL. This includes the `Task` abstraction, the `Tracer` and basic components like the `models`. |
| Evaluation | Includes all resources related to task evaluation. |
| Connectors | Provides tools to connect with third-party applications within the IL. |
| Examples | Showcases various task implementations to address different use cases using the IL. |
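
As an illustration, these folders surface as the SDK's top-level packages; the imports below are a sketch (the specific class names are examples, not an authoritative list):

``` python
# Illustrative imports showing how the four folders map to packages;
# the class names are examples, not an exhaustive list.
from intelligence_layer.core import Task, Tracer                # Core
from intelligence_layer.evaluation import Evaluator             # Evaluation
from intelligence_layer.connectors import DocumentIndexClient   # Connectors
from intelligence_layer.examples import SingleChunkQa           # Examples
```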

## Python: Naming Conventions

We follow the [PEP 8 – Style Guide for Python Code](https://peps.python.org/pep-0008/).
In addition, there are the following naming conventions:
* Class method names:
  * Use only substantives (nouns) for a method that has no side effects and returns an object.
    * E.g., `evaluation_overview`, which returns an evaluation overview object.
  * Use a verb for a method that has side effects and returns nothing.
    * E.g., `store_evaluation_overview`, which saves a given evaluation overview (and returns nothing).
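
For instance, a minimal sketch of this distinction (the `EvaluationRepository` here is hypothetical):

``` python
from dataclasses import dataclass, field


@dataclass
class EvaluationOverview:
    id: str


@dataclass
class EvaluationRepository:
    _overviews: dict[str, EvaluationOverview] = field(default_factory=dict)

    def evaluation_overview(self, eval_id: str) -> EvaluationOverview:
        # Substantive: no side effects, returns an object.
        return self._overviews[eval_id]

    def store_evaluation_overview(self, overview: EvaluationOverview) -> None:
        # Verb: has a side effect (stores the overview) and returns nothing.
        self._overviews[overview.id] = overview
```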


## Docstrings

### Task documentation

Document any `Task` like so:
``` python
class MyTask:
    """Start with a one-line description of the task, like this.

    Follow up with a more detailed description, outlining the purpose & general functioning of the task.

    Attributes:
        EXAMPLE_CONSTANT: Any constant that may be defined within the class.
        example_non_private_attribute: Any attribute defined within the '__init__' that is not private.

    Example:
        >>> var = "Describe here how to use this task end to end"
        >>> print("End on one newline.")
        End on one newline.
    """
```
The Example section is optional; when an end-to-end example is helpful, prefer including it in a how-to guide instead.

Do not document the `run` function of a class. Avoid documenting any other (private) functions.

### Input and output documentation

Document the inputs and outputs for a specific task like so:

``` python
class MyInput(BaseModel):
    """This is the input for this (suite of) task(s).

    Attributes:
        horse: Everybody knows what a horse is.
        chunk: We know what a chunk is, but does a user?
        crazy_deep_llm_example_param: Yeah, this probably deserves some explanation.
    """


# Any output should be documented in a similar manner
```

### Defaults

Certain parameters recur across tasks. Where possible, use the following standard documentation for them.

``` python
"""
client: Aleph Alpha client instance for running model related API calls.
model: A valid Aleph Alpha model name.
"""
```
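
For example, slotted into a (hypothetical) task docstring:

``` python
class SummarizeTask:
    """Summarizes a given text.

    Attributes:
        client: Aleph Alpha client instance for running model related API calls.
        model: A valid Aleph Alpha model name.
    """
```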

### Module documentation

We **do not document the module**, as we assume imports like:

``` python
from intelligence_layer.complete import Complete
completion_task = Complete()
```

rather than:

``` python
from intelligence_layer import complete
completion_task = complete.Complete()
```

This ensures that the documentation is easily accessible by hovering over the imported task.

Generally, adhere to this [guideline](https://www.sphinx-doc.org/en/master/usage/extensions/example_google.html).

## Documentation Guide: Jupyter Notebooks vs. How-tos vs. Docstrings

When documenting our codebase, we focus on three primary channels: Jupyter notebooks, How-tos and docstrings.
The objective is to provide both a high-level understanding and detailed implementation specifics.
Here's how we differentiate and allocate content between them:

### Jupyter Notebooks

**Purpose**: Jupyter notebooks are used to provide a comprehensive overview and walkthrough of the tasks. They are ideal for understanding the purpose, usage, and evaluation of a task. (look [here](#when-do-we-start-a-new-notebook) to decide whether to create a new notebook or expand an existing one)

- **High-level Overview**:
- **Problem Definition**: Describe the specific problem or challenge this task addresses.
- **Comparison**: (Optional) Highlight how this task stands out or differs from other tasks in our codebase.
- **Detailed Walkthrough**:
- **Input/Output Specifications**: Clearly define the expected input format and the resulting output.
Mention any constraints or specific requirements.
- **Debugging Insights**: Explain what information is available in the trace and how it can aid in troubleshooting.
- **Use-case Examples**: What are concrete use-cases I can solve with this task?
Run through examples.
- **Evaluation Metrics**: (Optional) Suggest methods or metrics to evaluate the performance or accuracy of this task.

### How-tos

**Purpose**: How-tos are a short, concise way to understand a very specific concept, explained in detail as a step-by-step guide.

- **Table of Contents**: Which steps are covered in the how-to?
- **Detailed Walkthrough**: Guide the user step by step. Keep it short and concise.


### Docstrings

**Purpose**: Docstrings give a quickstart overview. They provide the information a user needs to use this class/function correctly. No more, no less.

- **Summary**:
- **One-Liner**: What does this class/function do?
- **Brief Description**: What actually happens when I run this? What are need-to-know specifics?
- **Implementation Specifics**:
- **Parameters & Their Significance**: List all parameters the class/function accepts.
For each parameter, describe its role and why it's necessary.
- **Requirements & Limitations**: What does this parameter require?
Are there any limitations, such as text length?
Is there anything else a user must know to use this?
- **Usage Guidelines**: (Optional) Provide notes, insights or warnings about how to correctly use this class/function.
Mention any nuances, potential pitfalls, or best practices.

By maintaining clear distinctions between the three documentation streams, we ensure that both users and developers have the necessary tools and information at their disposal for efficient task execution and code modification.

## Building a new task

To make sure that we approach building new tasks in a unified way, consider this example task:
@@ -76,56 +217,15 @@ class ExampleTask(Task[ExampleTaskInput, ExampleTaskOutput]):
return math.exp(some_number)
```
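
Since the full example is collapsed above, here is a minimal sketch of the shape such a task takes (assuming the `Task`/`TaskSpan` interface from `intelligence_layer.core`; the input/output models are illustrative):

``` python
import math

from pydantic import BaseModel

from intelligence_layer.core import Task, TaskSpan


class ExampleTaskInput(BaseModel):
    """The input for `ExampleTask`.

    Attributes:
        some_number: The number to exponentiate.
    """

    some_number: float


class ExampleTaskOutput(BaseModel):
    """The output of `ExampleTask`.

    Attributes:
        exponentiated: `e` raised to the power of `some_number`.
    """

    exponentiated: float


class ExampleTask(Task[ExampleTaskInput, ExampleTaskOutput]):
    """Exponentiates a given number."""

    def do_run(self, input: ExampleTaskInput, task_span: TaskSpan) -> ExampleTaskOutput:
        # The public `run` method wraps `do_run` and handles tracing.
        return ExampleTaskOutput(exponentiated=math.exp(input.some_number))
```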

## Documentation

Generally, adhere to this [guideline](https://www.sphinx-doc.org/en/master/usage/extensions/example_google.html).

## Documentation Guide: Jupyter Notebooks vs. Docstrings

When documenting our codebase, we focus on two primary channels: Jupyter notebooks and docstrings.
The objective is to provide both a high-level understanding and detailed implementation specifics.
Here's how we differentiate and allocate content between them:

### Jupyter Notebooks

**Purpose**: Jupyter notebooks are used to provide a comprehensive overview and walkthrough of the tasks. They are ideal for understanding the purpose, usage, and evaluation of a task.

- **High-level Overview**:
- **Problem Definition**: Describe the specific problem or challenge this task addresses.
- **Comparison**: (Optional) Highlight how this task stands out or differs from other tasks in our codebase.
- **Detailed Walkthrough**:
- **Input/Output Specifications**: Clearly define the expected input format and the resulting output.
Mention any constraints or specific requirements.
- **Debugging Insights**: Explain what information is available in the trace and how it can aid in troubleshooting.
- **Use-case Examples**: What are concrete use-cases I can solve with this task?
Run through examples.
- **Evaluation Metrics**: (Optional) Suggest methods or metrics to evaluate the performance or accuracy of this task.

### Docstrings

**Purpose**: Docstrings give a quickstart overview. They provide the information a user needs to use this class/function correctly. No more, no less.

- **Summary**:
- **One-Liner**: What does this class/function do?
- **Brief Description**: What actually happens when I run this? What are need-to-know specifics?
- **Implementation Specifics**:
- **Parameters & Their Significance**: List all parameters the class/function accepts.
For each parameter, describe its role and why it's necessary.
- **Requirements & Limitations**: What does this parameter require?
Are there any limitations, such as text length?
Is there anything else a user must know to use this?
- **Usage Guidelines**: (Optional) Provide notes, insights or warnings about how to correctly use this class/function.
Mention any nuances, potential pitfalls, or best practices.

---

By maintaining clear distinctions between the two documentation streams, we ensure that both users and developers have the necessary tools and information at their disposal for efficient task execution and code modification.
## When to use a tracer

## Jupyter notebooks
Each task's input and output are automatically logged.
For most tasks, we assume this suffices.

Notebooks shall be used in a tutorial-like manner to educate users about certain tasks, functionalities & more.
Exceptions would be complicated, task-specific implementations.
An example would be the classify logprob calculation.
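
As a sketch, tracing such a task explicitly might look like this (assuming `InMemoryTracer` from `intelligence_layer.core`; `ExampleTask` is the hypothetical task sketched earlier):

``` python
from intelligence_layer.core import InMemoryTracer

tracer = InMemoryTracer()
task = ExampleTask()

# `run` automatically records the task's input and output
# (and those of any nested sub-tasks) in the tracer.
output = task.run(ExampleTaskInput(some_number=1.0), tracer)
```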

### When do we start a new notebook?
## When do we start a new notebook?

Documenting our LLM-based Tasks using Jupyter notebooks is crucial for clarity and ease of use.
However, we must strike a balance between consolidation and over-segmentation.
@@ -146,87 +246,3 @@ This ensures that changes to one task don't clutter or complicate the documentat

In summary, while our goal is to keep our documentation organized and avoid excessive fragmentation, we must also ensure that each notebook is comprehensive and user-friendly.
When in doubt, consider the user's perspective: Would they benefit from a consolidated guide, or would they find it easier to navigate separate, focused notebooks?

## Docstrings

### Task documentation

Document any `Task` like so:
``` python
class MyTask:
    """Start with a one-line description of the task, like this.

    Follow up with a more detailed description, outlining the purpose & general functioning of the task.

    Note:
        What is important? Does your task require a certain type of model? Any usage recommendations?

    Args:
        example_arg: Any parameter provided in the '__init__' of this task.

    Attributes:
        EXAMPLE_CONSTANT: Any constant that may be defined within the class.
        example_non_private_attribute: Any attribute defined within the '__init__' that is not private.

    Example:
        >>> var = "Describe here how to use this task end to end"
        >>> print("End on one newline.")
        End on one newline.
    """
```

Do not document the `run` function of a class. Avoid documenting any other (private) functions.

### Input and output documentation

Document the inputs and outputs for a specific task like so:

``` python
class MyInput(BaseModel):
    """This is the input for this (suite of) task(s).

    Attributes:
        horse: Everybody knows what a horse is.
        chunk: We know what a chunk is, but does a user?
        crazy_deep_llm_example_param: Yeah, this probably deserves some explanation.
    """


# Any output should be documented in a similar manner
```

### Defaults

Certain parameters recur across tasks. Where possible, use the following standard documentation for them.

``` python
"""
client: Aleph Alpha client instance for running model related API calls.
model: A valid Aleph Alpha model name.
"""
```

### Module documentation

We **do not document the module**, as we assume imports like:

``` python
from intelligence_layer.complete import Complete
completion_task = Complete()
```

rather than:

``` python
from intelligence_layer import complete
completion_task = complete.Complete()
```

This ensures that the documentation is easily accessible by hovering over the imported task.

## When to use a tracer

Each task's input and output are automatically logged.
For most tasks, we assume this suffices.

Exceptions would be complicated, task-specific implementations.
An example would be the classify logprob calculation.
