refactor: update README with Contribution Guidelines (#1062)
* fix: update readme

* fix: update readme

* refactor: style_guide.md

* refactor: update readme

* refactor: style_guide.md

* refactor: remove Note from docstring of a task

* refactor: readme

* refactor: readme
JohannesWesch authored Oct 8, 2024
1 parent 40ef3c1 commit 4ad2b8e
Showing 2 changed files with 176 additions and 141 deletions.
39 changes: 29 additions & 10 deletions README.md
@@ -29,7 +29,7 @@ The key features of the Intelligence Layer are:
- [References](#references)
- [License](#license)
- [For Developers](#for-developers)
- [Python: Naming Conventions](#python-naming-conventions)
- [How to contribute](#how-to-contribute)
- [Executing tests](#executing-tests)

# Installation
@@ -212,17 +212,36 @@ This project can only be used after signing the agreement with Aleph Alpha®. Pl

# For Developers

## Python: Naming Conventions
For further information, check out our guides and documentation:
- [Concepts.md](https://github.com/Aleph-Alpha/intelligence-layer-sdk/blob/main/Concepts.md) for an overview of what the Intelligence Layer is and how it works.
- [style_guide.md](https://github.com/Aleph-Alpha/intelligence-layer-sdk/blob/main/style_guide.md) for how we write and document code.
- [RELEASE.md](https://github.com/Aleph-Alpha/intelligence-layer-sdk/blob/main/RELEASE.md) for the release process of the IL.
- [CHANGELOG.md](https://github.com/Aleph-Alpha/intelligence-layer-sdk/blob/main/CHANGELOG.md) for the latest changes.

## How to contribute
:warning: **Warning:** This repository is open-source. Any contributions and MR discussions will be publicly accessible.


1. Share the details of your problem with us.
2. Write your code according to our [style guide](https://github.com/Aleph-Alpha/intelligence-layer-sdk/blob/main/style_guide.md).
3. Add docstrings to your code as described [here](https://github.com/Aleph-Alpha/intelligence-layer-sdk/blob/main/style_guide.md#docstrings).
4. Write tests for new features ([Executing Tests](#executing-tests)).
5. Add a how-to and/or notebook as documentation (check out [this](https://github.com/Aleph-Alpha/intelligence-layer-sdk/blob/main/style_guide.md#documentation) for guidance).
6. Update the [Changelog](https://github.com/Aleph-Alpha/intelligence-layer-sdk/blob/main/CHANGELOG.md) with your changes.
7. Request a review for the MR so that it can be merged.


We follow the [PEP 8 – Style Guide for Python Code](https://peps.python.org/pep-0008/).
In addition, there are the following naming conventions:
* Class method names:
  * Use only substantives (nouns) for a method that has no side effects and returns an object.
    * E.g., `evaluation_overview`, which returns an evaluation overview object.
  * Use a verb for a method that has side effects and returns nothing.
    * E.g., `store_evaluation_overview`, which saves a given evaluation overview (and returns nothing).

## Executing tests
If you want to execute all tests, you first need to spin up the Docker containers and run the following commands with your own `GITLAB_TOKEN`.

```bash
export GITLAB_TOKEN=...
echo $GITLAB_TOKEN | docker login registry.gitlab.aleph-alpha.de -u your_email@for_gitlab --password-stdin
docker compose pull  # update the containers
```

Afterwards, simply run `docker compose up --build`. You can then run the tests either in your IDE or via the terminal.

**In VSCode**
1. Sidebar > Testing
@@ -232,7 +251,7 @@ In addition, there are the following naming conventions:
You can then run the tests from the sidebar.

**In a terminal**
In order to run a local proxy w.r.t. to the CI pipeline (required to merge) you can run
In order to run a local proxy of the CI pipeline (required to merge) you can run
> scripts/all.sh
This will run linters and all tests.
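
If you only want to run the tests, a minimal sketch (this assumes `pytest` is the configured test runner; the file path is illustrative):

```bash
# run the entire test suite
pytest

# or a single, illustrative test file
pytest tests/core/test_task.py
```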
278 changes: 147 additions & 131 deletions style_guide.md
@@ -3,6 +3,147 @@
Welcome to the project's style guide, a foundational document that ensures consistency, clarity, and quality in our collaborative efforts.
As we work together, adhering to the guidelines outlined here will streamline our process, making our code more readable and maintainable for all team members.

## Folder Structure
The source directory is organized into four distinct folders, each with a specific responsibility.

| **Folder** | **Description** |
|----------------|---------------------------------------------------------------------------------|
| Core | The main components of the IL. This includes the `Task` abstraction, the `Tracer` and basic components like the `models`. |
| Evaluation | Includes all resources related to task evaluation. |
| Connectors | Provides tools to connect with third-party applications within the IL. |
| Examples | Showcases various task implementations to address different use cases using the IL. |
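
As an illustration, these folders surface as the SDK's top-level packages; the imports below are a sketch (the specific class names are examples, not an authoritative list):

``` python
# Illustrative imports showing how the four folders map to packages;
# the class names are examples, not an exhaustive list.
from intelligence_layer.core import Task, Tracer                # Core
from intelligence_layer.evaluation import Evaluator             # Evaluation
from intelligence_layer.connectors import DocumentIndexClient   # Connectors
from intelligence_layer.examples import SingleChunkQa           # Examples
```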

## Python: Naming Conventions

We follow the [PEP 8 – Style Guide for Python Code](https://peps.python.org/pep-0008/).
In addition, there are the following naming conventions:
* Class method names:
  * Use only substantives (nouns) for a method that has no side effects and returns an object.
    * E.g., `evaluation_overview`, which returns an evaluation overview object.
  * Use a verb for a method that has side effects and returns nothing.
    * E.g., `store_evaluation_overview`, which saves a given evaluation overview (and returns nothing).
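
For instance, a minimal sketch of this distinction (the `EvaluationRepository` here is hypothetical):

``` python
from dataclasses import dataclass, field


@dataclass
class EvaluationOverview:
    id: str


@dataclass
class EvaluationRepository:
    _overviews: dict[str, EvaluationOverview] = field(default_factory=dict)

    def evaluation_overview(self, eval_id: str) -> EvaluationOverview:
        # Substantive: no side effects, returns an object.
        return self._overviews[eval_id]

    def store_evaluation_overview(self, overview: EvaluationOverview) -> None:
        # Verb: has a side effect (stores the overview) and returns nothing.
        self._overviews[overview.id] = overview
```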


## Docstrings

### Task documentation

Document any `Task` like so:
``` python
class MyTask:
    """Start with a one-line description of the task, like this.

    Follow up with a more detailed description, outlining the purpose & general functioning of the task.

    Attributes:
        EXAMPLE_CONSTANT: Any constant that may be defined within the class.
        example_non_private_attribute: Any attribute defined within the '__init__' that is not private.

    Example:
        >>> var = "Describe here how to use this task end to end"
        >>> print("End on one newline.")
        End on one newline.
    """
```
The Example section is optional; when an end-to-end example is helpful, prefer including it in a how-to guide instead.

Do not document the `run` function of a class. Avoid documenting any other (private) functions.

### Input and output documentation

Document the inputs and outputs for a specific task like so:

``` python
class MyInput(BaseModel):
    """This is the input for this (suite of) task(s).

    Attributes:
        horse: Everybody knows what a horse is.
        chunk: We know what a chunk is, but does a user?
        crazy_deep_llm_example_param: Yeah, this probably deserves some explanation.
    """


# Any output should be documented in a similar manner
```

### Defaults

Certain parameters recur across tasks. Where possible, use the following standard documentation for them.

``` python
"""
client: Aleph Alpha client instance for running model related API calls.
model: A valid Aleph Alpha model name.
"""
```
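
For example, slotted into a (hypothetical) task docstring:

``` python
class SummarizeTask:
    """Summarizes a given text.

    Attributes:
        client: Aleph Alpha client instance for running model related API calls.
        model: A valid Aleph Alpha model name.
    """
```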

### Module documentation

We **do not document the module**, as we assume imports like:

``` python
from intelligence_layer.complete import Complete
completion_task = Complete()
```

rather than:

``` python
from intelligence_layer import complete
completion_task = complete.Complete()
```

This ensures that the documentation is easily accessible by hovering over the imported task.

Generally, adhere to this [guideline](https://www.sphinx-doc.org/en/master/usage/extensions/example_google.html).

## Documentation Guide: Jupyter Notebooks vs. How-tos vs. Docstrings

When documenting our codebase, we focus on three primary channels: Jupyter notebooks, How-tos and docstrings.
The objective is to provide both a high-level understanding and detailed implementation specifics.
Here's how we differentiate and allocate content between them:

### Jupyter Notebooks

**Purpose**: Jupyter notebooks are used to provide a comprehensive overview and walkthrough of the tasks. They are ideal for understanding the purpose, usage, and evaluation of a task. (look [here](#when-do-we-start-a-new-notebook) to decide whether to create a new notebook or expand an existing one)

- **High-level Overview**:
- **Problem Definition**: Describe the specific problem or challenge this task addresses.
- **Comparison**: (Optional) Highlight how this task stands out or differs from other tasks in our codebase.
- **Detailed Walkthrough**:
- **Input/Output Specifications**: Clearly define the expected input format and the resulting output.
Mention any constraints or specific requirements.
- **Debugging Insights**: Explain what information is available in the trace and how it can aid in troubleshooting.
- **Use-case Examples**: What are concrete use-cases I can solve with this task?
Run through examples.
- **Evaluation Metrics**: (Optional) Suggest methods or metrics to evaluate the performance or accuracy of this task.

### How-tos

**Purpose**: How-tos are a short, concise way to understand a very specific concept, explained in detail as a step-by-step guide.

- **Table of Contents**: Which steps are covered in the how-to?
- **Detailed Walkthrough**: Guide the user step by step. Keep it short and concise.


### Docstrings

**Purpose**: Docstrings give a quickstart overview. They provide the information a user needs to use this class/function correctly. No more, no less.

- **Summary**:
- **One-Liner**: What does this class/function do?
- **Brief Description**: What actually happens when I run this? What are need-to-know specifics?
- **Implementation Specifics**:
- **Parameters & Their Significance**: List all parameters the class/function accepts.
For each parameter, describe its role and why it's necessary.
- **Requirements & Limitations**: What does this parameter require?
Are there any limitations, such as text length?
Is there anything else a user must know to use this?
- **Usage Guidelines**: (Optional) Provide notes, insights or warnings about how to correctly use this class/function.
Mention any nuances, potential pitfalls, or best practices.

By maintaining clear distinctions between the three documentation streams, we ensure that both users and developers have the necessary tools and information at their disposal for efficient task execution and code modification.

## Building a new task

To make sure that we approach building new tasks in a unified way, consider this example task:
@@ -76,56 +217,15 @@ class ExampleTask(Task[ExampleTaskInput, ExampleTaskOutput]):
return math.exp(some_number)
```
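
Since the full example is collapsed above, here is a minimal sketch of the shape such a task takes (assuming the `Task`/`TaskSpan` interface from `intelligence_layer.core`; the input/output models are illustrative):

``` python
import math

from pydantic import BaseModel

from intelligence_layer.core import Task, TaskSpan


class ExampleTaskInput(BaseModel):
    """The input for `ExampleTask`.

    Attributes:
        some_number: The number to exponentiate.
    """

    some_number: float


class ExampleTaskOutput(BaseModel):
    """The output of `ExampleTask`.

    Attributes:
        exponentiated: `e` raised to the power of `some_number`.
    """

    exponentiated: float


class ExampleTask(Task[ExampleTaskInput, ExampleTaskOutput]):
    """Exponentiates a given number."""

    def do_run(self, input: ExampleTaskInput, task_span: TaskSpan) -> ExampleTaskOutput:
        # The public `run` method wraps `do_run` and handles tracing.
        return ExampleTaskOutput(exponentiated=math.exp(input.some_number))
```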

## Documentation

Generally, adhere to this [guideline](https://www.sphinx-doc.org/en/master/usage/extensions/example_google.html).

## Documentation Guide: Jupyter Notebooks vs. Docstrings

When documenting our codebase, we focus on two primary channels: Jupyter notebooks and docstrings.
The objective is to provide both a high-level understanding and detailed implementation specifics.
Here's how we differentiate and allocate content between them:

### Jupyter Notebooks

**Purpose**: Jupyter notebooks are used to provide a comprehensive overview and walkthrough of the tasks. They are ideal for understanding the purpose, usage, and evaluation of a task.

- **High-level Overview**:
- **Problem Definition**: Describe the specific problem or challenge this task addresses.
- **Comparison**: (Optional) Highlight how this task stands out or differs from other tasks in our codebase.
- **Detailed Walkthrough**:
- **Input/Output Specifications**: Clearly define the expected input format and the resulting output.
Mention any constraints or specific requirements.
- **Debugging Insights**: Explain what information is available in the trace and how it can aid in troubleshooting.
- **Use-case Examples**: What are concrete use-cases I can solve with this task?
Run through examples.
- **Evaluation Metrics**: (Optional) Suggest methods or metrics to evaluate the performance or accuracy of this task.

### Docstrings

**Purpose**: Docstrings give a quickstart overview. They provide the information a user needs to use this class/function correctly. No more, no less.

- **Summary**:
- **One-Liner**: What does this class/function do?
- **Brief Description**: What actually happens when I run this? What are need-to-know specifics?
- **Implementation Specifics**:
- **Parameters & Their Significance**: List all parameters the class/function accepts.
For each parameter, describe its role and why it's necessary.
- **Requirements & Limitations**: What does this parameter require?
Are there any limitations, such as text length?
Is there anything else a user must know to use this?
- **Usage Guidelines**: (Optional) Provide notes, insights or warnings about how to correctly use this class/function.
Mention any nuances, potential pitfalls, or best practices.

---

By maintaining clear distinctions between the two documentation streams, we ensure that both users and developers have the necessary tools and information at their disposal for efficient task execution and code modification.
## When to use a tracer

## Jupyter notebooks
Each task's input and output are automatically logged.
For most tasks, we assume this suffices.

Notebooks shall be used in a tutorial-like manner to educate users about certain tasks, functionalities & more.
Exceptions would be complicated, task-specific implementations.
An example would be the classify logprob calculation.
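
As a sketch, tracing such a task explicitly might look like this (assuming `InMemoryTracer` from `intelligence_layer.core`; `ExampleTask` is the hypothetical task sketched earlier):

``` python
from intelligence_layer.core import InMemoryTracer

tracer = InMemoryTracer()
task = ExampleTask()

# `run` automatically records the task's input and output
# (and those of any nested sub-tasks) in the tracer.
output = task.run(ExampleTaskInput(some_number=1.0), tracer)
```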

### When do we start a new notebook?
## When do we start a new notebook?

Documenting our LLM-based Tasks using Jupyter notebooks is crucial for clarity and ease of use.
However, we must strike a balance between consolidation and over-segmentation.
@@ -146,87 +246,3 @@ This ensures that changes to one task don't clutter or complicate the documentat

In summary, while our goal is to keep our documentation organized and avoid excessive fragmentation, we must also ensure that each notebook is comprehensive and user-friendly.
When in doubt, consider the user's perspective: Would they benefit from a consolidated guide, or would they find it easier to navigate separate, focused notebooks?

## Docstrings

### Task documentation

Document any `Task` like so:
``` python
class MyTask:
    """Start with a one-line description of the task, like this.

    Follow up with a more detailed description, outlining the purpose & general functioning of the task.

    Note:
        What is important? Does your task require a certain type of model? Any usage recommendations?

    Args:
        example_arg: Any parameter provided in the '__init__' of this task.

    Attributes:
        EXAMPLE_CONSTANT: Any constant that may be defined within the class.
        example_non_private_attribute: Any attribute defined within the '__init__' that is not private.

    Example:
        >>> var = "Describe here how to use this task end to end"
        >>> print("End on one newline.")
        End on one newline.
    """
```

Do not document the `run` function of a class. Avoid documenting any other (private) functions.

### Input and output documentation

Document the inputs and outputs for a specific task like so:

``` python
class MyInput(BaseModel):
    """This is the input for this (suite of) task(s).

    Attributes:
        horse: Everybody knows what a horse is.
        chunk: We know what a chunk is, but does a user?
        crazy_deep_llm_example_param: Yeah, this probably deserves some explanation.
    """


# Any output should be documented in a similar manner
```

### Defaults

Certain parameters recur across tasks. Where possible, use the following standard documentation for them.

``` python
"""
client: Aleph Alpha client instance for running model related API calls.
model: A valid Aleph Alpha model name.
"""
```

### Module documentation

We **do not document the module**, as we assume imports like:

``` python
from intelligence_layer.complete import Complete
completion_task = Complete()
```

rather than:

``` python
from intelligence_layer import complete
completion_task = complete.Complete()
```

This ensures that the documentation is easily accessible by hovering over the imported task.

## When to use a tracer

Each task's input and output are automatically logged.
For most tasks, we assume this suffices.

Exceptions would be complicated, task-specific implementations.
An example would be the classify logprob calculation.
