docs: Better documentation for developers (#427)

Closes #375. ### Summary of Changes * Improved developer documentation about tests * Added guidelines about copying objects (rather than modifying them in-place) * Added code-style guidelines * Added code review guidelines --------- Co-authored-by: Lars Reimann <[email protected]>
Safe-DS · Jul 10, 2023 · 2c54ec0 · 2c54ec0
1 parent bc73a6c
commit 2c54ec0
Show file tree

Hide file tree

Showing 3 changed files with 220 additions and 24 deletions.
diff --git a/docs/development/code_review.md b/docs/development/code_review.md
@@ -0,0 +1,72 @@
+# Code review
+
+This document describes
+general guidelines for developers
+when reviewing a pull request (PR).
+
+## Verify that the PR solves the addressed issue
+
+### Understand the issue
+
+* Read the issue that was addressed by the PR and make sure you understand it
+* Read any discussions that occurred on the issue page
+
+### Check the PR
+
+* Check the PR for __completeness__: Was every point from the issue covered?
+* Check for __correctness__: Were the problems described in the issue solved in the intended way?
+* (For PRs that introduce new features:) Check the design - does it comply with our [project guidelines][guidelines-general]?
+* Check for potential bugs - edge cases, exceptions that may be raised in functions that were called etc.
+* Check the code style: Is it readable? Do variables have sensible names? Is it consistent to existing code? Is there a better / more elegant way to do certain things (e.g. [f-string](https://docs.python.org/3/tutorial/inputoutput.html#tut-f-strings) instead of manual string concatenation)?
+* Check any warnings reported by your IDE
+
+## Check the tests
+
+* Run the tests locally
+* Check if there are any warnings that are not caught by the test suite
+* Make sure the test cases are complete - do they cover all edge cases (e.g. empty tables)?
+* Make sure they comply to our [project guidelines][guidelines-tests] - are the tests parametrized and do all testcases have descriptive IDs?
+
+## Verify that the PR does not break existing code
+
+This is largely covered by the automated tests.
+However, you should always:
+
+* Make sure all tests actually ran through (pytest, linter, code coverage)
+* Make sure that the branch is up-to-date with the `main` branch, so you actually test the behaviour that will result once the feature branch is merged
+
+## Check the PR format
+
+* Check that the PR title starts with a fitting [type](https://github.com/Safe-DS/.github/blob/main/.github/CONTRIBUTING.md#types) (e.g. `feat`, `fix`, `docs`, ...)
+* Check that the changes introduced by the PR are documented in the PR message
+
+## Requesting changes
+
+If you found any issues with the reviewed PR,
+navigate to the `Files changed` tab in the PR page
+and click on the respective line
+to add your comments.
+
+For more details on specific review features,
+check out the [documentation on GitHub][github-review].
+
+## Finishing the review
+
+When done, finish your review
+by navigating to the `Files changed` tab
+and clicking the green `Review changes` button
+on the top right.
+
+If you found no issues,
+select the `Approve` option
+to approve the changes made by the PR.
+If you like, you can add a "LGTM" meme
+from [lgtm.party](https://lgtm.party/),
+preferably one with a cat image.
+
+If you found any problems,
+select `Request Changes` instead.
+
+[guidelines-general]: project_guidelines.md
+[guidelines-tests]: project_guidelines.md#tests
+[github-review]: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests
diff --git a/docs/development/guidelines.md → docs/development/project_guidelines.md b/docs/development/guidelines.md → docs/development/project_guidelines.md
@@ -1,4 +1,4 @@
-# Guidelines
+# Project Guidelines
 
 This document describes general guidelines for the Safe-DS Python Library. In the **DO**/**DON'T** examples below we either show _client code_ to describe the code users should/shouldn't have to write, or _library code_ to describe the code we, as library developers, need to write to achieve readable client code. We'll continuously update this document as we find new categories of usability issues.
 
@@ -56,6 +56,43 @@ Some flag parameters drastically alter the semantics of a function. This can lea
     table.drop("name", axis="columns")
     ```
 
+### Return copies of objects
+
+Modifying objects in-place
+can lead to surprising behaviour
+and hard-to-find bugs.
+Methods shall never change
+the object they're called on
+or any of their parameters.
+
+!!! success "**DO** (library code):"
+
+    ```py
+        result = self._data.copy()
+        result.columns = self._schema.column_names
+        result[new_column.name] = new_column._data
+        return Table._from_pandas_dataframe(result)
+    ```
+
+!!! failure "**DON'T** (library code):"
+
+    ```py
+        self._data.add(new_column, axis='column')
+    ```
+
+The corresponding docstring should explicitly state
+that a method returns a copy:
+
+!!! success "**DO** (library code):"
+
+    ```py
+    """
+    Return a new table with the given column added as the last column.
+    The original table is not modified.
+    ...
+    """
+    ```
+
 ### Avoid uncommon abbreviations
 
 Write full words rather than abbreviations. The increased verbosity is offset by better readability, better functioning auto-completion, and a reduced need to consult the documentation when writing code. Common abbreviations like CSV or HTML are fine though, since they rarely require explanation.
@@ -205,27 +242,6 @@ The user should not have to deal with exceptions that are defined in the wrapper
         return pd.read_csv(path) # May raise a pd.ParserError
     ```
 
-### Sort entries in `__all__` lists alphabetically
-The entries in the `__all__` list in `__init__.py` files should be sorted alphabetically. This helps reduce the likelihood of merge conflicts when new entries are introduced on different branches.
-!!! success "**DO** (library code):"
-
-    ```py
-    __all__ = [
-        "ColumnSizeError",
-        "DuplicateColumnNameError",
-        "MissingValuesColumnError"
-    ]
-    ```
-!!! failure "**DON'T** (library code):"
-
-    ```py
-    __all__ = [
-        "MissingValuesColumnError",
-        "ColumnSizeError",
-        "DuplicateColumnNameError"
-    ]
-    ```
-
 ### Group API elements by task
 
 Packages should correspond to a specific task like classification or imputation. This eases discovery and makes it easy to switch between different solutions for the same task.
@@ -370,7 +386,114 @@ Examples
 
 ## Tests
 
-If a function contains more code than just the getting or setting of a value, automated test should be added to the [`tests`][tests-folder] folder. The file structure in the tests folder should mirror the file structure of the [`src`][src-folder] folder.
+We aim for 100% line coverage,
+so automated tests should be added
+for any new function.
+
+### File structure
+
+Tests belong in the [`tests`][tests-folder] folder.
+The file structure in the tests folder
+should mirror the file structure
+of the [`src`][src-folder] folder.
+
+### Naming
+
+Names of test functions
+shall start with `test_should_`
+followed by a description
+of the expected behaviour,
+e.g. `test_should_add_column`.
+
+!!! success "**DO** (library code):"
+
+    ```py
+    def test_should_raise_if_less_than_or_equal_to_0(self, number_of_trees) -> None:
+        with pytest.raises(ValueError, match="The parameter 'number_of_trees' has to be greater than 0."):
+            ...
+    ```
+
+!!! failure "**DON'T** (library code):"
+
+    ```py
+    def test_value_error(self, number_of_trees) -> None:
+        with pytest.raises(ValueError, match="The parameter 'number_of_trees' has to be greater than 0."):
+            ...
+    ```
+
+### Parametrization
+
+Tests should be parametrized
+using `@pytest.mark.parametrize`,
+even if there is only a single test case.
+This makes it easier
+to add new test cases in the future.
+Test cases should be given
+descriptive IDs.
+
+!!! success "**DO** (library code):"
+
+    ```py
+    @pytest.mark.parametrize("number_of_trees", [0, -1], ids=["zero", "negative"])
+    def test_should_raise_if_less_than_or_equal_to_0(self, number_of_trees) -> None:
+        with pytest.raises(ValueError, match="The parameter 'number_of_trees' has to be greater than 0."):
+            RandomForest(number_of_trees=number_of_trees)
+    ```
+
+!!! failure "**DON'T** (library code):"
+
+    ```py
+    def test_should_raise_if_less_than_0(self, number_of_trees) -> None:
+        with pytest.raises(ValueError, match="The parameter 'number_of_trees' has to be greater than 0."):
+            RandomForest(number_of_trees=-1)
+
+    def test_should_raise_if_equal_to_0(self, number_of_trees) -> None:
+        with pytest.raises(ValueError, match="The parameter 'number_of_trees' has to be greater than 0."):
+            RandomForest(number_of_trees=0)
+    ```
+
+## Code style
+
+### Consistency
+
+If there is more than one way
+to solve a particular task,
+check how it has been solved
+at other places in the codebase
+and stick to that solution.
+
+### Sort exported classes in `__init__.py`
+
+Classes defined in a module
+that other classes shall be able to import
+must be defined
+in a list named `__all__`
+in the module's `__init__.py` file.
+This list should be sorted alphabetically,
+to reduce the likelihood of merge conflicts
+when adding new classes to it.
+
+!!! success "**DO** (library code):"
+
+    ```py
+    __all__ = [
+        "Column",
+        "Row",
+        "Table",
+        "TaggedTable",
+    ]
+    ```
+
+!!! failure "**DON'T** (library code):"
+
+    ```py
+    __all__ = [
+        "Table",
+        "TaggedTable",
+        "Column",
+        "Row",
+    ]
+    ```
 
 [src-folder]: https://github.com/Safe-DS/Stdlib/tree/main/src
 [tests-folder]: https://github.com/Safe-DS/Stdlib/tree/main/tests

diff --git a/mkdocs.yml b/mkdocs.yml
@@ -15,8 +15,9 @@ nav:
   - Glossary: glossary.md
   - Development:
     - Environment: development/environment.md
-    - Guidelines: development/guidelines.md
+    - Project Guidelines: development/project_guidelines.md
     - Contributing: https://github.com/Safe-DS/Stdlib/contribute
+    - Code Review Guide: development/code_review.md
 
 # Configuration of MkDocs & Material for MkDocs --------------------------------