Contributing to datalad-next

When should I consider a contribution to datalad-next?

In short: whenever a contribution to the DataLad core package would make sense, it should also be suitable for datalad-next.

What contributions should be directed elsewhere?

Special interest, highly domain-specific functionality is likely better suited for a topical DataLad extension package.

Functionality that requires complex additional dependencies, or is highly platform-specific might also be better kept in a dedicated extension package.

If in doubt, it is advisable to file an issue and ask for feedback before preparing a contribution.

When is a contribution to datalad-next preferable over one to the DataLad core package?

New feature releases of datalad-next happen more frequently than those of the core package, typically every 4-6 weeks.

New features depending on other datalad-next features are, by necessity, better directed at datalad-next.

Developer cheat sheet

Hatch is used as a convenience solution for packaging and development tasks. Hatch takes care of managing dependencies and environments, including the Python interpreter itself. If not installed yet, installing via pipx is recommended (pipx install hatch).

Below is a list of some of the provided convenience commands. An accurate overview of all provided convenience scripts can be obtained by running: hatch env show. The complete command setup can be found in pyproject.toml, and with dependencies managed by other means, all commands can also be run without hatch.

Run the tests (with coverage reporting)

hatch test [--cover]

There is also a setup for matrix test runs, covering all current Python versions:

hatch run tests:run [<select tests>]

This can also be used to run tests for a specific Python version only:

hatch run tests.py3.10:run [<select tests>]

Build the HTML documentation (under docs/_build/html)

hatch run docs:build
# clean with
hatch run docs:clean

Check type annotations

hatch run types:check

Check commit messages for compliance with Conventional Commits

hatch run cz:check-commits

Show would-be auto-generated changelog for the next release

Run this command to see whether a commit series yields a sensible changelog contribution.

hatch run cz:show-changelog

Create a new release

hatch run cz:bump-version

The new version is determined automatically from the nature of the (conventional) commits made since the last release. A changelog is generated and committed.

In cases where the generated changelog needs to be edited afterwards (typos, unnecessary complexity, etc.), the created version tag needs to be advanced to point to the commit with the edited changelog.

Build a new source package and wheel

hatch build

Publish a new release to PyPI

hatch publish

Contribution style guide

A contribution must be complete with code, tests, and documentation.

datalad-next is a staging area for features, hence any code is expected to move and morph. Therefore, tests are essential. High test coverage is desirable; contributors should aim for 95% coverage or better. Tests must be dedicated to the code of a particular contribution. It is not sufficient if other code happens to also exercise a new feature.

New code should be type-annotated. At minimum, a type annotation of the main API (e.g., function signatures) is needed. A dedicated CI run checks type annotations.

Docstrings should be complete with information on parameters, return values, and exception behavior. Documentation should be added to and rendered with the sphinx-based documentation.
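The following is a hypothetical helper (function, parameters, and behavior are made up for illustration) sketching the expected level of annotation and docstring detail; the numpy-style sections shown here should be adjusted to match the style used by the surrounding code:

from __future__ import annotations

from pathlib import Path

def read_manifest(path: Path, *, strict: bool = True) -> dict[str, str]:
    """Read a simple key=value manifest from a file

    Parameters
    ----------
    path: Path
      File to read.
    strict: bool, optional
      If True, raise on malformed lines instead of skipping them.

    Returns
    -------
    dict
      Mapping of manifest keys to values.

    Raises
    ------
    ValueError
      If ``strict`` is enabled and a malformed line is encountered.
    """
    manifest = {}
    for line in path.read_text().splitlines():
        if '=' not in line:
            if strict:
                raise ValueError(f'malformed manifest line: {line!r}')
            continue
        key, value = line.split('=', 1)
        manifest[key.strip()] = value.strip()
    return manifest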

Commits and commit messages must be Conventional Commits. Their compliance is checked for each pull request. The following commit types are recognized:

  • feat: introduces a new feature
  • fix: address a problem, fix a bug
  • doc: update the documentation
  • rf: refactor code with no change of functionality
  • perf: enhance performance of existing functionality
  • test: add/update/modify test implementations
  • ci: change CI setup
  • style: beautification
  • chore: results of routine tasks, such as changelog updates
  • revert: revert a previous change
  • bump: version update

Any breaking change must have at least one line of the format

BREAKING CHANGE: <summary of the breakage>

in the body of the commit message that introduces the breakage. Breaking changes can be introduced in any type of commit. Any number of breaking changes can be described in a commit message (one per line). Breaking changes trigger a major version update, and form a dedicated section in the changelog.
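For example, a hypothetical commit message (feature and wording made up) that follows these conventions and would trigger a major version update:

feat: add optional manifest validation on clone

Clones can now validate a manifest before any content is retrieved.

BREAKING CHANGE: the default validation mode changed from 'off' to 'warn'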

Code organization

In datalad-next, all code is organized in shallow sub-packages. Each sub-package is located in a directory within the datalad_next package.

Consequently, there are no top-level source files other than a few exceptions for technical reasons (__init__.py, conftest.py, _version.py).

A sub-package contains any number of code files, and a tests directory with all test implementations for that particular sub-package, and only for that sub-package. Other, deeper directory hierarchies are not to be expected.
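For illustration, a hypothetical sub-package mysubpkg (all names made up) would therefore be laid out roughly like this:

datalad_next/
  mysubpkg/
    __init__.py
    core.py
    tests/
      test_core.py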

There is no limit to the number of files. Contributors should strive for files with less than 500 lines of code.

Within a sub-package, code should generally use relative imports. The corresponding tests should also import the tested code via relative imports.

Code users should be able to import the most relevant functionality from the sub-package's __init__.py. Only items importable from the sub-package's top-level are considered to be part of its "public" API. If a sub-module is imported in the sub-package's __init__.py, consider adding __all__ to the sub-module to restrict wildcard imports from the sub-module, and to document what is considered to be part of the "public" API.
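Continuing the hypothetical mysubpkg example, a sub-module declares its public names via __all__, the sub-package's __init__.py re-exports them, and the tests import the tested code relatively:

# datalad_next/mysubpkg/core.py
__all__ = ['read_manifest']

def read_manifest(path):
    ...

# datalad_next/mysubpkg/__init__.py
from .core import read_manifest

# datalad_next/mysubpkg/tests/test_core.py
from ..core import read_manifest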

Sub-packages should be as self-contained as possible. Individual components in datalad-next should strive to be easily migratable to the DataLad core package. This means that organization principles like all-exceptions-go-into-a-single-location-in-datalad-next do not apply. For example, each sub-package should define its exceptions separately from others. When functionality is shared between sub-packages, absolute imports should be used.

There is one special sub-package in datalad-next: patches. All runtime patches to be applied to the DataLad core package must be placed here.

Runtime patches

The patches sub-package contains all runtime patches that are applied by datalad-next. Patches are applied on-import of datalad-next, and may modify arbitrary aspects of the runtime environment. A patch is enabled by adding a corresponding import statement to datalad_next/patches/enabled.py. The order of imports in this file is significant. New patches should consider behavior changes caused by other patches, and should be considerate of changes imposed on other patches.

datalad-next is imported (and thereby its patches applied) whenever used directly (e.g., when running commands provided by datalad-next, or by an extension that uses datalad-next). In addition, it is imported by the DataLad core package itself when the configuration item datalad.extensions.load=next is set.

Patches modify an external implementation that is itself subject to change. To improve the validity and longevity of patches, it is helpful to consider a few guidelines:

  • Patches should use datalad_next.patches.apply_patch() to perform the patching, in order to yield uniform (logging) behavior (see the sketch after this list).

  • Patches should be as self-contained as possible. The aim is for patches to be merged upstream (at the patched entity) as quickly as possible. Self-contained patches facilitate this process.

  • Patches should maximally limit their imports from sources that are not the patch target. This helps to detect when changes to the patch target (or its environment) are made, and also helps to isolate the patch from changes in the general environment of the patched software package that are unrelated to the specific patched code.
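A minimal sketch of a hypothetical patch module; the argument names and order passed to apply_patch() are an assumption for illustration only and should be verified against its docstring, and the patch target is made up:

# datalad_next/patches/fix_something.py (hypothetical patch module)
from . import apply_patch

def _patched_method(*args, **kwargs):
    # adjusted behavior goes here; may delegate to the original implementation
    ...

# NOTE: check apply_patch()'s docstring for the actual signature; this call
# assumes (module name, object name, attribute name, replacement)
apply_patch(
    'datalad.some.module',  # module holding the patch target (hypothetical)
    'SomeClass',            # object within that module
    'some_method',          # attribute to replace
    _patched_method,
)

# The patch is then enabled by adding a corresponding import statement to
# datalad_next/patches/enabled.py, mindful of the significance of import order.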

Imports

Import centralization per sub-package

If possible, sub-packages should have a "central" place for imports of functionality from outside datalad-next and the Python standard library. Other sub-package code should then import from this place via relative imports. This aims to make external dependencies more obvious, and import-error handling and mitigation for missing dependencies simpler and cleaner. Such a location could be the sub-package's __init__.py, or possibly a dedicated dependencies.py.

No "direct" imports from datalad

This is a specialization of the "Import centralization" rule above. Anything needed from datalad should be imported in a single dedicated location inside the sub-package, and all other sub-package code imports it from there.

The aim is to make it clearly visible which parts of the huge DataLad API are actually relevant for a particular feature. For some generic helpers it may be best to import them into datalad_next.utils or datalad_next.tests.utils.
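A sketch of this pattern for the hypothetical mysubpkg, using a dedicated dependencies.py as the single import location (the particular external and datalad symbols are placeholders for illustration):

# datalad_next/mysubpkg/dependencies.py -- sole location for external imports
# third-party dependency (hypothetical for this sub-package)
import requests
# the single place importing from the DataLad core package
from datalad.support.exceptions import CapturedException

# datalad_next/mysubpkg/core.py -- all other code imports relatively
from .dependencies import (
    CapturedException,
    requests,
)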

Prohibited DataLad core features

The following components of the datalad package must not be used (directly) in contributions to datalad-next, because they have been replaced by a different solution, with the aim to phase them out.

require_dataset()

Commands must use datalad_next.constraints.EnsureDataset instead.
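A sketch of the expected pattern; the exact call semantics (e.g., the installed keyword and the .ds attribute on the returned object) are assumptions here and should be verified against the datalad_next.constraints documentation:

from datalad_next.constraints import EnsureDataset

def some_command(dataset=None):
    # validate/resolve the dataset argument instead of require_dataset();
    # .ds is assumed to expose the resolved Dataset instance
    ds = EnsureDataset(installed=True)(dataset).ds
    ...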

nose-style decorators in test implementations

The use of decorators like with_tempfile is not allowed. pytest fixtures have to be used instead. A temporary exception may be the helpers that are imported in datalad_next.tests.utils. However, these will be reduced and removed over time, and additional usage only adds to the necessary refactoring effort. Therefore new usage is highly discouraged.
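For example, where nose-style code would decorate a test with with_tempfile, a pytest test requests a fixture such as the built-in tmp_path instead (test content is made up for illustration):

# nose-style (not allowed):
#
# @with_tempfile
# def test_feature(path=None):
#     ...

# pytest-style replacement using the built-in tmp_path fixture:
def test_feature(tmp_path):
    probe = tmp_path / 'probe.txt'
    probe.write_text('content')
    assert probe.read_text() == 'content'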

nose-style assertion helpers in test implementations

The use of helpers like assert_equal is not allowed. pytest constructs have to be used instead -- this typically means plain assert statements. A temporary exception may be the helpers that are imported in datalad_next.tests.utils. However, these will be reduced and removed over time, and additional usage only adds to the necessary refactoring effort. Therefore new usage is highly discouraged.
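For example (test content made up for illustration):

def test_outcome():
    result = {'status': 'ok'}
    # nose-style (not allowed): assert_equal(result['status'], 'ok')
    assert result['status'] == 'ok'
    # nose-style (not allowed): assert_in('status', result)
    assert 'status' in result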

Test output

Tests should be silent on stdout/stderr as much as possible. In particular, result renderings of DataLad commands must not be produced, unless necessary for testing a particular feature. The no_result_rendering fixture can be used to turn off result rendering without adding complexity to test implementations.
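A sketch of a test using this fixture, assuming it is provided by datalad-next's pytest setup and can therefore be requested by name (test content is made up for illustration):

def test_my_feature(no_result_rendering, tmp_path):
    # requesting the fixture disables result rendering for this test,
    # keeping stdout/stderr clean
    ...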