diff --git a/guide/index.html b/guide/index.html
index 7a675df..dcb1437 100644
--- a/guide/index.html
+++ b/guide/index.html
@@ -655,6 +655,7 @@

Project directory structure

 │   ├── interim                           <- Intermediate data that has been transformed.
 │   ├── processed                         <- The final, canonical data sets for modeling.
 │   └── raw                               <- The original, immutable data dump.
+├── Dockerfile                            <- Dockerfile definition.
 ├── docs                                  <- The mkdocs documentation sources.
 │   ├── api_ref                           <- Source package docs.
 │   │   ├── consts.md
@@ -669,7 +670,8 @@


 │   │   └── tests.md
 │   ├── index.md                          <- Docs homepage.
 │   └── __init__.py
-├── env.yaml                              <- Conda environment definition.
+├── env-dev.yaml                          <- Conda environment definition with development dependencies.
+├── env.yaml                              <- Main Conda environment definition with only the necessary packages.
 ├── LICENSE                               <- The license file.
 ├── Makefile                              <- Makefile with commands like `make docs` or
 │                                            `make pc`.
@@ -705,18 +707,13 @@


 │                                            and information, which are used by pip to build
 │                                            the package and project tooling configs.
 ├── README.md
+├── setup.py
 └── tests                                 <- The tests directory.
     ├── conftest.py                       <- Contains test fixtures and utility functions.
     ├── e2e                               <- Contains end-to-end tests.
-    │   ├── __init__.py
-    │   └── test_dummy.py
     ├── __init__.py
     ├── integration                       <- Contains integration tests.
-    │   ├── __init__.py
-    │   └── test_dummy.py
     └── unit                              <- Contains unit tests.
-        ├── __init__.py
-        └── test_dummy.py

Most of those folders were described in detail in the Cookiecutter Data Science Docs.

diff --git a/search/search_index.json b/search/search_index.json
index e18765d..3847b50 100644
--- a/search/search_index.json
+++ b/search/search_index.json
@@ -1 +1 @@
-{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"ml-project-cookiecutter","text":"

A cookiecutter template for my private ML projects.

"},{"location":"#motivation","title":"Motivation","text":"

During my career I worked on a lot of different ML projects - computer vision, NLP, classical ML, time series forecasting and others. The projects ranged from pure R&D, through PoCs, to production-ready stuff. Whenever I started a new project, I found myself copying things from a bunch of different sources and my old projects again and again - recreating and duplicating work I had done a dozen times before. Cookiecutter project templates to the rescue!

The choice of technologies and certain patterns in the template is highly opinionated and dictated by years of experience working with Data Scientists and R&D Engineers. As an ML Engineer, I would often find myself working with low-quality code, written by others in notebooks or scripts, without any form of documentation, a standardized coding style, or even a way to reproduce the environment or analysis results. Moving that to production? Good luck!

In my opinion, the fastest way to move ML stuff to production is to force Data Scientists to write quality code from the start. Want to add your changes to the repo? Sure, once all pre-commit hooks are green you'll be able to commit your changes. Add to that a CI pipeline, automated tests and a PR review process, and you'll have an easier time getting the code and models production-ready faster.

Won't that slow down Data Scientists? Yes, at first at least. They'll have to learn to work with a set of standard Python tools that have been known in the industry for years. Spending a few hours on this is way better than spending a few weeks on productionizing the code later. Your ML/MLOps Engineers will thank you for this.

Note

Now, standardized code style, type hints and good documentation are just a small step toward success. None of this means much without understanding the code and following good coding practices. In my opinion, every great Data Scientist or ML Engineer should also be a great programmer. Learn how to write clean, testable code. Learn data structures, algorithms and design patterns. Have a CI in place. Verify changes via PRs and automated tests. Automate as much as you can. Integrate with other services that will allow you to ensure reproducibility, scaling, experiment tracking, artifact versioning and easier deployment.

This project was greatly inspired by the Cookiecutter Data Science project.

"},{"location":"#features","title":"Features","text":""},{"location":"#getting-started","title":"Getting started","text":"

To get started, please check out this guide.

"},{"location":"#contributing","title":"Contributing","text":"

Please refer to this guide.

"},{"location":"#running-tests","title":"Running tests","text":"

To run the unit tests execute:

pytest tests\n
"},{"location":"contributing/","title":"Contributing","text":"

Project structure and tool usage are highly opinionated within this project. As times change, so do best practices. I will try to keep the project up to date with the latest tools and practices.

The goal of this project is to make it easier to start, structure, reproduce, maintain and later deploy an ML project. The stuff in it is based on my own experiences and might not suit your needs. If you think something should be done in a different way, feel free to create an issue or fork the repo for your own usage. It's MIT-licensed, so you can do whatever the hell you want with it.

Pull requests and issues are welcome. I'd love to hear what works for you, and what doesn't. Although I cannot promise not to close them if I disagree with you.

"},{"location":"guide/","title":"Getting started","text":""},{"location":"guide/#requirements","title":"Requirements","text":"
conda install -c conda-forge cookiecutter\n
conda install -c conda-forge conda-lock -n base\n
"},{"location":"guide/#creating-new-project","title":"Creating new project","text":"

Run:

cookiecutter https://github.com/xultaeculcis/ml-project-cookiecutter\n

You will be prompted to provide project info one argument at a time:

project_name [project_name]: My ML project\nrepo_name [my-ml-project]:\nsrc_dir_name [my_ml_project]:\nauthor_name [Your name (or your organization/company/team)]: xultaeculcsis\nrepo_url [https://github.com/xultaeculcsis/my-ml-project]:\nproject_description [A short description of the project]: Just an ML project :)\nSelect license:\n1 - MIT\n2 - Apache 2.0\n3 - BSD-3-Clause\n4 - Beerware\n5 - GLWTS\n6 - Proprietary\n7 - Empty license file\nChoose from 1, 2, 3, 4, 5, 6, 7 [1]: 1\n

The repo_name, src_dir_name and repo_url will be automatically standardized and provided for you. You can change them to your liking though.

"},{"location":"guide/#working-with-the-project","title":"Working with the project","text":""},{"location":"guide/#project-directory-structure","title":"Project directory structure","text":"

The resulting project structure will look like this:

my-ml-project/\n\u251c\u2500\u2500 data\n\u2502   \u251c\u2500\u2500 analysis                          <- EDA artifacts.\n\u2502   \u251c\u2500\u2500 auxiliary                         <- The auxiliary, third party data.\n\u2502   \u251c\u2500\u2500 inference                         <- Inference results from your models.\n\u2502   \u251c\u2500\u2500 interim                           <- Intermediate data that has been transformed.\n\u2502   \u251c\u2500\u2500 processed                         <- The final, canonical data sets for modeling.\n\u2502   \u2514\u2500\u2500 raw                               <- The original, immutable data dump.\n\u251c\u2500\u2500 docs                                  <- The mkdocs documentation sources.\n\u2502   \u251c\u2500\u2500 api_ref                           <- Source package docs.\n\u2502   \u2502   \u251c\u2500\u2500 consts.md\n\u2502   \u2502   \u251c\u2500\u2500 core\n\u2502   \u2502   \u2502   \u251c\u2500\u2500 configs.md\n\u2502   \u2502   \u2502   \u2514\u2500\u2500 settings.md\n\u2502   \u2502   \u2514\u2500\u2500 utils.md\n\u2502   \u251c\u2500\u2500 guides                            <- How-to guides.\n\u2502   \u2502   \u251c\u2500\u2500 contributing.md\n\u2502   \u2502   \u251c\u2500\u2500 makefile-usage.md\n\u2502   \u2502   \u251c\u2500\u2500 setup-dev-env.md\n\u2502   \u2502   \u2514\u2500\u2500 tests.md\n\u2502   \u251c\u2500\u2500 index.md                          <- Docs homepage.\n\u2502   \u2514\u2500\u2500 __init__.py\n\u251c\u2500\u2500 env.yaml                              <- Conda environment definition.\n\u251c\u2500\u2500 LICENSE                               <- The license file.\n\u251c\u2500\u2500 Makefile                              <- Makefile with commands like `make docs` or\n\u2502                                            `make pc`.\n\u251c\u2500\u2500 mkdocs.yml\n\u251c\u2500\u2500 my_ml_project                         <- Project source code. This will be different\n\u2502   \u2502                                        depending on your input during project creation.\n\u2502   \u251c\u2500\u2500 consts                            <- Constants to be used across the project.\n\u2502   \u2502   \u251c\u2500\u2500 __init__.py\n\u2502   \u2502   \u251c\u2500\u2500 directories.py\n\u2502   \u2502   \u251c\u2500\u2500 logging.py\n\u2502   \u2502   \u2514\u2500\u2500 reproducibility.py\n\u2502   \u251c\u2500\u2500 core                              <- Core project stuff. E.g., the base classes\n\u2502   \u2502   \u2502                                    for step entrypoint configs.\n\u2502   \u2502   \u251c\u2500\u2500 configs\n\u2502   \u2502   \u2502   \u2514\u2500\u2500 __init__.py\n\u2502   \u2502   \u2502   \u251c\u2500\u2500 argument_parsing.py\n\u2502   \u2502   \u2502   \u251c\u2500\u2500 base.py\n\u2502   \u2502   \u251c\u2500\u2500 __init__.py\n\u2502   \u2502   \u2514\u2500\u2500 settings.py\n\u2502   \u251c\u2500\u2500 __init__.py\n\u2502   \u251c\u2500\u2500 py.typed\n\u2502   \u2514\u2500\u2500 utils                             <- Utility functions and classes.\n\u2502       \u251c\u2500\u2500 __init__.py\n\u2502       \u251c\u2500\u2500 gpu.py\n\u2502       \u251c\u2500\u2500 logging.py\n\u2502       \u251c\u2500\u2500 mlflow.py\n\u2502       \u2514\u2500\u2500 serialization.py\n\u251c\u2500\u2500 notebooks                             <- Jupyter notebooks. 
Naming convention is a\n\u2502                                            number (for ordering), the creator's initials,\n\u2502                                            and a short `-` delimited description, e.g.\n\u2502                                            `1.0-jqp-initial-data-exploration`.\n\u251c\u2500\u2500 pyproject.toml                        <- Contains build system requirements\n\u2502                                            and information, which are used by pip to build\n\u2502                                            the package and project tooling configs.\n\u251c\u2500\u2500 README.md\n\u2514\u2500\u2500 tests                                 <- The tests directory.\n    \u251c\u2500\u2500 conftest.py                       <- Contains test fixtures and utility functions.\n    \u251c\u2500\u2500 e2e                               <- Contains end-to-end tests.\n    \u2502   \u251c\u2500\u2500 __init__.py\n    \u2502   \u2514\u2500\u2500 test_dummy.py\n    \u251c\u2500\u2500 __init__.py\n    \u251c\u2500\u2500 integration                       <- Contains integration tests.\n    \u2502   \u251c\u2500\u2500 __init__.py\n    \u2502   \u2514\u2500\u2500 test_dummy.py\n    \u2514\u2500\u2500 unit                              <- Contains unit tests.\n        \u251c\u2500\u2500 __init__.py\n        \u2514\u2500\u2500 test_dummy.py\n

Most of those folders were described in detail in the Cookiecutter Data Science Docs.

"},{"location":"guide/#environment-setup","title":"Environment setup","text":"

You'll need to init a git repo in your newly created project:

make git-init\n

Or:

git init\ngit add .\n
"},{"location":"guide/#via-makefile","title":"Via Makefile","text":"

Right after creating a new project from the cookiecutter template, you'll need to freeze the dependencies. The initial conda env.yaml has a minimal set of dependencies needed for the helper functions, test execution and docs creation. Note that most of the conda dependencies are not pinned in env.yaml. This is done on purpose to ensure that new projects can be created with the most up-to-date packages. Once you create the lock file, you can pin specific versions.

By default, the Makefile only supports the linux-64 platform. If your team works on multiple platforms you can add those platforms to the conda-lock command yourself.

To lock the environment run:

make lock-file\n

After creating the lock file you can create the conda environment by running:

make env\n

This command will set up the environment for you. It will also install pre-commit hooks and the project in editable mode. Once done, you can activate the environment by running:

conda activate <env-name>\n

By default, the <env-name> created using the Makefile will be equal to the cookiecutter.repo_name variable.

Note

If you are on Windows, the make command will be unavailable. We recommend working with WSL in that case.

For example, for linux-64 the full list of commands (using the Makefile) would look like this:

make git-init\nmake lock-file\nmake env\nconda activate <env-name>\n
Note

If you want to initialize the Git repository, create the lock file and set up the development environment in one go, you can run:

make init-project\nconda activate <env-name>\n
"},{"location":"guide/#manually","title":"Manually","text":"

If you are not on Linux, the setup via the Makefile might not work. In that case, run the following commands manually. But before that, please determine your platform:

To set up your local env from scratch run:

  1. Create conda-lock file:

    conda-lock --mamba -f ./env.yaml -p <your-platform>\n

    You can also create a lock-file for multiple platforms:

    conda-lock --mamba -f ./env.yaml -p linux-64 -p osx-arm64 -p win-64\n
  2. Create environment using conda-lock:

    conda-lock install --mamba -n <env-name> conda-lock.yml\n
  3. Activate the env:

    conda activate <env-name>\n
  4. Install pre-commit hooks:

    pre-commit install\n
  5. Install the project in editable mode:

    pip install -e .\n

These commands will use your conda-lock installation to create a lock file and a brand new conda environment named after your repository.

Note

Once you've initialized the git repo, created the lock file(s) and pinned the package versions, you should commit the changes and push them to a remote repository as the initial commit.

"},{"location":"guide/#pre-commit-hooks","title":"Pre-commit hooks","text":"

This project uses the pre-commit package for managing and maintaining pre-commit hooks.

To ensure code quality, please make sure that you have it configured.

  1. Install pre-commit and the following packages: isort, black, flake8, mypy, pytest.

  2. Install pre-commit hooks by running: pre-commit install

  3. The hooks will automatically run formatters, code checks and other steps defined in the .pre-commit-config.yaml

  4. All of those checks will also be run whenever a new commit is created, i.e. when you run git commit -m \"blah\".

  5. You can also run them manually with this command: pre-commit run --all-files

You can manually disable pre-commit hooks by running: pre-commit uninstall. Use this only in exceptional cases.

"},{"location":"guide/#environment-variables","title":"Environment variables","text":"

Ask your colleagues for the .env files, which aren't included in this repository, and put them inside the repo's root directory. Please never put secrets in source control. Always align with your IT department's security practices.

To see which variables you need, check the .env-sample file.

"},{"location":"guide/#ci-pipelines","title":"CI pipelines","text":"

Currently, the project supports only Azure DevOps Pipelines.

By default, the project comes with a single CI pipeline that runs a set of simplified pre-commit hooks on each PR commit that targets the main branch.

"},{"location":"guide/#documentation","title":"Documentation","text":"

We use MkDocs with the Material theme.

To build the docs run:

make docs\n

If you want to preview the docs locally, use:

mkdocs serve\n

The docs page should then be available to you under: http://127.0.0.1:8000/

Note

Please note that Google-style docstrings are used throughout the repo.

"},{"location":"license/","title":"License","text":""},{"location":"license/#mit-license","title":"MIT License","text":"

Copyright (c) 2023 xultaeculcis

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \"Software\"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

"}]} \ No newline at end of file +{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"ml-project-cookiecutter","text":"

A cookiecutter template for my private ML projects.

"},{"location":"#motivation","title":"Motivation","text":"

During my career I worked on a lot of different ML projects - computer vision, NLP, classical ML, time series forecasting and others. The projects ranged from pure R&D, through PoCs, to production-ready stuff. Whenever I started a new project, I found myself copying things from a bunch of different sources and my old projects again and again - recreating and duplicating work I had done a dozen times before. Cookiecutter project templates to the rescue!

The choice of technologies and certain patterns in the template is highly opinionated and dictated by years of experience working with Data Scientists and R&D Engineers. As an ML Engineer, I would often find myself working with low-quality code, written by others in notebooks or scripts, without any form of documentation, a standardized coding style, or even a way to reproduce the environment or analysis results. Moving that to production? Good luck!

In my opinion, the fastest way to move ML stuff to production is to force Data Scientists to write quality code from the start. Want to add your changes to the repo? Sure, once all pre-commit hooks are green you'll be able to commit your changes. Add to that a CI pipeline, automated tests and a PR review process, and you'll have an easier time getting the code and models production-ready faster.

Won't that slow down Data Scientists? Yes, at first at least. They'll have to learn to work with a set of standard Python tools that have been known in the industry for years. Spending a few hours on this is way better than spending a few weeks on productionizing the code later. Your ML/MLOps Engineers will thank you for this.

Note

Now, standardized code style, type hints and good documentation are just a small step toward success. None of this means much without understanding the code and following good coding practices. In my opinion, every great Data Scientist or ML Engineer should also be a great programmer. Learn how to write clean, testable code. Learn data structures, algorithms and design patterns. Have a CI in place. Verify changes via PRs and automated tests. Automate as much as you can. Integrate with other services that will allow you to ensure reproducibility, scaling, experiment tracking, artifact versioning and easier deployment.

This project was greatly inspired by the Cookiecutter Data Science project.

"},{"location":"#features","title":"Features","text":""},{"location":"#getting-started","title":"Getting started","text":"

To get started, please check out this guide.

"},{"location":"#contributing","title":"Contributing","text":"

Please refer to this guide.

"},{"location":"#running-tests","title":"Running tests","text":"

To run the unit tests execute:

pytest tests\n
"},{"location":"contributing/","title":"Contributing","text":"

Project structure and tool usage are highly opinionated within this project. As times change, so do best practices. I will try to keep the project up to date with the latest tools and practices.

The goal of this project is to make it easier to start, structure, reproduce, maintain and later deploy an ML project. The stuff in it is based on my own experiences and might not suit your needs. If you think something should be done in a different way, feel free to create an issue or fork the repo for your own usage. It's MIT-licensed, so you can do whatever the hell you want with it.

Pull requests and issues are welcome. I'd love to hear what works for you, and what doesn't. Although I cannot promise not to close them if I disagree with you.

"},{"location":"guide/","title":"Getting started","text":""},{"location":"guide/#requirements","title":"Requirements","text":"
conda install -c conda-forge cookiecutter\n
conda install -c conda-forge conda-lock -n base\n
"},{"location":"guide/#creating-new-project","title":"Creating new project","text":"

Run:

cookiecutter https://github.com/xultaeculcis/ml-project-cookiecutter\n

You will be prompted to provide project info one argument at a time:

project_name [project_name]: My ML project\nrepo_name [my-ml-project]:\nsrc_dir_name [my_ml_project]:\nauthor_name [Your name (or your organization/company/team)]: xultaeculcsis\nrepo_url [https://github.com/xultaeculcsis/my-ml-project]:\nproject_description [A short description of the project]: Just an ML project :)\nSelect license:\n1 - MIT\n2 - Apache 2.0\n3 - BSD-3-Clause\n4 - Beerware\n5 - GLWTS\n6 - Proprietary\n7 - Empty license file\nChoose from 1, 2, 3, 4, 5, 6, 7 [1]: 1\n

The repo_name, src_dir_name and repo_url will be automatically standardized and provided for you. You can change them to your liking though.

"},{"location":"guide/#working-with-the-project","title":"Working with the project","text":""},{"location":"guide/#project-directory-structure","title":"Project directory structure","text":"

The resulting project structure will look like this:

my-ml-project/\n\u251c\u2500\u2500 data\n\u2502   \u251c\u2500\u2500 analysis                          <- EDA artifacts.\n\u2502   \u251c\u2500\u2500 auxiliary                         <- The auxiliary, third party data.\n\u2502   \u251c\u2500\u2500 inference                         <- Inference results from your models.\n\u2502   \u251c\u2500\u2500 interim                           <- Intermediate data that has been transformed.\n\u2502   \u251c\u2500\u2500 processed                         <- The final, canonical data sets for modeling.\n\u2502   \u2514\u2500\u2500 raw                               <- The original, immutable data dump.\n\u251c\u2500\u2500 Dockerfile                            <- Dockerfile definition.\n\u251c\u2500\u2500 docs                                  <- The mkdocs documentation sources.\n\u2502   \u251c\u2500\u2500 api_ref                           <- Source package docs.\n\u2502   \u2502   \u251c\u2500\u2500 consts.md\n\u2502   \u2502   \u251c\u2500\u2500 core\n\u2502   \u2502   \u2502   \u251c\u2500\u2500 configs.md\n\u2502   \u2502   \u2502   \u2514\u2500\u2500 settings.md\n\u2502   \u2502   \u2514\u2500\u2500 utils.md\n\u2502   \u251c\u2500\u2500 guides                            <- How-to guides.\n\u2502   \u2502   \u251c\u2500\u2500 contributing.md\n\u2502   \u2502   \u251c\u2500\u2500 makefile-usage.md\n\u2502   \u2502   \u251c\u2500\u2500 setup-dev-env.md\n\u2502   \u2502   \u2514\u2500\u2500 tests.md\n\u2502   \u251c\u2500\u2500 index.md                          <- Docs homepage.\n\u2502   \u2514\u2500\u2500 __init__.py\n\u251c\u2500\u2500 env-dev.yaml                          <- Conda environment definition with development dependencies.\n\u251c\u2500\u2500 env.yaml                              <- Main Conda environment definition with only the necessary packages.\n\u251c\u2500\u2500 LICENSE                               <- The license file.\n\u251c\u2500\u2500 Makefile                              <- Makefile with commands like `make docs` or\n\u2502                                            `make pc`.\n\u251c\u2500\u2500 mkdocs.yml\n\u251c\u2500\u2500 my_ml_project                         <- Project source code. This will be different\n\u2502   \u2502                                        depending on your input during project creation.\n\u2502   \u251c\u2500\u2500 consts                            <- Constants to be used across the project.\n\u2502   \u2502   \u251c\u2500\u2500 __init__.py\n\u2502   \u2502   \u251c\u2500\u2500 directories.py\n\u2502   \u2502   \u251c\u2500\u2500 logging.py\n\u2502   \u2502   \u2514\u2500\u2500 reproducibility.py\n\u2502   \u251c\u2500\u2500 core                              <- Core project stuff. 
E.g., the base classes\n\u2502   \u2502   \u2502                                    for step entrypoint configs.\n\u2502   \u2502   \u251c\u2500\u2500 configs\n\u2502   \u2502   \u2502   \u2514\u2500\u2500 __init__.py\n\u2502   \u2502   \u2502   \u251c\u2500\u2500 argument_parsing.py\n\u2502   \u2502   \u2502   \u251c\u2500\u2500 base.py\n\u2502   \u2502   \u251c\u2500\u2500 __init__.py\n\u2502   \u2502   \u2514\u2500\u2500 settings.py\n\u2502   \u251c\u2500\u2500 __init__.py\n\u2502   \u251c\u2500\u2500 py.typed\n\u2502   \u2514\u2500\u2500 utils                             <- Utility functions and classes.\n\u2502       \u251c\u2500\u2500 __init__.py\n\u2502       \u251c\u2500\u2500 gpu.py\n\u2502       \u251c\u2500\u2500 logging.py\n\u2502       \u251c\u2500\u2500 mlflow.py\n\u2502       \u2514\u2500\u2500 serialization.py\n\u251c\u2500\u2500 notebooks                             <- Jupyter notebooks. Naming convention is a\n\u2502                                            number (for ordering), the creator's initials,\n\u2502                                            and a short `-` delimited description, e.g.\n\u2502                                            `1.0-jqp-initial-data-exploration`.\n\u251c\u2500\u2500 pyproject.toml                        <- Contains build system requirements\n\u2502                                            and information, which are used by pip to build\n\u2502                                            the package and project tooling configs.\n\u251c\u2500\u2500 README.md\n\u251c\u2500\u2500 setup.py\n\u2514\u2500\u2500 tests                                 <- The tests directory.\n    \u251c\u2500\u2500 conftest.py                       <- Contains test fixtures and utility functions.\n    \u251c\u2500\u2500 e2e                               <- Contains end-to-end tests.\n    \u251c\u2500\u2500 __init__.py\n    \u251c\u2500\u2500 integration                       <- Contains integration tests.\n    \u2514\u2500\u2500 unit                              <- Contains unit tests.\n

Most of those folders were described in detail in the Cookiecutter Data Science Docs.

"},{"location":"guide/#environment-setup","title":"Environment setup","text":"

You'll need to init a git repo in your newly created project:

make git-init\n

Or:

git init\ngit add .\n
"},{"location":"guide/#via-makefile","title":"Via Makefile","text":"

Right after creating a new project from the cookiecutter template, you'll need to freeze the dependencies. The initial conda env.yaml has a minimal set of dependencies needed for the helper functions, test execution and docs creation. Note that most of the conda dependencies are not pinned in env.yaml. This is done on purpose to ensure that new projects can be created with the most up-to-date packages. Once you create the lock file, you can pin specific versions.

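For illustration, such an unpinned env.yaml could look roughly like this (a hedged sketch, not the template's actual file - the exact package list and Python version here are assumptions):

name: my-ml-project\nchannels:\n  - conda-forge\ndependencies:\n  - python=3.10\n  - pip\n  - pytest\n  - pre-commit\n  - mkdocs-material\n
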
By default, the Makefile only supports the linux-64 platform. If your team works on multiple platforms you can add those platforms to the conda-lock command yourself.

To lock the environment run:

make lock-file\n

After creating the lock file you can create the conda environment by running:

make env\n

This command will set up the environment for you. It will also install pre-commit hooks and the project in editable mode. Once done, you can activate the environment by running:

conda activate <env-name>\n

By default, the <env-name> created using the Makefile will be equal to the cookiecutter.repo_name variable.

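For the example inputs shown earlier, that would be:

conda activate my-ml-project\n
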
Note

If you are on Windows, the make command will be unavailable. We recommend working with WSL in that case.

For example, for linux-64 the full list of commands (using the Makefile) would look like this:

make git-init\nmake lock-file\nmake env\nconda activate <env-name>\n
Note

If you want to initialize the Git repository, create the lock file and set up the development environment in one go, you can run:

make init-project\nconda activate <env-name>\n
"},{"location":"guide/#manually","title":"Manually","text":"

If you are not on Linux, the setup via the Makefile might not work. In that case, run the following commands manually. But before that, please determine your platform:

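If you're unsure what your platform string is, conda can tell you (assuming a Unix-like shell for the grep):

conda info | grep platform\n
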
To set up your local env from scratch run:

  1. Create conda-lock file:

    conda-lock --mamba -f ./env.yaml -p <your-platform>\n

    You can also create a lock-file for multiple platforms:

    conda-lock --mamba -f ./env.yaml -p linux-64 -p osx-arm64 -p win-64\n
  2. Create environment using conda-lock:

    conda-lock install --mamba -n <env-name> conda-lock.yml\n
  3. Activate the env:

    conda activate <env-name>\n
  4. Install pre-commit hooks:

    pre-commit install\n
  5. Install the project in editable mode:

    pip install -e .\n

These commands will use your conda-lock installation to create a lock file and a brand new conda environment named after your repository.

Note

Once you've initialized the git repo, created the lock file(s) and pinned the package versions, you should commit the changes and push them to a remote repository as the initial commit.

"},{"location":"guide/#pre-commit-hooks","title":"Pre-commit hooks","text":"

This project uses the pre-commit package for managing and maintaining pre-commit hooks.

To ensure code quality, please make sure that you have it configured.

  1. Install pre-commit and the following packages: isort, black, flake8, mypy, pytest.

  2. Install pre-commit hooks by running: pre-commit install

  3. The hooks will automatically run formatters, code checks and other steps defined in the .pre-commit-config.yaml (a minimal sketch of that file is shown after this list).

  4. All of those checks will also be run whenever a new commit is created, i.e. when you run git commit -m \"blah\".

  5. You can also run them manually with this command: pre-commit run --all-files

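A minimal sketch of what the .pre-commit-config.yaml format looks like (the repos and rev pins below are illustrative placeholders, not the template's actual config):

repos:\n  - repo: https://github.com/pre-commit/pre-commit-hooks\n    rev: v4.4.0\n    hooks:\n      - id: trailing-whitespace\n      - id: end-of-file-fixer\n  - repo: https://github.com/psf/black\n    rev: 23.3.0\n    hooks:\n      - id: black\n
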
You can manually disable pre-commit hooks by running: pre-commit uninstall. Use this only in exceptional cases.

"},{"location":"guide/#environment-variables","title":"Environment variables","text":"

Ask your colleagues for the .env files, which aren't included in this repository, and put them inside the repo's root directory. Please never put secrets in source control. Always align with your IT department's security practices.

To see which variables you need, check the .env-sample file.

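A purely hypothetical example of its contents - the actual variable names are project-specific, and MLFLOW_TRACKING_URI below is just a plausible guess given the mlflow utilities in the source tree:

MLFLOW_TRACKING_URI=http://localhost:5000\nSOME_SERVICE_API_KEY=<ask-your-colleagues>\n
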
"},{"location":"guide/#ci-pipelines","title":"CI pipelines","text":"

Currently, the project supports only Azure DevOps Pipelines.

By default, the project comes with a single CI pipeline that runs a set of simplified pre-commit hooks on each PR commit that targets the main branch.

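As a rough sketch (illustrative only, not the template's actual azure-pipelines.yml), such a pipeline could look like this:

pr:\n  branches:\n    include:\n      - main\npool:\n  vmImage: ubuntu-latest\nsteps:\n  - script: |\n      pip install pre-commit\n      pre-commit run --all-files\n    displayName: Run pre-commit checks\n
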
"},{"location":"guide/#documentation","title":"Documentation","text":"

We use MkDocs with the Material theme.

To build the docs run:

make docs\n

If you want to preview the docs locally, use:

mkdocs serve\n

The docs page should then be available to you under: http://127.0.0.1:8000/

Note

Please note that Google-style docstrings are used throughout the repo.

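For reference, a Google-style docstring looks like this (the function itself is made up for illustration):

def scale(values: list[float], factor: float) -> list[float]:\n    \"\"\"Multiplies every value in a list by a constant factor.\n\n    Args:\n        values: The numbers to scale.\n        factor: The multiplier to apply.\n\n    Returns:\n        A new list with the scaled values.\n    \"\"\"\n    return [v * factor for v in values]\n
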
"},{"location":"license/","title":"License","text":""},{"location":"license/#mit-license","title":"MIT License","text":"

Copyright (c) 2023 xultaeculcis

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the \"Software\"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

"}]} \ No newline at end of file diff --git a/sitemap.xml.gz b/sitemap.xml.gz index 41bcad5..c404b47 100644 Binary files a/sitemap.xml.gz and b/sitemap.xml.gz differ