This is the base repository for PoC (Proof of Concept) code: a boilerplate project for creating a Python task using podder-pipeline.
$ tree . -L 2
.
├── Dockerfile
├── README.md
├── api
│   ├── __init__.py
│   ├── grpc_server.py
│   ├── protos
│   └── task_api.py
├── app
│   ├── __init__.py
│   └── task.py  # main task implementation
├── log.yml
├── main.py
├── requirements
│   ├── requirements.develop.txt
│   └── requirements.txt  # add required packages here
├── run_codegen.py
├── scripts
│   ├── entrypoint.sh
│   └── pre-commit.sh  # execute before committing your code
├── shared
│   ├── data
│   └── tmp
└── tests
    ├── files
    │   └── inputs.json  # sample inputs.json
    └── unit  # add unit tests here
Add your code to app/task.py.
Please check the task sample here: Sample
def __init__(self, context: Context) -> None:
    self.logger.debug("Initiate task...")
    super().__init__(context)

def execute(self) -> None:
    self.logger.debug("START processing...")
    self.yourProcess(self.args.input_path)
    self.logger.debug("Completed.")

def set_arguments(self, parser) -> None:
    parser.add_argument('--input_path', dest="input_path", help='set input path', default='.')
The podder-task-base Python module provides many APIs for development.
You can output logs with self.logger. logger is just a wrapper of logging. For further logging usage, please check the logging documentation.
self.logger.debug("debug")
self.logger.info("info")
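Because self.logger is described as a wrapper of logging, the effect of log levels can be sketched with the standard library alone. This is a minimal illustration, not the podder-task-base setup: the logger name podder.task.sketch, the INFO level, and the in-memory stream are all assumptions made for the example.

```python
import io
import logging

# Hypothetical logger name and handler; podder-task-base configures its own.
stream = io.StringIO()
logger = logging.getLogger("podder.task.sketch")
logger.setLevel(logging.INFO)  # messages below INFO are dropped
logger.propagate = False       # keep output out of the root logger
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
logger.addHandler(handler)

logger.debug("debug")  # filtered out at INFO level
logger.info("info")    # recorded

captured = stream.getvalue()
print(captured, end="")
```

With the level set to DEBUG instead, both messages would be recorded.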
You can add your own command line arguments using self.context.config.set_argument within task.py.
After you execute with command line arguments, you can access the passed arguments through self.context.config.get.
For example, add --model-path as a command line argument.
# Set your command line argument
def set_arguments(self) -> None:
    self.context.config.set_argument('--model-path', dest="model_path", help='set model path')

# Execute main.py with argument "--model-path"
$ python main.py --model-path /path/to/model

# You can access the value passed to "--model-path"
def execute(self, inputs: List[Any]) -> List[Any]:
    model = self.context.config.get('model_path')
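The link between the --model-path flag and the model_path key follows the dest convention of argparse. The sketch below uses argparse directly and assumes that set_argument forwards its arguments to something like ArgumentParser.add_argument; that forwarding is an assumption, not confirmed behavior of podder-task-base.

```python
import argparse

# Assumption: set_argument behaves like ArgumentParser.add_argument.
parser = argparse.ArgumentParser()
parser.add_argument("--model-path", dest="model_path", help="set model path")

# Equivalent to: python main.py --model-path /path/to/model
args = parser.parse_args(["--model-path", "/path/to/model"])

# The value is looked up by dest ("model_path"), not by the flag name.
model_path = vars(args)["model_path"]
print(model_path)
```

Note that the lookup key uses an underscore while the flag uses a hyphen.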
There are 4 shared directories: config, input, output, and tmp.
They are shared across the environment, and every container can access them.
- config: Where config files are located.
- input: Where input files are located.
- output: Where output files are located.
- tmp: Where temporary files are located. Podder Pipeline creates a directory under tmp/dag_id/job_id to keep each job's temporary files.
When you need to store temporary files, please put them in the tmp directory.
You can get a path inside the tmp directory with self.context.file.get_tmp_path(file_name).
self.context.file.get_tmp_path('sample.csv')
# => /path/to/shared/tmp/sample.csv
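To make the per-job layout tmp/dag_id/job_id concrete, here is a minimal sketch of how such a path could be composed. The helper build_tmp_path, the shared root, and the sample ids are hypothetical; the value actually returned by get_tmp_path may differ.

```python
from pathlib import PurePosixPath


def build_tmp_path(shared_root: str, dag_id: str, job_id: str, file_name: str) -> str:
    """Hypothetical helper: compose <shared_root>/tmp/<dag_id>/<job_id>/<file_name>."""
    return str(PurePosixPath(shared_root) / "tmp" / dag_id / job_id / file_name)


# Sample ids are made up for illustration.
path = build_tmp_path("/path/to/shared", "sample_dag", "job_001", "sample.csv")
print(path)
```

Keeping the dag and job ids in the path means two jobs that write the same file name do not overwrite each other.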
We strongly recommend running Podder Task with Docker.
- Build docker image
$ docker build -t podder-task .
- Execute on the docker container
$ docker run -it --env-file .env.example podder-task bash
# You can run your code
$ python main.py --inputs tests/files/inputs.json
- Run with one-liner
You can also start the container and run the code in a single command.
$ docker run -it --env-file .env.example podder-task python main.py --inputs tests/files/inputs.json
# clone podder-task
$ git clone git@github.com:podder-ai/podder-task.git
$ cd podder-task
# enable python3
$ python3 -m venv env
$ source env/bin/activate
# install required libraries
$ pip install -r requirements/requirements.txt
# run sample code
$ python main.py --inputs /path/to/input/a /path/to/input/b
If you use PowerShell, the activate script is subject to the system's execution policies. By default on Windows 7, the execution policy is set to Restricted, meaning no scripts, including the virtualenv activation script, are allowed to run.
To use the script, you can relax the execution policy to Unrestricted, meaning all scripts on the system can be executed. As an administrator, run:
C:\>Set-ExecutionPolicy Unrestricted -Scope CurrentUser -Force -Verbose
# clone podder-task
C:\> git clone git@github.com:podder-ai/podder-task.git
C:\> cd podder-task
# enable python3
C:\>python3 -m venv C:\path\to\myenv
# Windows cmd.exe
C:\> C:\path\to\myenv\Scripts\activate.bat
# PowerShell PS
C:\> C:\path\to\myenv\Scripts\Activate.ps1
# install required libraries
C:\> pip install -r requirements/requirements.txt
# run sample code
C:\> python main.py --inputs /path/to/input/a /path/to/input/b
Copy the sample file to create .env, then add your environment variables.
$ cp .env.sample .env
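Assuming the variables from .env reach the process environment (for example through docker run --env-file, as in the Docker commands in this README), they can be read with os.environ. The following is a minimal sketch; get_setting and the variable names SAMPLE_API_URL and SAMPLE_TIMEOUT are hypothetical.

```python
import os
from typing import Mapping, Optional


def get_setting(name: str, default: Optional[str] = None,
                environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return an environment variable, or the default when it is unset."""
    return environ.get(name, default)


# Hypothetical variable names, simulated here with a plain dict.
fake_env = {"SAMPLE_API_URL": "http://localhost:8080"}
api_url = get_setting("SAMPLE_API_URL", environ=fake_env)
timeout = get_setting("SAMPLE_TIMEOUT", default="30", environ=fake_env)
print(api_url, timeout)
```

Passing the mapping explicitly keeps the helper easy to unit test without touching the real environment.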
Please run the linters, formatters, and unit tests before committing your source code.
You can run them with the following commands. Make sure that you are in the root directory of your project (e.g. podder-task/).
$ pip install -r ./requirements/requirements.develop.txt
$ sh ./scripts/pre-commit.sh
- flake8
- autopep8
- yapf
- autoflake
- isort
- pytest
Please follow the official documents of the libraries.
$ cd podder-task
$ docker build . -t podder-task
$ docker run --env-file .env.example -t podder-task pytest
Finally, your task implementation will be integrated into Podder-Pipeline and deployed using Docker/Kubernetes. To make this easier, please follow the implementation rules below.
- Only add your code to app/task.py
- Put your data set or model files in data
- Your task implementation will be compiled by Cython during integration. Please don't use __file__ in your code.
- Create a virtual environment for your code. Please check Creation of virtual environments
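Since __file__ should be avoided, paths to data set or model files need a different anchor. One possible approach, sketched below, is to resolve them from a base directory taken from an environment variable, falling back to the current working directory. The helper resolve_data_path and the variable name TASK_BASE_DIR are hypothetical and not part of Podder Task.

```python
import os
from pathlib import Path


def resolve_data_path(*parts: str) -> Path:
    """Hypothetical helper: locate files under <base>/data without __file__."""
    # TASK_BASE_DIR is an assumed variable name; fall back to the working directory.
    base = Path(os.environ.get("TASK_BASE_DIR", os.getcwd()))
    return base.joinpath("data", *parts)


model_file = resolve_data_path("model.bin")
print(model_file)
```

This relies on the task being started from the project root, or on the base directory being set explicitly.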
Please open an issue or pull request if you have any requests!