Add python hint to bigdl.friesian #5430

Merged
merged 14 commits into from
Aug 26, 2022
102 changes: 102 additions & 0 deletions docs/docs/DeveloperGuide/typehint.md
@@ -0,0 +1,102 @@
# Python Type hint

This page describes how to add type annotations to unannotated code efficiently.

## Introduction

Note that the Python runtime does not enforce function and variable [type annotations](https://docs.python.org/3/library/typing.html#module-typing). They can, however, be used by third-party tools such as type checkers, IDEs, and linters.

**Python Enhancement Proposals (PEPs)** are widely accepted Python standards. [PEP 484](https://peps.python.org/pep-0484/) defines type hints written with the function annotation syntax, for example:
```python
def greeting(name: str) -> str:
    return 'Hello ' + name
```
This states that the expected type of the `name` argument is `str`. Likewise, the expected return type is `str`.

Expressions whose type is a subtype of a specific argument type are also accepted for that argument.
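
As a minimal sketch of both points (annotations are not enforced at runtime, and subtypes are accepted), the snippet below reuses `greeting` from above. The `Title` subclass is a hypothetical name introduced only for illustration.

```python
def greeting(name: str) -> str:
    return 'Hello ' + name


class Title(str):
    """Hypothetical subclass of str, used only for this illustration."""


# A value whose type is a subtype of the annotated type is accepted.
print(greeting(Title('Dr. Smith')))

# Annotations are stored as metadata on the function object.
print(greeting.__annotations__)

# The interpreter does not check the annotation when the call is made;
# the failure comes from evaluating 'Hello ' + 42 inside the body.
try:
    greeting(42)
except TypeError as exc:
    print('TypeError:', exc)
```

A type checker would flag `greeting(42)` before the code runs, which is the main reason to add annotations even though the interpreter ignores them.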

## MonkeyType: Automatic Annotation

MonkeyType is a Python tool that collects the runtime types of function arguments and return values. Based on the collected types, it can automatically generate stub files or even add draft type annotations directly to your Python code. More details are available at [MonkeyType GitHub](https://github.com/Instagram/MonkeyType#example).

We can collect the types of the interfaces exposed to users by running unit tests with MonkeyType. Rather than applying the result automatically, we recommend previewing the changes first with `monkeytype stub` and then applying them manually.
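
To make the idea of collecting runtime types concrete, here is a minimal sketch. It is not how MonkeyType is implemented and does not use its API; the `record_types` decorator, the `observed` dictionary, and the `scale` function are all hypothetical names used only for illustration.

```python
import functools
from collections import defaultdict

# Hypothetical in-memory "trace store": function name -> set of observed signatures.
observed = defaultdict(set)


def record_types(func):
    """Record the argument and return type names seen on each call of func."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        arg_types = tuple(type(a).__name__ for a in args)
        kwarg_types = tuple(sorted((k, type(v).__name__) for k, v in kwargs.items()))
        observed[func.__name__].add((arg_types, kwarg_types, type(result).__name__))
        return result
    return wrapper


@record_types
def scale(value, factor=2):
    return value * factor


# Each call plays the role of a unit test exercising the function.
scale(3)
scale("ab", factor=3)

for name, signatures in observed.items():
    for arg_types, kwarg_types, return_type in sorted(signatures):
        print(name, arg_types, dict(kwarg_types), "->", return_type)
```

The sketch also shows why the collected types are only drafts: they reflect the calls that happened to run, so a function exercised with one argument type in the tests may legitimately accept others.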

## How Do We Work on Type Hints
Take `bigdl.friesian.feature` as an example.

0. Preparation
```shell
pip install monkeytype

cd BigDL/python
source friesian/dev/prepare_env.sh
```

1. Collect runtime types with monkeytype
Since monkeytype operates file by file, we can either run the unit tests (UTs) one file at a time or automate the process with `add_type_hint.sh`.
```shell
# Usage:
# bash dev/add_type_hint.sh module_name submodule_name
# module_name: orca, dllib, chronos, friesian, etc.
# submodule_name: a directory under module_name/test/bigdl/module_name
#
# Example:
# `bash dev/add_type_hint.sh friesian feature` will run all unit tests
# under python/friesian/test/bigdl/friesian/feature

bash dev/add_type_hint.sh friesian feature
```
If all UTs pass, you will see `friesian_hint.sqlite3` under `dev/`, which stores the collected call traces. List all modules that have traces in the trace store:
```shell
> export MT_DB_PATH="dev/friesian_hint.sqlite3" # monkeytype will read this traces database
> monkeytype list-modules
test.bigdl.friesian.feature.test_table
test.bigdl.friesian.feature.conftest
pyspark.traceback_utils
... ...
bigdl.friesian.feature.utils
bigdl.friesian.feature.table
... ...
```

2. Run `monkeytype stub some.module` to generate a stub file for the given module based on call traces queried from the trace store.
```shell
> monkeytype stub bigdl.friesian.feature.table

...
class Table:
    def __init__(self, df: "SparkDataFrame") -> None: ...

    @property
    def schema(self) -> "StructType": ...

    @staticmethod
    def _read_parquet(paths: Union[List[str], str]) -> "SparkDataFrame": ...
...

```
This output shows the annotations inferred from the calls made in the UTs. For more on `monkeytype stub`, see the [docs](https://monkeytype.readthedocs.io/en/latest/generation.html#monkeytype-stub).

3. Apply type annotations manually (recommended) or use `monkeytype apply some.module`.

4. **Check again that the annotations are consistent with the comments and documentation. (IMPORTANT)**
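
One possible way to support the final check is to print each function's signature next to the first line of its docstring, so mismatches stand out during review. This is only a sketch: `describe_annotations` and the `Demo` class are hypothetical and not part of BigDL or MonkeyType.

```python
import inspect


def describe_annotations(obj):
    """Return (name, signature, first docstring line) for each function in obj."""
    rows = []
    for name, member in inspect.getmembers(obj, inspect.isfunction):
        doc = inspect.getdoc(member) or ""
        first_line = doc.splitlines()[0] if doc else "<no docstring>"
        rows.append((name, str(inspect.signature(member)), first_line))
    return rows


class Demo:
    """Hypothetical class standing in for a freshly annotated class."""

    def size(self) -> int:
        """Return the number of rows as a str."""  # mismatch a reviewer should catch
        return 0


for name, signature, summary in describe_annotations(Demo):
    print(f"{name}{signature}: {summary}")
```

The comparison itself stays manual; the helper only puts the annotation and the documented behavior side by side.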

## Notes
1. `monkeytype apply` may not work in some cases.

For example, `friesian.feature.table` uses two kinds of DataFrame: `pyspark.sql.dataframe.DataFrame` and `pandas.core.frame.DataFrame`. To avoid ambiguity in the type name `DataFrame`, we import `pyspark.sql.dataframe.DataFrame` as `SparkDataFrame`:
```python
from pyspark.sql.dataframe import DataFrame as SparkDataFrame
```
The same applies to `PandasDataFrame`.

2. Use the [TYPE_CHECKING](https://docs.python.org/3/library/typing.html#constant) constant to avoid importing unnecessary libraries at runtime.
```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from pandas.core.frame import DataFrame as PandasDataFrame
    from pyspark.sql.column import Column
    from pyspark.sql import Row
    from pyspark.sql.dataframe import DataFrame as SparkDataFrame
    from pyspark.sql.types import StructType
```
3. Mark methods that are not covered by UTs with `TODO`.
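
The aliasing and `TYPE_CHECKING` notes can be combined as in the sketch below. To stay runnable without pandas or pyspark installed, it uses `decimal.Decimal` from the standard library as a stand-in for a heavy dependency; the alias `StdDecimal` and the function `describe` are hypothetical names.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen by type checkers only; this import is skipped at runtime.
    from decimal import Decimal as StdDecimal  # stand-in for a heavy dependency


def describe(value: "StdDecimal") -> str:
    # The annotation is a string, so it is not evaluated at runtime and the
    # skipped import does not cause a NameError.
    return f"value={value}"


print(TYPE_CHECKING)
print(describe.__annotations__)
print(describe(1.5))
```

Names imported under `TYPE_CHECKING` must be written as string annotations (or used with `from __future__ import annotations`), which is why the generated stub above quotes `"SparkDataFrame"` and `"StructType"`.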
54 changes: 54 additions & 0 deletions python/dev/add_type_hint.sh
@@ -0,0 +1,54 @@
#!/usr/bin/env bash

#
# Copyright 2022 The BigDL Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#


hint_module_name="friesian"
if [ "$1" ]; then
    hint_module_name=$1
    echo "hint_module_name:${hint_module_name}"
fi

hint_submodule_name="feature"
if [ "$2" ]; then
    hint_submodule_name=$2
    echo "hint_submodule_name:${hint_submodule_name}"
fi

# Work from the directory that contains this script.
cd "$(dirname "$0")"
export MT_DB_PATH="$(pwd)/${hint_module_name}_hint.sqlite3"
echo "$MT_DB_PATH"

export PYSPARK_PYTHON=python
export PYSPARK_DRIVER_PYTHON=python

cd "../${hint_module_name}"
echo "Automatically Add Type Hint"


# Start from an empty trace store.
if [ -f "$MT_DB_PATH" ]; then
    rm "$MT_DB_PATH"
fi

# Quote the pattern so the shell does not expand it before find sees it.
for file in $(find "test/bigdl/${hint_module_name}/${hint_submodule_name}" -name "test_*.py")
do
    echo "$file"
    monkeytype run "$file"
done

cd -
unset MT_DB_PATH