-
Notifications
You must be signed in to change notification settings - Fork 5.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
【Type Hints】Paddle 中引入 Tensor stub 文件 #63953
Conversation
你的PR提交成功,感谢你对开源项目的贡献! |
Update 20240505
这里参考 #50211 中的方法,主要不同:
另外,没有修改 tensor.py 文件,本来想把 if typing.TYPE_CHECKING:
from .tensor.tensor import Tensor
else:
Tensor = framework.core.eager.Tensor
Tensor.__qualname__ = 'Tensor'
目前,paddle.Tensor 中共有
以上共计 @SigureMo 请评审 ~ |
Sorry to inform you that 770c25b's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually. |
模板唯一一个问题是不受 Ruff 等工具监控,代码风格无法保证
我的建议是在想好 API 形态前不要加泛型,一旦现在加了,以后改就是 breaking change,但从非泛型到泛型在非 strict mode 下是兼容性变动
而且这个签名也不对啊,dtype 哪来的 bool |
.pre-commit-config.yaml
Outdated
@@ -5,7 +5,8 @@ exclude: | | |||
paddle/fluid/framework/fleet/heter_ps/cudf/.+| | |||
paddle/fluid/distributed/ps/thirdparty/round_robin.h| | |||
python/paddle/utils/gast/.+| | |||
third_party/.+ | |||
third_party/.+| | |||
python/paddle/tensor/tensor\.pyi| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这个注意合入前要删掉,这个文件不应该存在源码里
python/paddle/py.typed
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这个文件应该在另一个 PR 已经加过了~
python/paddle/tensor/tensor.pyi.in
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
可以根据 https://github.com/cattidea/paddlepaddle-stubs/blob/main/paddle-stubs/_typing/tensor.pyi 看看有没有什么缺失的,比如 __or__
好像就是缺失的
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
有的 ~
Paddle/python/paddle/tensor/tensor.pyi
Line 13724 in 770c25b
def __or__(self, y, out=None, name=None): |
是自动生成的 ~
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
我这里是比对的 paddle 目前 Tensor 里面的东西,后面我对比一下那个 https://github.com/cattidea/paddlepaddle-stubs/blob/main/paddle-stubs/_typing/tensor.pyi
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
应该是太多了搜索卡住了,但 __ror__
应该是没有的
python/paddle/tensor/tensor.pyi.in
Outdated
# Add docstring, attributes, methods and alias with type annotaions for `Tensor` | ||
# if not conveniently coding in original place (like c++ source file). | ||
|
||
from __future__ import annotations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pyi 文件不需要加 PEP 563,因为它没有什么所谓的运行时
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这个文件我是当作 python 文件处理的,这样可以保证通过 mypy 检查,所以添加了这个 ~ 如果生成的 pyi 不需要的话,可以在生成的 tensor.pyi 里面删掉 ~ 如何?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
诶?为啥不从源头删掉?这个文件应该不需要 mypy 检查的
是的,所以这个模板文件原则上应该是能通过 mypy 的 python 文件 ~ 而不是 torch 的那种混有 我在线下用 mypy 检查是通过的,后面考虑是否把他也加到 CI 的检查里面?
本来是没加泛型的,结果就是因为这个 eq 和 nq 方法,返回的是个 bool 类型的 Tensor ,如果只写 Tensor 感觉语义不明确,所以就加上了 ~ 其他地方暂时没看到需要泛型的,要么就去掉吧? 另外,dtype 有 bool 啊 In [2]: a = paddle.to_tensor(False)
In [3]: a
Out[3]:
Tensor(shape=[], dtype=bool, place=Place(gpu:0), stop_gradient=True,
False) 为啥没有??? |
补充说明一下根据模板 tensor.pyi.in 生成 tensor.pyi 的逻辑:
|
另外,关于 tensor.pyi.in 这个模板文件的维护问题,我的想法是,不需要特殊维护,因为,后面 CI 会检查 api 的 typing,如果 tensor 接口有变,且没有办法自动生成有效的签名,那么,如果不修改 tensor.pyi.in 文件手动添加,则 CI 检查 |
这是
如果是合法的 pyi 文件,建议使用 可以考虑改为
对的,我不关心这个文件自身的 typing 问题,因为我们本就通过叶子结点(API)去检查了这一点,我这里只是说其它的通用检查工具,以及,编辑器并不会给 比如 #63256 就是因为不爽很久了,直接改了后缀 |
python/paddle/tensor/tensor.pyi.in
Outdated
@overload | ||
def __init__(self, *args: Any, **kwargs: Any) -> None: | ||
""" | ||
ref: paddle/fluid/pybind/eager.cc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这样写的话,会不会把这个当成 docstring?会不会影响用户对文档的阅读?注释的话可以用 #
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
我这里就是把它当作 docstring 处理的,看看需不需要?如果不需要的话那么删掉就行 ~
倒是不会影响阅读 ~ 例如 vscode 对于 overload 的 api 可以逐个查看 ~
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
eager.cc
就在这段注释的上面就是 Tensor 的 __doc__
Paddle/paddle/fluid/pybind/eager.cc
Lines 751 to 763 in 4bedcd0
PyDoc_STRVAR( // NOLINT | |
TensorDoc, | |
R"DOC(Tensor($self, /, value, place, persistable, zero_copy, name, stop_gradient, dims, dtype, type) | |
-- | |
Tensor is the basic data structure in PaddlePaddle. There are some ways to create a Tensor: | |
- Use the exsiting ``data`` to create a Tensor, please refer to :ref:`api_paddle_to_tensor`. | |
- Create a Tensor with a specified ``shape``, please refer to :ref:`api_paddle_ones`, | |
:ref:`api_paddle_zeros`, :ref:`api_paddle_full`. | |
- Create a Tensor with the same ``shape`` and ``dtype`` as other Tensor, please refer to | |
:ref:`api_paddle_ones_like`, :ref:`api_paddle_zeros_like`, :ref:`api_paddle_full_like`. | |
)DOC"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
喔喔,是 Tensor 本身也有一个是吧,那这样也没问题
OK ~
嗯!已经改为 另外,做以下修改:
|
# avoid same name: Tensor.slice | ||
_Slice: TypeAlias = slice | ||
|
||
class Tensor(Generic[_ShapeType, _DType]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
泛型这里还没有去掉是嘛?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
漏掉了 ~ 我改一下 ~
name: std::string) | ||
""" | ||
... | ||
def __add__(self, y: Tensor) -> Tensor: ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这些 magic method 的 rhs 应该不仅仅支持 Tensor,还支持一些 TensorLike 的,比如 x + 1
之类的
与之相对的,paddle.add
是只支持 Tensor 的,paddle.add(x, y)
(含 x.add(y)
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
是的!当时 _typing
还没 merge,我改一下 ~
def __and__( | ||
self, | ||
y: Tensor, | ||
out: Tensor | None = None, | ||
name: str | None = None, | ||
) -> Tensor: ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Paddle/python/paddle/tensor/__init__.py
Lines 836 to 841 in fba4675
magic_method_func = [ | |
('__and__', 'bitwise_and'), | |
('__or__', 'bitwise_or'), | |
('__xor__', 'bitwise_xor'), | |
('__invert__', 'bitwise_not'), | |
] |
这几个签名不应该包含 out
和 name
的,包含的原因是目前是直接将相关的 bitwise API patch 上了,但这是不规范的,建议不要在签名里暴露后两个参数~
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
另外建议这里排序按照分组会更好一些,比如
- bitwise (
__lshift__
、__or__
etc.) - comparison (
__eq__
、__lt__
etc.) - math operation
- unary (
__neg__
etc.) - binary (
__add__
、__matmul__
etc., including reversed and inplaced version)
- unary (
- type cast (
__bool__
、__int__
、__float__
etc.) - numpy specific (
__array__
)
另外,为了更好的 stub 检查,可以考虑引入 Ruff 的 flake8-pyi rules 了~(独立任务) |
""" | ||
... | ||
# rich comparison | ||
def __eq__(self, y: _typing.Numberic) -> Tensor: ... # type: ignore[override] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
我倒是没注意现在的 Numberic
里包含了 Tensor
,但 Numberic
应该指的是一个能表示数字的类型,这包含了 numpy scalar、0D Tensor
但 TensorLike
是一个不同的类型,它不仅包含这种 0D 的,还应该包含任意维度的 Tensor/ndarray,可以参考下面的:
其中 list[TensorLike] | tuple[TensorLike, ...]
我不是很确定当时添加的理由了,但后面三个是应该有的
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
我还纳闷呢,找不到 TensorLike
,以为改成 Numberic
了,可是也不合适啊 ... ...
TensorLike
应该挺有用的,把它加到 _typing.basic.py
里吧?
list[TensorLike] | tuple[TensorLike, ...]
有用,比如 paddle.atleast_xd
,inputs (Tensor|list(Tensor))
,方法内部会解包 ~ 不过,我觉得还是不要加到 TensorLike
里面吧,比如这里 __eq__
之类的地方,更多用到的应该是下面这个形式:
TensorLike: TypeAlias = npt.NDArray[Any] | Tensor | Numberic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TensorLike 应该挺有用的,把它加到 _typing.basic.py 里吧?
嗯嗯~
不过,我觉得还是不要加到 TensorLike 里面吧,比如这里 eq 之类的地方,更多用到的应该是下面这个形式:
嗯嗯可以~
@property | ||
def strides(self) -> list[int]: ... | ||
@property | ||
def type(self) -> _typing.DTypeLike: ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tensor 的 type 应该不是 dtype
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
嗯,但是不晓得应该写啥?Tensor
?
In [8]: a.type
Out[8]: <VarType.LOD_TENSOR: 7>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
VarType 是旧体系下的,不太建议暴露,这里可以直接用 Any
def detach(self) -> Tensor: ... | ||
def detach_(self) -> Tensor: ... | ||
@property | ||
def dtype(self) -> _typing.DTypeLike: ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
返回值一般都不是 XxxLike
,比如这里应该是明确的 paddle.dtype
def set_string_list(self, value: str) -> None: ... | ||
def set_vocab(self, value: dict) -> None: ... | ||
@property | ||
def shape(self) -> _typing.ShapeLike: ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
同上,这里应该是明确的 list[int]
@property | ||
def persistable(self) -> bool: ... | ||
@property | ||
def place(self) -> _typing.PlaceLike: ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这个也要看一下类型,这些 Place 没有公共基类么?
def placements(self) -> list[paddle.distributed.Placement] | None: ... | ||
@property | ||
def process_mesh(self) -> paddle.distributed.ProcessMesh | None: ... | ||
def rows(self) -> paddle.core.SelectedRows: ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这个类型就是这个是么?不过这个类型 Python 端并没有额外信息,可能并不是很友好
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
有几个接口太陌生了 ... ... 找了找,貌似应该是 list[int]
In [32]: >>> import paddle
...: >>> import paddle.base as base
...: >>> from paddle.base import core
...: >>> paddle.enable_static()
...: >>> scope = core.Scope()
...: >>> block = paddle.static.default_main_program().global_block()
...: >>> x_rows = [0, 5, 5, 4, 19]
...: >>> height = 20
...: >>> x = scope.var('X').get_selected_rows()
...: >>> x.set_rows(x_rows)
...:
In [33]: x
Out[33]: <paddle.base.libpaddle.SelectedRows at 0x7f319b781d70>
In [36]: x.rows()
Out[36]: [0, 5, 5, 4, 19]
In [37]: type(x.rows())
Out[37]: list
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
有几个接口太陌生了 ... ...
这些我也不清楚 😂,基本都不会用的
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update 20240523:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM for setup.py.in
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
--------- Co-authored-by: Nyakku Shigure <[email protected]>
PR Category
User Experience
PR Types
Others
Description
关联 PR : #63597
任务:1-2
根据 #50211 生成 tensor.pyi
涉及文件:
.pre-commit-config.yaml
pre-commit 不检查 tensor.pyipython/CMakeLists.txt
编译后执行 gen_tensor_stub.pypython/paddle/__init__.py
type checking 时引入 tensor 目录下的 Tensorpython/paddle/py.typed
标记文件python/setup.py.in
增加 py.typed 和 tensor.pyisetup.py
增加 py.typed 和 tensor.pyitools/gen_tensor_stub.py
生成 tensor.pyipython/paddle/tensor/tensor.pyi.in
stub 文件的模板临时文件:
python/paddle/tensor/tensor.pyi
生成的 stub 文件,可供参考目前 (20240428) 还有几个问题:
gen_tensor_stub.py
加在哪里合适?python/CMakeLists.txt
吗?有这方面规划吗?python/setup.py.in
和setup.py
都需要增加 py.typed 和 tensor.pyi ?使用gen_tensor_stub.py
目前只是把 [Don't merge][Type Hints] add tensor.pyi generator (at runtime) #50211 里面的拿过来,还需要优化。tensor.pyi.in
模板生成 stub 文件