[Docs][hook] add a pre-commit hook to automatically insert space to cn and en char #4973

Merged
merged 32 commits into from
Aug 4, 2022

@SigureMo SigureMo commented Jun 30, 2022

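I put together a small tool, docufix, that automatically inserts a space between Chinese and English text (reference guideline: 飞桨文档写作规范#12-文本, the PaddlePaddle documentation writing guidelines). The space-insertion feature is also extracted into a pre-commit hook, hosted at https://github.com/ShigureLab/dochooks

This PR uses docufix to fix the existing backlog of missing spaces in one pass, and adds the hook from dochooks so that future commits stay consistent.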
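
For readers unfamiliar with the rule, the following is a minimal sketch of what "inserting a space between Chinese and English characters" can mean. It is not the docufix or dochooks implementation; the character ranges and the name `insert_space` are assumptions chosen for illustration (CJK Unified Ideographs on one side, ASCII letters and digits on the other).

```python
import re

# Assumed character classes for this sketch only; the real tools may cover
# more ranges (punctuation, full-width forms, other scripts).
CJK = r"\u4e00-\u9fff"
ASCII = r"A-Za-z0-9"

_CJK_THEN_ASCII = re.compile(rf"([{CJK}])([{ASCII}])")
_ASCII_THEN_CJK = re.compile(rf"([{ASCII}])([{CJK}])")


def insert_space(text: str) -> str:
    """Insert one space at each direct CJK/ASCII-alphanumeric boundary."""
    text = _CJK_THEN_ASCII.sub(r"\1 \2", text)
    text = _ASCII_THEN_CJK.sub(r"\1 \2", text)
    return text
```

Text that is already spaced has no direct boundary left to match, so running the function twice gives the same result as running it once.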

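It also fixes several other existing formatting problems:

  • CRLF used as line endings
  • TAB used for indentation
  • a missing newline or extra newlines at the end of a file
  • trailing whitespace at the end of lines

The corresponding pre-commit hooks are now applied to rst files as well, so that these problems do not come back.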
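
The four fixes listed above can be sketched in a few lines. This is only an illustration of the intended end state, not the code of the pre-commit hooks used in this PR; the function name `normalize_whitespace` and the default tab width of 4 are assumptions.

```python
def normalize_whitespace(text: str, tab_width: int = 4) -> str:
    """Apply the four formatting fixes to a whole file's text."""
    # 1. CRLF (and any stray CR) becomes LF.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # 2. Tabs are expanded to the next tab stop, and
    # 3. trailing whitespace is stripped from every line.
    lines = [line.expandtabs(tab_width).rstrip() for line in text.split("\n")]
    # 4. Drop blank lines at the end so the file ends with exactly one newline.
    while lines and lines[-1] == "":
        lines.pop()
    return "\n".join(lines) + "\n" if lines else ""
```

Note that `str.expandtabs` pads to tab stops rather than replacing each tab with a fixed number of spaces, which may differ from what a given hook does.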

The docufix command used for each fix:
docufix --all-rules --ignore-globs='docs/api/paddle/fluid/**,docs/api/paddle/distributed/fleet/utils/HDFSClient_cn.rst' '**/*.md' '**/*.rst' --fix

Note

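The entire fluid directory is ignored, because some of the CI problems there may be hard to resolve.

HDFSClient_cn.rst is ignored as well, also because of CI: the English docs would need a matching change, but the English docs CI fails after that change, so everything was reverted.

CI reported 26 sample code errors, which I fixed one by one (some by switching to COPY-FROM). The fixes are described in the comments below.

PR that syncs the English docs: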

PaddlePaddle/Paddle#44679

PADDLEPADDLE_PR=44679

@paddle-bot-old

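Thank you for contributing to the PaddlePaddle docs. The preview is being built and will be available once Docs-New finishes. Preview link: http://preview-pr-4973.paddle-docs-preview.paddlepaddle.org.cn/documentation/docs/zh/api/index_cn.html
For more about the preview tool, see: [Beta] PaddlePaddle documentation preview tool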

paddle-bot-old bot commented Jun 30, 2022

✅ This PR's description meets the template requirements!
Please wait for other CI results.

@SigureMo SigureMo changed the title [WIP][Docs][cn] add space between cn and en char [Docs][cn] add space between cn and en char Jul 1, 2022
@SigureMo SigureMo marked this pull request as ready for review July 1, 2022 08:06
@SigureMo SigureMo changed the title [Docs][cn] add space between cn and en char [Docs][cn] insert whitespace between cn and en char Jul 2, 2022
@SigureMo SigureMo changed the title [Docs][cn] insert whitespace between cn and en char [Docs][cn] fix some style issue Jul 3, 2022
@SigureMo SigureMo force-pushed the add-space-between-cn-en-char branch from b936275 to bc189d9 Compare July 3, 2022 10:29
@SigureMo SigureMo force-pushed the add-space-between-cn-en-char branch from a37c3d0 to 231f8db Compare July 27, 2022 09:53
@@ -11,7 +11,7 @@ function filter_cn_api_files() {
     local __resultvar=$2
     local need_check_files=""
     for file in `echo $git_files`;do
-        grep "code-block" ../docs/$file > /dev/null
+        grep 'code-block:: python' ../docs/$file > /dev/null

@SigureMo SigureMo Jul 27, 2022

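Some sample code in CI used to fail with a "no sample code" error. The cause is that the filter deciding which files need checking only tested whether a file contains code-block, but some files contain non-runnable code blocks such as code-block:: text. When those files are later passed to chinese_samplecode_processor.py, it raises the no sample code error.

So this change narrows the condition: a file goes on to the later checks only if it contains code-block:: python.

Eight files currently hit this problem. None of them passes the grep 'code-block:: python' $filename check, so none of them is passed on to the later checks: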

  • docs/api/paddle/add_n_cn.rst
  • docs/api/paddle/distributed/launch_cn.rst
  • docs/api/paddle/incubate/nn/functional/fused_multi_head_attention_cn.rst
  • docs/api/paddle/multiplex_cn.rst
  • docs/api/paddle/profiler/make_scheduler_cn.rst
  • docs/api/paddle/squeeze_cn.rst
  • docs/api/paddle/text/Overview_cn.rst
  • docs/api/paddle/uniform_cn.rst
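
The difference between the old and new filter can be sketched in Python. This mirrors the two grep patterns from the diff as plain substring tests; it is not part of the CI scripts, and the function names are hypothetical.

```python
def passes_old_filter(rst_text: str) -> bool:
    """Old condition: any code-block directive marks the file for checking."""
    return "code-block" in rst_text


def passes_new_filter(rst_text: str) -> bool:
    """New condition: only files with a Python code block are checked."""
    return "code-block:: python" in rst_text


# A file whose only code block is plain text, like the eight files listed above.
text_only = ".. code-block:: text\n\n    some output\n"
# A file with a runnable Python example.
with_python = ".. code-block:: python\n\n    import paddle\n"
```

With the old filter, `text_only` would be selected and then fail for having no sample code; with the new filter it is skipped, while `with_python` is still selected.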

Member Author

After this change, CI no longer reports the no sample code problem.

-    # required: gpu
-    import paddle
-    event = paddle.device.cuda.Event()
+COPY-FROM: paddle.device.cuda.Event

@SigureMo SigureMo Jul 27, 2022

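This is likewise to avoid sample code errors in CI. These errors mostly occur because the CI environment lacks a required library (such as scipy) or a required device (such as a GPU), so wherever an example could be switched to COPY-FROM, I switched it. About 20 API files were changed this way.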

Member Author

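One of these APIs additionally needs :name: to be added alongside the Chinese documentation change before it works, so I opened PaddlePaddle/Paddle#44679 separately.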

Comment on lines 15 to 28
-        files: \.md$
+        files: \.md$|\.rst$
     -   id: trailing-whitespace
-        files: \.md$
+        files: \.md$|\.rst$
 -   repo: https://github.com/Lucas-C/pre-commit-hooks
     rev: v1.0.1
     hooks:
     -   id: forbid-crlf
-        files: \.md$
+        files: \.md$|\.rst$
     -   id: remove-crlf
-        files: \.md$
+        files: \.md$|\.rst$
     -   id: forbid-tabs
-        files: \.md$
+        files: \.md$|\.rst$
     -   id: remove-tabs
-        files: \.md$
+        files: \.md$|\.rst$

@SigureMo SigureMo Jul 27, 2022

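reStructuredText and Markdown are both plain-text formats for writing documentation, so both should get these checks and fixes to keep the formatting problems from recurring. (These hooks also work for YAML, Python, and other source files, so we could also consider removing the files field and applying them to all text files.)

As for automatically inserting spaces between Chinese and English text, there is already a hook (https://github.com/ShigureLab/dochooks , tested in #5083), but I cannot vouch for its stability yet: it has worked well for me so far, but it has not been tested much. We could add it once it has been tested more thoroughly. (Help with testing would be very welcome.)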
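
To illustrate the effect of widening the files pattern in the diff above, here is a small sketch that applies both patterns to a few paths. It assumes the pattern is a Python regular expression searched against each file path, which is how pre-commit documents the files option; the helper `selected` and the paths README.md and setup.py are hypothetical.

```python
import re

OLD_PATTERN = r"\.md$"
NEW_PATTERN = r"\.md$|\.rst$"


def selected(pattern: str, filenames: list[str]) -> list[str]:
    """Return the file names that the given files pattern would select."""
    return [name for name in filenames if re.search(pattern, name)]


candidates = ["README.md", "docs/api/paddle/add_n_cn.rst", "setup.py"]
old_selection = selected(OLD_PATTERN, candidates)
new_selection = selected(NEW_PATTERN, candidates)
```

Under the old pattern only the Markdown file is selected; under the new one the rst file is selected too, and setup.py is skipped in both cases. Removing the files field entirely, as suggested above, would select all three.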

Contributor

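I ran the hook locally to format all files and checked a dozen or so of them; the results all looked correct. We could add the hook and trial it for a while.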

Member Author

OK, I'll include that hook in this PR as well.

Contributor

The hook versions could be aligned with the config file in the Paddle repo:
Lucas-C/pre-commit-hooks -> v1.1.14
yapf -> v0.32.0
pre-commit/pre-commit-hooks -> v4.1.0

@SigureMo
Member Author

There should be only 2 sample code problems left, both in sample code extracted from Overview_cn.rst files. The first is docs/api/paddle/hub/Overview_cn.rst, which fails as follows:

2022-07-28 00:15:39 Traceback (most recent call last):
2022-07-28 00:15:39   File "/usr/local/lib/python3.7/dist-packages/paddle/hapi/hub.py", line 41, in _import_module
2022-07-28 00:15:39     hub_module = __import__(name)
2022-07-28 00:15:39   File "/root/.cache/paddle/hub/PaddlePaddle_PaddleClas_develop/hubconf.py", line 45, in <module>
2022-07-28 00:15:39     import ppcls
2022-07-28 00:15:39   File "/root/.cache/paddle/hub/PaddlePaddle_PaddleClas_develop/ppcls/__init__.py", line 15, in <module>
2022-07-28 00:15:39     from . import optimizer
2022-07-28 00:15:39   File "/root/.cache/paddle/hub/PaddlePaddle_PaddleClas_develop/ppcls/optimizer/__init__.py", line 23, in <module>
2022-07-28 00:15:39     from ppcls.utils import logger
2022-07-28 00:15:39   File "/root/.cache/paddle/hub/PaddlePaddle_PaddleClas_develop/ppcls/utils/__init__.py", line 17, in <module>
2022-07-28 00:15:39     from . import model_zoo
2022-07-28 00:15:39   File "/root/.cache/paddle/hub/PaddlePaddle_PaddleClas_develop/ppcls/utils/model_zoo.py", line 23, in <module>
2022-07-28 00:15:39     import tqdm
2022-07-28 00:15:39 ModuleNotFoundError: No module named 'tqdm'
2022-07-28 00:15:39 
2022-07-28 00:15:39 During handling of the above exception, another exception occurred:
2022-07-28 00:15:39 
2022-07-28 00:15:39 Traceback (most recent call last):
2022-07-28 00:15:39   File "temp/Overview.py", line 5, in <module>
2022-07-28 00:15:39     models = paddle.hub.list('PaddlePaddle/PaddleClas:develop', source='github', force_reload=True,)
2022-07-28 00:15:39   File "/usr/local/lib/python3.7/dist-packages/paddle/hapi/hub.py", line 206, in list
2022-07-28 00:15:39     hub_module = _import_module(MODULE_HUBCONF.split('.')[0], repo_dir)
2022-07-28 00:15:39   File "/usr/local/lib/python3.7/dist-packages/paddle/hapi/hub.py", line 46, in _import_module
2022-07-28 00:15:39     'Please make sure config exists or repo error messages above fixed when importing'
2022-07-28 00:15:39 RuntimeError: Please make sure config exists or repo error messages above fixed when importing
2022-07-28 00:15:39 
2022-07-28 00:15:39 ****************************************************
2022-07-28 00:15:39 ----------------End of the Check--------------------
2022-07-28 00:15:39 ****************************************************
2022-07-28 00:15:39 Error sample code number is:1
2022-07-28 00:15:39 Error type two sample number is:1
2022-07-28 00:15:39 Error raised from type two:running error sample code. ['../docs/api/paddle/hub/Overview_cn.rst']
2022-07-28 00:15:39 Mistakes found in sample codes.

This is mainly because some dependencies (tqdm and others) are not installed.

The other is docs/api/paddle/incubate/autograd/Overview_cn.rst, which fails as follows:

2022-07-28 00:15:48 Sample code error found in  ../docs/api/paddle/incubate/autograd/Overview_cn.rst :
2022-07-28 00:15:48 
2022-07-28 00:15:48 Traceback (most recent call last):
2022-07-28 00:15:48   File "temp/Overview.py", line 2, in <module>
2022-07-28 00:15:48     x = np.random.rand(2, 20)
2022-07-28 00:15:48 NameError: name 'np' is not defined
2022-07-28 00:15:48 
2022-07-28 00:15:48 
2022-07-28 00:15:48 Sample code error found in  ../docs/api/paddle/incubate/autograd/Overview_cn.rst :
2022-07-28 00:15:48 
2022-07-28 00:15:48 Traceback (most recent call last):
2022-07-28 00:15:48   File "temp/Overview.py", line 3, in <module>
2022-07-28 00:15:48     main = paddle.static.Program()
2022-07-28 00:15:48 NameError: name 'paddle' is not defined
2022-07-28 00:15:48 
2022-07-28 00:15:48 
2022-07-28 00:15:48 Sample code error found in  ../docs/api/paddle/incubate/autograd/Overview_cn.rst :
2022-07-28 00:15:48 
2022-07-28 00:15:48 Traceback (most recent call last):
2022-07-28 00:15:48   File "temp/Overview.py", line 3, in <module>
2022-07-28 00:15:48     exe.run(startup)
2022-07-28 00:15:48 NameError: name 'exe' is not defined
2022-07-28 00:15:48 
2022-07-28 00:15:48 
2022-07-28 00:15:48 Sample code error found in  ../docs/api/paddle/incubate/autograd/Overview_cn.rst :
2022-07-28 00:15:48 
2022-07-28 00:15:48 Traceback (most recent call last):
2022-07-28 00:15:48   File "temp/Overview.py", line 35, in <module>
2022-07-28 00:15:48     _, p_g = opt.minimize(loss)
2022-07-28 00:15:48   File "<decorator-gen-253>", line 2, in minimize
2022-07-28 00:15:48   File "/usr/local/lib/python3.7/dist-packages/paddle/fluid/dygraph/base.py", line 300, in __impl__
2022-07-28 00:15:48     return func(*args, **kwargs)
2022-07-28 00:15:48   File "/usr/local/lib/python3.7/dist-packages/paddle/optimizer/optimizer.py", line 1222, in minimize
2022-07-28 00:15:48     no_grad_set=no_grad_set)
2022-07-28 00:15:48   File "/usr/local/lib/python3.7/dist-packages/paddle/optimizer/optimizer.py", line 925, in backward
2022-07-28 00:15:48     callbacks)
2022-07-28 00:15:48   File "<decorator-gen-245>", line 2, in append_backward_new
2022-07-28 00:15:48   File "/usr/local/lib/python3.7/dist-packages/paddle/fluid/wrapped_decorator.py", line 26, in __impl__
2022-07-28 00:15:48     return wrapped_func(*args, **kwargs)
2022-07-28 00:15:48   File "/usr/local/lib/python3.7/dist-packages/paddle/fluid/framework.py", line 517, in __impl__
2022-07-28 00:15:48     return func(*args, **kwargs)
2022-07-28 00:15:48   File "/usr/local/lib/python3.7/dist-packages/paddle/optimizer/optimizer.py", line 64, in append_backward_new
2022-07-28 00:15:48     orig2prim(block)
2022-07-28 00:15:48   File "<decorator-gen-434>", line 2, in orig2prim
2022-07-28 00:15:48   File "/usr/local/lib/python3.7/dist-packages/paddle/fluid/wrapped_decorator.py", line 26, in __impl__
2022-07-28 00:15:48     return wrapped_func(*args, **kwargs)
2022-07-28 00:15:48   File "/usr/local/lib/python3.7/dist-packages/paddle/fluid/framework.py", line 517, in __impl__
2022-07-28 00:15:48     return func(*args, **kwargs)
2022-07-28 00:15:48   File "/usr/local/lib/python3.7/dist-packages/paddle/incubate/autograd/primx.py", line 538, in orig2prim
2022-07-28 00:15:48     _lower(block, reverse=False, blacklist=[])
2022-07-28 00:15:48   File "/usr/local/lib/python3.7/dist-packages/paddle/incubate/autograd/primx.py", line 489, in _lower
2022-07-28 00:15:48     attrs=attrs)
2022-07-28 00:15:48   File "/usr/local/lib/python3.7/dist-packages/paddle/fluid/framework.py", line 2734, in __init__
2022-07-28 00:15:48     proto = OpProtoHolder.instance().get_op_proto(type)
2022-07-28 00:15:48   File "/usr/local/lib/python3.7/dist-packages/paddle/fluid/framework.py", line 2595, in get_op_proto
2022-07-28 00:15:48     raise ValueError("Operator \"%s\" has not been registered." % type)
2022-07-28 00:15:48 ValueError: Operator "tanh_grad" has not been registered.
2022-07-28 00:15:48 
2022-07-28 00:15:48 ****************************************************
2022-07-28 00:15:48 ----------------End of the Check--------------------
2022-07-28 00:15:48 ****************************************************
2022-07-28 00:15:48 Error sample code number is:4
2022-07-28 00:15:48 Error type two sample number is:4
2022-07-28 00:15:48 Error raised from type two:running error sample code. ['../docs/api/paddle/incubate/autograd/Overview_cn.rst_2', '../docs/api/paddle/incubate/autograd/Overview_cn.rst_3', '../docs/api/paddle/incubate/autograd/Overview_cn.rst_4', '../docs/api/paddle/incubate/autograd/Overview_cn.rst_5']
2022-07-28 00:15:48 Mistakes found in sample codes.

This is because that sample code is presented in progressive segments, and no single segment is a complete example.
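
The NameError pattern in the log above can be reproduced with a toy example. This sketch assumes the checker runs each extracted block on its own, as the per-block temp/Overview.py paths in the log suggest; it does not reproduce chinese_samplecode_processor.py, and the segments and function names are made up for illustration.

```python
# Three segments of one progressive example; each builds on the previous one.
SEGMENTS = [
    "import math",
    "radius = 2.0",
    "area = math.pi * radius ** 2",
]


def run_isolated(segments):
    """Run each segment in a fresh namespace; collect the error type, if any."""
    results = []
    for code in segments:
        try:
            exec(code, {})
            results.append(None)
        except Exception as exc:
            results.append(type(exc).__name__)
    return results


def run_shared(segments):
    """Run all segments in one namespace, as a reader following along would."""
    namespace = {}
    for code in segments:
        exec(code, namespace)
    return namespace
```

Run in isolation, the last segment fails with NameError because `math` and `radius` were defined in earlier segments; run in a shared namespace, the same code succeeds.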

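Both could simply be added to api_white_list.txt and ignored. There is precedent: paddle/hapi/hub/Overview_cn.rst is already on that list (presumably the former path of the paddle/hub/Overview_cn.rst API?). So we could add paddle/incubate/autograd/Overview_cn.rst and paddle/hub/Overview_cn.rst to the list to avoid the sample code problems.

In addition, since almost all sample code is now pulled in through COPY-FROM, we could also consider dropping the sample code check in the docs repo and leaving it entirely to the CI in the Paddle repo.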

SigureMo commented Aug 1, 2022

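> After you merge develop, run a full scan with the tool again. When I ran a full format just now, 1351 documentation files were modified; they were added in the meantime.

Some of those may be under docs/api/paddle/fluid/. Do the API docs under fluid need to be updated too? To avoid some CI problems I have been ignoring everything under fluid for now, and I currently run the following command each time: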

docufix --all-rules --ignore-globs='docs/api/paddle/fluid/**' '**/*.md' '**/*.rst' --fix

That is, everything under fluid is ignored. Apart from the docs under fluid, everything should be clean now.

If needed, I can fix the ones under fluid as well.

SigureMo commented Aug 1, 2022

> The hook versions could be aligned with the config file in the Paddle repo:
> Lucas-C/pre-commit-hooks -> v1.1.14
> yapf -> v0.32.0
> pre-commit/pre-commit-hooks -> v4.1.0

The config file has been updated.

@betterpig
Contributor

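> > After you merge develop, run a full scan with the tool again. When I ran a full format just now, 1351 documentation files were modified; they were added in the meantime.
>
> Some of those may be under docs/api/paddle/fluid/. Do the API docs under fluid need to be updated too? To avoid some CI problems I have been ignoring everything under fluid for now, and I currently run the following command each time:
>
> docufix --all-rules --ignore-globs='docs/api/paddle/fluid/**' '**/*.md' '**/*.rst' --fix
>
> That is, everything under fluid is ignored. Apart from the docs under fluid, everything should be clean now.
>
> If needed, I can fix the ones under fluid as well.

Let's not update those for now; fluid is being phased out.
I see the PR still has conflicts with develop that need to be resolved.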

SigureMo commented Aug 1, 2022

> I see the PR still has conflicts with develop that need to be resolved.

That conflict only just appeared 😂. I have merged and resolved it.

@SigureMo SigureMo changed the title [Docs][cn] fix some style issue [Docs][hook] fix some style issue Aug 1, 2022
@Ligoml Ligoml requested a review from betterpig August 4, 2022 03:56
SigureMo commented Aug 4, 2022

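The later commits are COPY-FROM changes that make sure COPY-FROM renders correctly,

plus merging upstream and reformatting.

The HDFSClient_cn file could not be copied over even with COPY-FROM because of a problem on the English docs side, and after the change the sample code in the English docs would fail to run, so I reverted it. At this point only HDFSClient_cn and the APIs under fluid remain unformatted.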

@Ligoml Ligoml left a comment

LGTM for docs

@SigureMo SigureMo changed the title [Docs][hook] fix some style issue [Docs][hook] add a pre-commit hook to automatically insert space to cn and en char Aug 4, 2022
@betterpig betterpig left a comment

LGTM
