From 726ea223d34259d6807fe688b23ed6233bcfe1de Mon Sep 17 00:00:00 2001
From: andsonder <changlu@keter.top>
Date: Wed, 13 Sep 2023 19:35:58 +0800
Subject: [PATCH 01/13] add 20230913_api_design_for_masked_fill.md

---
 .../20230913_api_design_for_masked_fill.md    | 253 ++++++++++++++++++
 1 file changed, 253 insertions(+)
 create mode 100644 rfcs/APIs/20230913_api_design_for_masked_fill.md
diff --git a/rfcs/APIs/20230913_api_design_for_masked_fill.md b/rfcs/APIs/20230913_api_design_for_masked_fill.md
new file mode 100644
index 000000000..400a5b158
--- /dev/null
+++ b/rfcs/APIs/20230913_api_design_for_masked_fill.md
@@ -0,0 +1,253 @@
+# paddle.masked_fill Design Document
+
+| API Name                | paddle.masked_fill                     |
+| ----------------------- | -------------------------------------- |
+| Author                  | AndSonder                              |
+| Submission Date         | 2023-09-13                             |
+| Version                 | V1.0                                   |
+| Required Paddle Version | develop                                |
+| File Name               | 20230913_api_design_for_masked_fill.md |
+
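+# I. Overview
+
+## 1. Background
+
+`masked_fill` is a commonly used API: according to `mask`, it fills `value` into the positions of a `Tensor` where `mask` is `True`. This operation is frequently needed in tasks such as semantic segmentation and sequence labeling, so Paddle should provide it as a built-in API.
+
+## 2. Goals
+
+Add `paddle.masked_fill` to the Paddle framework: for a given Tensor, fill `value` into the positions where `mask` is `True`.
+
+## 3. Significance
+
+`masked_fill` is widely used; providing it saves users from implementing it themselves and makes them more productive.
+
+# II. Current Status in Paddle
+
+Paddle currently has no such API; the functionality can only be built by composing existing Paddle APIs.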
+
+```python
+# paddlepaddle >= 2.0
+import paddle
+
+paddle.seed(123)
+x = paddle.rand([3, 3], dtype='float32')
+# Tensor(shape=[3, 3], dtype=float32, place=CUDAPlace(0), stop_gradient=True,
+#        [[0.00276479, 0.45899123, 0.96637046],
+#         [0.66818708, 0.05855134, 0.33184195],
+#         [0.34202638, 0.95503175, 0.33745834]])
+
+mask = paddle.randint(0, 2, [3, 3]).astype('bool')
+# Tensor(shape=[3, 3], dtype=bool, place=CUDAPlace(0), stop_gradient=True,
+#        [[True , True , False],
+#         [True , True , True ],
+#         [True , True , True ]])
+
+def masked_fill(x, mask, value):
+    y = paddle.full(x.shape, value, x.dtype)
+    return paddle.where(mask, y, x)
+
+out = masked_fill(x, mask, 2)
+# Tensor(shape=[3, 3], dtype=float32, place=CUDAPlace(0), stop_gradient=True,
+#        [[2.        , 2.        , 0.96637046],
+#         [2.        , 2.        , 2.        ],
+#         [2.        , 2.        , 2.        ]])
+```
+
+# 三、业内方案调研
+
+## PyTorch
+
+PyTorch provides the API `Tensor.masked_fill_(mask, value)`.
+
+The PyTorch documentation describes it as follows:
+
+```
+Fills elements of self tensor with value where mask is True. The shape of mask must be broadcastable with the shape of the underlying tensor.
+```
+
+### Implementation
+
+PyTorch has two implementations of this operator: one for CPU and one for GPU.
+
+The core code is as follows:
+
+```cpp
+// GPU implementation
+void masked_fill_kernel(TensorIterator& iter, const Scalar& value) {
+  AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND4(
+      kBool, kHalf, kBFloat16, kComplexHalf, iter.common_dtype(), "masked_fill_", [&]() {
+        const auto value_ = value.to<scalar_t>();
+        gpu_kernel(
+            iter, [value_] GPU_LAMBDA(scalar_t self, bool mask) -> scalar_t {
+              if (mask) {
+                return value_;
+              }
+              return self;
+            });
+      });
+}
+
+// CPU implementation
+template <typename scalar_t>
+void cpu_masked_fill_kernel(TensorIterator& iter, scalar_t value) {
+  auto loop = [&](char** data, const int64_t* strides, int64_t n) {
+    char* dst = data[0];
+    char* mask = data[1];
+    for (const auto i : c10::irange(n)) {
+      bool mask_value = *reinterpret_cast<bool*>(mask + strides[1] * i);
+
+      if (mask_value) {
+        *(scalar_t*)(dst + strides[0] * i) = value;
+      }
+    }
+  };
+  iter.for_each(loop);
+}
+
+void masked_fill_kernel(TensorIterator& iter, const Scalar& value) {
+  AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND4(kComplexHalf, kBool, kBFloat16, kHalf,
+    iter.dtype(), "masked_fill", [&] {
+      scalar_t scalar_val = value.to<scalar_t>();
+      auto mask_dtype = iter.input_dtype(0);
+      TORCH_CHECK(mask_dtype == ScalarType::Bool, "masked_fill only supports boolean masks, "
+        "but got mask with dtype ", mask_dtype);
+      cpu_masked_fill_kernel<scalar_t>(iter, scalar_val);
+    });
+}
+
+```
+
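+PyTorch's CPU and GPU implementations of masked_fill differ somewhat:
+
+CPU implementation:
+
+1. The template function `cpu_masked_fill_kernel` implements the scalar fill logic.
+
+2. `TensorIterator::for_each` drives the loop and calls the lambda on each chunk of data.
+
+3. Inside the lambda, raw pointers are accessed directly to test the mask and assign the value.
+
+4. A dispatch macro instantiates the template for each supported data type.
+
+5. The entry point validates the arguments (the mask must be of dtype bool).
+
+GPU implementation:
+
+1. `gpu_kernel` launches the CUDA kernel.
+
+2. The kernel body is written as a `GPU_LAMBDA`.
+
+3. The lambda has the signature `(self, mask) -> scalar_t`: it returns `value_` where `mask` is true and `self` otherwise, evaluated directly on the GPU.
+
+4. A dispatch macro generates the kernel for each supported data type.
+
+5. The entry point converts `value` to the template type `scalar_t`.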
+
+## TensorFlow
+
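+TensorFlow does not provide a `masked_fill` API directly, but the same functionality can be achieved with `tf.where`. Related discussion PR: https://github.com/tensorflow/tensorflow/pull/41975
+
+The conclusion of that discussion was that implementing `masked_fill` with `tf.where` is more efficient, so no dedicated `masked_fill` API was added.
+
+# IV. Comparative Analysis
+
+- PyTorch implements `masked_fill` with dedicated CPU and GPU kernels, which is generally more efficient than composing existing operators
+- TensorFlow implements `masked_fill` via `tf.where`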
+
+
+# V. Design
+
+## Naming and Parameters
+
+paddle.masked_fill(input, mask, value, inplace=False)
+
+paddle.masked_fill_(input, mask, value, inplace=False)
+
+Tensor.masked_fill(input, mask, value)
+
+Tensor.masked_fill_(input, mask, value)
+
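+`masked_fill_` modifies the input tensor in place.
+
+- `input (Tensor)`: the input tensor to be filled.
+- `mask (Tensor, bool)`: a boolean mask tensor specifying the positions to fill; it has the same shape as `input`.
+- `value (Tensor, bool, int, float)`: the value to fill in; supports bool, int, float, and 0-D Tensor.
+- `inplace (bool, optional)`: whether to operate in place. If True, the input tensor is modified directly; otherwise a new tensor is returned. Defaults to False.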
+
+
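+## Underlying OP Design
+
+Following existing Paddle operators, implement separate CPU and CUDA kernels. The two cases where `value` is a Tensor and where it is not each use a separate OP.
+
+## API Implementation
+
+Add `masked_fill` and `masked_fill_` to python/paddle/tensor/manipulation.py; both call the underlying operators through `_C_ops`.
+
+- First validate the input arguments, then call the underlying operator
+- Raise an error if `value` is a tensor with one or more dimensions (only 0-D tensors are accepted)
+- Check whether `mask` and `input` have the same shape; if not, check whether they can be broadcast, and raise an error if they cannot
+- Call the CPU/GPU operator to compute the result
+
+The CPU kernel is planned to be implemented with the where OP, and the GPU kernel as a dedicated CUDA kernel.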
+
+## Implementation File Paths
+
+CPU forward and backward kernels: paddle/phi/kernels/cpu/masked_fill_scalar_kernel.cc paddle/phi/kernels/cpu/masked_fill_scalar_grad_kernel.cc paddle/phi/kernels/cpu/masked_fill_tensor_kernel.cc paddle/phi/kernels/cpu/masked_fill_tensor_grad_kernel.cc
+
+GPU forward and backward kernels: paddle/phi/kernels/gpu/masked_fill_scalar_kernel.cu paddle/phi/kernels/gpu/masked_fill_scalar_grad_kernel.cu paddle/phi/kernels/gpu/masked_fill_tensor_kernel.cu paddle/phi/kernels/gpu/masked_fill_tensor_grad_kernel.cu
+
+```cpp
+template <typename T, typename Context>
+void MaskedFillScalarKernel(const Context& dev_ctx,
+                            const DenseTensor& x,
+                            const DenseTensor& mask,
+                            float value,
+                            DenseTensor* output);
+
+template <typename T, typename Context>
+void MaskedFillTensorKernel(const Context& dev_ctx,
+                            const DenseTensor& x,
+                            const DenseTensor& mask,
+                            const DenseTensor& value,
+                            DenseTensor* output);
+
+```
+
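+Operators are registered in the same files as their kernel implementations.
+
+Python API implementation path: python/paddle/tensor/manipulation.py
+
+Unit test path: the test/ directory of the Paddle repo; in addition, add unit tests for the corresponding inplace API to paddle/test/legacy_test/test_inplace.py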
+
+
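+# VI. Testing and Acceptance Criteria
+
+The following cases should be tested:
+
+- `mask` and `input` have different shapes that can be broadcast
+- Validation of `value`: whether it is a supported data type, and whether the gradient propagates correctly when `value` is a 0-D tensor
+- Correctness of the results in backward (gradient) computation
+- Error checking: whether an error is raised correctly when the input x is not a Tensor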
+
+
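+# VII. Feasibility Analysis and Schedule
+
+The implementation difficulty is manageable, and development can be completed within the current release cycle.
+
+# VIII. Impact
+
+This is a standalone new API and has no impact on other modules.
+
+# Glossary
+
+None
+
+# Attachments and References
+
+None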
\ No newline at end of file
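
The inner loop of PyTorch's `cpu_masked_fill_kernel` quoted in this RFC can be illustrated with a pure-Python sketch. `strided_masked_fill` is a hypothetical name; it assumes flat Python lists and strides counted in elements, whereas the real loop works on raw byte pointers and byte strides. Modelling a broadcast mask dimension as stride 0 is our simplified reading of how `TensorIterator` presents broadcast inputs, not a quote of PyTorch code.

```python
def strided_masked_fill(dst, dst_stride, mask, mask_stride, n, value):
    """Sketch of the inner loop of PyTorch's cpu_masked_fill_kernel.

    Assumes dst and mask are flat Python lists and strides are counted in
    elements; the real loop uses raw byte pointers and byte strides.
    """
    for i in range(n):
        # A mask stride of 0 re-reads the same mask element for every i,
        # which models a broadcast mask dimension.
        if mask[mask_stride * i]:
            dst[dst_stride * i] = value  # write only where the mask is True
    return dst


row = [1.0, 1.0, 1.0]
print(strided_masked_fill(row, 1, [True, False, True], 1, 3, 2.0))  # [2.0, 1.0, 2.0]

row = [1.0, 1.0, 1.0]
print(strided_masked_fill(row, 1, [True], 0, 3, 2.0))  # [2.0, 2.0, 2.0]
```

Only masked positions are written, so unmasked elements of `dst` keep their original values; this is what allows the operation to be done in place.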

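For the in-place variant `masked_fill_` proposed in the naming section, a NumPy analogue can rely on `np.copyto(..., where=mask)`, which likewise writes only the masked positions. This is an illustration of the semantics, not Paddle code; `masked_fill_inplace` is a hypothetical name, and requiring a bool mask follows the PyTorch check quoted above.

```python
import numpy as np


def masked_fill_inplace(x, mask, value):
    """NumPy analogue of an in-place masked_fill_ (illustration only)."""
    mask = np.asarray(mask)
    if mask.dtype != np.bool_:
        raise TypeError(f"mask must be bool, got {mask.dtype}")
    # Cast value to x's dtype, then write it only where mask is True.
    # np.copyto broadcasts `where` against x's shape.
    np.copyto(x, np.asarray(value, dtype=x.dtype), where=mask)
    return x


x = np.ones((2, 3), dtype=np.float32)
out = masked_fill_inplace(x, [[True, False, True]], 2)
print(out is x)  # True: x was modified in place
print(x.tolist())  # [[2.0, 1.0, 2.0], [2.0, 1.0, 2.0]]
```
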
From 43d7aacb4ef581b3d4301d7faacb54e8db170e36 Mon Sep 17 00:00:00 2001
From: andsonder <changlu@keter.top>
Date: Thu, 14 Sep 2023 17:14:56 +0800
Subject: [PATCH 02/13] fix

---
 .../20230913_api_design_for_masked_fill.md    | 52 ++++++-------------
 1 file changed, 17 insertions(+), 35 deletions(-)

diff --git a/rfcs/APIs/20230913_api_design_for_masked_fill.md b/rfcs/APIs/20230913_api_design_for_masked_fill.md
index 400a5b158..93ec76e70 100644
--- a/rfcs/APIs/20230913_api_design_for_masked_fill.md
+++ b/rfcs/APIs/20230913_api_design_for_masked_fill.md
@@ -66,6 +66,11 @@ PyTorch provides the API `Tensor.masked_fill_(mask, value)`.
 Fills elements of self tensor with value where mask is True. The shape of mask must be broadcastable with the shape of the underlying tensor.
 ```
 
+The input parameters are described as follows:
+
+- mask (BoolTensor) – the boolean mask
+- value (float) – the value to fill in with
+
 ### Implementation
 
 
@@ -165,61 +170,38 @@ TensorFlow does not provide a `masked_fill` API directly, but the same functionality can be achieved with `tf.w
 
 ## Naming and Parameters
 
-paddle.masked_fill(input, mask, value, inplace=False)
+paddle.masked_fill(input, mask, value)
 
-paddle.masked_fill_(input, mask, value, inplace=False)
+paddle.masked_fill_(input, mask, value)
 
-Tensor.masked_fill(input, mask, value)
+Tensor.masked_fill(mask, value)
 
-Tensor.masked_fill_(input, mask, value)
+Tensor.masked_fill_(mask, value)
 
 `masked_fill_` modifies the input tensor in place.
 
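 - `input (Tensor)`: the input tensor to be filled.
 - `mask (Tensor, bool)`: a boolean mask tensor specifying the positions to fill; it has the same shape as `input`.
-- `value (Tensor, bool, int, float)`: the value to fill in; supports bool, int, float, and 0-D Tensor.
+- `value (Tensor, bool, int, float, complex)`: the value to fill in; supports bool, int, float, complex, and 0-D Tensor.
 - `inplace (bool, optional)`: whether to operate in place. If True, the input tensor is modified directly; otherwise a new tensor is returned. Defaults to False.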
 
 
 ## Underlying OP Design
 
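-Following existing Paddle operators, implement separate CPU and CUDA kernels. The two cases where `value` is a Tensor and where it is not each use a separate OP.
+Implemented in Python; no underlying OP support is required.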
 
 ## API Implementation
 
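-Add `masked_fill` and `masked_fill_` to python/paddle/tensor/manipulation.py; both call the underlying operators through `_C_ops`.
-
-- First validate the input arguments, then call the underlying operator
-- Raise an error if `value` is a tensor with one or more dimensions (only 0-D tensors are accepted)
-- Check whether `mask` and `input` have the same shape; if not, check whether they can be broadcast, and raise an error if they cannot
-- Call the CPU/GPU operator to compute the result
-
-The CPU kernel is planned to be implemented with the where OP, and the GPU kernel as a dedicated CUDA kernel.
-
-## Implementation File Paths
+Add `masked_fill` and `masked_fill_` to python/paddle/tensor/manipulation.py.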
 
-CPU forward and backward kernels: paddle/phi/kernels/cpu/masked_fill_scalar_kernel.cc paddle/phi/kernels/cpu/masked_fill_scalar_grad_kernel.cc paddle/phi/kernels/cpu/masked_fill_tensor_kernel.cc paddle/phi/kernels/cpu/masked_fill_tensor_grad_kernel.cc
-
-GPU forward and backward kernels: paddle/phi/kernels/gpu/masked_fill_scalar_kernel.cu paddle/phi/kernels/gpu/masked_fill_scalar_grad_kernel.cu paddle/phi/kernels/gpu/masked_fill_tensor_kernel.cu paddle/phi/kernels/gpu/masked_fill_tensor_grad_kernel.cu
-
-```cpp
-template <typename T, typename Context>
-void MaskedFillScalarKernel(const Context& dev_ctx,
-                            const DenseTensor& x,
-                            const DenseTensor& mask,
-                            float value,
-                            DenseTensor* output);
-
-template <typename T, typename Context>
-void MaskedFillTensorKernel(const Context& dev_ctx,
-                            const DenseTensor& x,
-                            const DenseTensor& mask,
-                            const DenseTensor& value,
-                            DenseTensor* output);
+Implemented by combining full and where.
 
+```python
+out = paddle.full(x.shape, value, x.dtype)
+out = paddle.where(mask, out, x)
 ```
 
-Operators are registered in the same files as their kernel implementations.
+## Implementation File Paths
 
 Python API implementation path: python/paddle/tensor/manipulation.py
 
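The `full` + `where` composition proposed in this patch can be checked with a NumPy analogue, assuming `np.full_like` and `np.where` behave like `paddle.full_like` and `paddle.where` for this purpose. The filled intermediate is what gets passed as the second argument of `where`; `masked_fill_full_where` is a hypothetical name.

```python
import numpy as np


def masked_fill_full_where(x, mask, value):
    """NumPy analogue of the full_like + where composition (illustration only)."""
    # Step 1: a tensor with x's shape and dtype, filled with value
    # (stands in for paddle.full_like(x, value, x.dtype)).
    filled = np.full_like(x, value)
    # Step 2: take `filled` where mask is True, otherwise keep x
    # (stands in for paddle.where(mask, filled, x)); mask is broadcast.
    return np.where(mask, filled, x)


x = np.arange(6, dtype=np.float64).reshape(2, 3)
mask = np.array([[True], [False]])  # shape [2, 1], broadcast along columns
print(masked_fill_full_where(x, mask, -1.0).tolist())
# [[-1.0, -1.0, -1.0], [3.0, 4.0, 5.0]]
```

The cost of this design is one extra tensor the size of `x` for `filled`, which the dedicated PyTorch kernels avoid.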

From 96637dbaed7b6f5db01ade5e4ac8f4dc062ecea6 Mon Sep 17 00:00:00 2001
From: andsonder <changlu@keter.top>
Date: Thu, 14 Sep 2023 21:10:09 +0800
Subject: [PATCH 03/13] Update the current status of Paddle
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 .../20230913_api_design_for_masked_fill.md    | 36 +++++++++++++++++--
 1 file changed, 33 insertions(+), 3 deletions(-)

diff --git a/rfcs/APIs/20230913_api_design_for_masked_fill.md b/rfcs/APIs/20230913_api_design_for_masked_fill.md
index 93ec76e70..19e8f81ac 100644
--- a/rfcs/APIs/20230913_api_design_for_masked_fill.md
+++ b/rfcs/APIs/20230913_api_design_for_masked_fill.md
@@ -31,7 +31,7 @@
 import paddle
 
 paddle.seed(123)
-x = paddle.rand([3, 3], dtype='float32')
+x = paddle.ones([3, 3], dtype='float32')
 # Tensor(shape=[3, 3], dtype=float32, place=CUDAPlace(0), stop_gradient=True,
 #        [[0.00276479, 0.45899123, 0.96637046],
 #         [0.66818708, 0.05855134, 0.33184195],
@@ -44,7 +44,7 @@ mask = paddle.randint(0, 2, [3, 3]).astype('bool')
 #         [True , True , True ]])
 
 def masked_fill(x, mask, value):
-    y = paddle.full(x.shape, value, x.dtype)
+    y = paddle.full_like(x, value, x.dtype)
     return paddle.where(mask, y, x)
 
 out = masked_fill(x, mask, 2)
@@ -54,6 +54,36 @@ out = masked_fill(x, mask, 2)
 #         [2.        , 2.        , 2.        ]])
 ```
 
+Supported dtypes for the paddle.full_like parameters:
+
+- x: ['bool','float16','float32','float64','int16','int32','int64','uint16']
+- fill_value: ['bool','float16','float32','float64','int16','int32','int64','uint16']
+- dtype: ['bool','float16','float32','float64','int16','int32','int64','uint16']
+
+Supported dtypes for the paddle.where parameters:
+
+- x: ['float16', 'float32', 'float64', 'int32', 'int64', 'uint16']
+- y: ['float16', 'float32', 'float64', 'int32', 'int64', 'uint16']
+- condition: ['bool']
+
+The masked_fill API composed from full and where supports broadcasting.
+
+```python
+x = paddle.ones([3, 3], dtype='float32')
+mask = paddle.randint(0, 2, [1, 3]).astype('bool')
+
+out = masked_fill(x, mask, 2)
+print(out)
+
+# Tensor(shape=[3, 3], dtype=float32, place=Place(gpu:0), stop_gradient=True,
+#        [[2., 1., 2.],
+#         [2., 1., 2.],
+#         [2., 1., 2.]])
+```
+
+Both full/full_like and where run on CPU and GPU.
+
+
 # III. Survey of Existing Solutions
 
 ## PyTorch
@@ -194,7 +224,7 @@ `masked_fill_` modifies the input tensor in place.
 
 Add `masked_fill` and `masked_fill_` to python/paddle/tensor/manipulation.py.
 
-Implemented by combining full and where.
+Implemented by combining `paddle.full_like` and `paddle.where`.
 
 ```python
 out = paddle.full(x.shape, value, x.dtype)
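
Because `where` broadcasts `mask` and `x` against each other, the out-of-place result can be larger than `x` (for example a `[1, 3]` input with a `[3, 3]` mask). An in-place `masked_fill_` cannot grow `x`, so one plausible rule, assumed here rather than stated in the RFC, is that the mask must broadcast to exactly `x.shape`. The hypothetical helper below checks that with `np.broadcast_shapes` (available in NumPy 1.20 and later).

```python
import numpy as np


def mask_broadcastable_to(x_shape, mask_shape):
    """Hypothetical helper: True if mask_shape broadcasts to exactly x_shape."""
    try:
        return np.broadcast_shapes(tuple(x_shape), tuple(mask_shape)) == tuple(x_shape)
    except ValueError:
        # The two shapes are not broadcast-compatible at all.
        return False


print(mask_broadcastable_to([3, 3], [1, 3]))  # True: the mask row is repeated
print(mask_broadcastable_to([1, 3], [3, 3]))  # False: the result would grow x
print(mask_broadcastable_to([3, 3], [2, 3]))  # False: incompatible shapes
```
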

From ffc9e9ec071a984c7ab64068410c365a87690598 Mon Sep 17 00:00:00 2001
From: andsonder <changlu@keter.top>
Date: Thu, 14 Sep 2023 21:27:43 +0800
Subject: [PATCH 04/13] update

---
 .../20230913_api_design_for_masked_fill.md    | 26 +++++++++++++++----
 1 file changed, 21 insertions(+), 5 deletions(-)

diff --git a/rfcs/APIs/20230913_api_design_for_masked_fill.md b/rfcs/APIs/20230913_api_design_for_masked_fill.md
index 19e8f81ac..3acd05f83 100644
--- a/rfcs/APIs/20230913_api_design_for_masked_fill.md
+++ b/rfcs/APIs/20230913_api_design_for_masked_fill.md
@@ -54,11 +54,28 @@ out = masked_fill(x, mask, 2)
 #         [2.        , 2.        , 2.        ]])
 ```
 
+
+Both full/full_like and where run on CPU and GPU.
+
 Supported dtypes for the paddle.full_like parameters:
 
-- x: ['bool','float16','float32','float64','int16','int32','int64','uint16']
-- fill_value: ['bool','float16','float32','float64','int16','int32','int64','uint16']
-- dtype: ['bool','float16','float32','float64','int16','int32','int64','uint16']
+```text
+CPU Kernel
+float,double,int8_t,uint8_t,int16_t,int,int64_t,bool,float16,bfloat16,complex32,complex64
+
+GPU Kernel
+float,double,int8_t,uint8_t,int16_t,int,int64_t,bool,float16,bfloat16,complex32,complex64
+```
+
+Supported dtypes for the paddle.where parameters:
+
+```text
+CPU Kernel
+float, double, int, int64_t
+
+GPU Kernel
+float,double,int,int64_t,float16,bfloat16
+```
 
 Supported dtypes for the paddle.where parameters:
 
@@ -66,7 +83,7 @@ Supported dtypes for the paddle.where parameters:
 - y: ['float16', 'float32', 'float64', 'int32', 'int64', 'uint16']
 - condition: ['bool']
 
-The masked_fill API composed from full and where supports broadcasting.
+The masked_fill API composed from full/full_like and where supports broadcasting.
 
 ```python
 x = paddle.ones([3, 3], dtype='float32')
@@ -81,7 +98,6 @@ print(out)
 #         [2., 1., 2.]])
 ```
 
-Both full/full_like and where run on CPU and GPU.
 
 
 # III. Survey of Existing Solutions
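
Since `masked_fill` is built on `where`, the `paddle.where` dtype table above effectively bounds the input dtypes `masked_fill` can accept on each device. A hypothetical pre-dispatch check could look like the sketch below; the sets are transcribed from this RFC's table (mapping `float` to `float32`, `double` to `float64`, `int` to `int32` and `int64_t` to `int64`) and may not match the kernels currently registered in Paddle.

```python
# Dtype sets transcribed from the paddle.where table in this RFC, mapping the
# C++ kernel types to dtype names: float as float32, double as float64,
# int as int32 and int64_t as int64.
WHERE_SUPPORTED_DTYPES = {
    "cpu": {"float32", "float64", "int32", "int64"},
    "gpu": {"float32", "float64", "int32", "int64", "float16", "bfloat16"},
}


def check_where_dtype(dtype, device):
    """Hypothetical pre-dispatch check: raise if where has no kernel for dtype."""
    supported = WHERE_SUPPORTED_DTYPES.get(device)
    if supported is None:
        raise ValueError(f"unknown device: {device!r}")
    if dtype not in supported:
        raise TypeError(
            f"masked_fill (via where) does not support {dtype} on {device}; "
            f"supported dtypes: {sorted(supported)}"
        )


check_where_dtype("float16", "gpu")  # passes: the GPU list includes float16
```
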

From 2ddd4baa4cc512c961cdd0b18d6316ee2777dae1 Mon Sep 17 00:00:00 2001
From: andsonder <changlu@keter.top>
Date: Thu, 14 Sep 2023 21:29:46 +0800
Subject: [PATCH 05/13] update

---
 rfcs/APIs/20230913_api_design_for_masked_fill.md | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/rfcs/APIs/20230913_api_design_for_masked_fill.md b/rfcs/APIs/20230913_api_design_for_masked_fill.md
index 3acd05f83..17d8cd1e8 100644
--- a/rfcs/APIs/20230913_api_design_for_masked_fill.md
+++ b/rfcs/APIs/20230913_api_design_for_masked_fill.md
@@ -77,12 +77,6 @@ GPU Kernel
 float,double,int,int64_t,float16,bfloat16
 ```
 
-Supported dtypes for the paddle.where parameters:
-
-- x: ['float16', 'float32', 'float64', 'int32', 'int64', 'uint16']
-- y: ['float16', 'float32', 'float64', 'int32', 'int64', 'uint16']
-- condition: ['bool']
-
 The masked_fill API composed from full/full_like and where supports broadcasting.
 
 ```python

From f19f55a6a480e99265ea98256f7f30b6824840b0 Mon Sep 17 00:00:00 2001
From: andsonder <changlu@keter.top>
Date: Thu, 14 Sep 2023 21:30:03 +0800
Subject: [PATCH 06/13] update

---
 rfcs/APIs/20230913_api_design_for_masked_fill.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/rfcs/APIs/20230913_api_design_for_masked_fill.md b/rfcs/APIs/20230913_api_design_for_masked_fill.md
index 17d8cd1e8..659cdb190 100644
--- a/rfcs/APIs/20230913_api_design_for_masked_fill.md
+++ b/rfcs/APIs/20230913_api_design_for_masked_fill.md
@@ -57,7 +57,7 @@ out = masked_fill(x, mask, 2)
 
 Both full/full_like and where run on CPU and GPU.
 
-Supported dtypes for the paddle.full_like parameters:
+Supported dtypes for paddle.full_like:
 
 ```text
 CPU Kernel
@@ -67,7 +67,7 @@ GPU Kernel
 float,double,int8_t,uint8_t,int16_t,int,int64_t,bool,float16,bfloat16,complex32,complex64
 ```
 
-Supported dtypes for the paddle.where parameters:
+Supported dtypes for paddle.where:
 
 ```text
 CPU Kernel

From 1065c93746b98401c1a76771d488edbae067a13a Mon Sep 17 00:00:00 2001
From: andsonder <changlu@keter.top>
Date: Thu, 14 Sep 2023 22:59:21 +0800
Subject: [PATCH 07/13] update

---
 rfcs/APIs/20230913_api_design_for_masked_fill.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/rfcs/APIs/20230913_api_design_for_masked_fill.md b/rfcs/APIs/20230913_api_design_for_masked_fill.md
index 659cdb190..9f1d54628 100644
--- a/rfcs/APIs/20230913_api_design_for_masked_fill.md
+++ b/rfcs/APIs/20230913_api_design_for_masked_fill.md
@@ -223,7 +223,7 @@ `masked_fill_` modifies the input tensor in place.
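 - `input (Tensor)`: the input tensor to be filled.
 - `mask (Tensor, bool)`: a boolean mask tensor specifying the positions to fill; it has the same shape as `input`.
 - `value (Tensor, bool, int, float, complex)`: the value to fill in; supports bool, int, float, complex, and 0-D Tensor.
-- `inplace (bool, optional)`: whether to operate in place. If True, the input tensor is modified directly; otherwise a new tensor is returned. Defaults to False.
+- `name (str, optional)`: for usage details, see [Name](https://www.paddlepaddle.org.cn/documentation/docs/zh/api_guides/low_level/program.html#api-guide-name). Usually does not need to be set; defaults to None.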
 
 
 ## Underlying OP Design

From 2b05c0843ae7250d40751978a156ff67113aa056 Mon Sep 17 00:00:00 2001
From: andsonder <changlu@keter.top>
Date: Fri, 15 Sep 2023 12:57:34 +0800
Subject: [PATCH 08/13] update dtype

---
 rfcs/APIs/20230913_api_design_for_masked_fill.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/rfcs/APIs/20230913_api_design_for_masked_fill.md b/rfcs/APIs/20230913_api_design_for_masked_fill.md
index 9f1d54628..b0f3196bd 100644
--- a/rfcs/APIs/20230913_api_design_for_masked_fill.md
+++ b/rfcs/APIs/20230913_api_design_for_masked_fill.md
@@ -220,9 +220,9 @@ Tensor.masked_fill_(mask, value)
 
 `masked_fill_` modifies the input tensor in place.
 
-- `input (Tensor)`: the input tensor to be filled.
+- `input (Tensor, float, double, int, int64_t, float16, bfloat16)`: the input tensor to be filled.
 - `mask (Tensor, bool)`: a boolean mask tensor specifying the positions to fill; it has the same shape as `input`.
-- `value (Tensor, bool, int, float, complex)`: the value to fill in; supports bool, int, float, complex, and 0-D Tensor.
+- `value (Tensor, float, double, int, int64_t, float16, bfloat16)`: the value to fill in; supports bool, int, float, and 0-D Tensor.
 - `name (str, optional)`: for usage details, see [Name](https://www.paddlepaddle.org.cn/documentation/docs/zh/api_guides/low_level/program.html#api-guide-name). Usually does not need to be set; defaults to None.
 
 

From 6e09108b67e60a43add9b385cd336f3fe7c62bd4 Mon Sep 17 00:00:00 2001
From: andsonder <changlu@keter.top>
Date: Fri, 15 Sep 2023 13:24:36 +0800
Subject: [PATCH 09/13] update dtype

---
 rfcs/APIs/20230913_api_design_for_masked_fill.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/rfcs/APIs/20230913_api_design_for_masked_fill.md b/rfcs/APIs/20230913_api_design_for_masked_fill.md
index b0f3196bd..4387d1496 100644
--- a/rfcs/APIs/20230913_api_design_for_masked_fill.md
+++ b/rfcs/APIs/20230913_api_design_for_masked_fill.md
@@ -222,7 +222,7 @@ `masked_fill_` modifies the input tensor in place.
 
 - `input (Tensor, float, double, int, int64_t, float16, bfloat16)`: the input tensor to be filled.
 - `mask (Tensor, bool)`: a boolean mask tensor specifying the positions to fill; it has the same shape as `input`.
-- `value (Tensor, float, double, int, int64_t, float16, bfloat16)`: the value to fill in; supports bool, int, float, and 0-D Tensor.
+- `value (Tensor, float, double,int, int64_t, float16, bfloat16)`: the value to fill in; supports bool, int, float, and 0-D Tensor.
 - `name (str, optional)`: for usage details, see [Name](https://www.paddlepaddle.org.cn/documentation/docs/zh/api_guides/low_level/program.html#api-guide-name). Usually does not need to be set; defaults to None.
 
 

From c80563d3eff9850bcfc799499345f41e013a990c Mon Sep 17 00:00:00 2001
From: andsonder <changlu@keter.top>
Date: Fri, 15 Sep 2023 13:24:47 +0800
Subject: [PATCH 10/13] update dtype

---
 rfcs/APIs/20230913_api_design_for_masked_fill.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/rfcs/APIs/20230913_api_design_for_masked_fill.md b/rfcs/APIs/20230913_api_design_for_masked_fill.md
index 4387d1496..b0f3196bd 100644
--- a/rfcs/APIs/20230913_api_design_for_masked_fill.md
+++ b/rfcs/APIs/20230913_api_design_for_masked_fill.md
@@ -222,7 +222,7 @@ `masked_fill_` modifies the input tensor in place.
 
 - `input (Tensor, float, double, int, int64_t, float16, bfloat16)`: the input tensor to be filled.
 - `mask (Tensor, bool)`: a boolean mask tensor specifying the positions to fill; it has the same shape as `input`.
-- `value (Tensor, float, double,int, int64_t, float16, bfloat16)`: the value to fill in; supports bool, int, float, and 0-D Tensor.
+- `value (Tensor, float, double, int, int64_t, float16, bfloat16)`: the value to fill in; supports bool, int, float, and 0-D Tensor.
 - `name (str, optional)`: for usage details, see [Name](https://www.paddlepaddle.org.cn/documentation/docs/zh/api_guides/low_level/program.html#api-guide-name). Usually does not need to be set; defaults to None.
 
 

From 7d89ae46f9d5eeebaf44e6add6c4c854b9cb3228 Mon Sep 17 00:00:00 2001
From: andsonder <changlu@keter.top>
Date: Fri, 15 Sep 2023 15:04:55 +0800
Subject: [PATCH 11/13] update

---
 rfcs/APIs/20230913_api_design_for_masked_fill.md | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/rfcs/APIs/20230913_api_design_for_masked_fill.md b/rfcs/APIs/20230913_api_design_for_masked_fill.md
index b0f3196bd..61d0f4b35 100644
--- a/rfcs/APIs/20230913_api_design_for_masked_fill.md
+++ b/rfcs/APIs/20230913_api_design_for_masked_fill.md
@@ -234,11 +234,10 @@ `masked_fill_` modifies the input tensor in place.
 
 Add `masked_fill` and `masked_fill_` to python/paddle/tensor/manipulation.py.
 
-Implemented by combining `paddle.full_like` and `paddle.where`.
+Implemented with `paddle.where`.
 
 ```python
-out = paddle.full(x.shape, value, x.dtype)
-out = paddle.where(mask, y, x)
+out = paddle.where(mask, value, x)
 ```
 
 ## Implementation File Paths

From 976d443801b8509be319228279a246f09166b65d Mon Sep 17 00:00:00 2001
From: andsonder <changlu@keter.top>
Date: Fri, 15 Sep 2023 15:57:30 +0800
Subject: [PATCH 12/13] remove full_like

---
 .../20230913_api_design_for_masked_fill.md    | 25 ++++++-------------
 1 file changed, 7 insertions(+), 18 deletions(-)

diff --git a/rfcs/APIs/20230913_api_design_for_masked_fill.md b/rfcs/APIs/20230913_api_design_for_masked_fill.md
index 61d0f4b35..b6db03a00 100644
--- a/rfcs/APIs/20230913_api_design_for_masked_fill.md
+++ b/rfcs/APIs/20230913_api_design_for_masked_fill.md
@@ -31,7 +31,7 @@
 import paddle
 
 paddle.seed(123)
-x = paddle.ones([3, 3], dtype='float32')
+x = paddle.ones([3, 3], dtype='float64')
 # Tensor(shape=[3, 3], dtype=float32, place=CUDAPlace(0), stop_gradient=True,
 #        [[0.00276479, 0.45899123, 0.96637046],
 #         [0.66818708, 0.05855134, 0.33184195],
@@ -44,10 +44,9 @@ mask = paddle.randint(0, 2, [3, 3]).astype('bool')
 #         [True , True , True ]])
 
 def masked_fill(x, mask, value):
-    y = paddle.full_like(x, value, x.dtype)
-    return paddle.where(mask, y, x)
+    return paddle.where(mask, value, x)
 
-out = masked_fill(x, mask, 2)
+out = masked_fill(x, mask, 2.)
 # Tensor(shape=[3, 3], dtype=float32, place=CUDAPlace(0), stop_gradient=True,
 #        [[2.        , 2.        , 0.96637046],
 #         [2.        , 2.        , 2.        ],
@@ -55,17 +54,7 @@ out = masked_fill(x, mask, 2)
 ```
 
 
-Both full/full_like and where run on CPU and GPU.
-
-Supported dtypes for paddle.full_like:
-
-```text
-CPU Kernel
-float,double,int8_t,uint8_t,int16_t,int,int64_t,bool,float16,bfloat16,complex32,complex64
-
-GPU Kernel
-float,double,int8_t,uint8_t,int16_t,int,int64_t,bool,float16,bfloat16,complex32,complex64
-```
+where runs on both CPU and GPU.
 
 Supported dtypes for paddle.where:
 
@@ -77,13 +66,13 @@ GPU Kernel
 float,double,int,int64_t,float16,bfloat16
 ```
 
-The masked_fill API composed from full/full_like and where supports broadcasting.
+masked_fill can be implemented with where alone, and it supports broadcasting.
 
 ```python
-x = paddle.ones([3, 3], dtype='float32')
+x = paddle.ones([3, 3], dtype='float64')
 mask = paddle.randint(0, 2, [1, 3]).astype('bool')
 
-out = masked_fill(x, mask, 2)
+out = masked_fill(x, mask, 2.)
 print(out)
 
 # Tensor(shape=[3, 3], dtype=float32, place=Place(gpu:0), stop_gradient=True,
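
The where-only version passes `value` directly as the second argument of `where`. Below is a NumPy analogue, assuming `np.where` broadcasting matches `paddle.where` here. Casting `value` to `x.dtype` first is a choice made in this sketch so that the result keeps the dtype of `x` (float64 in this example) whether `value` is a Python scalar or a 0-D array; `masked_fill_where` is a hypothetical name.

```python
import numpy as np


def masked_fill_where(x, mask, value):
    """NumPy analogue of paddle.where(mask, value, x) (illustration only)."""
    # Accept a Python scalar or a 0-D array; casting to x.dtype keeps the
    # output dtype equal to the input dtype.
    value = np.asarray(value, dtype=x.dtype)
    if value.ndim != 0:
        raise ValueError("value must be a scalar or a 0-D tensor")
    return np.where(mask, value, x)


x = np.ones((3, 3), dtype=np.float64)
mask = np.array([[True, False, True]])  # shape [1, 3], broadcast over rows
out = masked_fill_where(x, mask, 2.0)
print(out.dtype)  # float64
print(out.tolist())  # [[2.0, 1.0, 2.0], [2.0, 1.0, 2.0], [2.0, 1.0, 2.0]]
```
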

From 0555927e781824e2febf78c26d185ffd8463ac83 Mon Sep 17 00:00:00 2001
From: andsonder <changlu@keter.top>
Date: Mon, 18 Sep 2023 12:18:51 +0800
Subject: [PATCH 13/13] update doc

---
 rfcs/APIs/20230913_api_design_for_masked_fill.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/rfcs/APIs/20230913_api_design_for_masked_fill.md b/rfcs/APIs/20230913_api_design_for_masked_fill.md
index b6db03a00..51d7d9bef 100644
--- a/rfcs/APIs/20230913_api_design_for_masked_fill.md
+++ b/rfcs/APIs/20230913_api_design_for_masked_fill.md
@@ -209,15 +209,15 @@ Tensor.masked_fill_(mask, value)
 
 `masked_fill_` modifies the input tensor in place.
 
-- `input (Tensor, float, double, int, int64_t, float16, bfloat16)`: the input tensor to be filled.
-- `mask (Tensor, bool)`: a boolean mask tensor specifying the positions to fill; it has the same shape as `input`.
-- `value (Tensor, float, double, int, int64_t, float16, bfloat16)`: the value to fill in; supports bool, int, float, and 0-D Tensor.
+- `input (Tensor)`: the input tensor to be filled. Supported data types: float, double, int, int64_t, float16, and bfloat16.
+- `mask (Tensor)`: a boolean mask tensor specifying the positions to fill; it must be broadcastable with `input`. Supported data type: bool.
+- `value (Tensor, scalar)`: the value to fill in; supports bool, int, float, and 0-D Tensor. Supported data types: float, double, int, int64_t, float16, and bfloat16.
 - `name (str, optional)`: for usage details, see [Name](https://www.paddlepaddle.org.cn/documentation/docs/zh/api_guides/low_level/program.html#api-guide-name). Usually does not need to be set; defaults to None.
 
 
 ## Underlying OP Design
 
-Implemented in Python; no underlying OP support is required.
+Implemented with existing OPs (where / full); no new underlying OP is required.
 
 ## API Implementation
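
The final parameter list can be turned into argument checks. The sketch below is a NumPy illustration with a hypothetical function name. It maps `float`, `double`, `int` and `int64_t` to `float32`, `float64`, `int32` and `int64`, omits `bfloat16` because NumPy has no such dtype, and only returns the broadcast output shape rather than computing the result.

```python
import numpy as np

# Input dtypes from the final parameter list; bfloat16 is left out because
# NumPy has no bfloat16 dtype.
SUPPORTED_INPUT_DTYPES = {
    np.dtype(name) for name in ("float32", "float64", "int32", "int64", "float16")
}


def check_masked_fill_args(x, mask, value):
    """Hypothetical checks for masked_fill(x, mask, value); returns the output shape."""
    if x.dtype not in SUPPORTED_INPUT_DTYPES:
        raise TypeError(f"unsupported input dtype: {x.dtype}")
    if mask.dtype != np.bool_:
        raise TypeError(f"mask must be bool, got {mask.dtype}")
    if np.ndim(value) != 0:
        raise ValueError("value must be a scalar or a 0-D tensor")
    # np.broadcast_shapes raises ValueError if x and mask are incompatible.
    return np.broadcast_shapes(x.shape, mask.shape)


x = np.zeros((2, 3), dtype=np.float32)
print(check_masked_fill_args(x, np.array([True, False, True]), 1.0))  # (2, 3)
```
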