[Feature] add sympy2func #480
Conversation
Thanks for your contribution!
ppsci/equation/sympy2func.py
Outdated
class DerivativeNode(nn.Layer):
    """
    A node representing a derivative in the computational graph.

    Args:
        expr (sympy.Expr): The expression to be differentiated.
        syms (List[Tuple[sympy.Symbol, int]]): The symbols to differentiate with respect to, and their orders.

    Returns:
        The value of the derivative.
    """

    def __init__(self, expr: sympy.Expr, syms: List[Tuple[sympy.Symbol, int]]):
        super().__init__()
        self.expr = expr
        self.syms = syms

    def forward(self, x):
        x_value = self.expr(x)
        for sym, order in self.syms:
            sym_value = sym(x)
            if order == 1:
                x_value = jacobian(x_value, sym_value)
            elif order == 2:
                x_value = hessian(x_value, sym_value)
            else:
                raise NotImplementedError(
                    f"Higher order derivatives are not implemented yet, got {order}"
                )
        return x_value
- `DerivativeNode` corresponds to sympy's `Derivative` node, so `__init__` should accept such a node and initialize three fields: the function being differentiated, the differentiation variables, and the differentiation orders. `forward` should then fetch the values of the function and the variables from `x` and apply differentiation calls according to the orders. Mixed partial derivatives, such as $\dfrac{\partial^2 u}{\partial x \partial y}$, also need to be handled.
- The other Node classes should follow a similar design, corresponding one-to-one with sympy's node types.
- This is implemented by the conversion logic in the function process_sympy_expression:

elif expr.is_Derivative:
    model = process_sympy_expression(expr.args[0])
    syms = [(process_sympy_expression(sym), order) for sym, order in expr.args[1:]]
    return DerivativeNode(model, syms)

Here `model` is the parsed dependent variable (usually a LayerNode), and `syms` is a list organized as (independent-variable SymbolNode, order) pairs.
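As a sketch of what that parsing sees (assuming a recent SymPy version), a Derivative stores the differentiated expression in args[0] and (variable, order) pairs in args[1:], which also covers mixed partials:

```python
# Illustrative sketch: how SymPy structures a Derivative node, which is
# what the elif branch above splits into `model` and `syms`.
import sympy

x, y = sympy.symbols("x y")
u = sympy.Function("u")(x, y)

mixed = sympy.Derivative(u, x, y)   # mixed partial d2u/(dx dy)
print(mixed.args[0])                # u(x, y) -> becomes `model`
print(mixed.args[1:])               # ((x, 1), (y, 1)) -> becomes `syms`

second = sympy.Derivative(u, x, 2)  # d2u/dx2
print(second.args[1:])              # ((x, 2),)
```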
ppsci/equation/sympy2func.py
Outdated
class ExtraFuncNode(nn.Layer):
    """
    A node representing an extra function in the computational graph.

    Args:
        fun (sympy.Function): The function.
        args (List[paddle.nn.Layer]): The arguments of the function.

    Returns:
        The result of the function.

    Note:
        This is used to handle the case where the function is a neural network.

    Examples:
        >>> x, y = sympy.symbols("x y")
        >>> u = sympy.Function("u")(x, y)
        >>> fun = sympy.Derivative(u, x, y)
        >>> fun = sympy_to_function(fun)
        >>> fun({u: model, x: paddle.to_tensor(0.5), y: paddle.to_tensor(0.5)})

    Other cases:

        >>> x, y = sympy.symbols("x y")
        >>> u = sympy.Function("u")(x, y)
        >>> v = sympy.Function("v")(x, y)
        >>> fun = sympy.Derivative(u, x, y) + sympy.Derivative(v, x, y)
        >>> fun = sympy_to_function(fun)
        >>> fun({u: (model, 0), v: (model, 1), x: paddle.to_tensor(0.5), y: paddle.to_tensor(0.5)})
    """

    def __init__(self, fun: sympy.Function, *args):
        super().__init__()
        assert isinstance(fun, sympy.Function)
        self.fun = fun
        self.args = args

    def forward(self, x: Dict):
        model = x[self.fun]
        if isinstance(model, tuple):
            model, pos = model
            # TODO(PuQing): lazy computing for model, avoid multiple computing
            return model(*[arg(x) for arg in self.args])[pos]
        return model(*[arg(x) for arg in self.args])
- The class name could be changed to LayerNode or FunctionNode; without reading the docstring, ExtraFuncNode does not clearly convey what this class does.
- paddlescience models' forward accepts Dict[str, paddle.Tensor], but here model.__call__ is passed *args.
- The multi-model case needs to be considered. For example, the bracket.py case has two models that take the same inputs t, x, y but produce different outputs. Two points to consider: with minimal user code, this module should know which model a FuncNode wraps internally; and since calling a FuncNode multiple times should be a common scenario in this module, output variables should be cached.
- This class has been renamed to LayerNode. This is mainly for variables that require a forward pass through an external model, or user-defined operators.
- Fixed.

3.1. This is supported, written as:

x, y = sympy.symbols("x y")
u = sympy.Function("u")(x, y)
v = sympy.Function("v")(x, y)
func = sympy.Derivative(u, x, y) + sympy.Derivative(v, x, y)
func = sympy_to_function(func)
func({u: model, x: paddle.to_tensor(0.5), y: paddle.to_tensor(0.5)})  # The model should have output_keys = ["u", "v"]

For a model with multiple output symbols, providing just {u: model} is enough (u can even be any other name); LayerNode uses the model's output_keys to infer which model's output represents each symbol.
The drawback is obvious, though: the model must have an output_keys attribute, and the output_keys of different models must not overlap. I also considered the previous approach of explicitly linking symbols to models, where the user would write:

func = sympy_to_function(func)
func({u: model, v: model, x: paddle.to_tensor(0.5), y: paddle.to_tensor(0.5)})

3.2. The cache is implemented somewhat crudely with a class variable, keyed by the model output dict's keys with the output Tensors as values. There may be memory-release issues??
"Every model must carry an output_keys attribute" is a paddlescience convention based on the specifics of scientific computing. Unlike image classification, where the outputs are almost always just features and logits, in scientific computing different cases may use the same model while the meanings of the output variables differ greatly. So users must explicitly specify each model's output_keys attribute, which is later used as keys for dict lookups, and the cost of doing so is low.
Using output_keys to decide which model to run forward with is fine: if two models have the same output variable, the user should distinguish them with different names in output_keys, e.g. u_1 and u_2. That is normal practice; otherwise, if both are named u, even if the toolkit handled it internally, users themselves could easily confuse which u comes from which model.
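A minimal sketch of that lookup rule (plain Python stand-ins, not the actual ppsci models): each symbol resolves to the unique model whose output_keys contains it, which is why the keys must not overlap across models.

```python
# Hypothetical sketch of output_keys-based dispatch. ModelA/ModelB are
# stand-ins for paddlescience models, not real classes from the toolkit.
class ModelA:
    output_keys = ("u_1",)
    def __call__(self, inputs):
        return {"u_1": inputs["x"] * 2}

class ModelB:
    output_keys = ("u_2",)
    def __call__(self, inputs):
        return {"u_2": inputs["x"] + 1}

def find_model_for(symbol_name, models):
    # pick the unique model whose output_keys contains the symbol name;
    # this only works because output_keys do not overlap across models
    for model in models:
        if symbol_name in model.output_keys:
            return model
    raise KeyError(symbol_name)

models = [ModelA(), ModelB()]
out = find_model_for("u_2", models)({"x": 3.0})
print(out["u_2"])  # 4.0
```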
ppsci/equation/sympy2func.py
Outdated
        Tensor(shape=[1], dtype=float32, place=CPUPlace, stop_gradient=True,
               [0.47942555])
    """
    return process_sympy_expression(sympy.expand(expr))
What is the purpose of expand here?
The intent is to simplify the expression to some degree and avoid computing the same thing repeatedly.
> The intent is to simplify the expression to some degree and avoid computing the same thing repeatedly.

Could you give an example? It feels like expand mainly affects the shape of the expression tree: it expands the expression into finer-grained terms and changes the tree height and node count, but it seems largely unrelated to repeated computation. My initial impression is that if repeated computation exists before expand, it cannot be eliminated by expand either?
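For what it's worth, a quick SymPy check illustrates the point: expand only rewrites the tree into a sum of terms, while common-subexpression elimination (sympy.cse) is what actually factors out repeated subterms.

```python
import sympy

x, y = sympy.symbols("x y")

# expand rewrites into a sum of monomials
print(sympy.expand((x + y) ** 2))   # x**2 + 2*x*y + y**2

# a repeated subterm survives expand unchanged; cse factors it out
expr = sympy.sin(x + y) + sympy.cos(x + y)
print(sympy.expand(expr))           # x + y still appears twice
print(sympy.cse(expr))              # x + y extracted as a single x0
```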
ppsci/equation/sympy2func.py
Outdated
FUNC_MAP = {
    sympy.sin: paddle.sin,
    sympy.cos: paddle.cos,
    sympy.exp: paddle.exp,
    sympy.Pow: paddle.pow,
    sympy.sqrt: paddle.sqrt,
    sympy.log: paddle.log,
    sympy.tan: paddle.tan,
    sympy.Mul: paddle.multiply,
    sympy.Add: paddle.add_n,
}
Also note that sympy's Min/Max functions take the minimum/maximum of their arguments; they need to map to paddle.minimum and paddle.maximum, not to paddle.min and paddle.max, which reduce.
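The elementwise-vs-reduce distinction is easy to illustrate with the NumPy analogues (used here only because they behave the same way as the paddle pair):

```python
import numpy as np

a = np.array([1.0, 5.0, 3.0])
b = np.array([4.0, 2.0, 6.0])

# elementwise: what a two-argument sympy Max should map to
print(np.maximum(a, b))  # [4. 5. 6.]

# reduce: collapses one array to a scalar; wrong for Max(a, b)
print(np.max(a))         # 5.0
```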
ppsci/equation/sympy2func.py
Outdated
        >>> node({x: paddle.to_tensor(0.5)})
        Tensor(shape=[], dtype=float32, place=Place(gpu:0), stop_gradient=True,
               0.47942555)
Isn't the expected output in this Examples section a bit off? How does 0.5, converted to a tensor, become 0.47942555?
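For context, 0.47942555 is sin(0.5), so the printed value looks like it came from an example where the node wrapped sin(x) rather than x itself:

```python
import math

# sin(0.5) matches the docstring's "unexpected" value
print(math.sin(0.5))  # 0.479425538604203
```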
ppsci/equation/sympy2func.py
Outdated
        self.syms = syms

    def forward(self, x):
        x_value = self.expr(x)
What does self.expr(x) do? Is this a callable object?
ppsci/equation/sympy2func.py
Outdated
    def forward(self, x):
        x_value = self.expr(x)
        for sym, order in self.syms:
- Names don't need to be abbreviated: sym -> symbol.
- I'm a bit confused: is sym a callable object? I tried locally and neither Symbol nor Expr seems to be callable.
ppsci/equation/sympy2func.py
Outdated
        for model in x.values():
            if hasattr(model, "output_keys"):
                output_keys: Dict = model.output_keys
                if self.func.name in output_keys:
                    model_output_dict: Dict = model(
                        {arg.symbol.name: arg(x) for arg in self.args}
                    )
                    for key in output_keys:
                        self._MODEL_OUTPUT_CACHE[key] = model_output_dict[key]
                    break
- The x passed in should be of type Dict[str, paddle.Tensor]; what does for model in x.values() mean?
- Actually, LayerNode is the real FuncNode. For an expression like z = x + y**2, mathematically z is indeed a function of x and y, but in sympy z is not represented by the Function class; it is an Add object whose children are x and y**2, as the following code shows:

import sympy as sp

x, y = sp.symbols('x y')
z = x + y ** 2
print(z.is_Function)  # False
print(type(z))        # <class 'sympy.core.add.Add'>

u = sp.Function('u')(x, y)
print(u.is_Function)  # True
print(type(u))        # u

This means that only when node.is_Function is True does the node involve model computation; everything else is an Operator Node, not a Function Node.
ppsci/equation/sympy2func.py
Outdated
    def forward(self, x: Dict):
        # check if the model output is in the cache
        model_output = self._MODEL_OUTPUT_CACHE.get(self.func.name)
My idea was to cache all intermediate results (nodes) across the whole tree, not only the model outputs, to minimize computation. The output cache dict can therefore be embedded directly in x (passing a mutable dict as a parameter in Python is similar to a reference parameter in C++) instead of living in a separate variable; once everything is computed, simply pick the required variables out of this big cache dict and return them.
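A minimal sketch of this pattern (hypothetical node class, not the actual module code): the mutable dict threaded through forward doubles as a cache for every intermediate node.

```python
# Hypothetical sketch: every node stores its result into the dict that is
# passed through forward; since dicts are passed by reference, later nodes
# (and repeated calls) see earlier results and skip recomputation.
class AddNode:
    def __init__(self, key, left_key, right_key):
        self.key, self.left_key, self.right_key = key, left_key, right_key

    def forward(self, data):
        if self.key not in data:  # compute once, reuse afterwards
            data[self.key] = data[self.left_key] + data[self.right_key]
        return data[self.key]

data = {"x": 1.0, "y": 2.0}
node = AddNode("x + y", "x", "y")
node.forward(data)
node.forward(data)    # second call is a cache hit
print(data["x + y"])  # 3.0
```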
ppsci/equation/sympy2func.py
Outdated
    elif expr.is_Function or expr.is_Pow or expr.is_Mul or expr.is_Add:
        args = [process_sympy_expression(arg) for arg in expr.args]
        try:
            paddle_func = FUNC_MAP[expr.func]
            return FuncNode(paddle_func, *args)
        except KeyError:
            logger.warning(
                f"Note that you appear to be using a non-built-in function {expr}, please pass in that when you call the function"
            )
            return LayerNode(expr, *args)
- I think nodes could be classified as NumberNode, OperatorNode, and Function(Layer)Node. Derivative would fall under OperatorNode, and forward model computation under FunctionNode. SymbolNode doesn't seem necessary, since it only acts as an identity, essentially just returning its input; paddlescience's code design prioritizes cutting unnecessary code.
- The cache dict keys should all be str. Different node types may need a get_key-like helper, because a Function's key should be the function name itself, f, not f(x,y); other node types may also need if-else logic to derive the required string key.
The overall approach and structure are basically OK; besides the docstring changes, a few other places need minor tweaks.
if order == 1:
    inputs_dict[expr_str] = jacobian(expr_tensor, symbol_tensor)
elif order == 2:
    inputs_dict[expr_str] = hessian(expr_tensor, symbol_tensor)
else:
    logger.warning(
        f"The order {order} of the derivative is not supported, the order should be 1 or 2."
    )
The differentiation of a dependent variable with respect to a single independent variable could be wrapped into a function, with logic like this:

def single_derivate_func(dvar: paddle.Tensor, invar: paddle.Tensor, order: int):
    order_left = order
    while order_left > 0:
        if order_left >= 2:
            dvar = hessian(dvar, invar)
            order_left -= 2
        else:
            dvar = jacobian(dvar, invar)
            order_left -= 1
    return dvar

This function can compute a single-variable partial derivative of arbitrary order, so a mixed partial derivative of arbitrary order can then be implemented with the following code:

for symbol, order in symbols:
    symbol_tensor = inputs_dict[str(symbol)]
    inputs_dict[expr_str] = single_derivate_func(
        inputs_dict[expr_str],
        symbol_tensor,
        order,
    )

for example when computing (a complicated case that should not actually arise in practice)
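The decomposition of a higher order into second- and first-order steps can be sanity-checked symbolically (SymPy's diff standing in here for hessian/jacobian):

```python
import sympy

x = sympy.symbols("x")
f = sympy.sin(x) * x**4

# order 3 == one second-order step followed by one first-order step,
# mirroring the hessian-then-jacobian loop in single_derivate_func
third = sympy.diff(f, x, 3)
stepped = sympy.diff(sympy.diff(f, x, 2), x)
print(sympy.simplify(third - stepped))  # 0
```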
logger.warning(
    f"The operator {self.expr.func} is not supported, please add it to FUNC_MAP."
)
This should be changed from a warning to raise NotImplementedError, shouldn't it? If the corresponding computation logic is not implemented, the result either cannot be computed or will be wrong.
    def __init__(self, expr: sympy.core.function.UndefinedFunction):
        super().__init__(expr)

    def forward(self, inputs_dict: Dict, models_dict: Dict[str, nn.Layer]):
Personally, I feel LayerNode should record which model it corresponds to in __init__, because passing the model as a forward parameter looks very odd. But the Equation and Model modules are currently decoupled, and there doesn't seem to be a good way to achieve this...
        super().__init__(expr)

    def forward(self, inputs_dict: Dict, models_dict: Dict[str, nn.Layer] = None):
        expr_str = str(self.expr)
For a Derivative node, its key should not be str(self.expr); convert it directly into a double-underscore-separated string instead.
                expr_key = (
                    f"{key}({', '.join([str(arg) for arg in self.expr.args])})"
                )
                inputs_dict[expr_key] = model_output[key]
Same issue as with the Derivative node: can we record this directly with key instead of expr_key?
        for (
            key,
            value,
        ) in layer_key_maps.items():  # rename the layer key e.g. u(x, y) to u
            for inputs_key in list(inputs_dict.keys()):
                if key in inputs_key:
                    inputs_dict[inputs_key.replace(key, value)] = inputs_dict.pop(
                        inputs_key
                    )
It feels like this can be removed as well?
    nodes = []

    def traverse_expression(expr, nodes):
        nodes.insert(0, expr)
This post-order traversal looks a bit odd; there should be a more natural way to write it:
def post_traverse(expr, nodes):
# traverse into sub-nodes
for arg in expr.args:
nodes = post_traverse(arg, nodes)
# process current node
if expr.func == sympy.Derivative:
nodes.append(expr.args[0])
else:
nodes.append(expr)
return nodes
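Running the suggested function on a small expression shows the post-order property (children before parents, root last):

```python
import sympy

def post_traverse(expr, nodes):
    # same function as suggested above, reproduced to be runnable standalone
    for arg in expr.args:
        nodes = post_traverse(arg, nodes)
    if expr.func == sympy.Derivative:
        nodes.append(expr.args[0])
    else:
        nodes.append(expr)
    return nodes

x, y = sympy.symbols("x y")
expr = x + y**2
nodes = post_traverse(expr, [])
print(nodes[-1] == expr)                   # True: root comes last
print(nodes.index(y) < nodes.index(y**2))  # True: child before parent
```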
    return inputs_dict


def get_expression_nodes(expr: sympy.Expr) -> List[sympy.Expr]:
rename to sort_nodes_as_post_order
        'sin(u)*cos(v)': Tensor(shape=[1, 1], dtype=float32, place=Place(gpu:0), stop_gradient=False,
            [[0.84003019]])}
    """
    expression_nodes = get_expression_nodes(expr)
sort_nodes_as_post_order
    expression_nodes = list(dict.fromkeys(expression_nodes))  # remove duplicates
    nodes = []
    for node in expression_nodes:
        if isinstance(node.func, sympy.core.function.UndefinedFunction):
It feels like is_Function == True is equivalent to LayerNode? Or is there a better way to check? Introducing sympy.core.function.UndefinedFunction increases the cost of understanding sympy itself (I can't tell exactly what it means — a user-defined function?), especially the difference between this check and is_Function.
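For reference, UndefinedFunction is the metaclass of functions created with sympy.Function("u") (declared by the user, with no built-in evaluation rule), whereas is_Function is also True for built-ins like sin, so the two checks differ exactly on the built-in vs user-defined boundary:

```python
import sympy
from sympy.core.function import UndefinedFunction

x = sympy.symbols("x")
u = sympy.Function("u")(x)  # user-declared function: the LayerNode case
s = sympy.sin(x)            # built-in function: maps to a FUNC_MAP entry

print(u.is_Function, s.is_Function)           # True True
print(isinstance(u.func, UndefinedFunction))  # True
print(isinstance(s.func, UndefinedFunction))  # False
```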
Closed because the following PR has been merged:
PR types
New features
PR changes
Others
Describe
Add support for converting sympy expressions to function.