Serializable expressions/operators #1848

yaugenst-flex · 2024-07-22T15:34:53Z

EDIT: Details below are outdated, refer to #1848 (comment)

This might not get merged, but it serves as a place to discuss the implementation of serializable operators/expressions (and eventually metrics) so that we can do server-side evaluation.

The main idea is that we have Expressions and Operators that are both subclasses of Tidy3dBaseModel. They can be combined into higher-level mathematical expressions using regular Python syntax and remain serializable. This is essentially a compromise between arbitrary user-defined functions (maximally flexible, but not something we can serialize or safely run server-side) and something that is feasible and secure for us to implement and use.

The proposed API looks something like this:

```python
import tidy3d.plugins.invdes as tdi
from tidy3d.plugins.metrics import ModeCoefficient

m1 = ModeCoefficient(mode_index=0, direction="+")
m2 = ModeCoefficient(mode_index=0, direction="-")

post_process = (abs(m1)**2 + abs(m2)**2) / 2  # post_process is an instance of `CompoundExpression`

# post_process is callable on some data locally
value = post_process(sim_data)

design = tdi.InverseDesign(
    simulation=simulation,
    design_region=design_region,
    post_processing_fn=post_process,  # and can be serialized and evaluated server-side too
)
```
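To make the idea concrete, here is a minimal, self-contained sketch of the underlying mechanism, using hypothetical classes rather than the ones in this PR: Python's operator dunder methods return tree nodes instead of numbers, each node knows how to evaluate itself on some data, and the tree serializes to plain JSON. `Leaf` is a made-up stand-in for a metric like `ModeCoefficient` and simply looks up a key in a dict.

```python
import json
from dataclasses import dataclass
from typing import Any


class Expression:
    """Base node: Python operators build a tree instead of computing a value."""

    def __add__(self, other):
        return BinaryOp("add", self, as_expr(other))

    def __truediv__(self, other):
        return BinaryOp("div", self, as_expr(other))

    def __pow__(self, other):
        return BinaryOp("pow", self, as_expr(other))

    def __abs__(self):
        return UnaryOp("abs", self)

    def __call__(self, data):
        return self.evaluate(data)


def as_expr(value):
    """Wrap plain numbers so every node in the tree is an Expression."""
    return value if isinstance(value, Expression) else Constant(value)


@dataclass
class Constant(Expression):
    value: Any

    def evaluate(self, data):
        return self.value

    def to_dict(self):
        return {"type": "Constant", "value": self.value}


@dataclass
class Leaf(Expression):
    key: str  # hypothetical stand-in for a metric such as ModeCoefficient

    def evaluate(self, data):
        return data[self.key]

    def to_dict(self):
        return {"type": "Leaf", "key": self.key}


BINARY = {
    "add": lambda a, b: a + b,
    "div": lambda a, b: a / b,
    "pow": lambda a, b: a**b,
}


@dataclass
class BinaryOp(Expression):
    op: str
    left: Expression
    right: Expression

    def evaluate(self, data):
        return BINARY[self.op](self.left.evaluate(data), self.right.evaluate(data))

    def to_dict(self):
        return {"type": "BinaryOp", "op": self.op,
                "left": self.left.to_dict(), "right": self.right.to_dict()}


@dataclass
class UnaryOp(Expression):
    op: str  # only "abs" is implemented in this sketch
    operand: Expression

    def evaluate(self, data):
        return abs(self.operand.evaluate(data))

    def to_dict(self):
        return {"type": "UnaryOp", "op": self.op, "operand": self.operand.to_dict()}


m1, m2 = Leaf("m1"), Leaf("m2")
post_process = (abs(m1) ** 2 + abs(m2) ** 2) / 2  # a tree, not a number
print(json.dumps(post_process.to_dict()))          # plain JSON, safe to send
print(post_process({"m1": 3 + 4j, "m2": 2j}))      # 14.5
```

The key design point is that nothing is computed when `post_process` is built; only the structure is recorded, which is what makes it both locally callable and serializable.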

Some points that would be great to nail down:

- Scope, i.e., where do we want/plan to use this? The invdes and design plugins come to mind, but @tylerflex also mentioned that we might potentially want to add monitors with user-defined postprocessing.
- Related to the above: where things should go. Currently, I put the base implementation of Expression and Operator in components/expressions/ so that it essentially becomes part of tidy3d's core, and split the Metrics part into plugins/metrics/, as those might not prove to be generally useful / part of the core machinery. But it might make sense to revisit this.
- Naming: what should the module be named, and what should the classes be named? Does the current naming make sense? My first iteration was Operation and Metric, so I think Operator and Expression is a step up from that. Any better suggestions?

Some rough edges:

- Multiple arguments. Currently, the assumption is that the final callable (i.e. post_process from above) takes exactly one input argument, which is assumed to be a SimulationData. It might be nice to be able to supply multiple arguments, or even multiple arguments that enter the expression at different points in the evaluation. The former should be easy enough to tack on to the proposed architecture; the latter will require more work.
- Variables (not constants) in the expressions. This is kind of similar to the above and might not be strictly necessary, but it would be nice to use variables in the expressions, the value of which gets determined at some later point.
- This approach allows the user to write mathematical expressions that can be evaluated, but it might not be immediately obvious that in `f = a + b`, `f` ends up being a callable. Maybe we can come up with some syntactic sugar to make this more obvious.

Related to #1828

tylerflex · 2024-07-23T17:21:43Z

Looks great. A few random notes / things that came to mind:

- Should we allow postprocess to be a function of sim_data as well, and just validate it to an Expression when we add it to the InverseDesign, or does this complicate things? I also slightly worry that users might get confused by adding an expression to the pydantic model field, but overall I do like how clean and simple it is!
- What if the user wants to use np or anp to define operations like sum(), abs(), exp()? Do we have some `import tidy3d.numpy as tnp`, perhaps, that can wrap these in our Operators?
- What does the UI look like for expressions like (abs(m1)**2 + abs(m2)**2) / 2? This is obviously serialized as a computational graph, but I wonder if there should be some way to load / export these expressions to a string or something, for debugging, GUI display, and definition. Maybe a class method constructor based on parsing a string? This could be a future feature...

> Some points that would be great to nail down:

> Scope, i.e., where do we want/plan to use this. invdes and design plugins come to mind, but @tylerflex also mentioned that we might potentially want to add monitors with user-defined postprocessing.

For now let's just do invdes. Then we can think about trying to hook this up to the design plugin pre-process (sim setup). But other ideas include:

- Adding a postprocessing expression to a Monitor to specify some additional operations to apply to the data, to reduce data download size and postprocessing script complexity.
- Defining a medium or geometry using an expression (really not thought-out examples):
  - `sphere = FunctionGeometry(inside_expression=(r - center) <= radius)` (the geometry is "in" where the position is within `radius` of `center`)
  - `med = FunctionMedium(eps_expression=1 + 0.3 * x)` (the medium's permittivity increases linearly along x)
- Defining sets of structures / objects based on expressions:
  - `metalens = FunctionStructureGroup(expression={'center', ...})` (not sure, but basically place a structure at a set of specified locations).
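As a toy illustration of the `FunctionMedium` idea (all names hypothetical, no tidy3d involved), a coordinate leaf can be evaluated on an array of positions, so the same expression object yields a whole permittivity profile:

```python
from dataclasses import dataclass
from typing import Any

import numpy as np


class Expr:
    def __add__(self, other):
        return Op("add", self, lift(other))

    def __radd__(self, other):
        return Op("add", lift(other), self)

    def __mul__(self, other):
        return Op("mul", self, lift(other))

    def __rmul__(self, other):
        return Op("mul", lift(other), self)


def lift(value):
    return value if isinstance(value, Expr) else Const(value)


@dataclass
class Const(Expr):
    value: Any

    def evaluate(self, coords):
        return self.value


@dataclass
class Coord(Expr):
    axis: str  # "x", "y" or "z"

    def evaluate(self, coords):
        return coords[self.axis]


@dataclass
class Op(Expr):
    kind: str  # "add" or "mul"
    left: Expr
    right: Expr

    def evaluate(self, coords):
        a, b = self.left.evaluate(coords), self.right.evaluate(coords)
        return a + b if self.kind == "add" else a * b


x = Coord("x")
eps_expression = 1 + 0.3 * x  # permittivity increasing linearly along x
xs = np.linspace(0.0, 2.0, 5)
eps = eps_expression.evaluate({"x": xs})  # one value per grid point
```

Because evaluation simply defers to NumPy arithmetic, the expression broadcasts over the coordinate array without any special handling.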

> Related to the above: Where things should go. Currently, I put the base implementation of Expression and Operator in components/expressions/ so that it essentially becomes part of tidy3d's core and split the Metrics part into plugins/metrics/, as those might not prove to be generally useful / part of the core machinery. But it might make sense to revisit this.

I think this is fine for now. Or we can put everything in plugins and move the general stuff into components later, depending on how things go? But definitely the data-specific metric stuff I think should go in plugins for now.

> Naming: What should the module be named, what should the classes be named? Does it make sense currently? My first iteration was Operation and Metric, so I think Operator and Expression is a step up from that. Any better suggestions?

Operator, Expression, maybe Variable for the unknown or known constants / arguments? basically my internal model is a computational graph where the nodes (with more than one edge) are operators, and the variables are the leaf nodes. The whole graph is the expression? And an expression expecting something specific (like a SimulationData) is maybe a metric?
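To make that mental model concrete, here is a minimal sketch (hypothetical classes, not the plugin's) in which operators are interior nodes holding their children, while variables and constants are leaves; a depth-first walk recovers the free variables of any (sub-)expression:

```python
from dataclasses import dataclass
from typing import Any, Tuple


class Node:
    def __add__(self, other):
        return Operator("+", (self, lift(other)))

    def __mul__(self, other):
        return Operator("*", (self, lift(other)))

    def __rmul__(self, other):
        return Operator("*", (lift(other), self))


def lift(value):
    return value if isinstance(value, Node) else Constant(value)


@dataclass(frozen=True)
class Variable(Node):
    name: str


@dataclass(frozen=True)
class Constant(Node):
    value: Any


@dataclass(frozen=True)
class Operator(Node):
    symbol: str
    children: Tuple[Node, ...]


def free_variables(node):
    """Depth-first walk that returns the names of all Variable leaves."""
    if isinstance(node, Variable):
        return {node.name}
    if isinstance(node, Operator):
        names = set()
        for child in node.children:
            names |= free_variables(child)
        return names
    return set()  # constants contribute nothing


a, b = Variable("a"), Variable("b")
expr = 2 * a + a * b
print(sorted(free_variables(expr)))  # ['a', 'b']
```

Note that every sub-expression (for example `a * b`) is itself a complete graph, so "expression" and "sub-expression" are the same type in this model.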

> Some rough edges:

> Multiple arguments. Currently, the assumption is that the final callable (i.e. post_process from above) takes exactly one input argument, which is assumed to be a SimulationData. It might be nice to be able to supply multiple arguments, or even multiple arguments that enter the expression at different points in the evaluation. The former should be easy enough to tack on to the proposed architecture, the latter will require more work.

> Variables (not constants) in the expressions. This is kind of similar to the above and might not be strictly necessary, but it would be nice to use variables in the expressions, the value of which gets determined at some later point.

Can we evaluate the expression into a function where the *args correspond to all of the unknown variables? then we just call this function passing our variables?

> This approach allows the user to write mathematical expressions that can be evaluated, but it might not be immediately obvious that in f = a + b, f ends up being a callable. Maybe we can come up with some syntactic sugar to make this more obvious.

Maybe we still allow them to write f as a callable, but then provide a way to turn callable f into an Expression by passing in some special Variables? For example:

```python
f_callable = lambda a: a + b  # b is some previously defined quantity
f_expression = f_callable(tracer_a)  # tracer_a: a special Variable standing in for a
```

This would maybe have to be done internally, but it could be one way of defining the callables and validating them. For example, a user could supply an objective function as a function of sim_data; when validating this in InverseDesign, we can pass a tracer for SimulationData and catch any errors there. If it works, we store the expression?
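Because the operators are overloaded, tracing a plain Python callable is just a matter of calling it with symbolic stand-ins, as long as the callable only uses operations the expression classes support. A minimal sketch with hypothetical `Var`/`Const`/`Add`/`Mul` classes and a hypothetical `trace` helper:

```python
from dataclasses import dataclass
from typing import Any


class Expr:
    def __add__(self, other):
        return Add(self, lift(other))

    def __radd__(self, other):
        return Add(lift(other), self)

    def __mul__(self, other):
        return Mul(self, lift(other))

    def __rmul__(self, other):
        return Mul(lift(other), self)


def lift(value):
    return value if isinstance(value, Expr) else Const(value)


@dataclass
class Const(Expr):
    value: Any

    def evaluate(self, env):
        return self.value


@dataclass
class Var(Expr):
    name: str

    def evaluate(self, env):
        return env[self.name]


@dataclass
class Add(Expr):
    left: Expr
    right: Expr

    def evaluate(self, env):
        return self.left.evaluate(env) + self.right.evaluate(env)


@dataclass
class Mul(Expr):
    left: Expr
    right: Expr

    def evaluate(self, env):
        return self.left.evaluate(env) * self.right.evaluate(env)


def trace(fn, *arg_names):
    """Call a plain Python function on symbolic tracers and return its expression."""
    result = fn(*(Var(name) for name in arg_names))
    if not isinstance(result, Expr):
        raise TypeError("function did not produce an expression")
    return result


b = 3.0
f_callable = lambda a: a + b
f_expression = trace(f_callable, "a")
print(f_expression.evaluate({"a": 2.0}))  # 5.0
```

Anything the tracer does not implement (indexing, attribute access like `sim_data['name'].abs`, or arbitrary NumPy calls) fails immediately with a `TypeError` or `AttributeError`, which is also where validation errors would surface.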

e-g-melo · 2024-07-23T20:00:09Z

Hi @yaugenst-flex!

When you call value = post_process(sim_data), how will it decide which monitor to assign to m1 and m2?

yaugenst-flex · 2024-07-24T07:27:50Z

> When you call value = post_process(sim_data), how will it decide which monitor to assign to m1 and m2?

Good question, the metric part is not fleshed out at all currently, haha. But I think the easiest would be to just supply the monitor name as an argument to a metric? My example from above would become:

```python
m1 = ModeCoefficient(mode_index=0, direction="+", monitor_name="monitor1")
m2 = ModeCoefficient(mode_index=0, direction="-", monitor_name="monitor2")

post_process = (abs(m1)**2 + abs(m2)**2) / 2
```

What do you think?
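A minimal sketch of how such a metric could resolve its monitor, assuming a hypothetical `ModeCoefficient` leaf and using a plain nested dict in place of `SimulationData` (in the actual plugin the lookup would go through the monitor's data instead):

```python
from dataclasses import dataclass


class Expr:
    def __add__(self, other):
        return Add(self, other)

    def __truediv__(self, other):
        return Div(self, other)

    def __pow__(self, other):
        return Pow(self, other)

    def __abs__(self):
        return Abs(self)

    def __call__(self, sim_data):
        return self.evaluate(sim_data)


def ev(node, sim_data):
    """Evaluate a child that may be an expression or a plain number."""
    return node.evaluate(sim_data) if isinstance(node, Expr) else node


@dataclass
class ModeCoefficient(Expr):
    mode_index: int
    direction: str
    monitor_name: str

    def evaluate(self, sim_data):
        # Hypothetical layout: {monitor_name: {(direction, mode_index): amplitude}}
        return sim_data[self.monitor_name][(self.direction, self.mode_index)]


@dataclass
class Abs(Expr):
    arg: object

    def evaluate(self, sim_data):
        return abs(ev(self.arg, sim_data))


@dataclass
class Pow(Expr):
    base: object
    exponent: object

    def evaluate(self, sim_data):
        return ev(self.base, sim_data) ** ev(self.exponent, sim_data)


@dataclass
class Add(Expr):
    left: object
    right: object

    def evaluate(self, sim_data):
        return ev(self.left, sim_data) + ev(self.right, sim_data)


@dataclass
class Div(Expr):
    left: object
    right: object

    def evaluate(self, sim_data):
        return ev(self.left, sim_data) / ev(self.right, sim_data)


m1 = ModeCoefficient(mode_index=0, direction="+", monitor_name="monitor1")
m2 = ModeCoefficient(mode_index=0, direction="-", monitor_name="monitor2")
post_process = (abs(m1) ** 2 + abs(m2) ** 2) / 2

fake_sim_data = {"monitor1": {("+", 0): 3 + 4j}, "monitor2": {("-", 0): 1j}}
print(post_process(fake_sim_data))  # 13.0
```

Since the monitor name is an ordinary field, it is serialized along with the rest of the expression, so the server can resolve it the same way.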

e-g-melo · 2024-07-24T11:47:23Z

> What do you think?

It sounds good! It should work very well for the GUI inverse design.

Regarding the scope, in addition to invdes, design, and monitor post-processing, I wonder if we could create a kind of CustomDataset object that accepts these expressions as arguments and which can be appended to SimulationData at any time after running the simulation. That would be interesting for GUI and Python compatibility because, when we create custom datasets in the GUI using its custom dataset interface, we could include them in the simulation results file.

For example:

```python
import tidy3d as td
from tidy3d.plugins.metrics import ModeCoefficient
from tidy3d.something import CustomDataset  # placeholder module from the proposal

m1 = ModeCoefficient(mode_index=0, direction="+", monitor_name="monitor1")
m2 = ModeCoefficient(mode_index=1, direction="-", monitor_name="monitor2")

custom_data_1 = CustomDataset(
    expression=(abs(m1)**2 + abs(m2)**2) / 2,
    name="mode1_plus_mode2",
)
sim_data.append(custom_data_1)  # proposed API

custom_data_2 = CustomDataset(
    expression=(abs(m1)**2 - abs(m2)**2) / 2,
    name="mode1_minus_mode2",
)
sim_data.append(custom_data_2)

sim_data.to_file(fname="SimulationData.hdf5")

sim_data = td.SimulationData.from_file(fname="SimulationData.hdf5")

custom_data_1 = sim_data["mode1_plus_mode2"].value
custom_data_2 = sim_data["mode1_minus_mode2"].value
```
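A rough sketch of the round trip such a feature would need, with hypothetical classes and a plain dict standing in for the results file: expressions are stored as JSON, rebuilt through an explicit class registry (so nothing is `eval`-ed), and re-evaluated by name. `CustomDataset` and `sim_data.append` above are proposals, so they are not used here.

```python
import json
from dataclasses import dataclass
from typing import Any

REGISTRY = {}  # only classes registered here can be rebuilt from JSON
OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "/": lambda a, b: a / b}


def register(cls):
    REGISTRY[cls.__name__] = cls
    return cls


class Expr:
    def __add__(self, other):
        return BinOp("+", self, lift(other))

    def __sub__(self, other):
        return BinOp("-", self, lift(other))

    def __truediv__(self, other):
        return BinOp("/", self, lift(other))

    def to_dict(self):
        out = {"type": type(self).__name__}
        for key, value in vars(self).items():
            out[key] = value.to_dict() if isinstance(value, Expr) else value
        return out


def lift(value):
    return value if isinstance(value, Expr) else Const(value)


def from_dict(data):
    cls = REGISTRY[data["type"]]  # KeyError for unknown node types
    kwargs = {
        key: from_dict(value) if isinstance(value, dict) and "type" in value else value
        for key, value in data.items()
        if key != "type"
    }
    return cls(**kwargs)


@register
@dataclass
class Const(Expr):
    value: Any

    def evaluate(self, data):
        return self.value


@register
@dataclass
class Ref(Expr):
    key: str  # stands in for a metric such as ModeCoefficient

    def evaluate(self, data):
        return data[self.key]


@register
@dataclass
class BinOp(Expr):
    op: str
    left: Expr
    right: Expr

    def evaluate(self, data):
        return OPS[self.op](self.left.evaluate(data), self.right.evaluate(data))


p1, p2 = Ref("p1"), Ref("p2")
named = {"mode1_plus_mode2": (p1 + p2) / 2, "mode1_minus_mode2": (p1 - p2) / 2}

stored = json.dumps({name: expr.to_dict() for name, expr in named.items()})
restored = {name: from_dict(d) for name, d in json.loads(stored).items()}
print({name: expr.evaluate({"p1": 0.75, "p2": 0.25}) for name, expr in restored.items()})
```

Restricting reconstruction to registered classes is what keeps loading a stored expression safe, which matters if the same files are read by the GUI and the server.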

yaugenst-flex · 2024-07-24T12:20:28Z

@e-g-melo: Yeah that sounds like a cool idea! I guess one caveat is that I'd assume most users do their data postprocessing client-side with all the power of python, so doing it this way is maybe a bit limited. But definitely worthwhile for compatibility and for anyone doing some light postprocessing on the GUI side.

@tylerflex: Ok going point by point 😃

> Should we allow postprocess to be a function of sim_data as well? and just validate it to an Expression when we add to the InverseDesign or does this complicate things?

Not sure I understand. You mean function in the sense of a regular Python function? Because it is callable like this already right. If that's the case, I'm not sure how we would go about converting a regular function into an Expression, that sounds tricky...

> What if the user wants to use np or anp to define operations like sum(), abs(), exp()? do we have some import tidy3d.numpy as tnp, perhaps, that can wrap these in our Operators?

We could do that, although the things that are implemented currently "just work" under autograd or regular numpy, and autodiff works too. What might be easiest to do is if we just design our operators in a way that they just do what you would expect when supplying either scalars or arraylikes to them. I think in most cases this would already work fine if we just made the operators be autograd.numpy functions. E.g., if we implement an operator Exp, that would just call anp.exp on the inputs and that automatically works with scalars, numpy arrays, and autodiff?
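A minimal sketch of such an operator, assuming a hypothetical `Exp` node that simply defers to `np.exp` at evaluation time (autograd.numpy should be usable as a drop-in here, which is what would make it differentiable). Because the actual math is delegated, the same node works for scalars and arrays without any special casing:

```python
from dataclasses import dataclass
from typing import Any

import numpy as np  # autograd.numpy could be swapped in here to get gradients


class Expr:
    def __call__(self, **values):
        return self.evaluate(values)


def ev(node, values):
    """Evaluate a child that may be an expression or a plain value."""
    return node.evaluate(values) if isinstance(node, Expr) else node


@dataclass
class Variable(Expr):
    name: str

    def evaluate(self, values):
        return values[self.name]


@dataclass
class Exp(Expr):
    """Unary operator that defers to np.exp, so it broadcasts over arrays."""

    arg: Any

    def evaluate(self, values):
        return np.exp(ev(self.arg, values))


x = Variable("x")
f = Exp(x)
print(f(x=0.0))                   # 1.0
print(f(x=np.array([0.0, 1.0])))  # same node, evaluated elementwise
```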

> What does the UI look like for expressions like (abs(m1)**2 + abs(m2)**2) / 2 ? This is obviously serialized as a computational graph, but I wonder if there should be some way to load / export these expressions to string or something? for debugging, GUI display and definition? maybe a class method constructor based on parsing a string? this could be a future feature...

Yeah definitely, a lot of this can just be added to the parent class I think, this shouldn't be too hard to do.

> Or we can put everything in plugins and move the general stuff into components later, depending on how things go?

I'm fine with either, I'm just thinking if we already decide that we do want to use this for regular components too, then it might make sense to include it directly, since we are going to end up moving the thing anyways.

> Operator, Expression, maybe Variable for the unknown or known constants / arguments? basically my internal model is a computational graph where the nodes (with more than one edge) are operators, and the variables are the leaf nodes. The whole graph is the expression? And an expression expecting something specific (like a SimulationData) is maybe a metric?

That makes sense to me, my problem is mostly that that big expression graph has a ton of subgraphs, that's how it is constructed. So the distinction between a variable and an expression is lost as soon as you apply one operator to it. I'll have to think about it some more.

> Can we evaluate the expression into a function where the *args correspond to all of the unknown variables? then we just call this function passing our variables?

Yes, that is possible, I think, although we would have to be really careful about the ordering of the *args in that final expression; this might behave in unexpected ways. Maybe we can enforce keyword arguments.
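One way to get keyword-argument behavior without worrying about `*args` ordering, sketched with hypothetical classes and a hypothetical `compile_expr` helper: collect the free variable names from the graph and expose them as keyword-only parameters through `inspect.Signature`, so that missing or misspelled names raise a `TypeError` at call time. The example expression is arbitrary.

```python
import inspect
from dataclasses import dataclass
from typing import Any

OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "**": lambda a, b: a**b,
}


class Expr:
    def __add__(self, other):
        return BinOp("+", self, lift(other))

    def __sub__(self, other):
        return BinOp("-", self, lift(other))

    def __pow__(self, other):
        return BinOp("**", self, lift(other))


def lift(value):
    return value if isinstance(value, Expr) else Const(value)


@dataclass
class Const(Expr):
    value: Any

    def evaluate(self, env):
        return self.value

    def names(self):
        return set()


@dataclass
class Variable(Expr):
    name: str

    def evaluate(self, env):
        return env[self.name]

    def names(self):
        return {self.name}


@dataclass
class BinOp(Expr):
    op: str
    left: Expr
    right: Expr

    def evaluate(self, env):
        return OPS[self.op](self.left.evaluate(env), self.right.evaluate(env))

    def names(self):
        return self.left.names() | self.right.names()


def compile_expr(expr):
    """Return a callable whose keyword-only parameters are the free variables."""
    params = [
        inspect.Parameter(name, inspect.Parameter.KEYWORD_ONLY)
        for name in sorted(expr.names())
    ]
    signature = inspect.Signature(params)

    def fn(**kwargs):
        bound = signature.bind(**kwargs)  # TypeError on missing or unknown names
        return expr.evaluate(bound.arguments)

    fn.__signature__ = signature
    return fn


x, y = Variable("x"), Variable("y")
f = compile_expr(x + y**2 - y)
print(inspect.signature(f))  # (*, x, y)
print(f(x=1, y=2))  # 3
```

Keyword-only parameters sidestep the ordering question entirely, and attaching `__signature__` means tools like `help()` show the expected names.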

> Maybe we still allow them to write f as a callable, but then provide a way to turn callable f into an Expression by passing in some special Variables? For example

I think this is a cool idea, but it will require a non-trivial amount of work. It's not only tracing, but also recording the transformations and converting them to our operators. I'll have to think about it.

tylerflex · 2024-07-24T17:22:46Z

> > Should we allow postprocess to be a function of sim_data as well? and just validate it to an Expression when we add to the InverseDesign or does this complicate things?
>
> Not sure I understand. You mean function in the sense of a regular Python function? Because it is callable like this already right. If that's the case, I'm not sure how we would go about converting a regular function into an Expression, that sounds tricky...

What I mean is defining the postprocessing function in the old style:

```python
def f(sim_data):
    return abs(sim_data['name'].abs.sel(...)**2)

InverseDesign(postprocess=f)
```

And yeah, I'm not sure how to convert it either. My original thought was something like how autodiff compiles a callable into a computational graph by passing some tracer argument and recording the operations.

> > What if the user wants to use np or anp to define operations like sum(), abs(), exp()? do we have some import tidy3d.numpy as tnp, perhaps, that can wrap these in our Operators?
>
> We could do that, although the things that are implemented currently "just work" under autograd or regular numpy, and autodiff works too. What might be easiest to do is if we just design our operators in a way that they just do what you would expect when supplying either scalars or arraylikes to them. I think in most cases this would already work fine if we just made the operators be autograd.numpy functions. E.g., if we implement an operator Exp, that would just call anp.exp on the inputs and that automatically works with scalars, numpy arrays, and autodiff?

This sounds good, but I'm more wondering what happens if the user tries to do (using your example):

```python
post_process = (np.abs(m1)**2 + np.abs(m2)**2) / 2
```

Would the np.abs() cause issues?

yaugenst-flex · 2024-07-25T11:55:30Z

@tylerflex

> ```python
> def f(sim_data):
>     return abs(sim_data['name'].abs.sel(...)**2)
>
> InverseDesign(postprocess=f)
> ```

Yeah, this seems pretty difficult to do, especially when going through a DataArray rather than regular numpy. I think that's a tradeoff we have to make at the moment: not allowing this syntax.

> would the np.abs() cause issues?

Yes, it would, I think. Well, actually in this particular case maybe not, because it might call `__abs__`? But in general, yes, that wouldn't work, because you have to use either operators that dispatch to the corresponding Python dunder methods or the higher-level functions that we define. Maybe this can be supported in the future, but in any case we would need our own Operator that implements those behaviors, so in a sense we would be writing a new autograd.
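NumPy does offer a hook for this: a class that defines `__array_ufunc__` (NumPy's ufunc override protocol, NEP 13) gets to handle calls like `np.abs(x)` and `np.exp(x)` itself. The sketch below, using hypothetical `Expr`/`Variable`/`Unary`/`Power` classes rather than the plugin's, turns an allowlisted set of ufuncs into graph nodes and lets NumPy raise `TypeError` for anything else. This only covers ufuncs; non-ufunc functions such as `np.sum` would need the separate `__array_function__` protocol.

```python
from dataclasses import dataclass
from typing import Any

import numpy as np


class Expr:
    """Expression node that intercepts selected NumPy ufuncs (NEP 13 protocol)."""

    # Allowlist of ufuncs that become graph nodes instead of being computed.
    UFUNCS = {np.absolute: "abs", np.exp: "exp", np.sin: "sin"}

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        if method != "__call__" or kwargs or len(inputs) != 1 or ufunc not in self.UFUNCS:
            return NotImplemented  # NumPy then raises TypeError
        return Unary(self.UFUNCS[ufunc], inputs[0])

    def __pow__(self, exponent):
        return Power(self, exponent)


@dataclass
class Variable(Expr):
    name: str

    def evaluate(self, env):
        return env[self.name]


@dataclass
class Unary(Expr):
    func: str  # name of a NumPy function from the allowlist above
    arg: Expr

    def evaluate(self, env):
        return getattr(np, self.func)(self.arg.evaluate(env))


@dataclass
class Power(Expr):
    base: Expr
    exponent: Any

    def evaluate(self, env):
        return self.base.evaluate(env) ** self.exponent


x = Variable("x")
g = np.abs(x) ** 2                # builds Power(Unary("abs", x), 2)
print(type(g).__name__)           # Power
print(g.evaluate({"x": 3 + 4j}))  # 25.0
```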

tylerflex · 2024-07-25T11:57:35Z

> in a sense we would be writing a new autograd.

Don't threaten me with a good time :D

yaugenst-flex · 2024-09-04T10:01:54Z

Closes #1944

yaugenst-flex · 2024-09-13T15:20:26Z

Changes from the Original Proposal:

- The module is now entirely within plugins.metrics, avoiding changes to core tidy3d components. If this proves useful, we can move it into components later.
- Everything is an Expression now, and Expressions recursively build up on their own (no more CompoundExpression).
- Introduced a functions module with functions that have no corresponding Python dunder method, such as Sin, Exp, ...
- Pretty printing: printing an expression object like f = a + b**2 * abs(c) prints the assembled equation.
- Variables, i.e. the ability to call expressions with multiple keyword arguments (metrics are variables too and can be called with, e.g., different simulation datas):
```
x = Variable(name="x")
y = Variable(name="y")
expr = x + y**2 - y
expr(x=1, y=2)
```
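A sketch of how such pretty printing could be done (hypothetical classes; the plugin's actual output format may differ, for example by omitting redundant parentheses): each node renders itself recursively, and nested binary operations are parenthesized so that precedence is explicit.

```python
from dataclasses import dataclass
from typing import Any


class Expr:
    def __add__(self, other):
        return Binary("+", self, lift(other))

    def __mul__(self, other):
        return Binary("*", self, lift(other))

    def __pow__(self, other):
        return Binary("**", self, lift(other))

    def __abs__(self):
        return Call("abs", self)


def lift(value):
    return value if isinstance(value, Expr) else Constant(value)


@dataclass
class Constant(Expr):
    value: Any

    def __str__(self):
        return repr(self.value)


@dataclass
class Variable(Expr):
    name: str

    def __str__(self):
        return self.name


@dataclass
class Binary(Expr):
    symbol: str
    left: Expr
    right: Expr

    def __str__(self):
        # Parenthesize nested binary operations so precedence stays explicit.
        def wrap(node):
            return f"({node})" if isinstance(node, Binary) else str(node)

        return f"{wrap(self.left)} {self.symbol} {wrap(self.right)}"


@dataclass
class Call(Expr):
    name: str
    arg: Expr

    def __str__(self):
        return f"{self.name}({self.arg})"


a, b, c = Variable("a"), Variable("b"), Variable("c")
f = a + b**2 * abs(c)
print(f)  # a + ((b ** 2) * abs(c))
```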

For a more detailed explanation refer to the readme.

@tylerflex @momchil-flex @e-g-melo

tylerflex · 2024-09-17T11:07:13Z

To @momchil-flex's point, I think it should be possible to just have a very thin wrapper around autograd.numpy, where basically the evaluate() function calls the corresponding anp. function. This could be done either manually or programmatically, if we want to getattr from anp. It should be safe, I think, since we control anp on the server.

yaugenst-flex · 2024-09-17T11:08:56Z

I don't think there is a need for any special handling of autograd, since functions are already implemented as autograd. functions. Everything else is differentiable out of the box.

tylerflex · 2024-09-17T11:09:21Z

I guess the challenge is how we allow the user to build these expressions by just calling anp. functions. For example, what we'd actually need is the ability to do something like:

```python
import metrics.numpy as np

np.exp(ModeAmps(...))
```

and have metrics.numpy.exp be a UnaryOperation?

yaugenst-flex · 2024-09-17T11:10:55Z

I think the user should just do:

```python
from autograd import value_and_grad
from metrics import ModeAmps
from metrics.functions import Exp

x = ModeAmps()
f = Exp(2 * x**2)

v, g = value_and_grad(f)(sim_data)
```

tylerflex · 2024-09-17T11:11:00Z

> I don't think there is a need for any special handling of autograd, since functions are already implemented as autograd. functions. everything else is differentiable out of the box

I guess what about functions that are not implemented? maybe the idea is to create a class (or programmatically create classes) that implement each of the autograd numpy operations?
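A sketch of the programmatic approach, assuming a hypothetical `UnaryFunction` base class: node classes are generated with `type()` from an explicit allowlist of NumPy function names, so a deserializer would only ever look up names in `FUNCTIONS` and never call `getattr(np, ...)` on user input. NumPy is used here; autograd.numpy could be substituted.

```python
import numpy as np

# Explicit allowlist: only these NumPy functions can ever become expression nodes.
ALLOWED = ("exp", "sin", "cos", "sqrt", "abs")


class Expr:
    def evaluate(self, env):
        raise NotImplementedError


class Variable(Expr):
    def __init__(self, name):
        self.name = name

    def evaluate(self, env):
        return env[self.name]


class UnaryFunction(Expr):
    np_name = None  # filled in on each generated subclass

    def __init__(self, arg):
        self.arg = arg

    def evaluate(self, env):
        value = self.arg.evaluate(env) if isinstance(self.arg, Expr) else self.arg
        return getattr(np, self.np_name)(value)


# Generate one subclass per allowed function, e.g. FUNCTIONS["Exp"] wraps np.exp.
FUNCTIONS = {
    name.capitalize(): type(name.capitalize(), (UnaryFunction,), {"np_name": name})
    for name in ALLOWED
}
Exp, Sqrt = FUNCTIONS["Exp"], FUNCTIONS["Sqrt"]

x = Variable("x")
f = Sqrt(Exp(x))
print(f.evaluate({"x": 0.0}))  # 1.0
```

The edge cases mentioned in the reply below (reductions, n-ary functions, keyword arguments like `axis`) are exactly what a one-argument template like this does not cover.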

yaugenst-flex · 2024-09-17T11:14:30Z

> I guess what about functions that are not implemented? maybe the idea is to create a class (or programmatically create classes) that implement each of the autograd numpy operations?

I see. Yeah, I guess that's possible, but it will probably run into a lot of edge cases. In particular, we currently don't really support n-ary operations, i.e., there is no concept of a list or an array; a Metric always returns a scalar.

Variables (and by extension Constants) do work with array types; it's just that there is currently no way to, for example, turn an array into a scalar. But we can add that. Do we need to add it right now, though? :D
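Adding such a reduction is mostly a matter of a node whose evaluation collapses an array. A minimal sketch with hypothetical `Variable`, `Mul`, and `Sum` classes, where `Sum` simply calls `np.sum` on its argument:

```python
from dataclasses import dataclass
from typing import Any

import numpy as np


class Expr:
    def __call__(self, **values):
        return self.evaluate(values)

    def __mul__(self, other):
        return Mul(self, other)


def ev(node, values):
    """Evaluate a child that may be an expression or a plain value."""
    return node.evaluate(values) if isinstance(node, Expr) else node


@dataclass
class Variable(Expr):
    name: str

    def evaluate(self, values):
        return values[self.name]


@dataclass
class Mul(Expr):
    left: Any
    right: Any

    def evaluate(self, values):
        return ev(self.left, values) * ev(self.right, values)


@dataclass
class Sum(Expr):
    """Reduction node: collapses an array-valued sub-expression to a scalar."""

    arg: Any

    def evaluate(self, values):
        return np.sum(ev(self.arg, values))


weights = Variable("weights")
powers = Variable("powers")
objective = Sum(weights * powers)  # e.g. a weighted sum over frequency points
print(objective(weights=np.array([1.0, 2.0]), powers=np.array([0.5, 0.25])))  # 1.0
```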

tylerflex · 2024-09-17T12:46:15Z

> a Metric always returns a scalar.

Hm, so basically if the user doesn't select out all of the data, it will error? (e.g. if two modes are summed over?)

yaugenst-flex · 2024-09-17T12:48:38Z

> Hm, so basically if the user doesn't select out all of the data, it will error? (e.g. if two modes are summed over?)

You sum over two modes by doing:

```python
a = ModePower(mode_index=1)
b = ModePower(mode_index=2)
expr = a + b
result = expr(sim_data)
```


yaugenst-flex · 2024-09-24T17:34:23Z

Closing as merged in #1973

yaugenst-flex requested review from m-bone, momchil-flex, tylerflex, daquinteroflex and e-g-melo July 22, 2024 15:34

yaugenst-flex force-pushed the yaugenst-flex/serializable-ops branch from 8c213a1 to 25edfe6 July 23, 2024 11:25

yaugenst-flex force-pushed the yaugenst-flex/serializable-ops branch from 25edfe6 to c3ca716 July 29, 2024 09:34

yaugenst-flex force-pushed the yaugenst-flex/serializable-ops branch from c3ca716 to 256d059 August 6, 2024 13:08

yaugenst-flex self-assigned this Aug 29, 2024

yaugenst-flex force-pushed the yaugenst-flex/serializable-ops branch from 256d059 to 7e814cd August 29, 2024 13:55

yaugenst-flex added the 2.8 label (will go into version 2.8.*) Aug 29, 2024

This was referenced Sep 4, 2024:

- invdes plugin web compatibility #1941 (Closed)
- Serializable functions/expressions #1944 (Closed)

yaugenst-flex linked an issue Sep 4, 2024 that may be closed by this pull request: Serializable functions/expressions #1944 (Closed)

yaugenst-flex force-pushed the yaugenst-flex/serializable-ops branch from 1586bc1 to 54903ba September 13, 2024 15:01

yaugenst-flex changed the base branch from pre/2.8 to develop September 13, 2024 15:01

yaugenst-flex marked this pull request as ready for review September 13, 2024 15:02

yaugenst-flex force-pushed the yaugenst-flex/serializable-ops branch from 54903ba to 4a50587 September 13, 2024 15:09

yaugenst-flex removed request for m-bone and momchil-flex September 13, 2024 15:23

tylerflex approved these changes Sep 17, 2024


yaugenst-flex added 4 commits September 17, 2024 14:50

- Introduce metrics plugin for constructing and serializing mathematical expressions (da65327)
- Add Variables, i.e the ability to call an expression with multiple (keyword) arguments (297d336)
- Slight docstring improvements (07aeda3)
- tyler comments (ff01229)

yaugenst-flex force-pushed the yaugenst-flex/serializable-ops branch from 3fb6c51 to ff01229 September 17, 2024 12:51

yaugenst-flex linked an issue Sep 18, 2024 that may be closed by this pull request: Integration of serializable expressions as post-processing option for invdes #1943 (Closed)

- Add initial support for expressions/metrics in invdes plugin (ce5f870)

yaugenst-flex force-pushed the yaugenst-flex/serializable-ops branch from 3c3ec11 to ce5f870 September 18, 2024 08:54

yaugenst-flex and others added 9 commits September 18, 2024 10:58

- Fix type hints (e9faba3)
- Add ParameterSpec classes to invdes optimizer for sampling initial parameters (74189be)
- Revert "Add ParameterSpec classes to invdes optimizer for sampling initial parameters" (3a7d211): This reverts commit 74189be.
- Add direction_multiplier to objective function value in aux_data (5bac848)
- Introduce InitializationSpec (afdfd5d)
- Introduce callback in optimizer and a num_steps argument in continue_run (61097df)
- Make serializable invdes backwards-compatible (62fd194)
- Add data and params to objective_fn aux_data (fbf217a)
- More docstrings, and add sqrt function (dfa6861)

yaugenst-flex mentioned this pull request Sep 23, 2024: invdes plugin GUI support #1973 (Merged)

yaugenst-flex closed this Sep 24, 2024

yaugenst-flex deleted the yaugenst-flex/serializable-ops branch November 7, 2024 11:15
