Automatic generation of model documentation #752

Closed
GiorgioBalestrieri opened this issue Nov 18, 2018 · 9 comments

@GiorgioBalestrieri

GiorgioBalestrieri commented Nov 18, 2018

(not sure how to flag this as a feature request)

It would be a great, great feature to be able to automatically generate model documentation. I'm thinking of something like the GAMS Model2Tex command, possibly through a Sphinx-like tool.

Clearly, the fact that Pyomo models can include arbitrary Python code will make it quite difficult to build a tool that always works, but I think a combination of docstrings, doc fields and user input should work.

I tend to think it would be somewhat easier to generate documentation from a ConcreteModel than an AbstractModel, but I would be happy to be corrected.

If anyone with a better knowledge of the inner workings of Pyomo would mind elaborating on some possible strategies and challenges in doing this, it would be helpful.

A not fully automated but very effective way is to include LaTeX formulations in docstrings and then rely on Sphinx to generate the documentation, as done for the (awesome) Calliope project. Not sure how this would work for Constraints defined through lambda functions or expressions though.
The mathematical formulation is here, and an example of a docstring is here.

@blnicho
Member

blnicho commented Nov 19, 2018

We do have a prototype for something like this in another project built on top of Pyomo. We may be able to generalize that functionality for pure Pyomo models to support what you're asking for. I don't have an estimate for when that might happen though.

@GiorgioBalestrieri
Author

GiorgioBalestrieri commented Feb 23, 2019

Ok, I gave it a try. It's still quite hacky: it basically reads the doc field of all components in a model and builds reStructuredText based on that.

See an example here
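As a rough illustration of the idea described above (not the linked implementation), the sketch below turns any objects that expose `name` and `doc` attributes into a reStructuredText page. The function name `components_to_rst` and the `SimpleNamespace` stand-ins are hypothetical; with a real Pyomo model one would presumably pass the model's components instead, which carry `name` and `doc` attributes.

```python
from types import SimpleNamespace


def components_to_rst(components, title="Model documentation"):
    """Build a reStructuredText page from objects exposing .name and .doc."""
    lines = [title, "=" * len(title), ""]
    for comp in components:
        # Fall back to a marker so missing documentation is easy to spot.
        doc = comp.doc or "(undocumented)"
        lines.append(comp.name)
        lines.append("-" * len(comp.name))
        lines.append("")
        lines.append(doc)
        lines.append("")
    return "\n".join(lines)


# Hypothetical stand-ins for model components (assumption, not Pyomo objects).
demo_components = [
    SimpleNamespace(name="demand", doc=r"Hourly demand :math:`d_t`."),
    SimpleNamespace(name="balance", doc=None),
]
rst_page = components_to_rst(demo_components)
print(rst_page)
```

Keeping the function duck-typed means it can be exercised without a solver or input data, which is convenient for testing the documentation step on its own.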

@fleimgruber

fleimgruber commented Feb 23, 2019

When I gave the thumbs up, I obviously misread the OP. I thought that since Pyomo Param and Var objects contain their indexing specifics, these could be derived automatically and the LaTeX could be built from them in a generic way. The same goes for the constraints, using the results of Pyomo's internal expression parser. I had a chat with @jsiirola during a conference where he mentioned that this could be pulled off.

@GiorgioBalestrieri I do not mean to hijack your feature request, but I thought the discussion might fit here. Would you be interested in that as well? Your approach is very reasonable if you want maximum flexibility, but it requires separate maintenance of the LaTeX "docstrings".

@GiorgioBalestrieri
Author

GiorgioBalestrieri commented Feb 23, 2019

@fleimgruber that sounds absolutely reasonable. I see using the components' docstrings as a step forward compared to maintaining separate documentation where one still has to write all the LaTeX code in a separate location. I think it makes it much easier to keep the mathematical formulation and the model implementation in sync, and it potentially adds clarity to the model itself by coupling a mathematical formulation to each component.

As a side note, I tried to include information from the object itself such as domain, index, default values (basically most things that are displayed by pprint).

I think it's far from a perfect approach, as it risks cluttering the code a bit and it requires some extra caution in the way the mathematical formulation is written (escaping backslashes etc.), plus of course it still relies on someone manually writing that mathematical formulation.
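To make the backslash caveat concrete, here is a small sketch of what goes wrong when a LaTeX fragment is written in an ordinary string literal, and how a raw string avoids it. The symbol `\theta_t` is only an example.

```python
# A normal string literal interprets "\t" as a tab, mangling "\theta".
plain = ":math:`\theta_t`"
# A raw string literal keeps every backslash exactly as written.
raw = r":math:`\theta_t`"

print(repr(plain))
print(repr(raw))
```

Using raw strings for every doc field that contains LaTeX is probably the least error-prone convention.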

If this could be generated automatically (without leveraging the docstrings, I mean) it would obviously save a lot of time and reduce the risk of errors, so I would be very interested in that. I'm honestly not an expert in the inner workings of Pyomo, and in particular in how and when expressions are parsed.

I see two main challenges with trying to get this done through the expression parser:

  • I don't believe rule functions are evaluated/parsed when AbstractModels are created, meaning we might only be able to do this with ConcreteModels, which in turn requires valid input data.
  • I'm not sure how the expression parser works, but I believe it would be very hard to somehow turn into LaTeX / mathematical formulation what happens inside a rule function, where we can have if statements or potentially any Python expression. BuildActions could make this even more complicated.

Again, I have a fairly limited understanding of the inner workings of Pyomo, and I would be thrilled to see something more automated becoming available. I'll definitely leave the Issue open for now, and wait for @jsiirola to chime in.

In the meantime, feel free to have a look at how the docstring-based approach works and recommend any changes. It's extremely rudimentary for now and very biased towards the way I tend to use Pyomo, so there is certainly much room for improvement.

@jsiirola
Member

This is an often-requested feature (I was surprised that there wasn't already an issue for it), and something that I think all of us would want (but no one has had time to dig into). I gave a quick look over the docstring approach, and there are some really neat ideas in there. Thank you!

To answer some of @GiorgioBalestrieri's comments:

  • Correct: AbstractModels are "abstract" exactly because none of the rules have been fired. The mathematical formulation is "buried" in the code object that defines the rule function. While I suspect that it is possible to parse the Python AST for the function and generate the LaTeX, that feels like it would be a rather challenging process.
  • The challenge with ConcreteModels is that we see the expression after the rule has been fired, so several things have happened.
    • All loops have been unrolled (e.g., the sum() function has iterated over its terms and returned a _SumExpression with a list of all the terms).
    • All immutable Params have been replaced with actual numbers (so we don't see the original symbolic objects in the expression tree)
    • Pyomo has gone ahead and performed several expression simplifications
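To give a feel for the AST route mentioned in the first point, here is a minimal sketch that uses Python's standard `ast` module to translate a single rule-like expression into LaTeX. It is a toy, not a Pyomo feature: the function names, the choice to drop the `m.` prefix, and the example expression are all assumptions, and it needs Python 3.9+ for the `Subscript.slice` layout. It deliberately ignores loops, `sum()` and conditionals, which is exactly where the real difficulty lies.

```python
import ast


def _wrap(node):
    """Parenthesise additive sub-expressions used inside a product or difference."""
    text = to_latex(node)
    if isinstance(node, ast.BinOp) and isinstance(node.op, (ast.Add, ast.Sub)):
        return f"\\left({text}\\right)"
    return text


def to_latex(node):
    """Translate a small subset of Python expression syntax to LaTeX."""
    if isinstance(node, ast.Expression):
        return to_latex(node.body)
    if isinstance(node, ast.Constant):
        return str(node.value)
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        # m.x -> x (assumption: the model prefix carries no meaning)
        return node.attr
    if isinstance(node, ast.Subscript):
        # m.x[t] -> x_{t}
        return f"{to_latex(node.value)}_{{{to_latex(node.slice)}}}"
    if isinstance(node, ast.BinOp):
        if isinstance(node.op, ast.Add):
            return f"{to_latex(node.left)} + {to_latex(node.right)}"
        if isinstance(node.op, ast.Sub):
            return f"{to_latex(node.left)} - {_wrap(node.right)}"
        if isinstance(node.op, ast.Mult):
            return f"{_wrap(node.left)} \\cdot {_wrap(node.right)}"
    if isinstance(node, ast.Compare) and len(node.ops) == 1:
        symbols = {ast.LtE: r"\leq", ast.GtE: r"\geq", ast.Eq: "="}
        symbol = symbols.get(type(node.ops[0]))
        if symbol is not None:
            right = to_latex(node.comparators[0])
            return f"{to_latex(node.left)} {symbol} {right}"
    # Anything else (calls, conditionals, division, ...) is out of scope here.
    raise NotImplementedError(ast.dump(node))


tree = ast.parse("m.c[t] * m.x[t] + m.y[t] <= m.cap", mode="eval")
latex = to_latex(tree)
print(latex)
```

Because this works on source text rather than on fired rules, it would not need input data, but every Python construct a modeler might use has to be handled explicitly.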

Now, if you can live with the form of the expression that exists after the rule has fired, then you can convert the expression into a form more amenable to documentation in a relatively straightforward way by walking over the expression tree. The project that @blnicho references has had an initial public release, and you can see their approach here. Basically, they handle it by converting the expression to a sympy expression and then using sympy's LaTeX generator. This works particularly well for that project as it generally does not have expressions with large sums in it.

For more OR-like models (LP/MIP), where large sums are the norm, I think there is an 80+% solution that we could put together. This would rely on undocumented Pyomo features and some pretty fundamental (i.e., low-level) changes to Pyomo to pull off, though. The short "design summary" is:

  • Leverage TemplateExpression to hijack the rule function to return expressions that contain _GetItem nodes (i.e. represent m.x[i] as _GetItem(m.x, i) in the expression tree). This works well for simple expressions, and we use it extensively within the Pyomo.DAE simulation interface.
  • The problem with template expressions is that they won't prevent sum() functions in the rule from being unrolled. To do that, we would need to set a global flag in Pyomo to move into "documentation mode". In this mode, Set.__iter__ would need to yield a single IndexTemplate instead of the values in the Set. This will prevent the sum() from being expanded, but doesn't actually let us know that there was a summation. We would have to infer that (not 100% reliable) by looking for "0 + {pyomo expression containing _GetItem nodes}".
  • The global flag would have to do a couple other things, too:
    • prevent all expression simplification in the expression system,
    • cause immutable Params to return symbolic expressions and not POD data (i.e., they would need to behave as if they were mutable)
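The design summary above can be mimicked with a toy model in plain Python. None of the classes below are Pyomo classes; `ToySet`, `ToyVar`, `GetItem`, `IndexTemplate` and the per-set `doc_mode` flag (standing in for the global flag) are hypothetical names chosen for illustration. The sketch shows both effects: a normal set lets `sum()` unroll, while a set in "documentation mode" yields one symbolic index, leaving only the `0 + term` signature to guess the summation from.

```python
class Node:
    """Base class: overloaded + builds a tree instead of computing a number."""

    def __add__(self, other):
        return Add(self, other)

    def __radd__(self, other):
        # sum() starts from the integer 0, so its first step is 0 + term.
        return Add(other, self)


class Add(Node):
    def __init__(self, left, right):
        self.left, self.right = left, right


class IndexTemplate(Node):
    def __init__(self, index_set):
        self.index_set = index_set


class GetItem(Node):
    def __init__(self, var, index):
        self.var, self.index = var, index


class ToyVar:
    def __init__(self, name):
        self.name = name

    def __getitem__(self, index):
        return GetItem(self, index)


class ToySet:
    def __init__(self, name, values, doc_mode=False):
        self.name, self.values, self.doc_mode = name, values, doc_mode

    def __iter__(self):
        if self.doc_mode:
            # "Documentation mode": one symbolic index instead of the data.
            yield IndexTemplate(self)
        else:
            yield from self.values


def render(expr):
    """Render a toy tree, guessing a summation from the 0 + term signature."""
    if isinstance(expr, GetItem):
        symbolic = isinstance(expr.index, IndexTemplate)
        label = "i" if symbolic else str(expr.index)
        return f"{expr.var.name}_{{{label}}}"
    if isinstance(expr, Add):
        right = expr.right
        templated = isinstance(right, GetItem) and isinstance(right.index, IndexTemplate)
        if expr.left == 0 and templated:
            set_name = right.index.index_set.name
            return f"\\sum_{{i \\in {set_name}}} {render(right)}"
        left = render(expr.left) if isinstance(expr.left, Node) else str(expr.left)
        return f"{left} + {render(right)}"
    raise TypeError(f"cannot render {expr!r}")


x = ToyVar("x")
unrolled = render(sum(x[i] for i in ToySet("I", [1, 2, 3])))
templated = render(sum(x[i] for i in ToySet("I", [1, 2, 3], doc_mode=True)))
print(unrolled)
print(templated)
```

Note that the summation is recovered purely from the shape of the tree, so a rule that literally writes `0 + m.x[i]` would be misread in the same way.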

Now for the problems...

  • if a rule contains any "if" logic, the TemplateExpression will fail (on purpose - we can't interrogate conditionals through operator overloading). This applies to if's in the main body of the rule as well as filters on generators (like in sum(m.x[i] for i in m.I if i != ...)).
  • guessing sum() based on its signature (implementing "0 + {first term}") is fragile.
  • I am sure there are other situations I haven't thought of where things will break.
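The first problem in this list can be reproduced with a few lines of plain Python. The classes `SymbolicIndex` and `SymbolicRelation` are hypothetical stand-ins, not Pyomo objects; the point is only that Python itself asks a condition for its truth value, so a symbolic comparison has nothing sensible to return and the safest choice is to raise.

```python
class SymbolicRelation:
    """Result of comparing a symbolic index; it has no truth value yet."""

    def __init__(self, text):
        self.text = text

    def __bool__(self):
        # Python calls __bool__ for `if` statements and generator filters.
        raise TypeError(f"cannot evaluate symbolic condition: {self.text}")


class SymbolicIndex:
    def __init__(self, name):
        self.name = name

    def __ne__(self, other):
        return SymbolicRelation(f"{self.name} != {other}")


def rule_with_filter(indices):
    # Mirrors a filtered generator inside a rule: keep every index except 1.
    return [i for i in indices if i != 1]


concrete = rule_with_filter([1, 2, 3])
try:
    rule_with_filter([SymbolicIndex("i")])
    error = None
except TypeError as exc:
    error = str(exc)
print(concrete, error)
```

With concrete data the filter works as usual; with a symbolic index the comparison can be recorded, but the branch decision cannot be made, which is why failing loudly is preferable to silently picking a branch.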

This is where something like the docstring approach will fill in nicely: for anything where the automated approach fails, or generates the "wrong" documentation, then the modeler can override it by explicitly providing the documentation through the docstring. I also like the convention of using :math:`x` for providing the "LaTeX name" for the Pyomo object through its doc field (as my models always avoid the use of single-letter variables that papers encourage).
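As a sketch of how that naming convention could be consumed, the snippet below pulls a :math: payload out of a doc string with a regular expression and falls back to the Python name otherwise. Treating the first :math: role as "the" LaTeX name is an assumption, and `latex_name` is a hypothetical helper.

```python
import re

# Matches the Sphinx-style inline math role, e.g. :math:`s_t`.
MATH_ROLE = re.compile(r":math:`([^`]+)`")


def latex_name(doc, fallback):
    """Return the first :math:`...` payload in doc, else the fallback name."""
    match = MATH_ROLE.search(doc or "")
    return match.group(1) if match else fallback


print(latex_name(r"Storage level :math:`s_t` in MWh", "storage_level"))
print(latex_name("No symbol given", "storage_level"))
```

This keeps descriptive names in the code while letting generated documentation use the short symbols a paper would.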

@GiorgioBalestrieri
Author

@jsiirola thanks for all the information. As mentioned above, I think the docstring-based approach is far from perfect, but it offers an incremental improvement over maintaining a model and the related docs in two separate files.

Anything more automated would be awesome, but I don't think I am familiar enough with Pyomo's core or have enough experience with different ways people formulate Pyomo models to come up with something general enough.

If any effort goes in this direction, I'd be glad to test it and provide some feedback.

As a side note, I think that quite often the best way to express the mathematical formulation for a constraint, expression or objective is different from the way one would formulate it in terms of code, so including the formulation in the docstring might improve the clarity of the model documentation.

@l-kotzur

Would it help to combine the expression template system with latexify?

@mrmundt
Contributor

mrmundt commented Aug 8, 2023

@codykarcher - this issue!

@mrmundt
Contributor

mrmundt commented Jan 3, 2024

There is an initial implementation of a LaTeX printer now in pyomo.contrib. The author has another issue (#3048) tracking requested changes, bugs, etc.

mrmundt closed this as completed Jan 3, 2024