Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PySD v3.0 (Split parsing and building) #312

Merged
merged 112 commits into from
May 23, 2022

Conversation

enekomartinmartinez
Copy link
Collaborator

@enekomartinmartinez enekomartinmartinez commented Mar 3, 2022

The current version of PySD builds the models while parsing them with parsimonious. This has some inconveniences as we cannot build the model knowing its full structure. For example, when working with subscripted variables in the current version we cannot know which is the final subscripts of arithmetic expressions, which forces us to use some kind of magic functions such as 'rearrange', 'xrmerge' and '@subs'. Therefore, we cannot optimize the translated code and avoid xarray which is slowing the models with subscripts. Moreover, those improvements in the final model file (such as the dependencies dictionary) need to be done in three different areas (vensim parser, xmile parser and builder). Which force developers of one language to know also how work the parser of the other language. The main idea of splitting the parsing from the building is to enable several improvements and make PySD much easier to maintain. We would like to have something like:

Vensim model -> Intermediate Model (AbstractModel) -> PySD Model
Xmile model -> Intermediate Model (AbstractModel) -> PySD Model

this way we apply the builder over the Intermediate Model and all the improvements in the final model can be done by only change the builder from Intermediate to PySD. Moreover, this makes it easier to include new SD models as we only need to parse to AbstractModel and not include all the building, and enables the possibility in the future of including new output languages. See #302.

This PR changes completely how the models are translated. Vensim parsing and building are totally split and managed with objects. Now we have the following workflow:

  1. We use some classes to split Vensim model parts (VensimFile, VensimSection, VensimElement...), we build these classes at the same time we parse the model with parsimonious. The grammar is now in separated files and compiled only once (this makes faster the translation).
  2. We use a set of classes that go from AbstractModel to Abstract Syntax Tree (AST). This is a kind of metamodel that should be common for all origin and output languages.
  3. We have a Python builder that read Abstract classes and builds the model.

Some other improvements added:

  • Now we can manage better the subscript operations, which avoid us using magic functions as xrmerge or rearrange, this makes the subscripted models much faster.
  • We are close to a numpy backend in subscripted models, this will make the models run faster and may allow using code accelerators in the future. -> Some comments added along the code about how to make the migration (This could be a release 3.0.0)
  • Now the translation and building are case-insensitive and whitespace/underscore-insensitive, as Vensim and Xmile. -> test_reference_capitalization working
  • Macros defined in the middle of a model file were making the translation fail, as they split the main in two or more parts, this is now corrected -> test_macro_trailing_definition working
  • Subscripts and subscript ranges now can be used as variables, as Vensim does, identifying the main range and adding the subranges and subscripts the numeric value depending on the main range -> test_conditional_subscripts working
  • Some bugs corrected for models with more than one "stateful" expression in the same variable.
  • Migrated the Vensim integration test to pytest, the rtol of all the test was reduced from 0.05 to the default (1e-5) to ensure better results of the test. Only one test fails with rtol=1e-5 .
  • Now hardcoded lookups are included as an object, this allows having subscripted lookups and operating with them with subscripted arguments. -> new test working test_subscripted_lookups
  • The parenthesis are "removed" during the translation and added again during building taking into account the priority of arithmetics and logics operations, this cleans the final model file from unnecessary parenthesis. -> new test working test_arithmetics
  • Vensim EXCLUDE is now working for some specific cases, need to add a model that fails and make it work for all the cases.
  • Random functions in subscripted variables now return a random array instead of a float and fill an array with only one random value. Need to be tested.
  • Documentation is automatically generated when importing the model, this will keep the information of a variable nevertheless we change its value.
  • When we finish including Xmile will allow making improvements in the final model without needing to know how each parser works.
  • Include variables metadata (units, type, subscripts...) using a decorator instead of the docstring, this will be more solid.

Note that I keep the old vensim2py to ensure the backwards compatibility of the library.

TODOs before merging:

  • Document all the new functions in the scripts and add typing to them.
  • Include the new translation workflow in the PySD documentation (ReadTheDocs).
  • Migrate the Xmile translation to create AbstractModels as Vensim and share Python builder (This could take me some time because I am not familiarized with Xmile, I would appreciate a lot the help of Xmile users).
  • Include unit test (pytest based) to each of the new classes and methods.
  • Include more full Vensim integration tests to cover all the possibilities.
  • Include more full Xmile integration tests to cover all the possibilities, need help from Xmile users to create the models. Missing test for xmile integrations may cause future bugs #280
  • Remove old translation files (vensim2py, xmile2py, builder.py...).
  • Add new no-python files to MANIFEST (necessary for PyPI and conda-forge packagings).
  • Update version of PySD, just before merging to take into account the last version of PySD.
  • Solve bug of installing progressbar
  • Include support for multiple EXCEPT arguments and improve EXCEPT management (add a new test)
  • Improve the tracking of susbcripts for external and lookup objects (add a new test)
  • Unit test for split dataframe with 0-dim array.
  • Test for selecting submodel with active initial
  • Solve ReadTheDocs building bug with new ninja version

Solves #310 #309 #306 #302 #301 #296 #116 #253 #168 #154 #107 #289 #168 #203

Future backwards compatible ideas:

  • Include a logger with different levels to track and debug the translation and building of models.
  • Include other output languages.

Future no backwards compatible ideas:

  • Migrate to numpy array backend, keeping xarray for user interactions.
  • Use variables metadata for the internal methods (get_args, get_subscripts...)

@enekomartinmartinez
Copy link
Collaborator Author

enekomartinmartinez commented Mar 3, 2022

@julienmalard I think you used to work with large subscripted models, it would be nice if you can use this branch to translate any of your models and let me know if you experience any bug or improvement, any help will be welcome :)

@enekomartinmartinez
Copy link
Collaborator Author

@alexprey do you still use Xmile and Xmile translator or do you know someone who uses it or is interested? I may need some to split xmile translator from building and produce Xmile -> AbstractModel

@enekomartinmartinez
Copy link
Collaborator Author

@JamesPHoughton you can start reviewing this PR if you want, if you have any doubt let me know, this way I can make the documentation more complete.

@coveralls
Copy link

coveralls commented Mar 3, 2022

Coverage Status

Coverage increased (+0.04%) to 99.851% when pulling 06cb2bf on split-parsing-building into fc5237e on master.

@pep8speaks
Copy link

pep8speaks commented Mar 3, 2022

Hello @enekomartinmartinez! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 23:1: E402 module level import not at top of file

Line 68:15: E127 continuation line over-indented for visual indent

Line 1211:80: E501 line too long (87 > 79 characters)
Line 1215:80: E501 line too long (87 > 79 characters)
Line 1216:80: E501 line too long (98 > 79 characters)
Line 1222:80: E501 line too long (87 > 79 characters)
Line 1223:80: E501 line too long (98 > 79 characters)
Line 1225:80: E501 line too long (80 > 79 characters)
Line 1233:80: E501 line too long (87 > 79 characters)

Line 118:80: E501 line too long (81 > 79 characters)

Comment last updated at 2022-05-20 11:55:09 UTC

@rogersamso
Copy link
Contributor

@enekomartinmartinez there's a builder.py file in the translation folder. Is this intentional, or you forgot to remove it?

@enekomartinmartinez
Copy link
Collaborator Author

@enekomartinmartinez there's a builder.py file in the translation folder. Is this intentional, or you forgot to remove it?

@rogersamso , yes, this is intentional. It is the old builder, which was already placed there. It is still used for Xmile translation and for the old Vensim integration tests (which are used to ensure backwards compatibility).

Once we have this PR ready the main idea is to remove this file and old translators (vemsim2py.py, xmile2py...)

pysd/pysd.py Outdated Show resolved Hide resolved
pysd/translation/structures/abstract_model.py Outdated Show resolved Hide resolved
pysd/translation/vensim/parsing_expr/sketch.peg Outdated Show resolved Hide resolved
@@ -0,0 +1,276 @@
import re
from pathlib import Path
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All files inside the vensim folder start with vensim (e.g. vensim_element.py, vensim_file.py, etc.). This is a bit repetitive, I would just suppress the word vensim in front of all file names. I guess the same applies in the smile folder.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idem as previous, we need to see if we are comfortable with the current classes and once we are done find the best classification of files, including their names.

pysd/building/python/imports.py Outdated Show resolved Hide resolved
@enekomartinmartinez
Copy link
Collaborator Author

@JamesPHoughton The Xmile translator now works using the new builder via Abstract Model.
I built the new translator to make it possible to work with subscripts, one of the already available xmile models with subscripts is working (some others have ??? in the equations). I also included support for safediv (zidz, xidz).

Still need to document all the new scripts for both Vensim and Xmile translators.

Note that a lot of features that were already included have never been tested, for example, there is no full integration test for a lookup with "extrapolate" or discrete. The same happens to different types of delays, smooths...

@enekomartinmartinez
Copy link
Collaborator Author

@JamesPHoughton I added some new sections in the documentations for release notes and other pruposes, are you okay with all of these.
If you find it okay I will proceed to merge :)

@enekomartinmartinez enekomartinmartinez merged commit 64de740 into master May 23, 2022
@enekomartinmartinez enekomartinmartinez deleted the split-parsing-building branch May 23, 2022 08:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment