Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add stubs and docstrings for python bindings #196

Closed
chadrik opened this issue Apr 8, 2017 · 14 comments
Closed

Add stubs and docstrings for python bindings #196

chadrik opened this issue Apr 8, 2017 · 14 comments

Comments

@chadrik
Copy link
Contributor

chadrik commented Apr 8, 2017

There have been some big strides made recently with static type analysis in python, and I have a perhaps unhealthy obsession with it (hopefully to everyone's benefit). I'm wrapping up work on a script that creates stub files which can be used by an IDE to provide code hints and warnings for the pxr python package. I find to be indispensable, especially when working with a new API. I'd like to discuss the possibility of adding the execution of this script as a build option.

Python stub files are normal python files with whose class and function definitions contain pass statements instead of functionality. There are a handful of extensions used for stub files, depending on the IDE. The ones I'm aware of are .py, .pi, .pypredef. The IDE inspects the stub files instead of the real modules, which is particularly useful when the real modules are compiled extensions and therefore cannot be statically analyzed, such as with USD.

Recently there's been a big push in the python community to add static type hints to python, described in PEP484, recently released in the form of the typing module for python 2&3 and actual syntax support in versions 3.5+. Python itself doesn't do anything with the type hints: the point of them is to be available to third-party static type checking tools. The gold standard of these is mypy, but PyCharm also has very good support as of version 2017.1.

PEP484 describes its own type of stub file, saved with a pyi extension. The pyi file sits next to its .py equivalent and in addition to providing membership hints an docstrings like normal stubs, it also adds type hints.

Long story short, my stub generator can create either kind of stub file.

The script works like this:

  • generate xml data using Doxygen
  • parse the xml files, extract the arguments and typing info, and run it through a set of rules to convert it to our best prediction of the corresponding python types and object names.
  • import the built pxr module, inspect the contents, looking up argument and type data in the structure from the previous step, and write out .pyi stub files for each __init__.py file.

Note: we rely on inspection to drive stub generation rather than creating stubs directly from Doxygen, because we don't have perfect knowledge of how C++ classes are wrapped in python, and it's better to have missing type info than missing objects/members.

Writing the rules has been challenging and edifying. There's no documentation on the conventions used when wrapping C++ code, so uncovering them has taken some digging (and there's still more to do). My hope is that this tool will ultimately serve as fairly clear documentation of these conventions. Moreover, I think it will reveal where the conventions could be improved or more successfully adhered to, and create a positive feedback loop for developers writing python wrappers.

Here are the rules I've got so far:

# c++ to python type substitutions
TYPE_MAP = [
    (r'\bVtArray<\s*SdfAssetPath\s*>', 'SdfAssetPathArray'),
    (r'\bstd::string\b', 'str'),
    (r'\bsize_t\b', 'int'),
    (r'\bchar\b', 'str'),
    (r'\bstd::function<(.+)\((.*)\)>', r'Callable[[\2], \1]'),
    (r'\bstd::vector\b', 'List'),
    (r'\bstd::pair\b', 'Tuple'),
    (r'\bstd::set\b', 'List'),
    (r'\bdouble\b', 'float'),
    (r'\bboost::python::', ''),
    (r'\bvoid\b', 'None'),
    (r'\b' + IDENTIFIER + r'Vector\b', r'List[\1]'),
    (r'\bTfToken\b', 'str'),
    (r'\bVtDictionary\b', 'dict'),
    (r'\bMetadataValueMap\b', 'dict'),
    # strip suffixes
    (r'RefPtr\b', ''),
    (r'Ptr\b', ''),
    (r'ConstHandle\b', ''),
    (r'Const\b', ''),
    (r'Handle\b', ''),
    (r'\bSdfLayerHandleSet\b', 'List[pxr.Sdf.Layer]'),
    # Sdf mapping types
    (r'\bSdfDictionaryProxy\b', 'pxr.Sdf.MapEditProxy_VtDictionary'),
    (r'\bSdfReferencesProxy\b', 'pxr.Sdf.ReferenceTypePolicy'),
    (r'\bSdfSubLayerProxy\b', 'pxr.Sdf.ListProxy_SdfSubLayerTypePolicy'),
    # pathKey
    (r'\bSdfInheritsProxy\b', 'pxr.Sdf.ListEditorProxy_SdfPathKeyPolicy'),
    (r'\bSdfSpecializesProxy\b', 'pxr.Sdf.ListEditorProxy_SdfPathKeyPolicy'),
    # nameTokenKey
    (r'\bSdfNameOrderProxy\b', 'pxr.Sdf.ListProxy_SdfNameTokenKeyPolicy'),
    (r'\bSdfNameChildrenOrderProxy\b', 'pxr.Sdf.ListProxy_SdfNameTokenKeyPolicy'),
    (r'\bSdfPropertyOrderProxy\b', 'pxr.Sdf.ListProxy_SdfNameTokenKeyPolicy'),
    # nameKey
    (r'\bSdfVariantSetNamesProxy\b', 'pxr.Sdf.ListEditorProxy_SdfNameKeyPolicy'),
]

# python types which can be substituted for other types when used as arguments
IMPLICIT_UNION_TYPES = {
    'pxr.Sdf.Path': ['str'],
    'pxr.Sdf.AssetPath': ['str'],
    'pxr.Usd.EditTarget': ['pxr.Sdf.Path'],
}	

There are a handful more of the Sdf mapping types which I've omitted because they have some truly gnarly names like MapEditProxy_map_string__string__less_string___allocator_pair_string_const__string_____. It would be nice if we could come up with a simpler set of names and rules for some of these, or possibly make it easier to guess the python name from the C++ type, but in the meantime I can simply document them.

Additionally, I've noticed that Doxygen is not creating parameter info for some classes, such as the UsdStageCacheContext constructor.

Lastly, Alex Mohr mentioned that at Pixar there are docstrings stored in __DOC modules which are bound to classes and functions at runtime, but that this script is missing from the repo. We could wrap the generation of those files into the same script that generates stubs, since they both need to parse Doxgyen data.

And that's all for now. Thanks for reading.

@jtran56
Copy link

jtran56 commented Apr 10, 2017

Filed as internal issue #145214.

@asluk
Copy link
Collaborator

asluk commented Mar 7, 2019

Reviving this thread. I seem to remember there being Doxygen pages for the Python bindings at some point, but don't remember if or why we internally at Pixar decided not to maintain or publish them?

@jfeez91
Copy link

jfeez91 commented Jun 10, 2020

Reviving this again. Having code hints is always useful, but it would be particularly useful to have some python docs, especially for Sdf where the python bindings don't match up to the c++ api docs. I've spent a lot of time running dir(SdfClass) to figure out the differences which gets fairly tedious

@igor-elovikov
Copy link

Also reviving this one. Apparently Omniverse at least created stubs for usd shipped with their installs.

As far as I can tell it's pretty straightforward with stubgen included in mypy. The only thing is required is pointing stubgen to correct .pyd file associated with the module (_sdf.pyd for pxr.Sdf for example)

It doesn't resolve the docstring issue, only autocomplete, but even this one improves workflow tremendously:
usd_autocomplete

I've also attached stubs to show how exactly it looks like
pxr.zip

@hubbas
Copy link

hubbas commented Jun 19, 2022

Reviving this thread too! Would really love to have proper stubs for this too, and I'm more than happy to help out if needed (when I have some time). For now I'm doing what @igor-elovikov wrote about stubgen, and that seems to work fine. I even created a small script to generate the stubs and copy them into site-package generate_usd_stubs.py.

@chadrik
Copy link
Contributor Author

chadrik commented Jun 20, 2022

Since I created this ticket I've gained a lot more experience creating stubs using mypy's stubgen tool, including for pymel, nuke, and katana, and we also maintain stubs internally for PySide and houdini, which were created using other means.

Ultimately, there's not much stopping me (or anyone else) from using mypy's stubgen to create a pxr-stubs package and upload it to pypi, so that people could pip install it (there's a whole pep for how to distribute stub-only packages), but it would not include any type information for arguments and return values: just the names of classes and methods.

But I don't want to give up on creating a high quality stub package with type information, which will be much more useful for auto-complete and static type analysis. However, I'm stuck on some USD build issue. Read on if you'd like to help out:

In order to automatically create stubs with high quality type info, the best way is to embed the signature info into each function docstring, and then use that info during stub generation. I spent some time trying to get USD+boost to include docstrings in its python bindings, but I could not get it to work.

There’s a global option that can be enabled to turn on signature generation: https://www.boost.org/doc/libs/1_54_0/libs/python/doc/v2/docstring_options.html

The problem is that even with the simplest example taken directly from the boost example code I could not force the signatures to be added to docstrings when the code was built within the USD project, but I could from within a canonical standalone C++ project. I think it has to do with the tf code generation used instead of the standard boost macros.

Being perfectly honest, figuring out how to enable the signatures within USD may take more time than I have, but for someone who knows the tf system it may be pretty trivial (hint, hint). So if someone could get this working and expose a CMake flag to enable it, I promise that in return I would solve this stub generation problem for the community, since this is a topic that's come up quite a few times.

@OlegAlexander
Copy link

OlegAlexander commented Jun 13, 2023

Hello fellow stub lovers! I'd like to reopen this issue. While I understand that this issue has been marked closed, I wouldn't consider it closed until:

  • usd-core ships with Python type stubs that pass mypy --strict.
  • Or someone publishes a well-maintained third-party stub library on PyPi that passes mypy --strict.

What I've tried

I have tried the solutions posted here by @igor-elovikov and @hubbas and these are certainly a great start. However, those stubs provide only basic auto-completion and don't pass mypy --strict. For example, in the stubs provided by @igor-elovikov, the type signature for the Usd.Stage.ExportToString() method is:

    @staticmethod
    def ExportToString(*args, **kwargs) -> None: ...

However, the real signature should be something like:

    @staticmethod
    def ExportToString(addSourceFileComment: bool = True) -> str: ...

Notice how the Python signature doesn't exactly match the one in the Usd API docs, where this method returns bool and takes the result string as the first argument.

Many methods in the Usd API documentation are more complex and heavily overloaded. But it's possible to define stubs for overloaded methods with the @overload decorator.

In another example, using the stubs generated by @hubbas's tool, the method Usd.Stage.CreateInMemory() has the following signature:

    @classmethod
    def CreateInMemory(cls, *args, **kwargs) -> Any: ...

But in reality, this method should return a Usd.Stage and doesn't have a self/cls as the first argument.

An appeal to reason

What follows is only my opinion based on my experience working as a technical artist in the VFX/games/animation industry. Much of what I'm about to say is based on my own recollections, inferences, and feelings and not much on facts, so please take it with a grain of salt. Also, I'm going to make some broad generalizations about C++ and Python programmers and for that I apologize in advance. So with that out of the way, here goes.

When I first started working as a technical artist in California over 15 years ago, there was a palpable cultural divide between C++ programmers and Python programmers. It could be summarized by the adage that "scripting is not programming." As a Python programmer at the time, the view that the tools I was writing were "toys" certainly annoyed me. But years later, after I learned several programming languages besides Python, I understood that the C++ programmers who held this view were, by and large, correct. Python programmers and C++ programmers were not even in the same ballpark in terms of complexity and rigor. Python programmers liked to move fast and break things, while C++ programmers moved deliberately without breaking much. Python programmers thought static types were a waste of time. Why would I need to define the data model of a json file before opening it, when I could just say data = json.load(open(json_file)) and move on with my day? Unit tests just slow me down, etc. Maybe some Python programmers feel this way even today.

But things have changed.

Over the past 9 years or so, Python has "grown up". Not a full-grown adult with a mortgage, like C++, but certainly old enough to drink. The modern Python programmer wants static types, writes unit tests, and needs an IDE. For example, the 2021 Python Developers Survey states that 74% of Python devs use optional type hinting. The modern Python programmer follows "software engineering best practices" and is used to great tooling. And by the way, all this tooling doesn't yet ship with Python, so the modern Python programmer assembles their own custom toolchain like a sweet kit car. In other words, Python programmers are closer in their mentality to C++ programmers today than they were 15 years ago.

Now let's get back to the type stubs. As it stands, usd-core doesn't ship with type stubs. Nor does it ship with proper Python API documentation. Python developers need to read the C++ API documentation and mentally translate it to Python. The type stubs made available by NVidia provide only very basic auto-completion, but don't actually pass mypy in --strict mode. This means that Python developers who care about type safety (in the type-theoretic sense) will simply write from pxr import Gf, Sdf, Usd, UsdGeom # type: ignore[import], hence turning off type checking for the pxr library.

So while it's likely that Python programmers in 2008 would have been very happy with the usd-core library, Python programmers in 2023 aren't happy and expect a better developer experience. This is why I feel that this issue should be reopened. I'm more than willing to test any stubs that Pixar provides!

I hope I didn't offend too many people. Although I've made some generalizations, I believe the overall trend of a "grown up" Python is true. I'm CCing @spiffmon, who was very helpful with one of my other issues. Thank you very much for reading.

@chadrik
Copy link
Contributor Author

chadrik commented Jun 13, 2023

Hi @OlegAlexander,
I'm working on a solution to this.

In an ideal world, boost python would generate the stubs, since it has all the necessary info (here's the issue for that if you want to add your vote of support: boostorg/python#277 (comment)). For now, I think the best path forward is to build USD with a flag that makes boost python generate signatures in docstrings, which are somewhat passable as a starting point for stub generation.

For example, these are the signatures that it generates for the methods you mentioned:

python -c "from pxr import Usd;print(Usd.Stage.ExportToString.__doc__)"
ExportToString( (Stage)arg1 [, (bool)addSourceFileComment=True]) -> str
python -c "from pxr import Usd;print(Usd.Stage.CreateInMemory.__doc__)"

CreateInMemory([  (object)load=Usd.Stage.LoadAll]) -> Stage

CreateInMemory( (object)identifier [, (object)load=Usd.Stage.LoadAll]) -> Stage

CreateInMemory( (object)identifier, (ResolverContext)pathResolverContext [, (object)load=Usd.Stage.LoadAll]) -> Stage

CreateInMemory( (object)identifier, (Layer)sessionLayer [, (object)load=Usd.Stage.LoadAll]) -> Stage

CreateInMemory( (object)identifier, (Layer)sessionLayer, (ResolverContext)pathResolverContext [, (object)load=Usd.Stage.LoadAll]) -> Stage

Ignoring the fact that they're not true python signatures, you can see that they reflect some traits which are hard (or impossible?) to determine from parsing the doxygen docs, such as overloaded methods. However, some of the types are missing (notice object above).

I'm going to start writing a stub generator based on these signatures in the next few days. I'll post back here when I've made some progress.

Building USD with the signatures enabled is pretty straightforward. Here's the diff:

diff --git a/.gitignore b/.gitignore
index af6e02bde..55bd2fb82 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1,4 +1,4 @@
 .p4*
 .DS_Store
 .AppleDouble
-
+.*
\ No newline at end of file
diff --git a/cmake/defaults/CXXDefaults.cmake b/cmake/defaults/CXXDefaults.cmake
index e970dfbe2..68a531110 100644
--- a/cmake/defaults/CXXDefaults.cmake
+++ b/cmake/defaults/CXXDefaults.cmake
@@ -42,7 +42,7 @@ _add_define(GL_GLEXT_PROTOTYPES)
 _add_define(GLX_GLXEXT_PROTOTYPES)
 
 # Python bindings for tf require this define.
-_add_define(BOOST_PYTHON_NO_PY_SIGNATURES)
+# _add_define(BOOST_PYTHON_NO_PY_SIGNATURES)
 
 if(CMAKE_BUILD_TYPE STREQUAL "Debug")
     _add_define(BUILD_OPTLEVEL_DEV)
diff --git a/pxr/base/tf/pyModule.cpp b/pxr/base/tf/pyModule.cpp
index 223cfa7b3..a0b1fcf40 100644
--- a/pxr/base/tf/pyModule.cpp
+++ b/pxr/base/tf/pyModule.cpp
@@ -242,9 +242,12 @@ public:
 
     inline object ReplaceFunctionOnOwner(char const *name, object owner, object fn)
     {
+        object fnDocstring = fn.attr("__doc__");
         object newFn = DecorateForErrorHandling(name, owner, fn);
         PyObject_DelAttrString(owner.ptr(), name);
         objects::function::add_to_namespace(owner, name, newFn);
+        // add_to_namespace removes docstrings, so we restore them here
+        newFn.attr("__doc__") = fnDocstring;
         return newFn;
     }
     
@@ -429,7 +432,8 @@ void Tf_PyInitWrapModule(
 
     // Disable docstring auto signatures.
     boost::python::docstring_options docOpts(true /*show user-defined*/,
-                                             false /*show signatures*/);
+                                             true /*show py signatures*/,
+                                             false /*show cpp signatures*/);
 
     // Do the wrapping.
     wrapModule();

@igor-elovikov
Copy link

From my experience pybind11 bindings are much better in that regard. mypy method is working pretty much out of the box with correct support of overloads, std::optional, std::variant, most of the containers etc.

I guess when usd become "deboostified" and c++17 complaint this could me much easier. Not sure though what exactly the future framework for bindings but pybind seems more robust today.

@OlegAlexander
Copy link

OlegAlexander commented Jun 13, 2023

@chadrik, thank you so much for working on this! Your approach seems very promising.

One thing to keep in mind is how the bool operator is overloaded. I'm not even sure how to express this with type hints. Maybe @overload __bool__?

def prim_exists(stage: Usd.Stage, prim_path: str) -> bool:
    """Does the prim exist on the stage?
    The bool operator is overloaded so 'if stage.GetPrimAtPath(primPath)' returns a bool."""
    return True if stage.GetPrimAtPath(prim_path) else False

I look forward to testing your stubs as soon as they're ready.

@chadrik
Copy link
Contributor Author

chadrik commented Jun 28, 2023

Hi all,
I've released my stubs for USD here: https://pypi.org/project/types-usd/

Just pip install types-usd.

This is a first draft that I'll be iterating on a lot in the next few weeks, but they're good enough to type check Luma's entire USD code-base using mypy without errors so they're already very useful. I'll publish these more widely once people on this thread have had a chance to take a poke around.

To make these I combined a handful of tools and data sources:

  • the doxygenlib parser from USD to parse the xml docs
  • a utility from my old stubgenusd project to transform c++ types to python types, which also parses boost C++ code looking for types that are implicitly convertible
  • a build of USD with boost python signatures enabled to get hints about what boost thinks the types are
  • a docstring parser for boost's special brand of python signature
  • a heavily modified version mypy stubgen as a framework for inspecting python code and generating stubs

Ultimately, the most correct way to solve this would be to contribute better signature generation to boost-python itself, but this is getting good results and has much more immediate payoff.


One thing to keep in mind is how the bool operator is overloaded. I'm not even sure how to express this with type hints. Maybe @overload bool?

@OlegAlexander In your example, stage.GetPrimAtPath() returns Usd.Prim which implements __bool__(), so there's nothing too special here. Everything works correctly with the stubs. It can also be expressed as bool(stage.GetPrimAtPath(prim_path))

@OlegAlexander
Copy link

Hi @chadrik,

Thank you so much for creating these stubs! I've had a chance to test them with mypy --strict. Everything is working, however, I did have to change the way I'm importing the modules. For example, I was originally writing imports like this:

from pxr import Sdf, Usd, UsdGeom, UsdShade  # type: ignore[import]

But with the stubs Sdf, Usd, UsdGeom, UsdShade are all Any, even after removing the type: ignore. So now I have to write the imports like this:

from pxr.UsdGeom import SetStageMetersPerUnit, SetStageUpAxis, Xformable, Mesh
from pxr.Usd import Stage, Prim
from pxr.Sdf import Path as SdfPath  # Differentiate from pathlib.Path
from pxr.UsdShade import Material, MaterialBindingAPI

This isn't a huge deal, but if there's a simple way to make your stubs work with both import styles, that would be ideal for existing codebases.

One other note is about the mypy.ini file config recommended in your README. I'm using a pyproject.toml with poetry, so I believe this line from mypy.ini:

[mypy-pxr.*]
ignore_errors = true

becomes:

[[tool.mypy.overrides]]
module = "pxr.*"
ignore_errors = true

in the pyproject.toml file. I've added this config to my pyproject.toml, but I'm not seeing any errors either way. Please try it on your end with a pyproject.toml and update the README, if needed.

These stubs are total game changer for working with USD. Thank you!

@chadrik
Copy link
Contributor Author

chadrik commented Jun 29, 2023

@OlegAlexander that's interesting, because I did not see this issue on my end using mypy 1.3.0 (though I'm doing more testing), but I did get a report of a similar problem with pylance/pyright. I updated the stubs to 23.5.1, can you try that out? If it doesn't work, open up a ticket at https://github.com/LumaPictures/cg-stubs/issues.

@OlegAlexander
Copy link

I'm on mypy 1.4.1. I've updated the stubs to 23.5.1. This is working now:

from pxr import Sdf, Usd, UsdGeom, UsdShade

Thanks!

AdamFelt pushed a commit to autodesk-forks/USD that referenced this issue Apr 16, 2024
…/feature/pipeline_adsk

Refine Testing Pipeline
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants