Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#4846 - Allow importing Python package with C-extension e.g. numpy in the labs CLI #4868

Merged
merged 11 commits into from
Jul 6, 2023

Conversation

jmarrec
Copy link
Collaborator

@jmarrec jmarrec commented Apr 25, 2023

Pull request overview

  • Fixes [Enhancement Request] Allow importing Python package with C-extension e.g. numpy #4846

  • Add some ctest for using numpy with the CLI. 3 flavors

    • Explicitly insert to sys.path the site-packages and lib-dynload folders from the system python
    • Pass --python_path twice (site-packages and lib-dynload)
    • Pass --python_home
  • Model API Changes / Additions

  • Any new or modified fields have been implemented in the EnergyPlus ForwardTranslator (and ReverseTranslator as appropriate)

  • Model API methods are tested (in src/model/test)

  • EnergyPlus ForwardTranslator Tests (in src/energyplus/Test)

  • If a new object or method, added a test in NREL/OpenStudio-resources: Add Link

  • If needed, added VersionTranslation rules for the objects (src/osversion/VersionTranslator.cpp)

  • Verified that C# bindings built fine on Windows, partial classes used as needed, etc.

  • All new and existing tests passes

  • If methods have been deprecated, update rest of code to use the new methods

Labels:

  • If change to an IDD file, add the label IDDChange
  • If breaking existing API, add the label APIChange
  • If deemed ready, add label Pull Request - Ready for CI so that CI builds your PR

Review Checklist

This will not be exhaustively relevant to every PR.

  • Perform a Code Review on GitHub
  • Code Style, strip trailing whitespace, etc.
  • All related changes have been implemented: model changes, model tests, FT changes, FT tests, VersionTranslation, OS App
  • Labeling is ok
  • If defect, verify by running develop branch and reproducing defect, then running PR and reproducing fix
  • If feature, test running new feature, try creative ways to break it
  • CI status: all green or justified

@jmarrec jmarrec added component - CLI Pull Request - Ready for CI This pull request if finalized and is ready for continuous integration verification prior to merge. labels Apr 25, 2023
@jmarrec jmarrec self-assigned this Apr 25, 2023
@@ -1,5 +1,5 @@
message(STATUS "Using CMake ${CMAKE_VERSION}")
cmake_minimum_required(VERSION 3.19.0)
cmake_minimum_required(VERSION 3.20.0)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using cmake_path which is new in 3.20

https://cmake.org/cmake/help/latest/command/cmake_path.html

@@ -7,9 +7,9 @@ set(Python_USE_STATIC_LIBS OFF)
# find_package(Python) has the problem that on github actions in particular it'll pick up the most recent python (eg 3.9) from the tool cache
# even if you have used the setup-python action and set it to 3.8
if (PYTHON_VERSION)
find_package(Python ${PYTHON_VERSION} EXACT COMPONENTS Interpreter Development REQUIRED)
find_package(Python ${PYTHON_VERSION} EXACT REQUIRED COMPONENTS Interpreter Development OPTIONAL_COMPONENTS NumPy)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm using NumPy to test loading a native gem because the FindPython (cmake) has a NumPy component so it's easy to test whether the user actually has it installed.

Comment on lines +333 to +341
# ============ Native Ruby Gems / Python Modules - C extensions =============
if (Python_NumPy_FOUND)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only enable the tests if numpy found

Comment on lines 336 to 352
if(NOT Python_ROOT_DIR)
# Python_STDLIB: we expect it to be
# Unix: ~/.pyenv/versions/3.8.12/lib/python3.8
# Windows C:\Python38\Lib
cmake_path(GET Python_STDLIB PARENT_PATH Python_ROOT_DIR)
if(UNIX)
cmake_path(GET Python_ROOT_DIR PARENT_PATH Python_ROOT_DIR)
endif()
endif()
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like these shenanigans too much, but not sure how to do otherwise.

Comment on lines 346 to 391
add_test(NAME OpenStudioCLI.Labs.execute_python_script.numpy.explicit_sys_path_insert
COMMAND $<TARGET_FILE:openstudio> labs execute_python_script execute_python_script_with_numpy.py ${Python_STDLIB}
WORKING_DIRECTORY "${CMAKE_CURRENT_SOURCE_DIR}/test/"
)

if(UNIX)
add_test(NAME OpenStudioCLI.Labs.execute_python_script.numpy.python_path
COMMAND $<TARGET_FILE:openstudio> labs
--python_path "${Python_SITELIB}" # == "${Python_STDLIB}/site-packages"
--python_path "${Python_STDLIB}/lib-dynload"
execute_python_script execute_python_script_with_numpy.py
WORKING_DIRECTORY "${CMAKE_CURRENT_SOURCE_DIR}/test/"
)
else()
add_test(NAME OpenStudioCLI.Labs.execute_python_script.numpy.python_path
COMMAND $<TARGET_FILE:openstudio> labs
--python_path "$<SHELL_PATH:${Python_SITELIB}>"
execute_python_script execute_python_script_with_numpy.py
WORKING_DIRECTORY "${CMAKE_CURRENT_SOURCE_DIR}/test/"
)
endif()

add_test(NAME OpenStudioCLI.Labs.execute_python_script.numpy.python_home
COMMAND $<TARGET_FILE:openstudio> labs
--python_home "$<SHELL_PATH:${Python_ROOT_DIR}>"
execute_python_script execute_python_script_with_numpy.py
WORKING_DIRECTORY "${CMAKE_CURRENT_SOURCE_DIR}/test/"
)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Three different tests to cover the scenarios listed in OP

}
}

PythonEngine::PythonEngine(int argc, char* argv[]) : ScriptEngine(argc, argv), program(Py_DecodeLocale(pythonProgramName, nullptr)) {
// TODO: modernize and use PyConfig (new in 3.8): https://docs.python.org/3/c-api/init_config.html
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO for later

Comment on lines +61 to +84
// The PYTHONPATH / PYTHONHOME should be set before initializing Python
// If this Py_SetPath is called before Py_Initialize, then Py_GetPath won't attempt to compute a default search path
// The default search path is affected by the Py_SetPythonHome
// * if the user passed --python_home, we use that as the Python Home, and do not use Py_SetPath. But later we add the E+ standard_lib anyways
// so it takes precedence (to limit incompatibility issues...)
// * If the user didn't pass it, we use Py_SetPath set to the E+ standard_lib

std::vector<std::string> args(argv, std::next(argv, static_cast<std::ptrdiff_t>(argc)));
bool pythonHomePassed = false;
auto it = std::find(args.cbegin(), args.cend(), "--python_home");
if (it != args.cend()) {
openstudio::path pythonHomeDir(*std::next(it));
wchar_t* h = Py_DecodeLocale(pythonHomeDir.make_preferred().string().c_str(), nullptr);
Py_SetPythonHome(h);
pythonHomePassed = true;
} else {
wchar_t* a = Py_DecodeLocale(pathToPythonPackages.make_preferred().string().c_str(), nullptr);
Py_SetPath(a);
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deal with --python_home . Can't do it later, it has to be done before Py_Initialize... so grab it from the argc/argv...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If feels like a code smell that we are parsing argc / argv at this point instead of having everything parsed out by the CLI11 library. I think I am guilty of establishing this pattern in ScriptEngine and I think I did it because the ruby init functions (at least one of them) accept argc and argv.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ruby_sysinit does take argc, argv yes.

Plus, here this is an order thing, and specific to Python while the CLI only sees a ScriptEngineInstance, so adding an explicit PythonHome argument isn't great either.

If you see another cleaner way to do it, this can be addressed later. There are other things that probably need cleanup too (including simplifying the folder / Cmake structure probably).

Another thing: we need to process the argv data manually at the start of the CLI anyways since we do support (for legacy reasons) some usage like openstudio labs test.rb that is translated as being openstudio labs execute_ruby_script test.rb . cf https://github.com/NREL/OpenStudio/blob/2af6123b91304e12060af360d0afb24083336a4c/src/cli/main.cpp#L45C1-L89

Comment on lines +85 to +95
if (pythonHomePassed) {
addToPythonPath(pathToPythonPackages);
}
#if defined(__APPLE__) || defined(__linux___) || defined(__unix__)
addToPythonPath(pathToPythonPackages / "lib-dynload");
#endif
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lib-dynload is present in the E+ python_standard_lib, we should add it

@@ -126,28 +126,33 @@ int main(int argc, char* argv[]) {
experimentalApp
->add_option("-I,--include", includeDirs, "Add additional directory to add to front of Ruby $LOAD_PATH (may be used more than once)")
->option_text("DIR")
->check(CLI::ExistingDirectory)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add some CLI11 validators to ensure the various args are valid directories to begin with and catch mistakes

Comment on lines +29 to +44
explicit DynamicLibrary(openstudio::path location) : m_location{std::move(location)} {
int flags = RTLD_LAZY | RTLD_LOCAL; // NOLINT(misc-const-correctness, hicpp-signed-bitwise)

// This seems to work on Mac without RTLD_GLOBAL...
#ifdef __linux__
if (m_location.filename().generic_string().find("python") != std::string::npos) {
// https://stackoverflow.com/questions/67891197/ctypes-cpython-39-x86-64-linux-gnu-so-undefined-symbol-pyfloat-type-in-embedd
flags = RTLD_LAZY | RTLD_GLOBAL;
}
#endif
m_handle = {dlopen(m_location.c_str(), flags), m_handle_deleter};
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the bit that bothers me a bit @kbenne

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure we actually need these flags ever? I'm not sure about that. I think there is a decent chance they got added unnessarily.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which flag(s)? RTLD_LAZY?

@jmarrec jmarrec marked this pull request as ready for review April 26, 2023 01:23
@jmarrec jmarrec requested a review from kbenne April 26, 2023 01:23
Comment on lines 336 to 383
if(NOT Python_ROOT_DIR)
# Python_STDLIB: we expect it to be
# Unix: ~/.pyenv/versions/3.8.12/lib/python3.8 or
# or on CI: /usr/lib/python3.8/ ... with numpy if install via pip3 and not apt install python3-numpy in `/usr/local/lib/python3.8/dist-packages/`
# Windows C:\Python38\Lib
cmake_path(GET Python_STDLIB PARENT_PATH Python_ROOT_DIR)
if(UNIX)
cmake_path(GET Python_ROOT_DIR PARENT_PATH Python_ROOT_DIR)
endif()
endif()

if(UNIX)
set(PYTHON_PATH "${Python_SITELIB}" "${Python_STDLIB}/lib-dynload")

if(NOT APPLE)
set(EXTRA_LOCAL_DIST "/usr/local/lib/python3.8/dist-packages")
if (EXISTS "${EXTRA_LOCAL_DIST}")
list(APPEND PYTHON_PATH "${EXTRA_LOCAL_DIST}")
endif()
endif()
else()
set(PYTHON_PATH "$<SHELL_PATH:${Python_SITELIB}>")
endif()

message(DEBUG "PYTHON_PATH=${PYTHON_PATH}")

add_test(NAME OpenStudioCLI.Labs.execute_python_script.numpy.explicit_sys_path_insert
COMMAND $<TARGET_FILE:openstudio> labs execute_python_script execute_python_script_with_numpy.py ${Python_STDLIB}
WORKING_DIRECTORY "${CMAKE_CURRENT_SOURCE_DIR}/test/"
)

add_test(NAME OpenStudioCLI.Labs.execute_python_script.numpy.python_path
COMMAND $<TARGET_FILE:openstudio> labs
"$<$<BOOL:${PYTHON_PATH}>:--python_path;$<JOIN:${PYTHON_PATH},;--python_path;>>"
execute_python_script execute_python_script_with_numpy.py
COMMAND_EXPAND_LISTS
WORKING_DIRECTORY "${CMAKE_CURRENT_SOURCE_DIR}/test/"
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CI on ubuntu uses the system python... which on debian uses a special directory for pip3 installed stuff (apt-get install python3-numpy would have worked out of the box).... so I had to do great cmake horrors with cmake generator expressions being expanded a list... sorry

@jmarrec jmarrec added this to the OpenStudio SDK 3.7.0 milestone May 31, 2023
@jmarrec
Copy link
Collaborator Author

jmarrec commented Jun 30, 2023

38eed72, which installs jinja2 numpy pandas pip if -DPYTHON_INSTALL_NATIVE_MODULES:BOOL=ON is passed can be reverted, I'm just trying it out.

Edit: separated onto another branch

@@ -25,9 +25,12 @@ void addToPythonPath(const openstudio::path& includePath) {
if (!includePath.empty()) {
PyObject* sys = PyImport_ImportModule("sys");
PyObject* sysPath = PyObject_GetAttrString(sys, "path");
Py_DECREF(sys); // PyImport_ImportModule returns a new reference, decrement it
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder how many instances there are where we miss these decrements. Or we have one that should not be. I bet there are a few remaining.

Comment on lines +61 to +84
// The PYTHONPATH / PYTHONHOME should be set before initializing Python
// If this Py_SetPath is called before Py_Initialize, then Py_GetPath won't attempt to compute a default search path
// The default search path is affected by the Py_SetPythonHome
// * if the user passed --python_home, we use that as the Python Home, and do not use Py_SetPath. But later we add the E+ standard_lib anyways
// so it takes precedence (to limit incompatibility issues...)
// * If the user didn't pass it, we use Py_SetPath set to the E+ standard_lib

std::vector<std::string> args(argv, std::next(argv, static_cast<std::ptrdiff_t>(argc)));
bool pythonHomePassed = false;
auto it = std::find(args.cbegin(), args.cend(), "--python_home");
if (it != args.cend()) {
openstudio::path pythonHomeDir(*std::next(it));
wchar_t* h = Py_DecodeLocale(pythonHomeDir.make_preferred().string().c_str(), nullptr);
Py_SetPythonHome(h);
pythonHomePassed = true;
} else {
wchar_t* a = Py_DecodeLocale(pathToPythonPackages.make_preferred().string().c_str(), nullptr);
Py_SetPath(a);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If feels like a code smell that we are parsing argc / argv at this point instead of having everything parsed out by the CLI11 library. I think I am guilty of establishing this pattern in ScriptEngine and I think I did it because the ruby init functions (at least one of them) accept argc and argv.

Comment on lines +29 to +44
explicit DynamicLibrary(openstudio::path location) : m_location{std::move(location)} {
int flags = RTLD_LAZY | RTLD_LOCAL; // NOLINT(misc-const-correctness, hicpp-signed-bitwise)

// This seems to work on Mac without RTLD_GLOBAL...
#ifdef __linux__
if (m_location.filename().generic_string().find("python") != std::string::npos) {
// https://stackoverflow.com/questions/67891197/ctypes-cpython-39-x86-64-linux-gnu-so-undefined-symbol-pyfloat-type-in-embedd
flags = RTLD_LAZY | RTLD_GLOBAL;
}
#endif
m_handle = {dlopen(m_location.c_str(), flags), m_handle_deleter};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure we actually need these flags ever? I'm not sure about that. I think there is a decent chance they got added unnessarily.


message(DEBUG "PYTHON_PATH=${PYTHON_PATH}")

add_test(NAME OpenStudioCLI.Labs.execute_python_script.numpy.explicit_sys_path_insert
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At first this series of tests kind of seemed off to me, because they are testing the environment of the developer/build environment, but when I think more carefully, I guess that is the point. The dev environment is a proxy for whatever environment a user may have, and since you already have the scaffolding in place to find a random numpy installation it turns out to be a convenient way to formumulate a test. Seems nice to me.

The only thing this leaves on the table are custom extensions that we choose to ship by default, but I believe that will be addressed in a different issue.

@jmarrec jmarrec merged commit 82304dc into develop Jul 6, 2023
@jmarrec jmarrec deleted the 4846_labs_python_native branch July 6, 2023 07:23
@jmarrec jmarrec mentioned this pull request Jul 17, 2023
19 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component - CLI Pull Request - Ready for CI This pull request if finalized and is ready for continuous integration verification prior to merge.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Enhancement Request] Allow importing Python package with C-extension e.g. numpy
3 participants