Orca integration for static image export #1120
Merged
Works in QtConsole if you initialize the Qt event loop with `from PyQt5.QtWebEngineWidgets import QWebEngineView` followed by `%gui qt`. The QWebEngineView import must precede the `%gui qt` command.
Added orca server process management tests
- Save/load settings from ~/.plotly/.orca file
- More validation
- write image
- add image options (format, size, scale)
Let's save some complexity and not support using an external orca server for now.
The old approach required OS-specific process management and it still didn't kill the child process for orca installed with npm. Now all of the OS-specifics are in psutil. psutil is an optional import that is checked when the server is first requested.
We could leave the plotly.io._show module in place, so people could experiment with the image backend concept.
It emits some errors when children are killed, but these are harmless
This way program exit won't wait for it to complete
…Mathjax CDN
- Add topojson files to `plotly/package_data`
- Add new config settings for plotly.js bundle (use local by default), topojson, mathjax, and mapbox access token
- Add image tests for `topojson` images and mathjax images
- Remove saving of orca config to ~/.plotly. Need a more holistic settings solution that handles environments
- Shut down server when setting config parameters that won't be active until server restarts (e.g. plotlyjs bundle)
- Make default timeout None, so shutting down the server due to inactivity is now opt-in.
details in error message.
to bypass figure dict validation. Also improve presentation of orca error messages and added a special check for EPS failures that might be due to the needed poppler dependency
… fails to communicate with the orca server process.
Needed for EPS tests
On windows, this avoids Popen being unable to find the orca executable when it is on the environment path. [ci skip]
If orca returns a 525: 'plotly.js error', and the figure contains at least one mapbox trace, and no mapbox_access_token is configured, then include an error message explaining what to do.
Alright, time to merge this thing!
This was referenced Aug 27, 2018
Overview
This PR integrates orca into plotly.py to support exporting figures as static images 🎉
cc @chriddyp @jackparmer @nicolaskruchten @cldougl @Kully @etpinard @malmaud
Even if you don't have time to look at the code or test out the branch, I'd appreciate any feedback on the architecture and API notes.
Here goes...
Background
See #1105 for background information and discussion of related work.
Architecture
In this PR I went with method (3) from the issue above, "Use orca in server mode".
The first time an image export operation is performed, an orca server process is launched in the background (as a non-blocking subprocess). Image export requests are posted to the server on a local port.
By default, the server process runs until the main process exits. But there is also a `timeout` configuration option (more on configuration options below) that allows a user to specify that the server should be automatically shut down after a certain period of inactivity.
Regardless of whether a `timeout` is set, the server may also be manually shut down and manually started.
Implementation Notes
Starting the server
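A minimal sketch of the non-blocking launch covered in this section, not the PR's actual code. The helper names are hypothetical, and the `serve` subcommand and `-p` flag are assumptions about the orca command line; only `--graph-only` is named in this description.

```python
import socket
import subprocess


def find_open_port():
    """Ask the OS for a currently unused local TCP port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))  # port 0 lets the OS choose
        return s.getsockname()[1]


def build_server_command(executable, port):
    """Assemble the argument list for a graph-only server process."""
    # "serve" and "-p" are assumed CLI details; "--graph-only" is from the PR text.
    return [executable, "serve", "-p", str(port), "--graph-only"]


def launch_server(executable="orca"):
    """Start the server without blocking; returns (process, port)."""
    port = find_open_port()
    proc = subprocess.Popen(
        build_server_command(executable, port),
        stdout=subprocess.DEVNULL,  # avoid filling an unread pipe
        stderr=subprocess.DEVNULL,
    )
    return proc, port
```

`Popen` returns as soon as the child is spawned, so the caller is free to continue while the server boots.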
The server subprocess is launched using `subprocess.Popen` to create a long-running background process. The server is launched in `--graph-only` mode to be as lean as possible (this avoids running processes for exporting thumbnails, dashboards, etc.).
Communicating with the Server
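To illustrate the retry-until-ready behaviour this section relies on, here is a hand-rolled stand-in for the `retrying.retry` decorator, using only the standard library. `retry`, `FakeServer`, and `request_image` are hypothetical names, and the fake server simply refuses its first few requests the way a still-booting process would.

```python
import functools
import time


def retry(max_wait_s=30.0, interval_s=0.1, exceptions=(ConnectionError,)):
    """Retry the wrapped call until it succeeds or max_wait_s elapses."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            deadline = time.monotonic() + max_wait_s
            while True:
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if time.monotonic() >= deadline:
                        raise  # give up and surface the last error
                    time.sleep(interval_s)
        return wrapper
    return decorator


class FakeServer:
    """Stand-in for a server that refuses the first few connections."""

    def __init__(self, failures_before_ready):
        self.remaining = failures_before_ready

    def post(self, payload):
        if self.remaining > 0:
            self.remaining -= 1
            raise ConnectionError("server not ready yet")
        return {"status": 200, "echo": payload}


server = FakeServer(failures_before_ready=3)


@retry(max_wait_s=5.0, interval_s=0.01)
def request_image(payload):
    return server.post(payload)


# Blocks through three refused attempts, then returns the response.
response = request_image({"format": "png"})
```

From the caller's point of view the first request just takes longer; no explicit "wait for server" step is needed.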
Communication with the server is done using `requests.post`. The request function is wrapped in the `@retrying.retry` decorator to handle the automatic retrying of failed requests. The retrying logic is very convenient, as it allows an image request to be made right after the server process is launched and the request will simply block until the server responds.
Shutting down the server
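A sketch of the child-process cleanup this section settles on, assuming `psutil` is installed. `terminate_process_tree` is a hypothetical helper, not the function added by this PR; it shows the general pattern of collecting a process's descendants and terminating all of them rather than only the wrapper.

```python
try:
    import psutil  # optional dependency, checked when first needed
except ImportError:
    psutil = None


def terminate_process_tree(pid, timeout_s=3.0):
    """Terminate a process and all of its descendants.

    Returns the list of processes still alive after a forced kill.
    """
    if psutil is None:
        raise ImportError("psutil is required to shut down the server")
    try:
        parent = psutil.Process(pid)
        # Collect descendants first, while the parent still links to them.
        procs = parent.children(recursive=True) + [parent]
    except psutil.NoSuchProcess:
        return []  # already gone
    for proc in procs:
        try:
            proc.terminate()
        except psutil.NoSuchProcess:
            pass
    _, alive = psutil.wait_procs(procs, timeout=timeout_s)
    for proc in alive:
        try:
            proc.kill()  # escalate for anything that ignored terminate()
        except psutil.NoSuchProcess:
            pass
    _, alive = psutil.wait_procs(alive, timeout=timeout_s)
    return alive
```

Calling it with the `pid` of the `Popen` object covers both the wrapper script and whatever executable the wrapper started.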
It's possible to terminate the particular process created using `subprocess.Popen` with the `Popen.terminate` method. Unfortunately, this isn't always enough to actually shut down the server. The trouble is that typical orca entry points (orca.sh, orca.js, orca.cmd) are simply wrapper scripts that call the main orca/electron executable. In my testing on OS X, Linux, and Windows I found that `Popen.terminate` generally only terminates the shell/wrapper process, leaving the orca server running. This is definitely not acceptable, as a user could end up with a new orca process each time they restart their kernel and export images.
I initially tried some workarounds involving process groups and sending different signals, but the result ended up being platform dependent and still not fully reliable. I settled on introducing the `psutil` library as a new optional dependency. `psutil` provides a platform-agnostic API for iterating over the children of a process and then terminating them. In my testing, this `psutil` approach has been fully reliable in terminating the server processes across platforms. Since our CI test suite is Linux only at this point, I'm especially glad to not need to introduce any OS X/Windows specific process management logic.
Shutdown server after timeout
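The cancel-and-recreate timer pattern this section describes can be sketched with `threading.Timer` alone. `InactivityShutdown` and its `touch` method are hypothetical names; a `timeout_s` of `None` is treated as "never shut down", matching the opt-in default mentioned in the commit notes.

```python
import threading


class InactivityShutdown:
    """Call `shutdown` once `timeout_s` seconds pass with no activity."""

    def __init__(self, timeout_s, shutdown):
        self.timeout_s = timeout_s
        self.shutdown = shutdown
        self._timer = None
        self._lock = threading.Lock()

    def touch(self):
        """Record activity: cancel the pending timer and start a new one."""
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
                self._timer = None
            if self.timeout_s is None:
                return  # inactivity shutdown is opt-in
            self._timer = threading.Timer(self.timeout_s, self.shutdown)
            self._timer.daemon = True  # never keep the interpreter alive
            self._timer.start()

    def cancel(self):
        """Stop any pending timer without calling `shutdown`."""
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
                self._timer = None
```

Calling `touch()` at the start of every render request keeps the server alive for as long as requests keep arriving.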
If a `timeout` is configured when the server process is launched, a `threading.Timer` object is created to call the shutdown function after `timeout` seconds.
Each time an image render request is made, any existing `Timer` object is canceled, and a new `Timer` is created.
Importantly, each timer thread has the daemon property set to `True`. This prevents the main process from waiting for the timer to complete before exiting.
Shutdown on exit
The shutdown function is annotated with the `@atexit.register` decorator to ensure that the server is properly shut down when the main Python process exits.
API Design
This PR introduces the beginning of the `plotly.io` module.
Image export
Two image export functions are introduced. These functions follow the export conventions proposed in #1098.
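As a rough illustration of the format rule that applies when writing to a file (infer from the extension, allow an explicit `format` to override, and require it when there is no usable extension), here is a standalone sketch. `resolve_format` is a hypothetical helper, not part of `plotly.io`; the format list is the one given in this description.

```python
import os

# Formats listed in this PR description; "jpg" is accepted as an alias.
SUPPORTED_FORMATS = {"png", "jpeg", "webp", "svg", "pdf", "eps"}
ALIASES = {"jpg": "jpeg"}


def resolve_format(file, format=None):
    """Return the image format for `file`, preferring an explicit `format`."""
    if format is None:
        if not isinstance(file, str):
            raise ValueError("format is required when file is not a path string")
        ext = os.path.splitext(file)[1].lstrip(".").lower()
        if not ext:
            raise ValueError("cannot infer format: %r has no extension" % file)
        format = ext
    fmt = ALIASES.get(format.lower(), format.lower())
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError("unsupported image format: %r" % format)
    return fmt
```

For example, `resolve_format("photo.JPG")` resolves through the alias table, while a file-like object only works when `format` is passed explicitly.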
`write_image` works very much like the matplotlib `savefig` function. `fig` is a `Figure` or compatible `dict`. `file` may be a string referring to a local filesystem path, or a file-like object to be written to. If `file` is a string, then the file extension is used to infer the image format if possible. The `format` argument may be used to explicitly specify the format, and it is required if `file` is not a string with a common extension. Supported formats are png, jpeg (jpg extension supported as well), webp, svg, pdf, and eps (with poppler installed). `scale`, `width`, and `height` work as you would expect.
`to_image` may be used to return the binary representation of the image directly (no temp files or messing with `io.BytesIO`!). This can be used in conjunction with `IPython.display.Image` to display static images directly in the notebook or QtConsole.
Orca management
If users install orca using conda or npm, they should be able to use the above methods immediately, without additional configuration. But for more technical users, and for general users if things go wrong, there is a new `plotly.io.orca` module.
Manual server management
The server may be manually started using `plotly.io.orca.ensure_orca_server()`, and it may be manually shut down using `plotly.io.orca.shutdown_orca_server()`.
Orca config
`plotly.io.orca.config` is an orca configuration/settings object. Here are the properties that can be configured.
If automatic port selection is not desirable, an explicit `port` value may be set here. If an executable named `orca` cannot be found on the path, then the `executable` property may be set to the absolute path to an orca executable. This is where the `timeout` property is set. The default width, height, scale, and format control the default values used by `to_image` when not otherwise specified.
I took the liberty of supplying a default mathjax CDN, so LaTeX image export just works as long as the user is online. For offline use, the `mathjax` property can be set to the path to a local mathjax installation. When `topojson` is `None` the plot.ly CDN will be used, but a local path can be supplied if working offline. Finally, the `mapbox_access_token` property can store a mapbox token that will automatically be applied when exporting mapbox traces.
Properties can be set using property assignment or using the `update` method.
The constants are not settable and are listed for informational purposes.
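The two ways of setting properties can be sketched with a small validated settings class. `ServerConfig` is a hypothetical illustration of the pattern and covers only three of the properties named above; it is not the `plotly.io.orca.config` implementation, and the example path is made up.

```python
class ServerConfig:
    """Hypothetical settings object with validated, assignable properties."""

    def __init__(self):
        self._port = None  # None means "pick a free port automatically"
        self._timeout = None  # None means "never shut down on inactivity"
        self._executable = "orca"

    @property
    def port(self):
        return self._port

    @port.setter
    def port(self, value):
        if value is not None and not (isinstance(value, int) and 0 < value < 65536):
            raise ValueError("port must be None or an integer in 1..65535")
        self._port = value

    @property
    def timeout(self):
        return self._timeout

    @timeout.setter
    def timeout(self, value):
        if value is not None and value <= 0:
            raise ValueError("timeout must be None or a positive number of seconds")
        self._timeout = value

    @property
    def executable(self):
        return self._executable

    @executable.setter
    def executable(self, value):
        if not isinstance(value, str) or not value:
            raise ValueError("executable must be a non-empty string")
        self._executable = value

    def update(self, **kwargs):
        """Set several properties at once; unknown names are rejected."""
        for name in kwargs:
            if not isinstance(getattr(type(self), name, None), property):
                raise ValueError("unknown config property: %r" % name)
        for name, value in kwargs.items():
            setattr(self, name, value)  # goes through the validating setters


config = ServerConfig()
config.port = 8765  # property assignment
config.update(timeout=120, executable="/opt/orca/orca")  # bulk update
```

Routing `update` through `setattr` means both styles share the same validation.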
Saving configuration properties
The config values may optionally be saved to the `~/.plotly` settings directory as `~/.plotly/.orca` using the `plotly.io.orca.config.save()` method. If present, these settings are automatically loaded on import.
Orca status
The current status of the orca server process can be displayed using the `plotly.io.orca.status` object.
At initial startup the `state` will be `unvalidated`.
After a valid orca executable has been found, and the server is not yet running, the `state` will be `validated`.
Here the user can see which orca executable was found on the path, and what version it is.
When the server process is currently running, the `state` will be `running`.
Here the user can see the details of the running process (port, pid) and the exact command line arguments that were passed to the orca server at startup.
Error messages
There are a lot of things that can potentially go wrong here, so I've tried to make the error messages as helpful as possible. For example here's the error that is raised if the orca executable cannot be found on the path:
Testing
I've added two new test suites.
- `plotly/tests/test_orca/test_orca_server.py`. These tests cover the logic for locating and validating the orca executable, and the logic for launching the server and shutting it down. This testing relies on `psutil` to check that the process with the right `pid` is running and then not running. It also relies on pinging the server to make sure it's running on the right port, and that it stops responding when it should be shut down.
- `plotly/tests/test_orca/test_to_image.py`. These tests cover the image conversion logic. I've generated a set of reference images to compare against. These ensure that valid images are produced where they should be, and that the topojson and mathjax configuration is working properly. Unfortunately, the images are not exactly reproducible between my local mac and CircleCI, so for the time being there is a separate directory of reference images for OS X and Linux (though I'm not sure `Linux` is fine-grained enough).
These tests are working on CircleCI. The new tests follow a new conda environment pathway so that orca can be installed using conda. The tests are run with Python 2.7, 3.5, and 3.7.
Performance
The whole reason for using this more complex client/server architecture is to improve image export performance. So how well does it do?
This is not an extensive performance comparison, but I did an initial comparison of matplotlib, this branch, and bokeh (setup instructions). The test was to create a 1000 point scatter plot with varying point size and color and then save it to a png.
So after the orca server is running, the export time here is right on par with matplotlib (~215ms), and much faster than bokeh (~1.7s).
Being on par with matplotlib here is really exciting, and opens up a lot of new use cases for plotly.py. I'm thinking, in particular, of the possibility of a static image backend for interactive use outside of the notebook/browser context.
Side note: bokeh isn't doing anything wrong here. This is just how expensive it is to launch a web browser from scratch. This is also about how long it takes the orca server to start up the first time. The advantage with this orca approach is that the server only needs to start up once per session, instead of once per image.
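The once-per-session versus once-per-image point can be made concrete with a toy timing harness. Everything here is simulated: `SimulatedExporter` sleeps to mimic a slow one-time startup and a cheap per-image render, and the durations are arbitrary placeholders, not the measurements reported above.

```python
import time


class SimulatedExporter:
    """Stand-in for an exporter whose backend is slow to start but fast once warm."""

    def __init__(self, startup_s, render_s):
        self.startup_s = startup_s
        self.render_s = render_s
        self.started = False

    def export(self):
        if not self.started:
            time.sleep(self.startup_s)  # one-time startup cost
            self.started = True
        time.sleep(self.render_s)  # per-image cost
        return b"image-bytes"


def time_calls(func, repeats):
    """Return wall-clock durations (seconds) for `repeats` calls of `func`."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return durations


exporter = SimulatedExporter(startup_s=0.2, render_s=0.01)
durations = time_calls(exporter.export, repeats=5)
first, warm = durations[0], durations[1:]
```

Only `first` pays the startup cost; a tool that starts a fresh backend for every image would pay it on every call.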
Produced images
And here are the images produced by matplotlib, this branch, and bokeh
TODO
Various things still to do/look into:
- `validate` option to `to_image` and `write_image`