Skip to content
mdboom edited this page Oct 8, 2012 · 3 revisions
.. author:: Michael Droettboom

This MEP attempts to improve the way in which third-party dependencies in matplotlib are handled.

Discussion

#1157: Use automatic dependency resolution

#1290: Debundle pyparsing

#1261: Update six to 1.2

One of the goals of matplotlib has been to keep it as easy to install as possible. To that end, some third-party dependencies are included in the source tree and, under certain circumstances, installed alongside matplotlib. This MEP aims to resolve some problems with that approach, bring some consistency, while continuing to make installation convenient.

At the time that was initially done, setuptools, easy_install and PyPI were not mature enough to be relied on. However, at present, we should be able to safely leverage the "modern" versions of those tools, distribute and pip.

While matplotlib has dependencies on both Python libraries and C/C++ libraries, this MEP addresses only the Python libraries so as to not confuse the issue. C libraries represent a larger and mostly orthogonal set of problems.

matplotlib depends on the following third-party Python libraries:

  • Numpy
  • dateutil (pure Python)
  • pytz (pure Python)
  • six -- required by dateutil (pure Python)
  • pyparsing (pure Python)
  • PIL (optional)
  • GUI frameworks: pygtk, gobject, tkinter, PySide, PyQt4, wx (all optional, but one is required for an interactive GUI)

When installing from source, a git checkout or pip:

  • setup.py attempts to import numpy. If this fails, the installation fails.
  • For each of dateutil, pytz and six, setup.py attempts to import them (from the top-level namespace). If that fails, matplotlib installs its local copy of the library into the top-level namespace.
  • pyparsing is always installed inside of the matplotlib namespace.

This behavior is most surprising when used with pip, because no pip dependency resolution is performed, even though it is likely to work for all of these packages.

The fact that pyparsing is installed in the matplotlib namespace has reportedly (#1290) confused some users into thinking it is a matplotlib-related module and import it from there rather than the top-level.

When installing using the Windows installer, dateutil, pytz and six are installed at the top-level always, potentially overwriting already installed copies of those libraries.

TODO: Describe behavior with the OS-X installer.

When installing using a package manager (Debian, RedHat, MacPorts etc.), this behavior actually does the right thing, and there are no special patches in the matplotlib packages to deal with the fact that we handle dateutil, pytz and six in this way. However, care should be taken that whatever approach we move to continues to work in that context.

Maintaining these packages in the matplotlib tree and making sure they are up-to-date is a maintenance burden. Advanced new features that may require a third-party pure Python library have a higher barrier to inclusion because of this burden.

Third-party dependencies are downloaded and installed from their canonical locations by leveraging pip, distribute and PyPI.

dateutil, pytz, and pyparsing should be made into optional dependencies -- though obviously some features would fail if they aren't installed. This will allow the user to decide whether they want to bother installing a particular feature.

For installing from source, and assuming the user has all of the C-level compilers and dependencies, this can be accomplished fairly easily using distribute and following the instructions here. The only anticipated change to the matplotlib library code will be to import pyparsing from the top-level namespace rather than from within matplotlib. Note that distribute will also allow us to remove the direct dependency on six, since it is, strictly speaking, only a direct dependency of dateutil.

For binary installations, there are a number of alternatives (here ordered from best/hardest to worst/easiest):

  1. The distutils wininst installer allows a post-install script to run. It might be possible to get this script to run pip to install the other dependencies. (See this thread for someone who has trod that ground before).
  2. Continue to ship dateutil, pytz, six and pyparsing in our installer, but use the post-install-script to install them only if they can not already be found.
  3. Move all of these packages inside a (new) matplotlib.extern namespace so it is clear for outside users that these are external packages. Add some conditional imports in the core matplotlib codebase so dateutil (at the top-level) is tried first, and failing that matplotlib.extern.dateutil is used.

2 and 3 are undesirable as they still require maintaining copies of these packages in our tree -- and this is exacerbated by the fact that they are used less -- only in the binary installers. None of these 3 approaches address Numpy, which will still have to be manually installed using an installer.

TODO: How does this relate to the Mac OS-X installer?

At present, matplotlib can be installed from source on a machine without the third party dependencies and without an internet connection. After this change, an internet connection (and a working PyPI) will be required to install matplotlib for the first time. (Subsequent matplotlib updates or development work will run without accessing the network).

Distributing binary eggs doesn't feel like a usable solution. That requires getting easy_install installed first, and Windows users generally prefer the well known .exe or .msi installer that works out of the box.