Merge pull request #33: Amazon S3 cache support
See also pull request #33 on GitHub:

  #33

As discussed in the pull request I definitely see the value of being
able to keep the binary cache in Amazon S3. What I didn't like about the
pull request was that it introduced a call to store_file_into_s3_cache()
inside the module pip_accel.bdist. Conceptually that module has
absolutely nothing to do with Amazon S3 so that had to change :-)

This change set merges pull request #33 but also introduces a new
pluggable cache backend registration mechanism that enables cache
backends to be added without changing pip-accel. This mechanism uses
setuptools' support for custom entry points to discover the relevant
modules and a trivial metaclass to automagically track cache backend
class definitions.

The local binary cache backend and the Amazon S3 cache backend
(introduced in the pull request) have been modified to use the pluggable
registration mechanism. Maybe more backends will follow. We'll see :-)
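
The metaclass-based tracking described above can be illustrated with a minimal sketch. All names here (``registered_backends``, ``CacheBackendMeta``, ``AbstractCacheBackend``, ``InMemoryBackend``) are hypothetical and chosen for illustration; they are not taken from pip-accel's code, and the sketch uses Python 3 syntax:

```python
import io

# Hypothetical names throughout; this is not pip-accel's actual API.
registered_backends = set()


class CacheBackendMeta(type):
    """Metaclass that records every cache backend class when it is defined."""

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        # The abstract base class has no bases of its own, so it is skipped.
        if bases:
            registered_backends.add(cls)


class AbstractCacheBackend(metaclass=CacheBackendMeta):
    """Base class; subclasses are registered as soon as they are defined."""

    def get(self, filename):
        raise NotImplementedError

    def put(self, filename, handle):
        raise NotImplementedError


class InMemoryBackend(AbstractCacheBackend):
    """Toy backend that keeps archives in a dictionary."""

    def __init__(self):
        self.storage = {}

    def get(self, filename):
        return self.storage.get(filename)

    def put(self, filename, handle):
        self.storage[filename] = handle.read()


# Defining the subclass above was enough to register it.
backend = InMemoryBackend()
backend.put('example.tar.gz', io.BytesIO(b'archive contents'))
```

Because registration happens at class definition time, importing a module that defines a backend is all that is needed, which is presumably why entry point discovery only has to locate and import the relevant modules.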
xolox committed Nov 9, 2014
2 parents 20a2754 + 6ddbb42 commit 8ff50a9
Showing 18 changed files with 670 additions and 62 deletions.
1 change: 1 addition & 0 deletions MANIFEST.in
@@ -1,3 +1,4 @@
include *.rst
include *.txt
include pip_accel/deps/*.ini
include tox.ini
5 changes: 3 additions & 2 deletions Makefile
@@ -1,7 +1,7 @@
# Makefile for the pip accelerator.
#
# Author: Peter Odding <[email protected]>
# Last Change: August 14, 2013
# Last Change: November 5, 2014
# URL: https://github.com/paylogic/pip-accel

default:
@@ -16,7 +16,8 @@ default:
	@echo

test:
	python setup.py test
	pip-accel install -r requirements-testing.txt
	tox

clean:
	rm -Rf .tox build dist docs/build *.egg-info *.egg
61 changes: 59 additions & 2 deletions README.rst
@@ -61,9 +61,11 @@ Usage

The ``pip-accel`` command supports all subcommands and options supported by
``pip``, however it is of course only useful for the ``pip install``
subcommand. So for example::
subcommand. So for example:

pip-accel install -r requirements.txt
.. code-block:: bash

   $ pip-accel install -r requirements.txt

If you pass a `-v` or `--verbose` option then ``pip`` and ``pip-accel`` will
both use verbose output. The `-q` or `--quiet` option is also supported.
@@ -99,6 +101,58 @@ pip-accel First run 397 seconds 89%
pip-accel Second run 30 seconds 7%
========= ================================ =========== ===============

Alternative cache backends
--------------------------

Bundled with pip-accel are a local cache backend (which stores distribution
archives on the local file system) and an Amazon S3 backend (see below).

Both of these cache backends are registered with pip-accel using a generic
pluggable cache backend registration mechanism. This mechanism makes it
possible to register additional cache backends without modifying pip-accel. If
you are interested in the details please refer to pip-accel's ``setup.py``
script and the two simple Python modules that define the bundled backends.

If you've written a cache backend that you think may be valuable to others,
please feel free to open an issue or pull request on GitHub in order to get
your backend bundled with pip-accel.

Storing the binary cache on Amazon S3
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You can configure pip-accel to store its binary cache files in an `Amazon S3`_
bucket. In this case Amazon S3 is treated as a second level cache, only used if
the local file system cache can't satisfy a dependency. If the dependency is
not found in the Amazon S3 bucket, the package is built and cached locally (as
usual) but then also saved to the Amazon S3 bucket. This functionality can be
useful for continuous integration build worker boxes that are ephemeral and
don't have persistent local storage to store the pip-accel binary cache.

To get started you need to install pip-accel as follows:

.. code-block:: bash

   $ pip install 'pip-accel[s3]'

The ``[s3]`` part enables the Amazon S3 cache backend by installing the Boto_
package. Once installed you can use the following environment variables to
configure the Amazon S3 cache backend:

``$PIP_ACCEL_S3_BUCKET``
 The name of the Amazon S3 bucket in which binary distribution archives should
 be cached. This environment variable is required to enable the Amazon S3 cache
 backend.

``$PIP_ACCEL_S3_PREFIX``
 The optional prefix to apply to all Amazon S3 keys. This enables name spacing
 based on the environment in which pip-accel is running (to isolate the binary
 caches of ABI incompatible systems). *The user is currently responsible for
 choosing a suitable prefix.*

You will also need to set AWS credentials, either in a `.boto file`_ or in the
``$AWS_ACCESS_KEY_ID`` and ``$AWS_SECRET_ACCESS_KEY`` environment variables
(refer to the Boto documentation for details).
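
The lookup order described in this section (local cache first, Amazon S3 second, build as a last resort) can be sketched roughly as follows. This is an illustration only and not part of the commit: the function ``fetch_archive`` and its parameters are hypothetical, plain dictionaries stand in for the two caches, and copying an S3 hit back into the local cache is an assumption rather than something the text above states:

```python
# Hypothetical sketch of the lookup order; not pip-accel's actual implementation.

def fetch_archive(filename, local_cache, s3_cache, build):
    """Return archive bytes, consulting the local cache before Amazon S3."""
    # First level: the local file system cache (a dict stands in for it here).
    data = local_cache.get(filename)
    if data is not None:
        return data
    # Second level: the Amazon S3 bucket, only consulted after a local miss.
    data = s3_cache.get(filename)
    if data is not None:
        # Assumption: a remote hit is copied into the local cache.
        local_cache[filename] = data
        return data
    # Nothing cached anywhere: build, cache locally, then also save remotely.
    data = build(filename)
    local_cache[filename] = data
    s3_cache[filename] = data
    return data
```

On an ephemeral build worker the local dictionary would start out empty on every run, so the first lookup for each package would fall through to the second level instead of triggering a rebuild.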

Dependencies on system packages
-------------------------------

@@ -212,8 +266,11 @@ This software is licensed under the `MIT license`_ just like pip_ (on which


.. External references:
.. _.boto file: http://boto.readthedocs.org/en/latest/boto_config_tut.html
.. _Amazon S3: http://aws.amazon.com/s3/
.. _behind a CDN: http://mail.python.org/pipermail/distutils-sig/2013-May/020848.html
.. _Binary distributions: http://docs.python.org/2/distutils/builtdist.html
.. _Boto: https://github.com/boto/boto
.. _GitHub project page: https://github.com/paylogic/pip-accel
.. _hosted on Read The Docs: https://pip-accel.readthedocs.org/
.. _issue #30 on GitHub: https://github.com/paylogic/pip-accel/issues/30
21 changes: 15 additions & 6 deletions docs/conf.py
@@ -55,18 +55,27 @@

# Refer to the Python standard library.
# From: http://twistedmatrix.com/trac/ticket/4582.
intersphinx_mapping = {'python': ('http://docs.python.org', None)}
intersphinx_mapping = {
    'python': ('http://docs.python.org', None),
    'boto': ('http://boto.readthedocs.org/en/latest/', None),
}

# -- Options for HTML output ---------------------------------------------------

# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
html_theme = 'default'

# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['static']

# Output file base name for HTML help builder.
htmlhelp_basename = 'pip-acceldoc'

def setup(app):
    def skip_member(app, what, name, obj, skip, options):
        """
        Based on http://stackoverflow.com/a/5599712/788200.
        """
        if name == '__init__':
            return not obj.__doc__
        else:
            return skip
    app.connect('autodoc-skip-member', skip_member)
35 changes: 34 additions & 1 deletion docs/developers.rst
@@ -1,14 +1,47 @@
Documentation for the pip accelerator API
=========================================

On this page you can find the complete API documentation of pip-accel
|release|. Please note that pip-accel has not yet reached a 1.0 version and
until that time arbitrary changes to the API can be made. To clarify that
statement:

- On the one hand I value API stability and I've built a dozen tools on top of
  pip-accel myself so I don't think too lightly about breaking backwards
  compatibility :-)

- On the other hand if I see opportunities to simplify the code base or make
  things more robust I will go ahead and do it. Furthermore the implementation
  of pip-accel is dictated (to a certain extent) by pip and this certainly
  influences the API. For example API changes may be necessary to facilitate
  the upgrade to pip 1.5.x (the current version of pip-accel is based on pip
  1.4.x).

Here are the relevant Python modules that make up pip-accel:

.. contents::
   :local:

.. automodule:: pip_accel
   :members:

.. automodule:: pip_accel.req
   :members:

.. automodule:: pip_accel.bdist
   :members:

.. automodule:: pip_accel.caches
   :members:

.. automodule:: pip_accel.caches.local
   :members:

.. automodule:: pip_accel.caches.s3
   :members:

.. automodule:: pip_accel.deps
   :members:

.. automodule:: pip_accel.bdist
.. automodule:: pip_accel.utils
   :members:
22 changes: 12 additions & 10 deletions pip_accel/__init__.py
@@ -1,13 +1,16 @@
# Accelerator for pip, the Python package manager.
#
# Author: Peter Odding <[email protected]>
# Last Change: October 26, 2014
# Last Change: November 9, 2014
# URL: https://github.com/paylogic/pip-accel
#
# TODO Permanently store logs in the pip-accel directory (think about log rotation).
# TODO Maybe we should save the output of `python setup.py bdist_dumb` somewhere as well?

"""
:py:mod:`pip_accel` - Top level functions and command line interface
====================================================================
The Python module :py:mod:`pip_accel` defines the classes and functions that
implement the functionality of the pip accelerator and the ``pip-accel``
command. Instead of using the ``pip-accel`` command you can also use the pip
@@ -20,7 +23,7 @@
"""

# Semi-standard module versioning.
__version__ = '0.13.5'
__version__ = '0.14'

# Standard library modules.
import logging
@@ -42,8 +45,8 @@

# Modules included in our package.
from pip_accel.bdist import get_binary_dist, install_binary_dist
from pip_accel.config import (binary_index, download_cache, index_version_file,
on_debian, source_index)
from pip_accel.caches import CacheManager
from pip_accel.config import binary_index, download_cache, index_version_file, on_debian, source_index
from pip_accel.req import Requirement
from pip_accel.utils import run

@@ -104,6 +107,7 @@ def main():
main_timer = Timer()
initialize_directories()
build_directory = tempfile.mkdtemp()
cache = CacheManager()
# Execute "pip install" in a loop in order to retry after intermittent
# error responses from servers (which can happen quite frequently).
try:
@@ -114,7 +118,7 @@ def main():
logger.info("We don't have all source distributions yet!")
download_source_dists(arguments, build_directory)
else:
install_requirements(requirements)
install_requirements(requirements, cache)
logger.info("Done! Took %s to install %i package%s.", main_timer, len(requirements), '' if len(requirements) == 1 else 's')
return
logger.info("pip failed, retrying (%i/%i) ..", i + 1, MAX_RETRIES)
@@ -239,11 +243,12 @@ def download_source_dists(arguments, build_directory):
except Exception as e:
logger.warn("pip raised an exception while downloading source distributions: %s.", e)

def install_requirements(requirements, install_prefix=ENVIRONMENT):
def install_requirements(requirements, cache, install_prefix=ENVIRONMENT):
"""
Manually install all requirements from binary distributions.
:param requirements: A list of :py:class:`pip_accel.req.Requirement` objects.
:param cache: A :py:class:`.CacheManager` object.
:param install_prefix: The "prefix" under which the requirements should be
installed. This will be a pathname like ``/usr``,
``/usr/local`` or the pathname of a virtual
@@ -276,7 +281,7 @@ def install_requirements(requirements, install_prefix=ENVIRONMENT):
else:
members = get_binary_dist(requirement.name, requirement.version,
requirement.source_directory, requirement.url,
prefix=install_prefix, python=python)
cache=cache, prefix=install_prefix, python=python)
install_binary_dist(members, prefix=install_prefix, python=python)
requirement.pip_requirement.remove_temporary_source()
logger.info("Finished installing all requirements in %s.", install_timer)
@@ -508,6 +513,3 @@ def dependency_links(self):
@dependency_links.setter
def dependency_links(self, value):
logger.debug("Custom package finder ignoring 'dependency_links' value (%r) ..", value)

if __name__ == '__main__':
main()
39 changes: 20 additions & 19 deletions pip_accel/bdist.py
@@ -1,19 +1,18 @@
# Functions to manipulate Python binary distribution archives.
#
# Author: Peter Odding <[email protected]>
# Last Change: July 16, 2014
# Last Change: November 9, 2014
# URL: https://github.com/paylogic/pip-accel

"""
Binary distribution archive manipulation
========================================
:py:mod:`pip_accel.bdist` - Binary distribution archive manipulation
====================================================================
The functions in this module are used to create, transform and install from
binary distribution archives.
"""

# Standard library modules.
import hashlib
import logging
import os
import os.path
@@ -31,14 +30,13 @@
from humanfriendly import Spinner, Timer

# Modules included in our package.
from pip_accel.config import binary_index, on_debian
from pip_accel.config import on_debian
from pip_accel.deps import sanity_check_dependencies
from pip_accel.utils import get_python_version

# Initialize a logger for this module.
logger = logging.getLogger(__name__)

def get_binary_dist(package, version, directory, url=None, python='/usr/bin/python', prefix='/usr'):
def get_binary_dist(package, version, directory, url, cache, python='/usr/bin/python', prefix='/usr'):
"""
Get the cached binary distribution archive that was previously built for
the given package (name, version) (and optionally URL). If no archive has
@@ -49,8 +47,9 @@ def get_binary_dist(package, version, directory, url=None, python='/usr/bin/pyth
:param version: The version of the requirement to build.
:param directory: The directory where the unpacked sources of the
requirement are available.
:param url: The URL of the requirement (optional). When given this is used
to generate the filename of the cached binary distribution.
:param url: The URL of the requirement (may be ``None``). This is used to
generate the filename of the cached binary distribution.
:param cache: A :py:class:`.CacheManager` object.
:param python: The pathname of the Python executable to use to run
``setup.py`` (obviously this should point to a working
Python installation).
@@ -73,11 +72,8 @@ def get_binary_dist(package, version, directory, url=None, python='/usr/bin/pyth
would change with every run of pip-accel, triggering a time
consuming rebuild of the binary distribution.
"""
    if url and url.startswith('file://'):
        url = None
    tag = hashlib.sha1(str(version + url).encode()).hexdigest() if url else version
    cache_file = os.path.join(binary_index, '%s:%s:%s.tar.gz' % (package, tag, get_python_version()))
    if not os.path.isfile(cache_file):
    cache_file = cache.get(package, version, url)
    if not cache_file:
        logger.debug("%s (%s) hasn't been cached yet, doing so now.", package, version)
        # Build the binary distribution.
        try:
@@ -86,15 +82,18 @@ def get_binary_dist(package, version, directory, url=None, python='/usr/bin/pyth
            sanity_check_dependencies(package)
            raw_file = build_binary_dist(package, version, directory, python=python)
        # Transform the binary distribution archive into a form that we can re-use.
        transformed_file = '%s.tmp-%i' % (cache_file, os.getpid())
        transformed_file = os.path.join(tempfile.gettempdir(), os.path.basename(raw_file))
        archive = tarfile.open(transformed_file, 'w:gz')
        for member, from_handle in transform_binary_dist(raw_file, prefix=prefix):
            archive.addfile(member, from_handle)
        archive.close()
        # Try to avoid race conditions between multiple processes by atomically
        # moving the transformed binary distribution into its final place.
        os.rename(transformed_file, cache_file)
        logger.debug("%s (%s) cached as %s.", package, version, cache_file)
        # Push the binary distribution archive to all available backends.
        with open(transformed_file, 'rb') as handle:
            cache.put(package, version, url, handle)
        # Cleanup the temporary file.
        os.remove(transformed_file)
        # Get the absolute pathname of the file in the local cache.
        cache_file = cache.get(package, version, url)
    archive = tarfile.open(cache_file, 'r:gz')
    for member in archive.getmembers():
        yield member, archive.extractfile(member.name)
@@ -323,3 +322,5 @@ class NoBuildOutput(Exception):
Raised by :py:func:`build_binary_dist()` when a binary distribution build
fails to produce a binary distribution archive.
"""

