Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[travis/tox] Restructuring configuration and testing #4552

Merged

Conversation

john-bodley
Copy link
Member

@john-bodley john-bodley commented Mar 6, 2018

TL;DR this PR changes the way we configure and test the application. I know there are mixed options about the primary use case of tox (#3881), but I thought it was worthwhile exploring.

tox

I think of tox as a testing environment which can run locally via shell-based testing or with a CI server, and thus should be agnostic of the CI service. In my opinion the way we test locally and remotely should be the same. This helps to reinforce familiarity and consistency. I agree there is some additional overhead when the environment is first created, but future runs leverage the existing environment (it can be recreated if necessary via tox --recreate) which is built in a deterministic manner whilst guaranteeing all the dependencies are met. All tests (including unit and linting) should be run via tox rather than executing flake8, npm run lint etc.

Caching

Modern versions of pip fetch the wheel file from PyPI if it exists (or builds it if it doesn't) and caches it locally, and thus there's no longer a need for the pip wheel logic, which helps to simplify the tox environment. Note the Travis CI leverages the pip cache.

Python unit tests

Previously for unit testing locally one would run,

run_tests.sh

which used a slew of globally defined SQLite databases which resided in ~/.superset. These databases are re-created every time the script is re-run. With tox one simply runs,

tox -e py27|py34|...

which uses a self contained Python version specific virtual environment which houses the SQLite databases. These are stored in a tox environment temporary directory which are re-created on each test run in a similar manner to before.

Previously when running a single unit test one would first have to run all the tests, in order to create the databases, populate the data etc., however here this requirement is loosened. Rather than running,

run_specific_test.sh tests.test_file:TestClassName.test_method_name 

one would use the same tox test harness and run,

tox -e py27|py34|... -- tests/test_file.py:TestClassName.test_method_name

Now you one can run either i) all tests in a specific file, or ii) a single test, without having to run the entire test suite first. Every time one re-runs the tests, the tox environment persists, however the main database is re-created/re-populated only if needed (not all tests require the database). Though there is some potential overhead it does ensure that the database state is valid before each relevant test run.

Reformatting

Finally there are a few other more minor changes to the tox configuration including; new tox multi-dimensional platform-specific configuration, removing unnecessary installs, and fixing the MySQL Python 3.4 SUPERSET__SQLALCHEMY_DATABASE_URI environment variable.

requirements.txt

I realize that Superset is an application rather than a library but when defining Python packages setup.py should only contain abstract dependencies as described in detail here. Decoupling the actual requirements from setup.py means individual deployments have the freedom to use other versions, whereas from a development standpoint you want to have builds which are predictable and deterministic.

The pinned versions from setup.py have been removed and added to a requirements.txt file, which is used by tox. Similarly development and documentation requirements have been pinned. Note I explore options such as pip-compile from pip-tools which can generate requirement files from setup.py or other config files. Though this seemed promising, it pins all sub-dependencies as well which can be problematic as some of these packages are not defined in every Python version. At the moment it seems prudent to manage these files by hand.

setup.py

As stated previous all pinning has been removed from setup.py and all packages are now listed in alphabetical order, and lower cased. Additionally I've removed the test_requires as python setup.py tests is both discouraged and unused as tests were never run this way.

Travis

Only minor changes to the .travis.yml file, deprecated tox-travis's which wasn't used, simplifying the environment matrix to be a list of environments (which set the Travis TOXENV environment), explicitly defining all the services, leveraging the pip cache (see previously) and removing /tmp/hive/bin from the PATH which I believe may have been copied from Airflow's .travis.yml file.

to: @mistercrunch @xrmx

@john-bodley john-bodley force-pushed the john-bodley-travis-tox-restructure branch 18 times, most recently from 827d844 to 75589db Compare March 7, 2018 04:42
@codecov-io
Copy link

codecov-io commented Mar 7, 2018

Codecov Report

Merging #4552 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #4552   +/-   ##
=======================================
  Coverage   72.63%   72.63%           
=======================================
  Files         205      205           
  Lines       15403    15403           
  Branches     1183     1183           
=======================================
  Hits        11188    11188           
  Misses       4212     4212           
  Partials        3        3
Impacted Files Coverage Δ
superset/config.py 92.3% <0%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 41c158e...9ecb70c. Read the comment docs.

@john-bodley john-bodley force-pushed the john-bodley-travis-tox-restructure branch 4 times, most recently from 969a338 to c086986 Compare March 7, 2018 05:18
@john-bodley john-bodley changed the title [wip][travis/tox] Restructuring configuration [travis/tox] Restructuring configuration and testing Mar 7, 2018
@john-bodley john-bodley force-pushed the john-bodley-travis-tox-restructure branch 6 times, most recently from e4e1c55 to 5c69cce Compare March 9, 2018 06:42
@john-bodley john-bodley force-pushed the john-bodley-travis-tox-restructure branch from 0cdaff4 to cbbf647 Compare March 23, 2018 16:49
@john-bodley
Copy link
Member Author

@mistercrunch @betodealmeida per the conservation regarding pinning earlier today what are your thoughts on this PR? Note beyond requirements.txt there's changes in configuration around testing (including unit tests), the tox test harness, and Travis CI.

@john-bodley john-bodley force-pushed the john-bodley-travis-tox-restructure branch 4 times, most recently from 683061c to 8254499 Compare April 5, 2018 01:01
@john-bodley
Copy link
Member Author

@mistercrunch sorry I was wondering what your thoughts were on this? Additionally having a requirements.txt with pinned versions would help us with caching Python packages in Docker.

@mistercrunch
Copy link
Member

So if I want to add a python dep where do I do it? In setup.py AND in requirements.txt?

I was thinking this process would involve a [documented] pip-compile step and the whole dep tree.

I remember we spoke about this but forgot the details on why no pip-compile + pinning the whole tree. If the goal is to get as close as possible to an fully reproducible build, it seems like we need the whole dep tree.

@john-bodley john-bodley force-pushed the john-bodley-travis-tox-restructure branch from 8254499 to af9241e Compare April 5, 2018 05:57
@john-bodley
Copy link
Member Author

@mistercrunch I definitely explored the pip-compile route as it would be great to have requirements.txt autogenerated from setup.py and requirements-dev.txt autogenerated from requirements-dev.in etc.

The issue is there's not a plausible way to autogenerate a Python 2/3 compatible requirements.txt file, i.e., in Python 2.7 pip-compile generates:

#
# This file is autogenerated by pip-compile
# To update, run:
#
#    pip-compile --output-file requirements.txt setup.py
#
alembic==0.9.9            # via flask-migrate
amqp==2.2.2               # via kombu
asn1crypto==0.24.0        # via cryptography
babel==2.5.3              # via flask-babel, flower
backports-abc==0.5        # via tornado
billiard==3.5.0.3         # via celery
bleach==2.1.3
boto3==1.7.0
botocore==1.10.0          # via boto3, s3transfer
celery==4.1.0
certifi==2018.1.18        # via requests
cffi==1.11.5              # via cryptography
chardet==3.0.4            # via requests
click==6.7                # via flask, flask-appbuilder
colorama==0.3.9
cryptography==2.2.2
docutils==0.14            # via botocore
enum34==1.1.6             # via cryptography
flask-appbuilder==1.10.0
flask-babel==0.11.1       # via flask-appbuilder
flask-cache==0.13.1
flask-compress==1.4.0
flask-login==0.2.11       # via flask-appbuilder
flask-migrate==2.1.1
flask-openid==1.2.5       # via flask-appbuilder
flask-script==2.0.6
flask-sqlalchemy==2.1
flask-testing==0.7.1
flask-wtf==0.14.2
flask==0.12.2
flower==0.9.2
future==0.16.0
futures==3.2.0            # via flower, s3transfer, tornado
geopy==1.12.0
gunicorn==19.7.1
html5lib==1.0.1           # via bleach
humanize==0.5.1
idna==2.6
ipaddress==1.0.19         # via cryptography
itsdangerous==0.24        # via flask
jinja2==2.10              # via flask, flask-babel
jmespath==0.9.3           # via boto3, botocore
kombu==4.1.0              # via celery
mako==1.0.7               # via alembic
markdown==2.6.11
markupsafe==1.0           # via jinja2, mako
numpy==1.14.2             # via pandas
pandas==0.22.0
parsedatetime==2.4
pathlib2==2.3.0
polyline==1.3.2
pycparser==2.18           # via cffi
pydruid==0.4.2
pyhive==0.5.1
python-dateutil==2.6.1
python-editor==1.0.3      # via alembic
python-geohash==0.8.5
python-openid==2.2.5      # via flask-openid
pytz==2018.3              # via babel, celery, flower, pandas
pyyaml==3.12
requests==2.18.4
s3transfer==0.1.13        # via boto3
sasl==0.2.1               # via thrift-sasl
scandir==1.7              # via pathlib2
simplejson==3.13.2
singledispatch==3.4.0.3   # via tornado
six==1.11.0
sqlalchemy-utils==0.33.2
sqlalchemy==1.2.6
sqlparse==0.2.4
thrift-sasl==0.3.0
thrift==0.11.0
tornado==5.0.1            # via flower
unicodecsv==0.14.1
unidecode==1.0.22
urllib3==1.22             # via requests
vine==1.1.4               # via amqp
webencodings==0.5.1       # via html5lib
werkzeug==0.14.1          # via flask
wtforms==2.1              # via flask-wtf

however if you try to install this in Python 3 via, pip3 install -r requirements.txt it fails with the following error:

Collecting futures==3.2.0 (from -r requirements.txt (line 33))
  Could not find a version that satisfies the requirement futures==3.2.0 (from -r requirements.txt (line 33)) (from versions: 0.2.python3, 0.1, 0.2, 1.0, 2.0, 2.1, 2.1.1, 2.1.2, 2.1.3, 2.1.4, 2.1.5, 2.1.6, 2.2.0, 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.0.4, 3.0.5, 3.1.0, 3.1.1)
No matching distribution found for futures==3.2.0 (from -r requirements.txt (line 33))

I thought of two possible solutions:

  1. Enabling pip-compile to support multiple Python versions, which is something that `requirements.txt supports, i.e.,
futures==3.2.0; python_version == '2.7'
  1. Enabling pip-compile to exclude dependencies (which futures is) which would make the output Python 2/3 compatible.

I raised this issue with pip-tools where the suggestion was to create a requirements.txt for each environment, however I'm not certain we want to have to build/maintain environment specific files.

@timifasubaa
Copy link
Contributor

timifasubaa commented Apr 5, 2018

@mistercrunch @john-bodley Would (pipenv)[https://docs.pipenv.org/] solve our problems here?

@mistercrunch
Copy link
Member

Also I'm not clear on how Pypi operates, but my guess is that it only looks at what's in setup.py and disregards the requirements.txt file, meaning a simple pip install superset would be less predictable following this PR since much version-range specs was removed from setup.py in this PR.

At Lyft our build process for all Python apps generates both a requirements.txt and a requirements3.txt out of a requirements.in file, not sure how it's accomplished, maybe it has pip-compile while 2 different venv are used alternatively.

@john-bodley
Copy link
Member Author

john-bodley commented Apr 5, 2018

@mistercrunch it would be good to understand how Lyft creates these different files (though your hypothesis seems valid). It seems you maybe able to achieve this in conjunction with pyenv. I can also explore pipenv, though it would need to play well with tox, Travis, etc.

Also if we went down the virtualenv route one would need to activate it for the various services or wrap existing command via pipenv run.

Finally the setup.py/requirements.txt relationship is well defined, in essence setup.py should never pin packages but merely provide the list of required dependencies, with version restrictions were appropriate. I realize that Superset is an application rather than a package, but this allows individual instances to configure their Python environment how they see appropriate, i.e., a custom plugin may require a different version of dependency. The requirements.txt simply provides a deterministic environment using package versions which we’ve certified/tested.

@john-bodley john-bodley force-pushed the john-bodley-travis-tox-restructure branch 2 times, most recently from 198bc7f to 0d3765f Compare April 8, 2018 17:33
@john-bodley john-bodley force-pushed the john-bodley-travis-tox-restructure branch from 0d3765f to 9ecb70c Compare April 8, 2018 17:50
@john-bodley
Copy link
Member Author

john-bodley commented Apr 8, 2018

@mistercrunch per our conversation last week, I updated CONTRIBUTING.md regarding maintaining setup.py and requirements.txt.

@mistercrunch
Copy link
Member

LGTM!

@john-bodley john-bodley merged commit 1627fd0 into apache:master Apr 10, 2018
@john-bodley john-bodley deleted the john-bodley-travis-tox-restructure branch April 10, 2018 22:59
john-bodley added a commit to john-bodley/superset that referenced this pull request Apr 10, 2018
michellethomas pushed a commit to michellethomas/panoramix that referenced this pull request May 24, 2018
timifasubaa pushed a commit to timifasubaa/incubator-superset that referenced this pull request May 31, 2018
wenchma pushed a commit to wenchma/incubator-superset that referenced this pull request Nov 16, 2018
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 0.25.0 labels Feb 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 0.25.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants