Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial commit for new Breeze project #19867

Merged
merged 1 commit into from
Nov 30, 2021

Conversation

potiuk
Copy link
Member

@potiuk potiuk commented Nov 28, 2021

It includes:

  • proposal for initial ADRs (Architecture Decision records)
    where we will keep decision records about both - Breeze2 and CI
  • scaffolding for the new breeze command including command line,
    pre-commit checks, automated tests in CI and requirements

^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.

dev/Breeze/requirements.txt Outdated Show resolved Hide resolved
dev/Breeze/conftest.py Outdated Show resolved Hide resolved
@potiuk potiuk force-pushed the initial-python-breeze-commit branch 2 times, most recently from b1de041 to ff2944b Compare November 29, 2021 12:42
@potiuk
Copy link
Member Author

potiuk commented Nov 29, 2021

It's not exactly what you proposed @uranusjr , but I think with this setup we will be able to modify (in the future) the entrypoint script, to be able to create venv and run equivalent of pip install -e . in this venv if someone runs it without installing it first and then we could simply run breeze from there. WDYT?

@potiuk
Copy link
Member Author

potiuk commented Nov 29, 2021

I am not sure if we want to release the whole airlfow-breeze package on Pypi, but at least the way it is setup now is more "standard" and does not require any hacks indeed. (especially the dir structure and pytest discovery).

I am not sure yet how to approach the "small bootstrap" package - I like how the install would work now with editable installed Breeze in local virtualenv (where I think we want to have the whole of Breeze installed rather than the "bootstrap" script only).

Maybe when we install it as "editable" locally it should install the whole of breeze (all packages) but then when we distribute it in PyPI we only install the "bootstrap" script there. I guess that would be possible? Happy to brainstorm on this one @uranusjr

@uranusjr
Copy link
Member

Using Flit + PEP 621 would eliminate much of the guess work how to get things right 😉

@potiuk potiuk force-pushed the initial-python-breeze-commit branch from ff2944b to 444f8aa Compare November 29, 2021 13:44
@potiuk
Copy link
Member Author

potiuk commented Nov 29, 2021

I just added the Breeze2 script as I'd see it - we do not even have to install the package in PIP this way - see the latest commit @uranusjr (with the caveat that we'd have to add upgrade whenever setup.cfg changes but that should be easy).

I think that might be nice and portable and "self-running" without extra deps like flit/PEP 621. For me this is really the case of having a runnable "devel" environment.

@potiuk
Copy link
Member Author

potiuk commented Nov 29, 2021

Sounds simple enough:

#!/usr/bin/env python3
import subprocess
import sys
from os import execv
from pathlib import Path

AIRFLOW_SOURCES_DIR = Path(__file__).parent.resolve()
BUILD_DIR = AIRFLOW_SOURCES_DIR / ".build"
BREEZE_VENV_DIR = BUILD_DIR / "breeze2" / "venv"
BREEZE_VENV_BIN_DIR = BREEZE_VENV_DIR / "bin"
BREEZE_VENV_PIP = BREEZE_VENV_BIN_DIR / "pip"
BREEZE_VENV_BREEZE = BREEZE_VENV_BIN_DIR / "Breeze2"

BREEZE_SOURCE_PATH = AIRFLOW_SOURCES_DIR / "dev" / "breeze"

if not BREEZE_VENV_DIR.exists():
    BREEZE_VENV_DIR.mkdir(parents=True, exist_ok=True)
    subprocess.Popen([sys.executable, "-m", "venv", f"{BREEZE_VENV_DIR}"]).wait()
    subprocess.Popen([f"{BREEZE_VENV_PIP}", "install", "-e", "."], cwd=BREEZE_SOURCE_PATH).wait()

execv(f"{BREEZE_VENV_BREEZE}", [f"{BREEZE_VENV_BREEZE}"] + sys.argv[1:])

@potiuk
Copy link
Member Author

potiuk commented Nov 29, 2021

(and we'd need to add some protection against old python version)

@uranusjr
Copy link
Member

Note that execv does not work on Windows, so that’d be a problem if we’re moving toward supporting Windows.

You mention pipx earlier, and I feel it might be a better setup for contributors? Just pipx install apache-airflow-breeze and run breeze in the repository root. This is also more friendly to Windows since pipx (actually pip) would automatically create an exe for the command.

@uranusjr
Copy link
Member

This actually gives me a thought, since you feel it’s important to make the virtual environment (for Breeze) invisibal to the user. Maybe we should build Breeze entirely upon an existing task runner solution, instead of rolling our own infrastructure. Nox has pretty much have virtual environment management and task invocation figured out—we can make Breeze a wrapper to noxfile.py, and just write subcommands as Nox tasks.

@potiuk
Copy link
Member Author

potiuk commented Nov 29, 2021

I have also added current stats of the languages used - I beleive sh should be less than 300 lines of code (< 0.1% when we finish).

SLOC    Directory         SLOC-by-Language (Sorted)

144905  tests             python=144761,xml=132,sh=12
130115  airflow           python=127249,javascript=2827,sh=39
12052   docs              javascript=8977,python=2931,sh=144
9073    scripts           sh=7457,python=1616
6314    chart             python=6218,sh=96
3665    top_dir           sh=2896,python=769
3102    dev               python=2938,sh=164
1723    kubernetes_tests  python=1723
280     docker_tests      python=280
140     metastore_browser python=140
109     clients           sh=109
28      images              sh=28

Totals grouped by language (dominant language first):
python:      288625 (92.65%)
javascript:     11804 (3.79%)
sh:           10945 (3.51%)
xml:            132 (0.04%)

I am pretty happy with what we have now as the first PR:

  • automated bootstrapping of virtualenv installation for Breeze2 command
  • the bootstrapping is resistant to any missing dependencies. Currently it only requires Python3.7+ to be installed locally and nothing else.
  • the bootstrapping will automatically update the Breeze2 venv whenever necesseary (first time, and whenever setup.cfg changes, including checking out branch).
  • standardised package layout and using pip install -e . to install virtualenb
  • likely Windows support for bootstraping (to be tested on Windows)
  • it's possible to add Breeze2.exe later on generated from Breeze2 later on if we want to make it really nicely work on Windows (currently python Breeze2 should be used to start Breeze on Windows)

I don't think we need to have another Breeze package in PyPI with this setup, it's simply not needed as the Breeze2 bootstraping should do the work nicely for the user. The nice we have is that the standard package structure will help with development (pip install -e .) even if we won't build and publish the package in PyPI.

I already nicely integrated it as module in my IntelliJ without extra "magic" using the bootstrapped virtualenv - should nicely work also on Windows. @edithturn @Bowrna - maybe you could check it out and check it on your windows machines to see how easy it is to bootstrap (also maybe you could try to generate ./Breeze2.exe using pyinstaller and see if it woudl work?

@potiuk potiuk force-pushed the initial-python-breeze-commit branch from 46c639a to d919fc3 Compare November 29, 2021 18:55
Breeze2 Outdated Show resolved Hide resolved
Breeze2 Outdated Show resolved Hide resolved
Breeze2 Outdated Show resolved Hide resolved
@potiuk potiuk force-pushed the initial-python-breeze-commit branch from 4cd2cf9 to 6ca6f02 Compare November 29, 2021 19:21
It includes:

* proposal for initial ADRs (Architecture Decision records)
  where we will keep decision records about both - Breeze2 and CI
* scaffolding for the new breeze command including command line,
  pre-commit checks, automated tests in CI and requirements
@potiuk potiuk force-pushed the initial-python-breeze-commit branch from 6ca6f02 to d2c23ed Compare November 29, 2021 19:25
@github-actions github-actions bot added the full tests needed We need to run full set of tests for this PR to merge label Nov 29, 2021
@github-actions
Copy link

The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

@potiuk
Copy link
Member Author

potiuk commented Nov 29, 2021

Any more thought @uranusjr ?

@uranusjr
Copy link
Member

I have more thoughts on environment provision (still don’t like automatically setting up virtual environments much, it can go wrong way to often), but let’s merge this first since those don’t really affect the “actual Breeze” functionalities.

@potiuk
Copy link
Member Author

potiuk commented Nov 30, 2021

I have more thoughts on environment provision (still don’t like automatically setting up virtual environments much, it can go wrong way to often), but let’s merge this first since those don’t really affect the “actual Breeze” functionalities.

Yep. happy to improve it later on :)

@potiuk potiuk merged commit ad07923 into apache:main Nov 30, 2021
@potiuk potiuk deleted the initial-python-breeze-commit branch November 30, 2021 10:33
@potiuk
Copy link
Member Author

potiuk commented Nov 30, 2021

BTW. Following @mik-laj's advice I reserved PyPI package: the https://pypi.org/project/apache-airflow-breeze/ as well. Maybe we can figure out other /parallell workflows with installig the package from PyPI.

@potiuk
Copy link
Member Author

potiuk commented Nov 30, 2021

I have more thoughts on environment provision (still don’t like automatically setting up virtual environments much, it can go wrong way to often), but let’s merge this first since those don’t really affect the “actual Breeze” functionalities.

One more comment: It actually might well be that when Breeze2 reaches maturity and we will not update is as frequently, installing via PyPI and discovery of Airflow might be better workflow - so let's keep it as a possibility. But for now I think especially when we will develop it, keeping it the way where you can switch to latest version of Breeze2 by just git pull is a very much needed feature.

import os
import sys

# Python <3.4 does not have pathlib
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment doesn't line up with code :)

Copy link
Member Author

@potiuk potiuk Nov 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah. That was a mentl shortcut. The comment should be:

We do the check here, before importing pathlib becasue Python <3.4 does not have pathlib and import would fail with cryptic "missing pathlib module" error.

- [Decision](#decision)
- [Consequences](#consequences)

# 1. Record architecture decisions
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. I thin you were the one who motivated me to do it :)

dillonjohnson pushed a commit to dillonjohnson/airflow that referenced this pull request Dec 1, 2021
It includes:

* proposal for initial ADRs (Architecture Decision records)
  where we will keep decision records about both - Breeze2 and CI
* scaffolding for the new breeze command including command line,
  pre-commit checks, automated tests in CI and requirements
@jedcunningham jedcunningham added the changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) label Apr 7, 2022
@potiuk potiuk restored the initial-python-breeze-commit branch April 26, 2022 20:46
@potiuk potiuk deleted the initial-python-breeze-commit branch July 29, 2022 20:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:dev-tools changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) full tests needed We need to run full set of tests for this PR to merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants