gitpodify Apache Airflow - online development workspace #19756

Merged: 1 commit, Nov 30, 2021

j143 (Contributor) commented Nov 22, 2021

At present, the configuration starts up the IDE
with ./breeze -y to set up the Breeze environment. It takes 5 minutes
to load all the Docker images. 😸

How to test?

  1. Visiting the link apache/airflow/pull/16498 fires up the online, ready-to-code workspace.
  2. There will be two terminals. The left one is the Docker terminal; in the right one you can run any tests with Breeze.

Terminals:

image

Testing:

pytest tests/core/test_core.py::TestCore::test_check_operators
root@d143c0ff1e51:/opt/airflow# pytest tests/core/test_core.py::TestCore::test_check_operators
================================= test session starts ==================================
platform linux -- Python 3.6.13, pytest-6.2.4, py-1.10.0, pluggy-0.13.1 -- /usr/local/bin/python
cachedir: .pytest_cache
rootdir: /opt/airflow, configfile: pytest.ini
plugins: httpx-0.12.0, cov-2.12.0, requests-mock-1.9.3, celery-4.4.7, forked-1.3.0, instafail-0.4.2, xdist-2.2.1, rerunfailures-9.1.1, flaky-3.7.0, timeouts-1.2.1
setup timeout: 0.0s, execution timeout: 0.0s, teardown timeout: 0.0s
collected 1 item                                                                       

tests/core/test_core.py::TestCore::test_check_operators PASSED                   [100%]

=================================== warnings summary ===================================
tests/core/test_core.py::TestCore::test_check_operators
  /usr/local/lib/python3.6/importlib/__init__.py:126: DeprecationWarning: This module is deprecated. Please use `airflow.providers.tableau.hooks.tableau`.
    return _bootstrap._gcd_import(name[level:], package, level)

tests/core/test_core.py::TestCore::test_check_operators
  /usr/local/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility. Expected 192 from C header, got 216 from PyObject
    return f(*args, **kwds)

tests/core/test_core.py::TestCore::test_check_operators
  /usr/local/lib/python3.6/site-packages/boto/plugin.py:40: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
    import imp

tests/core/test_core.py::TestCore::test_check_operators
  /usr/local/lib/python3.6/site-packages/dns/hash.py:25: DeprecationWarning: dns.hash module will be removed in future versions. Please use hashlib instead.
    DeprecationWarning)

tests/core/test_core.py::TestCore::test_check_operators
  /usr/local/lib/python3.6/site-packages/eventlet/green/OpenSSL/__init__.py:6: DeprecationWarning: OpenSSL.tsafe is deprecated and will be removed
    from . import tsafe

tests/core/test_core.py::TestCore::test_check_operators
  /usr/local/lib/python3.6/site-packages/alembic/ddl/sqlite.py:44: UserWarning: Skipping unsupported ALTER for creation of implicit constraintPlease refer to the batch mode feature which allows for SQLite migrations using a copy-and-move strategy.
    "Skipping unsupported ALTER for "

tests/core/test_core.py::TestCore::test_check_operators
tests/core/test_core.py::TestCore::test_check_operators
tests/core/test_core.py::TestCore::test_check_operators
  /usr/local/lib/python3.6/site-packages/flask_caching/__init__.py:241: DeprecationWarning: Using the initialization functions in flask_caching.backend is deprecated.  Use the a full path to backend classes directly.
    category=DeprecationWarning,

tests/core/test_core.py::TestCore::test_check_operators
tests/core/test_core.py::TestCore::test_check_operators
tests/core/test_core.py::TestCore::test_check_operators
tests/core/test_core.py::TestCore::test_check_operators
  /usr/local/lib/python3.6/site-packages/marshmallow/fields.py:201: RemovedInMarshmallow4Warning: Passing field metadata as a keyword arg is deprecated. Use the explicit `metadata=...` argument instead.
    RemovedInMarshmallow4Warning,

tests/core/test_core.py::TestCore::test_check_operators
  /opt/airflow/tests/core/test_core.py:102: DeprecationWarning: This class is deprecated.
              Please use `airflow.operators.sql.SQLCheckOperator`.
    task_id='check', sql="select count(*) from operator_test_table", conn_id=conn_id, dag=self.dag

tests/core/test_core.py::TestCore::test_check_operators
  /opt/airflow/tests/core/test_core.py:113: DeprecationWarning: This class is deprecated.
              Please use `airflow.operators.sql.SQLValueCheckOperator`.
    dag=self.dag,

-- Docs: https://docs.pytest.org/en/stable/warnings.html
=========================== 1 passed, 15 warnings in 37.48s ============================

Problems encountered:

  1. "Error response from daemon: driver failed programming external connectivity on endpoint" while running ./breeze --integration mongo

  2. Which ports should be open as public and which as private? (Suggestions please.)
    image

Related: #16480

j143 (Contributor Author) commented Nov 22, 2021

copy to @potiuk and David Brownkush

potiuk (Member) commented Nov 23, 2021

Nice and simple! :) This is what Breeze was created for :). My initial goal was to get started with Airflow in under 10 minutes, so 5 minutes is pretty damn good.

Re: ports: I think all the ports that Breeze has comments about:

   Ports are forwarded to the running docker containers for webserver and database
     * 12322 -> forwarded to Airflow ssh server -> airflow:22
     * 28080 -> forwarded to Airflow webserver -> airflow:8080
     * 25555 -> forwarded to Flower dashboard -> airflow:5555
     * 25433 -> forwarded to Postgres database -> postgres:5432
     * 23306 -> forwarded to MySQL database  -> mysql:3306
     * 21433 -> forwarded to MSSQL database  -> mssql:1443
     * 26379 -> forwarded to Redis broker -> redis:6379

   Here are links to those services that you can use on host:
     * ssh connection for remote debugging: ssh -p 12322 airflow@127.0.0.1 pw: airflow
     * Webserver: http://127.0.0.1:28080
     * Flower:    http://127.0.0.1:25555
     * Postgres:  jdbc:postgresql://127.0.0.1:25433/airflow?user=postgres&password=airflow
     * Mysql:     jdbc:mysql://127.0.0.1:23306/airflow?user=root
     * Redis:     redis://127.0.0.1:26379/0

Re: mongo - not sure what causes the problems. There are some problems with docker-compose v2 for integrations (and networking), so it may be worth checking whether we can configure the docker-compose version used.

Questions @j143 :

I do not know Gitpod that well, but is there a way we could configure some "options" when starting such a VM? For example, it would be great if you could choose the following when starting the VM:

  • backend
  • python version
  • integrations enabled

For integrations - maybe just some predefined sets of those would be enough: (--integrations all switch)

potiuk (Member) commented Nov 23, 2021

If all else fails, I think it would be possible with env variables. Breeze already reacts to environment variables, so you could pass them to your Gitpod instance: https://www.gitpod.io/docs/environment-variables.

Those will be:

  • BACKEND
  • PYTHON_MAJOR_MINOR_VERSION
  • INTEGRATIONS

The last one is the list of integrations enabled.
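
As a rough illustration of how these three variables line up with the Breeze switches used elsewhere in this thread (--backend, --python, --integration), here is a minimal Python sketch. The helper name `breeze_command` is hypothetical, and treating INTEGRATIONS as a whitespace-separated list is an assumption. Breeze already reads these variables itself, so nothing like this is required; it only makes the mapping explicit.

```python
import os
import shlex


def breeze_command(env=None):
    """Build a ./breeze argument list from the variables named above.

    Hypothetical helper for illustration only; it is not part of Breeze.
    """
    env = os.environ if env is None else env
    cmd = ["./breeze", "-y"]

    backend = env.get("BACKEND")
    if backend:
        cmd += ["--backend", backend]

    python_version = env.get("PYTHON_MAJOR_MINOR_VERSION")
    if python_version:
        cmd += ["--python", python_version]

    # Assumption: INTEGRATIONS holds a whitespace-separated list of names.
    for integration in env.get("INTEGRATIONS", "").split():
        cmd += ["--integration", integration]

    return cmd


if __name__ == "__main__":
    # Print the command line that corresponds to the current environment.
    print(shlex.join(breeze_command()))
```

With BACKEND=mysql, PYTHON_MAJOR_MINOR_VERSION=3.8 and INTEGRATIONS=mongo set, the sketch prints `./breeze -y --backend mysql --python 3.8 --integration mongo`; unset variables are simply left out.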

I think for this one to be merged we need a separate "quick-start" (a short version on how to start and how to configure the env variables) in https://github.com/apache/airflow/blob/main/CONTRIBUTORS_QUICK_START.rst .

potiuk (Member) commented Nov 23, 2021

Two more things:
a) licence is missing in the yml file
b) we should also install pre-commit automatically

potiuk (Member) commented Nov 23, 2021

One more cool thing while we are adding it. What's interesting is this one: https://www.gitpod.io/docs/environment-variables#provide-env-vars-via-url.

It should be a follow-up PR, but it would be great if we could add an option to replicate failed CI builds in the Gitpod environment; with this one it seems it should be possible. So it should essentially be possible to add instructions for the user on how to replicate a failed CI build in their Gitpod environment.

Look here:
https://github.com/apache/airflow/blob/main/scripts/ci/testing/ci_run_single_airflow_test_in_docker.sh#L162

Essentially we should be able to craft a URL that creates a gitpodified environment for this specific build configuration:

  1. It will pull the right image from the CI build.
  2. It will set the right environment (backend, Python version, integrations, etc.).
  3. You will be able to replicate failed tests directly there.
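
A minimal sketch of crafting such a URL, assuming the `https://gitpod.io/#NAME=value,NAME2=value2/<repository URL>` form described on the Gitpod documentation page linked above. The function name `gitpod_url` is hypothetical, and the sketch only covers passing environment variables; pulling the right CI image is not addressed.

```python
from urllib.parse import quote


def gitpod_url(repo_url, env):
    """Return a Gitpod start URL that passes `env` as workspace variables.

    Hypothetical helper; the URL layout is an assumption taken from the
    Gitpod "provide env vars via URL" documentation.
    """
    # Values are percent-encoded so that commas, slashes, and spaces cannot
    # be confused with the separators used in the URL fragment.
    pairs = ",".join(
        f"{name}={quote(str(value), safe='')}" for name, value in env.items()
    )
    prefix = f"{pairs}/" if pairs else ""
    return f"https://gitpod.io/#{prefix}{repo_url}"


if __name__ == "__main__":
    print(
        gitpod_url(
            "https://github.com/apache/airflow",
            {"BACKEND": "mysql", "PYTHON_MAJOR_MINOR_VERSION": "3.8"},
        )
    )
```

A CI job could print such a link next to a failed test so the build configuration travels with it, but that wiring would belong to the follow-up PR.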

potiuk (Member) commented Nov 23, 2021

I really like how simple it is to make the environment work with Gitpod + Breeze :). We'll do a very similar thing for Codespaces when they are publicly available.


.. code-block:: bash

$ breeze --backend mysql --mysql-version 5.7 --python 3.8 --db-reset --test-type Core

j143 (Contributor Author) commented Nov 25, 2021

This command gives me an error saying that port 28080 is already allocated. I also have the container running (maybe that is the cause!).

error log
gitpod /workspace/airflow $ breeze --backend mysql --mysql-version 5.7 --python 3.8 --db-reset --test-type Core
Good version of docker 20.10.8.
Backend: mysql

MySQL version: 5.7

Python version: 3.8

Resetting the DB!

Selected test type: Core

mkdir: created directory '/workspace/airflow/.build/main/3.8'
mkdir: created directory '/workspace/airflow/.build/main/3.8/CI'
a69c30ae03621a0e7051da64fcf34eff62de9166c8538ef408793a4caa5af362
a69c30ae03621a0e7051da64fcf34eff62de9166c8538ef408793a4caa5af362

                               Use CI image.

                               Branch name:            main
                               Docker image:           ghcr.io/apache/airflow/main/ci/python3.8:latest
                               Airflow source version: 2.3.0.dev0
                               Python version:         3.8
                               Backend:                mysql 5.7

####################################################################################################
                                  Airflow Breeze CHEATSHEET

/workspace/airflow/breeze
####################################################################################################
 Port forwarding:

   Ports are forwarded to the running docker containers for webserver and database
     * 12322 -> forwarded to Airflow ssh server -> airflow:22
     * 28080 -> forwarded to Airflow webserver -> airflow:8080
     * 25555 -> forwarded to Flower dashboard -> airflow:5555
     * 25433 -> forwarded to Postgres database -> postgres:5432
     * 23306 -> forwarded to MySQL database  -> mysql:3306
     * 21433 -> forwarded to MSSQL database  -> mssql:1443
     * 26379 -> forwarded to Redis broker -> redis:6379

   Here are links to those services that you can use on host:
     * ssh connection for remote debugging: ssh -p 12322 airflow@127.0.0.1 pw: airflow
     * Webserver: http://127.0.0.1:28080
     * Flower:    http://127.0.0.1:25555
     * Postgres:  jdbc:postgresql://127.0.0.1:25433/airflow?user=postgres&password=airflow
     * Mysql:     jdbc:mysql://127.0.0.1:23306/airflow?user=root
     * Redis:     redis://127.0.0.1:26379/0
####################################################################################################
  You can setup autocomplete by running 'breeze setup-autocomplete'


####################################################################################################
  You can toggle ascii/cheatsheet by running:
      * breeze toggle-suppress-cheatsheet
      * breeze toggle-suppress-asciiart

####################################################################################################



Unable to find image 'ghcr.io/apache/airflow/main/ci/python3.8:latest' locally
latest: Pulling from apache/airflow/main/ci/python3.8
a10c77af2613: Already exists
...
Digest: sha256:e19ff75603a5e74a82f39f897426119ad4c61cbbb5f0035b00c216d83f39190e
Status: Downloaded newer image for ghcr.io/apache/airflow/main/ci/python3.8:latest

Checking resources.

* Memory available 63G. OK.
* CPUs available 16. OK.
WARNING!!!: Not enough Disk space available for Docker.
At least 40 GBs recommended. You have 23G

WARNING!!!: You have not enough resources to run Airflow (see above)!
Please follow the instructions to increase amount of resources available:
   Please check https://github.com/apache/airflow/blob/main/BREEZE.rst#resources-required for details


Good version of docker-compose: 1.29.2

WARNING: The ENABLE_TEST_COVERAGE variable is not set. Defaulting to a blank string.
Pulling mysql (mysql:5.7)...
5.7: Pulling from library/mysql

2e35f83a12e9: Pull complete
Digest: sha256:7a3a7b7a29e6fbff433c339fc52245435fa2c308586481f2f92ab1df239d6a29
Status: Downloaded newer image for mysql:5.7
Creating docker-compose_mysql_1 ... done
Creating docker-compose_airflow_run ... done
Error response from daemon: driver failed programming external connectivity on endpoint docker-compose_airflow_run_f081fd6ac899 (0d04b319ee1c4e4ceaed9d626bc70df0cb1ae0699efdf9ccead83aaed72ca420): Bind for 0.0.0.0:28080 failed: port is already allocated
ERROR: 1

Member

Yeah - fixed ports unfortunately :(. I think it might be a good idea to run breeze stop before starting a new instance. It will also make sure that all the DB volumes are cleared and the database will be "fresh like a daisy". You can also always start breeze with the --db-reset switch - this will make sure that every time you initialize the gitpod environment the database is recreated. This is a nice feature, especially if you plan to switch back and forth between branches while the environment is preserved.
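
To see up front whether one of the fixed ports is still held by an earlier instance, a small check like the following could be run before starting Breeze. This is only a sketch: the port numbers are copied from the cheatsheet in the error log above, the function names are hypothetical, and trying to bind a socket is just one way to test availability (it does not tell you which process holds the port).

```python
import socket

# Host ports listed in the Breeze cheatsheet shown earlier in this thread.
BREEZE_PORTS = {
    12322: "Airflow ssh server",
    28080: "Airflow webserver",
    25555: "Flower dashboard",
    25433: "Postgres database",
    23306: "MySQL database",
    21433: "MSSQL database",
    26379: "Redis broker",
}


def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can be bound to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use": something holds the port.
            return False
    return True


def busy_ports(ports=BREEZE_PORTS):
    """Return the subset of `ports` that cannot be bound at the moment."""
    return {port: name for port, name in ports.items() if not port_is_free(port)}


if __name__ == "__main__":
    for port, name in busy_ports().items():
        print(f"port {port} ({name}) is already allocated; try 'breeze stop' first")
```

If the check reports 28080 as busy, running breeze stop as suggested above should release it before the next start.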

CONTRIBUTORS_QUICK_START.rst

1. Breeze is already initialized in one of the terminals in Gitpod

2. Once the breeze environment is initialized, create airflow tables and users from the breeze CLI. ``airflow db reset``

Member

This step might not be needed if you use the --db-reset switch when starting Breeze (see the other comment).

.. code-block:: bash

root@b76fcb399bb6:/opt/airflow# airflow db reset
root@b76fcb399bb6:/opt/airflow# airflow users create --role Admin --username admin --password admin \

potiuk (Member) commented Nov 25, 2021

This should stay here even if you do --db-reset, but since it is only needed when you run/use the webserver, I think you can specify that it is needed only in that case.

I, for example, use the airflow webserver extremely rarely when developing Airflow, and while it is useful to have it, it's mostly not needed to add a core feature or provider.

potiuk (Member) left a comment

I love where it goes - a few more corrections and it should be good to go!

j143 (Contributor Author) commented Nov 25, 2021

Hi Jarek,

I have added the basic docs, and I have added the remaining tasks as a checklist at #16480.

Thanks for the review and suggestions.

Fun thing: I did all the development in the web browser itself. I pressed . once on my keyboard!

potiuk (Member) commented Nov 25, 2021

> fun thing: I have done all the development, on the web browser itself. I pressed . once on my keyboard!

Oh yeah. That's ALMOST Codespaces. Actually, I already have access to Codespaces and I want to do the very same thing you did for gitpodify, to make Breeze start when you enter Codespaces :)

potiuk (Member) commented Nov 25, 2021

BTW, static checks are failing :). That's why it would have been great to integrate pre-commit from the get-go :D

potiuk (Member) commented Nov 25, 2021

How about the comments with --db-reset ?

j143 (Contributor Author) commented Nov 25, 2021

> How about the comments with --db-reset ?

I thought I had added a comment on breeze stop.

I have added these notes as far as I understood - fc45c3e

Please provide any comments on that. :)

potiuk (Member) commented Nov 25, 2021

> please provide any comments on that. :)

I think maybe (not 100% sure whether you think it's a good idea) you should add --db-reset to the command starting breeze in the gitpod configuration (./breeze --db-reset -y). This way you could simply remove the instruction about "running airflow db init once", as the db will be initialized when you enter breeze. The drawback is that every time you enter the environment again the db will be cleaned, but I think that happens anyway - I do not know how long the remote VMs are cached and whether you always recreate them when you reconnect.

j143 (Contributor Author) commented Nov 25, 2021

> I think maybe (not 100% sure if you think it's a good idea) you shoud add --db-reset in the command starting breeze in the gitpod configuration (./breeze --db-reset -y).

Is it OK if I skip this note and add it as a new task in #16480? I need to spend a little more time on this to understand it better.

But if you are suggesting adding ./breeze --db-reset -y, I will do it.

* starts the workspace with ./breeze -y
* opens another terminal with bash
* add documentation for opening Gitpod workspace, creating a branch,
  making changes
* also, the instructions about setting up and working with `breeze`
* add workaround for setting PIP_USER=no variable

j143 (Contributor Author) commented Nov 30, 2021

Hi @potiuk, I have recently rebased it into a single commit. I hope the main points were addressed. :)

@potiuk potiuk merged commit 5ebd63a into apache:main Nov 30, 2021

potiuk (Member) commented Nov 30, 2021

@j143 - you might be interested to know that we have just started the project of rewriting Breeze as a Python-based version. The first commit, which implements the scaffolding and explains some decisions behind the project, is here: #19867

You might want to contribute to it and eventually switch the gitpodified experience to it (and any comments, suggestions, improvements, or contributions while we develop it are most welcome).

j143 (Contributor Author) commented Nov 30, 2021

Thank you @uranusjr for review. 😺

dillonjohnson pushed a commit to dillonjohnson/airflow that referenced this pull request Dec 1, 2021
@ephraimbuddy ephraimbuddy added the changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) label Apr 14, 2022