Quick start for Airflow on Mac OS #14231

Closed
mik-laj opened this issue Feb 15, 2021 · 24 comments

@mik-laj
Member

mik-laj commented Feb 15, 2021

Hello,

Installing Airflow on Mac OS in the most common configuration (Postgres / MySQL, Celery, Redis) causes problems for new users for several reasons:

It would be great if we had a guide that describes how to install Airflow on Mac OS without falling into any of the traps related to installing dependencies. I am thinking of a new guide similar to the Docker quick start and the local quick start. The guide would configure an environment similar to the Docker one, but install everything locally. Some users do not use Docker or are not comfortable with it, so we should also describe native installations. The local quick start guide is also not sufficient, as it does not describe Mac OS specific problems and sets up an environment that requires a lot of changes before it can be used effectively.

In the future, I also want to prepare installation guides for Linux and Windows, but I would prefer to deal with each environment step by step. For this reason, I am also not very interested in updating the local quick start guide. This too will benefit end users, as they will have a quick start guide that addresses their needs more precisely. They will not have to search for information that is specific to their operating system.

Part of: #13838

@sdanbury

Went through this process today. It wasn't clear where to start exactly, so I started with breeze and went through the following setup on Mac OS. I came across a lot of the issues you outlined as well and basically spent a large amount of time just trying to get all the dependencies installed.

The kind/helm setup seems like it might have been a lot quicker in hindsight, but I wanted to go through the process outlined in the contributing doc.

Do any of the contributors use a docker-based workflow when making development contributions, or is it mainly the Breeze setup? I am comfortable with docker so naturally lean towards containerised workflows like the ones you outlined. However, if I want the path of least resistance when making contributions to Airflow, is it best that I persevere with the Breeze setup?

@potiuk
Member

potiuk commented Feb 15, 2021

Thanks @sdanbury for the feedback.

There are two parts of the setup:

  1. Local dev env. It is indeed always "brittle" because of the dependencies, and it is generally harder to keep in sync with changes and newly added dependencies (especially with the CI environment, where we run most of the tests).

  2. Breeze - which is local Bash + docker-compose/docker setup. Breeze environment is in fact dockerized environment under the hood and the breeze script mainly automatically manages the docker environment for you and makes it easy to run certain configurations (databases/integration/python versions) as well as run some common tasks in a consistent way (such as building images, building documentation, running consistent static code checks etc). Also it is identical to the environment used in CI - so you can very easily replicate any of the problems you see in the CI with it. It is also meant to be easy for anyone - even by people who have no prior docker experience but still makes use of the docker goodies (i.e. consistent development environment shared between people).

Both of them have different purposes (https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#development-environments)

Could you please elaborate a bit more what your problems were with the dependencies?

  • Were the problems you mentioned more on the local dev env setup or the Breeze env setup (the dockerized) one?

  • What did you mean by a "docker-based" workflow? How would that be different from Breeze? I know a few committers who are more experienced and familiar with docker are running their own docker or docker-compose environments. We also (as of recently) have a simple way to "use" (but not develop) Airflow with docker-compose (https://github.com/apache/airflow/blob/master/docs/apache-airflow/start/docker.rst).

I'd love to hear more from the fresh contributor perspective on how we can improve the experience here :)

Maybe you already can have some proposals and even PRs on how to improve our documentation/workflow process? It would be awesome to get it from a "newcomer" but with some experience, to see how many assumptions we have in our heads and how the "initial" experience can be improved.

@sdanbury

sdanbury commented Feb 15, 2021

Apologies, I think there was some initial confusion on my part. It appeared on the surface that breeze was mainly a non-containerised approach, but it appears to be a hybrid approach as you say (some of it uses docker/docker-compose, some doesn't).

The confusion was mainly around the ./breeze initialize-local-virtualenv --python 3.6 step, which I believe falls into your 1. Local dev env point above. This took a really long time for me on my setup. Most of the issues were raised by @mik-laj in the issue description; some examples:

  • cryptography requiring a rust compiler
  • This issue requiring me to set some environment variables
  • Needing to brew install mysql

There were a few other little things. Without a fully containerised setup (where you don't need a local venv) I can't see a quick way of getting past stuff like this, you just have to invest the time upfront, hopefully get through it and create docs. Although I appreciate that this isn't always possible for newcomers to the project.

I have worked with Airflow for a couple of years now and have always just rolled my own local setup. We are running airflow on k8s with the k8s executor and I wanted to revisit my local airflow setup, hence why I started going through this process.

Very keen to begin contributing back. This seems like a great place to start. I will continue down this path, gather some thoughts together and open some PRs if I come across something.

@potiuk
Member

potiuk commented Feb 15, 2021

Oh yeah, ./breeze initialize-local-virtualenv --python 3.6 is just a convenience setup to do just this - set up the virtualenv on either Linux or MacOS automatically. No more, no less.

It's just a convenient way of running initialization, creating the empty sqlite database and initializing configuration for airflow. With breeze being the entrypoint for everything else, it seems reasonable to add it there.

And it worked just fine until 7 days ago.

This is the message printed when the initialize command fails:

#######################################################################
  You had some troubles installing the venv !!!!!
  Try running the command below and rerun virtualenv installation

      brew install sqlite mysql postgresql openssl

      export LDFLAGS="-L/usr/local/opt/openssl/lib"
      export CPPFLAGS="-I/usr/local/opt/openssl/include"

#######################################################################

A similar message is printed when you run breeze initialize-local-virtualenv on Linux. It tells you the apt command that you most likely need to run for it to succeed.

So what @mik-laj is proposing here has already been working and implemented as breeze initialize-local-virtualenv command interactive instructions.

My proposal is then @sdanbury - maybe you can make your first PR and:

  • fix the cryptography issue (I believe it is a single env variable to add)
  • display all the instructions not only after but also before initialization (and ask the user to confirm)
  • make the message displayed in YELLOW to grab the user's attention (we recently started coloring terminal output)
  • review the docs to see if we have those prerequisites nicely described

I think the "interactive" single command that tells you what to do is better than list of prerequisites in docs (although we have both already I believe). Simply fixing what was already there and working seems better than reinventing stuff.
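The "show the instructions before initialization, ask for confirmation, print in yellow" idea from the list above could be sketched roughly as follows. This is only an illustration and not Breeze's actual code: the function names print_prerequisites, confirm and run_with_confirmation are hypothetical, and the message text is adapted from the one quoted earlier.

```shell
#!/bin/sh
# Hypothetical sketch: none of these function names exist in Breeze.

# Print the macOS prerequisites in yellow (ANSI colour code 33).
print_prerequisites() {
    printf '\033[33m'   # switch the terminal colour to yellow
    cat <<'EOF'
Before creating the virtualenv, make sure the system dependencies are installed:

    brew install sqlite mysql postgresql openssl

    export LDFLAGS="-L/usr/local/opt/openssl/lib"
    export CPPFLAGS="-I/usr/local/opt/openssl/include"

EOF
    printf '\033[0m'    # reset the colour
}

# Ask the user to confirm; succeeds only if the answer starts with y or Y.
confirm() {
    printf 'Continue with virtualenv initialization? [y/N] '
    read -r answer || return 1
    case "${answer}" in
        [Yy]*) return 0 ;;
        *) return 1 ;;
    esac
}

# Show the instructions first, then run the given command only when confirmed.
run_with_confirmation() {
    print_prerequisites
    if confirm; then
        "$@"
    else
        echo "Aborted."
        return 1
    fi
}
```

With these helpers, the existing command could be wrapped as run_with_confirmation ./breeze initialize-local-virtualenv --python 3.6, so the hint is shown before anything is installed rather than only after a failure.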

@mik-laj - what do you think? Do you have anything against an interactive setup in this case?

@mik-laj
Member Author

mik-laj commented Feb 15, 2021

@mik-laj - what do you think? Do you have anything against an interactive setup in this case?

If we manage to prepare a script that will configure a few steps automatically, I'm all for it. We should test it very well to make sure that it makes no assumptions about the tools installed on the operating system, e.g. the Python interpreter. It should be easy to use by end-users, not just contributors, so it shouldn't require downloading the full repository; a simple little script that we can download via curl sounds good to me. We can take inspiration from https://get.docker.com/ or https://sh.vector.dev.

PS. I recently bought a few MacOS and Windows computers, so I will have a playground to test all scripts to limit any assumptions. On a daily computer, I cannot install the system to test the tutorial, but separate machines will allow me to do so.

@potiuk
Member

potiuk commented Feb 15, 2021

so it shouldn't require downloading the full repository

But we are talking about installing it for contributors - not users - at least this is what I believe @sdanbury was talking about.

@potiuk
Member

potiuk commented Feb 15, 2021

I think for users, this should be simply description in the places we already have - again, let's not reinvent the wheel:

So I guess we should simply update those two places? I think we do not need any scripts for users really. And certainly Breeze should not be used by users - it's a contributor's tool.

@mik-laj
Member Author

mik-laj commented Feb 15, 2021

For contributors, we already have a script, and of course if it has bugs we should fix it. In this ticket, I tried to focus on the documentation for end users, as I believe this is the biggest problem. See: #13838

Part of: #13838

I think for users, this should be simply description in the places we already have - again, let's not reinvent the wheel:

In my opinion, there are several types of documentation, each with its own audience and purpose. First of all, both guides you mentioned are intended for advanced users and they try to describe all the information most accurately and at the same time do not describe some steps if they fall outside the scope of this project, e.g. they do not describe the configuration of the Python interpreter because it assumes that every user has a Python interpreter (this is a trap because even if it has a good version, it can be badly compiled).

"A quick start guide is a very simple guide with only the most important information that is required to get start with using the product or service. A User manual on the other hand needs to be much more comprehensive and cover all aspects of the product or service. It needs to take into account all the ways that a user might use your product and provide relevant help to complete the relevant tasks." (https://www.quora.com/What-are-the-important-functional-differences-between-a-Quick-Start-Guide-and-an-User-Manual-Guide)

In my opinion, we lack documentation intended for novice users who are not experts in Python and system administration, but who just want to install Airflow and start experimenting with it. They don't need to make decisions about the type of database because they don't need that knowledge. They need a single database that's easy to install and reliable.

You can think of this as a guide directed at our close friend - @mschickensoup. I have the impression that she doesn't need and doesn't even want to learn everything about installing Airflow on all operating systems. She knows that she has a Mac OS computer and that she has access to training materials from @marc Lambert that teach her how to code DAG files. I have the impression that although she would like to learn everything, it is too laborious and I think she does not need it at all. She just wants to learn how to use Airflow, not how to administer operating systems.

For this reason, I would like to prepare a guide that will not describe all possible scenarios but describe only one step-by-step installation scenario.

Do you think that we currently have a documentation guide that our friend Karolina Rosół could use to prepare the environment for learning how to write DAGs? Do you think that a similar guide would be worth contributing for her?

@mik-laj
Member Author

mik-laj commented Feb 15, 2021

I also recommend these two articles that explain the differences between a user manual and a quick start guide.
https://medium.com/@shavindridissanayake/are-quick-start-guides-important-cee3b2758ae8
https://medium.com/make-it-clear/the-importance-of-quick-start-guides-26ab7b1cf0df

@potiuk
Member

potiuk commented Feb 15, 2021

I also recommend these two articles that explain the differences between a user manual and a quick start guide.
https://medium.com/@shavindridissanayake/are-quick-start-guides-important-cee3b2758ae8
https://medium.com/make-it-clear/the-importance-of-quick-start-guides-26ab7b1cf0df

Why don't we turn the "installation" into a "quick start guide"? I really think both INSTALL and installation.html are exactly the type of document described by the links you sent. And they are rather close to this ideal. They might need some improvements, but claiming that we need "new" documents there is, I think, not justified. We need to improve what is there already.

We have a lot more "user manual" documents, but the two we have are serving the purpose. I would rather focus on improving those than multiply the number of documents we have.

@mik-laj
Member Author

mik-laj commented Feb 16, 2021

@potiuk installation.rst is a user manual document that describes all possible installation scenarios. It is a very complex document and does not contain information on the configuration of other components, e.g. a database. This document also does not describe a step-by-step installation, but requires the user to decide which steps to follow, e.g. you first need to install the database in order to install Airflow.

Let's just look at what is included in the various sections of this page.

  • Prerequisites: this section uses the words "Kubernetes", "MySQL", "PostgreSQL", but for a novice user these are buzzwords that say nothing, and he is not able to decide whether he needs Kubernetes, MySQL or PostgreSQL. The novice user expects that this decision will be made for him and that he will be told "Install PostgreSQL", with a link to detailed documentation if he wishes to change this selection.

  • Installation tools: A novice user does not need information on custom installation tools. He doesn't even know these tools, so reading this section is a waste of time. However, when he sees this section, he may think that we expect him to make some decisions and perform extra steps to use these tools.

  • Airflow extra dependencies: This section describes the installation of additional components that are not needed for the first use.

  • Provider packages: This section describes the installation of additional components that are not needed for the first use.

  • Differences between extras and providers: This section describes the installation of additional components that are not needed for the first use.

  • System dependencies: This describes only one operating system, and here we should add information about other operating systems and indicate which dependencies are not needed for the minimal installation. This section is not useful for the novice user because it contains too much information. For example, it contains information about Kerberos installation and ODBC drivers, which is not useful knowledge for the novice user. Instead of 6 paragraphs of text (for now, we have 2 paragraphs, but that should be extended), we can show the user just one paragraph of text and add a link to this section for more detailed information.

  • Constraints files: This section provides information on floating pointers, several Airflow versions, and much more. The novice user needs brief information that best suits his situation.

  • Installation script: In this section, the user also has to make decisions, but the novice user does not have enough knowledge to do so.

  • Python versions support: This section is not useful for Karolina Rosół when she wants to install Airflow for the first time. Additionally, it contains complex terms that she will have problems understanding, e.g. EOL, smoke tests, PRs, CI Pipeline, non-patch version.

  • Set up a database: Database installation is another complex process that can also be made easier for the user if we, as experts, make several decisions, e.g. database type, executor type, or if we know the audience (operating system) and prepare the content especially for them.

Why is a quick start guide important? A quick start guide is a document that gives a user an overall understanding of the product in a short time span (5 to 10 minutes). On the other hand, we have a user manual that covers many installation scenarios and of course, if you have the skills and time you can install with this application documentation, but it won't be easy. Quick start does not replace the User Manual in any way, but is a summary of it. Any information that is in the Quick Start Guide should also be found in the User manual, but not all information in the User Manual is needed for a beginner.

@potiuk
Member

potiuk commented Feb 16, 2021

Why not just add a "quick installation" chapter to the installation document? It could be the first chapter. It cannot be long: if it's going to be a 5-10 minute installation, it should be at most one or two paragraphs per system. Then it could be followed by more complex scenarios.

My point is discoverability. "Installation" is the link that people will be following from the documentation. It could be separated out as separate document of course, but this is simply a variant of installation that we are talking about.

@mik-laj
Member Author

mik-laj commented Feb 16, 2021

It could be the first chapter. It cannot be long: if it's going to be a 5-10 minute installation, it should be at most one or two paragraphs per system.

This won't be two paragraphs when it comes to installing on Mac OS, because you need to install the correct version of the Python interpreter (probably using pyenv), PostgreSQL, Redis, OpenSSL, the PostgreSQL client, a Rust compiler, the GCC compiler, Xcode tools, and possibly other dependencies. You need to create a schema and user in the database. You need to set some options in the Airflow configuration. All of this has to be done in the right order and preferably in one terminal session, because sometimes you need to set environment variables to configure options for the compiler. Unfortunately, we have many dependencies that make configuration not easy.
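To give a feel for how many ordered steps this involves, here is a rough sketch of such a session. It is not a tested guide. The Airflow and Python versions, the database name, user and password, the choice of CeleryExecutor, and the helper names run, constraint_url and setup_macos are all assumptions made for illustration. The OpenSSL paths are copied from the Breeze hint quoted earlier in this thread. By default the script only prints the commands (DRY_RUN=1). Selecting the pyenv-managed interpreter is left out and should follow pyenv's own instructions.

```shell
#!/bin/sh
# Sketch only: versions, names and passwords below are illustrative assumptions.
AIRFLOW_VERSION="${AIRFLOW_VERSION:-2.0.1}"
PYTHON_VERSION="${PYTHON_VERSION:-3.8}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, anything else = execute them

# Print a command and, unless DRY_RUN=1, execute it.
run() {
    echo "+ $*"
    if [ "${DRY_RUN}" != "1" ]; then
        "$@"
    fi
}

# Build the URL of the constraints file for the chosen Airflow/Python versions.
constraint_url() {
    echo "https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
}

setup_macos() {
    # 1. Compiler toolchain and system packages.
    run xcode-select --install
    run brew install pyenv postgresql redis openssl rust gcc

    # 2. Compiler flags for the Homebrew OpenSSL (paths from the Breeze hint).
    export LDFLAGS="-L/usr/local/opt/openssl/lib"
    export CPPFLAGS="-I/usr/local/opt/openssl/include"

    # 3. Start the background services.
    run brew services start postgresql
    run brew services start redis

    # 4. Create the metadata database and a user for Airflow.
    run psql -d postgres -c "CREATE DATABASE airflow_db;"
    run psql -d postgres -c "CREATE USER airflow_user WITH PASSWORD 'airflow_pass';"
    run psql -d postgres -c "GRANT ALL PRIVILEGES ON DATABASE airflow_db TO airflow_user;"

    # 5. Install Airflow itself, pinned by the constraints file.
    run pip install "apache-airflow[postgres,celery,redis]==${AIRFLOW_VERSION}" --constraint "$(constraint_url)"

    # 6. Point Airflow at the database and the broker (same terminal session).
    export AIRFLOW__CORE__EXECUTOR="CeleryExecutor"
    export AIRFLOW__CORE__SQL_ALCHEMY_CONN="postgresql+psycopg2://airflow_user:airflow_pass@localhost/airflow_db"
    export AIRFLOW__CELERY__BROKER_URL="redis://localhost:6379/0"
    export AIRFLOW__CELERY__RESULT_BACKEND="db+postgresql://airflow_user:airflow_pass@localhost/airflow_db"
}

setup_macos
```

Even in this reduced form the order matters: the compiler flags have to be exported before pip builds any native dependencies, and the database has to exist before Airflow is pointed at it.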

My point is discoverability. "Installation" is the link that people will be following from the documentation. It could be separated out as separate document of course, but this is simply a variant of installation that we are talking about.

We can change the position of the items in the menu, or add annotations that the easier guide is in another section. We currently have a page that has a similar purpose, but lacks an article that describes more Mac OS specific information.
http://airflow.apache.org/docs/apache-airflow/stable/start/index.html
After installation, the user has to take a few additional steps to be able to use Airflow, e.g. perform migrations, create a user, run the webserver and scheduler. It's a bit more than just the installation itself.
http://airflow.apache.org/docs/apache-airflow/stable/start/docker.html#initializing-environment
Please have a look at the article that describes installations in a containerized Docker environment - docker-compose. This article contains the following sections:

  • Before you begin - describes prerequisites. We do not have specific requirements here, so we can recommend official documentation.
  • docker-compose.yaml - a description of how to download the docker-compose.yaml file. If you have MacOS, you can run one command - docker-compose up - to install Airflow.
  • Initializing Environment - We also need to describe information specific to other operating systems, and docker-compose up is not a nice command as it generates a lot of logs, so I decided to create a new section. This is still the installation stage.
  • Running Airflow - Description of how to start Airflow, which is beyond the installation stage. This point is important for the user, so I also chose to include a sample result so that the user can compare with their result.
  • Accessing the environment - Description not only of how to start, but also how to perform some of the simplest operations in Airflow.
  • Notes/What’s Next? - Links to other helpful articles
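For a local (non-Docker) installation, the extra steps mentioned above (performing migrations, creating a user, running the webserver and scheduler) might look roughly like the sketch below. It assumes the Airflow 2.0 CLI is already installed and configured. The admin credentials, the port, and the helper names run and first_run are placeholders chosen for illustration. As written, it only prints the commands unless DRY_RUN is set to something other than 1.

```shell
#!/bin/sh
# Sketch only: credentials and port are placeholder assumptions.
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, anything else = execute them

# Print a command and, unless DRY_RUN=1, execute it.
run() {
    echo "+ $*"
    if [ "${DRY_RUN}" != "1" ]; then
        "$@"
    fi
}

first_run() {
    # Create or migrate the metadata database schema.
    run airflow db init
    # Create an account for logging in to the web UI.
    run airflow users create --username admin --password admin --firstname Admin --lastname User --role Admin --email admin@example.com
    # Start the webserver in the background, then the scheduler in the foreground.
    run airflow webserver --port 8080 --daemon
    run airflow scheduler
}

first_run
```

A quick start page could show the expected output after each of these commands, in the same way the Docker article includes a sample result for comparison.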

@potiuk
Member

potiuk commented Feb 16, 2021

So should not we restructure the installation page to do this step-by-step installation ? I still do not understand why "quick" installation guide should not be part of the installation document :)?

If it's difficult to follow installation.rst to do the installation then we should update and fix that page - and maybe move "advanced installation" elsewhere. My point is that we already have an installation page - if we start having separate documents people will be confused - should I follow the "quick one" or the "full one"? I think the installation page should be the starting point and it should explain various installation options - from the simple to advanced ones. Maybe they should be separated out into "sub-pages" - each option as a separate page - but they should simply be part of the "installation" area of the docs.

@potiuk
Member

potiuk commented Feb 16, 2021

For example, I imagine a structure like this:

Installation
    > Installation for local testing and DAG development (quick installation)
        > MacOS
        > Linux
        > Windows
    > Celery deployment (Linux only)
    > Kubernetes deployment
    > Advanced installation topics

Maybe the names should be a bit different. Maybe, if the content is too big, they should be sub-pages of it. But they should all be part of installation (if we are talking about the users not contributors).

@mik-laj
Member Author

mik-laj commented Feb 16, 2021

So should not we restructure the installation page to do this step-by-step installation ?

This does not always work because we support many installation scenarios and configurations and we will not always be able to prepare a guide. I just want one page that will allow you to install Airflow from scratch without having to make difficult decisions that require expert knowledge, and which will help you install Airflow in 5 minutes. Also, such an article should not contain unnecessary information that is not applicable in a given situation, e.g. information about all kinds of constraints files and extra packages.

Most users aren't interested in whether they use MySQL or PostgreSQL, or whether they have LocalExecutor or SequentialExecutor. They only read the documentation about it when they have problems. We, as experts, can decide which database/executor is best for the novice user. A novice doesn't need to know it, because it doesn't make any difference to him if he just wants to quickly start Airflow and try to use it. If he starts using Airflow, he can modify the environment and try to adapt it better to his needs.

I think the installation page should be the starting point and it should explain various installation options - from the simple to advanced ones. Maybe they should be separated out into "sub-pages" - each option as a separate page - but they should simply be part of the "installation" area of the docs.

Yes. We can move all the quick start guides to the "Installation" section, but that is a different issue. I think it is worth moving other articles to the new "Installation" section as well, e.g. Set up a Database Backend. For now, I prefer to focus on the existing section - Quick start.

@potiuk
Member

potiuk commented Feb 16, 2021

Yes. We can move all the quick start guides to the "Installation" section, but that is a different issue. I think it is worth moving other articles to the new "Installation" section as well, e.g. Set up a Database Backend. For now, I prefer to focus on the existing section - Quick start.

So I think we are getting to the point here:

  • Yes - let's make separate "quick install pages" under the installation page
  • yes - let them be separate pages
  • yes - let's move "Set up a Database Backend" to be a sub-page of it

BUT we should restructure the installation page (and INSTALLATION for people using sources) to be the single "point of entry" for all those documents. I think when we see that something is messy and difficult to follow, we should not add more confusion by creating a different variant of the "installation". Let's restructure the installation page first so that anyone looking for installation gets there and reaches the "quick installation" from there. Adding new "variants" without explaining how they relate to the previous "unclear" instructions will add to the mess rather than solve it.

This is my whole point. Let's clean and then add new stuff where it fits.

@r12habh

r12habh commented Apr 16, 2021

@mik-laj
It would be great if new contributors who use macOS can get an installation guide for airflow 2.0.1dev like this
https://medium.com/@rafaelbottega/from-zero-to-apache-airflow-contribution-part-1-474ac201b53e [this is for version 1, I guess]

And I am stuck here: https://github.com/apache/airflow/blob/master/CONTRIBUTORS_QUICK_START.rst#setting-up-breeze
Workbench isn't detecting any MySQL server running, but the docker dashboard is showing that it's running.

@ManiBharataraju

@mik-laj - I somehow managed to do all the steps in my mac to get the UI up but when I go to the UI I am unable to click on anything and see something weird like this. Could you please let me know what has gone wrong or what is missed?
image

@potiuk
Member

potiuk commented Jun 11, 2021

If you install airflow "from sources" and you do not use breeze, you need to run python setup.py compile_assets. I think this one is missing in some of our guides. It would be great, @ManiBharataraju, if you could contribute a fix to our contributor's guide in the places you followed where it was missing.

@ManiBharataraju

@potiuk Apologies for the delayed response. That helped. Thanks!
But I have a question here, I ran the following command ./breeze --python 3.8 --backend mysql which lands me into a container and running the command there helped. Is this not breeze? I believe it is.
Regarding adding that step to the guide, I will add it.

@potiuk
Member

potiuk commented Jun 17, 2021

Ah yeah, in breeze it's the same. The first time, you need to run it, and I believe it even prints in yellow that you should - as a warning.

@eladkal
Contributor

eladkal commented Jul 13, 2022

I think the points raised in this issue are already addressed in the new (Python based) Breeze?

@potiuk
Member

potiuk commented Jul 13, 2022

yep

@potiuk potiuk closed this as completed Jul 13, 2022