Quick start for Airflow on Mac OS #14231
Comments
Went through this process today. It wasn't clear where to start exactly, so I started with breeze and went through the following setup on Mac OS. I came across a lot of the issues you outlined as well and basically spent a large amount of time just trying to get all the dependencies installed. The kind/helm setup seems like it might have been a lot quicker in hindsight, but I wanted to go through the process outlined in the contributing doc. Do any of the contributors use a docker-based workflow when making development contributions, or is it mainly the breeze setup? I am comfortable with docker so naturally lean towards containerised workflows like the ones you outlined. However, if I want the path of least resistance when making contributions to airflow, is it best I persevere with the breeze setup? |
Thanks @sdanbury for the feedback. There are two parts of the setup:
Both of them have different purposes (https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#development-environments). Could you please elaborate a bit more on what your problems were with the dependencies?
I'd love to hear more from the fresh contributor perspective on how we can improve the experience here :) Maybe you already have some proposals and even PRs on how to improve our documentation/workflow process? It would be awesome to get it from a "newcomer" but with some experience, to see how many assumptions we have in our heads and how the "initial" experience can be improved. |
Apologies, I think there was some initial confusion on my part. It appeared on the surface that breeze was mainly a non-containerised approach, but it appears to be a hybrid approach as you say (some of it uses docker/docker-compose, some doesn't). The confusion was mainly around the
There were a few other little things. Without a fully containerised setup (where you don't need a local venv) I can't see a quick way of getting past stuff like this, you just have to invest the time upfront, hopefully get through it and create docs. Although I appreciate that this isn't always possible for newcomers to the project. I have worked with Airflow for a couple of years now and have always just rolled my own local setup. We are running airflow on k8s with the k8s executor and I wanted to revisit my local airflow setup, hence why I started going through this process. Very keen to begin contributing back. This seems like a great place to start. I will continue down this path, gather some thoughts together and open some PRs if I come across something. |
Oh yeah, it's just a convenient way of running initialization, creating the empty sqlite database and initializing configuration for airflow. With And it worked just fine until 7 days ago.
This is the message printed when the
Similar message is printed when you run So what @mik-laj is proposing here has already been working and implemented as My proposal is then @sdanbury - maybe you can make your first PR and:
I think the "interactive" single command that tells you what to do is better than a list of prerequisites in docs (although we have both already, I believe). Simply fixing what was already there and working seems better than reinventing stuff. @mik-laj - what do you think? Do you have anything against interactive setup in this case? |
If we manage to prepare a script that will configure a few steps automatically, I'm all for it. We should test it very well to make sure that it has no assumptions about the tools installed on the operating system, e.g. the Python interpreter. It should be easy to use by end-users, not just contributors, so it should not require downloading the full repository, but a little simple script that we can download via curl sounds good to me. We can be inspired by PS. I recently bought a few Mac OS and Windows computers, so I will have a playground to test all scripts to limit any assumptions. On a daily computer, I cannot install the system to test the tutorial, but separate machines will allow me to do so. |
But we are talking about installing it for contributors - not users - at least this is what I believe @sdanbury was talking about. |
I think for users, this should be simply description in the places we already have - again, let's not reinvent the wheel:
So I guess we should simply update those two places? I think we do not need any scripts for users really. And certainly Breeze should not be used by users - it's a contributor's tool |
For contributors, we already have a script, and of course if it has bugs we should fix it. In this ticket, I tried to focus on the documentation for end users, as I believe this is the biggest problem. See: #13838
In my opinion, there are several types of documentation, each with its own audience and purpose. First of all, both guides you mentioned are intended for advanced users. They try to describe all the information as accurately as possible, but at the same time they skip steps that fall outside the scope of this project, e.g. they do not describe the configuration of the Python interpreter because they assume that every user has a Python interpreter (this is a trap because even if it is a good version, it can be badly compiled). "A quick start guide is a very simple guide with only the most important information that is required to get start with using the product or service. A User manual on the other hand needs to be much more comprehensive and cover all aspects of the product or service. It needs to take into account all the ways that a user might use your product and provide relevant help to complete the relevant tasks." (https://www.quora.com/What-are-the-important-functional-differences-between-a-Quick-Start-Guide-and-an-User-Manual-Guide) In my opinion, we lack documentation intended for novice users who are not experts in Python and system administration, but who just want to install Airflow and start experimenting with it. They don't need to make decisions about the type of database because they don't need that knowledge. They need a single database that's easy to install and reliable. You can think of this as a guide directed at our close friend - @mschickensoup. I have the impression that she doesn't need and doesn't even want to learn everything about installing Airflow on all operating systems. She knows that she has a Mac OS computer and that she has access to training materials from @marc Lambert that teach her how to code DAG files. I have the impression that although she would like to learn everything, it is too laborious and I think she does not need it at all.
She just wanted to learn how to use Airflow, not how to administer operating systems. For this reason, I would like to prepare a guide that will not describe all possible scenarios but describe only one step-by-step installation scenario. Do you think that we currently have a documentation guide that our friend Karolina Rosół could use to prepare the environment for learning how to write DAGs? Do you think that a similar guide would be worth contributing for her? |
I also recommend these two articles that explain the differences between a user manual and a quick start guide. |
Why don't we turn the "installation" into a "quick start guide"? I really think both INSTALL and installation.html are exactly the type of document which is described by the links you sent. And they are rather close to this ideal. They might need some improvements, but claiming that we need "new" documents there is, I think, not justified. We need to improve what is there already. We have a lot more "user manual" documents, but those two which we have are serving the purpose. I would rather focus on improving those than multiply the number of documents we have. |
@potiuk installation.rst is a user manual document that describes all possible installation scenarios. It is a very complex document and does not contain information on the configuration of other components, e.g. a database. This document also does not describe a step-by-step installation, but requires the user to decide which steps to follow, e.g. you first need to install the database in order to install Airflow. Let's just look at what is included in the various sections of this page.
Why is a quick start guide important? A quick start guide is a document that gives a user an overall understanding of the product in a short time span (5 to 10 minutes). On the other hand, we have a user manual that covers many installation scenarios and of course, if you have the skills and time you can install with this application documentation, but it won't be easy. Quick start does not replace the User Manual in any way, but is a summary of it. Any information that is in the Quick Start Guide should also be found in the User manual, but not all information in the User Manual is needed for a beginner. |
Why not just add a "quick installation" chapter to the installation document? It could be the first chapter. It cannot be long: if it's going to be a 5-10 minute installation, it should be at most one or two paragraphs per system. Then it could be followed by more complex scenarios. My point is discoverability. "Installation" is the link that people will be following from the documentation. It could be separated out as a separate document of course, but this is simply a variant of installation that we are talking about. |
This won't be two paragraphs when it comes to installing on Mac OS, because you need to install the correct version of the Python interpreter (probably using pyenv), PostgreSQL, Redis, OpenSSL, the PostgreSQL client, the Rust compiler, the GCC compiler, Xcode tools, and possibly other dependencies. You need to create a schema and a user in the database. You need to set some options in the Airflow configuration. All of that has to be done in the right order and preferably in one terminal session, because sometimes you need to set up environment variables to configure options for the compiler. Unfortunately, we have many dependencies that make configuration not easy.
We can change the position of the items in the menu, or add annotations that the easier guide is in another section. We currently have a page that has a similar purpose, but lacks an article that describes more Mac OS specific information.
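To make the prerequisite list above more concrete, here is a minimal sketch of a pre-flight check that reports which of those tools are not on `PATH`. It is only an illustration, not an existing Airflow or Breeze script: the executable names in `REQUIRED_TOOLS` are assumptions about how each dependency typically shows up on a Mac, and the sketch itself assumes that some Python 3 interpreter is already available.

```python
import shutil

# Hypothetical mapping: executable name -> dependency it stands for.
# The executable names are assumptions about a typical Mac OS setup.
REQUIRED_TOOLS = {
    "pyenv": "Python version manager",
    "psql": "PostgreSQL client",
    "redis-server": "Redis",
    "openssl": "OpenSSL",
    "rustc": "Rust compiler",
    "gcc": "GCC compiler",
    "xcode-select": "Xcode tools",
}


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the names of tools that cannot be found on PATH."""
    return [name for name in tools if which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    for name in missing:
        print(f"missing: {name} ({REQUIRED_TOOLS[name]})")
    if not missing:
        print("all listed tools were found on PATH")
```

A check like this only confirms that an executable exists; it says nothing about versions or about whether the interpreter was compiled correctly, so it would complement a written guide rather than replace it.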
|
So shouldn't we restructure the installation page to do this step-by-step installation? I still do not understand why the "quick" installation guide should not be part of the installation document :)? If it's difficult to follow installation.rst to do the installation then we should update and fix that page - and maybe move "advanced installation" elsewhere. My point is that we already have an installation page - if we start having separate documents people will be confused - should I follow the "quick one" or the "full one"? I think the installation page should be the starting point and it should explain various installation options - from the simple to advanced ones. Maybe these should be separated out into "sub-pages" - each option as a separate page - but they should simply be part of the "installation" area of the docs. |
For example i imagine such structure:
Maybe the names should be a bit different. Maybe, if the content is too big, they should be sub-pages of it. But they should all be part of installation (if we are talking about the users, not contributors). |
This does not always work because we support many installation scenarios and configurations and we will not always be able to prepare a guide. I just want one page that will allow you to install Airflow from scratch without having to make difficult decisions that require expert knowledge and which will help you install Airflow in 5 minutes. Also, such an article should not contain unnecessary information that is not applicable in a given situation, e.g. information about all kinds of constraints files and extra packages. Most users aren't interested in whether they use MySQL or PostgreSQL, or whether they have LocalExecutor or SequentialExecutor. They only read the documentation about it when they have problems. We, as experts, can decide which database/executor is best for the novice user. A novice doesn't need to know it, because it doesn't make any difference to him, if he just wants to quickly start Airflow and try to use it. If he starts using Airflow, he can modify the environment and try to adapt it better to his needs.
Yes. We can move all the quick start guides to the "Installation" section but that is a different issue. To the new "Installation" section, I think it is worth moving other articles as well, e.g. Set up a Database Backend. For now, I prefer to focus on the existing section - Quick start. |
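As an illustration of what "deciding for the novice" could look like, the sketch below fixes one executor and one database and emits the matching Airflow environment variables. The `AIRFLOW__{SECTION}__{KEY}` naming is Airflow's environment-variable convention; the choice of LocalExecutor with PostgreSQL, the function name `quickstart_env`, and the user, password and database names are assumptions made for this example only. The section that holds `sql_alchemy_conn` moved from `core` to `database` in later Airflow releases, so the key names should be treated as version-dependent.

```python
import shlex
from urllib.parse import quote_plus


def quickstart_env(user, password, database, host="localhost", port=5432):
    """Build environment variables for one fixed, beginner-oriented setup.

    Assumption: LocalExecutor + PostgreSQL is the single scenario chosen
    for the guide; nothing in the thread confirms this exact choice.
    """
    conn = (
        f"postgresql+psycopg2://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )
    return {
        "AIRFLOW__CORE__EXECUTOR": "LocalExecutor",
        "AIRFLOW__CORE__SQL_ALCHEMY_CONN": conn,
    }


if __name__ == "__main__":
    # Hypothetical credentials, shown only to produce sample export lines.
    env = quickstart_env("airflow_user", "airflow_pass", "airflow_db")
    for key, value in env.items():
        print(f"export {key}={shlex.quote(value)}")
```

Keeping the decision in one small function means a quick start guide can show a single copy-and-paste block, while the full manual continues to document the alternatives.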
So I think we are getting to the point here:
BUT we should restructure the installation page (and INSTALLATION for people using sources) to be the single "point of entry" for all those documents. I think when we see that something is messy and difficult to follow we should not add more confusion by creating a different variant of the "installation" but let's restructure the installation page first so that anyone looking for installation gets there and will reach the "quick installation" from there. Adding new "variants" without explaining how they relate to the previous "unclear" instructions will add to the mess rather than solve it. This is my whole point. Let's clean and then add new stuff where it fits. |
@mik-laj And I am stuck here: https://github.com/apache/airflow/blob/master/CONTRIBUTORS_QUICK_START.rst#setting-up-breeze |
@mik-laj - I somehow managed to do all the steps on my Mac to get the UI up, but when I go to the UI I am unable to click on anything and see something weird like this. Could you please let me know what has gone wrong or what is missing? |
If you install airflow "from sources" and you do not use |
@potiuk Apologies for the delayed response. That helped. Thanks! |
Ah yeah, in breeze it's the same. The first time you need to run it, and I believe it even prints it in yellow that you should - as a warning. |
I think the points raised in this issue are already addressed in the new (Python based) Breeze? |
yep |
Hello,
Installing Airflow on Mac OS in the most common configuration (Postgres / MySQL, Celery, Redis) causes problems for new users for several reasons:
- mysqlclient, which requires setting additional compiler parameters on newer Mac OS. See: https://stackoverflow.com/questions/43612243/install-mysqlclient-for-django-python-on-mac-os-x-sierra/54521244
- cryptography, which requires the Rust compiler: https://cryptography.io/en/latest/installation.html#building-cryptography-on-macos
It would be great if we had a guide that describes how to install Airflow on Mac OS without falling into any trap related to installing dependencies. I am thinking of a new guide similar to the quick start on Docker and quick start locally. The guide will configure a similar environment as in Docker, but will install everything locally. Some users do not use Docker or are not comfortable with it, so we should also describe native installations. The local quickstart guide is also not sufficient as it does not describe Mac OS specific problems and sets up an environment that requires a lot of changes to be able to use it effectively.
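The "additional compiler parameters" mentioned for mysqlclient usually come down to pointing the compiler and linker at an OpenSSL installation. The sketch below shows one way to prepare such an environment before invoking pip. The helper name `with_openssl_flags` is hypothetical, and the use of `LDFLAGS` and `CPPFLAGS` with an OpenSSL prefix is an assumption based on the common Homebrew pattern; the exact flags needed on a given machine may differ.

```python
import os


def with_openssl_flags(openssl_prefix, base_env=None):
    """Return a copy of the environment with OpenSSL paths appended.

    Assumption: the build picks up extra library and header locations
    from LDFLAGS and CPPFLAGS; existing values are kept and extended.
    """
    env = dict(os.environ if base_env is None else base_env)
    additions = {
        "LDFLAGS": f"-L{openssl_prefix}/lib",
        "CPPFLAGS": f"-I{openssl_prefix}/include",
    }
    for key, flag in additions.items():
        existing = env.get(key, "")
        env[key] = f"{existing} {flag}".strip()
    return env
```

The returned mapping can be passed as `env=` to `subprocess.run` when calling `python -m pip install mysqlclient`. On a Homebrew-based machine the prefix would typically be the output of `brew --prefix openssl`, which is again an assumption about the local setup rather than something stated in this issue.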
In the future, I will also want to prepare installations for Linux and Windows, but I would prefer to deal with each environment step by step. For this reason, I am also not very interested in updating the quick start guide locally. This too will benefit the end-user as they will have a quick start guide that addresses their needs more precisely. They will not have to search for information that is specific to their operating system.
Part of: #13838