Not exactly this split, but I think it would be a side effect of a more profound architectural change that we might consider for Airflow 3.0. This goes beyond a library/application split. I don't think it is easy to split things now without introducing much stricter isolation between the workers and the scheduler. What we really need is to isolate workers from the scheduler at the API/architecture level. Currently workers and schedulers can:
This effectively prevents two scenarios:
A) separating DAG maintainers from operations people in a controlled way. Right now DAG maintainers can do, under the hood, whatever operations people can do; there is no effective protection.
B) restricting who can access and modify which task executions. Currently anyone can modify anyone's executed DAGs in any way just by updating the DB.
Ultimately, what this prevents is, for example, true multi-tenancy with full isolation between tenants.

This comes from a slightly different angle than your request, but I believe solving those two problems will lead to an architecture that enables splitting Airflow along the two usages you mention. I'm not sure two separate packages is a good idea; possibly it could be handled differently. We will likely start discussing soon what we can do and how. This will be a broader discussion on the devlist / an Airflow Improvement Proposal in which a number of community members should take part, and splitting into a separate library might be one point to discuss there. But I don't think we can do it now without re-architecting Airflow.
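To make point B concrete, here is a minimal Python sketch of what "isolation at the API level" could mean. Every name in it (`TaskInstanceRecord`, `TaskStateAPI`, `register`, `set_state`) is hypothetical and not part of Airflow. The assumption is that only a scheduler-side service touches the metadata store, and each worker receives a token scoped to the single task instance it runs, so it cannot change other task instances the way direct DB access allows.

```python
# Hypothetical sketch: none of these names are Airflow APIs.
import secrets
from dataclasses import dataclass, field


@dataclass
class TaskInstanceRecord:
    dag_id: str
    task_id: str
    run_id: str
    state: str = "queued"

    @property
    def key(self) -> tuple:
        return (self.dag_id, self.task_id, self.run_id)


@dataclass
class TaskStateAPI:
    """Stands in for a scheduler-side service that alone owns the metadata DB."""
    records: dict = field(default_factory=dict)  # key -> TaskInstanceRecord
    tokens: dict = field(default_factory=dict)   # token -> key

    def register(self, record: TaskInstanceRecord) -> str:
        """Store a task instance and return a token scoped to it alone."""
        self.records[record.key] = record
        token = secrets.token_hex(16)
        self.tokens[token] = record.key
        return token

    def set_state(self, token: str, key: tuple, new_state: str) -> None:
        # A worker may only update the task instance its token was issued for,
        # unlike direct DB access, where any row can be rewritten.
        if self.tokens.get(token) != key:
            raise PermissionError(f"token is not valid for task instance {key}")
        self.records[key].state = new_state


# Usage: a worker can update its own task instance but not another team's.
api = TaskStateAPI()
own = api.register(TaskInstanceRecord("etl", "load", "run_1"))
api.register(TaskInstanceRecord("reporting", "build", "run_1"))
api.set_state(own, ("etl", "load", "run_1"), "success")
try:
    api.set_state(own, ("reporting", "build", "run_1"), "failed")
except PermissionError as err:
    print("rejected:", err)
```

A real design would also need authentication, persistence, and a transport such as HTTP; the in-memory dictionaries here only stand in for those.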
Airflow is both a library and a web application. In many companies, the users of the library and the people who manage the web application do not overlap much. Especially now that "virtualized" development and deployment flows (via Docker, venv, conda, Poetry, NixOS, etc.) are more popular than ever, installing the entire Airflow package costs a lot of extra time, space, and network bandwidth.
Is there any interest within the community in factoring the library (DAG, Operator definition, Task) and the application (webserver, scheduler, workers, Operator execution) into two separate packages?
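A rough sketch of what the library half could look like, assuming DAG definitions become plain data: the hypothetical `TaskSpec` and `DagSpec` classes below depend only on the Python standard library and serialize to JSON, which the application side (scheduler, workers) would then load and execute. This illustrates the proposed split; it is not Airflow's actual DAG API.

```python
# Hypothetical definition-only package: no scheduler, webserver, or DB imports.
import json
from dataclasses import asdict, dataclass, field


@dataclass
class TaskSpec:
    task_id: str
    operator: str  # name the application side would resolve at run time
    params: dict = field(default_factory=dict)
    upstream: list = field(default_factory=list)


@dataclass
class DagSpec:
    dag_id: str
    schedule: str
    tasks: list = field(default_factory=list)

    def add(self, task: TaskSpec, after=()) -> TaskSpec:
        """Append a task, recording the ids of the tasks it depends on."""
        task.upstream.extend(t.task_id for t in after)
        self.tasks.append(task)
        return task

    def to_json(self) -> str:
        # asdict() recurses into the nested TaskSpec list; the application
        # package would consume this serialized form.
        return json.dumps(asdict(self), sort_keys=True)


# Usage: declare a two-step pipeline without installing the application.
dag = DagSpec("etl", "@daily")
extract = dag.add(TaskSpec("extract", "bash", {"command": "echo extract"}))
load = dag.add(TaskSpec("load", "python"), after=[extract])
print(dag.to_json())
```

Keeping definitions as serializable data is one way the two packages could share a contract without the library importing any application code.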