Skip to content
This repository has been archived by the owner on Nov 25, 2019. It is now read-only.

Use redis as Airflow broker #49

Merged
merged 3 commits into from
Sep 16, 2019
Merged

Use redis as Airflow broker #49

merged 3 commits into from
Sep 16, 2019

Conversation

GiancarloFusiello
Copy link
Contributor

In reference to libero/publisher#247

Adds redis image to docker-compose.yml and update config files to use redis broker.

@GiancarloFusiello GiancarloFusiello requested a review from a team as a code owner September 13, 2019 14:32
@diversemix
Copy link

Intrigued to know why we are using redis as a message bus? I read an interesting article here: https://medium.com/wbaa/airflow-ha-environment-c60ddca825a9 .... I'm guessing its because its quick and easy to setup?

docker-compose.yml Outdated Show resolved Hide resolved
@giorgiosironi
Copy link
Member

The name of the docker-compose service needs changing as a Message Bus enables separate applications to integrate, whereas this broker is internal to the jats-ingester application to provide a queue of jobs for workers to pick up.

@giorgiosironi
Copy link
Member

Intrigued to know why we are using redis as a message bus?

(the answer then being, we should not be using redis as a message bus)

@@ -26,6 +26,11 @@ services:
AWS_ACCESS_KEY_ID: longkey
AWS_SECRET_ACCESS_KEY: verysecretkey

message-bus:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this should be private to Celery, as we don't write or read values from the DAGs code for example. Possibilities for a service name?

  • celery-broker (not very precise because brokers usually have their own protocol like AMQP, this is just part of the broker concept)
  • celery-broker- backend (matches db as that doesn't mention postgres, and Celery's terminology)
  • celery-broker-redis (to communicate what this is)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

name updated to airflow-broker

@giorgiosironi
Copy link
Member

https://medium.com/wbaa/airflow-ha-environment-c60ddca825a9

Not targeting High Availability at the moment as that would require redundancy for all containers and multiple nodes, plus unclear if it's a requirement of Libero demo deployments.

@giorgiosironi
Copy link
Member

Trying to verify how values are cleared from Redis, not sure if we need to setup a TTL for them.

@diversemix
Copy link

diversemix commented Sep 16, 2019

Trying to verify how values are cleared from Redis, not sure if we need to setup a TTL for them.

@petereast is the in-house redis expert... my understanding is that its up to the app adding the data to set the TTL

@giorgiosironi
Copy link
Member

My current understanding is that this PR should stop the kombu_* tables in the Postgres database from growing as Redis would substitute for that. Collecting data while running locally (very valuable to have the fake S3 implementation which allows this).

@GiancarloFusiello
Copy link
Contributor Author

@giorgiosironi @diversemix My original thought was to use RabbitMQ as it is likely (but not defined yet) we may need to implement a message bus for sending messages/events/commands between Libero Publisher services. Therefore, why not use the same message bus to handle messages for celery.

My discussion with @giorgiosironi before actioning this task lead me to believe that this is an area of uncertainty and, as a result, the decision to use redis at this time was based on the ease of setup and maintenance.

Perhaps the task needs further discussion and a separate issue/ticket?

@diversemix
Copy link

@GiancarloFusiello @giorgiosironi ... my preference is to raise a ticket too... and keep this ticket moving forward with what was intended 😉

@giorgiosironi
Copy link
Member

Just verified that this is working well.

On the db side, the kombu_* tables do not appear in a brand new airflow-db database. However in an existing database they are not removed so they will need an explicit deletion in the environments like demo.

On the Redis side, there is a small space of Redis keys (4 to 6) such as _kombu.binding.celery.pidbox and unacked that is maintained by the scheduler and the workers. The naming pattern (e.g. unacked) suggests that this Redis instance will be private to Celery and can't be used by anything else (which is fine as long as we know it). The memory being used by Redis here is in the order of a few MBs so incredibly low.

@GiancarloFusiello
Copy link
Contributor Author

GiancarloFusiello commented Sep 16, 2019

Adding more information. If I'm reading the following documents correctly, it seems that Kombu uses redis pub/sub by default. If this is the case then data sent (published) to redis will not be retained when pushed to subscribers:
https://docs.celeryproject.org/projects/kombu/en/stable/introduction.html?highlight=ttl#transport-comparison
https://docs.celeryproject.org/projects/kombu/en/stable/reference/kombu.transport.redis.html?highlight=redis#kombu.transport.redis.Channel.fanout_patterns

@giorgiosironi
Copy link
Member

raise a ticket too

Not sure what would be the title of a follow-up, can you suggest one? 🤔

Copy link
Member

@giorgiosironi giorgiosironi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Empirically works as intended, sorting out the naming before merge

@diversemix
Copy link

raise a ticket too

Not sure what would be the title of a follow-up, can you suggest one?

"Decide what Message Bus to use in Libero"

@giorgiosironi
Copy link
Member

Is that libero/community#10 (outdated in requirements but not in technology understanding)?

@@ -379,7 +379,7 @@ worker_log_server_port = 8793
# information.
# http://docs.celeryproject.org/en/latest/userguide/configuration.html#broker-settings
# broker_url = sqla+mysql://airflow:airflow@localhost:3306/airflow
broker_url = sqla+postgresql+psycopg2://postgres:example@db/airflow-db
broker_url = redis://airflow-broker
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we want to change the result_backend setting as well? Not sure what the impact would be, and The use of a database is highly recommended

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. Using a relational database is best for this.

@GiancarloFusiello
Copy link
Contributor Author

Some log messages I've seen:

airflow-broker_1     | 1:M 16 Sep 2019 08:55:18.672 # Server initialized
airflow-broker_1     | 1:M 16 Sep 2019 08:55:18.672 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
airflow-broker_1     | 1:M 16 Sep 2019 08:55:18.672 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.

@GiancarloFusiello
Copy link
Contributor Author

Just seen these also... will address separately:

airflow_worker_1     |  * Environment: production
airflow_worker_1     |    WARNING: This is a development server. Do not use it in a production deployment.
airflow_worker_1     |    Use a production WSGI server instead.

@diversemix
Copy link

Is that libero/community#10 (outdated in requirements but not in technology understanding)?

I got Peter to comment on this with the state of where we are with PubSweet/reviewer 😄

@giorgiosironi
Copy link
Member

Feel free to open tickets and tag them with jats-ingester for issues to be dealt with separately. Approved.
https://github.com/libero/sample-configuration needs a similar change.

@GiancarloFusiello
Copy link
Contributor Author

Feel free to open tickets and tag them with jats-ingester for issues to be dealt with separately.

Will create a new ticket regarding the airflow worker logs.

https://github.com/libero/sample-configuration needs a similar change.

As it's related, i'll use the same pattern of linking the main issue to the PR.

@GiancarloFusiello GiancarloFusiello merged commit b9a9459 into libero:master Sep 16, 2019
@GiancarloFusiello GiancarloFusiello deleted the use_redis_as_message_bus branch September 16, 2019 09:29
@GiancarloFusiello GiancarloFusiello changed the title Use redis as message bus Use redis as Airflow broker Sep 16, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants