Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Encrypt passwords #267

Closed
wants to merge 7 commits into from
Closed

Conversation

neovintage
Copy link
Contributor

This should encrypt the passwords that are saved in the metadata database. If #254 gets merged first, I'll fix any merge conflicts on this one.

@mistercrunch
Copy link
Member

Thanks for adding this. I'm thinking about change management here and wondering how to approach this to make it easy on the community. What about adding a is_encrypted field set to False by default and set to True upon first save/encryption?

Also I had spotted this before, but I don't think it's as flexible as your solution since we wouldn't be able to decrypt or not based on the is_encrypted field :
http://sqlalchemy-utils.readthedocs.org/en/latest/data_types.html#module-sqlalchemy_utils.types.encrypted

What do you think?

@neovintage
Copy link
Contributor Author

Yeah, I explicitly left out the change management piece because I didn't know the best way to proceed. I'm ok with your approach. Let me fix this up and send it over.

@neovintage
Copy link
Contributor Author

I went a little overboard here and that's largely due to there not being a way to upgrade the metadata database. If there was a way to do it, it wasn't obvious to me. Plus the initdb and resetdb commands were using Sqlalchemy's create_all and drop_all functions which didn't account for new migrations. I installed alembic and have a migrations folder to manage changes to the metadata database. Here's the premise of what I changed and assumptions:

If a user already has airflow running, they can issue a new command called airflow upgradedb and that will migrate any new database changes that need to happen over time. Running this command will be required when a user upgrades airflow when it has db migrations in the release. This also assumes that from this point forward the user will be on at least version 1.3.0. If the db schema does not include changes up to 1.3.0, the user will be out of luck (maybe we can create a migration file to get people up to speed).

If the user is creating airflow from scratch, all of the migrations and current quickstart guide will run as expected airflow initdb and the versioning will be created in the database properly.

For those wanting to help develop and need to create a database migration the workflow would look something like this:

$ pwd
~/airflow
$ cd airflow
$ alembic revision -m "add new field to db"
  Generating ~/airflow/airflow/migrations/versions/12341123_add_new_field_to_db.py

From there, the appropriate steps can be taken to do the migration or what ever else needs to be taken care of.

This pull request kinda went in all sorts of directions. Let me know if this isn't what you were expecting and I can change it.

@mistercrunch
Copy link
Member

I think this is great!

You're addressing overdue things I've been putting off for a while (encryption + alembic). I'm very familiar with Django's "south" module but was putting off alembic for no good reason (mostly putting off having to deal with change management and the scare of me or anyone in the community getting tangled as I used to with Django's south before knowing it well...).

@mistercrunch
Copy link
Member

The PR looks solid after a quick scan. I'm out sailing today (Airbnb offsite!), but I'll actually take a backup of our production db hopefully tomorrow and get this running.

I'd be nice to add a piece of code that runs early in the CLI and looks for the alembic version in the code versus the one in the DB and suggests the user to take a backup and run airflow upgradedb and exits if the versions don't match. It doesn't have to be part of this PR but it should make the next Pypi release.

@mistercrunch
Copy link
Member

I tried installing and it looks like I need libffi-dev on my debian box to install cryptography. Looks like most crypto library in python require some sort of os dependency so there are no ways around it...

What about making encryption an opt-in feature? (It's really important to me to allow people to pip install airflow and get going instantly without having to install non instantaneous packages). You could move the cryptography import statement in a try block within the set_password method and fallback on storing plain if the lib isn't available. This would require a small note in docs/installation.rst and a new extras_require group named cryptography in setup.py

Let me know if this goes beyond your original intent, I can take over the branch.

@neovintage
Copy link
Contributor Author

I think this brings up an interesting dichotomy. On one hand, you want to be able to quickly run pip install airflow to get started and, on the other, passwords should be stored safely by default. Personally, I lean on the second one, making the product secure by default. Ideally, I'd like to be able to put a Heroku Button on this project, so that if people want to, they could deploy the whole thing with a push of button.

Anyway, I'm getting off track. As an alternative, would it be ok to put instructions on how to install libffi-dev per OS environment in the airflow instructions because it would a dependency of the project?

@mistercrunch
Copy link
Member

Let's get best of both worlds. It's really important to me to be able to get going instantly. Many operators don't require passwords (BashOperator, PythonOperator, HiveOperator, Presto*, ... ), and sometimes people don't have sudo rights on the boxes they need Airflow to work on. If someone simply wants to go through the tutorial I don't want them to spend more time installing software than it takes them to go through the tutorial...

Anyhow, this make it very clear that the passwords are stored in clear, and the steps to take to address it:
#279

@neovintage
Copy link
Contributor Author

No worries. Let me get this fixed up.

@neovintage
Copy link
Contributor Author

Let me know if you want me to squash commits. I rolled in your feedback on the try blocks around crypto.

['known_event_type.id'], ),
sa.ForeignKeyConstraint(['user_id'], ['user.id'], ),
sa.PrimaryKeyConstraint('id')
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome -- just want to point out that we added a new table xcom. Thanks!

@mistercrunch
Copy link
Member

Alright, so I didn't know how to add on to a PR coming from a fork so I ended up fetching your branch, rebasing it, testing it, tweaking it a bit in different ways and created a new PR with a new commit on top here:
#292

I think it's ready to merge, though merging means it hits production at Airbnb and late Saturday night is probably not the best idea if I want to make sure I can enjoy my Sunday tomorrow :) So I'm planning on making this PR hit the metal sometime Monday morning.

Thanks again for this stellar PR. Not only addressing a major flaw, but also setting Alembic which will help us tremendously at growing this piece of software in a smooth way.

img

@neovintage
Copy link
Contributor Author

closing PR as a result of #292 merge. thanks @mistercrunch

@neovintage neovintage closed this Aug 25, 2015
mobuchowski pushed a commit to mobuchowski/airflow that referenced this pull request Jan 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants