Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update GoogleBaseHook to not follow 308 and use 60s timeout #8816

Merged
merged 4 commits into from
May 13, 2020
Merged

Update GoogleBaseHook to not follow 308 and use 60s timeout #8816

merged 4 commits into from
May 13, 2020

Conversation

waiyan1612
Copy link
Contributor

@waiyan1612 waiyan1612 commented May 11, 2020

This PR fixes #8810.

httplib2 recently made changes to redirect / follow 308 status code. This conflicts with some Google Drive APIs which are using 308 for resumable uploads. This PR replicates the google-api-python-client fix that addresses the exact same issue.

For backward compatibility, I decided against reusing build_http from googleapiclient.http since it will introduce a default timeout of 60s if no timeout was set previously.

EDIT

After discussing with @mik-laj, agreed that using build_http is a better approach. Updated the PR with code changes and test cases accordingly.


Make sure to mark the boxes below before creating PR: [x]

  • Description above provides context of the change
  • Unit tests coverage for changes (not needed for documentation changes)
  • Target Github ISSUE in description if exists
  • Commits follow "How to write a good git commit message"
  • Relevant documentation is updated including usage instructions.
  • I will engage committers as explained in Contribution Workflow Example.

In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.
Read the Pull Request Guidelines for more information.

@boring-cyborg boring-cyborg bot added the provider:google Google (including GCP) related issues label May 11, 2020
@boring-cyborg
Copy link

boring-cyborg bot commented May 11, 2020

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst)
Here are some useful points:

  • Pay attention to the quality of your code (flake8, pylint and type annotations). Our pre-commits will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it’s a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Be sure to read the Airflow Coding style.
    Apache Airflow is a community-driven project and together we are making it better 🚀.
    In case of doubts contact the developers at:
    Mailing List: [email protected]
    Slack: https://apache-airflow-slack.herokuapp.com/

@mik-laj
Copy link
Member

mik-laj commented May 12, 2020

Since we are already using set_user_agent from googleapiclient.http, is it possible to use build_http instead of directly creating from httplib2?

I think that's a good idea. Thanks to this, if this library introduces improvements, they will also be available in our HTTP client.

For backward compatibility, I decided against reusing build_http from googleapiclient.http since it will introduce a default timeout of 60s if no timeout was set previously.

I don't think this is a problem. This option was not user-configurable. Now just save according to the default library behavior.

@waiyan1612
Copy link
Contributor Author

@mik-laj Thanks for reviewing the PR. I also think that following googleapiclient would be better in the long run. Since using build_http means that timeout will always be 60s,

  1. Do you think this change should be documented? I cannot find a suitable place in docs (https://github.com/apache/airflow/tree/master/docs/howto/operator/gcp).
  2. Do you see it necessary to expose this timeout as a config? Though personally speaking, I might not use this config at all.

@mik-laj
Copy link
Member

mik-laj commented May 12, 2020

  1. All we have to do is create a note in the change log. The changelog is generated automatically based on commits, so you just have to ensure a good PR title.
  2. We do not have to. I don't think it is very useful. User can overide timeout via socket.getdefaulttimeout() also. If users ask, we can add this option later.

@waiyan1612 waiyan1612 changed the title Exclude 308 status code from httplib2's list of redirect codes. Update GoogleBaseHook to not follow 308 and use 60s timeout May 12, 2020
@mik-laj
Copy link
Member

mik-laj commented May 12, 2020

@waiyan1612 Github action is sad. Can you fix it?

@@ -207,13 +207,20 @@ def _get_access_token(self) -> str:
"""
return self._get_credentials().token

@staticmethod
def _build_http() -> httplib2.Http:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not need to give the opportunity to override this method. If the user needs it, the user can overwrite authorize. It would make sense if _http were used in several places.

@mik-laj
Copy link
Member

mik-laj commented May 13, 2020

Thanks a lot!

@mik-laj mik-laj merged commit e1e833b into apache:master May 13, 2020
@boring-cyborg
Copy link

boring-cyborg bot commented May 13, 2020

Awesome work, congrats on your first merged pull request!

@potiuk
Copy link
Member

potiuk commented May 13, 2020

Cool. We'll get it in the backport providers :)

@tonycoco
Copy link

Any chance this is in the backport providers now @potiuk ?

@potiuk
Copy link
Member

potiuk commented Jun 15, 2022

What's backport providers ? :P

No idea. You need to check the release notes in backport providers in PyPI. For me Airflow 1.10 has ended a year ago.

Let me add this one here: If you are on 1.10 - MIGRATE. NOW! Our survey have shown that there are < 15% users who use 1.10 and most of them plans to migrate in a month or so. Don't delay. Join the majority.

Otherwise only echo will respond to any questions you might have.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
provider:google Google (including GCP) related issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

GoogleCloudBaseHook does not exclude 308 response code
4 participants