Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stop requestslib from overriding auth headers with data from .netrc file #338

Merged
merged 18 commits into from
Jan 18, 2024

Conversation

susodapop
Copy link

@susodapop susodapop commented May 9, 2023

Resolves #337

Description

This pull request refactors the credentials handling in python_submissions.py using the workaround suggested in #337. The desired behaviour from this change is not obvious when looking only at the code diff. However the idea here is to explicitly order Python requests to use Bearer authentication with an explicitly provided token. If this is not done explicitly, there is an edge case where a local .netrc file can override an auth token present in profiles.yml.

There's no automated test for this because I'm not sure how to deposit a .netrc file in a test environment.

I reproduced the issue manually by adding this to ~/.netrc on my workstation:

machine <my-workspace>.cloud.databricks.com
login token
password <expired_token>

I removed the skip marker on test_python_uc_databricks_sql_endpoint and ran this integration test on the main branch. The test failed with this exception:

ERROR    stdout_log:eventmgr.py:66 �[0m18:58:04     b'{"error_code":"403","message":"Invalid access token."}'
ERROR    file_log:eventmgr.py:66 �[0m13:58:04.325701 [error] [MainThread]:    b'{"error_code":"403","message":"Invalid access token."}'

Then I applied the change in this pull request and re-ran the test. It passed.

Checklist

  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the CHANGELOG.md and added information about my change to the "dbt-databricks next" section.

@susodapop susodapop temporarily deployed to azure-prod-peco May 9, 2023 19:52 — with GitHub Actions Inactive
@susodapop susodapop temporarily deployed to azure-prod-pecou May 9, 2023 19:52 — with GitHub Actions Inactive
@susodapop susodapop marked this pull request as ready for review May 9, 2023 19:52
Copy link
Collaborator

@andrefurlan-db andrefurlan-db left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it needs to work with oauth as well. Token is not passed in config

@susodapop susodapop temporarily deployed to azure-prod-pecou May 10, 2023 03:03 — with GitHub Actions Inactive
@susodapop susodapop temporarily deployed to azure-prod-peco May 10, 2023 03:03 — with GitHub Actions Inactive
@susodapop
Copy link
Author

I've updated the code to use the HeaderFactory provided by databricks.sdk and verified it still works with PAT auth. Need to do a quick test with OAuth and maybe basic auth to make sure those cases are also handled.

@susodapop susodapop changed the title Python Models: always use bearer auth method (not .netrc file) Python Models: ignore .netrc file for non-authenticated network requests May 12, 2023
@susodapop
Copy link
Author

I found that the same issue we're fixing here (.netrc overrides auth headers in requests made by dbt-databricks) also affects our databricks-sdk-py repo (.netrc overrides auth headers in requests made for oauth handshake).

The fix in this PR will fix dbt-databricks .netrc woes if the auth method is PAT. But it will only fix the issue for oauth use-cases after databricks-sdk-py is updated with this patch databricks/databricks-sdk-py#107 and we update the dbt-databricks dependency to incorporate that fix.

Note: I found the same bug in databricks-sql-connector's custom implementation of OAuth handshakes but this doesn't affect dbt-databricks because the dbt adapter uses the OAuth implementation in the SDK.

@susodapop susodapop changed the title Python Models: ignore .netrc file for non-authenticated network requests Stop requestslib from overriding auth headers with data from .netrc file May 12, 2023
@susodapop susodapop temporarily deployed to azure-prod-peco May 17, 2023 21:28 — with GitHub Actions Inactive
@susodapop susodapop temporarily deployed to azure-prod-pecou May 17, 2023 21:28 — with GitHub Actions Inactive
@susodapop susodapop temporarily deployed to azure-prod-pecou May 17, 2023 21:30 — with GitHub Actions Inactive
@susodapop susodapop temporarily deployed to azure-prod-peco May 17, 2023 21:30 — with GitHub Actions Inactive
@susodapop
Copy link
Author

Now that databricks-sdk has pushed the parallel fix for oauth I've rebased on main. This included a refactor of SessionCredentials base on a breaking change in databricks-sdk.

@susodapop
Copy link
Author

Looks like there is now a parallel PR that bumps the databricks-sdk version and includes the RefreshableCredentials refactor: #349

Jesse Whitehouse added 3 commits May 19, 2023 12:10
Update changelog.

Signed-off-by: Jesse Whitehouse <[email protected]>
Signed-off-by: Jesse Whitehouse <[email protected]>
Signed-off-by: Jesse Whitehouse <[email protected]>
@susodapop susodapop temporarily deployed to azure-prod-pecou May 19, 2023 17:11 — with GitHub Actions Inactive
@susodapop susodapop temporarily deployed to azure-prod-peco May 19, 2023 17:11 — with GitHub Actions Inactive
refactor to push the .auth into requests session. It's not required in
DBContext anymore.

Signed-off-by: Jesse Whitehouse <[email protected]>
@susodapop
Copy link
Author

Alright, I've pushed the final version of this code. I moved the BearerAuth class into connections.py because we're about to use it there to fix an issue with streaming tables / materialized views.

I also simplified the code by setting the Session.auth property rather than setting the .auth property for every single network request.

Jesse Whitehouse added 2 commits January 17, 2024 17:47
Signed-off-by: Jesse Whitehouse <[email protected]>
This was leftover from moving the authentication class into connections.py
in 4f56a5c

Signed-off-by: Jesse Whitehouse <[email protected]>
This fixes a merge conflict in the changelog

Signed-off-by: Jesse Whitehouse <[email protected]>
@susodapop susodapop merged commit b2a77d9 into main Jan 18, 2024
21 checks passed
@susodapop susodapop deleted the issue-337-redo branch January 18, 2024 20:06
github-merge-queue bot pushed a commit to databricks/cli that referenced this pull request Feb 19, 2024
## Changes

This adds a new dbt-sql template. This work requires the new WorkspaceFS
support for dbt tasks.

In this latest revision, I've hidden the new template from the list so
we can merge it, iterate over it, and propertly release the template at
the right time.

Blockers:
- [x] WorkspaceFS support for dbt projects is in prod
- [x] Move dbt files into a subdirectory
- [ ] Wait until the next (>1.7.4) release of the dbt plugin which will
have major improvements!
- _Rather than wait, this template is hidden from the list of
templates._
- [x] SQL extension is preconfigured based on extension settings (if
possible)
- MV / streaming tables:
  - [x] Add to template
- [x] Fix databricks/dbt-databricks#535 (to be
released with in 1.7.4)
- [x] Merge databricks/dbt-databricks#338 (to be
released with in 1.7.4)
- [ ] Fix "too many 503 errors" issue
(databricks/dbt-databricks#570, internal
tracker: ES-1009215, ES-1014138)
  - [x] Support ANSI mode in the template
- [ ] Streaming tables support is either ungated or the template
provides instructions about signup
- _Mitigation for now: this template is hidden from the list of
templates._
- [x] Support non-workspace-admin deployment
- [x] Make sure `data_security_mode: SINGLE_USER` works on non-UC
workspaces (it's required to be explicitly specified on UC workspaces
with single-node clusters)
- [x] Support non-UC workspaces

## Tests

- [x] Unit tests
- [x] Manual testing
- [x] More manual testing
- [ ] Reviewer manual testing
  - _I'd like to do a small bug bash post-merging._
- [x] Unit tests
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Esoteric Bug with regards to authentication via requests headers= argument being overwritten by .netrc
4 participants