Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert sftp hook to use paramiko instead of pysftp #24512

Merged
merged 13 commits into from
Jun 19, 2022

Conversation

pauldalewilliams
Copy link
Contributor

This PR is a major change to the SFTP provider to remove the dependency on pysftp and use paramiko via the SSHHook instead. The provider was previously inconsistent in its use of pysftp vs the SSHHook. The SFTPHook used pysftp to actually create its connection and perform operations, but it inherited from SSHHook. The SFTPOperator did not use the SFTPHook or pysftp and instead used the SSHHook to create its connection and use the SFTP operations provided by paramiko. This PR rewrites the provider so the SFTPHook uses the SSHHook and paramiko (via SSHHook) and the SFTPOperator uses the SFTPHook. This enables support for all of the parameters supported by the SSHHook and creates consistency within the SFTP provider. There should be no breaking changes aside from the removal of previously deprecated parameters in the SFTPHook since pysftp used paramiko under the hood anyway.

A number of changes are contained in this PR:

  • Minor change to the SSH provider to support ciphers as an optional extra parameter, which is used by the SFTPHook
  • Deprecated parameters in the SFTP provider were completely removed since this is a major change to the provider (seemed appropriate, but let me know if that should not have been done yet)
  • Any functionality provided by pysftp that was used by the SFTPHook has been incorporated directly into the provider
  • SFTPHook supports passing an SSHHook to continue supporting that option in the SFTPOperator; however, this is marked as deprecated and should be removed in a future major version. SFTPOperator warns users to pass an SFTPHook instead now.
  • _make_intermediate_dirs was removed from SFTPOperator since the create_directory method on SFTPHook supports creating parent directories of subdirectories (functionally equivalent)
  • typing was added to SFTPOperator

Tests have been written to achieve 100% code coverage and all tests are passing. Previously existing tests were only modified where required to ensure backward compatibility with the provider's original functionality. Please double-check my tests though as I'm not confident in my test writing abilities yet. Docs have been updated as well.

I will note there was brief discussion with @malthe on Slack to consider using parallel-ssh instead but I opted to hold off on that for now. That could be a good future update since some have reported issues with paramiko (for example, #16286).

@pauldalewilliams pauldalewilliams force-pushed the convert-sftp-hook-to-use-paramiko branch 2 times, most recently from d1f1ff5 to c98f39d Compare June 17, 2022 13:13
@pauldalewilliams
Copy link
Contributor Author

One other note I forgot to include: the default for no_host_key_check used to be true for SFTPHook but it will now default to false since that is the default for SSHHook. That's not a breaking change but those relying on it to default to true will need to explicitly add it to the extra parameters on the connection.

The primary motivation for doing this was to ensure that the SFTP connections can support all of the parameters supported by SSHHook. Because pysftp did not support passing all those parameters to paramiko I figured this was the best way to get that consistency.

@pauldalewilliams pauldalewilliams force-pushed the convert-sftp-hook-to-use-paramiko branch 2 times, most recently from 9d424f3 to df9c8eb Compare June 19, 2022 00:45
@pauldalewilliams pauldalewilliams force-pushed the convert-sftp-hook-to-use-paramiko branch from df9c8eb to 2262400 Compare June 19, 2022 12:53
@github-actions github-actions bot added the full tests needed We need to run full set of tests for this PR to merge label Jun 19, 2022
@github-actions
Copy link

The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

@@ -214,7 +201,18 @@ def create_directory(self, path: str, mode: int = 777) -> None:
:param mode: int representation of octal mode for directory
"""
conn = self.get_conn()
conn.makedirs(path, mode)
if self.isdir(path):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice an detailed errors.

@potiuk potiuk merged commit f3aaceb into apache:main Jun 19, 2022
@pauldalewilliams pauldalewilliams deleted the convert-sftp-hook-to-use-paramiko branch June 20, 2022 00:02
@ephraimbuddy ephraimbuddy added the changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) label Aug 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:providers changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) full tests needed We need to run full set of tests for this PR to merge kind:documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants