Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Timeout is ambiguous in SSHHook and SSHOperator #16364

Closed
NBardelot opened this issue Jun 10, 2021 · 15 comments · Fixed by #17236
Closed

Timeout is ambiguous in SSHHook and SSHOperator #16364

NBardelot opened this issue Jun 10, 2021 · 15 comments · Fixed by #17236
Assignees
Labels
area:core-operators Operators, Sensors and hooks within Core Airflow good first issue kind:bug This is a clearly a bug

Comments

@NBardelot
Copy link
Contributor

NBardelot commented Jun 10, 2021

In SSHHook the timeout argument of the constructor is used to set a connection timeout. This is fine.

But in SSHOperator the timeout argument of the constructor is used for both the timeout of the SSHHook and the timeout of the command itself (see paramiko's ssh client exec_command use of the timeout parameter). This ambiguous use of the same parameter is very dirty.

I see two ways to clean the behaviour:

  1. Let the SSHHook constructor be the only way to handle the connection timeout (thus, if one wants a specific timeout they should explicitely build a hook to be passed to the operator using the operator's constructor).
  2. Split the timeout argument in SSHOperator into two arguments conn_timeout and cmd_timeout for example.

The choice between 1 and 2 depends on how frequently people are supposed to want to change the connection timeout. If it is something very frequent. then go for 2. if not go for 1.

BR and thanks for the code!

@NBardelot NBardelot added the kind:bug This is a clearly a bug label Jun 10, 2021
@uranusjr
Copy link
Member

I feel Option 2 is clearer regardless, since it is still quite confusing if SSHHook(timeout=...) and SSHOperator(timeout=...) mean two different things. It’s probably better to be explicit about what the timeout is for:

  1. Create conn_timeout and pass it to SSHCHook(timeout).
  2. Create cmd_timeout for the command line timeout.
  3. Keep SSHOperator(timeout); if it’s passed, set conn_timeout and cmd_timeout to timeout if they are None. TypeError if timeout is all three are passed.
  4. (Optional) Deprecate SSHOperator(timeout) in favour of conn_timeout and cmd_timeout.

@potiuk
Copy link
Member

potiuk commented Jun 11, 2021

Agree and I think point 4. should not be optional. We should deprecate it.

One more thing - I noticed that "timeout" (among others) options for sshHook which also should be updated:

                if "timeout" in extra_options:
                    self.timeout = int(extra_options["timeout"], 10)

I though also that we could convert the SSHHook to be configurable via UI changing the extra__ssh__ names similarly to #15159 (and follow up fix #16388). However, unfortunately this is a bit more complex, there is a small caveat with this - we have SFTPHook that derives from SSHHook and this means that we could not immediately reuse the logic of the fields defined there (extras__ssh__cmd_timeout vs. extras__sftp__cmd_timeout) .

I think it's not worth for this change, but we should probably reimplement the way the extra UI fields are defined to support inheritance better.

@potiuk
Copy link
Member

potiuk commented Jun 11, 2021

@ashb See ^^ I think you mentioned you had some idea that we should get rid of the __extra__CONN_TYPE__ prefix ? correct? It indeed makes things complex for inheritance case and we have about half-half connections now - the UI ones are implementing extra__CONN_TYPE prefix but there are a number of them that have "bare" extras.

@ashb
Copy link
Member

ashb commented Jun 11, 2021

@ashb See ^^ I think you mentioned you had some idea that we should get rid of the __extra__CONN_TYPE__ prefix ? correct? It indeed makes things complex for inheritance case and we have about half-half connections now - the UI ones are implementing extra__CONN_TYPE prefix but there are a number of them that have "bare" extras.

@potiuk Yeah, I always just found it annoying to have the prefix on it, mostly when setting the connection via a URI. Are you saying this issue means we should keep the prefix, or you would like to remove it too?

@potiuk
Copy link
Member

potiuk commented Jun 11, 2021

@potiuk Yeah, I always just found it annoying to have the prefix on it, mostly when setting the connection via a URI. Are you saying this issue means we should keep the prefix, or you would like to remove it too?

SSH/SFTP does not have the prefix (it does not support UI fields for extras). Since long term remove the prefix (I agree it's annoying)- we should not touch it here then and keep it without the prefix and implement removal of the prefix later (and then we can add UI support for SSH/SFTP).

So summarizing @NBardelot : just don't forget to add the "cmd_timeout" and "conn_timeout" to extras (without prefix).

@ashb
Copy link
Member

ashb commented Jun 11, 2021

Do we need cmd_timeout on the Hook? Isn't that more a property that only makes sense for the operator because it changes from task-to-task, and isn't anything to do with the Connection.

@potiuk
Copy link
Member

potiuk commented Jun 11, 2021

Do we need cmd_timeout on the Hook? Isn't that more a property that only makes sense for the operator because it changes from task-to-task, and isn't anything to do with the Connection.

I think both should be in Hook/connection - acting as default value for all SSH commands. It could be done via dag defaults as well, but both "cmd_timeout" and "timeout" are a bit too generic IMHO for Dag defaults, so it would have to be "ssh_command_timeout" or something like tha. In this case, it rather belongs to connection.

@eladkal eladkal added area:core-operators Operators, Sensors and hooks within Core Airflow good first issue labels Jun 20, 2021
@fatmumuhomer
Copy link
Contributor

@potiuk or @ashb - Can someone please assign to me if no one else is already working on it? Thanks!

@potiuk
Copy link
Member

potiuk commented Jul 18, 2021

Please!

@fatmumuhomer
Copy link
Contributor

fatmumuhomer commented Jul 19, 2021

Just to be clear on the preferred usage going forward:

  • SSHOperator uses two different timeouts:
    • A connection timeout passed into SSHHook (see below)
    • A command timeout passed into paramiko.SSHClient.exec_command() which sets the command’s channel timeout
  • SSHHook uses a single timeout:
    • A connection timeout passed into paramiko.SSHClient.connect() which is a timeout for the TCP connect.

It doesn't seem to make sense to accept a cmd_timeout in the hook, so I am planning to do the following:

  • Accept both conn_timeout and cmd_timeout in SSHOperator.
  • Only accept conn_timeout in SSHHook.
  • Add a deprecation warning to SSHHook for the use of timeout in extra_options; favor conn_timeout if provided.

@uranusjr
Copy link
Member

Sounds reasonable to me. You’ll also need to keep the timeout arguments on both the operator and hook for backward compatibility (also with a deprecation warning).

@fatmumuhomer
Copy link
Contributor

Thanks, @uranusjr. For the deprecation warning, is it fair to warn that it will be removed in a future version without specifying the exact one?

@uranusjr
Copy link
Member

Yes; you can search for DeprecationWarning to get some inspiratrions how to format the message.

@fatmumuhomer
Copy link
Contributor

@uranusjr, @potiuk I created a pull request for this change - #17236

Can someone take a look when you have time? Also, it says I need a maintainer to approve running workflows since I am a first time contributor.

Thank you.

@potiuk
Copy link
Member

potiuk commented Dec 10, 2022

Airflow version 2.5.0 - Facing ssh command timed out for the long-running tasks with SSHOperator connecting & executing java scripts in remote server

@jayam26 Not sure what was your goal here. If you see an issue with using the timeouts, commenting here with such statement. makes very little sense if you want help - I suggsest you start a discussion with providing some more things you tried and failed and ask what you want help with. But you need to start with explaining the details (and don't do it in the closed issue - you might refer to it if you want to refer to somethign discussed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:core-operators Operators, Sensors and hooks within Core Airflow good first issue kind:bug This is a clearly a bug
Projects
None yet
6 participants