Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GlueJobOperator failing with Invalid type for parameter RoleName after updating provider version. #29960

Closed
1 of 2 tasks
Dafyddjb opened this issue Mar 7, 2023 · 11 comments · Fixed by #30162
Closed
1 of 2 tasks
Assignees
Labels
area:providers good first issue kind:bug This is a clearly a bug provider:amazon-aws AWS/Amazon - related issues

Comments

@Dafyddjb
Copy link

Dafyddjb commented Mar 7, 2023

Apache Airflow Provider(s)

amazon

Versions of Apache Airflow Providers

apache-airflow-providers-amazon = "7.3.0"

Apache Airflow version

2.5.1

Operating System

Debian GNU/Linux

Deployment

Official Apache Airflow Helm Chart

Deployment details

No response

What happened

After updating the provider version to 7.3.0 from 6.0.0, our glue jobs started failing. We currently use the GlueJobOperator to run existing Glue jobs that we manage in Terraform. The full traceback is below:

Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.9/site-packages/airflow/providers/amazon/aws/operators/glue.py", line 150, in execute
    glue_job_run = glue_job.initialize_job(self.script_args, self.run_job_kwargs)
  File "/home/airflow/.local/lib/python3.9/site-packages/airflow/providers/amazon/aws/hooks/glue.py", line 165, in initialize_job
    job_name = self.create_or_update_glue_job()
  File "/home/airflow/.local/lib/python3.9/site-packages/airflow/providers/amazon/aws/hooks/glue.py", line 325, in create_or_update_glue_job
    config = self.create_glue_job_config()
  File "/home/airflow/.local/lib/python3.9/site-packages/airflow/providers/amazon/aws/hooks/glue.py", line 108, in create_glue_job_config
    execution_role = self.get_iam_execution_role()
  File "/home/airflow/.local/lib/python3.9/site-packages/airflow/providers/amazon/aws/hooks/glue.py", line 143, in get_iam_execution_role
    glue_execution_role = iam_client.get_role(RoleName=self.role_name)
  File "/home/airflow/.local/lib/python3.9/site-packages/botocore/client.py", line 530, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/home/airflow/.local/lib/python3.9/site-packages/botocore/client.py", line 919, in _make_api_call
    request_dict = self._convert_to_request_dict(
  File "/home/airflow/.local/lib/python3.9/site-packages/botocore/client.py", line 990, in _convert_to_request_dict
    request_dict = self._serializer.serialize_to_request(
  File "/home/airflow/.local/lib/python3.9/site-packages/botocore/validate.py", line 381, in serialize_to_request
    raise ParamValidationError(report=report.generate_report())
botocore.exceptions.ParamValidationError: Parameter validation failed:
Invalid type for parameter RoleName, value: None, type: <class 'NoneType'>, valid types: <class 'str'>

What you think should happen instead

The operator creates a new job run for a glue job without additional configuration.

How to reproduce

Create a DAG with a GlueJobOperator without using iam_role_name. Example:

task = GlueJobOperator(task_id="glue-task", job_name=<glue-job-name>)

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@Dafyddjb Dafyddjb added area:providers kind:bug This is a clearly a bug needs-triage label for new issues that we didn't triage yet labels Mar 7, 2023
@boring-cyborg
Copy link

boring-cyborg bot commented Mar 7, 2023

Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.

@potiuk potiuk removed the needs-triage label for new issues that we didn't triage yet label Mar 8, 2023
@potiuk
Copy link
Member

potiuk commented Mar 8, 2023

cc: @o-nikolas @vincbeck @ferruzzi

@o-nikolas
Copy link
Contributor

Looks like this is a bug introduced in #27893

We now try generate the config for the Glue Job before we know whether we're creating a new one or updating an existing job. If the job already exists the user doesn't need to provide as much info but the new method for generating the config assumes all info for creating a new job is present (in this case the role). So it breaks since this user hasn't provided a role, because their job already exists.

@o-nikolas
Copy link
Contributor

Worse than that, the whole updating job logic makes all sorts of assumptions that don't hold true if the job is created outside of the operator (like in this case where the user created the job with terraform). The new logic will detect that the defaults in the operator generated config are different from whatever was used outside of airflow and it will try to clobber the Job. I didn't get a chance to review #27893, but it has many back-compat issues...

@vincbeck
Copy link
Contributor

vincbeck commented Mar 8, 2023

Oops! I did review this PR and did not see this. @romibuzi Any chance you can take a look at this issue and come up with a fix?

@romibuzi
Copy link
Contributor

romibuzi commented Mar 8, 2023

Oh damn indeed the operator will now try to update configuration :(
Maybe the best way to handle both ways would be to add a parameter in the operator update_config: bool = False. And the update behavior would only happens in case this parameter is set to True?

Or as @Taragolis advised we split this operator in 2 and create another one GlueStartJobOperator which will only be responsible to start an existing Glue job? Note that with this strategy the backword compatibility would be more problematic

@eladkal eladkal added provider:amazon-aws AWS/Amazon - related issues good first issue labels Mar 8, 2023
@o-nikolas
Copy link
Contributor

Oh damn indeed the operator will now try to update configuration :( Maybe the best way to handle both ways would be to add a parameter in the operator update_config: bool = False. And the update behavior would only happens in case this parameter is set to True?

This would be a straightforward solution for sure! But it would be an API change, and we can't put the genie back in the bottle: your change is already out and folks could be already using it with the expectation that it updates the job without the need for a flag. However, I think that's a pretty small contingent of folks, and we could consider this behaviour a bug, honestly, so fixing it in this way should probably not trigger back compat. I'd be interested to hear what others think.

@potiuk
Copy link
Member

potiuk commented Mar 9, 2023

This would be a straightforward solution for sure! But it would be an API change, and we can't put the genie back in the bottle: your change is already out and folks could be already using it with the expectation that it updates the job without the need for a flag. However, I think that's a pretty small contingent of folks, and we could consider this behaviour a bug, honestly, so fixing it in this way should probably not trigger back compat. I'd be interested to hear what others think.

Yes. API change is sometimes necessary to fix bugs. We are not (and should not be) extremely strict with "any api change is backwards incompatible" - if for example the interface is unusable or it behaves in unpredictable or illogical ways, it's fine to introduce such changes as bugfixes.

@o-nikolas
Copy link
Contributor

Thanks for the reply @potiuk, and I also see some 👍 responses, so I think that's consensus 😃

@romibuzi do you have the time and interest in creating a PR with the fix?

@romibuzi
Copy link
Contributor

@o-nikolas Yeah i can definitely work on it and submit a PR this week!

@romibuzi
Copy link
Contributor

I have proposed a PR adding this new parameter here: #30162

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:providers good first issue kind:bug This is a clearly a bug provider:amazon-aws AWS/Amazon - related issues
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants