Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HiveHook fix get_pandas_df() failure when it tries to read an empty table #17777

Merged
merged 19 commits into from
Aug 30, 2021

Conversation

iblaine
Copy link
Contributor

@iblaine iblaine commented Aug 23, 2021

closes: #17765

bug fix for get_pandas_df() to avoid an exception when reading an empty table.


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.

@iblaine
Copy link
Contributor Author

iblaine commented Aug 23, 2021

Do we want to add a new test to test_hive.py? I'm not convinced one is needed because this bug is subtle. And I couldn't figure out how to mock an empty table to trigger this bug.

@uranusjr
Copy link
Member

I wonder if it’d work to do SELECT * FROM table LIMIT 0

@iblaine
Copy link
Contributor Author

iblaine commented Aug 23, 2021

I wonder if it’d work to do SELECT * FROM table LIMIT 0

Mock seems to return a result set regardless of the SQL.

@iblaine
Copy link
Contributor Author

iblaine commented Aug 23, 2021

Setting this line to self.iterable = [] will trigger this bug.

@iblaine
Copy link
Contributor Author

iblaine commented Aug 24, 2021

  • A new flag empty_table_flag has been added to enable tests against an empty hive table
  • A new test has been added to validate get_pandas_df against an empty table

@iblaine iblaine requested a review from uranusjr August 25, 2021 19:38
@eladkal eladkal changed the title get_pandas_df() fails when it tries to read an empty table HiveHook fix get_pandas_df() failure when it tries to read an empty table Aug 26, 2021
@iblaine
Copy link
Contributor Author

iblaine commented Aug 27, 2021

Test Always API Core Other CLI Providers WWW Integration works locally but is failing here.

Copy link
Member

@uranusjr uranusjr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That transaction failure happens from time to time, likely some sort of race condition in the test runner. We haven't identified the cause yet but it's not related to this change. m

@github-actions
Copy link

The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

@github-actions github-actions bot added the full tests needed We need to run full set of tests for this PR to merge label Aug 28, 2021
@iblaine
Copy link
Contributor Author

iblaine commented Aug 30, 2021

We have a few of new failures. These new failures appear to be unrelated to this PR. Should I rebase and try this again, or something else?

@uranusjr
Copy link
Member

The second one is also failing in main and there’s a PR to fix it, so don’t worry about it. I think the first one is already fixed; maybe try rebasing the PR to latest main?

@potiuk
Copy link
Member

potiuk commented Aug 30, 2021

Only failures that have been fixed recently in main. Merging now to regenerate providers doc.

@potiuk potiuk merged commit da99c3f into apache:main Aug 30, 2021
@boring-cyborg
Copy link

boring-cyborg bot commented Aug 30, 2021

Awesome work, congrats on your first merged pull request!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:providers full tests needed We need to run full set of tests for this PR to merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

get_pandas_df() fails when it tries to read an empty table
3 participants