refactor(flink): port to sqlglot #8268

Merged: 1 commit, Feb 12, 2024

Conversation

cpcloud
Member

@cpcloud cpcloud commented Feb 7, 2024

This PR ports the Flink backend to use sqlglot.

All the fancy window functionality, which didn't have execution tests (only compilation tests), is not implemented here. I haven't yet started on adding this functionality to sqlglot, but I think that would be a prerequisite for cutting a 9.0 release. In the meantime I am going to explore whether I can hack that functionality in without creating a mess.
The hack was successful: I was able to use sge.Var to create the necessary syntax for table-valued window functions.
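
For readers unfamiliar with the syntax in question, here is a minimal sketch of the SQL text that a Flink windowing table-valued function call needs. It is plain string formatting, not sqlglot and not the code in this PR; the helper name `tumble_tvf` and the table and column names are hypothetical. The `TABLE <name>` and `DESCRIPTOR(<col>)` arguments are the parts that don't look like ordinary function arguments, which is presumably where a raw-token node like sge.Var comes in handy.

```python
def tumble_tvf(table: str, time_col: str, size: str) -> str:
    """Render a Flink TUMBLE windowing TVF call as SQL text.

    Hypothetical helper for illustration only; it is not part of ibis or sqlglot.
    """
    # `TABLE <name>` and `DESCRIPTOR(<col>)` are keyword-style arguments rather
    # than ordinary column references or literals.
    args = ", ".join(
        [f"TABLE {table}", f"DESCRIPTOR({time_col})", f"INTERVAL {size}"]
    )
    return f"TABLE(TUMBLE({args}))"


window = tumble_tvf("Bid", "bidtime", "'10' MINUTES")
query = f"SELECT * FROM {window}"
print(query)
```

The printed query has the `TABLE(TUMBLE(TABLE ..., DESCRIPTOR(...), INTERVAL ...))` shape that Flink's windowing TVFs expect.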

@cpcloud cpcloud added this to the 9.0 milestone Feb 7, 2024
@cpcloud cpcloud added refactor Issues or PRs related to refactoring the codebase flink Issues or PRs related to Flink tes-required-for-merge Issues that must addressed before merging the-epic-split branch into main labels Feb 7, 2024
@cpcloud cpcloud force-pushed the tes-flink branch 2 times, most recently from c7f264b to 879543f Compare February 8, 2024 19:25
@cpcloud cpcloud force-pushed the tes-flink branch 14 times, most recently from 35c3a8c to 3ebc25b Compare February 9, 2024 21:00
@cpcloud cpcloud marked this pull request as ready for review February 10, 2024 10:02
ibis/backends/flink/tests/conftest.py (resolved)
ibis/backends/flink/tests/conftest.py (resolved)
@pytest.mark.xfail(
    raises=(Py4JJavaError, AssertionError),
    reason="subquery probably uses too much memory/resources, flink complains about network buffers",
    strict=False,
Member Author

@cpcloud cpcloud Feb 10, 2024

This test is pretty flaky.

It passes in CI most of the time but not all, and locally I couldn't get it to pass.

Is there a backend test in ibis/backends/tests that covers this functionality?

cc @deepyaman @chloeh13q

ibis/backends/flink/ddl.py (outdated, resolved)
ibis/backends/flink/tests/test_window.py (outdated, resolved)
ibis/backends/tests/test_window.py (resolved)
ibis/backends/tests/test_timecontext.py (outdated, resolved)
ibis/backends/tests/test_string.py (outdated, resolved)
ibis/backends/tests/test_temporal.py (outdated, resolved)
@cpcloud
Member Author

cpcloud commented Feb 10, 2024

HUZZAHHHHH got at least one passing CI run 🎉

@cpcloud cpcloud added ci-run-cloud Add this label to trigger a run of BigQuery, Snowflake, and Databricks backends in CI and removed ci-run-cloud Add this label to trigger a run of BigQuery, Snowflake, and Databricks backends in CI labels Feb 10, 2024
@cpcloud
Member Author

cpcloud commented Feb 10, 2024

Figured out a solution to Flink's broken FILTER(WHERE) implementation. There's a long comment explaining the rationale in the compiler's _aggregate method.
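
As a rough illustration of how a FILTER clause can be avoided in general, the sketch below folds the predicate into the aggregate's argument with a CASE WHEN that has no ELSE branch, so rows that fail the predicate become NULL and are ignored by the aggregate. This is a generic rewrite and an assumption on my part, not necessarily what the compiler does; the helper name `filtered_agg` and the table are hypothetical, and SQLite stands in for Flink purely so the example runs with the standard library.

```python
import sqlite3
from typing import Optional


def filtered_agg(func: str, arg: str, where: Optional[str] = None) -> str:
    """Render `func(arg)` with an optional predicate, without using FILTER.

    Hypothetical helper for illustration only.
    """
    if where is None:
        return f"{func}({arg})"
    # No ELSE branch: rows failing the predicate yield NULL, which SUM, COUNT
    # and similar aggregates skip.
    return f"{func}(CASE WHEN {where} THEN {arg} END)"


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (x INTEGER, y INTEGER)")
con.executemany("INSERT INTO t VALUES (?, ?)", [(1, 0), (2, 1), (3, 1), (4, 0)])

expr = filtered_agg("SUM", "x", "y = 1")
(total,) = con.execute(f"SELECT {expr} FROM t").fetchone()
print(expr, "->", total)
```

Only the rows with `y = 1` contribute, so the sum covers `x = 2` and `x = 3`.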

@cpcloud
Member Author

cpcloud commented Feb 10, 2024

This is ready for review now. I highly recommend reviewing this PR with something other than the GitHub web UI (Codespaces works well for this purpose) because of the large number of files and lines changed.

@cpcloud cpcloud force-pushed the tes-flink branch 9 times, most recently from f6dc380 to 79be99a Compare February 11, 2024 11:48
@cpcloud
Member Author

cpcloud commented Feb 11, 2024

OK, the backend is green. I've worked out all the kinks except for a NullPointerException (NPE) that's occurring in test_array_contains.

Additionally, the flink test suite is now running in parallel in CI 🎉

@cpcloud cpcloud force-pushed the tes-flink branch 2 times, most recently from 43806b8 to 1316a23 Compare February 11, 2024 12:52
Member

@kszucs kszucs left a comment

LGTM

@cpcloud cpcloud force-pushed the tes-flink branch 3 times, most recently from da3ca05 to 5e000ab Compare February 12, 2024 10:40
@cpcloud
Member Author

cpcloud commented Feb 12, 2024

Merging on green 🚀

@cpcloud
Member Author

cpcloud commented Feb 12, 2024

Disabled parallel testing for now due to CI flakiness.

@kszucs kszucs merged commit 751639c into ibis-project:the-epic-split Feb 12, 2024
71 checks passed
@kszucs
Member

kszucs commented Feb 12, 2024

Thanks @cpcloud!

@cpcloud cpcloud deleted the tes-flink branch February 12, 2024 11:09
Labels
flink Issues or PRs related to Flink refactor Issues or PRs related to refactoring the codebase tes-required-for-merge Issues that must addressed before merging the-epic-split branch into main
2 participants