update to spark3.1 #366

Merged
merged 1 commit into from
Aug 5, 2021

Conversation

aviatesk
Contributor

AFAIU the only required change is an update for apache/spark#29983.
In order to be consistent with the previous behavior and pass the
existing test suite, this PR is essentially equivalent to setting
`spark.sql.legacy.statisticalAggregate` to `true`.
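
For readers unfamiliar with the flag mentioned above, here is a minimal sketch of how it could be set on a Spark 3.1 session. This is not what the PR itself does (the PR changes the code so that it behaves as if the flag were on); the object name, app name, and sample data are illustrative assumptions. According to the Spark 3.1 migration guide, statistical aggregates such as `stddev_samp` return NULL instead of `Double.NaN` when a divide-by-zero occurs, and this flag restores the earlier behavior.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.stddev_samp

// Hypothetical object name, used only for illustration.
object LegacyStatisticalAggregateSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("legacy-statistical-aggregate-sketch")
      // Ask Spark 3.1+ for the pre-3.1 result (NaN rather than NULL)
      // when a statistical aggregate hits a divide-by-zero.
      .config("spark.sql.legacy.statisticalAggregate", "true")
      .getOrCreate()

    import spark.implicits._

    // A single-row column: the sample standard deviation divides by (n - 1) = 0,
    // which is the case whose result changed in Spark 3.1.
    val df = Seq(1.0).toDF("x")
    df.select(stddev_samp($"x")).show()

    spark.stop()
  }
}
```

Running the same snippet with the flag removed or set to `false` should show the new Spark 3.1 default for comparison.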

Now the code is incompatible with spark-2.x or spark-3.0, and so I'd
like to recommend only supporting spark 3.1 and higher and scala 2.12
from now on.

Issue #, if available:

Description of changes:

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@twollnik
Contributor

twollnik commented Jun 8, 2021

Hi @aviatesk,
Thanks for submitting this PR! Unfortunately we can't make changes that are incompatible with Spark 2.4. To your knowledge, is it possible to keep backwards compatibility?

@chethanuk

@aviatesk When is support for 3.1 going to be released?

@twollnik
Contributor

Sadly, we can't drop 2.x compatibility. We have no immediate plan to support Spark 3.1. Thank you anyway for introducing this PR!

@twollnik twollnik closed this Jul 20, 2021
@chethanuk

chethanuk commented Jul 21, 2021

> Now the code is incompatible with spark-2.x or spark-3.0, and so I'd
> like to recommend only supporting spark 3.1 and higher and scala 2.12
> from now on.

Let's do this instead: create a separate folder for Spark 3.1 and keep the required changes in there.

Also, just ignoring Spark 3.1 because we can't drop 2.x compatibility will not help Spark 3.x users.
If support for 3.1.x is not available, users like me will have to use ge or another data quality tool.

@apython1998

If anybody has suggestions for Scala Spark alternatives to Deequ that support 3.1.x out of the box, please update this thread.

@lange-labs lange-labs reopened this Aug 5, 2021
@lange-labs lange-labs merged commit 128f21d into awslabs:master Aug 5, 2021
@aviatesk aviatesk deleted the avi/spark3.1 branch August 5, 2021 13:49
5 participants