Spark 3 & Hadoop 3 Support #3218

Closed
echeipesh opened this issue Apr 1, 2020 · 2 comments · Fixed by #3294

@echeipesh
Contributor

https://spark.apache.org/news/spark-3.0.0-preview.html

Looks like it might be an easy upgrade. For now, this is a placeholder issue that should lead to publishing a SNAPSHOT release with a Spark 3.0.0-preview2 dependency.

@echeipesh echeipesh added the spark label Apr 1, 2020
@pomadchin
Member

This task is unblocked by the EMR 6.1 release. https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-release-6x.html

@pomadchin
Member

The GeoTrellis upgrade doesn't seem very complicated: it requires a single dependency upgrade (plus upgrades of some other libraries). EMR 6.1 lets us move to newer libraries and drop Scala 2.11 support.
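To illustrate what such a dependency bump might look like, here is a minimal `build.sbt` sketch. It is not taken from the GeoTrellis build: the Scala and Hadoop version numbers are assumptions chosen for illustration, and only the Spark `3.0.0-preview2` version comes from this issue.

```scala
// build.sbt (illustrative sketch, not the actual GeoTrellis build definition)

// Spark 3 is published for Scala 2.12 only, so 2.11 is no longer an option here.
// The exact patch version is an assumption.
ThisBuild / scalaVersion := "2.12.12"

// Preview release referenced in this issue.
val sparkVersion = "3.0.0-preview2"

// Assumed Hadoop 3 version; adjust to whatever the target cluster ships.
val hadoopVersion = "3.2.1"

libraryDependencies ++= Seq(
  // Provided: the cluster (for example EMR) supplies Spark and Hadoop at runtime.
  "org.apache.spark"  %% "spark-core"    % sparkVersion  % Provided,
  "org.apache.spark"  %% "spark-sql"     % sparkVersion  % Provided,
  "org.apache.hadoop" %  "hadoop-client" % hadoopVersion % Provided
)
```

The other library upgrades mentioned above would follow the same pattern, one version constant per library.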

However, the Vectorpipe project would require some effort to bump the dependency version. Things get more complicated because Vectorpipe depends on both GeoTrellis and GeoMesa:

  1. GeoMesa needs to be upgraded to Scala 2.12 (we can help with the 2.11 / 2.12 cross-compilation work for the GeoMesa project).
  2. GeoMesa still needs Scala 2.11, which can make it hard to cut a Scala 2.12 release with Spark 3 (needed for the geotrellis-geomesa project); however, it may still be possible.

=> Vectorpipe depends on GT & GM and requires:

  1. geomesa-spark-jts to be updated to Spark 3 and Scala 2.12
  2. geotrellis-geomesa to be updated to Spark 3 and Scala 2.12

RasterFrames depends on GeoTrellis and GM as well as on Spark 2, and it relies heavily on internal spark-sql logic, which is fragile and can break even across minor releases. If GeoTrellis moves to Spark 3, RasterFrames would also need to upgrade to Scala 2.12 and Spark 3, which may be much more complicated than in the GM case.

Small Conclusion

GeoTrellis itself is not that hard to upgrade. However, things get more complicated with GeoMesa, RasterFrames and Vectorpipe. The GM and RF projects are still interested in Scala 2.11 support. To keep the locationtech ecosystem reasonably in sync and up to date, we would need to maintain cross builds of both GM and RF for Scala 2.12 / 2.11 and for Spark 3 / 2, which can be nontrivial.
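As a rough sketch of what maintaining both combinations could involve, the `build.sbt` fragment below cross-builds for Scala 2.11 and 2.12 and picks the Spark line from the Scala binary version. It is not taken from the GM or RF builds: the helper `sparkVersionFor` is hypothetical and the version numbers are assumptions.

```scala
// build.sbt (illustrative sketch of a Scala 2.11 / 2.12 and Spark 2 / 3 cross build)

// Assumed patch versions; the first entry is used as the default.
val scala212 = "2.12.12"
val scala211 = "2.11.12"

scalaVersion := scala212
crossScalaVersions := Seq(scala212, scala211)

// Hypothetical helper: Scala 2.11 stays on the Spark 2.4 line,
// anything newer moves to Spark 3 (which has no 2.11 artifacts).
def sparkVersionFor(scalaBinary: String): String = scalaBinary match {
  case "2.11" => "2.4.7"
  case _      => "3.0.0"
}

libraryDependencies +=
  "org.apache.spark" %% "spark-sql" % sparkVersionFor(scalaBinaryVersion.value) % Provided
```

With a setup like this, `sbt +test` would run the tests once per entry in `crossScalaVersions`. Code that touches spark-sql internals would probably still need version-specific source directories, which is where most of the maintenance cost would come from.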

cc @jnh5y @echeipesh @elahrvivaz @metasim
