Support Spark 3.0.0 #336

Open
koertkuipers opened this issue Jun 18, 2019 · 8 comments

Comments

@koertkuipers

Spark 3.0.0 is actively being developed (it's the current master branch for Spark).

I know it's early days, but I figured I'd start commenting here on what I've run into so far, to save others time in the future.

My goal is to get a branch working against Spark 3.0.0 when it comes out, or against a Spark 3.0.0-SNAPSHOT before that. The intention is not to get this merged in: I assume TransmogrifAI will wait for Spark 3.0.1 before even considering that.

This builds on #184 and #332

@koertkuipers
Author

https://issues.apache.org/jira/browse/SPARK-26133
In Spark 3, OneHotEncoderEstimator was renamed to OneHotEncoder.
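
For anyone hitting the rename, here is a minimal sketch of what the call site looks like against Spark 3. The object name, column names, and sample data are made up for illustration; only the `OneHotEncoder` class and its `setInputCols`/`setOutputCols` setters come from Spark itself, and it assumes Spark 3.x MLlib is on the classpath.

```scala
import org.apache.spark.ml.feature.OneHotEncoder
import org.apache.spark.sql.SparkSession

object OneHotEncoderSpark3Example {
  // Spark 2.3/2.4 code would construct `new OneHotEncoderEstimator()` here.
  // In Spark 3 the same estimator is named OneHotEncoder (SPARK-26133).
  // Column names are hypothetical.
  val encoder: OneHotEncoder = new OneHotEncoder()
    .setInputCols(Array("categoryIndex"))
    .setOutputCols(Array("categoryVec"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("ohe-example")
      .getOrCreate()
    import spark.implicits._

    // Category indices must already be numeric (e.g. produced by StringIndexer).
    val df = Seq(0.0, 1.0, 2.0, 1.0).toDF("categoryIndex")

    // Fit the estimator, then add the encoded vector column.
    encoder.fit(df).transform(df).show()

    spark.stop()
  }
}
```

Code that has to compile against both Spark 2.4 and Spark 3 would need some kind of version-specific source directory or shim, since the old class name no longer exists.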

@koertkuipers
Author

Jackson has to be upgraded to 2.9.8. EitherModule was moved.
See also #162 and #173

@koertkuipers
Author

https://issues.apache.org/jira/browse/SPARK-26127
Spark removed a bunch of setters on regression and classification models. Since these are overridden in TransmogrifAI, it no longer compiles.

@koertkuipers
Author

koertkuipers commented Jun 21, 2019

In-house we are now using/testing TransmogrifAI with Spark 3.0.0 snapshots. compileTestScala works, but not all tests (as in ./gradlew test) pass.

@koertkuipers
Author

It seems OpDefaultScalaModule is broken with Jackson 2.9.9 (which is what Spark 3 uses).
This took me a bit of time to track down because the error was hidden in JsonUtils.fromString, which uses a Try/recover that recovers from all errors, masking the issue.

Anyhow, this is the issue:

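scala> import com.fasterxml.jackson.databind.ObjectMapper
scala> import com.fasterxml.jackson.module.scala.OpDefaultScalaModule
scala> import scala.reflect.classTag
scala> val m = new ObjectMapper()
scala> m.registerModule(OpDefaultScalaModule)
scala> m.readValue("""{"a": "b"}""", classTag[Map[String, String]].runtimeClass)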
java.lang.NullPointerException
  at com.fasterxml.jackson.databind.type.TypeBindings.<init>(TypeBindings.java:61)
  at com.fasterxml.jackson.databind.type.TypeBindings.createIfNeeded(TypeBindings.java:191)
  at com.fasterxml.jackson.databind.type.TypeFactory.constructMapLikeType(TypeFactory.java:858)
  at com.fasterxml.jackson.module.scala.deser.UnsortedMapDeserializer.<init>(OpUnsortedMapDeserializerModule.scala:66)
  at com.fasterxml.jackson.module.scala.deser.UnsortedMapDeserializerResolver$.findMapLikeDeserializer(OpUnsortedMapDeserializerModule.scala:123)
  at com.fasterxml.jackson.databind.deser.BasicDeserializerFactory._findCustomMapLikeDeserializer(BasicDeserializerFactory.java:1980)
  at com.fasterxml.jackson.databind.deser.BasicDeserializerFactory.createMapLikeDeserializer(BasicDeserializerFactory.java:1430)
  at com.fasterxml.jackson.databind.deser.DeserializerCache._createDeserializer2(DeserializerCache.java:386)
  at com.fasterxml.jackson.databind.deser.DeserializerCache._createDeserializer(DeserializerCache.java:349)
  at com.fasterxml.jackson.databind.deser.DeserializerCache._createAndCache2(DeserializerCache.java:264)
  at com.fasterxml.jackson.databind.deser.DeserializerCache._createAndCacheValueDeserializer(DeserializerCache.java:244)
  at com.fasterxml.jackson.databind.deser.DeserializerCache.findValueDeserializer(DeserializerCache.java:142)
  at com.fasterxml.jackson.databind.DeserializationContext.findRootValueDeserializer(DeserializationContext.java:477)
  at com.fasterxml.jackson.databind.ObjectMapper._findRootDeserializer(ObjectMapper.java:4190)
  at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:4009)
  at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3004)
  ... 36 elided
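
The masking effect described above can be sketched in plain Scala. This is not the actual JsonUtils.fromString code: `brokenParse`, `parseMasking`, and `parseNarrow` are hypothetical names, and the NullPointerException is simulated. It only illustrates how a catch-all recover turns an internal bug into an innocent-looking fallback value, while a narrower recover lets it surface.

```scala
import scala.util.Try

object MaskedErrorExample {
  // Hypothetical stand-in for a deserializer with an internal bug.
  def brokenParse(json: String): Map[String, String] = {
    val bindings: Array[String] = null
    // Dereferencing null throws NullPointerException.
    Map("bindings" -> bindings.length.toString)
  }

  // Recovers from every non-fatal error: the caller just sees an empty map
  // and never learns that a NullPointerException happened.
  def parseMasking(json: String): Map[String, String] =
    Try(brokenParse(json)).recover { case _ => Map.empty[String, String] }.get

  // Recovers only from one expected failure type: anything else
  // (like the NullPointerException) stays visible as a Failure.
  def parseNarrow(json: String): Try[Map[String, String]] =
    Try(brokenParse(json)).recover {
      case _: IllegalArgumentException => Map.empty[String, String]
    }
}
```

With the catch-all version the bug looks like "the JSON was empty"; with the narrow version the stack trace shows up right away.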

@tovbinm
Collaborator

tovbinm commented Jul 9, 2019

Perhaps try to remove OpDefaultScalaModule completely (together with OpSortedMapDeserializerModule and OpUnsortedMapDeserializerModule)?

They were added to mitigate some issues with an older version of the Jackson library: https://github.com/salesforce/TransmogrifAI/blob/master/utils/src/main/scala/com/fasterxml/jackson/module/scala/OpDefaultScalaModule.scala#L37

@koertkuipers
Author

OK, we have a fully working branch against Spark 3.0.0-SNAPSHOT that passes all tests.
It was mostly a pain because they changed the random number generator in Spark 3, so the outcomes of all the tests are slightly different.
I plan to make this branch public once I get permission to do so.

@tovbinm
Collaborator

tovbinm commented Aug 29, 2019

Yay!! Thank you. Looking forward to it!


2 participants