Incompatibility between enums and Spark SQL #87
Comments
Maybe relevant? https://issues.apache.org/jira/browse/SPARK-2449
Maybe this would work if the enum we generate were annotated with UserDefinedType, as explained here. Could you try editing the generated code directly, based on the example on SO, and see if it helps? Then we can see whether this can be fixed without needing special support from SparkSQL.
Sorry, I think that's beyond my Scala abilities. I'm happy to provide a more detailed repro if it helps.
Yes, a small repo I can fork that can help me reproduce this problem would be great. Out of curiosity, where/how do you store the input protocol buffers?
@thesamet I put together a minimal-ish repro here: https://github.com/danvk/scalapb-repro/ The code in that repro builds & runs successfully. If you uncomment the lines which use a message with an enum, however, you'll get the
I have good news! I have added SparkSQL support for ScalaPB. See docs here: http://trueaccord.github.io/ScalaPB/sparksql.html
@thesamet how did you resolve the enum issue with Spark SQL?
It's been a while, sorry. If there's a specific issue you're encountering, please let me know.
Spark SQL attempts to infer the schema of your data using reflection. This works for case classes. ScalaPB messages are case classes, so I'd hoped this would just work for my collection of protos.
It's close. Schema discovery seems to work fine unless my message contains enums.
Here's some code:
I'm working with the NYC Taxi data. See the full Rides proto. Payment is an enum. If I drop that field from the case class, this code works. If I include it, I get the following runtime error:

I'm honestly not sure whether this is more an issue for SparkSQL or for ScalaPB, but it would be nice if I could use SparkSQL with my protos!
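Spark SQL's reflection-based schema inference maps case-class fields of types such as Double, String, Option, Seq and nested case classes. It likely has no mapping for an arbitrary sealed enum class, which would explain why only the enum field breaks. Below is a minimal sketch of one possible workaround, assuming you are willing to maintain a separate row type: project each message into a flat case class that stores the enum by its name. Payment, Ride, RideRow and RideConversions are hypothetical stand-ins written for illustration. They only mimic the rough shape of ScalaPB-generated code and are not its actual API.

```scala
// Hypothetical stand-in for a generated enum: a sealed class carrying an
// integer value and a name. Real generated code has more members.
sealed abstract class Payment(val value: Int, val name: String)

object Payment {
  case object Cash extends Payment(0, "CASH")
  case object Credit extends Payment(1, "CREDIT")

  val values: Seq[Payment] = Seq(Cash, Credit)

  // Look up an enum case by its name; None for unknown names.
  def fromName(n: String): Option[Payment] = values.find(_.name == n)
}

// Hypothetical message shape: the enum-typed field is what schema inference rejects.
final case class Ride(fare: Double, payment: Payment)

// Flat row type that uses only types Spark SQL can infer (Double, String).
final case class RideRow(fare: Double, payment: String)

object RideConversions {
  // Message -> row: store the enum by name.
  def toRow(r: Ride): RideRow = RideRow(r.fare, r.payment.name)

  // Row -> message: fails softly (None) if the stored name is unknown.
  def fromRow(row: RideRow): Option[Ride] =
    Payment.fromName(row.payment).map(p => Ride(row.fare, p))
}

object Demo extends App {
  val rides = Seq(Ride(12.5, Payment.Cash), Ride(30.0, Payment.Credit))
  val rows = rides.map(RideConversions.toRow)
  rows.foreach(println)
  // With Spark on the classpath, `rows` could then be passed to
  // sqlContext.createDataFrame(rows); that step is omitted here to keep
  // the sketch self-contained.
}
```

The trade-off is an extra copy per record and a hand-maintained row type. Storing the enum's integer value instead of its name would also work and is more compact, at the cost of less readable query results.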