No way to define data class with Decimal(38, 0) in Spark schema #181

There seems to be no way to define data classes where the data class encoder produces a Spark schema with fields of type Decimal(38, 0). The natural approach would be to define a data class with a field of type BigInteger, but this is unsupported by the data class encoder. This can be seen by the following code, which throws java.lang.IllegalArgumentException: java.math.BigInteger is unsupported.
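(The original snippet was not preserved in this copy of the thread; a minimal reproduction, using the same kotlin-spark-api helpers that appear later in the thread, would presumably look like this:)

    import org.jetbrains.kotlinx.spark.api.withSpark
    import java.math.BigInteger

    data class A(val value: BigInteger)

    fun main() = withSpark {
        // Building an encoder for A fails because the data class encoder
        // has no mapping for java.math.BigInteger:
        // java.lang.IllegalArgumentException: java.math.BigInteger is unsupported
        dsOf(A(BigInteger.ONE)).show()
    }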
Comments
Hi! […]
Hi! Yes, I've noticed that

    data class A(@DecimalType(38, 0) val value: BigDecimal)

[…]
I see... So you actually would need a way to give the DataType in case it doesn't exist in, or is different from, the […]
Hmm... So, @jkylling, would an annotation approach still help? Because I think for all unsupported classes, UDTs would do the trick just fine (since you need both an encoder and a data type anyway). I could maybe see if I can make @SQLUserDefinedType work for data class properties too, instead of registering them for entire classes everywhere... but I'm not sure how many people would use that.
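(Editorial sketch, not from the thread: the UDT approach mentioned above could look roughly like the following. BigIntegerUDT is a hypothetical name, and as far as I know Spark's UserDefinedType API is internal in Spark 3 but still reachable from Kotlin, since its Scala access modifier is not enforced at the bytecode level:)

    import org.apache.spark.sql.types.DataType
    import org.apache.spark.sql.types.Decimal
    import org.apache.spark.sql.types.DecimalType
    import org.apache.spark.sql.types.UserDefinedType
    import java.math.BigDecimal
    import java.math.BigInteger

    // Hypothetical UDT that stores java.math.BigInteger as DECIMAL(38, 0).
    class BigIntegerUDT : UserDefinedType<BigInteger>() {
        override fun sqlType(): DataType = DecimalType(38, 0)
        // Catalyst's internal representation of DecimalType values is
        // org.apache.spark.sql.types.Decimal
        override fun serialize(obj: BigInteger): Any = Decimal.apply(BigDecimal(obj))
        override fun deserialize(datum: Any): BigInteger =
            (datum as Decimal).toJavaBigDecimal().toBigIntegerExact()
        override fun userClass(): Class<BigInteger> = BigInteger::class.java
    }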
Just support for BigInteger […]
Well yes, but it gets converted to […]
Interesting, this is what is used for […]
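(Editorial aside, not from the thread: Spark's own Java bean encoder already maps java.math.BigInteger to DecimalType(38, 0), which is presumably the conversion being discussed here. A quick way to check, using a hypothetical Wrapper bean:)

    import org.apache.spark.sql.Encoders
    import java.math.BigInteger

    // A bean-style class, since Encoders.bean needs getters/setters
    // and a no-arg constructor.
    class Wrapper {
        var value: BigInteger = BigInteger.ZERO
    }

    fun main() {
        // Should print something like:
        // StructType(StructField(value,DecimalType(38,0),true))
        println(Encoders.bean(Wrapper::class.java).schema())
    }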
Have you got an example for me to try? :)
It turns out the particular error I got was related to the code running in the Databricks runtime, probably because the Databricks runtime has stricter conversion checks than the open source Spark runtime, which produces null values instead of throwing an exception. A minimal example of this behavior is below:

    import org.jetbrains.kotlinx.spark.api.`as`
    import org.jetbrains.kotlinx.spark.api.map
    import org.jetbrains.kotlinx.spark.api.withSpark
    import java.math.BigDecimal

    data class A(val value: BigDecimal)

    fun main() = withSpark {
        val table = "tbl"
        // a DECIMAL(38, 0) table with one small and one 38-digit value
        spark.sql("CREATE TABLE $table (value DECIMAL(38, 0)) USING parquet")
        spark.sql("INSERT INTO $table VALUES (2)")
        spark.sql("INSERT INTO $table VALUES (1${"0".repeat(37)})")
        val df = spark.sql("select * from $table order by 1 asc limit 1")
        // adding 10^19 keeps the result within the encoder's decimal range
        df.`as`<A>()
            .map { A(it.value.add(BigDecimal("1" + "0".repeat(19)))) }
            .also { println(it.schema()) }
            .write().insertInto(table)
        // adding 10^20 overflows it, and open source Spark writes null instead
        df.`as`<A>()
            .map { A(it.value.add(BigDecimal("1" + "0".repeat(20)))) }
            .write().insertInto(table)
        spark.sql("select * from $table").show()
    }

This outputs: […]

In the Databricks runtime this example should instead throw an exception. Is there a way to write transforms like the above which are able to write Decimal(38, 0) values?
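(Editorial aside, not from the thread: the null/exception boundary in the example above is consistent with the BigDecimal encoder using Spark's default DecimalType(38, 18), which leaves room for only 20 integer digits. Spark's Decimal.changePrecision shows the mechanism; fits is a hypothetical helper:)

    import org.apache.spark.sql.types.Decimal
    import java.math.BigDecimal

    // changePrecision returns false when a value cannot be represented at the
    // requested precision and scale, which is where silent nulls would come from.
    fun fits(v: BigDecimal): Boolean = Decimal.apply(v).changePrecision(38, 18)

    fun main() {
        println(fits(BigDecimal("1" + "0".repeat(19))))  // true:  20 digits + 18 scale = 38
        println(fits(BigDecimal("1" + "0".repeat(20))))  // false: 21 digits + 18 scale = 39 > 38
    }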
I'm afraid I don't know enough about this specific part of Spark to give a helpful answer. Maybe you should try StackOverflow for that too :)
Let me rephrase the question: How would I use the Kotlin Spark API to get a Spark data frame with schema Decimal(38, 0)?
I just played around with it a bit. If you add

    data class A(val value: BigInteger)

    val df = dsOf(A(2.toBigInteger()))
        .showDS()
        .also { println(it.dtypes().toList()) }  // the printed dtypes should now report decimal(38,0)

    df
        .map { A(it.value.add(BigInteger("1" + "0".repeat(20)))) }
        .showDS(truncate = false)
        .also { println(it.dtypes().toList()) }

[…]
Great! Thank you for fixing this! I'll give it a go soon.