Support for specifying custom date format for date and timestamp types. #280
Conversation
@falaki Just to let you know, the original functions,
@@ -128,6 +128,8 @@ class DefaultSource
     val charset = parameters.getOrElse("charset", TextFile.DEFAULT_CHARSET.name())
     // TODO validate charset?

+    val dataFormat = parameters.getOrElse("charset", TextFile.DEFAULT_CHARSET.name())
This line needs to be removed.
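(Side note for context: the flagged line reads the charset option into a dataFormat value, so it looks like a copy-paste leftover. If a date-format option were read here at all, one would expect something closer to the sketch below; the option name and the null default are assumptions, not taken from this diff.)

    val dateFormat = parameters.getOrElse("dateFormat", null)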
-      nullValue: String = ""): DataType = {
+      nullValue: String = "",
+      dateFormatter: SimpleDateFormat = null): DataType = {
       def tryParseInteger(field: String): DataType = if ((allCatch opt field.toInt).isDefined) {
Indent is off for this entire block
Um.. Do you mean the indentation correction as below?
- from

  private[csv] def inferField(typeSoFar: DataType,
      field: String,
      nullValue: String = "",
      dateFormatter: SimpleDateFormat = null): DataType = {
    def tryParseInteger(field: String): DataType = if ((allCatch opt field.toInt).isDefined) {
      IntegerType
    } else {
      tryParseLong(field)
    }
    ...

- to

  private[csv] def inferField(typeSoFar: DataType,
      field: String,
      nullValue: String = "",
      dateFormatter: SimpleDateFormat = null): DataType = {
    def tryParseInteger(field: String): DataType = if ((allCatch opt field.toInt).isDefined) {
      IntegerType
    } else {
      tryParseLong(field)
    }
    ...
Oh I see. The problem is with the lines above:
  private[csv] def inferField(typeSoFar: DataType,
      field: String,
      nullValue: String = "",
      dateFormatter: SimpleDateFormat = null): DataType = {
    def tryParseInteger(field: String): DataType = if ((allCatch opt field.toInt).isDefined) {
      IntegerType
    } else {
      tryParseLong(field)
    }
I see. Thanks!
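(For readers skimming this thread: the diff above threads a dateFormatter through the schema-inference code. The sketch below shows the general idea of how such a parameter is typically used; the helper name and the StringType fallback are illustrative assumptions, not the library's actual code.)

import java.sql.Timestamp
import java.text.SimpleDateFormat
import scala.util.control.Exception.allCatch
import org.apache.spark.sql.types.{DataType, StringType, TimestampType}

def tryParseTimestamp(field: String, dateFormatter: SimpleDateFormat = null): DataType = {
  val parsed =
    if (dateFormatter != null) allCatch opt dateFormatter.parse(field)  // user-supplied pattern
    else allCatch opt Timestamp.valueOf(field)  // default: yyyy-mm-dd hh:mm:ss[.fffffffff]
  if (parsed.isDefined) TimestampType else StringType  // StringType stands in for the real fallback
}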
@HyukjinKwon left one more comment. Otherwise looks good. I can merge this before I cut the branch tonight.
@falaki (although this is not the right thread to say this), what do you think about the feature merged in Spark in apache/spark#11464? I was working on it for this library as well. However, I just realised that this might not be something we must do identically here. Do you think it is good to support it in this library too?
Let's open an issue for it.
Thank you for adding this! I will pull and build a local snapshot until 1.4.0 officially releases.
@barrybecker4 Would you maybe create a PR for that typo (and for any other typos you know of)?
I still face an error in Python; maybe I did not use it correctly. Please kindly advise. The schema has a timestamp type, and the string in the CSV file is "25/02/2014 00:00:00". Exception: Caused by: java.lang.IllegalArgumentException: Timestamp format must be yyyy-mm-dd hh:mm:ss[.fffffffff]
The docs are not clear on how to do this, so hopefully this can help: https://stackoverflow.com/questions/43259485/how-to-load-csvs-with-timestamps-in-custom-format
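(A minimal sketch in Scala of what that answer boils down to for spark-csv 1.4+, which is where the dateFormat option from this PR lives. Without dateFormat, timestamp strings go through Timestamp.valueOf(), which only accepts yyyy-mm-dd hh:mm:ss[.fffffffff]; that is exactly the exception quoted above. The file and column names here are made up.)

import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.types.{StringType, StructField, StructType, TimestampType}

// assuming a spark-shell session where `sc` is already defined
val sqlContext = new SQLContext(sc)

val schema = StructType(Seq(
  StructField("event", StringType, nullable = true),
  StructField("ts", TimestampType, nullable = true)))

// "25/02/2014 00:00:00" matches the SimpleDateFormat pattern dd/MM/yyyy HH:mm:ss
val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .option("dateFormat", "dd/MM/yyyy HH:mm:ss")
  .schema(schema)
  .load("events.csv")

The same idea applies from Python: pass the dateFormat option to the com.databricks.spark.csv reader (or, on Spark 2.x's built-in CSV source, the timestampFormat option described in the linked answer).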
This is not working as expected: https://stackoverflow.com/questions/55965978/how-to-set-jdbc-partitioncolumn-type-to-date-in-spark-2-4-1/55966481#55966481
https://github.com/databricks/spark-csv/issues/279
https://github.com/databricks/spark-csv/issues/262
https://github.com/databricks/spark-csv/issues/266
This PR adds support for specifying a custom date format for DateType and TimestampType.

For TimestampType, this uses the given format to infer the schema and also to convert the values. For DateType, this uses the given format to convert the values.

If dateFormat is not given, then it falls back to Timestamp.valueOf() and Date.valueOf() for backwards compatibility. When it is given, it uses SimpleDateFormat for parsing the data.

In addition, IntegerType, DoubleType and LongType have a higher priority than TimestampType in type inference. This means that even if the given format is yyyy or yyyy.MM, the column will be inferred as IntegerType or DoubleType. Since it is type inference, I think it is okay to give such precedence.
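(To make the described behaviour concrete, here is a small usage sketch against spark-csv 1.4+; the file name, column contents and master URL are illustrative assumptions.)

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sc = new SparkContext(new SparkConf().setAppName("dateFormat-demo").setMaster("local[*]"))
val sqlContext = new SQLContext(sc)

// Because IntegerType, LongType and DoubleType take precedence over TimestampType during
// inference, a column of values like "2015" or "2015.10" stays numeric even under a
// dateFormat of "yyyy" or "yyyy.MM"; a column of values like "25/02/2014 00:00:00" with
// dateFormat "dd/MM/yyyy HH:mm:ss" is inferred as TimestampType.
val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .option("dateFormat", "dd/MM/yyyy HH:mm:ss")
  .load("events.csv")

df.printSchema()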