Joris Van den Bossche / @jorisvandenbossche:
Is there existing functionality for this that we could reuse (e.g. in date.h)? I am not sure we should start implementing custom logic for that ourselves.
Rok Mihevc / @rok:
I don't expect this to be in scope for date.h. If I understand correctly, pandas and R/lubridate both infer a format from a subset of rows and use that format to parse the rest. Perhaps we can use that logic directly for now (I believe this was @dragosmg's idea too) and see whether we actually need this in C++?
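To make the "infer on a subset, then parse the rest" idea concrete, here is a minimal standard-library sketch. It is not how pandas or lubridate implement inference; the candidate list and the names `CANDIDATE_FORMATS`, `infer_format` and `parse_all` are hypothetical and chosen only for illustration.

```python
from datetime import datetime

# Hypothetical candidate list; a real inference routine would need far more.
CANDIDATE_FORMATS = (
    "%Y-%m-%d %H:%M:%S",
    "%Y-%m-%d",
    "%m/%d/%Y",
    "%d/%m/%Y",
)


def infer_format(values, sample_size=3):
    """Return the first candidate format that parses every sampled value."""
    sample = values[:sample_size]
    for fmt in CANDIDATE_FORMATS:
        try:
            for value in sample:
                datetime.strptime(value, fmt)
        except ValueError:
            # This candidate does not fit the sample; try the next one.
            continue
        return fmt
    raise ValueError("no candidate format matches the sample")


def parse_all(values, sample_size=3):
    """Infer a format from the head of `values`, then parse everything with it."""
    fmt = infer_format(values, sample_size)
    # Raises ValueError if a later value does not fit the inferred format.
    return fmt, [datetime.strptime(value, fmt) for value in values]
```

Note that a sample such as "01/02/2022" fits both the month-first and the day-first candidate, so the first match wins and a later value like "25/12/2022" then fails to parse. Ambiguity of this kind is one reason inference is hard to keep narrowly scoped.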
Matthew Roeschke / @mroeschke:
Speaking from experience on the pandas side, I agree with @jorisvandenbossche and would caution against "inference" logic. While convenient for users, the maintenance burden can be quite significant, since inference tends to have an open-ended scope, leading to more custom logic, edge cases, etc.
Rok Mihevc / @rok:
Thanks for the warning Matthew, much appreciated!
Looking at the utility-to-complexity ratio, this does seem like something we'd better avoid.
One idea would be to use the existing pandas logic (if pandas is available at runtime) to do the format inference, then pass the inferred format to C++ and do the rest of the operation there. The same would apply to lubridate in R.
We want to have an option to infer timestamp format.
See pandas.to_datetime and lubridate parse_date_time for examples.
Reporter: Rok Mihevc / @rok
Watchers: Rok Mihevc / @rok
Note: This issue was originally created as ARROW-15666. Please see the migration documentation for further details.