Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: fix century for strftime #45884

Open
chriscasano opened this issue Mar 9, 2020 · 19 comments
Open

sql: fix century for strftime #45884

chriscasano opened this issue Mar 9, 2020 · 19 comments
Labels
A-sql-builtins SQL built-in functions and semantics thereof. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. S-3-erroneous-edge-case Database produces or stores erroneous data without visible error/warning, in rare edge cases. T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)

Comments

@chriscasano
Copy link

chriscasano commented Mar 9, 2020

The year with century notation (%Y) doesn't always output the century:

root@:26257/defaultdb> select experimental_strftime('0001-01-01 00:00:00.000000'::TIMESTAMP, '%Y-%m-%d %H:%M:%S.%f');
experimental_strftime
+-----------------------------+
01-01-01 00:00:00.000000000
(1 row)

Jira issue: CRDB-5122

@awoods187 awoods187 added the C-investigation Further steps needed to qualify. C-label will change. label Mar 9, 2020
@knz
Copy link
Contributor

knz commented Mar 9, 2020

@chriscasano thanks for finding this issue.

Can you provide some context? What was the use case? Why is this important?

So you know:

  1. the current behavior is somewhat compatible with spec; %Y does not print the year "with century"; it prints it "in full". The year 101 for example is printed as-is.

  2. the feature is marked as "experimental" and thus has limited support.

  3. Also technically this is more difficult than other things because we are relying on an external component to provide this conversion and it's relatively out of our hands.

I suppose we can look at it but it's going to be hard(er) than usual. There's a reason why the feature is still marked as "experimental" even though it was introduced back in 1.1 or 2.0.

@knz knz changed the title Fix century for strftime sql: fix century for strftime Mar 9, 2020
@knz
Copy link
Contributor

knz commented Mar 9, 2020

@otan maybe I got this wrong and the code uses Go for strftime and C for strptime. If that is the case, it may be interesting to look into using C for both. I'll let you look into it.

@chriscasano
Copy link
Author

chriscasano commented Mar 9, 2020

Sure np - Customer's app utilizes these timestamps and can discuss outside of public forum.

1 - Wouldn't century be 00? The year is left padded, shouldn't the century as well?
2 - Agree, that's why I'm adding to help make this feature more hardened since using extract() to formulate timestamps is cumbersome. strftime seems like a good way for formatting date/times. Is there's a better way?
3 - Understood

@knz
Copy link
Contributor

knz commented Mar 9, 2020

I personally love strftime/strptime and i wish we can make them more robust and production-ready.

@knz knz added S-3-erroneous-edge-case Database produces or stores erroneous data without visible error/warning, in rare edge cases. A-sql-builtins SQL built-in functions and semantics thereof. labels Mar 9, 2020
@otan
Copy link
Contributor

otan commented Mar 9, 2020

interesting... so in python, strftime is funny before 1900:

>>> datetime(1, 2, 3).strftime("%Y-%m-%d")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: year=1 is before 1900; the datetime strftime() methods require year >= 1900

edit:

in C, we do lpad 4 digits:

 ✘ ⚙ otan@otan-cockroach-mbp  ~  gcc -Wall -Werror -O -o time time.c
 ⚙ otan@otan-cockroach-mbp  ~  ./time
0075-11-03
 ⚙ otan@otan-cockroach-mbp  ~  fg
[2]  - 5004 continued  mvim -v time.c

[2]  + 5004 suspended  mvim -v time.c
 ✘ ⚙ otan@otan-cockroach-mbp  ~  cat time.c
#include <stdio.h>
#include <time.h>
#include <limits.h>

#define SIZE 50

int main() {
  time_t t = -59773907440;
  struct tm *tmp;
  char timeStr[SIZE];
  tmp = localtime(&t);
  // using strftime to display time
  strftime(timeStr, sizeof(timeStr), "%Y-%m-%d", tmp);
  printf("%s\n", timeStr);
}

(but not for negatives)

 ✘ ⚙ otan@otan-cockroach-mbp  ~  gcc -Wall -Werror -O -o time time.c
 ⚙ otan@otan-cockroach-mbp  ~  ./time
-115-09-15
 ⚙ otan@otan-cockroach-mbp  ~  cat time.c
#include <stdio.h>
#include <time.h>
#include <limits.h>

#define SIZE 50

int main() {
  time_t t = -65773907440;
  struct tm *tmp;
  char timeStr[SIZE];
  tmp = localtime(&t);
  // using strftime to display time
  strftime(timeStr, sizeof(timeStr), "%Y-%m-%d", tmp);
  printf("%s\n", timeStr);
}

@otan
Copy link
Contributor

otan commented Mar 9, 2020

oops, time_t is not an int, let me recheck!

@otan
Copy link
Contributor

otan commented Mar 9, 2020

a bit hesitant to modifying strtime to use the C implementation, how do we feel about using https://github.com/lestrrat-go/strftime (MIT License) instead? if so, do we want to modify knz/strtime to do that instead, or just wholesale replace it with the cockroach code?

@knz
Copy link
Contributor

knz commented Mar 9, 2020

Last time I checked the alternate implementations 1) supported fewer features (in particular did not deal with DST properly) and 2) did not support strptime properly.

I could see wisdom in extending knz/strtime to use C for both strftime and strptime (I think it currently uses Go for strftime, C for strptime), using a robust and battle-tested implementation.

@knz
Copy link
Contributor

knz commented Mar 9, 2020

We have a test suite (both inside crdb and in knz/strtime) to exercise any substitute we may find.

@otan
Copy link
Contributor

otan commented Mar 9, 2020

@chriscasano in the meantime, you can use %C%y if you want the four digit padding:

[email protected]:54350/defaultdb> select experimental_strftime('0001-01-01 00:00:00.000000'::TIMESTAMP, '%C%y-%m-%d %H:%M:%S.%f');
    experimental_strftime
------------------------------
  0001-01-01 00:00:00.000000
(1 row)

Time: 330µs

[email protected]:54350/defaultdb> select experimental_strftime('10001-01-01 00:00:00.000000'::TIMESTAMP, '%C%y-%m-%d %H:%M:%S.%f');
     experimental_strftime
-------------------------------
  10001-01-01 00:00:00.000000
(1 row)

Time: 428µs

@knz i'm not 100% buying into rewriting our own implementation again in C, given it may cause even more bugs but I think we can make this implementation C compatible for now and come back to that decision later. However, it appears as if C does padding so I will modify it to %04d - at least on mac and my current locale (note it currently has %02d, which i think is just a copy paste error from %y's %02d). One could argue though it should be just %d given the %C%y workaround above.

@otan
Copy link
Contributor

otan commented Mar 9, 2020

also, the POSIX standard specifies %Y can either be %d or %04d - see rationale in (https://pubs.opengroup.org/onlinepubs/9699919799/functions/strftime.html)

@chriscasano
Copy link
Author

Ah that makes more sense. Thanks @otan!

@knz
Copy link
Contributor

knz commented Mar 9, 2020

i'm not 100% buying into rewriting our own implementation again in C, gi

my preference is to copy-paste the BSD code for strftime, like I already did for strptime. The advantages are that 1) it's going to guarantee consistent behavior between the two functions 2) that code is absolutely battle-tested 3) the folk who wrote it have a better respect for API standards than the Go developers.

@otan
Copy link
Contributor

otan commented Mar 9, 2020

running into minor problems converting golang's time to c's time struct DST field, as you cannot determine in golang whether something is DST or not without some extreme hacks.

pending WIP: https://github.com/otan-cockroach/strtime/pull/new/c_impl

@otan
Copy link
Contributor

otan commented Mar 9, 2020

(since this is lower priority with a workaround, unassigning for now)

@otan otan removed their assignment Mar 9, 2020
@chriscasano
Copy link
Author

makes sense. workaround is perfectly fine.

@github-actions
Copy link

github-actions bot commented Sep 1, 2021

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it in
10 days to keep the issue queue tidy. Thank you for your contribution
to CockroachDB!

@knz
Copy link
Contributor

knz commented Sep 1, 2021

issue is still relevant

@knz knz added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. and removed C-investigation Further steps needed to qualify. C-label will change. no-issue-activity labels Sep 1, 2021
@blathers-crl blathers-crl bot added the T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) label Sep 1, 2021
@github-actions
Copy link

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it in
10 days to keep the issue queue tidy. Thank you for your contribution
to CockroachDB!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sql-builtins SQL built-in functions and semantics thereof. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. S-3-erroneous-edge-case Database produces or stores erroneous data without visible error/warning, in rare edge cases. T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)
Projects
None yet
Development

No branches or pull requests

4 participants