Reimplement datediff and dateadd in C to improve performance #1998

@Jakeowen1 Jakeowen1 commented Nov 8, 2023

Description

This change reimplements the datediff, datediff_big, and dateadd functions in C to improve performance by 65% compared to the original implementation.

Issues Resolved

Task: Babel-4496

Test Scenarios Covered

  • Use case based -
1> select datediff(hour, cast("2023-01-01 01:01:20.99" as datetime), cast("2024-01-01 01:01:20.99" as datetime))
2> go
datediff   
-----------
       8760

(1 rows affected)
  • Boundary conditions -
1> select datediff(week, cast("2023-01-01 01:01:20.99" as datetime), cast("2023-01-05 01:01:20.99" as datetime))
2> go
datediff   
-----------
          1

(1 rows affected)
  • Arbitrary inputs -
1> select dateadd(dayofyear, 5, cast("text" as datetimeoffset));
2> go
Msg 33557097, Level 16, State 1, Server BABELFISH, Line 1
invalid input syntax for type timestamp with time zone: "text"
1> select dateadd(dayofyear, 5, null)
2> go
dateadd                
-----------------------
                   NULL

(1 rows affected)
1> select datediff(y, NULL, cast("1900-01-02" as datetime))
2> go
datediff   
-----------
       NULL

(1 rows affected)
  • Negative test cases -
1> select datediff(nanosecond, cast("2023-02-15" as datetime), cast("1950-02-20" as datetime))
2> go
Msg 33557097, Level 16, State 1, Server BABELFISH, Line 1
The datediff function resulted in an overflow. The number of dateparts separating two date/time instances is too large. Try to use datediff with a less precise datepart
1> 
1> select dateadd(day, 2, cast("01:01:21" as time));
2> go
Msg 33557097, Level 16, State 1, Server BABELFISH, Line 1
the datepart "day" is not supported by function dateadd for datatype time
  • Minor version upgrade tests -

  • Major version upgrade tests -

  • Performance tests -

Join on two tables with 60,000 rows each

Query 1:

select d.D
    from dates d
    join more_dates md
        on d.Id = md.Id 
    where datediff(day, d.D, md.D) = 0
GO

Performance with commit:
~~START~~
text
Query Text: select d.D
    from dates d
    join more_dates md
        on d.Id = md.Id 
    where datediff(day, d.D, md.D) = 0
Hash Join  (cost=1816.75..2978.79 rows=332 width=8) (actual time=91.801..677.766 rows=60001 loops=1)
  Hash Cond: (d.id = md.id)
  Join Filter: (datediff('day'::text COLLATE "default", d.d, md.d) = 0)
  ->  Seq Scan on dates d  (cost=0.00..988.00 rows=66300 width=12) (actual time=0.046..56.757 rows=60001 loops=1)
  ->  Hash  (cost=988.00..988.00 rows=66300 width=12) (actual time=89.527..89.529 rows=60001 loops=1)
        Buckets: 131072  Batches: 1  Memory Usage: 3837kB
        ->  Seq Scan on more_dates md  (cost=0.00..988.00 rows=66300 width=12) (actual time=0.040..43.681 rows=60001 loops=1)
Planning Time: 0.020 ms
Execution Time: 747.474 ms
~~END~~

Performance without commit:
~~START~~
text
Query Text: select d.D
    from dates d
    join more_dates md
        on d.Id = md.Id 
    where datediff(day, d.D, md.D) = 0
Hash Join  (cost=1816.75..2978.79 rows=332 width=8) (actual time=94.734..2920.237 rows=60001 loops=1)
  Hash Cond: (d.id = md.id)
  Join Filter: (datediff('day'::text COLLATE "default", d.d, md.d) = 0)
  ->  Seq Scan on dates d  (cost=0.00..988.00 rows=66300 width=12) (actual time=0.049..59.248 rows=60001 loops=1)
  ->  Hash  (cost=988.00..988.00 rows=66300 width=12) (actual time=89.456..89.458 rows=60001 loops=1)
        Buckets: 131072  Batches: 1  Memory Usage: 3837kB
        ->  Seq Scan on more_dates md  (cost=0.00..988.00 rows=66300 width=12) (actual time=0.042..43.743 rows=60001 loops=1)
Planning Time: 0.014 ms
Execution Time: 3002.463 ms
~~END~~


Query 2: 
select dateadd(day, 1, d.D)
    from dates d
    join more_dates md
        on d.Id = md.Id 
    where datediff(day, d.D, md.D) = 0
GO

Performance with commit: 
~~START~~
text
Query Text: select dateadd(day, 1, d.D)
    from dates d
    join more_dates md
        on d.Id = md.Id 
    where datediff(day, d.D, md.D) = 0
Hash Join  (cost=1816.75..3061.79 rows=332 width=8) (actual time=80.382..1347.963 rows=60001 loops=1)
  Hash Cond: (d.id = md.id)
  Join Filter: (datediff('day'::text COLLATE "default", d.d, md.d) = 0)
  ->  Seq Scan on dates d  (cost=0.00..988.00 rows=66300 width=12) (actual time=0.015..42.137 rows=60001 loops=1)
  ->  Hash  (cost=988.00..988.00 rows=66300 width=12) (actual time=77.801..77.803 rows=60001 loops=1)
        Buckets: 131072  Batches: 1  Memory Usage: 3837kB
        ->  Seq Scan on more_dates md  (cost=0.00..988.00 rows=66300 width=12) (actual time=0.011..33.449 rows=60001 loops=1)
Planning Time: 0.013 ms
Execution Time: 1411.302 ms
~~END~~

Performance without commit
~~START~~
text
Query Text: select dateadd(day, 1, d.D)
    from dates d
    join more_dates md
        on d.Id = md.Id 
    where datediff(day, d.D, md.D) = 0
Hash Join  (cost=1816.75..3061.79 rows=332 width=8) (actual time=81.898..4002.259 rows=60001 loops=1)
  Hash Cond: (d.id = md.id)
  Join Filter: (datediff('day'::text COLLATE "default", d.d, md.d) = 0)
  ->  Seq Scan on dates d  (cost=0.00..988.00 rows=66300 width=12) (actual time=0.016..48.974 rows=60001 loops=1)
  ->  Hash  (cost=988.00..988.00 rows=66300 width=12) (actual time=78.335..78.337 rows=60001 loops=1)
        Buckets: 131072  Batches: 1  Memory Usage: 3837kB
        ->  Seq Scan on more_dates md  (cost=0.00..988.00 rows=66300 width=12) (actual time=0.010..33.678 rows=60001 loops=1)
Planning Time: 0.071 ms
Execution Time: 4084.733 ms
~~END~~

  • Tooling impact -
    None

  • Client tests -
    None
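
As a sanity check on the headline figure, the relative reduction in execution time can be recomputed from the EXPLAIN ANALYZE totals in the performance tests above. This is a small standalone sketch; the helper name `percent_reduction` is hypothetical and is not part of the patch.

```c
#include <assert.h>

/*
 * Hypothetical helper (not part of the patch): percentage by which the
 * execution time dropped, given the totals before and after the change.
 */
double
percent_reduction(double before_ms, double after_ms)
{
    assert(before_ms > 0.0);    /* avoid dividing by zero */
    return (before_ms - after_ms) / before_ms * 100.0;
}
```

With the totals reported above, `percent_reduction(3002.463, 747.474)` comes to roughly 75% for Query 1 and `percent_reduction(4084.733, 1411.302)` to roughly 65% for Query 2.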

Check List

  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is under the terms of the Apache 2.0 and PostgreSQL licenses, and grant any person obtaining a copy of the contribution permission to relicense all or a portion of my contribution to the PostgreSQL License solely to contribute all or a portion of my contribution to the PostgreSQL open source project.

For more information on following Developer Certificate of Origin and signing off your commits, please check here.

AS
$body$
BEGIN
return sys.datediff_internal(datepart, startdate::TIMESTAMP, enddate::TIMESTAMP);

Does an explicit type conversion happen here?

Signed-off-by: Jake Owen <[email protected]>
DECLARE
timezone INTEGER;
BEGIN
timezone = sys.babelfish_get_datetimeoffset_tzoffset(startdate)::INTEGER * 2;

Why is this part not moved into the C implementation?

break;
}
} else {
elog(ERROR, "the datepart %s is not supported by function dateadd for datatype datetimeoffset", lowunits);

The original message in the script implementation is:

	RAISE EXCEPTION '"%" is not a recognized dateadd option.', datepart;

interval = (Interval *) DirectFunctionCall7(make_interval, 0, 0, 0, 0, 0, 0, Float8GetDatum((float) num * 0.000000001));
break;
default:
elog(ERROR, "the datepart %s is not supported by function dateadd for datatype datetimeoffset",

Same here: the error message is different.
Also, could you combine the two error-out places?

AS
$body$
BEGIN
return sys.datediff_internal(datepart, startdate::TIMESTAMP, enddate::TIMESTAMP);

The original implementation (the script version of datediff_internal) passes startdate and enddate as PG_CATALOG.date into datediff_internal.
This implementation adds an extra explicit type conversion here, and I don't understand why it is needed.

case DTK_YEAR:
if(dttype == TIME) {
elog(ERROR, "the datepart %s is not supported by date function dateadd for data type time",

The original error message should be:

The datepart % is not supported by date function dateadd for data type date.{datepart}

This is slightly different from this implementation.

}
} else {
elog(ERROR, "the datepart %s is not supported by date function dateadd for data type %s",

Also, can we raise this error message in one place in the code instead of repeating it so many times?

ereport(ERROR,
(errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
errmsg("data out of range for %s"), datetypeName(dttype)));

In the original SQL script, the error message is:

		RAISE EXCEPTION '''%'' is not a recognized dateadd option.', datepart;


~~ERROR (Message: integer out of range)~~
~~ERROR (Message: The datediff function resulted in an overflow. The number of dateparts separating two date/time instances is too large. Try to use datediff with a less precise datepart)~~

Is that expected?

@forestkeeper

Would you also mention the percentage performance gain of this C implementation?

break;
default:
elog(ERROR, "the datepart \"%s\" is not supported by function datediff for datatype date", lowunits);

Please also make sure this error message is the same as the previous one.


if(overflow) {
elog(ERROR, "The datediff function resulted in an overflow. The number of dateparts separating two date/time instances is too large. Try to use datediff with a less precise datepart");

Where does this error message come from?

bool overflow = false;

ok1 = timestamp2tm(timestamp1, NULL, tm1, &fsec1, NULL, NULL);

Can we find better names for ok1 and ok2?

int32 microsecdiff;
struct pg_tm tt1,
*tm1 = &tt1;

Defining tt1 and then a pointer to tt1 looks weird. Is there a better way to define these? Also, please give them meaningful names instead of tm1 and tm2.

}

if(!validDateDiff) {
elog(ERROR, "The datepart %s is not a recognized datediff option.", lowunits);

I think the original error didn't have 'The'

		RAISE EXCEPTION '"%" is not a recognized datediff option.', datepart;

}
if(overflow) {
elog(ERROR, "The datediff function resulted in an overflow. The number of dateparts separating two date/time instances is too large. Try to use datediff with a less precise datepart");

Also, we should use ereport to raise errors in the C implementation, and make sure the transaction behavior for this error message is the same as in SQL Server.


if(!validDateDiff) {
elog(ERROR, "The datepart %s is not a recognized datediff option.", lowunits);

We should add test cases for each error path and make sure the transaction behavior is the same as in SQL Server (an error should not abort the current transaction).

case DTK_MICROSEC:
if(dttype == SMALLDATETIME || dttype == DATETIME || dttype == DATE) {
elog(ERROR, "The datepart %s is not supported by date function dateadd for data type %s.",

Would you reformat this part of the code so that all of these errors are raised in the same place? The error message is also not the same, because it has an additional 'The' in the message body.

ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("The datepart %s is not supported by date function %s for data type %s.", lowunits, "dateadd", "date")));

These are all very similar error messages:

ereport(ERROR,
							(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
							 errmsg("The datepart %s is not supported by date function %s for data type %s.", lowunits, "dateadd", "date")));
				}

Please make sure we don't repeat this code everywhere.
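
One way to address this would be to route every unsupported-datepart error through a single helper, so the message text lives in one place. The sketch below is standalone C so that it can be compiled outside the extension. `format_unsupported_datepart` is a hypothetical name, and inside the extension the same arguments would presumably be passed to `ereport` with `errmsg` rather than to `snprintf`.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper (not from the patch): builds the "unsupported
 * datepart" message in exactly one place. Returns the message length
 * as reported by snprintf.
 */
int
format_unsupported_datepart(char *buf, size_t buflen,
                            const char *datepart,
                            const char *funcname,
                            const char *typname)
{
    assert(buf != NULL && buflen > 0);
    return snprintf(buf, buflen,
                    "The datepart %s is not supported by date function %s for data type %s.",
                    datepart, funcname, typname);
}
```

Each `case` branch would then call the helper with its own datepart and type name instead of repeating the format string, which also keeps the wording identical across branches.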

@@ -371,13 +371,6 @@ datetime
2016-12-27 00:25:29.0
~~END~~

-- out of range
select dateadd(year, 150, cast('9900-12-26 23:29:29' as datetime))

Why did we delete this test case?

@forestkeeper forestkeeper merged commit d5d1d05 into babelfish-for-postgresql:BABEL_3_X_DEV Nov 17, 2023
29 checks passed
Jakeowen1 added a commit to amazon-aurora/babelfish_extensions that referenced this pull request Nov 17, 2023
…sh-for-postgresql#1998)

This change reimplements the datediff, datediff_big, and dateadd functions in C to improve performance by 65% compared to the original implementation.

Task: BABEL-4496
Signed-off-by: Jake Owen <[email protected]>
Deepesh125 pushed a commit to amazon-aurora/babelfish_extensions that referenced this pull request Nov 20, 2023
…sh-for-postgresql#1998)


This change reimplements the datediff, datediff_big, and dateadd functions in C to improve performance by 65% compared to the original implementation.

Task: BABEL-4496
Signed-off-by: Jake Owen <[email protected]>
@Jakeowen1 Jakeowen1 mentioned this pull request Nov 20, 2023
1 task