Make `log` function to be in sync with PostgresSql #5259

comphead · 2023-02-12T07:13:58Z

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
The log function has some discrepancy with Postgres for edgecases e.g.

❯ select log(1, 64), log(0);
+-------------------------+---------------+
| log(Int64(1),Int64(64)) | log(Int64(0)) |
+-------------------------+---------------+
| inf                     | -inf          |
+-------------------------+---------------+

The query executes ok, but in PostgresSql the query fail fast

postgres=# select log(0);
ERROR:  cannot take logarithm of zero
postgres=# select log(1, 64);
ERROR:  division by zero

Describe the solution you'd like
Investigate and implement log function to be the same behavior as for PostgresSql for edgecases above and perhaps for other like negative numbers, etc

Describe alternatives you've considered
Not doing this

Additional context
Found when working on #5245

The text was updated successfully, but these errors were encountered:

comphead · 2023-02-13T00:25:26Z

Other edgecases

PG log(-1), log(-10, 1), log(10, -1)
cannot take logarithm of a negative number

DF
+----------------+--------------------------+--------------------------+
| log(Int64(-1)) | log(Int64(-1),Int64(10)) | log(Int64(1),Int64(-10)) |
+----------------+--------------------------+--------------------------+
| NaN            | NaN                      | NaN                      |
+----------------+--------------------------+--------------------------+

ozankabak · 2023-02-13T04:10:37Z

In short, this behavior doesn't seem unreasonable to me. IIRC, it also agrees with IEEE 754.

Still, let's dig a little deeper and think about this in a larger context. What should we do when such values occur in the middle of a large table? What does PG do in that case? What should we do in case of an unbounded table/stream?

Theoretically, this is a local out-of-domain issue and the output value indicates this clearly. Therefore, erroring out does not seem reasonable to me. Having worked on numerics for a long time in a wide variety of contexts, Datafusion's current behavior seems quite reasonable 🙂

On the other hand, we want PG compatibility in general. Maybe we can add a configuration flag that controls what we do in such cases.

comphead · 2023-02-14T01:34:09Z

@ozankabak Its probably point of long disputes what is better: failfast or return null/nan etc.
I'm still thinking we should stick to PG behavior, as it is consistent and expectable from users migrating their queries.

From personal point of view it seems failfast is better solution anyway as it doesn't silently corrupt the data and give the user responsibility how to deal with bad data on his side. Math function has to be simple and support only supportable data for given function.

Other day old spark versions turned the value into null if it cannot be casted to timestamp and this was sooooo frustrating, that spark went away and now the to_timestamp function fails fast.

ozankabak · 2023-02-14T05:14:56Z

As I said, I see the value in failing fast but doing that unconditionally is simply not an option in certain applications when you are executing a long-running query on a stream. That's why I suggested to have a configuration option to control this. Having both behaviors available allows us to cater to all use cases without impacting anyone badly, and has a nice side-effect of obviating the need for long philosophical discussions about failing fast vs. special values.

comphead · 2023-02-16T00:56:17Z

@ozankabak thanks for feedback, the discussion is already much broader than log function itself.
You may want to create a separate topic to discuss failfast/failsafe approaches in detail. Moreover this behavior has to be consistent across all math functions, and probably cast functions as well.

I just noticed, since we use rust math functions, it returns NaN values for outlier scenarios, and this going to be consistent, even not complying to how PG works

❯ select sqrt(-1);
+-----------------+
| sqrt(Int64(-1)) |
+-----------------+
| NaN             |
+-----------------+

whereas PG bursts as

cannot take square root of a negative number

I'll leave it for @alamb @andygrove to decide

ozankabak · 2023-02-16T02:01:34Z

Yes, Rust is following the usual numerics convention (which is unsurprising) and the behavior naturally propagates as we rely on these functions. I agree with the emphasis on consistency. If we want to support PG-like behavior through a general configuration option, we will need to write wrappers around these functions that consult the config.

alamb · 2023-02-16T18:32:24Z

If we want to support PG-like behavior through a general configuration option, we will need to write wrappers around these functions that consult the config.

I agree the only way everyone will be satisfied is with a config that controls it.

I suspect this is a non trivial amount of effort to implement this consistently - so we should be aware of that as we start down the road

ozankabak · 2023-02-16T18:41:04Z

Agreed

comphead · 2023-02-16T18:41:27Z

Thanks @alamb so we can close this ticket then and leave all math scalar functions to work in sync with rust math functions and out of sync with PG. Probably its worth to document somewhere

alamb · 2023-02-16T19:37:23Z

Probably its worth to document somewhere

Agree -- the best place would be https://github.com/apache/arrow-datafusion/blob/main/docs/source/user-guide/expressions.md perhaps. Do either of you have some time to write one? Otherwise I will do so

comphead · 2023-02-16T19:53:01Z

i'll update the doc. Filed #5312

comphead added the enhancement New feature or request label Feb 12, 2023

comphead mentioned this issue Feb 12, 2023

Add SQL function overload LOG(base, x) for logarithm of x to base #5245

Merged

alamb closed this as completed Feb 16, 2023

comphead mentioned this issue Feb 16, 2023

[DOC] Update math expression documentation #5312

Closed

Jefffrey mentioned this issue Feb 18, 2024

Minior: Add negative tests about log function #9245

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make `log` function to be in sync with PostgresSql #5259

Make `log` function to be in sync with PostgresSql #5259

comphead commented Feb 12, 2023

comphead commented Feb 13, 2023

ozankabak commented Feb 13, 2023 •

edited

Loading

comphead commented Feb 14, 2023

ozankabak commented Feb 14, 2023

comphead commented Feb 16, 2023

ozankabak commented Feb 16, 2023 •

edited

Loading

alamb commented Feb 16, 2023

ozankabak commented Feb 16, 2023

comphead commented Feb 16, 2023

alamb commented Feb 16, 2023

comphead commented Feb 16, 2023 •

edited

Loading

Make log function to be in sync with PostgresSql #5259

Make log function to be in sync with PostgresSql #5259

Comments

comphead commented Feb 12, 2023

comphead commented Feb 13, 2023

ozankabak commented Feb 13, 2023 • edited Loading

comphead commented Feb 14, 2023

ozankabak commented Feb 14, 2023

comphead commented Feb 16, 2023

ozankabak commented Feb 16, 2023 • edited Loading

alamb commented Feb 16, 2023

ozankabak commented Feb 16, 2023

comphead commented Feb 16, 2023

alamb commented Feb 16, 2023

comphead commented Feb 16, 2023 • edited Loading

Make `log` function to be in sync with PostgresSql #5259

Make `log` function to be in sync with PostgresSql #5259

ozankabak commented Feb 13, 2023 •

edited

Loading

ozankabak commented Feb 16, 2023 •

edited

Loading

comphead commented Feb 16, 2023 •

edited

Loading