Review `NaN` handling in `median` and `approx_median` #3039

andygrove · 2022-08-05T12:47:39Z

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
We do not have tests involving `NaN` for `approx_median`, and the current behavior of `median` with `NaN` inputs is likely not desirable. This issue is a follow-up to document the behavior, possibly change it, and add more tests.

Describe the solution you'd like
Ideally, make sure we are compatible with PostgreSQL.

Describe alternatives you've considered
None

Additional context
Tests in aggregates.rs (being added in #3009)

#[tokio::test]
async fn median_f64_nan() -> Result<()> {
    median_test(
        "median",
        DataType::Float64,
        Arc::new(Float64Array::from(vec![1.1, f64::NAN, f64::NAN, f64::NAN])),
        "NaN", // probably not the desired behavior? - see https://github.com/apache/arrow-datafusion/issues/3039
    )
    .await
}

#[tokio::test]
async fn approx_median_f64_nan() -> Result<()> {
    median_test(
        "approx_median",
        DataType::Float64,
        Arc::new(Float64Array::from(vec![1.1, f64::NAN, f64::NAN, f64::NAN])),
        "NaN", // probably not the desired behavior? - see https://github.com/apache/arrow-datafusion/issues/3039
    )
    .await
}

comphead · 2022-11-07T19:07:48Z

Hi @andygrove, I'm working on #4051 and found your comments.
I ran a test in PostgreSQL:

/*
create table t (x real);
insert into t (x) values (1.1);
insert into t (x) values ('NaN');
insert into t (x) values ('NaN');
insert into t (x) values ('NaN');
*/

select count(1) from (select PERCENTILE_CONT(0.5) WITHIN GROUP(ORDER BY x) a FROM t) b where a = 'NaN'::NUMERIC;

So it looks like the DataFusion behavior and the PostgreSQL behavior are the same.
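
To illustrate why both engines can end up with `NaN` for this input, here is a minimal sketch of an exact median that sorts `NaN` after every other value, which is how PostgreSQL documents its ordering of `NaN` for floating-point types. This is not DataFusion's implementation: the names `nan_last_cmp` and `median_nan_last` are hypothetical, and averaging the two middle values for even-length input is an assumption chosen to mirror what `PERCENTILE_CONT(0.5)` does.

```rust
use std::cmp::Ordering;

/// Hypothetical comparator: NaN is equal to itself and greater than
/// every non-NaN value (the ordering PostgreSQL documents for floats).
fn nan_last_cmp(a: &f64, b: &f64) -> Ordering {
    match (a.is_nan(), b.is_nan()) {
        (true, true) => Ordering::Equal,
        (true, false) => Ordering::Greater,
        (false, true) => Ordering::Less,
        // Neither side is NaN here, so partial_cmp cannot return None.
        (false, false) => a.partial_cmp(b).expect("both values are non-NaN"),
    }
}

/// Hypothetical exact median: sort with NaN last, then take the middle
/// element (odd length) or the mean of the two middle elements (even length).
fn median_nan_last(values: &[f64]) -> Option<f64> {
    if values.is_empty() {
        return None;
    }
    let mut sorted = values.to_vec();
    sorted.sort_by(nan_last_cmp);
    let mid = sorted.len() / 2;
    if sorted.len() % 2 == 1 {
        Some(sorted[mid])
    } else {
        // If either middle element is NaN, the mean is NaN as well.
        Some((sorted[mid - 1] + sorted[mid]) / 2.0)
    }
}
```

With the input from the tests above, `[1.1, NaN, NaN, NaN]`, the sorted order keeps `1.1` first and both middle elements are `NaN`, so the result is `NaN`. Under this ordering the median is only non-`NaN` when fewer than half of the values are `NaN`.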

andygrove added the enhancement New feature or request label Aug 5, 2022

andygrove mentioned this issue Aug 5, 2022

Implement exact median, add AggregateState #3009

Merged
