Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Review NaN handling in median and approx_median #3039

Open
andygrove opened this issue Aug 5, 2022 · 1 comment
Open

Review NaN handling in median and approx_median #3039

andygrove opened this issue Aug 5, 2022 · 1 comment
Labels
enhancement New feature or request

Comments

@andygrove
Copy link
Member

andygrove commented Aug 5, 2022

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
We do not have tests involving NaN for approx_median, and the current behavior of median is likely not desirable regarding NaN. This issue is to follow up and document and possibly change the behavior and add more tests.

Describe the solution you'd like
Ideally, make sure we are compatible with PostgreSQL.

Describe alternatives you've considered
None

Additional context
Tests in aggregates.rs (being added in #3009)

#[tokio::test]
async fn median_f64_nan() -> Result<()> {
    median_test(
        "median",
        DataType::Float64,
        Arc::new(Float64Array::from(vec![1.1, f64::NAN, f64::NAN, f64::NAN])),
        "NaN", // probably not the desired behavior? - see https://github.com/apache/arrow-datafusion/issues/3039
    )
    .await
}

#[tokio::test]
async fn approx_median_f64_nan() -> Result<()> {
    median_test(
        "approx_median",
        DataType::Float64,
        Arc::new(Float64Array::from(vec![1.1, f64::NAN, f64::NAN, f64::NAN])),
        "NaN", // probably not the desired behavior? - see https://github.com/apache/arrow-datafusion/issues/3039
    )
    .await
}
@comphead
Copy link
Contributor

comphead commented Nov 7, 2022

Hi @andygrove Im working on #4051 and found your comments.
I ran tests in postgres

/*
create table t (x real);
		insert into t (x) values (1.1);
		insert into t (x) values ('NaN');
		insert into t (x) values ('NaN');
		insert into t (x) values ('NaN');
*/
        
select count(1) from (select PERCENTILE_CONT(0.5) WITHIN GROUP(ORDER BY x) a FROM t) b where a = 'NaN'::NUMERIC;        

So looks like datafusion behaviuor and postgres are the same

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants