Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Window function: Internal error: Operator + is not implemented for types #5346

Closed
alamb opened this issue Feb 20, 2023 · 2 comments · Fixed by #5384
Closed

Window function: Internal error: Operator + is not implemented for types #5346

alamb opened this issue Feb 20, 2023 · 2 comments · Fixed by #5384
Labels
bug Something isn't working

Comments

@alamb
Copy link
Contributor

alamb commented Feb 20, 2023

Describe the bug
I get an internal error on a window query:

query error DataFusion error: Internal error: Operator + is not implemented for types Int8(5) and Utf8("1"). This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker

To Reproduce
Run this sql in the datafusion checkout (where aggregate_test_100.csv is)

CREATE EXTERNAL TABLE aggregate_test_100 (
  c1  VARCHAR NOT NULL,
  c2  TINYINT NOT NULL,
  c3  SMALLINT NOT NULL,
  c4  SMALLINT,
  c5  INT,
  c6  BIGINT NOT NULL,
  c7  SMALLINT NOT NULL,
  c8  INT NOT NULL,
  c9  BIGINT UNSIGNED NOT NULL,
  c10 VARCHAR NOT NULL,
  c11 FLOAT NOT NULL,
  c12 DOUBLE NOT NULL,
  c13 VARCHAR NOT NULL
)
STORED AS CSV
WITH HEADER ROW
LOCATION './testing/data/csv/aggregate_test_100.csv';

SELECT
SUM(c4) OVER(ORDER BY c2 DESC RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING),
SUM(c3) OVER(ORDER BY c2 DESC RANGE BETWEEN 10000 PRECEDING AND 10000 FOLLOWING),
COUNT(*) OVER(ORDER BY c2 DESC RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING)
FROM aggregate_test_100
ORDER BY c9
LIMIT 5;

Results in

❯ SELECT
SUM(c4) OVER(ORDER BY c2 DESC RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING),
SUM(c3) OVER(ORDER BY c2 DESC RANGE BETWEEN 10000 PRECEDING AND 10000 FOLLOWING),
COUNT(*) OVER(ORDER BY c2 DESC RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING)
FROM aggregate_test_100
ORDER BY c9
LIMIT 5
;
Internal("Operator + is not implemented for types Int8(5) and Utf8(\"1\")")
❯

Expected behavior
Test should pass

Additional context
Found on #5330 (for some reason the error didn't happen in the rust based test setup)

@alamb alamb added the bug Something isn't working label Feb 20, 2023
@ozankabak
Copy link
Contributor

We will take a look at this once we finish up tasks at hand.

@mustafasrepo
Copy link
Contributor

mustafasrepo commented Feb 23, 2023

I have debugged this problem. It originates when value of RANGE cannot be casted to type used in the ORDER BY column. The reason above test passes in rust and not in cli is that, In rust for c2 column Uint32 is used, whereas in cli test Int8 is used. Below, test reproduces the issue in rust.

#[tokio::test]
async fn window_frame_large_range() -> Result<()> {
    let ctx = SessionContext::new();
    register_aggregate_csv(&ctx).await?;
    // 10000 is outside the valid range for Int8 (type of c3). In this case we should be able to still produce correct result.
    // See issue: https://github.com/apache/arrow-datafusion/issues/5346
    let sql = "SELECT
               SUM(c3) OVER(ORDER BY c3 DESC RANGE BETWEEN 10000 PRECEDING AND 10000 FOLLOWING) as summation1
               FROM aggregate_test_100
               ORDER BY c9
               LIMIT 5";
    let actual = execute_to_batches(&ctx, sql).await;
    let expected = vec![
        "+------------+",
        "| summation1 |",
        "+------------+",
        "| 781        |",
        "| 781        |",
        "| 781        |",
        "| 781        |",
        "| 781        |",
        "+------------+",
    ];
    assert_batches_eq!(expected, &actual);
    Ok(())
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants