[VL][1.2][Result mismatch] Cast string to integral type does not ignore ISO control characters #7749

Closed
wForget opened this issue Oct 31, 2024 · 3 comments · Fixed by #7806

@wForget
Member

wForget commented Oct 31, 2024

Backend

VL (Velox)

Bug description

  1. create test dataset

    create table t1 as select url_decode('111111%00') as c1;
    
  2. cast to integral type
    Check SQL:

    select cast(c1 as int) from t1;
    select cast(c1 as bigint) from t1;
    

    gluten 1.2 + spark 3.5:

    NULL
    NULL
    

    vanilla spark:

    111111
    111111
    
  3. cast to fractional type
    Check SQL:

    select cast(c1 as float) from t1;
    select cast(c1 as double) from t1;
    

    gluten 1.2 + spark 3.5:

    111111.0
    111111.0
    

    vanilla spark:

    111111.0
    111111.0
    

Related PRs: apache/spark#41535
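
For illustration, here is a minimal Python sketch of the behavior the issue title describes: whitespace and ISO control characters are skipped at both ends of the string before the digits are parsed. This is not Spark's or Velox's actual code. The names `is_iso_control` and `cast_to_int` are hypothetical, the control-character ranges follow the definition used by Java's `Character.isISOControl` (U+0000 to U+001F and U+007F to U+009F), and range checks and fractional inputs are deliberately ignored.

```python
from typing import Optional


def is_iso_control(ch: str) -> bool:
    """Hypothetical helper: same ranges as Java's Character.isISOControl."""
    cp = ord(ch)
    return cp <= 0x1F or 0x7F <= cp <= 0x9F


def cast_to_int(s: str) -> Optional[int]:
    """Sketch of a lenient string-to-integer cast; returns None for invalid input."""

    def skippable(ch: str) -> bool:
        return ch.isspace() or is_iso_control(ch)

    # Skip whitespace and ISO control characters at both ends only.
    start, end = 0, len(s)
    while start < end and skippable(s[start]):
        start += 1
    while end > start and skippable(s[end - 1]):
        end -= 1

    body = s[start:end]
    digits = body[1:] if body[:1] in ("+", "-") else body
    # Anything other than ASCII digits (including an interior control
    # character) makes the cast fail in this sketch.
    if not digits or not all("0" <= c <= "9" for c in digits):
        return None
    return int(body)


# The reproduction string from step 1, i.e. url_decode('111111%00').
print(cast_to_int("111111\x00"))
```

Under these assumptions the trailing `\x00` is dropped and the remaining digits parse normally, while a control character between digits still yields `None`.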

Spark version

None

Spark configurations

No response

System information

No response

Relevant logs

No response

@wForget wForget added bug Something isn't working triage labels Oct 31, 2024
@wForget wForget changed the title [VL][1.2] Cast string to integral type does not ignore ISO control characters [VL][1.2][Result mismatch] Cast string to integral type does not ignore ISO control characters Oct 31, 2024
@wForget
Member Author

wForget commented Oct 31, 2024

@jackylee-ch
Contributor

jackylee-ch commented Oct 31, 2024

String type conversions were also addressed in #1569. That fix used trim to remove invalid characters, but it does not handle invalid characters inside a string. It would be great to fix this in Velox so that we can remove #1569 from Gluten. cc @PHILO-HE

@wForget
Member Author

wForget commented Nov 4, 2024

> String type conversions were also fixed in #1569. They used trim to remove invalid characters, but that didn't handle invalid characters inside a String. It would be great to fix this in velox so we can remove #1569 in gluten. cc @PHILO-HE

Thank you for the information. Since the trim behavior differs between cast target types in Spark, extending #1569 seems like the simpler solution. Can we fix it that way first?
