Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] cast from nan to Long and ints is inconsistent #4644

Closed
razajafri opened this issue Mar 21, 2020 · 10 comments
Closed

[BUG] cast from nan to Long and ints is inconsistent #4644

razajafri opened this issue Mar 21, 2020 · 10 comments
Assignees
Labels
bug Something isn't working libcudf Affects libcudf (C++/CUDA) code.

Comments

@razajafri
Copy link
Contributor

Describe the bug
When casting float.NaN to long returns a negative number (-9223372036854775808)

Steps/Code to reproduce bug

    try (ColumnVector vector = ColumnVector.fromFloats(Float.NaN);
         ColumnVector asLong = vector.castTo(DType.INT64);
         ColumnVector expected = ColumnVector.fromLongs(0)) {
         assertColumnsAreEqual(expected, asLong);
    }

Expected behavior
The above test should pass

Additional context
casting the value to integer returns the zero (0) which is the desired result

@razajafri razajafri added bug Something isn't working Needs Triage Need team to review and classify labels Mar 21, 2020
@jrhemstad
Copy link
Contributor

Sounds like expected behavior to me.

https://wandbox.org/permlink/BgMPjI43UJ88WjVr

Why is NaN casted to an int expected to be zero?

@razajafri razajafri changed the title [BUG] [BUG] cast from nan to Long and ints is inconsistent Mar 23, 2020
@razajafri
Copy link
Contributor Author

razajafri commented Mar 23, 2020

Hmm...

c={Float.NaN}.asInt() => d = {0}
c={Float.NaN}.asLong() => d = {-9223372036854775808}

all I'm saying is shouldn't they both be consistent?

Btw, neat tool thanks. I have added the int32 cast which isn't 0 in CPP. So I think this is a bug either way?
https://wandbox.org/permlink/BgMPjI43UJ88WjVr

@jrhemstad
Copy link
Contributor

I'm not following, casting a NaN to int32_t isn't zero either: https://wandbox.org/permlink/WKIvzkD9jDsglJK5

(Your link was the same as my previous one. You need to hit the "Share" button to generate a new link to the updated content.)

@jlowe
Copy link
Member

jlowe commented Mar 23, 2020

I believe the confusion here lies in the context for the casting operation. I'm pretty sure @razajafri is coming from the JVM perspective, and the Java spec does specify that NaN casts to zero. That isn't the same as C/C++ casting semantics.

IMO as long as libcudf is consistent wrt. C/C++ casting semantics then libcudf is correct. libcudf is not a Java library, it is a C++ library. If a Java application wants to implement Java casting semantics for NaN using libcudf then it needs to implement that logic at a higher level (e.g.: screen for NaN values and convert to 0.0 before casting).

@razajafri
Copy link
Contributor Author

@jrhemstad at the risk of sounding really dumb here is what I am saying.

cudf isn't consistent with itself when I run the following snippet

    JNI_NULL_CHECK(env, handle, "native handle is null", 0);
    try {
        cudf::column_view * column_view = reinterpret_cast<cudf::column_view *>(handle);

        std::unique_ptr<cudf::column> result0 = cudf::experimental::cast(*column_view, cudf::data_type{cudf::INT32});
        std::unique_ptr<cudf::column> result1 = cudf::experimental::cast(*column_view, cudf::data_type{cudf::INT64});

        std::cout << "NaN to int32: ";
        cudf::test::print(result0->view());
        std::cout << "\nNaN to int64: ";
        cudf::test::print(result1->view());

    }
    CATCH_STD(env, 0);

[DEBUG] NaN to int32: 0
[DEBUG] NaN to int64: -9223372036854775808

Its casting NaN to a zero when cast to int32 column but -9223372036854775808 when cast to Int64. Whereas in cpp casting to int32 or int64 both result in a non-zero value.
https://wandbox.org/permlink/JDUG5PrNA1MUqz6n

@kkraus14 kkraus14 added libcudf Affects libcudf (C++/CUDA) code. and removed Needs Triage Need team to review and classify labels Mar 25, 2020
@jrhemstad
Copy link
Contributor

Okay, I agree the inconsistency definitely sounds like a bug.

@vuule
Copy link
Contributor

vuule commented Apr 2, 2020

If I understand the code correctly, when the source and destination types are numeric, the cast is a simple call to static_cast, and casting from NaN is an undefined behavior in C++.

I'll add a floating point specialization that checks for NaN and see if that suffices.

@jrhemstad
Copy link
Contributor

I didn't realize the cast from Nan was undefined. That makes sense now.

@vuule
Copy link
Contributor

vuule commented Apr 6, 2020

Based on this discussion it looks like we want to keep libcudf behavior consistent with C++.
If this is the case, can this issue we closed?

@jrhemstad
Copy link
Contributor

Based on this discussion it looks like we want to keep libcudf behavior consistent with C++.
If this is the case, can this issue we closed?

Yep!

In summary, casting from NaN is undefined behavior in C++. Therefore, the user is required to first replace all NaN values before casting if a specified value is desired.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working libcudf Affects libcudf (C++/CUDA) code.
Projects
None yet
Development

No branches or pull requests

5 participants