[BUG] cast from nan to Long and ints is inconsistent #4644

razajafri · 2020-03-21T03:53:13Z

Describe the bug
Casting Float.NaN to long returns a negative number (-9223372036854775808).

Steps/Code to reproduce bug

    try (ColumnVector vector = ColumnVector.fromFloats(Float.NaN);
         ColumnVector asLong = vector.castTo(DType.INT64);
         ColumnVector expected = ColumnVector.fromLongs(0)) {
         assertColumnsAreEqual(expected, asLong);
    }

Expected behavior
The above test should pass

Additional context
Casting the same value to int returns zero (0), which is the desired result.


jrhemstad · 2020-03-21T12:49:15Z

Sounds like expected behavior to me.

https://wandbox.org/permlink/BgMPjI43UJ88WjVr

Why would NaN cast to an int be expected to be zero?

razajafri · 2020-03-23T02:10:28Z

Hmm...

    c = {Float.NaN}.asInt()  => d = {0}
    c = {Float.NaN}.asLong() => d = {-9223372036854775808}

all I'm saying is shouldn't they both be consistent?

By the way, neat tool, thanks. I have added the int32 cast, which isn't 0 in C++. So I think this is a bug either way?
https://wandbox.org/permlink/BgMPjI43UJ88WjVr

jrhemstad · 2020-03-23T13:35:49Z

I'm not following, casting a NaN to int32_t isn't zero either: https://wandbox.org/permlink/WKIvzkD9jDsglJK5

(Your link was the same as my previous one. You need to hit the "Share" button to generate a new link to the updated content.)

jlowe · 2020-03-23T13:38:32Z

I believe the confusion here lies in the context for the casting operation. I'm pretty sure @razajafri is coming from the JVM perspective, and the Java spec does specify that NaN casts to zero. That isn't the same as C/C++ casting semantics.

IMO as long as libcudf is consistent wrt. C/C++ casting semantics then libcudf is correct. libcudf is not a Java library, it is a C++ library. If a Java application wants to implement Java casting semantics for NaN using libcudf then it needs to implement that logic at a higher level (e.g.: screen for NaN values and convert to 0.0 before casting).

razajafri · 2020-03-23T18:03:34Z

@jrhemstad, at the risk of sounding really dumb, here is what I am saying.

cudf isn't consistent with itself when I run the following snippet:

    // Body of a JNI method: `handle` is a native pointer to a cudf::column_view holding Float.NaN.
    JNI_NULL_CHECK(env, handle, "native handle is null", 0);
    try {
        cudf::column_view * column_view = reinterpret_cast<cudf::column_view *>(handle);

        // Cast the same input column to INT32 and to INT64.
        std::unique_ptr<cudf::column> result0 = cudf::experimental::cast(*column_view, cudf::data_type{cudf::INT32});
        std::unique_ptr<cudf::column> result1 = cudf::experimental::cast(*column_view, cudf::data_type{cudf::INT64});

        // Print both results with the libcudf test utility.
        std::cout << "NaN to int32: ";
        cudf::test::print(result0->view());
        std::cout << "\nNaN to int64: ";
        cudf::test::print(result1->view());
    }
    CATCH_STD(env, 0);

    [DEBUG] NaN to int32: 0
    [DEBUG] NaN to int64: -9223372036854775808

It's casting NaN to zero when casting to an INT32 column, but to -9223372036854775808 when casting to INT64. In C++, by contrast, casting to int32 or int64 both result in a non-zero value.
https://wandbox.org/permlink/JDUG5PrNA1MUqz6n

jrhemstad · 2020-03-30T22:05:18Z

Okay, I agree the inconsistency definitely sounds like a bug.

vuule · 2020-04-02T02:53:28Z

If I understand the code correctly, when the source and destination types are both numeric, the cast is a simple call to static_cast, and casting from NaN is undefined behavior in C++.

I'll add a floating point specialization that checks for NaN and see if that suffices.

jrhemstad · 2020-04-02T03:42:31Z

I didn't realize the cast from NaN was undefined. That makes sense now.

vuule · 2020-04-06T20:01:46Z

Based on this discussion, it looks like we want to keep libcudf behavior consistent with C++.
If this is the case, can this issue be closed?

jrhemstad · 2020-04-06T20:52:44Z

> Based on this discussion, it looks like we want to keep libcudf behavior consistent with C++.
> If this is the case, can this issue be closed?

Yep!

In summary, casting from NaN is undefined behavior in C++. Therefore, if a specific result is desired, the user must replace all NaN values before casting.

razajafri added the bug ("Something isn't working") and Needs Triage ("Need team to review and classify") labels on Mar 21, 2020

razajafri changed the title from "[BUG]" to "[BUG] cast from nan to Long and ints is inconsistent" on Mar 23, 2020

kkraus14 added the libcudf ("Affects libcudf (C++/CUDA) code.") label and removed the Needs Triage label on Mar 25, 2020

harrism assigned vuule Apr 1, 2020

jrhemstad mentioned this issue in "[DISCUSSION] Behavior for NaN comparisons in libcudf #4760" (Closed) on Apr 2, 2020

jrhemstad closed this as completed Apr 6, 2020
