[BUG]Test in json_test.py failed: test_from_json_struct_decimal #10349

nartal1 · 2024-01-31T19:27:11Z

Describe the bug
test_from_json_struct_decimal failed on the Databricks nightly builds.

=================================== FAILURES ===================================
________________________ test_from_json_struct_decimal _________________________

    @allow_non_gpu(*non_utc_allow)
    def test_from_json_struct_decimal():
        json_string_gen = StringGen(r'{ "a": "[+-]?([0-9]{0,5})?(\.[0-9]{0,2})?([eE][+-]?[0-9]{1,2})?" }') \
            .with_special_pattern('', weight=50) \
            .with_special_pattern('null', weight=50)
>       assert_gpu_and_cpu_are_equal_collect(
            lambda spark : unary_op_df(spark, json_string_gen) \
                .select(f.from_json('a', 'struct<a:decimal>')),
            conf={"spark.rapids.sql.expression.JsonToStructs": True})

../../src/main/python/json_test.py:634:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../src/main/python/asserts.py:595: in assert_gpu_and_cpu_are_equal_collect
    _assert_gpu_and_cpu_are_equal(func, 'COLLECT', conf=conf, is_cpu_first=is_cpu_first, result_canonicalize_func_before_compare=result_canonicalize_func_before_compare)
../../src/main/python/asserts.py:517: in _assert_gpu_and_cpu_are_equal
    assert_equal(from_cpu, from_gpu)
../../src/main/python/asserts.py:107: in assert_equal
    _assert_equal(cpu, gpu, float_check=get_float_check(), path=[])
../../src/main/python/asserts.py:43: in _assert_equal
    _assert_equal(cpu[index], gpu[index], float_check, path + [index])
../../src/main/python/asserts.py:36: in _assert_equal
    _assert_equal(cpu[field], gpu[field], float_check, path + [field])
../../src/main/python/asserts.py:36: in _assert_equal
    _assert_equal(cpu[field], gpu[field], float_check, path + [field])
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

cpu = Decimal('0'), gpu = None
float_check = <function get_float_check.<locals>.<lambda> at 0x7fc6bea6e560>
path = [1438, 'from_json(a)', 'a']

    def _assert_equal(cpu, gpu, float_check, path):
        t = type(cpu)
        if (t is Row):
            assert len(cpu) == len(gpu), "CPU and GPU row have different lengths at {} CPU: {} GPU: {}".format(path, len(cpu), len(gpu))
            if hasattr(cpu, "__fields__") and hasattr(gpu, "__fields__"):
                assert cpu.__fields__ == gpu.__fields__, "CPU and GPU row have different fields at {} CPU: {} GPU: {}".format(path, cpu.__fields__, gpu.__fields__)
                for field in cpu.__fields__:
                    _assert_equal(cpu[field], gpu[field], float_check, path + [field])
            else:
                for index in range(len(cpu)):
                    _assert_equal(cpu[index], gpu[index], float_check, path + [index])
        elif (t is list):
            assert len(cpu) == len(gpu), "CPU and GPU list have different lengths at {} CPU: {} GPU: {}".format(path, len(cpu), len(gpu))
            for index in range(len(cpu)):
                _assert_equal(cpu[index], gpu[index], float_check, path + [index])
        elif (t is tuple):
            assert len(cpu) == len(gpu), "CPU and GPU list have different lengths at {} CPU: {} GPU: {}".format(path, len(cpu), len(gpu))
            for index in range(len(cpu)):
                _assert_equal(cpu[index], gpu[index], float_check, path + [index])
        elif (t is pytypes.GeneratorType):
            index = 0
            # generator has no zip :( so we have to do this the hard way
            done = False
            while not done:
                sub_cpu = None
                sub_gpu = None
                try:
                    sub_cpu = next(cpu)
                except StopIteration:
                    done = True

                try:
                    sub_gpu = next(gpu)
                except StopIteration:
                    done = True

                if done:
                    assert sub_cpu == sub_gpu and sub_cpu == None, "CPU and GPU generators have different lengths at {}".format(path)
                else:
                    _assert_equal(sub_cpu, sub_gpu, float_check, path + [index])

                index = index + 1
        elif (t is dict):
            # The order of key/values is not guaranteed in python dicts, nor are they guaranteed by Spark
            # so sort the items to do our best with ignoring the order of dicts
            cpu_items = list(cpu.items()).sort(key=_RowCmp)
            gpu_items = list(gpu.items()).sort(key=_RowCmp)
            _assert_equal(cpu_items, gpu_items, float_check, path + ["map"])
        elif (t is int):
            assert cpu == gpu, "GPU and CPU int values are different at {}".format(path)
        elif (t is float):
            if (math.isnan(cpu)):
                assert math.isnan(gpu), "GPU and CPU float values are different at {}".format(path)
            else:
                assert float_check(cpu, gpu), "GPU and CPU float values are different {}".format(path)
        elif isinstance(cpu, str):
            assert cpu == gpu, "GPU and CPU string values are different at {}".format(path)
        elif isinstance(cpu, datetime):
            assert cpu == gpu, "GPU and CPU timestamp values are different at {}".format(path)
        elif isinstance(cpu, date):
            assert cpu == gpu, "GPU and CPU date values are different at {}".format(path)
        elif isinstance(cpu, bool):
            assert cpu == gpu, "GPU and CPU boolean values are different at {}".format(path)
        elif isinstance(cpu, Decimal):
>           assert cpu == gpu, "GPU and CPU decimal values are different at {}".format(path)
E           AssertionError: GPU and CPU decimal values are different at [1438, 'from_json(a)', 'a']

../../src/main/python/asserts.py:93: AssertionError


andygrove · 2024-01-31T23:27:41Z

cpu = Decimal('0'), gpu = None

This is an example of the issue where we let cuDF infer types in from_json rather than ask for primitives as strings and then cast in the plugin, as we do with GpuJsonScan. This is covered in issue #8204.
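
To illustrate the "read primitives as strings, then cast" approach described above, here is a minimal Python sketch. It is not the plugin's or cuDF's implementation, and it does not reproduce Spark's decimal cast rules; the helper names cast_to_decimal and parse_field_as_decimal are hypothetical. The point is only that the JSON parser never decides the numeric type: every value arrives as text, and one cast routine owns the conversion.

```python
import json
from decimal import Decimal, InvalidOperation


def cast_to_decimal(text):
    """Cast raw text to Decimal; return None if it is not a finite number.

    Hypothetical helper: these rules are Python's, not Spark's.
    """
    try:
        value = Decimal(text.strip())
    except InvalidOperation:
        return None
    return value if value.is_finite() else None


def parse_field_as_decimal(json_line, field):
    """Read `field` from a JSON object as a string, then cast it."""
    try:
        # parse_float/parse_int=str keep unquoted numbers as raw text,
        # so no numeric type is inferred by the JSON parser.
        row = json.loads(json_line, parse_float=str, parse_int=str)
    except json.JSONDecodeError:
        return None
    if not isinstance(row, dict):
        return None
    raw = row.get(field)
    if not isinstance(raw, str):
        return None
    return cast_to_decimal(raw)


if __name__ == "__main__":
    for line in ['{ "a": "12.5" }', '{ "a": "" }', 'null', '']:
        print(repr(line), "->", parse_field_as_decimal(line, "a"))
```

Because the cast lives in one place, any disagreement with the CPU result can be fixed in the cast routine rather than in the parser's type inference.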

I will create a PR to use a fixed seed until we resolve this.
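
As a rough sketch of why a fixed seed stabilizes the test: a seeded generator produces the same inputs on every run, so a failing input either always appears or never does. The function below is hypothetical, uses only the Python standard library, and is not the data generator from the integration tests.

```python
import random

DIGITS = "0123456789"


def gen_decimal_strings(seed, count=5):
    """Generate `count` pseudo-random decimal-like strings from `seed`."""
    # A private Random instance keeps the global random state untouched.
    rng = random.Random(seed)
    values = []
    for _ in range(count):
        sign = rng.choice(["", "+", "-"])
        int_part = "".join(rng.choice(DIGITS) for _ in range(rng.randint(0, 5)))
        frac_part = "".join(rng.choice(DIGITS) for _ in range(rng.randint(0, 2)))
        values.append(sign + int_part + ("." + frac_part if frac_part else ""))
    return values


if __name__ == "__main__":
    # Same seed, same data: the two lists printed here are identical.
    print(gen_decimal_strings(0))
    print(gen_decimal_strings(0))
```

The trade-off is coverage: with a fixed seed the test no longer explores new inputs, which is why the seed is meant to be temporary.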

mattahrens · 2024-02-06T21:16:43Z

This bug stays open as a P1. The remaining scope is to re-enable the random seed once the cuDF dependency is satisfied.

nartal1 added the "bug" and "? - Needs Triage" labels Jan 31, 2024

nartal1 changed the title ~~[BUG]One test in json_test.py failed: test_from_json_struct_decimal~~ [BUG]Test in json_test.py failed: test_from_json_struct_decimal Jan 31, 2024

andygrove mentioned this issue Feb 1, 2024

Use fixed seed for test_from_json_struct_decimal #10353

Merged

mattahrens assigned andygrove Feb 6, 2024

mattahrens removed the "? - Needs Triage" label Feb 6, 2024

andygrove mentioned this issue Mar 20, 2024

Use random seed for test_from_json_struct_decimal [databricks] #10614

Merged

andygrove closed this as completed in #10614 Apr 1, 2024
