
[BUG] orc_test.py is failing #1061

Closed
jlowe opened this issue Nov 3, 2020 · 3 comments · Fixed by #1337
Labels: bug (Something isn't working), P0 (Must have for release)

Comments

@jlowe
Member

jlowe commented Nov 3, 2020

A recent 0.3 integration test run reported failures in test_simple_partitioned_read. Error details:

15:25:50  =================================== FAILURES ===================================
15:25:50  ________________________ test_simple_partitioned_read[] ________________________
15:25:50  
15:25:50  spark_tmp_path = '/tmp/pyspark_tests//652014/', v1_enabled_list = ''
15:25:50  
15:25:50      @pytest.mark.parametrize('v1_enabled_list', ["", "orc"])
15:25:50      def test_simple_partitioned_read(spark_tmp_path, v1_enabled_list):
15:25:50          # Once https://github.com/NVIDIA/spark-rapids/issues/131 is fixed
15:25:50          # we should go with a more standard set of generators
15:25:50          orc_gens = [byte_gen, short_gen, int_gen, long_gen, float_gen, double_gen,
15:25:50          string_gen, boolean_gen, DateGen(start=date(1590, 1, 1)),
15:25:50          TimestampGen(start=datetime(1590, 1, 1, tzinfo=timezone.utc))]
15:25:50          gen_list = [('_c' + str(i), gen) for i, gen in enumerate(orc_gens)]
15:25:50          first_data_path = spark_tmp_path + '/ORC_DATA/key=0'
15:25:50          with_cpu_session(
15:25:50                  lambda spark : gen_df(spark, gen_list).write.orc(first_data_path))
15:25:50          second_data_path = spark_tmp_path + '/ORC_DATA/key=1'
15:25:50          with_cpu_session(
15:25:50                  lambda spark : gen_df(spark, gen_list).write.orc(second_data_path))
15:25:50          data_path = spark_tmp_path + '/ORC_DATA'
15:25:50          assert_gpu_and_cpu_are_equal_collect(
15:25:50                  lambda spark : spark.read.orc(data_path),
15:25:50  >               conf={'spark.sql.sources.useV1SourceList': v1_enabled_list})
15:25:50  
15:25:50  src/main/python/orc_test.py:131: 
15:25:50  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:25:50  src/main/python/asserts.py:296: in assert_gpu_and_cpu_are_equal_collect
15:25:50      _assert_gpu_and_cpu_are_equal(func, True, conf=conf)
15:25:50  src/main/python/asserts.py:288: in _assert_gpu_and_cpu_are_equal
15:25:50      assert_equal(from_cpu, from_gpu)
15:25:50  src/main/python/asserts.py:86: in assert_equal
15:25:50      _assert_equal(cpu, gpu, float_check=get_float_check(), path=[])
15:25:50  src/main/python/asserts.py:38: in _assert_equal
15:25:50      _assert_equal(cpu[index], gpu[index], float_check, path + [index])
15:25:50  src/main/python/asserts.py:31: in _assert_equal
15:25:50      _assert_equal(cpu[field], gpu[field], float_check, path + [field])
15:25:50  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:25:50  
15:25:50  cpu = datetime.datetime(1898, 5, 12, 21, 45, 8, 426000)
15:25:50  gpu = datetime.datetime(1898, 5, 13, 6, 49, 21, 426000)
15:25:50  float_check = <function get_float_check.<locals>.<lambda> at 0x7f94408265f0>
15:25:50  path = [3077, '_c9']
15:25:50  
15:25:50      def _assert_equal(cpu, gpu, float_check, path):
15:25:50          t = type(cpu)
15:25:50          if (t is Row):
15:25:50              assert len(cpu) == len(gpu), "CPU and GPU row have different lengths at {}".format(path)
15:25:50              if hasattr(cpu, "__fields__") and hasattr(gpu, "__fields__"):
15:25:50                  for field in cpu.__fields__:
15:25:50                      _assert_equal(cpu[field], gpu[field], float_check, path + [field])
15:25:50              else:
15:25:50                  for index in range(len(cpu)):
15:25:50                      _assert_equal(cpu[index], gpu[index], float_check, path + [index])
15:25:50          elif (t is list):
15:25:50              assert len(cpu) == len(gpu), "CPU and GPU list have different lengths at {}".format(path)
15:25:50              for index in range(len(cpu)):
15:25:50                  _assert_equal(cpu[index], gpu[index], float_check, path + [index])
15:25:50          elif (t is pytypes.GeneratorType):
15:25:50              index = 0
15:25:50              # generator has no zip :( so we have to do this the hard way
15:25:50              done = False
15:25:50              while not done:
15:25:50                  sub_cpu = None
15:25:50                  sub_gpu = None
15:25:50                  try:
15:25:50                      sub_cpu = next(cpu)
15:25:50                  except StopIteration:
15:25:50                      done = True
15:25:50      
15:25:50                  try:
15:25:50                      sub_gpu = next(gpu)
15:25:50                  except StopIteration:
15:25:50                      done = True
15:25:50      
15:25:50                  if done:
15:25:50                      assert sub_cpu == sub_gpu and sub_cpu == None, "CPU and GPU generators have different lengths at {}".format(path)
15:25:50                  else:
15:25:50                      _assert_equal(sub_cpu, sub_gpu, float_check, path + [index])
15:25:50      
15:25:50                  index = index + 1
15:25:50          elif (t is int):
15:25:50              assert cpu == gpu, "GPU and CPU int values are different at {}".format(path)
15:25:50          elif (t is float):
15:25:50              if (math.isnan(cpu)):
15:25:50                  assert math.isnan(gpu), "GPU and CPU float values are different at {}".format(path)
15:25:50              else:
15:25:50                  assert float_check(cpu, gpu), "GPU and CPU float values are different {}".format(path)
15:25:50          elif isinstance(cpu, str):
15:25:50              assert cpu == gpu, "GPU and CPU string values are different at {}".format(path)
15:25:50          elif isinstance(cpu, datetime):
15:25:50  >           assert cpu == gpu, "GPU and CPU timestamp values are different at {}".format(path)
15:25:50  E           AssertionError: GPU and CPU timestamp values are different at [3077, '_c9']
15:25:50  
15:25:50  src/main/python/asserts.py:72: AssertionError
15:25:50  ----------------------------- Captured stdout call -----------------------------
15:25:50  ### CPU RUN ###
15:25:50  ### GPU RUN ###
15:25:50  ### COLLECT: GPU TOOK 0.21247625350952148 CPU TOOK 0.16116714477539062 ###
15:25:50  ______________________ test_simple_partitioned_read[orc] _______________________
15:25:50  
15:25:50  spark_tmp_path = '/tmp/pyspark_tests//336324/', v1_enabled_list = 'orc'
15:25:50  
15:25:50      @pytest.mark.parametrize('v1_enabled_list', ["", "orc"])
15:25:50      def test_simple_partitioned_read(spark_tmp_path, v1_enabled_list):
15:25:50          # Once https://github.com/NVIDIA/spark-rapids/issues/131 is fixed
15:25:50          # we should go with a more standard set of generators
15:25:50          orc_gens = [byte_gen, short_gen, int_gen, long_gen, float_gen, double_gen,
15:25:50          string_gen, boolean_gen, DateGen(start=date(1590, 1, 1)),
15:25:50          TimestampGen(start=datetime(1590, 1, 1, tzinfo=timezone.utc))]
15:25:50          gen_list = [('_c' + str(i), gen) for i, gen in enumerate(orc_gens)]
15:25:50          first_data_path = spark_tmp_path + '/ORC_DATA/key=0'
15:25:50          with_cpu_session(
15:25:50                  lambda spark : gen_df(spark, gen_list).write.orc(first_data_path))
15:25:50          second_data_path = spark_tmp_path + '/ORC_DATA/key=1'
15:25:50          with_cpu_session(
15:25:50                  lambda spark : gen_df(spark, gen_list).write.orc(second_data_path))
15:25:50          data_path = spark_tmp_path + '/ORC_DATA'
15:25:50          assert_gpu_and_cpu_are_equal_collect(
15:25:50                  lambda spark : spark.read.orc(data_path),
15:25:50  >               conf={'spark.sql.sources.useV1SourceList': v1_enabled_list})
15:25:50  
15:25:50  src/main/python/orc_test.py:131: 
15:25:50  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:25:50  src/main/python/asserts.py:296: in assert_gpu_and_cpu_are_equal_collect
15:25:50      _assert_gpu_and_cpu_are_equal(func, True, conf=conf)
15:25:50  src/main/python/asserts.py:288: in _assert_gpu_and_cpu_are_equal
15:25:50      assert_equal(from_cpu, from_gpu)
15:25:50  src/main/python/asserts.py:86: in assert_equal
15:25:50      _assert_equal(cpu, gpu, float_check=get_float_check(), path=[])
15:25:50  src/main/python/asserts.py:38: in _assert_equal
15:25:50      _assert_equal(cpu[index], gpu[index], float_check, path + [index])
15:25:50  src/main/python/asserts.py:31: in _assert_equal
15:25:50      _assert_equal(cpu[field], gpu[field], float_check, path + [field])
15:25:50  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:25:50  
15:25:50  cpu = datetime.datetime(1934, 9, 27, 7, 18, 4, 744000)
15:25:50  gpu = datetime.datetime(1934, 9, 27, 16, 22, 17, 744000)
15:25:50  float_check = <function get_float_check.<locals>.<lambda> at 0x7f94406a7440>
15:25:50  path = [1370, '_c9']
15:25:50  
15:25:50      def _assert_equal(cpu, gpu, float_check, path):
15:25:50          t = type(cpu)
15:25:50          if (t is Row):
15:25:50              assert len(cpu) == len(gpu), "CPU and GPU row have different lengths at {}".format(path)
15:25:50              if hasattr(cpu, "__fields__") and hasattr(gpu, "__fields__"):
15:25:50                  for field in cpu.__fields__:
15:25:50                      _assert_equal(cpu[field], gpu[field], float_check, path + [field])
15:25:50              else:
15:25:50                  for index in range(len(cpu)):
15:25:50                      _assert_equal(cpu[index], gpu[index], float_check, path + [index])
15:25:50          elif (t is list):
15:25:50              assert len(cpu) == len(gpu), "CPU and GPU list have different lengths at {}".format(path)
15:25:50              for index in range(len(cpu)):
15:25:50                  _assert_equal(cpu[index], gpu[index], float_check, path + [index])
15:25:50          elif (t is pytypes.GeneratorType):
15:25:50              index = 0
15:25:50              # generator has no zip :( so we have to do this the hard way
15:25:50              done = False
15:25:50              while not done:
15:25:50                  sub_cpu = None
15:25:50                  sub_gpu = None
15:25:50                  try:
15:25:50                      sub_cpu = next(cpu)
15:25:50                  except StopIteration:
15:25:50                      done = True
15:25:50      
15:25:50                  try:
15:25:50                      sub_gpu = next(gpu)
15:25:50                  except StopIteration:
15:25:50                      done = True
15:25:50      
15:25:50                  if done:
15:25:50                      assert sub_cpu == sub_gpu and sub_cpu == None, "CPU and GPU generators have different lengths at {}".format(path)
15:25:50                  else:
15:25:50                      _assert_equal(sub_cpu, sub_gpu, float_check, path + [index])
15:25:50      
15:25:50                  index = index + 1
15:25:50          elif (t is int):
15:25:50              assert cpu == gpu, "GPU and CPU int values are different at {}".format(path)
15:25:50          elif (t is float):
15:25:50              if (math.isnan(cpu)):
15:25:50                  assert math.isnan(gpu), "GPU and CPU float values are different at {}".format(path)
15:25:50              else:
15:25:50                  assert float_check(cpu, gpu), "GPU and CPU float values are different {}".format(path)
15:25:50          elif isinstance(cpu, str):
15:25:50              assert cpu == gpu, "GPU and CPU string values are different at {}".format(path)
15:25:50          elif isinstance(cpu, datetime):
15:25:50  >           assert cpu == gpu, "GPU and CPU timestamp values are different at {}".format(path)
15:25:50  E           AssertionError: GPU and CPU timestamp values are different at [1370, '_c9']
15:25:50  
15:25:50  src/main/python/asserts.py:72: AssertionError
15:25:50  ----------------------------- Captured stdout call -----------------------------
15:25:50  ### CPU RUN ###
15:25:50  ### GPU RUN ###
15:25:50  ### COLLECT: GPU TOOK 0.17358732223510742 CPU TOOK 0.16180038452148438 ###
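The two mismatches above are not rounding or precision differences. The sub-second parts match exactly, and the GPU value is ahead of the CPU value by several whole hours. Subtracting the logged values makes this explicit. The sketch below uses only the Python standard library and the `(cpu, gpu)` values copied from the `_c9` column in the log. The `pairs` dict and the `gpu_minus_cpu` helper are hypothetical names for illustration, and the sketch makes no claim about the root cause.

```python
from datetime import datetime, timedelta

# (cpu, gpu) values copied from the two test_simple_partitioned_read
# failures above (column _c9, the TimestampGen column).
pairs = {
    "test_simple_partitioned_read[]": (
        datetime(1898, 5, 12, 21, 45, 8, 426000),
        datetime(1898, 5, 13, 6, 49, 21, 426000),
    ),
    "test_simple_partitioned_read[orc]": (
        datetime(1934, 9, 27, 7, 18, 4, 744000),
        datetime(1934, 9, 27, 16, 22, 17, 744000),
    ),
}


def gpu_minus_cpu(cpu: datetime, gpu: datetime) -> timedelta:
    """Return how far the GPU-read timestamp is ahead of the CPU-read one."""
    return gpu - cpu


for test_id, (cpu, gpu) in pairs.items():
    delta = gpu_minus_cpu(cpu, gpu)
    print(f"{test_id}: GPU - CPU = {delta} ({int(delta.total_seconds())} s)")
```

Both entries print an offset of `9:04:13` (32653 s). Applied to the later 3.0.1 failures in this issue, the same subtraction gives `9:02:01` instead. That is consistent with a systematic shift in how the GPU reads these timestamp values rather than random corruption.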
jlowe added the labels bug (Something isn't working) and P0 (Must have for release) Nov 3, 2020
tgravescs changed the title from "[BUG] test_simple_partitioned_read failed" to "[BUG] ORC test_simple_partitioned_read failed" Nov 4, 2020
@jlowe
Member Author

jlowe commented Nov 5, 2020

This recently failed in the integration tests against Spark-3.0.2-SNAPSHOT as well.

jlowe changed the title from "[BUG] ORC test_simple_partitioned_read failed" to "[BUG] orc_test.py is failing" Nov 6, 2020
@jlowe
Member Author

jlowe commented Nov 6, 2020

The Spark 3.0.1 integration tests had many more ORC failures last night:

15:51:47  =================================== FAILURES ===================================
15:51:47  _ test_read_round_trip[-read_orc_df-[Byte, Short, Integer, Long, Float, Double, String, Boolean, Date, Timestamp]] _
15:51:47  
15:51:47  spark_tmp_path = '/tmp/pyspark_tests//978107/'
15:51:47  orc_gens = [Byte, Short, Integer, Long, Float, Double, ...]
15:51:47  read_func = <function read_orc_df at 0x7f65de189050>, v1_enabled_list = ''
15:51:47  
15:51:47      @pytest.mark.parametrize('orc_gens', orc_gens_list, ids=idfn)
15:51:47      @pytest.mark.parametrize('read_func', [read_orc_df, read_orc_sql])
15:51:47      @pytest.mark.parametrize('v1_enabled_list', ["", "orc"])
15:51:47      def test_read_round_trip(spark_tmp_path, orc_gens, read_func, v1_enabled_list):
15:51:47          gen_list = [('_c' + str(i), gen) for i, gen in enumerate(orc_gens)]
15:51:47          data_path = spark_tmp_path + '/ORC_DATA'
15:51:47          with_cpu_session(
15:51:47                  lambda spark : gen_df(spark, gen_list).write.orc(data_path))
15:51:47          assert_gpu_and_cpu_are_equal_collect(
15:51:47                  read_func(data_path),
15:51:47  >               conf={'spark.sql.sources.useV1SourceList': v1_enabled_list})
15:51:47  
15:51:47  src/main/python/orc_test.py:73: 
15:51:47  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:51:47  src/main/python/asserts.py:296: in assert_gpu_and_cpu_are_equal_collect
15:51:47      _assert_gpu_and_cpu_are_equal(func, True, conf=conf)
15:51:47  src/main/python/asserts.py:288: in _assert_gpu_and_cpu_are_equal
15:51:47      assert_equal(from_cpu, from_gpu)
15:51:47  src/main/python/asserts.py:86: in assert_equal
15:51:47      _assert_equal(cpu, gpu, float_check=get_float_check(), path=[])
15:51:47  src/main/python/asserts.py:38: in _assert_equal
15:51:47      _assert_equal(cpu[index], gpu[index], float_check, path + [index])
15:51:47  src/main/python/asserts.py:31: in _assert_equal
15:51:47      _assert_equal(cpu[field], gpu[field], float_check, path + [field])
15:51:47  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:51:47  
15:51:47  cpu = datetime.datetime(1590, 4, 25, 7, 25, 31, 697000)
15:51:47  gpu = datetime.datetime(1590, 4, 25, 16, 27, 32, 697000)
15:51:47  float_check = <function get_float_check.<locals>.<lambda> at 0x7f65d43a4830>
15:51:47  path = [1, '_c9']
15:51:47  
15:51:47      def _assert_equal(cpu, gpu, float_check, path):
15:51:47          t = type(cpu)
15:51:47          if (t is Row):
15:51:47              assert len(cpu) == len(gpu), "CPU and GPU row have different lengths at {}".format(path)
15:51:47              if hasattr(cpu, "__fields__") and hasattr(gpu, "__fields__"):
15:51:47                  for field in cpu.__fields__:
15:51:47                      _assert_equal(cpu[field], gpu[field], float_check, path + [field])
15:51:47              else:
15:51:47                  for index in range(len(cpu)):
15:51:47                      _assert_equal(cpu[index], gpu[index], float_check, path + [index])
15:51:47          elif (t is list):
15:51:47              assert len(cpu) == len(gpu), "CPU and GPU list have different lengths at {}".format(path)
15:51:47              for index in range(len(cpu)):
15:51:47                  _assert_equal(cpu[index], gpu[index], float_check, path + [index])
15:51:47          elif (t is pytypes.GeneratorType):
15:51:47              index = 0
15:51:47              # generator has no zip :( so we have to do this the hard way
15:51:47              done = False
15:51:47              while not done:
15:51:47                  sub_cpu = None
15:51:47                  sub_gpu = None
15:51:47                  try:
15:51:47                      sub_cpu = next(cpu)
15:51:47                  except StopIteration:
15:51:47                      done = True
15:51:47      
15:51:47                  try:
15:51:47                      sub_gpu = next(gpu)
15:51:47                  except StopIteration:
15:51:47                      done = True
15:51:47      
15:51:47                  if done:
15:51:47                      assert sub_cpu == sub_gpu and sub_cpu == None, "CPU and GPU generators have different lengths at {}".format(path)
15:51:47                  else:
15:51:47                      _assert_equal(sub_cpu, sub_gpu, float_check, path + [index])
15:51:47      
15:51:47                  index = index + 1
15:51:47          elif (t is int):
15:51:47              assert cpu == gpu, "GPU and CPU int values are different at {}".format(path)
15:51:47          elif (t is float):
15:51:47              if (math.isnan(cpu)):
15:51:47                  assert math.isnan(gpu), "GPU and CPU float values are different at {}".format(path)
15:51:47              else:
15:51:47                  assert float_check(cpu, gpu), "GPU and CPU float values are different {}".format(path)
15:51:47          elif isinstance(cpu, str):
15:51:47              assert cpu == gpu, "GPU and CPU string values are different at {}".format(path)
15:51:47          elif isinstance(cpu, datetime):
15:51:47  >           assert cpu == gpu, "GPU and CPU timestamp values are different at {}".format(path)
15:51:47  E           AssertionError: GPU and CPU timestamp values are different at [1, '_c9']
15:51:47  
15:51:47  src/main/python/asserts.py:72: AssertionError
15:51:47  ----------------------------- Captured stdout call -----------------------------
15:51:47  ### CPU RUN ###
15:51:47  ### GPU RUN ###
15:51:47  ### COLLECT: GPU TOOK 0.1357564926147461 CPU TOOK 0.12433862686157227 ###
15:51:47  _ test_read_round_trip[-read_orc_sql-[Byte, Short, Integer, Long, Float, Double, String, Boolean, Date, Timestamp]] _
15:51:47  
15:51:47  spark_tmp_path = '/tmp/pyspark_tests//478994/'
15:51:47  orc_gens = [Byte, Short, Integer, Long, Float, Double, ...]
15:51:47  read_func = <function read_orc_sql at 0x7f65de1890e0>, v1_enabled_list = ''
15:51:47  
15:51:47      @pytest.mark.parametrize('orc_gens', orc_gens_list, ids=idfn)
15:51:47      @pytest.mark.parametrize('read_func', [read_orc_df, read_orc_sql])
15:51:47      @pytest.mark.parametrize('v1_enabled_list', ["", "orc"])
15:51:47      def test_read_round_trip(spark_tmp_path, orc_gens, read_func, v1_enabled_list):
15:51:47          gen_list = [('_c' + str(i), gen) for i, gen in enumerate(orc_gens)]
15:51:47          data_path = spark_tmp_path + '/ORC_DATA'
15:51:47          with_cpu_session(
15:51:47                  lambda spark : gen_df(spark, gen_list).write.orc(data_path))
15:51:47          assert_gpu_and_cpu_are_equal_collect(
15:51:47                  read_func(data_path),
15:51:47  >               conf={'spark.sql.sources.useV1SourceList': v1_enabled_list})
15:51:47  
15:51:47  src/main/python/orc_test.py:73: 
15:51:47  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:51:47  src/main/python/asserts.py:296: in assert_gpu_and_cpu_are_equal_collect
15:51:47      _assert_gpu_and_cpu_are_equal(func, True, conf=conf)
15:51:47  src/main/python/asserts.py:288: in _assert_gpu_and_cpu_are_equal
15:51:47      assert_equal(from_cpu, from_gpu)
15:51:47  src/main/python/asserts.py:86: in assert_equal
15:51:47      _assert_equal(cpu, gpu, float_check=get_float_check(), path=[])
15:51:47  src/main/python/asserts.py:38: in _assert_equal
15:51:47      _assert_equal(cpu[index], gpu[index], float_check, path + [index])
15:51:47  src/main/python/asserts.py:31: in _assert_equal
15:51:47      _assert_equal(cpu[field], gpu[field], float_check, path + [field])
15:51:47  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:51:47  
15:51:47  cpu = datetime.datetime(1781, 6, 2, 17, 53, 14, 325000)
15:51:47  gpu = datetime.datetime(1781, 6, 3, 2, 55, 15, 325000)
15:51:47  float_check = <function get_float_check.<locals>.<lambda> at 0x7f65d43a44d0>
15:51:47  path = [364, '_c9']
15:51:47  
15:51:47      def _assert_equal(cpu, gpu, float_check, path):
15:51:47          t = type(cpu)
15:51:47          if (t is Row):
15:51:47              assert len(cpu) == len(gpu), "CPU and GPU row have different lengths at {}".format(path)
15:51:47              if hasattr(cpu, "__fields__") and hasattr(gpu, "__fields__"):
15:51:47                  for field in cpu.__fields__:
15:51:47                      _assert_equal(cpu[field], gpu[field], float_check, path + [field])
15:51:47              else:
15:51:47                  for index in range(len(cpu)):
15:51:47                      _assert_equal(cpu[index], gpu[index], float_check, path + [index])
15:51:47          elif (t is list):
15:51:47              assert len(cpu) == len(gpu), "CPU and GPU list have different lengths at {}".format(path)
15:51:47              for index in range(len(cpu)):
15:51:47                  _assert_equal(cpu[index], gpu[index], float_check, path + [index])
15:51:47          elif (t is pytypes.GeneratorType):
15:51:47              index = 0
15:51:47              # generator has no zip :( so we have to do this the hard way
15:51:47              done = False
15:51:47              while not done:
15:51:47                  sub_cpu = None
15:51:47                  sub_gpu = None
15:51:47                  try:
15:51:47                      sub_cpu = next(cpu)
15:51:47                  except StopIteration:
15:51:47                      done = True
15:51:47      
15:51:47                  try:
15:51:47                      sub_gpu = next(gpu)
15:51:47                  except StopIteration:
15:51:47                      done = True
15:51:47      
15:51:47                  if done:
15:51:47                      assert sub_cpu == sub_gpu and sub_cpu == None, "CPU and GPU generators have different lengths at {}".format(path)
15:51:47                  else:
15:51:47                      _assert_equal(sub_cpu, sub_gpu, float_check, path + [index])
15:51:47      
15:51:47                  index = index + 1
15:51:47          elif (t is int):
15:51:47              assert cpu == gpu, "GPU and CPU int values are different at {}".format(path)
15:51:47          elif (t is float):
15:51:47              if (math.isnan(cpu)):
15:51:47                  assert math.isnan(gpu), "GPU and CPU float values are different at {}".format(path)
15:51:47              else:
15:51:47                  assert float_check(cpu, gpu), "GPU and CPU float values are different {}".format(path)
15:51:47          elif isinstance(cpu, str):
15:51:47              assert cpu == gpu, "GPU and CPU string values are different at {}".format(path)
15:51:47          elif isinstance(cpu, datetime):
15:51:47  >           assert cpu == gpu, "GPU and CPU timestamp values are different at {}".format(path)
15:51:47  E           AssertionError: GPU and CPU timestamp values are different at [364, '_c9']
15:51:47  
15:51:47  src/main/python/asserts.py:72: AssertionError
15:51:47  ----------------------------- Captured stdout call -----------------------------
15:51:47  ### CPU RUN ###
15:51:47  ### GPU RUN ###
15:51:47  ### COLLECT: GPU TOOK 0.10378313064575195 CPU TOOK 0.09846067428588867 ###
15:51:47  _ test_read_round_trip[orc-read_orc_df-[Byte, Short, Integer, Long, Float, Double, String, Boolean, Date, Timestamp]] _
15:51:47  
15:51:47  spark_tmp_path = '/tmp/pyspark_tests//813331/'
15:51:47  orc_gens = [Byte, Short, Integer, Long, Float, Double, ...]
15:51:47  read_func = <function read_orc_df at 0x7f65de189050>, v1_enabled_list = 'orc'
15:51:47  
15:51:47      @pytest.mark.parametrize('orc_gens', orc_gens_list, ids=idfn)
15:51:47      @pytest.mark.parametrize('read_func', [read_orc_df, read_orc_sql])
15:51:47      @pytest.mark.parametrize('v1_enabled_list', ["", "orc"])
15:51:47      def test_read_round_trip(spark_tmp_path, orc_gens, read_func, v1_enabled_list):
15:51:47          gen_list = [('_c' + str(i), gen) for i, gen in enumerate(orc_gens)]
15:51:47          data_path = spark_tmp_path + '/ORC_DATA'
15:51:47          with_cpu_session(
15:51:47                  lambda spark : gen_df(spark, gen_list).write.orc(data_path))
15:51:47          assert_gpu_and_cpu_are_equal_collect(
15:51:47                  read_func(data_path),
15:51:47  >               conf={'spark.sql.sources.useV1SourceList': v1_enabled_list})
15:51:47  
15:51:47  src/main/python/orc_test.py:73: 
15:51:47  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:51:47  src/main/python/asserts.py:296: in assert_gpu_and_cpu_are_equal_collect
15:51:47      _assert_gpu_and_cpu_are_equal(func, True, conf=conf)
15:51:47  src/main/python/asserts.py:288: in _assert_gpu_and_cpu_are_equal
15:51:47      assert_equal(from_cpu, from_gpu)
15:51:47  src/main/python/asserts.py:86: in assert_equal
15:51:47      _assert_equal(cpu, gpu, float_check=get_float_check(), path=[])
15:51:47  src/main/python/asserts.py:38: in _assert_equal
15:51:47      _assert_equal(cpu[index], gpu[index], float_check, path + [index])
15:51:47  src/main/python/asserts.py:31: in _assert_equal
15:51:47      _assert_equal(cpu[field], gpu[field], float_check, path + [field])
15:51:47  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:51:47  
15:51:47  cpu = datetime.datetime(1781, 6, 2, 17, 53, 14, 325000)
15:51:47  gpu = datetime.datetime(1781, 6, 3, 2, 55, 15, 325000)
15:51:47  float_check = <function get_float_check.<locals>.<lambda> at 0x7f65d4504cb0>
15:51:47  path = [364, '_c9']
15:51:47  
15:51:47      def _assert_equal(cpu, gpu, float_check, path):
15:51:47          t = type(cpu)
15:51:47          if (t is Row):
15:51:47              assert len(cpu) == len(gpu), "CPU and GPU row have different lengths at {}".format(path)
15:51:47              if hasattr(cpu, "__fields__") and hasattr(gpu, "__fields__"):
15:51:47                  for field in cpu.__fields__:
15:51:47                      _assert_equal(cpu[field], gpu[field], float_check, path + [field])
15:51:47              else:
15:51:47                  for index in range(len(cpu)):
15:51:47                      _assert_equal(cpu[index], gpu[index], float_check, path + [index])
15:51:47          elif (t is list):
15:51:47              assert len(cpu) == len(gpu), "CPU and GPU list have different lengths at {}".format(path)
15:51:47              for index in range(len(cpu)):
15:51:47                  _assert_equal(cpu[index], gpu[index], float_check, path + [index])
15:51:47          elif (t is pytypes.GeneratorType):
15:51:47              index = 0
15:51:47              # generator has no zip :( so we have to do this the hard way
15:51:47              done = False
15:51:47              while not done:
15:51:47                  sub_cpu = None
15:51:47                  sub_gpu = None
15:51:47                  try:
15:51:47                      sub_cpu = next(cpu)
15:51:47                  except StopIteration:
15:51:47                      done = True
15:51:47      
15:51:47                  try:
15:51:47                      sub_gpu = next(gpu)
15:51:47                  except StopIteration:
15:51:47                      done = True
15:51:47      
15:51:47                  if done:
15:51:47                      assert sub_cpu == sub_gpu and sub_cpu == None, "CPU and GPU generators have different lengths at {}".format(path)
15:51:47                  else:
15:51:47                      _assert_equal(sub_cpu, sub_gpu, float_check, path + [index])
15:51:47      
15:51:47                  index = index + 1
15:51:47          elif (t is int):
15:51:47              assert cpu == gpu, "GPU and CPU int values are different at {}".format(path)
15:51:47          elif (t is float):
15:51:47              if (math.isnan(cpu)):
15:51:47                  assert math.isnan(gpu), "GPU and CPU float values are different at {}".format(path)
15:51:47              else:
15:51:47                  assert float_check(cpu, gpu), "GPU and CPU float values are different {}".format(path)
15:51:47          elif isinstance(cpu, str):
15:51:47              assert cpu == gpu, "GPU and CPU string values are different at {}".format(path)
15:51:47          elif isinstance(cpu, datetime):
15:51:47  >           assert cpu == gpu, "GPU and CPU timestamp values are different at {}".format(path)
15:51:47  E           AssertionError: GPU and CPU timestamp values are different at [364, '_c9']
15:51:47  
15:51:47  src/main/python/asserts.py:72: AssertionError
15:51:47  ----------------------------- Captured stdout call -----------------------------
15:51:47  ### CPU RUN ###
15:51:47  ### GPU RUN ###
15:51:47  ### COLLECT: GPU TOOK 0.10281252861022949 CPU TOOK 0.11835145950317383 ###
15:51:47  _ test_read_round_trip[orc-read_orc_sql-[Byte, Short, Integer, Long, Float, Double, String, Boolean, Date, Timestamp]] _
15:51:47  
15:51:47  spark_tmp_path = '/tmp/pyspark_tests//88131/'
15:51:47  orc_gens = [Byte, Short, Integer, Long, Float, Double, ...]
15:51:47  read_func = <function read_orc_sql at 0x7f65de1890e0>, v1_enabled_list = 'orc'
15:51:47  
15:51:47      @pytest.mark.parametrize('orc_gens', orc_gens_list, ids=idfn)
15:51:47      @pytest.mark.parametrize('read_func', [read_orc_df, read_orc_sql])
15:51:47      @pytest.mark.parametrize('v1_enabled_list', ["", "orc"])
15:51:47      def test_read_round_trip(spark_tmp_path, orc_gens, read_func, v1_enabled_list):
15:51:47          gen_list = [('_c' + str(i), gen) for i, gen in enumerate(orc_gens)]
15:51:47          data_path = spark_tmp_path + '/ORC_DATA'
15:51:47          with_cpu_session(
15:51:47                  lambda spark : gen_df(spark, gen_list).write.orc(data_path))
15:51:47          assert_gpu_and_cpu_are_equal_collect(
15:51:47                  read_func(data_path),
15:51:47  >               conf={'spark.sql.sources.useV1SourceList': v1_enabled_list})
15:51:47  
15:51:47  src/main/python/orc_test.py:73: 
15:51:47  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:51:47  src/main/python/asserts.py:296: in assert_gpu_and_cpu_are_equal_collect
15:51:47      _assert_gpu_and_cpu_are_equal(func, True, conf=conf)
15:51:47  src/main/python/asserts.py:288: in _assert_gpu_and_cpu_are_equal
15:51:47      assert_equal(from_cpu, from_gpu)
15:51:47  src/main/python/asserts.py:86: in assert_equal
15:51:47      _assert_equal(cpu, gpu, float_check=get_float_check(), path=[])
15:51:47  src/main/python/asserts.py:38: in _assert_equal
15:51:47      _assert_equal(cpu[index], gpu[index], float_check, path + [index])
15:51:47  src/main/python/asserts.py:31: in _assert_equal
15:51:47      _assert_equal(cpu[field], gpu[field], float_check, path + [field])
15:51:47  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:51:47  
15:51:47  cpu = datetime.datetime(1785, 10, 5, 21, 12, 7, 425000)
15:51:47  gpu = datetime.datetime(1785, 10, 6, 6, 14, 8, 425000)
15:51:47  float_check = <function get_float_check.<locals>.<lambda> at 0x7f65d4504440>
15:51:47  path = [1718, '_c9']
15:51:47  
15:51:47      def _assert_equal(cpu, gpu, float_check, path):
15:51:47          t = type(cpu)
15:51:47          if (t is Row):
15:51:47              assert len(cpu) == len(gpu), "CPU and GPU row have different lengths at {}".format(path)
15:51:47              if hasattr(cpu, "__fields__") and hasattr(gpu, "__fields__"):
15:51:47                  for field in cpu.__fields__:
15:51:47                      _assert_equal(cpu[field], gpu[field], float_check, path + [field])
15:51:47              else:
15:51:47                  for index in range(len(cpu)):
15:51:47                      _assert_equal(cpu[index], gpu[index], float_check, path + [index])
15:51:47          elif (t is list):
15:51:47              assert len(cpu) == len(gpu), "CPU and GPU list have different lengths at {}".format(path)
15:51:47              for index in range(len(cpu)):
15:51:47                  _assert_equal(cpu[index], gpu[index], float_check, path + [index])
15:51:47          elif (t is pytypes.GeneratorType):
15:51:47              index = 0
15:51:47              # generator has no zip :( so we have to do this the hard way
15:51:47              done = False
15:51:47              while not done:
15:51:47                  sub_cpu = None
15:51:47                  sub_gpu = None
15:51:47                  try:
15:51:47                      sub_cpu = next(cpu)
15:51:47                  except StopIteration:
15:51:47                      done = True
15:51:47      
15:51:47                  try:
15:51:47                      sub_gpu = next(gpu)
15:51:47                  except StopIteration:
15:51:47                      done = True
15:51:47      
15:51:47                  if done:
15:51:47                      assert sub_cpu == sub_gpu and sub_cpu == None, "CPU and GPU generators have different lengths at {}".format(path)
15:51:47                  else:
15:51:47                      _assert_equal(sub_cpu, sub_gpu, float_check, path + [index])
15:51:47      
15:51:47                  index = index + 1
15:51:47          elif (t is int):
15:51:47              assert cpu == gpu, "GPU and CPU int values are different at {}".format(path)
15:51:47          elif (t is float):
15:51:47              if (math.isnan(cpu)):
15:51:47                  assert math.isnan(gpu), "GPU and CPU float values are different at {}".format(path)
15:51:47              else:
15:51:47                  assert float_check(cpu, gpu), "GPU and CPU float values are different {}".format(path)
15:51:47          elif isinstance(cpu, str):
15:51:47              assert cpu == gpu, "GPU and CPU string values are different at {}".format(path)
15:51:47          elif isinstance(cpu, datetime):
15:51:47  >           assert cpu == gpu, "GPU and CPU timestamp values are different at {}".format(path)
15:51:47  E           AssertionError: GPU and CPU timestamp values are different at [1718, '_c9']
15:51:47  
15:51:47  src/main/python/asserts.py:72: AssertionError
15:51:47  ----------------------------- Captured stdout call -----------------------------
15:51:47  ### CPU RUN ###
15:51:47  ### GPU RUN ###
15:51:47  ### COLLECT: GPU TOOK 0.13427114486694336 CPU TOOK 0.10119915008544922 ###
15:51:47  ________________________ test_simple_partitioned_read[] ________________________
15:51:47  
15:51:47  spark_tmp_path = '/tmp/pyspark_tests//998846/', v1_enabled_list = ''
15:51:47  
15:51:47      @pytest.mark.parametrize('v1_enabled_list', ["", "orc"])
15:51:47      def test_simple_partitioned_read(spark_tmp_path, v1_enabled_list):
15:51:47          # Once https://github.com/NVIDIA/spark-rapids/issues/131 is fixed
15:51:47          # we should go with a more standard set of generators
15:51:47          orc_gens = [byte_gen, short_gen, int_gen, long_gen, float_gen, double_gen,
15:51:47          string_gen, boolean_gen, DateGen(start=date(1590, 1, 1)),
15:51:47          TimestampGen(start=datetime(1590, 1, 1, tzinfo=timezone.utc))]
15:51:47          gen_list = [('_c' + str(i), gen) for i, gen in enumerate(orc_gens)]
15:51:47          first_data_path = spark_tmp_path + '/ORC_DATA/key=0'
15:51:47          with_cpu_session(
15:51:47                  lambda spark : gen_df(spark, gen_list).write.orc(first_data_path))
15:51:47          second_data_path = spark_tmp_path + '/ORC_DATA/key=1'
15:51:47          with_cpu_session(
15:51:47                  lambda spark : gen_df(spark, gen_list).write.orc(second_data_path))
15:51:47          data_path = spark_tmp_path + '/ORC_DATA'
15:51:47          assert_gpu_and_cpu_are_equal_collect(
15:51:47                  lambda spark : spark.read.orc(data_path),
15:51:47  >               conf={'spark.sql.sources.useV1SourceList': v1_enabled_list})
15:51:47  
15:51:47  src/main/python/orc_test.py:131: 
15:51:47  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:51:47  src/main/python/asserts.py:296: in assert_gpu_and_cpu_are_equal_collect
15:51:47      _assert_gpu_and_cpu_are_equal(func, True, conf=conf)
15:51:47  src/main/python/asserts.py:288: in _assert_gpu_and_cpu_are_equal
15:51:47      assert_equal(from_cpu, from_gpu)
15:51:47  src/main/python/asserts.py:86: in assert_equal
15:51:47      _assert_equal(cpu, gpu, float_check=get_float_check(), path=[])
15:51:47  src/main/python/asserts.py:38: in _assert_equal
15:51:47      _assert_equal(cpu[index], gpu[index], float_check, path + [index])
15:51:47  src/main/python/asserts.py:31: in _assert_equal
15:51:47      _assert_equal(cpu[field], gpu[field], float_check, path + [field])
15:51:47  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:51:47  
15:51:47  cpu = datetime.datetime(1934, 9, 27, 7, 18, 4, 744000)
15:51:47  gpu = datetime.datetime(1934, 9, 27, 16, 20, 5, 744000)
15:51:47  float_check = <function get_float_check.<locals>.<lambda> at 0x7f65d4748440>
15:51:47  path = [2734, '_c9']
15:51:47  
15:51:47      def _assert_equal(cpu, gpu, float_check, path):
15:51:47          t = type(cpu)
15:51:47          if (t is Row):
15:51:47              assert len(cpu) == len(gpu), "CPU and GPU row have different lengths at {}".format(path)
15:51:47              if hasattr(cpu, "__fields__") and hasattr(gpu, "__fields__"):
15:51:47                  for field in cpu.__fields__:
15:51:47                      _assert_equal(cpu[field], gpu[field], float_check, path + [field])
15:51:47              else:
15:51:47                  for index in range(len(cpu)):
15:51:47                      _assert_equal(cpu[index], gpu[index], float_check, path + [index])
15:51:47          elif (t is list):
15:51:47              assert len(cpu) == len(gpu), "CPU and GPU list have different lengths at {}".format(path)
15:51:47              for index in range(len(cpu)):
15:51:47                  _assert_equal(cpu[index], gpu[index], float_check, path + [index])
15:51:47          elif (t is pytypes.GeneratorType):
15:51:47              index = 0
15:51:47              # generator has no zip :( so we have to do this the hard way
15:51:47              done = False
15:51:47              while not done:
15:51:47                  sub_cpu = None
15:51:47                  sub_gpu = None
15:51:47                  try:
15:51:47                      sub_cpu = next(cpu)
15:51:47                  except StopIteration:
15:51:47                      done = True
15:51:47      
15:51:47                  try:
15:51:47                      sub_gpu = next(gpu)
15:51:47                  except StopIteration:
15:51:47                      done = True
15:51:47      
15:51:47                  if done:
15:51:47                      assert sub_cpu == sub_gpu and sub_cpu == None, "CPU and GPU generators have different lengths at {}".format(path)
15:51:47                  else:
15:51:47                      _assert_equal(sub_cpu, sub_gpu, float_check, path + [index])
15:51:47      
15:51:47                  index = index + 1
15:51:47          elif (t is int):
15:51:47              assert cpu == gpu, "GPU and CPU int values are different at {}".format(path)
15:51:47          elif (t is float):
15:51:47              if (math.isnan(cpu)):
15:51:47                  assert math.isnan(gpu), "GPU and CPU float values are different at {}".format(path)
15:51:47              else:
15:51:47                  assert float_check(cpu, gpu), "GPU and CPU float values are different {}".format(path)
15:51:47          elif isinstance(cpu, str):
15:51:47              assert cpu == gpu, "GPU and CPU string values are different at {}".format(path)
15:51:47          elif isinstance(cpu, datetime):
15:51:47  >           assert cpu == gpu, "GPU and CPU timestamp values are different at {}".format(path)
15:51:47  E           AssertionError: GPU and CPU timestamp values are different at [2734, '_c9']
15:51:47  
15:51:47  src/main/python/asserts.py:72: AssertionError
15:51:47  ----------------------------- Captured stdout call -----------------------------
15:51:47  ### CPU RUN ###
15:51:47  ### GPU RUN ###
15:51:47  ### COLLECT: GPU TOOK 0.1637420654296875 CPU TOOK 0.15129518508911133 ###
15:51:47  ______________________ test_simple_partitioned_read[orc] _______________________
15:51:47  
15:51:47  spark_tmp_path = '/tmp/pyspark_tests//428738/', v1_enabled_list = 'orc'
15:51:47  
15:51:47      @pytest.mark.parametrize('v1_enabled_list', ["", "orc"])
15:51:47      def test_simple_partitioned_read(spark_tmp_path, v1_enabled_list):
15:51:47          # Once https://github.com/NVIDIA/spark-rapids/issues/131 is fixed
15:51:47          # we should go with a more standard set of generators
15:51:47          orc_gens = [byte_gen, short_gen, int_gen, long_gen, float_gen, double_gen,
15:51:47          string_gen, boolean_gen, DateGen(start=date(1590, 1, 1)),
15:51:47          TimestampGen(start=datetime(1590, 1, 1, tzinfo=timezone.utc))]
15:51:47          gen_list = [('_c' + str(i), gen) for i, gen in enumerate(orc_gens)]
15:51:47          first_data_path = spark_tmp_path + '/ORC_DATA/key=0'
15:51:47          with_cpu_session(
15:51:47                  lambda spark : gen_df(spark, gen_list).write.orc(first_data_path))
15:51:47          second_data_path = spark_tmp_path + '/ORC_DATA/key=1'
15:51:47          with_cpu_session(
15:51:47                  lambda spark : gen_df(spark, gen_list).write.orc(second_data_path))
15:51:47          data_path = spark_tmp_path + '/ORC_DATA'
15:51:47          assert_gpu_and_cpu_are_equal_collect(
15:51:47                  lambda spark : spark.read.orc(data_path),
15:51:47  >               conf={'spark.sql.sources.useV1SourceList': v1_enabled_list})
15:51:47  
15:51:47  src/main/python/orc_test.py:131: 
15:51:47  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:51:47  src/main/python/asserts.py:296: in assert_gpu_and_cpu_are_equal_collect
15:51:47      _assert_gpu_and_cpu_are_equal(func, True, conf=conf)
15:51:47  src/main/python/asserts.py:288: in _assert_gpu_and_cpu_are_equal
15:51:47      assert_equal(from_cpu, from_gpu)
15:51:47  src/main/python/asserts.py:86: in assert_equal
15:51:47      _assert_equal(cpu, gpu, float_check=get_float_check(), path=[])
15:51:47  src/main/python/asserts.py:38: in _assert_equal
15:51:47      _assert_equal(cpu[index], gpu[index], float_check, path + [index])
15:51:47  src/main/python/asserts.py:31: in _assert_equal
15:51:47      _assert_equal(cpu[field], gpu[field], float_check, path + [field])
15:51:47  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
15:51:47  
15:51:47  cpu = datetime.datetime(1781, 6, 2, 17, 53, 14, 325000)
15:51:47  gpu = datetime.datetime(1781, 6, 3, 2, 55, 15, 325000)
15:51:47  float_check = <function get_float_check.<locals>.<lambda> at 0x7f65cfe797a0>
15:51:47  path = [707, '_c9']
15:51:47  
15:51:47      def _assert_equal(cpu, gpu, float_check, path):
15:51:47          t = type(cpu)
15:51:47          if (t is Row):
15:51:47              assert len(cpu) == len(gpu), "CPU and GPU row have different lengths at {}".format(path)
15:51:47              if hasattr(cpu, "__fields__") and hasattr(gpu, "__fields__"):
15:51:47                  for field in cpu.__fields__:
15:51:47                      _assert_equal(cpu[field], gpu[field], float_check, path + [field])
15:51:47              else:
15:51:47                  for index in range(len(cpu)):
15:51:47                      _assert_equal(cpu[index], gpu[index], float_check, path + [index])
15:51:47          elif (t is list):
15:51:47              assert len(cpu) == len(gpu), "CPU and GPU list have different lengths at {}".format(path)
15:51:47              for index in range(len(cpu)):
15:51:47                  _assert_equal(cpu[index], gpu[index], float_check, path + [index])
15:51:47          elif (t is pytypes.GeneratorType):
15:51:47              index = 0
15:51:47              # generator has no zip :( so we have to do this the hard way
15:51:47              done = False
15:51:47              while not done:
15:51:47                  sub_cpu = None
15:51:47                  sub_gpu = None
15:51:47                  try:
15:51:47                      sub_cpu = next(cpu)
15:51:47                  except StopIteration:
15:51:47                      done = True
15:51:47      
15:51:47                  try:
15:51:47                      sub_gpu = next(gpu)
15:51:47                  except StopIteration:
15:51:47                      done = True
15:51:47      
15:51:47                  if done:
15:51:47                      assert sub_cpu == sub_gpu and sub_cpu == None, "CPU and GPU generators have different lengths at {}".format(path)
15:51:47                  else:
15:51:47                      _assert_equal(sub_cpu, sub_gpu, float_check, path + [index])
15:51:47      
15:51:47                  index = index + 1
15:51:47          elif (t is int):
15:51:47              assert cpu == gpu, "GPU and CPU int values are different at {}".format(path)
15:51:47          elif (t is float):
15:51:47              if (math.isnan(cpu)):
15:51:47                  assert math.isnan(gpu), "GPU and CPU float values are different at {}".format(path)
15:51:47              else:
15:51:47                  assert float_check(cpu, gpu), "GPU and CPU float values are different {}".format(path)
15:51:47          elif isinstance(cpu, str):
15:51:47              assert cpu == gpu, "GPU and CPU string values are different at {}".format(path)
15:51:47          elif isinstance(cpu, datetime):
15:51:47  >           assert cpu == gpu, "GPU and CPU timestamp values are different at {}".format(path)
15:51:47  E           AssertionError: GPU and CPU timestamp values are different at [707, '_c9']
15:51:47  
15:51:47  src/main/python/asserts.py:72: AssertionError
15:51:47  ----------------------------- Captured stdout call -----------------------------
15:51:47  ### CPU RUN ###
15:51:47  ### GPU RUN ###
15:51:47  ### COLLECT: GPU TOOK 0.2654852867126465 CPU TOOK 0.10956788063049316 ###
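All three failures are reported at the timestamp column `_c9`, and in each one the GPU value is later than the CPU value. The short sketch below copies the `cpu`/`gpu` pairs from the log and computes GPU minus CPU for each. It only measures the shift and does not diagnose its cause. The dictionary keys are descriptive labels chosen for this sketch, not exact pytest test ids.

```python
from datetime import datetime, timedelta

# (cpu, gpu) timestamp pairs copied verbatim from the three assertion
# failures in the log above. Keys are illustrative labels, not pytest ids.
mismatches = {
    "test_read_round_trip, row 1718, _c9": (
        datetime(1785, 10, 5, 21, 12, 7, 425000),
        datetime(1785, 10, 6, 6, 14, 8, 425000),
    ),
    "test_simple_partitioned_read[], row 2734, _c9": (
        datetime(1934, 9, 27, 7, 18, 4, 744000),
        datetime(1934, 9, 27, 16, 20, 5, 744000),
    ),
    "test_simple_partitioned_read[orc], row 707, _c9": (
        datetime(1781, 6, 2, 17, 53, 14, 325000),
        datetime(1781, 6, 3, 2, 55, 15, 325000),
    ),
}


def gpu_minus_cpu(pairs: dict) -> dict:
    """Return the GPU - CPU difference for each mismatching timestamp."""
    return {key: gpu - cpu for key, (cpu, gpu) in pairs.items()}


deltas = gpu_minus_cpu(mismatches)
for key, delta in deltas.items():
    # str(timedelta) renders as H:MM:SS when microseconds are zero.
    print(f"{key}: {delta}")

# Collapsing the deltas into a set shows whether the shift is constant.
distinct = set(deltas.values())
print(distinct)
```

For these three rows the set holds a single value, `9:02:01`. The GPU timestamps are shifted forward by the same amount in every failure, whether the read goes through the v1 or the v2 ORC source.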

@sameerz sameerz added this to the Nov 9 - Nov 20 milestone Nov 6, 2020
@sameerz sameerz modified the milestones: Nov 23 - Dec 4, Dec 7 - Dec 18 Dec 7, 2020
@pxLi pxLi linked a pull request Dec 8, 2020 that will close this issue
@jlowe
Member Author

jlowe commented Dec 9, 2020

Based on analysis from @nvdbaranec, I filed a cudf tracking issue: rapidsai/cudf#6947

@pxLi pxLi closed this as completed in #1337 Dec 9, 2020
tgravescs pushed a commit to tgravescs/spark-rapids that referenced this issue Nov 30, 2023