Normalize Op using fp32 #1597

Merged: 8 commits merged into main from normalize_fp32 on Jun 27, 2022

Conversation

@benfred (Member) commented Jun 22, 2022

The Normalize op used fp64 output and was not compatible with TensorFlow.
I added an out_dtype parameter to give users the option of fp32 output instead of fp64.

Using fp64 broke some unit tests.
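
A minimal usage sketch, assuming the parameter ends up named out_dtype (the attribute referenced in the tracebacks below); the final keyword may differ:

import numpy as np
import nvtabular as nvt

# Hypothetical example: request fp32 output from Normalize so the features
# match TensorFlow's default float dtype; omitting the argument keeps fp64.
cont_features = ["x", "y"] >> nvt.ops.Normalize(out_dtype=np.float32)
workflow = nvt.Workflow(cont_features)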

@benfred (Member, Author) commented Jun 22, 2022

This is a test to see if the Jenkins CI will pick this up.

@benfred added the "bug" (Something isn't working) label on Jun 22, 2022
@nvidia-merlin-bot (Contributor)

Click to view CI Results
GitHub pull request #1597 of commit 3ea8babe9714ac7f9260027253c5cd58d2828682, no merge conflicts.
Running as SYSTEM
Setting status of 3ea8babe9714ac7f9260027253c5cd58d2828682 to PENDING with url http://10.20.17.181:8080/job/nvtabular_tests/4542/ and message: 'Build started for merge commit.'
Using context: Jenkins Unit Test Run
Building on master in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA-Merlin/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/pull/1597/*:refs/remotes/origin/pr/1597/* # timeout=10
 > git rev-parse 3ea8babe9714ac7f9260027253c5cd58d2828682^{commit} # timeout=10
Checking out Revision 3ea8babe9714ac7f9260027253c5cd58d2828682 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 3ea8babe9714ac7f9260027253c5cd58d2828682 # timeout=10
Commit message: "keep fp64"
 > git rev-list --no-walk 12c68fbc4445db27bada9b4e7c05a90c4832b366 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins14741025011272362175.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 1422 items / 1 skipped

tests/unit/test_dask_nvt.py ............................................ [ 3%]
........................................................................ [ 8%]
[ 8%]
tests/unit/test_notebooks.py .....F [ 8%]
tests/unit/test_tf4rec.py . [ 8%]
tests/unit/test_tools.py ...................... [ 10%]
tests/unit/test_triton_inference.py ................................ [ 12%]
tests/unit/framework_utils/test_tf_feature_columns.py . [ 12%]
tests/unit/framework_utils/test_tf_layers.py ........................... [ 14%]
................................................... [ 18%]
tests/unit/framework_utils/test_torch_layers.py . [ 18%]
tests/unit/loader/test_dataloader_backend.py ...... [ 18%]
tests/unit/loader/test_tf_dataloader.py ................................ [ 20%]
........................................s.. [ 23%]
tests/unit/loader/test_torch_dataloader.py ............................. [ 25%]
...................................................... [ 29%]
tests/unit/ops/test_categorify.py ...................................... [ 32%]
........................................................................ [ 37%]
........................................... [ 40%]
tests/unit/ops/test_column_similarity.py ........................ [ 42%]
tests/unit/ops/test_drop_low_cardinality.py .. [ 42%]
tests/unit/ops/test_fill.py ............................................ [ 45%]
........ [ 45%]
tests/unit/ops/test_groupyby.py ................. [ 47%]
tests/unit/ops/test_hash_bucket.py ......................... [ 48%]
tests/unit/ops/test_join.py ............................................ [ 51%]
........................................................................ [ 56%]
.................................. [ 59%]
tests/unit/ops/test_lambda.py .......... [ 60%]
tests/unit/ops/test_normalize.py ....................................... [ 62%]
.. [ 62%]
tests/unit/ops/test_ops.py ............................................. [ 66%]
.................... [ 67%]
tests/unit/ops/test_ops_schema.py ...................................... [ 70%]
........................................................................ [ 75%]
........................................................................ [ 80%]
........................................................................ [ 85%]
....................................... [ 88%]
tests/unit/ops/test_reduce_dtype_size.py .. [ 88%]
tests/unit/ops/test_target_encode.py ..................... [ 89%]
tests/unit/workflow/test_cpu_workflow.py ...... [ 90%]
tests/unit/workflow/test_workflow.py ................................... [ 92%]
.......................................................... [ 96%]
tests/unit/workflow/test_workflow_chaining.py ... [ 96%]
tests/unit/workflow/test_workflow_node.py ........... [ 97%]
tests/unit/workflow/test_workflow_ops.py ... [ 97%]
tests/unit/workflow/test_workflow_schemas.py ........................... [ 99%]
... [100%]

=================================== FAILURES ===================================
__________________________ test_multigpu_dask_example __________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_multigpu_dask_example0')

def test_multigpu_dask_example(tmpdir):
    with get_cuda_cluster() as cuda_cluster:
        os.environ["BASE_DIR"] = str(tmpdir)
        scheduler_port = cuda_cluster.scheduler_address

        def _nb_modify(line):
            # Use cuda_cluster "fixture" port rather than allowing notebook
            # to deploy a LocalCUDACluster within the subprocess
            line = line.replace("cluster = None", f"cluster = '{scheduler_port}'")
            # Use a much smaller "toy" dataset
            line = line.replace("write_count = 25", "write_count = 4")
            line = line.replace('freq = "1s"', 'freq = "1h"')
            # Use smaller partitions for smaller dataset
            line = line.replace("part_mem_fraction=0.1", "part_size=1_000_000")
            line = line.replace("out_files_per_proc=8", "out_files_per_proc=1")
            return line

        notebook_path = os.path.join(
            dirname(TEST_PATH), "examples/multi-gpu-toy-example/", "multi-gpu_dask.ipynb"
        )
>       _run_notebook(tmpdir, notebook_path, _nb_modify)

tests/unit/test_notebooks.py:287:


tests/unit/test_notebooks.py:307: in _run_notebook
subprocess.check_output([sys.executable, script_path])
/usr/lib/python3.8/subprocess.py:415: in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,


input = None, capture_output = False, timeout = None, check = True
popenargs = (['/usr/bin/python3', '/tmp/pytest-of-jenkins/pytest-3/test_multigpu_dask_example0/notebook.py'],)
kwargs = {'stdout': -1}, process = <subprocess.Popen object at 0x7fc62f00a850>
stdout = b'', stderr = None, retcode = 1

def run(*popenargs,
        input=None, capture_output=False, timeout=None, check=False, **kwargs):
    """Run command with arguments and return a CompletedProcess instance.

    The returned instance will have attributes args, returncode, stdout and
    stderr. By default, stdout and stderr are not captured, and those attributes
    will be None. Pass stdout=PIPE and/or stderr=PIPE in order to capture them.

    If check is True and the exit code was non-zero, it raises a
    CalledProcessError. The CalledProcessError object will have the return code
    in the returncode attribute, and output & stderr attributes if those streams
    were captured.

    If timeout is given, and the process takes too long, a TimeoutExpired
    exception will be raised.

    There is an optional argument "input", allowing you to
    pass bytes or a string to the subprocess's stdin.  If you use this argument
    you may not also use the Popen constructor's "stdin" argument, as
    it will be used internally.

    By default, all communication is in bytes, and therefore any "input" should
    be bytes, and the stdout and stderr will be bytes. If in text mode, any
    "input" should be a string, and stdout and stderr will be strings decoded
    according to locale encoding, or by "encoding" if set. Text mode is
    triggered by setting any of text, encoding, errors or universal_newlines.

    The other arguments are the same as for the Popen constructor.
    """
    if input is not None:
        if kwargs.get('stdin') is not None:
            raise ValueError('stdin and input arguments may not both be used.')
        kwargs['stdin'] = PIPE

    if capture_output:
        if kwargs.get('stdout') is not None or kwargs.get('stderr') is not None:
            raise ValueError('stdout and stderr arguments may not be used '
                             'with capture_output.')
        kwargs['stdout'] = PIPE
        kwargs['stderr'] = PIPE

    with Popen(*popenargs, **kwargs) as process:
        try:
            stdout, stderr = process.communicate(input, timeout=timeout)
        except TimeoutExpired as exc:
            process.kill()
            if _mswindows:
                # Windows accumulates the output in a single blocking
                # read() call run on child threads, with the timeout
                # being done in a join() on those threads.  communicate()
                # _after_ kill() is required to collect that and add it
                # to the exception.
                exc.stdout, exc.stderr = process.communicate()
            else:
                # POSIX _communicate already populated the output so
                # far into the TimeoutExpired exception.
                process.wait()
            raise
        except:  # Including KeyboardInterrupt, communicate handled that.
            process.kill()
            # We don't call process.wait() as .__exit__ does that for us.
            raise
        retcode = process.poll()
        if check and retcode:
          raise CalledProcessError(retcode, process.args,
                                     output=stdout, stderr=stderr)

E subprocess.CalledProcessError: Command '['/usr/bin/python3', '/tmp/pytest-of-jenkins/pytest-3/test_multigpu_dask_example0/notebook.py']' returned non-zero exit status 1.

/usr/lib/python3.8/subprocess.py:516: CalledProcessError
----------------------------- Captured stderr call -----------------------------
distributed.preloading - INFO - Import preload module: dask_cuda.initialize
distributed.preloading - INFO - Import preload module: dask_cuda.initialize
Unable to start CUDA Context
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/pynvml/nvml.py", line 782, in _nvmlGetFunctionPointer
_nvmlGetFunctionPointer_cache[name] = getattr(nvmlLib, name)
File "/usr/lib/python3.8/ctypes/init.py", line 386, in getattr
func = self.getitem(name)
File "/usr/lib/python3.8/ctypes/init.py", line 391, in getitem
func = self._FuncPtr((name_or_ordinal, self))
AttributeError: /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1: undefined symbol: nvmlDeviceGetComputeRunningProcesses_v2

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/dask_cuda/initialize.py", line 42, in _create_cuda_context
ctx = has_cuda_context()
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/diagnostics/nvml.py", line 78, in has_cuda_context
running_processes = pynvml.nvmlDeviceGetComputeRunningProcesses_v2(handle)
File "/usr/local/lib/python3.8/dist-packages/pynvml/nvml.py", line 2191, in nvmlDeviceGetComputeRunningProcesses_v2
fn = _nvmlGetFunctionPointer("nvmlDeviceGetComputeRunningProcesses_v2")
File "/usr/local/lib/python3.8/dist-packages/pynvml/nvml.py", line 785, in _nvmlGetFunctionPointer
raise NVMLError(NVML_ERROR_FUNCTION_NOT_FOUND)
pynvml.nvml.NVMLError_FunctionNotFound: Function Not Found
Unable to start CUDA Context
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/pynvml/nvml.py", line 782, in _nvmlGetFunctionPointer
_nvmlGetFunctionPointer_cache[name] = getattr(nvmlLib, name)
File "/usr/lib/python3.8/ctypes/init.py", line 386, in getattr
func = self.getitem(name)
File "/usr/lib/python3.8/ctypes/init.py", line 391, in getitem
func = self._FuncPtr((name_or_ordinal, self))
AttributeError: /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1: undefined symbol: nvmlDeviceGetComputeRunningProcesses_v2

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/dask_cuda/initialize.py", line 42, in _create_cuda_context
ctx = has_cuda_context()
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/diagnostics/nvml.py", line 78, in has_cuda_context
running_processes = pynvml.nvmlDeviceGetComputeRunningProcesses_v2(handle)
File "/usr/local/lib/python3.8/dist-packages/pynvml/nvml.py", line 2191, in nvmlDeviceGetComputeRunningProcesses_v2
fn = _nvmlGetFunctionPointer("nvmlDeviceGetComputeRunningProcesses_v2")
File "/usr/local/lib/python3.8/dist-packages/pynvml/nvml.py", line 785, in _nvmlGetFunctionPointer
raise NVMLError(NVML_ERROR_FUNCTION_NOT_FOUND)
pynvml.nvml.NVMLError_FunctionNotFound: Function Not Found
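
The NVML errors above are environmental rather than caused by this PR: pynvml 11.x resolves the _v2 entry point, which older driver builds of libnvidia-ml.so.1 do not export. A quick diagnostic sketch, assuming the library is loadable on the host:

import ctypes

# ctypes.CDLL.__getattr__ raises AttributeError for missing symbols, so
# hasattr reports whether the driver exports the _v2 entry point pynvml needs.
nvml = ctypes.CDLL("libnvidia-ml.so.1")
print(hasattr(nvml, "nvmlDeviceGetComputeRunningProcesses_v2"))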
/usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:1292: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(
Failed to transform operator <nvtabular.ops.normalize.Normalize object at 0x7effe81ac0a0>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow/workflow.py", line 519, in _transform_partition
output_df = node.op.transform(selection, input_df)
File "/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py", line 101, in inner
result = func(*args, **kwargs)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/normalize.py", line 84, in transform
values = values.astype(self.output_dtype)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/normalize.py", line 111, in output_dtype
return self.out_dtype or numpy.float64
AttributeError: 'Normalize' object has no attribute 'out_dtype'
distributed.worker - WARNING - Compute Failed
Function: subgraph_callable-a45a4379-4e33-482e-b1e1-5e114b51
args: ([{'piece': ('/tmp/pytest-of-jenkins/pytest-3/test_multigpu_dask_example0/demo_dataset/part.0.parquet', [0], [])}])
kwargs: {}
Exception: 'AttributeError("'Normalize' object has no attribute 'out_dtype'")'

Traceback (most recent call last):
File "/tmp/pytest-of-jenkins/pytest-3/test_multigpu_dask_example0/notebook.py", line 123, in
workflow.fit_transform(dataset).to_parquet(
File "/var/jenkins_home/.local/lib/python3.8/site-packages/nvtabular/workflow/workflow.py", line 286, in fit_transform
self.fit(dataset)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/nvtabular/workflow/workflow.py", line 261, in fit
self._transform_impl(dataset, capture_dtypes=True).sample_dtypes()
File "/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py", line 1147, in sample_dtypes
_real_meta = self.engine.sample_data(n=n)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset_engine.py", line 71, in sample_data
_head = _ddf.partitions[partition_index].head(n)
File "/usr/local/lib/python3.8/dist-packages/dask/dataframe/core.py", line 1098, in head
return self._head(n=n, npartitions=npartitions, compute=compute, safe=safe)
File "/usr/local/lib/python3.8/dist-packages/dask/dataframe/core.py", line 1132, in _head
result = result.compute()
File "/usr/local/lib/python3.8/dist-packages/dask/base.py", line 288, in compute
(result,) = compute(self, traverse=False, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/dask/base.py", line 571, in compute
results = schedule(dsk, keys, **kwargs)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/client.py", line 2725, in get
results = self.gather(packed, asynchronous=asynchronous, direct=direct)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/client.py", line 1980, in gather
return self.sync(
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/client.py", line 868, in sync
return sync(
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/utils.py", line 332, in sync
raise exc.with_traceback(tb)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/utils.py", line 315, in f
result[0] = yield future
File "/var/jenkins_home/.local/lib/python3.8/site-packages/tornado-6.1-py3.8-linux-x86_64.egg/tornado/gen.py", line 762, in run
value = future.result()
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/client.py", line 1845, in _gather
raise exception.with_traceback(traceback)
File "/usr/local/lib/python3.8/dist-packages/dask/optimization.py", line 969, in call
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
File "/usr/local/lib/python3.8/dist-packages/dask/core.py", line 149, in get
result = _execute_task(task, cache)
File "/usr/local/lib/python3.8/dist-packages/dask/core.py", line 119, in _execute_task
return func(*(_execute_task(a, cache) for a in args))
File "/usr/local/lib/python3.8/dist-packages/dask/utils.py", line 37, in apply
return func(*args, **kwargs)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow/workflow.py", line 490, in _transform_partition
parent_df = _transform_partition(root_df, [parent], capture_dtypes=capture_dtypes)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow/workflow.py", line 519, in _transform_partition
output_df = node.op.transform(selection, input_df)
File "/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py", line 101, in inner
result = func(*args, **kwargs)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/normalize.py", line 84, in transform
values = values.astype(self.output_dtype)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/normalize.py", line 111, in output_dtype
return self.out_dtype or numpy.float64
AttributeError: 'Normalize' object has no attribute 'out_dtype'
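
The actual failure is the AttributeError: the output_dtype property reads self.out_dtype, but the Normalize instance the workers executed never had that attribute assigned (note the mixed install paths above, where workflow.py loads from site-packages while normalize.py loads from the workspace checkout, suggesting a stale install on the workers). A minimal sketch of the failing pattern and a tolerant fix, not the actual NVTabular code:

import numpy

class Normalize:
    def __init__(self, out_dtype=None):
        # Assigning the attribute in __init__ covers instances built
        # from this version of the class.
        self.out_dtype = out_dtype

    @property
    def output_dtype(self):
        # getattr with a default also tolerates instances created (or
        # unpickled) before the attribute existed.
        return getattr(self, "out_dtype", None) or numpy.float64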
=============================== warnings summary ===============================
../../../../../usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:32
/usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:32: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
DASK_VERSION = LooseVersion(dask.__version__)

../../../.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: 34 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
other = LooseVersion(other)

nvtabular/loader/__init__.py:19
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/loader/__init__.py:19: DeprecationWarning: The nvtabular.loader module has moved to merlin.models.loader. Support for importing from nvtabular.loader is deprecated, and will be removed in a future version. Please update your imports to refer to merlin.models.loader.
warnings.warn(

tests/unit/test_dask_nvt.py: 2 warnings
tests/unit/test_tf4rec.py: 1 warning
tests/unit/test_tools.py: 6 warnings
tests/unit/test_triton_inference.py: 8 warnings
tests/unit/loader/test_dataloader_backend.py: 6 warnings
tests/unit/loader/test_tf_dataloader.py: 142 warnings
tests/unit/loader/test_torch_dataloader.py: 91 warnings
tests/unit/ops/test_categorify.py: 70 warnings
tests/unit/ops/test_drop_low_cardinality.py: 2 warnings
tests/unit/ops/test_fill.py: 8 warnings
tests/unit/ops/test_hash_bucket.py: 4 warnings
tests/unit/ops/test_join.py: 88 warnings
tests/unit/ops/test_lambda.py: 3 warnings
tests/unit/ops/test_normalize.py: 9 warnings
tests/unit/ops/test_ops.py: 11 warnings
tests/unit/ops/test_ops_schema.py: 17 warnings
tests/unit/workflow/test_workflow.py: 34 warnings
tests/unit/workflow/test_workflow_chaining.py: 1 warning
tests/unit/workflow/test_workflow_node.py: 1 warning
tests/unit/workflow/test_workflow_schemas.py: 1 warning
/usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:1292: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(

tests/unit/test_dask_nvt.py: 12 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 8 files.
warnings.warn(

tests/unit/test_dask_nvt.py::test_merlin_core_execution_managers
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/core/utils.py:431: UserWarning: Existing Dask-client object detected in the current context. New cuda cluster will not be deployed. Set force_new to True to ignore running clusters.
warnings.warn(

tests/unit/test_notebooks.py: 18 warnings
tests/unit/test_tools.py: 1213 warnings
tests/unit/loader/test_tf_dataloader.py: 20 warnings
tests/unit/loader/test_torch_dataloader.py: 432 warnings
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:3235: DeprecationWarning: Series.ceil and DataFrame.ceil are deprecated and will be removed in the future
warnings.warn(

tests/unit/test_tools.py::test_inspect_datagen[uniform-parquet]
tests/unit/test_tools.py::test_inspect_datagen[uniform-parquet]
tests/unit/ops/test_ops.py::test_data_stats[True-parquet]
tests/unit/ops/test_ops.py::test_data_stats[False-parquet]
/usr/local/lib/python3.8/dist-packages/cudf/core/series.py:958: FutureWarning: Series.set_index is deprecated and will be removed in the future
warnings.warn(

tests/unit/loader/test_tf_dataloader.py: 2 warnings
tests/unit/loader/test_torch_dataloader.py: 12 warnings
tests/unit/workflow/test_workflow.py: 9 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 2 files.
warnings.warn(

tests/unit/ops/test_fill.py::test_fill_missing[True-True-parquet]
tests/unit/ops/test_fill.py::test_fill_missing[True-False-parquet]
tests/unit/ops/test_ops.py::test_filter[parquet-0.1-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/pandas/core/indexing.py:1732: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self._setitem_single_block(indexer, value, name)

tests/unit/ops/test_groupyby.py::test_groupby_casting_in_aggregations[False]
/usr/local/lib/python3.8/dist-packages/cudf/core/_base_index.py:1541: FutureWarning: Calling take with a boolean array is deprecated and will be removed in the future.
warnings.warn(

tests/unit/ops/test_ops.py::test_difference_lag[False]
/usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:3025: FutureWarning: The as_gpu_matrix method will be removed in a future cuDF release. Consider using to_cupy instead.
warnings.warn(

tests/unit/workflow/test_cpu_workflow.py: 6 warnings
tests/unit/workflow/test_workflow.py: 12 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 10 files.
warnings.warn(

tests/unit/workflow/test_workflow.py::test_gpu_workflow_api[True-True-parquet-0.01]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:160: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 35833 instead
warnings.warn(

tests/unit/workflow/test_workflow.py: 48 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 20 files.
warnings.warn(

tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_parquet_output[True-None]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-None]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-None]
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 4 files.
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED tests/unit/test_notebooks.py::test_multigpu_dask_example - subprocess....
===== 1 failed, 1420 passed, 2 skipped, 2345 warnings in 716.72s (0:11:56) =====
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.github.com/repos/NVIDIA-Merlin/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins13234493373999829933.sh

@github-actions

Documentation preview

https://nvidia-merlin.github.io/NVTabular/review/pr-1597

@nvidia-merlin-bot (Contributor)

Click to view CI Results
GitHub pull request #1597 of commit 31a8e3b5157f7882b728d63c3c7258582641a470, no merge conflicts.
Running as SYSTEM
Setting status of 31a8e3b5157f7882b728d63c3c7258582641a470 to PENDING with url http://10.20.17.181:8080/job/nvtabular_tests/4544/ and message: 'Build started for merge commit.'
Using context: Jenkins Unit Test Run
Building on master in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA-Merlin/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/pull/1597/*:refs/remotes/origin/pr/1597/* # timeout=10
 > git rev-parse 31a8e3b5157f7882b728d63c3c7258582641a470^{commit} # timeout=10
Checking out Revision 31a8e3b5157f7882b728d63c3c7258582641a470 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 31a8e3b5157f7882b728d63c3c7258582641a470 # timeout=10
Commit message: "Merge branch 'main' into normalize_fp32"
 > git rev-list --no-walk 12c68fbc4445db27bada9b4e7c05a90c4832b366 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins6465818130657288239.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 1422 items / 1 skipped

tests/unit/test_dask_nvt.py ............................................ [ 3%]
........................................................................ [ 8%]
[ 8%]
tests/unit/test_notebooks.py .....F [ 8%]
tests/unit/test_tf4rec.py . [ 8%]
tests/unit/test_tools.py ...................... [ 10%]
tests/unit/test_triton_inference.py ................................ [ 12%]
tests/unit/framework_utils/test_tf_feature_columns.py . [ 12%]
tests/unit/framework_utils/test_tf_layers.py ........................... [ 14%]
................................................... [ 18%]
tests/unit/framework_utils/test_torch_layers.py . [ 18%]
tests/unit/loader/test_dataloader_backend.py ...... [ 18%]
tests/unit/loader/test_tf_dataloader.py ................................ [ 20%]
........................................s.. [ 23%]
tests/unit/loader/test_torch_dataloader.py ............................. [ 25%]
...................................................... [ 29%]
tests/unit/ops/test_categorify.py ...................................... [ 32%]
........................................................................ [ 37%]
........................................... [ 40%]
tests/unit/ops/test_column_similarity.py ........................ [ 42%]
tests/unit/ops/test_drop_low_cardinality.py .. [ 42%]
tests/unit/ops/test_fill.py ............................................ [ 45%]
........ [ 45%]
tests/unit/ops/test_groupyby.py ................. [ 47%]
tests/unit/ops/test_hash_bucket.py ......................... [ 48%]
tests/unit/ops/test_join.py ............................................ [ 51%]
........................................................................ [ 56%]
.................................. [ 59%]
tests/unit/ops/test_lambda.py .......... [ 60%]
tests/unit/ops/test_normalize.py ....................................... [ 62%]
.. [ 62%]
tests/unit/ops/test_ops.py ............................................. [ 66%]
.................... [ 67%]
tests/unit/ops/test_ops_schema.py ...................................... [ 70%]
........................................................................ [ 75%]
........................................................................ [ 80%]
........................................................................ [ 85%]
....................................... [ 88%]
tests/unit/ops/test_reduce_dtype_size.py .. [ 88%]
tests/unit/ops/test_target_encode.py ..................... [ 89%]
tests/unit/workflow/test_cpu_workflow.py ...... [ 90%]
tests/unit/workflow/test_workflow.py ................................... [ 92%]
.......................................................... [ 96%]
tests/unit/workflow/test_workflow_chaining.py ... [ 96%]
tests/unit/workflow/test_workflow_node.py ........... [ 97%]
tests/unit/workflow/test_workflow_ops.py ... [ 97%]
tests/unit/workflow/test_workflow_schemas.py ........................... [ 99%]
... [100%]

=================================== FAILURES ===================================
__________________________ test_multigpu_dask_example __________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-5/test_multigpu_dask_example0')

def test_multigpu_dask_example(tmpdir):
    with get_cuda_cluster() as cuda_cluster:
        os.environ["BASE_DIR"] = str(tmpdir)
        scheduler_port = cuda_cluster.scheduler_address

        def _nb_modify(line):
            # Use cuda_cluster "fixture" port rather than allowing notebook
            # to deploy a LocalCUDACluster within the subprocess
            line = line.replace("cluster = None", f"cluster = '{scheduler_port}'")
            # Use a much smaller "toy" dataset
            line = line.replace("write_count = 25", "write_count = 4")
            line = line.replace('freq = "1s"', 'freq = "1h"')
            # Use smaller partitions for smaller dataset
            line = line.replace("part_mem_fraction=0.1", "part_size=1_000_000")
            line = line.replace("out_files_per_proc=8", "out_files_per_proc=1")
            return line

        notebook_path = os.path.join(
            dirname(TEST_PATH), "examples/multi-gpu-toy-example/", "multi-gpu_dask.ipynb"
        )
>       _run_notebook(tmpdir, notebook_path, _nb_modify)

tests/unit/test_notebooks.py:287:


tests/unit/test_notebooks.py:307: in _run_notebook
subprocess.check_output([sys.executable, script_path])
/usr/lib/python3.8/subprocess.py:415: in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,


input = None, capture_output = False, timeout = None, check = True
popenargs = (['/usr/bin/python3', '/tmp/pytest-of-jenkins/pytest-5/test_multigpu_dask_example0/notebook.py'],)
kwargs = {'stdout': -1}, process = <subprocess.Popen object at 0x7f001d17ff10>
stdout = b'', stderr = None, retcode = 1

def run(*popenargs,
        input=None, capture_output=False, timeout=None, check=False, **kwargs):
    """Run command with arguments and return a CompletedProcess instance.

    The returned instance will have attributes args, returncode, stdout and
    stderr. By default, stdout and stderr are not captured, and those attributes
    will be None. Pass stdout=PIPE and/or stderr=PIPE in order to capture them.

    If check is True and the exit code was non-zero, it raises a
    CalledProcessError. The CalledProcessError object will have the return code
    in the returncode attribute, and output & stderr attributes if those streams
    were captured.

    If timeout is given, and the process takes too long, a TimeoutExpired
    exception will be raised.

    There is an optional argument "input", allowing you to
    pass bytes or a string to the subprocess's stdin.  If you use this argument
    you may not also use the Popen constructor's "stdin" argument, as
    it will be used internally.

    By default, all communication is in bytes, and therefore any "input" should
    be bytes, and the stdout and stderr will be bytes. If in text mode, any
    "input" should be a string, and stdout and stderr will be strings decoded
    according to locale encoding, or by "encoding" if set. Text mode is
    triggered by setting any of text, encoding, errors or universal_newlines.

    The other arguments are the same as for the Popen constructor.
    """
    if input is not None:
        if kwargs.get('stdin') is not None:
            raise ValueError('stdin and input arguments may not both be used.')
        kwargs['stdin'] = PIPE

    if capture_output:
        if kwargs.get('stdout') is not None or kwargs.get('stderr') is not None:
            raise ValueError('stdout and stderr arguments may not be used '
                             'with capture_output.')
        kwargs['stdout'] = PIPE
        kwargs['stderr'] = PIPE

    with Popen(*popenargs, **kwargs) as process:
        try:
            stdout, stderr = process.communicate(input, timeout=timeout)
        except TimeoutExpired as exc:
            process.kill()
            if _mswindows:
                # Windows accumulates the output in a single blocking
                # read() call run on child threads, with the timeout
                # being done in a join() on those threads.  communicate()
                # _after_ kill() is required to collect that and add it
                # to the exception.
                exc.stdout, exc.stderr = process.communicate()
            else:
                # POSIX _communicate already populated the output so
                # far into the TimeoutExpired exception.
                process.wait()
            raise
        except:  # Including KeyboardInterrupt, communicate handled that.
            process.kill()
            # We don't call process.wait() as .__exit__ does that for us.
            raise
        retcode = process.poll()
        if check and retcode:
          raise CalledProcessError(retcode, process.args,
                                     output=stdout, stderr=stderr)

E subprocess.CalledProcessError: Command '['/usr/bin/python3', '/tmp/pytest-of-jenkins/pytest-5/test_multigpu_dask_example0/notebook.py']' returned non-zero exit status 1.

/usr/lib/python3.8/subprocess.py:516: CalledProcessError
----------------------------- Captured stderr call -----------------------------
distributed.preloading - INFO - Import preload module: dask_cuda.initialize
Unable to start CUDA Context
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/pynvml/nvml.py", line 782, in _nvmlGetFunctionPointer
_nvmlGetFunctionPointer_cache[name] = getattr(nvmlLib, name)
File "/usr/lib/python3.8/ctypes/init.py", line 386, in getattr
func = self.getitem(name)
File "/usr/lib/python3.8/ctypes/init.py", line 391, in getitem
func = self._FuncPtr((name_or_ordinal, self))
AttributeError: /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1: undefined symbol: nvmlDeviceGetComputeRunningProcesses_v2

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/dask_cuda/initialize.py", line 42, in _create_cuda_context
ctx = has_cuda_context()
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/diagnostics/nvml.py", line 78, in has_cuda_context
running_processes = pynvml.nvmlDeviceGetComputeRunningProcesses_v2(handle)
File "/usr/local/lib/python3.8/dist-packages/pynvml/nvml.py", line 2191, in nvmlDeviceGetComputeRunningProcesses_v2
fn = _nvmlGetFunctionPointer("nvmlDeviceGetComputeRunningProcesses_v2")
File "/usr/local/lib/python3.8/dist-packages/pynvml/nvml.py", line 785, in _nvmlGetFunctionPointer
raise NVMLError(NVML_ERROR_FUNCTION_NOT_FOUND)
pynvml.nvml.NVMLError_FunctionNotFound: Function Not Found
distributed.preloading - INFO - Import preload module: dask_cuda.initialize
Unable to start CUDA Context
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/pynvml/nvml.py", line 782, in _nvmlGetFunctionPointer
_nvmlGetFunctionPointer_cache[name] = getattr(nvmlLib, name)
File "/usr/lib/python3.8/ctypes/init.py", line 386, in getattr
func = self.getitem(name)
File "/usr/lib/python3.8/ctypes/init.py", line 391, in getitem
func = self._FuncPtr((name_or_ordinal, self))
AttributeError: /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1: undefined symbol: nvmlDeviceGetComputeRunningProcesses_v2

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/dask_cuda/initialize.py", line 42, in _create_cuda_context
ctx = has_cuda_context()
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/diagnostics/nvml.py", line 78, in has_cuda_context
running_processes = pynvml.nvmlDeviceGetComputeRunningProcesses_v2(handle)
File "/usr/local/lib/python3.8/dist-packages/pynvml/nvml.py", line 2191, in nvmlDeviceGetComputeRunningProcesses_v2
fn = _nvmlGetFunctionPointer("nvmlDeviceGetComputeRunningProcesses_v2")
File "/usr/local/lib/python3.8/dist-packages/pynvml/nvml.py", line 785, in _nvmlGetFunctionPointer
raise NVMLError(NVML_ERROR_FUNCTION_NOT_FOUND)
pynvml.nvml.NVMLError_FunctionNotFound: Function Not Found
/usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:1292: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(
Failed to transform operator <nvtabular.ops.normalize.Normalize object at 0x7ff16825ab50>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow/workflow.py", line 519, in _transform_partition
output_df = node.op.transform(selection, input_df)
File "/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py", line 101, in inner
result = func(*args, **kwargs)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/normalize.py", line 84, in transform
values = values.astype(self.output_dtype)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/normalize.py", line 111, in output_dtype
return self.out_dtype or numpy.float64
AttributeError: 'Normalize' object has no attribute 'out_dtype'
distributed.worker - WARNING - Compute Failed
Function: subgraph_callable-a176c54b-372a-4240-a5fa-9173c6b8
args: ([{'piece': ('/tmp/pytest-of-jenkins/pytest-5/test_multigpu_dask_example0/demo_dataset/part.0.parquet', [0], [])}])
kwargs: {}
Exception: 'AttributeError("'Normalize' object has no attribute 'out_dtype'")'

Traceback (most recent call last):
File "/tmp/pytest-of-jenkins/pytest-5/test_multigpu_dask_example0/notebook.py", line 123, in
workflow.fit_transform(dataset).to_parquet(
File "/var/jenkins_home/.local/lib/python3.8/site-packages/nvtabular/workflow/workflow.py", line 286, in fit_transform
self.fit(dataset)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/nvtabular/workflow/workflow.py", line 261, in fit
self._transform_impl(dataset, capture_dtypes=True).sample_dtypes()
File "/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py", line 1147, in sample_dtypes
_real_meta = self.engine.sample_data(n=n)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset_engine.py", line 71, in sample_data
_head = _ddf.partitions[partition_index].head(n)
File "/usr/local/lib/python3.8/dist-packages/dask/dataframe/core.py", line 1098, in head
return self._head(n=n, npartitions=npartitions, compute=compute, safe=safe)
File "/usr/local/lib/python3.8/dist-packages/dask/dataframe/core.py", line 1132, in _head
result = result.compute()
File "/usr/local/lib/python3.8/dist-packages/dask/base.py", line 288, in compute
(result,) = compute(self, traverse=False, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/dask/base.py", line 571, in compute
results = schedule(dsk, keys, **kwargs)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/client.py", line 2725, in get
results = self.gather(packed, asynchronous=asynchronous, direct=direct)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/client.py", line 1980, in gather
return self.sync(
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/client.py", line 868, in sync
return sync(
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/utils.py", line 332, in sync
raise exc.with_traceback(tb)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/utils.py", line 315, in f
result[0] = yield future
File "/var/jenkins_home/.local/lib/python3.8/site-packages/tornado-6.1-py3.8-linux-x86_64.egg/tornado/gen.py", line 762, in run
value = future.result()
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/client.py", line 1845, in _gather
raise exception.with_traceback(traceback)
File "/usr/local/lib/python3.8/dist-packages/dask/optimization.py", line 969, in call
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
File "/usr/local/lib/python3.8/dist-packages/dask/core.py", line 149, in get
result = _execute_task(task, cache)
File "/usr/local/lib/python3.8/dist-packages/dask/core.py", line 119, in _execute_task
return func(*(_execute_task(a, cache) for a in args))
File "/usr/local/lib/python3.8/dist-packages/dask/utils.py", line 37, in apply
return func(*args, **kwargs)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow/workflow.py", line 490, in _transform_partition
parent_df = _transform_partition(root_df, [parent], capture_dtypes=capture_dtypes)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow/workflow.py", line 519, in _transform_partition
output_df = node.op.transform(selection, input_df)
File "/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py", line 101, in inner
result = func(*args, **kwargs)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/normalize.py", line 84, in transform
values = values.astype(self.output_dtype)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/normalize.py", line 111, in output_dtype
return self.out_dtype or numpy.float64
AttributeError: 'Normalize' object has no attribute 'out_dtype'
=============================== warnings summary ===============================
../../../../../usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:32
/usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:32: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
DASK_VERSION = LooseVersion(dask.__version__)

../../../.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: 34 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
other = LooseVersion(other)

nvtabular/loader/__init__.py:19
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/loader/__init__.py:19: DeprecationWarning: The nvtabular.loader module has moved to merlin.models.loader. Support for importing from nvtabular.loader is deprecated, and will be removed in a future version. Please update your imports to refer to merlin.models.loader.
warnings.warn(

tests/unit/test_dask_nvt.py: 2 warnings
tests/unit/test_tf4rec.py: 1 warning
tests/unit/test_tools.py: 6 warnings
tests/unit/test_triton_inference.py: 8 warnings
tests/unit/loader/test_dataloader_backend.py: 6 warnings
tests/unit/loader/test_tf_dataloader.py: 142 warnings
tests/unit/loader/test_torch_dataloader.py: 91 warnings
tests/unit/ops/test_categorify.py: 70 warnings
tests/unit/ops/test_drop_low_cardinality.py: 2 warnings
tests/unit/ops/test_fill.py: 8 warnings
tests/unit/ops/test_hash_bucket.py: 4 warnings
tests/unit/ops/test_join.py: 88 warnings
tests/unit/ops/test_lambda.py: 3 warnings
tests/unit/ops/test_normalize.py: 9 warnings
tests/unit/ops/test_ops.py: 11 warnings
tests/unit/ops/test_ops_schema.py: 17 warnings
tests/unit/workflow/test_workflow.py: 34 warnings
tests/unit/workflow/test_workflow_chaining.py: 1 warning
tests/unit/workflow/test_workflow_node.py: 1 warning
tests/unit/workflow/test_workflow_schemas.py: 1 warning
/usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:1292: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(

tests/unit/test_dask_nvt.py: 12 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 8 files.
warnings.warn(

tests/unit/test_dask_nvt.py::test_merlin_core_execution_managers
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/core/utils.py:431: UserWarning: Existing Dask-client object detected in the current context. New cuda cluster will not be deployed. Set force_new to True to ignore running clusters.
warnings.warn(

tests/unit/test_notebooks.py: 18 warnings
tests/unit/test_tools.py: 1213 warnings
tests/unit/loader/test_tf_dataloader.py: 20 warnings
tests/unit/loader/test_torch_dataloader.py: 432 warnings
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:3235: DeprecationWarning: Series.ceil and DataFrame.ceil are deprecated and will be removed in the future
warnings.warn(

tests/unit/test_tools.py::test_inspect_datagen[uniform-parquet]
tests/unit/test_tools.py::test_inspect_datagen[uniform-parquet]
tests/unit/ops/test_ops.py::test_data_stats[True-parquet]
tests/unit/ops/test_ops.py::test_data_stats[False-parquet]
/usr/local/lib/python3.8/dist-packages/cudf/core/series.py:958: FutureWarning: Series.set_index is deprecated and will be removed in the future
warnings.warn(

tests/unit/loader/test_tf_dataloader.py: 2 warnings
tests/unit/loader/test_torch_dataloader.py: 12 warnings
tests/unit/workflow/test_workflow.py: 9 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 2 files.
warnings.warn(

tests/unit/ops/test_fill.py::test_fill_missing[True-True-parquet]
tests/unit/ops/test_fill.py::test_fill_missing[True-False-parquet]
tests/unit/ops/test_ops.py::test_filter[parquet-0.1-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/pandas/core/indexing.py:1732: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self._setitem_single_block(indexer, value, name)

tests/unit/ops/test_groupyby.py::test_groupby_casting_in_aggregations[False]
/usr/local/lib/python3.8/dist-packages/cudf/core/_base_index.py:1541: FutureWarning: Calling take with a boolean array is deprecated and will be removed in the future.
warnings.warn(

tests/unit/ops/test_ops.py::test_difference_lag[False]
/usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:3025: FutureWarning: The as_gpu_matrix method will be removed in a future cuDF release. Consider using to_cupy instead.
warnings.warn(

tests/unit/workflow/test_cpu_workflow.py: 6 warnings
tests/unit/workflow/test_workflow.py: 12 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 10 files.
warnings.warn(

tests/unit/workflow/test_workflow.py::test_gpu_workflow_api[True-True-parquet-0.01]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:160: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 42281 instead
warnings.warn(

tests/unit/workflow/test_workflow.py: 48 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 20 files.
warnings.warn(

tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_parquet_output[True-None]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-None]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-None]
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 4 files.
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED tests/unit/test_notebooks.py::test_multigpu_dask_example - subprocess....
===== 1 failed, 1420 passed, 2 skipped, 2345 warnings in 722.78s (0:12:02) =====
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.github.com/repos/NVIDIA-Merlin/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins14541306371745372241.sh

@benfred (Member, Author) commented Jun 23, 2022

rerun tests

@nvidia-merlin-bot (Contributor)

Click to view CI Results
GitHub pull request #1597 of commit 31a8e3b5157f7882b728d63c3c7258582641a470, no merge conflicts.
Running as SYSTEM
Setting status of 31a8e3b5157f7882b728d63c3c7258582641a470 to PENDING with url http://10.20.17.181:8080/job/nvtabular_tests/4545/ and message: 'Build started for merge commit.'
Using context: Jenkins Unit Test Run
Building on master in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA-Merlin/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/pull/1597/*:refs/remotes/origin/pr/1597/* # timeout=10
 > git rev-parse 31a8e3b5157f7882b728d63c3c7258582641a470^{commit} # timeout=10
Checking out Revision 31a8e3b5157f7882b728d63c3c7258582641a470 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 31a8e3b5157f7882b728d63c3c7258582641a470 # timeout=10
Commit message: "Merge branch 'main' into normalize_fp32"
 > git rev-list --no-walk 31a8e3b5157f7882b728d63c3c7258582641a470 # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins4152090352994964919.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 1422 items / 1 skipped

tests/unit/test_dask_nvt.py ............................................ [ 3%]
........................................................................ [ 8%]
[ 8%]
tests/unit/test_notebooks.py .....F [ 8%]
tests/unit/test_tf4rec.py . [ 8%]
tests/unit/test_tools.py ...................... [ 10%]
tests/unit/test_triton_inference.py ................................ [ 12%]
tests/unit/framework_utils/test_tf_feature_columns.py . [ 12%]
tests/unit/framework_utils/test_tf_layers.py ........................... [ 14%]
................................................... [ 18%]
tests/unit/framework_utils/test_torch_layers.py . [ 18%]
tests/unit/loader/test_dataloader_backend.py ...... [ 18%]
tests/unit/loader/test_tf_dataloader.py ................................ [ 20%]
........................................s.. [ 23%]
tests/unit/loader/test_torch_dataloader.py ............................. [ 25%]
...................................................... [ 29%]
tests/unit/ops/test_categorify.py ...................................... [ 32%]
........................................................................ [ 37%]
........................................... [ 40%]
tests/unit/ops/test_column_similarity.py ........................ [ 42%]
tests/unit/ops/test_drop_low_cardinality.py .. [ 42%]
tests/unit/ops/test_fill.py ............................................ [ 45%]
........ [ 45%]
tests/unit/ops/test_groupyby.py ................. [ 47%]
tests/unit/ops/test_hash_bucket.py ......................... [ 48%]
tests/unit/ops/test_join.py ............................................ [ 51%]
........................................................................ [ 56%]
.................................. [ 59%]
tests/unit/ops/test_lambda.py .......... [ 60%]
tests/unit/ops/test_normalize.py ....................................... [ 62%]
.. [ 62%]
tests/unit/ops/test_ops.py ............................................. [ 66%]
.................... [ 67%]
tests/unit/ops/test_ops_schema.py ...................................... [ 70%]
........................................................................ [ 75%]
........................................................................ [ 80%]
........................................................................ [ 85%]
....................................... [ 88%]
tests/unit/ops/test_reduce_dtype_size.py .. [ 88%]
tests/unit/ops/test_target_encode.py ..................... [ 89%]
tests/unit/workflow/test_cpu_workflow.py ...... [ 90%]
tests/unit/workflow/test_workflow.py ................................... [ 92%]
.......................................................... [ 96%]
tests/unit/workflow/test_workflow_chaining.py ... [ 96%]
tests/unit/workflow/test_workflow_node.py ........... [ 97%]
tests/unit/workflow/test_workflow_ops.py ... [ 97%]
tests/unit/workflow/test_workflow_schemas.py ........................... [ 99%]
... [100%]

=================================== FAILURES ===================================
__________________________ test_multigpu_dask_example __________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-6/test_multigpu_dask_example0')

def test_multigpu_dask_example(tmpdir):
    with get_cuda_cluster() as cuda_cluster:
        os.environ["BASE_DIR"] = str(tmpdir)
        scheduler_port = cuda_cluster.scheduler_address

        def _nb_modify(line):
            # Use cuda_cluster "fixture" port rather than allowing notebook
            # to deploy a LocalCUDACluster within the subprocess
            line = line.replace("cluster = None", f"cluster = '{scheduler_port}'")
            # Use a much smaller "toy" dataset
            line = line.replace("write_count = 25", "write_count = 4")
            line = line.replace('freq = "1s"', 'freq = "1h"')
            # Use smaller partitions for smaller dataset
            line = line.replace("part_mem_fraction=0.1", "part_size=1_000_000")
            line = line.replace("out_files_per_proc=8", "out_files_per_proc=1")
            return line

        notebook_path = os.path.join(
            dirname(TEST_PATH), "examples/multi-gpu-toy-example/", "multi-gpu_dask.ipynb"
        )
>       _run_notebook(tmpdir, notebook_path, _nb_modify)

tests/unit/test_notebooks.py:287:


tests/unit/test_notebooks.py:307: in _run_notebook
subprocess.check_output([sys.executable, script_path])
/usr/lib/python3.8/subprocess.py:415: in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,


input = None, capture_output = False, timeout = None, check = True
popenargs = (['/usr/bin/python3', '/tmp/pytest-of-jenkins/pytest-6/test_multigpu_dask_example0/notebook.py'],)
kwargs = {'stdout': -1}, process = <subprocess.Popen object at 0x7f005d441940>
stdout = b'', stderr = None, retcode = 1

def run(*popenargs,
        input=None, capture_output=False, timeout=None, check=False, **kwargs):
    """Run command with arguments and return a CompletedProcess instance.

    The returned instance will have attributes args, returncode, stdout and
    stderr. By default, stdout and stderr are not captured, and those attributes
    will be None. Pass stdout=PIPE and/or stderr=PIPE in order to capture them.

    If check is True and the exit code was non-zero, it raises a
    CalledProcessError. The CalledProcessError object will have the return code
    in the returncode attribute, and output & stderr attributes if those streams
    were captured.

    If timeout is given, and the process takes too long, a TimeoutExpired
    exception will be raised.

    There is an optional argument "input", allowing you to
    pass bytes or a string to the subprocess's stdin.  If you use this argument
    you may not also use the Popen constructor's "stdin" argument, as
    it will be used internally.

    By default, all communication is in bytes, and therefore any "input" should
    be bytes, and the stdout and stderr will be bytes. If in text mode, any
    "input" should be a string, and stdout and stderr will be strings decoded
    according to locale encoding, or by "encoding" if set. Text mode is
    triggered by setting any of text, encoding, errors or universal_newlines.

    The other arguments are the same as for the Popen constructor.
    """
    if input is not None:
        if kwargs.get('stdin') is not None:
            raise ValueError('stdin and input arguments may not both be used.')
        kwargs['stdin'] = PIPE

    if capture_output:
        if kwargs.get('stdout') is not None or kwargs.get('stderr') is not None:
            raise ValueError('stdout and stderr arguments may not be used '
                             'with capture_output.')
        kwargs['stdout'] = PIPE
        kwargs['stderr'] = PIPE

    with Popen(*popenargs, **kwargs) as process:
        try:
            stdout, stderr = process.communicate(input, timeout=timeout)
        except TimeoutExpired as exc:
            process.kill()
            if _mswindows:
                # Windows accumulates the output in a single blocking
                # read() call run on child threads, with the timeout
                # being done in a join() on those threads.  communicate()
                # _after_ kill() is required to collect that and add it
                # to the exception.
                exc.stdout, exc.stderr = process.communicate()
            else:
                # POSIX _communicate already populated the output so
                # far into the TimeoutExpired exception.
                process.wait()
            raise
        except:  # Including KeyboardInterrupt, communicate handled that.
            process.kill()
            # We don't call process.wait() as .__exit__ does that for us.
            raise
        retcode = process.poll()
        if check and retcode:
>           raise CalledProcessError(retcode, process.args,
                                     output=stdout, stderr=stderr)

E subprocess.CalledProcessError: Command '['/usr/bin/python3', '/tmp/pytest-of-jenkins/pytest-6/test_multigpu_dask_example0/notebook.py']' returned non-zero exit status 1.

/usr/lib/python3.8/subprocess.py:516: CalledProcessError
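A note on the failure mechanism here: _run_notebook calls subprocess.check_output, which passes check=True to run(), so any nonzero exit status from the generated notebook script is re-raised in pytest as the CalledProcessError above. A minimal sketch of that behavior (hypothetical one-liner command, not from the test suite):

import subprocess

# check_output passes check=True, so a nonzero exit status becomes
# CalledProcessError instead of being silently returned
try:
    subprocess.check_output(["python3", "-c", "raise SystemExit(1)"])
except subprocess.CalledProcessError as exc:
    print("script failed with status", exc.returncode)  # prints 1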
----------------------------- Captured stderr call -----------------------------
distributed.preloading - INFO - Import preload module: dask_cuda.initialize
distributed.preloading - INFO - Import preload module: dask_cuda.initialize
Unable to start CUDA Context
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/pynvml/nvml.py", line 782, in _nvmlGetFunctionPointer
_nvmlGetFunctionPointer_cache[name] = getattr(nvmlLib, name)
File "/usr/lib/python3.8/ctypes/init.py", line 386, in getattr
func = self.getitem(name)
File "/usr/lib/python3.8/ctypes/init.py", line 391, in getitem
func = self._FuncPtr((name_or_ordinal, self))
AttributeError: /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1: undefined symbol: nvmlDeviceGetComputeRunningProcesses_v2

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/dask_cuda/initialize.py", line 42, in _create_cuda_context
ctx = has_cuda_context()
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/diagnostics/nvml.py", line 78, in has_cuda_context
running_processes = pynvml.nvmlDeviceGetComputeRunningProcesses_v2(handle)
File "/usr/local/lib/python3.8/dist-packages/pynvml/nvml.py", line 2191, in nvmlDeviceGetComputeRunningProcesses_v2
fn = _nvmlGetFunctionPointer("nvmlDeviceGetComputeRunningProcesses_v2")
File "/usr/local/lib/python3.8/dist-packages/pynvml/nvml.py", line 785, in _nvmlGetFunctionPointer
raise NVMLError(NVML_ERROR_FUNCTION_NOT_FOUND)
pynvml.nvml.NVMLError_FunctionNotFound: Function Not Found
Unable to start CUDA Context
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/pynvml/nvml.py", line 782, in _nvmlGetFunctionPointer
_nvmlGetFunctionPointer_cache[name] = getattr(nvmlLib, name)
File "/usr/lib/python3.8/ctypes/init.py", line 386, in getattr
func = self.getitem(name)
File "/usr/lib/python3.8/ctypes/init.py", line 391, in getitem
func = self._FuncPtr((name_or_ordinal, self))
AttributeError: /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1: undefined symbol: nvmlDeviceGetComputeRunningProcesses_v2

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/dask_cuda/initialize.py", line 42, in _create_cuda_context
ctx = has_cuda_context()
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/diagnostics/nvml.py", line 78, in has_cuda_context
running_processes = pynvml.nvmlDeviceGetComputeRunningProcesses_v2(handle)
File "/usr/local/lib/python3.8/dist-packages/pynvml/nvml.py", line 2191, in nvmlDeviceGetComputeRunningProcesses_v2
fn = _nvmlGetFunctionPointer("nvmlDeviceGetComputeRunningProcesses_v2")
File "/usr/local/lib/python3.8/dist-packages/pynvml/nvml.py", line 785, in _nvmlGetFunctionPointer
raise NVMLError(NVML_ERROR_FUNCTION_NOT_FOUND)
pynvml.nvml.NVMLError_FunctionNotFound: Function Not Found
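The two identical tracebacks above (one per Dask-CUDA worker) point at an environment skew rather than this PR: the node's libnvidia-ml.so.1 predates the nvmlDeviceGetComputeRunningProcesses_v2 entry point that the installed pynvml and distributed expect. A minimal probe with a fallback, assuming only that pynvml exposes both the v1 and v2 wrappers (both appear in the traceback's pynvml version):

import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
try:
    # the _v2 symbol only exists in newer NVIDIA drivers
    procs = pynvml.nvmlDeviceGetComputeRunningProcesses_v2(handle)
except pynvml.NVMLError_FunctionNotFound:
    # older driver: fall back to the original entry point
    procs = pynvml.nvmlDeviceGetComputeRunningProcesses(handle)
print(len(procs), "compute processes on GPU 0")
pynvml.nvmlShutdown()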
/usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:1292: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(
/usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:1292: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(
Failed to transform operator <nvtabular.ops.normalize.Normalize object at 0x7f1ddc69e3a0>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow/workflow.py", line 519, in _transform_partition
output_df = node.op.transform(selection, input_df)
File "/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py", line 101, in inner
result = func(*args, **kwargs)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/normalize.py", line 84, in transform
values = values.astype(self.output_dtype)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/normalize.py", line 111, in output_dtype
return self.out_dtype or numpy.float64
AttributeError: 'Normalize' object has no attribute 'out_dtype'
distributed.worker - WARNING - Compute Failed
Function: subgraph_callable-dea639fc-418f-4fe4-9a31-ee400d66
args: ([{'piece': ('/tmp/pytest-of-jenkins/pytest-6/test_multigpu_dask_example0/demo_dataset/part.0.parquet', [0], [])}])
kwargs: {}
Exception: 'AttributeError("'Normalize' object has no attribute 'out_dtype'")'

Traceback (most recent call last):
File "/tmp/pytest-of-jenkins/pytest-6/test_multigpu_dask_example0/notebook.py", line 123, in
workflow.fit_transform(dataset).to_parquet(
File "/var/jenkins_home/.local/lib/python3.8/site-packages/nvtabular/workflow/workflow.py", line 286, in fit_transform
self.fit(dataset)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/nvtabular/workflow/workflow.py", line 261, in fit
self._transform_impl(dataset, capture_dtypes=True).sample_dtypes()
File "/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py", line 1147, in sample_dtypes
_real_meta = self.engine.sample_data(n=n)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset_engine.py", line 71, in sample_data
_head = _ddf.partitions[partition_index].head(n)
File "/usr/local/lib/python3.8/dist-packages/dask/dataframe/core.py", line 1098, in head
return self._head(n=n, npartitions=npartitions, compute=compute, safe=safe)
File "/usr/local/lib/python3.8/dist-packages/dask/dataframe/core.py", line 1132, in _head
result = result.compute()
File "/usr/local/lib/python3.8/dist-packages/dask/base.py", line 288, in compute
(result,) = compute(self, traverse=False, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/dask/base.py", line 571, in compute
results = schedule(dsk, keys, **kwargs)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/client.py", line 2725, in get
results = self.gather(packed, asynchronous=asynchronous, direct=direct)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/client.py", line 1980, in gather
return self.sync(
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/client.py", line 868, in sync
return sync(
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/utils.py", line 332, in sync
raise exc.with_traceback(tb)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/utils.py", line 315, in f
result[0] = yield future
File "/var/jenkins_home/.local/lib/python3.8/site-packages/tornado-6.1-py3.8-linux-x86_64.egg/tornado/gen.py", line 762, in run
value = future.result()
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/client.py", line 1845, in _gather
raise exception.with_traceback(traceback)
File "/usr/local/lib/python3.8/dist-packages/dask/optimization.py", line 969, in call
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
File "/usr/local/lib/python3.8/dist-packages/dask/core.py", line 149, in get
result = _execute_task(task, cache)
File "/usr/local/lib/python3.8/dist-packages/dask/core.py", line 119, in _execute_task
return func(*(_execute_task(a, cache) for a in args))
File "/usr/local/lib/python3.8/dist-packages/dask/utils.py", line 37, in apply
return func(*args, **kwargs)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow/workflow.py", line 490, in _transform_partition
parent_df = _transform_partition(root_df, [parent], capture_dtypes=capture_dtypes)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow/workflow.py", line 519, in _transform_partition
output_df = node.op.transform(selection, input_df)
File "/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py", line 101, in inner
result = func(*args, **kwargs)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/normalize.py", line 84, in transform
values = values.astype(self.output_dtype)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/normalize.py", line 111, in output_dtype
return self.out_dtype or numpy.float64
AttributeError: 'Normalize' object has no attribute 'out_dtype'
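The root cause of this run is visible in the last two frames: the worker executes the workspace checkout of normalize.py, whose output_dtype property reads self.out_dtype, against a Normalize instance produced by the older nvtabular installed in site-packages (note the mixed paths in the traceback), which never set that attribute. A minimal sketch of a backwards-compatible guard, pared down for illustration (this class body is hypothetical, not the op's full implementation):

import numpy

class Normalize:  # pared-down, hypothetical excerpt
    def __init__(self, out_dtype=None):
        # e.g. numpy.float32 to stay TensorFlow-friendly; None keeps fp64
        self.out_dtype = out_dtype

    @property
    def output_dtype(self):
        # getattr with a default tolerates instances created (or pickled)
        # by an older build that never defined out_dtype
        return getattr(self, "out_dtype", None) or numpy.float64

With a guard like this, a stale instance falls back to fp64 instead of raising AttributeError mid-workflow.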
=============================== warnings summary ===============================
../../../../../usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:32
/usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:32: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
DASK_VERSION = LooseVersion(dask.__version__)

../../../.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: 34 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
other = LooseVersion(other)

nvtabular/loader/__init__.py:19
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/loader/__init__.py:19: DeprecationWarning: The nvtabular.loader module has moved to merlin.models.loader. Support for importing from nvtabular.loader is deprecated, and will be removed in a future version. Please update your imports to refer to merlin.models.loader.
warnings.warn(

tests/unit/test_dask_nvt.py: 2 warnings
tests/unit/test_tf4rec.py: 1 warning
tests/unit/test_tools.py: 6 warnings
tests/unit/test_triton_inference.py: 8 warnings
tests/unit/loader/test_dataloader_backend.py: 6 warnings
tests/unit/loader/test_tf_dataloader.py: 142 warnings
tests/unit/loader/test_torch_dataloader.py: 91 warnings
tests/unit/ops/test_categorify.py: 70 warnings
tests/unit/ops/test_drop_low_cardinality.py: 2 warnings
tests/unit/ops/test_fill.py: 8 warnings
tests/unit/ops/test_hash_bucket.py: 4 warnings
tests/unit/ops/test_join.py: 88 warnings
tests/unit/ops/test_lambda.py: 3 warnings
tests/unit/ops/test_normalize.py: 9 warnings
tests/unit/ops/test_ops.py: 11 warnings
tests/unit/ops/test_ops_schema.py: 17 warnings
tests/unit/workflow/test_workflow.py: 34 warnings
tests/unit/workflow/test_workflow_chaining.py: 1 warning
tests/unit/workflow/test_workflow_node.py: 1 warning
tests/unit/workflow/test_workflow_schemas.py: 1 warning
/usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:1292: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(

tests/unit/test_dask_nvt.py: 12 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 8 files.
warnings.warn(

tests/unit/test_dask_nvt.py::test_merlin_core_execution_managers
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/core/utils.py:431: UserWarning: Existing Dask-client object detected in the current context. New cuda cluster will not be deployed. Set force_new to True to ignore running clusters.
warnings.warn(

tests/unit/test_notebooks.py: 18 warnings
tests/unit/test_tools.py: 1213 warnings
tests/unit/loader/test_tf_dataloader.py: 20 warnings
tests/unit/loader/test_torch_dataloader.py: 432 warnings
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:3235: DeprecationWarning: Series.ceil and DataFrame.ceil are deprecated and will be removed in the future
warnings.warn(

tests/unit/test_tools.py::test_inspect_datagen[uniform-parquet]
tests/unit/test_tools.py::test_inspect_datagen[uniform-parquet]
tests/unit/ops/test_ops.py::test_data_stats[True-parquet]
tests/unit/ops/test_ops.py::test_data_stats[False-parquet]
/usr/local/lib/python3.8/dist-packages/cudf/core/series.py:958: FutureWarning: Series.set_index is deprecated and will be removed in the future
warnings.warn(

tests/unit/loader/test_tf_dataloader.py: 2 warnings
tests/unit/loader/test_torch_dataloader.py: 12 warnings
tests/unit/workflow/test_workflow.py: 9 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 2 files.
warnings.warn(

tests/unit/ops/test_fill.py::test_fill_missing[True-True-parquet]
tests/unit/ops/test_fill.py::test_fill_missing[True-False-parquet]
tests/unit/ops/test_ops.py::test_filter[parquet-0.1-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/pandas/core/indexing.py:1732: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self._setitem_single_block(indexer, value, name)

tests/unit/ops/test_groupyby.py::test_groupby_casting_in_aggregations[False]
/usr/local/lib/python3.8/dist-packages/cudf/core/_base_index.py:1541: FutureWarning: Calling take with a boolean array is deprecated and will be removed in the future.
warnings.warn(

tests/unit/ops/test_ops.py::test_difference_lag[False]
/usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:3025: FutureWarning: The as_gpu_matrix method will be removed in a future cuDF release. Consider using to_cupy instead.
warnings.warn(

tests/unit/workflow/test_cpu_workflow.py: 6 warnings
tests/unit/workflow/test_workflow.py: 12 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 10 files.
warnings.warn(

tests/unit/workflow/test_workflow.py::test_gpu_workflow_api[True-True-parquet-0.01]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:160: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 38651 instead
warnings.warn(

tests/unit/workflow/test_workflow.py: 48 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 20 files.
warnings.warn(

tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_parquet_output[True-None]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-None]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-None]
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 4 files.
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED tests/unit/test_notebooks.py::test_multigpu_dask_example - subprocess....
===== 1 failed, 1420 passed, 2 skipped, 2345 warnings in 726.27s (0:12:06) =====
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins12990449357605842314.sh

@bschifferer
Contributor

rerun tests

2 similar comments
@bschifferer
Contributor

rerun tests

@bschifferer
Contributor

rerun tests

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #1597 of commit 31a8e3b5157f7882b728d63c3c7258582641a470, no merge conflicts.
Running as SYSTEM
Setting status of 31a8e3b5157f7882b728d63c3c7258582641a470 to PENDING with url http://10.20.17.181:8080/job/nvtabular_tests/4548/ and message: 'Build started for merge commit.'
Using context: Jenkins Unit Test Run
Building on master in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA-Merlin/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/pull/1597/*:refs/remotes/origin/pr/1597/* # timeout=10
 > git rev-parse 31a8e3b5157f7882b728d63c3c7258582641a470^{commit} # timeout=10
Checking out Revision 31a8e3b5157f7882b728d63c3c7258582641a470 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 31a8e3b5157f7882b728d63c3c7258582641a470 # timeout=10
Commit message: "Merge branch 'main' into normalize_fp32"
 > git rev-list --no-walk 6726264cbcfc23d701f39fcd74207b50c985ff8d # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins15121190167887164647.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 1422 items / 1 skipped

tests/unit/test_dask_nvt.py ............................................ [ 3%]
........................................................................ [ 8%]
[ 8%]
tests/unit/test_notebooks.py .....F [ 8%]
tests/unit/test_tf4rec.py . [ 8%]
tests/unit/test_tools.py ...................... [ 10%]
tests/unit/test_triton_inference.py ................................ [ 12%]
tests/unit/framework_utils/test_tf_feature_columns.py . [ 12%]
tests/unit/framework_utils/test_tf_layers.py ........................... [ 14%]
................................................... [ 18%]
tests/unit/framework_utils/test_torch_layers.py . [ 18%]
tests/unit/loader/test_dataloader_backend.py ...... [ 18%]
tests/unit/loader/test_tf_dataloader.py ................................ [ 20%]
........................................s.. [ 23%]
tests/unit/loader/test_torch_dataloader.py ............................. [ 25%]
...................................................... [ 29%]
tests/unit/ops/test_categorify.py ...................................... [ 32%]
........................................................................ [ 37%]
........................................... [ 40%]
tests/unit/ops/test_column_similarity.py ........................ [ 42%]
tests/unit/ops/test_drop_low_cardinality.py .. [ 42%]
tests/unit/ops/test_fill.py ............................................ [ 45%]
........ [ 45%]
tests/unit/ops/test_groupyby.py ................. [ 47%]
tests/unit/ops/test_hash_bucket.py ......................... [ 48%]
tests/unit/ops/test_join.py ............................................ [ 51%]
........................................................................ [ 56%]
.................................. [ 59%]
tests/unit/ops/test_lambda.py .......... [ 60%]
tests/unit/ops/test_normalize.py ....................................... [ 62%]
.. [ 62%]
tests/unit/ops/test_ops.py ............................................. [ 66%]
.................... [ 67%]
tests/unit/ops/test_ops_schema.py ...................................... [ 70%]
........................................................................ [ 75%]
........................................................................ [ 80%]
........................................................................ [ 85%]
....................................... [ 88%]
tests/unit/ops/test_reduce_dtype_size.py .. [ 88%]
tests/unit/ops/test_target_encode.py ..................... [ 89%]
tests/unit/workflow/test_cpu_workflow.py ...... [ 90%]
tests/unit/workflow/test_workflow.py ................................... [ 92%]
.......................................................... [ 96%]
tests/unit/workflow/test_workflow_chaining.py ... [ 96%]
tests/unit/workflow/test_workflow_node.py ........... [ 97%]
tests/unit/workflow/test_workflow_ops.py ... [ 97%]
tests/unit/workflow/test_workflow_schemas.py ........................... [ 99%]
... [100%]

=================================== FAILURES ===================================
__________________________ test_multigpu_dask_example __________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-13/test_multigpu_dask_example0')

def test_multigpu_dask_example(tmpdir):
    with get_cuda_cluster() as cuda_cluster:
        os.environ["BASE_DIR"] = str(tmpdir)
        scheduler_port = cuda_cluster.scheduler_address

        def _nb_modify(line):
            # Use cuda_cluster "fixture" port rather than allowing notebook
            # to deploy a LocalCUDACluster within the subprocess
            line = line.replace("cluster = None", f"cluster = '{scheduler_port}'")
            # Use a much smaller "toy" dataset
            line = line.replace("write_count = 25", "write_count = 4")
            line = line.replace('freq = "1s"', 'freq = "1h"')
            # Use smaller partitions for smaller dataset
            line = line.replace("part_mem_fraction=0.1", "part_size=1_000_000")
            line = line.replace("out_files_per_proc=8", "out_files_per_proc=1")
            return line

        notebook_path = os.path.join(
            dirname(TEST_PATH), "examples/multi-gpu-toy-example/", "multi-gpu_dask.ipynb"
        )
>       _run_notebook(tmpdir, notebook_path, _nb_modify)

tests/unit/test_notebooks.py:287:


tests/unit/test_notebooks.py:307: in _run_notebook
subprocess.check_output([sys.executable, script_path])
/usr/lib/python3.8/subprocess.py:415: in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,


input = None, capture_output = False, timeout = None, check = True
popenargs = (['/usr/bin/python3', '/tmp/pytest-of-jenkins/pytest-13/test_multigpu_dask_example0/notebook.py'],)
kwargs = {'stdout': -1}, process = <subprocess.Popen object at 0x7fc7e0324be0>
stdout = b'', stderr = None, retcode = 1

def run(*popenargs,
        input=None, capture_output=False, timeout=None, check=False, **kwargs):
    """Run command with arguments and return a CompletedProcess instance.

    The returned instance will have attributes args, returncode, stdout and
    stderr. By default, stdout and stderr are not captured, and those attributes
    will be None. Pass stdout=PIPE and/or stderr=PIPE in order to capture them.

    If check is True and the exit code was non-zero, it raises a
    CalledProcessError. The CalledProcessError object will have the return code
    in the returncode attribute, and output & stderr attributes if those streams
    were captured.

    If timeout is given, and the process takes too long, a TimeoutExpired
    exception will be raised.

    There is an optional argument "input", allowing you to
    pass bytes or a string to the subprocess's stdin.  If you use this argument
    you may not also use the Popen constructor's "stdin" argument, as
    it will be used internally.

    By default, all communication is in bytes, and therefore any "input" should
    be bytes, and the stdout and stderr will be bytes. If in text mode, any
    "input" should be a string, and stdout and stderr will be strings decoded
    according to locale encoding, or by "encoding" if set. Text mode is
    triggered by setting any of text, encoding, errors or universal_newlines.

    The other arguments are the same as for the Popen constructor.
    """
    if input is not None:
        if kwargs.get('stdin') is not None:
            raise ValueError('stdin and input arguments may not both be used.')
        kwargs['stdin'] = PIPE

    if capture_output:
        if kwargs.get('stdout') is not None or kwargs.get('stderr') is not None:
            raise ValueError('stdout and stderr arguments may not be used '
                             'with capture_output.')
        kwargs['stdout'] = PIPE
        kwargs['stderr'] = PIPE

    with Popen(*popenargs, **kwargs) as process:
        try:
            stdout, stderr = process.communicate(input, timeout=timeout)
        except TimeoutExpired as exc:
            process.kill()
            if _mswindows:
                # Windows accumulates the output in a single blocking
                # read() call run on child threads, with the timeout
                # being done in a join() on those threads.  communicate()
                # _after_ kill() is required to collect that and add it
                # to the exception.
                exc.stdout, exc.stderr = process.communicate()
            else:
                # POSIX _communicate already populated the output so
                # far into the TimeoutExpired exception.
                process.wait()
            raise
        except:  # Including KeyboardInterrupt, communicate handled that.
            process.kill()
            # We don't call process.wait() as .__exit__ does that for us.
            raise
        retcode = process.poll()
        if check and retcode:
>           raise CalledProcessError(retcode, process.args,
                                     output=stdout, stderr=stderr)

E subprocess.CalledProcessError: Command '['/usr/bin/python3', '/tmp/pytest-of-jenkins/pytest-13/test_multigpu_dask_example0/notebook.py']' returned non-zero exit status 1.

/usr/lib/python3.8/subprocess.py:516: CalledProcessError
----------------------------- Captured stderr call -----------------------------
distributed.preloading - INFO - Import preload module: dask_cuda.initialize
distributed.preloading - INFO - Import preload module: dask_cuda.initialize
Unable to start CUDA Context
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/pynvml/nvml.py", line 782, in _nvmlGetFunctionPointer
_nvmlGetFunctionPointer_cache[name] = getattr(nvmlLib, name)
File "/usr/lib/python3.8/ctypes/init.py", line 386, in getattr
func = self.getitem(name)
File "/usr/lib/python3.8/ctypes/init.py", line 391, in getitem
func = self._FuncPtr((name_or_ordinal, self))
AttributeError: /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1: undefined symbol: nvmlDeviceGetComputeRunningProcesses_v2

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/dask_cuda/initialize.py", line 42, in _create_cuda_context
ctx = has_cuda_context()
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/diagnostics/nvml.py", line 78, in has_cuda_context
running_processes = pynvml.nvmlDeviceGetComputeRunningProcesses_v2(handle)
File "/usr/local/lib/python3.8/dist-packages/pynvml/nvml.py", line 2191, in nvmlDeviceGetComputeRunningProcesses_v2
fn = _nvmlGetFunctionPointer("nvmlDeviceGetComputeRunningProcesses_v2")
File "/usr/local/lib/python3.8/dist-packages/pynvml/nvml.py", line 785, in _nvmlGetFunctionPointer
raise NVMLError(NVML_ERROR_FUNCTION_NOT_FOUND)
pynvml.nvml.NVMLError_FunctionNotFound: Function Not Found
Unable to start CUDA Context
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/pynvml/nvml.py", line 782, in _nvmlGetFunctionPointer
_nvmlGetFunctionPointer_cache[name] = getattr(nvmlLib, name)
File "/usr/lib/python3.8/ctypes/init.py", line 386, in getattr
func = self.getitem(name)
File "/usr/lib/python3.8/ctypes/init.py", line 391, in getitem
func = self._FuncPtr((name_or_ordinal, self))
AttributeError: /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1: undefined symbol: nvmlDeviceGetComputeRunningProcesses_v2

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/dask_cuda/initialize.py", line 42, in _create_cuda_context
ctx = has_cuda_context()
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/diagnostics/nvml.py", line 78, in has_cuda_context
running_processes = pynvml.nvmlDeviceGetComputeRunningProcesses_v2(handle)
File "/usr/local/lib/python3.8/dist-packages/pynvml/nvml.py", line 2191, in nvmlDeviceGetComputeRunningProcesses_v2
fn = _nvmlGetFunctionPointer("nvmlDeviceGetComputeRunningProcesses_v2")
File "/usr/local/lib/python3.8/dist-packages/pynvml/nvml.py", line 785, in _nvmlGetFunctionPointer
raise NVMLError(NVML_ERROR_FUNCTION_NOT_FOUND)
pynvml.nvml.NVMLError_FunctionNotFound: Function Not Found
/usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:1292: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(
/usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:1292: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(
Failed to transform operator <nvtabular.ops.normalize.Normalize object at 0x7f8e9c39f3d0>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow/workflow.py", line 519, in _transform_partition
output_df = node.op.transform(selection, input_df)
File "/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py", line 101, in inner
result = func(*args, **kwargs)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/normalize.py", line 84, in transform
values = values.astype(self.output_dtype)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/normalize.py", line 111, in output_dtype
return self.out_dtype or numpy.float64
AttributeError: 'Normalize' object has no attribute 'out_dtype'
distributed.worker - WARNING - Compute Failed
Function: subgraph_callable-fc610eca-40ed-4a75-98a9-9d8b996f
args: ([{'piece': ('/tmp/pytest-of-jenkins/pytest-13/test_multigpu_dask_example0/demo_dataset/part.0.parquet', [0], [])}])
kwargs: {}
Exception: 'AttributeError("'Normalize' object has no attribute 'out_dtype'")'

Traceback (most recent call last):
File "/tmp/pytest-of-jenkins/pytest-13/test_multigpu_dask_example0/notebook.py", line 123, in
workflow.fit_transform(dataset).to_parquet(
File "/var/jenkins_home/.local/lib/python3.8/site-packages/nvtabular/workflow/workflow.py", line 286, in fit_transform
self.fit(dataset)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/nvtabular/workflow/workflow.py", line 261, in fit
self._transform_impl(dataset, capture_dtypes=True).sample_dtypes()
File "/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py", line 1147, in sample_dtypes
_real_meta = self.engine.sample_data(n=n)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset_engine.py", line 71, in sample_data
_head = _ddf.partitions[partition_index].head(n)
File "/usr/local/lib/python3.8/dist-packages/dask/dataframe/core.py", line 1098, in head
return self._head(n=n, npartitions=npartitions, compute=compute, safe=safe)
File "/usr/local/lib/python3.8/dist-packages/dask/dataframe/core.py", line 1132, in _head
result = result.compute()
File "/usr/local/lib/python3.8/dist-packages/dask/base.py", line 288, in compute
(result,) = compute(self, traverse=False, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/dask/base.py", line 571, in compute
results = schedule(dsk, keys, **kwargs)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/client.py", line 2725, in get
results = self.gather(packed, asynchronous=asynchronous, direct=direct)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/client.py", line 1980, in gather
return self.sync(
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/client.py", line 868, in sync
return sync(
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/utils.py", line 332, in sync
raise exc.with_traceback(tb)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/utils.py", line 315, in f
result[0] = yield future
File "/var/jenkins_home/.local/lib/python3.8/site-packages/tornado-6.1-py3.8-linux-x86_64.egg/tornado/gen.py", line 762, in run
value = future.result()
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/client.py", line 1845, in _gather
raise exception.with_traceback(traceback)
File "/usr/local/lib/python3.8/dist-packages/dask/optimization.py", line 969, in call
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
File "/usr/local/lib/python3.8/dist-packages/dask/core.py", line 149, in get
result = _execute_task(task, cache)
File "/usr/local/lib/python3.8/dist-packages/dask/core.py", line 119, in _execute_task
return func(*(_execute_task(a, cache) for a in args))
File "/usr/local/lib/python3.8/dist-packages/dask/utils.py", line 37, in apply
return func(*args, **kwargs)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow/workflow.py", line 490, in _transform_partition
parent_df = _transform_partition(root_df, [parent], capture_dtypes=capture_dtypes)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow/workflow.py", line 519, in _transform_partition
output_df = node.op.transform(selection, input_df)
File "/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py", line 101, in inner
result = func(*args, **kwargs)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/normalize.py", line 84, in transform
values = values.astype(self.output_dtype)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/normalize.py", line 111, in output_dtype
return self.out_dtype or numpy.float64
AttributeError: 'Normalize' object has no attribute 'out_dtype'
=============================== warnings summary ===============================
../../../../../usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:32
/usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:32: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
DASK_VERSION = LooseVersion(dask.__version__)

../../../.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: 34 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
other = LooseVersion(other)

nvtabular/loader/__init__.py:19
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/loader/__init__.py:19: DeprecationWarning: The nvtabular.loader module has moved to merlin.models.loader. Support for importing from nvtabular.loader is deprecated, and will be removed in a future version. Please update your imports to refer to merlin.models.loader.
warnings.warn(

tests/unit/test_dask_nvt.py: 2 warnings
tests/unit/test_tf4rec.py: 1 warning
tests/unit/test_tools.py: 6 warnings
tests/unit/test_triton_inference.py: 8 warnings
tests/unit/loader/test_dataloader_backend.py: 6 warnings
tests/unit/loader/test_tf_dataloader.py: 142 warnings
tests/unit/loader/test_torch_dataloader.py: 91 warnings
tests/unit/ops/test_categorify.py: 70 warnings
tests/unit/ops/test_drop_low_cardinality.py: 2 warnings
tests/unit/ops/test_fill.py: 8 warnings
tests/unit/ops/test_hash_bucket.py: 4 warnings
tests/unit/ops/test_join.py: 88 warnings
tests/unit/ops/test_lambda.py: 3 warnings
tests/unit/ops/test_normalize.py: 9 warnings
tests/unit/ops/test_ops.py: 11 warnings
tests/unit/ops/test_ops_schema.py: 17 warnings
tests/unit/workflow/test_workflow.py: 34 warnings
tests/unit/workflow/test_workflow_chaining.py: 1 warning
tests/unit/workflow/test_workflow_node.py: 1 warning
tests/unit/workflow/test_workflow_schemas.py: 1 warning
/usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:1292: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(

tests/unit/test_dask_nvt.py: 12 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 8 files.
warnings.warn(

tests/unit/test_dask_nvt.py::test_merlin_core_execution_managers
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/core/utils.py:431: UserWarning: Existing Dask-client object detected in the current context. New cuda cluster will not be deployed. Set force_new to True to ignore running clusters.
warnings.warn(

tests/unit/test_notebooks.py: 18 warnings
tests/unit/test_tools.py: 1213 warnings
tests/unit/loader/test_tf_dataloader.py: 20 warnings
tests/unit/loader/test_torch_dataloader.py: 432 warnings
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:3235: DeprecationWarning: Series.ceil and DataFrame.ceil are deprecated and will be removed in the future
warnings.warn(

tests/unit/test_tools.py::test_inspect_datagen[uniform-parquet]
tests/unit/test_tools.py::test_inspect_datagen[uniform-parquet]
tests/unit/ops/test_ops.py::test_data_stats[True-parquet]
tests/unit/ops/test_ops.py::test_data_stats[False-parquet]
/usr/local/lib/python3.8/dist-packages/cudf/core/series.py:958: FutureWarning: Series.set_index is deprecated and will be removed in the future
warnings.warn(

tests/unit/loader/test_tf_dataloader.py: 2 warnings
tests/unit/loader/test_torch_dataloader.py: 12 warnings
tests/unit/workflow/test_workflow.py: 9 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 2 files.
warnings.warn(

tests/unit/ops/test_fill.py::test_fill_missing[True-True-parquet]
tests/unit/ops/test_fill.py::test_fill_missing[True-False-parquet]
tests/unit/ops/test_ops.py::test_filter[parquet-0.1-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/pandas/core/indexing.py:1732: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self._setitem_single_block(indexer, value, name)

tests/unit/ops/test_groupyby.py::test_groupby_casting_in_aggregations[False]
/usr/local/lib/python3.8/dist-packages/cudf/core/_base_index.py:1541: FutureWarning: Calling take with a boolean array is deprecated and will be removed in the future.
warnings.warn(

tests/unit/ops/test_ops.py::test_difference_lag[False]
/usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:3025: FutureWarning: The as_gpu_matrix method will be removed in a future cuDF release. Consider using to_cupy instead.
warnings.warn(

tests/unit/workflow/test_cpu_workflow.py: 6 warnings
tests/unit/workflow/test_workflow.py: 12 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 10 files.
warnings.warn(

tests/unit/workflow/test_workflow.py::test_gpu_workflow_api[True-True-parquet-0.01]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:160: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 34863 instead
warnings.warn(

tests/unit/workflow/test_workflow.py: 48 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 20 files.
warnings.warn(

tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_parquet_output[True-None]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-None]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-None]
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 4 files.
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED tests/unit/test_notebooks.py::test_multigpu_dask_example - subprocess....
===== 1 failed, 1420 passed, 2 skipped, 2345 warnings in 725.62s (0:12:05) =====
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins17765714849656057367.sh

@bschifferer
Contributor

rerun tests

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #1597 of commit 31a8e3b5157f7882b728d63c3c7258582641a470, no merge conflicts.
Running as SYSTEM
Setting status of 31a8e3b5157f7882b728d63c3c7258582641a470 to PENDING with url http://10.20.17.181:8080/job/nvtabular_tests/4550/ and message: 'Build started for merge commit.'
Using context: Jenkins Unit Test Run
Building on master in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA-Merlin/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/pull/1597/*:refs/remotes/origin/pr/1597/* # timeout=10
 > git rev-parse 31a8e3b5157f7882b728d63c3c7258582641a470^{commit} # timeout=10
Checking out Revision 31a8e3b5157f7882b728d63c3c7258582641a470 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 31a8e3b5157f7882b728d63c3c7258582641a470 # timeout=10
Commit message: "Merge branch 'main' into normalize_fp32"
 > git rev-list --no-walk 6726264cbcfc23d701f39fcd74207b50c985ff8d # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins6000918569497214242.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 1422 items / 1 skipped

tests/unit/test_dask_nvt.py ............................................ [ 3%]
........................................................................ [ 8%]
[ 8%]
tests/unit/test_notebooks.py FFF..F [ 8%]
tests/unit/test_tf4rec.py . [ 8%]
tests/unit/test_tools.py ...................... [ 10%]
tests/unit/test_triton_inference.py ................................ [ 12%]
tests/unit/framework_utils/test_tf_feature_columns.py . [ 12%]
tests/unit/framework_utils/test_tf_layers.py ........................... [ 14%]
................................................... [ 18%]
tests/unit/framework_utils/test_torch_layers.py . [ 18%]
tests/unit/loader/test_dataloader_backend.py ...... [ 18%]
tests/unit/loader/test_tf_dataloader.py ................................ [ 20%]
........................................s.. [ 23%]
tests/unit/loader/test_torch_dataloader.py ............................. [ 25%]
...................................................... [ 29%]
tests/unit/ops/test_categorify.py ...................................... [ 32%]
........................................................................ [ 37%]
........................................... [ 40%]
tests/unit/ops/test_column_similarity.py ........................ [ 42%]
tests/unit/ops/test_drop_low_cardinality.py .. [ 42%]
tests/unit/ops/test_fill.py ............................................ [ 45%]
........ [ 45%]
tests/unit/ops/test_groupyby.py ................. [ 47%]
tests/unit/ops/test_hash_bucket.py ......................... [ 48%]
tests/unit/ops/test_join.py ............................................ [ 51%]
........................................................................ [ 56%]
.................................. [ 59%]
tests/unit/ops/test_lambda.py .......... [ 60%]
tests/unit/ops/test_normalize.py ....................................... [ 62%]
.. [ 62%]
tests/unit/ops/test_ops.py ............................................. [ 66%]
.................... [ 67%]
tests/unit/ops/test_ops_schema.py ...................................... [ 70%]
........................................................................ [ 75%]
........................................................................ [ 80%]
........................................................................ [ 85%]
....................................... [ 88%]
tests/unit/ops/test_reduce_dtype_size.py .. [ 88%]
tests/unit/ops/test_target_encode.py ..................... [ 89%]
tests/unit/workflow/test_cpu_workflow.py ...... [ 90%]
tests/unit/workflow/test_workflow.py ................................... [ 92%]
.......................................................... [ 96%]
tests/unit/workflow/test_workflow_chaining.py ... [ 96%]
tests/unit/workflow/test_workflow_node.py ........... [ 97%]
tests/unit/workflow/test_workflow_ops.py ... [ 97%]
tests/unit/workflow/test_workflow_schemas.py ........................... [ 99%]
... [100%]

=================================== FAILURES ===================================
___________________________ test_criteo_tf_notebook ____________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-2/test_criteo_tf_notebook0')

def test_criteo_tf_notebook(tmpdir):
    tor = pytest.importorskip("tensorflow")  # noqa
    # create a toy dataset in tmpdir, and point environment variables so the notebook
    # will read from it
    os.system("mkdir -p " + os.path.join(tmpdir, "converted/criteo"))
    for i in range(24):
        df = _get_random_criteo_data(1000)
        df.to_parquet(os.path.join(tmpdir, "converted/criteo", f"day_{i}.parquet"))
    os.environ["BASE_DIR"] = str(tmpdir)

    def _nb_modify(line):
        # Disable LocalCUDACluster
        line = line.replace("client.run(_rmm_pool)", "# client.run(_rmm_pool)")
        line = line.replace("if cluster is None:", "if False:")
        line = line.replace("client = Client(cluster)", "# client = Client(cluster)")
        line = line.replace(
            "workflow = nvt.Workflow(features, client=client)", "workflow = nvt.Workflow(features)"
        )
        line = line.replace("client", "# client")
        line = line.replace("NUM_GPUS = [0, 1, 2, 3, 4, 5, 6, 7]", "NUM_GPUS = [0]")
        line = line.replace("part_size = int(part_mem_frac * device_size)", "part_size = '128MB'")

        return line
>       _run_notebook(
        tmpdir,
        os.path.join(
            dirname(TEST_PATH),
            "examples/scaling-criteo/",
            "02-ETL-with-NVTabular.ipynb",
        ),
        # disable rmm.reinitialize, seems to be causing issues
        transform=_nb_modify,
    )

tests/unit/test_notebooks.py:62:


tests/unit/test_notebooks.py:307: in _run_notebook
subprocess.check_output([sys.executable, script_path])
/usr/lib/python3.8/subprocess.py:415: in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,


input = None, capture_output = False, timeout = None, check = True
popenargs = (['/usr/bin/python3', '/tmp/pytest-of-jenkins/pytest-2/test_criteo_tf_notebook0/notebook.py'],)
kwargs = {'stdout': -1}, process = <subprocess.Popen object at 0x7fec97790fd0>
stdout = b'', stderr = None, retcode = 1

def run(*popenargs,
        input=None, capture_output=False, timeout=None, check=False, **kwargs):
    """Run command with arguments and return a CompletedProcess instance.

    The returned instance will have attributes args, returncode, stdout and
    stderr. By default, stdout and stderr are not captured, and those attributes
    will be None. Pass stdout=PIPE and/or stderr=PIPE in order to capture them.

    If check is True and the exit code was non-zero, it raises a
    CalledProcessError. The CalledProcessError object will have the return code
    in the returncode attribute, and output & stderr attributes if those streams
    were captured.

    If timeout is given, and the process takes too long, a TimeoutExpired
    exception will be raised.

    There is an optional argument "input", allowing you to
    pass bytes or a string to the subprocess's stdin.  If you use this argument
    you may not also use the Popen constructor's "stdin" argument, as
    it will be used internally.

    By default, all communication is in bytes, and therefore any "input" should
    be bytes, and the stdout and stderr will be bytes. If in text mode, any
    "input" should be a string, and stdout and stderr will be strings decoded
    according to locale encoding, or by "encoding" if set. Text mode is
    triggered by setting any of text, encoding, errors or universal_newlines.

    The other arguments are the same as for the Popen constructor.
    """
    if input is not None:
        if kwargs.get('stdin') is not None:
            raise ValueError('stdin and input arguments may not both be used.')
        kwargs['stdin'] = PIPE

    if capture_output:
        if kwargs.get('stdout') is not None or kwargs.get('stderr') is not None:
            raise ValueError('stdout and stderr arguments may not be used '
                             'with capture_output.')
        kwargs['stdout'] = PIPE
        kwargs['stderr'] = PIPE

    with Popen(*popenargs, **kwargs) as process:
        try:
            stdout, stderr = process.communicate(input, timeout=timeout)
        except TimeoutExpired as exc:
            process.kill()
            if _mswindows:
                # Windows accumulates the output in a single blocking
                # read() call run on child threads, with the timeout
                # being done in a join() on those threads.  communicate()
                # _after_ kill() is required to collect that and add it
                # to the exception.
                exc.stdout, exc.stderr = process.communicate()
            else:
                # POSIX _communicate already populated the output so
                # far into the TimeoutExpired exception.
                process.wait()
            raise
        except:  # Including KeyboardInterrupt, communicate handled that.
            process.kill()
            # We don't call process.wait() as .__exit__ does that for us.
            raise
        retcode = process.poll()
        if check and retcode:
          raise CalledProcessError(retcode, process.args,
                                     output=stdout, stderr=stderr)

E subprocess.CalledProcessError: Command '['/usr/bin/python3', '/tmp/pytest-of-jenkins/pytest-2/test_criteo_tf_notebook0/notebook.py']' returned non-zero exit status 1.

/usr/lib/python3.8/subprocess.py:516: CalledProcessError
----------------------------- Captured stderr call -----------------------------
Traceback (most recent call last):
File "/tmp/pytest-of-jenkins/pytest-2/test_criteo_tf_notebook0/notebook.py", line 24, in
from dask_cuda import LocalCUDACluster
File "/usr/local/lib/python3.8/dist-packages/dask_cuda/init.py", line 12, in
from .cuda_worker import CUDAWorker
File "/usr/local/lib/python3.8/dist-packages/dask_cuda/cuda_worker.py", line 20, in
from distributed.worker_memory import parse_memory_limit
ModuleNotFoundError: No module named 'distributed.worker_memory'
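
All four failures trace back to the same environment mismatch rather than to the fp32 change in this PR: the CI image's dask_cuda imports distributed.worker_memory, a module that only exists in newer releases of distributed than the one installed. A minimal sketch of a conftest-style guard that would skip these tests instead of failing them when the environments drift; the helper name require_dask_cuda is hypothetical, not an existing fixture in this repo:

import pytest

def require_dask_cuda():
    """Skip the calling test when dask_cuda cannot be imported, e.g.
    because the installed distributed predates distributed.worker_memory."""
    try:
        from dask_cuda import LocalCUDACluster  # noqa: F401
    except ImportError as exc:  # ModuleNotFoundError is a subclass
        pytest.skip(f"dask_cuda unavailable or incompatible: {exc}")
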
___________________________ test_criteo_pyt_notebook ___________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-2/test_criteo_pyt_notebook0')

def test_criteo_pyt_notebook(tmpdir):
    tor = pytest.importorskip("fastai")  # noqa
    # create a toy dataset in tmpdir, and point environment variables so the notebook
    # will read from it
    os.system("mkdir -p " + os.path.join(tmpdir, "converted/criteo"))
    for i in range(24):
        df = _get_random_criteo_data(1000)
        df.to_parquet(os.path.join(tmpdir, "converted/criteo", f"day_{i}.parquet"))
    os.environ["BASE_DIR"] = str(tmpdir)

    def _nb_modify(line):
        # Disable LocalCUDACluster
        line = line.replace("client.run(_rmm_pool)", "# client.run(_rmm_pool)")
        line = line.replace("if cluster is None:", "if False:")
        line = line.replace("client = Client(cluster)", "# client = Client(cluster)")
        line = line.replace(
            "workflow = nvt.Workflow(features, client=client)", "workflow = nvt.Workflow(features)"
        )
        line = line.replace("client", "# client")
        line = line.replace("NUM_GPUS = [0, 1, 2, 3, 4, 5, 6, 7]", "NUM_GPUS = [0]")
        line = line.replace("part_size = int(part_mem_frac * device_size)", "part_size = '128MB'")
        return line
  _run_notebook(
        tmpdir,
        os.path.join(
            dirname(TEST_PATH),
            "examples/scaling-criteo/",
            "02-ETL-with-NVTabular.ipynb",
        ),
        # disable rmm.reinitialize, seems to be causing issues
        transform=_nb_modify,
    )

tests/unit/test_notebooks.py:114:


tests/unit/test_notebooks.py:307: in _run_notebook
subprocess.check_output([sys.executable, script_path])
/usr/lib/python3.8/subprocess.py:415: in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,


input = None, capture_output = False, timeout = None, check = True
popenargs = (['/usr/bin/python3', '/tmp/pytest-of-jenkins/pytest-2/test_criteo_pyt_notebook0/notebook.py'],)
kwargs = {'stdout': -1}, process = <subprocess.Popen object at 0x7feb73af0ca0>
stdout = b'', stderr = None, retcode = 1

def run(*popenargs,
        input=None, capture_output=False, timeout=None, check=False, **kwargs):
    """Run command with arguments and return a CompletedProcess instance.

    The returned instance will have attributes args, returncode, stdout and
    stderr. By default, stdout and stderr are not captured, and those attributes
    will be None. Pass stdout=PIPE and/or stderr=PIPE in order to capture them.

    If check is True and the exit code was non-zero, it raises a
    CalledProcessError. The CalledProcessError object will have the return code
    in the returncode attribute, and output & stderr attributes if those streams
    were captured.

    If timeout is given, and the process takes too long, a TimeoutExpired
    exception will be raised.

    There is an optional argument "input", allowing you to
    pass bytes or a string to the subprocess's stdin.  If you use this argument
    you may not also use the Popen constructor's "stdin" argument, as
    it will be used internally.

    By default, all communication is in bytes, and therefore any "input" should
    be bytes, and the stdout and stderr will be bytes. If in text mode, any
    "input" should be a string, and stdout and stderr will be strings decoded
    according to locale encoding, or by "encoding" if set. Text mode is
    triggered by setting any of text, encoding, errors or universal_newlines.

    The other arguments are the same as for the Popen constructor.
    """
    if input is not None:
        if kwargs.get('stdin') is not None:
            raise ValueError('stdin and input arguments may not both be used.')
        kwargs['stdin'] = PIPE

    if capture_output:
        if kwargs.get('stdout') is not None or kwargs.get('stderr') is not None:
            raise ValueError('stdout and stderr arguments may not be used '
                             'with capture_output.')
        kwargs['stdout'] = PIPE
        kwargs['stderr'] = PIPE

    with Popen(*popenargs, **kwargs) as process:
        try:
            stdout, stderr = process.communicate(input, timeout=timeout)
        except TimeoutExpired as exc:
            process.kill()
            if _mswindows:
                # Windows accumulates the output in a single blocking
                # read() call run on child threads, with the timeout
                # being done in a join() on those threads.  communicate()
                # _after_ kill() is required to collect that and add it
                # to the exception.
                exc.stdout, exc.stderr = process.communicate()
            else:
                # POSIX _communicate already populated the output so
                # far into the TimeoutExpired exception.
                process.wait()
            raise
        except:  # Including KeyboardInterrupt, communicate handled that.
            process.kill()
            # We don't call process.wait() as .__exit__ does that for us.
            raise
        retcode = process.poll()
        if check and retcode:
          raise CalledProcessError(retcode, process.args,
                                     output=stdout, stderr=stderr)

E subprocess.CalledProcessError: Command '['/usr/bin/python3', '/tmp/pytest-of-jenkins/pytest-2/test_criteo_pyt_notebook0/notebook.py']' returned non-zero exit status 1.

/usr/lib/python3.8/subprocess.py:516: CalledProcessError
----------------------------- Captured stderr call -----------------------------
Traceback (most recent call last):
File "/tmp/pytest-of-jenkins/pytest-2/test_criteo_pyt_notebook0/notebook.py", line 24, in
from dask_cuda import LocalCUDACluster
File "/usr/local/lib/python3.8/dist-packages/dask_cuda/init.py", line 12, in
from .cuda_worker import CUDAWorker
File "/usr/local/lib/python3.8/dist-packages/dask_cuda/cuda_worker.py", line 20, in
from distributed.worker_memory import parse_memory_limit
ModuleNotFoundError: No module named 'distributed.worker_memory'
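
The test harness launches the converted notebook through subprocess.check_output, which pipes only stdout; that is why the traceback surfaces separately under "Captured stderr call" while the CalledProcessError itself carries stderr = None. A hedged sketch of an alternative runner (the function name is assumed, not the repo's actual helper) that folds stderr into the raised error so the failure message is self-contained:

import subprocess
import sys

def run_notebook_script(script_path):
    # Capture both streams so a non-zero exit carries its own traceback,
    # instead of leaving stderr = None on the raised exception.
    result = subprocess.run(
        [sys.executable, script_path],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        raise RuntimeError(
            f"notebook script exited with {result.returncode}:\n{result.stderr}"
        )
    return result.stdout
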
_____________________________ test_optimize_criteo _____________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-2/test_optimize_criteo0')

def test_optimize_criteo(tmpdir):
    input_path = str(tmpdir.mkdir("input"))
    _get_random_criteo_data(1000).to_csv(os.path.join(input_path, "day_0"), sep="\t", header=False)
    os.environ["INPUT_DATA_DIR"] = input_path
    os.environ["OUTPUT_DATA_DIR"] = str(tmpdir.mkdir("output"))
  with get_cuda_cluster() as cuda_cluster:

tests/unit/test_notebooks.py:140:


/usr/lib/python3.8/contextlib.py:113: in __enter__
return next(self.gen)
tests/conftest.py:107: in get_cuda_cluster
from dask_cuda import LocalCUDACluster
/usr/local/lib/python3.8/dist-packages/dask_cuda/__init__.py:12: in <module>
from .cuda_worker import CUDAWorker


from __future__ import absolute_import, division, print_function

import asyncio
import atexit
import os
import warnings

from toolz import valmap
from tornado.ioloop import IOLoop

import dask
from dask.utils import parse_bytes
from distributed import Nanny
from distributed.core import Server
from distributed.deploy.cluster import Cluster
from distributed.proctitle import (
    enable_proctitle_on_children,
    enable_proctitle_on_current,
)

from distributed.worker_memory import parse_memory_limit
E ModuleNotFoundError: No module named 'distributed.worker_memory'

/usr/local/lib/python3.8/dist-packages/dask_cuda/cuda_worker.py:20: ModuleNotFoundError
__________________________ test_multigpu_dask_example __________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-2/test_multigpu_dask_example0')

def test_multigpu_dask_example(tmpdir):
  with get_cuda_cluster() as cuda_cluster:

tests/unit/test_notebooks.py:268:


/usr/lib/python3.8/contextlib.py:113: in __enter__
return next(self.gen)
tests/conftest.py:107: in get_cuda_cluster
from dask_cuda import LocalCUDACluster
/usr/local/lib/python3.8/dist-packages/dask_cuda/__init__.py:12: in <module>
from .cuda_worker import CUDAWorker


from __future__ import absolute_import, division, print_function

import asyncio
import atexit
import os
import warnings

from toolz import valmap
from tornado.ioloop import IOLoop

import dask
from dask.utils import parse_bytes
from distributed import Nanny
from distributed.core import Server
from distributed.deploy.cluster import Cluster
from distributed.proctitle import (
    enable_proctitle_on_children,
    enable_proctitle_on_current,
)

from distributed.worker_memory import parse_memory_limit
E ModuleNotFoundError: No module named 'distributed.worker_memory'

/usr/local/lib/python3.8/dist-packages/dask_cuda/cuda_worker.py:20: ModuleNotFoundError
=============================== warnings summary ===============================
../../../../../usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33
/usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
DASK_VERSION = LooseVersion(dask.__version__)

../../../.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: 34 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
other = LooseVersion(other)
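
Both LooseVersion warnings point at the same remedy: compare versions with packaging.version instead of the deprecated distutils classes. A minimal sketch of the suggested replacement; the version threshold below is illustrative only:

from packaging.version import Version

import dask

DASK_VERSION = Version(dask.__version__)

# Example comparison; the threshold is an assumption for illustration.
if DASK_VERSION < Version("2022.1.0"):
    raise RuntimeError(f"dask {DASK_VERSION} is older than expected")
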

nvtabular/loader/__init__.py:19
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/loader/__init__.py:19: DeprecationWarning: The nvtabular.loader module has moved to merlin.models.loader. Support for importing from nvtabular.loader is deprecated, and will be removed in a future version. Please update your imports to refer to merlin.models.loader.
warnings.warn(

tests/unit/test_dask_nvt.py: 1 warning
tests/unit/test_tf4rec.py: 1 warning
tests/unit/test_tools.py: 5 warnings
tests/unit/test_triton_inference.py: 8 warnings
tests/unit/loader/test_dataloader_backend.py: 6 warnings
tests/unit/loader/test_tf_dataloader.py: 66 warnings
tests/unit/loader/test_torch_dataloader.py: 67 warnings
tests/unit/ops/test_categorify.py: 69 warnings
tests/unit/ops/test_drop_low_cardinality.py: 2 warnings
tests/unit/ops/test_fill.py: 8 warnings
tests/unit/ops/test_hash_bucket.py: 4 warnings
tests/unit/ops/test_join.py: 88 warnings
tests/unit/ops/test_lambda.py: 1 warning
tests/unit/ops/test_normalize.py: 9 warnings
tests/unit/ops/test_ops.py: 11 warnings
tests/unit/ops/test_ops_schema.py: 17 warnings
tests/unit/workflow/test_workflow.py: 27 warnings
tests/unit/workflow/test_workflow_chaining.py: 1 warning
tests/unit/workflow/test_workflow_node.py: 1 warning
tests/unit/workflow/test_workflow_schemas.py: 1 warning
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(

tests/unit/test_dask_nvt.py: 12 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 8 files.
warnings.warn(

tests/unit/test_dask_nvt.py::test_merlin_core_execution_managers
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/core/utils.py:431: UserWarning: Existing Dask-client object detected in the current context. New cuda cluster will not be deployed. Set force_new to True to ignore running clusters.
warnings.warn(

tests/unit/test_notebooks.py: 1 warning
tests/unit/test_tools.py: 17 warnings
tests/unit/loader/test_tf_dataloader.py: 2 warnings
tests/unit/loader/test_torch_dataloader.py: 54 warnings
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:2940: FutureWarning: Series.ceil and DataFrame.ceil are deprecated and will be removed in the future
warnings.warn(

tests/unit/loader/test_tf_dataloader.py: 2 warnings
tests/unit/loader/test_torch_dataloader.py: 12 warnings
tests/unit/workflow/test_workflow.py: 9 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 2 files.
warnings.warn(

tests/unit/ops/test_fill.py::test_fill_missing[True-True-parquet]
tests/unit/ops/test_fill.py::test_fill_missing[True-False-parquet]
tests/unit/ops/test_ops.py::test_filter[parquet-0.1-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/pandas/core/indexing.py:1732: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self._setitem_single_block(indexer, value, name)

tests/unit/workflow/test_cpu_workflow.py: 6 warnings
tests/unit/workflow/test_workflow.py: 12 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 10 files.
warnings.warn(

tests/unit/workflow/test_workflow.py: 48 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 20 files.
warnings.warn(

tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_parquet_output[True-None]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-None]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-None]
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 4 files.
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED tests/unit/test_notebooks.py::test_criteo_tf_notebook - subprocess.Cal...
FAILED tests/unit/test_notebooks.py::test_criteo_pyt_notebook - subprocess.Ca...
FAILED tests/unit/test_notebooks.py::test_optimize_criteo - ModuleNotFoundErr...
FAILED tests/unit/test_notebooks.py::test_multigpu_dask_example - ModuleNotFo...
===== 4 failed, 1417 passed, 2 skipped, 617 warnings in 634.60s (0:10:34) ======
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins723176057548274873.sh

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #1597 of commit 99df9374422c34e000b9d8d11da0d7061ef46aed, no merge conflicts.
Running as SYSTEM
Setting status of 99df9374422c34e000b9d8d11da0d7061ef46aed to PENDING with url http://10.20.17.181:8080/job/nvtabular_tests/4551/ and message: 'Build started for merge commit.'
Using context: Jenkins Unit Test Run
Building on master in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA-Merlin/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/pull/1597/*:refs/remotes/origin/pr/1597/* # timeout=10
 > git rev-parse 99df9374422c34e000b9d8d11da0d7061ef46aed^{commit} # timeout=10
Checking out Revision 99df9374422c34e000b9d8d11da0d7061ef46aed (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 99df9374422c34e000b9d8d11da0d7061ef46aed # timeout=10
Commit message: "change documentation"
 > git rev-list --no-walk 31a8e3b5157f7882b728d63c3c7258582641a470 # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins17101250722631245591.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 1422 items / 1 skipped

tests/unit/test_dask_nvt.py ............................................ [ 3%]
........................................................................ [ 8%]
[ 8%]
tests/unit/test_notebooks.py FFF..F [ 8%]
tests/unit/test_tf4rec.py . [ 8%]
tests/unit/test_tools.py ...................... [ 10%]
tests/unit/test_triton_inference.py ................................ [ 12%]
tests/unit/framework_utils/test_tf_feature_columns.py . [ 12%]
tests/unit/framework_utils/test_tf_layers.py ........................... [ 14%]
................................................... [ 18%]
tests/unit/framework_utils/test_torch_layers.py . [ 18%]
tests/unit/loader/test_dataloader_backend.py ...... [ 18%]
tests/unit/loader/test_tf_dataloader.py ................................ [ 20%]
........................................s.. [ 23%]
tests/unit/loader/test_torch_dataloader.py ............................. [ 25%]
...................................................... [ 29%]
tests/unit/ops/test_categorify.py ...................................... [ 32%]
........................................................................ [ 37%]
........................................... [ 40%]
tests/unit/ops/test_column_similarity.py ........................ [ 42%]
tests/unit/ops/test_drop_low_cardinality.py .. [ 42%]
tests/unit/ops/test_fill.py ............................................ [ 45%]
........ [ 45%]
tests/unit/ops/test_groupyby.py ................. [ 47%]
tests/unit/ops/test_hash_bucket.py ......................... [ 48%]
tests/unit/ops/test_join.py ............................................ [ 51%]
........................................................................ [ 56%]
.................................. [ 59%]
tests/unit/ops/test_lambda.py .......... [ 60%]
tests/unit/ops/test_normalize.py ....................................... [ 62%]
.. [ 62%]
tests/unit/ops/test_ops.py ............................................. [ 66%]
.................... [ 67%]
tests/unit/ops/test_ops_schema.py ...................................... [ 70%]
........................................................................ [ 75%]
........................................................................ [ 80%]
........................................................................ [ 85%]
....................................... [ 88%]
tests/unit/ops/test_reduce_dtype_size.py .. [ 88%]
tests/unit/ops/test_target_encode.py ..................... [ 89%]
tests/unit/workflow/test_cpu_workflow.py ...... [ 90%]
tests/unit/workflow/test_workflow.py ................................... [ 92%]
.......................................................... [ 96%]
tests/unit/workflow/test_workflow_chaining.py ... [ 96%]
tests/unit/workflow/test_workflow_node.py ........... [ 97%]
tests/unit/workflow/test_workflow_ops.py ... [ 97%]
tests/unit/workflow/test_workflow_schemas.py ........................... [ 99%]
... [100%]

=================================== FAILURES ===================================
___________________________ test_criteo_tf_notebook ____________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-2/test_criteo_tf_notebook0')

def test_criteo_tf_notebook(tmpdir):
    tor = pytest.importorskip("tensorflow")  # noqa
    # create a toy dataset in tmpdir, and point environment variables so the notebook
    # will read from it
    os.system("mkdir -p " + os.path.join(tmpdir, "converted/criteo"))
    for i in range(24):
        df = _get_random_criteo_data(1000)
        df.to_parquet(os.path.join(tmpdir, "converted/criteo", f"day_{i}.parquet"))
    os.environ["BASE_DIR"] = str(tmpdir)

    def _nb_modify(line):
        # Disable LocalCUDACluster
        line = line.replace("client.run(_rmm_pool)", "# client.run(_rmm_pool)")
        line = line.replace("if cluster is None:", "if False:")
        line = line.replace("client = Client(cluster)", "# client = Client(cluster)")
        line = line.replace(
            "workflow = nvt.Workflow(features, client=client)", "workflow = nvt.Workflow(features)"
        )
        line = line.replace("client", "# client")
        line = line.replace("NUM_GPUS = [0, 1, 2, 3, 4, 5, 6, 7]", "NUM_GPUS = [0]")
        line = line.replace("part_size = int(part_mem_frac * device_size)", "part_size = '128MB'")

        return line
  _run_notebook(
        tmpdir,
        os.path.join(
            dirname(TEST_PATH),
            "examples/scaling-criteo/",
            "02-ETL-with-NVTabular.ipynb",
        ),
        # disable rmm.reinitialize, seems to be causing issues
        transform=_nb_modify,
    )

tests/unit/test_notebooks.py:62:


tests/unit/test_notebooks.py:307: in _run_notebook
subprocess.check_output([sys.executable, script_path])
/usr/lib/python3.8/subprocess.py:415: in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,


input = None, capture_output = False, timeout = None, check = True
popenargs = (['/usr/bin/python3', '/tmp/pytest-of-jenkins/pytest-2/test_criteo_tf_notebook0/notebook.py'],)
kwargs = {'stdout': -1}, process = <subprocess.Popen object at 0x7f77072fffd0>
stdout = b'', stderr = None, retcode = 1

def run(*popenargs,
        input=None, capture_output=False, timeout=None, check=False, **kwargs):
    """Run command with arguments and return a CompletedProcess instance.

    The returned instance will have attributes args, returncode, stdout and
    stderr. By default, stdout and stderr are not captured, and those attributes
    will be None. Pass stdout=PIPE and/or stderr=PIPE in order to capture them.

    If check is True and the exit code was non-zero, it raises a
    CalledProcessError. The CalledProcessError object will have the return code
    in the returncode attribute, and output & stderr attributes if those streams
    were captured.

    If timeout is given, and the process takes too long, a TimeoutExpired
    exception will be raised.

    There is an optional argument "input", allowing you to
    pass bytes or a string to the subprocess's stdin.  If you use this argument
    you may not also use the Popen constructor's "stdin" argument, as
    it will be used internally.

    By default, all communication is in bytes, and therefore any "input" should
    be bytes, and the stdout and stderr will be bytes. If in text mode, any
    "input" should be a string, and stdout and stderr will be strings decoded
    according to locale encoding, or by "encoding" if set. Text mode is
    triggered by setting any of text, encoding, errors or universal_newlines.

    The other arguments are the same as for the Popen constructor.
    """
    if input is not None:
        if kwargs.get('stdin') is not None:
            raise ValueError('stdin and input arguments may not both be used.')
        kwargs['stdin'] = PIPE

    if capture_output:
        if kwargs.get('stdout') is not None or kwargs.get('stderr') is not None:
            raise ValueError('stdout and stderr arguments may not be used '
                             'with capture_output.')
        kwargs['stdout'] = PIPE
        kwargs['stderr'] = PIPE

    with Popen(*popenargs, **kwargs) as process:
        try:
            stdout, stderr = process.communicate(input, timeout=timeout)
        except TimeoutExpired as exc:
            process.kill()
            if _mswindows:
                # Windows accumulates the output in a single blocking
                # read() call run on child threads, with the timeout
                # being done in a join() on those threads.  communicate()
                # _after_ kill() is required to collect that and add it
                # to the exception.
                exc.stdout, exc.stderr = process.communicate()
            else:
                # POSIX _communicate already populated the output so
                # far into the TimeoutExpired exception.
                process.wait()
            raise
        except:  # Including KeyboardInterrupt, communicate handled that.
            process.kill()
            # We don't call process.wait() as .__exit__ does that for us.
            raise
        retcode = process.poll()
        if check and retcode:
          raise CalledProcessError(retcode, process.args,
                                     output=stdout, stderr=stderr)

E subprocess.CalledProcessError: Command '['/usr/bin/python3', '/tmp/pytest-of-jenkins/pytest-2/test_criteo_tf_notebook0/notebook.py']' returned non-zero exit status 1.

/usr/lib/python3.8/subprocess.py:516: CalledProcessError
----------------------------- Captured stderr call -----------------------------
Traceback (most recent call last):
File "/tmp/pytest-of-jenkins/pytest-2/test_criteo_tf_notebook0/notebook.py", line 24, in
from dask_cuda import LocalCUDACluster
File "/usr/local/lib/python3.8/dist-packages/dask_cuda/init.py", line 12, in
from .cuda_worker import CUDAWorker
File "/usr/local/lib/python3.8/dist-packages/dask_cuda/cuda_worker.py", line 20, in
from distributed.worker_memory import parse_memory_limit
ModuleNotFoundError: No module named 'distributed.worker_memory'
___________________________ test_criteo_pyt_notebook ___________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-2/test_criteo_pyt_notebook0')

def test_criteo_pyt_notebook(tmpdir):
    tor = pytest.importorskip("fastai")  # noqa
    # create a toy dataset in tmpdir, and point environment variables so the notebook
    # will read from it
    os.system("mkdir -p " + os.path.join(tmpdir, "converted/criteo"))
    for i in range(24):
        df = _get_random_criteo_data(1000)
        df.to_parquet(os.path.join(tmpdir, "converted/criteo", f"day_{i}.parquet"))
    os.environ["BASE_DIR"] = str(tmpdir)

    def _nb_modify(line):
        # Disable LocalCUDACluster
        line = line.replace("client.run(_rmm_pool)", "# client.run(_rmm_pool)")
        line = line.replace("if cluster is None:", "if False:")
        line = line.replace("client = Client(cluster)", "# client = Client(cluster)")
        line = line.replace(
            "workflow = nvt.Workflow(features, client=client)", "workflow = nvt.Workflow(features)"
        )
        line = line.replace("client", "# client")
        line = line.replace("NUM_GPUS = [0, 1, 2, 3, 4, 5, 6, 7]", "NUM_GPUS = [0]")
        line = line.replace("part_size = int(part_mem_frac * device_size)", "part_size = '128MB'")
        return line
  _run_notebook(
        tmpdir,
        os.path.join(
            dirname(TEST_PATH),
            "examples/scaling-criteo/",
            "02-ETL-with-NVTabular.ipynb",
        ),
        # disable rmm.reinitialize, seems to be causing issues
        transform=_nb_modify,
    )

tests/unit/test_notebooks.py:114:


tests/unit/test_notebooks.py:307: in _run_notebook
subprocess.check_output([sys.executable, script_path])
/usr/lib/python3.8/subprocess.py:415: in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,


input = None, capture_output = False, timeout = None, check = True
popenargs = (['/usr/bin/python3', '/tmp/pytest-of-jenkins/pytest-2/test_criteo_pyt_notebook0/notebook.py'],)
kwargs = {'stdout': -1}, process = <subprocess.Popen object at 0x7f76d084d820>
stdout = b'', stderr = None, retcode = 1

def run(*popenargs,
        input=None, capture_output=False, timeout=None, check=False, **kwargs):
    """Run command with arguments and return a CompletedProcess instance.

    The returned instance will have attributes args, returncode, stdout and
    stderr. By default, stdout and stderr are not captured, and those attributes
    will be None. Pass stdout=PIPE and/or stderr=PIPE in order to capture them.

    If check is True and the exit code was non-zero, it raises a
    CalledProcessError. The CalledProcessError object will have the return code
    in the returncode attribute, and output & stderr attributes if those streams
    were captured.

    If timeout is given, and the process takes too long, a TimeoutExpired
    exception will be raised.

    There is an optional argument "input", allowing you to
    pass bytes or a string to the subprocess's stdin.  If you use this argument
    you may not also use the Popen constructor's "stdin" argument, as
    it will be used internally.

    By default, all communication is in bytes, and therefore any "input" should
    be bytes, and the stdout and stderr will be bytes. If in text mode, any
    "input" should be a string, and stdout and stderr will be strings decoded
    according to locale encoding, or by "encoding" if set. Text mode is
    triggered by setting any of text, encoding, errors or universal_newlines.

    The other arguments are the same as for the Popen constructor.
    """
    if input is not None:
        if kwargs.get('stdin') is not None:
            raise ValueError('stdin and input arguments may not both be used.')
        kwargs['stdin'] = PIPE

    if capture_output:
        if kwargs.get('stdout') is not None or kwargs.get('stderr') is not None:
            raise ValueError('stdout and stderr arguments may not be used '
                             'with capture_output.')
        kwargs['stdout'] = PIPE
        kwargs['stderr'] = PIPE

    with Popen(*popenargs, **kwargs) as process:
        try:
            stdout, stderr = process.communicate(input, timeout=timeout)
        except TimeoutExpired as exc:
            process.kill()
            if _mswindows:
                # Windows accumulates the output in a single blocking
                # read() call run on child threads, with the timeout
                # being done in a join() on those threads.  communicate()
                # _after_ kill() is required to collect that and add it
                # to the exception.
                exc.stdout, exc.stderr = process.communicate()
            else:
                # POSIX _communicate already populated the output so
                # far into the TimeoutExpired exception.
                process.wait()
            raise
        except:  # Including KeyboardInterrupt, communicate handled that.
            process.kill()
            # We don't call process.wait() as .__exit__ does that for us.
            raise
        retcode = process.poll()
        if check and retcode:
          raise CalledProcessError(retcode, process.args,
                                     output=stdout, stderr=stderr)

E subprocess.CalledProcessError: Command '['/usr/bin/python3', '/tmp/pytest-of-jenkins/pytest-2/test_criteo_pyt_notebook0/notebook.py']' returned non-zero exit status 1.

/usr/lib/python3.8/subprocess.py:516: CalledProcessError
----------------------------- Captured stderr call -----------------------------
Traceback (most recent call last):
File "/tmp/pytest-of-jenkins/pytest-2/test_criteo_pyt_notebook0/notebook.py", line 24, in
from dask_cuda import LocalCUDACluster
File "/usr/local/lib/python3.8/dist-packages/dask_cuda/init.py", line 12, in
from .cuda_worker import CUDAWorker
File "/usr/local/lib/python3.8/dist-packages/dask_cuda/cuda_worker.py", line 20, in
from distributed.worker_memory import parse_memory_limit
ModuleNotFoundError: No module named 'distributed.worker_memory'
_____________________________ test_optimize_criteo _____________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-2/test_optimize_criteo0')

def test_optimize_criteo(tmpdir):
    input_path = str(tmpdir.mkdir("input"))
    _get_random_criteo_data(1000).to_csv(os.path.join(input_path, "day_0"), sep="\t", header=False)
    os.environ["INPUT_DATA_DIR"] = input_path
    os.environ["OUTPUT_DATA_DIR"] = str(tmpdir.mkdir("output"))
  with get_cuda_cluster() as cuda_cluster:

tests/unit/test_notebooks.py:140:


/usr/lib/python3.8/contextlib.py:113: in __enter__
return next(self.gen)
tests/conftest.py:107: in get_cuda_cluster
from dask_cuda import LocalCUDACluster
/usr/local/lib/python3.8/dist-packages/dask_cuda/__init__.py:12: in <module>
from .cuda_worker import CUDAWorker


from __future__ import absolute_import, division, print_function

import asyncio
import atexit
import os
import warnings

from toolz import valmap
from tornado.ioloop import IOLoop

import dask
from dask.utils import parse_bytes
from distributed import Nanny
from distributed.core import Server
from distributed.deploy.cluster import Cluster
from distributed.proctitle import (
    enable_proctitle_on_children,
    enable_proctitle_on_current,
)

from distributed.worker_memory import parse_memory_limit
E ModuleNotFoundError: No module named 'distributed.worker_memory'

/usr/local/lib/python3.8/dist-packages/dask_cuda/cuda_worker.py:20: ModuleNotFoundError
__________________________ test_multigpu_dask_example __________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-2/test_multigpu_dask_example0')

def test_multigpu_dask_example(tmpdir):
  with get_cuda_cluster() as cuda_cluster:

tests/unit/test_notebooks.py:268:


/usr/lib/python3.8/contextlib.py:113: in __enter__
return next(self.gen)
tests/conftest.py:107: in get_cuda_cluster
from dask_cuda import LocalCUDACluster
/usr/local/lib/python3.8/dist-packages/dask_cuda/__init__.py:12: in <module>
from .cuda_worker import CUDAWorker


from __future__ import absolute_import, division, print_function

import asyncio
import atexit
import os
import warnings

from toolz import valmap
from tornado.ioloop import IOLoop

import dask
from dask.utils import parse_bytes
from distributed import Nanny
from distributed.core import Server
from distributed.deploy.cluster import Cluster
from distributed.proctitle import (
    enable_proctitle_on_children,
    enable_proctitle_on_current,
)

from distributed.worker_memory import parse_memory_limit
E ModuleNotFoundError: No module named 'distributed.worker_memory'

/usr/local/lib/python3.8/dist-packages/dask_cuda/cuda_worker.py:20: ModuleNotFoundError
=============================== warnings summary ===============================
../../../../../usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33
/usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
DASK_VERSION = LooseVersion(dask.__version__)

../../../.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: 34 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
other = LooseVersion(other)

nvtabular/loader/__init__.py:19
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/loader/__init__.py:19: DeprecationWarning: The nvtabular.loader module has moved to merlin.models.loader. Support for importing from nvtabular.loader is deprecated, and will be removed in a future version. Please update your imports to refer to merlin.models.loader.
warnings.warn(

tests/unit/test_dask_nvt.py: 1 warning
tests/unit/test_tf4rec.py: 1 warning
tests/unit/test_tools.py: 5 warnings
tests/unit/test_triton_inference.py: 8 warnings
tests/unit/loader/test_dataloader_backend.py: 6 warnings
tests/unit/loader/test_tf_dataloader.py: 66 warnings
tests/unit/loader/test_torch_dataloader.py: 67 warnings
tests/unit/ops/test_categorify.py: 69 warnings
tests/unit/ops/test_drop_low_cardinality.py: 2 warnings
tests/unit/ops/test_fill.py: 8 warnings
tests/unit/ops/test_hash_bucket.py: 4 warnings
tests/unit/ops/test_join.py: 88 warnings
tests/unit/ops/test_lambda.py: 1 warning
tests/unit/ops/test_normalize.py: 9 warnings
tests/unit/ops/test_ops.py: 11 warnings
tests/unit/ops/test_ops_schema.py: 17 warnings
tests/unit/workflow/test_workflow.py: 27 warnings
tests/unit/workflow/test_workflow_chaining.py: 1 warning
tests/unit/workflow/test_workflow_node.py: 1 warning
tests/unit/workflow/test_workflow_schemas.py: 1 warning
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(

tests/unit/test_dask_nvt.py: 12 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 8 files.
warnings.warn(

tests/unit/test_dask_nvt.py::test_merlin_core_execution_managers
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/core/utils.py:431: UserWarning: Existing Dask-client object detected in the current context. New cuda cluster will not be deployed. Set force_new to True to ignore running clusters.
warnings.warn(

tests/unit/test_notebooks.py: 1 warning
tests/unit/test_tools.py: 17 warnings
tests/unit/loader/test_tf_dataloader.py: 2 warnings
tests/unit/loader/test_torch_dataloader.py: 54 warnings
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:2940: FutureWarning: Series.ceil and DataFrame.ceil are deprecated and will be removed in the future
warnings.warn(

tests/unit/loader/test_tf_dataloader.py: 2 warnings
tests/unit/loader/test_torch_dataloader.py: 12 warnings
tests/unit/workflow/test_workflow.py: 9 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 2 files.
warnings.warn(

tests/unit/ops/test_fill.py::test_fill_missing[True-True-parquet]
tests/unit/ops/test_fill.py::test_fill_missing[True-False-parquet]
tests/unit/ops/test_ops.py::test_filter[parquet-0.1-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/pandas/core/indexing.py:1732: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self._setitem_single_block(indexer, value, name)

tests/unit/workflow/test_cpu_workflow.py: 6 warnings
tests/unit/workflow/test_workflow.py: 12 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 10 files.
warnings.warn(

tests/unit/workflow/test_workflow.py: 48 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 20 files.
warnings.warn(

tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_parquet_output[True-None]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-None]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-None]
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 4 files.
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED tests/unit/test_notebooks.py::test_criteo_tf_notebook - subprocess.Cal...
FAILED tests/unit/test_notebooks.py::test_criteo_pyt_notebook - subprocess.Ca...
FAILED tests/unit/test_notebooks.py::test_optimize_criteo - ModuleNotFoundErr...
FAILED tests/unit/test_notebooks.py::test_multigpu_dask_example - ModuleNotFo...
===== 4 failed, 1417 passed, 2 skipped, 617 warnings in 628.45s (0:10:28) ======
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins14389672305796437863.sh

@oliverholworthy
Member

rerun tests

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #1597 of commit 99df9374422c34e000b9d8d11da0d7061ef46aed, no merge conflicts.
Running as SYSTEM
Setting status of 99df9374422c34e000b9d8d11da0d7061ef46aed to PENDING with url http://10.20.17.181:8080/job/nvtabular_tests/4552/ and message: 'Build started for merge commit.'
Using context: Jenkins Unit Test Run
Building on master in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA-Merlin/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/pull/1597/*:refs/remotes/origin/pr/1597/* # timeout=10
 > git rev-parse 99df9374422c34e000b9d8d11da0d7061ef46aed^{commit} # timeout=10
Checking out Revision 99df9374422c34e000b9d8d11da0d7061ef46aed (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 99df9374422c34e000b9d8d11da0d7061ef46aed # timeout=10
Commit message: "change documentation"
 > git rev-list --no-walk 99df9374422c34e000b9d8d11da0d7061ef46aed # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins7398654711454226637.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 1422 items / 1 skipped

tests/unit/test_dask_nvt.py ............................................ [ 3%]
........................................................................ [ 8%]
[ 8%]
tests/unit/test_notebooks.py FFF..F [ 8%]
tests/unit/test_tf4rec.py . [ 8%]
tests/unit/test_tools.py ...................... [ 10%]
tests/unit/test_triton_inference.py ................................ [ 12%]
tests/unit/framework_utils/test_tf_feature_columns.py . [ 12%]
tests/unit/framework_utils/test_tf_layers.py ........................... [ 14%]
................................................... [ 18%]
tests/unit/framework_utils/test_torch_layers.py . [ 18%]
tests/unit/loader/test_dataloader_backend.py ...... [ 18%]
tests/unit/loader/test_tf_dataloader.py ................................ [ 20%]
........................................s.. [ 23%]
tests/unit/loader/test_torch_dataloader.py ............................. [ 25%]
...................................................... [ 29%]
tests/unit/ops/test_categorify.py ...................................... [ 32%]
........................................................................ [ 37%]
........................................... [ 40%]
tests/unit/ops/test_column_similarity.py ........................ [ 42%]
tests/unit/ops/test_drop_low_cardinality.py .. [ 42%]
tests/unit/ops/test_fill.py ............................................ [ 45%]
........ [ 45%]
tests/unit/ops/test_groupyby.py ................. [ 47%]
tests/unit/ops/test_hash_bucket.py ......................... [ 48%]
tests/unit/ops/test_join.py ............................................ [ 51%]
........................................................................ [ 56%]
.................................. [ 59%]
tests/unit/ops/test_lambda.py .......... [ 60%]
tests/unit/ops/test_normalize.py ....................................... [ 62%]
.. [ 62%]
tests/unit/ops/test_ops.py ............................................. [ 66%]
.................... [ 67%]
tests/unit/ops/test_ops_schema.py ...................................... [ 70%]
........................................................................ [ 75%]
........................................................................ [ 80%]
........................................................................ [ 85%]
....................................... [ 88%]
tests/unit/ops/test_reduce_dtype_size.py .. [ 88%]
tests/unit/ops/test_target_encode.py ..................... [ 89%]
tests/unit/workflow/test_cpu_workflow.py ...... [ 90%]
tests/unit/workflow/test_workflow.py ................................... [ 92%]
.......................................................... [ 96%]
tests/unit/workflow/test_workflow_chaining.py ... [ 96%]
tests/unit/workflow/test_workflow_node.py ........... [ 97%]
tests/unit/workflow/test_workflow_ops.py ... [ 97%]
tests/unit/workflow/test_workflow_schemas.py ........................... [ 99%]
... [100%]

=================================== FAILURES ===================================
___________________________ test_criteo_tf_notebook ____________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-1/test_criteo_tf_notebook0')

def test_criteo_tf_notebook(tmpdir):
    tor = pytest.importorskip("tensorflow")  # noqa
    # create a toy dataset in tmpdir, and point environment variables so the notebook
    # will read from it
    os.system("mkdir -p " + os.path.join(tmpdir, "converted/criteo"))
    for i in range(24):
        df = _get_random_criteo_data(1000)
        df.to_parquet(os.path.join(tmpdir, "converted/criteo", f"day_{i}.parquet"))
    os.environ["BASE_DIR"] = str(tmpdir)

    def _nb_modify(line):
        # Disable LocalCUDACluster
        line = line.replace("client.run(_rmm_pool)", "# client.run(_rmm_pool)")
        line = line.replace("if cluster is None:", "if False:")
        line = line.replace("client = Client(cluster)", "# client = Client(cluster)")
        line = line.replace(
            "workflow = nvt.Workflow(features, client=client)", "workflow = nvt.Workflow(features)"
        )
        line = line.replace("client", "# client")
        line = line.replace("NUM_GPUS = [0, 1, 2, 3, 4, 5, 6, 7]", "NUM_GPUS = [0]")
        line = line.replace("part_size = int(part_mem_frac * device_size)", "part_size = '128MB'")

        return line
  _run_notebook(
        tmpdir,
        os.path.join(
            dirname(TEST_PATH),
            "examples/scaling-criteo/",
            "02-ETL-with-NVTabular.ipynb",
        ),
        # disable rmm.reinitialize, seems to be causing issues
        transform=_nb_modify,
    )

tests/unit/test_notebooks.py:62:


tests/unit/test_notebooks.py:307: in _run_notebook
subprocess.check_output([sys.executable, script_path])
/usr/lib/python3.8/subprocess.py:415: in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,


input = None, capture_output = False, timeout = None, check = True
popenargs = (['/usr/bin/python3', '/tmp/pytest-of-jenkins/pytest-1/test_criteo_tf_notebook0/notebook.py'],)
kwargs = {'stdout': -1}, process = <subprocess.Popen object at 0x7f12422f6e20>
stdout = b'', stderr = None, retcode = 1

def run(*popenargs,
        input=None, capture_output=False, timeout=None, check=False, **kwargs):
    """Run command with arguments and return a CompletedProcess instance.

    The returned instance will have attributes args, returncode, stdout and
    stderr. By default, stdout and stderr are not captured, and those attributes
    will be None. Pass stdout=PIPE and/or stderr=PIPE in order to capture them.

    If check is True and the exit code was non-zero, it raises a
    CalledProcessError. The CalledProcessError object will have the return code
    in the returncode attribute, and output & stderr attributes if those streams
    were captured.

    If timeout is given, and the process takes too long, a TimeoutExpired
    exception will be raised.

    There is an optional argument "input", allowing you to
    pass bytes or a string to the subprocess's stdin.  If you use this argument
    you may not also use the Popen constructor's "stdin" argument, as
    it will be used internally.

    By default, all communication is in bytes, and therefore any "input" should
    be bytes, and the stdout and stderr will be bytes. If in text mode, any
    "input" should be a string, and stdout and stderr will be strings decoded
    according to locale encoding, or by "encoding" if set. Text mode is
    triggered by setting any of text, encoding, errors or universal_newlines.

    The other arguments are the same as for the Popen constructor.
    """
    if input is not None:
        if kwargs.get('stdin') is not None:
            raise ValueError('stdin and input arguments may not both be used.')
        kwargs['stdin'] = PIPE

    if capture_output:
        if kwargs.get('stdout') is not None or kwargs.get('stderr') is not None:
            raise ValueError('stdout and stderr arguments may not be used '
                             'with capture_output.')
        kwargs['stdout'] = PIPE
        kwargs['stderr'] = PIPE

    with Popen(*popenargs, **kwargs) as process:
        try:
            stdout, stderr = process.communicate(input, timeout=timeout)
        except TimeoutExpired as exc:
            process.kill()
            if _mswindows:
                # Windows accumulates the output in a single blocking
                # read() call run on child threads, with the timeout
                # being done in a join() on those threads.  communicate()
                # _after_ kill() is required to collect that and add it
                # to the exception.
                exc.stdout, exc.stderr = process.communicate()
            else:
                # POSIX _communicate already populated the output so
                # far into the TimeoutExpired exception.
                process.wait()
            raise
        except:  # Including KeyboardInterrupt, communicate handled that.
            process.kill()
            # We don't call process.wait() as .__exit__ does that for us.
            raise
        retcode = process.poll()
        if check and retcode:
          raise CalledProcessError(retcode, process.args,
                                     output=stdout, stderr=stderr)

E subprocess.CalledProcessError: Command '['/usr/bin/python3', '/tmp/pytest-of-jenkins/pytest-1/test_criteo_tf_notebook0/notebook.py']' returned non-zero exit status 1.

/usr/lib/python3.8/subprocess.py:516: CalledProcessError
----------------------------- Captured stderr call -----------------------------
Traceback (most recent call last):
File "/tmp/pytest-of-jenkins/pytest-1/test_criteo_tf_notebook0/notebook.py", line 24, in
from dask_cuda import LocalCUDACluster
File "/usr/local/lib/python3.8/dist-packages/dask_cuda/init.py", line 12, in
from .cuda_worker import CUDAWorker
File "/usr/local/lib/python3.8/dist-packages/dask_cuda/cuda_worker.py", line 20, in
from distributed.worker_memory import parse_memory_limit
ModuleNotFoundError: No module named 'distributed.worker_memory'
___________________________ test_criteo_pyt_notebook ___________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-1/test_criteo_pyt_notebook0')

def test_criteo_pyt_notebook(tmpdir):
    tor = pytest.importorskip("fastai")  # noqa
    # create a toy dataset in tmpdir, and point environment variables so the notebook
    # will read from it
    os.system("mkdir -p " + os.path.join(tmpdir, "converted/criteo"))
    for i in range(24):
        df = _get_random_criteo_data(1000)
        df.to_parquet(os.path.join(tmpdir, "converted/criteo", f"day_{i}.parquet"))
    os.environ["BASE_DIR"] = str(tmpdir)

    def _nb_modify(line):
        # Disable LocalCUDACluster
        line = line.replace("client.run(_rmm_pool)", "# client.run(_rmm_pool)")
        line = line.replace("if cluster is None:", "if False:")
        line = line.replace("client = Client(cluster)", "# client = Client(cluster)")
        line = line.replace(
            "workflow = nvt.Workflow(features, client=client)", "workflow = nvt.Workflow(features)"
        )
        line = line.replace("client", "# client")
        line = line.replace("NUM_GPUS = [0, 1, 2, 3, 4, 5, 6, 7]", "NUM_GPUS = [0]")
        line = line.replace("part_size = int(part_mem_frac * device_size)", "part_size = '128MB'")
        return line
  _run_notebook(
        tmpdir,
        os.path.join(
            dirname(TEST_PATH),
            "examples/scaling-criteo/",
            "02-ETL-with-NVTabular.ipynb",
        ),
        # disable rmm.reinitialize, seems to be causing issues
        transform=_nb_modify,
    )
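
The notebook tests follow a common pattern here: the notebook is exported to a plain Python script, each line is rewritten through the transform callback to stub out the LocalCUDACluster setup, and the result is executed in a subprocess. A simplified sketch of such a helper (hypothetical; the real _run_notebook lives in tests/unit/test_notebooks.py):

import subprocess
import sys

def run_patched_script(script_path, transform):
    # Hypothetical helper mirroring _run_notebook: rewrite each line of the
    # exported notebook script, then execute it in a child process.
    with open(script_path) as f:
        lines = [transform(line) for line in f]
    with open(script_path, "w") as f:
        f.writelines(lines)
    # check_output raises CalledProcessError if the script exits non-zero.
    subprocess.check_output([sys.executable, script_path])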

tests/unit/test_notebooks.py:114:


tests/unit/test_notebooks.py:307: in _run_notebook
subprocess.check_output([sys.executable, script_path])
/usr/lib/python3.8/subprocess.py:415: in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,


input = None, capture_output = False, timeout = None, check = True
popenargs = (['/usr/bin/python3', '/tmp/pytest-of-jenkins/pytest-1/test_criteo_pyt_notebook0/notebook.py'],)
kwargs = {'stdout': -1}, process = <subprocess.Popen object at 0x7f115573a070>
stdout = b'', stderr = None, retcode = 1


E subprocess.CalledProcessError: Command '['/usr/bin/python3', '/tmp/pytest-of-jenkins/pytest-1/test_criteo_pyt_notebook0/notebook.py']' returned non-zero exit status 1.

/usr/lib/python3.8/subprocess.py:516: CalledProcessError
----------------------------- Captured stderr call -----------------------------
Traceback (most recent call last):
File "/tmp/pytest-of-jenkins/pytest-1/test_criteo_pyt_notebook0/notebook.py", line 24, in
from dask_cuda import LocalCUDACluster
File "/usr/local/lib/python3.8/dist-packages/dask_cuda/init.py", line 12, in
from .cuda_worker import CUDAWorker
File "/usr/local/lib/python3.8/dist-packages/dask_cuda/cuda_worker.py", line 20, in
from distributed.worker_memory import parse_memory_limit
ModuleNotFoundError: No module named 'distributed.worker_memory'
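
For context on the repeated CalledProcessError above: check_output delegates to run(..., check=True), so any non-zero exit from the notebook script is re-raised in the test process. A minimal reproduction of that behavior (illustrative only):

import subprocess
import sys

# check=True makes run() raise CalledProcessError when the child exits
# non-zero, which is how a failing notebook script becomes a test failure.
try:
    subprocess.run([sys.executable, "-c", "raise SystemExit(1)"], check=True)
except subprocess.CalledProcessError as exc:
    print(exc.returncode)  # 1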
_____________________________ test_optimize_criteo _____________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-1/test_optimize_criteo0')

def test_optimize_criteo(tmpdir):
    input_path = str(tmpdir.mkdir("input"))
    _get_random_criteo_data(1000).to_csv(os.path.join(input_path, "day_0"), sep="\t", header=False)
    os.environ["INPUT_DATA_DIR"] = input_path
    os.environ["OUTPUT_DATA_DIR"] = str(tmpdir.mkdir("output"))
  with get_cuda_cluster() as cuda_cluster:

tests/unit/test_notebooks.py:140:


/usr/lib/python3.8/contextlib.py:113: in __enter__
return next(self.gen)
tests/conftest.py:107: in get_cuda_cluster
from dask_cuda import LocalCUDACluster
/usr/local/lib/python3.8/dist-packages/dask_cuda/__init__.py:12: in <module>
from .cuda_worker import CUDAWorker


from __future__ import absolute_import, division, print_function

import asyncio
import atexit
import os
import warnings

from toolz import valmap
from tornado.ioloop import IOLoop

import dask
from dask.utils import parse_bytes
from distributed import Nanny
from distributed.core import Server
from distributed.deploy.cluster import Cluster
from distributed.proctitle import (
    enable_proctitle_on_children,
    enable_proctitle_on_current,
)

from distributed.worker_memory import parse_memory_limit
E ModuleNotFoundError: No module named 'distributed.worker_memory'

/usr/local/lib/python3.8/dist-packages/dask_cuda/cuda_worker.py:20: ModuleNotFoundError
__________________________ test_multigpu_dask_example __________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-1/test_multigpu_dask_example0')

def test_multigpu_dask_example(tmpdir):
  with get_cuda_cluster() as cuda_cluster:

tests/unit/test_notebooks.py:268:


/usr/lib/python3.8/contextlib.py:113: in __enter__
return next(self.gen)
tests/conftest.py:107: in get_cuda_cluster
from dask_cuda import LocalCUDACluster
/usr/local/lib/python3.8/dist-packages/dask_cuda/__init__.py:12: in <module>
from .cuda_worker import CUDAWorker


from __future__ import absolute_import, division, print_function

import asyncio
import atexit
import os
import warnings

from toolz import valmap
from tornado.ioloop import IOLoop

import dask
from dask.utils import parse_bytes
from distributed import Nanny
from distributed.core import Server
from distributed.deploy.cluster import Cluster
from distributed.proctitle import (
    enable_proctitle_on_children,
    enable_proctitle_on_current,
)

from distributed.worker_memory import parse_memory_limit
E ModuleNotFoundError: No module named 'distributed.worker_memory'

/usr/local/lib/python3.8/dist-packages/dask_cuda/cuda_worker.py:20: ModuleNotFoundError
=============================== warnings summary ===============================
../../../../../usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33
/usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
DASK_VERSION = LooseVersion(dask.__version__)

../../../.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: 34 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
other = LooseVersion(other)

nvtabular/loader/__init__.py:19
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/loader/__init__.py:19: DeprecationWarning: The nvtabular.loader module has moved to merlin.models.loader. Support for importing from nvtabular.loader is deprecated, and will be removed in a future version. Please update your imports to refer to merlin.models.loader.
warnings.warn(
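
The deprecation warning above only asks for an import-path change; a sketch of the migration it describes (module names taken directly from the warning text):

# Deprecated location, still works but warns:
import nvtabular.loader  # emits DeprecationWarning
# New home named by the warning:
import merlin.models.loader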

tests/unit/test_dask_nvt.py: 1 warning
tests/unit/test_tf4rec.py: 1 warning
tests/unit/test_tools.py: 5 warnings
tests/unit/test_triton_inference.py: 8 warnings
tests/unit/loader/test_dataloader_backend.py: 6 warnings
tests/unit/loader/test_tf_dataloader.py: 66 warnings
tests/unit/loader/test_torch_dataloader.py: 67 warnings
tests/unit/ops/test_categorify.py: 69 warnings
tests/unit/ops/test_drop_low_cardinality.py: 2 warnings
tests/unit/ops/test_fill.py: 8 warnings
tests/unit/ops/test_hash_bucket.py: 4 warnings
tests/unit/ops/test_join.py: 88 warnings
tests/unit/ops/test_lambda.py: 1 warning
tests/unit/ops/test_normalize.py: 9 warnings
tests/unit/ops/test_ops.py: 11 warnings
tests/unit/ops/test_ops_schema.py: 17 warnings
tests/unit/workflow/test_workflow.py: 27 warnings
tests/unit/workflow/test_workflow_chaining.py: 1 warning
tests/unit/workflow/test_workflow_node.py: 1 warning
tests/unit/workflow/test_workflow_schemas.py: 1 warning
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(

tests/unit/test_dask_nvt.py: 12 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 8 files.
warnings.warn(

tests/unit/test_dask_nvt.py::test_merlin_core_execution_managers
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/core/utils.py:431: UserWarning: Existing Dask-client object detected in the current context. New cuda cluster will not be deployed. Set force_new to True to ignore running clusters.
warnings.warn(

tests/unit/test_notebooks.py: 1 warning
tests/unit/test_tools.py: 17 warnings
tests/unit/loader/test_tf_dataloader.py: 2 warnings
tests/unit/loader/test_torch_dataloader.py: 54 warnings
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:2940: FutureWarning: Series.ceil and DataFrame.ceil are deprecated and will be removed in the future
warnings.warn(

tests/unit/loader/test_tf_dataloader.py: 2 warnings
tests/unit/loader/test_torch_dataloader.py: 12 warnings
tests/unit/workflow/test_workflow.py: 9 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 2 files.
warnings.warn(

tests/unit/ops/test_fill.py::test_fill_missing[True-True-parquet]
tests/unit/ops/test_fill.py::test_fill_missing[True-False-parquet]
tests/unit/ops/test_ops.py::test_filter[parquet-0.1-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/pandas/core/indexing.py:1732: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self._setitem_single_block(indexer, value, name)

tests/unit/workflow/test_cpu_workflow.py: 6 warnings
tests/unit/workflow/test_workflow.py: 12 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 10 files.
warnings.warn(

tests/unit/workflow/test_workflow.py: 48 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 20 files.
warnings.warn(

tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_parquet_output[True-None]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-None]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-None]
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 4 files.
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED tests/unit/test_notebooks.py::test_criteo_tf_notebook - subprocess.Cal...
FAILED tests/unit/test_notebooks.py::test_criteo_pyt_notebook - subprocess.Ca...
FAILED tests/unit/test_notebooks.py::test_optimize_criteo - ModuleNotFoundErr...
FAILED tests/unit/test_notebooks.py::test_multigpu_dask_example - ModuleNotFo...
===== 4 failed, 1417 passed, 2 skipped, 617 warnings in 624.99s (0:10:24) ======
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.github.com/repos/NVIDIA-Merlin/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins2814575972107436327.sh

@jperez999
Contributor

rerun tests

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #1597 of commit 99df9374422c34e000b9d8d11da0d7061ef46aed, no merge conflicts.
Running as SYSTEM
Setting status of 99df9374422c34e000b9d8d11da0d7061ef46aed to PENDING with url http://10.20.17.181:8080/job/nvtabular_tests/4553/ and message: 'Build started for merge commit.'
Using context: Jenkins Unit Test Run
Building on master in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA-Merlin/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/pull/1597/*:refs/remotes/origin/pr/1597/* # timeout=10
 > git rev-parse 99df9374422c34e000b9d8d11da0d7061ef46aed^{commit} # timeout=10
Checking out Revision 99df9374422c34e000b9d8d11da0d7061ef46aed (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 99df9374422c34e000b9d8d11da0d7061ef46aed # timeout=10
Commit message: "change documentation"
 > git rev-list --no-walk 99df9374422c34e000b9d8d11da0d7061ef46aed # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins2396919500537035105.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 1422 items / 1 skipped

tests/unit/test_dask_nvt.py ............................................ [ 3%]
........................................................................ [ 8%]
[ 8%]
tests/unit/test_notebooks.py .....F [ 8%]
tests/unit/test_tf4rec.py . [ 8%]
tests/unit/test_tools.py ...................... [ 10%]
tests/unit/test_triton_inference.py ................................ [ 12%]
tests/unit/framework_utils/test_tf_feature_columns.py . [ 12%]
tests/unit/framework_utils/test_tf_layers.py ........................... [ 14%]
................................................... [ 18%]
tests/unit/framework_utils/test_torch_layers.py . [ 18%]
tests/unit/loader/test_dataloader_backend.py ...... [ 18%]
tests/unit/loader/test_tf_dataloader.py ................................ [ 20%]
........................................s.. [ 23%]
tests/unit/loader/test_torch_dataloader.py ............................. [ 25%]
...................................................... [ 29%]
tests/unit/ops/test_categorify.py ...................................... [ 32%]
........................................................................ [ 37%]
........................................... [ 40%]
tests/unit/ops/test_column_similarity.py ........................ [ 42%]
tests/unit/ops/test_drop_low_cardinality.py .. [ 42%]
tests/unit/ops/test_fill.py ............................................ [ 45%]
........ [ 45%]
tests/unit/ops/test_groupyby.py ................. [ 47%]
tests/unit/ops/test_hash_bucket.py ......................... [ 48%]
tests/unit/ops/test_join.py ............................................ [ 51%]
........................................................................ [ 56%]
.................................. [ 59%]
tests/unit/ops/test_lambda.py .......... [ 60%]
tests/unit/ops/test_normalize.py ....................................... [ 62%]
.. [ 62%]
tests/unit/ops/test_ops.py ............................................. [ 66%]
.................... [ 67%]
tests/unit/ops/test_ops_schema.py ...................................... [ 70%]
........................................................................ [ 75%]
........................................................................ [ 80%]
........................................................................ [ 85%]
....................................... [ 88%]
tests/unit/ops/test_reduce_dtype_size.py .. [ 88%]
tests/unit/ops/test_target_encode.py ..................... [ 89%]
tests/unit/workflow/test_cpu_workflow.py ...... [ 90%]
tests/unit/workflow/test_workflow.py ................................... [ 92%]
.......................................................... [ 96%]
tests/unit/workflow/test_workflow_chaining.py ... [ 96%]
tests/unit/workflow/test_workflow_node.py ........... [ 97%]
tests/unit/workflow/test_workflow_ops.py ... [ 97%]
tests/unit/workflow/test_workflow_schemas.py ........................... [ 99%]
... [100%]

=================================== FAILURES ===================================
__________________________ test_multigpu_dask_example __________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-2/test_multigpu_dask_example0')

def test_multigpu_dask_example(tmpdir):
    with get_cuda_cluster() as cuda_cluster:
        os.environ["BASE_DIR"] = str(tmpdir)
        scheduler_port = cuda_cluster.scheduler_address

        def _nb_modify(line):
            # Use cuda_cluster "fixture" port rather than allowing notebook
            # to deploy a LocalCUDACluster within the subprocess
            line = line.replace("cluster = None", f"cluster = '{scheduler_port}'")
            # Use a much smaller "toy" dataset
            line = line.replace("write_count = 25", "write_count = 4")
            line = line.replace('freq = "1s"', 'freq = "1h"')
            # Use smaller partitions for smaller dataset
            line = line.replace("part_mem_fraction=0.1", "part_size=1_000_000")
            line = line.replace("out_files_per_proc=8", "out_files_per_proc=1")
            return line

        notebook_path = os.path.join(
            dirname(TEST_PATH), "examples/multi-gpu-toy-example/", "multi-gpu_dask.ipynb"
        )
      _run_notebook(tmpdir, notebook_path, _nb_modify)

tests/unit/test_notebooks.py:287:


tests/unit/test_notebooks.py:307: in _run_notebook
subprocess.check_output([sys.executable, script_path])
/usr/lib/python3.8/subprocess.py:415: in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,


input = None, capture_output = False, timeout = None, check = True
popenargs = (['/usr/bin/python3', '/tmp/pytest-of-jenkins/pytest-2/test_multigpu_dask_example0/notebook.py'],)
kwargs = {'stdout': -1}, process = <subprocess.Popen object at 0x7fa2b0a494f0>
stdout = b'', stderr = None, retcode = 1


E subprocess.CalledProcessError: Command '['/usr/bin/python3', '/tmp/pytest-of-jenkins/pytest-2/test_multigpu_dask_example0/notebook.py']' returned non-zero exit status 1.

/usr/lib/python3.8/subprocess.py:516: CalledProcessError
----------------------------- Captured stderr call -----------------------------
2022-06-27 14:03:03,684 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize
2022-06-27 14:03:03,699 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(
Failed to transform operator <nvtabular.ops.normalize.Normalize object at 0x7f4ff01d9520>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow/workflow.py", line 519, in _transform_partition
output_df = node.op.transform(selection, input_df)
File "/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py", line 101, in inner
result = func(*args, **kwargs)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/normalize.py", line 84, in transform
values = values.astype(self.output_dtype)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/normalize.py", line 111, in output_dtype
return self.out_dtype or numpy.float64
AttributeError: 'Normalize' object has no attribute 'out_dtype'
2022-06-27 14:03:10,229 - distributed.worker - WARNING - Compute Failed
Key: ('_transform_partition-de1fb4afa9fbe0588780bae08018df01', 0)
Function: subgraph_callable-a957df51-92af-4c56-8bb3-127a554b
args: ([{'piece': ('/tmp/pytest-of-jenkins/pytest-2/test_multigpu_dask_example0/demo_dataset/part.0.parquet', [0], [])}])
kwargs: {}
Exception: 'AttributeError("'Normalize' object has no attribute 'out_dtype'")'

Traceback (most recent call last):
File "/tmp/pytest-of-jenkins/pytest-2/test_multigpu_dask_example0/notebook.py", line 123, in
workflow.fit_transform(dataset).to_parquet(
File "/var/jenkins_home/.local/lib/python3.8/site-packages/nvtabular/workflow/workflow.py", line 286, in fit_transform
self.fit(dataset)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/nvtabular/workflow/workflow.py", line 261, in fit
self._transform_impl(dataset, capture_dtypes=True).sample_dtypes()
File "/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py", line 1147, in sample_dtypes
_real_meta = self.engine.sample_data(n=n)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset_engine.py", line 71, in sample_data
_head = _ddf.partitions[partition_index].head(n)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/dask/dataframe/core.py", line 1196, in head
return self._head(n=n, npartitions=npartitions, compute=compute, safe=safe)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/dask/dataframe/core.py", line 1230, in _head
result = result.compute()
File "/var/jenkins_home/.local/lib/python3.8/site-packages/dask/base.py", line 292, in compute
(result,) = compute(self, traverse=False, **kwargs)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/dask/base.py", line 575, in compute
results = schedule(dsk, keys, **kwargs)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/client.py", line 3015, in get
results = self.gather(packed, asynchronous=asynchronous, direct=direct)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/client.py", line 2167, in gather
return self.sync(
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/utils.py", line 309, in sync
return sync(
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/utils.py", line 376, in sync
raise exc.with_traceback(tb)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/utils.py", line 349, in f
result = yield future
File "/var/jenkins_home/.local/lib/python3.8/site-packages/tornado-6.1-py3.8-linux-x86_64.egg/tornado/gen.py", line 762, in run
value = future.result()
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/client.py", line 2030, in _gather
raise exception.with_traceback(traceback)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/dask/optimization.py", line 990, in call
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
File "/var/jenkins_home/.local/lib/python3.8/site-packages/dask/core.py", line 149, in get
result = _execute_task(task, cache)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/dask/core.py", line 119, in _execute_task
return func(*(_execute_task(a, cache) for a in args))
File "/var/jenkins_home/.local/lib/python3.8/site-packages/dask/utils.py", line 39, in apply
return func(*args, **kwargs)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow/workflow.py", line 490, in _transform_partition
parent_df = _transform_partition(root_df, [parent], capture_dtypes=capture_dtypes)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow/workflow.py", line 519, in _transform_partition
output_df = node.op.transform(selection, input_df)
File "/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py", line 101, in inner
result = func(*args, **kwargs)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/normalize.py", line 84, in transform
values = values.astype(self.output_dtype)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/normalize.py", line 111, in output_dtype
return self.out_dtype or numpy.float64
AttributeError: 'Normalize' object has no attribute 'out_dtype'
=============================== warnings summary ===============================
../../../../../usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33
/usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
DASK_VERSION = LooseVersion(dask.__version__)

../../../.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: 34 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
other = LooseVersion(other)

nvtabular/loader/__init__.py:19
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/loader/__init__.py:19: DeprecationWarning: The nvtabular.loader module has moved to merlin.models.loader. Support for importing from nvtabular.loader is deprecated, and will be removed in a future version. Please update your imports to refer to merlin.models.loader.
warnings.warn(

tests/unit/test_dask_nvt.py: 2 warnings
tests/unit/workflow/test_workflow.py: 78 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/dask/base.py:1282: UserWarning: Running on a single-machine scheduler when a distributed client is active might lead to unexpected results.
warnings.warn(

tests/unit/test_dask_nvt.py: 1 warning
tests/unit/test_tf4rec.py: 1 warning
tests/unit/test_tools.py: 5 warnings
tests/unit/test_triton_inference.py: 8 warnings
tests/unit/loader/test_dataloader_backend.py: 6 warnings
tests/unit/loader/test_tf_dataloader.py: 66 warnings
tests/unit/loader/test_torch_dataloader.py: 67 warnings
tests/unit/ops/test_categorify.py: 69 warnings
tests/unit/ops/test_drop_low_cardinality.py: 2 warnings
tests/unit/ops/test_fill.py: 8 warnings
tests/unit/ops/test_hash_bucket.py: 4 warnings
tests/unit/ops/test_join.py: 88 warnings
tests/unit/ops/test_lambda.py: 1 warning
tests/unit/ops/test_normalize.py: 9 warnings
tests/unit/ops/test_ops.py: 11 warnings
tests/unit/ops/test_ops_schema.py: 17 warnings
tests/unit/workflow/test_workflow.py: 27 warnings
tests/unit/workflow/test_workflow_chaining.py: 1 warning
tests/unit/workflow/test_workflow_node.py: 1 warning
tests/unit/workflow/test_workflow_schemas.py: 1 warning
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(

tests/unit/test_dask_nvt.py: 12 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 8 files.
warnings.warn(

tests/unit/test_dask_nvt.py::test_merlin_core_execution_managers
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/core/utils.py:431: UserWarning: Existing Dask-client object detected in the current context. New cuda cluster will not be deployed. Set force_new to True to ignore running clusters.
warnings.warn(

tests/unit/test_notebooks.py: 1 warning
tests/unit/test_tools.py: 17 warnings
tests/unit/loader/test_tf_dataloader.py: 2 warnings
tests/unit/loader/test_torch_dataloader.py: 54 warnings
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:2940: FutureWarning: Series.ceil and DataFrame.ceil are deprecated and will be removed in the future
warnings.warn(

tests/unit/loader/test_tf_dataloader.py: 2 warnings
tests/unit/loader/test_torch_dataloader.py: 12 warnings
tests/unit/workflow/test_workflow.py: 9 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 2 files.
warnings.warn(

tests/unit/ops/test_fill.py::test_fill_missing[True-True-parquet]
tests/unit/ops/test_fill.py::test_fill_missing[True-False-parquet]
tests/unit/ops/test_ops.py::test_filter[parquet-0.1-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/pandas/core/indexing.py:1732: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self._setitem_single_block(indexer, value, name)

tests/unit/workflow/test_cpu_workflow.py: 6 warnings
tests/unit/workflow/test_workflow.py: 12 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 10 files.
warnings.warn(

tests/unit/workflow/test_workflow.py::test_gpu_workflow_api[True-True-parquet-0.01]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 35425 instead
warnings.warn(

tests/unit/workflow/test_workflow.py: 48 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 20 files.
warnings.warn(

tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_parquet_output[True-None]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-None]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-None]
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 4 files.
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED tests/unit/test_notebooks.py::test_multigpu_dask_example - subprocess....
===== 1 failed, 1420 passed, 2 skipped, 698 warnings in 704.25s (0:11:44) ======
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.github.com/repos/NVIDIA-Merlin/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins12687965631720200653.sh

@oliverholworthy
Member

The earlier Jenkins failures were due to the dask/distributed packages somehow being reverted to an older version.

The tests are now failing due to an AttributeError that may be related to this change:

File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/normalize.py", line 111, in output_dtype
return self.out_dtype or numpy.float64
AttributeError: 'Normalize' object has no attribute 'out_dtype'
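
The traceback suggests the constructor stores the dtype under a different attribute name than the one the output_dtype property reads. A minimal sketch of the alignment (illustrative only; attribute names here are assumptions, not the merged patch):

import numpy

class Normalize:
    # Illustrative skeleton; the real op lives in nvtabular/ops/normalize.py.
    def __init__(self, out_dtype=None):
        # Store the parameter under the same name the property reads below;
        # the failing build presumably saved it under a different attribute.
        self.out_dtype = out_dtype

    @property
    def output_dtype(self):
        # Fall back to fp64 when no override is requested.
        return self.out_dtype or numpy.float64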

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #1597 of commit 8ddd8fb9e322af8db4a31bde7a94c8e91fae4c25, no merge conflicts.
Running as SYSTEM
Setting status of 8ddd8fb9e322af8db4a31bde7a94c8e91fae4c25 to PENDING with url http://10.20.17.181:8080/job/nvtabular_tests/4554/ and message: 'Build started for merge commit.'
Using context: Jenkins Unit Test Run
Building on master in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA-Merlin/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/pull/1597/*:refs/remotes/origin/pr/1597/* # timeout=10
 > git rev-parse 8ddd8fb9e322af8db4a31bde7a94c8e91fae4c25^{commit} # timeout=10
Checking out Revision 8ddd8fb9e322af8db4a31bde7a94c8e91fae4c25 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 8ddd8fb9e322af8db4a31bde7a94c8e91fae4c25 # timeout=10
Commit message: "set PYTHONPATH"
 > git rev-list --no-walk 99df9374422c34e000b9d8d11da0d7061ef46aed # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins6736196258958809971.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 1422 items / 1 skipped

tests/unit/test_dask_nvt.py ............................................ [ 3%]
........................................................................ [ 8%]
[ 8%]
tests/unit/test_notebooks.py .....F [ 8%]
tests/unit/test_tf4rec.py . [ 8%]
tests/unit/test_tools.py ...................... [ 10%]
tests/unit/test_triton_inference.py ................................ [ 12%]
tests/unit/framework_utils/test_tf_feature_columns.py . [ 12%]
tests/unit/framework_utils/test_tf_layers.py ........................... [ 14%]
................................................... [ 18%]
tests/unit/framework_utils/test_torch_layers.py . [ 18%]
tests/unit/loader/test_dataloader_backend.py ...... [ 18%]
tests/unit/loader/test_tf_dataloader.py ................................ [ 20%]
........................................s.. [ 23%]
tests/unit/loader/test_torch_dataloader.py ............................. [ 25%]
...................................................... [ 29%]
tests/unit/ops/test_categorify.py ...................................... [ 32%]
........................................................................ [ 37%]
........................................... [ 40%]
tests/unit/ops/test_column_similarity.py ........................ [ 42%]
tests/unit/ops/test_drop_low_cardinality.py .. [ 42%]
tests/unit/ops/test_fill.py ............................................ [ 45%]
........ [ 45%]
tests/unit/ops/test_groupyby.py ................. [ 47%]
tests/unit/ops/test_hash_bucket.py ......................... [ 48%]
tests/unit/ops/test_join.py ............................................ [ 51%]
........................................................................ [ 56%]
.................................. [ 59%]
tests/unit/ops/test_lambda.py .......... [ 60%]
tests/unit/ops/test_normalize.py ....................................... [ 62%]
.. [ 62%]
tests/unit/ops/test_ops.py ............................................. [ 66%]
.................... [ 67%]
tests/unit/ops/test_ops_schema.py ...................................... [ 70%]
........................................................................ [ 75%]
........................................................................ [ 80%]
........................................................................ [ 85%]
....................................... [ 88%]
tests/unit/ops/test_reduce_dtype_size.py .. [ 88%]
tests/unit/ops/test_target_encode.py ..................... [ 89%]
tests/unit/workflow/test_cpu_workflow.py ...... [ 90%]
tests/unit/workflow/test_workflow.py ................................... [ 92%]
.......................................................... [ 96%]
tests/unit/workflow/test_workflow_chaining.py ... [ 96%]
tests/unit/workflow/test_workflow_node.py ........... [ 97%]
tests/unit/workflow/test_workflow_ops.py ... [ 97%]
tests/unit/workflow/test_workflow_schemas.py ........................... [ 99%]
... [100%]

=================================== FAILURES ===================================
__________________________ test_multigpu_dask_example __________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_multigpu_dask_example0')

def test_multigpu_dask_example(tmpdir):
    with get_cuda_cluster() as cuda_cluster:
        os.environ["BASE_DIR"] = str(tmpdir)
        scheduler_port = cuda_cluster.scheduler_address

        def _nb_modify(line):
            # Use cuda_cluster "fixture" port rather than allowing notebook
            # to deploy a LocalCUDACluster within the subprocess
            line = line.replace("cluster = None", f"cluster = '{scheduler_port}'")
            # Use a much smaller "toy" dataset
            line = line.replace("write_count = 25", "write_count = 4")
            line = line.replace('freq = "1s"', 'freq = "1h"')
            # Use smaller partitions for smaller dataset
            line = line.replace("part_mem_fraction=0.1", "part_size=1_000_000")
            line = line.replace("out_files_per_proc=8", "out_files_per_proc=1")
            return line

        notebook_path = os.path.join(
            dirname(TEST_PATH), "examples/multi-gpu-toy-example/", "multi-gpu_dask.ipynb"
        )
      _run_notebook(tmpdir, notebook_path, _nb_modify)

tests/unit/test_notebooks.py:287:


tests/unit/test_notebooks.py:307: in _run_notebook
subprocess.check_output([sys.executable, script_path])
/usr/lib/python3.8/subprocess.py:415: in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,


input = None, capture_output = False, timeout = None, check = True
popenargs = (['/usr/bin/python3', '/tmp/pytest-of-jenkins/pytest-3/test_multigpu_dask_example0/notebook.py'],)
kwargs = {'stdout': -1}, process = <subprocess.Popen object at 0x7f5a3b4d4700>
stdout = b'', stderr = None, retcode = 1


E subprocess.CalledProcessError: Command '['/usr/bin/python3', '/tmp/pytest-of-jenkins/pytest-3/test_multigpu_dask_example0/notebook.py']' returned non-zero exit status 1.

/usr/lib/python3.8/subprocess.py:516: CalledProcessError
----------------------------- Captured stderr call -----------------------------
2022-06-27 16:10:25,809 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize
2022-06-27 16:10:25,824 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(
Failed to transform operator <nvtabular.ops.normalize.Normalize object at 0x7fda9332ebb0>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow/workflow.py", line 519, in _transform_partition
output_df = node.op.transform(selection, input_df)
File "/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py", line 101, in inner
result = func(*args, **kwargs)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/normalize.py", line 84, in transform
values = values.astype(self.output_dtype)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/normalize.py", line 111, in output_dtype
return self.out_dtype or numpy.float64
AttributeError: 'Normalize' object has no attribute 'out_dtype'
2022-06-27 16:10:32,351 - distributed.worker - WARNING - Compute Failed
Key: ('_transform_partition-cc4489304011e1b56029a340724da0da', 0)
Function: subgraph_callable-801537e4-1ea4-496e-911c-1f63d33b
args: ([{'piece': ('/tmp/pytest-of-jenkins/pytest-3/test_multigpu_dask_example0/demo_dataset/part.0.parquet', [0], [])}])
kwargs: {}
Exception: 'AttributeError("'Normalize' object has no attribute 'out_dtype'")'

Traceback (most recent call last):
File "/tmp/pytest-of-jenkins/pytest-3/test_multigpu_dask_example0/notebook.py", line 123, in
workflow.fit_transform(dataset).to_parquet(
File "/var/jenkins_home/.local/lib/python3.8/site-packages/nvtabular/workflow/workflow.py", line 286, in fit_transform
self.fit(dataset)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/nvtabular/workflow/workflow.py", line 261, in fit
self._transform_impl(dataset, capture_dtypes=True).sample_dtypes()
File "/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py", line 1147, in sample_dtypes
_real_meta = self.engine.sample_data(n=n)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset_engine.py", line 71, in sample_data
_head = _ddf.partitions[partition_index].head(n)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/dask/dataframe/core.py", line 1196, in head
return self._head(n=n, npartitions=npartitions, compute=compute, safe=safe)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/dask/dataframe/core.py", line 1230, in _head
result = result.compute()
File "/var/jenkins_home/.local/lib/python3.8/site-packages/dask/base.py", line 292, in compute
(result,) = compute(self, traverse=False, **kwargs)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/dask/base.py", line 575, in compute
results = schedule(dsk, keys, **kwargs)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/client.py", line 3015, in get
results = self.gather(packed, asynchronous=asynchronous, direct=direct)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/client.py", line 2167, in gather
return self.sync(
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/utils.py", line 309, in sync
return sync(
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/utils.py", line 376, in sync
raise exc.with_traceback(tb)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/utils.py", line 349, in f
result = yield future
File "/var/jenkins_home/.local/lib/python3.8/site-packages/tornado-6.1-py3.8-linux-x86_64.egg/tornado/gen.py", line 762, in run
value = future.result()
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/client.py", line 2030, in _gather
raise exception.with_traceback(traceback)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/dask/optimization.py", line 990, in call
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
File "/var/jenkins_home/.local/lib/python3.8/site-packages/dask/core.py", line 149, in get
result = _execute_task(task, cache)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/dask/core.py", line 119, in _execute_task
return func(*(_execute_task(a, cache) for a in args))
File "/var/jenkins_home/.local/lib/python3.8/site-packages/dask/utils.py", line 39, in apply
return func(*args, **kwargs)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow/workflow.py", line 490, in _transform_partition
parent_df = _transform_partition(root_df, [parent], capture_dtypes=capture_dtypes)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow/workflow.py", line 519, in _transform_partition
output_df = node.op.transform(selection, input_df)
File "/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py", line 101, in inner
result = func(*args, **kwargs)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/normalize.py", line 84, in transform
values = values.astype(self.output_dtype)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/normalize.py", line 111, in output_dtype
return self.out_dtype or numpy.float64
AttributeError: 'Normalize' object has no attribute 'out_dtype'
=============================== warnings summary ===============================
../../../../../usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33
/usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
DASK_VERSION = LooseVersion(dask.__version__)

../../../.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: 34 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
other = LooseVersion(other)

nvtabular/loader/__init__.py:19
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/loader/__init__.py:19: DeprecationWarning: The nvtabular.loader module has moved to merlin.models.loader. Support for importing from nvtabular.loader is deprecated, and will be removed in a future version. Please update your imports to refer to merlin.models.loader.
warnings.warn(

tests/unit/test_dask_nvt.py: 2 warnings
tests/unit/workflow/test_workflow.py: 78 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/dask/base.py:1282: UserWarning: Running on a single-machine scheduler when a distributed client is active might lead to unexpected results.
warnings.warn(

tests/unit/test_dask_nvt.py: 1 warning
tests/unit/test_tf4rec.py: 1 warning
tests/unit/test_tools.py: 5 warnings
tests/unit/test_triton_inference.py: 8 warnings
tests/unit/loader/test_dataloader_backend.py: 6 warnings
tests/unit/loader/test_tf_dataloader.py: 66 warnings
tests/unit/loader/test_torch_dataloader.py: 67 warnings
tests/unit/ops/test_categorify.py: 69 warnings
tests/unit/ops/test_drop_low_cardinality.py: 2 warnings
tests/unit/ops/test_fill.py: 8 warnings
tests/unit/ops/test_hash_bucket.py: 4 warnings
tests/unit/ops/test_join.py: 88 warnings
tests/unit/ops/test_lambda.py: 1 warning
tests/unit/ops/test_normalize.py: 9 warnings
tests/unit/ops/test_ops.py: 11 warnings
tests/unit/ops/test_ops_schema.py: 17 warnings
tests/unit/workflow/test_workflow.py: 27 warnings
tests/unit/workflow/test_workflow_chaining.py: 1 warning
tests/unit/workflow/test_workflow_node.py: 1 warning
tests/unit/workflow/test_workflow_schemas.py: 1 warning
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(

tests/unit/test_dask_nvt.py: 12 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 8 files.
warnings.warn(

tests/unit/test_dask_nvt.py::test_merlin_core_execution_managers
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/core/utils.py:431: UserWarning: Existing Dask-client object detected in the current context. New cuda cluster will not be deployed. Set force_new to True to ignore running clusters.
warnings.warn(

tests/unit/test_notebooks.py: 1 warning
tests/unit/test_tools.py: 17 warnings
tests/unit/loader/test_tf_dataloader.py: 2 warnings
tests/unit/loader/test_torch_dataloader.py: 54 warnings
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:2940: FutureWarning: Series.ceil and DataFrame.ceil are deprecated and will be removed in the future
warnings.warn(

tests/unit/loader/test_tf_dataloader.py: 2 warnings
tests/unit/loader/test_torch_dataloader.py: 12 warnings
tests/unit/workflow/test_workflow.py: 9 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 2 files.
warnings.warn(

tests/unit/ops/test_fill.py::test_fill_missing[True-True-parquet]
tests/unit/ops/test_fill.py::test_fill_missing[True-False-parquet]
tests/unit/ops/test_ops.py::test_filter[parquet-0.1-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/pandas/core/indexing.py:1732: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self._setitem_single_block(indexer, value, name)

tests/unit/workflow/test_cpu_workflow.py: 6 warnings
tests/unit/workflow/test_workflow.py: 12 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 10 files.
warnings.warn(

tests/unit/workflow/test_workflow.py::test_gpu_workflow_api[True-True-parquet-0.01]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 35429 instead
warnings.warn(

tests/unit/workflow/test_workflow.py: 48 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 20 files.
warnings.warn(

tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_parquet_output[True-None]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-None]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-None]
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 4 files.
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED tests/unit/test_notebooks.py::test_multigpu_dask_example - subprocess....
===== 1 failed, 1420 passed, 2 skipped, 698 warnings in 706.60s (0:11:46) ======
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins15023768827440417762.sh
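
For context on the failure above: the test exercises the new Normalize dtype parameter added in this PR. A hedged usage sketch follows; the column names and workflow are made up, and the keyword name out_dtype is taken from the attribute referenced in the traceback:

import numpy as np
import nvtabular as nvt

# Hypothetical continuous columns; passing np.float32 asks Normalize
# to emit fp32 instead of its fp64 default.
conts = ["x", "y"] >> nvt.ops.Normalize(out_dtype=np.float32)
workflow = nvt.Workflow(conts)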

@benfred
Member Author

benfred commented Jun 27, 2022

rerun tests

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #1597 of commit 8ddd8fb9e322af8db4a31bde7a94c8e91fae4c25, no merge conflicts.
Running as SYSTEM
Setting status of 8ddd8fb9e322af8db4a31bde7a94c8e91fae4c25 to PENDING with url http://10.20.17.181:8080/job/nvtabular_tests/4555/ and message: 'Build started for merge commit.'
Using context: Jenkins Unit Test Run
Building on master in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA-Merlin/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/pull/1597/*:refs/remotes/origin/pr/1597/* # timeout=10
 > git rev-parse 8ddd8fb9e322af8db4a31bde7a94c8e91fae4c25^{commit} # timeout=10
Checking out Revision 8ddd8fb9e322af8db4a31bde7a94c8e91fae4c25 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 8ddd8fb9e322af8db4a31bde7a94c8e91fae4c25 # timeout=10
Commit message: "set PYTHONPATH"
 > git rev-list --no-walk 8ddd8fb9e322af8db4a31bde7a94c8e91fae4c25 # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins363619808597111859.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 1422 items / 1 skipped

tests/unit/test_dask_nvt.py ............................................ [ 3%]
........................................................................ [ 8%]
[ 8%]
tests/unit/test_notebooks.py .....F [ 8%]
tests/unit/test_tf4rec.py . [ 8%]
tests/unit/test_tools.py ...................... [ 10%]
tests/unit/test_triton_inference.py ................................ [ 12%]
tests/unit/framework_utils/test_tf_feature_columns.py . [ 12%]
tests/unit/framework_utils/test_tf_layers.py ........................... [ 14%]
................................................... [ 18%]
tests/unit/framework_utils/test_torch_layers.py . [ 18%]
tests/unit/loader/test_dataloader_backend.py ...... [ 18%]
tests/unit/loader/test_tf_dataloader.py ................................ [ 20%]
........................................s.. [ 23%]
tests/unit/loader/test_torch_dataloader.py ............................. [ 25%]
...................................................... [ 29%]
tests/unit/ops/test_categorify.py ...................................... [ 32%]
........................................................................ [ 37%]
........................................... [ 40%]
tests/unit/ops/test_column_similarity.py ........................ [ 42%]
tests/unit/ops/test_drop_low_cardinality.py .. [ 42%]
tests/unit/ops/test_fill.py ............................................ [ 45%]
........ [ 45%]
tests/unit/ops/test_groupyby.py ................. [ 47%]
tests/unit/ops/test_hash_bucket.py ......................... [ 48%]
tests/unit/ops/test_join.py ............................................ [ 51%]
........................................................................ [ 56%]
.................................. [ 59%]
tests/unit/ops/test_lambda.py .......... [ 60%]
tests/unit/ops/test_normalize.py ....................................... [ 62%]
.. [ 62%]
tests/unit/ops/test_ops.py ............................................. [ 66%]
.................... [ 67%]
tests/unit/ops/test_ops_schema.py ...................................... [ 70%]
........................................................................ [ 75%]
........................................................................ [ 80%]
........................................................................ [ 85%]
....................................... [ 88%]
tests/unit/ops/test_reduce_dtype_size.py .. [ 88%]
tests/unit/ops/test_target_encode.py ..................... [ 89%]
tests/unit/workflow/test_cpu_workflow.py ...... [ 90%]
tests/unit/workflow/test_workflow.py ................................... [ 92%]
.......................................................... [ 96%]
tests/unit/workflow/test_workflow_chaining.py ... [ 96%]
tests/unit/workflow/test_workflow_node.py ........... [ 97%]
tests/unit/workflow/test_workflow_ops.py ... [ 97%]
tests/unit/workflow/test_workflow_schemas.py ........................... [ 99%]
... [100%]

=================================== FAILURES ===================================
__________________________ test_multigpu_dask_example __________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-4/test_multigpu_dask_example0')

def test_multigpu_dask_example(tmpdir):
    with get_cuda_cluster() as cuda_cluster:
        os.environ["BASE_DIR"] = str(tmpdir)
        scheduler_port = cuda_cluster.scheduler_address

        def _nb_modify(line):
            # Use cuda_cluster "fixture" port rather than allowing notebook
            # to deploy a LocalCUDACluster within the subprocess
            line = line.replace("cluster = None", f"cluster = '{scheduler_port}'")
            # Use a much smaller "toy" dataset
            line = line.replace("write_count = 25", "write_count = 4")
            line = line.replace('freq = "1s"', 'freq = "1h"')
            # Use smaller partitions for smaller dataset
            line = line.replace("part_mem_fraction=0.1", "part_size=1_000_000")
            line = line.replace("out_files_per_proc=8", "out_files_per_proc=1")
            return line

        notebook_path = os.path.join(
            dirname(TEST_PATH), "examples/multi-gpu-toy-example/", "multi-gpu_dask.ipynb"
        )
>       _run_notebook(tmpdir, notebook_path, _nb_modify)

tests/unit/test_notebooks.py:287:


tests/unit/test_notebooks.py:307: in _run_notebook
subprocess.check_output([sys.executable, script_path])
/usr/lib/python3.8/subprocess.py:415: in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,


input = None, capture_output = False, timeout = None, check = True
popenargs = (['/usr/bin/python3', '/tmp/pytest-of-jenkins/pytest-4/test_multigpu_dask_example0/notebook.py'],)
kwargs = {'stdout': -1}, process = <subprocess.Popen object at 0x7f0d81fb4220>
stdout = b'', stderr = None, retcode = 1

def run(*popenargs,
        input=None, capture_output=False, timeout=None, check=False, **kwargs):
    """Run command with arguments and return a CompletedProcess instance.

    The returned instance will have attributes args, returncode, stdout and
    stderr. By default, stdout and stderr are not captured, and those attributes
    will be None. Pass stdout=PIPE and/or stderr=PIPE in order to capture them.

    If check is True and the exit code was non-zero, it raises a
    CalledProcessError. The CalledProcessError object will have the return code
    in the returncode attribute, and output & stderr attributes if those streams
    were captured.

    If timeout is given, and the process takes too long, a TimeoutExpired
    exception will be raised.

    There is an optional argument "input", allowing you to
    pass bytes or a string to the subprocess's stdin.  If you use this argument
    you may not also use the Popen constructor's "stdin" argument, as
    it will be used internally.

    By default, all communication is in bytes, and therefore any "input" should
    be bytes, and the stdout and stderr will be bytes. If in text mode, any
    "input" should be a string, and stdout and stderr will be strings decoded
    according to locale encoding, or by "encoding" if set. Text mode is
    triggered by setting any of text, encoding, errors or universal_newlines.

    The other arguments are the same as for the Popen constructor.
    """
    if input is not None:
        if kwargs.get('stdin') is not None:
            raise ValueError('stdin and input arguments may not both be used.')
        kwargs['stdin'] = PIPE

    if capture_output:
        if kwargs.get('stdout') is not None or kwargs.get('stderr') is not None:
            raise ValueError('stdout and stderr arguments may not be used '
                             'with capture_output.')
        kwargs['stdout'] = PIPE
        kwargs['stderr'] = PIPE

    with Popen(*popenargs, **kwargs) as process:
        try:
            stdout, stderr = process.communicate(input, timeout=timeout)
        except TimeoutExpired as exc:
            process.kill()
            if _mswindows:
                # Windows accumulates the output in a single blocking
                # read() call run on child threads, with the timeout
                # being done in a join() on those threads.  communicate()
                # _after_ kill() is required to collect that and add it
                # to the exception.
                exc.stdout, exc.stderr = process.communicate()
            else:
                # POSIX _communicate already populated the output so
                # far into the TimeoutExpired exception.
                process.wait()
            raise
        except:  # Including KeyboardInterrupt, communicate handled that.
            process.kill()
            # We don't call process.wait() as .__exit__ does that for us.
            raise
        retcode = process.poll()
        if check and retcode:
>           raise CalledProcessError(retcode, process.args,
                                     output=stdout, stderr=stderr)

E subprocess.CalledProcessError: Command '['/usr/bin/python3', '/tmp/pytest-of-jenkins/pytest-4/test_multigpu_dask_example0/notebook.py']' returned non-zero exit status 1.

/usr/lib/python3.8/subprocess.py:516: CalledProcessError
----------------------------- Captured stderr call -----------------------------
2022-06-27 16:24:38,812 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize
2022-06-27 16:24:38,893 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(
Failed to transform operator <nvtabular.ops.normalize.Normalize object at 0x7fce18319fd0>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow/workflow.py", line 519, in _transform_partition
output_df = node.op.transform(selection, input_df)
File "/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py", line 101, in inner
result = func(*args, **kwargs)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/normalize.py", line 84, in transform
values = values.astype(self.output_dtype)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/normalize.py", line 111, in output_dtype
return self.out_dtype or numpy.float64
AttributeError: 'Normalize' object has no attribute 'out_dtype'
2022-06-27 16:24:45,364 - distributed.worker - WARNING - Compute Failed
Key: ('_transform_partition-0ef7e06a0ef7da2311149dcd0bb5500b', 0)
Function: subgraph_callable-405bda3c-1888-47a6-aefa-4ef805c5
args: ([{'piece': ('/tmp/pytest-of-jenkins/pytest-4/test_multigpu_dask_example0/demo_dataset/part.0.parquet', [0], [])}])
kwargs: {}
Exception: 'AttributeError("'Normalize' object has no attribute 'out_dtype'")'

Traceback (most recent call last):
File "/tmp/pytest-of-jenkins/pytest-4/test_multigpu_dask_example0/notebook.py", line 123, in
workflow.fit_transform(dataset).to_parquet(
File "/var/jenkins_home/.local/lib/python3.8/site-packages/nvtabular/workflow/workflow.py", line 286, in fit_transform
self.fit(dataset)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/nvtabular/workflow/workflow.py", line 261, in fit
self._transform_impl(dataset, capture_dtypes=True).sample_dtypes()
File "/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py", line 1147, in sample_dtypes
_real_meta = self.engine.sample_data(n=n)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset_engine.py", line 71, in sample_data
_head = _ddf.partitions[partition_index].head(n)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/dask/dataframe/core.py", line 1196, in head
return self._head(n=n, npartitions=npartitions, compute=compute, safe=safe)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/dask/dataframe/core.py", line 1230, in _head
result = result.compute()
File "/var/jenkins_home/.local/lib/python3.8/site-packages/dask/base.py", line 292, in compute
(result,) = compute(self, traverse=False, **kwargs)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/dask/base.py", line 575, in compute
results = schedule(dsk, keys, **kwargs)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/client.py", line 3015, in get
results = self.gather(packed, asynchronous=asynchronous, direct=direct)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/client.py", line 2167, in gather
return self.sync(
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/utils.py", line 309, in sync
return sync(
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/utils.py", line 376, in sync
raise exc.with_traceback(tb)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/utils.py", line 349, in f
result = yield future
File "/var/jenkins_home/.local/lib/python3.8/site-packages/tornado-6.1-py3.8-linux-x86_64.egg/tornado/gen.py", line 762, in run
value = future.result()
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/client.py", line 2030, in _gather
raise exception.with_traceback(traceback)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/dask/optimization.py", line 990, in call
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
File "/var/jenkins_home/.local/lib/python3.8/site-packages/dask/core.py", line 149, in get
result = _execute_task(task, cache)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/dask/core.py", line 119, in _execute_task
return func(*(_execute_task(a, cache) for a in args))
File "/var/jenkins_home/.local/lib/python3.8/site-packages/dask/utils.py", line 39, in apply
return func(*args, **kwargs)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow/workflow.py", line 490, in _transform_partition
parent_df = _transform_partition(root_df, [parent], capture_dtypes=capture_dtypes)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow/workflow.py", line 519, in _transform_partition
output_df = node.op.transform(selection, input_df)
File "/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py", line 101, in inner
result = func(*args, **kwargs)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/normalize.py", line 84, in transform
values = values.astype(self.output_dtype)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/normalize.py", line 111, in output_dtype
return self.out_dtype or numpy.float64
AttributeError: 'Normalize' object has no attribute 'out_dtype'
=============================== warnings summary ===============================
../../../../../usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33
/usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
DASK_VERSION = LooseVersion(dask.__version__)

../../../.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: 34 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
other = LooseVersion(other)

nvtabular/loader/__init__.py:19
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/loader/__init__.py:19: DeprecationWarning: The nvtabular.loader module has moved to merlin.models.loader. Support for importing from nvtabular.loader is deprecated, and will be removed in a future version. Please update your imports to refer to merlin.models.loader.
warnings.warn(

tests/unit/test_dask_nvt.py: 2 warnings
tests/unit/workflow/test_workflow.py: 78 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/dask/base.py:1282: UserWarning: Running on a single-machine scheduler when a distributed client is active might lead to unexpected results.
warnings.warn(

tests/unit/test_dask_nvt.py: 1 warning
tests/unit/test_tf4rec.py: 1 warning
tests/unit/test_tools.py: 5 warnings
tests/unit/test_triton_inference.py: 8 warnings
tests/unit/loader/test_dataloader_backend.py: 6 warnings
tests/unit/loader/test_tf_dataloader.py: 66 warnings
tests/unit/loader/test_torch_dataloader.py: 67 warnings
tests/unit/ops/test_categorify.py: 69 warnings
tests/unit/ops/test_drop_low_cardinality.py: 2 warnings
tests/unit/ops/test_fill.py: 8 warnings
tests/unit/ops/test_hash_bucket.py: 4 warnings
tests/unit/ops/test_join.py: 88 warnings
tests/unit/ops/test_lambda.py: 1 warning
tests/unit/ops/test_normalize.py: 9 warnings
tests/unit/ops/test_ops.py: 11 warnings
tests/unit/ops/test_ops_schema.py: 17 warnings
tests/unit/workflow/test_workflow.py: 27 warnings
tests/unit/workflow/test_workflow_chaining.py: 1 warning
tests/unit/workflow/test_workflow_node.py: 1 warning
tests/unit/workflow/test_workflow_schemas.py: 1 warning
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(

tests/unit/test_dask_nvt.py: 12 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 8 files.
warnings.warn(

tests/unit/test_dask_nvt.py::test_merlin_core_execution_managers
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/core/utils.py:431: UserWarning: Existing Dask-client object detected in the current context. New cuda cluster will not be deployed. Set force_new to True to ignore running clusters.
warnings.warn(

tests/unit/test_notebooks.py: 1 warning
tests/unit/test_tools.py: 17 warnings
tests/unit/loader/test_tf_dataloader.py: 2 warnings
tests/unit/loader/test_torch_dataloader.py: 54 warnings
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:2940: FutureWarning: Series.ceil and DataFrame.ceil are deprecated and will be removed in the future
warnings.warn(

tests/unit/loader/test_tf_dataloader.py: 2 warnings
tests/unit/loader/test_torch_dataloader.py: 12 warnings
tests/unit/workflow/test_workflow.py: 9 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 2 files.
warnings.warn(

tests/unit/ops/test_fill.py::test_fill_missing[True-True-parquet]
tests/unit/ops/test_fill.py::test_fill_missing[True-False-parquet]
tests/unit/ops/test_ops.py::test_filter[parquet-0.1-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/pandas/core/indexing.py:1732: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self._setitem_single_block(indexer, value, name)

tests/unit/workflow/test_cpu_workflow.py: 6 warnings
tests/unit/workflow/test_workflow.py: 12 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 10 files.
warnings.warn(

tests/unit/workflow/test_workflow.py::test_gpu_workflow_api[True-True-parquet-0.01]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 36631 instead
warnings.warn(

tests/unit/workflow/test_workflow.py: 48 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 20 files.
warnings.warn(

tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_parquet_output[True-None]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-None]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-None]
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 4 files.
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED tests/unit/test_notebooks.py::test_multigpu_dask_example - subprocess....
===== 1 failed, 1420 passed, 2 skipped, 698 warnings in 706.07s (0:11:46) ======
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins2171851430870167144.sh
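
The traceback above pins down the failure mode: notebook.py drives the workflow through an older installed copy of the package (the .local site-packages paths), so the Normalize instance was constructed without out_dtype ever being assigned, while the new output_dtype property from the workspace checkout expects it. A minimal defensive sketch of the pattern, assuming nothing about the real class beyond the two names in the traceback:

import numpy

class Normalize:
    # Sketch of the dtype plumbing only; the real op standardizes values
    # using statistics gathered during fit.
    def __init__(self, out_dtype=None):
        # Assigning in __init__ guarantees the attribute exists on
        # newly constructed instances.
        self.out_dtype = out_dtype

    @property
    def output_dtype(self):
        # getattr() also tolerates instances created (or deserialized)
        # before out_dtype existed, instead of raising AttributeError.
        return getattr(self, "out_dtype", None) or numpy.float64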

@benfred
Member Author

benfred commented Jun 27, 2022

rerun tests

This reverts commit 8ddd8fb.
@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #1597 of commit 8ddd8fb9e322af8db4a31bde7a94c8e91fae4c25, no merge conflicts.
Running as SYSTEM
Setting status of 8ddd8fb9e322af8db4a31bde7a94c8e91fae4c25 to PENDING with url http://10.20.17.181:8080/job/nvtabular_tests/4556/ and message: 'Build started for merge commit.'
Using context: Jenkins Unit Test Run
Building on master in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA-Merlin/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/pull/1597/*:refs/remotes/origin/pr/1597/* # timeout=10
 > git rev-parse 8ddd8fb9e322af8db4a31bde7a94c8e91fae4c25^{commit} # timeout=10
Checking out Revision 8ddd8fb9e322af8db4a31bde7a94c8e91fae4c25 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 8ddd8fb9e322af8db4a31bde7a94c8e91fae4c25 # timeout=10
Commit message: "set PYTHONPATH"
 > git rev-list --no-walk 8ddd8fb9e322af8db4a31bde7a94c8e91fae4c25 # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins15616336802954477098.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 1422 items / 1 skipped

tests/unit/test_dask_nvt.py ............................................ [ 3%]
........................................................................ [ 8%]
[ 8%]
tests/unit/test_notebooks.py ...... [ 8%]
tests/unit/test_tf4rec.py . [ 8%]
tests/unit/test_tools.py ...................... [ 10%]
tests/unit/test_triton_inference.py ................................ [ 12%]
tests/unit/framework_utils/test_tf_feature_columns.py . [ 12%]
tests/unit/framework_utils/test_tf_layers.py ........................... [ 14%]
................................................... [ 18%]
tests/unit/framework_utils/test_torch_layers.py . [ 18%]
tests/unit/loader/test_dataloader_backend.py ...... [ 18%]
tests/unit/loader/test_tf_dataloader.py ................................ [ 20%]
........................................s.. [ 23%]
tests/unit/loader/test_torch_dataloader.py ............................. [ 25%]
...................................................... [ 29%]
tests/unit/ops/test_categorify.py ...................................... [ 32%]
........................................................................ [ 37%]
........................................... [ 40%]
tests/unit/ops/test_column_similarity.py ........................ [ 42%]
tests/unit/ops/test_drop_low_cardinality.py .. [ 42%]
tests/unit/ops/test_fill.py ............................................ [ 45%]
........ [ 45%]
tests/unit/ops/test_groupyby.py ................. [ 47%]
tests/unit/ops/test_hash_bucket.py ......................... [ 48%]
tests/unit/ops/test_join.py ............................................ [ 51%]
........................................................................ [ 56%]
.................................. [ 59%]
tests/unit/ops/test_lambda.py .......... [ 60%]
tests/unit/ops/test_normalize.py ....................................... [ 62%]
.. [ 62%]
tests/unit/ops/test_ops.py ............................................. [ 66%]
.................... [ 67%]
tests/unit/ops/test_ops_schema.py ...................................... [ 70%]
........................................................................ [ 75%]
........................................................................ [ 80%]
........................................................................ [ 85%]
....................................... [ 88%]
tests/unit/ops/test_reduce_dtype_size.py .. [ 88%]
tests/unit/ops/test_target_encode.py ..................... [ 89%]
tests/unit/workflow/test_cpu_workflow.py ...... [ 90%]
tests/unit/workflow/test_workflow.py ................................... [ 92%]
.......................................................... [ 96%]
tests/unit/workflow/test_workflow_chaining.py ... [ 96%]
tests/unit/workflow/test_workflow_node.py ........... [ 97%]
tests/unit/workflow/test_workflow_ops.py ... [ 97%]
tests/unit/workflow/test_workflow_schemas.py ........................... [ 99%]
... [100%]

=============================== warnings summary ===============================
../../../../../usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33
/usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
DASK_VERSION = LooseVersion(dask.__version__)

../../../.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: 34 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
other = LooseVersion(other)

nvtabular/loader/__init__.py:19
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/loader/__init__.py:19: DeprecationWarning: The nvtabular.loader module has moved to merlin.models.loader. Support for importing from nvtabular.loader is deprecated, and will be removed in a future version. Please update your imports to refer to merlin.models.loader.
warnings.warn(

tests/unit/test_dask_nvt.py: 2 warnings
tests/unit/workflow/test_workflow.py: 78 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/dask/base.py:1282: UserWarning: Running on a single-machine scheduler when a distributed client is active might lead to unexpected results.
warnings.warn(

tests/unit/test_dask_nvt.py: 1 warning
tests/unit/test_tf4rec.py: 1 warning
tests/unit/test_tools.py: 5 warnings
tests/unit/test_triton_inference.py: 8 warnings
tests/unit/loader/test_dataloader_backend.py: 6 warnings
tests/unit/loader/test_tf_dataloader.py: 66 warnings
tests/unit/loader/test_torch_dataloader.py: 67 warnings
tests/unit/ops/test_categorify.py: 69 warnings
tests/unit/ops/test_drop_low_cardinality.py: 2 warnings
tests/unit/ops/test_fill.py: 8 warnings
tests/unit/ops/test_hash_bucket.py: 4 warnings
tests/unit/ops/test_join.py: 88 warnings
tests/unit/ops/test_lambda.py: 1 warning
tests/unit/ops/test_normalize.py: 9 warnings
tests/unit/ops/test_ops.py: 11 warnings
tests/unit/ops/test_ops_schema.py: 17 warnings
tests/unit/workflow/test_workflow.py: 27 warnings
tests/unit/workflow/test_workflow_chaining.py: 1 warning
tests/unit/workflow/test_workflow_node.py: 1 warning
tests/unit/workflow/test_workflow_schemas.py: 1 warning
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(

tests/unit/test_dask_nvt.py: 12 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 8 files.
warnings.warn(

tests/unit/test_dask_nvt.py::test_merlin_core_execution_managers
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/core/utils.py:431: UserWarning: Existing Dask-client object detected in the current context. New cuda cluster will not be deployed. Set force_new to True to ignore running clusters.
warnings.warn(

tests/unit/test_notebooks.py: 1 warning
tests/unit/test_tools.py: 17 warnings
tests/unit/loader/test_tf_dataloader.py: 2 warnings
tests/unit/loader/test_torch_dataloader.py: 54 warnings
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:2940: FutureWarning: Series.ceil and DataFrame.ceil are deprecated and will be removed in the future
warnings.warn(

tests/unit/loader/test_tf_dataloader.py: 2 warnings
tests/unit/loader/test_torch_dataloader.py: 12 warnings
tests/unit/workflow/test_workflow.py: 9 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 2 files.
warnings.warn(

tests/unit/ops/test_fill.py::test_fill_missing[True-True-parquet]
tests/unit/ops/test_fill.py::test_fill_missing[True-False-parquet]
tests/unit/ops/test_ops.py::test_filter[parquet-0.1-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/pandas/core/indexing.py:1732: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self._setitem_single_block(indexer, value, name)

tests/unit/workflow/test_cpu_workflow.py: 6 warnings
tests/unit/workflow/test_workflow.py: 12 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 10 files.
warnings.warn(

tests/unit/workflow/test_workflow.py: 48 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 20 files.
warnings.warn(

tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_parquet_output[True-None]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-None]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-None]
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 4 files.
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
========== 1421 passed, 2 skipped, 697 warnings in 690.29s (0:11:30) ===========
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins4466419217789100275.sh
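
Aside from the pass/fail status, every run's warnings summary repeats the same distutils deprecations from dask_cudf and setuptools; the warning text itself names the replacement. A minimal sketch of that migration, with an illustrative version cutoff:

import dask
from packaging.version import Version

# The same module-level version check without the deprecated
# distutils LooseVersion classes.
DASK_VERSION = Version(dask.__version__)
NEW_ENOUGH = DASK_VERSION >= Version("2022.1.0")  # illustrative cutoff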

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #1597 of commit 9e3dd22ec645575176449555b6ffabc1eb0f917b, no merge conflicts.
Running as SYSTEM
Setting status of 9e3dd22ec645575176449555b6ffabc1eb0f917b to PENDING with url http://10.20.17.181:8080/job/nvtabular_tests/4557/ and message: 'Build started for merge commit.'
Using context: Jenkins Unit Test Run
Building on master in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA-Merlin/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/pull/1597/*:refs/remotes/origin/pr/1597/* # timeout=10
 > git rev-parse 9e3dd22ec645575176449555b6ffabc1eb0f917b^{commit} # timeout=10
Checking out Revision 9e3dd22ec645575176449555b6ffabc1eb0f917b (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 9e3dd22ec645575176449555b6ffabc1eb0f917b # timeout=10
Commit message: "Revert "set PYTHONPATH""
 > git rev-list --no-walk 8ddd8fb9e322af8db4a31bde7a94c8e91fae4c25 # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins13391713593227507191.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 1422 items / 1 skipped

tests/unit/test_dask_nvt.py ............................................ [ 3%]
........................................................................ [ 8%]
[ 8%]
tests/unit/test_notebooks.py ...... [ 8%]
tests/unit/test_tf4rec.py . [ 8%]
tests/unit/test_tools.py ...................... [ 10%]
tests/unit/test_triton_inference.py ................................ [ 12%]
tests/unit/framework_utils/test_tf_feature_columns.py . [ 12%]
tests/unit/framework_utils/test_tf_layers.py ........................... [ 14%]
................................................... [ 18%]
tests/unit/framework_utils/test_torch_layers.py . [ 18%]
tests/unit/loader/test_dataloader_backend.py ...... [ 18%]
tests/unit/loader/test_tf_dataloader.py ................................ [ 20%]
........................................s.. [ 23%]
tests/unit/loader/test_torch_dataloader.py ............................. [ 25%]
...................................................... [ 29%]
tests/unit/ops/test_categorify.py ...................................... [ 32%]
........................................................................ [ 37%]
........................................... [ 40%]
tests/unit/ops/test_column_similarity.py ........................ [ 42%]
tests/unit/ops/test_drop_low_cardinality.py .. [ 42%]
tests/unit/ops/test_fill.py ............................................ [ 45%]
........ [ 45%]
tests/unit/ops/test_groupyby.py ................. [ 47%]
tests/unit/ops/test_hash_bucket.py ......................... [ 48%]
tests/unit/ops/test_join.py ............................................ [ 51%]
........................................................................ [ 56%]
.................................. [ 59%]
tests/unit/ops/test_lambda.py .......... [ 60%]
tests/unit/ops/test_normalize.py ....................................... [ 62%]
.. [ 62%]
tests/unit/ops/test_ops.py ............................................. [ 66%]
.................... [ 67%]
tests/unit/ops/test_ops_schema.py ...................................... [ 70%]
........................................................................ [ 75%]
........................................................................ [ 80%]
........................................................................ [ 85%]
....................................... [ 88%]
tests/unit/ops/test_reduce_dtype_size.py .. [ 88%]
tests/unit/ops/test_target_encode.py ..................... [ 89%]
tests/unit/workflow/test_cpu_workflow.py ...... [ 90%]
tests/unit/workflow/test_workflow.py ................................... [ 92%]
.......................................................... [ 96%]
tests/unit/workflow/test_workflow_chaining.py ... [ 96%]
tests/unit/workflow/test_workflow_node.py ........... [ 97%]
tests/unit/workflow/test_workflow_ops.py ... [ 97%]
tests/unit/workflow/test_workflow_schemas.py ........................... [ 99%]
... [100%]

=============================== warnings summary ===============================
../../../../../usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33
/usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
DASK_VERSION = LooseVersion(dask.__version__)

../../../.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: 34 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
other = LooseVersion(other)

nvtabular/loader/__init__.py:19
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/loader/__init__.py:19: DeprecationWarning: The nvtabular.loader module has moved to merlin.models.loader. Support for importing from nvtabular.loader is deprecated, and will be removed in a future version. Please update your imports to refer to merlin.models.loader.
warnings.warn(

tests/unit/test_dask_nvt.py: 2 warnings
tests/unit/workflow/test_workflow.py: 78 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/dask/base.py:1282: UserWarning: Running on a single-machine scheduler when a distributed client is active might lead to unexpected results.
warnings.warn(

tests/unit/test_dask_nvt.py: 1 warning
tests/unit/test_tf4rec.py: 1 warning
tests/unit/test_tools.py: 5 warnings
tests/unit/test_triton_inference.py: 8 warnings
tests/unit/loader/test_dataloader_backend.py: 6 warnings
tests/unit/loader/test_tf_dataloader.py: 66 warnings
tests/unit/loader/test_torch_dataloader.py: 67 warnings
tests/unit/ops/test_categorify.py: 69 warnings
tests/unit/ops/test_drop_low_cardinality.py: 2 warnings
tests/unit/ops/test_fill.py: 8 warnings
tests/unit/ops/test_hash_bucket.py: 4 warnings
tests/unit/ops/test_join.py: 88 warnings
tests/unit/ops/test_lambda.py: 1 warning
tests/unit/ops/test_normalize.py: 9 warnings
tests/unit/ops/test_ops.py: 11 warnings
tests/unit/ops/test_ops_schema.py: 17 warnings
tests/unit/workflow/test_workflow.py: 27 warnings
tests/unit/workflow/test_workflow_chaining.py: 1 warning
tests/unit/workflow/test_workflow_node.py: 1 warning
tests/unit/workflow/test_workflow_schemas.py: 1 warning
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(

tests/unit/test_dask_nvt.py: 12 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 8 files.
warnings.warn(

tests/unit/test_dask_nvt.py::test_merlin_core_execution_managers
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/core/utils.py:431: UserWarning: Existing Dask-client object detected in the current context. New cuda cluster will not be deployed. Set force_new to True to ignore running clusters.
warnings.warn(

tests/unit/test_notebooks.py: 1 warning
tests/unit/test_tools.py: 17 warnings
tests/unit/loader/test_tf_dataloader.py: 2 warnings
tests/unit/loader/test_torch_dataloader.py: 54 warnings
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:2940: FutureWarning: Series.ceil and DataFrame.ceil are deprecated and will be removed in the future
warnings.warn(

tests/unit/loader/test_tf_dataloader.py: 2 warnings
tests/unit/loader/test_torch_dataloader.py: 12 warnings
tests/unit/workflow/test_workflow.py: 9 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 2 files.
warnings.warn(

tests/unit/ops/test_fill.py::test_fill_missing[True-True-parquet]
tests/unit/ops/test_fill.py::test_fill_missing[True-False-parquet]
tests/unit/ops/test_ops.py::test_filter[parquet-0.1-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/pandas/core/indexing.py:1732: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self._setitem_single_block(indexer, value, name)

tests/unit/workflow/test_cpu_workflow.py: 6 warnings
tests/unit/workflow/test_workflow.py: 12 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 10 files.
warnings.warn(

tests/unit/workflow/test_workflow.py: 48 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 20 files.
warnings.warn(

tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_parquet_output[True-None]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-None]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-None]
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 4 files.
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
========== 1421 passed, 2 skipped, 697 warnings in 692.93s (0:11:32) ===========
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins16557105157141183320.sh

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #1597 of commit 994df8454d11f95c07e7d699bd790d9ae595fe76, no merge conflicts.
Running as SYSTEM
Setting status of 994df8454d11f95c07e7d699bd790d9ae595fe76 to PENDING with url http://10.20.17.181:8080/job/nvtabular_tests/4558/ and message: 'Build started for merge commit.'
Using context: Jenkins Unit Test Run
Building on master in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA-Merlin/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/pull/1597/*:refs/remotes/origin/pr/1597/* # timeout=10
 > git rev-parse 994df8454d11f95c07e7d699bd790d9ae595fe76^{commit} # timeout=10
Checking out Revision 994df8454d11f95c07e7d699bd790d9ae595fe76 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 994df8454d11f95c07e7d699bd790d9ae595fe76 # timeout=10
Commit message: "Update docstring"
 > git rev-list --no-walk 9e3dd22ec645575176449555b6ffabc1eb0f917b # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins5594765192554062404.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 1422 items / 1 skipped

tests/unit/test_dask_nvt.py ............................................ [ 3%]
........................................................................ [ 8%]
[ 8%]
tests/unit/test_notebooks.py ...... [ 8%]
tests/unit/test_tf4rec.py . [ 8%]
tests/unit/test_tools.py ...................... [ 10%]
tests/unit/test_triton_inference.py ................................ [ 12%]
tests/unit/framework_utils/test_tf_feature_columns.py . [ 12%]
tests/unit/framework_utils/test_tf_layers.py ........................... [ 14%]
................................................... [ 18%]
tests/unit/framework_utils/test_torch_layers.py . [ 18%]
tests/unit/loader/test_dataloader_backend.py ...... [ 18%]
tests/unit/loader/test_tf_dataloader.py ................................ [ 20%]
........................................s.. [ 23%]
tests/unit/loader/test_torch_dataloader.py ............................. [ 25%]
...................................................... [ 29%]
tests/unit/ops/test_categorify.py ...................................... [ 32%]
........................................................................ [ 37%]
........................................... [ 40%]
tests/unit/ops/test_column_similarity.py ........................ [ 42%]
tests/unit/ops/test_drop_low_cardinality.py .. [ 42%]
tests/unit/ops/test_fill.py ............................................ [ 45%]
........ [ 45%]
tests/unit/ops/test_groupyby.py ................. [ 47%]
tests/unit/ops/test_hash_bucket.py ......................... [ 48%]
tests/unit/ops/test_join.py ............................................ [ 51%]
........................................................................ [ 56%]
.................................. [ 59%]
tests/unit/ops/test_lambda.py .......... [ 60%]
tests/unit/ops/test_normalize.py ....................................... [ 62%]
.. [ 62%]
tests/unit/ops/test_ops.py ............................................. [ 66%]
.................... [ 67%]
tests/unit/ops/test_ops_schema.py ...................................... [ 70%]
........................................................................ [ 75%]
........................................................................ [ 80%]
........................................................................ [ 85%]
....................................... [ 88%]
tests/unit/ops/test_reduce_dtype_size.py .. [ 88%]
tests/unit/ops/test_target_encode.py ..................... [ 89%]
tests/unit/workflow/test_cpu_workflow.py ...... [ 90%]
tests/unit/workflow/test_workflow.py ................................... [ 92%]
.......................................................... [ 96%]
tests/unit/workflow/test_workflow_chaining.py ... [ 96%]
tests/unit/workflow/test_workflow_node.py ........... [ 97%]
tests/unit/workflow/test_workflow_ops.py ... [ 97%]
tests/unit/workflow/test_workflow_schemas.py ........................... [ 99%]
... [100%]

=============================== warnings summary ===============================
../../../../../usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33
/usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
DASK_VERSION = LooseVersion(dask.__version__)

../../../.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: 34 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
other = LooseVersion(other)

nvtabular/loader/__init__.py:19
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/loader/__init__.py:19: DeprecationWarning: The nvtabular.loader module has moved to merlin.models.loader. Support for importing from nvtabular.loader is deprecated, and will be removed in a future version. Please update your imports to refer to merlin.models.loader.
warnings.warn(

tests/unit/test_dask_nvt.py: 2 warnings
tests/unit/workflow/test_workflow.py: 78 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/dask/base.py:1282: UserWarning: Running on a single-machine scheduler when a distributed client is active might lead to unexpected results.
warnings.warn(

tests/unit/test_dask_nvt.py: 1 warning
tests/unit/test_tf4rec.py: 1 warning
tests/unit/test_tools.py: 5 warnings
tests/unit/test_triton_inference.py: 8 warnings
tests/unit/loader/test_dataloader_backend.py: 6 warnings
tests/unit/loader/test_tf_dataloader.py: 66 warnings
tests/unit/loader/test_torch_dataloader.py: 67 warnings
tests/unit/ops/test_categorify.py: 69 warnings
tests/unit/ops/test_drop_low_cardinality.py: 2 warnings
tests/unit/ops/test_fill.py: 8 warnings
tests/unit/ops/test_hash_bucket.py: 4 warnings
tests/unit/ops/test_join.py: 88 warnings
tests/unit/ops/test_lambda.py: 1 warning
tests/unit/ops/test_normalize.py: 9 warnings
tests/unit/ops/test_ops.py: 11 warnings
tests/unit/ops/test_ops_schema.py: 17 warnings
tests/unit/workflow/test_workflow.py: 27 warnings
tests/unit/workflow/test_workflow_chaining.py: 1 warning
tests/unit/workflow/test_workflow_node.py: 1 warning
tests/unit/workflow/test_workflow_schemas.py: 1 warning
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(

tests/unit/test_dask_nvt.py: 12 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 8 files.
warnings.warn(

tests/unit/test_dask_nvt.py::test_merlin_core_execution_managers
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/core/utils.py:431: UserWarning: Existing Dask-client object detected in the current context. New cuda cluster will not be deployed. Set force_new to True to ignore running clusters.
warnings.warn(
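
The cluster warning above names its own escape hatch; a hedged sketch, assuming the Distributed execution manager exercised by test_merlin_core_execution_managers is the entry point that accepts force_new:

    from merlin.core.utils import Distributed

    # With a Dask client already active, Distributed() reuses it and warns as above;
    # force_new=True requests a fresh cluster instead (parameter name taken from the warning text).
    with Distributed(force_new=True):
        pass  # dask-backed work here runs against the newly deployed cluster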

tests/unit/test_notebooks.py: 1 warning
tests/unit/test_tools.py: 17 warnings
tests/unit/loader/test_tf_dataloader.py: 2 warnings
tests/unit/loader/test_torch_dataloader.py: 54 warnings
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:2940: FutureWarning: Series.ceil and DataFrame.ceil are deprecated and will be removed in the future
warnings.warn(

tests/unit/loader/test_tf_dataloader.py: 2 warnings
tests/unit/loader/test_torch_dataloader.py: 12 warnings
tests/unit/workflow/test_workflow.py: 9 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 2 files.
warnings.warn(

tests/unit/ops/test_fill.py::test_fill_missing[True-True-parquet]
tests/unit/ops/test_fill.py::test_fill_missing[True-False-parquet]
tests/unit/ops/test_ops.py::test_filter[parquet-0.1-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/pandas/core/indexing.py:1732: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self._setitem_single_block(indexer, value, name)
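
The SettingWithCopyWarning above is the standard chained-assignment pattern; a minimal standalone sketch of the single-.loc fix (toy frame, not the tests' data):

    import pandas as pd

    df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

    # Chained form: the slice may be a copy, so the write can be lost (and warns):
    # df[df["a"] > 1]["b"] = 0.0

    # A single .loc call indexes and assigns on the original frame in one step:
    df.loc[df["a"] > 1, "b"] = 0.0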

tests/unit/workflow/test_cpu_workflow.py: 6 warnings
tests/unit/workflow/test_workflow.py: 12 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 10 files.
warnings.warn(

tests/unit/workflow/test_workflow.py: 48 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 20 files.
warnings.warn(

tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_parquet_output[True-None]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-None]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-None]
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 4 files.
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
========== 1421 passed, 2 skipped, 697 warnings in 702.60s (0:11:42) ===========
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins1095469809036282617.sh

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #1597 of commit 209c4eacb196d1622b8e3a2a563885a6171fd7fb, no merge conflicts.
Running as SYSTEM
Setting status of 209c4eacb196d1622b8e3a2a563885a6171fd7fb to PENDING with url http://10.20.17.181:8080/job/nvtabular_tests/4560/ and message: 'Build started for merge commit.'
Using context: Jenkins Unit Test Run
Building on master in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA-Merlin/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/pull/1597/*:refs/remotes/origin/pr/1597/* # timeout=10
 > git rev-parse 209c4eacb196d1622b8e3a2a563885a6171fd7fb^{commit} # timeout=10
Checking out Revision 209c4eacb196d1622b8e3a2a563885a6171fd7fb (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 209c4eacb196d1622b8e3a2a563885a6171fd7fb # timeout=10
Commit message: "Merge branch 'main' into normalize_fp32"
 > git rev-list --no-walk feea5fce5b4f32301ff93130f130e1fe297ee0e8 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins8408759014724739806.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 1422 items / 1 skipped

tests/unit/test_dask_nvt.py ............................................ [ 3%]
........................................................................ [ 8%]
[ 8%]
tests/unit/test_notebooks.py ...... [ 8%]
tests/unit/test_tf4rec.py . [ 8%]
tests/unit/test_tools.py ...................... [ 10%]
tests/unit/test_triton_inference.py ................................ [ 12%]
tests/unit/framework_utils/test_tf_feature_columns.py . [ 12%]
tests/unit/framework_utils/test_tf_layers.py ........................... [ 14%]
................................................... [ 18%]
tests/unit/framework_utils/test_torch_layers.py . [ 18%]
tests/unit/loader/test_dataloader_backend.py ...... [ 18%]
tests/unit/loader/test_tf_dataloader.py ................................ [ 20%]
........................................s.. [ 23%]
tests/unit/loader/test_torch_dataloader.py ............................. [ 25%]
...................................................... [ 29%]
tests/unit/ops/test_categorify.py ...................................... [ 32%]
........................................................................ [ 37%]
........................................... [ 40%]
tests/unit/ops/test_column_similarity.py ........................ [ 42%]
tests/unit/ops/test_drop_low_cardinality.py .. [ 42%]
tests/unit/ops/test_fill.py ............................................ [ 45%]
........ [ 45%]
tests/unit/ops/test_groupyby.py ................. [ 47%]
tests/unit/ops/test_hash_bucket.py ......................... [ 48%]
tests/unit/ops/test_join.py ............................................ [ 51%]
........................................................................ [ 56%]
.................................. [ 59%]
tests/unit/ops/test_lambda.py .......... [ 60%]
tests/unit/ops/test_normalize.py ....................................... [ 62%]
.. [ 62%]
tests/unit/ops/test_ops.py ............................................. [ 66%]
.................... [ 67%]
tests/unit/ops/test_ops_schema.py ...................................... [ 70%]
........................................................................ [ 75%]
........................................................................ [ 80%]
........................................................................ [ 85%]
....................................... [ 88%]
tests/unit/ops/test_reduce_dtype_size.py .. [ 88%]
tests/unit/ops/test_target_encode.py ..................... [ 89%]
tests/unit/workflow/test_cpu_workflow.py ...... [ 90%]
tests/unit/workflow/test_workflow.py ................................... [ 92%]
.......................................................... [ 96%]
tests/unit/workflow/test_workflow_chaining.py ... [ 96%]
tests/unit/workflow/test_workflow_node.py ........... [ 97%]
tests/unit/workflow/test_workflow_ops.py ... [ 97%]
tests/unit/workflow/test_workflow_schemas.py ........................... [ 99%]
... [100%]

=============================== warnings summary ===============================
../../../../../usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33
/usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
DASK_VERSION = LooseVersion(dask.__version__)

../../../.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: 34 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
other = LooseVersion(other)

nvtabular/loader/__init__.py:19
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/loader/__init__.py:19: DeprecationWarning: The nvtabular.loader module has moved to merlin.models.loader. Support for importing from nvtabular.loader is deprecated, and will be removed in a future version. Please update your imports to refer to merlin.models.loader.
warnings.warn(

tests/unit/test_dask_nvt.py: 2 warnings
tests/unit/workflow/test_workflow.py: 78 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/dask/base.py:1282: UserWarning: Running on a single-machine scheduler when a distributed client is active might lead to unexpected results.
warnings.warn(

tests/unit/test_dask_nvt.py: 1 warning
tests/unit/test_tf4rec.py: 1 warning
tests/unit/test_tools.py: 5 warnings
tests/unit/test_triton_inference.py: 8 warnings
tests/unit/loader/test_dataloader_backend.py: 6 warnings
tests/unit/loader/test_tf_dataloader.py: 66 warnings
tests/unit/loader/test_torch_dataloader.py: 67 warnings
tests/unit/ops/test_categorify.py: 69 warnings
tests/unit/ops/test_drop_low_cardinality.py: 2 warnings
tests/unit/ops/test_fill.py: 8 warnings
tests/unit/ops/test_hash_bucket.py: 4 warnings
tests/unit/ops/test_join.py: 88 warnings
tests/unit/ops/test_lambda.py: 1 warning
tests/unit/ops/test_normalize.py: 9 warnings
tests/unit/ops/test_ops.py: 11 warnings
tests/unit/ops/test_ops_schema.py: 17 warnings
tests/unit/workflow/test_workflow.py: 27 warnings
tests/unit/workflow/test_workflow_chaining.py: 1 warning
tests/unit/workflow/test_workflow_node.py: 1 warning
tests/unit/workflow/test_workflow_schemas.py: 1 warning
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(

tests/unit/test_dask_nvt.py: 12 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 8 files.
warnings.warn(

tests/unit/test_dask_nvt.py::test_merlin_core_execution_managers
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/core/utils.py:431: UserWarning: Existing Dask-client object detected in the current context. New cuda cluster will not be deployed. Set force_new to True to ignore running clusters.
warnings.warn(

tests/unit/test_notebooks.py: 1 warning
tests/unit/test_tools.py: 17 warnings
tests/unit/loader/test_tf_dataloader.py: 2 warnings
tests/unit/loader/test_torch_dataloader.py: 54 warnings
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:2940: FutureWarning: Series.ceil and DataFrame.ceil are deprecated and will be removed in the future
warnings.warn(

tests/unit/loader/test_tf_dataloader.py: 2 warnings
tests/unit/loader/test_torch_dataloader.py: 12 warnings
tests/unit/workflow/test_workflow.py: 9 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 2 files.
warnings.warn(

tests/unit/ops/test_fill.py::test_fill_missing[True-True-parquet]
tests/unit/ops/test_fill.py::test_fill_missing[True-False-parquet]
tests/unit/ops/test_ops.py::test_filter[parquet-0.1-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/pandas/core/indexing.py:1732: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self._setitem_single_block(indexer, value, name)

tests/unit/workflow/test_cpu_workflow.py: 6 warnings
tests/unit/workflow/test_workflow.py: 12 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 10 files.
warnings.warn(

tests/unit/workflow/test_workflow.py: 48 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 20 files.
warnings.warn(

tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_parquet_output[True-None]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-None]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-None]
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 4 files.
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
========== 1421 passed, 2 skipped, 697 warnings in 694.76s (0:11:34) ===========
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins14227601548879509263.sh

@benfred merged commit a0fff41 into main Jun 27, 2022
@benfred deleted the normalize_fp32 branch June 27, 2022 20:50