Normalize Op using fp32 #1597

Merged: 8 commits merged into main from normalize_fp32 on Jun 27, 2022

Conversation

@benfred (Member) commented Jun 22, 2022

The Normalize op used fp64 output and was not compatible with TensorFlow.
I added an out_dtype parameter to give users the option of fp32 output instead of fp64.

Using fp64 broke some unit tests.
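
A minimal usage sketch, assuming the parameter ends up named out_dtype (the attribute referenced in the tracebacks below); the final keyword may differ:

import numpy as np
import nvtabular as nvt

# Hypothetical example: request fp32 output from Normalize so the features
# match TensorFlow's default float dtype; omitting the argument keeps fp64.
cont_features = ["x", "y"] >> nvt.ops.Normalize(out_dtype=np.float32)
workflow = nvt.Workflow(cont_features)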

@benfred (Member, Author) commented Jun 22, 2022

This is a test to see if the Jenkins CI will pick this up.

@benfred added the "bug" (Something isn't working) label on Jun 22, 2022
@nvidia-merlin-bot (Contributor)

Click to view CI Results
GitHub pull request #1597 of commit 3ea8babe9714ac7f9260027253c5cd58d2828682, no merge conflicts.
Running as SYSTEM
Setting status of 3ea8babe9714ac7f9260027253c5cd58d2828682 to PENDING with url http://10.20.17.181:8080/job/nvtabular_tests/4542/ and message: 'Build started for merge commit.'
Using context: Jenkins Unit Test Run
Building on master in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA-Merlin/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/pull/1597/*:refs/remotes/origin/pr/1597/* # timeout=10
 > git rev-parse 3ea8babe9714ac7f9260027253c5cd58d2828682^{commit} # timeout=10
Checking out Revision 3ea8babe9714ac7f9260027253c5cd58d2828682 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 3ea8babe9714ac7f9260027253c5cd58d2828682 # timeout=10
Commit message: "keep fp64"
 > git rev-list --no-walk 12c68fbc4445db27bada9b4e7c05a90c4832b366 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins14741025011272362175.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 1422 items / 1 skipped

tests/unit/test_dask_nvt.py ............................................ [ 3%]
........................................................................ [ 8%]
[ 8%]
tests/unit/test_notebooks.py .....F [ 8%]
tests/unit/test_tf4rec.py . [ 8%]
tests/unit/test_tools.py ...................... [ 10%]
tests/unit/test_triton_inference.py ................................ [ 12%]
tests/unit/framework_utils/test_tf_feature_columns.py . [ 12%]
tests/unit/framework_utils/test_tf_layers.py ........................... [ 14%]
................................................... [ 18%]
tests/unit/framework_utils/test_torch_layers.py . [ 18%]
tests/unit/loader/test_dataloader_backend.py ...... [ 18%]
tests/unit/loader/test_tf_dataloader.py ................................ [ 20%]
........................................s.. [ 23%]
tests/unit/loader/test_torch_dataloader.py ............................. [ 25%]
...................................................... [ 29%]
tests/unit/ops/test_categorify.py ...................................... [ 32%]
........................................................................ [ 37%]
........................................... [ 40%]
tests/unit/ops/test_column_similarity.py ........................ [ 42%]
tests/unit/ops/test_drop_low_cardinality.py .. [ 42%]
tests/unit/ops/test_fill.py ............................................ [ 45%]
........ [ 45%]
tests/unit/ops/test_groupyby.py ................. [ 47%]
tests/unit/ops/test_hash_bucket.py ......................... [ 48%]
tests/unit/ops/test_join.py ............................................ [ 51%]
........................................................................ [ 56%]
.................................. [ 59%]
tests/unit/ops/test_lambda.py .......... [ 60%]
tests/unit/ops/test_normalize.py ....................................... [ 62%]
.. [ 62%]
tests/unit/ops/test_ops.py ............................................. [ 66%]
.................... [ 67%]
tests/unit/ops/test_ops_schema.py ...................................... [ 70%]
........................................................................ [ 75%]
........................................................................ [ 80%]
........................................................................ [ 85%]
....................................... [ 88%]
tests/unit/ops/test_reduce_dtype_size.py .. [ 88%]
tests/unit/ops/test_target_encode.py ..................... [ 89%]
tests/unit/workflow/test_cpu_workflow.py ...... [ 90%]
tests/unit/workflow/test_workflow.py ................................... [ 92%]
.......................................................... [ 96%]
tests/unit/workflow/test_workflow_chaining.py ... [ 96%]
tests/unit/workflow/test_workflow_node.py ........... [ 97%]
tests/unit/workflow/test_workflow_ops.py ... [ 97%]
tests/unit/workflow/test_workflow_schemas.py ........................... [ 99%]
... [100%]

=================================== FAILURES ===================================
__________________________ test_multigpu_dask_example __________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_multigpu_dask_example0')

def test_multigpu_dask_example(tmpdir):
    with get_cuda_cluster() as cuda_cluster:
        os.environ["BASE_DIR"] = str(tmpdir)
        scheduler_port = cuda_cluster.scheduler_address

        def _nb_modify(line):
            # Use cuda_cluster "fixture" port rather than allowing notebook
            # to deploy a LocalCUDACluster within the subprocess
            line = line.replace("cluster = None", f"cluster = '{scheduler_port}'")
            # Use a much smaller "toy" dataset
            line = line.replace("write_count = 25", "write_count = 4")
            line = line.replace('freq = "1s"', 'freq = "1h"')
            # Use smaller partitions for smaller dataset
            line = line.replace("part_mem_fraction=0.1", "part_size=1_000_000")
            line = line.replace("out_files_per_proc=8", "out_files_per_proc=1")
            return line

        notebook_path = os.path.join(
            dirname(TEST_PATH), "examples/multi-gpu-toy-example/", "multi-gpu_dask.ipynb"
        )
>       _run_notebook(tmpdir, notebook_path, _nb_modify)

tests/unit/test_notebooks.py:287:


tests/unit/test_notebooks.py:307: in _run_notebook
subprocess.check_output([sys.executable, script_path])
/usr/lib/python3.8/subprocess.py:415: in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,


input = None, capture_output = False, timeout = None, check = True
popenargs = (['/usr/bin/python3', '/tmp/pytest-of-jenkins/pytest-3/test_multigpu_dask_example0/notebook.py'],)
kwargs = {'stdout': -1}, process = <subprocess.Popen object at 0x7fc62f00a850>
stdout = b'', stderr = None, retcode = 1

def run(*popenargs,
        input=None, capture_output=False, timeout=None, check=False, **kwargs):
    """Run command with arguments and return a CompletedProcess instance.

    The returned instance will have attributes args, returncode, stdout and
    stderr. By default, stdout and stderr are not captured, and those attributes
    will be None. Pass stdout=PIPE and/or stderr=PIPE in order to capture them.

    If check is True and the exit code was non-zero, it raises a
    CalledProcessError. The CalledProcessError object will have the return code
    in the returncode attribute, and output & stderr attributes if those streams
    were captured.

    If timeout is given, and the process takes too long, a TimeoutExpired
    exception will be raised.

    There is an optional argument "input", allowing you to
    pass bytes or a string to the subprocess's stdin.  If you use this argument
    you may not also use the Popen constructor's "stdin" argument, as
    it will be used internally.

    By default, all communication is in bytes, and therefore any "input" should
    be bytes, and the stdout and stderr will be bytes. If in text mode, any
    "input" should be a string, and stdout and stderr will be strings decoded
    according to locale encoding, or by "encoding" if set. Text mode is
    triggered by setting any of text, encoding, errors or universal_newlines.

    The other arguments are the same as for the Popen constructor.
    """
    if input is not None:
        if kwargs.get('stdin') is not None:
            raise ValueError('stdin and input arguments may not both be used.')
        kwargs['stdin'] = PIPE

    if capture_output:
        if kwargs.get('stdout') is not None or kwargs.get('stderr') is not None:
            raise ValueError('stdout and stderr arguments may not be used '
                             'with capture_output.')
        kwargs['stdout'] = PIPE
        kwargs['stderr'] = PIPE

    with Popen(*popenargs, **kwargs) as process:
        try:
            stdout, stderr = process.communicate(input, timeout=timeout)
        except TimeoutExpired as exc:
            process.kill()
            if _mswindows:
                # Windows accumulates the output in a single blocking
                # read() call run on child threads, with the timeout
                # being done in a join() on those threads.  communicate()
                # _after_ kill() is required to collect that and add it
                # to the exception.
                exc.stdout, exc.stderr = process.communicate()
            else:
                # POSIX _communicate already populated the output so
                # far into the TimeoutExpired exception.
                process.wait()
            raise
        except:  # Including KeyboardInterrupt, communicate handled that.
            process.kill()
            # We don't call process.wait() as .__exit__ does that for us.
            raise
        retcode = process.poll()
        if check and retcode:
          raise CalledProcessError(retcode, process.args,
                                     output=stdout, stderr=stderr)

E subprocess.CalledProcessError: Command '['/usr/bin/python3', '/tmp/pytest-of-jenkins/pytest-3/test_multigpu_dask_example0/notebook.py']' returned non-zero exit status 1.

/usr/lib/python3.8/subprocess.py:516: CalledProcessError
----------------------------- Captured stderr call -----------------------------
distributed.preloading - INFO - Import preload module: dask_cuda.initialize
distributed.preloading - INFO - Import preload module: dask_cuda.initialize
Unable to start CUDA Context
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/pynvml/nvml.py", line 782, in _nvmlGetFunctionPointer
_nvmlGetFunctionPointer_cache[name] = getattr(nvmlLib, name)
File "/usr/lib/python3.8/ctypes/init.py", line 386, in getattr
func = self.getitem(name)
File "/usr/lib/python3.8/ctypes/init.py", line 391, in getitem
func = self._FuncPtr((name_or_ordinal, self))
AttributeError: /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1: undefined symbol: nvmlDeviceGetComputeRunningProcesses_v2

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/dask_cuda/initialize.py", line 42, in _create_cuda_context
ctx = has_cuda_context()
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/diagnostics/nvml.py", line 78, in has_cuda_context
running_processes = pynvml.nvmlDeviceGetComputeRunningProcesses_v2(handle)
File "/usr/local/lib/python3.8/dist-packages/pynvml/nvml.py", line 2191, in nvmlDeviceGetComputeRunningProcesses_v2
fn = _nvmlGetFunctionPointer("nvmlDeviceGetComputeRunningProcesses_v2")
File "/usr/local/lib/python3.8/dist-packages/pynvml/nvml.py", line 785, in _nvmlGetFunctionPointer
raise NVMLError(NVML_ERROR_FUNCTION_NOT_FOUND)
pynvml.nvml.NVMLError_FunctionNotFound: Function Not Found
Unable to start CUDA Context
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/pynvml/nvml.py", line 782, in _nvmlGetFunctionPointer
_nvmlGetFunctionPointer_cache[name] = getattr(nvmlLib, name)
File "/usr/lib/python3.8/ctypes/init.py", line 386, in getattr
func = self.getitem(name)
File "/usr/lib/python3.8/ctypes/init.py", line 391, in getitem
func = self._FuncPtr((name_or_ordinal, self))
AttributeError: /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1: undefined symbol: nvmlDeviceGetComputeRunningProcesses_v2

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/dask_cuda/initialize.py", line 42, in _create_cuda_context
ctx = has_cuda_context()
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/diagnostics/nvml.py", line 78, in has_cuda_context
running_processes = pynvml.nvmlDeviceGetComputeRunningProcesses_v2(handle)
File "/usr/local/lib/python3.8/dist-packages/pynvml/nvml.py", line 2191, in nvmlDeviceGetComputeRunningProcesses_v2
fn = _nvmlGetFunctionPointer("nvmlDeviceGetComputeRunningProcesses_v2")
File "/usr/local/lib/python3.8/dist-packages/pynvml/nvml.py", line 785, in _nvmlGetFunctionPointer
raise NVMLError(NVML_ERROR_FUNCTION_NOT_FOUND)
pynvml.nvml.NVMLError_FunctionNotFound: Function Not Found
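
The NVML errors above are environmental rather than caused by this PR: pynvml 11.x resolves the _v2 entry point, which older driver builds of libnvidia-ml.so.1 do not export. A quick diagnostic sketch, assuming the library is loadable on the host:

import ctypes

# ctypes.CDLL.__getattr__ raises AttributeError for missing symbols, so
# hasattr reports whether the driver exports the _v2 entry point pynvml needs.
nvml = ctypes.CDLL("libnvidia-ml.so.1")
print(hasattr(nvml, "nvmlDeviceGetComputeRunningProcesses_v2"))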
/usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:1292: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(
Failed to transform operator <nvtabular.ops.normalize.Normalize object at 0x7effe81ac0a0>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow/workflow.py", line 519, in _transform_partition
output_df = node.op.transform(selection, input_df)
File "/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py", line 101, in inner
result = func(*args, **kwargs)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/normalize.py", line 84, in transform
values = values.astype(self.output_dtype)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/normalize.py", line 111, in output_dtype
return self.out_dtype or numpy.float64
AttributeError: 'Normalize' object has no attribute 'out_dtype'
distributed.worker - WARNING - Compute Failed
Function: subgraph_callable-a45a4379-4e33-482e-b1e1-5e114b51
args: ([{'piece': ('/tmp/pytest-of-jenkins/pytest-3/test_multigpu_dask_example0/demo_dataset/part.0.parquet', [0], [])}])
kwargs: {}
Exception: 'AttributeError("'Normalize' object has no attribute 'out_dtype'")'

Traceback (most recent call last):
File "/tmp/pytest-of-jenkins/pytest-3/test_multigpu_dask_example0/notebook.py", line 123, in
workflow.fit_transform(dataset).to_parquet(
File "/var/jenkins_home/.local/lib/python3.8/site-packages/nvtabular/workflow/workflow.py", line 286, in fit_transform
self.fit(dataset)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/nvtabular/workflow/workflow.py", line 261, in fit
self._transform_impl(dataset, capture_dtypes=True).sample_dtypes()
File "/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py", line 1147, in sample_dtypes
_real_meta = self.engine.sample_data(n=n)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset_engine.py", line 71, in sample_data
_head = _ddf.partitions[partition_index].head(n)
File "/usr/local/lib/python3.8/dist-packages/dask/dataframe/core.py", line 1098, in head
return self._head(n=n, npartitions=npartitions, compute=compute, safe=safe)
File "/usr/local/lib/python3.8/dist-packages/dask/dataframe/core.py", line 1132, in _head
result = result.compute()
File "/usr/local/lib/python3.8/dist-packages/dask/base.py", line 288, in compute
(result,) = compute(self, traverse=False, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/dask/base.py", line 571, in compute
results = schedule(dsk, keys, **kwargs)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/client.py", line 2725, in get
results = self.gather(packed, asynchronous=asynchronous, direct=direct)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/client.py", line 1980, in gather
return self.sync(
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/client.py", line 868, in sync
return sync(
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/utils.py", line 332, in sync
raise exc.with_traceback(tb)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/utils.py", line 315, in f
result[0] = yield future
File "/var/jenkins_home/.local/lib/python3.8/site-packages/tornado-6.1-py3.8-linux-x86_64.egg/tornado/gen.py", line 762, in run
value = future.result()
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/client.py", line 1845, in _gather
raise exception.with_traceback(traceback)
File "/usr/local/lib/python3.8/dist-packages/dask/optimization.py", line 969, in call
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
File "/usr/local/lib/python3.8/dist-packages/dask/core.py", line 149, in get
result = _execute_task(task, cache)
File "/usr/local/lib/python3.8/dist-packages/dask/core.py", line 119, in _execute_task
return func(*(_execute_task(a, cache) for a in args))
File "/usr/local/lib/python3.8/dist-packages/dask/utils.py", line 37, in apply
return func(*args, **kwargs)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow/workflow.py", line 490, in _transform_partition
parent_df = _transform_partition(root_df, [parent], capture_dtypes=capture_dtypes)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow/workflow.py", line 519, in _transform_partition
output_df = node.op.transform(selection, input_df)
File "/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py", line 101, in inner
result = func(*args, **kwargs)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/normalize.py", line 84, in transform
values = values.astype(self.output_dtype)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/normalize.py", line 111, in output_dtype
return self.out_dtype or numpy.float64
AttributeError: 'Normalize' object has no attribute 'out_dtype'
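
The actual failure is the AttributeError: the output_dtype property reads self.out_dtype, but the Normalize instance the workers executed never had that attribute assigned (note the mixed install paths above, where workflow.py loads from site-packages while normalize.py loads from the workspace checkout, suggesting a stale install on the workers). A minimal sketch of the failing pattern and a tolerant fix, not the actual NVTabular code:

import numpy

class Normalize:
    def __init__(self, out_dtype=None):
        # Assigning the attribute in __init__ covers instances built
        # from this version of the class.
        self.out_dtype = out_dtype

    @property
    def output_dtype(self):
        # getattr with a default also tolerates instances created (or
        # unpickled) before the attribute existed.
        return getattr(self, "out_dtype", None) or numpy.float64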
=============================== warnings summary ===============================
../../../../../usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:32
/usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:32: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
DASK_VERSION = LooseVersion(dask.__version__)

../../../.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: 34 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
other = LooseVersion(other)

nvtabular/loader/__init__.py:19
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/loader/__init__.py:19: DeprecationWarning: The nvtabular.loader module has moved to merlin.models.loader. Support for importing from nvtabular.loader is deprecated, and will be removed in a future version. Please update your imports to refer to merlin.models.loader.
warnings.warn(

tests/unit/test_dask_nvt.py: 2 warnings
tests/unit/test_tf4rec.py: 1 warning
tests/unit/test_tools.py: 6 warnings
tests/unit/test_triton_inference.py: 8 warnings
tests/unit/loader/test_dataloader_backend.py: 6 warnings
tests/unit/loader/test_tf_dataloader.py: 142 warnings
tests/unit/loader/test_torch_dataloader.py: 91 warnings
tests/unit/ops/test_categorify.py: 70 warnings
tests/unit/ops/test_drop_low_cardinality.py: 2 warnings
tests/unit/ops/test_fill.py: 8 warnings
tests/unit/ops/test_hash_bucket.py: 4 warnings
tests/unit/ops/test_join.py: 88 warnings
tests/unit/ops/test_lambda.py: 3 warnings
tests/unit/ops/test_normalize.py: 9 warnings
tests/unit/ops/test_ops.py: 11 warnings
tests/unit/ops/test_ops_schema.py: 17 warnings
tests/unit/workflow/test_workflow.py: 34 warnings
tests/unit/workflow/test_workflow_chaining.py: 1 warning
tests/unit/workflow/test_workflow_node.py: 1 warning
tests/unit/workflow/test_workflow_schemas.py: 1 warning
/usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:1292: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(

tests/unit/test_dask_nvt.py: 12 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 8 files.
warnings.warn(

tests/unit/test_dask_nvt.py::test_merlin_core_execution_managers
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/core/utils.py:431: UserWarning: Existing Dask-client object detected in the current context. New cuda cluster will not be deployed. Set force_new to True to ignore running clusters.
warnings.warn(

tests/unit/test_notebooks.py: 18 warnings
tests/unit/test_tools.py: 1213 warnings
tests/unit/loader/test_tf_dataloader.py: 20 warnings
tests/unit/loader/test_torch_dataloader.py: 432 warnings
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:3235: DeprecationWarning: Series.ceil and DataFrame.ceil are deprecated and will be removed in the future
warnings.warn(

tests/unit/test_tools.py::test_inspect_datagen[uniform-parquet]
tests/unit/test_tools.py::test_inspect_datagen[uniform-parquet]
tests/unit/ops/test_ops.py::test_data_stats[True-parquet]
tests/unit/ops/test_ops.py::test_data_stats[False-parquet]
/usr/local/lib/python3.8/dist-packages/cudf/core/series.py:958: FutureWarning: Series.set_index is deprecated and will be removed in the future
warnings.warn(

tests/unit/loader/test_tf_dataloader.py: 2 warnings
tests/unit/loader/test_torch_dataloader.py: 12 warnings
tests/unit/workflow/test_workflow.py: 9 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 2 files.
warnings.warn(

tests/unit/ops/test_fill.py::test_fill_missing[True-True-parquet]
tests/unit/ops/test_fill.py::test_fill_missing[True-False-parquet]
tests/unit/ops/test_ops.py::test_filter[parquet-0.1-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/pandas/core/indexing.py:1732: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self._setitem_single_block(indexer, value, name)

tests/unit/ops/test_groupyby.py::test_groupby_casting_in_aggregations[False]
/usr/local/lib/python3.8/dist-packages/cudf/core/_base_index.py:1541: FutureWarning: Calling take with a boolean array is deprecated and will be removed in the future.
warnings.warn(

tests/unit/ops/test_ops.py::test_difference_lag[False]
/usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:3025: FutureWarning: The as_gpu_matrix method will be removed in a future cuDF release. Consider using to_cupy instead.
warnings.warn(

tests/unit/workflow/test_cpu_workflow.py: 6 warnings
tests/unit/workflow/test_workflow.py: 12 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 10 files.
warnings.warn(

tests/unit/workflow/test_workflow.py::test_gpu_workflow_api[True-True-parquet-0.01]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:160: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 35833 instead
warnings.warn(

tests/unit/workflow/test_workflow.py: 48 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 20 files.
warnings.warn(

tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_parquet_output[True-None]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-None]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-None]
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 4 files.
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED tests/unit/test_notebooks.py::test_multigpu_dask_example - subprocess....
===== 1 failed, 1420 passed, 2 skipped, 2345 warnings in 716.72s (0:11:56) =====
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.github.com/repos/NVIDIA-Merlin/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins13234493373999829933.sh

@github-actions

Documentation preview

https://nvidia-merlin.github.io/NVTabular/review/pr-1597

@nvidia-merlin-bot (Contributor)

Click to view CI Results
GitHub pull request #1597 of commit 31a8e3b5157f7882b728d63c3c7258582641a470, no merge conflicts.
Running as SYSTEM
Setting status of 31a8e3b5157f7882b728d63c3c7258582641a470 to PENDING with url http://10.20.17.181:8080/job/nvtabular_tests/4544/ and message: 'Build started for merge commit.'
Using context: Jenkins Unit Test Run
Building on master in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA-Merlin/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/pull/1597/*:refs/remotes/origin/pr/1597/* # timeout=10
 > git rev-parse 31a8e3b5157f7882b728d63c3c7258582641a470^{commit} # timeout=10
Checking out Revision 31a8e3b5157f7882b728d63c3c7258582641a470 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 31a8e3b5157f7882b728d63c3c7258582641a470 # timeout=10
Commit message: "Merge branch 'main' into normalize_fp32"
 > git rev-list --no-walk 12c68fbc4445db27bada9b4e7c05a90c4832b366 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins6465818130657288239.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 1422 items / 1 skipped

tests/unit/test_dask_nvt.py ............................................ [ 3%]
........................................................................ [ 8%]
[ 8%]
tests/unit/test_notebooks.py .....F [ 8%]
tests/unit/test_tf4rec.py . [ 8%]
tests/unit/test_tools.py ...................... [ 10%]
tests/unit/test_triton_inference.py ................................ [ 12%]
tests/unit/framework_utils/test_tf_feature_columns.py . [ 12%]
tests/unit/framework_utils/test_tf_layers.py ........................... [ 14%]
................................................... [ 18%]
tests/unit/framework_utils/test_torch_layers.py . [ 18%]
tests/unit/loader/test_dataloader_backend.py ...... [ 18%]
tests/unit/loader/test_tf_dataloader.py ................................ [ 20%]
........................................s.. [ 23%]
tests/unit/loader/test_torch_dataloader.py ............................. [ 25%]
...................................................... [ 29%]
tests/unit/ops/test_categorify.py ...................................... [ 32%]
........................................................................ [ 37%]
........................................... [ 40%]
tests/unit/ops/test_column_similarity.py ........................ [ 42%]
tests/unit/ops/test_drop_low_cardinality.py .. [ 42%]
tests/unit/ops/test_fill.py ............................................ [ 45%]
........ [ 45%]
tests/unit/ops/test_groupyby.py ................. [ 47%]
tests/unit/ops/test_hash_bucket.py ......................... [ 48%]
tests/unit/ops/test_join.py ............................................ [ 51%]
........................................................................ [ 56%]
.................................. [ 59%]
tests/unit/ops/test_lambda.py .......... [ 60%]
tests/unit/ops/test_normalize.py ....................................... [ 62%]
.. [ 62%]
tests/unit/ops/test_ops.py ............................................. [ 66%]
.................... [ 67%]
tests/unit/ops/test_ops_schema.py ...................................... [ 70%]
........................................................................ [ 75%]
........................................................................ [ 80%]
........................................................................ [ 85%]
....................................... [ 88%]
tests/unit/ops/test_reduce_dtype_size.py .. [ 88%]
tests/unit/ops/test_target_encode.py ..................... [ 89%]
tests/unit/workflow/test_cpu_workflow.py ...... [ 90%]
tests/unit/workflow/test_workflow.py ................................... [ 92%]
.......................................................... [ 96%]
tests/unit/workflow/test_workflow_chaining.py ... [ 96%]
tests/unit/workflow/test_workflow_node.py ........... [ 97%]
tests/unit/workflow/test_workflow_ops.py ... [ 97%]
tests/unit/workflow/test_workflow_schemas.py ........................... [ 99%]
... [100%]

=================================== FAILURES ===================================
__________________________ test_multigpu_dask_example __________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-5/test_multigpu_dask_example0')

def test_multigpu_dask_example(tmpdir):
    with get_cuda_cluster() as cuda_cluster:
        os.environ["BASE_DIR"] = str(tmpdir)
        scheduler_port = cuda_cluster.scheduler_address

        def _nb_modify(line):
            # Use cuda_cluster "fixture" port rather than allowing notebook
            # to deploy a LocalCUDACluster within the subprocess
            line = line.replace("cluster = None", f"cluster = '{scheduler_port}'")
            # Use a much smaller "toy" dataset
            line = line.replace("write_count = 25", "write_count = 4")
            line = line.replace('freq = "1s"', 'freq = "1h"')
            # Use smaller partitions for smaller dataset
            line = line.replace("part_mem_fraction=0.1", "part_size=1_000_000")
            line = line.replace("out_files_per_proc=8", "out_files_per_proc=1")
            return line

        notebook_path = os.path.join(
            dirname(TEST_PATH), "examples/multi-gpu-toy-example/", "multi-gpu_dask.ipynb"
        )
>       _run_notebook(tmpdir, notebook_path, _nb_modify)

tests/unit/test_notebooks.py:287:


tests/unit/test_notebooks.py:307: in _run_notebook
subprocess.check_output([sys.executable, script_path])
/usr/lib/python3.8/subprocess.py:415: in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,


input = None, capture_output = False, timeout = None, check = True
popenargs = (['/usr/bin/python3', '/tmp/pytest-of-jenkins/pytest-5/test_multigpu_dask_example0/notebook.py'],)
kwargs = {'stdout': -1}, process = <subprocess.Popen object at 0x7f001d17ff10>
stdout = b'', stderr = None, retcode = 1

def run(*popenargs,
        input=None, capture_output=False, timeout=None, check=False, **kwargs):
    """Run command with arguments and return a CompletedProcess instance.

    The returned instance will have attributes args, returncode, stdout and
    stderr. By default, stdout and stderr are not captured, and those attributes
    will be None. Pass stdout=PIPE and/or stderr=PIPE in order to capture them.

    If check is True and the exit code was non-zero, it raises a
    CalledProcessError. The CalledProcessError object will have the return code
    in the returncode attribute, and output & stderr attributes if those streams
    were captured.

    If timeout is given, and the process takes too long, a TimeoutExpired
    exception will be raised.

    There is an optional argument "input", allowing you to
    pass bytes or a string to the subprocess's stdin.  If you use this argument
    you may not also use the Popen constructor's "stdin" argument, as
    it will be used internally.

    By default, all communication is in bytes, and therefore any "input" should
    be bytes, and the stdout and stderr will be bytes. If in text mode, any
    "input" should be a string, and stdout and stderr will be strings decoded
    according to locale encoding, or by "encoding" if set. Text mode is
    triggered by setting any of text, encoding, errors or universal_newlines.

    The other arguments are the same as for the Popen constructor.
    """
    if input is not None:
        if kwargs.get('stdin') is not None:
            raise ValueError('stdin and input arguments may not both be used.')
        kwargs['stdin'] = PIPE

    if capture_output:
        if kwargs.get('stdout') is not None or kwargs.get('stderr') is not None:
            raise ValueError('stdout and stderr arguments may not be used '
                             'with capture_output.')
        kwargs['stdout'] = PIPE
        kwargs['stderr'] = PIPE

    with Popen(*popenargs, **kwargs) as process:
        try:
            stdout, stderr = process.communicate(input, timeout=timeout)
        except TimeoutExpired as exc:
            process.kill()
            if _mswindows:
                # Windows accumulates the output in a single blocking
                # read() call run on child threads, with the timeout
                # being done in a join() on those threads.  communicate()
                # _after_ kill() is required to collect that and add it
                # to the exception.
                exc.stdout, exc.stderr = process.communicate()
            else:
                # POSIX _communicate already populated the output so
                # far into the TimeoutExpired exception.
                process.wait()
            raise
        except:  # Including KeyboardInterrupt, communicate handled that.
            process.kill()
            # We don't call process.wait() as .__exit__ does that for us.
            raise
        retcode = process.poll()
        if check and retcode:
          raise CalledProcessError(retcode, process.args,
                                     output=stdout, stderr=stderr)

E subprocess.CalledProcessError: Command '['/usr/bin/python3', '/tmp/pytest-of-jenkins/pytest-5/test_multigpu_dask_example0/notebook.py']' returned non-zero exit status 1.

/usr/lib/python3.8/subprocess.py:516: CalledProcessError
----------------------------- Captured stderr call -----------------------------
distributed.preloading - INFO - Import preload module: dask_cuda.initialize
Unable to start CUDA Context
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/pynvml/nvml.py", line 782, in _nvmlGetFunctionPointer
_nvmlGetFunctionPointer_cache[name] = getattr(nvmlLib, name)
File "/usr/lib/python3.8/ctypes/init.py", line 386, in getattr
func = self.getitem(name)
File "/usr/lib/python3.8/ctypes/init.py", line 391, in getitem
func = self._FuncPtr((name_or_ordinal, self))
AttributeError: /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1: undefined symbol: nvmlDeviceGetComputeRunningProcesses_v2

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/dask_cuda/initialize.py", line 42, in _create_cuda_context
ctx = has_cuda_context()
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/diagnostics/nvml.py", line 78, in has_cuda_context
running_processes = pynvml.nvmlDeviceGetComputeRunningProcesses_v2(handle)
File "/usr/local/lib/python3.8/dist-packages/pynvml/nvml.py", line 2191, in nvmlDeviceGetComputeRunningProcesses_v2
fn = _nvmlGetFunctionPointer("nvmlDeviceGetComputeRunningProcesses_v2")
File "/usr/local/lib/python3.8/dist-packages/pynvml/nvml.py", line 785, in _nvmlGetFunctionPointer
raise NVMLError(NVML_ERROR_FUNCTION_NOT_FOUND)
pynvml.nvml.NVMLError_FunctionNotFound: Function Not Found
distributed.preloading - INFO - Import preload module: dask_cuda.initialize
Unable to start CUDA Context
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/pynvml/nvml.py", line 782, in _nvmlGetFunctionPointer
_nvmlGetFunctionPointer_cache[name] = getattr(nvmlLib, name)
File "/usr/lib/python3.8/ctypes/init.py", line 386, in getattr
func = self.getitem(name)
File "/usr/lib/python3.8/ctypes/init.py", line 391, in getitem
func = self._FuncPtr((name_or_ordinal, self))
AttributeError: /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1: undefined symbol: nvmlDeviceGetComputeRunningProcesses_v2

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/dask_cuda/initialize.py", line 42, in _create_cuda_context
ctx = has_cuda_context()
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/diagnostics/nvml.py", line 78, in has_cuda_context
running_processes = pynvml.nvmlDeviceGetComputeRunningProcesses_v2(handle)
File "/usr/local/lib/python3.8/dist-packages/pynvml/nvml.py", line 2191, in nvmlDeviceGetComputeRunningProcesses_v2
fn = _nvmlGetFunctionPointer("nvmlDeviceGetComputeRunningProcesses_v2")
File "/usr/local/lib/python3.8/dist-packages/pynvml/nvml.py", line 785, in _nvmlGetFunctionPointer
raise NVMLError(NVML_ERROR_FUNCTION_NOT_FOUND)
pynvml.nvml.NVMLError_FunctionNotFound: Function Not Found
/usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:1292: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(
Failed to transform operator <nvtabular.ops.normalize.Normalize object at 0x7ff16825ab50>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow/workflow.py", line 519, in _transform_partition
output_df = node.op.transform(selection, input_df)
File "/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py", line 101, in inner
result = func(*args, **kwargs)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/normalize.py", line 84, in transform
values = values.astype(self.output_dtype)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/normalize.py", line 111, in output_dtype
return self.out_dtype or numpy.float64
AttributeError: 'Normalize' object has no attribute 'out_dtype'
distributed.worker - WARNING - Compute Failed
Function: subgraph_callable-a176c54b-372a-4240-a5fa-9173c6b8
args: ([{'piece': ('/tmp/pytest-of-jenkins/pytest-5/test_multigpu_dask_example0/demo_dataset/part.0.parquet', [0], [])}])
kwargs: {}
Exception: 'AttributeError("'Normalize' object has no attribute 'out_dtype'")'

Traceback (most recent call last):
File "/tmp/pytest-of-jenkins/pytest-5/test_multigpu_dask_example0/notebook.py", line 123, in
workflow.fit_transform(dataset).to_parquet(
File "/var/jenkins_home/.local/lib/python3.8/site-packages/nvtabular/workflow/workflow.py", line 286, in fit_transform
self.fit(dataset)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/nvtabular/workflow/workflow.py", line 261, in fit
self._transform_impl(dataset, capture_dtypes=True).sample_dtypes()
File "/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py", line 1147, in sample_dtypes
_real_meta = self.engine.sample_data(n=n)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset_engine.py", line 71, in sample_data
_head = _ddf.partitions[partition_index].head(n)
File "/usr/local/lib/python3.8/dist-packages/dask/dataframe/core.py", line 1098, in head
return self._head(n=n, npartitions=npartitions, compute=compute, safe=safe)
File "/usr/local/lib/python3.8/dist-packages/dask/dataframe/core.py", line 1132, in _head
result = result.compute()
File "/usr/local/lib/python3.8/dist-packages/dask/base.py", line 288, in compute
(result,) = compute(self, traverse=False, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/dask/base.py", line 571, in compute
results = schedule(dsk, keys, **kwargs)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/client.py", line 2725, in get
results = self.gather(packed, asynchronous=asynchronous, direct=direct)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/client.py", line 1980, in gather
return self.sync(
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/client.py", line 868, in sync
return sync(
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/utils.py", line 332, in sync
raise exc.with_traceback(tb)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/utils.py", line 315, in f
result[0] = yield future
File "/var/jenkins_home/.local/lib/python3.8/site-packages/tornado-6.1-py3.8-linux-x86_64.egg/tornado/gen.py", line 762, in run
value = future.result()
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/client.py", line 1845, in _gather
raise exception.with_traceback(traceback)
File "/usr/local/lib/python3.8/dist-packages/dask/optimization.py", line 969, in call
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
File "/usr/local/lib/python3.8/dist-packages/dask/core.py", line 149, in get
result = _execute_task(task, cache)
File "/usr/local/lib/python3.8/dist-packages/dask/core.py", line 119, in _execute_task
return func(*(_execute_task(a, cache) for a in args))
File "/usr/local/lib/python3.8/dist-packages/dask/utils.py", line 37, in apply
return func(*args, **kwargs)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow/workflow.py", line 490, in _transform_partition
parent_df = _transform_partition(root_df, [parent], capture_dtypes=capture_dtypes)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow/workflow.py", line 519, in _transform_partition
output_df = node.op.transform(selection, input_df)
File "/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py", line 101, in inner
result = func(*args, **kwargs)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/normalize.py", line 84, in transform
values = values.astype(self.output_dtype)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/normalize.py", line 111, in output_dtype
return self.out_dtype or numpy.float64
AttributeError: 'Normalize' object has no attribute 'out_dtype'
=============================== warnings summary ===============================
../../../../../usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:32
/usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:32: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
DASK_VERSION = LooseVersion(dask.__version__)

../../../.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: 34 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
other = LooseVersion(other)

nvtabular/loader/__init__.py:19
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/loader/__init__.py:19: DeprecationWarning: The nvtabular.loader module has moved to merlin.models.loader. Support for importing from nvtabular.loader is deprecated, and will be removed in a future version. Please update your imports to refer to merlin.models.loader.
warnings.warn(

tests/unit/test_dask_nvt.py: 2 warnings
tests/unit/test_tf4rec.py: 1 warning
tests/unit/test_tools.py: 6 warnings
tests/unit/test_triton_inference.py: 8 warnings
tests/unit/loader/test_dataloader_backend.py: 6 warnings
tests/unit/loader/test_tf_dataloader.py: 142 warnings
tests/unit/loader/test_torch_dataloader.py: 91 warnings
tests/unit/ops/test_categorify.py: 70 warnings
tests/unit/ops/test_drop_low_cardinality.py: 2 warnings
tests/unit/ops/test_fill.py: 8 warnings
tests/unit/ops/test_hash_bucket.py: 4 warnings
tests/unit/ops/test_join.py: 88 warnings
tests/unit/ops/test_lambda.py: 3 warnings
tests/unit/ops/test_normalize.py: 9 warnings
tests/unit/ops/test_ops.py: 11 warnings
tests/unit/ops/test_ops_schema.py: 17 warnings
tests/unit/workflow/test_workflow.py: 34 warnings
tests/unit/workflow/test_workflow_chaining.py: 1 warning
tests/unit/workflow/test_workflow_node.py: 1 warning
tests/unit/workflow/test_workflow_schemas.py: 1 warning
/usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:1292: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(

tests/unit/test_dask_nvt.py: 12 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 8 files.
warnings.warn(

tests/unit/test_dask_nvt.py::test_merlin_core_execution_managers
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/core/utils.py:431: UserWarning: Existing Dask-client object detected in the current context. New cuda cluster will not be deployed. Set force_new to True to ignore running clusters.
warnings.warn(

tests/unit/test_notebooks.py: 18 warnings
tests/unit/test_tools.py: 1213 warnings
tests/unit/loader/test_tf_dataloader.py: 20 warnings
tests/unit/loader/test_torch_dataloader.py: 432 warnings
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:3235: DeprecationWarning: Series.ceil and DataFrame.ceil are deprecated and will be removed in the future
warnings.warn(

tests/unit/test_tools.py::test_inspect_datagen[uniform-parquet]
tests/unit/test_tools.py::test_inspect_datagen[uniform-parquet]
tests/unit/ops/test_ops.py::test_data_stats[True-parquet]
tests/unit/ops/test_ops.py::test_data_stats[False-parquet]
/usr/local/lib/python3.8/dist-packages/cudf/core/series.py:958: FutureWarning: Series.set_index is deprecated and will be removed in the future
warnings.warn(

tests/unit/loader/test_tf_dataloader.py: 2 warnings
tests/unit/loader/test_torch_dataloader.py: 12 warnings
tests/unit/workflow/test_workflow.py: 9 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 2 files.
warnings.warn(

tests/unit/ops/test_fill.py::test_fill_missing[True-True-parquet]
tests/unit/ops/test_fill.py::test_fill_missing[True-False-parquet]
tests/unit/ops/test_ops.py::test_filter[parquet-0.1-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/pandas/core/indexing.py:1732: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self._setitem_single_block(indexer, value, name)

tests/unit/ops/test_groupyby.py::test_groupby_casting_in_aggregations[False]
/usr/local/lib/python3.8/dist-packages/cudf/core/_base_index.py:1541: FutureWarning: Calling take with a boolean array is deprecated and will be removed in the future.
warnings.warn(

tests/unit/ops/test_ops.py::test_difference_lag[False]
/usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:3025: FutureWarning: The as_gpu_matrix method will be removed in a future cuDF release. Consider using to_cupy instead.
warnings.warn(

tests/unit/workflow/test_cpu_workflow.py: 6 warnings
tests/unit/workflow/test_workflow.py: 12 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 10 files.
warnings.warn(

tests/unit/workflow/test_workflow.py::test_gpu_workflow_api[True-True-parquet-0.01]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:160: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 42281 instead
warnings.warn(

tests/unit/workflow/test_workflow.py: 48 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 20 files.
warnings.warn(

tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_parquet_output[True-None]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-None]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-None]
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 4 files.
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED tests/unit/test_notebooks.py::test_multigpu_dask_example - subprocess....
===== 1 failed, 1420 passed, 2 skipped, 2345 warnings in 722.78s (0:12:02) =====
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.github.com/repos/NVIDIA-Merlin/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins14541306371745372241.sh

@benfred (Member, Author) commented Jun 23, 2022

rerun tests

@nvidia-merlin-bot (Contributor)

Click to view CI Results
GitHub pull request #1597 of commit 31a8e3b5157f7882b728d63c3c7258582641a470, no merge conflicts.
Running as SYSTEM
Setting status of 31a8e3b5157f7882b728d63c3c7258582641a470 to PENDING with url http://10.20.17.181:8080/job/nvtabular_tests/4545/ and message: 'Build started for merge commit.'
Using context: Jenkins Unit Test Run
Building on master in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA-Merlin/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/pull/1597/*:refs/remotes/origin/pr/1597/* # timeout=10
 > git rev-parse 31a8e3b5157f7882b728d63c3c7258582641a470^{commit} # timeout=10
Checking out Revision 31a8e3b5157f7882b728d63c3c7258582641a470 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 31a8e3b5157f7882b728d63c3c7258582641a470 # timeout=10
Commit message: "Merge branch 'main' into normalize_fp32"
 > git rev-list --no-walk 31a8e3b5157f7882b728d63c3c7258582641a470 # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins4152090352994964919.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 1422 items / 1 skipped

tests/unit/test_dask_nvt.py ............................................ [ 3%]
........................................................................ [ 8%]
[ 8%]
tests/unit/test_notebooks.py .....F [ 8%]
tests/unit/test_tf4rec.py . [ 8%]
tests/unit/test_tools.py ...................... [ 10%]
tests/unit/test_triton_inference.py ................................ [ 12%]
tests/unit/framework_utils/test_tf_feature_columns.py . [ 12%]
tests/unit/framework_utils/test_tf_layers.py ........................... [ 14%]
................................................... [ 18%]
tests/unit/framework_utils/test_torch_layers.py . [ 18%]
tests/unit/loader/test_dataloader_backend.py ...... [ 18%]
tests/unit/loader/test_tf_dataloader.py ................................ [ 20%]
........................................s.. [ 23%]
tests/unit/loader/test_torch_dataloader.py ............................. [ 25%]
...................................................... [ 29%]
tests/unit/ops/test_categorify.py ...................................... [ 32%]
........................................................................ [ 37%]
........................................... [ 40%]
tests/unit/ops/test_column_similarity.py ........................ [ 42%]
tests/unit/ops/test_drop_low_cardinality.py .. [ 42%]
tests/unit/ops/test_fill.py ............................................ [ 45%]
........ [ 45%]
tests/unit/ops/test_groupyby.py ................. [ 47%]
tests/unit/ops/test_hash_bucket.py ......................... [ 48%]
tests/unit/ops/test_join.py ............................................ [ 51%]
........................................................................ [ 56%]
.................................. [ 59%]
tests/unit/ops/test_lambda.py .......... [ 60%]
tests/unit/ops/test_normalize.py ....................................... [ 62%]
.. [ 62%]
tests/unit/ops/test_ops.py ............................................. [ 66%]
.................... [ 67%]
tests/unit/ops/test_ops_schema.py ...................................... [ 70%]
........................................................................ [ 75%]
........................................................................ [ 80%]
........................................................................ [ 85%]
....................................... [ 88%]
tests/unit/ops/test_reduce_dtype_size.py .. [ 88%]
tests/unit/ops/test_target_encode.py ..................... [ 89%]
tests/unit/workflow/test_cpu_workflow.py ...... [ 90%]
tests/unit/workflow/test_workflow.py ................................... [ 92%]
.......................................................... [ 96%]
tests/unit/workflow/test_workflow_chaining.py ... [ 96%]
tests/unit/workflow/test_workflow_node.py ........... [ 97%]
tests/unit/workflow/test_workflow_ops.py ... [ 97%]
tests/unit/workflow/test_workflow_schemas.py ........................... [ 99%]
... [100%]

=================================== FAILURES ===================================
__________________________ test_multigpu_dask_example __________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-6/test_multigpu_dask_example0')

def test_multigpu_dask_example(tmpdir):
    with get_cuda_cluster() as cuda_cluster:
        os.environ["BASE_DIR"] = str(tmpdir)
        scheduler_port = cuda_cluster.scheduler_address

        def _nb_modify(line):
            # Use cuda_cluster "fixture" port rather than allowing notebook
            # to deploy a LocalCUDACluster within the subprocess
            line = line.replace("cluster = None", f"cluster = '{scheduler_port}'")
            # Use a much smaller "toy" dataset
            line = line.replace("write_count = 25", "write_count = 4")
            line = line.replace('freq = "1s"', 'freq = "1h"')
            # Use smaller partitions for smaller dataset
            line = line.replace("part_mem_fraction=0.1", "part_size=1_000_000")
            line = line.replace("out_files_per_proc=8", "out_files_per_proc=1")
            return line

        notebook_path = os.path.join(
            dirname(TEST_PATH), "examples/multi-gpu-toy-example/", "multi-gpu_dask.ipynb"
        )
>       _run_notebook(tmpdir, notebook_path, _nb_modify)

tests/unit/test_notebooks.py:287:


tests/unit/test_notebooks.py:307: in _run_notebook
subprocess.check_output([sys.executable, script_path])
/usr/lib/python3.8/subprocess.py:415: in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,


input = None, capture_output = False, timeout = None, check = True
popenargs = (['/usr/bin/python3', '/tmp/pytest-of-jenkins/pytest-6/test_multigpu_dask_example0/notebook.py'],)
kwargs = {'stdout': -1}, process = <subprocess.Popen object at 0x7f005d441940>
stdout = b'', stderr = None, retcode = 1

def run(*popenargs,
        input=None, capture_output=False, timeout=None, check=False, **kwargs):
    """Run command with arguments and return a CompletedProcess instance.

    The returned instance will have attributes args, returncode, stdout and
    stderr. By default, stdout and stderr are not captured, and those attributes
    will be None. Pass stdout=PIPE and/or stderr=PIPE in order to capture them.

    If check is True and the exit code was non-zero, it raises a
    CalledProcessError. The CalledProcessError object will have the return code
    in the returncode attribute, and output & stderr attributes if those streams
    were captured.

    If timeout is given, and the process takes too long, a TimeoutExpired
    exception will be raised.

    There is an optional argument "input", allowing you to
    pass bytes or a string to the subprocess's stdin.  If you use this argument
    you may not also use the Popen constructor's "stdin" argument, as
    it will be used internally.

    By default, all communication is in bytes, and therefore any "input" should
    be bytes, and the stdout and stderr will be bytes. If in text mode, any
    "input" should be a string, and stdout and stderr will be strings decoded
    according to locale encoding, or by "encoding" if set. Text mode is
    triggered by setting any of text, encoding, errors or universal_newlines.

    The other arguments are the same as for the Popen constructor.
    """
    if input is not None:
        if kwargs.get('stdin') is not None:
            raise ValueError('stdin and input arguments may not both be used.')
        kwargs['stdin'] = PIPE

    if capture_output:
        if kwargs.get('stdout') is not None or kwargs.get('stderr') is not None:
            raise ValueError('stdout and stderr arguments may not be used '
                             'with capture_output.')
        kwargs['stdout'] = PIPE
        kwargs['stderr'] = PIPE

    with Popen(*popenargs, **kwargs) as process:
        try:
            stdout, stderr = process.communicate(input, timeout=timeout)
        except TimeoutExpired as exc:
            process.kill()
            if _mswindows:
                # Windows accumulates the output in a single blocking
                # read() call run on child threads, with the timeout
                # being done in a join() on those threads.  communicate()
                # _after_ kill() is required to collect that and add it
                # to the exception.
                exc.stdout, exc.stderr = process.communicate()
            else:
                # POSIX _communicate already populated the output so
                # far into the TimeoutExpired exception.
                process.wait()
            raise
        except:  # Including KeyboardInterrupt, communicate handled that.
            process.kill()
            # We don't call process.wait() as .__exit__ does that for us.
            raise
        retcode = process.poll()
        if check and retcode:
>           raise CalledProcessError(retcode, process.args,
                                     output=stdout, stderr=stderr)

E subprocess.CalledProcessError: Command '['/usr/bin/python3', '/tmp/pytest-of-jenkins/pytest-6/test_multigpu_dask_example0/notebook.py']' returned non-zero exit status 1.

/usr/lib/python3.8/subprocess.py:516: CalledProcessError
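A note on the failure mechanism here: _run_notebook calls subprocess.check_output, which passes check=True to run(), so any nonzero exit status from the generated notebook script is re-raised in pytest as the CalledProcessError above. A minimal sketch of that behavior (hypothetical one-liner command, not from the test suite):

import subprocess

# check_output passes check=True, so a nonzero exit status becomes
# CalledProcessError instead of being silently returned
try:
    subprocess.check_output(["python3", "-c", "raise SystemExit(1)"])
except subprocess.CalledProcessError as exc:
    print("script failed with status", exc.returncode)  # prints 1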
----------------------------- Captured stderr call -----------------------------
distributed.preloading - INFO - Import preload module: dask_cuda.initialize
distributed.preloading - INFO - Import preload module: dask_cuda.initialize
Unable to start CUDA Context
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/pynvml/nvml.py", line 782, in _nvmlGetFunctionPointer
_nvmlGetFunctionPointer_cache[name] = getattr(nvmlLib, name)
File "/usr/lib/python3.8/ctypes/init.py", line 386, in getattr
func = self.getitem(name)
File "/usr/lib/python3.8/ctypes/init.py", line 391, in getitem
func = self._FuncPtr((name_or_ordinal, self))
AttributeError: /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1: undefined symbol: nvmlDeviceGetComputeRunningProcesses_v2

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/dask_cuda/initialize.py", line 42, in _create_cuda_context
ctx = has_cuda_context()
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/diagnostics/nvml.py", line 78, in has_cuda_context
running_processes = pynvml.nvmlDeviceGetComputeRunningProcesses_v2(handle)
File "/usr/local/lib/python3.8/dist-packages/pynvml/nvml.py", line 2191, in nvmlDeviceGetComputeRunningProcesses_v2
fn = _nvmlGetFunctionPointer("nvmlDeviceGetComputeRunningProcesses_v2")
File "/usr/local/lib/python3.8/dist-packages/pynvml/nvml.py", line 785, in _nvmlGetFunctionPointer
raise NVMLError(NVML_ERROR_FUNCTION_NOT_FOUND)
pynvml.nvml.NVMLError_FunctionNotFound: Function Not Found
Unable to start CUDA Context
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/pynvml/nvml.py", line 782, in _nvmlGetFunctionPointer
_nvmlGetFunctionPointer_cache[name] = getattr(nvmlLib, name)
File "/usr/lib/python3.8/ctypes/init.py", line 386, in getattr
func = self.getitem(name)
File "/usr/lib/python3.8/ctypes/init.py", line 391, in getitem
func = self._FuncPtr((name_or_ordinal, self))
AttributeError: /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1: undefined symbol: nvmlDeviceGetComputeRunningProcesses_v2

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/dask_cuda/initialize.py", line 42, in _create_cuda_context
ctx = has_cuda_context()
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/diagnostics/nvml.py", line 78, in has_cuda_context
running_processes = pynvml.nvmlDeviceGetComputeRunningProcesses_v2(handle)
File "/usr/local/lib/python3.8/dist-packages/pynvml/nvml.py", line 2191, in nvmlDeviceGetComputeRunningProcesses_v2
fn = _nvmlGetFunctionPointer("nvmlDeviceGetComputeRunningProcesses_v2")
File "/usr/local/lib/python3.8/dist-packages/pynvml/nvml.py", line 785, in _nvmlGetFunctionPointer
raise NVMLError(NVML_ERROR_FUNCTION_NOT_FOUND)
pynvml.nvml.NVMLError_FunctionNotFound: Function Not Found
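The two identical tracebacks above (one per Dask-CUDA worker) point at an environment skew rather than this PR: the node's libnvidia-ml.so.1 predates the nvmlDeviceGetComputeRunningProcesses_v2 entry point that the installed pynvml and distributed expect. A minimal probe with a fallback, assuming only that pynvml exposes both the v1 and v2 wrappers (both appear in the traceback's pynvml version):

import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
try:
    # the _v2 symbol only exists in newer NVIDIA drivers
    procs = pynvml.nvmlDeviceGetComputeRunningProcesses_v2(handle)
except pynvml.NVMLError_FunctionNotFound:
    # older driver: fall back to the original entry point
    procs = pynvml.nvmlDeviceGetComputeRunningProcesses(handle)
print(len(procs), "compute processes on GPU 0")
pynvml.nvmlShutdown()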
/usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:1292: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(
/usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:1292: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(
Failed to transform operator <nvtabular.ops.normalize.Normalize object at 0x7f1ddc69e3a0>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow/workflow.py", line 519, in _transform_partition
output_df = node.op.transform(selection, input_df)
File "/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py", line 101, in inner
result = func(*args, **kwargs)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/normalize.py", line 84, in transform
values = values.astype(self.output_dtype)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/normalize.py", line 111, in output_dtype
return self.out_dtype or numpy.float64
AttributeError: 'Normalize' object has no attribute 'out_dtype'
distributed.worker - WARNING - Compute Failed
Function: subgraph_callable-dea639fc-418f-4fe4-9a31-ee400d66
args: ([{'piece': ('/tmp/pytest-of-jenkins/pytest-6/test_multigpu_dask_example0/demo_dataset/part.0.parquet', [0], [])}])
kwargs: {}
Exception: 'AttributeError("'Normalize' object has no attribute 'out_dtype'")'

Traceback (most recent call last):
File "/tmp/pytest-of-jenkins/pytest-6/test_multigpu_dask_example0/notebook.py", line 123, in
workflow.fit_transform(dataset).to_parquet(
File "/var/jenkins_home/.local/lib/python3.8/site-packages/nvtabular/workflow/workflow.py", line 286, in fit_transform
self.fit(dataset)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/nvtabular/workflow/workflow.py", line 261, in fit
self._transform_impl(dataset, capture_dtypes=True).sample_dtypes()
File "/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py", line 1147, in sample_dtypes
_real_meta = self.engine.sample_data(n=n)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset_engine.py", line 71, in sample_data
_head = _ddf.partitions[partition_index].head(n)
File "/usr/local/lib/python3.8/dist-packages/dask/dataframe/core.py", line 1098, in head
return self._head(n=n, npartitions=npartitions, compute=compute, safe=safe)
File "/usr/local/lib/python3.8/dist-packages/dask/dataframe/core.py", line 1132, in _head
result = result.compute()
File "/usr/local/lib/python3.8/dist-packages/dask/base.py", line 288, in compute
(result,) = compute(self, traverse=False, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/dask/base.py", line 571, in compute
results = schedule(dsk, keys, **kwargs)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/client.py", line 2725, in get
results = self.gather(packed, asynchronous=asynchronous, direct=direct)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/client.py", line 1980, in gather
return self.sync(
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/client.py", line 868, in sync
return sync(
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/utils.py", line 332, in sync
raise exc.with_traceback(tb)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/utils.py", line 315, in f
result[0] = yield future
File "/var/jenkins_home/.local/lib/python3.8/site-packages/tornado-6.1-py3.8-linux-x86_64.egg/tornado/gen.py", line 762, in run
value = future.result()
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/client.py", line 1845, in _gather
raise exception.with_traceback(traceback)
File "/usr/local/lib/python3.8/dist-packages/dask/optimization.py", line 969, in call
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
File "/usr/local/lib/python3.8/dist-packages/dask/core.py", line 149, in get
result = _execute_task(task, cache)
File "/usr/local/lib/python3.8/dist-packages/dask/core.py", line 119, in _execute_task
return func(*(_execute_task(a, cache) for a in args))
File "/usr/local/lib/python3.8/dist-packages/dask/utils.py", line 37, in apply
return func(*args, **kwargs)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow/workflow.py", line 490, in _transform_partition
parent_df = _transform_partition(root_df, [parent], capture_dtypes=capture_dtypes)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow/workflow.py", line 519, in _transform_partition
output_df = node.op.transform(selection, input_df)
File "/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py", line 101, in inner
result = func(*args, **kwargs)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/normalize.py", line 84, in transform
values = values.astype(self.output_dtype)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/normalize.py", line 111, in output_dtype
return self.out_dtype or numpy.float64
AttributeError: 'Normalize' object has no attribute 'out_dtype'
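The root cause of this run is visible in the last two frames: the worker executes the workspace checkout of normalize.py, whose output_dtype property reads self.out_dtype, against a Normalize instance produced by the older nvtabular installed in site-packages (note the mixed paths in the traceback), which never set that attribute. A minimal sketch of a backwards-compatible guard, pared down for illustration (this class body is hypothetical, not the op's full implementation):

import numpy

class Normalize:  # pared-down, hypothetical excerpt
    def __init__(self, out_dtype=None):
        # e.g. numpy.float32 to stay TensorFlow-friendly; None keeps fp64
        self.out_dtype = out_dtype

    @property
    def output_dtype(self):
        # getattr with a default tolerates instances created (or pickled)
        # by an older build that never defined out_dtype
        return getattr(self, "out_dtype", None) or numpy.float64

With a guard like this, a stale instance falls back to fp64 instead of raising AttributeError mid-workflow.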
=============================== warnings summary ===============================
../../../../../usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:32
/usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:32: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
DASK_VERSION = LooseVersion(dask.__version__)

../../../.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: 34 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
other = LooseVersion(other)

nvtabular/loader/__init__.py:19
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/loader/__init__.py:19: DeprecationWarning: The nvtabular.loader module has moved to merlin.models.loader. Support for importing from nvtabular.loader is deprecated, and will be removed in a future version. Please update your imports to refer to merlin.models.loader.
warnings.warn(

tests/unit/test_dask_nvt.py: 2 warnings
tests/unit/test_tf4rec.py: 1 warning
tests/unit/test_tools.py: 6 warnings
tests/unit/test_triton_inference.py: 8 warnings
tests/unit/loader/test_dataloader_backend.py: 6 warnings
tests/unit/loader/test_tf_dataloader.py: 142 warnings
tests/unit/loader/test_torch_dataloader.py: 91 warnings
tests/unit/ops/test_categorify.py: 70 warnings
tests/unit/ops/test_drop_low_cardinality.py: 2 warnings
tests/unit/ops/test_fill.py: 8 warnings
tests/unit/ops/test_hash_bucket.py: 4 warnings
tests/unit/ops/test_join.py: 88 warnings
tests/unit/ops/test_lambda.py: 3 warnings
tests/unit/ops/test_normalize.py: 9 warnings
tests/unit/ops/test_ops.py: 11 warnings
tests/unit/ops/test_ops_schema.py: 17 warnings
tests/unit/workflow/test_workflow.py: 34 warnings
tests/unit/workflow/test_workflow_chaining.py: 1 warning
tests/unit/workflow/test_workflow_node.py: 1 warning
tests/unit/workflow/test_workflow_schemas.py: 1 warning
/usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:1292: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(

tests/unit/test_dask_nvt.py: 12 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 8 files.
warnings.warn(

tests/unit/test_dask_nvt.py::test_merlin_core_execution_managers
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/core/utils.py:431: UserWarning: Existing Dask-client object detected in the current context. New cuda cluster will not be deployed. Set force_new to True to ignore running clusters.
warnings.warn(

tests/unit/test_notebooks.py: 18 warnings
tests/unit/test_tools.py: 1213 warnings
tests/unit/loader/test_tf_dataloader.py: 20 warnings
tests/unit/loader/test_torch_dataloader.py: 432 warnings
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:3235: DeprecationWarning: Series.ceil and DataFrame.ceil are deprecated and will be removed in the future
warnings.warn(

tests/unit/test_tools.py::test_inspect_datagen[uniform-parquet]
tests/unit/test_tools.py::test_inspect_datagen[uniform-parquet]
tests/unit/ops/test_ops.py::test_data_stats[True-parquet]
tests/unit/ops/test_ops.py::test_data_stats[False-parquet]
/usr/local/lib/python3.8/dist-packages/cudf/core/series.py:958: FutureWarning: Series.set_index is deprecated and will be removed in the future
warnings.warn(

tests/unit/loader/test_tf_dataloader.py: 2 warnings
tests/unit/loader/test_torch_dataloader.py: 12 warnings
tests/unit/workflow/test_workflow.py: 9 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 2 files.
warnings.warn(

tests/unit/ops/test_fill.py::test_fill_missing[True-True-parquet]
tests/unit/ops/test_fill.py::test_fill_missing[True-False-parquet]
tests/unit/ops/test_ops.py::test_filter[parquet-0.1-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/pandas/core/indexing.py:1732: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self._setitem_single_block(indexer, value, name)

tests/unit/ops/test_groupyby.py::test_groupby_casting_in_aggregations[False]
/usr/local/lib/python3.8/dist-packages/cudf/core/_base_index.py:1541: FutureWarning: Calling take with a boolean array is deprecated and will be removed in the future.
warnings.warn(

tests/unit/ops/test_ops.py::test_difference_lag[False]
/usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:3025: FutureWarning: The as_gpu_matrix method will be removed in a future cuDF release. Consider using to_cupy instead.
warnings.warn(

tests/unit/workflow/test_cpu_workflow.py: 6 warnings
tests/unit/workflow/test_workflow.py: 12 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 10 files.
warnings.warn(

tests/unit/workflow/test_workflow.py::test_gpu_workflow_api[True-True-parquet-0.01]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:160: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 38651 instead
warnings.warn(

tests/unit/workflow/test_workflow.py: 48 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 20 files.
warnings.warn(

tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_parquet_output[True-None]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-None]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-None]
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 4 files.
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED tests/unit/test_notebooks.py::test_multigpu_dask_example - subprocess....
===== 1 failed, 1420 passed, 2 skipped, 2345 warnings in 726.27s (0:12:06) =====
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins12990449357605842314.sh

@bschifferer
Contributor

rerun tests

2 similar comments
@bschifferer
Contributor

rerun tests

@bschifferer
Contributor

rerun tests

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #1597 of commit 31a8e3b5157f7882b728d63c3c7258582641a470, no merge conflicts.
Running as SYSTEM
Setting status of 31a8e3b5157f7882b728d63c3c7258582641a470 to PENDING with url http://10.20.17.181:8080/job/nvtabular_tests/4548/ and message: 'Build started for merge commit.'
Using context: Jenkins Unit Test Run
Building on master in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA-Merlin/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/pull/1597/*:refs/remotes/origin/pr/1597/* # timeout=10
 > git rev-parse 31a8e3b5157f7882b728d63c3c7258582641a470^{commit} # timeout=10
Checking out Revision 31a8e3b5157f7882b728d63c3c7258582641a470 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 31a8e3b5157f7882b728d63c3c7258582641a470 # timeout=10
Commit message: "Merge branch 'main' into normalize_fp32"
 > git rev-list --no-walk 6726264cbcfc23d701f39fcd74207b50c985ff8d # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins15121190167887164647.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 1422 items / 1 skipped

tests/unit/test_dask_nvt.py ............................................ [ 3%]
........................................................................ [ 8%]
[ 8%]
tests/unit/test_notebooks.py .....F [ 8%]
tests/unit/test_tf4rec.py . [ 8%]
tests/unit/test_tools.py ...................... [ 10%]
tests/unit/test_triton_inference.py ................................ [ 12%]
tests/unit/framework_utils/test_tf_feature_columns.py . [ 12%]
tests/unit/framework_utils/test_tf_layers.py ........................... [ 14%]
................................................... [ 18%]
tests/unit/framework_utils/test_torch_layers.py . [ 18%]
tests/unit/loader/test_dataloader_backend.py ...... [ 18%]
tests/unit/loader/test_tf_dataloader.py ................................ [ 20%]
........................................s.. [ 23%]
tests/unit/loader/test_torch_dataloader.py ............................. [ 25%]
...................................................... [ 29%]
tests/unit/ops/test_categorify.py ...................................... [ 32%]
........................................................................ [ 37%]
........................................... [ 40%]
tests/unit/ops/test_column_similarity.py ........................ [ 42%]
tests/unit/ops/test_drop_low_cardinality.py .. [ 42%]
tests/unit/ops/test_fill.py ............................................ [ 45%]
........ [ 45%]
tests/unit/ops/test_groupyby.py ................. [ 47%]
tests/unit/ops/test_hash_bucket.py ......................... [ 48%]
tests/unit/ops/test_join.py ............................................ [ 51%]
........................................................................ [ 56%]
.................................. [ 59%]
tests/unit/ops/test_lambda.py .......... [ 60%]
tests/unit/ops/test_normalize.py ....................................... [ 62%]
.. [ 62%]
tests/unit/ops/test_ops.py ............................................. [ 66%]
.................... [ 67%]
tests/unit/ops/test_ops_schema.py ...................................... [ 70%]
........................................................................ [ 75%]
........................................................................ [ 80%]
........................................................................ [ 85%]
....................................... [ 88%]
tests/unit/ops/test_reduce_dtype_size.py .. [ 88%]
tests/unit/ops/test_target_encode.py ..................... [ 89%]
tests/unit/workflow/test_cpu_workflow.py ...... [ 90%]
tests/unit/workflow/test_workflow.py ................................... [ 92%]
.......................................................... [ 96%]
tests/unit/workflow/test_workflow_chaining.py ... [ 96%]
tests/unit/workflow/test_workflow_node.py ........... [ 97%]
tests/unit/workflow/test_workflow_ops.py ... [ 97%]
tests/unit/workflow/test_workflow_schemas.py ........................... [ 99%]
... [100%]

=================================== FAILURES ===================================
__________________________ test_multigpu_dask_example __________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-13/test_multigpu_dask_example0')

def test_multigpu_dask_example(tmpdir):
    with get_cuda_cluster() as cuda_cluster:
        os.environ["BASE_DIR"] = str(tmpdir)
        scheduler_port = cuda_cluster.scheduler_address

        def _nb_modify(line):
            # Use cuda_cluster "fixture" port rather than allowing notebook
            # to deploy a LocalCUDACluster within the subprocess
            line = line.replace("cluster = None", f"cluster = '{scheduler_port}'")
            # Use a much smaller "toy" dataset
            line = line.replace("write_count = 25", "write_count = 4")
            line = line.replace('freq = "1s"', 'freq = "1h"')
            # Use smaller partitions for smaller dataset
            line = line.replace("part_mem_fraction=0.1", "part_size=1_000_000")
            line = line.replace("out_files_per_proc=8", "out_files_per_proc=1")
            return line

        notebook_path = os.path.join(
            dirname(TEST_PATH), "examples/multi-gpu-toy-example/", "multi-gpu_dask.ipynb"
        )
>       _run_notebook(tmpdir, notebook_path, _nb_modify)

tests/unit/test_notebooks.py:287:


tests/unit/test_notebooks.py:307: in _run_notebook
subprocess.check_output([sys.executable, script_path])
/usr/lib/python3.8/subprocess.py:415: in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,


input = None, capture_output = False, timeout = None, check = True
popenargs = (['/usr/bin/python3', '/tmp/pytest-of-jenkins/pytest-13/test_multigpu_dask_example0/notebook.py'],)
kwargs = {'stdout': -1}, process = <subprocess.Popen object at 0x7fc7e0324be0>
stdout = b'', stderr = None, retcode = 1

def run(*popenargs,
        input=None, capture_output=False, timeout=None, check=False, **kwargs):
    """Run command with arguments and return a CompletedProcess instance.

    The returned instance will have attributes args, returncode, stdout and
    stderr. By default, stdout and stderr are not captured, and those attributes
    will be None. Pass stdout=PIPE and/or stderr=PIPE in order to capture them.

    If check is True and the exit code was non-zero, it raises a
    CalledProcessError. The CalledProcessError object will have the return code
    in the returncode attribute, and output & stderr attributes if those streams
    were captured.

    If timeout is given, and the process takes too long, a TimeoutExpired
    exception will be raised.

    There is an optional argument "input", allowing you to
    pass bytes or a string to the subprocess's stdin.  If you use this argument
    you may not also use the Popen constructor's "stdin" argument, as
    it will be used internally.

    By default, all communication is in bytes, and therefore any "input" should
    be bytes, and the stdout and stderr will be bytes. If in text mode, any
    "input" should be a string, and stdout and stderr will be strings decoded
    according to locale encoding, or by "encoding" if set. Text mode is
    triggered by setting any of text, encoding, errors or universal_newlines.

    The other arguments are the same as for the Popen constructor.
    """
    if input is not None:
        if kwargs.get('stdin') is not None:
            raise ValueError('stdin and input arguments may not both be used.')
        kwargs['stdin'] = PIPE

    if capture_output:
        if kwargs.get('stdout') is not None or kwargs.get('stderr') is not None:
            raise ValueError('stdout and stderr arguments may not be used '
                             'with capture_output.')
        kwargs['stdout'] = PIPE
        kwargs['stderr'] = PIPE

    with Popen(*popenargs, **kwargs) as process:
        try:
            stdout, stderr = process.communicate(input, timeout=timeout)
        except TimeoutExpired as exc:
            process.kill()
            if _mswindows:
                # Windows accumulates the output in a single blocking
                # read() call run on child threads, with the timeout
                # being done in a join() on those threads.  communicate()
                # _after_ kill() is required to collect that and add it
                # to the exception.
                exc.stdout, exc.stderr = process.communicate()
            else:
                # POSIX _communicate already populated the output so
                # far into the TimeoutExpired exception.
                process.wait()
            raise
        except:  # Including KeyboardInterrupt, communicate handled that.
            process.kill()
            # We don't call process.wait() as .__exit__ does that for us.
            raise
        retcode = process.poll()
        if check and retcode:
>           raise CalledProcessError(retcode, process.args,
                                     output=stdout, stderr=stderr)

E subprocess.CalledProcessError: Command '['/usr/bin/python3', '/tmp/pytest-of-jenkins/pytest-13/test_multigpu_dask_example0/notebook.py']' returned non-zero exit status 1.

/usr/lib/python3.8/subprocess.py:516: CalledProcessError
----------------------------- Captured stderr call -----------------------------
distributed.preloading - INFO - Import preload module: dask_cuda.initialize
distributed.preloading - INFO - Import preload module: dask_cuda.initialize
Unable to start CUDA Context
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/pynvml/nvml.py", line 782, in _nvmlGetFunctionPointer
_nvmlGetFunctionPointer_cache[name] = getattr(nvmlLib, name)
File "/usr/lib/python3.8/ctypes/init.py", line 386, in getattr
func = self.getitem(name)
File "/usr/lib/python3.8/ctypes/init.py", line 391, in getitem
func = self._FuncPtr((name_or_ordinal, self))
AttributeError: /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1: undefined symbol: nvmlDeviceGetComputeRunningProcesses_v2

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/dask_cuda/initialize.py", line 42, in _create_cuda_context
ctx = has_cuda_context()
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/diagnostics/nvml.py", line 78, in has_cuda_context
running_processes = pynvml.nvmlDeviceGetComputeRunningProcesses_v2(handle)
File "/usr/local/lib/python3.8/dist-packages/pynvml/nvml.py", line 2191, in nvmlDeviceGetComputeRunningProcesses_v2
fn = _nvmlGetFunctionPointer("nvmlDeviceGetComputeRunningProcesses_v2")
File "/usr/local/lib/python3.8/dist-packages/pynvml/nvml.py", line 785, in _nvmlGetFunctionPointer
raise NVMLError(NVML_ERROR_FUNCTION_NOT_FOUND)
pynvml.nvml.NVMLError_FunctionNotFound: Function Not Found
Unable to start CUDA Context
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/pynvml/nvml.py", line 782, in _nvmlGetFunctionPointer
_nvmlGetFunctionPointer_cache[name] = getattr(nvmlLib, name)
File "/usr/lib/python3.8/ctypes/init.py", line 386, in getattr
func = self.getitem(name)
File "/usr/lib/python3.8/ctypes/init.py", line 391, in getitem
func = self._FuncPtr((name_or_ordinal, self))
AttributeError: /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1: undefined symbol: nvmlDeviceGetComputeRunningProcesses_v2

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/dask_cuda/initialize.py", line 42, in _create_cuda_context
ctx = has_cuda_context()
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/diagnostics/nvml.py", line 78, in has_cuda_context
running_processes = pynvml.nvmlDeviceGetComputeRunningProcesses_v2(handle)
File "/usr/local/lib/python3.8/dist-packages/pynvml/nvml.py", line 2191, in nvmlDeviceGetComputeRunningProcesses_v2
fn = _nvmlGetFunctionPointer("nvmlDeviceGetComputeRunningProcesses_v2")
File "/usr/local/lib/python3.8/dist-packages/pynvml/nvml.py", line 785, in _nvmlGetFunctionPointer
raise NVMLError(NVML_ERROR_FUNCTION_NOT_FOUND)
pynvml.nvml.NVMLError_FunctionNotFound: Function Not Found
/usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:1292: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(
/usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:1292: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(
Failed to transform operator <nvtabular.ops.normalize.Normalize object at 0x7f8e9c39f3d0>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow/workflow.py", line 519, in _transform_partition
output_df = node.op.transform(selection, input_df)
File "/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py", line 101, in inner
result = func(*args, **kwargs)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/normalize.py", line 84, in transform
values = values.astype(self.output_dtype)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/normalize.py", line 111, in output_dtype
return self.out_dtype or numpy.float64
AttributeError: 'Normalize' object has no attribute 'out_dtype'
distributed.worker - WARNING - Compute Failed
Function: subgraph_callable-fc610eca-40ed-4a75-98a9-9d8b996f
args: ([{'piece': ('/tmp/pytest-of-jenkins/pytest-13/test_multigpu_dask_example0/demo_dataset/part.0.parquet', [0], [])}])
kwargs: {}
Exception: 'AttributeError("'Normalize' object has no attribute 'out_dtype'")'

Traceback (most recent call last):
File "/tmp/pytest-of-jenkins/pytest-13/test_multigpu_dask_example0/notebook.py", line 123, in
workflow.fit_transform(dataset).to_parquet(
File "/var/jenkins_home/.local/lib/python3.8/site-packages/nvtabular/workflow/workflow.py", line 286, in fit_transform
self.fit(dataset)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/nvtabular/workflow/workflow.py", line 261, in fit
self._transform_impl(dataset, capture_dtypes=True).sample_dtypes()
File "/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py", line 1147, in sample_dtypes
_real_meta = self.engine.sample_data(n=n)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset_engine.py", line 71, in sample_data
_head = _ddf.partitions[partition_index].head(n)
File "/usr/local/lib/python3.8/dist-packages/dask/dataframe/core.py", line 1098, in head
return self._head(n=n, npartitions=npartitions, compute=compute, safe=safe)
File "/usr/local/lib/python3.8/dist-packages/dask/dataframe/core.py", line 1132, in _head
result = result.compute()
File "/usr/local/lib/python3.8/dist-packages/dask/base.py", line 288, in compute
(result,) = compute(self, traverse=False, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/dask/base.py", line 571, in compute
results = schedule(dsk, keys, **kwargs)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/client.py", line 2725, in get
results = self.gather(packed, asynchronous=asynchronous, direct=direct)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/client.py", line 1980, in gather
return self.sync(
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/client.py", line 868, in sync
return sync(
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/utils.py", line 332, in sync
raise exc.with_traceback(tb)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/utils.py", line 315, in f
result[0] = yield future
File "/var/jenkins_home/.local/lib/python3.8/site-packages/tornado-6.1-py3.8-linux-x86_64.egg/tornado/gen.py", line 762, in run
value = future.result()
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/client.py", line 1845, in _gather
raise exception.with_traceback(traceback)
File "/usr/local/lib/python3.8/dist-packages/dask/optimization.py", line 969, in call
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
File "/usr/local/lib/python3.8/dist-packages/dask/core.py", line 149, in get
result = _execute_task(task, cache)
File "/usr/local/lib/python3.8/dist-packages/dask/core.py", line 119, in _execute_task
return func(*(_execute_task(a, cache) for a in args))
File "/usr/local/lib/python3.8/dist-packages/dask/utils.py", line 37, in apply
return func(*args, **kwargs)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow/workflow.py", line 490, in _transform_partition
parent_df = _transform_partition(root_df, [parent], capture_dtypes=capture_dtypes)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow/workflow.py", line 519, in _transform_partition
output_df = node.op.transform(selection, input_df)
File "/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py", line 101, in inner
result = func(*args, **kwargs)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/normalize.py", line 84, in transform
values = values.astype(self.output_dtype)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/normalize.py", line 111, in output_dtype
return self.out_dtype or numpy.float64
AttributeError: 'Normalize' object has no attribute 'out_dtype'
=============================== warnings summary ===============================
../../../../../usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:32
/usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:32: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
DASK_VERSION = LooseVersion(dask.__version__)

../../../.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: 34 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
other = LooseVersion(other)

nvtabular/loader/__init__.py:19
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/loader/__init__.py:19: DeprecationWarning: The nvtabular.loader module has moved to merlin.models.loader. Support for importing from nvtabular.loader is deprecated, and will be removed in a future version. Please update your imports to refer to merlin.models.loader.
warnings.warn(

tests/unit/test_dask_nvt.py: 2 warnings
tests/unit/test_tf4rec.py: 1 warning
tests/unit/test_tools.py: 6 warnings
tests/unit/test_triton_inference.py: 8 warnings
tests/unit/loader/test_dataloader_backend.py: 6 warnings
tests/unit/loader/test_tf_dataloader.py: 142 warnings
tests/unit/loader/test_torch_dataloader.py: 91 warnings
tests/unit/ops/test_categorify.py: 70 warnings
tests/unit/ops/test_drop_low_cardinality.py: 2 warnings
tests/unit/ops/test_fill.py: 8 warnings
tests/unit/ops/test_hash_bucket.py: 4 warnings
tests/unit/ops/test_join.py: 88 warnings
tests/unit/ops/test_lambda.py: 3 warnings
tests/unit/ops/test_normalize.py: 9 warnings
tests/unit/ops/test_ops.py: 11 warnings
tests/unit/ops/test_ops_schema.py: 17 warnings
tests/unit/workflow/test_workflow.py: 34 warnings
tests/unit/workflow/test_workflow_chaining.py: 1 warning
tests/unit/workflow/test_workflow_node.py: 1 warning
tests/unit/workflow/test_workflow_schemas.py: 1 warning
/usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:1292: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(

tests/unit/test_dask_nvt.py: 12 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 8 files.
warnings.warn(

tests/unit/test_dask_nvt.py::test_merlin_core_execution_managers
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/core/utils.py:431: UserWarning: Existing Dask-client object detected in the current context. New cuda cluster will not be deployed. Set force_new to True to ignore running clusters.
warnings.warn(

tests/unit/test_notebooks.py: 18 warnings
tests/unit/test_tools.py: 1213 warnings
tests/unit/loader/test_tf_dataloader.py: 20 warnings
tests/unit/loader/test_torch_dataloader.py: 432 warnings
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:3235: DeprecationWarning: Series.ceil and DataFrame.ceil are deprecated and will be removed in the future
warnings.warn(

tests/unit/test_tools.py::test_inspect_datagen[uniform-parquet]
tests/unit/test_tools.py::test_inspect_datagen[uniform-parquet]
tests/unit/ops/test_ops.py::test_data_stats[True-parquet]
tests/unit/ops/test_ops.py::test_data_stats[False-parquet]
/usr/local/lib/python3.8/dist-packages/cudf/core/series.py:958: FutureWarning: Series.set_index is deprecated and will be removed in the future
warnings.warn(

tests/unit/loader/test_tf_dataloader.py: 2 warnings
tests/unit/loader/test_torch_dataloader.py: 12 warnings
tests/unit/workflow/test_workflow.py: 9 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 2 files.
warnings.warn(

tests/unit/ops/test_fill.py::test_fill_missing[True-True-parquet]
tests/unit/ops/test_fill.py::test_fill_missing[True-False-parquet]
tests/unit/ops/test_ops.py::test_filter[parquet-0.1-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/pandas/core/indexing.py:1732: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self._setitem_single_block(indexer, value, name)

tests/unit/ops/test_groupyby.py::test_groupby_casting_in_aggregations[False]
/usr/local/lib/python3.8/dist-packages/cudf/core/_base_index.py:1541: FutureWarning: Calling take with a boolean array is deprecated and will be removed in the future.
warnings.warn(

tests/unit/ops/test_ops.py::test_difference_lag[False]
/usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:3025: FutureWarning: The as_gpu_matrix method will be removed in a future cuDF release. Consider using to_cupy instead.
warnings.warn(

tests/unit/workflow/test_cpu_workflow.py: 6 warnings
tests/unit/workflow/test_workflow.py: 12 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 10 files.
warnings.warn(

tests/unit/workflow/test_workflow.py::test_gpu_workflow_api[True-True-parquet-0.01]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:160: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 34863 instead
warnings.warn(

tests/unit/workflow/test_workflow.py: 48 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 20 files.
warnings.warn(

tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_parquet_output[True-None]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-None]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-None]
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 4 files.
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED tests/unit/test_notebooks.py::test_multigpu_dask_example - subprocess....
===== 1 failed, 1420 passed, 2 skipped, 2345 warnings in 725.62s (0:12:05) =====
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins17765714849656057367.sh

@bschifferer
Contributor

rerun tests

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #1597 of commit 31a8e3b5157f7882b728d63c3c7258582641a470, no merge conflicts.
Running as SYSTEM
Setting status of 31a8e3b5157f7882b728d63c3c7258582641a470 to PENDING with url http://10.20.17.181:8080/job/nvtabular_tests/4550/ and message: 'Build started for merge commit.'
Using context: Jenkins Unit Test Run
Building on master in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA-Merlin/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/pull/1597/*:refs/remotes/origin/pr/1597/* # timeout=10
 > git rev-parse 31a8e3b5157f7882b728d63c3c7258582641a470^{commit} # timeout=10
Checking out Revision 31a8e3b5157f7882b728d63c3c7258582641a470 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 31a8e3b5157f7882b728d63c3c7258582641a470 # timeout=10
Commit message: "Merge branch 'main' into normalize_fp32"
 > git rev-list --no-walk 6726264cbcfc23d701f39fcd74207b50c985ff8d # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins6000918569497214242.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 1422 items / 1 skipped

tests/unit/test_dask_nvt.py ............................................ [ 3%]
........................................................................ [ 8%]
[ 8%]
tests/unit/test_notebooks.py FFF..F [ 8%]
tests/unit/test_tf4rec.py . [ 8%]
tests/unit/test_tools.py ...................... [ 10%]
tests/unit/test_triton_inference.py ................................ [ 12%]
tests/unit/framework_utils/test_tf_feature_columns.py . [ 12%]
tests/unit/framework_utils/test_tf_layers.py ........................... [ 14%]
................................................... [ 18%]
tests/unit/framework_utils/test_torch_layers.py . [ 18%]
tests/unit/loader/test_dataloader_backend.py ...... [ 18%]
tests/unit/loader/test_tf_dataloader.py ................................ [ 20%]
........................................s.. [ 23%]
tests/unit/loader/test_torch_dataloader.py ............................. [ 25%]
...................................................... [ 29%]
tests/unit/ops/test_categorify.py ...................................... [ 32%]
........................................................................ [ 37%]
........................................... [ 40%]
tests/unit/ops/test_column_similarity.py ........................ [ 42%]
tests/unit/ops/test_drop_low_cardinality.py .. [ 42%]
tests/unit/ops/test_fill.py ............................................ [ 45%]
........ [ 45%]
tests/unit/ops/test_groupyby.py ................. [ 47%]
tests/unit/ops/test_hash_bucket.py ......................... [ 48%]
tests/unit/ops/test_join.py ............................................ [ 51%]
........................................................................ [ 56%]
.................................. [ 59%]
tests/unit/ops/test_lambda.py .......... [ 60%]
tests/unit/ops/test_normalize.py ....................................... [ 62%]
.. [ 62%]
tests/unit/ops/test_ops.py ............................................. [ 66%]
.................... [ 67%]
tests/unit/ops/test_ops_schema.py ...................................... [ 70%]
........................................................................ [ 75%]
........................................................................ [ 80%]
........................................................................ [ 85%]
....................................... [ 88%]
tests/unit/ops/test_reduce_dtype_size.py .. [ 88%]
tests/unit/ops/test_target_encode.py ..................... [ 89%]
tests/unit/workflow/test_cpu_workflow.py ...... [ 90%]
tests/unit/workflow/test_workflow.py ................................... [ 92%]
.......................................................... [ 96%]
tests/unit/workflow/test_workflow_chaining.py ... [ 96%]
tests/unit/workflow/test_workflow_node.py ........... [ 97%]
tests/unit/workflow/test_workflow_ops.py ... [ 97%]
tests/unit/workflow/test_workflow_schemas.py ........................... [ 99%]
... [100%]

=================================== FAILURES ===================================
___________________________ test_criteo_tf_notebook ____________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-2/test_criteo_tf_notebook0')

def test_criteo_tf_notebook(tmpdir):
    tor = pytest.importorskip("tensorflow")  # noqa
    # create a toy dataset in tmpdir, and point environment variables so the notebook
    # will read from it
    os.system("mkdir -p " + os.path.join(tmpdir, "converted/criteo"))
    for i in range(24):
        df = _get_random_criteo_data(1000)
        df.to_parquet(os.path.join(tmpdir, "converted/criteo", f"day_{i}.parquet"))
    os.environ["BASE_DIR"] = str(tmpdir)

    def _nb_modify(line):
        # Disable LocalCUDACluster
        line = line.replace("client.run(_rmm_pool)", "# client.run(_rmm_pool)")
        line = line.replace("if cluster is None:", "if False:")
        line = line.replace("client = Client(cluster)", "# client = Client(cluster)")
        line = line.replace(
            "workflow = nvt.Workflow(features, client=client)", "workflow = nvt.Workflow(features)"
        )
        line = line.replace("client", "# client")
        line = line.replace("NUM_GPUS = [0, 1, 2, 3, 4, 5, 6, 7]", "NUM_GPUS = [0]")
        line = line.replace("part_size = int(part_mem_frac * device_size)", "part_size = '128MB'")

        return line
>       _run_notebook(
        tmpdir,
        os.path.join(
            dirname(TEST_PATH),
            "examples/scaling-criteo/",
            "02-ETL-with-NVTabular.ipynb",
        ),
        # disable rmm.reinitialize, seems to be causing issues
        transform=_nb_modify,
    )

tests/unit/test_notebooks.py:62:


tests/unit/test_notebooks.py:307: in _run_notebook
subprocess.check_output([sys.executable, script_path])
/usr/lib/python3.8/subprocess.py:415: in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,


input = None, capture_output = False, timeout = None, check = True
popenargs = (['/usr/bin/python3', '/tmp/pytest-of-jenkins/pytest-2/test_criteo_tf_notebook0/notebook.py'],)
kwargs = {'stdout': -1}, process = <subprocess.Popen object at 0x7fec97790fd0>
stdout = b'', stderr = None, retcode = 1

def run(*popenargs,
        input=None, capture_output=False, timeout=None, check=False, **kwargs):
    """Run command with arguments and return a CompletedProcess instance.

    The returned instance will have attributes args, returncode, stdout and
    stderr. By default, stdout and stderr are not captured, and those attributes
    will be None. Pass stdout=PIPE and/or stderr=PIPE in order to capture them.

    If check is True and the exit code was non-zero, it raises a
    CalledProcessError. The CalledProcessError object will have the return code
    in the returncode attribute, and output & stderr attributes if those streams
    were captured.

    If timeout is given, and the process takes too long, a TimeoutExpired
    exception will be raised.

    There is an optional argument "input", allowing you to
    pass bytes or a string to the subprocess's stdin.  If you use this argument
    you may not also use the Popen constructor's "stdin" argument, as
    it will be used internally.

    By default, all communication is in bytes, and therefore any "input" should
    be bytes, and the stdout and stderr will be bytes. If in text mode, any
    "input" should be a string, and stdout and stderr will be strings decoded
    according to locale encoding, or by "encoding" if set. Text mode is
    triggered by setting any of text, encoding, errors or universal_newlines.

    The other arguments are the same as for the Popen constructor.
    """
    if input is not None:
        if kwargs.get('stdin') is not None:
            raise ValueError('stdin and input arguments may not both be used.')
        kwargs['stdin'] = PIPE

    if capture_output:
        if kwargs.get('stdout') is not None or kwargs.get('stderr') is not None:
            raise ValueError('stdout and stderr arguments may not be used '
                             'with capture_output.')
        kwargs['stdout'] = PIPE
        kwargs['stderr'] = PIPE

    with Popen(*popenargs, **kwargs) as process:
        try:
            stdout, stderr = process.communicate(input, timeout=timeout)
        except TimeoutExpired as exc:
            process.kill()
            if _mswindows:
                # Windows accumulates the output in a single blocking
                # read() call run on child threads, with the timeout
                # being done in a join() on those threads.  communicate()
                # _after_ kill() is required to collect that and add it
                # to the exception.
                exc.stdout, exc.stderr = process.communicate()
            else:
                # POSIX _communicate already populated the output so
                # far into the TimeoutExpired exception.
                process.wait()
            raise
        except:  # Including KeyboardInterrupt, communicate handled that.
            process.kill()
            # We don't call process.wait() as .__exit__ does that for us.
            raise
        retcode = process.poll()
        if check and retcode:
          raise CalledProcessError(retcode, process.args,
                                     output=stdout, stderr=stderr)

E subprocess.CalledProcessError: Command '['/usr/bin/python3', '/tmp/pytest-of-jenkins/pytest-2/test_criteo_tf_notebook0/notebook.py']' returned non-zero exit status 1.

/usr/lib/python3.8/subprocess.py:516: CalledProcessError
----------------------------- Captured stderr call -----------------------------
Traceback (most recent call last):
File "/tmp/pytest-of-jenkins/pytest-2/test_criteo_tf_notebook0/notebook.py", line 24, in
from dask_cuda import LocalCUDACluster
File "/usr/local/lib/python3.8/dist-packages/dask_cuda/init.py", line 12, in
from .cuda_worker import CUDAWorker
File "/usr/local/lib/python3.8/dist-packages/dask_cuda/cuda_worker.py", line 20, in
from distributed.worker_memory import parse_memory_limit
ModuleNotFoundError: No module named 'distributed.worker_memory'
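
All four failures trace back to the same environment mismatch rather than to the fp32 change in this PR: the CI image's dask_cuda imports distributed.worker_memory, a module that only exists in newer releases of distributed than the one installed. A minimal sketch of a conftest-style guard that would skip these tests instead of failing them when the environments drift; the helper name require_dask_cuda is hypothetical, not an existing fixture in this repo:

import pytest

def require_dask_cuda():
    """Skip the calling test when dask_cuda cannot be imported, e.g.
    because the installed distributed predates distributed.worker_memory."""
    try:
        from dask_cuda import LocalCUDACluster  # noqa: F401
    except ImportError as exc:  # ModuleNotFoundError is a subclass
        pytest.skip(f"dask_cuda unavailable or incompatible: {exc}")
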
___________________________ test_criteo_pyt_notebook ___________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-2/test_criteo_pyt_notebook0')

def test_criteo_pyt_notebook(tmpdir):
    tor = pytest.importorskip("fastai")  # noqa
    # create a toy dataset in tmpdir, and point environment variables so the notebook
    # will read from it
    os.system("mkdir -p " + os.path.join(tmpdir, "converted/criteo"))
    for i in range(24):
        df = _get_random_criteo_data(1000)
        df.to_parquet(os.path.join(tmpdir, "converted/criteo", f"day_{i}.parquet"))
    os.environ["BASE_DIR"] = str(tmpdir)

    def _nb_modify(line):
        # Disable LocalCUDACluster
        line = line.replace("client.run(_rmm_pool)", "# client.run(_rmm_pool)")
        line = line.replace("if cluster is None:", "if False:")
        line = line.replace("client = Client(cluster)", "# client = Client(cluster)")
        line = line.replace(
            "workflow = nvt.Workflow(features, client=client)", "workflow = nvt.Workflow(features)"
        )
        line = line.replace("client", "# client")
        line = line.replace("NUM_GPUS = [0, 1, 2, 3, 4, 5, 6, 7]", "NUM_GPUS = [0]")
        line = line.replace("part_size = int(part_mem_frac * device_size)", "part_size = '128MB'")
        return line
  _run_notebook(
        tmpdir,
        os.path.join(
            dirname(TEST_PATH),
            "examples/scaling-criteo/",
            "02-ETL-with-NVTabular.ipynb",
        ),
        # disable rmm.reinitialize, seems to be causing issues
        transform=_nb_modify,
    )

tests/unit/test_notebooks.py:114:


tests/unit/test_notebooks.py:307: in _run_notebook
subprocess.check_output([sys.executable, script_path])
/usr/lib/python3.8/subprocess.py:415: in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,


input = None, capture_output = False, timeout = None, check = True
popenargs = (['/usr/bin/python3', '/tmp/pytest-of-jenkins/pytest-2/test_criteo_pyt_notebook0/notebook.py'],)
kwargs = {'stdout': -1}, process = <subprocess.Popen object at 0x7feb73af0ca0>
stdout = b'', stderr = None, retcode = 1

def run(*popenargs,
        input=None, capture_output=False, timeout=None, check=False, **kwargs):
    """Run command with arguments and return a CompletedProcess instance.

    The returned instance will have attributes args, returncode, stdout and
    stderr. By default, stdout and stderr are not captured, and those attributes
    will be None. Pass stdout=PIPE and/or stderr=PIPE in order to capture them.

    If check is True and the exit code was non-zero, it raises a
    CalledProcessError. The CalledProcessError object will have the return code
    in the returncode attribute, and output & stderr attributes if those streams
    were captured.

    If timeout is given, and the process takes too long, a TimeoutExpired
    exception will be raised.

    There is an optional argument "input", allowing you to
    pass bytes or a string to the subprocess's stdin.  If you use this argument
    you may not also use the Popen constructor's "stdin" argument, as
    it will be used internally.

    By default, all communication is in bytes, and therefore any "input" should
    be bytes, and the stdout and stderr will be bytes. If in text mode, any
    "input" should be a string, and stdout and stderr will be strings decoded
    according to locale encoding, or by "encoding" if set. Text mode is
    triggered by setting any of text, encoding, errors or universal_newlines.

    The other arguments are the same as for the Popen constructor.
    """
    if input is not None:
        if kwargs.get('stdin') is not None:
            raise ValueError('stdin and input arguments may not both be used.')
        kwargs['stdin'] = PIPE

    if capture_output:
        if kwargs.get('stdout') is not None or kwargs.get('stderr') is not None:
            raise ValueError('stdout and stderr arguments may not be used '
                             'with capture_output.')
        kwargs['stdout'] = PIPE
        kwargs['stderr'] = PIPE

    with Popen(*popenargs, **kwargs) as process:
        try:
            stdout, stderr = process.communicate(input, timeout=timeout)
        except TimeoutExpired as exc:
            process.kill()
            if _mswindows:
                # Windows accumulates the output in a single blocking
                # read() call run on child threads, with the timeout
                # being done in a join() on those threads.  communicate()
                # _after_ kill() is required to collect that and add it
                # to the exception.
                exc.stdout, exc.stderr = process.communicate()
            else:
                # POSIX _communicate already populated the output so
                # far into the TimeoutExpired exception.
                process.wait()
            raise
        except:  # Including KeyboardInterrupt, communicate handled that.
            process.kill()
            # We don't call process.wait() as .__exit__ does that for us.
            raise
        retcode = process.poll()
        if check and retcode:
          raise CalledProcessError(retcode, process.args,
                                     output=stdout, stderr=stderr)

E subprocess.CalledProcessError: Command '['/usr/bin/python3', '/tmp/pytest-of-jenkins/pytest-2/test_criteo_pyt_notebook0/notebook.py']' returned non-zero exit status 1.

/usr/lib/python3.8/subprocess.py:516: CalledProcessError
----------------------------- Captured stderr call -----------------------------
Traceback (most recent call last):
File "/tmp/pytest-of-jenkins/pytest-2/test_criteo_pyt_notebook0/notebook.py", line 24, in
from dask_cuda import LocalCUDACluster
File "/usr/local/lib/python3.8/dist-packages/dask_cuda/init.py", line 12, in
from .cuda_worker import CUDAWorker
File "/usr/local/lib/python3.8/dist-packages/dask_cuda/cuda_worker.py", line 20, in
from distributed.worker_memory import parse_memory_limit
ModuleNotFoundError: No module named 'distributed.worker_memory'
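
The test harness launches the converted notebook through subprocess.check_output, which pipes only stdout; that is why the traceback surfaces separately under "Captured stderr call" while the CalledProcessError itself carries stderr = None. A hedged sketch of an alternative runner (the function name is assumed, not the repo's actual helper) that folds stderr into the raised error so the failure message is self-contained:

import subprocess
import sys

def run_notebook_script(script_path):
    # Capture both streams so a non-zero exit carries its own traceback,
    # instead of leaving stderr = None on the raised exception.
    result = subprocess.run(
        [sys.executable, script_path],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        raise RuntimeError(
            f"notebook script exited with {result.returncode}:\n{result.stderr}"
        )
    return result.stdout
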
_____________________________ test_optimize_criteo _____________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-2/test_optimize_criteo0')

def test_optimize_criteo(tmpdir):
    input_path = str(tmpdir.mkdir("input"))
    _get_random_criteo_data(1000).to_csv(os.path.join(input_path, "day_0"), sep="\t", header=False)
    os.environ["INPUT_DATA_DIR"] = input_path
    os.environ["OUTPUT_DATA_DIR"] = str(tmpdir.mkdir("output"))
  with get_cuda_cluster() as cuda_cluster:

tests/unit/test_notebooks.py:140:


/usr/lib/python3.8/contextlib.py:113: in __enter__
return next(self.gen)
tests/conftest.py:107: in get_cuda_cluster
from dask_cuda import LocalCUDACluster
/usr/local/lib/python3.8/dist-packages/dask_cuda/__init__.py:12: in <module>
from .cuda_worker import CUDAWorker


from __future__ import absolute_import, division, print_function

import asyncio
import atexit
import os
import warnings

from toolz import valmap
from tornado.ioloop import IOLoop

import dask
from dask.utils import parse_bytes
from distributed import Nanny
from distributed.core import Server
from distributed.deploy.cluster import Cluster
from distributed.proctitle import (
    enable_proctitle_on_children,
    enable_proctitle_on_current,
)

from distributed.worker_memory import parse_memory_limit
E ModuleNotFoundError: No module named 'distributed.worker_memory'

/usr/local/lib/python3.8/dist-packages/dask_cuda/cuda_worker.py:20: ModuleNotFoundError
__________________________ test_multigpu_dask_example __________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-2/test_multigpu_dask_example0')

def test_multigpu_dask_example(tmpdir):
  with get_cuda_cluster() as cuda_cluster:

tests/unit/test_notebooks.py:268:


/usr/lib/python3.8/contextlib.py:113: in __enter__
return next(self.gen)
tests/conftest.py:107: in get_cuda_cluster
from dask_cuda import LocalCUDACluster
/usr/local/lib/python3.8/dist-packages/dask_cuda/__init__.py:12: in <module>
from .cuda_worker import CUDAWorker


from __future__ import absolute_import, division, print_function

import asyncio
import atexit
import os
import warnings

from toolz import valmap
from tornado.ioloop import IOLoop

import dask
from dask.utils import parse_bytes
from distributed import Nanny
from distributed.core import Server
from distributed.deploy.cluster import Cluster
from distributed.proctitle import (
    enable_proctitle_on_children,
    enable_proctitle_on_current,
)

from distributed.worker_memory import parse_memory_limit
E ModuleNotFoundError: No module named 'distributed.worker_memory'

/usr/local/lib/python3.8/dist-packages/dask_cuda/cuda_worker.py:20: ModuleNotFoundError
=============================== warnings summary ===============================
../../../../../usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33
/usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
DASK_VERSION = LooseVersion(dask.__version__)

../../../.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: 34 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
other = LooseVersion(other)
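
Both LooseVersion warnings point at the same remedy: compare versions with packaging.version instead of the deprecated distutils classes. A minimal sketch of the suggested replacement; the version threshold below is illustrative only:

from packaging.version import Version

import dask

DASK_VERSION = Version(dask.__version__)

# Example comparison; the threshold is an assumption for illustration.
if DASK_VERSION < Version("2022.1.0"):
    raise RuntimeError(f"dask {DASK_VERSION} is older than expected")
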

nvtabular/loader/__init__.py:19
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/loader/__init__.py:19: DeprecationWarning: The nvtabular.loader module has moved to merlin.models.loader. Support for importing from nvtabular.loader is deprecated, and will be removed in a future version. Please update your imports to refer to merlin.models.loader.
warnings.warn(

tests/unit/test_dask_nvt.py: 1 warning
tests/unit/test_tf4rec.py: 1 warning
tests/unit/test_tools.py: 5 warnings
tests/unit/test_triton_inference.py: 8 warnings
tests/unit/loader/test_dataloader_backend.py: 6 warnings
tests/unit/loader/test_tf_dataloader.py: 66 warnings
tests/unit/loader/test_torch_dataloader.py: 67 warnings
tests/unit/ops/test_categorify.py: 69 warnings
tests/unit/ops/test_drop_low_cardinality.py: 2 warnings
tests/unit/ops/test_fill.py: 8 warnings
tests/unit/ops/test_hash_bucket.py: 4 warnings
tests/unit/ops/test_join.py: 88 warnings
tests/unit/ops/test_lambda.py: 1 warning
tests/unit/ops/test_normalize.py: 9 warnings
tests/unit/ops/test_ops.py: 11 warnings
tests/unit/ops/test_ops_schema.py: 17 warnings
tests/unit/workflow/test_workflow.py: 27 warnings
tests/unit/workflow/test_workflow_chaining.py: 1 warning
tests/unit/workflow/test_workflow_node.py: 1 warning
tests/unit/workflow/test_workflow_schemas.py: 1 warning
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(

tests/unit/test_dask_nvt.py: 12 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 8 files.
warnings.warn(

tests/unit/test_dask_nvt.py::test_merlin_core_execution_managers
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/core/utils.py:431: UserWarning: Existing Dask-client object detected in the current context. New cuda cluster will not be deployed. Set force_new to True to ignore running clusters.
warnings.warn(

tests/unit/test_notebooks.py: 1 warning
tests/unit/test_tools.py: 17 warnings
tests/unit/loader/test_tf_dataloader.py: 2 warnings
tests/unit/loader/test_torch_dataloader.py: 54 warnings
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:2940: FutureWarning: Series.ceil and DataFrame.ceil are deprecated and will be removed in the future
warnings.warn(

tests/unit/loader/test_tf_dataloader.py: 2 warnings
tests/unit/loader/test_torch_dataloader.py: 12 warnings
tests/unit/workflow/test_workflow.py: 9 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 2 files.
warnings.warn(

tests/unit/ops/test_fill.py::test_fill_missing[True-True-parquet]
tests/unit/ops/test_fill.py::test_fill_missing[True-False-parquet]
tests/unit/ops/test_ops.py::test_filter[parquet-0.1-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/pandas/core/indexing.py:1732: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self._setitem_single_block(indexer, value, name)

tests/unit/workflow/test_cpu_workflow.py: 6 warnings
tests/unit/workflow/test_workflow.py: 12 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 10 files.
warnings.warn(

tests/unit/workflow/test_workflow.py: 48 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 20 files.
warnings.warn(

tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_parquet_output[True-None]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-None]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-None]
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 4 files.
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED tests/unit/test_notebooks.py::test_criteo_tf_notebook - subprocess.Cal...
FAILED tests/unit/test_notebooks.py::test_criteo_pyt_notebook - subprocess.Ca...
FAILED tests/unit/test_notebooks.py::test_optimize_criteo - ModuleNotFoundErr...
FAILED tests/unit/test_notebooks.py::test_multigpu_dask_example - ModuleNotFo...
===== 4 failed, 1417 passed, 2 skipped, 617 warnings in 634.60s (0:10:34) ======
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins723176057548274873.sh

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #1597 of commit 99df9374422c34e000b9d8d11da0d7061ef46aed, no merge conflicts.
Running as SYSTEM
Setting status of 99df9374422c34e000b9d8d11da0d7061ef46aed to PENDING with url http://10.20.17.181:8080/job/nvtabular_tests/4551/ and message: 'Build started for merge commit.'
Using context: Jenkins Unit Test Run
Building on master in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA-Merlin/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/pull/1597/*:refs/remotes/origin/pr/1597/* # timeout=10
 > git rev-parse 99df9374422c34e000b9d8d11da0d7061ef46aed^{commit} # timeout=10
Checking out Revision 99df9374422c34e000b9d8d11da0d7061ef46aed (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 99df9374422c34e000b9d8d11da0d7061ef46aed # timeout=10
Commit message: "change documentation"
 > git rev-list --no-walk 31a8e3b5157f7882b728d63c3c7258582641a470 # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins17101250722631245591.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 1422 items / 1 skipped

tests/unit/test_dask_nvt.py ............................................ [ 3%]
........................................................................ [ 8%]
[ 8%]
tests/unit/test_notebooks.py FFF..F [ 8%]
tests/unit/test_tf4rec.py . [ 8%]
tests/unit/test_tools.py ...................... [ 10%]
tests/unit/test_triton_inference.py ................................ [ 12%]
tests/unit/framework_utils/test_tf_feature_columns.py . [ 12%]
tests/unit/framework_utils/test_tf_layers.py ........................... [ 14%]
................................................... [ 18%]
tests/unit/framework_utils/test_torch_layers.py . [ 18%]
tests/unit/loader/test_dataloader_backend.py ...... [ 18%]
tests/unit/loader/test_tf_dataloader.py ................................ [ 20%]
........................................s.. [ 23%]
tests/unit/loader/test_torch_dataloader.py ............................. [ 25%]
...................................................... [ 29%]
tests/unit/ops/test_categorify.py ...................................... [ 32%]
........................................................................ [ 37%]
........................................... [ 40%]
tests/unit/ops/test_column_similarity.py ........................ [ 42%]
tests/unit/ops/test_drop_low_cardinality.py .. [ 42%]
tests/unit/ops/test_fill.py ............................................ [ 45%]
........ [ 45%]
tests/unit/ops/test_groupyby.py ................. [ 47%]
tests/unit/ops/test_hash_bucket.py ......................... [ 48%]
tests/unit/ops/test_join.py ............................................ [ 51%]
........................................................................ [ 56%]
.................................. [ 59%]
tests/unit/ops/test_lambda.py .......... [ 60%]
tests/unit/ops/test_normalize.py ....................................... [ 62%]
.. [ 62%]
tests/unit/ops/test_ops.py ............................................. [ 66%]
.................... [ 67%]
tests/unit/ops/test_ops_schema.py ...................................... [ 70%]
........................................................................ [ 75%]
........................................................................ [ 80%]
........................................................................ [ 85%]
....................................... [ 88%]
tests/unit/ops/test_reduce_dtype_size.py .. [ 88%]
tests/unit/ops/test_target_encode.py ..................... [ 89%]
tests/unit/workflow/test_cpu_workflow.py ...... [ 90%]
tests/unit/workflow/test_workflow.py ................................... [ 92%]
.......................................................... [ 96%]
tests/unit/workflow/test_workflow_chaining.py ... [ 96%]
tests/unit/workflow/test_workflow_node.py ........... [ 97%]
tests/unit/workflow/test_workflow_ops.py ... [ 97%]
tests/unit/workflow/test_workflow_schemas.py ........................... [ 99%]
... [100%]

=================================== FAILURES ===================================
___________________________ test_criteo_tf_notebook ____________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-2/test_criteo_tf_notebook0')

def test_criteo_tf_notebook(tmpdir):
    tor = pytest.importorskip("tensorflow")  # noqa
    # create a toy dataset in tmpdir, and point environment variables so the notebook
    # will read from it
    os.system("mkdir -p " + os.path.join(tmpdir, "converted/criteo"))
    for i in range(24):
        df = _get_random_criteo_data(1000)
        df.to_parquet(os.path.join(tmpdir, "converted/criteo", f"day_{i}.parquet"))
    os.environ["BASE_DIR"] = str(tmpdir)

    def _nb_modify(line):
        # Disable LocalCUDACluster
        line = line.replace("client.run(_rmm_pool)", "# client.run(_rmm_pool)")
        line = line.replace("if cluster is None:", "if False:")
        line = line.replace("client = Client(cluster)", "# client = Client(cluster)")
        line = line.replace(
            "workflow = nvt.Workflow(features, client=client)", "workflow = nvt.Workflow(features)"
        )
        line = line.replace("client", "# client")
        line = line.replace("NUM_GPUS = [0, 1, 2, 3, 4, 5, 6, 7]", "NUM_GPUS = [0]")
        line = line.replace("part_size = int(part_mem_frac * device_size)", "part_size = '128MB'")

        return line
  _run_notebook(
        tmpdir,
        os.path.join(
            dirname(TEST_PATH),
            "examples/scaling-criteo/",
            "02-ETL-with-NVTabular.ipynb",
        ),
        # disable rmm.reinitialize, seems to be causing issues
        transform=_nb_modify,
    )

tests/unit/test_notebooks.py:62:


tests/unit/test_notebooks.py:307: in _run_notebook
subprocess.check_output([sys.executable, script_path])
/usr/lib/python3.8/subprocess.py:415: in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,


input = None, capture_output = False, timeout = None, check = True
popenargs = (['/usr/bin/python3', '/tmp/pytest-of-jenkins/pytest-2/test_criteo_tf_notebook0/notebook.py'],)
kwargs = {'stdout': -1}, process = <subprocess.Popen object at 0x7f77072fffd0>
stdout = b'', stderr = None, retcode = 1

def run(*popenargs,
        input=None, capture_output=False, timeout=None, check=False, **kwargs):
    """Run command with arguments and return a CompletedProcess instance.

    The returned instance will have attributes args, returncode, stdout and
    stderr. By default, stdout and stderr are not captured, and those attributes
    will be None. Pass stdout=PIPE and/or stderr=PIPE in order to capture them.

    If check is True and the exit code was non-zero, it raises a
    CalledProcessError. The CalledProcessError object will have the return code
    in the returncode attribute, and output & stderr attributes if those streams
    were captured.

    If timeout is given, and the process takes too long, a TimeoutExpired
    exception will be raised.

    There is an optional argument "input", allowing you to
    pass bytes or a string to the subprocess's stdin.  If you use this argument
    you may not also use the Popen constructor's "stdin" argument, as
    it will be used internally.

    By default, all communication is in bytes, and therefore any "input" should
    be bytes, and the stdout and stderr will be bytes. If in text mode, any
    "input" should be a string, and stdout and stderr will be strings decoded
    according to locale encoding, or by "encoding" if set. Text mode is
    triggered by setting any of text, encoding, errors or universal_newlines.

    The other arguments are the same as for the Popen constructor.
    """
    if input is not None:
        if kwargs.get('stdin') is not None:
            raise ValueError('stdin and input arguments may not both be used.')
        kwargs['stdin'] = PIPE

    if capture_output:
        if kwargs.get('stdout') is not None or kwargs.get('stderr') is not None:
            raise ValueError('stdout and stderr arguments may not be used '
                             'with capture_output.')
        kwargs['stdout'] = PIPE
        kwargs['stderr'] = PIPE

    with Popen(*popenargs, **kwargs) as process:
        try:
            stdout, stderr = process.communicate(input, timeout=timeout)
        except TimeoutExpired as exc:
            process.kill()
            if _mswindows:
                # Windows accumulates the output in a single blocking
                # read() call run on child threads, with the timeout
                # being done in a join() on those threads.  communicate()
                # _after_ kill() is required to collect that and add it
                # to the exception.
                exc.stdout, exc.stderr = process.communicate()
            else:
                # POSIX _communicate already populated the output so
                # far into the TimeoutExpired exception.
                process.wait()
            raise
        except:  # Including KeyboardInterrupt, communicate handled that.
            process.kill()
            # We don't call process.wait() as .__exit__ does that for us.
            raise
        retcode = process.poll()
        if check and retcode:
          raise CalledProcessError(retcode, process.args,
                                     output=stdout, stderr=stderr)

E subprocess.CalledProcessError: Command '['/usr/bin/python3', '/tmp/pytest-of-jenkins/pytest-2/test_criteo_tf_notebook0/notebook.py']' returned non-zero exit status 1.

/usr/lib/python3.8/subprocess.py:516: CalledProcessError
----------------------------- Captured stderr call -----------------------------
Traceback (most recent call last):
File "/tmp/pytest-of-jenkins/pytest-2/test_criteo_tf_notebook0/notebook.py", line 24, in
from dask_cuda import LocalCUDACluster
File "/usr/local/lib/python3.8/dist-packages/dask_cuda/init.py", line 12, in
from .cuda_worker import CUDAWorker
File "/usr/local/lib/python3.8/dist-packages/dask_cuda/cuda_worker.py", line 20, in
from distributed.worker_memory import parse_memory_limit
ModuleNotFoundError: No module named 'distributed.worker_memory'
___________________________ test_criteo_pyt_notebook ___________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-2/test_criteo_pyt_notebook0')

def test_criteo_pyt_notebook(tmpdir):
    tor = pytest.importorskip("fastai")  # noqa
    # create a toy dataset in tmpdir, and point environment variables so the notebook
    # will read from it
    os.system("mkdir -p " + os.path.join(tmpdir, "converted/criteo"))
    for i in range(24):
        df = _get_random_criteo_data(1000)
        df.to_parquet(os.path.join(tmpdir, "converted/criteo", f"day_{i}.parquet"))
    os.environ["BASE_DIR"] = str(tmpdir)

    def _nb_modify(line):
        # Disable LocalCUDACluster
        line = line.replace("client.run(_rmm_pool)", "# client.run(_rmm_pool)")
        line = line.replace("if cluster is None:", "if False:")
        line = line.replace("client = Client(cluster)", "# client = Client(cluster)")
        line = line.replace(
            "workflow = nvt.Workflow(features, client=client)", "workflow = nvt.Workflow(features)"
        )
        line = line.replace("client", "# client")
        line = line.replace("NUM_GPUS = [0, 1, 2, 3, 4, 5, 6, 7]", "NUM_GPUS = [0]")
        line = line.replace("part_size = int(part_mem_frac * device_size)", "part_size = '128MB'")
        return line
  _run_notebook(
        tmpdir,
        os.path.join(
            dirname(TEST_PATH),
            "examples/scaling-criteo/",
            "02-ETL-with-NVTabular.ipynb",
        ),
        # disable rmm.reinitialize, seems to be causing issues
        transform=_nb_modify,
    )

tests/unit/test_notebooks.py:114:


tests/unit/test_notebooks.py:307: in _run_notebook
subprocess.check_output([sys.executable, script_path])
/usr/lib/python3.8/subprocess.py:415: in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,


input = None, capture_output = False, timeout = None, check = True
popenargs = (['/usr/bin/python3', '/tmp/pytest-of-jenkins/pytest-2/test_criteo_pyt_notebook0/notebook.py'],)
kwargs = {'stdout': -1}, process = <subprocess.Popen object at 0x7f76d084d820>
stdout = b'', stderr = None, retcode = 1

def run(*popenargs,
        input=None, capture_output=False, timeout=None, check=False, **kwargs):
    """Run command with arguments and return a CompletedProcess instance.

    The returned instance will have attributes args, returncode, stdout and
    stderr. By default, stdout and stderr are not captured, and those attributes
    will be None. Pass stdout=PIPE and/or stderr=PIPE in order to capture them.

    If check is True and the exit code was non-zero, it raises a
    CalledProcessError. The CalledProcessError object will have the return code
    in the returncode attribute, and output & stderr attributes if those streams
    were captured.

    If timeout is given, and the process takes too long, a TimeoutExpired
    exception will be raised.

    There is an optional argument "input", allowing you to
    pass bytes or a string to the subprocess's stdin.  If you use this argument
    you may not also use the Popen constructor's "stdin" argument, as
    it will be used internally.

    By default, all communication is in bytes, and therefore any "input" should
    be bytes, and the stdout and stderr will be bytes. If in text mode, any
    "input" should be a string, and stdout and stderr will be strings decoded
    according to locale encoding, or by "encoding" if set. Text mode is
    triggered by setting any of text, encoding, errors or universal_newlines.

    The other arguments are the same as for the Popen constructor.
    """
    if input is not None:
        if kwargs.get('stdin') is not None:
            raise ValueError('stdin and input arguments may not both be used.')
        kwargs['stdin'] = PIPE

    if capture_output:
        if kwargs.get('stdout') is not None or kwargs.get('stderr') is not None:
            raise ValueError('stdout and stderr arguments may not be used '
                             'with capture_output.')
        kwargs['stdout'] = PIPE
        kwargs['stderr'] = PIPE

    with Popen(*popenargs, **kwargs) as process:
        try:
            stdout, stderr = process.communicate(input, timeout=timeout)
        except TimeoutExpired as exc:
            process.kill()
            if _mswindows:
                # Windows accumulates the output in a single blocking
                # read() call run on child threads, with the timeout
                # being done in a join() on those threads.  communicate()
                # _after_ kill() is required to collect that and add it
                # to the exception.
                exc.stdout, exc.stderr = process.communicate()
            else:
                # POSIX _communicate already populated the output so
                # far into the TimeoutExpired exception.
                process.wait()
            raise
        except:  # Including KeyboardInterrupt, communicate handled that.
            process.kill()
            # We don't call process.wait() as .__exit__ does that for us.
            raise
        retcode = process.poll()
        if check and retcode:
          raise CalledProcessError(retcode, process.args,
                                     output=stdout, stderr=stderr)

E subprocess.CalledProcessError: Command '['/usr/bin/python3', '/tmp/pytest-of-jenkins/pytest-2/test_criteo_pyt_notebook0/notebook.py']' returned non-zero exit status 1.

/usr/lib/python3.8/subprocess.py:516: CalledProcessError
----------------------------- Captured stderr call -----------------------------
Traceback (most recent call last):
File "/tmp/pytest-of-jenkins/pytest-2/test_criteo_pyt_notebook0/notebook.py", line 24, in
from dask_cuda import LocalCUDACluster
File "/usr/local/lib/python3.8/dist-packages/dask_cuda/init.py", line 12, in
from .cuda_worker import CUDAWorker
File "/usr/local/lib/python3.8/dist-packages/dask_cuda/cuda_worker.py", line 20, in
from distributed.worker_memory import parse_memory_limit
ModuleNotFoundError: No module named 'distributed.worker_memory'
_____________________________ test_optimize_criteo _____________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-2/test_optimize_criteo0')

def test_optimize_criteo(tmpdir):
    input_path = str(tmpdir.mkdir("input"))
    _get_random_criteo_data(1000).to_csv(os.path.join(input_path, "day_0"), sep="\t", header=False)
    os.environ["INPUT_DATA_DIR"] = input_path
    os.environ["OUTPUT_DATA_DIR"] = str(tmpdir.mkdir("output"))
  with get_cuda_cluster() as cuda_cluster:

tests/unit/test_notebooks.py:140:


/usr/lib/python3.8/contextlib.py:113: in __enter__
return next(self.gen)
tests/conftest.py:107: in get_cuda_cluster
from dask_cuda import LocalCUDACluster
/usr/local/lib/python3.8/dist-packages/dask_cuda/__init__.py:12: in <module>
from .cuda_worker import CUDAWorker


from __future__ import absolute_import, division, print_function

import asyncio
import atexit
import os
import warnings

from toolz import valmap
from tornado.ioloop import IOLoop

import dask
from dask.utils import parse_bytes
from distributed import Nanny
from distributed.core import Server
from distributed.deploy.cluster import Cluster
from distributed.proctitle import (
    enable_proctitle_on_children,
    enable_proctitle_on_current,
)

from distributed.worker_memory import parse_memory_limit
E ModuleNotFoundError: No module named 'distributed.worker_memory'

/usr/local/lib/python3.8/dist-packages/dask_cuda/cuda_worker.py:20: ModuleNotFoundError
__________________________ test_multigpu_dask_example __________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-2/test_multigpu_dask_example0')

def test_multigpu_dask_example(tmpdir):
  with get_cuda_cluster() as cuda_cluster:

tests/unit/test_notebooks.py:268:


/usr/lib/python3.8/contextlib.py:113: in __enter__
return next(self.gen)
tests/conftest.py:107: in get_cuda_cluster
from dask_cuda import LocalCUDACluster
/usr/local/lib/python3.8/dist-packages/dask_cuda/__init__.py:12: in <module>
from .cuda_worker import CUDAWorker


from __future__ import absolute_import, division, print_function

import asyncio
import atexit
import os
import warnings

from toolz import valmap
from tornado.ioloop import IOLoop

import dask
from dask.utils import parse_bytes
from distributed import Nanny
from distributed.core import Server
from distributed.deploy.cluster import Cluster
from distributed.proctitle import (
    enable_proctitle_on_children,
    enable_proctitle_on_current,
)

from distributed.worker_memory import parse_memory_limit
E ModuleNotFoundError: No module named 'distributed.worker_memory'

/usr/local/lib/python3.8/dist-packages/dask_cuda/cuda_worker.py:20: ModuleNotFoundError
=============================== warnings summary ===============================
../../../../../usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33
/usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
DASK_VERSION = LooseVersion(dask.__version__)

../../../.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: 34 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
other = LooseVersion(other)

nvtabular/loader/__init__.py:19
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/loader/__init__.py:19: DeprecationWarning: The nvtabular.loader module has moved to merlin.models.loader. Support for importing from nvtabular.loader is deprecated, and will be removed in a future version. Please update your imports to refer to merlin.models.loader.
warnings.warn(

tests/unit/test_dask_nvt.py: 1 warning
tests/unit/test_tf4rec.py: 1 warning
tests/unit/test_tools.py: 5 warnings
tests/unit/test_triton_inference.py: 8 warnings
tests/unit/loader/test_dataloader_backend.py: 6 warnings
tests/unit/loader/test_tf_dataloader.py: 66 warnings
tests/unit/loader/test_torch_dataloader.py: 67 warnings
tests/unit/ops/test_categorify.py: 69 warnings
tests/unit/ops/test_drop_low_cardinality.py: 2 warnings
tests/unit/ops/test_fill.py: 8 warnings
tests/unit/ops/test_hash_bucket.py: 4 warnings
tests/unit/ops/test_join.py: 88 warnings
tests/unit/ops/test_lambda.py: 1 warning
tests/unit/ops/test_normalize.py: 9 warnings
tests/unit/ops/test_ops.py: 11 warnings
tests/unit/ops/test_ops_schema.py: 17 warnings
tests/unit/workflow/test_workflow.py: 27 warnings
tests/unit/workflow/test_workflow_chaining.py: 1 warning
tests/unit/workflow/test_workflow_node.py: 1 warning
tests/unit/workflow/test_workflow_schemas.py: 1 warning
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(

tests/unit/test_dask_nvt.py: 12 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 8 files.
warnings.warn(

tests/unit/test_dask_nvt.py::test_merlin_core_execution_managers
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/core/utils.py:431: UserWarning: Existing Dask-client object detected in the current context. New cuda cluster will not be deployed. Set force_new to True to ignore running clusters.
warnings.warn(

tests/unit/test_notebooks.py: 1 warning
tests/unit/test_tools.py: 17 warnings
tests/unit/loader/test_tf_dataloader.py: 2 warnings
tests/unit/loader/test_torch_dataloader.py: 54 warnings
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:2940: FutureWarning: Series.ceil and DataFrame.ceil are deprecated and will be removed in the future
warnings.warn(

tests/unit/loader/test_tf_dataloader.py: 2 warnings
tests/unit/loader/test_torch_dataloader.py: 12 warnings
tests/unit/workflow/test_workflow.py: 9 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 2 files.
warnings.warn(

tests/unit/ops/test_fill.py::test_fill_missing[True-True-parquet]
tests/unit/ops/test_fill.py::test_fill_missing[True-False-parquet]
tests/unit/ops/test_ops.py::test_filter[parquet-0.1-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/pandas/core/indexing.py:1732: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self._setitem_single_block(indexer, value, name)

tests/unit/workflow/test_cpu_workflow.py: 6 warnings
tests/unit/workflow/test_workflow.py: 12 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 10 files.
warnings.warn(

tests/unit/workflow/test_workflow.py: 48 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 20 files.
warnings.warn(

tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_parquet_output[True-None]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-None]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-None]
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 4 files.
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED tests/unit/test_notebooks.py::test_criteo_tf_notebook - subprocess.Cal...
FAILED tests/unit/test_notebooks.py::test_criteo_pyt_notebook - subprocess.Ca...
FAILED tests/unit/test_notebooks.py::test_optimize_criteo - ModuleNotFoundErr...
FAILED tests/unit/test_notebooks.py::test_multigpu_dask_example - ModuleNotFo...
===== 4 failed, 1417 passed, 2 skipped, 617 warnings in 628.45s (0:10:28) ======
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins14389672305796437863.sh

@oliverholworthy
Member

rerun tests

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #1597 of commit 99df9374422c34e000b9d8d11da0d7061ef46aed, no merge conflicts.
Running as SYSTEM
Setting status of 99df9374422c34e000b9d8d11da0d7061ef46aed to PENDING with url http://10.20.17.181:8080/job/nvtabular_tests/4552/ and message: 'Build started for merge commit.'
Using context: Jenkins Unit Test Run
Building on master in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA-Merlin/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/pull/1597/*:refs/remotes/origin/pr/1597/* # timeout=10
 > git rev-parse 99df9374422c34e000b9d8d11da0d7061ef46aed^{commit} # timeout=10
Checking out Revision 99df9374422c34e000b9d8d11da0d7061ef46aed (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 99df9374422c34e000b9d8d11da0d7061ef46aed # timeout=10
Commit message: "change documentation"
 > git rev-list --no-walk 99df9374422c34e000b9d8d11da0d7061ef46aed # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins7398654711454226637.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 1422 items / 1 skipped

tests/unit/test_dask_nvt.py ............................................ [ 3%]
........................................................................ [ 8%]
[ 8%]
tests/unit/test_notebooks.py FFF..F [ 8%]
tests/unit/test_tf4rec.py . [ 8%]
tests/unit/test_tools.py ...................... [ 10%]
tests/unit/test_triton_inference.py ................................ [ 12%]
tests/unit/framework_utils/test_tf_feature_columns.py . [ 12%]
tests/unit/framework_utils/test_tf_layers.py ........................... [ 14%]
................................................... [ 18%]
tests/unit/framework_utils/test_torch_layers.py . [ 18%]
tests/unit/loader/test_dataloader_backend.py ...... [ 18%]
tests/unit/loader/test_tf_dataloader.py ................................ [ 20%]
........................................s.. [ 23%]
tests/unit/loader/test_torch_dataloader.py ............................. [ 25%]
...................................................... [ 29%]
tests/unit/ops/test_categorify.py ...................................... [ 32%]
........................................................................ [ 37%]
........................................... [ 40%]
tests/unit/ops/test_column_similarity.py ........................ [ 42%]
tests/unit/ops/test_drop_low_cardinality.py .. [ 42%]
tests/unit/ops/test_fill.py ............................................ [ 45%]
........ [ 45%]
tests/unit/ops/test_groupyby.py ................. [ 47%]
tests/unit/ops/test_hash_bucket.py ......................... [ 48%]
tests/unit/ops/test_join.py ............................................ [ 51%]
........................................................................ [ 56%]
.................................. [ 59%]
tests/unit/ops/test_lambda.py .......... [ 60%]
tests/unit/ops/test_normalize.py ....................................... [ 62%]
.. [ 62%]
tests/unit/ops/test_ops.py ............................................. [ 66%]
.................... [ 67%]
tests/unit/ops/test_ops_schema.py ...................................... [ 70%]
........................................................................ [ 75%]
........................................................................ [ 80%]
........................................................................ [ 85%]
....................................... [ 88%]
tests/unit/ops/test_reduce_dtype_size.py .. [ 88%]
tests/unit/ops/test_target_encode.py ..................... [ 89%]
tests/unit/workflow/test_cpu_workflow.py ...... [ 90%]
tests/unit/workflow/test_workflow.py ................................... [ 92%]
.......................................................... [ 96%]
tests/unit/workflow/test_workflow_chaining.py ... [ 96%]
tests/unit/workflow/test_workflow_node.py ........... [ 97%]
tests/unit/workflow/test_workflow_ops.py ... [ 97%]
tests/unit/workflow/test_workflow_schemas.py ........................... [ 99%]
... [100%]

=================================== FAILURES ===================================
___________________________ test_criteo_tf_notebook ____________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-1/test_criteo_tf_notebook0')

def test_criteo_tf_notebook(tmpdir):
    tor = pytest.importorskip("tensorflow")  # noqa
    # create a toy dataset in tmpdir, and point environment variables so the notebook
    # will read from it
    os.system("mkdir -p " + os.path.join(tmpdir, "converted/criteo"))
    for i in range(24):
        df = _get_random_criteo_data(1000)
        df.to_parquet(os.path.join(tmpdir, "converted/criteo", f"day_{i}.parquet"))
    os.environ["BASE_DIR"] = str(tmpdir)

    def _nb_modify(line):
        # Disable LocalCUDACluster
        line = line.replace("client.run(_rmm_pool)", "# client.run(_rmm_pool)")
        line = line.replace("if cluster is None:", "if False:")
        line = line.replace("client = Client(cluster)", "# client = Client(cluster)")
        line = line.replace(
            "workflow = nvt.Workflow(features, client=client)", "workflow = nvt.Workflow(features)"
        )
        line = line.replace("client", "# client")
        line = line.replace("NUM_GPUS = [0, 1, 2, 3, 4, 5, 6, 7]", "NUM_GPUS = [0]")
        line = line.replace("part_size = int(part_mem_frac * device_size)", "part_size = '128MB'")

        return line
  _run_notebook(
        tmpdir,
        os.path.join(
            dirname(TEST_PATH),
            "examples/scaling-criteo/",
            "02-ETL-with-NVTabular.ipynb",
        ),
        # disable rmm.reinitialize, seems to be causing issues
        transform=_nb_modify,
    )

tests/unit/test_notebooks.py:62:


tests/unit/test_notebooks.py:307: in _run_notebook
subprocess.check_output([sys.executable, script_path])
/usr/lib/python3.8/subprocess.py:415: in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,


input = None, capture_output = False, timeout = None, check = True
popenargs = (['/usr/bin/python3', '/tmp/pytest-of-jenkins/pytest-1/test_criteo_tf_notebook0/notebook.py'],)
kwargs = {'stdout': -1}, process = <subprocess.Popen object at 0x7f12422f6e20>
stdout = b'', stderr = None, retcode = 1

def run(*popenargs,
        input=None, capture_output=False, timeout=None, check=False, **kwargs):
    """Run command with arguments and return a CompletedProcess instance.

    The returned instance will have attributes args, returncode, stdout and
    stderr. By default, stdout and stderr are not captured, and those attributes
    will be None. Pass stdout=PIPE and/or stderr=PIPE in order to capture them.

    If check is True and the exit code was non-zero, it raises a
    CalledProcessError. The CalledProcessError object will have the return code
    in the returncode attribute, and output & stderr attributes if those streams
    were captured.

    If timeout is given, and the process takes too long, a TimeoutExpired
    exception will be raised.

    There is an optional argument "input", allowing you to
    pass bytes or a string to the subprocess's stdin.  If you use this argument
    you may not also use the Popen constructor's "stdin" argument, as
    it will be used internally.

    By default, all communication is in bytes, and therefore any "input" should
    be bytes, and the stdout and stderr will be bytes. If in text mode, any
    "input" should be a string, and stdout and stderr will be strings decoded
    according to locale encoding, or by "encoding" if set. Text mode is
    triggered by setting any of text, encoding, errors or universal_newlines.

    The other arguments are the same as for the Popen constructor.
    """
    if input is not None:
        if kwargs.get('stdin') is not None:
            raise ValueError('stdin and input arguments may not both be used.')
        kwargs['stdin'] = PIPE

    if capture_output:
        if kwargs.get('stdout') is not None or kwargs.get('stderr') is not None:
            raise ValueError('stdout and stderr arguments may not be used '
                             'with capture_output.')
        kwargs['stdout'] = PIPE
        kwargs['stderr'] = PIPE

    with Popen(*popenargs, **kwargs) as process:
        try:
            stdout, stderr = process.communicate(input, timeout=timeout)
        except TimeoutExpired as exc:
            process.kill()
            if _mswindows:
                # Windows accumulates the output in a single blocking
                # read() call run on child threads, with the timeout
                # being done in a join() on those threads.  communicate()
                # _after_ kill() is required to collect that and add it
                # to the exception.
                exc.stdout, exc.stderr = process.communicate()
            else:
                # POSIX _communicate already populated the output so
                # far into the TimeoutExpired exception.
                process.wait()
            raise
        except:  # Including KeyboardInterrupt, communicate handled that.
            process.kill()
            # We don't call process.wait() as .__exit__ does that for us.
            raise
        retcode = process.poll()
        if check and retcode:
          raise CalledProcessError(retcode, process.args,
                                     output=stdout, stderr=stderr)

E subprocess.CalledProcessError: Command '['/usr/bin/python3', '/tmp/pytest-of-jenkins/pytest-1/test_criteo_tf_notebook0/notebook.py']' returned non-zero exit status 1.

/usr/lib/python3.8/subprocess.py:516: CalledProcessError
----------------------------- Captured stderr call -----------------------------
Traceback (most recent call last):
File "/tmp/pytest-of-jenkins/pytest-1/test_criteo_tf_notebook0/notebook.py", line 24, in
from dask_cuda import LocalCUDACluster
File "/usr/local/lib/python3.8/dist-packages/dask_cuda/init.py", line 12, in
from .cuda_worker import CUDAWorker
File "/usr/local/lib/python3.8/dist-packages/dask_cuda/cuda_worker.py", line 20, in
from distributed.worker_memory import parse_memory_limit
ModuleNotFoundError: No module named 'distributed.worker_memory'
___________________________ test_criteo_pyt_notebook ___________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-1/test_criteo_pyt_notebook0')

def test_criteo_pyt_notebook(tmpdir):
    tor = pytest.importorskip("fastai")  # noqa
    # create a toy dataset in tmpdir, and point environment variables so the notebook
    # will read from it
    os.system("mkdir -p " + os.path.join(tmpdir, "converted/criteo"))
    for i in range(24):
        df = _get_random_criteo_data(1000)
        df.to_parquet(os.path.join(tmpdir, "converted/criteo", f"day_{i}.parquet"))
    os.environ["BASE_DIR"] = str(tmpdir)

    def _nb_modify(line):
        # Disable LocalCUDACluster
        line = line.replace("client.run(_rmm_pool)", "# client.run(_rmm_pool)")
        line = line.replace("if cluster is None:", "if False:")
        line = line.replace("client = Client(cluster)", "# client = Client(cluster)")
        line = line.replace(
            "workflow = nvt.Workflow(features, client=client)", "workflow = nvt.Workflow(features)"
        )
        line = line.replace("client", "# client")
        line = line.replace("NUM_GPUS = [0, 1, 2, 3, 4, 5, 6, 7]", "NUM_GPUS = [0]")
        line = line.replace("part_size = int(part_mem_frac * device_size)", "part_size = '128MB'")
        return line
  _run_notebook(
        tmpdir,
        os.path.join(
            dirname(TEST_PATH),
            "examples/scaling-criteo/",
            "02-ETL-with-NVTabular.ipynb",
        ),
        # disable rmm.reinitialize, seems to be causing issues
        transform=_nb_modify,
    )
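
The notebook tests follow a common pattern here: the notebook is exported to a plain Python script, each line is rewritten through the transform callback to stub out the LocalCUDACluster setup, and the result is executed in a subprocess. A simplified sketch of such a helper (hypothetical; the real _run_notebook lives in tests/unit/test_notebooks.py):

import subprocess
import sys

def run_patched_script(script_path, transform):
    # Hypothetical helper mirroring _run_notebook: rewrite each line of the
    # exported notebook script, then execute it in a child process.
    with open(script_path) as f:
        lines = [transform(line) for line in f]
    with open(script_path, "w") as f:
        f.writelines(lines)
    # check_output raises CalledProcessError if the script exits non-zero.
    subprocess.check_output([sys.executable, script_path])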

tests/unit/test_notebooks.py:114:


tests/unit/test_notebooks.py:307: in _run_notebook
subprocess.check_output([sys.executable, script_path])
/usr/lib/python3.8/subprocess.py:415: in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,


input = None, capture_output = False, timeout = None, check = True
popenargs = (['/usr/bin/python3', '/tmp/pytest-of-jenkins/pytest-1/test_criteo_pyt_notebook0/notebook.py'],)
kwargs = {'stdout': -1}, process = <subprocess.Popen object at 0x7f115573a070>
stdout = b'', stderr = None, retcode = 1


E subprocess.CalledProcessError: Command '['/usr/bin/python3', '/tmp/pytest-of-jenkins/pytest-1/test_criteo_pyt_notebook0/notebook.py']' returned non-zero exit status 1.

/usr/lib/python3.8/subprocess.py:516: CalledProcessError
----------------------------- Captured stderr call -----------------------------
Traceback (most recent call last):
File "/tmp/pytest-of-jenkins/pytest-1/test_criteo_pyt_notebook0/notebook.py", line 24, in
from dask_cuda import LocalCUDACluster
File "/usr/local/lib/python3.8/dist-packages/dask_cuda/init.py", line 12, in
from .cuda_worker import CUDAWorker
File "/usr/local/lib/python3.8/dist-packages/dask_cuda/cuda_worker.py", line 20, in
from distributed.worker_memory import parse_memory_limit
ModuleNotFoundError: No module named 'distributed.worker_memory'
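
For context on the repeated CalledProcessError above: check_output delegates to run(..., check=True), so any non-zero exit from the notebook script is re-raised in the test process. A minimal reproduction of that behavior (illustrative only):

import subprocess
import sys

# check=True makes run() raise CalledProcessError when the child exits
# non-zero, which is how a failing notebook script becomes a test failure.
try:
    subprocess.run([sys.executable, "-c", "raise SystemExit(1)"], check=True)
except subprocess.CalledProcessError as exc:
    print(exc.returncode)  # 1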
_____________________________ test_optimize_criteo _____________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-1/test_optimize_criteo0')

def test_optimize_criteo(tmpdir):
    input_path = str(tmpdir.mkdir("input"))
    _get_random_criteo_data(1000).to_csv(os.path.join(input_path, "day_0"), sep="\t", header=False)
    os.environ["INPUT_DATA_DIR"] = input_path
    os.environ["OUTPUT_DATA_DIR"] = str(tmpdir.mkdir("output"))
  with get_cuda_cluster() as cuda_cluster:

tests/unit/test_notebooks.py:140:


/usr/lib/python3.8/contextlib.py:113: in __enter__
return next(self.gen)
tests/conftest.py:107: in get_cuda_cluster
from dask_cuda import LocalCUDACluster
/usr/local/lib/python3.8/dist-packages/dask_cuda/__init__.py:12: in <module>
from .cuda_worker import CUDAWorker


from __future__ import absolute_import, division, print_function

import asyncio
import atexit
import os
import warnings

from toolz import valmap
from tornado.ioloop import IOLoop

import dask
from dask.utils import parse_bytes
from distributed import Nanny
from distributed.core import Server
from distributed.deploy.cluster import Cluster
from distributed.proctitle import (
    enable_proctitle_on_children,
    enable_proctitle_on_current,
)

from distributed.worker_memory import parse_memory_limit
E ModuleNotFoundError: No module named 'distributed.worker_memory'

/usr/local/lib/python3.8/dist-packages/dask_cuda/cuda_worker.py:20: ModuleNotFoundError
__________________________ test_multigpu_dask_example __________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-1/test_multigpu_dask_example0')

def test_multigpu_dask_example(tmpdir):
  with get_cuda_cluster() as cuda_cluster:

tests/unit/test_notebooks.py:268:


/usr/lib/python3.8/contextlib.py:113: in __enter__
return next(self.gen)
tests/conftest.py:107: in get_cuda_cluster
from dask_cuda import LocalCUDACluster
/usr/local/lib/python3.8/dist-packages/dask_cuda/__init__.py:12: in <module>
from .cuda_worker import CUDAWorker


from __future__ import absolute_import, division, print_function

import asyncio
import atexit
import os
import warnings

from toolz import valmap
from tornado.ioloop import IOLoop

import dask
from dask.utils import parse_bytes
from distributed import Nanny
from distributed.core import Server
from distributed.deploy.cluster import Cluster
from distributed.proctitle import (
    enable_proctitle_on_children,
    enable_proctitle_on_current,
)

from distributed.worker_memory import parse_memory_limit
E ModuleNotFoundError: No module named 'distributed.worker_memory'

/usr/local/lib/python3.8/dist-packages/dask_cuda/cuda_worker.py:20: ModuleNotFoundError
=============================== warnings summary ===============================
../../../../../usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33
/usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
DASK_VERSION = LooseVersion(dask.__version__)

../../../.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: 34 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
other = LooseVersion(other)

nvtabular/loader/__init__.py:19
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/loader/__init__.py:19: DeprecationWarning: The nvtabular.loader module has moved to merlin.models.loader. Support for importing from nvtabular.loader is deprecated, and will be removed in a future version. Please update your imports to refer to merlin.models.loader.
warnings.warn(
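
The deprecation warning above only asks for an import-path change; a sketch of the migration it describes (module names taken directly from the warning text):

# Deprecated location, still works but warns:
import nvtabular.loader  # emits DeprecationWarning
# New home named by the warning:
import merlin.models.loader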

tests/unit/test_dask_nvt.py: 1 warning
tests/unit/test_tf4rec.py: 1 warning
tests/unit/test_tools.py: 5 warnings
tests/unit/test_triton_inference.py: 8 warnings
tests/unit/loader/test_dataloader_backend.py: 6 warnings
tests/unit/loader/test_tf_dataloader.py: 66 warnings
tests/unit/loader/test_torch_dataloader.py: 67 warnings
tests/unit/ops/test_categorify.py: 69 warnings
tests/unit/ops/test_drop_low_cardinality.py: 2 warnings
tests/unit/ops/test_fill.py: 8 warnings
tests/unit/ops/test_hash_bucket.py: 4 warnings
tests/unit/ops/test_join.py: 88 warnings
tests/unit/ops/test_lambda.py: 1 warning
tests/unit/ops/test_normalize.py: 9 warnings
tests/unit/ops/test_ops.py: 11 warnings
tests/unit/ops/test_ops_schema.py: 17 warnings
tests/unit/workflow/test_workflow.py: 27 warnings
tests/unit/workflow/test_workflow_chaining.py: 1 warning
tests/unit/workflow/test_workflow_node.py: 1 warning
tests/unit/workflow/test_workflow_schemas.py: 1 warning
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(

tests/unit/test_dask_nvt.py: 12 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 8 files.
warnings.warn(

tests/unit/test_dask_nvt.py::test_merlin_core_execution_managers
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/core/utils.py:431: UserWarning: Existing Dask-client object detected in the current context. New cuda cluster will not be deployed. Set force_new to True to ignore running clusters.
warnings.warn(

tests/unit/test_notebooks.py: 1 warning
tests/unit/test_tools.py: 17 warnings
tests/unit/loader/test_tf_dataloader.py: 2 warnings
tests/unit/loader/test_torch_dataloader.py: 54 warnings
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:2940: FutureWarning: Series.ceil and DataFrame.ceil are deprecated and will be removed in the future
warnings.warn(

tests/unit/loader/test_tf_dataloader.py: 2 warnings
tests/unit/loader/test_torch_dataloader.py: 12 warnings
tests/unit/workflow/test_workflow.py: 9 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 2 files.
warnings.warn(

tests/unit/ops/test_fill.py::test_fill_missing[True-True-parquet]
tests/unit/ops/test_fill.py::test_fill_missing[True-False-parquet]
tests/unit/ops/test_ops.py::test_filter[parquet-0.1-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/pandas/core/indexing.py:1732: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self._setitem_single_block(indexer, value, name)

tests/unit/workflow/test_cpu_workflow.py: 6 warnings
tests/unit/workflow/test_workflow.py: 12 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 10 files.
warnings.warn(

tests/unit/workflow/test_workflow.py: 48 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 20 files.
warnings.warn(

tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_parquet_output[True-None]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-None]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-None]
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 4 files.
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED tests/unit/test_notebooks.py::test_criteo_tf_notebook - subprocess.Cal...
FAILED tests/unit/test_notebooks.py::test_criteo_pyt_notebook - subprocess.Ca...
FAILED tests/unit/test_notebooks.py::test_optimize_criteo - ModuleNotFoundErr...
FAILED tests/unit/test_notebooks.py::test_multigpu_dask_example - ModuleNotFo...
===== 4 failed, 1417 passed, 2 skipped, 617 warnings in 624.99s (0:10:24) ======
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.github.com/repos/NVIDIA-Merlin/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins2814575972107436327.sh

@jperez999
Contributor

rerun tests

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #1597 of commit 99df9374422c34e000b9d8d11da0d7061ef46aed, no merge conflicts.
Running as SYSTEM
Setting status of 99df9374422c34e000b9d8d11da0d7061ef46aed to PENDING with url http://10.20.17.181:8080/job/nvtabular_tests/4553/ and message: 'Build started for merge commit.'
Using context: Jenkins Unit Test Run
Building on master in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA-Merlin/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/pull/1597/*:refs/remotes/origin/pr/1597/* # timeout=10
 > git rev-parse 99df9374422c34e000b9d8d11da0d7061ef46aed^{commit} # timeout=10
Checking out Revision 99df9374422c34e000b9d8d11da0d7061ef46aed (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 99df9374422c34e000b9d8d11da0d7061ef46aed # timeout=10
Commit message: "change documentation"
 > git rev-list --no-walk 99df9374422c34e000b9d8d11da0d7061ef46aed # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins2396919500537035105.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 1422 items / 1 skipped

tests/unit/test_dask_nvt.py ............................................ [ 3%]
........................................................................ [ 8%]
[ 8%]
tests/unit/test_notebooks.py .....F [ 8%]
tests/unit/test_tf4rec.py . [ 8%]
tests/unit/test_tools.py ...................... [ 10%]
tests/unit/test_triton_inference.py ................................ [ 12%]
tests/unit/framework_utils/test_tf_feature_columns.py . [ 12%]
tests/unit/framework_utils/test_tf_layers.py ........................... [ 14%]
................................................... [ 18%]
tests/unit/framework_utils/test_torch_layers.py . [ 18%]
tests/unit/loader/test_dataloader_backend.py ...... [ 18%]
tests/unit/loader/test_tf_dataloader.py ................................ [ 20%]
........................................s.. [ 23%]
tests/unit/loader/test_torch_dataloader.py ............................. [ 25%]
...................................................... [ 29%]
tests/unit/ops/test_categorify.py ...................................... [ 32%]
........................................................................ [ 37%]
........................................... [ 40%]
tests/unit/ops/test_column_similarity.py ........................ [ 42%]
tests/unit/ops/test_drop_low_cardinality.py .. [ 42%]
tests/unit/ops/test_fill.py ............................................ [ 45%]
........ [ 45%]
tests/unit/ops/test_groupyby.py ................. [ 47%]
tests/unit/ops/test_hash_bucket.py ......................... [ 48%]
tests/unit/ops/test_join.py ............................................ [ 51%]
........................................................................ [ 56%]
.................................. [ 59%]
tests/unit/ops/test_lambda.py .......... [ 60%]
tests/unit/ops/test_normalize.py ....................................... [ 62%]
.. [ 62%]
tests/unit/ops/test_ops.py ............................................. [ 66%]
.................... [ 67%]
tests/unit/ops/test_ops_schema.py ...................................... [ 70%]
........................................................................ [ 75%]
........................................................................ [ 80%]
........................................................................ [ 85%]
....................................... [ 88%]
tests/unit/ops/test_reduce_dtype_size.py .. [ 88%]
tests/unit/ops/test_target_encode.py ..................... [ 89%]
tests/unit/workflow/test_cpu_workflow.py ...... [ 90%]
tests/unit/workflow/test_workflow.py ................................... [ 92%]
.......................................................... [ 96%]
tests/unit/workflow/test_workflow_chaining.py ... [ 96%]
tests/unit/workflow/test_workflow_node.py ........... [ 97%]
tests/unit/workflow/test_workflow_ops.py ... [ 97%]
tests/unit/workflow/test_workflow_schemas.py ........................... [ 99%]
... [100%]

=================================== FAILURES ===================================
__________________________ test_multigpu_dask_example __________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-2/test_multigpu_dask_example0')

def test_multigpu_dask_example(tmpdir):
    with get_cuda_cluster() as cuda_cluster:
        os.environ["BASE_DIR"] = str(tmpdir)
        scheduler_port = cuda_cluster.scheduler_address

        def _nb_modify(line):
            # Use cuda_cluster "fixture" port rather than allowing notebook
            # to deploy a LocalCUDACluster within the subprocess
            line = line.replace("cluster = None", f"cluster = '{scheduler_port}'")
            # Use a much smaller "toy" dataset
            line = line.replace("write_count = 25", "write_count = 4")
            line = line.replace('freq = "1s"', 'freq = "1h"')
            # Use smaller partitions for smaller dataset
            line = line.replace("part_mem_fraction=0.1", "part_size=1_000_000")
            line = line.replace("out_files_per_proc=8", "out_files_per_proc=1")
            return line

        notebook_path = os.path.join(
            dirname(TEST_PATH), "examples/multi-gpu-toy-example/", "multi-gpu_dask.ipynb"
        )
      _run_notebook(tmpdir, notebook_path, _nb_modify)

tests/unit/test_notebooks.py:287:


tests/unit/test_notebooks.py:307: in _run_notebook
subprocess.check_output([sys.executable, script_path])
/usr/lib/python3.8/subprocess.py:415: in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,


input = None, capture_output = False, timeout = None, check = True
popenargs = (['/usr/bin/python3', '/tmp/pytest-of-jenkins/pytest-2/test_multigpu_dask_example0/notebook.py'],)
kwargs = {'stdout': -1}, process = <subprocess.Popen object at 0x7fa2b0a494f0>
stdout = b'', stderr = None, retcode = 1


E subprocess.CalledProcessError: Command '['/usr/bin/python3', '/tmp/pytest-of-jenkins/pytest-2/test_multigpu_dask_example0/notebook.py']' returned non-zero exit status 1.

/usr/lib/python3.8/subprocess.py:516: CalledProcessError
----------------------------- Captured stderr call -----------------------------
2022-06-27 14:03:03,684 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize
2022-06-27 14:03:03,699 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(
Failed to transform operator <nvtabular.ops.normalize.Normalize object at 0x7f4ff01d9520>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow/workflow.py", line 519, in _transform_partition
output_df = node.op.transform(selection, input_df)
File "/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py", line 101, in inner
result = func(*args, **kwargs)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/normalize.py", line 84, in transform
values = values.astype(self.output_dtype)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/normalize.py", line 111, in output_dtype
return self.out_dtype or numpy.float64
AttributeError: 'Normalize' object has no attribute 'out_dtype'
2022-06-27 14:03:10,229 - distributed.worker - WARNING - Compute Failed
Key: ('_transform_partition-de1fb4afa9fbe0588780bae08018df01', 0)
Function: subgraph_callable-a957df51-92af-4c56-8bb3-127a554b
args: ([{'piece': ('/tmp/pytest-of-jenkins/pytest-2/test_multigpu_dask_example0/demo_dataset/part.0.parquet', [0], [])}])
kwargs: {}
Exception: 'AttributeError("'Normalize' object has no attribute 'out_dtype'")'

Traceback (most recent call last):
File "/tmp/pytest-of-jenkins/pytest-2/test_multigpu_dask_example0/notebook.py", line 123, in
workflow.fit_transform(dataset).to_parquet(
File "/var/jenkins_home/.local/lib/python3.8/site-packages/nvtabular/workflow/workflow.py", line 286, in fit_transform
self.fit(dataset)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/nvtabular/workflow/workflow.py", line 261, in fit
self._transform_impl(dataset, capture_dtypes=True).sample_dtypes()
File "/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py", line 1147, in sample_dtypes
_real_meta = self.engine.sample_data(n=n)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset_engine.py", line 71, in sample_data
_head = _ddf.partitions[partition_index].head(n)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/dask/dataframe/core.py", line 1196, in head
return self._head(n=n, npartitions=npartitions, compute=compute, safe=safe)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/dask/dataframe/core.py", line 1230, in _head
result = result.compute()
File "/var/jenkins_home/.local/lib/python3.8/site-packages/dask/base.py", line 292, in compute
(result,) = compute(self, traverse=False, **kwargs)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/dask/base.py", line 575, in compute
results = schedule(dsk, keys, **kwargs)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/client.py", line 3015, in get
results = self.gather(packed, asynchronous=asynchronous, direct=direct)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/client.py", line 2167, in gather
return self.sync(
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/utils.py", line 309, in sync
return sync(
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/utils.py", line 376, in sync
raise exc.with_traceback(tb)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/utils.py", line 349, in f
result = yield future
File "/var/jenkins_home/.local/lib/python3.8/site-packages/tornado-6.1-py3.8-linux-x86_64.egg/tornado/gen.py", line 762, in run
value = future.result()
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/client.py", line 2030, in _gather
raise exception.with_traceback(traceback)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/dask/optimization.py", line 990, in call
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
File "/var/jenkins_home/.local/lib/python3.8/site-packages/dask/core.py", line 149, in get
result = _execute_task(task, cache)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/dask/core.py", line 119, in _execute_task
return func(*(_execute_task(a, cache) for a in args))
File "/var/jenkins_home/.local/lib/python3.8/site-packages/dask/utils.py", line 39, in apply
return func(*args, **kwargs)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow/workflow.py", line 490, in _transform_partition
parent_df = _transform_partition(root_df, [parent], capture_dtypes=capture_dtypes)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow/workflow.py", line 519, in _transform_partition
output_df = node.op.transform(selection, input_df)
File "/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py", line 101, in inner
result = func(*args, **kwargs)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/normalize.py", line 84, in transform
values = values.astype(self.output_dtype)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/normalize.py", line 111, in output_dtype
return self.out_dtype or numpy.float64
AttributeError: 'Normalize' object has no attribute 'out_dtype'
=============================== warnings summary ===============================
../../../../../usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33
/usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
DASK_VERSION = LooseVersion(dask.__version__)

../../../.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: 34 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
other = LooseVersion(other)

nvtabular/loader/__init__.py:19
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/loader/__init__.py:19: DeprecationWarning: The nvtabular.loader module has moved to merlin.models.loader. Support for importing from nvtabular.loader is deprecated, and will be removed in a future version. Please update your imports to refer to merlin.models.loader.
warnings.warn(

tests/unit/test_dask_nvt.py: 2 warnings
tests/unit/workflow/test_workflow.py: 78 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/dask/base.py:1282: UserWarning: Running on a single-machine scheduler when a distributed client is active might lead to unexpected results.
warnings.warn(

tests/unit/test_dask_nvt.py: 1 warning
tests/unit/test_tf4rec.py: 1 warning
tests/unit/test_tools.py: 5 warnings
tests/unit/test_triton_inference.py: 8 warnings
tests/unit/loader/test_dataloader_backend.py: 6 warnings
tests/unit/loader/test_tf_dataloader.py: 66 warnings
tests/unit/loader/test_torch_dataloader.py: 67 warnings
tests/unit/ops/test_categorify.py: 69 warnings
tests/unit/ops/test_drop_low_cardinality.py: 2 warnings
tests/unit/ops/test_fill.py: 8 warnings
tests/unit/ops/test_hash_bucket.py: 4 warnings
tests/unit/ops/test_join.py: 88 warnings
tests/unit/ops/test_lambda.py: 1 warning
tests/unit/ops/test_normalize.py: 9 warnings
tests/unit/ops/test_ops.py: 11 warnings
tests/unit/ops/test_ops_schema.py: 17 warnings
tests/unit/workflow/test_workflow.py: 27 warnings
tests/unit/workflow/test_workflow_chaining.py: 1 warning
tests/unit/workflow/test_workflow_node.py: 1 warning
tests/unit/workflow/test_workflow_schemas.py: 1 warning
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(

tests/unit/test_dask_nvt.py: 12 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 8 files.
warnings.warn(

tests/unit/test_dask_nvt.py::test_merlin_core_execution_managers
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/core/utils.py:431: UserWarning: Existing Dask-client object detected in the current context. New cuda cluster will not be deployed. Set force_new to True to ignore running clusters.
warnings.warn(

tests/unit/test_notebooks.py: 1 warning
tests/unit/test_tools.py: 17 warnings
tests/unit/loader/test_tf_dataloader.py: 2 warnings
tests/unit/loader/test_torch_dataloader.py: 54 warnings
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:2940: FutureWarning: Series.ceil and DataFrame.ceil are deprecated and will be removed in the future
warnings.warn(

tests/unit/loader/test_tf_dataloader.py: 2 warnings
tests/unit/loader/test_torch_dataloader.py: 12 warnings
tests/unit/workflow/test_workflow.py: 9 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 2 files.
warnings.warn(

tests/unit/ops/test_fill.py::test_fill_missing[True-True-parquet]
tests/unit/ops/test_fill.py::test_fill_missing[True-False-parquet]
tests/unit/ops/test_ops.py::test_filter[parquet-0.1-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/pandas/core/indexing.py:1732: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self._setitem_single_block(indexer, value, name)

tests/unit/workflow/test_cpu_workflow.py: 6 warnings
tests/unit/workflow/test_workflow.py: 12 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 10 files.
warnings.warn(

tests/unit/workflow/test_workflow.py::test_gpu_workflow_api[True-True-parquet-0.01]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 35425 instead
warnings.warn(

tests/unit/workflow/test_workflow.py: 48 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 20 files.
warnings.warn(

tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_parquet_output[True-None]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-None]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-None]
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 4 files.
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED tests/unit/test_notebooks.py::test_multigpu_dask_example - subprocess....
===== 1 failed, 1420 passed, 2 skipped, 698 warnings in 704.25s (0:11:44) ======
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.github.com/repos/NVIDIA-Merlin/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins12687965631720200653.sh

@oliverholworthy
Member

The earlier Jenkins failures were due to the dask/distributed packages somehow being reverted to an older version.

The tests are now failing due to an AttributeError that may be related to this change:

File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/normalize.py", line 111, in output_dtype
return self.out_dtype or numpy.float64
AttributeError: 'Normalize' object has no attribute 'out_dtype'
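
The traceback suggests the constructor stores the dtype under a different attribute name than the one the output_dtype property reads. A minimal sketch of the alignment (illustrative only; attribute names here are assumptions, not the merged patch):

import numpy

class Normalize:
    # Illustrative skeleton; the real op lives in nvtabular/ops/normalize.py.
    def __init__(self, out_dtype=None):
        # Store the parameter under the same name the property reads below;
        # the failing build presumably saved it under a different attribute.
        self.out_dtype = out_dtype

    @property
    def output_dtype(self):
        # Fall back to fp64 when no override is requested.
        return self.out_dtype or numpy.float64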

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #1597 of commit 8ddd8fb9e322af8db4a31bde7a94c8e91fae4c25, no merge conflicts.
Running as SYSTEM
Setting status of 8ddd8fb9e322af8db4a31bde7a94c8e91fae4c25 to PENDING with url http://10.20.17.181:8080/job/nvtabular_tests/4554/ and message: 'Build started for merge commit.'
Using context: Jenkins Unit Test Run
Building on master in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA-Merlin/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/pull/1597/*:refs/remotes/origin/pr/1597/* # timeout=10
 > git rev-parse 8ddd8fb9e322af8db4a31bde7a94c8e91fae4c25^{commit} # timeout=10
Checking out Revision 8ddd8fb9e322af8db4a31bde7a94c8e91fae4c25 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 8ddd8fb9e322af8db4a31bde7a94c8e91fae4c25 # timeout=10
Commit message: "set PYTHONPATH"
 > git rev-list --no-walk 99df9374422c34e000b9d8d11da0d7061ef46aed # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins6736196258958809971.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 1422 items / 1 skipped

tests/unit/test_dask_nvt.py ............................................ [ 3%]
........................................................................ [ 8%]
[ 8%]
tests/unit/test_notebooks.py .....F [ 8%]
tests/unit/test_tf4rec.py . [ 8%]
tests/unit/test_tools.py ...................... [ 10%]
tests/unit/test_triton_inference.py ................................ [ 12%]
tests/unit/framework_utils/test_tf_feature_columns.py . [ 12%]
tests/unit/framework_utils/test_tf_layers.py ........................... [ 14%]
................................................... [ 18%]
tests/unit/framework_utils/test_torch_layers.py . [ 18%]
tests/unit/loader/test_dataloader_backend.py ...... [ 18%]
tests/unit/loader/test_tf_dataloader.py ................................ [ 20%]
........................................s.. [ 23%]
tests/unit/loader/test_torch_dataloader.py ............................. [ 25%]
...................................................... [ 29%]
tests/unit/ops/test_categorify.py ...................................... [ 32%]
........................................................................ [ 37%]
........................................... [ 40%]
tests/unit/ops/test_column_similarity.py ........................ [ 42%]
tests/unit/ops/test_drop_low_cardinality.py .. [ 42%]
tests/unit/ops/test_fill.py ............................................ [ 45%]
........ [ 45%]
tests/unit/ops/test_groupyby.py ................. [ 47%]
tests/unit/ops/test_hash_bucket.py ......................... [ 48%]
tests/unit/ops/test_join.py ............................................ [ 51%]
........................................................................ [ 56%]
.................................. [ 59%]
tests/unit/ops/test_lambda.py .......... [ 60%]
tests/unit/ops/test_normalize.py ....................................... [ 62%]
.. [ 62%]
tests/unit/ops/test_ops.py ............................................. [ 66%]
.................... [ 67%]
tests/unit/ops/test_ops_schema.py ...................................... [ 70%]
........................................................................ [ 75%]
........................................................................ [ 80%]
........................................................................ [ 85%]
....................................... [ 88%]
tests/unit/ops/test_reduce_dtype_size.py .. [ 88%]
tests/unit/ops/test_target_encode.py ..................... [ 89%]
tests/unit/workflow/test_cpu_workflow.py ...... [ 90%]
tests/unit/workflow/test_workflow.py ................................... [ 92%]
.......................................................... [ 96%]
tests/unit/workflow/test_workflow_chaining.py ... [ 96%]
tests/unit/workflow/test_workflow_node.py ........... [ 97%]
tests/unit/workflow/test_workflow_ops.py ... [ 97%]
tests/unit/workflow/test_workflow_schemas.py ........................... [ 99%]
... [100%]

=================================== FAILURES ===================================
__________________________ test_multigpu_dask_example __________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_multigpu_dask_example0')

def test_multigpu_dask_example(tmpdir):
    with get_cuda_cluster() as cuda_cluster:
        os.environ["BASE_DIR"] = str(tmpdir)
        scheduler_port = cuda_cluster.scheduler_address

        def _nb_modify(line):
            # Use cuda_cluster "fixture" port rather than allowing notebook
            # to deploy a LocalCUDACluster within the subprocess
            line = line.replace("cluster = None", f"cluster = '{scheduler_port}'")
            # Use a much smaller "toy" dataset
            line = line.replace("write_count = 25", "write_count = 4")
            line = line.replace('freq = "1s"', 'freq = "1h"')
            # Use smaller partitions for smaller dataset
            line = line.replace("part_mem_fraction=0.1", "part_size=1_000_000")
            line = line.replace("out_files_per_proc=8", "out_files_per_proc=1")
            return line

        notebook_path = os.path.join(
            dirname(TEST_PATH), "examples/multi-gpu-toy-example/", "multi-gpu_dask.ipynb"
        )
      _run_notebook(tmpdir, notebook_path, _nb_modify)

tests/unit/test_notebooks.py:287:


tests/unit/test_notebooks.py:307: in _run_notebook
subprocess.check_output([sys.executable, script_path])
/usr/lib/python3.8/subprocess.py:415: in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,


input = None, capture_output = False, timeout = None, check = True
popenargs = (['/usr/bin/python3', '/tmp/pytest-of-jenkins/pytest-3/test_multigpu_dask_example0/notebook.py'],)
kwargs = {'stdout': -1}, process = <subprocess.Popen object at 0x7f5a3b4d4700>
stdout = b'', stderr = None, retcode = 1


E subprocess.CalledProcessError: Command '['/usr/bin/python3', '/tmp/pytest-of-jenkins/pytest-3/test_multigpu_dask_example0/notebook.py']' returned non-zero exit status 1.

/usr/lib/python3.8/subprocess.py:516: CalledProcessError
----------------------------- Captured stderr call -----------------------------
2022-06-27 16:10:25,809 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize
2022-06-27 16:10:25,824 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(
Failed to transform operator <nvtabular.ops.normalize.Normalize object at 0x7fda9332ebb0>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow/workflow.py", line 519, in _transform_partition
output_df = node.op.transform(selection, input_df)
File "/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py", line 101, in inner
result = func(*args, **kwargs)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/normalize.py", line 84, in transform
values = values.astype(self.output_dtype)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/normalize.py", line 111, in output_dtype
return self.out_dtype or numpy.float64
AttributeError: 'Normalize' object has no attribute 'out_dtype'
2022-06-27 16:10:32,351 - distributed.worker - WARNING - Compute Failed
Key: ('_transform_partition-cc4489304011e1b56029a340724da0da', 0)
Function: subgraph_callable-801537e4-1ea4-496e-911c-1f63d33b
args: ([{'piece': ('/tmp/pytest-of-jenkins/pytest-3/test_multigpu_dask_example0/demo_dataset/part.0.parquet', [0], [])}])
kwargs: {}
Exception: 'AttributeError("'Normalize' object has no attribute 'out_dtype'")'

Traceback (most recent call last):
File "/tmp/pytest-of-jenkins/pytest-3/test_multigpu_dask_example0/notebook.py", line 123, in
workflow.fit_transform(dataset).to_parquet(
File "/var/jenkins_home/.local/lib/python3.8/site-packages/nvtabular/workflow/workflow.py", line 286, in fit_transform
self.fit(dataset)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/nvtabular/workflow/workflow.py", line 261, in fit
self._transform_impl(dataset, capture_dtypes=True).sample_dtypes()
File "/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py", line 1147, in sample_dtypes
_real_meta = self.engine.sample_data(n=n)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset_engine.py", line 71, in sample_data
_head = _ddf.partitions[partition_index].head(n)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/dask/dataframe/core.py", line 1196, in head
return self._head(n=n, npartitions=npartitions, compute=compute, safe=safe)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/dask/dataframe/core.py", line 1230, in _head
result = result.compute()
File "/var/jenkins_home/.local/lib/python3.8/site-packages/dask/base.py", line 292, in compute
(result,) = compute(self, traverse=False, **kwargs)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/dask/base.py", line 575, in compute
results = schedule(dsk, keys, **kwargs)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/client.py", line 3015, in get
results = self.gather(packed, asynchronous=asynchronous, direct=direct)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/client.py", line 2167, in gather
return self.sync(
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/utils.py", line 309, in sync
return sync(
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/utils.py", line 376, in sync
raise exc.with_traceback(tb)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/utils.py", line 349, in f
result = yield future
File "/var/jenkins_home/.local/lib/python3.8/site-packages/tornado-6.1-py3.8-linux-x86_64.egg/tornado/gen.py", line 762, in run
value = future.result()
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/client.py", line 2030, in _gather
raise exception.with_traceback(traceback)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/dask/optimization.py", line 990, in call
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
File "/var/jenkins_home/.local/lib/python3.8/site-packages/dask/core.py", line 149, in get
result = _execute_task(task, cache)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/dask/core.py", line 119, in _execute_task
return func(*(_execute_task(a, cache) for a in args))
File "/var/jenkins_home/.local/lib/python3.8/site-packages/dask/utils.py", line 39, in apply
return func(*args, **kwargs)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow/workflow.py", line 490, in _transform_partition
parent_df = _transform_partition(root_df, [parent], capture_dtypes=capture_dtypes)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow/workflow.py", line 519, in _transform_partition
output_df = node.op.transform(selection, input_df)
File "/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py", line 101, in inner
result = func(*args, **kwargs)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/normalize.py", line 84, in transform
values = values.astype(self.output_dtype)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/normalize.py", line 111, in output_dtype
return self.out_dtype or numpy.float64
AttributeError: 'Normalize' object has no attribute 'out_dtype'
=============================== warnings summary ===============================
../../../../../usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33
/usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
DASK_VERSION = LooseVersion(dask.__version__)

../../../.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: 34 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
other = LooseVersion(other)

nvtabular/loader/__init__.py:19
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/loader/__init__.py:19: DeprecationWarning: The nvtabular.loader module has moved to merlin.models.loader. Support for importing from nvtabular.loader is deprecated, and will be removed in a future version. Please update your imports to refer to merlin.models.loader.
warnings.warn(

tests/unit/test_dask_nvt.py: 2 warnings
tests/unit/workflow/test_workflow.py: 78 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/dask/base.py:1282: UserWarning: Running on a single-machine scheduler when a distributed client is active might lead to unexpected results.
warnings.warn(

tests/unit/test_dask_nvt.py: 1 warning
tests/unit/test_tf4rec.py: 1 warning
tests/unit/test_tools.py: 5 warnings
tests/unit/test_triton_inference.py: 8 warnings
tests/unit/loader/test_dataloader_backend.py: 6 warnings
tests/unit/loader/test_tf_dataloader.py: 66 warnings
tests/unit/loader/test_torch_dataloader.py: 67 warnings
tests/unit/ops/test_categorify.py: 69 warnings
tests/unit/ops/test_drop_low_cardinality.py: 2 warnings
tests/unit/ops/test_fill.py: 8 warnings
tests/unit/ops/test_hash_bucket.py: 4 warnings
tests/unit/ops/test_join.py: 88 warnings
tests/unit/ops/test_lambda.py: 1 warning
tests/unit/ops/test_normalize.py: 9 warnings
tests/unit/ops/test_ops.py: 11 warnings
tests/unit/ops/test_ops_schema.py: 17 warnings
tests/unit/workflow/test_workflow.py: 27 warnings
tests/unit/workflow/test_workflow_chaining.py: 1 warning
tests/unit/workflow/test_workflow_node.py: 1 warning
tests/unit/workflow/test_workflow_schemas.py: 1 warning
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(

tests/unit/test_dask_nvt.py: 12 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 8 files.
warnings.warn(

tests/unit/test_dask_nvt.py::test_merlin_core_execution_managers
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/core/utils.py:431: UserWarning: Existing Dask-client object detected in the current context. New cuda cluster will not be deployed. Set force_new to True to ignore running clusters.
warnings.warn(

tests/unit/test_notebooks.py: 1 warning
tests/unit/test_tools.py: 17 warnings
tests/unit/loader/test_tf_dataloader.py: 2 warnings
tests/unit/loader/test_torch_dataloader.py: 54 warnings
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:2940: FutureWarning: Series.ceil and DataFrame.ceil are deprecated and will be removed in the future
warnings.warn(

tests/unit/loader/test_tf_dataloader.py: 2 warnings
tests/unit/loader/test_torch_dataloader.py: 12 warnings
tests/unit/workflow/test_workflow.py: 9 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 2 files.
warnings.warn(

tests/unit/ops/test_fill.py::test_fill_missing[True-True-parquet]
tests/unit/ops/test_fill.py::test_fill_missing[True-False-parquet]
tests/unit/ops/test_ops.py::test_filter[parquet-0.1-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/pandas/core/indexing.py:1732: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self._setitem_single_block(indexer, value, name)

tests/unit/workflow/test_cpu_workflow.py: 6 warnings
tests/unit/workflow/test_workflow.py: 12 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 10 files.
warnings.warn(

tests/unit/workflow/test_workflow.py::test_gpu_workflow_api[True-True-parquet-0.01]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 35429 instead
warnings.warn(

tests/unit/workflow/test_workflow.py: 48 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 20 files.
warnings.warn(

tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_parquet_output[True-None]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-None]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-None]
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 4 files.
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED tests/unit/test_notebooks.py::test_multigpu_dask_example - subprocess....
===== 1 failed, 1420 passed, 2 skipped, 698 warnings in 706.60s (0:11:46) ======
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins15023768827440417762.sh
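
For context on the failure above: the test exercises the new Normalize dtype parameter added in this PR. A hedged usage sketch follows; the column names and workflow are made up, and the keyword name out_dtype is taken from the attribute referenced in the traceback:

import numpy as np
import nvtabular as nvt

# Hypothetical continuous columns; passing np.float32 asks Normalize
# to emit fp32 instead of its fp64 default.
conts = ["x", "y"] >> nvt.ops.Normalize(out_dtype=np.float32)
workflow = nvt.Workflow(conts)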

@benfred
Member Author

benfred commented Jun 27, 2022

rerun tests

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #1597 of commit 8ddd8fb9e322af8db4a31bde7a94c8e91fae4c25, no merge conflicts.
Running as SYSTEM
Setting status of 8ddd8fb9e322af8db4a31bde7a94c8e91fae4c25 to PENDING with url http://10.20.17.181:8080/job/nvtabular_tests/4555/ and message: 'Build started for merge commit.'
Using context: Jenkins Unit Test Run
Building on master in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA-Merlin/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/pull/1597/*:refs/remotes/origin/pr/1597/* # timeout=10
 > git rev-parse 8ddd8fb9e322af8db4a31bde7a94c8e91fae4c25^{commit} # timeout=10
Checking out Revision 8ddd8fb9e322af8db4a31bde7a94c8e91fae4c25 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 8ddd8fb9e322af8db4a31bde7a94c8e91fae4c25 # timeout=10
Commit message: "set PYTHONPATH"
 > git rev-list --no-walk 8ddd8fb9e322af8db4a31bde7a94c8e91fae4c25 # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins363619808597111859.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 1422 items / 1 skipped

tests/unit/test_dask_nvt.py ............................................ [ 3%]
........................................................................ [ 8%]
[ 8%]
tests/unit/test_notebooks.py .....F [ 8%]
tests/unit/test_tf4rec.py . [ 8%]
tests/unit/test_tools.py ...................... [ 10%]
tests/unit/test_triton_inference.py ................................ [ 12%]
tests/unit/framework_utils/test_tf_feature_columns.py . [ 12%]
tests/unit/framework_utils/test_tf_layers.py ........................... [ 14%]
................................................... [ 18%]
tests/unit/framework_utils/test_torch_layers.py . [ 18%]
tests/unit/loader/test_dataloader_backend.py ...... [ 18%]
tests/unit/loader/test_tf_dataloader.py ................................ [ 20%]
........................................s.. [ 23%]
tests/unit/loader/test_torch_dataloader.py ............................. [ 25%]
...................................................... [ 29%]
tests/unit/ops/test_categorify.py ...................................... [ 32%]
........................................................................ [ 37%]
........................................... [ 40%]
tests/unit/ops/test_column_similarity.py ........................ [ 42%]
tests/unit/ops/test_drop_low_cardinality.py .. [ 42%]
tests/unit/ops/test_fill.py ............................................ [ 45%]
........ [ 45%]
tests/unit/ops/test_groupyby.py ................. [ 47%]
tests/unit/ops/test_hash_bucket.py ......................... [ 48%]
tests/unit/ops/test_join.py ............................................ [ 51%]
........................................................................ [ 56%]
.................................. [ 59%]
tests/unit/ops/test_lambda.py .......... [ 60%]
tests/unit/ops/test_normalize.py ....................................... [ 62%]
.. [ 62%]
tests/unit/ops/test_ops.py ............................................. [ 66%]
.................... [ 67%]
tests/unit/ops/test_ops_schema.py ...................................... [ 70%]
........................................................................ [ 75%]
........................................................................ [ 80%]
........................................................................ [ 85%]
....................................... [ 88%]
tests/unit/ops/test_reduce_dtype_size.py .. [ 88%]
tests/unit/ops/test_target_encode.py ..................... [ 89%]
tests/unit/workflow/test_cpu_workflow.py ...... [ 90%]
tests/unit/workflow/test_workflow.py ................................... [ 92%]
.......................................................... [ 96%]
tests/unit/workflow/test_workflow_chaining.py ... [ 96%]
tests/unit/workflow/test_workflow_node.py ........... [ 97%]
tests/unit/workflow/test_workflow_ops.py ... [ 97%]
tests/unit/workflow/test_workflow_schemas.py ........................... [ 99%]
... [100%]

=================================== FAILURES ===================================
__________________________ test_multigpu_dask_example __________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-4/test_multigpu_dask_example0')

def test_multigpu_dask_example(tmpdir):
    with get_cuda_cluster() as cuda_cluster:
        os.environ["BASE_DIR"] = str(tmpdir)
        scheduler_port = cuda_cluster.scheduler_address

        def _nb_modify(line):
            # Use cuda_cluster "fixture" port rather than allowing notebook
            # to deploy a LocalCUDACluster within the subprocess
            line = line.replace("cluster = None", f"cluster = '{scheduler_port}'")
            # Use a much smaller "toy" dataset
            line = line.replace("write_count = 25", "write_count = 4")
            line = line.replace('freq = "1s"', 'freq = "1h"')
            # Use smaller partitions for smaller dataset
            line = line.replace("part_mem_fraction=0.1", "part_size=1_000_000")
            line = line.replace("out_files_per_proc=8", "out_files_per_proc=1")
            return line

        notebook_path = os.path.join(
            dirname(TEST_PATH), "examples/multi-gpu-toy-example/", "multi-gpu_dask.ipynb"
        )
>       _run_notebook(tmpdir, notebook_path, _nb_modify)

tests/unit/test_notebooks.py:287:


tests/unit/test_notebooks.py:307: in _run_notebook
subprocess.check_output([sys.executable, script_path])
/usr/lib/python3.8/subprocess.py:415: in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,


input = None, capture_output = False, timeout = None, check = True
popenargs = (['/usr/bin/python3', '/tmp/pytest-of-jenkins/pytest-4/test_multigpu_dask_example0/notebook.py'],)
kwargs = {'stdout': -1}, process = <subprocess.Popen object at 0x7f0d81fb4220>
stdout = b'', stderr = None, retcode = 1

def run(*popenargs,
        input=None, capture_output=False, timeout=None, check=False, **kwargs):
    """Run command with arguments and return a CompletedProcess instance.

    The returned instance will have attributes args, returncode, stdout and
    stderr. By default, stdout and stderr are not captured, and those attributes
    will be None. Pass stdout=PIPE and/or stderr=PIPE in order to capture them.

    If check is True and the exit code was non-zero, it raises a
    CalledProcessError. The CalledProcessError object will have the return code
    in the returncode attribute, and output & stderr attributes if those streams
    were captured.

    If timeout is given, and the process takes too long, a TimeoutExpired
    exception will be raised.

    There is an optional argument "input", allowing you to
    pass bytes or a string to the subprocess's stdin.  If you use this argument
    you may not also use the Popen constructor's "stdin" argument, as
    it will be used internally.

    By default, all communication is in bytes, and therefore any "input" should
    be bytes, and the stdout and stderr will be bytes. If in text mode, any
    "input" should be a string, and stdout and stderr will be strings decoded
    according to locale encoding, or by "encoding" if set. Text mode is
    triggered by setting any of text, encoding, errors or universal_newlines.

    The other arguments are the same as for the Popen constructor.
    """
    if input is not None:
        if kwargs.get('stdin') is not None:
            raise ValueError('stdin and input arguments may not both be used.')
        kwargs['stdin'] = PIPE

    if capture_output:
        if kwargs.get('stdout') is not None or kwargs.get('stderr') is not None:
            raise ValueError('stdout and stderr arguments may not be used '
                             'with capture_output.')
        kwargs['stdout'] = PIPE
        kwargs['stderr'] = PIPE

    with Popen(*popenargs, **kwargs) as process:
        try:
            stdout, stderr = process.communicate(input, timeout=timeout)
        except TimeoutExpired as exc:
            process.kill()
            if _mswindows:
                # Windows accumulates the output in a single blocking
                # read() call run on child threads, with the timeout
                # being done in a join() on those threads.  communicate()
                # _after_ kill() is required to collect that and add it
                # to the exception.
                exc.stdout, exc.stderr = process.communicate()
            else:
                # POSIX _communicate already populated the output so
                # far into the TimeoutExpired exception.
                process.wait()
            raise
        except:  # Including KeyboardInterrupt, communicate handled that.
            process.kill()
            # We don't call process.wait() as .__exit__ does that for us.
            raise
        retcode = process.poll()
        if check and retcode:
>           raise CalledProcessError(retcode, process.args,
                                     output=stdout, stderr=stderr)

E subprocess.CalledProcessError: Command '['/usr/bin/python3', '/tmp/pytest-of-jenkins/pytest-4/test_multigpu_dask_example0/notebook.py']' returned non-zero exit status 1.

/usr/lib/python3.8/subprocess.py:516: CalledProcessError
----------------------------- Captured stderr call -----------------------------
2022-06-27 16:24:38,812 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize
2022-06-27 16:24:38,893 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(
Failed to transform operator <nvtabular.ops.normalize.Normalize object at 0x7fce18319fd0>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow/workflow.py", line 519, in _transform_partition
output_df = node.op.transform(selection, input_df)
File "/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py", line 101, in inner
result = func(*args, **kwargs)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/normalize.py", line 84, in transform
values = values.astype(self.output_dtype)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/normalize.py", line 111, in output_dtype
return self.out_dtype or numpy.float64
AttributeError: 'Normalize' object has no attribute 'out_dtype'
2022-06-27 16:24:45,364 - distributed.worker - WARNING - Compute Failed
Key: ('_transform_partition-0ef7e06a0ef7da2311149dcd0bb5500b', 0)
Function: subgraph_callable-405bda3c-1888-47a6-aefa-4ef805c5
args: ([{'piece': ('/tmp/pytest-of-jenkins/pytest-4/test_multigpu_dask_example0/demo_dataset/part.0.parquet', [0], [])}])
kwargs: {}
Exception: 'AttributeError("'Normalize' object has no attribute 'out_dtype'")'

Traceback (most recent call last):
File "/tmp/pytest-of-jenkins/pytest-4/test_multigpu_dask_example0/notebook.py", line 123, in
workflow.fit_transform(dataset).to_parquet(
File "/var/jenkins_home/.local/lib/python3.8/site-packages/nvtabular/workflow/workflow.py", line 286, in fit_transform
self.fit(dataset)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/nvtabular/workflow/workflow.py", line 261, in fit
self._transform_impl(dataset, capture_dtypes=True).sample_dtypes()
File "/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py", line 1147, in sample_dtypes
_real_meta = self.engine.sample_data(n=n)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset_engine.py", line 71, in sample_data
_head = _ddf.partitions[partition_index].head(n)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/dask/dataframe/core.py", line 1196, in head
return self._head(n=n, npartitions=npartitions, compute=compute, safe=safe)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/dask/dataframe/core.py", line 1230, in _head
result = result.compute()
File "/var/jenkins_home/.local/lib/python3.8/site-packages/dask/base.py", line 292, in compute
(result,) = compute(self, traverse=False, **kwargs)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/dask/base.py", line 575, in compute
results = schedule(dsk, keys, **kwargs)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/client.py", line 3015, in get
results = self.gather(packed, asynchronous=asynchronous, direct=direct)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/client.py", line 2167, in gather
return self.sync(
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/utils.py", line 309, in sync
return sync(
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/utils.py", line 376, in sync
raise exc.with_traceback(tb)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/utils.py", line 349, in f
result = yield future
File "/var/jenkins_home/.local/lib/python3.8/site-packages/tornado-6.1-py3.8-linux-x86_64.egg/tornado/gen.py", line 762, in run
value = future.result()
File "/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/client.py", line 2030, in _gather
raise exception.with_traceback(traceback)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/dask/optimization.py", line 990, in call
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
File "/var/jenkins_home/.local/lib/python3.8/site-packages/dask/core.py", line 149, in get
result = _execute_task(task, cache)
File "/var/jenkins_home/.local/lib/python3.8/site-packages/dask/core.py", line 119, in _execute_task
return func(*(_execute_task(a, cache) for a in args))
File "/var/jenkins_home/.local/lib/python3.8/site-packages/dask/utils.py", line 39, in apply
return func(*args, **kwargs)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow/workflow.py", line 490, in _transform_partition
parent_df = _transform_partition(root_df, [parent], capture_dtypes=capture_dtypes)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow/workflow.py", line 519, in _transform_partition
output_df = node.op.transform(selection, input_df)
File "/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py", line 101, in inner
result = func(*args, **kwargs)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/normalize.py", line 84, in transform
values = values.astype(self.output_dtype)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/normalize.py", line 111, in output_dtype
return self.out_dtype or numpy.float64
AttributeError: 'Normalize' object has no attribute 'out_dtype'
=============================== warnings summary ===============================
../../../../../usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33
/usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
DASK_VERSION = LooseVersion(dask.__version__)

../../../.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: 34 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
other = LooseVersion(other)

nvtabular/loader/__init__.py:19
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/loader/__init__.py:19: DeprecationWarning: The nvtabular.loader module has moved to merlin.models.loader. Support for importing from nvtabular.loader is deprecated, and will be removed in a future version. Please update your imports to refer to merlin.models.loader.
warnings.warn(

tests/unit/test_dask_nvt.py: 2 warnings
tests/unit/workflow/test_workflow.py: 78 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/dask/base.py:1282: UserWarning: Running on a single-machine scheduler when a distributed client is active might lead to unexpected results.
warnings.warn(

tests/unit/test_dask_nvt.py: 1 warning
tests/unit/test_tf4rec.py: 1 warning
tests/unit/test_tools.py: 5 warnings
tests/unit/test_triton_inference.py: 8 warnings
tests/unit/loader/test_dataloader_backend.py: 6 warnings
tests/unit/loader/test_tf_dataloader.py: 66 warnings
tests/unit/loader/test_torch_dataloader.py: 67 warnings
tests/unit/ops/test_categorify.py: 69 warnings
tests/unit/ops/test_drop_low_cardinality.py: 2 warnings
tests/unit/ops/test_fill.py: 8 warnings
tests/unit/ops/test_hash_bucket.py: 4 warnings
tests/unit/ops/test_join.py: 88 warnings
tests/unit/ops/test_lambda.py: 1 warning
tests/unit/ops/test_normalize.py: 9 warnings
tests/unit/ops/test_ops.py: 11 warnings
tests/unit/ops/test_ops_schema.py: 17 warnings
tests/unit/workflow/test_workflow.py: 27 warnings
tests/unit/workflow/test_workflow_chaining.py: 1 warning
tests/unit/workflow/test_workflow_node.py: 1 warning
tests/unit/workflow/test_workflow_schemas.py: 1 warning
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(

tests/unit/test_dask_nvt.py: 12 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 8 files.
warnings.warn(

tests/unit/test_dask_nvt.py::test_merlin_core_execution_managers
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/core/utils.py:431: UserWarning: Existing Dask-client object detected in the current context. New cuda cluster will not be deployed. Set force_new to True to ignore running clusters.
warnings.warn(

tests/unit/test_notebooks.py: 1 warning
tests/unit/test_tools.py: 17 warnings
tests/unit/loader/test_tf_dataloader.py: 2 warnings
tests/unit/loader/test_torch_dataloader.py: 54 warnings
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:2940: FutureWarning: Series.ceil and DataFrame.ceil are deprecated and will be removed in the future
warnings.warn(

tests/unit/loader/test_tf_dataloader.py: 2 warnings
tests/unit/loader/test_torch_dataloader.py: 12 warnings
tests/unit/workflow/test_workflow.py: 9 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 2 files.
warnings.warn(

tests/unit/ops/test_fill.py::test_fill_missing[True-True-parquet]
tests/unit/ops/test_fill.py::test_fill_missing[True-False-parquet]
tests/unit/ops/test_ops.py::test_filter[parquet-0.1-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/pandas/core/indexing.py:1732: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self._setitem_single_block(indexer, value, name)

tests/unit/workflow/test_cpu_workflow.py: 6 warnings
tests/unit/workflow/test_workflow.py: 12 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 10 files.
warnings.warn(

tests/unit/workflow/test_workflow.py::test_gpu_workflow_api[True-True-parquet-0.01]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 36631 instead
warnings.warn(

tests/unit/workflow/test_workflow.py: 48 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 20 files.
warnings.warn(

tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_parquet_output[True-None]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-None]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-None]
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 4 files.
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED tests/unit/test_notebooks.py::test_multigpu_dask_example - subprocess....
===== 1 failed, 1420 passed, 2 skipped, 698 warnings in 706.07s (0:11:46) ======
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins2171851430870167144.sh
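
The traceback above pins down the failure mode: notebook.py drives the workflow through an older installed copy of the package (the .local site-packages paths), so the Normalize instance was constructed without out_dtype ever being assigned, while the new output_dtype property from the workspace checkout expects it. A minimal defensive sketch of the pattern, assuming nothing about the real class beyond the two names in the traceback:

import numpy

class Normalize:
    # Sketch of the dtype plumbing only; the real op standardizes values
    # using statistics gathered during fit.
    def __init__(self, out_dtype=None):
        # Assigning in __init__ guarantees the attribute exists on
        # newly constructed instances.
        self.out_dtype = out_dtype

    @property
    def output_dtype(self):
        # getattr() also tolerates instances created (or deserialized)
        # before out_dtype existed, instead of raising AttributeError.
        return getattr(self, "out_dtype", None) or numpy.float64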

@benfred
Member Author

benfred commented Jun 27, 2022

rerun tests

This reverts commit 8ddd8fb.
@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #1597 of commit 8ddd8fb9e322af8db4a31bde7a94c8e91fae4c25, no merge conflicts.
Running as SYSTEM
Setting status of 8ddd8fb9e322af8db4a31bde7a94c8e91fae4c25 to PENDING with url http://10.20.17.181:8080/job/nvtabular_tests/4556/ and message: 'Build started for merge commit.'
Using context: Jenkins Unit Test Run
Building on master in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA-Merlin/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/pull/1597/*:refs/remotes/origin/pr/1597/* # timeout=10
 > git rev-parse 8ddd8fb9e322af8db4a31bde7a94c8e91fae4c25^{commit} # timeout=10
Checking out Revision 8ddd8fb9e322af8db4a31bde7a94c8e91fae4c25 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 8ddd8fb9e322af8db4a31bde7a94c8e91fae4c25 # timeout=10
Commit message: "set PYTHONPATH"
 > git rev-list --no-walk 8ddd8fb9e322af8db4a31bde7a94c8e91fae4c25 # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins15616336802954477098.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 1422 items / 1 skipped

tests/unit/test_dask_nvt.py ............................................ [ 3%]
........................................................................ [ 8%]
[ 8%]
tests/unit/test_notebooks.py ...... [ 8%]
tests/unit/test_tf4rec.py . [ 8%]
tests/unit/test_tools.py ...................... [ 10%]
tests/unit/test_triton_inference.py ................................ [ 12%]
tests/unit/framework_utils/test_tf_feature_columns.py . [ 12%]
tests/unit/framework_utils/test_tf_layers.py ........................... [ 14%]
................................................... [ 18%]
tests/unit/framework_utils/test_torch_layers.py . [ 18%]
tests/unit/loader/test_dataloader_backend.py ...... [ 18%]
tests/unit/loader/test_tf_dataloader.py ................................ [ 20%]
........................................s.. [ 23%]
tests/unit/loader/test_torch_dataloader.py ............................. [ 25%]
...................................................... [ 29%]
tests/unit/ops/test_categorify.py ...................................... [ 32%]
........................................................................ [ 37%]
........................................... [ 40%]
tests/unit/ops/test_column_similarity.py ........................ [ 42%]
tests/unit/ops/test_drop_low_cardinality.py .. [ 42%]
tests/unit/ops/test_fill.py ............................................ [ 45%]
........ [ 45%]
tests/unit/ops/test_groupyby.py ................. [ 47%]
tests/unit/ops/test_hash_bucket.py ......................... [ 48%]
tests/unit/ops/test_join.py ............................................ [ 51%]
........................................................................ [ 56%]
.................................. [ 59%]
tests/unit/ops/test_lambda.py .......... [ 60%]
tests/unit/ops/test_normalize.py ....................................... [ 62%]
.. [ 62%]
tests/unit/ops/test_ops.py ............................................. [ 66%]
.................... [ 67%]
tests/unit/ops/test_ops_schema.py ...................................... [ 70%]
........................................................................ [ 75%]
........................................................................ [ 80%]
........................................................................ [ 85%]
....................................... [ 88%]
tests/unit/ops/test_reduce_dtype_size.py .. [ 88%]
tests/unit/ops/test_target_encode.py ..................... [ 89%]
tests/unit/workflow/test_cpu_workflow.py ...... [ 90%]
tests/unit/workflow/test_workflow.py ................................... [ 92%]
.......................................................... [ 96%]
tests/unit/workflow/test_workflow_chaining.py ... [ 96%]
tests/unit/workflow/test_workflow_node.py ........... [ 97%]
tests/unit/workflow/test_workflow_ops.py ... [ 97%]
tests/unit/workflow/test_workflow_schemas.py ........................... [ 99%]
... [100%]

=============================== warnings summary ===============================
../../../../../usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33
/usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
DASK_VERSION = LooseVersion(dask.__version__)

../../../.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: 34 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
other = LooseVersion(other)

nvtabular/loader/__init__.py:19
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/loader/__init__.py:19: DeprecationWarning: The nvtabular.loader module has moved to merlin.models.loader. Support for importing from nvtabular.loader is deprecated, and will be removed in a future version. Please update your imports to refer to merlin.models.loader.
warnings.warn(

tests/unit/test_dask_nvt.py: 2 warnings
tests/unit/workflow/test_workflow.py: 78 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/dask/base.py:1282: UserWarning: Running on a single-machine scheduler when a distributed client is active might lead to unexpected results.
warnings.warn(

tests/unit/test_dask_nvt.py: 1 warning
tests/unit/test_tf4rec.py: 1 warning
tests/unit/test_tools.py: 5 warnings
tests/unit/test_triton_inference.py: 8 warnings
tests/unit/loader/test_dataloader_backend.py: 6 warnings
tests/unit/loader/test_tf_dataloader.py: 66 warnings
tests/unit/loader/test_torch_dataloader.py: 67 warnings
tests/unit/ops/test_categorify.py: 69 warnings
tests/unit/ops/test_drop_low_cardinality.py: 2 warnings
tests/unit/ops/test_fill.py: 8 warnings
tests/unit/ops/test_hash_bucket.py: 4 warnings
tests/unit/ops/test_join.py: 88 warnings
tests/unit/ops/test_lambda.py: 1 warning
tests/unit/ops/test_normalize.py: 9 warnings
tests/unit/ops/test_ops.py: 11 warnings
tests/unit/ops/test_ops_schema.py: 17 warnings
tests/unit/workflow/test_workflow.py: 27 warnings
tests/unit/workflow/test_workflow_chaining.py: 1 warning
tests/unit/workflow/test_workflow_node.py: 1 warning
tests/unit/workflow/test_workflow_schemas.py: 1 warning
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(

tests/unit/test_dask_nvt.py: 12 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 8 files.
warnings.warn(

tests/unit/test_dask_nvt.py::test_merlin_core_execution_managers
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/core/utils.py:431: UserWarning: Existing Dask-client object detected in the current context. New cuda cluster will not be deployed. Set force_new to True to ignore running clusters.
warnings.warn(

tests/unit/test_notebooks.py: 1 warning
tests/unit/test_tools.py: 17 warnings
tests/unit/loader/test_tf_dataloader.py: 2 warnings
tests/unit/loader/test_torch_dataloader.py: 54 warnings
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:2940: FutureWarning: Series.ceil and DataFrame.ceil are deprecated and will be removed in the future
warnings.warn(

tests/unit/loader/test_tf_dataloader.py: 2 warnings
tests/unit/loader/test_torch_dataloader.py: 12 warnings
tests/unit/workflow/test_workflow.py: 9 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 2 files.
warnings.warn(

tests/unit/ops/test_fill.py::test_fill_missing[True-True-parquet]
tests/unit/ops/test_fill.py::test_fill_missing[True-False-parquet]
tests/unit/ops/test_ops.py::test_filter[parquet-0.1-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/pandas/core/indexing.py:1732: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self._setitem_single_block(indexer, value, name)

tests/unit/workflow/test_cpu_workflow.py: 6 warnings
tests/unit/workflow/test_workflow.py: 12 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 10 files.
warnings.warn(

tests/unit/workflow/test_workflow.py: 48 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 20 files.
warnings.warn(

tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_parquet_output[True-None]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-None]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-None]
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 4 files.
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
========== 1421 passed, 2 skipped, 697 warnings in 690.29s (0:11:30) ===========
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins4466419217789100275.sh
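
Aside from the pass/fail status, every run's warnings summary repeats the same distutils deprecations from dask_cudf and setuptools; the warning text itself names the replacement. A minimal sketch of that migration, with an illustrative version cutoff:

import dask
from packaging.version import Version

# The same module-level version check without the deprecated
# distutils LooseVersion classes.
DASK_VERSION = Version(dask.__version__)
NEW_ENOUGH = DASK_VERSION >= Version("2022.1.0")  # illustrative cutoff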

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #1597 of commit 9e3dd22ec645575176449555b6ffabc1eb0f917b, no merge conflicts.
Running as SYSTEM
Setting status of 9e3dd22ec645575176449555b6ffabc1eb0f917b to PENDING with url http://10.20.17.181:8080/job/nvtabular_tests/4557/ and message: 'Build started for merge commit.'
Using context: Jenkins Unit Test Run
Building on master in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA-Merlin/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/pull/1597/*:refs/remotes/origin/pr/1597/* # timeout=10
 > git rev-parse 9e3dd22ec645575176449555b6ffabc1eb0f917b^{commit} # timeout=10
Checking out Revision 9e3dd22ec645575176449555b6ffabc1eb0f917b (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 9e3dd22ec645575176449555b6ffabc1eb0f917b # timeout=10
Commit message: "Revert "set PYTHONPATH""
 > git rev-list --no-walk 8ddd8fb9e322af8db4a31bde7a94c8e91fae4c25 # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins13391713593227507191.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 1422 items / 1 skipped

tests/unit/test_dask_nvt.py ............................................ [ 3%]
........................................................................ [ 8%]
[ 8%]
tests/unit/test_notebooks.py ...... [ 8%]
tests/unit/test_tf4rec.py . [ 8%]
tests/unit/test_tools.py ...................... [ 10%]
tests/unit/test_triton_inference.py ................................ [ 12%]
tests/unit/framework_utils/test_tf_feature_columns.py . [ 12%]
tests/unit/framework_utils/test_tf_layers.py ........................... [ 14%]
................................................... [ 18%]
tests/unit/framework_utils/test_torch_layers.py . [ 18%]
tests/unit/loader/test_dataloader_backend.py ...... [ 18%]
tests/unit/loader/test_tf_dataloader.py ................................ [ 20%]
........................................s.. [ 23%]
tests/unit/loader/test_torch_dataloader.py ............................. [ 25%]
...................................................... [ 29%]
tests/unit/ops/test_categorify.py ...................................... [ 32%]
........................................................................ [ 37%]
........................................... [ 40%]
tests/unit/ops/test_column_similarity.py ........................ [ 42%]
tests/unit/ops/test_drop_low_cardinality.py .. [ 42%]
tests/unit/ops/test_fill.py ............................................ [ 45%]
........ [ 45%]
tests/unit/ops/test_groupyby.py ................. [ 47%]
tests/unit/ops/test_hash_bucket.py ......................... [ 48%]
tests/unit/ops/test_join.py ............................................ [ 51%]
........................................................................ [ 56%]
.................................. [ 59%]
tests/unit/ops/test_lambda.py .......... [ 60%]
tests/unit/ops/test_normalize.py ....................................... [ 62%]
.. [ 62%]
tests/unit/ops/test_ops.py ............................................. [ 66%]
.................... [ 67%]
tests/unit/ops/test_ops_schema.py ...................................... [ 70%]
........................................................................ [ 75%]
........................................................................ [ 80%]
........................................................................ [ 85%]
....................................... [ 88%]
tests/unit/ops/test_reduce_dtype_size.py .. [ 88%]
tests/unit/ops/test_target_encode.py ..................... [ 89%]
tests/unit/workflow/test_cpu_workflow.py ...... [ 90%]
tests/unit/workflow/test_workflow.py ................................... [ 92%]
.......................................................... [ 96%]
tests/unit/workflow/test_workflow_chaining.py ... [ 96%]
tests/unit/workflow/test_workflow_node.py ........... [ 97%]
tests/unit/workflow/test_workflow_ops.py ... [ 97%]
tests/unit/workflow/test_workflow_schemas.py ........................... [ 99%]
... [100%]

=============================== warnings summary ===============================
../../../../../usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33
/usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
DASK_VERSION = LooseVersion(dask.__version__)

../../../.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: 34 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
other = LooseVersion(other)

nvtabular/loader/__init__.py:19
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/loader/__init__.py:19: DeprecationWarning: The nvtabular.loader module has moved to merlin.models.loader. Support for importing from nvtabular.loader is deprecated, and will be removed in a future version. Please update your imports to refer to merlin.models.loader.
warnings.warn(

tests/unit/test_dask_nvt.py: 2 warnings
tests/unit/workflow/test_workflow.py: 78 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/dask/base.py:1282: UserWarning: Running on a single-machine scheduler when a distributed client is active might lead to unexpected results.
warnings.warn(

tests/unit/test_dask_nvt.py: 1 warning
tests/unit/test_tf4rec.py: 1 warning
tests/unit/test_tools.py: 5 warnings
tests/unit/test_triton_inference.py: 8 warnings
tests/unit/loader/test_dataloader_backend.py: 6 warnings
tests/unit/loader/test_tf_dataloader.py: 66 warnings
tests/unit/loader/test_torch_dataloader.py: 67 warnings
tests/unit/ops/test_categorify.py: 69 warnings
tests/unit/ops/test_drop_low_cardinality.py: 2 warnings
tests/unit/ops/test_fill.py: 8 warnings
tests/unit/ops/test_hash_bucket.py: 4 warnings
tests/unit/ops/test_join.py: 88 warnings
tests/unit/ops/test_lambda.py: 1 warning
tests/unit/ops/test_normalize.py: 9 warnings
tests/unit/ops/test_ops.py: 11 warnings
tests/unit/ops/test_ops_schema.py: 17 warnings
tests/unit/workflow/test_workflow.py: 27 warnings
tests/unit/workflow/test_workflow_chaining.py: 1 warning
tests/unit/workflow/test_workflow_node.py: 1 warning
tests/unit/workflow/test_workflow_schemas.py: 1 warning
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(

tests/unit/test_dask_nvt.py: 12 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 8 files.
warnings.warn(

tests/unit/test_dask_nvt.py::test_merlin_core_execution_managers
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/core/utils.py:431: UserWarning: Existing Dask-client object detected in the current context. New cuda cluster will not be deployed. Set force_new to True to ignore running clusters.
warnings.warn(

tests/unit/test_notebooks.py: 1 warning
tests/unit/test_tools.py: 17 warnings
tests/unit/loader/test_tf_dataloader.py: 2 warnings
tests/unit/loader/test_torch_dataloader.py: 54 warnings
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:2940: FutureWarning: Series.ceil and DataFrame.ceil are deprecated and will be removed in the future
warnings.warn(

tests/unit/loader/test_tf_dataloader.py: 2 warnings
tests/unit/loader/test_torch_dataloader.py: 12 warnings
tests/unit/workflow/test_workflow.py: 9 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 2 files.
warnings.warn(

tests/unit/ops/test_fill.py::test_fill_missing[True-True-parquet]
tests/unit/ops/test_fill.py::test_fill_missing[True-False-parquet]
tests/unit/ops/test_ops.py::test_filter[parquet-0.1-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/pandas/core/indexing.py:1732: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self._setitem_single_block(indexer, value, name)

tests/unit/workflow/test_cpu_workflow.py: 6 warnings
tests/unit/workflow/test_workflow.py: 12 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 10 files.
warnings.warn(

tests/unit/workflow/test_workflow.py: 48 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 20 files.
warnings.warn(

tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_parquet_output[True-None]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-None]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-None]
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 4 files.
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
========== 1421 passed, 2 skipped, 697 warnings in 692.93s (0:11:32) ===========
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins16557105157141183320.sh

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #1597 of commit 994df8454d11f95c07e7d699bd790d9ae595fe76, no merge conflicts.
Running as SYSTEM
Setting status of 994df8454d11f95c07e7d699bd790d9ae595fe76 to PENDING with url http://10.20.17.181:8080/job/nvtabular_tests/4558/ and message: 'Build started for merge commit.'
Using context: Jenkins Unit Test Run
Building on master in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA-Merlin/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/pull/1597/*:refs/remotes/origin/pr/1597/* # timeout=10
 > git rev-parse 994df8454d11f95c07e7d699bd790d9ae595fe76^{commit} # timeout=10
Checking out Revision 994df8454d11f95c07e7d699bd790d9ae595fe76 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 994df8454d11f95c07e7d699bd790d9ae595fe76 # timeout=10
Commit message: "Update docstring"
 > git rev-list --no-walk 9e3dd22ec645575176449555b6ffabc1eb0f917b # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins5594765192554062404.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 1422 items / 1 skipped

tests/unit/test_dask_nvt.py ............................................ [ 3%]
........................................................................ [ 8%]
[ 8%]
tests/unit/test_notebooks.py ...... [ 8%]
tests/unit/test_tf4rec.py . [ 8%]
tests/unit/test_tools.py ...................... [ 10%]
tests/unit/test_triton_inference.py ................................ [ 12%]
tests/unit/framework_utils/test_tf_feature_columns.py . [ 12%]
tests/unit/framework_utils/test_tf_layers.py ........................... [ 14%]
................................................... [ 18%]
tests/unit/framework_utils/test_torch_layers.py . [ 18%]
tests/unit/loader/test_dataloader_backend.py ...... [ 18%]
tests/unit/loader/test_tf_dataloader.py ................................ [ 20%]
........................................s.. [ 23%]
tests/unit/loader/test_torch_dataloader.py ............................. [ 25%]
...................................................... [ 29%]
tests/unit/ops/test_categorify.py ...................................... [ 32%]
........................................................................ [ 37%]
........................................... [ 40%]
tests/unit/ops/test_column_similarity.py ........................ [ 42%]
tests/unit/ops/test_drop_low_cardinality.py .. [ 42%]
tests/unit/ops/test_fill.py ............................................ [ 45%]
........ [ 45%]
tests/unit/ops/test_groupyby.py ................. [ 47%]
tests/unit/ops/test_hash_bucket.py ......................... [ 48%]
tests/unit/ops/test_join.py ............................................ [ 51%]
........................................................................ [ 56%]
.................................. [ 59%]
tests/unit/ops/test_lambda.py .......... [ 60%]
tests/unit/ops/test_normalize.py ....................................... [ 62%]
.. [ 62%]
tests/unit/ops/test_ops.py ............................................. [ 66%]
.................... [ 67%]
tests/unit/ops/test_ops_schema.py ...................................... [ 70%]
........................................................................ [ 75%]
........................................................................ [ 80%]
........................................................................ [ 85%]
....................................... [ 88%]
tests/unit/ops/test_reduce_dtype_size.py .. [ 88%]
tests/unit/ops/test_target_encode.py ..................... [ 89%]
tests/unit/workflow/test_cpu_workflow.py ...... [ 90%]
tests/unit/workflow/test_workflow.py ................................... [ 92%]
.......................................................... [ 96%]
tests/unit/workflow/test_workflow_chaining.py ... [ 96%]
tests/unit/workflow/test_workflow_node.py ........... [ 97%]
tests/unit/workflow/test_workflow_ops.py ... [ 97%]
tests/unit/workflow/test_workflow_schemas.py ........................... [ 99%]
... [100%]

=============================== warnings summary ===============================
../../../../../usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33
/usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
DASK_VERSION = LooseVersion(dask.__version__)

../../../.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: 34 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
other = LooseVersion(other)

nvtabular/loader/__init__.py:19
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/loader/__init__.py:19: DeprecationWarning: The nvtabular.loader module has moved to merlin.models.loader. Support for importing from nvtabular.loader is deprecated, and will be removed in a future version. Please update your imports to refer to merlin.models.loader.
warnings.warn(

tests/unit/test_dask_nvt.py: 2 warnings
tests/unit/workflow/test_workflow.py: 78 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/dask/base.py:1282: UserWarning: Running on a single-machine scheduler when a distributed client is active might lead to unexpected results.
warnings.warn(

tests/unit/test_dask_nvt.py: 1 warning
tests/unit/test_tf4rec.py: 1 warning
tests/unit/test_tools.py: 5 warnings
tests/unit/test_triton_inference.py: 8 warnings
tests/unit/loader/test_dataloader_backend.py: 6 warnings
tests/unit/loader/test_tf_dataloader.py: 66 warnings
tests/unit/loader/test_torch_dataloader.py: 67 warnings
tests/unit/ops/test_categorify.py: 69 warnings
tests/unit/ops/test_drop_low_cardinality.py: 2 warnings
tests/unit/ops/test_fill.py: 8 warnings
tests/unit/ops/test_hash_bucket.py: 4 warnings
tests/unit/ops/test_join.py: 88 warnings
tests/unit/ops/test_lambda.py: 1 warning
tests/unit/ops/test_normalize.py: 9 warnings
tests/unit/ops/test_ops.py: 11 warnings
tests/unit/ops/test_ops_schema.py: 17 warnings
tests/unit/workflow/test_workflow.py: 27 warnings
tests/unit/workflow/test_workflow_chaining.py: 1 warning
tests/unit/workflow/test_workflow_node.py: 1 warning
tests/unit/workflow/test_workflow_schemas.py: 1 warning
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(

tests/unit/test_dask_nvt.py: 12 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 8 files.
warnings.warn(

tests/unit/test_dask_nvt.py::test_merlin_core_execution_managers
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/core/utils.py:431: UserWarning: Existing Dask-client object detected in the current context. New cuda cluster will not be deployed. Set force_new to True to ignore running clusters.
warnings.warn(
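
The cluster warning above names its own escape hatch; a hedged sketch, assuming the Distributed execution manager exercised by test_merlin_core_execution_managers is the entry point that accepts force_new:

    from merlin.core.utils import Distributed

    # With a Dask client already active, Distributed() reuses it and warns as above;
    # force_new=True requests a fresh cluster instead (parameter name taken from the warning text).
    with Distributed(force_new=True):
        pass  # dask-backed work here runs against the newly deployed cluster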

tests/unit/test_notebooks.py: 1 warning
tests/unit/test_tools.py: 17 warnings
tests/unit/loader/test_tf_dataloader.py: 2 warnings
tests/unit/loader/test_torch_dataloader.py: 54 warnings
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:2940: FutureWarning: Series.ceil and DataFrame.ceil are deprecated and will be removed in the future
warnings.warn(

tests/unit/loader/test_tf_dataloader.py: 2 warnings
tests/unit/loader/test_torch_dataloader.py: 12 warnings
tests/unit/workflow/test_workflow.py: 9 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 2 files.
warnings.warn(

tests/unit/ops/test_fill.py::test_fill_missing[True-True-parquet]
tests/unit/ops/test_fill.py::test_fill_missing[True-False-parquet]
tests/unit/ops/test_ops.py::test_filter[parquet-0.1-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/pandas/core/indexing.py:1732: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self._setitem_single_block(indexer, value, name)
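
The SettingWithCopyWarning above is the standard chained-assignment pattern; a minimal standalone sketch of the single-.loc fix (toy frame, not the tests' data):

    import pandas as pd

    df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

    # Chained form: the slice may be a copy, so the write can be lost (and warns):
    # df[df["a"] > 1]["b"] = 0.0

    # A single .loc call indexes and assigns on the original frame in one step:
    df.loc[df["a"] > 1, "b"] = 0.0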

tests/unit/workflow/test_cpu_workflow.py: 6 warnings
tests/unit/workflow/test_workflow.py: 12 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 10 files.
warnings.warn(

tests/unit/workflow/test_workflow.py: 48 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 20 files.
warnings.warn(

tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_parquet_output[True-None]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-None]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-None]
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 4 files.
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
========== 1421 passed, 2 skipped, 697 warnings in 702.60s (0:11:42) ===========
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins1095469809036282617.sh

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #1597 of commit 209c4eacb196d1622b8e3a2a563885a6171fd7fb, no merge conflicts.
Running as SYSTEM
Setting status of 209c4eacb196d1622b8e3a2a563885a6171fd7fb to PENDING with url http://10.20.17.181:8080/job/nvtabular_tests/4560/ and message: 'Build started for merge commit.'
Using context: Jenkins Unit Test Run
Building on master in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA-Merlin/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/pull/1597/*:refs/remotes/origin/pr/1597/* # timeout=10
 > git rev-parse 209c4eacb196d1622b8e3a2a563885a6171fd7fb^{commit} # timeout=10
Checking out Revision 209c4eacb196d1622b8e3a2a563885a6171fd7fb (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 209c4eacb196d1622b8e3a2a563885a6171fd7fb # timeout=10
Commit message: "Merge branch 'main' into normalize_fp32"
 > git rev-list --no-walk feea5fce5b4f32301ff93130f130e1fe297ee0e8 # timeout=10
First time build. Skipping changelog.
[nvtabular_tests] $ /bin/bash /tmp/jenkins8408759014724739806.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 1422 items / 1 skipped

tests/unit/test_dask_nvt.py ............................................ [ 3%]
........................................................................ [ 8%]
[ 8%]
tests/unit/test_notebooks.py ...... [ 8%]
tests/unit/test_tf4rec.py . [ 8%]
tests/unit/test_tools.py ...................... [ 10%]
tests/unit/test_triton_inference.py ................................ [ 12%]
tests/unit/framework_utils/test_tf_feature_columns.py . [ 12%]
tests/unit/framework_utils/test_tf_layers.py ........................... [ 14%]
................................................... [ 18%]
tests/unit/framework_utils/test_torch_layers.py . [ 18%]
tests/unit/loader/test_dataloader_backend.py ...... [ 18%]
tests/unit/loader/test_tf_dataloader.py ................................ [ 20%]
........................................s.. [ 23%]
tests/unit/loader/test_torch_dataloader.py ............................. [ 25%]
...................................................... [ 29%]
tests/unit/ops/test_categorify.py ...................................... [ 32%]
........................................................................ [ 37%]
........................................... [ 40%]
tests/unit/ops/test_column_similarity.py ........................ [ 42%]
tests/unit/ops/test_drop_low_cardinality.py .. [ 42%]
tests/unit/ops/test_fill.py ............................................ [ 45%]
........ [ 45%]
tests/unit/ops/test_groupyby.py ................. [ 47%]
tests/unit/ops/test_hash_bucket.py ......................... [ 48%]
tests/unit/ops/test_join.py ............................................ [ 51%]
........................................................................ [ 56%]
.................................. [ 59%]
tests/unit/ops/test_lambda.py .......... [ 60%]
tests/unit/ops/test_normalize.py ....................................... [ 62%]
.. [ 62%]
tests/unit/ops/test_ops.py ............................................. [ 66%]
.................... [ 67%]
tests/unit/ops/test_ops_schema.py ...................................... [ 70%]
........................................................................ [ 75%]
........................................................................ [ 80%]
........................................................................ [ 85%]
....................................... [ 88%]
tests/unit/ops/test_reduce_dtype_size.py .. [ 88%]
tests/unit/ops/test_target_encode.py ..................... [ 89%]
tests/unit/workflow/test_cpu_workflow.py ...... [ 90%]
tests/unit/workflow/test_workflow.py ................................... [ 92%]
.......................................................... [ 96%]
tests/unit/workflow/test_workflow_chaining.py ... [ 96%]
tests/unit/workflow/test_workflow_node.py ........... [ 97%]
tests/unit/workflow/test_workflow_ops.py ... [ 97%]
tests/unit/workflow/test_workflow_schemas.py ........................... [ 99%]
... [100%]

=============================== warnings summary ===============================
../../../../../usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33
/usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
DASK_VERSION = LooseVersion(dask.__version__)

../../../.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: 34 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
other = LooseVersion(other)

nvtabular/loader/__init__.py:19
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/loader/__init__.py:19: DeprecationWarning: The nvtabular.loader module has moved to merlin.models.loader. Support for importing from nvtabular.loader is deprecated, and will be removed in a future version. Please update your imports to refer to merlin.models.loader.
warnings.warn(

tests/unit/test_dask_nvt.py: 2 warnings
tests/unit/workflow/test_workflow.py: 78 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/dask/base.py:1282: UserWarning: Running on a single-machine scheduler when a distributed client is active might lead to unexpected results.
warnings.warn(

tests/unit/test_dask_nvt.py: 1 warning
tests/unit/test_tf4rec.py: 1 warning
tests/unit/test_tools.py: 5 warnings
tests/unit/test_triton_inference.py: 8 warnings
tests/unit/loader/test_dataloader_backend.py: 6 warnings
tests/unit/loader/test_tf_dataloader.py: 66 warnings
tests/unit/loader/test_torch_dataloader.py: 67 warnings
tests/unit/ops/test_categorify.py: 69 warnings
tests/unit/ops/test_drop_low_cardinality.py: 2 warnings
tests/unit/ops/test_fill.py: 8 warnings
tests/unit/ops/test_hash_bucket.py: 4 warnings
tests/unit/ops/test_join.py: 88 warnings
tests/unit/ops/test_lambda.py: 1 warning
tests/unit/ops/test_normalize.py: 9 warnings
tests/unit/ops/test_ops.py: 11 warnings
tests/unit/ops/test_ops_schema.py: 17 warnings
tests/unit/workflow/test_workflow.py: 27 warnings
tests/unit/workflow/test_workflow_chaining.py: 1 warning
tests/unit/workflow/test_workflow_node.py: 1 warning
tests/unit/workflow/test_workflow_schemas.py: 1 warning
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(

tests/unit/test_dask_nvt.py: 12 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 8 files.
warnings.warn(

tests/unit/test_dask_nvt.py::test_merlin_core_execution_managers
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/core/utils.py:431: UserWarning: Existing Dask-client object detected in the current context. New cuda cluster will not be deployed. Set force_new to True to ignore running clusters.
warnings.warn(

tests/unit/test_notebooks.py: 1 warning
tests/unit/test_tools.py: 17 warnings
tests/unit/loader/test_tf_dataloader.py: 2 warnings
tests/unit/loader/test_torch_dataloader.py: 54 warnings
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:2940: FutureWarning: Series.ceil and DataFrame.ceil are deprecated and will be removed in the future
warnings.warn(

tests/unit/loader/test_tf_dataloader.py: 2 warnings
tests/unit/loader/test_torch_dataloader.py: 12 warnings
tests/unit/workflow/test_workflow.py: 9 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 2 files.
warnings.warn(

tests/unit/ops/test_fill.py::test_fill_missing[True-True-parquet]
tests/unit/ops/test_fill.py::test_fill_missing[True-False-parquet]
tests/unit/ops/test_ops.py::test_filter[parquet-0.1-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/pandas/core/indexing.py:1732: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self._setitem_single_block(indexer, value, name)

tests/unit/workflow/test_cpu_workflow.py: 6 warnings
tests/unit/workflow/test_workflow.py: 12 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 10 files.
warnings.warn(

tests/unit/workflow/test_workflow.py: 48 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 20 files.
warnings.warn(

tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_parquet_output[True-None]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-None]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-None]
/var/jenkins_home/.local/lib/python3.8/site-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 4 files.
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
========== 1421 passed, 2 skipped, 697 warnings in 694.76s (0:11:34) ===========
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins14227601548879509263.sh

@benfred merged commit a0fff41 into main Jun 27, 2022
@benfred deleted the normalize_fp32 branch June 27, 2022 20:50