
Fix groupby on lists with cudf 22.06+ #1654

Merged

merged 3 commits into main from fix_groupby on Aug 23, 2022
Conversation

@benfred (Member) commented Aug 22, 2022

Groupby unittests are failing on cudf 22.06+ with an error like

```
FAILED tests/unit/ops/test_groupyby.py::test_groupby_op[id-True-False] - TypeError: 'NumericalColumn' object is not subscriptable
```

Fix.
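
For context on the fix: the `TypeError` above comes from the groupby op taking the first/last element of each aggregated list by subscripting the underlying cudf column, which cudf 22.06+ no longer supports. Below is a minimal sketch of the kind of version-tolerant helper involved, assuming a cudf list Series and offsets-based indexing; the helper name and internals are illustrative, not this PR's actual diff.

```python
# Sketch only (not this PR's actual diff): take the first element of every
# list in a cudf list Series without subscripting the child column, which
# raises "TypeError: 'NumericalColumn' object is not subscriptable" on 22.06+.
import cudf

def first_of_each_list(x: cudf.Series) -> cudf.Series:
    # Assumption about cudf internals: x._column.offsets marks where each
    # row's list begins in the flattened elements, so dropping the final
    # offset leaves one start index per row.
    starts = x._column.offsets.values[:-1]
    # x.list.leaves is the flattened element Series; fancy-indexing it with
    # the start offsets yields the per-row first element.
    return x.list.leaves.iloc[starts].reset_index(drop=True)

# e.g. first_of_each_list(cudf.Series([[1, 2], [3, 4, 5]])) -> [1, 3]
```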
@benfred benfred added the bug Something isn't working label Aug 22, 2022
@benfred benfred added this to the Merlin 22.08 milestone Aug 22, 2022
@nvidia-merlin-bot (Contributor)

CI Results
GitHub pull request #1654 of commit 07a9a2b80411d197d0c715b23a2aa7601859c267, no merge conflicts.
Running as SYSTEM
Setting status of 07a9a2b80411d197d0c715b23a2aa7601859c267 to PENDING with url http://10.20.17.181:8080/job/nvtabular_tests/4639/ and message: 'Build started for merge commit.'
Using context: Jenkins Unit Test Run
Building on master in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA-Merlin/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/pull/1654/*:refs/remotes/origin/pr/1654/* # timeout=10
 > git rev-parse 07a9a2b80411d197d0c715b23a2aa7601859c267^{commit} # timeout=10
Checking out Revision 07a9a2b80411d197d0c715b23a2aa7601859c267 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 07a9a2b80411d197d0c715b23a2aa7601859c267 # timeout=10
Commit message: "Fix groupby on lists with cudf 22.06+"
 > git rev-list --no-walk 02a93eebfca6a825c00bf8c2d0b91863ec0150e4 # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins17991720946688651629.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: anyio-3.6.1, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 1430 items / 1 skipped

tests/unit/test_dask_nvt.py ............................................ [ 3%]
........................................................................ [ 8%]
.... [ 8%]
tests/unit/test_notebooks.py ...... [ 8%]
tests/unit/test_tf4rec.py F [ 8%]
tests/unit/test_tools.py ...................... [ 10%]
tests/unit/test_triton_inference.py ................................ [ 12%]
tests/unit/framework_utils/test_tf_feature_columns.py . [ 12%]
tests/unit/framework_utils/test_tf_layers.py ........................... [ 14%]
................................................... [ 18%]
tests/unit/framework_utils/test_torch_layers.py . [ 18%]
tests/unit/loader/test_dataloader_backend.py ...... [ 18%]
tests/unit/loader/test_tf_dataloader.py ................................ [ 20%]
........................................s.. [ 23%]
tests/unit/loader/test_torch_dataloader.py ............................. [ 25%]
...................................................... [ 29%]
tests/unit/ops/test_categorify.py ...................................... [ 32%]
........................................................................ [ 37%]
........................................... [ 40%]
tests/unit/ops/test_column_similarity.py ........................ [ 42%]
tests/unit/ops/test_drop_low_cardinality.py .. [ 42%]
tests/unit/ops/test_fill.py ............................................ [ 45%]
........ [ 45%]
tests/unit/ops/test_groupyby.py ..................... [ 47%]
tests/unit/ops/test_hash_bucket.py ......................... [ 49%]
tests/unit/ops/test_join.py ............................................ [ 52%]
........................................................................ [ 57%]
.................................. [ 59%]
tests/unit/ops/test_lambda.py .......... [ 60%]
tests/unit/ops/test_normalize.py ....................................... [ 63%]
.. [ 63%]
tests/unit/ops/test_ops.py ............................................. [ 66%]
.................... [ 67%]
tests/unit/ops/test_ops_schema.py ...................................... [ 70%]
........................................................................ [ 75%]
........................................................................ [ 80%]
........................................................................ [ 85%]
....................................... [ 88%]
tests/unit/ops/test_reduce_dtype_size.py .. [ 88%]
tests/unit/ops/test_target_encode.py ..................... [ 89%]
tests/unit/workflow/test_cpu_workflow.py ...... [ 90%]
tests/unit/workflow/test_workflow.py ................................... [ 92%]
.......................................................... [ 96%]
tests/unit/workflow/test_workflow_chaining.py ... [ 96%]
tests/unit/workflow/test_workflow_node.py ........... [ 97%]
tests/unit/workflow/test_workflow_ops.py ... [ 97%]
tests/unit/workflow/test_workflow_schemas.py ........................... [ 99%]
... [100%]

=================================== FAILURES ===================================
_________________________________ test_tf4rec __________________________________

def test_tf4rec():
    inputs = {
        "user_session": np.random.randint(1, 10000, NUM_ROWS),
        "product_id": np.random.randint(1, 51996, NUM_ROWS),
        "category_id": np.random.randint(0, 332, NUM_ROWS),
        "event_time_ts": np.random.randint(1570373000, 1670373390, NUM_ROWS),
        "prod_first_event_time_ts": np.random.randint(1570373000, 1570373382, NUM_ROWS),
        "price": np.random.uniform(0, 2750, NUM_ROWS),
    }
    df = make_df(inputs)

    # categorify features

    cat_feats = (
        ["user_session", "product_id", "category_id"]
        >> nvt.ops.Categorify()
        >> nvt.ops.LambdaOp(lambda col: col + 1)
    )

    # create time features
    sessionTs = ["event_time_ts"]

    sessionTime = (
        sessionTs
        >> nvt.ops.LambdaOp(lambda col: to_datetime(col, unit="s"))
        >> nvt.ops.Rename(name="event_time_dt")
    )

    sessionTime_weekday = (
        sessionTime
        >> nvt.ops.LambdaOp(lambda col: col.dt.weekday)
        >> nvt.ops.Rename(name="et_dayofweek")
    )

    def get_cycled_feature_value_sin(col, max_value):
        value_scaled = (col + 0.000001) / max_value
        value_sin = np.sin(2 * np.pi * value_scaled)
        return value_sin

    def get_cycled_feature_value_cos(col, max_value):
        value_scaled = (col + 0.000001) / max_value
        value_cos = np.cos(2 * np.pi * value_scaled)
        return value_cos

    weekday_sin = (
        sessionTime_weekday
        >> (lambda col: get_cycled_feature_value_sin(col + 1, 7))
        >> nvt.ops.Rename(name="et_dayofweek_sin")
    )
    weekday_cos = (
        sessionTime_weekday
        >> (lambda col: get_cycled_feature_value_cos(col + 1, 7))
        >> nvt.ops.Rename(name="et_dayofweek_cos")
    )
    from nvtabular.ops import Operator

    # custom op for item recency
    class ItemRecency(Operator):
        def transform(self, columns, gdf):
            for column in columns.names:
                col = gdf[column]
                item_first_timestamp = gdf["prod_first_event_time_ts"]
                delta_days = (col - item_first_timestamp) / (60 * 60 * 24)
                gdf[column + "_age_days"] = delta_days * (delta_days >= 0)
            return gdf

        def compute_selector(
            self,
            input_schema: Schema,
            selector: ColumnSelector,
            parents_selector: ColumnSelector,
            dependencies_selector: ColumnSelector,
        ) -> ColumnSelector:
            self._validate_matching_cols(input_schema, parents_selector, "computing input selector")
            return parents_selector

        def column_mapping(self, col_selector):
            column_mapping = {}
            for col_name in col_selector.names:
                column_mapping[col_name + "_age_days"] = [col_name]
            return column_mapping

        @property
        def dependencies(self):
            return ["prod_first_event_time_ts"]

        @property
        def output_dtype(self):
            return np.float64

    recency_features = ["event_time_ts"] >> ItemRecency()
    recency_features_norm = (
        recency_features
        >> nvt.ops.LogOp()
        >> nvt.ops.Normalize()
        >> nvt.ops.Rename(name="product_recency_days_log_norm")
    )

    time_features = (
        sessionTime + sessionTime_weekday + weekday_sin + weekday_cos + recency_features_norm
    )

    # Smoothing price long-tailed distribution
    price_log = (
        ["price"] >> nvt.ops.LogOp() >> nvt.ops.Normalize() >> nvt.ops.Rename(name="price_log_norm")
    )

    # Relative Price to the average price for the category_id
    def relative_price_to_avg_categ(col, gdf):
        epsilon = 1e-5
        col = ((gdf["price"] - col) / (col + epsilon)) * (col > 0).astype(int)
        return col

    avg_category_id_pr = (
        ["category_id"]
        >> nvt.ops.JoinGroupby(cont_cols=["price"], stats=["mean"])
        >> nvt.ops.Rename(name="avg_category_id_price")
    )
    relative_price_to_avg_category = (
        avg_category_id_pr
        >> nvt.ops.LambdaOp(relative_price_to_avg_categ, dependency=["price"])
        >> nvt.ops.Rename(name="relative_price_to_avg_categ_id")
    )

    groupby_feats = (
        ["event_time_ts"] + cat_feats + time_features + price_log + relative_price_to_avg_category
    )

    # Define Groupby Workflow
    groupby_features = groupby_feats >> nvt.ops.Groupby(
        groupby_cols=["user_session"],
        sort_cols=["event_time_ts"],
        aggs={
            "product_id": ["list", "count"],
            "category_id": ["list"],
            "event_time_dt": ["first"],
            "et_dayofweek_sin": ["list"],
            "et_dayofweek_cos": ["list"],
            "price_log_norm": ["list"],
            "relative_price_to_avg_categ_id": ["list"],
            "product_recency_days_log_norm": ["list"],
        },
        name_sep="-",
    )

    SESSIONS_MAX_LENGTH = 20
    MINIMUM_SESSION_LENGTH = 2

    groupby_features_nonlist = groupby_features["user_session", "product_id-count"]

    groupby_features_list = groupby_features[
        "price_log_norm-list",
        "product_recency_days_log_norm-list",
        "et_dayofweek_sin-list",
        "et_dayofweek_cos-list",
        "product_id-list",
        "category_id-list",
        "relative_price_to_avg_categ_id-list",
    ]

    groupby_features_trim = (
        groupby_features_list
        >> nvt.ops.ListSlice(0, SESSIONS_MAX_LENGTH)
        >> nvt.ops.Rename(postfix="_seq")
    )

    # calculate session day index based on 'event_time_dt-first' column
    day_index = (
        (groupby_features["event_time_dt-first"])
        >> nvt.ops.LambdaOp(lambda col: (col - col.min()).dt.days + 1)
        >> nvt.ops.Rename(f=lambda col: "day_index")
    )

    selected_features = groupby_features_nonlist + groupby_features_trim + day_index

    filtered_sessions = selected_features >> nvt.ops.Filter(
        f=lambda df: df["product_id-count"] >= MINIMUM_SESSION_LENGTH
    )

    dataset = nvt.Dataset(df)

    workflow = nvt.Workflow(filtered_sessions)
>       workflow.fit(dataset)

tests/unit/test_tf4rec.py:198:


nvtabular/workflow/workflow.py:209: in fit
self._transform_impl(dataset, capture_dtypes=True).sample_dtypes()
/usr/local/lib/python3.8/dist-packages/merlin/io/dataset.py:1147: in sample_dtypes
_real_meta = self.engine.sample_data(n=n)
/usr/local/lib/python3.8/dist-packages/merlin/io/dataset_engine.py:71: in sample_data
_head = _ddf.partitions[partition_index].head(n)
/usr/local/lib/python3.8/dist-packages/dask/dataframe/core.py:1140: in head
return self._head(n=n, npartitions=npartitions, compute=compute, safe=safe)
/usr/local/lib/python3.8/dist-packages/dask/dataframe/core.py:1174: in _head
result = result.compute()
/usr/local/lib/python3.8/dist-packages/dask/base.py:288: in compute
(result,) = compute(self, traverse=False, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/base.py:571: in compute
results = schedule(dsk, keys, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/local.py:553: in get_sync
return get_async(
/usr/local/lib/python3.8/dist-packages/dask/local.py:496: in get_async
for key, res_info, failed in queue_get(queue).result():
/usr/lib/python3.8/concurrent/futures/_base.py:437: in result
return self.__get_result()
/usr/lib/python3.8/concurrent/futures/_base.py:389: in __get_result
raise self._exception
/usr/local/lib/python3.8/dist-packages/dask/local.py:538: in submit
fut.set_result(fn(*args, **kwargs))
/usr/local/lib/python3.8/dist-packages/dask/local.py:234: in batch_execute_tasks
return [execute_task(a) for a in it]
/usr/local/lib/python3.8/dist-packages/dask/local.py:234: in <listcomp>
return [execute_task(a) for a in it]
/usr/local/lib/python3.8/dist-packages/dask/local.py:225: in execute_task
result = pack_exception(e, dumps)
/usr/local/lib/python3.8/dist-packages/dask/local.py:220: in execute_task
result = _execute_task(task, data)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func(
(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/optimization.py:969: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/usr/local/lib/python3.8/dist-packages/dask/core.py:149: in get
result = _execute_task(task, cache)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func(
(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/utils.py:40: in apply
return func(*args, **kwargs)
nvtabular/workflow/executor.py:56: in apply
parent_df = self.apply(df, [parent], capture_dtypes=capture_dtypes)
nvtabular/workflow/executor.py:56: in apply
parent_df = self.apply(df, [parent], capture_dtypes=capture_dtypes)
nvtabular/workflow/executor.py:56: in apply
parent_df = self.apply(df, [parent], capture_dtypes=capture_dtypes)
nvtabular/workflow/executor.py:85: in apply
output_df = node.op.transform(selection, input_df)
/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101: in inner
result = func(*args, **kwargs)
nvtabular/ops/groupby.py:132: in transform
new_df = _apply_aggs(
nvtabular/ops/groupby.py:245: in _apply_aggs
df[f"{col}{name_sep}{_agg}"] = _first_or_last(
nvtabular/ops/groupby.py:289: in _first_or_last
return _first(x)
nvtabular/ops/groupby.py:302: in _first
elements = x.list._column.elements.values


self = <cudf.core.column.datetime.DatetimeColumn object at 0x7f0e9286adc0>
[
2020-04-13 03:23:42,
2020-07-07 09:28:00,
...3:02,
2022-11-15 19:12:56,
2021-11-22 15:34:01,
2020-04-14 20:22:53,
2020-03-15 06:11:35
]
dtype: datetime64[s]

@property
def values(self):
    """
    Return a CuPy representation of the DateTimeColumn.
    """
>       raise NotImplementedError(
        "DateTime Arrays is not yet implemented in cudf"
    )

E NotImplementedError: DateTime Arrays is not yet implemented in cudf

/usr/local/lib/python3.8/dist-packages/cudf/core/column/datetime.py:210: NotImplementedError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:executor.py:111 Failed to transform operator <nvtabular.ops.groupby.Groupby object at 0x7f0e9280a490>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow/executor.py", line 85, in apply
output_df = node.op.transform(selection, input_df)
File "/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py", line 101, in inner
result = func(*args, **kwargs)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/groupby.py", line 132, in transform
new_df = _apply_aggs(
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/groupby.py", line 245, in _apply_aggs
df[f"{col}{name_sep}{_agg}"] = _first_or_last(
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/groupby.py", line 289, in _first_or_last
return _first(x)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/groupby.py", line 302, in _first
elements = x.list._column.elements.values
File "/usr/local/lib/python3.8/dist-packages/cudf/core/column/datetime.py", line 210, in values
raise NotImplementedError(
NotImplementedError: DateTime Arrays is not yet implemented in cudf
=============================== warnings summary ===============================
../../../../../usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33
/usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
DASK_VERSION = LooseVersion(dask.__version__)

../../../.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: 34 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
other = LooseVersion(other)

nvtabular/loader/__init__.py:19
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/loader/__init__.py:19: DeprecationWarning: The nvtabular.loader module has moved to merlin.models.loader. Support for importing from nvtabular.loader is deprecated, and will be removed in a future version. Please update your imports to refer to merlin.models.loader.
warnings.warn(

tests/unit/test_dask_nvt.py::test_dask_workflow_api_dlrm[True-Shuffle.PER_WORKER-True-device-0-parquet-0.1]
/usr/local/lib/python3.8/dist-packages/tornado/ioloop.py:350: DeprecationWarning: make_current is deprecated; start the event loop first
self.make_current()

tests/unit/test_dask_nvt.py: 1 warning
tests/unit/test_tf4rec.py: 1 warning
tests/unit/test_tools.py: 5 warnings
tests/unit/test_triton_inference.py: 8 warnings
tests/unit/loader/test_dataloader_backend.py: 6 warnings
tests/unit/loader/test_tf_dataloader.py: 66 warnings
tests/unit/loader/test_torch_dataloader.py: 67 warnings
tests/unit/ops/test_categorify.py: 69 warnings
tests/unit/ops/test_drop_low_cardinality.py: 2 warnings
tests/unit/ops/test_fill.py: 8 warnings
tests/unit/ops/test_hash_bucket.py: 4 warnings
tests/unit/ops/test_join.py: 88 warnings
tests/unit/ops/test_lambda.py: 1 warning
tests/unit/ops/test_normalize.py: 9 warnings
tests/unit/ops/test_ops.py: 11 warnings
tests/unit/ops/test_ops_schema.py: 17 warnings
tests/unit/workflow/test_workflow.py: 27 warnings
tests/unit/workflow/test_workflow_chaining.py: 1 warning
tests/unit/workflow/test_workflow_node.py: 1 warning
tests/unit/workflow/test_workflow_schemas.py: 1 warning
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(

tests/unit/test_dask_nvt.py: 12 warnings
/usr/local/lib/python3.8/dist-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 8 files.
warnings.warn(

tests/unit/test_dask_nvt.py::test_merlin_core_execution_managers
/usr/local/lib/python3.8/dist-packages/merlin/core/utils.py:431: UserWarning: Existing Dask-client object detected in the current context. New cuda cluster will not be deployed. Set force_new to True to ignore running clusters.
warnings.warn(

tests/unit/test_notebooks.py: 1 warning
tests/unit/test_tools.py: 17 warnings
tests/unit/loader/test_tf_dataloader.py: 2 warnings
tests/unit/loader/test_torch_dataloader.py: 54 warnings
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:2940: FutureWarning: Series.ceil and DataFrame.ceil are deprecated and will be removed in the future
warnings.warn(

tests/unit/loader/test_tf_dataloader.py: 2 warnings
tests/unit/loader/test_torch_dataloader.py: 12 warnings
tests/unit/workflow/test_workflow.py: 9 warnings
/usr/local/lib/python3.8/dist-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 2 files.
warnings.warn(

tests/unit/ops/test_fill.py::test_fill_missing[True-True-parquet]
tests/unit/ops/test_fill.py::test_fill_missing[True-False-parquet]
tests/unit/ops/test_ops.py::test_filter[parquet-0.1-True]
/usr/local/lib/python3.8/dist-packages/pandas/core/indexing.py:1732: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self._setitem_single_block(indexer, value, name)

tests/unit/workflow/test_cpu_workflow.py: 6 warnings
tests/unit/workflow/test_workflow.py: 12 warnings
/usr/local/lib/python3.8/dist-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 10 files.
warnings.warn(

tests/unit/workflow/test_workflow.py: 48 warnings
/usr/local/lib/python3.8/dist-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 20 files.
warnings.warn(

tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_parquet_output[True-None]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-None]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-None]
/usr/local/lib/python3.8/dist-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 4 files.
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED tests/unit/test_tf4rec.py::test_tf4rec - NotImplementedError: DateTime...
===== 1 failed, 1428 passed, 2 skipped, 618 warnings in 709.98s (0:11:49) ======
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.github.com/repos/NVIDIA-Merlin/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins16832703381548470182.sh
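
The failure above originates in `_first` (nvtabular/ops/groupby.py:302), which asks for `.values` on a cudf `DatetimeColumn`; cudf cannot build a CuPy array from datetimes, hence the `NotImplementedError`. A hedged sketch of one possible workaround, reusing the offsets idea from the sketch above and going through an integer view of the datetimes (the helper name is illustrative, not the commit's actual code):

```python
# Sketch only: per-row first elements from a list-of-datetimes cudf Series.
# cudf cannot materialize a CuPy array of datetimes, but the same data viewed
# as int64 (epoch ticks) converts fine and can be cast back afterwards.
import cudf

def first_datetime(x: cudf.Series) -> cudf.Series:
    leaves = x.list.leaves                    # flattened datetime elements
    element_dtype = leaves.dtype              # e.g. datetime64[s]
    as_ints = leaves.astype("int64")          # datetimes -> integer ticks
    starts = x._column.offsets.values[:-1]    # start index of each row's list
    firsts = as_ints.iloc[starts].reset_index(drop=True)
    return firsts.astype(element_dtype)       # restore the datetime dtype
```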

@nvidia-merlin-bot (Contributor)

CI Results
GitHub pull request #1654 of commit fcf24b4d7c36a29975a5db431e9ac7ebe25a6acc, no merge conflicts.
Running as SYSTEM
Setting status of fcf24b4d7c36a29975a5db431e9ac7ebe25a6acc to PENDING with url http://10.20.17.181:8080/job/nvtabular_tests/4643/ and message: 'Build started for merge commit.'
Using context: Jenkins Unit Test Run
Building on master in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA-Merlin/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/pull/1654/*:refs/remotes/origin/pr/1654/* # timeout=10
 > git rev-parse fcf24b4d7c36a29975a5db431e9ac7ebe25a6acc^{commit} # timeout=10
Checking out Revision fcf24b4d7c36a29975a5db431e9ac7ebe25a6acc (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f fcf24b4d7c36a29975a5db431e9ac7ebe25a6acc # timeout=10
Commit message: "Merge branch 'main' into fix_groupby"
 > git rev-list --no-walk 852ee8f53df6c4c5aa10a0d98e293cbd30e1bdef # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins8251550900149403624.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: anyio-3.6.1, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 1430 items / 1 skipped

tests/unit/test_dask_nvt.py ............................................ [ 3%]
........................................................................ [ 8%]
.... [ 8%]
tests/unit/test_notebooks.py ...... [ 8%]
tests/unit/test_tf4rec.py F [ 8%]
tests/unit/test_tools.py ...................... [ 10%]
tests/unit/test_triton_inference.py ................................ [ 12%]
tests/unit/framework_utils/test_tf_feature_columns.py . [ 12%]
tests/unit/framework_utils/test_tf_layers.py ........................... [ 14%]
................................................... [ 18%]
tests/unit/framework_utils/test_torch_layers.py . [ 18%]
tests/unit/loader/test_dataloader_backend.py ...... [ 18%]
tests/unit/loader/test_tf_dataloader.py ................................ [ 20%]
........................................s.. [ 23%]
tests/unit/loader/test_torch_dataloader.py ............................. [ 25%]
...................................................... [ 29%]
tests/unit/ops/test_categorify.py ...................................... [ 32%]
........................................................................ [ 37%]
........................................... [ 40%]
tests/unit/ops/test_column_similarity.py ........................ [ 42%]
tests/unit/ops/test_drop_low_cardinality.py .. [ 42%]
tests/unit/ops/test_fill.py ............................................ [ 45%]
........ [ 45%]
tests/unit/ops/test_groupyby.py ..................... [ 47%]
tests/unit/ops/test_hash_bucket.py ......................... [ 49%]
tests/unit/ops/test_join.py ............................................ [ 52%]
........................................................................ [ 57%]
.................................. [ 59%]
tests/unit/ops/test_lambda.py .......... [ 60%]
tests/unit/ops/test_normalize.py ....................................... [ 63%]
.. [ 63%]
tests/unit/ops/test_ops.py ............................................. [ 66%]
.................... [ 67%]
tests/unit/ops/test_ops_schema.py ...................................... [ 70%]
........................................................................ [ 75%]
........................................................................ [ 80%]
........................................................................ [ 85%]
....................................... [ 88%]
tests/unit/ops/test_reduce_dtype_size.py .. [ 88%]
tests/unit/ops/test_target_encode.py ..................... [ 89%]
tests/unit/workflow/test_cpu_workflow.py ...... [ 90%]
tests/unit/workflow/test_workflow.py ................................... [ 92%]
.......................................................... [ 96%]
tests/unit/workflow/test_workflow_chaining.py ... [ 96%]
tests/unit/workflow/test_workflow_node.py ........... [ 97%]
tests/unit/workflow/test_workflow_ops.py ... [ 97%]
tests/unit/workflow/test_workflow_schemas.py ........................... [ 99%]
... [100%]

=================================== FAILURES ===================================
_________________________________ test_tf4rec __________________________________

def test_tf4rec():
    inputs = {
        "user_session": np.random.randint(1, 10000, NUM_ROWS),
        "product_id": np.random.randint(1, 51996, NUM_ROWS),
        "category_id": np.random.randint(0, 332, NUM_ROWS),
        "event_time_ts": np.random.randint(1570373000, 1670373390, NUM_ROWS),
        "prod_first_event_time_ts": np.random.randint(1570373000, 1570373382, NUM_ROWS),
        "price": np.random.uniform(0, 2750, NUM_ROWS),
    }
    df = make_df(inputs)

    # categorify features

    cat_feats = (
        ["user_session", "product_id", "category_id"]
        >> nvt.ops.Categorify()
        >> nvt.ops.LambdaOp(lambda col: col + 1)
    )

    # create time features
    sessionTs = ["event_time_ts"]

    sessionTime = (
        sessionTs
        >> nvt.ops.LambdaOp(lambda col: to_datetime(col, unit="s"))
        >> nvt.ops.Rename(name="event_time_dt")
    )

    sessionTime_weekday = (
        sessionTime
        >> nvt.ops.LambdaOp(lambda col: col.dt.weekday)
        >> nvt.ops.Rename(name="et_dayofweek")
    )

    def get_cycled_feature_value_sin(col, max_value):
        value_scaled = (col + 0.000001) / max_value
        value_sin = np.sin(2 * np.pi * value_scaled)
        return value_sin

    def get_cycled_feature_value_cos(col, max_value):
        value_scaled = (col + 0.000001) / max_value
        value_cos = np.cos(2 * np.pi * value_scaled)
        return value_cos

    weekday_sin = (
        sessionTime_weekday
        >> (lambda col: get_cycled_feature_value_sin(col + 1, 7))
        >> nvt.ops.Rename(name="et_dayofweek_sin")
    )
    weekday_cos = (
        sessionTime_weekday
        >> (lambda col: get_cycled_feature_value_cos(col + 1, 7))
        >> nvt.ops.Rename(name="et_dayofweek_cos")
    )
    from nvtabular.ops import Operator

    # custom op for item recency
    class ItemRecency(Operator):
        def transform(self, columns, gdf):
            for column in columns.names:
                col = gdf[column]
                item_first_timestamp = gdf["prod_first_event_time_ts"]
                delta_days = (col - item_first_timestamp) / (60 * 60 * 24)
                gdf[column + "_age_days"] = delta_days * (delta_days >= 0)
            return gdf

        def compute_selector(
            self,
            input_schema: Schema,
            selector: ColumnSelector,
            parents_selector: ColumnSelector,
            dependencies_selector: ColumnSelector,
        ) -> ColumnSelector:
            self._validate_matching_cols(input_schema, parents_selector, "computing input selector")
            return parents_selector

        def column_mapping(self, col_selector):
            column_mapping = {}
            for col_name in col_selector.names:
                column_mapping[col_name + "_age_days"] = [col_name]
            return column_mapping

        @property
        def dependencies(self):
            return ["prod_first_event_time_ts"]

        @property
        def output_dtype(self):
            return np.float64

    recency_features = ["event_time_ts"] >> ItemRecency()
    recency_features_norm = (
        recency_features
        >> nvt.ops.LogOp()
        >> nvt.ops.Normalize()
        >> nvt.ops.Rename(name="product_recency_days_log_norm")
    )

    time_features = (
        sessionTime + sessionTime_weekday + weekday_sin + weekday_cos + recency_features_norm
    )

    # Smoothing price long-tailed distribution
    price_log = (
        ["price"] >> nvt.ops.LogOp() >> nvt.ops.Normalize() >> nvt.ops.Rename(name="price_log_norm")
    )

    # Relative Price to the average price for the category_id
    def relative_price_to_avg_categ(col, gdf):
        epsilon = 1e-5
        col = ((gdf["price"] - col) / (col + epsilon)) * (col > 0).astype(int)
        return col

    avg_category_id_pr = (
        ["category_id"]
        >> nvt.ops.JoinGroupby(cont_cols=["price"], stats=["mean"])
        >> nvt.ops.Rename(name="avg_category_id_price")
    )
    relative_price_to_avg_category = (
        avg_category_id_pr
        >> nvt.ops.LambdaOp(relative_price_to_avg_categ, dependency=["price"])
        >> nvt.ops.Rename(name="relative_price_to_avg_categ_id")
    )

    groupby_feats = (
        ["event_time_ts"] + cat_feats + time_features + price_log + relative_price_to_avg_category
    )

    # Define Groupby Workflow
    groupby_features = groupby_feats >> nvt.ops.Groupby(
        groupby_cols=["user_session"],
        sort_cols=["event_time_ts"],
        aggs={
            "product_id": ["list", "count"],
            "category_id": ["list"],
            "event_time_dt": ["first"],
            "et_dayofweek_sin": ["list"],
            "et_dayofweek_cos": ["list"],
            "price_log_norm": ["list"],
            "relative_price_to_avg_categ_id": ["list"],
            "product_recency_days_log_norm": ["list"],
        },
        name_sep="-",
    )

    SESSIONS_MAX_LENGTH = 20
    MINIMUM_SESSION_LENGTH = 2

    groupby_features_nonlist = groupby_features["user_session", "product_id-count"]

    groupby_features_list = groupby_features[
        "price_log_norm-list",
        "product_recency_days_log_norm-list",
        "et_dayofweek_sin-list",
        "et_dayofweek_cos-list",
        "product_id-list",
        "category_id-list",
        "relative_price_to_avg_categ_id-list",
    ]

    groupby_features_trim = (
        groupby_features_list
        >> nvt.ops.ListSlice(0, SESSIONS_MAX_LENGTH)
        >> nvt.ops.Rename(postfix="_seq")
    )

    # calculate session day index based on 'event_time_dt-first' column
    day_index = (
        (groupby_features["event_time_dt-first"])
        >> nvt.ops.LambdaOp(lambda col: (col - col.min()).dt.days + 1)
        >> nvt.ops.Rename(f=lambda col: "day_index")
    )

    selected_features = groupby_features_nonlist + groupby_features_trim + day_index

    filtered_sessions = selected_features >> nvt.ops.Filter(
        f=lambda df: df["product_id-count"] >= MINIMUM_SESSION_LENGTH
    )

    dataset = nvt.Dataset(df)

    workflow = nvt.Workflow(filtered_sessions)
>       workflow.fit(dataset)

tests/unit/test_tf4rec.py:198:


nvtabular/workflow/workflow.py:209: in fit
self._transform_impl(dataset, capture_dtypes=True).sample_dtypes()
/usr/local/lib/python3.8/dist-packages/merlin/io/dataset.py:1147: in sample_dtypes
_real_meta = self.engine.sample_data(n=n)
/usr/local/lib/python3.8/dist-packages/merlin/io/dataset_engine.py:71: in sample_data
_head = _ddf.partitions[partition_index].head(n)
/usr/local/lib/python3.8/dist-packages/dask/dataframe/core.py:1140: in head
return self._head(n=n, npartitions=npartitions, compute=compute, safe=safe)
/usr/local/lib/python3.8/dist-packages/dask/dataframe/core.py:1174: in _head
result = result.compute()
/usr/local/lib/python3.8/dist-packages/dask/base.py:288: in compute
(result,) = compute(self, traverse=False, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/base.py:571: in compute
results = schedule(dsk, keys, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/local.py:553: in get_sync
return get_async(
/usr/local/lib/python3.8/dist-packages/dask/local.py:496: in get_async
for key, res_info, failed in queue_get(queue).result():
/usr/lib/python3.8/concurrent/futures/_base.py:437: in result
return self.__get_result()
/usr/lib/python3.8/concurrent/futures/_base.py:389: in __get_result
raise self._exception
/usr/local/lib/python3.8/dist-packages/dask/local.py:538: in submit
fut.set_result(fn(*args, **kwargs))
/usr/local/lib/python3.8/dist-packages/dask/local.py:234: in batch_execute_tasks
return [execute_task(a) for a in it]
/usr/local/lib/python3.8/dist-packages/dask/local.py:234: in <listcomp>
return [execute_task(a) for a in it]
/usr/local/lib/python3.8/dist-packages/dask/local.py:225: in execute_task
result = pack_exception(e, dumps)
/usr/local/lib/python3.8/dist-packages/dask/local.py:220: in execute_task
result = _execute_task(task, data)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func(
(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/optimization.py:969: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/usr/local/lib/python3.8/dist-packages/dask/core.py:149: in get
result = _execute_task(task, cache)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func(
(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/utils.py:40: in apply
return func(*args, **kwargs)
nvtabular/workflow/executor.py:56: in apply
parent_df = self.apply(df, [parent], capture_dtypes=capture_dtypes)
nvtabular/workflow/executor.py:56: in apply
parent_df = self.apply(df, [parent], capture_dtypes=capture_dtypes)
nvtabular/workflow/executor.py:56: in apply
parent_df = self.apply(df, [parent], capture_dtypes=capture_dtypes)
nvtabular/workflow/executor.py:85: in apply
output_df = node.op.transform(selection, input_df)
/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101: in inner
result = func(*args, **kwargs)
nvtabular/ops/groupby.py:132: in transform
new_df = _apply_aggs(
nvtabular/ops/groupby.py:245: in _apply_aggs
df[f"{col}{name_sep}{_agg}"] = _first_or_last(
nvtabular/ops/groupby.py:289: in _first_or_last
return _first(x)
nvtabular/ops/groupby.py:302: in _first
elements = x.list._column.elements.values


self = <cudf.core.column.datetime.DatetimeColumn object at 0x7f01081eb3c0>
[
2019-12-18 10:54:26,
2020-01-16 05:25:57,
...7:31,
2020-05-22 12:09:32,
2020-11-11 23:04:27,
2020-09-12 05:35:17,
2020-12-26 04:10:41
]
dtype: datetime64[s]

@property
def values(self):
    """
    Return a CuPy representation of the DateTimeColumn.
    """
>       raise NotImplementedError(
        "DateTime Arrays is not yet implemented in cudf"
    )

E NotImplementedError: DateTime Arrays is not yet implemented in cudf

/usr/local/lib/python3.8/dist-packages/cudf/core/column/datetime.py:210: NotImplementedError
------------------------------ Captured log call -------------------------------
ERROR nvtabular:executor.py:111 Failed to transform operator <nvtabular.ops.groupby.Groupby object at 0x7f01381c2f40>
Traceback (most recent call last):
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/workflow/executor.py", line 85, in apply
output_df = node.op.transform(selection, input_df)
File "/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py", line 101, in inner
result = func(*args, **kwargs)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/groupby.py", line 132, in transform
new_df = _apply_aggs(
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/groupby.py", line 245, in _apply_aggs
df[f"{col}{name_sep}{_agg}"] = _first_or_last(
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/groupby.py", line 289, in _first_or_last
return _first(x)
File "/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/ops/groupby.py", line 302, in _first
elements = x.list._column.elements.values
File "/usr/local/lib/python3.8/dist-packages/cudf/core/column/datetime.py", line 210, in values
raise NotImplementedError(
NotImplementedError: DateTime Arrays is not yet implemented in cudf
=============================== warnings summary ===============================
../../../../../usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33
/usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
DASK_VERSION = LooseVersion(dask.__version__)

../../../.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: 34 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
other = LooseVersion(other)

nvtabular/loader/__init__.py:19
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/loader/__init__.py:19: DeprecationWarning: The nvtabular.loader module has moved to merlin.models.loader. Support for importing from nvtabular.loader is deprecated, and will be removed in a future version. Please update your imports to refer to merlin.models.loader.
warnings.warn(

tests/unit/test_dask_nvt.py::test_dask_workflow_api_dlrm[True-Shuffle.PER_WORKER-True-device-0-parquet-0.1]
/usr/local/lib/python3.8/dist-packages/tornado/ioloop.py:350: DeprecationWarning: make_current is deprecated; start the event loop first
self.make_current()

tests/unit/test_dask_nvt.py: 1 warning
tests/unit/test_tf4rec.py: 1 warning
tests/unit/test_tools.py: 5 warnings
tests/unit/test_triton_inference.py: 8 warnings
tests/unit/loader/test_dataloader_backend.py: 6 warnings
tests/unit/loader/test_tf_dataloader.py: 66 warnings
tests/unit/loader/test_torch_dataloader.py: 67 warnings
tests/unit/ops/test_categorify.py: 69 warnings
tests/unit/ops/test_drop_low_cardinality.py: 2 warnings
tests/unit/ops/test_fill.py: 8 warnings
tests/unit/ops/test_hash_bucket.py: 4 warnings
tests/unit/ops/test_join.py: 88 warnings
tests/unit/ops/test_lambda.py: 1 warning
tests/unit/ops/test_normalize.py: 9 warnings
tests/unit/ops/test_ops.py: 11 warnings
tests/unit/ops/test_ops_schema.py: 17 warnings
tests/unit/workflow/test_workflow.py: 27 warnings
tests/unit/workflow/test_workflow_chaining.py: 1 warning
tests/unit/workflow/test_workflow_node.py: 1 warning
tests/unit/workflow/test_workflow_schemas.py: 1 warning
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(

tests/unit/test_dask_nvt.py: 12 warnings
/usr/local/lib/python3.8/dist-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 8 files.
warnings.warn(

tests/unit/test_dask_nvt.py::test_merlin_core_execution_managers
/usr/local/lib/python3.8/dist-packages/merlin/core/utils.py:431: UserWarning: Existing Dask-client object detected in the current context. New cuda cluster will not be deployed. Set force_new to True to ignore running clusters.
warnings.warn(

tests/unit/test_notebooks.py: 1 warning
tests/unit/test_tools.py: 17 warnings
tests/unit/loader/test_tf_dataloader.py: 2 warnings
tests/unit/loader/test_torch_dataloader.py: 54 warnings
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:2940: FutureWarning: Series.ceil and DataFrame.ceil are deprecated and will be removed in the future
warnings.warn(

tests/unit/loader/test_tf_dataloader.py: 2 warnings
tests/unit/loader/test_torch_dataloader.py: 12 warnings
tests/unit/workflow/test_workflow.py: 9 warnings
/usr/local/lib/python3.8/dist-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 2 files.
warnings.warn(

tests/unit/ops/test_fill.py::test_fill_missing[True-True-parquet]
tests/unit/ops/test_fill.py::test_fill_missing[True-False-parquet]
tests/unit/ops/test_ops.py::test_filter[parquet-0.1-True]
/usr/local/lib/python3.8/dist-packages/pandas/core/indexing.py:1732: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self._setitem_single_block(indexer, value, name)

tests/unit/workflow/test_cpu_workflow.py: 6 warnings
tests/unit/workflow/test_workflow.py: 12 warnings
/usr/local/lib/python3.8/dist-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 10 files.
warnings.warn(

tests/unit/workflow/test_workflow.py: 48 warnings
/usr/local/lib/python3.8/dist-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 20 files.
warnings.warn(

tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_parquet_output[True-None]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-None]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-None]
/usr/local/lib/python3.8/dist-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 4 files.
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED tests/unit/test_tf4rec.py::test_tf4rec - NotImplementedError: DateTime...
===== 1 failed, 1428 passed, 2 skipped, 618 warnings in 706.13s (0:11:46) ======
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.github.com/repos/NVIDIA-Merlin/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins9948493744242059586.sh

@github-actions

Documentation preview

https://nvidia-merlin.github.io/NVTabular/review/pr-1654

@nvidia-merlin-bot (Contributor)

CI Results
GitHub pull request #1654 of commit 7ff98a4f2b978dcc5c1c3dcad001122eea28e52d, no merge conflicts.
Running as SYSTEM
Setting status of 7ff98a4f2b978dcc5c1c3dcad001122eea28e52d to PENDING with url http://10.20.17.181:8080/job/nvtabular_tests/4644/ and message: 'Build started for merge commit.'
Using context: Jenkins Unit Test Run
Building on master in workspace /var/jenkins_home/workspace/nvtabular_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA-Merlin/NVTabular.git
 > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
 > git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/pull/1654/*:refs/remotes/origin/pr/1654/* # timeout=10
 > git rev-parse 7ff98a4f2b978dcc5c1c3dcad001122eea28e52d^{commit} # timeout=10
Checking out Revision 7ff98a4f2b978dcc5c1c3dcad001122eea28e52d (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 7ff98a4f2b978dcc5c1c3dcad001122eea28e52d # timeout=10
Commit message: "fix tf4rec unittest"
 > git rev-list --no-walk fcf24b4d7c36a29975a5db431e9ac7ebe25a6acc # timeout=10
[nvtabular_tests] $ /bin/bash /tmp/jenkins15873753808315941829.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml
plugins: anyio-3.6.1, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 1430 items / 1 skipped

tests/unit/test_dask_nvt.py ............................................ [ 3%]
........................................................................ [ 8%]
.... [ 8%]
tests/unit/test_notebooks.py ...... [ 8%]
tests/unit/test_tf4rec.py . [ 8%]
tests/unit/test_tools.py ...................... [ 10%]
tests/unit/test_triton_inference.py ................................ [ 12%]
tests/unit/framework_utils/test_tf_feature_columns.py . [ 12%]
tests/unit/framework_utils/test_tf_layers.py ........................... [ 14%]
................................................... [ 18%]
tests/unit/framework_utils/test_torch_layers.py . [ 18%]
tests/unit/loader/test_dataloader_backend.py ...... [ 18%]
tests/unit/loader/test_tf_dataloader.py ................................ [ 20%]
........................................s.. [ 23%]
tests/unit/loader/test_torch_dataloader.py ............................. [ 25%]
...................................................... [ 29%]
tests/unit/ops/test_categorify.py ...................................... [ 32%]
........................................................................ [ 37%]
........................................... [ 40%]
tests/unit/ops/test_column_similarity.py ........................ [ 42%]
tests/unit/ops/test_drop_low_cardinality.py .. [ 42%]
tests/unit/ops/test_fill.py ............................................ [ 45%]
........ [ 45%]
tests/unit/ops/test_groupyby.py ..................... [ 47%]
tests/unit/ops/test_hash_bucket.py ......................... [ 49%]
tests/unit/ops/test_join.py ............................................ [ 52%]
........................................................................ [ 57%]
.................................. [ 59%]
tests/unit/ops/test_lambda.py .......... [ 60%]
tests/unit/ops/test_normalize.py ....................................... [ 63%]
.. [ 63%]
tests/unit/ops/test_ops.py ............................................. [ 66%]
.................... [ 67%]
tests/unit/ops/test_ops_schema.py ...................................... [ 70%]
........................................................................ [ 75%]
........................................................................ [ 80%]
........................................................................ [ 85%]
....................................... [ 88%]
tests/unit/ops/test_reduce_dtype_size.py .. [ 88%]
tests/unit/ops/test_target_encode.py ..................... [ 89%]
tests/unit/workflow/test_cpu_workflow.py ...... [ 90%]
tests/unit/workflow/test_workflow.py ................................... [ 92%]
.......................................................... [ 96%]
tests/unit/workflow/test_workflow_chaining.py ... [ 96%]
tests/unit/workflow/test_workflow_node.py ........... [ 97%]
tests/unit/workflow/test_workflow_ops.py ... [ 97%]
tests/unit/workflow/test_workflow_schemas.py ........................... [ 99%]
... [100%]

=============================== warnings summary ===============================
../../../../../usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33
/usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
DASK_VERSION = LooseVersion(dask.__version__)

../../../.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: 34 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
other = LooseVersion(other)

nvtabular/loader/__init__.py:19
/var/jenkins_home/workspace/nvtabular_tests/nvtabular/nvtabular/loader/__init__.py:19: DeprecationWarning: The nvtabular.loader module has moved to merlin.models.loader. Support for importing from nvtabular.loader is deprecated, and will be removed in a future version. Please update your imports to refer to merlin.models.loader.
warnings.warn(

tests/unit/test_dask_nvt.py::test_dask_workflow_api_dlrm[True-Shuffle.PER_WORKER-True-device-0-parquet-0.1]
/usr/local/lib/python3.8/dist-packages/tornado/ioloop.py:350: DeprecationWarning: make_current is deprecated; start the event loop first
self.make_current()

tests/unit/test_dask_nvt.py: 1 warning
tests/unit/test_tf4rec.py: 1 warning
tests/unit/test_tools.py: 5 warnings
tests/unit/test_triton_inference.py: 8 warnings
tests/unit/loader/test_dataloader_backend.py: 6 warnings
tests/unit/loader/test_tf_dataloader.py: 66 warnings
tests/unit/loader/test_torch_dataloader.py: 67 warnings
tests/unit/ops/test_categorify.py: 69 warnings
tests/unit/ops/test_drop_low_cardinality.py: 2 warnings
tests/unit/ops/test_fill.py: 8 warnings
tests/unit/ops/test_hash_bucket.py: 4 warnings
tests/unit/ops/test_join.py: 88 warnings
tests/unit/ops/test_lambda.py: 1 warning
tests/unit/ops/test_normalize.py: 9 warnings
tests/unit/ops/test_ops.py: 11 warnings
tests/unit/ops/test_ops_schema.py: 17 warnings
tests/unit/workflow/test_workflow.py: 27 warnings
tests/unit/workflow/test_workflow_chaining.py: 1 warning
tests/unit/workflow/test_workflow_node.py: 1 warning
tests/unit/workflow/test_workflow_schemas.py: 1 warning
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(

tests/unit/test_dask_nvt.py: 12 warnings
/usr/local/lib/python3.8/dist-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 8 files.
warnings.warn(

tests/unit/test_dask_nvt.py::test_merlin_core_execution_managers
/usr/local/lib/python3.8/dist-packages/merlin/core/utils.py:431: UserWarning: Existing Dask-client object detected in the current context. New cuda cluster will not be deployed. Set force_new to True to ignore running clusters.
warnings.warn(

tests/unit/test_notebooks.py: 1 warning
tests/unit/test_tools.py: 17 warnings
tests/unit/loader/test_tf_dataloader.py: 2 warnings
tests/unit/loader/test_torch_dataloader.py: 54 warnings
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:2940: FutureWarning: Series.ceil and DataFrame.ceil are deprecated and will be removed in the future
warnings.warn(

tests/unit/loader/test_tf_dataloader.py: 2 warnings
tests/unit/loader/test_torch_dataloader.py: 12 warnings
tests/unit/workflow/test_workflow.py: 9 warnings
/usr/local/lib/python3.8/dist-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 2 files.
warnings.warn(

tests/unit/ops/test_fill.py::test_fill_missing[True-True-parquet]
tests/unit/ops/test_fill.py::test_fill_missing[True-False-parquet]
tests/unit/ops/test_ops.py::test_filter[parquet-0.1-True]
/usr/local/lib/python3.8/dist-packages/pandas/core/indexing.py:1732: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self._setitem_single_block(indexer, value, name)

tests/unit/workflow/test_cpu_workflow.py: 6 warnings
tests/unit/workflow/test_workflow.py: 12 warnings
/usr/local/lib/python3.8/dist-packages/merlin/io/dataset.py:862: UserWarning: Only created 1 files did not have enough partitions to create 10 files.
warnings.warn(

tests/unit/workflow/test_workflow.py: 48 warnings
/usr/local/lib/python3.8/dist-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 20 files.
warnings.warn(

tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_parquet_output[True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_parquet_output[True-None]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_workflow_apply[True-True-None]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_WORKER]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-Shuffle.PER_PARTITION]
tests/unit/workflow/test_workflow.py::test_workflow_apply[False-True-None]
/usr/local/lib/python3.8/dist-packages/merlin/io/dataset.py:862: UserWarning: Only created 2 files did not have enough partitions to create 4 files.
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
========== 1429 passed, 2 skipped, 618 warnings in 694.75s (0:11:34) ===========
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.github.com/repos/NVIDIA-Merlin/NVTabular/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[nvtabular_tests] $ /bin/bash /tmp/jenkins13380384454792578178.sh
