Fix result column types for empty inputs to rolling window #8274

Merged

mythrocks
Contributor

Fixes the rolling-window part of #7611.

All the rolling window functions return empty results when the input aggregation column is empty, just as they should. But the column types are incorrectly set to match the input type. While this is alright for [MIN(), MAX(), LEAD(), LAG()], it is incorrect for some aggregations:

| Aggregation | Input Types | Output Type |
|---|---|---|
| COUNT_VALID | All types | INT32 |
| COUNT_ALL | All types | INT32 |
| ROW_NUMBER | All types | INT32 |
| SUM | Numerics (e.g. INT8) | 64-bit promoted type (e.g. INT64) |
| SUM | Chrono | Same as input type |
| SUM | All else | Unsupported |
| MEAN | Numerics | FLOAT64 |
| MEAN | Chrono | FLOAT64 |
| MEAN | All else | Unsupported |
| COLLECT_LIST | All types T | LIST with child of type T |

This mapping is congruent with cudf::target_type_t from <cudf/detail/aggregation/aggregation.hpp>.
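
The table above can be read as a compile-time mapping from (input type, aggregation) to output type. Below is a minimal, self-contained C++17 sketch of that idea. It is not libcudf's cudf::detail::target_type_t: the names agg_kind, type_tag, unsupported, deduce_result and result_type_t are hypothetical, std::chrono::duration stands in for cudf's chrono types, and std::vector<T> stands in for a LIST column with child type T. Mapping unsigned integers to a 64-bit unsigned type and floating-point SUM to double are assumptions made to follow the table's "64-bit promoted type" row.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical stand-ins for illustration only; not libcudf's definitions.
enum class agg_kind { MIN, MAX, COUNT_VALID, COUNT_ALL, ROW_NUMBER, SUM, MEAN, COLLECT_LIST };

struct unsupported {};  // marker for "no valid output type"

template <typename T>
struct type_tag { using type = T; };

// Detects std::chrono::duration, used here as a stand-in for chrono column types.
template <typename T>
struct is_chrono : std::false_type {};
template <typename Rep, typename Period>
struct is_chrono<std::chrono::duration<Rep, Period>> : std::true_type {};

// Returns a tag carrying the output type for (Source, K), following the table.
template <typename Source, agg_kind K>
constexpr auto deduce_result()
{
  if constexpr (K == agg_kind::COUNT_VALID || K == agg_kind::COUNT_ALL ||
                K == agg_kind::ROW_NUMBER) {
    return type_tag<std::int32_t>{};  // counts are INT32 for every input type
  } else if constexpr (K == agg_kind::SUM) {
    if constexpr (is_chrono<Source>::value) {
      return type_tag<Source>{};  // chrono sums keep the input type
    } else if constexpr (std::is_integral_v<Source>) {
      // Assumption: unsigned inputs promote to uint64_t, signed ones to int64_t.
      return type_tag<std::conditional_t<std::is_signed_v<Source>, std::int64_t, std::uint64_t>>{};
    } else if constexpr (std::is_floating_point_v<Source>) {
      return type_tag<double>{};  // assumption: follows the "64-bit promoted" row
    } else {
      return type_tag<unsupported>{};
    }
  } else if constexpr (K == agg_kind::MEAN) {
    if constexpr (std::is_arithmetic_v<Source> || is_chrono<Source>::value) {
      return type_tag<double>{};  // FLOAT64 for numerics and chrono
    } else {
      return type_tag<unsupported>{};
    }
  } else if constexpr (K == agg_kind::COLLECT_LIST) {
    return type_tag<std::vector<Source>>{};  // list whose child type is the input type
  } else {
    return type_tag<Source>{};  // MIN, MAX: output matches input
  }
}

template <typename Source, agg_kind K>
using result_type_t = typename decltype(deduce_result<Source, K>())::type;

static_assert(std::is_same_v<result_type_t<std::int8_t, agg_kind::SUM>, std::int64_t>);
static_assert(std::is_same_v<result_type_t<std::string, agg_kind::MEAN>, unsupported>);
```

With a mapping like this, an empty result column can be constructed with result_type_t<Source, K> rather than Source, which is the mismatch this PR describes.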

This commit corrects the type of the output column that results from an empty input. It adds tests for all the combinations listed above.

Note: This is dependent on #8158, and should be merged after that is committed.

@mythrocks mythrocks self-assigned this May 18, 2021
@mythrocks mythrocks requested a review from a team as a code owner May 18, 2021 17:49
@mythrocks mythrocks marked this pull request as draft May 18, 2021 17:50
@mythrocks mythrocks added the bug (Something isn't working) and non-breaking (Non-breaking change) labels May 18, 2021
@mythrocks
Contributor Author

I've branched this off of nvdbaranec/rolling-window-refactor. I haven't rebased in the last 3 days. At the moment, only the last commit is really relevant to this PR.

@mythrocks mythrocks force-pushed the empty-inputs-rolling-window branch from 9be3c1d to c4645f4 Compare May 20, 2021 04:58
@github-actions github-actions bot added the CMake (CMake build issue) and libcudf (Affects libcudf (C++/CUDA) code.) labels May 20, 2021
@mythrocks
Contributor Author

Rebased for the latest changes on nvdbaranec/rolling-window-refactor.

I have run into an odd thing. Perhaps someone sharper than myself might spot it.
The code runs fine in its current form. I'm now trying to replace a two-step type-aggregation dispatch (here) with a single dispatch_type_and_aggregation() call. I find that the exception thrown from CUDF_FAIL() becomes uncatch()able via EXPECT_THROW(). I can't seem to figure out why.

@mythrocks
Contributor Author

> I can't seem to figure out why.

Ok, found it. It's the noexcept clause on dispatch_source::operator():
https://github.com/rapidsai/cudf/blob/branch-21.06/cpp/include/cudf/detail/aggregation/aggregation.hpp#L1046:

struct dispatch_source {
#pragma nv_exec_check_disable
  template <typename Element, typename F, typename... Ts>
  CUDA_HOST_DEVICE_CALLABLE decltype(auto) operator()(aggregation::Kind k,
                                                      F&& f,
                                                      Ts&&... args) const noexcept
  {
    return aggregation_dispatcher(
      k, dispatch_aggregation<Element>{}, std::forward<F>(f), std::forward<Ts>(args)...);
  }
};

Wouldn't this mean that we can't have CUDF_FAIL() paths in any functor dispatched via dispatch_type_and_aggregation()? Is that intentional?
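
For context on why the exception cannot be caught: when an exception escapes a function declared noexcept, the C++ runtime calls std::terminate instead of unwinding to an enclosing try/catch, so a test macro that wraps the call in try/catch never sees it. The following is a minimal standalone sketch of that behaviour, not the libcudf code. The names noexcept_dispatch, throwing_dispatch, unsupported and caught_through_throwing_dispatch are hypothetical, and std::logic_error stands in for whatever CUDF_FAIL() throws. The sketch deliberately never invokes the noexcept wrapper with the throwing functor, since doing so would abort the process.

```cpp
#include <cassert>
#include <stdexcept>
#include <utility>

// Hypothetical wrapper mirroring the shape of a dispatcher marked noexcept.
struct noexcept_dispatch {
  template <typename F>
  decltype(auto) operator()(F&& f) const noexcept
  {
    // If f throws, the exception hits the noexcept boundary and std::terminate runs.
    return std::forward<F>(f)();
  }
};

// Same wrapper without noexcept: exceptions propagate to the caller.
struct throwing_dispatch {
  template <typename F>
  decltype(auto) operator()(F&& f) const
  {
    return std::forward<F>(f)();
  }
};

// Stand-in for a functor path that ends in a failure macro.
inline int unsupported() { throw std::logic_error("Unsupported aggregation."); }

// Returns true if the exception reaches the caller's catch block.
inline bool caught_through_throwing_dispatch()
{
  try {
    throwing_dispatch{}(unsupported);
  } catch (std::logic_error const&) {
    return true;
  }
  return false;
}

// The noexcept operator reports what each wrapper promises at compile time.
static_assert(noexcept(noexcept_dispatch{}(unsupported)),
              "noexcept wrapper promises not to throw, even for a throwing functor");
static_assert(!noexcept(throwing_dispatch{}(unsupported)),
              "wrapper without noexcept lets exceptions propagate");
```

Calling caught_through_throwing_dispatch() returns true, whereas routing the same functor through noexcept_dispatch would terminate the program before any catch block ran.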

@mythrocks mythrocks force-pushed the empty-inputs-rolling-window branch from c4645f4 to 889b670 Compare May 20, 2021 20:27
@mythrocks
Contributor Author

> Wouldn't this mean that we can't have CUDF_FAIL() paths in any functor dispatched via dispatch_type_and_aggregation()? Is that intentional?

Based on the feedback from the team, I have changed dispatch_type_and_aggregation() to accommodate functors that throw.

@mythrocks mythrocks force-pushed the empty-inputs-rolling-window branch from 889b670 to 00002f7 Compare May 26, 2021 15:52
@mythrocks mythrocks marked this pull request as ready for review May 26, 2021 15:53
@mythrocks mythrocks force-pushed the empty-inputs-rolling-window branch from 00002f7 to 07ffa1a Compare May 26, 2021 16:04
@mythrocks mythrocks changed the title from "[WIP] Fix result column types for empty inputs to rolling window" to "Fix result column types for empty inputs to rolling window" May 26, 2021
@mythrocks mythrocks requested review from nvdbaranec and removed request for trxcllnt May 26, 2021 17:26
Contributor

@nvdbaranec nvdbaranec left a comment

Only suggestion I have is that it might make sense to make this function name a little more rolling-specific:

cudf::detail::empty_output(input, agg);

@ttnghia
Contributor

ttnghia commented May 27, 2021

@gpucibot merge

@ttnghia
Contributor

ttnghia commented May 27, 2021

Rerun tests.

@ttnghia
Contributor

ttnghia commented May 27, 2021

Rerun tests.

@mythrocks
Contributor Author

> Rerun tests.

It seems to be building, albeit slowly. :/

@ttnghia
Contributor

ttnghia commented May 27, 2021

> > Rerun tests.
>
> It seems to be building, albeit slowly. :/

Failed multiple times 😢

@ttnghia
Contributor

ttnghia commented May 27, 2021

It failed again, with the error "cpu4 is offline".
Rerun tests.

@harrism
Member

harrism commented May 27, 2021

@ttnghia can you please let us see the failed output of CI next time so we can engage devops? If you always re-run t.e.s.t.s. then we can't see the output.

@ttnghia
Contributor

ttnghia commented May 27, 2021

> @ttnghia can you please let us see the failed output of CI next time so we can engage devops? If you always re-run t.e.s.t.s. then we can't see the output.

Sure, below is some output from CI tests that failed:

20:49:55 Started by upstream project "rapidsai/gpuci-v0.20/cudf/prb/cudf-prb" build number 1699
20:49:55 originally caused by:
20:49:55  GitHub pull request #8274 of commit 287f216f09364e0c4724c33e95f0482a78cbe684, no merge conflicts.
20:49:55 Running as SYSTEM
20:49:55 [EnvInject] - Loading node environment variables.
20:49:55 [EnvInject] - Preparing an environment for the build.
20:49:55 [EnvInject] - Keeping Jenkins system variables.
20:49:55 [EnvInject] - Keeping Jenkins build variables.
20:49:55 [EnvInject] - Injecting as environment variables the properties content 
20:49:55 PARALLEL_LEVEL=12
20:49:55 BUILD_CUDF=1
20:49:55 BUILD_TYPE=cpu
20:49:55 BUILD_MODE=pull-request
20:49:55 PROJECT_FLASH=1
20:49:55 

20:49:55 [EnvInject] - Variables injected successfully.
20:49:55 [EnvInject] - Injecting contributions.
20:49:55 Building remotely on EC2 (aws-b) - runner-m5d2xl (i-0fee0b50d2641d571) (runner) in workspace /jenkins/workspace/rapidsai/gpuci-v0.20/cudf/prb/cudf-cpu-python-build_5
20:49:56 [WS-CLEANUP] Deleting project workspace...
20:49:56 [WS-CLEANUP] Deferred wipeout is disabled by the job configuration...
20:49:57 [WS-CLEANUP] Done
20:49:57 The recommended git tool is: NONE
20:49:57 No credentials specified
20:49:57 Warning: JENKINS-30600: special launcher com.gpuopenanalytics.jenkins.remotedocker.DockerLauncher@3f5b8e70; decorates RemoteLauncher[hudson.remoting.Channel@7c68849a:EC2 (aws-b) - runner-m5d2xl (i-0fee0b50d2641d571)] will be ignored (a typical symptom is the Git executable not being run inside a designated container)
20:49:57 Wiping out workspace first.
20:49:57 Cloning the remote Git repository
20:49:57 Cloning repository https://github.com/rapidsai/cudf.git
20:49:57  > git init /jenkins/workspace/rapidsai/gpuci-v0.20/cudf/prb/cudf-cpu-python-build_5 # timeout=10
20:49:57 Fetching upstream changes from https://github.com/rapidsai/cudf.git
20:49:57  > git --version # timeout=10
20:49:57  > git --version # 'git version 2.17.1'
20:49:57  > git fetch --tags --progress -- https://github.com/rapidsai/cudf.git +refs/heads/*:refs/remotes/origin/* # timeout=10
20:50:09  > git config remote.origin.url https://github.com/rapidsai/cudf.git # timeout=10
20:50:09  > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
20:50:10  > git config remote.origin.url https://github.com/rapidsai/cudf.git # timeout=10
20:50:10 Fetching upstream changes from https://github.com/rapidsai/cudf.git
20:50:10  > git fetch --tags --progress -- https://github.com/rapidsai/cudf.git +refs/pull/8274/*:refs/remotes/origin/pr/8274/* # timeout=10
20:50:10  > git rev-parse refs/remotes/origin/pr/8274/merge^{commit} # timeout=10
20:50:10 JENKINS-19022: warning: possible memory leak due to Git plugin usage; see: https://wiki.jenkins.io/display/JENKINS/Remove+Git+Plugin+BuildsByBranch+BuildData
20:50:10 Checking out Revision 8e80a3df28341a1bb1bc4e7ce868973d402a9109 (refs/remotes/origin/pr/8274/merge)
20:50:10  > git config core.sparsecheckout # timeout=10
20:50:10  > git checkout -f 8e80a3df28341a1bb1bc4e7ce868973d402a9109 # timeout=10
20:50:10 Commit message: "Merge 287f216f09364e0c4724c33e95f0482a78cbe684 into b9bc78ea0deff66820682e0f9fc67dab015f3e81"
20:50:10  > git rev-list --no-walk 8e80a3df28341a1bb1bc4e7ce868973d402a9109 # timeout=10
20:50:10 Triggering RAPIDS » gpuci-v0.20 » cudf » prb » cudf-cpu-python-build » 11.2,3.8
20:50:10 Triggering RAPIDS » gpuci-v0.20 » cudf » prb » cudf-cpu-python-build » 11.0,3.8
20:50:10 Triggering RAPIDS » gpuci-v0.20 » cudf » prb » cudf-cpu-python-build » 11.0,3.7
20:50:10 Triggering RAPIDS » gpuci-v0.20 » cudf » prb » cudf-cpu-python-build » 11.2,3.7
20:50:49 Configuration RAPIDS » gpuci-v0.20 » cudf » prb » cudf-cpu-python-build » 11.2,3.8 is still in the queue: ‘AGX-gpu-m01’ doesn’t have label ‘cpu4’
20:50:49 ‘AGX-gpu-m02’ doesn’t have label ‘cpu4’
20:50:49 ‘AGX-gpu-m03’ doesn’t have label ‘cpu4’
20:50:49 ‘AGX-gpu-m04’ doesn’t have label ‘cpu4’
20:50:49 ‘AGX-gpu-m05’ doesn’t have label ‘cpu4’
20:50:49 ‘AGX-gpu-m06’ doesn’t have label ‘cpu4’
20:50:49 ‘ASE-gpu-hv100-sm04’ doesn’t have label ‘cpu4’
20:50:49 ‘ASE-gpu-p100-sm10’ doesn’t have label ‘cpu4’
20:50:49 ‘ASE-gpu-t4-sm09’ doesn’t have label ‘cpu4’
20:50:49 ‘ASE-gpu-v100-sm01’ doesn’t have label ‘cpu4’
20:50:49 ‘ASE-gpu-v100-sm02’ doesn’t have label ‘cpu4’
20:50:49 ‘ASE-gpu-v100-sm03’ doesn’t have label ‘cpu4’
20:50:49 ‘ASE-gpu-v100-sm05’ doesn’t have label ‘cpu4’
20:50:49 ‘ASE-gpu-v100-sm06’ doesn’t have label ‘cpu4’
20:50:49 ‘ASE-gpu-v100-sm07’ doesn’t have label ‘cpu4’
20:50:49 ‘EC2 (aws-b) - cpu-m5dxl (i-015e1acf2c5ac4f02)’ doesn’t have label ‘cpu4’
20:50:49 ‘EC2 (aws-b) - cpu4-m5d4xl (i-021765b8128bb08e8)’ is offline
20:50:49 ‘EC2 (aws-b) - cpu4-m5d4xl (i-0d40845fcb3bce498)’ is offline
20:50:49 ‘EC2 (aws-b) - runner-m5d2xl (i-03187f0cbfd89785a)’ doesn’t have label ‘cpu4’
20:50:49 ‘EC2 (aws-b) - runner-m5d2xl (i-0fee0b50d2641d571)’ doesn’t have label ‘cpu4’
20:50:49 ‘Jenkins’ doesn’t have label ‘cpu4’
20:51:17 Configuration RAPIDS » gpuci-v0.20 » cudf » prb » cudf-cpu-python-build » 11.2,3.8 is still in the queue: ‘AGX-gpu-m01’ doesn’t have label ‘cpu4’
20:51:17 ‘AGX-gpu-m02’ doesn’t have label ‘cpu4’
20:51:17 ‘AGX-gpu-m03’ doesn’t have label ‘cpu4’
20:51:17 ‘AGX-gpu-m04’ doesn’t have label ‘cpu4’
20:51:17 ‘AGX-gpu-m05’ doesn’t have label ‘cpu4’
20:51:17 ‘AGX-gpu-m06’ doesn’t have label ‘cpu4’
20:51:17 ‘ASE-gpu-hv100-sm04’ doesn’t have label ‘cpu4’
20:51:17 ‘ASE-gpu-p100-sm10’ doesn’t have label ‘cpu4’
20:51:17 ‘ASE-gpu-t4-sm09’ doesn’t have label ‘cpu4’
20:51:17 ‘ASE-gpu-v100-sm01’ doesn’t have label ‘cpu4’
20:51:17 ‘ASE-gpu-v100-sm02’ doesn’t have label ‘cpu4’
20:51:17 ‘ASE-gpu-v100-sm03’ doesn’t have label ‘cpu4’
20:51:17 ‘ASE-gpu-v100-sm05’ doesn’t have label ‘cpu4’
20:51:17 ‘ASE-gpu-v100-sm06’ doesn’t have label ‘cpu4’
20:51:17 ‘ASE-gpu-v100-sm07’ doesn’t have label ‘cpu4’
20:51:17 ‘EC2 (aws-b) - cpu-m5dxl (i-015e1acf2c5ac4f02)’ doesn’t have label ‘cpu4’
20:51:17 ‘EC2 (aws-b) - cpu4-m5d4xl (i-021765b8128bb08e8)’ is offline
20:51:17 ‘EC2 (aws-b) - cpu4-m5d4xl (i-04828397df093dfd1)’ is offline
20:51:17 ‘EC2 (aws-b) - cpu4-m5d4xl (i-07f55373768df2081)’ is offline
20:51:17 ‘EC2 (aws-b) - cpu4-m5d4xl (i-0d40845fcb3bce498)’ is offline
20:51:17 ‘EC2 (aws-b) - cpu4-m5d4xl (i-0df51c1b82892794d)’ is offline
20:51:17 ‘EC2 (aws-b) - cpu4-m5d4xl (i-0e0332e717ef17fc7)’ is offline
20:51:17 ‘EC2 (aws-b) - runner-m5d2xl (i-03187f0cbfd89785a)’ doesn’t have label ‘cpu4’
20:51:17 ‘EC2 (aws-b) - runner-m5d2xl (i-0fee0b50d2641d571)’ doesn’t have label ‘cpu4’
20:51:17 ‘Jenkins’ doesn’t have label ‘cpu4’
20:51:18 Configuration RAPIDS » gpuci-v0.20 » cudf » prb » cudf-cpu-python-build » 11.2,3.8 is still in the queue: Executor slot already in use
20:51:18 ‘AGX-gpu-m01’ doesn’t have label ‘cpu4’
20:51:18 ‘AGX-gpu-m02’ doesn’t have label ‘cpu4’
20:51:18 ‘AGX-gpu-m03’ doesn’t have label ‘cpu4’
20:51:18 ‘AGX-gpu-m04’ doesn’t have label ‘cpu4’
20:51:18 ‘AGX-gpu-m05’ doesn’t have label ‘cpu4’
20:51:18 ‘AGX-gpu-m06’ doesn’t have label ‘cpu4’
20:51:18 ‘ASE-gpu-hv100-sm04’ doesn’t have label ‘cpu4’
20:51:18 ‘ASE-gpu-p100-sm10’ doesn’t have label ‘cpu4’
20:51:18 ‘ASE-gpu-t4-sm09’ doesn’t have label ‘cpu4’
20:51:18 ‘ASE-gpu-v100-sm01’ doesn’t have label ‘cpu4’
20:51:18 ‘ASE-gpu-v100-sm02’ doesn’t have label ‘cpu4’
20:51:18 ‘ASE-gpu-v100-sm03’ doesn’t have label ‘cpu4’
20:51:18 ‘ASE-gpu-v100-sm05’ doesn’t have label ‘cpu4’
20:51:18 ‘ASE-gpu-v100-sm06’ doesn’t have label ‘cpu4’
20:51:18 ‘ASE-gpu-v100-sm07’ doesn’t have label ‘cpu4’
20:51:18 ‘EC2 (aws-b) - cpu-m5dxl (i-015e1acf2c5ac4f02)’ doesn’t have label ‘cpu4’
20:51:18 ‘EC2 (aws-b) - cpu4-m5d4xl (i-021765b8128bb08e8)’ is offline
20:51:18 ‘EC2 (aws-b) - cpu4-m5d4xl (i-04828397df093dfd1)’ is offline
20:51:18 ‘EC2 (aws-b) - cpu4-m5d4xl (i-07f55373768df2081)’ is offline
20:51:18 ‘EC2 (aws-b) - cpu4-m5d4xl (i-0d40845fcb3bce498)’ is offline
20:51:18 ‘EC2 (aws-b) - cpu4-m5d4xl (i-0df51c1b82892794d)’ is offline
20:51:18 ‘EC2 (aws-b) - cpu4-m5d4xl (i-0e0332e717ef17fc7)’ is offline
20:51:18 ‘EC2 (aws-b) - runner-m5d2xl (i-03187f0cbfd89785a)’ doesn’t have label ‘cpu4’
20:51:18 ‘EC2 (aws-b) - runner-m5d2xl (i-0fee0b50d2641d571)’ doesn’t have label ‘cpu4’
20:51:18 ‘Jenkins’ doesn’t have label ‘cpu4’
20:51:32 Configuration RAPIDS » gpuci-v0.20 » cudf » prb » cudf-cpu-python-build » 11.2,3.8 is still in the queue: ‘AGX-gpu-m01’ doesn’t have label ‘cpu4’
20:51:32 ‘AGX-gpu-m02’ doesn’t have label ‘cpu4’
20:51:32 ‘AGX-gpu-m03’ doesn’t have label ‘cpu4’
20:51:32 ‘AGX-gpu-m04’ doesn’t have label ‘cpu4’
20:51:32 ‘AGX-gpu-m05’ doesn’t have label ‘cpu4’
20:51:32 ‘AGX-gpu-m06’ doesn’t have label ‘cpu4’
20:51:32 ‘ASE-gpu-hv100-sm04’ doesn’t have label ‘cpu4’
20:51:32 ‘ASE-gpu-p100-sm10’ doesn’t have label ‘cpu4’
20:51:32 ‘ASE-gpu-t4-sm09’ doesn’t have label ‘cpu4’
20:51:32 ‘ASE-gpu-v100-sm01’ doesn’t have label ‘cpu4’
20:51:32 ‘ASE-gpu-v100-sm02’ doesn’t have label ‘cpu4’
20:51:32 ‘ASE-gpu-v100-sm03’ doesn’t have label ‘cpu4’
20:51:32 ‘ASE-gpu-v100-sm05’ doesn’t have label ‘cpu4’
20:51:32 ‘ASE-gpu-v100-sm06’ doesn’t have label ‘cpu4’
20:51:32 ‘ASE-gpu-v100-sm07’ doesn’t have label ‘cpu4’
20:51:32 ‘EC2 (aws-b) - cpu-m5dxl (i-015e1acf2c5ac4f02)’ doesn’t have label ‘cpu4’
20:51:32 ‘EC2 (aws-b) - cpu4-m5d4xl (i-021765b8128bb08e8)’ is offline
20:51:32 ‘EC2 (aws-b) - cpu4-m5d4xl (i-04828397df093dfd1)’ is offline
20:51:32 ‘EC2 (aws-b) - cpu4-m5d4xl (i-07f55373768df2081)’ is offline
20:51:32 ‘EC2 (aws-b) - cpu4-m5d4xl (i-0d40845fcb3bce498)’ is offline
20:51:32 ‘EC2 (aws-b) - cpu4-m5d4xl (i-0df51c1b82892794d)’ is offline
20:51:32 ‘EC2 (aws-b) - cpu4-m5d4xl (i-0e0332e717ef17fc7)’ is offline
20:51:32 ‘EC2 (aws-b) - runner-m5d2xl (i-03187f0cbfd89785a)’ doesn’t have label ‘cpu4’
20:51:32 ‘EC2 (aws-b) - runner-m5d2xl (i-0fee0b50d2641d571)’ doesn’t have label ‘cpu4’
20:51:32 ‘Jenkins’ doesn’t have label ‘cpu4’
21:46:53 RAPIDS » gpuci-v0.20 » cudf » prb » cudf-cpu-python-build » 11.2,3.8 completed with result SUCCESS
21:46:53 RAPIDS » gpuci-v0.20 » cudf » prb » cudf-cpu-python-build » 11.0,3.8 completed with result SUCCESS
21:46:53 RAPIDS » gpuci-v0.20 » cudf » prb » cudf-cpu-python-build » 11.0,3.7 completed with result SUCCESS
21:46:53 RAPIDS » gpuci-v0.20 » cudf » prb » cudf-cpu-python-build » 11.2,3.7 completed with result FAILURE
21:46:53 Finished: FAILURE
20:54:10 Install cudf from s3
20:54:48 

20:54:48 gpuCI logger » [05/27/2021 02:54:48]
20:54:48 ┌─────────────────────────────────────────┐
20:54:48 |    Checking if ccache tarball exists    |
20:54:48 └─────────────────────────────────────────┘
20:54:48 

20:54:49 

20:54:49 An error occurred (404) when calling the HeadObject operation: Not Found
20:54:54 {
20:54:54     "AcceptRanges": "bytes",
20:54:54     "Expiration": "expiry-date=\"Tue, 27 Jul 2021 00:00:00 GMT\", rule-id=\"Delete data after 60 days\"",
20:54:54     "LastModified": "2021-05-27T02:18:06+00:00",
20:54:54     "ContentLength": 4590192640,
20:54:54     "ETag": "\"4f500f36f901c6cc27719f393dad4d78\"",
20:54:54     "ContentType": "binary/octet-stream",
20:54:54     "ServerSideEncryption": "AES256",
20:54:54     "Metadata": {}
20:54:54 }
20:54:54 Found existing ccache tarball: rapidsai/cudf/branch/branch-21.06/ccache/ccache-ubuntu16.04-cuda11.2-py3.7.tar
20:55:13 tar: Skipping to next header
20:55:15 tar: Exiting with failure status due to previous errors
20:55:15 Build step 'Execute shell' marked build as failure
20:55:45 [Set GitHub commit status (universal)] ERROR on repos [GHRepository@454a16ba[nodeId=MDEwOlJlcG9zaXRvcnk5MDUwNjkxOA==,description=cuDF - GPU DataFrame Library ,homepage=http://rapids.ai,name=cudf,fork=false,archived=false,size=86216,milestones={},language=C++,commits={},source=<null>,parent=<null>,isTemplate=<null>,url=https://api.github.com/repos/rapidsai/cudf,id=90506918,nodeId=<null>,createdAt=2017-05-07T03:43:37Z,updatedAt=2021-05-27T01:27:34Z]] (sha:287f216) with context:gpuCI/cudf/build/python/3.7/cuda/11.2
20:55:45 Setting commit status on GitHub for https://github.com/rapidsai/cudf/commit/287f216f09364e0c4724c33e95f0482a78cbe684
20:55:45 Finished: FAILURE

@ttnghia
Contributor

ttnghia commented May 27, 2021

I observed that many of the failed tests are due to timeouts: the CI machines are overloaded by too many tests and randomly stop responding, or respond very slowly, to new test requests. So the classic solution to this classic problem still applies here: throw more machines at it, please!

@ttnghia
Contributor

ttnghia commented May 27, 2021

Since code freeze is about to start and this is the only PR left in branch 21.06, I'm going to trigger rerun tests again anyway.

@JohnZed
Contributor

JohnZed commented May 27, 2021

The 2 failures now look more legit: RuntimeError: cuDF failure at: ../include/cudf/detail/aggregation/aggregation.hpp:1014: Unsupported aggregation.

@mythrocks
Contributor Author

aggregation.hpp was changed in this PR to disable MEAN aggregations on non-fixed-width types (like STRUCT, LIST, and STRING).
This might have triggered the failure. I'm investigating.

@ttnghia
Contributor

ttnghia commented May 27, 2021

Full error message:

12:41:16 =================================== FAILURES ===================================
12:41:16 ______________ test_rolling_dataframe_numba_udf_basic[True-data0] ______________
12:41:16 [gw0] linux -- Python 3.7.10 /opt/conda/envs/rapids/bin/python
12:41:16 
12:41:16 data = {'a': [], 'b': []}, center = True
12:41:16 
12:41:16     @pytest.mark.parametrize(
12:41:16         "data",
12:41:16         [
12:41:16             {"a": [], "b": []},
12:41:16             {"a": [1, 2, 3, 4], "b": [1, 2, 3, 4]},
12:41:16             {"a": [1, 2, 4, 9, 9, 4], "b": [1, 2, 4, 9, 9, 4]},
12:41:16             {
12:41:16                 "a": np.array([1, 2, 4, 9, 9, 4]),
12:41:16                 "b": np.array([1.5, 2.2, 2.2, 8.0, 9.1, 4.2]),
12:41:16             },
12:41:16         ],
12:41:16     )
12:41:16     @pytest.mark.parametrize("center", [True, False])
12:41:16     def test_rolling_dataframe_numba_udf_basic(data, center):
12:41:16     
12:41:16         pdf = pd.DataFrame(data)
12:41:16         gdf = cudf.from_pandas(pdf)
12:41:16     
12:41:16         def some_func(A):
12:41:16             b = 0
12:41:16             for a in A:
12:41:16                 b = b + a ** 2
12:41:16             return b / len(A)
12:41:16     
12:41:16         for window_size in range(1, len(data) + 1):
12:41:16             for min_periods in range(1, window_size + 1):
12:41:16                 assert_eq(
12:41:16                     pdf.rolling(window_size, min_periods, center)
12:41:16                     .apply(some_func)
12:41:16                     .fillna(-1),
12:41:16                     gdf.rolling(window_size, min_periods, center)
12:41:16 >                   .apply(some_func)
12:41:16                     .fillna(-1),
12:41:16                     check_dtype=False,
12:41:16                 )
12:41:16 
12:41:16 cudf/tests/test_rolling.py:270: 
12:41:16 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
12:41:16 cudf/core/window/rolling.py:280: in apply
12:41:16     return self._apply_agg(func)
12:41:16 cudf/core/window/rolling.py:231: in _apply_agg
12:41:16     return self._apply_agg_dataframe(self.obj, agg_name)
12:41:16 cudf/core/window/rolling.py:222: in _apply_agg_dataframe
12:41:16     result_col = self._apply_agg_series(df[col_name], agg_name)
12:41:16 cudf/core/window/rolling.py:205: in _apply_agg_series
12:41:16     agg_name,
12:41:16 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
12:41:16 
12:41:16 >   cpp_rolling_window(
12:41:16 E   RuntimeError: cuDF failure at: ../include/cudf/detail/aggregation/aggregation.hpp:1014: Unsupported aggregation.
12:41:16 
12:41:16 cudf/_lib/rolling.pyx:85: RuntimeError
12:41:16 _____________ test_rolling_dataframe_numba_udf_basic[False-data0] ______________
12:41:16 [gw0] linux -- Python 3.7.10 /opt/conda/envs/rapids/bin/python
12:41:16 
12:41:16 data = {'a': [], 'b': []}, center = False
12:41:16 
12:41:16     @pytest.mark.parametrize(
12:41:16         "data",
12:41:16         [
12:41:16             {"a": [], "b": []},
12:41:16             {"a": [1, 2, 3, 4], "b": [1, 2, 3, 4]},
12:41:16             {"a": [1, 2, 4, 9, 9, 4], "b": [1, 2, 4, 9, 9, 4]},
12:41:16             {
12:41:16                 "a": np.array([1, 2, 4, 9, 9, 4]),
12:41:16                 "b": np.array([1.5, 2.2, 2.2, 8.0, 9.1, 4.2]),
12:41:16             },
12:41:16         ],
12:41:16     )
12:41:16     @pytest.mark.parametrize("center", [True, False])
12:41:16     def test_rolling_dataframe_numba_udf_basic(data, center):
12:41:16     
12:41:16         pdf = pd.DataFrame(data)
12:41:16         gdf = cudf.from_pandas(pdf)
12:41:16     
12:41:16         def some_func(A):
12:41:16             b = 0
12:41:16             for a in A:
12:41:16                 b = b + a ** 2
12:41:16             return b / len(A)
12:41:16     
12:41:16         for window_size in range(1, len(data) + 1):
12:41:16             for min_periods in range(1, window_size + 1):
12:41:16                 assert_eq(
12:41:16                     pdf.rolling(window_size, min_periods, center)
12:41:16                     .apply(some_func)
12:41:16                     .fillna(-1),
12:41:16                     gdf.rolling(window_size, min_periods, center)
12:41:16 >                   .apply(some_func)
12:41:16                     .fillna(-1),
12:41:16                     check_dtype=False,
12:41:16                 )
12:41:16 
12:41:16 cudf/tests/test_rolling.py:270: 
12:41:16 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
12:41:16 cudf/core/window/rolling.py:280: in apply
12:41:16     return self._apply_agg(func)
12:41:16 cudf/core/window/rolling.py:231: in _apply_agg
12:41:16     return self._apply_agg_dataframe(self.obj, agg_name)
12:41:16 cudf/core/window/rolling.py:222: in _apply_agg_dataframe
12:41:16     result_col = self._apply_agg_series(df[col_name], agg_name)
12:41:16 cudf/core/window/rolling.py:205: in _apply_agg_series
12:41:16     agg_name,
12:41:16 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
12:41:16 
12:41:16 >   cpp_rolling_window(
12:41:16 E   RuntimeError: cuDF failure at: ../include/cudf/detail/aggregation/aggregation.hpp:1014: Unsupported aggregation.
12:41:16 
12:41:16 cudf/_lib/rolling.pyx:85: RuntimeError

@mythrocks mythrocks requested review from shwina and vyasr May 27, 2021 21:10
@mythrocks
Contributor Author

Please pardon the delay.

I've now added special handling for UDFs. Thanks, @shwina, @vyasr. Could I please bother you guys to take a look?

Note: I'm not currently honouring the return-type for the UDF, as reported by Numba. I'm returning empty_like(input) in those cases. This should preserve the prior behaviour (i.e. 0.19.2) and keep this change non-breaking. We can reexamine this corner-case in the next release.
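
A rough sketch of the special case described above, under stated assumptions: the names type_id, agg_kind, column, rolling_output_type and empty_rolling_output are hypothetical and heavily simplified, and only a few rows of the type table are modelled. It is not the code in this PR. The point it illustrates is that a UDF aggregation on an empty input keeps the input column's type (mirroring empty_like(input)) instead of falling into the "Unsupported aggregation" path.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical, heavily simplified stand-ins; not cudf's real types.
enum class type_id { INT8, INT32, FLOAT64, STRING };
enum class agg_kind { MIN, MAX, COUNT_VALID, COUNT_ALL, ROW_NUMBER, MEAN, UDF };

struct column {
  type_id type;
  int size;
};

// Picks the output type for an empty rolling-window result.
inline type_id rolling_output_type(type_id input, agg_kind kind)
{
  switch (kind) {
    case agg_kind::COUNT_VALID:
    case agg_kind::COUNT_ALL:
    case agg_kind::ROW_NUMBER: return type_id::INT32;
    case agg_kind::MEAN:
      // MEAN is not defined for non-fixed-width inputs such as strings.
      if (input == type_id::STRING) { throw std::logic_error("Unsupported aggregation."); }
      return type_id::FLOAT64;
    case agg_kind::UDF:
      // Special case: ignore the UDF's declared return type and keep the input
      // type, matching the earlier behaviour for empty inputs.
      return input;
    default: return input;  // MIN, MAX: same as input
  }
}

// Builds a zero-row column of the chosen output type.
inline column empty_rolling_output(column const& input, agg_kind kind)
{
  return column{rolling_output_type(input.type, kind), 0};
}
```

For example, empty_rolling_output(column{type_id::INT8, 0}, agg_kind::UDF) yields an empty INT8 column, while the same input with agg_kind::MEAN yields an empty FLOAT64 column.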

@vyasr
Contributor

vyasr commented May 27, 2021

I'll open a new issue to discuss future steps on that.

@vyasr
Contributor

vyasr commented May 27, 2021

@mythrocks I made a minor suggestion to clarify your comment, but otherwise this looks good to me. Your fix looks like it should address the immediate problem.

@vyasr vyasr self-requested a review May 27, 2021 21:35
@codecov

codecov bot commented May 27, 2021

Codecov Report

❗ No coverage uploaded for pull request base (branch-21.06@7231e3b). Click here to learn what that means.
The diff coverage is n/a.

@@               Coverage Diff               @@
##             branch-21.06    #8274   +/-   ##
===============================================
  Coverage                ?   82.83%           
===============================================
  Files                   ?      109           
  Lines                   ?    17896           
  Branches                ?        0           
===============================================
  Hits                    ?    14824           
  Misses                  ?     3072           
  Partials                ?        0           

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@rapids-bot rapids-bot bot merged commit 0eeb0c9 into rapidsai:branch-21.06 May 28, 2021
@mythrocks
Contributor Author

Argh. Finally!

@mythrocks
Contributor Author

Thanks for the reviews, all!

Labels
5 - Ready to Merge: Testing and reviews complete, ready to merge
bug: Something isn't working
CMake: CMake build issue
libcudf: Affects libcudf (C++/CUDA) code.
non-breaking: Non-breaking change
Development

Successfully merging this pull request may close these issues.

[BUG] collect_list (rolling_window, groupby) fails for empty input
6 participants