-
Notifications
You must be signed in to change notification settings - Fork 907
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[REVIEW] Copy on write implementation #11718
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall I'm pretty happy with this PR. The level of complexity is significantly lower than I had originally feared it would be, especially after looking through spilling. My main outstanding concerns are around 1) the different new properties introduced for the data an mask, and 2) the potential confusion around the different methods for accessing pointers, especially as it pertains to their different behaviors in base/COW/spilling. We also need to deal with the issue Lawrence raised about slices, but hopefully my suggestion there will be sufficient. I don't want the PR's scope to sprawl, but I also don't want to merge anything that adds even more complexity in those areas, especially if it does so in ways that make the code harder to understand even when COW or spilling are disabled. We may want to meet and just discuss minimal solutions for those problems.
Aside from the API-level implications of COW, I would also like for us to come out of this project with a consistent story on column mutability (or at least a plan for having one). @shwina and I have previously discussed that we would love for columns to be immutable, and that has been about 90% true for a long time. COW gets us closer, but the addition of the mutable_ptr
attribute for buffers is a clear indicator that we have one important exception (__setitem__
calls) that may always have to exist. I think finding a clear way not only to document this, but to design the API around this fact, should help reduce some of issue (2) above regarding the different accessors.
Regarding the docs guides, I'm happy to make PRs into yours with proposed changes to the docs so that you don't have to implement all of my suggestions (they're definitely quite a bit of work). I didn't make any new comments there in this review, but there are still some outstanding major requests from the first round (like structuring the developer docs around having already read the user guide) that I am happy to draft for review to help move things forward.
else: | ||
data = <void*><uintptr_t>(col.base_data.ptr) | ||
data = <void*><uintptr_t>(col.base_data.mutable_ptr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, my first review was me getting a lot of ideas out without really thinking through scoping, but I'm certainly in favor of limiting the scope of this PR and not trying to rewrite existing internals more than absolutely necessary. We can do that in follow-ups.
def _data_array_view(self) -> "cuda.devicearray.DeviceNDArray": | ||
""" | ||
Internal implementation for viewing the data as a device array object | ||
without triggering a deep-copy. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to limit the scope of this PR, but TBH I also don't want to merge it until we at least have a plan for how to handle this. The spilling code already makes buffer internals too hard to follow without some serious care. I think we need to put a hard cap on any changes that increase the complexity further without having a plan for how to fix it. Otherwise we will quickly find ourselves in a state where nobody will be able to debug this code without spending way too much time relearning each time (I still feel like I have to relearn the spilling internals every time I review a spilling PR, which I find quite concerning and I think we need to prioritize reducing that complexity and mitigating the fallout).
"Spilling is not supported when Copy-on-write is enabled. " | ||
"Please set `copy_on_write` to `False`" | ||
) | ||
except KeyError: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC the option being registered below should indeed guarantee that we won't get a KeyError
rhs, cudf.core.column.CategoricalColumn | ||
): | ||
assert_column_memory_eq(lhs.categories, rhs.categories) | ||
assert_column_memory_eq(lhs.codes, rhs.codes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was this change necessary when it wasn't before? What difference does COW make here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't COW related. This was necessary to specifically test the special handling in CategoricalColumn.copy
, where we will need to deep copy the categories especially. See test_index_copy_category
below that uses this now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah got it that makes sense.
@@ -186,14 +208,21 @@ def ptr(self) -> int: | |||
"""Device pointer to the start of the buffer.""" | |||
return self._ptr | |||
|
|||
@property | |||
def mutable_ptr(self) -> int: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Co-authored-by: Vyas Ramasubramani <[email protected]>
# Because hasattr will trigger invocation of | ||
# `__cuda_array_interface__` which could | ||
# be expensive in CopyOnWriteBuffer case. | ||
value_cai = inspect.getattr_static( | ||
value, | ||
"__cuda_array_interface__", | ||
None | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about something like this?
# Because hasattr will trigger invocation of | |
# `__cuda_array_interface__` which could | |
# be expensive in CopyOnWriteBuffer case. | |
value_cai = inspect.getattr_static( | |
value, | |
"__cuda_array_interface__", | |
None | |
) | |
has_cai = isinstance(value, Buffer) or hasattr(value, "__cuda_array_interface__") |
Or even better, we could create a small PR that implement an util function to access __cuda_array_interface__
, which we could use exclusively. This would also simplify SpillableBuffer
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm planning to add a utility to remove this code duplication across the repo.
@shwina @vyasr and I synced up offline and created and came up with the following todo items:
Far in the future items(next releases):
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm good with merging this to the copy-on-write
branch.
This simplifies the code and is a step towards `copy-on-write`: #11718 (comment) cc. @galipremsagar, @shwina, @vyasr Authors: - Mads R. B. Kristensen (https://github.com/madsbk) - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - GALI PREM SAGAR (https://github.com/galipremsagar) - Ashwin Srinath (https://github.com/shwina) - Lawrence Mitchell (https://github.com/wence-) URL: #12564
This PR: With the introduction of `copy-on-write`(#11718), and `spilling`, there are different ways each `Buffer` implementation expects the code to behave when a `.ptr` is accessed, i.e., in case of COW, a `ptr` can be accessed for read or write purposes, in case of spilling the ptr can be accessed with a spill lock and the buffer will be spillable after the spill lock goes out of scope during execution. For these reasons we introduced `ptr`, `mutable_ptr` and `data_array_view`, `_data_array_view`, `mask_array_view`, and `_mask_array_view`. With so many ways to access buffer ptr and array views, this has become quite difficult for devs to know when to use what unless you are fully familiar with the implementation details. It will also lead us to a lot of special case handling for `Buffer`, `SpillableBuffer`, and `CopyonWriteBuffer`. For this reason, we have decided to simplify fetching the pointer with a `get_ptr(mode="read"/"write")` API, fetching data & mask array views will also become methods that accept `mode` like `data_array_view(mode="read"/"write")` & `mask_array_view(mode="read"/"write")`. It is the expectation that when the caller passed "read", they don't tamper with the buffer or the memory it is pointing to. In the case of "write", they are good to mutate the memory it is pointing to. Note that even with `mode="read"/"write"` the caller should still, if appropriate, acquire a spill lock for the duration of the access. If this is not done, and the buffer is a `SpillableBuffer`, it will permanently be marked as unspillable. - [x] Introduces `get_ptr()` to replace `ptr` property. - [x] Replaces `data_array_view` & `mask_array_view` methods with `data_array_view(mode=r/w)` & `mask_array_view(mode=r/w)` Authors: - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - https://github.com/brandon-b-miller - Lawrence Mitchell (https://github.com/wence-) - Ashwin Srinath (https://github.com/shwina) URL: #12587
This PR contains changes from #11718 primarily that will enable Copy on write feature in cudf. This PR introduces `copy-on-write`. As the name suggests when `copy-on-write` is enabled, when there is a shallow copy of a column made, both the columns share the same memory and only when there is a write operation being performed on either the parent or any of it's copies a true copy will be triggered. Copy-on-write(`c-o-w`) can be enabled in two ways: 1. Setting `CUDF_COPY_ON_WRITE` environment variable to `1` will enable `c-o-w`, unsetting will disable `c-o-w`. 2. Setting `copy_on_write` option in `cudf` options by doing `cudf.set_option("copy_on_write", True)` to enable it and `cudf.set_option("copy_on_write", False)` to disable it. Note: Copy-on-write is not being enabled by default, it is being introduced as an opt-in. A valid performance comparison can be done only with `copy_on_write=OFF` + `.copy(deep=True)` vs `copy_on_write=ON` + `.copy(deep=False)`: ```python In [1]: import cudf In [2]: s = cudf.Series(range(0, 100000000)) # branch-23.02 : 1209MiB # This-PR : 1209MiB In [3]: s_copy = s.copy(deep=True) #branch-23.02 In [3]: s_copy = s.copy(deep=False) #This-PR # branch-23.02 : 1973MiB # This-PR : 1209MiB In [4]: s Out[4]: 0 0 1 1 2 2 3 3 4 4 ... 99999995 99999995 99999996 99999996 99999997 99999997 99999998 99999998 99999999 99999999 Length: 100000000, dtype: int64 In [5]: s_copy Out[5]: 0 0 1 1 2 2 3 3 4 4 ... 99999995 99999995 99999996 99999996 99999997 99999997 99999998 99999998 99999999 99999999 Length: 100000000, dtype: int64 In [6]: s[2] = 10001 # branch-23.02 : 3121MiB # This-PR : 3121MiB In [7]: s Out[7]: 0 0 1 1 2 10001 3 3 4 4 ... 99999995 99999995 99999996 99999996 99999997 99999997 99999998 99999998 99999999 99999999 Length: 100000000, dtype: int64 In [8]: s_copy Out[8]: 0 0 1 1 2 2 3 3 4 4 ... 99999995 99999995 99999996 99999996 99999997 99999997 99999998 99999998 99999999 99999999 Length: 100000000, dtype: int64 ``` Stats around the performance and memory gains : - [x] Memory usage of new copies will be 0 GPU memory additional overhead i.e., users will save 2x, 5x, 10x,...20x memory usage for making 2x, 5x, 10x,...20x deep copies respectively. **So, The more you copy the more you save 😉**(_as long as you don't write on all of them_) - [x] **copying times are now cut by 99%** for all dtypes when copy-on-write is enabled(`copy_on_write=OFF` + `.copy(deep=True)` vs `copy_on_write=ON` + `.copy(deep=False)`). ```python In [1]: import cudf In [2]: df = cudf.DataFrame({'a': range(0, 1000000)}) In [3]: df = cudf.DataFrame({'a': range(0, 100000000)}) In [4]: df['b'] = df.a.astype('str') # GPU memory usage # branch-23.02 : 2345MiB # This-PR : 2345MiB In [5]: df Out[5]: a b 0 0 0 1 1 1 2 2 2 3 3 3 4 4 4 ... ... ... 99999995 99999995 99999995 99999996 99999996 99999996 99999997 99999997 99999997 99999998 99999998 99999998 99999999 99999999 99999999 [100000000 rows x 2 columns] In [6]: def make_two_copies(df, deep): ...: return df.copy(deep=deep), df.copy(deep=deep) ...: In [7]: x, y = make_two_copies(df, deep=True) # branch-23.02 In [7]: x, y = make_two_copies(df, deep=False) # This PR # GPU memory usage # branch-23.02 : 6147MiB # This-PR : 2345MiB In [8]: %timeit make_two_copies(df, deep=True) # branch-23.02 In [8]: %timeit make_two_copies(df, deep=False) # This PR # Execution times # branch-23.02 : 135 ms ± 4.49 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) # This-PR : 100 µs ± 879 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each) ``` - [x] Even when `copy-on-write` is disabled, `string`, `list` & `struct` columns **deep copies are now 99% faster** ```python In [1]: import cudf In [2]: s = cudf.Series(range(0, 100000000), dtype='str') In [3]: %timeit s.copy(deep=True) # branch-23.02 : 28.3 ms ± 5.32 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) # This PR : 19.9 µs ± 93.5 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each) In [9]: s = cudf.Series([[1, 2], [2, 3], [3, 4], [4, 5], [6, 7]]* 10000000) In [10]: %timeit s.copy(deep=True) # branch-23.02 : 25.7 ms ± 5.99 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) # This-PR : 44.2 µs ± 1.61 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each) In [4]: df = cudf.DataFrame({'a': range(0, 100000000), 'b': range(0, 100000000)[::-1]}) In [5]: s = df.to_struct() In [6]: %timeit s.copy(deep=True) # branch-23.02 : 42.5 ms ± 3.5 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) # This-PR : 89.7 µs ± 1.68 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each) ``` - [x] Add pytests - [x] Docs page explaining copy on write and how to enable/disable it. Authors: - GALI PREM SAGAR (https://github.com/galipremsagar) Approvers: - Lawrence Mitchell (https://github.com/wence-) - Vyas Ramasubramani (https://github.com/vyasr) URL: #12619
This PR introduces
copy-on-write
. As the name suggests whencopy-on-write
is enabled, when there is a shallow copy of a column made, both the columns share the same memory and only when there is a write operation being performed on either the parent or any of it's copies a true copy will be triggered. Copy-on-write(c-o-w
) can be enabled in two ways:CUDF_COPY_ON_WRITE
environment variable to1
will enablec-o-w
, unsetting will disablec-o-w
.copy_on_write
option incudf
options by doingcudf.set_option("copy_on_write", True)
to enable it andcudf.set_option("copy_on_write", False)
to disable it.Note: Copy-on-write is not being enabled by default, it is being introduced as an opt-in.
A valid performance comparison can be done only with
copy_on_write=OFF
+.copy(deep=True)
vscopy_on_write=ON
+.copy(deep=False)
:Stats around the performance and memory gains :
copy_on_write=OFF
+.copy(deep=True)
vscopy_on_write=ON
+.copy(deep=False)
).copy-on-write
is disabled,string
,list
&struct
columns deep copies are now 99% faster