Start migrating I/O writers to pylibcudf (starting with JSON) #15952

lithomas1 · 2024-06-06T23:55:40Z

Description

Switches the JSON writer to use pylibcudf.
xref #15162

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.

python/cudf/cudf/_lib/json.pyx

python/cudf/cudf/_lib/pylibcudf/io/types.pyx

python/cudf/cudf/pylibcudf_tests/test_json.py

python/cudf/cudf/_lib/pylibcudf/io/types.pyx

wence-

Thanks, I think there are a few organisational tidy-ups we can do in the tests.

python/cudf/cudf/_lib/json.pyx

python/cudf/cudf/_lib/pylibcudf/io/types.pyx

python/cudf/cudf/pylibcudf_tests/conftest.py

wence- · 2024-06-13T12:04:19Z

python/cudf/cudf/pylibcudf_tests/conftest.py

+    # Cleanup after ourselves
+    # since the BytesIO and StringIO objects get cached by pytest
+    if isinstance(fp_or_buf, io.IOBase):
+        fp_or_buf.seek(0)
+        fp_or_buf.truncate(0)


Wah?! This is a "once-only" fixture, no? So it should be cleaned up automatically. What is this for?

I couldn't figure out how to reset the StringIO/BytesIO buffers 😅.
(Otherwise we have the writes from the previous calls to write_json in them.)

I tried setting the scope to "function", but that didn't seem to do anything.
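The reset in the diff above can be sketched in isolation. This is a minimal illustration using only the standard library; `reset_buffer` is a hypothetical helper name, not something taken from the PR. `truncate(0)` leaves the stream position where it was, which is why `seek(0)` is called first.

```python
import io


def reset_buffer(buf):
    """Empty an in-memory stream so the next write starts from scratch.

    Hypothetical helper for illustration; it mirrors the seek/truncate
    pair from the diff above.
    """
    if isinstance(buf, io.IOBase):
        buf.seek(0)      # move the position back to the start
        buf.truncate(0)  # drop everything from the start onward
    return buf


text_buf = io.StringIO()
text_buf.write('{"a": 1}\n')  # leftover output from an earlier write
reset_buffer(text_buf)
text_buf.write('{"b": 2}\n')

bytes_buf = io.BytesIO(b"stale bytes")
reset_buffer(bytes_buf)
```

After the reset, `text_buf` holds only the second record and `bytes_buf` is empty.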

It should be the case that pytest conses a new object every time this fixture is used, so it seems something is wrong :(

Can you distill this to a smaller test case, because this sounds absolutely like a pytest bug.

I think I get it now.
The issue is that the StringIO and the BytesIO are created outside of the fixture (since they are passed in the params list).

Every time pytest runs a test, it probably runs the fixture again, passing the same StringIO/BytesIO instance from the params list into the fixture. Since we return the params unchanged, we end up reusing the StringIO/BytesIO.

(This is why calling copy.copy on fp_or_buf inside of the fixture - which I just tested - would also fix the issue.)

I guess the question in that case would be whether pytest should make a copy of the input parameters
(probably not by default at the very least; I assume there are some classes that disallow copying).
Even if pytest doesn't change anything, this is still probably worth reporting so that they can add it to the docs or something.
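A minimal sketch of the behaviour described above, without pytest itself: the params list is built once, so every "test" that receives the same entry receives the same object. `PARAMS`, `fixture_like`, and `fake_test` are hypothetical stand-ins for the parametrized fixture and for a test that writes JSON; they are assumptions for illustration, not code from the PR.

```python
import io

# Built once, at import time, just like a list passed to
# pytest.fixture(params=[...]).
PARAMS = [io.StringIO(), io.BytesIO()]


def fixture_like(param):
    # Stand-in for a fixture that returns request.param unchanged.
    return param


def fake_test(sink):
    # Stand-in for a test that writes one record and reads the sink back.
    sink.write('{"a": 1}\n')
    return sink.getvalue()


first = fake_test(fixture_like(PARAMS[0]))
second = fake_test(fixture_like(PARAMS[0]))  # same object, so output accumulates
```

Here `second` contains the record twice, because both calls wrote into the one StringIO that was created when the list was built.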

Ah ok, yes, I was blind to this. This is just standard Python, because pytest only sees the actual objects that have already been constructed (it's the same trap as def foo(param=[])).

The right way to handle this is something like:

    @pytest.fixture(params=["string", "pathlib.Path", "BytesIO", "StringIO"])
    def source_or_sink(request, tmp_path):
        if request.param == "string":
            return f"{tmp_path}/{request.param}"
        elif request.param == "pathlib.Path":
            return pathlib.Path(tmp_path) / request.param
        elif request.param == "BytesIO":
            return io.BytesIO()
        elif request.param == "StringIO":
            return io.StringIO()
        assert False

Pushed something up like this.
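The `def foo(param=[])` comparison from the thread can be made concrete with a short sketch. The names `append_shared` and `append_fresh` are hypothetical; the point is only that a default value, like an entry in a params list, is evaluated once and then reused, whereas constructing the object inside the function gives each call its own.

```python
def append_shared(item, param=[]):
    # The default list is created once, when the function is defined,
    # and is shared by every call that omits `param`.
    param.append(item)
    return param


def append_fresh(item, param=None):
    # Building the list inside the body gives each call its own object,
    # the same idea as constructing the buffer inside the fixture.
    if param is None:
        param = []
    param.append(item)
    return param


shared_first = list(append_shared(1))   # snapshot of the first result
shared_second = list(append_shared(2))  # still holds the 1 from before
fresh_first = append_fresh(1)
fresh_second = append_fresh(2)
```

The second call to `append_shared` sees the item left behind by the first, while each `append_fresh` call starts empty.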

python/cudf/cudf/pylibcudf_tests/test_json.py

python/cudf/cudf/pylibcudf_tests/common/utils.py

python/cudf/cudf/pylibcudf_tests/test_json.py

Co-authored-by: Lawrence Mitchell <[email protected]>

lithomas1 · 2024-06-25T21:08:22Z

All skips in test_copying should now be removed. Thanks, Lawrence, for the help!

wence-

I don't have any more blocking comments I think. Thanks. I will let @vyasr decide if there are any more from their point of view.

My worry about needing to reset the StringIO bufs in the test fixture still applies. I absolutely don't think that should be necessary.

python/cudf/cudf/_lib/pylibcudf/io/json.pxd

python/cudf/cudf/_lib/pylibcudf/io/types.pyx

python/cudf/cudf/_lib/pylibcudf/io/json.pyx

python/cudf/cudf/_lib/pylibcudf/io/types.pyx

python/cudf/cudf/pylibcudf_tests/conftest.py

python/cudf/cudf/pylibcudf_tests/io/test_json.py

python/cudf/cudf/pylibcudf_tests/test_copying.py

Co-authored-by: Vyas Ramasubramani <[email protected]>

vyasr

Couple of small things left, but enough that I think it's worth fixing before we merge

python/cudf/cudf/_lib/pylibcudf/io/types.pyx

python/cudf/cudf/pylibcudf_tests/conftest.py

vyasr · 2024-06-28T22:43:45Z

python/cudf/cudf/pylibcudf_tests/conftest.py

+    This is the default fixture you should be using for testing
+    pylibcudf I/O writers.
+
+    Contains one of each category (e.g. int, bool, list, struct)
+    of dtypes.


Did this PR get opened? If not, can we either make it, or at least open an issue so that we don't forget?

lithomas1 · 2024-06-29T00:48:33Z

OK, switched it back to loops in conditionals.

vyasr

Couple of last suggestions, but I'm approving now so feel free to merge once you've addressed as you see fit.

python/cudf/cudf/_lib/pylibcudf/io/types.pyx

python/cudf/cudf/_lib/pylibcudf/io/types.pyx

lithomas1 · 2024-07-02T14:38:13Z

/merge

commit 1a4c2aa
Author: Thomas Li <[email protected]>
Date: Tue Jul 2 07:38:18 2024 -0700

Start migrating I/O writers to pylibcudf (starting with JSON) (rapidsai#15952)

Switches the JSON writer to use pylibcudf.

xref rapidsai#15162

Authors:
- Thomas Li (https://github.com/lithomas1)
- Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
- Lawrence Mitchell (https://github.com/wence-)
- Vyas Ramasubramani (https://github.com/vyasr)

URL: rapidsai#15952

commit a1447c7
Author: Robert Maynard <[email protected]>
Date: Tue Jul 2 09:34:29 2024 -0400

Promote has_nested_columns to cudf public API (rapidsai#16131)

The `has_nested_columns` functionality is used in numerous tests. It looks like it should be part of our stable public API.

Authors:
- Robert Maynard (https://github.com/robertmaynard)

Approvers:
- Muhammad Haseeb (https://github.com/mhaseeb123)
- Yunsong Wang (https://github.com/PointKernel)

URL: rapidsai#16131

commit a4be7bd
Author: Vyas Ramasubramani <[email protected]>
Date: Tue Jul 2 00:50:42 2024 -0700

Use Arrow C Data Interface functions for Python interop (rapidsai#15904)

This PR replaces the internals of `from_arrow` in pylibcudf with an implementation that uses the [Arrow C Data Interface](https://arrow.apache.org/docs/format/CDataInterface.html) using the [Python Capsule interface](https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html). This allows us to decouple our Python builds from using pyarrow Cython (partially, we haven't replaced the `to_arrow` conversion yet) and it will also allow us to support any other Python package that is a producer of the data interface.

To support the above functionality, the following additional changes were needed in this PR:
- Added the ability to produce cudf tables from `ArrowArrayStream` objects since that is what `pyarrow.Table` produces. This function is a simple wrapper around the existing `from_arrrow(ArrowArray)` API.
- Added support for the large strings type, for which support has improved throughout cudf since the `from_arrow_host` API was added and for which we now require a basic overload for tests to pass. I did not add corresponding support for `from_arrow_device` to avoid ballooning the scope of this PR, so that work can be done in a follow-up.
- Proper handling of `type_id::EMPTY` in concatenate because the most natural implementation of the ArrowArrayStream processing is to run `from_arrow` on each chunk and then concatenate the outputs, and from the Python side we can produce chunks of all null arrays from arrow.

Contributes to rapidsai#14926

Authors:
- Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
- Matthew Roeschke (https://github.com/mroeschke)
- Robert Maynard (https://github.com/robertmaynard)
- David Wendt (https://github.com/davidwendt)

URL: rapidsai#15904

commit 08552f8
Author: Lawrence Mitchell <[email protected]>
Date: Tue Jul 2 03:12:50 2024 +0100

Update cudf-polars for v1 release of polars (rapidsai#16149)

Minor changes to the IR, which we adapt to, and request `polars>=1.0` in dependencies.

Authors:
- Lawrence Mitchell (https://github.com/wence-)
- Thomas Li (https://github.com/lithomas1)
- Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
- Vyas Ramasubramani (https://github.com/vyasr)

URL: rapidsai#16149

commit 760c15c
Author: Kyle Edwards <[email protected]>
Date: Mon Jul 1 14:27:30 2024 -0400

Use verify-alpha-spec hook (rapidsai#16144)

With the deployment of rapids-build-backend, we need to make sure our dependencies have alpha specs.

Authors:
- Kyle Edwards (https://github.com/KyleFromNVIDIA)

Approvers:
- Bradley Dice (https://github.com/bdice)

URL: rapidsai#16144

commit b691b1c
Author: David Wendt <[email protected]>
Date: Mon Jul 1 14:25:11 2024 -0400

Add stream parameter to cudf::io::text::multibyte_split (rapidsai#16034)

Adds stream support the `cudf::io::text::multibyte_split` API. Also adds a stream test and deprecates an overloaded API.

Authors:
- David Wendt (https://github.com/davidwendt)

Approvers:
- Mark Harris (https://github.com/harrism)
- Karthikeyan (https://github.com/karthikeyann)

URL: rapidsai#16034

commit 5efd72f
Author: Matthew Roeschke <[email protected]>
Date: Mon Jul 1 07:37:12 2024 -1000

Ensure cudf objects can astype to any type when empty (rapidsai#16106)

pandas allows objects to `astype` to any other type if the object is empty. The PR mirrors that behavior for cudf.

This PR also more consistently uses `astype` instead of `as_*_column` and fixes a bug in `IntervalDtype.__eq__` discovered when writing a unit test for this bug.

Authors:
- Matthew Roeschke (https://github.com/mroeschke)
- GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
- GALI PREM SAGAR (https://github.com/galipremsagar)

URL: rapidsai#16106

commit 51fb873
Merge: 599ce95 e932fbd
Author: gpuCI <[email protected]>
Date: Mon Jul 1 12:17:38 2024 -0400

Merge pull request rapidsai#16145 from rapidsai/branch-24.06

Forward-merge branch-24.06 into branch-24.08

commit e932fbd
Author: Vyas Ramasubramani <[email protected]>
Date: Mon Jul 1 09:17:32 2024 -0700

Add patch for incorrect cuco noexcept clauses (rapidsai#16077)

[cuco previously marked a number of methods as noexcept that can in fact throw exceptions](NVIDIA/cuCollections#510). This causes problems for cudf functions that call these methods. The issue [was fixed in cuco upstream](NVIDIA/cuCollections#511), but we cannot easily update to the latest commit of cuco, especially in a patch fix for 24.06. This PR instead adds a rapids-cmake patch for the cuco clone to address this issue. The patch may be removed once we update to a commit of cuco that contains the necessary fix.

Resolves rapidsai#16059

commit 599ce95
Author: Lawrence Mitchell <[email protected]>
Date: Mon Jul 1 09:35:35 2024 +0100

Implement handlers for series literal in cudf-polars (rapidsai#16113)

A query plan can contain a "literal" polars Series. Often, for example, when calling a contains-like function. To translate these, introduce a new `LiteralColumn` node to capture the concept and add an evaluation rule (converting from arrow).

Since list-dtype Series need the same casting treatment as in dataframe scan case, factor the casting out into a utility, and take the opportunity to handled casting of nested lists correctly.

Authors:
- Lawrence Mitchell (https://github.com/wence-)

Approvers:
- Thomas Li (https://github.com/lithomas1)
- Vyas Ramasubramani (https://github.com/vyasr)

URL: rapidsai#16113

commit 3c3edfe
Author: Yunsong Wang <[email protected]>
Date: Fri Jun 28 13:58:22 2024 -0700

Update implementations to build with the latest cuco (rapidsai#15938)

This PR updates existing libcudf to accommodate a cuco breaking change introduced in NVIDIA/cuCollections#479. It helps avoid breaking cudf when bumping the cuco version in `rapids-cmake`.

Redundant equal/hash overloads will be removed once the version bump is done on the `rapids-cmake` end.

Authors:
- Yunsong Wang (https://github.com/PointKernel)

Approvers:
- David Wendt (https://github.com/davidwendt)
- Nghia Truong (https://github.com/ttnghia)

URL: rapidsai#15938

commit df88cf5
Author: Bradley Dice <[email protected]>
Date: Fri Jun 28 15:40:52 2024 -0500

Use size_t to allow large conditional joins (rapidsai#16127)

The conditional join kernels were using `cudf::size_type` where `std::size_t` was needed. This PR fixes that bug, which caused `cudaErrorIllegalAddress` as shown in rapidsai#16115.

This closes rapidsai#16115. I did not add tests because we typically do not test very large workloads. However, I committed the test and reverted it in this PR, so there is a record of my validation code.

Authors:
- Bradley Dice (https://github.com/bdice)

Approvers:
- Vyas Ramasubramani (https://github.com/vyasr)
- https://github.com/nvdbaranec
- Yunsong Wang (https://github.com/PointKernel)

URL: rapidsai#16127

commit fb12d98
Author: Robert Maynard <[email protected]>
Date: Fri Jun 28 12:14:58 2024 -0400

Installed cudf header use cudf::allocate_like (rapidsai#16087)

Remove usage of non public cudf::allocate_like from implementations in headers we install

Authors:
- Robert Maynard (https://github.com/robertmaynard)

Approvers:
- Yunsong Wang (https://github.com/PointKernel)
- Nghia Truong (https://github.com/ttnghia)

URL: rapidsai#16087

commit 78f4a8a
Author: Robert Maynard <[email protected]>
Date: Fri Jun 28 11:26:27 2024 -0400

Move common string utilities to public api (rapidsai#16070)

As part of rapidsai#15982 a subset of the strings utility functions have been identified as being worth expsosing as part of the cudf public API. The `create_string_vector_from_column`, `get_offset64_threshold`, and `is_large_strings_enabled` are now made part of the public `cudf::strings` api.

Authors:
- Robert Maynard (https://github.com/robertmaynard)

Approvers:
- MithunR (https://github.com/mythrocks)
- David Wendt (https://github.com/davidwendt)
- Jayjeet Chakraborty (https://github.com/JayjeetAtGithub)
- Lawrence Mitchell (https://github.com/wence-)

URL: rapidsai#16070

commit a4b951a
Author: nvdbaranec <[email protected]>
Date: Fri Jun 28 10:20:42 2024 -0500

Templatization of fixed-width parquet decoding kernels. (rapidsai#15911)

This PR merges all of the fixed-width parquet decoding kernels into a single templatized kernel that can be selectively instantiated with desired features (dictionary/no-dictionary, nested/non-nested, etc).

It also adds support for (non-list) nested columns in this path. So structs do not have to use the much slower general decode kernel any more.

A new benchmark was added specific to structs containing only fixed width columns. I added this because the performance improvement is fairly high (+20%) but we don't see it in the normal struct benchmarks because they include (and are dominated by) string decode times. The new benchmark shows:

Before this PR:
```
| data_type | io_type       | cardinality | run_length | bytes_per_second | peak_memory_usage | encoded_file_size |
|-----------|---------------|-------------|------------|------------------|-------------------|-------------------|
| STRUCT    | DEVICE_BUFFER | 0           | 1          | 21071216823      | 1.047 GiB         | 511.675 MiB       |
| STRUCT    | DEVICE_BUFFER | 1000        | 1          | 18974392387      | 821.312 MiB       | 128.884 MiB       |
| STRUCT    | DEVICE_BUFFER | 0           | 32         | 20429356824      | 621.787 MiB       | 28.141 MiB        |
| STRUCT    | DEVICE_BUFFER | 1000        | 32         | 20572327813      | 598.421 MiB       | 16.475 MiB        |
```

After this PR:
```
| data_type | io_type       | cardinality | run_length | bytes_per_second | peak_memory_usage | encoded_file_size |
|-----------|---------------|-------------|------------|------------------|-------------------|-------------------|
| STRUCT    | DEVICE_BUFFER | 0           | 1          | 25805996399      | 1.047 GiB         | 511.675 MiB       |
| STRUCT    | DEVICE_BUFFER | 1000        | 1          | 22422306660      | 821.312 MiB       | 128.884 MiB       |
| STRUCT    | DEVICE_BUFFER | 0           | 32         | 24460694014      | 621.787 MiB       | 28.141 MiB        |
| STRUCT    | DEVICE_BUFFER | 1000        | 32         | 24674861214      | 598.421 MiB       | 16.475 MiB        |
```

Split-page decoding for fixed-width types + structs are also going through this new path. New test added.

This brings us closer to eliminating the "general" kernel. The only things left that run through it are lists and booleans.

This is PR 1 of 2, with the followup moving a lot of code around. At this point, I think it makes sense to start consolidating our files a bit.

I also left some breadcrumbs (a few small commented out code blocks) in the core kernel `gpuDecodePageDataGeneric` for the next step of adding list support. They can be removed if people don't like them.

Authors:
- https://github.com/nvdbaranec

Approvers:
- Mike Wilson (https://github.com/hyperbolic2346)
- Vukasin Milovanovic (https://github.com/vuule)
- Muhammad Haseeb (https://github.com/mhaseeb123)

URL: rapidsai#15911

commit e434fdb
Author: David Wendt <[email protected]>
Date: Fri Jun 28 10:57:01 2024 -0400

Update libcudf compiler requirements in contributing doc (rapidsai#16103)

Updates the compiler requirements in the contributing document.

Authors:
- David Wendt (https://github.com/davidwendt)

Approvers:
- Bradley Dice (https://github.com/bdice)
- Karthikeyan (https://github.com/karthikeyann)

URL: rapidsai#16103

commit 565c0d1
Author: Matthew Murray <[email protected]>
Date: Fri Jun 28 10:16:55 2024 -0400

Migrate lists/contains to pylibcudf (rapidsai#15981)

Part of rapidsai#15162.

Authors:
- Matthew Murray (https://github.com/Matt711)

Approvers:
- Vyas Ramasubramani (https://github.com/vyasr)

URL: rapidsai#15981

commit c40e0cc
Author: Matthew Murray <[email protected]>
Date: Fri Jun 28 10:10:31 2024 -0400

Add support for proxy `np.flatiter` objects (rapidsai#16107)

Closes rapidsai#15388

Authors:
- Matthew Murray (https://github.com/Matt711)

Approvers:
- Matthew Roeschke (https://github.com/mroeschke)

URL: rapidsai#16107

commit 673d766
Author: Paul Mattione <[email protected]>
Date: Fri Jun 28 09:38:57 2024 -0400

Make binary operators work between fixed-point and floating args (rapidsai#16116)

Some of the binary operators in cuDF don't work between fixed_point and floating-point numbers after [this earlier PR](rapidsai#15438) removed the ability to construct and implicitly cast fixed_point numbers from floating point numbers. This PR restores that functionality by detecting and performing the necessary explicit casts, and adds tests for the supported operators.

Note that the `binary_op_has_common_type` code is modeled after `has_common_type` found in traits.hpp.

This closes [issue 16090](rapidsai#16090)

Authors:
- Paul Mattione (https://github.com/pmattione-nvidia)

Approvers:
- Jayjeet Chakraborty (https://github.com/JayjeetAtGithub)
- Karthikeyan (https://github.com/karthikeyann)

URL: rapidsai#16116

commit 224ac5b
Author: David Wendt <[email protected]>
Date: Fri Jun 28 09:26:37 2024 -0400

Add libcudf public/detail API pattern to developer guide (rapidsai#16086)

Adds specific description for the public API to detail API function pattern to the libcudf developer guide. Also fixes some formatting issues and broken link.

Authors:
- David Wendt (https://github.com/davidwendt)

Approvers:
- Shruti Shivakumar (https://github.com/shrshi)
- Karthikeyan (https://github.com/karthikeyann)

URL: rapidsai#16086

commit 2b547dc
Author: Matthew Roeschke <[email protected]>
Date: Fri Jun 28 03:11:01 2024 -1000

Add ensure_index to not unnecessarily shallow copy cudf.Index (rapidsai#16117)

The `cudf.Index` constructor will shallow copy a `cudf.Index` input. Sometimes, we just need to make sure an input is a `cudf.Index`, so created `ensure_index` (pandas has something similar) so we don't shallow copy these inputs unnecessarily

Authors:
- Matthew Roeschke (https://github.com/mroeschke)

Approvers:
- GALI PREM SAGAR (https://github.com/galipremsagar)

URL: rapidsai#16117

commit 57862a3
Author: Robert Maynard <[email protected]>
Date: Fri Jun 28 08:43:12 2024 -0400

stable_distinct public api now has a stream parameter (rapidsai#16068)

As part of rapidsai#15982 we determined that the cudf `stable_distinct` public API needs to be updated so that a user provided stream can be provided.

Authors:
- Robert Maynard (https://github.com/robertmaynard)

Approvers:
- Nghia Truong (https://github.com/ttnghia)
- Srinivas Yadav (https://github.com/srinivasyadav18)
- Bradley Dice (https://github.com/bdice)

URL: rapidsai#16068

commit 6b04fd3
Author: Mads R. B. Kristensen <[email protected]>
Date: Fri Jun 28 12:31:18 2024 +0200

Memory Profiling (rapidsai#15866)

Use [RMM's new memory profiler](rapidsai/rmm#1563) to profile all functions already decorated with `_cudf_nvtx_annotate`.

Example
```python
import cudf
from cudf.utils.performance_tracking import print_memory_report

cudf.set_option("memory_profiling", True)

df1 = cudf.DataFrame({"a": [1, 2, 3]})
df2 = cudf.DataFrame({"a": [2, 2, 3]})
df3 = df1.merge(df2)

print_memory_report()
```

Output:
```
Memory Profiling
================

Ordered by: memory_peak

ncalls memory_peak memory_total filename:lineno(function)
1 272 688 /home/mkristensen/apps/miniforge3/envs/rmm-cudf-0527/lib/python3.11/site-packages/cudf/core/dataframe.py:4072(DataFrame.merge)
2 32 64 /home/mkristensen/apps/miniforge3/envs/rmm-cudf-0527/lib/python3.11/site-packages/cudf/core/dataframe.py:1043(DataFrame._init_from_dict_like)
2 32 64 /home/mkristensen/apps/miniforge3/envs/rmm-cudf-0527/lib/python3.11/site-packages/cudf/core/dataframe.py:690(DataFrame.__init__)
2 0 0 /home/mkristensen/apps/miniforge3/envs/rmm-cudf-0527/lib/python3.11/site-packages/cudf/core/dataframe.py:1131(DataFrame._align_input_series_indices)
7 0 0 /home/mkristensen/apps/miniforge3/envs/rmm-cudf-0527/lib/python3.11/site-packages/cudf/core/index.py:214(RangeIndex.__init__)
6 0 0 /home/mkristensen/apps/miniforge3/envs/rmm-cudf-0527/lib/python3.11/site-packages/cudf/core/index.py:424(RangeIndex.__len__)
4 0 0 /home/mkristensen/apps/miniforge3/envs/rmm-cudf-0527/lib/python3.11/site-packages/cudf/core/frame.py:271(Frame.__len__)
2 0 0 /home/mkristensen/apps/miniforge3/envs/rmm-cudf-0527/lib/python3.11/site-packages/cudf/core/dataframe.py:3195(DataFrame._insert)
2 0 0 /home/mkristensen/apps/miniforge3/envs/rmm-cudf-0527/lib/python3.11/site-packages/cudf/core/index.py:270(RangeIndex.name)
2 0 0 /home/mkristensen/apps/miniforge3/envs/rmm-cudf-0527/lib/python3.11/site-packages/cudf/core/index.py:369(RangeIndex.copy)
5 0 0 /home/mkristensen/apps/miniforge3/envs/rmm-cudf-0527/lib/python3.11/site-packages/cudf/core/frame.py:134(Frame._from_data)
2 0 0 /home/mkristensen/apps/miniforge3/envs/rmm-cudf-0527/lib/python3.11/site-packages/cudf/core/frame.py:1039(Frame._copy_type_metadata)
2 0 0 /home/mkristensen/apps/miniforge3/envs/rmm-cudf-0527/lib/python3.11/site-packages/cudf/core/indexed_frame.py:315(IndexedFrame._from_columns_like_self)
```

Authors:
- Mads R. B. Kristensen (https://github.com/madsbk)

Approvers:
- Mark Harris (https://github.com/harrism)
- Lawrence Mitchell (https://github.com/wence-)
- Vyas Ramasubramani (https://github.com/vyasr)

URL: rapidsai#15866

commit e35da6b
Author: Lawrence Mitchell <[email protected]>
Date: Fri Jun 28 09:54:03 2024 +0100

Implement Ternary copy_if_else (rapidsai#16114)

A straightforward evaluation using `copy_if_else`.

Authors:
- Lawrence Mitchell (https://github.com/wence-)

Approvers:
- https://github.com/brandon-b-miller

URL: rapidsai#16114

commit c847b98
Author: Lawrence Mitchell <[email protected]>
Date: Thu Jun 27 21:33:29 2024 +0100

Finish implementation of cudf-polars boolean function handlers (rapidsai#16098)

The missing nodes were `is_in`, `not` (both easy), `is_finite` and `is_infinite` (obtained by translating to `contains` calls).

While here, remove the implementation of `IsBetween` and just translate to an expression with binary operations. This removes the need for special-casing scalar arguments to `IsBetween` and reproducing the code for binop evaluation.

Authors:
- Lawrence Mitchell (https://github.com/wence-)

Approvers:
- Vyas Ramasubramani (https://github.com/vyasr)

URL: rapidsai#16098

commit 2ed69c9
Author: Matthew Roeschke <[email protected]>
Date: Thu Jun 27 10:11:09 2024 -1000

Ensure MultiIndex.to_frame deep copies columns (rapidsai#16110)

Additionally, this allows simplification in `MultiIndex.__repr__` which avoids a shallow copy and also caught a bug where `NaT` was not supposed to be quoted

Authors:
- Matthew Roeschke (https://github.com/mroeschke)

Approvers:
- Vyas Ramasubramani (https://github.com/vyasr)

URL: rapidsai#16110

commit a71c249
Author: GALI PREM SAGAR <[email protected]>
Date: Thu Jun 27 14:29:31 2024 -0500

Fix dtype errors in `StringArrays` (rapidsai#16111)

This PR adds proxy classes for `ArrowStringArray` and `ArrowStringArrayNumpySemantics` that will increase the pandas test pass rate by 1%.

Authors:
- GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
- Matthew Roeschke (https://github.com/mroeschke)

URL: rapidsai#16111

Start migrating I/O writers to pylibcudf (starting with JSON)

591cdd2

lithomas1 added the labels "feature request" (New feature or request) and "non-breaking" (Non-breaking change) on Jun 6, 2024

github-actions bot added the labels "Python" (Affects Python cuDF API.), "CMake" (CMake build issue), and "pylibcudf" (Issues specific to the pylibcudf package) on Jun 6, 2024

mroeschke reviewed Jun 7, 2024

python/cudf/cudf/_lib/json.pyx (outdated, resolved)

lithomas1 added 5 commits June 7, 2024 16:02

update docs

15daaaa

Merge branch 'branch-24.08' of github.com:rapidsai/cudf into pylibcud…

72204f1

…f-io-writers

update and start writing tests

c24664c

Merge branch 'branch-24.08' of github.com:rapidsai/cudf into pylibcud…

8c88c7c

…f-io-writers

add some tests

2b3853f

lithomas1 marked this pull request as ready for review June 11, 2024 16:50

lithomas1 requested a review from a team as a code owner June 11, 2024 16:50

lithomas1 requested review from wence- and brandon-b-miller June 11, 2024 16:50

lithomas1 commented Jun 11, 2024

python/cudf/cudf/_lib/pylibcudf/io/types.pyx (resolved)

python/cudf/cudf/pylibcudf_tests/test_json.py (outdated, resolved)

lithomas1 added 2 commits June 11, 2024 17:00

Merge branch 'branch-24.08' of github.com:rapidsai/cudf into pylibcud…

cd6df5e

…f-io-writers

update

c54316e

lithomas1 mentioned this pull request Jun 11, 2024

[FEA] Implement all libcudf modules required by cuDF Python in pylibcudf #15162

Open

mroeschke reviewed Jun 12, 2024

python/cudf/cudf/_lib/pylibcudf/io/types.pyx (outdated, resolved)

mroeschke reviewed Jun 12, 2024

python/cudf/cudf/_lib/pylibcudf/io/types.pyx (outdated, resolved)

lithomas1 added 5 commits June 12, 2024 17:49

Merge branch 'branch-24.08' of github.com:rapidsai/cudf into pylibcud…

dc93356

…f-io-writers

address comments

8c4c4e4

Merge branch 'branch-24.08' of github.com:rapidsai/cudf into pylibcud…

63358e9

…f-io-writers

try something else

9150a6c

try fix

b1951d0

wence- requested changes Jun 13, 2024

lithomas1 added 2 commits June 13, 2024 18:20

update following feedback

1228569

cleanup tests

699efd3

lithomas1 added 2 commits June 24, 2024 17:19

Merge branch 'pylibcudf-io-writers' of github.com:lithomas1/cudf into…

53b821c

… pylibcudf-io-writers

Merge branch 'branch-24.08' of github.com:rapidsai/cudf into pylibcud…

186a2fb

…f-io-writers

lithomas1 requested a review from wence- June 24, 2024 17:44

lithomas1 and others added 3 commits June 25, 2024 19:12

Merge branch 'branch-24.08' of github.com:rapidsai/cudf into pylibcud…

9a6a896

…f-io-writers

Fix error in testing utils

0ed9af6

Co-authored-by: Lawrence Mitchell <[email protected]>

small test fixes

aff6178

cleanup utils

f7cd9e6

wence- approved these changes Jun 26, 2024

lithomas1 requested a review from vyasr June 26, 2024 23:47

lithomas1 added 3 commits June 27, 2024 18:25

Merge branch 'branch-24.08' of github.com:rapidsai/cudf into pylibcud…

c5a3fbe

…f-io-writers

clean source_or_sink

79c1dfd

Merge branch 'pylibcudf-io-writers' of github.com:lithomas1/cudf into…

8fc139f

… pylibcudf-io-writers

vyasr requested changes Jun 27, 2024

Address code review

e940e30

Co-authored-by: Vyas Ramasubramani <[email protected]>

lithomas1 requested a review from vyasr June 27, 2024 22:47

vyasr requested changes Jun 28, 2024

lithomas1 added 2 commits June 29, 2024 00:26

Merge branch 'branch-24.08' of github.com:rapidsai/cudf into pylibcud…

e57a677

…f-io-writers

simplify again

7806ce4

vyasr approved these changes Jul 1, 2024

python/cudf/cudf/_lib/pylibcudf/io/types.pyx (outdated, resolved)

python/cudf/cudf/_lib/pylibcudf/io/types.pyx (outdated, resolved)

lithomas1 added 2 commits July 1, 2024 17:31

Merge branch 'branch-24.08' of github.com:rapidsai/cudf into pylibcud…

25c25d4

…f-io-writers

address more comments

60287e1

vyasr reviewed Jul 2, 2024

python/cudf/cudf/_lib/pylibcudf/io/types.pyx (outdated, resolved)

vyasr and others added 2 commits July 1, 2024 22:24

Update python/cudf/cudf/_lib/pylibcudf/io/types.pyx

205c32c

Merge branch 'branch-24.08' into pylibcudf-io-writers

d325b64

rapids-bot bot merged commit 1a4c2aa into rapidsai:branch-24.08 Jul 2, 2024
78 checks passed

lithomas1 deleted the pylibcudf-io-writers branch July 2, 2024 14:38
