Releases · rapidsai/cudf

21 Nov 23:18

rapids-bot

v25.02.00a

01222c1

[NIGHTLY] v25.02.00 Pre-release



📖 Documentation

Document interpreter install command for cudf.pandas (#17358) @bdice

🛠️ Improvements

Expose stream-ordering to interop APIs (#17397) @shrshi
Forward-merge branch-24.12 to branch-25.02 (#17379) @bdice

Contributors

bdice and shrshi


29 Oct 18:24

raydouglass

v24.10.01

7b0adfa

v24.10.01 Latest


This hotfix corrected some Python packaging issues.

Full Changelog: v24.10.00...v24.10.01


09 Oct 15:25

raydouglass

v24.10.00

67193a8

v24.10.00

🚨 Breaking Changes

Whitespace normalization of nested column coerced as string column in JSONL inputs (#16759) @shrshi
Add libcudf wrappers around current_device_resource functions. (#16679) @harrism
Fix empty cluster handling in tdigest merge (#16675) @jihoonson
Remove java ColumnView.copyWithBooleanColumnAsValidity (#16660) @revans2
Support reading multiple PQ sources with mismatching nullability for columns (#16639) @mhaseeb123
Remove arrow_io_source (#16607) @vyasr
Remove legacy Arrow interop APIs (#16590) @vyasr
Remove NativeFile support from cudf Python (#16589) @vyasr
Revert "Make proxy NumPy arrays pass isinstance check in cudf.pandas" (#16586) @Matt711
Align public utility function signatures with pandas 2.x (#16565) @mroeschke
Disallow cudf.Index accepting column in favor of ._from_column (#16549) @mroeschke
Refactor dictionary encoding in PQ writer to migrate to the new cuco::static_map (#16541) @mhaseeb123
Change IPv4 convert APIs to support UINT32 instead of INT64 (#16489) @davidwendt
enable list to be forced as string in JSON reader. (#16472) @karthikeyann
Disallow cudf.Series to accept column in favor of ._from_column (#16454) @mroeschke
Align groupby APIs with pandas 2.x (#16403) @mroeschke
Align misc DataFrame and MultiIndex methods with pandas 2.x (#16402) @mroeschke
Align Index APIs with pandas 2.x (#16361) @mroeschke
Add stream param to stream compaction APIs (#16295) @JayjeetAtGithub
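
A note on the IPv4 change listed above (#16489): an IPv4 address is a 32-bit value, so every address fits in an unsigned 32-bit integer, while many exceed the signed 32-bit range. The sketch below illustrates this with Python's standard `ipaddress` module only. It does not call cudf or libcudf, and the helper names `ipv4_to_int` and `int_to_ipv4` are hypothetical, chosen for this illustration.

```python
import ipaddress

# Range limits used for comparison (plain integers, not a cudf API).
UINT32_MAX = 2**32 - 1
INT32_MAX = 2**31 - 1


def ipv4_to_int(address: str) -> int:
    """Return the unsigned 32-bit integer value of a dotted-quad IPv4 string."""
    return int(ipaddress.IPv4Address(address))


def int_to_ipv4(value: int) -> str:
    """Return the dotted-quad string for an unsigned 32-bit integer."""
    return str(ipaddress.IPv4Address(value))


value = ipv4_to_int("192.168.0.1")
print(value)                 # 3232235521
print(value > INT32_MAX)     # True: does not fit in a signed 32-bit integer
print(value <= UINT32_MAX)   # True: fits in an unsigned 32-bit integer
print(int_to_ipv4(value))    # 192.168.0.1
```

Code that stored converted addresses in 64-bit integer columns may need to adjust the expected dtype after this change.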

🐛 Bug Fixes

Add license to the pylibcudf wheel (#16976) @raydouglass
Parse newline as whitespace character while tokenizing JSONL inputs with non-newline delimiter (#16950) @shrshi
Add dask-cudf workaround for missing rename_axis support in cudf (#16899) @rjzamora
Update oldest deps for pyarrow & numpy (#16883) @galipremsagar
Update labeler for pylibcudf (#16868) @vyasr
Revert "Refactor mixed_semi_join using cuco::static_set" (#16855) @mhaseeb123
Fix metadata after implicit array conversion from Dask cuDF (#16842) @rjzamora
Add cudf.pandas dependencies.yaml to update-version.sh (#16840) @raydouglass
Use cupy 12.2.0 as oldest dependency pinning on CUDA 12 ARM (#16808) @bdice
Revert "Fix empty cluster handling in tdigest merge (#16675)" (#16800) @jihoonson
Intentionally leak thread_local CUDA resources to avoid crash (part 1) (#16787) @kingcrimsontianyu
Fix cov/corr bug in dask-cudf (#16786) @rjzamora
Fix slice_strings wide strings logic with multi-byte characters (#16777) @davidwendt
Fix nvbench output for sha512 (#16773) @davidwendt
Allow read_csv(header=None) to return int column labels in mode.pandas_compatible (#16769) @mroeschke
Whitespace normalization of nested column coerced as string column in JSONL inputs (#16759) @shrshi
Fix DataFrame.drop(columns=cudf.Series/Index, axis=1) (#16712) @mroeschke
Use merge base when calculating changed files (#16709) @KyleFromNVIDIA
Ensure we pass the has_nulls tparam to mixed_join kernels (#16708) @abellina
Add boost-devel to Java CI Docker image (#16707) @jlowe
[BUG] Add gpu node type to cudf-pandas 3rd-party integration nightly CI job (#16704) @Matt711
Fix typo in column_factories.hpp comment from 'depth 1' to 'depth 2' (#16700) @a-hirota
Fix Series.to_frame(name=None) setting a None name (#16698) @mroeschke
Disable gtests/ERROR_TEST during compute-sanitizer memcheck test (#16691) @davidwendt
Enable batched multi-source reading of JSONL files with large records (#16687) @shrshi
Handle ordered parameter in CategoricalIndex.__repr__ (#16683) @galipremsagar
Fix loc/iloc.setitem[:, loc] with non cupy types (#16677) @mroeschke
Fix empty cluster handling in tdigest merge (#16675) @jihoonson
Fix cudf::rank not getting enough params (#16666) @JayjeetAtGithub
Fix slowdown in CategoricalIndex.__repr__ (#16665) @galipremsagar
Remove java ColumnView.copyWithBooleanColumnAsValidity (#16660) @revans2
Fix slowdown in DataFrame repr in jupyter notebook (#16656) @galipremsagar
Preserve Series name in duplicated method. (#16655) @bdice
Fix interval_range right child non-zero offset (#16651) @mroeschke
fix libcudf wheel publishing, make package-type explicit in wheel publishing (#16650) @jameslamb
Revert "Hide all gtest symbols in cudftestutil (#16546)" (#16644) @robertmaynard
Fix integer overflow in indexalator pointer logic (#16643) @davidwendt
Allow for binops between two differently sized DecimalDtypes (#16638) @mroeschke
Move pragma once in rolling/jit/operation.hpp. (#16636) @bdice
Fix overflow bug in low-memory JSON reader (#16632) @shrshi
Add the missing num_aggregations axis for groupby_max_cardinality (#16630) @PointKernel
Fix strings::detail::copy_range when target contains nulls (#16626) @davidwendt
Fix function parameters with common dependency modified during their evaluation (#16620) @ttnghia
bug-fix: Don't enable the CUDA language if testing was requested when finding cudf (#16615) @cryos
bug-fix: cudf/io/json.hpp use after move (#16609) @NicolasDenoyelle
Remove CUDA whole compilation ODR violations (#16603) @robertmaynard
MAINT: Adapt to numpy hiding flagsobject away (#16593) @seberg
Revert "Make proxy NumPy arrays pass isinstance check in cudf.pandas" (#16586) @Matt711
Switch python version to 3.10 in cudf.pandas pandas test scripts (#16559) @galipremsagar
Hide all gtest symbols in cudftestutil (#16546) @robertmaynard
Update the java code to properly deal with lists being returned as strings (#16536) @revans2
Register read_parquet and read_csv with dask-expr (#16535) @rjzamora
Change cudf::empty_like to not include offsets for empty strings columns (#16529) @davidwendt
Fix DataFrame reductions with median returning scalar instead of Series (#16527) @mroeschke
Allow DataFrame.sort_values(by=) to select an index level (#16519) @mroeschke
Fix date_range(start, end, freq) when end-start is divisible by freq (#16516) @mroeschke
Preserve array name in MultiIndex.from_arrays (#16515) @mroeschke
Disallow indexing by selecting duplicate labels (#16514) @mroeschke
Fix .replace(Index, Index) raising a TypeError (#16513) @mroeschke
Check index bounds in compact protocol reader. (#16493) @bdice
Fix build failures with GCC 13 (#16488) @PointKernel
Fix all-empty input column for strings split APIs (#16466) @davidwendt
Fix segmented-sort overlapped input/output indices (#16463) @davidwendt
Fix merge conflict for auto merge 16447 (#16449) @davidwendt

📖 Documentation

Fix links in Dask cuDF documentation (#16929) @rjzamora
Improve aggregation documentation (#16822) @PointKernel
Add best practices page to Dask cuDF docs (#16821) @rjzamora
[DOC] Update Pylibcudf doc strings (#16810) @Matt711
Recommending miniforge for conda install (#16782) @mmccarty
Add labeling pylibcudf doc pages (#16779) @mroeschke
Migrate dask-cudf README improvements to dask-cudf sphinx docs (#16765) @rjzamora
[DOC] Remove out of date section from cudf.pandas docs (#16697) @Matt711
Add performance tips to cudf.pandas FAQ. (#16693) @bdice
Update documentation for Dask cuDF (#16671) @rjzamora
Add missing pylibcudf strings docs (#16471) @brandon-b-miller
DOC: Refresh pylibcudf guide (#15856) @lithomas1

🚀 New Features

Build cudf-polars with build.sh (#16898) @brandon-b-miller
Add polars to "all" dependency list. (#16875) @bdice
nvCOMP GZIP integration (#16770) @vuule
[FEA] Add support for cudf.NamedAgg (#16744) @Matt711
Add experimental filesystem="arrow" support in dask_cudf.read_parquet (#16684) @rjzamora
Relax Arrow pin (#16681) @vyasr
Add libcudf wrappers around current_device_resource functions. (#16679) @harrism
Move NDS-H examples into benchmarks (#16663) @JayjeetAtGithub
[FEA] Add third-party library integration testing of cudf.pandas to cudf (#16645) @Matt711
Make isinstance check pass for proxy ndarrays (#16601) @Matt711
[FEA] Add an environment variable to fail on fallback in cudf.pandas (#16562) @Matt711
[FEA] Add support for cudf.unique (#16554) @Matt711
[FEA] Support named aggregations in df.groupby().agg() (#16528) @Matt711
Change IPv4 convert APIs to support UINT32 instead of INT64 (#16489) @davidwendt
enable list to be forced as string in JSON reader. (#16472) @karthikeyann
Remove cuDF dependency from pylibcudf column from_device tests (#16441) @brandon-b-miller
Enable cudf.pandas REPL and -c command support (#16428) @bdice
Setup pylibcudf package (#16299) @lithomas1
Add a libcudf/thrust-based TPC-H derived datagen (#16294) @JayjeetAtGithub
Make proxy NumPy arrays pass isinstance check in cudf.pandas (#16286) @Matt711
Add skiprows and nrows to parquet reader (#16214) @lithomas1
Upgrade to nvcomp 4.0.1 (#16076) @vuule
Migrate ORC reader to pylibcudf (#16042) @lithomas1
JSON reader validation of values (#15968) @karthikeyann
Implement exposed null mask APIs in pylibcudf (#15908) @charlesbluca
Word-based nvtext::minhash function (#15368) @davidwendt

🛠️ Improvements

Make tests deterministic (#16910) @galipremsagar
Update update-version.sh to use packaging lib (#16891) @AyodeAwe
Pin polars for 24.10 and update polars test suite xfail list (#16886) @wence-
Add in support for setting delim when parsing JSON through java (#16867) (#16880) @revans2
Remove unnecessary flag from build.sh (#16879) @vyasr
Ignore numba warning specific to ARM runners (#16872) @galipremsagar
Display deltas for cudf.pandas test summary (#16864) @galipremsagar
Switch to using native traceback (#16851) @galipremsagar
JSON tree algorithm code reorg (#16836) @karthikeyann
Add string.repeats API to pylibcudf (#16834) @mroeschke
Use CI workflow branch 'branch-24.10' again (#16832) @jameslamb
Rename the NDS-H benchmark binaries (#16831) @JayjeetAtGithub
Add string.findall APIs t...

Contributors

msarahan, cryos, and 40 other contributors


27 Sep 16:13

rapids-bot

v24.12.00a

83f9d2b

[NIGHTLY] v24.12.00 Pre-release



🚨 Breaking Changes

Fix reading Parquet string cols when nrows and input_pass_limit > 0 (#17321) @mhaseeb123
prefer wheel-provided libcudf.so in load_library(), use RTLD_LOCAL (#17316) @jameslamb
Deprecate single component extraction methods in libcudf (#17221) @Matt711
Move detail header floating_conversion.hpp to detail subdirectory (#17209) @davidwendt
Refactor Dask cuDF legacy code (#17205) @rjzamora
Make HostMemoryBuffer call into the DefaultHostMemoryAllocator (#17204) @revans2
Remove java reservation (#17189) @revans2
Separate evaluation logic from IR objects in cudf-polars (#17175) @rjzamora
Upgrade to polars 1.11 in cudf-polars (#17154) @wence-
Remove the additional host register calls initially intended for performance improvement on Grace Hopper (#17092) @kingcrimsontianyu
Correctly set is_device_accessible when creating host_spans from other container/span types (#17079) @vuule
Unify treatment of Expr and IR nodes in cudf-polars DSL (#17016) @wence-
Deprecate support for directly accessing logger (#16964) @vyasr
Made cudftestutil header-only and removed GTest dependency (#16839) @lamarrr

🐛 Bug Fixes

Ignore errors when testing glibc versions (#17389) @vyasr
Adapt to KvikIO API change in the compatibility mode (#17377) @kingcrimsontianyu
Support pivot with index or column arguments as lists (#17373) @mroeschke
Deselect failing polars tests (#17362) @pentschev
Fix integer overflow in compiled binaryop (#17354) @wence-
Update cmake to 3.28.6 in JNI Dockerfile (#17342) @jlowe
fix library-loading issues in editable installs (#17338) @jameslamb
Bug fix: restrict lines=True to JSON format in Kafka read_gdf method (#17333) @a-hirota
Fix various issues with replace API and add support in datetime and timedelta columns (#17331) @galipremsagar
Do not exclude nanoarrow and flatbuffers from installation if statically linked (#17322) @hyperbolic2346
Fix reading Parquet string cols when nrows and input_pass_limit > 0 (#17321) @mhaseeb123
Remove another reference to FindcuFile (#17315) @KyleFromNVIDIA
Fix reading of single-row unterminated CSV files (#17305) @vuule
Fixed lifetime issue in ast transform tests (#17292) @lamarrr
Switch to using TaskSpec (#17285) @galipremsagar
Fix data_type ctor call in JSON_TEST (#17273) @davidwendt
Expose delimiter character in JSON reader options to JSON reader APIs (#17266) @shrshi
Fix extract-datetime deprecation warning in ndsh benchmark (#17254) @davidwendt
Disallow cuda-python 12.6.1 and 11.8.4 (#17253) @bdice
Wrap custom iterator result (#17251) @galipremsagar
Fix binop with LHS numpy datetimelike scalar (#17226) @mroeschke
Fix Dataframe.__setitem__ slow-downs (#17222) @galipremsagar
Fix groupby.get_group with length-1 tuple with list-like grouper (#17216) @mroeschke
Fix discoverability of submodules inside pd.util (#17215) @galipremsagar
Fix Schema.Builder does not propagate precision value to Builder instance (#17214) @ttnghia
Mark column chunks in a PQ reader pass as large strings when the cumulative offsets exceeds the large strings threshold. (#17207) @mhaseeb123
[BUG] Replace repo_token with github_token in Auto Assign PR GHA (#17203) @Matt711
Remove unsanitized nulls from input strings columns in reduction gtests (#17202) @davidwendt
Fix to_parquet append behavior with global metadata file (#17198) @rjzamora
Check num_children() == 0 in Column.from_column_view (#17193) @cwharris
Fix host-to-device copy missing sync in strings/duration convert (#17149) @davidwendt
Add JNI Support for Multi-line Delimiters and Include Test (#17139) @SurajAralihalli
Ignore loud dask warnings about legacy dataframe implementation (#17137) @galipremsagar
Fix the GDS read/write segfault/bus error when the cuFile policy is set to GDS or ALWAYS (#17122) @kingcrimsontianyu
Fix DataFrame._from_arrays and introduce validations (#17112) @galipremsagar
[Bug] Fix Arrow-FS parquet reader for larger files (#17099) @rjzamora
Fix bug in recovering invalid lines in JSONL inputs (#17098) @shrshi
Reenable huge pages for arrow host copying (#17097) @vyasr
Correctly set is_device_accessible when creating host_spans from other container/span types (#17079) @vuule
Fix ORC reader when using device_read_async while the destination device buffers are not ready (#17074) @ttnghia
Fix regex handling of fixed quantifier with 0 range (#17067) @davidwendt
Limit the number of keys to calculate column sizes and page starts in PQ reader to 1B (#17059) @mhaseeb123
Adding assertion to check for regular JSON inputs of size greater than INT_MAX bytes (#17057) @shrshi
bug fix: use self.ck_consumer in poll method of kafka.py to align with __init__ (#17044) @a-hirota
Disable kvikio remote I/O to avoid openssl dependencies in JNI build (#17026) @pxLi
Fix host_span constructor to correctly copy is_device_accessible (#17020) @vuule
Add pinning for pyarrow in wheels (#17018) @vyasr
Use std::optional for host types (#17015) @robertmaynard
Fix write_json to handle empty string column (#16995) @karthikeyann
Restore export of nvcomp outside of wheel builds (#16988) @KyleFromNVIDIA
Allow melt(var_name=) to be a falsy label (#16981) @mroeschke
Fix astype from tz-aware type to tz-aware type (#16980) @mroeschke
Use libcudf wheel from PR rather than nightly for polars-polars CI test job (#16975) @brandon-b-miller
Fix order-preservation in pandas-compat unsorted groupby (#16942) @wence-
Fix cudf::strings::findall error with empty input (#16928) @davidwendt
Fix JsonLargeReaderTest.MultiBatch use of LIBCUDF_JSON_BATCH_SIZE env var (#16927) @davidwendt
Parse newline as whitespace character while tokenizing JSONL inputs with non-newline delimiter (#16923) @shrshi
Respect groupby.nunique(dropna=False) (#16921) @mroeschke
Update all rmm imports to use pylibrmm/librmm (#16913) @Matt711
Fix order-preservation in cudf-polars groupby (#16907) @wence-
Add a shortcut for when the input clusters are all empty for the tdigest merge (#16897) @jihoonson
Properly handle the mapped and registered regions in memory_mapped_source (#16865) @vuule
Fix performance regression for generate_character_ngrams (#16849) @davidwendt
Fix regex parsing logic handling of nested quantifiers (#16798) @davidwendt
Compute whole column variance using numerically stable approach (#16448) @wence-
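
A note on the variance fix listed above (#16448): the PR title does not say which algorithm is used, so the sketch below is only an illustration of what a numerically stable variance computation can look like. It uses Welford's single-pass update as a stand-in, in plain Python with no cudf dependency. The function name `stable_variance` and its `ddof` parameter are assumptions for this example.

```python
import math


def stable_variance(values, ddof=1):
    """Welford's single-pass variance (illustrative, not the libcudf code).

    Tracks the running mean and the sum of squared deviations from it,
    which avoids subtracting two large, nearly equal sums.
    """
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    if count - ddof <= 0:
        return math.nan  # not enough observations for this ddof
    return m2 / (count - ddof)


# Values with a large common offset, where the textbook
# sum(x*x) - sum(x)**2 / n formula tends to lose precision.
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]
print(stable_variance(data))  # 30.0
```

The deviations from the mean here are -6, -3, 3 and 6, so the sample variance is 90 / 3 = 30 regardless of the offset.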

📖 Documentation

Add documentation for low memory readers (#17314) @btepera
Fix the example in documentation for get_dremel_data() (#17242) @mhaseeb123
Fix some documentation rendering for pylibcudf (#17217) @mroeschke
Move detail header floating_conversion.hpp to detail subdirectory (#17209) @davidwendt
Add TokenizeVocabulary to api docs (#17208) @davidwendt
Add jaccard_index to generated cuDF docs (#17199) @davidwendt
[no ci] Add empty-columns section to the libcudf developer guide (#17183) @davidwendt
Add 2-cpp approvers text to contributing guide [no ci] (#17182) @davidwendt
Changing developer guide int_64_t to int64_t (#17130) @hyperbolic2346
docs: change 'CSV' to 'csv' in python/custreamz/README.md to match kafka.py (#17041) @a-hirota
[DOC] Document limitation using cudf.pandas proxy arrays (#16955) @Matt711
[DOC] Document environment variable for failing on fallback in cudf.pandas (#16932) @Matt711

🚀 New Features

Add version config (#17312) @vyasr
Java JNI for Multiple contains (#17281) @res-life
Add cudf::calendrical_month_sequence to pylibcudf (#17277) @Matt711
Raise errors on specific types of fallback in cudf.pandas (#17268) @Matt711
Add catboost to the third-party integration tests (#17267) @Matt711
Add type stubs for pylibcudf (#17258) @wence-
Use pylibcudf contiguous split APIs in cudf python (#17246) @Matt711
Upgrade nvcomp to 4.1.0.6 (#17201) @bdice
Added Arrow Interop Benchmarks (#17194) @lamarrr
Rewrite Java API Table.readJSON to return the output from libcudf read_json directly (#17180) @ttnghia
Support storing precision of decimal types in Schema class (#17176) @ttnghia
Migrate CSV writer to pylibcudf (#17163) @Matt711
Add compute_shared_memory_aggs used by shared memory groupby (#17162) @PointKernel
Added ast tree to simplify expression lifetime management (#17156) @lamarrr
Add compute_mapping_indices used by shared memory groupby (#17147) @PointKernel
Add remaining datetime APIs to pylibcudf (#17143) @Matt711
Added strings AST vs BINARY_OP benchmarks (#17128) @lamarrr
Use libcudf_exception_handler throughout pylibcudf.libcudf (#17109) @brandon-b-miller
Include timezone file path in error message (#17102) @bdice
Migrate NVText Byte Pair Encoding APIs to pylibcudf (#17101) @Matt711
Migrate NVText Tokenizing APIs to pylibcudf (#17100) @Matt711
Migrate NVtext subword tokenizing APIs to pylibcudf (#17096) @Matt711
Migrate NVText Stemming APIs to pylibcudf (#17085) @Matt711
Migrate NVText Replacing APIs to pylibcudf (#17084) @Matt711
Add IWYU to CI (#17078) @vyasr
cudf-polars string/numeric casting (#17076) @brandon-b-miller
Migrate NVText Normalizing APIs to Pylibcudf (#17072) @Matt711
Migrate remaining nvtext NGrams APIs to pylibcudf (#17070) @Matt711
Add profilers to CUDA 12 conda devcontainers (#17066) @vyasr
Add conda recipe for cudf-polars (#17037) @bdice
Implement batch construction for strings columns (#17035) @ttnghia
Add device aggregators used by shared memory groupby (#17031) @PointKernel
Add optional column_order in JSON reader (#17029) @karthikeyann
Migrate Min Hashing APIs to pylibcudf (#17021) @Matt711
Reorganize cudf_polars expression code (#17014) @brandon-b-miller
Migrate nvtext jacca...

Contributors

msarahan, robertmaynard, and 38 other contributors


17 Sep 00:25

raydouglass

v24.08.03

e479454

v24.08.03

🚨 Breaking Changes

Align Index init APIs with pandas 2.x (#16362) @mroeschke
Align Series APIs with pandas 2.x (#16333) @mroeschke
Add missing stream param to dictionary factory APIs (#16319) @JayjeetAtGithub
Deprecate dtype= parameter in reduction methods (#16313) @mroeschke
Remove squeeze argument from groupby (#16312) @mroeschke
Align more DataFrame APIs with pandas (#16310) @mroeschke
Remove mr param from write_csv and write_json (#16231) @JayjeetAtGithub
Report number of rows per file read by PQ reader when no row selection and fix segfault in chunked PQ reader when skip_rows > 0 (#16195) @mhaseeb123
Refactor from_arrow_device/host to use resource_ref (#16160) @harrism
Deprecate Arrow support in I/O (#16132) @lithomas1
Return FrozenList for Index.names (#16047) @galipremsagar
Add compile option to enable large strings support (#16037) @davidwendt
Hide visibility of non public symbols (#15982) @robertmaynard
Rename strings multiple target replace API (#15898) @davidwendt
Pinned vector factory that uses the global pool (#15895) @vuule
Apply clang-tidy autofixes (#15894) @vyasr
Support arrow:schema in Parquet writer to faithfully roundtrip duration types with Arrow (#15875) @mhaseeb123
Expose stream parameter to public rolling APIs (#15865) @srinivasyadav18
Fix large strings handling in nvtext::character_tokenize (#15829) @davidwendt
Remove legacy JSON reader and concurrent_unordered_map.cuh. (#15813) @bdice

🐛 Bug Fixes

Ensure managed memory is supported in cudf.pandas. (#16552) @bdice
Add flatbuffers to libcudf build (#16446) @galipremsagar
Fix parquet_field_list read_func lambda capture invalid this pointer (#16440) @davidwendt
Enable prefetching in cudf.pandas.install() (#16439) @bdice
Enable prefetching before runpy (#16427) @galipremsagar
Support thread-safe for prefetch_config::get and prefetch_config::set (#16425) @ttnghia
Fix a pandas-2.0 missing attribute error (#16416) @galipremsagar
[Bug] Remove loud NativeFile deprecation noise for read_parquet from S3 (#16415) @rjzamora
Fix nightly memcheck error for empty STREAM_INTEROP_TEST (#16406) @davidwendt
Gate ArrowStringArrayNumpySemantics cudf.pandas proxy behind version check (#16401) @mroeschke
Don't export bs_thread_pool (#16398) @KyleFromNVIDIA
Require fixed width types for casting in cudf-polars (#16381) @brandon-b-miller
Fix docstring of DataFrame.apply (#16351) @galipremsagar
Make bool raise for more cudf objects (#16311) @mroeschke
Rename .devcontainers for CUDA 12.5 (#16293) @jakirkham
Fix split_record for all empty strings column (#16291) @davidwendt
Fix logic in to_arrow for empty list column (#16279) @wence-
[BUG] Make name attr of Index fast slow attrs (#16270) @Matt711
Add custom name setter and getter for proxy objects in cudf.pandas (#16234) @Matt711
Fall back when casting a timestamp to numeric in cudf-polars (#16232) @brandon-b-miller
Disable large string support for Java build (#16216) @jlowe
Remove CCCL patch for PR 211. (#16207) @bdice
Add single offset to an empty ListArray in cudf::to_arrow (#16201) @davidwendt
Fix memory_usage when calculating nested list column (#16193) @mroeschke
Support at/iat indexers in cudf.pandas (#16177) @mroeschke
Fix unused-return-value debug build error in from_arrow_stream_test.cpp (#16168) @davidwendt
Fix cudf::strings::replace_multiple hang on empty target (#16167) @davidwendt
Refactor from_arrow_device/host to use resource_ref (#16160) @harrism
interpolate returns new column if no values are interpolated (#16158) @mroeschke
Use provided memory resource for allocating mixed join results. (#16153) @bdice
Run DFG after verify-alpha-spec (#16151) @KyleFromNVIDIA
Use size_t to allow large conditional joins (#16127) @bdice
Allow only scale=0 fixed-point values in fixed_width_column_wrapper (#16120) @davidwendt
Fix pylibcudf Table.num_rows for 0 columns case and add interop to docs (#16108) @lithomas1
Add support for proxy np.flatiter objects (#16107) @Matt711
Ensure cudf objects can astype to any type when empty (#16106) @mroeschke
Support pd.read_pickle and pd.to_pickle in cudf.pandas (#16105) @Matt711
Fix unnecessarily strict check in parquet chunked reader for choosing split locations. (#16099) @nvdbaranec
Fix is_monotonic_* APIs to include nan's (#16085) @galipremsagar
More safely parse CUDA versions when subprocess output is contaminated (#16067) @brandon-b-miller
fast_slow_proxy: Don't import assert_eq at top-level (#16063) @wence-
Prevent bad ColumnAccessor state after .sort_index(axis=1, ignore_index=True) (#16061) @mroeschke
Fix ArrowDeviceArray interface to pass address of event (#16058) @zeroshade
Fix a size overflow bug in hash groupby (#16053) @PointKernel
Fix atomic_ref scope when multiple blocks are updating the same output (#16051) @vuule
Fix initialization error in to_arrow for empty string views (#16033) @wence-
Fix the int32 overflow when computing page fragment sizes for large string columns (#16028) @mhaseeb123
Fix the pool size alignment issue (#16024) @PointKernel
Improve multibyte-split byte-range performance (#16019) @davidwendt
Fix target counting in strings char-parallel replace (#16017) @davidwendt
Support IntervalDtype in cudf.from_pandas (#16014) @mroeschke
Fix memory size in create_byte_range_infos_consecutive (#16012) @davidwendt
Hide visibility of non public symbols (#15982) @robertmaynard
Fix Cython typo preventing proper inheritance (#15978) @vyasr
Fix convert_dtypes with convert_integer=False/convert_floating=True (#15964) @mroeschke
Fix nunique for MultiIndex, DataFrame, and all NA case with dropna=False (#15962) @mroeschke
Explicitly build for all GPU architectures (#15959) @vyasr
Preserve column type and class information in more DataFrame operations (#15949) @mroeschke
Add array_interface to cudf.pandas numpy.ndarray proxy (#15936) @mroeschke
Allow tests to be built when stream util is disabled (#15933) @robertmaynard
Fix JSON multi-source reading when total source size exceeds INT_MAX bytes (#15930) @shrshi
Fix dask_cudf.read_parquet regression for legacy timestamp data (#15929) @rjzamora
Fix offsetalator when accessing over 268 million rows (#15921) @davidwendt
Fix debug assert in rowgroup_char_counts_kernel (#15902) @davidwendt
Fix categorical conversion from chunked arrow arrays (#15886) @vyasr
Handling for NaN and inf when converting floating point to fixed point types (#15885) @ttnghia
Manual merge of Branch 24.08 from 24.06 (#15869) @galipremsagar
Avoid unnecessary Index cast in IndexedFrame.index setter (#15843) @charlesbluca
Fix large strings handling in nvtext::character_tokenize (#15829) @davidwendt
Fix multi-replace target count logic for large strings (#15807) @davidwendt
Fix JSON parsing memory corruption - Fix Mixed types nested children removal (#15798) @karthikeyann
Allow anonymous user in devcontainer name. (#15784) @bdice
Add support for additional metaclasses of proxies and use for ExcelWriter (#15399) @vyasr

📖 Documentation

Add docstring for from_dataframe (#16260) @mroeschke
Update libcudf compiler requirements in contributing doc (#16103) @davidwendt
Add libcudf public/detail API pattern to developer guide (#16086) @davidwendt
Explain line profiler and how to know which functions are GPU-accelerated. (#16079) @bdice
cudf.pandas documentation improvement (#15948) @Matt711
Reland "Fix docs for IO readers and strings_convert (#15872)" (#15941) @lithomas1
Document how to use cudf.pandas in tandem with multiprocessing (#15940) @wence-
DOC: Add documentation for cudf.pandas in the Developer Guide (#15889) @Matt711
Improve options docs (#15888) @bdice
DOC: add linkcode to docs (#15860) @raybellwaves
DOC: use intersphinx mapping in pandas-compat ext (#15846) @raybellwaves
Fix inconsistent usage of 'results' and 'records' in read-json.md (#15766) @dagardner-nv
Update PandasCompat.py to resolve references (#15704) @raybellwaves

🚀 New Features

Creation of CI artifacts for cudf-polars wheels (#16680) @wence-
Warn on cuDF failure when POLARS_VERBOSE is true (#16308) @brandon-b-miller
Add drop_nulls in cudf-polars (#16290) @brandon-b-miller
[JNI] Add setKernelPinnedCopyThreshold and setPinnedAllocationThreshold (#16288) @abellina
Implement support for scan_ndjson in cudf-polars (#16263) @lithomas1
Publish cudf-polars nightlies (#16213) @lithomas1
Modify make_host_vector and make_device_uvector factories to optionally use pinned memory and kernel copy (#16206) @vuule
Migrate lists/set_operations to pylibcudf (#16190) @Matt711
Migrate lists/filling to pylibcudf (#16189) @Matt711
Fall back to CPU for unsupported libcudf binaryops in cudf-polars (#16188) @brandon-b-miller
Use resource_ref for upstream in stream_checking_resource_adaptor (#16187) @harrism
Migrate lists/modifying to pylibcudf (#16185) @Matt711
Migrate lists/filtering to pylibcudf (#16184) @Matt711
Migrate lists/sorting to pylibcudf (#16179) @Matt711
Add missing methods to lists/list_column_view.pxd in pylibcudf (#16175) @Matt711
Migrate pylibcudf lists gathering (#16170) @Matt711
Move kernel vis over to CUDF_HIDDEN (#16165) @robertmaynard
Add groupby_max multi-threaded benchmark (#16154) @srinivasyadav18
Promote has_nested_columns to cudf public API (#16131) @robertmaynard
Promote IO support queries to cudf API (#16125) @robertmaynard
cudf::merge public API now support passing a user stream (#16124) @robertmaynard
Add TPC-H inspired examples for Libcudf (#16088) @JayjeetAtGithub
Installed cudf header use cudf::allocate_like (#16087) @robertmaynard
cudf-polars string slicing (#16082) @brandon-b-miller
Migrate Parquet reader to pylibcudf (#16078) @lithomas1
Migrate lists/c...

Contributors

seberg, trxcllnt, and 38 other contributors


14 Aug 22:39

raydouglass

v24.08.02

e776742

v24.08.02

🚨 Breaking Changes

Align Index init APIs with pandas 2.x (#16362) @mroeschke
Align Series APIs with pandas 2.x (#16333) @mroeschke
Add missing stream param to dictionary factory APIs (#16319) @JayjeetAtGithub
Deprecate dtype= parameter in reduction methods (#16313) @mroeschke
Remove squeeze argument from groupby (#16312) @mroeschke
Align more DataFrame APIs with pandas (#16310) @mroeschke
Remove mr param from write_csv and write_json (#16231) @JayjeetAtGithub
Report number of rows per file read by PQ reader when no row selection and fix segfault in chunked PQ reader when skip_rows > 0 (#16195) @mhaseeb123
Refactor from_arrow_device/host to use resource_ref (#16160) @harrism
Deprecate Arrow support in I/O (#16132) @lithomas1
Return FrozenList for Index.names (#16047) @galipremsagar
Add compile option to enable large strings support (#16037) @davidwendt
Hide visibility of non public symbols (#15982) @robertmaynard
Rename strings multiple target replace API (#15898) @davidwendt
Pinned vector factory that uses the global pool (#15895) @vuule
Apply clang-tidy autofixes (#15894) @vyasr
Support arrow:schema in Parquet writer to faithfully roundtrip duration types with Arrow (#15875) @mhaseeb123
Expose stream parameter to public rolling APIs (#15865) @srinivasyadav18
Fix large strings handling in nvtext::character_tokenize (#15829) @davidwendt
Remove legacy JSON reader and concurrent_unordered_map.cuh. (#15813) @bdice

🐛 Bug Fixes

Ensure managed memory is supported in cudf.pandas. (#16552) @bdice
Add flatbuffers to libcudf build (#16446) @galipremsagar
Fix parquet_field_list read_func lambda capture invalid this pointer (#16440) @davidwendt
Enable prefetching in cudf.pandas.install() (#16439) @bdice
Enable prefetching before runpy (#16427) @galipremsagar
Support thread-safe for prefetch_config::get and prefetch_config::set (#16425) @ttnghia
Fix a pandas-2.0 missing attribute error (#16416) @galipremsagar
[Bug] Remove loud NativeFile deprecation noise for read_parquet from S3 (#16415) @rjzamora
Fix nightly memcheck error for empty STREAM_INTEROP_TEST (#16406) @davidwendt
Gate ArrowStringArrayNumpySemantics cudf.pandas proxy behind version check (#16401) @mroeschke
Don't export bs_thread_pool (#16398) @KyleFromNVIDIA
Require fixed width types for casting in cudf-polars (#16381) @brandon-b-miller
Fix docstring of DataFrame.apply (#16351) @galipremsagar
Make bool raise for more cudf objects (#16311) @mroeschke
Rename .devcontainers for CUDA 12.5 (#16293) @jakirkham
Fix split_record for all empty strings column (#16291) @davidwendt
Fix logic in to_arrow for empty list column (#16279) @wence-
[BUG] Make name attr of Index fast slow attrs (#16270) @Matt711
Add custom name setter and getter for proxy objects in cudf.pandas (#16234) @Matt711
Fall back when casting a timestamp to numeric in cudf-polars (#16232) @brandon-b-miller
Disable large string support for Java build (#16216) @jlowe
Remove CCCL patch for PR 211. (#16207) @bdice
Add single offset to an empty ListArray in cudf::to_arrow (#16201) @davidwendt
Fix memory_usage when calculating nested list column (#16193) @mroeschke
Support at/iat indexers in cudf.pandas (#16177) @mroeschke
Fix unused-return-value debug build error in from_arrow_stream_test.cpp (#16168) @davidwendt
Fix cudf::strings::replace_multiple hang on empty target (#16167) @davidwendt
Refactor from_arrow_device/host to use resource_ref (#16160) @harrism
interpolate returns new column if no values are interpolated (#16158) @mroeschke
Use provided memory resource for allocating mixed join results. (#16153) @bdice
Run DFG after verify-alpha-spec (#16151) @KyleFromNVIDIA
Use size_t to allow large conditional joins (#16127) @bdice
Allow only scale=0 fixed-point values in fixed_width_column_wrapper (#16120) @davidwendt
Fix pylibcudf Table.num_rows for 0 columns case and add interop to docs (#16108) @lithomas1
Add support for proxy np.flatiter objects (#16107) @Matt711
Ensure cudf objects can astype to any type when empty (#16106) @mroeschke
Support pd.read_pickle and pd.to_pickle in cudf.pandas (#16105) @Matt711
Fix unnecessarily strict check in parquet chunked reader for choosing split locations. (#16099) @nvdbaranec
Fix is_monotonic_* APIs to include nan's (#16085) @galipremsagar
More safely parse CUDA versions when subprocess output is contaminated (#16067) @brandon-b-miller
fast_slow_proxy: Don't import assert_eq at top-level (#16063) @wence-
Prevent bad ColumnAccessor state after .sort_index(axis=1, ignore_index=True) (#16061) @mroeschke
Fix ArrowDeviceArray interface to pass address of event (#16058) @zeroshade
Fix a size overflow bug in hash groupby (#16053) @PointKernel
Fix atomic_ref scope when multiple blocks are updating the same output (#16051) @vuule
Fix initialization error in to_arrow for empty string views (#16033) @wence-
Fix the int32 overflow when computing page fragment sizes for large string columns (#16028) @mhaseeb123
Fix the pool size alignment issue (#16024) @PointKernel
Improve multibyte-split byte-range performance (#16019) @davidwendt
Fix target counting in strings char-parallel replace (#16017) @davidwendt
Support IntervalDtype in cudf.from_pandas (#16014) @mroeschke
Fix memory size in create_byte_range_infos_consecutive (#16012) @davidwendt
Hide visibility of non public symbols (#15982) @robertmaynard
Fix Cython typo preventing proper inheritance (#15978) @vyasr
Fix convert_dtypes with convert_integer=False/convert_floating=True (#15964) @mroeschke
Fix nunique for MultiIndex, DataFrame, and all NA case with dropna=False (#15962) @mroeschke
Explicitly build for all GPU architectures (#15959) @vyasr
Preserve column type and class information in more DataFrame operations (#15949) @mroeschke
Add array_interface to cudf.pandas numpy.ndarray proxy (#15936) @mroeschke
Allow tests to be built when stream util is disabled (#15933) @robertmaynard
Fix JSON multi-source reading when total source size exceeds INT_MAX bytes (#15930) @shrshi
Fix dask_cudf.read_parquet regression for legacy timestamp data (#15929) @rjzamora
Fix offsetalator when accessing over 268 million rows (#15921) @davidwendt
Fix debug assert in rowgroup_char_counts_kernel (#15902) @davidwendt
Fix categorical conversion from chunked arrow arrays (#15886) @vyasr
Handling for NaN and inf when converting floating point to fixed point types (#15885) @ttnghia
Manual merge of Branch 24.08 from 24.06 (#15869) @galipremsagar
Avoid unnecessary Index cast in IndexedFrame.index setter (#15843) @charlesbluca
Fix large strings handling in nvtext::character_tokenize (#15829) @davidwendt
Fix multi-replace target count logic for large strings (#15807) @davidwendt
Fix JSON parsing memory corruption - Fix Mixed types nested children removal (#15798) @karthikeyann
Allow anonymous user in devcontainer name. (#15784) @bdice
Add support for additional metaclasses of proxies and use for ExcelWriter (#15399) @vyasr

📖 Documentation

Add docstring for from_dataframe (#16260) @mroeschke
Update libcudf compiler requirements in contributing doc (#16103) @davidwendt
Add libcudf public/detail API pattern to developer guide (#16086) @davidwendt
Explain line profiler and how to know which functions are GPU-accelerated. (#16079) @bdice
cudf.pandas documentation improvement (#15948) @Matt711
Reland "Fix docs for IO readers and strings_convert" (#15872) (#15941) @lithomas1
Document how to use cudf.pandas in tandem with multiprocessing (#15940) @wence-
DOC: Add documentation for cudf.pandas in the Developer Guide (#15889) @Matt711
Improve options docs (#15888) @bdice
DOC: add linkcode to docs (#15860) @raybellwaves
DOC: use intersphinx mapping in pandas-compat ext (#15846) @raybellwaves
Fix inconsistent usage of 'results' and 'records' in read-json.md (#15766) @dagardner-nv
Update PandasCompat.py to resolve references (#15704) @raybellwaves

🚀 New Features

Warn on cuDF failure when POLARS_VERBOSE is true (#16308) @brandon-b-miller
Add drop_nulls in cudf-polars (#16290) @brandon-b-miller
[JNI] Add setKernelPinnedCopyThreshold and setPinnedAllocationThreshold (#16288) @abellina
Implement support for scan_ndjson in cudf-polars (#16263) @lithomas1
Publish cudf-polars nightlies (#16213) @lithomas1
Modify make_host_vector and make_device_uvector factories to optionally use pinned memory and kernel copy (#16206) @vuule
Migrate lists/set_operations to pylibcudf (#16190) @Matt711
Migrate lists/filling to pylibcudf (#16189) @Matt711
Fall back to CPU for unsupported libcudf binaryops in cudf-polars (#16188) @brandon-b-miller
Use resource_ref for upstream in stream_checking_resource_adaptor (#16187) @harrism
Migrate lists/modifying to pylibcudf (#16185) @Matt711
Migrate lists/filtering to pylibcudf (#16184) @Matt711
Migrate lists/sorting to pylibcudf (#16179) @Matt711
Add missing methods to lists/list_column_view.pxd in pylibcudf (#16175) @Matt711
Migrate pylibcudf lists gathering (#16170) @Matt711
Move kernel vis over to CUDF_HIDDEN (#16165) @robertmaynard
Add groupby_max multi-threaded benchmark (#16154) @srinivasyadav18
Promote has_nested_columns to cudf public API (#16131) @robertmaynard
Promote IO support queries to cudf API (#16125) @robertmaynard
cudf::merge public API now supports passing a user stream (#16124) @robertmaynard
Add TPC-H inspired examples for Libcudf (#16088) @JayjeetAtGithub
Installed cudf header use cudf::allocate_like (#16087) @robertmaynard
cudf-polars string slicing (#16082) @brandon-b-miller
Migrate Parquet reader to pylibcudf (#16078) @lithomas1
Migrate lists/count_elements to pylibcudf (#16072) @Matt711
Migrate lists/extrac...

Contributors

seberg, trxcllnt, and 38 other contributors


07 Aug 16:43

raydouglass

v24.08.00

4afeb5a

v24.08.00

🚨 Breaking Changes

Align Index init APIs with pandas 2.x (#16362) @mroeschke
Align Series APIs with pandas 2.x (#16333) @mroeschke
Add missing stream param to dictionary factory APIs (#16319) @JayjeetAtGithub
Deprecate dtype= parameter in reduction methods (#16313) @mroeschke
Remove squeeze argument from groupby (#16312) @mroeschke
Align more DataFrame APIs with pandas (#16310) @mroeschke
Remove mr param from write_csv and write_json (#16231) @JayjeetAtGithub
Report number of rows per file read by PQ reader when no row selection and fix segfault in chunked PQ reader when skip_rows > 0 (#16195) @mhaseeb123
Refactor from_arrow_device/host to use resource_ref (#16160) @harrism
Deprecate Arrow support in I/O (#16132) @lithomas1
Return FrozenList for Index.names (#16047) @galipremsagar
Add compile option to enable large strings support (#16037) @davidwendt
Hide visibility of non public symbols (#15982) @robertmaynard
Rename strings multiple target replace API (#15898) @davidwendt
Pinned vector factory that uses the global pool (#15895) @vuule
Apply clang-tidy autofixes (#15894) @vyasr
Support arrow:schema in Parquet writer to faithfully roundtrip duration types with Arrow (#15875) @mhaseeb123
Expose stream parameter to public rolling APIs (#15865) @srinivasyadav18
Fix large strings handling in nvtext::character_tokenize (#15829) @davidwendt
Remove legacy JSON reader and concurrent_unordered_map.cuh. (#15813) @bdice

🐛 Bug Fixes

Add flatbuffers to libcudf build (#16446) @galipremsagar
Fix parquet_field_list read_func lambda capture invalid this pointer (#16440) @davidwendt
Enable prefetching in cudf.pandas.install() (#16439) @bdice
Enable prefetching before runpy (#16427) @galipremsagar
Support thread-safe for prefetch_config::get and prefetch_config::set (#16425) @ttnghia
Fix a pandas-2.0 missing attribute error (#16416) @galipremsagar
[Bug] Remove loud NativeFile deprecation noise for read_parquet from S3 (#16415) @rjzamora
Fix nightly memcheck error for empty STREAM_INTEROP_TEST (#16406) @davidwendt
Gate ArrowStringArrayNumpySemantics cudf.pandas proxy behind version check (#16401) @mroeschke
Don't export bs_thread_pool (#16398) @KyleFromNVIDIA
Require fixed width types for casting in cudf-polars (#16381) @brandon-b-miller
Fix docstring of DataFrame.apply (#16351) @galipremsagar
Make bool raise for more cudf objects (#16311) @mroeschke
Rename .devcontainers for CUDA 12.5 (#16293) @jakirkham
Fix split_record for all empty strings column (#16291) @davidwendt
Fix logic in to_arrow for empty list column (#16279) @wence-
[BUG] Make name attr of Index fast slow attrs (#16270) @Matt711
Add custom name setter and getter for proxy objects in cudf.pandas (#16234) @Matt711
Fall back when casting a timestamp to numeric in cudf-polars (#16232) @brandon-b-miller
Disable large string support for Java build (#16216) @jlowe
Remove CCCL patch for PR 211. (#16207) @bdice
Add single offset to an empty ListArray in cudf::to_arrow (#16201) @davidwendt
Fix memory_usage when calculating nested list column (#16193) @mroeschke
Support at/iat indexers in cudf.pandas (#16177) @mroeschke
Fix unused-return-value debug build error in from_arrow_stream_test.cpp (#16168) @davidwendt
Fix cudf::strings::replace_multiple hang on empty target (#16167) @davidwendt
Refactor from_arrow_device/host to use resource_ref (#16160) @harrism
interpolate returns new column if no values are interpolated (#16158) @mroeschke
Use provided memory resource for allocating mixed join results. (#16153) @bdice
Run DFG after verify-alpha-spec (#16151) @KyleFromNVIDIA
Use size_t to allow large conditional joins (#16127) @bdice
Allow only scale=0 fixed-point values in fixed_width_column_wrapper (#16120) @davidwendt
Fix pylibcudf Table.num_rows for 0 columns case and add interop to docs (#16108) @lithomas1
Add support for proxy np.flatiter objects (#16107) @Matt711
Ensure cudf objects can astype to any type when empty (#16106) @mroeschke
Support pd.read_pickle and pd.to_pickle in cudf.pandas (#16105) @Matt711
Fix unnecessarily strict check in parquet chunked reader for choosing split locations. (#16099) @nvdbaranec
Fix is_monotonic_* APIs to include nan's (#16085) @galipremsagar
More safely parse CUDA versions when subprocess output is contaminated (#16067) @brandon-b-miller
fast_slow_proxy: Don't import assert_eq at top-level (#16063) @wence-
Prevent bad ColumnAccessor state after .sort_index(axis=1, ignore_index=True) (#16061) @mroeschke
Fix ArrowDeviceArray interface to pass address of event (#16058) @zeroshade
Fix a size overflow bug in hash groupby (#16053) @PointKernel
Fix atomic_ref scope when multiple blocks are updating the same output (#16051) @vuule
Fix initialization error in to_arrow for empty string views (#16033) @wence-
Fix the int32 overflow when computing page fragment sizes for large string columns (#16028) @mhaseeb123
Fix the pool size alignment issue (#16024) @PointKernel
Improve multibyte-split byte-range performance (#16019) @davidwendt
Fix target counting in strings char-parallel replace (#16017) @davidwendt
Support IntervalDtype in cudf.from_pandas (#16014) @mroeschke
Fix memory size in create_byte_range_infos_consecutive (#16012) @davidwendt
Hide visibility of non public symbols (#15982) @robertmaynard
Fix Cython typo preventing proper inheritance (#15978) @vyasr
Fix convert_dtypes with convert_integer=False/convert_floating=True (#15964) @mroeschke
Fix nunique for MultiIndex, DataFrame, and all NA case with dropna=False (#15962) @mroeschke
Explicitly build for all GPU architectures (#15959) @vyasr
Preserve column type and class information in more DataFrame operations (#15949) @mroeschke
Add array_interface to cudf.pandas numpy.ndarray proxy (#15936) @mroeschke
Allow tests to be built when stream util is disabled (#15933) @robertmaynard
Fix JSON multi-source reading when total source size exceeds INT_MAX bytes (#15930) @shrshi
Fix dask_cudf.read_parquet regression for legacy timestamp data (#15929) @rjzamora
Fix offsetalator when accessing over 268 million rows (#15921) @davidwendt
Fix debug assert in rowgroup_char_counts_kernel (#15902) @davidwendt
Fix categorical conversion from chunked arrow arrays (#15886) @vyasr
Handling for NaN and inf when converting floating point to fixed point types (#15885) @ttnghia
Manual merge of Branch 24.08 from 24.06 (#15869) @galipremsagar
Avoid unnecessary Index cast in IndexedFrame.index setter (#15843) @charlesbluca
Fix large strings handling in nvtext::character_tokenize (#15829) @davidwendt
Fix multi-replace target count logic for large strings (#15807) @davidwendt
Fix JSON parsing memory corruption - Fix Mixed types nested children removal (#15798) @karthikeyann
Allow anonymous user in devcontainer name. (#15784) @bdice
Add support for additional metaclasses of proxies and use for ExcelWriter (#15399) @vyasr

📖 Documentation

Add docstring for from_dataframe (#16260) @mroeschke
Update libcudf compiler requirements in contributing doc (#16103) @davidwendt
Add libcudf public/detail API pattern to developer guide (#16086) @davidwendt
Explain line profiler and how to know which functions are GPU-accelerated. (#16079) @bdice
cudf.pandas documentation improvement (#15948) @Matt711
Reland "Fix docs for IO readers and strings_convert" (#15872) (#15941) @lithomas1
Document how to use cudf.pandas in tandem with multiprocessing (#15940) @wence-
DOC: Add documentation for cudf.pandas in the Developer Guide (#15889) @Matt711
Improve options docs (#15888) @bdice
DOC: add linkcode to docs (#15860) @raybellwaves
DOC: use intersphinx mapping in pandas-compat ext (#15846) @raybellwaves
Fix inconsistent usage of 'results' and 'records' in read-json.md (#15766) @dagardner-nv
Update PandasCompat.py to resolve references (#15704) @raybellwaves

🚀 New Features

Warn on cuDF failure when POLARS_VERBOSE is true (#16308) @brandon-b-miller
Add drop_nulls in cudf-polars (#16290) @brandon-b-miller
[JNI] Add setKernelPinnedCopyThreshold and setPinnedAllocationThreshold (#16288) @abellina
Implement support for scan_ndjson in cudf-polars (#16263) @lithomas1
Publish cudf-polars nightlies (#16213) @lithomas1
Modify make_host_vector and make_device_uvector factories to optionally use pinned memory and kernel copy (#16206) @vuule
Migrate lists/set_operations to pylibcudf (#16190) @Matt711
Migrate lists/filling to pylibcudf (#16189) @Matt711
Fall back to CPU for unsupported libcudf binaryops in cudf-polars (#16188) @brandon-b-miller
Use resource_ref for upstream in stream_checking_resource_adaptor (#16187) @harrism
Migrate lists/modifying to pylibcudf (#16185) @Matt711
Migrate lists/filtering to pylibcudf (#16184) @Matt711
Migrate lists/sorting to pylibcudf (#16179) @Matt711
Add missing methods to lists/list_column_view.pxd in pylibcudf (#16175) @Matt711
Migrate pylibcudf lists gathering (#16170) @Matt711
Move kernel vis over to CUDF_HIDDEN (#16165) @robertmaynard
Add groupby_max multi-threaded benchmark (#16154) @srinivasyadav18
Promote has_nested_columns to cudf public API (#16131) @robertmaynard
Promote IO support queries to cudf API (#16125) @robertmaynard
cudf::merge public API now supports passing a user stream (#16124) @robertmaynard
Add TPC-H inspired examples for Libcudf (#16088) @JayjeetAtGithub
Installed cudf header use cudf::allocate_like (#16087) @robertmaynard
cudf-polars string slicing (#16082) @brandon-b-miller
Migrate Parquet reader to pylibcudf (#16078) @lithomas1
Migrate lists/count_elements to pylibcudf (#16072) @Matt711
Migrate lists/extract to pylibcudf (#16071) @Matt711
Move common string utilities to pu...

Contributors

seberg, trxcllnt, and 38 other contributors


02 Jul 15:37

raydouglass

v24.06.01

4e59b79

v24.06.01

🚨 Breaking Changes

Deprecate Groupby.collect (#15808) @galipremsagar
Raise FileNotFoundError when a literal JSON string that looks like a json filename is passed (#15806) @lithomas1
Support filtered I/O in chunked_parquet_reader and simplify the use of parquet_reader_options (#15764) @mhaseeb123
Raise errors for unsupported operations on certain types (#15712) @galipremsagar
Support DurationType in cudf parquet reader via arrow:schema (#15617) @mhaseeb123
Remove protobuf and use parsed ORC statistics from libcudf (#15564) @bdice
Remove legacy JSON reader from Python (#15538) @bdice
Removing all batching code from parquet writer (#15528) @mhaseeb123
Convert libcudf resource parameters to rmm::device_async_resource_ref (#15507) @harrism
Remove deprecated strings offsets_begin (#15454) @davidwendt
Floating <--> fixed-point conversion must now be called explicitly (#15438) @pmattione-nvidia
Bind read_parquet_metadata API to libcudf instead of pyarrow and extract RowGroup information (#15398) @mhaseeb123
Remove deprecated hash() and spark_murmurhash3_x86_32() (#15375) @davidwendt
Remove empty elements from exploded character-ngrams output (#15371) @davidwendt
[FEA] Performance improvement for mixed left semi/anti join (#15288) @tgujar
Align date_range defaults with pandas, support tz (#15139) @mroeschke

🐛 Bug Fixes

Backport: Use size_t to allow large conditional joins (#16127) (#16133) @bdice
Backport #16045 to 24.06 (#16102) @vyasr
Backport #16038 to 24.06 (#16101) @vyasr
Backport: Fix segfault in conditional join (#16094) (#16100) @bdice
Add patch for incorrect cuco noexcept clauses (#16077) @vyasr
Revert "Fix docs for IO readers and strings_convert" (#15872) @vyasr
Remove problematic call of index setter to unblock dask-cuda CI (#15844) @charlesbluca
Use rapids_cpm_nvtx3 to get same nvtx3 target state as rmm (#15840) @robertmaynard
Return boolean from config_host_memory_resource instead of throwing (#15815) @abellina
Add temporary dask-cudf workaround for categorical sorting (#15801) @rjzamora
Fix row group alignment in ORC writer (#15789) @vuule
Raise error when sorting by categorical column in dask-cudf (#15788) @rjzamora
Upgrade arrow to 16.1 (#15787) @galipremsagar
Add support for PandasArray for pandas<2.1.0 (#15786) @galipremsagar
Limit runtime dependency to libarrow>=16.0.0,<16.1.0a0 (#15782) @pentschev
Fix cat.as_ordered not propagating correct size (#15780) @mroeschke
Handle mixed-like homogeneous types in isin (#15771) @galipremsagar
Fix id_vars and value_vars not accepting string scalars in melt (#15765) @mroeschke
Fix DatetimeIndex.loc for all types of ordering cases (#15761) @galipremsagar
Fix arrow versioning logic (#15755) @vyasr
Avoid running sanitizer on Java test designed to cause an error (#15753) @jlowe
Handle empty dataframe object with index present in setitem of loc (#15752) @galipremsagar
Eliminate circular reference in DataFrame/Series.iloc/loc (#15749) @mroeschke
Cap the absolute row index per pass in parquet chunked reader. (#15735) @nvdbaranec
Fix Index.repeat for datetime64 types (#15722) @galipremsagar
Fix multibyte check for case convert for large strings (#15721) @davidwendt
Fix get_loc to properly fetch results from an index that is in decreasing order (#15719) @galipremsagar
Return same type as the original index for .loc operations (#15717) @galipremsagar
Correct static builds + static arrow (#15715) @robertmaynard
Raise errors for unsupported operations on certain types (#15712) @galipremsagar
Fix ColumnAccessor caching of nrows if empty previously (#15710) @mroeschke
Allow None when nan_as_null=False in column constructor (#15709) @galipremsagar
Refine CudaTest.testCudaException in case throwing wrong type of CudaError under aarch64 (#15706) @sperlingxx
Fix maxima of categorical column (#15701) @rjzamora
Add proxy for inplace operations in cudf.pandas (#15695) @galipremsagar
Make nan_as_null behavior consistent across all APIs (#15692) @galipremsagar
Fix CI s3 api command to fetch latest results (#15687) @galipremsagar
Add NumpyExtensionArray proxy type in cudf.pandas (#15686) @galipremsagar
Properly implement binaryops for proxy types (#15684) @galipremsagar
Fix copy assignment and the comparison operator of rmm_host_allocator (#15677) @vuule
Fix multi-source reading in JSON byte range reader (#15671) @shrshi
Return int64 when pandas compatible mode is turned on for get_indexer (#15659) @galipremsagar
Fix Index contains for error validations and float vs int comparisons (#15657) @galipremsagar
Preserve sub-second data for time scalars in column construction (#15655) @galipremsagar
Check row limit size in cudf::strings::join_strings (#15643) @davidwendt
Enable sorting on column with nulls using query-planning (#15639) @rjzamora
Fix operator precedence problem in Parquet reader (#15638) @etseidl
Fix decoding of dictionary encoded FIXED_LEN_BYTE_ARRAY data in Parquet reader (#15601) @etseidl
Fix debug warnings/errors in from_arrow_device_test.cpp (#15596) @davidwendt
Add "collect" aggregation support to dask-cudf (#15593) @rjzamora
Fix categorical-accessor support and testing in dask-cudf (#15591) @rjzamora
Disable compute-sanitizer usage in CI tests with CUDA<11.6 (#15584) @davidwendt
Preserve RangeIndex.step in to_arrow/from_arrow (#15581) @mroeschke
Ignore new cupy warning (#15574) @vyasr
Add cuda-sanitizer-api dependency for test-cpp matrix 11.4 (#15573) @davidwendt
Allow apply udf to reference global modules in cudf.pandas (#15569) @mroeschke
Fix deprecation warnings for json legacy reader (#15563) @davidwendt
Fix millisecond resampling in cudf Python (#15560) @mroeschke
Rename JSON_READER_OPTION to JSON_READER_OPTION_NVBENCH. (#15553) @bdice
Fix a JNI bug in JSON parsing fixup (#15550) @revans2
Remove conda channel setup from wheel CI image script. (#15539) @bdice
cudf.pandas: Series dt accessor is CombinedDatetimelikeProperties (#15523) @wence-
Fix for some compiler warnings in parquet/page_decode.cuh (#15518) @etseidl
Fix exponent overflow in strings-to-double conversion (#15517) @davidwendt
nanoarrow uses package override for proper pinned versions generation (#15515) @robertmaynard
Remove index name overrides in dask-cudf pyarrow table dispatch (#15514) @charlesbluca
Fix async synchronization issues in json_column.cu (#15497) @karthikeyann
Add new patch to hide more CCCL APIs (#15493) @vyasr
Make improvements in pandas-test reporting (#15485) @galipremsagar
Fixed page data truncation in parquet writer under certain conditions. (#15474) @nvdbaranec
Only use data_type constructor with scale for decimal types (#15472) @wence-
Avoid "p2p" shuffle as a default when dask_cudf is imported (#15469) @rjzamora
Fix debug build errors from to_arrow_device_test.cpp (#15463) @davidwendt
Fix base_normalator::integer_sizeof_fn integer dispatch (#15457) @davidwendt
Allow consumers of static builds to find nanoarrow (#15456) @robertmaynard
Allow jit compilation when using a splayed CUDA toolkit (#15451) @robertmaynard
Handle case of scan aggregation in groupby-transform (#15450) @wence-
Test static builds in CI and fix nanoarrow configure (#15437) @vyasr
Fixes potential race in JSON parser when parsing JSON lines format and when recovering from invalid lines (#15419) @elstehle
Fix errors in chunked ORC writer when no tables were (successfully) written (#15393) @vuule
Support implicit array conversion with query-planning enabled (#15378) @rjzamora
Fix arrow-based round trip of empty dataframes (#15373) @wence-
Remove empty elements from exploded character-ngrams output (#15371) @davidwendt
Remove boundscheck=False setting in cython files (#15362) @wence-
Patch dask-expr var logic in dask-cudf (#15347) @rjzamora
Fix for logical and syntactical errors in libcudf c++ examples (#15346) @mhaseeb123
Disable dask-expr in docs builds. (#15343) @bdice
Apply the cuFile error work around to data_sink as well (#15335) @vuule
Fix parquet predicate filtering with column projection (#15113) @karthikeyann
Check column type equality, handling nested types correctly. (#14531) @bdice

📖 Documentation

Fix docs for IO readers and strings_convert (#15842) @bdice
Update cudf.pandas docs for GA (#15744) @beckernick
Add contributing warning about circular imports (#15691) @er-eis
Update libcudf developer guide for strings offsets column (#15661) @davidwendt
Update developer guide with device_async_resource_ref guidelines (#15562) @harrism
DOC: add pandas intersphinx mapping (#15531) @raybellwaves
rm-dup-doc in frame.py (#15530) @raybellwaves
Update CONTRIBUTING.md to use latest cuda env (#15467) @raybellwaves
Doc: interleave columns pandas compat (#15383) @raybellwaves
Simplified README Examples (#15338) @wkaisertexas
Add debug tips section to libcudf developer guide (#15329) @davidwendt
Fix and clarify notes on result ordering (#13255) @shwina

🚀 New Features

Add JNI bindings for zstd compression of NVCOMP. (#15729) @firestarman
Fix spaces around CSV quoted strings (#15727) @thabetx
Add default pinned pool that falls back to new pinned allocations (#15665) @vuule
Overhaul ops-codeowners coverage (#15660) @raydouglass
Concatenate dictionary of objects along axis=1 (#15623) @er-eis
Construct pylibcudf columns from objects supporting __cuda_array_interface__ (#15615) @brandon-b-miller
Expose some Parquet per-column configuration options via the python API (#15613) @etseidl
Migrate string find operations to pylibcudf (#15604) @brandon-b-miller
Round trip FIXED_LEN_BYTE_ARRAY data properly in Parquet writer (#15600) @etseidl
Reading multi-line JSON in string columns using runtime configurable delimiter (#15556) @shrshi
Remove p...

Contributors

alliepiper, seberg, and 45 other contributors


05 Jun 15:08

raydouglass

v24.06.00

7c706cc

v24.06.00

🚨 Breaking Changes

Deprecate Groupby.collect (#15808) @galipremsagar
Raise FileNotFoundError when a literal JSON string that looks like a json filename is passed (#15806) @lithomas1
Support filtered I/O in chunked_parquet_reader and simplify the use of parquet_reader_options (#15764) @mhaseeb123
Raise errors for unsupported operations on certain types (#15712) @galipremsagar
Support DurationType in cudf parquet reader via arrow:schema (#15617) @mhaseeb123
Remove protobuf and use parsed ORC statistics from libcudf (#15564) @bdice
Remove legacy JSON reader from Python (#15538) @bdice
Removing all batching code from parquet writer (#15528) @mhaseeb123
Convert libcudf resource parameters to rmm::device_async_resource_ref (#15507) @harrism
Remove deprecated strings offsets_begin (#15454) @davidwendt
Floating <--> fixed-point conversion must now be called explicitly (#15438) @pmattione-nvidia
Bind read_parquet_metadata API to libcudf instead of pyarrow and extract RowGroup information (#15398) @mhaseeb123
Remove deprecated hash() and spark_murmurhash3_x86_32() (#15375) @davidwendt
Remove empty elements from exploded character-ngrams output (#15371) @davidwendt
[FEA] Performance improvement for mixed left semi/anti join (#15288) @tgujar
Align date_range defaults with pandas, support tz (#15139) @mroeschke

🐛 Bug Fixes

Revert "Fix docs for IO readers and strings_convert" (#15872) @vyasr
Remove problematic call of index setter to unblock dask-cuda CI (#15844) @charlesbluca
Use rapids_cpm_nvtx3 to get same nvtx3 target state as rmm (#15840) @robertmaynard
Return boolean from config_host_memory_resource instead of throwing (#15815) @abellina
Add temporary dask-cudf workaround for categorical sorting (#15801) @rjzamora
Fix row group alignment in ORC writer (#15789) @vuule
Raise error when sorting by categorical column in dask-cudf (#15788) @rjzamora
Upgrade arrow to 16.1 (#15787) @galipremsagar
Add support for PandasArray for pandas<2.1.0 (#15786) @galipremsagar
Limit runtime dependency to libarrow>=16.0.0,<16.1.0a0 (#15782) @pentschev
Fix cat.as_ordered not propagating correct size (#15780) @mroeschke
Handle mixed-like homogeneous types in isin (#15771) @galipremsagar
Fix id_vars and value_vars not accepting string scalars in melt (#15765) @mroeschke
Fix DatetimeIndex.loc for all types of ordering cases (#15761) @galipremsagar
Fix arrow versioning logic (#15755) @vyasr
Avoid running sanitizer on Java test designed to cause an error (#15753) @jlowe
Handle empty dataframe object with index present in setitem of loc (#15752) @galipremsagar
Eliminate circular reference in DataFrame/Series.iloc/loc (#15749) @mroeschke
Cap the absolute row index per pass in parquet chunked reader. (#15735) @nvdbaranec
Fix Index.repeat for datetime64 types (#15722) @galipremsagar
Fix multibyte check for case convert for large strings (#15721) @davidwendt
Fix get_loc to properly fetch results from an index that is in decreasing order (#15719) @galipremsagar
Return same type as the original index for .loc operations (#15717) @galipremsagar
Correct static builds + static arrow (#15715) @robertmaynard
Raise errors for unsupported operations on certain types (#15712) @galipremsagar
Fix ColumnAccessor caching of nrows if empty previously (#15710) @mroeschke
Allow None when nan_as_null=False in column constructor (#15709) @galipremsagar
Refine CudaTest.testCudaException in case throwing wrong type of CudaError under aarch64 (#15706) @sperlingxx
Fix maxima of categorical column (#15701) @rjzamora
Add proxy for inplace operations in cudf.pandas (#15695) @galipremsagar
Make nan_as_null behavior consistent across all APIs (#15692) @galipremsagar
Fix CI s3 api command to fetch latest results (#15687) @galipremsagar
Add NumpyExtensionArray proxy type in cudf.pandas (#15686) @galipremsagar
Properly implement binaryops for proxy types (#15684) @galipremsagar
Fix copy assignment and the comparison operator of rmm_host_allocator (#15677) @vuule
Fix multi-source reading in JSON byte range reader (#15671) @shrshi
Return int64 when pandas compatible mode is turned on for get_indexer (#15659) @galipremsagar
Fix Index contains for error validations and float vs int comparisons (#15657) @galipremsagar
Preserve sub-second data for time scalars in column construction (#15655) @galipremsagar
Check row limit size in cudf::strings::join_strings (#15643) @davidwendt
Enable sorting on column with nulls using query-planning (#15639) @rjzamora
Fix operator precedence problem in Parquet reader (#15638) @etseidl
Fix decoding of dictionary encoded FIXED_LEN_BYTE_ARRAY data in Parquet reader (#15601) @etseidl
Fix debug warnings/errors in from_arrow_device_test.cpp (#15596) @davidwendt
Add "collect" aggregation support to dask-cudf (#15593) @rjzamora
Fix categorical-accessor support and testing in dask-cudf (#15591) @rjzamora
Disable compute-sanitizer usage in CI tests with CUDA<11.6 (#15584) @davidwendt
Preserve RangeIndex.step in to_arrow/from_arrow (#15581) @mroeschke
Ignore new cupy warning (#15574) @vyasr
Add cuda-sanitizer-api dependency for test-cpp matrix 11.4 (#15573) @davidwendt
Allow apply udf to reference global modules in cudf.pandas (#15569) @mroeschke
Fix deprecation warnings for json legacy reader (#15563) @davidwendt
Fix millisecond resampling in cudf Python (#15560) @mroeschke
Rename JSON_READER_OPTION to JSON_READER_OPTION_NVBENCH. (#15553) @bdice
Fix a JNI bug in JSON parsing fixup (#15550) @revans2
Remove conda channel setup from wheel CI image script. (#15539) @bdice
cudf.pandas: Series dt accessor is CombinedDatetimelikeProperties (#15523) @wence-
Fix for some compiler warnings in parquet/page_decode.cuh (#15518) @etseidl
Fix exponent overflow in strings-to-double conversion (#15517) @davidwendt
nanoarrow uses package override for proper pinned versions generation (#15515) @robertmaynard
Remove index name overrides in dask-cudf pyarrow table dispatch (#15514) @charlesbluca
Fix async synchronization issues in json_column.cu (#15497) @karthikeyann
Add new patch to hide more CCCL APIs (#15493) @vyasr
Make improvements in pandas-test reporting (#15485) @galipremsagar
Fixed page data truncation in parquet writer under certain conditions. (#15474) @nvdbaranec
Only use data_type constructor with scale for decimal types (#15472) @wence-
Avoid "p2p" shuffle as a default when dask_cudf is imported (#15469) @rjzamora
Fix debug build errors from to_arrow_device_test.cpp (#15463) @davidwendt
Fix base_normalator::integer_sizeof_fn integer dispatch (#15457) @davidwendt
Allow consumers of static builds to find nanoarrow (#15456) @robertmaynard
Allow jit compilation when using a splayed CUDA toolkit (#15451) @robertmaynard
Handle case of scan aggregation in groupby-transform (#15450) @wence-
Test static builds in CI and fix nanoarrow configure (#15437) @vyasr
Fixes potential race in JSON parser when parsing JSON lines format and when recovering from invalid lines (#15419) @elstehle
Fix errors in chunked ORC writer when no tables were (successfully) written (#15393) @vuule
Support implicit array conversion with query-planning enabled (#15378) @rjzamora
Fix arrow-based round trip of empty dataframes (#15373) @wence-
Remove empty elements from exploded character-ngrams output (#15371) @davidwendt
Remove boundscheck=False setting in cython files (#15362) @wence-
Patch dask-expr var logic in dask-cudf (#15347) @rjzamora
Fix for logical and syntactical errors in libcudf c++ examples (#15346) @mhaseeb123
Disable dask-expr in docs builds. (#15343) @bdice
Apply the cuFile error work around to data_sink as well (#15335) @vuule
Fix parquet predicate filtering with column projection (#15113) @karthikeyann
Check column type equality, handling nested types correctly. (#14531) @bdice

📖 Documentation

Fix docs for IO readers and strings_convert (#15842) @bdice
Update cudf.pandas docs for GA (#15744) @beckernick
Add contributing warning about circular imports (#15691) @er-eis
Update libcudf developer guide for strings offsets column (#15661) @davidwendt
Update developer guide with device_async_resource_ref guidelines (#15562) @harrism
DOC: add pandas intersphinx mapping (#15531) @raybellwaves
rm-dup-doc in frame.py (#15530) @raybellwaves
Update CONTRIBUTING.md to use latest cuda env (#15467) @raybellwaves
Doc: interleave columns pandas compat (#15383) @raybellwaves
Simplified README Examples (#15338) @wkaisertexas
Add debug tips section to libcudf developer guide (#15329) @davidwendt
Fix and clarify notes on result ordering (#13255) @shwina

🚀 New Features

Add JNI bindings for zstd compression of NVCOMP. (#15729) @firestarman
Fix spaces around CSV quoted strings (#15727) @thabetx
Add default pinned pool that falls back to new pinned allocations (#15665) @vuule
Overhaul ops-codeowners coverage (#15660) @raydouglass
Concatenate dictionary of objects along axis=1 (#15623) @er-eis
Construct pylibcudf columns from objects supporting __cuda_array_interface__ (#15615) @brandon-b-miller
Expose some Parquet per-column configuration options via the python API (#15613) @etseidl
Migrate string find operations to pylibcudf (#15604) @brandon-b-miller
Round trip FIXED_LEN_BYTE_ARRAY data properly in Parquet writer (#15600) @etseidl
Reading multi-line JSON in string columns using runtime configurable delimiter (#15556) @shrshi
Remove public gtest dependency from libcudf conda package (#15534) @robertmaynard
Fea/move to latest nanoarrow (#15526) @robertmaynard
Migrate string case operations to pylibcudf (#15489) @brandon-b-miller
Add Parquet encoding statistics to column chunk metadata (#15452) @etseidl
Implement JNI fo...

Contributors

alliepiper, seberg, and 45 other contributors

23 Sep 16:41

rapids-bot

v24.08.00a

f5d1c24

[NIGHTLY] v24.08.00 Pre-release

🚨 Breaking Changes

Align Index init APIs with pandas 2.x (#16362) @mroeschke
Align Series APIs with pandas 2.x (#16333) @mroeschke
Add missing stream param to dictionary factory APIs (#16319) @JayjeetAtGithub
Deprecate dtype= parameter in reduction methods (#16313) @mroeschke
Remove squeeze argument from groupby (#16312) @mroeschke
Align more DataFrame APIs with pandas (#16310) @mroeschke
Remove mr param from write_csv and write_json (#16231) @JayjeetAtGithub
Report number of rows per file read by PQ reader when no row selection and fix segfault in chunked PQ reader when skip_rows > 0 (#16195) @mhaseeb123
Refactor from_arrow_device/host to use resource_ref (#16160) @harrism
Deprecate Arrow support in I/O (#16132) @lithomas1
Return FrozenList for Index.names (#16047) @galipremsagar
Add compile option to enable large strings support (#16037) @davidwendt
Hide visibility of non public symbols (#15982) @robertmaynard
Rename strings multiple target replace API (#15898) @davidwendt
Pinned vector factory that uses the global pool (#15895) @vuule
Apply clang-tidy autofixes (#15894) @vyasr
Support arrow:schema in Parquet writer to faithfully roundtrip duration types with Arrow (#15875) @mhaseeb123
Expose stream parameter to public rolling APIs (#15865) @srinivasyadav18
Fix large strings handling in nvtext::character_tokenize (#15829) @davidwendt
Remove legacy JSON reader and concurrent_unordered_map.cuh. (#15813) @bdice

🐛 Bug Fixes

Ensure managed memory is supported in cudf.pandas. (#16552) @bdice
Add flatbuffers to libcudf build (#16446) @galipremsagar
Fix parquet_field_list read_func lambda capture invalid this pointer (#16440) @davidwendt
Enable prefetching in cudf.pandas.install() (#16439) @bdice
Enable prefetching before runpy (#16427) @galipremsagar
Support thread safety for prefetch_config::get and prefetch_config::set (#16425) @ttnghia
Fix a pandas-2.0 missing attribute error (#16416) @galipremsagar
[Bug] Remove loud NativeFile deprecation noise for read_parquet from S3 (#16415) @rjzamora
Fix nightly memcheck error for empty STREAM_INTEROP_TEST (#16406) @davidwendt
Gate ArrowStringArrayNumpySemantics cudf.pandas proxy behind version check (#16401) @mroeschke
Don't export bs_thread_pool (#16398) @KyleFromNVIDIA
Require fixed width types for casting in cudf-polars (#16381) @brandon-b-miller
Fix docstring of DataFrame.apply (#16351) @galipremsagar
Make bool raise for more cudf objects (#16311) @mroeschke
Rename .devcontainers for CUDA 12.5 (#16293) @jakirkham
Fix split_record for all empty strings column (#16291) @davidwendt
Fix logic in to_arrow for empty list column (#16279) @wence-
[BUG] Make name attr of Index fast slow attrs (#16270) @Matt711
Add custom name setter and getter for proxy objects in cudf.pandas (#16234) @Matt711
Fall back when casting a timestamp to numeric in cudf-polars (#16232) @brandon-b-miller
Disable large string support for Java build (#16216) @jlowe
Remove CCCL patch for PR 211. (#16207) @bdice
Add single offset to an empty ListArray in cudf::to_arrow (#16201) @davidwendt
Fix memory_usage when calculating nested list column (#16193) @mroeschke
Support at/iat indexers in cudf.pandas (#16177) @mroeschke
Fix unused-return-value debug build error in from_arrow_stream_test.cpp (#16168) @davidwendt
Fix cudf::strings::replace_multiple hang on empty target (#16167) @davidwendt
Refactor from_arrow_device/host to use resource_ref (#16160) @harrism
interpolate returns new column if no values are interpolated (#16158) @mroeschke
Use provided memory resource for allocating mixed join results. (#16153) @bdice
Run DFG after verify-alpha-spec (#16151) @KyleFromNVIDIA
Use size_t to allow large conditional joins (#16127) @bdice
Allow only scale=0 fixed-point values in fixed_width_column_wrapper (#16120) @davidwendt
Fix pylibcudf Table.num_rows for 0 columns case and add interop to docs (#16108) @lithomas1
Add support for proxy np.flatiter objects (#16107) @Matt711
Ensure cudf objects can astype to any type when empty (#16106) @mroeschke
Support pd.read_pickle and pd.to_pickle in cudf.pandas (#16105) @Matt711
Fix unnecessarily strict check in parquet chunked reader for choosing split locations. (#16099) @nvdbaranec
Fix is_monotonic_* APIs to include nan's (#16085) @galipremsagar
More safely parse CUDA versions when subprocess output is contaminated (#16067) @brandon-b-miller
fast_slow_proxy: Don't import assert_eq at top-level (#16063) @wence-
Prevent bad ColumnAccessor state after .sort_index(axis=1, ignore_index=True) (#16061) @mroeschke
Fix ArrowDeviceArray interface to pass address of event (#16058) @zeroshade
Fix a size overflow bug in hash groupby (#16053) @PointKernel
Fix atomic_ref scope when multiple blocks are updating the same output (#16051) @vuule
Fix initialization error in to_arrow for empty string views (#16033) @wence-
Fix the int32 overflow when computing page fragment sizes for large string columns (#16028) @mhaseeb123
Fix the pool size alignment issue (#16024) @PointKernel
Improve multibyte-split byte-range performance (#16019) @davidwendt
Fix target counting in strings char-parallel replace (#16017) @davidwendt
Support IntervalDtype in cudf.from_pandas (#16014) @mroeschke
Fix memory size in create_byte_range_infos_consecutive (#16012) @davidwendt
Hide visibility of non public symbols (#15982) @robertmaynard
Fix Cython typo preventing proper inheritance (#15978) @vyasr
Fix convert_dtypes with convert_integer=False/convert_floating=True (#15964) @mroeschke
Fix nunique for MultiIndex, DataFrame, and all NA case with dropna=False (#15962) @mroeschke
Explicitly build for all GPU architectures (#15959) @vyasr
Preserve column type and class information in more DataFrame operations (#15949) @mroeschke
Add array_interface to cudf.pandas numpy.ndarray proxy (#15936) @mroeschke
Allow tests to be built when stream util is disabled (#15933) @robertmaynard
Fix JSON multi-source reading when total source size exceeds INT_MAX bytes (#15930) @shrshi
Fix dask_cudf.read_parquet regression for legacy timestamp data (#15929) @rjzamora
Fix offsetalator when accessing over 268 million rows (#15921) @davidwendt
Fix debug assert in rowgroup_char_counts_kernel (#15902) @davidwendt
Fix categorical conversion from chunked arrow arrays (#15886) @vyasr
Handling for NaN and inf when converting floating point to fixed point types (#15885) @ttnghia
Manual merge of Branch 24.08 from 24.06 (#15869) @galipremsagar
Avoid unnecessary Index cast in IndexedFrame.index setter (#15843) @charlesbluca
Fix large strings handling in nvtext::character_tokenize (#15829) @davidwendt
Fix multi-replace target count logic for large strings (#15807) @davidwendt
Fix JSON parsing memory corruption - Fix Mixed types nested children removal (#15798) @karthikeyann
Allow anonymous user in devcontainer name. (#15784) @bdice
Add support for additional metaclasses of proxies and use for ExcelWriter (#15399) @vyasr

📖 Documentation

Improve Polars docs (#16820) @bdice
Add docstring for from_dataframe (#16260) @mroeschke
Update libcudf compiler requirements in contributing doc (#16103) @davidwendt
Add libcudf public/detail API pattern to developer guide (#16086) @davidwendt
Explain line profiler and how to know which functions are GPU-accelerated. (#16079) @bdice
cudf.pandas documentation improvement (#15948) @Matt711
Reland "Fix docs for IO readers and strings_convert" (#15872) (#15941) @lithomas1
Document how to use cudf.pandas in tandem with multiprocessing (#15940) @wence-
DOC: Add documentation for cudf.pandas in the Developer Guide (#15889) @Matt711
Improve options docs (#15888) @bdice
DOC: add linkcode to docs (#15860) @raybellwaves
DOC: use intersphinx mapping in pandas-compat ext (#15846) @raybellwaves
Fix inconsistent usage of 'results' and 'records' in read-json.md (#15766) @dagardner-nv
Update PandasCompat.py to resolve references (#15704) @raybellwaves

🚀 New Features

Creation of CI artifacts for cudf-polars wheels (#16680) @wence-
Warn on cuDF failure when POLARS_VERBOSE is true (#16308) @brandon-b-miller
Add drop_nulls in cudf-polars (#16290) @brandon-b-miller
[JNI] Add setKernelPinnedCopyThreshold and setPinnedAllocationThreshold (#16288) @abellina
Implement support for scan_ndjson in cudf-polars (#16263) @lithomas1
Publish cudf-polars nightlies (#16213) @lithomas1
Modify make_host_vector and make_device_uvector factories to optionally use pinned memory and kernel copy (#16206) @vuule
Migrate lists/set_operations to pylibcudf (#16190) @Matt711
Migrate lists/filling to pylibcudf (#16189) @Matt711
Fall back to CPU for unsupported libcudf binaryops in cudf-polars (#16188) @brandon-b-miller
Use resource_ref for upstream in stream_checking_resource_adaptor (#16187) @harrism
Migrate lists/modifying to pylibcudf (#16185) @Matt711
Migrate lists/filtering to pylibcudf (#16184) @Matt711
Migrate lists/sorting to pylibcudf (#16179) @Matt711
Add missing methods to lists/list_column_view.pxd in pylibcudf (#16175) @Matt711
Migrate pylibcudf lists gathering (#16170) @Matt711
Move kernel vis over to CUDF_HIDDEN (#16165) @robertmaynard
Add groupby_max multi-threaded benchmark (#16154) @srinivasyadav18
Promote has_nested_columns to cudf public API (#16131) @robertmaynard
Promote IO support queries to cudf API (#16125) @robertmaynard
cudf::merge public API now supports passing a user stream (#16124) @robertmaynard
Add TPC-H inspired examples for Libcudf (#16088) @ja...

Contributors

seberg, trxcllnt, and 38 other contributors
