Releases: rapidsai/cudf
v0.19.1
🚨 Breaking Changes
- Allow hash_partition to take a seed value (#7771) @magnatelee
- Allow merging index column with data column using keyword "on" (#7736) @skirui-source
- Change JNI API to avoid loading native dependencies when creating sort order classes. (#7729) @revans2
- Replace device_vector with device_uvector in null_mask (#7715) @harrism
- Don't identify decimals as strings. (#7710) @vyasr
- Fix Java Parquet write after writer API changes (#7655) @revans2
- Convert cudf::concatenate APIs to use spans and device_uvector (#7621) @harrism
- Update missing docstring examples in python public APIs (#7546) @galipremsagar
- Remove unneeded step parameter from strings::detail::copy_slice (#7525) @davidwendt
- Rename ARROW_STATIC_LIB because it conflicts with one in FindArrow.cmake (#7518) @trxcllnt
- Match Pandas logic for comparing two objects with nulls (#7490) @brandon-b-miller
- Add struct support to parquet writer (#7461) @devavret
- Join APIs that return gathermaps (#7454) @shwina
- fixed_point + cudf::binary_operation API Changes (#7435) @codereport
- Fix BUG: Exception when PYTHONOPTIMIZE=2 (#7434) @skirui-source
- Change nvtext::load_vocabulary_file to return a unique ptr (#7424) @davidwendt
- Refactor strings column factories (#7397) @harrism
- Use CMAKE_CUDA_ARCHITECTURES (#7391) @robertmaynard
- Upgrade pandas to 1.2 (#7375) @galipremsagar
- Rename logical_cast to bit_cast and allow additional conversions (#7373) @ttnghia
- Rework libcudf CMakeLists.txt to export targets for CPM (#7107) @trxcllnt
🐛 Bug Fixes
- Fix returned column type when extracting from an empty list column (#8031) @jlowe
- Don't reindex a new value on setitem if the original dataframe was empty (#8026) @vyasr
- Fix a NameError in meta dispatch API (#7996) @galipremsagar
- Reindex in DataFrame.__setitem__ (#7957) @galipremsagar
- jitify direct-to-cubin compilation and caching. (#7919) @cwharris
- Use dynamic cudart for nvcomp in java build (#7896) @abellina
- fix "incompatible redefinition" warnings (#7894) @cwharris
- cudf consistently specifies the cuda runtime (#7887) @robertmaynard
- disable verbose output for jitify_preprocess (#7886) @cwharris
- CMake jit_preprocess_files function only runs when needed (#7872) @robertmaynard
- Push DeviceScalar construction into cython for list.contains (#7864) @brandon-b-miller
- cudf now sets an install rpath of $ORIGIN (#7863) @robertmaynard
- Don't install Thrust examples, tests, docs, and python files (#7811) @robertmaynard
- Sort by index in groupby tests more consistently (#7802) @shwina
- Revert "Update conda recipes pinning of repo dependencies (#7743)" (#7793) @raydouglass
- Add decimal column handling in copy_type_metadata (#7788) @shwina
- Add column names validation in parquet writer (#7786) @galipremsagar
- Fix Java explode outer unit tests (#7782) @jlowe
- Fix compiler warning about non-POD types passed through ellipsis (#7781) @jrhemstad
- User resource fix for replace_nulls (#7769) @magnatelee
- Fix type dispatch for columnar replace_nulls (#7768) @jlowe
- Add ignore_order parameter to dask-cudf concat dispatch (#7765) @galipremsagar
- Fix slicing and arrow representations of decimal columns (#7755) @vyasr
- Fixing issue with explode_outer position not nulling position entries of null rows (#7754) @hyperbolic2346
- Implement scatter for struct columns (#7752) @ttnghia
- Fix data corruption in string columns (#7746) @galipremsagar
- Fix string length in stripe dictionary building (#7744) @kaatish
- Update conda recipes pinning of repo dependencies (#7743) @mike-wendt
- Enable dask dispatch to cuDF's is_categorical_dtype for cuDF objects (#7740) @brandon-b-miller
- Fix dictionary size computation in ORC writer (#7737) @vuule
- Fix cudf::cast overflow for decimal64 to int32_t or smaller in certain cases (#7733) @codereport
- Change JNI API to avoid loading native dependencies when creating sort order classes. (#7729) @revans2
- Disable column_view data accessors for unsupported types (#7725) @jrhemstad
- Materialize RangeIndex when index=True in parquet writer (#7711) @galipremsagar
- Don't identify decimals as strings. (#7710) @vyasr
- Fix return type of DataFrame.argsort (#7706) @galipremsagar
- Fix/correct cudf installed package requirements (#7688) @robertmaynard
- Fix SparkMurmurHash3_32 hash inconsistencies with Apache Spark (#7672) @jlowe
- Fix ORC reader issue with reading empty string columns (#7656) @rgsl888prabhu
- Fix Java Parquet write after writer API changes (#7655) @revans2
- Fixing empty null lists throwing explode_outer for a loop. (#7649) @hyperbolic2346
- Fix internal compiler error during JNI Docker build (#7645) @jlowe
- Fix Debug build break with device_uvectors in grouped_rolling.cu (#7633) @mythrocks
- Parquet reader: Fix issue when using skip_rows on non-nested columns containing nulls (#7627) @nvdbaranec
- Fix ORC reader for empty DataFrame/Table (#7624) @rgsl888prabhu
- Fix specifying GPU architecture in JNI build (#7612) @jlowe
- Fix ORC writer OOM issue (#7605) @vuule
- Fix 0.18 --> 0.19 automerge (#7589) @kkraus14
- Fix ORC issue with incorrect timestamp nanosecond values (#7581) @vuule
- Fix missing Dask imports (#7580) @kkraus14
- CMAKE_CUDA_ARCHITECTURES doesn't change when build-system invokes cmake (#7579) @robertmaynard
- Another fix for offsets_end() iterator in lists_column_view (#7575) @ttnghia
- Fix ORC writer output corruption with string columns (#7565) @vuule
- Fix cudf::lists::sort_lists failing for sliced column (#7564) @ttnghia
- FIX Fix Anaconda upload args (#7558) @dillon-cullinan
- Fix index mismatch issue in equality related APIs (#7555) @galipremsagar
- FIX Revert gpuci_conda_retry on conda file output locations (#7552) @dillon-cullinan
- Fix offset_end iterator for lists_column_view, which was not correctl… (#7551) @ttnghia
- Fix no such file dlpack.h error when build libcudf (#7549) @chenrui17
- Update missing docstring examples in python public APIs (#7546) @galipremsagar
- Decimal32 Build Fix (#7544) @razajafri
- FIX Retry conda output location (#7540) @dillon-cullinan
- fix missing renames of dask git branches from master to main (#7535) @kkraus14
- Remove detail from device_span (#7533) @rwlee
- Change dask and distributed branch to main (#7532) @dantegd
- Update JNI build to use CUDF_USE_ARROW_STATIC (#7526) @jlowe
- Make sure rmm::rmm CMake target is visible to cudf users (#7524) @robertmaynard
- Fix contiguous_split not properly handling output partitions > 2 GB. (#7515) @nvdbaranec
- Change jit launch to safe_launch (#7510) @devavret
- Fix comparison between Datetime/Timedelta columns and NULL scalars (#7504) @brandon-b-miller
- Fix off-by-one error in char-parallel string scalar replace (#7502) @jlowe
- Fix JNI deprecation of all, put it on the wrong version before (#7501) @revans2
- Fix Series/Dataframe Mixed Arithmetic (#7491) @brandon-b-miller
- Fix JNI build after removal of libcudf sub-libraries (#7486) @jlowe
- Correctly compile benchmarks (#7485) @robertmaynard
- Fix bool column corruption with ORC Reader (#7483) @rgsl888prabhu
- Fix __repr__ for categorical dtype (#7476) @galipremsagar
- Java cleaner synchronization (#7474) @abellina
- Fix java float/double parsing tests (#7473) @revans2
- Pass stream and user resource to make_default_constructed_scalar (#7469) @magnatelee
- Improve stability of dask_cudf.DataFrame.var and dask_cudf.DataFrame.std (#7453) @rjzamora
- Missing device_storage_dispatch change affecting cudf::gather (#7449) @codereport
- fix cuFile JNI compile errors (#7445) @rongou
- Support Series.__setitem__ with key to a new row (#7443) @isVoid
- Fix BUG: Exception when PYTHONOPTIMIZE=2 (#7434) @skirui-source
- Make inclusive scan safe for cases with leading nulls (#7432) @magnatelee
- Fix typo in list_device_view::pair_rep_end() (#7423) @mythrocks
- Fix string to double conversion and row equivalent comparison (#7410) @ttnghia
- Fix thrust failure when transferring data from device_vector to host_vector with vectors of size 1 (#7382) @ttnghia
- Fix std::exception catch-by-reference gcc9 compile error (#7380) @davidwendt
- Fix skiprows issue with ORC Reader (#7359) @rgsl888prabhu
- fix Arrow CMake file (#7358) @rongou
- Fix lists::contains() for NaN and Decimals (#7349) @mythrocks
- Handle cupy array in Dataframe.__setitem__ (#7340) @galipremsagar
- Fix invalid-device-fn error in cudf::strings::replace_re with multiple regex's (#7336) @davidwendt
- FIX Add codecov upload block to gpu script (#6860) @dillon-cullinan
📖 Documentation
- Fix join API doxygen (#7890) @shwina
- Add Resources to README. (#7697) @bdice
- Add isin examples in Docstring (#7479) @galipremsagar
- Resolving unlinked type shorthands in cudf doc (#7416) @isVoid
- Fix typo in regex.md doc page (#7363) @davidwendt
- Fix incorrect strings_column_view::chars_size documentation (#7360) @jlowe
🚀 New Features
- Enable basic reductions for decimal columns (#7776) @ChrisJar
- Enable join on decimal columns (#7764) @ChrisJar
- Allow merging index column with data column using keyword "on" (#7736) @skirui-source
- Implement DecimalColumn + Scalar and add cudf.Scalars of Decimal64Dtype (#7732) @brandon-b-miller
- Add support for unique groupby aggregation (#7726) @shwina
- Expose libcudf's label_bins function to cudf (#7724) @vyasr
- Adding support for equi-join on struct (#7720) @hyperbolic2346
- Add decimal column comparison operations (#7716) @isVoid
- Implement scan operations for decimal columns (#7707) @ChrisJar
- Enable typecasting between decimal and int (#7691) @ChrisJar
- Enable decimal support in parquet writer (#7673) @devavret
- Adds list.unique API (#7664) @isVoid
- Fix NaN handling in drop_list_duplicates (#7662) @ttnghia
- Add lists.sort_values API (#7657) @isVoid
- Add is_integer API that can check for the validity of a string-to-integer conversion (#7642) @ttnghia
- Add...
v0.19.0
🚨 Breaking Changes
- Allow hash_partition to take a seed value (#7771) @magnatelee
- Allow merging index column with data column using keyword "on" (#7736) @skirui-source
- Change JNI API to avoid loading native dependencies when creating sort order classes. (#7729) @revans2
- Replace device_vector with device_uvector in null_mask (#7715) @harrism
- Don't identify decimals as strings. (#7710) @vyasr
- Fix Java Parquet write after writer API changes (#7655) @revans2
- Convert cudf::concatenate APIs to use spans and device_uvector (#7621) @harrism
- Update missing docstring examples in python public APIs (#7546) @galipremsagar
- Remove unneeded step parameter from strings::detail::copy_slice (#7525) @davidwendt
- Rename ARROW_STATIC_LIB because it conflicts with one in FindArrow.cmake (#7518) @trxcllnt
- Match Pandas logic for comparing two objects with nulls (#7490) @brandon-b-miller
- Add struct support to parquet writer (#7461) @devavret
- Join APIs that return gathermaps (#7454) @shwina
- fixed_point + cudf::binary_operation API Changes (#7435) @codereport
- Fix BUG: Exception when PYTHONOPTIMIZE=2 (#7434) @skirui-source
- Change nvtext::load_vocabulary_file to return a unique ptr (#7424) @davidwendt
- Refactor strings column factories (#7397) @harrism
- Use CMAKE_CUDA_ARCHITECTURES (#7391) @robertmaynard
- Upgrade pandas to 1.2 (#7375) @galipremsagar
- Rename logical_cast to bit_cast and allow additional conversions (#7373) @ttnghia
- Rework libcudf CMakeLists.txt to export targets for CPM (#7107) @trxcllnt
🐛 Bug Fixes
- Fix a NameError in meta dispatch API (#7996) @galipremsagar
- Reindex in DataFrame.__setitem__ (#7957) @galipremsagar
- jitify direct-to-cubin compilation and caching. (#7919) @cwharris
- Use dynamic cudart for nvcomp in java build (#7896) @abellina
- fix "incompatible redefinition" warnings (#7894) @cwharris
- cudf consistently specifies the cuda runtime (#7887) @robertmaynard
- disable verbose output for jitify_preprocess (#7886) @cwharris
- CMake jit_preprocess_files function only runs when needed (#7872) @robertmaynard
- Push DeviceScalar construction into cython for list.contains (#7864) @brandon-b-miller
- cudf now sets an install rpath of $ORIGIN (#7863) @robertmaynard
- Don't install Thrust examples, tests, docs, and python files (#7811) @robertmaynard
- Sort by index in groupby tests more consistently (#7802) @shwina
- Revert "Update conda recipes pinning of repo dependencies (#7743)" (#7793) @raydouglass
- Add decimal column handling in copy_type_metadata (#7788) @shwina
- Add column names validation in parquet writer (#7786) @galipremsagar
- Fix Java explode outer unit tests (#7782) @jlowe
- Fix compiler warning about non-POD types passed through ellipsis (#7781) @jrhemstad
- User resource fix for replace_nulls (#7769) @magnatelee
- Fix type dispatch for columnar replace_nulls (#7768) @jlowe
- Add ignore_order parameter to dask-cudf concat dispatch (#7765) @galipremsagar
- Fix slicing and arrow representations of decimal columns (#7755) @vyasr
- Fixing issue with explode_outer position not nulling position entries of null rows (#7754) @hyperbolic2346
- Implement scatter for struct columns (#7752) @ttnghia
- Fix data corruption in string columns (#7746) @galipremsagar
- Fix string length in stripe dictionary building (#7744) @kaatish
- Update conda recipes pinning of repo dependencies (#7743) @mike-wendt
- Enable dask dispatch to cuDF's is_categorical_dtype for cuDF objects (#7740) @brandon-b-miller
- Fix dictionary size computation in ORC writer (#7737) @vuule
- Fix cudf::cast overflow for decimal64 to int32_t or smaller in certain cases (#7733) @codereport
- Change JNI API to avoid loading native dependencies when creating sort order classes. (#7729) @revans2
- Disable column_view data accessors for unsupported types (#7725) @jrhemstad
- Materialize RangeIndex when index=True in parquet writer (#7711) @galipremsagar
- Don't identify decimals as strings. (#7710) @vyasr
- Fix return type of DataFrame.argsort (#7706) @galipremsagar
- Fix/correct cudf installed package requirements (#7688) @robertmaynard
- Fix SparkMurmurHash3_32 hash inconsistencies with Apache Spark (#7672) @jlowe
- Fix ORC reader issue with reading empty string columns (#7656) @rgsl888prabhu
- Fix Java Parquet write after writer API changes (#7655) @revans2
- Fixing empty null lists throwing explode_outer for a loop. (#7649) @hyperbolic2346
- Fix internal compiler error during JNI Docker build (#7645) @jlowe
- Fix Debug build break with device_uvectors in grouped_rolling.cu (#7633) @mythrocks
- Parquet reader: Fix issue when using skip_rows on non-nested columns containing nulls (#7627) @nvdbaranec
- Fix ORC reader for empty DataFrame/Table (#7624) @rgsl888prabhu
- Fix specifying GPU architecture in JNI build (#7612) @jlowe
- Fix ORC writer OOM issue (#7605) @vuule
- Fix 0.18 --> 0.19 automerge (#7589) @kkraus14
- Fix ORC issue with incorrect timestamp nanosecond values (#7581) @vuule
- Fix missing Dask imports (#7580) @kkraus14
- CMAKE_CUDA_ARCHITECTURES doesn't change when build-system invokes cmake (#7579) @robertmaynard
- Another fix for offsets_end() iterator in lists_column_view (#7575) @ttnghia
- Fix ORC writer output corruption with string columns (#7565) @vuule
- Fix cudf::lists::sort_lists failing for sliced column (#7564) @ttnghia
- FIX Fix Anaconda upload args (#7558) @dillon-cullinan
- Fix index mismatch issue in equality related APIs (#7555) @galipremsagar
- FIX Revert gpuci_conda_retry on conda file output locations (#7552) @dillon-cullinan
- Fix offset_end iterator for lists_column_view, which was not correctl… (#7551) @ttnghia
- Fix no such file dlpack.h error when build libcudf (#7549) @chenrui17
- Update missing docstring examples in python public APIs (#7546) @galipremsagar
- Decimal32 Build Fix (#7544) @razajafri
- FIX Retry conda output location (#7540) @dillon-cullinan
- fix missing renames of dask git branches from master to main (#7535) @kkraus14
- Remove detail from device_span (#7533) @rwlee
- Change dask and distributed branch to main (#7532) @dantegd
- Update JNI build to use CUDF_USE_ARROW_STATIC (#7526) @jlowe
- Make sure rmm::rmm CMake target is visible to cudf users (#7524) @robertmaynard
- Fix contiguous_split not properly handling output partitions > 2 GB. (#7515) @nvdbaranec
- Change jit launch to safe_launch (#7510) @devavret
- Fix comparison between Datetime/Timedelta columns and NULL scalars (#7504) @brandon-b-miller
- Fix off-by-one error in char-parallel string scalar replace (#7502) @jlowe
- Fix JNI deprecation of all, put it on the wrong version before (#7501) @revans2
- Fix Series/Dataframe Mixed Arithmetic (#7491) @brandon-b-miller
- Fix JNI build after removal of libcudf sub-libraries (#7486) @jlowe
- Correctly compile benchmarks (#7485) @robertmaynard
- Fix bool column corruption with ORC Reader (#7483) @rgsl888prabhu
- Fix __repr__ for categorical dtype (#7476) @galipremsagar
- Java cleaner synchronization (#7474) @abellina
- Fix java float/double parsing tests (#7473) @revans2
- Pass stream and user resource to make_default_constructed_scalar (#7469) @magnatelee
- Improve stability of dask_cudf.DataFrame.var and dask_cudf.DataFrame.std (#7453) @rjzamora
- Missing device_storage_dispatch change affecting cudf::gather (#7449) @codereport
- fix cuFile JNI compile errors (#7445) @rongou
- Support Series.__setitem__ with key to a new row (#7443) @isVoid
- Fix BUG: Exception when PYTHONOPTIMIZE=2 (#7434) @skirui-source
- Make inclusive scan safe for cases with leading nulls (#7432) @magnatelee
- Fix typo in list_device_view::pair_rep_end() (#7423) @mythrocks
- Fix string to double conversion and row equivalent comparison (#7410) @ttnghia
- Fix thrust failure when transferring data from device_vector to host_vector with vectors of size 1 (#7382) @ttnghia
- Fix std::exception catch-by-reference gcc9 compile error (#7380) @davidwendt
- Fix skiprows issue with ORC Reader (#7359) @rgsl888prabhu
- fix Arrow CMake file (#7358) @rongou
- Fix lists::contains() for NaN and Decimals (#7349) @mythrocks
- Handle cupy array in Dataframe.__setitem__ (#7340) @galipremsagar
- Fix invalid-device-fn error in cudf::strings::replace_re with multiple regex's (#7336) @davidwendt
- FIX Add codecov upload block to gpu script (#6860) @dillon-cullinan
📖 Documentation
- Fix join API doxygen (#7890) @shwina
- Add Resources to README. (#7697) @bdice
- Add isin examples in Docstring (#7479) @galipremsagar
- Resolving unlinked type shorthands in cudf doc (#7416) @isVoid
- Fix typo in regex.md doc page (#7363) @davidwendt
- Fix incorrect strings_column_view::chars_size documentation (#7360) @jlowe
🚀 New Features
- Enable basic reductions for decimal columns (#7776) @ChrisJar
- Enable join on decimal columns (#7764) @ChrisJar
- Allow merging index column with data column using keyword "on" (#7736) @skirui-source
- Implement DecimalColumn + Scalar and add cudf.Scalars of Decimal64Dtype (#7732) @brandon-b-miller
- Add support for unique groupby aggregation (#7726) @shwina
- Expose libcudf's label_bins function to cudf (#7724) @vyasr
- Adding support for equi-join on struct (#7720) @hyperbolic2346
- Add decimal column comparison operations (#7716) @isVoid
- Implement scan operations for decimal columns (#7707) @ChrisJar
- Enable typecasting between decimal and int (#7691) @ChrisJar
- Enable decimal support in parquet writer (#7673) @devavret
- Adds list.unique API (#7664) @isVoid
- Fix NaN handling in drop_list_duplicates (#7662) @ttnghia
- Add lists.sort_values API (#7657) @isVoid
- Add is_integer API that can check for the validity of a string-to-integer conversion (#7642) @ttnghia
- Adds explode API (#7607) @isVoid
- Adds list.take, python binding for cudf::lists::segmented_gather (#7591) @isVoid
- Implement cudf::label_bins() (#7554) @vyasr
- Add Python b...
v0.18.1
v0.18.0
Breaking Changes 🚨
- Default groupby to sort=False (#7180) @isVoid
- Add libcudf API for parsing of ORC statistics (#7136) @vuule
- Replace ORC writer api with class (#7099) @rgsl888prabhu
- Pack/unpack functionality to convert tables to and from a serialized format. (#7096) @nvdbaranec
- Replace parquet writer api with class (#7058) @rgsl888prabhu
- Add days check to cudf::is_timestamp using cuda::std::chrono classes (#7028) @davidwendt
- Fix default parameter values of write_csv and write_parquet (#6967) @vuule
- Align Series.groupby API to match Pandas (#6964) @kkraus14
- Share factorize implementation with Index and cudf module (#6885) @brandon-b-miller
Bug Fixes 🐛
- Remove incorrect std::move call on return variable (#7319) @davidwendt
- Fix failing CI ORC test (#7313) @vuule
- Disallow constructing frames from a ColumnAccessor (#7298) @shwina
- fix java cuFile tests (#7296) @rongou
- Fix style issues related to NumPy (#7279) @shwina
- Fix bug when iloc slice terminates at before-the-zero position (#7277) @isVoid
- Fix copying dtype metadata after calling libcudf functions (#7271) @shwina
- Move lists utility function definition out of header (#7266) @mythrocks
- Throw if bool column would cause incorrect result when writing to ORC (#7261) @vuule
- Use uvector in replace_nulls; Fix sort_helper::grouped_value doc (#7256) @isVoid
- Remove floating point types from cudf::sort fast-path (#7250) @davidwendt
- Disallow picking output columns from nested columns. (#7248) @devavret
- Fix loc for Series with a MultiIndex (#7243) @shwina
- Fix Arrow column test leaks (#7241) @tgravescs
- Fix test column vector leak (#7238) @kuhushukla
- Fix some bugs in java scalar support for decimal (#7237) @revans2
- Improve assert_eq handling of scalar (#7220) @isVoid
- Fix missing null_count() comparison in test framework and related failures (#7219) @nvdbaranec
- Remove floating point types from radix sort fast-path (#7215) @davidwendt
- Fixing parquet benchmarks (#7214) @rgsl888prabhu
- Handle various parameter combinations in replace API (#7207) @galipremsagar
- Export mock aws credentials for s3 tests (#7176) @ayushdg
- Add MultiIndex.rename API (#7172) @isVoid
- Fix importing list & struct types in from_arrow (#7162) @galipremsagar
- Fixing parquet precision writing failing if scale is equal to precision (#7146) @hyperbolic2346
- Update s3 tests to use moto_server (#7144) @ayushdg
- Fix JIT cache multi-process test flakiness in slow drives (#7142) @devavret
- Fix compilation errors in libcudf (#7138) @galipremsagar
- Fix compilation failure caused by -Wall addition. (#7134) @codereport
- Add informative error message for sep in CSV writer (#7095) @galipremsagar
- Add JIT cache per compute capability (#7090) @devavret
- Implement __hash__ method for ListDtype (#7081) @galipremsagar
- Only upload packages that were built (#7077) @raydouglass
- Fix comparisons between Series and cudf.NA (#7072) @brandon-b-miller
- Handle nan values correctly in Series.one_hot_encoding (#7059) @galipremsagar
- Add unstack() support for non-multiindexed dataframes (#7054) @isVoid
- Fix read_orc for decimal type (#7034) @rgsl888prabhu
- Fix backward compatibility of loading a 0.16 pkl file (#7033) @galipremsagar
- Decimal casts in JNI became a NOOP (#7032) @revans2
- Restore usual instance/subclass checking to cudf.DateOffset (#7029) @shwina
- Add days check to cudf::is_timestamp using cuda::std::chrono classes (#7028) @davidwendt
- Fix to_csv delimiter handling of timestamp format (#7023) @davidwendt
- Pin librdkafka to gcc 7 compatible version (#7021) @raydouglass
- Fix fillna & dropna to also consider np.nan as a missing value (#7019) @galipremsagar
- Fix round operator's HALF_EVEN computation for negative integers (#7014) @nartal1
- Skip Thrust sort patch if already applied (#7009) @harrism
- Fix cudf::hash_partition for decimal32 and decimal64 (#7006) @codereport
- Fix Thrust unroll patch command (#7002) @harrism
- Fix loc behaviour when key of incorrect type is used (#6993) @shwina
- Fix int to datetime conversion in csv_read (#6991) @kaatish
- fix excluding cufile tests by default (#6988) @rongou
- Fix java cufile tests when cufile is not installed (#6987) @revans2
- Make cudf::round for fixed_point when scale = -decimal_places a no-op (#6975) @codereport
- Fix type comparison for java (#6970) @revans2
- Fix default parameter values of write_csv and write_parquet (#6967) @vuule
- Align Series.groupby API to match Pandas (#6964) @kkraus14
- Fix timestamp parsing in ORC reader for timezones without transitions (#6959) @vuule
- Fix typo in numerical.py (#6957) @rgsl888prabhu
- fixed_point_value double-shifts in fixed_point construction (#6950) @codereport
- fix libcu++ include path for jni (#6948) @rongou
- Fix groupby agg/apply behaviour when no key columns are provided (#6945) @shwina
- Avoid inserting null elements into join hash table when nulls are treated as unequal (#6943) @hyperbolic2346
- Fix cudf::merge gtest for dictionary columns (#6942) @davidwendt
- Pass numeric scalars of the same dtype through numeric binops (#6938) @brandon-b-miller
- Fix N/A detection for empty fields in CSV reader (#6922) @vuule
- Fix rmm_mode=managed parameter for gtests (#6912) @davidwendt
- Fix nullmask offset handling in parquet and orc writer (#6889) @kaatish
- Correct the sampling range when sampling with replacement (#6884) @ChrisJar
- Handle nested string columns with no children in contiguous_split. (#6864) @nvdbaranec
- Fix columns & index handling in dataframe constructor (#6838) @galipremsagar
Documentation 📖
- Update readme (#7318) @shwina
- Fix typo in cudf.core.column.string.extract docs (#7253) @adelevie
- Update doxyfile project number (#7161) @davidwendt
- Update 10 minutes to cuDF and CuPy with new APIs (#7158) @ChrisJar
- Cross link RMM & libcudf Doxygen docs (#7149) @ajschmidt8
- Add documentation for support dtypes in all IO formats (#7139) @galipremsagar
- Add groupby docs (#7100) @shwina
- Update cudf python docstrings with new null representation (<NA>) (#7050) @galipremsagar
- Make Doxygen comments formatting consistent (#7041) @vuule
- Add docs for working with missing data (#7010) @galipremsagar
- Remove warning in from_dlpack and to_dlpack methods (#7001) @miguelusque
- libcudf Developer Guide (#6977) @harrism
- Add JNI wrapper for the cuFile API (GDS) (#6940) @rongou
New Features 🚀
- Support numeric_only field for rank() (#7213) @isVoid
- Add support for cudf::binary_operation TRUE_DIV for decimal32 and decimal64 (#7198) @codereport
- Implement COLLECT rolling window aggregation (#7189) @mythrocks
- Add support for array-like inputs in cudf.get_dummies (#7181) @galipremsagar
- Default groupby to sort=False (#7180) @isVoid
- Add libcudf lists column count_elements API (#7173) @davidwendt
- Implement cudf::group_by (sort) for decimal32 and decimal64 (#7169) @codereport
- Add encoding and compression argument to CSV writer (#7168) @VibhuJawa
- cudf::rolling_window SUM support for decimal32 and decimal64 (#7147) @codereport
- Adding support for explode to cuDF (#7140) @hyperbolic2346
- Add libcudf API for parsing of ORC statistics (#7136) @vuule
- update GDS/cuFile location for 0.9 release (#7131) @rongou
- Add Segmented sort (#7122) @karthikeyann
- Add cudf::binary_operation NULL_MIN, NULL_MAX & NULL_EQUALS for decimal32 and decimal64 (#7119) @codereport
- Add scale and value methods to fixed_point (#7109) @codereport
- Replace ORC writer api with class (#7099) @rgsl888prabhu
- Pack/unpack functionality to convert tables to and from a serialized format. (#7096) @nvdbaranec
- Improve digitize API (#7071) @isVoid
- Add List types support in data generator (#7064) @galipremsagar
- cudf::scan support for decimal32 and decimal64 (#7063) @codereport
- cudf::rolling ROW_NUMBER support for decimal32 and decimal64 (#7061) @codereport
- Replace parquet writer api with class (#7058) @rgsl888prabhu
- Support contains() on lists of primitives (#7039) @mythrocks
- Implement cudf::rolling for decimal32 and decimal64 (#7037) @codereport
- Add ffill and bfill to string columns (#7036) @isVoid
- Enable round in cudf for DataFrame and Series (#7022) @ChrisJar
- Extend replace_nulls_policy to string and dictionary type (#7004) @isVoid
- Add segmented_gather(list_column, gather_list) (#7003) @karthikeyann
- Add method field to fillna for fixed width columns (#6998) @isVoid
- Manual merge of branch 0.17 into branch 0.18 (#6995) @shwina
- Implement cudf::reduce for decimal32 and decimal64 (part 2) (#6980) @codereport
- Add Ufunc alias look up for appropriate numpy ufunc dispatching (#6973) @VibhuJawa
- Add pytest-xdist to dev environment.yml (#6958) @galipremsagar
- Add Index.set_names api (#6929) @galipremsagar
- Add replace_null API with replace_policy parameter, fixed_width column support (#6907) @isVoid
- Share factorize implementation with Index and cudf module (#6885) @brandon-b-miller
- Implement update() function (#6883) @skirui-source
- Add groupby idxmin, idxmax aggregation (#6856) @karthikeyann
- Implement cudf::reduce for decimal32 and decimal64 (part 1) (#6814) @codereport
- Implement cudf.DateOffset for months (#6775) @brandon-b-miller
- Add Python DecimalColumn (#6715) @shwina
- Add dictionary support to libcudf groupby functions (#6585) @davidwendt
Improvements 🛠️
- Update stale GHA with exemptions & new labels (#7395) @mike-wendt
- Add GHA to mark issues/prs as stale/rotten (#7388) @Ethyling
- Unpin from numpy < 1.20 (#7335) @shwina
- Prepare Changelog for Automation (#7309) @galipremsagar
- Prepare Changelog for Automation (#7272) @ajschmidt8
- Add JNI support for converting Arrow buffers to CUDF ColumnVectors (#7222) @tgravescs
- Add coverage for skiprows and num_rows in parquet rea...
v0.17.0
[NIGHTLY] v0.18.0
🚨 Breaking Changes
- Default groupby to sort=False (#7180) @isVoid
- Add libcudf API for parsing of ORC statistics (#7136) @vuule
- Replace ORC writer api with class (#7099) @rgsl888prabhu
- Pack/unpack functionality to convert tables to and from a serialized format. (#7096) @nvdbaranec
- Replace parquet writer api with class (#7058) @rgsl888prabhu
- Add days check to cudf::is_timestamp using cuda::std::chrono classes (#7028) @davidwendt
- Fix default parameter values of write_csv and write_parquet (#6967) @vuule
- Align Series.groupby API to match Pandas (#6964) @kkraus14
- Share factorize implementation with Index and cudf module (#6885) @brandon-b-miller
🐛 Bug Fixes
- Fix null-bounds calculation for ranged window queries (#7568) @mythrocks
- Remove incorrect std::move call on return variable (#7319) @davidwendt
- Fix failing CI ORC test (#7313) @vuule
- Disallow constructing frames from a ColumnAccessor (#7298) @shwina
- fix java cuFile tests (#7296) @rongou
- Fix style issues related to NumPy (#7279) @shwina
- Fix bug when iloc slice terminates at before-the-zero position (#7277) @isVoid
- Fix copying dtype metadata after calling libcudf functions (#7271) @shwina
- Move lists utility function definition out of header (#7266) @mythrocks
- Throw if bool column would cause incorrect result when writing to ORC (#7261) @vuule
- Use uvector in replace_nulls; Fix sort_helper::grouped_value doc (#7256) @isVoid
- Remove floating point types from cudf::sort fast-path (#7250) @davidwendt
- Disallow picking output columns from nested columns. (#7248) @devavret
- Fix loc for Series with a MultiIndex (#7243) @shwina
- Fix Arrow column test leaks (#7241) @tgravescs
- Fix test column vector leak (#7238) @kuhushukla
- Fix some bugs in java scalar support for decimal (#7237) @revans2
- Improve assert_eq handling of scalar (#7220) @isVoid
- Fix missing null_count() comparison in test framework and related failures (#7219) @nvdbaranec
- Remove floating point types from radix sort fast-path (#7215) @davidwendt
- Fixing parquet benchmarks (#7214) @rgsl888prabhu
- Handle various parameter combinations in replace API (#7207) @galipremsagar
- Export mock aws credentials for s3 tests (#7176) @ayushdg
- Add MultiIndex.rename API (#7172) @isVoid
- Fix importing list & struct types in from_arrow (#7162) @galipremsagar
- Fixing parquet precision writing failing if scale is equal to precision (#7146) @hyperbolic2346
- Update s3 tests to use moto_server (#7144) @ayushdg
- Fix JIT cache multi-process test flakiness in slow drives (#7142) @devavret
- Fix compilation errors in libcudf (#7138) @galipremsagar
- Fix compilation failure caused by `-Wall` addition. (#7134) @codereport
- Add informative error message for `sep` in CSV writer (#7095) @galipremsagar
- Add JIT cache per compute capability (#7090) @devavret
- Implement `__hash__` method for ListDtype (#7081) @galipremsagar
- Only upload packages that were built (#7077) @raydouglass
- Fix comparisons between Series and cudf.NA (#7072) @brandon-b-miller
- Handle `nan` values correctly in `Series.one_hot_encoding` (#7059) @galipremsagar
- Add `unstack()` support for non-multiindexed dataframes (#7054) @isVoid
- Fix `read_orc` for decimal type (#7034) @rgsl888prabhu
- Fix backward compatibility of loading a 0.16 pkl file (#7033) @galipremsagar
- Decimal casts in JNI became a NOOP (#7032) @revans2
- Restore usual instance/subclass checking to cudf.DateOffset (#7029) @shwina
- Add days check to cudf::is_timestamp using cuda::std::chrono classes (#7028) @davidwendt
- Fix to_csv delimiter handling of timestamp format (#7023) @davidwendt
- Pin librdkafka to gcc 7 compatible version (#7021) @raydouglass
- Fix `fillna` & `dropna` to also consider `np.nan` as a missing value (#7019) @galipremsagar
- Fix round operator's HALF_EVEN computation for negative integers (#7014) @nartal1
- Skip Thrust sort patch if already applied (#7009) @harrism
- Fix `cudf::hash_partition` for `decimal32` and `decimal64` (#7006) @codereport
- Fix Thrust unroll patch command (#7002) @harrism
- Fix loc behaviour when key of incorrect type is used (#6993) @shwina
- Fix int to datetime conversion in csv_read (#6991) @kaatish
- fix excluding cufile tests by default (#6988) @rongou
- Fix java cufile tests when cufile is not installed (#6987) @revans2
- Make `cudf::round` for `fixed_point` when `scale = -decimal_places` a no-op (#6975) @codereport
- Fix type comparison for java (#6970) @revans2
- Fix default parameter values of `write_csv` and `write_parquet` (#6967) @vuule
- Align `Series.groupby` API to match Pandas (#6964) @kkraus14
- Fix timestamp parsing in ORC reader for timezones without transitions (#6959) @vuule
- Fix typo in numerical.py (#6957) @rgsl888prabhu
- `fixed_point_value` double-shifts in `fixed_point` construction (#6950) @codereport
- fix libcu++ include path for jni (#6948) @rongou
- Fix groupby agg/apply behaviour when no key columns are provided (#6945) @shwina
- Avoid inserting null elements into join hash table when nulls are treated as unequal (#6943) @hyperbolic2346
- Fix cudf::merge gtest for dictionary columns (#6942) @davidwendt
- Pass numeric scalars of the same dtype through numeric binops (#6938) @brandon-b-miller
- Fix N/A detection for empty fields in CSV reader (#6922) @vuule
- Fix rmm_mode=managed parameter for gtests (#6912) @davidwendt
- Fix nullmask offset handling in parquet and orc writer (#6889) @kaatish
- Correct the sampling range when sampling with replacement (#6884) @ChrisJar
- Handle nested string columns with no children in contiguous_split. (#6864) @nvdbaranec
- Fix `columns` & `index` handling in dataframe constructor (#6838) @galipremsagar
📖 Documentation
- Update readme (#7318) @shwina
- Fix typo in cudf.core.column.string.extract docs (#7253) @adelevie
- Update doxyfile project number (#7161) @davidwendt
- Update 10 minutes to cuDF and CuPy with new APIs (#7158) @ChrisJar
- Cross link RMM & libcudf Doxygen docs (#7149) @ajschmidt8
- Add documentation for support dtypes in all IO formats (#7139) @galipremsagar
- Add groupby docs (#7100) @shwina
- Update cudf python docstrings with new null representation (`<NA>`) (#7050) @galipremsagar
- Make Doxygen comments formatting consistent (#7041) @vuule
- Add docs for working with missing data (#7010) @galipremsagar
- Remove warning in from_dlpack and to_dlpack methods (#7001) @miguelusque
- libcudf Developer Guide (#6977) @harrism
- Add JNI wrapper for the cuFile API (GDS) (#6940) @rongou
🚀 New Features
- Support `numeric_only` field for `rank()` (#7213) @isVoid
- Add support for `cudf::binary_operation` `TRUE_DIV` for `decimal32` and `decimal64` (#7198) @codereport
- Implement COLLECT rolling window aggregation (#7189) @mythrocks
- Add support for array-like inputs in `cudf.get_dummies` (#7181) @galipremsagar
- Default `groupby` to `sort=False` (#7180) @isVoid
- Add libcudf lists column count_elements API (#7173) @davidwendt
- Implement `cudf::group_by` (sort) for `decimal32` and `decimal64` (#7169) @codereport
- Add encoding and compression argument to CSV writer (#7168) @VibhuJawa
- `cudf::rolling_window` `SUM` support for `decimal32` and `decimal64` (#7147) @codereport
- Adding support for explode to cuDF (#7140) @hyperbolic2346
- Add libcudf API for parsing of ORC statistics (#7136) @vuule
- update GDS/cuFile location for 0.9 release (#7131) @rongou
- Add Segmented sort (#7122) @karthikeyann
- Add `cudf::binary_operation` `NULL_MIN`, `NULL_MAX` & `NULL_EQUALS` for `decimal32` and `decimal64` (#7119) @codereport
- Add `scale` and `value` methods to `fixed_point` (#7109) @codereport
- Replace ORC writer api with class (#7099) @rgsl888prabhu
- Pack/unpack functionality to convert tables to and from a serialized format. (#7096) @nvdbaranec
- Improve `digitize` API (#7071) @isVoid
- Add List types support in data generator (#7064) @galipremsagar
- `cudf::scan` support for `decimal32` and `decimal64` (#7063) @codereport
- `cudf::rolling` `ROW_NUMBER` support for `decimal32` and `decimal64` (#7061) @codereport
- Replace parquet writer api with class (#7058) @rgsl888prabhu
- Support contains() on lists of primitives (#7039) @mythrocks
- Implement `cudf::rolling` for `decimal32` and `decimal64` (#7037) @codereport
- Add `ffill` and `bfill` to string columns (#7036) @isVoid
- Enable round in cudf for DataFrame and Series (#7022) @ChrisJar
- Extend `replace_nulls_policy` to `string` and `dictionary` type (#7004) @isVoid
- Add segmented_gather(list_column, gather_list) (#7003) @karthikeyann
- Add `method` field to `fillna` for fixed width columns (#6998) @isVoid
- Manual merge of branch 0.17 into branch 0.18 (#6995) @shwina
- Implement `cudf::reduce` for `decimal32` and `decimal64` (part 2) (#6980) @codereport
- Add Ufunc alias look up for appropriate numpy ufunc dispatching (#6973) @VibhuJawa
- Add pytest-xdist to dev environment.yml (#6958) @galipremsagar
- Add `Index.set_names` api (#6929) @galipremsagar
- Add `replace_null` API with `replace_policy` parameter, `fixed_width` column support (#6907) @isVoid
- Share `factorize` implementation with Index and cudf module (#6885) @brandon-b-miller
- Implement update() function (#6883) @skirui-source
- Add groupby idxmin, idxmax aggregation (#6856) @karthikeyann
- Implement `cudf::reduce` for `decimal32` and `decimal64` (part 1) (#6814) @codereport
- Implement cudf.DateOffset for months (#6775) @brandon-b-miller
- Add Python DecimalColumn (#6715) @shwina
- Add dictionary support to libcudf groupby functions (#6585) @davidwendt
🛠️ Improvements
- Update stale GHA with exemptions & new labels (#7395) @mike-wendt
- Add GHA to mark issues/prs as stale/rotten (#7388) @Ethyling
- Unpin from numpy < 1.20 (#7335) @shwina
- Prep...
v0.16.0
v0.15.0
v0.15.0 Release