- Fix a crash in pack() when being handed tables with no columns. (#8697) @nvdbaranec
- Remove unused cudf::strings::create_offsets (#8663) @davidwendt
- Add delimiter parameter to cudf::strings::capitalize() (#8620) @davidwendt
- Change default datetime index resolution to ns to match pandas (#8611) @vyasr
- Add sequence_type parameter to cudf::strings::title function (#8602) @davidwendt
- Add `strings::repeat_strings` API that can repeat each string a different number of times (#8561) @ttnghia
- String-to-boolean conversion is different from Pandas (#8549) @skirui-source
- Add accurate hash join size functions (#8453) @PointKernel
- Expose a Decimal32Dtype in cuDF Python (#8438) @skirui-source
- Update dask make_meta changes to be compatible with dask upstream (#8426) @galipremsagar
- Adapt `cudf::scalar` classes to changes in `rmm::device_scalar` (#8411) @harrism
- Remove special Index class from the general index class hierarchy (#8309) @vyasr
- Add first-class dtype utilities (#8308) @vyasr
- ORC - Support reading multiple orc files/buffers in a single operation (#8142) @jdye64
- Upgrade arrow to 4.0.1 (#7495) @galipremsagar
- Fix `contains` check in string column (#8834) @galipremsagar
- Remove unused variable from `row_bit_count_test`. (#8829) @mythrocks
- Fixes issue with null struct columns in ORC reader (#8819) @rgsl888prabhu
- Set CMake vars for python/parquet support in libarrow builds (#8808) @vyasr
- Handle empty child columns in row_bit_count() (#8791) @mythrocks
- Revert "Remove cudf unneeded build time requirement of the cuda driver" (#8784) @robertmaynard
- Fix isort error in utils.pyx (#8771) @charlesbluca
- Handle sliced struct/list columns properly in concatenate() bounds checking. (#8760) @nvdbaranec
- Fix issues with `_CPackedColumns.serialize()` handling of host and device data (#8759) @charlesbluca
- Fix issues with `MultiIndex` in `dropna`, `stack` & `reset_index` (#8753) @galipremsagar
- Write pandas extension types to parquet file metadata (#8749) @devavret
- Fix `where` to handle `DataFrame` & `Series` input combination (#8747) @galipremsagar
- Fix `replace` to handle null values correctly (#8744) @galipremsagar
- Handle sliced structs properly in pack/contiguous_split. (#8739) @nvdbaranec
- Fix issue in slice() where columns with a positive offset were computing null counts incorrectly. (#8738) @nvdbaranec
- Fix `cudf.Series` constructor to handle list of sequences (#8735) @galipremsagar
- Fix min/max sorted groupby aggregation on string column with nulls (argmin, argmax sentinel value missing on nulls) (#8731) @karthikeyann
- Fix orc reader assert on create data_type in debug (#8706) @davidwendt
- Fix min/max inclusive cudf::scan for strings column (#8705) @davidwendt
- JNI: Fix driver version assertion logic in testGetCudaRuntimeInfo (#8701) @sperlingxx
- Adding fix for skip_rows and crash in orc reader (#8700) @rgsl888prabhu
- Bug fix: `replace_nulls_policy` functor not returning correct indices for gathermap (#8699) @isVoid
- Fix a crash in pack() when being handed tables with no columns. (#8697) @nvdbaranec
- Add post-processing steps to `dask_cudf.groupby.CudfSeriesGroupby.aggregate` (#8694) @charlesbluca
- JNI build no longer looks for Arrow in conda environment (#8686) @jlowe
- Handle arbitrarily different data in null list column rows when checking for equivalency. (#8666) @nvdbaranec
- Add ConfigureNVBench to avoid concurrent main() entry points (#8662) @PointKernel
- Pin `*arrow` to use `*cuda` in `run` (#8651) @jakirkham
- Add proper support for tolerances in testing methods. (#8649) @vyasr
- Support multi-char case conversion in capitalize function (#8647) @davidwendt
- Fix repeated mangled names in read_csv with duplicate column names (#8645) @karthikeyann
- Temporarily disable libcudf example build tests (#8642) @isVoid
- Use conda-sourced cudf artifacts for libcudf example in CI (#8638) @isVoid
- Ensure dev environment uses Arrow GPU packages (#8637) @charlesbluca
- Fix bug that columns only initialized once when specified `columns` and `index` in dataframe ctor (#8628) @isVoid
- Propagate **kwargs through to as_*_column methods (#8618) @shwina
- Fix orc_reader_benchmark.cpp compile error (#8609) @davidwendt
- Fix missed renumbering of Aggregation values (#8600) @revans2
- Update cmake to 3.20.5 in the Java Docker image (#8593) @NvTimLiu
- Fix bug in replace_with_backrefs when group has greedy quantifier (#8575) @davidwendt
- Apply metadata to keys before returning in `Frame._encode` (#8560) @charlesbluca
- Fix for strings containing special JSON characters in get_json_object(). (#8556) @nvdbaranec
- Fix debug compile error in gather_struct_tests.cpp (#8554) @davidwendt
- String-to-boolean conversion is different from Pandas (#8549) @skirui-source
- Fix `__repr__` output with `display.max_rows` is `None` (#8547) @galipremsagar
- Fix size passed to column constructors in _with_type_metadata (#8539) @shwina
- Properly retrieve last column when `-1` is specified for column index (#8529) @isVoid
- Fix importing `apply` from `dask` (#8517) @galipremsagar
- Fix offset of the string dictionary length stream (#8515) @vuule
- Fix double counting of selected columns in CSV reader (#8508) @ochan1
- Incorrect map size in scatter_to_gather corrupts struct columns (#8507) @gerashegalov
- replace_nulls properly propagates memory resource to gather calls (#8500) @robertmaynard
- Disallow groupby aggs for `StructColumns` (#8499) @charlesbluca
- Fixes out-of-bounds access for small files in unzip (#8498) @elstehle
- Adding support for writing empty dataframe (#8490) @shaneding
- Fix exclusive scan when including nulls and improve testing (#8478) @harrism
- Add workaround for crash in libcudf debug build using output_indexalator in thrust::lower_bound (#8432) @davidwendt
- Install only the same Thrust files that Thrust itself installs (#8420) @robertmaynard
- Add nightly version for ucx-py in ci script (#8419) @galipremsagar
- Fix null_equality config of rolling_collect_set (#8415) @sperlingxx
- CollectSetAggregation: implement RollingAggregation interface (#8406) @sperlingxx
- Handle pre-sliced nested columns in contiguous_split. (#8391) @nvdbaranec
- Fix bitmask_tests.cpp host accessing device memory (#8370) @davidwendt
- Fix concurrent_unordered_map to prevent accessing padding bits in pair_type (#8348) @davidwendt
- BUG FIX: Raise appropriate strings error when concatenating strings column (#8290) @skirui-source
- Make gpuCI and pre-commit style configurations consistent (#8215) @charlesbluca
- Add collect list to dask-cudf groupby aggregations (#8045) @charlesbluca
- Update Python UDFs notebook (#8810) @brandon-b-miller
- Fix dask.dataframe API docs links after reorg (#8772) @jsignell
- Fix instructions for running cuDF/dask-cuDF tests in CONTRIBUTING.md (#8724) @shwina
- Translate Markdown documentation to rST and remove recommonmark (#8698) @vyasr
- Fixed spelling mistakes in libcudf documentation (#8664) @karthikeyann
- Custom Sphinx Extension: `PandasCompat` (#8643) @isVoid
- Fix README.md (#8535) @ajschmidt8
- Change namespace contains_nulls to struct (#8523) @davidwendt
- Add info about NVTX ranges to dev guide (#8461) @jrhemstad
- Fixed documentation bug in groupby agg method (#8325) @ahmet-uyar
- Fix concatenating structs (#8811) @shaneding
- Implement JNI for groupby aggregations `M2` and `MERGE_M2` (#8763) @ttnghia
- Bump `isort` to `5.6.4` and remove `isort` overrides made for 5.0.7 (#8755) @charlesbluca
- Implement `__setitem__` for `StructColumn` (#8737) @shaneding
- Add `is_leap_year` to `DateTimeProperties` and `DatetimeIndex` (#8736) @isVoid
- Add `struct.explode()` method (#8729) @shwina
- Add `DataFrame.to_struct()` method to convert a DataFrame to a struct Series (#8728) @shwina
- Add support for list type in ORC writer (#8723) @vuule
- Fix slicing from struct columns and accessing struct columns (#8719) @shaneding
- Add `datetime::is_leap_year` (#8711) @isVoid
- Accessing struct columns from `dask_cudf` (#8675) @shaneding
- Added pct_change to Series (#8650) @TravisHester
- Add strings support to cudf::shift function (#8648) @davidwendt
- Support Scatter `struct_scalar` (#8630) @isVoid
- Struct scalar from host dictionary (#8629) @shaneding
- Add dayofyear and day_of_year to Series, DatetimeColumn, and DatetimeIndex (#8626) @beckernick
- JNI support for capitalize (#8624) @firestarman
- Add delimiter parameter to cudf::strings::capitalize() (#8620) @davidwendt
- Add NVBench in CMake (#8619) @PointKernel
- Change default datetime index resolution to ns to match pandas (#8611) @vyasr
- ListColumn `__setitem__` (#8606) @brandon-b-miller
- Implement groupby aggregations `M2` and `MERGE_M2` (#8605) @ttnghia
- Add sequence_type parameter to cudf::strings::title function (#8602) @davidwendt
- Adding support for list and struct type in ORC Reader (#8599) @rgsl888prabhu
- Benchmark for `strings::repeat_strings` APIs (#8589) @ttnghia
- Nested scalar support for copy if else (#8588) @gerashegalov
- User specified decimal columns to float64 (#8587) @jdye64
- Add `get_element` for struct column (#8578) @isVoid
- Python changes for adding `__getitem__` for `struct` (#8577) @shaneding
- Add `strings::repeat_strings` API that can repeat each string a different number of times (#8561) @ttnghia
- Refactor `tests/iterator_utilities.hpp` functions (#8540) @ttnghia
- Support MERGE_LISTS and MERGE_SETS in Java package (#8516) @sperlingxx
- Decimal support csv reader (#8511) @elstehle
- Add column type tests (#8505) @isVoid
- Warn when downscaling decimal columns (#8492) @ChrisJar
- Add JNI for `strings::repeat_strings` (#8491) @ttnghia
- Add `Index.get_loc` for Numerical, String Index support (#8489) @isVoid
- Expose half_up rounding in cuDF (#8477) @shwina
- Java APIs to fetch CUDA runtime info (#8465) @sperlingxx
- Add `str.edit_distance_matrix` (#8463) @isVoid
- Support constructing `cudf.Scalar` objects from host side lists (#8459) @brandon-b-miller
- Add accurate hash join size functions (#8453) @PointKernel
- Add cudf::strings::integer_to_hex convert API (#8450) @davidwendt
- Create objects from iterables that contain cudf.NA (#8442) @brandon-b-miller
- JNI bindings for sort_lists (#8439) @sperlingxx
- Expose a Decimal32Dtype in cuDF Python (#8438) @skirui-source
- Replace `all_null()` and `all_valid()` by `iterator_all_nulls()` and `iterator_no_null()` in tests (#8437) @ttnghia
- Implement groupby `MERGE_LISTS` and `MERGE_SETS` aggregates (#8436) @ttnghia
- Add public libcudf match_dictionaries API (#8429) @davidwendt
- Add move constructors for `string_scalar` and `struct_scalar` (#8428) @ttnghia
- Implement `strings::repeat_strings` (#8423) @ttnghia
- STRUCT column support for cudf::merge. (#8422) @nvdbaranec
- Implement reverse in libcudf (#8410) @shaneding
- Support multiple input files/buffers for read_json (#8403) @jdye64
- Improve test coverage for struct search (#8396) @ttnghia
- Add `groupby.fillna` (#8362) @isVoid
- Enable AST-based joining (#8214) @vyasr
- Generalized null support in user defined functions (#8213) @brandon-b-miller
- Add compiled binary operation (#8192) @karthikeyann
- Implement `.describe()` for `DataFrameGroupBy` (#8179) @skirui-source
- ORC - Support reading multiple orc files/buffers in a single operation (#8142) @jdye64
- Add Python bindings for `lists::concatenate_list_elements` and expose them as `.list.concat()` (#8006) @shwina
- Use Arrow URI FileSystem backed instance to retrieve remote files (#7709) @jdye64
- Example to build custom application and link to libcudf (#7671) @isVoid
- Upgrade arrow to 4.0.1 (#7495) @galipremsagar
- Provide a better error message when `CUDA::cuda_driver` not found (#8794) @robertmaynard
- Remove anonymous namespace from null_mask.cuh (#8786) @nvdbaranec
- Allow cudf to be built without libcuda.so existing (#8751) @robertmaynard
- Pin `mimesis` to `<4.1` (#8745) @galipremsagar
- Update `conda` environment name for CI (#8692) @ajschmidt8
- Remove flatbuffers dependency (#8671) @Ethyling
- Add options to build Arrow with Python and Parquet support (#8670) @trxcllnt
- Remove unused cudf::strings::create_offsets (#8663) @davidwendt
- Update GDS lib version to 1.0.0 (#8654) @pxLi
- Support for groupby/scan rank and dense_rank aggregations (#8652) @rwlee
- Fix usage of deprecated arrow ipc API (#8632) @revans2
- Use absolute imports in `cudf` (#8631) @galipremsagar
- ENH Add Java CI build script (#8627) @dillon-cullinan
- Add DeprecationWarning to `ser.str.subword_tokenize` (#8603) @VibhuJawa
- Rewrite binary operations for improved performance and additional type support (#8598) @vyasr
- Fix `mypy` errors surfacing because of `numpy-1.21.0` (#8595) @galipremsagar
- Remove unneeded includes from cudf::string_view headers (#8594) @davidwendt
- Use cmake 3.20.1 as it is now required by rmm (#8586) @robertmaynard
- Remove device debug symbols from cmake CUDF_CUDA_FLAGS (#8584) @davidwendt
- Dask-CuDF: use default Dask Dataframe optimizer (#8581) @madsbk
- Remove checking if an unsigned value is less than zero (#8579) @robertmaynard
- Remove strings_count parameter from cudf::strings::detail::create_chars_child_column (#8576) @davidwendt
- Make `cudf.api.types` imports consistent (#8571) @galipremsagar
- Modernize libcudf basic example CMakeFile; updates CI build tests (#8568) @isVoid
- Rename concatenate_tests.cu to .cpp (#8555) @davidwendt
- enable window lead/lag test on struct (#8548) @wbo4958
- Add Java methods to split and write column views (#8546) @razajafri
- Small cleanup (#8534) @codereport
- Unpin `dask` version in CI (#8533) @galipremsagar
- Added optional flag for building Arrow with S3 filesystem support (#8531) @jdye64
- Minor clean up of various internal column and frame utilities (#8528) @vyasr
- Rename some copying_test source files .cu to .cpp (#8527) @davidwendt
- Correct the last warnings and issues when using newer cuda versions (#8525) @robertmaynard
- Correct unused parameter warnings in transform and unary ops (#8521) @robertmaynard
- Correct unused parameter warnings in string algorithms (#8509) @robertmaynard
- Add in JNI APIs for scan, replace_nulls, group_by.scan, and group_by.replace_nulls (#8503) @revans2
- Fix `21.08` forward-merge conflicts (#8502) @ajschmidt8
- Fix Cython formatting command in Contributing.md. (#8496) @marlenezw
- Bug/correct unused parameters in reshape and text (#8495) @robertmaynard
- Correct unused parameter warnings in partitioning and stream compact (#8494) @robertmaynard
- Correct unused parameter warnings in labelling and list algorithms (#8493) @robertmaynard
- Refactor index construction (#8485) @vyasr
- Correct unused parameter warnings in replace algorithms (#8483) @robertmaynard
- Correct unused parameter warnings in reduction algorithms (#8481) @robertmaynard
- Correct unused parameter warnings in io algorithms (#8480) @robertmaynard
- Correct unused parameter warnings in interop algorithms (#8479) @robertmaynard
- Correct unused parameter warnings in filling algorithms (#8468) @robertmaynard
- Correct unused parameter warnings in groupby (#8467) @robertmaynard
- use libcu++ time_point as timestamp (#8466) @karthikeyann
- Modify reprog_device::extract to return groups in a single pass (#8460) @davidwendt
- Update minimum Dask requirement to 2021.6.0 (#8458) @pentschev
- Fix failures when performing binary operations on DataFrames with empty columns (#8452) @ChrisJar
- Fix conflicts in `8447` (#8448) @ajschmidt8
- Add serialization methods for `List` and `StructDtype` (#8441) @charlesbluca
- Replace make_empty_strings_column with make_empty_column (#8435) @davidwendt
- JNI bindings for get_element (#8433) @revans2
- Update dask make_meta changes to be compatible with dask upstream (#8426) @galipremsagar
- Unpin dask version on CI (#8425) @galipremsagar
- Add benchmark for strings/fixed_point convert APIs (#8417) @davidwendt
- Adapt `cudf::scalar` classes to changes in `rmm::device_scalar` (#8411) @harrism
- Add benchmark for strings/integers convert APIs (#8402) @davidwendt
- Enable multi-file partitioning in dask_cudf.read_parquet (#8393) @rjzamora
- Correct unused parameter warnings in rolling algorithms (#8390) @robertmaynard
- Correct unused parameters in column round and search (#8389) @robertmaynard
- Add functionality to apply `Dtype` metadata to `ColumnBase` (#8373) @charlesbluca
- Refactor setting stack size in regex code (#8358) @davidwendt
- Update Java bindings to 21.08-SNAPSHOT (#8344) @pxLi
- Replace remaining uses of device_vector (#8343) @harrism
- Statically link libnvcomp into libcudfjni (#8334) @jlowe
- Resolve auto merge conflicts for Branch 21.08 from branch 21.06 (#8329) @galipremsagar
- Minor code refactor for sorted_order (#8326) @wbo4958
- Remove special Index class from the general index class hierarchy (#8309) @vyasr
- Add first-class dtype utilities (#8308) @vyasr
- Add option to link Java bindings with Arrow dynamically (#8307) @jlowe
- Refactor ColumnMethods and its subclasses to remove `column` argument and require `parent` argument (#8306) @shwina
- Refactor `scatter` for list columns (#8255) @isVoid
- Expose pack/unpack API to Python (#8153) @charlesbluca
- Adding cudf.cut method (#8002) @marlenezw
- Optimize string gather performance for large strings (#7980) @gaohao95
- Add peak memory usage tracking to cuIO benchmarks (#7770) @devavret
- Updating Clang Version to 11.0.0 (#6695) @codereport
- Add support for `make_meta_obj` dispatch in `dask-cudf` (#8342) @galipremsagar
- Add separator-on-null parameter to strings concatenate APIs (#8282) @davidwendt
- Introduce a common parent class for NumericalColumn and DecimalColumn (#8278) @vyasr
- Update ORC statistics API to use C++17 standard library (#8241) @vuule
- Preserve column hierarchy when getting NULL row from `LIST` column (#8206) @isVoid
- `Groupby.shift` c++ API refactor and python binding (#8131) @isVoid
- Fix struct flattening to add a validity column only when the input column has null element (#8374) @ttnghia
- Compilation fix: Remove redefinition for `std::is_same_v()` (#8369) @mythrocks
- Add backward compatibility for `dask-cudf` to work with other versions of `dask` (#8368) @galipremsagar
- Handle empty results with nested types in copy_if_else (#8359) @nvdbaranec
- Handle nested column types properly for empty parquet files. (#8350) @nvdbaranec
- Raise error when unsupported arguments are passed to `dask_cudf.DataFrame.sort_values` (#8349) @galipremsagar
- Raise `NotImplementedError` for axis=1 in `rank` (#8347) @galipremsagar
- Add support for `make_meta_obj` dispatch in `dask-cudf` (#8342) @galipremsagar
- Update Java string concatenate test for single column (#8330) @tgravescs
- Use empty_like in scatter (#8314) @revans2
- Fix concatenate_lists_ignore_null on rows of all_nulls (#8312) @sperlingxx
- Add separator-on-null parameter to strings concatenate APIs (#8282) @davidwendt
- COLLECT_LIST support returning empty output columns. (#8279) @mythrocks
- Update io util to convert path like object to string (#8275) @ayushdg
- Fix result column types for empty inputs to rolling window (#8274) @mythrocks
- Actually test equality in assert_groupby_results_equal (#8272) @shwina
- CMake always explicitly specify a source files extension (#8270) @robertmaynard
- Fix struct binary search and struct flattening (#8268) @ttnghia
- Revert "patch thrust to fix intmax num elements limitation in scan_by_key" (#8263) @cwharris
- upgrade dlpack to 0.5 (#8262) @cwharris
- Fixes CSV-reader type inference for thousands separator and decimal point (#8261) @elstehle
- Fix incorrect assertion in Java concat (#8258) @sperlingxx
- Copy nested types upon construction (#8244) @isVoid
- Preserve column hierarchy when getting NULL row from `LIST` column (#8206) @isVoid
- Clip decimal binary op precision at max precision (#8194) @ChrisJar
- Add docstring for `dask_cudf.read_csv` (#8355) @galipremsagar
- Fix cudf release version in readme (#8331) @galipremsagar
- Fix structs column description in dev docs (#8318) @isVoid
- Update readme with correct CUDA versions (#8315) @raydouglass
- Add description of the cuIO GDS integration (#8293) @vuule
- Remove unused parameter from copy_partition kernel documentation (#8283) @robertmaynard
- Add support merging b/w categorical data (#8332) @galipremsagar
- Java: Support struct scalar (#8327) @sperlingxx
- added _is_homogeneous property (#8299) @shaneding
- Added decimal writing for CSV writer (#8296) @kaatish
- Java: Support creating a scalar from utf8 string (#8294) @firestarman
- Add Java API for Concatenate strings with separator (#8289) @tgravescs
- `strings::join_list_elements` options for empty list inputs (#8285) @ttnghia
- Return python lists for getitem calls to list type series (#8265) @brandon-b-miller
- add unit tests for lead/lag on list for row window (#8259) @wbo4958
- Create a String column from UTF8 String byte arrays (#8257) @firestarman
- Support scattering `list_scalar` (#8256) @isVoid
- Implement `lists::concatenate_list_elements` (#8231) @ttnghia
- Support for struct scalars. (#8220) @nvdbaranec
- Add support for decimal types in ORC writer (#8198) @vuule
- Support create lists column from a `list_scalar` (#8185) @isVoid
- `Groupby.shift` c++ API refactor and python binding (#8131) @isVoid
- Add `groupby::replace_nulls(replace_policy)` api (#7118) @isVoid
- Support Dask + Distributed 2021.05.1 (#8392) @jakirkham
- Add aliases for string methods (#8353) @shwina
- Update environment variable used to determine `cuda_version` (#8321) @ajschmidt8
- JNI: Refactor the code of making column from scalar (#8310) @firestarman
- Update `CHANGELOG.md` links for calver (#8303) @ajschmidt8
- Merge `branch-0.19` into `branch-21.06` (#8302) @ajschmidt8
- use address and length for GDS reads/writes (#8301) @rongou
- Update cudfjni version to 21.06.0 (#8292) @pxLi
- Update docs build script (#8284) @ajschmidt8
- Make device_buffer streams explicit and enforce move construction (#8280) @harrism
- Introduce a common parent class for NumericalColumn and DecimalColumn (#8278) @vyasr
- Do not add nulls to the hash table when null_equality::NOT_EQUAL is passed to left_semi_join and left_anti_join (#8277) @nvdbaranec
- Enable implicit casting when concatenating mixed types (#8276) @ChrisJar
- Fix CMake FindPackage rmm, pin dev envs' dlpack to v0.3 (#8271) @trxcllnt
- Update cudfjni version to 21.06 (#8267) @pxLi
- support RMM aligned resource adapter in JNI (#8266) @rongou
- Pass compiler environment variables to conda python build (#8260) @Ethyling
- Remove abc inheritance from Serializable (#8254) @vyasr
- Move more methods into SingleColumnFrame (#8253) @vyasr
- Update ORC statistics API to use C++17 standard library (#8241) @vuule
- Correct unused parameter warnings in dictionary algorithms (#8239) @robertmaynard
- Correct unused parameters in the copying algorithms (#8232) @robertmaynard
- IO statistics cleanup (#8191) @kaatish
- Refactor of rolling_window implementation. (#8158) @nvdbaranec
- Add a flag for allowing single quotes in JSON strings. (#8144) @nvdbaranec
- Column refactoring 2 (#8130) @vyasr
- support space in workspace (#7956) @jolorunyomi
- Support collect_set on rolling window (#7881) @sperlingxx
- Allow hash_partition to take a seed value (#7771) @magnatelee
- Allow merging index column with data column using keyword "on" (#7736) @skirui-source
- Change JNI API to avoid loading native dependencies when creating sort order classes. (#7729) @revans2
- Replace device_vector with device_uvector in null_mask (#7715) @harrism
- Don't identify decimals as strings. (#7710) @vyasr
- Fix Java Parquet write after writer API changes (#7655) @revans2
- Convert cudf::concatenate APIs to use spans and device_uvector (#7621) @harrism
- Update missing docstring examples in python public APIs (#7546) @galipremsagar
- Remove unneeded step parameter from strings::detail::copy_slice (#7525) @davidwendt
- Rename ARROW_STATIC_LIB because it conflicts with one in FindArrow.cmake (#7518) @trxcllnt
- Match Pandas logic for comparing two objects with nulls (#7490) @brandon-b-miller
- Add struct support to parquet writer (#7461) @devavret
- Join APIs that return gathermaps (#7454) @shwina
- `fixed_point` + `cudf::binary_operation` API Changes (#7435) @codereport
- Fix BUG: Exception when PYTHONOPTIMIZE=2 (#7434) @skirui-source
- Change nvtext::load_vocabulary_file to return a unique ptr (#7424) @davidwendt
- Refactor strings column factories (#7397) @harrism
- Use CMAKE_CUDA_ARCHITECTURES (#7391) @robertmaynard
- Upgrade pandas to 1.2 (#7375) @galipremsagar
- Rename `logical_cast` to `bit_cast` and allow additional conversions (#7373) @ttnghia
- Rework libcudf CMakeLists.txt to export targets for CPM (#7107) @trxcllnt
- Fix a `NameError` in meta dispatch API (#7996) @galipremsagar
- Reindex in `DataFrame.__setitem__` (#7957) @galipremsagar
- jitify direct-to-cubin compilation and caching. (#7919) @cwharris
- Use dynamic cudart for nvcomp in java build (#7896) @abellina
- fix "incompatible redefinition" warnings (#7894) @cwharris
- cudf consistently specifies the cuda runtime (#7887) @robertmaynard
- disable verbose output for jitify_preprocess (#7886) @cwharris
- CMake jit_preprocess_files function only runs when needed (#7872) @robertmaynard
- Push DeviceScalar construction into cython for list.contains (#7864) @brandon-b-miller
- cudf now sets an install rpath of $ORIGIN (#7863) @robertmaynard
- Don't install Thrust examples, tests, docs, and python files (#7811) @robertmaynard
- Sort by index in groupby tests more consistently (#7802) @shwina
- Revert "Update conda recipes pinning of repo dependencies (#7743)" (#7793) @raydouglass
- Add decimal column handling in copy_type_metadata (#7788) @shwina
- Add column names validation in parquet writer (#7786) @galipremsagar
- Fix Java explode outer unit tests (#7782) @jlowe
- Fix compiler warning about non-POD types passed through ellipsis (#7781) @jrhemstad
- User resource fix for replace_nulls (#7769) @magnatelee
- Fix type dispatch for columnar replace_nulls (#7768) @jlowe
- Add `ignore_order` parameter to dask-cudf concat dispatch (#7765) @galipremsagar
- Fix slicing and arrow representations of decimal columns (#7755) @vyasr
- Fixing issue with explode_outer position not nulling position entries of null rows (#7754) @hyperbolic2346
- Implement scatter for struct columns (#7752) @ttnghia
- Fix data corruption in string columns (#7746) @galipremsagar
- Fix string length in stripe dictionary building (#7744) @kaatish
- Update conda recipes pinning of repo dependencies (#7743) @mike-wendt
- Enable dask dispatch to cuDF's `is_categorical_dtype` for cuDF objects (#7740) @brandon-b-miller
- Fix dictionary size computation in ORC writer (#7737) @vuule
- Fix `cudf::cast` overflow for `decimal64` to `int32_t` or smaller in certain cases (#7733) @codereport
- Change JNI API to avoid loading native dependencies when creating sort order classes. (#7729) @revans2
- Disable column_view data accessors for unsupported types (#7725) @jrhemstad
- Materialize `RangeIndex` when `index=True` in parquet writer (#7711) @galipremsagar
- Don't identify decimals as strings. (#7710) @vyasr
- Fix return type of `DataFrame.argsort` (#7706) @galipremsagar
- Fix/correct cudf installed package requirements (#7688) @robertmaynard
- Fix SparkMurmurHash3_32 hash inconsistencies with Apache Spark (#7672) @jlowe
- Fix ORC reader issue with reading empty string columns (#7656) @rgsl888prabhu
- Fix Java Parquet write after writer API changes (#7655) @revans2
- Fixing empty null lists throwing explode_outer for a loop. (#7649) @hyperbolic2346
- Fix internal compiler error during JNI Docker build (#7645) @jlowe
- Fix Debug build break with device_uvectors in grouped_rolling.cu (#7633) @mythrocks
- Parquet reader: Fix issue when using skip_rows on non-nested columns containing nulls (#7627) @nvdbaranec
- Fix ORC reader for empty DataFrame/Table (#7624) @rgsl888prabhu
- Fix specifying GPU architecture in JNI build (#7612) @jlowe
- Fix ORC writer OOM issue (#7605) @vuule
- Fix 0.18 --> 0.19 automerge (#7589) @kkraus14
- Fix ORC issue with incorrect timestamp nanosecond values (#7581) @vuule
- Fix missing Dask imports (#7580) @kkraus14
- CMAKE_CUDA_ARCHITECTURES doesn't change when build-system invokes cmake (#7579) @robertmaynard
- Another fix for offsets_end() iterator in lists_column_view (#7575) @ttnghia
- Fix ORC writer output corruption with string columns (#7565) @vuule
- Fix cudf::lists::sort_lists failing for sliced column (#7564) @ttnghia
- FIX Fix Anaconda upload args (#7558) @dillon-cullinan
- Fix index mismatch issue in equality related APIs (#7555) @galipremsagar
- FIX Revert gpuci_conda_retry on conda file output locations (#7552) @dillon-cullinan
- Fix offset_end iterator for lists_column_view, which was not correctl… (#7551) @ttnghia
- Fix no such file dlpack.h error when build libcudf (#7549) @chenrui17
- Update missing docstring examples in python public APIs (#7546) @galipremsagar
- Decimal32 Build Fix (#7544) @razajafri
- FIX Retry conda output location (#7540) @dillon-cullinan
- fix missing renames of dask git branches from master to main (#7535) @kkraus14
- Remove detail from device_span (#7533) @rwlee
- Change dask and distributed branch to main (#7532) @dantegd
- Update JNI build to use CUDF_USE_ARROW_STATIC (#7526) @jlowe
- Make sure rmm::rmm CMake target is visible to cudf users (#7524) @robertmaynard
- Fix contiguous_split not properly handling output partitions > 2 GB. (#7515) @nvdbaranec
- Change jit launch to safe_launch (#7510) @devavret
- Fix comparison between Datetime/Timedelta columns and NULL scalars (#7504) @brandon-b-miller
- Fix off-by-one error in char-parallel string scalar replace (#7502) @jlowe
- Fix JNI deprecation of all, put it on the wrong version before (#7501) @revans2
- Fix Series/Dataframe Mixed Arithmetic (#7491) @brandon-b-miller
- Fix JNI build after removal of libcudf sub-libraries (#7486) @jlowe
- Correctly compile benchmarks (#7485) @robertmaynard
- Fix bool column corruption with ORC Reader (#7483) @rgsl888prabhu
- Fix `__repr__` for categorical dtype (#7476) @galipremsagar
- Java cleaner synchronization (#7474) @abellina
- Fix java float/double parsing tests (#7473) @revans2
- Pass stream and user resource to make_default_constructed_scalar (#7469) @magnatelee
- Improve stability of dask_cudf.DataFrame.var and dask_cudf.DataFrame.std (#7453) @rjzamora
- Missing `device_storage_dispatch` change affecting `cudf::gather` (#7449) @codereport
- fix cuFile JNI compile errors (#7445) @rongou
- Support `Series.__setitem__` with key to a new row (#7443) @isVoid
- Fix BUG: Exception when PYTHONOPTIMIZE=2 (#7434) @skirui-source
- Make inclusive scan safe for cases with leading nulls (#7432) @magnatelee
- Fix typo in list_device_view::pair_rep_end() (#7423) @mythrocks
- Fix string to double conversion and row equivalent comparison (#7410) @ttnghia
- Fix thrust failure when transferring data from device_vector to host_vector with vectors of size 1 (#7382) @ttnghia
- Fix std::exeception catch-by-reference gcc9 compile error (#7380) @davidwendt
- Fix skiprows issue with ORC Reader (#7359) @rgsl888prabhu
- fix Arrow CMake file (#7358) @rongou
- Fix lists::contains() for NaN and Decimals (#7349) @mythrocks
- Handle cupy array in `Dataframe.__setitem__` (#7340) @galipremsagar
- Fix invalid-device-fn error in cudf::strings::replace_re with multiple regex's (#7336) @davidwendt
- FIX Add codecov upload block to gpu script (#6860) @dillon-cullinan
- Fix join API doxygen (#7890) @shwina
- Add Resources to README. (#7697) @bdice
- Add `isin` examples in Docstring (#7479) @galipremsagar
- Resolving unlinked type shorthands in cudf doc (#7416) @isVoid
- Fix typo in regex.md doc page (#7363) @davidwendt
- Fix incorrect strings_column_view::chars_size documentation (#7360) @jlowe
- Enable basic reductions for decimal columns (#7776) @ChrisJar
- Enable join on decimal columns (#7764) @ChrisJar
- Allow merging index column with data column using keyword "on" (#7736) @skirui-source
- Implement DecimalColumn + Scalar and add cudf.Scalars of Decimal64Dtype (#7732) @brandon-b-miller
- Add support for `unique` groupby aggregation (#7726) @shwina
- Expose libcudf's label_bins function to cudf (#7724) @vyasr
- Adding support for equi-join on struct (#7720) @hyperbolic2346
- Add decimal column comparison operations (#7716) @isVoid
- Implement scan operations for decimal columns (#7707) @ChrisJar
- Enable typecasting between decimal and int (#7691) @ChrisJar
- Enable decimal support in parquet writer (#7673) @devavret
- Adds `list.unique` API (#7664) @isVoid
- Fix NaN handling in drop_list_duplicates (#7662) @ttnghia
- Add `lists.sort_values` API (#7657) @isVoid
- Add is_integer API that can check for the validity of a string-to-integer conversion (#7642) @ttnghia
- Adds `explode` API (#7607) @isVoid
- Adds `list.take`, python binding for `cudf::lists::segmented_gather` (#7591) @isVoid
- Implement cudf::label_bins() (#7554) @vyasr
- Add Python bindings for `lists::contains` (#7547) @skirui-source
- cudf::row_bit_count() support. (#7534) @nvdbaranec
- Implement drop_list_duplicates (#7528) @ttnghia
- Add Python bindings for `lists::extract_lists_element` (#7505) @skirui-source
- Add explode_outer and explode_outer_position (#7499) @hyperbolic2346
- Match Pandas logic for comparing two objects with nulls (#7490) @brandon-b-miller
- Add struct support to parquet writer (#7461) @devavret
- Enable type conversion from float to decimal type (#7450) @ChrisJar
- Add cython for converting strings/fixed-point functions (#7429) @davidwendt
- Add struct column support to cudf::sort and cudf::sorted_order (#7422) @karthikeyann
- Implement groupby collect_set (#7420) @ttnghia
- Merge branch-0.18 into branch-0.19 (#7411) @raydouglass
- Refactor strings column factories (#7397) @harrism
- Add groupby scan operations (sort groupby) (#7387) @karthikeyann
- Add cudf::explode_position (#7376) @hyperbolic2346
- Add string conversion to/from decimal values libcudf APIs (#7364) @davidwendt
- Add groupby SUM_OF_SQUARES support (#7362) @karthikeyann
- Add `Series.drop` api (#7304) @isVoid
- get_json_object() implementation (#7286) @nvdbaranec
- Python API for `LIstMethods.len()` (#7283) @isVoid
- Support null_policy::EXCLUDE for COLLECT rolling aggregation (#7264) @mythrocks
- Add support for special tokens in nvtext::subword_tokenizer (#7254) @davidwendt
- Fix inplace update of data and add Series.update (#7201) @galipremsagar
- Implement `cudf::group_by` (hash) for `decimal32` and `decimal64` (#7190) @codereport
- Adding support to specify "level" parameter for `Dataframe.rename` (#7135) @skirui-source
- fix GDS include path for version 0.95 (#7877) @rongou
- Update `dask` + `distributed` to `2021.4.0` (#7858) @jakirkham
- Add ability to extract include dirs from `CUDF_HOME` (#7848) @galipremsagar
- Add USE_GDS as an option in build script (#7833) @pxLi
- add an allocate method with stream in java DeviceMemoryBuffer (#7826) @rongou
- Constrain dask and distributed versions to 2021.3.1 (#7825) @shwina
- Revert dask versioning of concat dispatch (#7823) @galipremsagar
- add copy methods in Java memory buffer (#7791) @rongou
- Update README and CONTRIBUTING for 0.19 (#7778) @robertmaynard
- Allow hash_partition to take a seed value (#7771) @magnatelee
- Turn on NVTX by default in java build (#7761) @tgravescs
- Add Java bindings to join gather map APIs (#7751) @jlowe
- Add replacements column support for Java replaceNulls (#7750) @jlowe
- Add Java bindings for row_bit_count (#7749) @jlowe
- Remove unused JVM array creation (#7748) @jlowe
- Added JNI support for new is_integer (#7739) @revans2
- Create and promote library aliases in libcudf installations (#7734) @trxcllnt
- Support groupby operations for decimal dtypes (#7731) @vyasr
- Memory map the input file only when GDS compatibility mode is not used (#7717) @vuule
- Replace device_vector with device_uvector in null_mask (#7715) @harrism
- Struct hashing support for SerialMurmur3 and SparkMurmur3 (#7714) @jlowe
- Add gbenchmark for nvtext replace-tokens function (#7708) @davidwendt
- Use stream in groupby calls (#7705) @karthikeyann
- Update codeowners file (#7701) @ajschmidt8
- Cleanup groupby to use host_span, device_span, device_uvector (#7698) @karthikeyann
- Add gbenchmark for nvtext ngrams functions (#7693) @davidwendt
- Misc Python/Cython optimizations (#7686) @shwina
- Add gbenchmark for nvtext tokenize functions (#7684) @davidwendt
- Add column_device_view to orc writer (#7676) @kaatish
- cudf_kafka now uses cuDF CMake export targets (CPM) (#7674) @robertmaynard
- Add gbenchmark for nvtext normalize functions (#7668) @davidwendt
- Resolve unnecessary import of thrust/optional.hpp in types.hpp (#7667) @vyasr
- Feature/optimize accessor copy (#7660) @vyasr
- Fix `find_package(cudf)` (#7658) @trxcllnt
- Work-around for gcc7 compile error on Centos7 (#7652) @davidwendt
- Add in JNI support for count_elements (#7651) @revans2
- Fix issues with building cudf in a non-conda environment (#7647) @galipremsagar
- Refactor ConfigureCUDA to not conditionally insert compiler flags (#7643) @robertmaynard
- Add gbenchmark for converting strings to/from timestamps (#7641) @davidwendt
- Handle constructing a `cudf.Scalar` from a `cudf.Scalar` (#7639) @shwina
- Add in JNI support for table partition (#7637) @revans2
- Add explicit fixed_point merge test (#7635) @codereport
- Add JNI support for IDENTITY hash partitioning (#7626) @revans2
- Java support on explode_outer (#7625) @sperlingxx
- Java support of casting string from/to decimal (#7623) @sperlingxx
- Convert cudf::concatenate APIs to use spans and device_uvector (#7621) @harrism
- Add gbenchmark for cudf::strings::translate function (#7617) @davidwendt
- Use file(COPY ) over file(INSTALL ) so cmake output is reduced (#7616) @robertmaynard
- Use rmm::device_uvector in place of rmm::device_vector for ORC reader/writer and cudf::io::column_buffer (#7614) @vuule
- Refactor Java host-side buffer concatenation to expose separate steps (#7610) @jlowe
- Add gbenchmarks for string substrings functions (#7603) @davidwendt
- Refactor string conversion check (#7599) @ttnghia
- JNI: Pass names of children struct columns to native Arrow IPC writer (#7598) @firestarman
- Revert "ENH Fix stale GHA and prevent duplicates " (#7595) @mike-wendt
- ENH Fix stale GHA and prevent duplicates (#7594) @mike-wendt
- Fix auto-detecting GPU architectures (#7593) @trxcllnt
- Reduce cudf library size (#7583) @robertmaynard
- Optimize cudf::make_strings_column for long strings (#7576) @davidwendt
- Always build and export the cudf::cudftestutil target (#7574) @trxcllnt
- Eliminate literal parameters to uvector::set_element_async and device_scalar::set_value (#7563) @harrism
- Add gbenchmark for strings::concatenate (#7560) @davidwendt
- Update Changelog Link (#7550) @ajschmidt8
- Add gbenchmarks for strings replace regex functions (#7541) @davidwendt
- Add `__repr__` for Column and ColumnAccessor (#7531) @shwina
- Support Decimal DIV changes in cudf (#7527) @razajafri
- Remove unneeded step parameter from strings::detail::copy_slice (#7525) @davidwendt
- Use device_uvector, device_span in sort groupby (#7523) @karthikeyann
- Add gbenchmarks for strings extract function (#7522) @davidwendt
- Rename ARROW_STATIC_LIB because it conflicts with one in FindArrow.cmake (#7518) @trxcllnt
- Reduce compile time/size for scan.cu (#7516) @davidwendt
- Change device_vector to device_uvector in nvtext source files (#7512) @davidwendt
- Removed unneeded includes from traits.hpp (#7509) @davidwendt
- FIX Remove random build directory generation for ccache (#7508) @dillon-cullinan
- xfail failing pytest in pandas 1.2.3 (#7507) @galipremsagar
- JNI bit cast (#7493) @revans2
- Combine rolling window function tests (#7480) @mythrocks
- Prepare Changelog for Automation (#7477) @ajschmidt8
- Java support for explode position (#7471) @sperlingxx
- Update 0.18 changelog entry (#7463) @ajschmidt8
- JNI: Support skipping nulls for collect aggregation (#7457) @firestarman
- Join APIs that return gathermaps (#7454) @shwina
- Remove dependence on managed memory for multimap test (#7451) @jrhemstad
- Use cuFile for Parquet IO when available (#7444) @vuule
- Statistics cleanup (#7439) @kaatish
- Add gbenchmarks for strings filter functions (#7438) @davidwendt
- `fixed_point` + `cudf::binary_operation` API Changes (#7435) @codereport
- Improve string gather performance (#7433) @jlowe
- Don't use user resource for a temporary allocation in sort_by_key (#7431) @magnatelee
- Detail APIs for datetime functions (#7430) @magnatelee
- Replace thrust::max_element with thrust::reduce in strings findall_re (#7428) @davidwendt
- Add gbenchmark for strings split/split_record functions (#7427) @davidwendt
- Update JNI build to use CMAKE_CUDA_ARCHITECTURES (#7425) @jlowe
- Change nvtext::load_vocabulary_file to return a unique ptr (#7424) @davidwendt
- Simplify type dispatch with `device_storage_dispatch` (#7419) @codereport
- Java support for casting of nested child columns (#7417) @razajafri
- Improve scalar string replace performance for long strings (#7415) @jlowe
- Remove unneeded temporary device vector for strings scatter specialization (#7409) @davidwendt
- bitmask_or implementation with bitmask refactor (#7406) @rwlee
- Add other cudf::strings::replace functions to current strings replace gbenchmark (#7403) @davidwendt
- Clean up included headers in `device_operators.cuh` (#7401) @codereport
- Move nullable index iterator to indexalator factory (#7399) @davidwendt
- ENH Pass ccache variables to conda recipe & use Ninja in CI (#7398) @Ethyling
- upgrade maven-antrun-plugin to support maven parallel builds (#7393) @rongou
- Add gbenchmark for strings find/contains functions (#7392) @davidwendt
- Use CMAKE_CUDA_ARCHITECTURES (#7391) @robertmaynard
- Refactor libcudf strings::replace to use make_strings_children utility (#7384) @davidwendt
- Added in JNI support for out of core sort algorithm (#7381) @revans2
- Upgrade pandas to 1.2 (#7375) @galipremsagar
- Rename `logical_cast` to `bit_cast` and allow additional conversions (#7373) @ttnghia
- jitify 2 support (#7372) @cwharris
- compile_udf: Cache PTX for similar functions (#7371) @gmarkall
- Add string scalar replace benchmark (#7369) @jlowe
- Add gbenchmark for strings contains_re/count_re functions (#7366) @davidwendt
- Update orc reader and writer fuzz tests (#7357) @galipremsagar
- Improve url_decode performance for long strings (#7353) @jlowe
- `cudf::ast` Small Refactorings (#7352) @codereport
- Remove std::cout and print in the scatter test function EmptyListsOfNullableStrings. (#7342) @ttnghia
- Use `cudf::detail::make_counting_transform_iterator` (#7338) @codereport
- Change block size parameter from a global to a template param. (#7333) @nvdbaranec
- Partial clean up of ORC writer (#7324) @vuule
- Add gbenchmark for cudf::strings::to_lower (#7316) @davidwendt
- Update Java bindings version to 0.19-SNAPSHOT (#7307) @pxLi
- Move `cudf::test::make_counting_transform_iterator` to `cudf/detail/iterator.cuh` (#7306) @codereport
- Use string literals in `fixed_point` `release_assert`s (#7303) @codereport
- Fix merge conflicts for #7295 (#7297) @ajschmidt8
- Add UTF-8 chars to create_random_column<string_view> benchmark utility (#7292) @davidwendt
- Abstracting block reduce and block scan from cuIO kernels with `cub` apis (#7278) @rgsl888prabhu
- Build.sh use cmake --build to drive build system invocation (#7270) @robertmaynard
- Refactor dictionary support for reductions any/all (#7242) @davidwendt
- Replace stream.value() with stream for stream_view args (#7236) @karthikeyann
- Interval index and interval_range (#7182) @marlenezw
- avro reader integration tests (#7156) @cwharris
- Rework libcudf CMakeLists.txt to export targets for CPM (#7107) @trxcllnt
- Adding Interval Dtype (#6984) @marlenezw
- Cleaning up `for` loops with `make_(counting_)transform_iterator` (#6546) @codereport
- Default `groupby` to `sort=False` (#7180) @isVoid
- Add libcudf API for parsing of ORC statistics (#7136) @vuule
- Replace ORC writer api with class (#7099) @rgsl888prabhu
- Pack/unpack functionality to convert tables to and from a serialized format. (#7096) @nvdbaranec
- Replace parquet writer api with class (#7058) @rgsl888prabhu
- Add days check to cudf::is_timestamp using cuda::std::chrono classes (#7028) @davidwendt
- Fix default parameter values of `write_csv` and `write_parquet` (#6967) @vuule
- Align `Series.groupby` API to match Pandas (#6964) @kkraus14
- Share `factorize` implementation with Index and cudf module (#6885) @brandon-b-miller
- Remove incorrect std::move call on return variable (#7319) @davidwendt
- Fix failing CI ORC test (#7313) @vuule
- Disallow constructing frames from a ColumnAccessor (#7298) @shwina
- fix java cuFile tests (#7296) @rongou
- Fix style issues related to NumPy (#7279) @shwina
- Fix bug when `iloc` slice terminates at before-the-zero position (#7277) @isVoid
- Fix copying dtype metadata after calling libcudf functions (#7271) @shwina
- Move lists utility function definition out of header (#7266) @mythrocks
- Throw if bool column would cause incorrect result when writing to ORC (#7261) @vuule
- Use `uvector` in `replace_nulls`; Fix `sort_helper::grouped_value` doc (#7256) @isVoid
- Remove floating point types from cudf::sort fast-path (#7250) @davidwendt
- Disallow picking output columns from nested columns. (#7248) @devavret
- Fix `loc` for Series with a MultiIndex (#7243) @shwina
- Fix Arrow column test leaks (#7241) @tgravescs
- Fix test column vector leak (#7238) @kuhushukla
- Fix some bugs in java scalar support for decimal (#7237) @revans2
- Improve `assert_eq` handling of scalar (#7220) @isVoid
- Fix missing null_count() comparison in test framework and related failures (#7219) @nvdbaranec
- Remove floating point types from radix sort fast-path (#7215) @davidwendt
- Fixing parquet benchmarks (#7214) @rgsl888prabhu
- Handle various parameter combinations in `replace` API (#7207) @galipremsagar
- Export mock aws credentials for s3 tests (#7176) @ayushdg
- Add `MultiIndex.rename` API (#7172) @isVoid
- Fix importing list & struct types in `from_arrow` (#7162) @galipremsagar
- Fixing parquet precision writing failing if scale is equal to precision (#7146) @hyperbolic2346
- Update s3 tests to use moto_server (#7144) @ayushdg
- Fix JIT cache multi-process test flakiness in slow drives (#7142) @devavret
- Fix compilation errors in libcudf (#7138) @galipremsagar
- Fix compilation failure caused by `-Wall` addition. (#7134) @codereport
- Add informative error message for `sep` in CSV writer (#7095) @galipremsagar
- Add JIT cache per compute capability (#7090) @devavret
- Implement `__hash__` method for ListDtype (#7081) @galipremsagar
- Only upload packages that were built (#7077) @raydouglass
- Fix comparisons between Series and cudf.NA (#7072) @brandon-b-miller
- Handle `nan` values correctly in `Series.one_hot_encoding` (#7059) @galipremsagar
- Add `unstack()` support for non-multiindexed dataframes (#7054) @isVoid
- Fix `read_orc` for decimal type (#7034) @rgsl888prabhu
- Fix backward compatibility of loading a 0.16 pkl file (#7033) @galipremsagar
- Decimal casts in JNI became a NOOP (#7032) @revans2
- Restore usual instance/subclass checking to cudf.DateOffset (#7029) @shwina
- Add days check to cudf::is_timestamp using cuda::std::chrono classes (#7028) @davidwendt
- Fix to_csv delimiter handling of timestamp format (#7023) @davidwendt
- Pin librdkafka to gcc 7 compatible version (#7021) @raydouglass
- Fix `fillna` & `dropna` to also consider `np.nan` as a missing value (#7019) @galipremsagar
- Fix round operator's HALF_EVEN computation for negative integers (#7014) @nartal1
- Skip Thrust sort patch if already applied (#7009) @harrism
- Fix `cudf::hash_partition` for `decimal32` and `decimal64` (#7006) @codereport
- Fix Thrust unroll patch command (#7002) @harrism
- Fix loc behaviour when key of incorrect type is used (#6993) @shwina
- Fix int to datetime conversion in csv_read (#6991) @kaatish
- fix excluding cufile tests by default (#6988) @rongou
- Fix java cufile tests when cufile is not installed (#6987) @revans2
- Make `cudf::round` for `fixed_point` when `scale = -decimal_places` a no-op (#6975) @codereport
- Fix type comparison for java (#6970) @revans2
- Fix default parameter values of `write_csv` and `write_parquet` (#6967) @vuule
- Align `Series.groupby` API to match Pandas (#6964) @kkraus14
- Fix timestamp parsing in ORC reader for timezones without transitions (#6959) @vuule
- Fix typo in numerical.py (#6957) @rgsl888prabhu
- `fixed_point_value` double-shifts in `fixed_point` construction (#6950) @codereport
- fix libcu++ include path for jni (#6948) @rongou
- Fix groupby agg/apply behaviour when no key columns are provided (#6945) @shwina
- Avoid inserting null elements into join hash table when nulls are treated as unequal (#6943) @hyperbolic2346
- Fix cudf::merge gtest for dictionary columns (#6942) @davidwendt
- Pass numeric scalars of the same dtype through numeric binops (#6938) @brandon-b-miller
- Fix N/A detection for empty fields in CSV reader (#6922) @vuule
- Fix rmm_mode=managed parameter for gtests (#6912) @davidwendt
- Fix nullmask offset handling in parquet and orc writer (#6889) @kaatish
- Correct the sampling range when sampling with replacement (#6884) @ChrisJar
- Handle nested string columns with no children in contiguous_split. (#6864) @nvdbaranec
- Fix `columns` & `index` handling in dataframe constructor (#6838) @galipremsagar
- Update readme (#7318) @shwina
- Fix typo in cudf.core.column.string.extract docs (#7253) @adelevie
- Update doxyfile project number (#7161) @davidwendt
- Update 10 minutes to cuDF and CuPy with new APIs (#7158) @ChrisJar
- Cross link RMM & libcudf Doxygen docs (#7149) @ajschmidt8
- Add documentation for support dtypes in all IO formats (#7139) @galipremsagar
- Add groupby docs (#7100) @shwina
- Update cudf python docstrings with new null representation (`<NA>`) (#7050) @galipremsagar
- Make Doxygen comments formatting consistent (#7041) @vuule
- Add docs for working with missing data (#7010) @galipremsagar
- Remove warning in from_dlpack and to_dlpack methods (#7001) @miguelusque
- libcudf Developer Guide (#6977) @harrism
- Add JNI wrapper for the cuFile API (GDS) (#6940) @rongou
- Support `numeric_only` field for `rank()` (#7213) @isVoid
- Add support for `cudf::binary_operation` `TRUE_DIV` for `decimal32` and `decimal64` (#7198) @codereport
- Implement COLLECT rolling window aggregation (#7189) @mythrocks
- Add support for array-like inputs in `cudf.get_dummies` (#7181) @galipremsagar
- Default `groupby` to `sort=False` (#7180) @isVoid
- Add libcudf lists column count_elements API (#7173) @davidwendt
- Implement `cudf::group_by` (sort) for `decimal32` and `decimal64` (#7169) @codereport
- Add encoding and compression argument to CSV writer (#7168) @VibhuJawa
- `cudf::rolling_window` `SUM` support for `decimal32` and `decimal64` (#7147) @codereport
- Adding support for explode to cuDF (#7140) @hyperbolic2346
- Add libcudf API for parsing of ORC statistics (#7136) @vuule
- update GDS/cuFile location for 0.9 release (#7131) @rongou
- Add Segmented sort (#7122) @karthikeyann
- Add `cudf::binary_operation` `NULL_MIN`, `NULL_MAX` & `NULL_EQUALS` for `decimal32` and `decimal64` (#7119) @codereport
- Add `scale` and `value` methods to `fixed_point` (#7109) @codereport
- Replace ORC writer api with class (#7099) @rgsl888prabhu
- Pack/unpack functionality to convert tables to and from a serialized format. (#7096) @nvdbaranec
- Improve `digitize` API (#7071) @isVoid
- Add List types support in data generator (#7064) @galipremsagar
- `cudf::scan` support for `decimal32` and `decimal64` (#7063) @codereport
- `cudf::rolling` `ROW_NUMBER` support for `decimal32` and `decimal64` (#7061) @codereport
- Replace parquet writer api with class (#7058) @rgsl888prabhu
- Support contains() on lists of primitives (#7039) @mythrocks
- Implement `cudf::rolling` for `decimal32` and `decimal64` (#7037) @codereport
- Add `ffill` and `bfill` to string columns (#7036) @isVoid
- Enable round in cudf for DataFrame and Series (#7022) @ChrisJar
- Extend `replace_nulls_policy` to `string` and `dictionary` type (#7004) @isVoid
- Add segmented_gather(list_column, gather_list) (#7003) @karthikeyann
- Add `method` field to `fillna` for fixed width columns (#6998) @isVoid
- Manual merge of branch 0.17 into branch 0.18 (#6995) @shwina
- Implement `cudf::reduce` for `decimal32` and `decimal64` (part 2) (#6980) @codereport
- Add Ufunc alias look up for appropriate numpy ufunc dispatching (#6973) @VibhuJawa
- Add pytest-xdist to dev environment.yml (#6958) @galipremsagar
- Add `Index.set_names` api (#6929) @galipremsagar
- Add `replace_null` API with `replace_policy` parameter, `fixed_width` column support (#6907) @isVoid
- Share `factorize` implementation with Index and cudf module (#6885) @brandon-b-miller
- Implement update() function (#6883) @skirui-source
- Add groupby idxmin, idxmax aggregation (#6856) @karthikeyann
- Implement `cudf::reduce` for `decimal32` and `decimal64` (part 1) (#6814) @codereport
- Implement cudf.DateOffset for months (#6775) @brandon-b-miller
- Add Python DecimalColumn (#6715) @shwina
- Add dictionary support to libcudf groupby functions (#6585) @davidwendt
- Update stale GHA with exemptions & new labels (#7395) @mike-wendt
- Add GHA to mark issues/prs as stale/rotten (#7388) @Ethyling
- Unpin from numpy < 1.20 (#7335) @shwina
- Prepare Changelog for Automation (#7309) @galipremsagar
- Prepare Changelog for Automation (#7272) @ajschmidt8
- Add JNI support for converting Arrow buffers to CUDF ColumnVectors (#7222) @tgravescs
- Add coverage for `skiprows` and `num_rows` in parquet reader fuzz testing (#7216) @galipremsagar
- Define and implement more behavior for merging on categorical variables (#7209) @brandon-b-miller
- Add CudfSeriesGroupBy to optimize dask_cudf groupby-mean (#7194) @rjzamora
- Add dictionary column support to rolling_window (#7186) @davidwendt
- Modify the semantics of `end` pointers in cuIO to match standard library (#7179) @vuule
- Adding unit tests for `fixed_point` with extremely large `scale`s (#7178) @codereport
- Fast path single column sort (#7167) @davidwendt
- Fix -Werror=sign-compare errors in device code (#7164) @trxcllnt
- Refactor cudf::string_view host and device code (#7159) @davidwendt
- Enable logic for GPU auto-detection in cudfjni (#7155) @gerashegalov
- Java bindings for Fixed-point type support for Parquet (#7153) @razajafri
- Add Java interface for the new API 'explode' (#7151) @firestarman
- Replace offsets with iterators in cuIO utilities and CSV parser (#7150) @vuule
- Add gbenchmarks for reduction aggregations any() and all() (#7129) @davidwendt
- Update JNI for contiguous_split packed results (#7127) @jlowe
- Add JNI and Java bindings for list_contains (#7125) @kuhushukla
- Add Java unit tests for window aggregate 'collect' (#7121) @firestarman
- verify window operations on decimal with java tests (#7120) @sperlingxx
- Adds in JNI support for creating a list column from existing columns (#7112) @revans2
- Build libcudf with -Wall (#7105) @trxcllnt
- Add column_device_view pointers to EncColumnDesc (#7097) @kaatish
- Add `pyorc` to dev environment (#7085) @galipremsagar
- JNI support for creating struct column from existing columns and fixed bug in struct with no children (#7084) @revans2
- Fastpath single strings column in cudf::sort (#7075) @davidwendt
- Upgrade nvcomp to 1.2.1 (#7069) @rongou
- Refactor ORC `ProtobufReader` to make it more extendable (#7055) @vuule
- Add Java tests for decimal casts (#7051) @sperlingxx
- Auto-label PRs based on their content (#7044) @jolorunyomi
- Create sort gbenchmark for strings column (#7040) @davidwendt
- Refactor io memory fetches to use hostdevice_vector methods (#7035) @ChrisJar
- Spark Murmur3 hash functionality (#7024) @rwlee
- Fix libcudf strings logic where size_type is used to access INT32 column data (#7020) @davidwendt
- Adding decimal writing support to parquet (#7017) @hyperbolic2346
- Add compression="infer" as default for dask_cudf.read_csv (#7013) @rjzamora
- Correct ORC docstring; other minor cuIO improvements (#7012) @vuule
- Reduce number of hostdevice_vector allocations in parquet reader (#7005) @devavret
- Check output size overflow on strings gather (#6997) @davidwendt
- Improve representation of `MultiIndex` (#6992) @galipremsagar
- Disable some pragma unroll statements in thrust sort.h (#6982) @davidwendt
- Minor `cudf::round` internal refactoring (#6976) @codereport
- Add Java bindings for URL conversion (#6972) @jlowe
- Enable strict_decimal_types in parquet reading (#6969) @sperlingxx
- Add in basic support to JNI for logical_cast (#6954) @revans2
- Remove duplicate file array_tests.cpp (#6953) @karthikeyann
- Add null mask `fixed_point_column_wrapper` constructors (#6951) @codereport
- Update Java bindings version to 0.18-SNAPSHOT (#6949) @jlowe
- Use simplified `rmm::exec_policy` (#6939) @harrism
- Add null count test for apply_boolean_mask (#6903) @harrism
- Implement DataFrame.quantile for datetime and timedelta data types (#6902) @ChrisJar
- Remove **kwargs from string/categorical methods (#6750) @shwina
- Refactor rolling.cu to reduce compile time (#6512) @mythrocks
- Add static type checking via Mypy (#6381) @shwina
- Update to official libcu++ on Github (#6275) @trxcllnt
- PR #6116 Add `filters` parameter to Python `read_orc` function for filtering
- PR #6848 Added Java bindings for writing parquet files with INT96 timestamps
- PR #6460 Add is_timestamp format check API
- PR #6647 Implement `cudf::round` floating point and integer types (`HALF_EVEN`)
- PR #6562 Implement `cudf::round` floating point and integer types (`HALF_UP`)
- PR #6685 Implement `cudf::round` `decimal32` & `decimal64` (`HALF_UP` and `HALF_EVEN`)
- PR #6711 Implement `cudf::cast` for `decimal32/64` to/from integer and floating point
- PR #6777 Implement `cudf::unary_operation` for `decimal32` & `decimal64`
- PR #6729 Implement `cudf::cast` for `decimal32/64` to/from different `type_id`
- PR #6792 Implement `cudf::clamp` for `decimal32` and `decimal64`
- PR #6845 Implement `cudf::copy_if_else` for `decimal32` and `decimal64`
- PR #6805 Implement `cudf::detail::copy_if` for `decimal32` and `decimal64`
- PR #6843 Implement `cudf::copy_range` for `decimal32` and `decimal64`
- PR #6528 Enable `fixed_point` binary operations
- PR #6460 Add is_timestamp format check API
- PR #6568 Add function to create hashed vocabulary file from raw vocabulary
- PR #6142 Add Python `read_orc_statistics` function for reading file- and stripe-level statistics
- PR #6581 Add JNI API to check if PTDS is enabled
- PR #6615 Add support for list and struct types to contiguous_split
- PR #6625 Add INT96 timestamp writing option to parquet writer
- PR #6592 Add `cudf.to_numeric` function
- PR #6598 Add strings::contains API with target column parameter
- PR #6638 Add support for `pipe` API
- PR #6737 New build process (Project Flash)
- PR #6652 Add support for struct columns in concatenate
- PR #6675 Add DecimalDtype to cuDF
- PR #6739 Add Java bindings for is_timestamp
- PR #6808 Add support for reading decimal32 and decimal64 from parquet
- PR #6781 Add serial murmur3 hashing
- PR #6811 First class support for unbounded window function bounds
- PR #6768 Add support for scatter() on list columns
- PR #6796 Add create_metadata_file in dask_cudf
- PR #6765 Cupy fallback for array_function and array_ufunc for cudf.Series
- PR #6817 Add support for scatter() on lists-of-struct columns
- PR #6805 Implement `cudf::detail::copy_if` for `decimal32` and `decimal64`
- PR #6483 Add `agg` function to aggregate dataframe using one or more operations
- PR #6726 Support selecting different hash functions in hash_partition
- PR #6619 Improve Dockerfile
- PR #6831 Added parquet chunked writing ability for list columns
- PR #6430 Add struct type support to `to_arrow` and `from_arrow`
- PR #6384 Add CSV fuzz tests with varying function parameters
- PR #6385 Add JSON fuzz tests with varying function parameters
- PR #6398 Remove function constructor macros in parquet reader
- PR #6432 Add dictionary support to `cudf::upper_bound` and `cudf::lower_bound`
- PR #6461 Replace index type-dispatch call with indexalator in cudf::scatter
- PR #6415 Support `datetime64` in row-wise op
- PR #6457 Replace index type-dispatch call with indexalator in `cudf::gather`
- PR #6413 Replace Python NVTX package with conda-forge source
- PR #6442 Remove deprecated `DataFrame.from_gpu_matrix`, `DataFrame.to_gpu_matrix`, `DataFrame.add_column` APIs and method parameters
- PR #6502 Add dictionary support to `cudf::merge`
- PR #6471 Replace index type-dispatch call with indexalator in cudf::strings::substring
- PR #6485 Add File IO to cuIO benchmarks
- PR #6504 Update Java bindings version to 0.17-SNAPSHOT
- PR #6875 Remove bounds check for `cudf::gather`
- PR #6489 Add `AVRO` fuzz tests with varying function parameters
- PR #6540 Add dictionary support to `cudf::unary_operation`
- PR #6537 Refactor ORC timezone
- PR #6527 Refactor DeviceColumnViewAccess to avoid JNI returning an array
- PR #6690 Explicitly set legacy or per-thread default stream in JNI
- PR #6545 Pin cmake policies to cmake 3.17 version
- PR #6556 Add dictionary support to `cudf::inner_join`, `cudf::left_join` and `cudf::full_join`
- PR #6557 Support nullable timestamp columns in time range window functions
- PR #6566 Remove `reinterpret_cast` conversions between pointer types in ORC
- PR #6544 Remove `fixed_point` precise round
- PR #6552 Use `assert_exceptions_equal` to assert exceptions in pytests
- PR #6555 Adapt JNI build to libcudf composition of multiple libraries
- PR #6559 Refactoring cooperative loading with single thread loading.
- PR #6564 Load JNI library dependencies with a thread pool
- PR #6571 Add ORC fuzz tests with varying function parameters
- PR #6578 Add in java column to row conversion
- PR #6573 Create `cudf::detail::byte_cast` for `cudf::byte_cast`
- PR #6597 Use thread-local to track CUDA device in JNI
- PR #6599 Replace `size()==0` with `empty()`, `is_empty()`
- PR #6514 Initial work for decimal type in Java/JNI
- PR #6605 Reduce HtoD copies in `cudf::concatenate` of string columns
- PR #6608 Improve subword tokenizer docs
- PR #6610 Add ability to set scalar values in `cudf.DataFrame`
- PR #6612 Update JNI to new RMM cuda_stream_view API
- PR #6646 Replace `cudaStream_t` with `rmm::cuda_stream_view` (part 1)
- PR #6648 Replace `cudaStream_t` with `rmm::cuda_stream_view` (part 2)
- PR #6744 Replace `cudaStream_t` with `rmm::cuda_stream_view` (part 3)
- PR #6579 Update scatter APIs to use reference wrapper / const scalar
- PR #6614 Add support for conversion to Pandas nullable dtypes and fix related issue in `cudf.to_json`
- PR #6622 Update `to_pandas` api docs
- PR #6623 Add operator overloading to column and clean up error messages
- PR #6644 Cover different CSV reader/writer options in benchmarks
- PR #6741 Cover different ORC and Parquet reader/writer options in benchmarks
- PR #6651 Add cudf::dictionary::make_dictionary_pair_iterator
- PR #6666 Add dictionary support to `cudf::reduce`
- PR #6635 Add cudf::test::dictionary_column_wrapper class
- PR #6702 Fix orc read corruption on boolean column
- PR #6676 Add dictionary support to `cudf::quantile`
- PR #6673 Parameterize avro and json benchmark
- PR #6609 Support fixed-point decimal for HostColumnVector
- PR #6703 Add list column statistics writing to Parquet writer
- PR #6662 `RangeIndex` supports `step` parameter
- PR #6712 Remove `reinterpret_cast` conversions between pointer types in Avro
- PR #6705 Add nested type support to Java table serialization
- PR #6709 Raise informative error while converting a pandas dataframe with duplicate columns
- PR #6727 Remove 2nd type-dispatcher call from cudf::reduce
- PR #6749 Update nested JNI builder so we can do it incrementally
- PR #6748 Add Java API to concatenate serialized tables to ContiguousTable
- PR #6764 Add dictionary support to `cudf::minmax`
- PR #6734 Binary operations support for decimal type in cudf Java
- PR #6761 Add Java/JNI bindings for round
- PR #6776 Use `void` return type for kernel wrapper functions instead of returning `cudaError_t`
- PR #6786 Add nested type support to ColumnVector#getDeviceMemorySize
- PR #6780 Move `cudf::cast` tests to separate test file
- PR #6809 size_type overflow checking when concatenating columns
- PR #6789 Rename `unary_op` to `unary_operator`
- PR #6770 Support building decimal columns with Table.TestBuilder
- PR #6815 Add wildcard path support to `read_parquet`
- PR #6800 Push DeviceScalar to cython-only
- PR #6822 Split out `cudf::distinct_count` from `drop_duplicates.cu`
- PR #6813 Enable `expand=False` in `.str.split` and `.str.rsplit`
- PR #6829 Enable workaround to write categorical columns in csv
- PR #6819 Use CMake 3.19 for RMM when building cuDF jar
- PR #6833 Use settings.xml if existing for internal build
- PR #6839 Handle index when dispatching array_function and array_ufunc to cupy for cudf.Series
- PR #6835 Move template param to member var to improve compile of hash/groupby.cu
- PR #6837 Avoid gather when copying strings view from start of strings column
- PR #6859 Move align_ptr_for_type() from cuda.cuh to alignment.hpp
- PR #6807 Refactor `std::array` usage in row group index writing in ORC
- PR #6914 Enable groupby `list` aggregation for strings
- PR #6908 Parquet option for strictly decimal reading
- PR #6446 Fix integer parsing in CSV and JSON for values outside of int64 range
- PR #6506 Fix DateTime type value truncation while writing to csv
- PR #6509 Disable JITIFY log printing
- PR #6517 Handle index equality in `Series` and `DataFrame` equality checks
- PR #6519 Fix end-of-string marking boundary condition in subword-tokenizer
- PR #6543 Handle `np.nan` values in `isna`/`isnull`/`notna`/`notnull`
- PR #6549 Fix memory_usage calls for list columns
- PR #6575 Fix JNI RMM initialize with no pool allocator limit
- PR #6636 Fix orc boolean column corruption issue
- PR #6582 Add missing `device_scalar` stream parameters
- PR #6596 Fix memory usage calculation
- PR #6595 Fix JNI build, broken by to_arrow() signature change
- PR #6601 Fix timezone offset when reading ORC files
- PR #6603 Use correct stream in hash_join.
- PR #6616 Block `fixed_point` `cudf::concatenate` with different scales
- PR #6607 Fix integer overflow in ORC encoder
- PR #6617 Fix JNI native dependency load order
- PR #6621 Fix subword tokenizer metadata for token count equal to max_sequence_length
- PR #6629 Fix JNI CMake
- PR #6633 Fix Java HostColumnVector unnecessarily loading native dependencies
- PR #6643 Fix csv writer handling embedded comma delimiter
- PR #6640 Add error message for unsupported `axis` parameter in DataFrame APIs
- PR #6686 Fix output size for orc read for skip_rows option
- PR #6710 Fix an out-of-bounds indexing error in gather() for lists
- PR #6670 Fix a bug where PTX parser fails to correctly parse a python lambda generated UDF
- PR #6687 Fix issue where index name of caller object is being modified in csv writer
- PR #6735 Fix hash join where row hash values would end up equal to the reserved empty key value
- PR #6696 Fix release_assert.
- PR #6692 Fix handling of empty column name in csv writer
- PR #6693 Fix issue related to `na_values` input in `read_csv`
- PR #6701 Fix issue when `numpy.str_` is given as input to string parameters in io APIs
- PR #6704 Fix leak warnings in JNI unit tests
- PR #6713 Fix missing call to cudaStreamSynchronize in get_value
- PR #6708 Apply `na_rep` to column names in csv writer
- PR #6720 Fix implementation of `dtype` parameter in `cudf.read_csv`
- PR #6721 Add missing serialization methods for ListColumn
- PR #6722 Fix index=False bug in dask_cudf.read_parquet
- PR #6766 Fix race conditions in parquet
- PR #6728 Fix cudf python docs and associated build warnings
- PR #6732 Fix cuDF benchmarks build with static Arrow lib and fix rapids-compose cuDF JNI build
- PR #6742 Fix concat bug in dask_cudf Series/Index creation
- PR #6632 Fix DataFrame initialization from list of dicts
- PR #6767 Fix sort order of parameters in `test_scalar_invalid_implicit_conversion` pytest
- PR #6771 Fix index handling in parquet reader and writer
- PR #6787 Update java reduction APIs to reflect C++ changes
- PR #6790 Fix result representation in groupby.apply
- PR #6794 Fix AVRO reader issues with empty input
- PR #6798 Fix `read_avro` docs
- PR #6824 Fix JNI build
- PR #6826 Fix resource management in Java ColumnBuilder
- PR #6830 Fix categorical scalar insertion
- PR #6844 Fix uint32_t undefined errors
- PR #6854 Fix the parameter order of writeParquetBufferBegin
- PR #6855 Fix `.str.replace_with_backrefs` docs examples
- PR #6853 Fix contiguous split of null string columns
- PR #6860 Move codecov upload to build script
- PR #6861 Fix compile error in type_dispatch_benchmark.cu
- PR #6864 Handle contiguous_split corner case for nested string columns with no children
- PR #6869 Avoid dependency resolution failure in latest version of pip by explicitly specifying versions for dask and distributed
- PR #6806 Force install of local conda artifacts
- PR #6887 Fix typo and `0-d` numpy array handling in binary operation
- PR #6898 Fix missing clone overrides on derived aggregations
- PR #6899 Update JNI to new gather boundary check API
- PR #5779 Add DataFrame.pivot() and DataFrame.unstack()
- PR #5975 Add strings `filter_characters` API
- PR #5843 Add `filters` parameter to Python `read_parquet` function for filtering row groups
- PR #5974 Use libcudf instead of cupy for `arange` or column creation from a scalar.
- PR #5494 Add Abstract Syntax Tree (AST) evaluator.
- PR #6076 Add durations type support for csv writer, reader
- PR #5874 Add `COLLECT` groupby aggregation
- PR #6330 Add ability to query if PTDS is enabled
- PR #6119 Add support for `dayofweek` property in `DateTimeIndex` and `DatetimeProperties`
- PR #6171 Java and Jni support for Struct columns
- PR #6125 Add support for `Series.mode` and `DataFrame.mode`
- PR #6271 Add support to deep-copy struct columns from struct column-view
- PR #6262 Add nth_element series aggregation with null handling
- PR #6316 Add StructColumn to Python API
- PR #6247 Add `minmax` reduction function
- PR #6232 `Json` and `Avro` benchmarking in python
- PR #6139 Add column conversion to big endian byte list.
- PR #6220 Add `list_topics()` to supply list of underlying Kafka connection topics
- PR #6254 Add `cudf::make_dictionary_from_scalar` factory function
- PR #6262 Add nth_element series aggregation with null handling
- PR #6277 Add support for LEAD/LAG window functions for fixed-width types
- PR #6318 Add support for reading Struct and map types from Parquet files
- PR #6315 Native code for string-map lookups, for cudf-java
- PR #6302 Add custom dataframe accessors
- PR #6301 Add JNI bindings to nvcomp
- PR #6328 Java and JNI bindings for getMapValue/map_lookup
- PR #6371 Use ColumnViewAccess on Host side
- PR #6392 add hash based groupby mean aggregation
- PR #6511 Add LogicalType to Parquet reader
- PR #6297 cuDF Python Scalars
- PR #6723 Support creating decimal vectors from scalar
- PR #6393 Fix some misspelled words
- PR #6292 Remove individual size tracking from JNI tracking resource adaptor
- PR #5946 Add cython and python support for libcudf `to_arrow` and `from_arrow`
- PR #5919 Remove max_strings and max_chars from nvtext::subword_tokenize
- PR #5956 Add/Update tests for cuStreamz
- PR #5953 Use stable sort when doing a sort groupby
- PR #5973 Link to the Code of Conduct in CONTRIBUTING.md
- PR #6354 Perform shallow clone of external projects
- PR #6388 Add documentation for building `libboost_filesystem.a` from source
- PR #5917 Just use `None` for `strides` in `Buffer`
- PR #6015 Upgrade CUB/Thrust to the latest commit
- PR #5971 Add cuStreamz README for basic installation and use
- PR #6024 Expose selecting multiple ORC stripes to read from Python
- PR #6155 Use the CUB submodule in Thrust instead of fetching CUB separately
- PR #6321 Add option in JNI code to use `arena_memory_resource`
- PR #6002 Add Java bindings for md5
- PR #6311 Switch Thrust to use the NVIDIA/thrust repo
- PR #6060 Add support for all types in `Series.describe` and `DataFrame.describe`
- PR #6051 Add builder API for cuIO `parquet_writer_options` and `parquet_reader_options`
- PR #6067 Added compute codes for aarch64 devices
- PR #5861 `fixed_point` Column Optimization (store `scale` in `data_type`)
- PR #6083 Small cleanup
- PR #6355 Make sure PTDS mode is compatible between libcudf and JNI
- PR #6120 Consolidate functionality in NestedHostColumnVector and HostColumnVector
- PR #6092 Add `name` and `dtype` field to `Index.copy`
- PR #5984 Support gather() on CUDF struct columns
- PR #6103 Small refactor of `print_differences`
- PR #6124 Fix gcc-9 compilation errors on tests
- PR #6122 Add builder API for cuIO `csv_writer_options` and `csv_reader_options`
- PR #6141 Fix typo in custreamz README that was a result of recent changes
- PR #6162 Reduce output parameters in cuio csv and json reader internals
- PR #6146 Added element/validity pair constructors for fixed_width and string wrappers
- PR #6143 General improvements for java arrow IPC.
- PR #6138 Add builder API for cuIO `orc_writer_options` and `orc_reader_options`
- PR #6152 Change dictionary indices to uint32
- PR #6099 Add fluent builder apis to `json_reader_options` and `avro_reader_options`
- PR #6163 Use `Column.full` instead of `scalar_broadcast_to` or `cupy.zeros`
- PR #6176 Fix cmake warnings for GoogleTest, GoogleBenchmark, and Arrow external projects
- PR #6149 Update to Arrow v1.0.1
- PR #6421 Use `pandas.testing` in `cudf`
- PR #6357 Use `pandas.testing` in `dask-cudf`
- PR #6201 Expose libcudf test utilities headers for external project use.
- PR #6174 Data profile support in random data generator; Expand cuIO benchmarks
- PR #6189 Avoid deprecated pyarrow.compat for parquet
- PR #6184 Add cuda 11 dev environment.yml
- PR #6186 Update JNI to look for cub in new location
- PR #6194 Remove unnecessary memory-resource parameter in `cudf::contains` API
- PR #6195 Update JNI to use parquet options builder
- PR #6190 Avoid reading full csv files for metadata in dask_cudf
- PR #6197 Remove librmm dependency for libcudf
- PR #6205 Add dictionary support to cudf::contains
- PR #6213 Reduce subscript usage in cuio in favor of pointer dereferencing
- PR #6230 Support any unsigned int type for dictionary indices
- PR #6202 Add additional parameter support to `DataFrame.drop`
- PR #6214 Small clean up to use more algorithms
- PR #6209 Remove CXX11 ABI handling from CMake
- PR #6223 Remove CXX11 ABI flag from JNI build
- PR #6114 Implement Fuzz tests for cuIO
- PR #6231 Adds `inplace`, `append`, `verify_integrity` fields to `DataFrame.set_index`
- PR #6215 Add cmake command-line setting for spdlog logging level
- PR #6242 Added cudf::detail::host_span and device_span
- PR #6240 Don't shallow copy index in as_index() unless necessary
- PR #6204 Add dockerfile and script to build cuDF jar
- PR #6248 Optimize groupby-agg in dask_cudf
- PR #6243 Move `equals()` logic to `Frame`
- PR #6245 Split up replace.cu into multiple source files
- PR #6218 increase visibility/consistency for cuio reader writer private member variable names.
- PR #6268 Add file tags to libcudf doxygen
- PR #6265 Update JNI to use ORC options builder
- PR #6273 Update JNI to use ORC options builder
- PR #6293 Replace shuffle warp reduce with cub calls
- PR #6287 Make java aggregate API follow C++ API
- PR #6303 Use cudf test dtypes so timedelta tests are deterministic
- PR #6329 Update and clean-up gpuCI scripts
- PR #6299 Add lead and lag to java
- PR #6327 Add dictionary specialization to `cudf::replace_nulls`
- PR #6306 Remove cpw macros from page encode kernels
- PR #6375 Parallelize Cython compilation in addition to Cythonization
- PR #6326 Simplify internal csv/json kernel parameters
- PR #6308 Add dictionary support to cudf::scatter with scalar
- PR #6367 Add JNI bindings for byte casting
- PR #6312 Conda recipe dependency cleanup
- PR #6346 Remove macros from CompactProtocolWriter
- PR #6347 Add dictionary support to cudf::copy_range
- PR #6352 Add specific Topic support for Kafka "list_topics()" metadata requests
- PR #6332 Add support to return csv as string when `path=None` in `to_csv`
- PR #6358 Add Parquet fuzz tests with varying function parameters
- PR #6369 Add dictionary support to `cudf::find_and_replace`
- PR #6373 Add dictionary support to `cudf::clamp`
- PR #6377 Update ci/local/README.md
- PR #6383 Removed `move.pxd`, use standard library `move`
- PR #6400 Removed unused variables
- PR #6409 Allow CuPy 8.x
- PR #6407 Add RMM_LOGGING_LEVEL flag to Java docker build
- PR #6425 Factor out csv parse_options creation to pure function
- PR #6438 Fetch nvcomp v1.1.0 for JNI build
- PR #6459 Add `map` method to series
- PR #6379 Add list hashing functionality to MD5
- PR #6498 Add helper method to ColumnBuilder with some nits
- PR #6336 Add `join` functionality in cudf concat
- PR #6653 Replaced SHFL_XOR calls with cub::WarpReduce
- PR #6751 Rework ColumnViewAccess and its usage
- PR #6698 Remove macros from ORC reader and writer
- PR #6782 Replace cuio macros with constexpr and inline functions
- PR #6073 Fix issue related to `.loc` in case of `DatetimeIndex`
- PR #6081 Fix issue where fsspec thinks it has a protocol string
- PR #6100 Fix issue in `Series.factorize` to correctly pick `na_sentinel` value
- PR #6106 Fix datetime limit in csv due to 32-bit arithmetic
- PR #6113 Fix to_timestamp to initialize default year to 1970
- PR #6110 Handle `format` for other input types in `to_datetime`
- PR #6118 Fix Java build for ORC read args change and update package version
- PR #6121 Replace calls to get_default_resource with get_current_device_resource
- PR #6128 Add support for numpy RandomState handling in `sample`
- PR #6134 Fix CUDA C/C++ debug builds
- PR #6137 Fix issue where `np.nan` is being returned instead of `NAT` for datetime/duration types
- PR #6298 Fix gcc-9 compilation error in dictionary/remove_keys.cu
- PR #6172 Fix slice issue with empty column
- PR #6342 Fix array out-of-bound errors in Orc writer
- PR #6154 Warnings on row-wise op only when non-numeric columns are found.
- PR #6150 Fix issue related to inferring `datetime64` format with UTC timezone in string data
- PR #6179 `make_elements` copies to `iterator` without adjusting `size`
- PR #6387 Remove extra `std::move` call in java/src/main/native/src/map_lookup.cu
- PR #6182 Fix cmake build of arrow
- PR #6288 Fix gcc-9 compilation error with `ColumnVectorJni.cpp`
- PR #6173 Fix normalize_characters offset logic on sliced strings column
- PR #6159 Fix issue related to empty `DataFrame` with columns input to `DataFrame.append`
- PR #6199 Fix index preservation for dask_cudf parquet
- PR #6207 Remove shared libs from Java sources jar
- PR #6217 Fixed missing bounds checking when storing validity in parquet reader
- PR #6212 Update codeowners file
- PR #6389 Fix RMM logging level so that it can be turned off from the command line
- PR #6157 Fix issue related to `Series.concat` to concat a non-empty and empty series.
- PR #6226 Add in some JNI checks for null handles
- PR #6183 Fix issues related to `Series.acos` for consistent output regardless of dtype
- PR #6234 Add float infinity parsing in csv reader
- PR #6251 Replace remaining calls to RMM `get_default_resource`
- PR #6257 Support truncated fractions in `cudf::strings::to_timestamp`
- PR #6259 Fix compilation error with GCC 8
- PR #6258 Pin libcudf conda recipe to boost 1.72.0
- PR #6264 Remove include statement for missing rmm/mr/device/default_memory_resource.hpp file
- PR #6296 Handle double quote and escape character in json
- PR #6294 Fix read parquet key error when reading empty pandas DataFrame with cudf
- PR #6285 Removed unsafe `reinterpret_cast` and implicit pointer-to-bool casts
- PR #6281 Fix unreachable code warning in datetime.cuh
- PR #6286 Fix `read_csv` `int32` overflow
- PR #6466 Fix ORC reader issue with decimal type
- PR #6310 Replace a misspelled reference to `master` branch with `main` branch in a comment in changelog.sh
- PR #6289 Revert #6206
- PR #6291 Fix issue related to row-wise operations in `cudf.DataFrame`
- PR #6304 Fix span_tests.cu includes
- PR #6331 Avoids materializing `RangeIndex` during frame concatenation (when not needed)
- PR #6278 Add filter tests for struct columns
- PR #6344 Fix rolling-window count for null input
- PR #6353 Rename `skip_rows` parameter to `skiprows` in `read_parquet`, `read_avro` and `read_orc`
- PR #6361 Detect overflow in hash join
- PR #6386 Removed c-style pointer casts and redundant `reinterpret_cast`s in cudf::io
- PR #6397 Fix `build.sh` when `PARALLEL_LEVEL` environment variable isn't set
- PR #6366 Fix Warp Reduce calls in cuio statistics calculation to account for NaNs
- PR #6345 Fix ambiguous constructor compile error with devtoolset
- PR #6335 Fix conda commands for outdated python version
- PR #6372 Fix issue related to reading a nullable boolean column in `read_parquet` when `engine=pyarrow`
- PR #6378 Fix index handling in `fillna` and incorrect pytests
- PR #6380 Avoid problematic column-index check in dask_cudf.read_parquet test
- PR #6403 Fix error handling in notebook tests
- PR #6408 Avoid empty offset list in hash_partition output
- PR #6402 Update JNI build to pull fixed nvcomp commit
- PR #6410 Fix uses of dangerous default values in Python code
- PR #6424 Check for null data in close for ColumnBuilder
- PR #6426 Fix `RuntimeError` when `np.bool_` is passed as `header` in `to_csv`
- PR #6443 Make java apis getList and getStruct public
- PR #6445 Add `dlpack` to run section of libcudf conda recipe to fix downstream build issues
- PR #6450 Make java Column Builder row agnostic
- PR #6309 Make all CI `.sh` scripts have a consistent set of permissions
- PR #6491 Remove repo URL from Java build-info
- PR #6462 Bug fixes for ColumnBuilder
- PR #6497 Fixes a data corruption issue reading list columns from Parquet files with multiple row groups.
- PR #5292 Add unsigned int type columns to libcudf
- PR #5287 Add `index.join` support
- PR #5222 Adding clip feature support to DataFrame and Series
- PR #5318 Support/leverage DataFrame.shuffle in dask_cudf
- PR #4546 Support pandas 1.0+
- PR #5331 Add `cudf::drop_nans`
- PR #5327 Add `cudf::cross_join` feature
- PR #5204 Concatenate strings columns using row separator as strings column
- PR #5342 Add support for `StringMethods.__getitem__`
- PR #5358 Add zero-copy `column_view` cast for compatible types
- PR #3504 Add External Kafka Datasource
- PR #5356 Use `size_type` instead of `scalar` in `cudf::repeat`.
- PR #5397 Add internal implementation of nested loop equijoins.
- PR #5303 Add slice_strings functionality using delimiter string
- PR #5394 Enable cast and binops with duration types (builds on PR 5359)
- PR #5301 Add Java bindings for `zfill`
- PR #5411 Enable metadata collection for chunked parquet writer
- PR #5359 Add duration types
- PR #5364 Validate array interface during buffer construction
- PR #5418 Add support for `DataFrame.info`
- PR #5425 Add Python `Groupby.rolling()`
- PR #5434 Add nvtext function generate_character_grams
- PR #5442 Add support for `cudf.isclose`
- PR #5444 Remove usage of deprecated RMM APIs and headers.
- PR #5463 Add `.str.byte_count` python api and cython(bindings)
- PR #5488 Add plumbings for `.str.replace_tokens`
- PR #5502 Add Unsigned int types support in dlpack
- PR #5497 Add `.str.isinteger` & `.str.isfloat`
- PR #5511 Port of clx subword tokenizer to cudf
- PR #5528 Add unsigned int reading and writing support to parquet
- PR #5510 Add support for `cudf.Index` to create Indexes
- PR #5618 Add Kafka as a cudf datasource
- PR #5668 Adding support for `cudf.testing`
- PR #5460 Add support to write to remote filesystems
- PR #5454 Add support for `DataFrame.append`, `Index.append`, `Index.difference` and `Index.empty`
- PR #5536 Parquet reader - add support for multiple sources
- PR #5654 Adding support for `cudf.DataFrame.sample` and `cudf.Series.sample`
- PR #5607 Add Java bindings for duration types
- PR #5612 Add `is_hex` strings API
- PR #5625 String conversion to and from duration types
- PR #5659 Added support for rapids-compose for Java bindings and other enhancements
- PR #5637 Parameterize Null comparator behaviour in Joins
- PR #5623 Add `is_ipv4` strings API
- PR #5723 Parquet reader - add support for nested LIST columns
- PR #5669 Add support for reading JSON files with missing or out-of-order fields
- PR #5674 Support JIT backend on PowerPC64
- PR #5629 Add `ListColumn` and `ListDtype`
- PR #5658 Add `filter_tokens` nvtext API
- PR #5666 Add `filter_characters_of_type` strings API
- PR #5778 Add support for `cudf::table` to `arrow::Table` and `arrow::Table` to `cudf::table`
- PR #5673 Always build and test with per-thread default stream enabled in the GPU CI build
- PR #5438 Add MD5 hash support
- PR #5704 Initial `fixed_point` Column Support
- PR #5716 Add `double_type_dispatcher` to libcudf
- PR #5739 Add `nvtext::detokenize` API
- PR #5645 Enforce pd.NA and Pandas nullable dtype parity
- PR #5729 Create nvtext normalize_characters API from the subword_tokenize internal function
- PR #5572 Add `cudf::encode` API.
- PR #5767 Add `nvtext::porter_stemmer_measure` and `nvtext::is_letter` APIs
- PR #5753 Add `cudf::lists::extract_list_element` API
- PR #5568 Add support for `Series.keys()` and `DataFrame.keys()`
- PR #5782 Add Kafka support to custreamz
- PR #5642 Add `GroupBy.groups()`
- PR #5811 Add `nvtext::edit_distance` API
- PR #5789 Add groupby support for duration types
- PR #5810 Make Cython subdirs packages and simplify package_data
- PR #6005 Add support for Ampere
- PR #5807 Initial support for struct columns
- PR #5817 Enable more `fixed_point` unit tests by introducing "scale-less" constructor
- PR #5822 Add `cudf_kafka` to `custreamz` run time conda dependency and fix bash syntax issue
- PR #5903 Add duration support for Parquet reader, writer
- PR #5845 Add support for `mask_to_bools`
- PR #5851 Add support for `Index.sort_values`
- PR #5904 Add slice/split support for LIST columns
- PR #5857 Add dtypes information page in python docs
- PR #5859 Add conversion from `fixed_point` to `bool`
- PR #5781 Add duration types support in cudf(python/cython)
- PR #5815 LIST Support for ColumnVector
- PR #5931 Support for `add_calendrical_months` API
- PR #5992 Add support for `.dt.strftime`
- PR #6075 Parquet writer - add support for nested LIST columns
- PR #5492 compile_udf: compile straight to PTX instead of using @jit
- PR #5605 Automatically flush RMM allocate/free logs in JNI
- PR #5632 Switch JNI code to use `pool_memory_resource` instead of CNMeM
- PR #5486 Link Boost libraries statically in the Java build
- PR #5479 Link Arrow libraries statically
- PR #5414 Use new release of Thrust/CUB in the JNI build
- PR #5403 Update required CMake version to 3.14 in contribution guide
- PR #5245 Add column reduction benchmark
- PR #5315 Use CMake `FetchContent` to obtain `cub` and `thrust`
- PR #5398 Use CMake `FetchContent` to obtain `jitify` and `libcudacxx`
- PR #5268 Rely on NumPy arrays for out-of-band pickling
- PR #5288 Drop `auto_pickle` decorator #5288
- PR #5231 Type `Buffer` as `uint8`
- PR #5305 Add support for `numpy`/`cupy` array in `DataFrame` construction
- PR #5308 Coerce frames to `Buffer`s in deserialization
- PR #5309 Handle host frames in serialization
- PR #5312 Test serializing `Series` after `slice`
- PR #5248 Support interleave_columns for string types
- PR #5332 Remove outdated dask-xgboost docs
- PR #5349 Improve libcudf documentation CSS style
- PR #5317 Optimize fixed_point rounding shift for integers
- PR #5386 Remove `cub` from `include_dirs` in `setup.py`
- PR #5373 Remove legacy nvstrings/nvcategory/nvtext
- PR #5362 Remove dependency on `rmm._DevicePointer`
- PR #5302 Add missing comparison operators to `fixed_point` type
- PR #5824 Mark host frames as not needing to be writeable
- PR #5354 Split Dask deserialization methods by dask/cuda
- PR #5363 Handle `0-dim` inputs while broadcasting to a column
- PR #5396 Remove legacy tests env variable from build.sh
- PR #5374 Port nvtext character_tokenize API to libcudf
- PR #5389 Expose typed accessors for Java HostMemoryBuffer
- PR #5379 Avoid chaining `Buffer`s
- PR #5387 Port nvtext replace_tokens API to libcudf
- PR #5381 Change numpy usages to cupy in `10min.ipynb`
- PR #5408 Update pyarrow and arrow-cpp to 0.17.1
- PR #5366 Add benchmarks for cuIO writers
- PR #5913 Call cudaMemcpyAsync/cudaMemsetAsync in JNI
- PR #5405 Add Error message to `StringColumn.unary_operator`
- PR #5424 Add python plumbing for `.str.character_tokenize`
- PR #5420 Aligning signature of `Series.value_counts` to Pandas
- PR #5535 Update document for XGBoost usage with dask-cuda
- PR #5431 Adding support for unsigned int
- PR #5426 Refactor strings code to minimize calls to regex
- PR #5433 Add support for column inputs in `strings::starts_with` and `strings::ends_with`
- PR #5427 Add Java bindings for unsigned data types
- PR #5429 Improve text wrapping in libcudf documentation
- PR #5443 Remove unused `is_simple` trait
- PR #5441 Update Java HostMemoryBuffer to only load native libs when necessary
- PR #5452 Add support for strings conversion using negative timestamps
- PR #5437 Improve libcudf join documentation
- PR #5458 Install meta packages for dependencies
- PR #5467 Move doc customization scripts to Jenkins
- PR #5468 Add cudf::unique_count(table_view)
- PR #5482 Use rmm::device_uvector in place of rmm::device_vector in copy_if
- PR #5483 Add NVTX range calls to dictionary APIs
- PR #5477 Add `is_index_type` trait
- PR #5487 Use sorted lists instead of sets for pytest parameterization
- PR #5491 allow build libcudf in custom dir
- PR #5501 Adding only unsigned types support for categorical column codes
- PR #5570 Add Index APIs such as `Int64Index`, `UInt64Index` and others
- PR #5503 Change `unique_count` to `distinct_count`
- PR #5514 `convert_datetime.cu` Small Cleanup
- PR #5496 Rename .cu tests (zero cuda kernels) to .cpp files
- PR #5518 split iterator and gather tests to speedup build tests
- PR #5526 Change `type_id` to enum class
- PR #5559 Java APIs for missing date/time operators
- PR #5582 Add support for axis and other parameters to `DataFrame.sort_index` and fix other bunch of issues.
- PR #5562 Add missing join type for java
- PR #5584 Refactor `CompactProtocolReader::InitSchema`
- PR #5591 Add `__arrow_array__` protocol and raise a descriptive error message
- PR #5635 Add cuIO reader benchmarks for CSV, ORC and Parquet
- PR #5601 Instantiate Table instances in `Frame._concat` to avoid `DF.insert()` overhead
- PR #5602 Add support for concatenation of `Series` & `DataFrame` in `cudf.concat` when `axis=0`
- PR #5603 Refactor JIT `parser.cpp`
- PR #5643 Update `isort` to 5.0.4
- PR #5648 OO interface for hash join with explicit `build/probe` semantic
- PR #5662 Make Java ColumnVector(long nativePointer) constructor public
- PR #5681 Pin black, flake8 and isort
- PR #5679 Use `pickle5` to test older Python versions
- PR #5684 Use `pickle5` in `Serializable` (when available)
- PR #5419 Support rolling, groupby_rolling for durations
- PR #5687 Change strings::split_record to return a lists column
- PR #5708 Add support for `dummy_na` in `get_dummies`
- PR #5709 Update java build to help cuSpatial with java bindings
- PR #5713 Remove old NVTX utilities
- PR #5726 Replace use of `assert_frame_equal` in tests with `assert_eq`
- PR #5720 Replace owning raw pointers with std::unique_ptr
- PR #5702 Add inherited methods to python docs and other docs fixes
- PR #5733 Add support for `size` property in `DataFrame`/`Series`/`Index`/`MultiIndex`
- PR #5735 Force timestamp creation only with duration
- PR #5743 Reduce number of test cases in concatenate benchmark
- PR #5748 Disable `tolist` API in `Series` & `Index` and add `tolist` dispatch in `dask-cudf`
- PR #5744 Reduce number of test cases in reduction benchmark
- PR #5756 Switch JNI code to use the RMM owning wrapper
- PR #5725 Integrate Gbenchmarks into CI
- PR #5752 Add cuDF internals documentation (ColumnAccessor)
- PR #5759 Fix documentation describing JIT cache default location
- PR #5780 Add Java bindings for pad
- PR #5775 Update dask_cudf.read_parquet to align with upstream improvements
- PR #5785 Enable computing views of ListColumns
- PR #5791 Get nullable_pd_dtype from kwargs if provided in assert_eq
- PR #5786 JNI Header Cleanup for cuSpatial
- PR #5800 Expose arrow datasource instead of directly taking a RandomAccessFile
- PR #5795 Clarify documentation on Boost dependency
- PR #5803 Add in Java support for the repeat command
- PR #5806 Expose the error message from native exception when throwing an OOM exception
- PR #5825 Enable ORC statistics generation by default
- PR #5771 Enable gather/slicing/joins with ListColumns in Python
- PR #5834 Add support for dictionary column in concatenate
- PR #5832 Make dictionary_wrapper constructor from a value explicit
- PR #5833 Pin `dask` and `distributed` version to `2.22.0`
- PR #5856 Bump Pandas support to >=1.0,<1.2
- PR #5855 Java interface to limit RMM maximum pool size
- PR #5853 Disable `fixed_point` for use in `copy_if`
- PR #5854 Raise informative error in `DataFrame.iterrows` and `DataFrame.itertuples`
- PR #5864 Replace cnmem with pool_memory_resource in test/benchmark fixtures
- PR #5863 Explicitly require `ucx-py` on CI
- PR #5879 Added support of sub-types and object wrappers in concat()
- PR #5884 Use S3 bucket directly for benchmark plugin
- PR #5881 Add in JVM extractListElement and stringSplitRecord
- PR #5885 Add in java support for merge sort
- PR #5894 Small code improvement / cleanup
- PR #5899 Add in gather support for Java
- PR #5906 Add macros for showing line of failures in unit tests
- PR #5933 Add in APIs to read/write arrow IPC formatted data from java
- PR #3918 Update cuDF internals doc
- PR #5970 Map data to pandas through arrow, always
- PR #6012 Remove `cudf._cuda` and replace usages with `rmm._cuda`
- PR #6045 Parametrize parquet_reader_list tests
- PR #6053 Import traits.hpp for cudftestutils consumers
- PR #6034 Specify `--basetemp` for `py.test` run
- PR #5793 Fix leak in mutable_table_device_view by deleting _descendant_storage in table_device_view_base::destroy
- PR #5525 Make sure to allocate bitmasks of string columns only once
- PR #5336 Initialize conversion tables on a per-context basis
- PR #5283 Fix strings::ipv4_to_integers overflow to negative
- PR #5269 Explicitly require NumPy
- PR #5271 Fix issue when different dtype values are passed to `.cat.add_categories`
- PR #5333 Fix `DataFrame.loc` issue with list like argument
- PR #5299 Update package version for Java bindings
- PR #5300 Add support to ignore `None` in `cudf.concat` input
- PR #5334 Fix pickling sizeof test
- PR #5337 Fix broken alias from DataFrame.{at,iat} to {loc, iloc}
- PR #5347 Fix APPLY_BOOLEAN_MASK_BENCH segfault
- PR #5368 Fix loc indexing issue with `datetime` type index
- PR #5367 Fix API for `cudf::repeat` in `cudf::cross_join`
- PR #5377 Handle array of cupy scalars in to_column
- PR #5326 Fix `DataFrame.__init__` for list of scalar inputs and related dask issue
- PR #5383 Fix cython `type_id` enum mismatch
- PR #5982 Fix gcc-9 compile errors under CUDA 11
- PR #5382 Fix CategoricalDtype equality comparisons
- PR #5989 Fix gcc-9 warnings on narrowing conversion
- PR #5385 Fix index issues in `DataFrame.from_gpu_matrix`
- PR #5390 Fix Java data type IDs and string interleave test
- PR #5392 Fix documentation links
- PR #5978 Fix option to turn off NVTX
- PR #5410 Fix compile warning by disallowing bool column type for slice_strings
- PR #5404 Fix issue with column creation when chunked arrays are passed
- PR #5409 Use the correct memory resource when creating empty null masks
- PR #5399 Fix cpp compiler warnings of unreachable code
- PR #5439 Fix nvtext ngrams_tokenize performance for multi-byte UTF8
- PR #5446 Fix compile error caused by out-of-date PR merge (4990)
- PR #5983 Fix JNI gcc-9 compile error under CUDA 11
- PR #5423 Fix any() reduction ignore nulls
- PR #5459 Fix str.translate to convert table characters to UTF-8
- PR #5480 Fix merge sort docs
- PR #5465 Fix benchmark out of memory errors due to multiple initialization
- PR #5473 Fix RLEv2 patched base in ORC reader
- PR #5472 Fix str concat issue with indexed series
- PR #5478 Fix `loc` and `iloc` doc
- PR #5484 Ensure flat index after groupby if nlevels == 1
- PR #5489 Fix drop_nulls/boolean_mask corruption for large columns
- PR #5504 Remove some java assertions that are not needed
- PR #5516 Update gpuCI image in local build script
- PR #5529 Fix issue with negative timestamp in orc writer
- PR #5523 Handle
dtype
ofBuffer
objects when not passed explicitly - PR #5534 Fix the java build around type_id
- PR #5564 Fix CudfEngine.read_metadata API in dask_cudf
- PR #5537 Fix issue related to using `set_index` on a string series
- PR #5561 Fix `copy_bitmask` issue with offset
- PR #5609 Fix loc and iloc issue with column like input
- PR #5578 Fix getattr logic in GroupBy
- PR #5490 Fix python column view
- PR #5613 Fix assigning an equal length object into a masked out Series
- PR #5608 Fix issue related to string types being represented as binary types
- PR #5619 Fix issue related to typecasting when using a `CategoricalDtype`
- PR #5649 Fix issue when empty Dataframe with index are passed to `cudf.concat`
- PR #5644 Fix issue related to Dataframe init when passing in `columns`
- PR #5340 Disable iteration in cudf objects and add support for `DataFrame` initialization with list of `Series`
- PR #5663 Move Duration types under Timestamps in doxygen Modules page
- PR #5664 Update conda upload versions for new supported CUDA/Python
- PR #5656 Fix issue with incorrect docker image being used in local build script
- PR #5671 Fix chunksize issue with `DataFrame.to_csv`
- PR #5672 Fix crash in parquet writer while writing large string data
- PR #5675 Allow lists_column_wrappers to be constructed from incomplete hierarchies.
- PR #5691 Raise error on incompatible mixed-type input for a column
- PR #5692 Fix compilation issue with gcc 7.4.0 and CUDA 10.1
- PR #5693 Add fix missing from PR 5656 to update local docker image to py3.7
- PR #5703 Small fix for dataframe constructor with cuda array interface objects that don't have `descr` field
- PR #5727 Fix `Index.__repr__` to allow representation of null values
- PR #5719 Fix Frame._concat() with categorical columns
- PR #5736 Disable unsigned type in ORC writer benchmarks
- PR #5745 Update JNI cast for inability to cast timestamp and integer types
- PR #5750 Add RMM_ROOT/include to the spdlog search path in JNI build
- PR #5763 Update Java slf4j version to match Spark 3.0
- PR #5816 Always preserve list column hierarchies across operations.
- PR #5766 Fix issue related to `iloc` and slicing a `DataFrame`
- PR #5827 Revert fallback for `tolist` being absent
- PR #5774 Add fallback for when `tolist` is absent
- PR #5319 Disallow SUM and specialize MEAN of timestamp types
- PR #5797 Fix a missing data issue in some Parquet files
- PR #5787 Fix column create from dictionary column view
- PR #5764 Remove repetition of install instructions
- PR #5926 Fix SeriesGroupBy.nunique() to return a Series
- PR #5813 Fix normalizer exception with all-null strings column
- PR #5820 Fix ListColumn.to_arrow for all null case
- PR #5837 Bash syntax error in prebuild.sh preventing `cudf_kafka` and `libcudf_kafka` from being uploaded to Anaconda
- PR #5841 Added custreamz functions that were missing in interface layer
- PR #5844 Fix `.str.cat` when objects with different index are passed
- PR #5849 Modify custreamz api to integrate seamlessly with python streamz
- PR #5866 cudf_kafka python version inconsistencies in Anaconda packages
- PR #5872 libcudf_kafka r_path is causing docker build failures on centos7
- PR #5869 Fix bug in parquet writer in writing string column with offset
- PR #5910 Propagate `CUDA` insufficient driver error to the user
- PR #5914 Link CUDA against libcudf_kafka
- PR #5895 Do not break kafka client consumption loop on local client timeout
- PR #5915 Fix reference count on Java DeviceMemoryBuffer after contiguousSplit
- PR #5941 Fix issue related to `string` to `datetime64` column typecast
- PR #5927 Fix return type of `MultiIndex.argsort`
- PR #5942 Fix JIT cache multiprocess test failure
- PR #5929 Revised assertEquals for List Columns in java tests
- PR #5947 Fix null count for child device column vector
- PR #5951 Fix mkdir error in benchmark build
- PR #5949 Find Arrow include directory for JNI builds
- PR #5964 Fix API doc page title tag
- PR #5981 Handle `nat` in `fillna` for datetime and timedelta types
- PR #6016 Fix benchmark fixture segfault
- PR #6003 Fix concurrent JSON reads crash
- PR #6032 Change black version to 19.10b0 in .pre-commit-config.yaml
- PR #6041 Fix Java memory resource handler to rethrow original exception object
- PR #6057 Fix issue in parquet reader with reading columns out of file-order
- PR #6098 Patch Thrust to workaround CUDA_CUB_RET_IF_FAIL macro clearing CUDA errors
- PR #5042 Use RMM for Numba
- PR #4472 Add new `partition` API to replace `scatter_to_tables`.
- PR #4626 LogBase binops
- PR #4750 Normalize NANs and Zeroes (JNI Bindings)
- PR #4689 Compute last day of the month for a given date
- PR #4771 Added in an option to statically link against cudart
- PR #4788 Add cudf::day_of_year API
- PR #4789 Disallow timestamp sum and diffs via binary ops
- PR #4815 Add JNI total memory allocated API
- PR #4906 Add Java bindings for interleave_columns
- PR #4900 Add `get_element` to obtain scalar from a column given an index
- PR #4938 Add Java bindings for strip
- PR #4923 Add Java and JNI bindings for string split
- PR #4972 Add list_view (cudf::LIST) type
- PR #4990 Add lists_column_view, list_column_wrapper, lists support for concatenate
- PR #5073 gather support for cudf::LIST columns
- PR #5004 Added a null considering min/max binary op
- PR #4992 Add Java bindings for converting nans to nulls
- PR #4975 Add Java bindings for first and last aggregate expressions based on nth
- PR #5036 Add positive remainder binary op functionality
- PR #5055 Add atan2 binary op
- PR #5099 Add git commit hook for clang-format
- PR #5072 Adding cython binding to `get_element`
- PR #5092 Add `cudf::replace_nans`
- PR #4881 Support row_number in rolling_window
- PR #5068 Add Java bindings for arctan2
- PR #5132 Support out-of-band buffers in Python pickling
- PR #5139 Add `Serializable` ABC for Python
- PR #5149 Add Java bindings for PMOD
- PR #5153 Add Java bindings for extract
- PR #5196 Add Java bindings for NULL_EQUALS, NULL_MAX and NULL_MIN
- PR #5192 Add support for `cudf.to_datetime`
- PR #5203 Add Java bindings for is_integer and is_float
- PR #5205 Add ci test for libcudf, libnvstrings headers existence check in meta.yml
- PR #5239 Support for custom cuIO datasource classes
- PR #5293 Add Java bindings for replace_with_backrefs
- PR #5235 Make DataFrame.clean_renderable_dataframe() and DataFrame.get_renderable_dataframe non-public methods
- PR #4995 Add CMake option for per-thread default stream
- PR #5033 Fix Numba deprecations warnings with Numba 0.49+
- PR #4950 Fix import errors with Numba 0.49+
- PR #4825 Update the iloc exp in dataframe.py
- PR #4450 Parquet writer: add parameter to retrieve the raw file metadata
- PR #4531 Add doc note on conda `channel_priority`
- PR #4479 Adding cuda 10.2 support via conda environment file addition
- PR #4486 Remove explicit template parameter from detail::scatter.
- PR #4471 Consolidate partitioning functionality into a single header.
- PR #4483 Add support fill() on dictionary columns
- PR #4498 Adds in support for chunked writers to java
- PR #4073 Enable contiguous split java test
- PR #4606 Fix `scan` unit test and upgrade to more appropriate algorithms
- PR #4527 Add JNI and java bindings for `matches_re`
- PR #4532 Parquet reader: add support for multiple pandas index columns
- PR #4599 Add Java and JNI bindings for string replace
- PR #4655 Raise error for list like dtypes in cudf
- PR #4548 Remove string_view is_null method
- PR #4645 Add Alias for `kurtosis` as `kurt`
- PR #4703 Optimize strings concatenate for many columns
- PR #4769 Remove legacy code from libcudf
- PR #4668 Add Java bindings for log2/log10 unary ops and log_base binary op
- PR #4616 Enable different RMM allocation modes in unit tests
- PR #4520 Fix several single char -> single char case mapping values. Add support for single -> multi char mappings.
- PR #4700 Expose events and more stream functionality in java
- PR #4699 Make Java's MemoryBuffer public and add MemoryBuffer.slice
- PR #4691 Fix compiler argument syntax for ccache
- PR #4792 Port `gather`, `scatter`, and `type_dispatcher` benchmarks to libcudf++
- PR #3581 Remove `bool8`
- PR #4692 Add GPU and CUDA validations
- PR #4705 quantile cython bindings
- PR #4627 Remove legacy Cython
- PR #4688 Add Java count aggregation to include null values
- PR #4331 Improved test for double that considers an epsilon
- PR #4731 Avoid redundant host->device copies when reading the entire CSV/JSON file
- PR #4739 Add missing aggregations for cudf::experimental::reduce
- PR #4738 Remove stop-gaps in StringMethods and enable related tests
- PR #4745 Fix `fsspec` related issue and upgrade `fsspec` version
- PR #4779 Allow reading arbitrary stripes/rowgroup lists in CPP columnar readers
- PR #4766 Update to use header-only NVTX v3 and remove need to link against nvtx.
- PR #4716 Remove direct calls to RMM_ALLOC/RMM_FREE
- PR #4765 Add in java support for sequence
- PR #4772 Cleanup `dask_cudf` `to_parquet` and enable `"_metadata"` creation
- PR #4733 Fix `isin` docs for `DataFrame`, `Series`, `Index`, and add `DataFrame.isin` support
- PR #4767 Remove linking against `gtest_main` and `gmock_main` in unit tests
- PR #4660 Port `cudf::partition` api to python/cython
- PR #4799 Remove null_count() and has_nulls() from column_device_view
- PR #4778 Remove `scatter_to_tables` from libcudf, cython and python
- PR #4783 Add support for child columns to mutable_column_device_view
- PR #4802 Refactor `cudf::transpose` to increase performance.
- PR #4776 Improve doxygen comments for libcudf string/timestamp conversion formats
- PR #4793 Add `cudf._cuda` to setup.py
- PR #4790 Replace the use of deprecated rmm APIs in the test environment
- PR #4809 Improve libcudf doc rendering and add a new main page
- PR #4811 Add precision to subsecond specifier in timestamp/string conversion format
- PR #4543 Add `inplace` parameter support for `Series.replace` & `DataFrame.replace`
- PR #4816 Remove java API use of deprecated RMM APIs
- PR #4817 Fix `fixed_point` documentation
- PR #4844 Change Doxygen color to RAPIDS purple and documentation improvement
- PR #4840 Add docs for `T`, `empty` & `values`
- PR #4841 Remove unused `single_lane_block_popc_reduce` function
- PR #4842 Added Java bindings for titlizing a String column
- PR #4847 Replace legacy NVTX calls with "standalone" NVTX bindings calls
- PR #4851 Performance improvements relating to `concat`
- PR #4852 Add NVTX range calls to strings and nvtext APIs
- PR #4849 Update Java bindings to use new NVTX API
- PR #4845 Add CUDF_FUNC_RANGE to top-level cuIO function APIs
- PR #4848 Side step `unique_count` calculation in `scatter_by_map`
- PR #4863 Create is_integer/is_float functions for checking characters before calling to_integers/to_floats
- PR #4864 Add support for `__array__` method in cuDF
- PR #4853 Added CUDA_TRY to multiple places in libcudf code
- PR #4870 Add chunked parquet file writing from python
- PR #4865 Add docs and clarify limitations of `applymap`
- PR #4867 Parquet reader: coalesce adjacent column chunk reads
- PR #4871 Add in the build information when building the java jar file
- PR #4869 Expose contiguous table when deserializing from Java
- PR #4878 Remove obsolete string_from_host utility
- PR #4873 Prevent mutable_view() from invoking null count
- PR #4806 Modify doc and correct cupy array conversions in `10min-cudf-cupy.ipynb`
- PR #4877 Fix `DataFrame.mask` and align `mask` & `where` behavior with pandas
- PR #4884 Add more NVTX annotations in cuDF Python
- PR #4902 Use ContextDecorator instead of contextmanager for nvtx.annotate
- PR #4894 Add annotations for the `.columns` property and setter
- PR #4901 Improve unit tests for casting Java numeric types to string
- PR #4888 Handle dropping of nan's & nulls using `skipna` parameter in Statistical reduction ops
- PR #4903 Improve internal documentation of cudf-io compression/decompression kernels
- PR #4905 Get decorated function name as message when annotating
- PR #4907 Reuse EventAttributes across NVTX annotations
- PR #4912 Drop old `valid` check in `element_indexing`
- PR #4924 Properly handle npartition argument in rearrange_by_hash
- PR #4918 Adding support for `cupy.ndarray` in `series.loc`
- PR #4909 Added ability to transform a column using cuda method in Java bindings
- PR #3259 Add .clang-format file & format all files
- PR #4943 Fix-up error handling in GPU detection
- PR #4917 Add support for casting unsupported `dtypes` of same kind
- PR #4928 Misc performance improvements for `scatter_by_map`
- PR #4927 Use stack for memory in `deviceGetName`
- PR #4933 Enable nop annotate
- PR #4929 Java methods ensure calling thread's CUDA device matches RMM device
- PR #4956 Dropping `find_first_value` and `find_last_value`
- PR #4962 Add missing parameters to `DataFrame.replace` & `Series.replace`
- PR #4960 Return the result of `to_json`
- PR #4963 Use `cudaDeviceAttr` in `getDeviceAttribute`
- PR #4953 Add documentation for supported NVIDIA GPUs and CUDA versions for cuDF
- PR #4967 Add more comments to top-level gpuinflate and debrotli kernels
- PR #4968 Add CODE_OF_CONDUCT.md
- PR #4980 Change Java HostMemoryBuffer default to prefer pinned memory
- PR #4994 clang-format "cpp/tests" directory
- PR #4993 Remove Java memory prediction code
- PR #4985 Add null_count to Python Column ctors and use already computed null_count when possible
- PR #4998 Clean up dispatch of aggregation methods in result_cache
- PR #5000 Performance improvements in `isin` and dask_cudf backend
- PR #5002 Fix Column.reduce to accept `null_count`
- PR #5006 Add Java bindings for strip, lstrip and rstrip
- PR #5047 Add Cython binding for libcudf++ CSV reader
- PR #5027 Move nvstrings standalone docs pages to libcudf doxygen pages
- PR #4947 Add support for `CategoricalColumn` to be type-casted with different categories
- PR #4822 Add constructor to `pq_chunked_state` to enable using RAII idiom
- PR #5024 CSV reader input stage optimizations
- PR #5061 Add support for writing parquet to python file-like objects
- PR #5034 Use loc to apply boolmask to frame efficiently when constructing query result
- PR #5039 Make `annotate` picklable
- PR #5045 Remove call to `unique()` in concat when `axis=1`
- PR #5023 Object oriented join and column agnostic typecasting
- PR #5049 Add grouping of libcudf apis into doxygen modules
- PR #5069 Remove duplicate documentation from detail headers
- PR #5075 Add simple row-group aggregation mechanism in dask_cudf read_parquet
- PR #5084 Improve downcasting in `Series.label_encoding()` to reduce memory usage
- PR #5085 Print more precise numerical strings in unit tests
- PR #5028 Add Docker 19 support to local gpuci build
- PR #5093 Add `.cat.as_known` related test in `dask_cudf`
- PR #5100 Add documentation on libcudf doxygen guidelines
- PR #5106 Add detail API for `cudf::concatenate` with tables
- PR #5104 Add missing `.inl` files to clang-format and git commit hook
- PR #5112 Adding `htoi` and `ip2int` support to `StringMethods`
- PR #5101 Add POSITION_INDEPENDENT_CODE flag to static cudftestutil library
- PR #5109 Update CONTRIBUTING.md for `clang-format` pre-commit hook
- PR #5054 Change String typecasting to be inline with Pandas
- PR #5123 Display more useful info on `clang-format` CI Failure
- PR #5058 Adding cython binding for CSV writer
- PR #5156 Raise error when applying boolean mask containing null values.
- PR #5137 Add java bindings for getSizeInBytes in DType
- PR #5194 Update Series.fillna to reflect dtype behavior
- PR #5159 Add `make_meta_object` in `dask_cudf` backend and add `str.split` test
- PR #5147 Use logging_resource_adaptor from RMM in the JNI code
- PR #5184 Fix style checks
- PR #5198 Add detail headers for strings converter functions
- PR #5199 Add index support in `DataFrame.query`
- PR #5227 Refactor `detail::gather` API to make use of scoped enumerators
- PR #5218 Reduce memory usage when categorifying column with null values.
- PR #5209 Add `nan_as_null` support to `cudf.from_pandas`
- PR #5207 Break up backref_re.cu into multiple source files to improve compile time
- PR #5155 Fix cudf documentation misspellings
- PR #5208 Port search and join benchmark to libcudf++
- PR #5214 Move docs build script into repository
- PR #5219 Add per context cache for JIT kernels
- PR #5250 Improve `to_csv()` support for writing to buffers
- PR #5233 Remove experimental namespace used during libcudf++ refactor
- PR #5213 Documentation enhancements to `cudf` python APIs
- PR #5251 Fix more misspellings in cpp comments and strings
- PR #5261 Add short git commit to conda package name
- PR #5254 Deprecate nvstrings, nvcategory and nvtext
- PR #5270 Add support to check for "NaT" and "None" strings while typecasting to `datetime64`
- PR #5298 Remove unused native deps from java library
- PR #5216 Make documentation uniform for params
- PR #5221 Fix the use of user-provided resource on temporary values
- PR #5181 Allocate null count using the default resource in `copy_if`
- PR #5141 Use user-provided resource correctly in `unary_operation()` and `shift()`
- PR #5064 Fix `hash()` and `construct_join_output_df()` to use user-provided memory resource correctly
- PR #4386 Update Java package to 0.14
- PR #4466 Fix merge key column sorting
- PR #4402 Fix `cudf::strings::join_strings` logic with all-null strings and null narep
- PR #4610 Fix validity bug in string scalar factory
- PR #4570 Fixing loc ordering issue in dataframe
- PR #4612 Fix invalid index handling in cudf:dictionary:add-keys call to gather
- PR #4614 Fix cuda-memcheck errors found in `column_tests.cu` and `copying/utility_tests.cu`
- PR #4639 Fix java column of empty strings issue
- PR #4613 Fix issue related to downcasting in `.loc`
- PR #4615 Fix potential OOB write in ORC writer compression stage
- PR #4587 Fix non-regex libcudf contains methods to return true when target is an empty string
- PR #4617 Fix memory leak in aggregation object destructor
- PR #4633 String concatenation fix in `DataFrame.rename`
- PR #4609 Fix to handle `Series.factorize` when index is set
- PR #4659 Fix strings::replace_re handling empty regex pattern
- PR #4652 Fix misaligned error when computing regex device structs
- PR #4651 Fix hashing benchmark missing includes
- PR #4672 Fix docs for `value_counts` and update test cases
- PR #4672 Fix `__setitem__` handling list of column names
- PR #4673 Fix regex infinite loop while parsing invalid quantifier pattern
- PR #4679 Fix comments for make_dictionary_column factory functions
- PR #4711 Fix column leaks in Java unit test
- PR #4721 Fix string binop to update nulls appropriately
- PR #4722 Fix strings::pad when using pad::both with odd width
- PR #4743 Fix loc issue with Multiindex on DataFrame and Series
- PR #4725 Fix issue java with not setting GPU on background thread
- PR #4701 Fix issue related to mixed input types in `as_column`
- PR #4748 Fix strings::all_characters_of_type to allow verify-types mask
- PR #4747 Fix random failures of decompression gtests
- PR #4749 Setting `nan_as_null=True` while creating a column in DataFrame creation
- PR #4761 Fix issues with `nan_as_null` in certain case
- PR #4650 Fix type mismatch & result format issue in `searchsorted`
- PR #4755 Fix Java build to deal with new quantiles API
- PR #4720 Fix issue related to `dtype` param not being adhered in case of cuda arrays
- PR #4756 Fix regex error checking for valid quantifier condition
- PR #4777 Fix data pointer for column slices of zero length
- PR #4770 Fix readonly flag in `Column.__cuda_array_interface__`
- PR #4800 Fix dataframe slicing with strides
- PR #4796 Fix groupby apply for operations that fail on empty groups
- PR #4801 gitignore `_cuda/*.cpp` files
- PR #4805 Fix hash_object_dispatch definitions in dask_cudf
- PR #4813 Fix `GenericIndex` printing
- PR #4804 Fix issue related to `repartition` during hash based repartition
- PR #4814 Raise error if `to_csv` does not get `filename/path`
- PR #4821 Port apply_boolean_mask_benchmark to new cudf::column types
- PR #4826 Move memory resource from RmmTestEnvironment to the custom gtest main() scope
- PR #4839 Update Java bindings for timestamp cast formatting changes
- PR #4797 Fix string timestamp to datetime conversion with `ms` and `ns`
- PR #4854 Fix several cases of incorrect downcasting of operands in binops
- PR #4834 Fix bug in transform in handling single line UDFs
- PR #4857 Change JIT cache default directory to $HOME/.cudf
- PR #4807 Fix `categories` duplication in `dask_cudf`
- PR #4846 Fix CSV parsing with byte_range parameter and string columns
- PR #4883 Fix series get/set to match pandas
- PR #4861 Fix to_integers illegal-memory-access with all-empty strings column
- PR #4860 Fix issues in HostMemoryBufferTest, and testNormalizeNANsAndZeros
- PR #4879 Fix output for `cudf.concat` with `axis=1` for pandas parity
- PR #4838 Fix to support empty inputs to `replace` method
- PR #4859 JSON reader: fix data type inference for string columns
- PR #4868 Temporary fix to skip validation on Dask related runs
- PR #4872 Fix broken column wrapper constructors in merge benchmark
- PR #4875 Fix cudf::strings::from_integer logic converting min integer to string
- PR #4876 Mark Java cleaner objects as being cleaned even if exception is thrown
- PR #4780 Handle nulls in Statistical column operations
- PR #4886 Minimize regex-find calls in multi-replace cudf::strings::replace_re function
- PR #4887 Remove `developer.rst` and any links
- PR #4915 Fix to `reset_index` inplace in MultiIndex and other places
- PR #4899 Fix series inplace handling
- PR #4940 Fix boolean mask issue with large sized Dataframe
- PR #4889 Fix multi-index merging
- PR #4922 Fix cudf::strings:split logic for many columns
- PR #4949 Fix scatter, gather benchmark constructor call
- PR #4958 Fix strings::replace perf for long strings
- PR #4965 Raise Error when there are duplicate columns sent to `cudf.concat`
- PR #4983 Fix from_cudf in dask_cudf
- PR #4996 Parquet writer: fix potentially zero-sized string dictionary
- PR #5009 Fix pickling for string and categorical columns
- PR #4984 Fix groupby nth aggregation negative n and exclude nulls
- PR #5011 Fix DataFrame loc issue with boolean masking
- PR #4977 Fix compilation of cuDF benchmarks with build.sh
- PR #5018 Fix crash when JIT cache dir inaccessible. Fix inter version cache clash for custom cache path.
- PR #5005 Fix CSV reader error when only one of the row selection parameters is set
- PR #5022 Add timestamp header to transform
- PR #5021 Fix bug with unsigned right shift and scalar lhs
- PR #5020 Fix `conda install pre_commit` not found when setting up dev environment
- PR #5030 Fix Groupby sort=True
- PR #5029 Change temporary dir to working dir for cudf io tests
- PR #5040 Fix `make_scalar_iterator()` and `make_pair_iterator(scalar)` to not copy values to host
- PR #5041 Fix invalid java test for shift right unsigned
- PR #5043 Remove invalid examples page libcudf doxygen
- PR #5060 Fix unsigned char limits issue in JIT by updating Jitify
- PR #5070 Fix libcudf++ csv reader support for hex dtypes, doublequotes and empty columns
- PR #5057 Fix metadata_out parameter not reaching parquet `write_all`
- PR #5076 Fix JNI code for null_policy enum change
- PR #5031 grouped_time_range_rolling_window assumes ASC sort order
- PR #5032 grouped_time_range_rolling_window should permit invocation without specifying grouping_keys
- PR #5103 Fix `read_csv` issue with names and header
- PR #5090 Fix losing nulls while creating DataFrame from dictionary
- PR #5089 Return false for sign-only string in libcudf is_float and is_integer
- PR #5124 `DataFrame.rename` support for renaming indexes w/ default for `index`
- PR #5108 Fix float-to-string convert for -0.0
- PR #5111 Fix header not being included in legacy jit transform.
- PR #5115 Fix hex-to-integer logic when string has prefix '0x'
- PR #5118 Fix naming for java string length operators
- PR #5129 Fix missed reference in tests from 5118
- PR #5122 Fix `clang-format` `custrings` bug
- PR #5138 Install `contextvars` backport on Python 3.6
- PR #5145 Fix an issue with calling an aggregation operation on `SeriesGroupBy`
- PR #5148 Fix JNI build for GCC 8
- PR #5162 Fix issues related to empty `Dataframe` in `as_gpu_matrix` & `astype`
- PR #5167 Fix regex extract match to return empty string
- PR #5163 Fix parquet INT96 timestamps before the epoch
- PR #5165 Fix potentially missing last row in libcudf++ csv reader
- PR #5185 Fix flake8 configuration and issues from new flake8 version
- PR #5193 Fix OOB read in csv reader
- PR #5191 Fix the use of the device memory resource
- PR #5212 Fix memory leak in `dlpack.pyx:from_dlpack()`
- PR #5224 Add new headers from 5198 to libcudf/meta.yaml
- PR #5228 Fix datetime64 scalar dtype handling for unsupported time units
- PR #5256 ORC reader: fix loading individual timestamp columns
- PR #5285 Fix DEBUG compilation failure due to `fixed_point.hpp`
- PR #4360 Added Java bindings for bitwise shift operators
- PR #3577 Add initial dictionary support to column classes
- PR #3777 Add support for dictionary column in gather
- PR #3693 add string support, skipna to scan operation
- PR #3662 Define and implement `shift`.
- PR #3861 Added Series.sum feature for String
- PR #4069 Added cast of numeric columns from/to String
- PR #3681 Add cudf::experimental::boolean_mask_scatter
- PR #4040 Add support for n-way merge of sorted tables
- PR #4053 Multi-column quantiles.
- PR #4100 Add set_keys function for dictionary columns
- PR #3894 Add remove_keys functions for dictionary columns
- PR #4107 Add groupby nunique aggregation
- PR #4235 Port nvtx.pyx to use non-legacy libcudf APIs
- PR #4153 Support Dask serialization protocol on cuDF objects
- PR #4127 Add python API for n-way sorted merge (merge_sorted)
- PR #4164 Add Buffer "constructor-kwargs" header
- PR #4172 Add groupby nth aggregation
- PR #4159 Add COUNT aggregation that includes null values
- PR #4190 Add libcudf++ transpose Cython implementation
- PR #4063 Define and implement string capitalize and title API
- PR #4217 Add libcudf++ quantiles Cython implementation
- PR #4216 Add cudf.Scalar Python type
- PR #3782 Add `fixed_point` class to support DecimalType
- PR #4272 Add stable sorted order
- PR #4129 Add libcudf++ interleave_columns and tile Cython implementation
- PR #4262 Port unaryops.pyx to use libcudf++ APIs
- PR #4276 Port avro.pyx to libcudf++
- PR #4259 Ability to create Java host buffers from memory-mapped files
- PR #4240 Add groupby::groups()
- PR #4294 Add Series rank and Dataframe rank
- PR #4304 Add new NVTX infrastructure and add ranges to all top-level compute APIs.
- PR #4319 Add repartition_by_hash API to dask_cudf
- PR #4315 ShiftLeft, ShiftRight, ShiftRightUnsigned binops
- PR #4321 Expose Python Semi and Anti Joins
- PR #4291 Add Java callback support for RMM events
- PR #4298 Port orc.pyx to libcudf++
- PR #4344 Port concat.pyx to libcudf++
- PR #4329 Add support for dictionary columns in scatter
- PR #4352 Add factory function make_column_from_scalar
- PR #4381 Add Java support for copying buffers with asynchronous streams
- PR #4288 Add libcudf++ shift Cython implementation
- PR #4338 Add cudf::sequence() for generating an incrementing list of numeric values
- PR #4456 Add argmin/max and string min/max to sort groupby
- PR #4564 Added Java bindings for clamp operator.
- PR #4602 Add Cython bindings for functions in `datetime.hpp`
- PR #4670 Add java and JNI bindings for contains_re
- PR #4363 Grouped Rolling Window support
- PR #4798 Add UDF support to grouped rolling window
- PR #3917 Add dictionary add_keys function
- PR #3842 ORC writer: add support for column statistics
- PR #4088 Added asString() on ColumnVector in Java that takes a format string
- PR #4484 Port CSV writer to libcudf++
- PR #4641 Add replace example in dataframe.py and update 10min.ipynb
- PR #4140 Add cudf series examples and corr() method for dataframe in dataframe.py
- PR #4187 exposed getNativeView method in Java bindings
- PR #3525 build.sh option to disable nvtx
- PR #3748 Optimize hash_partition using shared memory
- PR #3808 Optimize hash_partition using shared memory and cub block scan
- PR #3698 Add count_(un)set_bits functions taking multiple ranges and updated slice to compute null counts at once.
- PR #3909 Move java backend to libcudf++
- PR #3971 Adding `as_table` to convert Column to Table in python
- PR #3910 Adding sinh, cosh, tanh, asinh, acosh, atanh cube root and rint unary support.
- PR #3972 Add Java bindings for left_semi_join and left_anti_join
- PR #3975 Simplify and generalize data handling in `Buffer`
- PR #3985 Update RMM include files and remove extraneously included header files.
- PR #3601 Port UDF functionality for rolling windows to libcudf++
- PR #3911 Adding null boolean handling for copy_if_else
- PR #4003 Drop old `to_device` utility wrapper function
- PR #4002 Adding to_frame and fix for categorical column issue
- PR #4009 build script update to enable cudf build without installing
- PR #3897 Port cuIO JSON reader to cudf::column types
- PR #4008 Eliminate extra copy in column constructor
- PR #4013 Add cython definition for io readers cudf/io/io_types.hpp
- PR #4028 Port json.pyx to use new libcudf APIs
- PR #4014 ORC/Parquet: add count parameter to stripe/rowgroup-based reader API
- PR #3880 Add aggregation infrastructure support for cudf::reduce
- PR #4059 Add aggregation infrastructure support for cudf::scan
- PR #4021 Change quantiles signature for clarity.
- PR #4057 Handle offsets in cython Column class
- PR #4045 Reorganize `libxx` directory
- PR #4029 Port stream_compaction.pyx to use libcudf++ APIs
- PR #4031 Docs build scripts and instructions update
- PR #4062 Improve how java classifiers are produced
- PR #4038 JNI and Java support for is_nan and is_not_nan
- PR #3786 Adding string support to rolling_windows
- PR #4067 Removed unused `CATEGORY` type ID.
- PR #3891 Port NVStrings (r)split_record to contiguous_(r)split_record
- PR #4070 Port NVText normalize_spaces to use libcudf strings column
- PR #4072 Allow round_robin_partition to single partition
- PR #4064 Add cudaGetDeviceCount to JNI layer
- PR #4075 Port nvtext ngrams-tokenize to libcudf++
- PR #4087 Add support for writing large Parquet files in a chunked manner.
- PR #3716 Update cudf.to_parquet to use new GPU accelerated Parquet writer
- PR #4083 Use two partitions in test_groupby_multiindex_reset_index
- PR #4071 Add Java bindings for round robin partition
- PR #4079 Simply use `mask.size` to create the array view
- PR #4092 Keep mask on GPU for bit unpacking
- PR #4081 Copy from `Buffer`'s pointer directly to host
- PR #4105 Change threshold of using optimized hash partition code
- PR #4101 Redux serialize `Buffer` directly with `__cuda_array_interface__`
- PR #4098 Remove legacy calls from libcudf strings column code
- PR #4044 Port join.pyx to use libcudf++ APIs
- PR #4111 Use `Buffer`'s to serialize `StringColumn`
- PR #4567 Optimize `__reduce__` in `StringColumn`
- PR #4590 Register a few more types for Dask serialization
- PR #4113 Get `len` of `StringColumn`s without `nvstrings`
- PR #4147 Remove workaround for UNKNOWN_NULL_COUNT in contiguous_split.
- PR #4130 Renames in-place `cudf::experimental::fill` to `cudf::experimental::fill_in_place`
- PR #4136 Add `Index.names` property
- PR #4139 Port rolling.pyx to new libcudf APIs
- PR #4143 Renames in-place `cudf::experimental::copy_range` to `cudf::experimental::copy_range_in_place`
- PR #4144 Release GIL when calling libcudf++ functions
- PR #4082 Rework MultiColumns in cuDF
- PR #4149 Use "type-serialized" for pickled types like Dask
- PR #4174 Port hash groupby to libcudf++
- PR #4171 Split java host and device vectors to make a vector truly immutable
- PR #4167 Port `search` to libcudf++ (support multi-column searchsorted)
- PR #4163 Assert Dask CUDA serializers have `Buffer` frames
- PR #4165 List serializable classes once
- PR #4168 IO readers: do not create null mask for non-nullable columns
- PR #4177 Use `uint8` type for host array copy of `Buffer`
- PR #4183 Update Google Test Execution
- PR #4182 Rename cuDF serialize functions to be more generic
- PR #4176 Add option to parallelize setup.py's cythonize
- PR #4191 Porting sort.pyx to use new libcudf APIs
- PR #4196 reduce CHANGELOG.md merge conflicts
- PR #4197 Added notebook testing to gpuCI gpu build
- PR #4220 Port strings wrap functionality.
- PR #4204 Port nvtext create-ngrams function
- PR #4219 Port dlpack.pyx to use new libcudf APIs
- PR #4225 Remove stale notebooks
- PR #4233 Porting replace.pyx to use new libcudf APIs
- PR #4223 Fix a few of the Cython warnings
- PR #4224 Optimize concatenate for many columns
- PR #4234 Add BUILD_LEGACY_TESTS cmake option
- PR #4231 Support for custom cuIO data_sink classes.
- PR #4251 Add class to docs in `dask-cudf` `derived_from`
- PR #4261 libxx Cython reorganization
- PR #4274 Support negative position values in slice_strings
- PR #4282 Porting nvstrings conversion functions from new libcudf++ to Python/Cython
- PR #4290 Port Parquet to use new libcudf APIs
- PR #4299 Convert cudf::shift to column-based api
- PR #4301 Add support for writing large ORC files in a chunked manner
- PR #4306 Use libcudf++ `unary.pyx` cast instead of legacy cast
- PR #4295 Port reduce.pyx to libcudf++ API
- PR #4305 Move gpuarrow.pyx and related libarrow_cuda files into `_libxx`
- PR #4244 Port nvstrings Substring Gather/Scatter functions to cuDF Python/Cython
- PR #4280 Port nvstrings Numeric Handling functions to cuDF Python/Cython
- PR #4278 Port filling.pyx to libcudf++ API
- PR #4328 Add memory threshold callbacks for Java RMM event handler
- PR #4336 Move a bunch of internal nvstrings code to use native StringColumns
- PR #4166 Port `is_sorted.pyx` to use libcudf++ APIs
- PR #4351 Remove a bunch of internal usage of Numba; set rmm as cupy allocator
- PR #4333 nvstrings case/capitalization cython bindings
- PR #4345 Removed an undesirable backwards include from /include to /src in cuIO writers.hpp
- PR #4367 Port copying.pyx to use new libcudf
- PR #4362 Move pq_chunked_state struct into its own header to match how orc writer is doing it.
- PR #4339 Port libcudf strings `wrap` api to cython/python
- PR #4236 Update dask_cudf.io.to_parquet to use cudf to_parquet
- PR #4311 Port nvstrings String Manipulations functions to cuDF Python/Cython
- PR #4373 Port nvstrings Regular Expressions functions to cuDF Python/Cython
- PR #4308 Replace dask_cudf sort_values and improve set_index
- PR #4407 Enable `.str.slice` & `.str.get` and `.str.zfill` unit-tests
- PR #4412 Require Dask + Distributed 2.12.0+
- PR #4377 Support loading avro files that contain nested arrays
- PR #4436 Enable `.str.cat` and fix `.str.split` on python side
- PR #4405 Port nvstrings (Sub)string Comparisons functions to cuDF Python/Cython
- PR #4316 Add Java and JNI bindings for substring expression
- PR #4314 Add Java and JNI bindings for string contains
- PR #4461 Port nvstrings Miscellaneous functions to cuDF Python/Cython
- PR #4495 Port nvtext to cuDF Python/Cython
- PR #4503 Port binaryop.pyx to libcudf++ API
- PR #4499 Adding changes to handle include `keep_index` and `RangeIndex`
- PR #4533 Import `tlz` for optional `cytoolz` support
- PR #4493 Skip legacy testing in CI
- PR #4346 Port groupby Cython/Python to use libcudf++ API
- PR #4524 Updating `__setitem__` for DataFrame to use scalar scatter
- PR #4611 Fix to use direct slicing in iloc for multiindex than using gather under `_get_row_major`
- PR #4534 Disable deprecation warnings as errors.
- PR #4542 Remove RMM init/finalize in cudf test fixture.
- PR #4506 Check for multi-dimensional data in column/Series creation
- PR #4549 Add option to disable deprecation warnings.
- PR #4516 Add negative value support for `.str.get`
- PR #4563 Remove copying to host for metadata generation in `generate_pandas_metadata`
- PR #4554 Removed raw RMM allocation from `column_device_view`
- PR #4619 Remove usage of `nvstrings` in `data_array_view`
- PR #4654 Upgrade version of `numba` required to `>=0.48.0`
- PR #4035 Port NVText tokenize function to libcudf++
- PR #4042 Port cudf/io/functions.hpp to Cython for use in IO bindings
- PR #4058 Port hash.pyx to use libcudf++ APIs
- PR #4133 Mask cleanup and fixes: use `int32` dtype, ensure 64 byte padding, handle offsets
- PR #3888 Drop `ptr=None` from `DeviceBuffer` call
- PR #3976 Fix string serialization and memory_usage method to be consistent
- PR #3902 Fix conversion of large size GPU array to dataframe
- PR #3953 Fix overflow in column_buffer when computing the device buffer size
- PR #3959 Add missing hash-dispatch function for cudf.Series
- PR #3970 Fix for Series Pickle
- PR #3964 Restore legacy NVStrings and NVCategory dependencies in Java jar
- PR #3982 Fix java unary op enum and add missing ops
- PR #3999 Fix issue serializing empty string columns (java)
- PR #3979 Add `name` to Series serialize and deserialize
- PR #4005 Fix null mask allocation bug in gather_bitmask
- PR #4000 Fix dask_cudf sort_values performance for single partitions
- PR #4007 Fix for copy_bitmask issue with uninitialized device_buffer
- PR #4037 Fix JNI quantile compile issue
- PR #4054 Fixed JNI to deal with reduction API changes
- PR #4052 Fix for round-robin when num_partitions divides nrows.
- PR #4061 Add NDEBUG guard on `constexpr_assert`.
- PR #4049 Fix `cudf::split` issue returning one less than expected column vectors
- PR #4065 Parquet writer: fix for out-of-range dictionary indices
- PR #4066 Fixed mismatch with dtype enums
- PR #4078 Fix joins for when column_in_common input parameter is empty
- PR #4080 Fix multi-index dask test with sort issue
- PR #4084 Update Java for removal of CATEGORY type
- PR #4086 ORC reader: fix potentially incorrect timestamp decoding in the last rowgroup
- PR #4089 Fix dask groupby multiindex test case issues in join
- PR #4097 Fix strings concatenate logic with column offsets
- PR #4076 All null string entries should have null data buffer
- PR #4109 Use rmm::device_vector instead of thrust::device_vector
- PR #4113 Use `.nvstrings` in `StringColumn.sum(...)`
- PR #4116 Fix a bug in contiguous_split() where tables with mixed column types could corrupt string output
- PR #4125 Fix type enum to account for added Dictionary type in `types.hpp`
- PR #4132 Fix `hash_partition` null mask allocation
- PR #4137 Update Java for mutating fill and rolling window changes
- PR #4184 Add missing except+ to Cython bindings
- PR #4141 Fix NVStrings test_convert failure in 10.2 build
- PR #4156 Make fill/copy_range no-op on empty columns
- PR #4158 Fix merge issue with empty table return if one of the two tables is empty
- PR #4162 Properly handle no index metadata generation for to_parquet
- PR #4175 Fix `__sizeof__` calculation in `StringColumn`
- PR #4155 Update groupby group_offsets size and fix unnecessary device dispatch.
- PR #4186 Fix from_timestamps 12-hour specifiers support
- PR #4198 Fix constructing `RangeIndex` from `range`
- PR #4192 Parquet writer: fix OOB read when computing string hash
- PR #4201 Fix java window tests
- PR #4199 Fix potential race condition in memcpy_block
- PR #4221 Fix series dict alignment to not drop index name
- PR #4218 Fix `get_aggregation` definition with `except *`
- PR #4215 Fix performance regression in strings::detail::concatenate
- PR #4214 Alter ValueError exception for GPU accelerated Parquet writer to properly report `categorical` columns are not supported.
- PR #4232 Fix handling empty tuples of children in string columns
- PR #4222 Fix no-return compile error in binop-null-test
- PR #4242 Fix for rolling tests CI failure
- PR #4245 Fix race condition in parquet reader
- PR #4253 Fix dictionary decode and set_keys with column offset
- PR #4258 Fix dask-cudf losing index name in `reset_index`
- PR #4268 Fix java build for hash aggregate
- PR #4275 Fix bug in searching nullable values in non-nullable search space in `upper_bound`
- PR #4273 Fix losing `StringIndex` name in dask `_meta_nonempty`
- PR #4279 Fix converting `np.float64` to Scalar
- PR #4285 Add init files for cython pkgs and fix `setup.py`
- PR #4287 Parquet reader: fix empty string potentially read as null
- PR #4310 Fix empty values case in groupby
- PR #4297 Fix specification of package_data in setup.py
- PR #4302 Fix `_is_local_filesystem` check
- PR #4303 Parquet reader: fix empty columns missing from table
- PR #4317 Fix fill() when using string_scalar with an empty string
- PR #4324 Fix slice_strings for out-of-range start position value
- PR #4115 Serialize an empty column table with non zero rows
- PR #4327 Preemptive dispatch fix for changes in dask#5973
- PR #4379 Correct regex reclass count variable to number of pairs instead of the number of literals
- PR #4364 Fix libcudf zfill strings to ignore '+/-' chars
- PR #4358 Fix strings::concat where narep is an empty string
- PR #4369 Fix race condition in gpuinflate
- PR #4390 Disable ScatterValid and ScatterNull legacy tests
- PR #4399 Make scalar destructor virtual.
- PR #4398 Fixes the failure in groupby in MIN/MAX on strings when some groups are empty
- PR #4406 Fix sorted merge issue with null values and ascending=False
- PR #4445 Fix string issue for parquet reader and support `keep_index` for `scatter_to_tables`
- PR #4423 Tighten up Dask serialization checks
- PR #4537 Use `elif` in Dask deserialize check
- PR #4682 Include frame lengths in Dask serialized header
- PR #4438 Fix repl-template error for replace_with_backrefs
- PR #4434 Fix join_strings logic with all-null strings and non-null narep
- PR #4465 Fix use_pandas_index having no effect in libcudf++ parquet reader
- PR #4464 Update Cmake to always link in libnvToolsExt
- PR #4467 Fix dropna issue for a DataFrame having np.nan
- PR #4480 Fix string_scalar.value to return an empty string_view for empty string-scalar
- PR #4474 Fix to not materialize RangeIndex in copy_categories
- PR #4496 Skip tests which require 2+ GPUs
- PR #4494 Update Java memory event handler for new RMM resource API
- PR #4505 Fix 0 length buffers during serialization
- PR #4482 Fix `.str.rsplit`, `.str.split`, `.str.find`, `.str.rfind`, `.str.index`, `.str.rindex` and enable related tests
- PR #4513 Backport scalar virtual destructor fix
- PR #4519 Remove `n` validation for `nlargest` & `nsmallest` and add negative support for `n`
- PR #4596 Fix `_popn` issue with performance
- PR #4526 Fix index slicing issue for index in case of an empty dataframe
- PR #4538 Fix cudf::strings::slice_strings(step=-1) for empty strings
- PR #4557 Disable compile-errors on deprecation warnings, for JNI
- PR #4669 Fix `dask_cudf` categorical nonempty meta handling
- PR #4576 Fix typo in `serialize.py`
- PR #4571 Load JNI native dependencies for Scalar class
- PR #4598 Fix to handle `pd.DataFrame` in `DataFrame.__init__`
- PR #4594 Fix exec dangling pointer issue in legacy groupby
- PR #4591 Fix issue when reading consecutive rowgroups
- PR #4600 Fix missing include in benchmark_fixture.hpp
- PR #4588 Fix ordering issue in `MultiIndex`
- PR #4632 Fix handling of empty inputs to concatenate
- PR #4630 Remove dangling reference to RMM exec policy in drop duplicates tests.
- PR #4625 Fix hash-based repartition bug in dask_cudf
- PR #4662 Fix to handle `keep_index` in `partition_by_hash`
- PR #4683 Fix Slicing issue with categorical column in DataFrame
- PR #4676 Fix bug in `_shuffle_group` for repartition
- PR #4681 Fix `test_repr` tests that were generating a `RangeIndex` for column names
- PR #4729 Fix `fsspec` versioning to prevent dask test failures
- PR #4145 Support empty index case in DataFrame._from_table
- PR #4108 Fix dtype bugs in dask_cudf metadata (metadata_nonempty overhaul)
- PR #4138 Really fix strings concatenate logic with column offsets
- PR #4119 Fix binary ops slowdown using jitify -remove-unused-globals
- PR #3759 Updated 10 Minutes with clarification on how `dask_cudf` uses `cudf` API
- PR #3224 Define and implement new join APIs.
- PR #3284 Add gpu-accelerated parquet writer
- PR #3254 Python redesign for libcudf++
- PR #3336 Add `from_dlpack` and `to_dlpack`
- PR #3555 Add column names support to libcudf++ io readers and writers
- PR #3527 Add string functionality for merge API
- PR #3610 Add memory_usage to DataFrame and Series APIs
- PR #3557 Add contiguous_split() function.
- PR #3619 Support CuPy 7
- PR #3604 Add nvtext ngrams-tokenize function
- PR #3403 Define and implement new stack + tile APIs
- PR #3627 Adding cudf::sort and cudf::sort_by_key
- PR #3597 Implement new sort based groupby
- PR #3776 Add column equivalence comparator (using epsilon for float equality)
- PR #3667 Define and implement round-robin partition API.
- PR #3690 Add bools_to_mask
- PR #3761 Introduce a Frame class and make Index, DataFrame and Series subclasses
- PR #3538 Define and implement left semi join and left anti join
- PR #3683 Added support for multiple delimiters in `nvtext.token_count()`
- PR #3792 Adding is_nan and is_notnan
- PR #3594 Adding clamp support to libcudf++
- PR #3124 Add support for grand-children in cudf column classes
- PR #3292 Port NVStrings regex contains function
- PR #3409 Port NVStrings regex replace function
- PR #3417 Port NVStrings regex findall function
- PR #3351 Add warning when filepath resolves to multiple files in cudf readers
- PR #3370 Port NVStrings strip functions
- PR #3453 Port NVStrings IPv4 convert functions to cudf strings column
- PR #3441 Port NVStrings url encode/decode to cudf strings column
- PR #3364 Port NVStrings split functions
- PR #3463 Port NVStrings partition/rpartition to cudf strings column
- PR #3502 ORC reader: add option to read DECIMALs as INT64
- PR #3461 Add a new overload to allocate_like() that takes explicit type and size params.
- PR #3590 Specialize hash functions for floating point
- PR #3569 Use `np.asarray` in `StringColumn.deserialize`
- PR #3553 Support Python NoneType in numeric binops
- PR #3511 Support DataFrame / Series mixed arithmetic
- PR #3567 Include `strides` in `__cuda_array_interface__`
- PR #3608 Update OPS codeowner group name
- PR #3431 Port NVStrings translate to cudf strings column
- PR #3507 Define and implement new binary operation APIs
- PR #3620 Add stream parameter to unary ops detail API
- PR #3593 Adding begin/end for mutable_column_device_view
- PR #3587 Merge CHECK_STREAM & CUDA_CHECK_LAST to CHECK_CUDA
- PR #3733 Rework `hash_partition` API
- PR #3655 Use move with make_pair to avoid copy construction
- PR #3402 Define and implement new quantiles APIs
- PR #3612 Add ability to customize the JIT kernel cache path
- PR #3647 Remove PatchedNumbaDeviceArray with CuPy 6.6.0
- PR #3641 Remove duplicate definitions of CUDA_DEVICE_CALLABLE
- PR #3640 Enable memory_usage in dask_cudf (also adds pd.Index from_pandas)
- PR #3654 Update Jitify submodule ref to include gcc-8 fix
- PR #3639 Define and implement `nans_to_nulls`
- PR #3561 Rework contains implementation in search
- PR #3616 Add aggregation infrastructure for argmax/argmin.
- PR #3673 Parquet reader: improve rounding of timestamp conversion to seconds
- PR #3699 Stringify libcudacxx headers for binary op JIT
- PR #3697 Improve column insert performance for wide frames
- PR #3653 Make `gather_bitmask_kernel` more reusable.
- PR #3710 Remove multiple CMake configuration steps from root build script
- PR #3657 Define and implement compiled binops for string column comparisons
- PR #3520 Change read_parquet defaults and add warnings
- PR #3780 Java APIs for selecting a GPU
- PR #3796 Improve on round-robin with the case when number partitions greater than number of rows.
- PR #3805 Avoid CuPy 7.1.0 for now
- PR #3758 detail::scatter variant with map iterator support
- PR #3882 Fail loudly when creating a StringColumn from nvstrings with > MAX_VAL(int32) bytes
- PR #3823 Add header file for detail search functions
- PR #2438 Build GBench Benchmarks in CI
- PR #3713 Adding aggregation support to rolling_window
- PR #3875 Add abstract sink for IO writers, used by ORC and Parquet writers for now
- PR #3916 Refactor gather bindings
- PR #3618 Update 10 minutes to cudf and cupy to hide warnings that were being shown in the docs
- PR #3550 Update Java package to 0.12
- PR #3549 Fix index name issue with iloc with RangeIndex
- PR #3562 Fix 4GB limit for gzipped-compressed csv files
- PR #2981 enable build.sh to build all targets without installation
- PR #3563 Use `__cuda_array_interface__` for serialization
- PR #3564 Fix cuda memory access error in gather_bitmask_kernel
- PR #3548 Replaced CUDA_RT_CALL with CUDA_TRY
- PR #3486 Pandas > 0.25 compatibility
- PR #3622 Fix new warnings and errors when building with gcc-8
- PR #3588 Remove avro reader column order reversal
- PR #3629 Fix hash map test failure
- PR #3637 Fix sorted set_index operations in dask_cudf
- PR #3663 Fix libcudf++ ORC reader microseconds and milliseconds conversion
- PR #3668 Fixing CHECK_CUDA debug build issue
- PR #3684 Fix ends_with logic for matching string case
- PR #3691 Fix create_offsets to handle offset correctly
- PR #3687 Fixed bug while passing input GPU memory pointer in `nvtext.scatter_count()`
- PR #3701 Fix hash_partition hashing all columns instead of columns_to_hash
- PR #3694 Allow for null columns parameter in `csv_writer`
- PR #3706 Removed extra type-dispatcher call from merge
- PR #3704 Changed the default delimiter to `whitespace` for nvtext methods.
- PR #3741 Construct DataFrame from dict-of-Series with alignment
- PR #3724 Update rmm version to match release
- PR #3743 Fix for `None` data in `__array_interface__`
- PR #3731 Fix performance of zero sized dataframe slice
- PR #3709 Fix inner_join incorrect result issue
- PR #3734 Update numba to 0.46 in conda files
- PR #3738 Update libxx cython types.hpp path
- PR #3672 Fix to_host issue with column_view having offset
- PR #3730 CSV reader: Set invalid float values to NaN/null
- PR #3670 Floor when casting between timestamps of different precisions
- PR #3728 Fix apply_boolean_mask issue with non-null string column
- PR #3769 Don't look for a `name` attribute in column
- PR #3783 Bind cuDF operators to Dask Dataframe
- PR #3775 Fix segfault when reading compressed CSV files larger than 4GB
- PR #3799 Align indices of Series inputs when adding as columns to DataFrame
- PR #3803 Keep name when unpickling Index objects
- PR #3804 Fix cuda crash in AVRO reader
- PR #3766 Remove references to cudf::type_id::CATEGORY from IO code
- PR #3817 Don't always deepcopy an index
- PR #3821 Fix OOB read in gpuinflate prefetcher
- PR #3829 Parquet writer: fix empty dataframe causing cuda launch errors
- PR #3835 Fix memory leak in Cython when dealing with nulls in string columns
- PR #3866 Remove unnecessary if check in NVStrings.create_offsets
- PR #3858 Fixes the broken debug build after #3728
- PR #3850 Fix merge typecast scope issue and resulting memory leak
- PR #3855 Fix MultiColumn recreation with reset_index
- PR #3869 Fixed size calculation in NVStrings::byte_count()
- PR #3900 Properly link `NVStrings` and `NVCategory` into tests
- PR #3868 Fix apply_grouped moving average example
- PR #3871 Fix `split_out` error
- PR #3886 Fix string column materialization from column view
- PR #3893 Parquet reader: fix segfault reading empty parquet file
- PR #3931 Dask-cudf groupby `.agg` multicolumn handling fix
- PR #4017 Fix memory leaks in `GDF_STRING` cython handling and `nans_to_nulls` cython
- PR #2905 Added `Series.median()` and null support for `Series.quantile()`
- PR #2930 JSON Reader: Support ARROW_RANDOM_FILE input
- PR #2956 Add `cudf::stack` and `cudf::tile`
- PR #2980 Added nvtext is_vowel/is_consonant functions
- PR #2987 Add `inplace` arg to `DataFrame.reset_index` and `Series`
- PR #3011 Added libcudf++ transition guide
- PR #3129 Add strings column factory from `std::vector`s
- PR #3054 Add parquet reader support for decimal data types
- PR #3022 adds DataFrame.astype for cuDF dataframes
- PR #2962 Add isnull(), notnull() and related functions
- PR #3025 Move search files to legacy
- PR #3068 Add `scalar` class
- PR #3094 Adding `any` and `all` support from libcudf
- PR #3130 Define and implement new `column_wrapper`
- PR #3143 Define and implement new copying APIs `slice` and `split`
- PR #3161 Move merge files to legacy
- PR #3079 Added support to write ORC files given a local path
- PR #3192 Add dtype param to cast `DataFrame` on init
- PR #3213 Port cuIO to libcudf++
- PR #3222 Add nvtext character tokenizer
- PR #3223 Java expose underlying buffers
- PR #3300 Add `DataFrame.insert`
- PR #3263 Define and implement new `valid_if`
- PR #3278 Add `to_host` utility to copy `column_view` to host
- PR #3087 Add new cudf::experimental bool8 wrapper
- PR #3219 Construct column from column_view
- PR #3250 Define and implement new merge APIs
- PR #3144 Define and implement new hashing APIs `hash` and `hash_partition`
- PR #3229 Define and implement new search APIs
- PR #3308 java add API for memory usage callbacks
- PR #2691 Row-wise reduction and scan operations via CuPy
- PR #3291 Add normalize_nans_and_zeros
- PR #3187 Define and implement new replace APIs
- PR #3356 Add vertical concatenation for table/columns
- PR #3344 java split API
- PR #2791 Add `groupby.std()`
- PR #3368 Enable dropna argument in dask_cudf groupby
- PR #3298 add null replacement iterator for column_device_view
- PR #3297 Define and implement new groupby API.
- PR #3396 Update device_atomics with new bool8 and timestamp specializations
- PR #3411 Java host memory management API
- PR #3393 Implement df.cov and enable covariance/correlation in dask_cudf
- PR #3401 Add dask_cudf ORC writer (to_orc)
- PR #3331 Add copy_if_else
- PR #3427 Define and Implement new multi-search API
- PR #3442 Add Bool-index + Multi column + DataFrame support for set-item
- PR #3172 Define and implement new fill/repeat/copy_range APIs
- PR #3490 Add pair iterators for columns
- PR #3497 Add DataFrame.drop(..., inplace=False) argument
- PR #3469 Add string functionality for replace API
- PR #3273 Define and implement new reduction APIs
- PR #2904 Move gpu decompressors to cudf::io namespace
- PR #2977 Moved old C++ test utilities to legacy directory.
- PR #2965 Fix slow orc reader perf with large uncompressed blocks
- PR #2995 Move JIT type utilities to legacy directory
- PR #2927 Add `Table` and `TableView` extension classes that wrap legacy cudf::table
- PR #3005 Renames `cudf::exp` namespace to `cudf::experimental`
- PR #3008 Make safe versions of `is_null` and `is_valid` in `column_device_view`
- PR #3026 Move fill and repeat files to legacy
- PR #3027 Move copying.hpp and related source to legacy folder
- PR #3014 Snappy decompression optimizations
- PR #3032 Use `asarray` to coerce indices to a NumPy array
- PR #2996 IO Readers: Replace `cuio::device_buffer` with `rmm::device_buffer`
- PR #3051 Specialized hash function for strings column
- PR #3065 Select and Concat for cudf::experimental::table
- PR #3080 Move `valid_if.cuh` to `legacy/`
- PR #3052 Moved replace.hpp functionality to legacy
- PR #3091 Move join files to legacy
- PR #3092 Implicitly init RMM if Java allocates before init
- PR #3029 Update gdf_ numeric types with stdint and move to cudf namespace
- PR #2955 Add cmake option to only build for present GPU architecture
- PR #3070 Move functions.h and related source to legacy
- PR #2951 Allow set_index to handle a list of column names
- PR #3093 Move groupby files to legacy
- PR #2988 Removing GIS functionality (now part of cuSpatial library)
- PR #3067 Java method to return size of device memory buffer
- PR #3083 Improved some binary operation tests to include null testing.
- PR #3084 Update to arrow-cpp and pyarrow 0.15.0
- PR #3071 Move cuIO to legacy
- PR #3126 Round 2 of snappy decompression optimizations
- PR #3046 Define and implement new copying APIs `empty_like` and `allocate_like`
- PR #3128 Support MultiIndex in DataFrame.join
- PR #2971 Added initial gather and scatter methods for strings_column_view
- PR #3133 Port NVStrings to cudf column: count_characters and count_bytes
- PR #2991 Added strings column functions concatenate and join_strings
- PR #3028 Define and implement new `gather` APIs.
- PR #3135 Add nvtx utilities to cudf::nvtx namespace
- PR #3021 Java host side concat of serialized buffers
- PR #3138 Move unary files to legacy
- PR #3170 Port NVStrings substring functions to cudf strings column
- PR #3159 Port NVStrings is-chars-types function to cudf strings column
- PR #3154 Make `table_view_base.column()` const and add `mutable_table_view.column()`
- PR #3175 Set cmake cuda version variables
- PR #3171 Move deprecated error macros to legacy
- PR #3191 Port NVStrings integer convert ops to cudf column
- PR #3189 Port NVStrings find ops to cudf column
- PR #3352 Port NVStrings convert float functions to cudf strings column
- PR #3193 Add cuPy as a formal dependency
- PR #3195 Support for zero columned `table_view`
- PR #3165 Java device memory size for string category
- PR #3205 Move transform files to legacy
- PR #3202 Rename and move error.hpp to public headers
- PR #2878 Use upstream merge code in dask_cudf
- PR #3217 Port NVStrings upper and lower case conversion functions
- PR #3350 Port NVStrings booleans convert functions
- PR #3231 Add `column::release()` to give up ownership of contents.
- PR #3157 Use enum class rather than enum for mask_allocation_policy
- PR #3232 Port NVStrings datetime conversion to cudf strings column
- PR #3136 Define and implement new transpose API
- PR #3237 Define and implement new transform APIs
- PR #3245 Move binaryop files to legacy
- PR #3241 Move stream_compaction files to legacy
- PR #3166 Move reductions to legacy
- PR #3261 Small cleanup: remove `== true`
- PR #3271 Update rmm API based on `rmm.reinitialize(...)` change
- PR #3266 Remove optional checks for CuPy
- PR #3268 Adding null ordering per column feature when sorting
- PR #3239 Adding floating point specialization to comparators for NaNs
- PR #3270 Move predicates files to legacy
- PR #3281 Add to_host specialization for strings in column test utilities
- PR #3282 Add `num_bitmask_words`
- PR #3252 Add new factory methods to include passing an existing null mask
- PR #3288 Make `bit.cuh` utilities usable from host code.
- PR #3287 Move rolling windows files to legacy
- PR #3182 Define and implement new unary APIs `is_null` and `is_not_null`
- PR #3314 Drop `cython` from run requirements
- PR #3301 Add tests for empty column wrapper.
- PR #3294 Update to arrow-cpp and pyarrow 0.15.1
- PR #3310 Add `row_hasher` and `element_hasher` utilities
- PR #3272 Support non-default streams when creating/destroying hash maps
- PR #3286 Clean up the starter code on README
- PR #3332 Port NVStrings replace to cudf strings column
- PR #3354 Define and implement new `scatter` APIs
- PR #3322 Port NVStrings pad operations to cudf strings column
- PR #3345 Add cache member for number of characters in string_view class
- PR #3299 Define and implement new `is_sorted` APIs
- PR #3328 Partition by stripes in dask_cudf ORC reader
- PR #3243 Use upstream join code in dask_cudf
- PR #3371 Add `select` method to `table_view`
- PR #3309 Add java and JNI bindings for search bounds
- PR #3305 Define and implement new rolling window APIs
- PR #3380 Concatenate columns of strings
- PR #3382 Add fill function for strings column
- PR #3391 Move device_atomics_tests.cu files to legacy
- PR #3303 Define and implement new stream compaction APIs `copy_if`, `drop_nulls`, `apply_boolean_mask`, `drop_duplicate` and `unique_count`.
- PR #3387 Strings column gather function
- PR #3440 Strings column scatter function
- PR #3389 Move quantiles.hpp + group_quantiles.hpp files to legacy
- PR #3397 Port unary cast to libcudf++
- PR #3398 Move reshape.hpp files to legacy
- PR #3395 Port NVStrings regex extract to cudf strings column
- PR #3423 Port NVStrings htoi to cudf strings column
- PR #3425 Strings column copy_if_else implementation
- PR #3422 Move utilities to legacy
- PR #3201 Define and implement new datetime_ops APIs
- PR #3421 Port NVStrings find_multiple to cudf strings column
- PR #3448 Port scatter_to_tables to libcudf++
- PR #3458 Update strings sections in the transition guide
- PR #3462 Add `make_empty_column` and update `empty_like`.
- PR #3465 Port `aggregation` traits and utilities.
- PR #3214 Define and implement new unary operations APIs
- PR #3475 Add `bitmask_to_host` column utility
- PR #3487 Add is_boolean trait and random timestamp generator for testing
- PR #3492 Small cleanup (remove std::abs) and comment
- PR #3407 Allow multiple row-groups per task in dask_cudf read_parquet
- PR #3512 Remove unused CUDA conda labels
- PR #3500 cudf::fill()/cudf::repeat() support for strings columns.
- PR #3438 Update scalar and scalar_device_view to better support strings
- PR #3414 Add copy_range function for strings column
- PR #3685 Add string support to contiguous_split.
- PR #3471 Add scalar/column, column/scalar and scalar/scalar overloads to copy_if_else.
- PR #3451 Add support for implicit typecasting of join columns
- PR #2895 Fixed dask_cudf group_split behavior to handle upstream rearrange_by_divisions
- PR #3048 Support for zero columned tables
- PR #3030 Fix snappy decoding regression in PR #3014
- PR #3041 Fixed exp to experimental namespace name change issue
- PR #3056 Add additional cmake hint for finding local build of RMM files
- PR #3060 Move copying.hpp includes to legacy
- PR #3139 Fixed java RMM auto initialization
- PR #3141 Java fix for relocated IO headers
- PR #3149 Rename column_wrapper.cuh to column_wrapper.hpp
- PR #3168 Fix mutable_column_device_view head const_cast
- PR #3199 Update JNI includes for legacy moves
- PR #3204 ORC writer: Fix ByteRLE encoding of NULLs
- PR #2994 Fix split_out-support but with hash_object_dispatch
- PR #3212 Fix string to date casting when format is not specified
- PR #3218 Fixes `row_lexicographic_comparator` issue with handling two tables
- PR #3228 Default initialize RMM when Java native dependencies are loaded
- PR #3012 replacing instances of `to_gpu_array` with `mem`
- PR #3236 Fix Numba 0.46+/CuPy 6.3 interface compatibility
- PR #3276 Update JNI includes for legacy moves
- PR #3256 Fix orc writer crash with multiple string columns
- PR #3211 Fix breaking change caused by rapidsai/rmm#167
- PR #3265 Fix dangling pointer in `is_sorted`
- PR #3267 ORC writer: fix incorrect ByteRLE encoding of long literal runs
- PR #3277 Fix invalid reference to deleted temporary in `is_sorted`.
- PR #3274 ORC writer: fix integer RLEv2 mode2 unsigned base value encoding
- PR #3279 Fix shutdown hang issues with pinned memory pool init executor
- PR #3280 Invalid children check in mutable_column_device_view
- PR #3289 fix java memory usage API for empty columns
- PR #3293 Fix loading of csv files zipped on MacOS (disabled zip min version check)
- PR #3295 Fix storing invalid RMM exec policies.
- PR #3307 Add pd.RangeIndex to from_pandas to fix dask_cudf meta_nonempty bug
- PR #3313 Fix public headers including non-public headers
- PR #3318 Revert arrow to 0.15.0 temporarily to unblock downstream projects CI
- PR #3317 Fix index-argument bug in dask_cudf parquet reader
- PR #3323 Fix `insert` non-assert test case
- PR #3341 Fix `Series` constructor converting NoneType to "None"
- PR #3326 Fix and test for detail::gather map iterator type inference
- PR #3334 Remove zero-size exception check from make_strings_column factories
- PR #3333 Fix compilation issues with `constexpr` functions not marked `__device__`
- PR #3340 Make all benchmarks use cudf base fixture to initialize RMM pool
- PR #3337 Fix Java to pad validity buffers to 64-byte boundary
- PR #3362 Fix `find_and_replace` upcasting series for python scalars and lists
- PR #3357 Disabling `column_view` iterators for non fixed-width types
- PR #3383 Fix : properly compute null counts for rolling_window.
- PR #3386 Removing external includes from `column_view.hpp`
- PR #3369 Add write_partition to dask_cudf to fix to_parquet bug
- PR #3388 Support getitem with bools when DataFrame has a MultiIndex
- PR #3408 Fix String and Column (De-)Serialization
- PR #3372 Fix dask-distributed scatter_by_map bug
- PR #3419 Fix a bug in parse_into_parts (incomplete input causing walking past the end of string).
- PR #3413 Fix dask_cudf read_csv file-list bug
- PR #3416 Fix memory leak in ColumnVector when pulling strings off the GPU
- PR #3424 Fix benchmark build by adding libcudacxx to benchmark's CMakeLists.txt
- PR #3435 Fix diff and shift for empty series
- PR #3439 Fix index-name bug in StringColumn concat
- PR #3445 Fix ORC Writer default stripe size
- PR #3459 Fix printing of invalid entries
- PR #3466 Fix gather null mask allocation for invalid index
- PR #3468 Fix memory leak issue in `drop_duplicates`
- PR #3474 Fix small doc error in capitalize Docs
- PR #3491 Fix more doc errors in NVStrings
- PR #3478 Fix as_index deep copy via Index.rename inplace arg
- PR #3476 Fix ORC reader timezone conversion
- PR #3188 Repr slices up large DataFrames
- PR #3519 Fix strings column concatenate handling zero-sized columns
- PR #3530 Fix copy_if_else test case fail issue
- PR #3523 Fix lgenfe issue with debug build
- PR #3532 Fix potential use-after-free in cudf parquet reader
- PR #3540 Fix unary_op null_mask bug and add missing test cases
- PR #3559 Use HighLevelGraph api in DataFrame constructor (Fix upstream compatibility)
- PR #3572 Fix CI Issue with hypothesis tests that are flaky
- PR #2423 Added `groupby.quantile()`
- PR #2522 Add Java bindings for NVStrings backed upper and lower case mutators
- PR #2605 Added Sort based groupby in libcudf
- PR #2607 Add Java bindings for parsing JSON
- PR #2629 Add dropna= parameter to groupby
- PR #2585 ORC & Parquet Readers: Remove millisecond timestamp restriction
- PR #2507 Add GPU-accelerated ORC Writer
- PR #2559 Add Series.tolist()
- PR #2653 Add Java bindings for rolling window operations
- PR #2480 Merge `custreamz` codebase into `cudf` repo
- PR #2674 Add contains for Index/Series/Column
- PR #2635 Add support to read from remote and cloud sources like s3, gcs, hdfs
- PR #2722 Add Java bindings for NVTX ranges
- PR #2702 Add make_bool to dataset generation functions
- PR #2394 Move `rapidsai/custrings` into `cudf`
- PR #2734 Final sync of custrings source into cudf
- PR #2724 Add libcudf support for contains
- PR #2777 Add python bindings for porter stemmer measure functionality
- PR #2781 Add issorted to is_monotonic
- PR #2685 Add cudf::scatter_to_tables and cython binding
- PR #2743 Add Java bindings for NVStrings timestamp2long as part of String ColumnVector casting
- PR #2785 Add nvstrings Python docs
- PR #2786 Add benchmarks option to root build.sh
- PR #2802 Add `cudf::repeat()` and `cudf.Series.repeat()`
- PR #2773 Add Fisher's unbiased kurtosis and skew for Series/DataFrame
- PR #2748 Parquet Reader: Add option to specify loading of PANDAS index
- PR #2807 Add scatter_by_map to DataFrame python API
- PR #2836 Add nvstrings.code_points method
- PR #2844 Add Series/DataFrame notnull
- PR #2858 Add GTest type list utilities
- PR #2870 Add support for grouping by Series of arbitrary length
- PR #2719 Series covariance and Pearson correlation
- PR #2207 Beginning of libcudf overhaul: introduce new column and table types
- PR #2869 Add `cudf.CategoricalDtype`
- PR #2838 CSV Reader: Support ARROW_RANDOM_FILE input
- PR #2655 CuPy-based Series and Dataframe .values property
- PR #2803 Added `edit_distance_matrix()` function to calculate pairwise edit distance for each string on a given nvstrings object.
- PR #2811 Start of cudf strings column work based on 2207
- PR #2872 Add Java pinned memory pool allocator
- PR #2969 Add findAndReplaceAll to ColumnVector
- PR #2814 Add Datetimeindex.weekday
- PR #2999 Add timestamp conversion support for string categories
- PR #2918 Add cudf::column timestamp wrapper types
- PR #2578 Update legacy_groupby to use libcudf group_by_without_aggregation
- PR #2581 Removed `managed` allocator from hash map classes.
- PR #2571 Remove unnecessary managed memory from gdf_column_concat
- PR #2648 Cython/Python reorg
- PR #2588 Update Series.append documentation
- PR #2632 Replace dask-cudf set_index code with upstream
- PR #2682 Add cudf.set_allocator() function for easier allocator init
- PR #2642 Improve null printing and testing
- PR #2747 Add missing Cython headers / cudftestutil lib to conda package for cuspatial build
- PR #2706 Compute CSV format in device code to speedup performance
- PR #2673 Add support for np.longlong type
- PR #2703 move dask serialization dispatch into cudf
- PR #2728 Add YYMMDD to version tag for nightly conda packages
- PR #2729 Handle file-handle input in to_csv
- PR #2741 CSV Reader: Move kernel functions into its own file
- PR #2766 Improve nvstrings python cmake flexibility
- PR #2756 Add out_time_unit option to csv reader, support timestamp resolutions
- PR #2771 Stopgap alias for to_gpu_matrix()
- PR #2783 Support mapping input columns to function arguments in apply kernels
- PR #2645 libcudf unique_count for Series.nunique
- PR #2817 Dask-cudf: `read_parquet` support for remote filesystems
- PR #2823 improve java data movement debugging
- PR #2806 CSV Reader: Clean-up row offset operations
- PR #2640 Add dask wait/persist example to 10 minute guide
- PR #2828 Optimizations of kernel launch configuration for `DataFrame.apply_rows` and `DataFrame.apply_chunks`
- PR #2831 Add `column` argument to `DataFrame.drop`
- PR #2775 Various optimizations to improve getitem and setitem performance
- PR #2810 cudf::allocate_like can optionally always allocate a mask.
- PR #2833 Parquet reader: align page data allocation sizes to 4-bytes to satisfy cuda-memcheck
- PR #2832 Using the new Python bindings for UCX
- PR #2856 Update group_split_cudf to use scatter_by_map
- PR #2890 Optionally keep serialized table data on the host.
- PR #2778 Doc: Updated and fixed some docstrings that were formatted incorrectly.
- PR #2830 Use YYMMDD tag in custreamz nightly build
- PR #2875 Java: Remove synchronized from register methods in MemoryCleaner
- PR #2887 Minor snappy decompression optimization
- PR #2899 Use new RMM API based on Cython
- PR #2788 Guide to Python UDFs
- PR #2919 Change java API to use operators in groupby namespace
- PR #2909 CSV Reader: Avoid row offsets host vector default init
- PR #2834 DataFrame supports setting columns via attribute syntax `df.x = col`
- PR #3147 DataFrame can be initialized from rows via list of tuples
- PR #3539 Restrict CuPy to 6
- PR #2584 ORC Reader: fix parsing of `DECIMAL` index positions
- PR #2619 Fix groupby serialization/deserialization
- PR #2614 Update Java version to match
- PR #2601 Fixes nlargest(1) issue in Series and Dataframe
- PR #2610 Fix a bug in index serialization (properly pass DeviceNDArray)
- PR #2621 Fixes the floordiv issue of not promoting float type when rhs is 0
- PR #2611 Types Test: fix static casting from negative int to string
- PR #2618 IO Readers: Fix datasource memory map failure for multiple reads
- PR #2628 groupby_without_aggregation non-nullable input table produces non-nullable output
- PR #2615 fix string category partitioning in java API
- PR #2641 fix string category and timeunit concat in the java API
- PR #2649 Fix groupby issue resulting from column_empty bug
- PR #2658 Fix astype() for null categorical columns
- PR #2660 fix column string category and timeunit concat in the java API
- PR #2664 ORC reader: fix `skip_rows` larger than first stripe
- PR #2654 Allow Java gdfOrderBy to work with string categories
- PR #2669 AVRO reader: fix non-deterministic output
- PR #2668 Update Java bindings to specify timestamp units for ORC and Parquet readers
- PR #2679 AVRO reader: fix cuda errors when decoding compressed streams
- PR #2692 Add concatenation for data-frame with different headers (empty and non-empty)
- PR #2651 Remove nvidia driver installation from ci/cpu/build.sh
- PR #2697 Ensure csv reader sets datetime column time units
- PR #2698 Return RangeIndex from contiguous slice of RangeIndex
- PR #2672 Fix null and integer handling in round
- PR #2704 Parquet Reader: Fix crash when loading string column with nulls
- PR #2725 Fix Jitify issue with running on Turing using CUDA version < 10
- PR #2731 Fix building of benchmarks
- PR #2738 Fix java to find new NVStrings locations
- PR #2736 Pin Jitify branch to v0.10 version
- PR #2742 IO Readers: Fix possible silent failures when creating `NvStrings` instance
- PR #2753 Fix java quantile API calls
- PR #2762 Fix validity processing for time in java
- PR #2796 Fix handling string slicing and other nvstrings delegated methods with dask
- PR #2769 Fix link to API docs in README.md
- PR #2772 Handle multiindex pandas Series #2772
- PR #2749 Fix apply_rows/apply_chunks pessimistic null mask to use in_cols null masks only
- PR #2752 CSV Reader: Fix exception when there's no rows to process
- PR #2716 Added Exception for `StringMethods` in string methods
- PR #2787 Fix Broadcasting `None` to `cudf-series`
- PR #2794 Fix async race in NVCategory::get_value and get_value_bounds
- PR #2795 Fix java build/cast error
- PR #2496 Fix improper merge of two dataframes when names differ
- PR #2824 Fix issue with incorrect result when Numeric Series replace is called several times
- PR #2751 Replace value with null
- PR #2765 Fix Java inequality comparisons for string category
- PR #2818 Fix java join API to use new C++ join API
- PR #2841 Fix nvstrings.slice and slice_from for range (0,0)
- PR #2837 Fix join benchmark
- PR #2809 Add hash_df and group_split dispatch functions for dask
- PR #2843 Parquet reader: fix skip_rows when not aligned with page or row_group boundaries
- PR #2851 Deleted existing dask-cudf/record.txt
- PR #2854 Fix column creation from ephemeral objects exposing cuda_array_interface
- PR #2860 Fix boolean indexing when the result is a single row
- PR #2859 Fix tail method issue for string columns
- PR #2852 Fixed `cumsum()` and `cumprod()` on boolean series.
- PR #2865 DaskIO: Fix `read_csv` and `read_orc` when input is list of files
- PR #2750 Fixed casting values to cudf::bool8 so non-zero values always cast to true
- PR #2873 Fixed dask_cudf read_partition bug by generating ParquetDatasetPiece
- PR #2850 Fixes dask_cudf.read_parquet on partitioned datasets
- PR #2896 Properly handle `axis` string keywords in `concat`
- PR #2926 Update rounding algorithm to avoid using fmod
- PR #2968 Fix Java dependency loading when using NVTX
- PR #2963 Fix ORC writer uncompressed block indexing
- PR #2928 CSV Reader: Fix using `byte_range` for large datasets
- PR #2983 Fix sm_70+ race condition in gpu_unsnap
- PR #2964 ORC Writer: Segfault when writing mixed numeric and string columns
- PR #3007 Java: Remove unit test that frees RMM invalid pointer
- PR #3009 Fix orc reader RLEv2 patch position regression from PR #2507
- PR #3002 Fix CUDA invalid configuration errors reported after loading an ORC file without data
- PR #3035 Update update-version.sh for new docs locations
- PR #3038 Fix uninitialized stream parameter in device_table deleter
- PR #3064 Fixes groupby performance issue
- PR #3061 Add rmmInitialize to nvstrings gtests
- PR #3058 Fix UDF doc markdown formatting
- PR #3059 Add nvstrings python build instructions to contributing.md
- PR #1993 Add CUDA-accelerated series aggregations: mean, var, std
- PR #2111 IO Readers: Support memory buffer, file-like object, and URL inputs
- PR #2012 Add `reindex()` to DataFrame and Series
- PR #2097 Add GPU-accelerated AVRO reader
- PR #2098 Support binary ops on DFs and Series with mismatched indices
- PR #2160 Merge `dask-cudf` codebase into `cudf` repo
- PR #2149 CSV Reader: Add `hex` dtype for explicit hexadecimal parsing
- PR #2156 Add `upper_bound()` and `lower_bound()` for libcudf tables and `searchsorted()` for cuDF Series
- PR #2158 CSV Reader: Support single, non-list/dict argument for `dtype`
- PR #2177 CSV Reader: Add `parse_dates` parameter for explicit date inference
- PR #1744 cudf::apply_boolean_mask and cudf::drop_nulls support for cudf::table inputs (multi-column)
- PR #2196 Add `DataFrame.dropna()`
- PR #2197 CSV Writer: add `chunksize` parameter for `to_csv`
- PR #2215 `type_dispatcher` benchmark
- PR #2179 Add Java quantiles
- PR #2157 Add array_function to DataFrame and Series
- PR #2212 Java support for ORC reader
- PR #2224 Add DataFrame isna, isnull, notna functions
- PR #2236 Add Series.drop_duplicates
- PR #2105 Add hash-based join benchmark
- PR #2316 Add unique, nunique, and value_counts for datetime columns
- PR #2337 Add Java support for slicing a ColumnVector
- PR #2049 Add cudf::merge (sorted merge)
- PR #2368 Full cudf+dask Parquet Support
- PR #2380 New cudf::is_sorted checks whether cudf::table is sorted
- PR #2356 Java column vector standard deviation support
- PR #2221 MultiIndex full indexing - Support iloc and wildcards for loc
- PR #2429 Java support for getting length of strings in a ColumnVector
- PR #2415 Add `value_counts` for series of any type
- PR #2446 Add array_function for index
- PR #2437 ORC reader: Add 'use_np_dtypes' option
- PR #2382 Add CategoricalAccessor add, remove, rename, and ordering methods
- PR #2464 Native implement `__cuda_array_interface__` for Series/Index/Column objects
- PR #2425 Rolling window now accepts array-based user-defined functions
- PR #2442 Add setitem
- PR #2449 Java support for getting byte count of strings in a ColumnVector
- PR #2492 Add groupby.size() method
- PR #2358 Add cudf::nans_to_nulls: convert floating point column into bitmask
- PR #2489 Add drop argument to set_index
- PR #2491 Add Java bindings for ORC reader 'use_np_dtypes' option
- PR #2213 Support s/ms/us/ns DatetimeColumn time unit resolutions
- PR #2536 Add _constructor properties to Series and DataFrame
- PR #2103 Move old `column` and `bitmask` files into `legacy/` directory
- PR #2109 added name to Python column classes
- PR #1947 Cleanup serialization code
- PR #2125 More aggregate in java API
- PR #2127 Add in java Scalar tests
- PR #2088 Refactor of Python groupby code
- PR #2130 Java serialization and deserialization of tables.
- PR #2131 Chunk rows logic added to csv_writer
- PR #2129 Add functions in the Java API to support nullable column filtering
- PR #2165 made changes to get_dummies api for it to be available in MethodCache
- PR #2171 Add CodeCov integration, fix doc version, make --skip-tests work when invoking with source
- PR #2184 handle remote orc files for dask-cudf
- PR #2186 Add `getitem` and `getattr` style access to Rolling objects
- PR #2168 Use cudf.Column for CategoricalColumn's categories instead of a tuple
- PR #2193 DOC: cudf::type_dispatcher documentation for specializing dispatched functors
- PR #2199 Better java support for appending strings
- PR #2176 Added column dtype support for datetime, int8, int16 to csv_writer
- PR #2209 Matching `get_dummies` & `select_dtypes` behavior to pandas
- PR #2217 Updated Java bindings to use the new groupby API
- PR #2214 DOC: Update doc instructions to build/install `cudf` and `dask-cudf`
- PR #2220 Update Java bindings for reduction rename
- PR #2232 Move CodeCov upload from build script to Jenkins
- PR #2225 refactor to use libcudf for gathering columns in dataframes
- PR #2293 Improve join performance (faster compute_join_output_size)
- PR #2300 Create separate dask codeowners for dask-cudf codebase
- PR #2304 gdf_group_by_without_aggregations returns gdf_column
- PR #2309 Java readers: remove redundant copy of result pointers
- PR #2307 Add `black` and `isort` to style checker script
- PR #2345 Restore removal of old groupby implementation
- PR #2342 Improve `astype()` to operate all ways
- PR #2329 using libcudf cudf::copy for column deep copy
- PR #2344 DOC: docs on code formatting for contributors
- PR #2376 Add inoperative axis= and win_type= arguments to Rolling()
- PR #2378 remove dask for (de-)serialization of cudf objects
- PR #2353 Bump Arrow and Dask versions
- PR #2377 Replace `standard_python_slice` with just `slice.indices()`
- PR #2373 cudf.DataFrame enhancements & Series.values support
- PR #2392 Remove dlpack submodule; make cuDF's Cython API externally accessible
- PR #2430 Updated Java bindings to use the new unary API
- PR #2406 Moved all existing `table` related files to a `legacy/` directory
- PR #2350 Performance related changes to get_dummies
- PR #2420 Remove `cudautils.astype` and replace with `typecast.apply_cast`
- PR #2456 Small improvement to typecast utility
- PR #2458 Fix handling of thirdparty packages in `isort` config
- PR #2459 IO Readers: Consolidate all readers to use `datasource` class
- PR #2475 Exposed type_dispatcher.hpp, nvcategory_util.hpp and wrapper_types.hpp in the include folder
- PR #2484 Enabled building libcudf as a static library
- PR #2453 Streamline CUDA_REL environment variable
- PR #2483 Bundle Boost filesystem dependency in the Java jar
- PR #2486 Java API hash functions
- PR #2481 Adds the ignore_null_keys option to the java api
- PR #2490 Java api: support multiple aggregates for the same column
- PR #2510 Java api: uses table based apply_boolean_mask
- PR #2432 Use pandas formatting for console, html, and latex output
- PR #2573 Bump numba version to 0.45.1
- PR #2606 Fix references to notebooks-contrib
- PR #2086 Fixed quantile api behavior mismatch in series & dataframe
- PR #2128 Add offset param to host buffer readers in java API.
- PR #2145 Work around binops validity checks for java
- PR #2146 Work around unary_math validity checks for java
- PR #2151 Fixes bug in cudf::copy_range where null_count was invalid
- PR #2139 matching to pandas describe behavior & fixing nan values issue
- PR #2161 Implicitly convert unsigned to signed integer types in binops
- PR #2154 CSV Reader: Fix bools misdetected as strings dtype
- PR #2178 Fix bug in rolling bindings where a view of an ephemeral column was being taken
- PR #2180 Fix issue with isort reordering `importorskip` below imports depending on them
- PR #2187 fix to honor dtype when numpy arrays are passed to columnops.as_column
- PR #2190 Fix issue in astype conversion of string column to 'str'
- PR #2208 Fix issue with calling `head()` on one row dataframe
- PR #2229 Propagate exceptions from Cython cdef functions
- PR #2234 Fix issue with local build script not properly building
- PR #2223 Fix CUDA invalid configuration errors reported after loading small compressed ORC files
- PR #2162 Setting is_unique and is_monotonic-related attributes
- PR #2244 Fix ORC RLEv2 delta mode decoding with nonzero residual delta width
- PR #2297 Work around `var/std` unsupported only at debug build
- PR #2302 Fixed java serialization corner case
- PR #2355 Handle float16 in binary operations
- PR #2311 Fix copy behaviour for GenericIndex
- PR #2349 Fix issues with String filter in java API
- PR #2323 Fix groupby on categoricals
- PR #2328 Ensure order is preserved in CategoricalAccessor._set_categories
- PR #2202 Fix issue with unary ops mishandling empty input
- PR #2326 Fix for bug in DLPack when reading multiple columns
- PR #2324 Fix cudf Docker build
- PR #2325 Fix ORC RLEv2 patched base mode decoding with nonzero patch width
- PR #2235 Fix get_dummies to be compatible with dask
- PR #2332 Zero initialize gdf_dtype_extra_info
- PR #2360 Fix missing dtype handling in cudf.Series & columnops.as_column
- PR #2364 Fix quantile api and other trivial issues around it
- PR #2361 Fixed issue with `codes` of CategoricalIndex
- PR #2357 Fixed inconsistent type of index created with from_pandas vs direct construction
- PR #2389 Fixed Rolling getattr and getitem for offset based windows
- PR #2402 Fixed bug in valid mask computation in cudf::copy_if (apply_boolean_mask)
- PR #2401 Fix to a scalar datetime(of type Days) issue
- PR #2386 Correctly allocate output valids in groupby
- PR #2411 Fixed failures on binary op on single element string column
- PR #2422 Fix Pandas logical binary operation incompatibilities
- PR #2447 Fix CodeCov posting build statuses temporarily
- PR #2450 Fix erroneous null handling in `cudf.DataFrame`'s `apply_rows`
- PR #2470 Fix issues with empty strings and string categories (Java)
- PR #2471 Fix String Column Validity.
- PR #2481 Fix java validity buffer serialization
- PR #2485 Updated bytes calculation to use size_t to avoid overflow in column concat
- PR #2461 Fix groupby multiple aggregations same column
- PR #2514 Fix cudf::drop_nulls threshold handling in Cython
- PR #2516 Fix utilities include paths and meta.yaml header paths
- PR #2517 Fix device memory leak in to_dlpack tensor deleter
- PR #2431 Fix local build generated file ownerships
- PR #2511 Added import of orc, refactored exception handlers to not squash fatal exceptions
- PR #2527 Fix index and column input handling in dask_cudf read_parquet
- PR #2466 Fix `dataframe.query` returning null rows erroneously
- PR #2548 Orc reader: fix non-deterministic data decoding at chunk boundaries
- PR #2557 fix cudautils import in string.py
- PR #2521 Fix casting datetimes from/to the same resolution
- PR #2545 Fix MultiIndexes with datetime levels
- PR #2560 Remove duplicate `dlpack` definition in conda recipe
- PR #2567 Fix ColumnVector.fromScalar issues while dealing with null scalars
- PR #2565 Orc reader: fix incorrect data decoding of int64 data types
- PR #2577 Fix search benchmark compilation error by adding necessary header
- PR #2604 Fix a bug in copying.pyx:_normalize_types that upcasted int32 to int64
- PR #1524 Add GPU-accelerated JSON Lines parser with limited feature set
- PR #1569 Add support for Json objects to the JSON Lines reader
- PR #1622 Add Series.loc
- PR #1654 Add cudf::apply_boolean_mask: faster replacement for gdf_apply_stencil
- PR #1487 cython gather/scatter
- PR #1310 Implemented the slice/split functionality.
- PR #1630 Add Python layer to the GPU-accelerated JSON reader
- PR #1745 Add rounding of numeric columns via Numba
- PR #1772 JSON reader: add support for BytesIO and StringIO input
- PR #1527 Support GDF_BOOL8 in readers and writers
- PR #1819 Logical operators (AND, OR, NOT) for libcudf and cuDF
- PR #1813 ORC Reader: Add support for stripe selection
- PR #1828 JSON Reader: add support for bool8 columns
- PR #1833 Add column iterator with/without nulls
- PR #1665 Add the point-in-polygon GIS function
- PR #1863 Series and Dataframe methods for all and any
- PR #1908 cudf::copy_range and cudf::fill for copying/assigning an index or range to a constant
- PR #1921 Add additional formats for typecasting to/from strings
- PR #1807 Add Series.dropna()
- PR #1987 Allow user defined functions in the form of ptx code to be passed to binops
- PR #1948 Add operator functions like `Series.add()` to DataFrame and Series
- PR #1954 Add skip test argument to GPU build script
- PR #2018 Add bindings for new groupby C++ API
- PR #1984 Add rolling window operations Series.rolling() and DataFrame.rolling()
- PR #1542 Python method and bindings for to_csv
- PR #1995 Add Java API
- PR #1998 Add google benchmark to cudf
- PR #1845 Add cudf::drop_duplicates, DataFrame.drop_duplicates
- PR #1652 Added `Series.where()` feature
- PR #2074 Java Aggregates, logical ops, and better RMM support
- PR #2140 Add a `cudf::transform` function
- PR #2068 Concatenation of different typed columns
- PR #1538 Replacing LesserRTTI with inequality_comparator
- PR #1703 C++: Added non-aggregating `insert` to `concurrent_unordered_map` with specializations to store pairs with a single atomicCAS when possible.
- PR #1422 C++: Added a RAII wrapper for CUDA streams
- PR #1701 Added `unique` method for stringColumns
- PR #1713 Add documentation for Dask-XGBoost
- PR #1666 CSV Reader: Improve performance for files with large number of columns
- PR #1725 Enable the ability to use a single column groupby as its own index
- PR #1759 Add an example showing simultaneous rolling averages to `apply_grouped` documentation
- PR #1746 C++: Remove unused code: `windowed_ops.cu`, `sorting.cu`, `hash_ops.cu`
- PR #1748 C++: Add `bool` nullability flag to `device_table` row operators
- PR #1764 Improve Numerical column: `mean_var` and `mean`
- PR #1767 Speed up Python unit tests
- PR #1770 Added build.sh script, updated CI scripts and documentation
- PR #1739 ORC Reader: Add more pytest coverage
- PR #1696 Added null support in `Series.replace()`.
- PR #1390 Added some basic utility functions for `gdf_column`'s
- PR #1791 Added general column comparison code for testing
- PR #1795 Add printing of git submodule info to `print_env.sh`
- PR #1796 Removing old sort based group by code and gdf_filter
- PR #1811 Added functions for copying/allocating `cudf::table`s
- PR #1838 Improve columnops.column_empty so that it returns typed columns instead of a generic Column
- PR #1890 Add utils.get_dummies- a pandas-like wrapper around one_hot-encoding
- PR #1823 CSV Reader: default the column type to string for empty dataframes
- PR #1827 Create bindings for scalar-vector binops, and update one_hot_encoding to use them
- PR #1817 Operators now support different sized dataframes as long as they don't share different sized columns
- PR #1855 Transition replace_nulls to new C++ API and update corresponding Cython/Python code
- PR #1858 Add `std::initializer_list` constructor to `column_wrapper`
- PR #1846 C++ type-erased gdf_equal_columns test util; fix gdf_equal_columns logic error
- PR #1390 Added some basic utility functions for `gdf_column`s
- PR #1391 Tidy up bit-resolution-operation and bitmask class code
- PR #1882 Add iloc functionality to MultiIndex dataframes
- PR #1884 Rolling windows: general enhancements and better coverage for unit tests
- PR #1886 support GDF_STRING_CATEGORY columns in apply_boolean_mask, drop_nulls and other libcudf functions
- PR #1896 Improve performance of groupby with levels specified in dask-cudf
- PR #1915 Improve iloc performance for non-contiguous row selection
- PR #1859 Convert read_json into a C++ API
- PR #1919 Rename libcudf namespace gdf to namespace cudf
- PR #1850 Support left_on and right_on for DataFrame merge operator
- PR #1930 Specialize constructor for `cudf::bool8` to cast argument to `bool`
- PR #1938 Add default constructor for `column_wrapper`
- PR #1952 consolidate libcudf public API headers in include/cudf
- PR #1949 Improved selection with boolmask using libcudf `apply_boolean_mask`
- PR #1956 Add support for nulls in `query()`
- PR #1973 Update `std::tuple` to `std::pair` in top-most libcudf APIs and C++ transition guide
- PR #1981 Convert read_csv into a C++ API
- PR #1868 ORC Reader: Support row index for speed up on small/medium datasets
- PR #1964 Added support for list-like types in Series.str.cat
- PR #2005 Use HTML5 details tag in bug report issue template
- PR #2003 Removed few redundant unit-tests from test_string.py::test_string_cat
- PR #1944 Groupby design improvements
- PR #2017 Convert `read_orc()` into a C++ API
- PR #2011 Convert `read_parquet()` into a C++ API
- PR #1756 Add documentation "10 Minutes to cuDF and dask_cuDF"
- PR #2034 Adding support for string columns concatenation using "add" binary operator
- PR #2042 Replace old "10 Minutes" guide with new guide for docs build process
- PR #2036 Make library of common test utils to speed up tests compilation
- PR #2022 Facilitating get_dummies to be a high level api too
- PR #2050 Namespace IO readers and add back free-form `read_xxx` functions
- PR #2104 Add a functional `sort=` keyword argument to groupby
- PR #2108 Add `find_and_replace` for StringColumn for replacing single values
- PR #1803 cuDF/CuPy interoperability documentation
- PR #1465 Fix for test_orc.py and test_sparse_df.py test failures
- PR #1583 Fix underlying issue in `as_index()` that was causing `Series.quantile()` to fail
- PR #1680 Add errors= keyword to drop() to fix cudf-dask bug
- PR #1651 Fix `query` function on empty dataframe
- PR #1616 Fix CategoricalColumn to access categories by index instead of iteration
- PR #1660 Fix bug in `loc` when indexing with a column name (a string)
- PR #1683 ORC reader: fix timestamp conversion to UTC
- PR #1613 Improve CategoricalColumn.fillna(-1) performance
- PR #1642 Fix failure of CSV_TEST gdf_csv_test.SkiprowsNrows on multiuser systems
- PR #1709 Fix handling of `datetime64[ms]` in `dataframe.select_dtypes`
- PR #1704 CSV Reader: Add support for the plus sign in number fields
- PR #1687 CSV reader: return an empty dataframe for zero size input
- PR #1757 Concatenating columns with null columns
- PR #1755 Add col_level keyword argument to melt
- PR #1758 Fix df.set_index() when setting index from an empty column
- PR #1749 ORC reader: fix long strings of NULL values resulting in incorrect data
- PR #1742 Parquet Reader: Fix index column name to match PANDAS compat
- PR #1782 Update libcudf doc version
- PR #1783 Update conda dependencies
- PR #1786 Maintain the original series name in series.unique output
- PR #1760 CSV Reader: fix segfault when dtype list only includes columns from usecols list
- PR #1831 build.sh: Assuming python is in PATH instead of using PYTHON env var
- PR #1839 Raise an error instead of segfaulting when transposing a DataFrame with StringColumns
- PR #1840 Retain index correctly during merge left_on right_on
- PR #1825 cuDF: Multiaggregation Groupby Failures
- PR #1789 CSV Reader: Fix missing support for specifying `int8` and `int16` dtypes
- PR #1857 Cython Bindings: Handle `bool` columns while calling `column_view_from_NDArrays`
- PR #1849 Allow DataFrame support methods to pass arguments to the methods
- PR #1847 Fixed #1375 by moving the nvstring check into the wrapper function
- PR #1864 Fixing cudf reduction for POWER platform
- PR #1869 Parquet reader: fix Dask timestamps not matching with Pandas (convert to milliseconds)
- PR #1876 add dtype=bool for `any`, `all` to treat integer column correctly
- PR #1875 CSV reader: take NaN values into account in dtype detection
- PR #1873 Add column dtype checking for the all/any methods
- PR #1902 Bug with string iteration in _apply_basic_agg
- PR #1887 Fix for initialization issue in pq_read_arg,orc_read_arg
- PR #1867 JSON reader: add support for null/empty fields, including the 'null' literal
- PR #1891 Fix bug #1750 in string column comparison
- PR #1909 Support of `to_pandas()` of boolean series with null values
- PR #1923 Use prefix removal when two aggs are called on a SeriesGroupBy
- PR #1914 Zero initialize gdf_column local variables
- PR #1959 Add support for comparing boolean Series to scalar
- PR #1966 Ignore index fix in series append
- PR #1967 Compute index sizeof only once for DataFrame sizeof
- PR #1977 Support CUDA installation in default system directories
- PR #1982 Fixes incorrect index name after join operation
- PR #1985 Implement `GDF_PYMOD`, a special modulo that follows python's sign rules
- PR #1991 Parquet reader: fix decoding of NULLs
- PR #1990 Fixes a rendering bug in the `apply_grouped` documentation
- PR #1978 Fix for values being filled in an empty dataframe
- PR #2001 Correctly create MultiColumn from Pandas MultiColumn
- PR #2006 Handle empty dataframe groupby construction for dask
- PR #1965 Parquet Reader: Fix duplicate index column when it's already in `use_cols`
- PR #2033 Add pip to conda environment files to fix warning
- PR #2028 CSV Reader: Fix reading of uncompressed files without a recognized file extension
- PR #2073 Fix an issue when gathering columns with NVCategory and nulls
- PR #2053 cudf::apply_boolean_mask return empty column for empty boolean mask
- PR #2066 exclude `IteratorTest.mean_var_output` test from debug build
- PR #2069 Fix JNI code to use read_csv and read_parquet APIs
- PR #2071 Fix bug with unfound transitive dependencies for GTests in Ubuntu 18.04
- PR #2089 Configure Sphinx to render params correctly
- PR #2091 Fix another bug with unfound transitive dependencies for `cudftestutils` in Ubuntu 18.04
- PR #2115 Just apply `--disable-new-dtags` instead of trying to define all the transitive dependencies
- PR #2106 Fix errors in JitCache tests caused by sharing of device memory between processes
- PR #2120 Fix errors in JitCache tests caused by running multiple threads on the same data
- PR #2102 Fix memory leak in groupby
- PR #2113 fixed typo in to_csv code example
- PR #1735 Added overload for atomicAdd on int64. Streamlined implementation of custom atomic overloads.
- PR #1741 Add MultiIndex concatenation
- PR #1718 Fix issue with SeriesGroupBy MultiIndex in dask-cudf
- PR #1734 Python: fix performance regression for groupby count() aggregations
- PR #1768 Cython: fix handling read only schema buffers in gpuarrow reader
- PR #1702 Lazy load MultiIndex to return groupby performance to near optimal.
- PR #1708 Fix handling of `datetime64[ms]` in `dataframe.select_dtypes`
- PR #982 Implement gdf_group_by_without_aggregations and gdf_unique_indices functions
- PR #1142 Add `GDF_BOOL` column type
- PR #1194 Implement overloads for CUDA atomic operations
- PR #1292 Implemented Bitwise binary ops AND, OR, XOR (&, |, ^)
- PR #1235 Add GPU-accelerated Parquet Reader
- PR #1335 Added local_dict arg in `DataFrame.query()`.
- PR #1282 Add Series and DataFrame.describe()
- PR #1356 Rolling windows
- PR #1381 Add DataFrame._get_numeric_data
- PR #1388 Add CODEOWNERS file to auto-request reviews based on where changes are made
- PR #1396 Add DataFrame.drop method
- PR #1413 Add DataFrame.melt method
- PR #1412 Add DataFrame.pop()
- PR #1419 Initial CSV writer function
- PR #1441 Add Series level cumulative ops (cumsum, cummin, cummax, cumprod)
- PR #1420 Add script to build and test on a local gpuCI image
- PR #1440 Add DatetimeColumn.min(), DatetimeColumn.max()
- PR #1455 Add Series.Shift via Numba kernel
- PR #1461 Add Python coverage test to gpu build
- PR #1445 Parquet Reader: Add selective reading of rows and row group
- PR #1532 Parquet Reader: Add support for INT96 timestamps
- PR #1516 Add Series and DataFrame.ndim
- PR #1556 Add libcudf C++ transition guide
- PR #1466 Add GPU-accelerated ORC Reader
- PR #1565 Add build script for nightly doc builds
- PR #1508 Add Series isna, isnull, and notna
- PR #1456 Add Series.diff() via Numba kernel
- PR #1588 Add Index `astype` typecasting
- PR #1301 MultiIndex support
- PR #1599 Level keyword supported in groupby
- PR #929 Add support operations to dataframe
- PR #1609 Groupby accept list of Series
- PR #1658 Support `group_keys=True` keyword in groupby method
- PR #1531 Refactor closures as private functions in gpuarrow
- PR #1404 Parquet reader page data decoding speedup
- PR #1076 Use `type_dispatcher` in join, quantiles, filter, segmented sort, radix sort and hash_groupby
- PR #1202 Simplify README.md
- PR #1149 CSV Reader: Change convertStrToValue() functions to `__device__` only
- PR #1238 Improve performance of the CUDA trie used in the CSV reader
- PR #1245 Use file cache for JIT kernels
- PR #1278 Update CONTRIBUTING for new conda environment yml naming conventions
- PR #1163 Refactored UnaryOps. Reduced API to two functions: `gdf_unary_math` and `gdf_cast`. Added `abs`, `-`, and `~` ops. Changed bindings to Cython
- PR #1284 Update docs version
- PR #1287 add exclude argument to cudf.select_dtype function
- PR #1286 Refactor some of the CSV Reader kernels into generic utility functions
- PR #1291 fillna in `Series.to_gpu_array()` and `Series.to_array()` can accept the scalar too now.
- PR #1005 generic `reduction` and `scan` support
- PR #1349 Replace modernGPU sort join with thrust.
- PR #1363 Add a dataframe.mean(...) that raises NotImplementedError to satisfy `dask.dataframe.utils.is_dataframe_like`
- PR #1319 CSV Reader: Use column wrapper for gdf_column output alloc/dealloc
- PR #1376 Change series quantile default to linear
- PR #1399 Replace CFFI bindings for NVTX functions with Cython bindings
- PR #1389 Refactored `set_null_count()`
- PR #1386 Added macros `GDF_TRY()`, `CUDF_TRY()` and `ASSERT_CUDF_SUCCEEDED()`
- PR #1435 Rework CMake and conda recipes to depend on installed libraries
- PR #1391 Tidy up bit-resolution-operation and bitmask class code
- PR #1439 Add cmake variable to enable compiling CUDA code with -lineinfo
- PR #1462 Add ability to read parquet files from arrow::io::RandomAccessFile
- PR #1453 Convert CSV Reader CFFI to Cython
- PR #1479 Convert Parquet Reader CFFI to Cython
- PR #1397 Add a utility function for producing an overflow-safe kernel launch grid configuration
- PR #1382 Add GPU parsing of nested brackets to cuIO parsing utilities
- PR #1481 Add cudf::table constructor to allocate a set of `gdf_column`s
- PR #1484 Convert GroupBy CFFI to Cython
- PR #1463 Allow and default melt keyword argument var_name to be None
- PR #1486 Parquet Reader: Use device_buffer rather than device_ptr
- PR #1525 Add cudatoolkit conda dependency
- PR #1520 Renamed `src/dataframe` to `src/table` and moved `table.hpp`. Made `types.hpp` to be type declarations only.
- PR #1492 Convert transpose CFFI to Cython
- PR #1495 Convert binary and unary ops CFFI to Cython
- PR #1503 Convert sorting and hashing ops CFFI to Cython
- PR #1522 Use latest release version in update-version CI script
- PR #1533 Remove stale join CFFI, fix memory leaks in join Cython
- PR #1521 Added `row_bitmask` to compute bitmask for rows of a table. Merged `valids_ops.cu` and `bitmask_ops.cu`
- PR #1553 Overload `hash_row` to avoid using initial hash values. Updated `gdf_hash` to select between overloads
- PR #1585 Updated `cudf::table` to maintain own copy of wrapped `gdf_column*`s
- PR #1559 Add `except +` to all Cython function definitions to catch C++ exceptions properly
- PR #1617 `has_nulls` and `column_dtypes` for `cudf::table`
- PR #1590 Remove CFFI from the build / install process entirely
- PR #1536 Convert gpuarrow CFFI to Cython
- PR #1655 Add `Column._pointer` as a way to access underlying `gdf_column*` of a `Column`
- PR #1655 Update readme conda install instructions for cudf version 0.6 and 0.7
- PR #1233 Fix dtypes issue while adding the column to `str` dataframe.
- PR #1254 CSV Reader: fix data type detection for floating-point numbers in scientific notation
- PR #1289 Fix looping over each value instead of each category in concatenation
- PR #1293 Fix Inaccurate error message in join.pyx
- PR #1308 Add atomicCAS overload for `int8_t`, `int16_t`
- PR #1317 Fix catch polymorphic exception by reference in ipc.cu
- PR #1325 Fix dtype of null bitmasks to int8
- PR #1326 Update build documentation to use -DCMAKE_CXX11_ABI=ON
- PR #1334 Add "na_position" argument to CategoricalColumn sort_by_values
- PR #1321 Fix out of bounds warning when checking Bzip2 header
- PR #1359 Add atomicAnd/Or/Xor for integers
- PR #1354 Fix `fillna()` behaviour when replacing values with different dtypes
- PR #1347 Fixed core dump issue while passing dict_dtypes without column names in `cudf.read_csv()`
- PR #1379 Fixed build failure caused due to error: 'col_dtype' may be used uninitialized
- PR #1392 Update cudf Dockerfile and package_versions.sh
- PR #1385 Added INT8 type to `_schema_to_dtype` for use in GpuArrowReader
- PR #1393 Fixed a bug in `gdf_count_nonzero_mask()` for the case of 0 bits to count
- PR #1395 Update CONTRIBUTING to use the environment variable CUDF_HOME
- PR #1416 Fix bug at gdf_quantile_exact and gdf_quantile_approx
- PR #1421 Fix remove creation of series multiple times during `add_column()`
- PR #1405 CSV Reader: Fix memory leaks on read_csv() failure
- PR #1328 Fix CategoricalColumn to_arrow() null mask
- PR #1433 Fix NVStrings/categories includes
- PR #1432 Update NVStrings to 0.7.* to coincide with 0.7 development
- PR #1483 Modify CSV reader to avoid cropping blank quoted characters in non-string fields
- PR #1446 Merge 1275 hotfix from master into branch-0.7
- PR #1447 Fix legacy groupby apply docstring
- PR #1451 Fix hash join estimated result size is not correct
- PR #1454 Fix local build script improperly change directory permissions
- PR #1490 Require Dask 1.1.0+ for `is_dataframe_like` test or skip otherwise.
- PR #1491 Use more specific directories & groups in CODEOWNERS
- PR #1497 Fix Thrust issue on CentOS caused by missing default constructor of host_vector elements
- PR #1498 Add missing include guard to device_atomics.cuh and separated DEVICE_ATOMICS_TEST
- PR #1506 Fix csv-write call to updated NVStrings method
- PR #1510 Added nvstrings `fillna()` function
- PR #1507 Parquet Reader: Default string data to GDF_STRING
- PR #1535 Fix doc issue to ensure correct labelling of cudf.series
- PR #1537 Fix `undefined reference` link error in HashPartitionTest
- PR #1548 Fix ci/local/build.sh README from using an incorrect image example
- PR #1551 CSV Reader: Fix integer column name indexing
- PR #1586 Fix broken `scalar_wrapper::operator==`
- PR #1591 ORC/Parquet Reader: Fix missing import for FileNotFoundError exception
- PR #1573 Parquet Reader: Fix crash due to clash with ORC reader datasource
- PR #1607 Revert change of `column.to_dense_buffer` always return by copy for performance concerns
- PR #1618 ORC reader: fix assert & data output when nrows/skiprows isn't aligned to stripe boundaries
- PR #1631 Fix failure of TYPES_TEST on some gcc-7 based systems.
- PR #1641 CSV Reader: Fix skip_blank_lines behavior with Windows line terminators (`\r\n`)
- PR #1648 ORC reader: fix non-deterministic output when skiprows is non-zero
- PR #1676 Fix groupby `as_index` behaviour with `MultiIndex`
- PR #1659 Fix bug caused by empty groupbys and multiindex slicing throwing exceptions
- PR #1656 Correct Groupby failure in dask when un-aggregable columns are left in dataframe.
- PR #1689 Fix groupby performance regression
- PR #1694 Add Cython as a runtime dependency since it's required in `setup.py`
- PR #1275 Fix CentOS exception in DataFrame.hash_partition from using value "returned" by a void function
- PR #760 Raise `FileNotFoundError` instead of `GDF_FILE_ERROR` in `read_csv` if the file does not exist
- PR #539 Add Python bindings for replace function
- PR #823 Add Doxygen configuration to enable building HTML documentation for libcudf C/C++ API
- PR #807 CSV Reader: Add byte_range parameter to specify the range in the input file to be read
- PR #857 Add Tail method for Series/DataFrame and update Head method to use iloc
- PR #858 Add series feature hashing support
- PR #871 CSV Reader: Add support for NA values, including user specified strings
- PR #893 Adds PyArrow based parquet readers / writers to Python, fix category dtype handling, fix arrow ingest buffer size issues
- PR #867 CSV Reader: Add support for ignoring blank lines and comment lines
- PR #887 Add Series digitize method
- PR #895 Add Series groupby
- PR #898 Add DataFrame.groupby(level=0) support
- PR #920 Add feather, JSON, HDF5 readers / writers from PyArrow / Pandas
- PR #888 CSV Reader: Add prefix parameter for column names, used when parsing without a header
- PR #913 Add DLPack support: convert between cuDF DataFrame and DLTensor
- PR #939 Add ORC reader from PyArrow
- PR #918 Add Series.groupby(level=0) support
- PR #906 Add binary and comparison ops to DataFrame
- PR #958 Support unary and binary ops on indexes
- PR #964 Add `rename` method to `DataFrame`, `Series`, and `Index`
- PR #985 Add `Series.to_frame` method
- PR #985 Add `drop=` keyword to reset_index method
- PR #990 Add external series groupby support
- PR #988 Add top-level merge function to cuDF
- PR #992 Add comparison binaryops to DateTime columns
- PR #996 Replace relative path imports with absolute paths in tests
- PR #995 CSV Reader: Add index_col parameter to specify the column name or index to be used as row labels
- PR #1004 Add `from_gpu_matrix` method to DataFrame
- PR #997 Add property index setter
- PR #1007 Replace relative path imports with absolute paths in cudf
- PR #1013 select columns with df.columns
- PR #1016 Rename Series.unique_count() to nunique() to match pandas API
- PR #947 Prefixsum to handle nulls and float types
- PR #1029 Remove rest of relative path imports
- PR #1021 Add filtered selection with assignment for Dataframes
- PR #872 Adding NVCategory support to cudf apis
- PR #1052 Add left/right_index and left/right_on keywords to merge
- PR #1091 Add `indicator=` and `suffixes=` keywords to merge
- PR #1107 Add unsupported keywords to Series.fillna
- PR #1032 Add string support to cuDF python
- PR #1136 Removed `gdf_concat`
- PR #1153 Added function for getting the padded allocation size for valid bitmask
- PR #1148 Add cudf.sqrt for dataframes and Series
- PR #1159 Add Python bindings for libcudf dlpack functions
- PR #1155 Add array_ufunc for DataFrame and Series for sqrt
- PR #1168 to_frame for series accepts a name argument
- PR #1218 Add dask-cudf page to API docs
- PR #892 Add support for heterogeneous types in binary ops with JIT
- PR #730 Improve performance of `gdf_table` constructor
- PR #561 Add Doxygen style comments to Join CUDA functions
- PR #813 unified libcudf API functions by replacing gpu_ with gdf_
- PR #822 Add support for `__cuda_array_interface__` for ingest
- PR #756 Consolidate common helper functions from unordered map and multimap
- PR #753 Improve performance of groupby sum and average, especially for cases with few groups.
- PR #836 Add ingest support for arrow chunked arrays in Column, Series, DataFrame creation
- PR #763 Format doxygen comments for csv_read_arg struct
- PR #532 CSV Reader: Use type dispatcher instead of switch block
- PR #694 Unit test utilities improvements
- PR #878 Add better indexing to Groupby
- PR #554 Add `empty` method and `is_monotonic` attribute to `Index`
- PR #1040 Fixed up Doxygen comment tags
- PR #909 CSV Reader: Avoid host->device->host copy for header row data
- PR #916 Improved unit testing and error checking for `gdf_column_concat`
- PR #941 Replace `numpy` call in `Series.hash_encode` with `numba`
- PR #942 Added increment/decrement operators for wrapper types
- PR #943 Updated `count_nonzero_mask` to return `num_rows` when the mask is null
- PR #952 Added trait to map C++ type to `gdf_dtype`
- PR #966 Updated RMM submodule.
- PR #998 Add IO reader/writer modules to API docs, fix for missing cudf.Series docs
- PR #1017 concatenate along columns for Series and DataFrames
- PR #1002 Support indexing a dataframe with another boolean dataframe
- PR #1018 Better concatenation for Series and Dataframes
- PR #1036 Use Numpydoc style docstrings
- PR #1047 Adding gdf_dtype_extra_info to gdf_column_view_augmented
- PR #1054 Added default ctor to SerialTrieNode to overcome Thrust issue in CentOS7 + CUDA10
- PR #1024 CSV Reader: Add support for hexadecimal integers in integral-type columns
- PR #1033 Update `fillna()` to use libcudf function `gdf_replace_nulls`
- PR #1066 Added inplace assignment for columns and select_dtypes for dataframes
- PR #1026 CSV Reader: Change the meaning and type of the quoting parameter to match Pandas
- PR #1100 Adds `CUDF_EXPECTS` error-checking macro
- PR #1092 Fix select_dtype docstring
- PR #1111 Added cudf::table
- PR #1108 Sorting for datetime columns
- PR #1120 Return a `Series` (not a `Column`) from `Series.cat.set_categories()`
- PR #1128 CSV Reader: The last data row does not need to be line terminated
- PR #1183 Bump Arrow version to 0.12.1
- PR #1208 Default to CXX11_ABI=ON
- PR #1252 Fix NVStrings dependencies for cuda 9.2 and 10.0
- PR #2037 Optimize the existing `gather` and `scatter` routines in `libcudf`
- PR #821 Fix flake8 issues revealed by flake8 update
- PR #808 Resolved renamed `d_columns_valids` variable name
- PR #820 CSV Reader: fix the issue where reader adds additional rows when file uses as a line terminator
- PR #780 CSV Reader: Fix scientific notation parsing and null values for empty quotes
- PR #815 CSV Reader: Fix data parsing when tabs are present in the input CSV file
- PR #850 Fix bug where left joins where the left df has 0 rows causes a crash
- PR #861 Fix memory leak by preserving the boolean mask index
- PR #875 Handle unnamed indexes in to/from arrow functions
- PR #877 Fix ingest of 1 row arrow tables in from arrow function
- PR #876 Added missing `<type_traits>` include
- PR #889 Deleted test_rmm.py which has now moved to RMM repo
- PR #866 Merge v0.5.1 numpy ABI hotfix into 0.6
- PR #917 value_counts return int type on empty columns
- PR #611 Renamed `gdf_reduce_optimal_output_size()` -> `gdf_reduction_get_intermediate_output_size()`
- PR #923 fix index for negative slicing for cudf dataframe and series
- PR #927 CSV Reader: Fix category GDF_CATEGORY hashes not being computed properly
- PR #921 CSV Reader: Fix parsing errors with delim_whitespace, quotations in the header row, unnamed columns
- PR #933 Fix handling objects of all nulls in series creation
- PR #940 CSV Reader: Fix an issue where the last data row is missing when using byte_range
- PR #945 CSV Reader: Fix incorrect datetime64 when milliseconds or space separator are used
- PR #959 Groupby: Problem with column name lookup
- PR #950 Converting dataframe/recarray with non-contiguous arrays
- PR #963 CSV Reader: Fix another issue with missing data rows when using byte_range
- PR #999 Fix 0 sized kernel launches and empty sort_index exception
- PR #993 Fix dtype in selecting 0 rows from objects
- PR #1009 Fix performance regression in `to_pandas` method on DataFrame
- PR #1008 Remove custom dask communication approach
- PR #1001 CSV Reader: Fix a memory access error when reading a large (>2GB) file with date columns
- PR #1019 Binary Ops: Fix error when one input column has null mask but other doesn't
- PR #1014 CSV Reader: Fix false positives in bool value detection
- PR #1034 CSV Reader: Fix parsing floating point precision and leading zero exponents
- PR #1044 CSV Reader: Fix a segfault when byte range aligns with a page
- PR #1058 Added support for `DataFrame.loc[scalar]`
- PR #1060 Fix column creation with all valid nan values
- PR #1073 CSV Reader: Fix an issue where a column name includes the return character
- PR #1090 Updating Doxygen Comments
- PR #1080 Fix dtypes returned from loc / iloc because of lists
- PR #1102 CSV Reader: Minor fixes and memory usage improvements
- PR #1174 Fix release script typo
- PR #1137 Add prebuild script for CI
- PR #1118 Enhanced the `DataFrame.from_records()` feature
- PR #1129 Fix join performance with index parameter from using numpy array
- PR #1145 Issue with .agg call on multi-column dataframes
- PR #908 Some testing code cleanup
- PR #1167 Fix issue with null_count not being set after inplace fillna()
- PR #1184 Fix iloc performance regression
- PR #1185 Support left_on/right_on and also on=str in merge
- PR #1200 Fix allocating bitmasks with numba instead of rmm in allocate_mask function
- PR #1213 Fix bug with csv reader requesting subset of columns using wrong datatype
- PR #1223 gpuCI: Fix label on rapidsai channel on gpu build scripts
- PR #1242 Add explicit Thrust exec policy to fix NVCATEGORY_TEST segfault on some platforms
- PR #1246 Fix categorical tests that failed due to bad implicit type conversion
- PR #1255 Fix overwriting conda package main label uploads
- PR #1259 Add dlpack includes to pip build
- PR #842 Avoid using numpy via cimport to prevent ABI issues in Cython compilation
- PR #722 Add bzip2 decompression support to `read_csv()`
- PR #693 add ZLIB-based GZIP/ZIP support to `read_csv_strings()`
- PR #411 added null support to gdf_order_by (new API) and cudf_table::sort
- PR #525 Added GitHub Issue templates for bugs, documentation, new features, and questions
- PR #501 CSV Reader: Add support for user-specified decimal point and thousands separator to read_csv_strings()
- PR #455 CSV Reader: Add support for user-specified decimal point and thousands separator to read_csv()
- PR #439 add `DataFrame.drop` method similar to pandas
- PR #356 add `DataFrame.transpose` method and `DataFrame.T` property similar to pandas
- PR #505 CSV Reader: Add support for user-specified boolean values
- PR #350 Implemented Series replace function
- PR #490 Added print_env.sh script to gather relevant environment details when reporting cuDF issues
- PR #474 add ZLIB-based GZIP/ZIP support to `read_csv()`
- PR #547 Added melt similar to `pandas.melt()`
- PR #491 Add CI test script to check for updates to CHANGELOG.md in PRs
- PR #550 Add CI test script to check for style issues in PRs
- PR #558 Add CI scripts for cpu-based conda and gpu-based test builds
- PR #524 Add Boolean Indexing
- PR #564 Update python `sort_values` method to use updated libcudf `gdf_order_by` API
- PR #509 CSV Reader: Input CSV file can now be passed in as a text or a binary buffer
- PR #607 Add `__iter__` and iteritems to DataFrame class
- PR #643 added a new api gdf_replace_nulls that allows a user to replace nulls in a column
- PR #426 Removed sort-based groupby and refactored existing groupby APIs. Also improves C++/CUDA compile time.
- PR #461 Add `CUDF_HOME` variable in README.md to replace relative pathing.
- PR #472 RMM: Created centralized rmm::device_vector alias and rmm::exec_policy
- PR #500 Improved the concurrent hash map class to support partitioned (multi-pass) hash table building.
- PR #454 Improve CSV reader docs and examples
- PR #465 Added templated C++ API for RMM to avoid explicit cast to `void**`
- PR #513 `.gitignore` tweaks
- PR #521 Add `assert_eq` function for testing
- PR #502 Simplify Dockerfile for local dev, eliminate old conda/pip envs
- PR #549 Adds `-rdynamic` compiler flag to nvcc for Debug builds
- PR #472 RMM: Created centralized rmm::device_vector alias and rmm::exec_policy
- PR #577 Added external C++ API for scatter/gather functions
- PR #500 Improved the concurrent hash map class to support partitioned (multi-pass) hash table building
- PR #583 Updated `gdf_size_type` to `int`
- PR #617 Added .dockerignore file. Prevents adding stale cmake cache files to the docker container
- PR #658 Reduced `JOIN_TEST` time by isolating overflow test of hash table size computation
- PR #664 Added Debugging instructions to README
- PR #651 Remove noqa marks in `__init__.py` files
- PR #671 CSV Reader: uncompressed buffer input can be parsed without explicitly specifying compression as None
- PR #684 Make RMM a submodule
- PR #718 Ensure sum, product, min, max methods pandas compatibility on empty datasets
- PR #720 Refactored Index classes to make them more Pandas-like, added CategoricalIndex
- PR #749 Improve to_arrow and from_arrow Pandas compatibility
- PR #766 Remove TravisCI references, remove unused variables from CMake, fix ARROW_VERSION in CMake
- PR #773 Add build-args back to Dockerfile and handle dependencies based on environment yml file
- PR #781 Move thirdparty submodules to root and symlink in /cpp
- PR #843 Fix broken cudf/python API examples, add new methods to the API index
- PR #569 CSV Reader: Fix days being off-by-one when parsing some dates
- PR #531 CSV Reader: Fix incorrect parsing of quoted numbers
- PR #465 Added templated C++ API for RMM to avoid explicit cast to `void**`
- PR #473 Added missing include
- PR #478 CSV Reader: Add api support for auto column detection, header, mangle_dupe_cols, usecols
- PR #495 Updated README to correct where cffi pytest should be executed
- PR #501 Fix the intermittent segfault caused by the `thousands` and `compression` parameters in the csv reader
- PR #502 Simplify Dockerfile for local dev, eliminate old conda/pip envs
- PR #512 fix bug for `on` parameter in `DataFrame.merge` to allow for None or single column name
- PR #511 Updated python/cudf/bindings/join.pyx to fix cudf merge printing out dtypes
- PR #513 `.gitignore` tweaks
- PR #521 Add `assert_eq` function for testing
- PR #537 Fix CMAKE_CUDA_STANDARD_REQURIED typo in CMakeLists.txt
- PR #447 Fix silent failure in initializing DataFrame from generator
- PR #545 Temporarily disable csv reader thousands test to prevent segfault (test re-enabled in PR #501)
- PR #559 Fix Assertion error while using `applymap` to change the output dtype
- PR #575 Update `print_env.sh` script to better handle missing commands
- PR #612 Prevent an exception from occurring with true division on integer series.
- PR #630 Fix deprecation warning for `pd.core.common.is_categorical_dtype`
- PR #622 Fix Series.append() behaviour when appending values with different numeric dtype
- PR #603 Fix error while creating an empty column using None.
- PR #673 Fix array of strings not being caught in from_pandas
- PR #644 Fix return type and column support of dataframe.quantile()
- PR #634 Fix create `DataFrame.from_pandas()` with numeric column names
- PR #654 Add resolution check for GDF_TIMESTAMP in Join
- PR #648 Enforce one-to-one copy required when using `numba>=0.42.0`
- PR #645 Fix cmake build type handling not setting debug options when CMAKE_BUILD_TYPE=="Debug"
- PR #669 Fix GIL deadlock when launching multiple python threads that make Cython calls
- PR #665 Reworked the hash map to add a way to report the destination partition for a key
- PR #670 CMAKE: Fix env include path taking precedence over libcudf source headers
- PR #674 Check for gdf supported column types
- PR #677 Fix 'gdf_csv_test_Dates' gtest failure due to missing nrows parameter
- PR #604 Fix the parsing errors while reading a csv file using `sep` instead of `delimiter`.
- PR #686 Fix converting nulls to NaT values when converting Series to Pandas/Numpy
- PR #689 CSV Reader: Fix behavior with skiprows+header to match pandas implementation
- PR #691 Fixes Join on empty input DFs
- PR #706 CSV Reader: Fix broken dtype inference when whitespace is in data
- PR #717 CSV reader: fix behavior when parsing a csv file with no data rows
- PR #724 CSV Reader: fix build issue due to parameter type mismatch in a std::max call
- PR #734 Prevents reading undefined memory in gpu_expand_mask_bits numba kernel
- PR #747 CSV Reader: fix an issue where CUDA allocations fail with some large input files
- PR #750 Fix race condition for handling NVStrings in CMake
- PR #719 Fix merge column ordering
- PR #770 Fix issue where RMM submodule pointed to wrong branch and pin other to correct branches
- PR #778 Fix hard coded ABI off setting
- PR #784 Update RMM submodule commit-ish and pip paths
- PR #794 Update `rmm::exec_policy` usage to fix segmentation faults when used as temporary allocator.
- PR #800 Point git submodules to branches of forks instead of exact commits
- PR #398 add pandas-compatible `DataFrame.shape()` and `Series.shape()`
- PR #394 New documentation feature "10 Minutes to cuDF"
- PR #361 CSV Reader: Add support for strings with delimiters
- PR #436 Improvements for type_dispatcher and wrapper structs
- PR #429 Add CHANGELOG.md (this file)
- PR #266 use faster CUDA-accelerated DataFrame column/Series concatenation.
- PR #379 new C++ `type_dispatcher` reduces code complexity in supporting many data types.
- PR #349 Improve performance for creating columns from memoryview objects
- PR #445 Update reductions to use type_dispatcher. Adds integer types support to sum_of_squares.
- PR #448 Improve installation instructions in README.md
- PR #456 Change default CMake build to Release, and added option for disabling compilation of tests
- PR #444 Fix csv_test CUDA too many resources requested fail.
- PR #396 added missing output buffer in validity tests for groupbys.
- PR #408 Dockerfile updates for source reorganization
- PR #437 Add cffi to Dockerfile conda env, fixes "cannot import name 'librmm'"
- PR #417 Fix `map_test` failure with CUDA 10
- PR #414 Fix CMake installation include file paths
- PR #418 Properly cast string dtypes to programmatic dtypes when instantiating columns
- PR #427 Fix and tests for Concatenation illegal memory access with nulls
- PR #336 CSV Reader string support
- PR #354 source code refactored for better organization. CMake build system overhaul. Beginning of transition to Cython bindings.
- PR #290 Add support for typecasting to/from datetime dtype
- PR #323 Add handling pyarrow boolean arrays in input/out, add tests
- PR #325 GDF_VALIDITY_UNSUPPORTED now returned for algorithms that don't support non-empty valid bitmasks
- PR #381 Faster InputTooLarge Join test completes in ms rather than minutes.
- PR #373 .gitignore improvements
- PR #367 Doc cleanup & examples for DataFrame methods
- PR #333 Add Rapids Memory Manager documentation
- PR #321 Rapids Memory Manager adds file/line location logging and convenience macros
- PR #334 Implement DataFrame `__copy__` and `__deepcopy__`
- PR #271 Add NVTX ranges to pygdf
- PR #311 Document system requirements for conda install
- PR #337 Retain index on `scale()` function
- PR #344 Fix test failure due to PyArrow 0.11 Boolean handling
- PR #364 Remove noexcept from managed_allocator; CMakeLists fix for NVstrings
- PR #357 Fix bug that made all series be considered booleans for indexing
- PR #351 replace conda env configuration for developers
- PRs #346 #360 Fix CSV reading of negative numbers
- PR #342 Fix CMake to use conda-installed nvstrings
- PR #341 Preserve categorical dtype after groupby aggregations
- PR #315 ReadTheDocs build update to fix missing libcuda.so
- PR #320 Fix out-of-bounds access error in reductions.cu
- PR #319 Fix out-of-bounds memory access in libcudf count_valid_bits
- PR #303 Fix printing empty dataframe
These were initial releases of cuDF based on previously separate pyGDF and libGDF libraries.