ArrayDataAdapter no longer converts to NumPy and supports sparse tensors #19298
Conversation
- `ArrayDataAdapter` no longer converts arrays to NumPy. Instead, the passed arrays can be sliced or indexed in their native format.
- This addresses keras-team#18408 and improves performance, especially with TensorFlow and Torch. It improves TF -> TF and Torch -> Torch, but also TF -> Torch and Torch -> TF.
- This allows the support of sparse tensors (`tf.SparseTensor`, `jax.experimental.sparse.BCOO` and `scipy.sparse`). These sparse tensors are sliced as sparse, and the iterators yield sparse tensors in the requested format (either TF or JAX).
- The `validation_split` argument of `Model.fit()` can now be used with anything supported by `ArrayDataAdapter`; in particular, sparse tensors are now supported.

In summary, `ArrayDataAdapter` now supports:
- native Python arrays
- NumPy arrays
- TensorFlow tensors, ragged tensors, sparse tensors (new)
- JAX arrays and BCOO sparse tensors (new)
- pandas DataFrames
- pandas Series
- scipy sparse matrices (new)

Also:
- Fixed a bug where batch-level shuffling would shuffle the different arrays (in particular inputs and labels) inconsistently when using a TF dataset or a NumPy iterator.
- Fixed a bug where `tf.RaggedTensor`s would only work when using a TF dataset.
- Fixed a bug where `tf.RaggedTensor`s would not work when doing batch-level shuffling.
- Added a workaround for a bug where `tf.cast`ing a `tf.SparseTensor` would lose the static shape.
- Added test coverage for `tf.RaggedTensor`s and `pandas.Series`.
- Added verification in tests that inputs and labels are shuffled consistently.
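The two core ideas above (slicing arrays in their native format rather than converting to NumPy, and applying one shared permutation so inputs and labels stay aligned) can be sketched with NumPy and SciPy alone. This is a minimal illustration of the technique, not the actual Keras `ArrayDataAdapter` internals:

```python
import numpy as np
import scipy.sparse

# Toy data: sparse inputs and dense labels. Row-slicing a scipy.sparse
# matrix returns another sparse matrix, so no NumPy conversion happens.
x = scipy.sparse.random(10, 4, density=0.5, format="csr", random_state=0)
y = np.arange(10)

batch_size = 4
rng = np.random.default_rng(0)

# One shared permutation keeps inputs and labels consistent across
# batch-level shuffling (the inconsistency bug fixed in this PR).
perm = rng.permutation(len(y))

for start in range(0, len(y), batch_size):
    idx = perm[start:start + batch_size]
    x_batch = x[idx]   # still a scipy.sparse matrix
    y_batch = y[idx]   # labels picked with the same indices
    assert scipy.sparse.issparse(x_batch)
    assert x_batch.shape[0] == y_batch.shape[0]
```

Because the same `idx` indexes every array, the i-th row of `x_batch` always corresponds to the i-th entry of `y_batch`, regardless of the shuffle.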
Codecov Report

```
@@            Coverage Diff             @@
##           master   #19298      +/-   ##
==========================================
- Coverage   80.14%   75.70%    -4.44%
==========================================
  Files         341      366       +25
  Lines       36163    40069     +3906
  Branches     7116     7769      +653
==========================================
+ Hits        28982    30334     +1352
- Misses       5578     8052     +2474
- Partials     1603     1683       +80
```
Great work! Are you seeing any performance impact?
Yes, here is the summary of the benchmark. I used a trivial model.
LGTM, thank you!