add option to nullify empty lines #17028
@@ -18,6 +18,7 @@

#include <cudf/io/datasource.hpp>
#include <cudf/io/json.hpp>
#include <cudf/strings/strings_column_view.hpp>
#include <cudf/types.hpp>
#include <cudf/utilities/export.hpp>
#include <cudf/utilities/memory_resource.hpp>

@@ -73,5 +74,9 @@ table_with_metadata read_json(host_span<std::unique_ptr<datasource>> sources,
                              rmm::cuda_stream_view stream,
                              rmm::device_async_resource_ref mr);

std::tuple<rmm::device_buffer, char> preprocess(cudf::strings_column_view const& input,
I see this function is called only in tests. Do we need it anywhere else in the source code? If not, can we generate the test string directly without it?
Yes, you can test without this function. The idea is that each string row is followed by one delimiter character that does not appear in any of the strings. @shrshi provided this function so that you can easily convert a strings column into an rmm buffer plus its delimiter.
                                                rmm::cuda_stream_view stream,
                                                rmm::device_async_resource_ref mr);

} // namespace io::json::detail
} // namespace CUDF_EXPORT cudf
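The delimiter idea from the review thread on preprocess can be illustrated with a small host-only sketch. This is not the cuDF implementation: `find_unused_delimiter` and `concat_with_delimiter` are hypothetical names, the rows live in a `std::vector<std::string>` rather than a strings column, and the result is a `std::string` instead of an `rmm::device_buffer`. The sketch assumes the delimiter is chosen as any byte that occurs in none of the rows, preferring a newline.

```cpp
#include <array>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (illustration only, not part of cuDF):
// returns a byte that occurs in none of the rows, preferring '\n'.
std::optional<char> find_unused_delimiter(std::vector<std::string> const& rows)
{
  std::array<bool, 256> seen{};
  for (auto const& row : rows) {
    for (unsigned char c : row) { seen[c] = true; }
  }
  if (!seen['\n']) { return '\n'; }
  // Fall back to the first unused non-zero ASCII byte.
  for (int c = 1; c < 128; ++c) {
    if (!seen[c]) { return static_cast<char>(c); }
  }
  return std::nullopt;  // every candidate byte is already in use
}

// Hypothetical helper: appends exactly one delimiter after every row and
// returns the concatenated buffer together with the delimiter that was used.
std::optional<std::pair<std::string, char>> concat_with_delimiter(
  std::vector<std::string> const& rows)
{
  auto const delim = find_unused_delimiter(rows);
  if (!delim) { return std::nullopt; }
  std::string buffer;
  for (auto const& row : rows) {
    buffer += row;
    buffer += *delim;
  }
  return std::make_pair(std::move(buffer), *delim);
}
```

A test that builds its input by hand would need to uphold the same invariant: the delimiter passed to the reader must not occur inside any row.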
Does this imply any performance hit? Please run the benchmarks with this change. If there is any slowdown, we probably need to make this a template argument (at the cost of compile time) so that the code can be optimized out when it is false.
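The template-argument approach suggested here could look roughly like the host-only sketch below. It is an illustration under stated assumptions, not code from this PR: `row_t`, `split_lines_impl`, and `split_lines` are hypothetical names, and the option is assumed to mean "emit a null row for each empty line instead of skipping it". The flag is a compile-time parameter, so the `false` instantiation contains no nullify branch at all, and a thin runtime wrapper picks the instantiation.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical row type: std::nullopt stands for a null row.
using row_t = std::optional<std::string>;

// The option is a template parameter, so the nullify branch is discarded
// at compile time when NullifyEmptyLines is false.
template <bool NullifyEmptyLines>
std::vector<row_t> split_lines_impl(std::string const& input)
{
  std::vector<row_t> rows;
  std::size_t begin = 0;
  while (begin < input.size()) {
    std::size_t end = input.find('\n', begin);
    if (end == std::string::npos) { end = input.size(); }
    if (end > begin) {
      rows.emplace_back(input.substr(begin, end - begin));
    } else if constexpr (NullifyEmptyLines) {
      rows.emplace_back(std::nullopt);  // empty line becomes a null row
    }
    begin = end + 1;
  }
  return rows;
}

// Runtime entry point: dispatches once to the matching instantiation.
std::vector<row_t> split_lines(std::string const& input, bool nullify_empty_lines)
{
  return nullify_empty_lines ? split_lines_impl<true>(input)
                             : split_lines_impl<false>(input);
}
```

The trade-off is the one the comment names: two instantiations are compiled instead of one, in exchange for keeping the per-line check out of the `false` path. Whether that is worth it depends on what the benchmarks show.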