Add in basic support for JSON generation in BigDataGen and improve performance of from_json #10361

revans2 · 2024-02-01T20:42:42Z

This depends on work done in rapidsai/cudf#14954.

In my testing, using from_json to pull one column out of a JSON string that contains only that one column (10 GiB of data) went from 13.8 seconds to 7.0 seconds. Note that these timings also include reading the 10 GiB from a Parquet file, so the speedup of from_json itself is even larger.
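Why the Parquet read hides part of the gain can be worked out directly. As an assumption for illustration only, let $R$ be the time spent reading the Parquet data. $R$ was not measured separately; we assume it is the same in both runs and that $0 < R < 7.0$ s. The end-to-end ratio is $13.8 / 7.0 \approx 1.97$, and removing $R$ from both timings can only make the ratio larger:

$$
\frac{13.8 - R}{7.0 - R} > \frac{13.8}{7.0}
\quad\text{because}\quad
(13.8 - R)\cdot 7.0 - 13.8\,(7.0 - R) = 6.8\,R > 0 .
$$

Both denominators are positive under the stated assumption, so cross-multiplying preserves the inequality. The same argument applies to any fixed cost shared by both runs.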

For longer strings with 512 columns in them, reading only one of the columns (still 10 GiB of data) went from 17.6 seconds to 11.4 seconds.


revans2 · 2024-02-09T15:04:11Z

build

Add in basic support for JSON generation in BigDataGen and improve performance of from_json

Signed-off-by: Robert (Bobby) Evans <[email protected]>

7849a2b

revans2 requested a review from andygrove February 1, 2024 20:42

revans2 mentioned this pull request Feb 1, 2024

Experimental json perf #10362

Closed

andygrove approved these changes Feb 2, 2024


revans2 added 2 commits February 8, 2024 09:35

Merge branch 'branch-24.04' into basic_json_gen

c4f2f30

Merge branch 'branch-24.04' into basic_json_gen

8fe8741

revans2 marked this pull request as ready for review February 9, 2024 15:03

revans2 self-assigned this Feb 9, 2024

revans2 merged commit 355a770 into NVIDIA:branch-24.04 Feb 9, 2024
40 checks passed

sameerz added the performance A performance related task/issue label Feb 10, 2024
