avro reader integration tests #7156

cwharris · 2021-01-15T19:19:02Z

Added some avro reader integration tests for fastavro. These cover type detection, single-value parsing, and null value parsing, but do not cover parsing multiple values.

cwharris · 2021-01-27T16:14:03Z

rerun tests

cwharris · 2021-01-27T16:34:42Z

rerun tests

codecov · 2021-01-27T20:55:16Z

Codecov Report

❗ No coverage uploaded for pull request base (branch-0.19@fc40c52).
The diff coverage is n/a.

@@              Coverage Diff               @@
##             branch-0.19    #7156   +/-   ##
==============================================
  Coverage               ?   82.22%           
==============================================
  Files                  ?      100           
  Lines                  ?    16969           
  Branches               ?        0           
==============================================
  Hits                   ?    13953           
  Misses                 ?     3016           
  Partials               ?        0

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update fc40c52...af6a966.

vuule

Good stuff. Got some questions/suggestions.

python/cudf/cudf/tests/test_avro.py

python/cudf/cudf/tests/test_avro_reader_fastavro_integration.py

vuule · 2021-01-28T23:41:29Z

python/cudf/cudf/tests/test_avro_reader_fastavro_integration.py

+    records = [
+        {"prop": avro_val},
+        {"prop": None},
+    ]


Is the dataframe shape (1, 2)?

Expected and actual are the same shape. I don't know what shape that should be.

Should we also have some tests with a large number of rows?

We can test a large number of values. It would be nice to have a test data generator. I see we're generating random values for fuzz testing. Are we able to do that in a deterministic manner so it can also be used for unit tests?

IIRC the data generator optionally takes a seed value, so the output is deterministic for each seed. CC @galipremsagar for a pointer to the generator and a sample use.
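The seed-based determinism described in this comment can be sketched with the standard library alone. This is an illustration of the idea, not the cudf dataset generator: `make_nullable_ints` and its parameters are hypothetical names chosen to echo the `null_frequency` and `seed` options discussed in this thread.

```python
import random


def make_nullable_ints(rows, null_frequency, seed):
    """Hypothetical helper (not part of cudf): build a list of int64-range
    values in which roughly `null_frequency` of the entries are None."""
    # A private Random instance keeps the global random state untouched,
    # so other tests cannot change what this generator produces.
    rng = random.Random(seed)
    values = []
    for _ in range(rows):
        if rng.random() < null_frequency:
            values.append(None)
        else:
            values.append(rng.randint(-2**63, 2**63 - 1))
    return values


# The same seed yields the same data on every call, which is what makes
# the generator usable in unit tests as well as in fuzz tests.
first = make_nullable_ints(rows=10, null_frequency=0.4, seed=2)
second = make_nullable_ints(rows=10, null_frequency=0.4, seed=2)
assert first == second
```

Because the expected values are regenerated from the seed rather than hard-coded, a test built this way would stay small while still covering a mix of null and non-null rows.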

Since we are discussing large row counts, I'd recommend staying under 30 rows so the pytests don't slow down much, since that would also slow down GPU CI. If there is a bug that only reproduces with a large column, we can widen the test coverage for large columns; otherwise I think the fuzz tests should take care of large-row testing. Here is how we can use the dataset generator:

>>> import cudf
>>> from cudf.tests.dataset_generator import rand_dataframe
>>> rand_dataframe(dtypes_meta=[{"dtype": "int64", "null_frequency": 0.4, "cardinality": 10}], 100, seed=2)
  File "<stdin>", line 1
SyntaxError: positional argument follows keyword argument
>>> rand_dataframe(dtypes_meta=[{"dtype": "int64", "null_frequency": 0.4, "cardinality": 10}], rows=100, seed=2)
pyarrow.Table
0: int64
>>> cudf.DataFrame.from_arrow(rand_dataframe(dtypes_meta=[{"dtype": "int64", "null_frequency": 0.4, "cardinality": 10}], rows=100, seed=2))
                       0
0   -1468954783236838137
1                   <NA>
2    2200161065918338095
3   -1193091257902529461
4   -5448271019629827509
..                   ...
95                  <NA>
96   2200161065918338095
97  -8745117541724490168
98                  <NA>
99  -4301277553722975852

[100 rows x 1 columns]

Alternatively, there is an existing API that also returns deterministic data for the same seed value and is widely used across our pytests:
https://github.com/rapidsai/cudf/blob/branch-0.18/python/cudf/cudf/datasets.py#L60
This is much simpler to use and fits the use-case here.

Should we rather just change this test to use a list of values (cudf_val of length 5 or 10) instead of a single value?
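One way the list-of-values suggestion could look is sketched below, using plain Python data structures only. The helper `build_nullable_case`, the record name `test`, and the `["null", <type>]` union layout are assumptions made for illustration; they are not taken from the PR's test file.

```python
def build_nullable_case(avro_type, values):
    """Hypothetical helper: return an Avro-style schema dict and a list of
    records for a single nullable field named 'prop'."""
    schema = {
        "type": "record",
        "name": "test",  # assumed record name, for illustration only
        # A union with "null" is how Avro schemas express a nullable field.
        "fields": [{"name": "prop", "type": ["null", avro_type]}],
    }
    # Several real values followed by one null row, instead of the original
    # single value plus a null.
    records = [{"prop": value} for value in values] + [{"prop": None}]
    return schema, records


schema, records = build_nullable_case("long", [1, 2, 3, 4, 5])
print(len(records))  # five values plus one trailing null record
```

The resulting schema and records could then be handed to an Avro writer such as `fastavro.writer` and read back for comparison, with the list length kept small in line with the row-count advice given earlier in this thread.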

cwharris · 2021-02-02T19:54:08Z

Looks like the PR is failing due to mypy style checks unrelated to these changes. Can we ignore that?

galipremsagar · 2021-02-02T20:21:28Z

> Looks like the PR is failing due to mypy style checks unrelated to these changes. Can we ignore that?

Fix incoming: #7279

vuule

Requesting changes based on the two unresolved comments.

kkraus14 · 2021-02-09T15:41:40Z

rerun tests

python/cudf/cudf/tests/test_avro_reader_fastavro_integration.py

vuule · 2021-02-10T18:57:02Z

@cwharris should this PR close #6802?

kkraus14 · 2021-02-10T19:15:31Z

Retargeted to branch-0.19.

galipremsagar

Maybe better to add some PR description?

vuule · 2021-02-11T06:28:51Z

@gpucibot merge

cwharris added 4 commits January 11, 2021 17:26

test avro nested dtype detection

0e7547b

realize tests are integration, not unit, rename file accordingly.

f412ab7

Merge branch 'branch-0.18' of github.com:rapidsai/cudf into avro-tests

2e47499

Merge branch 'branch-0.18' of github.com:rapidsai/cudf into avro-tests

8f1f842

cwharris marked this pull request as ready for review January 27, 2021 05:37

cwharris requested a review from a team as a code owner January 27, 2021 05:37

cwharris requested review from isVoid and brandon-b-miller January 27, 2021 05:37

cwharris added the 4 - Needs cuIO Reviewer, cuIO, Python, and non-breaking labels Jan 27, 2021

fix styles

a63f0a5

cwharris requested review from vuule and removed request for brandon-b-miller January 28, 2021 22:24

vuule added the improvement label Jan 28, 2021

vuule requested changes Jan 28, 2021

vuule added the 0 - Waiting on Author label Feb 2, 2021

cwharris requested a review from vuule February 2, 2021 19:23

cwharris mentioned this pull request Feb 2, 2021

cuio: reduce/improve kernel parms: avro #6399

Closed

vuule removed the 0 - Waiting on Author label Feb 2, 2021

vuule requested changes Feb 3, 2021

address pr comments

c7d18b5

cwharris force-pushed the avro-tests branch from 14432d8 to c7d18b5 Compare February 6, 2021 01:58

harrism requested a review from vuule February 9, 2021 22:52

vuule reviewed Feb 9, 2021

python/cudf/cudf/tests/test_avro_reader_fastavro_integration.py (outdated)

vuule added the 0 - Waiting on Author label Feb 10, 2021

add no-data, no-fields, and no-schema test cases.

af6a966

cwharris removed the 0 - Waiting on Author label Feb 10, 2021

cwharris requested a review from vuule February 10, 2021 16:55

vuule approved these changes Feb 10, 2021

vuule removed the 4 - Needs cuIO Reviewer label Feb 10, 2021

kkraus14 changed the base branch from branch-0.18 to branch-0.19 February 10, 2021 19:15

galipremsagar reviewed Feb 10, 2021

cwharris requested a review from galipremsagar February 10, 2021 21:34

galipremsagar approved these changes Feb 11, 2021

galipremsagar added the 5 - Ready to Merge label and removed the 4 - Needs cuDF (Python) Reviewer label Feb 11, 2021

rapids-bot bot merged commit aa72df7 into rapidsai:branch-0.19 Feb 11, 2021

Review comment timeline: vuule (Jan 28, 2021), cwharris (Feb 2, 2021), vuule (Feb 2, 2021), cwharris (Feb 3, 2021), vuule (Feb 3, 2021), galipremsagar (Feb 4, 2021, edited), galipremsagar (Feb 10, 2021).
