Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(udf): udf call with empty table and batch size #3604

Merged
merged 2 commits into from
Dec 19, 2024

Conversation

kevinzwang
Copy link
Member

No description provided.

@kevinzwang kevinzwang requested a review from jaychia December 18, 2024 23:21
@github-actions github-actions bot added the fix label Dec 18, 2024
daft/udf.py Outdated
# Don't call UDF with empty tables
# TODO: fix assumption that all input lengths are the same
if len(evaluated_expressions[0]) == 0:
return Series.empty(return_dtype, name)._series
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

technically it's a little wrong to do this, since a UDF might want to have some special logic for when the incoming data is empty... For example aggregation UDFs.

Any thoughts on how this might affect usage of UDFs in map_groups?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh you're right, although I don't think it affects map_groups since each group would have at least one element, otherwise it would not be included. However this is probably wrong behavior if we ever build some sort of global aggregation UDF.

I will handle the logic in another manner then.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jaychia went with a new solution, PTAL!

Copy link

codspeed-hq bot commented Dec 18, 2024

CodSpeed Performance Report

Merging #3604 will degrade performances by 14.42%

Comparing kevin/empty-udf-call (3c50b57) with main (ca4d3f7)

Summary

❌ 1 regressions
✅ 26 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Benchmark main kevin/empty-udf-call Change
test_iter_rows_first_row[100 Small Files] 169.1 ms 197.6 ms -14.42%

@kevinzwang kevinzwang requested a review from jaychia December 18, 2024 23:44
Copy link

codecov bot commented Dec 19, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 77.12%. Comparing base (ca4d3f7) to head (3c50b57).
Report is 3 commits behind head on main.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #3604      +/-   ##
==========================================
- Coverage   77.80%   77.12%   -0.68%     
==========================================
  Files         718      718              
  Lines       88176    89686    +1510     
==========================================
+ Hits        68607    69174     +567     
- Misses      19569    20512     +943     
Files with missing lines Coverage Δ
daft/udf.py 95.58% <100.00%> (ø)

... and 13 files with indirect coverage changes

@kevinzwang kevinzwang merged commit c30f6a8 into main Dec 19, 2024
42 of 43 checks passed
@kevinzwang kevinzwang deleted the kevin/empty-udf-call branch December 19, 2024 02:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants