[CHORE] Explain block_on function in common-runtime #3442

Merged
colin-ho merged 3 commits into main from colin/explain-runtime-block-on
Nov 27, 2024

@colin-ho colin-ho commented Nov 27, 2024

Addresses: #3435

@colin-ho
Contributor Author

@andrewgazelka Lmk if it makes sense

Contributor

It makes sense. So this function is kinda jank then? The sync function should probably be async in that situation. Since the trend seems to be that we're moving to asyncland everywhere except the top level, we might want to document that this is kind of a workaround, and maybe at some point we can deprecate it?


codspeed-hq bot commented Nov 27, 2024

CodSpeed Performance Report

Merging #3442 will degrade performances by 29.99%

Comparing colin/explain-runtime-block-on (42e43d6) with main (2db1233)

Summary

❌ 2 regressions
✅ 15 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

| Benchmark | main | colin/explain-runtime-block-on | Change |
|---|---|---|---|
| test_iter_rows_first_row[100 Small Files] | 236.1 ms | 337.2 ms | -29.99% |
| test_show[100 Small Files] | 23.2 ms | 31.3 ms | -25.68% |

@colin-ho
Contributor Author

> It makes sense. So this function is kinda jank then? The sync function should probably be async in that situation. Since the trend seems to be that we're moving to asyncland everywhere except the top level, we might want to document that this is kind of a workaround, and maybe at some point we can deprecate it?

Just for more context, the main issue is calling expression evaluation trait methods which are sync.

```rust
#[typetag::serde(tag = "type")]
pub trait ScalarUDF: Send + Sync + std::fmt::Debug {
    fn as_any(&self) -> &dyn Any;
    fn name(&self) -> &'static str;
    fn evaluate(&self, inputs: &[Series]) -> DaftResult<Series>;
    fn to_field(&self, inputs: &[ExprRef], schema: &Schema) -> DaftResult<Field>;
}
```

The url-download expression implements ScalarUDF, and in the evaluate method it needs to call some async code, i.e. to download from the url. But the evaluate trait method is sync, and it is called from the executor, which is in a tokio runtime context.

So we need a way to run the async url download code, which is why we came up with the block_on method.

Of course, ideally we just want to somehow do a url_download.await from the executor.
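
For readers outside the codebase, the general shape of this workaround can be sketched with the standard library alone. This is not the code in common-runtime: `SyncEvaluate`, `UrlDownload`, and `fake_download` are hypothetical stand-ins for `ScalarUDF`, the url-download expression, and the real download, and the `block_on` below is a tiny illustrative executor rather than a tokio runtime. The sketch assumes the async work is handed to a separate thread, since tokio's `Runtime::block_on` panics when called from inside an async execution context.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Waker that unparks the thread blocked in `block_on`.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Illustrative stand-in for a runtime's `block_on`: polls the future on the
/// calling thread and parks between polls until the waker fires.
fn block_on<F: Future>(future: F) -> F::Output {
    let mut future = pin!(future);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match future.as_mut().poll(&mut cx) {
            Poll::Ready(value) => return value,
            Poll::Pending => thread::park(),
        }
    }
}

/// Hypothetical sync trait, playing the role of `ScalarUDF::evaluate`.
trait SyncEvaluate {
    fn evaluate(&self, inputs: &[String]) -> Vec<usize>;
}

/// Hypothetical async function standing in for the real url download.
async fn fake_download(url: String) -> usize {
    url.len()
}

struct UrlDownload;

impl SyncEvaluate for UrlDownload {
    fn evaluate(&self, inputs: &[String]) -> Vec<usize> {
        let urls = inputs.to_vec();
        // Drive the async work on a separate thread, then block the caller
        // until that thread returns the result.
        thread::spawn(move || {
            block_on(async move {
                let mut sizes = Vec::with_capacity(urls.len());
                for url in urls {
                    sizes.push(fake_download(url).await);
                }
                sizes
            })
        })
        .join()
        .expect("worker thread panicked")
    }
}
```

Calling `UrlDownload.evaluate(&inputs)` from sync code returns only after every `fake_download` future has completed. The caller's thread stays blocked for the whole duration, which is the cost of keeping `evaluate` sync instead of awaiting the download directly.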

@colin-ho colin-ho enabled auto-merge (squash) November 27, 2024 05:47
@colin-ho colin-ho merged commit e89c9f5 into main Nov 27, 2024
41 of 42 checks passed
@colin-ho colin-ho deleted the colin/explain-runtime-block-on branch November 27, 2024 05:57

codecov bot commented Nov 27, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 77.35%. Comparing base (2db1233) to head (42e43d6).
Report is 1 commits behind head on main.

Additional details and impacted files


@@           Coverage Diff           @@
##             main    #3442   +/-   ##
=======================================
  Coverage   77.35%   77.35%           
=======================================
  Files         684      684           
  Lines       83637    83639    +2     
=======================================
+ Hits        64695    64698    +3     
+ Misses      18942    18941    -1     
| Files with missing lines | Coverage Δ |
|---|---|
| src/common/runtime/src/lib.rs | 90.78% <ø> (ø) |

... and 4 files with indirect coverage changes
