[PERF] Harden GCP Retries #3253

Merged
merged 9 commits into from
Nov 9, 2024

samster25
Member

@samster25 samster25 commented Nov 9, 2024

  • Introduces retries with exponential backoff for GCS (default: 5)
  • Introduces connection and read timeouts (default: 30 seconds)
  • Introduces a maximum number of connections for GCS (default: 8 per I/O thread, or 64)
  • Introduces idle connection cleanup (idle connections are closed after 60 seconds; at most 70 are kept per host)
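A minimal sketch of retries with exponential backoff, for readers unfamiliar with the pattern. It assumes "default 5" means five retries after the initial attempt; `RetryPolicy`, `with_retries`, and the delay values are hypothetical illustrations, not the code in this PR.

```rust
use std::thread;
use std::time::Duration;

/// Hypothetical retry policy. `max_retries: 5` mirrors the PR's default;
/// the delay values are made up for illustration.
#[derive(Clone, Copy, Debug)]
struct RetryPolicy {
    max_retries: u32,
    base_delay: Duration,
    max_delay: Duration,
}

impl RetryPolicy {
    /// Delay before retry `attempt` (0-based): base * 2^attempt, capped at `max_delay`.
    fn backoff(&self, attempt: u32) -> Duration {
        let factor = 2u32.saturating_pow(attempt);
        self.base_delay.saturating_mul(factor).min(self.max_delay)
    }
}

/// Calls `op` until it succeeds, fails with a non-retryable error,
/// or has been retried `max_retries` times (so at most max_retries + 1 calls).
fn with_retries<T, E>(
    policy: RetryPolicy,
    mut op: impl FnMut() -> Result<T, E>,
    is_retryable: impl Fn(&E) -> bool,
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(err) if attempt < policy.max_retries && is_retryable(&err) => {
                // Transient failure: wait, then try again with a longer delay next time.
                thread::sleep(policy.backoff(attempt));
                attempt += 1;
            }
            Err(err) => return Err(err),
        }
    }
}

fn main() {
    let policy = RetryPolicy {
        max_retries: 5,
        base_delay: Duration::from_millis(1),
        max_delay: Duration::from_millis(8),
    };
    let delays: Vec<u128> = (0..6).map(|a| policy.backoff(a).as_millis()).collect();
    println!("{:?}", delays); // [1, 2, 4, 8, 8, 8]

    // Simulate a request that fails twice with a retryable error, then succeeds.
    let mut calls = 0;
    let result: Result<&str, &str> = with_retries(
        policy,
        || {
            calls += 1;
            if calls < 3 { Err("503 Service Unavailable") } else { Ok("object bytes") }
        },
        |_| true,
    );
    println!("{:?} after {} calls", result, calls); // Ok("object bytes") after 3 calls
}
```

Production clients usually also add random jitter to each delay so that many workers do not retry in lockstep; it is left out here to keep the delays deterministic.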

codspeed-hq bot commented Nov 9, 2024

CodSpeed Performance Report

Merging #3253 will improve performance by 28.43%

Comparing sammy/gcp-retry (0cf8028) with main (e27e2f5)

Summary

⚡ 1 improvement
✅ 16 untouched benchmarks

Benchmarks breakdown

Benchmark                    main      sammy/gcp-retry   Change
test_show[100 Small Files]   40.7 ms   31.7 ms           +28.43%

@samster25 samster25 marked this pull request as ready for review November 9, 2024 00:44
@samster25 samster25 requested a review from jaychia November 9, 2024 00:44
codecov bot commented Nov 9, 2024

Codecov Report

Attention: Patch coverage is 8.64198% with 148 lines in your changes missing coverage. Please review.

Project coverage is 77.66%. Comparing base (e27e2f5) to head (0cf8028).
Report is 1 commit behind head on main.

Files with missing lines               Patch %   Lines
src/daft-io/src/google_cloud.rs        0.00%     84 Missing ⚠️
src/common/io-config/src/python.rs     0.00%     39 Missing ⚠️
src/daft-sql/src/modules/config.rs     0.00%     14 Missing ⚠️
src/common/io-config/src/gcs.rs        56.00%    11 Missing ⚠️
Additional details and impacted files


@@            Coverage Diff             @@
##             main    #3253      +/-   ##
==========================================
- Coverage   77.80%   77.66%   -0.14%     
==========================================
  Files         645      645              
  Lines       79917    80056     +139     
==========================================
  Hits        62177    62177              
- Misses      17740    17879     +139     
Files with missing lines               Coverage Δ
src/common/io-config/src/gcs.rs        42.42% <56.00%> (-13.14%) ⬇️
src/daft-sql/src/modules/config.rs     1.83% <0.00%> (-0.07%) ⬇️
src/common/io-config/src/python.rs     50.48% <0.00%> (-3.38%) ⬇️
src/daft-io/src/google_cloud.rs        0.00% <0.00%> (ø)

... and 3 files with indirect coverage changes

Contributor

@jaychia jaychia left a comment

LGTM

- struct GCSClientWrapper(Client);
+ struct GCSClientWrapper {
+     client: Client,
+     connection_pool_sema: Arc<Semaphore>,
Contributor

So IIUC this semaphore is:

  1. Acquired when we initiate a connection to GCS
  2. Released when the stream for that connection is exhausted?

For quick operations such as HEAD requests, I guess we release it right after the result is obtained, hence the _permit pattern.

Could we add a short docstring here describing that too?
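The permit lifecycle described above can be sketched with an RAII guard. This is a std-only stand-in: the PR's `Arc<Semaphore>` is most likely `tokio::sync::Semaphore`, whose `acquire_owned` returns an owned permit that is released when dropped. The blocking `Semaphore`, `PermittedStream`, `get`, and `head` below are hypothetical. Note that the permit is released when the stream is dropped, which for a fully consumed stream usually happens right after it is exhausted.

```rust
use std::sync::{Arc, Condvar, Mutex};

/// Blocking counting semaphore: a std-only stand-in for tokio's async Semaphore.
struct Semaphore {
    permits: Mutex<usize>,
    freed: Condvar,
}

/// RAII permit: the slot goes back to the semaphore when this value is dropped.
struct Permit {
    sema: Arc<Semaphore>,
}

impl Semaphore {
    fn new(permits: usize) -> Arc<Self> {
        Arc::new(Semaphore { permits: Mutex::new(permits), freed: Condvar::new() })
    }

    /// Waits until a permit is free and takes it (named after tokio's `acquire_owned`).
    fn acquire_owned(self: Arc<Self>) -> Permit {
        let mut permits = self.permits.lock().unwrap();
        while *permits == 0 {
            permits = self.freed.wait(permits).unwrap();
        }
        *permits -= 1;
        drop(permits); // release the lock before moving `self` into the permit
        Permit { sema: self }
    }

    fn available(&self) -> usize {
        *self.permits.lock().unwrap()
    }
}

impl Drop for Permit {
    fn drop(&mut self) {
        *self.sema.permits.lock().unwrap() += 1;
        self.sema.freed.notify_one();
    }
}

/// Hypothetical response body that keeps its connection permit alive.
struct PermittedStream<I> {
    inner: I,
    _permit: Permit, // released when the stream is dropped
}

impl<I: Iterator> Iterator for PermittedStream<I> {
    type Item = I::Item;
    fn next(&mut self) -> Option<I::Item> {
        self.inner.next()
    }
}

/// Hypothetical GET: the permit travels with the returned body stream.
fn get(sema: &Arc<Semaphore>, chunks: Vec<Vec<u8>>) -> PermittedStream<std::vec::IntoIter<Vec<u8>>> {
    let permit = Arc::clone(sema).acquire_owned();
    PermittedStream { inner: chunks.into_iter(), _permit: permit }
}

/// Hypothetical HEAD: `_permit` holds the slot only until this function returns.
/// (`let _ = ...` would instead drop the permit immediately.)
fn head(sema: &Arc<Semaphore>, size: u64) -> u64 {
    let _permit = Arc::clone(sema).acquire_owned();
    size
}

fn main() {
    let sema = Semaphore::new(2);
    let stream = get(&sema, vec![b"ab".to_vec(), b"cd".to_vec()]);
    println!("while streaming: {} free", sema.available()); // 1
    println!("head -> {}, then {} free", head(&sema, 42), sema.available()); // 42, 1
    let bytes: usize = stream.map(|chunk| chunk.len()).sum();
    println!("read {} bytes, then {} free", bytes, sema.available()); // 4, 2
}
```

The `_permit` binding is what keeps the slot held for the duration of a short call; binding to a plain `_` would drop the permit on the same line.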

Ok(IOConfig {
    gcs: GCSConfig {
        project_id,
        credentials: credentials.map(|s| s.into()),
        token,
        anonymous: anonymous.unwrap_or(default.anonymous),
        max_connections_per_io_thread: max_connections_per_io_thread
Contributor

Suggested change
- max_connections_per_io_thread: max_connections_per_io_thread
+ max_connections_per_io_thread

Weird that the lint didn't catch this.

Member Author

The expression continues on the next line with an unwrap, so the shorthand doesn't apply.
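The suggestion and the reply hinge on Rust's field-init shorthand: `field,` is only sugar for `field: field`, so it cannot be used when the value is transformed before being stored. Clippy's `redundant_field_names` lint likewise fires only on the exact `field: field` form, which would explain why it stayed quiet. A minimal illustration, assuming the continuation is an `unwrap_or` like the `anonymous` field above; `ExampleGcsConfig`, `build`, and the default `anonymous: false` are hypothetical, and only the field names and the default of 8 come from the PR.

```rust
/// Hypothetical stand-in for the PR's GCSConfig.
#[derive(Debug, PartialEq)]
struct ExampleGcsConfig {
    anonymous: bool,
    max_connections_per_io_thread: u32,
}

fn build(anonymous: Option<bool>, max_connections_per_io_thread: Option<u32>) -> ExampleGcsConfig {
    let default = ExampleGcsConfig { anonymous: false, max_connections_per_io_thread: 8 };
    ExampleGcsConfig {
        anonymous: anonymous.unwrap_or(default.anonymous),
        // The shorthand `max_connections_per_io_thread,` would store the raw
        // Option<u32> parameter, which does not type-check here. Because the
        // value is unwrapped first, the explicit `field: expr` form is required.
        max_connections_per_io_thread: max_connections_per_io_thread
            .unwrap_or(default.max_connections_per_io_thread),
    }
}

fn main() {
    println!("{:?}", build(None, None));
    println!("{:?}", build(Some(true), Some(16)));
}
```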

.connect_timeout(Duration::from_millis(config.connect_timeout_ms))
.read_timeout(Duration::from_millis(config.read_timeout_ms))
.pool_idle_timeout(Duration::from_secs(60))
.pool_max_idle_per_host(70)
Contributor

Why 70? How many connections does it create anyway?

Member Author

This caps the number of idle connections kept around for reuse; 70 is the default in many AWS SDKs.
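To make the two pool settings concrete, assuming these are reqwest `ClientBuilder` options: `pool_max_idle_per_host(70)` bounds how many finished connections are kept open per host for reuse, and `pool_idle_timeout(Duration::from_secs(60))` closes ones that sit unused for longer than that. Neither caps how many connections are open at once; in this PR that role appears to fall to the connection semaphore. The `IdlePool` type below is a simplified conceptual model, not hyper's or reqwest's actual implementation, and connections are represented by plain IDs.

```rust
use std::collections::{HashMap, VecDeque};
use std::time::{Duration, Instant};

/// Conceptual model of an HTTP client's idle-connection pool (hypothetical).
struct IdlePool {
    max_idle_per_host: usize,
    idle_timeout: Duration,
    // Per host: idle connection IDs with the time they became idle, oldest first.
    idle: HashMap<String, VecDeque<(u64, Instant)>>,
}

impl IdlePool {
    fn new(max_idle_per_host: usize, idle_timeout: Duration) -> Self {
        IdlePool { max_idle_per_host, idle_timeout, idle: HashMap::new() }
    }

    /// Returns a finished connection to the pool. If the host already has
    /// `max_idle_per_host` idle connections, this one is closed instead (returns false).
    fn put(&mut self, host: &str, conn: u64, now: Instant) -> bool {
        let cap = self.max_idle_per_host;
        let queue = self.idle.entry(host.to_string()).or_default();
        if queue.len() >= cap {
            return false;
        }
        queue.push_back((conn, now));
        true
    }

    /// Reuses the most recently idled connection, first closing any that have
    /// been idle longer than `idle_timeout` (assumes `now` never goes backwards).
    fn take(&mut self, host: &str, now: Instant) -> Option<u64> {
        let timeout = self.idle_timeout;
        let queue = self.idle.get_mut(host)?;
        while queue.front().map_or(false, |&(_, since)| now.duration_since(since) > timeout) {
            queue.pop_front(); // expired: close it
        }
        queue.pop_back().map(|(conn, _)| conn)
    }
}

fn main() {
    // Values from the client builder above: 70 idle per host, 60 s idle timeout.
    let mut pool = IdlePool::new(70, Duration::from_secs(60));
    let t0 = Instant::now();
    let host = "storage.googleapis.com"; // example host
    for conn in 0..75 {
        pool.put(host, conn, t0); // connections 70..75 exceed the cap and are closed
    }
    println!("reused: {:?}", pool.take(host, t0 + Duration::from_secs(5))); // Some(69)
    println!("after idle timeout: {:?}", pool.take(host, t0 + Duration::from_secs(61))); // None
}
```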

@samster25 samster25 enabled auto-merge (squash) November 9, 2024 02:04
@samster25 samster25 merged commit 84e34d0 into main Nov 9, 2024
42 checks passed
@samster25 samster25 deleted the sammy/gcp-retry branch November 9, 2024 02:21
2 participants