[PERF] Harden GCP Retries #3253

samster25 · 2024-11-09T00:14:50Z

Introduces retries with exponential backoff for GCS (default 5)
Introduces connection and read timeouts (default 30 seconds)
Introduces maximum connections for GCS (default 8/thread or 64)
Introduces idle connection cleanup (max of 70 idle connections per host)
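
For readers unfamiliar with the retry behavior listed above, here is a minimal, std-only sketch of retries with exponential backoff. It is not the code from this PR: `backoff_delay`, `retry_with_backoff`, and the doubling schedule with a cap are illustrative assumptions, and the real change presumably configures retries on the GCS HTTP client instead of sleeping on a thread.

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): `base * 2^attempt`, capped at `max`.
/// Hypothetical helper; the schedule used in the PR may differ.
fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    // Saturate instead of overflowing for very large attempt numbers.
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    base.checked_mul(factor).unwrap_or(max).min(max)
}

/// Calls `op` until it succeeds or `max_retries` retries have been used up.
/// `op` receives the 0-based attempt number.
fn retry_with_backoff<T, E>(
    max_retries: u32,
    base: Duration,
    max: Duration,
    mut op: impl FnMut(u32) -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            // Out of retries: surface the last error to the caller.
            Err(err) if attempt >= max_retries => return Err(err),
            Err(_) => {
                std::thread::sleep(backoff_delay(attempt, base, max));
                attempt += 1;
            }
        }
    }
}
```

With `max_retries = 5` the operation runs at most six times (one initial attempt plus five retries). A production version would usually add jitter and retry only on errors known to be transient.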

codspeed-hq · 2024-11-09T00:23:20Z

CodSpeed Performance Report

Merging #3253 will improve performance by 28.43%

Comparing sammy/gcp-retry (0cf8028) with main (e27e2f5)

Summary

⚡ 1 improvements
✅ 16 untouched benchmarks

Benchmarks breakdown

	Benchmark	`main`	`sammy/gcp-retry`	Change
⚡	`test_show[100 Small Files]`	40.7 ms	31.7 ms	+28.43%

codecov · 2024-11-09T00:48:21Z

Codecov Report

Attention: Patch coverage is 8.64198% with 148 lines in your changes missing coverage. Please review.

Project coverage is 77.66%. Comparing base (e27e2f5) to head (0cf8028).
Report is 1 commit behind head on main.

Files with missing lines	Patch %	Lines
src/daft-io/src/google_cloud.rs	0.00%	84 Missing ⚠️
src/common/io-config/src/python.rs	0.00%	39 Missing ⚠️
src/daft-sql/src/modules/config.rs	0.00%	14 Missing ⚠️
src/common/io-config/src/gcs.rs	56.00%	11 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3253      +/-   ##
==========================================
- Coverage   77.80%   77.66%   -0.14%     
==========================================
  Files         645      645              
  Lines       79917    80056     +139     
==========================================
  Hits        62177    62177              
- Misses      17740    17879     +139

Files with missing lines	Coverage Δ
src/common/io-config/src/gcs.rs	`42.42% <56.00%> (-13.14%)`	⬇️
src/daft-sql/src/modules/config.rs	`1.83% <0.00%> (-0.07%)`	⬇️
src/common/io-config/src/python.rs	`50.48% <0.00%> (-3.38%)`	⬇️
src/daft-io/src/google_cloud.rs	`0.00% <0.00%> (ø)`

... and 3 files with indirect coverage changes

jaychia

LGTM

jaychia · 2024-11-09T01:07:47Z

src/daft-io/src/google_cloud.rs

-struct GCSClientWrapper(Client);
+struct GCSClientWrapper {
+    client: Client,
+    connection_pool_sema: Arc<Semaphore>,


So IIUC this semaphore is:

Acquired when we initiate a connection to GCS

Released when the stream for that connection is exhausted?

For quick operations such as heads and stuff I guess we release it right after the result is obtained, hence the _permit pattern.

Could we add some short docstring here too describing that?
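
The permit pattern discussed in this comment can be sketched roughly as follows. This is a std-only illustration, not the PR's code: the PR stores an `Arc<Semaphore>` in `GCSClientWrapper` (presumably an async semaphore), whereas the blocking `Semaphore`, `Permit`, `head`, `get`, and `BodyStream` below are hypothetical stand-ins. In this sketch a short operation drops its permit when the function returns, while a streaming read moves the permit into the stream so it is released when the stream value is dropped.

```rust
use std::sync::{Arc, Condvar, Mutex};

/// Minimal blocking counting semaphore (stand-in for the semaphore in the PR).
struct Semaphore {
    permits: Mutex<usize>,
    available: Condvar,
}

/// RAII guard: the permit goes back to the pool when this value is dropped.
struct Permit {
    sema: Arc<Semaphore>,
}

impl Semaphore {
    fn new(permits: usize) -> Arc<Self> {
        Arc::new(Self { permits: Mutex::new(permits), available: Condvar::new() })
    }

    fn available_permits(&self) -> usize {
        *self.permits.lock().unwrap()
    }
}

/// Blocks until a permit is free, then hands it out as a guard.
fn acquire(sema: &Arc<Semaphore>) -> Permit {
    let mut count = sema.permits.lock().unwrap();
    while *count == 0 {
        count = sema.available.wait(count).unwrap();
    }
    *count -= 1;
    Permit { sema: Arc::clone(sema) }
}

impl Drop for Permit {
    fn drop(&mut self) {
        *self.sema.permits.lock().unwrap() += 1;
        self.sema.available.notify_one();
    }
}

/// Short operation: `_permit` is held only for the duration of the call.
/// Returns the number of free permits observed while the permit is held.
fn head(sema: &Arc<Semaphore>) -> usize {
    let _permit = acquire(sema);
    sema.available_permits()
}

/// Streaming read: the permit is stored in the stream and lives as long as it does.
struct BodyStream {
    chunks: std::vec::IntoIter<Vec<u8>>,
    _permit: Permit,
}

impl Iterator for BodyStream {
    type Item = Vec<u8>;
    fn next(&mut self) -> Option<Vec<u8>> {
        self.chunks.next()
    }
}

fn get(sema: &Arc<Semaphore>, chunks: Vec<Vec<u8>>) -> BodyStream {
    let permit = acquire(sema);
    BodyStream { chunks: chunks.into_iter(), _permit: permit }
}
```

Binding the guard to `_permit` rather than `_` matters: a bare `_` pattern would drop the guard immediately, releasing the permit before the operation runs.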

jaychia · 2024-11-09T01:09:56Z

src/daft-sql/src/modules/config.rs

            Ok(IOConfig {
                gcs: GCSConfig {
                    project_id,
                    credentials: credentials.map(|s| s.into()),
                    token,
                    anonymous: anonymous.unwrap_or(default.anonymous),
+                    max_connections_per_io_thread: max_connections_per_io_thread


Suggested change

-max_connections_per_io_thread: max_connections_per_io_thread
+max_connections_per_io_thread

Weird that lint didn't catch this

The next line has an unwrap.
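
A small sketch of why the shorthand suggestion does not apply here, assuming the assignment continues onto the next line with an `unwrap_or` call, as the reply indicates. `GcsSettings` and the three functions are hypothetical and only mirror the field names in the diff. Field-init shorthand (and Clippy's `redundant_field_names` lint) only concerns the case where the value is exactly a variable with the same name as the field; once the value is a method-call expression, the `field: expr` form is required.

```rust
#[derive(Debug, PartialEq)]
struct GcsSettings {
    anonymous: bool,
    max_connections_per_io_thread: u32,
}

/// Redundant form: each value is just a variable named like the field.
/// This is the shape that `clippy::redundant_field_names` flags.
#[allow(clippy::redundant_field_names)]
fn build_redundant(anonymous: bool, max_connections_per_io_thread: u32) -> GcsSettings {
    GcsSettings {
        anonymous: anonymous,
        max_connections_per_io_thread: max_connections_per_io_thread,
    }
}

/// Shorthand form: equivalent to `build_redundant`.
fn build_shorthand(anonymous: bool, max_connections_per_io_thread: u32) -> GcsSettings {
    GcsSettings { anonymous, max_connections_per_io_thread }
}

/// Not redundant: the value is a method-call expression split across lines,
/// so the shorthand cannot be used and the lint has nothing to report.
fn build_with_defaults(
    anonymous: Option<bool>,
    max_connections_per_io_thread: Option<u32>,
    default: &GcsSettings,
) -> GcsSettings {
    GcsSettings {
        anonymous: anonymous.unwrap_or(default.anonymous),
        max_connections_per_io_thread: max_connections_per_io_thread
            .unwrap_or(default.max_connections_per_io_thread),
    }
}
```

Viewed one diff line at a time, the third form looks identical to the redundant one, which is likely what prompted the suggestion.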

jaychia · 2024-11-09T01:23:48Z

src/daft-io/src/google_cloud.rs

+                .connect_timeout(Duration::from_millis(config.connect_timeout_ms))
+                .read_timeout(Duration::from_millis(config.read_timeout_ms))
+                .pool_idle_timeout(Duration::from_secs(60))
+                .pool_max_idle_per_host(70)


Why 70? How many connections does it create anyway?

This is the limit on idle connections kept around for connection reuse. It is the default for many AWS SDKs.

samster25 added 2 commits November 8, 2024 15:45

threaded in args except sema

38c012d

thread in idle connection, and permits

ddfa7fe

github-actions bot added the performance label Nov 9, 2024

samster25 added 3 commits November 8, 2024 16:16

update GCS pyi

b3a1608

update to be correct

ae896c7

handle error

d2299c5

clean up code block

50b8983

samster25 marked this pull request as ready for review November 9, 2024 00:44

samster25 requested a review from jaychia November 9, 2024 00:44

samster25 added 2 commits November 8, 2024 16:55

add tests for timeout for read and connect

7280c6c

fix read timeout test

f514574

jaychia approved these changes Nov 9, 2024

add doc string

0cf8028

samster25 enabled auto-merge (squash) November 9, 2024 02:04

samster25 merged commit 84e34d0 into main Nov 9, 2024
42 checks passed

samster25 deleted the sammy/gcp-retry branch November 9, 2024 02:21
