Added decorator for retry_request and handle err 429 #1491

a24lorie · 2024-01-07T13:57:48Z

Added test for pyarrow integration using the DBFS implementation

a24lorie · 2024-01-07T14:06:26Z

This PR is a quick fix for handling HTTP error 429 in the DBFS filesystem implementation: it adds a retry decorator to the _send_to_api function. It does not change the base code used to interact with DBFS. I have also added integration tests that use pyarrow to read and write parquet files on DBFS.
The patch resolves the reported issue #1488, but the code still cannot read from a directory structure partitioned with the hive partitioning scheme.
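
For readers unfamiliar with the pattern, here is a minimal sketch of a retry decorator of the kind described above. It is not the code from this PR: the `HTTPStatusError` class, the `retries`, `backoff` and `retry_on` parameters, and the `flaky_send` function are all hypothetical names chosen for illustration.

```python
import functools
import time


class HTTPStatusError(Exception):
    """Hypothetical stand-in for an HTTP error that carries a status code."""

    def __init__(self, status):
        super().__init__(f"HTTP {status}")
        self.status = status


def retry_request(retries=3, backoff=0.0, retry_on=(408, 429)):
    """Retry the wrapped call when it raises a retryable HTTP status."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(retries + 1):
                try:
                    return func(*args, **kwargs)
                except HTTPStatusError as exc:
                    # Give up on non-retryable codes or when attempts run out.
                    if exc.status not in retry_on or attempt == retries:
                        raise
                    # Exponential backoff between attempts (assumed policy).
                    time.sleep(backoff * 2**attempt)

        return wrapper

    return decorator


calls = []


@retry_request(retries=3)
def flaky_send():
    """Fail with 429 twice, then succeed (simulates a rate-limited API)."""
    calls.append(1)
    if len(calls) < 3:
        raise HTTPStatusError(429)
    return "ok"
```

Calling `flaky_send()` would retry through the two simulated 429 responses and return on the third attempt; a status outside `retry_on` would be re-raised immediately.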

Commented out some failing unit tests. Added cassettes for unit testing DBFS methods.

fsspec/implementations/dbfs.py

martindurant · 2024-01-08T14:22:16Z

fsspec/implementations/tests/data/diabetes.csv

@@ -0,0 +1,769 @@
+Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
+6,148,72,35,0,33.6,0.627,50,1


Why include this big file? A small sample in code would be fine for the sake of testing.

I was trying to replicate the error I hit in a real-world scenario by stressing the DBFS API with many concurrent writes, which triggered the 429 error. I could remove the file and keep only a couple of records, but that would not replicate the original issue.

How about

    import concurrent.futures

    fs = ...
    ex = concurrent.futures.ThreadPoolExecutor()
    ex.map(lambda i: fs.pipe(f"/path/{i}", b"data"), range(100))

You could use a mock to test that the retry is actually getting called; OR allow logging during retry (not a bad idea anyway) and check for logging statements coming out.

Actually, we're using VCR, so you'll never get any response except the one recorded the first time. I have no idea, then, how to get back a 429 without mocking.
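
One way the mocking suggestion could look is sketched below. It uses only `unittest.mock` from the standard library; `TooManyRequests` and `call_with_retry` are hypothetical stand-ins for the real error type and retry logic, not names from fsspec.

```python
from unittest import mock


class TooManyRequests(Exception):
    """Hypothetical stand-in for an HTTP 429 error."""


def call_with_retry(send, attempts=3):
    """Call send() until it succeeds or the attempts are exhausted."""
    for attempt in range(attempts):
        try:
            return send()
        except TooManyRequests:
            # Re-raise only after the final attempt has failed.
            if attempt == attempts - 1:
                raise


def test_retry_is_called():
    # The mock raises twice, then returns a payload on the third call.
    send = mock.Mock(
        side_effect=[TooManyRequests(), TooManyRequests(), {"ok": True}]
    )
    assert call_with_retry(send) == {"ok": True}
    assert send.call_count == 3
```

Because the mock produces the 429 itself, this kind of test does not depend on what the recorded cassettes contain.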

fsspec/implementations/tests/test_dbfs.py

martindurant · 2024-01-08T14:22:50Z

fsspec/implementations/tests/test_dbfs.py

+def test_dbfs_write_pyarrow_non_partitioned(dbfsFS):
+    import pandas as pd
+    import pyarrow as pa
+    import pyarrow.parquet as pq


This will fail if these are missing in the environment. Can we do without them?

I have commented them out; would it be better to remove them?
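
A common alternative to commenting out such tests is to skip them when the optional dependency is missing. The sketch below uses only the standard library (`importlib.util.find_spec` with `unittest.skipUnless`); pytest offers `pytest.importorskip` for the same purpose. The test class and its contents are hypothetical and do not touch DBFS.

```python
import importlib.util
import unittest

# True only when pyarrow can be imported in the current environment.
HAS_PYARROW = importlib.util.find_spec("pyarrow") is not None


class TestParquetTable(unittest.TestCase):
    @unittest.skipUnless(HAS_PYARROW, "pyarrow is not installed")
    def test_build_table(self):
        # Import inside the test so collection never fails without pyarrow.
        import pyarrow as pa

        table = pa.table({"x": [1, 2, 3]})
        self.assertEqual(table.num_rows, 3)
```

With this guard the test is reported as skipped, rather than as an import error, in environments without pyarrow.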

fsspec/implementations/dbfs.py

martindurant · 2024-01-08T14:25:07Z

fsspec/implementations/dbfs.py

+                # Request Timeout
+                408,
+                # Too Many Requests
+                429,
            ]
            errs += [str(e) for e in errs]
            if type(exception) is requests.exceptions.HTTPError:


isinstance is better
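
The difference shows up as soon as a subclass is raised. The following sketch uses two hypothetical exception classes in place of `requests.exceptions.HTTPError` and its subclasses:

```python
class HTTPError(Exception):
    """Hypothetical base error, standing in for requests.exceptions.HTTPError."""


class RateLimitError(HTTPError):
    """Hypothetical subclass raised for HTTP 429."""


exc = RateLimitError("429 Too Many Requests")

# An exact type comparison rejects subclasses.
exact_match = type(exc) is HTTPError

# isinstance accepts the class and any of its subclasses.
subclass_match = isinstance(exc, HTTPError)
```

Here `exact_match` is `False` while `subclass_match` is `True`, so a `type(...) is` check would silently skip the retry logic for any subclass of the expected error.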

fsspec/implementations/tests/test_dbfs.py

Removed unused tests. Other minor format fixes.

fsspec/implementations/tests/test_dbfs.py

fsspec/implementations/dbfs.py

fsspec/implementations/tests/test_dbfs.py

Fixed broken vcr tests due to comments. Updated vcr cassettes.

AlfredoLorie · 2024-01-26T18:57:42Z

@martindurant Is there anything else required to merge this feature?

martindurant · 2024-01-27T01:20:10Z

fsspec/implementations/dbfs.py

        self.session = requests.Session()
+        self.retries = Retry(total=10,


In the future we'll make this configurable, but I won't hold the PR any more
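
As a rough idea of what a configurable version might look like, the sketch below wraps the retry setup in a helper. The `make_session` function and its parameters are assumptions for illustration, not fsspec's API; only `total=10` and the 408 and 429 status codes come from the diff above, and the `backoff_factor` default is arbitrary.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_session(total=10, backoff_factor=0.1, status_forcelist=(408, 429)):
    """Build a requests session whose HTTPS adapter retries on given statuses."""
    retries = Retry(
        total=total,
        backoff_factor=backoff_factor,
        status_forcelist=status_forcelist,
    )
    session = requests.Session()
    # Every https:// request made through this session uses the retry policy.
    session.mount("https://", HTTPAdapter(max_retries=retries))
    return session
```

A caller could then write `make_session(total=5)` to lower the retry budget without editing the filesystem class.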

Added decorator for retry_request and handle err 429

88adbbb

Added test for pyarrow integration using the DBFS implementation

Updated minor changes to the retry_request decorator

05dabc2

Commented out some failing unit tests. Added cassettes for unit testing DBFS methods.

martindurant reviewed Jan 8, 2024

AlfredoLorie added 2 commits January 10, 2024 14:52

Added request python retry handler for managing the http errors

c5444b8

Removed unused tests. Other minor format fixes.

Some code cleaning

c5aa4bd

martindurant reviewed Jan 12, 2024

fsspec/implementations/tests/test_dbfs.py (outdated)

martindurant reviewed Jan 12, 2024

fsspec/implementations/dbfs.py (outdated)

martindurant reviewed Jan 16, 2024

fsspec/implementations/tests/test_dbfs.py (outdated)

Fixed pyarrow imports with pytest.importskip

cbec790

Fixed broken vcr tests due to comments. Updated vcr cassettes.

martindurant reviewed Jan 27, 2024

lint

7b250c6

martindurant merged commit a408121 into fsspec:master Jan 27, 2024
10 checks passed
