[Python] The initialization of the S3FileSystem takes a long time. How to reduce the time? #37136
Comments
Would you mind setting the log level and looking at the S3 logging? I don't know where the time is spent here; more logs are needed to see why initialization is slow. |
I tried it, and it seems that connection attempts to 169.254.169.254:80 time out and are retried.
|
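To check outside of Arrow whether that address is reachable from a given machine, a small TCP probe can help. This is only a diagnostic sketch using the Python standard library; the helper name `probe` and the 1 second default timeout are illustrative choices, not part of Arrow or the AWS SDK.

```python
import socket
import time


def probe(host, port, timeout=1.0):
    """Attempt a TCP connection and return (reachable, elapsed_seconds)."""
    start = time.perf_counter()
    try:
        # create_connection raises OSError (including timeouts) on failure
        with socket.create_connection((host, port), timeout=timeout):
            reachable = True
    except OSError:
        reachable = False
    return reachable, time.perf_counter() - start
```

Calling `probe("169.254.169.254", 80)` on the affected machine should show whether the connection fails and roughly how long each attempt takes; several such attempts with retries would account for a slow initialization.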
It seems the "connect timeout" is related to the network environment... Maybe you can increase the timeout here... |
The S3FileSystem interface has no timeout option. How can I do this? |
/// Options for the S3FileSystem implementation.
struct ARROW_EXPORT S3Options {
  /// \brief AWS region to connect to.
  ///
  /// If unset, the AWS SDK will choose a default value. The exact algorithm
  /// depends on the SDK version. Before 1.8, the default is hardcoded
  /// to "us-east-1". Since 1.8, several heuristics are used to determine
  /// the region (environment variables, configuration profile, EC2 metadata
  /// server).
  std::string region;

  /// \brief Socket connection timeout, in seconds
  ///
  /// If negative, the AWS SDK default value is used (typically 1 second).
  double connect_timeout = -1;

I'm not familiar with the Python code, but you could search for S3Options there. Maybe take a look at: https://github.com/apache/arrow/blob/main/python/pyarrow/_s3fs.pyx#L202 By the way, maybe you can switch to a better network environment... |
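On the Python side, recent pyarrow releases expose this option as the `connect_timeout` keyword of `pyarrow.fs.S3FileSystem` (alongside `request_timeout`). A minimal sketch, assuming a pyarrow version that has these keywords; the helper names `s3_timeout_kwargs` and `make_filesystem` are hypothetical and exist only for this illustration.

```python
def s3_timeout_kwargs(connect_timeout=None, request_timeout=None):
    """Return only the timeout options that were explicitly set, in seconds."""
    candidates = {
        "connect_timeout": connect_timeout,
        "request_timeout": request_timeout,
    }
    return {name: float(value) for name, value in candidates.items()
            if value is not None}


def make_filesystem(**kwargs):
    """Build an S3FileSystem; pyarrow is imported only when this is called."""
    from pyarrow import fs
    return fs.S3FileSystem(**kwargs)
```

For example, `make_filesystem(**s3_timeout_kwargs(connect_timeout=5))` would pass a 5 second socket connection timeout. On a version without these keywords the constructor is expected to reject them.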
My oversight: I'm using Arrow 9.0.0, which doesn't have the timeout option yet. |
I have no idea how to solve it in 9.0.0; maybe you can upgrade, or patch the timeout option into your build :-) |
Also, I don't think it's the network environment, because reading Parquet files from OBS succeeds. More likely the cause is that the OBS service provider is not AWS. |
Thank you! At least I found out why. I'll see if there's another way to fix it. |
I think I've heard of this problem before; could you also take a look at #36587 (comment)? You can also check the configuration options your object store provider documents. |
It's amazing that this environment variable works.
Thank you very much! |
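The thread does not name the environment variable. Assuming it is `AWS_EC2_METADATA_DISABLED`, which the AWS SDKs and CLI recognize as a switch for skipping the EC2 metadata endpoint (169.254.169.254), a sketch of setting it from Python could look like this. The helper name is hypothetical, and the variable needs to be set before the filesystem is created.

```python
import os


def disable_ec2_metadata_lookup(env=None):
    """Set AWS_EC2_METADATA_DISABLED=true in the given mapping.

    Defaults to the current process environment. The variable name is an
    assumption; the original thread only says "this environment variable".
    """
    if env is None:
        env = os.environ
    env["AWS_EC2_METADATA_DISABLED"] = "true"
    return env
```

Call `disable_ec2_metadata_lookup()` at the top of the script, before constructing `S3FileSystem`, or export the variable in the shell that launches Python.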
Close as solved |
It seems the SDK would query that endpoint for this. Also see aws/aws-cli#5623 |
OK! It seems that explicitly specifying the region also circumvents the problem. |
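A minimal sketch of pinning the region up front, so the SDK has no need to fall back to its region heuristics. `region` and `endpoint_override` are existing keywords of `pyarrow.fs.S3FileSystem`; the helper name and the example values in the note below are hypothetical.

```python
def make_s3_filesystem(region, endpoint_override=None):
    """Create an S3FileSystem with an explicitly specified region."""
    from pyarrow import fs  # lazy import: pyarrow is only needed when called

    kwargs = {"region": region}
    if endpoint_override is not None:
        # Non-AWS object stores usually need their own endpoint as well
        kwargs["endpoint_override"] = endpoint_override
    return fs.S3FileSystem(**kwargs)
```

For a non-AWS store this might be called as `make_s3_filesystem("my-region", "obs.example.com")`, with both values replaced by whatever the provider documents.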
Describe the usage question you have. Please include as many useful details as possible.
Initializing the S3FileSystem takes a long time. How can I reduce the time?