[SPARK-35878][CORE] Revert S3A endpoint fixup logic of SPARK-35878 #44834

steveloughran · 2024-01-22T16:00:00Z (edited by dongjoon-hyun)

What changes were proposed in this pull request?

Revert [SPARK-35878][CORE] Add fs.s3a.endpoint if unset and fs.s3a.endpoint.region is null

Removing the region/endpoint patching code of SPARK-35878 avoids authentication problems with versions of the S3A connector built with the AWS v2 SDK, as is the case in Hadoop 3.4.0.

That is: if fs.s3a.endpoint is unset, it will stay unset.
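For illustration, the removed fixup and the post-revert behaviour can be modelled roughly as below. This is a minimal Java sketch, not the Spark code: a plain `Map<String, String>` stands in for the Hadoop `Configuration`, the class and method names are hypothetical, and the injected default endpoint value `s3.amazonaws.com` is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model: Map<String, String> stands in for the Hadoop Configuration.
final class EndpointFixupSketch {
    static final String ENDPOINT = "fs.s3a.endpoint";
    static final String REGION = "fs.s3a.endpoint.region";
    // Assumed default endpoint, for illustration only.
    static final String ASSUMED_DEFAULT_ENDPOINT = "s3.amazonaws.com";

    /** Behaviour being reverted: inject an endpoint when neither endpoint nor region is set. */
    static Map<String, String> legacyFixup(Map<String, String> conf) {
        Map<String, String> out = new HashMap<>(conf);
        String endpoint = out.get(ENDPOINT);
        if ((endpoint == null || endpoint.isEmpty()) && out.get(REGION) == null) {
            out.put(ENDPOINT, ASSUMED_DEFAULT_ENDPOINT);
        }
        return out;
    }

    /** Behaviour after the revert: the S3A binding options are passed through untouched. */
    static Map<String, String> afterRevert(Map<String, String> conf) {
        return new HashMap<>(conf);
    }
}
```

With an empty configuration, `legacyFixup` adds an endpoint while `afterRevert` leaves it unset, which is what lets the v2 SDK do its own region-first binding.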

The v2 SDK binds to AWS services differently, in what can be described as "region first" binding. Having Spark set the endpoint blocks S3 Express support and is incompatible with HADOOP-18975 "S3A: Add option fs.s3a.endpoint.fips to use AWS FIPS endpoints".

HADOOP-18975. AWS SDK v2: extend support for FIPS endpoints hadoop#6277

The change is compatible with all releases of the S3A connector other than Hadoop 3.3.1 binaries deployed outside EC2 without the endpoint explicitly set.

Why are the changes needed?

The AWS v2 SDK has a different, more complex binding mechanism: it does not need the endpoint to be set if the region (fs.s3a.endpoint.region) is set. This means the Spark code that fixes up the endpoint is not only unnecessary, it also causes problems when trying to use specific storage options (S3 Express) or security options (FIPS).
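A sketch of what region-first settings look like on the Hadoop side, using the option names mentioned in this PR (fs.s3a.endpoint, fs.s3a.endpoint.region, fs.s3a.endpoint.fips). In Spark these would normally be supplied with the `spark.hadoop.` prefix. The region value and the conflict check are hypothetical: the check only encodes this PR's statement that an injected endpoint does not mix with FIPS, not the connector's actual validation logic.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: Map<String, String> stands in for the Hadoop Configuration.
final class RegionFirstConfigSketch {
    static final String ENDPOINT = "fs.s3a.endpoint";
    static final String REGION = "fs.s3a.endpoint.region";
    static final String FIPS = "fs.s3a.endpoint.fips";

    /** Region-first settings: a region (plus the FIPS flag) and deliberately no endpoint. */
    static Map<String, String> regionFirstConf(String region, boolean fips) {
        Map<String, String> conf = new HashMap<>();
        conf.put(REGION, region);
        conf.put(FIPS, Boolean.toString(fips));
        return conf;
    }

    /** Hypothetical check: true if an endpoint is set while FIPS is requested. */
    static boolean endpointConflictsWithFips(Map<String, String> conf) {
        String endpoint = conf.get(ENDPOINT);
        boolean endpointSet = endpoint != null && !endpoint.isEmpty();
        boolean fips = Boolean.parseBoolean(conf.getOrDefault(FIPS, "false"));
        return endpointSet && fips;
    }
}
```

The point of the design is that the endpoint is left for the SDK to derive; anything that fills it in on the user's behalf (as the reverted fixup did) turns a valid region-first configuration into a conflicting one.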

Does this PR introduce any user-facing change?

Only visible with the Hadoop 3.3.1 S3A connector when deployed outside of EC2, the situation the original patch was added to work around. All other 3.3.x releases are unaffected.
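The compatibility statement above can be written down as a predicate. This is a hypothetical helper, not anything in Spark or Hadoop; it assumes the caller already knows the connector version and whether the process runs in EC2, and it encodes only the conditions stated in this PR. Affected deployments can avoid the problem by setting fs.s3a.endpoint explicitly.

```java
import java.util.Map;

// Hypothetical helper encoding the PR's compatibility statement; not a Spark or Hadoop API.
final class CompatibilitySketch {
    /**
     * True only for the case this PR says is affected by the revert:
     * the Hadoop 3.3.1 S3A connector, running outside EC2, with no endpoint configured.
     */
    static boolean affectedByRevert(String hadoopVersion, boolean runningOnEc2, Map<String, String> conf) {
        String endpoint = conf.get("fs.s3a.endpoint");
        boolean endpointSet = endpoint != null && !endpoint.isEmpty();
        return "3.3.1".equals(hadoopVersion) && !runningOnEc2 && !endpointSet;
    }
}
```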

How was this patch tested?

Removed some obsolete tests. Relying on GitHub and Jenkins to do the testing, so this PR is marked as WIP until they pass.

Was this patch authored or co-authored using generative AI tooling?

No

Change-Id: I59c51db8b8280a907fdd11131e527e7014cdefc3

dongjoon-hyun

Thank you, @steveloughran .

+1, LGTM (Pending CIs).

dongjoon-hyun

Is HADOOP-18975 still pending for Hadoop 3.5.0 instead of 3.4.0?

steveloughran · 2024-01-23T12:01:32Z

FIPS support is in 3.4.1; I've just cherry-picked a chain of commits from the last week into branch-3.4, but I'm not pushing for the 3.4.0 RC to be blocked on them. If the RC fails for other reasons I will cherry-pick them there; otherwise they will wait for 3.4.1.

Hash	Date	Commit message
19b9e6a97b8f	2023-12-12 15:15:32 +0000	HADOOP-19008. S3A: update aws-sdk version to 2.21.41 (#6334)
2f1e1558b6fc	2024-01-11 17:13:31 +0000	HADOOP-19004. S3A: Support Authentication through HttpSigner API (#6324)
36198b5edf5b	2024-01-16 14:14:03 +0000	HADOOP-19027. S3A: S3AInputStream doesn't recover from HTTP/channel exceptions (#6425)
d37885379009	2024-01-16 14:16:12 +0000	HADOOP-18975 S3A: Add option fs.s3a.endpoint.fips to use AWS FIPS endpoints (#6277)
7b1570e2f15d	2024-01-16 17:06:28 -0600	HADOOP-19015. Increase fs.s3a.connection.maximum to 500 to minimize risk of Timeout waiting for connection from pool. (#6372)
eeb657e85f3f	2024-01-17 18:34:14 +0000	HADOOP-19033. S3A: disable checksums when fs.s3a.checksum.validation = false (#6441)

HADOOP-19033 is a performance regression, but HADOOP-19027 and input stream resilience worry me. We will need more stack traces from the wild to complete the resilience work there, as the new SDK stack raises different failures and we need to see them.

I also want to dig deeper into the SDK internals, as it looks like, rather than a blind "retry on IOE" policy, we could be a bit more specific and have some failures fail fast (UnknownHostException etc.). But I'm not sure the SDK lets us be that sophisticated policy-wise. And we cannot turn off its retries unless/until we move off the SDK transfer manager for multipart copy operations. If we implement that ourselves, we can tell the SDK never to retry and take over retrying ourselves. Tempting.

Anyway, let's get 3.4.0 out and see who complains about what.
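The "fail fast on some exceptions, retry the rest" idea can be sketched generically. This is not the AWS SDK v2 retry API or S3A's retry policy; the `IoCall` interface, the `retry` method and the attempt limit are hypothetical, and a real policy would also need backoff between attempts.

```java
import java.io.IOException;
import java.net.UnknownHostException;

// Hypothetical sketch of a selective retry policy; not the AWS SDK or S3A API.
final class FailFastRetrySketch {
    /** A unit of I/O work that may throw IOException. */
    interface IoCall<T> {
        T call() throws IOException;
    }

    /**
     * Retries IOExceptions up to maxAttempts times, but rethrows UnknownHostException
     * immediately, since retrying is unlikely to fix a DNS failure.
     */
    static <T> T retry(IoCall<T> call, int maxAttempts) throws IOException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (UnknownHostException e) {
                throw e; // fail fast: no point retrying
            } catch (IOException e) {
                last = e; // possibly transient: try again
            }
        }
        throw last; // all attempts failed; surface the most recent failure
    }
}
```

Catching `UnknownHostException` before `IOException` matters: it is a subclass, so the more specific handler has to come first to take the fail-fast path.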

dongjoon-hyun · 2024-01-23T16:11:32Z

Got it~

> but otherwise wait for a 3.4.1

steveloughran · 2024-01-23T16:15:25Z

Oh, @mukund-thakur has asked how to test that things aren't being passed on. Good point.

Really, one of the tests I've cleaned up should make sure that the value isn't set...
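One way to express that check, as a hedged sketch: model Spark's copying of `spark.hadoop.*` properties into the Hadoop configuration (prefix stripped), then assert that `fs.s3a.endpoint` only appears if the user set it. The class and method names are hypothetical; a real test would go through Spark's own configuration code rather than this simplified copy.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of spark.hadoop.* propagation; not Spark's implementation.
final class EndpointNotSetCheck {
    static final String PREFIX = "spark.hadoop.";

    /** Copies spark.hadoop.* entries with the prefix stripped; adds nothing else. */
    static Map<String, String> toHadoopConf(Map<String, String> sparkProps) {
        Map<String, String> hadoopConf = new HashMap<>();
        for (Map.Entry<String, String> e : sparkProps.entrySet()) {
            if (e.getKey().startsWith(PREFIX)) {
                hadoopConf.put(e.getKey().substring(PREFIX.length()), e.getValue());
            }
        }
        return hadoopConf;
    }

    /** The property the cleaned-up test should assert. */
    static boolean endpointUnset(Map<String, String> hadoopConf) {
        return !hadoopConf.containsKey("fs.s3a.endpoint");
    }
}
```

Checking both directions (unset stays unset, an explicit user value survives) guards against the fixup creeping back in without breaking users who do set an endpoint.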

steveloughran · 2024-01-24T17:34:43Z

The test failure was from Kinesis. Is this expected, or has removing this region-related code broken it? I don't think it should have, as this code only deals with fs.s3a. options, nothing Kinesis will be picking up.

dongjoon-hyun · 2024-01-24T22:15:22Z

Please re-trigger the failed streaming test pipeline. You can do that in your CI.

https://github.com/steveloughran/spark/runs/20734856528

dongjoon-hyun

To the reviewers: we are waiting for an official Apache Hadoop 3.4.x release aligned with this PR.

steveloughran · 2024-01-25T22:25:29Z

you don't need to wait for it; 3.3.2+ shouldn't need the fixup either

dongjoon-hyun · 2024-01-25T23:32:32Z

> you don't need to wait for it; 3.3.2+ shouldn't need the fixup either

Ya, but in the same way, the existing Spark code base doesn't block any Hadoop feature either. So, we don't need this change yet.

steveloughran · 2024-01-27T15:46:55Z

makes sense.

shameersss1

@steveloughran, could you please change the PR title to SPARK-46793? Currently it is being referenced in the original JIRA.

dongjoon-hyun · 2024-02-20T16:08:09Z

Oh, a nice catch. +1 for @shameersss1 's comment.

dongjoon-hyun · 2024-02-20T16:08:39Z

BTW, is there any news about the Apache Hadoop 3.4.0 release?

dongjoon-hyun · 2024-02-21T06:27:23Z

Oh, my bad. I was confused and thought the JIRA ID had already been fixed here.

Let me revert this and make a new PR with new JIRA and @steveloughran 's authorship.

Closes apache#44834 from steveloughran/SPARK-46793-revert-region-fixup-SPARK-35878.

Authored-by: Steve Loughran <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>

github-actions bot added the CORE label Jan 22, 2024

dongjoon-hyun approved these changes Jan 22, 2024


dongjoon-hyun reviewed Jan 22, 2024


dongjoon-hyun marked this pull request as draft January 22, 2024 17:15

steveloughran marked this pull request as ready for review January 24, 2024 17:32

steveloughran changed the title ~~[WIP][SPARK-35878][CORE] Revert S3A endpoint fixup logic of SPARK-35878~~ [SPARK-35878][CORE] Revert S3A endpoint fixup logic of SPARK-35878 Jan 25, 2024

dongjoon-hyun reviewed Jan 25, 2024


shameersss1 approved these changes Feb 20, 2024


dongjoon-hyun closed this in 36f199d Feb 21, 2024

dongjoon-hyun mentioned this pull request Feb 21, 2024

[SPARK-47113][CORE] Revert S3A endpoint fixup logic of SPARK-35878 #45193 (Closed)
