Endpoint url error #4811

Open
nttg8100 opened this issue Mar 12, 2024 · 5 comments

nttg8100 commented Mar 12, 2024

Bug report

Nextflow file main.nf

#!/usr/bin/env nextflow

params.values = Channel.from(1)

process echoValue {
    publishDir "${params.outdir}/echoValue/", mode: 'copy'

    input:
    val value

    output:
    path "*_echoValue.txt"

    script:
    """
    echo "Value: $value" > ${value}_echoValue.txt
    """
}

workflow {
    echoValue(params.values)
}

nextflow.config

aws {
    accessKey = "***"
    secretKey = "***"
    client {
        endpoint = 'https://s3-hcm-r1.s3cloud.vn'
        s3PathStyleAccess = true
    }
}

Command

nextflow run main.nf --outdir s3://project_1

Versions:
nextflow: 23.10.1
nf-amazon: 2.1.4

Expected behavior and actual behavior

I expected the output file to be uploaded to the S3 bucket, just as it was in my earlier test against a MinIO server with endpoint = "http://localhost:9000", which worked.

I checked with the AWS CLI that my credentials have permission to put objects. Instead, the run fails while trying to reach the endpoint, with no additional information:

com.amazonaws.SdkClientException: Unable to execute HTTP request: s3.hcm.amazonaws.com

Note that s3.hcm.amazonaws.com is not the configured endpoint (https://s3-hcm-r1.s3cloud.vn). It looks as if the nf-amazon 2.1.4 plugin, or the AWS SDK it depends on, fails to use the custom endpoint and builds an amazonaws.com hostname instead.
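
For context, this is roughly how a custom endpoint with path-style access is wired onto an AWS SDK v1 S3 client. This is only a sketch of what I would expect the plugin to do with the aws.client settings, not the actual nf-amazon code; the class name and the signing region "us-east-1" are placeholders I added, since my config does not set a region:

import com.amazonaws.auth.AWSStaticCredentialsProvider;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.client.builder.AwsClientBuilder;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

public class CustomEndpointSketch {
    public static void main(String[] args) {
        // Sketch: build an SDK v1 S3 client that targets the custom endpoint
        // with path-style access, mirroring the aws { client { ... } } settings.
        AmazonS3 s3 = AmazonS3ClientBuilder.standard()
                .withCredentials(new AWSStaticCredentialsProvider(
                        new BasicAWSCredentials("***", "***")))
                // placeholder signing region; the custom endpoint is not an AWS region
                .withEndpointConfiguration(new AwsClientBuilder.EndpointConfiguration(
                        "https://s3-hcm-r1.s3cloud.vn", "us-east-1"))
                .withPathStyleAccessEnabled(true)
                .build();
        System.out.println(s3.listBuckets());
    }
}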

Steps to reproduce the problem

I cannot share the endpoint URL and its credentials publicly; the configuration above shows the pattern that triggers the error.

Program output

N E X T F L O W  ~  version 23.10.1
Launching `tmp/main.nf` [jovial_boltzmann] DSL2 - revision: 7325aaaf63
executor >  local (1)
[98/0715e2] process > echoValue (1) [  0%] 0 of 1
ERROR ~ Error executing process > 'echoValue (1)'

Caused by:
  s3.hcm.amazonaws.com


 -- Check '.nextflow.log' file for details

Environment

  • Nextflow version: 23.10.1
  • Java version: 11.0.13 2021-10-19
  • Operating system: macOS
  • Bash version: zsh 5.9 (x86_64-apple-darwin23.0)

Additional context

Is there any documentation on recompiling the nf-amazon plugin? I tried to compile Nextflow after modifying the nf-amazon plugin, but the resulting build folder structure is quite different from the nf-amazon-2.1.4 plugin that Nextflow downloads. I cloned the nextflow repo at tag v23.10.1, then ran

make compile

rjb32 commented Jun 19, 2024

Can be reproduced with a custom S3 endpoint hosted on Scaleway.

  • Nextflow version: 24.04.2.5914
  • Java version: 17.0.11 2024-04-16
  • Operating system: Linux Ubuntu 22.04.4 LTS

aws {
    accessKey = 'ACCESSKEY'
    secretKey = 'SECRETKEY'
    region = 'fr-par'
    client {
        endpoint = 'https://s3.fr-par.scw.cloud'
        protocol = 'https'
        s3PathStyleAccess = true
    }
}

rjb32 commented Jun 20, 2024

This is very important: it enables the use of private S3 implementations in Europe when analyzing data from hospitals that are forbidden from using AWS services for GDPR and regulatory reasons.

rjb32 commented Jun 20, 2024

The issue is that at some point something adds back the "amazonaws.com" suffix instead of using the custom S3 endpoint URI provided.

./launch.sh -trace nextflow run ../../hello.nf -work-dir s3://turing/test

The stack trace is as follows:

com.amazonaws.SdkClientException: Unable to execute HTTP request: s3.fr-par.amazonaws.com
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleRetryableException(AmazonHttpClient.java:1219)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1165)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:814)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:781)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:755)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:715)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:697)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:561)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:541)
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5558)
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5505)
	at com.amazonaws.services.s3.AmazonS3Client.access$300(AmazonS3Client.java:423)
	at com.amazonaws.services.s3.AmazonS3Client$PutObjectStrategy.invokeServiceCall(AmazonS3Client.java:6639)
	at com.amazonaws.services.s3.AmazonS3Client.uploadObject(AmazonS3Client.java:1892)
	at com.amazonaws.services.s3.AmazonS3Client.putObject(AmazonS3Client.java:1852)
	at nextflow.cloud.aws.nio.S3Client.putObject(S3Client.java:209)
	at nextflow.cloud.aws.nio.S3FileSystemProvider.createDirectory(S3FileSystemProvider.java:492)
	at java.base/java.nio.file.Files.createDirectory(Files.java:700)
	at java.base/java.nio.file.Files.createAndCheckIsDirectory(Files.java:807)
	at java.base/java.nio.file.Files.createDirectories(Files.java:753)
	at org.codehaus.groovy.vmplugin.v8.IndyInterface.fromCache(IndyInterface.java:321)
	at nextflow.extension.FilesEx.mkdirs(FilesEx.groovy:493)
	at nextflow.Session.init(Session.groovy:406)
	at nextflow.script.ScriptRunner.execute(ScriptRunner.groovy:129)
	at nextflow.cli.CmdRun.run(CmdRun.groovy:372)
	at nextflow.cli.Launcher.run(Launcher.groovy:503)
	at nextflow.cli.Launcher.main(Launcher.groovy:657)
Caused by: java.net.UnknownHostException: s3.fr-par.amazonaws.com
	at java.base/java.net.InetAddress$CachedAddresses.get(InetAddress.java:801)
	at java.base/java.net.InetAddress.getAllByName0(InetAddress.java:1533)
	at java.base/java.net.InetAddress.getAllByName(InetAddress.java:1385)
	at java.base/java.net.InetAddress.getAllByName(InetAddress.java:1306)
	at com.amazonaws.SystemDefaultDnsResolver.resolve(SystemDefaultDnsResolver.java:27)
	at com.amazonaws.http.DelegatingDnsResolver.resolve(DelegatingDnsResolver.java:38)
	at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:112)
	at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:376)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at com.amazonaws.http.conn.ClientConnectionManagerFactory$Handler.invoke(ClientConnectionManagerFactory.java:76)
	at com.amazonaws.http.conn.$Proxy27.connect(Unknown Source)
	at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:393)
	at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236)
	at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
	at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
	at com.amazonaws.http.apache.client.impl.SdkHttpClient.execute(SdkHttpClient.java:72)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1346)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1157)
	... 25 common frames omitted

We can look at what happens in nextflow.cloud.aws.nio.S3Client.putObject (S3Client.java:209), which sits at the boundary between the Nextflow code and the AWS SDK.

	public PutObjectResult putObject(String bucket, String keyName, InputStream inputStream, ObjectMetadata metadata, List<Tag> tags, String contentType) {
		PutObjectRequest req = new PutObjectRequest(bucket, keyName, inputStream, metadata);
		if( cannedAcl != null ) {
			req.withCannedAcl(cannedAcl);
		}
		if( tags != null && tags.size()>0 ) {
			req.setTagging(new ObjectTagging(tags));
		}
		if( kmsKeyId != null ) {
			req.withSSEAwsKeyManagementParams( new SSEAwsKeyManagementParams(kmsKeyId) );
		}
		if( storageEncryption!=null ) {
			metadata.setSSEAlgorithm(storageEncryption.toString());
		}
		if( contentType!=null ) {
			metadata.setContentType(contentType);
		}
		if( log.isTraceEnabled() ) {
			log.trace("S3 PutObject request {}", req);
		}
		return client.putObject(req);
	}

The exception is raised at the last line, by the call to the AWS SDK's client.putObject(req).
I ran a small experiment to determine whether the S3 client configuration is already wrong at this point, by trying a few other calls to the S3 SDK:

	public PutObjectResult putObject(String bucket, String keyName, InputStream inputStream, ObjectMetadata metadata, List<Tag> tags, String contentType) {
		PutObjectRequest req = new PutObjectRequest(bucket, keyName, inputStream, metadata);
		if( cannedAcl != null ) {
			req.withCannedAcl(cannedAcl);
		}
		if( tags != null && tags.size()>0 ) {
			req.setTagging(new ObjectTagging(tags));
		}
		if( kmsKeyId != null ) {
			req.withSSEAwsKeyManagementParams( new SSEAwsKeyManagementParams(kmsKeyId) );
		}
		if( storageEncryption!=null ) {
			metadata.setSSEAlgorithm(storageEncryption.toString());
		}
		if( contentType!=null ) {
			metadata.setContentType(contentType);
		}
		if( log.isTraceEnabled() ) {
			log.trace("S3 PutObject request {}", req);
		}

                // experiment: list the buckets visible through this client
                for (Bucket b : client.listBuckets()) {
                    System.out.println("bucket " + b.getName());
                }

		return client.putObject(req);
	}

We can see that the accessible buckets are in fact correctly listed on standard output:

bucket lucl
bucket martina
bucket maxime
bucket turing

So the AWS S3 client can in fact reach the custom S3 endpoint and gets correct answers, at least for listing buckets. How strange!
Let's now try to list objects inside a bucket, at the same point in the code.

for (S3ObjectSummary obj : client.listObjects("turing", "db").getObjectSummaries()) {
    System.out.println("object "+obj.getKey());
}

This time it fails, and we get an exception raised inside the AWS SDK from the listObjects call.

com.amazonaws.SdkClientException: Unable to execute HTTP request: s3.fr-par.amazonaws.com
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleRetryableException(AmazonHttpClient.java:1219)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1165)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:814)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:781)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:755)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:715)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:697)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:561)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:541)
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5558)
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5505)
	at com.amazonaws.services.s3.AmazonS3Client.listObjects(AmazonS3Client.java:950)
	at com.amazonaws.services.s3.AmazonS3Client.listObjects(AmazonS3Client.java:915)
	at nextflow.cloud.aws.nio.S3Client.putObject(S3Client.java:210)
	at nextflow.cloud.aws.nio.S3FileSystemProvider.createDirectory(S3FileSystemProvider.java:492)
	at java.base/java.nio.file.Files.createDirectory(Files.java:700)
	at java.base/java.nio.file.Files.createAndCheckIsDirectory(Files.java:807)
	at java.base/java.nio.file.Files.createDirectories(Files.java:753)
	at org.codehaus.groovy.vmplugin.v8.IndyInterface.fromCache(IndyInterface.java:321)
	at nextflow.extension.FilesEx.mkdirs(FilesEx.groovy:493)
	at nextflow.Session.init(Session.groovy:406)
	at nextflow.script.ScriptRunner.execute(ScriptRunner.groovy:129)
	at nextflow.cli.CmdRun.run(CmdRun.groovy:372)
	at nextflow.cli.Launcher.run(Launcher.groovy:503)
	at nextflow.cli.Launcher.main(Launcher.groovy:657)
Caused by: java.net.UnknownHostException: s3.fr-par.amazonaws.com
	at java.base/java.net.InetAddress$CachedAddresses.get(InetAddress.java:801)
	at java.base/java.net.InetAddress.getAllByName0(InetAddress.java:1533)
	at java.base/java.net.InetAddress.getAllByName(InetAddress.java:1385)
	at java.base/java.net.InetAddress.getAllByName(InetAddress.java:1306)
	at com.amazonaws.SystemDefaultDnsResolver.resolve(SystemDefaultDnsResolver.java:27)
	at com.amazonaws.http.DelegatingDnsResolver.resolve(DelegatingDnsResolver.java:38)
	at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:112)
	at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:376)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at com.amazonaws.http.conn.ClientConnectionManagerFactory$Handler.invoke(ClientConnectionManagerFactory.java:76)
	at com.amazonaws.http.conn.$Proxy27.connect(Unknown Source)
	at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:393)
	at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236)
	at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
	at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
	at com.amazonaws.http.apache.client.impl.SdkHttpClient.execute(SdkHttpClient.java:72)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1346)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1157)
	... 23 common frames omitted
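
One more cheap check, which I have not run, at the same point in putObject: AmazonS3.getUrl resolves the URL the client would use for a given object without sending any request, so it would show directly whether the hostname comes from the custom endpoint or has been rewritten to amazonaws.com. The key "db/hello.txt" below is just an example:

		// hypothetical probe: print the URL the client resolves for an object;
		// no request is sent, so this only shows the hostname that would be used
		System.out.println("resolved URL: " + client.getUrl("turing", "db/hello.txt"));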

Conclusion

My guess at this point is that this is a bug inside the AWS SDK: the S3 client appears to be correctly configured with the custom endpoint URI, and it works for requests that only deal with buckets (listing them), but it fails for any request that reads or writes objects inside a bucket. Something inside the AWS SDK must rebuild the hostname from the region (s3.fr-par.amazonaws.com) instead of keeping the custom endpoint for those requests.
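
To rule Nextflow out entirely, a standalone program against the same endpoint with the same SDK v1 calls could confirm where the bug lives. This is only a sketch I have not run; the class name is made up, the endpoint, region, credentials placeholders, bucket and prefix are the ones from this report, and the SDK version should match the 1.12.x line bundled with nf-amazon:

import com.amazonaws.auth.AWSStaticCredentialsProvider;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.client.builder.AwsClientBuilder;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.Bucket;
import com.amazonaws.services.s3.model.S3ObjectSummary;

public class EndpointRepro {
    public static void main(String[] args) {
        AmazonS3 s3 = AmazonS3ClientBuilder.standard()
                .withCredentials(new AWSStaticCredentialsProvider(
                        new BasicAWSCredentials("ACCESSKEY", "SECRETKEY")))
                .withEndpointConfiguration(new AwsClientBuilder.EndpointConfiguration(
                        "https://s3.fr-par.scw.cloud", "fr-par"))
                .withPathStyleAccessEnabled(true)
                .build();

        // Bucket-level call: expected to work, as observed inside Nextflow.
        for (Bucket b : s3.listBuckets()) {
            System.out.println("bucket " + b.getName());
        }

        // Object-level call: if the endpoint rewriting is an SDK bug, this
        // should fail with the same UnknownHostException outside Nextflow too.
        for (S3ObjectSummary obj : s3.listObjects("turing", "db").getObjectSummaries()) {
            System.out.println("object " + obj.getKey());
        }
    }
}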

Why not AWS SDK v2?

Looking at the Gradle dependencies: is there a particular reason why Nextflow still uses AWS SDK 1.12.70, even though v1 is clearly stated to be deprecated and AWS SDK v2 has been the recommended version for some time?
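
For comparison, and only as a sketch of what the migration would buy (this is not what nf-amazon does today; the class name is made up), AWS SDK for Java v2 makes the custom endpoint a single explicit override that applies to every request:

import java.net.URI;

import software.amazon.awssdk.auth.credentials.AwsBasicCredentials;
import software.amazon.awssdk.auth.credentials.StaticCredentialsProvider;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.S3Configuration;

public class SdkV2EndpointSketch {
    public static void main(String[] args) {
        // Sketch: the same custom endpoint with SDK v2; endpointOverride applies
        // to every request, and path-style access is set in the service config.
        S3Client s3 = S3Client.builder()
                .region(Region.of("fr-par"))
                .endpointOverride(URI.create("https://s3.fr-par.scw.cloud"))
                .credentialsProvider(StaticCredentialsProvider.create(
                        AwsBasicCredentials.create("ACCESSKEY", "SECRETKEY")))
                .serviceConfiguration(S3Configuration.builder()
                        .pathStyleAccessEnabled(true)
                        .build())
                .build();

        // Object-level call equivalent to the failing listObjects above.
        s3.listObjectsV2(b -> b.bucket("turing").prefix("db"))
                .contents()
                .forEach(o -> System.out.println("object " + o.key()));
    }
}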

bentsherman (Member) commented

@rjb32 thanks for the triage. SDK v2 is on our roadmap but we just haven't gotten to it yet. It is not a trivial change.

bentsherman (Member) commented

See #4741
