
[BUG] ADLS Gen2: SDK is unable to read files containing % and # in the file name #18469

Closed
3 tasks
rajeevsinghAD opened this issue Jan 6, 2021 · 2 comments · Fixed by #18542
Assignees
Labels
Client This issue points to a problem in the data-plane of the library. customer-reported Issues that are reported by GitHub users external to the Azure organization. Data Lake Storage Gen2 question The issue doesn't require a change to the product in order to be resolved. Most issues start as that

Comments

@rajeevsinghAD

Describe the bug
We can create files with % and # in their names; however, when we try to read such a file using the SDK, we get java.lang.IllegalArgumentException: URLDecoder: Illegal hex characters in escape (%) pattern - For input string: "y%".
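The stack trace below shows the failure originates in plain JDK URL decoding, which the SDK applies when parsing the blob URL. A minimal sketch of the root cause, with no Azure dependencies (the file name is the one from the repro):

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class UrlDecodeRepro {
    public static void main(String[] args) {
        try {
            // '%' in "abcd_%y%m_%d.csv" is not followed by two hex
            // digits, so URLDecoder rejects it as a malformed escape
            URLDecoder.decode("abcd_%y%m_%d.csv", StandardCharsets.UTF_8);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Any raw '%' that does not start a valid two-hex-digit escape triggers this exception, which is why a file name containing a literal '%' cannot survive the decode step.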

Exception or Stack Trace

java.lang.IllegalArgumentException: URLDecoder: Illegal hex characters in escape (%) pattern - For input string: "y%"
	at java.net.URLDecoder.decode(Unknown Source)
	at com.azure.storage.common.Utility.decode(Utility.java:99)
	at com.azure.storage.common.Utility.urlDecode(Utility.java:90)
	at com.azure.storage.blob.BlobUrlParts.setBlobName(BlobUrlParts.java:150)
	at com.azure.storage.blob.BlobUrlParts.parseNonIpUrl(BlobUrlParts.java:458)
	at com.azure.storage.blob.BlobUrlParts.parse(BlobUrlParts.java:374)
	at com.azure.storage.blob.specialized.SpecializedBlobClientBuilder.endpoint(SpecializedBlobClientBuilder.java:302)
	at com.azure.storage.file.datalake.DataLakeDirectoryAsyncClient.prepareBuilderAppendPath(DataLakeDirectoryAsyncClient.java:499)
	at com.azure.storage.file.datalake.DataLakeDirectoryAsyncClient.getFileAsyncClient(DataLakeDirectoryAsyncClient.java:162)
	at com.azure.storage.file.datalake.DataLakeDirectoryClient.getFileClient(DataLakeDirectoryClient.java:139)

To Reproduce
Run the sample provided in the Code Snippet section below.

Code Snippet

import java.io.File;
import java.net.URLEncoder;
import java.nio.file.OpenOption;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

import com.azure.core.http.okhttp.OkHttpAsyncHttpClientBuilder;
import com.azure.identity.ClientSecretCredential;
import com.azure.identity.ClientSecretCredentialBuilder;
import com.azure.storage.common.ParallelTransferOptions;
import com.azure.storage.file.datalake.DataLakeDirectoryClient;
import com.azure.storage.file.datalake.DataLakeFileClient;
import com.azure.storage.file.datalake.DataLakeFileSystemClient;
import com.azure.storage.file.datalake.DataLakeServiceClient;
import com.azure.storage.file.datalake.DataLakeServiceClientBuilder;
import com.azure.storage.file.datalake.models.DownloadRetryOptions;

public class ADLS_Gen2_SDK_Test {

	private final static String accountName = "";
	private final static String clientID = "";
	private final static String clientSecret = "";
	private final static String tenantID = "";
	private final static String fileSystemName = "";

	private DataLakeServiceClient serviceClient;

	public static void main(String[] args) {

		ADLS_Gen2_SDK_Test adlsObj = new ADLS_Gen2_SDK_Test();

		adlsObj.establishADLSG2Connection();

		adlsObj.downloadFile();
	}

	private void establishADLSG2Connection(){
		ClientSecretCredential servicePrincipalCreds  = new ClientSecretCredentialBuilder()
				.clientId(clientID)
				.clientSecret(clientSecret)
				.tenantId(tenantID)
				.build();

		serviceClient = new DataLakeServiceClientBuilder()
				.endpoint("https://"+accountName+".dfs.core.windows.net")
				.credential(servicePrincipalCreds)
				.httpClient(new OkHttpAsyncHttpClientBuilder().build())
				.buildClient();
	}

	private void downloadFile(){
		try {			
			DataLakeFileSystemClient fileSystemClient = serviceClient.getFileSystemClient(fileSystemName);
			DataLakeDirectoryClient directoryClient = fileSystemClient.getDirectoryClient("/");
			
			// Fails: the raw name contains '%' sequences that the SDK
			// tries to URL-decode while parsing the blob URL
			DataLakeFileClient dataLakeFileClient = directoryClient.getFileClient("abcd_%y%m_%d.csv");
			
			String stagingName = UUID.randomUUID().toString().replace("-", "_");
			File insertStagingFile = File.createTempFile("Riyaz" + stagingName, ".adlg2", new File("E:\\Riyaz"));
			
			String localDownloadedFilePath = insertStagingFile.getAbsolutePath();

			ParallelTransferOptions parallelTransferOptions = new ParallelTransferOptions().setBlockSizeLong(16 * 1024 * 1024L);
			DownloadRetryOptions downloadRetryOptions = new DownloadRetryOptions().setMaxRetryRequests(5);
			Set<OpenOption> openOptions = new HashSet<>(
					Arrays.asList(StandardOpenOption.WRITE, StandardOpenOption.READ));

			dataLakeFileClient.readToFileWithResponse(localDownloadedFilePath, null, parallelTransferOptions,
					downloadRetryOptions, null, false, openOptions, null, null);

			System.out.println("Download Finished");

		} catch (Exception e) {
			e.printStackTrace();
		}
	}
}

Expected behavior
When files can be created with # and % in their name, they should also be allowed to be read via SDK API.

Screenshots
NA

Setup (please complete the following information):

  • OS: Windows, Linux
  • IDE : Eclipse
  • Version of the Library used:
<dependencies>
     <dependency>
        <groupId>com.azure</groupId>
        <artifactId>azure-storage-file-datalake</artifactId>
        <version>12.3.0</version>
        <exclusions>
           <exclusion>
              <groupId>com.azure</groupId>
              <artifactId>azure-core-http-netty</artifactId>
           </exclusion>
        </exclusions>
     </dependency>
     <dependency>
        <groupId>com.azure</groupId>
        <artifactId>azure-identity</artifactId>
        <version>1.2.0</version>
     </dependency>
     <dependency>
        <groupId>com.azure</groupId>
        <artifactId>azure-core-http-okhttp</artifactId>
        <version>1.3.3</version>
     </dependency>
  </dependencies>

Additional context
NA

Information Checklist
Kindly make sure that you have added all of the following information above and checked off the required fields; otherwise we will treat the issue as an incomplete report.

  • Bug Description Added
  • Repro Steps Added
  • Setup information Added
@ghost ghost added needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. customer-reported Issues that are reported by GitHub users external to the Azure organization. question The issue doesn't require a change to the product in order to be resolved. Most issues start as that labels Jan 6, 2021
@gapra-msft
Member

Hi @rajeevsinghAD Thank you for posting this issue.

We will work on reproducing this issue as soon as we can.

@gapra-msft gapra-msft self-assigned this Jan 6, 2021
@joshfree joshfree added Client This issue points to a problem in the data-plane of the library. Data Lake Storage Gen2 labels Jan 8, 2021
@ghost ghost removed the needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. label Jan 8, 2021
@gapra-msft
Member

@rajeevsinghAD Thanks again for reporting this issue. I was able to reproduce it and have created a PR to resolve the problem. Note that the getFileClient method expects a URL-encoded version of the file name.
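Based on that comment, a possible workaround until the fix ships is to percent-encode the raw name before passing it to getFileClient. The helper below (encodePathSegment is a name I made up, not part of the SDK) sketches one way to do that with the JDK:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class AdlsNameEncoder {

    // Hypothetical helper: percent-encode a raw file name so the
    // client's URL parsing can round-trip it. URLEncoder targets
    // form encoding, so '+' (its escape for a space) is rewritten
    // to the path-segment escape "%20".
    static String encodePathSegment(String rawName) {
        return URLEncoder.encode(rawName, StandardCharsets.UTF_8).replace("+", "%20");
    }

    public static void main(String[] args) {
        // '%' becomes %25 (and '#' would become %23), so decoding
        // recovers the name exactly as stored in ADLS Gen2
        System.out.println(encodePathSegment("abcd_%y%m_%d.csv"));
    }
}
```

With the sample from the issue, that would look like `directoryClient.getFileClient(encodePathSegment("abcd_%y%m_%d.csv"))`, assuming the client decodes the name exactly once.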

@github-actions github-actions bot locked and limited conversation to collaborators Apr 12, 2023