CaUsedAsEndEntity error in Microsoft Fabric #2449

Closed
martroben opened this issue Apr 24, 2024 · 19 comments
Labels
bug Something isn't working

Comments

@martroben

Environment

Delta-rs version: Python deltalake-0.17.1
Cloud provider: Microsoft (North Europe)
Environment: Microsoft Fabric Notebook
OS: Fabric VM (CBL-Mariner Linux)


Bug

What happened:
While trying to create a Delta Table from a path in a Fabric Notebook, I'm getting the following error:
OSError: Generic MicrosoftAzure error: Error after 10 retries in 1.992440924s, max_retries:10, retry_timeout:180s, source:error sending request for url (https://onelake.blob.fabric.microsoft.com/<workspace id>/<lakehouse id>/Tables/some_table/_delta_log/_last_checkpoint): error trying to connect: invalid peer certificate: Other(CaUsedAsEndEntity)

What you expected to happen:
To get a deltalake.DeltaTable instance without any errors.

How to reproduce it:
Run the following code in a Fabric Notebook:

import deltalake
import trident_token_library_wrapper

workspace_id = "xxxxxxxx-xxxxx-xxxxx-xxxxx-xxxxxxxxxxxxxx"
lakehouse_id = "yyyyyyyy-yyyyy-yyyyy-yyyyy-yyyyyyyyyyyyy"
path = f"abfss://{workspace_id}@onelake.dfs.fabric.microsoft.com/{lakehouse_id}/Tables/some_table"

storage_options = {}
storage_options["bearer_token"] = trident_token_library_wrapper.PyTridentTokenLibrary.get_access_token("storage")
storage_options["use_fabric_endpoint"] = "true"
# storage_options["allow_invalid_certificates"] = "true"

deltalake.DeltaTable(
    table_uri=path,
    storage_options=storage_options)

More details:
Setting storage_options["allow_invalid_certificates"] = "true" works as a quick fix.
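
A minimal sketch of how the quick fix could be kept behind an explicit switch, so that certificate validation is only disabled on purpose. The helper name `build_storage_options` and its `allow_invalid_certificates` parameter are hypothetical; the option keys are the ones used in the reproduction code above.

```python
def build_storage_options(bearer_token: str, allow_invalid_certificates: bool = False) -> dict:
    """Build deltalake storage options for OneLake (hypothetical helper)."""
    options = {
        "bearer_token": bearer_token,
        "use_fabric_endpoint": "true",
    }
    if allow_invalid_certificates:
        # Disables TLS certificate validation: a stopgap, not a fix.
        options["allow_invalid_certificates"] = "true"
    return options
```

In a Fabric Notebook the result would be passed as `storage_options=build_storage_options(token, allow_invalid_certificates=True)` to `deltalake.DeltaTable`, where `token` comes from the `get_access_token("storage")` call shown above.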

Here are the certificate details fetched by openssl s_client -showcerts -connect onelake.blob.fabric.microsoft.com:443 in the Fabric Notebook:

CONNECTED(00000003)
depth=0 C = US, ST = Washington, L = Redmond, O = MicrosoftData, OU = SparkDepartment, emailAddress = me@microsoft.com, CN = microsoft.com
verify return:1
---
Certificate chain
 0 s:C = US, ST = Washington, L = Redmond, O = MicrosoftData, OU = SparkDepartment, emailAddress = me@microsoft.com, CN = microsoft.com
   i:C = US, ST = Washington, L = Redmond, O = MicrosoftData, OU = SparkDepartment, emailAddress = me@microsoft.com, CN = microsoft.com
-----BEGIN CERTIFICATE-----
MIIFKTCCBBGgAwIBAgIUSdOq2Tj7VfjrzloBGTEED3YeNGMwDQYJKoZIhvcNAQEL
BQAwgZ8xCzAJBgNVBAYTAlVTMRMwEQYDVQQIDApXYXNoaW5ndG9uMRAwDgYDVQQH
DAdSZWRtb25kMRYwFAYDVQQKDA1NaWNyb3NvZnREYXRhMRgwFgYDVQQLDA9TcGFy
a0RlcGFydG1lbnQxHzAdBgkqhkiG9w0BCQEWEG1lQG1pY3Jvc29mdC5jb20xFjAU
BgNVBAMMDW1pY3Jvc29mdC5jb20wHhcNMjQwNDI0MDk0NzAyWhcNMjUwNDI0MDk0
NzAyWjCBnzELMAkGA1UEBhMCVVMxEzARBgNVBAgMCldhc2hpbmd0b24xEDAOBgNV
BAcMB1JlZG1vbmQxFjAUBgNVBAoMDU1pY3Jvc29mdERhdGExGDAWBgNVBAsMD1Nw
YXJrRGVwYXJ0bWVudDEfMB0GCSqGSIb3DQEJARYQbWVAbWljcm9zb2Z0LmNvbTEW
MBQGA1UEAwwNbWljcm9zb2Z0LmNvbTCCASIwDQYJKoZIhvcNAQEBBQADggEPADCC
AQoCggEBAL+l4Lto000/J9DEfqsuLZT48qh2K8gwQLOJvGu01LP+MqNm8QlT8K4r
hP6nShOoMTfMAEISbU9s+kN2/IjIl2fLGyHK+tB+NgMCo0mfdNyYmN/3oWfc4I1r
0sE+MfdhuC9VeayCyWTRR/O36PaggvmrAL45QQjqAUBgs0yZBnNtIRLy4QNm4ymS
yUvBzhJAyBmxuW1uuDo9SgoRk3EetxaUkObOT3fRyqoTKTU06Kpee8IK5CH4mhmr
ny/yVLHuaup13ZwQdmPJXZou2wIxa5fYqjeG46dVRT07IECl6KD/zoK+M227F0Ij
KQB2q5NlhgnkTxPpP0dJ54ophXkp6isCAwEAAaOCAVkwggFVMAwGA1UdEwQFMAMB
Af8wggFDBgNVHREEggE6MIIBNoIJbG9jYWxob3N0gh4qLnBiaWRlZGljYXRlZC53
aW5kb3dzLWludC5uZXSCIiouZGZzLnBiaWRlZGljYXRlZC53aW5kb3dzLWludC5u
ZXSCIyouYmxvYi5wYmlkZWRpY2F0ZWQud2luZG93cy1pbnQubmV0ghoqLmRmcy5m
YWJyaWMubWljcm9zb2Z0LmNvbYIbKi5ibG9iLmZhYnJpYy5taWNyb3NvZnQuY29t
gh4qLm9uZWxha2UuZmFicmljLm1pY3Jvc29mdC5jb22CGioucGJpZGVkaWNhdGVk
LndpbmRvd3MubmV0gh4qLmRmcy5wYmlkZWRpY2F0ZWQud2luZG93cy5uZXSCHyou
YmxvYi5wYmlkZWRpY2F0ZWQud2luZG93cy5uZXSHBH8AAAGHBH8AAAIwDQYJKoZI
hvcNAQELBQADggEBAEXF4WXBik4rb+xLj312GSu6oIgOPGLqOGnCseR6NU9DHaJo
MVG7Y4IEFwZI5VzPqS4sWoreNzhLwF2KbGXtZnWbs1LAAwLaOLQJx3uxRqFqH5BM
638GcXZ8Qc9Np82DQnw76lUah5BP/EkG6hgTcxeOF6m1yGaDJiwda43s+Y7CXmkD
XKSYxxqnvxGXlPnROyROnvIaRwd4l6UUYZmAEVaUjwuMdARJOhtn1vMLhNI0poS0
np39sqlWT/94vsdmWAF8/oPtyrocdKJha77vuLuRb1am1Wh6PwSp5I0HVmIsVeBk
Uah9Jj7LLdIySb8R00AIpdyp+7pj4Boz6VzctKs=
-----END CERTIFICATE-----
---
Server certificate
subject=C = US, ST = Washington, L = Redmond, O = MicrosoftData, OU = SparkDepartment, emailAddress = me@microsoft.com, CN = microsoft.com

issuer=C = US, ST = Washington, L = Redmond, O = MicrosoftData, OU = SparkDepartment, emailAddress = me@microsoft.com, CN = microsoft.com

---
No client certificate CA names sent
Peer signing digest: SHA256
Peer signature type: RSA-PSS
Server Temp Key: X25519, 253 bits
---
SSL handshake has read 1881 bytes and written 407 bytes
Verification: OK
---
New, TLSv1.3, Cipher is TLS_AES_256_GCM_SHA384
Server public key is 2048 bit
Secure Renegotiation IS NOT supported
No ALPN negotiated
Early data was not sent
Verify return code: 0 (ok)
---

It doesn't look like a Certificate Authority certificate, more like a self-signed one, so I don't know why the error is CaUsedAsEndEntity.

Interestingly, the same openssl operation used to give a self signed certificate error (see this deltalake issue for details), but it seems that something has changed in the openssl setup of the underlying Fabric VMs.

If anyone has any ideas for how to start solving this new issue (other than using the "allow_invalid_certificates"-hammer in perpetuity), I would be most thankful.

@martroben martroben added the bug Something isn't working label Apr 24, 2024
@martroben
Author

Upon further investigation, the correct command to check whether the OneLake certificate is a Certificate Authority certificate or an End Entity certificate is this:

openssl s_client -connect onelake.blob.fabric.microsoft.com:443 -showcerts | openssl x509 -text | grep "Basic Constraints" -A 1

This returns CA:TRUE, signifying that the certificate used is in fact a Certificate Authority cert, as the error suggests.
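
The same check can be sketched in Python by scanning the text that `openssl x509 -text` prints. This is only an illustration: the function name `basic_constraints_is_ca` is hypothetical, the `sample` string is a hand-written excerpt in the shape of openssl's output rather than captured output, and a missing Basic Constraints extension is treated as "not a CA" by assumption.

```python
import re


def basic_constraints_is_ca(x509_text: str) -> bool:
    """Return True if `openssl x509 -text` output declares CA:TRUE."""
    # openssl prints the value on the line after the extension header;
    # the header line may also carry the word "critical".
    match = re.search(
        r"X509v3 Basic Constraints:[^\n]*\n\s*CA:(TRUE|FALSE)", x509_text
    )
    if match is None:
        # Extension absent: treated here as "not a CA" (an assumption).
        return False
    return match.group(1) == "TRUE"


# Hand-written excerpt for illustration, not captured output.
sample = (
    "        X509v3 extensions:\n"
    "            X509v3 Basic Constraints: \n"
    "                CA:TRUE\n"
)
print(basic_constraints_is_ca(sample))
```

A certificate for which this returns True while it is being presented directly by a server is the situation the CaUsedAsEndEntity error describes.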

I'm not expecting Microsoft to alter their certificates to make it easier for people to use Polars in Fabric, so some workaround would still be appreciated.

Can anyone tell which underlying module or crate raises the CaUsedAsEndEntity error? Maybe some setting can be passed to skip the CA vs EE check. (Somehow the Spark-based delta.tables module doesn't seem to be bothered by a CA cert being used as an EE cert.)

@ion-elgreco
Collaborator

@martroben all storage options are passed to the object_store crate.

@martroben
Author

Posted an issue/question to object_store repo: apache/arrow-rs#5696

@hnasrullakhan

It looks like the object_store bump in #2311 caused this. An older version of the deltalake library, deltalake==0.16.2, works fine.

@ion-elgreco
Collaborator

@hnasrullakhan please make an issue in arrow-rs repo then, there were zero code changes on our side.

@hnasrullakhan

@ion-elgreco
Collaborator

@hnasrullakhan that's correct, but what I am saying is that this didn't require any changes on our side other than bumping the dependency. So you should open an issue upstream.

@martroben
Author

Upon further testing, deltalake==0.16.1 works fine, but starting from 0.16.2, I'm getting the error (also tested the latest: 0.17.3).

I think @hnasrullakhan also confirms that - their earlier claim about 0.16.2 working fine was a typo.

I don't see any changes in the object_store version between 0.16.1 and 0.16.2 (granted, I don't speak fluent Rust).

Does anyone have any idea what else could have introduced this error between these two versions?

@ion-elgreco
Collaborator

@martroben can you check against v0.18?

@martroben
Author

@ion-elgreco, I sure can, but on Monday, when I'm back in the office.

@hnasrullakhan

What changed in v0.18, @ion-elgreco?

@hnasrullakhan

Could still repro with v0.18

@martroben
Author

martroben commented Jun 11, 2024

@ion-elgreco, I confirm @hnasrullakhan's position: the same issue still occurs, even with deltalake==0.18.0:
error trying to connect: invalid peer certificate: Other(CaUsedAsEndEntity)


As an aside - I'm trying to push a case with MS support in parallel. Their initial position was that since there is no problem with the older versions of deltalake, it's a 3rd party problem.

I suggested that the root cause is still their improper use of certificates - the 3rd parties might have just tightened the rules about what they find acceptable to work with. Not sure if I'll win this argument though, being a mere mortal.

What can men do against such reckless hate?
- Théoden, son of Thengel

@martroben
Author

Apparently Polars is not the only downstream library where delta lake interactions broke around deltalake v0.17. The linked issue does not seem to be related to Fabric certificates however.

Nevertheless, for anyone looking, Daft might be a viable alternative to Polars soon - especially if they implement deletes and merges for delta lake.

@ion-elgreco
Collaborator

> Apparently Polars is not the only downstream library where delta lake interactions broke around deltalake v0.17. The linked issue does not seem to be related to Fabric certificates however.
>
> Nevertheless, for anyone looking, Daft might be a viable alternative to Polars soon - especially if they implement deletes and merges for delta lake.

That issue is not really related. Daft is using our internal methods for their writer, and changes to our internal methods are not marked as breaking changes :)

@martroben
Author

Thank you for the context @ion-elgreco. In that case it is indeed somewhat unfair for them to cite breaking changes in deltalake, when the issue is at least partly caused by their own misjudgment of what is exposed and what is not.

I'm still trying to understand, though, which exact change between v0.16.1 and v0.16.2 altered the behaviour of SSL connections.

@martroben
Author

Apparently the problem is no longer present in v0.18.1.

Not sure what caused the fix between 0.18.0 and 0.18.1. If I had to guess, it might be bumping object store from 0.9 to 0.10 where object store updated their reqwest dependency. I guess we'll never know, but I'm nonetheless happy.

Microsoft is still using a self-signed CA certificate as an EE certificate in OneLake connections from Fabric. However, I had a call with their support, and the product team has apparently promised to do something about the certificate. Not sure what, though. Hopefully it will not break whatever caused the fix.
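
For anyone pinning versions, the findings in this thread can be summarised as a small check. This is a sketch under stated assumptions: the boundaries come only from the versions tested above (0.16.1 works; 0.16.2, 0.17.1, 0.17.3 and 0.18.0 fail; 0.18.1 works), untested releases in between are assumed to behave like their neighbours, and the names `parse_version` and `reported_as_affected` are hypothetical.

```python
# Boundaries taken from the versions reported in this thread.
FIRST_FAILING = (0, 16, 2)
FIRST_FIXED = (0, 18, 1)


def parse_version(version: str) -> tuple:
    """Parse a plain "X.Y.Z" version string into a tuple of ints."""
    return tuple(int(part) for part in version.split(".")[:3])


def reported_as_affected(version: str) -> bool:
    """True if the version falls in the range reported as failing here."""
    return FIRST_FAILING <= parse_version(version) < FIRST_FIXED
```

For example, `reported_as_affected("0.17.3")` is True and `reported_as_affected("0.18.1")` is False. Since the Fabric-side certificate may also change, the result is a hint rather than a guarantee.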

@Josh-Hiz

Josh-Hiz commented Jun 29, 2024

Additionally, I get the following similar error on 0.18.1:
OSError: Generic MicrosoftAzure error: Error after 1 retries in 7824.991305s, max_retries:10, retry_timeout:180s, source:error sending request for url ...

This happens when performing a write on a very large table. Should I open a new issue for this, @ion-elgreco?

@djouallah

djouallah commented Aug 17, 2024

Microsoft deployed a new update to the notebook environment which should fix this issue. Could you please give it another try? (It may take some time to reach your particular region.)

5 participants