
Inconsistent "Server certificate CA fingerprint does not match the value configured in caFingerprint" (potential race condition?) #2355

Open
the-gabe opened this issue Aug 29, 2024 · 8 comments


@the-gabe

the-gabe commented Aug 29, 2024

🐛 Bug report

We have an application under development that uses self-hosted Elasticsearch with self-signed certificates; clients connect over TLS and verify the server using CA fingerprints. However, we are running into what appears to be a bug in the library, or potentially even in Elasticsearch itself. After several hours of testing, the issue does not occur consistently.

To reproduce

I have uploaded a repo here which is a stripped down poc of the issue based on our application code.

https://github.com/the-gabe/elastic-failure/tree/main

usage instructions:

curl -O https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-8.15.0-linux-x86_64.tar.gz

bsdtar xvf elasticsearch-8.15.0-linux-x86_64.tar.gz

cd elasticsearch-8.15.0

./bin/elasticsearch

note down the fingerprint and password when they are printed in the terminal

git clone https://github.com/the-gabe/elastic-failure

cd elastic-failure

edit packages/indexer/vars.bash so that ELASTIC_QUEUE_PASSWORD, ELASTIC_VECTOR_PASSWORD, ELASTIC_QUEUE_FINGERPRINT and ELASTIC_VECTOR_FINGERPRINT reflect the password and CA fingerprint you noted down.

cd packages/indexer

npm ci --no-scripts

npm run build

bash vars.bash

Observe the terminal output: both clients obtain the Elasticsearch version just fine, but a caFingerprint failure follows. The output is included in the root of the repo, in this file: https://github.com/the-gabe/elastic-failure/blob/main/logoutput.txt This file was created in the Arch Linux environment described below, with Elasticsearch 8.15.0. On Azure App Service, we were using Elasticsearch 8.14.3-1 on RHEL 9.

I estimate that this issue reproduces around 30-40% of the time, but that figure is a guess and is not backed by systematic testing. I have found that starting with a fresh elasticsearch-8.15.0 folder can help, but this may be coincidence. I suspect, speculatively, that it could be a race condition.
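To turn the 30-40% estimate into a measured number, a small harness along these lines could be used. This is a minimal sketch, not part of the linked repo: `measureFailureRate` and its `attempt` callback are hypothetical names, and the commented-out client setup assumes the connection details from vars.bash.

```javascript
// Hypothetical harness: runs `attempt` a fixed number of times and tallies
// failures by error message, so intermittent errors can be counted.
async function measureFailureRate(attempt, runs) {
  const failures = new Map(); // error message -> number of occurrences
  let failed = 0;
  for (let i = 0; i < runs; i++) {
    try {
      await attempt(i);
    } catch (err) {
      failed++;
      const key = err instanceof Error ? err.message : String(err);
      failures.set(key, (failures.get(key) ?? 0) + 1);
    }
  }
  return { runs, failed, rate: runs === 0 ? 0 : failed / runs, failures };
}

// Assumed usage against a real cluster (not executed here; node, password
// and caFingerprint are placeholders you would supply):
//
// const { Client } = require('@elastic/elasticsearch');
// measureFailureRate(async () => {
//   const client = new Client({
//     node,
//     auth: { username: 'elastic', password },
//     caFingerprint,
//     tls: { rejectUnauthorized: false },
//   });
//   try { await client.info(); } finally { await client.close(); }
// }, 100).then((result) => console.log(result));
```

Creating a fresh client per attempt is a deliberate choice in this sketch: each attempt then opens its own connection, which is where the fingerprint check runs.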

Expected behavior

this just should not happen

Node.js version

Node.js v22.6.0 on Arch Linux, v20.11.1 on Azure App Service, v20.16.0 on Debian 12

@elastic/elasticsearch version

8.15.0

Operating system

Arch Linux on WSL2, Debian 11 on Azure App Service, Debian 12

Any other relevant environment information

No response

@the-gabe
Author

the-gabe commented Aug 29, 2024

Additionally, we hacked the library to try to see what was going on in the fingerprint comparison. Logs have been attached here:

https://github.com/the-gabe/elastic-failure/blob/main/appservice-hackedlib.txt (Note: for clarity w.r.t line numbers, this log was run with our actual application, not the code in the git repo)

We modified node_modules/@elastic/transport/lib/connection/UndiciConnection.js

Here is a snippet of how it looked.

        if (this[symbols_1.kCaFingerprint] !== null) {
            const caFingerprint = this[symbols_1.kCaFingerprint];
            const connector = (0, undici_1.buildConnector)(((_a = this.tls) !== null && _a !== void 0 ? _a : {}));
            undiciOptions.connect = function (opts, cb) {
                connector(opts, (err, socket) => {
                    if (err != null) {
                        return cb(err, null);
                    }
                    if (caFingerprint !== null && isTlsSocket(opts, socket)) {
                        const issuerCertificate = (0, BaseConnection_1.getIssuerCertificate)(socket);
                        /* istanbul ignore next */
                        if (issuerCertificate == null) {
                            socket.destroy();
                            return cb(new Error('Invalid or malformed certificate'), null);
                        }
                        // Check if fingerprint matches
                        /* istanbul ignore else */
                        console.log("this is what we provided to the lib   " + caFingerprint);
                        console.log("This is what was pulled from socket   " + issuerCertificate.fingerprint256);
                        if (caFingerprint !== issuerCertificate.fingerprint256) {
                            socket.destroy();
                            return cb(new Error("Server certificate CA fingerprint does not match the value configured in caFingerprint"), null);
                        }
                    }
                    return cb(null, socket);
                });
            };
        }

For the sake of 100% clarity: we triple-checked that "4F:57:DA:6A:80:46:C5:9F:BD:9E:49:78:BA:26:A2:FC:39:1D:32:B7:63:6C:7D:96:82:6A:1E:C5:BE:24:26:48" is the correct CA fingerprint. We verified it several times with openssl x509 -fingerprint -sha256 -in /etc/elasticsearch/certs/http_ca.crt | grep Fingerprint, and we have other applications using it without problems.
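The same value can be cross-checked from Node itself, since a certificate's SHA-256 fingerprint is the SHA-256 digest of its DER encoding, printed as colon-separated uppercase hex. The following is a minimal sketch; `pemToDer` and `sha256Fingerprint` are hypothetical helper names, and the sketch assumes the PEM file holds a single certificate.

```javascript
const { createHash } = require('node:crypto');

// Strip the PEM armor and decode the base64 body to raw DER bytes.
// Assumes exactly one certificate in the input.
function pemToDer(pem) {
  const base64 = pem
    .replace(/-----BEGIN CERTIFICATE-----/, '')
    .replace(/-----END CERTIFICATE-----/, '')
    .replace(/\s+/g, '');
  return Buffer.from(base64, 'base64');
}

// Hash the DER bytes and format the digest as "AA:BB:...", the same
// shape as the caFingerprint string quoted above.
function sha256Fingerprint(der) {
  const hex = createHash('sha256').update(der).digest('hex').toUpperCase();
  return hex.match(/.{2}/g).join(':');
}

// Assumed usage (path taken from the openssl command above):
// const fs = require('node:fs');
// const pem = fs.readFileSync('/etc/elasticsearch/certs/http_ca.crt', 'utf8');
// console.log(sha256Fingerprint(pemToDer(pem)));
```

On recent Node versions, `new crypto.X509Certificate(pem).fingerprint256` should give the same string without the manual decoding.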

@JoshMock
Member

JoshMock commented Sep 3, 2024

Just to rule it out: it wouldn't have anything to do with this change, would it?

@the-gabe
Author

the-gabe commented Sep 5, 2024

Hi @JoshMock, I don't think so. The fingerprints read from the socket are coming back as undefined.
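Because the patched check compares with `!==`, an undefined `fingerprint256` is reported with the same "does not match" error as a genuinely different fingerprint. A minimal sketch of a diagnostic that keeps the cases apart follows; `classifyFingerprintCheck` is a hypothetical helper for debugging, not the library's API and not a proposed fix.

```javascript
// Hypothetical diagnostic: distinguishes "no fingerprint available on the
// certificate object" from "fingerprint present but different".
function classifyFingerprintCheck(expected, issuerCertificate) {
  // The library snippet above reports this case as
  // 'Invalid or malformed certificate'.
  if (issuerCertificate == null) {
    return 'no-issuer-certificate';
  }
  const actual = issuerCertificate.fingerprint256;
  // An object without fingerprint256 (for example an empty object) ends up
  // here; the snippet above would report it as a mismatch.
  if (typeof actual !== 'string' || actual.length === 0) {
    return 'fingerprint-unavailable';
  }
  return actual === expected ? 'match' : 'mismatch';
}
```

Logging this classification next to the two `console.log` lines would show whether the intermittent failures are all of the "fingerprint-unavailable" kind, which would point at how the certificate is read from the socket rather than at the configured value.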

@the-gabe
Author

the-gabe commented Sep 5, 2024

@JoshMock We confirmed that this is not related by testing with @elastic/transport 8.7.0 instead of 8.7.1.

@JoshMock
Member

JoshMock commented Sep 5, 2024

Got it, I didn't look at the logs closely enough to see that it was undefined. Definitely not related. 👍

@alimoezzi

Hi,
I'm also having a similar issue. I'm getting the error: Unhandled Rejection at: Promise [object Promise] reason ConnectionError: Invalid or malformed certificate. The caFingerprint is valid: it works in the Python client, but in the JS client it results in this error. I'm using 8.15.0 and Node 22.

@the-gabe
Author

the-gabe commented Oct 7, 2024

Hi @JoshMock, have you managed to look into this? This is now impacting our production environments for this application, and it's not a situation we are comfortable with. Being able to connect to Elasticsearch securely, in an encrypted and authenticated fashion, is quite literally a mission-critical function of the library. Is any progress being made on this bug in a private capacity?

@JoshMock
Member

No action has been taken yet, @the-gabe. I'm Elastic's only active maintainer of this project, and I've been either on PTO or occupied with higher priorities for the last few weeks. I will take a look as soon as I have time.

If you need a fix more urgently, pull requests are always welcome. I am typically able to review and merge a PR within a couple of working days if it has tests and all CI checks are passing.


3 participants