question: incorporation of HTTPS cert data for additional authenticity check #147
Comments
I'm very much not a TLS expert, and after thinking/reading up more I realize this is probably a much bigger project than I initially expected. To restate, the goal is to establish cryptographic proof that a web page's content was unmodified in transit from the declared domain.

Motivation

I believe (please correct me!) that currently a compliant WACZ archive with arbitrary content can be generated for any arbitrary domain (e.g. by editing /etc/hosts to point to your own IP). So if I created an archive right now of example.com and self-signed it according to wacz-auth, the owner of example.com would still be able to claim I simply fabricated the evidence when I created the archive, because WARC/WACZ doesn't appear to retain TLS data. I believe (not sure!) that TLS/HTTPS is intended to provide the exact kind of source authentication guarantees I want from WARC/WACZ, even if the current standards do not cryptographically verify timestamps like the wacz-auth spec authors would prefer.

Implementation

I still need to figure out which data we would need to add to WARC, and which outputs we need to grab from TLS. It's possible that the cryptographic guarantees I want cannot be provided from TLS, but I think they can. Wikipedia on TLS describes:
However, I am very much under the impression that this kind of complexity should be handled by libcurl or similar. I strongly suspect "generate cryptographic proof of webpage authenticity (which can later be verified offline against the public HTTPS cert)" is not a new idea, so I'm really hoping we can "just" adapt some existing code and add a few new fields to WARC or WACZ.
Please comment if you believe that I've misrepresented the authentication guarantees of TLS or otherwise missed a reason this won't actually work the way I want it to!
The problem with that is, short version, once both parties complete the handshake and have a shared symmetric key, all data transmitted can be forged by either party: nothing in an HTTPS transport is signed except for parts of the handshake. What you would need is for the server to actually produce a signature of the content using an asymmetric keypair (presumably the same one used for the TLS certificate), which is not the same as the shared symmetric key produced by asymmetric key exchange. That is essentially what the signed HTTP exchanges proposal, linked from the WACZ signing doc, provides.
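To make the forgeability point concrete, here is a minimal Python sketch. It is not real TLS: `session_key` is a hypothetical stand-in for the symmetric key both sides hold after the handshake, and an HMAC stands in for the record authentication that an actual TLS cipher suite would perform. All names are illustrative assumptions.

```python
import hashlib
import hmac
import os

# Assumption: stand-in for the symmetric key that BOTH client and server
# hold once the handshake is complete.
session_key = os.urandom(32)


def tag(key: bytes, body: bytes) -> bytes:
    """Authenticate a message with the shared key (simplified, not real TLS)."""
    return hmac.new(key, body, hashlib.sha256).digest()


def verify(key: bytes, body: bytes, candidate: bytes) -> bool:
    """Check a tag the same way either endpoint would."""
    return hmac.compare_digest(tag(key, body), candidate)


# What the server actually sent, authenticated with the shared key.
served_body = b"<html>what the server really sent</html>"
server_tag = tag(session_key, served_body)

# The client holds the same key, so it can authenticate any content it likes.
forged_body = b"<html>what the archivist wishes it had sent</html>"
client_tag = tag(session_key, forged_body)

genuine_ok = verify(session_key, served_body, server_tag)
forged_ok = verify(session_key, forged_body, client_tag)

# Both checks succeed, so a third party cannot tell which side produced a tag.
print(genuine_ok, forged_ok)
```

Because both tags verify under the same key, a recorded session alone cannot show which endpoint authored a given message. A signature made with a private key that only the server holds would not have this property.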
Ah, ok—this makes perfect sense! I was hoping that the symmetric key itself would be a form of signature somehow, but I absolutely see now how symmetric session encryption is intrinsically forgeable because the same key is available to both participants.
Thank you so much! I believe I mistakenly conflated the (very cool and necessary) timestamp verification mechanisms developed for wacz-auth with their very short dismissal of TLS/HTTPS, and wasn't sure whether NIH was at play (because I'm not an expert in networking yet). But it's clear to me now why signed HTTP is actually a very direct answer to this. I will maybe look to see if I can improve the wording here to avoid mistakes like mine, but I'm not sure if that's necessary now that I have more context.
@cosmicexplorer it's a good question (perhaps we should cover it somewhere) and @rneilson thanks for the quick response! Yes, unfortunately, TLS lacks 'non-repudiation', so it is not possible to use TLS to prove that a particular decoded response was in fact served by the server, due to the symmetric key. The best we can do is prove that a particular party (the observer/witness) created the archive. We use TLS certs to extend the existing PKI and cert transparency logs to be able to say that a particular archive was created by whoever owns a particular TLS cert. E.g., we can create an archive and sign it with a cert such as

Unfortunately, this still requires domain ownership, and we don't yet have a clear solution for ascertaining identity without a domain.
Hello, I love this project!!! I was reviewing the verification mechanisms for WACZ files at https://specs.webrecorder.net/wacz-auth/0.1.0/#proof-of-authenticity, and noticed a peculiar wording in their requirements definition (emphasis mine):
The rest of the document goes on to describe the difficulty with verifying timestamps (and their great mechanism to address that), which I understand is necessary. However, if I created a WACZ archive of some pages, is it currently possible for the website operator to simply claim the WACZ was falsified when it was first created (e.g. the archive was created against a fake site with the same domain)? Even if the HTTPS cert doesn't verify timestamps, it seems extremely useful to be able to say "this website definitely absolutely served this content at some point" by incorporating the HTTPS certificate, and then relying on the rest of the authenticity spec to provide additional confirmation of its contents.
Do I understand this problem correctly, or does WACZ already incorporate the HTTPS certificate at time of archive creation in a way that verifies the content was actually downloaded from the remote server? It seems like the wacz-auth spec tries very hard to solve a more specific problem with timestamping and may have missed the opportunity to add this additional layer of verification, but I'm not sure.