Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

No clear way to handle MQTT broker SSL certificate expiration #433

Closed
benjamincburns opened this issue May 10, 2017 · 18 comments
Closed

No clear way to handle MQTT broker SSL certificate expiration #433

benjamincburns opened this issue May 10, 2017 · 18 comments
Labels
stale Action - Issue left behind - Used by the BOT to call for attention

Comments

@benjamincburns
Copy link

It doesn't seem that there's a clear path for the MQTT SSL support to automate broker fingerprint updates prior to server cert expiry?

Ideally this could be automated to happen in some way without any downtime. For instance, it would be nice if there was some way of publishing the fingerprint for the new cert prior to the old cert expiring via some "internal" MQTT topic.

I think apart from the logic to listen on the special fingerprint update topic, this would require the ability to specify two fingerprints to validate against, with one being regarded as the old/current fingerprint, and the other being the next/future fingerprint. Then if there were some mechanism to transparently invalidate the old fingerprint and then make the next/future fingerprint become the old/current when it starts being used by the server, I think this would be done and dusted.

@benjamincburns benjamincburns changed the title No clear way to handle MQTT broker certificate expiration No clear way to handle MQTT broker SSL certificate expiration May 10, 2017
@khcnz
Copy link

khcnz commented May 11, 2017

There is a command https://github.com/arendst/Sonoff-Tasmota/wiki/Commands MQTTFingerprint.

but right now you would either have to:
a) Change your cert immediately after publishing to all devices the new thumbprint (which you can do as a group topic) OR
b) You could push a new thumb and new domain - however i think by default it reboots after changing either of these, you would need to alter the firmware to only reboot after you have completed both

These are small embedded devices with very limited ram/flash - i'd suggest both of the workarounds above would be acceptable. In what case are you expecting your cert to expire anyway? IMO these shouldn't really be connecting directly over the internet - should be connecting to a local broker (and thus you can set a long expiry on a self signed cert).

@benjamincburns
Copy link
Author

I'd like to set up tooling to manage these devices remotely, and I think it's unreasonable to run the risk of devices being made inaccessible by risking their becoming orphaned from the MQTT broker.

I definitely like the approach of pushing a new fingerprint and new domain better than my original idea, but what recourse do I have if I've fat-fingered the domain name? More reliable yet would be for there to be the ability to specify a primary and a backup MQTT broker host/port/fingerprint/etc, and some command for switching the priority of the two.

While I appreciate the pragmatism of your suggestions around expiry times and connecting directly over the internet, I think that it would be unwise for that pragmatism to manifest as limitations in scope for a project like this. Long expiry times compromise security. Let's Encrypt explains why this is better than I ever could, and while it's true that these are low RAM/flash devices, I don't see why that should limit them from initiating connections over the open internet. "If something is painful, do more of it." Nothing will make it more robust/reliable/secure than being forced to deal with the problems foisted upon it by the open internet.

@benjamincburns
Copy link
Author

@arendst - what's the likelihood of you accepting a PR w/ these changes?

@khcnz
Copy link

khcnz commented May 11, 2017

Well as an example AFAIK it is impossible to run TLS 1.2 on these devices - like literally impossible due to memory constraints. Any other versions of SSL/TLS have either known real world of theoretical attacks - hence my suggestion of a local broker used as a relay.

Having said that a way to have a secondary server makes some sense - currently multiple wifi and multiple NTP servers are all accepted.

@benjamincburns
Copy link
Author

@khcnz on the off chance that you have the ability to accept a PR (I see your comments in the issues all over the place here), would you accept a PR for this?

And fair enough re: TLS 1.2. I'm not familiar enough with the limitations there to debate you on that, but if certificate chain validation is the blocker, I thought fingerprint validation was an acceptably secure workaround? Though I could imagine it'd be a lot more likely in the scenario I've mentioned for your devices to be hijacked by one of your own keys being compromised than it would be in the scenario with your certs backed by a proper CA chain...

@davidelang
Copy link
Collaborator

davidelang commented May 11, 2017 via email

@khcnz
Copy link

khcnz commented May 11, 2017

@benjamincburns only theo can accept pull requests - but I'd wager on his behalf that he would accept one adding a secondary mqtt broker address / fingerprint as long as existing comamnds are backwards compatible and upgrade happens automatically etc. Agreed with @davidelang fingerprint is probably all you want... but as mentioned you don't ever want to miss updating your certs and let them expire - as you have no 3rd party that can vouch you are you :) (both a good and bad thing). If that ever happened you would have to flash them (or setup a custom NTP server to "roll back time")

@davidelang
Copy link
Collaborator

davidelang commented May 11, 2017 via email

@benjamincburns
Copy link
Author

@davidelang & @khcnz thanks for your comments, and I agree that fingerprint validation seems sensible for devices like this.

Provided I can my devices connecting to my broker via SSL again, I'll likely send a PR tomorrow.

@arendst
Copy link
Owner

arendst commented May 12, 2017

@benjamincburns as long as your PR is short on code and memory I might be able to implement it. Problem is TLS is using too much code already to use OTA.

I am currently focusing on implementing Arduino-ESP8266 version > 2.3.0 which uses another 25k of code space and 4k of RAM but solves many problems that users currently experience like web page hangs due to better TCP buffer handling and PWM. In the current environment it works great but leaves only 4k code space to play with while keeping OTA available and that's still without TLS...

To be honest, TLS on the sonoff is not my main concern as explained great by @khcnz above.

@benjamincburns
Copy link
Author

benjamincburns commented May 14, 2017

It looks like the SYSCFG struct is organized such that new entries are added to the end (I assume to avoid breaking backward compatibility with settings which are already written to flash). However I also see that fields with more than one value (e.g. sta_ssid) are defined as a 2D array.

I'm going to assume that breaking backward compatibility w/ the settings in flash is worse than breaking style, and will be adding the fields for the new MQTT values at the end of the struct. If this is an incorrect assumption, please let me know.

@davidelang
Copy link
Collaborator

davidelang commented May 14, 2017 via email

@benjamincburns
Copy link
Author

benjamincburns commented May 14, 2017

@davidelang I am adding the backup host/port/user/pw settings as well. I don't mean to argue with your advice, but since this likely won't be my last PR, can you please explain then why fields like ex_mqtt_button_retain and their neighbours are kept around if alignment of the struct from version to version isn't important? Perhaps this was just an oversight? I'd rather use the arrays than different names, as this makes toggling between the two trivial.

Edit: Now that I've written the CFG_Delta code, I see that alignment with past struct versions is required.

Aside from all that -- regarding code size, I actually got a bit distracted yesterday investigating some code size reductions to try and "pay" for this change. Strangely the PlatformIO build is around 1KiB bigger than the Arduino IDE's build. I suspect some of this is the linker script being used. However I added some optimisation args to my platform.txt which are not covered by -Os and dropped my Arduino build by another ~1KiB. Will open another issue and PR for these changes once I'm done w/ my present work.

@davidelang
Copy link
Collaborator

davidelang commented May 14, 2017 via email

@arendst
Copy link
Owner

arendst commented May 14, 2017

The layout of flash settings is sacred! Any change will likely cause people to loose wifi or mqtt connection leading to serial re-flash.

If you need to add a setting append it please. I'll look into integration in the idle settings to keep the overall settings image as small as possible as someday it has to move to another area of flash due to wear.

I agree that arrays are more usefull and if the values are small that's fine but keep in mind that in that case you also have to write migration code for previous version settings to your newly defined array.

@wiktorschmidt
Copy link

Just wanted to share this here: https://github.com/copercini/esp8266-aws_iot

Also a related discussion:
aws/aws-iot-device-sdk-embedded-C#7

@stale
Copy link

stale bot commented Apr 23, 2018

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale Action - Issue left behind - Used by the BOT to call for attention label Apr 23, 2018
@stale
Copy link

stale bot commented May 7, 2018

This issue will be auto-closed because there hasn't been any activity for a few months. Feel free to open a new one if you still experience this problem.

@stale stale bot closed this as completed May 7, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stale Action - Issue left behind - Used by the BOT to call for attention
Projects
None yet
Development

No branches or pull requests

5 participants