Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for GSSAPI/Kerberos signed updates #30

Merged
merged 5 commits into from
Feb 19, 2021
Merged

Conversation

bodgit
Copy link
Contributor

@bodgit bodgit commented Mar 2, 2018

This PR adds support for performing DNS updates using GSSAPI/Kerberos which makes it possible to update zones on Windows servers that have the zone require "Secure only" updates, (BIND can also be configured to support this mechanism).

To configure the provider, you need something like:

provider "dns" {
  update {
    server = "ns.example.com"
    gssapi = true
  }
}

Terraform needs to be run from a shell that has a valid Kerberos session which normally means having run kinit beforehand on Linux/Unix platforms. Current limitations are that it only supports using the credentials of the current shell running Terraform, you can't assume the identity of another user. This is mostly down to the fact I have yet to find a library for Go that both supports the necessary GSSAPI calls for doing the DNS exchange as well as the necessary Kerberos calls for creating a session as another user.

All of the heavy lifting is done by https://github.com/bodgit/tsig and it uses GSSAPI on Linux/Unix using https://github.com/apcera/gssapi via a native shared library or on Windows using SSPI through https://github.com/alexbrainman/sspi which has a very similar API to GSSAPI.

I've marked it WIP as it's currently reliant on a pending PR (miekg/dns#626) for the http://github.com/miekg/dns package to add support for non-HMAC TSIG algorithms but I figured I'd put the changes out there for people to test.

@apparentlymart
Copy link
Contributor

Hi @bodgit! This looks interesting and useful... thanks for working on it.

Unfortunately it looks like the gssapi requires CGo, and we're not currently able to use CGo in the Terraform providers here since our release process uses cross-compilation and we don't have cross-toolkits available for other platforms in our build environment. 😖

Do you think it's possible to support this functionality using a pure-Go Kerberos implementation? I see gokrb5 out there but I'm pretty rusty on Kerberos so I'm not sure if it supports sufficient functionality to get what we need to support Windows DNS updates here. I also realize that a re-implementation of the Kerberos functionality is non-ideal since the usual way to add Kerberos functionality to an application on Linux is via GSSAPI and so interop with other applications might not be perfect.

We've attempted to introduce CGo dependencies into providers before and found that it would require quite extensive updates to the release process which unfortunately we do not have the resources to do at this time due to other work. 😞

@bodgit
Copy link
Contributor Author

bodgit commented Mar 20, 2018

I wasn't sure if the CGo requirement was going to be an issue, makes sense.

I did look initially at using jcmturner/gokrb5 as it was a pure-Go implementation, (I have jcmturner/gokrb5#65 open from when I explored using it). From memory it works fine for the initial part of negotiating the context between the client and server however it currently (or did, I should check again) lacks an implementation of the gss_get_mic() and gss_verify_mic() functions which are used to create and verify the TSIG on messages which is why I then switched to the apcera/gssapi implementation.

However that implementation lacks a lot of the Kerberos-specific functions which then requires that you run kinit beforehand and basically set your session up before running Terraform, so switching would actually make this provider a bit easier to use in that regard; I just need to see how hard it is to get the missing functions implemented. It's one thing to use a Kerberos/GSSAPI library, quite another to implement one, but I'll see what I can do.

@bodgit
Copy link
Contributor Author

bodgit commented May 29, 2018

Not sure how I closed this.

@arnisoph
Copy link

Any updates?

@bodgit
Copy link
Contributor Author

bodgit commented Dec 11, 2018

@arnisoph I haven't had chance to revisit the pure-Go implementation, however the pivotal bit is the changes to the underlying DNS library; the main project maintainer seems to be ignoring the PR and insisting that I'm breaking backwards compatibility when I'm not and at least one other maintainer of the library seem to agree with me. I suspect my PR that is open there doesn't apply cleanly now however I'm in no rush to rebase it if it's just going to sit there and be ignored some more.

I'm not going to drop this as it's still actively being used, albeit just a custom in-house build off of this branch but the more people who can voice support would be useful.

@arnisoph
Copy link

We might need to maintain such a inhouse build, too. I'd like to prevent invoking the CLI tool nsupdate from TF..

@bodgit
Copy link
Contributor Author

bodgit commented Dec 30, 2018

I've finally managed to improve the https://github.com/jcmturner/gokrb5 library with the missing bits so there is now a pure Go implementation which works. I just need the outstanding PR merged to the https://github.com/miekg/dns library.

@ghost ghost added size/L documentation and removed size/XXL labels Jan 21, 2019
@bodgit bodgit force-pushed the gssapi branch 2 times, most recently from b98a566 to 0a1b93a Compare January 29, 2019 17:53
@ghost ghost added size/XXL and removed size/L labels Jan 29, 2019
@bodgit bodgit changed the title [WIP] Support for GSSAPI/Kerberos signed updates Support for GSSAPI/Kerberos signed updates Feb 21, 2019
@bodgit
Copy link
Contributor Author

bodgit commented Feb 21, 2019

This is actually ready for review, and hopefully merge. The code now uses a pure Go Kerberos/GSSAPI implementation and is also therefore more versatile. It can be used as in my original example which means it relies on the current user context, or you can explicitly configure it with a realm, username and either a password, or a keytab (keytab not supported on Windows):

provider "dns" {
  update {
    server   = "ns.example.com"
    gssapi   = true
    realm    = "EXAMPLE.COM"
    username = "user"
    password = "secret"
  }
}

or:

provider "dns" {
  update {
    server   = "ns.example.com"
    gssapi   = true
    realm    = "EXAMPLE.COM"
    username = "user"
    keytab   = "/path/to/keytab"
  }
}

Now, the sticky bit is the maintainer of the underlying DNS library just ignored my PR and wouldn't engage in a discussion. To that end I've ended up embedding and extending enough of the library so there is a compatible Exchange() method with the same signature that understands the additional TSIG methods. It reuses as much of the library as possible but I've had to copy some non-exported functions.

@theeternalrat
Copy link

@bodgit , is there any ETA on when this might be merged?

@bodgit
Copy link
Contributor Author

bodgit commented May 24, 2019

@atkinson137 I need to fix up the handful of merge conflicts which have appeared in the meantime, however I don't have commit access to this repository so I'm/we're waiting on someone from Hashicorp to 👍 or 👎 this.

I appreciate getting Terraform v0.12 ready has probably been taking up most of Hashicorp's time recently but there are a few issues and PR's that are just waiting for someone with commit privs. Sadly I don't think this provider gets as much love as say the AWS one, (happy to help with this!).

@bodgit bodgit force-pushed the gssapi branch 2 times, most recently from 5488a59 to f285960 Compare June 12, 2019 16:19
@bodgit
Copy link
Contributor Author

bodgit commented Jun 12, 2019

Right, should be current against master again now.

@bodgit
Copy link
Contributor Author

bodgit commented Jan 19, 2021

@kmoe @apparentlymart any chance I could possibly get a review on this and the other two PR's please?

They look scary but that's purely down to touching the vendor directory, which is why I've kept them as separate PR's or commits. Incidentally, I notice some TF providers (AWS for instance) have now dropped vendoring modules.

@bhoriuchi
Copy link

@bodgit just wondering if you were able to address the wildcard DNS issue with the latest changes? Additionally wondering if the provider still requires a krb5.conf file or if there is opportunity to load the config from a variable? Either way thanks for keeping this effort going!

@bodgit
Copy link
Contributor Author

bodgit commented Jan 21, 2021

@bhoriuchi It appears Windows doesn't support creating wildcard DNS entries using dynamic DNS updates. I've verified using a small piece of test code to send the wildcard update; it doesn't work against Windows but works fine against a Kerberos-enabled BIND. If I create a non-wildcard record it succeeds on both. I've also tried using nsupdate to send the update and that doesn't work either so I'm reasonably confident it's a bug/implementation detail on the Windows side, which is annoying.

My code (hopefully) honours all of the standard Kerberos environment variables so it will use the $KRB5_CONFIG environment variable if it's set. As for passing the file contents in as a variable, that isn't really portable to Windows as it has no relevance there so it needs to be done in a way that keeps the API sane. That then also needs to be surfaced by this provider code, does that just have a parameter with a raw krb5.conf as heredoc? I have some ideas on how to do it in a clean way but I'd like some feedback on this PR first before I change it any further. It can always be an incremental update later.

@kmoe
Copy link
Member

kmoe commented Jan 29, 2021

@bodgit Thank you for your saintly patience and the work you've put in to this PR. After some internal reshuffling at HashiCorp, this repository is now maintained by me. Since my team is responsible for more than 20 other repositories, this maintenance will be rather slow, especially for new features, since, as Martin indicates, our top priority is the stability and correctness of these very widely used providers. They are being monitoried for contributions, however, and priority is based on a number of factors including popularity measured in 👍 reactions.

As mentioned in #118, I'd like to avoid a major version bump of this provider, so it's best if we continue supporting the (admittedly deprecated) HMAC-MD5 algorithm. Once that is added I think we can merge all three PRs at the same time.

@bodgit
Copy link
Contributor Author

bodgit commented Feb 1, 2021

@kmoe I knew there was something I'd forgotten to mention in my comments, HMAC-MD5 should be supported still. The PR to the upstream DNS library I had merged allows you to plug in your own TSIG implementations so I just provide my own that still supports HMAC-MD5 as well as all of the other algorithms.

I've added a test case to make sure it works however I'm having trouble getting Travis to actually run the tests; it looks like it has such a backlog of jobs to run that the tests for this repo never manage to make it to the front of the queue before they get cancelled after something like 5 or 6 hours. I've checked this morning and the backlog is still over 2,600 jobs with only ~ 240 running at any one time. I will periodically keep trying to schedule this and I'll squash the empty commits away once I get things to run.

@kmoe
Copy link
Member

kmoe commented Feb 1, 2021

@bodgit Thanks for adding the MD5 test. There's an issue with Travis at the moment that I'll be investigating today. Unfortunate timing, as it was working last week. By the end of today, either we'll have Travis working again, or we'll start migrating onto our new CI system and I'll approve this PR after the tests pass locally.

Base automatically changed from master to main February 1, 2021 17:27
Copy link
Member

@kmoe kmoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can merge this once the GitHub Actions setup is complete. Certain acceptance tests aren't working yet, for reasons unrelated to this PR. Thanks for your patience!

@bodgit
Copy link
Contributor Author

bodgit commented Feb 2, 2021

If you need me to look at anything or rebase things, let me know.

@kmoe
Copy link
Member

kmoe commented Feb 19, 2021

@bodgit I'd like to merge this soon despite the ongoing GitHub Actions problem (which is unrelated to this PR). Tests were all passing locally when I last checked, but I ran them again just now and get the following error:

=== RUN   TestAccDnsTXTRecordSet_Basic
    resource_dns_txt_record_set_test.go:41: Step 1/6 error: Error running apply: 
        Error: Error updating DNS record: Error negotiating GSS context: dial tcp 92.242.132.24:53: i/o timeout

Would you mind rebasing and taking a look at this? If I have time later today I'll investigate myself.

@bodgit
Copy link
Contributor Author

bodgit commented Feb 19, 2021

Sure. Where's that external DNS server coming from though? If I do a reverse lookup it comes back as unallocated.barefruit.co.uk, is that a personal domain?

Switch to using our own Dockerfile. This creates three container images:

* Kerberos KDC. This creates a new realm and two keytab files; one for
  BIND to use and another for a test user.
* BIND DNS server. This has Kerberos configured but can be operated
  without.
* User keytab. This exists purely to extract the keytab file out of the
  KDC container.

The containers need to be created in sync with each other as the keytab
files need to match up.

The acceptance tests now build the containers from scratch each time,
although the layers should be cached so repeat builds will be fast.

The tests then run as before using the BIND container with a different
/etc/named.conf volume mounted into place with either no TSIG
configured, an HMAC-SHA256 TSIG key, or using Kerberos/GSS-TSIG. In the
last case the KDC container is also created.

For now, no Kerberos-enabled tests are actually run.

The Travis build environment needed to be bumped from Trusty to Focal to
get a new enough version of Docker to be able to use the buildx plugin,
which is also installed.
This commit is the result of running the following:

* go get github.com/bodgit/tsig
* go mod tidy
* go mod vendor
Runs three versions; password authentication, keytab authentication, and
using an existing ticket/session.
@kmoe
Copy link
Member

kmoe commented Feb 19, 2021

Bizarre, I have no idea. All I did was pull your branch locally and run ./internal/provider/acceptance.sh.

@kmoe
Copy link
Member

kmoe commented Feb 19, 2021

Ha, TIL! https://www.barefruit.com/

Barefruit generates highly targeted traffic for ISPs by replacing DNS and HTTP errors with relevant advertising.

Well then...

@kmoe
Copy link
Member

kmoe commented Feb 19, 2021

With 8.8.8.8:

--- FAIL: TestAccDnsSRVRecordSet_Basic (0.75s)
=== RUN   TestAccDnsTXTRecordSet_Basic
    resource_dns_txt_record_set_test.go:41: Step 1/6 error: Error running apply: 
        Error: Error updating DNS record: Error negotiating GSS context: dial tcp: lookup ns.example.com: no such host

Not sure why I'm getting this. Do the tests pass for you, @bodgit ?

@bodgit
Copy link
Contributor Author

bodgit commented Feb 19, 2021

If you're running them locally, you might have to pop a 127.0.0.1 ns.example.com entry into your /etc/hosts (assuming you're on a Unix-ish system, there's an equivalent on Windows buried under C:\windows\ somewhere). I think I did that step as part of the Travis job setup.

@bodgit
Copy link
Contributor Author

bodgit commented Feb 19, 2021

There's something screwy in my install, if I try and run the tests on main, or this branch, it always errors out here:

+ make testacc TEST=./internal/provider
==> Checking that code complies with gofmt requirements...
TF_ACC=1 go test ./internal/provider -v  -timeout 120m
=== RUN   TestAccDataDnsARecordSet_Basic
--- PASS: TestAccDataDnsARecordSet_Basic (20.96s)
=== RUN   TestAccDataDnsAAAARecordSet_Basic
--- PASS: TestAccDataDnsAAAARecordSet_Basic (10.23s)
=== RUN   TestAccDataDnsCnameRecordSet_Basic
--- PASS: TestAccDataDnsCnameRecordSet_Basic (10.32s)
=== RUN   TestAccDataDnsMXRecordSet_Basic
--- PASS: TestAccDataDnsMXRecordSet_Basic (10.35s)
=== RUN   TestAccDataDnsNSRecordSet_Basic
--- PASS: TestAccDataDnsNSRecordSet_Basic (10.41s)
=== RUN   TestAccDataDnsPtrRecordSet_Basic
--- PASS: TestAccDataDnsPtrRecordSet_Basic (10.15s)
=== RUN   TestAccDataDnsSRVRecordSet_Basic
--- PASS: TestAccDataDnsSRVRecordSet_Basic (10.45s)
=== RUN   TestAccDataDnsTxtRecordSet_Basic
--- PASS: TestAccDataDnsTxtRecordSet_Basic (15.23s)
=== RUN   TestHashIPString
--- PASS: TestHashIPString (0.00s)
=== RUN   TestProvider
--- PASS: TestProvider (0.00s)
=== RUN   TestProvider_impl
--- PASS: TestProvider_impl (0.00s)
=== RUN   TestAccDnsARecordSet_Basic
--- FAIL: TestAccDnsARecordSet_Basic (3.49s)
panic: interface conversion: interface {} is nil, not *provider.DNSClient [recovered]
	panic: interface conversion: interface {} is nil, not *provider.DNSClient

@kmoe
Copy link
Member

kmoe commented Feb 19, 2021

Thank you, I had entirely neglected the 127.0.0.1 ns.example.com entry. This may well be what's broken in the GHA setup as well - will check later. Tests all passing for me now on both this branch and main.

The error you're getting seems to indicate you've somehow ended up with a nil client, despite (presumably) setting DNS_UPDATE_SERVER. I don't presently see how this is possible without the provider erroring out during New(). Are you running the tests via acceptance.sh? Only thing I can think of at the moment is to try in a clean env. Worth double-checking you don't have any terraform.* or .terraform* files lying around also.

Were the tests passing for you previously?

@bodgit
Copy link
Contributor Author

bodgit commented Feb 19, 2021

I think my error was caused by me having a 0.12.x terraform binary somewhere in my path which was being used for running the tests. I don't think the configureProvider() function was being called, which is what goes and looks up all of the environment variables to configure the provider. Once I removed that binary, the test suite seemed to be downloading a fresh copy of terraform for each individual test, which is really slow on my internet connection, but the tests passed. I've now installed a 0.14.x terraform binary and tests are still passing and running a bit quicker. All good here now.

Not having the /etc/hosts record makes sense now with the first DNS provider you were using as they have that annoying habit of serving a landing page address for any NXDOMAIN response so you were looking up ns.example.com, getting a landing page address and then trying to negotiate a Kerberos ticket with it! I hate those sort of sites.

@kmoe
Copy link
Member

kmoe commented Feb 19, 2021

Fabulous. After three years and a number of related and unrelated difficulties along the way, I'm now going to merge this PR. The new functionality will go out in today's v3.1.0 release. Thank you @bodgit for your great work on this PR and @apparentlymart for the thorough review.

@kmoe kmoe merged commit c317a9f into hashicorp:main Feb 19, 2021
@bodgit
Copy link
Contributor Author

bodgit commented Feb 19, 2021

198601-She-Went-A-Little-Too-Far

Copy link

I'm going to lock this pull request because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 26, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants