Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding x509_cert input plugin #3768

Merged
merged 14 commits into from
Jul 30, 2018
Merged

Adding x509_cert input plugin #3768

merged 14 commits into from
Jul 30, 2018

Conversation

jtyr
Copy link
Contributor

@jtyr jtyr commented Feb 8, 2018

Required for all PRs:

  • Signed CLA.
  • Associated README.md updated.
  • Has appropriate unit tests.

This PR is adding new input plugin which allows to monitor SSL certificates.

@jtyr jtyr force-pushed the jtyr-ssl_cert branch 2 times, most recently from 0691f34 to 44af23c Compare February 8, 2018 00:35
Copy link

@sepetrov sepetrov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jtyr , thanks for the contribution. As requested, I have reviewed the PR and my comments are mostly related to coding standards. Hope the maintainers don't mind that.

@@ -0,0 +1,155 @@
package ssl_cert
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see that some packages in the repository do not conform with the standards described in Effective Go - Package names , but I would suggest to drop the underscore.

ssl_cert -> sslcert

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will keep the underscore in the package name because the name appears in the config file and I believe that it improves readability of the config file.

AoQUXWIiFBoFfpNVcvUgHyGGc1hv7TX8Eh8KGo2+VuPzxnFuRrbbYCs=
-----END RSA PRIVATE KEY-----`

func getTestPrefix(testN int) string {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like this function is not used.

for _, file := range sc.Files {
cert, err := getLocalCert(file)
if err != nil {
return errors.New(fmt.Sprintf("Cannot get local SSL cert: %s", err))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could use fmt.Errorf.

Here is another alternative, which is not using fmt: errors.New("my error message: " + err.Error()). In this case you could remove the fmt package.

for _, server := range sc.Servers {
cert, err := getRemoteCert(server, sc.Timeout*time.Second, sc.CloseConn)
if err != nil {
return errors.New(fmt.Sprintf("Cannot get remote SSL cert: %s", err))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could use fmt.Errorf

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error messages should not be treated as sentences:

  • should not start with uppercase unless the first word has special meaning, i.e. exported method/struct name etc.
  • should end with ., !, ? etc.

Think about the recommended error message style: [caller 2 error message]:[caller 1 error message]:[original error message]

See here

@@ -0,0 +1,155 @@
package ssl_cert
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The package description is missing. The exported package members also do not have any comments. Please see here or here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is not that important as it's quite clear what the package is (it's an input plugin) and further description is provided by the Description function.

// For tests
CloseConn bool
}

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would recommend adding a check, which makes sure SSLCert implements telegraf.Input.

// make sure SSLCert implements telegraf.Input
var _ telegraf.Input = &SSLCert{}

I would also consider it as a valid point, if someone prefers to have this check performed only in the _test.go file.

for _, server := range sc.Servers {
cert, err := getRemoteCert(server, sc.Timeout*time.Second, sc.CloseConn)
if err != nil {
return errors.New(fmt.Sprintf("Cannot get remote SSL cert: %s", err))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message could provide a little more information: consider including the server details. Currently neither all getRemoteCert possible errors include the server, nor the Gather method.

Timeout time.Duration `toml:"timeout"`

// For tests
CloseConn bool
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not define interfaces on the implementor side of an API "for mocking"; instead, design the API so that it can be tested using the public API of the real implementation.

See here

I see no reason to test if the connection has been closed on your end. Instead I would test the scenarios under which a connection error could occur.

func TestSSLCert_Gather_ConnectionErrors(t *testing.T) {
	pair, err := tls.X509KeyPair([]byte(testCert), []byte(testKey))
	if err != nil {
		t.Fatal(err)
	}
	config := &tls.Config{
		Certificates: []tls.Certificate{pair},
	}

	t.Run("server connection refused", func(t *testing.T) {
		ln, err := tls.Listen("tcp", ":0", config)
		if err != nil {
			t.Fatal(err)
		}
		svr := ln.Addr().String()
		ln.Close()

		sc := SSLCert{
			Servers:   []string{svr},
			Timeout:   5,
		}
		acc := testutil.Accumulator{}
		err = sc.Gather(&acc)
		if err == nil {
			t.Error("got nil; want error")
		}
	})
	t.Run("server closes the connection", func(t *testing.T) {
		ln, err := tls.Listen("tcp", ":0", config)
		if err != nil {
			t.Fatal(err)
		}
		defer ln.Close()
		go func() {
			sconn, err := ln.Accept()
			if err != nil {
				return
			}
			sconn.Close()
		}()

		sc := SSLCert{
			Servers:   []string{ln.Addr().String()},
			Timeout:   5,
		}
		acc := testutil.Accumulator{}
		err = sc.Gather(&acc)
		if err == nil {
			t.Error("got nil; want error")
		}
	})
	t.Run("server closes without handshake", func(t *testing.T) {
		ln, err := tls.Listen("tcp", ":0", config)
		if err != nil {
			t.Fatal(err)
		}
		defer ln.Close()
		go func() {
			sconn, err := ln.Accept()
			if err != nil {
				return
			}
			serverConfig := config.Clone()
			srv := tls.Server(sconn, serverConfig)
			srv.Close()
		}()

		sc := SSLCert{
			Servers:   []string{ln.Addr().String()},
			Timeout:   5,
		}
		acc := testutil.Accumulator{}
		err = sc.Gather(&acc)
		if err == nil {
			t.Error("got nil; want error")
		}
	})
}


ln, err := tls.Listen("tcp", ":0", config)
if err != nil {
t.Error(err)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is an error, the test can not continue. You should call t.Fatal(err) instead


pair, err := tls.X509KeyPair([]byte(testCert), []byte(testKey))
if err != nil {
t.Error(err)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is an error, the test cannot continue. You should call t.Fatal(err) instead

@danielnelson
Copy link
Contributor

Thanks for the pull request @jtyr and also the review @sepetrov (definitely appreciate non-maintainer reviews!). We have a pretty large backlog, but I'll try to take a look at this for 1.7, please nudge me if I don't get to it sometime in the next month. I'll also mention that there is a similar pull request #1762, it may be useful to read over it for ideas.

@danielnelson danielnelson added this to the 1.7.0 milestone Feb 9, 2018
@jtyr jtyr force-pushed the jtyr-ssl_cert branch 3 times, most recently from 2f38465 to 8aae40a Compare February 12, 2018 20:41
@jtyr
Copy link
Contributor Author

jtyr commented Feb 12, 2018

@sepetrov Thanks a lot for your review! I have implemented most of your feedback. Please see the latest commit.

Timeout time.Duration `toml:"timeout"`

// For tests
CloseConn bool

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I said in #3768 (comment) I would prefer to see black box tests where your tests live in separate package ssl_cert_test, and therefore you test only the public API. But if you have to have access to parts of the package, which change the behaviour, so that more test cases can be covered, then I would recommend the following:

Move the test-specific code outside the configuration struct SSLCert

// SSLCert holds the configuration of the plugin.
type SSLCert struct {
	Servers []string      `toml:"servers"`
	Files   []string      `toml:"files"`
	Timeout time.Duration `toml:"timeout"`
}

// For tests
var (
	closeConn  bool
	unsetCerts bool
)

This way the test-related code does not become part of the public API.

The disadvantage of having global state modifiers for the package by using this approach is that you will not be able to run tests in parallel, which you do not do here anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your review. It should be fixed in the last commit.

@jtyr jtyr force-pushed the jtyr-ssl_cert branch 2 times, most recently from 2f2c5ff to 9889fed Compare March 21, 2018 23:47
@jtyr
Copy link
Contributor Author

jtyr commented Mar 22, 2018

@danielnelson Please could you review it once you have time?

@danielnelson
Copy link
Contributor

Will do, sorry about the delay... once 1.6 is released please ping me again if I still haven't reviewed yet

@sepetrov
Copy link

LGTM 👍

@jtyr
Copy link
Contributor Author

jtyr commented May 8, 2018

@danielnelson Ping. Please could you review?

# Reads metrics from a SSL certificate
[[inputs.ssl_cert]]
## List of local SSL files
#files = []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add a bit more detail about what certificates can be loaded (PEM encoded). Perhaps we can use /etc/ssl/certs/ssl-cert-snakeoil.pem as the default path.

## List of local SSL files
#files = []
## List of servers
#servers = []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should have examples of the how this variable could look, use example.org as the hostname.

InsecureSkipVerify: true,
}

ipConn, err := net.DialTimeout("tcp", server, timeout)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should allow the user to specify the network in the configuration, it would then look like servers = ["tcp://example.org:8080"]


func getRemoteCert(server string, timeout time.Duration) (*x509.Certificate, error) {
tlsCfg := &tls.Config{
InsecureSkipVerify: true,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think the tls.Config could be configurable? We have a feature request for client authentication in #3854 so perhaps we could use the standard set of SSL options:

## Optional TLS Config
# tls_ca = "/etc/telegraf/ca.pem"
# tls_cert = "/etc/telegraf/cert.pem"
# tls_key = "/etc/telegraf/key.pem"
## Use TLS but skip chain & host verification
# insecure_skip_verify = false

I think we would need to check what happens if we ignore the error from Handshake(), perhaps we can still get the PeerCertificates? We could also explore a custom tls.Config.VerifyPeerCertificate func.

@@ -90,6 +90,7 @@ import (
_ "github.com/influxdata/telegraf/plugins/inputs/socket_listener"
_ "github.com/influxdata/telegraf/plugins/inputs/solr"
_ "github.com/influxdata/telegraf/plugins/inputs/sqlserver"
_ "github.com/influxdata/telegraf/plugins/inputs/ssl_cert"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we rename the plugin? We don't always do SSL/TLS (in the case of files), maybe it should be x509_cert or certificate?


certs := conn.ConnectionState().PeerCertificates

if unsetCerts {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a big fan of this testing method but I guess it's acceptable. It would be better IMO to use an interface for tls.Client and a mock during tests or to implement the behavior in the tests.

age := int(now.Sub(cert.NotBefore).Seconds())
expiry := int(cert.NotAfter.Sub(now).Seconds())
startdate := int(cert.NotBefore.Unix())
enddate := int(cert.NotAfter.Unix())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For startdate, enddate no need to convert to int

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.Seconds() returns float64 so it's necessary to convert it to int to get only the integer part of the number.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just make the change for startdate, enddate. (All integer fields added are converted to int64 or uint64 in the call to AddFields)

type SSLCert struct {
Servers []string `toml:"servers"`
Files []string `toml:"files"`
Timeout time.Duration `toml:"timeout"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use internal.Duration so we can specify timeout using a string like "5s"

"expiry": expiry,
"startdate": startdate,
"enddate": enddate,
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally we could also have metrics for if the certificate is valid, including values for common errors such as the hostname not matching the certificate. This is implemented in the related pr #3829, though ideally we could leverage the standard library code for this.

"github.com/influxdata/telegraf/testutil"
)

const testCert = `-----BEGIN CERTIFICATE-----
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you update to use the testing certs in testutil/pki, check plugins/inputs/socket_listener/socket_listener_test.go for examples.

@danielnelson danielnelson modified the milestones: 1.7.0, 1.8.0 Jun 3, 2018
@jtyr jtyr changed the title Adding ssl_cert input plugin Adding x509_cert input plugin Jun 26, 2018
@jtyr
Copy link
Contributor Author

jtyr commented Jun 26, 2018

@danielnelson I have implemented all your suggestions. Please review.

@jtyr
Copy link
Contributor Author

jtyr commented Jul 20, 2018

@danielnelson Ping. Please could you review?

@glinton
Copy link
Contributor

glinton commented Jul 20, 2018

I made some slight modifications based on my review (see here). I know the first commit is fine, I agree with sepetrov regarding the test specific variables. Unifying the two get[Local|Remote]Certs is where I feel like danielnelson would have useful input. Feel free to cherry pick those commits, or i can directly push them to your branch, up to you.

"crypto/tls"
"crypto/x509"
"encoding/pem"
"errors"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace errors.New with fmt.Errorf


// For tests
var (
closeConn bool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There isn't a need for test only variables in here, the same can be accomplished without them. (see this commit)

@jtyr
Copy link
Contributor Author

jtyr commented Jul 22, 2018

Thanks for your review, @glinton. I have merged your changes and added few more. Any idea how to further improve test coverage (current coverage is only 92.7%)?

@glinton
Copy link
Contributor

glinton commented Jul 23, 2018

😄 I guess one thing you could do is remove this check if it never gets called (i believe there would be an error on the handshake if there were no certs), but i think you've got plenty of coverage.

@jtyr
Copy link
Contributor Author

jtyr commented Jul 23, 2018

Thanks for the tip, @glinton. I have removed the check but the coverage is still only 94.3%! 😞

expiry := int(cert.NotAfter.Sub(now).Seconds())
startdate := cert.NotBefore.Unix()
enddate := cert.NotAfter.Unix()
valid := expiry > 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should drop this field for now. There are various aspects of the certificate which currently we do not take into account, such as hostname and chain of trust. We can add it back in if/when we include these features.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

return certs, nil
case "file":
fallthrough
default:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think default should be an error, but "" could be file, this would avoid schemes like "foo://".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

fallthrough
case "udp":
fallthrough
case "tcp":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add support for tcp4, tcp6, udp4, udp6?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

conn := tls.Client(ipConn, tlsCfg)
defer conn.Close()

hsErr := conn.Handshake()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious about what happens when a certificate has an invalid hostname or the cert chain is invalid? What I think would be ideal is if we still called acc.AddFields( but with valid=false and potentially with more detailed information. If this is possible, Gather would only return an error if the network connection fails or we can't get any certificates.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have removed the valid metric as per your request bellow (line 116).

{mode: 0640, content: pki.ReadServerCert(), error: false},
}

for i, test := range tests {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not required, but would be a nice touch if we used subtests like here https://github.com/influxdata/telegraf/blob/master/plugins/inputs/docker/docker_test.go#L720


const sampleConfig = `
## List certificate sources
# sources = ["/etc/ssl/certs/ssl-cert-snakeoil.pem", "tcp://example.org:443"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's uncomment this one line since it will almost always need to be changed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

# sources = ["/etc/ssl/certs/ssl-cert-snakeoil.pem", "tcp://example.org:443"]

## Timeout for SSL connection
# timeout = 5
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is an internal.Duration, it should be like timeout = "5s"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.


func getFields(cert *x509.Certificate, now time.Time) map[string]interface{} {
age := int(now.Sub(cert.NotBefore).Seconds())
expiry := int(cert.NotAfter.Sub(now).Seconds())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aren't these already ints?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

@jtyr jtyr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@danielnelson All should be fixed in the last commit. Please review it.


func getFields(cert *x509.Certificate, now time.Time) map[string]interface{} {
age := int(now.Sub(cert.NotBefore).Seconds())
expiry := int(cert.NotAfter.Sub(now).Seconds())
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

# sources = ["/etc/ssl/certs/ssl-cert-snakeoil.pem", "tcp://example.org:443"]

## Timeout for SSL connection
# timeout = 5
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.


const sampleConfig = `
## List certificate sources
# sources = ["/etc/ssl/certs/ssl-cert-snakeoil.pem", "tcp://example.org:443"]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

expiry := int(cert.NotAfter.Sub(now).Seconds())
startdate := cert.NotBefore.Unix()
enddate := cert.NotAfter.Unix()
valid := expiry > 0
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

fallthrough
case "udp":
fallthrough
case "tcp":
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

return certs, nil
case "file":
fallthrough
default:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

conn := tls.Client(ipConn, tlsCfg)
defer conn.Close()

hsErr := conn.Handshake()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have removed the valid metric as per your request bellow (line 116).

sources = ["/etc/ssl/certs/ssl-cert-snakeoil.pem", "https://example.org"]

## Timeout for SSL connection
# timeout = 5s
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add quotes to be valid toml

sources = ["/etc/ssl/certs/ssl-cert-snakeoil.pem", "tcp://example.org:443"]

## Timeout for SSL connection
# timeout = 5s
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add quotes to be valid toml

@glinton glinton merged commit a897b84 into influxdata:master Jul 30, 2018
rgitzel pushed a commit to rgitzel/telegraf that referenced this pull request Oct 17, 2018
otherpirate pushed a commit to otherpirate/telegraf that referenced this pull request Mar 15, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants