NETOBSERV-1532: add TLS support to ebpf agent metrics config #305

Merged (1 commit) on Mar 28, 2024

msherif1234
Contributor

Description

Add the ability to use TLS for the metrics server.

Dependencies

n/a

Checklist

If you are not familiar with our processes or don't know what to answer in the list below, let us know in a comment: the maintainers will take care of that.

  • Will this change affect NetObserv / Network Observability operator? If not, you can ignore the rest of this checklist.
  • Is this PR backed with a JIRA ticket? If so, make sure it is written as a title prefix (in general, PRs affecting the NetObserv/Network Observability product should be backed with a JIRA ticket - especially if they bring user facing changes).
  • Does this PR require product documentation?
    • If so, make sure the JIRA epic is labelled with "documentation" and provides a description relevant for doc writers, such as use cases or scenarios. Any required step to activate or configure the feature should be documented there, such as new CRD knobs.
  • Does this PR require a product release notes entry?
    • If so, fill in "Release Note Text" in the JIRA.
  • Is there anything else the QE team should know before testing? E.g: configuration changes, environment setup, etc.
    • If so, make sure it is described in the JIRA ticket.
  • QE requirements (check 1 from the list):
    • Standard QE validation, with pre-merge tests unless stated otherwise.
    • Regression tests only (e.g. refactoring with no user-facing change).
    • No QE (e.g. trivial change with high reviewer's confidence, or per agreement with the QE team).

@openshift-ci-robot
Collaborator

openshift-ci-robot commented Mar 25, 2024

@msherif1234: This pull request references NETOBSERV-1532 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set.

codecov bot commented Mar 25, 2024

Codecov Report

Attention: Patch coverage is 30.00%, with 7 lines in your changes missing coverage. Please review.

Project coverage is 34.01%. Comparing base (3a12ba2) to head (aa041de).

Files                           Patch %   Lines
pkg/agent/agent.go              0.00%     4 Missing ⚠️
pkg/prometheus/prom_server.go   50.00%    1 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #305      +/-   ##
==========================================
- Coverage   34.04%   34.01%   -0.03%     
==========================================
  Files          47       47              
  Lines        3836     3845       +9     
==========================================
+ Hits         1306     1308       +2     
- Misses       2444     2449       +5     
- Partials       86       88       +2     
Flag Coverage Δ
unittests 34.01% <30.00%> (-0.03%) ⬇️

@@ -153,6 +153,10 @@ func FlowsAgent(cfg *Config) (*Flows, error) {
		PromConnectionInfo: metrics.PromConnectionInfo{
			Address: cfg.MetricsServerAddress,
			Port: cfg.MetricsPort,
			TLS: &metrics.PromTLS{
Member

Defined as such, PromConnectionInfo.TLS will never be nil; however, below you're checking with:

if conn.TLS != nil {
	err = httpServer.ListenAndServeTLS(conn.TLS.CertPath, conn.TLS.KeyPath)
}

So I guess you should check for an empty cfg.MetricsTLSCertPath instead?

Contributor Author

fixed it
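
The review point above can be sketched roughly as follows. This is only an illustration, not the code that was merged: Config, PromTLS and PromConnectionInfo are simplified stand-ins for the agent's types, the MetricsTLSKeyPath field name and the int port type are assumptions, and tlsFromConfig is a hypothetical helper. The idea is to leave TLS nil when no certificate is configured, so that a `conn.TLS != nil` check in the server stays meaningful.

```go
package main

import "fmt"

// Config is a simplified stand-in for the agent configuration.
// MetricsTLSKeyPath is an assumed field name; only the cert path
// is named in the review thread.
type Config struct {
	MetricsServerAddress string
	MetricsPort          int
	MetricsTLSCertPath   string
	MetricsTLSKeyPath    string
}

// PromTLS and PromConnectionInfo mirror the names quoted in the review.
type PromTLS struct {
	CertPath string
	KeyPath  string
}

type PromConnectionInfo struct {
	Address string
	Port    int
	TLS     *PromTLS
}

// tlsFromConfig (hypothetical helper) returns nil unless both paths are set,
// so callers can use a nil check to decide between HTTP and HTTPS.
func tlsFromConfig(cfg *Config) *PromTLS {
	if cfg.MetricsTLSCertPath == "" || cfg.MetricsTLSKeyPath == "" {
		return nil
	}
	return &PromTLS{
		CertPath: cfg.MetricsTLSCertPath,
		KeyPath:  cfg.MetricsTLSKeyPath,
	}
}

// connInfo builds the connection info from the configuration.
func connInfo(cfg *Config) PromConnectionInfo {
	return PromConnectionInfo{
		Address: cfg.MetricsServerAddress,
		Port:    cfg.MetricsPort,
		TLS:     tlsFromConfig(cfg),
	}
}

func main() {
	plain := connInfo(&Config{MetricsPort: 9090})
	secure := connInfo(&Config{
		MetricsPort:        9090,
		MetricsTLSCertPath: "tls.crt",
		MetricsTLSKeyPath:  "tls.key",
	})
	fmt.Println("plain uses TLS:", plain.TLS != nil)
	fmt.Println("secure uses TLS:", secure.TLS != nil)
}
```

With this shape, the server side can keep its nil check and only call ListenAndServeTLS when both paths were provided.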

@jotak jotak added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Mar 26, 2024

New image:
quay.io/netobserv/netobserv-ebpf-agent:edd4cb8

It will expire after two weeks.

To deploy this build, run from the operator repo, assuming the operator is running:

USER=netobserv VERSION=edd4cb8 make set-agent-image

@github-actions github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Mar 26, 2024
@msherif1234 msherif1234 requested a review from jotak March 26, 2024 12:30
@msherif1234
Contributor Author

/ok-to-test

@openshift-ci openshift-ci bot added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Mar 26, 2024

New image:
quay.io/netobserv/netobserv-ebpf-agent:b466e4a

It will expire after two weeks.

To deploy this build, run from the operator repo, assuming the operator is running:

USER=netobserv VERSION=b466e4a make set-agent-image

@memodi
Contributor

memodi commented Mar 26, 2024

@msherif1234 - I tried to enable metrics with TLS using the config below in the FlowCollector:

      metrics:
        enable: true
        server:
          port: 9090
          tls:
            insecureSkipVerify: false
            type: Auto

The eBPF pods are landing in an error state:

time="2024-03-26T15:28:50Z" level=info msg="starting NetObserv eBPF Agent"
time="2024-03-26T15:28:50Z" level=info msg="initializing Flows agent" component=agent.Flows
time="2024-03-26T15:28:50Z" level=info msg="StartServerAsync: addr = :9090" component=prometheus
time="2024-03-26T15:28:50Z" level=info msg="push CTRL+C or send SIGTERM to interrupt execution"
time="2024-03-26T15:28:50Z" level=info msg="starting Flows agent" component=agent.Flows
time="2024-03-26T15:28:50Z" level=warning msg="can't detect any network-namespaces err: open /var/run/netns: no such file or directory [Ignore if the agent privileged flag is not set]" component=ifaces.Watcher
time="2024-03-26T15:28:50Z" level=warning msg="failed to add watcher to netns directory err: no such file or directory [Ignore if the agent privileged flag is not set]" component=ifaces.Watcher
time="2024-03-26T15:28:50Z" level=fatal msg="error in http.ListenAndServe: open tls.crt: no such file or directory" component=prometheus

@msherif1234
Contributor Author

@memodi there were missing mounts on the operator side; I just updated the operator PR to add the proper mounts.

@github-actions github-actions bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Mar 27, 2024
@msherif1234
Contributor Author

/ok-to-test

@openshift-ci openshift-ci bot added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Mar 27, 2024

New image:
quay.io/netobserv/netobserv-ebpf-agent:e418bc9

It will expire after two weeks.

To deploy this build, run from the operator repo, assuming the operator is running:

USER=netobserv VERSION=e418bc9 make set-agent-image

Member

@jotak jotak left a comment

LGTM

@codecov-commenter

Codecov Report

Attention: Patch coverage is 25.00%, with 9 lines in your changes missing coverage. Please review.

Project coverage is 33.84%. Comparing base (a5bcf49) to head (68f00d3).

Files                           Patch %   Lines
pkg/agent/agent.go              0.00%     6 Missing ⚠️
pkg/prometheus/prom_server.go   50.00%    1 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #305      +/-   ##
==========================================
- Coverage   34.04%   33.84%   -0.21%     
==========================================
  Files          47       47              
  Lines        3836     3847      +11     
==========================================
- Hits         1306     1302       -4     
- Misses       2444     2456      +12     
- Partials       86       89       +3     
Flag Coverage Δ
unittests 33.84% <25.00%> (-0.21%) ⬇️

@memodi
Contributor

memodi commented Mar 28, 2024

/label qe-approved

@openshift-ci openshift-ci bot added the qe-approved QE has approved this pull request label Mar 28, 2024
@openshift-ci-robot
Collaborator

openshift-ci-robot commented Mar 28, 2024

@msherif1234: This pull request references NETOBSERV-1532 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set.

@msherif1234
Contributor Author

/approve

openshift-ci bot commented Mar 28, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: msherif1234

@openshift-merge-bot openshift-merge-bot bot merged commit 5f3c1b2 into netobserv:main Mar 28, 2024
11 checks passed
Labels
  • approved
  • jira/valid-reference
  • lgtm
  • ok-to-test: To set manually when a PR is safe to test. Triggers image build on PR.
  • qe-approved: QE has approved this pull request
5 participants