Add telemetry job #1448

Merged
merged 11 commits into from
Jan 10, 2024

Conversation

pleshakov
Contributor

@pleshakov pleshakov commented Jan 5, 2024

Proposed changes

Problem:

We want a telemetry job that reports product telemetry every 24h. For now, the telemetry data is empty and the report is sent to the debug log.

Solution:

  • Refactor leader election to use controller-runtime manager capabilities. This simplifies the existing code and makes it easier to add a telemetry Job.
  • Add a telemetry Job that periodically reports empty telemetry to the debug log.
  • Make the period configurable at build time via TELEMETRY_REPORT_PERIOD Makefile variable.

Note: the leader elector refactoring changes the behavior of the NGF process when leadership is lost:

  • Before: the Manager would shut down, waiting for the runnables to exit.
  • After: the Manager doesn't wait. This is similar to the NGF process panicking.

This should be OK, as the NGF container will restart and recover any potentially broken state (update statuses that were not fully populated, restore the correct NGINX configuration).

Testing:

  • Unit tests
  • Manual testing:
    • Ensure leader election works as expected: both leader and non-leader pods run successfully.
    • Ensure the NGF container exits when it stops being the leader.
    • Ensure an upgrade from Release 1.1.0 is successful for leader election: the leader gets elected among the new pods.
    • Ensure the telemetry Job reports telemetry multiple times, using a small value of TELEMETRY_REPORT_PERIOD.

CLOSES #1382

More notes:

Checklist

Before creating a PR, run through this checklist and mark each as complete.

  • I have read the CONTRIBUTING doc
  • I have added tests that prove my fix is effective or that my feature works
  • I have checked that all unit tests pass after adding my changes
  • I have updated necessary documentation
  • I have rebased my branch onto main
  • I will ensure my PR is targeting the main branch and pulling from my branch from my own fork

@pleshakov pleshakov requested a review from a team as a code owner January 5, 2024 18:32
@github-actions github-actions bot added the enhancement New feature or request label Jan 5, 2024
@pleshakov pleshakov mentioned this pull request Jan 5, 2024
6 tasks
internal/mode/static/manager.go (outdated, resolved)
internal/mode/static/manager.go (outdated, resolved)
internal/mode/static/telemetry/job.go (resolved)
@pleshakov pleshakov requested a review from lucacome January 8, 2024 16:19
Contributor

@kate-osborn kate-osborn left a comment

Just a couple of questions, but it looks good to me!

internal/mode/static/manager.go (outdated, resolved)
internal/mode/static/telemetry/job_test.go (resolved)
pleshakov and others added 10 commits January 10, 2024 10:36
@pleshakov pleshakov force-pushed the feature/telemetry-job branch from 5c5fcf1 to 06832aa Compare January 10, 2024 15:37
@pleshakov pleshakov requested a review from bjee19 January 10, 2024 15:38
@pleshakov pleshakov merged commit 9d9c1f2 into nginx:main Jan 10, 2024
27 checks passed
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Telemetry Job (NGF)
5 participants