feat: Add redis based reliability reporting #778

Open
wants to merge 40 commits into base: master

Conversation

jrconlin
Member

@jrconlin jrconlin commented Oct 17, 2024

This adds a feature flag reliable_report that optionally enables Push
message reliability reporting. The report is done in two parts.
The first part uses a Redis-like storage system to note message states.
This will require a regularly run "cleanup" script to sweep for expired
messages and adjust the current counts, as well as log those states to some
storage that is friendly to sequential logging (e.g. common logging or
streamed to a file). The clean-up script should be a singleton to prevent
possible race conditions.
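
To make the first part concrete, here is a minimal sketch of the bookkeeping it describes. It uses an in-memory stand-in rather than a real Redis client, and the class name, state names (`received`, `stored`, `expired`), and data layout are all assumptions for illustration, not what this patch actually stores.

```python
import time
from collections import Counter


class ReliabilityStore:
    """In-memory stand-in for the Redis-like store (hypothetical layout)."""

    def __init__(self):
        self.counts = Counter()  # state name -> messages currently in that state
        self.messages = {}       # reliability_id -> (state, expiry in epoch seconds)

    def record(self, reliability_id, new_state, expiry):
        """Move a message to new_state, keeping the per-state counts in step."""
        old = self.messages.get(reliability_id)
        if old is not None:
            self.counts[old[0]] -= 1
        self.counts[new_state] += 1
        self.messages[reliability_id] = (new_state, expiry)

    def sweep(self, now=None, log=print):
        """Cleanup pass: drop expired messages, adjust counts, log a snapshot."""
        now = time.time() if now is None else now
        expired = [rid for rid, (_, exp) in self.messages.items() if exp <= now]
        for rid in expired:
            state, _ = self.messages.pop(rid)
            self.counts[state] -= 1
            self.counts["expired"] += 1  # "expired" is an assumed state name
        # One snapshot per sweep, suitable for append-only log storage.
        log({"ts": now, "counts": {s: n for s, n in self.counts.items() if n}})
        return len(expired)
```

`sweep` reads the expired set and then rewrites the counts in separate steps, so two sweeps running at once could adjust the same message twice. That is one way to read the singleton requirement above.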

The second component will write a record of the state transition
times for tracked messages to a storage system that is indexed by the
tracking_id. This will allow for more "in depth" analysis by external
tooling.
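
As a rough illustration of the second component, the sketch below keeps a per-message list of state transitions keyed by `tracking_id`. The `TransitionLog` class, its method names, and the use of a plain dictionary are assumptions. The real storage system and schema are not shown in this description.

```python
from collections import defaultdict


class TransitionLog:
    """Hypothetical per-message record of (state, timestamp) pairs."""

    def __init__(self):
        # tracking_id -> list of (state, epoch seconds), in arrival order
        self._rows = defaultdict(list)

    def record(self, tracking_id, state, timestamp):
        """Append one state transition for the given message."""
        self._rows[tracking_id].append((state, timestamp))

    def history(self, tracking_id):
        """Return the transitions for one message, sorted by timestamp."""
        return sorted(self._rows.get(tracking_id, []), key=lambda row: row[1])
```

External tooling could then call `history(tracking_id)` to rebuild the timeline of a single message without touching the live counts.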

The idea is that reporting will comprise two parts:
one part that shows the active states of messages (with a log of prior
states to show trends over time), and an optional "in-depth" record
that could be used to show things like length of time in storage,
overall success rates, survivability rates, etc.
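
For example, figures such as time in storage and overall success rate could be derived from the in-depth record roughly as follows. This is only a sketch: the function names, the `(state, timestamp)` row shape, and the `stored` and `delivered` state names are assumptions, not part of this patch.

```python
def time_in_state(history, state):
    """Seconds spent in `state`, or None if the message never entered or left it."""
    ordered = sorted(history, key=lambda row: row[1])
    # Pair each transition with the one that follows it.
    for (name, start), (_, end) in zip(ordered, ordered[1:]):
        if name == state:
            return end - start
    return None


def success_rate(histories, success_state="delivered"):
    """Fraction of tracked messages whose latest state is success_state."""
    if not histories:
        return 0.0
    done = sum(
        1
        for rows in histories.values()
        if rows and max(rows, key=lambda row: row[1])[0] == success_state
    )
    return done / len(histories)
```

Here `histories` maps each message ID to its list of transitions, so `time_in_state(rows, "stored")` gives the length of time one message spent in storage.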

This patch also:

  • fixes a few typos
  • changes several methods that should consume Notifications so that they actually consume them
  • converts tracking_id to reliability_id
  • converts instances of specialized Metrics to generic Cadence (to make calls more consistent)
  • adds a RELIABLE_REPORT flag to testing.

Closes: SYNC-4327

This PR introduces tracking throughput for the database.

It also introduces the PushReliability reporting skeleton. This will be
fleshed out with full reporting later.

Closes: #SYNC-4324
* alter `setup_bt` to include reliability family
* alter config.yml for eventual integration test changes
@jrconlin jrconlin marked this pull request as ready for review October 18, 2024 17:34
@jrconlin jrconlin requested review from pjenvey and taddes October 18, 2024 17:34
autoconnect/autoconnect-common/src/protocol.rs (outdated, resolved)
autoconnect/autoconnect-settings/src/lib.rs (resolved)
autoconnect/autoconnect-web/src/routes.rs (resolved)
tests/integration/test_integration_all_rust.py (outdated, resolved)
autopush-common/src/reliability.rs (outdated, resolved)
scripts/reliablity_cron.py (outdated, resolved)
autopush-common/src/reliability.rs (outdated, resolved)
autopush-common/src/reliability.rs (outdated, resolved)
autoendpoint/src/extractors/subscription.rs (outdated, resolved)
@jrconlin jrconlin force-pushed the feat/SYNC-4327_redis branch from 5ff3bb9 to 3de403d Compare December 3, 2024 22:19
@jrconlin jrconlin requested a review from pjenvey December 3, 2024 22:34
@jrconlin
Member Author

jrconlin commented Dec 3, 2024

Well, that was less than entertaining, but at least educational.

@jrconlin jrconlin requested a review from Trinaa December 4, 2024 19:07
Makefile (outdated, resolved)
@jrconlin jrconlin requested a review from Trinaa December 6, 2024 23:26
Contributor

@Trinaa Trinaa left a comment

The integration test changes LGTM!

Makefile (outdated, resolved)
@jrconlin jrconlin requested a review from Trinaa December 17, 2024 18:32
@jrconlin jrconlin dismissed pjenvey’s stale review December 17, 2024 18:34

Changes applied

4 participants