Skip to content
This repository has been archived by the owner on Feb 15, 2024. It is now read-only.

Notification delays and timeouts are not clear #85

Closed
atc0005 opened this issue Jul 9, 2020 · 6 comments · Fixed by #88
Closed

Notification delays and timeouts are not clear #85

atc0005 opened this issue Jul 9, 2020 · 6 comments · Fixed by #88
Assignees
Labels
bug Something isn't working
Milestone

Comments

@atc0005
Copy link
Owner

atc0005 commented Jul 9, 2020

While I was actively developing the functionality the intent of each setting, constant and parameter (for the most part) was clear to me, but while working to implement GH-3 I ran across the function call and had to dig deeper than I would have liked to figure out what the difference was between config.NotifyMgrEmailNotificationDelay and cfg.EmailNotificationDelay().

There is a difference, but in hindsight the names are too close together to easily tell what the purpose of each is.

Ultimately I think the cfg.EmailNotificationDelay() func used to be named cfg.EmailNotificationRetriesDelay() or cfg.EmailNotificationRetryDelay() (or the value referenced within was) and at that point it was probably clearer what was going on.

brick/cmd/brick/notify.go

Lines 645 to 655 in 30fdb08

go emailNotifier(
ctx,
config.NotifyMgrEmailTimeout,
config.NotifyMgrEmailNotificationDelay,
cfg.EmailNotificationRetries(),
cfg.EmailNotificationDelay(),
emailNotifyWorkQueue,
emailNotifyResultQueue,
emailNotifyDone,
)
}

TODO:

Add scratch notes detailing my findings so that I can work that explanation into the code via doc comment updates and also via variable/function/whatever name changes to better communicate the intent.

@atc0005 atc0005 self-assigned this Jul 9, 2020
@atc0005 atc0005 added the bug Something isn't working label Jul 9, 2020
@atc0005 atc0005 added this to the v0.2.1 milestone Jul 9, 2020
@atc0005
Copy link
Owner Author

atc0005 commented Jul 10, 2020

Still need to post the paper notes, but coming at this from the README in, here is what I see.

configure doc entry for teams-notify-delay says:

The number of seconds to wait between Microsoft Teams message delivery attempts.

The config file doesn't have any doc comments (boo, I should have caught that).
The MSTeams config struct has:

// MSTeams represents the various configuration settings used to send
// notifications to a Microsoft Teams channel.
type MSTeams struct {

	// ...

	// Delay is the number of seconds to wait between Microsoft Teams message
	// delivery attempts.
	Delay *int `toml:"delay" arg:"--teams-notify-delay,env:BRICK_MSTEAMS_WEBHOOK_DELAY" help:"The number of seconds to wait between Microsoft Teams message delivery attempts."`

	// Retries is the number of attempts that this application will make to
	// deliver Microsoft Teams messages before giving up.
	Retries *int `toml:"retries" arg:"--teams-notify-retries,env:BRICK_MSTEAMS_WEBHOOK_RETRIES" help:"The number of attempts that this application will make to deliver Microsoft Teams messages before giving up."`
}

The struct tag content scrolls off the page a bit, but the doc comments mirror the help text.

From this point we have a getter method defined to retrieve the "winning" value between CLI flag, env var and config file:

// TeamsNotificationDelay returns the user-provided delay for Microsoft Teams
// notifications or the default value if not provided. CLI flag values take
// precedence if provided. This delay is added regardless of whether a
// previous notification delivery attempt has been made.
func (c Config) TeamsNotificationDelay() int {

	switch {
	case c.cliConfig.MSTeams.Delay != nil:
		return *c.cliConfig.MSTeams.Delay
	case c.fileConfig.MSTeams.Delay != nil:
		return *c.fileConfig.MSTeams.Delay
	default:
		return defaultMSTeamsDelay
	}
}

The doc comments and setting name mostly match what was said before, but this stands out:

This delay is added regardless of whether a previous notification delivery attempt has been made.

This is our first time seeing that.

If no specific options are provided, what default is applied via the defaultMSTeamsDelay const?

	// the number of seconds to wait between retry attempts; applies to
	// Microsoft Teams notifications only
	defaultMSTeamsDelay int = 5

This is key:

the number of seconds to wait between retry attempts

This is the first time we see that the value is intended to be applied to retry attempts, not all attempts. How is this value used in the code? I vaguely recall applying it literally at one point in the development cycle before I worked out a scheduling mechanism to allow for spacing the notifications in such a way that we still respected the remote API limit.

Anyway, to parallel what the OP shows, this is how the getter method is called:

brick/cmd/brick/notify.go

Lines 627 to 638 in 30fdb08

go teamsNotifier(
ctx,
cfg.TeamsWebhookURL(),
config.NotifyMgrTeamsTimeout,
config.NotifyMgrTeamsNotificationDelay,
cfg.TeamsNotificationRetries(),
cfg.TeamsNotificationDelay(),
teamsNotifyWorkQueue,
teamsNotifyResultQueue,
teamsNotifyDone,
)
}

and this is how the value is referenced inside of the teamsNotifer function:

brick/cmd/brick/notify.go

Lines 340 to 350 in ddc57ab

func teamsNotifier(
ctx context.Context,
webhookURL string,
sendTimeout time.Duration,
sendDelay time.Duration,
retries int,
retriesDelay int,
incoming <-chan events.Record,
notifyMgrResultQueue chan<- NotifyResult,
done chan<- struct{},
) {

Here we can clearly see it described as retriesDelay, which matches the description/intent of the previously mentioned defaultMSTeamsDelay value.

The retriesDelay value is used here:

brick/cmd/brick/notify.go

Lines 399 to 404 in ddc57ab

timeoutValue := config.GetNotificationTimeout(
sendTimeout,
nextScheduledNotification,
retries,
retriesDelay,
)

The function description is a bit lacking (working on that), but this bit has thus far survived the (as of yet unmerged) doc comments update mostly intact:

This overall timeout value is computed using multiple values:

  1. the base timeout value for a single message submission attempt
  2. the next scheduled notification
    • created using the configured delay we wish to force between message submission attempts
  3. the total number of retries allowed
  4. the delay between retry attempts

The intent here is clear:

  • we have an overarching goal of respecting the remote API limit and a (non-configurable) value dedicated to that purpose
  • we have a configurable delay between retry attempts

This matches what is implied with the defaultMSTeamsDelay const.

What is actually done at the code level?

Not long after receiving a value back, the timeoutValue (of time.Duration) is used to build a context.WithTimeout which is then passed into a goroutine along with retryDelay (and other values):

brick/cmd/brick/notify.go

Lines 435 to 448 in ddc57ab

go func(
ctx context.Context,
webhookURL string,
record events.Record,
schedule time.Time,
numRetries int,
retryDelay int,
resultQueue chan<- NotifyResult) {
ourMessage := createMessage(record)
resultQueue <- sendMessage(ctx, webhookURL, ourMessage, schedule, numRetries, retryDelay)
}(ctx, webhookURL, record, nextScheduledNotification, retries, retriesDelay, ourResultQueue)

The sendMessage func call leads to this:

brick/cmd/brick/message.go

Lines 421 to 423 in ddc57ab

// Submit message card, retry submission if needed up to specified number
// of retry attempts.
if err := send2teams.SendMessage(ctx, webhookURL, msgCard, retries, retriesDelay); err != nil {

where retriesDelay is passed down to send2teams.SendMessage (v0.4.4 of that module/package):

logger.Printf("SendMessage: Attempt %d of %d to send message failed: %v\n",
attempt, attemptsAllowed, result)
time.Sleep(time.Duration(retriesDelay) * time.Second)
continue

Unless I missed something, the retry delay value (thankfully, it is consistently referred to as retriesDelay) is primarily used as a delay between retry attempts and only used otherwise in the GetNotificationTimeout func to compute the total timeout before the notification attempt is (context) cancelled.

@atc0005
Copy link
Owner Author

atc0005 commented Jul 10, 2020

The questions now:

  1. Should the original intent be exposed through user-configurable settings?
  2. Should the remote API limit threshold be directly exposed?
  3. Should the user-configurable setting directly map to the remote API limit threshold and automatically compute the retry delay as a percentage of the provided value?

Since this project hasn't been released yet and nobody is actually using it, we (I) have all options on the table.

@atc0005
Copy link
Owner Author

atc0005 commented Jul 10, 2020

  1. Should the original intent be exposed through user-configurable settings?

I'm thinking that this would be the first step. Since the provided value is used primarily to space out retry delays, that seems to be the best approach here. As I mentioned earlier, I believe in an earlier version of the code that I wasn't using scheduling so much as artificial delays at the start of notification to enforce respect for the remote API limit.

This was true even if earlier send attempts resulted in a number of retries (with delays between them) which would have ensured that the application's message attempts already were respecting the limit. This could result in a minimum of 5 seconds delay (the intent), but could just as easily lead to 10-15 seconds delay. For a potentially (however unlikely) busy system, this would result in messages that were queued longer than strictly necessary. Not a deal breaker, but it was the reason that I wrote in scheduling support.

@atc0005
Copy link
Owner Author

atc0005 commented Jul 10, 2020

I barely touched on this except to make loose references to it, but here are the API-limit specific settings:

	// NotifyMgrTeamsNotificationDelay is the delay between Microsoft Teams
	// notification attempts. This delay is intended to help prevent
	// unintentional abuse of remote services.
	NotifyMgrTeamsNotificationDelay time.Duration = 5 * time.Second

	// NotifyMgrEmailNotificationDelay is the delay between email notification
	// attempts. This delay is intended to help prevent unintentional abuse of
	// remote services.
	NotifyMgrEmailNotificationDelay time.Duration = 5 * time.Second

As already indicated, I believe that an earlier version of the code did not make a clear distinction between the retry delay value and the overall API-limit value, but ironically I've since squashed the commits and have pruned the older branches; I don't have (easy) access to the older branches where I believe this design was applicable.

Either way, the NotifyMgrTeamsNotificationDelay is actively used.

brick/cmd/brick/notify.go

Lines 357 to 360 in ddc57ab

// Setup new scheduler that we can use to add an intentional delay between
// Microsoft Teams notification attempts
// https://docs.microsoft.com/en-us/microsoftteams/platform/webhooks-and-connectors/how-to/connectors-using
notifyScheduler := newNotifyScheduler(sendDelay)

This "scheduler" is used to return the next available notification schedule based on the notification "spacing" or delay configured via the NotifyMgrTeamsNotificationDelay setting. From what I can tell/remember, this is a hard-coded value not set anywhere else.

The NotifyMgrTeamsNotificationDelay value used to create an independent scheduler per notification type (e.g., teamsNotifier, emailNotifier). From that scheduler, a value is generated indicating when the next notification is permitted to be sent (at the earliest):

brick/cmd/brick/notify.go

Lines 390 to 392 in ddc57ab

log.Debug("Calculating next scheduled notification")
nextScheduledNotification := notifyScheduler()

This new value (nextScheduledNotification) is included in the calculations to determine the notification timeout:

brick/cmd/brick/notify.go

Lines 399 to 404 in ddc57ab

timeoutValue := config.GetNotificationTimeout(
sendTimeout,
nextScheduledNotification,
retries,
retriesDelay,
)

To put it another way, the NotifyMgrTeamsNotificationDelay and retry delay values have different purposes, but are related to each other. The API-limit focused NotifyMgrTeamsNotificationDelay value is honored by implementing the scheduler and spacing out message delivery attempts until the schedule has been met. Provided that the last schedule + NotifyMgrTeamsNotificationDelay hasn't elapsed, the next notification can be sent immediately without waiting a full NotifyMgrTeamsNotificationDelay amount of time.

@atc0005
Copy link
Owner Author

atc0005 commented Jul 10, 2020

Ignoring the fact that there is likely a much better way to handle this (e.g., use a type+method instead of a standalone function), this is the proposed "fix" for the confusion I ran into earlier:

		go teamsNotifier(
			ctx,
			cfg.TeamsWebhookURL(),
			config.NotifyMgrTeamsNotificationTimeout,
			config.NotifyMgrTeamsNotificationRateLimit,
			cfg.TeamsNotificationRetries(),
			cfg.TeamsNotificationRetryDelay(),
			teamsNotifyWorkQueue,
			teamsNotifyResultQueue,
			teamsNotifyDone,
		)
	}

I can then tweak the parameter names for teamsNotifier and emailNotifier (and any other related functions) if necessary to further emphasize the difference between the "rate limit" and "retry delay".

That too far too long to work out.

atc0005 added a commit that referenced this issue Jul 10, 2020
BACKSTORY

As I went to start work towards implementing email notification
support per GH-3, I was struck by how unclear the arguments to the
`emailNotifier` function (and similarly, the `teamsNotifier` function)
were. The difference between `config.NotifyMgrEmailNotificationDelay`
and `cfg.EmailNotificationDelay()` were not immediately clear and it
took longer than I'd like to admit to fully backtrack and confirm
not only the original intent, but the current behavior surrounding
those settings.

This commit attempts to clarify not only the original intent, but the
purpose behind each related portion of code surrounding what I know
understand to be handling the rate limit (formerly named
`config.NotifyMgrEmailNotificationDelay`) and notification retry
delay.

CHANGES

- documentation
  - README.md
    - update "feature" entry to emphasize configurable retries, but
      hard-coded notifications rate-limit (hopefully soon to change)
  - configure.md
    - updated description
    - updated command-line flag name
    - updated environment variable name
    - updated help text/description for flag
  - doc.go
    - explicitly note that it is the retry delay that is configurable

- configuration file
  - rename `delay` to `retry_delay`
  - add missing doc comments for each msteams section setting

- code
  - search/replace across codebase to replace `Delay` with `RetryDelay`
    - this includes renaming the `Delay` field for the `MSTeams` type,
      and its associated flag, environment variable and TOML config
      file setting
  - replace `sendDelay` parameter with `sendRateLimit` to better
    communicate what the incoming value to `teamsNotifier` and
    `emailNotifier` is intended for

  - doc comments
    - fixes to correct wrong information
    - fixes to better communicate the intent and real behavior

REFERENCES

- refs GH-85
@atc0005
Copy link
Owner Author

atc0005 commented Jul 11, 2020

Ultimately I think the cfg.EmailNotificationDelay() func used to be named cfg.EmailNotificationRetriesDelay() or cfg.EmailNotificationRetryDelay() (or the value referenced within was) and at that point it was probably clearer what was going on.

brick/cmd/brick/notify.go

Lines 645 to 655 in 30fdb08

go emailNotifier(
ctx,
config.NotifyMgrEmailTimeout,
config.NotifyMgrEmailNotificationDelay,
cfg.EmailNotificationRetries(),
cfg.EmailNotificationDelay(),
emailNotifyWorkQueue,
emailNotifyResultQueue,
emailNotifyDone,
)
}

FWIW, I found an old backup of the initital-prototype branch and pulled this out:

		go emailNotifier(
			ctx,
			config.NotifyMgrEmailTimeout,
			config.NotifyMgrEmailNotificationDelay,
			cfg.Retries,
			cfg.RetriesDelay,
			emailNotifyWorkQueue,
			emailNotifyResultQueue,
			emailNotifyDone,
		)

As I alluded to, the intent was previously clearer (particularly with the cfg.RetriesDelay name), but became less so when I attempted to work in getter functions and failed to keep in mind the specific goals.

atc0005 added a commit that referenced this issue Jul 11, 2020
BACKSTORY

As I went to start work towards implementing email notification
support per GH-3, I was struck by how unclear the arguments to the
`emailNotifier` function (and similarly, the `teamsNotifier` function)
were. The difference between `config.NotifyMgrEmailNotificationDelay`
and `cfg.EmailNotificationDelay()` were not immediately clear and it
took longer than I'd like to admit to fully backtrack and confirm
not only the original intent, but the current behavior surrounding
those settings.

This commit attempts to clarify not only the original intent, but the
purpose behind each related portion of code surrounding what I know
understand to be handling the rate limit (formerly named
`config.NotifyMgrEmailNotificationDelay`) and notification retry
delay.

CHANGES

- documentation
  - README.md
    - update "feature" entry to emphasize configurable retries, but
      hard-coded notifications rate limit (hopefully soon to change)
  - configure.md
    - updated description
    - updated command-line flag name
    - updated environment variable name
    - updated help text/description for flag
  - doc.go
    - explicitly note that it is the retry delay that is configurable

- configuration file
  - rename `delay` to `retry_delay`
  - add missing doc comments for each msteams section setting

- code
  - search/replace across codebase to replace `Delay` with `RetryDelay`
    - this includes renaming the `Delay` field for the `MSTeams` type,
      and its associated flag, environment variable and TOML config
      file setting
  - replace `sendDelay` parameter with `sendRateLimit` to better
    communicate what the incoming value to `teamsNotifier` and
    `emailNotifier` is intended for

  - doc comments
    - fixes to correct wrong information
    - fixes to better communicate the intent and real behavior

REFERENCES

- refs GH-85
atc0005 added a commit that referenced this issue Jul 11, 2020
BACKSTORY

As I went to start work towards implementing email notification
support per GH-3, I was struck by how unclear the arguments to the
`emailNotifier` function (and similarly, the `teamsNotifier` function)
were. The difference between `config.NotifyMgrEmailNotificationDelay`
and `cfg.EmailNotificationDelay()` were not immediately clear and it
took longer than I'd like to admit to fully backtrack and confirm
not only the original intent, but the current behavior surrounding
those settings.

This commit attempts to clarify not only the original intent, but the
purpose behind each related portion of code surrounding what I know
understand to be handling the rate limit (formerly named
`config.NotifyMgrEmailNotificationDelay`) and notification retry
delay.

CHANGES

- documentation
  - README.md
    - update "feature" entry to emphasize configurable retries, but
      hard-coded notifications rate limit (hopefully soon to change)
  - configure.md
    - updated description
    - updated command-line flag name
    - updated environment variable name
    - updated help text/description for flag
  - doc.go
    - explicitly note that it is the retry delay that is configurable

- configuration file
  - rename `delay` to `retry_delay`
  - add missing doc comments for each msteams section setting

- code
  - search/replace across codebase to replace `Delay` with `RetryDelay`
    - this includes renaming the `Delay` field for the `MSTeams` type,
      and its associated flag, environment variable and TOML config
      file setting
  - replace `sendDelay` parameter with `sendRateLimit` to better
    communicate what the incoming value to `teamsNotifier` and
    `emailNotifier` is intended for

  - doc comments
    - fixes to correct wrong information
    - fixes to better communicate the intent and real behavior

REFERENCES

- refs GH-85
atc0005 added a commit that referenced this issue Jul 11, 2020
BACKSTORY

As I went to start work towards implementing email notification
support per GH-3, I was struck by how unclear the arguments to the
`emailNotifier` function (and similarly, the `teamsNotifier` function)
were. The difference between `config.NotifyMgrEmailNotificationDelay`
and `cfg.EmailNotificationDelay()` were not immediately clear and it
took longer than I'd like to admit to fully backtrack and confirm
not only the original intent, but the current behavior surrounding
those settings.

This commit attempts to clarify not only the original intent, but the
purpose behind each related portion of code surrounding what I know
understand to be handling the rate limit (formerly named
`config.NotifyMgrEmailNotificationDelay`) and notification retry
delay.

CHANGES

- documentation
  - README.md
    - update "feature" entry to emphasize configurable retries, but
      hard-coded notifications rate limit (hopefully soon to change)
  - configure.md
    - updated description
    - updated command-line flag name
    - updated environment variable name
    - updated help text/description for flag
  - doc.go
    - explicitly note that it is the retry delay that is configurable

- configuration file
  - rename `delay` to `retry_delay`
  - add missing doc comments for each msteams section setting

- code
  - search/replace across codebase to replace `Delay` with `RetryDelay`
    - this includes renaming the `Delay` field for the `MSTeams` type,
      and its associated flag, environment variable and TOML config
      file setting
  - replace `sendDelay` parameter with `sendRateLimit` to better
    communicate what the incoming value to `teamsNotifier` and
    `emailNotifier` is intended for

  - doc comments
    - fixes to correct wrong information
    - fixes to better communicate the intent and real behavior

REFERENCES

- refs GH-85
atc0005 added a commit that referenced this issue Jul 11, 2020
BACKSTORY

As I went to start work towards implementing email notification
support per GH-3, I was struck by how unclear the arguments to the
`emailNotifier` function (and similarly, the `teamsNotifier` function)
were. The difference between `config.NotifyMgrEmailNotificationDelay`
and `cfg.EmailNotificationDelay()` were not immediately clear and it
took longer than I'd like to admit to fully backtrack and confirm
not only the original intent, but the current behavior surrounding
those settings.

This commit attempts to clarify not only the original intent, but the
purpose behind each related portion of code surrounding what I now
understand to be handling the rate limit (formerly named
`config.NotifyMgrEmailNotificationDelay`) and notification retry
delay.

CHANGES

- documentation
  - README.md
    - update "feature" entry to emphasize configurable retries, but
      hard-coded notifications rate limit (hopefully soon to change)
  - configure.md
    - updated description
    - updated command-line flag name
    - updated environment variable name
    - updated help text/description for flag
  - doc.go
    - explicitly note that it is the retry delay that is configurable

- configuration file
  - rename `delay` to `retry_delay`
  - add missing doc comments for each msteams section setting

- code
  - search/replace across codebase to replace `Delay` with `RetryDelay`
    - this includes renaming the `Delay` field for the `MSTeams` type,
      and its associated flag, environment variable and TOML config
      file setting
  - replace `sendDelay` parameter with `sendRateLimit` to better
    communicate what the incoming value to `teamsNotifier` and
    `emailNotifier` is intended for

  - doc comments
    - fixes to correct wrong information
    - fixes to better communicate the intent and real behavior

REFERENCES

- refs GH-85
atc0005 added a commit that referenced this issue Jul 11, 2020
BACKSTORY

As I went to start work towards implementing email notification
support per GH-3, I was struck by how unclear the arguments to the
`emailNotifier` function (and similarly, the `teamsNotifier` function)
were. The difference between `config.NotifyMgrEmailNotificationDelay`
and `cfg.EmailNotificationDelay()` were not immediately clear and it
took longer than I'd like to admit to fully backtrack and confirm
not only the original intent, but the current behavior surrounding
those settings.

This commit attempts to clarify not only the original intent, but
the purpose behind each related portion of code surrounding what
I now understand to be handling the rate limit and notification
retry delay.

CHANGES

- documentation
  - README.md
    - update "feature" entry to emphasize configurable retries, but
      hard-coded notifications rate limit (hopefully soon to change)
  - configure.md
    - updated description
    - updated command-line flag name
    - updated environment variable name
    - updated help text/description for flag
  - doc.go
    - explicitly note that it is the retry delay that is configurable

- configuration file
  - rename `delay` to `retry_delay`
  - add missing doc comments for each msteams section setting

- code
  - search/replace across codebase to replace `Delay` with `RetryDelay`
    - this includes renaming the `Delay` field for the `MSTeams` type,
      and its associated flag, environment variable and TOML config
      file setting
  - replace `sendDelay` parameter with `sendRateLimit` to better
    communicate what the incoming value to `teamsNotifier` and
    `emailNotifier` is intended for

  - doc comments
    - fixes to correct wrong information
    - fixes to better communicate the intent and real behavior

REFERENCES

- refs GH-85
atc0005 added a commit that referenced this issue Jul 11, 2020
SUMMARY

As a follow-up to GH-87, this commit exposes a new setting that
allows directly configuring the rate limit used for Teams
notifications. GH-3 will be responsible for exposing the same
setting for email notifications.

CHANGES

- update documentation to note new setting
  - README
  - GoDoc
  - configure

- configuration file
  - new `rate_limit` setting

- code
  - remove rate limit constants in favor of external settings (email
    rate limit setting scheduled for GH-3) and default const value if
    not specified externally
  - new `TeamsNotificationRateLimit` getter func
  - new `EmailNotificationRateLimit` getter func
    - stub only; further work scheduled for GH-3

REFERENCES

refs GH-87, GH-85, GH-19, GH-3
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant