This repository has been archived by the owner on Nov 2, 2023. It is now read-only.

Performance monitoring #156

Merged
merged 32 commits into from
Oct 22, 2020

@Julio-Guerra Julio-Guerra commented Sep 23, 2020

Monitor the execution time of requests protected by Sqreen. Optionally, it is now possible to set the maximum amount of time Sqreen is allowed to run per request: Sqreen's monitoring and protections will only run for the given amount of time. This option is disabled by default and should be used with caution as it can lead to partially protected requests.

The resulting performance monitoring diagrams and setting are available at https://my.sqreen.com/application/goto/settings/performance.

Note that the execution time monitoring diagram cannot be used as a strict Application Performance Monitoring diagram as it is based on a lossy exponential time-interval representation.
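
To picture why such a representation is lossy, here is a minimal sketch of exponential binning. The function name `binIndex`, the doubling factor, and the 1ms unit are assumptions made for illustration; they are not taken from the agent's code.

```go
package main

import (
	"fmt"
	"time"
)

// binIndex returns the exponential bin a duration falls into.
// Bin 0 holds durations shorter than unit; bin i (i >= 1) holds durations in
// [unit*2^(i-1), unit*2^i). The loop stops if the doubled bound overflows
// (it then becomes negative), and a non-positive unit always yields bin 0.
func binIndex(d, unit time.Duration) int {
	i := 0
	for bound := unit; bound > 0 && d >= bound; bound *= 2 {
		i++
	}
	return i
}

func main() {
	samples := []time.Duration{
		300 * time.Microsecond,
		1200 * time.Microsecond,
		1900 * time.Microsecond,
		5 * time.Millisecond,
	}
	histogram := map[int]int{}
	for _, d := range samples {
		histogram[binIndex(d, time.Millisecond)]++
	}
	// 1.2ms and 1.9ms share bin 1: only the bin count survives, not the values.
	fmt.Println(histogram)
}
```

Only the per-bin counts are kept, so the diagram can say that a request took between 1ms and 2ms in this sketch, but not how long exactly.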

Julio Guerra added 10 commits October 14, 2020 16:44
Add a new metrics type for binning metrics and explicitly rename the
previous sum metrics store types to make the API clearer.

Add a thread-safe shared stopwatch implementation accounting for the time
elapsed between the first goroutine starting it and the last one stopping it.
It will be used to compute Sqreen's execution time per request for the simplest
performance monitoring level 1. It reads the clock less frequently than
multiple detailed timers would.
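
A shared stopwatch of this kind could look like the following minimal sketch. The type `SharedStopwatch` and its methods are hypothetical names chosen for illustration, and the agent's actual implementation may differ.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// SharedStopwatch accumulates the wall-clock time during which at least one
// goroutine is between Start and Stop. Hypothetical type, for illustration.
type SharedStopwatch struct {
	mu      sync.Mutex
	running int           // goroutines currently between Start and Stop
	started time.Time     // when running went from 0 to 1
	total   time.Duration // time accumulated over completed intervals
}

// Start reads the clock only when the first goroutine starts the stopwatch.
func (s *SharedStopwatch) Start() {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.running == 0 {
		s.started = time.Now()
	}
	s.running++
}

// Stop reads the clock only when the last goroutine stops the stopwatch.
func (s *SharedStopwatch) Stop() {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.running == 0 {
		return // unbalanced Stop: ignore it
	}
	s.running--
	if s.running == 0 {
		s.total += time.Since(s.started)
	}
}

// Total returns the time accumulated so far over completed intervals.
func (s *SharedStopwatch) Total() time.Duration {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.total
}

func main() {
	var sw SharedStopwatch
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			sw.Start()
			defer sw.Stop()
			time.Sleep(10 * time.Millisecond)
		}()
	}
	wg.Wait()
	fmt.Println("time spent:", sw.Total())
}
```

In this sketch, overlapping goroutines cost two clock reads per busy interval instead of two per goroutine, which is the saving the commit message refers to.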
Introduce a complete callback framework providing a fully managed and abstracted
callback-implementation API. Two new interfaces were introduced for that:

1. The RuleContext is the interface with the security rule the callback is
   serving. It hides implementation details such as the rule name, the blocking
   mode, etc. This context is provided at callback instantiation.

2. The CallbackContext is the result of a RuleContext within a given
   ProtectionContext. It is obtained by calling the RuleContext's Pre() and
   Post() methods, which both expect a function closure taking the
   CallbackContext as its argument. This is how the callback gets wrapped with
   rule- and protection-specific features. The closure also bridges the
   callback with the hooked function, for example to set its return values
   when blocking.

This patch ports every callback to this new architecture.
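
The shape of the two interfaces described above can be sketched as follows. Everything here is an assumption made for illustration: the method `HandleAttack`, the toy `rule` type, and the naive detection logic do not come from the agent's code, and the ProtectionContext is left out to keep the sketch short.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// CallbackContext is what a callback sees for one call of the hooked function.
type CallbackContext interface {
	// HandleAttack records an attack and reports whether the call must be blocked.
	HandleAttack(info string) (block bool)
}

// RuleContext is handed to the callback when it is instantiated.
type RuleContext interface {
	Pre(func(CallbackContext))
	Post(func(CallbackContext))
}

// rule is a toy RuleContext: its name and blocking mode stay hidden from callbacks.
type rule struct {
	name     string
	blocking bool
}

type callbackContext struct{ r *rule }

func (c callbackContext) HandleAttack(info string) bool {
	fmt.Printf("rule %s: attack detected: %s\n", c.r.name, info)
	return c.r.blocking
}

func (r *rule) Pre(f func(CallbackContext))  { f(callbackContext{r}) }
func (r *rule) Post(f func(CallbackContext)) { f(callbackContext{r}) }

var errBlocked = errors.New("query blocked")

// newQueryCallback is instantiated once with its RuleContext and returns the
// prolog to run before the hooked function.
func newQueryCallback(r RuleContext) func(query string) error {
	return func(query string) (err error) {
		r.Pre(func(c CallbackContext) {
			// Naive detection, for illustration only.
			if strings.Contains(query, "--") && c.HandleAttack(query) {
				err = errBlocked // bridge: sets the value handed back to the hooked function
			}
		})
		return err
	}
}

func main() {
	prolog := newQueryCallback(&rule{name: "sqli", blocking: true})
	fmt.Println(prolog("SELECT 1"))
	fmt.Println(prolog("SELECT 1 -- comment"))
}
```

The callback never reads the rule name or the blocking mode itself: it only asks the context what to do, which is the kind of encapsulation the commit message describes.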

Simplify the protection API and its expectations regarding the agent. These
improvements are still not perfect.
@Julio-Guerra Julio-Guerra added this to the v1.0.0 milestone Oct 22, 2020
@Julio-Guerra Julio-Guerra self-assigned this Oct 22, 2020
@Julio-Guerra Julio-Guerra added enhancement New feature or request internals Internal feature labels Oct 22, 2020
@Julio-Guerra Julio-Guerra merged commit e4f8db2 into dev Oct 22, 2020
@Julio-Guerra Julio-Guerra deleted the feature/performance-monitoring branch October 22, 2020 20:26
@Julio-Guerra Julio-Guerra added the agent Agent feature label Nov 19, 2020
@Julio-Guerra Julio-Guerra mentioned this pull request Nov 19, 2020
Julio-Guerra pushed a commit that referenced this pull request Nov 19, 2020
New Features:

- **(#172) New SDK convenience function:**
  Add a new helper function `sdk.FromRequest()` to retrieve Sqreen's request
  context and perform SDK calls. It is equivalent to
  `sdk.FromContext(r.Context())`.

- **(#156) Performance monitoring:**
  Monitor the execution time of requests protected by Sqreen. Optionally, it is
  possible to enforce the maximum amount of time Sqreen is allowed to run per
  request: Sqreen's monitoring and protections will only run for the given
  amount of time. This option is disabled by default and should be used with
  caution as it can lead to partially protected requests.
  The resulting performance monitoring diagrams and setting are available at
  <https://my.sqreen.com/application/goto/settings/performance>.
  Note that the execution time diagram cannot be used as a strict Application
  Performance Monitoring diagram as it is based on a lossy representation. It
  gives rough estimates of the actual execution time.

- **(#170) Transparent response writer instrumentation:**
  Make the HTTP response writer instrumentation transparent by providing the
  same set of interfaces as the instrumented HTTP response writer. The set of
  interfaces is currently every optional `net/http` response writer interface,
  along with some relevant `io` interfaces, among which:
    - `http.Flusher`: for HTTP streaming support (multipart, chunked...).
    - `http.Pusher`: for HTTP/2 server push support.
    - `http.Hijacker`: for websocket server support (experimental).
    - `io.ReaderFrom`: for optimized copies (e.g. file copies).
    - `io.StringWriter`: for optimized string copies.

- **(#163) HTTP status code 404 (not found) monitoring:**
  Automatically log a security event when the response status code is 404. This
  event is used by an internal Sqreen backend playbook to detect security scans.

- **(#163) Scalable security event throughput:**
  To be able to handle a higher throughput of security events, the agent can now
  scale its number of goroutines. An extra goroutine is created every time the
  internal event queue is full, up to the number of available CPUs. Note that
  the agent still drops security events when the event queue is full in order to
  avoid slowing down the host application.

- **(#165) Agent errors in the request hot-path:**
  To avoid slowing down request handlers, agent errors happening in the request
  hot path are now logged based on an exponential backoff algorithm.
  This is disabled when the agent log level is `debug`.
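
As an illustration of the exponential backoff logging described in #165 above, here is a minimal sketch. The changelog does not say whether the backoff is based on occurrence counts or on time, so the count-based approach below is an assumption, as are the `BackoffLogger` type and its `ShouldLog` method.

```go
package main

import (
	"fmt"
	"sync"
)

// BackoffLogger decides whether an error occurrence should be logged.
// Hypothetical type: it logs occurrences 1, 2, 4, 8, ... and skips the rest.
type BackoffLogger struct {
	Debug bool // when true, the backoff is disabled and every error is logged

	mu    sync.Mutex
	count uint64 // occurrences seen so far
	next  uint64 // occurrence number of the next logged error
}

// ShouldLog reports whether the current occurrence should be logged.
func (b *BackoffLogger) ShouldLog() bool {
	if b.Debug {
		return true
	}
	b.mu.Lock()
	defer b.mu.Unlock()
	b.count++
	if b.next == 0 {
		b.next = 1 // zero value: log the very first occurrence
	}
	if b.count < b.next {
		return false
	}
	b.next *= 2 // wait twice as many occurrences before logging again
	return true
}

func main() {
	var b BackoffLogger
	for i := 1; i <= 20; i++ {
		if b.ShouldLog() {
			fmt.Printf("agent error (occurrence %d)\n", i)
		}
	}
}
```

Out of the 20 occurrences in `main`, only occurrences 1, 2, 4, 8 and 16 are logged, so a recurring error in the hot path costs a logarithmic number of log writes.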

Breaking Change:

- **(#168) SDK return values:**
  The SDK function and method return values are no longer pointer values but Go
  interface values. This may break integrations using explicit SDK return types,
  and we recommend using type inference instead when possible. This change will
  allow us to transparently change the actual return values without introducing
  any further breaking change.
  As of today, the actual return value is a structure small enough to be
  returned by value in order to save memory-allocation and garbage-collection
  time. Returning an interface value hides such implementation details.
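
The effect of this change on calling code can be sketched with stand-in types. The names `EventRecorder`, `record`, and `TrackEvent` below are hypothetical and only mimic the pattern; they are not the SDK's actual declarations.

```go
package main

import "fmt"

// EventRecorder is a hypothetical interface standing in for an SDK return type.
type EventRecorder interface {
	Describe() string
}

// record is the concrete value hidden behind the interface. It used to be
// returned as a pointer in this sketch's "before" scenario.
type record struct{ name string }

func (r record) Describe() string { return "event " + r.name }

// TrackEvent returns an interface value, so the concrete type can change freely.
func TrackEvent(name string) EventRecorder {
	return record{name: name}
}

func main() {
	// Type inference keeps working whatever the concrete type is.
	ev := TrackEvent("login")
	fmt.Println(ev.Describe())

	// An explicit pointer type would no longer compile:
	// var old *record = TrackEvent("login")
}
```

Code written with `:=` is unaffected, while code that spelled out the former pointer type has to be updated once.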

Fixes:

- **(#167) Playbook security response events:**
  Fix playbook security response events (blocking or redirecting a user or IP)
  so that Sqreen's dashboard can properly display them and link them to their
  source playbook.

- **(#169) SQL-injection protection with Elastic APM:**
  Fix the detection of the SQL dialect when the SQL driver is instrumented by
  Elastic's APM tracer. This requires Elastic's Go agent version greater than
  `v1.9.0`.

- **(#164) Echo middleware:**
  Fix the response status code monitoring when Echo's request handlers return an
  error.

- **(#166) Gin middleware:**
  Fix the response content-length monitoring of default responses
  (i.e. when the handler does nothing).