Introduce heartbeats #190

Merged: 25 commits, Jul 29, 2024
Changes from 11 commits
700260f
begin heartbeat work
jaronoff97 Apr 24, 2024
df7dd11
spec
jaronoff97 May 7, 2024
5bf4782
Merge branch 'main' of github.com:open-telemetry/opamp-spec into intr…
jaronoff97 May 7, 2024
5a10b74
Merge branch 'main' of github.com:open-telemetry/opamp-spec into intr…
jaronoff97 May 7, 2024
0655586
did the thing
jaronoff97 May 7, 2024
7cb4ecc
Apply suggestions from code review
jaronoff97 May 7, 2024
2ae4799
revert some things
jaronoff97 May 8, 2024
b24c948
Merge branch 'introduce-heartbeats' of github.com:jaronoff97/opamp-sp…
jaronoff97 May 8, 2024
85af35f
Add heartbeat capability
jaronoff97 May 10, 2024
ac174a7
Update from feedback
jaronoff97 Jun 24, 2024
3c92d98
Merge branch 'main' of github.com:open-telemetry/opamp-spec into intr…
jaronoff97 Jun 24, 2024
d6abecf
feedback
jaronoff97 Jul 9, 2024
d49a956
update from feedback
jaronoff97 Jul 10, 2024
329ff3d
Update specification.md
jaronoff97 Jul 10, 2024
6a9eeb7
remove optional in spec.md
jaronoff97 Jul 23, 2024
4c18536
Merge branch 'introduce-heartbeats' of github.com:jaronoff97/opamp-sp…
jaronoff97 Jul 23, 2024
1c2dc9f
Apply suggestions from code review
jaronoff97 Jul 26, 2024
e0b4331
updates
jaronoff97 Jul 26, 2024
8534843
update retry after
jaronoff97 Jul 26, 2024
0d0ee53
more minor changes
jaronoff97 Jul 26, 2024
082c1d7
Apply suggestions from code review
jaronoff97 Jul 26, 2024
bccefa0
Updates from PR feedback for better linking
jaronoff97 Jul 29, 2024
7d50075
Merge branch 'introduce-heartbeats' of github.com:jaronoff97/opamp-sp…
jaronoff97 Jul 29, 2024
88d6b9f
Apply suggestions from code review
jaronoff97 Jul 29, 2024
85e8f08
remove extra space
jaronoff97 Jul 29, 2024
18 changes: 17 additions & 1 deletion proto/opamp.proto
@@ -264,6 +264,19 @@ message OpAMPConnectionSettings {
// This field is optional: if omitted the client SHOULD NOT use a client-side certificate.
// This field can be used to perform a client certificate revocation/rotation.
TLSCertificate certificate = 3;

// The Agent MUST periodically send an AgentToServer message if the
// AgentCapabilities_ReportsHeartbeat capability is true. At a minimum the instance_uid
// field MUST be set. It is recommended that the Agent also set ComponentHealth.
//
// A Polling-based HTTP Client MUST use the value as polling interval.
//
// A heartbeat is used to keep a load balancer connection active and inform the server that the Agent
// is still alive and active.
//
// This field is optional:
// if the capability is true but this field has no value, a default heartbeat interval of 30 seconds should be used.
optional uint64 heartbeat_interval_seconds = 4;
}

// The TelemetryConnectionSettings message is a collection of fields which comprise an
@@ -635,7 +648,10 @@ enum AgentCapabilities {
AgentCapabilities_ReportsHealth = 0x00000800;
// The Agent will report RemoteConfig status via AgentToServer.remote_config_status field.
AgentCapabilities_ReportsRemoteConfig = 0x00001000;

// The Agent will report heartbeats on a default interval of 30s.
// This is specified by the ServerToAgent.OpAMPConnectionSettings.heartbeat_interval_seconds field.
// Status: [Beta]
AgentCapabilities_ReportsHeartbeat = 0x00002000;
// Add new capabilities here, continuing with the least significant unused bit.
}
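
The new capability is a single bit in the AgentCapabilities bitmask. As a minimal sketch of how a receiver might test for it (Python is used only for illustration; the constant and function names are hypothetical, and only the bit values come from the enum above):

```python
# Bit values copied from the AgentCapabilities enum in this diff.
# The Python names are illustrative, not part of any OpAMP library.
REPORTS_HEALTH = 0x00000800
REPORTS_REMOTE_CONFIG = 0x00001000
REPORTS_HEARTBEAT = 0x00002000


def reports_heartbeat(capabilities: int) -> bool:
    """Return True if the ReportsHeartbeat bit is set in the bitmask."""
    return (capabilities & REPORTS_HEARTBEAT) != 0


if __name__ == "__main__":
    caps = REPORTS_HEALTH | REPORTS_HEARTBEAT
    print(reports_heartbeat(caps))
    print(reports_heartbeat(REPORTS_REMOTE_CONFIG))
```

Because each capability occupies its own bit, the check is unaffected by whichever other capabilities the Agent reports.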

42 changes: 34 additions & 8 deletions specification.md
@@ -111,6 +111,7 @@ Status: [Beta]
- [OpAMPConnectionSettings.destination_endpoint](#opampconnectionsettingsdestination_endpoint)
- [OpAMPConnectionSettings.headers](#opampconnectionsettingsheaders)
- [OpAMPConnectionSettings.certificate](#opampconnectionsettingscertificate)
- [OpAMPConnectionSettings.heartbeat_interval_seconds](#opampconnectionsettingsheartbeat_interval_seconds)
+ [TelemetryConnectionSettings](#telemetryconnectionsettings)
- [TelemetryConnectionSettings.destination_endpoint](#telemetryconnectionsettingsdestination_endpoint)
- [TelemetryConnectionSettings.headers](#telemetryconnectionsettingsheaders)
@@ -229,6 +230,7 @@ OpAMP supports the following functionality:
[OTLP](https://opentelemetry.io/docs/specs/otlp/)-compatible
backend to monitor Agent's process metrics such as CPU or RAM usage, as well
as Agent-specific metrics such as rate of data processing.
* Agent heartbeating.
* Management of downloadable Agent-specific packages.
* Secure auto-updating capabilities (both upgrading and downgrading of the
Agents).
@@ -357,8 +359,7 @@ The format of each WebSocket message is the following:
```

The unencoded `header` is a 64 bit unsigned integer. In the WebSocket message the 64 bit
unencoded `header` value is encoded into bytes using [Base 128 Varint](
https://developers.google.com/protocol-buffers/docs/encoding#varints) format. The
unencoded `header` value is encoded into bytes using [Base 128 Varint](https://developers.google.com/protocol-buffers/docs/encoding#varints) format. The
number of the bytes that the encoded `header` uses depends on the value of unencoded
`header` and can be anything between 1 and 10 bytes.
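
The 1 to 10 byte range follows from Base 128 Varint encoding, which stores 7 payload bits per byte. A minimal sketch, assuming the standard Protobuf varint rules (the function names are hypothetical and not taken from any OpAMP implementation):

```python
from typing import Tuple


def encode_varint(value: int) -> bytes:
    """Encode an unsigned 64-bit integer as a Base 128 Varint."""
    if not 0 <= value < 2**64:
        raise ValueError("value must fit in an unsigned 64-bit integer")
    out = bytearray()
    while True:
        byte = value & 0x7F          # low 7 bits carry the payload
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes) -> Tuple[int, int]:
    """Return (value, bytes_consumed) for the varint at the start of buf.

    This sketch does not reject a 10th byte that overflows 64 bits.
    """
    result = 0
    for i, byte in enumerate(buf[:10]):  # a 64-bit varint is at most 10 bytes
        result |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:              # continuation bit clear: last byte
            return result, i + 1
    raise ValueError("truncated or overlong varint")


if __name__ == "__main__":
    print(encode_varint(0).hex())
    print(len(encode_varint(2**64 - 1)))
    print(decode_varint(encode_varint(300) + b"payload"))
```

A decoder stops at the first byte whose high bit is clear, which is how it can separate the `header` from the `data` bytes that follow.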

@@ -369,8 +370,7 @@ compliant with this specification SHOULD check that the value of the `header` is
to 0 and if it is not SHOULD assume that the WebSocket message is malformed.

The `data` field contains the bytes that represent the AgentToServer or ServerToAgent
message encoded in [Protobuf binary wire format](
https://developers.google.com/protocol-buffers/docs/encoding).
message encoded in [Protobuf binary wire format](https://developers.google.com/protocol-buffers/docs/encoding).

Note that both `header` and `data` fields contain a variable number of bytes.
The decoding Base 128 Varint algorithm for the `header` knows when to stop based on the
@@ -417,6 +417,11 @@ message may also be sent by the Client in response to the Server making a remote
configuration offer to the Agent and Agent reporting that it accepted the
configuration.

If the Client has enabled the ReportsHeartbeat capability, the WebSocket transport
will send a heartbeat message to keep the WebSocket connection alive. By default,
a 30s interval is used. Without heartbeats, the WebSocket connection may be closed
unexpectedly by the network if it idles for too long.

See sections under the [Operation](#operation) section for the details of the
message sequences.

@@ -444,9 +449,12 @@ deliver to the Agent (such as for example a new remote configuration).

The default polling interval when the Agent does not have anything to deliver is 30
seconds. This polling interval SHOULD be configurable on the Client.
If the server sets OpAMPConnectionSettings.heartbeat_interval_seconds, the client MUST
use that for its polling interval.
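
One way to read the rule above is as a precedence order: a server-offered value wins, then any client-side configuration, then the 30 second default. The sketch below assumes that ordering; the function and parameter names are hypothetical:

```python
from typing import Optional

DEFAULT_POLLING_INTERVAL_SECONDS = 30  # default stated in the specification


def polling_interval_seconds(
    offered: Optional[int],
    client_configured: Optional[int] = None,
) -> int:
    """Pick the polling interval for a polling-based HTTP Client.

    offered: heartbeat_interval_seconds from OpAMPConnectionSettings,
        or None if the Server did not set it.
    client_configured: a locally configured interval, or None.
    """
    if offered is not None:
        return offered  # the Client MUST use the Server's value
    if client_configured is not None:
        return client_configured
    return DEFAULT_POLLING_INTERVAL_SECONDS


if __name__ == "__main__":
    print(polling_interval_seconds(None))
    print(polling_interval_seconds(10, client_configured=60))
    print(polling_interval_seconds(None, client_configured=60))
```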

When using HTTP transport the sequence of messages is exactly the same as it is
when using the WebSocket transport. The only difference is in the timing:

- When the Server wants to send a message to the Agent, the Server needs to wait
for the Client to poll the Server and establish an HTTP request over which the Server's
message can be sent back as an HTTP response.
@@ -579,7 +587,10 @@ enum AgentCapabilities {
ReportsHealth = 0x00000800;
// The Agent will report RemoteConfig status via AgentToServer.remote_config_status field.
ReportsRemoteConfig = 0x00001000;

// The Agent can report heartbeats on a default interval of 30s.
// This is specified by the ServerToAgent.OpAMPConnectionSettings.heartbeat_interval_seconds field.
// Status: [Beta]
ReportsHeartbeat = 0x00002000;
// Add new capabilities here, continuing with the least significant unused bit.
}
```
@@ -923,7 +934,7 @@ message ServerToAgentCommand {
```

The ServerToAgentCommand message is sent when the Server wants the Agent to restart.
This message must only contain the command, instance_uid, and capabilities fields. All other fields
This message must only contain the command, instance_uid, and capabilities fields. All other fields
will be ignored.

## Operation
@@ -1127,8 +1138,8 @@ runs.
The following attributes SHOULD be included:

- os.type, os.version - to describe where the Agent runs.
- host.* to describe the host the Agent runs on.
- cloud.* to describe the cloud where the host is located.
- host.\* to describe the host the Agent runs on.
- cloud.\* to describe the cloud where the host is located.
- any other relevant Resource attributes that describe this Agent and the
environment it runs in.
- any user-defined attributes that the end user would like to associate with
@@ -1606,6 +1617,7 @@ connection types.
```

The sequence is the following:

- (1) The Client connects to the Server. The Client SHOULD use regular TLS and validate
the Server's identity. The Agent may also use a bootstrap client certificate that is
already trusted by the Server. (Note: the distribution and installation method of
@@ -1830,6 +1842,7 @@ message OpAMPConnectionSettings {
string destination_endpoint = 1;
Headers headers = 2;
TLSCertificate certificate = 3;
optional uint64 heartbeat_interval_seconds = 4;
}
```

@@ -1855,6 +1868,19 @@ for this connection.
This field is optional: if omitted the client SHOULD NOT use a client-side certificate.
This field can be used to perform a client certificate revocation/rotation.

##### OpAMPConnectionSettings.heartbeat_interval_seconds

If the ReportsHeartbeat capability is true, the Client MUST use the offered heartbeat
interval to periodically send an AgentToServer message. At a minimum the instance_uid
field MUST be set. It is recommended that the Agent also set ComponentHealth.
An HTTP-based client MUST use the heartbeat interval as its polling interval.

A heartbeat is used to keep a load balancer connection active and inform the server that
the Agent is still alive and active. A server could use the heartbeat to make decisions about
the liveness of the connected Agent.

A default of 30s should be used if the interval is not set in OpAMPConnectionSettings.
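
The behaviour described in this section can be sketched from both sides of the connection. Everything below is an illustration under stated assumptions: the plain dict stands in for an AgentToServer message, the function names are hypothetical, and the `missed_allowed` grace factor is an assumption that the specification does not define.

```python
from typing import Optional

DEFAULT_HEARTBEAT_INTERVAL_SECONDS = 30  # default stated in the specification


def effective_interval(offered: Optional[int]) -> int:
    """Use the offered interval, or the 30s default when it is not set."""
    return DEFAULT_HEARTBEAT_INTERVAL_SECONDS if offered is None else offered


def heartbeat_due(last_sent: float, now: float, offered: Optional[int]) -> bool:
    """Client side: has a full interval elapsed since the last message?"""
    return now - last_sent >= effective_interval(offered)


def minimal_heartbeat(instance_uid: bytes) -> dict:
    """Stand-in for an AgentToServer message with only instance_uid set."""
    return {"instance_uid": instance_uid}


def seems_alive(
    last_seen: float,
    now: float,
    offered: Optional[int],
    missed_allowed: int = 3,  # assumption: tolerate a few missed heartbeats
) -> bool:
    """Server side: one possible liveness check based on heartbeat age."""
    return now - last_seen <= effective_interval(offered) * missed_allowed


if __name__ == "__main__":
    print(heartbeat_due(last_sent=0.0, now=30.0, offered=None))
    print(minimal_heartbeat(b"\x01" * 16))
    print(seems_alive(last_seen=0.0, now=91.0, offered=None))
```

Timestamps are passed in rather than read from a clock, which keeps the checks deterministic and easy to test.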

Review comment (Member):
As brought up in #183, should we write a note about clients respecting values negotiated from the server or coming from a backpressure mechanism (eg. a Retry-After header?)


#### TelemetryConnectionSettings

The TelemetryConnectionSettings message is a collection of fields which comprise an