feat: added alertmanager client package #873

Closed
wants to merge 2 commits into from
19 changes: 13 additions & 6 deletions cmd/kured/main.go
@@ -83,12 +83,13 @@ var (
  postRebootNodeLabels []string
  nodeID string
  concurrency int
-
- rebootDays []string
- rebootStart string
- rebootEnd string
- timezone string
- annotateNodes bool
+ alertManagerURL string
+ alertManagerToken string
+ rebootDays []string
+ rebootStart string
+ rebootEnd string
+ timezone string
+ annotateNodes bool
Collaborator commented:
nit: Since I don't think we preserve the order in which vars were added in this section, we might as well sort them alphabetically?


// Metrics
rebootRequiredGauge = prometheus.NewGaugeVec(prometheus.GaugeOpts{
@@ -207,6 +208,11 @@ func NewRootCommand() *cobra.Command {
  rootCmd.PersistentFlags().StringVar(&messageTemplateReboot, "message-template-reboot", "Rebooting node %s",
  "message template used to notify about a node being rebooted")

+ rootCmd.PersistentFlags().StringVar(&alertManagerURL, "alert-manager-url", "",
+ "alertmanager URL for getting silencers")
+ rootCmd.PersistentFlags().StringVar(&alertManagerToken, "alert-manager-token", "",
+ "alertmanager token for authenticating")
+
  rootCmd.PersistentFlags().StringArrayVar(&podSelectors, "blocking-pod-selector", nil,
  "label selector identifying pods whose presence should prevent reboots")

@@ -387,6 +393,7 @@ func (pb PrometheusBlockingChecker) isBlocked() bool {
  if count > 10 {
  alertNames = append(alertNames[:10], "...")
  }
+
  if count > 0 {
  log.Warnf("Reboot blocked: %d active alerts: %v", count, alertNames)
  return true
2 changes: 2 additions & 0 deletions kured-ds-signal.yaml
@@ -98,3 +98,5 @@ spec:
  # - --annotate-nodes=false
  # - --lock-release-delay=30m
  # - --log-format=text
+ # - --alert-manager-url=""
+ # - --alert-manager-token=""
121 changes: 121 additions & 0 deletions pkg/alerts/alertmanager/client.go
@@ -0,0 +1,121 @@
package alertmanager

import (
"context"
"encoding/json"
"fmt"
"net/http"
"net/url"
"path"
"time"
)

const (
Collaborator commented:
Did you consider using the alertmanager cli go module instead?
Would it be more complex to maintain?

alertManagerPathPrefix = "/api/v2"
// the default context timeout for alert manager client
// feel free to change this value/set a corresponding env var if needed
defaultTimeOut = 10 * time.Second
)

// New is a constructor of AlertManagerClient
//
// if no url flag is given => error
func New(alertManagerURL, alertManagerToken string) (*Client, error) {
if alertManagerURL == "" {
return nil, fmt.Errorf("no alert manager url found")
}
return &Client{
Token: alertManagerToken,
HostURL: alertManagerURL,
Client: new(http.Client),
Member commented:
Please use cli-flags instead (they can be populated with env-variables as well)

Contributor Author replied:
Perfect, thanks for feedback. That's exactly what I needed to know!

Contributor Author replied:
Done

}, nil
}

// Status builds the Status endpoint
func (c *Client) Status() *StatusEndpoint {
Collaborator commented:
What's your intention behind this?

I missed where the StatusEndpoint (and its Get()) is used.

Do you intend to use Status() to alter the behaviour of kured? For example, if alertmanager is not reachable, Status would return an error, which would prevent a reboot?

I like the concept, I just don't understand where we are heading here (sorry if I missed a part of the code!)

return &StatusEndpoint{
Client: *c,
}
}

// Silences builds the Silences endpoint
func (c *Client) Silences() *SilencesEndpoint {
return &SilencesEndpoint{
Client: *c,
}
}

// BuildURL builds the full URL for Status Endpoint
func (s *StatusEndpoint) BuildURL() error {
url, err := url.Parse(s.HostURL)
if err != nil {
return err
}
url.Path = path.Join(alertManagerPathPrefix, "status")
s.FullURL = url.String()
return nil
}

// Get receives information about alert manager overall status
func (s *StatusEndpoint) Get() (*StatusResponse, error) {
err := s.BuildURL()
Collaborator commented:
Why does this BuildURL() method exist when the StatusEndpoint data structure already carries FullURL (which I suppose includes status)?

If FullURL is not intended to be the complete URL (status included) but only the base URL, then I guess having this logic could make sense.

However, in that case, I think I would implement it using the standard Go net/url package instead, because we have a recent enough version of Go. This allows us to remove BuildURL() completely.

For example, in StatusEndpoint's Get(), I would do url.JoinPath(s.baseUrl, "status") (see https://pkg.go.dev/net/url#JoinPath).

This automatically parses the URL and returns any parsing errors: https://cs.opensource.google/go/go/+/refs/tags/go1.20:src/net/url/url.go;l=1262.

Hence L61 of this file is still a one-liner, yet the whole BuildURL goes away.

On top of that, I feel Get() is easier to grasp if you have all the details there.

if err != nil {
return nil, err
}
ctx, cancel := context.WithTimeout(context.Background(), defaultTimeOut)
defer cancel()
request, err := http.NewRequestWithContext(ctx, http.MethodGet, s.FullURL, nil)
if err != nil {
return nil, err
}
request.Header.Add("Authentication", fmt.Sprintf("Bearer %s", s.Token))
Collaborator commented:
What if there is no auth? Do we still have to add the header?

For example, if the alertmanager token is empty.

response, err := s.Client.Client.Do(request)
if err != nil {
return nil, err
}
responseObject := new(StatusResponse)
err = json.NewDecoder(response.Body).Decode(responseObject)
ckotzbauer marked this conversation as resolved.
if err != nil {
return nil, err
}
return responseObject, nil
}

// BuildURL builds the full URL for silences Endpoint
func (s *SilencesEndpoint) BuildURL() error {
Collaborator commented:
Same reasoning as above.

url, err := url.Parse(s.HostURL)
if err != nil {
return err
}
url.Path = path.Join(alertManagerPathPrefix, "silences")
s.FullURL = url.String()
return nil
}

// Get lists the silences
func (s *SilencesEndpoint) Get() ([]GettableSilence, error) {
err := s.BuildURL()
if err != nil {
return nil, err
}
ctx, cancel := context.WithTimeout(context.Background(), defaultTimeOut)
defer cancel()
request, err := http.NewRequestWithContext(ctx, http.MethodGet, s.FullURL, nil)
if err != nil {
return nil, err
}
request.Header.Add("Authentication", fmt.Sprintf("Bearer %s", s.Token))
response, err := s.Client.Client.Do(request)
if err != nil {
return nil, err
}
responseObject := make([]GettableSilence, 0)
err = json.NewDecoder(response.Body).Decode(&responseObject)
if err != nil {
return nil, err
}
if err := ValidateStatus(responseObject); err != nil {
return nil, err
}
return responseObject, nil
}
ckotzbauer marked this conversation as resolved.
95 changes: 95 additions & 0 deletions pkg/alerts/alertmanager/types.go
@@ -0,0 +1,95 @@
package alertmanager

import (
"fmt"
"net/http"
)

var (
silenceStates = map[string]bool{"expired": true, "active": true, "pending": true}
Collaborator @evrardjp commented on May 18, 2024:

Maybe a comment or another data structure would help here, as the intent is not obvious (our goal should not be to have our own alertmanager client, just the minimum we need for the functionality required)

)

// Client is the object of the alert manager client
type Client struct {
Token string `json:"token" yaml:"token"`
HostURL string `json:"hostUrl" yaml:"hostUrl"`
Client *http.Client `json:"client" yaml:"client"`
}

// StatusEndpoint is the status endpoint of the alert manager client
type StatusEndpoint struct {
Client `json:"alertmanagerClient" yaml:"alertmanagerClient"`
FullURL string `json:"fullUrl" yaml:"fullUrl"`
}

// SilencesEndpoint is the silences endpoint of the alert manager client
type SilencesEndpoint struct {
Collaborator commented:
Isn't that the same structure?

If you still wish to use an incomplete URL and keep the different data types (did you think of interfaces?), you could then add a field to the structure for the relative endpoint, which would be "status" for StatusEndpoint. This could be filled by default with tags. It would improve readability IMO, though less than including the whole mumbo jumbo in the Get().

... On top of that, I prefer the endpoint being the full URL to be honest :) I avoid the temptation to guess.

Client `json:"alertmanagerClient" yaml:"alertmanagerClient"`
FullURL string `json:"fullUrl" yaml:"fullUrl"`
}

// StatusResponse is the object returned when sending GET $(host_url)$(path_prefix)/status request
type StatusResponse struct {
Cluster ClusterStatus `json:"cluster" yaml:"cluster"`
VersionInfo VersionInfo `json:"versionInfo" yaml:"versionInfo"`
Config Config `json:"alertmanagerConfig" yaml:"alertmanagerConfig"`
Uptime string `json:"uptime" yaml:"uptime"`
}

// ClusterStatus is the status of the cluster
type ClusterStatus struct {
Name string `json:"name" yaml:"name"`
Status string `json:"status" yaml:"status"`
Peers []PeerStatus `json:"peers" yaml:"peers"`
}

// PeerStatus is part of get status response
type PeerStatus struct {
Name string `json:"name" yaml:"name"`
Address string `json:"address" yaml:"address"`
}

// VersionInfo contains various go and alert manager version info
type VersionInfo struct {
Version string `json:"version" yaml:"version"`
Revision string `json:"revision" yaml:"revision"`
Branch string `json:"branch" yaml:"branch"`
BuildUser string `json:"buildUser" yaml:"buildUser"`
BuildData string `json:"buildData" yaml:"buildData"`
GoVersion string `json:"goVersion" yaml:"goVersion"`
}

// Config contains a string
type Config struct {
Original string `json:"original" yaml:"original"`
}

// GettableSilence is the response when sending GET $(host_url)$(path_prefix)/silences request
type GettableSilence struct {
ID string `json:"id" yaml:"id"`
Status SilenceStatus `json:"status" yaml:"status"`
UpdatedAt string `json:"updatedAt" yaml:"updatedAt"`
}

// SilenceStatus shows the state of the silence
type SilenceStatus struct {
State string `json:"state" yaml:"state"`
}

// Validate is validating if the status string corresponds to any of the pre-defined dict elements
func (s SilenceStatus) Validate() error {
if !silenceStates[s.State] {
return fmt.Errorf("such silence state does not exist: %s", s.State)
}
return nil
}

// ValidateStatus is checking the whole slice of GettableSilences if silence.status has the right values
func ValidateStatus(g []GettableSilence) error {
for _, silence := range g {
if err := silence.Status.Validate(); err != nil {
return err
}
}
return nil
}