Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds a builtin action for triggering webhooks #43538

Merged
merged 35 commits into from
Aug 23, 2019

Conversation

gmmorris
Copy link
Contributor

@gmmorris gmmorris commented Aug 19, 2019

Adds a builtin action for triggering Webhooks.
Closes #40026

This allows triggering a remote Webhook, almost providing feature parity with Watcher.
Remaining features available in Watcher will be implemented as generic Action abilities as part of https://github.com/elastic/kibana/projects/26#card-24087404

@elasticmachine
Copy link
Contributor

💚 Build Succeeded

}),
method: schema.oneOf(
[
schema.literal('head'),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

head seems pretty weird here, I guess maybe for testing or something? Likewise delete, and even get - I typically think of webhook as "url and request body that does something interesting", but obviously could be viewed more generally. Thinking we might just start with post and put, expand to other methods if there's an actual requirement for it - will at least cut down on testing :-) . Also, default to post which is what I think I usually see.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reduces to POST & PUT.
I considered keeping GET but a quick 20 minutes search found very very few webhooks implementing GET, so I think it's best to remove it for now, as it's very easy to bring back later if we need it.

function validateConfig(configObject: any): string | void {
const config: ActionTypeConfigType = configObject;

const { read_timeout: readTimeout, connection_timeout: connectionTimeout } = config;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

timeout is another one of these kinda general things, like http proxy, that I feel like we need to somehow expose higher, for all actions; and then actions could make use of it somehow, by passing that value into http calls as the timeout. There's a card open for this: https://github.com/elastic/kibana/projects/26#card-24087404

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we expect all actions to be HTTP style actions?
We don't want to end up with Proxy settings for some kind of more "internal" action that interacts with Kibana for example.

// secrets definition
export type ActionTypeSecretsType = TypeOf<typeof SecretsSchema>;
const SecretsSchema = schema.object({
username: schema.string(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking the other built-in types, I see we use user and password for email, wonder if we should informally standardize on that; we could then refactor that later into some reusable type (or just schema) ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup, I actually considered that and set it as username because that's what the Watcher task has.
I don't feel strongly about this either way, so I'll change that to user so it's the same in both actions.

expect(validateSecrets(actionType, secrets)).toEqual(secrets);
});

test('secrets validation fails when secret password is omitted', () => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still getting used to how jest and mocha ends up printing test results, and basically I think they append together all the parent describe() strings, and then the test() string, so the secrets validation isn't really needed here. A lot (all?) of the action tests have this kind of duplication, would be nice to eventually clean up sometime, but not a biggie, and more important for mocha tests which do not provide line numbers, just the appended describe/test string:-)

@gmmorris
Copy link
Contributor Author

gmmorris commented Aug 19, 2019 via email

@pmuellr
Copy link
Member

pmuellr commented Aug 19, 2019

I was under the impression we were aiming to start with feature parity to
Watcher

high-level parity; don't need every bell and whistle especially if the semantics aren't clear (eg, url, host, port). The ultimate out is that customers can write their own actions - probably copying the webhook source - if they need something more advanced.

At a bare minimum, we could use the Slack (and upcoming PagerDuty) simulators that run under FTS as example webhook receivers for webhooks functional tests. Can we build webhook actions in FTR that can successfully communicate with those server simulators (hosted by the FTS)?

@elasticmachine
Copy link
Contributor

💚 Build Succeeded

@elasticmachine
Copy link
Contributor

💚 Build Succeeded

@elasticmachine
Copy link
Contributor

💚 Build Succeeded

@elasticmachine
Copy link
Contributor

💚 Build Succeeded

@elasticmachine
Copy link
Contributor

💚 Build Succeeded

Copy link
Member

@pmuellr pmuellr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a bunch of comments. In general, looks great.

Per comments, I think we want to make sure our thrown errors don't include user data , and I'm worried about the @hapi/basic package usage.

Per comment, I'm also thinking we want to simplify the config url to just a string, instead of url | bag-of-url-components, but maybe there's an identified use case where we need this.


const HeadersSchema = schema.recordOf(schema.string(), schema.string());

export const CompositeUrlSchema = schema.object({
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wondering if we need to go this far, beyond just supporting a plain ole url string. Are there existing use cases where dealing with urls by their internal component is required?

If this ends up being overkill - maybe even just for now - I'd be happy with fixing it in a subsequent PR, as long as it makes it makes the 7.4 feature freeze on Tuesday ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not too fussed with the composite URL, as looking at examples online I've yet to find a case where the whole URL isn't good enough.
That said, this feature is already done, I don't feel it adds much cognitive load and isn't blocking us from making feature freeze.

I'll get the other points done and revisit this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm honestly wondering if there's a use case tho; I don't think there's anything you can describe with pieces that you can't with a url string, right?

It does add code, and it increases the API surface area that we will have to support forever.

I vote to kick it off the island :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'm leaning the same way.
Just want to get this PR merged as it's a 3 day weekend in the UK.

Since the CI takes an hour every time I don't want to push yet another change now.

I'll remove it on a follow up PR.


// secrets definition
export type ActionTypeSecretsType = TypeOf<typeof SecretsSchema>;
const SecretsSchema = schema.object({
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess authToken will go here, eventually. The schema validation gets a little more complicated, what I've done is to add a top-level validation function (look at email, I think) which makes sure the combinations are correct (must have a user AND pass, not just one). There are some config-schema helpers for these (sibling constraints) that I haven't played with, figuring keeping it explicit with some plain ole TS code was simplest.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, the authtoken (which is already done locally, haven't pushed yet as still having trouble integration testing it) is an extension to this type using a "oneOf" relationship.

I didn't follow the validation point - is there a reason why using the typesystem to enforce both values being specified isn't good enough? Ideally I'd like to use the typesystem to enforce logic rather than imperative code as it catches the invalidity earlier and is generally easier to maintain.

The only thing I can think of that might make this an issue is what you mentioned about the savedObjects semantics... which I don't know enough about yet, so if I'm missing something here, please do say. :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The validation code gets passed JSON.parse()'d object coming from http requests, so we can't depend on the type system for that, that I'm aware of :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That might be true at runtime, but doesn't doing it this way means that at compile time were type safe too?
The benefit would be that when we interact with our config object it's type safe and if you wish to trigger an action programmatically from within Kibana, not via the API, you still have types.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me, it comes down to whether I want to wrestle with the type system vs doing some explicit checks. One of the downsides of using config-schema is that there may be some unknown unknowns.

It's super-nice to be able to use a type from a schema, but I'm most interested in just getting the properties with the right types. Extra validation beyond that doesn't really need to be part of that contract. Here's the email config validation:

function validateConfig(configObject: any): string | void {
// avoids circular reference ...
const config: ActionTypeConfigType = configObject;
// Make sure service is set, or if not, both host/port must be set.
// If service is set, host/port are ignored, when the email is sent.
// Note, not currently making these message translated, as will be
// emitted alongside messages from @kbn/config-schema, which does not
// translate messages.
if (config.service == null) {
if (config.host == null && config.port == null) {
return 'either [service] or [host]/[port] is required';
}
if (config.host == null) {
return '[host] is required if [service] is not provided';
}
if (config.port == null) {
return '[port] is required if [service] is not provided';
}
} else {
// service is not null
if (!isValidService(config.service)) {
return `[service] value "${config.service}" is not valid`;
}
}
}

I'm happy with doing some extra validation like that.

OTOH, I'm also open to using union types for [{user,password}, {authToken}], to see what it looks like, once we get there. That can be in a different PR, if so, create an issue or project card to track.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it might be a matter of taste.

In my book, the the main value in types is that it gives the developer immediate feedback so they know whether what they're doing is legal or not.
If the type doesn't let you create an invalid object, you know immediately.
And since type systems are (usually) harder to break than business logic, I prefer to use it to express the constraints over imperative logic that is more likely to be changed when I, or someone else, has forgotten the context of the constraints.

But, as I said, it's a matter of taste.
I think I've become so accustomed to typed code now that I definitely feel more confident in them, than in my own BL.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My ideal is that the "JSON -> Typed" conversion happens right at the edge (as in, when the JSON input enters the server) and from there on out, throughout your pipeline, its using Typescript so that it's impossible to misuse the input.

But that's much harder to do when introducing types into an existing system.

status: 'ok',
data: response.data,
};
} else {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been trying to be nice about retries in other built-in actions; if you get an explicit 429, or >= 500, we return the retry: true property in the action result; when run with task manager (eg, as part of an alert group firing), this means there's some automatic retry support for these, otherwise there isn't. Slack returns a specific header with 429, but I think we only check that for slack, to get a specific retry time (if we don't know, return retry: true).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, I wasn't aware of this ability.

I've added this and unified the way Slack and Webhooks implement this.
This is mostly a refactoring, moving code around, but you'll note a small change in the Slack client which is that if you get a 429 we'll now retry with the default instead of dropping through to the generic error.
This feels right to me as a Too Many Requests is a client issue not server and so, we should encourage retrying after a short interval.
If the server is overloaded it'll likely respond with a 503/504, wouldn't you agree?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a server is overloaded, it's LB will return a 50x responses, usually :-). Not sure how much consistency there is for particular ones, I think my test is usually just >= 500

}
}

export { isOk, asOk, isErr, asErr, promiseResult };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there some reason the export for the functions is done this way, vs prefixing the relevant functions with export?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

haha stupid coding standard from my last company that I never liked.
Happy to change this (more than happy ;))

yarn.lock Outdated
@@ -2001,6 +2001,14 @@
normalize-path "^2.0.1"
through2 "^2.0.3"

"@hapi/basic@^5.1.1":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These seem a bit worrisome. I would have thought they would be in here already?

Hmm, I see we have an older hoek in the base package.json; is all this stuff compatible?

"hoek": "^5.0.4",

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably starting at #kibana-operations is a good place to start, to find out if it's going to be kosher to mix the existing non-namespaced base hapi with this namespaced add-on package, and if not, how to resolve.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure, as this is the first time I've ever used Hapi.
I assumed that this would only affect integration tests as it's installed by the Webhook simulator.
Is this not a safe assumption?

In anycase, I've removed it, as I don't actually need authentication, it was just a way of keepign the test code a little cleaner.
I now have a simple "extraction" scheme that gets the credentials out of Hapi when a call is made to the test endpoint.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this would only affect integration tests as it's installed by the Webhook simulator

Ya, just the tests. But the tests. You don't want to debug obscure cross-version package issues in the functional tests :-)

@@ -27,6 +27,7 @@
"**/@types/node": "10.12.27"
},
"devDependencies": {
"@hapi/basic": "^5.1.1",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some notes down at the yarn.lock file re: adding this devDep. Worried because we're using "old" hapi bits (I think - at least not the @hapi name-spaced ones), so a little concerned that we could have some incompatibilities.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed, no longer an issue.
But in general though, if all the tests pass and this is only used in "devDependencies" and enabled in an integration test... do we not feel same enough to add such things?

In this case it was avoidable, but in the future it might not be.

Just trying to gauge how confident we are in our testing process so I know where to focus my attention.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm generally comfortable adding devDeps - in this case it looked like it might have involved different versions of hapi - that's what got me concerned.

};
} else {
const message = i18n.translate('xpack.actions.builtin.webhook.invalidResponseErrorMessage', {
defaultMessage: `an http error {code} occurred in action "{id}", due to an unknown webhook event: {data}`,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These error messages can make their way back to the user. They will likely also end up in history / audit logs. So, we want to make them smallish, and we want to make sure we don't leak data. For my more recent actions, I've been pretty skimpy on the error messages, not really passing anything back but http status codes, and config-schema validation messages.

Basically, we likely don't want to include data in this message :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough, replaced with the status text.
I do see in Slack we do include the error.message in some cases though, is that not the same issue?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya, I've been learning as I'm going - we might be leaking something in the older actions; feel free to fix these up if you notice them.

I added a card standardize 3rd party http response handling ; I'm getting worried now that we'll need to make sure we return the http response in error results for things like 500 responses; for diagnostic purposes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call.
I tried merging a few of the things across Slack and Webhooks, but had to revert a bunch because our static i18n analysis failed miserably.

I'll have to revisit it.

if (error.response) {
// The request was made and the server responded with a status code
// that falls out of the range of 2xx
return `remote webhook failure`;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good to include the http status here.

Also, I guess these need to get translated?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I hadn't quite twigged the translation mechanism when i wrote that.
I've now unified the handlers for these cases and it's now possible to translate the "unreachable remote" which is the one case I don't actually have a status code/text for because Axios just omits the response object and request is an any, so who knows what they put in there.


// action executor
async function executor(execOptions: ActionTypeExecutorOptions): Promise<ActionTypeExecutorResult> {
const log = (msg: string) => execOptions.services.log(['warn', 'actions', 'webhook'], msg);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warn seems a bit loud for these. Check the other built-in actions, I think we were trying to be fairly quiet, and even errors invoking an action would only ever be warn anyway; but we don't want to be flooding the log with tons of messages. Maybe create a debugLog() which you can use liberally, and warnLog() for warn-level things?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha I was following your example from PageDuty ;)

Simplified now to warn on error, but debug the rest of the time.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh! I'll go look at PD - I might have had them artificially high during dev ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, looking at PD now looks like you only did it for errors, but as that was where I got the reference form I reused it.

You're all good. :)

const { user: username, password } = execOptions.secrets as ActionTypeSecretsType;
const { body: data } = execOptions.params as ActionParamsType;

const result: Result<AxiosResponse, AxiosError> = await promiseResult(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a nice pattern.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It helps make code so much cleaner, but I'm getting the handle on using it in TS.
I'll play around with how best to use it internally and share my findings.

@elasticmachine
Copy link
Contributor

💔 Build Failed

@elasticmachine
Copy link
Contributor

💚 Build Succeeded

return fromNullable(headers['retry-after'])
.map(retryAfter => parseInt(retryAfter, 10))
.filter(retryAfter => !isNaN(retryAfter))
.getOrElse(DEFAULT_RETRY_AFTER);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmm, the way I was thinking of handling 429/retry-after, is to NOT assume a default value, but return retry: true from the action, which is a hint to the caller (eg, alert runner) that the action should be retried, but we weren't told when, so the caller should decide.

With the default here, I guess, it would always return 60. I think it should return null.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed this issue in both Webhooks and Slack actions. 👍

Copy link
Member

@pmuellr pmuellr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@elasticmachine
Copy link
Contributor

💔 Build Failed

@elasticmachine
Copy link
Contributor

💚 Build Succeeded

@gmmorris gmmorris merged commit e8c50c0 into elastic:master Aug 23, 2019
gmmorris added a commit to gmmorris/kibana that referenced this pull request Aug 23, 2019
Adds the ability to trigger webhooks using an action.

This feature is currently locked off while we figure out the right privileges model.
pmuellr pushed a commit that referenced this pull request Aug 23, 2019
…3889)

Adds the ability to trigger webhooks using an action.

This feature is currently locked off while we figure out the right privileges model.
@njd5475 njd5475 mentioned this pull request Sep 13, 2019
19 tasks
@gmmorris gmmorris deleted the actions/webhooks branch November 21, 2019 14:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement Webhook action type
4 participants