Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Security Telemetry Refactor #109875

Merged
merged 12 commits into from
Sep 1, 2021
Merged

Security Telemetry Refactor #109875

merged 12 commits into from
Sep 1, 2021

Conversation

pjhampton
Copy link
Contributor

@pjhampton pjhampton commented Aug 24, 2021

Summary

Maturing the security solutions use of event and batch-based telemetry to improve user protections on the endpoint agent.

Checklist

Delete any items that are not applicable to this PR.

For maintainers

@pjhampton pjhampton added Feature:Telemetry release_note:fix Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. release_note:skip Skip the PR/issue when compiling release notes auto-backport Deprecated - use backport:version if exact versions are needed v7.16.0 v8.0.0 and removed release_note:fix labels Aug 24, 2021
@@ -0,0 +1,274 @@
import { ElasticsearchClient, Logger, SavedObjectsClient } from 'src/core/server';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea, IMO, is that this module is going to capture ES queries inside the telemetry library. Let me know if this is a bad idea -- generally seemed good to abstract this portion from the actual transmission.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a good idea in theory, but the execution is problematic and doesn't set us up to adequately test this.
A more effective way of writing this code would be to invert control and pass the TelemetryQuerier as a dependency to the TelemetrySender class.

const telemetryQuerier = new TelemetryQuerier(...deps)
const telemetrySender = new TelemetrySender(...deps, telemetryQuerier)

By doing this the sender no longer needs to reference an SO Client, ES Client, Fleet Interface, etc and will get us closer to a Single Responsibility telemetry sender.

const telemetryQuerier = new TelemetryQuerier(...deps)
const telemetrySender = new TelemetrySender(...deps)
const telemetryOrchestrator = new TelemetryOrchestrator(telemetryQuerier, telemetrySender, ...deps)

I also consider the term Querier is overloaded here - Receiver would be a better noun to describe the interface interactions.
Let's pair on this bit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah ok! That'll be a good starting point. I'm good with the inversion of control, and it's better as it removes coupling between the sender and the receiver. Where do you think the Queue should reside?

@donaherc
Copy link
Contributor

It seems to me that the primary modules to abstract from the sender are into:

  1. Query
  2. Filter
  3. Submission (sender)

I started to split these pieces out accordingly.

import { exceptionListItemToEndpointEntry } from './helpers';
import { TelemetryEvent, ESLicense, ESClusterInfo, GetEndpointListResponse } from './types';

export class TelemetryReceiver {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@donaherc I cleaned this class up a little and got the tests / type checks working again. Make sure to g pull

On working on this code, I'm not sure "Receiver" is the correct name. "Retriever" might be better. I think this because the way the data is being requested.

entity -> push -> receiver  -> sender -> ESTC
entity <- pull <- retriever -> sender -> ESTC

What do you reckon?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On meditation, I think that this might a why not both situation.
We have a receiver in the form of:

*Detection Engine* -> push -> receiver  -> sender -> ESTC

and a retreiver in:

* ES / SO / Fleet * <- pull <- retriever -> sender -> ESTC

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it's a little confusing. I think I'm comfortable renaming in the PR we discussed as a follow-on (Telemetry Orchestrator). In my first set of commits I was definitely perceiving the behavior in the second example as primary, and by lines/functions I think the majority of the complexity in that module is the "pull from ES" functionality. Since we've decided that the Orchestrator will own the Queue, it seems that it'd serve as the receiver for now, but it we determine that there's enough functionality there, we can break out orchestrator functionality under the paradigm that you've described above, and both Tasks and the Detection Engine (streaming) approach can use the portions of the Orchestrator they need.

private exceptionListClient?: ExceptionListClient;
private soClient?: SavedObjectsClientContract;
private readonly max_records = 10_000;
private maxQueueSize = 100;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've duplicated this value in this class and the sender. We should move this into a constant reference.

throw Error('elasticsearch client is unavailable: cannot retrieve cluster infomation');
}

return this.getClusterInfo(this.esClient);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can / should collapse the private func getClusterInfo into this function as it is only called from here and nowhere else.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good.

}

try {
const ret = await this.getLicense(this.esClient, true);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here. I think we can get rid of getLicense and move the effect into this function. No point having these dispatches.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.

@pjhampton
Copy link
Contributor Author

Killing it 💪 @donaherc

@pjhampton
Copy link
Contributor Author

@elasticmachine merge upstream

@pjhampton pjhampton marked this pull request as ready for review September 1, 2021 10:03
@pjhampton pjhampton requested a review from a team as a code owner September 1, 2021 10:03
@pjhampton
Copy link
Contributor Author

@elasticmachine merge upstream

@elastic elastic deleted a comment from elasticmachine Sep 1, 2021
@kibanamachine
Copy link
Contributor

💚 Build Succeeded

Metrics [docs]

✅ unchanged

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @pjhampton @donaherc

@pjhampton pjhampton enabled auto-merge (squash) September 1, 2021 16:54
Copy link
Contributor

@donaherc donaherc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM, as @pjhampton and I paired on it. Some added context for other reviewers... We're taking an incremental approach to break up some chunky modules into more intuitive smaller chunks.

To that end, this PR pulls purpose-specific behavior from the sender module and spreads it into other modules (filter,receiver), with some bonus information hiding. We'll then, in a subsequent PR, wrap these modules in a coordinating orchestrator module and then hide the vast majority of the implementation inside the library itself so that both the Detection Engine and Task portions need to know less about the underlying implementation. This will greatly improve our ability to test portions of this in the future, as well as allow other portions of the code to use the telemetry in the future.

@pjhampton pjhampton merged commit 4f0a63f into master Sep 1, 2021
kibanamachine added a commit to kibanamachine/kibana that referenced this pull request Sep 1, 2021
* [@pjhampton/@donaherc] Move sec telem tasks into own package.

* Split filter out into its own module, started abstracting ES interaction into a queries module

* Implemented querier and fixed some types

* Updated tests, moved receiver to plugin from sender to decouple them.

* fixed integration in detection engine, misc fixes

* [@pjhampton] Fix type ref problems. Update test defs.

* Make url transformer a member func of the sender class.

* [@pjhampton] clean up receiver commentary.

* [@pjhampton] add null check consistency.

* Fix bad formatting.

Co-authored-by: cdonaher <[email protected]>
Co-authored-by: Kibana Machine <[email protected]>
@kibanamachine
Copy link
Contributor

💚 Backport successful

Status Branch Result
7.x

This backport PR will be merged automatically after passing CI.

kibanamachine added a commit that referenced this pull request Sep 2, 2021
* [@pjhampton/@donaherc] Move sec telem tasks into own package.

* Split filter out into its own module, started abstracting ES interaction into a queries module

* Implemented querier and fixed some types

* Updated tests, moved receiver to plugin from sender to decouple them.

* fixed integration in detection engine, misc fixes

* [@pjhampton] Fix type ref problems. Update test defs.

* Make url transformer a member func of the sender class.

* [@pjhampton] clean up receiver commentary.

* [@pjhampton] add null check consistency.

* Fix bad formatting.

Co-authored-by: cdonaher <[email protected]>
Co-authored-by: Kibana Machine <[email protected]>

Co-authored-by: Pete Hampton <[email protected]>
Co-authored-by: cdonaher <[email protected]>
pjhampton added a commit that referenced this pull request Sep 22, 2021
* Resolve merge conflict.

* Add lost functionality in merge commit.
@spalger spalger deleted the pjhampton/telemetry-refactor branch May 8, 2022 22:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-backport Deprecated - use backport:version if exact versions are needed Feature:Telemetry release_note:skip Skip the PR/issue when compiling release notes Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. v7.16.0 v8.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants