Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

initial telemetry setup #69330

Merged
merged 3 commits into from
Jul 14, 2020
Merged

Conversation

michaelolo24
Copy link
Contributor

@michaelolo24 michaelolo24 commented Jun 16, 2020

Summary

For reference, the original issue: https://github.com/elastic/protections-team/issues/161

This PR looks to retain the following:

Number of active endpoints
Endpoints have protections enabled or not (malware for now) (TBD).
List by customer (provided by the telemetry service, license.issued_to)
OS type
OS version

With the exception of policy details (for now), the above information is tracked in savedObjects owned by the ingest_manager team. Given our inability to access our indices directly due to security concerns, this provides the best current alternative.

Proposed telemetry schema, but a work in progress depending on discussion here as well as ability to obtain policy details. This description and PR will be updated accordingly as those details are finalized.

Schema conversation (including detections work): https://github.com/elastic/telemetry/issues/392

  {
    total_installed: 17;
    active_within_last_24_hours: 12;
	os: [
	  {
	    full_name: "Windows 10",
		platform: 'windows',
	    version: "10.0",
	    count: 10
	  },
	  {
	    full_name: "Windows Server 2012",
	    platform: 'windows',
	    version: "6.2"
	    count: 7
	  }
	],
	policies: { // TBD
		malware: {
			success: 8,
			warning: 4,
			failure: 5
		}
	}
  }

Checklist

Delete any items that are not applicable to this PR.

For maintainers

@michaelolo24 michaelolo24 force-pushed the security-telemetry branch 3 times, most recently from 8f7925f to 790ff51 Compare July 7, 2020 15:53
@michaelolo24 michaelolo24 marked this pull request as ready for review July 7, 2020 19:27
@michaelolo24 michaelolo24 requested review from a team as code owners July 7, 2020 19:27
@elasticmachine
Copy link
Contributor

Pinging @elastic/endpoint-data-visibility-team (Team:Endpoint Data Visibility)

@michaelolo24 michaelolo24 added release_note:skip Skip the PR/issue when compiling release notes Feature:Telemetry labels Jul 7, 2020
Copy link
Contributor

@jonathan-buttner jonathan-buttner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the TODOs could you move those to github issues?

If we're not really getting endpoint data right now it might be worth switching the names to agent until we are actually getting endpoint specific data.


export const getDefaultEndpointMetadataTelemetry = (): EndpointMetadataTelemetry => ({
endpoints: { installed: 0, last_24_hours: 0 },
os: {} as Record<string, EndpointMetadataOSTelemtry>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does eslint complain if you don't do the as ... part?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It did before when I initially set this up, but have since fixed. Will remove, thanks

},
});

/**
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A github issue might be a better spot to track the functionality implementation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, I'm gonna pull this out, this was more for my own eyes as I finished up the initial implementation

});
const uniqueHostIds: Set<string> = new Set();
const endpointTelemetry = getDefaultEndpointMetadataTelemetry();
endpointTelemetry.endpoints.installed = endpointAgents.length; // All agents who have the endpoint registered
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really tracking how many agents are installed? And we're assuming the endpoint has been installed if the agent has been installed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The agent references installed packages in the packages field here. So when we filter for it, this should give us all of the agents that have installed the endpoint. Right now I'm using system since I haven't been able to get the endpoint successfully installed via the agent 😞. @nchaulet , can you confirm my previous statement is accurate?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually just saw a unique case where I enrolled a second agent on an endpoint and got 2 installed instead of one. Using the unique hosts id's now to track the count to avoid this issue

};
}

export interface IngestManagerLocalMetadata extends AgentMetadata {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So fleet stores this information based on the endpoints that are connected?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, successfully installed I believe

) {
const collector = usageCollector.makeUsageCollector({
type: SIEM_USAGE_COLLECTOR_TYPE,
isReady: () => true,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to wait on the ingest manager being initialized? I suppose if no saved objects existed then we jus wouldn't find them right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea and we wouldn't send empty telemetry until the next time this runs. @TinaHeiligers is there a better way to implement this?

export const getDefaultEndpointMetadataTelemetry = (): EndpointMetadataTelemetry => ({
endpoints: { installed: 0, last_24_hours: 0 },
os: {} as Record<string, EndpointMetadataOSTelemtry>,
policies: {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we remove these if we're not planning on implementing them for 7.9?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, I'll take them out unless something magical happens in the next day or two 😂

if (err.output?.statusCode === 404) {
return {};
}
return getDefaultEndpointMetadataTelemetry(); // TODO: no data returned if an error occured or should fail silently here?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I wonder if we should return {} if an error occurs because we don't know the results? It's probably not very helpful to send 0s right?

Copy link
Contributor Author

@michaelolo24 michaelolo24 Jul 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, you're right. I was back and forth on this and saw different implementations, but I agree that an empty object is definitely more informative.

export const getEndpointMetadataTelemetryFromFleet = async (
savedObjectsClient: ISavedObjectsRepository
) => {
const { saved_objects: endpointAgents } = await savedObjectsClient.find<AgentSOAttributes>({
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@michaelolo24 are these SOs global or space-aware? If you need to search across spaces you'll need to query the kibana index directly. You can see an example of that in rules/jobs telemetry.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are globals? I was told that I could access the SO directly if they were listed as namepaceType: 'agnostic'. I'll take a look

const { host, os, elastic } = localMetadata as AgentLocalMetadata;

if (lastCheckin && new Date(lastCheckin) > aDayAgo) {
// Get agents that have checked in within the last 24 hours to later see if their endpoints are running
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once we are able to get policy response details setup, we should be able to get this information in this metadata


endpointMetadataTelemetry.os = Object.values(osTracker);

// Handle Last 24 Hours
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, the best way to know for certain that an endpoint is actively running is to check the latest status the agent has for the endpoint from it's events list

savedObjectsClient.find<Agent>({
type: AGENT_SAVED_OBJECT_TYPE,
fields: ['packages', 'last_checkin', 'local_metadata'],
filter: `${AGENT_SAVED_OBJECT_TYPE}.attributes.packages: ${ENDPOINT_PACKAGE_CONSTANT}`,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The list of installed integrations are listed in packages


describe('test security solution endpoint telemetry', () => {
let mockSavedObjectsRepository: jest.Mocked<ISavedObjectsRepository>;
let getFleetSavedObjectsMetadataSpy: jest.SpyInstance<Promise<SavedObjectsFindResponse<Agent>>>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❔ These jest.Mocked and jest.SpyInstance deals look pretty cool. Good docs for them anywhere I can read?

const { last_checkin: lastCheckin, local_metadata: localMetadata } = metadataAttributes;
const { host, os, elastic } = localMetadata as AgentLocalMetadata;

if (lastCheckin && new Date(lastCheckin) > aDayAgo) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❔ This is a big reduce and it seems like maybe it could be easier to read if we did some of this stuff in front of it as a filter. This is a ❔ more on the "ignore for now" side, just a comment for later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will break apart in next commit

@michaelolo24 michaelolo24 requested a review from a team July 9, 2020 20:54
@michaelolo24 michaelolo24 requested a review from nchaulet July 9, 2020 22:04
@michaelolo24
Copy link
Contributor Author

cc @nchaulet if you can take a look through when you get a chance. Just want to make sure I didn't miss anything, including potential optimizations, regarding the savedObjects

) =>
savedObjectsClient.find<AgentEventSOAttributes>({
type: AGENT_EVENT_SAVED_OBJECT_TYPE,
filter: `${AGENT_EVENT_SAVED_OBJECT_TYPE}.attributes.agent_id: ${agentId}`,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jonathan-buttner , is there a way for me to also filter for values that have the value endpoint somewhere in the attributes.message field? Just trying to think of ways to optimize this here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 on filtering this, also maybe fetch only data for the last n time, no? this list could be really long at some point

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I'm not sure about that. I'll have to take a look at how the filtering works with saved objects.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nchaulet I would do that, but only issue is, if a policy was enabled a week ago, and nothing has changed, then there won't be any endpoint related updates in the events lists in the past lets say 24 hours. i.e. there isn't a STILL_RUNNING subtype 😂. Hence why it would be great to filter by all references to endpoint, but don't think that's possible via the filter?

If I were to just check if anything has changed regarding the endpoint in the last 24 hours, then I would have to keep some knowledge from day to day of the previous state, and that feels like a recipe for bugs if something gets missed in the cracks somehow

Copy link
Contributor Author

@michaelolo24 michaelolo24 Jul 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like it's pure EQL and this should work to get all endpoint references in the message! Hoot Hoot 😄

`${AGENT_EVENT_SAVED_OBJECT_TYPE}.attributes.agent_id: ${agentId} and ${AGENT_EVENT_SAVED_OBJECT_TYPE}.attributes.message: "${FLEET_ENDPOINT_PACKAGE_CONSTANT}"`

re here: https://www.elastic.co/guide/en/kibana/master/kuery-query.html#_language_syntax

Quotes around a search term will initiate a phrase search. For example, message:"Quick brown fox" will search for the phrase "quick brown fox" in the message field. Without the quotes, your query will get broken down into tokens via the message field’s configured analyzer and will match documents that contain those tokens, regardless of the order in which they appear. This means documents with "quick brown fox" will match, but so will "quick fox brown". Remember to use quotes if you want to search for a phrase.

@afharo
Copy link
Member

afharo commented Jul 13, 2020

@michaelolo24 can you run node scripts/telemetry_check --fix and commit the changes? It should add the detailed schema to the .json file containing all the plugins schemas in one place 🙏

@michaelolo24
Copy link
Contributor Author

@afharo, done. Sorry about that!

Copy link
Member

@afharo afharo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Telemetry changes (schema), LGTM!

Bear in mind, though we are using the same type as in #71102

You'll get some conflicts if the other PR is merged first (and vice-versa)

},
schema: {
endpoint: {
total_installed: { type: 'number' },
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: To help us out, it might be beneficial if devs specify 'long' instead.

Looks good as-is anyway. No need to change it 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sweet, I can change it though. No biggie. Once @rylnd has his PR in I'll rebase on his and update these to long 😄

savedObjectsClient: ISavedObjectsRepository
) {
const collector = usageCollector.makeUsageCollector<UsageData>({
type: 'security_solution',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: Bear in mind this collector is registering with the same type as the one specified in #71102. Whoever merges first will take ownership of it and it will break the other's implementation

cc/ @rylnd

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep! Plan is for him to merge his first and then I'll make the updates for mine on top of his changes once they're in

Copy link
Member

@nchaulet nchaulet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 the query to ingest manager saved objects looks good to me

Copy link
Contributor

@rylnd rylnd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checked it out locally, and I saw endpoints telemetry! I didn't look to closely at implementation details here, but tests look good and data looks 💯

@kibanamachine
Copy link
Contributor

💚 Build Succeeded

Build metrics

✅ unchanged

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@michaelolo24 michaelolo24 merged commit 8325222 into elastic:master Jul 14, 2020
@michaelolo24 michaelolo24 deleted the security-telemetry branch July 14, 2020 00:52
michaelolo24 added a commit to michaelolo24/kibana that referenced this pull request Jul 14, 2020
gmmorris added a commit to gmmorris/kibana that referenced this pull request Jul 14, 2020
* master: (314 commits)
  [APM] Use status_code field to calculate error rate (elastic#71109)
  [Observability] Change appLink passing the date range (elastic#71259)
  [Security] Add Timeline improvements (elastic#71506)
  adjust vislib bar opacity (elastic#71421)
  Fix ScopedHistory mock and adapt usages (elastic#71404)
  [Security Solution] Add hook for reading/writing resolver query params (elastic#70809)
  [APM] Bug fixes from ML integration testing (elastic#71564)
  [Discover] Add caused_by.type and caused_by.reason to error toast modal (elastic#70404)
  [Security Solution] Add 3rd level breadcrumb to admin page (elastic#71275)
  [Security Solution][Exceptions] Exception modal bulk close alerts that match exception attributes (elastic#71321)
  Change signal.rule.risk score mapping from keyword to float (elastic#71126)
  Added help text where needed on connectors and alert actions UI (elastic#69601)
  [SIEM][Detections] Value Lists Management Modal (elastic#67068)
  [test] Skips test preventing promotion of ES snapshot elastic#71582
  [test] Skips test preventing promotion of ES snapshot elastic#71555
  [ILM] Fix alignment of the timing field (elastic#71273)
  [SIEM][Detection Engine][Lists] Adds the ability for exception lists to be multi-list queried. (elastic#71540)
  initial telemetry setup (elastic#69330)
  [Reporting] Formatting fixes for CSV export in Discover, CSV download from Dashboard panel (elastic#67027)
  Search across spaces (elastic#67644)
  ...
jloleysens added a commit to jloleysens/kibana that referenced this pull request Jul 14, 2020
…t-apps-page-titles

* 'master' of github.com:elastic/kibana: (88 commits)
  [ML] Functional tests - disable DFA creation and cloning tests
  [APM] Use status_code field to calculate error rate (elastic#71109)
  [Observability] Change appLink passing the date range (elastic#71259)
  [Security] Add Timeline improvements (elastic#71506)
  adjust vislib bar opacity (elastic#71421)
  Fix ScopedHistory mock and adapt usages (elastic#71404)
  [Security Solution] Add hook for reading/writing resolver query params (elastic#70809)
  [APM] Bug fixes from ML integration testing (elastic#71564)
  [Discover] Add caused_by.type and caused_by.reason to error toast modal (elastic#70404)
  [Security Solution] Add 3rd level breadcrumb to admin page (elastic#71275)
  [Security Solution][Exceptions] Exception modal bulk close alerts that match exception attributes (elastic#71321)
  Change signal.rule.risk score mapping from keyword to float (elastic#71126)
  Added help text where needed on connectors and alert actions UI (elastic#69601)
  [SIEM][Detections] Value Lists Management Modal (elastic#67068)
  [test] Skips test preventing promotion of ES snapshot elastic#71582
  [test] Skips test preventing promotion of ES snapshot elastic#71555
  [ILM] Fix alignment of the timing field (elastic#71273)
  [SIEM][Detection Engine][Lists] Adds the ability for exception lists to be multi-list queried. (elastic#71540)
  initial telemetry setup (elastic#69330)
  [Reporting] Formatting fixes for CSV export in Discover, CSV download from Dashboard panel (elastic#67027)
  ...

# Conflicts:
#	x-pack/plugins/index_management/public/application/index.tsx
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature:Telemetry release_note:skip Skip the PR/issue when compiling release notes Team:Endpoint Data Visibility Team managing the endpoint resolver v7.9.0 v8.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants