Collect non-sensitive platform-specific config stats #56762
Comments
Pinging @elastic/kibana-platform (Team:Platform) |
Pinging @elastic/pulse (Team:Pulse) |
The only option here would be to open a hook-like API, as we do for HTTP authentication, I think. We really need a good reason to do so, however, as this is a relatively heavy change.
Which part of the config would the plugin actually need access to? Everything? |
In theory yes, we might want to track any part of the config: from Elasticsearch SSL setup to logging flags. |
@restrry Telemetry plugin is only responsible for sending telemetry to our cluster. The I feel that we can move the |
That was my feeling too. A plugin that can't/shouldn't be disabled and is used by every other plugin should probably be part of core. |
Reopening the discussion as this is currently due for
So, as it seems we did get a consensus on that, should we move the
BTW The plugin is currently owned by @elastic/pulse. Moving it to
Any further insight on what we should collect, at least for the initial version? |
I have no objections to that.
I think we can implement this without moving it to Also, if those plugins are registered apps in Kibana,
"Any" seems a bit risky. I agree with only exposing a well-defined set of properties we may want to report. Q: Can we retrieve any of those configurations via an Elasticsearch API request? NB: Maybe
Please, bear in mind the current payload (it can be seen in Management > Stack Management > Kibana > Advanced Settings > Usage Data > See an example of what we collect) is already huge and we might be duplicating an effort about something that is already reported: i.e.: We already collect the overridden configurations in the UI Management section: https://github.com/elastic/kibana/blob/master/src/plugins/kibana_usage_collection/server/collectors/management/telemetry_management_collector.ts#L26-L43 Re the runtime metrics, maybe APM is the way to go? Personally, I think anything we may implement to measure them would end up looking pretty similar to APM's existing agents. There is a blocker at the moment, but I'd say we push the effort to get it fixed over re-implementing our own solution. |
That seems like a very fragile mechanism compared to using the information from the
Hum, I really don't know well how we are sending data, but the whole payload containing all the metrics is always sent? We don't have an event-based API to send custom events with custom payloads? |
Yeah,
Fair enough. I'm really OK with moving the usageCollection plugin to
We don't have any event-based API. We only send the usage payload once every 24h (with a * here because it's not that strict). At that point of sending the telemetry payload, we call the |
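The pull model described in the comment above (no events; the whole payload is assembled when a report is due) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the telemetry plugin's code: `REPORT_INTERVAL_MS`, `isReportDue`, `buildPayload`, and the `Collector` shape are hypothetical names, and the collectors are kept synchronous for brevity.

```typescript
// Hypothetical names throughout; this is not Kibana's telemetry implementation.
const REPORT_INTERVAL_MS = 24 * 60 * 60 * 1000; // roughly once every 24h

interface Collector {
  type: string;
  fetch(): unknown; // kept synchronous here for brevity (an assumption)
}

// A report is due if none was sent yet or the interval has elapsed.
function isReportDue(lastReportedMs: number | undefined, nowMs: number): boolean {
  return lastReportedMs === undefined || nowMs - lastReportedMs >= REPORT_INTERVAL_MS;
}

// Build the whole payload in one go by asking every collector for its data.
function buildPayload(collectors: Collector[]): Record<string, unknown> {
  const payload: Record<string, unknown> = {};
  for (const collector of collectors) {
    payload[collector.type] = collector.fetch();
  }
  return payload;
}
```

With this shape, a collector cannot push a custom event on its own; it only contributes to the payload when it is asked.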
The primary argument I can think of against moving usage collectors to Core is more of an organization / team design one than a technical one. In essence, I think it's simpler if all the code in a given subdirectory is fully owned by a single team. It eliminates communication overhead and allows a single team to make sweeping changes to that directory without coordination. That said, usage collectors is such a small pattern / mechanism that it may be simple enough to just move to Core and make the Platform team the owners. However, if there are future enhancements planned for this pattern or we expect to use something else entirely in the near- to mid-term, then I think we should explore other options. One way we could achieve what we need here without moving usage collectors is to expose an API from Core that essentially is a usage collector, but isn't tied into the registry of usage collectors directly. Instead, the telemetry plugin could set up a usage collector to call this Core API in order to grab data from Core.
Example
Extension of the Core API
interface CoreSetup {
  // Core could expose some API that is _essentially_ a usage collector
  telemetry: {
    type: 'kibana_core';
    schema: CoreTelemetrySchema;
    fetch(): Promise<CoreTelemetry>;
  }
}
Telemetry plugin uses Core API to create a core usage collector
class TelemetryPlugin {
  setup(core, plugins) {
    // makeUsageCollector is reached through the usageCollection plugin contract
    const coreCollector = plugins.usageCollection.makeUsageCollector(core.telemetry);
    plugins.usageCollection.registerCollector(coreCollector);
  }
} |
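To make the proposal above concrete, here is a minimal, self-contained sketch of the idea: Core exposes an object that has the shape of a usage collector, and a plugin registers it with the registry. All types here (`CoreTelemetry`, `CollectorDefinition`, `UsageCollectionRegistry`) are simplified stand-ins invented for this illustration, not Kibana's real interfaces, and the schema field is omitted.

```typescript
// Hypothetical, simplified shapes; these are not Kibana's real types.
interface CoreTelemetry {
  thirdPartyPlugins: string[];
}

interface CollectorDefinition<T> {
  type: string;
  fetch(): Promise<T>;
}

// Stand-in for the registry owned by the usageCollection plugin.
class UsageCollectionRegistry {
  private readonly collectors: Array<CollectorDefinition<unknown>> = [];

  makeUsageCollector<T>(definition: CollectorDefinition<T>): CollectorDefinition<T> {
    return definition;
  }

  registerCollector(collector: CollectorDefinition<unknown>): void {
    if (this.collectors.some((c) => c.type === collector.type)) {
      throw new Error(`Collector "${collector.type}" is already registered`);
    }
    this.collectors.push(collector);
  }

  getTypes(): string[] {
    return this.collectors.map((c) => c.type);
  }

  // Ask every registered collector for its data and key the results by type.
  async fetchAll(): Promise<Record<string, unknown>> {
    const entries = await Promise.all(
      this.collectors.map(async (c) => [c.type, await c.fetch()] as const)
    );
    const payload: Record<string, unknown> = {};
    for (const [type, data] of entries) {
      payload[type] = data;
    }
    return payload;
  }
}

// Core side: an object that is "essentially" a usage collector.
const coreSetup = {
  telemetry: {
    type: 'kibana_core',
    fetch: async (): Promise<CoreTelemetry> => ({ thirdPartyPlugins: ['my_plugin'] }),
  },
};

// Plugin side: wrap the Core API in a collector and register it.
const usageCollection = new UsageCollectionRegistry();
usageCollection.registerCollector(usageCollection.makeUsageCollector(coreSetup.telemetry));
```

The point of the design is that Core never imports the registry; the dependency only goes from the plugin to Core.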
We shouldn’t restrict ourselves to the current team/organization design when it comes to making the right decisions. Teams can be restructured and need to change with Elastic’s needs 😄
A lot of different teams send usage data to our telemetry cluster and the demand for data is increasing. The data we collect from Kibana allows the business to monitor solution adoption and, IMHO, any sweeping changes made to the collection mechanism and the data we collect need to be communicated and controlled. ATM, nothing’s really stopping the small kibana-telemetry team from making such changes in isolation and, with the importance of usage data increasing, it’s maybe time to reconsider that approach.
Code wise, we can certainly take the approach you suggest to get the job done. We do have a few improvement initiatives we're working on while helping Infra take on the task of the remote Elastic Telemetry service. |
Sounds like there is potential for a decent amount of change in this coming. We aim for Core APIs to be stable-ish, so I lean towards the approach I outlined above. If we go with this option, I think the Platform team could handle all of the changes. We can have the Pulse team review the change to the usage_collection plugin to integrate with the new Core API. The main downside I see of this approach is that we'd be exposing internal data from core that is not normally available to plugins. However, I think we can document that this API is only valid for telemetry and should not be used for any other reason. @elastic/kibana-platform any objections? |
@joshdover What I was trying to say is that we need to get to a point where the usage collection and telemetry APIs are stable. The improvements we're investigating are under the condition that they won't change the usage collection APIs. We're striving to get to a point where collecting data and shipping it to the remote service is as stable and reliable as using Saved Objects or any of the other Core capabilities. I'm by no means trying to shift the work over to another team; we're more than willing to take the task on and @afharo's already working on a draft. What I was trying to say is that telemetry has become so important that we need to be careful about the approach from here on. That being said, we can always reconsider the approach later if need be. |
Thank you @TinaHeiligers and @joshdover! I think I agree with you both: the Usage Collection APIs (not to be confused with the telemetry plugin) are key APIs and I think they should belong to core eventually. +1 on the
Looking forward to knowing all your thoughts :) |
interface CoreSetup {
  // Core could expose some API that is _essentially_ a usage collector
  telemetry: {
    type: 'kibana_core';
    schema: CoreTelemetrySchema;
    fetch(): Promise<CoreTelemetry>;
  }
}
Let's be honest, it's definitely not a correct design; we are just cheating with pseudo inversion of control here. But it's the most pragmatic solution in terms of impact to the codebase and to the code owners, so I'm not opposed to it. However, I'm wondering if we will be able to achieve this option without importing (or duplicating) types from the |
I totally understand your point, @pgayvallet! How about meeting somewhere in the middle? What about core providing a 'stats' API (as part of the |
@joshdover the 7.9 deadline looks a little tight here. Do you still think that we can make it before FF? |
@TinaHeiligers I think it's likely this will slip to 7.10 at this point :-/ And that's not on y'all, we've been working on other tasks. |
I've run through all of Core's configuration and come up with a potential list of non-sensitive metrics that we could collect. For sensitive configuration keys (e.g. hostname, ssl), I include fields that indicate a feature is being used rather than including the actual configuration value. There is some overlap with some of the metrics collected by
export interface CoreTelemetry {
  config: CoreConfigTelemetry;
  usage: CoreUsageTelemetry;
  environment: CoreEnvironmentTelemetry;
}
/**
 * Telemetry data on this cluster's usage of Core features
 */
export interface CoreUsageTelemetry {
  savedObjects: {
    totalCount: number;
    typesCount: number;
    /** Record of Kibana version -> number of failures that occurred on that version's upgrade attempt */
    migrationFailures: Record<string, number>;
  };
}
/**
 * Telemetry data on this Kibana node's runtime environment.
 */
export interface CoreEnvironmentTelemetry {
  memory: {
    heapTotalBytes: number;
    heapUsedBytes: number;
    /** V8 heap size limit */
    heapSizeLimit: number;
  };
  os: {
    platform: string;
    platformRelease: string;
    distro?: string;
    distroRelease?: string;
  };
}
/**
 * Telemetry data on this cluster's configuration of Core features
 */
export interface CoreConfigTelemetry {
  elasticsearch: {
    sniffOnStart: boolean;
    sniffIntervalMs?: number;
    sniffOnConnectionFault: boolean;
    /** count of configured hosts, not the host names themselves */
    numberOfHostsConfigured: number;
    preserveHost: boolean;
    requestHeadersWhitelistConfigured: boolean;
    customHeadersConfigured: boolean;
    shardTimeoutMs: number;
    requestTimeoutMs: number;
    pingTimeoutMs: number;
    startupTimeoutMs: number;
    logQueries: boolean;
    ssl: {
      verificationMode: 'none' | 'certificate' | 'full';
      certificateAuthoritiesConfigured: boolean;
      certificateConfigured: boolean;
      keyConfigured: boolean;
      keystoreConfigured: boolean;
      truststoreConfigured: boolean;
      alwaysPresentCertificate: boolean;
    };
    apiVersion: string;
    healthCheck: {
      delayMs: number;
    };
  };
  http: {
    basePath: string;
    maxPayloadInBytes: number;
    rewriteBasePath: boolean;
    keepaliveTimeout: number;
    socketTimeout: number;
    compression: {
      enabled: boolean;
      referrerWhitelistConfigured: boolean;
    };
    cors: {
      enabled: boolean;
    };
    xsrf: {
      disableProtection: boolean;
      whitelistConfigured: boolean;
    };
    requestId: {
      allowFromAnyIp: boolean;
      ipAllowlistConfigured: boolean;
    };
    ssl: {
      certificateAuthoritiesConfigured: boolean;
      certificateConfigured: boolean;
      /** cipher suite names, as opposed to the protocol versions in supportedProtocols */
      cipherSuites: string[];
      keyConfigured: boolean;
      keystoreConfigured: boolean;
      truststoreConfigured: boolean;
      redirectHttpFromPort?: number;
      supportedProtocols: Array<'TLSv1' | 'TLSv1.1' | 'TLSv1.2'>;
      clientAuthentication: 'none' | 'optional' | 'required';
    };
  };
  logging: {
    appendersTypesUsed: string[];
    loggersConfiguredCount: number;
  };
  plugins: {
    /** list of built-in plugins that are disabled */
    firstPartyDisabled: string[];
    /** list of third-party plugins that are installed and enabled */
    thirdParty: string[];
  };
  savedObjects: {
    maxImportPayloadBytes: number;
    maxImportExportSizeBytes: number;
    migrations: {
      batchSize: number;
      scrollDurationMs: number;
      pollIntervalMs: number;
      skip: boolean;
    };
  };
  uiSettings: {
    overridesCount: number;
  };
} |
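As an illustration of the "report that a feature is configured, not its value" approach used in the list above, here is a minimal sketch for a small slice of the Elasticsearch config. The `ElasticsearchConfig` shape and the `toElasticsearchConfigTelemetry` helper are hypothetical and trimmed down for this example; they are not Kibana's actual config schema or collector, and `numberOfHostsConfigured` is treated as a count here, which is an assumption.

```typescript
// Hypothetical, trimmed-down config shape; not Kibana's actual config schema.
interface ElasticsearchConfig {
  hosts: string[];
  customHeaders: Record<string, string>;
  ssl: {
    verificationMode: 'none' | 'certificate' | 'full';
    certificate?: string;
    key?: string;
    certificateAuthorities?: string[];
  };
}

interface ElasticsearchConfigTelemetry {
  numberOfHostsConfigured: number;
  customHeadersConfigured: boolean;
  ssl: {
    verificationMode: 'none' | 'certificate' | 'full';
    certificateConfigured: boolean;
    keyConfigured: boolean;
    certificateAuthoritiesConfigured: boolean;
  };
}

// Map sensitive values (host names, paths, headers) to counts and booleans only.
function toElasticsearchConfigTelemetry(
  config: ElasticsearchConfig
): ElasticsearchConfigTelemetry {
  const cas = config.ssl.certificateAuthorities;
  return {
    numberOfHostsConfigured: config.hosts.length,
    customHeadersConfigured: Object.keys(config.customHeaders).length > 0,
    ssl: {
      // An enum value with a fixed set of options is safe to report as-is.
      verificationMode: config.ssl.verificationMode,
      certificateConfigured: config.ssl.certificate !== undefined,
      keyConfigured: config.ssl.key !== undefined,
      certificateAuthoritiesConfigured: cas !== undefined && cas.length > 0,
    },
  };
}
```

Nothing in the returned object is derived from the content of a host name, header, or file path, so the output can be reported without leaking those values.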
@joshdover what are your thoughts on collecting ssl config for Perhaps something like (added a bonus
/**
 * Telemetry data on this cluster's configuration of Core features
 */
export interface CoreConfigTelemetry {
  http: {
    cors: {
      enabled: boolean;
    };
    ssl: {
      enabled: boolean;
      supportedProtocols: string[];
      keyConfigured: boolean;
      keystoreConfigured: boolean;
      truststoreConfigured: boolean;
    };
  };
} |
Makes sense, not sure how I skipped http.ssl EDIT: updated my comment with the http.ssl and http.cors fields. |
What about CSP? I could see it being helpful to know how many users have a custom CSP policy in place. For SSL, when the certificate is specified the key must also be specified. I can't think of a good name to denote both of these being configured, so perhaps that's justification for leaving them separate? |
CSP is already being reported here, although it'd be nice to consolidate the fields into
kibana/src/plugins/kibana_usage_collection/server/collectors/csp/csp_collector.ts, lines 45 to 55 in fd459de
|
This is fantastic. We have so many different types of saved objects now, is there any value in getting count by type? |
Hey @joshdover! Sorry for my late response. This collector is treated in a special way at the moment:
@alexfrancoeur we already report the saved objects count per type in here. Although it's limited to these 6 types only. We might want to revisit that list? |
Ideally we could generate metrics that are more dynamic so that all types are reported on, that would probably be easier to do from within Core. For saved object migrations it would also be useful to understand the size of the .kibana/.kibana_task_manager indices. |
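One way the dynamic per-type count suggested above could be produced is with an Elasticsearch terms aggregation, so that no list of types is hard-coded. The sketch below only builds the request body and converts the response buckets; it assumes saved object documents carry a top-level `type` field that can be aggregated on, and `buildCountByTypeQuery` and `toCountByType` are hypothetical helper names, not existing Kibana functions.

```typescript
// Shape of one bucket in a terms aggregation response.
interface TermsBucket {
  key: string;
  doc_count: number;
}

// Request body: no hits, just one bucket per distinct saved object type.
// Assumes a top-level `type` field; `maxTypes` caps the number of buckets.
function buildCountByTypeQuery(maxTypes: number) {
  return {
    size: 0,
    aggs: {
      types: {
        terms: { field: 'type', size: maxTypes },
      },
    },
  };
}

// Convert the response buckets into a `type -> count` record.
function toCountByType(buckets: TermsBucket[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const bucket of buckets) {
    counts[bucket.key] = bucket.doc_count;
  }
  return counts;
}
```

Because the buckets come from the data itself, newly introduced saved object types would show up without any change to the collector.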
Do we care about the actual basepath or just whether or not it was set? Should we rather use the following?
|
I'm leaving out |
Good catch, I think basePathConfigured is what we want. |
I'll consider this completed by #79101 and will close it. Please reopen if necessary. |
We would like to understand what platform features are used and how they are configured by the users. That would help us make decisions about API completeness.
The main problem with this is that the platform doesn't have access to the Telemetry plugin, nor does the Telemetry plugin have access to the platform config.
One item we'd like to include as well is which 3rd party plugins are installed and enabled.
Questions we'd like to answer for config things:
Data to collect, all keys should be opt-in: