[8.13] [ML] Fixes ES|QL field stats showing skewed numeric distribution, duplicated examples for string fields, and adds functional tests (#177085) (#178586)

# Backport

This will backport the following commits from `main` to `8.13`:
- [[ML] Fixes ES|QL field stats showing skewed numeric distribution,
duplicated examples for string fields, and adds functional tests
(#177085)](#177085)

<!--- Backport version: 9.4.3 -->

### Questions?
Please refer to the [Backport tool
documentation](https://github.com/sqren/backport)

### Source commit

- Pull request: [#177085](https://github.com/elastic/kibana/pull/177085), merged into `main` (label `v8.14.0`)
- Commit: `2922dd206fa463d455cac7b958787f3a68bc4d0d`, committed 2024-03-12T22:52:12Z
- Author: Quynh Nguyen (Quinn)
- Co-author: Kibana Machine
- Labels: `:ml`, `test_ui_functional`, `release_note:skip`, `Feature:File and Index Data Viz`, `v8.13.0`, `v8.14.0`
- Target branch: `8.13` (label `v8.13.0`)

Summary of the source pull request (part of #173301):
- Fixes ES|QL field stats showing skewed numeric distribution and duplicated examples for string fields
- Adds functional tests for the ES|QL data visualizer view

[Flaky test suite runner](https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/5226): 50/50 runs successful ✅

Co-authored-by: Quynh Nguyen (Quinn)
kibanamachine and qn895 authored Mar 13, 2024
1 parent 1cb17ce commit 5950c69
Showing 13 changed files with 412 additions and 22 deletions.
4 changes: 2 additions & 2 deletions x-pack/plugins/data_visualizer/common/types/field_stats.ts
@@ -93,8 +93,8 @@ export interface StringFieldStats {
   fieldName: string;
   isTopValuesSampled: boolean;
   topValues: Bucket[];
-  topValuesSampleSize: number;
-  topValuesSamplerShardSize: number;
+  topValuesSampleSize?: number;
+  topValuesSamplerShardSize?: number;
 }
 
 export interface DateFieldStats {
@@ -54,10 +54,16 @@ export const TopValues: FC<Props> = ({ stats, fieldFormat, barColor, compressed,
   } = useDataVisualizerKibana();
 
   if (stats === undefined || !stats.topValues) return null;
-  const { topValues, fieldName, sampleCount } = stats;
+  const { topValues: originalTopValues, fieldName, sampleCount } = stats;
 
-  if (topValues?.length === 0) return null;
+  if (originalTopValues?.length === 0) return null;
   const totalDocuments = stats.totalDocuments ?? sampleCount ?? 0;
+
+  const topValues = originalTopValues.map((bucket) => ({
+    ...bucket,
+    percent:
+      typeof bucket.percent === 'number' ? bucket.percent : bucket.doc_count / totalDocuments,
+  }));
   const topValuesOtherCountPercent =
     1 - (topValues ? topValues.reduce((acc, bucket) => acc + bucket.percent, 0) : 0);
   const topValuesOtherCount = Math.floor(topValuesOtherCountPercent * (sampleCount ?? 0));
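
The fallback added in this hunk can be read as a small normalization step. The sketch below is a standalone illustration, not the component's code: the `Bucket` shape and the `withPercent` helper are assumptions made for the example, and the zero-total guard is an addition that the diff itself does not contain.

```typescript
// Hypothetical bucket shape for illustration; the real `Bucket` type lives in
// the data_visualizer plugin and may carry more fields.
interface Bucket {
  key: string | number;
  doc_count: number;
  percent?: number;
}

// Keep a precomputed percent when present; otherwise derive it from the
// document count. The zero-total guard is an assumption of this sketch.
function withPercent(
  buckets: Bucket[],
  totalDocuments: number
): Array<Bucket & { percent: number }> {
  return buckets.map((bucket) => ({
    ...bucket,
    percent:
      typeof bucket.percent === 'number'
        ? bucket.percent
        : totalDocuments > 0
        ? bucket.doc_count / totalDocuments
        : 0,
  }));
}

// The first bucket has no percent and gets 50 / 200; the second keeps its own.
const normalized = withPercent(
  [
    { key: 'a', doc_count: 50 },
    { key: 'b', doc_count: 25, percent: 0.25 },
  ],
  200
);
```

Without a guard, a `totalDocuments` of 0 would yield `NaN` or `Infinity` for buckets that lack a precomputed percent, which is why the sketch returns 0 in that case.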
@@ -12,34 +12,39 @@ import type { ESQLDefaultLimitSizeOption } from '../../../embeddables/grid_embed
 
 const options = [
   {
+    'data-test-subj': 'dvESQLLimitSize-5000',
     value: '5000',
     text: i18n.translate('xpack.dataVisualizer.searchPanel.esql.limitSizeOptionLabel', {
       defaultMessage: '{limit} rows',
       values: { limit: '5,000' },
     }),
   },
   {
+    'data-test-subj': 'dvESQLLimitSize-10000',
     value: '10000',
     text: i18n.translate('xpack.dataVisualizer.searchPanel.esql.limitSizeOptionLabel', {
       defaultMessage: '{limit} rows',
       values: { limit: '10,000' },
     }),
   },
   {
+    'data-test-subj': 'dvESQLLimitSize-100000',
     value: '100000',
     text: i18n.translate('xpack.dataVisualizer.searchPanel.esql.limitSizeOptionLabel', {
       defaultMessage: '{limit} rows',
       values: { limit: '100,000' },
     }),
   },
   {
+    'data-test-subj': 'dvESQLLimitSize-1000000',
     value: '1000000',
     text: i18n.translate('xpack.dataVisualizer.searchPanel.esql.limitSizeOptionLabel', {
       defaultMessage: '{limit} rows',
       values: { limit: '1,000,000' },
     }),
   },
   {
+    'data-test-subj': 'dvESQLLimitSize-none',
     value: 'none',
     text: i18n.translate('xpack.dataVisualizer.searchPanel.esql.analyzeAll', {
       defaultMessage: 'Analyze all',
@@ -62,6 +67,7 @@ export const ESQLDefaultLimitSizeSelect = ({
 
   return (
     <EuiSelect
+      data-test-subj="dvESQLLimitSizeSelect"
       id={basicSelectId}
       options={options}
       value={limitSize}
@@ -63,19 +63,14 @@ export const getESQLKeywordFieldStats = async ({
       if (isFulfilled(resp)) {
         const results = resp.value?.rawResponse.values as Array<[BucketCount, BucketTerm]>;
         if (results) {
-          const topValuesSampleSize = results?.reduce((acc: number, row) => acc + row[0], 0);
-
           const terms = results.map((row) => ({
             key: row[1],
             doc_count: row[0],
-            percent: row[0] / topValuesSampleSize,
           }));
 
           return {
             fieldName: field.name,
             topValues: terms,
-            topValuesSampleSize,
-            topValuesSamplerShardSize: topValuesSampleSize,
             isTopValuesSampled: false,
           } as StringFieldStats;
         }
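
One effect of the removed `percent` calculation, as far as this diff shows: dividing each count by the sum of the returned buckets makes the listed values always add up to 100%, so the "other" share that the `TopValues` component derives as `1 - sum(percent)` comes out as zero. A minimal sketch with made-up numbers (the counts and the 1,000-document total are assumptions for illustration):

```typescript
// Made-up top-value counts for a keyword field; the total is also an assumption.
const counts = [400, 100];
const totalDocuments = 1000;

const sum = (xs: number[]): number => xs.reduce((acc, x) => acc + x, 0);

// Removed approach: normalize by the sum of the returned buckets.
const bySampleSum = counts.map((c) => c / sum(counts));

// Approach after this commit (applied in the TopValues component):
// normalize by the total number of documents.
const byTotal = counts.map((c) => c / totalDocuments);

// Share left over for values outside the top list.
const otherBySampleSum = 1 - sum(bySampleSum); // approximately 0
const otherByTotal = 1 - sum(byTotal); // approximately 0.5
```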
@@ -92,7 +92,7 @@ const getESQLNumericFieldStatsInChunk = async ({
       const median = values[startIndex + numericAccessorMap.p50];
 
       const percentiles = values
-        .slice(startIndex + numericAccessorMap.p0, startIndex + numericAccessorMap.p100)
+        .slice(startIndex + numericAccessorMap.p5, startIndex + numericAccessorMap.p100 + 1)
         .map((value: number) => ({ value }));
 
       const distribution = processDistributionData(percentiles, PERCENTILE_SPACING, min);
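
The one-line change above turns on `Array.prototype.slice` treating its end index as exclusive. A minimal sketch, assuming a hypothetical accessor map in which percentiles sit at consecutive offsets in steps of 5 (the real `numericAccessorMap` offsets are not shown in this diff):

```typescript
// Hypothetical layout: p0 at offset 0, p5 at offset 1, ... p100 at offset 20.
const accessor = { p0: 0, p5: 1, p100: 20 };

// Stand-in values: index i holds the (5 * i)th percentile, so 21 entries in total.
const row: number[] = [];
for (let p = 0; p <= 100; p += 5) row.push(p);
const startIndex = 0;

// Before: starts at p0 and, because the end index is exclusive, drops p100.
const before = row.slice(startIndex + accessor.p0, startIndex + accessor.p100);

// After: starts at p5 and adds 1 to the end index so that p100 is included.
const after = row.slice(startIndex + accessor.p5, startIndex + accessor.p100 + 1);
```

Under these assumptions both slices hold 20 values, but `before` covers p0 through p95 while `after` covers p5 through p100.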
@@ -45,9 +45,9 @@ export const getESQLExampleFieldValues = async ({
 
     if (textFieldsResp) {
       return textFields.map((textField, idx) => {
-        const examples = (textFieldsResp.rawResponse.values as unknown[][]).map(
-          (row) => row[idx]
-        );
+        const examples = [
+          ...new Set((textFieldsResp.rawResponse.values as unknown[][]).map((row) => row[idx])),
+        ];
 
         return {
           fieldName: textField.name,
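
The de-duplication above relies on `Set` keeping only distinct values in first-seen order. A minimal sketch, with made-up response rows and a hypothetical `distinctExamples` helper that is not part of the plugin:

```typescript
// Made-up ES|QL response rows: one column per text field.
const values: unknown[][] = [
  ['GET /a', 'info'],
  ['GET /a', 'warn'],
  ['GET /b', 'info'],
];

// Collect the distinct values of column `idx`, keeping first-seen order.
function distinctExamples(rows: unknown[][], idx: number): unknown[] {
  return [...new Set(rows.map((row) => row[idx]))];
}

const pathExamples = distinctExamples(values, 0); // ['GET /a', 'GET /b']
const levelExamples = distinctExamples(values, 1); // ['info', 'warn']
```

`Set` compares entries with SameValueZero, so this removes repeated primitives such as strings and numbers; two separate objects or arrays with equal contents would both be kept.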