[ML] Add random sampler to Data visualizer document count chart #136150
…lue from undefined to null
<EuiIconTip
content={i18n.translate('xpack.dataVisualizer.searchPanel.randomSamplerMessage', {
defaultMessage:
'Random sampler is being used for the total document count and the chart. Values shown are estimated. Adjust the slider to a higher percentage for better accuracy, or 100% to exact values.',
I haven't downloaded this PR to test it, but here's a drive-by suggestion:
'Random sampler is being used for the total document count and the chart. Values shown are estimated. Adjust the slider to a higher percentage for better accuracy, or 100% to exact values.',
'The chart and total document count use random sampler aggregations, which increase speed at the cost of accuracy. Adjust the accuracy with the slider. For exact values, set it to 100%.',
As discussed, here's my initial feedback from testing against larger APM data sets (approx 15M to 40M docs):
Pinging @elastic/ml-ui (:ml)
29b2526 to 728677e
728677e to c52b0f2
...lizer/public/application/common/components/document_count_content/document_count_content.tsx
<EuiFlexItem>
<EuiText size="s" data-test-subj="dataVisualizerTotalDocCountHeader">
<EuiFlexItem grow={false} style={{ flexDirection: 'row' }}>
<EuiText size="s" data-test-subj="dataVisualizerTotalDocCountHeader" textAlign="center">
'xpack.dataVisualizer.randomSamplerSettingsPopUp.infoCalloutMessage',
{
defaultMessage:
'Random sampler is being used for the total document count and the chart. Pick a higher percentage for better accuracy, or "Off" for no sampling.',
I think this message should change depending on what option you have selected. For example, if it is set to 'Off', say that random sampling can be turned on for the total document count and chart to increase performance although some accuracy will be lost.
If set to 'On - automatic', then something like, 'Random sampling is being used for the total document count and the chart. The probability used in the aggregation will be automatically set to balance accuracy and speed.'
If set to 'On - manual', then something like, 'Random sampling is being used for the total document count and the chart. A lower percentage probability will increase performance, but some accuracy will be lost.'
I agree, if it's possible to customize those messages, it would be great!
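The per-option messages proposed above could be selected with a simple switch. The following is only a minimal sketch, not the PR's code: the option values, the `getCalloutMessage` name, and the use of plain strings in place of `i18n.translate` calls are all assumptions made to keep the example self-contained. Only the `ON_AUTOMATIC` key name appears in the diff.

```typescript
// Hypothetical option values; only the ON_AUTOMATIC key is visible in the diff.
const RANDOM_SAMPLER_OPTION = {
  ON_AUTOMATIC: 'on_automatic',
  ON_MANUAL: 'on_manual',
  OFF: 'off',
} as const;

type RandomSamplerOption =
  (typeof RANDOM_SAMPLER_OPTION)[keyof typeof RANDOM_SAMPLER_OPTION];

// Returns the callout text for the selected option.
// Plain strings stand in for i18n.translate() calls (assumption).
function getCalloutMessage(option: RandomSamplerOption): string {
  switch (option) {
    case RANDOM_SAMPLER_OPTION.OFF:
      return 'Random sampling can be turned on for the total document count and chart to increase performance although some accuracy will be lost.';
    case RANDOM_SAMPLER_OPTION.ON_AUTOMATIC:
      return 'Random sampling is being used for the total document count and the chart. The probability used in the aggregation will be automatically set to balance accuracy and speed.';
    case RANDOM_SAMPLER_OPTION.ON_MANUAL:
      return 'Random sampling is being used for the total document count and the chart. A lower percentage probability will increase performance, but some accuracy will be lost.';
  }
}
```

Keeping the switch exhaustive over the option type means the compiler flags a missing message if a new option is added later.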
{
value: RANDOM_SAMPLER_OPTION.ON_AUTOMATIC,
text: i18n.translate('xpack.dataVisualizer.randomSamplerPreference.onAutomaticLabel', {
defaultMessage: 'On (automatic use best %)',
The text for the 'On' options needs some tweaking I think. 'On (automatic configuration)' and 'On (manual configuration)'? @lcawl any suggestions?
This is all about balancing speed against accuracy. We want to encourage the user to leave it as 'automatic'.
Yes, I think if we can avoid using "%" in the label (and thus avoid having to explain what that percent actually means), that'd be simpler. Maybe even as simple as "On (automatic)" and "On (manual)"
<EuiIconTip
content={i18n.translate('xpack.dataVisualizer.searchPanel.randomSamplerMessage', {
defaultMessage:
'Random sampler is being used for the total document count and the chart. Values shown are estimated.',
What about using 'approximate' rather than 'estimated', e.g. 'Approximate counts are shown.'
Updated here 5959f47
Tested latest changes, including the cloud instance with up to 94M docs, and LGTM.
Added some text suggestions, but otherwise LGTM
'xpack.dataVisualizer.randomSamplerSettingsPopUp.onManualCalloutMessage',
{
defaultMessage:
'Random sampling can be turned on for the total document count and chart to increase speed although some accuracy will be lost.',
Not mandatory, here's another version of that sentence where we start with the "why":
'Random sampling can be turned on for the total document count and chart to increase speed although some accuracy will be lost.',
'To increase speed, turn on random sampling for the total document count and chart. Some accuracy will be lost.',
Updated here 5959f47
'xpack.dataVisualizer.randomSamplerSettingsPopUp.onAutomaticCalloutMessage',
{
defaultMessage:
'Random sampling is being used for the total document count and the chart. The probability used in the aggregation will be automatically set to balance accuracy and speed.',
Not mandatory, but here's a slightly shorter suggestion:
'Random sampling is being used for the total document count and the chart. The probability used in the aggregation will be automatically set to balance accuracy and speed.',
'The total document count and chart use random sampler aggregations. The probability is automatically set to balance accuracy and speed.',
Updated here 5959f47
default:
return i18n.translate('xpack.dataVisualizer.randomSamplerSettingsPopUp.offCalloutMessage', {
defaultMessage:
'Random sampling is being used for the total document count and the chart. A lower percentage probability will increase performance, but some accuracy will be lost.',
To match the other suggestion:
'Random sampling is being used for the total document count and the chart. A lower percentage probability will increase performance, but some accuracy will be lost.',
'The total document count and chart use random sampler aggregations. A lower percentage probability increases performance, but some accuracy is lost.',
Updated here 5959f47
<EuiIconTip
content={i18n.translate('xpack.dataVisualizer.searchPanel.randomSamplerMessage', {
defaultMessage:
'Random sampler is being used for the total document count and the chart. Values shown are estimated.',
To align with my other suggestions and to front-load the most important info:
'Random sampler is being used for the total document count and the chart. Values shown are estimated.',
'Approximate values are shown in the total document count and chart, which use random sampler aggregations.',
Updated here 5959f47
): DocumentCountStats | undefined => {
if (!body) return undefined;

const totalCount = (body.hits.total as estypes.SearchTotalHits).value ?? body.hits.total ?? 0;
let totalCount = 0;
Why does 'totalCount' need to be set to 0 here?
We are updating the totalCount by adding the count in dataForTime later on as well.
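The reply describes starting from 0 and adding up counts while the per-time chart data is built. A minimal sketch of that accumulation pattern follows, assuming date-histogram-style buckets with `key` and `doc_count` fields. The `HistogramBucket` type, the `sumBuckets` function, and the returned shape are hypothetical and are not taken from the PR.

```typescript
// Hypothetical bucket shape, modelled on a date histogram response.
interface HistogramBucket {
  key: number; // bucket start time in epoch milliseconds
  doc_count: number; // (estimated) number of documents in the bucket
}

// Hypothetical result shape for this sketch only.
interface DocumentCountSketch {
  totalCount: number;
  buckets: Record<number, number>;
}

function sumBuckets(rawBuckets: HistogramBucket[]): DocumentCountSketch {
  // Start at 0 and accumulate, instead of reading a precomputed total.
  let totalCount = 0;
  const buckets: Record<number, number> = {};
  for (const bucket of rawBuckets) {
    buckets[bucket.key] = bucket.doc_count;
    totalCount += bucket.doc_count;
  }
  return { totalCount, buckets };
}
```

With this pattern the displayed total always agrees with the sum of the chart's bars, which is one plausible reason to derive it from the buckets.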
@elasticmachine merge upstream
💚 Build Succeeded
cc @qn895
Summary
This PR addresses #136124 and uses the new random sampler in the Data visualizer document count chart.
It adds 3 options for sampling:
If On (automatic use best %) is selected, it first runs a random sampler aggregation at a default probability of 0.000001. Then it decides how to proceed based on the result of that initial response.
If On (manually set %) is selected, it shows a slider. When the user first switches to this option, it suggests the last calculated best probability. Once the user picks a probability, it remembers this value for any subsequent queries (such as after changing the time range or modifying the queries or filters).
If Off is selected, it always runs at probability = 1 (which is no sampling).
Screen.Recording.2022-07-20.at.14.17.39.mov
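The three options can be read as a mapping from the selected option to the probability used for the first request. The sketch below is an illustration under stated assumptions, not the PR's implementation: the option strings, function names, the `sampler` aggregation name, and the fallback for a missing manual value are hypothetical. The `random_sampler` aggregation and its `probability` parameter are part of Elasticsearch.

```typescript
// Hypothetical option identifiers for this sketch.
type SamplerOption = 'on_automatic' | 'on_manual' | 'off';

// Default probability for the initial automatic request, as stated in the summary.
const DEFAULT_INITIAL_PROBABILITY = 0.000001;

// Picks the probability for the first request under each option.
function resolveInitialProbability(
  option: SamplerOption,
  userProbability?: number
): number {
  switch (option) {
    case 'off':
      return 1; // probability = 1 means no sampling
    case 'on_manual':
      // Assumption: fall back to the default if the user has not picked a value yet.
      return userProbability ?? DEFAULT_INITIAL_PROBABILITY;
    case 'on_automatic':
      return DEFAULT_INITIAL_PROBABILITY;
  }
}

// Wraps arbitrary sub-aggregations in a random_sampler aggregation.
// The outer aggregation name 'sampler' is arbitrary.
function wrapWithRandomSampler(
  probability: number,
  aggs: Record<string, unknown>
) {
  return {
    sampler: {
      random_sampler: { probability },
      aggs,
    },
  };
}
```

For example, `wrapWithRandomSampler(resolveInitialProbability('off'), chartAggs)` would yield a request that runs at probability 1, where `chartAggs` stands for whatever sub-aggregations the chart needs.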
Checklist