Add reasoning for choosing shardSpec to the MSQ report #16175
```diff
   )
   {
     if (mayHaveMultiValuedClusterByFields) {
       // DimensionRangeShardSpec cannot handle multi-valued fields.
+      return Pair.of(Collections.emptyList(), "Cannot use RangeShardSpec, the fields in the CLUSTER BY clause contains a multivalues. Using NumberedShardSpec instead.");
```
nit: grammar
Also, if it's possible to pinpoint the multi-valued fields without much refactoring, then we can mention that here.
Suggested change:
```diff
-return Pair.of(Collections.emptyList(), "Cannot use RangeShardSpec, the fields in the CLUSTER BY clause contains a multivalues. Using NumberedShardSpec instead.");
+return Pair.of(Collections.emptyList(), "Cannot use RangeShardSpec, the fields in the CLUSTERED BY clause contains multivalues in column [%s]. Using NumberedShardSpec instead.");
```
I don't think we have the column name at this point; we only store a boolean, mayContainMultivalues. Updated the message a bit.
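The pattern in this hunk, returning an empty shard-column list together with a human-readable reason, can be sketched in plain Java as below. This is a minimal, self-contained illustration under stated assumptions, not Druid's code: the `ShardColumnsResult` record is a hypothetical stand-in for Druid's `Pair<List<String>, String>`, and `MultiValueCheck`/`checkMultiValued` are hypothetical names. Because only a boolean flag is available at this point, the reason cannot name the offending column.

```java
import java.util.Collections;
import java.util.List;
import java.util.Optional;

/** Hypothetical stand-in for Pair<List<String>, String>: shard columns plus the reason for the choice. */
record ShardColumnsResult(List<String> shardColumns, String reason) {}

/** Hypothetical helper illustrating the multi-valued early return; not Druid's API. */
final class MultiValueCheck
{
  private MultiValueCheck() {}

  /**
   * Returns a NumberedShardSpec fallback when the CLUSTERED BY fields may be multi-valued.
   * Only a boolean is known here, so the reason cannot name a specific column.
   * An empty Optional means this check has no objection and later checks should run.
   */
  static Optional<ShardColumnsResult> checkMultiValued(boolean mayHaveMultiValuedClusterByFields)
  {
    if (mayHaveMultiValuedClusterByFields) {
      // Range shard specs cannot partition on multi-valued fields, so fall back.
      return Optional.of(new ShardColumnsResult(
          Collections.emptyList(),
          "Cannot use RangeShardSpec, the fields in the CLUSTERED BY clause may contain multi-valued fields. Using NumberedShardSpec instead."
      ));
    }
    return Optional.empty();
  }
}
```

Returning the reason alongside the (empty) column list keeps the decision and its explanation in one value, so the caller can both log it and copy it into the report without re-deriving why the fallback happened.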
```diff
     }
 
     // DimensionRangeShardSpec only handles columns that appear as-is in the output.
     if (outputColumns.isEmpty()) {
-      return Collections.emptyList();
+      return Pair.of(Collections.emptyList(), "Cannot use RangeShardSpec, RangeShardSpec only supports columns that appear as-is in the output. Using NumberedShardSpec instead.");
```
What does as-is mean?
Changed the message to "Could not find output column name for column [%s]" to include the column name. I'm not sure what conditions would cause the output column to not be found here.
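A sketch of the revised check, which puts the column name into the reason via `String.format`. It assumes a hypothetical `Map<String, List<String>>` from query column names to the output column names they map to; Druid's own column-mapping API is not reproduced here, and `OutputColumnCheck`, `resolve`, and the success message are illustrative names and text only. The failure message follows the wording quoted in the reply above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical result type: the chosen shard columns and the reason for the choice. */
record OutputColumnResult(List<String> shardColumns, String reason) {}

/** Hypothetical helper; the query-to-output map is an assumption, not Druid's ColumnMappings. */
final class OutputColumnCheck
{
  private OutputColumnCheck() {}

  /**
   * Maps each CLUSTERED BY column to an output column name. If any column has no output
   * name, falls back to NumberedShardSpec and reports which column was missing.
   */
  static OutputColumnResult resolve(List<String> clusterByColumns, Map<String, List<String>> queryToOutput)
  {
    final List<String> shardColumns = new ArrayList<>();
    for (String column : clusterByColumns) {
      final List<String> outputColumns = queryToOutput.getOrDefault(column, Collections.emptyList());
      if (outputColumns.isEmpty()) {
        // Name the column so the user can see which CLUSTERED BY entry blocked range sharding.
        return new OutputColumnResult(
            Collections.emptyList(),
            String.format(
                "Could not find output column name for column [%s]. Using NumberedShardSpec instead.",
                column
            )
        );
      }
      // Use the first output column that this query column maps to.
      shardColumns.add(outputColumns.get(0));
    }
    // Illustrative success message; the real wording is not shown in this PR excerpt.
    return new OutputColumnResult(shardColumns, "Using RangeShardSpec to generate segments.");
  }
}
```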
```diff
     final List<KeyColumn> clusterByColumns = clusterBy.getColumns();
     final List<String> shardColumns = new ArrayList<>();
     final boolean boosted = isClusterByBoosted(clusterBy);
     final int numShardColumns = clusterByColumns.size() - clusterBy.getBucketByCount() - (boosted ? 1 : 0);
 
     if (numShardColumns == 0) {
-      return Collections.emptyList();
+      return Pair.of(Collections.emptyList(), "Cannot use RangeShardSpec, as there are no shardColumns. Using NumberedShardSpec instead.");
```
What happens if the user doesn't supply a CLUSTERED BY clause? In that case, the reason doesn't seem necessary, or it can be reworded.
Suggested change:
```diff
-return Pair.of(Collections.emptyList(), "Cannot use RangeShardSpec, as there are no shardColumns. Using NumberedShardSpec instead.");
+return Pair.of(Collections.emptyList(), "Using NumberedShardSpec as no columns are supplied in the 'CLUSTERED BY' clause.");
```
Changed
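The `numShardColumns` computation in the hunk above subtracts the bucket-by columns (counted by `getBucketByCount()`) and, when the clustering is boosted, one more key column from the full cluster-by key; the "no columns supplied" reason applies when nothing is left. The sketch below reproduces only that arithmetic and the fallback. `ShardColumnCount`, its methods, and its integer and boolean parameters are hypothetical stand-ins for `clusterBy.getColumns().size()`, `clusterBy.getBucketByCount()`, and `isClusterByBoosted(clusterBy)`.

```java
import java.util.Collections;
import java.util.List;
import java.util.Optional;

/** Hypothetical result type: the chosen shard columns and the reason for the choice. */
record ShardCountResult(List<String> shardColumns, String reason) {}

/** Hypothetical helper mirroring the numShardColumns arithmetic from the diff. */
final class ShardColumnCount
{
  private ShardColumnCount() {}

  /**
   * Key columns left for range sharding: total key size, minus the bucket-by columns,
   * minus one column when the clustering is boosted.
   */
  static int numShardColumns(int keyColumnCount, int bucketByCount, boolean boosted)
  {
    return keyColumnCount - bucketByCount - (boosted ? 1 : 0);
  }

  /** Returns a NumberedShardSpec fallback with a reason when no shard columns remain. */
  static Optional<ShardCountResult> checkHasShardColumns(int keyColumnCount, int bucketByCount, boolean boosted)
  {
    if (numShardColumns(keyColumnCount, bucketByCount, boosted) == 0) {
      return Optional.of(new ShardCountResult(
          Collections.emptyList(),
          "Using NumberedShardSpec as no columns are supplied in the 'CLUSTERED BY' clause."
      ));
    }
    return Optional.empty();
  }
}
```

For a hypothetical key made of one bucket-by column, two CLUSTERED BY columns, and a boost column, the count is 4 - 1 - 1 = 2; with only the bucket-by and boost columns it is 2 - 1 - 1 = 0, which triggers the reworded fallback reason.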
I missed it before, but we should also add MSQTests where these cases are getting tripped, and assert the reason in the report.
cc @vogievetsky for the web console changes.
MSQ chooses the shard spec based on certain criteria. However, these criteria are not very transparent to the user: the only way to find out which shard spec was chosen is to look up a segment in the segment UI after the ingestion finishes.
This PR logs the chosen segment type and the reason it was chosen. It also adds this information to the query report so that it can be displayed in the UI.
This PR adds a new section to the reports, `segmentReport`. For ingestion queries it contains the type of segment created; for other queries it is null. The `shardSpec` field names the type of shard spec generated. MSQ prefers to use RangedShardSpec when possible. For INSERT and REPLACE queries, the default shard specs are NumberedShardSpec and DimensionRangeShardSpec, respectively. If a range shard spec cannot be chosen for a REPLACE query, the `details` field contains the reason why it could not be used.
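A rough sketch of the report section described above, assuming it carries a `shardSpec` string and a `details` string and is null for non-ingestion queries. `SegmentReport`, `SegmentReports`, and `forQuery` are hypothetical names; Druid's actual report classes and their JSON serialization are not shown here. The spec selection encodes only the defaults stated in the description: REPLACE uses DimensionRangeShardSpec when shard columns are available, and everything else uses NumberedShardSpec.

```java
import java.util.List;

/** Hypothetical shape of the new report section: the shard spec type and why it was chosen. */
record SegmentReport(String shardSpec, String details) {}

/** Hypothetical builder for the report section; not Druid's API. */
final class SegmentReports
{
  private SegmentReports() {}

  /**
   * Builds the segment report for a query. Returns null when the query is not an ingestion,
   * matching the "null otherwise" behavior described in the PR.
   */
  static SegmentReport forQuery(boolean isIngestion, boolean isReplace, List<String> shardColumns, String reason)
  {
    if (!isIngestion) {
      return null;
    }
    // Assumption: only REPLACE with at least one usable shard column gets a range shard spec.
    final String shardSpec = (isReplace && !shardColumns.isEmpty())
                             ? "DimensionRangeShardSpec"
                             : "NumberedShardSpec";
    // The reason is passed through so a REPLACE fallback explains why range sharding was not used.
    return new SegmentReport(shardSpec, reason);
  }
}
```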
This PR has: