MSQ arrayIngestMode to control if arrays are ingested as ARRAY, MVD, or an exception #15093
This change is important but we need to do it less disruptively. People are using For this reason the change needs to be opt-in. I suggest documenting the context parameter and swapping the default so the old behavior is retained. In addition I'd suggest writing up a doc page about how people can migrate from MVDs to string arrays (with examples of how to rewrite queries) and pointing people at that in the docs for this parameter. And showing people how to use
Another thought: is it possible to add explicit dimension schemas for the various types that can be generated by "auto"? In MSQ we know the exact type we want, so it seems odd and circuitous to use "auto".
I've started doing some work on this, but it's sort of non-trivial and shouldn't be part of this PR, so using 'auto' is currently the only way to ingest array columns.
lgtm 👍
Are there any tests for trying to insert numeric arrays in mvd mode or any type of arrays in none mode?
public static final boolean DEFAULT_USE_AUTO_SCHEMAS = false;

public static final String CTX_ARRAY_INGEST_MODE = "arrayIngestMode";
public static final ArrayIngestMode DEFAULT_ARRAY_INGEST_MODE = ArrayIngestMode.NONE;
Default should be MVD.
Please add the document changes for this as well.
public static final boolean DEFAULT_USE_AUTO_SCHEMAS = false;

public static final String CTX_ARRAY_INGEST_MODE = "arrayIngestMode";
public static final ArrayIngestMode DEFAULT_ARRAY_INGEST_MODE = ArrayIngestMode.NONE;
The default mode should be MVD IMHO since it will not break stuff.
I personally disagree, I think we should default to none, then people explicitly choose to use MVDs or arrays
mostly because the behavior of MVD is totally incorrect
I am working on updating the docs/examples to refer to the parameter and to separate them into ones that explicitly store MVDs using the ARRAY_TO_MV function, and ones that only use ARRAY_ functions, which become examples of storing array typed columns instead.
MVD mode is not going to have examples, because it should be the first to go and no one should rely on this behavior, because again the behavior is incorrect.
IMHO we cannot break users' ingestion SQL queries, so I propose the default mode should be MVD.
The patch should also emit a warning from the controller, and show it on the console, if a row signature with an array is detected. The warning should clearly lay out the migration path.
There are a lot of layers built out in organizations. A change, even if it's just adding a context flag, sometimes takes weeks to reach production. With this warning we effectively nudge the user that we are going to break compatibility soon.
"String arrays can not be ingested when '%s' is set to '%s'. Either set '%s' in query context "
+ "to 'array' to ingest the string array as an array, or set it to 'mvd' to ingest the string array "
+ "as MVD (which is legacy behaviour and not recommmended)",
MultiStageQueryContext.CTX_ARRAY_INGEST_MODE,
StringUtils.toLowerCase(arrayIngestMode.name()),
MultiStageQueryContext.CTX_ARRAY_INGEST_MODE
I don't think we should recommend MVD mode at all here, instead we should always recommend array mode and suggest using ARRAY_TO_MV if people want to store things as a MVD.
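To make that recommendation concrete, here is a rough sketch of what "array mode plus an explicit ARRAY_TO_MV" could look like when building the JSON body for an MSQ task submission. The table and column names (target_table, source_table, tags) and the class and helper names are hypothetical and not part of this PR; only the arrayIngestMode context key and the ARRAY_TO_MV function come from the discussion.

```java
public class ArrayModeQuerySketch {
    // Hypothetical statement: "tags" is assumed to be an ARRAY<STRING> column
    // and is stored as an ARRAY<STRING> column under arrayIngestMode = array.
    static final String ARRAY_INSERT =
        "INSERT INTO \"target_table\" "
        + "SELECT \"__time\", \"tags\" FROM \"source_table\" PARTITIONED BY DAY";

    // Same hypothetical statement, but explicitly asking for an MVD via
    // ARRAY_TO_MV while still running in array mode.
    static final String MVD_INSERT =
        "INSERT INTO \"target_table\" "
        + "SELECT \"__time\", ARRAY_TO_MV(\"tags\") AS \"tags\" "
        + "FROM \"source_table\" PARTITIONED BY DAY";

    // Builds a {"query": ..., "context": {...}} body with arrayIngestMode set
    // to "array". Only backslashes and double quotes are escaped here, which
    // is enough for the statements above but not a general JSON encoder.
    static String withArrayMode(String sql) {
        String escaped = sql.replace("\\", "\\\\").replace("\"", "\\\"");
        return "{\"query\":\"" + escaped + "\","
            + "\"context\":{\"arrayIngestMode\":\"array\"}}";
    }

    public static void main(String[] args) {
        System.out.println(withArrayMode(ARRAY_INSERT));
        System.out.println(withArrayMode(MVD_INSERT));
    }
}
```

In both cases the context stays on array; whether the column ends up as an array or an MVD is decided by the query text rather than by the ingest mode.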
Yes, there are tests verifying that arrays cannot be inserted in none mode, and numeric arrays cannot be inserted in MVD mode.
Thanks, @clintropolis for updating & aligning the description with what we actually merged, and not the stale changes.
MSQ uses the string dimension schema for ARRAY<STRING> typed columns, which creates MVDs instead of string arrays as required. Therefore someone trying to ingest columns of type ARRAY<STRING> from an external data source or another data source would get STRING columns in the newly generated segments. This patch changes the following:
- Use auto dimension schema to ingest the ARRAY<STRING> columns, which will create columns with the desired type.
- Add an undocumented flag ingestStringArraysAsMVDs to preserve the legacy behavior. Legacy behaviour is turned on by default.
- Create MSQArraysInsertTest and refactor some of the tests in MSQInsertTest.
Description
MSQ uses the string dimension schema for ARRAY<STRING> typed columns, which creates MVDs instead of the string arrays that would be correct. Therefore someone trying to ingest columns of type ARRAY<STRING> from an external data source or another data source would get STRING columns in the newly generated segments.
This patch adds an arrayIngestMode query context parameter with the following behavior:
- array: uses the auto dimension schema to correctly ingest ARRAY<STRING> columns as ARRAY<STRING> columns, and also allows numeric ARRAY types to be ingested.
- mvd: currently the default, to ease the transition. ARRAY<STRING> will continue to ingest as STRING typed MVDs, but logs will warn operators that this is a deprecated mode. Numeric arrays cannot be ingested in this mode.
- none: neither string nor numeric ARRAY types can be ingested. This mode will be used as a forcing mechanism to make people choose explicitly whether they want MVDs or ARRAY typed columns. In either case, the suggestion is to set the mode to array and explicitly use ARRAY_TO_MV if users still want MVDs.
Release note
MSQ supports a new query context parameter, arrayIngestMode, which specifies the behavior of MSQ while ingesting arrays. Please refer to the docs in the release for the detailed behavior of each option. The default, 'mvd', is the existing behaviour of MSQ and therefore doesn't require immediate intervention; however, it is subject to removal in future releases.
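The three arrayIngestMode behaviors described above can be summarized as a small decision function. This is only an illustrative sketch, not the code in this PR: ElementType, Outcome, and resolve are hypothetical names, and the real implementation works with Druid's own column types and dimension schemas.

```java
public class ArrayIngestModeSketch {
    // The three modes named in the description.
    enum ArrayIngestMode { NONE, MVD, ARRAY }

    // Hypothetical simplification of an array column's element type.
    enum ElementType { STRING, LONG, DOUBLE }

    // Hypothetical labels for how the column ends up stored.
    enum Outcome { ARRAY_COLUMN, STRING_MVD }

    // Decides how an ARRAY-typed column is handled under the given mode,
    // throwing when the description says the ingestion is not allowed.
    static Outcome resolve(ArrayIngestMode mode, ElementType elementType) {
        switch (mode) {
            case ARRAY:
                // String and numeric arrays are both stored as arrays.
                return Outcome.ARRAY_COLUMN;
            case MVD:
                if (elementType == ElementType.STRING) {
                    // Legacy behavior: ARRAY<STRING> becomes a STRING typed MVD.
                    return Outcome.STRING_MVD;
                }
                throw new IllegalArgumentException(
                    "Numeric arrays cannot be ingested when arrayIngestMode is 'mvd'");
            case NONE:
            default:
                throw new IllegalArgumentException(
                    "Arrays cannot be ingested when arrayIngestMode is 'none'");
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve(ArrayIngestMode.ARRAY, ElementType.LONG));
        System.out.println(resolve(ArrayIngestMode.MVD, ElementType.STRING));
    }
}
```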