Support for nested JSON structure while reading data in JSONEachRow input format #3144
The new test demonstrates current handling of flat JSON data corresponding to nested columns.
Also changed the names of previously extracted functions to camel case.
By default, mapping of nested JSON data to nested tables is disabled. To enable importing nested JSON data into the corresponding nested tables, clickhouse must be run with the --input_format_import_nested_json=1 option.
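To illustrate the mapping this option is about, here is a minimal Python sketch (not the C++ code from this PR). It assumes that a nested column is addressed by a dotted name such as n.s, so a nested JSON object can be flattened into the dotted keys that a flat JSONEachRow row would carry. The column names d1, n.s and n.i and the helper flatten_nested are hypothetical, chosen only for the example.

```python
import json


def flatten_nested(row, prefix=""):
    """Flatten nested JSON objects into dotted keys: {"n": {"s": 1}} -> {"n.s": 1}."""
    flat = {}
    for key, value in row.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Descend into nested objects, extending the dotted prefix.
            flat.update(flatten_nested(value, name))
        else:
            # Scalars and arrays are kept as-is under the dotted name.
            flat[name] = value
    return flat


# Hypothetical rows: the nested form and the equivalent flat form.
nested_line = '{"d1": 1, "n": {"s": ["abc", "def"], "i": [1, 23]}}'
flat_line = '{"d1": 1, "n.s": ["abc", "def"], "n.i": [1, 23]}'

# Both rows describe the same set of column values.
assert flatten_nested(json.loads(nested_line)) == json.loads(flat_line)
```

The sketch only shows the equivalence of the two JSON shapes; how the input stream actually resolves field names is in the C++ changes.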
A note on the failed Travis CI build: a couple of changed header files forced an almost full recompilation, and the Travis CI job was interrupted because it exceeded the maximum time limit. When the job was terminated, the build had completed and most of the tests had been run. Strangely enough, the only failed test is the new test that I introduced. While playing with clickhouse, I noticed that the requirement that different nested columns have the same length was not enforced (at least, it ran without an error in my environment), so I wrote the test that way. Now, in the Travis CI run, it gives the error that I initially expected to see. I have already fixed the test and will push the changes to my repo shortly.
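The same-length requirement mentioned above can be sketched as a small check. This is an illustration under stated assumptions, not ClickHouse's actual validation: it assumes the columns of one nested structure share a dotted prefix, and the helper name check_nested_lengths and the column names are hypothetical.

```python
def check_nested_lengths(flat_row, nested_name):
    """Return the array lengths of all columns under nested_name; raise if they differ."""
    prefix = nested_name + "."
    lengths = {
        key: len(value)
        for key, value in flat_row.items()
        if key.startswith(prefix)
    }
    if len(set(lengths.values())) > 1:
        raise ValueError(f"nested columns of '{nested_name}' differ in length: {lengths}")
    return lengths


# Equal lengths pass the check.
ok_row = {"d1": 1, "n.s": ["abc", "def"], "n.i": [1, 23]}
assert check_nested_lengths(ok_row, "n") == {"n.s": 2, "n.i": 2}

# Unequal lengths are rejected.
bad_row = {"d1": 1, "n.s": ["abc", "def"], "n.i": [1]}
try:
    check_nested_lengths(bad_row, "n")
    raised = False
except ValueError:
    raised = True
assert raised
```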
This time the build itself timed out.
Don't worry, Travis build barely works.
Looks (almost) Ok.
Thank you! FYI, I was also thinking about another possible way to implement this. We have named tuples. A nested column may be represented as a single column of type
@alexey-milovidov Thank you for accepting my first contribution! I like your idea from the usage model point of view, but it seems to me that implementing it would require more code (though I can definitely be wrong).
Hi @veloman-yunkan @alexey-milovidov
Is enough to fix it. I was not able to make a minimal reproduction case yet, but I can privately share an example. For now, I get errors like:
field name is
This reverts commit 46ef387.
Hi @champtar, I reviewed my changes and couldn't spot anything that could break existing functionality (unless the old code contained hidden bugs). Did you observe this problem with a clean build?
Hi @veloman-yunkan
@champtar Please send the curl post example to my email address.
done
This enhancement adds to clickhouse a new option, --input_format_import_nested_json, which enables reading nested fields from similarly structured JSON (when the JSONEachRow input format is used). For example, for a table created as follows
the following JSON
is processed in --input_format_import_nested_json=1 mode with the same outcome as for the JSON below:
Handling of unknown fields is respected.
I have performed some refactoring of the code in dbms/src/Formats/JSONEachRowRowInputStream.cpp, and also introduced a new test, dbms/tests/queries/0_stateless/00715_json_each_row_input_nested.sh, that covers this new functionality.
I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en with the exception of the clause "9. Liability" (however hypothetical, it is still an unmanageable risk and I am not going to undertake it).