ISSUES-2973 support nested json struct for visitParamExtractRaw #2974
Take the best result of the three runs.
Compile parameters: cmake .. -DCMAKE_CXX_COMPILER=`which g++-8` -DCMAKE_C_COMPILER="`which gcc-8`" -DENABLE_TESTS=0 -DUSE_UNWIND=0 -DENABLE_CLICKHOUSE_COMPRESSOR=0 -DENABLE_MONGODB=1 -DENABLE_JEMALLOC=0 -DCMAKE_BUILD_TYPE=Release
Before:
[
{
"hostname": "i-2okyk0c6",
"main_metric": "max_rows_per_second",
"num_cores": 16,
"num_threads": 16,
"parameters": {
"param": ["'{"myparam":"test_string"}'", "'{"myparam":{"nested_1":"test_string","nested_2":"test_2"}}'", "'{"myparam":{"nested_1":"test_string","nested_2":"test_2","nested_1":"test_string","nested_2":"test_2","nested_1":"test_string","nested_2":"test_2"}}'"]
},
"ram": 33736384512,
"runs": [
{
"max_rows_per_second": 21752173.000000,
"parameters": {
"param": "'{"myparam":"test_string"}'"
},
"query": "SELECT count() FROM system.numbers WHERE NOT ignore(visitParamExtractRaw(materialize('{"myparam":"test_string"}'), 'myparam'))"
},
{
"max_rows_per_second": 7467469.000000,
"parameters": {
"param": "'{"myparam":{"nested_1":"test_string","nested_2":"test_2"}}'"
},
"query": "SELECT count() FROM system.numbers WHERE NOT ignore(visitParamExtractRaw(materialize('{"myparam":{"nested_1":"test_string","nested_2":"test_2"}}'), 'myparam'))"
},
{
"max_rows_per_second": 1936649.000000,
"parameters": {
"param": "'{"myparam":{"nested_1":"test_string","nested_2":"test_2","nested_1":"test_string","nested_2":"test_2","nested_1":"test_string","nested_2":"test_2"}}'"
},
"query": "SELECT count() FROM system.numbers WHERE NOT ignore(visitParamExtractRaw(materialize('{"myparam":{"nested_1":"test_string","nested_2":"test_2","nested_1":"test_string","nested_2":"test_2","nested_1":"test_string","nested_2":"test_2"}}'), 'myparam'))"
}
],
"server_version": "18.10.3",
"test_name": "visit_param_extract_raw",
"time": "2018-08-31 15:18:16"
}
]
After:
[
{
"hostname": "i-2okyk0c6",
"main_metric": "max_rows_per_second",
"num_cores": 16,
"num_threads": 16,
"parameters": {
"param": ["'{"myparam":"test_string"}'", "'{"myparam":{"nested_1":"test_string","nested_2":"test_2"}}'", "'{"myparam":{"nested_1":"test_string","nested_2":"test_2","nested_1":"test_string","nested_2":"test_2","nested_1":"test_string","nested_2":"test_2"}}'"]
},
"ram": 33736384512,
"runs": [
{
"max_rows_per_second": 17394195.000000,
"parameters": {
"param": "'{"myparam":"test_string"}'"
},
"query": "SELECT count() FROM system.numbers WHERE NOT ignore(visitParamExtractRaw(materialize('{"myparam":"test_string"}'), 'myparam'))"
},
{
"max_rows_per_second": 7092361.000000,
"parameters": {
"param": "'{"myparam":{"nested_1":"test_string","nested_2":"test_2"}}'"
},
"query": "SELECT count() FROM system.numbers WHERE NOT ignore(visitParamExtractRaw(materialize('{"myparam":{"nested_1":"test_string","nested_2":"test_2"}}'), 'myparam'))"
},
{
"max_rows_per_second": 2552266.000000,
"parameters": {
"param": "'{"myparam":{"nested_1":"test_string","nested_2":"test_2","nested_1":"test_string","nested_2":"test_2","nested_1":"test_string","nested_2":"test_2"}}'"
},
"query": "SELECT count() FROM system.numbers WHERE NOT ignore(visitParamExtractRaw(materialize('{"myparam":{"nested_1":"test_string","nested_2":"test_2","nested_1":"test_string","nested_2":"test_2","nested_1":"test_string","nested_2":"test_2"}}'), 'myparam'))"
}
],
"server_version": "18.10.3",
"test_name": "visit_param_extract_raw",
"time": "2018-08-31 15:05:25"
}
]
Thank you! PS. The performance test you quote is incorrect because (it's obvious because otherwise you cannot get 2+ billion rows/sec. on a single core.) To be sure, you can transform a string to non-constant with
res_data.push_back(*pos);
static void extract(const UInt8 * pos, const UInt8 * end, ColumnString::Chars_t & res_data)
{
std::vector<char> expect_end;
It will lead to allocation/deallocation inside a loop.
We can use PODArray with AllocatorWithStackMemory with, for example, 64 bytes of automatic memory.
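To illustrate the idea behind this suggestion, here is a minimal, self-contained sketch of a character stack that keeps its first 64 bytes in automatic storage and only falls back to the heap on overflow. `SmallCharStack` and `usesHeap` are hypothetical names invented for this illustration; this is not ClickHouse's PODArray or AllocatorWithStackMemory, only a stand-in that shows why shallow nesting would not need an allocation per call. It assumes, as is typical of standard library implementations, that an empty `std::vector` does not allocate.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a PODArray with stack memory (not ClickHouse code):
// the first N elements live inside the object itself, so a stack that never
// grows past N elements never touches the heap.
template <std::size_t N = 64>
class SmallCharStack
{
public:
    void push(char c)
    {
        if (size_ < N)
            inline_[size_] = c;      // fits into automatic storage
        else
            overflow_.push_back(c);  // rare path: deep nesting spills to the heap
        ++size_;
    }

    void pop()
    {
        assert(size_ > 0);
        if (size_ > N)
            overflow_.pop_back();
        --size_;
    }

    char top() const
    {
        assert(size_ > 0);
        return size_ > N ? overflow_.back() : inline_[size_ - 1];
    }

    bool empty() const { return size_ == 0; }
    std::size_t size() const { return size_; }

    // True only while some elements are stored in the heap-backed overflow.
    bool usesHeap() const { return size_ > N; }

private:
    char inline_[N];
    std::vector<char> overflow_;
    std::size_t size_ = 0;
};
```

With such a container in place of `std::vector<char> expect_end`, a JSON value nested up to 64 levels deep would be tracked without heap traffic in this sketch.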
@@ -87,61 +87,45 @@ struct ExtractBool

struct ExtractRaw
{
static void extract(const UInt8 * pos, const UInt8 * end, ColumnString::Chars_t & res_data)
inline static void skipAfterQuotationIfNeed(const UInt8 *& pos, const UInt8 * end, ColumnString::Chars_t & res_data)
It would be clearer if we just put this code where it is called.
PS. I think it should be named skipAfterBackslashIfNeed.
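As a rough illustration of what inlining the backslash handling at its call site could look like, here is a minimal sketch of raw-value extraction that tracks nesting with a stack of expected closing characters. It is not the code from this PR: `extractRawValue` is a hypothetical name, and it works on `std::string` with an index instead of `const UInt8 *` pointers and `ColumnString::Chars_t`, purely to stay self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch: copy the raw JSON value that starts at json[start],
// stopping at the first ',', '}' or ']' that is not inside a nested
// object, array, or string.
std::string extractRawValue(const std::string & json, std::size_t start)
{
    std::string res;
    std::vector<char> expect_end;  // closing characters we are still waiting for

    for (std::size_t i = start; i < json.size(); ++i)
    {
        char c = json[i];

        if (!expect_end.empty() && c == expect_end.back())
        {
            // Found the closer of the innermost open structure or string.
            expect_end.pop_back();
        }
        else if (!expect_end.empty() && expect_end.back() == '"')
        {
            // Inside a string: the backslash handling is written inline here
            // rather than in a separate helper. Copy the backslash, then fall
            // through to copy the escaped character without interpreting it.
            if (c == '\\')
            {
                res.push_back(c);
                if (++i == json.size())
                    break;
                c = json[i];
            }
        }
        else
        {
            switch (c)
            {
                case '{': expect_end.push_back('}'); break;
                case '[': expect_end.push_back(']'); break;
                case '"': expect_end.push_back('"'); break;
                default:
                    // A top-level delimiter ends the value.
                    if (expect_end.empty() && (c == ',' || c == '}' || c == ']'))
                        return res;
            }
        }

        res.push_back(c);
    }
    return res;
}
```

For example, calling `extractRawValue` on the nested benchmark input with `start` pointing just past `"myparam":` would copy the whole inner object, including its commas, because they occur while a `}` is still expected.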
@@ -0,0 +1,31 @@
<test>
materialize should be added.
746e6e9 to a1f2b9a
Done, and updated #2974 (comment).
Ok.
#2973
I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en