Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ISSUES-2973 support nested json struct for visitParamExtractRaw #2974

Merged

Conversation

zhang2014
Copy link
Contributor

#2973
I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

@zhang2014
Copy link
Contributor Author

zhang2014 commented Aug 28, 2018

Take the best result of the three runs.

Compile Parameters:

cmake .. -DCMAKE_CXX_COMPILER=`which g++-8 ` -DCMAKE_C_COMPILER="`which gcc-8`" -DENABLE_TESTS=0  -DUSE_UNWIND=0 -DENABLE_CLICKHOUSE_COMPRESSOR=0 -DENABLE_MONGODB=1 -DENABLE_JEMALLOC=0 -DCMAKE_BUILD_TYPE=Release

Before

[
{
    "hostname": "i-2okyk0c6",
    "main_metric": "max_rows_per_second",
    "num_cores": 16,
    "num_threads": 16,
    "parameters": {
        "param": ["'{"myparam":"test_string"}'", "'{"myparam":{"nested_1":"test_string","nested_2":"test_2"}}'", "'{"myparam":{"nested_1":"test_string","nested_2":"test_2","nested_1":"test_string","nested_2":"test_2","nested_1":"test_string","nested_2":"test_2"}}'"]
    },
    "ram": 33736384512,
    "runs": [
        {
            "max_rows_per_second": 21752173.000000,
            "parameters": {
                "param": "'{"myparam":"test_string"}'"
            },
            "query": "SELECT count() FROM system.numbers WHERE NOT ignore(visitParamExtractRaw(materialize('{"myparam":"test_string"}'), 'myparam'))"
        },
        {
            "max_rows_per_second": 7467469.000000,
            "parameters": {
                "param": "'{"myparam":{"nested_1":"test_string","nested_2":"test_2"}}'"
            },
            "query": "SELECT count() FROM system.numbers WHERE NOT ignore(visitParamExtractRaw(materialize('{"myparam":{"nested_1":"test_string","nested_2":"test_2"}}'), 'myparam'))"
        },
        {
            "max_rows_per_second": 1936649.000000,
            "parameters": {
                "param": "'{"myparam":{"nested_1":"test_string","nested_2":"test_2","nested_1":"test_string","nested_2":"test_2","nested_1":"test_string","nested_2":"test_2"}}'"
            },
            "query": "SELECT count() FROM system.numbers WHERE NOT ignore(visitParamExtractRaw(materialize('{"myparam":{"nested_1":"test_string","nested_2":"test_2","nested_1":"test_string","nested_2":"test_2","nested_1":"test_string","nested_2":"test_2"}}'), 'myparam'))"
        }
    ],
    "server_version": "18.10.3",
    "test_name": "visit_param_extract_raw",
    "time": "2018-08-31 15:18:16"
}
]

After

[
{
    "hostname": "i-2okyk0c6",
    "main_metric": "max_rows_per_second",
    "num_cores": 16,
    "num_threads": 16,
    "parameters": {
        "param": ["'{"myparam":"test_string"}'", "'{"myparam":{"nested_1":"test_string","nested_2":"test_2"}}'", "'{"myparam":{"nested_1":"test_string","nested_2":"test_2","nested_1":"test_string","nested_2":"test_2","nested_1":"test_string","nested_2":"test_2"}}'"]
    },
    "ram": 33736384512,
    "runs": [
        {
            "max_rows_per_second": 17394195.000000,
            "parameters": {
                "param": "'{"myparam":"test_string"}'"
            },
            "query": "SELECT count() FROM system.numbers WHERE NOT ignore(visitParamExtractRaw(materialize('{"myparam":"test_string"}'), 'myparam'))"
        },
        {
            "max_rows_per_second": 7092361.000000,
            "parameters": {
                "param": "'{"myparam":{"nested_1":"test_string","nested_2":"test_2"}}'"
            },
            "query": "SELECT count() FROM system.numbers WHERE NOT ignore(visitParamExtractRaw(materialize('{"myparam":{"nested_1":"test_string","nested_2":"test_2"}}'), 'myparam'))"
        },
        {
            "max_rows_per_second": 2552266.000000,
            "parameters": {
                "param": "'{"myparam":{"nested_1":"test_string","nested_2":"test_2","nested_1":"test_string","nested_2":"test_2","nested_1":"test_string","nested_2":"test_2"}}'"
            },
            "query": "SELECT count() FROM system.numbers WHERE NOT ignore(visitParamExtractRaw(materialize('{"myparam":{"nested_1":"test_string","nested_2":"test_2","nested_1":"test_string","nested_2":"test_2","nested_1":"test_string","nested_2":"test_2"}}'), 'myparam'))"
        }
    ],
    "server_version": "18.10.3",
    "test_name": "visit_param_extract_raw",
    "time": "2018-08-31 15:05:25"
}
]

@alexey-milovidov
Copy link
Member

alexey-milovidov commented Aug 30, 2018

Thank you!

PS. The performance test you quote is incorrect because
visitParamExtractRaw('{"myparam":"test_string"}', 'myparam')
is a constant expression and it is calculated just once.

(it's obvious because otherwise you cannot get 2+ billion rows/sec. on single core.)

To be sure, you can transform a string to non-constant with materialize function:

visitParamExtractRaw(materialize('{"myparam":"test_string"}', 'myparam'))

Copy link
Member

@alexey-milovidov alexey-milovidov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.

res_data.push_back(*pos);
static void extract(const UInt8 * pos, const UInt8 * end, ColumnString::Chars_t & res_data)
{
std::vector<char> expect_end;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will lead to allocation/deallocation inside a loop.
We can use PODArray with AllocatorWithStackMemory with for example, 64 bytes of automatic memory.

@@ -87,61 +87,45 @@ struct ExtractBool

struct ExtractRaw
{
static void extract(const UInt8 * pos, const UInt8 * end, ColumnString::Chars_t & res_data)
inline static void skipAfterQuotationIfNeed(const UInt8 *& pos, const UInt8 * end, ColumnString::Chars_t & res_data)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be more clear if we just put this code where it is called.
PS. I think, it should be named skipAfterBackslashIfNeed.

@@ -0,0 +1,31 @@
<test>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

materialize should be added.

@alexey-milovidov alexey-milovidov self-assigned this Aug 30, 2018
@zhang2014 zhang2014 force-pushed the fix/nested_json_for_visitRaw branch from 746e6e9 to a1f2b9a Compare August 31, 2018 07:25
@zhang2014
Copy link
Contributor Author

Done and update #2974 (comment)

@alexey-milovidov
Copy link
Member

Ok.
Performance drop is significant, but it's ok as the cost for correctness of this rather specific function.

@alexey-milovidov alexey-milovidov merged commit 087570e into ClickHouse:master Sep 1, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants