Place new logic of schema inference in insert select from table function under setting #36275

Merged: 3 commits, Apr 19, 2022

@Avogar Avogar commented Apr 14, 2022

Changelog category (leave one):

  • Not for changelog (changelog entry is not required)

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Place the new schema inference logic for INSERT ... SELECT from a table function (#35760) under the setting use_structure_from_insertion_table_in_table_functions, and disable it by default.
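
For illustration, a minimal sketch of how the setting could be toggled per session. It assumes a server build that includes this change (on older builds the setting does not exist) and reuses the reviews table and file path from the discussion below; it is not taken from the PR's tests.

```sql
-- Assumption: the server build includes this change, so the setting exists.
-- 0 (the default after this PR): file() infers its schema from the data itself.
SET use_structure_from_insertion_table_in_table_functions = 0;

-- 1: opt in to the logic from #35760, where the table function
-- takes its structure from the insertion table (here: reviews).
SET use_structure_from_insertion_table_in_table_functions = 1;

INSERT INTO reviews
SELECT *
FROM file('reviews/All_Amazon_Review.json.gz', JSONEachRow);
```

With the value 0, columns that exist only in the file (such as unixReviewTime in the queries below) should be visible to the SELECT, because the structure is no longer taken from the target table.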

Information about CI checks: https://clickhouse.tech/docs/en/development/continuous-integration/

@robot-ch-test-poll2 added the pr-not-for-changelog label Apr 14, 2022
@alexey-milovidov
Member

I also tried to work around this behavior in three different ways, but none of them works:

play-eu :) INSERT INTO reviews SELECT * FROM (SELECT unixReviewTime::Date, overall, replaceAll(vote, ',', ''), verified, reviewerID, asin, reviewerName, reviewText, summary, image, style FROM file('reviews/All_Amazon_Review.json.gz', JSONEachRow) SETTINGS input_format_max_rows_to_read_for_schema_inference = 100000);

INSERT INTO reviews SELECT *
FROM
(
    SELECT
        CAST(unixReviewTime, 'Date'),
        overall,
        replaceAll(vote, ',', ''),
        verified,
        reviewerID,
        asin,
        reviewerName,
        reviewText,
        summary,
        image,
        style
    FROM file('reviews/All_Amazon_Review.json.gz', JSONEachRow)
    SETTINGS input_format_max_rows_to_read_for_schema_inference = 100000
)

Query id: dfa5c4ef-47c5-4bbc-aa23-d07a7c08d4a2


0 rows in set. Elapsed: 0.012 sec. 

Received exception from server (version 22.4.1):
Code: 47. DB::Exception: Received from localhost:9000. DB::Exception: Missing columns: 'unixReviewTime' while processing query: 'SELECT CAST(unixReviewTime, 'Date'), overall, replaceAll(vote, ',', ''), verified, reviewerID, asin, reviewerName, reviewText, summary, image, style FROM file('reviews/All_Amazon_Review.json.gz', 'JSONEachRow') SETTINGS input_format_max_rows_to_read_for_schema_inference = 100000', required columns: 'verified' 'unixReviewTime' 'reviewerID' 'overall' 'asin' 'reviewerName' 'reviewText' 'summary' 'image' 'vote' 'style', maybe you meant: ['verified','reviewerID','overall','asin','reviewerName','reviewText','summary','image','vote','style']. (UNKNOWN_IDENTIFIER)

play-eu :) INSERT INTO reviews SELECT * FROM view(SELECT unixReviewTime::Date, overall, replaceAll(vote, ',', ''), verified, reviewerID, asin, reviewerName, reviewText, summary, image, style FROM file('reviews/All_Amazon_Review.json.gz', JSONEachRow) SETTINGS input_format_max_rows_to_read_for_schema_inference = 100000);

INSERT INTO reviews SELECT *
FROM view(
    SELECT
        CAST(unixReviewTime, 'Date'),
        overall,
        replaceAll(vote, ',', ''),
        verified,
        reviewerID,
        asin,
        reviewerName,
        reviewText,
        summary,
        image,
        style
    FROM file('reviews/All_Amazon_Review.json.gz', JSONEachRow)
    SETTINGS input_format_max_rows_to_read_for_schema_inference = 100000
)

Query id: 19dd7b62-5c43-469d-99ba-2205f1e3aefc


0 rows in set. Elapsed: 0.019 sec. 

Received exception from server (version 22.4.1):
Code: 47. DB::Exception: Received from localhost:9000. DB::Exception: Missing columns: 'unixReviewTime' while processing query: 'SELECT CAST(unixReviewTime, 'Date'), overall, replaceAll(vote, ',', ''), verified, reviewerID, asin, reviewerName, reviewText, summary, image, style FROM file('reviews/All_Amazon_Review.json.gz', 'JSONEachRow') SETTINGS input_format_max_rows_to_read_for_schema_inference = 100000', required columns: 'verified' 'unixReviewTime' 'reviewerID' 'overall' 'asin' 'reviewerName' 'reviewText' 'summary' 'image' 'vote' 'style', maybe you meant: ['verified','reviewerID','overall','asin','reviewerName','reviewText','summary','image','vote','style']. (UNKNOWN_IDENTIFIER)

play-eu :) INSERT INTO reviews SELECT unixReviewTime::Date, overall, replaceAll(vote, ',', ''), verified, reviewerID, asin, reviewerName, reviewText, summary, image, style FROM (SELECT * FROM file('reviews/All_Amazon_Review.json.gz', JSONEachRow) SETTINGS input_format_max_rows_to_read_for_schema_inference = 100000);

INSERT INTO reviews SELECT
    CAST(unixReviewTime, 'Date'),
    overall,
    replaceAll(vote, ',', ''),
    verified,
    reviewerID,
    asin,
    reviewerName,
    reviewText,
    summary,
    image,
    style
FROM
(
    SELECT *
    FROM file('reviews/All_Amazon_Review.json.gz', JSONEachRow)
    SETTINGS input_format_max_rows_to_read_for_schema_inference = 100000
)

Query id: a3c62625-114a-40ce-a208-9d3a80cd079a


0 rows in set. Elapsed: 0.011 sec. 

Received exception from server (version 22.4.1):
Code: 47. DB::Exception: Received from localhost:9000. DB::Exception: Missing columns: 'unixReviewTime' while processing query: 'SELECT CAST(unixReviewTime, 'Date'), overall, replaceAll(vote, ',', ''), verified, reviewerID, asin, reviewerName, reviewText, summary, image, style FROM (SELECT * FROM file('reviews/All_Amazon_Review.json.gz', JSONEachRow) SETTINGS input_format_max_rows_to_read_for_schema_inference = 100000)', required columns: 'verified' 'unixReviewTime' 'reviewerID' 'overall' 'asin' 'reviewerName' 'reviewText' 'summary' 'image' 'vote' 'style' 'verified' 'unixReviewTime' 'reviewerID' 'overall' 'asin' 'reviewerName' 'reviewText' 'summary' 'image' 'vote' 'style'. (UNKNOWN_IDENTIFIER)

@alexey-milovidov
Member

It's unclear to me why the workarounds did not work.

@Avogar
Member Author

Avogar commented Apr 19, 2022

@Mergifyio update

@mergify
Contributor

mergify bot commented Apr 19, 2022

update

✅ Branch has been successfully updated

@Avogar Avogar merged commit 7fb7fc9 into ClickHouse:master Apr 19, 2022