[feature](hive)Support reading renamed Parquet Hive and Orc Hive tables. (#38432) #38809

Commits on Aug 2, 2024

  1. [feature](hive)Support reading renamed Parquet Hive and Orc Hive tables. (apache#38432)
    
    Add the `hive_parquet_use_column_names` and `hive_orc_use_column_names`
    session variables so that `Hive` tables can still be read after a column
    has been renamed.
    
    These two session variables are modeled on the
    `parquet_use_column_names` and `orc_use_column_names` settings of the
    `Trino` Hive connector.
    
    Both session variables default to true, in which case the columns in the
    ORC/Parquet files are matched to the table schema by name. When they are
    set to false, columns are matched by their ordinal position in the Hive
    table definition.
    
    For example:
    ```mysql
    -- in Hive:
    hive> create table tmp (a int , b string) stored as parquet;
    hive> insert into table tmp values(1,"2");
    hive> alter table tmp change column a new_a int;
    hive> insert into table tmp values(2,"4");
    
    -- in Doris:
    mysql> set hive_parquet_use_column_names=true;
    Query OK, 0 rows affected (0.00 sec)
    
    mysql> select  * from tmp;
    +-------+------+
    | new_a | b    |
    +-------+------+
    |  NULL | 2    |
    |     2 | 4    |
    +-------+------+
    2 rows in set (0.02 sec)
    
    mysql> set hive_parquet_use_column_names=false;
    Query OK, 0 rows affected (0.00 sec)
    
    mysql> select  * from tmp;
    +-------+------+
    | new_a | b    |
    +-------+------+
    |     1 | 2    |
    |     2 | 4    |
    +-------+------+
    2 rows in set (0.02 sec)
    ```
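    The example above covers Parquet only. Below is a minimal sketch of the same check for an ORC table. It assumes that `hive_orc_use_column_names` behaves like its Parquet counterpart. The table name `tmp_orc` is hypothetical, and no output is shown because none was captured for ORC.
    
    ```mysql
    -- in Hive: same history as `tmp`, but stored as ORC (hypothetical table)
    hive> create table tmp_orc (a int, b string) stored as orc;
    hive> insert into table tmp_orc values(1,"2");
    hive> alter table tmp_orc change column a new_a int;
    hive> insert into table tmp_orc values(2,"4");
    
    -- in Doris: inspect the current value (expected default: true),
    -- then switch this session to position-based matching
    mysql> show variables like 'hive_orc_use_column_names';
    mysql> set hive_orc_use_column_names=false;
    mysql> select * from tmp_orc;
    ```
    
    Because these are session variables, the setting applies only to the current connection.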
    
    In Hive 3, you can get similar control over the results by setting
    `parquet.column.index.access` or `orc.force.positional.evolution` to
    true/false. However, for Parquet tables in which a field inside a struct
    column has been renamed, Hive and Doris return different results.
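    Below is a sketch of the Hive-side settings mentioned above. It assumes that in Hive `true` selects position-based (index) access, which is the opposite sense of the Doris variables, where `false` selects positional matching. `tmp_orc` is a hypothetical ORC table with the same rename history as `tmp`.
    
    ```mysql
    -- in Hive 3 (assumption: true = match columns by position)
    hive> set parquet.column.index.access=true;
    hive> select * from tmp;
    
    -- ORC counterpart, for a hypothetical ORC table `tmp_orc`
    hive> set orc.force.positional.evolution=true;
    hive> select * from tmp_orc;
    ```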
    hubgeter committed Aug 2, 2024
    f7255c4