
TIMESTAMP behaviour does not match sql standard #37

Closed
17 of 26 tasks
findepi opened this issue Jan 23, 2019 · 2 comments · Fixed by #4799 or #10963
Labels: bug (Something isn't working), roadmap (Top level issues for major efforts in the project)

Comments

@findepi
Member

findepi commented Jan 23, 2019

Problem description

See below.

Roadmap

(Note: since GH doesn't send updates for edits, when adding a bullet please be sure to add a comment too.)

Original problem description

It seems that the meaning of the TIMESTAMP and TIMESTAMP WITH TIME ZONE data types in Presto is not at all what the SQL standard specifies (and what other databases do).

This is my understanding of the SQL:2003 standard (4.6.2 Datetimes):

TIMESTAMP WITH TIME ZONE represents an absolute point in time. Databases typically store it internally as seconds since the epoch in some fixed time zone (usually UTC). When querying TIMESTAMP WITH TIME ZONE data, the values are presented to the user in the session time zone (the session time zone is used for presentation purposes only).

TIMESTAMP does not represent a specific point in time, but rather a reading of a wall clock and calendar. Selecting values from a TIMESTAMP column should return the same result set no matter what the client's session time zone is.
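
The two standard semantics map cleanly onto java.time types; a minimal sketch (the instant and zone names below are just illustrative choices, not anything from the issue):

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;

public class StandardSemantics
{
    public static void main(String[] args)
    {
        // TIMESTAMP WITH TIME ZONE: an absolute point in time. The same
        // instant is *presented* differently depending on the session zone.
        Instant instant = Instant.parse("2019-01-23T12:00:00Z");
        System.out.println(instant.atZone(ZoneId.of("Europe/Warsaw")));
        // 2019-01-23T13:00+01:00[Europe/Warsaw]
        System.out.println(instant.atZone(ZoneId.of("America/Los_Angeles")));
        // 2019-01-23T04:00-08:00[America/Los_Angeles]

        // TIMESTAMP (w/o time zone): a wall clock + calendar reading.
        // It renders identically in every session zone.
        LocalDateTime wallClock = LocalDateTime.parse("2019-01-23T12:00");
        System.out.println(wallClock); // 2019-01-23T12:00
    }
}
```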

Presto's semantics differ:

TIMESTAMP seems to do what TIMESTAMP WITH TIME ZONE should.

TIMESTAMP WITH TIME ZONE attaches explicit time zone information to each value stored in a table. The SQL standard does not define a type like that, and it does not seem very practical: since values selected from TIMESTAMP WITH TIME ZONE are presented to the user in the session time zone anyway, the per-row time zone information could be stripped away and all values stored in some arbitrary fixed time zone (e.g. UTC).

Please comment on the semantics. They seem wrong, yet it is hard to believe this choice was not made intentionally.

@haozhun clarifying comments

Comment 1

@losipiuk I agree with you that Timestamp w/o TZ in Presto is broken. I do NOT agree that Timestamp w TZ should behave like Instant. I believe it should also have an associated time zone. (In other words, I believe Timestamp w TZ is implemented correctly today.) Below is an excerpt of something I wrote early last year that summarizes the current behavior and my understanding.


To summarize how things work today:

  • Timestamp w TZ = DateTime in joda = ZonedDateTime in java8
  • Timestamp w/o TZ = Instant in joda = Instant in java8.
    • In other words, Timestamp w/o TZ represents an instant in time; it just doesn't have a time zone associated with it.
    • When you print it out for human consumption, you need to resort to the user's session time zone.
    • When you turn it into a Timestamp w TZ, you just stick that time zone to it without changing the instant.

The way I understand it

  • Timestamp w TZ = DateTime in joda = ZonedDateTime in java8
  • Timestamp w/o TZ = LocalDateTime in joda = LocalDateTime in java8.
    • In other words, Timestamp w/o TZ represents a specific Year/Month/Day/Hour/Minute/Second. But it doesn't represent a specific instant. (Side note: this concept does not need expensive representation. It can still be represented as millis/nanos since epoch, observing only chronology rules and no tz rules)
    • When you print it out, it should be printed out as is.
    • If you want to turn it into a Timestamp w TZ, you need to resort to the user's session time zone for the missing piece of information (and this is not always possible because Timestamp w TZ has gaps).
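
The "gaps" caveat in the last bullet can be demonstrated directly with java.time (the date and zone below are just an illustrative DST spring-forward case):

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

public class DstGap
{
    public static void main(String[] args)
    {
        // 2019-03-10 02:30 does not exist in America/Los_Angeles:
        // clocks jump from 02:00 directly to 03:00 that night.
        LocalDateTime inGap = LocalDateTime.of(2019, 3, 10, 2, 30);
        ZonedDateTime resolved = inGap.atZone(ZoneId.of("America/Los_Angeles"));
        // java.time resolves the gap by shifting forward by its length,
        // so the original local reading is not recoverable:
        System.out.println(resolved); // 2019-03-10T03:30-07:00[America/Los_Angeles]
    }
}
```

This is why the Timestamp w/o TZ → Timestamp w TZ conversion cannot always be exact.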

Here is the reason I believe the first understanding is inconsistent. I can think of only one possible interpretation for the other three concepts:

  • Time w TZ = no single class correspondence in joda = OffsetTime in java8
    • Offset and Zone differ in that only a constant offset exists for OffsetX; a political zone is not allowed.
    • ZonedTime doesn't make sense and doesn't exist. ZonedDateTime and OffsetDateTime both exist.
  • Time w/o TZ = LocalTime in joda = LocalTime in java8
  • Date = LocalDate in joda = LocalDate in java8

Note the inconsistency between the interpretations of Timestamp w/o TZ and Time w/o TZ if we adopt the first interpretation of Timestamp w/o TZ (Instant vs LocalTime). Under the second interpretation, they are consistent (LocalDateTime vs LocalTime).

I went to the SQL spec for the definitive answer:

  • Abbreviations
    • SV is the source value, TV is the target value
    • UTC is the UTC component of SV or TV (if and only if the source or target has time zone)
    • TZ is the timezone displacement of SV or TV (if and only if the source or target has time zone)
    • STZD is the SQL-session default time zone displacement
  • To convert Timestamp w/o TZ to Timestamp w/ TZ: TV.UTC = SV - STZD; TV.TZ = STZD
  • To convert Timestamp w/ TZ to Timestamp w/o TZ: TV = SV.UTC + SV.TZ

I believe these two rules prove that the SQL spec agrees with my interpretation. Let's consider the cast from Timestamp w/o TZ to Timestamp w/ TZ:

  • Cast from Timestamp w/o TZ
    • Given: SV = 3600000 millis (1 hour)
  • Cast to Timestamp w TZ in MPK
    • STZD = America/Los_Angeles
    • TV.UTC = 3600000 - (-28800000) = 32400000, TV.TZ=America/Los_Angeles
    • When written out in human format, TV is: 1970-01-01 01:00:00 America/Los_Angeles, TV.UTC is 1970-01-01 09:00:00
  • Cast to Timestamp w TZ in Shanghai
    • STZD = Asia/Shanghai
    • TV.UTC = 3600000 - 28800000 = -25200000, TV.TZ=Asia/Shanghai
    • When written out in human format, TV is: 1970-01-01 01:00:00 Asia/Shanghai, TV.UTC is 1969-12-31 17:00:00
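
The arithmetic of this worked example can be checked with a small sketch that applies the spec rule to raw millisecond values (the zone offsets are hard-coded for the two zones used above; the class and method names are just for illustration):

```java
public class SpecCastRule
{
    // SQL spec rule for Timestamp w/o TZ -> Timestamp w/ TZ:
    //   TV.UTC = SV - STZD
    static long castToTimestampWithTz(long svMillis, long stzdMillis)
    {
        return svMillis - stzdMillis;
    }

    public static void main(String[] args)
    {
        long sv = 3_600_000L;          // 1970-01-01 01:00:00 as a wall-clock reading
        long laOffset = -28_800_000L;  // America/Los_Angeles, UTC-8
        long shOffset = 28_800_000L;   // Asia/Shanghai, UTC+8

        System.out.println(castToTimestampWithTz(sv, laOffset)); // 32400000
        System.out.println(castToTimestampWithTz(sv, shOffset)); // -25200000
    }
}
```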

Under the first interpretation, these two casts should have yielded equal results. Under the second interpretation, they would produce different results. The rule in the SQL spec produces two different results, agreeing with the second interpretation.

Lastly, a side note from me. Both interpretations can produce results that depend on the user's session time zone:

  • Under current Presto interpretation of Timestamp w/o TZ:
    • Same storage representation. Let's say 60000 milliseconds.
    • Prints out as 1969-12-31 16:01:00 in MPK, and 1970-01-01 08:01:00 in China. They are different.
    • When cast to Timestamp w TZ, they become 1969-12-31 16:01:00 America/Los_Angeles in MPK, and 1970-01-01 08:01:00 Asia/Shanghai in China. These are equal (the same instant).
  • Under SQL spec interpretation of Timestamp w/o TZ:
    • Same storage representation. Let's say 60000 milliseconds.
    • Prints out as 1970-01-01 00:01:00 in MPK, and 1970-01-01 00:01:00 in China. They are the same.
    • When cast to Timestamp w TZ, they become 1970-01-01 00:01:00 America/Los_Angeles in MPK, and 1970-01-01 00:01:00 Asia/Shanghai in China. These are different instants.
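
The second bullet group (the SQL spec interpretation) can be reproduced with LocalDateTime, using the same 60000-millisecond stored value:

```java
import java.time.LocalDateTime;
import java.time.ZoneId;

public class SessionDependence
{
    public static void main(String[] args)
    {
        // Same stored value: 60000 millis, read as a wall-clock value.
        LocalDateTime ts = LocalDateTime.of(1970, 1, 1, 0, 1);

        // Printing ignores the session zone entirely:
        System.out.println(ts); // 1970-01-01T00:01 everywhere

        // Casting to Timestamp w TZ uses the session zone, so the
        // resulting instants differ:
        long inLa = ts.atZone(ZoneId.of("America/Los_Angeles"))
                .toInstant().toEpochMilli();
        long inShanghai = ts.atZone(ZoneId.of("Asia/Shanghai"))
                .toInstant().toEpochMilli();
        System.out.println(inLa);       // 28860000  (= 1970-01-01 08:01 UTC)
        System.out.println(inShanghai); // -28740000 (= 1969-12-31 16:01 UTC)
    }
}
```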

Under the SQL spec, a cast from timestamp w/o TZ to timestamp w/ TZ can produce different results depending on the user's time zone. As a result, I guess this cast probably should NOT be implicit.

Comment 2

@losipiuk, @dain, and I reached agreement:

  • Timestamp with Timezone in Presto is implemented properly today (like DateTime in joda, ZonedDateTime in jdk8).
  • Timestamp in Presto is like Instant in joda/jdk8 today. It should be like LocalDateTime in joda/jdk8.
  • Extracting the hour from 2016-01-01 12:00:00 <TZ> should return 12 no matter what <TZ> is put in the template.
  • As part of fixing Timestamp in Presto, we should remove implicit coercion from Timestamp to Timestamp with Timezone because the result value is environment dependent.
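
A sketch of the agreed-upon extract semantics in java.time terms (the zones are chosen arbitrarily for illustration):

```java
import java.time.LocalDateTime;
import java.time.ZoneId;

public class ExtractHour
{
    public static void main(String[] args)
    {
        // Under the LocalDateTime interpretation, HOUR is a field of the
        // wall-clock reading; no session zone is consulted.
        LocalDateTime ts = LocalDateTime.parse("2016-01-01T12:00:00");
        System.out.println(ts.getHour()); // 12

        // For Timestamp w TZ, extract reads the local field of the value's
        // own zone, so the hour stays 12 whatever zone is attached:
        System.out.println(ts.atZone(ZoneId.of("Asia/Tokyo")).getHour());       // 12
        System.out.println(ts.atZone(ZoneId.of("America/New_York")).getHour()); // 12
    }
}
```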

Notes

Ticket migrated from prestodb/presto#7122, prestodb/presto#10326

@findepi
Member Author

findepi commented Aug 21, 2020

Happy to see this resolved! Thank you @dain.

@findepi
Member Author

findepi commented Sep 28, 2020

There are some TODOs left in the code.

rice668 pushed a commit to rice668/trino that referenced this issue Jan 31, 2023
yuuteng added a commit to yuuteng/trino that referenced this issue Jun 30, 2023
# This is the 1st commit message:

Add Snowflake JDBC Connector

# This is the commit message #2:

Update trino snapshot version to 372

# This is the commit message trinodb#3:

Update format of the doc of snowflake

# This is the commit message trinodb#4:

Update trino jdbc library import

# This is the commit message trinodb#5:

Fix date formatter from yyyy to uuuu

# This is the commit message trinodb#6:

Fix date test case

# This is the commit message trinodb#7:

Remove defunct property allow-drop-table

# This is the commit message trinodb#8:

Update trino version to 374

# This is the commit message trinodb#9:

Update snowflake config to adapt 374

# This is the commit message trinodb#10:

Update the range of the test of Date type

# This is the commit message trinodb#11:

Update to version 375

# This is the commit message trinodb#12:

Fix snowflake after updating to 375

# This is the commit message trinodb#13:

Update to 381

# This is the commit message trinodb#14:

Fix mvn pom import

# This is the commit message trinodb#15:

Format snowflake.rst

# This is the commit message trinodb#16:

Reorderd Data tests in type mapping

# This is the commit message trinodb#17:

Update function code

# This is the commit message trinodb#18:

Add product test

# This is the commit message trinodb#19:

Rename product test tablename

# This is the commit message trinodb#20:

Add Env, Suite and properties of Snowflake for production test

# This is the commit message trinodb#21:

Add trinoCreateAndInsert()

# This is the commit message trinodb#22:

Refactor snowflake from single node to multi node

# This is the commit message trinodb#23:

Pass product tests

# This is the commit message trinodb#24:

Removed snowflake.properties in trino server dev

# This is the commit message trinodb#25:

Resolved issues 19 05 2022 and fixed tests

# This is the commit message trinodb#26:

Remove Types.VARBINARY

# This is the commit message trinodb#27:

Add private static SliceWriteFunction charWriteFunction

# This is the commit message trinodb#28:

Update test case

# This is the commit message trinodb#29:

Update plugin/trino-snowflake/src/main/java/io/trino/plugin/snowflake/SnowflakeClient.java

Co-authored-by: Yuya Ebihara <[email protected]>
# This is the commit message trinodb#30:

Update docs/src/main/sphinx/connector/snowflake.rst

Co-authored-by: Yuya Ebihara <[email protected]>
# This is the commit message trinodb#31:

Update plugin/trino-snowflake/pom.xml

Co-authored-by: Yuya Ebihara <[email protected]>
# This is the commit message trinodb#32:

Update plugin/trino-snowflake/src/main/java/io/trino/plugin/snowflake/SnowflakeClient.java

Co-authored-by: Yuya Ebihara <[email protected]>
# This is the commit message trinodb#33:

Resolved review open issues

# This is the commit message trinodb#34:

Disabled JDBC_TREAT_DECIMAL_AS_INT and fixed test case

# This is the commit message trinodb#35:

Updated properties file

# This is the commit message trinodb#36:

Updated properties files

# This is the commit message trinodb#37:

Renamed properties in Testing

# This is the commit message trinodb#38:

Revert "Renamed properties in Testing"

This reverts commit 82f9eb3f3811e8d90a482f5359e98e7c729afa17.

# This is the commit message trinodb#39:

Renamed properties and fixed tests

# This is the commit message trinodb#40:

Update the way to pass ENV values for production test

# This is the commit message trinodb#41:

Update trino version to 388

# This is the commit message trinodb#42:

Update Trino version 391

# This is the commit message trinodb#43:

Update trino version to 394

# This is the commit message trinodb#44:

Update to 395

# This is the commit message trinodb#45:

Update to 411

# This is the commit message trinodb#46:

Update and fix errors

# This is the commit message trinodb#47:

Build successfully with 411

# This is the commit message trinodb#48:

Adding Bloomberg Snowflake connector.

Fully tested with Trino 406. Untested with 410.

Fix version number problem with build.

Adding varchar type to mapping Snowflake.

Adding --add-opens to enable support for Apache Arrow via shared memory buffers.

Fixing some tests.

Fix TestSnowflakeConnectorTest.

TODO: testDataMappingSmokeTest: time and timestamp
testSetColumnTypes: time and timestamp

Fix type mapper

Fix testconnector

Remove unused argument from DeltaLakeMetastore.getTableLocation

Extract removeS3Directory into DeltaLakeTestUtils

Additionally, replace toUnmodifiableList with toImmutableList.

Extract method in HiveMetastoreBackedDeltaLakeMetastore

Don't install flush_metadata_cache procedure in Iceberg

The procedure was unusable because Iceberg connector always
disables caching metastore.

Flush transaction log cache in Delta flush_metadata_cache procedure

Co-Authored-By: Marius Grama <[email protected]>

Remove extra digest copy during Digest merge

Reduce number of TDigestHistogram allocations

On coordinator operators stats from all tasks
will be merged. It does make sense to perform
merging as bulk operation.

Tune JDBC fetch-size automatically based on column count

PostgreSQL, Redshift and Oracle connectors had hard-coded fetch-size
value of 1000. The value was found not to be optimal when server is far
(high latency) or when number of columns selected is low. This commit
improves in the latter case by picking fetch size automatically based on
number of columns projected. After the change, the fetch size will be
automatically picked in the range 1000 to 100,000.

Remove redundant LongDoubleState interface

Fix import of wrong Preconditions class

Test Iceberg cost-based plans with small files on TPC-DS

Test against unpartitioned small Parquet files

Upgrade Pinot libraries to 0.12.1

Simplify MaxDataSizeForStats and SumDataSizeForStats

Block#getEstimatedDataSizeForStats is well defined for null positions.
We can use this to replace NullableLongState with LongState.

Fix formatting and simplify condition in HiveMetadata

Skip listing Glue tables with invalid column types

Exclude all synthetic columns in applyProjection validation

DefaultJdbcMetadata#applyProjection already excludes the delete row id
from validation. Same is now done for the merge row id column as well.

Fail fast on unexpected case

Fail, instead of returning, on an impossible case that was supposed to
be handled earlier in a method.

Remove redundant accessor calls

Leverage information already had within the method.

Remove Iceberg, Delta $data system table

It was not intentional to expose a table's data as `a_table$data`
"system" table. This commit removes support for these tables.

Encapsulate table name class constructor

Encapsulate constructors of IcebergTableName and DeltaLakeTableName. The
classes are used primarily as utility classes. Constructor encapsulation
is preparation to convert them into proper utility classes.

Remove table name/type `from` parsing method

After recent changes it became used only in tests. This also converts
the IcebergTableName, DeltaLakeTableName into utility classes.

Add more mirrors

Future-proof against Ubuntu codename update

This approach works as long the following assumptions hold:
- the location of `/etc/os-release` file does NOT change
- the name of the UBUNTU_CODENAME environment variable does NOT change
- `eclipse-temurin` still uses Ubuntu as it's base

Bring Docker build.sh help message in-line with reality

Add retries in TestImpersonation

Decrease number of old JDBC drivers tested

Test parquet column mapping using name based and index based mapping

Change t_char in AbstractTestHiveFileFormats to unpartitioned column

Set thread name for DDL queries

Before the change, the DDL tasks where being executed with a thread name
of `dispatch-query-%d`.

Fix thread names in TestEventDrivenTaskSource

Remove unused constant in environment definition

Capture column names in LocalQueryRunner

Remove un-necessary projected call with assertThat in redshift connector

Introduce new methods projected & exceptColumns take string varargs

For better readability, replace projected(int... columns) with
projected(String... columnNamesToInclude) and introduce
exceptColumns(String... columnNamesToExclude) leveraging
MaterializedResult.getColumnNames

Access fields directly in DeltaLakeTableHandle.withProjectedColumns

This makes it consistent with other methods like that, in particular
with `DeltaLakeTableHandle.forOptimize`.

Provide schema name in exception when Delta table lacks metadata

Test Delta connector behavior for a corrupted table

Handle corrupted Delta Lake tables with explicit table handle

Previously, a corrupted table handle was represented with a
`DeltaLakeTableHandle` missing a `MetadataEntry`.  When drop support for
corrupted tables was implemented, a connector could have only one table
handle class.  This commit improves the distinction by introducing a
dedicated table handle class to represent corrupted tables. As a
necessity, this class is explicitly handled in multiple
`DeltaLakeMetadata` methods. This sets viable example to follow for
implementing drop support for corrupted Iceberg tables as a follow-up.

Note: `testGetInsertLayoutTableNotFound` test was removed, instead of
being updated, since `ConnectorMetadata.getInsertLayout` cannot be
reached for a corrupted table, as getting column list will fail earlier.

Remove unnecessary override method in phoenix metadata

Make queryModifier final

Refactor testRenameTableToLongTableName to remove Iceberg override

Refactor `testRenameTableToLongTableName` test so that Iceberg tests do
not have to override the test method.

Improve Iceberg test testCreateTableLikeForFormat code formatting

Improve retries in AbstractTestHiveViews

Previously we were retrying on "Execution Error, return code 2 from
org.apache.hadoop.hive.ql.exec.mr.MapRedTask". As CI shown, the return
code may vary, sometimes it is e.g. 1.

In the meantime we introduced broader retry patterns for Hive query
failures, so let's use these.

Fix TestIcebergInsert.testIcebergConcurrentInsert timeout on CI

The test seems dominated by `finishInsert` time. since
bf04a72 `finishInsert` is slower as we
commit twice (first committing data then statistics).

Reduce lock contention in SqlStage

Threads in the application were blocked on locks for a total of 4 h 19 min before this patch and 2 h 59 min after in
concurrent benchmark with 40 nodes and 64 queries in parallel.

Reduce lock contention in Query

Use keySet in execution-query-purger

Make setup of bucketed tables in HiveQueryRunner optional

Bucketed tables are unnecessary in many tests

Remove superfluous accumulator add function

 - Enhance the test case as well

Add test for partitioned by non-lowercase column in Delta

Fix failure when partition column contains uppercase in Iceberg

Remove duplicate getParquetType method from ParquetPageSourceFactory

Remove redundant boolean state from LongDecimalWithOverflowAndLongState

Reduce synchronization on PipelinedStageExecution

Removes synchronization from beginScheduling() and
transitionToSchedulingSplits() both of which only perform state machine
updates (if necessary) and do not require accessing synchronized state.

Add NullablePosition to SumDataSizeForStats

Avoids an extra null check as block.getEstimatedDataSizeForStats
will also check for null

Remove unused HiveHudiPartitionInfo.getTable method

Co-Authored-By: Will Zhang <[email protected]>

Support DELETE statement in Ignite connector

Add an example JDBC connector plugin

Fix typo

Split createDriverRunner method for partitioned and unpartitioned cases

Use checkArgument formatting in StatementUtils

Avoids an eager and unnecessary String.format call by letting
checkArgument perform the required formatting only when the check
fails.

Avoid String.format in ExpressionFormatter

Also replaces unnecessary usages of Guava's Joiner in favor of
Collectors.joining where appropriate.

Replace String.format with String concat

Replaces simple String.format usages in non-exceptional code paths
with simple string concatenations where applicable.

Fix bad parameter count in code generator when column uses two slots

Document hive.max-outstanding-splits-size property

Add missing groups to testMigrateHiveBucketedOnMultipleColumns

Remove unused updatedColumns from IcebergTableHandle

Remove OkHttp as a runtime dependency for the engine

Remove unused dependencies from discovery-server

These are not used in embedded mode.

Update to ASM 9.4

Update object storage definition in glossary

Improve size accounting of SingleLongDecimalWithOverflowState

Make classes in LongDecimalWithOverflowAndLongStateFactory private final

Improve size accounting of SingleLongDecimalWithOverflowAndLongState

Simplify DecimalAverageAggregation#inputShortDecimal

Remove unsed NullableBooleanState

Fix Kerberos ticket refresh

The Hadoop UGI class handles ticket refresh only if the Subject is not
provided externally. For external Subject UGI expects the refresh will
be handled by the creator of the Subject which in our case we did not
do.

Because of this before this change any Trino query which ran longer than
the ticket_lifetime failed with errors like

    GSS initiate failed [Caused by GSSException: No valid credentials
    provided (Mechanism level: Failed to find any Kerberos tgt)].

In Hadoop code the UGI instance also gets re-used in some places (e.g.
DFSClient) which means we cannot just create a new UGI with refreshed
credentials and return that since other parts of code will keep using
the old UGI with expired credentials. So the fix is to create a new UGI,
extract the credentials from it and update the existing UGI's
credentials with them so that all users of the existing UGI also observe
the new valid credentials.

Extend the list of status codes retried in client

This commit extends list of codes on which client will retry to:
 * HTTP_BAD_GATEWAY (502)
 * HTTP_UNAVAILABLE (503)
 * HTTP_GATEWAY_TIMEOUT(504)

Allow listening for single container events

Make environment listener always required

Used enhanced switch

Remove redundant local variable

Remove redundant throws

Use StringBuilder instead of string concatenation

Fix typo in hive parquet doc

Add test for trailing space in location in hive metadata

Remove unnecessary and brittle tests

Column names coming out of the query are not necessarily
related to the column names in the table function. These
tests are testing behavior that is not necessarily expected
or guaranteed, so they are brittle and can break at any time.

A couple of reasons why it's problematic:
* Trino doesn't (yet) follow standard SQL identifier semantics. The
  column names might change between the output of the table function
  and the query output
* At the query output all columns have names. Within the query they
  might not. A table function can produce an anonymous column, but
  the test will see "_col0".

Upgrade Confluent version to 7.3.1

Updates transitive dependencies for Avro and ZooKeeper.
Wire 4.x is required for Confluent 7.3.1 and is updated
in the modules that need it, but leaves Wire at 3.x for
the remaining modules.

Fix potential Kerberos failure with SymlinkTextInputFormat

Add benchmark for array filter object

Optimize filter function performance with copyPositions

Before the change:
Benchmark                             (name)  Mode  Cnt   Score   Error  Units
BenchmarkArrayFilter.benchmark        filter  avgt   20  22.543 ± 0.979  ns/op
BenchmarkArrayFilter.benchmarkObject  filter  avgt   20  42.045 ± 2.088  ns/op

After the change:
Benchmark                             (name)  Mode  Cnt   Score   Error  Units
BenchmarkArrayFilter.benchmark        filter  avgt   20  13.327 ± 0.359  ns/op
BenchmarkArrayFilter.benchmarkObject  filter  avgt   20  34.443 ± 1.943  ns/op

Add quantile_at_value function

Co-authored-by: Peizhen Guo <[email protected]>

Use parent partitioning for aggregations

If parent partitioning provides enough parallelism,
and is a subset of the current node preferred
partitioning (grouping keys for the aggregation) we
can use the parent partitioning to skip data shuffle
required by the parent.

Extract MappedPageSource and MappedRecordSet to toolkit

Introduce BaseJdbcConnectorTableHandle

Extract methods in BaseJdbcClient

These methods can be reused for Procedures PTF
- Extract building columns from ResultSetMetaData as a separate method.
- Extract creating connection based on session

Add table function to execute stored procedure in SQLServer

Use URI path for Glue location in tests

Glue started throwing "InvalidInputException: One or more inputs failed validation"
when getting a table if the table location doesn't have "file:" prefix
in case of local file system.

Test trino-main with JDK 20

Clarify comment in BigQuery ReadSessionCreator

Consistently handle table types across BigQuery connector

This also fixes a bug where createEmptyProjection failed for non-TABLE
and non-VIEW even though those could be supported.

Combine some redundant tests in BigQuery

Remove duplicate test case

Disable CSV quoting when quote character is zero

Disable CSV escaping when escape character is zero

Fix race condition in the hive table stats cache

putIfAbsent method is not implemented in the EvictableCache
because of race condition with invalidation so to avoid the race
condition we use AtomicReference that at some cases can be thrown
away, but it makes cached value fresh even if invalidation happens
during value load

Provide convenience overload to get MV storage table in test

Setup global state before test methods

`storageSchemaName` is defined on class level, so the storage schema
should be created in `@BeforeClass`, not within a test.

Allow Iceberg MV with partitioning transforms on timestamptz

Allow creation of Iceberg Materialized Views partitioned with a
temporal partitioning function on a `timestamp with time zone` column.

In MVs, the `timestamp with time zone` columns are generally stored as
text to preserve time zone information. However, this prevents use of
temporal partitioning functions on these columns. The commit keeps
`timestamp with time zone` columns with partitioning applied on them as
`timestamp with time zone` in the storage table.

An obvious downside to this approach is that the time zone information
is erased and it is not known whether this aligns with user intention or
not. A better solution would be to introduce a point-in-time type
(trinodb#2273) to discern between the
cases where time zone information is important (like Java's
`ZonedDateTime`) from cases where only point-in-time matters (like
Java's `Instant`).

Remove backticks from backtick-unrelated test cases

They were probably copied over from the preceding backtick test case.

Reuse TrackingFileSystemFactory between connectors

Move TrackingFileSystemFactory out of Iceberg tests to allow reuse e.g.
with Delta Lake tests.

Refactor Delta file operations tests to encapsulate checked code

Pair tested operation and expected filesystem access counts in a single
assertion calls. Similar to how it's done in
`TestIcebergMetadataFileOperations`.

Convert TestIcebergMetadataFileOperations helper to record

Add Trino 411 release notes

[maven-release-plugin] prepare release 411

[maven-release-plugin] prepare for next development iteration

Enhance test for managed and external delta table location validation

The purpose of deleting the transaction log directory is solely to confirm
that when the DROP TABLE command is used, the table location is also removed
when table is MANAGED TABLE.

Improve naming of methods and fields to match Trino concepts

Fix incorrect result when hidden directory exist in migrate procedure

Add support for ADD COLUMN in Ignite connector

Use a more specific name for all connectors smoke test suite

Document the all connectors smoke test suite

Verify pass-through specifications in tests

Add more detailed check for TableFunctionProcessorNode.passThroughSpecifications
in TableFunctionProcessorMatcher.

Prune unreferenced pass-through columns of table function

Verify required columns in tests

Add check for TableFunctionProcessorNode.requiredSymbols
in TableFunctionProcessorMatcher.

Verify hashSymbol in tests

Add check for TableFunctionProcessorNode.hashSymbol
in TableFunctionProcessorMatcher.

Prune unreferenced columns of table function source

Test table function column pruning in query plan

Remove table function with empty source

Adds an optimizer rule to remove TableFunctionProcessorNode with
source being an empty relation, based on the "prune when empty"
property.

Test pruning of redundant table function in query plan

Test table functions with column and node pruning optimizations

Fix typo in TestDeltaLakePerTransactionMetastoreCache

Extract assertMetastoreInvocations method

Use CountingAccessHiveMetastore in TestDeltaLakePerTransactionMetastoreCache

Make cleanup methods alwaysRun

This attribute says that this after-method will get executed even if the
methods executed previously failed or were skipped. Note that it also
applies to skipped test methods, so if the tests were skipped for some
reason, the cleanup won't run. This attribute will ensure that the
clean-up will run even in this case.

Failure to run clean up may cause secondary effects, especially our
resource leak detector; failure on this will in turn mask other errors,
the ones which caused the tests to be skipped in the first place.

Add a check to enforce alwaysRun = true on test after-methods

See the previous commit for details. This check will enforce that the
`alwaysRun = true` is present.

Remove redundant call toString()

Remove use of deprecated isEqualToComparingFieldByFieldRecursively

Use usingRecursiveComparison instead deprecated isEqualToComparingFieldByFieldRecursively

Remove unused helper methods in delta-lake connector

Add an explict config to define standardSplitSizeInBytes in FTE

Implement adaptive task sizing for arbitrary distribution in FTE

Improve task sizing for hash distribution in FTE

Round up targetPartitionSizeInBytes to a multiple of minTargetPartitionSizeInBytes

For adaptive task sizing in ArbitraryDistributionSplitAssigner

Adjust retry policy for dropping delta tables backed by AWS Glue on Databricks

Fix output rendering issue in docs

Alphabetize glossary entries

Add use_cost_based_partitioning

Use use_cost_based_partitioning instead of use_exact_partitioning to
control the cost based optimization to prefer parent partitioning.
The motivation is to be able to disable the optimization if the NDV
statistics are overestimated and the optimization would hurt parallelism.

Provide injection mechanism for the file system factory

Reorder instance and static fields in FlushMetadataCacheProcedure

Flush extended statistics in Delta's flush_metadata_cache()

Clean up Delta code a bit

Test Delta Lake query file system accesses

Ensure that TestingHydraIdentityProvider is releasing resources

Update maven to 3.9.1

Expose rule stats in QueryStats

Expose optimizer rule statistics per query in QueryInfo JSON. The number of rules exposed could be
adjusted using the `query.reported-rule-stats-limit` configuration parameter.

Cleanup BigintGroupByHash instanceof checks

Include QueryId in assertDistrubutedQuery failure message

Remove TestClickHouseConnectorTest

There are 4 smoke tests and 2 connectors test.
Remove TestClickHouseConnectorTest as a redundant test.

Remove base class for ClickHouse connector test

Run smoke test for BigQuery arrow serialization

We want to verify SELECT behavior for Arrow serialization in BigQuery.
Smoke test and the existing type mapping test should be enough.

Make construction parameters final

Support arithmetic predicate pushdown for Phoenix

Make LeafTableFunctionOperator handle EmptySplit correctly

Remove CatalogHandle from TableFunctionProcessorNode

Exclude snakeyaml from elasticsearch dependencies

It's a transitive dependency of elasticsearch-x-content,
which we use in ElasticsearchLoader to load tpch data
to Elasticsearch with json encoding. Yaml support is not
needed at all.

Pass partition values as optional to DeltaLakePageSource

The partition values list is filled only when the row ID column is
projected, so it is conditional information. When row ID is not
present, pass it as an empty optional, rather than a list that happens
to be empty.

Add cleaner FixedPageSource constructors

Previously, the only constructor took an `Iterable`, which is nice,
but it also iterated it twice (once in the constructor to
calculate memory usage, and again when producing the data).

The commit adds a constructor taking a `List` (so double iteration is not
a problem) and one taking `Iterator` and delivering on the promise to
iterate once.

The old constructor is kept but deprecated; apparently all usages
already use the new, list-based constructor.
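
As a generic illustration of this constructor pattern (not the actual FixedPageSource code, which operates on Trino `Page` objects; the class and method names here are made up):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Sketch: with a List, size accounting and iteration share one
// materialized copy; with an Iterator, the input is consumed exactly once.
public class FixedSourceSketch
{
    private final List<String> items;
    private final long retainedSize;

    public FixedSourceSketch(List<String> items) // no double iteration
    {
        this.items = items;
        this.retainedSize = items.stream().mapToLong(String::length).sum();
    }

    public FixedSourceSketch(Iterator<String> iterator) // consumed once
    {
        List<String> copy = new ArrayList<>();
        iterator.forEachRemaining(copy::add);
        this.items = copy;
        this.retainedSize = copy.stream().mapToLong(String::length).sum();
    }

    public long getRetainedSize()
    {
        return retainedSize;
    }

    public int getItemCount()
    {
        return items.size();
    }
}
```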

Project a data column in MinIO access test

Read a data column to ensure the data file gets read.
This increases the number of accesses to a file, because both the
footer and the data are read.

Accelerate Delta when reading partition columns only

Regenerate expected test plans with one-click

Traverse into JsonObject members in AST

Before this change, JsonObject members were not visited
in AstVisitor. As a result, aggregations or parameters
inside the members were not supported.

Traverse into JsonArray elements in AST

Before this change, JsonArray elements were not visited
in AstVisitor. As a result, aggregations or parameters
inside the elements were not supported.

Document avro.schema.literal property use for interpreting table data

Update Oracle JDBC driver version to 21.9.0.0

Document predicate pushdown support for string-type columns in SQL Server

Enable oracle.remarks-reporting.enabled in connector test

Remove unnecessary wrapping of IOException in TransactionLogTail

Translate `The specified key does not exist` to FileNotFoundException

Relax test assertion

It is possible for more than one task to fail due to injected failure.

Remove obsolete assertion

Lowercase bucketing and sort column names

In the metastore, the bucketing and sorting column names can differ
in case from the corresponding table column names.
This change ensures that, even when a table is delivered by the
metastore with such inconsistencies, Trino lowercases the bucketing and
sort column names so that they correspond to the data column names.
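
A minimal sketch of this normalization (a hypothetical helper, not the actual Trino code):

```java
import java.util.List;
import java.util.Locale;

// Sketch: bucketing and sort column names from the metastore are
// lowercased so they match the (lowercase) data column names.
public class BucketColumnNormalization
{
    public static List<String> lowercase(List<String> columnNames)
    {
        return columnNames.stream()
                .map(name -> name.toLowerCase(Locale.ENGLISH))
                .toList();
    }

    public static void main(String[] args)
    {
        System.out.println(lowercase(List.of("OrderKey", "custKEY"))); // [orderkey, custkey]
    }
}
```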

Add test for RenameColumnTask

Migrate assertStatement in TestSqlParser.testRenameColumn

Allow configuring a custom DNS resolver for the JDBC driver

Reorganize Hudi connector documentation

Migrate some assertExpression in TestSqlParser

Look for a non-Trino protocol without using X-User-*

Update ASM to 9.5

Reorganize Iceberg connector documentation

Reorganize Delta Lake connector documentation

Fix handling of Hive ACID tables with hidden directories

Test CREATE TABLE AS SELECT in Ignite type mapping

Additionally, check time zones in setUp method.

Override equals and hashCode in Delta Identifier

Change Object to Row in testCheckConstraintCompatibility

Support arithmetic binary in Delta check constraints

Introduce EmptyTableFunctionHandle as default handle

Before this change, if a table function did not pass
a ConnectorTableFunctionHandle in the TableFunctionAnalysis,
the default handle was used, which was an anonymous
implementation of ConnectorTableFunctionHandle.

It did not work with table functions executed by operator,
due to lack of serialization.

This change introduces EmptyTableFunctionHandle, and sets
it as default.

Support returning anonymous columns by table functions

Per SQL standard, all columns must be named. In Trino,
we support unnamed columns.
This change adjusts table functions so that they can return
anonymous columns.
It is achieved by changing the Descriptor structure so that
field name is optional. This optionality can only be used
for the returned type of table functions. Descriptor arguments
passed by the user, as well as default values for descriptor
arguments, have mandatory field names.
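
The optional-name idea can be sketched as follows (a simplified stand-in for Trino's Descriptor structure, with types reduced to strings):

```java
import java.util.List;
import java.util.Optional;

// Sketch: a descriptor field whose name is optional, so table functions
// can return anonymous columns as described above.
public class DescriptorSketch
{
    public record Field(Optional<String> name, String type) {}

    public static void main(String[] args)
    {
        // Returned type of a table function: an anonymous column is allowed
        List<Field> returnedType = List.of(
                new Field(Optional.of("id"), "bigint"),
                new Field(Optional.empty(), "varchar")); // anonymous column

        System.out.println(returnedType.get(1).name().isEmpty()); // true
    }
}
```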

Add table function `exclude_columns`

Bump spotbugs-annotations version

Airbase already has `4.7.3`

Remove ValidateLimitWithPresortedInput

It's not powerful enough to validate properties of plans
that get modified by predicate pushdown after AddExchanges
runs, resulting in false positives such as trinodb#16768

Use OrcReader#MAX_BATCH_SIZE = 8 * 1024

The previous value of 8196 was bigger than PageProcessor#MAX_BATCH_SIZE,
causing PageProcessor to create small, 4-position pages every other page.

Bring DistinguishedNameParser from okhttp3

Inline OkHostnameVerifier and Util.verifyAsIpAddress

They were removed in OkHttp 4.x and we still rely on the legacy SSL hostname verification

Update okhttp to 4.10.0

Fix testTableWithNonNullableColumns to update NOT NULL column in Delta

Add support for creating tables with comments for more JDBC-based connectors (trinodb#16135)

Allow PostgreSQL create table with comment

Also allow setting a comment for PostgreSQL tables

Support sum(distinct) for jdbc connectors

Add Trino 412 release notes

[maven-release-plugin] prepare release 412

[maven-release-plugin] prepare for next development iteration

Add doc for ignite join pushdown

Add missing config properties to Hive docs

Co-Authored-By: Marius Grama <[email protected]>

Fix layout in Iceberg documentation

Add docs for property to skip glue archive

Fix typo

Support nested timestamp with time zone in Delta Lake

Add test for duplicated partition statistics on Thrift metastore

Support table comment for oracle connector

Allow Oracle create table with comment

Also allow setting a table comment for Oracle tables

Support MERGE for Phoenix connector

Remove unused class FallbackToFullNodePartitionMemoryEstimator.

Remove unnecessary dependency management in trino-pinot

The `protobuf-java` one was overriding a corresponding declaration in
the parent POM, and was effectively downgrading it. The other two were
not used at all.

Bump Protobuf version

Add Snowflake JDBC Connector

Update trino snapshot version to 372

Update format of the doc of snowflake

Update trino jdbc library import

Fix date formatter from yyyy to uuuu

Fix date test case

Remove defunct property allow-drop-table

Update trino version to 374

Update snowflake config to adapt 374

Update the range of the test of Date type

Update to version 375

Fix snowflake after updating to 375

Update to 381

Fix mvn pom import

Format snowflake.rst

Reordered Data tests in type mapping

Update function code

Add product test

Rename product test tablename

Add Env, Suite and properties of Snowflake for production test

Add trinoCreateAndInsert()

Refactor snowflake from single node to multi node

Pass product tests

Removed snowflake.properties in trino server dev

Resolved issues 19 05 2022 and fixed tests

Remove Types.VARBINARY

Add private static SliceWriteFunction charWriteFunction

Update test case

Update plugin/trino-snowflake/src/main/java/io/trino/plugin/snowflake/SnowflakeClient.java

Co-authored-by: Yuya Ebihara <[email protected]>

Update docs/src/main/sphinx/connector/snowflake.rst

Co-authored-by: Yuya Ebihara <[email protected]>

Update plugin/trino-snowflake/pom.xml

Co-authored-by: Yuya Ebihara <[email protected]>

Update plugin/trino-snowflake/src/main/java/io/trino/plugin/snowflake/SnowflakeClient.java

Co-authored-by: Yuya Ebihara <[email protected]>

Resolved review open issues

Disabled JDBC_TREAT_DECIMAL_AS_INT and fixed test case

Updated properties file

Updated properties files

Renamed properties in Testing

Revert "Renamed properties in Testing"

This reverts commit 82f9eb3f3811e8d90a482f5359e98e7c729afa17.

Renamed properties and fixed tests

Update the way to pass ENV values for production test

Update trino version to 388

Update Trino version 391

Update trino version to 394

Update to 395

Update to 411

Update and fix errors

Build successfully with 411

Adding Bloomberg Snowflake connector.

Fully tested with Trino 406. Untested with 410.

Adding varchar type to mapping Snowflake.

Adding --add-opens to enable support for Apache Arrow via shared memory buffers.

Fixing some tests.

Fix TestSnowflakeConnectorTest.

TODO: testDataMappingSmokeTest: time and timestamp
testSetColumnTypes: time and timestamp

Fix type mapper

Fix testconnector

Update version to 413-SNAPSHOT

Added support for HTTP_PROXY testing.

Connector doesn't support setColumnType. This causes lots of problems with the Snowflake server.

Disabled the testsetColumnTypes testing.

Fixed and skipped error tests
rice668 added a commit to rice668/trino that referenced this issue Jul 24, 2023
yuuteng added a commit to yuuteng/trino that referenced this issue Dec 4, 2023

Remove unused argument from DeltaLakeMetastore.getTableLocation

Extract removeS3Directory into DeltaLakeTestUtils

Additionally, replace toUnmodifiableList with toImmutableList.

Extract method in HiveMetastoreBackedDeltaLakeMetastore

Don't install flush_metadata_cache procedure in Iceberg

The procedure was unusable because Iceberg connector always
disables caching metastore.

Flush transaction log cache in Delta flush_metadata_cache procedure

Co-Authored-By: Marius Grama <[email protected]>

Remove extra digest copy during Digest merge

Reduce number of TDigestHistogram allocations

On the coordinator, operator stats from all tasks
are merged. It makes sense to perform the
merging as a bulk operation.

Tune JDBC fetch-size automatically based on column count

PostgreSQL, Redshift and Oracle connectors had a hard-coded fetch-size
value of 1000. The value was found not to be optimal when the server is
far away (high latency) or when the number of columns selected is low.
This commit improves the latter case by picking the fetch size
automatically based on the number of columns projected. After the
change, the fetch size is picked automatically in the range 1000 to
100,000.
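
A minimal sketch of such a heuristic; the exact formula used by the connectors may differ, and the class and method names here are hypothetical:

```java
// Hypothetical sketch: pick a JDBC fetch size from the projected column
// count, clamped to the 1000..100,000 range mentioned above.
public class FetchSizeHeuristic
{
    private static final int MIN_FETCH_SIZE = 1_000;
    private static final int MAX_FETCH_SIZE = 100_000;

    public static int fetchSizeForColumnCount(int columnCount)
    {
        // Fewer columns means cheaper rows, so a larger fetch size is used
        int size = MAX_FETCH_SIZE / Math.max(columnCount, 1);
        return Math.max(MIN_FETCH_SIZE, Math.min(MAX_FETCH_SIZE, size));
    }

    public static void main(String[] args)
    {
        System.out.println(fetchSizeForColumnCount(1));   // few columns: 100000
        System.out.println(fetchSizeForColumnCount(500)); // many columns, clamped: 1000
    }
}
```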

Remove redundant LongDoubleState interface

Fix import of wrong Preconditions class

Test Iceberg cost-based plans with small files on TPC-DS

Test against unpartitioned small Parquet files

Upgrade Pinot libraries to 0.12.1

Simplify MaxDataSizeForStats and SumDataSizeForStats

Block#getEstimatedDataSizeForStats is well defined for null positions.
We can use this to replace NullableLongState with LongState.

Fix formatting and simplify condition in HiveMetadata

Skip listing Glue tables with invalid column types

Exclude all synthetic columns in applyProjection validation

DefaultJdbcMetadata#applyProjection already excludes the delete row id
from validation. Same is now done for the merge row id column as well.

Fail fast on unexpected case

Fail, instead of returning, on an impossible case that was supposed to
be handled earlier in a method.

Remove redundant accessor calls

Leverage information already had within the method.

Remove Iceberg, Delta $data system table

It was not intentional to expose a table's data as `a_table$data`
"system" table. This commit removes support for these tables.

Encapsulate table name class constructor

Encapsulate constructors of IcebergTableName and DeltaLakeTableName. The
classes are used primarily as utility classes. Constructor encapsulation
is preparation to convert them into proper utility classes.

Remove table name/type `from` parsing method

After recent changes it is used only in tests. This also converts
the IcebergTableName and DeltaLakeTableName into utility classes.

Add more mirrors

Future-proof against Ubuntu codename update

This approach works as long as the following assumptions hold:
- the location of `/etc/os-release` file does NOT change
- the name of the UBUNTU_CODENAME environment variable does NOT change
- `eclipse-temurin` still uses Ubuntu as its base

Bring Docker build.sh help message in-line with reality

Add retries in TestImpersonation

Decrease number of old JDBC drivers tested

Test parquet column mapping using name based and index based mapping

Change t_char in AbstractTestHiveFileFormats to unpartitioned column

Set thread name for DDL queries

Before the change, the DDL tasks were being executed with a thread name
of `dispatch-query-%d`.

Fix thread names in TestEventDrivenTaskSource

Remove unused constant in environment definition

Capture column names in LocalQueryRunner

Remove unnecessary projected call with assertThat in Redshift connector

Introduce new methods projected & exceptColumns take string varargs

For better readability, replace projected(int... columns) with
projected(String... columnNamesToInclude) and introduce
exceptColumns(String... columnNamesToExclude) leveraging
MaterializedResult.getColumnNames

Access fields directly in DeltaLakeTableHandle.withProjectedColumns

This makes it consistent with other methods like that, in particular
with `DeltaLakeTableHandle.forOptimize`.

Provide schema name in exception when Delta table lacks metadata

Test Delta connector behavior for a corrupted table

Handle corrupted Delta Lake tables with explicit table handle

Previously, a corrupted table handle was represented with a
`DeltaLakeTableHandle` missing a `MetadataEntry`. When drop support for
corrupted tables was implemented, a connector could have only one table
handle class. This commit improves the distinction by introducing a
dedicated table handle class to represent corrupted tables. As a
necessity, this class is explicitly handled in multiple
`DeltaLakeMetadata` methods. This sets a viable example to follow for
implementing drop support for corrupted Iceberg tables as a follow-up.

Note: `testGetInsertLayoutTableNotFound` test was removed, instead of
being updated, since `ConnectorMetadata.getInsertLayout` cannot be
reached for a corrupted table, as getting column list will fail earlier.

Remove unnecessary override method in phoenix metadata

Make queryModifier final

Refactor testRenameTableToLongTableName to remove Iceberg override

Refactor `testRenameTableToLongTableName` test so that Iceberg tests do
not have to override the test method.

Improve Iceberg test testCreateTableLikeForFormat code formatting

Improve retries in AbstractTestHiveViews

Previously we were retrying on "Execution Error, return code 2 from
org.apache.hadoop.hive.ql.exec.mr.MapRedTask". As CI has shown, the
return code may vary; sometimes it is e.g. 1.

In the meantime we introduced broader retry patterns for Hive query
failures, so let's use these.

Fix TestIcebergInsert.testIcebergConcurrentInsert timeout on CI

The test seems dominated by `finishInsert` time. Since
bf04a72, `finishInsert` is slower as we
commit twice (first committing data, then statistics).

Reduce lock contention in SqlStage

Threads in the application were blocked on locks for a total of 4 h 19 min before this patch and 2 h 59 min after, in a
concurrent benchmark with 40 nodes and 64 queries in parallel.

Reduce lock contention in Query

Use keySet in execution-query-purger

Make setup of bucketed tables in HiveQueryRunner optional

Bucketed tables are unnecessary in many tests

Remove superfluous accumulator add function

 - Also enhance the test case

Add test for partitioned by non-lowercase column in Delta

Fix failure when partition column contains uppercase in Iceberg

Remove duplicate getParquetType method from ParquetPageSourceFactory

Remove redundant boolean state from LongDecimalWithOverflowAndLongState

Reduce synchronization on PipelinedStageExecution

Removes synchronization from beginScheduling() and
transitionToSchedulingSplits() both of which only perform state machine
updates (if necessary) and do not require accessing synchronized state.

Add NullablePosition to SumDataSizeForStats

Avoids an extra null check as block.getEstimatedDataSizeForStats
will also check for null

Remove unused HiveHudiPartitionInfo.getTable method

Co-Authored-By: Will Zhang <[email protected]>

Support DELETE statement in Ignite connector

Add an example JDBC connector plugin

Fix typo

Split createDriverRunner method for partitioned and unpartitioned cases

Use checkArgument formatting in StatementUtils

Avoids an eager and unnecessary String.format call by letting
checkArgument perform the required formatting only when the check
fails.
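
The behavior can be illustrated with a tiny stand-in (Guava's real `checkArgument` uses its own lenient `%s` substitution rather than `String.format`; this sketch only demonstrates the lazy-formatting idea):

```java
// Stand-in for Guava's checkArgument: the message is formatted only when
// the check fails, unlike an eager String.format at the call site.
public class LazyCheck
{
    public static void checkArgument(boolean condition, String template, Object... args)
    {
        if (!condition) {
            throw new IllegalArgumentException(String.format(template, args));
        }
    }

    public static void main(String[] args)
    {
        checkArgument(1 + 1 == 2, "unexpected sum: %s", 3); // no formatting happens
        try {
            checkArgument(false, "bad value: %s", 42);
        }
        catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // bad value: 42
        }
    }
}
```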

Avoid String.format in ExpressionFormatter

Also replaces unnecessary usages of Guava's Joiner in favor of
Collectors.joining where appropriate.

Replace String.format with String concat

Replaces simple String.format usages in non-exceptional code paths
with simple string concatenations where applicable.
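
A representative before/after of this pattern (a hypothetical example, not a specific call site from the change):

```java
// Example of the replacement: a simple String.format call on a
// non-exceptional code path rewritten as plain concatenation.
public class FormatToConcat
{
    static String before(String catalog, String schema)
    {
        return String.format("%s.%s", catalog, schema);
    }

    static String after(String catalog, String schema)
    {
        return catalog + "." + schema;
    }

    public static void main(String[] args)
    {
        System.out.println(before("hive", "web").equals(after("hive", "web"))); // true
    }
}
```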

Fix bad parameter count in code generator when column uses two slots

Document hive.max-outstanding-splits-size property

Add missing groups to testMigrateHiveBucketedOnMultipleColumns

Remove unused updatedColumns from IcebergTableHandle

Remove OkHttp as a runtime dependency for the engine

Remove unused dependencies from discovery-server

These are not used in embedded mode.

Update to ASM 9.4

Update object storage definition in glossary

Improve size accounting of SingleLongDecimalWithOverflowState

Make classes in LongDecimalWithOverflowAndLongStateFactory private final

Improve size accounting of SingleLongDecimalWithOverflowAndLongState

Simplify DecimalAverageAggregation#inputShortDecimal

Remove unused NullableBooleanState

Fix Kerberos ticket refresh

The Hadoop UGI class handles ticket refresh only if the Subject is not
provided externally. For external Subject UGI expects the refresh will
be handled by the creator of the Subject which in our case we did not
do.

Because of this, before this change, any Trino query which ran longer
than the ticket_lifetime failed with errors like

    GSS initiate failed [Caused by GSSException: No valid credentials
    provided (Mechanism level: Failed to find any Kerberos tgt)].

In Hadoop code the UGI instance also gets re-used in some places (e.g.
DFSClient) which means we cannot just create a new UGI with refreshed
credentials and return that since other parts of code will keep using
the old UGI with expired credentials. So the fix is to create a new UGI,
extract the credentials from it and update the existing UGI's
credentials with them so that all users of the existing UGI also observe
the new valid credentials.

Extend the list of status codes retried in client

This commit extends the list of status codes on which the client will retry to:
 * HTTP_BAD_GATEWAY (502)
 * HTTP_UNAVAILABLE (503)
 * HTTP_GATEWAY_TIMEOUT (504)
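
The retry predicate can be sketched as follows (names are illustrative, not the actual client code):

```java
import java.util.Set;

// Sketch: retry on the gateway/availability status codes listed above.
public class RetryableStatus
{
    private static final Set<Integer> RETRYABLE_CODES = Set.of(
            502, // HTTP_BAD_GATEWAY
            503, // HTTP_UNAVAILABLE
            504  // HTTP_GATEWAY_TIMEOUT
    );

    public static boolean shouldRetry(int statusCode)
    {
        return RETRYABLE_CODES.contains(statusCode);
    }

    public static void main(String[] args)
    {
        System.out.println(shouldRetry(503)); // true
        System.out.println(shouldRetry(404)); // false
    }
}
```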

Allow listening for single container events

Make environment listener always required

Use enhanced switch

Remove redundant local variable

Remove redundant throws

Use StringBuilder instead of string concatenation

Fix typo in hive parquet doc

Add test for trailing space in location in hive metadata

Remove unnecessary and brittle tests

Column names coming out of the query are not necessarily
related to the column names in the table function. These
tests are testing behavior that is not necessarily expected
or guaranteed, so they are brittle and can break at any time.

A couple of reasons why it's problematic:
* Trino doesn't (yet) follow standard SQL identifier semantics. The
  column names might change between the output of the table function
  and the query output
* At the query output all columns have names. Within the query they
  might not. A table function can produce an anonymous column, but
  the test will see "_col0".

Upgrade Confluent version to 7.3.1

Updates transitive dependencies for Avro and ZooKeeper.
Wire 4.x is required for Confluent 7.3.1 and is updated
in the modules that need it, while Wire is left at 3.x for
the remaining modules.

Fix potential Kerberos failure with SymlinkTextInputFormat

Add benchmark for array filter object

Optimize filter function performance with copyPositions

Before the change:
Benchmark                             (name)  Mode  Cnt   Score   Error  Units
BenchmarkArrayFilter.benchmark        filter  avgt   20  22.543 ± 0.979  ns/op
BenchmarkArrayFilter.benchmarkObject  filter  avgt   20  42.045 ± 2.088  ns/op

After the change:
Benchmark                             (name)  Mode  Cnt   Score   Error  Units
BenchmarkArrayFilter.benchmark        filter  avgt   20  13.327 ± 0.359  ns/op
BenchmarkArrayFilter.benchmarkObject  filter  avgt   20  34.443 ± 1.943  ns/op

Add quantile_at_value function

Co-authored-by: Peizhen Guo <[email protected]>

Use parent partitioning for aggregations

If the parent partitioning provides enough parallelism,
and is a subset of the current node's preferred
partitioning (the grouping keys of the aggregation), we
can use the parent partitioning to skip the data shuffle
required by the parent.

Extract MappedPageSource and MappedRecordSet to toolkit

Introduce BaseJdbcConnectorTableHandle

Extract methods in BaseJdbcClient

These methods can be reused for the Procedures PTF:
- Extract building columns from ResultSetMetaData into a separate method.
- Extract creating a connection based on the session.

Add table function to execute stored procedure in SQLServer

Use URI path for Glue location in tests

Glue started throwing "InvalidInputException: One or more inputs failed validation"
when getting a table if the table location doesn't have a "file:" prefix
in the case of a local file system.

Test trino-main with JDK 20

Clarify comment in BigQuery ReadSessionCreator

Consistently handle table types across BigQuery connector

This also fixes a bug where createEmptyProjection failed for non-TABLE
and non-VIEW even though those could be supported.

Combine some redundant tests in BigQuery

Remove duplicate test case

Disable CSV quoting when quote character is zero

Disable CSV escaping when escape character is zero

Fix race condition in the hive table stats cache

The putIfAbsent method is not implemented in EvictableCache
because of a race condition with invalidation. To avoid that race
condition we use an AtomicReference that in some cases can be thrown
away, but it keeps the cached value fresh even if an invalidation
happens during the value load.
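
The idea can be sketched as follows (an illustrative stand-in, not the actual EvictableCache code; the sketch's backing map does offer putIfAbsent, the point is only that the loaded value is published through an AtomicReference which may itself be discarded):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Function;

// Sketch: the freshly loaded value is published via an AtomicReference,
// so even if an invalidation races with the load, the caller still
// observes the value it just loaded.
public class StatsCacheSketch<K, V>
{
    private final Map<K, AtomicReference<V>> cache = new ConcurrentHashMap<>();

    public V get(K key, Function<K, V> loader)
    {
        AtomicReference<V> ref = new AtomicReference<>();
        AtomicReference<V> existing = cache.putIfAbsent(key, ref);
        if (existing != null) {
            ref = existing; // our reference is thrown away, as noted above
        }
        V value = ref.get();
        if (value == null) {
            value = loader.apply(key); // load outside any cache lock
            ref.set(value);
        }
        return value;
    }

    public void invalidate(K key)
    {
        cache.remove(key);
    }
}
```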

Provide convenience overload to get MV storage table in test

Setup global state before test methods

`storageSchemaName` is defined at class level, so the storage schema
should be created in `@BeforeClass`, not within a test.

Allow Iceberg MV with partitioning transforms on timestamptz

Allow creation of Iceberg Materialized Views partitioned with a
temporal partitioning function on a `timestamp with time zone` column.

In MVs, the `timestamp with time zone` columns are generally stored as
text to preserve time zone information. However, this prevents use of
temporal partitioning functions on these columns. The commit keeps
`timestamp with time zone` columns with partitioning applied on them as
`timestamp with time zone` in the storage table.

An obvious downside to this approach is that the time zone information
is erased and it is not known whether this aligns with user intention or
not. A better solution would be to introduce a point-in-time type
(trinodb#2273) to discern between the
cases where time zone information is important (like Java's
`ZonedDateTime`) from cases where only point-in-time matters (like
Java's `Instant`).

Remove backticks from backtick-unrelated test cases

They were probably copied over from the preceding backtick test case.

Reuse TrackingFileSystemFactory between connectors

Move TrackingFileSystemFactory out of Iceberg tests to allow reuse e.g.
with Delta Lake tests.

Refactor Delta file operations tests to encapsulate checked code

Pair the tested operation and the expected filesystem access counts in
single assertion calls, similar to how it's done in
`TestIcebergMetadataFileOperations`.

Convert TestIcebergMetadataFileOperations helper to record

Add Trino 411 release notes

[maven-release-plugin] prepare release 411

[maven-release-plugin] prepare for next development iteration

Enhance test for managed and external delta table location validation

The purpose of deleting the transaction log directory is solely to confirm
that, when the DROP TABLE command is used, the table location is also removed
when the table is a MANAGED TABLE.

Improve naming of methods and fields to match Trino concepts

Fix incorrect result when hidden directory exist in migrate procedure

Add support for ADD COLUMN in Ignite connector

Use a more specific name for all connectors smoke test suite

Document the all connectors smoke test suite

Verify pass-through specifications in tests

Add more detailed check for TableFunctionProcessorNode.passThroughSpecifications
in TableFunctionProcessorMatcher.

Prune unreferenced pass-through columns of table function

Verify required columns in tests

Add check for TableFunctionProcessorNode.requiredSymbols
in TableFunctionProcessorMatcher.

Verify hashSymbol in tests

Add check for TableFunctionProcessorNode.hashSymbol
in TableFunctionProcessorMatcher.

Prune unreferenced columns of table function source

Test table function column pruning in query plan

Remove table function with empty source

Adds an optimizer rule to remove TableFunctionProcessorNode with
source being an empty relation, based on the "prune when empty"
property.

Test pruning of redundant table function in query plan

Test table functions with column and node pruning optimizations

Fix typo in TestDeltaLakePerTransactionMetastoreCache

Extract assertMetastoreInvocations method

Use CountingAccessHiveMetastore in TestDeltaLakePerTransactionMetastoreCache

Make cleanup methods alwaysRun

This attribute says that this after-method will get executed even if the
methods executed previously failed or were skipped. Note that it also
applies to skipped test methods, so if the tests were skipped for some
reason, the cleanup won't run. This attribute will ensure that the
clean-up will run even in this case.

Failure to run clean-up may cause secondary effects, especially in our
resource leak detector; a failure there will in turn mask other errors,
the ones which caused the tests to be skipped in the first place.

Add a check to enforce alwaysRun = true on test after-methods

See the previous commit for details. This check will enforce that the
`alwaysRun = true` is present.

Remove redundant call toString()

Remove use of deprecated isEqualToComparingFieldByFieldRecursively

Use usingRecursiveComparison instead deprecated isEqualToComparingFieldByFieldRecursively

Remove unused helper methods in delta-lake connector

Add an explict config to define standardSplitSizeInBytes in FTE

Implement adaptive task sizing for arbitrary distribution in FTE

Improve task sizing for hash distribution in FTE

Round up targetPartitionSizeInBytes to a multiple of minTargetPartitionSizeInBytes

For adaptive task sizing in ArbitraryDistributionSplitAssigner

Adjust retry policy for dropping delta tables backed by AWS Glue on Databricks

Fix output rendering issue in docs

Alphabetize glossary entries

Add use_cost_based_partitioning

Use use_cost_based_partitioning instead of use_exact_partitioning to
control the cost based optimization to prefer parent partitioning.
The motivation is to be able to disable the optimization if the NDV
statistics are overestimated and the optimization would hurt parallelism.

Provide injection mechanism for the file system factory

Reorder instance and static fields in FlushMetadataCacheProcedure

Flush extended statistics in Delta's flush_metadata_cache()

Clean up Delta code a bit

Test Delta Lake query file system accesses

Ensure that TestingHydraIdentityProvider is releasing resources

Update maven to 3.9.1

Expose rule stats in QueryStats

Expose optimizer rule statistics per query in the QueryInfo JSON. The
number of rules exposed can be adjusted using the
`query.reported-rule-stats-limit` configuration property.

Cleanup BigintGroupByHash instanceof checks

Include QueryId in assertDistributedQuery failure message

Remove TestClickHouseConnectorTest

There are 4 smoke tests and 2 connector tests.
Remove TestClickHouseConnectorTest as a redundant test.

Remove base class for ClickHouse connector test

Run smoke test for BigQuery arrow serialization

We want to verify SELECT behavior for Arrow serialization in BigQuery.
Smoke test and the existing type mapping test should be enough.

Make construction parameters final

Support arithmetic predicate pushdown for Phoenix

Make LeafTableFunctionOperator handle EmptySplit correctly

Remove CatalogHandle from TableFunctionProcessorNode

Exclude snakeyaml from elasticsearch dependencies

It's a transitive dependency of elasticsearch-x-content,
which we use in ElasticsearchLoader to load tpch data
to Elasticsearch with json encoding. Yaml support is not
needed at all.
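The exclusion can be sketched in the pom as follows; the coordinates shown are the commonly published ones for these artifacts, and the actual dependency declaration in the module may differ:

```xml
<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch-x-content</artifactId>
    <exclusions>
        <exclusion>
            <groupId>org.yaml</groupId>
            <artifactId>snakeyaml</artifactId>
        </exclusion>
    </exclusions>
</dependency>
```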

Pass partition values as optional to DeltaLakePageSource

The partition values list is filled only when the row ID column is
projected, so it is conditional information. When the row ID is not
present, pass it as an empty optional rather than a list that happens
to be empty.
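A minimal sketch of the idea, with illustrative names rather than the actual DeltaLakePageSource signature:

```java
import java.util.List;
import java.util.Optional;

// Partition values are only meaningful when the row ID column is projected,
// so they are passed as Optional.empty() otherwise, instead of an empty list.
class PartitionValuesExample
{
    static Optional<List<String>> partitionValues(boolean rowIdProjected, List<String> values)
    {
        return rowIdProjected ? Optional.of(values) : Optional.empty();
    }

    public static void main(String[] args)
    {
        System.out.println(partitionValues(false, List.of())); // Optional.empty
        System.out.println(partitionValues(true, List.of("2024-01-01")));
    }
}
```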

Add cleaner FixedPageSource constructors

Previously, the only constructor took an `Iterable`, which is nice,
but it would also materialize it twice: once in the constructor to
calculate memory usage, and again when the pages were read.

The commit adds a constructor taking a `List` (so double iteration is
not a problem) and one taking an `Iterator`, delivering on the promise
to iterate once.

The old constructor is kept as deprecated, but apparently all usages
already use the new, list-based constructor.
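The double-iteration problem can be illustrated with a toy Iterable; the names are invented, this is not the FixedPageSource code:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Shows why an Iterable-based constructor iterates its source twice: once for
// memory accounting and once to serve the pages. A List can be sized cheaply,
// and an Iterator is consumed exactly once.
class PageSourceSketch
{
    static int countIterations()
    {
        AtomicInteger iterations = new AtomicInteger();
        Iterable<Integer> pages = () -> {
            iterations.incrementAndGet(); // counts each fresh iteration
            return List.of(1, 2, 3).iterator();
        };

        long memoryUsage = 0;
        for (int page : pages) { // first pass: compute memory usage
            memoryUsage += page;
        }
        for (int page : pages) { // second pass: actually read the pages
            // consume page
        }
        return iterations.get();
    }

    public static void main(String[] args)
    {
        System.out.println(countIterations()); // prints 2
    }
}
```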

Project a data column in MinIO access test

Read a data column to ensure the data file gets read.
This increases the number of accesses to the file, because both the
footer and the data are read.

Accelerate Delta when reading partition columns only

Regenerate expected test plans with one-click

Traverse into JsonObject members in AST

Before this change, JsonObject members were not visited
in AstVisitor. As a result, aggregations or parameters
inside the members were not supported.

Traverse into JsonArray elements in AST

Before this change, JsonArray elements were not visited
in AstVisitor. As a result, aggregations or parameters
inside the elements were not supported.
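Both fixes boil down to making the visitor traverse into a node's children; a toy sketch with invented node types, not Trino's AST classes:

```java
import java.util.List;

// If the visitor does not recurse into JsonObject members (or JsonArray
// elements), nested nodes such as parameters are never seen at all.
class VisitorSketch
{
    interface Node { }
    record Parameter() implements Node { }
    record JsonObjectNode(List<Node> members) implements Node { }

    static int countParameters(Node node)
    {
        if (node instanceof Parameter) {
            return 1;
        }
        if (node instanceof JsonObjectNode object) {
            // the fix: recurse into the members
            return object.members().stream().mapToInt(VisitorSketch::countParameters).sum();
        }
        return 0;
    }

    public static void main(String[] args)
    {
        Node ast = new JsonObjectNode(List.of(new Parameter(), new Parameter()));
        System.out.println(countParameters(ast)); // prints 2
    }
}
```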

Document avro.schema.literal property use for interpreting table data

Update Oracle JDBC driver version to 21.9.0.0

Document predicate pushdown support for string-type columns in SQL Server

Enable oracle.remarks-reporting.enabled in connector test

Remove unnecessary wrapping of IOException in TransactionLogTail

Translate `The specified key does not exist` to FileNotFoundException

Relax test assertion

It is possible for more than one task to fail due to injected failure.

Remove obsolete assertion

Lowercase bucketing and sort column names

In the metastore, the bucketing and sort column names can differ in
case from the corresponding table column names.
This change makes certain that, even when the metastore delivers a
table with such inconsistencies, Trino lowercases the bucketing and
sort column names so that they correspond to the data column names.
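A minimal sketch of the normalization, with an illustrative method name:

```java
import java.util.List;
import java.util.Locale;

// Lowercase metastore-supplied bucketing/sort column names so they match the
// (lowercase) table column names. Not the actual Trino metastore code.
class LowercaseColumns
{
    static List<String> normalize(List<String> bucketingColumns)
    {
        return bucketingColumns.stream()
                .map(name -> name.toLowerCase(Locale.ENGLISH))
                .toList();
    }

    public static void main(String[] args)
    {
        System.out.println(normalize(List.of("OrderKey", "CUSTKEY"))); // [orderkey, custkey]
    }
}
```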

Add test for RenameColumnTask

Migrate assertStatement in TestSqlParser.testRenameColumn

Allow configuring a custom DNS resolver for the JDBC driver

Reorganize Hudi connector documentation

Migrate some assertExpression in TestSqlParser

Look for a non-Trino protocol without using X-User-*

Update ASM to 9.5

Reorganize Iceberg connector documentation

Reorganize Delta Lake connector documentation

Fix handling of Hive ACID tables with hidden directories

Test CREATE TABLE AS SELECT in Ignite type mapping

Additionally, check time zones in the setUp method.

Override equals and hashCode in Delta Identifier

Change Object to Row in testCheckConstraintCompatibility

Support arithmetic binary in Delta check constraints

Introduce EmptyTableFunctionHandle as default handle

Before this change, if a table function did not pass
a ConnectorTableFunctionHandle in the TableFunctionAnalysis,
the default handle was used, which was an anonymous
implementation of ConnectorTableFunctionHandle.

It did not work with table functions executed by an operator,
due to the lack of serialization.

This change introduces EmptyTableFunctionHandle, and sets
it as default.

Support returning anonymous columns by table functions

Per the SQL standard, all columns must be named. In Trino,
we support unnamed columns.
This change adjusts table functions so that they can return
anonymous columns.
It is achieved by changing the Descriptor structure so that
the field name is optional. This optionality can only be used
for the returned type of table functions. Descriptor arguments
passed by the user, as well as default values for descriptor
arguments, have mandatory field names.
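The optional field name can be sketched like this; the types are illustrative, not the actual Trino Descriptor:

```java
import java.util.List;
import java.util.Optional;

// A descriptor field whose name is optional allows table functions to return
// anonymous columns; user-passed descriptor arguments would still need names.
class DescriptorSketch
{
    record Field(Optional<String> name) { }

    static boolean allNamed(List<Field> fields)
    {
        return fields.stream().allMatch(field -> field.name().isPresent());
    }

    public static void main(String[] args)
    {
        List<Field> returned = List.of(new Field(Optional.of("x")), new Field(Optional.empty()));
        System.out.println(allNamed(returned)); // false: one anonymous column
    }
}
```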

Add table function `exclude_columns`

Bump spotbugs-annotations version

Airbase already has `4.7.3`

Remove ValidateLimitWithPresortedInput

It's not powerful enough to validate properties of plans
that get modified by predicate pushdown after AddExchanges
runs, resulting in false positives such as trinodb#16768

Use OrcReader#MAX_BATCH_SIZE = 8 * 1024

The previous value, 8196, was bigger than PageProcessor#MAX_BATCH_SIZE,
causing PageProcessor to create small, 4-position pages every other page.
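The arithmetic behind the fix, as a small sketch with an invented helper:

```java
// With an ORC batch size of 8196 and a page processor limit of 8192, each
// batch splits into a full 8192-position page plus a tiny 4-position page.
// Setting the batch size to 8 * 1024 makes the remainder zero.
class BatchSizeSketch
{
    static final int PAGE_PROCESSOR_MAX = 8192;

    static int[] split(int orcBatchSize)
    {
        int first = Math.min(orcBatchSize, PAGE_PROCESSOR_MAX);
        return new int[] {first, orcBatchSize - first};
    }

    public static void main(String[] args)
    {
        int[] pages = split(8196);
        System.out.println(pages[0] + ", " + pages[1]); // 8192, 4
        pages = split(8 * 1024); // the fix: batch size equals the limit
        System.out.println(pages[0] + ", " + pages[1]); // 8192, 0
    }
}
```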

Bring DistinguishedNameParser from okhttp3

Inline OkHostnameVerifier and Util.verifyAsIpAddress

They were removed in OkHttp 4.x, and we still rely on the legacy SSL hostname verification.

Update okhttp to 4.10.0

Fix testTableWithNonNullableColumns to update NOT NULL column in Delta

Add support for creating tables with comments in more JDBC-based connectors (trinodb#16135)

Allow PostgreSQL create table with comment

Also allow setting a comment on existing PostgreSQL tables

Support sum(distinct) for jdbc connectors

Add Trino 412 release notes

[maven-release-plugin] prepare release 412

[maven-release-plugin] prepare for next development iteration

Add doc for ignite join pushdown

Add missing config properties to Hive docs

Co-Authored-By: Marius Grama <[email protected]>

Fix layout in Iceberg documentation

Add docs for property to skip glue archive

Fix typo

Support nested timestamp with time zone in Delta Lake

Add test for duplicated partition statistics on Thrift metastore

Support table comment for oracle connector

Allow Oracle create table with comment

Also allow setting a table comment on existing Oracle tables

Support MERGE for Phoenix connector

Remove unused class FallbackToFullNodePartitionMemoryEstimator

Remove unnecessary dependency management in trino-pinot

The `protobuf-java` one was overriding a corresponding declaration in
the parent POM, and was effectively downgrading it. The other two were
not used at all.

Bump Protobuf version

Add Snowflake JDBC Connector

Update trino snapshot version to 372

Update format of the doc of snowflake

Update trino jdbc library import

Fix date formatter from yyyy to uuuu

Fix date test case

Remove defunct property allow-drop-table

Update trino version to 374

Update snowflake config to adapt 374

Update the range of the test of Date type

Update to version 375

Fix snowflake after updating to 375

Update to 381

Fix mvn pom import

Format snowflake.rst

Reordered data tests in type mapping

Update function code

Add product test

Rename product test tablename

Add Env, Suite and properties of Snowflake for production test

Add trinoCreateAndInsert()

Refactor snowflake from single node to multi node

Pass product tests

Removed snowflake.properties in trino server dev

Resolved issues 19 05 2022 and fixed tests

Remove Types.VARBINARY

Add private static SliceWriteFunction charWriteFunction

Update test case

Update plugin/trino-snowflake/src/main/java/io/trino/plugin/snowflake/SnowflakeClient.java

Co-authored-by: Yuya Ebihara <[email protected]>

Update docs/src/main/sphinx/connector/snowflake.rst

Co-authored-by: Yuya Ebihara <[email protected]>

Update plugin/trino-snowflake/pom.xml

Co-authored-by: Yuya Ebihara <[email protected]>

Update plugin/trino-snowflake/src/main/java/io/trino/plugin/snowflake/SnowflakeClient.java

Co-authored-by: Yuya Ebihara <[email protected]>

Resolved review open issues

Disabled JDBC_TREAT_DECIMAL_AS_INT and fixed test case

Updated properties file

Updated properties files

Renamed properties in Testing

Revert "Renamed properties in Testing"

This reverts commit 82f9eb3f3811e8d90a482f5359e98e7c729afa17.

Renamed properties and fixed tests

Update the way to pass ENV values for production test

Update trino version to 388

Update Trino version 391

Update trino version to 394

Update to 395

Update to 411

Update and fix errors

Build successfully with 411

Adding Bloomberg Snowflake connector.

Fully tested with Trino 406. Untested with 410.

Adding varchar type to mapping Snowflake.

Adding --add-opens to enable support for Apache Arrow via shared memory buffers.

Fixing some tests.

Fix TestSnowflakeConnectorTest.

TODO: testDataMappingSmokeTest: time and timestamp
testSetColumnTypes: time and timestamp

Fix type mapper

Fix testconnector

Update version to 413-SNAPSHOT

Added support for HTTP_PROXY testing.

The connector doesn't support setColumnType, which causes lots of problems with the Snowflake server.

Disabled the testSetColumnTypes test.

Fixed and skipped error tests