Support for dictionary encoded INT96 timestamp in parquet files #4680

rui-mo · 2023-04-20T07:17:43Z

Add a timestamp reader for the Parquet file format that reads
dictionary-encoded INT96 timestamps. Hive configs kReadTimestampUnit and
kReadTimestampUnitSession are added to control the precision when
reading timestamps from files.
Parquet documentation for INT96:
https://github.com/apache/parquet-format/pull/49/files#diff-0e877db0daf579f98a11e5e113b29250a2dcae3decb1e83a88db1e6f092bee96R149-R157

majetideepak

@rui-mo The implementation looks good to me. I left a couple of comments. Can you add a Parquet file with a timestamp column to velox/dwio/parquet/tests/examples and add a test?

velox/dwio/parquet/reader/TimestampColumnReader.h

Yuhta

Please add some tests in https://github.com/facebookincubator/velox/blob/main/velox/dwio/parquet/tests/reader/E2EFilterTest.cpp

You probably need to make the writer generate INT96 in writeToMemory

velox/dwio/parquet/reader/TimestampColumnReader.h

majetideepak · 2023-04-21T18:29:14Z

You probably need to make the writer generate INT96 in writeToMemory

@Yuhta I doubt that the Arrow Bridge supports the int96 type, but it is worth checking. The alternative is to check in a file.
The Arrow Bridge has a similar issue with Parquet Decimal types backed by int64.

Yuhta · 2023-04-21T19:17:32Z

@majetideepak Using a fixed file gives less coverage, but if the writer is not working then we have to do it this way for now. Either way we should make sure the result is correct with or without filters.

rui-mo · 2023-04-23T02:27:49Z

@majetideepak @Yuhta Thanks for your review! Your comments are well received, and I'm working on them.

rui-mo · 2023-04-28T07:04:09Z

Please add some tests in https://github.com/facebookincubator/velox/blob/main/velox/dwio/parquet/tests/reader/E2EFilterTest.cpp

You probably need to make the writer generate INT96 in writeToMemory

@Yuhta I also tried that. Using enable_deprecated_int96_timestamps makes the Arrow writer generate INT96. Since int128_t is used in the Timestamp reader for now, the decoder calls readInt128(), but little endian is not supported currently (see IntDecoder.h). I will continue to work on this after the type issue is decided.

rui-mo · 2023-05-18T08:09:52Z

I don't think using int128_t will work here: the valueSize_ is different, so you will end up reading a different part of the data and may even read out of bounds.

Hi @Yuhta, I spent more time on Int96Timestamp type support, but it is not easy to make it work end to end.
We also ran more tests on the int128_t workaround and found that it works for a pure scan. As posted in this PR, int96 in Parquet is converted to the Velox Timestamp type (which is 16 bytes long) in PageReader (see link), and only numValues * sizeof(Int96Timestamp) bytes of data are read in PageReader.

Below is the stack of current timestamp scan.

 0# facebook::velox::parquet::PageReader::prepareDictionary(facebook::velox::parquet::thrift::PageHeader const&) in ./velox_dwio_parquet_table_scan_test
 1# facebook::velox::parquet::PageReader::seekToPage(long) in ./velox_dwio_parquet_table_scan_test
 2# facebook::velox::parquet::PageReader::rowsForPage(facebook::velox::dwio::common::SelectiveColumnReader&, bool, bool, folly::Range<int const*>&, unsigned long const*&) in ./velox_dwio_parquet_table_scan_test
 3# void facebook::velox::parquet::PageReader::readWithVisitor<facebook::velox::dwio::common::ColumnVisitor<__int128, facebook::velox::common::AlwaysTrue, facebook::velox::dwio::common::ExtractToReader<facebook::velox::dwio::common::SelectiveIntegerColumnReader>, true> >(facebook::velox::dwio::common::ColumnVisitor<__int128, facebook::velox::common::AlwaysTrue, facebook::velox::dwio::common::ExtractToReader<facebook::velox::dwio::common::SelectiveIntegerColumnReader>, true>&) in ./velox_dwio_parquet_table_scan_test

Could you explain more about the possible risks? Thank you.
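For readers following the size discussion, here is a minimal sketch (not the code from this PR) of how a packed INT96 dictionary page can be widened into 16-byte values while consuming exactly numValues * 12 bytes. It assumes the commonly documented INT96 timestamp layout: 8 little-endian bytes of nanoseconds within the day, followed by 4 little-endian bytes of Julian day. SketchTimestamp, decodeInt96, widenDictionary and appendInt96 are hypothetical names used only for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a 16-byte timestamp value (not Velox's Timestamp).
struct SketchTimestamp {
  int64_t seconds; // seconds since 1970-01-01 00:00:00
  uint64_t nanos;  // nanoseconds within the second
};
static_assert(sizeof(SketchTimestamp) == 16, "sketch assumes a 16-byte value");

constexpr std::size_t kInt96Bytes = 12;
constexpr int64_t kJulianDayOfUnixEpoch = 2440588;
constexpr int64_t kSecondsPerDay = 86400;
constexpr uint64_t kNanosPerSecond = 1000000000;

// Assemble an unsigned integer from `width` little-endian bytes,
// independent of the host byte order.
inline uint64_t loadLittleEndian(const uint8_t* p, std::size_t width) {
  uint64_t v = 0;
  for (std::size_t i = 0; i < width; ++i) {
    v |= static_cast<uint64_t>(p[i]) << (8 * i);
  }
  return v;
}

// Decode one packed INT96 value: 8 bytes nanos-of-day, then 4 bytes Julian day.
inline SketchTimestamp decodeInt96(const uint8_t* p) {
  const uint64_t nanosOfDay = loadLittleEndian(p, 8);
  const int64_t julianDay =
      static_cast<int32_t>(static_cast<uint32_t>(loadLittleEndian(p + 8, 4)));
  const int64_t seconds =
      (julianDay - kJulianDayOfUnixEpoch) * kSecondsPerDay +
      static_cast<int64_t>(nanosOfDay / kNanosPerSecond);
  return {seconds, nanosOfDay % kNanosPerSecond};
}

// Widen a packed dictionary page into 16-byte values. The source is walked
// with a 12-byte stride, so exactly numValues * 12 bytes are consumed.
inline std::vector<SketchTimestamp> widenDictionary(
    const std::vector<uint8_t>& page, std::size_t numValues) {
  std::vector<SketchTimestamp> out;
  if (page.size() < numValues * kInt96Bytes) {
    return out; // refuse to read past the end of the page
  }
  out.reserve(numValues);
  for (std::size_t i = 0; i < numValues; ++i) {
    out.push_back(decodeInt96(page.data() + i * kInt96Bytes));
  }
  return out;
}

// Helper for building sample input: append one INT96 value to a page.
inline void appendInt96(
    std::vector<uint8_t>& page, uint32_t julianDay, uint64_t nanosOfDay) {
  for (int i = 0; i < 8; ++i) {
    page.push_back(static_cast<uint8_t>((nanosOfDay >> (8 * i)) & 0xff));
  }
  for (int i = 0; i < 4; ++i) {
    page.push_back(static_cast<uint8_t>((julianDay >> (8 * i)) & 0xff));
  }
}
```

Once the dictionary has been widened this way, later lookups operate on 16-byte entries only. A flat (non-dictionary) page would still hold 12-byte values, which is where the size mismatch raised in the review would matter.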

Yuhta

So the assumption here is it is always dictionary-encoded? If this assumption holds all the time, we can probably go this way. It's only a problem when we want to apply a filter on the column of flat values.

Make sure to beef up your E2E filter tests with filters on some primary keys (int64 is fine), and also put timestamp in complex types (array, map, struct) in addition to top-level column.

rui-mo · 2023-05-23T09:26:32Z

@Yuhta Thanks for your reply.

So the assumption here is it is always dictionary-encoded? If this assumption holds all the time, we can probably go this way. It's only a problem when we want to apply a filter on the column of flat values.

I understand the gap here. I guess RLEV1 and Plain encoding are also possible because the column encoding can be set when writing Parquet, but we only tested Parquet files generated with the default configs.

Make sure to beef up your E2E filter tests with filters on some primary keys (int64 is fine), and also put timestamp in complex types (array, map, struct) in addition to top-level column.

Got it, will do.

rui-mo · 2024-06-27T03:25:40Z

@bikramSingh91 Could you help import and merge this PR? Thanks!

yingsu00 · 2024-07-05T17:30:58Z

velox/exec/tests/utils/PlanBuilder.cpp

@@ -99,13 +99,11 @@ PlanBuilder& PlanBuilder::tableScan(
    const RowTypePtr& dataColumns,
    const std::unordered_map<
        std::string,
-        std::shared_ptr<connector::ColumnHandle>>& assignments,
-        bool isFilterPushdownEnabled) {
+        std::shared_ptr<connector::ColumnHandle>>& assignments) {


Is removing isFilterPushdownEnabled parameter related to the Timestamp reader?

rui-mo · 2024-07-09

Thanks for your review. Adding the isFilterPushdownEnabled parameter was a temporary change made before filter pushdown of Timestamp was supported. Now that Filter.h supports it, this change has been removed from this PR.

yingsu00 · 2024-07-05T17:36:12Z

velox/connectors/hive/HiveConfig.cpp

@@ -272,6 +272,16 @@ bool HiveConfig::s3UseProxyFromEnv() const {
  return config_->get<bool>(kS3UseProxyFromEnv, false);
 }

+uint8_t HiveConfig::readTimestampUnit(const Config* session) const {


The unit should be read from the Parquet logical type for this column, not set by the user as a config property. See https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#timestamp

Yuhta · 2024-07-05

This corresponds to the timestamp unit that the compute engine can handle (usually milliseconds for Presto, and possibly other values for Spark); it is not related to the type in the file. The reader should use the coarser of the two.

rui-mo · 2024-07-09

@yingsu00 In a Parquet file, the unit of int96 is fixed because it is made up of days and nanos, unlike int64-timestamp, which can have different units. We parse days and nanos from Parquet, while the compute engine may need a different timestamp unit, e.g. Presto needs milliseconds while Spark needs microseconds. This config allows us to adjust the timestamp precision according to the user's requirement.

Without this change, filter results could be incorrect. For example, take the Spark filter a == 2000-09-12 22:36:29.000000. If a is stored with nanosecond precision in Velox and its value is 2000-09-12 22:36:29.000000111, Velox returns false, but Spark expects true because it only considers the microsecond digits.

Therefore, we need to truncate the value, and the same logic is also needed for the int64-timestamp reader. Does this make sense? Thanks.

Reference for Int96 in Parquet: https://github.com/apache/parquet-format/pull/49/files#diff-0e877db0daf579f98a11e5e113b29250a2dcae3decb1e83a88db1e6f092bee96R149-R150
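The truncation argument can be illustrated with a small sketch. It is not the code from this PR: SketchTimestamp, SketchUnit and toPrecision are hypothetical names, and the unit values 3, 6 and 9 are assumed here to mean the number of fractional-second digits kept.

```cpp
#include <cstdint>

// Hypothetical precision selector; each value is the number of
// fractional-second digits to keep (an assumption made for this sketch).
enum class SketchUnit : uint8_t { kMilli = 3, kMicro = 6, kNano = 9 };

// Hypothetical timestamp: seconds since the epoch plus nanos within the second.
struct SketchTimestamp {
  int64_t seconds;
  uint64_t nanos;
};

inline bool operator==(const SketchTimestamp& a, const SketchTimestamp& b) {
  return a.seconds == b.seconds && a.nanos == b.nanos;
}

// Truncate (not round) the sub-second part to the requested precision.
inline SketchTimestamp toPrecision(SketchTimestamp ts, SketchUnit unit) {
  switch (unit) {
    case SketchUnit::kMilli:
      ts.nanos = ts.nanos / 1000000 * 1000000;
      break;
    case SketchUnit::kMicro:
      ts.nanos = ts.nanos / 1000 * 1000;
      break;
    case SketchUnit::kNano:
      break; // keep all nine digits
  }
  return ts;
}
```

With a microsecond unit, a stored value of 2000-09-12 22:36:29.000000111 (968798189 seconds since the epoch if interpreted as UTC, plus 111 nanoseconds) becomes equal to the filter constant 2000-09-12 22:36:29.000000, which is the result Spark expects. With a nanosecond unit the two values remain different.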

yingsu00 · 2024-07-09

Thanks @rui-mo for explaining. Sorry, I didn't check the INT96 Timestamp spec. Just approved this PR.

rui-mo · 2024-07-10

Thank you for helping review this PR.

chliang71 · 2024-07-08T23:53:04Z

We have ported this PR internally and so far it is running fine. Thanks for working on this @rui-mo! We do encounter one issue, though, related to IntDecoder reading int128.

Since int128_t is used in the Timestamp reader for now, the decoder calls readInt128(), but little endian is not supported currently (see IntDecoder.h). I will continue to work on this after the type issue is decided.

Any quick insights on what needs to be done here? That is, if the data file uses INT96 (12 bytes), would readInt128() read 16 bytes? Would the reader then need to re-align the bytes accordingly and also use little endian?
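The size concern in this question can be put into numbers with a short sketch. It only models the arithmetic of issuing fixed-width loads at a fixed stride; it does not describe what readInt128() actually does, and bytesSpanned and overreadBytes are hypothetical helpers.

```cpp
#include <cstddef>

// Number of bytes touched when `numValues` loads of `loadWidth` bytes are
// issued at consecutive offsets that are `stride` bytes apart.
constexpr std::size_t bytesSpanned(
    std::size_t numValues, std::size_t stride, std::size_t loadWidth) {
  return numValues == 0 ? 0 : (numValues - 1) * stride + loadWidth;
}

// Bytes read beyond a buffer that holds exactly numValues * stride bytes.
constexpr std::size_t overreadBytes(
    std::size_t numValues, std::size_t stride, std::size_t loadWidth) {
  return bytesSpanned(numValues, stride, loadWidth) > numValues * stride
      ? bytesSpanned(numValues, stride, loadWidth) - numValues * stride
      : 0;
}

// Four packed INT96 values occupy 48 bytes; 16-byte loads at a 12-byte
// stride would touch 52 bytes, i.e. 4 bytes past the end.
static_assert(overreadBytes(4, 12, 16) == 4, "16-byte loads overrun by 4");
static_assert(overreadBytes(4, 12, 12) == 0, "12-byte loads stay in bounds");
```

Under this model, a reader of plain-encoded INT96 data would either load 12 bytes per value or guarantee padding after the last value, in addition to handling the byte order.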

rui-mo · 2024-07-09T05:34:06Z

We do encounter one issue, though, related to IntDecoder reading int128.

@chliang71 Thanks for your feedback. I assume this issue concerns reading plain-encoded timestamps, while this PR focuses on dictionary encoding. There is a draft of plain-encoding support, oap-project@533bb9e by @mskapilks, which may go into a separate PR after this one.

mskapilks · 2024-07-16T10:17:08Z

I can raise the follow-up PR for that change once this is done.

facebook-github-bot · 2024-07-17T22:00:46Z

@Yuhta has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

mbasmanova · 2024-07-18T06:38:04Z

@rui-mo @Yuhta Folks, is anything blocking this PR from being merged?

rui-mo · 2024-07-18T06:51:27Z

@mbasmanova I believe there has been some discussion on whether to merge this one or #8325 first; see #8325 (comment) and #8325 (comment). If possible, we would like to merge this one first, as it is ready.
cc: @yingsu00 @mskapilks

Yuhta · 2024-07-18T15:22:37Z

@rui-mo There is an assertion failure in a unit test:

Note: Google Test filter = E2EFilterTest.timestampDictionary
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from E2EFilterTest
[ RUN      ] E2EFilterTest.timestampDictionary

terminate called after throwing an instance of 'facebook::velox::VeloxUserError'
  what():  Exception: VeloxUserError
Error Source: USER
Error Code: INVALID_ARGUMENT
Reason: (11646767826930344353 vs. 999999999) Timestamp nanos out of range
Retriable: False
Expression: nanos <= kMaxNanos
Function: Timestamp
File: buck-out/v2/gen/fbcode/5ce5662abd58612b/velox/type/__velox_timestamp__/buck-headers/velox/type/Timestamp.h
Line: 113
Stack trace:
Stack trace has been disabled. Use --velox_exception_user_stacktrace_enabled=true to enable it.

*** Aborted at 1721281199 (Unix time, try 'date -d @1721281199') ***
*** Signal 6 (SIGABRT) (0x75590001bb2f) received by PID 113455 (pthread TID 0x7fc9d1292d80) (linux TID 113455) (maybe from PID 113455, UID 30041) (code: -6), stack trace: ***
    @ 000000000000fd47 folly::symbolizer::(anonymous namespace)::innerSignalHandler(int, siginfo_t*, void*)
                       ./fbcode/folly/debugging/symbolizer/SignalHandler.cpp:453
    @ 000000000000e4c1 folly::symbolizer::(anonymous namespace)::signalHandler(int, siginfo_t*, void*)
                       ./fbcode/folly/debugging/symbolizer/SignalHandler.cpp:474
    @ 000000000004455f (unknown)
                       /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/signal/../sysdeps/unix/sysv/linux/libc_sigaction.c:8
                       -> /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c
    @ 000000000009c993 __GI___pthread_kill
                       /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/nptl/pthread_kill.c:46
    @ 00000000000444ac __GI_raise
                       /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/signal/../sysdeps/posix/raise.c:26
    @ 000000000002c432 __GI_abort
                       /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/stdlib/abort.c:79
    @ 00000000000a3fd4 __gnu_cxx::__verbose_terminate_handler()
                       /home/engshare/third-party2/libgcc/11.x/src/gcc-11.x/x86_64-facebook-linux/libstdc++-v3/libsupc++/../../.././libstdc++-v3/libsupc++/vterminate.cc:95
    @ 00000000000a1b39 __cxxabiv1::__terminate(void (*)())
                       /home/engshare/third-party2/libgcc/11.x/src/gcc-11.x/x86_64-facebook-linux/libstdc++-v3/libsupc++/../../.././libstdc++-v3/libsupc++/eh_terminate.cc:48
    @ 00000000000a1ba4 std::terminate()
                       /home/engshare/third-party2/libgcc/11.x/src/gcc-11.x/x86_64-facebook-linux/libstdc++-v3/libsupc++/../../.././libstdc++-v3/libsupc++/eh_terminate.cc:58
    @ 00000000000a1e6f __cxa_throw
                       /home/engshare/third-party2/libgcc/11.x/src/gcc-11.x/x86_64-facebook-linux/libstdc++-v3/libsupc++/../../.././libstdc++-v3/libsupc++/eh_throw.cc:95
    @ 000000000001464f void facebook::velox::detail::veloxCheckFail<facebook::velox::VeloxUserError, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&>(facebook::velox::detail::VeloxCheckFailArgs const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
                       fbcode/velox/common/base/Exceptions.h:75
                       -> ./fbcode/velox/common/base/Exceptions.cpp
    @ 000000000285ca2f facebook::velox::Timestamp::Timestamp(long, unsigned long)
                       fbcode/velox/type/Timestamp.h:113
                       -> ./fbcode/velox/dwio/parquet/reader/ParquetColumnReader.cpp
    @ 000000000285a0f3 facebook::velox::parquet::TimestampColumnReader::getValues(folly::Range<int const*>, std::shared_ptr<facebook::velox::BaseVector>*)
                       fbcode/velox/dwio/parquet/reader/TimestampColumnReader.h:61
                       -> ./fbcode/velox/dwio/parquet/reader/ParquetColumnReader.cpp
    @ 0000000000f1db0b facebook::velox::dwio::common::SelectiveStructColumnReaderBase::getValues(folly::Range<int const*>, std::shared_ptr<facebook::velox::BaseVector>*)
                       ./fbcode/velox/dwio/common/SelectiveStructColumnReader.cpp:397
    @ 0000000000f1a066 facebook::velox::dwio::common::SelectiveStructColumnReaderBase::next(unsigned long, std::shared_ptr<facebook::velox::BaseVector>&, facebook::velox::dwio::common::Mutation const*)
                       ./fbcode/velox/dwio/common/SelectiveStructColumnReader.cpp:127
    @ 0000000002916615 facebook::velox::parquet::ParquetRowReader::Impl::next(unsigned long, std::shared_ptr<facebook::velox::BaseVector>&, facebook::velox::dwio::common::Mutation const*)
                       ./fbcode/velox/dwio/parquet/reader/ParquetReader.cpp:851
    @ 00000000028ccef8 facebook::velox::parquet::ParquetRowReader::next(unsigned long, std::shared_ptr<facebook::velox::BaseVector>&, facebook::velox::dwio::common::Mutation const*)
                       ./fbcode/velox/dwio/parquet/reader/ParquetReader.cpp:948
    @ 0000000000099dc9 facebook::velox::dwio::common::E2EFilterTestBase::readWithFilter(std::shared_ptr<facebook::velox::common::ScanSpec>, facebook::velox::dwio::common::MutationSpec const&, std::vector<std::shared_ptr<facebook::velox::RowVector>, std::allocator<std::shared_ptr<facebook::velox::RowVector> > > const&, std::vector<unsigned long, std::allocator<unsigned long> > const&, unsigned long&, bool, bool)
                       ./fbcode/velox/dwio/common/tests/utils/E2EFilterTestBase.cpp:197
    @ 00000000000d2338 facebook::velox::dwio::common::E2EFilterTestBase::testFilterSpecs(std::vector<std::shared_ptr<facebook::velox::RowVector>, std::allocator<std::shared_ptr<facebook::velox::RowVector> > > const&, std::vector<facebook::velox::dwio::common::FilterSpec, std::allocator<facebook::velox::dwio::common::FilterSpec> > const&)
                       ./fbcode/velox/dwio/common/tests/utils/E2EFilterTestBase.cpp:306
    @ 00000000000d4d23 facebook::velox::dwio::common::E2EFilterTestBase::testNoRowGroupSkip(std::vector<std::shared_ptr<facebook::velox::RowVector>, std::allocator<std::shared_ptr<facebook::velox::RowVector> > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, int)
                       ./fbcode/velox/dwio/common/tests/utils/E2EFilterTestBase.cpp:336
    @ 00000000000de35e facebook::velox::dwio::common::E2EFilterTestBase::testScenario(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void ()>, bool, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, int)
                       ./fbcode/velox/dwio/common/tests/utils/E2EFilterTestBase.cpp:417
    @ 0000000000370b18 E2EFilterTest::testWithTypes(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void ()>, bool, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, int)
                       ./fbcode/velox/dwio/parquet/tests/reader/E2EFilterTest.cpp:44
    @ 00000000003398c7 E2EFilterTest_timestampDictionary_Test::TestBody()
                       ./fbcode/velox/dwio/parquet/tests/reader/E2EFilterTest.cpp:263
    @ 0000000000123f7e void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*)
                       fbsource/src/gtest.cc:2675
                       -> ./third-party/googletest/1.14.0/googletest/googletest/src/gtest-all.cc
    @ 0000000000123804 testing::Test::Run()
                       fbsource/src/gtest.cc:2692
                       -> ./third-party/googletest/1.14.0/googletest/googletest/src/gtest-all.cc
    @ 000000000012943f testing::TestInfo::Run()
                       fbsource/src/gtest.cc:2841
                       -> ./third-party/googletest/1.14.0/googletest/googletest/src/gtest-all.cc
    @ 00000000001313f6 testing::TestSuite::Run()
                       fbsource/src/gtest.cc:3020
                       -> ./third-party/googletest/1.14.0/googletest/googletest/src/gtest-all.cc
    @ 000000000016cd5b testing::internal::UnitTestImpl::RunAllTests()
                       fbsource/src/gtest.cc:5925
                       -> ./third-party/googletest/1.14.0/googletest/googletest/src/gtest-all.cc
    @ 000000000016bdbb bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*)
                       fbsource/src/gtest.cc:2675
                       -> ./third-party/googletest/1.14.0/googletest/googletest/src/gtest-all.cc
    @ 000000000016b2f9 testing::UnitTest::Run()
                       fbsource/src/gtest.cc:5489
                       -> ./third-party/googletest/1.14.0/googletest/googletest/src/gtest-all.cc
    @ 00000000004c9820 RUN_ALL_TESTS()
                       fbsource/gtest/gtest.h:2317
                       -> ./fbcode/velox/dwio/parquet/tests/reader/E2EFilterTest.cpp
    @ 00000000004c96ec main
                       ./fbcode/velox/dwio/parquet/tests/reader/E2EFilterTest.cpp:720
    @ 000000000002c656 __libc_start_call_main
                       /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/csu/../sysdeps/nptl/libc_start_call_main.h:58
                       -> /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/csu/../sysdeps/x86/libc-start.c
    @ 000000000002c717 __libc_start_main_alias_2
                       /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/csu/../csu/libc-start.c:409
                       -> /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/csu/../sysdeps/x86/libc-start.c
    @ 000000000032c160 _start
                       /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/csu/../sysdeps/x86_64/start.S:116

Test was never completed. The test process might have crashed.

rui-mo · 2024-07-19T05:25:11Z

@rui-mo There is an assertion failure in a unit test:

@Yuhta Thanks for the catch. I reproduced it locally in debug mode and fixed it with this change: https://github.com/facebookincubator/velox/pull/4680/files#diff-ae87451c1577f3b47d2863187de8bf30c7351484d39537419016487cc7b2f71cR49-R51. Would you take another look? Thank you.

facebook-github-bot · 2024-07-19T15:53:32Z

@Yuhta has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot · 2024-07-19T19:51:37Z

@Yuhta merged this pull request in facd967.

conbench-facebook · 2024-07-19T20:39:10Z

Conbench analyzed the 1 benchmark run on commit facd967a.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

rui-mo · 2024-07-22T01:00:45Z

Thank you all for helping review this PR.

facebook-github-bot added the CLA Signed label Apr 20, 2023

rui-mo mentioned this pull request Apr 20, 2023

Timestamp reader support for Parquet file format #4681 (Open)

Yuhta self-requested a review April 20, 2023 15:13

rui-mo force-pushed the wip_ts_reader branch from f402fee to ad309ea on April 21, 2023 02:17

majetideepak reviewed Apr 21, 2023

velox/dwio/parquet/reader/TimestampColumnReader.h (outdated)

velox/dwio/parquet/reader/TimestampColumnReader.h

Yuhta reviewed Apr 21, 2023

velox/dwio/parquet/reader/TimestampColumnReader.h

velox/dwio/parquet/reader/TimestampColumnReader.h (outdated)

rui-mo force-pushed the wip_ts_reader branch 3 times, most recently from f5c944e to a8dee34 on April 28, 2023 06:15

rui-mo commented Apr 28, 2023

velox/type/Timestamp.h (outdated)

rui-mo force-pushed the wip_ts_reader branch from a8dee34 to 0d3ef7e on April 28, 2023 07:31

Yuhta reviewed Apr 28, 2023

rui-mo marked this pull request as draft May 8, 2023 05:35

zeodtr reviewed May 16, 2023

velox/vector/arrow/Bridge.cpp (outdated)

rui-mo force-pushed the wip_ts_reader branch 6 times, most recently from 53d9408 to 1fe1694 on May 18, 2023 07:03

Yuhta self-requested a review May 18, 2023 15:25

Yuhta reviewed May 18, 2023

rui-mo force-pushed the wip_ts_reader branch from 1fe1694 to cd38df2 on August 30, 2023 03:59

Yuhta added the ready-to-merge label Jun 20, 2024

rui-mo force-pushed the wip_ts_reader branch from 8c39885 to 979f142 on June 27, 2024 01:27

rui-mo force-pushed the wip_ts_reader branch from 979f142 to e83fe4c on July 2, 2024 02:50

yingsu00 requested changes Jul 5, 2024

yingsu00 approved these changes Jul 9, 2024

rui-mo mentioned this pull request Jul 17, 2024

Add Spark query runner #10357 (Closed)

rui-mo and others added 3 commits July 19, 2024 13:04

Support timestamp reader (1fd5323)

Fix negative nano (e31bc7d)

update (fdceec2)

rui-mo force-pushed the wip_ts_reader branch from e83fe4c to fdceec2 on July 19, 2024 05:16

facebook-github-bot closed this in facd967 Jul 19, 2024

facebook-github-bot added the Merged label Jul 19, 2024

rui-mo mentioned this pull request Aug 27, 2024

Support reading plain encoded INT96 timestamp from Parquet file #10850 (Open)
