
HDFS file read support #3617

Merged
merged 13 commits into from Dec 7, 2018

Conversation

chenxing-xc
Contributor

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

I would like to add HDFS file read support (based on libhdfs3). This delivery includes:

  1. an hdfs table function
  2. a new INSERT syntax so that the server can read a file directly from the local disk or from HDFS

The URI of an HDFS file is written as "hdfs://namenodeip:namenodeport/path-of-hdfsfile".
The INSERT syntax is as follows:
INSERT INTO TABLE FORMAT XXX INFILE 'file_name'
The above syntax is similar to INSERT INTO TABLE SELECT * FROM TABLE_FUNCTION(.....), but it
is more convenient because we don't need to remember the table schema and put it into the TABLE_FUNCTION call.
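
For illustration only, here is roughly how the two proposed pieces would be used; the table name, HDFS path, format, and column list are hypothetical placeholders, not taken from this PR:

-- Read an HDFS file through the proposed hdfs table function.
SELECT *
FROM hdfs('hdfs://namenodeip:namenodeport/data/events.tsv', 'TSV', 'id UInt64, name String');

-- The proposed INSERT ... INFILE form: the server reads the file itself,
-- so the client does not have to repeat the table schema.
INSERT INTO events FORMAT TSV INFILE 'hdfs://namenodeip:namenodeport/data/events.tsv';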

@blinkov blinkov mentioned this pull request Nov 20, 2018
@alexey-milovidov
Member

alexey-milovidov commented Nov 20, 2018

INSERT INFILE is a very controversial feature in this PR:

  1. It requires special support in the client (to read data from a local file), but that is not implemented. It is expected that I can send data from a local file with clickhouse-client using this feature (by analogy with SELECT ... INTO OUTFILE).

  2. It requires a special check on the server side (to avoid security issues when reading local files).

  3. The behaviour is completely different depending on whether you provide a local path or a URL.

  4. The way we distinguish URIs from local paths looks too hacky; it needs a double check.

  5. It would be better to implement INSERT ... INFILE ... FORMAT ... instead of INSERT ... FORMAT ... INFILE because of compatibility issues.

  6. I cannot find support for INSERT ... INFILE in MySQL or Postgres. MySQL uses LOAD DATA [LOCAL] INFILE and Postgres (and most other DBMSs) uses COPY (see the sketch after this list). I also cannot quickly find any DBMS that implements INSERT ... INFILE.

  7. For our team members, INSERT ... INFILE itself sounds controversial. People may think that it will insert data into a file, but it actually reads data from a file.
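
For comparison, the existing statements in those systems look roughly like this (table and file names are illustrative placeholders):

-- MySQL
LOAD DATA LOCAL INFILE '/tmp/events.tsv' INTO TABLE events;

-- PostgreSQL (server-side COPY; psql also offers a client-side \copy)
COPY events FROM '/tmp/events.tsv' WITH (FORMAT text);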

@alexey-milovidov
Member

We can proceed first without INSERT INFILE feature.

@alexey-milovidov
Member

LICENSE file should be located in the https://github.com/chenxing-xc/ClickHouse-Extras-libhdfs3 root.

@alexey-milovidov
Member

I have added you to the ClickHouse-Extras organization. You can create/move any repositories you want there.

@chenxing-xc
Contributor Author

We can proceed first without INSERT INFILE feature.

OK, I can withdraw this feature and put the new submodules into ClickHouse-Extras. The original goal of the "INFILE" feature was to simulate the COPY syntax in PostgreSQL.
BTW, this feature is heavily used in my team because our ETL tasks are scheduled on a platform without ClickHouse, so server-side data access is needed.

@alexey-milovidov
Member

alexey-milovidov commented Nov 21, 2018

Could you please add an integration test?

If you have trouble with that, you can instead write a step-by-step instruction right here on how this feature can be tested (roll out HDFS in Docker, write files into it, and read them with ClickHouse).

@alexey-milovidov
Member

BTW, this feature is heavily used in my team because our ETL tasks are scheduled on a platform without ClickHouse, so server-side data access is needed.

We should think about how we can make this feature more consistent.

@chenxing-xc
Contributor Author

Could you please add an integration test?

If you have trouble with that, you can instead write a step-by-step instruction right here on how this feature can be tested (roll out HDFS in Docker, write files into it, and read them with ClickHouse).

  1. This feature depends on an HDFS cluster, so I am not sure how to add a regression test in public.
     To test it, we can do the following steps (see the SQL sketch below):
     1. create a table
     2. insert some data
     3. select it into outfiles
     4. put the outfiles into HDFS
     5. create another table and populate it with the files in HDFS using the new syntax, e.g.
        select * from hdfs('url', format, column-description)
        insert into new_table select * from hdfs('url', format, column-description)
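
A minimal SQL sketch of those steps, assuming a reachable HDFS namenode; the table names, path, format, and columns are illustrative placeholders:

-- steps 1-3: create a table, insert some data, select it into an outfile
CREATE TABLE src (id UInt64, name String) ENGINE = MergeTree ORDER BY id;
INSERT INTO src VALUES (1, 'a'), (2, 'b');
SELECT * FROM src INTO OUTFILE '/tmp/src.tsv' FORMAT TSV;

-- step 4 happens outside ClickHouse, e.g. hdfs dfs -put /tmp/src.tsv /data/src.tsv

-- step 5: read the file back through the new table function
SELECT * FROM hdfs('hdfs://namenodeip:namenodeport/data/src.tsv', 'TSV', 'id UInt64, name String');
CREATE TABLE new_table AS src;
INSERT INTO new_table SELECT * FROM hdfs('hdfs://namenodeip:namenodeport/data/src.tsv', 'TSV', 'id UInt64, name String');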

@zhang2014
Contributor

@chenxing-xc Maybe:

  1. INSERT INTO TABLE FUNCTION hdfs('url', format, column-description)
  2. SELECT * FROM hdfs('url', format, column-description)

But this requires implementing a write method in StorageHDFS (sketched below).
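
Assuming such a write method were added to StorageHDFS, the suggested usage would look roughly like this (URI, format, and columns are placeholders):

INSERT INTO TABLE FUNCTION hdfs('hdfs://namenodeip:namenodeport/data/out.tsv', 'TSV', 'id UInt64, name String')
SELECT id, name FROM src;

SELECT * FROM hdfs('hdfs://namenodeip:namenodeport/data/out.tsv', 'TSV', 'id UInt64, name String');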

@alexey-milovidov
Member

None of the builds succeed.

@chenxing-xc
Contributor Author

None of the builds succeed.

What's the build error? There are some extra dependencies mentioned in the libhdfs3 submodule that need to be installed (by apt-get),
e.g. libxml2, krb5, libuuid, libgsasl.

@chenxing-xc
Contributor Author

https://clickhouse-builds.s3.yandex.net/3617/86f1a18185eb1a67e1c0d445352458838fa02804/report.html

I saw the following error in the cmake output:

CMake Error at /usr/share/cmake-3.10/Modules/FindPackageHandleStandardArgs.cmake:137 (message):
  Could NOT find CURL (missing: CURL_LIBRARY CURL_INCLUDE_DIR)
Call Stack (most recent call first):
  /usr/share/cmake-3.10/Modules/FindPackageHandleStandardArgs.cmake:378 (_FPHSA_FAILURE_MESSAGE)
  /usr/share/cmake-3.10/Modules/FindCURL.cmake:48 (FIND_PACKAGE_HANDLE_STANDARD_ARGS)
  contrib/libhdfs3-cmake/CMakeLists.txt:22 (find_package)

We need to install it.

@alexey-milovidov
Member

We have to add all required libraries as contrib submodules, but it's OK to start testing with system libraries. You can add the required libraries here:
https://github.com/yandex/ClickHouse/blob/master/docker/packager/binary/Dockerfile
so that the tests can run.

@alexey-milovidov
Member

~/work/libhdfs3/src$ grep -i -r uuid_ .
./rpc/RpcClient.cpp:    uuid_t id;
./rpc/RpcClient.cpp:    uuid_generate(id);
./rpc/RpcClient.cpp:    clientId.resize(sizeof(uuid_t));
./rpc/RpcClient.cpp:    memcpy(&clientId[0], id, sizeof(uuid_t));
./CMakeLists.txt:    TARGET_LINK_LIBRARIES(libhdfs3-static ${LIBUUID_LIBRARIES})
./CMakeLists.txt:    TARGET_LINK_LIBRARIES(libhdfs3-shared ${LIBUUID_LIBRARIES})
./CMakeLists.txt:    INCLUDE_DIRECTORIES(${LIBUUID_INCLUDE_DIRS})

libuuid is used simply to generate a value. We can provide our own minimal implementation to avoid the extra dependency.

@alexey-milovidov
Member

alexey-milovidov commented Nov 23, 2018

libcurl is used only for interaction with "Key Management Server" (KMS).
I don't know whether the usage of KMS is mandatory.

fuzzyFileNames = uri.substr(uriPrefix.length());
}

std::vector<String> fuzzyNameList = parseDescription(fuzzyFileNames, 0, fuzzyFileNames.length(), ',' , 100/* hard coded max files */);
Member

That's great.
But how are {a,b} and {a|b} different?

Contributor Author

They are the same :)

Member

Let's leave only the first one, because it is compatible with the syntax of "globs" supported in bash.
And we can eventually add support for globs in StorageFile as well.
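
For illustration, keeping only the bash-style form, a path with alternatives could look like this (URI, format, and columns are placeholders):

SELECT count()
FROM hdfs('hdfs://namenodeip:namenodeport/data/part_{a,b,c}.tsv', 'TSV', 'id UInt64, name String');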

Contributor Author

ok, fixed.

@chenxing-xc
Contributor Author

~/work/libhdfs3/src$ grep -i -r uuid_ .
./rpc/RpcClient.cpp:    uuid_t id;
./rpc/RpcClient.cpp:    uuid_generate(id);
./rpc/RpcClient.cpp:    clientId.resize(sizeof(uuid_t));
./rpc/RpcClient.cpp:    memcpy(&clientId[0], id, sizeof(uuid_t));
./CMakeLists.txt:    TARGET_LINK_LIBRARIES(libhdfs3-static ${LIBUUID_LIBRARIES})
./CMakeLists.txt:    TARGET_LINK_LIBRARIES(libhdfs3-shared ${LIBUUID_LIBRARIES})
./CMakeLists.txt:    INCLUDE_DIRECTORIES(${LIBUUID_INCLUDE_DIRS})

libuuid is used simply to generate a value. We can provide our own minimal implementation to avoid the extra dependency.

How about replacing it with the boost::uuid lib?

@chenxing-xc
Contributor Author

libcurl is used only for interaction with "Key Management Server" (KMS).
I don't know whether the usage of KMS is mandatory.

KMS is not mandatory; maybe we can trim it out or update HttpClient to use the Poco::Net lib.

@alexey-milovidov
Member

How about replacing it with the boost::uuid lib?

It's OK if it doesn't have extra dependencies. We just need to create a wrapper with a single function (a fake libuuid).

@alexey-milovidov
Member

KMS is not mandatory; maybe we can trim it out or update HttpClient to use the Poco::Net lib.

It's not really necessary to rewrite parts of the library :)
The only intent is to avoid/postpone full integration of libcurl into our build system, if we can start without it.
There are also possible security implications of using libcurl, but that's an unrelated concern.

@alesapin
Member

alesapin commented Dec 3, 2018

@chenxing-xc Unfortunately I can't build it locally:

  1. With gcc I get a cmake error:
CMake Error at contrib/libhdfs3-cmake/CMake/Platform.cmake:19 (LIST):
  list index: 1 out of range (-1, 0)
Call Stack (most recent call first):
  contrib/libhdfs3-cmake/CMakeLists.txt:17 (include)
  2. With clang, cmake is OK, but I have compilation problems:
[57/650] Building CXX object dbms/CMakeFiles/dbms.dir/src/Interpreters/InterpreterInsertQuery.cpp.o
FAILED: /usr/bin/ccache /usr/bin/clang++-6.0   -DBOOST_SYSTEM_NO_DEPRECATED -DPOCO_STATIC -DPOCO_UNBUNDLED_ZLIB -DUNALIGNED_OK -DWITH_GZFILEOP -DX86_64 -DZLIB_COMPAT -isystem contrib/jemalloc-cmake/include -isystem contrib/jemalloc-cmake/include_linux_x86_64 -isystem contrib/libsparsehash -isystem contrib/libdivide -isystem contrib/poco/Data/include -Idbms/src -isystem contrib/double-conversion -isystem contrib/libpcg-random/include -Ilibs/libcommon/include -isystem contrib/poco/Foundation/include -Icontrib/zlib-ng -Icontrib/cityhash102/include -isystem contrib/boost -isystem contrib/poco/Net/include -isystem contrib/poco/Util/include -isystem contrib/poco/XML/include -isystem contrib/poco/JSON/include -isystem contrib/re2_st -isystem contrib/re2 -Ilibs/libpocoext/include -Ilibs/libmysqlxx/include -Icontrib/mariadb-connector-c-cmake/linux_x86_64/include -Icontrib/mariadb-connector-c-cmake/common/include -Icontrib/mariadb-connector-c/include -isystem contrib/ssl/include -Icontrib/libbtrie/include -isystem contrib/poco/Data/ODBC/include -Icontrib/unixodbc-cmake/linux_x86_64 -Icontrib/unixodbc/include -Icontrib/unixodbc/libltdl -Icontrib/unixodbc/libltdl/libltdl -isystem contrib/poco/MongoDB/include -isystem contrib/poco/NetSSL_OpenSSL/include -isystem contrib/poco/Crypto/include -isystem contrib/librdkafka/src -fdiagnostics-color=always -std=c++1z  -pipe -msse4.1 -msse4.2 -mpopcnt  -fno-omit-frame-pointer  -Wall -Wno-unused-command-line-argument  -Wnon-virtual-dtor -Wextra -Wextra-semi -Wcomma -Winconsistent-missing-destructor-override -Wunused-exception-parameter -Wshadow-uncaptured-local -Wredundant-parens -Wzero-as-null-pointer-constant -O2 -g -DNDEBUG -O3    -pthread -MMD -MT dbms/CMakeFiles/dbms.dir/src/Interpreters/InterpreterInsertQuery.cpp.o -MF dbms/CMakeFiles/dbms.dir/src/Interpreters/InterpreterInsertQuery.cpp.o.d -o dbms/CMakeFiles/dbms.dir/src/Interpreters/InterpreterInsertQuery.cpp.o -c dbms/src/Interpreters/InterpreterInsertQuery.cpp
In file included from dbms/src/Interpreters/InterpreterInsertQuery.cpp:27:
dbms/src/IO/ReadBufferFromHDFS.h:4:10: fatal error: 'hdfs/hdfs.h' file not found
#include <hdfs/hdfs.h>
         ^~~~~~~~~~~~~
1 error generated.
[57/650] Building CXX object dbms/CMakeFiles/dbms.dir/src/Storages/StorageHDFS.cpp.o
FAILED: /usr/bin/ccache /usr/bin/clang++-6.0   -DBOOST_SYSTEM_NO_DEPRECATED -DPOCO_STATIC -DPOCO_UNBUNDLED_ZLIB -DUNALIGNED_OK -DWITH_GZFILEOP -DX86_64 -DZLIB_COMPAT -isystem contrib/jemalloc-cmake/include -isystem contrib/jemalloc-cmake/include_linux_x86_64 -isystem contrib/libsparsehash -isystem contrib/libdivide -isystem contrib/poco/Data/include -Idbms/src -isystem contrib/double-conversion -isystem contrib/libpcg-random/include -Ilibs/libcommon/include -isystem contrib/poco/Foundation/include -Icontrib/zlib-ng -Icontrib/cityhash102/include -isystem contrib/boost -isystem contrib/poco/Net/include -isystem contrib/poco/Util/include -isystem contrib/poco/XML/include -isystem contrib/poco/JSON/include -isystem contrib/re2_st -isystem contrib/re2 -Ilibs/libpocoext/include -Ilibs/libmysqlxx/include -Icontrib/mariadb-connector-c-cmake/linux_x86_64/include -Icontrib/mariadb-connector-c-cmake/common/include -Icontrib/mariadb-connector-c/include -isystem contrib/ssl/include -Icontrib/libbtrie/include -isystem contrib/poco/Data/ODBC/include -Icontrib/unixodbc-cmake/linux_x86_64 -Icontrib/unixodbc/include -Icontrib/unixodbc/libltdl -Icontrib/unixodbc/libltdl/libltdl -isystem contrib/poco/MongoDB/include -isystem contrib/poco/NetSSL_OpenSSL/include -isystem contrib/poco/Crypto/include -isystem contrib/librdkafka/src -fdiagnostics-color=always -std=c++1z  -pipe -msse4.1 -msse4.2 -mpopcnt  -fno-omit-frame-pointer  -Wall -Wno-unused-command-line-argument  -Wnon-virtual-dtor -Wextra -Wextra-semi -Wcomma -Winconsistent-missing-destructor-override -Wunused-exception-parameter -Wshadow-uncaptured-local -Wredundant-parens -Wzero-as-null-pointer-constant -O2 -g -DNDEBUG -O3    -pthread -MMD -MT dbms/CMakeFiles/dbms.dir/src/Storages/StorageHDFS.cpp.o -MF dbms/CMakeFiles/dbms.dir/src/Storages/StorageHDFS.cpp.o.d -o dbms/CMakeFiles/dbms.dir/src/Storages/StorageHDFS.cpp.o -c dbms/src/Storages/StorageHDFS.cpp
In file included from dbms/src/Storages/StorageHDFS.cpp:7:
dbms/src/IO/ReadBufferFromHDFS.h:4:10: fatal error: 'hdfs/hdfs.h' file not found
#include <hdfs/hdfs.h>
         ^~~~~~~~~~~~~
1 error generated.

Seems like some misconfiguration in cmake? Do you have any idea how to fix these problems? I hope it doesn't require a system-wide installation of libhdfs3?

@@ -42,5 +44,6 @@ class ITableFunction

using TableFunctionPtr = std::shared_ptr<ITableFunction>;

std::vector<String> parseDescription(const String & description, size_t l, size_t r, char separator, size_t max_addresses);
Member

This function doesn't seem too common. Let's move it to a separate file?

@@ -89,6 +89,7 @@ endif ()

option (TEST_COVERAGE "Enables flags for test coverage" OFF)
option (ENABLE_TESTS "Enables tests" ON)
option (ENABLE_INSERT_INFILE "Enables INSERT INFILE syntax" OFF)
Member

Is it possible to move INSERT INFILE to a separate pull request?

Member

@alesapin alesapin left a comment

Also, can you add libxml2, libkrb and libgsasl to contrib, or do you need help with these libraries?

@alesapin
Member

alesapin commented Dec 6, 2018

@chenxing-xc Where did you get libhdfs3? There are a lot of forks on GitHub, and we would like to have the commit history in our ClickHouse-Extras repo.

@chenxing-xc
Contributor Author

@chenxing-xc Where did you get libhdfs3? There are a lot of forks on GitHub, and we would like to have the commit history in our ClickHouse-Extras repo.

I got it from https://github.com/Pivotal-DataFabric/attic-libhdfs3.git

@alesapin alesapin merged commit 8256c19 into ClickHouse:master Dec 7, 2018
alexey-milovidov added a commit that referenced this pull request Dec 8, 2018
@alexey-milovidov
Member

@chenxing-xc Congratulations!

@gubinjie

@chenxing-xc Hello, when we use ClickHouse to read HDFS files, the following exception often occurs. Can you tell me what problems may cause this exception and how it can be resolved? Thank you!

executeQuery: Code: 210, e.displayText() = DB::Exception: Fail to read HDFS file: hdfs://myuser@IP::PORT/user/hive
/warehouse/mydb.db/my_table_tsv/000007_0 HdfsIOException: InputStreamImpl: cannot read file: /user/hive/warehouse/mydb.db/mytable_tsv/000007_0, from position 0, size: 1048576. Caused by: HdfsTimeoutException: Read 8 bytes timeout (version 19.16.4.12 (official build)) (from [::ffff:30.18.54.105]:38882) (in query: insert into

...

Stack trace:

0x55a39713fed0 StackTrace::StackTrace() /usr/bin/clickhouse
0x55a39713fca5 DB::Exception::Exception(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int) /usr/bin/clickhouse
0x55a396fb534d ? /usr/bin/clickhouse
0x55a39a9228dd DB::CSVRowInputFormat::readPrefix() /usr/bin/clickhouse
0x55a39ad2642b DB::IRowInputFormat::generate() /usr/bin/clickhouse
0x55a39a8f5dde DB::ISource::work() /usr/bin/clickhouse
0x55a39a8d1385 DB::InputStreamFromInputFormat::readImpl() /usr/bin/clickhouse
0x55a39a3523c7 DB::IBlockInputStream::read() /usr/bin/clickhouse
0x55a398db4e98 DB::OwningBlockInputStream<DB::ReadBuffer>::readImpl() /usr/bin/clickhouse
0x55a39a3523c7 DB::IBlockInputStream::read() /usr/bin/clickhouse
0x55a39a6c8f98 ? /usr/bin/clickhouse
0x55a39a3523c7 DB::IBlockInputStream::read() /usr/bin/clickhouse
0x55a39aa83efb DB::ExpressionBlockInputStream::readImpl() /usr/bin/clickhouse
0x55a39a3523c7 DB::IBlockInputStream::read() /usr/bin/clickhouse
0x55a39aa83efb DB::ExpressionBlockInputStream::readImpl() /usr/bin/clickhouse
0x55a39a3523c7 DB::IBlockInputStream::read() /usr/bin/clickhouse
0x55a39a4ca3b2 DB::ParallelInputsProcessor<DB::UnionBlockInputStream::Handler>::loop(unsigned long) /usr/bin/clickhouse
0x55a39a4caa75 DB::ParallelInputsProcessor<DB::UnionBlockInputStream::Handler>::thread(std::shared_ptr<DB::ThreadGroupStatus>, unsigned long) /usr/bin/clickhouse
0x55a39a4cb3ed ThreadFromGlobalPool::ThreadFromGlobalPool<void (DB::ParallelInputsProcessor<DB::UnionBlockInputStream::Handler>::*)(std::shared_ptr<DB::ThreadGroupStatus>, unsigned long), DB::ParallelInputsProcessor<DB::UnionBlockInputStream::Handler>*, std::shared_ptr<DB::ThreadGroupStatus>, unsigned long&>(void (DB::ParallelInputsProcessor<DB::UnionBlockInputStream::Handler>::*&&)(std::shared_ptr<DB::ThreadGroupStatus>, unsigned long), DB::ParallelInputsProcessor<DB::UnionBlockInputStream::Handler>*&&, std::shared_ptr<DB::ThreadGroupStatus>&&, unsigned long&)::{lambda()#1}::operator()() const /usr/bin/clickhouse
0x55a39718a47c ThreadPoolImpl<std::thread>::worker(std::_List_iterator<std::thread>) /usr/bin/clickhouse
0x55a39cea6460 ? /usr/bin/clickhouse
0x7efcdf5ea6ba start_thread /lib/x86_64-linux-gnu/libpthread-2.23.so
0x7efcdef1482d __clone /lib/x86_64-linux-gnu/libc-2.23.so

@filimonov
Contributor

@gubinjie #9263

@chenxing-xc
Contributor Author

chenxing-xc commented Feb 27, 2020 via email

@gubinjie

@chenxing-xc Hello, we want to understand the approach you mentioned of using the ClickHouse process's user. If we use this method, will it solve the previous exception? Thank you.

@SenCoder

SenCoder commented Sep 1, 2020

We also get this problem in ClickHouse:

executeQuery: Code: 210, e.displayText() = DB::Exception: Fail to read HDFS file: hdfs://myuser@IP::PORT/xx/xx/xx/xx HdfsIOException: InputStreamImpl: cannot read file: /xx/xx/xx/xx , from position 805306368, size: 1048576. Caused by: HdfsTimeoutException: Read 8 bytes timeout (version 19.16.4.12 (official build)) (from [::ffff:xx.xx.xx.xx]:37456) (in query:
...

I am sure the user has access to HDFS and the network is OK. Is there any configuration setting to change the timeout parameter for reading from HDFS?

thank you~
