-
Notifications
You must be signed in to change notification settings - Fork 7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HDFS file read support #3617
HDFS file read support #3617
Conversation
|
We can proceed first without |
LICENSE file should be located in the https://github.com/chenxing-xc/ClickHouse-Extras-libhdfs3 root. |
I have added you to the ClickHouse-Extras organization. You can create/move any repositories you want there. |
Ok, I can withdraw this feature, and put new submodules into ClickHouse-Extras. The original goal of "INFILE' feature is to simulate COPY syntax in PG. |
Could you please add an integration test? If you have some troubles, you can instead write step-by-step instruction right here on how this feature can be tested (roll our HDFS in Docker, write files into it, and read them with ClickHouse). |
We should think how we can made this feature more consistent. |
|
@chenxing-xc Maybe:
But this needs implement |
None of the builds succeed. |
What 's the build error? there are some extra dependency mentioned in submodule libhdfs3 which need to be installed( by apt-get). |
I saw the following error in cmake output, "CMake Error at /usr/share/cmake-3.10/Modules/FindPackageHandleStandardArgs.cmake:137 (message): we need to install it |
We have to put all required libraries as contrib submodules. But it's Ok to start testing with system libraries. You can add the required libraries here: |
libuuid is used to simply generate a value. We can provide our own minimal implementation to avoid extra dependency. |
libcurl is used only for interaction with "Key Management Server" (KMS). |
fuzzyFileNames = uri.substr(uriPrefix.length()); | ||
} | ||
|
||
std::vector<String> fuzzyNameList = parseDescription(fuzzyFileNames, 0, fuzzyFileNames.length(), ',' , 100/* hard coded max files */); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's great.
But how {a,b}
and {a|b}
are different?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are the same :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's leave only the first. Because it is compatible with the syntax of supported "globs" in bash.
And we can eventually add the support for globs in StorageFile.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, fixed.
how about replace it with boost::uuid lib? |
|
It's Ok if it doesn't have extra dependencies. We need to just create a wrapper with single function (fake libuuid). |
It's not really necessarily to rewrite parts of library :) |
@chenxing-xc Unfortunately I can't build it locally:
Seems like some misconfiguration in cmake? Do you have any ideas how to fix these problems? I hope it doesn't require installation of libhdfs3 systemwide? |
@@ -42,5 +44,6 @@ class ITableFunction | |||
|
|||
using TableFunctionPtr = std::shared_ptr<ITableFunction>; | |||
|
|||
std::vector<String> parseDescription(const String & description, size_t l, size_t r, char separator, size_t max_addresses); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function doesn't seems too common. Let's move it to separate file?
@@ -89,6 +89,7 @@ endif () | |||
|
|||
option (TEST_COVERAGE "Enables flags for test coverage" OFF) | |||
option (ENABLE_TESTS "Enables tests" ON) | |||
option (ENABLE_INSERT_INFILE "Enables INSERT INFILE syntax" OFF) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to move INSERT INFILE to separate Pull Request?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, can you add libxml2
, libkrb
and libgsasl
to contrib or you need help with this libraries?
@chenxing-xc Where did you get libhdfs3? There are a lot of forks on github and we would like to have commits history in our Clickhouse-Extras repo. |
I got it from [https://github.com/Pivotal-DataFabric/attic-libhdfs3.git] |
@chenxing-xc Congratulations! |
@chenxing-xc Hello, when we use clickhouse to read hdfs files, the following exception often occurs,Can you tell me what problems may cause this exception, and what methods can be used to resolve this exception,thank you!executeQuery: Code: 210, e.displayText() = DB::Exception: Fail to read HDFS file: hdfs://myuser@IP::PORT/user/hive 。。。 Stack trace: 0x55a39713fed0 StackTrace::StackTrace() /usr/bin/clickhouse 0x55a39713fed0 StackTrace::StackTrace() /usr/bin/clickhouse |
this looks like clickhouse cannot access hdfs datanode. Currently we don't
use the "user" in the uri(user@ip...) to access hdfs, instead use
clickhouse process's user.
filimonov <[email protected]> 于2020年2月27日周四 下午1:35写道:
… @gubinjie <https://github.com/gubinjie> #9263
<#9263>
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#3617?email_source=notifications&email_token=AIJ57WDLWFTRX7YJB5ILDG3RE5GLTA5CNFSM4GFLXWU2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOENDAI7A#issuecomment-591791228>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AIJ57WBQTJ22E7FXABGAZWDRE5GLTANCNFSM4GFLXWUQ>
.
|
@chenxing-xc Hello, we want to know what you said using the clickhouse process's user method. If you use this method, will it solve the previous exception? Thank you |
We also get this promblem in clickhouse: executeQuery: Code: 210, e.displayText() = DB::Exception: Fail to read HDFS file: hdfs://myuser@IP::PORT/xx/xx/xx/xx HdfsIOException: InputStreamImpl: cannot read file: /xx/xx/xx/xx , from position 805306368, size: 1048576. Caused by: HdfsTimeoutException: Read 8 bytes timeout (version 19.16.4.12 (official build)) (from [::ffff:xx.xx.xx.xx]:37456) (in query: I am sure the user has got access to hdfs, and the network is OK. Is there any configure to change timeout param for reading HDFS ? thank you~ |
I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en
I would like to add HDFS file read support(based on libhdfs3), this delivery include:
hdfs table function
new insert syntax that server could read file from local disk or hdfs file directly
URI of HDFS is represented as "hdfs://namenodeip:namenodeport/path-of-hdfsfile"
INSERT SYNTAX is as below:
INSERT INTO TABLE FORMAT XXX INFILE 'file_name'
The above syntax is similar to INSERT INTO TABLE SELECT * FROM TABLE_FUNCTION(.....), but it
is more convenient as we don't need to remember table schema and put it into TABLE_FUNCTION.