out_s3: add Apache Arrow support #3184
Arrow support is disabled by default because not every server has the required libraries. To enable it, build with:

$ cmake .. -DFLB_ARROW=On
$ cmake --build .

Here is an example that shows how to use Apache Arrow support.

Configuration

[INPUT]
    Name cpu

[OUTPUT]
    Name s3
    Match *
    Region ap-northeast-1
    Bucket fluent-bit-20210308
    total_file_size 1M
    use_put_object On
    upload_timeout 1m
    Compression arrow

Result

Now the uploaded data can be loaded instantly via Arrow's S3 interface.

https://arrow.apache.org/docs/python/filesystems.html

For example, the above configuration produces a very clean tabular time-series:

>>> import pyarrow as pa
>>> table = load_data_from_s3()
>>> print(table)
date cpu_p user_p system_p cpu0.p_cpu cpu0.p_user cpu0.p_system
0 2021-03-08T09:03:03.668251Z 0.0 0.0 0.0 0.0 0.0 0.0
1 2021-03-08T09:03:04.668156Z 1.0 1.0 0.0 1.0 1.0 0.0
2 2021-03-08T09:03:05.668242Z 0.0 0.0 0.0 0.0 0.0 0.0
3 2021-03-08T09:03:06.668269Z 0.0 0.0 0.0 0.0 0.0 0.0
4 2021-03-08T09:03:07.668218Z 0.0 0.0 0.0 0.0 0.0 0.0
5 2021-03-08T09:03:08.739886Z 2.0 1.0 1.0 2.0 1.0 1.0
6 2021-03-08T09:03:09.668181Z 1.0 1.0 0.0 1.0 1.0 0.0
7 2021-03-08T09:03:10.668247Z 1.0 0.0 1.0 1.0 0.0 1.0
8 2021-03-08T09:03:11.668182Z 2.0 2.0 0.0 2.0 2.0 0.0
9 2021-03-08T09:03:12.668275Z 1.0 0.0 1.0 1.0 0.0 1.0
10 2021-03-08T09:03:13.668428Z 0.0 0.0 0.0 0.0 0.0 0.0
11 2021-03-08T09:03:14.668320Z 2.0 2.0 0.0 2.0 2.0 0.0
12 2021-03-08T09:03:15.668256Z 0.0 0.0 0.0 0.0 0.0 0.0
13 2021-03-08T09:03:16.668287Z 0.0 0.0 0.0 0.0 0.0 0.0
14 2021-03-08T09:03:17.668307Z 1.0 1.0 0.0 1.0 1.0 0.0
15 2021-03-08T09:03:18.668257Z 0.0 0.0 0.0 0.0 0.0 0.0
16 2021-03-08T09:03:19.668281Z 0.0 0.0 0.0 0.0 0.0 0.0
17 2021-03-08T09:03:20.668317Z 0.0 0.0 0.0 0.0 0.0 0.0
18 2021-03-08T09:03:21.668231Z 0.0 0.0 0.0 0.0 0.0 0.0
19 2021-03-08T09:03:22.668222Z 0.0 0.0 0.0 0.0 0.0 0.0 |
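
The `load_data_from_s3()` helper used above is not defined anywhere in this thread. A minimal sketch of what it might look like follows. It assumes that the uploaded objects are in the Arrow IPC file (Feather) format and that AWS credentials are available in the environment. The `key` argument and the `object_path` helper are hypothetical; the bucket and region defaults are taken from the configuration above.

```python
def object_path(bucket, key):
    """Build the "bucket/key" path form that pyarrow's S3FileSystem expects."""
    return f"{bucket.strip('/')}/{key.lstrip('/')}"


def load_data_from_s3(key, bucket="fluent-bit-20210308", region="ap-northeast-1"):
    """Read one uploaded object into a pyarrow.Table.

    Assumption: the object is an Arrow IPC file (Feather V2). `key` is
    whatever object key Fluent Bit used for the upload (hypothetical here).
    """
    # Imported lazily so object_path() stays usable without pyarrow installed.
    from pyarrow import fs, ipc

    s3 = fs.S3FileSystem(region=region)
    with s3.open_input_file(object_path(bucket, key)) as f:
        # Read every record batch in the file into a single table.
        return ipc.open_file(f).read_all()
```

If pandas is installed, calling `.to_pandas()` on the returned table should give a DataFrame similar to the one printed above.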
CC @zhonghui12 |
I'm wondering if we can just add the build requirements to the build server; it seems like that could help with adoption of this feature. I'm thinking of how the tensorflow filter is rarely used because a user needs to build with specific settings turned on in order to enable it. |
+1 to @agup006's comment/question: will this be included by default in the upstream distro/build? |
Aside from the question of gating this behind a build flag that defaults to false, the code LGTM
@kou Do you have any comments on this point? The basic problem here is that enabling Arrow support makes Fluent Bit executable |
If we enable Apache Arrow support by default, we should use Apache Arrow C++ directly instead of using via Apache Arrow GLib. If we use Apache Arrow GLib, we need We can use |
Force-pushed from 1dc22ef to 9287d10
@edsiper I updated this patch accordingly. BTW, I discussed enabling Apache Arrow support by default with @kou: there is a future plan to create a C-friendly library within the Apache Arrow project. |
Force-pushed from 9287d10 to 03bcb43
Force-pushed from 7a2c4db to 50443bb
@kou I applied your feedback. Please approve this PR if you are fine with it. |
This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days. |
@fujimotos what is needed to get this feature finished? |
@PettitWesley Sorry, I've been a bit absent from Fluent Bit recently. I'm gonna find some time today to finish this PR! So WFM. |
Force-pushed from 50443bb to fdb7147
@kou Thank you! I have fixed that. |
+1
@PettitWesley I believe this PR is mergeable now. |
fluent/fluent-bit-docs/pull/523 is the documentation patch for the feature. |
@fujimotos Awesome! @edsiper You requested changes; can you approve/re-review so we can get this merged? |
This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days. |
@edsiper @fujimotos What do we need to get this merged? |
Apache Arrow is an efficient columnar data format that is suitable
for statistical analysis, and popular in machine learning community.

https://arrow.apache.org/

With this patch merged, users now can specify 'arrow' as the
compression type like this:

[OUTPUT]
    Name s3
    Bucket some-bucket
    total_file_size 1M
    use_put_object On
    Compression arrow

which makes Fluent Bit convert the request buffer into Apache Arrow
format before uploading.

Signed-off-by: Fujimoto Seiji <[email protected]>
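
To illustrate what "columnar" means here, the sketch below pivots a buffer of newline-delimited JSON records into one list per field. It is a conceptual illustration in plain Python only: the conversion in this patch is performed inside Fluent Bit using the Apache Arrow libraries. The function name `records_to_columns`, the sample buffer, and the assumption that the request buffer holds one JSON record per line are all made up for this example.

```python
import json


def records_to_columns(buffer):
    """Pivot newline-delimited JSON records into a dict of column lists."""
    columns = {}
    rows = [json.loads(line) for line in buffer.splitlines() if line.strip()]
    for i, row in enumerate(rows):
        for name, value in row.items():
            # Pad with None if this field was absent from earlier rows.
            columns.setdefault(name, [None] * i).append(value)
        for values in columns.values():
            # Pad fields that this row did not provide.
            if len(values) <= i:
                values.append(None)
    return columns


# Hypothetical sample resembling two records from the cpu input.
buffer = (
    '{"date": "2021-03-08T09:03:03.668251Z", "cpu_p": 0.0, "user_p": 0.0}\n'
    '{"date": "2021-03-08T09:03:04.668156Z", "cpu_p": 1.0, "user_p": 1.0}\n'
)
columns = records_to_columns(buffer)
print(columns["cpu_p"])
```

Storing each field contiguously like this is what lets analysis tools scan a single metric, such as `cpu_p`, without touching the other fields of every record.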
Force-pushed from 6b62f7b to e210a45
@PettitWesley @edsiper Sorry for being late. I submitted an update, e210a45.
I cleaned up
It's now cleanly mergeable with master.
If it seems okay to @PettitWesley, let's merge this branch. |
ADDENDUM: I can confirm e210a45 works fine with the latest version of Apache Arrow (4.0) |
@fujimotos Thanks!
I merged this PR into mainline via 544fa89. Thanks to all who reviewed and helped with this PR. Closing this PR now. |