-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
out_s3: update for getting upload time from timestamp. #6486
Conversation
8e8f3e7
to
763583e
Compare
plugins/out_s3/s3.c
Outdated
create_time = time(NULL); | ||
flb_plg_error(ctx->ins, "Failed to get timestamp used for the S3 key, " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's make it clear we tried to get the timestamp from the log record(s).
plugins/out_s3/s3.c
Outdated
m_upload->init_time = time(NULL); | ||
if (tms == NULL) { | ||
m_upload->init_time = time(NULL); | ||
flb_plg_error(ctx->ins, "Failed to get timestamp used for the S3 key, " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see comment above
763583e
to
4fa6031
Compare
@matthewfala note from discussion today of we need to double check:
|
plugins/out_s3/s3.c
Outdated
/* unpack msgpack */ | ||
msgpack_unpacked_init(&result); | ||
|
||
/* Get the first record timestamp */ | ||
while (msgpack_unpack_next(&result, | ||
event_chunk->data, | ||
event_chunk->size, &off) == MSGPACK_UNPACK_SUCCESS) { | ||
flb_time_pop_from_msgpack(&tms, &result, &obj); | ||
if (&tms.tm.tv_sec != 0) { | ||
break; | ||
} | ||
} | ||
|
||
msgpack_unpacked_destroy(&result); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We only need to get timestamp for first flush per chunk S3_file. So, we can run this code only if there is no existing chunk S3_file for this tag...
plugins/out_s3/s3.c
Outdated
if (tms == NULL) { | ||
m_upload->init_time = time(NULL); | ||
flb_plg_error(ctx->ins, "Failed to get timestamp from the log record " | ||
"used for the S3 key, use the current time instead."); | ||
} | ||
else { | ||
m_upload->init_time = tms->tm.tv_sec; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Claych Further up in this code we create the S3 key, this is where the init_time timestamp needs to be used... right now we are still using time(NULL)... please address this and test it.
Tested both multipart and put object uploads. Both work. The files have a name that reflects the first log's timestamp Multi Part upload
Single put upload
|
PR Upload Timeout IssueThere may be an issue with this PR related to upload timeouts which needs resolving. Basic Test Case:Fluent Bit Config
FireLens Datajet Config
The following message appears many times per second (once per batch):
However this should not be, because upload_timeout should only be reached once per 10 seconds. Test Case TriggerNote how the test case has "timeOffset" set to -100. That means that the logs are set to the current time -100 seconds. Root Cause AnalysisThe root cause is the following lines of code along with the PR. This section may be okay because it uses init_time, but further investigation is needed to determine if this is correct or not. Note how determining if the time out has occurred, we compare current time to the upload_file->create_time. This is an issue because this PR changes the definition of create_time from the time when the file was created (which is the obvious meaning) to the time of the first log (which can be far back in history and should not be used for upload timeouts.) ResolutionTo resolve this issue we need a separate variable to hold the file's actual creation time vs the time of the first log which are two different things. We need to use actual creation time for the upload timeout detection Further InvestigationWe need to further investigate where create_time is used. If changing the meaning of create_time obscures anything else about the operation of the S3 plugin. Validation & Regression testingFurther validation is needed to determine if the issue has been introduced with this PR. Testing is currently being done to determine this. Here's the logs when Clay's PR is not included. Note how there is no message of the upload timeout. Also note how multipart upload is used.
This indicates that this PR
Further Test Cases4 Test cases need to be created to test the resolution of this issue
Test casesThe following test cases outline tests and expected results, as found in 1.9.10 without the patch. Upload timeout + put object used
NO Upload timeout + Multipart (failure case)
Upload timeout + multipart
No Upload Timeout + Single put object
Initialization test casesMultipart upload on s3 plugin initialization (hard to write a test case for)Run the following for 10 seconds:
Stop firelens datajet. Restart fluent bit
Put Object upload on s3 plugin initialization (hard to write a test case for)Run the following for 10 seconds:
Stop firelens datajet. Restart fluent bit
|
4fa6031
to
fc2752a
Compare
plugins/out_s3/s3.c
Outdated
if (timep == NULL) { | ||
m_upload->init_time = time(NULL); | ||
flb_plg_error(ctx->ins, "Failed to get timestamp from the log record " | ||
"used for the S3 key, use the current time instead."); | ||
} | ||
else { | ||
m_upload->init_time = timep; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you set init_time to timep which is the first_log_timestamp?
This will cause more timeout related errors see
fluent-bit/plugins/out_s3/s3.c
Lines 2161 to 2165 in 760956f
if (m_upload_file != NULL && time(NULL) > | |
(m_upload_file->init_time + ctx->upload_timeout)) { | |
upload_timeout_check = FLB_TRUE; | |
flb_plg_info(ctx->ins, "upload_timeout reached for %s", event_chunk->tag); | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return to origin
plugins/out_s3/s3.c
Outdated
else { | ||
else if (timep == NULL) { | ||
create_time = time(NULL); | ||
flb_plg_error(ctx->ins, "Failed to get timestamp from the log record " | ||
"used for the S3 key, use the current time instead."); | ||
} | ||
else { | ||
create_time = timep; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is create_time being set to timep? timep is the first_log_time. It has nothing to do with the file's create_time which is used by the timers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reverted.
plugins/out_s3/s3.c
Outdated
@@ -50,13 +50,14 @@ static int s3_put_object(struct flb_s3 *ctx, const char *tag, time_t create_time | |||
|
|||
static int put_all_chunks(struct flb_s3 *ctx); | |||
|
|||
static void cb_s3_upload(struct flb_config *ctx, void *data); | |||
static void cb_s3_upload(struct flb_config *ctx, void *data, time_t *timep); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you could change all the timep s to something more descriptive like "first_log_time" that would help greatly to clarify your code and logic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, could you rename each to either:
chunk_first_log_time
or
file_first_log_time
Right now it's really confusing whether you're talking about the chunk's log timestamp (for a single flush) or the file's log timestamp (for multiple flushes) I'm having trouble following the logic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will rename
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think part of the problem with the logic is that the variable names are fuzzy so it's hard to follow what the code is doing. I have some suggestions on clearer variable names that should help.
plugins/out_s3/s3.c
Outdated
if (!upload_file) { | ||
ret = send_upload_request(ctx, chunk, upload_file, | ||
m_upload_file,event_chunk->tag, | ||
flb_sds_len(event_chunk->tag), &tms.tm.tv_sec); | ||
} | ||
else { | ||
ret = send_upload_request(ctx, chunk, upload_file, | ||
m_upload_file,event_chunk->tag, | ||
flb_sds_len(event_chunk->tag), | ||
upload_file->first_log_time); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To simplify logic can you always pass in chunk_first_log_time
to this function, and have the function decide whether to use the upload_file's time or the current time if upload_file is null or not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So you should always just do:
send_upload_request(ctx, chunk, upload_file, m_upload_file,
event_chunk->tag,
flb_sds_len(event_chunk->tag),
chunk_first_log_time
);
You need to add some logic to the send_upload request function and change the signature to
send_upload_request(..., time_t chunk_first_log_time)
plugins/out_s3/s3.c
Outdated
if (!upload_file) { | ||
ret = buffer_chunk(ctx, upload_file, chunk, chunk_size, | ||
event_chunk->tag, flb_sds_len(event_chunk->tag), | ||
&tms.tm.tv_sec); | ||
} | ||
else { | ||
ret = buffer_chunk(ctx, upload_file, chunk, chunk_size, | ||
event_chunk->tag, flb_sds_len(event_chunk->tag), | ||
upload_file->first_log_time); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please move this conditional logic into the buffer_chunk function. It's confusing what the timep variable means.
You should always do:
ret = buffer_chunk(ctx, upload_file, chunk, chunk_size,
event_chunk->tag, flb_sds_len(event_chunk->tag),
time_t chunk_first_log_time
);
src/aws/flb_aws_util.c
Outdated
if (!gmtime_r(&time, &gmt)) { | ||
gmt_tm = gmtime_r(&time, &gmt); | ||
if (!gmt_tm) { | ||
flb_error("[s3_key] Failed to create timestamp."); | ||
goto error; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the point of this code change? I guess it follows convention better maybe? Not sure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I advocate for removing all unnecessary code changes in this pr.
plugins/out_s3/s3_store.h
Outdated
int s3_store_buffer_put(struct flb_s3 *ctx, struct s3_file *s3_file, | ||
const char *tag, int tag_len, | ||
char *data, size_t bytes); | ||
char *data, size_t bytes, | ||
time_t *time); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
your declared function signature should match your defined function signature. in otherwords, you need to make sure the .c and .h file functions look the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's time
here and timep
in the other file
plugins/out_s3/s3.c
Outdated
if (!upload_file) { | ||
unit_test_flush(ctx, upload_file, | ||
event_chunk->tag, flb_sds_len(event_chunk->tag), | ||
chunk, chunk_size, | ||
m_upload_file, &tms.tm.tv_sec); | ||
} | ||
else { | ||
unit_test_flush(ctx, upload_file, | ||
event_chunk->tag, flb_sds_len(event_chunk->tag), | ||
chunk, chunk_size, | ||
m_upload_file, upload_file->first_log_time); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please put this logic into the unit_test_flush function. you should always pass in chunk_first_log_time
see other comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make sense
plugins/out_s3/s3.c
Outdated
@@ -50,13 +50,14 @@ static int s3_put_object(struct flb_s3 *ctx, const char *tag, time_t create_time | |||
|
|||
static int put_all_chunks(struct flb_s3 *ctx); | |||
|
|||
static void cb_s3_upload(struct flb_config *ctx, void *data); | |||
static void cb_s3_upload(struct flb_config *ctx, void *data, time_t *timep); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, could you rename each to either:
chunk_first_log_time
or
file_first_log_time
Right now it's really confusing whether you're talking about the chunk's log timestamp (for a single flush) or the file's log timestamp (for multiple flushes) I'm having trouble following the logic
msgpack_unpacked result; | ||
msgpack_object *obj; | ||
size_t off = 0; | ||
struct flb_time tms; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you rename this to:
chunk_first_log_time
TMS could mean
- Current time
- Time of first log in file
- Time of first log in chunk
I think changing this to chunk_first_log_time
would make our logic clearer and help us to avoid/discover logic problems
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
e23cca2
to
d822be8
Compare
plugins/out_s3/s3.c
Outdated
msgpack_object *obj; | ||
size_t off = 0; | ||
struct flb_time tms; | ||
time_t file_first_log_time; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to initialize it to zero
52fe212
to
a8a34c1
Compare
a8a34c1
to
c563aba
Compare
plugins/out_s3/s3.c
Outdated
|
||
/* | ||
* When chunk does not exist, file_first_log_time will be 0. | ||
* This is only for unit tests. Also, this part is important, please don't remove it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But why should it not be removed? You could simplify it by saying, "this prevents unit tests from segfaulting". More description would be good though.
I actually meant though to put a comment in cb_s3_flush
that currently our upload functions require chunk to be non-null. To do it around this line: if ((upload_file != NULL) && (upload_timeout_check == FLB_TRUE || total_file_size_check == FLB_TRUE)) {
that the (upload_file != NULL)
should not be removed.
c563aba
to
f54eadf
Compare
plugins/out_s3/s3.c
Outdated
time_t file_first_log_time = time(NULL); | ||
|
||
/* | ||
* When chunk does not exist, file_first_log_time will be 0. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is no longer true now that we changed the code, please update the comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it'd be ideal if you stated why teh unit test will crash: because the chunk will be null in it
Signed-off-by: Clay Cheng <[email protected]>
f54eadf
to
cd70b5e
Compare
/* File is ready for upload */ | ||
if (upload_timeout_check == FLB_TRUE || total_file_size_check == FLB_TRUE) { | ||
/* File is ready for upload, upload_file != NULL prevents from segfaulting. */ | ||
if ((upload_file != NULL) && (upload_timeout_check == FLB_TRUE || total_file_size_check == FLB_TRUE)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
prevents unit tests from segfaulting, but not a big deal. this comment is okay
Signed-off-by: Clay Cheng [email protected]
Enter
[N/A]
in the box, if an item is not applicable to your change.Testing
Before we can approve your change; please submit the following in a comment:
Upload timeout + put object used
first log time and key format
NO Upload timeout + Multipart
first log time and key format
Upload timeout + multipart
first log time and key format
No Upload Timeout + Single put object
first log time and time format
If this is a change to packaging of containers or native binaries then please confirm it works for all targets.
Documentation
Backporting
Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.