Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

flb_aws_util: added function flb_aws_strftime_precision for time output #6328

Merged
merged 3 commits into from
Jan 27, 2023

Conversation

Claych
Copy link
Contributor

@Claych Claych commented Nov 1, 2022


Enter [N/A] in the box, if an item is not applicable to your change.

Testing
Before we can approve your change; please submit the following in a comment:

  • Example configuration file for the change
[INPUT]
    Name                         forward
    Listen                       0.0.0.0
    Port                         24224
[OUTPUT]
    Name                         kinesis_streams
    Match                        *
    region                       us-east-1
    stream                       test-clay-firehose
    time_key                     @timestamp
    time_key_format              %Y-%m-%dT%H:%M:%S.%3N.%L.%L
  • Debug log output from testing the change
    Normal time_key_format
[INPUT]
    Name                         forward
    Listen                       0.0.0.0
    Port                         24224

[OUTPUT]
    Name                         kinesis_streams
    Match                        *
    region                       us-east-1
    stream              test-clay-firehose
    time_key                     @timestamp
    time_key_format              %Y-%m-%dT%H:%M:%S.%3N.%L.%L
{"from":"userA","to":"userB","@timestamp":"2023-01-04T00:39:14.250.250161647.250161647.250161647"}
{"from":"userA","to":"userB","@timestamp":"2023-01-04T00:39:14.250.250229597.250229597.250229597"}
{"from":"userA","to":"userB","@timestamp":"2023-01-04T00:39:14.250.250403404.250403404.250403404"}
{"from":"userA","to":"userB","@timestamp":"2023-01-04T00:39:14.251.251213550.251213550.251213550"}
{"from":"userA","to":"userB","@timestamp":"2023-01-04T00:39:14.251.251456260.251456260.251456260"}
{"from":"userA","to":"userB","@timestamp":"2023-01-04T00:39:14.251.251516580.251516580.251516580"}
{"from":"userA","to":"userB","@timestamp":"2023-01-04T00:39:14.251.251573085.251573085.251573085"}
{"from":"userA","to":"userB","@timestamp":"2023-01-04T00:39:14.251.251810073.251810073.251810073"}
{"from":"userA","to":"userB","@timestamp":"2023-01-04T00:39:14.252.252052307.252052307.252052307"}
{"from":"userA","to":"userB","@timestamp":"2023-01-04T00:39:14.252.252310276.252310276.252310276"}
{"from":"userA","to":"userB","@timestamp":"2023-01-04T00:39:14.253.253333330.253333330.253333330"}

Wrong time_key_format

[INPUT]
    Name                         forward
    Listen                       0.0.0.0
    Port                         24224

[OUTPUT]
    Name                         kinesis_streams
    Match                        *
    region                       us-east-1
    stream              test-clay-firehose
    time_key                     @timestamp
    time_key_format              %Y-%m-%dT%H:%M:%S.%3N.%L.%L-%9N-9N~%9N-99N~%99N-99NN~%9%9NN-9NL~%9N%L

fluent-bit log

[2023/01/04 01:05:02] [ info] [output:kinesis_streams:kinesis_streams.0] worker #0 started
[2023/01/04 01:05:19] [error] [output:kinesis_streams:kinesis_streams.0] Failed to add time_key @timestamp to record, test-clay-firehose
[2023/01/04 01:05:19] [error] [output:kinesis_streams:kinesis_streams.0] Failed to add time_key @timestamp to record, test-clay-firehose
[2023/01/04 01:05:19] [error] [output:kinesis_streams:kinesis_streams.0] Failed to add time_key @timestamp to record, test-clay-firehose
[2023/01/04 01:05:19] [error] [output:kinesis_streams:kinesis_streams.0] Failed to add time_key @timestamp to record, test-clay-firehose
[2023/01/04 01:05:19] [error] [output:kinesis_streams:kinesis_streams.0] Failed to add time_key @timestamp to record, test-clay-firehose
{"from":"userA","to":"userB"}
{"from":"userA","to":"userB"}
{"from":"userA","to":"userB"}
{"from":"userA","to":"userB"}
{"from":"userA","to":"userB"}
{"from":"userA","to":"userB"}
{"from":"userA","to":"userB"}
{"from":"userA","to":"userB"}
  • Attached Valgrind output that shows no leaks or memory corruption was found
==8834== 
==8834== LEAK SUMMARY:
==8834==    definitely lost: 0 bytes in 0 blocks
==8834==    indirectly lost: 0 bytes in 0 blocks
==8834==      possibly lost: 0 bytes in 0 blocks
==8834==    still reachable: 102,912 bytes in 3,432 blocks
==8834==         suppressed: 0 bytes in 0 blocks
==8834== 
==8834== For lists of detected and suppressed errors, rerun with: -s
==8834== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

If this is a change to packaging of containers or native binaries then please confirm it works for all targets.

Documentation

  • Documentation required for this feature

fluent/fluent-bit-docs#981 fluent/fluent-bit-docs#980
Backporting

  • Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

@PettitWesley
Copy link
Contributor

@Claych its kinesis

size_t s;
const char *format = "%Y-%m-%dT%H:%M:%S";
len = strftime(str, maxsize, format, timeptr);
format = strstr(time_key_format, ".%F");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is the user required to put a dot . character in there? Can we support user provided timestamp formats where the milliseconds do not include a dot?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion, customers can add millisecond or nanosecond with time_key_format %Y-%m-%dT%H:%M:%S.%F or %Y-%m-%dT%H:%M:%S.%L. because default setting is %Y-%m-%dT%H:%M:%S so I put a dot . here. Of course we can delete this dot in logic.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goal of time_key_format is to give the user full control over the timestamp format. If we require that milliseconds or nanoseconds are prefaced with a dot then we are restricting the user unnecessarily IMO. Just because that might be the most common or obvious use case does not mean it is the only one.

@Claych Claych force-pushed the clay-kineses-data-stream branch from 13d3554 to a743d21 Compare November 1, 2022 23:52
Comment on lines 115 to 118
format = strstr(time_key_format, ".%L");
if (format) {
s = snprintf(str + len, sizeof(str) - 1 - len,
".%09" PRIu64 "Z",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does the format var do?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because you said we can't put function as argument so I reused the format to avoid that happen

Copy link
Contributor

@PettitWesley PettitWesley Nov 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I am not following here- format is a char *, not a function pointer.

format is set to the return value of strstr but then its not used later in the function... is the goal just to check whether or not the format contains this string? What is the purpose of this variable? Is it a format or something else? What is the most ideal name here?

const char *time_key_format, const struct tm *timeptr,
size_t len, struct flb_time *tms){
size_t s;
const char *format = "%Y-%m-%dT%H:%M:%S";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait is format supposed to be the default format? Sorry I am very confused here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"%Y-%m-%dT%H:%M:%S.%L" => "2022-11 ... 12:30:45.%L" =>

"%Y-%m-%dT%H:%M:%S.%L" => "%Y-%m-%dT%H:%M:%S.23923942332" => strftime

} else {
format = strstr(time_key_format, ".%L");
if (format) {
s = snprintf(str + len, sizeof(str) - 1 - len,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So len is the number of bytes that strftime wrote... if I am understanding this correctly, then it is just appending the nanoseconds or milliseconds at the end and also a "Z" character. Will it always add a "Z" at the end?

Can this code handle timestamps where there is no Z at the end or where the nanoseconds are not at the end? For example. what if the timestamp is backwards to our normal order, with nanosecond, then second, then minute, etc

@Claych Claych temporarily deployed to pr November 2, 2022 10:48 Inactive
@Claych Claych temporarily deployed to pr November 2, 2022 10:48 Inactive
@Claych Claych temporarily deployed to pr November 2, 2022 11:03 Inactive
@Claych Claych force-pushed the clay-kineses-data-stream branch from a743d21 to 38ce39e Compare November 2, 2022 20:26
@Claych Claych changed the title out_kineses_streams: update for time output can support millisecond a… out_kineses_streams: added function get_time_format_string for time output can support millisecond and nanosecond by adding %F or %L to time_key_format. Nov 2, 2022
@Claych Claych changed the title out_kineses_streams: added function get_time_format_string for time output can support millisecond and nanosecond by adding %F or %L to time_key_format. out_kinesis_streams: added function get_time_format_string for time output can support millisecond and nanosecond by adding %F or %L to time_key_format. Nov 2, 2022
@Claych Claych force-pushed the clay-kineses-data-stream branch 3 times, most recently from b9566fb to d4113ad Compare November 3, 2022 18:36
Comment on lines 52 to 55
#define millisecond_length 4
#define nanosecond_length 10
#define milli_string_old "%F"
#define nano_string_old "%L"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these should to be prefixed with FLB and need to have good long descriptive names once they are moved to the flb_aws_util.c

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC this is because in C, everything is global basically, so all these constants have to have unique names I think

{
char millisecond_string[millisecond_length];
char nanosecond_string[nanosecond_length];
/* calculate the length of str, replace %F to 3 digits and %L to 9 digits */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: replace "str" with out_buff

char millisecond_string[millisecond_length];
char nanosecond_string[nanosecond_length];
/* calculate the length of str, replace %F to 3 digits and %L to 9 digits */
out_size = strlen(time_key_format)+count_substring(time_key_format, milli_string_old)+7*count_substring(time_key_format, nano_string_old);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still do not understand the math here, why 7 * the nano string but just 1 * the milli string? We can discuss if needed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is 7 a special number such that it could be a constant?

Comment on lines 136 to 138
mem_for_time_key_format = (char *)calloc(strlen(time_key_format), sizeof(char));
tem_buff_time_str = (char *)calloc(out_size, sizeof(char));
out_buff = (char *)calloc(out_size, sizeof(char));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

every time you alloc, you need to check if it succeeded or not and have error handling

/*
* Adjust the time_key output format according to time_key_format.
*/
static void create_time_format_string(char *out_buff, size_t out_size,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be: char **out_buff, size_t *out_size,

since these are used to get output from the function

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remember what i said about the out_buf pattern? Look up the int flb_gzip_compress(void *in_data, size_t in_len, void **out_data, size_t *out_len) function as an example of this in existing code

snprintf(nanosecond_string, nanosecond_length, "%" PRIu64, (uint64_t) tms->tm.tv_nsec);
for (int i = 0; i < strlen(time_key_format); i++) {
if (!strncmp(time_key_format + i, milli_string_old, strlen(milli_string_old))) {
strcat(tem_buff_time_str, millisecond_string);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's always use strncat since its safer

strcat(tem_buff_time_str, millisecond_string);
i += strlen(milli_string_old) - 1;
} else if (!strncmp(time_key_format + i, nano_string_old, strlen(nano_string_old))) {
strcat(tem_buff_time_str, nanosecond_string);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

strncat plz

@Claych Claych temporarily deployed to pr November 28, 2022 21:32 Inactive
@Claych Claych temporarily deployed to pr November 28, 2022 21:32 Inactive
@Claych Claych temporarily deployed to pr November 28, 2022 21:47 Inactive
@Claych Claych force-pushed the clay-kineses-data-stream branch 2 times, most recently from 5521109 to 8f5a5c7 Compare November 28, 2022 22:06
PettitWesley
PettitWesley previously approved these changes Nov 28, 2022
@Claych Claych temporarily deployed to pr November 30, 2022 20:08 Inactive
@Claych Claych temporarily deployed to pr November 30, 2022 20:08 Inactive
@Claych Claych temporarily deployed to pr November 30, 2022 20:29 Inactive
@Claych Claych force-pushed the clay-kineses-data-stream branch 2 times, most recently from f904861 to 2a109bf Compare December 2, 2022 00:43
@Claych Claych changed the title flb_aws_util: added function flb_aws_strftime_precision for time output can support millisecond and nanosecond by adding %3N and %9N or %L to time_format. flb_aws_util: added function flb_aws_strftime_precision for time output Dec 5, 2022
@Claych Claych force-pushed the clay-kineses-data-stream branch 2 times, most recently from fec66d0 to 8b0fea0 Compare December 12, 2022 19:50
@Claych Claych force-pushed the clay-kineses-data-stream branch from 891b81b to 597b615 Compare December 14, 2022 03:24
@Claych Claych force-pushed the clay-kineses-data-stream branch from b37274c to f97cafc Compare December 28, 2022 01:21
Copy link
Contributor

@PettitWesley PettitWesley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming this is fully tested I think we are good to go.

Copy link
Contributor

@matthewfala matthewfala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good. Please address Wesley's comment though. If this is well tested, then it still has my approval for merge.

out_size = strftime(buf, tmp_parsed_time_str_len,
tmp_parsed_time_str, &timestamp);

/* Check whether tmp_parsed_time_str_len is enough for tmp_out_buff */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please address this comment

if (out_size == 0) {
flb_free(tmp_parsed_time_str);
flb_free(buf);
return 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I missed this before- why is the function returning 0 to indicate failure?

This breaks the pattern of FLB code. We usually use -1 to indicate failure and 0 to indicate success for function integer returns.

I see that strftime returns 0 if it fails, however, to match FLB code practices, I think we should have this function return -1 in that case to make it easy for the caller to understand.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I forgot that size_t is an unsigned type so we have to use 0 as return value for error. So this makes sense.

@Claych Claych temporarily deployed to pr January 11, 2023 19:36 — with GitHub Actions Inactive
@Claych Claych temporarily deployed to pr January 11, 2023 19:36 — with GitHub Actions Inactive
@Claych Claych temporarily deployed to pr January 11, 2023 20:05 — with GitHub Actions Inactive
@edsiper edsiper merged commit f3e480a into fluent:master Jan 27, 2023
sumitd2 pushed a commit to sumitd2/fluent-bit that referenced this pull request Feb 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants