Stream Processing converts strings to numbers #2498
Comments
Hi @angristan, that was intentional in the initial design of Stream Processor. The main motivation was the assumption that clients may mistakenly use stringified numbers in records (for example in JSON objects), and such string-to-int conversion would be helpful in general and allow applying mathematical computations on such fields (similar to However, in your case, @edsiper What do you think?
I see! It makes total sense. Unfortunately, I hit an edge case 🙂
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days. |
This issue was closed because it has been stalled for 5 days with no activity. |
@koleini This should be looked at again. Short GUIDs such as "39hy3a" are being converted to 39 by this code, which is entirely incorrect. This kind of implicit conversion is harmful in the stream/SQL world. An explicit cast function, even one that supports some form of duck typing, would be far more useful. I'm trying to group by short GUIDs and they are being converted all over the place.
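To make the "39hy3a" case concrete, here is a minimal sketch in plain C. It is not the fluent-bit code: the function names lenient_to_long and strict_to_long are hypothetical, and it only assumes that the conversion behaves like a prefix parse, which is consistent with the result reported above. It contrasts a conversion that accepts any numeric prefix with one that succeeds only when the whole string is a number, which is roughly what an explicit cast could enforce.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: lenient conversion. strtol() parses the longest
 * numeric prefix and silently ignores whatever follows it. */
long lenient_to_long(const char *s)
{
    assert(s != NULL);
    return strtol(s, NULL, 10);
}

/* Hypothetical helper: strict conversion. Succeeds only if strtol()
 * consumed the entire string and the value fits in a long.
 * Note: strtol() still accepts leading whitespace and a sign. */
bool strict_to_long(const char *s, long *out)
{
    char *end = NULL;
    long value;

    assert(s != NULL && out != NULL);

    errno = 0;
    value = strtol(s, &end, 10);

    /* end == s: no digits at all; *end != '\0': trailing garbage. */
    if (end == s || *end != '\0' || errno == ERANGE) {
        return false;
    }

    *out = value;
    return true;
}
```

With these helpers, lenient_to_long("39hy3a") yields 39, while strict_to_long("39hy3a", &v) reports failure and leaves the value to be treated as a string.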
The issue
We're using fluent-bit to process some HAProxy logs. Using [PARSER], we extract some fields using a regex. By looking at #310, it seems these fields are typeable, and strings by default.
The output of [PARSER] looks like this:
As we can see, these are all strings.
We use a simple stream task:
Looking at the output, everything seems fine:
But we can already see that return_code has been transformed from a string to an int. We didn't notice because this was convenient for us: we output this to Elasticsearch, and when querying it we would get a number in the JSON response.
However, it became an issue for another field, name, which should be a string. Below, the name should be "12345.67890", a valid name in our system, but fluent-bit transformed it to a float, which we can't process anymore because of the approximation.
Looking at the output right before stream processing, after the [PARSER], we get "name"=>"1234567890" or "name"=>"12345.67890". Integers and floats are not converted at this point, so stream processing is the culprit.
After looking around the code, here's the problem:
fluent-bit/src/stream_processor/flb_sp.c
Lines 339 to 385 in 40a9822
Strings are automatically converted to numbers when possible, which we don't want.
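The following is a small, self-contained C sketch (not taken from flb_sp.c) of why converting a string identifier to a float is lossy. The helper name survives_float_round_trip and the "%.10g" output format are assumptions chosen for illustration; fluent-bit's actual output formatting may differ. The point is only that once "12345.67890" becomes a double, the original text, including its trailing zero, can no longer be recovered reliably.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: converts s to a double, prints it back with
 * 10 significant digits, and reports whether the text is unchanged. */
bool survives_float_round_trip(const char *s)
{
    char buf[64];
    double d;

    assert(s != NULL);

    d = strtod(s, NULL);

    /* %g drops trailing zeros, so "12345.67890" comes back
     * as "12345.6789". */
    snprintf(buf, sizeof(buf), "%.10g", d);

    return strcmp(buf, s) == 0;
}
```

A name such as "12345.67890" fails this check, so any consumer that compares or groups by the original string will no longer match it.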
Our patch
We patched our fluent-bit system here:
fluent-bit/src/stream_processor/flb_sp.c
Lines 372 to 381 in 40a9822
To bypass string conversion (quick & dirty):
And now we explicitly type the fields that should not be strings by using the Types feature (#310) (<- btw documentation is gone).
Now, types are correctly preserved:
(return_code is still an int, as it was in my first snippet, but that's because we type it during the parsing, so we expect it to be an int)
Long-term solution
The automatic string-to-number feature in stream processing makes sense by itself. However, since we can type fields earlier in the pipeline using [PARSER], I'm not sure it is useful; on the contrary, it gets in the way.
My understanding is that this was probably overlooked when the stream processing feature was introduced?
Based on our use-case, I would have sent a PR deleting all the code related to that feature, but I lack knowledge and historical context so I would like the maintainers' input on this. I think it should at least be optional/configurable (it isn't, right?). We maintain our patched build of fluent-bit so we're fine as it is with the one-line patch I added above, but it would be great to fix this behavior upstream.
Hopefully we provided enough details, let us know if we missed something!
Thanks for fluent-bit 🙏