Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clickhouse skip_unknown_fields has no effect #22013

Closed
srstrickland opened this issue Dec 10, 2024 · 2 comments · Fixed by #22020
Closed

Clickhouse skip_unknown_fields has no effect #22013

srstrickland opened this issue Dec 10, 2024 · 2 comments · Fixed by #22020
Labels
sink: clickhouse Anything `clickhouse` sink related type: bug A code related bug.

Comments

@srstrickland
Copy link
Contributor

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Problem

The clickhouse sink option skip_unknown_fields likely has no effect. I say "likely" because it really depends on what the Clickhouse server settings are. The problem is that this option is enabled in Clickhouse by default.

The way this feature is implemented, vector will only set option input_format_skip_unknown_fields=1 when skip_unknown_fields is true. If a user explicitly sets it to false, they might expect the behavior to be "strict"; that any attempts to send data with additional fields will result in an error. But this isn't the case; vector simply won't send the option, and the server setting (default 1) will be used. So with default settings on a Clickhouse instance, there is no way for a user to enable "strict" mode in vector.

Using the "server default setting" should remain an option, so perhaps the configuration needs to:

  • Send input_format_skip_unknown_fields=1 when skip_unknown_fields: true
  • Send input_format_skip_unknown_fields=0 when skip_unknown_fields: false
  • Don't send the parameter at all when this option is not specified, and let the CH server decide what to do

This is a really minor issue, but thought it was worth raising in case anyone else was surprised by the current functionality. It's entirely possible that the Clickhouse default used to be 0, in which case the current implementation makes a bit more sense... but even so, there should be a way to force one way or another from vector's perspective, independent of the CH server settings.

It's likely that the other vector options for the clickhouse sink need to similarly consider what to do for true, false, and not-set, as they are all currently implemented as "only do something if true", and this may lead to unexpected behavior depending on the various levels of defaulting going on (in both vector config & clickhouse config).

Configuration

sinks:
  my_clickhouse:
    type: clickhouse
    ...
    skip_unknown_fields: false

Version

0.43.1

Debug Output

No response

Example Data

No response

Additional Context

No response

References

No response

@srstrickland srstrickland added the type: bug A code related bug. label Dec 10, 2024
@jszwedko jszwedko added the sink: clickhouse Anything `clickhouse` sink related label Dec 11, 2024
@jszwedko
Copy link
Member

Thanks @srstrickland . I agree with your assessment. If unspecified, it shouldn't send the option and use the server default. If specified (either true or false), it should send the value to override the server default. A PR for this would be welcome.

@PriceHiller
Copy link
Contributor

A PR for this would be welcome.

I'll give it a shot. I haven't used clickhouse before, so additional testing will almost certainly be required.

I'm assuming I can just encode that bool value as an Option<bool> and go from there.

PriceHiller added a commit to PriceHiller/vector that referenced this issue Dec 11, 2024
Problem: The Clickhouse sink's `skip_unknown_fields` doesn't follow the Clickhouse
server default. It always sends a value for `skip_unknown_fields`;
furthermore, there is also no way to _disable_ `skip_unknown_fields`
setting a "strict" mode for Clickhouse. We really want to permit either
a default value from the Clickhouse server, meaning we shouldn't specify
`skip_unknown_fields` by default. Otherwise, if a user _wants_ to
specify the strict mode for unknown fields, they should then pass either
`true` or `false` for click house.

Solution: Change the `skip_unknown_fields` value to be of an
`Option<bool>` instead of just a `bool`. This permits using the defaults
provided by the Clickhouse server and doesn't send the
`skip_unknown_fields` value to the server if left unspecified.

See vectordotdev#22013 for the original
issue report.

Closes vectordotdev#22013
PriceHiller added a commit to PriceHiller/vector that referenced this issue Dec 11, 2024
Problem: The Clickhouse sink's `skip_unknown_fields` doesn't follow the Clickhouse
server default. It always sends a value for `skip_unknown_fields`;
furthermore, there is also no way to _disable_ `skip_unknown_fields`
setting a "strict" mode for Clickhouse. We really want to permit either
a default value from the Clickhouse server, meaning we shouldn't specify
`skip_unknown_fields` by default. Otherwise, if a user _wants_ to
specify the strict mode for unknown fields, they should then pass either
`true` or `false` for click house.

Solution: Change the `skip_unknown_fields` value to be of an
`Option<bool>` instead of just a `bool`. This permits using the defaults
provided by the Clickhouse server and doesn't send the
`skip_unknown_fields` value to the server if left unspecified.

See vectordotdev#22013 for the original
issue report.

Closes vectordotdev#22013
PriceHiller added a commit to PriceHiller/vector that referenced this issue Dec 11, 2024
Problem: The Clickhouse sink's `skip_unknown_fields` doesn't follow the Clickhouse
server default. It always sends a value for `skip_unknown_fields`;
furthermore, there is also no way to _disable_ `skip_unknown_fields`
setting a "strict" mode for Clickhouse. We really want to permit either
a default value from the Clickhouse server, meaning we shouldn't specify
`skip_unknown_fields` by default. Otherwise, if a user _wants_ to
specify the strict mode for unknown fields, they should then pass either
`true` or `false` for click house.

Solution: Change the `skip_unknown_fields` value to be of an
`Option<bool>` instead of just a `bool`. This permits using the defaults
provided by the Clickhouse server and doesn't send the
`skip_unknown_fields` value to the server if left unspecified.

See vectordotdev#22013 for the original
issue report.

Closes vectordotdev#22013
PriceHiller added a commit to PriceHiller/vector that referenced this issue Dec 12, 2024
Problem: The Clickhouse sink's `skip_unknown_fields` doesn't follow the Clickhouse
server default. It always sends a value for `skip_unknown_fields`;
furthermore, there is also no way to _disable_ `skip_unknown_fields`
setting a "strict" mode for Clickhouse. We really want to permit either
a default value from the Clickhouse server, meaning we shouldn't specify
`skip_unknown_fields` by default. Otherwise, if a user _wants_ to
specify the strict mode for unknown fields, they should then pass either
`true` or `false` for click house.

Solution: Change the `skip_unknown_fields` value to be of an
`Option<bool>` instead of just a `bool`. This permits using the defaults
provided by the Clickhouse server and doesn't send the
`skip_unknown_fields` value to the server if left unspecified.

See vectordotdev#22013 for the original
issue report.

Closes vectordotdev#22013
github-merge-queue bot pushed a commit that referenced this issue Dec 13, 2024
Problem: The Clickhouse sink's `skip_unknown_fields` doesn't follow the Clickhouse
server default. It always sends a value for `skip_unknown_fields`;
furthermore, there is also no way to _disable_ `skip_unknown_fields`
setting a "strict" mode for Clickhouse. We really want to permit either
a default value from the Clickhouse server, meaning we shouldn't specify
`skip_unknown_fields` by default. Otherwise, if a user _wants_ to
specify the strict mode for unknown fields, they should then pass either
`true` or `false` for click house.

Solution: Change the `skip_unknown_fields` value to be of an
`Option<bool>` instead of just a `bool`. This permits using the defaults
provided by the Clickhouse server and doesn't send the
`skip_unknown_fields` value to the server if left unspecified.

See #22013 for the original
issue report.

Closes #22013
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sink: clickhouse Anything `clickhouse` sink related type: bug A code related bug.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants