Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[pkg/ottl] Experiment with Factory Function definition of existing Function #16571

Closed
wants to merge 2 commits into from

Conversation

TylerHelmuth
Copy link
Member

Description:
This PR is mainly to have a discussion about how OTTL/transformprocessor should support situations where a functionality is needed both as a factory function and as a standard function. I'm using KeepKeys to center the conversation but there will be other functions/factory functions that fit as well.

(KeepKeys should do regex matching like DeleteAllMatches but let's ignore that for now)

At the moment the OTTL has a function called KeepKeys, referenced as keep_keys in the transform processor, that takes a map and a list of string keys and drops all entries from the map that aren't in the list.

There is also a need to be able to adjust a map returned from parsing (#14938 and #9410). For this we need a Factory Function so that we can do something like merge_maps(attributes, KeepKeys(ParseJSON(body), ["log_type"]), "upsert").

Both keep_keys and KeepKeys have the same intention, but keep_keys returns nil and KeepKeys returns the map that was modified.

Non-exhaustive Options

  1. Introduce "keep keys" functionality as a function and a factory function. The function's implementation can rely on the factory function's. This is the implementation in this PR. This solution require's users to understand the difference between a Factory Function and a Function, which is currently only a conceptual difference; nothing in the grammar enforces this (but it could).
  2. Update the existing keep_keys function to return the value it modifies. This will allow statements like merge_maps(attributes, keep_keys(ParseJSON(body), ["log_type"]), "upsert") but it also allows stuff like set(attributes["test"], keep_keys(resource.attributes, ["log"])). Since keep_keys is not pure, it would allow modifying multiple paths in the same statement, which could get users into bad situations unexpectedly.
  3. Remove keep_keys and only allow modifying the map with the factory function of KeepKeys. This would require KeepKeys to always be used with set. keep_keys(attributes, ["test"]) would become set(attributes, KeepKeys(attributes, ["test"])).

Personally I like option 1 in combination with a grammar/package change to make factory functions a core concept in the language. I don't think I want to go to the extreme of removing standard function's ability to return yet. But I do think we can restrict the grammar's value type to functions that start with a capital letter and the Invocation function name to start with a lower case letter. This starts to enforce a difference between the 2 types of functions (although there is still a problem of the transformprocessor being the one that actually defines the function map, but that can also be adjusted).

@runforesight
Copy link

runforesight bot commented Dec 1, 2022

Foresight Summary

    
Major Impacts

build-and-test-windows duration(4 seconds) has decreased 33 minutes 36 seconds compared to main branch avg(33 minutes 40 seconds).
View More Details

⭕  build-and-test-windows workflow has finished in 4 seconds (33 minutes 36 seconds less than main branch avg.) and finished at 5th Dec, 2022.


Job Failed Steps Tests
windows-unittest-matrix -     🔗  N/A See Details
windows-unittest -     🔗  N/A See Details

✅  tracegen workflow has finished in 8 minutes 34 seconds (⚠️ 5 minutes 53 seconds more than main branch avg.) and finished at 5th Dec, 2022.


Job Failed Steps Tests
build-dev -     🔗  N/A See Details
publish-latest -     🔗  N/A See Details
publish-stable -     🔗  N/A See Details

❌  changelog workflow has finished in 8 minutes 58 seconds (⚠️ 3 minutes 48 seconds more than main branch avg.) and finished at 5th Dec, 2022. 1 job failed.


Job Failed Steps Tests
changelog Ensure ./.chloggen/*.yaml addition(s)     🔗  N/A See Details

✅  check-links workflow has finished in 8 minutes 48 seconds (⚠️ 6 minutes 30 seconds more than main branch avg.) and finished at 5th Dec, 2022.


Job Failed Steps Tests
changed files -     🔗  N/A See Details
check-links -     🔗  N/A See Details

✅  load-tests workflow has finished in 21 minutes 27 seconds (⚠️ 6 minutes 31 seconds more than main branch avg.) and finished at 5th Dec, 2022.


Job Failed Steps Tests
loadtest (TestIdleMode) -     🔗  ✅ 1  ❌ 0  ⏭ 0    🔗 See Details
loadtest (TestTraceAttributesProcessor) -     🔗  ✅ 3  ❌ 0  ⏭ 0    🔗 See Details
loadtest (TestMetric10kDPS|TestMetricsFromFile) -     🔗  ✅ 6  ❌ 0  ⏭ 0    🔗 See Details
loadtest (TestTraceBallast1kSPSWithAttrs|TestTraceBallast1kSPSAddAttrs) -     🔗  ✅ 10  ❌ 0  ⏭ 0    🔗 See Details
loadtest (TestTraceNoBackend10kSPS|TestTrace1kSPSWithAttrs) -     🔗  ✅ 8  ❌ 0  ⏭ 0    🔗 See Details
loadtest (TestMetricResourceProcessor|TestTrace10kSPS) -     🔗  ✅ 12  ❌ 0  ⏭ 0    🔗 See Details
loadtest (TestBallastMemory|TestLog10kDPS) -     🔗  ✅ 19  ❌ 0  ⏭ 0    🔗 See Details
setup-environment -     🔗  N/A See Details

✅  prometheus-compliance-tests workflow has finished in 20 minutes 25 seconds (⚠️ 12 minutes 4 seconds more than main branch avg.) and finished at 5th Dec, 2022.


Job Failed Steps Tests
prometheus-compliance-tests -     🔗  ✅ 21  ❌ 0  ⏭ 0    🔗 See Details

❌  build-and-test workflow has finished in 44 minutes 36 seconds (11 minutes 17 seconds less than main branch avg.) and finished at 5th Dec, 2022. 3 jobs failed.


Job Failed Steps Tests
unittest-matrix (1.19, internal) -     🔗  ✅ 592  ❌ 0  ⏭ 0    🔗 See Details
unittest-matrix (1.18, extension) -     🔗  ✅ 528  ❌ 0  ⏭ 0    🔗 See Details
correctness-traces -     🔗  ✅ 0  ❌ 0  ⏭ 0    🔗 See Details
unittest-matrix (1.18, internal) -     🔗  ✅ 592  ❌ 0  ⏭ 0    🔗 See Details
unittest-matrix (1.19, processor) -     🔗  ✅ 1464  ❌ 0  ⏭ 0    🔗 See Details
unittest-matrix (1.18, processor) -     🔗  ✅ 1464  ❌ 0  ⏭ 0    🔗 See Details
correctness-metrics -     🔗  ✅ 0  ❌ 0  ⏭ 0    🔗 See Details
unittest-matrix (1.19, extension) -     🔗  ✅ 528  ❌ 0  ⏭ 0    🔗 See Details
unittest-matrix (1.19, receiver-0) -     🔗  ✅ 2533  ❌ 0  ⏭ 0    🔗 See Details
unittest-matrix (1.18, receiver-0) -     🔗  ✅ 2532  ❌ 0  ⏭ 0    🔗 See Details
unittest-matrix (1.18, other) -     🔗  ✅ 4363  ❌ 0  ⏭ 0    🔗 See Details
unittest-matrix (1.18, exporter) -     🔗  ✅ 2412  ❌ 0  ⏭ 0    🔗 See Details
unittest-matrix (1.19, exporter) -     🔗  ✅ 2412  ❌ 0  ⏭ 0    🔗 See Details
unittest-matrix (1.19, other) -     🔗  ✅ 4362  ❌ 0  ⏭ 0    🔗 See Details
unittest-matrix (1.18, receiver-1) -     🔗  ✅ 1845  ❌ 0  ⏭ 0    🔗 See Details
integration-tests -     🔗  ✅ 59  ❌ 0  ⏭ 0    🔗 See Details
setup-environment -     🔗  N/A See Details
check-collector-module-version -     🔗  N/A See Details
checks Impi     🔗  N/A See Details
check-codeowners -     🔗  N/A See Details
lint-matrix (receiver-0) -     🔗  N/A See Details
lint-matrix (receiver-1) -     🔗  N/A See Details
lint-matrix (processor) -     🔗  N/A See Details
lint-matrix (exporter) -     🔗  N/A See Details
lint-matrix (extension) -     🔗  N/A See Details
lint-matrix (internal) -     🔗  N/A See Details
lint-matrix (other) Lint     🔗  N/A See Details
unittest-matrix (1.19, receiver-1) -     🔗  ✅ 1845  ❌ 0  ⏭ 0    🔗 See Details
build-examples -     🔗  N/A See Details
lint Interpret result     🔗  N/A See Details
unittest (1.19) -     🔗  N/A See Details
unittest (1.18) -     🔗  N/A See Details
cross-compile -     🔗  N/A See Details
build-package -     🔗  N/A See Details
windows-msi -     🔗  N/A See Details
publish-dev -     🔗  N/A See Details
publish-check -     🔗  N/A See Details
publish-stable -     🔗  N/A See Details

🔎 See details on Foresight

*You can configure Foresight comments in your organization settings page.

Copy link
Contributor

@evan-bradley evan-bradley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this approach overall, and I agree that some guard rails would help prevent confusion. Is there a way to provide better error messages if we make the two function types distinct within the grammar? I find the grammar's error messages to be pretty cryptic in many cases.

return nil, fmt.Errorf("target must be a map but got %T", val)
}

mapVal.RemoveIf(func(key string, value pcommon.Value) bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you make it so that the factory function doesn't mutate the input? I'd like to see what the code overlap looks like if we keep it pure.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya I can try that out.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

@MikeGoldsmith MikeGoldsmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks really good as it both simplifies the logic and enables it be used in other locations. I agree with @evan-bradley's suggestion of the factory itself should not mutate input params 👍🏻

@TylerHelmuth
Copy link
Member Author

@evan-bradley I'll submit a PR with a grammar change to start enforcing function types

@evan-bradley
Copy link
Contributor

I'll submit a PR with a grammar change to start enforcing function types

If the grammar can't be configured to print better error messages, I'd be alright with enforcing it in the parser instead.

@TylerHelmuth
Copy link
Member Author

I am going to close this PR for now in favor of @kentquirk "temp storage" idea. I'll open a PR for that soon. We can come back to this PR in the future if needed.

codeboten pushed a commit that referenced this pull request Jan 18, 2023
We've had discussion recently around how to handle functions and converters (previously factory functions) that share the same functionality (see #16571)

This PR suggests a different approach that should reduce the need to chain converters/functions.

Before we chained converters together because OTTL had no place to store data except for the telemetry payload itself. While attributes could be used, it results in the need for cleanup and potentially unwanted transformations based on condition resolution. This PR introduces the concept of tmp to the logs contexts (and future contexts if we like the solution) that statements can use as a "staging" location for complex transformations.

Before to handle situations like the one in #14938 we would have to chain together functions. Pretending we had KeepKeys converter the statement would look like

merge_maps(attributes, KeepKeys(ParseJSON(body), ["log_type"]), "upsert").

These types of statements are tricky to write and can difficult to comprehend, especially for new users. Each Converter we add on increased the burden. Adding a single extra function, like Flatten, really makes a difference: merge_maps(attributes, Flatten(KeepKeys(ParseJSON(body), ["log_type"]), "."), "upsert")

Co-authored-by: Evan Bradley <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants