Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[pkg/ottl] Validate that all parts of a path are used by a Context #22744

Closed
Tracked by #28892
evan-bradley opened this issue May 24, 2023 · 11 comments · Fixed by #30042
Closed
Tracked by #28892

[pkg/ottl] Validate that all parts of a path are used by a Context #22744

evan-bradley opened this issue May 24, 2023 · 11 comments · Fixed by #30042
Assignees
Labels
enhancement New feature or request pkg/ottl priority:p2 Medium

Comments

@evan-bradley
Copy link
Contributor

Component(s)

pkg/ottl

Is your feature request related to a problem? Please describe.

The contexts don't do any validation that extraneous path parts or indices are passed by a user in a statement. For example, the following paths both just access the name path and ignore the extra parts:

  • name.data
  • name["data"]
  • name[0]

This can be confusing to users, who will get data they don't expect from what is really a broken path.

Describe the solution you'd like

We need to enforce that Contexts use all parts of a path that have been parsed from a statement. While we could rely on the Contexts' path parsers to check this themselves, I think we should try to enforce this in the OTTL parser and relieve Contexts of the burden of enforcing these checks.

The first part of this should involve making the OTTL AST structure (everything in grammar.go) internal to the OTTL and providing Contexts only with structures created by the OTTL parser. This will make additions to the DSL easier (no need to make sweeping changes to the Contexts when changing the grammar) and should make for a cleaner overall structure.

As part of this, I think we should offer Contexts with structs that look something like the below for use in their path parsing functions. Note that these are very rough drafts and will need to have their implementations revised.

type Path struct {
    // paths is for internal tracking of path objects.
    paths []ottl.Path
    
    // Name returns the name of this path.
    Name() string
    
    // Keys gets the first Key for this path. Subkeys can be obtained
    // through Key.Next.
    Keys() Key

    // Next gets the next Path. The second value returns whether
    // there is another Path available.
    // Next gets the Next path by returning paths[0], which has
    // paths[1:] as its internal slice.
    Next() (Path, bool)

    // isComplete is a method to be used by the OTTL parser to
    // tell the parser whether all paths and keys were used by
    // the Context.
    isComplete() bool
}

type Key struct {
    // keys is for internal tracking of path objects.
    keys []ottl.Key

    // String gets a string key, or nil if this isn't a string key.
    String() *string

    // Int gets an int key, or nil if this isn't an int key.
    Int() *int

    // Next gets the next Key. The second value returns whether
    // there is another Key available.
    // Next gets the Next key by returning keys[0], which has
    // keys[1:] as its internal slice.
    Next() (Key, bool)
}

Describe alternatives you've considered

There may be other changes to the internal structure of the OTTL that I haven't considered here that would help accomplish this.

Additional context

No response

@TylerHelmuth
Copy link
Member

@evan-bradley I am fully onboard with finally separating the AST from the Contexts and passing in a standard structure for the contexts to use.

For the actual validation, I want to make sure I am understanding your idea. Is the proposal to create the struct in the Parser, hand it to the Context for use, have the struct keep track of how it is used, and then once the Context is done parsing the path for the Parser to check the struct to see how it was used?

@evan-bradley
Copy link
Contributor Author

Yeah, that's exactly it.

@TylerHelmuth
Copy link
Member

TylerHelmuth commented May 24, 2023

Sounds like a good strategy to me, that'll save any context creator a bunch of work.

@TylerHelmuth TylerHelmuth added the help wanted Extra attention is needed label May 30, 2023
@TylerHelmuth TylerHelmuth self-assigned this Jun 7, 2023
@TylerHelmuth TylerHelmuth removed the help wanted Extra attention is needed label Jun 7, 2023
@TylerHelmuth
Copy link
Member

@evan-bradley ran into an interesting, but common, situation we need to consider with this proposal.

Consider the situation where a multiple paths are supplied, such as body.string for ottllog. If the context checks for Next(), successfully sees one, and then that path's name is not string what should happen?

case "body":
  nextPath, ok := path.Next()
  if ok {
	  if nextPath.Name() == "string" {
		  return accessStringBody(), nil
	  }
          // what should happen here?
  } else {
	  keys := path.Keys()
	  if keys == nil {
		  return accessBody(), nil
	  }
	  return accessBodyKey(*keys), nil
  }

At the moment the situation would cause the pathParser function to return a default fmt.Errorf("invalid path expression %v", path) error, which seems appropriate. But this means that the struct is not handling the incorrect name itself.

These are the error situations I've classified so far

  1. A path name is not recognized
  2. Extra paths are provided but expected
  3. A key is supplied on a path that cannot accept keys

Of these errors, I believe the context must be involved with 1 and 3. I also believe that the context is handling 1 and 3 today; any time a context would accept a second path it will error if the second path is not what it expects.

We can continue to let the contexts handle 1 and 3 and rely on the new grammar-hiding struct to handle case 2. This would spread out the error handling responsibilities tho.

@TylerHelmuth
Copy link
Member

@evan-bradley here is the current API I'm working with as I explore this issue, please review:

func newPath(fields []field) *Path {
	p := newPathHelper(fields)
	if p == nil {
		return nil
	}
	p.fetched = true
	return p
}

func newPathHelper(fields []field) *Path {
	if len(fields) == 0 {
		return nil
	}
	return &Path{
		name:     fields[0].Name,
		key:      newKey(fields[0].Keys),
		nextPath: newPath(fields[1:]),
	}
}

type Path struct {
	name     string
	key      *Key
	nextPath *Path
	fetched  bool
}

func (p *Path) Name() string {
	return p.name
}

func (p *Path) Next() (Path, bool) {
	if p.nextPath == nil {
		return Path{}, false
	}
	p.nextPath.fetched = true
	return *p.nextPath, true
}

func (p *Path) Keys() *Key {
	return p.key
}

func (p *Path) isComplete() error {
	if !p.fetched {
		return fmt.Errorf("path section '%v' was not fetched", p.name)
	}
	if p.nextPath == nil {
		return nil
	}
	return p.nextPath.isComplete()
}

func newKey(keys []key) *Key {
	if len(keys) == 0 {
		return nil
	}
	return &Key{
		s:       keys[0].String,
		i:       keys[0].Int,
		nextKey: newKey(keys[1:]),
	}
}

// map value manipulation in `contexts/internal` code and tests requires being able to create a new Key.
// Also added `Set*` functions.
func NewEmptyKey() Key {
	return Key{}
}

type Key struct {
	s       *string
	i       *int64
	nextKey *Key
}

func (k *Key) String() *string {
	return k.s
}

func (k *Key) Int() *int64 {
	return k.i
}

func (k *Key) Next() *Key {
	if k.nextKey == nil {
		return nil
	}
	return k.nextKey
}

func (k *Key) SetString(s *string) {
	k.s = s
}

func (k *Key) SetInt(i *int64) {
	k.i = i
}

func (k *Key) SetNext(nk *Key) {
	k.nextKey = nk
}

@evan-bradley
Copy link
Contributor Author

These are the error situations I've classified so far

  1. A path name is not recognized
  2. Extra paths are provided but expected
  3. A key is supplied on a path that cannot accept keys

Of these errors, I believe the context must be involved with 1 and 3. I also believe that the context is handling 1 and 3 today; any time a context would accept a second path it will error if the second path is not what it expects.

We can continue to let the contexts handle 1 and 3 and rely on the new grammar-hiding struct to handle case 2. This would spread out the error handling responsibilities tho.

The error situations you outlined make sense to me. In my testing, the contexts currently only handle the first error case. In the span context I can access name["key"] without an error being thrown since the handler for the name path doesn't consider whether any keys were passed. I expect that only case 1 will be handled by contexts going forward.

You make a good point about the error handling being spread out a bit, but I think that's okay since it relieves the contexts of the burden of handling cases where extra paths or keys are provided and just focusing on the cases where paths are invalid. Additionally, when I tried to exhaustively handle 2 and 3 in the contexts I couldn't do it without the logic getting out of hand pretty quickly.

I think what distinguishes 1 from 2 and 3 is that you can validate 2 and 3 without knowing anything about the context, as where you can't do that with 1.

@evan-bradley
Copy link
Contributor Author

evan-bradley commented Jun 14, 2023

here is the current API I'm working with as I explore this issue, please review

Overall what you have looks good to me. Two nitpicky thoughts:

  • If we have interfaces for Path and Key, we could have OTTL implement two structs: one that is used and only constructable by the parser, and another that can be constructed anywhere but isn't used by the parser for validation. This would allow getting rid of the Set methods for Key, though I realize it adds some internal complexity.
  • I think we could optimize the structure just a bit if we kept a slice of grammar objects internally in the struct and just updated the current index on calling Next. That way we don't have to iterate through the objects at parsing time and isComplete could just be a quick O(1) check for i == len(slice) - 1. With the current length of our paths though, this is probably a meaningless optimization, so the linked list approach works fine too.

@TylerHelmuth
Copy link
Member

If we have interfaces for Path and Key, we could have OTTL implement two structs: one that is used and only constructable by the parser, and another that can be constructed anywhere but isn't used by the parser for validation. This would allow getting rid of the Set methods for Key, though I realize it adds some internal complexity.

I hate those Set methods, I'll look into this.

I think we could optimize the structure just a bit if we kept a slice of grammar objects internally in the struct and just updated the current index on calling Next

Linked list was the first approach I tried bc it felt natural but I can play around with this idea as well.

@github-actions
Copy link
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Aug 14, 2023
@TylerHelmuth TylerHelmuth removed their assignment Aug 14, 2023
@TylerHelmuth
Copy link
Member

I don't currently have time to work on this issue so someone else can pick it up. If it is still around later this year I will pick it back up.

@TylerHelmuth TylerHelmuth added the help wanted Extra attention is needed label Aug 14, 2023
@github-actions
Copy link
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Oct 16, 2023
@TylerHelmuth TylerHelmuth self-assigned this Dec 7, 2023
@TylerHelmuth TylerHelmuth removed the help wanted Extra attention is needed label Dec 7, 2023
TylerHelmuth added a commit that referenced this issue Dec 12, 2023
**Description:**
This is the first PR working towards
#22744.

It adds 2 new interfaces, `path` and `key`, that will be given to
contexts to use when interpreting the paths/keys passed from the
grammar. Eventually we'll be able to use the `isComplete` functions
during parsing to ensure that all parts of the path were used by the
context.

By the end of that issue we'll have removed all public API's of OTTL's
grammar, which is going to cause massive breaking changes to all our
contexts and huge PRs. To help decrease the amount of changes per PR
these new interfaces are not yet exported and not yet implemented. To
see them partially in use checkout
#29703.

**Link to tracking Issue:**

Related to
#22744

**Testing:**

Added unit tests

**Documentation:** <Describe the documentation added.>

I'll add godoc comments once these things are made public

---------

Co-authored-by: Evan Bradley <[email protected]>
TylerHelmuth added a commit that referenced this issue Dec 14, 2023
**Description:** 
In order to support dynamic indexing we need the `key.Name` method to be
able to invoke Getters during hot path execution.

**Link to tracking Issue:** <Issue number if applicable>
Related to
#22744

**Testing:** <Describe what testing was performed and which tests were
added.>
Updated tests
TylerHelmuth added a commit that referenced this issue Dec 15, 2023
This PR publicizes and implements the new `Path` and `Key` interfaces.
These interfaces allow contexts to get access to the parsed paths/keys
without needed to export the grammar's structs. Since everything was
using those structs before, this has significant breaking changes.

This PR should not change any functionality.

Related to
#22744
evan-bradley pushed a commit that referenced this issue Dec 18, 2023
**Description:**
On the continuing quest to unexport the grammar, this PR unexports the
grammar's constants.

Depends on
#29924

Link to tracking Issue:
Related to
#22744
TylerHelmuth added a commit that referenced this issue Dec 18, 2023
**Description:** 
Updates OTTL to be able to detect when a context has not used all parts
of a path and its keys. It only validates that each path/key was grabbed
- it cannot validate that the context used the value "correctly".

An example error message, taken from the unit test, looks like:

```
error while parsing arguments for call to "testing_getsetter": invalid argument at position 0: the path section "not-used" was not used by the context - this likely means you are using extra path sections
```

**Link to tracking Issue:**
Closes
#22744

**Testing:**
Added new unit tests

---------

Co-authored-by: Evan Bradley <[email protected]>
TylerHelmuth added a commit that referenced this issue Dec 18, 2023
**Description:** 
The desire to validate both path segments AND keys led to a bug where a
totally valid statement like

`replace_match(body["metadata"]["uid"], "*", "12345")` 

fails since only `metadata` is checked during parsing; `uid` is checked
during hot-path get/set of the value.

Failing such a statement is not the intention of
#22744
and it was incorrect to fail such a statement. In fact, I believe
validating the key's use in the context will be even more complex once
we introduce dynamic indexing.

For these reasons, this PR removes Key validation for now. If, in the
future, we want to re-add these validations, our Interfaces allow that.

**Link to tracking Issue:** 

#22744

#30051

**Testing:** <Describe what testing was performed and which tests were
added.>
Unit tests

Also we wouldve caught this issue earlier if we had an e2e test that did
complex indexing but unfortunately we did in the transform processor.
All the more reason to implement
#28642.
cparkins pushed a commit to AmadeusITGroup/opentelemetry-collector-contrib that referenced this issue Jan 10, 2024
…try#29883)

**Description:** 
In order to support dynamic indexing we need the `key.Name` method to be
able to invoke Getters during hot path execution.

**Link to tracking Issue:** <Issue number if applicable>
Related to
open-telemetry#22744

**Testing:** <Describe what testing was performed and which tests were
added.>
Updated tests
cparkins pushed a commit to AmadeusITGroup/opentelemetry-collector-contrib that referenced this issue Jan 10, 2024
This PR publicizes and implements the new `Path` and `Key` interfaces.
These interfaces allow contexts to get access to the parsed paths/keys
without needed to export the grammar's structs. Since everything was
using those structs before, this has significant breaking changes.

This PR should not change any functionality.

Related to
open-telemetry#22744
cparkins pushed a commit to AmadeusITGroup/opentelemetry-collector-contrib that referenced this issue Jan 10, 2024
**Description:**
On the continuing quest to unexport the grammar, this PR unexports the
grammar's constants.

Depends on
open-telemetry#29924

Link to tracking Issue:
Related to
open-telemetry#22744
cparkins pushed a commit to AmadeusITGroup/opentelemetry-collector-contrib that referenced this issue Jan 10, 2024
…0042)

**Description:** 
Updates OTTL to be able to detect when a context has not used all parts
of a path and its keys. It only validates that each path/key was grabbed
- it cannot validate that the context used the value "correctly".

An example error message, taken from the unit test, looks like:

```
error while parsing arguments for call to "testing_getsetter": invalid argument at position 0: the path section "not-used" was not used by the context - this likely means you are using extra path sections
```

**Link to tracking Issue:**
Closes
open-telemetry#22744

**Testing:**
Added new unit tests

---------

Co-authored-by: Evan Bradley <[email protected]>
cparkins pushed a commit to AmadeusITGroup/opentelemetry-collector-contrib that referenced this issue Jan 10, 2024
**Description:** 
The desire to validate both path segments AND keys led to a bug where a
totally valid statement like

`replace_match(body["metadata"]["uid"], "*", "12345")` 

fails since only `metadata` is checked during parsing; `uid` is checked
during hot-path get/set of the value.

Failing such a statement is not the intention of
open-telemetry#22744
and it was incorrect to fail such a statement. In fact, I believe
validating the key's use in the context will be even more complex once
we introduce dynamic indexing.

For these reasons, this PR removes Key validation for now. If, in the
future, we want to re-add these validations, our Interfaces allow that.

**Link to tracking Issue:** 

open-telemetry#22744

open-telemetry#30051

**Testing:** <Describe what testing was performed and which tests were
added.>
Unit tests

Also we wouldve caught this issue earlier if we had an e2e test that did
complex indexing but unfortunately we did in the transform processor.
All the more reason to implement
open-telemetry#28642.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request pkg/ottl priority:p2 Medium
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants