Remove requirement that new components be presented across multiple PRs #4630

Aneurysm9 · 2022-01-04T16:29:14Z

The current process for adding a new component is not working for many reviewers and approvers.

First PR ends up basically being a rubber stamp PR. The second PR tends to be massive anyways, and splitting out rubber-stamp type content doesn't help me that much, I can skim them and focus more on the code even within a single PR without the second PR feeling any more overwhelming

A PR with only boilerplate often stubs out empty methods for interface implementations and it can be hard to follow what is meant to be reviewed as final code and what is in-progress waiting for the second PR

GitHub will often assign a different reviewer for first and second PRs, making any context lost

Alternatives should be proposed and an update to the policy adopted.

jpkrohling · 2022-01-04T16:49:07Z

Some other arguments having a fully functional first PR for a new component are available at #1952

bogdandrutu · 2022-01-04T23:33:53Z

@jpkrohling @Aneurysm9 @anuraaga the logic of this for new "Components" (Processor/Receiver/etc.) is not to create "rubber stamps" PRs. Let's look at the suggestion to better understand, see https://github.com/open-telemetry/opentelemetry-collector/blob/main/CONTRIBUTING.md#when-adding-a-new-component:

1:

* First PR should include the overall structure of the new component:
  * Readme, configuration, and factory implementation usually using the helper
    factory structs.
  * This PR is usually trivial to review, so the size limit does not apply to
    it.

This PR as you can read should add a "readme" and "configuration" which are the most important user interfaces. They should definitely not be "rubber stamp PR." and if anything I think this is the most important PR out of all. This PR should explain what is the purpose of the new component, and how users can use it. Also this PR is short enough, so if we cannot find sponsors, or the other maintainers have other ideas we don't feel bad throwing this away.

Also if from the PR is not clear enough what the component does, then the user facing documentation and configuration is not clear enough and we should fix that instead of looking at the code to understand what the component does.

2:

* Second PR should include the concrete implementation of the component. If the
  size of this PR is larger than the recommended size consider splitting it in
  multiple PRs.

This is all about internal implementation, and things that can easily be fixed in the future without breaking changes. Also nothing stops the reviewer to split this if massive. I lots of times ask for this to be split into more simpler/easier to review PRs that are focus on one "logic".

3:

* Last PR should enable the new component and add it to the `otelcontribcol`
  binary by updating the `components.go` file. The component must be enabled
  only after sufficient testing, and there is enough confidence in the
  stability and quality of the component.
* Once a new component has been added to the executable, please add the component 
  to the [OpenTelemetry.io registry](https://github.com/open-telemetry/opentelemetry.io#adding-a-project-to-the-opentelemetry-registry).

This happens once the implementation PR(s) are merged and usually touches multiple packages (different codeowners). So it is better to keep this as small as possible to not add too much work on the "global approvers" (term does not exists, this is usually the maintainers that need to handle it), and focus on minimum changes possible when having a PR across multiple packages.

anuraaga · 2022-01-05T02:43:45Z

This PR as you can read should add a "readme" and "configuration" which are the most important user interfaces. They should definitely not be "rubber stamp PR." and if anything I think this is the most important PR out of all.

My word rubber stamp was probably not correct - I agree that this is an important part of the component. But as the comment mentions, it is "usually trivial to review", meaning yes it needs to be looked through well but doesn't actually take anywhere near as much time / energy as reviewing the business logic. I've also seen many cases of the config looking fine on paper but when comparing with the logic, finding points to improve and change the config.

I just don't find the split to between 1/2 to be that beneficial as a reviewer - actually it's harmful for the reviewer of PR2 who has a good chance of not seeing PR1 and have absolutely no context for thousands of lines of code until they, essentially code review the contents of PR1 again to get caught up. We could set some policy that PR2 reviewer is always manually set to ensure it's the same as PR1, but I don't get much benefit from the split as a reviewer, at least personally speaking.

Also this PR is short enough, so if we cannot find sponsors, or the other maintainers have other ideas we don't feel bad throwing this away.

I don't think this is generally true as the contributor will almost always have written a lot of code. It's not common to feel comfortable beginning the contribution process without having verified the concept is actually sound through actually implementing it. Discussion on issue or a design doc, a more streamlined version of some steps I took with the transform processor, will be better for this gate, not PR1.

This happens once the implementation PR(s) are merged and usually touches multiple packages (different codeowners).

Separating out a PR to enable the component is fine with me, I think there are reasons other than reviewer / contributor experience for this. And a component with unit tests is usually enough to have confidence for the initial contribution before enabling it in the distributions.

bogdandrutu · 2022-01-05T02:55:58Z

I've also seen many cases of the config looking fine on paper but when comparing with the logic, finding points to improve and change the config.

This is a bad thing in my opinion, because from a user perspective they should not look at code, and if from the documentation and config documentation is not clear how to use this component then is not good.

There may be exception where you may be right that the initial author may overengineer the solution and config may change, but usually that is not very important, if it becomes significant I would argue for a simple PR to change the config that can be reviewed by more maintainers/approvers (see next point).

meaning yes it needs to be looked through well but doesn't actually take anywhere near as much time / energy as reviewing the business logic.

It is important that the first PR is reviewed by more maintainers/approvers to avoid duplication, ensure consistency, to understand overall direction, etc. This is the only way we can have more eyes on this and ensure that we call a concept the same across different configs, etc.

I just don't find the split to between 1/2 to be that beneficial as a reviewer - actually it's harmful for the reviewer of PR2 who has a good chance of not seeing PR1 and have absolutely no context for thousands of lines of code until they, essentially code review the contents of PR1 again to get caught up. We could set some policy that PR2 reviewer is always manually set to ensure it's the same as PR1, but I don't get much benefit from the split as a reviewer, at least personally speaking.

With the new "sponsor" role that we added, the Sponsor should review both PRs anyway, so I see no problem.

codeboten · 2022-01-05T16:52:10Z

As an approver, I can see the argument on both sides being valid.

The smaller PRs in general do tend to be easier to review for me even if a larger PR is broken up in different commits. I personally work through a PR, mark it reviewed and move on. With individual commits on a larger PR, I often find that it I will review some changes, then have my attention pulled somewhere else and end up losing track of just where in the PR I was, meaning I will spend additional time retracing my steps. As changes need to be made throughout the review process, it's also often easier to review the scope of the changes in smaller PRs. From this perspective I support the current workflow.

Including all parts of a new component in a single PR does provide the benefit of encompassing all the context required for said component. This means I can use the component in its entirety to validate the original requirement of the component has been fulfilled, rather than having to validate that the configuration works, and then wait until the next PR (or sometimes miss the next PR) to know whether the component works. From this perspective I support a change in the workflow.

The single PR for a component may also be somewhat easier from a contributor standpoint, where I don't have to know on which boundaries to break up my work to submit a new component. That being said, if it's easier to submit a new component, but that submission isn't reviewed, I would find it frustrating as a user.

Large new features do require significant amount of time dedicated from approvers though, and that is tricky with the limited number of approvers we currently have. We have 7 the the collector core and 10 for contrib, which seems pretty limited considering the scope of the components in the collector. I do think, based on experience not on hard data, that smaller PRs tend to get approvals more easily, as I've seen in other projects as well, where new features/components sit in the review queue for weeks (if not months) before anyone has enough time to dedicate to them.

Ultimately, there is no perfect solution here. I would suggest we discuss this at the SIG meeting, to gather additional feedback from the community.

@open-telemetry/collector-approvers @open-telemetry/collector-contrib-approvers

bogdandrutu · 2022-01-05T21:13:46Z

There was a discussion during the SIG meeting and we decided to:

Try a new proposal, create an issue template that will contain all the information necessary to decide about how the component will work, capabilities and usage examples.
Once the issue is approved we will not enforce the current split of the PRs for 1m, to see if that improves.

Aneurysm9 mentioned this issue Jan 4, 2022

Checklist for new components open-telemetry/opentelemetry-collector-contrib#6926

Closed

jpkrohling added discussion-needed Community discussion needed enhancement New feature or request labels Jan 4, 2022

Aneurysm9 mentioned this issue Jan 5, 2022

Transform expand provider to converter #4636

Merged

codeboten mentioned this issue Jan 6, 2022

[documentation] template for new component issue open-telemetry/opentelemetry-collector-contrib#7058

Merged

github-actions bot added the Stale label Jan 7, 2024

github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Feb 6, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove requirement that new components be presented across multiple PRs #4630

Remove requirement that new components be presented across multiple PRs #4630

Aneurysm9 commented Jan 4, 2022

jpkrohling commented Jan 4, 2022 •

edited

Loading

bogdandrutu commented Jan 4, 2022 •

edited

Loading

anuraaga commented Jan 5, 2022

bogdandrutu commented Jan 5, 2022

codeboten commented Jan 5, 2022

bogdandrutu commented Jan 5, 2022

Remove requirement that new components be presented across multiple PRs #4630

Remove requirement that new components be presented across multiple PRs #4630

Comments

Aneurysm9 commented Jan 4, 2022

jpkrohling commented Jan 4, 2022 • edited Loading

bogdandrutu commented Jan 4, 2022 • edited Loading

anuraaga commented Jan 5, 2022

bogdandrutu commented Jan 5, 2022

codeboten commented Jan 5, 2022

bogdandrutu commented Jan 5, 2022

jpkrohling commented Jan 4, 2022 •

edited

Loading

bogdandrutu commented Jan 4, 2022 •

edited

Loading