Auto-updating the registry #1855

Closed
svrnm opened this issue Oct 12, 2022 · 8 comments
Labels
question Further information is requested

Comments

@svrnm
Member

svrnm commented Oct 12, 2022

Related: #1852 & #1844

I am wondering if we can find a way to create registry entries for the non-third-party plugins and keep them updated semi-automatically?

@svrnm svrnm added the question Further information is requested label Oct 12, 2022
@cartermp
Contributor

Only thing that comes to mind would be that each relevant repo has some entry that we pull in (via submodules??) and they're on the hook for maintaining it. I don't know if that would be any better than what we do today.

@svrnm
Member Author

svrnm commented Oct 13, 2022

We could use some GitHub workflows for scraping them? Not ideal, but maybe better than yet another submodule?

@chalin
Contributor

chalin commented Oct 13, 2022

/cc @tedsuo and @austinlparker since they were present when we discussed this not too long ago.

@svrnm
Member Author

svrnm commented Oct 13, 2022

> /cc @tedsuo and @austinlparker since they were present when we discussed this not too long ago.

new maintainers, same ideas 🤣

@austinlparker
Member

Not that I'm opposed to this but for context this has been a recurring idea for almost five years now (it was suggested for the OpenTracing registry, which was the precursor to the OpenTelemetry one).

I don't want to prejudice y'all against anything in particular so let me summarize the major discussion points that have come up:

  • Any sort of automated system would require three things, at minimum: a defined schema for registry entries, a store for registry state, and some automated process for updating that store. We currently do have a schema (the registry YAML files) and a store (GitHub), but no automated process.
  • The problem comes when you start making tradeoffs about these things. If we distributed the store (i.e., add the metadata files to the repos where the content originates) then we now need to keep a different sort of state (links to those repos) and the ability to crawl them or some way to programmatically identify where the metadata is located (perhaps through GitHub tags). This is certainly doable, but you pretty quickly get away from something that can be done without investing some amount of computing power.
  • If we wanted to change the store away from files to, say, a database, it would open up options in terms of automatically populating the schema (i.e., looking at other artifact metadata and populating it based off that). It would still carry a coordination burden, though (how do we know where to look?), and because OTel is polyglot, we'd need to support multiple artifact metadata services (maven, crates, go packages, npm, etc.). We'd also need to migrate the existing schema over to this new thing.
  • Ultimately, what's the actual benefit of making those aforementioned tradeoffs? Do we think that it would actually improve discoverability or correctness, since we'd still be relying on maintainers to make this data available and add in whatever tags we need to populate things we can't infer (i.e., is this an instrumentation thing, an extension, a distro, whatever)?
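
To make the polyglot point in the third bullet concrete, here is a minimal sketch of what "support for multiple artifact metadata services" could look like: one URL builder per ecosystem. Everything in it is an assumption for illustration. The function names, the language keys, and the endpoint patterns (as I understand the public npm, crates.io, and Go module proxy endpoints) are not taken from the registry tooling and should be checked against each service's documentation. Maven is left out on purpose, which shows how each additional ecosystem needs its own handling.

```python
from urllib.parse import quote

# Assumption: these endpoint patterns reflect the public metadata services as
# I understand them; they are illustrative and not part of the registry tooling.


def npm_url(package: str) -> str:
    # Scoped names such as "@scope/name" need the slash percent-encoded.
    return "https://registry.npmjs.org/" + quote(package, safe="@")


def crates_url(crate: str) -> str:
    return "https://crates.io/api/v1/crates/" + quote(crate, safe="")


def go_proxy_url(module: str) -> str:
    # The Go module proxy escapes each uppercase letter as "!" + lowercase.
    escaped = "".join("!" + c.lower() if c.isupper() else c for c in module)
    return f"https://proxy.golang.org/{escaped}/@latest"


# Hypothetical language keys; every ecosystem needs its own entry and rules.
METADATA_URL_BUILDERS = {
    "js": npm_url,
    "rust": crates_url,
    "go": go_proxy_url,
}


def metadata_url(language: str, package: str) -> str:
    """Return the metadata URL for a package, or raise if we cannot look it up."""
    try:
        builder = METADATA_URL_BUILDERS[language]
    except KeyError:
        raise ValueError(f"no metadata service configured for {language!r}") from None
    return builder(package)
```

Even this small table leaves the harder questions open: it only says where to look once a package name is known, not how to discover the packages or how to infer tags such as "instrumentation" or "distro".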

@svrnm
Member Author

svrnm commented Oct 13, 2022

Ok, maybe "automatic" was a little bit too much, I understand why this is not feasible.

But let's call this "semi-automatic" for now, i.e. by using a GitHub workflow. Here are a few specific examples:

  • A runner crawls the content of https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/{receiver,exporter,processor} and checks if any of those components is either missing from the registry or has been removed upstream. Then it just raises an issue (or draft PR?) with the basic details.
  • Same for folders like https://github.com/open-telemetry/opentelemetry-ruby-contrib/tree/main/instrumentation/: it crawls the list of items, checks if some were deleted or some were added, and then raises an issue for it.
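
A rough sketch of the check described in these two examples, assuming the GitHub REST "contents" endpoint is used to list a directory and that the set of already-registered component names is available from somewhere (how to derive it from the registry files is left open here). The function names and the issue text format are hypothetical, and this is not the script that was eventually written.

```python
import json
import urllib.request


def list_component_dirs(owner, repo, path, ref="main"):
    """List subdirectory names of a repo folder via the GitHub contents API.

    Performs an unauthenticated network request, so it is subject to the
    anonymous rate limits; a workflow would normally pass a token.
    """
    url = f"https://api.github.com/repos/{owner}/{repo}/contents/{path}?ref={ref}"
    request = urllib.request.Request(
        url, headers={"Accept": "application/vnd.github+json"}
    )
    with urllib.request.urlopen(request) as response:
        items = json.load(response)
    return {item["name"] for item in items if item["type"] == "dir"}


def diff_components(upstream, registered):
    """Return (missing, removed): names to add to, and to drop from, the registry."""
    upstream, registered = set(upstream), set(registered)
    missing = sorted(upstream - registered)
    removed = sorted(registered - upstream)
    return missing, removed


def issue_body(kind, missing, removed):
    """Build plain-text issue content; returns an empty string if nothing changed."""
    if not missing and not removed:
        return ""
    lines = [f"Registry check for {kind} components:"]
    lines += [f"- missing from registry: {name}" for name in missing]
    lines += [f"- removed upstream: {name}" for name in removed]
    return "\n".join(lines)
```

In a workflow this might be wired up as `diff_components(list_component_dirs("open-telemetry", "opentelemetry-collector-contrib", "receiver"), registered_names)`, where `registered_names` is a placeholder for whatever the registry currently contains. Keeping the comparison separate from the network call means the same diff works for the Ruby instrumentation folder or any other listing.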

It's far from perfect, but it might give us some quick updates on the things from the different SIGs.

Of course, this script could also just live in ./scripts and one of us runs it from time to time manually.

@svrnm
Member Author

svrnm commented Oct 25, 2022

Here's a script I created to get the 46 entries generated for #1909:

https://github.com/svrnm/opentelemetry.io/blob/scan-for-registry/scripts/registry-scanner/index.mjs

I will reuse it for processors & exporters alike and then try to see if I can make it fly for some language-specific implementations.

For the receivers it worked about 95% automatically. I needed to touch a few files because they had issues, but overall it took me only a few minutes to get the PR ready.

@svrnm
Member Author

svrnm commented Nov 14, 2022

This is kind of working now.


4 participants