Get and expose npm + GitHub data for dependency packages #2827

Closed
humphd opened this issue Feb 4, 2022 · 2 comments · Fixed by #3295
Labels: area: dependency visualization, area: redis, type: enhancement

humphd commented Feb 4, 2022

In #2797 we are adding a dependency service, and it currently only includes package names. Let's add a way to expose both npm and GitHub data for a package.

For npm, we can query the npmjs registry programmatically. For example, you can get the package info for React using https://registry.npmjs.org/react. This will return everything we need in JSON format.
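As a rough sketch of what that query could look like: the helper names below (`registryUrl`, `summarize`, `getNpmInfo`) and the choice of fields are assumptions for illustration, not part of the dependency service. The fetch function is passed in so the code can run without network access.

```typescript
// Minimal shape of the fields we read from a registry response.
// The real document contains many more fields than this.
interface Packument {
  name: string;
  description?: string;
  "dist-tags"?: { latest?: string };
  repository?: string | { type?: string; url?: string };
}

// Only the parts of a fetch-style response that this sketch relies on.
type FetchLike = (
  url: string
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

const REGISTRY = "https://registry.npmjs.org";

// Build the registry URL for a package. Scoped names such as "@babel/core"
// keep the leading "@" but have the "/" escaped.
function registryUrl(name: string): string {
  return `${REGISTRY}/${name.replace("/", "%2F")}`;
}

// Hypothetical helper: pick out a handful of fields the service might expose.
function summarize(doc: Packument) {
  const repo =
    typeof doc.repository === "string" ? doc.repository : doc.repository?.url;
  return {
    name: doc.name,
    description: doc.description ?? null,
    latest: doc["dist-tags"]?.latest ?? null,
    repository: repo ?? null,
  };
}

// Fetch and summarize one package using the supplied fetch function.
async function getNpmInfo(name: string, fetchFn: FetchLike) {
  const res = await fetchFn(registryUrl(name));
  if (!res.ok) {
    throw new Error(`npm registry returned ${res.status} for ${name}`);
  }
  return summarize((await res.json()) as Packument);
}
```

On Node 18 or later, the global `fetch` could be passed as `fetchFn`, e.g. `getNpmInfo("react", fetch)`.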

Here are some docs:

The npm data for a package will also give us GitHub URLs that we can use to query the GitHub API via Octokit.
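To go from the npm data to a GitHub API call, we need an owner and repo name. A minimal sketch, assuming the repository URL contains `github.com` (the function name `parseGitHubUrl` is hypothetical, and shorthand forms like `github:user/repo` are not handled):

```typescript
interface GitHubRepo {
  owner: string;
  repo: string;
}

// Extract owner/repo from common repository URL forms, for example:
//   git+https://github.com/facebook/react.git
//   git@github.com:Seneca-CDOT/telescope.git
//   https://github.com/facebook/react/tree/main/packages/react
// Returns null for anything that does not point at github.com.
function parseGitHubUrl(url: string): GitHubRepo | null {
  const match = /github\.com[/:]([^/\s]+)\/([^/\s#]+)/.exec(url);
  if (!match) {
    return null;
  }
  return { owner: match[1], repo: match[2].replace(/\.git$/, "") };
}
```

The resulting `{ owner, repo }` pair is the shape that Octokit's REST methods (for example `octokit.rest.repos.get`) take as parameters.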

I don't know whether it makes sense to do this for every package up front, or only on demand when a package is requested. Maybe at first, we can expose this data via a REST endpoint in the dependency service: GET /projects/:name.

We should cache this data in Redis (available via Satellite), and then we can re-use it for future requests for the same data.
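The caching idea is roughly a "check cache, otherwise load and store" pattern. In the sketch below, `KeyValueCache`, `MemoryCache`, and `getOrLoad` are hypothetical names; the in-memory class only stands in for Redis so the example is self-contained. With a real Redis client, `set` would map to a `SET` with an expiry.

```typescript
// The small slice of cache behaviour this sketch needs.
interface KeyValueCache {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
}

// In-memory stand-in for Redis, with per-key expiry.
class MemoryCache implements KeyValueCache {
  private entries = new Map<string, { value: string; expiresAt: number }>();

  async get(key: string): Promise<string | null> {
    const hit = this.entries.get(key);
    if (!hit || hit.expiresAt <= Date.now()) {
      this.entries.delete(key);
      return null;
    }
    return hit.value;
  }

  async set(key: string, value: string, ttlSeconds: number): Promise<void> {
    this.entries.set(key, { value, expiresAt: Date.now() + ttlSeconds * 1000 });
  }
}

// Return the cached value for `key` if present; otherwise call `load`,
// store its JSON-serialized result for `ttlSeconds`, and return it.
async function getOrLoad<T>(
  cache: KeyValueCache,
  key: string,
  ttlSeconds: number,
  load: () => Promise<T>
): Promise<T> {
  const cached = await cache.get(key);
  if (cached !== null) {
    return JSON.parse(cached) as T;
  }
  const fresh = await load();
  await cache.set(key, JSON.stringify(fresh), ttlSeconds);
  return fresh;
}
```

A handler for GET /projects/:name could then call something like `getOrLoad(cache, "npm:" + name, ttl, () => loadFromNpm(name))`, where the key prefix and `loadFromNpm` are placeholders.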

Down the road, once this is working, we can decide how to use this data (e.g., find issues to work on).

humphd added the type: enhancement and area: dependency visualization labels on Feb 4, 2022
humphd added the area: redis label on Feb 4, 2022
JerryHue self-assigned this on Feb 4, 2022
JerryHue added this to the 2.7 Release milestone on Feb 4, 2022

JerryHue commented Feb 4, 2022

I think we can go with a mix of both.

GitHub data is going to be very dynamic. Some projects constantly get new issues and close old ones.

On the other hand, the npm data is not that dynamic. Unless there are some extreme circumstances (the package gets completely removed from npm), the npm package metadata is going to be the same for a long time.

So I would plan it out like this:

During the initialization stage, we get the package names from deps.txt and pull the data from npm en masse (all async, so as not to block the service). We then hook up some callbacks to pull npm data or GitHub data (usually on a request for a package).

These callbacks will be in charge of fetching and caching the data, as well as making sure we don't send too many requests to npm or GitHub (I'll have to look deeper into the GitHub API to see if we can do something like this).
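One small piece of "not requesting too many times" could be coalescing concurrent requests for the same package, so that only one upstream call is in flight per key. This is a sketch of that single idea, not a full rate limiter; the class name `InFlight` and the TTL values are assumptions chosen only to reflect that npm data changes slowly and GitHub data changes quickly.

```typescript
// Assumed cache lifetimes, in seconds: long for npm, short for GitHub.
const TTL_SECONDS = {
  npm: 24 * 60 * 60,
  github: 10 * 60,
} as const;

// Shares one pending promise among all callers asking for the same key.
class InFlight<T> {
  private pending = new Map<string, Promise<T>>();

  run(key: string, task: () => Promise<T>): Promise<T> {
    const existing = this.pending.get(key);
    if (existing) {
      return existing;
    }
    // Forget the key once the task settles, whether it succeeds or fails,
    // so a later request triggers a new fetch.
    const promise = task().finally(() => {
      this.pending.delete(key);
    });
    this.pending.set(key, promise);
    return promise;
  }
}
```

Combined with a cache, a burst of requests for the same package would result in one fetch, and later requests would be served from the cache until the relevant TTL expires.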

JerryHue modified the milestones: 2.7 Release, 2.8 Release on Feb 4, 2022

humphd commented Feb 4, 2022

I agree, the data is both stable and dynamic, and our fetching and data backend can reflect that.

NOTE: you're describing something very similar to how our RSS feed parser works. We have a queue in Redis managed via JS with BullMQ. It lets us add jobs, and process them over a long period of time using worker processes.

You could easily add another queue to process dependencies/github projects. We'd enqueue the project, and over time, process (or re-process) it and cache or store the data somehow.

Some of this could be done in the parser service itself, and you could consume the results in your dependency service.

cc'ing @TueeNguyen, who is working on the parser service.
