
rewrite Cronjob build triggering logic in python; read from influxdb #275

Merged: 52 commits merged into main, Sep 8, 2024

Conversation

alsuren (Collaborator) commented Sep 6, 2024

  • rewrite a bunch of the bash build scripts in python
    • dependabot (completely untested)
    • automatic installation of deps when you run the relevant make rules
    • delete the old bash scripts
  • switch to reading from influxdb cloud directly, instead of redis via the webserver. See Rust-based stats server on fly.io #165
    • rebase Rust-based stats server on fly.io #165 and address comments
    • you now need INFLUXDB_TOKEN to read these stats.
      • I have added one to our github secrets (with read-only creds)
      • I need to put everyone in the influx org (a different one from the one I already invited everyone to: sorry)
      • I can either make a new public read-only token and put it in the repo (we can revoke it if someone eats our rate limits or something), or add a /stats endpoint on the server like the old one had. You can use p1YOOpyPWwrcJ5wqeBqpsIVPGMiKDHg_PUkUxFdwl9lW6W5jHLdxL67Fsz4STm1yDd8pwH42YHuX7ZxjRkOu4w== for now; I will think about revoking it later.
  • not sure where my print() statements' output is disappearing to in the github actions output :(
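On the disappearing print() output: a common cause is that Python block-buffers stdout when it is not attached to a TTY, as in Actions runners, so output can be lost on an abrupt exit or appear out of order relative to subprocess output. A minimal sketch of the usual fixes (the `log` helper is illustrative, not part of this PR; setting `PYTHONUNBUFFERED=1` in the workflow env has a similar effect):

```python
import sys

# Force line-buffering for the whole process (available since Python 3.7).
# Guarded because a redirected or captured stdout may not support reconfigure().
if hasattr(sys.stdout, "reconfigure"):
    sys.stdout.reconfigure(line_buffering=True)

def log(message: str) -> None:
    # flush=True makes each line visible immediately in the Actions log,
    # instead of sitting in Python's stdout buffer.
    print(message, flush=True)

log("checking crate: ripgrep")
```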

```python
if not version:
    return None

url = f"{repo_url}/releases/download/{crate}-{version['vers']}/{crate}-{version['vers']}-{target_arch}.tar.gz"
```
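For context, this check probes the same release-asset URL that the quickinstall client downloads. A hedged sketch of how such a probe could look with `requests` (function names and the session setup are illustrative, not the PR's actual code; assumes `requests` is installed):

```python
import requests

# One shared session so repeated probes reuse pooled connections,
# which is the main win over shelling out to curl per crate.
session = requests.Session()

def build_asset_url(repo_url: str, crate: str, vers: str, target_arch: str) -> str:
    # Mirrors the URL scheme the quickinstall client itself uses.
    return f"{repo_url}/releases/download/{crate}-{vers}/{crate}-{vers}-{target_arch}.tar.gz"

def asset_exists(url: str) -> bool:
    # A HEAD request is enough: we only care whether the tarball is published.
    resp = session.head(url, allow_redirects=True, timeout=10)
    return resp.status_code == 200
```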
NobodyXu (Member) commented Sep 8, 2024

I think using a python library for this would be more readable: https://pypi.org/project/PyGithub/

We could use the github api to get all artifacts and cache them.

For the rate-limit problem, I think we can provide a read-only GitHub token to the github api via secrets.

While secrets.GITHUB_TOKEN would work, it:

  • has a rate limit of 1,000 requests per hour
  • has too many permissions within the repo

If we create a read-only token using a robot account (which I've set up), we get:

  • 5,000 requests per hour
  • read-only permissions, with no access to any private info
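If the PyGithub route were taken, fetching every published asset name once and answering lookups locally might look roughly like this (a sketch only: assumes `pip install PyGithub`, and the repo name and token wiring are illustrative):

```python
import os

def published_asset_names(releases) -> set[str]:
    # Flatten all release assets into one set of file names, so
    # "is crate-vers-arch.tar.gz already built?" becomes a local
    # lookup instead of one HTTP request per crate.
    names: set[str] = set()
    for release in releases:
        for asset in release.get_assets():
            names.add(asset.name)
    return names

def fetch_releases(repo_full_name: str = "cargo-bins/cargo-quickinstall"):
    # Imported lazily so the pure helper above works without PyGithub installed.
    from github import Github
    gh = Github(os.environ.get("GITHUB_TOKEN"))  # a read-only token is enough
    return gh.get_repo(repo_full_name).get_releases()
```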

NobodyXu (Member) commented Sep 8, 2024
@alsuren if you think this is too much work for this PR, we could leave this to a follow-up PR.

alsuren (Collaborator, Author) commented Sep 8, 2024

My main goals in this PR are to stop using the old stats server and to delete 400 lines of bash, so I am going to leave it like this.

I also quite like the symmetry of using exactly the same url that the quickinstall client uses. The old code even used the same http client (curl), but I decided that the connection pooling from requests was too big a win to pass up.

> We could use the github api to get all artifacts and cache it.

I'm a bit wary of a "do a lot of work and rely on caches to save us" approach. So far I've been taking the approach of "do as little work as possible while still making forward progress". I realise that there is a cutover point, though: as we check more crates every hour, we will want some kind of cached pre-filter to reduce the number of http requests we need to make. We would also need to be wary of stale caches: if the cache says the package is there, we can trust that, but if the cache says it's not there, we need to re-check to make sure we didn't build it in the last hour.
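The asymmetric trust described above (a "present" answer is safe to cache indefinitely, while an "absent" answer goes stale) could be sketched like this (class and parameter names are hypothetical, not from this PR):

```python
import time

class BuiltArtifactCache:
    """Positive results are trusted forever; negative results expire,
    because a package that was absent an hour ago may have been built since."""

    def __init__(self, negative_ttl: float = 3600.0):
        self._built: set[str] = set()
        self._missing: dict[str, float] = {}  # key -> time we last saw it missing
        self._negative_ttl = negative_ttl

    def check(self, key: str, probe) -> bool:
        # `probe` is a callable that does the real (expensive) existence check.
        if key in self._built:
            return True  # a published artifact never un-publishes
        last_miss = self._missing.get(key)
        if last_miss is not None and time.monotonic() - last_miss < self._negative_ttl:
            return False  # recent negative result: skip the HTTP request
        if probe(key):
            self._built.add(key)
            self._missing.pop(key, None)
            return True
        self._missing[key] = time.monotonic()
        return False
```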

On another, unrelated note: I think it would be really interesting to gather stats on our build processes as well as our client requests. That would let us make better decisions when we tweak the scheduling algorithm. I'm quite annoyed that the new influxdb cloud org doesn't have good notebooks/dashboards; I really don't want to have to plug it into grafana just to share some graphs.

NobodyXu (Member) commented Sep 8, 2024

Thank you, it LGTM; let's merge this in and switch to influx!

I will update cargo-binstall to use the influx API afterwards and cut a new release, and we also need to update cargo-quickinstall as well.

alsuren (Collaborator, Author) commented Sep 8, 2024

We also need to get #165 merged, since that's what's currently powering the influx stuff (note that the vercel stats server is already forwarding stats to the new server, so as long as we get this done in the next month, it shouldn't make much difference).

@alsuren alsuren merged commit 0626258 into main Sep 8, 2024
33 checks passed
@alsuren alsuren deleted the cronjob-in-python branch September 8, 2024 09:58
NobodyXu (Member) commented Sep 8, 2024

> not sure where my print() statements output is disappearing to in the github actions output :(

For output, we could use grouped log lines, which are more readable:

https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/workflow-commands-for-github-actions#grouping-log-lines

so we can have a

```python
from contextlib import contextmanager

@contextmanager
def group_output(title):
    print(f"::group::{title}", flush=True)
    try:
        yield
    finally:
        print("::endgroup::", flush=True)

with group_output("Collecting popular crates"):
    print(...)
```

The Actions log will then render those lines as a collapsible group (screenshot omitted here).

EliahKagan added a commit to EliahKagan/cargo-quickinstall that referenced this pull request Nov 28, 2024
This points the "supported-targets" link in the readme back to the
`supported-targets` file, which it had originally linked to and
which its text suggests it is still intended to link to.

After cargo-bins#275, the readme link to the supported targets list pointed
to the `trigger-package-build.py` script. This appears to have been
based on the idea of removing the `supported-targets` file. As noted in
https://github.com/cargo-bins/cargo-quickinstall/pull/275/files#r1747798959,
the file was not removed. But the link target was not restored.

As long as the file does exist, it is easier for users reading the
readme to consult it rather than examine a script. In addition, the
link was fully broken since cargo-bins#300 when the script was rewritten and
named `trigger_package_build.py` (with underscores).
NobodyXu pushed a commit that referenced this pull request Nov 28, 2024 (same commit message as above)