
rewrite Cronjob build triggering logic in python; read from influxdb #275

Merged: 52 commits merged into main, Sep 8, 2024

Conversation

alsuren (Collaborator) commented Sep 6, 2024

  • rewrite a bunch of the bash build scripts in python
    • dependabot (completely untested)
    • automatic installation of deps when you run the relevant make rules
    • delete the old bash scripts
  • switch to reading from influxdb cloud directly, instead of redis via the webserver. See Rust-based stats server on fly.io #165
    • rebase Rust-based stats server on fly.io #165 and address comments
    • you now need INFLUXDB_TOKEN to read these stats.
      • I have added one to our github secrets (with read-only creds)
      • I need to put everyone in the influx org (a different one from the one I already invited everyone to: sorry)
      • I can either make a new public read-only token and put it in the repo (we can revoke it if someone eats our rate limits or something), or add a /stats endpoint on the server like the old one had. You can use p1YOOpyPWwrcJ5wqeBqpsIVPGMiKDHg_PUkUxFdwl9lW6W5jHLdxL67Fsz4STm1yDd8pwH42YHuX7ZxjRkOu4w== for now; I will think about revoking it later.
  • not sure where my print() statements' output is disappearing to in the github actions output :(
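On the disappearing print() output: a common cause is that Python block-buffers stdout when it is not attached to a TTY, as in Actions runners, so output can be lost on an abrupt exit or appear out of order relative to subprocess output. A minimal sketch of the usual fixes (the `log` helper is illustrative, not part of this PR; setting `PYTHONUNBUFFERED=1` in the workflow env has a similar effect):

```python
import sys

# Force line-buffering for the whole process (available since Python 3.7).
# Guarded because a redirected or captured stdout may not support reconfigure().
if hasattr(sys.stdout, "reconfigure"):
    sys.stdout.reconfigure(line_buffering=True)

def log(message: str) -> None:
    # flush=True makes each line visible immediately in the Actions log,
    # instead of sitting in Python's stdout buffer.
    print(message, flush=True)

log("checking crate: ripgrep")
```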

```python
if not version:
    return None

url = f"{repo_url}/releases/download/{crate}-{version['vers']}/{crate}-{version['vers']}-{target_arch}.tar.gz"
```
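For context, this check probes the same release-asset URL that the quickinstall client downloads. A hedged sketch of how such a probe could look with `requests` (function names and the session setup are illustrative, not the PR's actual code; assumes `requests` is installed):

```python
import requests

# One shared session so repeated probes reuse pooled connections,
# which is the main win over shelling out to curl per crate.
session = requests.Session()

def build_asset_url(repo_url: str, crate: str, vers: str, target_arch: str) -> str:
    # Mirrors the URL scheme the quickinstall client itself uses.
    return f"{repo_url}/releases/download/{crate}-{vers}/{crate}-{vers}-{target_arch}.tar.gz"

def asset_exists(url: str) -> bool:
    # A HEAD request is enough: we only care whether the tarball is published.
    resp = session.head(url, allow_redirects=True, timeout=10)
    return resp.status_code == 200
```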
NobodyXu (Member) commented Sep 8, 2024

I think using a python library for this would be more readable: https://pypi.org/project/PyGithub/

We could use the github api to get all artifacts and cache them.

For the rate-limit problem, I think we can provide a read-only GitHub token to the github api via secrets.

While secrets.GITHUB_TOKEN would work, it:

  • has a rate limit of 1,000 requests per hour
  • has too many permissions within the repo

If we create a read-only token using a robot account (which I've set up), we get:

  • 5,000 requests per hour
  • read-only permissions, with no access to any private info
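If the PyGithub route were taken, fetching every published asset name once and answering lookups locally might look roughly like this (a sketch only: assumes `pip install PyGithub`, and the repo name and token wiring are illustrative):

```python
import os

def published_asset_names(releases) -> set[str]:
    # Flatten all release assets into one set of file names, so
    # "is crate-vers-arch.tar.gz already built?" becomes a local
    # lookup instead of one HTTP request per crate.
    names: set[str] = set()
    for release in releases:
        for asset in release.get_assets():
            names.add(asset.name)
    return names

def fetch_releases(repo_full_name: str = "cargo-bins/cargo-quickinstall"):
    # Imported lazily so the pure helper above works without PyGithub installed.
    from github import Github
    gh = Github(os.environ.get("GITHUB_TOKEN"))  # a read-only token is enough
    return gh.get_repo(repo_full_name).get_releases()
```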

NobodyXu (Member) commented Sep 8, 2024
@alsuren if you think this is too much work for this PR, we could leave this to a follow-up PR.

alsuren (Collaborator, Author) commented Sep 8, 2024

My main goals in this PR are to stop using the old stats server and to delete 400 lines of bash, so I am going to leave it like this.

I also quite like the symmetry of using exactly the same url that the quickinstall client uses. The old code even used the same http client (curl), but I decided that the connection pooling from requests was too big a win to pass up.

> We could use the github api to get all artifacts and cache it.

I'm a bit wary of a "do a lot of work and rely on caches to save us" approach. So far I've been taking the approach of "do as little work as possible while still making forward progress". I realise that there is a cutover point, though: as we check more crates every hour, we will want some kind of cached pre-filter to reduce the number of http requests we need to make. We would also need to be wary of stale caches: if the cache says the package is there, we can trust that, but if the cache says it's not there, we need to re-check to make sure we didn't build it in the last hour.
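The asymmetric trust described above (a "present" answer is safe to cache indefinitely, while an "absent" answer goes stale) could be sketched like this (class and parameter names are hypothetical, not from this PR):

```python
import time

class BuiltArtifactCache:
    """Positive results are trusted forever; negative results expire,
    because a package that was absent an hour ago may have been built since."""

    def __init__(self, negative_ttl: float = 3600.0):
        self._built: set[str] = set()
        self._missing: dict[str, float] = {}  # key -> time we last saw it missing
        self._negative_ttl = negative_ttl

    def check(self, key: str, probe) -> bool:
        # `probe` is a callable that does the real (expensive) existence check.
        if key in self._built:
            return True  # a published artifact never un-publishes
        last_miss = self._missing.get(key)
        if last_miss is not None and time.monotonic() - last_miss < self._negative_ttl:
            return False  # recent negative result: skip the HTTP request
        if probe(key):
            self._built.add(key)
            self._missing.pop(key, None)
            return True
        self._missing[key] = time.monotonic()
        return False
```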

On another, unrelated note: I think it would be really interesting to gather stats on our build processes as well as our client requests. That would let us make better decisions when we tweak the scheduling algorithm. I'm quite annoyed that the new influxdb cloud org doesn't have good notebooks/dashboards; I really don't want to have to plug it into grafana just to share some graphs.

NobodyXu (Member) commented Sep 8, 2024

Thank you, it LGTM; let's merge this in and switch to influx!

I will update cargo-binstall to use the influx API afterwards and cut a new release, and we also need to update cargo-quickinstall as well.

alsuren (Collaborator, Author) commented Sep 8, 2024

We also need to get #165 merged, since that's what's currently powering the influx stuff (note that the vercel stats server is already forwarding stats to the new server, so as long as we get this done in the next month, it shouldn't make much difference).

@alsuren alsuren merged commit 0626258 into main Sep 8, 2024
33 checks passed
@alsuren alsuren deleted the cronjob-in-python branch September 8, 2024 09:58
NobodyXu (Member) commented Sep 8, 2024

> not sure where my print() statements output is disappearing to in the github actions output :(

For output, we could use grouped log lines, which are more readable:

https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/workflow-commands-for-github-actions#grouping-log-lines

so we can have a

```python
from contextlib import contextmanager

@contextmanager
def group_output(title):
    print(f"::group::{title}", flush=True)
    try:
        yield
    finally:
        print("::endgroup::", flush=True)

with group_output("Collecting popular crates"):
    print(...)
```

The Actions log will then render those lines as a collapsible group (screenshot omitted here).

EliahKagan added a commit to EliahKagan/cargo-quickinstall that referenced this pull request Nov 28, 2024
This points the "supported-targets" link in the readme back to the
`supported-targets` file, which it had originally linked to and
which its text suggests it is still intended to link to.

After cargo-bins#275, the readme link to the supported targets list pointed
to the `trigger-package-build.py` script. This appears to have been
based on the idea of removing the `supported-targets` file. As noted in
https://github.com/cargo-bins/cargo-quickinstall/pull/275/files#r1747798959,
the file was not removed. But the link target was not restored.

As long as the file does exist, it is easier for users reading the
readme to consult it rather than examine a script. In addition, the
link was fully broken since cargo-bins#300 when the script was rewritten and
named `trigger_package_build.py` (with underscores).
NobodyXu pushed a commit that referenced this pull request Nov 28, 2024 (same commit message as above)