Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ray-on-golem stats command #106

Merged
merged 11 commits into from
Nov 9, 2023
Merged

Add ray-on-golem stats command #106

merged 11 commits into from
Nov 9, 2023

Conversation

lucekdudek
Copy link
Contributor

@lucekdudek lucekdudek commented Nov 7, 2023

  • ray-on-golem now has command positional argument. Old behaviour is moved to ray-on-golem webserver
$ ray-on-golem --help
usage: ray-on-golem [-h] {webserver,network-stats} ...

Ray on Golem's webserver.

positional arguments:
 {webserver,network-stats}

options:
 -h, --help            show this help message and exit
$ ray-on-golem webserver --help
usage: ray-on-golem webserver [-h] [-p PORT] [--self-shutdown] [--registry-stats] [--no-registry-stats] [--no-self-shutdown]

options:
 -h, --help            show this help message and exit
 -p PORT, --port PORT  port for Ray on Golem's webserver to listen on, default: 4578
 --self-shutdown       flag to enable self-shutdown after last node termination, default: False
 --registry-stats      flag to enable collection of Golem Registry stats about resolved images, default: True
 --no-registry-stats
 --no-self-shutdown
  • new command ray-on-golem network-stats
$ ray-on-golem network-stats --help
usage: ray-on-golem network-stats [-h] [-d DURATION] [-log] CLUSTER_CONFIG_FILE

positional arguments:
 CLUSTER_CONFIG_FILE   Cluster config yaml

options:
 -h, --help            show this help message and exit
 -d DURATION, --duration DURATION
                       For how long in minutes to gather stats, default: 5
 -log, --enable-logging
                       flag to enable logging, default: False
$ ray-on-golem network-stats golem-cluster.dev.yaml
Gathering stats data for 5 minutes...
Gathering stats data done

Proposals count:
Initial: 81
Not blacklisted: 81
Passed Reject if max_average_usage_cost exceeds 1.5: 81
Passed Reject if price_initial exceeds 0.5: 81
Passed Reject if price_cpu_sec exceeds 0.0005: 81
Passed Reject if price_duration_sec exceeds 0.0005: 81
Scored: 81
Negotiated: 13

Negotiation errors:
Failed to send proposal response! Request timed out: 45
Failed to send proposal response! 500: {"message":"Failed to send response for Proposal [***]. Error: Countering Proposal [***] GSB error: GSB failure: Net: error forwarding message: Failed to find Node on relay server: Request timed out after 3000 ms.."}: 14
Failed to send proposal response! 500: {"message":"Failed to send response for Proposal [***]. Error: Countering Proposal [***] GSB error: GSB failure: Net: error forwarding message: Socket closed.."}: 5
Failed to send proposal response! 500: {"message":"Failed to send response for Proposal [***]. Error: Countering Proposal [***] GSB error: GSB failure: Net: error forwarding message: Connection timed out.."}: 3
Outbound rejected because: Everyone rule is disabled ; Audited-Payload rule requires manifest signature ; Partner rule requires node descriptor ; : 1
...

@lucekdudek lucekdudek self-assigned this Nov 7, 2023
@lucekdudek lucekdudek requested a review from approxit November 7, 2023 11:54
@mateuszsrebrny
Copy link
Contributor

I would change RUN_TIME to DURATION, very similar, but RUNTIME is looking out of context here.
Also stats seems too general - I would change it to network-stats

I still struggle with the name webserver ;) but this is for different discussion, I would use something like service instead, and call the logs ray-on-golem.log & ray-on-golem-debug.log`

@mateuszsrebrny
Copy link
Contributor

Oh, and I would change #1... #4 to names of properties from yaml that are taken into account there - so it is easier to translate into yaml tweaks

@lucekdudek
Copy link
Contributor Author

I would change RUN_TIME to DURATION, very similar, but RUNTIME is looking out of context here. Also stats seems too general - I would change it to network-stats

I still struggle with the name webserver ;) but this is for different discussion, I would use something like service instead, and call the logs ray-on-golem.log & ray-on-golem-debug.log`

Oh, and I would change #1... #4 to names of properties from yaml that are taken into account there - so it is easier to translate into yaml tweaks

Done, see description @mateuszsrebrny

Copy link
Contributor

@shadeofblue shadeofblue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generally okay, as far as I could tell, please see my comments though...

but the most general observation is that it seems the managers' stack usage requires a considerable amount of boilerplate code ... and the code seems quite complex for a solution that, by design, was supposed to make such constructions simple and straighforward...

@mateuszsrebrny
Copy link
Contributor

maybe we could enhance Gathering stats data... with the duration?
Gathering stats data for xx minutes... 🤔 ?

Copy link
Contributor

@approxit approxit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good, but updated __init__s would be a nice tweak.

Should we make network_stats service separate from golem service?

@approxit approxit force-pushed the ld/golem-network-stats branch from 0aedaac to 13448bc Compare November 9, 2023 12:53
ray_on_golem/network_stats/main.py Outdated Show resolved Hide resolved
ray_on_golem/network_stats/main.py Show resolved Hide resolved
ray_on_golem/network_stats/main.py Show resolved Hide resolved
ray_on_golem/server/main.py Outdated Show resolved Hide resolved
ray_on_golem/server/main.py Outdated Show resolved Hide resolved
ray_on_golem/server/main.py Outdated Show resolved Hide resolved
ray_on_golem/server/main.py Outdated Show resolved Hide resolved
ray_on_golem/server/main.py Show resolved Hide resolved
@approxit approxit merged commit 81d8445 into develop Nov 9, 2023
1 check passed
@approxit approxit deleted the ld/golem-network-stats branch November 9, 2023 15:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants