Requestor doesn't find available provider #427
Comments
Hints from @johny-b for reproducing this issue: try to create a number of clusters (of any service) larger than the number of available providers.
@johny-b @azawlocki should this be blocking the beta.2 release?
@mateuszsrebrny
@azawlocki
@johny-b I've created a test requestor script and run it on a two-provider The script tries to create three clusters, each with a single instance. 10s after two clusters are started, one cluster is stopped. With
With
The difference is that with With
Perhaps this is something the core team should look at? @tworec, what do you think?
Here's the test script:

#!/usr/bin/env python3
import asyncio
import itertools
from datetime import datetime, timedelta

from yapapi import Golem, windows_event_loop_fix
from yapapi.services import Service
from yapapi.log import enable_default_logger, log_summary, log_event_repr, pluralize
from yapapi.payload import vm

# ANSI escape codes used to color the per-cluster output
col_green = "\033[32;1m"
col_cyan = "\033[36;1m"
col_yellow = "\033[33;1m"
col_magenta = "\033[35;1m"
col_default = "\033[0m"


def cluster_color(cluster_num):
    return [col_green, col_cyan, col_yellow][cluster_num % 3]


class SimpleService(Service):
    next_num: int = 1

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Number the instances in creation order so each gets its own color
        self.num = SimpleService.next_num
        self.color = cluster_color(self.num)
        SimpleService.next_num += 1

    @staticmethod
    async def get_payload():
        return await vm.repo(
            image_hash="8b11df59f84358d47fc6776d0bb7290b0054c15ded2d6f54cf634488",
            min_mem_gib=0.5,
            min_storage_gib=2.0,
        )

    async def start(self):
        print(f"{self.color}starting...{col_default}")
        self._ctx.run("/bin/echo", "START")
        await (yield self._ctx.commit())

    async def run(self):
        print(f"{self.color}running...{col_default}")
        for n in itertools.count(1):
            await asyncio.sleep(3)
            self._ctx.run("/bin/echo", "RUN", str(n))
            await (yield self._ctx.commit())

    async def shutdown(self):
        print(f"{self.color}shutting down...{col_default}")
        await asyncio.sleep(1)
        self._ctx.run("/bin/echo", "SHUTDOWN")
        await (yield self._ctx.commit())


async def main(subnet_tag):
    async with Golem(budget=1.0, subnet_tag=subnet_tag) as golem:
        # Three single-instance clusters, one more than the number of providers
        clusters = [
            await golem.run_service(SimpleService),
            await golem.run_service(SimpleService),
            await golem.run_service(SimpleService),
        ]
        two_clusters_started = None
        three_clusters_started = None
        one_cluster_stopped = False

        while True:
            await asyncio.sleep(1)

            # Print a one-line status for all clusters
            status = ""
            for n, cluster in enumerate(clusters):
                color = cluster_color(n + 1)
                status += f"{color}cluster {n + 1}: "
                if cluster.instances:
                    status += ", ".join(
                        f"{s.state.value} on {s.provider_name}" for s in cluster.instances
                    )
                else:
                    status += "no instances"
                status += col_default + "; "
            print(status)

            have_instances = len([c for c in clusters if c.instances])
            if have_instances == 2 and not two_clusters_started:
                two_clusters_started = datetime.now()
            elif have_instances == 3 and not three_clusters_started:
                three_clusters_started = datetime.now()

            # 10s after two clusters have started, stop one to free a provider
            if (
                two_clusters_started and not one_cluster_stopped and
                datetime.now() - two_clusters_started > timedelta(seconds=10)
            ):
                print("Stopping one cluster...")
                [c for c in clusters if c.instances][0].stop()
                one_cluster_stopped = True

            # 10s after the third cluster has started, finish the test
            if (
                three_clusters_started and
                datetime.now() - three_clusters_started > timedelta(seconds=10)
            ):
                break

        print("Stopping all clusters...")
        for c in clusters:
            c.stop()


if __name__ == "__main__":
    import argparse

    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--subnet", default="devnet-beta.2", help="Subnet name; default: %(default)s"
    )
    args = parser.parse_args()

    now = datetime.now().strftime("%Y-%m-%d_%H.%M.%S")
    log_file = f"simple-service-yapapi-{now}.log"

    # This is only required when running on Windows with Python prior to 3.8:
    windows_event_loop_fix()

    enable_default_logger(
        log_file=log_file,
        debug_activity_api=True,
        debug_market_api=True,
        debug_payment_api=True,
    )

    loop = asyncio.get_event_loop()
    task = loop.create_task(main(subnet_tag=args.subnet))
    try:
        loop.run_until_complete(task)
    except KeyboardInterrupt:
        print(
            f"{col_yellow}"
            "Shutting down gracefully, please wait a short while "
            "or press Ctrl+C to exit immediately..."
            f"{col_default}"
        )
        task.cancel()
        try:
            loop.run_until_complete(task)
            print(
                f"{col_yellow}Shutdown completed, thank you for waiting!{col_default}"
            )
        except (asyncio.CancelledError, KeyboardInterrupt):
            pass
@azawlocki
ok, closing then 👯
I gave a demo on today's call. I will attach the logs.
Tell me if you need to run it on your own and I will prepare the code.
I'm pretty sure that at least once I've seen it start, so maybe it is not "doesn't start" but "it takes veeery long to start".
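To tell "doesn't start" apart from "takes very long to start", one option is to time how long it takes until a cluster reports an instance, and give up after a generous timeout. Below is a minimal sketch of such a helper. The name `wait_until`, its parameters and the timeout value in the usage note are assumptions made for illustration and are not part of yapapi.

```python
import asyncio
import time
from typing import Callable, Optional


async def wait_until(
    predicate: Callable[[], bool],
    timeout: float,
    poll_interval: float = 1.0,
) -> Optional[float]:
    """Poll `predicate` until it returns True or `timeout` seconds pass.

    Returns the elapsed time in seconds on success, or None on timeout.
    (Hypothetical helper, not a yapapi API.)
    """
    start = time.monotonic()
    while True:
        if predicate():
            return time.monotonic() - start
        if time.monotonic() - start >= timeout:
            return None
        await asyncio.sleep(poll_interval)
```

Inside a requestor script it could be called as `elapsed = await wait_until(lambda: bool(cluster.instances), timeout=600)`, where `cluster` is the object returned by `golem.run_service(...)`. A `None` result would point to "doesn't start" (within the chosen timeout), while a large `elapsed` value would point to "starts, but slowly".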