-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Webserver and yagna autostart #69
Conversation
if await self._check_if_yagna_is_running(): | ||
logger.info("Yagna service is already running") | ||
else: | ||
await self._run_yagna_service() | ||
await self._run_yagna_payment_fund() # FIXME |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yagna payment fund
is needed only when yagna is launched the first time - or - when it depletes testnet fund (be it ETH or tGLM)
on a regular restart, only yagna payment init
is currently required (and, actually, could be done without if we ported the mechanism currently used in yapapi)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to payment init
.
ray_on_golem/client/client.py
Outdated
@@ -19,6 +25,9 @@ def __init__(self, base_url: URL) -> None: | |||
|
|||
self._session = requests.Session() | |||
|
|||
if not is_running_on_golem_network(): | |||
self._start_webserver() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wondering aloud... is it sensible to put such long-blocking operations into a constructor? my intuition suggests this should be a separate step...
plus, it sounds somewhat unintuitive in general to start a server when you instiantiate a client ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well... It sounded better at first, but good point. Moving this code to node provider fixes the problem of ray logs while starting webserver.
ray_on_golem/client/client.py
Outdated
@@ -19,6 +25,9 @@ def __init__(self, base_url: URL) -> None: | |||
|
|||
self._session = requests.Session() | |||
|
|||
if not is_running_on_golem_network(): | |||
self._start_webserver() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
on a separate note - question - what if the webserver startup fails when it's required? wouldn't we want to raise an explicit exception?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_start_webserver()
already handles explicit exception for that case.
ray_on_golem/client/client.py
Outdated
start_new_session=True, | ||
) | ||
|
||
for _ in range(10): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpicking but possibly such magic values as the number of retries and the interval between them should be extracted to module-level constants ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, added configurable and time based waiting.
ray_on_golem/client/client.py
Outdated
@@ -192,3 +203,62 @@ def _make_request( | |||
raise RayOnGolemClientValidationError( | |||
"Couldn't validate response data", | |||
) from e | |||
|
|||
def _start_webserver(self) -> None: | |||
with cli_logger.group("Ray On Golem webserver"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the name of the logging group should most likely be a constant defined somewhere...
ray_on_golem/client/client.py
Outdated
url=settings.URL_SELF_SHUTDOWN, | ||
request_data=models.SelfShutdownRequestData(), | ||
response_model=models.SelfShutdownResponseData, | ||
error_message="Couldn't send self shutdown request", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
error_message="Couldn't send self shutdown request", | |
error_message="Couldn't send a self-shutdown request", |
ray_on_golem/client/client.py
Outdated
) | ||
|
||
if response.shutdown_state == ShutdownState.NOT_ENABLED: | ||
cli_logger.print("No need to stop webserver, as it was ran externally") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cli_logger.print("No need to stop webserver, as it was ran externally") | |
cli_logger.print("No need to stop the webserver, as it was started externally") |
ray_on_golem/client/client.py
Outdated
cli_logger.print("No need to stop webserver, as it was ran externally") | ||
return | ||
elif response.shutdown_state == ShutdownState.CLUSTER_NOT_EMPTY: | ||
cli_logger.print("No need to stop webserver, as cluster is not empty") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cli_logger.print("No need to stop webserver, as cluster is not empty") | |
cli_logger.print("No need to stop the webserver, as the cluster is not empty") |
ray_on_golem/client/client.py
Outdated
cli_logger.print("No need to stop webserver, as cluster is not empty") | ||
return | ||
|
||
cli_logger.print("Requesting webserver done, will stop soon") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe this way?
cli_logger.print("Requesting webserver done, will stop soon") | |
cli_logger.print("Webserver shutdown request done, will stop soon") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a missing word, but I'm keeping symmetric logs text with line Requesting webserver shutdown...
to keep logs more readable.
ray_on_golem/client/client.py
Outdated
def _is_webserver_running(self) -> bool: | ||
try: | ||
response = self._session.get( | ||
str(self._base_url / URL_HEALTH_CHECK.lstrip("/")), timeout=2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd define this timeout also as some constant on a module level...
(or, we could have them as parameters provided to the class's constructor)
@@ -14,6 +14,12 @@ class NodeState(Enum): | |||
stopping = "stopping" | |||
|
|||
|
|||
class ShutdownState(Enum): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe not now, but if, in the future, we want to support various transitions of the server/cluster state, this might benefit from being a state machine...
ray_on_golem/server/run.py
Outdated
app = create_application(args.port, args.self_shutdown) | ||
web.run_app(app, port=args.port, print=None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, if we're already using the app
's setitem/getitem interface to store arbitrary additional properties, we could store the port
there as well ... and then, there would be just one place where we define the server's port...
app = create_application(args.port, args.self_shutdown) | |
web.run_app(app, port=args.port, print=None) | |
app = create_application(args.port, args.self_shutdown) | |
web.run_app(app, port=app["port"], print=None) |
plus, (not now, just a thought) -> if we're allowing the port
to be chosen, maybe we should also allow override of the host
argument?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, good point, added.
But regarding host
- I've never changed this from 0.0.0.0
to anything else in my whole career. Even more - every time where host was not 0.0.0.0
it causes some problems. Having host in that way is enough for my taste.
from ray_on_golem.exceptions import RayOnGolemError | ||
from ray_on_golem.server.settings import YAGNA_APPKEY | ||
from ray_on_golem.utils import run_subprocess | ||
|
||
logger = logging.getLogger(__name__) | ||
|
||
YAGNA_APPNAME = "ray-on-golem" | ||
YAGNA_API_URL = URL("http://127.0.0.1:7465") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't we want to allow this to be overridden? by an environment variable perhaps?
|
||
async def _stop_yagna_service(self): | ||
if self._yagna_process is None: | ||
logger.info("No need to stop Yagna service, as it was ran externally") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logger.info("No need to stop Yagna service, as it was ran externally") | |
logger.info("No need to stop Yagna service, as it was started externally") |
ray_on_golem/server/views.py
Outdated
shutdown_state = ShutdownState.WILL_SHUTDOWN | ||
|
||
if shutdown_state == ShutdownState.WILL_SHUTDOWN: | ||
logger.info("Received self shutdown request, exiting in 10 seconds...") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again, a twice-hardcoded magic number...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added my comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe it looks good now
What I've done:
yagna payment fund
, as its required each time after yagna restart, due to usage of goerli instead of rinkebyNotable remarks:
yagna payment fund
should be tested / changed while using mainnet, as the is no reason to force the user to wait to collect testnet fundsup
anddown
commands, hence we don't know the true intent. This arises in webserver stopping, node termination is not a last commands that is ran atdown
. I've assumed that 10s delay from