-
Notifications
You must be signed in to change notification settings - Fork 110
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor everserver #9367
base: main
Are you sure you want to change the base?
Refactor everserver #9367
Conversation
1f0ef43
to
080ce06
Compare
eed62a1
to
bc30caf
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few comments. Rebasing will probably cause some issues now...
from everest.util import makedirs_if_needed | ||
|
||
|
||
def _get_machine_name() -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is a get_machine_name
in ERT already, can we use that?
In general, there might be other port/authentication related stuff in ERT. Maybe there should be a utils file for that somewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will replace with the one in ert_shared
"running": status.running, | ||
"waiting": status.waiting, | ||
"pending": status.pending, | ||
"complete": status.complete, | ||
"failed": status.failed, | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code is going to change slightly in PR #9342, somebody will have to adapt
uvicorn.run( | ||
self.app, | ||
host="0.0.0.0", | ||
port=self.port, | ||
ssl_keyfile=self.key_path, | ||
ssl_certfile=self.cert_path, | ||
ssl_version=ssl.PROTOCOL_SSLv23, | ||
ssl_keyfile_password=self.key_pw, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Recently a log_level=logging.CRITICAL
was added to that call, otherwise we got log messages in the console. If there is a better solution that would be fine, otherwise maybe add it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will add it here
bc30caf
to
cc87699
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #9367 +/- ##
==========================================
+ Coverage 91.83% 91.86% +0.03%
==========================================
Files 434 434
Lines 26735 26815 +80
==========================================
+ Hits 24551 24633 +82
+ Misses 2184 2182 -2
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
response = requests.post( | ||
url + "/" + START_ENDPOINT, verify=cert, auth=auth, proxies=PROXY |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you consider starting the experiment here:
ert/src/everest/bin/everest_script.py
Line 132 in 6c86e6a
print("Everest server found!") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the difference between everserver.py
and everest_script.py
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something like:
@app.post("/experiments/")
async def submit_experiment(experiment: EverestConfig, ...):
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whats the advantage of doing this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We force more separation by starting the experiment through an api instead of relying on files on disk. We would also like to extend this concept to ert, and having a clear separation helps then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the difference between everserver.py and everest_script.py ?
everserver.py
is the actual server, while everest_script
the client which starts the server and queries for status
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the current client does:
submit_server_to_cluster(path_to_config_file)
wait_for_server
monitor
think in the future we would like:
connect_to_already_running_server
requests.post(server_url, json=everest_config)
monitor
Then we are one step closer to having the experiment server, and more clear separation between the frontend and the backend.
8690547
to
43eedff
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, I was wondering a bit about the structure? Have asked some questions below.
@@ -308,26 +171,57 @@ def main(): | |||
return | |||
|
|||
try: | |||
# wait until the api server is running | |||
is_running = False | |||
while not is_running: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might need some timeouts here? We should be doing something of the same for dark_storage
i guess? Also starting a server there, would it be possible to alight this and that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that is work for another PR if we want to start evereserver as we do the storage server.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But here we can throw an exception after a few retries
class ExperimentRunner(threading.Thread): | ||
def __init__(self, everest_config, state: dict): | ||
super().__init__() | ||
|
||
self.everest_config = everest_config | ||
self.state = state | ||
self.status: Optional[ExperimentRunnerStatus] = None | ||
|
||
def run(self): | ||
run_model = EverestRunModel.create( | ||
self.everest_config, | ||
simulation_callback=partial(_sim_monitor, shared_data=self.state), | ||
optimization_callback=partial(_opt_monitor, shared_data=self.state), | ||
) | ||
|
||
evaluator_server_config = EvaluatorServerConfig( | ||
custom_port_range=range(49152, 51819) | ||
if run_model.ert_config.queue_config.queue_system == QueueSystem.LOCAL | ||
else None | ||
) | ||
|
||
try: | ||
run_model.run_experiment(evaluator_server_config) | ||
self.status = ExperimentRunnerStatus( | ||
status="Experiment finished", exit_code=run_model.exit_code | ||
) | ||
except Exception: | ||
self.status = ExperimentRunnerStatus( | ||
status="Experiment failed", message=traceback.format_exc() | ||
) | ||
|
||
def get_status(self) -> Optional[ExperimentRunnerStatus]: | ||
return self.status |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we consider using a background task for this instead? https://fastapi.tiangolo.com/tutorial/background-tasks/ Would be fairly similar, but can offload more to fastapi.
something like:
class ExperimentTask:
def __init__(self, model: EverestRunModel ) -> None:
self._model = model
self.model_type = str(model.name())
self._events: List[StatusEvents] = []
def cancel(self) -> None:
if self._model is not None:
self._model.cancel()
async def run(self):
loop = asyncio.get_running_loop()
logger.info(f"Starting experiment")
port_range = None
if self._model.queue_system == QueueSystem.LOCAL:
port_range = range(49152, 51819)
evaluator_server_config = EvaluatorServerConfig(custom_port_range=port_range)
simulation_future = loop.run_in_executor(
None,
lambda: self._model.start_simulations_thread(
evaluator_server_config
),
)
then the endpoint can do:
@app.post("/experiments/", response_model=ExperimentOut)
async def submit_experiment(experiment: EverestConfig, background_tasks: BackgroundTasks):
try:
run_model = EverestRunModel.create(
self.everest_config,
simulation_callback=partial(_sim_monitor, shared_data=self.state),
optimization_callback=partial(_opt_monitor, shared_data=self.state),
)
except ValueError as e:
return HTTPException(
status_code=420,
detail="Experiment was not valid, failed with: {e}",
)
task = ExperimentTask(model=run_model)
experiments[experiment_id] = task
background_tasks.add_task(task.run)
return ExperimentOut(id=uuid_here, type=task.model_type)
then the next step after this PR is to extract the events from the run model.
security = HTTPBasic() | ||
|
||
|
||
class EverestServerAPI(threading.Thread): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this code is mostly moved around? What is the advantage of using a class here instead of:
def everserver_thread(output_dir: str, optimization_output_dir: str):
app = FastAPI()
state = {
SIM_PROGRESS_ENDPOINT: {},
STOP_ENDPOINT: False,
}
runner: Optional[ExperimentRunner] = None
authentication = _generate_authentication()
cert_path, key_path, key_pw = _generate_certificate(
ServerConfig.get_certificate_dir(output_dir)
)
host = ert_shared_get_machine_name()
port = _find_open_port(host, lower=5000, upper=5800)
_write_hostfile(
ServerConfig.get_hostfile_path(output_dir), host, port, cert_path, authentication
)
def check_user(credentials: HTTPBasicCredentials) -> None:
if credentials.password != authentication:
raise HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="Invalid credentials",
headers={"WWW-Authenticate": "Basic"},
)
def log(request: Request) -> None:
logging.getLogger("everserver").info(
f"{request.scope['path']} entered from {request.client.host if request.client else 'unknown host'} with HTTP {request.method}"
)
@app.get("/")
def get_status(request: Request, credentials: HTTPBasicCredentials = Depends(security)):
log(request)
check_user(credentials)
...
...
uvicorn.run(
app,
host="0.0.0.0",
port=port,
ssl_keyfile=key_path,
ssl_certfile=cert_path,
ssl_version=ssl.PROTOCOL_SSLv23,
ssl_keyfile_password=key_pw,
log_level=logging.CRITICAL,
)
Personally I find it easier to read the endpoints when they are decorated, but there might be good reasons for making a class instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to make the everserver.py file smaller and the code more readable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no problem with moving it, but think the history between moving/refactoring and adding the new end-point, as it was a bit difficult to tell them apart. Personally I prefer the decorated functions instead of the class approach, as I find them easier to read.
ce686fd
to
cfa67d4
Compare
cfa67d4
to
cfa871f
Compare
I see that there is some rebasing needed in any case. My PR will also cause some need for manually fixing when rebasing, in particular the way optimization exit code is handled. Since you are going to need to resolve things anyway, I will merge my PR, let me know if you need some help with the rebase. |
Sounds good |
Issue
Resolves
Approach
Short description of the approach
(Screenshot of new behavior in GUI if applicable)
git rebase -i main --exec 'pytest tests/ert/unit_tests -n logical -m "not integration_test"'
)When applicable