Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor everserver #9367

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

frode-aarstad
Copy link
Contributor

@frode-aarstad frode-aarstad commented Nov 27, 2024

Issue
Resolves

Approach
Short description of the approach

(Screenshot of new behavior in GUI if applicable)

  • PR title captures the intent of the changes, and is fitting for release notes.
  • Added appropriate release note label
  • Commit history is consistent and clean, in line with the contribution guidelines.
  • Make sure unit tests pass locally after every commit (git rebase -i main --exec 'pytest tests/ert/unit_tests -n logical -m "not integration_test"')

When applicable

  • When there are user facing changes: Updated documentation
  • New behavior or changes to existing untested code: Ensured that unit tests are added (See Ground Rules).
  • Large PR: Prepare changes in small commits for more convenient review
  • Bug fix: Add regression test for the bug
  • Bug fix: Create Backport PR to latest release

@frode-aarstad frode-aarstad marked this pull request as draft November 27, 2024 14:08
@frode-aarstad frode-aarstad force-pushed the refactor-everserver branch 2 times, most recently from 1f0ef43 to 080ce06 Compare November 28, 2024 12:00
@frode-aarstad frode-aarstad self-assigned this Nov 28, 2024
@frode-aarstad frode-aarstad added the release-notes:maintenance Automatically categorise as maintenance change in release notes label Nov 28, 2024
@frode-aarstad frode-aarstad marked this pull request as ready for review November 28, 2024 12:17
@frode-aarstad frode-aarstad marked this pull request as draft November 28, 2024 12:18
@frode-aarstad frode-aarstad marked this pull request as ready for review November 28, 2024 13:23
@frode-aarstad frode-aarstad force-pushed the refactor-everserver branch 2 times, most recently from eed62a1 to bc30caf Compare November 28, 2024 19:16
Copy link
Contributor

@verveerpj verveerpj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments. Rebasing will probably cause some issues now...

from everest.util import makedirs_if_needed


def _get_machine_name() -> str:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is a get_machine_name in ERT already, can we use that?

In general, there might be other port/authentication related stuff in ERT. Maybe there should be a utils file for that somewhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will replace with the one in ert_shared

Comment on lines 177 to 153
"running": status.running,
"waiting": status.waiting,
"pending": status.pending,
"complete": status.complete,
"failed": status.failed,
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is going to change slightly in PR #9342, somebody will have to adapt

Comment on lines 288 to 270
uvicorn.run(
self.app,
host="0.0.0.0",
port=self.port,
ssl_keyfile=self.key_path,
ssl_certfile=self.cert_path,
ssl_version=ssl.PROTOCOL_SSLv23,
ssl_keyfile_password=self.key_pw,
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Recently a log_level=logging.CRITICAL was added to that call, otherwise we got log messages in the console. If there is a better solution that would be fine, otherwise maybe add it here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will add it here

@codecov-commenter
Copy link

codecov-commenter commented Dec 5, 2024

Codecov Report

Attention: Patch coverage is 95.17544% with 11 lines in your changes missing coverage. Please review.

Project coverage is 91.86%. Comparing base (90c14a9) to head (43eedff).
Report is 19 commits behind head on main.

Files with missing lines Patch % Lines
src/everest/detached/jobs/everest_server_api.py 95.73% 7 Missing ⚠️
src/everest/detached/__init__.py 75.00% 2 Missing ⚠️
src/everest/detached/jobs/everserver.py 96.07% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #9367      +/-   ##
==========================================
+ Coverage   91.83%   91.86%   +0.03%     
==========================================
  Files         434      434              
  Lines       26735    26815      +80     
==========================================
+ Hits        24551    24633      +82     
+ Misses       2184     2182       -2     
Flag Coverage Δ
cli-tests 39.37% <ø> (+0.04%) ⬆️
everest-models-test 34.76% <29.38%> (+0.02%) ⬆️
gui-tests 72.17% <ø> (+0.04%) ⬆️
integration-test 37.14% <84.64%> (+0.14%) ⬆️
performance-tests 51.91% <ø> (-0.01%) ⬇️
test 40.82% <92.54%> (+0.13%) ⬆️
unit-tests 73.97% <29.38%> (-0.07%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Comment on lines 183 to 184
response = requests.post(
url + "/" + START_ENDPOINT, verify=cert, auth=auth, proxies=PROXY
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you consider starting the experiment here:

print("Everest server found!")
, where we start the server initially? Then passing the config through here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the difference between everserver.py and everest_script.py ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like:

@app.post("/experiments/")
async def submit_experiment(experiment: EverestConfig, ...):

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whats the advantage of doing this?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We force more separation by starting the experiment through an api instead of relying on files on disk. We would also like to extend this concept to ert, and having a clear separation helps then.

Copy link
Collaborator

@oyvindeide oyvindeide Dec 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the difference between everserver.py and everest_script.py ?

everserver.py is the actual server, while everest_script the client which starts the server and queries for status

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the current client does:

submit_server_to_cluster(path_to_config_file)
wait_for_server
monitor

think in the future we would like:

connect_to_already_running_server
requests.post(server_url, json=everest_config)
monitor

Then we are one step closer to having the experiment server, and more clear separation between the frontend and the backend.

@frode-aarstad frode-aarstad force-pushed the refactor-everserver branch 4 times, most recently from 8690547 to 43eedff Compare December 10, 2024 09:54
Copy link
Collaborator

@oyvindeide oyvindeide left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, I was wondering a bit about the structure? Have asked some questions below.

@@ -308,26 +171,57 @@ def main():
return

try:
# wait until the api server is running
is_running = False
while not is_running:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might need some timeouts here? We should be doing something of the same for dark_storage i guess? Also starting a server there, would it be possible to alight this and that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that is work for another PR if we want to start evereserver as we do the storage server.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But here we can throw an exception after a few retries

Comment on lines 174 to 205
class ExperimentRunner(threading.Thread):
def __init__(self, everest_config, state: dict):
super().__init__()

self.everest_config = everest_config
self.state = state
self.status: Optional[ExperimentRunnerStatus] = None

def run(self):
run_model = EverestRunModel.create(
self.everest_config,
simulation_callback=partial(_sim_monitor, shared_data=self.state),
optimization_callback=partial(_opt_monitor, shared_data=self.state),
)

evaluator_server_config = EvaluatorServerConfig(
custom_port_range=range(49152, 51819)
if run_model.ert_config.queue_config.queue_system == QueueSystem.LOCAL
else None
)

try:
run_model.run_experiment(evaluator_server_config)
self.status = ExperimentRunnerStatus(
status="Experiment finished", exit_code=run_model.exit_code
)
except Exception:
self.status = ExperimentRunnerStatus(
status="Experiment failed", message=traceback.format_exc()
)

def get_status(self) -> Optional[ExperimentRunnerStatus]:
return self.status
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we consider using a background task for this instead? https://fastapi.tiangolo.com/tutorial/background-tasks/ Would be fairly similar, but can offload more to fastapi.

something like:

class ExperimentTask:
    def __init__(self, model: EverestRunModel ) -> None:
        self._model = model
        self.model_type = str(model.name())
        self._events: List[StatusEvents] = []

    def cancel(self) -> None:
        if self._model is not None:
            self._model.cancel()

    async def run(self):
        loop = asyncio.get_running_loop()
        logger.info(f"Starting experiment")

        port_range = None
        if self._model.queue_system == QueueSystem.LOCAL:
            port_range = range(49152, 51819)
        evaluator_server_config = EvaluatorServerConfig(custom_port_range=port_range)

        simulation_future = loop.run_in_executor(
            None,
            lambda: self._model.start_simulations_thread(
                evaluator_server_config
            ),
        )

then the endpoint can do:


@app.post("/experiments/", response_model=ExperimentOut)
async def submit_experiment(experiment: EverestConfig, background_tasks: BackgroundTasks):
    try:
        run_model = EverestRunModel.create(
            self.everest_config,
            simulation_callback=partial(_sim_monitor, shared_data=self.state),
            optimization_callback=partial(_opt_monitor, shared_data=self.state),
        )
    except ValueError as e:
        return HTTPException(
            status_code=420,
            detail="Experiment was not valid, failed with: {e}",
        )

    task = ExperimentTask(model=run_model)
    experiments[experiment_id] = task
    background_tasks.add_task(task.run)
    return ExperimentOut(id=uuid_here, type=task.model_type)

then the next step after this PR is to extract the events from the run model.

security = HTTPBasic()


class EverestServerAPI(threading.Thread):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this code is mostly moved around? What is the advantage of using a class here instead of:

def everserver_thread(output_dir: str, optimization_output_dir: str):
    app = FastAPI()
    state = {
        SIM_PROGRESS_ENDPOINT: {},
        STOP_ENDPOINT: False,
    }
    runner: Optional[ExperimentRunner] = None
    authentication = _generate_authentication()
    cert_path, key_path, key_pw = _generate_certificate(
        ServerConfig.get_certificate_dir(output_dir)
    )
    host = ert_shared_get_machine_name()
    port = _find_open_port(host, lower=5000, upper=5800)
    _write_hostfile(
        ServerConfig.get_hostfile_path(output_dir), host, port, cert_path, authentication
    )

    def check_user(credentials: HTTPBasicCredentials) -> None:
        if credentials.password != authentication:
            raise HTTPException(
                status_code=status.HTTP_401_UNAUTHORIZED,
                detail="Invalid credentials",
                headers={"WWW-Authenticate": "Basic"},
            )

    def log(request: Request) -> None:
        logging.getLogger("everserver").info(
            f"{request.scope['path']} entered from {request.client.host if request.client else 'unknown host'} with HTTP {request.method}"
        )

    @app.get("/")
    def get_status(request: Request, credentials: HTTPBasicCredentials = Depends(security)):
        log(request)
        check_user(credentials)
        ...
    ...
    
    uvicorn.run(
        app,
        host="0.0.0.0",
        port=port,
        ssl_keyfile=key_path,
        ssl_certfile=cert_path,
        ssl_version=ssl.PROTOCOL_SSLv23,
        ssl_keyfile_password=key_pw,
        log_level=logging.CRITICAL,
    )

Personally I find it easier to read the endpoints when they are decorated, but there might be good reasons for making a class instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to make the everserver.py file smaller and the code more readable

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no problem with moving it, but think the history between moving/refactoring and adding the new end-point, as it was a bit difficult to tell them apart. Personally I prefer the decorated functions instead of the class approach, as I find them easier to read.

@verveerpj
Copy link
Contributor

I see that there is some rebasing needed in any case. My PR will also cause some need for manually fixing when rebasing, in particular the way optimization exit code is handled. Since you are going to need to resolve things anyway, I will merge my PR, let me know if you need some help with the rebase.

@frode-aarstad
Copy link
Contributor Author

I see that there is some rebasing needed in any case. My PR will also cause some need for manually fixing when rebasing, in particular the way optimization exit code is handled. Since you are going to need to resolve things anyway, I will merge my PR, let me know if you need some help with the rebase.

Sounds good

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-notes:maintenance Automatically categorise as maintenance change in release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants