Add experiments watcher (webserver) #43

Open
Cadene opened this issue May 23, 2020 · 0 comments
Labels: enhancement (New feature or request)


Cadene commented May 23, 2020

Goals

Display a list of experiments on a webpage to easily check their status.
This webpage must be frequently updated, must be easy to customize, and must scale to 1000 experiments.

We would like to improve over this POC:
[Screenshot of the POC, taken 2020-05-23 at 23:10:32]

Propositions

Run the webserver

Using default options:

python -m bootstrap.watch --exp.dirs logs/myproject

Using custom options:

python -m bootstrap.watch -o myproject/options/watch.yaml

The YAML file (myproject/options/watch.yaml in the command above) is used to generate a custom webpage. It contains the experiment directories to watch, the filtering rules that select which experiments to display, and the columns to show.

Example:

exp:
  dirs: logs/myproject
filters:
  - table: options
    column: exp.dir
    rule: mnist_resnet_*
  - SQL: SELECT MAX(accuracy) FROM test_epoch WHERE accuracy > 0.3
columns:
  - name: exp.dir
    table: options
    column: exp.dir
  - name: accuracy
    SQL: SELECT MAX(accuracy) FROM test_epoch WHERE accuracy > 0.3
  - name: status
    type: status
ranking:
  - name: status
    order: [crashed, ended, running]
  - name: accuracy
    order: asc
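
The `ranking` section above could be applied roughly as follows. This is only a sketch under assumptions: the rows are hypothetical stand-ins for values read from each experiment, the `rank` helper is not part of bootstrap, and a `desc` order is assumed to exist alongside `asc`.

```python
# Hypothetical rows; in the proposal they would come from each experiment's data.
rows = [
    {"exp.dir": "mnist_resnet_a", "accuracy": 0.91, "status": "running"},
    {"exp.dir": "mnist_resnet_b", "accuracy": 0.45, "status": "crashed"},
    {"exp.dir": "mnist_resnet_c", "accuracy": 0.62, "status": "ended"},
    {"exp.dir": "mnist_resnet_d", "accuracy": 0.38, "status": "running"},
]

# Mirrors the `ranking` section of the YAML example.
ranking = [
    {"name": "status", "order": ["crashed", "ended", "running"]},
    {"name": "accuracy", "order": "asc"},
]


def rank(rows, ranking):
    """Sort rows by each ranking rule; earlier rules take priority."""
    ranked = list(rows)
    # Python's sort is stable, so the rules are applied from last to first.
    for rule in reversed(ranking):
        name, order = rule["name"], rule["order"]
        if isinstance(order, list):
            # Explicit value order; values missing from the list go last.
            ranked.sort(
                key=lambda r: order.index(r[name]) if r[name] in order else len(order)
            )
        else:
            # "asc" or "desc" ("desc" is an assumed extension of the example).
            ranked.sort(key=lambda r: r[name], reverse=(order == "desc"))
    return ranked


ranked = rank(rows, ranking)
for row in ranked:
    print(row["status"], row["accuracy"], row["exp.dir"])
```

With these rules, crashed experiments come first, and experiments sharing a status are ordered by increasing accuracy.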

Design of webpage

Core features

As in the POC, the page starts with a header showing statistics about the experiments, followed by the custom table.

Example:

Ended: 10 | Crashed: 3 | Running: 40 (| Pending: 150)

server | exp.dir | accuracy | status
--- | --- | --- | ---
pascal[3] | lolilol | 0.19 | crashed
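
A minimal sketch of how the statistics header could be computed, assuming each experiment has already been assigned a status string. The `header_line` helper and the list of statuses are hypothetical, and the optional pending count is left out.

```python
from collections import Counter


def header_line(statuses, order=("ended", "crashed", "running")):
    """Build the statistics header from a list of per-experiment statuses."""
    counts = Counter(statuses)
    # A Counter returns 0 for statuses that never occur.
    return " | ".join(f"{status.capitalize()}: {counts[status]}" for status in order)


# Hypothetical statuses matching the example header above.
statuses = ["ended"] * 10 + ["crashed"] * 3 + ["running"] * 40
print(header_line(statuses))
```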

Optional

In the header, we could show the list of filtering rules and the columns to display, and allow removing them or adding new ones dynamically. A button could then export them as a YAML file or update the original YAML file.

Filtering rules

Drop-down panel containing the filtering rules, which are based on SQLite data (select('options'), select('env_info'), select('train_epoch'), select('test_epoch'), or a custom SQL query).

Example:

List of positive filters:
- [table: options] [column: exp.dir] [rule: mnist_resnet_*]
- [table: env_info] [column: nodename] [rule: pascal & titan]
- [SQL: SELECT MAX(accuracy) FROM test_epoch WHERE accuracy > 0.3]
- [type: status] [rule: crashed]
- [info: end_datetime] [rule: >2020-05-22 10:00:00]

Display experiments whose experiment directory matches the pattern mnist_resnet_*, that were trained on the pascal and titan servers, that have an accuracy higher than 0.3, and that crashed after a given datetime.
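
Two of these positive filters could be evaluated as sketched below. This is an illustration under assumptions, not the proposed implementation: the `test_epoch` schema, the in-memory databases, and the `make_experiment` and `passes` helpers are hypothetical, the pattern is treated as a shell-style glob, and the `table`/`column` lookup is simplified to the experiment directory name. An SQL filter is taken to pass when its query returns a non-NULL value.

```python
import sqlite3
from fnmatch import fnmatch


def make_experiment(accuracies):
    """In-memory stand-in for one experiment's SQLite file (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test_epoch (epoch INTEGER, accuracy REAL)")
    conn.executemany(
        "INSERT INTO test_epoch VALUES (?, ?)", list(enumerate(accuracies))
    )
    return conn


def passes(exp_dir, conn, filters):
    """Return True if the experiment satisfies every positive filter."""
    for f in filters:
        if "SQL" in f:
            # MAX over an empty selection yields a single NULL row.
            if conn.execute(f["SQL"]).fetchone()[0] is None:
                return False
        elif not fnmatch(exp_dir, f["rule"]):
            # Simplification: only the exp.dir column is handled here.
            return False
    return True


filters = [
    {"table": "options", "column": "exp.dir", "rule": "mnist_resnet_*"},
    {"SQL": "SELECT MAX(accuracy) FROM test_epoch WHERE accuracy > 0.3"},
]

experiments = {
    "mnist_resnet_18": make_experiment([0.1, 0.5]),  # matches both filters
    "mnist_resnet_34": make_experiment([0.1, 0.2]),  # accuracy never above 0.3
    "cifar_vgg": make_experiment([0.9]),             # directory does not match
}

selected = [d for d, conn in experiments.items() if passes(d, conn, filters)]
print(selected)
```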

We could have a list of negative filters as well, corresponding to rules to remove experiments from the list.

Columns to display

Same interface to select the column to display.
By default: server and gpu ids, experiment directory, number of epochs done / number of total epochs, datetime of creation, status.

Example:

... (default)
- [SQL: SELECT MAX(accuracy) FROM test_epoch]

Implementation

bootstrap/watch.py
bootstrap/watch/css
bootstrap/watch/js
bootstrap/watch/index.html

Use Werkzeug to create the webserver (see its Shortly example).

Send all the options to the webpage in JSON format via a POST request.

Use plain JavaScript (no ReactJS) to read these options, look for experiment directories, send SQL queries to the SQLite files, and update the HTML. Every x seconds, refresh the list of experiments and, if needed, the experiments themselves.

@Cadene Cadene added the enhancement New feature or request label May 23, 2020