Move explorer execution to background #1504
Conversation
tools/explorer/test/run_tests.py
Outdated
assert result.ok
if "error" in result.json():
    print(result.json())
    assert False


def wait_for_execution_to_finish():
    for _ in range(1000):  # Try for up to 100 seconds
Test model compilation is expected to finish in under 100s? That might not hold in the future. Would you want to make the wait time configurable?
Right. Will clean this up a bit and leave a nice error message if we time out. But I wouldn't increase this much until we have bigger models, in order to catch hang-like bugs faster for now.
I was hitting the metal regression locally while testing this PR, which is why I added timeouts to all requests in the tests.
What I meant was defining the wait time per test via some attribute, rather than having a single value.
Makes sense. Will add.
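As a rough illustration of the per-test wait idea, a polling helper with a configurable timeout and an explicit timeout error could look like the sketch below. Names such as status_url and the timeout values are hypothetical, not the actual run_tests.py API; only the "isDone" field matches the status payload discussed later in this PR.

import time

import requests

# Illustrative defaults; a real value would come from a per-test attribute.
DEFAULT_TIMEOUT_SECONDS = 100
POLL_INTERVAL_SECONDS = 0.1


def wait_for_execution_to_finish(status_url, timeout_seconds=DEFAULT_TIMEOUT_SECONDS):
    """Poll the explorer status endpoint until execution finishes or we time out."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        response = requests.get(status_url, timeout=5)
        response.raise_for_status()
        status = response.json()
        if status.get("isDone"):
            return status
        time.sleep(POLL_INTERVAL_SECONDS)
    raise TimeoutError(
        f"Execution did not finish within {timeout_seconds}s; "
        "this may indicate a hang-like bug."
    )

A test that compiles a bigger model would then simply pass a larger timeout_seconds instead of relying on a single global value.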
)
self.reset_state()

options = [
The new override class should be plugged in here, using its toString method, also replacing the hardcoded values below. Of course, it needs to be populated from TTE frontend data first, feeding in the override values.
Yes.
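A minimal sketch of what such an override class could look like in Python (toString maps to __str__ here). The class name, fields other than the two memory layout settings already in this diff, and the option spellings are all illustrative assumptions, not the actual tt-mlir pipeline flags:

from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class OptimizerOverrides:
    """Hypothetical container for override values populated from TTE frontend data."""

    memory_layout_analysis_enabled: bool = False
    memory_layout_analysis_policy: Optional[str] = None
    extra_options: Dict[str, str] = field(default_factory=dict)

    def __str__(self) -> str:
        # Render the overrides as a pipeline-options string, replacing the
        # hardcoded strings currently assembled inline.
        parts = []
        if self.memory_layout_analysis_enabled:
            parts.append("memory-layout-analysis-enabled=true")
        if self.memory_layout_analysis_policy is not None:
            parts.append(
                f"memory-layout-analysis-policy={self.memory_layout_analysis_policy}"
            )
        parts.extend(f"{key}={value}" for key, value in self.extra_options.items())
        return " ".join(parts)

With that in place, the options list could interpolate str(overrides) instead of hardcoded literals.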
        error = "Error running compile TTIR to TTNN Backend Pipeline"
        self.log(error)
        raise ExplorerRunException(error)
    self.progress = 20
This will be more than 20 in the future, a lot more. :) Would be cool to extend it with more granularity later.
Yes :), once we have some compile and execution output we can build a good indication of progress based on that.
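One way this granularity could be added later is a small stage-to-progress table consulted at each step of the run. The stage names and weights below are purely illustrative, not the current implementation:

# Illustrative stage weights; real numbers would be tuned once compile and
# execution output is available to base progress on.
STAGE_PROGRESS = {
    "ttir_to_ttnn_pipeline": 20,
    "to_flatbuffer": 40,
    "execution_started": 60,
    "execution_finished": 100,
}


def update_progress(runner, stage):
    """Record progress for a named stage on the model runner state."""
    runner.progress = STAGE_PROGRESS.get(stage, runner.progress)
    runner.log(f"Reached stage '{stage}' ({runner.progress}%)")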
tools/explorer/CMakeLists.txt
Outdated
@@ -3,7 +3,7 @@ include(ExternalProject)
 set(TT_EXPLORER_SCRIPT ${CMAKE_CURRENT_SOURCE_DIR}/run.py)
 set(TTMLIR_BUILD_BIN_DIR ${TTMLIR_BINARY_DIR}/bin)

-set(MODEL_EXPLORER_VERSION "95d79ec933643c3b145537ce6fd5d9f0e735683d")
+set(MODEL_EXPLORER_VERSION "0ac804da35e6d42b2e946879e3b3bdd8bd17254a")
We should probably coordinate with the model-explorer repo to sync this version to the one with the logging window (when it gets merged in).
@@ -70,9 +74,23 @@ def execute(
         memory_layout_analysis_enabled = False
         memory_layout_analysis_policy = None

-        ttnn_ir = self.model_runner.run(
+        self.model_runner.run(
             model_path, memory_layout_analysis_enabled, memory_layout_analysis_policy
         )

+        # TODO(odjuricic, #933) Parse TTNN IR and return the post optimized graph.
+        return {"graphs": []}
Just to confirm here: this is the expected return value in case the execution started; if something goes wrong, it should throw an error, right?
Yes. Is that an OK flow from the frontend side?
Yes, it is okay, I just wanted to confirm, as my previous implementation was expecting some response. I made a change to accept graphs as an empty array as the success response :)
"isDone": done, | ||
"progress": progress, | ||
"total": 100, | ||
"error": error, |
Would it be interesting to add properties for sending logs to the UI, i.e. actual text that gives a better perspective of what is going on?
Here is the current interface for the UI; let me know if there is a property that doesn't make sense: https://github.com/tenstorrent/model-explorer/blob/724abaf54e3ae87b70b8869d1cf7716bbc565a63/src/ui/src/common/extension_command.ts#L99-L107
Adding that today. Left a comment on your PR to discuss these properties.
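A hedged sketch of what the extended status response could look like once log text is included. The "log" property and the log_lines field are assumptions for illustration; the actual property names would follow the extension_command.ts interface linked above:

def get_status(self):
    """Assemble the status payload that the frontend polls while execution runs."""
    return {
        "isDone": self.done,
        "progress": self.progress,
        "total": 100,
        "error": self.error,
        # Assumed property: incremental log text gathered since the run
        # started, so the UI can show what is actually going on.
        "log": "\n".join(self.log_lines),
    }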
-    graph = mlir.build_graph(module)
-    return {"graphs": [graph]}
+    graph, perf_data = mlir.build_graph(module, perf_trace)
+    return {"graphs": [graph], "perf_data": perf_data}
The perf_data will not be returned because it needs to be wrapped into an adapter_response and kept in graphs. It should look something like:

return to_adapter_format({
    "graphs": [graph],
    "perf_data": perf_data
})

In case the graph isn't passed, we should escalate this to change how the frontend parses it as well, since the graphs: graph property needs to exist.
Will try it out; if that turns out not to work, I'll try to merge now in order to unblock your further work and create an issue for this.
@svuckovicTT @mtopalovicTT The codeowners file got messed up and your review is required here now. Please take a look. I've put in a fix for codeowners in a separate PR.
The explorer execute command now spawns a new thread in which all model compilation and execution is handled. Results are stored as state on the server and will be sent to the frontend with the next convert command.

Additionally:
* Use subprocesses for compile and to_flatbuffer so that asserts do not kill the server
* Set up high-level error messages for execution steps
* Send incremental logs to the frontend
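A minimal sketch of that flow, assuming a runner class that owns the state the server reports back. The class, method, and field names here are illustrative, not the exact model_runner implementation:

import threading


class ModelRunner:
    """Runs model compile and execution on a worker thread, keeping results as state."""

    def __init__(self):
        self.reset_state()

    def reset_state(self):
        self.done = False
        self.error = None
        self.progress = 0
        self.log_lines = []

    def log(self, message):
        self.log_lines.append(message)

    def run_in_background(self, model_path):
        # Daemon thread so a hung run never blocks server shutdown; the execute
        # command returns immediately after starting it.
        worker = threading.Thread(target=self._run, args=(model_path,), daemon=True)
        worker.start()

    def _run(self, model_path):
        try:
            self.log(f"Compiling {model_path}")
            self._compile(model_path)
            self.progress = 20
            self.log(f"Executing {model_path}")
            self._execute(model_path)
            self.progress = 100
        except Exception as exc:
            # Stored as state and surfaced to the frontend with the next
            # status/convert response instead of killing the worker silently.
            self.error = str(exc)
        finally:
            self.done = True

    def _compile(self, model_path):
        # Placeholder for the TTIR-to-TTNN pipeline and to_flatbuffer steps;
        # the PR runs these in subprocesses so asserts cannot kill the server.
        pass

    def _execute(self, model_path):
        # Placeholder for running the compiled flatbuffer.
        pass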