This guide is for developers who want to better understand how their model runs on Neuron Cores.
TensorBoard-Neuron is adapted to provide useful information related to Neuron devices, such as compatibility and profiling. It also preserves TensorBoard’s Debugger plugin, which may be useful in finding numerical mismatches.
By default, TensorBoard-Neuron will be installed when you install TensorFlow-Neuron.
$ pip install tensorflow-neuron
It can also be installed separately.
$ pip install tensorboard-neuron
Additionally, if you would like to profile your model (see below), you will also need to have Neuron tools installed.
$ sudo apt install aws-neuron-tools
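On instances that use yum rather than apt (for example, Amazon Linux), the same package is typically installed with:
$ sudo yum install aws-neuron-tools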
When using TensorFlow-Neuron, MXNet-Neuron, or PyTorch-Neuron, raw profile data is collected whenever the NEURON_PROFILE environment variable is set. The raw profile is written to the directory that NEURON_PROFILE points to.
The steps to do this:
- Set the NEURON_PROFILE environment variable, e.g.:
export NEURON_PROFILE=/some/output/directory
NOTE: this directory must exist before you move on to the next step. Otherwise, profile data will not be emitted.
- Run inference through the framework (a minimal example follows below). See the tutorials for each framework for more info.
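For example, assuming the profile directory has already been created (e.g. with mkdir -p $NEURON_PROFILE) and the variable exported as shown above, a minimal TensorFlow-Neuron inference run might look like the sketch below. The SavedModel path, input tensor name, and input shape are placeholders for your own Neuron-compiled model:

import numpy as np
import tensorflow as tf

# NEURON_PROFILE is assumed to already be exported in the environment;
# because it is set, raw profile data is written to that directory during inference.
predictor = tf.contrib.predictor.from_saved_model('/path/to/compiled/saved_model')
dummy_input = np.zeros([1, 224, 224, 3], dtype=np.float32)  # placeholder input shape
result = predictor({'input': dummy_input})                   # placeholder input name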
To view data in TensorBoard-Neuron, run the command below, where “logdir” is the directory where TensorFlow logs are generated. (Note that this “logdir” is not the same as the NEURON_PROFILE directory you set during inference; depending on your configuration, you may not have any TensorFlow logs at all. For this step, NEURON_PROFILE must still be set to the same directory used during the inference run, because tensorboard_neuron processes the Neuron profile data from that directory at startup.)
$ tensorboard_neuron --logdir /path/to/logdir --run_neuron_profile
By default, TensorBoard-Neuron is launched at “localhost:6006”; the address can be changed with the “--host” and “--port” options.
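For example, to serve the visualization on all network interfaces on port 8080 (both values here are placeholders), you could run:
$ tensorboard_neuron --logdir /path/to/logdir --host 0.0.0.0 --port 8080 --run_neuron_profile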
Now, visit localhost:6006 in a browser to view the visualization, or use the host and port you specified above.
TensorBoard-Neuron can visualize which operators are supported on Neuron devices. All Neuron-compatible operators run on Neuron Cores, while the remaining operators run on the CPU.
Use the TensorFlow APIs to create the event file, as in the sample Python code snippet below:
import tensorflow as tf

# Path to the frozen graph (GraphDef protobuf) you want to visualize
your_graph_file = '/path/to/graph/file'
your_graph_def = tf.GraphDef()
with open(your_graph_file, 'rb') as f:
    your_graph_def.ParseFromString(f.read())
your_graph = tf.Graph()
with your_graph.as_default():
    tf.import_graph_def(your_graph_def, name='')
# Write the graph as a TensorFlow event file that TensorBoard-Neuron can read
fw = tf.summary.FileWriter(graph=your_graph, logdir='/path/to/logdir')
fw.flush()
See the above section Visualizing data with TensorBoard-Neuron.
In the navigation pane on the left, under the “Color” section, select “Neuron Compatibility.”
Now, the graph should be colored red and/or green. Green indicates an operator that is compatible with Neuron devices, while red indicates an operator that is currently not supported. If there are any unsupported operators, their names will be listed under the “Incompatible Operations” section.
After successfully analyzing the profiled run on a Neuron device, you can launch TensorBoard-Neuron to view the graph and see how much time each operator is taking.
This step requires Neuron tools in order to work.
See the above section Visualizing data with TensorBoard-Neuron.
The “neuron_profile” tag contains timing information regarding the inference you profiled.
In the navigation pane on the left, under the “Color” section, select “Compute time.”
This view will show time taken by each layer and will be colored according to how much relative time the layer took to compute. A lighter shade of red means that a relatively small portion of compute time was spent in this layer, while a darker red shows that more compute time was used. Some layers may also be blank, which indicates that these layers may have been optimized out to improve inference performance. Clicking on a node will show the compute time, if available.
To get a better understanding of the profile, you can check out the Neuron Profile plugin. Here, you will find more information on the inference, including an overview, a list of the most time-consuming operators (op profile tool), and an execution timeline view (Chrome trace).
This step requires Neuron tools in order to work.
See the above section Visualizing data with TensorBoard-Neuron.
On the navigation bar at the top of the page, there will be a list of active plugins. In this case, you will need to use the “Neuron Profile” plugin. The plugin may take a while to register on first load. If this tab does not show initially, please refresh the page.
The first page you will land on in the Neuron Profile plugin is the overview page. It contains a variety of information about the inference. In the “Performance Summary” section, you will see execution stats, such as the total execution time, the average layer execution time, and the utilization of NeuronMatrix Units.
The “Neuron Time Graph” shows how long a portion of the graph (a NeuronOp) took to execute.
The “Top TensorFlow operations executed on Neuron Cores” section gives a quick summary of the most time-consuming operators that were executed on the device.
“Run Environment” shows information about the devices used during this inference.
Finally, the “Recommendation for Next Steps” section gives helpful pointers to places where you can learn more about what to do next.
In the “Tools” dropdown menu, select “op_profile.”
The “op profile” tool displays the percentage of overall time taken for each operator, sorted by the most expensive operators at the top. It gives a better understanding of where the bottlenecks in a model may be.
In the “Tools” dropdown menu, select “trace_viewer.”
For developers who want to better understand the timeline of the inference, the Chrome trace viewer is the right tool. It shows the execution history organized by operator name.
Please note that this tool can only be used in Chrome browsers.
To make use of the Debugger plugin, you must specify your desired output tensors before creating the saved model. See Step 1: Get a TensorFlow SavedModel that runs on Inferentia: Getting Started: TensorFlow-Neuron for how to create the saved model. Essentially, adding these tensors to the “outputs” dictionary will allow you to view them in the debugger later on.
Please note that this feature is currently only available for TensorFlow users.
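As an illustration, the sketch below builds a toy TF 1.x graph and saves it with tf.saved_model.simple_save (one way to create a SavedModel; the tutorial referenced above may use a different API). The export path and tensor names are placeholders; the key point is that any tensor added to the outputs dictionary becomes viewable in the Debugger plugin:

import tensorflow as tf

# Toy graph for illustration; in practice, use the graph you will compile for Inferentia
graph = tf.Graph()
with graph.as_default():
    x = tf.placeholder(tf.float32, shape=[None, 4], name='input')
    hidden = tf.nn.relu(x, name='hidden_relu')        # an intermediate tensor to inspect
    y = tf.reduce_sum(hidden, axis=1, name='output')
    with tf.Session(graph=graph) as sess:
        # Adding 'hidden' to the outputs dictionary makes it visible in the debugger
        tf.saved_model.simple_save(
            sess, '/path/to/saved_model',
            inputs={'input': x},
            outputs={'output': y, 'hidden': hidden})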
To use the Debugger plugin, you will need to launch with an extra flag:
$ tensorboard_neuron --logdir /path/to/logdir --debugger_port PORT
where PORT is your desired port number.
In order to run the inference in “debug mode,” you must use TensorFlow’s debug wrapper. The following lines will need to be added to your script.
from tensorflow.python import debug as tf_debug
# The port must be the same as the one used for --debugger_port above
# in this example, PORT is 7000
DEBUG_SERVER_ADDRESS = 'localhost:7000'
# create your TF session here
sess = tf_debug.TensorBoardDebugWrapperSession(
    sess, DEBUG_SERVER_ADDRESS)
# run inference using the wrapped session
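# For example (placeholder tensor names and input data; replace with your model's own):
input_tensor = sess.graph.get_tensor_by_name('input:0')
output_tensor = sess.graph.get_tensor_by_name('output:0')
result = sess.run(output_tensor,
                  feed_dict={input_tensor: your_input_data})  # e.g. a NumPy array matching the input shape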
After adding these modifications, run the script to begin inference. The execution will be paused before any calculation starts.
On the navigation bar at the top of the page, there will be a list of active plugins. In this case, you will need to use the “Debugger” plugin.
In the “Runtime Node List” on the left, there will be a list of operators with a checkbox next to each. Select all of the operators whose tensor outputs you would like to view.
On the bottom left of the page, there will be a “Continue...” button that will resume the inference execution. As the graph is executed, output tensors will be saved for later viewing.
At the bottom of the page, there will be a “Tensor Value Overview” section that shows a summary of all the output tensors that were selected as watchpoints in Step 4. To view more specific information on a tensor, you can click on a tensor’s value. You may also hover over the bar in the “Health Pill” column for a more detailed summary of values.