Hi,
When I run resnet50-OT (SCI/build/bin) in inference mode (i.e. with the definition of the TRAINING macro removed), the output is:
Total time taken = 2789929 milliseconds.
Total data sent = 346429 MiB.
Number of rounds = 5579
Total comm (sent+received) = 376212 MiB.
------------------------------------------------------
Total time in Conv = 2507.34 seconds.
Total time in MatMul = 2516.68 seconds.
Total time in BatchNorm = 47.581 seconds.
Total time in Truncation = 115.27 seconds.
Total time in Relu = 92.644 seconds.
Total time in MaxPool = 16.519 seconds.
Total time in AvgPool = 0.043 seconds.
Total time in ArgMax = 0.078 seconds.
Total time in MatAdd = 0 seconds.
Total time in MatAddBroadCast = 0 seconds.
Total time in MulCir = 0 seconds.
Total time in ScalarMul = 0 seconds.
Total time in Sigmoid = 0 seconds.
Total time in Tanh = 0 seconds.
Total time in Sqrt = 0 seconds.
Total time in NormaliseL2 = 0 seconds.
------------------------------------------------------
Conv data sent = 342519 MiB.
MatMul data sent = 343677 MiB.
BatchNorm data sent = 901.882 MiB.
Truncation data sent = 945.903 MiB.
Relu data sent = 766.831 MiB.
Maxpool data sent = 136.633 MiB.
Avgpool data sent = 0.239258 MiB.
ArgMax data sent = 0.136467 MiB.
MatAdd data sent = 0 MiB.
MatAddBroadCast data sent = 0 MiB.
MulCir data sent = 0 MiB.
Sigmoid data sent = 0 MiB.
Tanh data sent = 0 MiB.
Sqrt data sent = 0 MiB.
NormaliseL2 data sent = 0 MiB.
------------------------------------------------------
Conv data (sent+received) = 355150 MiB.
MatMul data (sent+received) = 356514 MiB.
BatchNorm data (sent+received) = 5993.38 MiB.
Truncation data (sent+received) = 7614.44 MiB.
Relu data (sent+received) = 5165.8 MiB.
Maxpool data (sent+received) = 921.391 MiB.
Avgpool data (sent+received) = 2.24609 MiB.
ArgMax data (sent+received) = 0.76857 MiB.
MatAdd data (sent+received) = 0 MiB.
MatAddBroadCast data (sent+received) = 0 MiB.
MulCir data (sent+received) = 0 MiB.
ScalarMul data (sent+received) = 0 MiB.
Sigmoid data (sent+received) = 0 MiB.
Tanh data (sent+received) = 0 MiB.
Sqrt data (sent+received) = 0 MiB.
NormaliseL2 data (sent+received) = 0 MiB.
The per-component running times sum to far more than the total running time (roughly 5296 seconds versus about 2790 seconds), and the per-component communication sizes likewise sum to far more than the total communication size (roughly 688948 MiB sent versus 346429 MiB).
I therefore suspect that the running time and communication size of the convolution and matmul components are being counted more than once. Could you help explain this?
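To make the mismatch concrete, here is a small arithmetic check on the figures copied from the log above. It sums the per-component values twice, once with every component and once leaving out Conv. The helper name `sums` and the dictionary layout are only for illustration. I have not looked at how SCI's timers and counters are implemented, so this only shows what the numbers are consistent with, not where the overlap comes from.

```python
# Figures copied from the log above. Zero-valued components are omitted
# because they do not change any sum.
total_time_s = 2789929 / 1000   # "Total time taken" is reported in milliseconds
total_sent_mib = 346429
total_comm_mib = 376212

time_s = {"Conv": 2507.34, "MatMul": 2516.68, "BatchNorm": 47.581,
          "Truncation": 115.27, "Relu": 92.644, "MaxPool": 16.519,
          "AvgPool": 0.043, "ArgMax": 0.078}
sent_mib = {"Conv": 342519, "MatMul": 343677, "BatchNorm": 901.882,
            "Truncation": 945.903, "Relu": 766.831, "MaxPool": 136.633,
            "AvgPool": 0.239258, "ArgMax": 0.136467}
comm_mib = {"Conv": 355150, "MatMul": 356514, "BatchNorm": 5993.38,
            "Truncation": 7614.44, "Relu": 5165.8, "MaxPool": 921.391,
            "AvgPool": 2.24609, "ArgMax": 0.76857}


def sums(parts):
    """Return (sum over all components, sum over all components except Conv)."""
    everything = sum(parts.values())
    return everything, everything - parts["Conv"]


for label, parts, total in [("time (s)", time_s, total_time_s),
                            ("sent (MiB)", sent_mib, total_sent_mib),
                            ("sent+received (MiB)", comm_mib, total_comm_mib)]:
    everything, without_conv = sums(parts)
    print(f"{label}: all={everything:.3f} "
          f"without Conv={without_conv:.3f} reported total={total:.3f}")
```

With Conv left out, each sum lands within about one unit of the reported total, which would fit the Conv figures already being included in the MatMul figures. That is only an inference from the arithmetic, so confirmation from the maintainers would still be appreciated.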