
test_metric_performance #18330

Open
leezu opened this issue May 15, 2020 · 4 comments


@leezu
Contributor

leezu commented May 15, 2020

http://jenkins.mxnet-ci.amazon-ml.com/blue/rest/organizations/jenkins/pipelines/mxnet-validation/pipelines/unix-cpu/branches/PR-18284/runs/6/nodes/364/steps/731/log/?start=0


[2020-05-14T23:42:14.444Z] =================================== FAILURES ===================================
[2020-05-14T23:42:14.444Z] ___________________________ test_metric_performance ____________________________
[2020-05-14T23:42:14.444Z] [gw1] linux -- Python 3.6.9 /usr/bin/python3
[2020-05-14T23:42:14.444Z] 
[2020-05-14T23:42:14.444Z]     def test_metric_performance():
[2020-05-14T23:42:14.444Z]         """ unittest entry for metric performance benchmarking """
[2020-05-14T23:42:14.444Z]         # Each dictionary entry is (metric_name:(kwargs, DataGenClass))
[2020-05-14T23:42:14.444Z]         metrics = [
[2020-05-14T23:42:14.444Z]             ('acc', ({}, MetricDataGen)),
[2020-05-14T23:42:14.444Z]             ('top_k_acc', ({'top_k': 5}, MetricDataGen)),
[2020-05-14T23:42:14.444Z]             ('F1', ({}, F1MetricDataGen)),
[2020-05-14T23:42:14.444Z]             ('Perplexity', ({'ignore_label': -1}, MetricDataGen)),
[2020-05-14T23:42:14.444Z]             ('MAE', ({}, MetricDataGen)),
[2020-05-14T23:42:14.444Z]             ('MSE', ({}, MetricDataGen)),
[2020-05-14T23:42:14.444Z]             ('RMSE', ({}, MetricDataGen)),
[2020-05-14T23:42:14.444Z]             ('ce', ({}, MetricDataGen)),
[2020-05-14T23:42:14.444Z]             ('nll_loss', ({}, MetricDataGen)),
[2020-05-14T23:42:14.444Z]             ('pearsonr', ({}, PearsonMetricDataGen)),
[2020-05-14T23:42:14.444Z]         ]
[2020-05-14T23:42:14.444Z]     
[2020-05-14T23:42:14.444Z]         data_size = 1024 * 128
[2020-05-14T23:42:14.444Z]     
[2020-05-14T23:42:14.444Z]         batch_sizes = [16, 64, 256, 1024]
[2020-05-14T23:42:14.444Z]         output_dims = [128, 1024, 8192]
[2020-05-14T23:42:14.444Z]         ctxs = [mx.cpu(), mx.gpu()]
[2020-05-14T23:42:14.444Z]     
[2020-05-14T23:42:14.444Z]         print("\nmx.gluon.metric benchmarks", file=sys.stderr)
[2020-05-14T23:42:14.444Z]         print(
[2020-05-14T23:42:14.444Z]             "{:15}{:10}{:12}{:12}{:15}{:15}{}".format(
[2020-05-14T23:42:14.444Z]                 'Metric', 'Data-Ctx', 'Label-Ctx', 'Data Size', 'Batch Size', 'Output Dim', 'Elapsed Time'),
[2020-05-14T23:42:14.444Z]             file=sys.stderr)
[2020-05-14T23:42:14.444Z]         print("{:-^90}".format(''), file=sys.stderr)
[2020-05-14T23:42:14.444Z]         for k, v in metrics:
[2020-05-14T23:42:14.444Z]             for c in output_dims:
[2020-05-14T23:42:14.444Z]                 for n in batch_sizes:
[2020-05-14T23:42:14.444Z]                     for pred_ctx, label_ctx in itertools.product(ctxs, ctxs):
[2020-05-14T23:42:14.444Z] >                       run_metric(k, v[1], (data_size * 128)//(n * c), n, c, pred_ctx, label_ctx, **v[0])
[2020-05-14T23:42:14.444Z] 
[2020-05-14T23:42:14.444Z] tests/python/unittest/test_metric_perf.py:118: 
[2020-05-14T23:42:14.444Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
[2020-05-14T23:42:14.444Z] tests/python/unittest/test_metric_perf.py:76: in run_metric
[2020-05-14T23:42:14.444Z]     mx.nd.waitall()
[2020-05-14T23:42:14.444Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
[2020-05-14T23:42:14.444Z] 
[2020-05-14T23:42:14.444Z]     def waitall():
[2020-05-14T23:42:14.444Z]         """Wait for all async operations to finish in MXNet.
[2020-05-14T23:42:14.444Z]     
[2020-05-14T23:42:14.444Z]         This function is used for benchmarking only.
[2020-05-14T23:42:14.444Z]     
[2020-05-14T23:42:14.444Z]         .. note::
[2020-05-14T23:42:14.444Z]     
[2020-05-14T23:42:14.444Z]            If your mxnet code throws an exception, then waitall can cause performance impact.
[2020-05-14T23:42:14.444Z]         """
[2020-05-14T23:42:14.444Z] >       check_call(_LIB.MXNDArrayWaitAll())
[2020-05-14T23:42:14.444Z] E       Failed: Timeout >1200.0s
[2020-05-14T23:42:14.444Z] 
[2020-05-14T23:42:14.444Z] python/mxnet/ndarray/ndarray.py:211: Failed
[2020-05-14T23:42:14.444Z] ---------------------------- Captured stderr setup -----------------------------

The metrics now rely on the MXNet numpy implementation and may be slower until MXNet overhead is reduced.

@acphile, should the timeout on this test be temporarily increased?
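
If so, a per-test override is one option. A minimal sketch, assuming the CI enforces the limit via pytest-timeout (the `Failed: Timeout >1200.0s` message matches that plugin's output); the 2400-second value is only illustrative:

```python
import pytest

# Assumes pytest-timeout is installed in the CI image; the marker overrides
# the global 1200 s limit for this single test. 2400 s is an illustrative value.
@pytest.mark.timeout(2400)
def test_metric_performance():
    ...
```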

@leezu
Contributor Author

leezu commented May 15, 2020

Notice that this timeout only happens on the Python3: MKL-CPU jobs. That job hits the 1200-second global timeout, whereas the test finishes in under 300 seconds on the non-MKL Python 3 CPU job. So the problem is due to #18244 (the test now relies on MXNet numpy, whereas it relied on upstream numpy before).

cc @TaoLv
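
For context, a rough sketch of what that change means for the metric updates (illustrative only, not the actual diff from #18244): element-wise work that previously ran on host NumPy arrays now goes through `mx.np`, so every update dispatches through the MXNet engine and picks up per-operator overhead.

```python
import mxnet as mx

labels = mx.np.array([0, 1, 1, 0])
preds = mx.np.array([0, 1, 0, 0])

# Previously (upstream numpy): computed eagerly on the host.
acc_host = (labels.asnumpy() == preds.asnumpy()).mean()

# Now (MXNet numpy): each operator is queued on the asynchronous MXNet engine.
acc_mx = (labels == preds).astype('float32').mean()
mx.nd.waitall()  # block until the queued work finishes, as the test does
print(acc_host, float(acc_mx))
```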

@leezu
Contributor Author

leezu commented May 15, 2020

I suggest we mark this test as xfail for MKL builds for now.
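
A sketch of what that could look like; the MKLDNN detection via `mx.runtime.Features()` is an assumption about how to identify an MKL build, not existing test code:

```python
import pytest
import mxnet as mx

# Assumption: mxnet.runtime.Features() reports compile-time build flags,
# so an MKL(-DNN) build can be detected at collection time.
MKL_BUILD = mx.runtime.Features().is_enabled('MKLDNN')

@pytest.mark.xfail(MKL_BUILD, reason='metric perf regression on MKL builds, see #18330')
def test_metric_performance():
    ...
```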

@leezu
Contributor Author

leezu commented May 15, 2020

The test doesn't actually enforce anything. Its output is not monitored (AFAIK), so it may be best to simply delete this test.
