Update TensorFlow to 2.5 #9123

Closed
wants to merge 33 commits into from

samsucik
Contributor

Proposed changes:

  • ...

Status (please check what you already did):

  • added some tests for the functionality
  • updated the documentation
  • updated the changelog (please check changelog for instructions)
  • reformat files using black (please check Readme for instructions)

Ghostvv and others added 27 commits January 14, 2021 11:55
…ment for previously used `to_numpy_or_python_type`.
…ackage as opposed to tensorflow.python.keras. We don't want to use or rely on standalone keras.
…rker` - the decorator was just copy-pasted from TF code and has now been removed.
github-actions bot deleted a comment from koernerfelicia on Jul 29, 2021
@github-actions
Contributor

Commit: b8d42a6, The full report is available as an artifact.

Dataset: Carbon Bot, Dataset repository branch: main, commit: 624f54ebc82536b144d8eebf40c27369c93fa99d

| Configuration | Run time | Intent Classification Micro F1 | Entity Recognition Micro F1 | Response Selection Micro F1 |
|---|---|---|---|---|
| BERT + DIET(bow) + ResponseSelector(bow) | test: 2m34s, train: 19m45s, total: 22m18s | 0.7942 (0.00) | 0.7529 (0.00) | 0.5316 (-0.01) |
| BERT + DIET(seq) + ResponseSelector(t2t) | test: 3m25s, train: 27m32s, total: 30m56s | 0.8000 (0.00) | 0.7766 (0.00) | 0.5298 (-0.01) |
| Sparse + BERT + DIET(bow) + ResponseSelector(bow) | test: 2m36s, train: 29m54s, total: 32m29s | 0.7864 (0.01) | 0.7529 (0.00) | 0.5828 (0.01) |
| Sparse + BERT + DIET(seq) + ResponseSelector(t2t) | test: 3m27s, train: 30m5s, total: 33m32s | 0.7806 (0.00) | 0.7935 (0.00) | 0.5695 (0.04) |
| Sparse + DIET(bow) + ResponseSelector(bow) | test: 36s, train: 15m40s, total: 16m16s | 0.7476 (0.01) | 0.7529 (0.00) | 0.5249 (0.03) |
| Sparse + DIET(seq) + ResponseSelector(t2t) | test: 1m32s, train: 22m45s, total: 24m17s | 0.7398 (0.00) | 0.6819 (0.00) | 0.5232 (0.02) |

Dataset: Hermit, Dataset repository branch: main, commit: 624f54ebc82536b144d8eebf40c27369c93fa99d

| Configuration | Run time | Intent Classification Micro F1 | Entity Recognition Micro F1 | Response Selection Micro F1 |
|---|---|---|---|---|
| BERT + DIET(bow) + ResponseSelector(bow) | test: 4m47s, train: 57m48s, total: 1h2m34s | 0.8987 (0.00) | 0.7504 (0.00) | no data |
| Sparse + BERT + DIET(bow) + ResponseSelector(bow) | test: 4m59s, train: 1h34m6s, total: 1h39m4s | 0.8717 (0.00) | 0.7504 (0.00) | no data |
| Sparse + DIET(bow) + ResponseSelector(bow) | test: 1m2s, train: 1h6m47s, total: 1h7m48s | 0.8290 (-0.01) | 0.7504 (0.00) | no data |
| Sparse + DIET(seq) + ResponseSelector(t2t) | test: 1m42s, train: 4h6m47s, total: 4h8m29s | 0.8346 (0.00) | 0.7520 (-0.00) | no data |

Dataset: Private 1, Dataset repository branch: main, commit: 624f54ebc82536b144d8eebf40c27369c93fa99d

| Configuration | Run time | Intent Classification Micro F1 | Entity Recognition Micro F1 | Response Selection Micro F1 |
|---|---|---|---|---|
| BERT + DIET(bow) + ResponseSelector(bow) | test: 3m55s, train: 19m55s, total: 23m49s | 0.9096 (0.00) | 0.9612 (0.00) | no data |
| BERT + DIET(seq) + ResponseSelector(t2t) | test: 4m42s, train: 24m46s, total: 29m28s | 0.9137 (0.00) | 0.9709 (0.00) | no data |
| Spacy + DIET(bow) + ResponseSelector(bow) | test: 31s, train: 11m54s, total: 12m24s | 0.8420 (0.00) | 0.9574 (0.00) | no data |
| Spacy + DIET(seq) + ResponseSelector(t2t) | test: 1m0s, train: 21m26s, total: 22m26s | 0.8555 (0.00) | 0.9402 (0.00) | no data |
| Sparse + DIET(bow) + ResponseSelector(bow) | test: 23s, train: 14m47s, total: 15m10s | 0.8960 (-0.00) | 0.9612 (0.00) | no data |
| Sparse + DIET(seq) + ResponseSelector(t2t) | test: 1m0s, train: 18m55s, total: 19m55s | 0.8992 (-0.00) | 0.9672 (-0.00) | no data |
| Sparse + Spacy + DIET(bow) + ResponseSelector(bow) | test: 33s, train: 17m48s, total: 18m21s | 0.8940 (0.00) | 0.9574 (0.00) | no data |
| Sparse + Spacy + DIET(seq) + ResponseSelector(t2t) | test: 1m12s, train: 22m16s, total: 23m27s | 0.8971 (0.00) | 0.9718 (0.00) | no data |

Dataset: Private 2, Dataset repository branch: main, commit: 624f54ebc82536b144d8eebf40c27369c93fa99d

| Configuration | Run time | Intent Classification Micro F1 | Entity Recognition Micro F1 | Response Selection Micro F1 |
|---|---|---|---|---|
| BERT + DIET(bow) + ResponseSelector(bow) | test: 4m19s, train: 1h29m45s, total: 1h34m4s | 0.8745 (0.00) | no data | no data |
| BERT + DIET(seq) + ResponseSelector(t2t) | test: 4m44s, train: 1h54m39s, total: 1h59m23s | 0.8830 (0.00) | no data | no data |
| Spacy + DIET(bow) + ResponseSelector(bow) | test: 34s, train: 54m51s, total: 55m25s | 0.7253 (0.00) | no data | no data |
| Spacy + DIET(seq) + ResponseSelector(t2t) | test: 49s, train: 1h53m40s, total: 1h54m28s | 0.7811 (0.00) | no data | no data |
| Sparse + DIET(bow) + ResponseSelector(bow) | test: 29s, train: 42m31s, total: 43m0s | 0.8509 (-0.00) | no data | no data |
| Sparse + DIET(seq) + ResponseSelector(t2t) | test: 36s, train: 1h31m40s, total: 1h32m15s | 0.8541 (0.00) | no data | no data |
| Sparse + Spacy + DIET(bow) + ResponseSelector(bow) | test: 40s, train: 1h4m1s, total: 1h4m40s | 0.8702 (0.01) | no data | no data |
| Sparse + Spacy + DIET(seq) + ResponseSelector(t2t) | test: 53s, train: 1h54m6s, total: 1h54m59s | 0.8530 (0.00) | no data | no data |

Dataset: Private 3, Dataset repository branch: main, commit: 624f54ebc82536b144d8eebf40c27369c93fa99d

| Configuration | Run time | Intent Classification Micro F1 | Entity Recognition Micro F1 | Response Selection Micro F1 |
|---|---|---|---|---|
| BERT + DIET(bow) + ResponseSelector(bow) | test: 1m44s, train: 3m30s, total: 5m14s | 0.9177 (0.00) | no data | no data |
| BERT + DIET(seq) + ResponseSelector(t2t) | test: 1m47s, train: 4m33s, total: 6m20s | 0.8436 (0.00) | no data | no data |
| Spacy + DIET(bow) + ResponseSelector(bow) | test: 31s, train: 2m7s, total: 2m38s | 0.6173 (0.00) | no data | no data |
| Spacy + DIET(seq) + ResponseSelector(t2t) | test: 37s, train: 3m23s, total: 4m0s | 0.6214 (0.00) | no data | no data |
| Sparse + DIET(bow) + ResponseSelector(bow) | test: 28s, train: 2m47s, total: 3m14s | 0.8683 (0.00) | no data | no data |
| Sparse + DIET(seq) + ResponseSelector(t2t) | test: 34s, train: 3m38s, total: 4m11s | 0.8642 (0.00) | no data | no data |
| Sparse + Spacy + DIET(bow) + ResponseSelector(bow) | test: 34s, train: 3m14s, total: 3m47s | 0.8477 (0.00) | no data | no data |
| Sparse + Spacy + DIET(seq) + ResponseSelector(t2t) | test: 39s, train: 3m42s, total: 4m21s | 0.8601 (0.00) | no data | no data |

Dataset: Sara, Dataset repository branch: main, commit: 624f54ebc82536b144d8eebf40c27369c93fa99d

| Configuration | Run time | Intent Classification Micro F1 | Entity Recognition Micro F1 | Response Selection Micro F1 |
|---|---|---|---|---|
| BERT + DIET(bow) + ResponseSelector(bow) | test: 8m0s, train: 29m3s, total: 37m3s | 0.7126 (-0.00) | 0.7895 (0.00) | 0.7938 (0.00) |
| BERT + DIET(seq) + ResponseSelector(t2t) | test: 8m48s, train: 34m57s, total: 43m45s | 0.7087 (0.00) | 0.7903 (0.00) | 0.7736 (-0.00) |
| Sparse + BERT + DIET(bow) + ResponseSelector(bow) | test: 8m3s, train: 56m14s, total: 1h4m16s | 0.6880 (-0.00) | 0.7895 (0.00) | 0.7938 (-0.00) |
| Sparse + BERT + DIET(seq) + ResponseSelector(t2t) | test: 8m59s, train: 43m2s, total: 52m0s | 0.6919 (-0.00) | 0.7823 (-0.00) | 0.7891 (0.00) |
| Sparse + DIET(bow) + ResponseSelector(bow) | test: 1m27s, train: 33m44s, total: 35m11s | 0.6577 (-0.00) | 0.7895 (0.00) | 0.7758 (-0.01) |
| Sparse + DIET(seq) + ResponseSelector(t2t) | test: 1m55s, train: 31m27s, total: 33m21s | 0.6726 (0.00) | 0.7743 (-0.00) | 0.8016 (0.00) |

| Dialog Policy Configuration | Action Level Micro Avg. F1 | Conversation Level Accuracy | Run Time Train | Run Time Test |
|---|---|---|---|---|
| Rules | 0.1266 (0.00) | 0.0000 (0.00) | 2m44s | 1m21s |
| Rules + AugMemo | 0.9229 (0.00) | 0.6644 (0.00) | 2m42s | 3m15s |
| Rules + AugMemo + TED | 0.9733 (0.00) | 0.7603 (0.02) | 10m51s | 4m21s |
| Rules + Memo | 0.3860 (0.00) | 0.1438 (0.00) | 2m51s | 1m30s |
| Rules + Memo + TED | 0.9576 (0.00) | 0.6644 (0.01) | 10m41s | 2m26s |
| Rules + TED | 0.9576 (0.00) | 0.6644 (0.01) | 10m26s | 2m11s |

Dataset: financial-demo, Dataset repository branch: main (external repository), commit: bc73f669152147ae7f880ecba172c6e44f59e2c5
Configuration repository branch: main

| Configuration | Run time | Intent Classification Micro F1 | Entity Recognition Micro F1 | Response Selection Micro F1 |
|---|---|---|---|---|
| BERT + DIET(bow) + ResponseSelector(bow) | test: 45s, train: 2m31s, total: 3m16s | 1.0000 (0.00) | 0.8333 (0.00) | no data |
| BERT + DIET(seq) + ResponseSelector(t2t) | test: 1m23s, train: 3m48s, total: 5m11s | 1.0000 (0.00) | 0.8333 (0.00) | no data |
| Sparse + BERT + DIET(bow) + ResponseSelector(bow) | test: 47s, train: 3m20s, total: 4m7s | 1.0000 (0.00) | 0.8333 (0.00) | no data |
| Sparse + BERT + DIET(seq) + ResponseSelector(t2t) | test: 1m16s, train: 4m4s, total: 5m19s | 1.0000 (0.00) | 0.8800 (0.00) | no data |
| Sparse + DIET(bow) + ResponseSelector(bow) | test: 17s, train: 1m42s, total: 1m59s | 0.9643 (0.00) | 0.8333 (0.00) | no data |
| Sparse + DIET(seq) + ResponseSelector(t2t) | test: 43s, train: 3m7s, total: 3m50s | 0.9643 (0.00) | 0.8800 (0.00) | no data |

| Dialog Policy Configuration | Action Level Micro Avg. F1 | Conversation Level Accuracy | Run Time Train | Run Time Test |
|---|---|---|---|---|
| Rules | 0.7218 (0.00) | 0.5417 (0.00) | 14s | 11s |
| Rules + AugMemo | 1.0000 (0.00) | 1.0000 (0.00) | 14s | 11s |
| Rules + AugMemo + TED | 1.0000 (0.00) | 1.0000 (0.00) | 1m44s | 30s |
| Rules + Memo | 0.9807 (0.00) | 0.9167 (0.00) | 13s | 11s |
| Rules + Memo + TED | 1.0000 (0.00) | 1.0000 (0.00) | 1m45s | 29s |
| Rules + TED | 0.9144 (0.00) | 0.7083 (0.00) | 1m41s | 37s |

Dataset: helpdesk-assistant, Dataset repository branch: main (external repository), commit: 8e74cea8f931705239613a989282bba8bc6af5c7
Configuration repository branch: main

| Configuration | Run time | Intent Classification Micro F1 | Entity Recognition Micro F1 | Response Selection Micro F1 |
|---|---|---|---|---|
| BERT + DIET(bow) + ResponseSelector(bow) | test: 40s, train: 2m3s, total: 2m43s | 1.0000 (0.00) | no data | no data |
| BERT + DIET(seq) + ResponseSelector(t2t) | test: 1m9s, train: 2m44s, total: 3m53s | 1.0000 (0.00) | no data | no data |
| Sparse + BERT + DIET(bow) + ResponseSelector(bow) | test: 42s, train: 2m32s, total: 3m14s | 1.0000 (0.00) | no data | no data |
| Sparse + BERT + DIET(seq) + ResponseSelector(t2t) | test: 1m9s, train: 2m59s, total: 4m7s | 1.0000 (0.00) | no data | no data |
| Sparse + DIET(bow) + ResponseSelector(bow) | test: 13s, train: 1m10s, total: 1m23s | 1.0000 (0.00) | no data | no data |
| Sparse + DIET(seq) + ResponseSelector(t2t) | test: 41s, train: 2m2s, total: 2m43s | 1.0000 (0.00) | no data | no data |

| Dialog Policy Configuration | Action Level Micro Avg. F1 | Conversation Level Accuracy | Run Time Train | Run Time Test |
|---|---|---|---|---|
| Rules | 0.5714 (0.00) | 0.2500 (0.00) | 8s | 8s |
| Rules + AugMemo | 1.0000 (0.00) | 1.0000 (0.00) | 7s | 9s |
| Rules + AugMemo + TED | 1.0000 (0.00) | 1.0000 (0.00) | 1m45s | 26s |
| Rules + Memo | 0.9796 (0.00) | 0.9167 (0.00) | 7s | 8s |
| Rules + Memo + TED | 1.0000 (0.00) | 1.0000 (0.00) | 1m17s | 27s |
| Rules + TED | 1.0000 (0.00) | 1.0000 (0.00) | 1m17s | 26s |

Dataset: insurance-demo, Dataset repository branch: main (external repository), commit: 03ec1d9b9f2a41a877381ad24bfd91a3994692b7
Configuration repository branch: main

| Configuration | Run time | Intent Classification Micro F1 | Entity Recognition Micro F1 | Response Selection Micro F1 |
|---|---|---|---|---|
| BERT + DIET(bow) + ResponseSelector(bow) | test: 49s, train: 1m27s, total: 2m15s | 1.0000 (0.00) | 1.0000 (0.00) | no data |
| BERT + DIET(seq) + ResponseSelector(t2t) | test: 1m5s, train: 2m3s, total: 3m8s | 1.0000 (0.00) | 0.0000 (0.00) | no data |
| Sparse + BERT + DIET(bow) + ResponseSelector(bow) | test: 39s, train: 1m40s, total: 2m19s | 1.0000 (0.00) | 1.0000 (0.00) | no data |
| Sparse + BERT + DIET(seq) + ResponseSelector(t2t) | test: 1m10s, train: 2m5s, total: 3m14s | 1.0000 (0.00) | 1.0000 (0.00) | no data |
| Sparse + DIET(bow) + ResponseSelector(bow) | test: 14s, train: 1m0s, total: 1m14s | 1.0000 (0.00) | 1.0000 (0.00) | no data |
| Sparse + DIET(seq) + ResponseSelector(t2t) | test: 48s, train: 2m3s, total: 2m51s | 1.0000 (0.00) | 1.0000 (0.00) | no data |

| Dialog Policy Configuration | Action Level Micro Avg. F1 | Conversation Level Accuracy | Run Time Train | Run Time Test |
|---|---|---|---|---|
| Rules | 0.5909 (0.00) | 0.0000 (0.00) | 12s | 9s |
| Rules + AugMemo | 1.0000 (0.00) | 1.0000 (0.00) | 9s | 9s |
| Rules + AugMemo + TED | 1.0000 (0.00) | 1.0000 (0.00) | 2m35s | 28s |
| Rules + Memo | 0.7600 (0.00) | 0.5000 (0.00) | 8s | 8s |
| Rules + Memo + TED | 1.0000 (0.00) | 1.0000 (0.00) | 2m28s | 28s |
| Rules + TED | 1.0000 (0.00) | 1.0000 (0.00) | 2m31s | 27s |

Dataset: retail-demo, Dataset repository branch: main (external repository), commit: 4c6a477775dbd538782278ca4c2fb651a63d3c29
Configuration repository branch: main

| Configuration | Run time | Intent Classification Micro F1 | Entity Recognition Micro F1 | Response Selection Micro F1 |
|---|---|---|---|---|
| BERT + DIET(bow) + ResponseSelector(bow) | test: 45s, train: 1m26s, total: 2m10s | 0.8387 (0.00) | 0.2857 (0.00) | no data |
| BERT + DIET(seq) + ResponseSelector(t2t) | test: 1m13s, train: 2m13s, total: 3m26s | 0.8750 (0.00) | 0.2857 (0.00) | no data |
| Sparse + BERT + DIET(bow) + ResponseSelector(bow) | test: 48s, train: 2m5s, total: 2m52s | 0.9375 (0.00) | 0.2857 (0.00) | no data |
| Sparse + BERT + DIET(seq) + ResponseSelector(t2t) | test: 1m16s, train: 2m46s, total: 4m1s | 0.8125 (0.00) | 0.2857 (0.00) | no data |
| Sparse + DIET(bow) + ResponseSelector(bow) | test: 20s, train: 1m1s, total: 1m21s | 1.0000 (0.00) | 0.2857 (0.00) | no data |
| Sparse + DIET(seq) + ResponseSelector(t2t) | test: 47s, train: 1m33s, total: 2m19s | 0.9375 (0.00) | 0.2857 (0.00) | no data |

| Dialog Policy Configuration | Action Level Micro Avg. F1 | Conversation Level Accuracy | Run Time Train | Run Time Test |
|---|---|---|---|---|
| Rules | 0.9531 (0.00) | 0.7778 (0.00) | 6s | 9s |
| Rules + AugMemo | 0.9692 (0.00) | 0.8889 (0.00) | 6s | 9s |
| Rules + AugMemo + TED | 1.0000 (0.00) | 1.0000 (0.00) | 57s | 24s |
| Rules + Memo | 0.9692 (0.00) | 0.8889 (0.00) | 6s | 9s |
| Rules + Memo + TED | 1.0000 (0.00) | 1.0000 (0.00) | 58s | 25s |
| Rules + TED | 1.0000 (0.00) | 1.0000 (0.00) | 1m1s | 25s |
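
The run times in these reports are written as strings such as `test: 2m34s, train: 19m45s, total: 22m18s`, which are awkward to compare across runs. The sketch below converts them to seconds. It is not part of the regression-test workflow; the function names `parse_duration` and `parse_timing` are hypothetical, and the format is inferred only from the values shown in this thread.

```python
import re

# Optional hours, minutes and seconds, e.g. "1h2m34s", "19m45s" or "36s".
_DURATION = re.compile(r"^(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?$")


def parse_duration(text):
    """Convert a duration string such as "1h2m34s" to a number of seconds."""
    match = _DURATION.match(text.strip())
    if not match or not any(match.groups()):
        raise ValueError(f"unrecognised duration: {text!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def parse_timing(line):
    """Parse "test: ..., train: ..., total: ..." into a dict of seconds."""
    timing = {}
    for part in line.split(","):
        label, value = part.split(":")
        timing[label.strip()] = parse_duration(value)
    return timing


timing = parse_timing("test: 2m34s, train: 19m45s, total: 22m18s")
# The reported total can differ from test + train by about a second,
# presumably because each figure is rounded separately (an assumption).
print(timing, timing["test"] + timing["train"] - timing["total"])
```

With the values in seconds, the train times of two reports can be compared directly instead of by eye.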

github-actions bot deleted a comment from koernerfelicia on Aug 5, 2021
@github-actions
Contributor

github-actions bot commented Aug 5, 2021

Commit: 7ff8874, The full report is available as an artifact.

Dataset: Carbon Bot, Dataset repository branch: main, commit: 624f54ebc82536b144d8eebf40c27369c93fa99d

| Configuration | Run time | Intent Classification Micro F1 | Entity Recognition Micro F1 | Response Selection Micro F1 |
|---|---|---|---|---|
| BERT + DIET(bow) + ResponseSelector(bow) | test: 3m4s, train: 14m53s, total: 17m57s | 0.7942 (0.00) | 0.7529 (0.00) | 0.5316 (-0.01) |
| BERT + DIET(seq) + ResponseSelector(t2t) | test: 3m7s, train: 12m36s, total: 15m42s | 0.8000 (0.00) | 0.7766 (0.00) | 0.5298 (-0.01) |
| Sparse + BERT + DIET(bow) + ResponseSelector(bow) | test: 2m31s, train: 16m34s, total: 19m4s | 0.7981 (0.02) | 0.7529 (0.00) | 0.5762 (0.04) |
| Sparse + BERT + DIET(seq) + ResponseSelector(t2t) | test: 2m44s, train: 13m18s, total: 16m2s | 0.7806 (0.00) | 0.7935 (0.00) | 0.5430 (0.00) |
| Sparse + DIET(bow) + ResponseSelector(bow) | test: 37s, train: 8m55s, total: 9m32s | 0.7398 (-0.00) | 0.7529 (0.00) | 0.5099 (0.02) |
| Sparse + DIET(seq) + ResponseSelector(t2t) | test: 1m48s, train: 16m58s, total: 18m47s | 0.7398 (0.00) | 0.6819 (0.00) | 0.5232 (0.01) |

Dataset: Private 2, Dataset repository branch: main, commit: 624f54ebc82536b144d8eebf40c27369c93fa99d

| Configuration | Run time | Intent Classification Micro F1 | Entity Recognition Micro F1 | Response Selection Micro F1 |
|---|---|---|---|---|
| BERT + DIET(bow) + ResponseSelector(bow) | test: 3m8s, train: 37m55s, total: 41m2s | 0.8745 (0.00) | no data | no data |
| BERT + DIET(seq) + ResponseSelector(t2t) | test: 3m19s, train: 45m55s, total: 49m13s | 0.8830 (0.00) | no data | no data |
| Spacy + DIET(bow) + ResponseSelector(bow) | test: 31s, train: 24m24s, total: 24m55s | 0.7253 (0.00) | no data | no data |
| Spacy + DIET(seq) + ResponseSelector(t2t) | test: 45s, train: 52m3s, total: 52m47s | 0.7811 (0.00) | no data | no data |
| Sparse + DIET(bow) + ResponseSelector(bow) | test: 26s, train: 22m13s, total: 22m39s | 0.8584 (0.02) | no data | no data |
| Sparse + DIET(seq) + ResponseSelector(t2t) | test: 32s, train: 37m37s, total: 38m9s | 0.8541 (0.00) | no data | no data |
| Sparse + Spacy + DIET(bow) + ResponseSelector(bow) | test: 37s, train: 33m48s, total: 34m24s | 0.8616 (0.01) | no data | no data |
| Sparse + Spacy + DIET(seq) + ResponseSelector(t2t) | test: 42s, train: 45m6s, total: 45m48s | 0.8498 (-0.00) | no data | no data |

Dataset: Private 3, Dataset repository branch: main, commit: 624f54ebc82536b144d8eebf40c27369c93fa99d

| Configuration | Run time | Intent Classification Micro F1 | Entity Recognition Micro F1 | Response Selection Micro F1 |
|---|---|---|---|---|
| BERT + DIET(bow) + ResponseSelector(bow) | test: 1m37s, train: 2m54s, total: 4m31s | 0.9177 (0.00) | no data | no data |
| BERT + DIET(seq) + ResponseSelector(t2t) | test: 2m13s, train: 4m19s, total: 6m32s | 0.8436 (0.00) | no data | no data |
| Spacy + DIET(bow) + ResponseSelector(bow) | test: 29s, train: 1m14s, total: 1m42s | 0.6173 (0.00) | no data | no data |
| Spacy + DIET(seq) + ResponseSelector(t2t) | test: 46s, train: 2m33s, total: 3m18s | 0.6214 (0.00) | no data | no data |
| Sparse + DIET(bow) + ResponseSelector(bow) | test: 37s, train: 2m9s, total: 2m46s | 0.8683 (0.00) | no data | no data |
| Sparse + DIET(seq) + ResponseSelector(t2t) | test: 29s, train: 1m39s, total: 2m8s | 0.8642 (0.00) | no data | no data |
| Sparse + Spacy + DIET(bow) + ResponseSelector(bow) | test: 30s, train: 1m59s, total: 2m29s | 0.8477 (0.00) | no data | no data |
| Sparse + Spacy + DIET(seq) + ResponseSelector(t2t) | test: 42s, train: 2m21s, total: 3m3s | 0.8601 (0.00) | no data | no data |
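
Each metric cell in these reports is either `no data` or a score followed by a signed number in parentheses, for example `0.7981 (0.02)`. The sketch below pulls those cells out of a row so they can be filtered or sorted. The thread does not define the parenthesised number; reading it as a change relative to some reference run is an assumption. The function name `parse_metrics` is hypothetical.

```python
import re

# One metric cell: either "no data" or "<score> (<signed number>)".
_CELL = re.compile(r"no data|(\d+\.\d+) \((-?\d+\.\d+)\)")


def parse_metrics(row):
    """Return one (score, bracketed) tuple per metric cell, None for "no data"."""
    cells = []
    for match in _CELL.finditer(row):
        if match.group(0) == "no data":
            cells.append(None)
        else:
            cells.append((float(match.group(1)), float(match.group(2))))
    return cells


# Rows copied from the reports in this thread.
print(parse_metrics("0.7981 (0.02) 0.7529 (0.00) 0.5762 (0.04)"))
print(parse_metrics("0.8745 (0.00) no data no data"))
```

Because the pattern searches anywhere in the string, it also works on a row that still carries the configuration name and run times.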

@github-actions
Contributor

github-actions bot commented Aug 6, 2021

Hey @koernerfelicia! 👋 To run model regression tests, comment with the /modeltest command and a configuration.

Tips 💡: The model regression tests run on push events. You can re-run them by re-adding the status:model-regression-tests label or by using the Re-run jobs button in the GitHub Actions workflow.

Tips 💡: Whenever you want to change the configuration, edit the comment that contains the previous configuration.

You can copy this in your comment and customize:

/modeltest

```yml
##########
## Available datasets
##########
# - "Carbon Bot" (NLU)
# - "Hermit" (NLU)
# - "Private 1" (NLU)
# - "Private 2" (NLU)
# - "Private 3" (NLU)
# - "Sara" (NLU, Core)
# - "financial-demo" (NLU, Core)
# - "helpdesk-assistant" (NLU, Core)
# - "insurance-demo" (NLU, Core)
# - "retail-demo" (NLU, Core)

##########
## Available NLU configurations
##########
# - "BERT + DIET(bow) + ResponseSelector(bow)"
# - "BERT + DIET(seq) + ResponseSelector(t2t)"
# - "Spacy + DIET(bow) + ResponseSelector(bow)"
# - "Spacy + DIET(seq) + ResponseSelector(t2t)"
# - "Sparse + BERT + DIET(bow) + ResponseSelector(bow)"
# - "Sparse + BERT + DIET(seq) + ResponseSelector(t2t)"
# - "Sparse + DIET(bow) + ResponseSelector(bow)"
# - "Sparse + DIET(seq) + ResponseSelector(t2t)"
# - "Sparse + Spacy + DIET(bow) + ResponseSelector(bow)"
# - "Sparse + Spacy + DIET(seq) + ResponseSelector(t2t)"

##########
## Available Core configurations
##########
# - "Rules"
# - "Rules + AugMemo"
# - "Rules + AugMemo + TED"
# - "Rules + Memo"
# - "Rules + Memo + TED"
# - "Rules + TED"

## Example configuration
#################### syntax #################
## include:
##   - dataset: ["<dataset_name>"]
##     config: ["<configuration_name>"]
#
## Example:
## include:
##  - dataset: ["Carbon Bot"]
##    config: ["Sparse + DIET(bow) + ResponseSelector(bow)"]
#
## Shortcut:
## You can use the "all" shortcut to include all available configurations or datasets
#
## Example: Use the "Sparse + DIET(bow) + ResponseSelector(bow)" configuration
## for all available datasets
## include:
##  - dataset: ["all"]
##    config: ["Sparse + DIET(bow) + ResponseSelector(bow)"]
#
## Example: Use all available configurations for the "Carbon Bot" and "Sara" datasets
## and for the "Hermit" dataset use the "Sparse + DIET(seq) + ResponseSelector(t2t)" and
## "BERT + DIET(seq) + ResponseSelector(t2t)" configurations:
## include:
##  - dataset: ["Carbon Bot", "Sara"]
##    config: ["all"]
##  - dataset: ["Hermit"]
##    config: ["Sparse + DIET(seq) + ResponseSelector(t2t)", "BERT + DIET(seq) + ResponseSelector(t2t)"]
#
## Example: Define a branch name to check-out for a dataset repository. Default branch is 'main'
## dataset_branch: "test-branch"
## include:
##  - dataset: ["Carbon Bot", "Sara"]
##    config: ["all"]
##
## Shortcuts:
## You can use the "all" shortcut to include all available configurations or datasets.
## You can use the "all-nlu" shortcut to include all available NLU configurations or datasets.
## You can use the "all-core" shortcut to include all available core configurations or datasets.

include:
 - dataset: ["Carbon Bot"]
   config: ["Sparse + DIET(bow) + ResponseSelector(bow)"]

```
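To make the `include` syntax and its `all`, `all-nlu` and `all-core` shortcuts concrete, here is a minimal sketch of how such a configuration could be expanded into (dataset, configuration) pairs. It is not the workflow's actual implementation. The names `expand` and `_resolve` are hypothetical, and the pairing rule (NLU configurations only with datasets marked NLU, Core configurations only with datasets marked Core) is an assumption. The reports in this thread list fewer configurations for some datasets (Hermit, for example), so the real workflow evidently filters further.

```python
# Names copied from the "Available ..." lists in the template above.
CORE_DATASETS = ["Sara", "financial-demo", "helpdesk-assistant",
                 "insurance-demo", "retail-demo"]
NLU_DATASETS = ["Carbon Bot", "Hermit", "Private 1", "Private 2",
                "Private 3"] + CORE_DATASETS
NLU_CONFIGS = [
    "BERT + DIET(bow) + ResponseSelector(bow)",
    "BERT + DIET(seq) + ResponseSelector(t2t)",
    "Spacy + DIET(bow) + ResponseSelector(bow)",
    "Spacy + DIET(seq) + ResponseSelector(t2t)",
    "Sparse + BERT + DIET(bow) + ResponseSelector(bow)",
    "Sparse + BERT + DIET(seq) + ResponseSelector(t2t)",
    "Sparse + DIET(bow) + ResponseSelector(bow)",
    "Sparse + DIET(seq) + ResponseSelector(t2t)",
    "Sparse + Spacy + DIET(bow) + ResponseSelector(bow)",
    "Sparse + Spacy + DIET(seq) + ResponseSelector(t2t)",
]
CORE_CONFIGS = ["Rules", "Rules + AugMemo", "Rules + AugMemo + TED",
                "Rules + Memo", "Rules + Memo + TED", "Rules + TED"]


def _resolve(names, nlu, core):
    """Replace the "all", "all-nlu" and "all-core" shortcuts by explicit names."""
    resolved = []
    for name in names:
        if name == "all":
            candidates = nlu + core
        elif name == "all-nlu":
            candidates = nlu
        elif name == "all-core":
            candidates = core
        else:
            candidates = [name]
        for candidate in candidates:
            if candidate not in resolved:
                resolved.append(candidate)
    return resolved


def expand(include):
    """Expand a parsed `include` list into unique (dataset, config) pairs."""
    pairs = []
    for entry in include:
        datasets = _resolve(entry["dataset"], NLU_DATASETS, CORE_DATASETS)
        configs = _resolve(entry["config"], NLU_CONFIGS, CORE_CONFIGS)
        for dataset in datasets:
            for config in configs:
                # Assumed rule: pair a config only with datasets of its type.
                is_nlu = config in NLU_CONFIGS and dataset in NLU_DATASETS
                is_core = config in CORE_CONFIGS and dataset in CORE_DATASETS
                if (is_nlu or is_core) and (dataset, config) not in pairs:
                    pairs.append((dataset, config))
    return pairs


print(expand([{"dataset": ["Carbon Bot"],
               "config": ["Sparse + DIET(bow) + ResponseSelector(bow)"]}]))
```

In this sketch, names that appear in none of the lists are dropped silently; a stricter version would raise an error instead.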

@github-actions
Contributor

github-actions bot commented Aug 6, 2021

/modeltest

include:
 - dataset: ["all"]
   config: ["all"]

@github-actions
Contributor

github-actions bot commented Aug 6, 2021

The model regression tests have started. This may take a while, so please be patient.
As soon as the results are ready, you'll see a new comment with them.

The configuration used can be found in the comment.

@github-actions
Contributor

github-actions bot commented Aug 7, 2021

Commit: 7ff8874, The full report is available as an artifact.

Dataset: Carbon Bot, Dataset repository branch: main, commit: 624f54ebc82536b144d8eebf40c27369c93fa99d

| Configuration | Run time | Intent Classification Micro F1 | Entity Recognition Micro F1 | Response Selection Micro F1 |
|---|---|---|---|---|
| BERT + DIET(bow) + ResponseSelector(bow) | test: 2m2s, train: 9m59s, total: 12m0s | 0.7942 (0.00) | 0.7529 (0.00) | 0.5316 (-0.01) |
| BERT + DIET(seq) + ResponseSelector(t2t) | test: 2m51s, train: 12m11s, total: 15m2s | 0.8000 (0.00) | 0.7766 (0.00) | 0.5298 (-0.01) |
| Sparse + BERT + DIET(bow) + ResponseSelector(bow) | test: 2m18s, train: 17m28s, total: 19m46s | 0.7981 (0.02) | 0.7529 (0.00) | 0.5762 (0.04) |
| Sparse + BERT + DIET(seq) + ResponseSelector(t2t) | test: 2m57s, train: 15m22s, total: 18m18s | 0.7806 (0.00) | 0.7935 (0.00) | 0.5430 (0.00) |
| Sparse + DIET(bow) + ResponseSelector(bow) | test: 31s, train: 8m54s, total: 9m25s | 0.7398 (-0.00) | 0.7529 (0.00) | 0.5099 (0.02) |
| Sparse + DIET(seq) + ResponseSelector(t2t) | test: 1m14s, train: 9m46s, total: 11m0s | 0.7398 (0.00) | 0.6819 (0.00) | 0.5430 (0.03) |

Dataset: Hermit, Dataset repository branch: main, commit: 624f54ebc82536b144d8eebf40c27369c93fa99d

| Configuration | Run time | Intent Classification Micro F1 | Entity Recognition Micro F1 | Response Selection Micro F1 |
|---|---|---|---|---|
| BERT + DIET(bow) + ResponseSelector(bow) | test: 4m55s, train: 41m7s, total: 46m1s | 0.8987 (0.00) | 0.7504 (0.00) | no data |
| Sparse + BERT + DIET(bow) + ResponseSelector(bow) | test: 3m52s, train: 56m47s, total: 1h0m39s | 0.8690 (0.00) | 0.7504 (0.00) | no data |
| Sparse + DIET(bow) + ResponseSelector(bow) | test: 55s, train: 43m49s, total: 44m44s | 0.8374 (0.01) | 0.7504 (0.00) | no data |

Dataset: Private 1, Dataset repository branch: main, commit: 624f54ebc82536b144d8eebf40c27369c93fa99d

| Configuration | Run time | Intent Classification Micro F1 | Entity Recognition Micro F1 | Response Selection Micro F1 |
|---|---|---|---|---|
| BERT + DIET(bow) + ResponseSelector(bow) | test: 3m31s, train: 11m4s, total: 14m34s | 0.9096 (0.00) | 0.9612 (0.00) | no data |
| BERT + DIET(seq) + ResponseSelector(t2t) | test: 3m29s, train: 11m11s, total: 14m39s | 0.9137 (0.00) | 0.9709 (0.00) | no data |
| Spacy + DIET(bow) + ResponseSelector(bow) | test: 27s, train: 5m52s, total: 6m19s | 0.8420 (0.00) | 0.9574 (0.00) | no data |
| Spacy + DIET(seq) + ResponseSelector(t2t) | test: 1m21s, train: 13m35s, total: 14m56s | 0.8555 (0.00) | 0.9402 (0.00) | no data |
| Sparse + DIET(bow) + ResponseSelector(bow) | test: 21s, train: 7m35s, total: 7m56s | 0.9012 (0.00) | 0.9612 (0.00) | no data |
| Sparse + DIET(seq) + ResponseSelector(t2t) | test: 49s, train: 8m40s, total: 9m28s | 0.8992 (-0.00) | 0.9671 (-0.00) | no data |
| Sparse + Spacy + DIET(bow) + ResponseSelector(bow) | test: 29s, train: 10m20s, total: 10m49s | 0.8940 (0.00) | 0.9574 (0.00) | no data |
| Sparse + Spacy + DIET(seq) + ResponseSelector(t2t) | test: 1m26s, train: 13m57s, total: 15m22s | 0.9002 (0.00) | 0.9672 (-0.00) | no data |

Dataset: Private 2, Dataset repository branch: main, commit: 624f54ebc82536b144d8eebf40c27369c93fa99d

| Configuration | Run time | Intent Classification Micro F1 | Entity Recognition Micro F1 | Response Selection Micro F1 |
|---|---|---|---|---|
| BERT + DIET(bow) + ResponseSelector(bow) | test: 4m57s, train: 1h3m15s, total: 1h8m11s | 0.8745 (0.00) | no data | no data |
| BERT + DIET(seq) + ResponseSelector(t2t) | test: 3m21s, train: 49m5s, total: 52m26s | 0.8830 (0.00) | no data | no data |
| Spacy + DIET(bow) + ResponseSelector(bow) | test: 27s, train: 23m6s, total: 23m32s | 0.7253 (0.00) | no data | no data |
| Spacy + DIET(seq) + ResponseSelector(t2t) | test: 38s, train: 42m46s, total: 43m23s | 0.7811 (0.00) | no data | no data |
| Sparse + DIET(bow) + ResponseSelector(bow) | test: 35s, train: 28m32s, total: 29m6s | 0.8509 (0.01) | no data | no data |
| Sparse + DIET(seq) + ResponseSelector(t2t) | test: 34s, train: 37m16s, total: 37m50s | 0.8541 (0.00) | no data | no data |
| Sparse + Spacy + DIET(bow) + ResponseSelector(bow) | test: 33s, train: 33m5s, total: 33m38s | 0.8616 (0.01) | no data | no data |
| Sparse + Spacy + DIET(seq) + ResponseSelector(t2t) | test: 48s, train: 50m42s, total: 51m29s | 0.8498 (-0.00) | no data | no data |

Dataset: Private 3, Dataset repository branch: main, commit: 624f54ebc82536b144d8eebf40c27369c93fa99d

| Configuration | Run time | Intent Classification Micro F1 | Entity Recognition Micro F1 | Response Selection Micro F1 |
|---|---|---|---|---|
| BERT + DIET(bow) + ResponseSelector(bow) | test: 1m33s, train: 2m51s, total: 4m24s | 0.9177 (0.00) | no data | no data |
| BERT + DIET(seq) + ResponseSelector(t2t) | test: 1m34s, train: 2m46s, total: 4m19s | 0.8436 (0.00) | no data | no data |
| Spacy + DIET(bow) + ResponseSelector(bow) | test: 30s, train: 1m10s, total: 1m40s | 0.6173 (0.00) | no data | no data |
| Spacy + DIET(seq) + ResponseSelector(t2t) | test: 40s, train: 2m12s, total: 2m51s | 0.6214 (0.00) | no data | no data |
| Sparse + DIET(bow) + ResponseSelector(bow) | test: 37s, train: 2m12s, total: 2m48s | 0.8683 (0.00) | no data | no data |
| Sparse + DIET(seq) + ResponseSelector(t2t) | test: 30s, train: 1m35s, total: 2m5s | 0.8642 (0.00) | no data | no data |
| Sparse + Spacy + DIET(bow) + ResponseSelector(bow) | test: 34s, train: 2m10s, total: 2m43s | 0.8477 (0.00) | no data | no data |
| Sparse + Spacy + DIET(seq) + ResponseSelector(t2t) | test: 33s, train: 1m48s, total: 2m21s | 0.8601 (0.00) | no data | no data |

Dataset: Sara, Dataset repository branch: main, commit: 624f54ebc82536b144d8eebf40c27369c93fa99d

| Configuration | Run time | Intent Classification Micro F1 | Entity Recognition Micro F1 | Response Selection Micro F1 |
|---|---|---|---|---|
| BERT + DIET(bow) + ResponseSelector(bow) | test: 5m55s, train: 14m21s, total: 20m15s | 0.7126 (-0.00) | 0.7895 (0.00) | 0.7975 (0.01) |
| BERT + DIET(seq) + ResponseSelector(t2t) | test: 9m11s, train: 24m15s, total: 33m26s | 0.7087 (0.00) | 0.7903 (0.00) | 0.7736 (-0.00) |
| Sparse + BERT + DIET(bow) + ResponseSelector(bow) | test: 5m47s, train: 26m22s, total: 32m9s | 0.6923 (0.00) | 0.7895 (0.00) | 0.8031 (0.03) |
| Sparse + BERT + DIET(seq) + ResponseSelector(t2t) | test: 7m8s, train: 21m35s, total: 28m43s | 0.6962 (-0.00) | 0.7849 (-0.00) | 0.7953 (-0.00) |
| Sparse + DIET(bow) + ResponseSelector(bow) | test: 1m27s, train: 24m20s, total: 25m47s | 0.6577 (-0.01) | 0.7895 (0.00) | 0.7758 (-0.01) |
| Sparse + DIET(seq) + ResponseSelector(t2t) | test: 1m38s, train: 15m6s, total: 16m43s | 0.6731 (-0.00) | 0.7778 (0.01) | 0.7876 (-0.01) |

| Dialog Policy Configuration | Action Level Micro Avg. F1 | Conversation Level Accuracy | Run Time Train | Run Time Test |
|---|---|---|---|---|
| Rules | 0.1266 (0.00) | 0.0000 (0.00) | 2m37s | 1m10s |
| Rules + AugMemo | 0.9229 (0.00) | 0.6644 (0.00) | 2m38s | 2m52s |
| Rules + AugMemo + TED | 0.9733 (0.00) | 0.7603 (0.01) | 9m49s | 4m46s |
| Rules + Memo | 0.3860 (0.00) | 0.1438 (0.00) | 2m40s | 1m18s |
| Rules + Memo + TED | 0.9570 (-0.00) | 0.6575 (-0.00) | 6m59s | 1m53s |
| Rules + TED | 0.9570 (0.00) | 0.6575 (-0.00) | 7m5s | 1m47s |

Dataset: financial-demo, Dataset repository branch: main (external repository), commit: bc73f669152147ae7f880ecba172c6e44f59e2c5
Configuration repository branch: main

| Configuration | Run time | Intent Classification Micro F1 | Entity Recognition Micro F1 | Response Selection Micro F1 |
|---|---|---|---|---|
| BERT + DIET(bow) + ResponseSelector(bow) | test: 1m0s, train: 2m36s, total: 3m35s | 1.0000 (0.00) | 0.8333 (0.00) | no data |
| BERT + DIET(seq) + ResponseSelector(t2t) | test: 1m19s, train: 3m31s, total: 4m49s | 1.0000 (0.00) | 0.8333 (0.00) | no data |
| Sparse + BERT + DIET(bow) + ResponseSelector(bow) | test: 49s, train: 2m28s, total: 3m17s | 1.0000 (0.00) | 0.8333 (0.00) | no data |
| Sparse + BERT + DIET(seq) + ResponseSelector(t2t) | test: 1m34s, train: 3m38s, total: 5m12s | 1.0000 (0.00) | 0.8800 (0.00) | no data |
| Sparse + DIET(bow) + ResponseSelector(bow) | test: 17s, train: 1m4s, total: 1m21s | 0.9643 (0.00) | 0.8333 (0.00) | no data |
| Sparse + DIET(seq) + ResponseSelector(t2t) | test: 35s, train: 1m27s, total: 2m1s | 0.9643 (0.00) | 0.8800 (0.00) | no data |

| Dialog Policy Configuration | Action Level Micro Avg. F1 | Conversation Level Accuracy | Run Time Train | Run Time Test |
|---|---|---|---|---|
| Rules | 0.7218 (0.00) | 0.5417 (0.00) | 15s | 11s |
| Rules + AugMemo | 1.0000 (0.00) | 1.0000 (0.00) | 15s | 11s |
| Rules + AugMemo + TED | 1.0000 (0.00) | 1.0000 (0.00) | 1m19s | 33s |
| Rules + Memo | 0.9807 (0.00) | 0.9167 (0.00) | 13s | 10s |
| Rules + Memo + TED | 1.0000 (0.00) | 1.0000 (0.00) | 1m20s | 30s |
| Rules + TED | 0.9144 (0.00) | 0.7083 (0.00) | 1m7s | 27s |

Dataset: helpdesk-assistant, Dataset repository branch: main (external repository), commit: 8e74cea8f931705239613a989282bba8bc6af5c7
Configuration repository branch: main

| Configuration | Run time | Intent Classification Micro F1 | Entity Recognition Micro F1 | Response Selection Micro F1 |
|---|---|---|---|---|
| BERT + DIET(bow) + ResponseSelector(bow) | test: 50s, train: 1m58s, total: 2m48s | 1.0000 (0.00) | no data | no data |
| BERT + DIET(seq) + ResponseSelector(t2t) | test: 1m29s, train: 3m36s, total: 5m4s | 1.0000 (0.00) | no data | no data |
| Sparse + BERT + DIET(bow) + ResponseSelector(bow) | test: 52s, train: 2m18s, total: 3m10s | 1.0000 (0.00) | no data | no data |
| Sparse + BERT + DIET(seq) + ResponseSelector(t2t) | test: 1m24s, train: 2m38s, total: 4m2s | 1.0000 (0.00) | no data | no data |
| Sparse + DIET(bow) + ResponseSelector(bow) | test: 17s, train: 1m2s, total: 1m18s | 1.0000 (0.00) | no data | no data |
| Sparse + DIET(seq) + ResponseSelector(t2t) | test: 33s, train: 1m11s, total: 1m43s | 1.0000 (0.00) | no data | no data |

| Dialog Policy Configuration | Action Level Micro Avg. F1 | Conversation Level Accuracy | Run Time Train | Run Time Test |
|---|---|---|---|---|
| Rules | 0.5714 (0.00) | 0.2500 (0.00) | 6s | 7s |
| Rules + AugMemo | 1.0000 (0.00) | 1.0000 (0.00) | 7s | 7s |
| Rules + AugMemo + TED | 1.0000 (0.00) | 1.0000 (0.00) | 57s | 25s |
| Rules + Memo | 0.9796 (0.00) | 0.9167 (0.00) | 8s | 10s |
| Rules + Memo + TED | 1.0000 (0.00) | 1.0000 (0.00) | 50s | 23s |
| Rules + TED | 1.0000 (0.00) | 1.0000 (0.00) | 51s | 22s |

Dataset: insurance-demo, Dataset repository branch: main (external repository), commit: 03ec1d9b9f2a41a877381ad24bfd91a3994692b7
Configuration repository branch: main

| Configuration | Run time | Intent Classification Micro F1 | Entity Recognition Micro F1 | Response Selection Micro F1 |
|---|---|---|---|---|
| BERT + DIET(bow) + ResponseSelector(bow) | test: 58s, train: 2m35s, total: 3m33s | 1.0000 (0.00) | 1.0000 (0.00) | no data |
| BERT + DIET(seq) + ResponseSelector(t2t) | test: 1m12s, train: 1m57s, total: 3m9s | 1.0000 (0.00) | 0.0000 (0.00) | no data |
| Sparse + BERT + DIET(bow) + ResponseSelector(bow) | test: 54s, train: 2m17s, total: 3m12s | 1.0000 (0.00) | 1.0000 (0.00) | no data |
| Sparse + BERT + DIET(seq) + ResponseSelector(t2t) | test: 1m30s, train: 2m22s, total: 3m51s | 1.0000 (0.00) | 1.0000 (0.00) | no data |
| Sparse + DIET(bow) + ResponseSelector(bow) | test: 13s, train: 34s, total: 47s | 1.0000 (0.00) | 1.0000 (0.00) | no data |
| Sparse + DIET(seq) + ResponseSelector(t2t) | test: 41s, train: 1m9s, total: 1m49s | 1.0000 (0.00) | 1.0000 (0.00) | no data |

| Dialog Policy Configuration | Action Level Micro Avg. F1 | Conversation Level Accuracy | Run Time Train | Run Time Test |
|---|---|---|---|---|
| Rules | 0.5909 (0.00) | 0.0000 (0.00) | 11s | 14s |
| Rules + AugMemo | 1.0000 (0.00) | 1.0000 (0.00) | 9s | 9s |
| Rules + AugMemo + TED | 1.0000 (0.00) | 1.0000 (0.00) | 1m28s | 24s |
| Rules + Memo | 0.7600 (0.00) | 0.5000 (0.00) | 9s | 8s |
| Rules + Memo + TED | 1.0000 (0.00) | 1.0000 (0.00) | 1m54s | 29s |
| Rules + TED | 1.0000 (0.00) | 1.0000 (0.00) | 1m27s | 23s |

Dataset: retail-demo, Dataset repository branch: main (external repository), commit: 4c6a477775dbd538782278ca4c2fb651a63d3c29
Configuration repository branch: main

| Configuration | Run time | Intent Classification Micro F1 | Entity Recognition Micro F1 | Response Selection Micro F1 |
|---|---|---|---|---|
| BERT + DIET(bow) + ResponseSelector(bow) | test: 50s, train: 1m37s, total: 2m27s | 0.8387 (0.00) | 0.2857 (0.00) | no data |
| BERT + DIET(seq) + ResponseSelector(t2t) | test: 1m34s, train: 2m36s, total: 4m9s | 0.8750 (0.00) | 0.2857 (0.00) | no data |
| Sparse + BERT + DIET(bow) + ResponseSelector(bow) | test: 50s, train: 1m48s, total: 2m37s | 0.9375 (0.00) | 0.2857 (0.00) | no data |
| Sparse + BERT + DIET(seq) + ResponseSelector(t2t) | test: 1m27s, train: 2m19s, total: 3m45s | 0.8125 (0.00) | 0.2857 (0.00) | no data |
| Sparse + DIET(bow) + ResponseSelector(bow) | test: 18s, train: 48s, total: 1m6s | 1.0000 (0.00) | 0.2857 (0.00) | no data |
| Sparse + DIET(seq) + ResponseSelector(t2t) | test: 42s, train: 1m5s, total: 1m47s | 0.9375 (0.00) | 0.2857 (0.00) | no data |

| Dialog Policy Configuration | Action Level Micro Avg. F1 | Conversation Level Accuracy | Run Time Train | Run Time Test |
|---|---|---|---|---|
| Rules | 0.9531 (0.00) | 0.7778 (0.00) | 7s | 11s |
| Rules + AugMemo | 0.9692 (0.00) | 0.8889 (0.00) | 6s | 10s |
| Rules + AugMemo + TED | 1.0000 (0.00) | 1.0000 (0.00) | 46s | 27s |
| Rules + Memo | 0.9692 (0.00) | 0.8889 (0.00) | 6s | 10s |
| Rules + Memo + TED | 1.0000 (0.00) | 1.0000 (0.00) | 48s | 30s |
| Rules + TED | 1.0000 (0.00) | 1.0000 (0.00) | 46s | 28s |

@stale

stale bot commented Apr 16, 2022

This PR has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the stale label on Apr 16, 2022
m-vdb closed this on Jun 2, 2022
m-vdb deleted the tf-2.5 branch on June 2, 2022 at 07:20
5 participants