
Incremental training regression tests #7544

Closed
wants to merge 76 commits

Conversation

dakshvar22
Contributor

Proposed changes:

  • ...

Status (please check what you already did):

  • added some tests for the functionality
  • updated the documentation
  • updated the changelog (please check changelog for instructions)
  • reformat files using black (please check Readme for instructions)

dakshvar22 and others added 17 commits December 11, 2020 10:24
…7504)

* Use fingerprinting for finetuning and add more tests

* Use all training labels for fingerprinting

* rename to action_names
* doc strings and changes needed to cvf

* added tests, small refactoring in cvf

* refactor regex featurizers and fix tests

* added tests for regex featurizer, comments and doc strings

* rename 'finetune_mode' parameter inside load

* address review comments, make ML components inside NLU loadable in finetune mode.

* try resetting default additional slots in cvf to 0, see if results go back to normal

* revert default in regex also, to see if model regression tests pass

* rectify how regex featurizer is loaded

* revert back defaults for additional vocab params in cvf and regex

* add default minimum for cvf as well

* Load core model in fine-tuning mode

* Core finetune loading test

* Test and PR comments

* Fallback to default epochs

* Test policy and ensemble fine-tuning exception cases

* Remove epoch_override from Policy.load

* Apply suggestions from code review

Co-authored-by: Tobias Wochinger <[email protected]>

* review comments and add tests for loaded diet and rs

* fix regex tests

* use kwargs

* fix

* fix train tests

* More test fixes

* Apply suggestions from code review

Co-authored-by: Daksh Varshneya <[email protected]>

* remove unneeded sklearn epochs

* Apply suggestions from code review

Co-authored-by: Tobias Wochinger <[email protected]>

* PR comments for warning strings

* Add typing

* add back invalid model tests

* handle empty sections in config

* review comments

* make core models finetunable

* add tests finetuning core policies

* add print for loaded model

* add vocabulary stats logging for cvf

* code quality

* review comments

* reduce number of finetuning epochs in tests

* Use fingerprinting for finetuning and add more tests

* review comments

* review comments

* fix tests

* Use all training labels for fingerprinting

* rename to action_names

Co-authored-by: Joseph Juzl <[email protected]>
Co-authored-by: Tobias Wochinger <[email protected]>
* Add migration guide for policies

* spelling fix

* changelog
@github-actions
Contributor

Hey @dakshvar22! 👋 To run model regression tests, comment with the /modeltest command and a configuration.

Tip 💡: The model regression tests run on push events. You can re-run them by re-adding the status:model-regression-tests label or by using the Re-run jobs button in the GitHub Actions workflow.

Tip 💡: Whenever you want to change the configuration, edit the comment that contains the previous configuration.

You can copy the following into your comment and customize it:

/modeltest

```yml
##########
## Available datasets
##########
# - "Carbon Bot"
# - "Hermit"
# - "Private 1"
# - "Private 2"
# - "Private 3"
# - "Sara"

##########
## Available configurations
##########
# - "BERT + DIET(bow) + ResponseSelector(bow)"
# - "BERT + DIET(seq) + ResponseSelector(t2t)"
# - "Spacy + DIET(bow) + ResponseSelector(bow)"
# - "Spacy + DIET(seq) + ResponseSelector(t2t)"
# - "Sparse + BERT + DIET(bow) + ResponseSelector(bow)"
# - "Sparse + BERT + DIET(seq) + ResponseSelector(t2t)"
# - "Sparse + DIET(bow) + ResponseSelector(bow)"
# - "Sparse + DIET(seq) + ResponseSelector(t2t)"
# - "Sparse + Spacy + DIET(bow) + ResponseSelector(bow)"
# - "Sparse + Spacy + DIET(seq) + ResponseSelector(t2t)"

## Example configuration
#################### syntax #################
## include:
##   - dataset: ["<dataset_name>"]
##     config: ["<configuration_name>"]
#
## Example:
## include:
##  - dataset: ["Carbon Bot"]
##    config: ["Sparse + DIET(bow) + ResponseSelector(bow)"]
#
## Shortcut:
## You can use the "all" shortcut to include all available configurations or datasets
#
## Example: Use the "Sparse + DIET(bow) + ResponseSelector(bow)" configuration
## for all available datasets
## include:
##  - dataset: ["all"]
##    config: ["Sparse + DIET(bow) + ResponseSelector(bow)"]
#
## Example: Use all available configurations for the "Carbon Bot" and "Sara" datasets
## and for the "Hermit" dataset use the "Sparse + DIET + ResponseSelector(T2T)" and
## "BERT + DIET + ResponseSelector(T2T)" configurations:
## include:
##  - dataset: ["Carbon Bot", "Sara"]
##    config: ["all"]
##  - dataset: ["Hermit"]
##    config: ["Sparse + DIET(seq) + ResponseSelector(t2t)", "BERT + DIET(seq) + ResponseSelector(t2t)"]
#
## Example: Define a branch name to check-out for a dataset repository. Default branch is 'master'
## dataset_branch: "test-branch"
## include:
##  - dataset: ["Carbon Bot", "Sara"]
##    config: ["all"]


include:
 - dataset: ["Carbon Bot"]
   config: ["Sparse + DIET(bow) + ResponseSelector(bow)"]

```
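The `include` matrix above is a cross-product: each entry pairs every listed dataset with every listed configuration, and `"all"` expands to the full list. A minimal sketch of that expansion, where `expand_include` and the hard-coded lists are illustrative assumptions rather than the actual workflow code:

```python
from itertools import product

# Datasets and configurations as listed in the comment above.
DATASETS = ["Carbon Bot", "Hermit", "Private 1", "Private 2", "Private 3", "Sara"]
CONFIGS = [
    "Sparse + DIET(bow) + ResponseSelector(bow)",
    "Sparse + DIET(seq) + ResponseSelector(t2t)",
    # ... remaining configurations from the list above
]

def expand_include(include):
    """Expand each include entry into (dataset, config) pairs,
    resolving the "all" shortcut against the known lists."""
    jobs = []
    for entry in include:
        datasets = DATASETS if entry["dataset"] == ["all"] else entry["dataset"]
        configs = CONFIGS if entry["config"] == ["all"] else entry["config"]
        jobs.extend(product(datasets, configs))
    return jobs

jobs = expand_include([
    {"dataset": ["Carbon Bot"], "config": ["Sparse + DIET(bow) + ResponseSelector(bow)"]}
])
print(jobs)  # [('Carbon Bot', 'Sparse + DIET(bow) + ResponseSelector(bow)')]
```

With `"all"` in both fields, a single entry fans out into one CI job per dataset/configuration pair, which matches the result tables further down.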

@github-actions
Contributor

/modeltest

```yml
include:
 - dataset: ["all"]
   config: ["Sparse + BERT + DIET(bow) + ResponseSelector(bow)", "Sparse + BERT + DIET(seq) + ResponseSelector(t2t)", "Sparse + DIET(bow) + ResponseSelector(bow)", "Sparse + DIET(seq) + ResponseSelector(t2t)", "Sparse + Spacy + DIET(bow) + ResponseSelector(bow)", "Sparse + Spacy + DIET(seq) + ResponseSelector(t2t)"]
```

@github-actions github-actions bot deleted a comment from dakshvar22 Dec 14, 2020
@github-actions
Contributor

The model regression tests have started. This may take a while, so please be patient.
As soon as the results are ready, you'll see a new comment with them.

The configuration used can be found in the comment above.

@github-actions
Contributor

Commit: 212eff2. The full report is available as an artifact.

Dataset: Carbon Bot, Dataset repository branch: master

| Configuration | Intent Classification Micro F1 | Entity Recognition Micro F1 | Response Selection Micro F1 |
| --- | --- | --- | --- |
| Sparse + BERT + DIET(bow) + ResponseSelector(bow)<br>test: 1m28s, train: 4m10s, total: 5m37s | 0.8097 (0.02) | 0.7529 (0.00) | 0.5581 (no data) |
| Sparse + BERT + DIET(seq) + ResponseSelector(t2t)<br>test: 1m36s, train: 4m53s, total: 6m28s | 0.8039 (0.01) | 0.7925 (-0.00) | 0.5533 (0.04) |
| Sparse + DIET(bow) + ResponseSelector(bow)<br>test: 27s, train: 2m56s, total: 3m23s | 0.7359 (0.02) | 0.7529 (0.00) | 0.5232 (no data) |
| Sparse + DIET(seq) + ResponseSelector(t2t)<br>test: 40s, train: 4m6s, total: 4m45s | 0.7437 (0.01) | 0.7079 (0.01) | 0.5099 (0.01) |

Dataset: Hermit, Dataset repository branch: master

| Configuration | Intent Classification Micro F1 | Entity Recognition Micro F1 | Response Selection Micro F1 |
| --- | --- | --- | --- |
| Sparse + BERT + DIET(bow) + ResponseSelector(bow)<br>test: 3m18s, train: 20m58s, total: 24m15s | 0.8690 (0.00) | 0.7504 (0.00) | no data |
| Sparse + BERT + DIET(seq) + ResponseSelector(t2t)<br>test: 2m42s, train: 13m26s, total: 16m7s | 0.8643 (0.00) | 0.7919 (-0.00) | no data |
| Sparse + DIET(bow) + ResponseSelector(bow)<br>test: 57s, train: 20m24s, total: 21m21s | 0.8318 (-0.01) | 0.7504 (0.00) | no data |
| Sparse + DIET(seq) + ResponseSelector(t2t)<br>test: 1m9s, train: 12m35s, total: 13m44s | 0.8346 (-0.00) | 0.7503 (-0.01) | no data |

Dataset: Private 1, Dataset repository branch: master

| Configuration | Intent Classification Micro F1 | Entity Recognition Micro F1 | Response Selection Micro F1 |
| --- | --- | --- | --- |
| Sparse + DIET(bow) + ResponseSelector(bow)<br>test: 20s, train: 3m52s, total: 4m11s | 0.9033 (-0.00) | 0.9612 (0.00) | no data |
| Sparse + DIET(seq) + ResponseSelector(t2t)<br>test: 37s, train: 3m22s, total: 3m58s | 0.8992 (-0.01) | 0.9745 (-0.00) | no data |
| Sparse + Spacy + DIET(bow) + ResponseSelector(bow)<br>test: 1m21s, train: 5m11s, total: 6m32s | 0.8929 (0.00) | 0.9574 (0.00) | no data |
| Sparse + Spacy + DIET(seq) + ResponseSelector(t2t)<br>test: 1m29s, train: 4m31s, total: 5m59s | 0.9064 (0.00) | 0.9698 (-0.00) | no data |

Dataset: Private 2, Dataset repository branch: master

| Configuration | Intent Classification Micro F1 | Entity Recognition Micro F1 | Response Selection Micro F1 |
| --- | --- | --- | --- |
| Sparse + DIET(bow) + ResponseSelector(bow)<br>test: 28s, train: 4m1s, total: 4m28s | 0.8519 (0.01) | no data | no data |
| Sparse + DIET(seq) + ResponseSelector(t2t)<br>test: 35s, train: 5m15s, total: 5m49s | 0.8552 (0.01) | no data | no data |
| Sparse + Spacy + DIET(bow) + ResponseSelector(bow)<br>test: 1m18s, train: 5m3s, total: 6m21s | 0.8412 (-0.02) | no data | no data |
| Sparse + Spacy + DIET(seq) + ResponseSelector(t2t)<br>test: 1m22s, train: 7m12s, total: 8m33s | 0.8594 (0.00) | no data | no data |

Dataset: Private 3, Dataset repository branch: master

| Configuration | Intent Classification Micro F1 | Entity Recognition Micro F1 | Response Selection Micro F1 |
| --- | --- | --- | --- |
| Sparse + DIET(seq) + ResponseSelector(t2t)<br>test: 28s, train: 45s, total: 1m12s | 0.8189 (-0.01) | no data | no data |
| Sparse + Spacy + DIET(bow) + ResponseSelector(bow)<br>test: 1m16s, train: 1m55s, total: 3m10s | 0.8107 (-0.02) | no data | no data |
| Sparse + Spacy + DIET(seq) + ResponseSelector(t2t)<br>test: 1m16s, train: 1m32s, total: 2m48s | 0.8642 (0.01) | no data | no data |

Dataset: Sara, Dataset repository branch: master

| Configuration | Intent Classification Micro F1 | Entity Recognition Micro F1 | Response Selection Micro F1 |
| --- | --- | --- | --- |
| Sparse + BERT + DIET(bow) + ResponseSelector(bow)<br>test: 2m22s, train: 8m5s, total: 10m27s | 0.8697 (-0.00) | 0.8683 (0.00) | 0.8957 (0.01) |
| Sparse + BERT + DIET(seq) + ResponseSelector(t2t)<br>test: 2m28s, train: 4m48s, total: 7m16s | 0.8756 (0.00) | 0.8944 (-0.00) | 0.9000 (0.01) |
| Sparse + DIET(bow) + ResponseSelector(bow)<br>test: 37s, train: 5m50s, total: 6m27s | 0.8374 (0.01) | 0.8683 (0.00) | 0.8630 (0.01) |
| Sparse + DIET(seq) + ResponseSelector(t2t)<br>test: 52s, train: 4m5s, total: 4m57s | 0.8433 (-0.01) | 0.8523 (0.02) | 0.8761 (-0.00) |
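The parenthesized values in the results above are deltas against the previous run's micro F1 scores. A minimal sketch of how such cells could be parsed and gated against a tolerance; the `flag_regressions` helper, the cell format, and the 0.02 tolerance are illustrative assumptions, not the actual workflow code:

```python
import re

# Matches cells like "0.8433 (-0.01)"; "no data" cells do not match.
CELL = re.compile(r"(?P<score>\d\.\d+)\s+\((?P<delta>-?\d+\.\d+)\)")

def flag_regressions(cells, tolerance=0.02):
    """Return the cells whose delta drops more than `tolerance`
    below the previous run's score."""
    flagged = []
    for cell in cells:
        m = CELL.match(cell)
        if not m:  # "no data" cells carry no comparable delta
            continue
        if float(m.group("delta")) < -tolerance:
            flagged.append(cell)
    return flagged

print(flag_regressions(["0.8433 (-0.01)", "0.8523 (0.02)", "no data"]))  # []
print(flag_regressions(["0.8107 (-0.05)"]))  # ['0.8107 (-0.05)']
```

Under such a rule, every row above is within tolerance, which is consistent with this PR's tests passing.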
