Upgrade to TF 2.6 #7619
Comments
Rasa does not work on the new M1 chip until this dependency is upgraded. Just thought I would leave that as a note here.
pushed the necessary code updates to
blocked by: tensorflow/tensorflow#46511
Once this is done we should check this: #7762
Most probably we will have to wait until 2.5.
Related: #7793
Same issue here; waiting for a TF version above 2.4, thanks.
Hi, same issue here; we will need TF 2.4. Thanks, all.
Once this is done we should revisit this and see if it has been fixed:
we cannot upgrade to
Yeah, sorry, I was being unclear: at this point when I say "done" I mean whatever tensorflow update we eventually manage.
2.5 is now out! https://developer.apple.com/metal/tensorflow-plugin/ https://github.com/tensorflow/tensorflow/releases/tag/v2.5.0 Seems like this can move forward?
@TyDunn is this one still blocked? Given that we'll need to buy M1s as well, it would be great to sort this out 😅
@tmbo I'm trying to verify this atm, will update when I know more!
I am available to help migrate this over and test. Let me know if there is any ongoing work and how I can help speed it up. I have an M1 on hand and the desire to get Rasa working on it asap :)
@mpomarole thank you for the kind offer! We'll let you know if we need any support :)
@samsucik Do we have a small tensorflow code snippet with a very basic model and dummy data that could demonstrate this increase in training time? I think it would be worthwhile to ask the tensorflow folks whether we are doing something wrong or the increase is indeed expected. Also, is the increase in time observable on both CPU and GPU?
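For illustration, a minimal snippet of the kind being asked for could look like the following. This is a hypothetical sketch, not the benchmark actually used in this thread: it trains a tiny Keras model on random dummy data and prints the wall-clock training time, so the same script can be run unchanged under TF 2.3 and TF 2.5, on CPU or GPU, and the timings compared.

```python
import time

import numpy as np
import tensorflow as tf

# Dummy classification data.
x = np.random.rand(2000, 64).astype("float32")
y = np.random.randint(0, 4, size=(2000,))

# A very basic dense model, just large enough to produce a measurable training time.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(64,)),
    tf.keras.layers.Dense(4, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])

start = time.perf_counter()
model.fit(x, y, epochs=5, batch_size=32, verbose=0)
print(f"TF {tf.__version__}: training took {time.perf_counter() - start:.2f}s")
```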
@dakshvar22 I'm coming back to this after almost a week and my previous comments basically summarise where I had left off. What you're describing is the state I'd like to reach asap 😉
Yes. I saw the regression tests taking
Runtimes & failing model regression tests
To complete my previous update on model regression tests, I've re-run the tests on the Hermit dataset and the tests involving the transformer version of DIET (i.e. …). All in all, with TF 2.5, our regression test runs take ~5x longer and sometimes even get killed in the process (TF 2.5 run vs scheduled run).
Overall status update & handover notes
My work is in the … . Having observed the regression tests taking so long, I profiled the training of our small … . All of my code used for comparing TF 2.3 vs 2.5 runs of various Python examples is in this project dir. The README should help you with everything you need.
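As a rough idea of the kind of runtime profiling described above (the actual scripts live in the project dir mentioned in the comment; the model and data here are only placeholders), one could wrap training in cProfile and compare the hot spots between a TF 2.3 and a TF 2.5 environment:

```python
import cProfile
import pstats

import numpy as np
import tensorflow as tf

x = np.random.rand(1000, 32).astype("float32")
y = np.random.randint(0, 2, size=(1000,))

model = tf.keras.Sequential([tf.keras.layers.Dense(2, activation="softmax", input_shape=(32,))])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

profiler = cProfile.Profile()
profiler.enable()
model.fit(x, y, epochs=3, batch_size=32, verbose=0)
profiler.disable()

# Print the 20 most expensive calls by cumulative time; diffing this output
# between TF versions shows where the extra training time is spent.
pstats.Stats(profiler).sort_stats("cumulative").print_stats(20)
```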
By the way, as for the "investigate failed memory leak tests.. these are flakey" task in the issue description, I didn't get to look into it. I did not do memory profiling, only runtime profiling. Regarding the failing Windows tests (e.g. here), in some cases these are seemingly due to memory errors, but sometimes all we observe is that a worker crashes, so it's not entirely clear. Additionally, there are some Ubuntu tests failing too (though I think that's a bit flaky and may not be related to TF 2.5). In any case, I think that memory profiling would help a lot here.
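As a possible starting point for the memory profiling suggested above, a minimal sketch using Python's built-in tracemalloc is shown below; note that it only tracks Python-level allocations, not memory allocated inside TF's C++ runtime, so memory_profiler or OS-level RSS sampling would be needed for the full picture. The model and data are placeholders, not the actual failing tests.

```python
import tracemalloc

import numpy as np
import tensorflow as tf

x = np.random.rand(1000, 32).astype("float32")
y = np.random.randint(0, 2, size=(1000,))

model = tf.keras.Sequential([tf.keras.layers.Dense(2, activation="softmax", input_shape=(32,))])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

tracemalloc.start()
model.fit(x, y, epochs=3, batch_size=32, verbose=0)
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

# Peak Python-level allocation during training; compare across TF versions.
print(f"current: {current / 1e6:.1f} MB, peak: {peak / 1e6:.1f} MB")
```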
Related issue: #9129. This upgrade blocks our ability to fix since
Dug a little deeper into the model regression test outputs. Looks like this also affects
I'm not sure what we can generalise about the effect on CPU performance. After yesterday's results re:
@samsucik Sentry has recently flagged this error, caused by this decorator. Digging a bit through the forum, it seems to affect users who are upgrading to TF versions higher than the one currently specified in rasa.
@ancalita (Sam is currently out and I'm working on the issue with Daksh). We'll remove it as part of the upgrade, though this might be a while (about to write an update as my next comment). I think we can restrict users to 2.3.3 and wait until the upgrade, unless this error is urgent, in which case I think we can remove it and possibly create an issue for someone to look into how to enable distributed training. The latter is out of scope for this project and may be better handled by Engine.
The issue affecting our custom CRF layer has been communicated to TensorFlow here. As far as we know, there is no workaround. Waiting to hear otherwise, or for progress on the issue.
@koernerfelicia the error is not urgent; it can wait until the upgrade.
Tests for which we had to increase timeouts or memory thresholds. Once TF addresses the issue above, we should aim to bring these back down:
- test_train_persist_load_with_different_settings_non_windows
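For illustration, a per-test timeout bump of this kind might look as follows with the pytest-timeout plugin; the timeout value here is a placeholder, not the number actually used in the workarounds.

```python
import pytest


# Placeholder value: raised to absorb the TF 2.6 slowdown, to be lowered again
# once the upstream TensorFlow issue is resolved.
@pytest.mark.timeout(300)
def test_train_persist_load_with_different_settings_non_windows():
    ...
```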
See here for more workarounds: https://www.notion.so/rasa/tf-2-6-workarounds-timeouts-memory-f3706b6322214574b0e97b5689903524
Rasa X 0.42.4 solves CVE-2021-42556. Bumping Rasa OSS to 2.8.12 solves the issues in TensorFlow 2.3 (RasaHQ/rasa#7619). This breaks backward compatibility of previously trained models. It is not possible to load models trained with previous versions of Rasa Open Source. Please re-train your assistant before trying to use this version.
* other: Bump Rasa X version to 0.42.4
* Release minor version addressing security patches: Rasa X 0.42.4 solves CVE-2021-42556; bumping Rasa OSS to 2.8.12 solves the issues in TensorFlow 2.3 (RasaHQ/rasa#7619). This breaks backward compatibility of previously trained models. It is not possible to load models trained with previous versions of Rasa Open Source. Please re-train your assistant before trying to use this version.
* Add links to release notes
* Reduce the number of links to OSS release notes
Co-authored-by: github-actions <[email protected]>
Co-authored-by: Alejandro Lazaro <[email protected]>
Closing as this has been done as part of #9649
TF 2.4 is out, and hence we should update the dependency inside Rasa OS to use it. It introduces some breaking changes, as mentioned in the changelog.
Update: we are now targeting 2.6, see here.
Note that there's a draft PR here that already contains some of the necessary changes. Known remaining tasks in order to actually update the code for TF 2.5 are (this list contains only the known remaining tasks; there may be others, hence it is not an exhaustive list of all remaining tasks):

Temporarily de-prioritised follow-ups:
- cvf_diet_responset2t #9515
- cvf_diet_responset2t #9516
- LMFeaturizer.parse with TF 2.6 #9666
- test-nlu-predictors on CI with TF 2.6 #9734

Definition of done:
- No memory tests failing
- No increased timeouts for tests
- Model regression tests passing with approx. the same time and accuracy
- test-nlu-predictors on CI with TF 2.6 #9734
- Investigate and address OOM due to TF 2.6 for CI model regression tests with Hermit and BERT + DIET(seq) + ResponseSelector(t2t) or Sparse + BERT + DIET(seq) + ResponseSelector(t2t) #9798