Changing duckling url shouldn't require a model retrain #3389

Closed
c12k opened this issue May 6, 2019 · 22 comments
Labels
area:rasa-oss 🎡 Anything related to the open source Rasa framework resolution:wontfix issue is acknowledged but we will not work on this (nor will we accept contributions) type:enhancement ✨ Additions of new features or changes to existing ones, should be doable in a single PR

Comments

@c12k

c12k commented May 6, 2019

Rasa version:
Rasa core 0.14.0
Rasa nlu 0.14.4
Python version:
3.6.8
Operating system (windows, osx, ...):
osx
Issue:
Rasa NLU model creation takes the Duckling URL from the config.yml file and writes it into the metadata.json file of the trained model.
We use docker-compose for local testing and k8s for cloud test/prod.
Docker and k8s network between containers in different ways: docker uses named containers (e.g. duckling) and k8s uses localhost. So we need a different Duckling URL for local vs. cloud testing.
We've separated the URLs into environment files, but Rasa training puts the URL into the metadata.json file of the model. This means the model has to be retrained between local (docker-compose) and cloud (k8s-docker) testing. It makes more sense to keep the URL outside the model, in a config file that can be controlled by environment and build processes, so that the trained model can be copied rather than retrained for no reason other than an environment-specific URL change.
e.g.
for docker-compose "url": "http://duckling:8000",
for k8s "url": "http://localhost:8000",
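
To illustrate what "copied rather than retrained" could look like in practice, here is a minimal sketch that rewrites the stored URL in a copy of the trained model. It assumes metadata.json has a top-level "pipeline" list whose entries carry "name" and "url" keys, which is inferred from the description above and not verified against every Rasa version. The function name `set_duckling_url` is hypothetical, and this is not a supported Rasa workflow.

```python
import json
from pathlib import Path


def set_duckling_url(metadata_path, new_url, component_name="ner_duckling_http"):
    """Rewrite the Duckling `url` stored in a trained model's metadata.json.

    Assumption: the file has a top-level "pipeline" list of dicts, and the
    Duckling entry is identified by its "name" key.
    Returns the number of pipeline entries that were updated.
    """
    path = Path(metadata_path)
    metadata = json.loads(path.read_text(encoding="utf-8"))

    updated = 0
    for component in metadata.get("pipeline", []):
        if component.get("name") == component_name:
            component["url"] = new_url
            updated += 1

    # Write the file back; all other keys are left untouched.
    path.write_text(json.dumps(metadata, indent=4), encoding="utf-8")
    return updated
```

A build step could then call something like `set_duckling_url("models/nlu/metadata.json", "http://localhost:8000")` on the copied model, where the path is only a placeholder.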

Content of configuration file (config.yml):

for docker-compose:

pipeline:
# other stuff
  - name: ner_duckling_http
    url: http://duckling:8000

for cloud k8s:

pipeline:
# other stuff
  - name: ner_duckling_http
    url: http://localhost:8000

Content of domain file (domain.yml) (if used & relevant):

not relevant
@akelad
Contributor

akelad commented May 8, 2019

Thanks for raising this issue, @MetcalfeTom will get back to you about it soon.

@erohmensing
Contributor

Hey @cmcc13, does this help?

In addition to setting the default ``url`` of your duckling server in the
configuration, you can also change the url of your duckling server (without
needing to re-train your model) by setting the ``RASA_DUCKLING_HTTP_URL``
environment variable.

See relevant issue here
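
The precedence described in the quoted docs can be pictured with a small sketch: the environment variable, when set, wins over the URL from the configuration. The helper name `resolve_duckling_url` and its `environ` parameter are hypothetical and only illustrate the lookup order; this is not Rasa's actual code.

```python
import os


def resolve_duckling_url(config_url, environ=None):
    """Return the Duckling URL to use at runtime.

    `config_url` is the value from config.yml (as stored with the model).
    `environ` defaults to the process environment and is injectable so the
    lookup can be exercised without touching real environment variables.
    """
    env = os.environ if environ is None else environ
    override = env.get("RASA_DUCKLING_HTTP_URL")
    # An unset or empty variable falls back to the configured URL.
    return override if override else config_url
```

With this order, a docker-compose setup could leave the variable unset and rely on the configured `http://duckling:8000`, while a k8s deployment sets `RASA_DUCKLING_HTTP_URL` to its own address.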

@akelad akelad added the status:more-details-needed Waiting for the user to provide more details / stacktraces / answer a question label May 10, 2019
@c12k
Author

c12k commented May 12, 2019

This might be a workaround. But I think the URL should not be put into the model in the first place. The URL should be read from a YML file (config, .env or endpoints). We'll give the environment variable a go. Thanks.

@no-response no-response bot removed the status:more-details-needed Waiting for the user to provide more details / stacktraces / answer a question label May 12, 2019
@stale

stale bot commented Aug 11, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the status:stale label Aug 11, 2019
@stale

stale bot commented Aug 18, 2019

This issue has been automatically closed due to inactivity. Please create a new issue if you need more help.

@stale stale bot closed this as completed Aug 18, 2019
@sankaran45

I found the same problem: if I change the duckling http url in the config file, it requires a complete retrain. Please consider fixing this as it's very unintuitive. I spent a lot of time trying to figure out why the url change was not getting picked up before stumbling on this.

@erohmensing erohmensing changed the title from "nlu duckling url in model metadata.json creates problem with docker vs k8s" to "Changing duckling url shouldn't require a model retrain" Nov 14, 2019
@erohmensing erohmensing added type:enhancement ✨ Additions of new features or changes to existing ones, should be doable in a single PR and removed status:stale labels Nov 14, 2019
@erohmensing
Contributor

I agree. I'm not sure how best to handle this: either the URL should be part of endpoints.yml (but it would still need to be definable in the config for NLU-only models? 🤔) or its value shouldn't influence the fingerprinting.
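
As a rough illustration of the second option, here is a sketch of a fingerprint that ignores runtime-only settings, so that two configs differing only in the Duckling URL hash to the same value. This is not Rasa's fingerprinting code; `fingerprint_pipeline` and the `RUNTIME_ONLY_KEYS` set are hypothetical names, and treating `url` as runtime-only is an assumption.

```python
import hashlib
import json

# Assumption: keys that only matter at inference time and should not
# trigger a retrain when they change.
RUNTIME_ONLY_KEYS = frozenset({"url"})


def fingerprint_pipeline(pipeline, ignored_keys=RUNTIME_ONLY_KEYS):
    """Hash a pipeline config (list of component dicts), skipping ignored keys."""
    stable = [
        {key: value for key, value in component.items() if key not in ignored_keys}
        for component in pipeline
    ]
    # sort_keys makes the serialization independent of dict ordering.
    payload = json.dumps(stable, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()
```

Under this scheme, changing `url: http://duckling:8000` to `url: http://localhost:8000` would leave the fingerprint unchanged, while changing any other component setting would still alter it.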

@wochinge wochinge reopened this Nov 18, 2019
@wochinge wochinge added the area:rasa-oss 🎡 Anything related to the open source Rasa framework label Nov 18, 2019
@erohmensing
Contributor

@wochinge in progress but no assignee?

@wochinge
Contributor

Thanks, fixed it :-)

@wochinge wochinge added the resolution:wontfix issue is acknowledged but we will not work on this (nor will we accept contributions) label Jan 24, 2020
@akelad
Contributor

akelad commented Jan 27, 2020

@wochinge why are we not fixing that?

@wochinge
Contributor

Because

  • it's messy in the code
  • it's only a tiny improvement if we skip retraining when the duckling url changes (how often are you changing your duckling url?)

So basically the benefit-to-effort ratio is very poor.

@c12k
Author

c12k commented Jan 27, 2020

We have to change the duckling url regularly because our dev and prod environments are different. So we need to change this daily.
for docker-compose "url": "http://duckling:8000",
for k8s "url": "http://localhost:8000"

@Yoomtah

Yoomtah commented Jan 28, 2020

Actually @cmcc13 the prod url is now "duckling.default.svc.cluster.local.:8000" and that could change later if we do more advanced GKE service stuff. But in dev we just want to docker-compose up and let docker sort out all the networking. So @wochinge it's a bit of a headache for our team to not have this all in a config file.

@wochinge
Contributor

@Yoomtah

As far as I understand it, you have two setups, correct? One docker-compose and one K8s? And are they completely separate or are you sharing the trained models between them? Because if you are not sharing the models between these two deployments, then you have to retrain either way.

@Yoomtah

Yoomtah commented Jan 29, 2020

For each chatbot that we have (I believe it's 5), we have two model files: model-dev and model-prod. These models are identical except that they were trained with different duckling URLs. Depending on our environment we then build a Rasa docker container with one of these files.

The duckling URL is the only thing necessitating two training runs and managing two model files for each bot. We have a different action URL for dev and prod as well but this is easily changed in the endpoints.yml file.

@wochinge
Contributor

wochinge commented Jan 30, 2020

Ah, I think I'm getting it now.

  1. You train a model in the dev environment and decide it's worth promoting to the prod environment
  2. You can't promote it to the prod environment, because the duckling url is different and then you have to retrain it, right?

Would an easy workaround be to add an alias for duckling to your hosts file? (https://en.wikipedia.org/wiki/Hosts_(file))

@nmelche

nmelche commented Nov 14, 2020

This problem still exists and is very unintuitive. Every endpoint can be configured in the endpoints.yml except the duckling part. This makes automated deployment, e.g. via helm, very messy.

@s-montes-majorel

@wochinge, is there any chance this will be changed at some point? In order to avoid messing up our /etc/hosts file, we ended up budgeting for a separate, global Duckling server, used for both dev and prod. We are not completely satisfied, but it was the simplest way to avoid training the same model multiple times.

@wochinge
Contributor

@s-montes-majorel This is currently not super high on our priority list 😬 How about using environment variables for this and setting the env variable depending on the context?

@s-montes-majorel

s-montes-majorel commented Jul 27, 2021

We use an env variable for the Duckling URL. However, changing its value while leaving the config.yml unchanged is still detected as a change that requires retraining. Is the Duckling component used during training? If not, maybe it would make sense to only substitute the env variable at inference time.

I do understand that this may be low in the priority list, though :) For us, it meant budgeting the Duckling component in a different way. It could also be explained in the documentation for people deploying Rasa using separate microservices instead of one big Kubernetes.

@wochinge
Contributor

wochinge commented Aug 9, 2021

Thanks for the explanation!

> Is the Duckling component used during training?

It actually isn't 👍🏻 We are currently working on some changes for 3.0 where we could consider this 🤔

> It could also be explained in the documentation for people deploying Rasa using separate microservices instead of one big Kubernetes.

What do you mean by "one big Kubernetes"?

@s-montes-majorel

Thanks for the reply!

> What do you mean by "one big Kubernetes"?

I meant that the documentation (for Rasa X, in particular) assumes one big deployment (using Helm, Docker Compose, etc.) that takes care of all the microservices. Our current implementation is more modular. Most of the components (the tracker DB, the Duckling API, a custom event broker using Pub/Sub) run on different machines. We also have different environments (dev, pre, pro). We managed to make everything work using environment variables, except for the Duckling URL.


8 participants