tensorflow.python.framework.errors_impl.InvalidArgumentError: assertion failed: [0] [Op:Assert] name: EagerVariableNameReuse #8004

Closed
dhirajgite opened this issue Feb 22, 2021 · 9 comments

Comments

@dhirajgite

dhirajgite commented Feb 22, 2021

Rasa version:
2.2.10
Rasa SDK version (if used & relevant):

Rasa X version (if used & relevant):

Python version:
3.8.5
Operating system (windows, osx, ...):
Windows-10-10.0.19041-SP0
Issue:
Training on the GPU fails with the error below; training on the CPU does not produce this error.
Other TensorFlow applications that do not use Rasa run on the same GPU without any error.
The error occurs only with Rasa versions greater than 2.0.
With versions below 2.0, training works fine.

Error (including full traceback):

2021-02-22 18:12:10.512490: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2021-02-22 18:12:13.619199: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library nvcuda.dll
2021-02-22 18:12:14.035719: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce 940MX computeCapability: 5.0
coreClock: 1.2415GHz coreCount: 3 deviceMemorySize: 4.00GiB deviceMemoryBandwidth: 13.41GiB/s
2021-02-22 18:12:14.035908: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2021-02-22 18:12:14.494922: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2021-02-22 18:12:14.801656: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2021-02-22 18:12:14.998529: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2021-02-22 18:12:15.335537: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll
2021-02-22 18:12:15.514724: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll
2021-02-22 18:12:16.037287: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2021-02-22 18:12:16.238357: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2021-02-22 18:12:17 INFO     rasa.model  - Data (messages) for NLU model section changed.
Training NLU model...
2021-02-22 18:12:27 INFO     rasa.shared.nlu.training_data.training_data  - Training data stats:
2021-02-22 18:12:27 INFO     rasa.shared.nlu.training_data.training_data  - Number of intent examples: 258 (8 distinct intents)

2021-02-22 18:12:27 INFO     rasa.shared.nlu.training_data.training_data  -   Found intents: 'help', 'deny', 'stop', 'greet', 'affirm', 'query_knowledge_base_other', 'bye', 'query_knowledge_base_self'
2021-02-22 18:12:27 INFO     rasa.shared.nlu.training_data.training_data  - Number of response examples: 0 (0 distinct responses)
2021-02-22 18:12:27 INFO     rasa.shared.nlu.training_data.training_data  - Number of entity examples: 53 (10 distinct entities)
2021-02-22 18:12:27 INFO     rasa.shared.nlu.training_data.training_data  -   Found entity types: 'asset_type', 'entity_type', 'date', 'ticket_id', 'attribute', 'asset_status', 'ticket_status', 'relation', 'sla', 'ticket_type'
2021-02-22 18:12:27 INFO     rasa.nlu.model  - Starting to train component WhitespaceTokenizer
2021-02-22 18:12:27 INFO     rasa.nlu.model  - Finished training component.
2021-02-22 18:12:27 INFO     rasa.nlu.model  - Starting to train component RegexFeaturizer
2021-02-22 18:12:27 INFO     rasa.nlu.model  - Finished training component.
2021-02-22 18:12:27 INFO     rasa.nlu.model  - Starting to train component LexicalSyntacticFeaturizer
2021-02-22 18:12:27 INFO     rasa.nlu.model  - Finished training component.
2021-02-22 18:12:27 INFO     rasa.nlu.model  - Starting to train component CountVectorsFeaturizer
2021-02-22 18:12:27 INFO     rasa.nlu.featurizers.sparse_featurizer.count_vectors_featurizer  - 250 vocabulary slots consumed out of 1250 slots configured for text attribute.
2021-02-22 18:12:27 INFO     rasa.nlu.model  - Finished training component.
2021-02-22 18:12:27 INFO     rasa.nlu.model  - Starting to train component CountVectorsFeaturizer
2021-02-22 18:12:27 INFO     rasa.nlu.featurizers.sparse_featurizer.count_vectors_featurizer  - 1853 vocabulary slots consumed out of 2853 slots configured for text attribute.
2021-02-22 18:12:27 INFO     rasa.nlu.model  - Finished training component.
2021-02-22 18:12:27 INFO     rasa.nlu.model  - Starting to train component DIETClassifier
2021-02-22 18:12:28.154212: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-02-22 18:12:28.163854: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1fb19755e00 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-02-22 18:12:28.164021: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2021-02-22 18:12:28.461270: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce 940MX computeCapability: 5.0
coreClock: 1.2415GHz coreCount: 3 deviceMemorySize: 4.00GiB deviceMemoryBandwidth: 13.41GiB/s
2021-02-22 18:12:28.461483: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2021-02-22 18:12:28.463405: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2021-02-22 18:12:28.464457: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2021-02-22 18:12:28.468068: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2021-02-22 18:12:28.468893: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll
2021-02-22 18:12:28.469502: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll
2021-02-22 18:12:28.470132: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2021-02-22 18:12:28.470888: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2021-02-22 18:12:32.813818: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-02-22 18:12:32.814000: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263]      0
2021-02-22 18:12:32.817728: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0:   N
2021-02-22 18:12:32.851106: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3129 MB memory) -> physical GPU (device: 0, name: GeForce 940MX, pci bus id: 0000:01:00.0, compute capability: 5.0)
2021-02-22 18:12:32.949937: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1fb19755500 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-02-22 18:12:32.950195: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce 940MX, Compute Capability 5.0
Traceback (most recent call last):
  File "c:\users\dhiraj_gite\anaconda3\envs\rasa2210\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\users\dhiraj_gite\anaconda3\envs\rasa2210\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\Dhiraj_Gite\anaconda3\envs\rasa2210\Scripts\rasa.exe\__main__.py", line 7, in <module>
  File "c:\users\dhiraj_gite\anaconda3\envs\rasa2210\lib\site-packages\rasa\__main__.py", line 116, in main
    cmdline_arguments.func(cmdline_arguments)
  File "c:\users\dhiraj_gite\anaconda3\envs\rasa2210\lib\site-packages\rasa\cli\train.py", line 58, in <lambda>
    train_parser.set_defaults(func=lambda args: train(args, can_exit=True))
  File "c:\users\dhiraj_gite\anaconda3\envs\rasa2210\lib\site-packages\rasa\cli\train.py", line 90, in train
    training_result = rasa.train(
  File "c:\users\dhiraj_gite\anaconda3\envs\rasa2210\lib\site-packages\rasa\train.py", line 94, in train
    return rasa.utils.common.run_in_loop(
  File "c:\users\dhiraj_gite\anaconda3\envs\rasa2210\lib\site-packages\rasa\utils\common.py", line 308, in run_in_loop
    result = loop.run_until_complete(f)
  File "c:\users\dhiraj_gite\anaconda3\envs\rasa2210\lib\asyncio\base_events.py", line 616, in run_until_complete
    return future.result()
  File "c:\users\dhiraj_gite\anaconda3\envs\rasa2210\lib\site-packages\rasa\train.py", line 163, in train_async
    return await _train_async_internal(
  File "c:\users\dhiraj_gite\anaconda3\envs\rasa2210\lib\site-packages\rasa\train.py", line 342, in _train_async_internal
    await _do_training(
  File "c:\users\dhiraj_gite\anaconda3\envs\rasa2210\lib\site-packages\rasa\train.py", line 388, in _do_training
    model_path = await _train_nlu_with_validated_data(
  File "c:\users\dhiraj_gite\anaconda3\envs\rasa2210\lib\site-packages\rasa\train.py", line 811, in _train_nlu_with_validated_data
    await rasa.nlu.train(
  File "c:\users\dhiraj_gite\anaconda3\envs\rasa2210\lib\site-packages\rasa\nlu\train.py", line 116, in train
    interpreter = trainer.train(training_data, **kwargs)
  File "c:\users\dhiraj_gite\anaconda3\envs\rasa2210\lib\site-packages\rasa\nlu\model.py", line 209, in train
    updates = component.train(working_data, self.config, **context)
  File "c:\users\dhiraj_gite\anaconda3\envs\rasa2210\lib\site-packages\rasa\nlu\classifiers\diet_classifier.py", line 814, in train
    self.model = self._instantiate_model_class(model_data)
  File "c:\users\dhiraj_gite\anaconda3\envs\rasa2210\lib\site-packages\rasa\nlu\classifiers\diet_classifier.py", line 1132, in _instantiate_model_class
    return self.model_class()(
  File "c:\users\dhiraj_gite\anaconda3\envs\rasa2210\lib\site-packages\rasa\nlu\classifiers\diet_classifier.py", line 1150, in __init__
    super().__init__("DIET", config, data_signature, label_data)
  File "c:\users\dhiraj_gite\anaconda3\envs\rasa2210\lib\site-packages\rasa\utils\tensorflow\models.py", line 700, in __init__
    super().__init__(
  File "c:\users\dhiraj_gite\anaconda3\envs\rasa2210\lib\site-packages\rasa\utils\tensorflow\models.py", line 91, in __init__
    super().__init__(**kwargs)
  File "c:\users\dhiraj_gite\anaconda3\envs\rasa2210\lib\site-packages\tensorflow\python\training\tracking\base.py", line 457, in _method_wrapper
    result = method(self, *args, **kwargs)
  File "c:\users\dhiraj_gite\anaconda3\envs\rasa2210\lib\site-packages\tensorflow\python\keras\engine\training.py", line 308, in __init__
    self._init_batch_counters()
  File "c:\users\dhiraj_gite\anaconda3\envs\rasa2210\lib\site-packages\tensorflow\python\training\tracking\base.py", line 457, in _method_wrapper
    result = method(self, *args, **kwargs)
  File "c:\users\dhiraj_gite\anaconda3\envs\rasa2210\lib\site-packages\tensorflow\python\keras\engine\training.py", line 317, in _init_batch_counters
    self._train_counter = variables.Variable(0, dtype='int64', aggregation=agg)
  File "c:\users\dhiraj_gite\anaconda3\envs\rasa2210\lib\site-packages\tensorflow\python\ops\variables.py", line 262, in __call__
    return cls._variable_v2_call(*args, **kwargs)
  File "c:\users\dhiraj_gite\anaconda3\envs\rasa2210\lib\site-packages\tensorflow\python\ops\variables.py", line 244, in _variable_v2_call
    return previous_getter(
  File "c:\users\dhiraj_gite\anaconda3\envs\rasa2210\lib\site-packages\tensorflow\python\ops\variables.py", line 237, in <lambda>
    previous_getter = lambda **kws: default_variable_creator_v2(None, **kws)
  File "c:\users\dhiraj_gite\anaconda3\envs\rasa2210\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 2633, in default_variable_creator_v2
    return resource_variable_ops.ResourceVariable(
  File "c:\users\dhiraj_gite\anaconda3\envs\rasa2210\lib\site-packages\tensorflow\python\ops\variables.py", line 264, in __call__
    return super(VariableMetaclass, cls).__call__(*args, **kwargs)
  File "c:\users\dhiraj_gite\anaconda3\envs\rasa2210\lib\site-packages\tensorflow\python\ops\resource_variable_ops.py", line 1507, in __init__
    self._init_from_args(
  File "c:\users\dhiraj_gite\anaconda3\envs\rasa2210\lib\site-packages\tensorflow\python\ops\resource_variable_ops.py", line 1661, in _init_from_args
    handle = eager_safe_variable_handle(
  File "c:\users\dhiraj_gite\anaconda3\envs\rasa2210\lib\site-packages\tensorflow\python\ops\resource_variable_ops.py", line 242, in eager_safe_variable_handle
    return _variable_handle_from_shape_and_dtype(
  File "c:\users\dhiraj_gite\anaconda3\envs\rasa2210\lib\site-packages\tensorflow\python\ops\resource_variable_ops.py", line 174, in _variable_handle_from_shape_and_dtype
    gen_logging_ops._assert(  # pylint: disable=protected-access
  File "c:\users\dhiraj_gite\anaconda3\envs\rasa2210\lib\site-packages\tensorflow\python\ops\gen_logging_ops.py", line 49, in _assert
    _ops.raise_from_not_ok_status(e, name)
  File "c:\users\dhiraj_gite\anaconda3\envs\rasa2210\lib\site-packages\tensorflow\python\framework\ops.py", line 6843, in raise_from_not_ok_status
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError: assertion failed: [0] [Op:Assert] name: EagerVariableNameReuse

Command or request that led to error:
rasa train in command prompt

Content of configuration file (config.yml) (if relevant):

language: en

pipeline:
  - name: "WhitespaceTokenizer"
  - name: "RegexFeaturizer"
  - name: "LexicalSyntacticFeaturizer"
  - name: "CountVectorsFeaturizer"
  - name: "CountVectorsFeaturizer"
    analyzer: "char_wb"
    min_ngram: 1
    max_ngram: 4
  - name: "DIETClassifier"
    epochs: 100

policies:
  - name: MemoizationPolicy
  - name: TEDPolicy
    max_history: 5
    epochs: 200
@sara-tagger
Collaborator

Thanks for the issue, @erohmensing will get back to you about it soon!

You may find help in the docs and the forum, too 🤗

@erohmensing
Contributor

Hi @dhirajgite, please fill out the entire issue template for a bug report. We'll need the full traceback in order to be able to look into this issue.

@erohmensing erohmensing added the status:more-details-needed Waiting for the user to provide more details / stacktraces / answer a question label Feb 22, 2021
@dhirajgite
Author

@erohmensing updated info

@no-response no-response bot removed the status:more-details-needed Waiting for the user to provide more details / stacktraces / answer a question label Feb 22, 2021
@koernerfelicia
Contributor

@dhirajgite can you make sure that no other processes using TensorFlow are running at the same time? You can list the processes on the GPU by running nvidia-smi.
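For anyone who prefers to script this check, here is a minimal sketch that asks nvidia-smi for the compute processes it currently sees. It assumes nvidia-smi is on the PATH and uses its `--query-compute-apps` and `--format=csv,noheader` options; the function names are made up for illustration and are not part of Rasa or TensorFlow. On some Windows setups the memory column may come back as `[N/A]`, so it is kept as plain text.

```python
import csv
import io
import shutil
import subprocess

# Fields requested from `nvidia-smi --query-compute-apps`.
QUERY_FIELDS = "pid,process_name,used_memory"


def parse_compute_apps(csv_text):
    """Parse `--format=csv,noheader` output into a list of dicts."""
    rows = []
    for record in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if len(record) != 3:
            continue  # skip blank or malformed lines
        pid, name, memory = (field.strip() for field in record)
        if not pid.isdigit():
            continue  # skip rows without a numeric process id
        rows.append({"pid": int(pid), "process_name": name, "used_memory": memory})
    return rows


def list_gpu_compute_processes():
    """Return the compute processes nvidia-smi reports, or None if it cannot be run."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", f"--query-compute-apps={QUERY_FIELDS}", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return None
    return parse_compute_apps(result.stdout)


if __name__ == "__main__":
    processes = list_gpu_compute_processes()
    if processes is None:
        print("nvidia-smi is not available or returned an error")
    elif not processes:
        print("no compute processes reported on the GPU")
    else:
        for process in processes:
            print(process)
```

If more than one Python process shows up in the output, stop the others before running rasa train again.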

@dhirajgite
Author

dhirajgite commented Feb 23, 2021

@koernerfelicia No processes using TensorFlow are running on the GPU.

@koernerfelicia
Contributor

@dhirajgite what tensorflow version are you using?

@dhirajgite
Author

tensorflow version = 2.3.2
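A small sketch for collecting the versions asked about in this thread in one go, run inside the same environment that runs rasa train. It relies only on the standard library (`importlib.metadata`, available from Python 3.8); the package list and function names are assumptions for illustration.

```python
from importlib import metadata
import platform


def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def environment_report(packages=("rasa", "tensorflow", "tensorflow-gpu")):
    """Collect the Python, OS and package versions relevant to a bug report."""
    report = {"python": platform.python_version(), "os": platform.platform()}
    for package in packages:
        report[package] = installed_version(package)
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value or 'not installed'}")
```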

@dhirajgite
Author

@koernerfelicia is there any update?

@koernerfelicia
Contributor

@dhirajgite, I'm sorry, I haven't been able to replicate this. I think it may be a result of your specific combination of tensorflow version (2.3) and operating system (Windows), and not directly related to rasa.

I've found others with this problem here.
They solved this either by:

  1. killing concurrent processes (we've ruled this out for you)
  2. downgrading tensorflow to 2.2. You would need to downgrade rasa as well. Unfortunately, we cannot update rasa's dependency to tensorflow 2.4, because of a bug on tensorflow's side. They expect to fix this bug in 2.5, at which point rasa will re-evaluate updating our dependency.
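To see whether a given machine matches the combination suspected above (TensorFlow 2.3 on Windows), a rough check could look like the sketch below. It encodes only the hypothesis from this thread, not a confirmed root cause, and the function names are hypothetical.

```python
import platform


def major_minor(version):
    """Return (major, minor) as ints from a version string such as '2.3.2'."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def matches_suspected_combination(tf_version, os_name=None):
    """True for TensorFlow 2.3.x on Windows, the pairing suspected in this thread."""
    if os_name is None:
        os_name = platform.system()  # "Windows", "Linux" or "Darwin"
    return os_name == "Windows" and major_minor(tf_version) == (2, 3)


if __name__ == "__main__":
    # 2.3.2 is the TensorFlow version reported earlier in this issue.
    print(matches_suspected_combination("2.3.2"))
```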

Other alternatives: we do not see this problem on Linux with a GPU. If you have access to Google Cloud or similar you should be able to set up rasa with GPU support there.
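The original report says that training on the CPU does not hit this error, so one further fallback, not proposed anywhere in this thread, would be to hide the GPU from TensorFlow for the training run. The sketch below does this by setting the `CUDA_VISIBLE_DEVICES` environment variable to `-1` for a child process; the helper names are hypothetical, and it assumes the rasa executable is on the PATH.

```python
import os
import subprocess


def cpu_only_environment(base_env=None):
    """Copy the environment and hide all CUDA devices from the child process."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = "-1"  # no GPU index matches -1, so none is visible
    return env


def train_on_cpu(command=("rasa", "train")):
    """Run the training command with GPUs hidden and return its exit code."""
    completed = subprocess.run(list(command), env=cpu_only_environment(), check=False)
    return completed.returncode


if __name__ == "__main__":
    raise SystemExit(train_on_cpu())
```

Training will be slower than on a working GPU setup, but the current environment stays unchanged because the variable is set only for the child process.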

Closing this issue as I have no further information, and this problem does not seem to be in our hands. But please feel free to update or ask questions.


4 participants