bug when training #2

Open
yqf2000119 opened this issue May 25, 2024 · 2 comments

@yqf2000119

Hi! I ran into an error when I started training. The full message is:
[WARNING]your gpu arch (8, 9) isn't compiled in prebuilt, may cause invalid device function. available: {(6, 1), (7, 0), (8, 0), (8, 6), (6, 0), (7, 5), (5, 2)}

Error executing job with overrides: ['+experiment=urban']
Traceback (most recent call last):
File "/home/glose/mambaforge/envs/learnableearthparser/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 723, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/home/glose/mambaforge/envs/learnableearthparser/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 811, in _fit_impl
results = self._run(model, ckpt_path=self.ckpt_path)
File "/home/glose/mambaforge/envs/learnableearthparser/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1236, in _run
results = self._run_stage()
File "/home/glose/mambaforge/envs/learnableearthparser/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1323, in _run_stage
return self._run_train()
File "/home/glose/mambaforge/envs/learnableearthparser/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1353, in _run_train
self.fit_loop.run()
File "/home/glose/mambaforge/envs/learnableearthparser/lib/python3.9/site-packages/pytorch_lightning/loops/base.py", line 204, in run
self.advance(*args, **kwargs)
File "/home/glose/mambaforge/envs/learnableearthparser/lib/python3.9/site-packages/pytorch_lightning/loops/fit_loop.py", line 269, in advance
self._outputs = self.epoch_loop.run(self._data_fetcher)
File "/home/glose/mambaforge/envs/learnableearthparser/lib/python3.9/site-packages/pytorch_lightning/loops/base.py", line 204, in run
self.advance(*args, **kwargs)
File "/home/glose/mambaforge/envs/learnableearthparser/lib/python3.9/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 208, in advance
batch_output = self.batch_loop.run(batch, batch_idx)
File "/home/glose/mambaforge/envs/learnableearthparser/lib/python3.9/site-packages/pytorch_lightning/loops/base.py", line 204, in run
self.advance(*args, **kwargs)
File "/home/glose/mambaforge/envs/learnableearthparser/lib/python3.9/site-packages/pytorch_lightning/loops/batch/training_batch_loop.py", line 88, in advance
outputs = self.optimizer_loop.run(split_batch, optimizers, batch_idx)
File "/home/glose/mambaforge/envs/learnableearthparser/lib/python3.9/site-packages/pytorch_lightning/loops/base.py", line 204, in run
self.advance(*args, **kwargs)
File "/home/glose/mambaforge/envs/learnableearthparser/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 203, in advance
result = self._run_optimization(
File "/home/glose/mambaforge/envs/learnableearthparser/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 256, in _run_optimization
self._optimizer_step(optimizer, opt_idx, batch_idx, closure)
File "/home/glose/mambaforge/envs/learnableearthparser/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 369, in _optimizer_step
self.trainer._call_lightning_module_hook(
File "/home/glose/mambaforge/envs/learnableearthparser/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1595, in _call_lightning_module_hook
output = fn(*args, **kwargs)
File "/home/glose/mambaforge/envs/learnableearthparser/lib/python3.9/site-packages/pytorch_lightning/core/lightning.py", line 1646, in optimizer_step
optimizer.step(closure=optimizer_closure)
File "/home/glose/mambaforge/envs/learnableearthparser/lib/python3.9/site-packages/pytorch_lightning/core/optimizer.py", line 168, in step
step_output = self._strategy.optimizer_step(self._optimizer, self._optimizer_idx, closure, **kwargs)
File "/home/glose/mambaforge/envs/learnableearthparser/lib/python3.9/site-packages/pytorch_lightning/strategies/strategy.py", line 193, in optimizer_step
return self.precision_plugin.optimizer_step(model, optimizer, opt_idx, closure, **kwargs)
File "/home/glose/mambaforge/envs/learnableearthparser/lib/python3.9/site-packages/pytorch_lightning/plugins/precision/precision_plugin.py", line 155, in optimizer_step
return optimizer.step(closure=closure, **kwargs)
File "/home/glose/mambaforge/envs/learnableearthparser/lib/python3.9/site-packages/torch/optim/optimizer.py", line 88, in wrapper
return func(*args, **kwargs)
File "/home/glose/mambaforge/envs/learnableearthparser/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/home/glose/mambaforge/envs/learnableearthparser/lib/python3.9/site-packages/torch/optim/adamw.py", line 100, in step
loss = closure()
File "/home/glose/mambaforge/envs/learnableearthparser/lib/python3.9/site-packages/pytorch_lightning/plugins/precision/precision_plugin.py", line 140, in _wrap_closure
closure_result = closure()
File "/home/glose/mambaforge/envs/learnableearthparser/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 148, in __call__
self._result = self.closure(*args, **kwargs)
File "/home/glose/mambaforge/envs/learnableearthparser/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 134, in closure
step_output = self._step_fn()
File "/home/glose/mambaforge/envs/learnableearthparser/lib/python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 427, in _training_step
training_step_output = self.trainer._call_strategy_hook("training_step", *step_kwargs.values())
File "/home/glose/mambaforge/envs/learnableearthparser/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1765, in _call_strategy_hook
output = fn(*args, **kwargs)
File "/home/glose/mambaforge/envs/learnableearthparser/lib/python3.9/site-packages/pytorch_lightning/strategies/strategy.py", line 333, in training_step
return self.model.training_step(*args, **kwargs)
File "/home/glose/LearnableEarthParser-main/learnableearthparser/model/base.py", line 371, in training_step
return self.do_step(batch, batch_idx, 'train')
File "/home/glose/LearnableEarthParser-main/learnableearthparser/model/base.py", line 153, in do_step
out = self.forward(batch, tag, batch_size=batch_size, batch_idx=batch_idx)
File "/home/glose/mambaforge/envs/learnableearthparser/lib/python3.9/contextlib.py", line 79, in inner
return func(*args, **kwds)
File "/home/glose/LearnableEarthParser-main/learnableearthparser/model/base.py", line 199, in forward
self.compute_reconstruction_loss(tag, batch, batch_size, out, protos, proto_slab, None, batch_idx=batch_idx)
File "/home/glose/mambaforge/envs/learnableearthparser/lib/python3.9/contextlib.py", line 79, in inner
return func(*args, **kwds)
File "/home/glose/LearnableEarthParser-main/learnableearthparser/model/ours.py", line 132, in compute_reconstruction_loss
out["l_XP"] = compute_l_XP(out["kappa_presoftmax"], out["choice_L"], cham_x, x_lengths_LK, self.hparams.S, self.hparams.K)
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
RuntimeError: nvrtc: error: invalid value for --gpu-architecture (-arch)

nvrtc compilation failed:

#define NAN __int_as_float(0x7fffffff)
#define POS_INFINITY __int_as_float(0x7f800000)
#define NEG_INFINITY __int_as_float(0xff800000)

template<typename T>
__device__ T maximum(T a, T b) {
  return isnan(a) ? a : (a > b ? a : b);
}

template<typename T>
__device__ T minimum(T a, T b) {
  return isnan(a) ? a : (a < b ? a : b);
}

extern "C" __global__
void fused_sub_exp(float* tv_, float* tv__, float* aten_exp) {
{
  float v = __ldg(tv_ + (((long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x)) % 6ll + 448ll * (((long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x)) / 2304ll)) + 7ll * ((((long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x)) / 36ll) % 64ll));
  float v_1 = __ldg(tv__ + ((((long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x)) / 6ll) % 6ll + 448ll * (((long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x)) / 2304ll)) + 7ll * ((((long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x)) / 36ll) % 64ll));
  aten_exp[(long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x)] = expf(v - v_1);
}
}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/glose/LearnableEarthParser-main/main.py", line 42, in main
getattr(trainer, cfg.mode)(model, datamodule=datamodule)
File "/home/glose/mambaforge/envs/learnableearthparser/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 770, in fit
self._call_and_handle_interrupt(
File "/home/glose/mambaforge/envs/learnableearthparser/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 737, in _call_and_handle_interrupt
self._call_callback_hooks("on_exception", exception)
File "/home/glose/mambaforge/envs/learnableearthparser/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1636, in _call_callback_hooks
fn(self, self.lightning_module, *args, **kwargs)
File "/home/glose/mambaforge/envs/learnableearthparser/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/home/glose/LearnableEarthParser-main/learnableearthparser/callbacks/outhtml.py", line 730, in on_exception
self.do_out_html(trainer, pl_module, "on_exception", "on_exception")
File "/home/glose/mambaforge/envs/learnableearthparser/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/home/glose/LearnableEarthParser-main/learnableearthparser/callbacks/outhtml.py", line 717, in do_out_html
html += "\n" + self.get_title(trainer, pl_module) + self.get_body(trainer, pl_module, title) + "\n"
File "/home/glose/LearnableEarthParser-main/learnableearthparser/callbacks/outhtml.py", line 696, in get_body
body = "


" + self.add_text("h2", title) + self.get_metrics(trainer, pl_module) + self.get_inferences(trainer, pl_module)
File "/home/glose/mambaforge/envs/learnableearthparser/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/home/glose/LearnableEarthParser-main/learnableearthparser/callbacks/outhtml.py", line 300, in get_inferences
assert out_recs.shape[0] == color.shape[0]
AssertionError

My GPU is an RTX 4060 Ti with 16 GB of VRAM, and my system is Ubuntu 22.04. How can I solve this problem?
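
The warning at the top of the log compares the GPU's compute capability with the architectures a prebuilt binary was compiled for. A minimal sketch of that comparison, using only the values printed in the log. The helper names `arch_is_prebuilt` and `newest_prebuilt` are hypothetical and not part of any library:

```python
# Values copied from the warning at the top of the log.
GPU_ARCH = (8, 9)
PREBUILT_ARCHS = {(6, 1), (7, 0), (8, 0), (8, 6), (6, 0), (7, 5), (5, 2)}


def arch_is_prebuilt(arch, prebuilt):
    """Return True if the exact (major, minor) pair is in the prebuilt set."""
    return tuple(arch) in prebuilt


def newest_prebuilt(prebuilt):
    """Return the highest (major, minor) pair in the prebuilt set."""
    return max(prebuilt)


print(arch_is_prebuilt(GPU_ARCH, PREBUILT_ARCHS))  # False
print(newest_prebuilt(PREBUILT_ARCHS))             # (8, 6)
```

The newest architecture in that list is (8, 6), so a device reporting (8, 9) is newer than anything the prebuilt binary targets.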

@romainloiseau
Owner

I have never seen this error before. Can you check that CUDA and cuDNN are correctly installed and configured on your system? CUDA 11.8 or newer should be appropriate.
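
One way to check this from inside the training environment is to ask PyTorch what it sees. The CUDA version a PyTorch build was compiled with can differ from the toolkit installed system-wide, so both are worth comparing. The following is only a diagnostic sketch, not a fix: the function name `report_cuda_setup` is hypothetical, and the `torch` import is optional so that the script still runs where PyTorch is missing.

```python
def report_cuda_setup():
    """Collect the CUDA facts PyTorch reports; returns a dict of plain values."""
    info = {"torch_installed": False}
    try:
        import torch
    except ImportError:
        # PyTorch is not importable in this environment; nothing more to report.
        return info

    info["torch_installed"] = True
    info["torch_version"] = torch.__version__
    # CUDA version this PyTorch build was compiled with (None for CPU-only builds).
    info["torch_cuda_version"] = torch.version.cuda
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        # (major, minor) compute capability of the current device.
        info["device_capability"] = torch.cuda.get_device_capability()
        # Architectures this PyTorch build was compiled for, e.g. "sm_86".
        info["torch_arch_list"] = torch.cuda.get_arch_list()
    return info


for key, value in report_cuda_setup().items():
    print(f"{key}: {value}")
```

If `torch_cuda_version` is older than the system toolkit, or `torch_arch_list` has no entry matching `device_capability`, that mismatch would be a plausible place to look first.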

@yqf2000119
Author

yqf2000119 commented May 28, 2024

> I have never seen this error before. Can you check that CUDA and cuDNN are correctly installed and configured on your system? CUDA 11.8 or newer should be appropriate.

Hi, thanks for your reply. I am sure that CUDA and cuDNN are installed, and they work fine with other projects, which confuses me even more. My CUDA version is 11.8.
