Error in backpropagation for federated_align #2

Open
ameliajimenez opened this issue Mar 29, 2021 · 5 comments

Comments

@ameliajimenez

ameliajimenez commented Mar 29, 2021

Hi,

Thank you so much for sharing the code of this work!

I've encountered a problem when running the file "federated_align". I think it is related to backpropagating the adversarial loss with retain_graph=True in lines 312-316.

Traceback (most recent call last): in
lossG.backward(retain_graph=True)
File "/home/amelia/anaconda3/envs/py36pytorch1/lib/python3.6/site-packages/torch/tensor.py", line 221, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/home/amelia/anaconda3/envs/py36pytorch1/lib/python3.6/site-packages/torch/autograd/__init__.py", line 132, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [4, 1]], which is output 0 of TBackward, is at version 2; expected version 1 instead.

Any ideas about why this is happening and how I could fix it? Thanks again!
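Here is a minimal sketch that reproduces the same class of RuntimeError. It assumes, without confirmation from the repository, that an optimizer step runs between two backward passes through one retained graph. G, D, the layer sizes and reproduce_inplace_error are hypothetical. D = nn.Linear(4, 1) is chosen only so that the saved D.weight.t() has shape [4, 1], like the tensor in the traceback. The exact message wording and version numbers may differ across PyTorch versions.

```python
import torch
import torch.nn as nn


def reproduce_inplace_error():
    """Return the RuntimeError message raised by the step-then-backward pattern, or None."""
    torch.manual_seed(0)
    # Hypothetical toy models, NOT the models used in federated_align.
    G = nn.Linear(8, 4)  # stand-in "generator" / feature extractor
    D = nn.Linear(4, 1)  # stand-in "discriminator"; D.weight.t() has shape [4, 1]
    opt_D = torch.optim.SGD(D.parameters(), lr=0.1)

    fake = G(torch.randn(16, 8))
    score = D(fake)  # autograd saves D.weight.t() to compute d(score)/d(fake)

    # Discriminator update: backward while keeping the graph, then step.
    loss_D = score.mean()
    loss_D.backward(retain_graph=True)
    opt_D.step()  # in-place update of D.weight bumps its version counter

    # Generator update reuses the retained graph, which still refers to the old weight.
    loss_G = -score.mean()
    try:
        loss_G.backward()
    except RuntimeError as err:
        return str(err)
    return None


print(reproduce_inplace_error())
```

Autograd records a version counter for each tensor it saves for backward. Because opt_D.step() modifies D.weight in place, and the saved transpose is a view sharing that counter, the second backward finds the saved tensor at a newer version than expected and refuses to use it.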

@xiaoeyuztj

I ran into the same problem today. There are two ways to solve it:

  1. Remove the BN (batch normalization) layers from the MLP model.
  2. Switch to PyTorch 1.1.0.
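As an alternative to the two workarounds above, a common PyTorch pattern is to finish every backward pass that needs the old weights before the optimizer modifies them, or to recompute the forward pass after the step. The sketch below uses hypothetical toy models G and D (not the repository's models) and has not been checked against federated_align's actual losses. It trains D on a detached output, then recomputes D(fake) for the generator loss, so retain_graph=True is no longer needed. The function name train_step_without_inplace_error is also hypothetical.

```python
import torch
import torch.nn as nn


def train_step_without_inplace_error():
    torch.manual_seed(0)
    # Hypothetical toy models, NOT the models used in federated_align.
    G = nn.Linear(8, 4)
    D = nn.Linear(4, 1)
    opt_G = torch.optim.SGD(G.parameters(), lr=0.1)
    opt_D = torch.optim.SGD(D.parameters(), lr=0.1)

    fake = G(torch.randn(16, 8))

    # 1) Discriminator update on a detached copy: no gradient reaches G
    #    and no graph has to be retained.
    opt_D.zero_grad()
    loss_D = D(fake.detach()).mean()
    loss_D.backward()
    opt_D.step()  # safe: no pending backward pass still needs the old D.weight

    # 2) Generator update: recompute D's output with the updated weights,
    #    so every saved tensor is at the version autograd expects.
    opt_G.zero_grad()
    loss_G = -D(fake).mean()
    loss_G.backward()
    opt_G.step()
    return G.weight.grad is not None and D.weight.grad is not None


print(train_step_without_inplace_error())  # True
```

The trade-off is one extra forward pass through D per iteration. In exchange, no graph is kept alive across an in-place parameter update. Whether this ordering matches the intended adversarial objective in lines 312-316 is an assumption to verify.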

@zyz0000

zyz0000 commented Mar 12, 2023

Hello! Have you managed to run the code successfully? I noticed that lines 39-48 of federated_MoE.py reference h5 files such as ./idx/NYU_sub_overlap.h5 and ./idx/NYU_sub.h5 that seem to be missing, and I can't tell how the code is supposed to generate them. Could you please provide these files? Thank you so much, and I hope to hear from you!

@yueluoshenheng

I don't have those files either. May I ask whether you have them? Thank you so much, and I hope to hear from you!

@ameliajimenez
Author

I created a modified, working version in this repository: ameliajimenez/curriculum-federated-learning. I hope it's useful for you! :)

@zyz0000

zyz0000 commented Apr 28, 2023 via email


4 participants