This repository has been archived by the owner on Oct 13, 2022. It is now read-only.

Make DDP master port configurable #158

Merged
merged 1 commit into from
Apr 14, 2021


danpovey
Contributor

No description provided.
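
Since the pull request has no description, here is a minimal sketch of what a configurable DDP master port can look like. It is an illustration under stated assumptions, not the code merged in this PR: the flag name `--master-port`, its default value, and the helper `set_rendezvous_env` are hypothetical. Only `--world-size` appears in this thread. The sketch relies on the fact that `torch.distributed`'s environment-variable initialization reads `MASTER_ADDR` and `MASTER_PORT`.

```python
import argparse
import os


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--world-size", type=int, default=1)
    # Hypothetical flag name and default; the real script's option may differ.
    parser.add_argument("--master-port", type=int, default=12354)
    return parser.parse_args(argv)


def set_rendezvous_env(master_port, master_addr="localhost"):
    # The env:// init method of torch.distributed reads these two variables,
    # so they must be set before init_process_group is called in each worker.
    os.environ["MASTER_ADDR"] = master_addr
    os.environ["MASTER_PORT"] = str(master_port)


# Example: two jobs on the same machine can avoid a port clash by
# passing different ports.
args = parse_args(["--world-size", "2", "--master-port", "12355"])
set_rendezvous_env(args.master_port)
```

Making the port a command-line option lets several DDP jobs share one machine without colliding on a hard-coded port.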

Collaborator

@pzelasko pzelasko left a comment

LGTM

@danpovey danpovey merged commit 2420c3b into k2-fsa:master Apr 14, 2021
@pzelasko
Collaborator

After the recent DDP changes to mmi_att_transformer_train.py, I no longer see any logs in the command line -- is that expected? I ran it like this: python ./mmi_att_transformer_train.py --max-duration 120 --num-epochs 6 --on-the-fly-feats true --full-libri true --world-size 2

@csukuangfj
Collaborator

I no longer see any logs in the command line -- is that expected?

No, there should be logs printed to the console.

@YiwenShaoStephen

I have the same issue after using DDP.
There is some discussion of this on the PyTorch forum (facebookresearch/hydra#1126). I'm not very familiar with this topic, but hopefully it gives you some useful information.

@danpovey
Contributor Author

danpovey commented Apr 15, 2021 via email

@pzelasko
Collaborator

Probably it's getting divided over test-clean and test-other... We could either pre-shuffle the validation cut sets; add shuffle=True to the validation set sampler; or aggregate their losses by syncing the GPUs.

I haven't looked further into the printing issue yet. If I find something, I will let you know.
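
To illustrate the third option above (aggregating the validation losses across GPUs), here is a minimal sketch of the arithmetic. It assumes each rank reports a summed objective and a frame count; the function name `aggregate_objf` and the numbers are made up for illustration. In an actual DDP run, the two sums would typically be obtained with `torch.distributed.all_reduce` rather than by collecting a Python list.

```python
def aggregate_objf(per_rank_totals):
    """Combine per-rank (objf_sum, num_frames) pairs into one global average.

    Summing numerators and denominators separately weights every frame
    equally, regardless of how the validation cuts were split across ranks.
    """
    total_objf = sum(objf for objf, _ in per_rank_totals)
    total_frames = sum(frames for _, frames in per_rank_totals)
    return total_objf / total_frames


# Illustrative numbers only: rank 0 happened to get easier data than rank 1.
per_rank = [(-120.0, 1000), (-450.0, 1500)]

# Per-rank averages differ a lot (-0.12 vs. -0.30) ...
per_rank_avgs = [objf / frames for objf, frames in per_rank]

# ... but the aggregated value does not depend on how the data was divided.
global_objf = aggregate_objf(per_rank)
```

With this kind of aggregation, a nonrandom split still produces different per-rank numbers, but the reported validation objective is the same as in a single-GPU run.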

@csukuangfj
Collaborator

I do notice that the valid objf in the two halves of the validation set are very different; perhaps the division is very nonrandom?

That should be fixed via #162.

@pzelasko
Collaborator

I have no idea why the logs are not working, and my only workaround is a desperate logging.info = print in the global namespace. FYI, the log files are not created and the console does not get any output either.
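
A less invasive alternative to overriding logging.info is sketched below. It rests on an assumption that is not confirmed in this thread: that some earlier import has already attached a handler to the root logger, in which case a later `logging.basicConfig(...)` call silently does nothing. The helper name `setup_logging` and the log format are hypothetical.

```python
import logging
import sys


def setup_logging(log_file=None, level=logging.INFO):
    """Reset the root logger and attach console (and optional file) handlers."""
    root = logging.getLogger()

    # Assumption: handlers installed earlier (e.g. by an imported library)
    # are what makes basicConfig a no-op, so remove them first.
    for handler in list(root.handlers):
        root.removeHandler(handler)

    formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    handlers = [logging.StreamHandler(sys.stderr)]
    if log_file is not None:
        handlers.append(logging.FileHandler(log_file))

    for handler in handlers:
        handler.setFormatter(formatter)
        root.addHandler(handler)

    root.setLevel(level)
    return root
```

With spawned DDP workers, a helper like this would need to be called inside each worker process, since logging configured in the parent is not inherited by a freshly spawned interpreter. On Python 3.8 or newer, `logging.basicConfig(..., force=True)` achieves a similar reset.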

@csukuangfj
Collaborator

After the recent DDP changes to mmi_att_transformer_train.py, I no longer see any logs in the command line

I have the same issue after using DDP.

@pzelasko @YiwenShaoStephen

Were both of you using torch==1.8.0?

I am encountering the same issue when using torch==1.8.0, but if I switch to torch==1.7.1, the issue is gone without any code changes.

@pzelasko
Collaborator

My torch version was 1.8.1. Interesting...
