Use glob in datasets instead of walk_files #1051

Closed
7 tasks done
vincentqb opened this issue Nov 19, 2020 · 3 comments

Comments

@vincentqb
Contributor

vincentqb commented Nov 19, 2020

Replace walk_files + filter in the following datasets with globbing:

  • librispeech
  • libritts
  • speechcommands
  • tedlium
  • vctk
  • yesno

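A minimal standard-library sketch of the two approaches, for comparison. `walk_then_filter` and `glob_files` are hypothetical helpers written for illustration (not torchaudio's `walk_files` or its actual dataset code), and the `.flac` suffix used with them is only an assumed example. Both search the whole tree under `root`; `Path.rglob` simply expresses the walk and the filter as one pattern.

```python
import os
from pathlib import Path
from typing import List


def walk_then_filter(root: str, suffix: str) -> List[str]:
    """Hypothetical stand-in for walk_files + filter: visit every directory, keep matching names."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffix):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)


def glob_files(root: str, suffix: str) -> List[str]:
    """The same recursive search expressed as a single glob pattern."""
    return sorted(str(p) for p in Path(root).rglob(f"*{suffix}"))
```

With a glob, directory depth can also be part of the pattern (for example `'*/*/*.flac'` matches only files exactly two directories below `root`), which a plain suffix filter cannot express without extra path checks.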
Example (noted in discussion):

walker = sorted(str(p) for p in Path(self._path).glob('*/*_nohash_*.wav'))
  • Once this is done, we can deprecate torchaudio.datasets.utils.walk_files.
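A self-contained version of the example, as a sketch. `list_audio_files` and `label_and_name` are hypothetical helpers, and the `<root>/<label>/<file>.wav` layout is an assumption; Speech Commands clips are named like `<speaker>_nohash_<n>.wav`, so a pattern such as `'*/*_nohash_*.wav'` would select them.

```python
from pathlib import Path
from typing import List, Tuple


def list_audio_files(root: str, pattern: str = "*/*.wav") -> List[str]:
    """Return file paths under root that match pattern, sorted for a deterministic order."""
    # Path.glob yields entries in filesystem order, so sort the string paths explicitly.
    return sorted(str(p) for p in Path(root).glob(pattern))


def label_and_name(path: str) -> Tuple[str, str]:
    """Split '<root>/<label>/<file>.wav' into (label, file stem)."""
    p = Path(path)
    return p.parent.name, p.stem
```

Because the pattern fixes the depth at one directory below `root`, files in deeper subdirectories are ignored; passing a different pattern (or using `rglob`) changes that.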

cc #910

@faroit
Contributor

faroit commented Nov 19, 2020

Shouldn't the globs be sorted for reproducibility reasons?

@vincentqb
Contributor Author

vincentqb commented Nov 20, 2020

Yes, tests should catch it if the order is not consistent; see also this comment. (Added sorted() to the description above.)
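To illustrate why the sort matters: a dataset maps an integer index to a file, so indices are only reproducible if the file list has a fixed order, and Python's `glob` documentation notes that whether results are sorted depends on the file system. A minimal sketch with a hypothetical `FileIndex` class (standing in for a dataset's file list) shows that sorting at construction makes the mapping independent of discovery order.

```python
import random
from typing import Iterable, List


class FileIndex:
    """Hypothetical minimal dataset that maps an integer index to a file path."""

    def __init__(self, paths: Iterable[str]) -> None:
        # Sort once so the index-to-file mapping does not depend on discovery order.
        self._paths: List[str] = sorted(paths)

    def __len__(self) -> int:
        return len(self._paths)

    def __getitem__(self, n: int) -> str:
        return self._paths[n]


paths = ["yes/b_nohash_0.wav", "no/a_nohash_0.wav", "yes/a_nohash_1.wav"]
# Simulate a filesystem that returns the same entries in a different order.
shuffled = random.sample(paths, k=len(paths))
a, b = FileIndex(paths), FileIndex(shuffled)
print(all(a[i] == b[i] for i in range(len(a))))  # True
```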

@vincentqb
Contributor Author

Closed by #1109

mthrok pushed a commit to mthrok/audio that referenced this issue Dec 13, 2022
mpc001 pushed a commit to mpc001/audio that referenced this issue Aug 4, 2023