Add LibriSpeech Base class for metadata #2646

carolineechen · 2022-08-23T20:58:08Z

Add LIBRISPEECHBase class that returns the same information as LIBRISPEECH, but returns a file path instead of the loaded waveform. See #2539 for some context.

This design choice results in a 2x API for the dataset -- one for metadata mode, and one for wav mode. We may want to think of a better way to arrange the docs on the website since most variables are repeated, so while this is under discussion, I have simply left it out of the docs build for now.

TODO: testing

cc: @leo19941227

mthrok · 2022-08-23T23:05:35Z

torchaudio/datasets/librispeech.py

@@ -43,16 +43,16 @@ def download_librispeech(root, url):
    extract_archive(archive)


-def load_librispeech_item(
+def get_librispeech_metadata(


Please move this to a class method so that it's overridable from client code.

The same goes for load_librispeech_item.

Is this preferable to having users directly override the __getitem__ method of the dataset (as opposed to overriding a class method that __getitem__ calls, which involves going a layer deeper)? Also, load_librispeech_item is currently used by another dataset and is accessible to users.

load_librispeech_item should not be public. Can we follow up and make it private?

From the regular OOP perspective, the right approach is to let users override __getitem__. However, this forces users to rewrite or copy-paste almost the whole logic of path parsing and text processing. There are situations where only the decoding should be overridden, but since these functions are plain module-level functions, that is not possible. Making these functions class methods allows the logic to be overridden in a subclass. That's the pattern we should have followed in the dataset class implementations, but because the existing datasets do not do that, new datasets did not follow it either. See #910 (comment) for the basic idea.
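The pattern described above could look roughly like the following. This is a minimal sketch, not the torchaudio implementation: the class names `SketchDataset` and `PathOnlyDataset` and the hook names `_parse_ids` and `_load_audio` are hypothetical, and the decoding step is replaced by a placeholder string so the example runs without any audio library. It assumes LibriSpeech-style file stems of the form `<speaker>-<chapter>-<utterance>`.

```python
from pathlib import Path


class SketchDataset:
    """Hypothetical dataset; the names here are illustrative, not torchaudio API."""

    def __init__(self, paths):
        self._paths = [Path(p) for p in paths]

    def __len__(self):
        return len(self._paths)

    def _parse_ids(self, path):
        # Shared path-parsing logic that subclasses should not have to copy.
        speaker, chapter, utterance = path.stem.split("-")
        return int(speaker), int(chapter), int(utterance)

    def _load_audio(self, path):
        # Placeholder for real decoding; a real dataset would decode the file here.
        return f"<decoded {path.name}>"

    def __getitem__(self, n):
        path = self._paths[n]
        return (self._load_audio(path), *self._parse_ids(path))


class PathOnlyDataset(SketchDataset):
    def _load_audio(self, path):
        # Override only the decoding hook: return the path and skip decoding.
        return str(path)
```

With this arrangement, client code that only wants metadata overrides one small method instead of re-implementing `__getitem__`.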

mthrok · 2022-08-23T23:10:42Z

torchaudio/datasets/librispeech.py

+    )
+
+
+class LIBRISPEECHBase(Dataset):


Let's not keep dragging the bad naming scheme.

Suggested change

-class LIBRISPEECHBase(Dataset):
+class LibriSpeechBase(Dataset):

Also, since we are introducing a new class, which gives us a rare opportunity to redefine the interface without BC constraints, let's remove folder_in_archive and make root optional.

Similarly, we should split download from the constructor. Downloading should not be the responsibility of this class (or at least not of its constructor).

sure, we can redefine/simplify the API for the base class. curious what's the reasoning for removing downloading from this class?

Downloading is a complex procedure. It involves network access, local file access, and permissions.
There are many ways for a download feature to fail: for example, an unreachable or slow network, missing permission to create directories or write files, and storage limits. The fact that the existing dataset interface ended up with the folder_in_archive argument is one testament to that.

Also, data integrity is not resolved with the current approach. It is easy to verify the integrity of a downloaded archive using a SHA-256 hash, but what about the individual files? Say I delete one of the WAV files from the directory: should the download function refill it somehow? How would it detect that? All these specifications, and the responsibility for executing them flawlessly, should be placed in a separate location. The Dataset class should not be responsible for them.
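For reference, the archive-level check mentioned above can be sketched with the standard library alone. The helper names `sha256_of_file` and `verify_archive` are hypothetical and are not part of torchaudio; the sketch covers only the whole-archive case and does nothing about individual files that go missing after extraction.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large archives are never fully loaded in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected_sha256):
    """Raise RuntimeError if the archive does not match the expected digest."""
    actual = sha256_of_file(path)
    if actual != expected_sha256:
        raise RuntimeError(f"Checksum mismatch for {path}: got {actual}")
```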

From the perspective of a library aiming to be a building block (or of the single responsibility principle), downloading is not something that should be performed inside the dataset constructor. It should be performed separately.
By introducing the Base dataset class, we can re-interpret the existing dataset classes as higher-level implementations, composed of multiple primitives.
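One way to read the composition described above is sketched below. All names (`LocalDataset`, `make_dataset`, `download_fn`) are hypothetical and chosen for illustration only. The assumption is that the primitive dataset only indexes files already on disk, while a higher-level helper decides whether to call a separately defined download routine first.

```python
from pathlib import Path


class LocalDataset:
    """Hypothetical primitive: only indexes files that already exist on disk."""

    def __init__(self, root, ext=".flac"):
        self._root = Path(root)
        if not self._root.is_dir():
            raise RuntimeError(f"Dataset not found at {self._root}; download it first.")
        self._files = sorted(self._root.rglob(f"*{ext}"))

    def __len__(self):
        return len(self._files)

    def __getitem__(self, n):
        return self._files[n]


def make_dataset(root, download_fn=None):
    """Hypothetical higher-level helper composed of the two primitives."""
    if download_fn is not None and not Path(root).is_dir():
        # Network access, permissions, and integrity checks live in download_fn.
        download_fn(root)
    return LocalDataset(root)
```

Here the constructor never touches the network, so a failure in downloading cannot be confused with a failure in indexing the files.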

what should the default for root be if making it optional? I remember some conversation about this but don't think we reached a conclusion, and think this is something that could easily be addressed later on as making it optional would not be BC-breaking

nateanl · 2022-08-24T18:20:52Z

torchaudio/datasets/librispeech.py

+
+
+class LibriSpeechBase(Dataset):
+    """Create a Dataset for *LibriSpeech* [:footcite:`7178964`].


I feel the description should differ from that of LIBRISPEECH. For example, it could explicitly mention that the class is only for fetching metadata, without loading the actual waveforms.

shall we also add it to the documentation page?

I did not add it to docs yet since I wasn't sure the best way to do so with 2x API. Maybe something formatted like this?

...
LibriMix
+ LibriSpeech
  - LibriSpeechBase
  - LIBRISPEECH
LibriLightLimited
...

carolineechen · 2022-08-26T10:56:56Z

switching to a different approach for enabling easy access to metadata, similar to #910 (comment)

add Librispeech base class

bb94647

facebook-github-bot added the CLA Signed label Aug 23, 2022

carolineechen changed the title ~~Add Librispeech Base class for metadata~~ Add LibriSpeech Base class for metadata Aug 23, 2022

mthrok reviewed Aug 23, 2022

modifications

01489af

carolineechen force-pushed the librispeech-metadata branch from b3a1a4a to 01489af Compare August 24, 2022 15:22

carolineechen requested a review from nateanl August 24, 2022 15:47

nateanl reviewed Aug 24, 2022

carolineechen closed this Aug 26, 2022
