
Add tedlium dataset (all 3 releases) #882

Merged — 16 commits merged into pytorch:master on Sep 15, 2020

Conversation

@jiwidi (Contributor) commented Aug 15, 2020

Hi!

So this is my pull request for adding TEDLIUM support (all 3 releases), with reference to #765.

The code is not yet finished (I still need to add tests, read the phonetic dictionaries of each release, and do proper formatting), but I wanted to create the PR early to get feedback on the way I read/iterate TEDLIUM files and to check whether the torchaudio maintainers are okay with it or have ideas on how to improve it. This way I can make sure the core of the dataset loader is good to merge before writing the tests. I will explain how I iterate through the dataset now.

The TEDLIUM dataset (all releases) is composed of .sph and .stm files. Inside the root folder these files are split into two different folders:

sph/
stm/

Each .stm file contains the transcripts for a FULL ted talk, while its sibling .sph file contains the audio for the FULL ted talk.

Since this dataset is used in speech recognition, people don't train on full ted talk audios but rather split them into sentences. Each sentence corresponds to a line in the .stm file. This means that a .stm file contains as many training samples as it has lines.

The previous datasets that I found in torchaudio didn't encounter this case and treated every transcript file as a single sample, so I had to make a small change to the self._walker used in the datasets. I created a new self._extended_walker list that contains one entry per line of each .stm file (the file plus the line number), so for example if we had two stm files A (5 lines/5 samples) and B (3 lines/3 samples), self._extended_walker would look like this:

self._extended_walker = [[A,0],[A,1],[A,2],[A,3],[A,4],[B,0],[B,1],[B,2]]

And every time we load an item i of the dataset we index this _extended_walker to retrieve the file containing the sample and the corresponding line number; that line contains the transcript plus the start and end times of the segment inside the full audio file. With this information we can load the line's transcript and the corresponding sub-audio from the full ted talk.
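
To make this concrete, here is a rough sketch of the lookup idea (illustrative only, not the actual code in the PR; the function name and exact fields are simplified):

import os

import torchaudio


def load_item(dataset, n):
    """Illustrative only: load sample n from a TEDLIUM-style dataset object."""
    fileid, line = dataset._extended_walker[n]
    stm_path = os.path.join(dataset._path, "stm", fileid + ".stm")
    with open(stm_path) as f:
        # Each STM line: talk_id channel speaker_id start_time end_time label transcript
        talk_id, _, speaker_id, start_time, end_time, identifier, transcript = (
            f.readlines()[line].strip().split(" ", 6)
        )
    # Load the full talk audio, then keep only the segment belonging to this sentence.
    waveform, sample_rate = torchaudio.load(os.path.join(dataset._path, "sph", fileid + ".sph"))
    start = int(float(start_time) * sample_rate)
    end = int(float(end_time) * sample_rate)
    return waveform[:, start:end], sample_rate, transcript, talk_id, speaker_id, identifier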

This allows the dataset to work like any other torchaudio dataset (it can be iterated sequentially or shuffled). Looks great, right? Then why am I checking with the maintainers whether this is a good idea?

Well, in order to create this _extended_walker list we have to read how many lines each of our .stm files has while initializing the dataset, causing a "not optimal" I/O operation. The operation happens in lines 119-123:

for file in self._walker:
    stm_path = os.path.join(self._path, self._folder_txt, file + self._ext_txt)
    with open(stm_path) as f:
        l = len(f.readlines())
        self._extended_walker += [(file, line) for line in range(l)]

It is fast enough not to represent any downside for the user:

[image: timing of dataset initialization for each release]
(release 3 contains the highest number of files)

I'm open to edits and suggestions on the implementation of the TEDLIUM dataset, and please note it is still a work in progress.

Things that still need work:

  • _getdict function to return the phoneme dictionary of the dataset
  • Several tests for the dataset and the different releases.
  • Upload my notebook where I play with the dataset object to show the functionality for reviewers
  • Better styling and compliance with the rest of torchaudio code

Thanks

@jiwidi (Contributor Author) commented Aug 15, 2020

Pinging @vincentqb

@jiwidi changed the title from "Added tedlium support for 3 releases" to "Add tedlium dataset (all 3 releases)" on Aug 15, 2020
@mthrok (Collaborator) left a comment

Hi @jiwidi

Thanks for the contribution, and thanks for opening the PR in the early stage. This will make our review process easier and more fruitful.

I see you mostly follow the existing Dataset implementations, and that's a good start, but I found a lot of room for improvement in these existing implementations, so I added comments about it. Let me know what you think.

root: str,
release: str = RELEASE,
subset: str = None,
folder_in_archive: str = _RELEASE_CONFIGS[RELEASE]["folder_in_archive"],
Collaborator

Can you remove folder_in_archive from the signature?
There is no necessity or benefit to making this a variable.
Also, this variable is never used, as it's overwritten at the beginning of the constructor.

Contributor Author

Yep, there is no reason why it would stay 👍

download_url(url, root, hash_value=checksum)
extract_archive(archive)

walker = walk_files(self._path, suffix=self._ext_txt, prefix=False, remove_suffix=True)
@mthrok (Collaborator), Aug 17, 2020

I am advising people not to use walk_files, because this function prevents Dataset implementations from designating which directories they should be looking at and might traverse directories that are totally unnecessary and irrelevant, which should be avoided.

Also, since you need different/new logic for going over the STM files, the fact that walk_files needed to be extended shows that this function is not suitable for the purpose of this Dataset. So I suggest adding logic that does the right amount of work, which is simply going over the subdirectories and checking the files. Please see another example here. Following this advice, @Abhi011999 found some data missing from VCTK 0.92, which is another reason why the walk_files function should not be used: it obscures what the Dataset implementation should be looking at.

Collaborator

I am downloading all the TEDLIUM datasets now so that we can discuss the details of how to traverse these directories. I will give my thoughts once I see the dataset structures.

Contributor Author

That would be perfect, let's discuss it once you have the data.


def load_tedlium_item(
    fileid: str, line: int, path: str, ext_audio: str, ext_txt: str
) -> Tuple[Tensor, int, str, int, int, int]:
Collaborator

Can you make this a method of the TEDLIUM class? That way user code can customize the loading behavior.

Contributor Author

Yes, makes sense

)

wave_path = os.path.join(path, "sph/", fileid)
waveform, sample_rate = torchaudio.load(wave_path + ext_audio)
Collaborator

Can you add a method to the TEDLIUM class that wraps torchaudio.load?

def load_audio(self, path):
    return torchaudio.load(path)

The problem with using bare torchaudio.load is that the user cannot customize the loading behavior (such as disabling normalization or swapping the channels dimension). We could also pass such parameters down from the constructor of the TEDLIUM class, but that would make the constructor signature ugly.

Contributor Author

Yes, makes sense

Contributor

@mthrok -- What's the advantage of doing this over having torchaudio.load inside the load item function? Is your suggestion to also eventually have a load_text method? One property of having a single load item function was that it factored the item loading as a pure function that takes an identifier and returns a data point.

Contributor Author

@mthrok -- What's the advantage of doing this over having torchaudio.load inside the load item function? Is your suggestion to also eventually have a load_text method? One property of having a single load item function was that it factored the item loading as a pure function that takes an identifier and returns a data point.

As far as I understood it, he wants to give the user the option to override the load function of the dataset with optional parameters if they feel like it.

So user could do:

def custom_load(path):
    return torchaudio.load(path,normalize=False,channels_first=False)
dt = torchaudio.datasets.TEDLIUM()
dt.load_audio = custom_load

As normalize and channels_first are True by default and the load_item function doesn't change them, the user has no way to change them.

Collaborator

Another example is Windows users, who cannot use torchaudio.load because torchaudio on Windows does not support the sph format, so they need to provide something else in the custom load method.
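
Concretely, a Windows user could pre-convert the corpus to WAV offline and monkey-patch the loader along the lines of the snippet above (just a sketch; the root path, method name and signature here are made up and may not match the final implementation):

import os

import torchaudio


def custom_load(path):
    # Assume the .sph files were converted to .wav offline (e.g. with sox);
    # load the .wav twin of whatever .sph path the dataset asks for.
    return torchaudio.load(os.path.splitext(path)[0] + ".wav")


dt = torchaudio.datasets.TEDLIUM("data/TEDLIUM_release1")  # hypothetical root path
dt.load_audio = custom_load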

Contributor

As normalize and channels_first are True by default and the load_item function doesn't change them, the user has no way to change them.

If this is indeed something desired, we could also consider exposing those parameters to the constructor instead. I wouldn't say this is enough alone to change the implementation through the addition of a load method.

In torchaudio, we went with the convention that waveforms are between -1 and 1 with batch first, see readme. This is in particular to avoid having to carry such parameters everywhere.

Another example is Windows users who cannot use torchaudio.load because torchaudio on Windows does not support sph format so they need to provide something else in the custom load method.

This is an interesting point, and there's precedent for this with datasets that use mp3. I see two paths for this:

  • Offering a backend that supports sph on Windows
  • Decoupling torchaudio.load from torchaudio.datasets.

In the former, a user who decides to change the datasets would copy-paste the code and modify it to their heart's content. In the latter, we need to decide how we would standardize this. Is any of torch{text,vision} doing something like this already? Thoughts?

Contributor Author

I would say offering support for sph on Windows would be better: less hassle and moving around for the user. In the end, the users who use these built-in dataset loaders are the ones looking for a working, easy-to-use solution; if they want to customize loading or anything else they will probably rewrite a dataset on their own.

waveform, sample_rate, utterance, speaker_id, chapter_id, utterance_id
"""

_ext_txt = ".stm"
Collaborator

I do not see a necessity or benefit of making this extension a variable. STM is an established format, and since it's text, any platform (like Windows) should be able to open it, unlike the case of ".sph", which torchaudio on Windows cannot load.

Contributor Author

It is used for the file walker, as we look for files with this extension, and I kept it as a variable for consistency with the LibriSpeech dataset:

_ext_txt = ".trans.txt"
_ext_audio = ".flac"

It will probably be unused depending on what logic we decide on to iterate through the files. We can take a look at this when we are done with the logic?

Collaborator

It is used for the file walker, as we look for files with this extension

Even if that's the case, I think it can be hardcoded where the walker function is called. However:

It will probably be unused depending on what logic we decide on to iterate through the files. We can take a look at this when we are done with the logic?

Yeah, we can defer the change until then.

@jiwidi (Contributor Author), Aug 17, 2020

I moved the _ext_audio initialization to the constructor in my upcoming commit, so I removed this one as well. We can always go back.


walker = walk_files(self._path, suffix=".stm", prefix=False, remove_suffix=True)
self._walker = list(walker)
self._extended_walker = []
@vincentqb (Contributor), Aug 17, 2020

I'm not sure why we need an extended walker in place of updating walker? It seems like this simply extracts information from a file. Could this be done in the load item function?

Contributor Author

It does simply extract information from the files contained in the original walker. I created it because we need this information before calling the load item function, so that the dataset contains as many samples as there are total lines in all files (1 line is one sentence) instead of as many samples as there are files.

This dataset is a bit tricky because each file contains a full ted talk, and no one in speech recognition (where this dataset is usually used) treats a full talk as a training sample. We split the transcript file and the audio file into multiple samples (sentences).

This means that once we instantiate the dataset we need to know how many samples we have (lines in all files) so the user can index any sample N at any time or shuffle the dataset. If we were to implement this in the load function, this functionality would be much harder. See the original post where I explain in detail how the files are iterated.

I'm happy to discuss which method is the best way to achieve this functionality, but first I'm checking whether it is clear why this extended walker is here; maybe I didn't explain it correctly in the original post.

Contributor

Thanks for adding details :) What I meant is more about the particular use of _walker and _extended_walker. I would have suggested something like:

        self._walker = []
        for file in walk_files(self._path, suffix=".stm", prefix=False, remove_suffix=True):
            stm_path = os.path.join(self._path, "stm", file + ".stm")
            with open(stm_path) as f:
                l = len(f.readlines())
                self._walker.extend((file, line) for line in range(l))

or, to stick with generators,

    def _generate_walker(...):
        for file in walk_files(self._path, suffix=".stm", prefix=False, remove_suffix=True):
            stm_path = os.path.join(self._path, "stm", file + ".stm")
            with open(stm_path) as f:
                for line in range(len(f.readlines())):
                    yield (file, line)

    self._walker = list(_generate_walker(...))

Also, why just indexing and not storing the line directly since the file is already being read? How big is the total text?

    def _generate_walker(...):
        for file in walk_files(self._path, suffix=".stm", prefix=False, remove_suffix=True):
            stm_path = os.path.join(self._path, "stm", file + ".stm")
            with open(stm_path) as f:
                for line in f.readlines():
                    yield (file, line)

    self._walker = list(_generate_walker(...))

@jiwidi (Contributor Author), Aug 17, 2020

Oh I see what you mean now, yes it will be cleaner to stick to the _walker name for the resultant walker.

I went ahead and checked the size of the walker list for release 3 (the biggest):

import numpy as np
np.array(test_dataset._extended_walker).nbytes

It is 316733528 bytes (~316 megabytes) when storing the full line, and 20449080 bytes (~20 megabytes) with just the line identifier. I would say that is a considerable size, but it does remove an I/O operation each time we load an item. Does any other dataset take this much space in memory?

Regarding sticking with generators or not: since we convert it to a list anyway, that's just adding more logic we don't use. I would go with:

        self._walker = []
        for file in walk_files(self._path, suffix=".stm", prefix=False, remove_suffix=True):
            stm_path = os.path.join(self._path, "stm", file + ".stm")
            with open(stm_path) as f:
                l = len(f.readlines())
                self._walker.extend((file, line) for line in range(l))

Just because it is easier to read.

extract_archive(archive)

walker = walk_files(self._path, suffix=".stm", prefix=False, remove_suffix=True)
self._walker = list(walker)
Contributor

In our current implementation, this would simply go as the last command manipulating walker (or the extended walker here).

@mthrok (Collaborator) commented Aug 17, 2020

@jiwidi

  • _getdict function to return the phoneme dictionary of the dataset

Do you mean the lexicon files such as TEDLIUM.150K.dic and TEDLIUM.152K.dic?
dict is a very overloaded word here, so I just want clarification on what you mean.

stm_path = os.path.join(self._path, "stm", file + ".stm")
with open(stm_path) as f:
    l = len(f.readlines())
    self._extended_walker += [(file, line) for line in range(l)]
Collaborator

I checked the contents of the STM files for each release:

$ wc -l TEDLIUM_release1/*/stm/*.stm | tail -1
   58863 total
$ wc -l TEDLIUM_release2/*/stm/*.stm | tail -1
   95033 total
$ wc -l TEDLIUM_release-3/data/stm/* | tail -1
  268263 total

So if loading the whole release-3 dataset, self._extended_walker could take a couple of megabytes of memory.

Also, this will open the following number of files:

$ find  TEDLIUM_release1 -type f -name '*.stm' | wc -l
793
$ find  TEDLIUM_release2 -type f -name '*.stm' | wc -l
1514
$ find  TEDLIUM_release-3 -type f -name '*.stm' | wc -l
2370

For the release-3 dataset, this initialization process will go over 2370 files sequentially. How long does this take?

Collaborator

Oh, is this what you report as 24 microseconds??

Contributor Author

Yes, I report the time of initializing the dataset, which in itself goes over all the files counting lines. I think that is quite fast and we shouldn't worry about it.

About the size it occupies in memory, I did a quick test for a comment from @vincentqb just above this one. I checked how much the walker would occupy if we were to store the full line instead of just the line identifier, since we already loop over the file, saving some I/O when loading each item.

I went ahead and checked the size of the walker list for release3 (the biggest)

import numpy as np
np.array(test_dataset._extended_walker).nbytes

It is 316733528 bytes (~316 megabytes) when storing the full line, and 20449080 bytes (~20 megabytes) with just the line identifier. I would say that is a considerable size, but it does remove an I/O operation each time we load an item. Does any other dataset take this much space in memory?
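
For reference, the identifier-only figure works out to roughly 76 bytes per (file, line) entry (a quick back-of-the-envelope check using the release-3 line count quoted above):

entries = 268263         # total STM lines in release 3
walker_bytes = 20449080  # measured size of the identifier-only walker
print(walker_bytes / entries)  # ~76 bytes per (file, line) entry, ~20 MB in total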

Collaborator

Yes, I report the time of initializing the dataset, which in itself goes over all the files counting lines. I think that is quite fast and we shouldn't worry about it.

I did a quick benchmark on my end and

script
import time

import torchaudio

torchaudio.set_audio_backend('sox_io')

n_rep = 100
t0 = time.monotonic()
for _ in range(n_rep):
    ds = torchaudio.datasets.TEDLIUM("../dataset/tedlium", release="release3", download=False)
t1 = time.monotonic()
print(len(ds._walker))
print((t1 - t0) / n_rep)
$python foo.py
268263
0.21009648161940275

I did not get anything as fast as microseconds, but it was only 0.2 seconds, so I agree that speed is not an issue here.

@mthrok (Collaborator), Aug 18, 2020

Regarding the memory consumption, I talked with @cpuhrsch. 20 megabytes is not a big deal. However, be aware that when using multiple DataLoader worker processes, each process will consume the same amount of memory.

Also, DataLoader has a memory leakage issue, pytorch/pytorch#13246 (current workaround: pytorch/pytorch#13246 (comment)).

@jiwidi (Contributor Author) commented Aug 17, 2020

@jiwidi

  • _getdict function to return the phoneme dictionary of the dataset

Do you mean the lexicon files such as TEDLIUM.150K.dic and TEDLIUM.152K.dic?
dict is a very overloaded word here, so I just want clarification on what you mean.

Yes, that .dic file contains the phoneme dictionary of the dataset and is sometimes used. I want to add a function to the class to load and return it.

def __init__(
    self, root: str, release: str = "release1", subset: str = None, download: bool = False, audio_ext=".sph"
) -> None:
    """Constructor for TEDLIUM dataset
Collaborator

Can you move this docstring to the class-level docstring and merge them?
Also, you need to add this class to docs/source/datasets.rst for it to show up in the docs.

Contributor Author

I just checked the other datasets; they have a similar docstring under the class description.
We have:

    """
    Create a Dataset for Tedlium. Each item is a tuple of the form:
    [waveform, sample_rate, transcript, talk_id, speaker_id, identifier]
    """

Librispeech:

    """
    Create a Dataset for LibriSpeech. Each item is a tuple of the form:
    waveform, sample_rate, utterance, speaker_id, chapter_id, utterance_id
    """

Are you sure you want to move the big constructor docstring to class-level?

@mthrok (Collaborator), Aug 19, 2020

Are you sure you want to move the big constructor docstring to class-level?

Yes. The fact that the other datasets lack documentation for the constructor is not user-friendly.
It is better if each Dataset explains what options a user can change.
I will update the other datasets' documentation.

Args:
    root (str): Path containing dataset or target path where its downloaded if needed
    release (str, optional): TEDLIUM identifier (release1,release2,release3). Defaults to RELEASE.
    subset (str, optional): Subset of data(train,test,dev) supported for release 1,2. Defaults to Train/None.
Collaborator

I think it should be more strict about the subset value: for releases 1 and 2 the allowed values are train/dev/test, but for release 3 it has to be None.

Collaborator

Could you add a value check and raise a ValueError if it's not the right one for the given release?
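
A minimal sketch of the kind of check I mean (illustrative only; the exact accepted values in the merged code may differ):

def _validate_subset(release: str, subset: str) -> None:
    # Releases 1 and 2 ship train/dev/test splits; release 3 has no subsets.
    if release in ("release1", "release2"):
        if subset not in ("train", "dev", "test"):
            raise ValueError(f"For {release}, subset must be 'train', 'dev' or 'test', got {subset!r}")
    elif release == "release3":
        if subset is not None:
            raise ValueError("release3 has no subsets; subset must be None")
    else:
        raise ValueError(f"release must be 'release1', 'release2' or 'release3', got {release!r}")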

self._walker = []

# Create walker for all samples
for file in walk_files(self._path, suffix=".stm", prefix=False, remove_suffix=True):
Collaborator

Can we avoid using walk_files and just list the stm files? I checked the dataset and we only need to check one directory per configuration.

TEDLIUM_release1/dev/stm/*.stm
TEDLIUM_release1/test/stm/*.stm
TEDLIUM_release1/train/stm/*.stm

TEDLIUM_release2/dev/stm/*.stm
TEDLIUM_release2/test/stm/*.stm
TEDLIUM_release2/train/stm/*.stm

TEDLIUM_release-3/data/stm/*.stm
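
In code, "just list the stm files" could look roughly like this (a sketch only; the helper name is made up, and I sort the result so the order is stable across OSes):

import os


def list_stm_ids(path):
    # Scan <path>/stm for *.stm files and return the talk ids, sorted for a stable cross-OS order.
    stm_dir = os.path.join(path, "stm")
    return sorted(f[:-len(".stm")] for f in os.listdir(stm_dir) if f.endswith(".stm"))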

Contributor Author

That is true. I used the walker since I saw it was used in other datasets, but listing all the stm files will do the job, as all of them are supposed to be inside the same folder.


.. autoclass:: TEDLIUM
   :members: __getitem__
   :special-members: get_phoneme_dict
Contributor Author

@mthrok What does special-members mean here? I interpret it as extra functions to include in the docs? That's why I included get_phoneme_dict.

Collaborator

These are Sphinx directives. Check out their documentation:

  • :members: is where you list the members you want to document
  • :special-members: is where you list the special methods like __init__, __len__, __getitem__ etc...

I think the other documentation is wrong (__getitem__ should be under :special-members:, but it will not show up either way because they don't have a docstring).

I think you can just do .. autoclass:: TEDLIUM and the rest (get_phoneme_dict) will be handled.

You can build the documentation and check how the resulting documentation looks.

cd docs
pip install -r requirements.txt
make html
# open ./build/html/index.html

from torchaudio.datasets.utils import (
    download_url,
    extract_archive,
    walk_files,
Collaborator

You can remove walk_files from the import too.

# Create walker for all samples
self._walker = []
stm_path = os.path.join(self._path, "stm")
for file in os.listdir(stm_path):
Collaborator

Can you sort the files? Different OSes return files in a different order.

Suggested change:
- for file in os.listdir(stm_path):
+ for file in sorted(os.listdir(stm_path)):

@mthrok (Collaborator) left a comment

The code looks much better now.

@jiwidi What else is left for this PR?

content = line.strip().split(maxsplit=1)
self.phoneme_dict[content[0]] = content[1:] # content[1:] can be empty list

def load_tedlium_item(self, fileid: str, line: int, path: str) -> Tedlium_item:
Collaborator

Can you also mark this function as private?

Suggested change:
- def load_tedlium_item(self, fileid: str, line: int, path: str) -> Tedlium_item:
+ def _load_tedlium_item(self, fileid: str, line: int, path: str) -> Tedlium_item:

@jiwidi (Contributor Author) commented Aug 19, 2020

The code looks much better now.

@jiwidi What else is left for this PR?

Tests! If we are okay with the logic for iterating the files, I'll go ahead and create some tests based on the latest updates I saw you guys make to the other dataset tests.

wave_path = os.path.join(path, "sph", fileid)
waveform, sample_rate = self._load_audio(wave_path + self._ext_audio, start_time=start_time, end_time=end_time)

return Tedlium_item(waveform, sample_rate, transcript, talk_id, speaker_id, identifier)
Contributor

I can't find the discussion about NamedTuple, so I'm leaving a new comment here :)

There has been prior discussion about changing the type of data points (and in particular using NamedTuple), see internal document. Though there may be value in redefining what a data point is, I feel the discussion would delay this pull request, so I would avoid it for now.

Contributor Author

I can't access the document but regardless I have no problem with changing it back to the previous version.

We can delay this NamedTuple change to a PR that changes multiple Datasets/files at once, instead of doing it 1 by 1 and getting inconsistent code. What do you think?

Contributor

We can delay this NamedTuple change to a PR that changes multiple Datasets/files at once, instead of doing it 1 by 1 and getting inconsistent code. What do you think?

Yes, I agree with your suggestion. It would be preferable to change them all at the same time once we settle on a specific format.

Collaborator

I am opposed to using a plain tuple here. Returning 6 heterogeneous items as a tuple gives a very bad user experience and makes the code harder to read/maintain. Since a named tuple is compatible with a tuple, there is virtually no disadvantage to using a named tuple here.

Contributor

The main disadvantage of using namedtuple here is that it (1) creates inconsistent behavior with other datasets in audio and elsewhere, (2) would make the user leverage an API that could change in the near future, and (3) will slow down this particular pull request.

The reason we did not standardize around NamedTuple in the past was that it didn't allow for easy introspection of keys, and each dataset would have its own DataPoint NamedTuple. Please see the internal document for more discussion.

We are also using tuples with many items with other torchaudio datasets. That being said, we can revisit these design choices. Thoughts @zhangguanheng66 @fmassa @cpuhrsch ?

@mthrok (Collaborator), Aug 21, 2020

(1) inconsistent behavior with other datasets in audio and elsewhere

Can you elaborate on this? Each dataset returns a different number of items, so it does not look like they can be swapped easily in the first place. And using NamedTuple with consistent key names will make it possible to give a consistent user experience.

(2) would make the user leverage an API that could change in the near future

Not sure what this means. Can you elaborate?

(3) will slow down this particular pull request.

It's already a NamedTuple, so changing it back to a tuple seems to be what is slowing down the process.

The reason we did not standardize around NamedTuple in the past was that it didn't allow for easy introspection of keys

When is it necessary or useful to perform key introspection?

each dataset would have its own DataPoint NamedTuple.

That sounds like the right approach. Each dataset returns a different number of items of different types, so they should be typed differently.

Contributor

(1) inconsistent behavior with other datasets in audio and elsewhere

Can you elaborate on this? Each dataset returns a different number of items, so it does not look like they can be swapped easily in the first place. And using NamedTuple with consistent key names will make it possible to give a consistent user experience.

I was really just referring to using NamedTuple in this particular pull request. Moving toward consistently using NamedTuple in torchaudio/text/vision is fine by me.

(2) would make the user leverage an API that could change in the near future

Not sure what this means. Can you elaborate?

Since we have not, as a team (torchaudio/text/vision), revisited the decision of using Tuple, using NamedTuple here exposes us to changing the API again should we settle on another option.

(3) will slow down this particular pull request.

It's already a NamedTuple, so changing it back to a tuple seems to be what is slowing down the process.

But we should avoid using NamedTuple until we, as a team, have made a decision on the API for data points. The current decision had been Tuple, but I'm personally fine with revisiting this past decision.

The reason we did not standardize around NamedTuple in the past was that it didn't allow for easy introspection of keys

When is it necessary or useful to perform key introspection?

each dataset would have its own DataPoint NamedTuple.

That sounds like the right approach. Each dataset returns a different number of items of different types, so they should be typed differently.

I don't feel too strongly about either point. I am really just listing the reasons that we left in the internal document. :)

@mthrok (Collaborator) commented Sep 6, 2020

Mm, I have been copying https://github.com/pytorch/audio/blob/master/test/torchaudio_unittest/datasets/librispeech_test.py but for TEDLIUM. I have a doubt here: how will you save the white noise into '.sph' files? As far as I know that is not supported yet(?)

Friendly ping @mthrok :)

Hi @jiwidi

Thanks for pinging me again, and sorry for missing your reply in the first place.
There are two things to consider here.

  1. Did you try saving audio data in sph format? With Add SPHERE support to "sox_io" backend #871, I believe you can save audio in sph format. (If not, let me know.) However,
  2. In the case of librispeech_test.py and the others, we emulate the dataset with wav format. This is because wav is easy to handle and does not require any third-party library to load (we use scipy for simplicity, because we believe scipy does the correct job of loading wav files, so we can use it as a reference). In a unit test, it is ideal to only use the module being tested. In the case of a dataset implementation, we want to use only the dataset module as much as possible, and we do not want to use the I/O module in the test. Otherwise, when a test fails, we do not know immediately whether it is due to the I/O module or the dataset module. (Of course, the dataset implementation itself uses torchaudio.load, so perfect separation at test time is not achieved without monkey-patching the torchaudio.load function, but I think that's too much.)

So because of 2., it is preferable if we can test the dataset implementation with WAV format, even though the real dataset is in SPH (and I believe converting the entire dataset into WAV is also a valid use case for end users). For that purpose we make the dataset implementation able to change the target extension. So can you write your test with audio_ext='.wav' when instantiating the Dataset?
If you strongly think this dataset should also be tested with the .sph format, then you can put the whole test logic in a helper method and make the test change the format too.


Tedlium_item = namedtuple(
    "Tedlium_item", ["waveform", "sample_rate", "transcript", "talk_id", "speaker_id", "identifier"]
)
Collaborator

@jiwidi

Regarding the namedtuple, I talked with @cpuhrsch and I will conduct a performance check. This one is tricky and could take some time, so I'm sorry to ask, but can you revert it back to a regular tuple for the sake of merging this PR first? I feel sorry the process is taking so long.

Contributor Author

No problem!

"""
start_time = int(float(start_time) * 16000)
end_time = int(float(end_time) * 16000)
return torchaudio.load(path, frame_offset=start_time, num_frames=end_time - start_time)
Collaborator

I just realized that this will only work with the sox_io backend. Could you add a fallback for the other backends?

if torchaudio.get_audio_backend() == 'sox_io':
    return torchaudio.load(path, frame_offset=start_time, num_frames=end_time - start_time)
waveform, sample_rate = torchaudio.load(path)
return waveform[:, start_time:end_time], sample_rate

Contributor

The current sox backend also supports reading a file segment. Not soundfile? If the datasets need to be aware of the backend, then we should see how we could evolve the backends so that this does not need to happen. Thoughts?

Contributor Author

Loading the full file and slicing the waveform will still work for both backends as far as I understand. But @mthrok pointed out that it is more efficient to use torchaudio.load(path, frame_offset=start_time, num_frames=end_time - start_time) if sox_io is enabled.

We could either:

  • Add support for the other backends with the same parameter names, so the optimized loading can be used with all of them (this could be better for the long term, unifying the backend APIs), or

  • Use the slicing operator and support all backends right away.

Collaborator

The current sox backend also supports reading a file segment. Not soundfile? If the datasets need to be aware of the backend, then we should see how we could evolve the backends so that this does not need to happen. Thoughts?

We have already planned to resolve this in the 0.9.0 release. See #903.

@jiwidi (Contributor Author) commented Sep 7, 2020

(quoting @mthrok's full reply above about testing with WAV vs SPH)

Don't worry, you've been very helpful. I will test it later today and get back to you if there are any problems; otherwise a commit will follow :)

I think I can make it work with wav files as well, but since this is the only interaction of the library with sph files, I see value in keeping the test with the sph format, as it will serve as a check for this functionality. If we check for sph support in other tests, then I have no preference between wav and sph.

edit: saving sph files works perfectly with torchaudio.save(); sorry, I didn't even check it and thought it was only for wavs.

@mthrok (Collaborator) commented Sep 7, 2020

I think I can make it work with wav files as well, but since this is the only interaction of the library with sph files, I see value in keeping the test with the sph format, as it will serve as a check for this functionality. If we check for sph support in other tests, then I have no preference between wav and sph.

load/save/info functions on .sph format are tested separately. (check out #871)

If you decide to test sph, then the following is (only the gist of) my recommendation:

def _generate_dataset(root_dir, audio_ext, dataset_version, <other parameters, such as dataset versions>) -> List[tuple]:
    # Generates dataset with the given extension
    for foo in whatever:
        ...
        # Switch saving function
        if audio_ext == "wav":
            save_wav(file_path, data, sample_rate)  # use scipy-based test utility for wav
        else:
            torchaudio.save(file_path, data, sample_rate)  # use the sox_io backend,
                                                            # assuming that the backend is properly set;
                                                            # see backend="sox_io" below for this.
    # Returns expected data


# Note: It is probably better to define a different test class for each TEDLIUM version.
class TestTedliumWav(TempDirMixin, TorchaudioTestCase):
    backend = "default"  # <- this test should work on all supported platforms

    @classmethod
    def setUpClass(cls):
        cls.root_dir = cls.get_base_temp_dir()
        _generate_dataset(cls.root_dir, "wav", ...)

    # define the test
    def testFoo(self):
        ...


class TestTedliumSph(TempDirMixin, TorchaudioTestCase):
    backend = "sox_io"  # <- this test will run only on the platforms where "sox_io" backend is available.

    @classmethod
    def setUpClass(cls):
        cls.root_dir = cls.get_base_temp_dir()
        _generate_dataset(cls.root_dir, "sph", ...)

    # define the test
    def testFoo(self):
        ...

edit: saving sph files work perfectly with torchaudio.save() sorry I didnt even checked it and thought it was only for wavs

glad it works :)

@jiwidi (Contributor Author) commented Sep 9, 2020

(quoting @mthrok's full testing recommendation above)

I just pushed the tests for tedlium. I saw that sph loading is already tested elsewhere, as you mentioned, so I kept the test with .wav files for simplicity and consistency with the other tests.

For this new commit I also changed the phoneme dictionary to be read only if the user actually calls get_phoneme_dict, which also makes testing easier.

The testing is very similar to the librispeech one: I recreate the folder structure of the 3 releases in a temporary folder and put some white noise there, then instantiate the tedlium datasets and check that all the data is correctly retrieved when iterating them.

@jiwidi (Contributor Author) commented Sep 9, 2020

@mthrok is it normal that this many unit tests fail due to download/connectivity issues?

Tests run on my side, but we have to make sure they also run in the CI pipeline.

"""
# Read phoneme dictionary
if not hasattr(self, "phoneme_dict"):
self.phoneme_dict = {}
Collaborator

This design pattern might leave users with questions like "Which one should I use, phoneme_dict or get_phoneme_dict?" and "What are the differences?"

The following techniques are often used to improve this:

  1. Use an internal attribute name like _phoneme_dict.
    This does not prevent users from accessing the underlying object directly, but accessing an attribute prefixed with an underscore clearly sends the message that user code is not doing the right thing, so it's okay.

  2. Use the property decorator.
    Lazy initialization often fits well in a property attribute.
    Something like this:

    @property
    def phoneme_dict(self):
        if self._phoneme_dict is None:
            self._phoneme_dict = dict()
            # Fill the dictionary
        return self._phoneme_dict.copy()  # see below for `copy`

Also, instead of checking whether it's initialized with hasattr, it is more straightforward to initialize the actual attribute in the constructor as None, such as self._phoneme_dict = None, and perform the check as if self._phoneme_dict is None:

Another concern is returning a dictionary reference. The user code might accidentally modify the dictionary, which can lead to a very subtle bug. Maybe it's better to return a copy of the dictionary, as return self._phoneme_dict.copy() (however, this is still a shallow copy, so the values point to the original lists and user code can still modify those lists; changing them to tuples might prevent such unintended modification, like self.phoneme_dict[content[0]] = tuple(content[1:])).

Contributor Author

Really good points, a commit with all the suggestions will follow 👍

@classmethod
def tearDownClass(cls):
    # In case of test failure
    tedlium.TEDLIUM._ext_audio = ".flac"
Collaborator

This hack is required in the case of librispeech_test, because that dataset does not receive any audio extension in the constructor. TEDLIUM simply accepts "audio_ext", so you do not need to do this (nor tedlium.TEDLIUM._ext_audio = ".wav"); you can just do dataset = tedlium.TEDLIUM(self.root_dir, audio_ext=".wav").


def test_tedlium(self):
    tedlium.TEDLIUM._ext_audio = ".wav"
    dataset = tedlium.TEDLIUM(self.root_dir)
Collaborator

Since the mocked datasets are generated for all the releases, how about testing all the releases (as separate test cases/methods)?

for i, utterance in enumerate(UTTERANCES):
    talk_id, _, speaker_id, start_time, end_time, identifier, transcript = utterance.split(" ", 6)
    start_time = int(float(start_time)) * 16000
    end_time = int(float(end_time)) * 16000
Collaborator

Since there is a variable sample_rate = 16000 defined above, why not use it here?
If something has to be changed in the future, it will be easier that way.

Contributor Author

good catch

os.makedirs(dataset_dir, exist_ok=True)
sample_rate = 16000 # 16kHz
seed = 0
data = get_whitenoise(sample_rate=sample_rate, duration=10.00, n_channels=1, dtype="float32", seed=seed)
Collaborator

Mocked audio data has to be generated for each file being mocked, with a different seed value; otherwise all the generated files are exactly the same (for the sake of test reproducibility, the test helper functions are deterministic by default, that is, they use a fixed seed value) and we do not know whether the dataset implementation is traversing the files in the expected order.
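
A rough sketch of the idea, reusing the helper calls quoted elsewhere in this thread (the import path, directory layout and talk ids here are assumptions):

import os

from torchaudio_unittest.common_utils import get_whitenoise, save_wav

dataset_dir = "TEDLIUM_release1/train"  # hypothetical mocked dataset root
sph_dir = os.path.join(dataset_dir, "sph")
os.makedirs(sph_dir, exist_ok=True)
for seed, talk_id in enumerate(["TalkA", "TalkB"]):  # hypothetical talk ids
    # A distinct seed per file so each mocked talk gets different audio.
    data = get_whitenoise(sample_rate=16000, duration=10.0, n_channels=1, dtype="float32", seed=seed)
    save_wav(os.path.join(sph_dir, talk_id + ".wav"), data, 16000)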

Collaborator

Oh, I get it now, it's one file with different segments.
Can we have multiple audio files per release?
Testing cases with multiple objects often reveals bugs, so it's good practice.
Here, we would like to make sure that the os.listdir function (which returns items in an OS-specific order) does not change the result across OSes (though our test is broken for non-Linux environments, which I have to fix soon).

@jiwidi (Contributor Author), Sep 10, 2020

Just pushed a new version with the previous feedback, including a separate test for every release. More tests never hurt, and as you say we could have missed some failures by only testing the 1st release. We also have different data to test between releases.

@mthrok (Collaborator) commented Sep 9, 2020

@mthrok is it normal that this many unit tests fail due to download/connectivity issues?

Tests run on my side, but we have to make sure they also run in the CI pipeline.

Hi @jiwidi

Yes, it's broken for Windows and macOS and I have to fix it at some point. These failures have nothing to do with your work.

@vincentqb vincentqb mentioned this pull request Sep 10, 2020

# Create a samples list to compare with
cls.samples[release] = []
for i, utterance in enumerate(UTTERANCES):
@NikeNano, Sep 10, 2020

Suggested change:
- for i, utterance in enumerate(UTTERANCES):
+ for utterance in UTTERANCES:

I don't think the enumerate is used.

Contributor Author

good catch!

return len(self._filelist)

@property
def get_phoneme_dict(self):
Collaborator

A @property is accessed like an attribute and is invoked without parentheses, so get_phoneme_dict is a strange name. Just phoneme_dict makes more sense.

with open(self.dict_path, "r", encoding="utf-8") as f:
    for line in f.readlines():
        content = line.strip().split(maxsplit=1)
        self._phoneme_dict[content[0]] = content[1:]  # content[1:] can be empty list
Collaborator

What do you think of turning this into a tuple to prevent modification by user code?
Does it affect the user experience?

Suggested change:
- self._phoneme_dict[content[0]] = content[1:]  # content[1:] can be empty list
+ self._phoneme_dict[content[0]] = tuple(content[1:])  # content[1:] can be empty list

Contributor Author

good idea

@jiwidi (Contributor Author) commented Sep 11, 2020

@mthrok, new commit with the PR comment suggestions. What do you think of the state of this PR? There doesn't seem to be much left to do(?)

@mthrok (Collaborator) left a comment

Hi @jiwidi

I had one more question about the form of phoneme_dict. Please check the comment.
It's also a good idea to add a test for it, but since this PR has gotten super long, it's okay. Other than that, I think we are ready to merge. Thanks for sticking with such a rigorous review. I enjoyed our interactions. :)

self._phoneme_dict = {}
with open(self.dict_path, "r", encoding="utf-8") as f:
    for line in f.readlines():
        content = line.strip().split(maxsplit=1)
Collaborator

Was maxsplit=1 always here?

I thought this was splitting into a list of phonemes, but with maxsplit=1 the value is a single string containing all the phonemes.

with maxsplit=1, it's

content = ['dani', 'D AA N IY']
self._phoneme_dict = {'dani': ('D AA N IY', ), ...}

whereas without maxsplit, it's

content = ['dani', 'D', 'AA', 'N', 'IY']
self._phoneme_dict = {'dani': ('D', 'AA', 'N', 'IY'), ...}

Which one do you intend? I thought it was the latter.

@jiwidi (Contributor Author), Sep 11, 2020

It was always there and the intent is to have the first option you mention:

content = ['dani', 'D AA N IY']
self._phoneme_dict = {'dani': ('D AA N IY', ), ...}

I'm used to seeing the phonemes as a single string rather than a list as in the second option, but that doesn't mean other people don't expect the second option. At least it is an easy change from the user's perspective.

edit: I chatted with some colleagues and it seems like the second option would be more "standard", so I'll go with that one.
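
A one-line illustration of the difference, using the same example entry as above:

line = "dani D AA N IY"
print(line.split(maxsplit=1))  # ['dani', 'D AA N IY']          -> one string holding all the phonemes
print(line.split())            # ['dani', 'D', 'AA', 'N', 'IY'] -> one element per phoneme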

[Tensor, int]: Audio tensor representation and sample rate
"""
start_time = int(float(start_time) * 16000)
end_time = int(float(end_time) * 16000)
Collaborator

Can you use the sample_rate variable here instead of the hard-coded 16000?

@jiwidi (Contributor Author) commented Sep 11, 2020

@mthrok I just pushed a new commit where I include tests for the dictionary for every release as well. More tests never hurt :)

It also changes the dictionary values to the second option you described above and fixes wrong naming in a private variable.

I also enjoyed the interaction here very much, very good coding-practice reviews you left me here 💯

@mthrok (Collaborator) left a comment

@jiwidi

Thanks for your work.

fixes wrong naming in a private variable.

Glad you caught this, I did not notice it :)

I also enjoyed the interaction here very much, very good coding-practice reviews you left me here 💯

Thanks for the nice words. Let me know in the future if there is something I can help you with. ;)

@mthrok (Collaborator) commented Sep 14, 2020

@jiwidi

Sorry, I just noticed that the tedlium dataset test fails the style check. Do you have time to fix it? I tried to modify the PR but I cannot push to your branch. If you are done with it, let me know; I will merge it first and create a follow-up PR.

@jiwidi (Contributor Author) commented Sep 14, 2020

@jiwidi

Sorry, I just noticed that the tedlium dataset test fails the style check. Do you have time to fix it? I tried to modify the PR but I cannot push to your branch. If you are done with it, let me know; I will merge it first and create a follow-up PR.

Just pushed a new commit that should fix it

@mthrok mthrok merged commit 914a846 into pytorch:master Sep 15, 2020
@mthrok (Collaborator) commented Sep 15, 2020

@jiwidi

Thank you so much!

@vincentqb vincentqb mentioned this pull request Oct 2, 2020