(conv_pre): Conv1d(80, 512, kernel_size=(7,), stride=(1,), padding=(3,)) #153

a897456 · 2023-10-11T10:40:07Z

(conv_pre): Conv1d(80, 512, kernel_size=(7,), stride=(1,), padding=(3,))
(0): ConvTranspose1d(512, 256, kernel_size=(16,), stride=(8,), padding=(4,))
(1): ConvTranspose1d(256, 128, kernel_size=(16,), stride=(8,), padding=(4,))
(2): ConvTranspose1d(128, 64, kernel_size=(4,), stride=(2,), padding=(1,))
(3): ConvTranspose1d(64, 32, kernel_size=(4,), stride=(2,), padding=(1,))
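One way to see what the printed stack does is to trace tensor shapes through it. The sketch below is plain Python, so PyTorch is not needed. It applies the output-length formulas from the PyTorch documentation for `Conv1d` and `ConvTranspose1d`. The layer table is copied from the printout. The helper names and the 100-frame input are illustrative assumptions. PyTorch's module repr omits `dilation` and `output_padding` when they are at their defaults, so the sketch assumes 1 and 0.

```python
def conv1d_len(l_in, kernel, stride=1, padding=0, dilation=1):
    # Output length of torch.nn.Conv1d (formula from the PyTorch docs).
    return (l_in + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def conv_transpose1d_len(l_in, kernel, stride=1, padding=0, dilation=1, output_padding=0):
    # Output length of torch.nn.ConvTranspose1d (formula from the PyTorch docs).
    return (l_in - 1) * stride - 2 * padding + dilation * (kernel - 1) + output_padding + 1


# (name, kind, in_channels, out_channels, kernel_size, stride, padding), copied from the printout.
LAYERS = [
    ("conv_pre", "conv", 80, 512, 7, 1, 3),
    ("0", "convT", 512, 256, 16, 8, 4),
    ("1", "convT", 256, 128, 16, 8, 4),
    ("2", "convT", 128, 64, 4, 2, 1),
    ("3", "convT", 64, 32, 4, 2, 1),
]


def trace(frames):
    """Return (name, channels, length) after each layer for an input of shape (80, frames)."""
    channels, length = 80, frames
    shapes = []
    for name, kind, c_in, c_out, k, s, p in LAYERS:
        assert c_in == channels, "layer input channels must match the previous output"
        if kind == "conv":
            length = conv1d_len(length, k, s, p)
        else:
            length = conv_transpose1d_len(length, k, s, p)
        channels = c_out  # channel count is whatever out_channels says, for either kind
        shapes.append((name, channels, length))
    return shapes


if __name__ == "__main__":
    for name, channels, length in trace(100):
        print(f"{name:>8}: channels={channels:4d}, length={length}")
```

For 100 input frames, the channels fall 512 → 256 → 128 → 64 → 32 while the length grows 100 → 800 → 6400 → 12800 → 25600. That is 256 output samples per input frame: the channels shrink exactly where the sequence gets longer.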

My understanding is that convolution is the process that makes out_channels smaller, and deconvolution is the process that makes out_channels larger.
Why is it the opposite in this code?

infected4098 · 2024-08-24T01:38:42Z

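Deconvolution here means upsampling the input to a longer sequence. These layers are 1-D transposed convolutions (ConvTranspose1d). With these settings (kernel_size = 2 × stride, padding = stride / 2), each layer multiplies the sequence length by exactly its stride: 8, 8, 2 and 2, or 256 in total. The channel count is a separate matter. In both Conv1d and ConvTranspose1d it is set freely by in_channels and out_channels, so nothing forces a transposed convolution to increase it.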
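The sketch below makes the upsampling mechanism concrete. It implements a naive single-channel transposed convolution as a scatter-add. Every input sample stamps a scaled copy of the kernel into the output, with the copies spaced `stride` apart, and `padding` trims both ends. This is a teaching sketch with no bias, dilation 1 and one channel. It is intended to follow the same indexing as `torch.nn.ConvTranspose1d`, and the function name is hypothetical.

```python
def conv_transpose1d(x, weight, stride, padding=0):
    """Naive single-channel transposed 1-D convolution (no bias, dilation 1)."""
    k = len(weight)
    # Untrimmed output: the last stamp starts at (len(x) - 1) * stride and spans k samples.
    full = [0.0] * ((len(x) - 1) * stride + k)
    for i, xi in enumerate(x):
        # Scatter-add: sample i stamps a copy of the kernel, scaled by xi, at i * stride.
        for j, wj in enumerate(weight):
            full[i * stride + j] += xi * wj
    # `padding` removes that many samples from each end of the output.
    return full[padding:len(full) - padding]


if __name__ == "__main__":
    # Layers (2) and (3) use kernel_size=4, stride=2, padding=1: length doubles.
    print(conv_transpose1d([1.0, 2.0], [1.0] * 4, stride=2, padding=1))  # [1.0, 3.0, 3.0, 2.0]
    # Layers (0) and (1) use kernel_size=16, stride=8, padding=4: length x 8.
    print(len(conv_transpose1d([1.0] * 10, [1.0] * 16, stride=8, padding=4)))  # 80
```

Overlapping stamps are summed, which is why the middle samples in the first example are larger. The untrimmed length is (L - 1) · stride + kernel_size. Removing 2 · padding gives exactly L · stride when kernel_size = 2 × stride and padding = stride / 2.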

The mel spectrogram is much more condensed in time than the original waveform, but it has more channels (e.g., 80 or 128 mel bins). The network therefore has to reduce the channels step by step toward 1 (the amplitude of the waveform) while it upsamples the time axis to the waveform's length.
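The same scatter-add extends to several channels, which shows that the number of output channels is just a dimension of the weight tensor. In this sketch the weight is laid out as `(in_channels, out_channels, kernel_size)`, the layout PyTorch documents for `ConvTranspose1d`. A toy two-channel input is collapsed into one output channel while its length doubles. The sizes are illustrative only and far smaller than a real mel spectrogram, and the function name is hypothetical.

```python
def conv_transpose1d_multi(x, weight, stride, padding=0):
    """Naive multi-channel transposed 1-D convolution (no bias, dilation 1, groups 1).

    x:      list of in_channels lists, each holding L samples.
    weight: weight[ci][co] is the kernel (k floats) linking input channel ci
            to output channel co, i.e. an (in_channels, out_channels, k) layout.
    Returns out_channels lists of length (L - 1) * stride + k - 2 * padding.
    """
    in_channels, out_channels = len(weight), len(weight[0])
    k = len(weight[0][0])
    length = len(x[0])
    full_len = (length - 1) * stride + k
    out = [[0.0] * full_len for _ in range(out_channels)]
    for ci in range(in_channels):
        for co in range(out_channels):
            kernel = weight[ci][co]
            # Scatter-add: sample i of input channel ci stamps the kernel at i * stride.
            for i, xi in enumerate(x[ci]):
                for j, wj in enumerate(kernel):
                    out[co][i * stride + j] += xi * wj
    # Trim `padding` samples from both ends of every output channel.
    return [channel[padding:full_len - padding] for channel in out]


if __name__ == "__main__":
    # Toy "spectrogram": 2 channels x 2 frames -> 1 channel x 4 samples.
    x = [[1.0, 0.0],
         [0.0, 1.0]]
    weight = [[[1.0, 1.0, 1.0, 1.0]],   # input channel 0 -> output channel 0
              [[2.0, 2.0, 2.0, 2.0]]]   # input channel 1 -> output channel 0
    y = conv_transpose1d_multi(x, weight, stride=2, padding=1)
    print(len(x), "x", len(x[0]), "->", len(y), "x", len(y[0]))  # 2 x 2 -> 1 x 4
    print(y)  # [[1.0, 3.0, 3.0, 2.0]]
```

Each output channel sums the contributions of every input channel. Choosing `out_channels` smaller than `in_channels` therefore merges information across channels, independently of how much the stride stretches the time axis.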
