Audio2Mel extracts the mel spectrogram as follows. The audio is first loaded and normalized:
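```python
# Assumed imports (the snippet itself doesn't show them)
import numpy as np
import torch
from librosa import load
from librosa.util import normalize

# Inside the dataset's audio-loading method:
data, sampling_rate = load(full_path, sr=self.sampling_rate)  # resample on load
data = 0.95 * normalize(data)  # peak-normalize, then scale the peak to 0.95

if self.augment:
    # Random gain augmentation
    amplitude = np.random.uniform(low=0.3, high=1.0)
    data = data * amplitude

return torch.from_numpy(data).float(), sampling_rate
```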
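To make the scaling concrete, here is a minimal NumPy sketch of what the normalization and augmentation lines do to the peak amplitude. It assumes a non-silent 1-D signal and relies on `librosa.util.normalize` defaulting to the maximum-absolute-value norm; `peak_normalize` and `random_gain` are hypothetical helper names, not part of the original code.

```python
import numpy as np


def peak_normalize(x: np.ndarray, peak: float = 0.95) -> np.ndarray:
    """Scale a non-silent 1-D signal so that max(|x|) == peak.

    For a 1-D signal this mirrors `peak * librosa.util.normalize(x)`,
    whose default norm is the maximum absolute value.
    """
    return peak * x / np.max(np.abs(x))


def random_gain(x: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """The augment branch: one gain drawn uniformly from [0.3, 1.0)."""
    return x * rng.uniform(low=0.3, high=1.0)


rng = np.random.default_rng(0)
wav = 0.2 * rng.standard_normal(22050)  # stand-in for a loaded clip
normed = peak_normalize(wav)
augmented = random_gain(normed, rng)
print(np.max(np.abs(normed)))     # 0.95 (up to floating-point rounding)
print(np.max(np.abs(augmented)))  # between 0.95 * 0.3 = 0.285 and 0.95
```

So every training clip peaks at 0.95 before augmentation, and somewhere in roughly [0.285, 0.95] after it.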
It is then passed through `Audio2Mel.forward`:
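```python
import torch
import torch.nn.functional as F


def forward(self, audio):
    # Reflect-pad so that, with center=False, the STFT yields len(audio) // hop_length frames
    p = (self.n_fft - self.hop_length) // 2
    audio = F.pad(audio, (p, p), "reflect").squeeze(1)
    fft = torch.stft(
        audio,
        n_fft=self.n_fft,
        hop_length=self.hop_length,
        win_length=self.win_length,
        window=self.window,
        center=False,
        return_complex=False,  # required (though deprecated) on PyTorch 2.x to keep the (..., 2) real/imag layout
    )
    real_part, imag_part = fft.unbind(-1)
    magnitude = torch.sqrt(real_part ** 2 + imag_part ** 2)  # linear magnitude, not power
    mel_output = torch.matmul(self.mel_basis, magnitude)
    log_mel_spec = torch.log10(torch.clamp(mel_output, min=1e-5))  # floor at 1e-5 before log
    return log_mel_spec
```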
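For reference, here is a self-contained NumPy sketch of the same steps: reflect padding, a `center=False` STFT magnitude, mel projection, and `log10` with a `1e-5` floor. It assumes `win_length == n_fft` and uses a simple HTK-style triangular filterbank (`triangular_mel_basis`, a hypothetical helper) as a stand-in for whatever `self.mel_basis` actually holds. One thing it makes visible: padding by `(n_fft - hop_length) // 2` on each side with no centering yields `len(audio) // hop_length` frames when `n_fft - hop_length` is even, e.g. 32 frames for 8192 samples at hop 256.

```python
import numpy as np


def hann(n: int) -> np.ndarray:
    # Periodic Hann window (same formula as torch.hann_window's default).
    return 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n) / n)


def hz_to_mel(f):
    # HTK mel scale; a stand-in, not necessarily what self.mel_basis uses.
    return 2595.0 * np.log10(1.0 + np.asarray(f) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m) / 2595.0) - 1.0)


def triangular_mel_basis(sr, n_fft, n_mels, fmin=0.0, fmax=None):
    """Unnormalized triangular filterbank of shape (n_mels, n_fft // 2 + 1)."""
    fmax = sr / 2 if fmax is None else fmax
    fft_freqs = np.linspace(0.0, sr / 2, n_fft // 2 + 1)
    hz_pts = mel_to_hz(np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2))
    basis = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, mid, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        basis[i] = np.maximum(0.0, np.minimum(rising, falling))
    return basis


def log_mel(audio, mel_basis, n_fft=1024, hop_length=256):
    """Audio2Mel-style log-mel: reflect pad, center=False STFT, |X|, mel, log10."""
    p = (n_fft - hop_length) // 2
    padded = np.pad(audio, (p, p), mode="reflect")
    n_frames = 1 + (padded.size - n_fft) // hop_length
    # Frame the padded signal without any extra centering (like center=False).
    frames = np.stack(
        [padded[t * hop_length : t * hop_length + n_fft] for t in range(n_frames)]
    )
    # Magnitude STFT, transposed to (n_freqs, n_frames) like torch.stft's layout.
    magnitude = np.abs(np.fft.rfft(frames * hann(n_fft), axis=-1)).T
    mel = mel_basis @ magnitude
    return np.log10(np.maximum(mel, 1e-5))


sr = 22050
basis = triangular_mel_basis(sr, n_fft=1024, n_mels=80)
audio = np.random.default_rng(0).uniform(-0.95, 0.95, size=8192)
spec = log_mel(audio, basis)
print(spec.shape)  # (80, 32): 8192 samples // hop 256 = 32 frames
```

A frame count of exactly `len(audio) // hop_length` is convenient for a model that upsamples each mel frame by `hop_length` samples, though the issue doesn't state this as the motivation.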
Is there a benefit to doing this instead of using torchaudio's mel spectrogram transform, e.g.:
```python
data, sampling_rate = torchaudio.load(full_path)
melspec_ops = torchaudio.transforms.MelSpectrogram(
    sample_rate=sampling_rate,
    n_fft=self.n_fft,
    win_length=self.win_length,
    hop_length=self.hop_length,
    f_min=0,
    f_max=None,
    n_mels=self.n_mel_channels,
)
mel_spec = melspec_ops(data)
log_mel_spec = torch.log10(mel_spec + 0.000000001)
return log_mel_spec
```
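Note that the two versions are not numerically equivalent as written. `torchaudio.transforms.MelSpectrogram` defaults to `power=2.0` (a power spectrogram rather than a magnitude), `center=True` with its own padding, and `mel_scale="htk"` with `norm=None`; librosa's `filters.mel` defaults to the Slaney scale with Slaney normalization, which matters only if `self.mel_basis` was built with librosa (an assumption, since the snippet doesn't show it). Passing `power=1.0`, `center=False` (with the manual reflect pad), `norm="slaney"` and `mel_scale="slaney"` should bring them closer, but exact agreement isn't guaranteed. The log floors also differ. The NumPy sketch below isolates just the power and log-floor differences, per frequency bin and ignoring the mel weighting; both function names are hypothetical.

```python
import numpy as np


def audio2mel_log(mel_magnitude: np.ndarray) -> np.ndarray:
    # Audio2Mel style: magnitude values, floored at 1e-5 before log10.
    return np.log10(np.maximum(mel_magnitude, 1e-5))


def torchaudio_default_log(mel_power: np.ndarray) -> np.ndarray:
    # Snippet above: MelSpectrogram's default power=2.0, then log10(x + 1e-9).
    return np.log10(mel_power + 1e-9)


# Per-bin magnitudes; the power version squares them first.
m = np.array([0.0, 1e-6, 1e-3, 1.0, 10.0])
print(np.round(audio2mel_log(m), 2))                # ≈ [-5, -5, -3, 0, 1]
print(np.round(torchaudio_default_log(m ** 2), 2))  # ≈ [-9, -9, -6, 0, 2]
```

Silence lands at -5 in one case and -9 in the other, and loud bins roughly double in log value under the power setting, so a vocoder trained on one feature type would likely not accept the other unchanged.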
I'm just curious about this design choice, since the paper doesn't really discuss it.
Side question: why do you multiply the normalized waveform by 0.95 in the original method?
Can you share the full Audio2Mel code?