To GTA-finetune HiFi-GAN models, you should download pretrained models and transfer from their weights. You can use the pretrained UNIVERSAL_V1 models that the authors of HiFi-GAN provide.

Download pretrained models

Details of each folder are as follows:
Folder Name | Generator | Dataset | Fine-Tuned |
---|---|---|---|
LJ_V1 | V1 | LJSpeech | No |
LJ_V2 | V2 | LJSpeech | No |
LJ_V3 | V3 | LJSpeech | No |
LJ_FT_T2_V1 | V1 | LJSpeech | Yes (Tacotron2) |
LJ_FT_T2_V2 | V2 | LJSpeech | Yes (Tacotron2) |
LJ_FT_T2_V3 | V3 | LJSpeech | Yes (Tacotron2) |
VCTK_V1 | V1 | VCTK | No |
VCTK_V2 | V2 | VCTK | No |
VCTK_V3 | V3 | VCTK | No |
UNIVERSAL_V1 | V1 | Universal | No |
- Make a `cp_hifigan` directory:

  ```
  mkdir cp_hifigan
  ```

- Download `g_02500000` and `do_02500000` from the following link and place them in the `cp_hifigan/` directory.
- Generate GTA mel-spectrograms in `torch.Tensor` format using Assem-VC. The file name of each generated mel-spectrogram should match its audio file, with the extension `.gta` appended.

  Example:
  - Audio file: `p233_392.wav`
  - Mel-spectrogram file: `p233_392.wav.gta`
- Run the following command:

  ```
  python train.py --config config_v1.json \
      --input_wavs_dir <root_path_of_input_audios> \
      --input_mels_dir <root_path_of_GTA_mels> \
      --input_training_file <absolute_path_of_train_metadata_of_gta_mels> \
      --input_validation_file <absolute_path_of_val_metadata_of_gta_mels> \
      --fine_tuning True
  ```
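The file-naming rule from the steps above can be sketched as a small helper (a hedged illustration: `gta_path` is not a function from this repository, and the mel tensors themselves are produced and written by Assem-VC, typically via `torch.save`):

```python
def gta_path(wav_path: str) -> str:
    """Return the expected GTA mel-spectrogram path for an audio file.

    The .gta extension is appended to the full audio file name,
    so p233_392.wav maps to p233_392.wav.gta.
    """
    return wav_path + ".gta"

# Matches the example from the steps above.
print(gta_path("p233_392.wav"))  # p233_392.wav.gta
```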
To train a V2 or V3 generator, replace `config_v1.json` with `config_v2.json` or `config_v3.json`.
Checkpoints and a copy of the configuration file are saved in the `cp_hifigan` directory by default. You can change the path with the `--checkpoint_path` option. Here are some example commands that might help you understand the arguments:
```
python train.py --config config_v1.json \
    --input_wavs_dir ../datasets/ \
    --input_mels_dir ../datasets/ \
    --input_training_file ../datasets/gta_metadata/gta_vctk_22k_train_10s_g2p.txt \
    --input_validation_file ../datasets/gta_metadata/gta_vctk_22k_val_g2p.txt \
    --fine_tuning True
```
To monitor training with TensorBoard:

```
tensorboard --logdir cp_hifigan/logs --bind_all
```
We referred to the implementations of HiFi-GAN, WaveGlow, MelGAN, and Tacotron2 when building this.