Implementation of "MelNet: A Generative Model for Audio in the Frequency Domain"
pip install -r requirements.txt
config/. For other datasets, write your own YAML file following the provided ones.
data.extension within the YAML file.
python trainer.py -c [config YAML file path] -n [name of run] -t [tier number] -b [batch size] -s [TTS]
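Each call trains a single tier, so a multi-tier model needs one run per tier. Below is a minimal sketch that generates those command lines. It assumes tiers are numbered from 1 and that the same run name is reused across tiers. Both points are assumptions, and the helper name, tier count, and paths are hypothetical.

```python
import shlex
from typing import List


def training_commands(config: str, run_name: str, num_tiers: int,
                      batch_size: int) -> List[str]:
    """Build one trainer.py command line per tier (hypothetical helper).

    Assumes tiers are numbered 1..num_tiers. The -s flag is left out on
    purpose: passing it at all enables TTS training.
    """
    commands = []
    for tier in range(1, num_tiers + 1):
        argv = ["python", "trainer.py",
                "-c", config,
                "-n", run_name,
                "-t", str(tier),
                "-b", str(batch_size)]
        # shlex.join quotes any argument that needs it for a POSIX shell.
        commands.append(shlex.join(argv))
    return commands


if __name__ == "__main__":
    # Hypothetical values: a 6-tier model trained with batch size 4.
    for cmd in training_commands("config/my_dataset.yaml", "my_run", 6, 4):
        print(cmd)
```

Running the tiers one after another or on separate GPUs is up to you. The sketch only prints the commands and does not launch them.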
The -s flag is a boolean that determines whether to train a TTS tier. Since a TTS tier differs only at tier 1, this flag is ignored when
[tier number] != 0. Warning: this flag is set to
True no matter what value follows it, so leave it out entirely if you're not planning to use it.
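A common way to end up with a flag that is always True is to declare it with argparse's type=bool. Calling bool() on any non-empty string returns True, and that includes the string "False". Whether trainer.py declares the flag this way is an assumption. The sketch below only reproduces the behavior described in the warning and shows the usual store_true alternative.

```python
import argparse

# Assumed reconstruction: a flag declared with type=bool.
pitfall = argparse.ArgumentParser()
pitfall.add_argument("-s", type=bool, default=False)

# bool() of any non-empty string is True, so "False" still parses as True.
print(pitfall.parse_args(["-s", "False"]).s)  # True
print(pitfall.parse_args([]).s)               # False

# action="store_true" avoids the ambiguity: present -> True, absent -> False.
fixed = argparse.ArgumentParser()
fixed.add_argument("-s", action="store_true")
print(fixed.parse_args(["-s"]).s)  # True
print(fixed.parse_args([]).s)      # False
```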
inference.yaml must be provided under
inference.yaml must specify the number of tiers, the names of the checkpoints, and whether or not generation is conditional.
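Below is a minimal sketch that checks those three fields after the YAML has been loaded into a dict. The key names num_tiers, checkpoints, and conditional are hypothetical, and so is the one-checkpoint-per-tier rule. Match them to whatever inference.py actually reads.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class InferenceSpec:
    """Hypothetical in-memory view of inference.yaml."""
    num_tiers: int
    checkpoints: List[str]
    conditional: bool


def parse_inference_spec(raw: dict) -> InferenceSpec:
    """Validate a dict loaded from inference.yaml (key names are assumed)."""
    spec = InferenceSpec(
        num_tiers=int(raw["num_tiers"]),
        checkpoints=[str(name) for name in raw["checkpoints"]],
        conditional=raw["conditional"],
    )
    if not isinstance(spec.conditional, bool):
        raise TypeError("conditional must be true or false")
    # Assumption: exactly one checkpoint name per tier.
    if len(spec.checkpoints) != spec.num_tiers:
        raise ValueError(
            f"expected {spec.num_tiers} checkpoint names, "
            f"got {len(spec.checkpoints)}")
    return spec


spec = parse_inference_spec({
    "num_tiers": 3,
    "checkpoints": ["tier1.pt", "tier2.pt", "tier3.pt"],  # hypothetical names
    "conditional": False,
})
print(spec)
```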
python inference.py -c [config YAML file path] -p [inference YAML file path] -t [timestep of generated mel spectrogram] -n [name of sample] -i [input sentence for conditional generation]
[sample rate] : [hop length of FFT].
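Read as frames per second of audio, this ratio means one second of audio spans [sample rate] / [hop length of FFT] timesteps. The sketch below converts between a target duration and a -t value. The 22050 Hz sample rate and 256-sample hop length are placeholders, not settings taken from this repository.

```python
import math


def timesteps_for_duration(seconds: float, sample_rate: int,
                           hop_length: int) -> int:
    """Approximate number of mel frames covering `seconds` of audio.

    Each frame advances hop_length samples, so one second spans
    sample_rate / hop_length frames. STFT padding and centering, which
    can add a frame, are ignored here.
    """
    return math.ceil(seconds * sample_rate / hop_length)


def duration_for_timesteps(timesteps: int, sample_rate: int,
                           hop_length: int) -> float:
    """Audio length in seconds represented by `timesteps` mel frames."""
    return timesteps * hop_length / sample_rate


# Placeholder audio settings; read the real ones from your config YAML.
SAMPLE_RATE, HOP_LENGTH = 22050, 256
print(timesteps_for_duration(10, SAMPLE_RATE, HOP_LENGTH))  # 862
print(duration_for_timesteps(862, SAMPLE_RATE, HOP_LENGTH))
```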
The -i flag is optional and only needed for conditional generation. Surround the sentence with
"" and end with
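If you launch inference from Python rather than typing the command into a shell, you can pass an argument list instead. The sentence then stays intact without manual quoting. Below is a minimal sketch. The helper name, paths, timestep count, and sample name are hypothetical.

```python
import shlex
from typing import List, Optional


def inference_command(config: str, inference_yaml: str, timesteps: int,
                      name: str, sentence: Optional[str] = None) -> List[str]:
    """Assemble an inference.py argument list (hypothetical helper)."""
    argv = ["python", "inference.py",
            "-c", config,
            "-p", inference_yaml,
            "-t", str(timesteps),
            "-n", name]
    if sentence is not None:
        # Only add -i for conditional generation.
        argv += ["-i", sentence]
    return argv


argv = inference_command("config/my_dataset.yaml", "path/to/inference.yaml",
                         862, "sample1", sentence="Hello, world.")
# As a list element, the sentence stays one argument even though it
# contains a space, so subprocess.run(argv, check=True) needs no quoting.
print(shlex.join(argv))
```

shlex.join wraps the sentence in single quotes. In a POSIX shell this yields the same single argument as double quotes for a plain sentence like this one.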