About the developer

vlomme

Multi-Tacotron Voice Cloning

This repository is a phoneme-based multilingual (Russian-English) implementation built on Real-Time-Voice-Cloning. It is a four-stage deep learning framework that creates a numerical representation of a voice from a few seconds of audio and uses it to condition a text-to-speech model. If you only need the English version, please use the original implementation.

Example

Quick start

Use the colab online demo

Requirements

You will need the following whether you plan to use the toolbox only or to retrain the models.

Python 3.6 or newer.

PyTorch (>=1.0.1).

Run

pip install -r requirements.txt
to install the necessary packages.

A GPU is mandatory, but you don't necessarily need a high-end GPU if you only want to use the toolbox.
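
As a quick sanity check of the requirements above, the following minimal sketch verifies the Python version, the PyTorch version, and whether PyTorch can see a CUDA GPU. The helper names `version_tuple` and `check_environment` are illustrative and are not part of this repository; only `torch.__version__` and `torch.cuda.is_available()` come from PyTorch itself.

```python
import importlib.util
import re
import sys

MIN_PYTHON = (3, 6)    # from the requirements above
MIN_TORCH = (1, 0, 1)  # from the requirements above


def version_tuple(version):
    """Turn a version string such as '1.10.0+cu113' into (1, 10, 0)."""
    # Drop any local build suffix after '+', then keep the first three numbers.
    numbers = re.findall(r"\d+", version.split("+")[0])
    return tuple(int(n) for n in numbers[:3])


def check_environment():
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append("Python 3.6 or newer is required")
    if importlib.util.find_spec("torch") is None:
        problems.append("PyTorch is not installed")
        return problems
    import torch  # imported only after confirming that it is installed
    if version_tuple(torch.__version__) < MIN_TORCH:
        problems.append("PyTorch 1.0.1 or newer is required")
    if not torch.cuda.is_available():
        problems.append("No CUDA-capable GPU is visible to PyTorch")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

Running the script prints one warning per unmet requirement and nothing if the environment looks fine.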

Pretrained models

Download the latest here.

Datasets

| Name | Language | Link | Comments | My link | My comments |
| --- | --- | --- | --- | --- | --- |
| Phoneme dictionary | En, Ru | En,Ru | Phoneme dictionary | link | Merged the Russian and English phoneme dictionaries |
| LibriSpeech | En | link | 300 speakers, 360h clean speech | | |
| VoxCeleb | En | link | 7000 speakers, many hours of low-quality speech | | |
| M-AILABS | Ru | link | 3 speakers, 46h clean speech | | |
| opentts, openstt | Ru | open_tts, open_stt | Many speakers, many hours of low-quality speech | link | Cleaned 4 hours of speech from one speaker. Corrected the annotations and split the audio into segments of up to 7 seconds |
| Voxforge+audiobook | Ru | link | Many speakers, 25h of varying quality | link | Selected the good files and split them into segments. Added audiobooks from the internet. The result is 200 speakers with a couple of minutes each |
| RUSLAN | Ru | link | One speaker, 40h good speech | link | Resampled to 16 kHz |
| Mozilla | Ru | link | 50 speakers, 30h good speech | link | Resampled to 16 kHz and sorted the different users into separate folders |
| Russian Single | Ru | link | One speaker, 9h good speech | link | Resampled to 16 kHz |
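
The dataset notes above mention splitting recordings into segments of up to 7 seconds. The sketch below shows one simple way to do that for a WAV file using only the Python standard library. It is an assumption for illustration, not the author's preprocessing script: it cuts at fixed intervals rather than at pauses or annotation boundaries, and the function names `segment_bounds` and `split_wav` are hypothetical.

```python
import os
import wave


def segment_bounds(n_frames, sample_rate, max_seconds=7.0):
    """Return (start, end) frame indices of chunks no longer than max_seconds."""
    if sample_rate <= 0 or max_seconds <= 0:
        raise ValueError("sample_rate and max_seconds must be positive")
    max_frames = max(1, int(sample_rate * max_seconds))
    return [
        (start, min(start + max_frames, n_frames))
        for start in range(0, n_frames, max_frames)
    ]


def split_wav(path, out_dir, max_seconds=7.0):
    """Write consecutive chunks of a WAV file into out_dir; return their paths."""
    os.makedirs(out_dir, exist_ok=True)
    stem = os.path.splitext(os.path.basename(path))[0]
    written = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        bounds = segment_bounds(src.getnframes(), src.getframerate(), max_seconds)
        for index, (start, end) in enumerate(bounds):
            out_path = os.path.join(out_dir, "{}_{:04d}.wav".format(stem, index))
            with wave.open(out_path, "wb") as dst:
                # Copy channels, sample width, and rate; the frame count in the
                # header is corrected automatically when the file is closed.
                dst.setparams(params)
                # Chunks are contiguous, so sequential reads stay aligned.
                dst.writeframes(src.readframes(end - start))
            written.append(out_path)
    return written
```

For example, a 15-second recording at 16 kHz yields two 7-second chunks and one 1-second chunk. A real pipeline would also need to split the matching transcripts, which this sketch does not attempt.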

Toolbox

You can then try the toolbox:

python demo_toolbox.py -d 

or
python demo_toolbox.py

Wiki

Pretrained models

Training (and for other languages), in Russian

Training (and for other languages)

Contribution

For any questions, please email me.

Papers implemented

| URL | Designation | Title | Implementation source |
| --- | --- | --- | --- |
| 1806.04558 | SV2TTS | Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis | CorentinJ |
| 1802.08435 | WaveRNN (vocoder) | Efficient Neural Audio Synthesis | fatchord/WaveRNN |
| 1712.05884 | Tacotron 2 (synthesizer) | Natural TTS Synthesis by Conditioning Wavenet on Mel Spectrogram Predictions | Rayhane-mamah/Tacotron-2 |
| 1710.10467 | GE2E (encoder) | Generalized End-To-End Loss for Speaker Verification | CorentinJ |
