

Reconstructing faces from voices

Implementation of the paper "Reconstructing faces from voices"

Yandong Wen, Rita Singh, and Bhiksha Raj

Machine Learning for Signal Processing Group

Carnegie Mellon University


This implementation is based on Python 3.7 and PyTorch 1.1.

We recommend using conda to install the dependencies. All the requirements are listed in

. Run the following command to create a new conda environment with all the dependencies:
$ ./

After running the script above, activate the environment in which all the packages were installed:
$ source activate voice2face

NOTE: If you get an error saying that "webrtcvad" cannot be found, make sure that the pip on your PATH is the one inside your environment. The error can occur when you have multiple pip installations (inside and outside the environment).
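To see which interpreter and which pip are actually active, a small standard-library sketch like the following may help. It assumes the environment was activated with conda, which exports CONDA_PREFIX, and falls back to sys.prefix otherwise. The helper names are hypothetical and are not part of the repository.

```python
import importlib.util
import os
import shutil
import sys


def pip_inside_env(pip_path, env_prefix):
    """True if pip_path lies under env_prefix (paths normalised, symlinks not resolved)."""
    if not pip_path or not env_prefix:
        return False
    pip_path = os.path.normpath(os.path.abspath(pip_path))
    env_prefix = os.path.normpath(os.path.abspath(env_prefix))
    return os.path.commonpath([pip_path, env_prefix]) == env_prefix


def diagnose():
    """Collect the facts needed to debug a missing 'webrtcvad' module."""
    pip_path = shutil.which("pip")                        # first pip found on PATH
    env_prefix = os.environ.get("CONDA_PREFIX", sys.prefix)
    return {
        "python": sys.executable,
        "pip": pip_path,
        "env_prefix": env_prefix,
        "pip_inside_env": pip_inside_env(pip_path, env_prefix),
        "webrtcvad_importable": importlib.util.find_spec("webrtcvad") is not None,
    }


if __name__ == "__main__":
    for key, value in diagnose().items():
        print(f"{key}: {value}")
```

If pip_inside_env is False, the pip on your PATH installs packages outside the active environment, which matches the situation described in the note.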

Processed data

The following is the processed training data we used for this paper. Feel free to download it.

Voice data (log mel-spectrograms): google drive

Face data (aligned face images): google drive
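For readers unfamiliar with the voice representation, the sketch below computes a log mel-spectrogram from a mono 16 kHz signal with NumPy. The 25 ms window (400 samples), 10 ms hop (160 samples), 64 mel bands and the HTK-style mel formula are illustrative assumptions, not necessarily the settings used to produce the released data.

```python
import numpy as np


def hz_to_mel(freq_hz):
    # HTK-style mel scale (an assumption; other mel formulas exist)
    return 2595.0 * np.log10(1.0 + np.asarray(freq_hz) / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (np.asarray(mel) / 2595.0) - 1.0)


def mel_filterbank(sample_rate, n_fft, n_mels):
    """Triangular filters mapping n_fft // 2 + 1 FFT bins onto n_mels mel bands."""
    n_bins = n_fft // 2 + 1
    bin_freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    # n_mels + 2 edges, equally spaced on the mel scale: filter i uses edges i, i+1, i+2
    edges_hz = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_mels + 2))
    bank = np.zeros((n_mels, n_bins))
    for i in range(n_mels):
        left, center, right = edges_hz[i], edges_hz[i + 1], edges_hz[i + 2]
        rising = (bin_freqs - left) / (center - left)
        falling = (right - bin_freqs) / (right - center)
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank


def log_mel_spectrogram(signal, sample_rate=16000, win_length=400, hop_length=160,
                        n_mels=64, eps=1e-6):
    """Return a (n_frames, n_mels) array of log mel energies."""
    signal = np.asarray(signal, dtype=np.float64)
    if len(signal) < win_length:
        raise ValueError("signal is shorter than one analysis window")
    window = np.hanning(win_length)
    n_frames = 1 + (len(signal) - win_length) // hop_length
    frames = np.stack([signal[t * hop_length:t * hop_length + win_length] * window
                       for t in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n=win_length, axis=1)) ** 2  # (n_frames, n_bins)
    mel_energy = power @ mel_filterbank(sample_rate, win_length, n_mels).T
    return np.log(mel_energy + eps)  # eps avoids log(0) in silent frames
```

For one second of audio at 16 kHz this yields 98 frames of 64 log mel energies, i.e. a (98, 64) array.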

Once downloaded, update the variables

with the corresponding paths.


on how to change configurations.
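Before training, it can be worth confirming that the configured paths point at the downloaded data. The sketch below uses hypothetical key names and placeholder paths; the real variable names are defined in the repository's configuration and are not reproduced here. Files are counted recursively in case the data is organised in sub-folders.

```python
import os

# Hypothetical names and placeholder paths, not the repository's actual variables.
DATA_PATHS = {
    "voice_dir": "/path/to/voice_data",  # downloaded log mel-spectrograms
    "face_dir": "/path/to/face_data",    # downloaded aligned face images
}


def check_data_paths(paths):
    """Return {name: file count}; raise FileNotFoundError if any directory is missing."""
    missing = [name for name, path in paths.items() if not os.path.isdir(path)]
    if missing:
        raise FileNotFoundError(f"directories not found for: {', '.join(missing)}")
    counts = {}
    for name, path in paths.items():
        # Count files in the directory and all of its sub-directories
        counts[name] = sum(len(files) for _, _, files in os.walk(path))
    return counts
```

A count of zero for either entry usually means the path points at the wrong level of the extracted archive.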


We provide pretrained models, including a voice embedding network and a trained generator, in

. Alternatively, you can train your own generator by running the training script:
$ python
The trained model is


We provide some examples of generated faces (in

) using the model in
. To generate faces for your own voice recordings with the trained model, set the test_data variable (the folder containing the voice recordings) and the model_path variable (the path to the generator) in
and run:
$ python

Results will be saved in the test_data folder. For each voice recording named

, we generate a face image named

Note: Currently we only support single-channel (mono) voice recordings at a 16 kHz sample rate. Voice and face files whose names start with A-E belong to the validation/testing set, while those starting with F-Z belong to the training set.
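Since only mono, 16 kHz recordings are supported, it may help to check a folder of recordings before running inference. The sketch below uses Python's standard wave module and therefore assumes uncompressed WAV files; the function names are hypothetical, and converting files that fail the check (for example with an external resampling tool) is not covered.

```python
import os
import wave

REQUIRED_CHANNELS = 1      # mono, per the note above
REQUIRED_RATE_HZ = 16000   # 16 kHz


def check_recording(path):
    """Return a list of problems for one WAV file (an empty list means it passes)."""
    problems = []
    with wave.open(path, "rb") as wav:
        channels, rate = wav.getnchannels(), wav.getframerate()
    if channels != REQUIRED_CHANNELS:
        problems.append(f"{channels} channels, expected {REQUIRED_CHANNELS}")
    if rate != REQUIRED_RATE_HZ:
        problems.append(f"{rate} Hz, expected {REQUIRED_RATE_HZ} Hz")
    return problems


def check_folder(folder):
    """Map each .wav file name in folder to its list of problems."""
    return {
        name: check_recording(os.path.join(folder, name))
        for name in sorted(os.listdir(folder))
        if name.lower().endswith(".wav")
    }
```

Any file with a non-empty problem list should be converted before it is placed in the test_data folder.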
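The naming convention for the splits can be written as a small helper. This is a sketch that applies the rule to the first character of the base file name, case-insensitively; how the repository itself performs the split is not shown here, and the function names are hypothetical.

```python
import os
import string

VAL_TEST_LETTERS = set("ABCDE")
TRAIN_LETTERS = set(string.ascii_uppercase) - VAL_TEST_LETTERS  # F-Z


def split_of(filename):
    """Return 'val/test' for names starting with A-E and 'train' for F-Z."""
    first = os.path.basename(filename)[:1].upper()
    if first in VAL_TEST_LETTERS:
        return "val/test"
    if first in TRAIN_LETTERS:
        return "train"
    raise ValueError(f"name does not start with a letter A-Z: {filename!r}")


def group_by_split(filenames):
    """Group file names into the two splits, preserving input order."""
    groups = {"val/test": [], "train": []}
    for name in filenames:
        groups[split_of(name)].append(name)
    return groups
```

Names that do not start with an ASCII letter raise an error rather than being silently assigned to a split.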


@article{wen2019reconstructing,
  title={Reconstructing faces from voices},
  author={Yandong Wen and Rita Singh and Bhiksha Raj},
  journal={arXiv preprint arXiv:1905.10604},
  year={2019}
}


We welcome contributions from everyone and are always working to make this project better. Please open a pull request or raise an issue, and we will be happy to help.


This repository is licensed under GNU GPL-3.0. Please refer to
