Description

A tensorflow implementation for Deep clustering: Discriminative embeddings for segmentation and separation
A TensorFlow implementation of deep clustering for speech separation

This is a TensorFlow implementation of the deep clustering paper: https://arxiv.org/abs/1508.04306. A few examples from the test set can be found in visualizationsamples/ and speechsamples/.
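For readers new to the paper, the training objective it proposes can be sketched in a few lines of NumPy. This is an illustration only, not the code in model.py: the function name `deep_clustering_loss` and the array shapes are assumptions made for the example. The loss is the squared Frobenius distance between the embedding affinity matrix V Vᵀ and the ideal affinity matrix Y Yᵀ, expanded so that no N x N matrix is ever formed.

```python
import numpy as np


def deep_clustering_loss(V, Y):
    """Return ||V V^T - Y Y^T||_F^2 using only small matrix products.

    V: (N, D) array of embeddings, one row per time-frequency bin.
    Y: (N, C) one-hot array marking the dominant source of each bin.
    (Names and shapes are assumptions for this sketch.)
    """
    VtV = np.dot(V.T, V)  # (D, D)
    VtY = np.dot(V.T, Y)  # (D, C)
    YtY = np.dot(Y.T, Y)  # (C, C)
    # Expansion of the Frobenius norm: ||V^T V||^2 - 2 ||V^T Y||^2 + ||Y^T Y||^2
    return np.sum(VtV ** 2) - 2.0 * np.sum(VtY ** 2) + np.sum(YtY ** 2)
```

The loss is zero when the embeddings reproduce the ideal affinities exactly, for example when `V` equals `Y`.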

Requirements

Python 2 and its packages:

  • tensorflow r0.11
  • numpy
  • scikit-learn
  • matplotlib
  • librosa

File documentation

  • GlobalConstant.py: Global constants.
  • datagenerator.py: Transform separate speech files in a directory into a .pkl data set.
  • datagenerator2.py: A class to read the .pkl data set and generate batches of data for training the net.
  • model.py: A class defining the net structure.
  • train_net.py: Train the DC model.
  • mix_samples.py: Mix two speech signals for testing.
  • AudioSampleReader.py: Transform a .wav file into chunks of frames to be fed to the models during test.
  • visualizationofsamples.py: Visualize the active embedding points using PCA.
  • audio_test.py: Take a two-speaker mixture and separate it into the individual speakers.
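To make the test-time steps above more concrete, here is a minimal sketch of mixing two signals and cutting a frame sequence into fixed-size chunks. It is not taken from mix_samples.py or AudioSampleReader.py: the function names, the rescaling rule, and the choice of non-overlapping chunks are all assumptions made for illustration.

```python
import numpy as np


def mix_signals(s1, s2):
    """Sum two mono signals (1-D arrays) after truncating to the shorter one."""
    n = min(len(s1), len(s2))
    mix = s1[:n] + s2[:n]
    peak = np.max(np.abs(mix))
    # Rescale only if the sum would fall outside [-1, 1] (assumed policy).
    return mix / peak if peak > 1.0 else mix


def chunk_frames(spec, frames_per_chunk):
    """Split a (frames, bins) array into non-overlapping chunks.

    Trailing frames that do not fill a whole chunk are dropped.
    """
    n_chunks = spec.shape[0] // frames_per_chunk
    trimmed = spec[:n_chunks * frames_per_chunk]
    return trimmed.reshape(n_chunks, frames_per_chunk, spec.shape[1])
```

In this sketch, `spec` stands for any per-frame feature matrix, such as a magnitude spectrogram with time along the first axis.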

Training procedure

  1. Organize your speech data files in the following layout: rootdir/speakerid/speech_files.wav
  2. Change the directory settings in datagenerator.py and run it. You may want to give the resulting .pkl file a descriptive name.
  3. Create directories for summaries and checkpoints, and update the corresponding paths in train_net.py. Also update the lists of .pkl files used for training and validation.
  4. Train the model.
  5. Generate some mixtures using mix_samples.py, and update the checkpoint path in audio_test.py.
  6. Enjoy yourself!

Some other things

The optimizer is not the same as the one used in the original paper, and no three-speaker mixture generator is provided. We are moving on to the next stage of our work and do not plan to add one. If you are interested and implement it, we will be glad to merge your branch.

References

https://arxiv.org/abs/1508.04306
