Wenzhe-Liu
303 Stars 93 Forks GNU General Public License v2.0 86 Commits 0 Opened issues

Description

speech enhancement / speech separation / sound source localization

Awesome Speech Enhancement

This repository collects papers, code, and tools for single- and multi-channel speech enhancement and speech separation. It aims to list open-source projects rather than to cover the literature exhaustively. Pull requests are welcome. <!--TODO ... datasets... Tutorials... https://github.com/topics/beamforming -->

Speech Enhancement

### Magnitude spectrogram

#### spectral masking

* 2014, On Training Targets for Supervised Speech Separation, Wang. [Paper]
* 2018, A Hybrid DSP/Deep Learning Approach to Real-Time Full-Band Speech Enhancement, Valin. [Paper] [RNNoise] [RNNoise16k]
* 2020, A Perceptually-Motivated Approach for Low-Complexity, Real-Time Enhancement of Fullband Speech, Valin. [Paper] [PercepNet]
* 2021, RNNoise-Ex: Hybrid Speech Enhancement System based on RNN and Spectral Features. [Paper] [RNNoise-Ex]
* Other IRM-based SE repositories: [IRM-SE-LSTM] [nn-irm] [rnn-se] [DL4SE]

#### spectral mapping

* 2014, An Experimental Study on Speech Enhancement Based on Deep Neural Networks, Xu. [Paper]
* 2014, A Regression Approach to Speech Enhancement Based on Deep Neural Networks, Xu. [Paper] [sednn] [DNN-SE-Xu] [DNN-SE-Li]
* Other DNN magnitude spectrum mapping-based SE repositories: [SE toolkit] [TensorFlow-SE] [UNetSE]
* 2015, Speech enhancement with LSTM recurrent neural networks and its application to noise-robust ASR, Weninger. [Paper]
* 2016, A Fully Convolutional Neural Network for Speech Enhancement, Park. [Paper] [CNN4SE]
* 2017, Long short-term memory for speaker generalization in supervised speech separation, Chen. [Paper]
* 2018, A Convolutional Recurrent Neural Network for Real-Time Speech Enhancement, Tan. [Paper] [CRN-Tan]
* 2018, Convolutional-Recurrent Neural Networks for Speech Enhancement, Zhao. [Paper] [CRN-Hao]
* 2020, Online Monaural Speech Enhancement using Delayed Subband LSTM, Li. [Paper]
* 2020, FullSubNet: A Full-Band and Sub-Band Fusion Model for Real-Time Single-Channel Speech Enhancement, Hao. [Paper] [FullSubNet]

### Complex domain

* 2017, Complex spectrogram enhancement by convolutional neural network with multi-metrics learning, Fu. [Paper]
* 2017, Time-Frequency Masking in the Complex Domain for Speech Dereverberation and Denoising, Williamson. [Paper]
* 2019, PHASEN: A Phase-and-Harmonics-Aware Speech Enhancement Network, Yin. [Paper] [PHASEN]
* 2019, Phase-aware Speech Enhancement with Deep Complex U-Net, Choi. [Paper] [DC-UNet]
* 2020, Learning Complex Spectral Mapping With Gated Convolutional Recurrent Networks for Monaural Speech Enhancement, Tan. [Paper] [GCRN]
* 2020, DCCRN: Deep Complex Convolution Recurrent Network for Phase-Aware Speech Enhancement, Hu. [Paper] [DCCRN]
* 2020, T-GSA: Transformer with Gaussian-Weighted Self-Attention for Speech Enhancement, Kim. [Paper]
* 2020, Phase-aware Single-stage Speech Denoising and Dereverberation with U-Net, Choi. [Paper]

### Time domain

* 2018, Improved Speech Enhancement with the Wave-U-Net, Macartney. [Paper] [WaveUNet]
* 2019, A New Framework for CNN-Based Speech Enhancement in the Time Domain, Pandey. [Paper]
* 2019, TCNN: Temporal Convolutional Neural Network for Real-time Speech Enhancement in the Time Domain, Pandey. [Paper]
* 2020, Real Time Speech Enhancement in the Waveform Domain, Defossez. [Paper] [facebookDenoiser]
* 2020, Monaural speech enhancement through deep wave-U-net, Guimarães. [Paper] [SEWUNet]
* 2020, Speech Enhancement Using Dilated Wave-U-Net: an Experimental Analysis, Ali. [Paper]
* 2020, Densely Connected Neural Network with Dilated Convolutions for Real-Time Speech Enhancement in the Time Domain, Pandey. [Paper] [DDAEC]
* 2021, Dense CNN With Self-Attention for Time-Domain Speech Enhancement, Pandey. [Paper]
* 2021, Dual-path Self-Attention RNN for Real-Time Speech Enhancement, Pandey. [Paper]

### GAN

* 2017, SEGAN: Speech Enhancement Generative Adversarial Network, Pascual. [Paper] [SEGAN]
* 2019, SERGAN: Speech enhancement using relativistic generative adversarial networks with gradient penalty, Deepak Baby. [Paper] [SERGAN]
* 2019, MetricGAN: Generative Adversarial Networks based Black-box Metric Scores Optimization for Speech Enhancement, Fu. [Paper] [MetricGAN]
* 2021, MetricGAN+: An Improved Version of MetricGAN for Speech Enhancement, Fu. [Paper] [MetricGAN+]
* 2020, HiFi-GAN: High-Fidelity Denoising and Dereverberation Based on Speech Deep Features in Adversarial Networks, Su. [Paper] [HifiGAN]

### Hybrid SE

* 2019, Deep Xi as a Front-End for Robust Automatic Speech Recognition, Nicolson. [Paper] [DeepXi]
* 2019, Using Generalized Gaussian Distributions to Improve Regression Error Modeling for Deep-Learning-Based Speech Enhancement, Li. [Paper] [SE-MLC]
* 2020, Deep Residual-Dense Lattice Network for Speech Enhancement, Nikzad. [Paper] [RDL-SE]
* 2020, DeepMMSE: A Deep Learning Approach to MMSE-based Noise Power Spectral Density Estimation, Zhang. [Paper]
* 2020, Speech Enhancement Using a DNN-Augmented Colored-Noise Kalman Filter, Yu. [Paper] [DNN-Kalman]

<!--### NMF * SpeechEnhancementDNNNMF [[Code]](https://github.com/eesungkim/SpeechEnhancementDNNNMF) * gcc-nmf:Real-time GCC-NMF Blind Speech Separation and Enhancement [Code] * https://github.com/Jerry-jwz/Audio-Enhancement-via-ONMF-->

### Multi-stage

* 2020, A Recursive Network with Dynamic Attention for Monaural Speech Enhancement, Li. [Paper] [DARCN]
* 2020, Masking and Inpainting: A Two-Stage Speech Enhancement Approach for Low SNR and Non-Stationary Noise, Hao. [Paper]
* 2020, A Joint Framework of Denoising Autoencoder and Generative Vocoder for Monaural Speech Enhancement, Du. [Paper]
* 2020, Dual-Signal Transformation LSTM Network for Real-Time Noise Suppression, Westhausen. [Paper] [DTLN]
* 2020, Listening to Sounds of Silence for Speech Denoising, Xu. [Paper] [LSS]
* 2021, ICASSP 2021 Deep Noise Suppression Challenge: Decoupling Magnitude and Phase Optimization with a Two-Stage Deep Network, Li. [Paper]

### Data collection

* Kashyap ([Noise2Noise])

### Loss

* [Quality-Net]

### Challenge

* DNS Challenge [DNS Interspeech2020] [DNS ICASSP2021] [DNS Interspeech2021]

### Other repositories

* Collection of papers, datasets and tools on the topic of Speech Dereverberation and Speech Enhancement [Link]
* nanahou's awesome speech enhancement [Link]

Dereverberation

Traditional method

Speech Separation (single channel)

  • Tutorial speech separation, like awesome series [Link]

### NN-based separation

  • 2015, Deep-Clustering: Discriminative embeddings for segmentation and separation, Hershey and Chen. [Paper] [Code] [Code] [Code]
  • 2016, DANet: Deep Attractor Network (DANet) for single-channel speech separation, Chen. [Paper] [Code]
  • 2017, Multitalker speech separation with utterance-level permutation invariant training of deep recurrent neural networks, Yu. [Paper] [Code]
  • 2018, LSTMPITSpeechSeparation [[Code]](https://github.com/pchao6/LSTMPITSpeechSeparation)
  • 2018, TasNet: time-domain audio separation network for real-time, single-channel speech separation, Luo. [Paper] [Code]
  • 2019, Conv-TasNet: Surpassing Ideal Time-Frequency Masking for Speech Separation, Luo. [Paper] [Code]
  • 2019, Dual-path RNN: efficient long sequence modeling for time-domain single-channel speech separation, Luo. [Paper] [Code1] [Code2]
  • 2019, TAC end-to-end microphone permutation and number invariant multi-channel speech separation, Luo. [Paper] [Code]
  • sound separation(Google) [Code]
  • sound separation: Deep learning based speech source separation using Pytorch [Code]
  • music-source-separation [Code]
  • Singing-Voice-Separation [Code]
  • Comparison-of-Blind-Source-Separation-techniques [Code]

### BSS/ICA method

  • FastICA [Code]
  • A localisation- and precedence-based binaural separation algorithm [Download]
  • Convolutive Transfer Function Invariant SDR [Code]

Array Signal Processing

  • MASP:Microphone Array Speech Processing [Code]
  • BeamformingSpeechEnhancer [Code]
  • TSENet [Code]
  • steernet [Code]
  • DNNLocalizationAndSeparation [[Code]](https://github.com/shaharhoch/DNNLocalizationAndSeparation)
  • nn-gev:Neural network supported GEV beamformer CHiME3 [Code]
  • chime4-nn-mask:Implementation of NN based mask estimator in pytorch(reuse some programming from nn-gev)[Code]
  • beamformitmatlab:A MATLAB implementation of CHiME4 baseline Beamformit [[Code]](https://github.com/gogyzzz/beamformitmatlab)
  • pbchime5:Speech enhancement system for the CHiME-5 dinner party scenario [[Code]](https://github.com/fgnt/pbchime5)
  • beamformit: microphone array algorithms [Code]
  • Beamforming-for-speech-enhancement [Code]
  • deepBeam [Code]
  • NNMASK [[Code]](https://github.com/ZitengWang/nnmask)

  • Cone-of-Silence [Code]

  • binauralLocalization [Code]
  • robotauditionexamples:Some Robot Audition simplified examples (sound source localization and separation), coded in Octave/Matlab [[Code]](https://github.com/balkce/robotauditionexamples)
  • WSCM-MUSIC [Code]
  • doa-tools [Code]
  • Regression and Classification for Direction-of-Arrival Estimation with Convolutional Recurrent Neural Networks [Code] [PDF]
  • messl:Model-based EM Source Separation and Localization [Code]
  • messlJsalt15:MESSL wrappers etc for JSALT 2015, including CHiME3 [Code]
  • fastsoundsourcelocalizationusingTLSSC:Fast Sound Source Localization Using Two-Level Search Space Clustering [[Code]](https://github.com/LeeTaewoo/fastsoundsourcelocalizationusingTLSSC)
  • Binaural-Auditory-Localization-System [Code]
  • BinauralLocalization:ITD-based localization of sound sources in complex acoustic environments [[Code]](https://github.com/Hardcorehobel/BinauralLocalization)
  • DualChannelBeamformerandPostfilter [Code]
  • Microphone sound source localization [Code]
  • RTF-based-LCMV-GSC [Code]
  • DOA [Code]

Sound Event Detection

  • sedeval - Evaluation toolbox for Sound Event Detection [[Code]](https://github.com/TUT-ARG/sedeval)
  • Benchmark for sound event localization task of DCASE 2019 challenge [Code]
  • sed-crnn DCASE 2017 real-life sound event detection winning method. [Code]
  • seld-net [Code]

Tools

  • APS:A workspace for single/multi-channel speech recognition & enhancement & separation. [Code]
  • AKtools:the open software toolbox for signal acquisition, processing, and inspection in acoustics [SVN Code](username: aktools; password: ak)
  • espnet [Code]
  • asteroid:The PyTorch-based audio source separation toolkit for researchers[PDF][Code]
  • pytorchcomplex [[Code]](https://github.com/kamo-naoyuki/pytorchcomplex)
  • ONSSEN: An Open-source Speech Separation and Enhancement Library [Code]
  • separationdatapreparation[Code]
  • MatlabToolbox [Code]
  • athena-signal [[Code]](https://github.com/athena-team/athena-signal)
  • pythonspeechfeatures [Code]
  • speechFeatures: some basic features used in speech processing and sound source localization [Code]
  • sap-voicebox [Code]
  • Calculate-SNR-SDR [Code]
  • RIR-Generator [Code]
  • Python library for Room Impulse Response (RIR) simulation with GPU acceleration [Code]
  • ROOMSIM:binaural image source simulation [Code]
  • binaural-image-source-model [Code]
  • PESQ [Code]
  • SETK: Speech Enhancement Tools integrated with Kaldi [Code]
  • pbchime5:Speech enhancement system for the CHiME-5 dinner party scenario [[Code]](https://github.com/fgnt/pbchime5)

Resources

  • Speech Signal Processing Course(ZH) [Link]
  • Speech Algorithms(ZH) [Link]
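  • CCF Technical Committee on Speech Dialogue and Auditory Processing: Frontier Seminar on Speech Dialogue and Auditory Processing (ZH) [Link]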
