by supasorn

This is research code for "Synthesizing Obama: Learning Lip Sync from Audio".
Supasorn Suwajanakorn, Steven M. Seitz, Ira Kemelmacher-Shlizerman

Code tested using TensorFlow 0.11.0. Please see Supasorn's website for the overview.

To generate MFCC features, first normalize the input audio. Then use Sphinx III's snippet by David Huggins-Daines, with a modified routine that saves log energy and timestamps:

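    # Module-level imports needed by this method.
    import math
    import numpy

    def sig2s2mfc_energy(self, sig, dn):
        # `dn` is not used in this routine.
        # Number of frames, given the frame shift in samples.
        nfr = int(len(sig) / self.fshift + 1)

        # Two extra columns per frame: log energy and timestamp.
        mfcc = numpy.zeros((nfr, self.ncep + 2), 'd')
        fr = 0
        while fr < nfr:
            start = int(round(fr * self.fshift))
            end = min(len(sig), start + self.wlen)
            frame = sig[start:end]
            if len(frame) < self.wlen:
                # numpy.resize fills the extra samples with repeated copies
                # of the frame, so the slice assignment below is a no-op.
                frame = numpy.resize(frame, self.wlen)
                frame[self.wlen:] = 0
            mfcc[fr, :-2] = self.frame2s2mfc(frame)
            # Log energy of the frame.
            mfcc[fr, -2] = math.log(1 + numpy.mean(numpy.power(frame.astype(float), 2)))
            # Timestamp of the frame midpoint, in seconds.
            mid = 0.5 * (start + end - 1)
            mfcc[fr, -1] = mid / self.samprate
            fr = fr + 1

        return mfcc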
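For readers who want to see what the two extra columns contain without the Sphinx class, here is a minimal standalone sketch. The function name `frame_energy_and_time` and its parameters are hypothetical and not part of this repository. Unlike the routine above, it computes energy on the unpadded samples of each frame and assigns zero energy to an empty trailing frame.

```python
import math
import numpy


def frame_energy_and_time(sig, samprate, fshift, wlen):
    """Return an (nfr, 2) array of [log energy, mid-frame time in seconds].

    Hypothetical helper: it mirrors only the two extra columns of the
    modified routine and does not compute any cepstral coefficients.
    """
    sig = numpy.asarray(sig, dtype=float)
    nfr = int(len(sig) / fshift + 1)
    out = numpy.zeros((nfr, 2), 'd')
    for fr in range(nfr):
        start = int(round(fr * fshift))
        end = min(len(sig), start + wlen)
        frame = sig[start:end]
        # Assumption: an empty trailing frame has zero energy.
        energy = numpy.mean(frame ** 2) if len(frame) else 0.0
        out[fr, 0] = math.log(1 + energy)
        # Midpoint of the frame in samples, converted to seconds.
        out[fr, 1] = 0.5 * (start + end - 1) / samprate
    return out


# Example call on a constant signal; the values are illustrative only.
features = frame_energy_and_time(numpy.ones(320), samprate=16000,
                                 fshift=160.0, wlen=160)
```

The second column gives each frame a timestamp, which is what lets audio frames be matched to other time-indexed data later on.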
