:speech_balloon: Reverse Engineering Google's Speech To Text API (v2)
Google has since launched its official Google Cloud Speech API. I strongly recommend using that instead.
https://www.google.com/speech-api/v2/recognize
output: json. XML is not supported.
lang: any valid locale (en-us, nl-be, fr-fr, etc.)
key: required. Get one from the Google Developers Console.
app: optional. Specifying it returns some extra transcripts for some reason.
client: optional, seems to do nothing in particular
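As a rough sketch of how the query parameters above fit together, the helper below assembles the request URL. The function name `build_speech_url`, its argument order, and the `en-us` default are assumptions made for illustration; values are passed through as-is, without URL-encoding.

```shell
# Hypothetical helper (not part of the API): build the recognize URL.
# Usage: build_speech_url KEY [LANG] [APP]
build_speech_url() {
  local key="${1:-}" lang="${2:-en-us}" app="${3:-}"

  # The key is not optional, so refuse to build a URL without one.
  if [ -z "$key" ]; then
    echo "key is required" >&2
    return 1
  fi

  local url="https://www.google.com/speech-api/v2/recognize?output=json&lang=${lang}&key=${key}"

  # app is optional; append it only when given (value passed through unchanged).
  if [ -n "$app" ]; then
    url="${url}&app=${app}"
  fi

  printf '%s\n' "$url"
}
```

Calling `build_speech_url yourkey nl-be` prints the endpoint with `lang=nl-be`, ready to hand to curl.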
The request body is the audio itself: a FLAC file, 44100 Hz, 32-bit float, exported with Audacity. Check the audio folder in this repository for some hilarious examples.
Channels       : 2
Sample Rate    : 44100
Precision      : 32-bit
Sample Encoding: 32-bit Float
The following audio options are confirmed working for 16-bit PCM sample encoding:
Channels       : 1
Sample Rate    : 16000
Precision      : 16-bit
Sample Encoding: 16-bit Signed Integer PCM
One-line sox recording command:
rec --encoding signed-integer --bits 16 --channels 1 --rate 16000 test.wav
Content-Type:
Content-Type: audio/x-flac; rate=44100;
Set the rate to the sample rate of the FLAC file (generally 44100 Hz); other rates are supported too.
Content-Type: audio/l16; rate=16000; is also supported, with a rate of 44100 Hz or 16000 Hz, for files encoded as 16-bit signed-integer LPCM.
NOTE: Make sure the rate in your header matches the sample rate you used for your audio capture.
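One way to keep the header and the audio in sync is to derive the Content-Type from the file. The sketch below is a minimal example, assuming that `.flac` files are FLAC and `.wav` files hold 16-bit signed-integer LPCM as in the recordings above; the function name `content_type_for` is made up for illustration, and the caller has to supply the real sample rate.

```shell
# Hypothetical helper: pick a Content-Type header value from the file extension.
# Usage: content_type_for FILE RATE
# Assumes .wav means 16-bit signed-integer LPCM; the rate is NOT read from the file.
content_type_for() {
  local file="${1:-}" rate="${2:-}"

  case "$file" in
    *.flac) printf 'audio/x-flac; rate=%s;\n' "$rate" ;;
    *.wav)  printf 'audio/l16; rate=%s;\n' "$rate" ;;
    *)
      echo "unsupported audio file: $file" >&2
      return 1
      ;;
  esac
}
```

If sox is installed, `soxi -r test.wav` prints the file's sample rate, which can be passed in as RATE, for example `content_type_for test.wav "$(soxi -r test.wav)"`.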
User-Agent:
Not required, but for spoofing purposes you can use one of Chrome’s User-Agent strings.
When Google is 100% confident in its transcription, it will return the following object:
{ "result":[ { "alternative":[ { "transcript":"good morning Google how are you feeling today" } ], "final":true } ], "result_index":0 }
When it's doubtful, it adds a confidence field to the transcript. It also seems to add multiple transcripts for some reason.
{ "result":[ { "alternative":[ { "transcript":"this is a test", "confidence":0.97321892 }, { "transcript":"this is a test for" } ], "final":true } ], "result_index":0 }
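To pull the transcripts out of responses shaped like the two above without extra tooling, something like the following can work. It is a rough sketch rather than a real JSON parser: it assumes the `"transcript":"..."` pairs appear exactly as shown (no whitespace after the colon, no escaped double quotes inside the text), and the function names are made up for illustration.

```shell
# Rough sketch: print every transcript found in a response read from stdin.
# Assumes the "transcript":"..." layout shown above and no escaped quotes in the text.
extract_transcripts() {
  grep -o '"transcript":"[^"]*"' | cut -d '"' -f 4
}

# Print only the first alternative listed in the response.
first_transcript() {
  extract_transcripts | head -n 1
}
```

Piping a saved response through `first_transcript` prints the first alternative; `extract_transcripts` prints one alternative per line. For anything beyond quick experiments, a proper JSON parser is the safer choice.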
On OS X with Homebrew installed:
brew install sox
rec --encoding signed-integer --bits 16 --channels 1 --rate 16000 test.wav
curl -X POST \
  --data-binary @'audio/hello (16bit PCM).wav' \
  --header 'Content-Type: audio/l16; rate=16000;' \
  'https://www.google.com/speech-api/v2/recognize?output=json&lang=en-us&key=yourkey'
Or for FLAC encoded audio:
curl -X POST \
  --data-binary @audio/good-morning-google.flac \
  --header 'Content-Type: audio/x-flac; rate=44100;' \
  'https://www.google.com/speech-api/v2/recognize?output=json&lang=en-us&key=yourkey'
Here are a few caveats you should know about if you decide to use this API in a production environment (which I don't recommend).