1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77
| def recognize_google(self, audio_data, key=None, language="en-US", show_all=False): """ Performs speech recognition on ``audio_data`` (an ``AudioData`` instance), using the Google Speech Recognition API.
The Google Speech Recognition API key is specified by ``key``. If not specified, it uses a generic key that works out of the box. This should generally be used for personal or testing purposes only, as it **may be revoked by Google at any time**.
To obtain your own API key, simply following the steps on the `API Keys <http://www.chromium.org/developers/how-tos/api-keys>`__ page at the Chromium Developers site. In the Google Developers Console, Google Speech Recognition is listed as "Speech API".
The recognition language is determined by ``language``, an RFC5646 language tag like ``"en-US"`` (US English) or ``"fr-FR"`` (International French), defaulting to US English. A list of supported language tags can be found in this `StackOverflow answer <http://stackoverflow.com/a/14302134>`__.
Returns the most likely transcription if ``show_all`` is false (the default). Otherwise, returns the raw API response as a JSON dictionary.
Raises a ``speech_recognition.UnknownValueError`` exception if the speech is unintelligible. Raises a ``speech_recognition.RequestError`` exception if the speech recognition operation failed, if the key isn't valid, or if there is no internet connection. """ assert isinstance(audio_data, AudioData), "``audio_data`` must be audio data" assert key is None or isinstance(key, str), "``key`` must be ``None`` or a string" assert isinstance(language, str), "``language`` must be a string"
flac_data = audio_data.get_flac_data( convert_rate=None if audio_data.sample_rate >= 8000 else 8000, convert_width=2, ) if key is None: key = "AIzaSyBOti4mM-6x9WDnZIjIeyEU21OpBXqWBgw" url = "http://www.google.cn/speech-api/v2/recognize?{}".format( urlencode({"client": "chromium", "lang": language, "key": key,}) ) request = Request( url, data=flac_data, headers={ "Content-Type": "audio/x-flac; rate={}".format(audio_data.sample_rate) }, )
try: response = urlopen(request, timeout=self.operation_timeout) except HTTPError as e: raise RequestError("recognition request failed: {}".format(e.reason)) except URLError as e: raise RequestError("recognition connection failed: {}".format(e.reason)) response_text = response.read().decode("utf-8")
actual_result = [] for line in response_text.split("\n"): if not line: continue result = json.loads(line)["result"] if len(result) != 0: actual_result = result[0] break
if show_all: return actual_result if ( not isinstance(actual_result, dict) or len(actual_result.get("alternative", [])) == 0 ): raise UnknownValueError()
if "confidence" in actual_result["alternative"]: best_hypothesis = max( actual_result["alternative"], key=lambda alternative: alternative["confidence"], ) else: best_hypothesis = actual_result["alternative"][0] if "transcript" not in best_hypothesis: raise UnknownValueError() return best_hypothesis["transcript"]
|