Back in July, Apple launched a new journal to detail its efforts in machine learning and other areas, letting researchers talk about their work out in the open.
Today there is a new entry, this one about the “Hey, Siri” feature, which lets owners of a variety of Apple devices summon the digital personal assistant by voice. Written by the Siri team, the entry details how software, hardware, and a variety of services come together to make the feature work.
A speech recognizer built into the motion coprocessor runs all the time, so the hardware is always listening for the “Hey, Siri” command. As soon as the recognizer identifies those two words, whatever is said next is treated as a query. The paper describes the first steps of that process:
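The control flow described above can be sketched in a few lines of Python. This is a toy under stated assumptions: `capture_query` and `is_trigger` are hypothetical names, the detector is any callable that inspects a rolling window of recent frames, and words stand in for audio frames in the usage example. It does not model the motion coprocessor or how audio is handed off to the rest of the system.

```python
from collections import deque
from typing import Callable, Iterable, List, Sequence, TypeVar

Frame = TypeVar("Frame")


def capture_query(frames: Iterable[Frame],
                  is_trigger: Callable[[Sequence[Frame]], bool],
                  window: int = 20) -> List[Frame]:
    """Hypothetical sketch: an always-on detector scans a rolling window of
    recent frames; once it fires, every later frame is collected as the query."""
    recent: deque = deque(maxlen=window)  # only the newest `window` frames are kept
    query: List[Frame] = []
    triggered = False
    for frame in frames:
        if triggered:
            query.append(frame)           # everything after the trigger phrase
            continue
        recent.append(frame)
        if is_trigger(list(recent)):      # detector decides on the recent window
            triggered = True
    return query


if __name__ == "__main__":
    # Toy usage: words stand in for audio frames.
    words = ["noise", "hey", "siri", "what", "time"]
    print(capture_query(words, lambda w: w[-2:] == ["hey", "siri"]))  # ['what', 'time']
```

Everything after the trigger is passed along untouched, which matches the article's description; a real system would also have to decide when the query ends.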
“The microphone in an iPhone or Apple Watch turns your voice into a stream of instantaneous waveform samples, at a rate of 16000 per second. A spectrum analysis stage converts the waveform sample stream to a sequence of frames, each describing the sound spectrum of approximately 0.01 sec. About twenty of these frames at a time (0.2 sec of audio) are fed to the acoustic model, a Deep Neural Network (DNN) which converts each of these acoustic patterns into a probability distribution over a set of speech sound classes: those used in the ‘Hey Siri’ phrase, plus silence and other speech, for a total of about 20 sound classes.”
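The numbers in the quote fit together simply: at 16,000 samples per second, a 0.01 s frame is 160 samples, and 20 frames (0.2 s) amount to 3,200 samples. The Python sketch below shows that arithmetic plus the framing step. It assumes non-overlapping frames and a plain DFT magnitude spectrum, neither of which the quote specifies, and it leaves out the DNN itself; the `softmax` helper only illustrates how a network's raw outputs are usually turned into a probability distribution. All function names are hypothetical.

```python
import cmath
import math
from typing import List

# Figures from the quoted passage.
SAMPLE_RATE = 16_000      # waveform samples per second
FRAME_SEC = 0.01          # each spectral frame describes ~0.01 s of sound
CONTEXT_FRAMES = 20       # ~20 frames (0.2 s) are fed to the acoustic model at once

FRAME_LEN = round(SAMPLE_RATE * FRAME_SEC)    # 160 samples per frame
CONTEXT_SAMPLES = FRAME_LEN * CONTEXT_FRAMES  # 3200 samples = 0.2 s


def spectral_frames(samples: List[float]) -> List[List[float]]:
    """Split the waveform into non-overlapping 160-sample frames (an assumption)
    and return each frame's DFT magnitude spectrum (bins 0..FRAME_LEN/2)."""
    frames = []
    for start in range(0, len(samples) - FRAME_LEN + 1, FRAME_LEN):
        chunk = samples[start:start + FRAME_LEN]
        spectrum = [
            abs(sum(x * cmath.exp(-2j * math.pi * k * n / FRAME_LEN)
                    for n, x in enumerate(chunk)))
            for k in range(FRAME_LEN // 2 + 1)
        ]
        frames.append(spectrum)
    return frames


def context_windows(frames: List[List[float]]) -> List[List[List[float]]]:
    """Every run of CONTEXT_FRAMES consecutive frames: one acoustic-model input each."""
    return [frames[i:i + CONTEXT_FRAMES]
            for i in range(len(frames) - CONTEXT_FRAMES + 1)]


def softmax(scores: List[float]) -> List[float]:
    """Turn raw network outputs into a probability distribution over sound classes."""
    peak = max(scores)                        # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # 0.25 s of a 1 kHz test tone.
    tone = [math.sin(2 * math.pi * 1000 * n / SAMPLE_RATE) for n in range(4000)]
    frames = spectral_frames(tone)
    print(len(frames), len(frames[0]), len(context_windows(frames)))  # 25 81 6
```

Non-overlapping frames keep the arithmetic easy to follow; speech front ends commonly use overlapping analysis windows instead, so treat the framing here as illustrative only.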
The DNN's output for each frame then feeds a “temporal integration process” that computes a “confidence score,” and that score determines whether the phrase “Hey, Siri” was actually spoken. Once that determination is made and the query is spoken aloud, Siri goes to work behind the scenes to answer the question or provide the requested information.
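The article doesn't spell out how the temporal integration works. One standard way to turn a sequence of per-frame class probabilities into a single phrase score, sketched below as an assumption and not necessarily Apple's exact formulation, is a left-to-right dynamic program in the style of Viterbi decoding over a hidden Markov model: the phrase is a chain of sound states, each frame either stays in the current state or advances to the next, and the score is the best such path's summed log-probability. `phrase_score`, the `stay`/`move` weights, and the threshold are hypothetical.

```python
import math
from typing import Sequence


def phrase_score(log_probs: Sequence[Sequence[float]],
                 stay: Sequence[float],
                 move: Sequence[float]) -> float:
    """Best left-to-right path score through the phrase's states.

    log_probs[t][i]: acoustic-model log-probability of phrase state i at frame t
    stay[i]:         log-domain weight for remaining in state i
    move[i]:         log-domain weight for advancing from state i to i + 1
    """
    n_states = len(stay)
    # F[i] holds the best score of any path that is in state i at the current frame.
    F = [-math.inf] * n_states
    F[0] = log_probs[0][0]                  # every path starts in the first state
    for t in range(1, len(log_probs)):
        new_F = [-math.inf] * n_states
        for i in range(n_states):
            best = stay[i] + F[i]           # stayed in state i
            if i > 0:
                best = max(best, move[i - 1] + F[i - 1])  # advanced from state i - 1
            new_F[i] = best + log_probs[t][i]
        F = new_F
    return F[-1]                            # path must end in the final state


if __name__ == "__main__":
    # Two toy states ("hey", "siri") over three frames.
    frames = [[math.log(0.9), math.log(0.1)],
              [math.log(0.8), math.log(0.2)],
              [math.log(0.1), math.log(0.9)]]
    score = phrase_score(frames, stay=[0.0, 0.0], move=[0.0])
    threshold = math.log(0.5)               # arbitrary illustrative threshold
    print(score > threshold)                # True: best path is 0.9 * 0.8 * 0.9 = 0.648
```

A real detector would also have to locate where the phrase starts within a continuous audio stream and calibrate the threshold against false accepts and false rejects; this sketch assumes the phrase begins at the first frame it is given.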
The full paper is available through the source link below. If you’re curious about how Siri works, and especially about how its voice activation works, it’s well worth a read.
How often do you use the “Hey, Siri” feature?