The Siri Team Talks About Personalized ‘Hey Siri’ in New Machine Learning Journal Entry


Back in October, the Siri Team published an entry in Apple’s machine learning journal focused on the “Hey Siri” feature.

Now, the team is back with a new entry in the journal, and this one goes into more detail on the personalization of the “Hey Siri” feature. As was touched on in the previous entry, Apple notes that it went with the “Hey Siri” trigger phrase because it was so natural that many iOS users were already saying it before Apple even introduced the feature. When they launched Siri with the Home button in older versions of iOS, they would often start their requests with “Hey Siri”.

“The phrase “Hey Siri” was originally chosen to be as natural as possible; in fact, it was so natural that even before this feature was introduced, users would invoke Siri using the home button and inadvertently prepend their requests with the words, “Hey Siri.” Its brevity and ease of articulation, however, bring to bear additional challenges.”

The full entry is worth a read if you like the technical side of things. As is par for the course with these machine learning journal entries, it digs into the key details rather than looking at the feature in broad strokes.

That said, the entry also looks ahead at the challenges the team plans to tackle next. That includes making Siri work in a noisy, crowded room, or in a large, reverberant one, with the same accuracy that Siri users have come to expect.

“Although the average speaker recognition performance has improved significantly, anecdotal evidence suggests that the performance in reverberant (large room) and noisy (car, wind) environments still remain more challenging. One of our current research efforts is focused on understanding and quantifying the degradation in these difficult conditions in which the environment of an incoming test utterance is a severe mismatch from the existing utterances in a user’s speaker profile. In our subsequent work [2], we demonstrate success in the form of multi-style training, in which a subset of the training data is augmented with different types of noise and reverberation.”
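The “multi-style training” the team mentions is, at its core, a data-augmentation trick: corrupt a portion of the clean training utterances with noise and reverberation so the speaker model also sees the “hard” conditions it will face at test time. Apple doesn’t publish its pipeline, but a minimal sketch of the idea in Python/NumPy might look like this (the SNR level, reverberation time, and function names are illustrative assumptions, not Apple’s implementation):

```python
import numpy as np

def add_noise(clean, snr_db, rng):
    """Mix white noise into a waveform at the target signal-to-noise ratio (dB)."""
    signal_power = np.mean(clean ** 2)
    noise = rng.standard_normal(clean.shape)
    # Scale noise so that 10*log10(signal_power / noise_power) equals snr_db.
    noise *= np.sqrt(signal_power / (np.mean(noise ** 2) * 10 ** (snr_db / 10)))
    return clean + noise

def add_reverb(clean, rng, rt60_s=0.4, sample_rate=16000):
    """Convolve with a crude synthetic room impulse response (exponential decay)."""
    ir_len = int(rt60_s * sample_rate)
    decay = np.exp(-6.9 * np.arange(ir_len) / ir_len)  # ~60 dB of decay over rt60_s
    impulse_response = rng.standard_normal(ir_len) * decay
    impulse_response /= np.abs(impulse_response).max()
    return np.convolve(clean, impulse_response)[: clean.shape[0]]

def multi_style_augment(utterances, fraction=0.5, seed=0):
    """Corrupt a random subset of training utterances; leave the rest clean."""
    rng = np.random.default_rng(seed)
    out = []
    for waveform in utterances:
        if rng.random() < fraction:
            if rng.random() < 0.5:
                waveform = add_noise(waveform, snr_db=10, rng=rng)
            else:
                waveform = add_reverb(waveform, rng=rng)
        out.append(waveform)
    return out
```

In a real system, the noise would more likely come from recorded car and wind samples and the reverberation from measured room impulse responses, rather than the synthetic stand-ins above.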

You can check out the full journal entry below.

[via Apple]
