Personalized Keyword Spotting

Name

Yechen Wang

Major

Data Science

Class

2022

About

I'm Yechen Wang, a senior data science student at DKU.

Signature Work Project Overview

Keyword spotting (KWS) is the task of detecting whether a predefined word or phrase occurs in continuous speech. It is commonly used as the primary technique for low-resource trigger systems and speech-based document analysis. More recently, KWS has become part of daily life, for example as the wake-up word detection module for speech assistants on mobile phones, vehicles, and smart speakers. These assistants are triggered by predefined keywords such as “Hey, Cortana,” “Alexa,” and “Hey, Siri,” spoken by the owner. Such applications call for a customized KWS system that detects the keyword and identifies the target speaker’s voice at the same time, and recent research has paid increasing attention to KWS systems that respond to a particular speaker.

To this end, we propose a two-stage personalized keyword spotting system based on query-by-example spoken term detection and speaker verification, with a different detection algorithm in each stage. The first stage applies subsequence dynamic time warping for template matching on frame-level, language-independent bottleneck features and phoneme posterior probabilities. The second stage uses a sliding-window template matching algorithm based on acoustic word embeddings to verify the first-stage detections at the sequence level. Our KWS system achieves an average score of 0.61 on the feedback dataset, measured as a weighted sum of miss rate and false alarm rate, which outperforms the baseline system by 0.25.
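To make the first-stage matching concrete, here is a minimal sketch of subsequence dynamic time warping in Python. It is an illustration under stated assumptions, not the project's implementation: the function names are hypothetical, cosine distance is an assumed frame-level metric, and random vectors stand in for the bottleneck or posterior features.

```python
import numpy as np


def cosine_distance_matrix(query, utterance):
    """Pairwise cosine distance between query frames and utterance frames.

    Cosine distance is an assumption made for this sketch; the project
    may use a different frame-level metric.
    """
    eps = 1e-12  # guards against division by zero for all-zero frames
    q = query / (np.linalg.norm(query, axis=1, keepdims=True) + eps)
    u = utterance / (np.linalg.norm(utterance, axis=1, keepdims=True) + eps)
    # Clamp tiny negative values caused by floating-point rounding.
    return np.maximum(1.0 - q @ u.T, 0.0)


def subsequence_dtw(query, utterance):
    """Find the utterance segment that best matches the whole query.

    Returns (normalized_cost, start_frame, end_frame), frames inclusive.
    Unlike standard DTW, the match may start and end anywhere in the
    utterance: the first row carries no accumulated cost from the left,
    and the end point is the minimum of the last row.
    """
    dist = cosine_distance_matrix(query, utterance)
    n, m = dist.shape
    acc = np.full((n, m), np.inf)
    acc[0, :] = dist[0, :]  # free start: any utterance frame can begin a match
    for i in range(1, n):
        acc[i, 0] = acc[i - 1, 0] + dist[i, 0]
        for j in range(1, m):
            acc[i, j] = dist[i, j] + min(
                acc[i - 1, j - 1],  # diagonal step
                acc[i - 1, j],      # advance the query only
                acc[i, j - 1],      # advance the utterance only
            )

    end = int(np.argmin(acc[-1]))  # free end: best cell in the last row

    # Backtrack to the first query frame to recover the start of the match.
    i, j = n - 1, end
    while i > 0:
        if j == 0:
            i -= 1
            continue
        step = int(np.argmin([acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return float(acc[-1, end] / n), j, end


# Toy usage: random vectors stand in for real frame-level features.
rng = np.random.default_rng(0)
utterance = rng.normal(size=(60, 16))
query = utterance[20:30].copy()  # the "keyword" is hidden at frames 20..29
cost, start, end = subsequence_dtw(query, utterance)
```

In a detector built along these lines, a keyword candidate would be reported when the normalized cost falls below a threshold tuned on held-out data, and the candidate segment would then be handed to the second stage for verification.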

Signature Work Presentation Video