Zero-Shot Automatic Pronunciation Assessment

Hongfu Liu; Mingqian Shi; Ye Wang

doi:10.48550/arxiv.2305.19563

Preprint

05/31/2023

Abstract

Automatic Pronunciation Assessment (APA) is vital for computer-assisted language learning. Prior methods rely on annotated speech-text data to train Automatic Speech Recognition (ASR) models or on speech-score data to train regression models. In this work, we propose a novel zero-shot APA method based on the pre-trained acoustic model HuBERT. Our method encodes the speech input and corrupts it via a masking module. We then apply the Transformer encoder and k-means clustering to obtain token sequences. Finally, a scoring module measures the number of wrongly recovered tokens. Experimental results on speechocean762 demonstrate that the proposed method achieves performance comparable to supervised regression baselines and outperforms non-regression baselines in terms of Pearson Correlation Coefficient (PCC). Additionally, we analyze how masking strategies affect APA performance.
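
The mask-and-recover idea in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: HuBERT's convolutional encoder and Transformer are replaced by a hypothetical `toy_context_encoder` that fills each masked frame from its nearest visible neighbours, masking is assumed to zero out frames, the centroids are hand-picked rather than learned by k-means, and all function names and toy data are invented for illustration. The `pearson` helper shows how PCC between predicted and human scores could be computed.

```python
import math

def nearest_token(frame, centroids):
    """Index of the closest centroid (the assignment step of k-means)."""
    return min(range(len(centroids)), key=lambda k: math.dist(frame, centroids[k]))

def tokenize(frames, centroids):
    """Map every frame to a discrete token."""
    return [nearest_token(f, centroids) for f in frames]

def mask_frames(frames, mask_idx):
    """Assumed masking module: zero out the selected frames."""
    dim = len(frames[0])
    return [[0.0] * dim if i in mask_idx else list(f) for i, f in enumerate(frames)]

def toy_context_encoder(frames, mask_idx):
    """Placeholder for the Transformer encoder (NOT HuBERT): each masked
    frame becomes the mean of its nearest visible left/right neighbours."""
    out = [list(f) for f in frames]
    visible = [i for i in range(len(frames)) if i not in mask_idx]
    for i in mask_idx:
        left = max((j for j in visible if j < i), default=None)
        right = min((j for j in visible if j > i), default=None)
        neighbours = [frames[j] for j in (left, right) if j is not None]
        if neighbours:
            out[i] = [sum(v) / len(neighbours) for v in zip(*neighbours)]
    return out

def mismatch_score(frames, centroids, mask_idx):
    """Count masked positions whose recovered token differs from the
    token of the uncorrupted input (more mismatches = worse score)."""
    reference = tokenize(frames, centroids)
    corrupted = mask_frames(frames, mask_idx)
    recovered = tokenize(toy_context_encoder(corrupted, mask_idx), centroids)
    return sum(reference[i] != recovered[i] for i in mask_idx)

def pearson(x, y):
    """Pearson Correlation Coefficient between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Toy data: two hand-picked centroids and two 6-frame "utterances".
centroids = [[1.0, 0.0], [0.0, 1.0]]
regular = [[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]]    # predictable from context
irregular = [[1, 0], [0, 1], [1, 0], [0, 1], [1, 0], [0, 1]]  # not predictable here
mask_idx = {1, 4}

regular_score = mismatch_score(regular, centroids, mask_idx)
irregular_score = mismatch_score(irregular, centroids, mask_idx)
print(regular_score, irregular_score)
```

In this toy setup the context-predictable sequence is recovered without errors, while the irregular one is mis-recovered at both masked positions. Since a higher mismatch count indicates worse pronunciation, raw counts would be expected to correlate negatively with human scores unless they are first mapped onto the score scale.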

Details

Title: Zero-Shot Automatic Pronunciation Assessment
Creators: Hongfu Liu; Mingqian Shi; Ye Wang
Identifiers: 9924262776001921
Academic Unit: Michtom School of Computer Science; Benjamin and Mae Volen National Center for Complex Systems
Language: English
Resource Type: Preprint
