Voice conversion (VC) is a technique that captures another speaker's voice and speech patterns and synthesizes audio that closely resembles that voice. Many voice changers employ such techniques to let users modify their voice characteristics, and researchers have approached the VC task with a wide variety of models. In their work, Siwei and Ehab A. introduce a many-to-many VC model that combines a WaveNet-based vocoder with a Variational Auto-Encoder (VAE) and a Generative Adversarial Network (GAN) as its core components. Their VAE-GAN model excels at producing numerous target voices, but it performs substantially worse than one-to-one VC models, especially in the authenticity of the converted speech. In this work, to address this problem, we adopt a 2-1-2D CNN architecture and a PatchGAN discriminator inspired by the CycleGAN-VC2 model, and we introduce multiple encoders to obtain better results on the many-to-many VC task. In addition, we compare the new model against the original through objective metrics (MCD/MSD) and the MOSNet predictor of perceptual quality, and analyze the areas where the model still needs improvement.
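To make the objective evaluation concrete, the following is a minimal sketch of the standard Mel Cepstral Distortion (MCD) computation between two aligned mel-cepstral coefficient (MCC) sequences. It assumes the sequences have already been time-aligned (e.g., via DTW) and that the 0th (energy) coefficient has been excluded; the function name and array shapes are illustrative, not taken from the paper's implementation.

```python
import numpy as np

def mel_cepstral_distortion(mcc_ref, mcc_syn):
    """Frame-averaged MCD in dB between two aligned MCC sequences.

    mcc_ref, mcc_syn: arrays of shape (frames, dims), already
    time-aligned, with the 0th (energy) coefficient removed.
    """
    mcc_ref = np.asarray(mcc_ref, dtype=float)
    mcc_syn = np.asarray(mcc_syn, dtype=float)
    diff = mcc_ref - mcc_syn
    # Per-frame MCD: (10 / ln 10) * sqrt(2 * sum_d (c_d - c'_d)^2)
    per_frame = (10.0 / np.log(10.0)) * np.sqrt(2.0 * np.sum(diff ** 2, axis=1))
    # Average over all frames to get a single score in dB
    return float(np.mean(per_frame))

# Identical sequences give 0 dB; larger values mean worse spectral match.
a = np.array([[0.5, 0.5], [1.0, 0.0]])
print(mel_cepstral_distortion(a, a))
```

Lower MCD indicates that the converted speech is spectrally closer to the reference, which is why it is commonly paired with listener-oriented scores such as those predicted by MOSNet.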