| Voice conversion is an active area of research with applications in speech recognition, natural language processing, and entertainment. In this project, we aimed to improve the performance of the VAE-GAN model for voice conversion by revising its architecture along the lines of CycleGAN-VC2 and other popular models. Our goal was to achieve better results in inter-gender voice conversion while maintaining the naturalness and similarity of the converted speech. Overall, our findings suggest that the improved VAE-GAN model, with a 2-1-2 CNN layer and other architectural changes in the decoder and discriminator, could be a promising approach to inter-gender voice conversion. However, further research is needed to improve the naturalness and similarity of the converted speech and to identify strategies for improving performance across different accents. By continuing to explore new architectures and training techniques, we hope to advance the field of voice conversion and contribute to the development of more natural and human-like speech technologies. |