Differential Privacy in Natural Language Processing: Secure Transformation of Word Embeddings

Name

Minda Zhao

Major

Data Science

Class

2024

About

Minda Zhao is a data science student.

Signature Work Project Overview

In this study, we address the growing concern of privacy in Natural Language Processing (NLP) by proposing a novel approach that integrates Differential Privacy (DP) with the Continuous Bag-of-Words (CBOW) model for learning word embeddings. As NLP applications increasingly rely on detailed text data, the risk of exposing sensitive information contained in that data has become a significant challenge. Our research aims to develop a technique that preserves the privacy of individuals in the training data while maintaining high model performance. We train the model with Differentially Private Stochastic Gradient Descent (DPSGD), which clips each example's gradient and adds calibrated noise to the updates, in order to balance privacy and accuracy. Our experimental results demonstrate that this approach achieves competitive accuracy while providing robust privacy guarantees. The findings suggest that integrating DP into word embedding models is not only feasible but also effective at protecting data privacy without compromising model utility. This work contributes to the field of privacy-preserving machine learning and opens up new possibilities for secure and trustworthy NLP applications.
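The overview does not describe the implementation, so what follows is only a minimal sketch, not the project's code, of how DPSGD can be applied to CBOW training. It assumes a full-softmax CBOW (no negative sampling), plain Python lists instead of a tensor library, and illustrative hyperparameters; the helper names (`cbow_examples`, `cbow_grads`, `clip`, `dpsgd_step`) are hypothetical. Each update follows the standard DP-SGD recipe: compute every example's gradient, clip it to L2 norm `max_norm`, sum the clipped gradients, add Gaussian noise with standard deviation `noise_multiplier * max_norm`, and average over the batch.

```python
# Hypothetical sketch: full-softmax CBOW trained with DP-SGD, pure standard library.
import math
import random


def zeros(rows, cols):
    """Return a rows x cols matrix of zeros as nested lists."""
    return [[0.0] * cols for _ in range(rows)]


def cbow_examples(tokens, window):
    """Build (context_ids, target_id) pairs from a sequence of token ids."""
    examples = []
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[k] for k in range(lo, hi) if k != i]
        if context:
            examples.append((context, target))
    return examples


def cbow_grads(w_in, w_out, context, target):
    """Full-softmax CBOW loss and gradients for a single (context, target) example."""
    dim, n_ctx = len(w_in[0]), len(context)
    # Hidden vector h: mean of the context words' input embeddings.
    h = [sum(w_in[c][j] for c in context) / n_ctx for j in range(dim)]
    scores = [sum(row[j] * h[j] for j in range(dim)) for row in w_out]
    top = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    loss = -math.log(probs[target])
    g_in, g_out, g_h = zeros(len(w_in), dim), zeros(len(w_out), dim), [0.0] * dim
    for v, p in enumerate(probs):
        err = p - (1.0 if v == target else 0.0)  # dLoss/dscore_v
        for j in range(dim):
            g_out[v][j] = err * h[j]
            g_h[j] += err * w_out[v][j]
    for c in context:  # each context occurrence receives 1/n_ctx of dLoss/dh
        for j in range(dim):
            g_in[c][j] += g_h[j] / n_ctx
    return loss, g_in, g_out


def clip(grads, max_norm):
    """Scale a list of gradient matrices so their joint L2 norm is at most max_norm."""
    norm = math.sqrt(sum(x * x for g in grads for row in g for x in row))
    scale = min(1.0, max_norm / (norm + 1e-12))
    return [[[x * scale for x in row] for row in g] for g in grads]


def dpsgd_step(w_in, w_out, batch, lr, max_norm, noise_multiplier, rng):
    """One DP-SGD update: clip each example's gradient, sum, add Gaussian noise, average."""
    dim = len(w_in[0])
    sum_in, sum_out = zeros(len(w_in), dim), zeros(len(w_out), dim)
    for context, target in batch:
        _, g_in, g_out = cbow_grads(w_in, w_out, context, target)
        g_in, g_out = clip([g_in, g_out], max_norm)
        for acc, g in ((sum_in, g_in), (sum_out, g_out)):
            for r, row in enumerate(g):
                for j in range(dim):
                    acc[r][j] += row[j]
    sigma = noise_multiplier * max_norm  # noise std is proportional to the clip bound
    for param, acc in ((w_in, sum_in), (w_out, sum_out)):
        # Noise is added to every coordinate, including embedding rows that no
        # example in the batch touched, so the update is dense.
        for r in range(len(param)):
            for j in range(dim):
                param[r][j] -= lr * (acc[r][j] + rng.gauss(0.0, sigma)) / len(batch)


if __name__ == "__main__":
    rng = random.Random(0)
    tokens = [0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5]  # toy corpus of token ids
    vocab_size, dim = 6, 8
    w_in = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(vocab_size)]
    w_out = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(vocab_size)]
    examples = cbow_examples(tokens, window=2)
    for epoch in range(30):
        rng.shuffle(examples)
        for i in range(0, len(examples), 4):
            dpsgd_step(w_in, w_out, examples[i:i + 4], lr=0.5,
                       max_norm=1.0, noise_multiplier=1.1, rng=rng)
    mean_loss = sum(cbow_grads(w_in, w_out, c, t)[0] for c, t in examples) / len(examples)
    print(f"mean CBOW loss after DP-SGD: {mean_loss:.3f}")
```

Two simplifications matter in practice. The sketch does not compute the privacy budget (epsilon, delta), which requires a privacy accountant such as the moments accountant or a Renyi DP accountant, and it uses shuffled fixed-size batches, whereas the standard DP-SGD analysis assumes random (for example Poisson) subsampling. Because the noise is dense, each step updates the full embedding matrices even when a batch uses only a few words.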

SIGNATURE WORK CONFERENCE & EXHIBITION 2024