OSW

SIGNATURE WORK
CONFERENCE & EXHIBITION 2022

Feature Engineering in Biomedical and Health Data Analysis

Name

Yiyang Sun

Major

Data Science

Class

2022

About

I'm Yiyang Sun, an undergraduate student majoring in Data Science. My research interests focus mainly on biomedical informatics and explainable AI.

Signature Work Project Overview

Feature engineering of biological datasets is a difficult undertaking because of the real-time context of the data and the complexity within and between individuals. This paper focuses on two aspects of feature engineering: the missing value problem in feature transformation, and feature selection techniques. Although biostatisticians and machine learning researchers have developed a range of ways to impute missing values, thorough benchmarks that compare classical and current imputation algorithms under fair and realistic settings are lacking. From both theoretical and experimental perspectives, we analyze the effectiveness of recent deep learning approaches and traditional ML imputation methods under various missing value patterns in both training and test data. For feature selection, we focus on feature evaluation methods such as permutation feature importance and SHapley Additive exPlanations (SHAP), used within the four general families of feature elimination methods: filter, wrapper, embedded, and hybrid. Finally, these feature engineering methods are applied to two real-world cases: stroke risk evaluation in Shanxi Province, and distinguishing among Diabetic Kidney Disease (DKD) patients, non-DKD (NDKD) patients, and MIX patients (both DKD and NDKD). Both applications yielded useful findings about these biomedical datasets.
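
To make two of the ideas above concrete, the following is a minimal sketch of mean imputation followed by permutation feature importance. It is an illustration only, not the project's actual pipeline: the toy data, the 10% missing-completely-at-random masking rate, the helper names (`mean_impute`, `accuracy`, `permutation_importance`), and the fixed decision rule standing in for a trained classifier are all assumptions made for this example.

```python
import random
import statistics


def mean_impute(rows):
    """Replace None entries with the mean of the observed values in that column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(statistics.fmean(observed))
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)]
            for r in rows]


def accuracy(model, X, y):
    """Fraction of rows for which the model predicts the true label."""
    return sum(model(x) == t for x, t in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when a single column is randomly shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            col = [x[j] for x in X]
            rng.shuffle(col)  # break the link between feature j and the label
            X_perm = [x[:j] + [v] + x[j + 1:] for x, v in zip(X, col)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(statistics.fmean(drops))
    return importances


# Hypothetical toy data: the label depends only on feature 0.
rng = random.Random(42)
X_full = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
y = [int(x[0] > 0) for x in X_full]

# Assumed missingness pattern: each entry is masked independently with
# probability 0.1 (missing completely at random).
X_missing = [[None if rng.random() < 0.1 else v for v in x] for x in X_full]
X_imputed = mean_impute(X_missing)


def model(x):
    """Fixed rule standing in for a trained classifier (assumption)."""
    return int(x[0] > 0)


importances = permutation_importance(model, X_imputed, y)
print(importances)
```

Because the stand-in model ignores feature 1, shuffling that column cannot change its predictions, so its importance is zero, while shuffling feature 0 lowers accuracy noticeably. In practice the fixed rule would be replaced by a fitted model, and mean imputation by whichever imputation method is being benchmarked.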

Signature Work Presentation Video