
SIGNATURE WORK
CONFERENCE & EXHIBITION 2022

Deep Learning-Based Gaze Estimation

Name

Xinmeng Chen

Major

Data Science

Class

2022

About

Class of 2022, Major in Data Science

Signature Work Project Overview

Gaze estimation aims to determine the direction or position at which a person is steadily looking, based on visual input (e.g., eye images or video). Gaze is an important cue to human intention, and gaze estimation has a wide range of applications in areas such as healthcare, human-computer interaction, and commerce. Conventional gaze estimation methods commonly rely on geometric features of the eyes and require dedicated equipment and accurate landmark detection. The development of deep learning has inspired deep learning-based gaze estimation methods, which extract high-level features and use highly non-linear mapping functions. The convolutional neural network (CNN) is the most commonly used architecture in existing studies. With the growing popularity of transformers in computer vision, a recent study showed that integrating a Vision Transformer (ViT) into gaze estimation can achieve better results. Early this year, a new study found that modifying a CNN model toward a transformer design could surpass transformers’ performance on the image classification task. In this report, we explored whether a CNN model built on the design ideas of transformers performs better on the gaze estimation task. We tested CNN, transformer, and modified CNN architectures on various gaze estimation datasets to compare their performance.
