In this work, I replicated the BERT-based two-stage transformer sequence-to-sequence model for text generation. Although the original work focuses on text summarization, I adapted the framework to long-form question answering, where it achieves competitive performance on the ELI5 dataset relative to several state-of-the-art question-answering systems. The experiments confirm that the two-stage design is highly effective. However, the low efficiency of the refinement process in the second stage remains a drawback of this framework. Inspired by related successes in question answering, future work could explore multi-task training as well as other pre-trained models similar to BERT.
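To make the efficiency concern concrete, the following is a minimal sketch of a BERT-based per-token refinement loop, assuming the Hugging Face transformers library and the public bert-base-uncased checkpoint; the function refine_draft and the example draft string are hypothetical stand-ins for the first-stage decoder's output, not the paper's actual implementation. The point it illustrates is that refining a draft this way costs one full BERT forward pass per token, which is why the second stage scales poorly with answer length.

```python
# Sketch (assumed setup, not the original code): refine a first-stage draft by
# masking and re-predicting one token at a time with a BERT masked-LM head.
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def refine_draft(draft: str) -> str:
    """Re-predict each draft token with BERT, one masked position at a time."""
    ids = tokenizer(draft, return_tensors="pt")["input_ids"]  # shape (1, L)
    refined = ids.clone()
    # Skip the special tokens: [CLS] at position 0 and [SEP] at the end.
    for pos in range(1, ids.size(1) - 1):
        masked = refined.clone()
        masked[0, pos] = tokenizer.mask_token_id
        with torch.no_grad():
            # One full BERT forward pass per draft token: the bottleneck.
            logits = model(input_ids=masked).logits
        refined[0, pos] = logits[0, pos].argmax(-1)
    return tokenizer.decode(refined[0, 1:-1])

# Hypothetical draft answer standing in for first-stage decoder output.
print(refine_draft("the eiffel tower is located in paris france"))
```

Long-form answers on ELI5 routinely run to hundreds of tokens, so under this sequential scheme the refinement cost grows linearly with answer length, consistent with the drawback noted above.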