Exploring teacher forcing techniques for sequence-to-sequence abstractive headline summarization

Albert, Corbin (2017) Exploring teacher forcing techniques for sequence-to-sequence abstractive headline summarization. [Dissertation (University of Nottingham only)]


Every internet user today is exposed to countless article headlines. These range from informative, to sensationalist, to downright misleading. Such snippets of information can have a tremendous impact on those exposed to them and can shape one's views on a subject before the associated article is even read. For these reasons and more, it is important that the Natural Language Processing community turn its attention towards this critical part of everyday life by improving current abstractive text summarization techniques. To aid in that endeavor, this project explores various methods of teacher forcing, a technique used during model training for sequence-to-sequence recurrent neural network architectures.

A relatively new deep learning library called PyTorch has made experimentation with teacher forcing accessible for the first time and is used for this purpose in the project. Additionally, to the author's best knowledge, this is the first implementation of abstractive headline summarization in PyTorch. Seven different teacher forcing techniques were designed and tested: (1) five constant levels of 0%, 25%, 50%, 75%, and 100% teacher forcing probability held through the entire training cycle; and (2) two graduated techniques: one that decreased linearly from 100% to 0% over the entire training cycle to convergence, and another that graduated from 100% to 0% every 12.5% of the training cycle, often corresponding with learning rate annealing. Dozens of generative sequence-to-sequence models were trained with these techniques to observe their differences.
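The seven techniques described above can be viewed as schedules that map training progress to a teacher forcing probability. The following is a minimal sketch in plain Python, not the dissertation's code. The function names, the `progress` argument (the fraction of training completed, from 0.0 to 1.0), and the reading of the second graduated technique as a decay that restarts in each of eight equal segments are all assumptions made for illustration. Whether the original work drew the random decision once per sequence or once per decoder step is not stated in the abstract.

```python
import random


def constant_schedule(p):
    """Return a schedule that yields the same probability p at every point in training."""
    def schedule(progress):
        return p
    return schedule


def linear_decay(progress):
    """Decrease linearly from 1.0 at the start of training to 0.0 at the end."""
    return max(0.0, 1.0 - progress)


def cyclic_decay(progress, cycles=8):
    """Decay from 1.0 towards 0.0 within each of `cycles` equal segments.

    With cycles=8, each segment covers 12.5% of training (an assumed
    interpretation of the second graduated technique).
    """
    if progress >= 1.0:
        return 0.0
    position_in_cycle = (progress * cycles) % 1.0
    return 1.0 - position_in_cycle


def use_teacher_forcing(schedule, progress, rng):
    """Randomly decide whether to feed the ground-truth token to the decoder."""
    return rng.random() < schedule(progress)


# The seven schedules: five constant levels plus two graduated ones.
SCHEDULES = {
    "constant_0": constant_schedule(0.0),
    "constant_25": constant_schedule(0.25),
    "constant_50": constant_schedule(0.5),
    "constant_75": constant_schedule(0.75),
    "constant_100": constant_schedule(1.0),
    "linear": linear_decay,
    "cyclic": cyclic_decay,
}
```

In a training loop, a call such as `use_teacher_forcing(SCHEDULES["linear"], step / total_steps, random.Random(0))` would select between feeding the reference headline token and feeding the model's own previous prediction as the next decoder input.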

These seven teacher forcing techniques were compared to one another via two metrics: (1) ROUGE F-scores, the most common metric used in this field; and (2) average loss over time. Contrary to expectations, this project shows with statistical significance that constant 100% and 75% teacher forcing produced better ROUGE scores than any other technique.
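For readers unfamiliar with the metric, a ROUGE-N F-score combines n-gram precision and recall between a generated headline and a reference headline. The sketch below is a simplified illustration only: it assumes lowercased whitespace tokenization and a single reference, and it is not the official ROUGE toolkit. The abstract does not state which ROUGE variants or tooling the dissertation used, and the function name `rouge_n_f1` is hypothetical.

```python
from collections import Counter


def rouge_n_f1(candidate, reference, n=1):
    """Simplified ROUGE-N F1 between a candidate and a single reference string."""
    def ngrams(text):
        tokens = text.lower().split()
        # Count every contiguous run of n tokens.
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand_counts = ngrams(candidate)
    ref_counts = ngrams(reference)

    # Clipped overlap: each n-gram counts at most as often as it appears in both.
    overlap = sum((cand_counts & ref_counts).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)
```

For example, scoring the candidate "police arrest man" against the reference "police arrest suspect in city" gives a unigram precision of 2/3 and recall of 2/5, so the F-score is 0.5.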

These results support the use of 100% teacher forcing, the most widely used technique today. However, they call into question an important assumption held by many leading machine learning researchers: that dynamic, graduated teacher forcing techniques should result in greater model performance. Questions of ROUGE metric validity, response to more complicated model parameters, and domain specificity are encouraged for further analysis.

Item Type: Dissertation (University of Nottingham only)
Depositing User: Gonzalez-Orbegoso, Mrs Carolina
Date Deposited: 05 Jan 2018 12:18
Last Modified: 09 Jan 2018 14:16
URI: http://eprints.nottingham.ac.uk/id/eprint/48564
