Mathematical Aspects of Word Embeddings

Carrington, Rachel (2021) Mathematical Aspects of Word Embeddings. PhD thesis, University of Nottingham.

PDF (Thesis - as examined)
Available under Licence Creative Commons Attribution.

Abstract

Word embeddings are a popular way of modelling relationships between words. Words are represented as low-dimensional vectors, such that the distances between the vectors reflect relationships between the words: words which are more similar to each other should be closer together in the embedding space.
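As a minimal illustration of the idea that similar words sit closer together, the sketch below compares distances between three toy vectors. The words, the 3-dimensional vectors, and the helper names `euclidean_distance` and `cosine_similarity` are all hypothetical and invented for this example; they are not taken from the thesis.

```python
import numpy as np

# Hypothetical 3-dimensional embeddings, invented purely for illustration.
embeddings = {
    "cat": np.array([0.9, 0.8, 0.1]),
    "dog": np.array([0.8, 0.9, 0.2]),
    "car": np.array([0.1, 0.2, 0.9]),
}


def euclidean_distance(a, b):
    """Straight-line distance between two embedding vectors."""
    return float(np.linalg.norm(a - b))


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


d_cat_dog = euclidean_distance(embeddings["cat"], embeddings["dog"])
d_cat_car = euclidean_distance(embeddings["cat"], embeddings["car"])
s_cat_dog = cosine_similarity(embeddings["cat"], embeddings["dog"])
s_cat_car = cosine_similarity(embeddings["cat"], embeddings["car"])

# In this toy example the related pair is both nearer and more aligned.
print(f"distance cat-dog: {d_cat_dog:.3f}, cat-car: {d_cat_car:.3f}")
print(f"cosine   cat-dog: {s_cat_dog:.3f}, cat-car: {s_cat_car:.3f}")
```

Real embeddings typically have tens to hundreds of dimensions and are learned from a corpus rather than written by hand, but the comparison works the same way.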

This thesis explores several different aspects of word embeddings. First, we look at the problem of non-identifiability: word embeddings are generated by optimizing an objective function, but the optimal embedding set is not unique. This has consequences for how embeddings are evaluated, and for making comparisons between different word embedding methods. We explain why this is the case and propose some solutions for dealing with it.
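One common way such non-uniqueness can arise is sketched below, under an assumption that the abstract itself does not state: that the objective depends on the embeddings only through inner products between word vectors and context vectors. In that case, multiplying the word vectors by any invertible matrix `A` and the context vectors by the inverse transpose of `A` leaves the objective unchanged, yet it can change the cosine similarities used in evaluation. The matrices `U`, `V`, `A`, and `Q` are hypothetical, and the thesis may treat the problem differently.

```python
import numpy as np

rng = np.random.default_rng(0)
n_words, dim = 5, 3

# Hypothetical word and context embeddings (random, for illustration only).
U = rng.normal(size=(n_words, dim))
V = rng.normal(size=(n_words, dim))


def cosine_matrix(X):
    """Pairwise cosine similarities between the rows of X."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    return Xn @ Xn.T


# A fixed invertible (non-orthogonal) matrix; its determinant is 1.
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 0.5]])

# Transform word vectors by A and context vectors by the inverse transpose.
U_A = U @ A
V_A = V @ np.linalg.inv(A).T

# Every word-context inner product is preserved, so an objective that
# depends only on these products cannot tell the two solutions apart.
same_products = np.allclose(U @ V.T, U_A @ V_A.T)

# Cosine similarities between word vectors are generally NOT preserved.
same_cosines_general = np.allclose(cosine_matrix(U), cosine_matrix(U_A))

# An orthogonal transformation (rotation/reflection) preserves them.
Q, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
same_cosines_orthogonal = np.allclose(cosine_matrix(U), cosine_matrix(U @ Q))

print(same_products, same_cosines_general, same_cosines_orthogonal)
```

Under this assumption, two equally optimal embedding sets can score differently on a similarity-based evaluation, which is the kind of difficulty for comparing methods that the paragraph above describes.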

We then explore the potential for generating semi-supervised word embeddings, with the aim of capturing the relationships between words more accurately than standard unsupervised embedding methods do. We introduce three semi-supervised objective functions, derive algorithms for optimizing them, and implement them on simulated and real data.

Finally, we look at the generation of time-dependent word embeddings, in particular the development of statistical tests for assessing whether certain words have changed in meaning or usage over a given time period. We introduce a time-dependent word embedding model and use it to test for change over time. However, we find that we are unable to distinguish between the presence of time dependence and a misspecified embedding dimension.

Item Type: Thesis (University of Nottingham only) (PhD)
Supervisors: Bharath, Karthik; Preston, Simon
Keywords: word embedding, language analysis, natural language processing, data
Subjects: Q Science > QA Mathematics > QA299 Analysis
Faculties/Schools: UK Campuses > Faculty of Science > School of Mathematical Sciences
Item ID: 65089
Depositing User: Carrington, Rachel
Date Deposited: 18 Jan 2024 15:17
Last Modified: 18 Jan 2024 15:17
URI: https://eprints.nottingham.ac.uk/id/eprint/65089
