Understanding word embeddings


Carl Allen


Word embeddings derived by (often implicit) factorisation methods are ubiquitous in machine learning tasks that involve reasoning over natural language data. Related factorised vector embeddings also arise in the representation of knowledge bases, network graphs and recommender systems. However, the properties learned by such embeddings are not well understood. In the case of word embeddings, for example, it is unclear how vectors seemingly capture word meaning, or why they can be added to give intuitive results and solve analogy tasks. We investigate Word2vec, a well-known algorithm for deriving word embeddings, to try to answer these questions and so better understand how these embeddings work and how they might be improved.


Supervisors: Tim Hospedales & Iain Murray