The area to be explored in this PhD is paraphrasing with neural machine translation. Paraphrasing can be broadly described as the task of using an alternative surface form to express the same semantic content (Madnani and Dorr, 2010a). Much of the appeal of paraphrasing stems from its potential application to a wide range of NLP problems. Examples include query and pattern expansion (Riezler et al., 2007), summarization (Barzilay, 2003), question answering (Lin and Pantel, 2001), semantic parsing (Berant and Liang, 2014), semantic role labeling (Woodsend and Lapata, 2014), and machine translation (Callison-Burch et al., 2006).
A well-established technique for paraphrasing leverages bilingual corpora to find meaning-equivalent phrases in a single language by “pivoting” over a shared translation in another language. Pivoting is often used in machine translation to overcome a shortage of parallel data, i.e., when there is no direct translation path from the source language to the target. Instead, pivoting takes advantage of paths through an intermediate language. The idea dates back at least to Kay (1997), who observed that ambiguities in translating from one language into another may be resolved if a translation into some third language is available. It has met with success in traditional phrase-based SMT (Wu and Wang, 2007; Utiyama and Isahara, 2007) and more recently in neural MT systems (Firat et al., 2016; Zoph and Knight, 2016).
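To make the pivoting idea concrete, the following is a minimal sketch of one common formulation, in which a paraphrase candidate is scored by summing, over every shared pivot translation, the product of the forward and backward translation probabilities. The function name `pivot_paraphrases`, the toy phrase tables `EN_TO_DE` and `DE_TO_EN`, and all probabilities in them are hypothetical and chosen only for illustration; they are not taken from any real corpus or from the systems cited above.

```python
from collections import defaultdict


def pivot_paraphrases(phrase, src_to_pivot, pivot_to_src):
    """Score paraphrase candidates for `phrase` by pivoting.

    score(candidate) = sum over pivots of
        p(pivot | phrase) * p(candidate | pivot)
    """
    scores = defaultdict(float)
    for pivot, p_forward in src_to_pivot.get(phrase, {}).items():
        for candidate, p_backward in pivot_to_src.get(pivot, {}).items():
            if candidate != phrase:  # skip the trivial identity paraphrase
                scores[candidate] += p_forward * p_backward
    # Highest-scoring candidates first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy phrase tables (made-up probabilities).
EN_TO_DE = {
    "under control": {"unter kontrolle": 0.8, "im griff": 0.2},
}
DE_TO_EN = {
    "unter kontrolle": {"under control": 0.7, "in check": 0.3},
    "im griff": {"under control": 0.5, "in hand": 0.5},
}

if __name__ == "__main__":
    for candidate, score in pivot_paraphrases("under control", EN_TO_DE, DE_TO_EN):
        print(f"{candidate}\t{score:.2f}")
```

In this toy example, "in check" outranks "in hand" because it is reached through the more probable pivot. A neural variant would replace the table lookups with a forward and a backward translation model, but the marginalization over pivots stays the same in spirit.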
I will explore the application of machine translation and pivoting techniques to both paraphrase classification and paraphrase generation. There are many existing paraphrase classification datasets and tasks, which require determining whether a pair of sentences are paraphrases of each other. These datasets are well studied, so the performance of my model can be judged against established results. Because generated paraphrases are difficult to evaluate, I will use both manual and automatic evaluation.