Katarzyna Prus

Modelling Aspect and NLP for Slavic Languages

Most modern language technology, driven by the volume of resources available, focuses on building solutions for English. Consequently, research on other languages often aims to extend what has already been established for English. Arguably, by directing the focus this way, we miss out on tackling research questions that can be addressed by exploiting phenomena which simply do not exist in English.

An example of such a phenomenon is verb aspect – a property of a verb which internally describes the state, unfolding or duration of the action. Although a similar concept is expressed in English with the use of different grammatical tenses (e.g. ‘I did’ vs ‘I was doing’), in some languages it is a feature of the verb itself, encoded in its morphological structure (e.g. [Polish] ‘zrobić’ vs ‘robić’). This phenomenon, while extensively described in the linguistics literature, remains somewhat understudied in the computational domain.

Following recent interest in whether neural networks capture morphology (Vania and Lopez, 2017), this proposal suggests starting by studying aspect in Polish as a morphological phenomenon. This serves as a starting point for further questions about how the difference between the realisation of aspect in English and in Slavic languages affects various NLP applications.

Supervisors: Adam Lopez &