Tom Sherborne

Crosslingual Semantic Parsing with Minimal Supervision


Semantic parsing is the task of realising the intention of a natural language utterance as a precise description in a formal, machine-readable meaning representation language. A semantic parser can function as a question-answering interface for a structured knowledge base (KB). The parser answers questions by translating natural language into a logical representation and then executing that representation against the KB to retrieve an answer, or denotation.
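The translate-then-execute pipeline described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy knowledge base `KB`, the nested-tuple logical form, and the keyword-matching `parse` function, which merely stands in for a learned parser. It is not the method proposed in this project.

```python
# Toy knowledge base. The relation names are hypothetical and the
# population figures are rough placeholders, not reference data.
KB = {
    "capital": {"france": "paris", "germany": "berlin"},
    "population": {"paris": 2_100_000, "berlin": 3_600_000},
}


def parse(utterance):
    """Map an utterance to a logical form (a nested tuple).

    This keyword matcher is only a stand-in for a learned parser.
    It assumes the entity is the final word of the question.
    """
    tokens = utterance.lower().rstrip("?").split()
    entity = tokens[-1]
    if "population" in tokens and "capital" in tokens:
        return ("population", ("capital", entity))
    if "capital" in tokens:
        return ("capital", entity)
    raise ValueError(f"cannot parse: {utterance!r}")


def execute(logical_form, kb):
    """Evaluate a logical form against the KB and return its denotation."""
    if isinstance(logical_form, str):
        # A bare string is an entity and denotes itself.
        return logical_form
    relation, argument = logical_form
    return kb[relation][execute(argument, kb)]


logical_form = parse("What is the capital of France?")
denotation = execute(logical_form, KB)
print(logical_form, "->", denotation)
```

In this sketch the logical form is the intermediate, executable representation and the denotation is the value returned from the KB. A parser supervised with logical forms would be trained to produce the tuple; one supervised with denotations would only observe the final answer.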

Semantic parsing for generating executable queries has gained interest in recent years as a challenging natural language processing (NLP) task. Much of this interest comes from industry, owing to the potential of using a parser inside a machine-learning driven digital assistant such as Amazon Alexa, Apple Siri or Google Assistant. New datasets and benchmarks for the task have proliferated (Pasupat and Liang, 2015; Suhr et al., 2018; Wang et al., 2015), as have datasets for generating SQL queries from natural language questions (Iyyer et al., 2017; Yu et al., 2018; Zhong et al., 2017). However, this recent research has generally treated English as synonymous with natural language as a whole, and generating logical forms from any other language has been largely ignored. This has established a bias towards English, which has significantly greater task-specific resources than any other language. Previous efforts in multilingual semantic parsing widely assume parallel data in all languages and use small legacy datasets such as GeoQuery (Duong et al., 2015; Susanto and Lu, 2017).

English is neither linguistically typical (Dryer and Haspelmath, 2013) nor the most widely spoken first language worldwide (Eberhard et al., 2019). This exposes a chasm between the lingua franca of semantic parsing and the populations which may benefit from such systems as users of assistant technologies. Rather than attempting to match this surfeit of English resources for another language, we propose a PhD project investigating how best to utilise existing resources to improve semantic parsing of resource-poor languages. We formulate this problem as a scenario in which an average developer wishes to build a semantic parser for a language and has only a small set of annotated utterances from native speakers for evaluation. The typical approach to developing a semantic parser is fully supervised sequence transduction, using logical forms as the target sequence.

This project will investigate how a developer in this scenario might train a semantic parser without exorbitant annotation costs or access to near-unlimited machine-learning hardware. Beyond this, the project will also investigate semantic parsing without supervision from logical forms, using denotations as the target, and semantic parsing with only a knowledge base and no training data of any kind. These efforts will uncover how far supervision, or annotation, can be reduced while retaining competitive accuracy when parsing a resource-poor language.

Supervisors: Mirella Lapata & Mark Steedman