Translating Natural Language into Source Code and Vice Versa


Rafael-Michael Karampatsis

This PhD project aims to study methods for translating natural language into source code. The task can be viewed as an instance of semantic parsing in which the meaning representation is source code in the programming language of choice. Specifically, the main objective is to build a system that translates a natural language description into executable source code and, in the other direction, generates an appropriate natural language description when given source code as input. Drawing inspiration from machine translation (MT), we plan to investigate two approaches: tree-to-tree transducers and neural machine translation (NMT). The main advantages of the tree-to-tree transducer approach are that it supports bidirectional translation without any change to the methodology and that the generated output has correct syntax. NMT, on the other hand, is a recently proposed approach to MT that has achieved very promising results in various applications. Unlike traditional statistical MT, NMT aims to build a single neural network that can be jointly tuned to maximize translation performance. Because NMT models operate over sequences rather than trees, we would like to study how syntax can be incorporated into them. Lastly, the different methods will be evaluated on four datasets (including, for example, database queries described in natural language) using evaluation metrics proposed in related work.


Supervisors: Charles Sutton & Mirella Lapata