Modelling student engagement using discussion forum data


Elaine Farrow


Key objective: My aim is to develop, validate, and refine models of student engagement using data from class discussion forums, employing methods from natural language processing and machine learning.

Research question: Can we identify features in online discussion forum content which will allow us to model student engagement while a course is still in progress?

Technology use is now a fundamental part of the educational experience for many students, and its importance is widely recognised by researchers. MOOCs and other online courses generate large amounts of 'trace' data as students interact with the course material, with the tutors, and with each other. Previous quantitative work has mainly used this data for prediction tasks, such as predicting which students will drop out of a course (Joksimovic, 2017). There is growing interest in going beyond predicted grades and course completion rates to create richer models of student engagement with a course. If such models can be applied while a course is still in progress, they could identify students who are struggling, or lessons which cause confusion, allowing instructors to intervene.

Online discussion forums, where students can interact with one another and with their tutors, are of particular significance. In addition to their primary role in supporting education, discussion forums can also be used to inform research. The messages exchanged in the forum can be exported as a time-stamped record of the discussion. Forum transcripts of this sort encompass social exchanges as well as task-focused talk. They form a rich source of material for researchers interested in studying how participants work together online, and the ways in which effective learning takes place through discussion.

This is the area in which my PhD research will be situated. I will use methods from natural language processing and machine learning to model student engagement from discussion forum data. I hope to generate new insights which will be beneficial to students and educators, as well as researchers.

I will begin by using data that has already been collected for research purposes from blended learning courses and MOOCs. Some of this data has been annotated for 'cognitive presence', a measure of engagement that forms part of the Community of Inquiry framework (Garrison et al., 2000). As a first step, I will investigate how well a predictive classifier for cognitive presence can generalise across languages and domains, by training a classifier on English-language data and evaluating it on Portuguese data, using the same set of features in each data set. Next, I plan to apply a block HMM (hidden Markov model) to the English-language data to identify speech acts, and then to conduct a correlation analysis between the identified speech acts and the levels of cognitive presence.
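
The first step, training on one language and evaluating on another with a shared feature set, could be sketched roughly as follows. This is a toy illustration only and not the project's method: the three surface features, the 'lower'/'higher' labels, the example messages, and the nearest-centroid classifier are all assumptions chosen to keep the sketch self-contained.

```python
import math
import re
from collections import defaultdict


def extract_features(message):
    """Hypothetical language-independent features (an assumption for this
    sketch): word count, number of question marks, mean word length."""
    words = re.findall(r"\w+", message)  # \w matches accented letters too
    n_words = len(words)
    mean_len = sum(len(w) for w in words) / n_words if n_words else 0.0
    return [float(n_words), float(message.count("?")), mean_len]


def train_centroids(messages, labels):
    """Nearest-centroid stand-in classifier: average the feature vectors
    of the training messages for each label."""
    sums = {}
    counts = defaultdict(int)
    for message, label in zip(messages, labels):
        feats = extract_features(message)
        if label not in sums:
            sums[label] = [0.0] * len(feats)
        sums[label] = [s + f for s, f in zip(sums[label], feats)]
        counts[label] += 1
    return {lab: [s / counts[lab] for s in total] for lab, total in sums.items()}


def predict(centroids, message):
    """Assign the label whose centroid is closest in Euclidean distance."""
    feats = extract_features(message)
    return min(centroids, key=lambda lab: math.dist(centroids[lab], feats))


def accuracy(centroids, messages, labels):
    """Fraction of messages whose predicted label matches the given label."""
    hits = sum(predict(centroids, m) == y for m, y in zip(messages, labels))
    return hits / len(labels)


# Invented toy data; the labels are placeholders, not real annotations.
train_en = [
    ("Why does this happen?", "lower"),
    ("What is the deadline?", "lower"),
    ("I compared both approaches and the second one explains the results "
     "because it accounts for prior knowledge.", "higher"),
    ("Building on the earlier post, combining the two ideas suggests a "
     "solution we could test in practice.", "higher"),
]
test_pt = [
    ("Por que isso acontece?", "lower"),
    ("Comparei as duas abordagens e a segunda explica melhor os resultados "
     "porque considera o conhecimento prévio.", "higher"),
]

# Train on English only, then evaluate on Portuguese with the same features.
centroids = train_centroids(*zip(*train_en))
score = accuracy(centroids, *zip(*test_pt))
print(f"Cross-language accuracy on toy data: {score:.2f}")
```

The point of the sketch is the evaluation design: the same `extract_features` function is applied to both data sets, so any drop in performance on the Portuguese data reflects how well the features transfer rather than a change in representation. A real study would use a richer feature set, a stronger classifier, and a chance-corrected agreement measure.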

I will also carry out a literature review of previous work on methods and approaches for text analysis in learning analytics. I hope to find unsupervised methods that can be used to generate insights from large, unannotated data sets, such as MOOC data. I am also particularly interested in work looking at the learning outcomes of students who read forum messages but do not contribute to the discussion themselves.


Supervisors: Johanna Moore & Dragan Gasevic