NeurIPS 2020 Education Challenge

The NeurIPS 2020 Education Challenge was an international machine learning competition in which participants aimed to accurately predict students' answers to assessments, determine question quality, and identify a personalized sequence of questions for each student that best predicts the student’s answers.

The Need

Digital technologies are becoming increasingly prevalent in education, enabling personalized, high-quality educational resources to be accessible to students across the world. Importantly, among these resources are diagnostic questions: the answers that students give to these questions reveal key information about the specific nature of misconceptions the students may hold.

Analyzing the massive quantities of data stemming from students’ interactions with these diagnostic questions can help us more accurately understand the students’ learning status and thus allow us to automate learning curriculum recommendations.

The Challenge

In this competition, participants worked with students' answer records for these multiple-choice diagnostic questions, with the aim of addressing three problems:

1. Accurately predict which answers students provide. In Task 1 the goal is to predict whether a student answers a question correctly; in Task 2, which answer (A, B, C or D) the student chooses.

2. Accurately predict which questions are of high quality.

3. Determine a personalized sequence of questions for each student that best predicts the student's answers.
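To make the first problem concrete, a minimal (hypothetical) Task 1 baseline might predict correctness from each question's historical correct-answer rate, falling back to the global mean for unseen questions. This sketch is purely illustrative and is not the competition's baseline code; all names are invented.

```python
from collections import defaultdict

def fit_question_rates(records):
    """records: iterable of (question_id, is_correct) training pairs.
    Returns per-question correct rates and the global mean as a fallback."""
    totals = defaultdict(lambda: [0, 0])  # question_id -> [n_correct, n_answered]
    for qid, correct in records:
        totals[qid][0] += int(correct)
        totals[qid][1] += 1
    global_rate = (sum(c for c, _ in totals.values())
                   / sum(n for _, n in totals.values()))
    rates = {qid: c / n for qid, (c, n) in totals.items()}
    return rates, global_rate

def predict(rates, global_rate, question_ids):
    """Predict 1 (correct) when a question's historical rate is >= 0.5."""
    return [int(rates.get(q, global_rate) >= 0.5) for q in question_ids]

# Tiny worked example: question 101 is usually answered correctly, 202 never is,
# and 303 is unseen at training time.
train = [(101, 1), (101, 1), (101, 0), (202, 0), (202, 0)]
rates, g = fit_question_rates(train)
print(predict(rates, g, [101, 202, 303]))  # -> [1, 0, 0]
```

A per-question mean ignores each student's ability, which is exactly the kind of signal the competition's tasks reward modeling.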

These tasks closely mimic the goals of a real-world educational platform and are highly representative of the educational challenges faced today. More information can be found on the competition website and in the official competition guide.

Dataset

We provided data from two school years (September 2018 to May 2020) of students' answers to mathematics questions from Eedi, a leading educational platform with which millions of students around the globe interact daily. Eedi offers diagnostic questions to students from primary to high school (roughly between 7 and 18 years old). Each diagnostic question is a multiple-choice question with four possible answer choices, exactly one of which is correct. Currently, the platform mainly focuses on mathematics questions.

We created two datasets: one for Tasks 1 and 2, and the other for Tasks 3 and 4. The summary statistics for these two datasets are as follows:

• Tasks 1 and 2: 27,613 questions; 118,971 students
• Tasks 3 and 4: 948 questions; 4,918 students

The total number of answer records in these training sets exceeds 17 million, making this one of the largest educational datasets to date. We also provide extensive metadata on questions, students and answers. For more details on the dataset and the competition tasks, please refer to Sections 3 and 4, respectively, of the competition white paper.
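Each answer record links a student to a question, the answer chosen and whether it was correct. The sketch below assumes a simplified tabular schema (the field names UserId, QuestionId, AnswerValue and IsCorrect are illustrative, not necessarily the official column names) and computes the kind of summary statistics quoted above.

```python
# Hypothetical answer records; field names are illustrative only.
records = [
    {"UserId": 1, "QuestionId": 10, "AnswerValue": 3, "IsCorrect": 1},
    {"UserId": 1, "QuestionId": 11, "AnswerValue": 2, "IsCorrect": 0},
    {"UserId": 2, "QuestionId": 10, "AnswerValue": 1, "IsCorrect": 1},
]

# Distinct students and questions, and the overall correct-answer rate.
n_students = len({r["UserId"] for r in records})
n_questions = len({r["QuestionId"] for r in records})
overall_rate = sum(r["IsCorrect"] for r in records) / len(records)
print(n_students, n_questions, round(overall_rate, 2))
```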

We provide access to the data here for the benefit of the community.

• You may not use the material for commercial purposes.
• If you remix, transform, or build upon the material, you may not distribute the modified material.
• The question images are shared solely for the purpose of model training and must not be used for any other purpose; they must not be printed or shared with anyone.

If you use this dataset, please cite the competition white paper:

@article{wang2020diagnostic,
title={Diagnostic questions: The {NeurIPS} 2020 education challenge},
author={Wang, Zichao and Lamb, Angus and Saveliev, Evgeny and Cameron, Pashmina and Zaykov, Yordan and Hern{\'a}ndez-Lobato, Jos{\'e} Miguel and Turner, Richard E and Baraniuk, Richard G and Barton, Craig and Jones, Simon Peyton and Woodhead, Simon and Zhang, Cheng},
journal={arXiv preprint arXiv:2007.12061},
year={2020}
}

Evaluation

We also open-source the evaluation script as well as a simple baseline for each competition task. For Tasks 1, 2 and 3, an example evaluation can be performed by running the model file (sample_xxx.py) followed by the evaluation script (evaluation.py). For Task 4, an example evaluation can be performed by running the evaluation script (evaluation.py) alone. Results are saved to a results/ directory, which is created while the evaluation scripts run. More details about the evaluation scripts are available in Section 5 of the competition white paper.
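As an illustration of what such an evaluation computes for Tasks 1 and 2, the sketch below uses plain prediction accuracy, the fraction of answer records predicted correctly. This is a simplified stand-in, not the official evaluation.py; consult the white paper for the exact metrics.

```python
def accuracy(predicted, actual):
    """Fraction of answer records where the prediction matches the target."""
    if len(predicted) != len(actual):
        raise ValueError("prediction/target length mismatch")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

# Task 1 style: binary correctness targets.
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # -> 0.75
# Task 2 style: chosen-answer targets (A/B/C/D encoded as 1-4).
print(accuracy([1, 2, 3], [1, 2, 4]))
```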

Awards

Daichi Takehara and Yuto Shinahara; Aidemy Inc.

Shuanghong Shen, Qi Liu, Enhong Chen, Shiwei Tong, Zhenya Huang, Wei Tong, Yu Su, and Shijin Wang; University of Science and Technology of China

In this task there was a four-way tie.
Daichi Takehara and Yuto Shinahara; Aidemy Inc.
Guowei Xu, Jiaohao Chen, Hang Li, Yu Kang, Tianqiao Liu, Yang Hao, Wenbiao Ding, Zitao Liu; TAL Education Group.
TabChen
The Quokka Appreciation Team

Aritra Ghosh, University of Massachusetts Amherst

Combined
Daichi Takehara and Yuto Shinahara; Aidemy Inc.

Organisers

Jack Wang
Angus Lamb
Evgeny Saveliev
Pashmina Cameron
Yordan Zaykov
José Miguel Hernández-Lobato
Richard Turner
Richard G. Baraniuk
Craig Barton
Simon Peyton Jones