NeurIPS 2020 Education Challenge

The NeurIPS 2020 Education Challenge was an international machine learning competition in which participants aimed to accurately predict students' answers to assessments, determine question quality, and identify a personalized sequence of questions for each student that best predicts the student’s answers.

The Need

Digital technologies are becoming increasingly prevalent in education, enabling personalized, high-quality educational resources to be accessible to students across the world. Importantly, among these resources are diagnostic questions: the answers that students give to these questions reveal key information about the specific nature of misconceptions the students may hold.

Analyzing the massive quantities of data stemming from students’ interactions with these diagnostic questions can help us more accurately understand the students’ learning status and thus allow us to automate learning curriculum recommendations.

The Challenge

In this competition, participants worked with students' answer records for these multiple-choice diagnostic questions, with the aim of addressing three problems:

1. Accurately predict which answers students provide. In Task 1 the goal is to predict whether a student answers a question correctly; in Task 2, which answer (A, B, C or D) the student chooses.

2. Accurately predict which questions are of high quality.

3. Determine a personalized sequence of questions for each student that best predicts the student's answers.
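To make the first problem concrete, a minimal (hypothetical) Task 1 baseline might predict correctness from each question's historical correct-answer rate, falling back to the global mean for unseen questions. This sketch is purely illustrative and is not the competition's baseline code; all names are invented.

```python
from collections import defaultdict

def fit_question_rates(records):
    """records: iterable of (question_id, is_correct) training pairs.
    Returns per-question correct rates and the global mean as a fallback."""
    totals = defaultdict(lambda: [0, 0])  # question_id -> [n_correct, n_answered]
    for qid, correct in records:
        totals[qid][0] += int(correct)
        totals[qid][1] += 1
    global_rate = (sum(c for c, _ in totals.values())
                   / sum(n for _, n in totals.values()))
    rates = {qid: c / n for qid, (c, n) in totals.items()}
    return rates, global_rate

def predict(rates, global_rate, question_ids):
    """Predict 1 (correct) when a question's historical rate is >= 0.5."""
    return [int(rates.get(q, global_rate) >= 0.5) for q in question_ids]

# Tiny worked example: question 101 is usually answered correctly, 202 never is,
# and 303 is unseen at training time.
train = [(101, 1), (101, 1), (101, 0), (202, 0), (202, 0)]
rates, g = fit_question_rates(train)
print(predict(rates, g, [101, 202, 303]))  # -> [1, 0, 0]
```

A per-question mean ignores each student's ability, which is exactly the kind of signal the competition's tasks reward modeling.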

These tasks closely mimic the goals of a real-world educational platform and are highly representative of the educational challenges faced today. More information can be found on the competition website and in the official competition guide.

Dataset

We provided data from two school years (September 2018 to May 2020) of students' answers to mathematics questions from Eedi, a leading educational platform with which millions of students around the globe interact daily. Eedi offers diagnostic questions to students from primary to high school (roughly between 7 and 18 years old). Each diagnostic question is a multiple-choice question with four possible answer choices, exactly one of which is correct. Currently, the platform mainly focuses on mathematics questions.

We created two datasets: one for Tasks 1 and 2, and the other for Tasks 3 and 4. The summary statistics for these two datasets are as follows:

• Tasks 1 and 2: 27,613 questions; 118,971 students
• Tasks 3 and 4: 948 questions; 4,918 students

The total number of answer records in these training sets exceeds 17 million, making this one of the largest educational datasets to date. We also provide extensive metadata on questions, students and answers. For more details on the dataset and the competition tasks, please refer to Sections 3 and 4, respectively, of the competition white paper.
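Each answer record links a student to a question, the answer chosen and whether it was correct. The sketch below assumes a simplified tabular schema (the field names UserId, QuestionId, AnswerValue and IsCorrect are illustrative, not necessarily the official column names) and computes the kind of summary statistics quoted above.

```python
# Hypothetical answer records; field names are illustrative only.
records = [
    {"UserId": 1, "QuestionId": 10, "AnswerValue": 3, "IsCorrect": 1},
    {"UserId": 1, "QuestionId": 11, "AnswerValue": 2, "IsCorrect": 0},
    {"UserId": 2, "QuestionId": 10, "AnswerValue": 1, "IsCorrect": 1},
]

# Distinct students and questions, and the overall correct-answer rate.
n_students = len({r["UserId"] for r in records})
n_questions = len({r["QuestionId"] for r in records})
overall_rate = sum(r["IsCorrect"] for r in records) / len(records)
print(n_students, n_questions, round(overall_rate, 2))
```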

We provide access to the data here for the benefit of the community.

• You may not use the material for commercial purposes.
• If you remix, transform, or build upon the material, you may not distribute the modified material.
• The question images are shared solely for the purpose of model training and must not be used for any other purpose; they must not be printed or shared with anyone.

If you use this dataset, please cite the competition white paper:

@article{wang2020diagnostic,
title={Diagnostic questions: The {NeurIPS} 2020 education challenge},
author={Wang, Zichao and Lamb, Angus and Saveliev, Evgeny and Cameron, Pashmina and Zaykov, Yordan and Hern{\'a}ndez-Lobato, Jos{\'e} Miguel and Turner, Richard E and Baraniuk, Richard G and Barton, Craig and Jones, Simon Peyton and Woodhead, Simon and Zhang, Cheng},
journal={arXiv preprint arXiv:2007.12061},
year={2020}
}

Evaluation

We also open-source the evaluation script as well as a simple baseline for each competition task. For Tasks 1, 2 and 3, an example evaluation can be performed by running the model file (sample_xxx.py) followed by the evaluation script (evaluation.py). For Task 4, an example evaluation can be performed by running the evaluation script (evaluation.py) alone. Results are saved to a results/ directory, which is created while the evaluation scripts run. More details about the evaluation scripts are available in Section 5 of the competition white paper.
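As an illustration of what such an evaluation computes for Tasks 1 and 2, the sketch below uses plain prediction accuracy, the fraction of answer records predicted correctly. This is a simplified stand-in, not the official evaluation.py; consult the white paper for the exact metrics.

```python
def accuracy(predicted, actual):
    """Fraction of answer records where the prediction matches the target."""
    if len(predicted) != len(actual):
        raise ValueError("prediction/target length mismatch")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

# Task 1 style: binary correctness targets.
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # -> 0.75
# Task 2 style: chosen-answer targets (A/B/C/D encoded as 1-4).
print(accuracy([1, 2, 3], [1, 2, 4]))
```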

Awards

Daichi Takehara and Yuto Shinahara; Aidemy Inc.

Shuanghong Shen, Qi Liu, Enhong Chen, Shiwei Tong, Zhenya Huang, Wei Tong, Yu Su, and Shijin Wang; University of Science and Technology of China

In this task there was a four-way tie.
Daichi Takehara and Yuto Shinahara; Aidemy Inc.
Guowei Xu, Jiaohao Chen, Hang Li, Yu Kang, Tianqiao Liu, Yang Hao, Wenbiao Ding, Zitao Liu; TAL Education Group.
TabChen
The Quokka Appreciation Team

Aritra Ghosh, University of Massachusetts Amherst

Combined
Daichi Takehara and Yuto Shinahara; Aidemy Inc.

Organisers

Jack Wang
Angus Lamb
Evgeny Saveliev
Pashmina Cameron
Yordan Zaykov
José Miguel Hernández-Lobato
Richard Turner
Richard G. Baraniuk
Craig Barton
Simon Peyton Jones