From Wrong Answers to Real Insights: How We Used a Kaggle Challenge to Map Student Misconceptions

Panagiota Konstantinou

At Eedi, we’re obsessed with a deceptively simple question: why do students get things wrong?

We’re not talking about slips like picking B instead of A by accident. We’re talking about the deeper kind of wrong, like when a student always works left to right, ignoring the order of operations. These kinds of mistakes reveal something more profound: a misconception.

If we can spot them, we can intervene and prevent a misconception cascade, where unresolved misconceptions lead to new ones.

📊 Data: Not Just Any Multiple-Choice Questions

Over the years, we’ve collected a lot of student responses to our diagnostic multiple-choice maths questions (MCQs). But unlike standard MCQs, each incorrect answer (called a distractor) is carefully crafted to reveal a specific misconception.

But here’s the thing: while we had all this rich data, we didn’t have labels linking distractors to the misconceptions they revealed.

And manually tagging them? Painful. Slow. Inconsistent. Not scalable.

So we asked ourselves:

Could a machine learning model help us do this better? Could it learn to tag distractors with the right misconceptions, or at least give teachers a solid head start?

We had no idea. We hadn’t built a model for this. But we knew how to find out.

🚀 Enter: The Kaggle Competition

Rather than cooking something up in secret, we opened the challenge to the world. We launched a competition on Kaggle, the go-to platform for data scientists to flex their skills. We called it Eedi - Mining Misconceptions in Mathematics.

We kept the task clear:

🧠 Given a distractor and a list of misconception descriptions, predict which ones match.

Simple to say, tough to solve, especially because many of the misconceptions in the test set were unseen: they had not been encountered during training.

Now, this wasn’t your typical NLP challenge. These distractors live in the weird and wonderful world of maths education — full of numbers, logic traps, and deeply specific student reasoning. It’s not the kind of task that generic language models handle well out of the box.
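
To make the setup concrete, here is roughly what a single item looks like. The field names and wording below are made up for illustration; they are not the actual competition schema.

```python
# Illustrative only: hypothetical field names, not the real competition schema.
sample = {
    "question": "Calculate: 3 + 4 x 2",
    "correct_answer": "11",
    "distractor": "14",  # the student worked left to right: (3 + 4) x 2
    "candidate_misconceptions": [
        "Carries out operations from left to right, ignoring their priority",
        "Believes addition must always be done before multiplication",
        # ...thousands more descriptions in the full bank
    ],
}

# The task: rank the candidate misconceptions so that the one this
# distractor actually reveals (here, the first) comes out on top.
```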

🎯 What Made This Special

This was completely new territory for us. We hadn’t tried solving this problem before and didn’t have a go-to model. We just had a hunch it was doable and that the global data science community might come up with solutions we’d never think of on our own.

And wow, did they deliver. We saw a fantastic mix of submissions, some wildly creative, others deeply technical, all impressive in their own ways.

It was our first time exploring this specific problem, but not our first time in the competition space. Our NeurIPS 2020 dataset won best dataset at EDM 2021, and our NeurIPS 2022 dataset was voted best at CLeaR 2023.

🏆 Meet the Winners

We offered two types of prizes:

  • Main prizes, awarded to the top-performing solutions based on accuracy.
  • Efficiency prizes, for models that balanced performance with speed, size, and deployment-readiness.

🏆 Main Prizes

🥇 1st Place – Team MTH 101 (Raja Biswas)

This winning solution used a multi-stage retrieve-and-rerank pipeline built on Qwen LLMs:

  • Stage 1: Two Qwen models, trained with LoRA and contrastive learning on a mix of 10k synthetic and 1.8k real examples, retrieved the top candidate misconceptions for each question-distractor pair.
  • Stage 2: Claude was used to generate short student rationales, which, along with the question, correct answer, and distractor, helped rerank the top misconceptions (a simplified sketch of this retrieve-and-rerank pattern follows below).
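
As a rough outline of that two-stage pattern (not the winning code, which fine-tuned Qwen retrievers with LoRA and a contrastive objective), a minimal sketch might look like this, with small off-the-shelf models standing in for the fine-tuned components:

```python
# A minimal sketch of retrieve-then-rerank; the model names are stand-ins.
from sentence_transformers import SentenceTransformer, CrossEncoder
import numpy as np

retriever = SentenceTransformer("all-MiniLM-L6-v2")               # stand-in retriever
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")   # stand-in reranker

def rank_misconceptions(query: str, misconceptions: list[str], top_k: int = 25) -> list[str]:
    # Stage 1: embed the query and every misconception, keep the top_k by cosine similarity.
    q_emb = retriever.encode(query, normalize_embeddings=True)
    m_emb = retriever.encode(misconceptions, normalize_embeddings=True)
    order = np.argsort(-(m_emb @ q_emb))[:top_k]
    candidates = [misconceptions[i] for i in order]

    # Stage 2: rescore each (query, candidate) pair jointly. The winning solution
    # used an LLM reranker here, fed with Claude-generated student rationales.
    pair_scores = reranker.predict([(query, c) for c in candidates])
    reranked = sorted(zip(pair_scores, candidates), key=lambda x: -x[0])
    return [c for _, c in reranked]

# query = "Question: 3 + 4 x 2  Correct answer: 11  Student answered: 14"
# top = rank_misconceptions(query, misconception_bank)
```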

🥈 2nd Place – Kazuhito Yonekawa, Qihang Wang, Yohei Okuyama, Lihang Hong

This team used chain-of-thought prompting with Qwen2.5 to guide the model in reasoning through each distractor.

  • Synthetic data was used for misconception augmentation.
  • Listwise reranking was applied to refine the ordering of the top 25 predictions (see the prompt sketch after this list).
  • The final model was quantized post-training for faster, lighter deployment.
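
For a sense of what a listwise, chain-of-thought rerank step involves, here is a rough prompt-building sketch. The wording is purely illustrative, not the team’s actual prompt, and the generation and answer-parsing code around it is omitted.

```python
# Illustrative prompt construction for listwise chain-of-thought reranking;
# not the team's actual prompt.
def build_listwise_prompt(question, correct_answer, distractor, candidates):
    numbered = "\n".join(f"{i + 1}. {m}" for i, m in enumerate(candidates))
    return (
        "A student answered a maths question incorrectly.\n"
        f"Question: {question}\n"
        f"Correct answer: {correct_answer}\n"
        f"Student's answer: {distractor}\n\n"
        "Candidate misconceptions:\n"
        f"{numbered}\n\n"
        "Think step by step about what reasoning could lead the student to this "
        "answer, then list the candidate numbers from most to least likely."
    )

prompt = build_listwise_prompt(
    "Calculate: 3 + 4 x 2", "11", "14",
    ["Carries out operations from left to right, ignoring their priority",
     "Believes multiplication must always be done last"],
)
# The prompt is then sent to the (quantized) Qwen2.5 model and the returned
# ordering is parsed back into a ranking.
```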

🥉 3rd Place – Team Waseda Pochi

Focused on robustness and generalisation, especially to unseen misconceptions:

  • Two-stage retrieval: Qwen2.5-14B embeddings for initial search, Qwen2.5-32B for reranking.
  • Used QLoRA with FlagEmbedding for efficient fine-tuning.
  • Introduced a scaling factor to adjust scores for unseen misconceptions, aligning predictions with the test set distribution (illustrated in the sketch after this list).
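
The scaling idea can be illustrated in a few lines. The factor below, and the assumption that the retriever returns one similarity score per candidate, are illustrative rather than the team’s exact implementation:

```python
import numpy as np

# Illustrative rescaling of retrieval scores for misconceptions never seen in
# training; the factor 1.2 is made up, the team tuned theirs against the
# expected share of unseen misconceptions in the test set.
def rescale(scores: np.ndarray, is_unseen: np.ndarray, factor: float = 1.2) -> np.ndarray:
    adjusted = scores.copy()
    adjusted[is_unseen] *= factor   # stop seen misconceptions from always out-ranking unseen ones
    return adjusted

scores = np.array([0.71, 0.68, 0.64])
is_unseen = np.array([False, True, False])
print(rescale(scores, is_unseen))   # [0.71, ~0.816, 0.64] -> the unseen candidate now ranks first
```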

⚡ Efficiency Prizes

🥇 1st Place – Dipam Chakraborty

Built a fast, compact model using Qwen2.5-0.5B as the base:

  • Added prefix caching for inference speed.
  • Generated missing data using Claude 3.5 and GPT-4o.
  • The final model was a SLERP-merged ensemble of 6 models via mergekit, each trained in ~2.5 hrs on an RTX 4080 (a toy SLERP illustration follows below).
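
As a toy illustration of what SLERP merging does (interpolating along the sphere between two sets of weights rather than along a straight line), here is a simplified version of the operation mergekit applies tensor by tensor. Real merges handle more edge cases and combine six models, not two.

```python
import numpy as np

# Toy SLERP (spherical linear interpolation) between two weight tensors.
def slerp(w0: np.ndarray, w1: np.ndarray, t: float = 0.5) -> np.ndarray:
    a, b = w0.ravel(), w1.ravel()
    cos_omega = np.clip(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)), -1.0, 1.0)
    omega = np.arccos(cos_omega)               # angle between the two weight vectors
    if np.isclose(np.sin(omega), 0.0):         # (anti)parallel weights: fall back to a straight lerp
        return (1 - t) * w0 + t * w1
    s0 = np.sin((1 - t) * omega) / np.sin(omega)
    s1 = np.sin(t * omega) / np.sin(omega)
    return (s0 * a + s1 * b).reshape(w0.shape) # interpolate along the sphere, not the chord

merged = slerp(np.random.randn(4, 4), np.random.randn(4, 4), t=0.5)
```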

🥈 2nd Place – Ryuji Sakata

Took a minimalist approach with all-MiniLM-L6-v2 (22.7M parameters):

  • No fine-tuning—focused on crafting effective input queries.
  • Found that adding misconception explanations (generated with Qwen2.5-32B) significantly improved retrieval performance; a minimal sketch follows below.
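
A minimal sketch of that retrieval-only setup might look like the following. Exactly where the generated explanations were attached (to the misconception text, the query, or both) is an assumption here; the point is that the gains come from better input text, not from training:

```python
from sentence_transformers import SentenceTransformer, util

# Off-the-shelf MiniLM encoder, used with no fine-tuning.
model = SentenceTransformer("all-MiniLM-L6-v2")

def retrieve(question, distractor, misconceptions, explanations, top_k=25):
    # Expand each terse misconception label with a generated explanation
    # (standing in for the Qwen2.5-32B-generated text) before embedding.
    enriched = [f"{m}. {e}" for m, e in zip(misconceptions, explanations)]
    query = f"{question} The student answered: {distractor}"
    q_emb = model.encode(query, convert_to_tensor=True)
    m_emb = model.encode(enriched, convert_to_tensor=True)
    hits = util.cos_sim(q_emb, m_emb)[0].topk(min(top_k, len(enriched)))
    return [misconceptions[i] for i in hits.indices.tolist()]
```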

💡 What We Learned

The competition gave us more than just leaderboard results. It offered something even more valuable: insight into what’s possible.

It’s already inspired early prototypes inside Eedi, and helped us reimagine how we might support teachers in the process of tagging misconceptions, making it faster, more consistent, and more scalable across subjects and topics.

We want to say a huge thank you to every participant, and to our partners at Vanderbilt University, The Learning Agency Lab, and Kaggle for making this all possible.

And of course, we’re deeply grateful to our supporters, the Bill & Melinda Gates Foundation, Schmidt Futures, and the Chan Zuckerberg Initiative, for backing this work.

🎉 Why We’ll Keep Doing This

Competitions like this make us better. They bring new minds to tough problems. They challenge assumptions. And they often lead to tools that help real teachers and real students.

We know it’s not realistic to be experts in every domain. But we believe in asking good questions, sharing good data, and creating space for others to build alongside us.

If you’re a researcher, engineer, or just someone who geeks out over learning and data, we’d love to have you on the next one.

🔗 Check out the competition on Kaggle

Written by
Panagiota Konstantinou
ML Research Engineer
