From Data to Discovery: Lessons Learned from Hosting AI Competitions in EdTech

Simon Woodhead

As a young edtech startup, we at Eedi found ourselves collecting a lot of learning data, specifically around misconceptions in mathematics, and we wanted to see whether that data could be used to predict students’ learning and ultimately enhance our product. I had a rusty background in Bayesian statistics and tried applying a few methods, but I suspected that machine learning methods might be more appropriate, and I was not an expert in those.

We were very fortunate to be working on a project with Simon Peyton Jones, a co-creator of Haskell, who was at Microsoft Research at the time. Simon recognized that a machine learning healthcare model a colleague was working on could be applied to our setting, so we hired a student intern to work with Microsoft Research and adapt the model for the education context. The model worked well – so were we done?

Maybe you’re an expert in a particular field and have built a great model for your edtech tool, or maybe, like us, a fortuitous connection has put you in touch with a brilliant researcher. But how do you know your model is the best approach? I believe that we have a responsibility, as edtech providers, to be the ultimate pragmatists, not specialists. To really know if your model is state of the art, you need more perspectives. Inviting others to solve the problem is crucial because the best solution may not come from your particular area of expertise.

This does not mean you need to throw your model away if another outperforms it. Maybe it has particular properties that are crucial for your use case. However, it is good to know that your model is “in the right ballpark.” One way to get an outside perspective is to host or participate in a data science competition, which can introduce you to diverse expertise and new approaches and offer valuable insight into your own.

Academic researchers, for example, often have deeper knowledge of specific domains but limited access to data. They can also be fiercely dedicated to their fields, often disagreeing passionately with other approaches. Experts in causality will want to test their causal models; reinforcement learning researchers will want to try their methods. This diversity of thought is exactly why at Eedi we share data and host competitions: to harvest the best ideas from brilliant minds across disciplines.

Running competitions also opens up conversations with top researchers. I regularly receive requests from researchers who want to use our datasets, and citations when they have used our data to evaluate new methods in published research. We have regularly collaborated with participants of past competitions, most recently on a successful Tools Competition 2024 proposal.

We are currently hosting a Kaggle competition in which we’re challenging participants to tag incorrect answers with misconceptions. We have tens of thousands of multiple-choice questions in which the incorrect answers often indicate a specific misconception, so when a student answers one of these questions incorrectly, we learn something about why they got it wrong. Unfortunately, the link between the incorrect answer and the misconception is not recorded, so we saw a huge opportunity to add these labels. This is a challenging, time-consuming task for humans. Could a model be created to do it? I guess we’ll find out!

Eedi – Mining Misconceptions in Mathematics | Kaggle
Deadline: 5th December 2024
Prize: $55,000
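
To give a flavor of the task (an illustrative sketch only, not our approach or the competition’s actual data format), a minimal baseline might embed each question together with its incorrect answer and retrieve the most similar misconception description. All data and names below are made up for the example.

```python
# Minimal retrieval baseline for misconception tagging (illustrative only).
# Embeds each question plus its incorrect answer, then retrieves the
# closest misconception description by cosine similarity.
from sentence_transformers import SentenceTransformer, util

# Hypothetical examples; the real competition data has its own format.
misconceptions = [
    "Believes multiplying two negatives gives a negative answer",
    "Adds the numerators and the denominators when adding fractions",
    "Confuses the radius with the diameter of a circle",
]
distractors = [
    ("What is -3 x -4?", "-12"),
    ("What is 1/2 + 1/3?", "2/5"),
]

model = SentenceTransformer("all-MiniLM-L6-v2")
misconception_emb = model.encode(misconceptions, convert_to_tensor=True)

for question, wrong_answer in distractors:
    query = f"{question} Incorrect answer: {wrong_answer}"
    query_emb = model.encode(query, convert_to_tensor=True)
    scores = util.cos_sim(query_emb, misconception_emb)[0]
    best = scores.argmax().item()
    print(f"{query!r} -> {misconceptions[best]} ({scores[best].item():.2f})")
```

A real entry would go far beyond nearest-neighbor retrieval, but even a baseline like this makes the task concrete: map each question–distractor pair to the most plausible misconception.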

These competitions are not just about the best model; they’re about leveraging collective intelligence to push boundaries, and about uncovering the most effective strategies for driving innovation through collective problem-solving. Over the years, Eedi has learned valuable lessons on how to run competitions that generate real impact. From structuring the competition to attract diverse expertise, to designing datasets that challenge participants, to fostering ongoing collaboration with top researchers, each element plays a key role in ensuring success. Let’s dive into some best practices for hosting competitions that attract many top-quality solutions.

Curate a high-quality, diverse dataset: A well-curated dataset is the backbone of a successful competition. It should be novel, robust, and diverse, reflecting a wide range of student responses and demographics. For example, when detecting math misconceptions, including varied misconceptions from topics like geometry, algebra, and fractions ensures broader, more accurate solutions. Thoughtful curation also reduces bias and keeps the competition fair.
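
As a simplified illustration of the curation point above, even a quick check of how labels spread across topics can surface imbalance before a dataset ships. The schema here is hypothetical:

```python
# Quick coverage check for a competition dataset (hypothetical schema:
# one row per labeled distractor, with 'topic' and 'misconception' columns).
import pandas as pd

df = pd.DataFrame({
    "topic": ["algebra", "algebra", "fractions", "geometry", "geometry"],
    "misconception": ["m1", "m2", "m1", "m3", "m3"],
})

# Rows per topic and distinct misconceptions per topic: a topic with many
# rows but few distinct labels suggests over-representation.
coverage = df.groupby("topic").agg(
    rows=("misconception", "size"),
    distinct_misconceptions=("misconception", "nunique"),
)
print(coverage)
```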

Align competition design with project goals: The competition structure should match your objectives. Open competitions on popular platforms work well for large-scale datasets, while niche datasets – like those involving sensitive audio or disability data – may benefit from specialized, invite-only competitions. Tailoring the design ensures you attract the right expertise for meaningful results.

Promote collaboration and clear communication: Transparency and communication are key to a successful competition. Clear rules, timelines, and metrics help participants stay focused. Providing spaces like discussion forums encourages collaboration, which can help participants push through challenges and keep engagement high, even when things get tough.
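
On the point about clear metrics: publishing a reference implementation of the scoring function removes ambiguity for participants. Below is a generic sketch of mean average precision at k (MAP@k), a common metric for ranked-prediction competitions; it is not an official scorer.

```python
# Generic MAP@k for competitions where each example has one true label
# and entrants submit a ranked list of predictions (illustrative only).
def average_precision_at_k(true_label, ranked_predictions, k=25):
    """1/rank of the true label within the top k, else 0."""
    for rank, pred in enumerate(ranked_predictions[:k], start=1):
        if pred == true_label:
            return 1.0 / rank
    return 0.0

def map_at_k(true_labels, all_predictions, k=25):
    """Mean of the per-example scores over the whole test set."""
    scores = [average_precision_at_k(t, preds, k)
              for t, preds in zip(true_labels, all_predictions)]
    return sum(scores) / len(scores)

# A true label ranked second scores 0.5; ranked first scores 1.0.
print(map_at_k(["m1", "m2"], [["m3", "m1", "m4"], ["m2", "m1"]]))  # 0.75
```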

Plan for competition costs: Budgeting is crucial. Larger datasets and more complex tasks require more resources, so understanding these factors from the start helps prevent overspending. Proper planning ensures the competition runs smoothly without straining your budget.

Partner up to fill any gaps: If you need help, partner with a person or organization that has done it before. In our first two competitions, we worked with Microsoft Research, and in the latest, we had invaluable support from The Learning Agency.

Simon Woodhead is Chief Data Scientist at Eedi, where he leads innovative data-driven projects, bridging machine learning research with live product features.

Written by
Simon Woodhead
Chief Data Scientist & Co-Founder, Eedi
