From Research to Reality: Using AI to Generate High-Quality Question Items

Key insights for awarding organisation leaders exploring AI item authoring

Published: 3/27/2026

If you lead an awarding organisation and you've been wondering whether AI-generated question items are ready for prime time — or whether they're just a shiny distraction — this one's for you. Tim Burnett recently hosted a webinar walking through the findings of a research sprint on exactly this topic, and the takeaways are practical, honest, and a little bit eye-opening.

The research didn't start with a slide deck. It started with a conversation — Tim brought together four experienced professionals from across the assessment industry and they talked frankly about what AI actually does when you ask it to write test questions. What came out of those conversations shaped both the research and a new guide that Tim has since produced for assessment managers. The headline finding? AI absolutely has a role to play in item authoring. But the way most organisations are currently using it is leaving a lot of quality on the table.

"You never now have enough SMEs. You're always trying to find SMEs. It's always a challenge. And that's one of the reasons why AI generation is so compelling — it solves this SME bottleneck."
— Tim Burnett, Founder, Test Community Network

The SME bottleneck is real, and most senior leaders in awarding organisations know it. Whether you're looking to scale your item bank, move to adaptive testing, or simply speed up time-to-market, AI offers genuine potential. But Tim is clear that the benefits only materialise if you approach it properly — and there are some common mistakes that even well-resourced organisations are making right now.

The first is relying on free-tier tools. Around 95% of people using ChatGPT are on the free plan, which doesn't include reasoning capability. When you're generating items that need to discriminate between strong and weak candidates, that matters. The quality difference between a reasoning model and a standard chatbot can be significant — especially for higher-order thinking questions and plausible distractor generation.

"AI makes items easier. Unless you steer it with good, proper prompts, it tends to come up with easier questions. You need to give it misconceptions to work with. You need to give it more information."
— Tim Burnett, Founder, Test Community Network

The second common mistake is what Tim calls one-shot generation — asking the AI to produce the stem, the correct answer, and all the distractors in a single go. It sounds efficient. It rarely is. If your SME needs to edit the stem (and they will), those distractors become stale and potentially misleading. Tim's recommendation is a two-stage approach: generate the stem and correct answer first, review it, and only then ask the AI to produce the distractors. If the question doesn't hold up at stage one, bin it and generate another. Don't spend time polishing something that was never going to work.
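The two-stage approach can be sketched in a few lines. This is a minimal illustration of the workflow, not Tim's tool: `call_model` is a stand-in for whichever AI API you use, and `approve` is a human-in-the-loop SME review callback (both names are my own).

```python
from typing import Callable, Optional

def two_stage_item(
    call_model: Callable[[str], str],   # stand-in for your AI API
    approve: Callable[[str], bool],     # SME review callback
    topic: str,
    misconceptions: list[str],
) -> Optional[dict]:
    """Generate stem + correct answer first; only generate distractors
    once the stem survives review. Return None if the stem is rejected."""
    # Stage 1: stem and correct answer only.
    stem_and_key = call_model(
        f"Write one exam question stem and its correct answer on: {topic}"
    )
    if not approve(stem_and_key):
        return None  # bin it and generate another; don't polish a dud

    # Stage 2: distractors, seeded with known misconceptions so the
    # model doesn't default to easy, implausible options.
    distractors = call_model(
        f"Given this approved stem and answer:\n{stem_and_key}\n"
        "Write three distractors based on these misconceptions: "
        + "; ".join(misconceptions)
    )
    return {"stem_and_key": stem_and_key, "distractors": distractors}

# Demo with stubbed callables standing in for the model and the SME:
item = two_stage_item(
    call_model=lambda p: f"[model output for: {p[:30]}...]",
    approve=lambda text: True,
    topic="Ohm's law",
    misconceptions=["confusing current with voltage"],
)
```

The point the structure enforces is the one Tim makes: editing the stem after the fact invalidates the distractors, so the distractors are never generated until the stem is locked.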

The third issue — and arguably the most dangerous — is what Tim describes as being "confidently wrong." AI-generated content that looks polished and professional is more likely to slip through review unchallenged. The more perfect the first ten questions look, the more likely a reviewer is to nod through question eleven without really reading it. This anchoring effect is a known quality risk, and it's one that good workflow design needs to actively counteract.

"I call it confidently wrong. The more polished it looks, the easier it is that you're going to let that go."
— Tim Burnett, Founder, Test Community Network

During the webinar Tim also shared a live demo of a prototype authoring tool he built to test out his thinking on what good AI item generation should look like in practice. It's not a commercial product — it was built to prove a theory — but it illustrates some important principles. It uses a two-stage generation workflow. It builds candidate personas into the prompt so the AI has a concrete profile to write for. It automatically checks character counts across answer options to flag where the correct answer is suspiciously longer than the distractors (a telltale sign of poor distractor quality). And critically, it produces a full audit trail — recording the model used, the prompt settings, and the degree of human editing — in a format that meets the kind of transparency requirements bodies like Ofqual and NCCA are increasingly expecting.
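The answer-length check is simple enough to approximate in a few lines. This is a sketch of the general heuristic, not the prototype's actual code, and the 1.5x threshold is an assumed value you would tune for your own item bank.

```python
def flag_long_key(correct: str, distractors: list[str],
                  ratio: float = 1.5) -> bool:
    """Flag an item whose correct answer is suspiciously longer than
    the average distractor -- a pattern test-wise candidates exploit.
    The 1.5x ratio is an assumed threshold, not a standard."""
    avg_distractor = sum(len(d) for d in distractors) / len(distractors)
    return len(correct) > ratio * avg_distractor

# A padded, over-qualified key gets flagged:
flag_long_key(
    "The mitochondrion, which produces ATP via oxidative phosphorylation",
    ["The nucleus", "The ribosome", "The Golgi apparatus"],
)  # -> True
```

A check this cheap is exactly the kind of thing worth automating: it catches a known distractor-quality defect before a human reviewer ever sees the item.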

That last point is worth dwelling on for anyone at a senior level. The regulatory conversation around AI-generated items is moving. Transparency about how items were produced, which model was used, and what human oversight was applied is not just good practice — it's likely to become a requirement. Building your audit trail from day one is significantly easier than retrofitting it later.

"I strongly recommend you tag all items if they're AI-generated — even the model used, even the prompt. You can then test that prompt back at a later date to make sure everything's okay."
— Tim Burnett, Founder, Test Community Network
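Tagging items this way needs only a small metadata record attached to each one. A minimal sketch, with field names of my own choosing rather than any regulator-mandated schema:

```python
import json
from dataclasses import dataclass, asdict, field
from datetime import datetime, timezone

@dataclass
class ItemProvenance:
    """Audit-trail record for one AI-generated item.
    Field names are illustrative, not a regulator's schema."""
    item_id: str
    model: str       # model name/version used to generate the item
    prompt: str      # full prompt, so it can be re-tested later
    human_edit: str  # e.g. "none" | "minor" | "substantial"
    generated_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

record = ItemProvenance(
    item_id="ITEM-0001",
    model="example-reasoning-model-v1",
    prompt="Write one exam question stem on Ohm's law...",
    human_edit="minor",
)
print(json.dumps(asdict(record), indent=2))
```

Storing the full prompt alongside the model identifier is what makes the re-testing Tim describes possible: you can replay the same prompt later and compare outputs.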

Tim's overarching philosophy is one he sums up as "human first, AI forward, agent ready." AI isn't here to replace the expertise of your SMEs or the integrity of your assessment principles — it's here to augment them. The organisations that will get the most out of it are the ones that invest in getting the foundations right: choosing the right model for their domain, building proper prompting into their workflows, and maintaining meaningful human oversight throughout.

If your organisation is exploring how to bring AI into item production — or you're already doing it but not getting the quality results you'd hoped for — Tim is available to help. He works directly with awarding organisations on short, focused projects: exploring your specific challenge, reviewing your current approach, and helping you identify what good looks like for your context. It's practical, experienced support from someone who has worked across the industry and spent considerable time testing what actually works.

To find out more or to schedule a conversation, get in touch with Tim via the Test Community Network: https://calendly.com/educationtech/15min-teams-chat