This research represents the first comprehensive examination of whether artificial intelligence can design assessments capable of passing New Zealand's national moderation system. Conducted in partner...
Research
The "AI Adoption Playbook for UK Awarding Organisations" offers invaluable guidance for responsibly integrating AI into educational assessments. Featuring 37 practical strategies and expert advice, it...
Content
The research article explores ChatGPT's performance on multiple-choice question (MCQ) examinations in higher education. It finds that GPT-3.5 versions perform better than random guessing but often fai...
Report
BMC Medical Education's study compared AI-generated and clinician-designed multiple-choice questions (MCQs) for emergency medicine exams. AI-generated questions, while generally easier and associated ...
Report
The study evaluates ChatGPT-4o's effectiveness in generating medical examination multiple-choice questions (MCQs) compared to humans. AI-generated questions were quicker to produce and had comparable ...
Report
Sergio Araneda from Caveon will present research on "item pre-exposure" - a new test security risk from using Generative AI for item construction. His experiments show 40% overlap in AI-generated ques...
Content
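A rough illustration of the "item pre-exposure" concern described above: if two parties prompt a generative model independently, some of the items they receive may be near-duplicates. The sketch below is not from the presentation; the similarity measure, threshold, and item banks are illustrative assumptions. It estimates overlap between two generated item banks using word-level Jaccard similarity.

```python
import re

def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity between two item stems (punctuation ignored)."""
    sa = set(re.findall(r"[a-z0-9']+", a.lower()))
    sb = set(re.findall(r"[a-z0-9']+", b.lower()))
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def overlap_rate(bank_a: list[str], bank_b: list[str], threshold: float = 0.6) -> float:
    """Share of items in bank_a that closely match at least one item in bank_b."""
    matches = sum(
        1 for item in bank_a
        if any(jaccard(item, other) >= threshold for other in bank_b)
    )
    return matches / len(bank_a) if bank_a else 0.0

# Hypothetical item banks generated from the same prompt on separate occasions.
bank_a = ["Which organ produces insulin?", "What is the capital of France?"]
bank_b = ["Which organ produces the hormone insulin?", "Name the largest planet."]
print(f"Estimated pre-exposure overlap: {overlap_rate(bank_a, bank_b):.0%}")
```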
Silk Data explores AI-powered question generation for businesses, covering automated question creation from text, audio, and video inputs. The article discusses applications in e-learning, HR recruitm...
Content
Research paper presenting an AI-powered system that automates question paper generation for educational institutions. The system uses Gemini-1.5-Pro and rule-based algorithms to create balanced, sylla...
Research
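The rule-based side of a system like the one described typically enforces a syllabus blueprint: a set number of questions per topic and difficulty band. A minimal sketch of that allocation step follows; the topics, blueprint counts, and item pool are invented for illustration, and the LLM generation step is not shown.

```python
import random

# Hypothetical blueprint: (topic, difficulty) -> number of questions required.
blueprint = {
    ("algebra", "easy"): 2,
    ("algebra", "hard"): 1,
    ("geometry", "easy"): 1,
    ("geometry", "hard"): 1,
}

# Hypothetical item pool; in the described system these would come from an LLM.
item_pool = [
    {"topic": "algebra", "difficulty": "easy", "stem": "Solve 2x + 3 = 7."},
    {"topic": "algebra", "difficulty": "easy", "stem": "Expand (x + 1)(x + 2)."},
    {"topic": "algebra", "difficulty": "hard", "stem": "Prove the quadratic formula."},
    {"topic": "geometry", "difficulty": "easy", "stem": "Find the area of a 3-4-5 triangle."},
    {"topic": "geometry", "difficulty": "hard", "stem": "Prove the inscribed angle theorem."},
]

def assemble_paper(pool, blueprint, seed=0):
    """Rule-based selection: fill each blueprint cell from matching pool items."""
    rng = random.Random(seed)
    paper = []
    for (topic, difficulty), count in blueprint.items():
        candidates = [q for q in pool if q["topic"] == topic and q["difficulty"] == difficulty]
        if len(candidates) < count:
            raise ValueError(f"Not enough items for {topic}/{difficulty}")
        paper.extend(rng.sample(candidates, count))
    return paper

for q in assemble_paper(item_pool, blueprint):
    print(q["topic"], q["difficulty"], "-", q["stem"])
```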
This research presents a human-in-the-loop framework for automatic item generation using AI to create multiple exam variants whilst maintaining psychometric rigor. The study demonstrates how educators...
Research
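One mechanical way to produce multiple exam variants while keeping items comparable is to permute answer options and re-derive the key, leaving substantive rewriting to the generative model and the final check to a human reviewer. The sketch below shows only that permutation step; the item content and review flag are assumptions, not details from the paper.

```python
import random

def make_variant(item: dict, seed: int) -> dict:
    """Create a surface variant of an MCQ by shuffling its options and re-deriving the key."""
    rng = random.Random(seed)
    options = item["options"][:]          # copy so the original item is untouched
    rng.shuffle(options)
    return {
        "stem": item["stem"],
        "options": options,
        "key": options.index(item["options"][item["key"]]),  # new index of the correct option
        "needs_human_review": True,        # flag for the human-in-the-loop step
    }

# Hypothetical item; "key" is the index of the correct option.
item = {"stem": "Which gas do plants absorb during photosynthesis?",
        "options": ["Oxygen", "Carbon dioxide", "Nitrogen", "Hydrogen"],
        "key": 1}

for seed in range(3):
    v = make_variant(item, seed)
    print(v["options"], "correct:", v["options"][v["key"]])
```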
UNESCO MGIEP explores how artificial intelligence can transform educational assessment practices, addressing challenges in measuring complex cognitive skills needed for solving wicked problems like cl...
Content
Comprehensive analysis of how AI is transforming UK awarding organisations, assessment practices, and skills development. Examines regulatory frameworks, quality assurance innovations, learner engagem...
Content
Taylor Educational Consulting explores ethical considerations for using generative AI in qualification design and assessment development. The article covers legal compliance, copyright issues, GDPR re...
Content
A new EY-Parthenon–FICCI report reveals that over 60% of Indian higher education institutions now allow students to use AI tools, with 53% using generative AI for learning materials. The study of 30 l...
News
California enacted three new pieces of legislation to strengthen oversight of the state's bar exam following technical problems with the February 2025 attorney licensing test. The laws require advance...
News
The Standards and Testing Agency is piloting AI-generated questions in SATs moderator standardisation tests to reduce costs and school workload. The trial explores whether large language models can cr...
News
Tim Burnett discusses AI adoption strategies for UK awarding organisations, covering deployment options from consumer chat UIs to on-premise solutions. The article explores API routes, enterprise plat...
Content
Academic research examining bias and fairness in automated scoring systems used in educational testing. The paper surveys predictive methods that can lead to biased results, provides definitions of fa...
Research
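One of the simplest group-fairness checks in this literature compares scoring errors across demographic groups, for example the average machine-minus-human score difference per group. The sketch below computes that quantity for invented data; the group labels and scores are illustrative assumptions, not the paper's dataset or its preferred fairness definition.

```python
import statistics

def mean_score_difference(machine, human):
    """Average machine-minus-human score difference for one group."""
    return statistics.mean(m - h for m, h in zip(machine, human))

# Hypothetical essay scores for two demographic groups.
groups = {
    "group_a": {"machine": [3, 4, 2, 5, 3], "human": [3, 4, 3, 5, 3]},
    "group_b": {"machine": [2, 3, 3, 2, 4], "human": [3, 4, 3, 3, 4]},
}

diffs = {name: mean_score_difference(g["machine"], g["human"]) for name, g in groups.items()}
print(diffs)

# A fairness flag might fire if one group is systematically under-scored relative to another.
gap = abs(diffs["group_a"] - diffs["group_b"])
print(f"Between-group gap in machine-human difference: {gap:.2f}")
```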
Research paper presenting a multi-task generalized linear model with BERT features to estimate test item difficulties for adaptive language assessments. The method rapidly improves difficulty estimate...
Research
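At a high level, models of this kind embed each item's text with a pretrained encoder and fit a (generalized) linear model from the embedding to an observed difficulty value. The sketch below is a simplified stand-in rather than the paper's multi-task formulation: it uses mean-pooled `bert-base-uncased` embeddings with a ridge regression, and the item texts and difficulty labels are invented.

```python
# pip install transformers torch scikit-learn
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import Ridge

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

def embed(texts):
    """Mean-pooled BERT embeddings used as item features."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state       # (batch, seq, dim)
    mask = batch["attention_mask"].unsqueeze(-1)           # ignore padding tokens
    return ((hidden * mask).sum(1) / mask.sum(1)).numpy()

# Hypothetical items with observed difficulty (proportion answering correctly).
items = ["Choose the synonym of 'rapid'.",
         "Identify the gerund in the sentence.",
         "Match the idiom to its meaning."]
difficulty = [0.85, 0.55, 0.40]

model = Ridge(alpha=1.0).fit(embed(items), difficulty)
print(model.predict(embed(["Select the antonym of 'scarce'."])))
```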
Generative AI is revolutionising assessment item generation, offering 10x to 100x production gains while maintaining psychometric quality comparable to human-authored items. However, success requires ...
Research
Comprehensive research review examining AI's role in generating assessment items across academic, professional, and psychometric domains. Shows AI can produce high-quality content comparable to human-...
Research
TTS Talent examines the scientific and ethical risks of using AI-generated psychometric assessments, comparing them to rigorous test development standards. The article highlights concerns about lack o...
Content
ACRP and The Academy of Clinical Research Professionals are conducting a study to evaluate AI-assisted item writing for certification exams. The research compares AI-generated questions with human-wri...
Research
This study evaluated 80 multiple-choice questions created by ChatGPT-4 for undergraduate psychology education. Results showed AI-generated items had reasonable content validity but limitations in asse...
Research
Pearson VUE explores how generative AI can assist in developing test items for driving theory tests, presenting research findings on AI-generated content quality compared to human-written items, while...
Content
A comprehensive study evaluating AI-generated exam questions across 91 college classes with nearly 1,700 students. Researchers developed an iterative refinement strategy using large language models to...
Research
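Iterative refinement in this context usually means generating a draft item, asking the model (or a rubric-based checker) to critique it, and revising until the critique passes. The sketch below shows the control flow only; `call_llm` is a hypothetical placeholder, and the prompts are not the researchers' actual pipeline.

```python
def call_llm(prompt: str) -> str:
    """Hypothetical wrapper around whichever LLM API is in use."""
    raise NotImplementedError

def refine_item(learning_objective: str, max_rounds: int = 3) -> str:
    """Generate an MCQ, then alternate critique and revision until the critique passes."""
    item = call_llm(f"Write one multiple-choice question assessing: {learning_objective}")
    for _ in range(max_rounds):
        critique = call_llm(
            "Review this question for ambiguity, cueing, and implausible distractors. "
            f"Reply 'PASS' if acceptable, otherwise list problems.\n\n{item}"
        )
        if critique.strip().upper().startswith("PASS"):
            break
        item = call_llm(f"Revise the question to fix these problems:\n{critique}\n\nQuestion:\n{item}")
    return item
```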
Research study examining ChatGPT's ability to create physics concept inventory items. After careful prompt engineering and expert evaluation, ChatGPT-generated items showed medium difficulty and discr...
Research
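The difficulty and discrimination being reported are classical test theory statistics: difficulty is the proportion of respondents answering an item correctly, and discrimination reflects how strongly an item separates stronger from weaker test-takers. The sketch below computes a standard version of both (item p-value and item-rest correlation) for an invented response matrix; the data are purely illustrative.

```python
import numpy as np

# Hypothetical 0/1 response matrix: rows = test-takers, columns = items.
responses = np.array([
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
])

difficulty = responses.mean(axis=0)   # proportion correct per item (higher = easier)
total = responses.sum(axis=1)

# Discrimination: correlation of each item with the total score on the remaining items.
discrimination = np.array([
    np.corrcoef(responses[:, j], total - responses[:, j])[0, 1]
    for j in range(responses.shape[1])
])

for j, (p, d) in enumerate(zip(difficulty, discrimination)):
    print(f"item {j}: difficulty p = {p:.2f}, discrimination r = {d:.2f}")
```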
Research paper exploring deep learning approaches for automated test item generation using recurrent neural networks, presenting an alternative to traditional human-written assessment items by impleme...
Research
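Recurrent approaches to item generation are typically next-token language models trained on a corpus of existing items. Below is a minimal word-level LSTM generator in PyTorch as a sketch of the idea; the vocabulary size, dimensions, and greedy decoding loop are illustrative assumptions, not the architecture from the paper, and it would need training data before producing sensible items.

```python
import torch
import torch.nn as nn

class ItemGenerator(nn.Module):
    """Minimal word-level LSTM language model for drafting assessment items."""
    def __init__(self, vocab_size: int, embed_dim: int = 64, hidden_dim: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens, state=None):
        x = self.embed(tokens)
        x, state = self.lstm(x, state)
        return self.out(x), state

    @torch.no_grad()
    def sample(self, start_token: int, max_len: int = 20):
        """Greedy decoding from a start token; real use would sample after training."""
        tokens = [start_token]
        state = None
        for _ in range(max_len):
            logits, state = self(torch.tensor([[tokens[-1]]]), state)
            tokens.append(int(logits[0, -1].argmax()))
        return tokens

model = ItemGenerator(vocab_size=1000)
print(model.sample(start_token=1))  # token IDs only; a tokenizer would map these back to words
```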
This comprehensive review examines 60 studies on using large language models (LLMs) like T5, BERT, and GPT for automatic item generation in educational assessment. The research reveals that whilst LLM...
Research
Research paper introducing STAIR-AIG, a human-in-the-loop framework that integrates expert judgment to optimize AI-generated assessment items for critical thinking. The study compares evaluations by h...
Research
Research paper exploring automated generation of distractors for math multiple-choice questions using large language models. The study compares various LLM approaches including in-context learning, fi...
Research
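The in-context learning approach amounts to showing the model a few worked examples of stems paired with plausible wrong answers that reflect common student errors, then asking it to continue the pattern. The prompt below is a hypothetical illustration of that format, not one of the paper's prompts, and `call_llm` is a placeholder for whichever model is used.

```python
FEW_SHOT_PROMPT = """You write plausible but incorrect answer options (distractors) for math MCQs.
Each distractor should reflect a common student error.

Question: What is 3/4 + 1/8?
Correct answer: 7/8
Distractors: 4/12 (added numerators and denominators), 5/8 (subtracted instead of adding), 4/8 (added numerators over the larger denominator)

Question: Solve for x: 2x + 6 = 10
Correct answer: x = 2
Distractors: x = 4 (forgot to divide by 2), x = 5 (divided 10 by 2 but ignored the 6), x = -2 (sign error)

Question: {question}
Correct answer: {answer}
Distractors:"""

def call_llm(prompt: str) -> str:
    """Hypothetical placeholder for an LLM completion call."""
    raise NotImplementedError

def generate_distractors(question: str, answer: str) -> str:
    return call_llm(FEW_SHOT_PROMPT.format(question=question, answer=answer))
```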
Research exploring GPT-4's potential to automate generation of personality situational judgment tests in Chinese. Two studies found that optimised prompts with temperature 1.0 produced creative, accur...
Research
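Temperature 1.0 here refers to the sampling temperature passed to the chat completion API, kept high enough for varied, creative scenario writing. A minimal sketch of such a call with the official `openai` Python client follows; the model name, system prompt, and target trait are illustrative assumptions, and the studies' actual (Chinese-language) prompts are not reproduced.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def generate_sjt_item(trait: str) -> str:
    """Ask the model for one situational judgment test item targeting a personality trait."""
    response = client.chat.completions.create(
        model="gpt-4",      # assumed model name for illustration
        temperature=1.0,    # the sampling temperature discussed in the studies
        messages=[
            {"role": "system", "content": "You design situational judgment test items."},
            {"role": "user", "content": (
                f"Write one workplace scenario with four response options that differ in {trait}. "
                "Label which option reflects high and low levels of the trait."
            )},
        ],
    )
    return response.choices[0].message.content

print(generate_sjt_item("conscientiousness"))
```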