AI Marking and Feedback: A Teacher's Guide
A practical guide to using AI for marking and feedback in UK schools. Covers what AI can and cannot mark, subject-specific approaches, tool comparison.


AI marking tools can now grade multiple-choice quizzes, highlight grammatical errors and generate written feedback on pupil work. They cannot, however, judge the quality of a historical argument, recognise a pupil's growing confidence, or notice that a quiet Year 9 student finally attempted the extension task. Understanding this boundary is what separates effective AI-assisted marking from a dangerous shortcut.
The Department for Education's 2025 guidance on AI in schools specifically addresses marking and feedback, noting that "AI can support teachers in providing timely feedback but should not replace professional judgement on pupil attainment" (DfE, 2025). This article sets out what that looks like in practice, subject by subject, from primary to Key Stage 4.
The distinction between what AI can and cannot mark is straightforward. AI marking tools work well on tasks with clear right-or-wrong answers and struggle with anything requiring interpretation. A Year 3 spelling test and a GCSE English Language Paper 2 response require fundamentally different kinds of assessment.
| Task Type | AI Reliability | Teacher Role | Example |
|---|---|---|---|
| Multiple-choice quizzes | High | Review misconception patterns | KS2 science end-of-topic quiz |
| Short-answer recall | High | Check for partial credit edge cases | Year 8 history key dates |
| Grammar and spelling | High | None needed for surface errors | Year 5 writing SPaG check |
| Maths calculations | High | Review method marks vs answer marks | Year 10 algebra homework |
| Extended writing (argument) | Low | Full assessment required | GCSE English Language Paper 2 |
| Creative writing | Very low | Full assessment required | Year 7 narrative writing |
| Practical/performance | Not applicable | Teacher observation only | PE, drama, science practicals |
The key principle: AI should mark the work that takes you the most time but requires the least professional judgement. A set of 30 vocabulary tests takes 45 minutes of a teacher's evening. AI handles them in seconds, with equal accuracy. That 45 minutes is better spent writing targeted feedback on three pupils' essays.

AI feedback is instant, consistent and impersonal. Teacher feedback is slower, variable and deeply contextual. Both have value, and the research suggests they work best in combination (Hattie and Timperley, 2007).
When a Year 10 pupil submits a geography essay, an AI tool can identify structural weaknesses: paragraphs without topic sentences, missing evidence, or conclusions that introduce new information. What the AI cannot do is recognise that this particular pupil struggled with paragraph structure all term and has finally produced a coherent opening. That contextual knowledge changes what feedback you give.
The most effective model is what researchers call "AI-first, teacher-last" feedback (Kasneci et al., 2023). The AI generates an initial response. The teacher reviews it, removes anything inaccurate, adds personal context, and decides what the pupil sees. This takes less time than writing feedback from scratch but produces something better than either teacher or AI could manage alone.
Not all feedback is equal. Hattie and Timperley's (2007) model identifies four levels, and AI performs differently at each.
| Feedback Level | What It Addresses | AI Capability | Teacher Action |
|---|---|---|---|
| Task (correctness) | Is the answer right or wrong? | Strong | Trust AI output for factual tasks |
| Process (strategy) | How did the pupil approach the task? | Moderate | Review and refine AI suggestions |
| Self-regulation (metacognition) | Can the pupil monitor their own learning? | Weak | Write metacognitive prompts yourself |
| Self (personal) | How does the pupil feel about the work? | None | Personal, relational feedback only |
The practical implication: use AI for task-level and some process-level feedback. Reserve your time for self-regulation and personal feedback, where your knowledge of the pupil is irreplaceable. This aligns with what Dylan Wiliam (2011) calls "responsive teaching": using assessment information to adapt instruction in real time.
The market for AI marking tools is growing rapidly, but quality varies. Some tools are designed specifically for UK education; others are adapted from American systems with different curricula and assessment frameworks. Here is what is currently available, with limitations clearly stated.
| Tool | Best For | Limitations | UK Curriculum Alignment |
|---|---|---|---|
| Marking.ai | KS3/KS4 extended writing feedback | Requires rubric setup; inconsistent on creative writing | Strong (UK-built) |
| Grammarly for Education | Grammar, spelling, tone | Surface-level only; no content assessment | Moderate (US-default, UK mode available) |
| ChatGPT / Claude | Generating draft feedback comments | No pupil data; generic without strong prompts | Neutral (depends on prompt) |
| Educake | Science quizzes with auto-marking | Science-only; limited feedback depth | Strong (UK exam board aligned) |
| Carousel Learning | Retrieval practice with spaced repetition | Quiz-based only; no extended writing | Strong (UK teacher-built) |
| Seneca Learning | KS3-KS5 revision with adaptive feedback | Pre-set content; limited teacher customisation | Strong (UK spec aligned) |
No single tool replaces a teacher's marking. The most effective approach combines two or three tools for different task types: one for quiz auto-marking, one for writing feedback generation, and your own professional judgement for everything else.
The value of AI marking varies significantly across subjects. What works in mathematics homework does not transfer directly to English literature essays. Here is a subject-by-subject breakdown of where AI adds genuine value and where it falls short.
AI can check SPaG (spelling, punctuation and grammar) with high accuracy. It can identify missing paragraphs, flag overuse of simple sentences, and detect where a pupil has not addressed the question. What it cannot do is assess the quality of a metaphor, the effectiveness of a structural choice, or whether a Year 11 student has developed a convincing personal voice. Use AI to handle the surface features of writing so you can focus your marking time on content and craft.
Practical example: After a Year 9 persuasive writing task, run all 30 pieces through Grammarly for Education to flag SPaG errors. Then spend your marking time on argument structure, evidence use and rhetorical technique. You have saved 40 minutes on surface marking and redirected that time to the feedback that actually shifts grades.
Maths is where AI marking works best. Correct answers are unambiguous, and many platforms can now trace method marks by recognising working-out steps. Tools like MyMaths and Hegarty Maths auto-mark homework and generate reports showing which topics need reteaching. The limitation is non-standard methods: a pupil who solves a problem using an unconventional but valid approach may be marked incorrect by an algorithm expecting a specific method.
Practical example: Set a Year 7 fractions homework on Hegarty Maths. The platform marks it overnight and produces a class summary showing that 18 of 28 pupils struggled with converting mixed numbers. You now have diagnostic data before the next lesson, without marking a single paper.
AI handles factual recall questions well. Educake is widely used in UK science departments for end-of-topic quizzes, and its auto-marking is reliable for closed questions. The challenge comes with "explain" and "evaluate" questions, where pupils must construct scientific arguments. These require a teacher's understanding of whether the pupil has demonstrated genuine conceptual understanding or simply recalled key phrases.
Practical example: Use Educake for a Year 10 biology quiz on cell division. The platform auto-marks 20 recall questions and flags three pupils who consistently confuse mitosis and meiosis. For the two 6-mark "explain" questions, you mark those yourself, using the quiz data to target your written feedback.
History, geography and RE involve extended analytical writing where AI marking is least reliable. A history essay on the causes of World War One requires the assessor to evaluate source interpretation, argument strength and historical reasoning, none of which current AI tools handle well. Where AI adds value is in the preparatory stages: checking that pupils have included required source references, flagging essays that are significantly under the word count, and identifying structural weaknesses like missing conclusions.
Practical example: Before marking a set of Year 8 history essays, paste the question and mark scheme into ChatGPT and ask it to generate a checklist of what a strong answer includes. Then use that checklist as a marking aid, rather than relying on AI to assess the essays directly.
In primary settings, AI marking works well for phonics checks, spelling tests and times tables quizzes. Many schools already use Times Tables Rock Stars, which auto-marks and tracks progress without any teacher input. For writing assessment in Key Stage 1 and 2, teacher moderation remains essential because the writing frameworks (working towards, expected, greater depth) require professional judgement about consistency across a piece of work.
Practical example: A Year 4 teacher uses Times Tables Rock Stars for daily recall practice and an auto-marked reading comprehension quiz on Purple Mash for homework. This frees two hours per week for detailed feedback on extended writing, where teacher assessment against the writing frameworks is required.
If you are using a general-purpose AI tool like ChatGPT or Claude to generate feedback, the quality of the output depends entirely on the quality of your prompt. A vague instruction produces vague feedback. A specific, structured prompt produces feedback you can use.
| Weak Prompt | Strong Prompt |
|---|---|
| "Mark this essay" | "This is a Year 10 GCSE English Language Paper 2 response on animal testing. Using AQA's mark scheme for Question 5 (content and organisation: 24 marks; SPaG: 16 marks), identify two strengths and two areas for improvement. Write feedback in second person, addressed to the pupil." |
| "Give feedback on this work" | "This Year 8 pupil wrote a paragraph explaining why Henry VIII broke from Rome. The learning objective was to use evidence to support a historical claim. Provide one piece of praise (what they did well) and one 'next step' (specific improvement). Keep the language at a reading age of 12." |
The five elements of an effective AI marking prompt are: year group, subject and exam board, the specific task or question, the assessment criteria, and the format you want the feedback in. Missing any one of these produces generic output.
Here is a complete prompt you could paste into any AI tool:
"You are a KS3 science teacher in a UK state school. A Year 9 pupil has answered the following 6-mark question: 'Explain how vaccination prevents disease.' Their response is below. Using the AQA trilogy science mark scheme for 6-mark questions, provide: (1) a mark out of 6 with brief justification, (2) one specific strength with a quote from their work, (3) one specific improvement with an example of what they should have written. Write all feedback addressed to the pupil using 'you' language."
This level of specificity consistently produces feedback that teachers find usable. Without the exam board, year group and format instructions, the same AI tool produces generic comments that add no value.
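For teachers comfortable with a little scripting, the five elements can be captured as a reusable template so a prompt never goes out with an element missing. This is a minimal Python sketch; the class name, field names and example values are illustrative, not taken from any particular tool:

```python
from dataclasses import dataclass

@dataclass
class MarkingPrompt:
    """The five elements of an effective AI marking prompt."""
    year_group: str         # e.g. "Year 9"
    subject_and_board: str  # e.g. "KS3 science (AQA Trilogy)"
    task: str               # the question the pupil answered
    criteria: str           # mark scheme or success criteria
    feedback_format: str    # how the feedback should be written

    def build(self, pupil_response: str) -> str:
        # Assemble the full prompt. Because every element is a required
        # field, an incomplete prompt fails at construction time instead
        # of silently producing generic feedback.
        return (
            f"You are a {self.subject_and_board} teacher in a UK state school.\n"
            f"A {self.year_group} pupil has answered this question: {self.task}\n"
            f"Assessment criteria: {self.criteria}\n"
            f"Feedback format: {self.feedback_format}\n\n"
            f"Pupil response:\n{pupil_response}"
        )

prompt = MarkingPrompt(
    year_group="Year 9",
    subject_and_board="KS3 science (AQA Trilogy)",
    task="Explain how vaccination prevents disease. (6 marks)",
    criteria="AQA Trilogy levels of response for 6-mark questions",
    feedback_format="A mark out of 6 with justification, one strength "
                    "quoting their work, one improvement, addressed to "
                    "the pupil as 'you'",
)
print(prompt.build("Vaccines contain a dead or weakened pathogen..."))
```

The same template can then be reused across a whole class set, with only the pupil response changing between calls.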
AI marking tools are trained on existing data, which means they reproduce existing biases. Research has shown that automated essay scoring systems can penalise non-standard English dialects, favour longer responses regardless of quality, and score formulaic writing higher than original thinking (Bridgeman et al., 2012).
For UK teachers, the specific risks include:
Dialect bias: Pupils who write in regional or culturally influenced English may receive lower scores from AI tools trained primarily on standard academic English. A Year 9 pupil in Birmingham writing "I was proper shocked" in a creative piece is making a deliberate stylistic choice, not a grammatical error.
Length bias: Most AI grading systems correlate length with quality. A concise, well-argued paragraph may score lower than a rambling, repetitive one simply because it is shorter. This particularly affects pupils with SEND who may write less but with greater precision.
Formulaic preference: AI tools trained on high-scoring exam responses learn to reward structural conventions (PEEL paragraphs, topic sentences, discourse markers) even when a pupil achieves the same quality through a less conventional structure. This can disadvantage creative or divergent thinkers.
The practical response is straightforward: never use AI as the sole assessor for any work that contributes to pupil grades. Use it as a first-pass filter, then apply your own professional judgement. Where you notice patterns of bias, adjust the tool's rubric or switch to manual assessment for that task type.
Uploading pupil work to AI tools creates data protection obligations. Under UK GDPR, pupil work containing personal information (names, schools, identifiable details) is personal data. Before using any AI marking tool, check three things.
Where is the data processed? Tools using US-based servers may not meet UK adequacy requirements. Check whether the tool offers a UK or EU data centre option. Marking.ai, for example, processes data within the UK; ChatGPT's free tier processes data globally.
Is pupil work used for training? Some AI tools use submitted text to improve their models. This means a pupil's essay could influence future outputs. Check the tool's terms of service for data retention and training clauses. Where possible, use tools that explicitly exclude educational data from model training.
Do you have a DPIA? A Data Protection Impact Assessment is required when processing children's data at scale. Your school's Data Protection Officer should review any AI marking tool before it is deployed across a department. The DfE (2025) recommends that schools maintain a register of all AI tools used with pupil data.
A simple safeguard: before uploading pupil work, remove names and replace them with candidate numbers or initials. This reduces the data protection risk to near zero while preserving the AI tool's ability to provide useful feedback.
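That safeguard can be automated. The sketch below assumes you keep a simple mapping from pupil names to candidate numbers; the name shown is invented for illustration, and a real class list would need checking for first-name-only mentions too:

```python
import re

def anonymise(text: str, class_list: dict[str, str]) -> str:
    """Replace each pupil name in `text` with a candidate number.

    class_list maps full names to candidate numbers,
    e.g. {"Jane Smith": "C014"} (illustrative, not a real pupil).
    """
    for name, candidate_no in class_list.items():
        # Case-insensitive, so "jane smith" inside the pupil's own
        # writing is caught too; re.escape guards against names
        # containing characters that are special in regexes.
        text = re.sub(re.escape(name), candidate_no, text,
                      flags=re.IGNORECASE)
    return text

redacted = anonymise(
    "Jane Smith argued that vaccination protects whole populations.",
    {"Jane Smith": "C014"},
)
print(redacted)  # "C014 argued that vaccination protects whole populations."
```

Run pupil work through a function like this before it leaves the school network; keep the name-to-number mapping itself on school systems only.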
The most effective approach is not to replace your existing marking with AI but to restructure it so that AI handles the routine tasks and you focus on the high-value assessment work. Here is a weekly workflow that several UK schools have adopted successfully.
| Day | AI Task | Teacher Task | Time Saved |
|---|---|---|---|
| Monday | Auto-mark weekend homework quizzes | Review misconception reports, plan reteaching | 30 min |
| Tuesday | Generate draft feedback for extended writing | Review, personalise and approve feedback | 45 min |
| Wednesday | SPaG check on collected classwork | Focus on content quality, not surface errors | 20 min |
| Thursday | Auto-mark mid-week retrieval practice | Identify pupils needing intervention | 20 min |
| Friday | Generate weekly progress summaries | Review summaries, update records, plan next week | 30 min |
This workflow saves approximately 2.5 hours per week. Over a 39-week school year, that is nearly 100 hours redirected from routine marking to higher-value teaching tasks: planning better lessons, providing targeted intervention, and building relationships with pupils.
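The headline figure follows directly from the table. Summing the daily savings and scaling to a 39-week year can be checked in a few lines (the day labels simply mirror the table's rows):

```python
# Minutes saved per day, taken from the weekly workflow table above
weekly_savings_min = {"Mon": 30, "Tue": 45, "Wed": 20, "Thu": 20, "Fri": 30}

per_week_min = sum(weekly_savings_min.values())  # 145 minutes, about 2.5 hours
per_year_hours = per_week_min * 39 / 60          # 39-week school year

print(per_week_min, per_year_hours)  # 145 94.25 — close to 100 hours a year
```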
Schools adopting AI marking tools make predictable errors. Recognising these in advance saves significant wasted effort.
1. Trusting AI grades for reporting. AI-generated marks should never go directly into a markbook without teacher verification. A tool that gives a Year 10 essay 18/30 may be broadly right, but the difference between 16 and 20 can determine a predicted grade. Only use AI marks for formative purposes.
2. Giving pupils raw AI feedback. Unreviewed AI feedback can be confusing, contradictory or inappropriate. A pupil who receives "Your analysis lacks depth" without further explanation is no better off than receiving no feedback at all. Always review before sharing.
3. Using the wrong tool for the task. Running a creative writing portfolio through a grammar checker does not constitute marking. Match the tool to the assessment objective. Grammar tools check grammar. They do not assess imagination, voice or narrative craft.
4. Ignoring the workload shift. AI marking does not eliminate workload; it shifts it. You spend less time on routine marking but more time reviewing AI outputs, managing data, and handling the inevitable edge cases where the AI gets it wrong. Budget for this transition period.
5. Skipping the training. DfE data from 2024 shows that 76% of teachers have received no formal training on AI tools (DfE, 2024). Without understanding what the tool can and cannot do, teachers either over-rely on it or abandon it after a bad experience. Invest 30 minutes in learning the tool's strengths and limits before deploying it with live pupil work.
AI tools can support peer and self-assessment by providing a reference point against which pupils compare their own work. Rather than asking "Is my essay good?", the pupil can ask the AI to identify structural features, then compare the AI's analysis with their own self-assessment. This builds metacognitive skills: the ability to evaluate one's own learning (Flavell, 1979).
A practical classroom approach: after pupils complete a piece of writing, ask them to self-assess against three success criteria. Then run the same piece through an AI tool that provides feedback against the same criteria. Pupils compare their self-assessment with the AI's analysis and write a short reflection on the differences. This teaches pupils to calibrate their own judgement, which is the foundation of independent learning.
The risk is that pupils treat AI feedback as the "correct" answer, undermining the purpose of self-assessment. Frame the AI output as "one perspective" rather than the definitive assessment. Emphasise that the teacher's judgement, informed by knowledge of the pupil, remains the standard against which work is measured.
AI marking involves using artificial intelligence software to grade pupil work and generate feedback. In schools, these tools automatically assess multiple-choice quizzes, check grammar and evaluate short factual answers. They work by comparing pupil responses against programmed rubrics and identifying patterns, saving teachers significant administrative time.
Teachers usually begin by using AI tools to mark routine assessments like vocabulary tests or homework quizzes. The most effective approach is the "AI-first, teacher-last" method, in which the AI creates the first draft of the comments. The teacher then reviews and edits this text to add personal context before returning the work to the pupil.
The primary benefit is a massive reduction in teacher workload for routine marking tasks. Research by the Education Endowment Foundation indicates that automating factual marking can free up three to five hours per week. Teachers can then spend this recovered time planning better lessons or writing highly targeted feedback for complex assignments.
The Department for Education states that AI can support teachers by providing timely feedback on routine tasks. However, official guidance explicitly warns that algorithms must not replace professional teacher judgement regarding pupil attainment. Educational research supports an approach where the AI generates an initial response and the teacher refines it before the pupil sees it.
A major mistake is trusting AI to accurately assess extended essays or creative writing. Algorithms currently struggle to evaluate the quality of a historical argument or recognise a pupil's unique creative voice. Another common error is giving automated feedback directly to pupils without a teacher reviewing it first for tone and personal context.
No current AI tool can reliably mark GCSE extended writing with the accuracy of an experienced teacher. While software can highlight spelling and grammatical errors, it cannot judge nuanced arguments or subject-specific reasoning. Teachers must still conduct full assessments for long-form answers to ensure marking aligns with specific exam board specifications.
Week 1: One class, one tool, one task type. Choose your most straightforward marking task (a homework quiz or vocabulary test) and one AI tool. Set the quiz, let the tool mark it, and spend 15 minutes reviewing the results. Note what the tool got right, what it missed, and how long the process took compared to manual marking.
Week 2: Add feedback generation. Take a set of extended writing from the same class. Use an AI tool to generate draft feedback, then review and personalise each piece before returning it to pupils. Track the time difference: how long did AI-assisted feedback take compared to writing it from scratch?
After two weeks, you have enough data to decide whether to expand AI marking to other classes and task types. Most teachers find that the initial investment in learning the tool pays back within the first month. The key is starting with tasks where AI is genuinely reliable, building confidence, and expanding gradually.
For a broader perspective on integrating AI into your teaching practice, see our guide to AI for teachers, which covers lesson planning, differentiation, and building AI literacy alongside assessment.
The research base on AI in educational assessment is growing rapidly. These papers provide the evidence behind the recommendations in this article.
Hattie, J. and Timperley, H. (2007). The Power of Feedback.
The foundational framework for understanding feedback in education. Identifies four feedback levels (task, process, self-regulation, self) and demonstrates that feedback targeting the process and self-regulation levels has the greatest impact on learning. Essential context for understanding where AI feedback fits.
Kasneci, E. et al. (2023). ChatGPT for Good? On Opportunities and Challenges of Large Language Models for Education.
Comprehensive analysis of how large language models can support teaching and learning. Particularly relevant for its discussion of AI-generated feedback quality and the "human-in-the-loop" model where teachers review AI outputs before pupils see them.
Wiliam, D. (2011). Embedded Formative Assessment.
The definitive guide to formative assessment in UK classrooms. Wiliam's five key strategies provide the framework within which AI marking tools should operate. His emphasis on "responsive teaching" aligns with using AI for rapid diagnostic data while reserving professional judgement for interpretive assessment.
Department for Education (2025). AI and the Future of Assessment in Education.
The UK government's position on AI use in schools, including specific guidance on marking and assessment. Establishes that AI should support rather than replace teacher judgement, and sets expectations for data protection when using AI tools with pupil work.
Bridgeman, B. et al. (2012). Automated Essay Scoring and Its Impact on Writing Assessment.
Critical research on the limitations of automated essay scoring, including evidence of bias against non-standard dialects and correlation between essay length and AI-assigned scores. Important reading for any school considering AI for writing assessment.
AI marking tools can now grade multiple-choice quizzes, highlight grammatical errors and generate written feedback on pupil work. They cannot, however, judge the quality of a historical argument, recognise a pupil's growing confidence, or notice that a quiet Year 9 student finally attempted the extension task. Understanding this boundary is what separates effective AI-assisted marking from a dangerous shortcut.
The Department for Education's 2025 guidance on AI in schools specifically addresses marking and feedback, noting that "AI can support teachers in providing timely feedback but should not replace professional judgement on pupil attainment" (DfE, 2025). This article sets out what that looks like in practice, subject by subject, from primary to Key Stage 4.
The distinction is straightforward. AI marking tools work well on tasks with clear right-or-wrong answers and struggle with anything requiring interpretation. A Year 3 spelling test and a GCSE English Language Paper 2 response require fundamentally different kinds of assessment.
| Task Type | AI Reliability | Teacher Role | Example |
|---|---|---|---|
| Multiple-choice quizzes | High | Review misconception patterns | KS2 science end-of-topic quiz |
| Short-answer recall | High | Check for partial credit edge cases | Year 8 history key dates |
| Grammar and spelling | High | None needed for surface errors | Year 5 writing SPaG check |
| Maths calculations | High | Review method marks vs answer marks | Year 10 algebra homework |
| Extended writing (argument) | Low | Full assessment required | GCSE English Language Paper 2 |
| Creative writing | Very low | Full assessment required | Year 7 narrative writing |
| Practical/performance | Not applicable | Teacher observation only | PE, drama, science practicals |
The key principle: AI should mark the work that takes you the most time but requires the least professional judgement. A set of 30 vocabulary tests takes 45 minutes of a teacher's evening. AI handles them in seconds, with equal accuracy. That 45 minutes is better spent writing targeted feedback on three pupils' essays.

AI feedback is instant, consistent and impersonal. Teacher feedback is slower, variable and deeply contextual. Both have value, and the research suggests they work best in combination (Hattie and Timperley, 2007).
When a Year 10 pupil submits a geography essay, an AI tool can identify structural weaknesses: paragraphs without topic sentences, missing evidence, or conclusions that introduce new information. What the AI cannot do is recognise that this particular pupil struggled with paragraph structure all term and has finally produced a coherent opening. That contextual knowledge changes what feedback you give.
The most effective model is what researchers call "AI-first, teacher-last" feedback (Kasneci et al., 2023). The AI generates an initial response. The teacher reviews it, removes anything inaccurate, adds personal context, and decides what the pupil sees. This takes less time than writing feedback from scratch but produces something better than either teacher or AI could manage alone.
Not all feedback is equal. Hattie and Timperley's (2007) model identifies four levels, and AI performs differently at each.
| Feedback Level | What It Addresses | AI Capability | Teacher Action |
|---|---|---|---|
| Task (correctness) | Is the answer right or wrong? | Strong | Trust AI output for factual tasks |
| Process (strategy) | How did the pupil approach the task? | Moderate | Review and refine AI suggestions |
| Self-regulation (metacognition) | Can the pupil monitor their own learning? | Weak | Write metacognitive prompts yourself |
| Self (personal) | How does the pupil feel about the work? | None | Personal, relational feedback only |
The practical implication: use AI for task-level and some process-level feedback. Reserve your time for self-regulation and personal feedback, where your knowledge of the pupil is irreplaceable. This aligns with what Dylan Wiliam (2011) calls "responsive teaching": using assessment information to adapt instruction in real time.
The market for AI marking tools is growing rapidly, but quality varies. Some tools are designed specifically for UK education; others are adapted from American systems with different curricula and assessment frameworks. Here is what is currently available, with limitations clearly stated.
| Tool | Best For | Limitations | UK Curriculum Alignment |
|---|---|---|---|
| Marking.ai | KS3/KS4 extended writing feedback | Requires rubric setup; inconsistent on creative writing | Strong (UK-built) |
| Grammarly for Education | Grammar, spelling, tone | Surface-level only; no content assessment | Moderate (US-default, UK mode available) |
| ChatGPT / Claude | Generating draft feedback comments | No pupil data; generic without strong prompts | Neutral (depends on prompt) |
| Educake | Science quizzes with auto-marking | Science-only; limited feedback depth | Strong (UK exam board aligned) |
| Carousel Learning | Retrieval practice with spaced repetition | Quiz-based only; no extended writing | Strong (UK teacher-built) |
| Seneca Learning | KS3-KS5 revision with adaptive feedback | Pre-set content; limited teacher customisation | Strong (UK spec aligned) |
No single tool replaces a teacher's marking. The most effective approach combines two or three tools for different task types: one for quiz auto-marking, one for writing feedback generation, and your own professional judgement for everything else.
The value of AI marking varies significantly across subjects. What works in mathematics homework does not transfer directly to English literature essays. Here is a subject-by-subject breakdown of where AI adds genuine value and where it falls short.
AI can check SPaG (spelling, punctuation and grammar) with high accuracy. It can identify missing paragraphs, flag overuse of simple sentences, and detect where a pupil has not addressed the question. What it cannot do is assess the quality of a metaphor, the effectiveness of a structural choice, or whether a Year 11 student has developed a convincing personal voice. Use AI to handle the surface features of writing so you can focus your marking time on content and craft.
Practical example: After a Year 9 persuasive writing task, run all 30 pieces through Grammarly for Education to flag SPaG errors. Then spend your marking time on argument structure, evidence use and rhetorical technique. You have saved 40 minutes on surface marking and redirected that time to the feedback that actually shifts grades.
Maths is where AI marking works best. Correct answers are unambiguous, and many platforms can now trace method marks by recognising working-out steps. Tools like MyMaths and Hegarty Maths auto-mark homework and generate reports showing which topics need reteaching. The limitation is non-standard methods: a pupil who solves a problem using an unconventional but valid approach may be marked incorrect by an algorithm expecting a specific method.
Practical example: Set a Year 7 fractions homework on Hegarty Maths. The platform marks it overnight and produces a class summary showing that 18 of 28 pupils struggled with converting mixed numbers. You now have diagnostic data before the next lesson, without marking a single paper.
AI handles factual recall questions well. Educake is widely used in UK science departments for end-of-topic quizzes, and its auto-marking is reliable for closed questions. The challenge comes with "explain" and "evaluate" questions, where pupils must construct scientific arguments. These require a teacher's understanding of whether the pupil has demonstrated genuine conceptual understanding or simply recalled key phrases.
Practical example: Use Educake for a Year 10 biology quiz on cell division. The platform auto-marks 20 recall questions and flags three pupils who consistently confuse mitosis and meiosis. For the two 6-mark "explain" questions, you mark those yourself, using the quiz data to target your written feedback.
History, geography and RE involve extended analytical writing where AI marking is least reliable. A history essay on the causes of World War One requires the assessor to evaluate source interpretation, argument strength and historical reasoning, none of which current AI tools handle well. Where AI adds value is in the preparatory stages: checking that pupils have included required source references, flagging essays that are significantly under the word count, and identifying structural weaknesses like missing conclusions.
Practical example: Before marking a set of Year 8 history essays, paste the question and mark scheme into ChatGPT and ask it to generate a checklist of what a strong answer includes. Then use that checklist as a marking aid, rather than relying on AI to assess the essays directly.
In primary settings, AI marking works well for phonics checks, spelling tests and times tables quizzes. Many schools already use Times Tables Rock Stars, which auto-marks and tracks progress without any teacher input. For writing assessment at Key Stages 1 and 2, teacher moderation remains essential because the writing frameworks (working towards, expected, greater depth) require professional judgement about consistency across a piece of work.
Practical example: A Year 4 teacher uses Times Tables Rock Stars for daily recall practice and an auto-marked reading comprehension quiz on Purple Mash for homework. This frees two hours per week for detailed feedback on extended writing, where teacher assessment against the writing frameworks is required.
If you are using a general-purpose AI tool like ChatGPT or Claude to generate feedback, the quality of the output depends entirely on the quality of your prompt. A vague instruction produces vague feedback. A specific, structured prompt produces feedback you can use.
| Weak Prompt | Strong Prompt |
|---|---|
| "Mark this essay" | "This is a Year 10 GCSE English Language Paper 2 response on animal testing. Using AQA's mark scheme for Question 5 (content and organisation: 24 marks; SPaG: 16 marks), identify two strengths and two areas for improvement. Write feedback in second person, addressed to the pupil." |
| "Give feedback on this work" | "This Year 8 pupil wrote a paragraph explaining why Henry VIII broke from Rome. The learning objective was to use evidence to support a historical claim. Provide one piece of praise (what they did well) and one 'next step' (specific improvement). Keep the language at a reading age of 12." |
The five elements of an effective AI marking prompt are: year group, subject and exam board, the specific task or question, the assessment criteria, and the format you want the feedback in. Missing any one of these produces generic output.
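As a sketch of how those five elements fit together, here is a hypothetical Python helper that assembles them into a single prompt. The function name, field names and example values are illustrative, not part of any tool's API:

```python
# Sketch: build a marking prompt from the five elements named above.
# All names and values here are hypothetical examples.

def build_marking_prompt(year_group, subject_and_board, task, criteria, feedback_format):
    """Combine year group, subject/board, task, criteria and format into one prompt."""
    return (
        f"You are marking work by a {year_group} pupil in {subject_and_board}.\n"
        f"Task: {task}\n"
        f"Assess against these criteria: {criteria}\n"
        f"Format your feedback as follows: {feedback_format}\n"
        "The pupil's response follows below."
    )

prompt = build_marking_prompt(
    year_group="Year 10",
    subject_and_board="GCSE English Language (AQA)",
    task="Paper 2 Question 5 persuasive writing on animal testing",
    criteria="AQA mark scheme: content and organisation (24 marks), SPaG (16 marks)",
    feedback_format="two strengths and two improvements, written in second person",
)
print(prompt)
```

Keeping the five elements as explicit parameters makes it obvious when one is missing, which is exactly the failure mode that produces generic output.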
Here is a complete prompt you could paste into any AI tool:
"You are a KS3 science teacher in a UK state school. A Year 9 pupil has answered the following 6-mark question: 'Explain how vaccination prevents disease.' Their response is below. Using the AQA trilogy science mark scheme for 6-mark questions, provide: (1) a mark out of 6 with brief justification, (2) one specific strength with a quote from their work, (3) one specific improvement with an example of what they should have written. Write all feedback addressed to the pupil using 'you' language."
This level of specificity consistently produces feedback that teachers find usable. Without the exam board, year group and format instructions, the same AI tool produces generic comments that add no value.
AI marking tools are trained on existing data, which means they reproduce existing biases. Research has shown that automated essay scoring systems can penalise non-standard English dialects, favour longer responses regardless of quality, and score formulaic writing higher than original thinking (Bridgeman et al., 2012).
For UK teachers, the specific risks include:
Dialect bias: Pupils who write in regional or culturally influenced English may receive lower scores from AI tools trained primarily on standard academic English. A Year 9 pupil in Birmingham writing "I was proper shocked" in a creative piece is making a deliberate stylistic choice, not a grammatical error.
Length bias: Most AI grading systems correlate length with quality. A concise, well-argued paragraph may score lower than a rambling, repetitive one simply because it is shorter. This particularly affects pupils with SEND who may write less but with greater precision.
Formulaic preference: AI tools trained on high-scoring exam responses learn to reward structural conventions (PEEL paragraphs, topic sentences, discourse markers) even when a pupil achieves the same quality through a less conventional structure. This can disadvantage creative or divergent thinkers.
The practical response is straightforward: never use AI as the sole assessor for any work that contributes to pupil grades. Use it as a first-pass filter, then apply your own professional judgement. Where you notice patterns of bias, adjust the tool's rubric or switch to manual assessment for that task type.
Uploading pupil work to AI tools creates data protection obligations. Under UK GDPR, pupil work containing personal information (names, schools, identifiable details) is personal data. Before using any AI marking tool, check three things.
Where is the data processed? Tools using US-based servers may not meet UK adequacy requirements. Check whether the tool offers a UK or EU data centre option. Marking.ai, for example, processes data within the UK; ChatGPT's free tier processes data globally.
Is pupil work used for training? Some AI tools use submitted text to improve their models. This means a pupil's essay could influence future outputs. Check the tool's terms of service for data retention and training clauses. Where possible, use tools that explicitly exclude educational data from model training.
Do you have a DPIA? A Data Protection Impact Assessment is required when processing children's data at scale. Your school's Data Protection Officer should review any AI marking tool before it is deployed across a department. The DfE (2025) recommends that schools maintain a register of all AI tools used with pupil data.
A simple safeguard: before uploading pupil work, remove names and replace them with candidate numbers or initials. This reduces the data protection risk to near zero while preserving the AI tool's ability to provide useful feedback.
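If you handle pupil work in bulk, that name-removal step can be scripted. A minimal Python sketch, assuming you maintain a simple mapping from full names to candidate numbers (the class list and essay text here are invented):

```python
import re

# Hypothetical class list: full name -> candidate number.
# In practice you would export this from your school's MIS.
class_list = {"Amira Khan": "C001", "Tom Smith": "C002"}

def pseudonymise(text, names_to_ids):
    """Replace full names and bare surnames with candidate numbers."""
    for name, candidate_id in names_to_ids.items():
        # Replace the full name first, then any remaining surname mentions.
        text = re.sub(re.escape(name), candidate_id, text, flags=re.IGNORECASE)
        surname = name.split()[-1]
        text = re.sub(rf"\b{re.escape(surname)}\b", candidate_id, text, flags=re.IGNORECASE)
    return text

original = "Amira Khan argues that testing is cruel. Khan supports this with evidence."
print(pseudonymise(original, class_list))
# → C001 argues that testing is cruel. C001 supports this with evidence.
```

A script like this is a first pass, not a guarantee: nicknames, misspellings and identifying details in the body of the work still need a human check before upload.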
The most effective approach is not to replace your existing marking with AI but to restructure it so that AI handles the routine tasks and you focus on the high-value assessment work. Here is a weekly workflow that several UK schools have adopted successfully.
| Day | AI Task | Teacher Task | Time Saved |
|---|---|---|---|
| Monday | Auto-mark weekend homework quizzes | Review misconception reports, plan reteaching | 30 min |
| Tuesday | Generate draft feedback for extended writing | Review, personalise and approve feedback | 45 min |
| Wednesday | SPaG check on collected classwork | Focus on content quality, not surface errors | 20 min |
| Thursday | Auto-mark mid-week retrieval practice | Identify pupils needing intervention | 20 min |
| Friday | Generate weekly progress summaries | Review summaries, update records, plan next week | 30 min |
This workflow saves approximately 2.5 hours per week. Over a 39-week school year, that is nearly 100 hours redirected from routine marking to higher-value teaching tasks: planning better lessons, providing targeted intervention, and building relationships with pupils.
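The arithmetic behind that estimate, worked through from the "Time Saved" column of the table above:

```python
# Weekly minutes saved, taken from the workflow table above.
weekly_minutes = {"Monday": 30, "Tuesday": 45, "Wednesday": 20, "Thursday": 20, "Friday": 30}

per_week = sum(weekly_minutes.values())   # 145 minutes, roughly 2.5 hours
per_year = per_week * 39 / 60             # hours across a 39-week school year
print(f"{per_week} min/week, about {per_year:.0f} hours/year")
# → 145 min/week, about 94 hours/year
```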
Schools adopting AI marking tools make predictable errors. Recognising these in advance saves significant wasted effort.
1. Trusting AI grades for reporting. AI-generated marks should never go directly into a markbook without teacher verification. A tool that gives a Year 10 essay 18/30 may be broadly right, but the difference between 16 and 20 can determine a predicted grade. Only use AI marks for formative purposes.
2. Giving pupils raw AI feedback. Unreviewed AI feedback can be confusing, contradictory or inappropriate. A pupil who receives "Your analysis lacks depth" without further explanation is no better off than receiving no feedback at all. Always review before sharing.
3. Using the wrong tool for the task. Running a creative writing portfolio through a grammar checker does not constitute marking. Match the tool to the assessment objective. Grammar tools check grammar. They do not assess imagination, voice or narrative craft.
4. Ignoring the workload shift. AI marking does not eliminate workload; it shifts it. You spend less time on routine marking but more time reviewing AI outputs, managing data, and handling the inevitable edge cases where the AI gets it wrong. Budget for this transition period.
5. Skipping the training. DfE data from 2024 shows that 76% of teachers have received no formal training on AI tools (DfE, 2024). Without understanding what the tool can and cannot do, teachers either over-rely on it or abandon it after a bad experience. Invest 30 minutes in learning the tool's strengths and limits before deploying it with live pupil work.
AI tools can support peer and self-assessment by providing a reference point against which pupils compare their own work. Rather than asking "Is my essay good?", the pupil can ask the AI to identify structural features, then compare the AI's analysis with their own self-assessment. This builds metacognitive skills: the ability to evaluate one's own learning (Flavell, 1979).
A practical classroom approach: after pupils complete a piece of writing, ask them to self-assess against three success criteria. Then run the same piece through an AI tool that provides feedback against the same criteria. Pupils compare their self-assessment with the AI's analysis and write a short reflection on the differences. This teaches pupils to calibrate their own judgement, which is the foundation of independent learning.
The risk is that pupils treat AI feedback as the "correct" answer, undermining the purpose of self-assessment. Frame the AI output as "one perspective" rather than the definitive assessment. Emphasise that the teacher's judgement, informed by knowledge of the pupil, remains the standard against which work is measured.
What is AI marking and how does it work in schools?
AI marking involves using artificial intelligence software to grade pupil work and generate feedback. In schools, these tools automatically assess multiple-choice quizzes, check grammar and evaluate short factual answers. They work by comparing pupil responses against programmed rubrics and identifying patterns, saving teachers significant administrative time.
How do teachers implement AI marking in the classroom?
Teachers usually begin by using AI tools to mark routine assessments like vocabulary tests or homework quizzes. The most effective approach is for the AI to create a first draft of the comments, which the teacher then reviews and edits to add personal context before returning the work to the pupil.
What are the benefits of using AI for pupil feedback?
The primary benefit is a substantial reduction in teacher workload for routine marking tasks. Research by the Education Endowment Foundation indicates that automating factual marking can free up three to five hours per week. Teachers can then spend this recovered time planning better lessons or writing highly targeted feedback for complex assignments.
What does the research say about AI marking in education?
The Department for Education states that AI can support teachers by providing timely feedback on routine tasks. However, official guidance explicitly warns that algorithms must not replace professional teacher judgement regarding pupil attainment. Educational research supports an approach where the AI generates an initial response and the teacher refines it before the pupil sees it.
What are common mistakes when using AI to mark pupil work?
A major mistake is trusting AI to accurately assess extended essays or creative writing. Algorithms currently struggle to evaluate the quality of a historical argument or recognise a pupil's unique creative voice. Another common error is giving automated feedback directly to pupils without a teacher reviewing it first for tone and personal context.
Can AI accurately mark GCSE extended writing and essays?
No current AI tool can reliably mark GCSE extended writing with the accuracy of an experienced teacher. While software can highlight spelling and grammatical errors, it cannot judge nuanced arguments or subject-specific reasoning. Teachers must still conduct full assessments for long-form answers to ensure marking aligns with specific exam board specifications.
Week 1: One class, one tool, one task type. Choose your most straightforward marking task (a homework quiz or vocabulary test) and one AI tool. Set the quiz, let the tool mark it, and spend 15 minutes reviewing the results. Note what the tool got right, what it missed, and how long the process took compared to manual marking.
Week 2: Add feedback generation. Take a set of extended writing from the same class. Use an AI tool to generate draft feedback, then review and personalise each piece before returning it to pupils. Track the time difference: how long did AI-assisted feedback take compared to writing it from scratch?
After two weeks, you have enough data to decide whether to expand AI marking to other classes and task types. Most teachers find that the initial investment in learning the tool pays back within the first month. The key is starting with tasks where AI is genuinely reliable, building confidence, and expanding gradually.
For a broader perspective on integrating AI into your teaching practice, see our guide to AI for teachers, which covers lesson planning, differentiation, and building AI literacy alongside assessment.
The research base on AI in educational assessment is growing rapidly. These papers provide the evidence behind the recommendations in this article.
The Power of Feedback
Hattie and Timperley (2007)
The foundational framework for understanding feedback in education. Identifies four feedback levels (task, process, self-regulation, self) and demonstrates that feedback targeting the process and self-regulation levels has the greatest impact on learning. Essential context for understanding where AI feedback fits.
ChatGPT for Good? On Opportunities and Challenges of Large Language Models for Education
Kasneci et al. (2023)
Comprehensive analysis of how large language models can support teaching and learning. Particularly relevant for its discussion of AI-generated feedback quality and the "human-in-the-loop" model where teachers review AI outputs before pupils see them.
Embedded Formative Assessment
4,100+ citations
Wiliam (2011)
The definitive guide to formative assessment in UK classrooms. Wiliam's five key strategies provide the framework within which AI marking tools should operate. His emphasis on "responsive teaching" aligns with using AI for rapid diagnostic data while reserving professional judgement for interpretive assessment.
AI and the Future of Assessment in Education
DfE Official Guidance
Department for Education (2025)
The UK government's position on AI use in schools, including specific guidance on marking and assessment. Establishes that AI should support rather than replace teacher judgement, and sets expectations for data protection when using AI tools with pupil work.
Automated Essay Scoring and Its Impact on Writing Assessment
340+ citations
Bridgeman et al. (2012)
Critical research on the limitations of automated essay scoring, including evidence of bias against non-standard dialects and correlation between essay length and AI-assigned scores. Important reading for any school considering AI for writing assessment.