A practical guide to using AI for marking and feedback in UK schools. Covers what AI can and cannot mark, subject-specific approaches, tool comparison, bias risks, data privacy, and a step-by-step workflow for getting started.
AI marking tools can now grade multiple-choice quizzes, highlight grammatical errors and generate written feedback on pupil work. They cannot, however, judge the quality of a historical argument, recognise a pupil's growing confidence, or notice that a quiet Year 9 student finally attempted the extension task. Understanding this boundary is what separates effective AI-assisted marking from a dangerous shortcut.
Key Takeaways
AI handles routine marking well: Multiple-choice, short-answer and grammar checks are reliably automated, freeing 3-5 hours per week for most teachers (Education Endowment Foundation, 2024).
Extended writing still needs a teacher: No current AI tool can reliably assess argument quality, creative voice or subject-specific reasoning against exam board criteria.
Feedback quality matters more than speed: AI-generated feedback works best when teachers review it before pupils see it, adding context the algorithm cannot know.
Start with low-stakes work: Homework quizzes and vocabulary tests are the safest entry point. Build from there as you learn what the tool can and cannot do.
The Department for Education's 2025 guidance on AI in schools specifically addresses marking and feedback, noting that "AI can support teachers in providing timely feedback but should not replace professional judgement on pupil attainment" (DfE, 2025). This article sets out what that looks like in practice, subject by subject, from primary to Key Stage 4.
What AI Can and Cannot Mark
The distinction is straightforward. AI marking tools work well on tasks with clear right-or-wrong answers and struggle with anything requiring interpretation. A Year 3 spelling test and a GCSE English Language Paper 2 response require fundamentally different kinds of assessment.
| Task Type | AI Reliability | Teacher Role | Example |
|---|---|---|---|
| Multiple-choice quizzes | High | Review misconception patterns | KS2 science end-of-topic quiz |
| Short-answer recall | High | Check for partial credit edge cases | Year 8 history key dates |
| Grammar and spelling | High | None needed for surface errors | Year 5 writing SPaG check |
| Maths calculations | High | Review method marks vs answer marks | Year 10 algebra homework |
| Extended writing (argument) | Low | Full assessment required | GCSE English Language Paper 2 |
| Creative writing | Very low | Full assessment required | Year 7 narrative writing |
| Practical/performance | Not applicable | Teacher observation only | PE, drama, science practicals |
The key principle: AI should mark the work that takes you the most time but requires the least professional judgement. A set of 30 vocabulary tests takes 45 minutes of a teacher's evening. AI handles them in seconds, with equal accuracy. That 45 minutes is better spent writing targeted feedback on three pupils' essays.
How AI Feedback Differs from Teacher Feedback
AI feedback is instant, consistent and impersonal. Teacher feedback is slower, variable and deeply contextual. Both have value, and the research suggests they work best in combination (Hattie and Timperley, 2007).
When a Year 10 pupil submits a geography essay, an AI tool can identify structural weaknesses: paragraphs without topic sentences, missing evidence, or conclusions that introduce new information. What the AI cannot do is recognise that this particular pupil struggled with paragraph structure all term and has finally produced a coherent opening. That contextual knowledge changes what feedback you give.
The most effective model is what researchers call "AI-first, teacher-last" feedback (Kasneci et al., 2023). The AI generates an initial response. The teacher reviews it, removes anything inaccurate, adds personal context, and decides what the pupil sees. This takes less time than writing feedback from scratch but produces something better than either teacher or AI could manage alone.
The Feedback Quality Framework
Not all feedback is equal. Hattie and Timperley's (2007) model identifies four levels of feedback (task, process, self-regulation and self), and AI performs differently at each.
The practical implication: use AI for task-level and some process-level feedback. Reserve your time for self-regulation and personal feedback, where your knowledge of the pupil is irreplaceable. This aligns with what Dylan Wiliam (2011) calls "responsive teaching": using assessment information to adapt instruction in real time.
AI Marking Tools: An Honest Comparison
The market for AI marking tools is growing rapidly, but quality varies. Some tools are designed specifically for UK education; others are adapted from American systems with different curricula and assessment frameworks. Here is what is currently available, with limitations clearly stated.
| Tool | Best For | Limitations | UK Curriculum Alignment |
|---|---|---|---|
| Marking.ai | KS3/KS4 extended writing feedback | Requires rubric setup; inconsistent on creative writing | |
No single tool replaces a teacher's marking. The most effective approach combines two or three tools for different task types: one for quiz auto-marking, one for writing feedback generation, and your own professional judgement for everything else.
AI Marking by Subject
The value of AI marking varies significantly across subjects. What works in mathematics homework does not transfer directly to English literature essays. Here is a subject-by-subject breakdown of where AI adds genuine value and where it falls short.
English
AI can check SPaG (spelling, punctuation and grammar) with high accuracy. It can identify missing paragraphs, flag overuse of simple sentences, and detect where a pupil has not addressed the question. What it cannot do is assess the quality of a metaphor, the effectiveness of a structural choice, or whether a Year 11 student has developed a convincing personal voice. Use AI to handle the surface features of writing so you can focus your marking time on content and craft.
Practical example: After a Year 9 persuasive writing task, run all 30 pieces through Grammarly for Education to flag SPaG errors. Then spend your marking time on argument structure, evidence use and rhetorical technique. You have saved 40 minutes on surface marking and redirected that time to the feedback that actually shifts grades.
Mathematics
Maths is where AI marking works best. Correct answers are unambiguous, and many platforms can now trace method marks by recognising working-out steps. Tools like MyMaths and Hegarty Maths auto-mark homework and generate reports showing which topics need reteaching. The limitation is non-standard methods: a pupil who solves a problem using an unconventional but valid approach may be marked incorrect by an algorithm expecting a specific method.
Practical example: Set a Year 7 fractions homework on Hegarty Maths. The platform marks it overnight and produces a class summary showing that 18 of 28 pupils struggled with converting mixed numbers. You now have diagnostic data before the next lesson, without marking a single paper.
Science
AI handles factual recall questions well. Educake is widely used in UK science departments for end-of-topic quizzes, and its auto-marking is reliable for closed questions. The challenge comes with "explain" and "evaluate" questions, where pupils must construct scientific arguments. These require a teacher's understanding of whether the pupil has demonstrated genuine conceptual understanding or simply recalled key phrases.
Practical example: Use Educake for a Year 10 biology quiz on cell division. The platform auto-marks 20 recall questions and flags three pupils who consistently confuse mitosis and meiosis. For the two 6-mark "explain" questions, you mark those yourself, using the quiz data to target your written feedback.
Humanities
History, geography and RE involve extended analytical writing where AI marking is least reliable. A history essay on the causes of World War One requires the assessor to evaluate source interpretation, argument strength and historical reasoning, none of which current AI tools handle well. Where AI adds value is in the preparatory stages: checking that pupils have included required source references, flagging essays that are significantly under the word count, and identifying structural weaknesses like missing conclusions.
Practical example: Before marking a set of Year 8 history essays, paste the question and mark scheme into ChatGPT and ask it to generate a checklist of what a strong answer includes. Then use that checklist as a marking aid, rather than relying on AI to assess the essays directly.
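If you would rather script this step than paste into the chat interface each time, the sketch below shows one way to do it in Python. It assumes the openai package and an API key; the model name, question wording and prompt phrasing are illustrative rather than prescriptive.

```python
# Minimal sketch: generate a marking checklist from a question and mark scheme.
# Assumes the openai package (pip install openai) and an OPENAI_API_KEY
# environment variable; the model name and prompt wording are illustrative.
from openai import OpenAI

client = OpenAI()

question = "Explain the main causes of the First World War."  # hypothetical wording
mark_scheme = "(paste the relevant section of your mark scheme here)"

prompt = (
    "You are a KS3 history teacher in a UK state school.\n"
    f"Essay question: {question}\n"
    f"Mark scheme: {mark_scheme}\n"
    "List 5-8 things a strong answer includes, as a checklist a teacher "
    "can use while marking. Do not assess any pupil work."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model choice
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```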
Primary
In primary settings, AI marking works well for phonics checks, spelling tests and times tables quizzes. Many schools already use Times Tables Rock Stars, which auto-marks and tracks progress without any teacher input. For writing assessment in Key Stage 1 and 2, teacher moderation remains essential because the writing frameworks (working towards, expected, greater depth) require professional judgement about consistency across a piece of work.
Practical example: A Year 4 teacher uses Times Tables Rock Stars for daily recall practice and an auto-marked reading comprehension quiz on Purple Mash for homework. This frees two hours per week for detailed feedback on extended writing, where teacher assessment against the writing frameworks is required.
Writing Effective AI Marking Prompts
If you are using a general-purpose AI tool like ChatGPT or Claude to generate feedback, the quality of the output depends entirely on the quality of your prompt. A vague instruction produces vague feedback. A specific, structured prompt produces feedback you can use.
| Weak Prompt | Strong Prompt |
|---|---|
| "Mark this essay" | "This is a Year 10 GCSE English Language Paper 2 response on animal testing. Using AQA's mark scheme for Question 5 (content and organisation: 24 marks; SPaG: 16 marks), identify two strengths and two areas for improvement. Write feedback in second person, addressed to the pupil." |
| "Give feedback on this work" | "This Year 8 pupil wrote a paragraph explaining why Henry VIII broke from Rome. The learning objective was to use evidence to support a historical claim. Provide one piece of praise (what they did well) and one 'next step' (specific improvement). Keep the language at a reading age of 12." |
The five elements of an effective AI marking prompt are: year group, subject and exam board, the specific task or question, the assessment criteria, and the format you want the feedback in. Missing any one of these produces generic output.
A Worked Prompt for Science Feedback
Here is a complete prompt you could paste into any AI tool:
"You are a KS3 science teacher in a UK state school. A Year 9 pupil has answered the following 6-mark question: 'Explain how vaccination prevents disease.' Their response is below. Using the AQA trilogy science mark scheme for 6-mark questions, provide: (1) a mark out of 6 with brief justification, (2) one specific strength with a quote from their work, (3) one specific improvement with an example of what they should have written. Write all feedback addressed to the pupil using 'you' language."
This level of specificity consistently produces feedback that teachers find usable. Without the exam board, year group and format instructions, the same AI tool produces generic comments that add no value.
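For teachers comfortable with a little scripting, the five elements can also be turned into a reusable template so you are not retyping the boilerplate for every class. The sketch below is illustrative only: the function name, wording and example values are assumptions you would adapt to your own subject and exam board, and the output is simply a prompt string to paste into whichever tool you use.

```python
# Minimal sketch: assemble the five elements of an effective marking prompt
# (year group, subject and exam board, task, criteria, feedback format)
# into one string. Wording and example values are illustrative.
def marking_prompt(year_group, subject_and_board, task, criteria, feedback_format,
                   pupil_response):
    """Build a marking prompt from the five elements plus the pupil's response."""
    return (
        f"You are a {subject_and_board} teacher in a UK state school.\n"
        f"A {year_group} pupil has completed the following task: {task}\n"
        f"Assess the response below against these criteria: {criteria}\n"
        f"Feedback format: {feedback_format}\n\n"
        f"Pupil response:\n{pupil_response}"
    )

prompt = marking_prompt(
    year_group="Year 9",
    subject_and_board="KS3 science (AQA)",
    task="Explain how vaccination prevents disease. (6 marks)",
    criteria="AQA levels of response for 6-mark questions: give a mark out of 6 "
             "with justification, one strength quoted from the work, and one "
             "specific improvement with an example",
    feedback_format="Addressed to the pupil using 'you' language",
    pupil_response="(paste the pupil's answer here)",
)
print(prompt)
```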
Bias and Fairness in AI Marking
AI marking tools are trained on existing data, which means they reproduce existing biases. Research has shown that automated essay scoring systems can penalise non-standard English dialects, favour longer responses regardless of quality, and score formulaic writing higher than original thinking (Bridgeman et al., 2012).
For UK teachers, the specific risks include:
Dialect bias: Pupils who write in regional or culturally influenced English may receive lower scores from AI tools trained primarily on standard academic English. A Year 9 pupil in Birmingham writing "I was proper shocked" in a creative piece is making a deliberate stylistic choice, not a grammatical error.
Length bias: Most AI grading systems treat length as a proxy for quality. A concise, well-argued paragraph may score lower than a rambling, repetitive one simply because it is shorter. This particularly affects pupils with SEND, who may write less but with greater precision.
Formulaic preference: AI tools trained on high-scoring exam responses learn to reward structural conventions (PEEL paragraphs, topic sentences, discourse markers) even when a pupil achieves the same quality through a less conventional structure. This can disadvantage creative or divergent thinkers.
The practical response is straightforward: never use AI as the sole assessor for any work that contributes to pupil grades. Use it as a first-pass filter, then apply your own professional judgement. Where you notice patterns of bias, adjust the tool's rubric or switch to manual assessment for that task type.
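One practical way to spot length bias is to compare word counts against AI-assigned marks across a class set. The sketch below assumes you have exported the results to a CSV with word_count and ai_mark columns; the file name and the 0.7 threshold are illustrative assumptions, not validated cut-offs.

```python
# Minimal sketch: check whether AI-assigned marks simply track essay length.
# Assumes a CSV with columns "candidate", "word_count" and "ai_mark";
# the file name and 0.7 threshold are illustrative, not validated values.
import csv
import statistics

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

with open("year10_essays_ai_marks.csv", newline="") as f:
    rows = list(csv.DictReader(f))

lengths = [int(r["word_count"]) for r in rows]
marks = [float(r["ai_mark"]) for r in rows]

r = pearson(lengths, marks)
print(f"Length vs AI mark correlation: {r:.2f}")
if r > 0.7:
    print("Marks track length closely: review the shortest responses by hand.")
```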
Data Privacy and GDPR
Uploading pupil work to AI tools creates data protection obligations. Under UK GDPR, pupil work containing personal information (names, schools, identifiable details) is personal data. Before using any AI marking tool, check three things.
Where is the data processed? Tools using US-based servers may not meet UK adequacy requirements. Check whether the tool offers a UK or EU data centre option. Marking.ai, for example, processes data within the UK; ChatGPT's free tier processes data globally.
Is pupil work used for training? Some AI tools use submitted text to improve their models. This means a pupil's essay could influence future outputs. Check the tool's terms of service for data retention and training clauses. Where possible, use tools that explicitly exclude educational data from model training.
Do you have a DPIA? A Data Protection Impact Assessment is required when processing children's data at scale. Your school's Data Protection Officer should review any AI marking tool before it is deployed across a department. The DfE (2025) recommends that schools maintain a register of all AI tools used with pupil data.
A simple safeguard: before uploading pupil work, remove names and replace them with candidate numbers or initials. This reduces the data protection risk to near zero while preserving the AI tool's ability to provide useful feedback.
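Where submissions are handled as files, this step can be scripted. The sketch below is a minimal illustration, assuming a class list mapping names to candidate numbers and plain-text submissions; the pupil names, codes and file names are hypothetical, and you should still skim the output for identifying details the pattern match misses.

```python
# Minimal sketch: pseudonymise pupil work before uploading it to an AI tool.
# Assumes a dict mapping names to candidate numbers (e.g. exported from your
# MIS) and plain-text submissions; names, codes and file names are hypothetical.
import re

class_list = {
    "Amelia Khan": "C001",
    "Joshua Price": "C002",
}

def pseudonymise(text, mapping):
    """Replace full names and surnames with candidate numbers (case-insensitive)."""
    for name, code in mapping.items():
        text = re.sub(re.escape(name), code, text, flags=re.IGNORECASE)
        surname = name.split()[-1]
        text = re.sub(rf"\b{re.escape(surname)}\b", code, text, flags=re.IGNORECASE)
    return text

with open("year9_essay_amelia.txt") as f:          # hypothetical input file
    anonymised = pseudonymise(f.read(), class_list)

with open("year9_essay_C001.txt", "w") as f:       # hypothetical output file
    f.write(anonymised)
```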
Building an AI Marking Workflow
The most effective approach is not to replace your existing marking with AI but to restructure it so that AI handles the routine tasks and you focus on the high-value assessment work. Here is a weekly workflow that several UK schools have adopted successfully.
This workflow saves approximately 2.5 hours per week. Over a 39-week school year, that is nearly 100 hours redirected from routine marking to higher-value teaching tasks: planning better lessons, providing targeted intervention, and building relationships with pupils.
Common Mistakes with AI Marking
Schools adopting AI marking tools make predictable errors. Recognising these in advance saves significant wasted effort.
1. Trusting AI grades for reporting. AI-generated marks should never go directly into a markbook without teacher verification. A tool that gives a Year 10 essay 18/30 may be broadly right, but the difference between 16 and 20 can determine a predicted grade. Only use AI marks for formative purposes.
2. Giving pupils raw AI feedback. Unreviewed AI feedback can be confusing, contradictory or inappropriate. A pupil who receives "Your analysis lacks depth" without further explanation is no better off than receiving no feedback at all. Always review before sharing.
3. Using the wrong tool for the task. Running a creative writing portfolio through a grammar checker does not constitute marking. Match the tool to the assessment objective. Grammar tools check grammar. They do not assess imagination, voice or narrative craft.
4. Ignoring the workload shift. AI marking does not eliminate workload; it shifts it. You spend less time on routine marking but more time reviewing AI outputs, managing data, and handling the inevitable edge cases where the AI gets it wrong. Budget for this transition period.
5. Skipping the training. DfE data from 2024 shows that 76% of teachers have received no formal training on AI tools (DfE, 2024). Without understanding what the tool can and cannot do, teachers either over-rely on it or abandon it after a bad experience. Invest 30 minutes in learning the tool's strengths and limits before deploying it with live pupil work.
Peer and Self-Assessment with AI
AI tools can support peer and self-assessment by providing a reference point against which pupils compare their own work. Rather than asking "Is my essay good?", the pupil can ask the AI to identify structural features, then compare the AI's analysis with their own self-assessment. This builds metacognitive skills: the ability to evaluate one's own learning (Flavell, 1979).
A practical classroom approach: after pupils complete a piece of writing, ask them to self-assess against three success criteria. Then run the same piece through an AI tool that provides feedback against the same criteria. Pupils compare their self-assessment with the AI's analysis and write a short reflection on the differences. This teaches pupils to calibrate their own judgement, which is the foundation of independent learning.
The risk is that pupils treat AI feedback as the "correct" answer, undermining the purpose of self-assessment. Frame the AI output as "one perspective" rather than the definitive assessment. Emphasise that the teacher's judgement, informed by knowledge of the pupil, remains the standard against which work is measured.
Getting Started: Your First Two Weeks
Start small. The biggest risk with AI marking is trying to transform everything at once, getting overwhelmed, and reverting to old habits. This two-week plan introduces AI marking gradually, with built-in checkpoints.
Week 1: One class, one tool, one task type. Choose your most straightforward marking task (a homework quiz or vocabulary test) and one AI tool. Set the quiz, let the tool mark it, and spend 15 minutes reviewing the results. Note what the tool got right, what it missed, and how long the process took compared to manual marking.
Week 2: Add feedback generation. Take a set of extended writing from the same class. Use an AI tool to generate draft feedback, then review and personalise each piece before returning it to pupils. Track the time difference: how long did AI-assisted feedback take compared to writing it from scratch?
After two weeks, you have enough data to decide whether to expand AI marking to other classes and task types. Most teachers find that the initial investment in learning the tool pays back within the first month. The key is starting with tasks where AI is genuinely reliable, building confidence, and expanding gradually.
Key Research
The research base on AI in educational assessment is growing rapidly. These sources provide the evidence behind the recommendations in this article.
Hattie and Timperley (2007). The Power of Feedback.
The foundational framework for understanding feedback in education. Identifies four feedback levels (task, process, self-regulation, self) and demonstrates that feedback targeting the process and self-regulation levels has the greatest impact on learning. Essential context for understanding where AI feedback fits.
Kasneci et al. (2023). ChatGPT for Good? On Opportunities and Challenges of Large Language Models for Education.
Comprehensive analysis of how large language models can support teaching and learning. Particularly relevant for its discussion of AI-generated feedback quality and the "human-in-the-loop" model where teachers review AI outputs before pupils see them.
Wiliam (2011). Embedded Formative Assessment.
The definitive guide to formative assessment in UK classrooms. Wiliam's five key strategies provide the framework within which AI marking tools should operate. His emphasis on "responsive teaching" aligns with using AI for rapid diagnostic data while reserving professional judgement for interpretive assessment.
Department for Education (2025). AI and the Future of Assessment in Education.
The UK government's position on AI use in schools, including specific guidance on marking and assessment. Establishes that AI should support rather than replace teacher judgement, and sets expectations for data protection when using AI tools with pupil work.
Bridgeman et al. (2012). Automated Essay Scoring and Its Impact on Writing Assessment.
Critical research on the limitations of automated essay scoring, including evidence of bias against non-standard dialects and correlation between essay length and AI-assigned scores. Important reading for any school considering AI for writing assessment.