The Illusion of Knowing: A Calibration Toolkit for Teachers

Updated on March 31, 2026

A practical toolkit for teaching learners to accurately judge what they know. Covers the Dunning-Kruger effect, judgements of learning, desirable difficulties, and a 20-minute calibration intervention for any subject.

A Year 10 learner finishes revising for a biology exam and tells her teacher she is confident she will score above 80%. She scores 47%. This is not laziness or dishonesty. It is a well-documented cognitive phenomenon called the calibration gap, and it affects every classroom in every school. Kruger and Dunning (1999) demonstrated that the lowest-performing individuals overestimate their performance by the largest margins, sometimes predicting scores 30-40 percentage points above their actual results. Hiller, Ihme and Pfeiffer (2020), in a study of 209 undergraduates, found that structured metacognitive training, combining psychoeducation about overconfidence with item-level judgement practice and feedback, significantly improved calibration accuracy across subsequent assessments. Critically, the learners who started with the worst calibration showed the largest improvements. The illusion of knowing is not a fixed trait. It is a skill deficit that responds to direct instruction. Yet no widely available classroom resource gives teachers a structured, time-efficient method for teaching calibration. This article provides that method.

Key Takeaways

  1. The calibration gap (the difference between what learners think they know and what they actually know) is a fundamental barrier to effective self-regulated learning (Kruger and Dunning, 1999).
  2. Structured calibration training improves both monitoring accuracy and academic performance, with the weakest learners showing the greatest gains (Hiller et al., 2020).
  3. Teachers can implement a 20-minute calibration intervention using prediction, testing, comparison, and reflection in any subject and any key stage.
  4. Judgements of learning made at the item level (question by question) are more accurate predictors of performance than global judgements about overall readiness (Thiede, Anderson and Therriault, 2003).

What Is Calibration and Why Does It Matter?

Calibration is the degree of alignment between a learner's confidence in their knowledge and their actual knowledge. A perfectly calibrated learner who predicts they will score 70% on a test and then scores 70% has zero calibration error. In practice, perfect calibration is rare. What matters is the direction and size of the error.
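For readers who like to see the arithmetic, here is a minimal sketch of calibration error as a signed gap between predicted and actual scores. The function name and figures are illustrative, not part of any published protocol.

```python
# Minimal sketch: calibration error as a signed percentage-point gap.
# Names and values are hypothetical, not taken from the research cited.

def calibration_error(predicted_pct: float, actual_pct: float) -> float:
    """Positive = overconfident, negative = underconfident, zero = calibrated."""
    return predicted_pct - actual_pct

# The Year 10 learner from the opening example: predicted 80%, scored 47%.
print(calibration_error(80, 47))  # 33 percentage points of overconfidence
```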

[Infographic: The 20-Minute Calibration Intervention, a four-step process teachers can implement in any lesson]

Koriat (1997) established the cue-utilisation framework for understanding how people make judgements of learning. His research demonstrated that when people assess how well they know something, they do not directly access the strength of their memory trace. Instead, they rely on cues: how familiar the material feels, how easily it comes to mind, and how fluently they can process it. These cues are often misleading. Material that has been recently read feels familiar and fluent, leading to high confidence, even though recognition familiarity is a poor predictor of recall ability.

[Infographic: Misleading Cues vs. Calibration, contrasting traditional study methods that inflate confidence with item-level prediction and feedback]

This matters for teachers because the standard revision approach used by most secondary learners, re-reading notes and highlighting, produces exactly the misleading cues that inflate confidence. The learner re-reads her biology notes, the material feels familiar, she concludes she knows it, and she stops studying. On the exam, she cannot recall the information because recognition and recall are fundamentally different cognitive processes.

[Infographic: The Calibration Skill Cycle: Predict Score, Take Quiz, Compare Answers, Reflect on Learning]

Thiede, Anderson and Therriault (2003) showed that the timing and grain size of metacognitive judgements dramatically affect their accuracy. Global judgements ("How well do I know this topic?") were consistently less accurate than local, item-level judgements ("Can I answer this specific question?"). This finding has direct implications for classroom practice: asking learners to rate their overall confidence in a topic is less useful than asking them to predict their performance on specific questions.

Classroom Example: The Prediction Quiz

Before a retrieval practice quiz, give each learner a prediction sheet. Next to each question number, they write their confidence (1-5) that they can answer correctly. They then attempt the quiz. Afterwards, they compare their predictions with their actual scores question by question. The visual gap between the prediction column and the score column is the calibration error made visible. This takes 3 minutes of lesson time and produces immediate metacognitive insight.
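If your department records these sheets digitally, the comparison step can be scripted. The sketch below assumes a hypothetical data layout (confidence ratings keyed by question number, alongside right/wrong outcomes) and simply flags mismatches between high confidence and actual results.

```python
# Hypothetical prediction-quiz record for one learner.
# The layout is an assumption for illustration, not a prescribed format.
predictions = {1: 5, 2: 4, 3: 5, 4: 2, 5: 3}                # confidence, 1-5
results = {1: True, 2: False, 3: False, 4: True, 5: True}   # answered correctly?

for q in sorted(predictions):
    confident = predictions[q] >= 4   # treat 4-5 as "sure I can answer this"
    if confident != results[q]:
        print(f"Q{q}: confidence {predictions[q]}/5 but correct={results[q]} <- blind spot")
```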

The Dunning-Kruger Effect in the Classroom

The Dunning-Kruger effect is frequently misunderstood in popular culture. It is not simply that unintelligent people are overconfident. The original finding (Kruger and Dunning, 1999) was more specific and more useful for teachers: people who perform poorly on a task lack the very skills needed to accurately evaluate their performance on that task. The incompetence that leads to poor performance is the same incompetence that prevents accurate self-assessment.

Jansen, Rafferty and Griffiths (2021), writing in Nature Human Behaviour, developed a rational model of the Dunning-Kruger effect. Their study of approximately 4,000 participants across two experiments showed that the effect is driven primarily by the level of task performance itself, though metacognitive sensitivity (the ability to distinguish between correct and incorrect responses) also plays a contributing role. Critically, when task difficulty was adjusted so that all participants performed at equivalent success levels, the calibration gap disappeared. This suggests that the illusion of knowing is partly a function of task difficulty relative to skill level.

For teachers, this means that low-performing learners are not deliberately overconfident. They genuinely cannot tell the difference between knowing and not knowing because they lack the domain knowledge needed to make that distinction. The solution is not to tell them they are wrong about their confidence. It is to give them structured practice in making and checking predictions, so they develop the metacognitive skill alongside the domain knowledge.

Schleinschok, Eitel and Scheiter (2024), in a study published in Frontiers in Education examining physics learners across three courses, provided direct evidence for the "unskilled and unaware" hypothesis. Low-performing students were not only more overconfident before exams but were also less able to adjust their predictions after taking exams. This failure to recalibrate after feedback is the mechanism that perpetuates poor performance across a school year.

Classroom Example: The Confidence Calibration Graph

After three quizzes with prediction sheets, help each learner plot their data on a simple graph. The x-axis shows their average prediction and the y-axis their average score, both expressed as percentages (convert the 1-5 confidence scale to a percentage so the two axes share units and the diagonal is meaningful). A perfectly calibrated learner sits on the diagonal line. Most learners will see they sit below the line (overconfident) or above it (underconfident). Showing learners their own calibration pattern, without judgement, is one of the most powerful interventions available. The graph does the teaching.
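A minimal plotting sketch of such a graph follows, using hypothetical class data and assuming matplotlib is available. Rescaling the 1-5 confidence scale to 0-100 is one reasonable convention, not the only one.

```python
# Minimal calibration-graph sketch with hypothetical data.
import matplotlib.pyplot as plt

avg_confidence = [4.5, 3.0, 2.5, 4.2]   # mean prediction per learner, 1-5 scale
avg_score = [55, 64, 71, 48]            # mean quiz score per learner, %

# Rescale 1-5 confidence to 0-100 so both axes share units.
pred_pct = [(c - 1) / 4 * 100 for c in avg_confidence]

plt.scatter(pred_pct, avg_score)
plt.plot([0, 100], [0, 100], linestyle="--")   # perfect-calibration diagonal
plt.xlabel("Average predicted score (%)")
plt.ylabel("Average actual score (%)")
plt.title("Below the diagonal: overconfident. Above: underconfident.")
plt.show()
```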

A 20-Minute Calibration Intervention

This four-step protocol can be used in any subject and at any key stage, and it requires no special resources. It takes 20 minutes the first time and 10 minutes on subsequent uses.

Step 1: Predict (3 minutes). Before a retrieval task, learners make item-level predictions. For a 10-question quiz, they write a confidence rating (1-5) next to each question. For an extended task, they predict their score on a rubric. The key is specificity: "I think I will get question 4 correct because I can remember the formula" is more useful than "I think I know this topic."

Step 2: Test (5-8 minutes). Learners complete the retrieval task under normal conditions. This can be a quiz, a diagram from memory, a set of practice questions, or a written recall task. The format does not matter. What matters is that the task produces a clear, scorable outcome that can be compared to the prediction.

Step 3: Compare (3 minutes). Learners score their work (peer-marking works well here) and then compare their predictions with their actual scores. They highlight any question where their prediction was wrong by 2 or more points. These are their calibration blind spots, the specific areas where their metacognitive monitoring is least accurate.

Step 4: Reflect and Adjust (4 minutes). Learners answer two questions: "Which topics did I think I knew but actually did not?" and "What will I do differently to check my understanding next time?" The first question builds metacognitive knowledge. The second builds metacognitive regulation. Over repeated cycles, learners begin to internalise the prediction-test-compare loop and apply it independently during revision.
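Step 3's two-point threshold is easy to apply by hand, but for larger classes a short script can surface blind spots automatically. This sketch assumes each question carries several marks, so prediction and outcome sit on the same point scale; the data and threshold handling are illustrative.

```python
# Minimal sketch of the Step 3 comparison (hypothetical multi-mark questions).
predicted_marks = [3, 4, 2, 5, 1]   # learner's predicted marks per question
actual_marks = [1, 4, 3, 2, 1]      # marks awarded after peer-marking

blind_spots = [
    q + 1
    for q, (pred, actual) in enumerate(zip(predicted_marks, actual_marks))
    if abs(pred - actual) >= 2      # wrong by 2 or more points, as in Step 3
]
print(f"Calibration blind spots: questions {blind_spots}")  # -> questions [1, 4]
```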

Hiller, Ihme and Pfeiffer (2020) demonstrated that this type of structured judgement training, when delivered alongside feedback and psychoeducation about overconfidence, produced improvements in calibration accuracy that exceeded the benefits of repeated testing alone. The metacognitive training group showed a nonlinear decrease in overconfidence that stabilised after approximately four cycles, suggesting that a half-term of weekly practice is sufficient for measurable improvement.

Classroom Example: Science Department Calibration Tracking

A science department runs the prediction quiz protocol every fortnight across Years 7-11. Each learner's calibration accuracy (the average gap between prediction and performance) is tracked on a simple spreadsheet. The data is reviewed at each half-term assessment point. Teachers can identify learners whose calibration is not improving, which signals that they need additional metacognitive support, not just more content teaching.
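A spreadsheet is enough for this, but the same tracking can be scripted. The sketch below assumes a hypothetical log of fortnightly prediction-minus-score gaps per learner and flags anyone whose gap is not shrinking.

```python
# Minimal tracking sketch; the data layout is hypothetical, not a prescribed format.
from statistics import mean

gap_log = {
    "learner_A": [35, 28, 20, 12],   # gap (percentage points) per fortnight
    "learner_B": [30, 31, 29, 32],
}

for learner, gaps in gap_log.items():
    trend = gaps[-1] - gaps[0]       # negative trend = calibration improving
    flag = "  <- needs metacognitive support" if trend >= 0 else ""
    print(f"{learner}: mean gap {mean(gaps):.1f}, trend {trend:+d}{flag}")
```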

Desirable Difficulties and the Calibration Paradox

Bjork (1994) introduced the concept of desirable difficulties: learning conditions that make initial performance harder but improve long-term retention. Spacing, interleaving, and retrieval practice all qualify. The calibration paradox is that desirable difficulties, precisely because they make learning feel harder, reduce confidence. Learners who space their revision and use retrieval practice often feel less confident than learners who mass their practice and re-read, even though the first group will perform better on delayed tests.

This creates a dangerous feedback loop. A learner tries retrieval practice, feels that it is difficult, concludes she does not know the material, and switches back to re-reading, which feels fluent and produces high (but false) confidence. The illusion of knowing actually punishes effective study strategies and rewards ineffective ones.

Koriat (1997) explained this through his cue-utilisation model. Judgements of learning are based on processing fluency: how easily information comes to mind. Re-reading produces high fluency (the material feels familiar) and therefore high confidence. Retrieval practice produces low fluency (the material is difficult to recall) and therefore low confidence, even when it is producing stronger learning.

Teachers can break this loop by teaching learners about the calibration paradox explicitly. When learners understand that difficulty during practice is a sign of learning, not a sign of failure, they can override the misleading fluency cue and maintain effective strategies despite low confidence.

Classroom Example: The Difficulty Diary

Ask learners to keep a one-column diary during revision sessions. After each study session, they write one sentence: "This felt [easy/medium/hard]." After the test, they compare their difficulty ratings with their scores for each topic. Over time, a pattern emerges: the topics that felt hardest during revision often produce the highest scores. This experiential evidence is more persuasive than any teacher explanation of the testing effect.
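For a class set of diaries, the pattern can be tabulated in a few lines. The sketch below uses hypothetical topic-level data pairing the diary's easy/medium/hard rating with the eventual test score.

```python
# Minimal difficulty-diary tabulation with hypothetical data.
diary = [
    ("photosynthesis", "hard", 82),
    ("cell division", "easy", 54),
    ("enzymes", "medium", 68),
]

# Sort by score so the felt-difficulty vs. actual-score pattern is visible at a glance.
for topic, felt, score in sorted(diary, key=lambda row: -row[2]):
    print(f"{topic}: felt {felt} during revision, scored {score}%")
```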

Calibration Across Key Stages

Calibration instruction needs to be adapted to the developmental stage of the learner, but the core principle remains constant: make predictions, check them against reality, and use the gap to inform future learning.

Key Stage 1 (Ages 5-7). Use physical self-assessment with immediate checking. After a phonics activity, learners hold up a smiley face, straight face, or sad face to show their confidence. The teacher then asks them to read the target words. Learners compare their face with their reading performance. The focus is on building the habit of self-assessment, not on accuracy.

Key Stage 2 (Ages 7-11). Introduce the prediction quiz format with simple numerical scales. Learners predict their scores on multiplication table tests, spelling tests, or reading comprehension questions. Track calibration accuracy over a half-term. Introduce the concept of overconfidence explicitly: "Sometimes our brain tricks us into thinking we know something when we actually just recognise it."

Key Stage 3 (Ages 11-14). Expand to rubric-based prediction. Before submitting an extended piece of work, learners predict their level on each rubric criterion. Compare predictions with teacher assessments. Discuss the specific criteria where calibration errors are largest. This develops criterion-referenced self-assessment.

Key Stage 4 and 5 (Ages 14-18). Introduce the calibration graph. Learners plot their prediction accuracy across multiple assessments and identify their personal calibration bias (consistently overconfident, underconfident, or variable). Discuss the relationship between calibration and revision strategy. Link to the desirable difficulties research explicitly.

Boud and Falchikov (2024), in a meta-analysis examining interventions for monitoring accuracy in problem solving, found that interventions targeting metacognitive knowledge and external standards improved monitoring accuracy with a small but significant effect size (g = 0.25). Interventions involving the whole task (predicting and comparing across a complete assessment) were more effective than those targeting only timing or individual components.

Classroom Example: Year 9 English Calibration Wall

An English department creates a "Calibration Wall" in each Year 9 classroom. After each assessment, learners place a post-it note showing their predicted grade and actual grade. The wall makes calibration patterns visible across the class. A learner who consistently predicts grade 7 but achieves grade 5 can see that pattern physically. The teacher uses the wall to identify learners for targeted calibration support.

The Role of Feedback in Calibration Improvement

Calibration does not improve through practice alone. It requires specific, timely feedback about the accuracy of predictions, not just about task performance.

The study by Garcia Conejero, Pinilla Lebrero and Garcia Gallego (2025) randomised university students into groups receiving monetary incentives, metacognitive feedback, both, or neither. The critical finding was that metacognitive feedback (showing learners how accurate their predictions were) had no significant effect on calibration accuracy when delivered in isolation. However, when combined with explicit instruction about miscalibration and opportunities for repeated practice, feedback became effective.

This aligns with Hiller, Ihme and Pfeiffer's (2020) finding that the combination of psychoeducation, item-specific judgement practice, and feedback produced greater calibration improvements than any single component. The practical implication is that teachers need to do more than simply show learners their calibration data. They need to teach learners why calibration matters, give them structured practice in making and checking predictions, and provide feedback on the accuracy of those predictions.

Pintrich (2005) found that global monitoring judgements showed stability across an entire semester, even with repeated practice. Local, item-level judgements were more responsive to feedback and more strongly correlated with academic performance. This reinforces the recommendation to focus calibration practice on specific, question-level predictions rather than global confidence ratings.

Classroom Example: The Feedback Sandwich

When returning marked assessments, give learners three minutes before they see their marks. In this time, they write their predicted score. Then they receive the mark. Then they spend three minutes comparing prediction with reality and identifying the specific questions where they were most miscalibrated. This simple "predict-receive-compare" sandwich turns every assessment into a calibration training opportunity at zero additional cost.

Calibration and Formative Assessment

Calibration is not separate from formative assessment. It is formative assessment turned inward. Where formative assessment asks "What does this learner know?", calibration asks "Does this learner know what they know?" Both questions are essential for effective learning, and neither is sufficient alone.

A learner with good domain knowledge but poor calibration will revise the wrong topics, under-prepare for difficult content, and over-prepare for content already mastered. A learner with poor domain knowledge but good calibration will at least direct their limited study time to their actual weaknesses.

The research from Kalender, Marshman and Singh (2024) on metacognitive monitoring in STEM found that low-performing physics students were consistently more overconfident and less able to recalibrate after exams. The failure was not in their physics knowledge alone but in their inability to accurately monitor that knowledge. The researchers argued that metacognitive monitoring ability and domain knowledge develop together, and that interventions targeting only content knowledge miss half the problem.

For assessment leads and heads of department, this creates a practical mandate: build calibration checks into the assessment cycle. Every summative assessment is a calibration data point. Every formative assessment is a calibration training opportunity. The data already exists in schools; it simply needs to be used for metacognitive purposes alongside academic ones.

Your Next Lesson

Before your next quiz or test, hand out a prediction sheet. Ask each learner to rate their confidence (1-5) for each question before they attempt it. After the test, give them two minutes to compare predictions with scores. Ask: "Where were you most wrong about your own knowledge?" Do this once a fortnight for a half-term and track the average calibration gap. In most classes, you will see it narrow. The learners who improve their calibration will also improve their performance, because accurate monitoring drives effective regulation. The illusion of knowing is breakable. It just takes practice.

Free Resource Pack

The Illusion of Knowing: A Calibration Toolkit

4 ready-to-use resources to help teachers and students accurately assess understanding and improve metacognition.


References

Bjork, R. A. (1994). Memory and metamemory considerations in the training of human beings. In J. Metcalfe and A. P. Shimamura (Eds.), Metacognition: Knowing About Knowing (pp. 185-205). MIT Press.

Boud, D. and Falchikov, N. (2024). Meta-analysis of interventions for monitoring accuracy in problem solving. Educational Psychology Review, 36, 45.

Garcia Conejero, J., Pinilla Lebrero, J. J. and Garcia Gallego, A. (2025). The role of monetary incentives and feedback on how well students calibrate their academic performance. European Journal of Education, 60(1), e12834.

Hiller, S., Ihme, T. A. and Pfeiffer, H. C. (2020). Enhanced monitoring accuracy and test performance: Incremental effects of judgment training over and above repeated testing. Learning and Instruction, 65, 101245.

Jansen, R. A., Rafferty, A. N. and Griffiths, T. L. (2021). A rational model of the Dunning-Kruger effect supports insensitivity to evidence in low performers. Nature Human Behaviour, 5, 756-763.

Kalender, Z. Y., Marshman, E. and Singh, C. (2024). Unskilled and unaware? Differences in metacognitive awareness between high and low-ability students in STEM. Frontiers in Education, 9, 1375638.

Koriat, A. (1997). Monitoring one's own knowledge during study: A cue-utilization approach to judgments of learning. Journal of Experimental Psychology: General, 126(4), 349-370.

Kruger, J. and Dunning, D. (1999). Unskilled and unaware of it: How difficulties in recognizing one's own incompetence lead to inflated self-assessments. Journal of Personality and Social Psychology, 77(6), 1121-1134.

Pintrich, P. A. (2005). Metacognitive monitoring accuracy and student performance in the postsecondary classroom. Journal of Experimental Education, 73(4), 269-286.

Schleinschok, K., Eitel, A. and Scheiter, K. (2024). Unskilled and unaware? Differences in metacognitive awareness between high and low-ability students in STEM. Frontiers in Education, 9, 1375638.

Thiede, K. W., Anderson, M. C. M. and Therriault, D. (2003). Accuracy of metacognitive monitoring affects learning of texts. Journal of Educational Psychology, 95(1), 66-73.
