Evidence-based education, the EEF Toolkit, Hattie's Visible Learning, and the cognitive science that should be shaping every classroom. Updated for 2026.
Evidence-based education is the discipline of making teaching decisions on the basis of the best available research evidence, rather than custom, intuition, or vendor marketing. It does not mean ignoring professional judgement: it means informing that judgement with what rigorous research actually shows. Hattie (2009) synthesised over 800 meta-analyses involving more than 80 million students to produce Visible Learning, arguably the most comprehensive quantitative picture of what works in education ever assembled. His central finding is that almost everything works to some degree; the question is whether an intervention works well enough to justify the opportunity cost.
The Education Endowment Foundation (EEF) Toolkit translates that research into a practitioner-facing resource, ranking interventions by estimated months of additional progress, cost, and evidence strength. The clear winners are metacognition and self-regulation strategies (+7 months, very low cost), feedback (+8 months, very low cost), and reading comprehension strategies (+6 months, low cost). The cognitive science tradition adds an explanatory layer: why does retrieval practice work? Because testing strengthens memory consolidation more than re-reading does. Why does worked-example instruction work? Because it respects the limits of working memory (Sweller, 1988). Why does feedback work? Because it closes the gap between current and desired performance (Hattie and Timperley, 2007). This hub brings together the key ideas, the landmark evidence, and their classroom translation.
Start with Evidence-Based Teaching for the foundations, then follow the pathway below.
| Framework | What It Does | Top Finding | Key Limitation |
|---|---|---|---|
| Hattie's Visible Learning | Synthesis of 800+ meta-analyses. Ranks interventions by effect size against the 0.40 hinge point. | Teacher credibility (0.90), collective teacher efficacy (1.57), and feedback (0.70) are among the most powerful influences. | Aggregating across contexts loses implementation nuance. Effect sizes vary enormously within each intervention category. |
| EEF Teaching and Learning Toolkit | Translates research into cost-effectiveness rankings for schools in England. Evidence strength is rated alongside effect size. | Metacognition and self-regulation: +7 months, very low cost. Feedback: +8 months, very low cost. | Built primarily on studies from higher-income contexts. May not transfer directly to all school or learner populations. |
| Cognitive Science of Learning | Laboratory and classroom research on memory, attention, and learning. Explains the mechanisms behind effective teaching. | Retrieval practice, spaced practice, and interleaving are among the most robust and transferable findings. | Lab findings do not always survive intact when translated to real classroom conditions at scale. |
| What Works Clearinghouse (WWC) | US-based registry of rigorous intervention studies. Focuses on evidence standards: only RCTs and quasi-experiments with strong design qualify. | Many widely used programmes have weak or no evidence of effectiveness when subjected to rigorous review. | US-centric. Evidence standards exclude naturalistic and qualitative research that may have genuine classroom value. |
What evidence-based practice actually means, how to evaluate research quality, and why effect sizes need context to be useful.
The most practical research translations for daily teaching. Rosenshine gives you 10 principles; formative assessment shows you how to use information to adjust instruction.
The two most powerful levers from cognitive science. Manage load to enable learning; teach metacognition to sustain it.
Evidence-based teaching means using the best available research to inform instructional decisions, rather than defaulting to habit, received wisdom, or unverified claims. It does not mean rejecting professional judgement: it means grounding that judgement in what rigorous research actually shows. The key word is "informed": evidence rarely tells you exactly what to do in your specific classroom with your specific learners, but it does significantly narrow the field of approaches worth trying. Coe et al. (2014) describe six common misconceptions about teaching quality, many of which persist because they have surface plausibility but weak evidence bases: the assumed effectiveness of praise, discovery learning, and learning styles, for example.
Hattie (2009) identified 0.40 as the average effect size across all educational interventions and called it the hinge point: the level above which an intervention is doing more than the typical teacher does anyway, and therefore worth prioritising. The figure has been influential but also criticised. The main concern is that averaging across wildly different contexts, populations, and study designs produces a number that can be misleading: an intervention with effect size 0.60 in one narrow context may produce 0.10 elsewhere. The hinge point is useful as a rough benchmark for prioritisation, not as a hard threshold for decision-making. Read the evidence on any individual intervention before drawing conclusions from its effect size alone.
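For readers unfamiliar with the metric itself: an effect size here is a standardised mean difference (Cohen's d), the gap between intervention-group and control-group means expressed in pooled standard deviations. The sketch below shows the arithmetic; the function and sample scores are illustrative only, not taken from any of the studies discussed.

```python
import statistics

def cohens_d(intervention, control):
    """Standardised mean difference between two groups of scores.

    This is the metric behind figures like the 0.40 hinge point:
    d = 0.40 means the average intervention-group score sits 0.4
    pooled standard deviations above the average control score.
    """
    m_i, m_c = statistics.mean(intervention), statistics.mean(control)
    n_i, n_c = len(intervention), len(control)
    # Pooled standard deviation, weighting each group's sample
    # variance by its degrees of freedom.
    var_i = statistics.variance(intervention)
    var_c = statistics.variance(control)
    pooled_sd = (((n_i - 1) * var_i + (n_c - 1) * var_c)
                 / (n_i + n_c - 2)) ** 0.5
    return (m_i - m_c) / pooled_sd

# Invented test scores, purely to show the calculation:
print(round(cohens_d([72, 68, 75, 80, 71], [65, 70, 62, 68, 66]), 2))
```

Note what the calculation hides: the same d can come from studies with very different populations, durations, and outcome measures, which is exactly why averaging across them can mislead.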
Rosenshine (2012) synthesised research from three sources: cognitive science, classroom observation studies of effective teachers, and studies of cognitive supports. His 10 principles include: begin each lesson with a short review of previous learning; present new material in small steps with practice after each step; ask a large number of questions and check all students' responses; provide models; guide practice; check for student understanding; obtain a high success rate; provide scaffolds for difficult tasks; require and monitor independent practice; engage students in weekly and monthly review. The principles are not a rigid script; they are a distillation of what consistently effective teachers do, translated into specific behaviours that any teacher can adopt.
The EEF Toolkit ranks teaching approaches on two dimensions: the estimated additional months of learning progress they produce for an average learner, and the cost per learner. A third dimension is evidence strength, rated from one to five padlocks. The highest-rated approaches combine strong evidence, high impact, and low cost: metacognition and self-regulation (+7 months, very low cost), feedback (+8 months, very low cost), and reading comprehension strategies (+6 months, low cost). The toolkit is primarily designed for decisions about disadvantaged learners, because it was built from studies with that focus. However, the approaches identified as effective for disadvantaged learners tend to be effective for all learners.
Cognitive load theory, developed by Sweller (1988), is built on one key fact: working memory is severely limited. We can hold approximately four items of information in working memory at any one time, and we can only process information we are currently attending to. When a task places too many simultaneous demands on working memory, learning breaks down. The theory distinguishes three types of load: intrinsic load (the inherent complexity of the material), extraneous load (demands created by poor instructional design), and germane load (the mental effort invested in constructing schemas). Good teaching maximises germane load and minimises extraneous load. Worked examples, reducing split-attention effects, and using both visual and auditory channels are the main practical implications.
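One common way to sketch the theory's core constraint in symbols (the additive model is a simplification, and the notation here is ours rather than Sweller's):

```latex
% A schematic statement of cognitive load theory's constraint:
% learning proceeds only while total load stays within the
% (roughly four-item) capacity of working memory.
\[
  L_{\mathrm{intrinsic}} + L_{\mathrm{extraneous}} + L_{\mathrm{germane}}
  \;\le\; C_{\mathrm{WM}}
\]
% Intrinsic load is fixed for a given learner and topic, so
% instructional design works by shrinking the extraneous term,
% freeing capacity for germane processing.
```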
The learning styles hypothesis holds that individuals have preferred sensory modalities for receiving information (visual, auditory, kinaesthetic) and that matching instruction to these preferences improves learning. Pashler et al. (2008) reviewed the evidence and found that the hypothesis had not been adequately tested, and where it had been, the results did not support it. The critical test of the hypothesis requires showing that matching instruction to style produces better outcomes than mismatching it; no reliable studies have found this. The persistence of learning styles in education is a useful case study in how an appealing idea can survive in practice long after the evidence has failed to support it. All learners benefit from varied, well-designed instruction that uses multiple representations.
Hattie and Timperley (2007) identified three levels of effective feedback: task level (this answer is wrong because...), process level (the strategy you used here does not work because...), and self-regulation level (next time, check your answer against the original question). Feedback at the self level (you are a great learner) is largely ineffective. The most important finding is that feedback is only effective when learners act on it. This means feedback must be specific enough to enable a clear next action, and lessons must be structured to include time for learners to respond. Written comments that are never returned to, or that are given after the learning episode has ended, have minimal impact. The oral feedback that happens during a lesson in response to a learner's answer is often more effective than extended written marking.
The Structural Learning platform has CPD courses, interactive lesson planning tools, and a growing library of resources built on the research above. Open a free account to browse.
No credit card required.
About this hub. Articles are written by practising educators and reviewed against peer-reviewed research. Citations follow author-date format. New content added regularly. Get in touch if you cannot find what you need.
Start with the most comprehensive guide in the list below. Look for titles that say "A Teacher's Guide": those are flagship deep-dives, and they link out to all the related concepts.
Every article cites peer-reviewed research and translates findings into classroom practice. Where research is contested, we say so. Where the evidence is strong, we explain why and what to do.
Each guide ends with practical next-lesson actions. You can also use our AI lesson planning tools, which generate full lesson plans grounded in these methods.