Robert Bjork: A Teacher's Guide to Desirable Difficulties

Updated on February 19, 2026

Robert Bjork's research reveals that the conditions making learning feel easiest produce the weakest retention. This guide explains desirable difficulties, storage vs retrieval strength, and how to apply Bjork's framework in UK classrooms.

Robert Bjork is the single most important cognitive psychologist that most teachers have never heard of. His research at UCLA over five decades has produced findings that directly contradict how the majority of lessons are planned and assessed. Bjork's central argument is this: the conditions that make learning feel easiest in the short term are precisely the conditions that produce the weakest long-term retention. Conversely, the conditions that feel difficult, slow, and even frustrating are the ones that produce durable, transferable learning. Understanding this distinction is one of the most practically useful things a teacher can do.

Key Takeaways

    • Storage vs Retrieval Strength: Bjork distinguishes between how well something is stored in long-term memory and how easily it can be retrieved right now. These are independent, and teaching often conflates them.
    • Desirable Difficulties: Spacing, interleaving, retrieval practice, generation, and variation all slow down apparent progress but produce significantly stronger long-term memory and transfer.
    • Forgetting Is Functional: Bjork's New Theory of Disuse reframes forgetting as a feature of an adaptive memory system, not a failure. Some forgetting before re-study is desirable because it strengthens storage when retrieval eventually occurs.
    • Performance vs Learning: Current performance during a lesson is a poor indicator of durable learning. Students and teachers are both routinely fooled by this distinction.
    • Metacognitive Illusions: Students consistently misjudge which study strategies work best, preferring massed practice and re-reading because they feel productive, when the evidence shows they produce shallow retention.

[Infographic: Performance vs. Learning: The Great Illusion]

Who Is Robert Bjork? Biography and Academic Career

Robert A. Bjork is Distinguished Professor of Psychology at the University of California, Los Angeles (UCLA), where he has worked since 1974. He earned his undergraduate degree from the University of Minnesota and his PhD from Stanford University, where he was supervised by William Estes, one of the founders of mathematical learning theory. This mathematical training shaped Bjork's career: he has always approached learning empirically, designing experiments rather than building untestable theories.

Bjork served as Chair of the UCLA Department of Psychology and as editor of the journal Psychological Review, the field's flagship theoretical journal. He is a former president of the Association for Psychological Science and a recipient of the Distinguished Scientific Contribution Award from the American Psychological Association. These affiliations signal the breadth of his influence: Bjork is not an education researcher who occasionally makes claims about teaching; he is a foundational figure in cognitive psychology whose conclusions happen to have profound educational implications.

What distinguishes Bjork from many learning researchers is his focus on practical application. From the 1970s onwards, he was interested in why military and pilot training programmes produced poor transfer to real-world performance, and he spent decades working with training organisations in the United States government to apply cognitive science to instruction. His work has been taken up by the US Air Force, NASA, and professional sports organisations, as well as schools and universities.

Elizabeth Bjork: Equal Partner in the Research

Robert Bjork's work cannot be understood without acknowledging his research partner, Elizabeth Ligon Bjork, also a professor of cognitive psychology at UCLA. Elizabeth Bjork has led and co-authored many of the key papers in the desirable difficulties literature, including the foundational 2011 paper 'Making Things Hard on Yourself, But in a Good Way' and the 2020 revisit, 'Desirable Difficulties in Theory and Practice'. In much of the research literature, papers are credited to both Bjorks, and the ideas are genuinely joint.

Elizabeth Bjork's independent contributions include work on directed forgetting (the conditions under which people can intentionally inhibit memories), the role of cue competition in memory retrieval, and the mechanisms underlying the testing effect. Teachers reading any paper by "Bjork & Bjork" should understand that it reports a joint research programme between equal partners, comparable in standing to Rosenshine's work on instruction. It is not a case of a secondary contributor citing a more famous colleague.

The Difference Between Storage Strength and Retrieval Strength

The foundation of Bjork's work is a distinction that sounds simple but has sweeping consequences for classroom practice. He proposes that every memory has two independent properties: storage strength and retrieval strength (Bjork & Bjork, 1992).

Storage strength is a measure of how thoroughly a piece of knowledge is encoded in long-term memory. It accumulates over time and, crucially, in Bjork's theory it never decreases once accumulated. A well-encoded memory is not lost from storage, even when it can no longer be recalled. However, storage strength alone does not determine whether you can recall something. That is determined by retrieval strength.

Retrieval strength is a measure of how accessible a memory is right now, at this moment. Retrieval strength fluctuates dramatically. It is high immediately after study and drops sharply without use. It is higher in familiar contexts and lower in novel ones. Retrieval strength is the thing most people mean when they say they "remember" something, but it is the less important of the two for long-term outcomes.

The relationship between these two properties is the crux of the theory. Bjork's key finding is this: when retrieval strength is high, practice has little effect on storage strength. When retrieval strength is low but not zero (that is, when you have partially forgotten something and have to work to retrieve it), successful retrieval produces a large increase in both storage strength and retrieval strength. The implication for teaching is counterintuitive. If you allow students to forget a little before returning to material, re-learning that material produces a much more durable memory than revising it when it is still fresh (Bjork, 1994).

This is why massed practice (covering everything in one concentrated block) produces poor long-term retention despite appearing highly effective in the short term. The knowledge is accessible during massed study because retrieval strength is high, but retrieval strength decays quickly, and because storage strength has not been substantially increased, the knowledge is gone within days or weeks.
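
For readers who find it easier to see this dynamic in numbers, the following is a minimal toy simulation of the two strengths. It is an illustrative sketch, not Bjork and Bjork's formal model: the `Memory` class, the update rules in `study` and `wait`, and every parameter value are assumptions chosen only to reproduce the qualitative pattern described above (storage gains are largest when retrieval strength is low, and retrieval strength is assumed to decay more slowly when storage strength is high).

```python
import math
from dataclasses import dataclass


@dataclass
class Memory:
    storage: float = 0.0    # accumulates and never decreases in this toy model
    retrieval: float = 0.0  # between 0 and 1, decays between study events


def study(m: Memory, gain: float = 1.0) -> None:
    """A study or retrieval event: the storage gain shrinks as retrieval strength rises."""
    m.storage += gain * (1.0 - m.retrieval)
    m.retrieval = 1.0  # material is fully accessible right after study


def wait(m: Memory, days: float, base_rate: float = 0.5) -> None:
    """Retrieval strength decays over time, more slowly when storage strength is high."""
    rate = base_rate / (1.0 + m.storage)  # assumed functional form
    m.retrieval *= math.exp(-rate * days)


def run(gaps_in_days: list[float], final_delay: float = 30.0) -> Memory:
    """Study once, study again after each gap, then wait for a delayed test."""
    m = Memory()
    study(m)
    for gap in gaps_in_days:
        wait(m, gap)
        study(m)
    wait(m, final_delay)
    return m


massed = run([0.0, 0.0, 0.0])   # four study events back to back
spaced = run([2.0, 7.0, 14.0])  # four study events spread over about three weeks
print(f"massed: storage={massed.storage:.2f}, retrieval at test={massed.retrieval:.3f}")
print(f"spaced: storage={spaced.storage:.2f}, retrieval at test={spaced.retrieval:.3f}")
```

In this sketch, the back-to-back study events add no storage strength after the first one, because retrieval strength is already at its maximum, whereas the spaced schedule adds storage strength at every review. The final delay is counted from the last study event in both conditions, and the numbers themselves carry no empirical meaning.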

What Are Desirable Difficulties?

Bjork coined the term 'desirable difficulties' to describe conditions that make learning harder in the short term but produce stronger retention and transfer in the long term (Bjork, 1994). The word 'desirable' is important: not all difficulty is beneficial. Difficulties are desirable only when they trigger deeper cognitive processing that strengthens storage strength and improves the learner's ability to apply the knowledge flexibly in new contexts.

The contrast with undesirable difficulty is crucial for teachers. An undesirable difficulty is one that imposes extra effort without triggering useful cognitive processing: unclear instructions, unhelpful distractors, tasks beyond the student's prerequisite knowledge, or inaccessible text. These slow learning down without producing any compensatory benefit. A desirable difficulty, by contrast, slows apparent progress but engages the learner in retrieval, elaboration, and discrimination that actually strengthens what they know.

Bjork identifies five types of desirable difficulty, each with a distinct mechanism and classroom application.

Spacing: Distributing Practice Over Time

Spacing is the practice of distributing study or practice over time rather than concentrating it in a single block. The spacing effect, first documented by Ebbinghaus in the 1880s, is one of the most replicated findings in all of cognitive psychology. Spaced practice consistently produces better long-term retention than massed practice, often by a factor of two or more, even when total study time is held constant (Bjork & Bjork, 2011).

The mechanism connects directly to storage and retrieval strength. When you return to material after a delay, retrieval strength has dropped. The effort required to reconstruct the memory produces a large increase in storage strength. The longer the gap (within reason), the greater the benefit of re-study. Teachers applying this principle space reviews of content across days, weeks, and months, rather than reviewing everything at the end of a unit. For a detailed practical guide to structuring spaced practice in your timetable, see the article on spaced practice.

Classroom example (Year 8 Science): A teacher covers chemical reactions in Week 1. Rather than leaving it until end-of-year revision, they include a 10-minute retrieval starter on chemical reactions in Week 5, Week 12, and Week 20. Each review is deliberately shorter than the initial lesson. Students who find the Week 5 starter difficult are not failing; they are experiencing the desirable difficulty of reduced retrieval strength, and the effort they invest in retrieving the information substantially increases their storage strength.

Interleaving: Mixing Topics and Problem Types

Interleaving means practising different topics or problem types in a mixed rather than blocked sequence. Instead of completing twenty long-division problems followed by twenty fraction problems, students complete a mixed set: long-division, fraction, long-division, fraction, or even more varied sequences (Kornell & Bjork, 2008).

Interleaving is one of the least intuitive desirable difficulties because it demonstrably makes practice feel harder and produces worse immediate performance than blocked practice. Students who interleave problems will make more errors during practice. Teachers observing an interleaved practice session can easily mistake this for confusion or poor teaching. However, in tests conducted 24 hours or a week later, interleaved practice consistently produces better performance, sometimes substantially so (Kornell & Bjork, 2008).

The mechanism is discrimination learning. When problems of the same type are grouped together, students apply a single strategy repeatedly without needing to identify which strategy is appropriate. In interleaved practice, each problem requires students to identify the problem type first, then select and apply the correct method. This identification step is the additional cognitive work that strengthens both the knowledge of procedures and the ability to discriminate between problem types, which is precisely what is required in an exam or in applying knowledge to real situations.

Full guidance on building interleaved practice into schemes of work is available in the article on interleaving.

Classroom example (GCSE Mathematics, Year 10): A teacher is covering quadratics, completing the square, and the quadratic formula. Rather than blocking each method over separate lessons and then having students practise each in isolation, she creates weekly mixed problem sets that combine all three methods, along with problems from previous units (factorising, linear equations, and simultaneous equations). Students initially find these sets frustrating. After six weeks, they significantly outperform a comparable class that practised each method in isolation, not just on the three methods but on their ability to select the right approach unprompted.
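
A mixed problem set of this kind can be assembled by hand, but the ordering logic is simple enough to sketch in code. The example below is a hypothetical helper, not a published tool: the function names `blocked` and `interleaved`, the problem labels, and the rule of never placing two problems of the same type next to each other where that can be avoided are all assumptions made for illustration.

```python
import random


def blocked(problems_by_type: dict[str, list[str]]) -> list[tuple[str, str]]:
    """All problems of one type, then all problems of the next type."""
    return [(t, p) for t, ps in problems_by_type.items() for p in ps]


def interleaved(problems_by_type: dict[str, list[str]], seed: int = 0) -> list[tuple[str, str]]:
    """Mix types, avoiding two consecutive problems of the same type where possible."""
    rng = random.Random(seed)
    # Shuffle within each type so the order of individual problems also varies.
    pools = {t: rng.sample(ps, len(ps)) for t, ps in problems_by_type.items() if ps}
    sequence = []
    previous = None
    while pools:
        # Prefer any type other than the one just used.
        candidates = [t for t in pools if t != previous] or list(pools)
        # Draw from the fullest pool so no single type is left stranded at the end.
        most = max(len(pools[t]) for t in candidates)
        chosen = rng.choice([t for t in candidates if len(pools[t]) == most])
        sequence.append((chosen, pools[chosen].pop()))
        if not pools[chosen]:
            del pools[chosen]
        previous = chosen
    return sequence


# Hypothetical problem bank: the labels stand in for real questions.
problem_bank = {
    "factorising": ["F1", "F2", "F3"],
    "completing the square": ["C1", "C2", "C3"],
    "quadratic formula": ["Q1", "Q2", "Q3"],
}
for method, problem in interleaved(problem_bank):
    print(f"{problem}: {method}")
```

On the student-facing sheet the method labels would be left off, since identifying the problem type is the part of the task that interleaving is meant to exercise.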

Retrieval Practice: Testing as a Learning Tool

Retrieval practice refers to the act of recalling information from memory rather than re-reading or reviewing it. The testing effect, which is the finding that retrieval practice produces stronger retention than restudying, is one of the most robust phenomena in memory science (Roediger & Karpicke, 2006). Bjork's contribution to this literature includes the understanding that retrieval is not merely a measurement tool; it is itself a learning event that substantially increases storage strength.

The mechanism is that retrieval requires reconstructing a memory trace from partial cues, and this reconstruction process strengthens and elaborates the stored representation. By contrast, re-reading or reviewing material that is visible to the student provides no retrieval challenge; retrieval strength is artificially elevated by the presence of the information, and storage strength gains nothing (Bjork & Bjork, 2011).

Low-stakes quizzing, flashcards, retrieval starters, brain dumps, and practice questions without notes are all forms of retrieval practice. The method matters less than the principle: students must retrieve information from memory, not just recognise it when prompted. For a full classroom implementation guide, see the article on retrieval practice.

Classroom example (Year 6 History): A teacher finishes a unit on the Second World War. Rather than asking students to read over their notes and create a mind map, she begins the next three lessons with a 5-minute blank-page recall activity: "Write down everything you can remember about causes of the Second World War. No notes." Students write independently, share with a partner to fill gaps, and the teacher reveals anything significant that was missed. This is more uncomfortable than reviewing notes, but retrieval practice of this kind produces markedly better retention on delayed tests than passive review (Roediger & Karpicke, 2006).

Generation: Creating Answers Before Being Told Them

Generation refers to the practice of requiring learners to produce an answer or solution before being shown the correct response, even when they are likely to produce an error or incomplete answer. The generation effect, documented by Slamecka and Graf (1978) and extended by Bjork and colleagues, shows that generated information is better remembered than read information (Bjork & Bjork, 2020).

The principle applies across contexts: students who attempt a problem before seeing the solution, who try to recall a vocabulary word before seeing it, or who generate an explanation before the teacher gives one will remember the target information better than students who simply read the correct answer. The difficulty of failing to generate the answer is itself the mechanism. The cognitive effort of searching memory and attempting to construct a response increases storage strength for the correct answer when it is subsequently provided.

This runs counter to the common teaching instinct to protect students from struggling. Allowing students to attempt a problem, produce an incomplete answer, and then encounter the correct solution is not careless instruction; it is applied learning science. The teacher's role shifts from gatekeeper of correct information to architect of productive failure.

Classroom example (Year 9 French): A teacher is introducing new vocabulary for jobs and occupations. Rather than displaying the French word and asking students to repeat it, she displays only the English word and asks students to write their best guess at the French equivalent, including any words they half-remember or cognates they can identify. After 90 seconds of individual effort, she displays the French words. Students who guessed wrong and then saw the correct answer remember the vocabulary better a week later than students who were shown the French word from the start (Kornell, Hays & Bjork, 2009).

Variation: Changing the Conditions of Practice

Variation means practising the same skill or applying the same knowledge across varied contexts, surface forms, or problem formats. Rather than practising one type of essay question repeatedly, students practise under different question formats, different stimulus materials, or different time constraints. Rather than teaching a mathematical procedure with one class of problem, teachers present the same underlying concept across problems that look quite different on the surface (Bjork & Bjork, 2020).

The benefit of variation is transfer. When students encounter only one form of a problem, they often develop knowledge that is brittle: it works for that specific form but fails when the surface features change. Variation forces the learner to distinguish the deep structure of a problem from its surface features, which is what enables transfer to novel situations. This connects directly to cognitive load theory, which identifies schema abstraction (extracting general principles from specific instances) as the goal of effective instruction.

Classroom example (Year 7 English): A teacher wants students to understand how writers use sentence structure for effect. She could teach this using only newspaper headlines, practise it with newspaper headlines, and assess it using newspaper headlines. Instead, she uses examples from novels, speeches, advertising, poetry, and non-fiction reports. Each context presents the same underlying concept (sentence structure for effect) in a different surface form. Students develop a more transferable, flexible understanding because they have had to identify the concept across varied contexts, rather than recognising a familiar surface pattern.

The New Theory of Disuse: Why Forgetting Is Not the Enemy

One of Bjork's most provocative claims is that forgetting is not a bug in the memory system but a feature of an adaptive one. The 'New Theory of Disuse' (Bjork & Bjork, 1992) proposes that the human memory system has evolved to manage an extraordinarily large amount of stored information by making unused information less accessible over time. This is not storage decay, which would mean the information is lost. It is retrieval strength reduction, which means the information becomes harder to access without being erased.

The adaptive value of this system is that currently relevant information stays accessible while older, unused information becomes temporarily suppressed. If every memory maintained the same retrieval strength regardless of use, the most recently acquired knowledge would be constantly competing with older memories, and retrieval would be chaotic. The reduction of retrieval strength for low-use information is a useful filtering mechanism.

What makes this theory practically significant is the relationship between forgetting and re-learning. When retrieval strength for a piece of knowledge drops but storage strength is still present, re-learning that knowledge produces a disproportionately large increase in storage strength. This is the 'savings in relearning' effect, first documented by Ebbinghaus and thoroughly explained by Bjork's theoretical framework. Allowing students to partially forget material before reviewing it is not pedagogical negligence; it is the optimal condition for maximising long-term storage strength.

Teachers who interpret student forgetting between lessons as evidence that they 'didn't really learn it' are applying the wrong model of memory. A student who cannot recall something after two weeks has not necessarily failed to learn it; they have simply experienced the natural reduction in retrieval strength that the memory system applies to low-activation knowledge. Re-engaging that knowledge will restore and strengthen it, especially if the retrieval is effortful.

Performance vs Learning: The Most Important Distinction in Teaching

Soderstrom and Bjork (2015) make a distinction that is perhaps the most important single idea in the desirable difficulties literature: performance and learning are not the same thing. Performance is what a student can do during or immediately after instruction. Learning is the relatively permanent change in knowledge or skill that endures over time and transfers to new contexts. The problem is that performance is visible and learning is not.

This creates a systematic illusion. A lesson where students perform well (give correct answers, complete tasks, demonstrate fluency) feels effective. A lesson where students struggle, produce errors, and take longer to complete tasks feels less effective. But if the struggling lesson was employing desirable difficulties (spaced retrieval, interleaving, generation), it is producing substantially more learning than the fluent lesson, which may be producing only the performance illusion.

This illusion affects both students and teachers. Students, given a free choice of study method, consistently prefer massed practice, re-reading, and highlighting because these strategies produce high fluency during study, which feels like learning. Teachers, when observing lessons, tend to rate lessons with high student engagement and correct responses more positively than lessons with productive struggle and errors. Both judgements are, from a learning science perspective, likely to be wrong (Soderstrom & Bjork, 2015).

The educational implication is significant. Bjork and colleagues argue that standard assessment practice, particularly formative assessment that occurs immediately or shortly after teaching, is measuring performance rather than learning. A student who scores well on an end-of-lesson exit ticket may have learned very little; a student who scores poorly on a test two weeks later, but then retrieves the material successfully after a hint, may have strong storage strength that simply could not be accessed at that moment. The EEF Toolkit rates metacognition and self-regulated learning at approximately seven months of additional progress (+7 months). This is consistent with Bjork's argument: teaching students to manage their own retrieval, spacing, and self-testing makes them better at learning rather than just at performing.

Comparing Bjork's Approach with Traditional Teaching

The contrast between what Bjork's research recommends and what typical classroom practice looks like is stark. The table below summarises the key differences.

| Traditional Practice | Bjork-Informed Practice | Mechanism |
| --- | --- | --- |
| Massed practice: teach a topic and practise it immediately | Spaced practice: return to topics after delays of days or weeks | Low retrieval strength at time of practice increases storage strength gains |
| Blocked practice: complete all problems of one type before moving on | Interleaved practice: mix problem types in practice sessions | Discrimination learning; forces identification of problem type, not just procedure application |
| Re-reading and review: students re-read notes or textbooks | Retrieval practice: students recall from memory without reference materials | Retrieval strengthens storage; re-reading with material present produces little encoding gain |
| Explain first, then practise: teachers give complete explanation before students attempt tasks | Generation before instruction: students attempt problems or recall before explanation | Generation effect; effort of attempted retrieval primes encoding of the correct answer |
| Consistent conditions: same format, context, and problem type in practice and assessment | Variable practice: vary formats, contexts, and problem features during learning | Schema abstraction; variation forces extraction of deep structure, enabling transfer |
| Minimise errors: scaffold to ensure high success rates during practice | Desirable errors welcome: allow productive failure before correction | Error generation activates retrieval competition, strengthening the correct answer when given |

This does not mean that traditional practices are entirely wrong. Direct instruction is well supported by evidence, and explicit teaching before retrieval practice is appropriate, particularly for novice learners who lack the prerequisite knowledge to generate useful responses. Bjork's framework is not an argument against instruction; it is an argument about the conditions under which practice produces the most durable results.

[Infographic: The Desirable Difficulties Framework]

How Bjork's Research Connects to EEF and UK School Evidence

Bjork's work is primarily laboratory-based, conducted with university students in controlled experiments. Teachers in UK schools rightly want to know whether these findings hold in classrooms, with mixed-ability groups, across subjects, and under curriculum constraints.

The evidence from classroom-based research is encouraging. The EEF (Education Endowment Foundation) has funded multiple studies of retrieval practice in UK primary and secondary schools. Their 2021 review of the evidence for retrieval practice found consistent positive effects across subjects and year groups, with an effect size equivalent to approximately three to four months of additional progress when retrieval practice was used as part of routine classroom instruction. Spacing and interleaving have smaller classroom evidence bases, but the available studies show positive results (Weinstein et al., 2018).

The EEF Teaching and Learning Toolkit, which aggregates evidence across thousands of studies, assigns formative assessment strategies (which include retrieval practice) approximately four months of additional progress, and metacognition and self-regulated learning approximately seven months. Both of these are mechanisms through which desirable difficulties operate: students who understand why spacing and retrieval are beneficial are better at self-regulating their study than those who follow their intuitions, which, as Bjork's work shows, tend to be systematically wrong.

The Ebbinghaus forgetting curve is the empirical foundation for spacing, and Bjork's theory offers one of the most complete accounts of why spacing works. Working memory research, particularly Baddeley's model, provides a complementary framework for understanding why cognitive load theory and desirable difficulties must be balanced: novice learners whose working memory is fully taxed by the surface demands of a task may not have cognitive capacity available to benefit from interleaving or generation effects. Appropriate scaffolding reduces cognitive load for novices, creating the conditions under which desirable difficulties can be introduced progressively.

Common Misconceptions About Bjork's Work

Bjork's ideas are increasingly cited in teacher training and CPD materials, but several misconceptions have attached themselves to the framework. Addressing these directly prevents misapplication in classrooms.

Misconception 1: Desirable difficulties work for all learners equally. They do not. Bjork's research has been conducted primarily with adults who have sufficient prior knowledge to engage in productive retrieval attempts. The generation effect requires enough prior knowledge to produce a plausible attempt. For absolute beginners who lack it, generation simply produces blank responses and no learning benefit. Retrieval practice is similarly limited for students with very low prior knowledge; you cannot retrieve what was never encoded. Scaffolding and direct instruction must precede and support the introduction of desirable difficulties, particularly for younger learners or those with limited subject knowledge.

Misconception 2: Bjork argues against explicit teaching. He does not. Bjork's framework concerns the conditions for practice, review, and assessment. The generation effect works because a correct answer is subsequently provided; a generation attempt that is never followed by feedback or explicit teaching yields little learning benefit. Bjork's research complements, rather than contradicts, direct instruction: teach explicitly, then create the conditions for effortful retrieval.

Misconception 3: Making lessons harder always helps. Difficulty is only desirable when it triggers the specific cognitive processes that increase storage strength (retrieval, discrimination, elaboration). Difficulty that arises from confusing presentation, lack of prerequisite knowledge, or unclear task design is simply undesirable difficulty: it imposes cognitive effort without producing learning gains. A badly designed worksheet is not a desirable difficulty.

Misconception 4: Bjork's research means students should never re-read notes. Re-reading is not always useless. It is most useful in the first pass through new material, when students are building a basic schema. The problem is when re-reading replaces retrieval practice during revision. Once material has been initially learned, re-reading produces poor returns compared to retrieval, and students who conflate fluent re-reading with learning are deceived by their own performance.

Misconception 5: Testing hurts low-confidence students. Some teachers worry that low-stakes testing will damage the confidence of students who frequently fail to retrieve material. The research literature, including classroom studies by Pooja Agarwal and colleagues, consistently shows that retrieval practice is beneficial for low-attaining students when implemented correctly, which means low-stakes, normalised as a routine tool rather than an evaluative tool, and followed by feedback on what was not retrieved. What damages confidence is high-stakes retrieval with public consequences for failure, not retrieval practice itself.

Applying Bjork's Framework: Planning Principles for Your Classroom

Translating Bjork's research into classroom practice does not require wholesale curriculum redesign. The following planning principles are achievable within existing timetable and curriculum constraints.

Distribute reviews deliberately. When planning a unit, identify three or four moments after the main teaching phase where you will return to key content. These reviews should be spaced: one week after initial teaching, three weeks after, and again near an end-of-unit point. Each review should require retrieval, not just recognition. A five-minute starter that asks students to recall from memory, without reference to notes, is sufficient. The testing effect means this review is more efficient than re-teaching the same content.
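
As a planning aid, the review points can be worked out from the date of initial teaching. This is a minimal sketch under stated assumptions: the function name `review_dates` is hypothetical, the one-week and three-week gaps follow the paragraph above, and the six-week gap simply stands in for "near an end-of-unit point" and should be adjusted to the actual length of the unit.

```python
from datetime import date, timedelta


def review_dates(taught_on: date, gaps_in_weeks=(1, 3, 6)) -> list[date]:
    """Return one review date per gap, counted in weeks from the day of initial teaching."""
    return [taught_on + timedelta(weeks=w) for w in gaps_in_weeks]


# Example: content first taught on 2 March 2026 (an arbitrary illustrative date).
for d in review_dates(date(2026, 3, 2)):
    print(d.isoformat())
```

The sketch ignores school holidays and timetable rotations, so the output is a starting point for a scheme of work rather than a finished calendar.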

Interleave practice during the middle of a unit. Once students have been taught at least two related topics or procedures, mix practice across both rather than practising each in isolation. Mixed problem sets are more difficult to create but produce substantially better transfer. This is especially valuable in Mathematics, Science, and Modern Foreign Languages, where procedural discrimination is central to the subject.

Use generation before explanation, where appropriate. Before teaching a new concept, ask students to attempt a related problem or recall a related idea from memory. The attempt does not need to be correct; the cognitive effort of the attempt primes encoding of the correct answer. A minute of silent independent thinking before an explanation costs almost nothing and produces a measurable benefit for retention.

Vary practice conditions. Avoid practising a skill always in the same format, context, or with the same materials. A student who has only ever written formal essays in response to printed prompts may not transfer their skill to a different format. Varied practice surfaces the deep structure of knowledge and supports the metacognition needed for self-regulated study.

Teach students about the performance-learning distinction. One of the most impactful things a teacher can do is explain directly to students why spacing and retrieval practice feel worse during study but produce better results. Students who understand the cognitive science behind their study strategies are more likely to choose effortful strategies over comfortable but ineffective ones. This is the bridge between Bjork's laboratory findings and the EEF's seven-month metacognition effect.

What Bjork's Research Does Not Tell Us

No account of Bjork's work should omit its limitations. Several are worth naming explicitly.

The majority of Bjork's foundational experiments were conducted with adult university students in laboratory settings, using word lists, paired associates, and mathematics problems. The ecological validity of these findings for primary-age children, for subjects with high affective demands (such as Personal, Social, Health and Economic education or Drama), or for learners with working memory difficulties is less well established. The desirable difficulties framework should be applied with professional judgement, not as a universal prescription.

The performance-learning distinction, while theoretically coherent, creates a practical problem for teachers: if current performance is a poor indicator of learning, how do you know whether students are actually learning? Bjork's answer is that delayed tests are better indicators than immediate tests, but this is difficult to implement systematically in time-pressured curricula. Teachers need to balance the ideal measurement conditions against the practical demands of formative assessment.

Finally, while desirable difficulties are well evidenced for factual and procedural knowledge, the evidence for complex, creative, or evaluative learning is thinner. Writing, artistic judgement, and disciplinary reasoning may require different conditions from the retrieval of mathematical procedures or historical facts. Bjork's framework is a powerful tool; it is not the only tool a teacher needs.

In your next lesson, identify one place where students currently review material by re-reading or copying notes, and replace it with a five-minute retrieval activity: blank paper recall, a low-stakes quiz without reference to notes, or a generation task before the day's explanation. Do not grade it. Use it as the starting point for that lesson's instruction.

5 Ways to Boost Long-Term Learning in Your Classroom

Further Reading: Key Papers on Bjork's Learning Theory

The following papers provide the primary evidence base for the desirable difficulties framework. They are listed in order of foundational importance and are recommended for teachers undertaking CPD, PGCE assignments, or leadership roles in curriculum planning.

Memory and Metamemory Considerations in the Training of Human Beings

29 citations

Bjork, R.A. (1994). In J. Metcalfe & A.P. Shimamura (Eds.), Metacognition: Knowing About Knowing. MIT Press.

This is the chapter in which Bjork first used the term 'desirable difficulties' and outlined the core framework. Writing for a training and instruction context, Bjork distinguishes between conditions that support performance and those that support long-term retention and transfer. The chapter connects storage and retrieval strength theory directly to practical training design and remains the clearest single introduction to the framework for teachers and practitioners.

Making Things Hard on Yourself, But in a Good Way: Creating Desirable Difficulties to Enhance Learning

520+ citations

Bjork, E.L. & Bjork, R.A. (2011). In M.A. Gernsbacher, R.W. Pew, L.M. Hough & J.R. Pomerantz (Eds.), Psychology and the Real World. Worth Publishers.

This is the most widely cited summary of the desirable difficulties literature and the best starting point for teachers new to the framework. Written jointly by Elizabeth and Robert Bjork, the chapter covers spacing, interleaving, testing, and generation effects with clear definitions and empirical support. It includes a direct discussion of why students and instructors misjudge what works, making it essential reading for anyone designing revision programmes or study skills interventions.

Learning Versus Performance: An Integrative Review

380+ citations

Soderstrom, N.C. & Bjork, R.A. (2015). Perspectives on Psychological Science, 10(2), 176–199.

This review paper provides the most thorough treatment of the performance-learning distinction in the literature, synthesising decades of evidence showing that performance during practice is a systematically unreliable index of durable learning. Soderstrom and Bjork review evidence from spacing, interleaving, testing, and generation experiments, demonstrating in each case that the condition producing better immediate performance is not the condition producing better delayed retention. This paper is directly relevant to how teachers design and interpret formative assessment.

Desirable Difficulties in Theory and Practice

180+ citations

Bjork, R.A. & Bjork, E.L. (2020). Journal of Applied Research in Memory and Cognition, 9(4), 475–479.

This 2020 paper is an update and revisit of the framework, addressing concerns about ecological validity and classroom applicability that had accumulated in the literature since 1994. The Bjorks respond to critiques, clarify the boundary conditions of each desirable difficulty, and address the particular challenge of interleaving in subjects where novice learners may lack the prerequisite knowledge to benefit from mixed practice. Teachers who have already read the 2011 paper will find this a valuable complement, particularly sections on when desirable difficulties are and are not appropriate.

Learning Concepts and Categories: Is Spacing the 'Enemy of Induction'?

430+ citations

Kornell, N. & Bjork, R.A. (2008). Psychological Science, 19(6), 585–592.

This paper reports the key experiments on interleaving and category learning that established why interleaved practice produces better discrimination and transfer than blocked practice. Using an art-style learning paradigm (participants categorise artists' paintings), Kornell and Bjork showed that interleaved study produced better generalisation to new examples, while participants nonetheless believed that blocked study had worked better for them. The subjective experience of interleaving as less effective is the central finding for teachers trying to persuade students to adopt mixed practice in their revision.

Performance vs. Learning: The Great Illusion

Who Is Robert Bjork? Biography and Academic Career

Robert A. Bjork is Distinguished Professor of Psychology at the University of California, Los Angeles (UCLA), where he has worked since 1974. He earned his undergraduate degree from the University of Minnesota and his PhD from Stanford University, where he was supervised by William Estes, one of the founders of mathematical learning theory. This mathematical training shaped Bjork's career: he has always approached learning empirically, designing experiments rather than building untestable theories.

Bjork served as Chair of the UCLA Department of Psychology and as editor of the journal Psychological Review, the field's flagship theoretical journal. He is a former president of the Association for Psychological Science and a recipient of the Distinguished Scientific Contribution Award from the American Psychological Association. These affiliations signal the breadth of his influence: Bjork is not an education researcher who occasionally makes claims about teaching; he is a foundational figure in cognitive psychology whose conclusions happen to have profound educational implications.

What distinguishes Bjork from many learning researchers is his focus on practical application. From the 1970s onwards, he was interested in why military and pilot training programmes produced poor transfer to real-world performance, and he spent decades working with training organisations in the United States government to apply cognitive science to instruction. His work has been taken up by the US Air Force, NASA, and professional sports organisations, as well as schools and universities.

Elizabeth Bjork: Equal Partner in the Research

Robert Bjork's work cannot be understood without acknowledging his research partner, Elizabeth Ligon Bjork, also a professor of cognitive psychology at UCLA. Elizabeth Bjork has led and co-authored many of the key papers in the desirable difficulties literature, including the foundational 2011 paper 'Making Things Hard on Yourself, But in a Good Way' and the 2020 revisit, 'Desirable Difficulties in Theory and Practice'. In much of the research literature, papers are credited to both Bjorks, and the ideas are genuinely joint.

Elizabeth Bjork's independent contributions include work on directed forgetting (the conditions under which people can intentionally inhibit memories), the role of cue competition in memory retrieval, and the mechanisms underlying the testing effect. Teachers reading any paper by "Bjork & Bjork" should understand that it reports a joint research programme, comparable in standing to Rosenshine's work on instruction, in which Elizabeth Bjork is an equal partner rather than a secondary contributor to a more famous colleague.

The Difference Between Storage Strength and Retrieval Strength

The foundation of Bjork's work is a distinction that sounds simple but has sweeping consequences for classroom practice. He proposes that every memory has two independent properties: storage strength and retrieval strength (Bjork & Bjork, 1992).

Storage strength is a measure of how thoroughly a piece of knowledge is encoded in long-term memory. It accumulates with study and use and, crucially, the theory assumes that once accumulated it never decreases. You do not lose a well-encoded memory; in this account storage strength is permanent. However, storage strength alone does not determine whether you can recall something. That is determined by retrieval strength.

Retrieval strength is a measure of how accessible a memory is right now, at this moment. Retrieval strength fluctuates dramatically. It is high immediately after study and drops sharply without use. It is higher in familiar contexts and lower in novel ones. Retrieval strength is the thing most people mean when they say they "remember" something, but it is the less important of the two for long-term outcomes.

The relationship between these two properties is the crux of the theory. Bjork's key finding is this: when retrieval strength is high, practice has little effect on storage strength. When retrieval strength is low but not zero (that is, when you have partially forgotten something and have to work to retrieve it), successful retrieval produces a large increase in both storage strength and retrieval strength. The implication for teaching is counterintuitive. If you allow students to forget a little before returning to material, re-learning that material produces a much more durable memory than revising it when it is still fresh (Bjork, 1994).

This is why massed practice (covering everything in one concentrated block) produces poor long-term retention despite appearing highly effective in the short term. The knowledge is accessible during massed study because retrieval strength is high, but retrieval strength decays quickly, and because storage strength has not been substantially increased, the knowledge is gone within days or weeks.
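The interaction between the two strengths can be illustrated with a toy simulation. The sketch below is not Bjork and Bjork's formal model: the update rules, the parameter values (`decay`, `gain`), and the function name `simulate` are all assumptions, chosen only to mirror the qualitative claims above. Retrieval strength fades between sessions, fades more slowly when storage strength is high, and each study session adds more storage strength when retrieval strength is low.

```python
import math

def simulate(study_days, retention_interval, decay=0.3, gain=1.0):
    """Toy model (assumed rules, not Bjork's equations).

    Returns (storage, retrieval) measured `retention_interval` days
    after the final study session.
    """
    storage, retrieval, last_day = 0.0, 0.0, None
    for day in study_days:
        if last_day is not None:
            # Retrieval strength fades with time; higher storage slows the fade.
            retrieval *= math.exp(-decay * (day - last_day) / (1.0 + storage))
        # The lower the current retrieval strength, the bigger the storage gain.
        storage += gain * (1.0 - retrieval)
        retrieval = 1.0  # just-studied material is fully accessible
        last_day = day
    # Let retrieval strength fade over the delay before the test.
    retrieval *= math.exp(-decay * retention_interval / (1.0 + storage))
    return storage, retrieval

massed = simulate([0, 0, 0, 0], retention_interval=14)    # four sessions in one sitting
spaced = simulate([0, 7, 14, 21], retention_interval=14)  # four sessions a week apart
print(f"massed: storage={massed[0]:.2f}, retrieval at test={massed[1]:.2f}")
print(f"spaced: storage={spaced[0]:.2f}, retrieval at test={spaced[1]:.2f}")
```

Under these assumed rules, the massed schedule ends with less storage strength and lower retrieval strength at the delayed test than the spaced schedule, even though both contain four study sessions. The numbers carry no empirical meaning; only the direction of the difference reflects the theory.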

What Are Desirable Difficulties?

Bjork coined the term 'desirable difficulties' to describe conditions that make learning harder in the short term but produce stronger retention and transfer in the long term (Bjork, 1994). The word 'desirable' is important: not all difficulty is beneficial. Difficulties are desirable only when they trigger deeper cognitive processing that strengthens storage strength and improves the learner's ability to apply the knowledge flexibly in new contexts.

The contrast with undesirable difficulty is crucial for teachers. An undesirable difficulty is one that imposes extra effort without triggering useful cognitive processing: unclear instructions, unhelpful distractors, tasks beyond the student's prerequisite knowledge, or inaccessible text. These slow learning down without producing any compensatory benefit. A desirable difficulty, by contrast, slows apparent progress but engages the learner in retrieval, elaboration, and discrimination that actually strengthens what they know.

Bjork identifies five types of desirable difficulty, each with a distinct mechanism and classroom application.

Spacing: Distributing Practice Over Time

Spacing is the practice of distributing study or practice over time rather than concentrating it in a single block. The spacing effect, first documented by Ebbinghaus in the 1880s, is one of the most replicated findings in all of cognitive psychology. Spaced practice consistently produces better long-term retention than massed practice, often substantially so, even when total study time is held constant (Bjork & Bjork, 2011).

The mechanism connects directly to storage and retrieval strength. When you return to material after a delay, retrieval strength has dropped. The effort required to reconstruct the memory then produces a large increase in storage strength. The longer the gap (within reason), the greater the benefit of re-study. Teachers applying this principle space reviews of content across days, weeks, and months, rather than reviewing everything at the end of a unit. For a detailed practical guide to structuring spaced practice in your timetable, see the article on spaced practice.

Classroom example (Year 8 Science): A teacher covers chemical reactions in Week 1. Rather than leaving it until end-of-year revision, they include a 10-minute retrieval starter on chemical reactions in Week 5, Week 12, and Week 20. Each review is deliberately shorter than the initial lesson. Students who find the Week 5 starter difficult are not failing; they are experiencing the desirable difficulty of reduced retrieval strength, and the effort they invest in retrieving the information substantially increases their storage strength.
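For planning purposes, the review weeks in this example can be turned into calendar dates. The following is a minimal sketch: the function name `review_dates` and the start date are hypothetical, and it assumes every week is a teaching week, so school holidays would need to be adjusted for by hand.

```python
from datetime import date, timedelta

def review_dates(first_lesson, review_weeks):
    """Map school-week numbers (Week 1 = week of first_lesson) to calendar dates."""
    return {week: first_lesson + timedelta(weeks=week - 1) for week in review_weeks}

# Hypothetical start date; Weeks 5, 12 and 20 come from the example above.
schedule = review_dates(date(2025, 9, 1), [5, 12, 20])
for week, day in schedule.items():
    print(f"Week {week}: retrieval starter on {day.isoformat()}")
```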

Interleaving: Mixing Topics and Problem Types

Interleaving means practising different topics or problem types in a mixed rather than blocked sequence. Instead of completing twenty long-division problems followed by twenty fraction problems, students complete a mixed set: long-division, fraction, long-division, fraction, or even more varied sequences (Kornell & Bjork, 2008).

Interleaving is one of the least intuitive desirable difficulties because it demonstrably makes practice feel harder and produces worse immediate performance than blocked practice. Students who interleave problems will make more errors during practice. Teachers observing an interleaved practice session can easily mistake this for confusion or poor teaching. However, in tests conducted 24 hours or a week later, interleaved practice consistently produces better performance, sometimes substantially so (Kornell & Bjork, 2008).

The mechanism is discrimination learning. When problems of the same type are grouped together, students apply a single strategy repeatedly without needing to identify which strategy is appropriate. In interleaved practice, each problem requires students to identify the problem type first, then select and apply the correct method. This identification step is the additional cognitive work that strengthens both the knowledge of procedures and the ability to discriminate between problem types, which is precisely what is required in an exam or in applying knowledge to real situations.

Full guidance on building interleaved practice into schemes of work is available in the article on interleaving.

Classroom example (GCSE Mathematics, Year 10): A teacher is covering quadratics, completing the square, and the quadratic formula. Rather than blocking each method over separate lessons and then having students practise each in isolation, she creates weekly mixed problem sets that combine all three methods, along with problems from previous units (factorising, linear equations, and simultaneous equations). Students initially find these sets frustrating. After six weeks, they significantly outperform a comparable class that practised each method in isolation, not just on the three methods but on their ability to select the right approach unprompted.
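Building a mixed set from existing blocked question banks can be partly automated. The sketch below is one simple way to do it, taking one problem from each topic in turn; the function name `interleave` and the problem labels are hypothetical placeholders, not part of any published scheme.

```python
from itertools import zip_longest

def interleave(problem_sets):
    """Take one problem from each topic in turn until every problem is used."""
    mixed = []
    for round_of_problems in zip_longest(*problem_sets.values()):
        # Shorter topics run out first; skip the padding zip_longest adds.
        mixed.extend(p for p in round_of_problems if p is not None)
    return mixed

# Hypothetical problem labels for illustration only.
blocked = {
    "factorising": ["F1", "F2", "F3"],
    "completing the square": ["C1", "C2", "C3"],
    "quadratic formula": ["Q1", "Q2"],
}
mixed_set = interleave(blocked)
print(mixed_set)
```

A strict rotation like this is predictable, so students may learn to guess the method from a problem's position. Shuffling the order within each round would remove that cue, which matters because identifying the problem type is the point of the exercise.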

Retrieval Practice: Testing as a Learning Tool

Retrieval practice refers to the act of recalling information from memory rather than re-reading or reviewing it. The testing effect, which is the finding that retrieval practice produces stronger retention than restudying, is one of the most robust phenomena in memory science (Roediger & Karpicke, 2006). Bjork's contribution to this literature includes the understanding that retrieval is not merely a measurement tool; it is itself a learning event that substantially increases storage strength.

The mechanism is that retrieval requires reconstructing a memory trace from partial cues, and this reconstruction process strengthens and elaborates the stored representation. By contrast, re-reading or reviewing material that is visible to the student provides no retrieval challenge; retrieval strength is artificially elevated by the presence of the information, and storage strength gains nothing (Bjork & Bjork, 2011).

Low-stakes quizzing, flashcards, retrieval starters, brain dumps, and practice questions without notes are all forms of retrieval practice. The method matters less than the principle: students must retrieve information from memory, not just recognise it when prompted. For a full classroom implementation guide, see the article on retrieval practice.

Classroom example (Year 6 History): A teacher finishes a unit on the Second World War. Rather than asking students to read over their notes and create a mind map, she begins the next three lessons with a 5-minute blank-page recall activity: "Write down everything you can remember about causes of the Second World War. No notes." Students write independently, share with a partner to fill gaps, and the teacher reveals anything significant that was missed. This is more uncomfortable than reviewing notes, but retrieval of this kind produces substantially better retention after a delay than passive review (Roediger & Karpicke, 2006).

Generation: Creating Answers Before Being Told Them

Generation refers to the practice of requiring learners to produce an answer or solution before being shown the correct response, even when they are likely to produce an error or incomplete answer. The generation effect, documented by Slamecka and Graf (1978) and extended by Bjork and colleagues, shows that generated information is better remembered than read information (Bjork & Bjork, 2020).

The principle applies across contexts: students who attempt a problem before seeing the solution, who try to recall a vocabulary word before seeing it, or who generate an explanation before the teacher gives one will remember the target information better than students who simply read the correct answer. The difficulty of failing to generate the answer is itself the mechanism. The cognitive effort of searching memory and attempting to construct a response increases storage strength for the correct answer when it is subsequently provided.

This runs counter to the common teaching instinct to protect students from struggling. Allowing students to attempt a problem, produce an incomplete answer, and then encounter the correct solution is not careless instruction; it is applied learning science. The teacher's role shifts from gatekeeper of correct information to architect of productive failure.

Classroom example (Year 9 French): A teacher is introducing new vocabulary for jobs and occupations. Rather than displaying the French word and asking students to repeat it, she displays only the English word and asks students to write their best guess at the French equivalent, including any words they half-remember or cognates they can identify. After 90 seconds of individual effort, she displays the French words. Research on unsuccessful retrieval attempts suggests that students who guess wrong and then see the correct answer can remember the vocabulary better than students who were shown the French word from the start (Kornell, Hays & Bjork, 2009).

Variation: Changing the Conditions of Practice

Variation means practising the same skill or applying the same knowledge across varied contexts, surface forms, or problem formats. Rather than practising one type of essay question repeatedly, students practise under different question formats, different stimulus materials, or different time constraints. Rather than teaching a mathematical procedure with one class of problem, teachers present the same underlying concept across problems that look quite different on the surface (Bjork & Bjork, 2020).

The benefit of variation is transfer. When students encounter only one form of a problem, they often develop knowledge that is brittle: it works for that specific form but fails when the surface features change. Variation forces the learner to distinguish the deep structure of a problem from its surface features, which is what enables transfer to novel situations. This connects directly to cognitive load theory, which identifies schema abstraction (extracting general principles from specific instances) as the goal of effective instruction.

Classroom example (Year 7 English): A teacher wants students to understand how writers use sentence structure for effect. She could teach this using only newspaper headlines, practise it with newspaper headlines, and assess it using newspaper headlines. Instead, she uses examples from novels, speeches, advertising, poetry, and non-fiction reports. Each context presents the same underlying concept (sentence structure for effect) in a different surface form. Students develop a more transferable, flexible understanding because they have had to identify the concept across varied contexts, rather than recognising a familiar surface pattern.

The New Theory of Disuse: Why Forgetting Is Not the Enemy

One of Bjork's most provocative claims is that forgetting is not a bug in the memory system but a feature of an adaptive one. The 'New Theory of Disuse' (Bjork & Bjork, 1992) proposes that the human memory system has evolved to manage an extraordinarily large amount of stored information by making unused information less accessible over time. This is not storage decay, which would mean the information is lost. It is retrieval strength reduction, which means the information becomes harder to access without being erased.

The adaptive value of this system is that currently relevant information stays accessible while older, unused information becomes temporarily suppressed. If every memory maintained the same retrieval strength regardless of use, the most recently acquired knowledge would be constantly competing with older memories, and retrieval would be chaotic. The reduction of retrieval strength for low-use information is a useful filtering mechanism.

What makes this theory practically significant is the relationship between forgetting and re-learning. When retrieval strength for a piece of knowledge drops but storage strength is still present, re-learning that knowledge produces a disproportionately large increase in storage strength. This is the 'savings in relearning' effect, first documented by Ebbinghaus and thoroughly explained by Bjork's theoretical framework. Allowing students to partially forget material before reviewing it is not pedagogical negligence; it is the optimal condition for maximising long-term storage strength.

Teachers who interpret student forgetting between lessons as evidence that they 'didn't really learn it' are applying the wrong model of memory. A student who cannot recall something after two weeks has not necessarily failed to learn it; they have simply experienced the natural reduction in retrieval strength that the memory system applies to low-activation knowledge. Re-engaging that knowledge will restore and strengthen it, especially if the retrieval is effortful.

Performance vs Learning: The Most Important Distinction in Teaching

Soderstrom and Bjork (2015) make a distinction that is perhaps the most important single idea in the desirable difficulties literature: performance and learning are not the same thing. Performance is what a student can do during or immediately after instruction. Learning is the relatively permanent change in knowledge or skill that endures over time and transfers to new contexts. The problem is that performance is visible and learning is not.

This creates a systematic illusion. A lesson where students perform well (give correct answers, complete tasks, demonstrate fluency) feels effective. A lesson where students struggle, produce errors, and take longer to complete tasks feels less effective. But if the struggling lesson was employing desirable difficulties (spaced retrieval, interleaving, generation), it is producing substantially more learning than the fluent lesson, which may be producing only the performance illusion.

This illusion affects both students and teachers. Students, given a free choice of study method, consistently prefer massed practice, re-reading, and highlighting because these strategies produce high fluency during study, which feels like learning. Teachers, when observing lessons, tend to rate lessons with high student engagement and correct responses more positively than lessons with productive struggle and errors. Both judgements are, from a learning science perspective, likely to be wrong (Soderstrom & Bjork, 2015).

The educational implication is significant. Bjork and colleagues argue that standard assessment practice, particularly formative assessment that occurs immediately or shortly after teaching, is measuring performance rather than learning. A student who scores well on an end-of-lesson exit ticket may have learned very little; a student who scores poorly on a test two weeks later, but then retrieves the material successfully after a hint, may have strong storage strength that simply could not be accessed at that moment. The EEF Toolkit consistently rates metacognition and self-regulated learning as producing approximately seven months of additional progress (+7 months) precisely because teaching students to manage their own retrieval, spacing, and self-testing makes them better at learning rather than just at performing.

Comparing Bjork's Approach with Traditional Teaching

The contrast between what Bjork's research recommends and what typical classroom practice looks like is stark. The table below summarises the key differences.

| Traditional Practice | Bjork-Informed Practice | Mechanism |
| --- | --- | --- |
| Massed practice: teach a topic and practise it immediately | Spaced practice: return to topics after delays of days or weeks | Low retrieval strength at time of practice increases storage strength gains |
| Blocked practice: complete all problems of one type before moving on | Interleaved practice: mix problem types in practice sessions | Discrimination learning; forces identification of problem type, not just procedure application |
| Re-reading and review: students re-read notes or textbooks | Retrieval practice: students recall from memory without reference materials | Retrieval strengthens storage; re-reading with material present produces little encoding gain |
| Explain first, then practise: teachers give complete explanation before students attempt tasks | Generation before instruction: students attempt problems or recall before explanation | Generation effect; effort of attempted retrieval primes encoding of the correct answer |
| Consistent conditions: same format, context, and problem type in practice and assessment | Variable practice: vary formats, contexts, and problem features during learning | Schema abstraction; variation forces extraction of deep structure, enabling transfer |
| Minimise errors: scaffold to ensure high success rates during practice | Desirable errors welcome: allow productive failure before correction | Error generation activates retrieval competition, strengthening the correct answer when given |

This does not mean that traditional practices are entirely wrong. Direct instruction is well supported by evidence, and explicit teaching before retrieval practice is appropriate, particularly for novice learners who lack the prerequisite knowledge to generate useful responses. Bjork's framework is not an argument against instruction; it is an argument about the conditions under which practice produces the most durable results.

The Desirable Difficulties Framework

How Bjork's Research Connects to EEF and UK School Evidence

Bjork's work is primarily laboratory-based, conducted with university students in controlled experiments. Teachers in UK schools rightly want to know whether these findings hold in classrooms, with mixed-ability groups, across subjects, and under curriculum constraints.

The evidence from classroom-based research is encouraging. The EEF (Education Endowment Foundation) has funded multiple studies of retrieval practice in UK primary and secondary schools. Their 2021 review of the evidence for retrieval practice found consistent positive effects across subjects and year groups, with an effect size equivalent to approximately three to four months of additional progress when retrieval practice was used as part of routine classroom instruction. Spacing and interleaving have smaller classroom evidence bases, but the available studies show positive results (Weinstein et al., 2018).

The EEF Teaching and Learning Toolkit, which aggregates evidence across thousands of studies, assigns formative assessment strategies (which include retrieval practice) approximately four months of additional progress, and metacognition and self-regulated learning approximately seven months. Both of these are mechanisms through which desirable difficulties operate: students who understand why spacing and retrieval are beneficial are better at self-regulating their study than those who follow their intuitions, which, as Bjork's work shows, tend to be systematically wrong.

The Ebbinghaus forgetting curve is the empirical foundation for spacing, and Bjork's theory of why spacing works is the most complete account of this well-established phenomenon. Working memory research, particularly Baddeley's model, provides a complementary framework for understanding why desirable difficulties must be balanced against cognitive load: novice learners whose working memory is fully taxed by the surface demands of a task may not have cognitive capacity available to benefit from interleaving or generation effects. Appropriate scaffolding reduces cognitive load for novices, creating the conditions under which desirable difficulties can be introduced progressively.

Common Misconceptions About Bjork's Work

Bjork's ideas are increasingly cited in teacher training and CPD materials, but several misconceptions have attached themselves to the framework. Addressing these directly prevents misapplication in classrooms.

Misconception 1: Desirable difficulties work for all learners equally. They do not. Bjork's research has been conducted primarily with adults who have sufficient prior knowledge to engage in productive retrieval attempts. The generation effect requires enough prior knowledge to produce a plausible attempt; for absolute beginners who lack it, generation simply produces blank responses and no learning benefit. Retrieval practice is similarly limited for students with very low prior knowledge, because you cannot retrieve what was never encoded. Scaffolding and direct instruction must precede and support the introduction of desirable difficulties, particularly for younger learners or those with limited subject knowledge.

Misconception 2: Bjork argues against explicit teaching. He does not. Bjork's framework concerns the conditions for practice, review, and assessment. The generation effect works because a correct answer is subsequently provided; an attempt that is never followed by explicit teaching or feedback has no learning benefit. Bjork's research complements, rather than contradicts, direct instruction: teach explicitly, then create the conditions for effortful retrieval.

Misconception 3: Making lessons harder always helps. Difficulty is only desirable when it triggers the specific cognitive processes that increase storage strength (retrieval, discrimination, elaboration). Difficulty that arises from confusing presentation, lack of prerequisite knowledge, or unclear task design is simply undesirable difficulty: it imposes cognitive effort without producing learning gains. A badly designed worksheet is not a desirable difficulty.

Misconception 4: Bjork's research means students should never re-read notes. Re-reading is not always useless. It is most useful in the first pass through new material, when students are building a basic schema. The problem is when re-reading replaces retrieval practice during revision. Once material has been initially learned, re-reading produces poor returns compared to retrieval, and students who conflate fluent re-reading with learning are deceived by their own performance.

Misconception 5: Testing hurts low-confidence students. Some teachers worry that low-stakes testing will damage the confidence of students who frequently fail to retrieve material. The research literature, including classroom studies by Pooja Agarwal and colleagues, consistently shows that retrieval practice benefits low-attaining students when it is implemented correctly. That means keeping it low-stakes, normalising it as a routine learning tool rather than an evaluative one, and following it with feedback on what was not retrieved. What damages confidence is high-stakes retrieval with public consequences for failure, not retrieval practice itself.

Applying Bjork's Framework: Planning Principles for Your Classroom

Translating Bjork's research into classroom practice does not require wholesale curriculum redesign. The following planning principles are achievable within existing timetable and curriculum constraints.

Distribute reviews deliberately. When planning a unit, identify three or four moments after the main teaching phase where you will return to key content. These reviews should be spaced: one week after initial teaching, three weeks after, and again near an end-of-unit point. Each review should require retrieval, not just recognition. A five-minute starter that asks students to recall from memory, without reference to notes, is sufficient. The testing effect means this review is more efficient than re-teaching the same content.
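For teachers who plan units in a spreadsheet or a short script, the spacing pattern above can be sketched in a few lines of Python. This is only an illustration of the one-week, three-week, end-of-unit pattern, not a schedule Bjork prescribes. The function name `review_dates`, the two-day lead before the end of the unit, and the example dates are all assumptions made for the sketch.

```python
from datetime import date, timedelta


def review_dates(taught_on, unit_end, gaps_in_weeks=(1, 3), lead_days=2):
    """Return spaced retrieval-review dates for one piece of content.

    Hypothetical helper: reviews fall a fixed number of weeks after
    teaching, plus one final review shortly before the unit ends.
    """
    final_review = unit_end - timedelta(days=lead_days)
    # Spaced reviews at the chosen gaps after the initial teaching date.
    reviews = [taught_on + timedelta(weeks=w) for w in gaps_in_weeks]
    # Drop any spaced review that would land on or after the final one.
    reviews = [d for d in reviews if d < final_review]
    reviews.append(final_review)
    return reviews


# Example: content taught on 2 March 2026, unit ends on 3 April 2026.
for review in review_dates(date(2026, 3, 2), date(2026, 4, 3)):
    print(review.isoformat())
# 2026-03-09
# 2026-03-23
# 2026-04-01
```

Each date marks a five-minute retrieval starter, not a re-teach. The gaps are parameters precisely because the right spacing depends on the length of the unit and on professional judgement.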

Interleave practice during the middle of a unit. Once students have been taught at least two related topics or procedures, mix practice across both rather than practising each in isolation. Mixed problem sets are more difficult to create but produce substantially better transfer. This is especially valuable in Mathematics, Science, and Modern Foreign Languages, where procedural discrimination is central to the subject.
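Because mixed problem sets take longer to build by hand, a small script can do the mixing. The sketch below is one simple way to interleave, assuming questions are already grouped by topic: it shuffles within each topic and then takes one question from each topic in turn. The function name `interleaved_set` and the sample questions are hypothetical.

```python
import random
from itertools import zip_longest


def interleaved_set(problems_by_topic, seed=0):
    """Build a mixed problem set by cycling through the topics in turn.

    Hypothetical helper: returns a list of (topic, problem) pairs.
    """
    rng = random.Random(seed)  # fixed seed so a worksheet can be reproduced
    shuffled = []
    for topic, problems in problems_by_topic.items():
        items = [(topic, p) for p in problems]
        rng.shuffle(items)  # vary the order within each topic
        shuffled.append(items)
    mixed = []
    # Take one problem per topic per round; shorter topics simply run out.
    for round_items in zip_longest(*shuffled):
        mixed.extend(item for item in round_items if item is not None)
    return mixed


questions = {
    "area": ["A1", "A2", "A3"],
    "perimeter": ["P1", "P2", "P3"],
}
for topic, problem in interleaved_set(questions):
    print(topic, problem)
```

With equal numbers of questions per topic, consecutive questions never share a topic, so students must first decide which procedure applies before carrying it out. That discrimination step is the difficulty interleaving is meant to add.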

Use generation before explanation, where appropriate. Before teaching a new concept, ask students to attempt a related problem or recall a related idea from memory. The attempt does not need to be correct; the cognitive effort of the attempt primes encoding of the correct answer. A minute of silent independent thinking before an explanation costs almost nothing and produces a measurable benefit for retention.

Vary practice conditions. Avoid practising a skill always in the same format, context, or with the same materials. A student who has only ever written formal essays in response to printed prompts may not transfer their skill to a different format. Varied practice surfaces the deep structure of knowledge and supports the metacognition needed for self-regulated study.

Teach students about the performance-learning distinction. One of the most impactful things a teacher can do is explain directly to students why spacing and retrieval practice feel worse during study but produce better results. Students who understand the cognitive science behind their study strategies are more likely to choose effortful strategies over comfortable but ineffective ones. This is the bridge between Bjork's laboratory findings and the EEF's seven-month metacognition effect.

What Bjork's Research Does Not Tell Us

No account of Bjork's work should omit its limitations. Several are worth naming explicitly.

The majority of Bjork's foundational experiments were conducted with adult university students in laboratory settings, using word lists, paired associates, and mathematics problems. The ecological validity of these findings for primary-age children, for subjects with high affective demands (such as Personal, Social, Health and Economic education or Drama), or for learners with working memory difficulties is less well established. The desirable difficulties framework should be applied with professional judgement, not as a universal prescription.

The performance-learning distinction, while theoretically coherent, creates a practical problem for teachers: if current performance is a poor indicator of learning, how do you know whether students are actually learning? Bjork's answer is that delayed tests are better indicators than immediate tests, but this is difficult to implement systematically in time-pressured curricula. Teachers need to balance the ideal measurement conditions against the practical demands of formative assessment.

Finally, while desirable difficulties are well evidenced for factual and procedural knowledge, the evidence for complex, creative, or evaluative learning is thinner. Writing, artistic judgement, and disciplinary reasoning may require different conditions from the retrieval of mathematical procedures or historical facts. Bjork's framework is a powerful tool; it is not the only tool a teacher needs.

In your next lesson, identify one place where students currently review material by re-reading or copying notes, and replace it with a five-minute retrieval activity: blank paper recall, a low-stakes quiz without reference to notes, or a generation task before the day's explanation. Do not grade it. Use it as the starting point for that lesson's instruction.

5 Ways to Boost Long-Term Learning in Your Classroom

Further Reading: Key Papers on Bjork's Learning Theory

The following papers provide the primary evidence base for the desirable difficulties framework. They are listed in order of foundational importance and are recommended for teachers undertaking CPD, PGCE assignments, or leadership roles in curriculum planning.

Memory and Metamemory Considerations in the Training of Human Beings

29 citations

Bjork, R.A. (1994). In J. Metcalfe & A.P. Shimamura (Eds.), Metacognition: Knowing About Knowing. MIT Press.

This is the paper in which Bjork first used the term 'desirable difficulties' and outlined the core framework. Writing for a training and instruction context, Bjork distinguishes between conditions that support performance and those that support long-term retention and transfer. The paper connects storage and retrieval strength theory directly to practical training design and remains the clearest single introduction to the framework for teachers and practitioners.

Making Things Hard on Yourself, But in a Good Way: Creating Desirable Difficulties to Enhance Learning

520+ citations

Bjork, E.L. & Bjork, R.A. (2011). In M.A. Gernsbacher, R.W. Pew, L.M. Hough & J.R. Pomerantz (Eds.), Psychology and the Real World. Worth Publishers.

This is the most widely cited summary of the desirable difficulties literature and the best starting point for teachers new to the framework. Co-authored by Elizabeth and Robert Bjork, the chapter covers spacing, interleaving, testing, and generation effects with clear definitions and empirical support. It includes a direct discussion of why students and instructors misjudge what works, making it essential reading for anyone designing revision programmes or study skills interventions.

Learning Versus Performance: An Integrative Review

380+ citations

Soderstrom, N.C. & Bjork, R.A. (2015). Perspectives on Psychological Science, 10(2), 176–199.

This review paper provides the most thorough treatment of the performance-learning distinction in the literature, synthesising decades of evidence showing that performance during practice is a systematically unreliable index of durable learning. Soderstrom and Bjork review evidence from spacing, interleaving, testing, and generation experiments, demonstrating in each case that the condition producing better immediate performance is not the condition producing better delayed retention. This paper is directly relevant to how teachers design and interpret formative assessment.

Desirable Difficulties in Theory and Practice

180+ citations

Bjork, R.A. & Bjork, E.L. (2020). Journal of Applied Research in Memory and Cognition, 9(4), 475–479.

This 2020 paper revisits and updates the framework, addressing concerns about ecological validity and classroom applicability that had accumulated in the literature since 1994. The Bjorks respond to critiques, clarify the boundary conditions of each desirable difficulty, and address the particular challenge of interleaving in subjects where novice learners may lack the prerequisite knowledge to benefit from mixed practice. Teachers who have already read the 2011 chapter will find this a valuable complement, particularly the sections on when desirable difficulties are and are not appropriate.

Learning Concepts and Categories: Is Spacing the 'Enemy of Induction'?

430+ citations

Kornell, N. & Bjork, R.A. (2008). Psychological Science, 19(6), 585–592.

This paper reports the key experiments on interleaving and category learning that established why interleaved practice produces better discrimination and transfer than blocked practice. Using an art-style learning paradigm (participants categorise artists' paintings), Kornell and Bjork showed that interleaved study produced better generalisation to new examples, while participants nonetheless believed that blocked study had worked better for them. The subjective experience of interleaving as less effective is the central finding for teachers trying to persuade students to adopt mixed practice in their revision.

