Storage Strength and Retrieval Strength: Why Forgetting Helps Learning
February 19, 2026
Bjork's storage and retrieval strength theory explains why students forget after exams and why forgetting is a feature, not a bug. Practical planning guidance for spacing, retrieval practice, and diagnostic assessment.
Most teachers have experienced the following moment: a student performed well in last week's lesson, answered questions confidently, and appeared to understand the material. Three weeks later, the same student stares blankly at the same content as though they have never encountered it. The teacher feels the material must be retaught from scratch. This experience is so common that it is treated as an inevitable feature of school life. Bjork and Bjork (1992) argue it is nothing of the sort. It is a predictable consequence of confusing two independent properties of memory.
Key Takeaways
Two independent memory properties: Storage strength measures how deeply something is encoded; retrieval strength measures how accessible it is right now. A lesson that increases retrieval strength may do almost nothing to storage strength.
High retrieval blocks learning: When retrieval strength is already high, re-studying produces minimal gains in storage strength. Allowing some forgetting before returning to material produces dramatically stronger long-term retention.
Forgetting is functional: The memory system depresses retrieval strength for unused information to manage cognitive resources. This is adaptive, not a failure. It is also reversible through retrieval practice.
Performance misleads both parties: A student who performs well in a lesson has not necessarily learned. A student who struggles to recall material during a review has not necessarily forgotten. Both teachers and students routinely misread these signals.
The fix is structural: Spacing, interleaving, and retrieval practice are effective precisely because they exploit the storage/retrieval distinction. They work by deliberately reducing retrieval strength before re-study, not in spite of it.
[Figure: Storage Strength vs. Retrieval Strength: Knowing the Difference]
What Storage Strength and Retrieval Strength Mean
Bjork and Bjork (1992) proposed that every memory has two separable properties, each of which follows its own rules and responds differently to practice.
Storage strength is a measure of how thoroughly a piece of knowledge is encoded in long-term memory. It accumulates incrementally over time, and once established at a high level, it does not decay. This is an important point: you do not gradually lose well-stored knowledge. Storage strength is relatively permanent. It is also largely invisible, in the sense that you cannot introspect on your own storage strength for a given item of knowledge; you can only infer it from your ability to retrieve that knowledge under different conditions.
Retrieval strength is a measure of how accessible a memory is at a given moment. Unlike storage strength, retrieval strength fluctuates substantially. It is highest immediately after study or practice, drops sharply over hours and days without use, rises again with successful retrieval, and is sensitive to context: retrieval strength is typically higher in familiar environments, with familiar cues, and in low-stress conditions than in novel ones.
The critical insight is that these two properties are independent of each other. You can have high storage strength and low retrieval strength (the "knew it but couldn't recall it" experience in an exam). You can also have low storage strength and high retrieval strength (you can answer the question easily right after the lesson, but the knowledge will be gone within days). Each state is easy to misread from the outside: the first looks like genuine forgetting and the second looks like secure learning, yet they have completely different implications for what a teacher should do next.
The Four Quadrants of Memory Knowledge
A useful way to understand the theory is to arrange the two properties on two axes, producing four combinations. Each quadrant corresponds to a recognisable situation in the classroom.
| Quadrant | Storage Strength | Retrieval Strength | What This Looks Like | Teacher Response |
| --- | --- | --- | --- | --- |
| 1 | High | High | Student recalls fluently; knowledge is durable and accessible. This is the target state for key content. | Maintain with widely spaced retrieval; move on to new material. |
| 2 | High | Low | Student cannot recall it at the moment, but knowledge is solidly encoded. Seems like forgetting; is actually temporary inaccessibility. Common after a well-taught unit with no subsequent retrieval. | A single retrieval practice event will restore access rapidly. Do not reteach from scratch. |
| 3 | Low | High | Student can answer correctly right now, but the knowledge is shallowly encoded. It will be gone within days. This is the cramming quadrant. Also the "seems to understand in the lesson" quadrant. | The student needs spaced retrieval practice, not more exposure to the content. Additional input will not fix shallow encoding. |
| 4 | Low | Low | Student neither recalls it nor has it stored durably. Genuine gap: either the content was never taught, was taught inaccessibly, or prerequisite knowledge is missing. | Reteach. Check prerequisites. Address cognitive load barriers before expecting encoding to occur. |
The practical value of this table is that it forces a diagnostic question. When a student fails to recall something, the instinctive teacher response is to reteach it (Quadrant 4 response). But if the student is actually in Quadrant 2, reteaching is wasteful. A five-minute retrieval activity would restore access far more efficiently, and the act of retrieval would also increase storage strength further, moving the student securely into Quadrant 1.
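The diagnostic question can be written as a small decision rule. The sketch below is illustrative only: storage strength cannot be observed directly, so the 0 to 1 scores, the `THRESHOLD` cut-off, and the names `Diagnosis` and `diagnose` are all assumptions introduced here, not part of Bjork and Bjork's account. The teacher responses are shortened from the table above.

```python
from dataclasses import dataclass

# Hypothetical cut-off on a 0-1 scale; the theory gives no numeric threshold.
THRESHOLD = 0.5


@dataclass(frozen=True)
class Diagnosis:
    quadrant: int
    response: str


def diagnose(storage: float, retrieval: float,
             threshold: float = THRESHOLD) -> Diagnosis:
    """Map a teacher's estimates of storage and retrieval strength to a quadrant."""
    high_storage = storage >= threshold
    high_retrieval = retrieval >= threshold
    if high_storage and high_retrieval:
        return Diagnosis(1, "Maintain with widely spaced retrieval; move on.")
    if high_storage:
        return Diagnosis(2, "Run a brief retrieval activity; do not reteach.")
    if high_retrieval:
        return Diagnosis(3, "Schedule spaced retrieval practice, not more exposure.")
    return Diagnosis(4, "Reteach and check prerequisites.")
```

For example, `diagnose(0.9, 0.2)` returns Quadrant 2: the estimate of storage is high, the estimate of current access is low, so a retrieval prompt comes before any reteaching.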
The New Theory of Disuse Explained
Bjork and Bjork (1992) called their account the 'New Theory of Disuse' to distinguish it from older models, which held that memories simply decay or fade through lack of use, in the way that an unused path through a field gradually becomes overgrown.
The new theory proposes something quite different. Storage strength, once built, does not decay. What decays is retrieval strength, and this decay is not accidental but adaptive. The human memory system manages an enormous number of stored representations. If every stored item were equally and permanently accessible, the system would become unworkable: every attempt to retrieve one thing would be swamped by interference from thousands of related items. The depression of retrieval strength for infrequently used information is the system's way of managing this interference. Think of it less as forgetting and more as filing: the information moves from the top of the pile to a drawer, but it has not been discarded.
The practical consequence of this model is that forgetting is not the enemy of learning. It is a necessary intermediate state that the learning system passes through on the way to durable encoding. A memory that has never been retrieved under conditions of reduced retrieval strength has never been genuinely tested. Its storage strength remains uncertain. A memory that has been allowed to become partially inaccessible and has then been successfully retrieved has demonstrated genuine storage strength and has had that storage strength reinforced by the retrieval event itself (Bjork, 1994).
This reframing has a direct implication for how teachers think about revision and review. The common approach of reviewing material when it is still fresh (high retrieval strength) produces the comfortable experience of fluent recall, but adds little to long-term retention. Reviewing material after a gap (low retrieval strength) feels harder and produces more errors, but those errors and the effort of retrieval are precisely the conditions that drive storage strength upward.
Why Cramming Creates High Retrieval, Low Storage
Cramming is the most familiar illustration of the storage/retrieval distinction, and it shows the theory cleanly. A student who revises for an examination by reading through all their notes the night before will enter the examination with very high retrieval strength for the material. They will feel prepared. The information is highly accessible. They may perform adequately in the examination if it occurs within twelve to twenty-four hours.
However, because the student has been reviewing material with already-high retrieval strength, almost nothing has been done to increase storage strength (Bjork & Bjork, 1992). Within days of the examination, retrieval strength decays sharply, and because storage strength is low, the information becomes genuinely inaccessible. This is the mechanism behind the phenomenon that teachers describe as "forgetting everything after the exam." It is not that students have stopped caring about the subject, or that their memory has mysteriously erased itself. The content was encoded shallowly and the cramming strategy did nothing to change that.
The contrast with spaced retrieval practice is instructive. A student who retrieves the same information on five occasions, spread across several weeks, with gaps between each retrieval attempt, will have built substantially higher storage strength. They may recall less fluently in the day or two immediately before the examination (retrieval strength temporarily low after the last review gap), but they will retain the knowledge for months or years after the examination, because the information is genuinely encoded.
Classroom example (Year 11 Biology, GCSE): A teacher notices that students who revised enzyme function by rereading notes the night before a test scored well on the test but could not answer questions about enzymes six weeks later during a practice paper. She introduces a low-stakes retrieval quiz on enzyme function every three weeks throughout the year, using questions the students answer from memory without notes. Students initially find these quizzes uncomfortable. By March, they can answer enzyme questions reliably, even for content taught in September. The retrieval strength fluctuates between quizzes, but storage strength has been built across multiple retrieval events.
Forgetting Curves and Storage Strength
Hermann Ebbinghaus documented the forgetting curve in 1885, showing that retention of new information drops sharply over the first hours and days after learning, then levels off. Bjork's framework provides the theoretical machinery that explains what Ebbinghaus observed empirically.
Ebbinghaus was measuring retrieval strength: his forgetting curve shows retrieval strength declining from 100% at the time of study to roughly 20% after a month without re-exposure. Bjork's contribution is to distinguish this observable retrieval strength curve from the less visible storage strength curve. Storage strength, in Bjork's model, does not track the forgetting curve. It is built during retrieval events, particularly those that occur when retrieval strength is low.
This distinction matters for practical planning. If you look only at the Ebbinghaus curve, the implication seems to be that you should re-study material as frequently as possible to keep retrieval strength from falling. Bjork's theory reveals that this is exactly wrong. Allowing retrieval strength to fall, then practising retrieval at the trough of the forgetting curve, produces the largest possible increase in storage strength. The goal is not to prevent the forgetting curve from dropping. The goal is to exploit the drop by timing retrieval practice to coincide with moments of reduced retrieval strength (Bjork & Bjork, 1992).
A practical implication is that the optimum spacing of review sessions is not constant. The first review after learning should occur relatively soon (within one to two days), when storage strength is low and retrieval strength has dropped to a level where retrieval requires effort but is still achievable. Subsequent reviews should be spaced further apart, because each successful retrieval event increases storage strength and slows the subsequent decay of retrieval strength. This expanding-interval pattern is the basis of spaced practice systems. For a detailed guide to implementing this in your classroom, see the article on spaced practice.
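An expanding-interval pattern of this kind is straightforward to generate. The following is a minimal sketch, assuming a first gap of two days (the upper end of the range suggested above) and a doubling of the gap after each review. The doubling factor and the function name `expanding_schedule` are illustrative assumptions; the theory does not prescribe specific intervals.

```python
from datetime import date, timedelta


def expanding_schedule(taught_on: date, reviews: int = 4,
                       first_gap_days: int = 2, growth: int = 2) -> list[date]:
    """Return review dates whose gaps widen after each review.

    first_gap_days and growth are illustrative defaults, not research-derived values.
    """
    dates = []
    gap = first_gap_days
    current = taught_on
    for _ in range(reviews):
        current = current + timedelta(days=gap)
        dates.append(current)
        gap *= growth  # each later gap is longer than the one before
    return dates


# Example: content taught on 5 January 2026, with gaps of 2, 4, 8 and 16 days.
review_dates = expanding_schedule(date(2026, 1, 5))
```

In practice the gaps would be adjusted to the timetable and to how students actually perform at each review, rather than fixed in advance.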
Why Re-Study Fails at High Retrieval Strength
There is an asymmetry in the storage/retrieval relationship that has direct implications for lesson design. Bjork's research shows that the benefit of a study or practice event is inversely related to the current retrieval strength of the material being studied. When retrieval strength is high, a study event produces a small gain in storage strength. When retrieval strength is low, the same study event produces a much larger gain.
This asymmetry means that massed re-study is self-defeating. A student who reads through a chapter, then immediately re-reads it, gains little from the second reading because retrieval strength is still maximal from the first. The same student who reads the chapter, waits a day, then attempts to recall the main points from memory (with retrieval strength now reduced) will gain substantially more from that retrieval event than from any amount of immediate re-reading.
The practical implication is that the structure of practice matters more than the quantity. Fifteen minutes of spaced retrieval distributed over a week produces more durable learning than an hour of massed re-reading on a single evening. This finding has been replicated across subjects, age groups, and types of knowledge, from vocabulary learning in language classes to procedural skills in mathematics and science (Soderstrom & Bjork, 2015).
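The asymmetry can be made concrete with a toy simulation. This is not Bjork and Bjork's formal model: the exponential decay, the gain rule, and every constant below are assumptions chosen only to reproduce the qualitative pattern described above, in which storage gains shrink as current retrieval strength rises and stored knowledge slows the loss of access.

```python
import math


def simulate(event_days, horizon_day, base_decay=1.0, gain=0.3):
    """Toy model: return (storage, retrieval) strength at horizon_day.

    event_days are the days on which study or retrieval events occur.
    All functional forms and constants are illustrative assumptions.
    """
    storage, retrieval, last_day = 0.0, 0.0, 0.0
    for day in sorted(event_days):
        # Retrieval strength decays between events; higher storage slows the decay.
        retrieval *= math.exp(-base_decay * (day - last_day) / (1.0 + 10.0 * storage))
        # The storage gain is larger when retrieval strength is currently low.
        storage += gain * (1.0 - retrieval) * (1.0 - storage)
        # Any study or retrieval event restores full accessibility for the moment.
        retrieval = 1.0
        last_day = day
    # Decay from the last event up to the day we measure.
    retrieval *= math.exp(-base_decay * (horizon_day - last_day) / (1.0 + 10.0 * storage))
    return storage, retrieval


# Three back-to-back readings on day 0 versus events on days 0, 1 and 5.
massed = simulate([0, 0, 0], horizon_day=30)
spaced = simulate([0, 1, 5], horizon_day=30)
print(f"massed storage={massed[0]:.2f}, spaced storage={spaced[0]:.2f}")
```

In this sketch the second and third massed readings add nothing to storage, because retrieval strength is still at its maximum when they happen. The spaced events each arrive after some decay, so each one adds to storage.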
Classroom example (Year 9 French Vocabulary): A teacher sets homework using a vocabulary learning application that shows students words they already know at high frequency alongside new words. Students find this enjoyable: they are mostly getting correct answers because retrieval strength is high for known words. The teacher replaces this with a distributed retrieval task: on day one, students learn twelve new words; on day two, they retrieve all twelve from memory before seeing them; on day five, they retrieve again; on day fourteen, they retrieve again. The second approach produces slower apparent progress in weeks one and two but substantially better retention at six weeks.
Why Students and Teachers Misread Performance
One of the most practically significant implications of Bjork's framework is that it reveals the gap between performance and learning. Performance is observable: it is what a student can do right now, in this lesson, under current conditions. Learning is the durable change in knowledge or skill that persists over time and transfers to new contexts. Performance during a lesson correlates poorly with long-term learning (Soderstrom & Bjork, 2015).
This means that the signals teachers typically use to assess whether students have learned something are unreliable. A student who can answer questions fluently during a lesson has high retrieval strength for the material right now. That high retrieval strength may reflect genuine high storage strength (Quadrant 1: excellent), or it may reflect the recent exposure to the material (Quadrant 3: fragile). The teacher cannot tell from the in-class performance which quadrant the student is in.
The converse is equally important. A student who struggles to answer a retrieval question three weeks after the lesson is not necessarily in Quadrant 4 (genuine gap requiring reteaching). They may be in Quadrant 2 (high storage, low retrieval), and a brief retrieval practice event will restore access quickly. Teachers who treat every retrieval failure as evidence of insufficient learning will over-reteach, which is time-inefficient and misses the opportunity to use the retrieval failure itself as a productive learning event.
This connects to the broader framework of formative assessment. Effective formative assessment should distinguish between temporary inaccessibility and genuine gaps. Asking students to retrieve information from memory under conditions of reduced retrieval strength is itself a more valid assessment than asking them to perform immediately after teaching.
Working Memory and Retrieval Reconstruction
Understanding why retrieval practice builds storage strength requires a brief look at what happens at the level of working memory. When a student attempts to retrieve a memory under conditions of reduced retrieval strength, working memory is recruited to search long-term memory, to activate partial cues, and to reconstruct the target representation from fragmentary traces.
This reconstruction process is cognitively demanding and is the key mechanism. When the answer is present in the environment (as it is during re-reading or note-reviewing), working memory does not need to perform this reconstruction. The student recognises the information rather than recalling it. Recognition and recall are distinct cognitive processes: recognition requires only that the presented information activates the stored representation, while recall requires the reverse process, constructing a representation from internal cues.
The effort of recall under reduced retrieval strength is what drives storage strength upward. This is also why simply making material more vivid, colourful, or interesting does not reliably increase long-term retention. Visual presentation affects encoding quality (which is the domain of cognitive load theory), but it does not address the retrieval dynamics that determine whether encoded information becomes durably stored. Storage strength is built through retrieval, not through re-exposure, regardless of how well-designed that re-exposure is.
Classroom example (Year 7 History): A teacher creates a beautiful and well-organised revision display on the causes of the Norman Conquest. Students enjoy consulting it and can answer questions about it confidently while it is on the wall. When the display is removed and students are asked to recall the causes from memory two weeks later, performance drops sharply. A parallel class instead uses five-minute brain dumps at the start of three subsequent lessons, writing everything they remember from memory. Their recall two weeks later is significantly better than the class with the display, despite (or because of) the greater apparent difficulty of the task.
[Figure: The Forgetting-Learning Loop: How We Build Long-Term Memory]
How Desirable Difficulties Exploit This Distinction
Bjork coined the term 'desirable difficulties' to describe conditions that slow apparent progress but increase long-term retention and transfer (Bjork, 1994). Each of the main desirable difficulties works by exploiting the storage/retrieval distinction in a specific way.
Spaced practice works by allowing retrieval strength to drop between practice sessions. The gap is not wasted time; it is the condition that makes the subsequent retrieval event maximally productive for storage strength.
Interleaving works by preventing students from building retrieval strength for one problem type through blocked repetition. Instead, students must identify which approach to use before applying it, which requires genuinely accessing stored knowledge rather than continuing a just-established retrieval pathway.
The testing effect is the finding that testing produces better retention than re-studying, even when the test produces errors. It works directly on storage strength: each retrieval attempt, successful or not, reconstructs and reinforces the memory trace. Even a failed recall attempt primes subsequent learning of the correct answer.
Desirable difficulties as a family of strategies are united by this mechanism: they all introduce conditions that reduce retrieval strength temporarily in order to create the conditions under which storage strength can be substantially increased. Understanding this underlying logic helps teachers apply the principles flexibly rather than following rigid recipes.
Interleaving deserves closer examination because it is the desirable difficulty teachers find most counterintuitive and students find most uncomfortable. In blocked practice, a student completes ten problems of the same type, then ten of a different type. In interleaved practice, problem types are mixed. Blocked practice produces better immediate performance; interleaved practice produces better retention and transfer (Kornell & Bjork, 2008).
The storage/retrieval framework explains why. During blocked practice, the student solves the first problem of a type, establishing the retrieval pathway. Problems two through ten travel that same pathway with high (and rising) retrieval strength, so each adds minimal storage strength. During interleaved practice, retrieval strength for any given approach drops between instances because other problem types intervene. Reinstating the retrieval pathway is a genuine storage-strength-building event.
The practical difficulty is that students experiencing interleaved practice will often tell teachers they are confused. Their current performance is visibly worse, and this is uncomfortable for both parties. The research evidence is clear that this discomfort is not only acceptable but necessary. Kornell and Bjork (2008) found that students consistently preferred blocked practice and rated it as more effective, even after their own test results demonstrated the opposite. This is the metacognitive illusion created by the performance/learning distinction.
Classroom example (Year 10 Mathematics, GCSE): A teacher is covering simultaneous equations, quadratics, and inequalities. Rather than practising each topic in separate blocked sets, she creates a weekly mixed problem set combining all three alongside earlier topics (linear equations, rearranging formulae). Students initially find the sets frustrating and request separate topic sheets. Six weeks later, they significantly outperform a parallel class on both individual topics and on unseen problems combining elements of more than one topic. The retrieval difficulty of identifying the correct method before applying it is precisely what has built stronger storage for each approach.
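A mixed problem set like the one in this example can be assembled mechanically. The sketch below takes one problem from each topic in turn, so that consecutive problems differ in type when every topic supplies the same number of problems. The function name `interleave` and the placeholder problem labels are hypothetical.

```python
from itertools import zip_longest


def interleave(problems_by_topic: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Round-robin across topics, returning (topic, problem) pairs.

    With equal-sized topic lists, no two consecutive problems share a topic.
    With unequal lists, the tail of the set may repeat the longest topic.
    """
    labelled = [[(topic, problem) for problem in problems]
                for topic, problems in problems_by_topic.items()]
    mixed = []
    for round_ in zip_longest(*labelled):
        # zip_longest pads shorter topics with None; skip the padding.
        mixed.extend(item for item in round_ if item is not None)
    return mixed


# Placeholder labels standing in for real questions.
weekly_set = interleave({
    "simultaneous equations": ["S1", "S2"],
    "quadratics": ["Q1", "Q2"],
    "inequalities": ["I1", "I2"],
})
```

A fixed rotation is predictable, which lets students anticipate the method. A teacher might vary the topic order from week to week so that identifying the problem type remains part of the task.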
Teachers using interleaving need to explain the rationale to students explicitly. Without this explanation, students interpret the difficulty as evidence that they are failing, rather than as evidence that the practice design is working. Sharing the storage/retrieval distinction is one way to do this, and is addressed in the section below.
How to Explain This to Students
Students arrive in lessons with intuitive theories about memory and learning that are largely incorrect. These intuitive theories lead them towards ineffective study strategies: rereading notes, highlighting, and massed practice the night before an assessment. Sharing the storage/retrieval distinction directly with students is an act of metacognitive instruction that has measurable effects on their study behaviour (Bjork & Bjork, 2011).
The explanation does not need to be complex. A straightforward classroom version runs as follows:
"Your memory has two separate dials. One is how well something is stored, deep down. The other is how easy it is to get it out right now. The problem is that studying something when you can already get it out easily does almost nothing to store it more deeply. That's why rereading your notes before a test feels helpful but often isn't. Your brain already knows the information is there, so it doesn't bother making a stronger copy. What actually stores things deeply is retrieving them when it's hard. When you have to struggle to get something out of your memory, that's when your brain makes the strongest copy."
This explanation, delivered in age-appropriate language, prepares students for the experience of interleaved or spaced practice. It shifts the meaning of difficulty during practice from "I don't understand this" to "my brain is building a stronger memory right now." This reframing is not motivational rhetoric; it is an accurate description of the cognitive process, and it has been shown to reduce student resistance to desirable difficulty interventions (Soderstrom & Bjork, 2015).
For younger students, a physical analogy can help. Ask them to imagine that remembering something is like lifting a weight. Lifting a weight that is already in your hands (high retrieval strength) is easy but does not build strength. Picking up a weight that has been put down and feels heavy (low retrieval strength) is harder but builds strength faster. This metaphor captures the key asymmetry without requiring students to understand the theoretical framework.
What This Theory Does Not Tell Us
Bjork's framework is powerful, but it is important for teachers to understand its limits. The theory describes the relationship between storage strength, retrieval strength, and practice, but it does not specify what constitutes sufficient prior knowledge for retrieval practice to work. If a student has genuinely never encountered a concept, or lacks prerequisite schema structures, retrieval practice will produce confusion rather than learning. The theory presupposes that there is something stored to retrieve, even weakly.
The research base is also more robust for some types of knowledge than others. The spacing and testing effects have been replicated extensively for factual and procedural knowledge: vocabulary, historical facts, mathematical procedures, scientific terminology. The evidence for complex, conceptual knowledge (constructing arguments, evaluating evidence, generating creative responses) is more mixed, though there is no evidence that spacing and retrieval practice are harmful for these types of learning either (Soderstrom & Bjork, 2015).
The theory does not directly address motivation. Students who are disengaged from a subject, or who have significant anxiety about performance, may not benefit from desirable difficulties in the same way, because the emotional processing involved in high-stakes retrieval attempts can interfere with the cognitive mechanisms the theory describes. Teachers working with students in this situation may need to address motivation and anxiety before the full benefits of retrieval practice become available. This is where self-regulated learning frameworks become relevant alongside the memory science.
Connecting to EEF Evidence and Classroom Impact
The Education Endowment Foundation's Teaching and Learning Toolkit rates metacognition and self-regulation at an average impact of seven months' additional progress. Much of the evidence base behind that estimate involves students learning to use spacing, retrieval practice, and elaborative interrogation: strategies that work precisely because they exploit the storage/retrieval distinction.
The EEF evidence on formative assessment (four months' average impact) also connects to this framework. Formative assessment designed around retrieval rather than recognition gives teachers more accurate information about storage strength, not just current retrieval strength. A teacher who uses a low-stakes retrieval quiz two weeks after teaching a topic is measuring something closer to genuine learning than a teacher who asks questions during the lesson when retrieval strength is still high.
This matters for planning. If you want to know whether students have learned something, ask them to retrieve it from memory under conditions of reduced retrieval strength: at least a day after teaching, without access to notes, in a format that requires recall rather than recognition. The results will be less impressive and more accurate than a lesson-end check. They will also be more useful for deciding what to do next, because they distinguish Quadrant 2 students (who need a retrieval prompt) from Quadrant 4 students (who need reteaching).
Common Study Strategies Compared
Most study advice that students receive focuses on increasing retrieval strength without touching storage strength. Understanding this distinction allows teachers to evaluate common strategies directly.
| Strategy | Effect on Retrieval Strength | Effect on Storage Strength | Verdict |
| --- | --- | --- | --- |
| Rereading notes | Raises temporarily | Minimal gain | Poor long-term return |
| Highlighting | Raises slightly during activity | Minimal to no gain | No evidence of benefit over rereading |
| Summarising | Raises during activity | Small to moderate gain if done from memory | Better if summary is written without notes |
| Mind mapping from notes | Raises during activity | Small gain | Better if done from memory (then becomes retrieval practice) |
| Retrieval practice (flashcards, free recall, quizzing) | Rises after each attempt | Large gain, especially when retrieval is effortful | Most effective single strategy |
| Spaced retrieval practice | Fluctuates between sessions; rises after each | Very large cumulative gain | Most effective overall approach |
| Practice tests under exam conditions | Variable (stress can suppress retrieval) | Large gain, particularly after feedback | Effective, especially with corrective feedback |
The table reveals a common pattern: strategies that involve actively retrieving information from memory (rather than reviewing it in front of you) consistently produce larger gains in storage strength. The single most impactful change teachers can make to their students' study habits is to shift them from recognition-based revision (reading, highlighting, reviewing) to recall-based revision (blank-page recall, flashcards, practice questions without notes).
Why Students Forget Everything After an Exam
The experience of students forgetting all examination content within days of the exam is a diagnostic signal, not a mystery. Using Bjork's framework, the mechanism is completely clear.
Examination preparation typically involves students working through content repeatedly in the weeks before an exam, with the goal of being able to answer questions correctly on the day. This is a legitimate goal and the strategy succeeds: retrieval strength peaks around the time of the examination. However, if that preparation consisted primarily of rereading, summarising, and reviewing material that was already familiar, then each revision session was conducted under conditions of relatively high retrieval strength. The gain in storage strength from each session was therefore small. Retrieval strength was successfully maintained; storage strength was not substantially built.
Within one to two weeks of the examination, retrieval strength decays (as it always does without use), and because storage strength is modest, the material crosses below the threshold of accessible recall. This is not a failure of memory, intelligence, or effort. It is the predictable outcome of using the wrong practice strategy.
The fix is not to revise more. It is to revise differently. Students who prepare for examinations using spaced retrieval practice spread across weeks and months, deliberately allowing retrieval strength to fall between sessions, will retain the material for months and years after the examination. They will also perform better on the examination itself, because they will have built sufficient storage strength that even under the moderately elevated stress of examination conditions, retrieval strength remains adequate.
A Practical Planning Guide for Teachers
The storage/retrieval framework translates into a small number of practical planning decisions that can be applied across any subject.
First, build retrieval events into your scheme of work rather than treating review as an end-of-unit activity. A minimum effective dose is three spaced retrieval practice events for each major piece of content, timed to coincide with periods of reduced retrieval strength. For content taught in Week 1, a retrieval event in Week 2, another in Week 5, and a third in Week 10 will produce substantially better retention than three revision lessons in Weeks 9, 10, and 11.
Second, use the four quadrants as a diagnostic lens. Before deciding whether to reteach, ask whether a retrieval event might restore access first. If students struggle with a retrieval starter, give them a little time and a few retrieval cues before judging what they know. Students in Quadrant 2 will often recall more than they initially think.
Third, teach the storage/retrieval distinction explicitly to your students, at an age-appropriate level. Students who understand why difficulty is productive during practice are better placed to persist with effortful revision strategies rather than defaulting to comfortable but ineffective ones. This is a direct application of self-regulated learning principles and requires no specialist training to implement.
Fourth, align your formative assessment practices to estimate storage strength rather than momentary retrieval strength. Storage strength cannot be observed directly, so the closest proxy is recall after a delay: ask students to retrieve content from memory at least a day after it was last taught. Use low-stakes conditions to reduce the anxiety that can suppress retrieval strength independently of storage strength.
In your next lesson, identify one piece of content that you taught at least one week ago and design a five-minute blank-page retrieval activity around it. Ask students to write everything they remember, without notes or prompts. Observe their responses using the four-quadrant lens: who recalls it fluently (Quadrant 1), who struggles but gets there with effort (Quadrant 2), and who has genuinely no access (Quadrant 4). Let the data determine whether your next move is to maintain, prompt, or reteach.
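The sorting step in that activity can be written down as a small lookup, which some teachers may find useful if they record observations in a spreadsheet or notebook. The sketch below is a minimal illustration. The three observation labels, the student names, and the rule of taking the most common group as the whole-class move are all assumptions made for this example, not part of the framework itself.

```python
from collections import Counter

# What the teacher observed -> (quadrant, next move), following the lens above.
ACTIONS = {
    "fluent": ("Quadrant 1", "maintain"),
    "effortful": ("Quadrant 2", "prompt"),
    "no_access": ("Quadrant 4", "reteach"),
}


def triage(observations):
    """Group students by the next move their blank-page recall suggests."""
    groups = {"maintain": [], "prompt": [], "reteach": []}
    for student, observed in observations.items():
        _quadrant, action = ACTIONS[observed]
        groups[action].append(student)
    return groups


def whole_class_move(groups):
    """Assumed rule of thumb: follow the largest group."""
    counts = Counter({action: len(names) for action, names in groups.items()})
    return counts.most_common(1)[0][0]


# Placeholder observations from one five-minute retrieval activity.
observations = {
    "Student 1": "fluent",
    "Student 2": "effortful",
    "Student 3": "effortful",
    "Student 4": "no_access",
}

groups = triage(observations)
print(groups)
print("Whole-class move:", whole_class_move(groups))
```

The per-student groups matter more than the whole-class label: a few students in the reteach group may need a short intervention even when the class as a whole only needs a prompt.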
---
From Performance to Learning: 3 Structural Classroom Fixes
Further Reading: Key Papers on This Topic
Bjork, R.A. & Bjork, E.L. (1992) "A new theory of disuse and an old theory of stimulus fluctuation"
The foundational paper introducing the storage/retrieval distinction and the New Theory of Disuse. Bjork and Bjork argue that retrieval strength decays with disuse while storage strength does not, and that this asymmetry explains why spaced practice and retrieval practice are more effective than massed study. Essential reading for any teacher wanting to understand the theoretical basis of desirable difficulties.
Soderstrom, N.C. & Bjork, R.A. (2015) "Learning versus performance: An integrative review"
This review paper synthesises decades of research distinguishing performance during learning (current retrieval strength) from actual learning (storage strength). Soderstrom and Bjork examine why the two dissociate and what conditions create the largest gap between them. Directly relevant to teachers who want to understand why in-class performance is a poor predictor of long-term retention.
Kornell, N., Bjork, R.A. & Garcia, M.A. (2011) "Why tests appear to prevent forgetting"
Kornell, Bjork and Garcia propose a bifurcation model of the testing effect. A test without feedback splits items into two groups: those successfully retrieved gain considerable memory strength, while those not retrieved stay weak. Restudying, by contrast, strengthens every item but by a smaller amount. The model explains why the advantage of testing over restudying appears to grow as the delay before the final test lengthens, with implications for how teachers design low-stakes quizzes.
Storm, B.C., Bjork, R.A. & Storm, J.C. (2010) "Optimizing retrieval as a learning event"
Storm, Bjork and Storm explore the conditions under which retrieval is most productive as a learning event, focusing on the relationship between retrieval difficulty and subsequent retention. Their findings support the counterintuitive claim that harder retrieval (lower retrieval strength at the time of testing) produces stronger long-term storage, provided the retrieval is ultimately successful.
Bjork, E.L. & Bjork, R.A. (2011) "Making things hard on yourself, but in a good way"
A readable synthesis paper aimed partly at practitioners, in which Elizabeth and Robert Bjork review the full range of desirable difficulties (spacing, interleaving, testing, variation) through the lens of the storage/retrieval distinction. The paper includes a section on metacognitive illusions and why learners consistently choose less effective study strategies. A strong starting point for teachers new to the Bjork research programme.
Why Students Forget Everything After an Exam
When students forget most examination content within days of the exam, that is a diagnostic signal, not a mystery. Bjork's framework makes the mechanism clear.
Examination preparation typically involves students working through content repeatedly in the weeks before an exam, with the goal of being able to answer questions correctly on the day. This is a legitimate goal and the strategy succeeds: retrieval strength peaks around the time of the examination. However, if that preparation consisted primarily of rereading, summarising, and reviewing material that was already familiar, then each revision session was conducted under conditions of relatively high retrieval strength. The gain in storage strength from each session was therefore small. Retrieval strength was successfully maintained; storage strength was not substantially built.
Within one to two weeks of the examination, retrieval strength decays (as it always does without use), and because storage strength is modest, the material crosses below the threshold of accessible recall. This is not a failure of memory, intelligence, or effort. It is the predictable outcome of using the wrong practice strategy.
The fix is not to revise more. It is to revise differently. Students who prepare for examinations using spaced retrieval practice spread across weeks and months, deliberately allowing retrieval strength to fall between sessions, are far more likely to retain the material for months and years after the examination. They are also likely to perform better on the examination itself: having built substantial storage strength, their retrieval strength remains adequate even under the moderately elevated stress of examination conditions.
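The mechanism described in this section can be made concrete with a toy simulation. The sketch below is an illustration only, not Bjork and Bjork's formal model: the decay rule, the gain rule, the parameters `tau0` and `gain`, the function names, and the two revision schedules are all assumptions chosen to mirror the prose. Retrieval strength decays with disuse (more slowly when storage strength is high), and each revision session adds more storage strength the lower retrieval strength has fallen.

```python
import math

def decay(storage, days, tau0=3.0):
    """Retrieval strength `days` after a session that reset it to 1.0.
    Assumed rule: exponential decay, slower when storage strength is high."""
    return math.exp(-days / (tau0 * (1.0 + storage)))

def run_schedule(review_days, tau0=3.0, gain=1.0):
    """Return (storage strength, day of last session) for a revision schedule.
    Assumes content is first taught on day 0, which sets storage to `gain`."""
    storage, last_day = gain, 0
    for day in sorted(review_days):
        retrieval = decay(storage, day - last_day, tau0)
        # Assumed rule: the lower retrieval strength is, the larger the storage gain.
        storage += gain * (1.0 - retrieval)
        last_day = day
    return storage, last_day

def retrieval_on(day, storage, last_day, tau0=3.0):
    """Retrieval strength on a given day after the final session."""
    return decay(storage, day - last_day, tau0)

# Hypothetical schedules: four sessions each, exam on day 30.
crammed = run_schedule([26, 27, 28, 29])
spaced = run_schedule([5, 12, 20, 29])

for label, (storage, last_day) in {"crammed": crammed, "spaced": spaced}.items():
    print(label,
          "storage:", round(storage, 2),
          "retrieval at exam:", round(retrieval_on(30, storage, last_day), 2),
          "retrieval a month later:", round(retrieval_on(60, storage, last_day), 2))
```

Under these arbitrary parameters, both schedules show high retrieval strength on exam day, but the spaced schedule ends with more storage strength and retains more a month later. The numbers themselves carry no empirical weight; only the qualitative pattern is meant to match the argument above.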
A Practical Planning Guide for Teachers
The storage/retrieval framework translates into a small number of practical planning decisions that can be applied across any subject.
First, build retrieval events into your scheme of work rather than treating review as an end-of-unit activity. A workable minimum is three spaced retrieval practice events for each major piece of content, timed to coincide with periods of reduced retrieval strength. For content taught in Week 1, a retrieval event in Week 2, another in Week 5, and a third in Week 10 will produce substantially better retention than three revision lessons in Weeks 9, 10, and 11.
Second, use the four quadrants as a diagnostic lens. Before deciding whether to reteach, ask whether a retrieval event might restore access first. If students struggle with a retrieval starter, give them time to respond before judging what they know. Students in Quadrant 2 will often recall more than they initially think if given a little time and a few retrieval cues.
Third, teach the storage/retrieval distinction explicitly to your students, at an age-appropriate level. Students who understand why difficulty is productive during practice are better placed to persist with effortful revision strategies rather than defaulting to comfortable but ineffective ones. This is a direct application of self-regulated learning principles and requires no specialist training to implement.
Fourth, align your formative assessment practices to measure storage strength rather than retrieval strength. Ask students to retrieve content from memory with at least a day's gap from the last teaching of that content. Use low-stakes conditions to reduce the anxiety that can suppress retrieval strength independently of storage strength.
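The spacing in the first planning decision (taught in Week 1, retrieved in Weeks 2, 5, and 10) can be turned into a simple calendar for a whole scheme of work. This is a minimal sketch: the function names, the `offsets` default, and the topic names are hypothetical, and the offsets simply reproduce the example gaps given above rather than any empirically optimal schedule.

```python
from collections import defaultdict

def retrieval_weeks(taught_week, offsets=(1, 4, 9)):
    """Weeks in which to schedule retrieval events for content taught in
    `taught_week`. The default offsets reproduce the Week 2, 5, 10 example."""
    return [taught_week + offset for offset in offsets]

def build_calendar(topics):
    """Map each week number to the topics due for a retrieval event.
    `topics` is {topic name: week first taught}."""
    calendar = defaultdict(list)
    for topic, taught_week in topics.items():
        for week in retrieval_weeks(taught_week):
            calendar[week].append(topic)
    return dict(sorted(calendar.items()))

# Hypothetical scheme of work
scheme = {"Photosynthesis": 1, "Respiration": 2, "Enzymes": 4}
for week, due in build_calendar(scheme).items():
    print(f"Week {week}: retrieval starter on {', '.join(due)}")
```

Weeks where several topics fall due together are natural candidates for a mixed retrieval quiz.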
In your next lesson, identify one piece of content that you taught at least one week ago and design a five-minute blank-page retrieval activity around it. Ask students to write everything they remember, without notes or prompts. Observe their responses using the four-quadrant lens: who recalls it fluently (Quadrant 1), who struggles but gets there with effort (Quadrant 2), and who has genuinely no access (Quadrant 4). Let the data determine whether your next move is to maintain, prompt, or reteach.
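The observation step above amounts to a small decision rule. The sketch below is one hypothetical way to record it: the function names, the two yes/no observations, and the student labels are assumptions for illustration, and it covers only the three outcomes named in the text (Quadrants 1, 2, and 4).

```python
from collections import Counter

def next_move(recalled_unaided, recalled_with_cue):
    """Map what was observed in a blank-page retrieval task to a quadrant
    and a next step, following the three outcomes described in the text."""
    if recalled_unaided:
        return "Quadrant 1", "maintain"
    if recalled_with_cue:
        return "Quadrant 2", "prompt"
    return "Quadrant 4", "reteach"

def summarise_class(observations):
    """Count how many students fall under each next step.
    `observations` is {student: (recalled_unaided, recalled_with_cue)}."""
    return Counter(next_move(*observed)[1] for observed in observations.values())

# Hypothetical observations from a five-minute retrieval starter
observations = {
    "Student A": (True, True),
    "Student B": (False, True),
    "Student C": (False, True),
    "Student D": (False, False),
}
print(summarise_class(observations))
```

If most of the class lands on "prompt", a cued retrieval activity is likely to be a better use of the next lesson than reteaching.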
---
From Performance to Learning: 3 Structural Classroom Fixes
Further Reading: Key Papers on This Topic
Bjork, R.A. & Bjork, E.L. (1992) "A new theory of disuse and an old theory of stimulus fluctuation"
The foundational paper introducing the storage/retrieval distinction and the New Theory of Disuse. Bjork and Bjork argue that retrieval strength decays with disuse while storage strength does not, and that this asymmetry explains why spaced practice and retrieval practice are more effective than massed study. Essential reading for any teacher wanting to understand the theoretical basis of desirable difficulties.
Soderstrom, N.C. & Bjork, R.A. (2015) "Learning versus performance: An integrative review"
This review paper synthesises decades of research distinguishing performance during learning (current retrieval strength) from actual learning (storage strength). Soderstrom and Bjork examine why the two dissociate and what conditions create the largest gap between them. Directly relevant to teachers who want to understand why in-class performance is a poor predictor of long-term retention.
Kornell, N., Bjork, R.A. & Garcia, M.A. (2011) "Why tests appear to prevent forgetting"
Kornell, Bjork and Garcia examine the mechanisms behind the testing effect, arguing that tests do not simply strengthen individual memories but also reduce interference from competing memories. The paper provides evidence that the testing effect operates through storage strength enhancement rather than retrieval strength maintenance, with implications for how teachers design low-stakes quizzes.
Storm, B.C., Bjork, R.A. & Storm, J.C. (2010) "Optimizing retrieval as a learning event"
Storm, Bjork and Storm explore the conditions under which retrieval is most productive as a learning event, focusing on the relationship between retrieval difficulty and subsequent retention. Their findings support the counterintuitive claim that harder retrieval (lower retrieval strength at the time of testing) produces stronger long-term storage, provided the retrieval is ultimately successful.
Bjork, E.L. & Bjork, R.A. (2011) "Making things hard on yourself, but in a good way"
A readable synthesis paper aimed partly at practitioners, in which Elizabeth and Robert Bjork review the full range of desirable difficulties (spacing, interleaving, testing, variation) through the lens of the storage/retrieval distinction. The paper includes a section on metacognitive illusions and why learners consistently choose less effective study strategies. A strong starting point for teachers new to the Bjork research programme.