Stop Grading Facts: Assessing Conceptual Understanding in the MYP

Updated on March 24, 2026

Stop assessing recall and start assessing understanding. A practical rubric for MYP teachers using the Thinking Framework to map depth of knowledge across achievement levels 1-8.

Most MYP rubrics ask teachers to assess "conceptual understanding" at levels 5 to 8. Most MYP teachers end up assessing whether students can describe things clearly. The gap between those two outcomes is not a marking problem. It is a task design problem.

Key Takeaways

  1. Recall is not understanding: A student who can name the parts of a cell is not demonstrating conceptual understanding. A student who can explain why selective permeability matters is.
  2. The Thinking Framework maps directly to MYP levels: Each of the eight cognitive operations corresponds to a specific achievement level band, giving teachers a practical tool for task design.
  3. Three-level assessment architecture: Every MYP summative should have tasks at factual, conceptual, and transfer levels. Most only have the first.
  4. Erickson's generalisation level is the target: The IB asks students to work at the level of generalisations and theories, not facts and topics. Most assessments stop short of this.
  5. Report comments write themselves: When you assess with the depth mapping, you can cite the exact cognitive operation a student is using, making written feedback specific and actionable.

The Problem: Teachers Grade What Is Easy to Measure

A Year 9 history teacher asks students to "explain the causes of the First World War." Forty responses arrive. They all list four or five causes from the lesson notes. They vary slightly in phrasing and order. Most teachers award level 5 or 6 because the descriptions are detailed and accurate.

But detailed description is not analysis. And accurate recall is not conceptual understanding.

The IB's MYP criterion descriptors explicitly use words like "analyse", "evaluate", and "synthesise" at the higher achievement levels. Yet when the task only asks students to describe, teachers have no choice but to mark what they receive. The rubric has been rendered meaningless at levels 7 and 8 because the task never required that depth of thinking.

This is not a teacher failure. The IB's own guidance on task design is less prescriptive than its guidance on rubric descriptors. Teachers understand that level 7 means something more rigorous than level 3, but the training rarely goes deep enough on how to design a task that actually elicits the difference. The result is what Dylan Wiliam (2011) calls "assessment validity drift": the descriptors claim to assess one thing, but the task actually measures another.

What the IB Actually Means by Conceptual Understanding

Lynn Erickson developed the most useful model for understanding what the IB means here. In Concept-Based Curriculum and Instruction for the Thinking Classroom, Erickson and Lanning (2014) describe a Structure of Knowledge with five layers:

| Layer | Example (Biology) | Assessment Implication |
| --- | --- | --- |
| Facts | The cell membrane is made of phospholipids | Level 1-2 assessment: recall and identification |
| Topics | Cell membrane structure | Topic-level description: names components |
| Concepts | Selective permeability | Level 3-4: describes the concept with detail |
| Generalisations | Cells maintain internal conditions by controlling what moves across membranes | Level 5-8 target: explains relationships |
| Theory | Cell theory: living organisms are composed of cells that maintain homeostasis | Level 7-8 synthesis: evaluates across systems |

The IB's stated aim is that students work at the generalisation and theory levels, not the fact and topic levels. Most MYP summative tasks stop at the concept level at best. A student who "describes selective permeability" has reached level 3 or 4 in Erickson's model. The task that asks them to "explain why cells would cease to function if membrane permeability became non-selective" is pushing towards the generalisation level, which is where levels 5 to 8 live.

Grant Wiggins and Jay McTighe (2005) make the same point in Understanding by Design. Their "facets of understanding" model identifies six dimensions of genuine understanding, the highest of which is transfer: can the student apply their understanding to a new context they have never encountered before? If your summative task is one the student has practised, you are not assessing transfer. You are assessing familiarity.

The Thinking Framework Depth Mapping

The Thinking Framework's eight cognitive operations map directly onto MYP achievement levels. This is Paul Main's core insight: the operations are not just thinking tools. They are depth indicators. Each operation requires a specific kind of cognitive work that aligns with what the IB expects at a given level band.

| MYP Level | Cognitive Demand | Thinking Framework Operations | What Student Work Looks Like |
| --- | --- | --- | --- |
| 1-2 (Limited) | Recall, identify | Classify (sort into given categories), Sequence (put in given order) | Lists facts, follows a template, copies structure |
| 3-4 (Adequate) | Describe, outline | Compare (similarities and differences), Part-Whole (break down into components) | Describes with some detail, identifies components, makes basic comparisons |
| 5-6 (Substantial) | Analyse, explain | Cause and Effect (explain why), Analogy (connect to other contexts) | Explains relationships, identifies causes, transfers to new contexts |
| 7-8 (Excellent) | Evaluate, synthesise | Perspective (multiple viewpoints), Systems Thinking (interconnections) | Evaluates competing arguments, synthesises across sources, identifies systemic patterns |

This mapping does something that standard Bloom's Taxonomy guidance does not do for the MYP context: it gives you a concrete cognitive operation to design around, not just a verb to look for in student writing. "Analyse" is an instruction to students. "Cause and Effect" is the thinking structure you build the task around.

The distinction matters because Bloom's verbs describe what you want students to do in their response. The Thinking Framework operations describe the mental structure that makes that response possible. When you design a task that requires Cause and Effect thinking, you are not just hoping students will analyse. You are building the question so that analysis is the only way to answer it.

Norman Webb (1997) made a similar argument with his Depth of Knowledge framework: the cognitive demand is a property of the task, not the student's response. A Webb's Depth of Knowledge level 1 task cannot produce level 4 responses regardless of how capable the student is. The same logic applies here. If your MYP task only requires Classify thinking, you cannot award level 7 even if a student's response is beautifully written.
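The ceiling logic can be made concrete with a minimal sketch. This is purely illustrative: the dictionary and function below are hypothetical, not any official IB or Thinking Framework tool, and the band numbers simply restate the mapping table above.

```python
# Illustrative sketch only: operation names and level bands paraphrase the
# depth mapping table above; this is not an official IB or TF API.
OPERATION_CEILING = {
    "classify": 2, "sequence": 2,           # levels 1-2: recall, identify
    "compare": 4, "part-whole": 4,          # levels 3-4: describe, outline
    "cause-and-effect": 6, "analogy": 6,    # levels 5-6: analyse, explain
    "perspective": 8, "systems": 8,         # levels 7-8: evaluate, synthesise
}

def max_awardable_level(task_operations):
    """The highest operation a task demands sets its ceiling: a task that
    only requires Classify cannot earn level 7, however polished the
    student's response."""
    if not task_operations:
        return 0
    return max(OPERATION_CEILING[op] for op in task_operations)

print(max_awardable_level({"classify", "sequence"}))         # → 2
print(max_awardable_level({"compare", "cause-and-effect"}))  # → 6
```

The point of the sketch is directional: cognitive demand is an input you design, not an output you hope to find in the marking pile.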

Designing Assessment Tasks at Each Level

Here are worked examples showing how each Thinking Framework operation translates into an MYP task. The subject examples are drawn from different disciplines to show that the mapping is not subject-specific.

| Level | Operation | Example Task |
| --- | --- | --- |
| 1-2 | Classify | Sort these ten historical events into the categories 'political', 'economic', and 'social'. |
| 1-2 | Sequence | Arrange these stages of mitosis in the correct order and name each one. |
| 3-4 | Compare | Identify three similarities and three differences between the causes of the First and Second World Wars. |
| 3-4 | Part-Whole | Break down a short story into its narrative components: setting, protagonist, conflict, resolution. Describe what each contributes. |
| 5-6 | Cause and Effect | Explain why industrialisation caused urbanisation in 19th-century Britain, then predict where you would expect to see similar patterns emerging in a developing economy today. |
| 5-6 | Analogy | A student says that a cell is like a factory. Evaluate this analogy: which aspects of cellular function does it explain well, and which aspects does it fail to capture? |
| 7-8 | Perspective | Evaluate the claim that globalisation benefits everyone by examining it from the perspectives of a multinational company, a rural farmer in a developing country, and an environmental scientist. |
| 7-8 | Systems Thinking | A government introduces a sugar tax. Map all the likely effects across health outcomes, food industry behaviour, household economics, and public attitudes. Identify which effects might be self-reinforcing. |

Notice how the Cause and Effect task at level 5-6 is designed so that the student cannot answer it by describing. They have to explain a mechanism and then transfer that mechanism to a new context. The Perspective task at level 7-8 is designed so that a student who simply lists facts from three viewpoints will produce a level 5-6 response. To reach level 7-8, they have to evaluate the competing claims, which requires synthesising across the perspectives rather than merely reporting them.

This is what metacognition research consistently shows: students can only demonstrate the cognitive complexity that the task demands. Building tasks that constrain the thinking operation is the precondition for accurate assessment.

The Three-Level Assessment Architecture

Julie Stern, Krista Ferraro, and Juliet Mohnkern (2017) describe what they call "transfer task design" in Tools for Teaching Conceptual Understanding. Their argument is that every strong summative assessment needs tasks at three distinct levels, which they call factual, conceptual, and transfer. The MYP achievement bands make this structure particularly legible.

| Level | Question Type | MYP Achievement Band | Thinking Framework Operation |
| --- | --- | --- | --- |
| 1 — Factual | Do they know the content? | 1-4 | Classify, Sequence, Compare, Part-Whole |
| 2 — Conceptual | Can they explain the relationships? | 5-6 | Cause and Effect, Analogy |
| 3 — Transfer | Can they apply it to a new context? | 7-8 | Perspective, Systems Thinking |

A properly designed MYP summative has all three levels present. A student who completes only the factual level questions is capped at level 4. A student who also completes the conceptual level questions demonstrates level 5-6 capability. A student who reaches the transfer level and demonstrates they can evaluate from multiple perspectives or map systemic effects earns level 7-8.
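A quick audit of a planned summative against this architecture can be sketched as follows. Again, this is a hypothetical helper, not part of Stern et al.'s materials: it just encodes the capping rule described above.

```python
# Hypothetical audit helper: checks a summative's task list against the
# three-level architecture (factual / conceptual / transfer) above.
BAND = {"factual": "1-4", "conceptual": "5-6", "transfer": "7-8"}

def audit_summative(task_levels):
    """Return the highest band students can demonstrate, plus any
    missing levels. No transfer task means levels 7-8 are unreachable."""
    present = set(task_levels)
    missing = [lvl for lvl in ("factual", "conceptual", "transfer")
               if lvl not in present]
    for level in ("transfer", "conceptual", "factual"):
        if level in present:
            return BAND[level], missing
    return "none", missing

ceiling, missing = audit_summative(["factual", "conceptual"])
print(ceiling)  # → 5-6  (no transfer task, so 7-8 is unreachable)
print(missing)  # → ['transfer']
```

Running this against a draft assessment before moderation makes the gap visible in seconds: if "transfer" appears in the missing list, no student can earn level 7-8 regardless of ability.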

Most MYP summatives that teachers share at moderation sessions have level 1 and sometimes level 2 tasks. The transfer task is either absent or disguised as a writing task ("write a report that evaluates...") without scaffolding the specific cognitive operation being assessed.

Wiggins and McTighe (2005) are direct on this point: if you only assess what students can do with familiar material, you are assessing memory, not understanding. The transfer task does not need to be an entirely new topic. It needs to present the concept in a context the student has not previously studied. An economics student who has studied supply and demand in the context of global commodity markets should be assessed on their ability to apply those concepts to a local housing shortage they have not analysed before. That is a transfer task.

Common Assessment Mistakes in the MYP

Three patterns appear so regularly in MYP moderation that they deserve naming directly.

Using the verb "analyse" in the task but accepting "describe" in the marking. This is the most common problem. The task says "analyse the impact of..." but the level 5-6 descriptor has been written so broadly that detailed description earns the band. The solution is to build the analysis operation into the task design rather than hoping students will spontaneously perform it.

Over-scaffolding so the rubric does the thinking. Some teachers provide such detailed rubric criteria that the student only needs to match their response to the descriptor rather than construct an original response. At level 7-8, rubric descriptors should describe the quality of thinking, not the content of the answer. If your descriptor for level 8 includes the specific arguments the student should make, you have described a level 4 task with level 8 packaging.

Assessing the product, not the process. In subjects like Design and Drama, teachers sometimes mark the finished artefact rather than the thinking that produced it. A beautiful model that was built by following a template demonstrates Sequence thinking at level 1-2. A model that was designed to solve a novel constraint, tested against criteria, and revised based on feedback evidence demonstrates Systems Thinking at level 7-8. The product looks similar. The cognitive process is entirely different.

Dylan Wiliam's (2011) research on formative assessment is useful here. He argues that the most effective assessment practice involves teachers making the learning intention visible at two levels: the surface level (what students will do) and the deep level (what cognitive structure they will use to do it). MYP rubric descriptors typically only specify the surface level. The Thinking Framework operations specify the deep level.

Rubric Design Using the Thinking Framework

The following example shows how to upgrade a standard MYP-style rubric by overlaying the Thinking Framework operations. The subject is Year 10 History, Criterion B (Investigating).

Before (standard MYP-style descriptor):

| Level | Standard Descriptor |
| --- | --- |
| 7-8 | The student consistently analyses and evaluates a range of sources and demonstrates a thorough understanding of historical significance. |
| 5-6 | The student analyses some sources and demonstrates a substantial understanding of historical significance. |
| 3-4 | The student describes some sources and demonstrates an adequate understanding of historical significance. |

After (Thinking Framework-informed descriptor):

| Level | TF Operation | Thinking Framework-Informed Descriptor |
| --- | --- | --- |
| 7-8 | Perspective + Systems Thinking | The student evaluates sources by examining competing interpretations and identifying how the historian's standpoint, context, or purpose shapes the argument. They identify patterns across sources that reveal systemic factors in historical change. |
| 5-6 | Cause and Effect | The student explains how specific sources provide evidence for causal claims. They make explicit connections between historical evidence and the factors they identify as significant, rather than treating evidence as illustration. |
| 3-4 | Compare + Part-Whole | The student describes what sources show and identifies similarities or differences between them. They can break a source down into its component claims but do not yet explain why those claims are significant. |

The after-version does something the before-version does not: it tells the teacher what the student is cognitively doing, not just how much they are doing it. The difference between level 5-6 and level 7-8 is not quantity ("analyses some" versus "analyses a range of"). It is the cognitive operation: explaining causes versus evaluating perspectives.

This also makes the rubric usable as a formative assessment tool. Partway through a unit, you can ask students: "Which level is your current draft at?" A student can identify whether they are writing at the Compare level or the Cause and Effect level. That is a meaningful metacognitive question. "Is this adequately detailed?" is not.

The SOLO Taxonomy Connection

Teachers already familiar with SOLO Taxonomy (Biggs and Collis, 1982) will recognise the parallel. SOLO's five levels move from prestructural through unistructural and multistructural to relational and extended abstract. The Thinking Framework operations map onto this progression in a way that makes SOLO practically useful in the IB context.

Where SOLO provides a description of the structure of student responses, the Thinking Framework provides the operation that produces that structure. A student at SOLO's "relational" level is performing Cause and Effect or Compare thinking. A student at "extended abstract" is performing Perspective or Systems Thinking. The two frameworks are complementary: SOLO tells you what you see in the work, the Thinking Framework tells you what to design for.

This connection also clarifies a frequent confusion in MYP marking: teachers sometimes award level 7-8 for a very detailed response that demonstrates extensive factual knowledge. Extensive factual knowledge, however well organised, is SOLO "multistructural": it lists many related pieces of information without connecting them into a relational or causal structure. Multistructural responses, however detailed, belong in the level 3-4 band. The IB's rubric language implies this, but the Thinking Framework makes it explicit.

What This Means for Report Writing

One of the most practical consequences of using the Thinking Framework depth mapping is that it makes your written feedback specific enough to be acted upon. Hattie and Timperley (2007), in their influential review of feedback research, identify three conditions that make feedback effective: it must address where the student currently is, where they need to go, and how to get there. Generic level descriptions fail all three conditions.

Consider the difference between these two report comments for a Year 9 student in Geography:

Generic: "Ahmed demonstrates a good understanding of migration patterns and writes with clear structure. To improve, he should develop his analysis further and consider a wider range of perspectives."

Thinking Framework-informed: "Ahmed consistently explains cause-and-effect relationships in migration patterns, which places his work solidly at level 5-6. His responses identify push and pull factors and connect them to outcomes. To reach level 7-8, Ahmed needs to evaluate competing perspectives: for instance, examining why the same migration pattern might be described differently by the destination government, the migrants themselves, and environmental scientists. That is the Perspective operation, and his next summative task will be designed to give him the opportunity to practise it."

The second comment tells Ahmed exactly which cognitive operation he is currently performing, which one he needs to develop, and what that will look like in his next task. That is actionable feedback in the sense that metacognition research defines it: the student knows what to do differently, not just that something needs to improve.

This approach also makes moderation conversations more productive. Instead of debating whether a response "analyses" sufficiently, departments can ask: "Is this student performing Cause and Effect thinking or Perspective thinking?" That is an observable, discussable question with a trainable answer.

What to Try With Your Next Summative

Take one MYP summative task you are currently planning. Before you write the criteria descriptors, identify the highest Thinking Framework operation you want to assess at level 7-8. Design the task so that a student cannot reach full marks without performing that operation.

If you are assessing level 7-8 and the operation is Perspective, the task needs to present a claim or situation that genuinely has multiple legitimate interpretations. The student's job is to evaluate those interpretations, not describe them. A task that asks students to "consider different viewpoints" does not automatically require Perspective thinking. A task that asks students to "evaluate which viewpoint is better supported by the evidence you have gathered, and explain what evidence would be needed to change your conclusion" does.

Then write your rubric level descriptions starting from the highest band and working downwards. Level 7-8 describes Perspective or Systems Thinking. Level 5-6 describes what Cause and Effect thinking looks like in this task. Level 3-4 describes Compare or Part-Whole thinking. Level 1-2 describes Classify or Sequence thinking. The criteria now form a cognitive ladder, not just a detail scale.

Run the first summative using this approach. Compare the quality of student responses with your previous cohort's work on a similar topic. The shift is usually visible in the first marking session: fewer responses that are long but shallow, more responses that are shorter but genuinely analytical.

For further support with IB assessment design and applying the Thinking Framework across your department, the resources linked below offer practical starting points.

Key Takeaways

  1. Task design determines ceiling: The cognitive operation in the task sets the maximum level a student can demonstrate. No matter how capable the student, a Classify task cannot produce Perspective thinking.
  2. Erickson's generalisation level is the MYP target: Assessing at the fact or topic level is not what the MYP rubric is designed to reward. The upper bands require generalisation and theory-level responses.
  3. The depth mapping is a design tool, not just a marking tool: Use the Thinking Framework operations before you write the task, not after you have received the student responses.
  4. Three-level architecture: Every MYP summative should contain factual tasks (levels 1-4), conceptual tasks (levels 5-6), and transfer tasks (levels 7-8). Most only have the first level.
  5. Feedback becomes specific: Naming the cognitive operation a student is using or needs to develop transforms generic level descriptions into actionable next steps.

Further Reading: Key Papers on MYP Assessment and Conceptual Understanding

These papers provide the theoretical and empirical foundations for the assessment approach described in this guide.

Erickson, H. L., & Lanning, L. A. (2014). Concept-Based Curriculum and Instruction for the Thinking Classroom.

The foundational text for understanding the Structure of Knowledge model that underpins IB curriculum design. Erickson's distinction between facts, topics, concepts, generalisations, and theory is essential for understanding why most MYP assessments assess at the wrong level. Directly applicable to any MYP teacher designing summative tasks.

Stern, J., Ferraro, K., & Mohnkern, J. (2017). Tools for Teaching Conceptual Understanding, Secondary.

Stern et al.'s three-level assessment architecture (factual, conceptual, transfer) maps directly onto the MYP achievement bands. The book provides worked examples across multiple subjects showing exactly how to design transfer tasks. Every MYP coordinator should have this on their professional reading list.

Wiggins, G., & McTighe, J. (2005). Understanding by Design (Expanded 2nd Edition).

Wiggins and McTighe's "backward design" principle is the most widely adopted framework for designing assessments before planning instruction. Their six facets of understanding provide a complementary lens to the Thinking Framework depth mapping, with particular strength in defining what genuine transfer looks like across disciplines.

Hattie, J., & Timperley, H. (2007). The Power of Feedback.

The definitive empirical review of feedback research, identifying three essential elements: where the student is now, where they need to go, and how to bridge the gap. Hattie and Timperley's model explains precisely why generic rubric level descriptions fail to improve student performance, and provides the theoretical basis for the Thinking Framework-informed report comments described in this guide.

Webb, N. L. (1997). Alignment of Assessment, Curriculum, and Instruction.

Webb's original articulation of Depth of Knowledge as a property of tasks rather than students remains one of the clearest arguments in assessment research for designing cognitive demand into questions rather than hoping for it in responses. His framework is the direct precursor to the level-to-operation mapping described in this guide.

Loading audit...

Most MYP rubrics ask teachers to assess "conceptual understanding" at levels 5 to 8. Most MYP teachers end up assessing whether students can describe things clearly. The gap between those two outcomes is not a marking problem. It is a task design problem.

Key Takeaways

  1. Recall is not understanding: A student who can name the parts of a cell is not demonstrating conceptual understanding. A student who can explain why selective permeability matters is.
  2. The Thinking Framework maps directly to MYP levels: Each of the eight cognitive operations corresponds to a specific achievement level band, giving teachers a practical tool for task design.
  3. Three-level assessment architecture: Every MYP summative should have tasks at factual, conceptual, and transfer levels. Most only have the first.
  4. Erickson's generalisation level is the target: The IB asks students to work at the level of generalisations and theories, not facts and topics. Most assessments stop short of this.
  5. Report comments write themselves: When you assess with the depth mapping, you can cite the exact cognitive operation a student is using, making written feedback specific and actionable.

The Problem: Teachers Grade What Is Easy to Measure

A Year 9 history teacher asks students to "explain the causes of the First World War." Forty responses arrive. They all list four or five causes from the lesson notes. They vary slightly in phrasing and order. Most teachers award level 5 or 6 because the descriptions are detailed and accurate.

But detailed description is not analysis. And accurate recall is not conceptual understanding.

The IB's MYP criterion descriptors explicitly use words like "analyse", "evaluate", and "synthesise" at the higher achievement levels. Yet when the task only asks students to describe, teachers have no choice but to mark what they receive. The rubric has been rendered meaningless at levels 7 and 8 because the task never required that depth of thinking.

This is not a teacher failure. The IB's own guidance on task design is less prescriptive than its guidance on rubric descriptors. Teachers understand that level 7 means something more rigorous than level 3, but the training rarely goes deep enough on how to design a task that actually elicits the difference. The result is what Dylan Wiliam (2011) calls "assessment validity drift": the descriptors claim to assess one thing, but the task actually measures another.

What the IB Actually Means by Conceptual Understanding

Lynn Erickson developed the most useful model for understanding what the IB means here. In Concept-Based Curriculum and Instruction for the Thinking Classroom, Erickson and Lanning (2014) describe a Structure of Knowledge with five layers:

Layer Example (Biology) Assessment Implication
Facts The cell membrane is made of phospholipids Level 1-2 assessment: recall and identification
Topics Cell membrane structure Topic-level description: names components
Concepts Selective permeability Level 3-4: describes the concept with detail
Generalisations Cells maintain internal conditions by controlling what moves across membranes Level 5-8 target: explains relationships
Theory Cell theory: living organisms are composed of cells that maintain homeostasis Level 7-8 synthesis: evaluates across systems

The IB's stated aim is that students work at the generalisation and theory levels, not the fact and topic levels. Most MYP summative tasks stop at the concept level at best. A student who "describes selective permeability" has reached level 3 or 4 in Erickson's model. The task that asks them to "explain why cells would cease to function if membrane permeability became non-selective" is pushing towards the generalisation level, which is where levels 5 to 8 live.

Grant Wiggins and Jay McTighe (2005) make the same point in Understanding by Design. Their "facets of understanding" model identifies six dimensions of genuine understanding, the highest of which is transfer: can the student apply their understanding to a new context they have never encountered before? If your summative task is one the student has practised, you are not assessing transfer. You are assessing familiarity.

The Thinking Framework Depth Mapping

The Thinking Framework's eight cognitive operations map directly onto MYP achievement levels. This is Paul Main's core insight: the operations are not just thinking tools. They are depth indicators. Each operation requires a specific kind of cognitive work that aligns with what the IB expects at a given level band.

MYP Level Cognitive Demand Thinking Framework Operations What Student Work Looks Like
1-2 (Limited) Recall, identify Classify (sort into given categories), Sequence (put in given order) Lists facts, follows a template, copies structure
3-4 (Adequate) Describe, outline Compare (similarities and differences), Part-Whole (break down into components) Describes with some detail, identifies components, makes basic comparisons
5-6 (Substantial) Analyse, explain Cause and Effect (explain why), Analogy (connect to other contexts) Explains relationships, identifies causes, transfers to new contexts
7-8 (Excellent) Evaluate, synthesise Perspective (multiple viewpoints), Systems Thinking (interconnections) Evaluates competing arguments, synthesises across sources, identifies systemic patterns

This mapping does something that standard Bloom's Taxonomy guidance does not do for the MYP context: it gives you a concrete cognitive operation to design around, not just a verb to look for in student writing. "Analyse" is an instruction to students. "Cause and Effect" is the thinking structure you build the task around.

The distinction matters because Bloom's verbs describe what you want students to do in their response. The Thinking Framework operations describe the mental structure that makes that response possible. When you design a task that requires Cause and Effect thinking, you are not just hoping students will analyse. You are building the question so that analysis is the only way to answer it.

Norman Webb (1997) made a similar argument with his Depth of Knowledge framework: the cognitive demand is a property of the task, not the student's response. A Webb's Depth of Knowledge level 1 task cannot produce level 4 responses regardless of how capable the student is. The same logic applies here. If your MYP task only requires Classify thinking, you cannot award level 7 even if a student's response is beautifully written.

Designing Assessment Tasks at Each Level

Here are worked examples showing how each Thinking Framework operation translates into an MYP task. The subject examples are drawn from different disciplines to show that the mapping is not subject-specific.

Level Operation Example Task
1-2 Classify Sort these ten historical events into the categories 'political', 'economic', and 'social'.
1-2 Sequence Arrange these stages of mitosis in the correct order and name each one.
3-4 Compare Identify three similarities and three differences between the causes of the First and Second World Wars.
3-4 Part-Whole Break down a short story into its narrative components: setting, protagonist, conflict, resolution. Describe what each contributes.
5-6 Cause and Effect Explain why industrialisation caused urbanisation in 19th-century Britain, then predict where you would expect to see similar patterns emerging in a developing economy today.
5-6 Analogy A student says that a cell is like a factory. Evaluate this analogy: which aspects of cellular function does it explain well, and which aspects does it fail to capture?
7-8 Perspective Evaluate the claim that globalisation benefits everyone by examining it from the perspectives of a multinational company, a rural farmer in a developing country, and an environmental scientist.
7-8 Systems Thinking A government introduces a sugar tax. Map all the likely effects across health outcomes, food industry behaviour, household economics, and public attitudes. Identify which effects might be self-reinforcing.

Notice how the Cause and Effect task at level 5-6 is designed so that the student cannot answer it by describing. They have to explain a mechanism and then transfer that mechanism to a new context. The Perspective task at level 7-8 is designed so that a student who simply lists facts from three viewpoints will produce a level 5-6 response. To reach level 7-8, they have to evaluate the competing claims, which requires synthesising across the perspectives rather than merely reporting them.

This is what metacognition research consistently shows: students can only demonstrate the cognitive complexity that the task demands. Building tasks that constrain the thinking operation is the precondition for accurate assessment.

The Three-Level Assessment Architecture

Julie Stern, Krista Ferraro, and Juliet Mohnkern (2017) describe what they call "transfer task design" in Tools for Teaching Conceptual Understanding. Their argument is that every strong summative assessment needs tasks at three distinct levels, which they call factual, conceptual, and transfer. The MYP achievement bands make this structure particularly legible.

| Level | Question Type | MYP Achievement Band | Thinking Framework Operation |
|-------|---------------|----------------------|------------------------------|
| 1 — Factual | Do they know the content? | 1-4 | Classify, Sequence, Compare, Part-Whole |
| 2 — Conceptual | Can they explain the relationships? | 5-6 | Cause and Effect, Analogy |
| 3 — Transfer | Can they apply it to a new context? | 7-8 | Perspective, Systems Thinking |

A properly designed MYP summative has all three levels present. A student who completes only the factual level questions is capped at level 4. A student who also completes the conceptual level questions demonstrates level 5-6 capability. A student who reaches the transfer level and demonstrates they can evaluate from multiple perspectives or map systemic effects earns level 7-8.
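The band-capping logic above can be expressed as a simple lookup. The sketch below is purely illustrative (the operation names, band boundaries, and function are my own encoding of the table, not an official IB tool), but it makes the core claim concrete: the ceiling of a task is set by the highest operation it demands, not by anything the student does.

```python
# Illustrative sketch: map each Thinking Framework operation to its MYP
# achievement band, then derive the ceiling a task imposes from the set of
# operations it actually requires. The mapping follows the table above;
# nothing here is an official IB specification.

OPERATION_BAND = {
    "classify": (1, 2),
    "sequence": (1, 2),
    "compare": (3, 4),
    "part-whole": (3, 4),
    "cause-effect": (5, 6),
    "analogy": (5, 6),
    "perspective": (7, 8),
    "systems-thinking": (7, 8),
}

def task_ceiling(operations):
    """Highest achievement level a task can evidence, given the cognitive
    operations it demands of the student. An empty task evidences nothing."""
    if not operations:
        return 0
    return max(OPERATION_BAND[op][1] for op in operations)

# A task that only asks students to sequence and compare caps at level 4,
# however detailed and accurate the responses are.
print(task_ceiling(["sequence", "compare"]))                      # 4
print(task_ceiling(["compare", "cause-effect", "perspective"]))   # 8
```

The point of the sketch is the asymmetry: adding a higher operation raises the ceiling, but adding more low-level operations never does.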

Most MYP summatives that teachers share at moderation sessions have level 1 and sometimes level 2 tasks. The transfer task is either absent or disguised as a writing task ("write a report that evaluates...") without scaffolding the specific cognitive operation being assessed.

Wiggins and McTighe (2005) are direct on this point: if you only assess what students can do with familiar material, you are assessing memory, not understanding. The transfer task does not need to be an entirely new topic. It needs to present the concept in a context the student has not previously studied. An economics student who has studied supply and demand in the context of global commodity markets should be assessed on their ability to apply those concepts to a local housing shortage they have not analysed before. That is a transfer task.

Common Assessment Mistakes in the MYP

Three patterns appear so regularly in MYP moderation that they deserve naming directly.

Using the verb "analyse" in the task but accepting "describe" in the marking. This is the most common problem. The task says "analyse the impact of..." but the level 5-6 descriptor has been written so broadly that detailed description earns the band. The solution is to build the analysis operation into the task design rather than hoping students will spontaneously perform it.

Over-scaffolding so the rubric does the thinking. Some teachers provide such detailed rubric criteria that the student only needs to match their response to the descriptor rather than construct an original response. At level 7-8, rubric descriptors should describe the quality of thinking, not the content of the answer. If your descriptor for level 8 includes the specific arguments the student should make, you have described a level 4 task with level 8 packaging.

Assessing the product, not the process. In subjects like Design and Drama, teachers sometimes mark the finished artefact rather than the thinking that produced it. A beautiful model that was built by following a template demonstrates Sequence thinking at level 1-2. A model that was designed to solve a novel constraint, tested against criteria, and revised in response to feedback demonstrates Systems Thinking at level 7-8. The products look similar. The cognitive processes behind them are entirely different.

Dylan Wiliam's (2011) research on formative assessment is useful here. He argues that the most effective assessment practice involves teachers making the learning intention visible at two levels: the surface level (what students will do) and the deep level (what cognitive structure they will use to do it). MYP rubric descriptors typically only specify the surface level. The Thinking Framework operations specify the deep level.

Rubric Design Using the Thinking Framework

The following example shows how to upgrade a standard MYP-style rubric by overlaying the Thinking Framework operations. The subject is Year 10 History, Criterion B (Investigating).

Before (standard MYP-style descriptor):

| Level | Standard Descriptor |
|-------|---------------------|
| 7-8 | The student consistently analyses and evaluates a range of sources and demonstrates a thorough understanding of historical significance. |
| 5-6 | The student analyses some sources and demonstrates a substantial understanding of historical significance. |
| 3-4 | The student describes some sources and demonstrates an adequate understanding of historical significance. |

After (Thinking Framework-informed descriptor):

| Level | TF Operation | Thinking Framework-Informed Descriptor |
|-------|--------------|----------------------------------------|
| 7-8 | Perspective + Systems Thinking | The student evaluates sources by examining competing interpretations and identifying how the historian's standpoint, context, or purpose shapes the argument. They identify patterns across sources that reveal systemic factors in historical change. |
| 5-6 | Cause and Effect | The student explains how specific sources provide evidence for causal claims. They make explicit connections between historical evidence and the factors they identify as significant, rather than treating evidence as illustration. |
| 3-4 | Compare + Part-Whole | The student describes what sources show and identifies similarities or differences between them. They can break a source down into its component claims but do not yet explain why those claims are significant. |

The after-version does something the before-version does not: it tells the teacher what the student is cognitively doing, not just how much they are doing it. The difference between level 5-6 and level 7-8 is not quantity ("analyses some" versus "analyses a range of"). It is the cognitive operation: explaining causes versus evaluating perspectives.

This also makes the rubric usable as a formative assessment tool. Partway through a unit, you can ask students: "Which level is your current draft at?" A student can identify whether they are writing at the Compare level or the Cause and Effect level. That is a meaningful metacognitive question. "Is this adequately detailed?" is not.

The SOLO Taxonomy Connection

Teachers already familiar with SOLO Taxonomy (Biggs and Collis, 1982) will recognise the parallel. SOLO's five levels move from prestructural through unistructural and multistructural to relational and extended abstract. The Thinking Framework operations map onto this progression in a way that makes SOLO practically useful in the IB context.

Where SOLO provides a description of the structure of student responses, the Thinking Framework provides the operation that produces that structure. A student at SOLO's "relational" level is performing Cause and Effect or Compare thinking. A student at "extended abstract" is performing Perspective or Systems Thinking. The two frameworks are complementary: SOLO tells you what you see in the work, the Thinking Framework tells you what to design for.

This connection also clarifies a frequent confusion in MYP marking: teachers sometimes award level 7-8 for a very detailed response that demonstrates extensive factual knowledge. Extensive factual knowledge, however well organised, is SOLO "multistructural": it lists many related pieces of information without connecting them into a relational or causal structure. Multistructural responses, however detailed, belong in the level 3-4 band. The IB's rubric language implies this, but the Thinking Framework makes it explicit.

What This Means for Report Writing

One of the most practical consequences of using the Thinking Framework depth mapping is that it makes your written feedback specific enough to be acted upon. Hattie and Timperley (2007), in their influential review of feedback research, identify three conditions that make feedback effective: it must address where the student currently is, where they need to go, and how to get there. Generic level descriptions fail all three conditions.

Consider the difference between these two report comments for a Year 9 student in Geography:

Generic: "Ahmed demonstrates a good understanding of migration patterns and writes with clear structure. To improve, he should develop his analysis further and consider a wider range of perspectives."

Thinking Framework-informed: "Ahmed consistently explains cause-and-effect relationships in migration patterns, which places his work solidly at level 5-6. His responses identify push and pull factors and connect them to outcomes. To reach level 7-8, Ahmed needs to evaluate competing perspectives: for instance, examining why the same migration pattern might be described differently by the destination government, the migrants themselves, and environmental scientists. That is the Perspective operation, and his next summative task will be designed to give him the opportunity to practise it."

The second comment tells Ahmed exactly which cognitive operation he is currently performing, which one he needs to develop, and what that will look like in his next task. That is actionable feedback in the sense that metacognition research defines it: the student knows what to do differently, not just that something needs to improve.

This approach also makes moderation conversations more productive. Instead of debating whether a response "analyses" sufficiently, departments can ask: "Is this student performing Cause and Effect thinking or Perspective thinking?" That is an observable, discussable question with a trainable answer.
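The structure behind the Ahmed comment (name the current operation, name the next one up the ladder, say how the next task will elicit it) can be templated. The sketch below is a hypothetical illustration: the ladder, band labels, and wording are my own encoding of the pattern described above, not a tool this guide or the cited research prescribes.

```python
# Illustrative sketch of the feedback structure described above: identify the
# operation the student currently performs, the band it maps to, and the next
# operation up the cognitive ladder. All names and templates are hypothetical.

LADDER = ["Classify/Sequence", "Compare/Part-Whole",
          "Cause and Effect/Analogy", "Perspective/Systems Thinking"]
BANDS = ["1-2", "3-4", "5-6", "7-8"]

def feedback(student, current_step):
    """Build a comment covering Hattie and Timperley's three feedback
    questions: where the student is, where they need to go, how to get there."""
    now, band = LADDER[current_step], BANDS[current_step]
    if current_step == len(LADDER) - 1:
        return f"{student} is working with {now} at level {band}."
    nxt, nxt_band = LADDER[current_step + 1], BANDS[current_step + 1]
    return (f"{student} consistently performs the {now} operation, placing "
            f"the work at level {band}. To reach level {nxt_band}, the next "
            f"step is the {nxt} operation; the next summative task will be "
            f"designed to give the opportunity to practise it.")

print(feedback("Ahmed", 2))
```

A template like this is a starting point for departmental consistency, not a replacement for the subject-specific example ("examining why the same migration pattern might be described differently by...") that makes the comment genuinely actionable.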

What to Try With Your Next Summative

Take one MYP summative task you are currently planning. Before you write the criteria descriptors, identify the highest Thinking Framework operation you want to assess at level 7-8. Design the task so that a student cannot reach full marks without performing that operation.

If you are assessing level 7-8 and the operation is Perspective, the task needs to present a claim or situation that genuinely has multiple legitimate interpretations. The student's job is to evaluate those interpretations, not describe them. A task that asks students to "consider different viewpoints" does not automatically require Perspective thinking. A task that asks students to "evaluate which viewpoint is better supported by the evidence you have gathered, and explain what evidence would be needed to change your conclusion" does.

Then write your rubric level descriptions starting from the highest band and working downwards. Level 7-8 describes Perspective or Systems Thinking. Level 5-6 describes what Cause and Effect thinking looks like in this task. Level 3-4 describes Compare or Part-Whole thinking. Level 1-2 describes Classify or Sequence thinking. The criteria now form a cognitive ladder, not just a detail scale.

Run the first summative using this approach. Compare the quality of student responses with your previous cohort's work on a similar topic. The shift is usually visible in the first marking session: fewer responses that are long but shallow, more responses that are shorter but genuinely analytical.

For further support with IB assessment design and applying the Thinking Framework across your department, the resources linked below offer practical starting points.

Key Takeaways

  1. Task design determines ceiling: The cognitive operation in the task sets the maximum level a student can demonstrate. No matter how capable the student, a Classify task cannot produce Perspective thinking.
  2. Erickson's generalisation level is the MYP target: Assessing at the fact or topic level is not what the MYP rubric is designed to reward. The upper bands require generalisation and theory-level responses.
  3. The depth mapping is a design tool, not just a marking tool: Use the Thinking Framework operations before you write the task, not after you have received the student responses.
  4. Three-level architecture: Every MYP summative should contain factual tasks (levels 1-4), conceptual tasks (levels 5-6), and transfer tasks (levels 7-8). Most only have the first level.
  5. Feedback becomes specific: Naming the cognitive operation a student is using or needs to develop transforms generic level descriptions into actionable next steps.

Further Reading: Key Papers on MYP Assessment and Conceptual Understanding

These papers provide the theoretical and empirical foundations for the assessment approach described in this guide.

Erickson, H. L., & Lanning, L. A. (2014). Concept-Based Curriculum and Instruction for the Thinking Classroom.

The foundational text for understanding the Structure of Knowledge model that underpins IB curriculum design. Erickson's distinction between facts, topics, concepts, generalisations, and theory is essential for understanding why most MYP assessments assess at the wrong level. Directly applicable to any MYP teacher designing summative tasks.

Stern, J., Ferraro, K., & Mohnkern, J. (2017). Tools for Teaching Conceptual Understanding, Secondary.

Stern et al.'s three-level assessment architecture (factual, conceptual, transfer) maps directly onto the MYP achievement bands. The book provides worked examples across multiple subjects showing exactly how to design transfer tasks. Every MYP coordinator should have this on their professional reading list.

Wiggins, G., & McTighe, J. (2005). Understanding by Design (Expanded 2nd Edition).

Wiggins and McTighe's "backward design" principle is the most widely adopted framework for designing assessments before planning instruction. Their six facets of understanding provide a complementary lens to the Thinking Framework depth mapping, with particular strength in defining what genuine transfer looks like across disciplines.

Hattie, J., & Timperley, H. (2007). The Power of Feedback.

The definitive empirical review of feedback research, identifying three essential elements: where the student is now, where they need to go, and how to bridge the gap. Hattie and Timperley's model explains precisely why generic rubric level descriptions fail to improve student performance, and provides the theoretical basis for the Thinking Framework-informed report comments described in this guide.

Webb, N. L. (1997). Alignment of Assessment, Curriculum, and Instruction.

Webb's original articulation of Depth of Knowledge as a property of tasks rather than students remains one of the clearest arguments in assessment research for designing cognitive demand into questions rather than hoping for it in responses. His framework is the direct precursor to the level-to-operation mapping described in this guide.
