IB Assessment: A Teacher's Guide: practical strategies for teachers

Updated on June 17, 2026

March 19, 2026

This IB assessment teacher's guide helps educators apply best-fit grading, adapt rubrics for SEND, and make criteria visible.

IB Assessment describes how the International Baccalaureate gathers, judges and reports evidence of what learners can do. It uses published criteria across the Primary Years Programme, Middle Years Programme, Diploma Programme and Career-related Programme. This matters because a teacher's mark is not a running average of effort, neatness or early mistakes. It should be a defensible judgement against criteria, backed by recent and consistent evidence (IBO, 2019; Black, 1998).

In a Year 10 MYP science class, this helps a teacher separate strong data analysis from weaker evaluation. The teacher can record the evidence against Criteria B and C, then explain the final level without comparing the learner with the rest of the class. In the Diploma Programme, the same logic shapes Internal Assessment, External Assessment and the Core. Final grades still depend on IB grade boundaries and moderation.

IB Assessment Defined

Criterion-related assessment in the International Baccalaureate uses published criteria to judge learner work against standards, not against classmates (IBO, 2019). In the Diploma Programme, though, teachers should not describe final grades as purely fixed. Raw marks from Internal Assessment and External Assessment are converted to the 1-7 scale through grade boundaries, and those boundaries are reviewed each session using global performance evidence and grade descriptors. This adds a limited norm-referenced element to final summative grading, even when the classroom rubric remains criterion-related. Black (1998) and Wiliam (2011) still matter here because formative checks help teachers use criteria before the final mark is awarded.

Rubrics with clear command terms and descriptors support this work. However, each programme has a different scoring route. In the Middle Years Programme, most subject groups use four criteria: A, B, C and D. Each criterion is scored out of 8, giving a maximum of 32 before conversion to the 1-7 grade scale.
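The MYP arithmetic described above can be sketched in a few lines of Python. This is a minimal illustration, not an IB tool: the function names are hypothetical, and the boundary table is an assumption made for the example. Departments should check the boundaries the IB publishes for their subject and session.

```python
# Illustrative boundaries: (lowest total, grade), highest first.
# These values are an assumption for demonstration, not official IB boundaries.
EXAMPLE_BOUNDARIES = [(28, 7), (24, 6), (19, 5), (15, 4), (10, 3), (6, 2), (1, 1)]


def myp_total(criteria: dict[str, int]) -> int:
    """Add the four criterion levels (A to D), each scored from 0 to 8."""
    if set(criteria) != {"A", "B", "C", "D"}:
        raise ValueError("Expected exactly criteria A, B, C and D")
    for name, level in criteria.items():
        if not 0 <= level <= 8:
            raise ValueError(f"Criterion {name} must be between 0 and 8")
    return sum(criteria.values())


def total_to_grade(total: int, boundaries=EXAMPLE_BOUNDARIES) -> int:
    """Return the grade for the first boundary the total reaches (0 if none)."""
    for lowest_total, grade in boundaries:
        if total >= lowest_total:
            return grade
    return 0


levels = {"A": 5, "B": 6, "C": 4, "D": 6}
total = myp_total(levels)
print(total, total_to_grade(total))
```

Passing the boundaries in as a parameter keeps the conversion separate from the criterion judgement, so a department can swap in the current table without touching the recorded levels.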

In the Diploma Programme, six subjects can contribute up to 42 points. The DP Core is Theory of Knowledge (TOK), the Extended Essay (EE) and Creativity, Activity, Service (CAS). TOK and EE can add up to 3 bonus points through the matrix, while CAS earns no points because it is a completion requirement. This makes 45 the maximum International Baccalaureate Diploma score, with 24 points the usual minimum subject to passing conditions.
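As a rough sketch of the points arithmetic above, the Python below adds six subject grades to the Core points. The function names are hypothetical. The TOK and EE matrix is not reproduced here, so the Core points are simply passed in as a number from 0 to 3, and the check covers only the 24-point minimum and CAS completion, not the other passing conditions.

```python
def dp_total_points(subject_grades: list[int], core_points: int) -> int:
    """Six subject grades (1 to 7 each) plus 0 to 3 Core points from TOK and the EE."""
    if len(subject_grades) != 6:
        raise ValueError("The Diploma counts six subjects")
    if any(not 1 <= grade <= 7 for grade in subject_grades):
        raise ValueError("Each subject grade must be between 1 and 7")
    if not 0 <= core_points <= 3:
        raise ValueError("Core points must be between 0 and 3")
    return sum(subject_grades) + core_points


def meets_points_threshold(total: int, cas_completed: bool, minimum: int = 24) -> bool:
    """Check only the points minimum and CAS completion.

    The other passing conditions are deliberately not modelled here.
    """
    return cas_completed and total >= minimum


grades = [6, 5, 5, 6, 4, 7]
total = dp_total_points(grades, core_points=2)
print(total, meets_points_threshold(total, cas_completed=True))
```

CAS appears only as a true or false flag because it is a completion requirement and earns no points.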

Hattie (2009) and Hattie and Timperley (2007) are useful when feedback links current work to visible success criteria. Even so, their effect sizes should not be treated as exact forecasts for IB grades.

Key Takeaways

  • Criterion-referenced assessment measures learner performance against fixed standards, not against other learners.
  • The 'best-fit' approach uses professional judgement to find the centre of gravity in a learner's current work.
  • Complex IB rubrics can induce cognitive overload and must be translated into learner-friendly language.
  • Formative assessment routines make IB criteria visible and actionable before the final summative task.
  • Structural scaffolds help SEND learners show subject understanding without unnecessary language load.
  • Webb's Depth of Knowledge helps teachers map IB command terms to specific cognitive demands.
  • Regular internal standardisation builds consistency and confidence in professional judgement.

IB Assessment infographic comparing Criterion-Referenced, Norm-Referenced, and Rubrics for teachers
Criterion-Referenced vs. Norm-Referenced: What's the Difference?

Evidence overview

What the research says

Rubrics should guide learning before they assess work. In an International Baccalaureate classroom, criteria help learners see what counts as stronger evidence, not just what score they received. Wiliam (2011) argued that learners need to understand success criteria if feedback is to change what they do next. The shift is from chasing marks to improving the quality of the response.

A global politics teacher starts a Diploma Programme essay unit by projecting the International Baccalaureate rubric. The teacher asks the class to define the command terms 'describe' and 'evaluate'. The class then checks these definitions against two short exemplar paragraphs.

Learners write their own definitions in draft planners. What the teacher does: Projects the rubric and leads a criteria discussion. What learners produce: Written command-term definitions and one annotated exemplar.

Why Assessment Matters

IB rubrics ask learners to move between recall, explanation, analysis and evaluation. Webb (1997) gives teachers a useful alignment check: the thinking demand in the teaching task should match the demand in the assessment criterion. A short factual quiz may prepare learners for Depth of Knowledge (DoK) level 1 or 2 work. However, a Diploma Programme global politics essay or an environmental systems and societies evaluation needs sustained reasoning closer to DoK level 3 or 4.

IB rubrics can place too much strain on working memory. Sweller (1988) showed that learners can only process a limited amount at one time. So teachers should separate the real subject demand from language that makes the task harder than it needs to be. This matters for EAL, neurodivergent and working-class learners, because dense command terms such as 'synthesise' and 'evaluate' can reward cultural capital as well as subject understanding (Bourdieu, 1986).

SEND learners can struggle with unclear assessment criteria. Abstract rubrics may hide a learner's understanding (Wiggins, 1998). Teachers can use visible thinking routines and concrete examples to make criteria clearer, then provide structured scaffolds that show what a successful response needs to include.

A Mathematics teacher maps an IB investigation rubric against Webb's DoK levels on a wall chart. The teacher points to the DoK Level 2 section and asks learners to complete a calculation exercise. Learners work on mini whiteboards and hold them up for checking.

What the teacher does: Creates a DoK-aligned wall chart and leads a short practice exercise. What learners produce: Completed calculation exercises on mini whiteboards.

Anatomy of Wiliam's 5 Formative Assessment Strategies, visual classroom guide

Assessment in the Classroom

Strong assessment practice starts when teachers connect IB policy to the evidence they collect each week. Learners need clear criteria, explicit modelling and chances to act on feedback before final submission. In the Diploma Programme, this also means knowing the balance between Internal Assessment, where classroom teachers may mark work before IB moderation, and External Assessment, where scripts are marked by IB examiners. The practical test is simple: can the department explain which evidence supports the grade without creating a shadow spreadsheet that doubles workload?

Strategy 1: Best-Fit Grading

Best-fit means teachers judge which achievement band best represents the overall quality of a learner's work. That judgement is useful in education, but it is not bias-free. Complex rubrics can leave room for halo effects, prior knowledge of the learner and departmental habit to shape decisions. This is why assessment reliability depends on moderation, annotated exemplars and recorded rationales (Baird et al., 2017; Sadler, 1989; Torrance & Pryor, 1998).

The teacher reviews an English literature essay alongside the Criterion A rubric. The teacher highlights strong critical analysis, which matches a 7-8 band descriptor, in green. The teacher also highlights weak textual referencing, which matches a 3-4 band descriptor, in yellow. They decide that the overall quality sits in the 5-6 band and record a justification.

What the teacher does: Highlights strengths and weaknesses in an essay using colour-coding. What learners produce: Learners review their own highlighted essays and explain why their work fits a specific band.

Strategy 2: Making Criteria Visible

Assessment clarity is not the same as handing out a dense rubric. Wiggins (1998), Sadler (1989) and Andrade (2000) point to a stronger routine: show the standard, compare work against it, then help learners close the gap. Hattie and Timperley (2007) frame feedback around where the learner is going, how they are going and where they should go next, but Simpson (2017) and Slavin (2018) warn against treating a single feedback effect size as a direct prediction for complex International Baccalaureate tasks.

The teacher designs visual checkpoints based on the MYP Design cycle criteria. The teacher places a card for each criterion strand on the learners' desks. The teacher asks the class to move a token onto a card once they have completed that requirement.

What the teacher does: Creates physical cards representing each stage of the design cycle. What learners produce: Learners move tokens to indicate progress and explain their reasoning.

Strategy 3: Translating Rubrics

IB policy language is written to keep standards consistent across programmes. It is not written as classroom-facing guidance for learners. Teachers should reword rubrics in accessible language, then check that the new wording still preserves the original assessment standard. Clearer wording can reduce anxiety and help learners understand the task (Sadler, 1989; Boud, 1995).

The teacher projects the official DP History rubric beside a simpler version. The teacher shows how "synthesises complex information" means "combines facts from three sources to make a new point". The teacher then hands out copies of the translated rubric.

What the teacher does: Creates and distributes a simplified version of the rubric. What learners produce: Learners use the translated rubric to check their work against the criteria.

IB Assessment infographic showing strategies for Criterion-referenced, Best-fit, and Rubrics for teachers
The IB Assessment Hierarchy: From Rubrics to Learner Understanding

Common Misconceptions

A learner does not need to meet every descriptor in a lower band before reaching a higher band. Best-fit grading asks the teacher to identify the centre of gravity in the work: a learner may show strong evaluation while still making minor structural errors. The recorded grade should reflect the prevailing quality, with a short rationale linked to descriptors.

Teachers often think rubrics are just for end-of-term grades. Actually, criteria should guide ongoing assessment. Waiting until the end means learners lose valuable feedback chances. Formative tasks must focus on rubric strands (Brookhart, 2018; Andrade, 2005).

Another error is assuming that the rubric dictates the sequence of teaching. Criteria define the end point, not the lesson sequence. Teachers can build retrieval, modelling, inquiry and discussion before asking learners to produce final evidence.

Averaging is the more damaging misconception. In the Middle Years Programme, adding every score across a term can create a 'zombie grade' that represents neither early struggle nor current ability. Best-fit grading should draw on recent, consistent evidence and cite descriptors, not calculate a mean.
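The difference between a running mean and a judgement based on recent evidence can be shown with a small Python sketch. The levels below are invented, the function names are hypothetical, and the three-task window and one-level tolerance are assumptions. The summary is not an IB procedure and does not produce a grade; it only lays out the recent evidence for the teacher to check against the descriptors.

```python
from statistics import mean, median

# Hypothetical Criterion B levels (0 to 8) for one learner, oldest first.
term_levels = [2, 3, 3, 5, 6, 6, 6]


def running_average(levels: list[int]) -> float:
    """The averaging approach the text warns against."""
    return mean(levels)


def recent_evidence_summary(levels: list[int], window: int = 3) -> dict:
    """Summarise the most recent pieces of evidence for the teacher to review."""
    recent = levels[-window:]
    return {
        "recent_levels": recent,
        "typical_recent_level": median(recent),
        # Treat evidence as consistent when recent levels differ by at most 1.
        "consistent": max(recent) - min(recent) <= 1,
    }


print(round(running_average(term_levels), 1))
print(recent_evidence_summary(term_levels))
```

With these invented levels, the mean sits between level 4 and level 5, while the last three tasks were all judged at level 6. The averaged figure describes neither the early struggle nor the current work.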

IB continuum language also needs updating. The International Baccalaureate now describes four programmes. These are the Primary Years Programme for ages 3-12, the Middle Years Programme for ages 11-16, the Diploma Programme for ages 16-19, and the Career-related Programme for ages 16-19.

IB Assessment — visual explainer sketchnote
An at-a-glance visual summary of IB Assessment.

In the PYP, the current specified concepts are seven: form, function, causation, change, connection, perspective and responsibility. Reflection was once listed as an eighth key concept. However, the Enhanced PYP (2018) moved it into continuous practice across inquiry, assessment and action. Many school websites and teacher resources still say '8 lenses' in 2026; this article follows the current seven.

For the DP Core, the current Theory of Knowledge course is not the old Ways of Knowing model. TOK now uses a knowledge framework with four elements (scope, perspectives, methods and tools, and ethics), alongside a core theme, optional themes and five areas of knowledge. Its assessments are the TOK exhibition, based on three objects, and the TOK essay on prescribed titles.

The same need for current language applies to MYP eAssessment. Optional on-screen examinations and ePortfolios add an external check to school-based judgement. For example, ePortfolios in areas such as design, physical and health education, arts and language acquisition are marked by classroom teachers, with samples moderated by IB examiners. When a department understands this process, it can explain why one spelling error does not block a high science level, while weak reasoning or missing evaluation can still hold the grade down.

Practical Implementation Guide

Implementation starts as a leadership issue, before it becomes a marking issue. Heads of department need a simple evidence model that meets IB expectations and supports Ofsted conversations about progress. It should also avoid a separate grade book that teachers have to update after every lesson.

Step 1: Examine the Subject Guide. Read the specific IB guide for your subject and programme. Identify the core assessment criteria and highlight the command terms.

Step 2: Translate descriptors into learner-friendly language. Use 'I can' statements, worked examples and non-examples, but keep the translated wording anchored to the official descriptor so the standard does not drift (Brookhart, 2013; Andrade, 2000).

Step 3: Design formative scaffolds. Link short activities to the rubric strands. For example, if Criterion B involves pattern recognition in environmental systems and societies, set a ten-minute graph interpretation task before the larger investigation (Wiliam, 2011).

Step 4: Conduct Internal Standardisation. Gather your department to review a sample of learner work before assigning final grades. Discuss the work against the criteria and agree on the best-fit band.

A History department meets to standardise a recent MYP Year 3 assessment. The lead teacher gives the group three unmarked essays. Each teacher marks the first essay in silence, using the translated rubric. They then share their awarded bands and discuss any differences until they reach a consensus.

What the teacher does: Provides sample essays and leads a standardisation meeting. What learners produce: N/A (This is a teacher-focussed activity).
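A department could prepare for the discussion stage of this meeting with a short script that flags the essays where independent marks differ most. The sketch below is one possible approach, not an IB requirement. The awarded levels are invented, the function name is hypothetical, and the tolerance of one level is an assumption a department would set for itself.

```python
# Hypothetical levels (0 to 8) awarded independently by three teachers.
awarded = {
    "Essay 1": {"Teacher A": 5, "Teacher B": 6, "Teacher C": 5},
    "Essay 2": {"Teacher A": 3, "Teacher B": 6, "Teacher C": 4},
    "Essay 3": {"Teacher A": 7, "Teacher B": 7, "Teacher C": 7},
}


def needs_discussion(levels_by_teacher: dict[str, int], tolerance: int = 1) -> bool:
    """True when the gap between highest and lowest level exceeds the tolerance."""
    levels = list(levels_by_teacher.values())
    return max(levels) - min(levels) > tolerance


# Collect the essays whose marks are too far apart to accept without discussion.
to_discuss = [essay for essay, levels in awarded.items() if needs_discussion(levels)]
print(to_discuss)
```

The script only decides where to spend meeting time. The agreed band still comes from discussing the work against the descriptors.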

Teacher and learners use inquiry questions, reflection journals and collaborative discussion in an International Baccalaureate classroom.
International Baccalaureate inquiry in action: learners use inquiry routines in an International Baccalaureate classroom.

Assessment Across Subjects

Criterion-related assessment varies by subject, but it always needs clear standards and sound evidence. In Diploma Programme language and literature, learners may need to interpret texts and write with control. In global politics, evidence may come from case analysis and argument, while in business management it often comes from applied evaluation. Mathematics: analysis and approaches and Mathematics: applications and interpretation have different emphases, while environmental systems and societies links the sciences with individuals and societies. Music tasks add a further layer because teachers judge performance, process and reflection against descriptors, not personal taste (Guskey, 1996; Brookhart, 2013; McMillan, 2007).

MYP teachers often find Criterion A difficult when literacy is weak. Because each criterion can be scored out of 8, small language barriers can have large consequences when the total is mapped from 32 onto the 1-7 scale. A Map It organiser can let learners show causal links before writing: in global politics they might connect power, sovereignty and human rights; in environmental systems and societies they might map pollution sources, feedback loops and stakeholder interests.

When judging learners' written work, especially in language and literature, teachers can confuse fluent writing with subject understanding. Sentence blocks can support analysis without writing the answer for the learner. For example, a learner might use 'the evidence suggests' or 'this implies' in a first draft, then remove the scaffold once the analytical habit is secure. The same approach helps business management and environmental systems and societies learners explain evidence without turning the task into a formula.

SEND learners can find complex science rubrics hard to use. Teachers can adapt the assessment with a Learning Design Canvas (LDC), which helps them break the rubric into one clear step at a time. Each step has a visible checkpoint: for example, learners first get a card for the hypothesis, then one for the methodology (Gibbons, 2002). This support helps them complete lab reports (Laurillard, 2012).

5 Strategies for Supporting SEND Learners in IB Assessment

Common Questions About Assessment

Formative and Summative Assessment

Formative assessment happens during learning and gives feedback on specific strands of the criteria. Summative assessment judges the final performance against the relevant rubric or examination mark scheme. In the Diploma Programme, this summative picture usually combines Internal Assessment, External Assessment and, for the Core, Theory of Knowledge and the Extended Essay. CAS must be completed, but it does not add points.

Borderline Grades in Best-Fit Judgement

When a learner's work sits on the border between two bands, revisit the command terms. Look for the prevailing quality of the work and determine if it leans towards the higher or lower cognitive demand. Document your rationale.

Uneven Performance Across Strands

You must find the centre of gravity. If a learner shows level 7 analysis but level 1 communication, the best-fit grade will likely sit in the middle bands. You cannot award a top grade if a fundamental requirement is missing, but you should not award the lowest grade if high-level skills are present.

Lower Cognitive Load Rubrics for SEND Learners

Never hand a SEND learner a full page of dense IB text. Break the rubric down into a checklist of single actions. Present only the criteria for the specific band the learner is working towards.

Department Standardisation Frequency

Sadler (2009) and Bloxham (2009) show why standardisation protects teacher judgement. Before each major assessment, departments should standardise marking with two or three anonymised samples. They should also write brief reasons for each mark, which reduces drift between teachers and makes borderline choices easier to defend.

Percentages, Marks and IB Criteria

Use percentages with care. Classroom assessment should refer to criteria. However, DP components still produce raw marks that are weighted, added together and converted into 1-7 grades through session grade boundaries.
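The route from raw marks to a 1-7 grade can be sketched as below. Everything numeric here is invented for illustration: the component names, maximum marks, weightings and boundaries are assumptions, not figures from any IB subject guide, and the function names are hypothetical. Real boundaries are set by the IB each session.

```python
# Hypothetical components: (raw mark, maximum mark, weighting as a fraction).
components = {
    "Paper 1": (30, 40, 0.35),
    "Paper 2": (45, 60, 0.45),
    "Internal Assessment": (18, 24, 0.20),
}

# Invented boundaries for the example: (lowest weighted percentage, grade).
EXAMPLE_BOUNDARIES = [(80, 7), (68, 6), (56, 5), (44, 4), (30, 3), (15, 2), (0, 1)]


def weighted_percentage(parts: dict[str, tuple[int, int, float]]) -> float:
    """Scale each raw mark by its weighting and add the results (0 to 100)."""
    total_weight = sum(weight for _, _, weight in parts.values())
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError("Weightings should add up to 1")
    return sum(raw / maximum * weight * 100 for raw, maximum, weight in parts.values())


def grade_from_boundaries(score: float, boundaries=EXAMPLE_BOUNDARIES) -> int:
    """Return the grade for the first boundary the score reaches."""
    for lowest, grade in boundaries:
        if score >= lowest:
            return grade
    return 1


score = weighted_percentage(components)
print(round(score, 1), grade_from_boundaries(score))
```

Because the boundaries are a separate input, the same weighted score could map to a different grade in another session, which is the limited norm-referenced element teachers should keep in mind when reporting percentages.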

In the MYP, criterion scores are totalled out of 32 and then mapped to the 1-7 scale. In the UK, the phrase 'IB exam' usually refers to Diploma Programme final written examinations, although the qualification also includes coursework, Internal Assessment and the Core. Comparisons with GCSE or four A levels are too blunt, because the International Baccalaureate Diploma spreads assessment across six subjects plus the Core. This means workload and breadth differ from A level specialisation, rather than sitting on a simple harder or easier scale.

Academic integrity now needs to be designed into the assessment, not added as a warning on the cover sheet. The IB does not ban generative AI, but any AI-generated text, image or data must be credited and cannot be presented as the learner's own work. For Internal Assessment, this pushes teachers towards process evidence, supervised drafting, short viva voce checks and oral defences where learners explain why a method, source or model was chosen (IBO, 2023; Dawson et al., 2024).

Draft the translated rubric for your next unit. Then add one authenticity checkpoint. This could be a five-minute oral explanation, a live data annotation, or a short comparison between an early plan and the final submission.

Limitations and Critiques

IB Assessment can look more objective than it is. Criterion descriptors reduce crude ranking, but best-fit judgement still depends on teacher interpretation. Baird et al. (2017) warn that classroom assessment reliability is not the same as large-scale test reliability; moderation, exemplars and shared rationales are needed because two well-informed teachers may still read the same evidence differently.

There are also cultural and linguistic limits. Bourdieu (1986) helps explain why command terms such as 'evaluate' and 'synthesise' can reward cultural capital, academic English and prior exposure to essay conventions. This can affect EAL, working-class and neurodivergent learners, especially in language and literature, global politics and environmental systems and societies, where subject understanding is often filtered through dense written performance.

The evidence base needs care. Hattie (2009) is often quoted as if feedback effect sizes transfer cleanly to every rubric, but Simpson (2017) and Slavin (2018) question the comparability of aggregated effect sizes across contexts. Karpicke's retrieval work (Karpicke, 2008) and Vygotsky's account of guided support (Vygotsky, 1978) are useful, yet both require adaptation for multilingual, high-stakes International Baccalaureate settings.

Generative AI adds a new validity problem: coursework may show polished output without secure authorship or understanding (Dawson et al., 2024). Despite these limits, IB Assessment remains valuable when schools treat criteria as public standards, moderate judgement carefully and design tasks that let learners explain their thinking.

References

Black, P. (1998). Inside the black box.

Hattie, J. (2009). Visible learning.

Karpicke, J. (2008). The critical importance of retrieval for learning.

Vygotsky, L. (1978). Mind in society: The development of higher psychological processes.

Webb, N. (1997). Criteria for alignment of expectations and assessments.

Wiliam, D. (2011). Embedded formative assessment.

Further Reading: Verified Assessment Sources

These sources focus on assessment validity, feedback, formative assessment and cognitive demand.

Assessment and Classroom Learning

Black and Wiliam (1998) provide the foundational review behind much classroom formative assessment practice. Use it to ground feedback, questioning and evidence-responsive teaching.

The Power of Feedback

Hattie and Timperley (2007) clarify how feedback should answer where learners are going, how they are going and where they should go next.

Formative Assessment and the Design of Instructional Systems

Sadler (1989) is central for understanding standards, criteria and learner action in formative assessment.

Cognitive Load During Problem Solving

Sweller (1988) supports the article's point that assessment scaffolds should reduce unnecessary load while preserving the intended cognitive challenge.

A Generative AI model for Assessment in Higher Education

Dawson et al. (2024) is a relevant source for the article's AI assessment section because it frames validity as a more important design problem than cheating prevention alone.

Paul Main, Founder of Structural Learning
About the Author
Paul Main
Founder & Metacognition Researcher

Paul Main is an educator and metacognition researcher who founded Structural Learning in 2002. With a psychology degree from the University of Sunderland and 22+ years helping schools embed thinking skills, he bridges the gap between educational research and classroom practice. Fellow of the RSA and Chartered College of Teaching, with 128+ Google Scholar citations.
