Reliability of an assessment

Posted by @natasabrouwer on April 16, 2022, 2:19 p.m.


Calibration session

How can answer models make scoring more consistent?
How can grading forms make scoring more reliable?
More information about this topic will follow soon.

Interpreting reliability

If an assessment, taken repeatedly under the same conditions, produces the same results every time, it is called reliable. After the assessment has been taken, a reliability coefficient (Cronbach's alpha, KR-20 or KR-21) can be calculated for multiple-choice assessments, assessments with open-ended questions and assessments graded with rubrics. KR-20 and KR-21 apply only to questions scored right or wrong, while Cronbach's alpha also handles questions with partial credit. The coefficient has a maximum of 1; values close to 0, or even negative values, indicate that the questions do not measure consistently. If the value is higher than 0.7, the assessment is reliable for diagnostic purposes (feedback to students); if it is higher than 0.8, the assessment is reliable for selective purposes (grading). If the coefficient is too low for the purpose of the assessment, it is advisable to lengthen the assessment by multiplying the number of questions by a factor n:

n = R2/R1 x (1 - R1)/(1 - R2)              [Spearman-Brown formula]

where R1 is the reliability coefficient of the current assessment and R2 is the desired reliability (for example 0.8 for grading).
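To make the arithmetic concrete, here is a minimal sketch of the Spearman-Brown calculation. The function names and the example numbers (40 questions, a coefficient of 0.65) are hypothetical. The sketch also assumes, as the Spearman-Brown formula itself does, that the added questions are comparable in quality to the existing ones. The second function applies the formula in the other direction, as a check: it predicts the reliability after lengthening by a factor n.

```python
import math


def lengthening_factor(r1, r2=0.8):
    """Spearman-Brown: factor n by which the number of questions must be
    multiplied to raise reliability r1 to the desired reliability r2."""
    if not (0 < r1 < 1 and 0 < r2 < 1):
        raise ValueError("reliability coefficients must lie strictly between 0 and 1")
    return (r2 / r1) * (1 - r1) / (1 - r2)


def predicted_reliability(r1, n):
    """Spearman-Brown prophecy formula: expected reliability after the
    assessment is lengthened by factor n with comparable questions."""
    return n * r1 / (1 + (n - 1) * r1)


# Hypothetical example: 40 questions, coefficient 0.65, target 0.8 (grading).
n = lengthening_factor(0.65, 0.8)
questions_needed = math.ceil(40 * n)  # round up to whole questions
print(f"n = {n:.2f}; about {questions_needed} comparable questions needed")
print(f"check: predicted reliability = {predicted_reliability(0.65, n):.2f}")
```

In this example n is about 2.15, so roughly 87 questions would be needed instead of 40. An increase of that size is exactly the kind of disproportionate length that calls for a critical look at the questions themselves.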

If the calculated length is disproportionate (too long for the time available for the assessment), it is advisable to have the questions peer-reviewed by an assessment coordinator or program director.

The following factors all contribute to the reliability of an assessment:

Objectivity: The questions are unambiguous, and the expected answers are so clear that consistent scoring is possible (i.e. independent of the time of scoring, the assessor, etc.).

Specificity: The questions are constructed so that only students who master the content can answer them correctly. The questions in the assessment must be independent of each other.

Differentiation: Both at the level of individual questions and at the level of the entire assessment, the results distinguish between students who master the content well and students who master it less well.

Length of the assessment: The number of questions is large enough to rule out passing through lucky guesses.

Reference: the text above is translated from Kader Toetsbeleid UvA (the University of Amsterdam's assessment policy framework), page 13.
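Two of the factors above, differentiation and length, can be checked with numbers. The sketch below is only an illustration and the original text does not prescribe these measures. For differentiation, it uses the item-rest correlation (also called the corrected item-total correlation), a common choice: for each question, it correlates the score on that question with the total score on all other questions. For "lucky shots", it computes the probability that a student who guesses every multiple-choice question at random still passes, using a binomial model. The score layout, the 55% pass mark, the function names and the example data are all hypothetical.

```python
from math import comb
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation of two equally long lists (population form)."""
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return float("nan")  # no variation, so the question cannot differentiate
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (sx * sy)


def item_rest_correlations(scores):
    """Differentiation per question. scores[s][i] is the score of student s
    on question i (hypothetical layout)."""
    k = len(scores[0])
    correlations = []
    for i in range(k):
        item = [row[i] for row in scores]
        rest = [sum(row) - row[i] for row in scores]  # total without question i
        correlations.append(pearson(item, rest))
    return correlations


def p_pass_by_guessing(n_questions, n_options, pass_percent=55):
    """Probability of reaching the pass mark by guessing every question at
    random (binomial model; the 55% pass mark is an assumption)."""
    p = 1 / n_options
    needed = -(-pass_percent * n_questions // 100)  # integer ceiling, no float rounding
    return sum(comb(n_questions, k) * p**k * (1 - p) ** (n_questions - k)
               for k in range(needed, n_questions + 1))


# Hypothetical results of 5 students on 4 right/wrong questions.
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 1],
]
for i, r in enumerate(item_rest_correlations(scores), start=1):
    print(f"question {i}: item-rest correlation {r:+.2f}")

for n in (10, 30, 50):
    print(f"{n} questions, 4 options: P(pass by guessing) = {p_pass_by_guessing(n, 4):.5f}")
print(f"10 true/false questions: P(pass by guessing) = {p_pass_by_guessing(10, 2):.3f}")
```

In the example data, question 4 has a negative item-rest correlation: the weaker students tend to answer it correctly. A question like that does not differentiate and is a good candidate for peer review. The guessing figures show why length matters. With 10 true/false questions, a pure guesser passes about 38% of the time (386 of 1024 outcomes). With four answer options, the probability drops from about 2% at 10 questions to nearly zero at 30 or more.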

In order to make an assessment as reliable as possible, it is therefore recommended to:

- ask a peer reviewer (and possibly an assessment coordinator) for feedback during the assessment construction phase;
- interpret the reliability coefficient in order to monitor (and optimize) the quality of future assessments.
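Interpreting the coefficient starts with computing it. Below is a minimal sketch, assuming the results are available as a matrix with one row per student and one column per question (a hypothetical layout). It computes Cronbach's alpha, which is the same as KR-20 when every question is scored 0 or 1. It also computes KR-21, which needs only the mean and variance of the total scores but assumes all questions are equally difficult. Finally, it applies the 0.7 and 0.8 thresholds given in this article. Population variances are used throughout. For alpha this choice cancels out in the ratio; for KR-21, software that uses the sample variance will give slightly different values. The function names and the data are illustrative only.

```python
from statistics import mean, pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha; scores[s][i] is the score of student s on question i
    (hypothetical layout, partial credit allowed). Equals KR-20 for 0/1 items."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("at least two questions are needed")
    item_variances = [pvariance([row[i] for row in scores]) for i in range(k)]
    total_variance = pvariance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("all students have the same total score")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def kr21(scores):
    """KR-21 for 0/1 items: uses only the mean and variance of the total
    scores and assumes all questions are equally difficult."""
    k = len(scores[0])
    totals = [sum(row) for row in scores]
    m, var = mean(totals), pvariance(totals)
    if var == 0:
        raise ValueError("all students have the same total score")
    return k / (k - 1) * (1 - m * (k - m) / (k * var))


def interpret(r):
    """Thresholds taken from this article."""
    if r > 0.8:
        return "reliable for selective purposes (grading)"
    if r > 0.7:
        return "reliable for diagnostic purposes (feedback only)"
    return "too low: consider lengthening the assessment"


# Hypothetical results of 6 students on 5 right/wrong questions.
scores = [
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
for name, r in (("alpha / KR-20", cronbach_alpha(scores)), ("KR-21", kr21(scores))):
    print(f"{name}: {r:.2f} -> {interpret(r)}")
```

For these data, alpha (KR-20) is about 0.83, which would be enough for grading. KR-21 gives about 0.71, which would only support feedback. KR-21 is lower because the questions in this example differ strongly in difficulty. When difficulties vary, KR-21 underestimates KR-20, so for a grading decision the full alpha/KR-20 calculation is the safer choice.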

Original author: Susan Voogd
Creative Commons 3.0 BY SA applies to all content on Starfish.
Starfish-education's support for publishing this content does not constitute an endorsement of the contents, which reflect only the views of the authors. Starfish-education cannot be held responsible for any use that may be made of the information contained therein, nor for content published by authors that does not conform to Creative Commons 3.0 BY SA.

See also