Understanding Cohen’s Kappa with an Example

Inter-annotator Agreement Score

Prakhar Mishra
4 min read · Sep 12, 2022

In this blog, I'll talk about Cohen's Kappa and walk you through a practical example of calculating it.

In the early stages of building an end-to-end machine learning pipeline, getting control over the quality of the training data is a very important step, simply because a machine learning model is only as good as the data we feed it.

In practice, in the absence of ready-made labeled data, it's often a good idea to get your domain-specific data labeled by human annotators. And to keep out any bias that a single annotator might introduce, you should get the same samples labeled by more than one person. But then the question arises: how much agreement, or reliability, exists between these annotators? Cohen's Kappa is one such statistical measure, quantifying inter-rater agreement when there are two raters. So let's see what it is.
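For reference, the standard formula behind Cohen's Kappa, where p_o is the observed agreement between the two raters and p_e is the agreement expected by chance given each rater's own label distribution, can be sketched as:

```latex
\kappa = \frac{p_o - p_e}{1 - p_e}
```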

Let's consider the task of sentiment analysis. Annotators 1 and 2 each label the same 5 sentences as positive or negative, and clearly we can see both agreement and disagreement between their annotations. A quick sketch of how the agreement can be computed follows below.
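To make this concrete, here is a minimal sketch of the computation in Python. The two label lists are hypothetical stand-ins rather than the article's actual example annotations, and scikit-learn's cohen_kappa_score is used only as a cross-check on the manual calculation.

```python
# A minimal sketch of computing Cohen's Kappa for two annotators.
# NOTE: these label lists are hypothetical stand-ins, not the article's
# actual example annotations.
from sklearn.metrics import cohen_kappa_score

annotator_1 = ["positive", "negative", "positive", "positive", "negative"]
annotator_2 = ["positive", "negative", "negative", "positive", "positive"]

# Observed agreement p_o: fraction of sentences where both annotators agree.
p_o = sum(a == b for a, b in zip(annotator_1, annotator_2)) / len(annotator_1)

# Chance agreement p_e: probability that both annotators pick the same label
# by chance, given each annotator's own label distribution.
def label_share(labels, cls):
    return labels.count(cls) / len(labels)

p_e = sum(
    label_share(annotator_1, cls) * label_share(annotator_2, cls)
    for cls in ("positive", "negative")
)

kappa_manual = (p_o - p_e) / (1 - p_e)
kappa_sklearn = cohen_kappa_score(annotator_1, annotator_2)  # cross-check

print(f"p_o = {p_o:.2f}, p_e = {p_e:.2f}")
print(f"kappa (manual)  = {kappa_manual:.3f}")
print(f"kappa (sklearn) = {kappa_sklearn:.3f}")
```

With these made-up labels the observed agreement is 0.6, the chance agreement is 0.52, and kappa works out to roughly 0.17, i.e. only slight agreement beyond chance.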
