Thursday, Apr 25, 2013

What is the Intra-Class Correlation Coefficient?

What are validity and reliability?

When collecting quantitative data (such as in a survey) we often have a number of questions about the reliability and validity of the data. The term “validity” is generally used to describe whether measured data accurately reflect an underlying but intangible construct (e.g. is IQ an accurate reflection of a person’s intelligence?).

“Reliability” describes whether we obtain the same quantitative data irrespective of factors such as:

a) the choice of interviewer collecting the data,

b) the specific question asked of a survey participant (e.g. both IQ and school exam scores are measures of a person’s intelligence), and

c) the point in time when the question is asked.

There are a number of statistical methods used to describe the validity and reliability of quantitative data. Within this tutorial we shall describe one technique known as the Intra-Class Correlation Coefficient.

What is the Intra-Class Correlation Coefficient?

The Intra-Class Correlation Coefficient (ICC) is typically used when we have a number of different interviewers, raters, or assessors within our survey. Let us suppose that we have n participants (or items) within our survey, and each participant is assessed by k different interviewers. We are interested in the level of agreement between our k interviewers (did the interviewers record the same results for each participant?). The ICC is the proportion of the total variance within our data that is explained by the variance between participants, as opposed to variance arising from the interviewers or from random error. In most cases the value of the ICC ranges from 0 to 1: as the ICC approaches one we see perfect agreement between interviewers, and as it approaches zero we see no agreement between interviewers.
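As a rough illustration of the idea, the sketch below estimates the simplest (one-way) ICC for a small table of ratings by comparing the mean square between participants with the mean square within participants, following the ICC(1,1) formula in Shrout and Fleiss (1979). The function name icc_oneway and the example ratings are hypothetical and are not taken from any statistical package.

```python
from statistics import mean


def icc_oneway(ratings):
    """Single-measure one-way ICC, i.e. ICC(1,1) in Shrout and Fleiss (1979).

    ratings: list of n rows (one per participant), each holding k ratings.
    """
    n = len(ratings)        # number of participants
    k = len(ratings[0])     # number of ratings per participant
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]

    # Mean square between participants
    bms = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    # Mean square within participants
    wms = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))

    return (bms - wms) / (bms + (k - 1) * wms)


# Hypothetical ratings: 4 participants, each rated by 3 interviewers.
perfect_agreement = [[1, 1, 1], [3, 3, 3], [5, 5, 5], [7, 7, 7]]
some_disagreement = [[1, 2], [3, 4], [5, 6]]

print(icc_oneway(perfect_agreement))            # 1.0
print(round(icc_oneway(some_disagreement), 3))  # 0.882
```

Note that the estimate can fall below zero when the ratings vary more within participants than between them, which is why the 0 to 1 range holds only "in most cases".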

Which version of the Intra-Class Correlation Coefficient should you use?

In many papers a result is simply labelled as an Intra-Class Correlation Coefficient. In reality there are a number of different versions of the ICC and it is important to understand which version of the ICC has been or should be used for each different application. To understand the different versions of the ICC we need to understand how our data has been collected and what specific question we would like to ask of our data.

The first major breakdown between the different versions of the ICC is into three categories:

A. Each of our n participants is seen by a different set of k interviewers (resulting in k x n interviewers in total for our survey). In this case we also assume that these interviewers were randomly chosen from a much larger set of possible interviewers.

B. Each participant is seen by the same set of k interviewers, where these interviewers were randomly chosen from a much larger set of possible interviewers.
C. Each participant is seen by the same set of k interviewers, where we are specifically interested in the level of agreement between these particular interviewers.

When discussing the level of agreement that we see between interviewers we also need to decide what we will do with the systematic differences between the interviewers (e.g. one interviewer might consistently record lower ratings for a particular survey question). Hence we also need different versions of the ICC depending upon whether we will:

1. Consider systematic differences between interviewers as important and count them as disagreement between interviewers (in the literature this is often labelled as measuring “absolute agreement”)
2. Disregard systematic differences between interviewers, so that they do not count as disagreement within the ICC (this is labelled as measuring “consistency”)

Considering these different characteristics we end up with five different versions of an ICC (A, B1, B2, C1, and C2). When we have a different set of interviewers for each participant (category A) we can’t specifically explore systematic differences between interviewers (hence we can’t divide A into A1 and A2).
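The difference between “absolute agreement” and “consistency” can be seen in a small worked sketch. The code below uses the single-measure formulas ICC(2,1) and ICC(3,1) from Shrout and Fleiss (1979) for the case where every participant is seen by the same k interviewers; the function name icc_twoway and the ratings are hypothetical. In the example data the second interviewer always scores exactly two points higher than the first, which is a purely systematic difference.

```python
from statistics import mean


def icc_twoway(ratings):
    """Return (absolute_agreement, consistency) single-measure ICC estimates.

    ratings: list of n rows (participants), each holding one rating from
    each of the same k interviewers (columns). Formulas follow ICC(2,1)
    and ICC(3,1) in Shrout and Fleiss (1979).
    """
    n = len(ratings)
    k = len(ratings[0])
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]         # per participant
    col_means = [mean(col) for col in zip(*ratings)]   # per interviewer

    # Mean squares: between participants, between interviewers, residual
    bms = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    jms = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    ems = sum(
        (ratings[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    ) / ((n - 1) * (k - 1))

    # Absolute agreement: interviewer differences count as disagreement
    absolute = (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)
    # Consistency: systematic interviewer differences are left out
    consistency = (bms - ems) / (bms + (k - 1) * ems)
    return absolute, consistency


# Hypothetical ratings: interviewer 2 always scores 2 points higher.
offset_ratings = [[1, 3], [2, 4], [3, 5], [4, 6]]
absolute, consistency = icc_twoway(offset_ratings)
print(round(absolute, 3), consistency)  # 0.455 1.0
```

Here the consistency estimate is 1 because the interviewers rank and space the participants identically, while the absolute agreement estimate is much lower because the constant two-point offset is treated as disagreement.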

ICC in SPSS, Stata, and SAS

| Version of the ICC | SPSS Options | Stata Syntax |
|---|---|---|
| A | Model: One-way Random (Type: not used for this Model) | icc measure participant |
| B1 | Model: Two-way Random; Type: Absolute Agreement | icc measure participant interviewer |
| B2 | Model: Two-way Random; Type: Consistency | icc measure participant interviewer, consistency |
| C1 | Model: Two-way Mixed; Type: Absolute Agreement | icc measure participant interviewer, mixed absolute |
| C2 | Model: Two-way Mixed; Type: Consistency | icc measure participant interviewer, mixed |

Figure 1: How to specify the version of the ICC in SPSS and Stata. In SPSS there are drop-down lists for the Model and the Type. In Stata there is a corresponding syntax to produce each version of the ICC (demonstrated here using hypothetical variables named measure, participant, and interviewer).

To calculate an ICC in SPSS you will need your data in wide format (i.e. one column or variable for each interviewer). For SPSS Version 20 you would then select the menu option “Analyze -> Scale -> Reliability Analysis”. Within the “Reliability Analysis” window choose the items (variables) relating to each interviewer and click “Statistics”. Within the “Reliability Analysis: Statistics” window select “Intraclass correlation coefficient” and choose the appropriate “Model” and “Type” from the window’s drop-down lists (see Figure 1). Complete the procedure by selecting “Continue” and “OK”.

To calculate an ICC in Stata Version 12.1 you will need your data in long format (i.e. one column for the variable of interest, one column indicating which survey participant each row pertains to, and one column indicating which interviewer each row pertains to). Keep in mind that each survey participant is seen by a number of interviewers, so each participant contributes several rows. The Intra-Class Correlation Coefficient is then calculated using Stata’s “icc” command (as described in Figure 1 using variables for the measure, participant, and interviewer).
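To make the difference between the two layouts concrete, the following sketch converts a small wide-format table (one column per interviewer, as SPSS expects) into long format (one row per participant and interviewer, as Stata expects). It is plain Python rather than SPSS or Stata code, and the column names and values are hypothetical.

```python
# Hypothetical wide-format data: one row per participant,
# one column per interviewer.
wide_rows = [
    {"participant": 1, "interviewer1": 4, "interviewer2": 5},
    {"participant": 2, "interviewer1": 2, "interviewer2": 2},
]


def wide_to_long(rows, id_key="participant"):
    """Return one row per (participant, interviewer) pair."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_key:
                continue  # skip the identifier column itself
            long_rows.append(
                {
                    "participant": row[id_key],
                    "interviewer": column,
                    "measure": value,
                }
            )
    return long_rows


long_rows = wide_to_long(wide_rows)
for r in long_rows:
    print(r["participant"], r["interviewer"], r["measure"])
```

Two participants rated by two interviewers become four long-format rows, each carrying the measure, the participant, and the interviewer.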

SAS Version 9.3 does not have a dedicated ICC command; instead you will need to use SAS’s MIXED or NLMIXED procedures. As that process is slightly more complex, interested readers are referred to the SAS/STAT User’s Guide.

Conclusion

The Intra-Class Correlation Coefficient is a useful measure for describing the reliability of a set of data. Within this tutorial we have described the different versions of the ICC (depending upon how the data were collected and the specific question of interest within the reliability analysis). Further information about the mathematical formulation of the ICC can be found in the paper “Intraclass Correlations: Uses in Assessing Rater Reliability” by Shrout and Fleiss (Psychological Bulletin 1979, Vol. 86, No. 2, 420-428).
