DOI: 10.1055/s-0038-1634116
Measuring Agreement for Ordered Ratings in 3 × 3 Tables
Publication History
Received: 08 March 2005
Accepted: 18 December 2005
Publication Date: 07 February 2018 (online)
Summary
Objectives: When two raters classify a qualitative variable ordered into three categories, agreement is commonly assessed with a symmetrically weighted kappa statistic. However, such statistics can present paradoxes, since they may be insensitive to variations in either complete agreements or disagreements.
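For context, the unweighted and symmetrically weighted kappa statistics discussed here follow the standard definitions of Cohen (1960, 1968). The Python sketch below computes them for a 3 × 3 table; the table shown is an arbitrary illustration, not data from the study:

```python
import numpy as np

def weighted_kappa(table, weights="linear"):
    """Symmetrically weighted kappa for a k x k contingency table.

    Standard Cohen form with disagreement weights:
    linear w_ij = |i - j|/(k - 1), quadratic w_ij = ((i - j)/(k - 1))**2;
    the unweighted kappa (Cohen 1960) uses w_ij = 0 if i == j, else 1.
    """
    p = np.asarray(table, dtype=float)
    p = p / p.sum()                                   # cell proportions
    k = p.shape[0]
    i, j = np.indices((k, k))
    if weights == "linear":
        w = np.abs(i - j) / (k - 1)
    elif weights == "quadratic":
        w = ((i - j) / (k - 1)) ** 2
    else:                                             # unweighted kappa
        w = (i != j).astype(float)
    expected = np.outer(p.sum(axis=1), p.sum(axis=0))  # chance proportions
    # kappa = 1 - observed weighted disagreement / chance weighted disagreement
    return 1 - (w * p).sum() / (w * expected).sum()

# An illustrative 3 x 3 table of ratings by two raters.
t = [[20, 5, 5], [5, 20, 5], [5, 5, 30]]
print(weighted_kappa(t, "unweighted"))   # ~0.545
print(weighted_kappa(t, "linear"))       # linearly weighted variant
```

The paradox mentioned above arises because a single weighted index collapses complete agreements, partial disagreements, and maximal disagreements into one number, so different mixes of them can yield similar kappa values.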
Methods: Agreement may be summarized by the relative amounts of complete agreement, partial disagreement, and maximal disagreement beyond chance. Fixing the marginal totals and the trace of the table, we computed symmetrically weighted kappa statistics and developed a new statistic for qualitative agreement. Data sets from the literature were used to illustrate the methods.
Results: We show that agreement may be better assessed with the unweighted kappa index, κc, together with a new statistic, ζ, which assesses the excess of maximal disagreements with respect to partial ones and does not depend on a particular weighting system. When ζ is equal to zero, maximal and partial disagreements beyond chance are equal. Using its estimated large-sample variance, we compared the values of ζ obtained from two contingency tables.
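The summary defines ζ only verbally, so the following Python sketch illustrates just the quantities it contrasts: the beyond-chance excesses in the maximal-disagreement cells (ratings two categories apart) and in the partial-disagreement cells (adjacent categories). The function zeta_sketch, its sign convention, and its lack of normalization are assumptions made for illustration, not the authors' exact statistic:

```python
import numpy as np

def zeta_sketch(table):
    """Contrast of maximal vs. partial disagreements beyond chance in a
    3 x 3 table. Illustrative only: the paper's ζ is a properly scaled
    statistic whose exact form the summary does not spell out.
    """
    p = np.asarray(table, dtype=float)
    p = p / p.sum()
    e = np.outer(p.sum(axis=1), p.sum(axis=0))  # expected under independence
    excess = p - e                               # beyond-chance proportions
    maximal = excess[0, 2] + excess[2, 0]        # cells two categories apart
    partial = (excess[0, 1] + excess[1, 0] +
               excess[1, 2] + excess[2, 1])      # adjacent-category cells
    return maximal - partial                     # zero when the two are equal

t = [[20, 5, 5], [5, 20, 5], [5, 5, 30]]
print(zeta_sketch(t))  # 0 means maximal and partial excesses coincide
```

In line with the summary, the key property is only the zero point: when the statistic vanishes, maximal and partial disagreements beyond chance are equal, whatever scaling the published ζ applies.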
Conclusions: The (κc, ζ) pair is sensitive to variations in agreements and/or disagreements and makes it possible to locate where two qualitative agreements differ. Qualitative agreement improves with increasing values of κc and ζ.