Dataset for: Assessing method agreement for paired repeated binary measurements administered by multiple raters

Method comparison studies are essential to progress in medical and clinical fields. These studies often compare a cheaper, faster, or less invasive measuring method with a widely used one to determine whether the two agree well enough to be used interchangeably. Moreover, unlike a simple device reading, e.g., body temperature read from a thermometer, the response measurement in many clinical and medical assessments is affected not only by the measuring device but also by the rater. For example, inconsistencies among raters are commonly observed in psychological or cognitive assessment studies because of differences in characteristics such as rater training and experience, especially in large-scale assessment studies that employ many raters. This paper proposes a model-based approach to assessing the agreement of two measuring methods for paired repeated binary measurements when the agreement between the two methods and the agreement among raters must be studied simultaneously. Based on generalized linear mixed models (GLMMs), the decision on the adequacy of interchangeable use is made by testing the equality of the fixed effects of the methods. Approaches for assessing method agreement, such as the Bland-Altman diagram and Cohen's kappa, are also developed for repeated binary measurements based on the latent variables in GLMMs. We assess our model-based approach through simulation studies and a real clinical application in which patients are evaluated repeatedly for delirium with two validated screening methods. Both the simulation studies and the real data analyses demonstrate that the proposed approach can effectively assess method agreement.
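As a point of reference for the agreement measures named above, the following is a minimal sketch of the classical Cohen's kappa for two binary screening methods applied to the same assessments. It is not the latent-variable version developed in the paper: it assumes one independent pair of ratings per assessment and ignores the repeated-measurement and rater effects that the GLMM is meant to capture. The function name `cohens_kappa` and the example ratings are hypothetical and are not drawn from the dataset.

```python
def cohens_kappa(method_a, method_b):
    """Classical Cohen's kappa for two equal-length sequences of 0/1 ratings.

    Assumption: each position is one independent paired assessment; repeated
    measurements and rater effects are NOT modeled here.
    """
    if len(method_a) != len(method_b) or len(method_a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(method_a)

    # Observed proportion of assessments on which the two methods agree.
    p_observed = sum(a == b for a, b in zip(method_a, method_b)) / n

    # Marginal proportion of positive (1) results for each method.
    p_a = sum(method_a) / n
    p_b = sum(method_b) / n

    # Agreement expected by chance if the two methods were independent.
    p_chance = p_a * p_b + (1 - p_a) * (1 - p_b)

    if p_chance == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")

    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical paired results (1 = screen positive, 0 = screen negative).
ratings_a = [1, 1, 0, 0, 1, 0, 1, 0]
ratings_b = [1, 0, 0, 0, 1, 0, 1, 1]
kappa = cohens_kappa(ratings_a, ratings_b)
print(f"kappa = {kappa:.2f}")
```

In this made-up example the methods agree on 6 of 8 assessments while chance agreement is 0.5, so kappa is 0.5. With repeated measurements per patient or multiple raters, this simple estimate treats correlated observations as independent, which is the limitation the model-based approach described above is intended to address.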