Large data sets containing diagnoses of chronic conditions are becoming increasingly available for research purposes. In Germany, aggregated claims data including medical diagnoses from the statutory health insurance, which covers roughly 70 million insured persons, are planned to be published on a regular basis. The validity of the diagnoses in such large data sets is difficult to assess. If the data set includes prevalence, incidence and mortality, the proportion of false positive diagnoses can be estimated using mathematical relations from the illness-death model. We apply the method to age-specific aggregated claims data on type 2 diabetes from 70 million Germans, stratified by sex, and report the findings in terms of the ratio of false positive diagnoses of type 2 diabetes (FPR) in the data set. The FPR varies with age in both men and women. In men, the FPR increases linearly from 1 to 3 per mil between the ages of 30 and 50. For ages between 50 and 80 years, FPR
This entry is cross-posted from the SITN Flash, a bimonthly publication written and edited by Harvard graduate students. You can find my piece, as well as archives of previous articles written by many graduate students, at the Science in the News website.