PhD thesis: Performance indicators of mammography screening

    Publication: PhD thesis / Master's / Diploma | PhD thesis | Research | Peer-reviewed


    Although the benefits of mammography screening in reducing breast cancer mortality have been shown in both randomized controlled trials and observational studies, realizing the full potential of mammography in current clinical practice requires continuous monitoring and evaluation. To ensure that mammography screening organizations are as effective as expected, without unnecessary harms for the screened women, the European Guidelines for Quality Assurance in Breast Cancer Screening and Diagnosis specify performance indicators together with acceptable and desirable standards. Most of these performance indicators are so-called short-term indicators, which either provide an early indication of how a particular component of the screening organization is performing or serve as surrogate indicators for predicting the future long-term reduction in breast cancer mortality. Implementation of a screening organization is by definition a long-term process. Although performance indicators become available from the beginning, the evaluation of the organization's effect on mortality will, depending on age group, be available after 2-6 years, and an effect will only occur if acceptable levels of the more short-term indicators are met. Therefore, regular monitoring of short-term indicators is essential.

    The objective of this thesis was to evaluate performance indicators of mammography screening, and to discuss the lessons that can be learned from national evaluations of similar organized screening programs and from international evaluations of organized screening programs versus opportunistic screening. We used data from the two longest-running organized screening programs in Denmark (Copenhagen and Funen) and from the Breast Cancer Surveillance Consortium (BCSC) in the United States (US), covering the screening history of women aged 50-69 years. These populations were representative of Danish and American women in the targeted age group and of screening practice in Denmark and the US. Linking screening data with high-quality cancer registries provided us with almost complete cancer follow-up and offered an almost ideal setting for an evaluation of screening performance.

    The goal of reducing breast cancer in the population can only be reached if a high proportion of the target population attends screening. Therefore, the monitoring of recruitment and participation forms an essential part of the quality assurance of screening. The performance indicators for mammography screening specified in the European Guidelines include the coverage of the targeted population by invitation to screening, the coverage of the targeted population by screening examination, and the participation rate in screening among invited women. However, these are cross-sectional measures reflecting a single year or invitation round, and they therefore do not reflect the long-term acceptance of the screening program among the participating women. Our underlying hypothesis was that the cross-sectional measures overestimate the actual protection of the individual women. We therefore conducted a study to provide a more comprehensive assessment of program performance by supplementing the routine performance indicators with an estimate of the long-term acceptance of screening among the targeted women. The main findings were that the participation rate might be a poor indicator of coverage by examination, as participation rates were highly influenced by the invitation scheme and by technical administrative errors. Furthermore, the participation rate overestimated the level of protection provided to the individual women. It might therefore be valuable to include a new measure reflecting the long-term acceptance of screening.
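
    The relationship between the three cross-sectional indicators and a long-term measure can be illustrated with a minimal Python sketch. It assumes simplified definitions (invited or screened women divided by the target population, and screened women divided by invited women); the Guidelines' exact eligibility rules are not reproduced here. The function `full_attendance_share` is a hypothetical stand-in for a long-term acceptance measure, not the measure used in the thesis, and all numbers are invented for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class RoundCounts:
    """Counts for one invitation round (illustrative, not thesis data)."""
    target: int    # women in the target population
    invited: int   # women who were sent an invitation
    screened: int  # invited women who attended screening


def coverage_by_invitation(c: RoundCounts) -> float:
    # Share of the target population that received an invitation.
    return c.invited / c.target


def coverage_by_examination(c: RoundCounts) -> float:
    # Share of the target population that was actually screened.
    return c.screened / c.target


def participation_rate(c: RoundCounts) -> float:
    # Share of invited women who attended.
    return c.screened / c.invited


def full_attendance_share(histories: Sequence[Sequence[bool]]) -> float:
    """Share of invited women who attended every round they were invited to.

    Hypothetical long-term measure: each history holds one bool per
    invitation (True = attended). Women never invited are excluded.
    """
    invited = [h for h in histories if len(h) > 0]
    return sum(all(h) for h in invited) / len(invited)


if __name__ == "__main__":
    one_round = RoundCounts(target=1000, invited=800, screened=600)
    print(coverage_by_invitation(one_round))   # 0.8
    print(coverage_by_examination(one_round))  # 0.6
    print(participation_rate(one_round))       # 0.75

    # Four women, three invitation rounds each.
    histories = [
        [True, True, True],
        [True, False, True],
        [True, True, False],
        [False, False, False],
    ]
    print(full_attendance_share(histories))    # 0.25
```

    In this invented example the participation rate (0.75) exceeds the coverage by examination (0.6) because not every woman in the target population was invited, and per-round participation in the four histories (0.75, 0.5, 0.5) is higher than the share of women attending every round (0.25).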

    In contrast to the organized mammography screening in Denmark, mammography screening in the US is predominantly opportunistic. In the US, different recommendations concerning starting age, stopping age, and screening interval are issued by different organizations, e.g. the US Preventive Services Task Force (USPSTF) and the American Cancer Society (ACS), but the actual use of screening is largely determined by the women, their medical practitioners, and their access to health care. Previous comparative studies indicated that both recall rates and interval cancer rates were higher in the US than in Europe, suggesting both lower sensitivity and lower specificity. However, the results might be influenced by differences in procedures and definitions. We therefore conducted a study to compare the sensitivity and specificity of mammography screening between the US and Denmark. The main finding was that, taking time since last screen into account, American and Danish women had the same probability of having their asymptomatic breast tumors detected at screening. However, among the majority of women who were free of asymptomatic cancer, those in the US experienced more harm in terms of false-positive findings than those in Denmark.
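
    As a reminder of how the two accuracy measures are computed, the following Python sketch derives sensitivity and specificity from the four cells of a screening outcome table. It assumes the common convention that cancers diagnosed after a negative screen within the follow-up window (interval cancers) count as false negatives; the thesis's exact follow-up definitions are not reproduced, and the counts are invented.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of women with cancer whose screen was positive.

    true_pos:  screen-detected cancers
    false_neg: cancers diagnosed after a negative screen (assumed convention)
    """
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Share of cancer-free women whose screen was negative."""
    return true_neg / (true_neg + false_pos)


if __name__ == "__main__":
    # Invented counts for 10,000 screens; not data from the thesis.
    tp, fn, fp, tn = 40, 10, 500, 9450
    print(f"sensitivity = {sensitivity(tp, fn):.3f}")
    print(f"specificity = {specificity(tn, fp):.3f}")
```

    Two programs can share the same sensitivity while differing in specificity: holding `tp` and `fn` fixed and raising `fp` lowers only the second measure, which is the pattern the comparison above describes.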

    In the US, the reported cumulative false-positive risks of mammography screening range from 46% to 63% for annual screens, while studies from Europe report risks between 8% and 21% for biennial screens. These risks clearly indicate that American women experience more harm in terms of false-positive findings than European women. However, differences in screening interval, type of mammogram, and choice of statistical method might influence the estimates. We therefore conducted a study to evaluate to what extent screening interval, type of mammogram, and statistical method might explain the reported differences in false-positive risks. This study demonstrated that women entering mammography screening at age 50-69 years in the US had a substantially higher cumulative false-positive risk than women entering screening in Denmark. Furthermore, we found that neither screening interval, type of mammogram, nor statistical method influenced the false-positive risks.
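
    To show what a cumulative false-positive risk expresses, here is a deliberately simple Python sketch. It assumes that false-positive results in successive screens are independent, so the risk of at least one false positive is one minus the product of the per-screen probabilities of avoiding one. This is a textbook simplification and not necessarily any of the statistical methods compared in the thesis; the per-screen probabilities are invented.

```python
from math import prod
from typing import Sequence


def cumulative_false_positive_risk(per_screen_risks: Sequence[float]) -> float:
    """Probability of at least one false positive over a series of screens.

    Assumes independence between screens (a simplification):
        risk = 1 - prod(1 - p_i)
    """
    return 1.0 - prod(1.0 - p for p in per_screen_risks)


if __name__ == "__main__":
    # Invented inputs: 10 screens at 10% each versus 5 screens at 2% each.
    frequent = cumulative_false_positive_risk([0.10] * 10)
    sparse = cumulative_false_positive_risk([0.02] * 5)
    print(f"{frequent:.3f}")  # 0.651
    print(f"{sparse:.3f}")    # 0.096
```

    The sketch makes clear why the number of screens and the per-screen false-positive probability both enter the cumulative figure, which is why estimates based on different screening intervals or methods need care when compared.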

    The striking difference in false-positive tests between the US and Europe might be explained by, e.g., less frequent use of double reading with consensus, less frequent comparison with previous mammograms, and differences in guidelines for acceptable levels of mammograms judged as abnormal and in recommendations for required reading volume. However, it might also be influenced by the underlying medico-legal context in the US, which is characterized by more frequent lawsuits for missed cancers and may predispose radiologists to recall more women for further assessment. It is unknown how much these factors contribute to the higher false-positive rates.

    This thesis strongly suggests that conclusions drawn from comparisons of performance indicators are not straightforward, even when comparing within a small homogeneous country like Denmark. All evaluations of performance should be based on standard definitions and undertaken with caution regarding the influence of contextual conditions.
    Status: Published - 2015