• Research article
  • Open access
  • Published: 14 May 2020

Application of the matched nested case-control design to the secondary analysis of trial data

  • Christopher Partlett   ORCID: orcid.org/0000-0001-5139-3412 1 , 2 ,
  • Nigel J. Hall 3 ,
  • Alison Leaf 4 , 2 ,
  • Edmund Juszczak 2 &
  • Louise Linsell 2  

BMC Medical Research Methodology volume 20, Article number: 117 (2020)


A nested case-control study is an efficient design that can be embedded within an existing cohort study or randomised trial. It has a number of advantages compared to the conventional case-control design, and has the potential to answer important research questions using untapped prospectively collected data.

We demonstrate the utility of the matched nested case-control design by applying it to a secondary analysis of the Abnormal Doppler Enteral Prescription Trial. We investigated the role of milk feed type and changes in milk feed type in the development of necrotising enterocolitis in a group of 398 high risk growth-restricted preterm infants.

Using matching, we were able to generate a comparable sample of controls selected from the same population as the cases. In contrast to the standard case-control design, exposure status was ascertained prior to the outcome event occurring and the comparison between the cases and matched controls could be made at the point at which the event occurred. This enabled us to reliably investigate the temporal relationship between feed type and necrotising enterocolitis.

Conclusions

A matched nested case-control study can be used to identify credible associations in a secondary analysis of clinical trial data where the exposure of interest was not randomised, and has several advantages over a standard case-control design. This method offers the potential to make reliable inferences in scenarios where it would be unethical or impractical to perform a randomised clinical trial.


Key messages

A matched nested case-control design provides an efficient way to investigate causal relationships using untapped data from prospective cohort studies and randomised controlled trials

This method has several advantages over a standard case-control design, particularly when studying time-dependent exposures on rare outcomes

It offers the potential to make reliable inferences in scenarios where ethical or practical issues preclude the use of a randomised controlled trial

Randomised controlled trials (RCTs) are regarded as the gold standard in evidence based medicine, due to their prospective design and the minimisation of important sources of bias through the use of randomisation, allocation concealment and blinding. However, RCTs are not always appropriate due to ethical or practical issues, particularly when investigating risk factors for an outcome. If beliefs about the causal role of a risk factor are already embedded within a clinical community, based on concrete evidence or otherwise, then it is not possible to conduct an RCT due to lack of equipoise. It is often not feasible to randomise potential risk factors, for example, if they are biological or genetic or if there is a strong element of patient preference involved. In such scenarios, the main alternative is to conduct an observational study; either a prospective cohort study which can be complicated and costly, or a retrospective case-control study with methodological shortcomings.

The nested case-control study design employs case-control methodology within an established prospective cohort study [ 1 ]. It first emerged in the 1970–80s and was typically used when it was expensive or difficult to obtain data on a particular exposure for all members of the cohort; instead a subset of controls would be selected at random [ 2 ]. This method with the use of matching has been shown to be an efficient design that can be used to provide unbiased estimates of relative risk with considerable cost savings [ 3 , 4 , 5 ]. Cases who develop the outcome of interest at a given point in time are matched to a random subset of members of the cohort who have not experienced the outcome at that time. These controls may develop the outcome later and become a case themselves, and they may also act as a control for other cases [ 6 , 7 ]. This approach has a number of advantages compared to the standard case-control design: (1) cases and controls are sampled from the same population, (2) exposures are measured prior to the outcome occurring, and (3) cases can be matched to controls at the time (e.g. age) of the outcome event.

More recently, the nested case-control design has been used within RCTs to investigate the causative role of risk factors in the development of trial outcomes [ 8 , 9 , 10 ]. In this paper we investigate the utility of the matched nested case-control design in a secondary analysis of the ADEPT: Abnormal Doppler Enteral Prescription Trial (ISRCTN87351483) data, to investigate the role of different types of milk feed (and changes in types of milk feed) in the development of necrotising enterocolitis. We illustrate the use of this methodology and explore issues relating to its implementation. We also discuss and appraise the value of this methodology in answering similar challenging research questions using clinical trial data more generally.

ADEPT: Abnormal Doppler Enteral Prescription Trial (ISRCTN87351483) was funded by Action Medical Research (SP4006) and investigated whether early (24–48 h after birth) or late (120–144 h after birth) introduction of milk feeds was a risk factor for necrotising enterocolitis (NEC) in a population of 404 infants born preterm and growth-restricted, following abnormal antenatal Doppler blood flow velocities [ 11 ]. Consent and randomisation occurred in the first 2 days after birth. No difference was found in the incidence of NEC between the two groups; however, there was interest in the association between feed type (formula/fortifier or exclusive mother/donor breast milk) and the development of NEC. Breast milk is one of the few factors believed to reduce the risk of NEC that has been widely adopted into clinical practice, despite a paucity of high quality population based data [ 12 , 13 ]. However, due to lack of equipoise it would not be ethical or feasible to conduct a trial randomising newborn infants to formula or breast milk.

With additional funding from Action Medical Research (GN2506), the authors used a matched nested case-control design to investigate the association between feed type and the development of severe NEC, defined as Bell’s staging Stage II or III [ 14 ], using detailed daily feed log data from the ADEPT trial. The feed type and quantity of feed were recorded daily until an infant had reached full feeds and had ceased parenteral nutrition, or until 28 days after birth, whichever was longer. Using this information, infants were classified according to the following predefined exposures:

Exposure to formula milk or fortifier in the first 14 days of life

Exposure to formula milk or fortifier in the first 28 days of life

Any prior exposure to formula milk or fortifier

Change in feed type (between formula, fortifier or breast milk) within the previous 7 days.
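The four exposure definitions above can be sketched in code. The following is a minimal illustration, not the authors' implementation: the feed-log layout (a mapping from day of life to the feed types recorded that day), the feed-type labels, the convention that only days strictly before the index day count, and the definition of a "change" as a difference between consecutive logged days are all assumptions made for the example.

```python
from typing import Dict, List

# Feed types counted as "non-breast milk"; the labels are hypothetical.
NON_BREAST_MILK = {"formula", "fortifier"}


def classify_exposures(feed_log: Dict[int, List[str]], index_day: int) -> Dict[str, bool]:
    """Classify exposures using only feed-log days strictly before index_day.

    feed_log maps day of life (an assumed convention) to the feed types
    recorded on that day. index_day is the day the matched case developed
    the outcome, so later feeds are ignored.
    """
    prior_days = sorted(day for day in feed_log if day < index_day)

    def exposed(days: List[int]) -> bool:
        # True if any of the given days includes formula or fortifier.
        return any(NON_BREAST_MILK & set(feed_log[day]) for day in days)

    # Logged days falling within the 7 days before the index day.
    window = [day for day in prior_days if day >= index_day - 7]
    # Assumed definition of a change: the set of feed types differs
    # between two consecutive logged days inside the window.
    changed = any(
        set(feed_log[a]) != set(feed_log[b]) for a, b in zip(window, window[1:])
    )

    return {
        "non_breast_first_14_days": exposed([d for d in prior_days if d <= 14]),
        "non_breast_first_28_days": exposed([d for d in prior_days if d <= 28]),
        "any_prior_non_breast": exposed(prior_days),
        "feed_change_previous_7_days": changed,
    }
```

Evaluating the same infant at two different index days shows why the exposures are time-dependent: an infant first given formula on day 3 is unexposed when used as a control for a case occurring on day 3, but exposed for a case occurring on day 5.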

In the remainder of the methods section we discuss the challenges of conducting this analysis and practical issues encountered in applying the matched nested case-control methodology. In the results section we present data from different aspects of the analysis, to illustrate the utility of this approach in answering the research question.

Cohort time axis

For the main trial analysis, time of randomisation was defined as time zero, which is the conventional approach given that events occurring prior to randomisation cannot be influenced by the intervention under investigation. However, for the nested case-control analysis, time zero was defined as day of delivery because age in days was considered easier to interpret, and also it was possible for an outcome event to occur prior to randomisation. Infants were followed up until their exit time, which was defined by the first occurrence of NEC, death or the last daily feed log record.

Case definition

An infant was defined as a case at their first recorded incidence of severe NEC, defined as Bell’s staging Stage II or III [ 14 ]. Infants could only be included as a case once; subsequent episodes of NEC in the same infant were not counted. Once an infant had been identified as a case, they could not be included in any future risk sets for other cases, even if the NEC episode had been resolved.

Risk set definition

One of the major challenges was identifying an appropriate risk set from which controls could be sampled, whilst also allowing the analysis to incorporate the time-dependent feed log data and adjust for known confounders. A diagnosis of NEC has a crucial impact on the subsequent feeding of an infant; it was therefore essential that the analysis only included exposure to non-breast milk feeds prior to the onset of NEC. A standard case-control analysis would have produced misleading results in this context, as infants would have been defined as cases if they had experienced NEC prior to the end of the study period, regardless of the timing of the event in relation to exposure to non-breast milk. Using a matched nested case-control design allowed us to match an infant with a diagnosis of NEC (case) at a given point in time (days from delivery) to infants with similar characteristics (with respect to other important confounding factors) who had not experienced NEC at the failure time of the case. Figure 1 is a schematic diagram of this process. Each time an outcome event occurred (case), infants that were still at risk were eligible to be selected as a control (risk set). A matching algorithm was used to select a sample of controls with similar characteristics from this risk set. Infants selected as controls could go on to become a case themselves, and could also be included in the risk sets for other cases.

Figure 1. Schematic diagram illustrating the selection of controls from each risk set. Three days following delivery, an infant develops NEC. At this point, there are 11 infants left in the risk set. Four controls with the closest matching are selected, including one infant that becomes a future case on day 18
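The risk set construction described in this section can be sketched as follows. This is an illustrative outline under stated assumptions rather than the code used in the study: the `Infant` record, its field names, and the handling of ties (infants exiting on the same day as the case are treated as no longer at risk) are hypothetical choices.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Infant:
    infant_id: str
    exit_day: int   # day of first severe NEC, death, or last feed log record
    is_case: bool   # True if follow-up ended because of severe NEC


def build_risk_sets(cohort: List[Infant]) -> Dict[str, List[str]]:
    """Map each case to the ids of infants still at risk at its failure time.

    The case itself is excluded. Infants who become cases later remain
    eligible, while earlier cases and infants whose follow-up has already
    ended are not.
    """
    risk_sets: Dict[str, List[str]] = {}
    for case in cohort:
        if not case.is_case:
            continue
        risk_sets[case.infant_id] = [
            other.infant_id
            for other in cohort
            if other.infant_id != case.infant_id
            and other.exit_day > case.exit_day  # assumed tie rule: strictly later
        ]
    return risk_sets
```

With this construction a future case appears in the risk sets of earlier cases, and the same infant can appear in several risk sets, which matches the behaviour described in the text.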

Selection of matching factors

An important consideration was the appropriate selection of matching factors as well as identifying the optimum mechanism for matching. Sex, gestational age and birth weight were considered to be clear candidates for matching factors, as they are all associated with the development of NEC. Gestational age and birth weight in particular are both likely to impact the infant’s feeding and thus their exposure to non-breast milk feeds. Gestational age and birth weight were matched simultaneously because of the strong collinearity between them, illustrated in Fig. 2. This was achieved by minimising the Mahalanobis distance from the case to prospective controls of the same sex [ 15 ], that is, by selecting the controls closest in gestational age and birth weight to the case while taking into account the correlation between these characteristics.

Figure 2. Scatterplot of birth weight versus gestational age for infants with NEC (cases) and those without (controls)
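To make the matching step concrete, the sketch below selects the closest same-sex controls by squared Mahalanobis distance on gestational age and birth weight. It is a minimal pure-Python illustration: the `Infant` record, the function names, and the choice to estimate the covariance from whichever group of infants is passed in are assumptions, since the paper does not specify these details.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Infant:
    infant_id: str
    sex: str
    gestational_age: float  # weeks
    birth_weight: float     # grams


# (variance of gestational age, variance of birth weight, their covariance)
Covariance = Tuple[float, float, float]


def sample_covariance(infants: List[Infant]) -> Covariance:
    """Sample variances and covariance of gestational age and birth weight."""
    n = len(infants)
    mean_ga = sum(i.gestational_age for i in infants) / n
    mean_bw = sum(i.birth_weight for i in infants) / n
    var_ga = sum((i.gestational_age - mean_ga) ** 2 for i in infants) / (n - 1)
    var_bw = sum((i.birth_weight - mean_bw) ** 2 for i in infants) / (n - 1)
    cov = sum(
        (i.gestational_age - mean_ga) * (i.birth_weight - mean_bw) for i in infants
    ) / (n - 1)
    return var_ga, var_bw, cov


def mahalanobis_squared(a: Infant, b: Infant, cov: Covariance) -> float:
    """Squared Mahalanobis distance, using the closed-form 2x2 inverse."""
    var_ga, var_bw, cov_gb = cov
    det = var_ga * var_bw - cov_gb ** 2
    d_ga = a.gestational_age - b.gestational_age
    d_bw = a.birth_weight - b.birth_weight
    return (var_bw * d_ga ** 2 - 2 * cov_gb * d_ga * d_bw + var_ga * d_bw ** 2) / det


def closest_controls(case: Infant, risk_set: List[Infant], cov: Covariance, k: int = 4) -> List[Infant]:
    """Return the k same-sex infants in the risk set nearest to the case."""
    same_sex = [c for c in risk_set if c.sex == case.sex]
    return sorted(same_sex, key=lambda c: mahalanobis_squared(case, c, cov))[:k]
```

Because the two characteristics are positively correlated, a candidate who is both slightly older and slightly heavier than the case is treated as closer than one who is older but lighter, even when the raw differences have the same size. A plain Euclidean distance would not make that distinction.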

Typically, treatment allocation would be incorporated as a matching factor since in a secondary analysis it is a nuisance factor imposed by the trial design, which should be accounted for. However, in this example, the ADEPT allocation is associated with likelihood of exposure, since it directly influences the feeding regime. For example, an infant randomised to receive early introduction of feeds is more likely to be exposed to non-breast milk feeds in the first 14 days (44%) than an infant randomised to late introduction of feeds (23%). The main trial results also demonstrated no evidence of association with the outcome (NEC) and therefore there was a concern about the potential for overmatching. Overmatching is caused by inappropriate selection of matching factors (i.e. factors which are not associated with the outcome of interest), which may harm the statistical efficiency of the analysis [ 16 ]. Therefore, we did not include the ADEPT allocation as a matching factor, but conducted both an unadjusted analysis and an analysis adjusted for trial arm, to examine its impact on the results.

Selection of controls

Another important consideration was the method used to randomly select controls from each risk set for each case. This can be performed with or without replacement and including or excluding the case in the risk set. We chose the recommended option of sampling without replacement and excluding the case from the risk set, which produces the optimal unbiased estimate of relative risk, with greater statistical efficiency [ 17 , 18 ]. However, infants could be included in multiple risk sets and be selected more than once as a control. We also included future cases of NEC as controls in earlier risk sets, as their exclusion can also lead to biased estimates of relative risk [ 19 ].

Number of controls

In standard case-control studies it has been shown that there is little statistical efficiency gained from having more than four matched controls relative to each case [ 20 , 21 ]. Using five controls is only 4% more efficient than using four; therefore, there is no added benefit in using additional controls if a cost is attached, for example taking extra biological samples in a prospective cohort setting. However, gains in statistical efficiency are possible by using more than four controls if the probability of exposure among controls is low (< 0.1) [ 4 , 5 ]. Neither of these was an issue for this particular analysis, as there were no additional costs involved in using more controls and prevalence of the defined exposures to non-breast milk was over 20% among infants without a diagnosis of NEC. However, there was a concern that including additional controls with increasing distance from the gestational age and birth weight of the case may undermine the matching algorithm. Also, increasing the number of controls sampled per case would lead to an increase in repeated sampling, resulting in a larger number of duplicates in the overall matched control population. This was a particular concern as control duplication was most likely to occur for infants with the lowest birth weights and gestational ages, for whom there is a much smaller pool of control infants to sample from. This would have resulted in a small number of infants (with low birth weight and gestational age) being sampled multiple times and having disproportionate weighting in the matched control sample. Therefore, we limited the number of matched controls to four per case.
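The 4% figure can be reproduced with a commonly used approximation, usually attributed to the work cited as [ 20 ], in which the efficiency of m matched controls per case relative to an unlimited number of controls is m / (m + 1) for associations near the null. The short sketch below applies that approximation; the function name is hypothetical and the formula is an assumption about how the quoted figure was obtained, not something stated in the paper.

```python
from fractions import Fraction


def relative_efficiency(m: int) -> Fraction:
    """Approximate efficiency of m controls per case versus unlimited controls."""
    return Fraction(m, m + 1)


# Efficiency of five controls relative to four: (5/6) / (4/5) = 25/24.
gain_five_vs_four = relative_efficiency(5) / relative_efficiency(4)
percent_gain = float(gain_five_vs_four - 1) * 100  # roughly 4.2

# The same ratio for successive numbers of controls, showing diminishing returns.
successive_gains = {
    m: float(relative_efficiency(m + 1) / relative_efficiency(m) - 1) * 100
    for m in range(1, 6)
}
```

Under this approximation, moving from one control to two gives a gain of about 33%, while moving from four to five gives only about 4%, which is the diminishing return the paragraph describes.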

Statistical analysis

The baseline characteristics of infants with NEC, the matched control group, and all infants with no diagnosis of NEC (non-cases) were compared. Numbers (with percentages) were presented for binary and categorical variables, and means (and standard deviations) or medians (with interquartile range and/or range) for continuous variables. Cases were matched to four controls with the same sex and smallest Mahalanobis distance based on gestational age and birth weight. Conditional logistic regression was used to calculate the odds ratio of developing NEC for cases compared with matched controls for each predefined exposure, with 95% confidence intervals. Unadjusted odds ratios were calculated, along with estimates adjusting for ADEPT allocation.
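For readers unfamiliar with conditional logistic regression, the sketch below fits the unadjusted model for a single binary exposure by maximising the conditional likelihood, in which each matched set contributes exp(βx_case) divided by the sum of exp(βx_j) over all members of the set. It is a teaching sketch under simplifying assumptions (one covariate, no adjustment for trial arm, Newton-Raphson with no safeguards against separation) and is not the software used in the study; in practice an established statistical package would be used.

```python
import math
from typing import List, Tuple

# One matched set: (exposure of the case, exposures of its matched controls).
MatchedSet = Tuple[int, List[int]]


def _score_and_information(beta: float, sets: List[MatchedSet]) -> Tuple[float, float]:
    """First derivative and negative second derivative of the log-likelihood."""
    score, info = 0.0, 0.0
    for case_x, control_xs in sets:
        xs = [case_x] + list(control_xs)
        weights = [math.exp(beta * x) for x in xs]
        total = sum(weights)
        mean = sum(x * w for x, w in zip(xs, weights)) / total
        mean_sq = sum(x * x * w for x, w in zip(xs, weights)) / total
        score += case_x - mean          # observed minus expected case exposure
        info += mean_sq - mean ** 2     # within-set variance of exposure
    return score, info


def fit_conditional_logit(sets: List[MatchedSet], max_iter: int = 50, tol: float = 1e-10) -> Tuple[float, float]:
    """Return the log odds ratio and its standard error."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = _score_and_information(beta, sets)
        if info <= 0.0:
            raise ValueError("no matched set is discordant for the exposure")
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    _, info = _score_and_information(beta, sets)
    return beta, 1.0 / math.sqrt(info)


def odds_ratio_with_ci(sets: List[MatchedSet]) -> Tuple[float, float, float]:
    """Odds ratio with a Wald 95% confidence interval."""
    beta, se = fit_conditional_logit(sets)
    return math.exp(beta), math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se)
```

Matched sets in which the case and all of its controls share the same exposure status contribute nothing to the estimate, which is one reason why the number of informative sets, rather than the total number of cases, drives the precision of the odds ratio.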

The results of the full analysis, including the application of this method to explore the relationship between feed type and other clinically relevant outcomes, are reported in a separate clinical paper (in preparation). Of the 404 infants randomised to ADEPT, 398 were included in this analysis (1 infant was randomised in error, 1 set of parents withdrew consent, 3 infants had no daily feed log data and for 1 infant the severity of NEC was unknown). There were 35 cases of severe NEC and 363 infants without a diagnosis of severe NEC (non-cases). Of the 140 matched controls randomly sampled from the risk set, 109 were unique, 31 were sampled more than once, and 8 had a subsequent diagnosis of severe NEC.

The baseline characteristics of infants with severe NEC (cases) and their matched controls are shown in Table 1, alongside the characteristics of infants without a diagnosis of severe NEC (non-cases). The matching algorithm successfully produced a well matched collection of controls, based on the majority of these characteristics. There was, however, a slightly higher proportion of infants with the lowest birthweights (< 750 g) among the cases compared to the matched controls (49% vs 38%). The only other factors to show a noticeable difference between cases and matched controls are maternal hypertension (37% vs 49%) and ventilation at trial entry (6% vs 21%), neither of which has been previously identified as a risk factor for NEC. Figure 3 shows scatter plots of birth weight and gestational age for the 35 individual cases of NEC and their matched controls, which provides a visual representation of the matching.

Figure 3. Scatterplots showing the matched cases and controls for each case of severe NEC. Each panel contains a separate case of NEC and the matched controls

The main results of the adjusted analysis are presented in Fig. 4. Unadjusted analyses are included in Table A1 in the supplementary material, alongside a post-hoc sensitivity analysis that additionally includes covariate adjustment for gestational age and birthweight. While the study did not identify any significant trends between feed type and severe NEC, the findings were consistent with the a priori hypothesis that exposure to non-breast milk feeds is associated with an increased risk of NEC. In addition, the study identified some potential trends in the association of feed type with other important outcomes, worthy of further investigation.

Figure 4. Forest plot showing the adjusted odds ratio comparing severe NEC to exposures. Odds ratios are adjusted for sex, gestational age and birthweight (via matching) and trial arm (via covariate adjustment). a Odds ratio and 95% confidence interval. b 109 unique controls

Employing a matched nested case-control design for this secondary analysis of clinical trial data overcame many of the limitations of a standard case-control analysis. We were able to select controls from the same population as the cases thus avoiding selection bias. Using matching, we were able to create a comparable sample of controls with respect to important clinical characteristics and confounding factors. This method allowed us to reliably investigate the temporal relationship between feed type and severe NEC since the exposure data was collected prospectively prior to the outcome occurring. We were also able to successfully investigate the relationship between feed type and several other important outcomes such as sepsis. A standard case-control analysis is typically based on recall or retrospective data collection once the outcome is known, which can introduce recall bias. If we had performed a simple comparison between cases and non-cases of NEC without taking into account the timing of the exposure, this would have produced misleading results. Another advantage of the matched nested case-control design was that we were able to match cases to controls at the time of the outcome event so that they were of comparable ages. The methodology is especially powerful when the timing of the exposure is of importance, particularly for time-dependent exposures such as the one studied here.

While the efficient use of existing trial data has a number of benefits, there are of course disadvantages to using data that were collected for another primary purpose. For instance, it is possible that such data are less robustly collected and checked. As a result, researchers may be more likely to encounter participants with either invalid or missing data.

For instance, some of the additional feed log data collected in ADEPT were never intended to be used to answer clinical research questions; rather, their purpose was to monitor the adherence of participants to the intervention or provide added background information. In this study, it was necessary to make assumptions about missing data to fill small gaps in the daily feed logs. Researchers should take care that such assumptions are fully documented in the statistical analysis plan in advance and determined blinded to the outcome. Another option is to plan these sub-studies at the design phase; however, there needs to be a balance between the potential burden of additional data collection and having a streamlined trial that is able to answer the primary research question.

Another limitation of the methodology is that it is only possible to match on known confounders. This is in contrast to a randomised controlled trial, in which it is possible to balance on unknown and unmeasured baseline characteristics. As a consequence, particular care must be given to select important matching factors, but also to avoid overmatching.

The methodology allows for participants to be selected as controls multiple times, so there is the possibility that systematic duplication of a specific subset of participants (e.g. infants with a lower birthweight and smaller gestational age) could lead to a small number of participants disproportionately influencing the results. Within this study, we conducted sensitivity analyses with fewer controls, and were able to demonstrate that this had a minimal impact on the findings.
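One simple way to monitor this kind of duplication is to tally how often each infant is selected as a control across all matched sets. The sketch below assumes the matching output is held as a mapping from case identifier to the list of selected control identifiers; that layout and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def control_reuse(matched: Dict[str, List[str]]) -> Counter:
    """Count how many times each infant was selected as a control."""
    return Counter(cid for controls in matched.values() for cid in controls)


def summarise_reuse(matched: Dict[str, List[str]]) -> Dict[str, int]:
    """Summarise duplication in the matched control sample."""
    counts = control_reuse(matched)
    total = sum(counts.values())
    unique = len(counts)
    return {
        "total_selections": total,
        "unique_controls": unique,
        "repeat_selections": total - unique,
        "max_uses_of_one_control": max(counts.values(), default=0),
    }
```

A summary of this kind, repeated for different numbers of controls per case, shows directly how much of the control sample is made up of repeated selections and whether a few infants dominate it.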

We have demonstrated how a matched nested case-control design can be embedded within an RCT to identify credible associations in a secondary analysis of clinical trial data where the exposure of interest was not randomised. We planned this study after the clinical trial data had already been collected, but it could have been built in seamlessly as a SWAT (Study Within A Trial) during the trial design phase, to ensure that all relevant data were collected in advance with minimal effort. This method has several advantages over a standard case-control design and offers the potential to make reliable inferences in scenarios where ethical or practical issues preclude the use of an RCT. Moreover, because of the flexibility of the methodology in terms of the design and analysis, the matched nested case-control design could reasonably be applied to a wide range of challenging research questions. There is an abundance of high quality large prospective studies and clinical trials with well characterised cohorts, in which this methodology could be applied to investigate causal relationships, adding considerable value for money to the original studies.

Availability of data and materials

ADEPT trial data are available upon reasonable request, subject to the NPEU Data Sharing Policy.

Abbreviations

ADEPT: Abnormal Doppler Enteral Prescription Trial

RCT: Randomised controlled trial

NEC: Necrotising enterocolitis

CPAP: Continuous positive airway pressure

UAC: Umbilical artery catheter

UVC: Umbilical venous catheter

SWAT: Study within a trial

References

1. Breslow N. Design and analysis of case-control studies. Annu Rev Public Health. 1982;3(1):29–54.

2. Mantel N. Synthetic retrospective studies and related topics. Biometrics. 1973;29(3):479–86.

3. Breslow NE. Statistics in epidemiology: the case-control study. J Am Stat Assoc. 1996;91(433):14–28.

4. Breslow NE, Lubin J, Marek P, Langholz B. Multiplicative models and cohort analysis. J Am Stat Assoc. 1983;78(381):1–12.

5. Goldstein L, Langholz B. Asymptotic theory for nested case-control sampling in the cox regression model. Ann Stat. 1992;20(4):1903–28.

6. Ernster VL. Nested Case-Control Studies. Prev Med. 1994;23(5):587–90. https://doi.org/10.1006/pmed.1994.1093.

7. Essebag V, Genest J Jr, Suissa S, Pilote L. The nested case-control study in cardiology. Am Heart J. 2003;146(4):581–90. https://doi.org/10.1016/S0002-8703(03)00512-X.

8. Nieuwlaat R, Connolly BJ, Hubers LM, Cuddy SM, Eikelboom JW, Yusuf S, et al. Quality of individual INR control and the risk of stroke and bleeding events in atrial fibrillation patients: a nested case control analysis of the ACTIVE W study. Thromb Res. 2012;129(6):715–9.

9. Fox GJ, Nhung NV, Loi NT, Sy DN, Britton WJ, Marks GB. Barriers to adherence with tuberculosis contact investigation in six provinces of Vietnam: a nested case–control study. BMC Infect Dis. 2015;15(1):103.

10. Mattson CL, Bailey RC, Agot K, Ndinya-Achola J, Moses S. A nested case-control study of sexual practices and risk factors for prevalent HIV-1 infection among young men in Kisumu, Kenya. Sex Transm Dis. 2007;34(10):731.

11. Leaf A, Dorling J, Kempley S, McCormick K, Mannix P, Linsell L, et al. Early or delayed enteral feeding for preterm growth-restricted infants: a randomized trial. Pediatrics. 2012;129(5):e1260–e8. https://doi.org/10.1542/peds.2011-2379.

12. Lucas A, Cole TJ. Breast milk and neonatal necrotising enterocolitis. Lancet. 1990;336(8730):1519–23.

13. McGuire W, Anthony MY. Donor human milk versus formula for preventing necrotising enterocolitis in preterm infants: systematic review. Arch Dis Child Fetal Neonatal Ed. 2003;88(1):F11–F4.

14. Walsh MC, Kliegman RM. Necrotizing enterocolitis: treatment based on staging criteria. Pediatr Clin N Am. 1986;33:179–201.

15. Mahalanobis PC. On the Generalized Distance in Statistics. Proceedings of the National Institute of Science of India. 1936;2:49–55.

16. Brookmeyer R, Liang K, Linet M. Matched case-control designs and overmatched analyses. Am J Epidemiol. 1986;124(4):693–701.

17. Lubin JH. Extensions of analytic methods for nested and population-based incident case-control studies. J Chronic Dis. 1986;39(5):379–88.

18. Robins JM, Gail MH, Lubin JH. More on "Biased selection of controls for case-control analyses of cohort studies". Biometrics. 1986;42(2):293–9.

19. Lubin JH, Gail MH. Biased selection of controls for case-control analyses of cohort studies. Biometrics. 1984;40(1):63–75.

20. Ury HK. Efficiency of case-control studies with multiple controls per case: continuous or dichotomous data. Biometrics. 1975;31(3):643–9.

21. Gail M, Williams R, Byar DP, Brown C. How many controls? J Chronic Dis. 1976;29(11):723–31.

22. Meeting abstracts from the 5th International Clinical Trials Methodology Conference (ICTMC 2019). Trials. 2019;20(Suppl 1):579. Brighton, UK. 06–09 October 2019. https://doi.org/10.1186/s13063-019-3688-6.

Acknowledgements

This work was presented at the International Clinical Trials Methodology Conference (ICTMC) in 2019 and the abstract is published within Trials [ 22 ].

This work was supported by Action Medical Research [Grant number GN2506]. The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Author information

Authors and affiliations.

Nottingham Clinical Trials Unit, University of Nottingham, Nottingham, UK

Christopher Partlett

National Perinatal Epidemiology Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK

Christopher Partlett, Alison Leaf, Edmund Juszczak & Louise Linsell

University Surgery Unit, Faculty of Medicine, University of Southampton, Southampton, UK

Nigel J. Hall

Department of Child Health, Faculty of Medicine, University of Southampton, Southampton, UK

Alison Leaf


Contributions

NH, AL, EJ and LL conceived the project. CP performed the statistical analyses under the supervision of LL and EJ. CP and LL drafted the manuscript and EJ, AL and NH critically reviewed it. All authors were involved in the interpretation of results. The author(s) read and approved the final manuscript.

Corresponding author

Correspondence to Christopher Partlett .

Ethics declarations

Ethics approval and consent to participate.

No ethical approval was required for this study, since it used only previously collected, fully anonymised research data.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interest.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Additional file 1.

Table A1 Association between exposures and the development of severe NEC. Each case is matched to 4 controls with the same sex and the smallest Mahalanobis distance based on gestational age and birthweight.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article.

Partlett, C., Hall, N.J., Leaf, A. et al. Application of the matched nested case-control design to the secondary analysis of trial data. BMC Med Res Methodol 20 , 117 (2020). https://doi.org/10.1186/s12874-020-01007-w


Received : 03 December 2019

Accepted : 05 May 2020

Published : 14 May 2020

DOI : https://doi.org/10.1186/s12874-020-01007-w


  • Preterm infants
  • Neonatology
  • Statistical methods
  • Nested case-control

BMC Medical Research Methodology

ISSN: 1471-2288

Interpreting Epidemiologic Evidence: Connecting Research to Applications (2nd edn)

8 Selection Bias in Case-Control Studies

  • Published: August 2016

In cohort studies, sampling of study participants is independent of the outcome. In contrast, in case-control studies participants are sampled at different rates depending on whether or not they develop the outcome of interest: typically all cases and a small sample of eligible controls are recruited. Controls in a case-control study are used to estimate the distribution of exposure and confounders in the source population from which the cases are drawn. Thus, the challenge in case-control studies is to generate a sample of controls that represents the population experience that generated the cases, i.e., selecting from those who would have become identified cases in the study had they developed the disease of interest. Selection bias can be introduced when the chosen controls deviate from this ideal through a lack of correspondence between the source of cases and selected controls with respect to calendar time, health care seeking behavior, or other attributes. Tools for evaluating the potential for selection bias in case-control studies include comparing measured exposure prevalence among controls to an external population and determining whether the exposure among controls follows expected patterns, examining exposure-disease associations in relation to markers of susceptibility to bias, adjusting for markers of selection, and evaluating whether expected associations between exposure and disease can be confirmed.
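The requirement that controls reflect the exposure distribution of the source population can be illustrated with a small worked calculation. The sketch below is not taken from the chapter; all counts and sampling fractions are hypothetical, and the helper names `odds_ratio` and `sample_controls` are introduced only for illustration. It compares the odds ratio in the full source population with the odds ratio obtained when controls are sampled independently of exposure and when exposed non-cases are over-sampled.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Exposure odds ratio from a 2x2 table of cases and controls."""
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)


def sample_controls(exposed_noncases, unexposed_noncases, frac_exposed, frac_unexposed):
    """Expected numbers of controls when non-cases are sampled at the given fractions."""
    return exposed_noncases * frac_exposed, unexposed_noncases * frac_unexposed


# Hypothetical source population (assumed numbers, for illustration only).
EXPOSED_CASES, UNEXPOSED_CASES = 100, 200
EXPOSED_NONCASES, UNEXPOSED_NONCASES = 10_000, 40_000

# Odds ratio in the full source population.
source_or = odds_ratio(EXPOSED_CASES, UNEXPOSED_CASES,
                       EXPOSED_NONCASES, UNEXPOSED_NONCASES)

# Controls sampled at the same rate (2%) regardless of exposure.
fair_exp, fair_unexp = sample_controls(EXPOSED_NONCASES, UNEXPOSED_NONCASES, 0.02, 0.02)
fair_or = odds_ratio(EXPOSED_CASES, UNEXPOSED_CASES, fair_exp, fair_unexp)

# Exposed non-cases twice as likely to be selected as unexposed non-cases.
biased_exp, biased_unexp = sample_controls(EXPOSED_NONCASES, UNEXPOSED_NONCASES, 0.04, 0.02)
biased_or = odds_ratio(EXPOSED_CASES, UNEXPOSED_CASES, biased_exp, biased_unexp)

print(f"source population OR: {source_or:.2f}")
print(f"exposure-independent sampling OR: {fair_or:.2f}")
print(f"exposure-dependent sampling OR: {biased_or:.2f}")
```

With these assumed numbers, sampling that ignores exposure reproduces the source odds ratio, whereas selecting exposed non-cases at twice the rate halves it.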

Combining high quality data with rigorous methods: emulation of a target trial using electronic health records and a nested case-control design

  • Bahareh Rasouli , postdoctoral researcher 1 2 ,
  • Jessica Chubak , senior investigator and affiliate professor 3 4 ,
  • James S Floyd , associate professor 5 ,
  • Bruce M Psaty , professor 5 6 ,
  • Matthew Nguyen , data consultant 3 ,
  • Rod L Walker , collaborative biostatistician 3 ,
  • Kerri L Wiggins , research scientist 7 ,
  • Roger W Logan , senior research scientist 8 ,
  • Goodarz Danaei , professor 2 8 9
  • 1 Institute of Environmental Medicine, Karolinska Institutet, Stockholm, Sweden
  • 2 Department of Global Health and Population, Harvard TH Chan School of Public Health, Boston, MA 02115, USA
  • 3 Kaiser Permanente Washington Health Research Institute, Seattle, WA, USA
  • 4 Department of Epidemiology, University of Washington, Seattle, WA, USA
  • 5 Cardiovascular Health Research Unit, Departments of Medicine and Epidemiology, University of Washington, Seattle, WA, USA
  • 6 Department of Health Systems and Population Health, University of Washington, Seattle, WA, USA
  • 7 Department of Medicine, University of Washington, Seattle, WA, USA
  • 8 Department of Epidemiology, Harvard TH Chan School of Public Health, Boston, MA, USA
  • 9 CAUSALab, Harvard TH Chan School of Public Health, Boston, MA, USA
  • Correspondence to: G Danaei gdanaei{at}hsph.harvard.edu
  • Accepted 27 October 2023

Emulating a target trial reduces the potential for bias in observational comparative effectiveness research. Owing to feasibility constraints, large cohort studies often use electronic health records without validating key variables or collecting additional data. A case-control design allows researchers to validate, supplement, or collect additional data on key measurements in a much smaller sample compared with the entire cohort. In this article, Rasouli and colleagues describe methods to emulate a target trial using a nested case-control design, and provide a detailed guideline, an analytical program, and results of a clinical example.

Summary points

  • Case-control studies are efficient designs for studies that require validation of key variables; data collection can occur on all cases and a sample of controls rather than on an entire cohort
  • Case-control studies are vulnerable to several biases, including prevalent user bias and inappropriate adjustment for covariates when treatment and confounders are measured at the date when cases are identified and controls are sampled (ie, the index date)
  • Emulating the design and analysis of a target randomized controlled trial can minimize some of these biases in comparative effectiveness case-control studies
  • The proposed approach combines the benefits of measure validation in nested case-control designs with the strengths of target trial emulation, reducing bias

Randomized controlled trials are considered the ideal study design for comparative effectiveness research. Given that such trials are usually costly, lengthy, and, in some instances, unethical or infeasible, interest is increasing in using observational studies such as those conducted using data from electronic health records and administrative datasets to inform clinical decision making. 1 2 3 Analysis of observational data, however, requires careful consideration of possible biases, including confounding, selection bias, 1 4 5 6 and measurement error. 2 3 7 8 9 10 11 Methods and tools for minimizing these biases are therefore essential.

Suppose researchers aim to use electronic health record data to study the effect of initiating a treatment (eg, statins) on risk of cardiovascular disease events. Such events identified in the electronic health record using international classification of disease (ICD) codes are likely to be misclassified compared with review of medical records 12 and this may lead to substantial bias. 2 3 7 8 9 10 Collecting additional data to validate measurements (in this example for outcomes) using the entire electronic health record cohort is often impractical because it requires substantial time and resources. 9 13 This impracticality of gathering data on the complete cohort has led many researchers to conduct nested case-control studies to allocate their limited resources to gather high quality data on a subsample rather than on the entire cohort. However, common approaches to designing and analyzing case-control studies are prone to several biases. 1 4 5 6 Emulating a target trial can reduce the potential for these types of bias. 3 In this approach, investigators first specify a clear causal question and develop a detailed protocol for a target randomized trial to answer that causal question. Then, they modify the protocol to accommodate the observational nature of the data. 14

Most previous studies that emulated a target trial used a cohort design. We have already shown that emulating a target trial using a cohort design can provide estimates of treatment effects that are much more consistent with those observed in randomized controlled trials compared with estimates based on conventional methods without emulating a target trial. 15 16 Methods to emulate a target trial using a case-control design have only recently been developed, 17 and a detailed description and analytical guideline on how to implement these methods has not been published. Although it may seem counterintuitive to conceptualize emulating a target trial using a nested case-control design, it is worth noting that nested case-control studies are just an efficient way of sampling from an underlying cohort, and cohort studies are meant to estimate the same underlying effect size that would have been observed in randomized controlled trials.
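To make concrete the idea that a nested case-control study is a way of sampling from an underlying cohort, the following is a minimal sketch of risk-set (incidence density) sampling, one common sampling scheme. It is an assumption for illustration, not the procedure in the authors' supplemental code; the function name `sample_risk_sets`, the tuple layout, and the toy cohort are all hypothetical.

```python
import random


def sample_risk_sets(cohort, controls_per_case, rng):
    """Draw controls for each case from those still under follow-up at the event time.

    cohort: list of (subject_id, follow_up_time, had_event) tuples.
    Subjects who later become cases remain eligible as controls for earlier
    cases. Subjects whose follow-up ends exactly at the event time are kept
    in the risk set here (a convention chosen for this sketch).
    """
    matched_sets = []
    cases = sorted((s for s in cohort if s[2]), key=lambda s: s[1])
    for case_id, event_time, _ in cases:
        risk_set = [sid for sid, t, _ in cohort
                    if t >= event_time and sid != case_id]
        k = min(controls_per_case, len(risk_set))
        controls = rng.sample(risk_set, k)  # sampling without replacement
        matched_sets.append({"case": case_id, "time": event_time, "controls": controls})
    return matched_sets


# Toy cohort (hypothetical): follow-up time in years and event indicator.
cohort = [
    ("a", 2.0, True),
    ("b", 5.0, False),
    ("c", 3.5, True),
    ("d", 1.0, False),
    ("e", 6.0, False),
    ("f", 4.0, True),
]

matched = sample_risk_sets(cohort, controls_per_case=2, rng=random.Random(0))
for matched_set in matched:
    print(matched_set)
```

Each matched set anchors the comparison at the case's event time, which is the index date referred to elsewhere in this article.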

In this paper we discuss common biases in the design and analysis of case-control studies and how emulating a target trial may reduce these biases. Using a clinical example, we then describe the protocol of a target trial that we wish to emulate. For our clinical example we explain how a nested case-control design can be used to emulate the target trial, and we estimate the observational analog of the intention-to-treat and per protocol effects. Finally, we present the results of our clinical example. Supplemental file 1 provides a detailed guideline and an analytical code to implement the target trial emulation approach using a nested case-control design.

The methods we suggest can be applied in two ways: by reanalyzing a previously conducted case-control study, or by conducting a new case-control analysis to emulate a target trial. If, however, a previously conducted case-control study collected data on cases and controls only at the time of the event or sampling (known as the index date) or a few time points before that date, it may not be appropriate to use the proposed methods unless additional data across time can be obtained from the same electronic health record dataset. To implement the proposed methods, it is essential to have access to comprehensive data across time on eligibility, potential confounders, and treatment. In our clinical example, we reanalyzed a previous case-control study that had been linked to underlying electronic health record data.

Biased approaches in the design and analysis of case-control studies

Conventional case-control studies often evaluate the values of treatment and confounders at the index or reference date, that is, the event date for cases and the matched date for controls (see supplemental figure 1). This approach leads to two major types of bias: prevalent user bias and bias due to inappropriate adjustment for covariates.

Prevalent user, or differential survival, bias

In case-control studies, assessing treatment or exposure at the index date may lead to prevalent user bias. Current users have survived and continue taking treatment; if treatment affects the outcome or shares common causes with the outcome, current users are not comparable to non-users, resulting in bias. 18 19 20 This differential survival bias is more obvious when treatment has a short term effect, such as the harmful effects of postmenopausal hormone replacement therapy on myocardial infarction. 21

Bias due to inappropriate adjustment for covariates

Adjusting for potential confounders measured at the index date may create bias if those variables are affected by past treatment. 17 Such adjustment may either remove part of the effect of interest (if the covariate measured at the index date is a mediator) or lead to collider stratification bias (if the covariate shares a common cause with the outcome).

Case-control studies that measure exposure, covariates, and eligibility at least partly before the index date may be less prone to these biases.

Trial emulation using case-control design to reduce bias

In a randomized controlled trial, participants are assigned randomly to a treatment strategy at time zero—that is, when they meet eligibility criteria and follow-up starts. Successful emulation of a target trial requires a clear definition of time zero, here referred to as the enrollment date. Enrollment date is a point in (or short period of) time at which eligibility criteria are satisfied, treatment is assessed, and follow-up starts. Assigning enrollment dates allows for comparison of treatment initiators (incident users) with non-initiators at a point in time to prevent prevalent user bias, and for measuring confounders before the observed treatment to prevent bias due to adjustment for covariates affected by past treatment.

The analytical dataset can be created in two ways. In the simplest approach, the entire period represented in the data can be assigned as the enrollment period of a single trial. In this approach, each row of data corresponds to one person. Eligibility can be assessed for all individuals in the dataset, and as soon as a person becomes eligible, they can enter into the trial. Baseline is defined for an individual as the first time when all eligibility criteria are met (the first enrollment date). Values of baseline covariates should be assessed before this enrollment date, and the observed treatment should be recorded at the time of enrollment ( fig 1 ). To assess if imposing eligibility criteria may introduce selection bias, we suggest that researchers compare baseline characteristics of the enrolled population with the patients who are not eligible, noting that restricting the study population to a subset of patients in the target population may introduce selection bias if being selected into the study is associated with both the treatment and the outcome owing to shared underlying factors (common causes). For each eligible case, controls who are eligible can be randomly sampled as of the case’s enrollment date using incidence density or risk set sampling. The estimated odds ratio from such a case-control analysis approximates the hazard ratio or (in the presence of constant hazards) incidence rate ratio from the target trial. 22 23
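The risk set sampling step described above can be sketched in a few lines of Python. This is a minimal illustration and not the code from supplemental file 1: the dictionary fields (`pid`, `enroll`, `end`, `event`), the month-based time scale, the toy cohort, and the function name `risk_set_sample` are all assumptions made for the example.

```python
import random


def risk_set_sample(people, n_controls, rng):
    """For each case, draw controls from people still at risk at the case's event month."""
    matched_sets = []
    for case in (p for p in people if p["event"]):
        at_risk = [
            p for p in people
            if p["pid"] != case["pid"]
            and p["enroll"] <= case["end"]   # already enrolled when the case occurs
            and p["end"] > case["end"]       # still event-free and under follow-up
        ]
        controls = rng.sample(at_risk, min(n_controls, len(at_risk)))
        matched_sets.append((case["pid"], case["end"], [c["pid"] for c in controls]))
    return matched_sets


# Hypothetical toy cohort: "enroll" is the first month all eligibility criteria
# are met, "end" is the month of the event or of the end of follow-up.
people = [
    {"pid": "A", "enroll": 0, "end": 5, "event": True},
    {"pid": "B", "enroll": 0, "end": 12, "event": False},
    {"pid": "C", "enroll": 2, "end": 12, "event": False},
    {"pid": "D", "enroll": 6, "end": 12, "event": False},
    {"pid": "E", "enroll": 1, "end": 9, "event": True},
]
matched_sets = risk_set_sample(people, n_controls=2, rng=random.Random(0))
```

In this toy cohort, individual E belongs to the risk set of case A and later becomes a case, which is expected under risk set sampling.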

Fig 1

Schematic diagram of a case-control study with incidence density sampling to emulate a target trial using the entire timeframe as the enrollment period. Squares represent selected control index dates and triangles represent index dates for cases (with letters depicting individuals)


The approach discussed above allows each patient to enroll in only one trial. However, we suggest discretizing the data by time, such as days, weeks, or months, and using each time interval as an enrollment period, thus allowing each individual to enroll in every period in which they are eligible. In this approach, each row of data represents one copy of a person who enrolls in a trial, here referred to as a person trial. This approach maximizes the use of data and improves statistical efficiency. Figure 2 illustrates the process of sampling cases and controls using this approach. The detailed steps are explained in section 2.3 of the guideline (see supplemental file 1) and supplemental text (see supplemental file 2). In supplemental figures 2 and 3 we also illustrate the steps of sampling case person trials and control person trials in diagrams. Briefly, all eligible person trials that lead to an event will be included in the analytical dataset as “case person trials.” One or more “control person trials” are randomly sampled for each case person trial from all eligible person trials within the dataset.
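As a rough sketch of what expanding one individual into person trials could look like, the Python snippet below creates one row per enrollment month in which the person is eligible and still event-free. The function name `expand_person_trials`, the field names, and the example months are hypothetical. The SAS macro in supplemental file 1 remains the reference implementation.

```python
def expand_person_trials(pid, eligible_months, event_month=None):
    """Return one person trial per month in which the person is eligible and event-free."""
    rows = []
    for m in sorted(eligible_months):
        if event_month is not None and m >= event_month:
            break  # the event must occur after enrollment month m
        rows.append({
            "pid": pid,
            "trial": m,                          # enrollment month of this emulated trial
            "is_case": event_month is not None,  # this person trial leads to an event
            "q": event_month,                    # event month (None for non-cases)
        })
    return rows


# Person A is eligible in months 1, 2, 3, and 7 and has an event in month 6,
# so only months 1 to 3 yield case person trials.
case_trials = expand_person_trials("A", [1, 2, 3, 7], event_month=6)
# Person B never has an event; these person trials are candidates for control sampling.
other_trials = expand_person_trials("B", [2, 3])
```

Because one person can contribute several rows, any downstream variance estimate has to account for within-person correlation.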

Fig 2

Schematic diagram of a case-control study with incidence density sampling to emulate a target trial using each calendar month as one enrollment period. Each month (m) is considered as the enrollment period for a trial. Each letter identifies one individual. Triangles represent cases and squares represent controls. It is possible for a case to be sampled as control before experiencing the event (for example, in trial 3, individual E is sampled as a control for case A and later becomes a case). To identify cases within each monthly trial, individuals who are eligible at m and experience an event between month m+1 and the end of follow-up are identified (A and E), the event month (q) is recorded, and the case status is validated. These observations are referred to as case person trials (see process 1 in the guideline (supplemental file 1) for more detail). To sample controls for each case within each monthly trial using incidence density sampling, all individuals who are eligible at month m are identified, n (2 in the example in the figure) controls are randomly selected for each case, and a randomly selected month between m and their end of follow-up is selected for each control (referred to as q and shown as a square). This process is repeated for subsequent cases within each trial (m) and for subsequent monthly trials. These observations are referred to as control person trials. It should be ensured that these sampled controls did not experience an event before month q, using existing or additional data. To conduct an intention-to-treat analysis analog and in the absence of differential loss to follow-up, information on treatment at month m, confounders before that month, and the case or control status are sufficient for analysis (no information is required at month q). 
However, to adjust for non-adherence using a per protocol analysis or to adjust for differential loss to follow-up (in intention-to-treat or per protocol analyses), information on time varying determinants of treatment and loss to follow-up between months m and q is required to estimate inverse probability weights (see process 2 in the guideline (supplemental file 1) for more detail). Alternatively, month q for all controls sampled for each case can be set to the event month of that case, which is often referred to as risk set sampling.
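The control sampling procedure in the caption of figure 2 can be illustrated with a short Python sketch. It assumes hypothetical inputs: `eligible_by_trial` maps each enrollment month m to the identifiers eligible in that month, and `end_of_follow_up` gives each person's last month of follow-up (the event month for people who later become cases). Treating both endpoints of the interval for q as inclusive, and excluding the case from its own control pool, are assumptions of this example.

```python
import random


def sample_control_person_trials(case_trials, eligible_by_trial, end_of_follow_up, n, rng):
    """Sample n control person trials per case person trial, each with a random month q."""
    controls = []
    for case in case_trials:
        m = case["trial"]
        pool = [pid for pid in eligible_by_trial[m] if pid != case["pid"]]
        for pid in rng.sample(pool, min(n, len(pool))):
            # q is drawn between enrollment month m and the control's end of follow-up,
            # so the control has not had an event before q.
            q = rng.randint(m, end_of_follow_up[pid])
            controls.append({"pid": pid, "trial": m, "q": q, "matched_case": case["pid"]})
    return controls


# Hypothetical monthly trial 3 with case A (event in month 8).
case_trials = [{"pid": "A", "trial": 3, "q": 8}]
eligible_by_trial = {3: ["A", "B", "C", "E"]}
end_of_follow_up = {"A": 8, "B": 12, "C": 12, "E": 10}
control_trials = sample_control_person_trials(
    case_trials, eligible_by_trial, end_of_follow_up, n=2, rng=random.Random(1)
)
```

For risk set sampling, the random draw of q would be replaced by the event month of the matched case.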

Trial emulation using case-control design: Analysis

Once the analytical dataset is created, the observational analog of the intention-to-treat effect can be estimated by comparing the treatment initiation status at enrollment of case person trials with control person trials using a pooled logistic regression model, adjusting for confounders measured at or before enrollment. If each patient is allowed to enroll in multiple trials, the variance of the estimated effect size should be adjusted for the within person correlation of person trials using an appropriate variance estimator, such as a robust variance estimator. 24 More details are provided in section 2.2 of the guideline (see supplemental file 1).

To adjust for imperfect adherence and estimate a per protocol effect, we propose artificially censoring non-adherent person trials when they deviate from their assigned treatment strategy at the time of treatment discontinuation. The resulting dataset will only include person trials that are always adherent to assigned treatment. To adjust for potential selection bias due to censoring because of imperfect adherence, inverse probability weights should be estimated using time varying data on prognostic factors associated with the probability of treatment. In section 2.3 of the guideline (see supplemental file 1), we explain in detail how to include such time varying factors in an expanded dataset of case person trials and control person trials. Inverse probability weights should be estimated in the control population because controls represent the target population, whereas the association between treatment and confounders may be different among cases. 25 A similar approach using inverse probability weights can also be used to adjust for differential loss to follow-up (eg, due to disenrollment or competing risks) by including factors associated with loss to follow-up in the time varying dataset.
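To make the weighting step concrete, the following Python sketch computes unstabilized inverse probability weights for one person trial that remains adherent over successive follow-up periods. The per-period probabilities of remaining adherent are taken as given here. In practice they would come from a model fitted among controls, and the function name `ip_weights` and the example probabilities are purely illustrative.

```python
def ip_weights(p_adherent):
    """Cumulative unstabilized weights: 1 / product of P(remaining adherent) up to each period."""
    weights = []
    cumulative = 1.0
    for p in p_adherent:
        if not 0.0 < p <= 1.0:
            # A probability of zero would signal a positivity violation.
            raise ValueError("each probability must be in (0, 1]")
        cumulative *= p
        weights.append(1.0 / cumulative)
    return weights


# Hypothetical predicted probabilities of remaining adherent in periods 1 to 3.
weights = ip_weights([0.9, 0.8, 0.5])
```

The weight grows as the cumulative probability of remaining uncensored shrinks, which is why long follow-up can give a few person trials a large influence on the estimate.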

When using data from an existing matched case-control study, the covariates used for matching should be included in the outcome model, preferably in the same functional form used for matching. This adjustment is needed because matching in case-control studies creates an association between the matching variables and case-control status.

Clinical example

Protocol for target trial and its observational emulation.

We developed a detailed protocol of a target trial to estimate the effect of starting statin treatment on the prevention of myocardial infarction ( table 1 ) and emulated this trial under a nested case-control design using electronic health record data. The few differences between the target trial and the emulated trial are presented in the last column of table 1 and are mostly due to data availability and lack of randomization. As is common in observational comparative effectiveness research using electronic health records, we restricted the population to patients who had been enrolled in the health plan for more than one year. This requirement serves at least two purposes. Firstly, it allows investigators to gather information on eligibility, treatment, and confounders for a period preceding the patient’s eligibility, which is crucial for adjusting for baseline confounding factors, and, secondly, it ensures that eligible patients are long term users of the healthcare system from which the study data are derived, thus reducing loss to follow-up. This restriction may, however, induce selection bias or limit the generalizability of the findings. For comparison of the protocol of the target trial with a previously conducted randomized controlled trial, we also describe the protocol for a clinical randomized controlled trial: the Justification for the Use of Statins in Primary Prevention: An Intervention Trial Evaluating Rosuvastatin trial (JUPITER) 26 in supplemental file 2.

Table 1

Specification of protocols for a target trial on the effect of statin initiation and risk of myocardial infarction and emulation of the target trial using EHR data


We chose a case-control design rather than a cohort design because our aim was to validate outcome status (cases and non-cases) using manual review of medical records, which would not be practical to do on the entire cohort. Although bias correction methods have been proposed to adjust for outcome misclassification, they often rely on assumptions and modeling choices that may introduce additional uncertainty or bias. 27

Data sources

As mentioned before, data from a previously conducted case-control study can be reanalyzed or a new case-control study conducted to emulate the target trial if electronic health record data are available over the entire span of the study. Here, we used data from a previously conducted case-control study that had validated diagnoses for myocardial infarction. We extracted additional data from the electronic health record database from which the cases and controls arose. The electronic health record data came from Kaiser Permanente Washington (KPWA), an integrated healthcare delivery system in the US providing medical care and coverage to about 700 000 members in Washington State. The main advantages of this setting were convenience and the availability of extensive electronic health record data as well as data from previously conducted nested case-control studies. 28 29 The previous case-control studies, however, had collected data on cases and controls only at the index date or at a few time points before it, but our access to the same electronic health record dataset allowed us to obtain comprehensive data on eligibility, potential confounders, and treatment across time. We were then able to compare results obtained from a cohort design with unvalidated outcomes to those obtained from a case-control design with validated outcomes in the same population. KPWA maintains computerized data on diagnoses, hospital admissions, procedures, outpatient visits, laboratory test results, vital signs, and prescriptions. Information on statin prescription fills was derived from a pharmacy database, which included all outpatient prescription fills at KPWA pharmacies and prescription claims submitted by outside pharmacies. Pharmacy data comprised a unique patient identifier, drug name, strength, route of administration, date dispensed, quantity dispensed, and days’ supply. 
We chose a large set of potential confounders based on a priori knowledge of determinants of statin initiation and incidence of myocardial infarction. However, similar to other studies based on electronic health records, we did not have data on diet or physical activity. Data on blood pressure and body mass index were available from 2005 onwards, and these variables were included only in a sensitivity analysis. We adjusted for total cholesterol and high density lipoprotein cholesterol in the main analysis and additionally for low density lipoprotein cholesterol in a sensitivity analysis. To ascertain comorbidities, we used ICD-9 (international classification of diseases, ninth revision) diagnosis codes. Analyses were adjusted for history of atrial fibrillation, chronic obstructive pulmonary disease, cancer, cataract, dementia, depression, diabetes, heart failure, hypertension, and chronic kidney disease.

The myocardial infarction diagnosis among cases and lack of a myocardial infarction event among controls was verified by medical record review as part of the Heart and Vascular Health study. This previously conducted case-control study among KPWA enrollees (1994-2010) is further described in supplemental file 2. 28 29 30

We allowed each patient to enroll in multiple emulated monthly trials (see details in supplemental file 1). We used a pooled logistic regression model to estimate the observational analog of intention-to-treat effect of statin initiation on occurrence of myocardial infarction after adjusting for confounders at enrollment date as well as trial month and follow-up month. We also estimated the per protocol effect of statins after adjustment for time varying confounders through inverse probability weighting.
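One way to write the pooled logistic model described above is given below. The notation is an assumption introduced for illustration: $Y_{i,m}$ is the case status of the person trial of individual $i$ in trial month $m$, $A_{i,m}$ is statin initiation at enrollment, $L_{i,m}$ is the vector of confounders at enrollment, $k$ is the follow-up month, and $f$ and $g$ are unspecified functions of trial month and follow-up month.

```latex
\operatorname{logit}\,\Pr\left(Y_{i,m} = 1 \mid A_{i,m}, L_{i,m}, m, k\right)
  = \beta_0 + \beta_1 A_{i,m} + \beta_2^{\top} L_{i,m} + f(m) + g(k),
\qquad
\text{OR}_{\text{ITT}} = \exp(\beta_1)
```

Under this formulation, $\exp(\beta_1)$ is the observational analog of the intention-to-treat odds ratio. Its variance would need a robust estimator because individuals can contribute several person trials.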

To evaluate the impact of validation for case or control status, we also used cohort data without outcome validation to emulate the same target trial, and we calculated the observational analog of intention-to-treat and per protocol effects of statin initiation on occurrence of myocardial infarction. Furthermore, we performed a set of sensitivity analyses to evaluate the impact of alternative designs and analytic approaches (see supplemental table 2).

Among the 10 128 unique cases and controls in the Heart and Vascular Health study, 1221 and 4267, respectively, met the eligibility criteria for at least one person trial ( fig 3 ). The main reason for exclusion was not having data on selected confounders in the six months before a potential enrollment period (see supplemental figure 4). Characteristics were overall similar between ineligible and eligible patients, although eligible patients had a slightly higher risk profile (eg, higher levels of low density lipoprotein cholesterol) and higher prevalence of documented comorbidities (eg, diabetes) than ineligible patients (see supplemental table 1). We created 198 monthly emulated trials (between January 1994 and June 2010). Across all trials, 15 263 were eligible case person trials, and we sampled five controls for each case using incidence density sampling, thereby creating 76 315 control person trials ( fig 3 ). Some individuals selected as controls later became cases. The median follow-up time (time between enrollment and event date for cases and sampling date for controls) was 25 months (interquartile range 11-43 months) for initiators and 30 (12-55) months for non-initiators. Overall, statin treatment was initiated in 1.9% (287 of 15 263) of the eligible case person trials and 1.5% (1167 of 76 315) of the sampled control person trials. Statin initiators were generally less healthy than non-initiators ( table 2 ). They were on average older and had a higher prevalence of diabetes and smoking, higher total cholesterol levels, and lower high density lipoprotein cholesterol levels.

Fig 3

Flowchart of person trials in case-control trial emulation of statin treatment initiation and risk of myocardial infarction using incidence density sampling (1994-2010). *Cases and controls overlap (ie, controls may later be selected as cases). †Some people were initiators in some trials and non-initiators in other trials

Table 2

Baseline characteristics of eligible person trials of statin treatment initiation on risk of myocardial infarction (1994-2010), case-control design with validated outcome status and EHR data. Values are number (percentage) unless stated otherwise

In a minimally adjusted (only trial month and follow-up month) pooled logistic regression model for validated myocardial infarction the intention-to-treat odds ratio was 1.26 (95% confidence interval 1.10 to 1.44). After further adjustment for confounders at enrollment date, the odds ratio was 0.80 (0.69 to 0.92) ( table 3 ). Adherence to assigned treatment was assessed among sampled controls; 41% (596 of 1454) of initiators discontinued treatment within one year and 64% (931 of 1454) discontinued treatment within five years. Conversely, 8% (7210 of 90 124) of non-initiators started treatment within one year and 38% (34 247 of 90 124) started treatment within five years ( fig 4 ). The per protocol odds ratio after censoring non-adherent person trials and adjusting for determinants of adherence using inverse probability weights was 0.71 (95% confidence interval 0.58 to 0.87). Figure 5 summarizes the results obtained using different analytic approaches and study designs. The biased case-control analysis using validated outcome status and measures of covariate and treatment at the index date (as opposed to enrollment date) produced an odds ratio for statin use of 1.12 (0.96 to 1.31) (see supplemental table 2).
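As an arithmetic check on the direction of the minimally adjusted estimate, a crude odds ratio can be computed directly from the reported counts of statin initiation (287 of 15 263 case person trials and 1167 of 76 315 control person trials). The Python sketch below uses the standard Woolf (log odds ratio) interval. It is not the pooled logistic model used in the study: it ignores trial month, follow-up month, confounders, and within-person correlation, so its interval is only indicative. The function name `crude_odds_ratio` is an assumption.

```python
import math


def crude_odds_ratio(exposed_cases, total_cases, exposed_controls, total_controls, z=1.96):
    """Crude odds ratio with a Woolf confidence interval from a 2x2 table."""
    a, b = exposed_cases, total_cases - exposed_cases
    c, d = exposed_controls, total_controls - exposed_controls
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Counts of statin initiators among case and control person trials, as reported.
odds_ratio, lower, upper = crude_odds_ratio(287, 15263, 1167, 76315)
```

The crude odds ratio is about 1.23, above 1 like the minimally adjusted estimate of 1.26 and unlike the confounder adjusted estimate of 0.80.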

Table 3

Estimated odds ratios for effect of statin treatment initiation on risk of myocardial infarction (1994-2010) in case-control design using validated outcome status and EHR data

Fig 4

Adherence to treatment by statin initiation status among controls (1994-2010) using case-control design with validated outcome status. Last deviation from protocol among initiators occurred at month 122 and among non-initiators at month 144

Fig 5

Estimated effects of statin treatment initiation on risk of myocardial infarction (1994-2010) using different study designs and analytic approaches. Effect size is pooled hazard ratio from meta-analysis and odds ratios elsewhere. *From Danaei et al 2012. 20 No results for per protocol analysis were reported. On average across trials, 21% of statin initiators discontinued treatment and 25% of statin non-initiators started treatment. Whiskers represent 95% CIs. CI=confidence interval; ICD=international classification of diseases; MRR=medical record review; RCT=randomized controlled trial

In sensitivity analyses, a case-control analysis comparing unvalidated controls sampled from the KPWA dataset with validated cases from the Heart and Vascular Health study showed an odds ratio of 0.80 (0.70 to 0.90) (see supplemental table 2). Comparing unvalidated controls with unvalidated cases from electronic health record data, identified only by ICD codes, showed an odds ratio of 1.00 (0.90 to 1.10) ( fig 5 ). Other sensitivity analyses are reported in supplemental table 7.

Concluding remarks

Using a nested case-control design, we emulated a target trial of the effect of statin initiation on incidence of myocardial infarction. We compared our findings to those reported in a previous meta-analysis of intention-to-treat results from randomized controlled trials (pooled hazard ratio of 0.69). 20 The estimated intention-to-treat treatment effect from our emulated trial was consistent with benefit (odds ratio 0.80). In contrast, an odds ratio of 1.12 was obtained for a biased case-control analysis with confounders and treatment measured at the index date. The smaller protective effect estimated in the trial emulations compared with previous randomized controlled trials could be explained by unmeasured confounding, differences in eligibility criteria, longer follow-up time, and lower adherence in our study population compared with those enrolled in randomized controlled trials.

Strengths and limitations of this study

The proposed methods may not be usable in all settings. Firstly, the methods require time varying data on eligibility, treatment, and confounders. Therefore, such analyses cannot be implemented within existing case-control studies that have measured such factors only at the index date (or sporadically before it). In our clinical example, we resolved this by extracting additional data on these factors for our selected cases and controls from the healthcare system’s electronic health record. However, even in the context of a high quality electronic health record system, as in our study, information on major confounders may not be available for all potentially eligible individuals during the period of study. Therefore, investigators may limit the eligible population to those with recent measurements of major confounders, which reduces sample size and may introduce selection bias and limit generalizability. Secondly, the proposed inverse probability weighting to adjust for imperfect adherence and differential loss to follow-up is sensitive to violations of the positivity assumption. Such violations are more common with longer durations of follow-up and may lead to undue influence of a few observations. In sections 3.4 and 3.6 of the guideline (see supplemental file 1), we propose several ways to reduce the potential for such violations. Finally, the proposed methods, especially allowing each patient to enroll in multiple trials and estimating time varying inverse probability weights, are conceptually and analytically complicated. The guideline (see supplemental file 1) aims to resolve this issue by providing detailed guidance on how to conceptualize the methods, prepare the analytic datasets, and conduct the analysis using the provided SAS macro and sample dataset.

Despite these limitations, the proposed methods have several major strengths compared with biased analytical methods used in case-control studies that evaluate eligibility, treatment, and confounders at the index date. Drafting a protocol for the target trial helps clarify the causal question of interest by clearly defining eligibility criteria, treatment strategies, and outcome definitions. Emulating the protocol using observational data helps reduce prevalent user bias (ie, differential survival bias) by defining intervention strategies as treatment initiation among those who were eligible at study enrollment. In addition, measuring confounders at or before treatment assignment prevents inappropriate adjustment for factors that may be affected by previous treatment (ie, collider stratification bias and adjustment for a mediator). Notably, not all case-control analyses evaluate exposure and confounders at the index date; those that evaluate these variables before the index date may be less prone to bias.

Compared with previous methods proposed to emulate the design and analysis of a target trial using a cohort design, the case-control design allows researchers to focus resources on efficiently collecting data on high quality and expensive measures of eligibility, confounders, treatments, and outcomes. In our clinical example, we obtained a sensitivity of 95-96% and a specificity of 98% for the diagnosis of myocardial infarction by ICD codes compared with medical reviews (as the ideal method for validation). But even this small error in measurement was enough to introduce substantial bias in the results. In other clinical examples, the measurement error may be much larger. 2 3 7 8 9 10 12 Our clinical example illustrates the benefit of implementing these methods and shows the process of emulating a sequence of nested trials to increase statistical efficiency.
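One way to see why high sensitivity and specificity can still leave many false positives is to compute the positive predictive value when the outcome is rare. The Python sketch below uses the sensitivity (95%) and specificity (98%) reported above together with a purely hypothetical outcome prevalence of 1%. The prevalence is an assumption chosen for illustration and is not taken from the study.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Share of code-identified events that are true events (Bayes' theorem)."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)


# Reported sensitivity and specificity, with a hypothetical 1% outcome prevalence.
ppv = positive_predictive_value(sensitivity=0.95, specificity=0.98, prevalence=0.01)
```

Under these assumed inputs, roughly one third of events identified by ICD codes would be true events, which illustrates how a seemingly small specificity shortfall can matter for rare outcomes.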

In conclusion, emulating a target trial using a nested case-control design allows high quality data and validated measures to be combined with analytic methods that are less prone to common biases in comparative effectiveness research. The accompanying guideline and analytic code should allow other investigators to implement these analyses.

Ethics statements

Ethical approval.

This study was approved by the Harvard TH Chan School of Public Health Research ethics committee (protocol No: IRB18-1075, institutional review board (IRB) effective date: 24 Sept 2018), and Kaiser Permanente Washington Region IRB (study No: 1191203-1, IRB effective date: 13 Jun 2018). The lead author affirms that the manuscript is an honest, accurate, and transparent account of the study being reported and that no important aspects of the study have been omitted.

Data availability statement

No additional data available.

Acknowledgments

We thank Miguel A Hernán, who contributed to the design of the study and development of the methodology; Barbra A Dickerman who contributed to developing the methodology and interpreting the results and reviewed the manuscript and the guideline; all participants, investigators, and the staff of the Kaiser Permanente Washington Health Research Institute and Heart and Vascular Health study; and 22 researchers and data analysts for providing comments on an earlier draft of the guideline during our qualitative interviews. We are grateful to Kelly Meyers for project management support at Kaiser Permanente Washington Health Research Institute.

Contributors: GD, the principal investigator, conceived the study, led the development of the methodology, and contributed to the interpretation of the results and writing of the manuscript and the guideline. BR contributed to developing the methodology, data analysis, interpretation of the results, and writing of the manuscript and the guideline. JC, MN, and RLW (cohort electronic health record data) and JSF, BMP, and KLW (case-control data from the Heart and Vascular Health study) developed and implemented the data creation protocol, contributed to the interpretation of the results, and reviewed and revised the manuscript and the guideline. RWL developed the SAS macros for the data analysis. All authors read and approved the final version of the manuscript and the guideline. GD is the guarantor. The corresponding author attests that all listed authors meet authorship criteria and take responsibility for the integrity of the data and analysis.

Funding: This study was funded by the Patient-Centered Outcomes Research Institute (award No/project ID: ME-1609-36748). The postdoctoral fellowship to BR is supported by Novo Nordisk Foundation (No: NNF17OC0027580). Kaiser Permanente also supported this research. The funders had no role in considering the study design or in the collection, analysis, interpretation of data, writing of the report, or decision to submit the article for publication.

Competing interests: All authors have completed the ICMJE uniform disclosure form at http://www.icmje.org/disclosure-of-interest/ and declare: support from the Patient-Centered Outcomes Research Institute, Novo Nordisk Foundation, and Kaiser Permanente for the submitted work; no financial relationships with any organizations that might have an interest in the submitted work in the previous three years; no other relationships or activities that could appear to have influenced the submitted work.

The lead author (GD) affirms that this manuscript is an honest, accurate, and transparent account of the study being reported; that no important aspects of the study have been omitted; and that any discrepancies from the study as planned (and, if relevant, registered) have been explained.

Provenance and peer review: Not commissioned; externally peer reviewed.



Potential self-selection bias in a nested case-control study on indoor environmental factors and their association with asthma and allergic symptoms among pre-school children

Affiliation.

  • 1 Technical University of Denmark, Lyngby, Denmark.
  • PMID: 16990165
  • DOI: 10.1080/14034940600607467

Selection bias means a systematic difference between the characteristics of selected and non-selected individuals in epidemiological studies. Such bias may be introduced if participants select themselves for a study. The present study aims at identifying differences in family characteristics, including health, building characteristics of the home, and socioeconomic factors between participating and non-participating families in a nested case-control study on asthma and allergy among children. Information was collected in a baseline questionnaire to the parents of 14,077 children aged 1-6 years in a first step. In a second step 2,156 of the children were invited to participate in a case-control study. Of these, 198 cases and 202 controls were finally selected. For identifying potential selection bias, information concerning all invited families in the case-control study was obtained from the baseline questionnaire. Results show that there are several possible biases due to self-selection involved in an extensive study on the impact of the home environment on asthma and allergy among children. Factors associated with participating were high socioeconomic status of the family, more health problems in the case families, and health-related lifestyle factors, such as non-smoking parents. The overall conclusion of this study is that there are selection biases involved in studies that need close cooperation with the families involved. One solution to this problem is stratification, i.e. investigating associations between exposures and health in the same socioeconomic strata.

Publication types

  • Comparative Study
  • Randomized Controlled Trial
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Air Pollution, Indoor / adverse effects*
  • Asthma / epidemiology*
  • Asthma / etiology
  • Case-Control Studies
  • Child, Preschool
  • Family Characteristics
  • Follow-Up Studies
  • Respiratory Hypersensitivity / epidemiology*
  • Respiratory Hypersensitivity / etiology
  • Risk Factors
  • Selection Bias*
  • Sick Building Syndrome / complications
  • Sick Building Syndrome / epidemiology*
  • Socioeconomic Factors
  • Surveys and Questionnaires
  • Sweden / epidemiology

PH717 Module 10 - Bias

Identifying and preventing bias.

Control Selection Bias in a Case-Control Study

Rules for avoiding selection bias in a case-control study.

Suppose we wish to study the possible association between socioeconomic status (SES, defined based on current household income) and risk of cancer of the cervix. We have a source population of 2,000,200 women in whom there are 200 cases of cancer of the cervix during a calendar year.

If we studied all women in the source population and simply used the median household income to classify them as "higher" or "lower" SES, we would have found the exposure-disease distribution in the table below.

The risk ratio and the odds ratio are both 3.0
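As a rough check on these figures, the sketch below recomputes both measures. The cell counts are not taken from the original table; they are reconstructed from statements elsewhere in the text (200 cases split 3:1 by SES, and non-diseased women split 1:1), so treat them as an assumption.

```python
# Counts reconstructed from the text (an assumption, not the original table):
# 200 cases split 3:1 between lower and higher SES, and the remaining
# 2,000,000 non-diseased women split 1:1.
cases = {"lower": 150, "higher": 50}
non_cases = {"lower": 1_000_000, "higher": 1_000_000}


def risk(group):
    """Cumulative incidence (risk) of disease in one SES group."""
    return cases[group] / (cases[group] + non_cases[group])


def odds(group):
    """Odds of disease in one SES group."""
    return cases[group] / non_cases[group]


risk_ratio = risk("lower") / risk("higher")
odds_ratio = odds("lower") / odds("higher")
population = sum(cases.values()) + sum(non_cases.values())

print(population, round(risk_ratio, 1), round(odds_ratio, 1))
```

Because the outcome is rare, the risk ratio and the odds ratio are almost identical, which is why both round to 3.0.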

However, we can't study the entire population, and since the outcome is relatively uncommon, we decide to do a case-control study. We begin by going to a large medical center that treats cervical cancer patients who are referred from all over the state and identify 100 patients with cervical cancer who agree to be interviewed. The interviews reveal that 75 of the women are from households with incomes that meet our definition of lower SES, and the other 25 are higher SES. To get non-diseased control subjects, we send members of our research team into the neighborhood around the medical center and have them go door to door during the day to invite women to be interviewed as controls for our study. In many cases, no one seems to be home, but our team persists and eventually finds 200 control women, of whom 120 meet our definition of lower SES and 80 are of higher SES. The sample we have selected is summarized in the contingency table below, which is oriented to facilitate the use of the oddsratio.wald function in R.

The analysis is performed as follows:

> library(epitools)   # provides oddsratio.wald
> ORtable <- matrix(c(80, 120, 25, 75), nrow = 2, ncol = 2)
> oddsratio.wald(ORtable)

$data
          Outcome
Predictor  Disease1 Disease2 Total
  Exposed1       80       25   105
  Exposed2      120       75   195
  Total         200      100   300

$measure
          odds ratio with 95% C.I.
Predictor  estimate    lower    upper
  Exposed1        1       NA       NA
  Exposed2        2 1.172783 3.410692

$p.value
          two-sided
Predictor   midp.exact fisher.exact chi.square
  Exposed1          NA           NA         NA
  Exposed2 0.009913739   0.01048725 0.01023571

To summarize, the estimated odds ratio was 2.0; the 95% confidence interval for the odds ratio was 1.17 to 3.41; and the p-value was 0.01.
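For readers who want to see where the estimate and interval come from, here is a minimal Python sketch of the Wald calculation on the log-odds scale. The function name `wald_odds_ratio` is hypothetical, and this is an illustration of the standard formula rather than the internals of the R function.

```python
import math


def wald_odds_ratio(a, b, c, d, z=1.959964):
    """Odds ratio and Wald 95% CI for the 2x2 table [[a, b], [c, d]].

    The odds ratio is (a*d)/(b*c); the interval is built on the log scale
    using the standard error sqrt(1/a + 1/b + 1/c + 1/d).
    """
    estimate = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper


# Rows: higher SES, lower SES; columns: controls, cases (same layout as the R matrix).
estimate, lower, upper = wald_odds_ratio(80, 25, 120, 75)
print(round(estimate, 2), round(lower, 2), round(upper, 2))
```

With the counts from the biased sample this gives an odds ratio of 2.0 with an interval of roughly 1.17 to 3.41, in line with the R output.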

The result is statistically significant, but the estimated measure of association is biased. The odds ratio in the source population was 3.0, but the odds ratio in the biased sample was 2.0, i.e., there was bias toward the null. What went wrong?

The estimated OR was biased because counts in the contingency table for the sample were not representative of the exposure-disease distribution in the source population, which was the entire state. The cases, who were referred from all over the state, did in fact reflect the exposure distribution in diseased women in the source population, i.e. 75 to 25 or 3:1. However, the exposure distribution among the 200 controls in the sample (120:80) was not representative of the exposure distribution in non-diseased women in the source population (1:1). This occurred because the controls were selected by a different mechanism. By going door to door during working hours, the research team missed many women who were employed, so the sample over-selected women who were unemployed. Therefore, there was a greater tendency to select non-diseased controls of lower SES.
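The effect of the control sampling can be illustrated with a short sketch. The "fair" control counts below (100 lower SES, 100 higher SES) are a hypothetical sample chosen to mirror the 1:1 split in the source population; they are not data from the example.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Cross-product odds ratio for a case-control 2x2 table."""
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)


# Cases are the same in both scenarios: 75 lower SES, 25 higher SES.
cases_lower, cases_higher = 75, 25

# Door-to-door controls, over-sampling lower SES (120:80).
biased_or = odds_ratio(cases_lower, cases_higher, 120, 80)

# Hypothetical fair sample of 200 controls split 1:1, as in the source population.
fair_or = odds_ratio(cases_lower, cases_higher, 100, 100)

print(biased_or, fair_or)
```

Only the control row changes between the two calls, yet the estimate moves from 3.0 to 2.0, which is the bias toward the null described above.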

The figure below summarizes this scenario, showing the distribution in the source population at the top, the contingency table for the biased sample at the lower right, and the contingency table from a fair sample at the lower left. We have used images of ladles to represent the relative sampling for the contingency tables at the bottom. The representative sample has ladles of equal size in each of the four cells, indicating a proportionate sample that is representative of the distributions in the source population. However, the biased sample at the lower right has a larger ladle for the non-diseased controls of lower SES, who were over-sampled because of the enrollment method used for the controls, and the controls of higher SES were therefore under-sampled, as shown by a smaller ladle.

  • Controls must come from the same source population as the cases and must be representative of the exposure distribution in the source population. One way to test whether the controls have been selected appropriately is to consider the "would criterion," i.e., if the controls had experienced the outcome, would they have been identified as potential cases? If not, there is selection bias.
  • Controls must be selected independently from exposure, meaning that whether or not a person is exposed or unexposed should not influence selection or enrollment of a control subject.

Hemifacial microsomia is a rare congenital malformation in which the lower half of one side of the face is underdeveloped and does not grow normally. The condition varies in severity and is sometimes subtle and only recognized by an astute pediatrician. Affected children can be referred for surgery to improve facial symmetry by reconstructing the bony and soft tissues, but the surgery is only done in certain large medical centers that receive referrals from all over the United States.

Researchers wanted to study whether maternal smoking or maternal diabetes are associated with the risk of this condition in their offspring. The names of children treated for this problem were obtained from a medical center in Michigan that specializes in this type of corrective surgery. How can they identify control subjects in a way that avoids selection bias? [Hint: How can they ensure that the "would criterion" is met?] Think about this for a few minutes, and try to devise an unbiased sampling strategy for controls before you look at how the investigators achieved this.

Content ©2021. All Rights Reserved. Date last modified: November 1, 2021. Wayne W. LaMorte, MD, PhD, MPH

  • Research article
  • Open access
  • Published: 07 May 2024

Floods and cause-specific mortality in the UK: a nested case-control study

  • Danijela Gasevic 1 ,
  • Zhengyu Yang 1 ,
  • Guowei Zhou 2 ,
  • Yan Zhang 2 ,
  • Jiangning Song 3 ,
  • Hong Liu 2 ,
  • Shanshan Li 2 &
  • Yuming Guo   ORCID: orcid.org/0000-0002-1766-6592 2

BMC Medicine volume 22, Article number: 188 (2024)

Background

Floods are the most frequent weather-related disaster, causing significant health impacts worldwide. Limited studies have examined the long-term consequences of flooding exposure.

Methods

Flood data were retrieved from the Dartmouth Flood Observatory and linked with health data from 499,487 UK Biobank participants. To calculate the annual cumulative flooding exposure, we multiplied the duration and severity of each flood event and then summed these values for each year. We conducted a nested case-control analysis to evaluate the long-term effect of flooding exposure on all-cause and cause-specific mortality. Each case was matched with eight controls. Flooding exposure was modelled using a distributed lag non-linear model to capture its nonlinear and lagged effects.

Results

The risk of all-cause mortality increased by 6.7% (odds ratio (OR): 1.067, 95% confidence interval (CI): 1.063–1.071) for every unit increase in flood index after confounders had been controlled for. The mortality risk from neurological and mental diseases was negligible in the current year, but strongest in the lag years 3 and 4. By contrast, the risk of mortality from suicide was the strongest in the current year (OR: 1.018, 95% CI: 1.008–1.028), and attenuated to lag year 5. Participants with higher levels of education and household income had a higher estimated risk of death from most causes whereas the risk of suicide-related mortality was higher among participants who were obese, had lower household income, engaged in less physical activity, were non-moderate alcohol consumers, and those living in more deprived areas.

Conclusions

Long-term exposure to floods is associated with an increased risk of mortality. The health consequences of flooding exposure would vary across different periods after the event, with different profiles of vulnerable populations identified for different causes of death. These findings contribute to a better understanding of the long-term impacts of flooding exposure.

Floods are the most frequent type of weather-related disaster, accounting for about 47% of all weather-related disasters from 1995 to 2015 [ 1 , 2 ]. Between 1995 and 2015, more than 2.3 billion people were affected by flood disasters, with over 157 thousand people dying directly as a result of floods [ 3 ]. In recent years, many intense urban flooding events have been recorded in the UK, resulting in loss of lives, damages to personal property and public health infrastructure, and disruption to vital services such as water, communications, energy, and public transport [ 4 , 5 , 6 , 7 , 8 , 9 ]. Approximately 1.9 million people across the UK are at risk of floods, and this number will double as early as the 2050s [ 10 ].

In addition to immediate fatalities due to drowning and acute trauma [ 11 ], floods can also cause short-term (lasting days or weeks) or medium-term (lasting several weeks or months) health impacts, including the spread of water- and vector-borne diseases, such as cholera, typhoid, or malaria; injuries during evacuations and disaster clean-up; and exposure to chemical hazards [ 1 , 12 ]. Non-communicable diseases (e.g. cardiovascular disease, neoplasms, chronic respiratory diseases, and diabetes), which need prolonged treatment and care, can be exacerbated after floods due to overcrowding in shelters and disruption of care, treatment, medication, supplies, and equipment [ 13 , 14 , 15 , 16 , 17 , 18 ]. Mental health issues may arise from stressors caused by floods (e.g. property damage, financial loss, loss of a loved one) and have long-lasting effects on mortality and morbidity. These long-term health consequences may arise through several pathways, including impairment of the immune system, sleep disturbances, substance abuse, and inadequate self-care [ 19 , 20 , 21 , 22 ].

Despite the severe impacts of floods, there currently is limited epidemiological evidence on the long-term mortality impacts of exposure to floods. To address these gaps in knowledge, we utilized the UK Biobank project, a population-based study with a large sample size, to explore the long-term effects of flooding on mortality. We aimed to estimate the risk of all-cause and seven cause-specific mortality associated with floods and to explore the lag patterns in mortality risk. We also conducted subgroup analyses to identify populations who are potentially more vulnerable to flood-related death.

Study design and study population

We conducted a nested case-control study within a cohort of participants registered with the UK Biobank study. About 0.5 million residents aged between 37 and 73 years were enrolled in the UK Biobank from 2006 to 2010, from 21 assessment centres across England, Wales, and Scotland. The cohort was followed up until the date of death or the study end date (December 31, 2020). We excluded participants lacking longitude and latitude data of residence ( n = 11), participants with missing data on age, sex, and ethnicity ( n = 2775), and those who died in the year of recruitment ( n = 141). A total of 499,487 participants were included (Fig 1 ). All participants in the UK Biobank study provided informed consent. The utilization of the data presented in this paper has been approved by the UK Biobank access committee under UK Biobank application number 55257.

Fig. 1. A flow diagram showing the participants whose data were used to estimate the association between flooding exposure and mortality

Case-control selection

With the nested case-control design, we matched controls to cases with replacement at the time of the outcome event and assessed exposure retrospectively, from the date of death or end of follow-up. This ensures identical exposure lengths across participants. Using a risk-set sampling method, each case was matched with eight controls randomly selected from study participants who met the matching criteria for age (within 5 years), sex (male and female), and ethnicity (White, Black, Asian or Asian British, mixed, Chinese, and others). The index date for cases was the date of death; for controls, it was the date of death of the matched case. For twelve case-control sets, fewer than eight (but at least one) eligible controls were available (Fig 1 ).
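The matching step described above can be sketched as follows. This is a minimal illustration of risk-set sampling, not the authors' code: the field names, the toy cohort, the use of baseline age for the 5-year window, and the choice to skip cases with no eligible control are all assumptions.

```python
import random
from datetime import date


def risk_set_sample(participants, n_controls=8, age_window=5, seed=0):
    """For each death, draw up to n_controls matched controls from the risk set.

    The risk set holds everyone still under follow-up at the case's index date,
    so a participant who dies later can serve as a control for an earlier case,
    and the same person can be drawn for several cases (sampling with
    replacement across matched sets).
    """
    rng = random.Random(seed)
    matched_sets = []
    for case in participants:
        if not case["died"]:
            continue
        index_date = case["exit_date"]
        eligible = [
            p for p in participants
            if p["id"] != case["id"]
            and p["exit_date"] > index_date          # alive at the index date
            and p["sex"] == case["sex"]
            and p["ethnicity"] == case["ethnicity"]
            and abs(p["age"] - case["age"]) <= age_window
        ]
        if not eligible:
            continue  # assumption: a case with no eligible control is dropped
        controls = rng.sample(eligible, min(n_controls, len(eligible)))
        matched_sets.append(
            {"case": case, "index_date": index_date, "controls": controls}
        )
    return matched_sets


# Hypothetical toy cohort: 20 survivors and 2 deaths.
cohort = [
    {"id": i, "age": 58 + i % 6, "sex": "F", "ethnicity": "White",
     "exit_date": date(2020, 12, 31), "died": False}
    for i in range(20)
]
cohort.append({"id": 100, "age": 61, "sex": "F", "ethnicity": "White",
               "exit_date": date(2015, 6, 1), "died": True})
cohort.append({"id": 101, "age": 62, "sex": "F", "ethnicity": "White",
               "exit_date": date(2018, 3, 1), "died": True})

matched_sets = risk_set_sample(cohort)
for s in matched_sets:
    print(s["case"]["id"], s["index_date"], [c["id"] for c in s["controls"]])
```

Each control inherits the index date of its matched case, so exposure can be assessed over the same look-back window for every member of a matched set.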

Participants were eligible for inclusion as cases if they died during the study period. We defined all-cause mortality and seven cause-specific mortality categories using the International Classification of Diseases, 10th revision (ICD-10), as follows: neoplasms, C00–D48; cardiovascular disease, I00–I99; respiratory diseases, J09–J98; digestive disease, K20–K93; neurodegenerative disease, F01–03, G122, G20, G21, G23, G30, G31; mental and behavioural disorders, F00–F90; and suicide, X60–X84, Y10–Y34, Y87.
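A small lookup along the following lines could apply these definitions to a recorded cause of death. It is a hedged sketch: the function name and data layout are hypothetical, codes are compared on their three-character stem, and because F01–F03 falls inside F00–F90 the function returns every matching category rather than a single one.

```python
# Three-character ICD-10 ranges per category, as listed in the text.
CATEGORY_RANGES = {
    "neoplasms": [("C00", "D48")],
    "cardiovascular disease": [("I00", "I99")],
    "respiratory diseases": [("J09", "J98")],
    "digestive disease": [("K20", "K93")],
    "neurodegenerative disease": [
        ("F01", "F03"), ("G20", "G20"), ("G21", "G21"),
        ("G23", "G23"), ("G30", "G30"), ("G31", "G31"),
    ],
    "mental and behavioural disorders": [("F00", "F90")],
    "suicide": [("X60", "X84"), ("Y10", "Y34"), ("Y87", "Y87")],
}

# Four-character codes that must match as a prefix (G122 in the text).
CATEGORY_PREFIXES = {"neurodegenerative disease": ["G122"]}


def categories_for(code):
    """Return all cause-of-death categories that an ICD-10 code falls into."""
    code = code.replace(".", "").upper()
    stem = code[:3]
    matches = []
    for name, ranges in CATEGORY_RANGES.items():
        in_range = any(start <= stem <= end for start, end in ranges)
        in_prefix = any(code.startswith(p) for p in CATEGORY_PREFIXES.get(name, []))
        if in_range or in_prefix:
            matches.append(name)
    return matches


print(categories_for("C50.9"))
print(categories_for("F03"))
print(categories_for("G12.2"))
```

The string comparison works because every stem has the same shape (one letter followed by two digits), so lexicographic order matches code order.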

Flooding exposure

We collected flood data during 2000–2020 from the Dartmouth Flood Observatory (DFO), which is a global catalogue of all flood events with detailed information on start date, end date, centroids, impacted geographic areas, and severities. All documented flood events were sourced from news, government, and instrumental sources and have been validated by satellite observations [ 23 , 24 ]. Participants whose home addresses fall within flood-affected areas were considered as having been exposed to a flood event. To assess the long-term effect of floods, we calculated a cumulative exposure during the study period for each participant. Building on previous research [ 25 , 26 ], we derived the annual cumulative exposure by multiplying the duration and severity of each flood event and summing these values for each year. Our preliminary analyses suggested a weak negative association between flood severity and duration (Pearson coefficient: − 0.03). The severity of each flood event documented in the DFO was classified based on a pre-defined scale, detailed in Additional file 1 : Table S1. For each participant, annual cumulative flooding exposure was calculated using equation ( 1 ):

\[\mathrm{Flood}\;{\mathrm{index}}_{i,\text{year}=m}=\sum_{j}{{\text{Duration}}}_{ij}\times {{\text{Severity}}}_{ij}\qquad (1)\]

where \(\mathrm{Flood}\;{\mathrm{index}}_{i,\text{year}=m}\) stands for the cumulative flooding exposure in year \(m\) for participant \(i\) . \({{\text{Duration}}}_{ij}\) and \({{\text{Severity}}}_{ij}\) represent the duration (days) and the severity of the \(j\) th flood event that participant \(i\) experienced in year \(m\) , respectively. If there were no flood events within a given year, a flood index of 0 was recorded.
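The flood index described above (duration multiplied by severity, summed over the events a participant experienced in a year) can be sketched in a few lines. The event tuples and severity values below are hypothetical; the actual severity scale is given in the paper's supplementary table.

```python
from collections import defaultdict


def annual_flood_index(events):
    """Sum duration x severity per (participant, year).

    events: iterable of (participant_id, year, duration_days, severity).
    """
    index = defaultdict(float)
    for participant_id, year, duration_days, severity in events:
        index[(participant_id, year)] += duration_days * severity
    return dict(index)


# Hypothetical flood events for two participants.
events = [
    ("p1", 2007, 10, 1.0),
    ("p1", 2007, 3, 2.0),
    ("p1", 2012, 5, 1.5),
    ("p2", 2007, 10, 1.0),
]
flood_index = annual_flood_index(events)

# A participant-year with no flood events gets an index of 0.
p2_2012 = flood_index.get(("p2", 2012), 0.0)
print(flood_index, p2_2012)
```

Here participant p1 has an index of 16.0 in 2007 (10 × 1.0 + 3 × 2.0) and 7.5 in 2012.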

Meteorological data

We extracted hourly temperature and relative humidity data from the European Centre for Medium-Range Weather Forecasts Reanalysis v5 (ERA-5) data set, which has a spatial resolution of 0.1°×0.1°. We mapped meteorological data to each participant’s geocoded residential address at baseline. Daily meteorological data were calculated by averaging hourly data within each day. Daily temperature and relative humidity were then aggregated into yearly averages.

Baseline data collected by the UK Biobank include demographics, lifestyle factors, socioeconomic status, and anthropometric measurements. In addition to the variables used for matching cases and controls, we included covariates informed by existing literature [ 27 , 28 , 29 ]: body mass index (BMI), physical activity, healthy diet score, cigarette smoking, alcohol consumption, educational attainment, average total annual household income before tax, Townsend deprivation index (TDI), overall health rating, and assessment centre.

BMI was calculated from objectively measured weight and height as weight divided by height squared and expressed in kg/m². Physical activity was derived from the International Physical Activity Questionnaire-Short Form (IPAQ-SF) [ 30 ]. Participants were categorized as having ‘high’ (≥ 1500 metabolic equivalent (MET)-minutes/week), ‘moderate’ (≥ 600 MET-minutes/week), or ‘low’ levels of physical activity following standardized IPAQ-SF scoring guidance [ 30 ]. The diet score was calculated based on the following dietary factors: vegetable intake ≥ 3 servings/day; fruit intake ≥ 3 servings/day; whole grains ≥ 3 servings/day; refined grains ≤ 1.5 servings/day; fish intake ≥ 2 servings/day; unprocessed red meat intake ≤ 2 servings/week; and processed meat intake ≤ 2 servings/week. One point was given for each favourable dietary factor, and a suboptimal diet was defined as a diet score < 4.

Smoking status was coded into three categories: current, former, and never. Low-risk alcohol consumption was defined as moderate drinking (no more than one drink/day for women and two drinks/day for men; one drink is measured as 8 g ethanol in the UK) on a relatively regular frequency [ 31 ]. Educational attainment was coded in two categories: ‘high’ (college or university degree) or ‘low’ (A/AS levels or equivalent, O levels/GCSEs or equivalent, or none of the above). Annual household income was classified into two groups (< £31,000 and ≥ £31,000). TDI was used to define area deprivation level, with participants classified as either high (TDI above the median) or low [ 32 ]. Self-reported health was categorized as poor, fair, good, or excellent [ 27 ].
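As an illustration of the diet score rule (one point per favourable dietary factor, with a score below 4 counted as suboptimal), a minimal sketch might look like this. The thresholds are the ones listed in the text; the dictionary keys and the example intake are hypothetical.

```python
# Thresholds as listed in the text; key names are hypothetical.
FAVOURABLE_RULES = {
    "vegetables_per_day": lambda x: x >= 3,
    "fruit_per_day": lambda x: x >= 3,
    "whole_grains_per_day": lambda x: x >= 3,
    "refined_grains_per_day": lambda x: x <= 1.5,
    "fish_per_day": lambda x: x >= 2,
    "unprocessed_red_meat_per_week": lambda x: x <= 2,
    "processed_meat_per_week": lambda x: x <= 2,
}


def diet_score(intake):
    """Count how many of the seven dietary factors are in the favourable range."""
    return sum(1 for key, rule in FAVOURABLE_RULES.items() if rule(intake[key]))


def is_suboptimal(intake):
    """A diet is suboptimal when fewer than four factors are favourable."""
    return diet_score(intake) < 4


# Hypothetical participant: favourable on vegetables, whole grains, and red meat only.
example = {
    "vegetables_per_day": 4,
    "fruit_per_day": 1,
    "whole_grains_per_day": 3,
    "refined_grains_per_day": 2,
    "fish_per_day": 0.3,
    "unprocessed_red_meat_per_week": 1,
    "processed_meat_per_week": 3,
}
print(diet_score(example), is_suboptimal(example))
```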

Statistical analysis

We performed conditional logistic regression analysis to estimate the risk of mortality associated with each unit increase in flood index. The year-specific flood index was modelled using a distributed lag non-linear model, which can represent a non-linear exposure-response association together with a lag-response association [ 33 , 34 , 35 , 36 ]. The lag-response association describes how the risk changes over time and provides an estimate of the combined immediate and delayed effects that accumulate throughout the lag period. We first modelled the exposure-response curve with a natural cubic spline with three degrees of freedom. However, the nonlinear analysis indicated an approximately linear relationship (Additional file 1 : Fig. S1). Further, both the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) favoured the linear model (Additional file 1 : Table S2). Therefore, we applied a linear exposure-response relationship in the formal analysis. The lag-response curve was modelled with a natural cubic spline with three degrees of freedom plus an intercept. The exposure window comprised the 0 to 5 years before the index date. A maximum lag of 5 years was used because the flood-related mortality risk declined to zero by lag year 5.
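To make the conditional likelihood concrete, the sketch below fits a conditional logistic model with a single covariate by Newton's method. It is a bare-bones illustration only: it ignores the distributed lag structure and covariates used in the paper, and the function name and toy data are hypothetical. Each matched set contributes the term β·x_case − log Σ exp(β·x_k), where the sum runs over all members of the set.

```python
import math


def conditional_logit_beta(matched_sets, n_iter=50, tol=1e-10):
    """Maximise the conditional likelihood for one covariate.

    matched_sets: list of (x_case, [x_control, ...]) tuples.
    Returns the estimated log odds ratio per unit of x.
    """
    beta = 0.0
    for _ in range(n_iter):
        grad = 0.0
        hess = 0.0
        for x_case, x_controls in matched_sets:
            xs = [x_case] + list(x_controls)
            weights = [math.exp(beta * x) for x in xs]
            total = sum(weights)
            mean = sum(w * x for w, x in zip(weights, xs)) / total
            mean_sq = sum(w * x * x for w, x in zip(weights, xs)) / total
            grad += x_case - mean            # score contribution of this set
            hess += mean_sq - mean ** 2      # information contribution of this set
        if hess == 0.0:
            break  # no within-set variation in x: beta is not identifiable
        step = grad / hess
        beta += step
        if abs(step) < tol:
            break
    return beta


# Toy 1:1 matched data with a binary exposure:
# 6 pairs where only the case is exposed, 3 where only the control is exposed,
# and 4 concordant pairs that carry no information.
pairs = [(1, [0])] * 6 + [(0, [1])] * 3 + [(1, [1])] * 2 + [(0, [0])] * 2
beta_hat = conditional_logit_beta(pairs)
print(round(math.exp(beta_hat), 3))
```

For 1:1 matching with a binary exposure the conditional estimate reduces to the ratio of discordant pairs (6/3 = 2 here), which gives a quick sanity check on the fit.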

Estimates of risk were obtained from the crude model that only included flood (model 1); the multivariate model that additionally controlled for socioeconomic status (education attainment, household income, and deprivation) (model 2); and the full model that additionally adjusted for BMI, physical activity, smoking, alcohol consumption, suboptimal diet, overall health rating, mean temperature, mean relative humidity, and assessment centre, which serves as an indicator of the recruitment location for each participant (model 3). All variance inflation factors were less than 1.5, indicating no substantial multicollinearity. Temperature and relative humidity terms were defined as the average annual mean temperature and relative humidity over the 6 years (lag 0–5 years) preceding the index date, respectively. Given that the crude model (model 1) did not include any covariates, all participants were retained in the analysis. For models 2 and 3, we excluded participants with any missing data. In sensitivity analyses, we employed multiple imputation to address missing covariate data and assess the robustness of our findings.

We further identified subgroups vulnerable to floods through stratified analyses by age group (≤64 and >65 years), sex, weight status defined according to BMI (≤ 24.9, 25–29.9, ≥ 30), education attainment, household income, physical activity, suboptimal diet, alcohol consumption status, smoking status, and area deprivation level. Results are presented as odds ratios (ORs) and their 95% confidence intervals (95% CIs) per unit increase in flood index. The significance of the difference in results between subgroups was tested using a random-effects meta-regression model.

Sensitivity analysis

We carried out the following sensitivity analyses: (1) Multiple imputation by chained equations was used for the missing values. Five imputed data sets were created, and their results were combined using Rubin’s rules [ 37 ]. (2) Alternative degrees of freedom were used for the lag-response association of flood. (3) Alternative degrees of freedom were used for the non-linear exposure-response relationship of mean temperature and relative humidity. (4) Alternative matching ratios (1:4 and 1:6) were used. (5) Data after 2020 were excluded to control for the effect of the COVID-19 pandemic. (6) Additional analyses were performed with a monthly flood index to capture the variation in flooding impacts within the year preceding mortality.
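Rubin's rules for combining results across imputed data sets can be sketched as below. This shows only the pooling of a point estimate and its variance; the function name and the five log odds ratios are hypothetical and are not results from the study.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool per-imputation estimates with Rubin's rules.

    Returns the pooled estimate and its total variance
    T = W + (1 + 1/m) * B, where W is the mean within-imputation variance
    and B is the between-imputation (sample) variance of the estimates.
    """
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(variances)
    between = variance(estimates)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical log odds ratios and squared standard errors from five imputations.
log_ors = [0.064, 0.066, 0.065, 0.063, 0.067]
squared_ses = [4.0e-6, 4.2e-6, 3.9e-6, 4.1e-6, 4.0e-6]

pooled_log_or, total_var = pool_rubin(log_ors, squared_ses)
half_width = 1.96 * math.sqrt(total_var)  # normal approximation for simplicity
print(
    round(math.exp(pooled_log_or), 3),
    round(math.exp(pooled_log_or - half_width), 3),
    round(math.exp(pooled_log_or + half_width), 3),
)
```

A full implementation would use a t reference distribution with Rubin's degrees of freedom for the interval; the normal approximation here keeps the sketch short.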

Table 1 shows the baseline characteristics of the 33,021 death cases and the 258,393 matched controls. The mean age (± standard deviation (SD)) of participants at study entry was 61.3 (± 6.4) years; 170,549 (58.5%) were male; 281,175 (96.5%) were white. Participants who died were more likely to have a higher BMI and lower household income; were less likely to be university graduates; more likely to smoke; and consumed less fruit and vegetables and more red and processed meat. They were also more likely to rate their overall health as poor and fair. Baseline characteristics of cases and controls with any missing values in covariates are shown in Additional file 1 : Table S3.

The distributions of the flood index and meteorological factors are shown in Table 2 . The annual average flood index across all participants during the study period ranged from 0.0 to 38.3, with a median value of 1.8 (25th to 75th percentiles: 0.5 to 3.6). Cases were exposed to higher levels of flooding than controls during the 6 years before the end of follow-up (Additional file 1 : Fig. S2). The median annual mean temperature was 10.0 °C (25th to 75th percentiles: 9.3 °C to 10.7 °C) (Table 2 ). The flood index was negatively correlated with mean temperature (Pearson r = − 0.04) but positively correlated with relative humidity (Pearson r = 0.08).

Figure 2 illustrates the estimated cumulative OR of all-cause and cause-specific mortality associated with each unit increase in flood index over lag years 0–5. Each unit increase in flood index was associated with a 9.2% increased risk of all-cause mortality (OR: 1.092, 95% CI: 1.090–1.093) in the crude model. The results remained similar after further adjustment for socio-economic status (OR: 1.090, 95% CI: 1.088–1.091), whereas adjustment for lifestyle factors decreased the strength of the association (OR for fully adjusted model: 1.067, 95% CI: 1.063–1.071). Similar effects were observed for cause-specific mortality after fully adjusting the models, whereby a greater flood index was associated with a greater risk of death from neurodegenerative diseases (OR: 1.068, 95% CI: 1.050–1.087), neoplasm (OR: 1.063, 95% CI: 1.058–1.068), respiratory diseases (OR: 1.062, 95% CI: 1.045–1.080), suicide (OR: 1.052, 95% CI: 1.018–1.088), cardiovascular diseases (OR: 1.051, 95% CI: 1.042–1.059), mental diseases (OR: 1.047, 95% CI: 1.008–1.087), and digestive diseases (OR: 1.031, 95% CI: 1.011–1.052) (Fig. 2, Additional file 1 : Table S4).

Fig. 2. Cumulative odds ratio of all-cause and cause-specific mortality associated with each unit increase in flood index over lag years 0–5. Estimates of risk were obtained from the crude model that only included flood (crude); the multivariate model that additionally controlled for socioeconomic status (education attainment, household income, and deprivation); and the full model that additionally adjusted for BMI, physical activity, smoking, alcohol consumption, suboptimal diet, overall health rating, mean temperature, mean relative humidity, and assessment centre, which serves as an indicator of the recruitment location for each participant (fully adjusted). The error bars represent 95% confidence intervals

Figure 3 shows the lag structure in the effects of flooding exposure on all-cause and cause-specific mortality. For all-cause mortality, the magnitude of associations increased from the current year (OR: 1.012, 95% CI: 1.011–1.013) to lag year 3 (OR: 1.016, 95% CI: 1.015–1.017), and subsequently diminished to zero by lag year 5. For neurodegenerative mortality and mortality due to mental ill-health, the mortality risk was negligible in the current year, but strongest in lag years 3 and 4. By contrast, the risk of mortality from suicide was strongest in the current year (OR: 1.018, 95% CI: 1.008–1.028), and attenuated by lag year 5.

Fig. 3. Overall lag structure in effects of flooding exposure on cause-specific mortality. Shaded areas represent 95% confidence intervals for the odds ratio

Subgroup analyses revealed that participants with higher levels of education and household income had a higher estimated risk of death from most causes in association with flooding exposure. Participants aged below 64 years and female participants had a higher estimated risk of all-cause mortality and of death from respiratory diseases and neoplasms, but a lower estimated risk of death from digestive and mental diseases, respectively. The risk of suicide-related mortality in association with flooding exposure was higher among participants who were obese, had lower household income, engaged in less physical activity, were non-moderate alcohol consumers, and lived in areas with high deprivation levels (Table 3 ).

Our sensitivity analysis suggested that using multiple imputed data did not change study findings (Additional file 1 : Fig. S3). Our results were not dependent on modelling assumptions and remained unaffected by the COVID-19 pandemic (Additional file 1 : Fig. S4–7). Matching ratios of 1:4 and 1:6 revealed a modest increase in the odds ratio for all-cause mortality and neoplasms, while the odds ratio for other causes of death remained unchanged ( Additional file 1 : Fig. S8 ). For all-cause mortality, with per unit increase in monthly flood index, odds ratios over 0–12 months preceding the mortality ranged from 1.005 (95% CI: 1.005–1.006) to 1.018 (95% CI: 1.017–1.018) (Additional file 1 : Table S5).

In this nested case-control study, we observed a significantly increased risk of mortality associated with floods. The exposure-response curve was linear, with no discernible thresholds. The lag pattern varied across different causes of death. Flooding exposure has a long-lasting impact on neurodegenerative and mental diseases, whereas it has an immediate impact on suicide. Subgroup analyses revealed specific groups of vulnerable populations for flood-related death, which varied according to the cause of death.

Every unit increase in flood index was associated with a 6.7% increase in all-cause mortality risk over the following 6 years. Similar associations were observed for cause-specific mortality. Although few epidemiological studies have assessed the long-term effect of floods on mortality, our findings are consistent with previous findings on short-term flooding exposure showing increased risk of cholera at lag 0–20 weeks [ 38 ], diarrhoea at lag 0–28 weeks [ 38 ], respiratory infection at lag 3 months [ 39 ], typhoid fever at lag 1 week [ 40 ], malaria at lag 1 year [ 12 ], malnutrition at lag 1 year [ 41 ], and mental disorders at lag 6 months [ 19 ]. One study assessing the effects of flooding on mortality in England and Wales during 1994–2005 suggested a deficit of deaths in the post-flood period [ 42 ]. The inconsistency might result from underestimation of the number of deaths, which can occur when deaths are registered elsewhere after displacement, and from the short observation period (one year after flooding exposure), during which deaths had not yet been observed. Milojevic et al. reported a slight but non-significant increase in mortality rates following the floods in Bangladesh in the flooded areas compared to non-flooded areas [ 43 ]. The accuracy of their results might be subject to recall bias in exposure assessment, given that exposure to flooding was ascertained from an interview survey four years after the flood event.

The long-term health deterioration resulting from floods could be attributed to mental health disorders driven by financial losses and community or social disruption, especially for those who live in resource-poor countries and communities (e.g. floodplains or non-resistant buildings, lack of warning systems and awareness of flooding hazard) [ 1 , 44 ]. For example, previous studies reported a significant and continued increase in the prevalence of post-traumatic stress disorder (PTSD), stress, anxiety, depression, and even suicide ideation following flooding exposure [ 20 ], which contribute to worse health outcomes. Additionally, among people who were exposed to floods, those who had chronic medical conditions are at higher risk of health deterioration due to potential disruptions in medication and healthcare services. Previous studies have noted that older adults and those receiving long-term care services showed decreased treatment adherence (e.g. interruption of medication and access to physicians) months and even years after the flooding [ 45 , 46 ], which serve to exacerbate or prolong symptoms of existing conditions.

Flooding exposure has a long-lasting impact on neurodegenerative and mental diseases, reaching its peak at 3–4 years post-exposure, whereas the highest risk of mortality due to suicide occurs in the year of exposure. This suggests that different health issues should be prioritised depending on the stage following flooding exposure. A study in Queensland noted that direct exposure to flooding resulted in an increase in alcohol and tobacco use half a year after flooding [ 46 ], and substance use has been associated with an increased risk of suicide attempts in previous studies [ 47 , 48 , 49 , 50 ]. However, social support and compensation coverage have been shown to have a positive impact on health [ 20 ], which may help reduce the risk of subsequent suicide attempts. By contrast, cognitive decline is more likely to occur 2 years after natural disasters [ 51 , 52 ], resulting from the new onset of depression and disruption of social contacts (e.g. loss of interactions with neighbours) [ 51 ].

Our findings align with previous studies highlighting the vulnerability of cancer patients to disruptions in healthcare services following natural disasters. While limited evidence suggests an increased risk of disease exacerbation among cancer patients post-disaster, our primary concern is the potential for delays in receiving essential cancer care [ 14 , 18 ]. Natural disasters like floods can severely disrupt healthcare systems, leading to damage to oncology centres, loss of medical records, pharmaceutical shortages, displacement of healthcare workers, and disruptions in pathology specimen handling, all of which can compromise cancer patient care [ 13 , 53 , 54 , 55 , 56 ]. The relocation of cancer patients to temporary shelters can be particularly challenging and distressing, especially for those with clinical instability [ 53 ]. Additionally, initial recovery efforts following natural disasters often prioritize immediate needs such as providing shelter, food, water, and addressing injuries from environmental hazards, infectious diseases, or other acute conditions [ 57 ]. This prioritization of immediate needs may inadvertently overlook the continuity of care required for non-acute medical issues like cancer. Given the individualized and continuous nature of cancer treatment, neoplasms are particularly susceptible to the disruptions caused by natural disasters. Our study demonstrates the association between floods and elevated mortality risks among cancer patients, reinforcing the urgent need to prioritize the needs of cancer patients before, during, and after disasters [ 13 , 58 ].

The profiles of populations vulnerable to flood-related mortality varied across causes of death. Of all factors considered, socio-economic status, which is determined by individual levels of education and income, was identified as a significant modifier of flood-related mortality impacts. Individuals with higher socio-economic status tended to have an increased risk of flood-related mortality from chronic diseases (e.g. cardiovascular diseases, respiratory diseases, and neurodegenerative diseases) but a decreased risk of flood-related mortality from suicide. Although evidence that could explain this finding is very limited, some insights can be gathered from the following studies. People in high socio-economic groups are reported to be more likely to be affected by work-life conflict-induced mental illness because of their higher occupational aspirations and a greater discrepancy between aspirations and reality [ 59 , 60 ]. Flooding exposure may further amplify the disparity between an individual’s aspirations and their actual circumstances, resulting in a negative impact on their mental health. Long-lasting psychological illness has been associated with worse chronic medical conditions [ 61 ].

In our study, we observed that participants with higher BMI and lower physical activity levels exhibited a significantly higher risk of flood-related mortality from suicide, but a comparatively lower risk of all-cause mortality. These associations can be attributed to different factors. On the one hand, individuals in low socio-economic groups, those engaged in minimal physical activity, and non-moderate alcohol drinkers are at higher risk of developing suicidal ideation shortly after psychological trauma associated with flooding exposure [ 49 , 62 ]. On the other hand, individuals with higher levels of physical activity may be prone to engage in risk-taking behaviours during flooding events, potentially leading to increased mortality rates [ 63 ]. These behaviours could involve actions such as entering floodwaters to cross a river or stream, safeguarding property and families (e.g. through activities like sandbagging homes and clearing drains), and participating in rescue operations [ 63 ]. Surprisingly, current smokers demonstrated a decreased risk of flood-related all-cause and neoplasm mortality. We acknowledge that residual confounding, arising from unmeasured factors at follow-up, might contribute to these associations. Nevertheless, it is important to note that our study represents the first report of a higher mortality risk after long-term exposure to floods, highlighting the need for further investigations to validate this finding and explore potential underlying reasons.

Based on our research, flooding exposure is responsible for advancing a substantial number of deaths, with the impact persisting for up to 6 years. Our findings suggest that preventive interventions should be implemented during the peri- and post-flooding periods to reduce avoidable deaths due to flooding exposure. Following a flooding event, there is an increased risk of suicide within the first year. Therefore, timely provision of coping support and stress management is crucial to avert psychological illness, particularly among individuals in low socio-economic groups, those engaged in less physical activity, and non-moderate alcohol drinkers. In long-term rehabilitation, more resources should be allocated to addressing the chronic medical conditions of populations that have been exposed to flooding, especially their neurological well-being. It is also crucial to pay attention to the high-income population, although further research is needed to elucidate the mechanisms behind their greater mortality risks associated with flooding exposure.

Some limitations merit consideration. Our participants were UK residents who were more likely to live in less socioeconomically deprived areas; therefore, our results may not be generalizable to the whole population, especially to people in low- and middle-income countries. As in most cohort studies, covariates were collected at enrolment in the biobank. Because information on behavioural changes after the baseline examination was limited, we were unable to exclude the effect of such changes on the risk estimates. However, most of the covariates (e.g. socio-economic status) were considered as effect modifiers rather than confounders; therefore, changes in these factors should not have had a substantial impact on our estimates. While the flood index accounts for cumulative exposure, it does not capture the potential differential impacts of distinct flood phases (warning, event, post-event) on mortality. Further research with a short-term design would help to investigate these impacts. The destructive power of floods can differ based on factors like terrain, altitude, water management, drainage, urbanization, and building design. Therefore, a single severity label for an entire flood event may not fully capture the nuances of varying local experiences. Further research is needed to refine our exposure assessment as more detailed data become available. Lastly, people are likely to move after flooding exposure, and may or may not return to their homes. We assumed that participants did not move, which may have led us to underestimate the effect of floods if individuals moved from an area with a high risk of flooding to an area with a lower risk. However, exposure to flooding can still have a long-term impact on those who relocate to low-risk areas, owing to property damage and financial loss.
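The direction of the bias described for relocation can be illustrated with a toy calculation. The sketch below is not the study's method: it uses a hypothetical binary exposure (flooded or not) instead of the continuous flood index, made-up counts, and assumed error rates that are the same for cases and controls (non-differential misclassification). Under those assumptions, the recorded odds ratio is pulled towards 1.

```python
def odds_ratio(case_exp, case_unexp, ctrl_exp, ctrl_unexp):
    """Odds ratio from the four cells of a 2x2 exposure-by-outcome table."""
    return (case_exp * ctrl_unexp) / (case_unexp * ctrl_exp)


def misclassify(exposed, unexposed, sensitivity, specificity):
    """Expected recorded counts when exposure is measured with error.

    sensitivity: probability that a truly exposed person is recorded as exposed
    specificity: probability that a truly unexposed person is recorded as unexposed
    """
    recorded_exposed = sensitivity * exposed + (1 - specificity) * unexposed
    recorded_unexposed = (1 - sensitivity) * exposed + specificity * unexposed
    return recorded_exposed, recorded_unexposed


# Hypothetical true counts: (exposed, unexposed)
TRUE_CASES = (300, 700)
TRUE_CONTROLS = (200, 800)

# Assumed error rates, identical for cases and controls
SENSITIVITY = 0.8
SPECIFICITY = 0.9

true_or = odds_ratio(*TRUE_CASES, *TRUE_CONTROLS)
obs_cases = misclassify(*TRUE_CASES, SENSITIVITY, SPECIFICITY)
obs_controls = misclassify(*TRUE_CONTROLS, SENSITIVITY, SPECIFICITY)
observed_or = odds_ratio(*obs_cases, *obs_controls)

print(f"true OR = {true_or:.2f}, recorded OR = {observed_or:.2f}")
```

Exposure error that depends on outcome status, or error in a continuous index, can behave differently, so this only illustrates the simplest case.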

In conclusion, this study provides robust epidemiological evidence for associations of long-term exposure to flooding with increased risk of mortality. The health consequences of flooding exposure can vary across different periods after the event. These findings contribute to a better understanding of the long-term impacts of flooding exposure and can help improve public health practices to reduce the disease burden associated with floods.

Availability of data and materials

Data used in this study are available through registration on the UK Biobank.

Abbreviations

Akaike Information Criterion

Bayesian Information Criterion

Body mass index

Confidence interval    

Coronavirus disease 2019

Dartmouth Flood Observatory

European Centre for Medium-Range Weather Forecasts Reanalysis v5

International Classification of Diseases, edition 10

International Physical Activity Questionnaire-Short Form

Metabolic equivalent minutes per week

Post-traumatic stress disorder

Standard deviation

Townsend deprivation index

Lee J, Perera D, Glickman T, Taing LN. Water-related disasters and their health impacts: a global review. Prog Disaster Sci. 2020;8:100123.

Wahlstrom M, Guha-Sapir D. The human cost of weather-related disasters 1995–2015. Geneva: UNISDR. 2015.

Wahlstrom M, Guha-Sapir D. The human cost of natural disasters: a global perspective. Geneva: UNISDR. 2015.

Rubinato M, Nichols A, Peng Y, Zhang JM, Lashford C, Cai YP, et al. Urban and river flooding: Comparison of flood risk management approaches in the UK and China and an assessment of future knowledge needs. Water Sci Eng. 2019;12(4):274–83.

Pitt M. Learning Lessons From the 2007 Floods. An Independent Review by Sir Michael Pitt. Cabinet Office, London. 2008.

Pregnolato M, Ford A, Wilkinson SM, Dawson RJ. The impact of flooding on road transport: A depth-disruption function. Transp Res D Transp Environ. 2017;55:67–81.

Thorne C. Geographies of UK flooding in 2013/4. Geogr J. 2014;180(4):297–309.

Marsh T, Kirby C, Muchan K, Barker L, Henderson E, Hannaford J. The winter floods of 2015/2016 in the UK-a review: UK:  NERC/Centre for Ecology and Hydrology; 2016.

Wallemacq P, Below R, McClean D. Economic losses, poverty and disasters: 1998-2017: Geneva: United Nations Office for Disaster Risk Reduction; 2018.

Sayers P, Horritt M, Carr S, Kay A, Mauz J, Lamb R, et al. Third UK Climate Change Risk Assessment (CCRA3): Future flood risk. UK: Research undertaken by Sayers and Partners for the Committee on Climate Change; 2020.

Jonkman SN, Kelman I. An analysis of the causes and circumstances of flood disaster deaths. Disasters. 2005;29(1):75–97.

Boyce R, Reyes R, Matte M, Ntaro M, Mulogo E, Metlay JP, et al. Severe Flooding and Malaria Transmission in the Western Ugandan Highlands: Implications for Disease Control in an Era of Global Climate Change. J Infect Dis. 2016;214(9):1403–10.

Man RX, Lack DA, Wyatt CE, Murray V. The effect of natural disasters on cancer care: a systematic review. Lancet Oncol. 2018;19(9):e482–99.

Ryan B, Franklin RC, Burkle FM Jr, Aitken P, Smith E, Watt K, et al. Identifying and Describing the Impact of Cyclone, Storm and Flood Related Disasters on Treatment Management, Care and Exacerbations of Non-communicable Diseases and the Implications for Public Health. PLoS Curr. 2015;7:21–50.

McKinney N, Houser C, Meyer-Arendt K. Direct and indirect mortality in Florida during the 2004 hurricane season. Int J Biometeorol. 2011;55(4):533–46.

Rath B, Young EA, Harris A, Perrin K, Bronfin DR, Ratarh R, et al. Adverse Respiratory Symptoms and Environmental Exposures Among Children and Adolescents Following Hurricane Katrina. Public Health Rep. 2011;126(6):853–60.

Buajaroen H. Management of health care services for flood victims: The case of the shelter at Nakhon Pathom Rajabhat University Central Thailand. Australas Emerg Nurs. 2013;16(3):116–22.

Loehn B, Pou AM, Nuss DW, Tenney J, McWhorter A, DiLeo M, et al. Factors affecting access to head and neck cancer care after a natural disaster: a post-Hurricane Katrina survey. Head Neck. 2011;33(1):37–44.

Alderman K, Turner LR, Tong S. Assessment of the health impacts of the 2011 summer floods in Brisbane. Disaster Med Public Health Prep. 2013;7(4):380–6.

Zhong S, Yang L, Toloo S, Wang Z, Tong S, Sun X, et al. The long-term physical and psychological health impacts of flooding: A systematic mapping. Sci Total Environ. 2018;626:165–94.

Trief PM, Ouimette P, Wade M, Shanahan P, Weinstock RS. Post-traumatic stress disorder and diabetes: co-morbidity and outcomes in a male veterans sample. J Behav Med. 2006;29(5):411–8.

Sharpe I, Davison CM. Climate change, climate-related disasters and mental disorder in low- and middle-income countries: a scoping review. BMJ Open. 2021;11(10):e051908.

Carozza DA, Boudreault M. A global flood risk modeling framework built with climate models and machine learning. J Adv Model Earth Syst. 2021;13(4):e2020MS002221.

Yang Z, Huang W, McKenzie JE, Xu R, Yu P, Ye T, et al. Mortality risks associated with floods in 761 communities worldwide: time series study. BMJ. 2023;383:e075081.

Steenland K, Seals R, Klein M, Jinot J, Kahn HD. Risk estimation with epidemiologic data when response attenuates at high-exposure levels. Environ Health Perspect. 2011;119(6):831–7.

White E, Hunt JR, Casso D. Exposure measurement in cohort studies: the challenges of prospective data collection. Epidemiol Rev. 1998;20(1):43–56.

Ganna A, Ingelsson E. 5 year mortality predictors in 498,103 UK Biobank participants: a prospective population-based study. Lancet. 2015;386(9993):533–40.

Wang M, Zhou T, Song Q, Ma H, Hu Y, Heianza Y, et al. Ambient air pollution, healthy diet and vegetable intakes, and mortality: a prospective UK Biobank study. Int J Epidemiol. 2022;51(4):1243–53.

Wang M, Zhou T, Song Y, Li X, Ma H, Hu Y, et al. Joint exposure to various ambient air pollutants and incident heart failure: a prospective analysis in UK Biobank. European heart journal. 2021;42(16):1582–91.

IPAQ Research Committee. Guidelines for data processing and analysis of the International Physical Activity Questionnaire (IPAQ)-short and long forms. https://www.physio-pedia.com/images/c/c7/Quidelines_for_interpreting_the_IPAQ.pdf (accessed January 20, 2024).

Zhang YB, Chen C, Pan XF, Guo J, Li Y, Franco OH, et al. Associations of healthy lifestyle and socioeconomic status with mortality and incident cardiovascular disease: two prospective cohort studies. BMJ. 2021;373:n604.

Chen H, Cao Y, Ma Y, Xu W, Zong G, Yuan C. Age- and sex-specific modifiable risk factor profiles of dementia: evidence from the UK Biobank. Eur J Epidemiol. 2023;38(1):83–93.

Jung CR, Chung WT, Chen WT, Lee RY, Hwang BF. Long-term exposure to traffic-related air pollution and systemic lupus erythematosus in Taiwan: A cohort study. Sci Total Environ. 2019;668:342–9.

Gasparrini A. Modeling exposure-lag-response associations with distributed lag non-linear models. Stat Med. 2014;33(5):881–99.

Li Y-Z, Huang S-H, Shi S, Chen W-X, Wei Y-F, Zou B-J, et al. Association of long-term particulate matter exposure with all-cause mortality among patients with ovarian cancer: A prospective cohort. Sci Total Environ. 2023;884:163748.

Kriit HK, Andersson EM, Carlsen HK, Andersson N, Ljungman PL, Pershagen G, et al. Using distributed lag non-linear models to estimate exposure lag-response associations between long-term air pollution exposure and incidence of cardiovascular disease. Int J Env Res Pub He. 2022;19(5):2630.

Rubin DB. Multiple imputation for nonresponse in surveys: Canada: Wiley; 2004.

Hashizume M, Wagatsuma Y, Faruque ASG, Hayashi T, Hunter PR, Armstrong B, et al. Factors determining vulnerability to diarrhoea during and after severe floods in Bangladesh. J Water Health. 2008;6(3):323–32.

Saulnier DD, Hanson C, Ir P, Alvesson HM, von Schreeb J. The Effect of Seasonal Floods on Health: Analysis of Six Years of National Health Data and Flood Maps. Int J Env Res Pub He. 2018;15(4):665–77.

Liu ZD, Lao JH, Zhang Y, Liu YY, Zhang J, Wang H, et al. Association between floods and typhoid fever in Yongzhou, China: Effects and vulnerable groups. Environ Res. 2018;167:718–24.

Rodriguez-Llanes JM, Ranjan-Dash S, Mukhopadhyay A, Guha-Sapir D. Flood-Exposure Is Associated with Higher Prevalence of Child Undernutrition in Rural Eastern India. Int J Env Res Pub He. 2016;13(2):210.

Milojevic A, Armstrong B, Kovats S, Butler B, Hayes E, Leonardi G, et al. Long-term effects of flooding on mortality in England and Wales, 1994–2005: controlled interrupted time-series analysis. Environ Health. 2011;10(1):11–9.

Milojevic A, Armstrong B, Hashizume M, McAllister K, Faruque A, Yunus M, et al. Health Effects of Flooding in Rural Bangladesh. Epidemiology. 2012;23(1):107–15.

Ahern M, Kovats RS, Wilkinson P, Few R, Matthies F. Global health impacts of floods: Epidemiologic evidence. Epidemiol Rev. 2005;27:36–46.

Tomio J, Sato H, Mizumura H. Interruption of Medication among Outpatients with Chronic Conditions after a Flood. Prehospital Disaster. 2010;25(1):42–50.

Turner LR, Alderman K, Huang CR, Tong SL. Impact of the 2011 Queensland floods on the use of tobacco, alcohol and medication. Aust Nz J Publ Heal. 2013;37(4):396.

Orri M, Seguin JR, Castellanos-Ryan N, Tremblay RE, Cote SM, Turecki G, et al. A genetically informed study on the association of cannabis, alcohol, and tobacco smoking with suicide attempt. Mol Psychiatr. 2021;26(9):5061–70.

Berlin I, Hakes JK, Hu MC, Covey LS. Tobacco Use and Suicide Attempt: Longitudinal Analysis with Retrospective Reports. Plos One. 2015;10(4):e0122607.

Darvishi N, Farhadi M, Haghtalab T, Poorolajal J. Alcohol-Related Risk of Suicidal Ideation, Suicide Attempt, and Completed Suicide: A Meta-Analysis. Plos One. 2015;10(5):e0126870.

Gobbi G, Atkin T, Zytynski T. Association of Cannabis Use in Adolescence and Risk of Depression, Anxiety, and Suicidality in Young Adulthood: A Systematic Review and Meta-analysis. Jama Psychiat. 2019;76(4):426–34.

Hikichi H, Aida J, Kondo K, Tsuboya T, Matsuyama Y, Subramanian SV, et al. Increased risk of dementia in the aftermath of the 2011 Great East Japan Earthquake and Tsunami. P Natl Acad Sci USA. 2016;113(45):E6911–8.

Akanuma K, Nakamura K, Meguro K, Chiba M, Gutierrez Ubeda SR, Kumai K, et al. Disturbed social recognition and impaired risk judgement in older residents with mild cognitive impairment after the Great East Japan Earthquake of 2011: the Tome Project. Psychogeriatrics. 2016;16(6):349–54.

Morikawa N, Yanagisawa S, Iwasashi H, Tabata T, Abe T, Nakamura R, et al. Cancer patients in the hospital damaged by the Japan earthquake and tsunami. J Clin Oncol. 2012;30(15):e19508-e.

Ullman K. Cancer Care During Natural Disasters. J Natl Cancer I. 2011;103(24):1819–20.

Arrieta MI, Foreman RD, Crook ED, Icenogle ML. Providing Continuity of Care for Chronic Diseases in the Aftermath of Katrina: From Field Experience to Policy Recommendations. Disaster Med Public. 2009;3(3):174–82.

Rosenthal E. How the Cancer Community Fared During Hurricane Sandy's Mid-Atlantic Sweep. Oncology Times. 2012;34(23):8-10.

Prasad AS, Francescutti LH. Natural disasters. International encyclopedia of public health. 2017;5:215-22.

Aitsi-Selmi A, Murray V. Protecting the Health and Well-being of Populations from Disasters: Health and Health Care in The Sendai Framework for Disaster Risk Reduction 2015–2030. Prehospital Disaster. 2016;31(1):74–8.

Cochran DB, Wang EW, Stevenson SJ, Johnson LE, Crews C. Adolescent Occupational Aspirations: Test of Gottfredson’s Theory of Circumscription and Compromise. Career Dev Q. 2011;59(5):412–27.

Kim YM, Cho SI. Socioeconomic status, work-life conflict, and mental health. Am J Ind Med. 2020;63(8):703–12.

Patten SB, Williams JVA, Lavorato DH, Modgill G, Jette N, Eliasziw M. Major depression as a risk factor for chronic disease incidence: longitudinal analyses in a general population cohort. Gen Hosp Psychiat. 2008;30(5):407–13.

Vancampfort D, Hallgren M, Firth J, Rosenbaum S, Schuch FB, Mugisha J, et al. Physical activity and suicidal ideation: A systematic review and meta-analysis. J Affect Disorders. 2018;225:438–48.

Hamilton K, Demant D, Peden AE, Hagger MS. A systematic review of human behaviour in and around floodwater. Int J Disaster Risk Reduction. 2020;47:101561.

Acknowledgements

We gratefully thank all the participants in the UK Biobank, and everyone involved in planning and conducting the study.

Australian Research Council grant DP210102076

Australian National Health and Medical Research Council grant GNT2000581

China Scholarship Council grant 202006010044 (YW)

China Scholarship Council grant 202006010043 (BW)

China Scholarship Council grant 201906210065 (PY)

National Health and Medical Research Council grant GNT2009866 (SL)

National Health and Medical Research Council grant GNT1163693 (YG)

National Health and Medical Research Council grant GNT2008813 (YG)

Author information

Hong Liu, Shanshan Li, and Yuming Guo are co-senior authors.

Authors and Affiliations

School of Public Health and Preventive Medicine, Monash University, Level 2, 553 St Kilda Road, Melbourne, VIC, 3004, Australia

Yao Wu, Danijela Gasevic, Bo Wen, Zhengyu Yang & Pei Yu

Department of Dermatology, Xiangya Hospital, Central South University, Changsha, 410008, Hunan, China

Guowei Zhou, Yan Zhang, Hong Liu, Shanshan Li & Yuming Guo

Department of Biochemistry and Molecular Biology, Monash Biomedicine Discovery Institute, Monash University, Melbourne, VIC, 3800, Australia

Jiangning Song

Contributions

Conceptualization: YG, JS, HL, SL, YW. Methodology: YW, DG. Data collection: BW, ZY, GZ, YZ.

Visualization: YW, BW. Supervision: YG, HL, SL. Writing—original draft: YW. Writing—review and editing: DG, BW, ZY, PY. All authors read and approved the final manuscript.

Authors’ Twitter handles

Yuming Guo: @YumingGuo007

Yao Wu: @Yaoyaowu13

Corresponding author

Correspondence to Yuming Guo .

Ethics declarations

Ethics approval and consent to participate

UK Biobank has ethical approval from the North West Multi-Centre Research Ethics Committee (reference 16/NW/0274).

Consent for publication

Consent for publication was obtained from all included participants.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1:

Table S1. Definitions of different severities of flood events. Table S2. The Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) of nonlinear and linear models. Table S3. Baseline characteristics of cases and matched controls enrolled in UK Biobank, including missing values. Table S4. Cumulative odds ratios of cause-specific mortality associated with per unit increase in flood index over lag years 0–5. Table S5. Cumulative odds ratios of all-cause mortality associated with per unit increase in monthly flood index over lag month 0–12. Figure S1. Nonlinear curves of the associations between flood index and all-cause and cause-specific mortality. Figure S2. Cumulative flood index of cases and controls during the six years before the date of death or the end of the follow-up. Figure S3. Cumulative odds ratios of all-cause and cause-specific mortality associated with per unit increase in flood index over lag years 0–5 using complete data after multiple imputation. Figure S4. Cumulative odds ratios of all-cause and cause-specific mortality associated with per unit increase in flood index over lag years 0–5 using different degrees of freedom for lag-response association of flood index. Figure S5. Cumulative odds ratios of all-cause and cause-specific mortality associated with per unit increase in flood index over lag years 0–5 using different degrees of freedom for mean temperature. Figure S6. Cumulative odds ratios of all-cause and cause-specific mortality associated with per unit increase in flood index over lag years 0–5 using different degrees of freedom for relative humidity. Figure S7. Cumulative odds ratios of all-cause and cause-specific mortality associated with per unit increase in flood index over lag years 0–5 after excluding deaths after 2020. Figure S8. Cumulative odds ratios of all-cause and cause-specific mortality associated with per unit increase in flood index over lag years 0–5 using different matching ratios.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Wu, Y., Gasevic, D., Wen, B. et al. Floods and cause-specific mortality in the UK: a nested case-control study. BMC Med 22 , 188 (2024). https://doi.org/10.1186/s12916-024-03412-0

Received : 27 November 2023

Accepted : 29 April 2024

Published : 07 May 2024

DOI : https://doi.org/10.1186/s12916-024-03412-0

  • Natural disaster

BMC Medicine

ISSN: 1741-7015

nested case control study selection bias

IMAGES

  1. PPT

    nested case control study selection bias

  2. bias in nested case control study

    nested case control study selection bias

  3. Nested case control study design

    nested case control study selection bias

  4. case control study how to select controls

    nested case control study selection bias

  5. NESTED CASE CONTROL STUDY

    nested case control study selection bias

  6. The flowchart of the nested case-control study

    nested case control study selection bias

VIDEO

  1. Selection Bias Example 2

  2. 22 January 2024

  3. Part 1: Presentation

  4. الحلقه 38 : Study Design 6 (Case Control study & Cohort study)

  5. Week 8 : CASE CONTROL STUDY

  6. Mastering Intention to Treat, Per Protocol, and As Treated Analysis

COMMENTS

  1. Nested case-control studies: advantages and disadvantages

    a) The nested case-control study is a retrospective design. b) The study design minimised selection bias compared with a case-control study. c) Recall bias was minimised compared with a case-control study. d) Causality could be inferred from the association between prescription of antipsychotic drugs and venous thromboembolism.

  2. Bias in full cohort and nested case-control studies?

    Fundamentally, a properly executed case-control study nested in a cohort is valid if the corresponding analysis of the full cohort is valid. The mathematics of the likelihoods are the same for both, 5 as Langholz and Richardson 1 point out, and the same software procedures work for both. The only salient difference between the two designs is ...

  3. A Practical Overview of Case-Control Studies in Clinical Practice

    The main advantages of a nested case-control study are as follows: (1) cost reduction and effort minimization, as only a fraction of the parent cohort requires the necessary outcome assessment; (2) reduced selection bias, as both case and control subjects are sampled from the same population; and (3) flexibility in analysis by allowing testing ...

  4. Methodologic considerations in the design and analysis of nested case

    The nested case-control study (NCC) design within a prospective cohort study is used when outcome data are available for all subjects, but the exposure of interest has not been collected, and is difficult or prohibitively expensive to obtain for all subjects. A NCC analysis with good matching procedures yields estimates that are as efficient and unbiased as estimates from the full cohort study.

  5. Are nested case-control studies biased?

    It has been recently asserted that the nested case-control study design, in which case-control sets are sampled from cohort risk sets, can introduce bias ("study design bias") when there are lagged exposures. The bases for this claim include a theoretic and an "empirical evaluation" argument. Both of these arguments are examined and ...

  6. Analysis of Nested Case-Control Study Designs: Revisiting the Inverse

    The bias is defined by the difference between the estimates from the nested case-control sample and the full cohort estimate. For the stratified model, I used histology ( x 1 ) as the only covariate, and tumor stage ( x 2 ), and the age at baseline ( x 3 ) as the stratification factors to allow different baseline hazard functions.

  7. Adjusting for selection bias in retrospective, case-control studies

    Selection bias typically arises when the selection criteria are associated with the risk factor under investigation. We develop a method which produces bias-adjusted estimates for the odds ratio. Our method hinges on 2 conditions. The first is that a variable that separates the risk factor from the selection criteria can be identified.

  8. A flexible matching strategy for matched nested case-control studies

    As noted in prior nested case-control simulation studies, on average, all control selection algorithms produced a small upwardly biased estimate of the association between exposure and prostate cancer incidence ... A simulation study of relative efficiency and bias in the nested case-control study design. Epidemiol Methods, 2 (1) (2013), pp. 85-93.

  9. Application of the matched nested case-control design to the secondary

    A nested case-control study is an efficient design that can be embedded within an existing cohort study or randomised trial. It has a number of advantages compared to the conventional case-control design, and has the potential to answer important research questions using untapped prospectively collected data. We demonstrate the utility of the matched nested case-control design by applying it ...

  10. Bias in Full Cohort and Nested Case-Control Studies? : Epidemiology

    The simulations by Deubner et al appear to show bias in nested case-control studies with lagged measures of exposure. Each step of the simulation seems reasonable. Simulated case-control studies assign case and control status to members of the cohort, preserving their age and work history information. In each simulation, the authors randomly ...

  11. Are nested case-control studies biased?

    It has been recently asserted that the nested case-control study design, in which case-control sets are sampled from cohort risk sets, can introduce bias ("study design bias") when there are lagged exposures. The bases for this claim include a theoretical and an "empirical evaluation" argument. We examined both of these arguments and found them ...

  12. Assessing bias in case-control studies. Proper selection of cases and

    Assessing bias in case-control studies. Proper selection of cases and controls. K Sutton-Tyrrell; ... a nested case-control study, ... Errboe M and Baelum V (2007) Selection bias in case‐control studies on periodontitis: a systematic review, European Journal of Oral Sciences, 10.1111/j.1600-0722.2007.00476.x, 115:5, ...

  13. Potential self-selection bias in a nested case-control study on indoor

    logical studies [2-4]. Only a few studies on selection bias have been reported from nested case-control studies with good background data on responders and non-responders [5,6]. This study is part of an investigation of the impact of indoor environmental factors on the prevalence of asthma and allergy among children in Sweden

  14. Are Nested Case-Control Studies Biased? : Epidemiology

    The nested case-control design is well established as an epidemiologic study design. Nonetheless, a number of articles and letters have appeared recently asserting that the nested case-control study design is susceptible to a form of study design bias. 1-5 Given the theoretical understanding of the validity of the standard nested case-control design, in which case-control sets consist of the ...

  15. PDF Incidence Density Sampling for Nested Case-Control Study Designs

    reduction in costs, data collection efforts, and analysis compared to a full study cohort approach. The nested case-control study achieves all this with a relatively minor loss in statistical efficiency. The nested case-control study minimises selection bias and recall bias (cases and controls may recall past exposure differently) in the study.

  16. 8 Selection Bias in Case-Control Studies

    For many years, there was the misperception that case-control studies were structurally inferior to cohort designs, but as the conceptual basis for the design became more fully understood (Miettinen, 1985), it was recognized that case-control studies just reflect an efficient sampling rather than a census of the source population. While there are distinctive challenges in implementing case ...

  17. Nested case-control studies: advantages and disadvantages

    Answers. Statements a, b, and c are true, whereas d is false. The aim of the study was to investigate whether prescription of antipsychotic drugs was associated with venous thromboembolism. A nested ...

  18. Combining high quality data with rigorous methods: emulation of a

    Emulating a target trial reduces the potential for bias in observational comparative effectiveness research. Owing to feasibility constraints, large cohort studies often use electronic health records without validating key variables or collecting additional data. A case-control design allows researchers to validate, supplement, or collect additional data on key measurements in a much smaller ...

  19. Case-control matching: effects, misconceptions, and recommendations

    Bias introduced by case-control matching is an intentional selection bias. Over the past two decades, a consensus has emerged in epidemiology that causal reasoning, with the help of directed acyclic graphs, has improved our understanding of confounding and its control [5-10]. When confounding is defined by characteristic structures among causal relationships in the source population, the ...

  20. Potential self-selection bias in a nested case-control study on indoor

    In a second step 2,156 of the children were invited to participate in a case-control study. Of these, 198 cases and 202 controls were finally selected. For identifying potential selection bias, information concerning all invited families in the case-control study was obtained from the baseline questionnaire. Results show that there are several ...

  21. Case-Control Studies

    Risk Set Sampling: In the nested case-control study a control would be selected from the population at risk at the point in time when a case was diagnosed. The Rare Outcome Assumption. ... The key to avoiding selection bias is to select the controls by a similar, if not identical, mechanism in order to ensure that the controls provide an ...

  22. Nested case-control study

    A nested case-control (NCC) study is a variation of a case-control study in which cases and controls are drawn from the population in a fully enumerated cohort. [1] Usually, the exposure of interest is only measured among the cases and the selected controls. Thus the nested case-control study is more efficient than the full cohort design.

  23. Control Selection Bias in a Case-Control Study

    Control Selection Bias in a Case-Control Study. The Key to Understanding Selection Bias. In all of the mechanisms that result in selection bias, there is over or under representation of one or more of the exposure-disease categories, i.e., one or more of the cells in the contingency table over or under represents what is true in the source ...

  24. Floods and cause-specific mortality in the UK: a nested case-control study

    Study design and study population. We conducted a nested case-control study within a cohort of participants registered with the UK Biobank study. About 0.5 million residents aged between 37 and 73 years were enrolled in the UK Biobank from 2006 to 2010, from 21 assessment centres across England, Wales, and Scotland.