- Open access
- Published: 01 November 2023

In-hospital fall prediction using machine learning algorithms and the Morse fall scale in patients with acute stroke: a nested case-control study
- Jun Hwa Choi 1,2,
- Eun Suk Choi 1,3 &
- Dougho Park 4,5
BMC Medical Informatics and Decision Making volume 23, Article number: 246 (2023)
Falls are one of the most common accidents in medical institutions, which can threaten the safety of inpatients and negatively affect their prognosis. Herein, we developed a machine learning (ML) model for fall prediction in patients with acute stroke and compared its accuracy with that of the existing fall risk prediction tool, the Morse Fall Scale (MFS).
This was a retrospective nested case-control study. The initial sample comprised 8462 patients admitted to a single cerebrovascular specialty hospital with acute stroke. A total of 156 fall events occurred, and each fall case was randomly matched with six controls. Six ML algorithms were used, namely, regularized logistic regression, support vector machine, naïve Bayes (NB), k-nearest neighbors, random forest, and extreme-gradient boosting (XGB).
We included 156 patients in the fall group and 934 in the non-fall group. The mean ages of the fall and non-fall groups were 68.3 (± 12.2) and 65.3 (± 12.9) years, respectively. The MFS total score was significantly higher in the fall group (54.3 ± 18.3) than in the non-fall group (37.7 ± 14.7). The area under the receiver operating characteristic curve (AUROC) of the MFS in predicting falls was 0.76 (0.73–0.79). XGB had the highest AUROC of 0.85 (0.78–0.92), and XGB and NB had the highest F1 score of 0.44.
Conclusions
The AUROC values of all ML algorithms were similar to that of the MFS in predicting fall risk in patients with acute stroke, allowing for accurate and efficient fall screening.
In-hospital falls are among the most common patient safety incidents in healthcare facilities [ 1 ]. They can increase the length of hospital stay, incur additional healthcare costs, and can even lead to legal disputes between the healthcare providers and patients [ 2 ]. In their multi-center study, Morello et al. [ 3 ] found that in-hospital falls cause an average of eight additional days of stay in hospital and incur an average additional cost of $6669. Furthermore, they negatively affect patients and their families because of increased time and financial burden [ 4 ]. The incidence of falls is particularly high among those with cerebrovascular diseases due to impaired postural stability, decreased sensory function, and motor deficits [ 5 ]. Previous studies have reported high post-stroke fall rates, with 1.8–14% of patients with stroke experiencing falls during hospitalization [ 6 , 7 ].
Medical staff must assess fall risk based on a patient’s characteristics to effectively predict the probability of a fall [ 8 ]. Several fall risk assessment tools exist, such as the St. Thomas Risk Assessment Tool [ 9 ], Hendrich II Fall Risk Model [ 10 ], Johns Hopkins Fall Risk Assessment Revised Tool [ 11 ], and Morse Fall Scale (MFS) [ 12 ]. These tools were developed by identifying and categorizing fall risk factors. However, completing these evaluations requires sufficient staff and time, and the tools do not sufficiently reflect the characteristics of patients with potential risks [ 13 , 14 ]. Among them, the MFS is the most widely utilized tool for assessing the risk of falls in South Korea [ 13 ]; it consists of six items: fall history, secondary diagnosis, use of assistive devices, intravenous or heparin cap, gait, and self-insight related to gait disorders [ 12 , 15 ]. The MFS has been validated in several studies and is considered a reliable tool for measuring fall risk [ 16 , 17 , 18 ]. However, it has limitations in assessing fall risk in uncooperative patients. Therefore, improving fall prediction requires reflecting the characteristics of a patient’s clinical situation and compensating for the limitations of the fall risk assessment tools used in clinical practice [ 19 , 20 ]. Various factors affect the likelihood of falls, and in medical institutions that treat patients with severe disease, a predictive model for disease-specific fall risk factors is essential. Nevertheless, fall risk screening tools alone are not sufficient to prevent in-hospital falls [ 14 ].
In recent years, there has been a rapid increase in medical research based on machine learning (ML) [ 21 ]. It is primarily used for implementing prediction models; however, its scope is expanding to include the classification of disease severity [ 22 ], medical decision-making [ 23 ], and application of newly developed therapeutic interventions [ 24 ]. An advantage of ML-based models is their ability to predict a patient’s prognosis or progress in a specific situation based on data from electronic health records (EHRs) [ 25 ]. ML can integrate clinical information in a meaningful manner, providing medical staff with comprehensive information for fully informed medical decision-making [ 26 ]. Previous studies have shown that ML-based algorithms can produce results equivalent to, or better than, those produced by traditional tools if sufficient data and appropriate algorithms are used [ 27 , 28 ]. However, to the best of our knowledge, no study to date has presented an ML-based fall prediction model for hospitalized patients with acute stroke.
This study had two aims: (1) to develop an ML model for predicting in-hospital fall risk among patients with acute stroke; and (2) to compare the ML model’s predictive performance with that of the existing fall risk assessment tool, the MFS.
Data source and patient inclusion
This retrospective study utilized EHRs to identify patients who were admitted to a single cerebrovascular specialty hospital between January 2016 and June 2022 with a primary diagnosis of acute stroke, as defined by the International Classification of Diseases-10 codes I60–I63. We initially identified 8462 patients. During this period, 156 fall events occurred (1.84%). Because the frequencies of the fall event group and the control group differed markedly, a retrospective nested case-control study was performed using random sampling, an approach frequently used in previous related studies [ 20 , 29 ]. Each fall case was randomly matched with six controls (n = 936), with matching based on admission in the same quarter and ward. Cases with missing values were excluded from the study (Fig. 1 ). To keep the statistical analysis robust, if two or more fall events occurred in one admission, the first fall was used as the index event.
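The matching step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code (their analysis was run in R): the record layout, the keys `id`, `quarter`, and `ward`, and the function name `match_controls` are hypothetical, and sampling without replacement is assumed because the section does not say otherwise.

```python
import random
from collections import defaultdict


def match_controls(cases, candidates, ratio=6, seed=0):
    """Randomly draw `ratio` controls per case from the same (quarter, ward) stratum.

    `cases` and `candidates` are lists of dicts with the hypothetical keys
    'id', 'quarter', and 'ward'. Controls are drawn without replacement.
    """
    rng = random.Random(seed)
    pool = defaultdict(list)
    for row in candidates:
        pool[(row["quarter"], row["ward"])].append(row)
    for rows in pool.values():
        rng.shuffle(rows)  # random order, so popping from the end is a random draw

    matched = {}
    for case in cases:
        stratum = pool[(case["quarter"], case["ward"])]
        if len(stratum) < ratio:
            raise ValueError(f"not enough controls for case {case['id']}")
        matched[case["id"]] = [stratum.pop() for _ in range(ratio)]
    return matched


# Toy records, not study data.
cases = [
    {"id": "F1", "quarter": "2016Q1", "ward": "general"},
    {"id": "F2", "quarter": "2016Q1", "ward": "INCS"},
]
candidates = [
    {"id": f"C{i}", "quarter": "2016Q1", "ward": ward}
    for i, ward in enumerate(["general"] * 8 + ["INCS"] * 7)
]
matched = match_controls(cases, candidates)
```

With 156 cases and a ratio of six, this scheme yields the 936 controls reported above before the exclusion of records with missing values.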

Flowchart of patient inclusion
This study design was reviewed and approved by the Institutional Review Board of Pohang Stroke and Spine Hospital (Approval No. PSSH0475-202108-HR-016-04). The informed consent was waived by the Institutional Review Board of Pohang Stroke and Spine Hospital due to the retrospective nature of this study and anonymity of the database. The study was conducted following the principles of the Declaration of Helsinki.
Study variables
Evaluation indicators assessed during the initial hospitalization were used as the main variables to predict falls. Age and sex were identified as basic information. Body mass index (BMI), hemoglobin level, and albumin level were checked to reflect the patient’s nutritional status. Stroke subtypes were classified as subarachnoid hemorrhage (I60), intracerebral hemorrhage (I61 and I62), or ischemic stroke (I63). Finally, the National Institutes of Health Stroke Scale (NIHSS) score was assessed to identify stroke severity.
As factors reflecting the patient’s status at admission, the admission route, admission method, and ward type were assessed. The admission route was divided into emergency room and outpatient admission, and the admission method was classified into walking, wheelchair, and bedridden. In addition, the initial admission ward was classified into general ward, integrated nursing care service (INCS), and special care units – intensive care unit (ICU) and stroke unit (SU).
Socioeconomic factors were divided into the medical insurance type and residence area. The medical insurance type was classified into medical aid and national health insurance coverage. According to the Korean administrative district, the residential area was divided into “dong” and “eup/myeon.” Accompanying diseases such as hypertension, diabetes, dyslipidemia, arrhythmia, cardiovascular diseases, osteoporosis, degenerative spinal disease, and neurodegenerative brain disease were assessed (Table S1 ). The prescribed drugs were checked against the standard drug code name and the Anatomical Therapeutic Chemical Classification System developed by the World Health Organization. In the fall group, drugs administered on the day of the fall event were included, and in the control group, drugs administered at the time of admission were included. Medications were categorized into antidepressants, anxiolytics, antipsychotics, antiepileptic drugs, and diuretics, and patients were classified as those with no medication in these categories, those taking only one class of medication, and those taking multiple classes of medications.
Finally, our ML models were compared with the existing fall risk prediction tool, the MFS, which was evaluated by skilled nursing staff at the time of patient admission. The total MFS score, a routine fall screening assessment in the setting of this study, was used to predict falls. The list of all variables used in the predictive model is summarized in Table S2 .
Statistical analysis
This study used the R software version 4.3.0 (R Core Team, R Foundation for Statistical Computing, Vienna, Austria) for all statistical and ML analyses. Continuous variables were presented as mean ± standard deviation, and categorical variables were presented as frequencies (percentages). For comparison between the fall and non-fall groups, independent t-tests were performed for continuous variables, and chi-square (trend) tests were performed for categorical variables. P -values of < 0.05 were considered statistically significant. Univariable binary logistic regression models were applied to evaluate the predictive power of the MFS, and the area under the receiver operating characteristic curve (AUROC) was calculated and compared with other ML models.
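As an illustration of how an AUROC can be obtained for a single score such as the MFS total, the sketch below uses the rank-based (Mann-Whitney) definition: the probability that a randomly chosen fall case scores higher than a randomly chosen control, with ties counted as one half. Because a univariable logistic model is a monotone transformation of its single predictor, the AUROC of its fitted probabilities equals the AUROC of the raw score when the coefficient is positive. The function name and the scores are hypothetical and are not study data.

```python
def auroc(scores_pos, scores_neg):
    """AUROC as P(score of a case > score of a control), ties counted as 0.5."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical MFS totals, not study data.
fall_scores = [60, 45, 75, 35]
non_fall_scores = [25, 35, 50, 20, 40]
mfs_auroc = auroc(fall_scores, non_fall_scores)
```

The pairwise loop is quadratic in the sample size, which is acceptable for a sketch; a rank-sum formulation would be preferable for large datasets.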
To investigate the relationship between fall occurrence and variables, a binary logistic regression model was established. Variable selection was performed using stepwise backward elimination, and the Akaike information criterion was used as an estimator of multivariable model fitness. Multicollinearity was assessed using the criterion of sqrt (variance inflation factor) > 2.
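For reference, the multicollinearity criterion can be written out using the standard definition of the variance inflation factor, where $R_j^2$ denotes the coefficient of determination obtained by regressing predictor $j$ on all other predictors (the symbols are introduced here for illustration and do not appear in the original):

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^2},
\qquad
\sqrt{\mathrm{VIF}_j} > 2
\;\Longleftrightarrow\;
\mathrm{VIF}_j > 4
\;\Longleftrightarrow\;
R_j^2 > 0.75 .
```

In other words, a predictor is flagged when more than 75% of its variance is explained by the remaining predictors.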
Data pre-processing and ML process
Data pre-processing was performed for the ML prediction model. First, variables with low frequency and those showing multicollinearity were identified. For continuous variables, centering and scaling were performed. One-hot encoding was applied to convert categorical variables into numeric variables. Data were randomly divided into training and validation data at a 2:1 ratio. To balance the dependent variable, training data were oversampled using the synthetic minority oversampling technique. Six ML algorithms were used for the ML process, namely, regularized logistic regression (RLR), support vector machine (SVM), naïve Bayes (NB), k-nearest neighbors (KNN), random forest (RF), and extreme-gradient boosting (XGB). For internal validation, 10-fold cross-validation was repeated 50 times using training data. Hyperparameter tuning was conducted using a combination of random and grid searches (Table S3). To assess their prediction performance in terms of AUROC, F1 score, sensitivity, specificity, positive predictive value, and negative predictive value, the optimal trained models for each algorithm were applied to the validation data. Finally, feature importance was measured for the RLR, RF, and XGB models (Fig. 2 ). The “caret” package in R software was used for the ML process [ 30 ]. The entire code for this study is provided in the online supplementary materials.
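Two of the pre-processing steps above, centering and scaling of continuous variables and one-hot encoding of categorical variables, can be sketched as follows. This is a minimal Python illustration rather than the authors' R/caret code. The helper names and toy values are hypothetical, and estimating the scaling parameters on the training data only is an assumption that the section does not state.

```python
from statistics import mean, stdev


def fit_scaler(values):
    """Estimate the centering and scaling parameters (mean, sample SD)."""
    return mean(values), stdev(values)


def scale(values, mu, sd):
    """Center and scale values with previously estimated parameters."""
    return [(v - mu) / sd for v in values]


def one_hot(values, levels):
    """Encode each categorical value as a 0/1 indicator vector over `levels`."""
    return [[1 if v == level else 0 for level in levels] for v in values]


# Toy values, not study data.
train_age = [60.0, 70.0, 80.0]
mu, sd = fit_scaler(train_age)                 # assumed: fitted on training data only
scaled_valid = scale([70.0, 90.0], mu, sd)     # validation data reuse the same parameters
ward_levels = ["general", "INCS", "ICU/SU"]
wards = one_hot(["INCS", "general", "ICU/SU"], ward_levels)
```

Reusing the training-set parameters for the validation data keeps information from the held-out third of the sample out of model fitting.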

Frame of machine learning prediction for this study
Baseline characteristics and the Morse fall scale
In our final analysis, there were 156 and 934 patients in the fall and non-fall groups, respectively. Table 1 summarizes the baseline characteristics of the patients in the fall and non-fall groups. The features of the in-hospital falls recorded are summarized in Table 2 .
The mean MFS score was significantly higher in the fall group (54.3 ± 18.3) than in the non-fall group (37.7 ± 14.7). The AUROC of the MFS in predicting falls was 0.76 (0.73–0.79), and the sensitivity and specificity were 0.72 (0.65–0.79) and 0.74 (0.71–0.77), respectively. The cutoff value for predicting falls using the mean MFS score was 42.50 points.
Stepwise logistic regression model
Table 3 presents the final binary logistic regression model after stepwise backward elimination. Figure 3 shows the distribution of adjusted odds ratios (aOR) and 95% confidence intervals (CI) for each variable. Regarding ward type, the INCS was significantly associated with a lower risk of falls, whereas the ICU/SU was associated with a higher risk of falls. In addition, admission with wheelchair ambulation, diabetes, arrhythmia, degenerative spinal diseases, cerebral neurodegenerative diseases, and medications were significantly associated with a higher risk of falls. In comparison, dyslipidemia and alert mental status were significantly associated with a lower risk of falls.
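As a reminder of how such estimates are usually derived from a fitted logistic regression, the relation below gives the adjusted odds ratio and a Wald-type 95% CI for a coefficient estimate $\hat{\beta}_j$ with standard error $\mathrm{SE}(\hat{\beta}_j)$. The use of a Wald interval is an assumption, as the section does not state how the intervals were computed.

```latex
\mathrm{aOR}_j = \exp\!\left(\hat{\beta}_j\right),
\qquad
95\%\ \mathrm{CI} = \exp\!\left(\hat{\beta}_j \pm 1.96\,\mathrm{SE}(\hat{\beta}_j)\right).
```

An aOR below 1 with an interval that excludes 1 corresponds to a significantly lower risk, and an aOR above 1 with an interval that excludes 1 to a significantly higher risk.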

Forest plot of the final stepwise logistic regression model for predicting in-hospital falls
ML prediction
Variables with zero variance, such as osteoporosis, cardiovascular disease, and degenerative spinal diseases, were excluded from the analysis. No multicollinearity, defined as a correlation coefficient of ≥ 0.7, was noted among the continuous variables. The ratios of falls to non-falls in the training and validation datasets were 104:626 and 52:308, respectively. After applying the synthetic minority oversampling technique to the training dataset, the ratio of falls to non-falls became 624:626. Table S4 presents the confusion matrix for all prediction models.
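The oversampling step can be illustrated with a simplified Python sketch of the synthetic minority oversampling technique: each synthetic point is placed at a random position on the segment between a minority sample and one of its nearest minority neighbours. The increase from 104 to 624 falls is consistent with adding five synthetic cases per original fall case (104 × 6 = 624), but the exact settings are not given in this section, so the value of `k`, the number of synthetic points per sample, and the function name `smote` are assumptions.

```python
import math
import random


def smote(minority, n_synthetic_per_sample, k=5, seed=0):
    """Create synthetic minority points by interpolating towards nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for i, x in enumerate(minority):
        # k nearest minority neighbours of x by Euclidean distance, excluding x itself
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: math.dist(x, m))[:k]
        for _ in range(n_synthetic_per_sample):
            nb = rng.choice(neighbours)
            gap = rng.random()  # position along the segment from x to nb
            synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic


# Toy minority class with two scaled features, not study data.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_synthetic_per_sample=5, k=2)
balanced_minority = minority + new_points
```

Oversampling is applied to the training data only, so the validation set keeps the original ratio of falls to non-falls.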
Among the six ML algorithms, XGB had the highest AUROC of 0.85 (0.78–0.92), and XGB and NB had the highest F1 score of 0.44. The KNN-based prediction model had the highest sensitivity of 0.71 (0.58–0.82), whereas XGB had a sensitivity of 0.65 (0.46–0.81). RF showed the highest positive predictive value of 0.85 (0.58–0.96), and KNN showed the highest negative predictive value of 0.94 (0.90–0.96). All ML algorithms showed similar or slightly improved AUROC values compared with the MFS (Table 4 ).
The NIHSS was the most important feature in predicting falls in all three models for which feature importance was measured (RLR, RF, and XGB). Other variables such as age, BMI, albumin, and hemoglobin were also important predictors. Ward type was a significant variable for predicting falls. In addition, medications and arrhythmia were among the top five variables in the RLR model (Fig. 4 ).
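The threshold-based metrics reported here all follow from the four cells of a confusion matrix. The Python sketch below shows the definitions; the counts are hypothetical values chosen only to match the size of the validation set (52 falls and 308 non-falls) and are not taken from Table S4.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute sensitivity, specificity, PPV, NPV, and F1 from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # proportion of falls correctly flagged
    specificity = tn / (tn + fp)   # proportion of non-falls correctly cleared
    ppv = tp / (tp + fp)           # positive predictive value (precision)
    npv = tn / (tn + fn)           # negative predictive value
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": f1,
    }


# Hypothetical counts for a validation set of 52 falls and 308 non-falls.
metrics = classification_metrics(tp=20, fp=15, fn=32, tn=293)
```

Because falls make up only about 14% of the validation set (52 of 360), the F1 score, which ignores true negatives, can remain modest even when specificity and negative predictive value are high.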

Feature importance analysis revealing the top five most important variables in regularized logistic regression (RLR), random forest (RF), and extreme gradient boosting (XGB) algorithms. Stroke severity was the most important for predicting in-hospital falls
This study proposed ML-based models for predicting in-hospital falls in acute stroke using EHRs. The models demonstrated performance comparable to that of the MFS in predicting falls. Previous studies using ML to predict falls in hospitalized patients have reported valid results [ 31 , 32 , 33 ]. Wang et al. [ 34 ] reported robust fall prediction using multi-view ensemble learning with missing values, with an AUROC of 0.81, and Nakatani et al. [ 29 ] presented a natural language processing-based inpatient fall prediction model using EHRs, with an AUROC of 0.84; both values are similar to ours. Our results show that disease-specific variables are essential predictors of falls in this patient group and can improve the accuracy of fall prediction. Furthermore, our findings suggest that ML algorithms can be tailored to specific healthcare settings and disease populations to develop more accurate prediction models. Such prediction models may be critical in reducing fall-related injuries and, ultimately, improving patient outcomes.
Moreover, developing and applying a fall prediction model using ML algorithms has clinical significance in improving the efficiency of medical staff. Nursing staff experience considerable stress and limitations when assessing and intervening for fall risk with assessment tools [ 35 ]. Furthermore, identifying fall risk factors based on the characteristics of each patient requires time and can become an excessive burden [ 36 ]. In actual clinical practice, it is difficult for nursing staff to identify individual fall risk factors for each patient and provide nursing care accordingly. To overcome these limitations, the use of ML algorithms to predict falls provides an easy and fast way to obtain accurate results. Therefore, this approach is clinically significant, enabling nursing staff to predict falls quickly and accurately and intervene accordingly, reducing fall occurrences.
One notable finding among the critical risk factors for falls in patients with acute stroke was the ward type, particularly the INCS. Previous studies in South Korea have yielded inconsistent results regarding the relationship between fall rates and INCS, with some showing higher rates and others showing no significant difference [ 37 , 38 ]. The present study showed that the INCS was associated with a significantly lower risk of falls in patients hospitalized with acute stroke. The characteristics of patients with acute stroke, most of whom show varying degrees of neurological impairment, may have contributed to these results. In cerebrovascular specialty hospitals, the INCS might have focused its fall prevention activities on such disease characteristics. However, more studies are needed to explore this relationship further.
BMI can reflect nutritional status [ 39 ]; our finding that BMI was one of the critical variables for predicting in-hospital falls in patients with acute stroke suggests that falls may occur frequently in patients with low body weight or weakened physical motor function [ 40 ]. Albumin and hemoglobin levels were also found to be important variables for fall risk. Previous studies have reported low albumin levels and anemia as risk factors for falls in patients hospitalized in the acute phase, and these could apply equally to patients with acute stroke. Finally, socioeconomic status, a well-known risk factor, was found to be unrelated to in-hospital falls in this study [ 41 , 42 ]. This may be because this study was conducted in a single region and included only patients with acute stroke. Therefore, we consider that disease characteristics may make a greater contribution to the risk of falls than socioeconomic characteristics.
Medication use, a well-known fall predictor, was another critical variable in our analysis. Previous studies have shown that medication use, including analgesics, sedatives, vasodilators, and muscle relaxants, is a significant risk factor for falls [ 43 , 44 ]. Further, polypharmacy increases fall risk [ 45 ]. This is particularly relevant for patients with acute stroke because they often have comorbid conditions and receive multiple medications, including central nervous system medications, sedatives, and narcotics, all associated with increased fall risk [ 46 ].
In the present study, ensemble models – RF and XGB – showed slightly higher AUROC values but generally lower sensitivity. Conversely, more classical ML algorithms such as the RLR and KNN showed decent AUROC values, along with balanced sensitivity and specificity. This can be attributed to the regularization and relatively simple classification methods overcoming overfitting better than the tree-based ensemble models in this dataset [ 47 ]. However, these results cannot be generalized, and more studies based on various databases are needed. Further, this model is intended as a screening tool to prevent falls and is cost-effective, whereas the cost can be much greater once a fall event occurs. Therefore, even if the sensitivity is relatively low, the high specificity and negative predictive value of the models can help nursing staff identify patients who need more intensive fall-prevention activities during their hospitalization [ 48 ].
This study is the first to develop an ML-based fall prediction model for patients with acute stroke. We were able to present validated results of ML prediction by comparing them with the MFS, which is the most widely utilized existing fall prediction tool. Furthermore, using multiple ML algorithms for prediction made it possible to directly compare each model’s performance.
This study has several limitations. First, this was a single-center study, which may have limited generalizability. More studies using big data from multiple institutions are needed to verify the results and improve generalizability. Second, this retrospective study used EHRs, which might result in ambiguity in defining some variables. Third, the dataset only covered falls during hospitalization for acute stroke and did not provide long-term follow-up outcomes. Fourth, the timing of drug information collection differed between groups. That is, in the fall group, the medication list was identified with the date of the fall event as the index date, but in the non-fall group, it was identified with the admission date as the index date. This may have been a source of bias. Finally, despite various statistical adjustments, the outcome variable, in-hospital falls, has a highly imbalanced ground truth, making it difficult to establish causality.
In this study, the ML algorithms used for predicting in-hospital falls among patients with acute stroke showed valid results. Their prediction performance was comparable to that of the MFS, and they can be readily applied while overcoming the disadvantages of the MFS. Furthermore, the ML models integrate initial clinical information in a meaningful way to enable the construction of prediction models that can be used at the beginning of hospitalization. Therefore, the use of ML models for fall prediction is of great clinical significance in allowing medical staff to perform more accurate and efficient fall screening. Ultimately, this study provides cornerstone data for the practical use of a fall screening model for patients with acute stroke in real clinical settings.
Data Availability
The dataset and entire code supporting the conclusions of this article are included within this article and its additional files.
Abbreviations
aOR: adjusted odds ratio
AUROC: area under the receiver operating characteristic curve
BMI: body mass index
CI: confidence intervals
EHRs: electronic health records
ICU: intensive care unit
INCS: integrated nursing care service
KNN: k-nearest neighbors
MFS: Morse fall scale
ML: machine learning
NB: naïve Bayes
RF: random forest
RLR: regularized logistic regression
SU: stroke unit
SVM: support vector machine
XGB: extreme-gradient boosting
1. Schoberer D, Breimaier HE, Zuschnegg J, Findling T, Schaffer S, Archan T. Fall prevention in hospitals and nursing homes: clinical practice guideline. Worldviews Evid Based Nurs. 2022;19:86–93.
2. Peel NM. Epidemiology of falls in older age. Can J Aging. 2011;30:7–19.
3. Morello RT, Barker AL, Watts JJ, Haines T, Zavarsek SS, Hill KD, et al. The extra resource burden of in-hospital falls: a cost of falls study. Med J Aust. 2015;203:367.
4. Shiffman J. A social explanation for the rise and fall of global health issues. Bull World Health Organ. 2009;87:608–13.
5. Quigley PA. Redesigned fall and injury management of patients with stroke. Stroke. 2016;47:e92–4.
6. Nyberg L, Gustafson Y. Fall prediction index for patients in stroke rehabilitation. Stroke. 1997;28:716–21.
7. Tutuarima JA, van der Meulen JH, de Haan RJ, van Straten A, Limburg M. Risk factors for falls of hospitalized stroke patients. Stroke. 1997;28:297–301.
8. Joint Commission. Preventing falls and fall-related injuries in health care facilities. Sentin Event Alert. 2015;55:1–5.
9. Oliver D, Britton M, Seed P, Martin FC, Hopper AH. Development and evaluation of evidence based risk assessment tool (STRATIFY) to predict which elderly inpatients will fall: case-control and cohort studies. BMJ. 1997;315:1049–53.
10. Hendrich A, Nyhuis A, Kippenbrock T, Soja ME. Hospital falls: development of a predictive model for clinical practice. Appl Nurs Res. 1995;8:129–39.
11. Poe SS, Cvach M, Dawson PB, Straus H, Hill EE. The Johns Hopkins fall risk assessment tool: postimplementation evaluation. J Nurs Care Qual. 2007;22:293–8.
12. Jewell VD, Capistran K, Flecky K, Qi Y, Fellman S. Prediction of falls in acute care using the Morse fall risk scale. Occup Ther Health Care. 2020;34:307–19.
13. Choi EH, Ko MS, Yoo CS, Kim MK. Characteristics of fall events and fall risk factors among inpatients in general hospitals in Korea. J Korean Clin Nurs Res. 2017;23:350–60.
14. Morris R, O’Riordan S. Prevention of falls in hospital. Clin Med (Lond). 2017;17:360–2.
15. Baek S, Piao J, Jin Y, Lee SM. Validity of the Morse Fall Scale implemented in an electronic medical record system. J Clin Nurs. 2014;23:2434–40.
16. Chow SK, Lai CK, Wong TK, Suen LK, Kong SK, Chan CK, et al. Evaluation of the Morse fall scale: applicability in Chinese hospital populations. Int J Nurs Stud. 2007;44:556–65.
17. Kim KS, Kim JA, Choi YK, Kim YJ, Park MH, Kim HY, et al. A comparative study on the validity of fall risk assessment scales in Korean hospitals. Asian Nurs Res. 2011;5:28–37.
18. Urbanetto JS, Pasa TS, Bittencout HR, Franz F, Rosa VP, Magnago TS. Analysis of risk prediction capability and validity of Morse fall scale Brazilian version. Rev Gaucha Enferm. 2017;37:e62200.
19. Olsson E, Löfgren B, Gustafson Y, Nyberg L. Validation of a fall risk index in stroke rehabilitation. J Stroke Cerebrovasc Dis. 2005;14:23–8.
20. Najafpour Z, Godarzi Z, Arab M, Yaseri M. Risk factors for falls in hospital in-patients: a prospective nested case control study. Int J Health Policy Manag. 2019;8:300–6.
21. Weissler EH, Naumann T, Andersson T, Ranganath R, Elemento O, Luo Y, et al. The role of machine learning in clinical research: transforming the future of evidence generation. Trials. 2021;22:537.
22. Park D, Kim BH, Lee SE, Kim DY, Kim M, Kwon HD, et al. Machine learning-based approach for disease severity classification of carpal tunnel syndrome. Sci Rep. 2021;11:17464.
23. Sanchez-Martinez S, Camara O, Piella G, Cikes M, González-Ballester MÁ, Miron M, et al. Machine learning for clinical decision-making: challenges and opportunities in cardiovascular imaging. Front Cardiovasc Med. 2021;8:765693.
24. Lewanowicz A, Wiśniewski M, Oronowicz-Jaśkowiak W. The use of machine learning to support the therapeutic process - strengths and weaknesses. Postep Psychiatr Neurol. 2022;31:167–73.
25. Adkins DE. Machine learning and electronic health records: a paradigm shift. Am J Psychiatry. 2017;174:93–4.
26. Cipriano LE. Evaluating the impact and potential impact of machine learning on medical decision making. Med Decis Making. 2023;43:147–9.
27. Park JH. Machine-learning algorithms based on screening tests for mild cognitive impairment. Am J Alzheimers Dis Other Demen. 2020;35:1533317520927163.
28. Park D, Jeong E, Kim H, Pyun HW, Kim H, Choi YJ, et al. Machine learning-based three-month outcome prediction in acute ischemic stroke: a single cerebrovascular-specialty hospital study in South Korea. Diagnostics (Basel). 2021;11.
29. Nakatani H, Nakao M, Uchiyama H, Toyoshiba H, Ochiai C. Predicting inpatient falls using natural language processing of nursing records obtained from Japanese electronic medical records: case-control study. JMIR Med Inform. 2020;8:e16970.
30. Kuhn M. caret: classification and regression training. R package version 6.0-90. 2021.
31. Lindberg DS, Prosperi M, Bjarnadottir RI, Thomas J, Crane M, Chen Z, et al. Identification of important factors in an inpatient fall risk prediction model to improve the quality of care using EHR and electronic administrative data: a machine-learning approach. Int J Med Inform. 2020;143:104272.
32. Patterson BW, Engstrom CJ, Sah V, Smith MA, Mendonça EA, Pulia MS, et al. Training and interpreting machine learning algorithms to evaluate fall risk after emergency department visits. Med Care. 2019;57:560–6.
33. Thapa R, Garikipati A, Shokouhi S, Hurtado M, Barnes G, Hoffman J, et al. Predicting falls in long-term care facilities: machine learning study. JMIR Aging. 2022;5:e35373.
34. Wang L, Xue Z, Ezeana CF, Puppala M, Chen S, Danforth RL, et al. Preventing inpatient falls with injuries using integrative machine learning prediction: a cohort study. NPJ Digit Med. 2019;2:127.
35. Brians LK, Alexander K, Grota P, Chen RW, Dumas V. The development of the RISK tool for fall prevention. Rehabil Nurs. 1991;16:67–9.
36. Dubbeldam R, Lee YY, Pennone J, Mochizuki L, Le Mouel C. Systematic review of candidate prognostic factors for falling in older adults identified from motion analysis of challenging walking tasks. Eur Rev Aging Phys Act. 2023;20:2.
37. Yoon S-J, Lee C-K, Jin I-S, Kang J-G. Incidence of falls and risk factors of falls in inpatients. Qual Improv Health Care. 2018;24:2–14.
38. Jung YA, Sung KM. A comparison of patients’ nursing service satisfaction, hospital commitment and revisit intention between general care unit and comprehensive nursing care unit. J Korean Acad Nurs Adm. 2018;24.
39. Bechard LJ, Duggan C, Touger-Decker R, Parrott JS, Rothpletz-Puglia P, Byham-Gray L, et al. Nutritional status based on body mass index is associated with morbidity and mortality in mechanically ventilated critically ill children in the PICU. Crit Care Med. 2016;44:1530–7.
40. Yi SW, Kim YM, Won YJ, Kim SK, Kim SH. Association between body mass index and the risk of falls: a nationwide population-based study. Osteoporos Int. 2021;32:1071–8.
41. Kim T, Choi SD, Xiong S. Epidemiology of fall and its socioeconomic risk factors in community-dwelling Korean elderly. PLoS ONE. 2020;15:e0234787.
42. Mikos M, Trybulska A, Czerw A. Falls – The socio-economic and medical aspects important for developing prevention and treatment strategies. Ann Agric Environ Med. 2021;28:391–6.
43. Michalcova J, Vasut K, Airaksinen M, Bielakova K. Inclusion of medication-related fall risk in fall risk assessment tool in geriatric care units. BMC Geriatr. 2020;20:454.
44. de Jong MR, Van der Elst M, Hartholt KA. Drug-related falls in older patients: implicated drugs, consequences, and possible prevention strategies. Ther Adv Drug Saf. 2013;4:147–54.
45. Ie K, Chou E, Boyce RD, Albert SM. Fall risk-increasing drugs, polypharmacy, and falls among low-income community-dwelling older adults. Innov Aging. 2021;5:igab001.
46. Abdollahi M, Whitton N, Zand R, Dombovy M, Parnianpour M, Khalaf K, et al. A systematic review of fall risk factors in stroke survivors: towards improved assessment platforms and protocols. Front Bioeng Biotechnol. 2022;10:910698.
47. Subramanian J, Simon R. Overfitting in prediction models - is it a problem only in high dimensions? Contemp Clin Trials. 2013;36:636–41.
48. Trevethan R. Sensitivity, specificity, and predictive values: foundations, pliabilities, and pitfalls in research and practice. Front Public Health. 2017;5:307.
Acknowledgements
The authors thank all nurses at Pohang Stroke and Spine Hospital for their dedication to patients and their support of this study.
Author information
Authors and affiliations
College of Nursing, Kyungpook National University, 680 Gukchaebosang-ro, Jung-gu, Daegu, 41944, Republic of Korea
Jun Hwa Choi & Eun Suk Choi
Department of Quality Improvement, Pohang Stroke and Spine Hospital, Pohang, Republic of Korea
Jun Hwa Choi
Research Institute of Nursing Science, Kyungpook National University, Daegu, Republic of Korea
Eun Suk Choi
Medical Research Institute, Pohang Stroke and Spine Hospital, 352, Huimang-daero, Nam-gu, Pohang, 37659, Republic of Korea
Dougho Park
Department of Medical Science and Engineering, School of Convergence Science and Technology, Pohang University of Science and Technology, Pohang, Republic of Korea
Dougho Park
Contributions
JHC and DP conceptualized the study. DP contributed to the methodology, performed the formal analysis, and contributed to data curation. ESC validated the results. JHC contributed to the investigation and data curation. JHC and DP wrote the original draft. ESC and DP reviewed and edited the manuscript. JHC and DP contributed to visualization, and ESC supervised the study. All authors have read and approved the final manuscript.
Corresponding authors
Correspondence to Eun Suk Choi or Dougho Park.
Ethics declarations
Competing interests
The authors declare no competing interests.
Ethics approval and consent to participate
This study design was reviewed and approved by the Institutional Review Board (IRB) of Pohang Stroke and Spine Hospital (approval no. PSSH0475-202108-HR-016-04). The informed consent requirement was waived by the IRB owing to the retrospective nature of this study and anonymity of the database. The study was conducted following the principles of the Declaration of Helsinki.
Consent for publication
Not applicable.
Sponsor’s role
None reported.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Electronic supplementary material
Below is the link to the electronic supplementary material.
Supplementary Material 1
Supplementary Material 2
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Choi, J.H., Choi, E.S. & Park, D. In-hospital fall prediction using machine learning algorithms and the Morse fall scale in patients with acute stroke: a nested case-control study. BMC Med Inform Decis Mak 23, 246 (2023). https://doi.org/10.1186/s12911-023-02330-0
Received: 01 July 2023
Accepted: 09 October 2023
Published: 01 November 2023
DOI: https://doi.org/10.1186/s12911-023-02330-0
- Accidental falls
- Machine learning
- Risk assessment
BMC Medical Informatics and Decision Making
ISSN: 1472-6947
- Open access
- Published: 04 November 2023
Prediction of incidence of neurological disorders in HIV-infected persons in Taiwan: a nested case–control study
- Ya-Wei Weng 1 , 2 ,
- Susan Shin-Jung Lee 2 , 3 , 4 ,
- Hung-Chin Tsai 2 , 3 , 4 , 5 ,
- Chih-Hui Hsu 6 &
- Sheng-Hsiang Lin 1 , 6 , 7
BMC Infectious Diseases volume 23, Article number: 759 (2023)
Neurological disorders are still prevalent in HIV-infected people. We aimed to determine the prevalence of neurological disorders and identify their risk factors in HIV-infected persons in Taiwan.
We identified 30,101 HIV-infected people between 2002 and 2016 from the National Health Insurance Research Database in Taiwan, and analyzed the incidence of neurological disorders. We applied a retrospective, nested case–control study design. The individuals with (case group) and without (control group) a neurological disorder were then matched by age, sex and time. Factors associated with neurological disorders were analyzed using a conditional logistic regression model, and a nomogram was generated to estimate the risk of developing a neurological disorder.
The incidence of neurological disorders was 13.67 per 1000 person-years. The incidence remained stable during the observation period despite the use of early treatment and more tolerable modern anti-retroviral therapy. The conditional logistic regression model identified nine clinical factors and comorbidities that were associated with neurological disorders, namely age, substance use, traumatic brain injury, psychiatric illness, HIV-associated opportunistic infections, frequency of emergency department visits, cART adherence, urbanization, and monthly income. These factors were used to establish the nomogram.
Neurological disorders are still prevalent in HIV-infected people in Taiwan. To efficiently identify those at risk, we established a nomogram with nine risk factors. This nomogram could prompt clinicians to initiate further evaluations and management of neurological disorders in this population.
Owing to the widespread use of combination antiretroviral therapy (cART), the life expectancy of individuals infected with human immunodeficiency virus (HIV) has improved and even approaches that of the general population [ 1 ]. However, a gap remains in comorbidity-free years between HIV-infected individuals and the general population [ 2 ]. In addition to comorbidities including cardiovascular diseases, cancers, diabetes, dyslipidemia and chronic renal diseases, which are prevalent in people living with HIV (PLWH) [ 3 , 4 ], neuropsychiatric conditions are also common in PLWH [ 5 ]. The neurological complications of HIV are diverse, and in the early stages of infection can include meningitis, encephalitis and Bell's palsy. Late-stage manifestations include HIV-associated neurocognitive disorders, toxoplasma encephalitis, tuberculous meningitis, cryptococcal meningitis and neurosyphilis [ 6 ]. As with the other HIV-associated comorbidities, HIV-associated neurocognitive disorders are still prevalent in the modern cART era, with an overall prevalence rate of around 45% [ 7 , 8 ]. These disorders can affect the quality of life and contribute to mortality in PLWH [ 9 ]. The pattern of HIV-associated neurocognitive disorders has changed over the past two decades [ 10 ], and the prevalence may be underestimated due to a lack of awareness [ 11 ].
HIV also affects the central nervous system early in infection [ 12 ], and blood–brain barrier disruption has been demonstrated early in the course of primary HIV infection [ 13 ]. Thus, central nervous system infection caused by primary HIV infection or other pathogens (virus, bacteria, fungi) is also a common neurological complication in HIV-infected patients. However, there are limited data about neurological disorders in PLWH in the Asia–Pacific region [ 14 , 15 ].
In Taiwan, cART has been provided free of charge since 1997. However, guidelines for the diagnosis and treatment of HIV/AIDS in Taiwan have recommended initiating cART according to different CD4 cell levels at different times: < 200 cells/mm³ in 2006, < 350 cells/mm³ in 2010, < 500 cells/mm³ in 2013, and "treat all" since 2016. Improvement in treatment coverage for PLWH was also implemented in other countries due to new scientific evidence around HIV treatment during this period [ 16 ]. Several studies have reported that CD4 nadir and CD4 count are predictors of HIV neurological disorders in the era of modern cART [ 17 , 18 , 19 ]. Thus, there may have been dynamic changes or even improvements in neurological disorders in PLWH in Taiwan during this time.
Several clinical factors and comorbidities have been reported to contribute to cognitive impairment in PLWH, including advanced HIV disease [ 17 ], duration of HIV infection [ 20 , 21 ], obesity and diabetes [ 22 ], increased age [ 23 ], and hepatitis C infection [ 23 ]. In addition, alcohol use, substance abuse, traumatic brain injury, sleep disorders and psychiatric illnesses may also predispose to cognitive disorders in PLWH [ 24 ].
In the present study, we aimed to determine the dynamic changes in neurological disorders from 2002 to 2017 and to identify risk factors for neurological disorders in HIV-infected persons in Taiwan across the different treatment strategies used during this period.
Study population and study design
This was a retrospective, population-based, nested case–control study using clinical data retrieved from the Taiwan National Health Insurance Research Database (NHIRD). Patients with a diagnosis of HIV infection during the period from 1 January 2002 to 31 December 2016 were identified in the NHIRD. HIV infection is a notifiable disease in Taiwan, and copayments for medical services can be waived for patients with HIV infection, which helps to ensure the accuracy of the diagnosis in these patients.
Data source
The outcome variable was the incidence of neurological disorders in HIV patients. We excluded individuals with missing age or sex data and those with a neurological disorder recorded before the diagnosis of HIV infection. To estimate the effects of potential covariates on the risk of neurological disorders, a nested case–control study design with age, sex and time matching was applied in this study (Fig. 1). The primary outcome was the incidence of a first diagnosis of a neurological disorder after a diagnosis of HIV. Neurological disorders included neurocognitive disorders and central nervous system infections. The covariates were dyslipidemia, hepatitis C infection, substance use, alcoholism, traumatic brain injury, sleep apnea, sexually transmitted diseases, diabetes mellitus, psychiatric illnesses and HIV-associated opportunistic infections. These covariates were defined as diagnoses recorded at least once during inpatient care or at least twice during ambulatory care within 1 year before the index date. Demographic profile (including sex, birth date, urbanization and monthly income), frequency of emergency department (ED) visits, and cART adherence were also extracted as covariates. The frequency of ED visits was analyzed because a previous study showed that ED visits were primarily driven by disease severity in people with HIV infection [ 25 ]. Adherence to cART was calculated as the proportion of days covered, that is, the number of days of ART coverage during the measurement period divided by the length of the measurement period [ 26 ]. Urbanization level was classified into urban, suburban and rural categories based on five aspects: population density, percentage of residents who were agricultural workers, the number of physicians per 100,000 people, percentage of residents with college or higher education, and percentage of residents aged 65 years or older [ 27 ].
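The proportion of days covered can be illustrated with a short Python sketch. This is only an illustration under stated assumptions, not the authors' SAS code: the function name, the `(dispense_date, days_supply)` record layout, the example fills, and the choice to count overlapping supply days once are all hypothetical.

```python
from datetime import date, timedelta


def proportion_of_days_covered(fills, period_start, period_end):
    """Return the fraction of days in [period_start, period_end] with ART on hand.

    fills: iterable of (dispense_date, days_supply) tuples (hypothetical layout).
    Days covered by more than one fill are counted once (an assumption).
    """
    period_days = (period_end - period_start).days + 1
    if period_days <= 0:
        raise ValueError("period_end must not precede period_start")
    covered = set()
    for dispense_date, days_supply in fills:
        for offset in range(days_supply):
            day = dispense_date + timedelta(days=offset)
            # Only days inside the measurement period contribute.
            if period_start <= day <= period_end:
                covered.add(day)
    return len(covered) / period_days


# Invented example: three 30-day fills in a 90-day measurement period.
example_fills = [
    (date(2015, 1, 1), 30),
    (date(2015, 1, 21), 30),  # overlaps the first fill by 10 days
    (date(2015, 3, 15), 30),  # runs past the end of the period
]
example_pdc = proportion_of_days_covered(
    example_fills, date(2015, 1, 1), date(2015, 3, 31)
)
```

Other conventions exist, for example shifting overlapping supply forward, and they would give a different value for the same fills.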

Fig. 1 Flow chart of the HIV cohort for evaluating the risk of neurological disorders
Diagnoses in the NHIRD are coded based on International Classification of Diseases, Ninth Edition (ICD-9) and Tenth Edition (ICD-10) codes. ICD-9 codes were used between 2002 and 2014, and ICD-10 codes were used between 2015 and 2017. The ICD-9 and ICD-10 codes for the outcomes and covariates are provided in Supplementary Table 1. The end of the observation period was defined as the occurrence of a neurological disorder, the end of 2017, or withdrawal from the National Health Insurance program.
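The covariate rule stated in this section (a diagnosis recorded at least once in inpatient care or at least twice in ambulatory care within 1 year before the index date) could be applied to claims roughly as follows. This Python sketch is illustrative only: the claim tuple layout, the placeholder codes `DX_A` and `DX_B`, the 365-day window, and counting ambulatory visits by distinct date are assumptions; the study's actual code lists are those in Supplementary Table 1.

```python
from datetime import date, timedelta


def has_covariate(claims, codes, index_date, lookback_days=365):
    """Flag a comorbidity from claims in the lookback window before index_date.

    claims: iterable of (claim_date, setting, code), where setting is
            "inpatient" or "ambulatory" (hypothetical layout).
    codes:  set of diagnosis codes that define the covariate.
    """
    window_start = index_date - timedelta(days=lookback_days)
    inpatient_dates = set()
    ambulatory_dates = set()
    for claim_date, setting, code in claims:
        if code not in codes:
            continue
        if not (window_start <= claim_date < index_date):
            continue
        if setting == "inpatient":
            inpatient_dates.add(claim_date)
        elif setting == "ambulatory":
            ambulatory_dates.add(claim_date)
    # At least one inpatient record, or at least two ambulatory records.
    return len(inpatient_dates) >= 1 or len(ambulatory_dates) >= 2


# Placeholder codes, not real ICD codes.
example_codes = {"DX_A", "DX_B"}
example_claims = [
    (date(2014, 3, 1), "ambulatory", "DX_A"),
    (date(2014, 6, 1), "ambulatory", "DX_B"),
    (date(2012, 1, 1), "inpatient", "DX_A"),  # outside the 1-year window
]
example_flag = has_covariate(example_claims, example_codes, date(2015, 1, 1))
```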
This study was conducted after approval by the Institutional Review Board (IRB) of the National Cheng Kung University Hospital (B-EX-109–026). Because personal identification information is encrypted before the data are released to researchers, the IRB waived the requirement for informed consent.
Statistical analysis
Incidence rates were expressed per 1000 person-years of observation from 2002 through 2017. Continuous variables were compared using the Student's t test, and categorical variables were compared using the chi-square test or Fisher's exact test. Variables significantly associated with the risk of neurological disorders in univariate conditional logistic regression analysis were then selected to construct the final multivariate logistic regression model. All statistical analyses were performed using SAS version 9.4 (SAS Institute, Cary, NC). A p value < 0.05 was considered statistically significant.
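An incidence rate per 1000 person-years is the number of first events divided by the total follow-up time, multiplied by 1000. The Python sketch below shows that arithmetic under assumptions: the record layout and the three records are invented, follow-up is taken to end at the earliest of the event, withdrawal, or 31 December 2017, and a 365.25-day year is used. It does not reproduce the study's figures.

```python
from datetime import date

STUDY_END = date(2017, 12, 31)
DAYS_PER_YEAR = 365.25  # assumed convention for converting days to years


def summarize_follow_up(records):
    """Return (events, person_years) for (index_date, event_date, withdrawal_date) records.

    event_date and withdrawal_date may be None (hypothetical layout).
    """
    events = 0
    person_years = 0.0
    for index_date, event_date, withdrawal_date in records:
        candidates = [d for d in (event_date, withdrawal_date, STUDY_END) if d is not None]
        end = min(candidates)
        person_years += max((end - index_date).days, 0) / DAYS_PER_YEAR
        # Count the event only if it is what ended the follow-up.
        if event_date is not None and event_date == end:
            events += 1
    return events, person_years


def incidence_per_1000_person_years(events, person_years):
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    return 1000.0 * events / person_years


# Invented records: one event, one administratively censored, one withdrawal.
example_records = [
    (date(2010, 1, 1), date(2012, 1, 1), None),
    (date(2015, 1, 1), None, None),
    (date(2008, 1, 1), None, date(2009, 1, 1)),
]
example_events, example_py = summarize_follow_up(example_records)
example_rate = incidence_per_1000_person_years(example_events, example_py)
```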
A nomogram is a two-dimensional diagram used to represent a mathematical function involving several predictors [ 28 ]. The variables significantly associated with the risk of neurological disorders in the multivariate logistic regression analysis were used to generate a nomogram.
Demographic and clinical characteristics
A total of 30,101 HIV-infected people were identified from 2002 to 2016, of whom 24,239 were used for further matching. A total of 2132 (8.8%) individuals were diagnosed with neurological disorders during the follow-up period. Of the 2132 HIV-infected people with neurological disorders, 87.27% were male and the mean age (± standard deviation) at diagnosis was 38.5 ± 14.7 years. About 65.45% of these individuals received cART. Among the 2132 individuals, 1168 (54.8%) had central nervous system infections and 997 (46.8%) had neurocognitive disorders. Half of the neurological disorders were identified before the initiation of cART. The proportions of central nervous system infections and neurocognitive disorders were similar before and after starting cART. The overall incidence of neurological disorders was 13.67 per 1000 person-years (Fig. 2). The incidence of central nervous system infections was 7.49 per 1000 person-years, and the incidence of neurocognitive disorders was 6.40 per 1000 person-years. The median time from the index date to a diagnosis of a neurological disorder was 3.6 years. The individuals with (case group) and without (control group) a neurological disorder were then matched by age, sex and time. The cases and controls were selected at a 1:4 ratio (Fig. 1). Table 1 shows the demographic and clinical characteristics of the case (n = 1655) and control (n = 6620) groups.
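A 1:4 matched selection of this kind can be sketched in Python as below. The text does not describe the matching algorithm, so everything here is an assumption made for illustration: exact matching on sex and on a hypothetical `index_year` field standing in for "time", an age tolerance parameter, sampling without reuse of controls, and dropping cases that lack four eligible controls.

```python
import random


def match_controls(cases, pool, ratio=4, age_tolerance=0, seed=0):
    """Return {case_id: [control_id, ...]} with `ratio` controls per matched case.

    cases, pool: lists of dicts with keys "id", "sex", "age", "index_year"
                 (hypothetical layout).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    used = set()
    matched = {}
    for case in cases:
        eligible = [
            person for person in pool
            if person["id"] not in used
            and person["id"] != case["id"]
            and person["sex"] == case["sex"]
            and person["index_year"] == case["index_year"]
            and abs(person["age"] - case["age"]) <= age_tolerance
        ]
        if len(eligible) < ratio:
            continue  # assumption: cases without enough controls are left unmatched
        chosen = rng.sample(eligible, ratio)
        used.update(person["id"] for person in chosen)
        matched[case["id"]] = [person["id"] for person in chosen]
    return matched


# Invented data: two cases, but only one has enough eligible controls.
example_cases = [
    {"id": "case1", "sex": "M", "age": 40, "index_year": 2010},
    {"id": "case2", "sex": "F", "age": 35, "index_year": 2012},
]
example_pool = [
    {"id": f"m{i}", "sex": "M", "age": 40, "index_year": 2010} for i in range(6)
] + [
    {"id": "f0", "sex": "F", "age": 35, "index_year": 2012},
]
example_matches = match_controls(example_cases, example_pool)
```

The reported group sizes are consistent with the stated ratio, since 1655 × 4 = 6620.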

Fig. 2 Incidence rate (per 1000 person-years) of neurological disorders among HIV-infected persons in Taiwan from 2002–2017
Factors associated with neurological disorders in the HIV-infected persons
Risk factors included in conditional logistic regression analysis were age at HIV diagnosis, dyslipidemia, hepatitis C infection, substance use, alcoholism, traumatic brain injury, sleep apnea, sexually transmitted diseases, diabetes mellitus, psychiatric illnesses, HIV-associated opportunistic infections, frequency of ED visits, cART adherence, urbanization level and monthly income. Odds ratios, adjusted odds ratios and their 95% confidence intervals are presented in Table 2. In the univariate analysis, older age, hepatitis C infection, substance use, alcoholism, traumatic brain injury, sexually transmitted diseases, psychiatric illnesses, HIV-associated opportunistic infections, frequency of ED visits, cART adherence, urbanization and monthly income were associated with neurological disorders. Dyslipidemia, sleep apnea and diabetes were not associated with neurological disorders. In the multivariate analysis, hepatitis C infection, alcoholism and sexually transmitted diseases were no longer significant. Because of concerns about confounding by age, we then performed subgroup analyses restricted to younger subjects (arbitrarily defined as less than 40 years of age) and to older subjects (40 years or older). The results are shown in Table 3.
Based on the multivariate analysis results, a nomogram was generated to estimate the risk of developing a neurological disorder (Fig. 3). Summing the risk scores assigned to each factor in the nomogram gives an individual's estimated risk of developing a neurological disorder.

Fig. 3 Nomogram for predicting the development of neurological disorders in HIV-infected persons
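The way a nomogram turns regression coefficients into points and a summed score into a risk can be sketched in Python. This is a generic illustration, not the study's nomogram: the three coefficients and the intercept are invented, only binary predictors are handled, and because a conditional logistic model does not itself estimate a baseline, the intercept used to convert points to an absolute risk is purely hypothetical.

```python
import math

# Hypothetical log-odds coefficients for binary predictors; NOT the study's estimates.
EXAMPLE_COEFS = {
    "substance_use": 0.4,
    "traumatic_brain_injury": 0.8,
    "psychiatric_illness": 0.6,
}
EXAMPLE_INTERCEPT = -3.0  # hypothetical baseline log-odds


def points_table(coefs):
    """Scale coefficients so the strongest predictor is worth 100 points."""
    largest = max(abs(beta) for beta in coefs.values())
    return {name: 100.0 * beta / largest for name, beta in coefs.items()}


def total_points(patient, table):
    """Sum the points of every predictor that is present for the patient."""
    return sum(table[name] for name, present in patient.items() if present)


def risk_from_points(points, coefs, intercept):
    """Map total points back to the linear predictor, then to a probability."""
    largest = max(abs(beta) for beta in coefs.values())
    linear_predictor = intercept + points * largest / 100.0
    return 1.0 / (1.0 + math.exp(-linear_predictor))


example_table = points_table(EXAMPLE_COEFS)
example_patient = {
    "substance_use": True,
    "traumatic_brain_injury": False,
    "psychiatric_illness": True,
}
example_points = total_points(example_patient, example_table)
example_risk = risk_from_points(example_points, EXAMPLE_COEFS, EXAMPLE_INTERCEPT)
```

Because points are a linear rescaling of the coefficients, summing points and converting back gives the same risk as evaluating the logistic model directly.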
In this retrospective nested case–control study, we found several risk factors for neurological disorders in HIV-infected people and then developed a simple risk scoring system to identify those at risk. To the best of our knowledge, this scoring system is the first to be specifically designed for identifying neurological disorders in people infected with HIV. Several clinical factors and comorbidities have been reported to be associated with neurological disorders in HIV-infected people, including the frequency of ED visits [ 29 ], cART adherence [ 30 , 31 ], advanced HIV disease [ 17 ], duration of HIV infection [ 20 , 21 ], and older age [ 23 ]. Comorbidities including obesity, diabetes [ 22 ], hepatitis C infection [ 23 ], alcohol use, substance abuse, traumatic brain injury, sleep disorders and psychiatric illnesses [ 24 ] have also been associated with neurological disorders in HIV-infected people. The large number of factors which can contribute to the development of neurological disorders in this population makes it more complex to predict. Through the proposed nomogram with some basic clinical information, clinicians can identify those at risk and initiate further screening for comorbidities, drug compliance education, or even cognitive function evaluations. This nomogram may serve as a screening tool for identifying risk populations.
Educational attainment [ 32 ], tobacco use [ 33 ], and cART regimen [ 34 , 35 ] can also influence neurocognitive function. Since educational attainment is closely related to the level of income [ 36 , 37 ] and monthly income could be extracted from the NHIRD, we used monthly income as a covariate instead of educational attainment as data on educational attainment are not available in the NHIRD. However, more research is needed to evaluate whether adding more parameters (clinical factors and/or biomarkers) could better predict the development of neurological complications in HIV-infected people.
The incidence of neurological disorders in HIV-infected persons was stable from 2006 to 2017 (13.67 per 1000 person-years), even though early treatment and later a "treat all" policy were applied during this period and more tolerable modern cART was used. This finding is consistent with previous studies in which neurological complications remained prevalent in HIV-infected persons because HIV is neuroinvasive, neurotropic and neurovirulent [ 38 , 39 ]. Thus, neurological manifestations are an important concern among people with HIV infection.
In the subgroup analyses restricted to younger subjects and to older subjects, substance use was significantly associated with neurological disorders in the younger subjects (adjusted HR = 1.45, p = 0.003), but not in the older subjects (adjusted HR = 1.01, p = 0.963). This may be because substance use is typically higher in adolescents and young adults, and the neurological complications of substance use can occur in both acute and early HIV infection [ 40 ]. This should raise awareness of neurological disorders in young HIV-infected people with substance use disorders.
The key strength of this study is the application of a nationwide database to identify predictors of neurological disorders. The high coverage, easy accessibility, and low copayments result in high adherence of beneficiaries to the National Health Insurance program, which minimizes potential selection and information biases.
Some limitations should also be addressed. First, some risk factors for neurological disorders such as low CD4 cell count, high blood viral load, low educational attainment, tobacco use and cART regimen are not included in the NHIRD and could not be incorporated into the scoring system. Both CD4 cell count and blood viral load are important predictors of outcomes in HIV-infected persons [ 17 , 41 ]. In addition, we used HIV-associated opportunistic infections as a proxy for advanced HIV status. Second, the diagnosis of neurological disorders and comorbidities depended on claims data from the NHIRD, and physicians who cared for these patients were not neurologists, which may have led to underestimation of the proportion of neurological disorders. Third, cART adherence was calculated by the proportion of days covered, and the actual adherence rate may have been lower, especially in those with neurological disorders [ 42 , 43 ].
In conclusion, neurological disorders are still prevalent in HIV-infected persons. To efficiently identify those at risk, we established a nomogram with nine risk factors. This nomogram could prompt clinicians to initiate further evaluations and management of neurological disorders.
Availability of data and materials
The de-linked datasets used and/or analysed during the current study are available from the corresponding author on reasonable request. The data are not publicly available because the use of the National Health Insurance Research Database is limited to research purposes only.
Abbreviations
cART: Combination antiretroviral therapy
ED: Emergency department
HIV: Human immunodeficiency virus
ICD-9/ICD-10: International Classification of Diseases, Ninth/Tenth Edition
NHIRD: National Health Insurance Research Database
PLWH: People living with HIV
Antiretroviral Therapy Cohort C. Survival of HIV-positive patients starting antiretroviral therapy between 1996 and 2013: a collaborative analysis of cohort studies. Lancet HIV. 2017;4(8):e349–56.
Increased overall life expectancy but not comorbidityfree years for people with HIV. https://www.croiconference.org/wp-content/uploads/sites/2/resources/2020/program-information/croi2020-program-and-information-guide.pdf .
Pourcher V, Gourmelen J, Bureau I, Bouee S. Comorbidities in people living with HIV: an epidemiologic and economic analysis using a claims database in France. PLoS One. 2020;15(12):e0243529.
Roomaney RA, van Wyk B, Pillay-van Wyk V: Aging with HIV: Increased Risk of HIV Comorbidities in Older Adults. Int J Environ Res Public Health 2022, 19(4).
Owe-Larsson B, Sall L, Salamon E, Allgulander C. HIV infection and psychiatric illness. Afr J Psychiatry (Johannesbg). 2009;12(2):115–28.
Thakur KT, Boubour A, Saylor D, Das M, Bearden DR, Birbeck GL. Global HIV neurology: a comprehensive review. AIDS. 2019;33(2):163–84.
Wang Y, Liu M, Lu Q, Farrell M, Lappin JM, Shi J, Lu L, Bao Y. Global prevalence and burden of HIV-associated neurocognitive disorder: a meta-analysis. Neurology. 2020;95(19):e2610–21.
Wei J, Hou J, Su B, Jiang T, Guo C, Wang W, Zhang Y, Chang B, Wu H, Zhang T. The prevalence of frascati-criteria-based HIV-Associated Neurocognitive Disorder (HAND) in HIV-Infected Adults: a systematic review and meta-analysis. Front Neurol. 2020;11:581346.
Banerjee N, McIntosh RC, Ironson G. Impaired neurocognitive performance and mortality in HIV: assessing the prognostic value of the HIV-dementia scale. AIDS Behav. 2019;23(12):3482–92.
Heaton RK, Franklin DR, Ellis RJ, McCutchan JA, Letendre SL, Leblanc S, Corkran SH, Duarte NA, Clifford DB, Woods SP, et al. HIV-associated neurocognitive disorders before and during the era of combination antiretroviral therapy: differences in rates, nature, and predictors. J Neurovirol. 2011;17(1):3–16.
Ian E, Gwen CL, Soo CT, Melissa C, Chun-Kai H, Eosu K, Hyo-Youl K, Asad K, Scott L, Chung-Ki LP, et al. The burden of HIV-associated neurocognitive disorder (HAND) in the Asia-Pacific region and recommendations for screening. Asian J Psychiatr. 2016;22:182–9.
Valcour V, Chalermchai T, Sailasuta N, Marovich M, Lerdlum S, Suttichom D, Suwanwela NC, Jagodzinski L, Michael N, Spudich S, et al. Central nervous system viral invasion and inflammation during acute HIV infection. J Infect Dis. 2012;206(2):275–82.
Rahimy E, Li FY, Hagberg L, Fuchs D, Robertson K, Meyerhoff DJ, Zetterberg H, Price RW, Gisslen M, Spudich S. Blood-Brain Barrier Disruption Is Initiated During Primary HIV Infection and Not Rapidly Altered by Antiretroviral Therapy. J Infect Dis. 2017;215(7):1132–40.
Chan FCC, Chan P, Chan I, Chan A, Tang THC, Lam W, Fong WC, Lee MP, Li P, Chan GHF. Cognitive screening in treatment-naive HIV-infected individuals in Hong Kong - a single center study. BMC Infect Dis. 2019;19(1):156.
Cysique LA, Letendre SL, Ake C, Jin H, Franklin DR, Gupta S, Shi C, Yu X, Wu Z, Abramson IS, et al. Incidence and nature of cognitive decline over 1 year among HIV-infected former plasma donors in China. AIDS. 2010;24(7):983–90.
Gupta S, Williams B, Montaner J. Realizing the potential of treatment as prevention: global ART policy and treatment coverage. Curr HIV/AIDS Rep. 2014;11(4):479–86.
Ellis RJ, Badiee J, Vaida F, Letendre S, Heaton RK, Clifford D, Collier AC, Gelman B, McArthur J, Morgello S, et al. CD4 nadir is a predictor of HIV neurocognitive impairment in the era of combination antiretroviral therapy. AIDS. 2011;25(14):1747–51.
Walker KA, Brown GG. HIV-associated executive dysfunction in the era of modern antiretroviral therapy: a systematic review and meta-analysis. J Clin Exp Neuropsychol. 2018;40(4):357–76.
Yusuf AJ, Hassan A, Mamman AI, Muktar HM, Suleiman AM, Baiyewu O. Prevalence of HIV-Associated Neurocognitive Disorder (HAND) among Patients Attending a Tertiary Health Facility in Northern Nigeria. J Int Assoc Provid AIDS Care. 2017;16(1):48–55.
Wright EJ, Grund B, Cysique LA, Robertson KR, Brew BJ, Collins G, Shlay JC, Winston A, Read TR, Price RW, et al. Factors associated with neurocognitive test performance at baseline: a substudy of the INSIGHT Strategic Timing of AntiRetroviral Treatment (START) trial. HIV Med. 2015;16(Suppl 1):97–108.
McCutchan JA, Marquie-Beck JA, Fitzsimons CA, Letendre SL, Ellis RJ, Heaton RK, Wolfson T, Rosario D, Alexander TJ, Marra C, et al. Role of obesity, metabolic variables, and diabetes in HIV-associated neurocognitive disorder. Neurology. 2012;78(7):485–92.
McCombe JA, Vivithanaporn P, Gill MJ, Power C. Predictors of symptomatic HIV-associated neurocognitive disorders in universal health care. HIV Med. 2013;14(2):99–107.
Fabbiani M, Ciccarelli N, Castelli V, Soria A, Borghetti A, Colella E, Moschese D, Valsecchi M, Emiliozzi A, Gori A, et al. Hepatitis C virus-related factors associated WITH cognitive performance in HIV-HCV-coinfected patients. J Neurovirol. 2019;25(6):866–73.
Winston A, Spudich S. Cognitive disorders in people living with HIV. Lancet HIV. 2020;7(7):e504–13.
Pezzin LE, Fleishman JA. Is outpatient care associated with lower use of inpatient and emergency care? An analysis of persons with HIV disease. Acad Emerg Med. 2003;10(11):1228–38.
Nau DPJS, VA: Pharmacy Quality Alliance: Proportion of days covered (PDC) as a preferred method of measuring medication adherence. 2012, 6:25.
Chieh Yu Liu, Yunh Tai Hung, Yi Li Chuang, Yi Ju Chen, Wen Shun Weng, Jih Shin Liu, Kung Yee Liang: Incorporating Development Stratification of Taiwan Townships into Sampling Design of Large Scale Health Interview Survey. 2006, 4(1):1-22.
Kattan MW. Nomograms are superior to staging and risk grouping systems for identifying high-risk patients: preoperative application in prostate cancer. Curr Opin Urol. 2003;13(2):111–6.
Tsai YT, Chen YC, Hsieh CY, Ko WC, Ko NY. Incidence of neurological disorders among HIV-infected individuals with universal health care in Taiwan from 2000 to 2010. J Acquir Immune Defic Syndr. 2017;75(5):509–16.
Wright MJ, Woo E, Foley J, Ettenhofer ML, Cottingham ME, Gooding AL, Jang J, Kim MS, Castellon SA, Miller EN, et al. Antiretroviral adherence and the nature of HIV-associated verbal memory impairment. J Neuropsychiatry Clin Neurosci. 2011;23(3):324–31.
Obermeit LC, Morgan EE, Casaletto KB, Grant I, Woods SP. Group HIVNRP: antiretroviral non-adherence is associated with a retrieval profile of deficits in verbal episodic memory. Clin Neuropsychol. 2015;29(2):197–213.
Lovden M, Fratiglioni L, Glymour MM, Lindenberger U, Tucker-Drob EM. Education and cognitive functioning across the life span. Psychol Sci Public Interest. 2020;21(1):6–41.
Kariuki W, Manuel JI, Kariuki N, Tuchman E, O’Neal J, Lalanne GA. HIV and smoking: associated risks and prevention strategies. HIV AIDS (Auckl). 2016;8:17–36.
Ma Q, Vaida F, Wong J, Sanders CA, Kao YT, Croteau D, Clifford DB, Collier AC, Gelman BB, Marra CM, et al. Long-term efavirenz use is associated with worse neurocognitive functioning in HIV-infected patients. J Neurovirol. 2016;22(2):170–8.
Mollan KR, Smurzynski M, Eron JJ, Daar ES, Campbell TB, Sax PE, Gulick RM, Na L, O’Keefe L, Robertson KR, et al. Association between efavirenz as initial therapy for HIV-1 infection and increased risk for suicidal ideation or attempted or completed suicide: an analysis of trial data. Ann Intern Med. 2014;161(1):1–10.
The relationship between education, income, economic freedom and happiness. https://www.shs-conferences.org/articles/shsconf/pdf/2020/03/shsconf_ichtml_2020_03004.pdf .
Carlson R, McChesney CJTE: Income sustainability through educational attainment. 2015, 4(1).
Uwishema O, Ayoub G, Badri R, Onyeaka H, Berjaoui C, Karabulut E, Anis H, Sammour C, Mohammed Yagoub FEA, Chalhoub E. Neurological disorders in HIV: Hope despite challenges. Immun Inflamm Dis. 2022;10(3):e591.
Kranick SM, Nath A. Neurologic complications of HIV-1 infection and its treatment in the era of antiretroviral therapy. Continuum (Minneap Minn). 2012;18(6 Infectious Disease):1319–37.
Weber E, Morgan EE, Iudicello JE, Blackstone K, Grant I, Ellis RJ, Letendre SL, Little S, Morris S, Smith DM, et al. Substance use is a risk factor for neurocognitive deficits and neuropsychiatric distress in acute and early HIV infection. J Neurovirol. 2013;19(1):65–74.
Simpson DM, Haidich AB, Schifitto G, Yiannoutsos CT, Geraci AP, McArthur JC, Katzenstein DA. team As: Severity of HIV-associated neuropathy is associated with plasma HIV-1 RNA levels. AIDS. 2002;16(3):407–12.
Hinkin CH, Castellon SA, Durvasula RS, Hardy DJ, Lam MN, Mason KI, Thrasher D, Goetz MB, Stefaniak M. Medication adherence among HIV+ adults: effects of cognitive dysfunction and regimen complexity. Neurology. 2002;59(12):1944–50.
Weikum D. Neurocognitive Impairment Impacts Hiv Medication Adherence. 2016.
Acknowledgements
We are grateful to the research assistants of the Biostatistics Consulting Center, Clinical Medicine Research Center, National Cheng Kung University Hospital, for providing statistical consulting services.
This work was supported by Kaohsiung Veterans General Hospital (KSVGH110-D08-1 to YWW) and Veterans Affairs Council, Republic of China (VAC112-001).
Author information
Authors and affiliations.
Institute of Clinical Medicine, College of Medicine, National Cheng Kung University, Tainan, Taiwan
Ya-Wei Weng & Sheng-Hsiang Lin
Division of Infectious Disease, Department of Internal Medicine, Kaohsiung Veterans General Hospital, Kaohsiung, Taiwan
Ya-Wei Weng, Susan Shin-Jung Lee & Hung-Chin Tsai
Faculty of Medicine, School of Medicine, National Yang Ming Chiao Tung University, Taipei, Taiwan
Susan Shin-Jung Lee & Hung-Chin Tsai
School of Medicine, College of Medicine, National Sun Yat-Sen University, Kaohsiung, Taiwan
Institute of Biomedical Sciences, National Sun Yat-Sen University, Kaohsiung, Taiwan
Hung-Chin Tsai
Biostatistics Consulting Center, National Cheng Kung University Hospital, College of Medicine, National Cheng Kung University, Tainan, Taiwan
Chih-Hui Hsu & Sheng-Hsiang Lin
Department of Public Health, College of Medicine, National Cheng Kung University, Tainan, Taiwan
Sheng-Hsiang Lin
Contributions
YWW, SJL, HCT and SHL conceived the study and designed the protocol. YWW, CHH and SHL performed the data management and analyses. YWW drafted the paper. SJL and HCT revised the manuscript. SHL provided critical revisions and supervised the paper. All authors contributed to and approved the final paper.
Corresponding author
Correspondence to Sheng-Hsiang Lin .
Ethics declarations
Ethics approval and consent to participate.
This study was approved by the Institutional Review Board (IRB) of the National Cheng Kung University Hospital (B-EX-109–026). Because personal identification information was encrypted before the data were released to researchers, the IRB waived the requirement for informed consent. All methods were carried out in accordance with relevant guidelines and regulations.
Consent for publication
Not applicable.
Competing interests
All authors declare no competing interests.
Additional information
Publisher’s note.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Additional file 1: Supplementary Table 1.
ICD-9 and ICD 10 codes used for neurological disorders and covariates.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article.
Weng, YW., Lee, S.SJ., Tsai, HC. et al. Prediction of incidence of neurological disorders in HIV-infected persons in Taiwan: a nested case–control study. BMC Infect Dis 23 , 759 (2023). https://doi.org/10.1186/s12879-023-08761-4
Received : 05 July 2023
Accepted : 27 October 2023
Published : 04 November 2023
DOI : https://doi.org/10.1186/s12879-023-08761-4
- Neurological disorders
- National health insurance research database
BMC Infectious Diseases
ISSN: 1471-2334

Case-Control Studies

A Nested Case-Control Study
Retrospective and prospective case-control studies.

Suppose a prospective cohort study were conducted among almost 90,000 women for the purpose of studying the determinants of cancer and cardiovascular disease. After enrollment, the women provide baseline information on a host of exposures, and they also provide baseline blood and urine samples that are frozen for possible future use. The women are then followed, and, after about eight years, the investigators want to test the hypothesis that past exposure to pesticides such as DDT is a risk factor for breast cancer. Eight years have passed since the beginning of the study, and 1,439 women in the cohort have developed breast cancer. Since they froze blood samples at baseline, they have the option of analyzing all of the blood samples in order to ascertain exposure to DDT at the beginning of the study before any cancers occurred. The problem is that there are almost 90,000 women and it would cost $20 to analyze each of the blood samples. If the investigators could have analyzed all 90,000 samples, they would have found the results in the table below.
Table of Breast Cancer Occurrence Among Women With or Without DDT Exposure
While 1,439 breast cancers is a disturbing number, it is only 1.6% of the entire cohort, so the outcome is relatively rare, and it is costing a lot of money to analyze the blood specimens obtained from all of the non-diseased women. There is, however, another more efficient alternative, i.e., to use a case-control sampling strategy. One could analyze all of the blood samples from women who had developed breast cancer, but only a sample of the whole cohort in order to estimate the exposure distribution in the population that produced the cases.
If one were to analyze the blood samples of 2,878 of the non-diseased women (twice as many as the number of cases), one would obtain results that would look something like those in the next table.
Odds of Exposure: 360/1,079 in the cases versus 432/2,446 in the non-diseased controls.
Total Samples analyzed = 1,439 + 2,878 = 4,317
Total Cost = 4,317 x $20 = $86,340
With this approach a similar estimate of risk was obtained after analyzing blood samples from only a small sample of the entire population at a fraction of the cost with hardly any loss in precision. In essence, a case-control strategy was used, but it was conducted within the context of a prospective cohort study. This is referred to as a case-control study "nested" within a cohort study.
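The exposure odds quoted above can be turned into an odds ratio with a few lines of arithmetic. The sketch below uses only the four counts given in the text; the 95% confidence interval uses the standard log-scale (Woolf) approximation for an unmatched 2x2 table, which is an assumption added here for illustration and not something the original module computes.

```python
import math

# Counts quoted in the text for the nested case-control sample.
cases_exposed, cases_unexposed = 360, 1079
controls_exposed, controls_unexposed = 432, 2446

# Odds of exposure in each group, and their ratio.
odds_cases = cases_exposed / cases_unexposed
odds_controls = controls_exposed / controls_unexposed
odds_ratio = odds_cases / odds_controls

# Assumption: Woolf's approximation for the standard error of log(OR).
se_log_or = math.sqrt(
    1 / cases_exposed + 1 / cases_unexposed
    + 1 / controls_exposed + 1 / controls_unexposed
)
ci_low = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
ci_high = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)

print(f"odds ratio: {odds_ratio:.2f}")  # prints "odds ratio: 1.89"
print(f"approximate 95% CI: {ci_low:.2f} to {ci_high:.2f}")
```

The width of the interval is driven mostly by the smallest cells, which is why adding more controls beyond a few per case yields diminishing gains in precision.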
Rothman states that one should look upon all case-control studies as being "nested" within a cohort. In other words the cohort represents the source population that gave rise to the cases. With a case-control sampling strategy one simply takes a sample of the population in order to obtain an estimate of the exposure distribution within the population that gave rise to the cases. Obviously, this is a much more efficient design.
It is important to note that, unlike cohort studies, case-control studies do not follow subjects through time. Cases are enrolled at the time they develop disease and controls are enrolled at the same time. The exposure status of each is determined, but they are not followed into the future for further development of disease.
As with cohort studies, case-control studies can be prospective or retrospective. At the start of the study, all cases might have already occurred and then this would be a retrospective case-control study. Alternatively, none of the cases might have already occurred, and new cases will be enrolled prospectively. Epidemiologists generally prefer the prospective approach because it has fewer biases, but it is more expensive and sometimes not possible. When conducted prospectively, or when nested in a prospective cohort study, it is straightforward to select controls from the population at risk. However, in retrospective case-control studies, it can be difficult to select from the population at risk, and controls are then selected from those in the population who didn't develop disease. Using only the non-diseased to select controls as opposed to the whole population means the denominator is not really a measure of disease frequency, but when the disease is rare, the odds ratio using the non-diseased will be very similar to the estimate obtained when the entire population is used to sample for controls. This phenomenon is known as the rare-disease assumption. When case-control studies were first developed, most were conducted retrospectively, and it is sometimes assumed that the rare-disease assumption applies to all case-control studies. However, it actually only applies to those case-control studies in which controls are sampled only from the non-diseased rather than the whole population.
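A small numerical sketch may make the rare-disease assumption concrete. It works with expected counts in a hypothetical cohort (the cohort sizes and risks below are invented for illustration) and compares the risk ratio with the odds ratio obtained when controls represent only the non-diseased versus the whole cohort.

```python
def odds_ratio(a, b, c, d):
    """Cross-product ratio for a 2x2 table: (a / b) / (c / d)."""
    return (a * d) / (b * c)


def compare(n_exposed, n_unexposed, risk_exposed, risk_unexposed):
    """Return (risk ratio, OR with non-diseased controls, OR with whole-cohort controls)."""
    cases_e = n_exposed * risk_exposed
    cases_u = n_unexposed * risk_unexposed
    risk_ratio = risk_exposed / risk_unexposed
    # Controls represent only those who stayed disease-free.
    or_noncases = odds_ratio(cases_e, cases_u,
                             n_exposed - cases_e, n_unexposed - cases_u)
    # Controls represent the whole cohort, so future cases are in the denominator.
    or_cohort = odds_ratio(cases_e, cases_u, n_exposed, n_unexposed)
    return risk_ratio, or_noncases, or_cohort


# Hypothetical cohorts: one with a rare outcome, one with a common outcome.
rare = compare(10_000, 10_000, risk_exposed=0.02, risk_unexposed=0.01)
common = compare(10_000, 10_000, risk_exposed=0.40, risk_unexposed=0.20)

for label, (rr, or_non, or_all) in (("rare", rare), ("common", common)):
    print(f"{label}: RR={rr:.2f}  OR(non-diseased)={or_non:.2f}  OR(whole cohort)={or_all:.2f}")
```

In the rare scenario the two odds ratios nearly coincide; in the common scenario the odds ratio based on the non-diseased drifts away from the risk ratio, while sampling controls from the whole cohort still recovers it.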
The difference between sampling from the whole population and only the non-diseased is that the whole population contains people both with and without the disease of interest. This means that a sampling strategy that uses the whole population as its source must allow for the fact that people who develop the disease of interest can be selected as controls. Students often have a difficult time with this concept. It is helpful to remember that it seems natural that the population denominator includes people who develop the disease in a cohort study. If a case-control study is a more efficient way to obtain the information from a cohort study, then perhaps it is not so strange that the denominator in a case-control study also can include people who develop the disease. This topic is covered in more detail in EP813 Intermediate Epidemiology.
Students usually think of case-control studies as being only retrospective, since the investigators enroll subjects who have developed the outcome of interest. However, case-control studies, like cohort studies, can be either retrospective or prospective. In a prospective case-control study, the investigator still enrolls based on outcome status, but the investigator must wait for the cases to occur.
Content ©2016. All Rights Reserved. Date last modified: June 7, 2016. Wayne W. LaMorte, MD, PhD, MPH
Nested case-control studies: advantages and disadvantages
- Philip Sedgwick , reader in medical statistics and medical education 1
- 1 Centre for Medical and Healthcare Education, St George’s, University of London, London, UK
- p.sedgwick{at}sgul.ac.uk
Researchers investigated whether antipsychotic drugs were associated with venous thromboembolism. A population based nested case-control study design was used. Data were taken from the UK QResearch primary care database consisting of 7 267 673 patients. Cases were adult patients with a first ever record of venous thromboembolism between 1 January 1996 and 1 July 2007. For each case, up to four controls were identified, matched by age, calendar time, sex, and practice. Exposure to antipsychotic drugs was assessed on the basis of prescriptions on, or during the 24 months before, the index date. 1
There were 25 532 eligible cases (15 975 with deep vein thrombosis and 9557 with pulmonary embolism) and 89 491 matched controls. The primary outcome was the odds ratios for venous thromboembolism associated with antipsychotic drugs adjusted for comorbidity and concomitant drug exposure. When adjusted using logistic regression to control for potential confounding, prescription of antipsychotic drugs in the previous 24 months was significantly associated with an increased occurrence of venous thromboembolism compared with non-use (odds ratio 1.32, 95% confidence interval 1.23 to 1.42). The researchers concluded that prescription of antipsychotic drugs was associated with venous thromboembolism in a large primary care population.
Which of the following statements, if any, are true?
a) The nested case-control study is a retrospective design
b) The study design minimised selection bias compared with a case-control study
c) Recall bias was minimised compared with a case-control study
d) Causality could be inferred from the association between prescription of antipsychotic drugs and venous thromboembolism
Statements a, b, and c are true, whereas d is false.
The aim of the study was to investigate whether prescription of antipsychotic drugs was associated with venous thromboembolism. A nested case-control study design was used. The study design was an observational one that incorporated the concept of the traditional case-control study within an established cohort. This design overcomes some of the disadvantages associated with case-control studies, 2 while incorporating some of the advantages of cohort studies. 3 4
Data for the study above were extracted from the UK QResearch primary care database, a computerised register of anonymised longitudinal medical records for patients registered at more than 500 UK general practices. Patient data were recorded prospectively, the database having been updated regularly as patients visited their GP. Cases were all adult patients in the register with a first ever record of venous thromboembolism between 1 January 1996 and 1 July 2007. There were 25 532 cases in total. For each case, up to four controls were identified from the register, matched by age, calendar time, sex, and practice. In total, 89 491 matched controls were obtained. Data relating to prescriptions for antipsychotic drugs on, or during the 24 months before, the index date were extracted for the cases and controls. The index date was the date in the register when venous thromboembolism was recorded for the case. The cases and controls were compared to ascertain whether exposure to prescription of antipsychotic drugs was more common in one group than in the other. Despite the data for the cases and controls being collected prospectively, the nested case-control study is described as retrospective ( a is true) because it involved looking back at events that had already taken place and been recorded in the register.
Selection bias is of particular concern in the traditional case-control study. Described in a previous question, 5 selection bias is the systematic difference between the study participants and the population they are meant to represent with respect to their characteristics, including demographics and morbidity. Cases and controls are often selected through convenience sampling. Cases are typically recruited from hospitals or general practices because they are convenient and easily accessible to researchers. Controls are often recruited from the same hospital clinics or general practices as the cases. Therefore, the selected cases may not be representative of the population of all cases. Equally, the controls might not be representative of otherwise healthy members of the population. The above nested case-control study was population based, with the QResearch primary care database incorporating a large proportion of the UK population. The cases and controls were selected from the database and therefore should be more representative of the population than those in a traditional case-control study. Hence, selection bias was minimised by using the nested case-control study design ( b is true).
The traditional case-control study involves participants recalling information about past exposure to risk factors after identification as a case or control. The study design is prone to recall bias, as described in a previous question. 6 Recall bias is the systematic difference between cases and controls in the accuracy of information recalled. Recall bias will exist if participants have selective preconceptions about the association between the disease and past exposure to the risk factor(s). Cases may, for example, recall information more accurately than controls, possibly because of an association with the disease or outcome. Although in the study above the cases and controls were identified retrospectively, the data for the QResearch primary care database were collected prospectively. Therefore, there was no reason for any systematic differences between groups of study participants in the accuracy of the information collected. Therefore, recall bias was minimised compared with a traditional case-control study ( c is true).
Not all of the patient records in the UK QResearch primary care database were used to explore the association between prescription of antipsychotic drugs and development of venous thromboembolism. A nested case-control study was used instead, with cases and controls matched on age, calendar time, sex, and practice. This was because it was statistically more efficient to control for the effects of age, calendar time, sex, and practice by matching cases and controls on these variables at the design stage, rather than controlling for their potential confounding effects when the data were analysed. The matching variables were considered to be important factors that could potentially confound the association between prescription of antipsychotic drugs and venous thromboembolism, but they were not of interest as potential risk factors in themselves. Matching in case-control studies has been described in a previous question. 7
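The matching step described above can be sketched in a few lines. This is a simplified illustration, not the procedure used in the study: it matches exactly on categorical keys, the field names (`age_band`, `sex`, `practice`) and the helper `match_controls` are hypothetical, and matching on calendar time (controls being registered at the case's index date) is not handled.

```python
import random
from collections import defaultdict


def match_controls(cases, candidates, keys, max_controls=4, seed=0):
    """Pick up to `max_controls` unused candidates that share each case's matching keys."""
    rng = random.Random(seed)
    pool = defaultdict(list)
    for person in candidates:
        pool[tuple(person[k] for k in keys)].append(person)
    for group in pool.values():
        rng.shuffle(group)  # random order so selection within a stratum is random
    matched = {}
    for case in cases:
        group = pool[tuple(case[k] for k in keys)]
        # Pop from the stratum so each control is used at most once.
        n_take = min(max_controls, len(group))
        matched[case["id"]] = [group.pop() for _ in range(n_take)]
    return matched


# Hypothetical records: one case, six eligible candidates, one ineligible (different sex).
cases = [{"id": "c1", "age_band": "60-69", "sex": "F", "practice": 12}]
candidates = [
    {"id": f"p{i}", "age_band": "60-69", "sex": "F", "practice": 12} for i in range(6)
] + [{"id": "q1", "age_band": "60-69", "sex": "M", "practice": 12}]

matched_sets = match_controls(cases, candidates, keys=("age_band", "sex", "practice"))
print([p["id"] for p in matched_sets["c1"]])
```

A case whose stratum holds fewer than four candidates simply receives fewer controls, which mirrors the "up to four controls" wording in the study description.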
Unlike a traditional case-control study, the data in the example above were recorded prospectively. Therefore, it was possible to determine whether prescription of antipsychotic drugs preceded the occurrence of venous thromboembolism. Nonetheless, only association, and not causation, can be inferred from the results of the above nested case-control study ( d is false)—that is, those people who were exposed to prescribed antipsychotic drugs were more likely to have developed venous thromboembolism. This is because the observed association between prescribed antipsychotic drugs and occurrence of venous thromboembolism may have been due to confounding. In particular, it was not possible to measure and then control for, through statistical analysis, all factors that may have affected the occurrence of venous thromboembolism.
The example above is typical of a nested case-control study; the health records for a group of patients that have already been collected and stored in an electronic database are used to explore the association between one or more risk factors and a disease or condition. The management of such databases means it is possible for a variety of studies to be undertaken, each investigating the risk factors associated with different diseases or outcomes. Nested case-control studies are therefore relatively inexpensive to perform. However, the major disadvantage of nested case-control studies is that not all pertinent risk factors are likely to have been recorded. Furthermore, because many different healthcare professionals will be involved in patient care, risk factors and outcome(s) will probably not have been measured with the same accuracy and consistency throughout. It may also be problematic if the diagnosis of the disease or outcome changes with time.
Cite this as: BMJ 2014;348:g1532
Competing interests: None declared.
- Parker C, Coupland C, Hippisley-Cox J. Antipsychotic drugs and risk of venous thromboembolism: nested case-control study. BMJ 2010;341:c4245.
- Sedgwick P. Case-control studies: advantages and disadvantages. BMJ 2014;348:f7707.
- Sedgwick P. Prospective cohort studies: advantages and disadvantages. BMJ 2013;347:f6726.
- Sedgwick P. Retrospective cohort studies: advantages and disadvantages. BMJ 2014;348:g1072.
- Sedgwick P. Selection bias versus allocation bias. BMJ 2013;346:f3345.
- Sedgwick P. What is recall bias? BMJ 2012;344:e3519.
- Sedgwick P. Why match in case-control studies? BMJ 2012;344:e691.
- Published: 23 January 2015
Nested case–control studies: should one break the matching?
- Ørnulf Borgan 1 &
- Ruth Keogh 2
Lifetime Data Analysis volume 21 , pages 517–541 ( 2015 ) Cite this article
In a nested case–control study, controls are selected for each case from the individuals who are at risk at the time at which the case occurs. We say that the controls are matched on study time. To adjust for possible confounding, it is common to match on other variables as well. The standard analysis of nested case–control data is based on a partial likelihood which compares the covariates of each case to those of its matched controls. It has been suggested that one may break the matching of nested case–control data and analyse them as case–cohort data using an inverse probability weighted (IPW) pseudo likelihood. Further, when some covariates are available for all individuals in the cohort, multiple imputation (MI) makes it possible to use all available data in the cohort. In the paper we review the standard method and the IPW and MI approaches, and compare their performance using simulations that cover a range of scenarios, including one and two endpoints.
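The sampling scheme in the abstract can be illustrated with a minimal sketch. The first function draws controls from the risk set at each case's event time. The second computes each person's probability of ever being sampled as a control, following the form of inclusion probability usually attributed to Samuelsen (the product over case times of one minus m divided by the number at risk minus one); that formula is stated here from general knowledge and should be checked against the paper before use. The data layout and function names are hypothetical, and left truncation and additional matching are ignored.

```python
import random


def sample_nested_case_control(cohort, m=1, seed=0):
    """Return [(case_id, [control_ids])], sampling m controls from each case's risk set."""
    rng = random.Random(seed)
    sets = []
    cases = sorted((p for p in cohort if p["event"]), key=lambda p: p["time"])
    for case in cases:
        # At risk: everyone still under follow-up at the case's event time, except the case.
        at_risk = [p["id"] for p in cohort
                   if p["time"] >= case["time"] and p["id"] != case["id"]]
        sets.append((case["id"], rng.sample(at_risk, min(m, len(at_risk)))))
    return sets


def inclusion_probabilities(cohort, m=1):
    """Probability of ever being sampled as a control; cases are assigned 1.0."""
    case_times = [p["time"] for p in cohort if p["event"]]
    probs = {}
    for person in cohort:
        if person["event"]:
            probs[person["id"]] = 1.0
            continue
        never_sampled = 1.0
        for t in case_times:
            if t <= person["time"]:  # person was in the risk set at this case time
                n_at_risk = sum(1 for q in cohort if q["time"] >= t)
                never_sampled *= 1.0 - min(1.0, m / (n_at_risk - 1))
        probs[person["id"]] = 1.0 - never_sampled
    return probs


# Hypothetical cohort: 'time' is the exit time, 'event' marks a case.
cohort = [
    {"id": 1, "time": 5, "event": True},
    {"id": 2, "time": 7, "event": False},
    {"id": 3, "time": 3, "event": False},
    {"id": 4, "time": 9, "event": True},
    {"id": 5, "time": 10, "event": False},
]
sampled_sets = sample_nested_case_control(cohort, m=1)
weights = {i: (1 / p if p > 0 else None)
           for i, p in inclusion_probabilities(cohort, m=1).items()}
print(sampled_sets)
print(weights)
```

Note that a person who later becomes a case (id 4 here) can be sampled as a control for an earlier case, which is a defining feature of risk-set sampling. In an inverse probability weighted analysis only sampled individuals contribute, each weighted by the reciprocal of its inclusion probability.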
Acknowledgments
Most of this research was done when Ørnulf Borgan was visiting the Department of Medical Statistics at London School of Hygiene and Tropical Medicine in the spring of 2014. The department is acknowledged for its hospitality and for providing the best working facilities. We also want to thank Nathalie Støer for letting us use her new R package multipleNCC before it was made publicly available.
Author information
Authors and affiliations.
Department of Mathematics, University of Oslo, P.O.Box 1053, Blindern, 0316, Oslo, Norway
Ørnulf Borgan
Department of Medical Statistics, London School of Hygiene and Tropical Medicine, Keppel Street, London, WC1E 7HT, UK
Corresponding author
Correspondence to Ørnulf Borgan .
Electronic supplementary material
Below is the link to the electronic supplementary material.
Supplementary material 1 (pdf 35 KB)
About this article
Cite this article.
Borgan, Ø., Keogh, R. Nested case–control studies: should one break the matching?. Lifetime Data Anal 21 , 517–541 (2015). https://doi.org/10.1007/s10985-015-9319-y
Received : 01 August 2014
Accepted : 06 January 2015
Published : 23 January 2015
Issue Date : October 2015
DOI : https://doi.org/10.1007/s10985-015-9319-y
- Case–cohort
- Competing risks
- Cox regression
- Inverse probability weighting
- Multiple imputation
- Nested case–control
- Research article
- Open access
- Published: 21 July 2008
Advantages of the nested case-control design in diagnostic research
- Cornelis J Biesheuvel 1 , 2 ,
- Yvonne Vergouwe 1 ,
- Ruud Oudega 1 ,
- Arno W Hoes 1 ,
- Diederick E Grobbee 1 &
- Karel GM Moons 1
BMC Medical Research Methodology volume 8 , Article number: 48 ( 2008 ) Cite this article
Despite its benefits, it is uncommon to apply the nested case-control design in diagnostic research. We aim to show advantages of this design for diagnostic accuracy studies.
We used data from a full cross-sectional diagnostic study comprising a cohort of 1295 consecutive patients who were selected on their suspicion of having deep vein thrombosis (DVT). We drew nested case-control samples from the full study population with case:control ratios of 1:1, 1:2, 1:3 and 1:4 (per ratio 100 samples were taken). We calculated diagnostic accuracy estimates for two tests that are used to detect DVT in clinical practice.
Estimates of diagnostic accuracy in the nested case-control samples were very similar to those in the full study population. For example, for each case:control ratio, the positive predictive value of the D-dimer test was 0.30 in the full study population and 0.30 in the nested case-control samples (median of the 100 samples). As expected, variability of the estimates decreased with increasing sample size.
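The resampling exercise summarised above can be imitated on simulated data. The sketch below is an illustration under stated assumptions, not the authors' code: only the cohort size of 1295, the four ratios, and the 100 samples per ratio come from the text, while the prevalence, sensitivity, and specificity are invented. It weights the sampled controls by the inverse of the known sampling fraction, which is one standard way to recover a predictive value from a nested sample.

```python
import random
import statistics


def simulate_cohort(n, prevalence, sensitivity, specificity, rng):
    """Simulate disease status and a binary index test for n suspected patients."""
    cohort = []
    for _ in range(n):
        disease = rng.random() < prevalence
        p_positive = sensitivity if disease else 1 - specificity
        cohort.append({"disease": disease, "test": rng.random() < p_positive})
    return cohort


def ppv_full(cohort):
    """Positive predictive value in the full cohort."""
    positives = [p for p in cohort if p["test"]]
    return sum(p["disease"] for p in positives) / len(positives)


def ppv_nested(cohort, ratio, rng):
    """PPV from all cases plus `ratio` sampled controls per case, reweighted."""
    cases = [p for p in cohort if p["disease"]]
    non_cases = [p for p in cohort if not p["disease"]]
    controls = rng.sample(non_cases, min(len(non_cases), ratio * len(cases)))
    weight = len(non_cases) / len(controls)  # inverse of the known sampling fraction
    true_pos = sum(p["test"] for p in cases)
    false_pos = weight * sum(p["test"] for p in controls)
    return true_pos / (true_pos + false_pos)


rng = random.Random(42)
# Hypothetical test characteristics; only n = 1295 is taken from the text.
cohort = simulate_cohort(1295, prevalence=0.2, sensitivity=0.95, specificity=0.5, rng=rng)

results = {}
for ratio in (1, 2, 3, 4):
    estimates = [ppv_nested(cohort, ratio, rng) for _ in range(100)]
    results[ratio] = (ppv_full(cohort), statistics.median(estimates))
    print(f"1:{ratio}  full PPV={results[ratio][0]:.3f}  nested median={results[ratio][1]:.3f}")
```

Because the size of the source population is known, the weight is known exactly, and that is what separates this design from a conventional case-control study.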
Our findings support the view that the nested case-control study is a valid and efficient design for diagnostic studies and should also be (re)appraised in current guidelines on diagnostic accuracy research.
In diagnostic research it is essential to determine the accuracy of a test to evaluate its value for medical practice [ 1 ]. Diagnostic test accuracy is assessed by comparing the results of the index test with the results of the reference standard in the same patients. Given the cross-sectional nature of a diagnostic accuracy question, the design may be referred to as a cross-sectional cohort design. The (cohort) characteristic by which the study subjects (cohort members) are selected is 'the suspicion of the target disease', defined by the presence of particular symptoms or signs [ 2 ]. The collected study data allow for calculation of all diagnostic accuracy parameters of the index test, such as sensitivity, specificity, odds ratio, receiver operating characteristic (ROC) curve and predictive values, i.e. the probabilities of presence and absence of the disease given the index test result(s).
Subjects are not always selected on their initial suspicion of having the disease but often on the true presence or absence of the disease among those who underwent the reference test in routine care practice, which merely reflects a cross-sectional case-control design [ 3 , 4 ]. Appraisal of such conventional case-control design in diagnostic accuracy research has been limited due to its problems related to the incorrect sampling of cases and controls [ 3 – 7 ]. These problems may be overcome by applying a nested (cross-sectional) case-control study design, which may be advantageous over a full (cross-sectional) cohort design. The rationale, strengths and limitations of a nested case-control approach in epidemiology studies have widely been discussed in the literature [ 8 – 11 ], but not so much in the context of diagnostic accuracy research [ 6 ].
We therefore aim to show advantages of the nested case-control design for addressing diagnostic accuracy questions and discuss its pros and cons in relation to a conventional case-control design and to the full (cross sectional) cohort design in this domain. We will illustrate this with data from a recently conducted diagnostic accuracy study.
Case-control versus nested case-control design
The essence of a case-control study is that cases with the condition under study arise in a source population and controls are a representative sample of this same source population. The entire population is not studied, which would be a full cohort study or census approach; rather, a random sample from the source population is studied [ 12 ]. A major flaw inherent to case-control studies, described as early as 1959 [ 13 ], is the difficulty of ensuring that cases and controls are a representative sample of the same source population. In a nested case-control study the cases emerge from a well-defined source population and the controls are sampled from that same population. The main difference between a case-control and a nested case-control study is that in the former the cases and controls are sampled from a source population with unknown size, whereas the latter is 'nested' in an existing predefined source population with known sample size. This source population can be a group or cohort of subjects that is followed over time or not.
The term 'cohort' commonly refers to a group of subjects followed over time in etiologic or prognostic research. But in essence, time is no prerequisite for the definition of a cohort. A cohort is a group of subjects that is defined by the same characteristic. This characteristic can be a particular birth year, a particular living area, and also the presence of a particular sign or symptom that makes them suspected of having a particular disease as in diagnostic research. Accordingly, a cross-sectional study can either be a cross-sectional case-control study or a cross-sectional cohort study.
Case-control and nested case-control design in diagnostic accuracy research
In diagnostic accuracy research the case-control design is incorrectly applied when subjects are selected from routine care databases. First, this design commonly leads to biased estimates of diagnostic accuracy of the index test due to referral or (partial) verification bias [ 4 , 14 – 18 ]. In routine care, physicians selectively refer patients for additional tests, including the reference test, based on previous test results. This is good clinical practice but a bad starting point for diagnostic research. As said, for diagnostic research purposes all subjects suspected of the target disease preferably undergo the index test(s) plus reference test irrespective of previous test results. Second, selection of patients with a negative reference test result as 'controls' may lead to inclusion of controls that correspond to a different clinical domain, i.e. patients who underwent the reference test but not necessarily because they were similarly suspected of the target condition [ 16 , 17 ]. A third disadvantage of such case-control design is that absolute probabilities of disease presence given the index test results, i.e. the predictive values or post-test probabilities, that are the desired parameters for patient care, cannot be obtained. Cases and controls are sampled from a source population of unknown size. The total number of patients that were initially suspected of the target disease based on the presence of symptoms or signs, i.e. the true source population, is commonly unknown as in routine care patients are hardly classified by their symptoms and signs at presentation [ 18 ]. Hence, the sampling fraction of cases and controls is unknown and valid estimates of the absolute probabilities of disease presence cannot be calculated [ 12 ].
A nested case-control study in diagnostic research includes the full population or cohort of patients suspected of the target disease. The 'true' disease status is obtained for all these patients with the reference standard. Hence, there is no referral or partial verification bias. The results of the index tests can then be obtained for all subjects with the target condition but only for a sample of the subjects without the target condition. Usually all patients with the target disease are included, but this could as well be a sample of the cases. Besides the absence of bias, all measures of diagnostic accuracy, including the positive and negative predictive values, can simply be obtained by weighting the controls with the inverse of the case-control sampling fraction, as explained in Figure 1 .

Theoretical example of a full study population and a nested case-control sample . The index test result and the outcome are obtained for all patients of the study population. The case-control ratio was 1:4 (sampling fraction (SF) = 160/400 = 0.40). Valid diagnostic accuracy measures can be obtained from the nested case-control sample by multiplying the controls by 1/sampling fraction. For example, the positive predictive value (PPV) of a full study population can be calculated with a/(a + b), in this example 30/(30 + 100) = 0.23. In a nested case-control sample the PPV is calculated with a/(a + (1/SF)*b), in this example: 30/(30 + 2.5*40) = 0.23. In a case-control sample, however, the controls are sampled from a source population with unknown size. Therefore, the sampling fraction is unknown and a valid estimate of the PPV cannot be calculated.
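The reweighting described for Figure 1 can be sketched in a few lines of Python. This is a minimal illustration and not the authors' analysis code: the function name `weighted_ppv` and its argument names are assumptions introduced here, and the numbers are those of the theoretical example.

```python
def weighted_ppv(true_pos, false_pos_sampled, sampling_fraction):
    """PPV from a nested case-control sample.

    All cases are kept, so true positives are counted as observed.
    False positives come from the sampled controls and are scaled up
    by 1 / sampling_fraction so that they represent the full cohort.
    """
    if sampling_fraction <= 0:
        raise ValueError("sampling_fraction must be positive")
    weighted_false_pos = false_pos_sampled / sampling_fraction
    return true_pos / (true_pos + weighted_false_pos)


# Full study population: a / (a + b)
full_cohort_ppv = 30 / (30 + 100)

# Nested case-control sample: a / (a + (1/SF) * b) with SF = 160/400
nested_ppv = weighted_ppv(30, 40, 160 / 400)

print(round(full_cohort_ppv, 2), round(nested_ppv, 2))  # prints: 0.23 0.23
```

Without a known sampling fraction, the third argument cannot be supplied, which is the reason a conventional case-control sample does not yield a valid PPV.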
Potential advantages of a nested case-control design in diagnostic research
The nested case-control study design can be advantageous over a full cross-sectional cohort design when actual disease prevalence in subjects suspected of a target condition is low, the index test is costly to perform, or if the index test is invasive and may lead to side effects. Under these conditions, one limits patient burden and saves time and money as the index test is performed in only a sample of the control subjects.
Furthermore, the nested case-control design is of particular value when stored data (serum, images etc.) of an existing study population are re-analysed for diagnostic research purposes. Using a nested case-control design, only data of a sample of the full study population need to be retrieved and analysed without having to perform a new diagnostic study from the start. This may for example apply to evaluation of tumour markers to detect cancer, but also for imaging or electrophysiology tests.
Diagnostic accuracy estimates derived from a nested case-control study should be virtually identical to those from a full cohort analysis. However, the variability of the accuracy estimates will increase with decreasing sample size. We illustrate this with data of a diagnostic study on a cohort of patients who were suspected of DVT.
A cross-sectional study was performed among a cohort of adult patients suspected of deep vein thrombosis (DVT) in primary care. This suspicion was primarily defined by the presence of a painful and swollen or red leg that existed no longer than 30 days. Details on the setting, data collection and main results have been described previously. [ 19 , 20 ] In brief, the full study population included 1295 consecutive patients who visited one of the participating primary care physicians with above symptoms and signs of DVT. Patients were excluded if pulmonary embolism was suspected. The general practitioner systematically documented information on patient history and physical examination. Patient history included information such as age, gender, history of malignancy, and recent surgery. Physical examination included swelling of the affected limb and difference in circumference of the calves calculated as the circumference (in centimetres) of affected limb minus circumference of unaffected limb, further referred to as calf difference test. Subsequently, all patients were referred to undergo D-dimer testing. In line with available guidelines and previous studies, the D-dimer test result was considered abnormal if the test yielded a D-dimer level ≥ 500 ng/ml. [ 21 , 22 ] Finally, they all underwent the reference test, i.e. repeated compression ultrasonography (CUS) of the lower extremities. In patients with a normal first CUS measurement, the CUS was repeated after seven days. DVT was considered present if one CUS measurement was abnormal. The echographist was blinded to the results of patient history, physical examination, and the D-dimer assay.
Nested case-control samples
Nested case-control samples were drawn from the full study population (n = 1295). In all samples, we always included all 289 cases with DVT. Controls were randomly sampled from the 1006 subjects without DVT. We applied four different and frequently used case-control ratios, i.e. one control for each case (1:1), two controls for each case (1:2), three controls for each case (1:3) and four controls for each case (1:4). For example, a sample with case-control ratio of 1:1 contained 289 cases and 289 random subjects out of 1006 controls (sampling fraction 289/1006 = 0.287). In the 1:4 approach, we sampled with replacement, because the 1156 controls required exceed the 1006 available. For each case-control ratio, 100 nested case-control samples were drawn.
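The sampling scheme can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the procedure as implemented in SPSS or S-plus: the function `draw_nested_sample`, the placeholder identifiers and the seed are all hypothetical, and drawing every control with replacement for the 1:4 ratio is one possible reading of the text.

```python
import random


def draw_nested_sample(case_ids, control_ids, controls_per_case, rng):
    """Keep every case and draw controls_per_case controls per case at random.

    Controls are drawn without replacement when enough are available and
    with replacement otherwise (as needed for the 1:4 ratio).
    Returns (cases, sampled_controls, sampling_fraction).
    """
    cases = list(case_ids)
    controls = list(control_ids)
    n_wanted = controls_per_case * len(cases)
    if n_wanted <= len(controls):
        sampled = rng.sample(controls, n_wanted)
    else:
        sampled = rng.choices(controls, k=n_wanted)
    return cases, sampled, n_wanted / len(controls)


rng = random.Random(2008)        # fixed seed, chosen arbitrarily
case_ids = range(289)            # placeholder identifiers for the DVT cases
control_ids = range(289, 1295)   # placeholder identifiers for subjects without DVT

for ratio in (1, 2, 3, 4):
    cases, sampled, sf = draw_nested_sample(case_ids, control_ids, ratio, rng)
    print(f"1:{ratio}  cases={len(cases)}  controls={len(sampled)}  SF={sf:.3f}")
```

Returning the sampling fraction together with the sample keeps the information needed for the later reweighting attached to each draw.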
Statistical analysis
We focussed on two important diagnostic tests for DVT, i.e. the dichotomous D-dimer test and the continuous calf difference test. The latter was specifically chosen as it allowed for the estimation and thus comparison of the area under the ROC curve (ROC area). Diagnostic accuracy measures of both tests were estimated for the four case-control ratios and compared with those obtained from the full study population. Measures of diagnostic accuracy included sensitivity and specificity, positive and negative predictive values and the odds ratio (OR) for the D-dimer test, and the OR and the ROC area for the calf difference test.
In the analysis of the nested case-control samples, we multiplied control samples by [1/sample fraction] corresponding to the case-control ratio (1:1 = 3.48; 1:2 = 1.74; 1:3 = 1.16; 1:4 = 0.87). For each case-control ratio, the point estimates and variability were determined. The median estimate of the 100 samples was considered as the point estimate. Analyses were performed using SPSS version 12.0 and S-plus version 6.0.
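The control weights quoted above follow directly from the cohort counts. The short Python sketch below reproduces them and shows how a median point estimate would be taken; the helper name `control_weight` is an assumption introduced here, and the list of estimates is hypothetical and serves only to demonstrate the median step.

```python
from statistics import median

N_CASES, N_CONTROLS = 289, 1006   # counts reported for the full study population


def control_weight(controls_per_case):
    """Return 1 / sampling fraction for a given number of controls per case."""
    sampling_fraction = controls_per_case * N_CASES / N_CONTROLS
    return 1 / sampling_fraction


weights = {f"1:{k}": round(control_weight(k), 2) for k in (1, 2, 3, 4)}
print(weights)  # {'1:1': 3.48, '1:2': 1.74, '1:3': 1.16, '1:4': 0.87}

# Hypothetical estimates from repeated samples (not study results);
# the point estimate is taken as their median.
example_estimates = [0.951, 0.958, 0.955, 0.953, 0.957]
print(median(example_estimates))  # 0.955
```

A weight below 1 for the 1:4 ratio reflects that more control records were drawn than exist in the cohort, so each sampled control counts for slightly less than one subject.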
In the full study population, the prevalence of DVT was 22% (n = 289), the D-dimer test was abnormal in 69% of the patients (n = 892) and the mean difference in calf circumference was 2.3 cm (Table 1 ). The prevalence of DVT was 50%, 33%, 25% and 20% in the nested case-control samples as a result of the sampling ratios (1:1, 1:2, 1:3 and 1:4, respectively). The distributions of the test characteristics in the control samples were similar as for the patients from the full study population without DVT (Table 1 ).
In the full study population the sensitivity and negative predictive value were high for the D-dimer test, 0.94 and 0.96, respectively (Table 2 ), whereas the specificity and positive predictive value were relatively low. The OR for the calf difference test was 1.44 and the ROC area was 0.69.
The average estimates of diagnostic accuracy for each of the four case-control ratios were similar to the corresponding estimates of the full study population (Figure 2 ). For example, the negative predictive value of the D-dimer test was 0.955 in both the full study population and for the four case-control ratios. The OR of the calf difference test was 1.44 in the full study population and the OR derived from the nested case-control samples were on average also 1.44.

Estimates of diagnostic accuracy of the D-dimer test and calf difference test for the 100 nested case-control samples with case-control ratios ranging from 1:1 to 1:4 . The boxes indicate mean values and corresponding interquartile ranges (25th and 75th percentiles). Whiskers indicate 2.5th and 97.5th percentiles. The dotted lines represent the values estimated in the full study population.
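The pattern summarised in Figure 2, estimates centred on the full-cohort value with a spread that narrows as more controls are sampled, can be reproduced with a small simulation. The Python sketch below uses a synthetic cohort: the group sizes follow the text, but the test-positive counts `TRUE_POS` and `FALSE_POS` are illustrative assumptions and not data from the study. Only ratios 1:1 to 1:3 are simulated so that controls can be drawn without replacement.

```python
import random
from statistics import median, quantiles

# Synthetic cohort. Group sizes follow the text; the positive counts are assumed.
N_CASES, N_CONTROLS = 289, 1006
TRUE_POS = 271    # assumed number of test-positive cases
FALSE_POS = 621   # assumed number of test-positive controls
control_results = [1] * FALSE_POS + [0] * (N_CONTROLS - FALSE_POS)
full_ppv = TRUE_POS / (TRUE_POS + FALSE_POS)


def ppv_from_sample(controls_per_case, rng):
    """Draw one nested case-control sample and return the weighted PPV."""
    n_sampled = controls_per_case * N_CASES
    sampled = rng.sample(control_results, n_sampled)   # without replacement
    weight = N_CONTROLS / n_sampled                    # 1 / sampling fraction
    return TRUE_POS / (TRUE_POS + weight * sum(sampled))


def summarise(controls_per_case, n_samples=100, seed=1):
    """Median and interquartile range width of the PPV over repeated samples."""
    rng = random.Random(seed)
    estimates = [ppv_from_sample(controls_per_case, rng) for _ in range(n_samples)]
    q1, _, q3 = quantiles(estimates, n=4)
    return median(estimates), q3 - q1


for ratio in (1, 2, 3):
    mid, iqr = summarise(ratio)
    print(f"1:{ratio}  median PPV={mid:.3f}  IQR width={iqr:.4f}  full cohort={full_ppv:.3f}")
```

Under these assumptions the median stays close to the full-cohort PPV for every ratio, while the interquartile range shrinks as the number of controls per case grows.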
The use of (conventional) case-control studies in diagnostic research has often been associated with biased estimates of diagnostic accuracy, due to the incorrect sampling of subjects [ 3 – 6 , 18 ]. Moreover, this study design does not allow for the estimation of the desired absolute disease probabilities. We discussed and showed that a case-control study nested within a well defined cohort of subjects suspected of a particular target disease with known sample size can yield valid estimates of diagnostic accuracy of an index test, including the absolute probabilities of disease presence or absence. Diagnostic accuracy parameters derived from a full (cross-sectional) cohort of patients suspected of DVT were similar to the estimates derived from various nested case-control samples averaged over 100 simulations. Expectedly, the variability decreased with increasing number of controls, making the measures estimated in the larger case-control samples more precise.
As discussed, the number of subjects from which the index test results need to be retrieved can substantially be reduced with a nested case-control design. Hence, the nested case-control design is particularly advantageous when the prevalence of the target condition in the cohort of patients suspected of the target disease is low, when the index test results are costly or difficult to collect and for re-analysing stored images or specimens. However, precision of the diagnostic accuracy measures will be hampered by increased variability when too few control patients are included.
Rutjes et al nicely discussed limitations of different study designs in diagnostic research [ 6 ]. They proposed the 'two-gate design with representative sampling' (which resembles the nested case-control design in this paper) as a valid design. We confirmed their proposition with a quantitative analysis of a diagnostic study. Rutjes et al suggested not to use the term 'nested case-control' to prevent confusion with etiologic studies where this design is commonly applied. Indeed, diagnostic and etiologic research differ fundamentally, first and foremost on the concept of time. Diagnostic accuracy studies are, in contrast to etiologic studies, typically cross-sectional in nature. Furthermore, diagnostic associations between index and reference tests are purely descriptive, whereas in etiologic studies causal associations and potential confounding are involved. Despite these major differences we believe there is no reason not to use the term nested case-control study in diagnostic research as well. The term inherently refers to the method of sampling of study subjects, which can be the same in a diagnostic or etiologic setting, and has no direct bearing on the other issues typically related to etiologic case-control studies.
Our findings support the view that the nested case-control study is a valid and efficient design for diagnostic studies. We believe that the nested case-control approach should be applied more often in diagnostic research, and also be (re)appraised in current guidelines on diagnostic methodology.
1. Knottnerus JA, van Weel C, Muris JW: Evaluation of diagnostic procedures. BMJ. 2002, 324 (7335): 477-480. 10.1136/bmj.324.7335.477.
2. Knottnerus JA, Muris JW: Assessment of the accuracy of diagnostic tests: the cross-sectional study. J Clin Epidemiol. 2003, 56 (11): 1118-1128. 10.1016/S0895-4356(03)00206-3.
3. Lijmer JG, Mol BW, Heisterkamp S, Bonsel GJ, Prins MH, van der Meulen JHP, Bossuyt PMM: Empirical evidence of design-related bias in studies of diagnostic tests. JAMA. 1999, 282: 1061-1066. 10.1001/jama.282.11.1061.
4. Rutjes AW, Reitsma JB, Di Nisio M, Smidt N, van Rijn JC, Bossuyt PM: Evidence of bias and variation in diagnostic accuracy studies. CMAJ. 2006, 174 (4): 469-476.
5. Whiting P, Rutjes AW, Reitsma JB, Glas AS, Bossuyt PM, Kleijnen J: Sources of variation and bias in studies of diagnostic accuracy: a systematic review. Ann Intern Med. 2004, 140 (3): 189-202.
6. Rutjes AW, Reitsma JB, Vandenbroucke JP, Glas AS, Bossuyt PM: Case-control and two-gate designs in diagnostic accuracy studies. Clin Chem. 2005, 51 (8): 1335-1341. 10.1373/clinchem.2005.048595.
7. Kraemer H: Evaluating Medical Tests. 1992, London, UK, Sage Publications
8. Mantel N: Synthetic retrospective studies and related topics. Biometrics. 1973, 29 (3): 479-486. 10.2307/2529171.
9. Essebag V, Genest J, Suissa S, Pilote L: The nested case-control study in cardiology. Am Heart J. 2003, 146 (4): 581-590. 10.1016/S0002-8703(03)00512-X.
10. Ernster VL: Nested case-control studies. Prev Med. 1994, 23 (5): 587-590. 10.1006/pmed.1994.1093.
11. Langholz B: Case-Control Study, Nested. Encyclopedia of Biostatistics. Edited by: Armitage P, Colton T. 2005, New York, John Wiley & Sons, 646-665. 2nd edition
12. Rothman KJ, Greenland S: Modern epidemiology. 1998, Philadelphia, Lippincott-Raven Publishers, 2nd edition
13. Mantel N, Haenszel W: Statistical aspects of the analysis of data from retrospective studies of disease. J Natl Cancer Inst. 1959, 22 (4): 719-748.
14. Ransohoff DF, Feinstein AR: Problems of spectrum and bias in evaluating the efficacy of diagnostic tests. N Engl J Med. 1978, 299 (17): 926-930.
15. Begg CB, Greenes RA: Assessment of diagnostic tests when disease verification is subject to selection bias. Biometrics. 1983, 39: 207-215. 10.2307/2530820.
16. Knottnerus JA, Leffers JP: The influence of referral patterns on the characteristics of diagnostic tests. J Clin Epidemiol. 1992, 45: 1143-1154. 10.1016/0895-4356(92)90155-G.
17. van der Schouw YT, van Dijk R, Verbeek ALM: Problems in selecting the adequate patient population from existing data files for assessment studies of new diagnostic tests. J Clin Epidemiol. 1995, 48: 417-422. 10.1016/0895-4356(94)00144-F.
18. Oostenbrink R, Moons KG, Bleeker SE, Moll HA, Grobbee DE: Diagnostic research on routine care data: prospects and problems. J Clin Epidemiol. 2003, 56 (6): 501-506. 10.1016/S0895-4356(03)00080-5.
19. Oudega R, Hoes AW, Moons KG: The Wells rule does not adequately rule out deep venous thrombosis in primary care patients. Ann Intern Med. 2005, 143 (2): 100-107.
20. Oudega R, Moons KG, Hoes AW: Limited value of patient history and physical examination in diagnosing deep vein thrombosis in primary care. Fam Pract. 2005, 22 (1): 86-91. 10.1093/fampra/cmh718.
21. Perrier A, Desmarais S, Miron M, de Moerloose P, Lepage R, Slosman D, Didier D, Unger P, Patenaude J, Bounameaux H: Non-invasive diagnosis of venous thromboembolism in outpatients. Lancet. 1999, 353: 190-195. 10.1016/S0140-6736(98)05248-9.
22. Schutgens RE, Ackermark P, Haas FJ, Nieuwenhuis HK, Peltenburg HG, Pijlman AH, Pruijm M, Oltmans R, Kelder JC, Biesma DH: Combination of a normal D-dimer concentration and a non-high pretest clinical probability score is a safe strategy to exclude deep venous thrombosis. Circulation. 2003, 107 (4): 593-597. 10.1161/01.CIR.0000045670.12988.1E.
Pre-publication history
The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1471-2288/8/48/prepub
Acknowledgements
For this research project we received financial support from the Netherlands Organization for Scientific Research, grant number: ZON-MW904-66-112. The funding source had no influence on the design, data analysis and report of this study.
Author information
Authors and affiliations.
Julius Center for Health Sciences and Primary Care, University Medical Center, Utrecht, The Netherlands
Cornelis J Biesheuvel, Yvonne Vergouwe, Ruud Oudega, Arno W Hoes, Diederick E Grobbee & Karel GM Moons
The Children's Hospital at Westmead, Sydney, Australia
Cornelis J Biesheuvel
Corresponding author
Correspondence to Karel GM Moons .
Additional information
Competing interests.
The authors declare that they have no competing interests.
Authors' contributions
All authors commented on the draft and the interpretation of the findings, read and approved the final manuscript. CJB was responsible for the design, statistical analysis and wrote the original manuscript. YV was responsible for the design and statistical analysis. RO was responsible for the data collection. AWH was responsible for expertise in case-control design. DEG and KGMM were responsible for conception and design of the study and coordination.
Rights and permissions
Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( https://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Cite this article.
Biesheuvel, C.J., Vergouwe, Y., Oudega, R. et al. Advantages of the nested case-control design in diagnostic research. BMC Med Res Methodol 8 , 48 (2008). https://doi.org/10.1186/1471-2288-8-48
Received : 07 March 2008
Accepted : 21 July 2008
Published : 21 July 2008
DOI : https://doi.org/10.1186/1471-2288-8-48
- Diagnostic Accuracy
- Deep Vein Thrombosis
- Target Disease
- Diagnostic Accuracy Study
BMC Medical Research Methodology
ISSN: 1471-2288
- Volume 13, Issue 11
- Associations of anaemia with bleeding and thrombotic complications in patients with atrial fibrillation treated with warfarin: a registry-based nested case–control study

- http://orcid.org/0000-0002-5273-8088 Tuukka Antero Helin 1 ,
- Pekka Raatikainen 2 ,
- http://orcid.org/0000-0002-8691-5142 Mika Lehto 2 , 3 ,
- http://orcid.org/0000-0003-1450-6208 Jari Haukka 4 ,
- Riitta Lassila 5 , 6
- 1 Clinical Chemistry, HUS Diagnostic Centre , Helsinki University Hospital and University of Helsinki , Helsinki , Finland
- 2 Cardiology, Heart and Lung Center , Helsinki University Hospital , Helsinki , Finland
- 3 Internal Medicine , Jorvi Hospital , Espoo , Finland
- 4 Department of Public Health , University of Helsinki , Helsinki , Finland
- 5 Coagulation Disorders Unit and Clinical Chemistry , HUS Helsinki University Hospital , Helsinki , Finland
- 6 Program in Systems Oncology, Faculty of Medicine , University of Helsinki , Helsinki , Finland
- Correspondence to Dr Tuukka Antero Helin; tuukka.helin{at}hus.fi
Objectives We studied association of laboratory testing beyond the international normalised ratio (INR) with bleeding and stroke/transient ischaemic attack (TIA) outcomes in patients with atrial fibrillation treated with warfarin.
Design This was a retrospective nested case–control study from the Finnish Warfarin in Atrial Fibrillation (FinWAF) registry (n=54 568), reporting the management and outcome in warfarin-anticoagulated patients. Associations of blood count test frequency and results were assessed together with risk of bleeding or stroke/TIA during 5-year follow-up.
Setting National FinWAF registry, with data from all six hospital districts. Follow-up period for complications was 1 January 2007–31 December 2011.
Participants A total of 54 568 warfarin-anticoagulated patients.
Results The number of patients with bleeding was 4681 (9%) and stroke/TIA episodes, 4692 (9%). In patients with bleeds, lower haemoglobin (within 3 months) preceded the event compared with the controls (median 126 vs 135 g/L; IQR 111–141 g/L vs 123–147 g/L, p<0.001), while patients with stroke/TIA had only modestly lower INR (median 2.2 vs 2.3; 1.8–2.6 vs 2.1–2.7, p<0.001). When the last measured haemoglobin was below the reference value (130 g/L for men, 120 g/L for women), the OR for a bleeding complication was 2.9 and stroke/TIA, 1.5. If the haemoglobin level was below 100 g/L, the complication risk increased further by 10-fold. If haemoglobin values were repeatedly (more than five times) low during the preceding 3 months, future OR was for bleeds 2.3 and for stroke/TIA 2.4.
Conclusions The deeper the anaemia, the higher the risk of bleeding and stroke/TIA. However, INR remained mainly at its target and only occasionally deviated, failing to detect the complication risk. Repeated low haemoglobin results, compatible with persistent anaemia, refer to suboptimal management and increased the complication risk in anticoagulated patients.
- Thromboembolism
- Bleeding disorders & coagulopathies
Data availability statement
Data are available upon reasonable request. Deidentified aggregated data are available upon reasonable request from the corresponding author (TAH, ORCID ID 0000-0002-5273-8088, [email protected]).
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/ .
http://dx.doi.org/10.1136/bmjopen-2022-071342
STRENGTHS AND LIMITATIONS OF THIS STUDY
The strength of the study was the inclusion of a large number of patients (n=54 568), enabling a nested case–control design with three age-matched and gender-matched controls.
The large number of patients enabled assessment of bleeding and stroke/transient ischaemic attack outcomes as separate cohorts.
The retrospective nature of the study limits the opportunity to assess the effects of interventions (eg, increasing blood count measurements).
Introduction
Since atrial fibrillation (AF) predisposes to stroke, oral anticoagulation (OAC) is recommended in most patients with AF. In clinical practice, the decision to start anticoagulation is based on the CHA2DS2-VASc risk score (Congestive heart failure; Hypertension, Age ≥75 years (2 points); Diabetes mellitus; Prior Stroke or transient ischaemic attack (TIA) (2 points); Vascular disease; Age 65–74 years; Female Sex category), with a score >1 indicating anticoagulation in men and >2 in women. 1 According to the current guidelines, the use of direct oral anticoagulants (DOACs) rather than vitamin K antagonists is recommended to prevent thromboembolic complications in patients with non-valvular AF. 1 The benefits of the DOACs include, for example, fewer interactions with food and other drugs and no need for routine coagulation monitoring with international normalised ratio (INR). Nevertheless, warfarin is still the only oral anticoagulant for patients with mechanical heart valve or severe mitral stenosis and patients with antiphospholipid antibody syndrome. 2
It is well-known that anaemia, as well as other risk factors, such as increased blood pressure, decreased renal function, previous bleeds, smoking, sleep apnoea and concomitant use of antiplatelet agents, increase the risk of bleeding in patients using OAC. 3 Low haemoglobin (Hgb) count is associated with many illnesses and increased mortality. 4 Moreover, it has also been shown that periodical checks of Hgb and total blood count can be used to screen for occult malignancy. 5 Anaemia also predisposes anticoagulated patients with AF to thromboembolic complications. 3 Hence, Hgb, blood cell counts, including red blood cells and platelets, as well as liver and kidney function, need regular follow-up to safely manage anticoagulated patients. 2
Current AF management guidelines recommend assessment of the bleeding risk using the HAS-BLED score. 1 A potential problem with the HAS-BLED score is that unlike in some other risk scores, anaemia is not included in the risk stratification scheme. 6 The incidence of anaemia among elderly people is 4–11%, 7 and approximately 15% of patients using warfarin have low Hgb. 7 8 Despite the clear correlation with Hgb and mortality, Hgb levels are only occasionally and infrequently measured or followed up during OAC therapy. Moreover, often no attempt to explore the aetiology of anaemia is performed, if Hgb level is above 100 g/L. 7–9
We aimed to study the frequency of anaemia, kidney and liver function assessment, their results and impact on complication risks among unselected warfarin-anticoagulated patients. We assessed the frequency of Hgb and kidney and liver function measurements and their results preceding bleeding and thromboembolic complications (stroke or TIA) in patients with AF using warfarin for stroke prevention in a large nationwide cohort in Finland. The patients with and without (controls) these complications were compared. The impact of laboratory testing and abnormal values and the adverse outcome are reported.
Patients and methods
This study was based on the nationwide Finnish Warfarin in Atrial Fibrillation (FinWAF) registry, which incorporated data from the Finnish Care Registry, Finnish Institutional Care Registry, Finnish Cancer Registry, Finnish National Prescription Register, laboratory databases from six Finnish hospital districts, the Finnish National Cause of Death Registry and Finnish Population Registry. Diagnosis of bleeding episodes or stroke was based on imaging, laboratory and clinical assessments, i.e. International Classification of Diseases (ICD) diagnoses. The FinWAF registry has been described in more detail in previous publications. 10–13 The total number of patients included in our study was 54 568.
The current study was a retrospective nested case–control study. Cases were defined according to the first event of stroke/TIA or bleeding event in the follow-up period (1 January 2007–31 December 2011). We selected the controls without the index event from the entire cohort, and three control patients for each affected case were matched according to sex and birth year. The ICD-10 codes for bleeding included D68.3, I60–I62, J942, K221, K223, K226, K250, K252, K254, K256, K260, K262, K264, K266, K270, K274, K276, K280, K282, K284, K286, K290, K631, K633, K920–K922, R04, R31, S064–S066 and S068. The ICD-10 codes for stroke/TIA included I63, I64 and G45. 10 We assessed the number of laboratory tests within the 3 months prior to the index event and provided the values of the last available test results for a given patient prior to the event, or in the controls, the end of follow-up. If a laboratory test was not obtained within 3 months prior to the event, the value was recorded as not available.
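The matching step can be sketched in Python as follows. This is a hypothetical illustration only: the source does not describe the matching algorithm in detail, so the function `match_controls`, the record layout and the rule that each control is used at most once are assumptions. The sketch assigns up to three controls per case, since a stratum of sex and birth year may contain fewer than three event-free patients.

```python
import random
from collections import defaultdict


def match_controls(cases, candidates, n_controls=3, seed=0):
    """Assign each case up to n_controls event-free patients of the same
    sex and birth year.

    cases and candidates are lists of dicts with keys "id", "sex" and
    "birth_year". Each candidate is used at most once.
    Returns {case_id: [control_ids]}.
    """
    rng = random.Random(seed)
    pools = defaultdict(list)
    for person in candidates:
        pools[(person["sex"], person["birth_year"])].append(person["id"])
    for pool in pools.values():
        rng.shuffle(pool)   # random order within each matching stratum

    matched = {}
    for case in cases:
        pool = pools[(case["sex"], case["birth_year"])]
        n_take = min(n_controls, len(pool))
        matched[case["id"]] = [pool.pop() for _ in range(n_take)]
    return matched


# Hypothetical records for illustration
cases = [{"id": "c1", "sex": "F", "birth_year": 1940}]
candidates = [{"id": f"p{i}", "sex": "F", "birth_year": 1940} for i in range(5)]
candidates.append({"id": "p9", "sex": "M", "birth_year": 1940})
print(match_controls(cases, candidates))
```

Grouping candidates by stratum first keeps the matching to a single pass over the cases, which matters for a registry of this size.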
Patient and public involvement
Patients and the public were not involved in the design and conduct of this study.
The number of patients with bleeding outcomes was 4681 (9%), and the number of matched controls without bleeding events was 14 042. The bleeding events were more common among men (60%, p<0.001). The number of patients with stroke/TIA was 4692 (9%), and the number of matched controls without stroke/TIA was 14 063. In contrast to the bleeding events, the stroke/TIA events were more common among women (54%, p<0.001). A summary table with laboratory test results in the whole cohort, and bleeder and stroke/TIA cases is shown in table 1 .
Summary data of the laboratory assessments of the entire cohort (n=54 568), for patients with bleeding complication (n=4681) and patients with stroke/TIA complication (n=4692)
The Hgb measurement within 3 months prior to the event was associated with later increased risk of bleeding and stroke/TIA, and the risk was higher if more than five measurements were performed in the preceding months ( figures 1 and 2 ). Also, when examining leucocytes, alanine aminotransferase and creatinine, more frequent blood sampling preceded the risk of later bleeding and stroke/TIA ( figures 1 and 2 ). The incidence of bleeding complication or stroke/TIA was higher if the Hgb had been measured during weekend days of Saturday or Sunday, that is, on-call hours, compared with an Hgb testing on weekdays.
Laboratory measurements and the OR for bleeding events, with 95% CIs. Only statistically significant results are shown. The OR 1.0 of reference group is depicted with a dashed line. The number of cases and controls, and more detailed results in each group are provided in the online supplemental table 1 . ALT, alanine aminotransferase; Crea, creatinine; Hgb, haemoglobin; Leuk, leucocytes.
Laboratory measurements and the OR for stroke/TIA events, with 95% CIs. Only statistically significant results are shown. The OR 1.0 of reference group is depicted with a dashed line. The number of cases and controls, and more detailed results in each group are provided in the online supplemental table 2 . ALT, alanine aminotransferase; Crea, creatinine; Hgb, haemoglobin; Leuk, leucocytes; TIA, transient ischaemic attack.
A comparison of laboratory test results between cases and controls for both bleeding and stroke/TIA is shown in figure 3 and online supplemental table 1 . Among cases with a bleeding complication, the preceding Hgb level was lower, and the creatinine level was higher than in the controls. These results did not differ when the stroke/TIA was analysed ( figure 3 and online supplemental tables 2 and 3 ). On the other hand, the patients with stroke/TIA had lower INR than controls ( figure 3 ), although the median INR remained in the therapeutic range (INR 2.0–3.0). The proportion of patients with any laboratory tests measured within 3 months prior to the outcome was between 48% and 61% in the bleeding cases, but 30–33% in the controls. In contrast, among patients with later stroke/TIA, only 55–56% had INR measured within the 3 months preceding the event ( online supplemental table 1 ).
Significantly different laboratory test results among the controls (n=14 042) and the patient cases (n=4681) having a bleeding event (white background plots). For the patients with stroke/TIA, the box plots of the controls (n=14 063) and the stroke/TIA cases (n=4692) are shown (grey background plot). The INR levels (p for difference=0.98 for bleeding cases and p<0.001 for stroke cases) and clinically significant changes in haemoglobin and creatinine (p<0.001) laboratory values are shown. For INR, the treatment range 2.0–3.0 is shown with the dashed line. For haemoglobin, the reference interval lower limit and for creatinine, reference interval upper limit are shown with continuous and dotted lines for women and men, respectively. Box plots depict ranges (whiskers), quartiles 1 and 3 (box limits) and medians (horizontal line). The n-values in the figure represent the number of patients who had a laboratory result. For the data with corresponding p values, please see online supplemental table 3 , and for the entire dataset, online supplemental tables 4 and 5 . INR, international normalised ratio; TIA, transient ischaemic attack.
The frequency of Hgb measurements exceeding five time points within the 3-month period prior to the event was associated with an adverse event in the near future ( figures 1 and 2 ). The OR of frequent measures (over 5) was 2.3 (95% CI 2.2 to 2.5) for bleeding events and 2.4 (95% CI 2.1 to 2.4) for stroke/TIA compared with maximum 5 (0–5) measurements.
When the last measured Hgb level was below the low reference value (130 g/L for men, 120 g/L for women) before the time of the adverse outcome, the OR for bleeding event was 2.9 (95% CI 2.8 to 3.0), while for stroke/TIA, the OR was 1.7 (95% CI 1.6 to 1.8). Strikingly, an Hgb level below 100 g/L further increased the complication risk, 10-fold (95% CI 7.8 to 12.3) for bleeding and 3-fold (95% CI 2.3 to 4.0) for stroke/TIA. Similarly, a last measured leucocyte level of over 10.0×10⁹/L was associated with a fourfold increase (95% CI 3.0 to 4.3) in the risk of a bleed and a threefold increase (95% CI 2.3 to 3.4) in the risk of stroke/TIA ( figures 1 and 2 ). Platelet count was not associated with future complication risk.
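For readers less familiar with how an OR and its 95% CI arise from case and control counts, the following is a minimal sketch of the unadjusted calculation from a 2×2 table with a Wald interval on the log scale. It is not necessarily the model used in this study, which this section does not specify, and the counts are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Wald 95% CI.

    a: exposed cases      b: unexposed cases
    c: exposed controls   d: unexposed controls
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, not taken from the study.
or_value, ci_low, ci_high = odds_ratio_ci(a=40, b=60, c=20, d=80)
```

In a matched design such as this one, conditional logistic regression is the more usual choice; the 2×2 form is shown only because it makes the arithmetic visible.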
During the previous year of the event, in patients with stroke outcome, the number of routine Hgb measurements did not differ, median 2.0 in both patients who had a stroke and controls, with corresponding IQRs of 0–6 and 0–5, respectively. During the year prior to the event, patients with bleeding outcome had a median of 3.0 Hgb measurements (IQR 0–7), while the median was 2.0 measurements (IQR 0–5) in the controls (p<0.001).
In our large retrospective case–control study of over 54 000 patients with AF, we found that Hgb levels during the 3-month period prior to bleeding outcome were significantly lower in bleeders than in controls. In addition, the lowering of the Hgb occurred during the period of repeated observations, compatible with persisting anaemia. Anaemia predisposes to bleeds due to impaired primary haemostasis and may diminish physiological red blood cell-mediated thrombin generation. 14–16 Most people with anaemia do not bleed overtly, even while on OAC, but continued occult bleeding consumes iron and haemoglobin. However, the reserve of the red cells is unavailable upon an emergent bleeding event, and anaemia in anticoagulated patients exposes them not only to bleeding events, but also to stroke and mortality. 17 In our study, preceding anaemia was also associated with future stroke/TIA. Therefore, active resolution of diagnosing iron deficiency or a possible other cause of anaemia is critical in anticoagulated patients, especially the vulnerable elderly population with AF, who also carry renal and cardiac impairment and multiple drug interactions.
Risk scores provide a practical tool for assessing bleeding risk in patients with AF. Lifetime risk of bleeding in anticoagulated patients with AF is 50% among those with a HAS-BLED or HEMORR2HAGES score of 2 or more. 18 In a previous study from our dataset, male gender was associated with higher risk of bleeding. 12 It is clearly prudent to assess the bleeding risk with laboratory studies, including blood cell count and creatinine, for all anticoagulated patients. Indeed, the increased bleeding risk associated with both low Hgb and high creatinine values, as observed in our study, is a relevant addition to the established risk scores, paralleling the previous reports on DOACs, 3 19 and was associated with intracranial haemorrhage in our previous study. 13 Detection and management of the commonly occurring iron deficiency due to dietary and absorption defects, and ruling out continued bleeding and malignancy, particularly in the gastrointestinal tract, are clinically important and preventive approaches among these vulnerable elderly patients. Higher creatinine levels and lower creatinine clearance (below 30 mL/min), concordant with lowered erythropoietin levels, have also been shown to independently associate with anaemia in the elderly (over the age of 65 years). 20 In our study, higher creatinine values and lower Hgb values were observed among the patients who developed a major bleed when compared with controls. Yet, in bleeders, the frequent Hgb measurements, often taken during emergency visits at on-call hours or weekends, suggest that management of anaemia had not succeeded.
For stroke/TIA risk, the CHA2DS2-VASc score is recommended as the screening tool for deciding whether to start anticoagulation in a given patient (European Society of Cardiology 2021). The score also predicts thrombotic events in patients anticoagulated with warfarin. 21 However, many of the same variables also signal a bleeding risk. In our study, the incidence of stroke/TIA was as high as the incidence of major bleeding, and both risks increased 3-fold to 10-fold if Hgb values remained at 100 g/L or less.
Both bleeding and stroke/TIA risks were associated with low time in therapeutic range in this cohort. 10 In the present study, notably, INR had not been monitored in up to 44% of stroke/TIA cases during the 3-month period leading to the event. This is critical as INR taken on the day of the stroke/TIA was also included in these analyses. This management defect differs markedly from, for example, the carefully controlled dabigatran RE-LY trial, where INR was measured on average 1.5 times monthly in Northern European countries, with a maximum of 4 weeks allowed between INR measurements in the study protocol. 22 Adequate INR and Hgb controls are crucial, as low Hgb also increased the risk of stroke/TIA. Our cohort is consistent with previous reports, where 15–29% of patients who had a stroke had anaemia. 23 This may be related to iron deficiency and its potentially thrombogenic effects, which have been well described even in children. 24 Iron deficiency may align with concomitant thrombocytosis, and anaemia-induced hypoxia causes endothelial dysfunction and von Willebrand factor release, contributing to organ dysfunction and thrombosis. 25 26 Concordantly, pre-existing anaemia also worsens the outcome after ischaemic stroke. 27
A bleeding event during anticoagulation therapy is a major risk factor for further bleeds and morbidity. In a study including 17 000 consecutive patients with venous thromboembolism, 2% experienced a major bleed. 28 Of these patients, 38% died within 30 days, 7% had major rebleeding and 19% fatal bleeding. Anaemia had a clear association with major bleeding, and 40–65% of the bleeders had preceding anaemia, compared with 33% of non-bleeders. 28 Bleeding risk-prone persons include patients with AF, and age above 80 years, of whom up to 40% have prevalent anaemia and iron deficiency. 29 Among warfarin-anticoagulated patients, 3% experienced a clinically relevant bleed, but only 0.3% had a major bleed during 47-month observation. 9 In our study, when Hgb was measured and resided below normal range, the risk of bleeding tripled. The bleeding risk further increased 10-fold if Hgb was below 100 g/L. It is evident that with these Hgb levels, corrective action is needed to manage this significant risk factor. The risk was highest when testing was done during the weekend, reflecting the emergency room visits of these patients, coinciding with a vulnerable clinical period in the following weeks.
Both bleeding and stroke/TIA had a strong association with sample collection frequency, reflecting the quality of follow-up procedures ( figures 1 and 2 ). Proactive laboratory testing aims to enable timely interventions and diminish the risks of anticoagulation-associated complications. Testing itself is not sufficient; in a recent study, patients who had a laboratory test done had decreased 3-year survival, reflecting lack of appropriate interventions. 30 Selection bias may also play a role, since acutely ill patients frequently have tests ordered at unconventional hours or days. In our study, the weekends seem to reflect this scenario.
The limitations of our study include the retrospective observational nature and lack of data on the possible interventions after abnormal laboratory results. Due to the study setting, the laboratory testing appeared to increase the risk of complications, as the patients whose laboratory tests were ordered had a timely clinical suspicion of deterioration, thus reflecting the complication risks. We have no data on the possible pauses in warfarin treatment before the complications occurred, if the patient, for example, had low Hgb. Nor did we exclude patients with haematological disease or active cancer from the analyses, diseases which contribute to anaemia. However, since anaemia predisposing to bleeding and stroke/TIA is evident in these patients as well, we found it important to include them. Moreover, due to the retrospective nature of the study, we only matched the patients based on age and sex. Even though lower glomerular filtration rate and liver dysfunction are known risk factors for bleeding, we did not match based on these, as we would have had significantly fewer cases, since only 30–50% had, for example, creatinine levels measured. Instead, we explored the above-mentioned risks by looking at the laboratory test results in cases versus controls.
In conclusion, the severity-dependent anaemia and low Hgb levels emerged as strong risk factors for both bleeding and stroke/TIA among patients with AF using warfarin. Prior to the complication, it is noteworthy that patients with bleed had no clear change in INR levels, but rather Hgb was significantly lowered. The higher number of laboratory measurements was associated with increased risk of bleeding as well as stroke/TIA events, especially when measured during on-call hours or weekends, suggesting acute exacerbation of the patient’s condition. Our study highlights the need for earlier, routine testing for blood cell counts to ensure timely diagnosis and treatment of anaemia. Our observations are generalisable to the management of warfarin therapy in AF and beyond.
Ethics statements
Patient consent for publication.
Not required.
Ethics approval
The study was a registry-based study with no patient-identifiable information. This study was performed in accordance with the European Network of Centers for Pharmacoepidemiology and Pharmacovigilance Code of Conduct and was registered to the European Network of Centers for Pharmacoepidemiology and Pharmacovigilance e-register (ER12-9441). The study received approval by the Ethics Review Board of the Hospital District of Helsinki and Uusimaa, and data permits were obtained from each of the registry holders, based on the study protocol and ethics approval.
- Hindricks G ,
- Potpara T ,
- Dagres N , et al
- Eikelboom JW ,
- Connolly SJ ,
- Brueckmann M , et al
- Westenbrink BD ,
- Connolly SJ , et al
- Verma AK , et al
- Johannsdottir GA ,
- Onundarson PT ,
- Gudmundsdottir BR , et al
- Gonzalez-Higueras E , et al
- Broecker-Preuss M , et al
- Federici L , et al
- Delate T , et al
- Niiranen J ,
- Korhonen P , et al
- Raatikainen MJP ,
- Penttilä T ,
- Niiranen J , et al
- Putaala J ,
- Mehtälä J , et al
- Weisel JW ,
- Litvinov RI
- Lassila R ,
- Granger CB , et al
- Rabinstein AA ,
- Christianson TJH , et al
- Sherwood MW ,
- Nessel CC ,
- Hellkamp AS , et al
- Woodman RC , et al
- Halperin JL , et al
- Van Spall HGC ,
- Wallentin L ,
- Yusuf S , et al
- Savopoulos C ,
- Kanellos I , et al
- Maguire JL ,
- deVeber G ,
- Lorenzana Carrillo MA , et al
- Barlas RS ,
- Loke YK , et al
- De Tuesta AD ,
- Marchena PJ , et al
- Robalo Nunes A ,
- Fonseca C ,
- Marques F , et al
- Kohane IS ,
Contributors TAH, PR, ML, JH and RL contributed to the planning and design of the study. TAH and JH did the statistical analyses. TAH, PR, ML, JH and RL contributed to the reporting of the results and preparation of the manuscript. TAH, PR, ML, JH and RL have all read and approved the final manuscript. RL was responsible for the overall content as the guarantor.
Funding This study was supported by Bristol-Myers Squibb, Finland; Pfizer, Finland; the Finnish Foundation for Cardiovascular Research; and Helsinki University Hospital District Research Fund (grant number Y2016SK007).
Competing interests None declared.
Patient and public involvement Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.
Provenance and peer review Not commissioned; externally peer reviewed.
- v.114(2); 2023
- PMC10133777

Prevalence of and Risk Factors for Hepatitis C Virus Infection in World Trade Center Responders
Stephanie H. Factor
1 Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
Michael A. Crane
2 Department of Environmental Medicine and Public Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
Douglas T. Dieterich
Paolo Boffetta
3 Department of Medical and Surgical Sciences, University of Bologna, Bologna, Italy
Prevalence of hepatitis C virus antibodies by year of birth in 3,871 members of the World Trade Center General Responder Cohort (recruited and tested from December 15, 2016 - July 12, 2018) and the US population based on National Health and Nutrition Examination Survey (NHANES) data from 2003 through 2012.
Background:
The risk of hepatitis C virus (HCV) infection among emergency responders exposed to human remains, blood/bodily fluids, and/or sewage is unknown.
Methods:
In a cross-sectional study, 3,871 World Trade Center General Responder Cohort (WTCGRC) members followed at the Icahn School of Medicine at Mount Sinai, born from 1945-1965, and recruited from 2016-2018 were tested for HCV infection, and prevalence was compared to National Health and Nutrition Examination Survey data from 2003 to 2012. A nested case-control study compared 61 HCV antibody positive cases to 2571 controls. Multivariable logistic regression models adjusting for time of birth, traditional HCV risk factors, and type of work at the World Trade Center (WTC) site determined if contact with human remains, blood/bodily fluids, and/or sewage at the WTC site was associated with HCV infection.
Results:
The age-standardized point prevalence of HCV infection among WTCGRC members was 2.98% [95% CI (2.39, 3.56)] and in the US population was 3.33% [95% CI (2.54, 4.11)] [% difference=0.35%, 95% CI (-0.31%, 1.01%), P=0.47]. In separate multivariable models, adjusting for possible confounders, contact with human remains was not associated with HCV infection [OR=1.10, 95% CI (0.63, 1.91), P=0.74], contact with blood and/or bodily fluids was not associated with HCV infection [OR=1.45, 95% CI (0.82, 2.56), P=0.20], and contact with sewage was associated with HCV infection [OR=1.72, 95% CI (1.00, 2.98), P=0.05].
Conclusion:
Contact with sewage may increase the risk of HCV infection.
1. Introduction
Chronic hepatitis C virus (HCV) infection increases the risk of liver cirrhosis and hepatocellular carcinoma [ 1 , 2 ]. While multiple risk factors for HCV infection have been recognized, at least 20% of persons with HCV infection do not have a known risk factor [ 3 , 4 ], which suggests there are unrecognized risk factors. The Occupational Safety and Health Administration of the US Department of Labor standards for blood-borne pathogens (29 Code of Federal Regulations 1910.1030) [ 5 ] and personal protective equipment (29 Code of Federal Regulations 1910 Subpart I) [ 6 ] require employers to protect workers from occupational exposure to infectious agents. Thus, persons employed in occupations with expected exposure to human remains, blood or bodily fluids, and stool with visible blood are educated about the risks of HCV from these agents, and are provided with and expected to wear protective gear. During the course of an emergency response, workers and volunteers may be unexpectedly exposed to these agents. The prevalence of and risk factors for HCV infection have not been previously assessed in persons exposed to these agents during an emergency response.
The World Trade Center General Responders Cohort (WTCGRC) comprises workers and volunteers who participated in the emergency response activities at the World Trade Center (WTC) site and are followed at the Icahn School of Medicine at Mount Sinai and other medical institutions to monitor their health. At their enrollment interview, WTCGRC members were specifically asked about contact with “human remains”, “blood or bodily fluids”, and “sewage” during their WTC activities. This cohort offers an opportunity to study the risk of HCV infection in emergency response workers exposed to these agents.
A cross-sectional study determined the prevalence of HCV in WTC responders and compared it to the prevalence of HCV in the US population. A nested case-control study then determined if contact with human remains, blood or bodily fluids, and sewage (wastewater and excrement conveyed in sewers) during emergency response work is associated with an increased risk of HCV infection.
The WTCGRC has been described elsewhere [ 7 ]. The cohort consists of WTC responders who worked or volunteered in lower Manhattan, or the Staten Island landfill or barge-loading piers for 4 hours or more from September 11 to 14, 2001; 24 hours or more during September 2001; or 80 hours or more from September 2001 to December 2001. Recruitment began on July 16, 2002, and is ongoing.
Recruitment into WTCGRC was done by extensive outreach [ 7 ]. Participation is voluntary and includes a comprehensive baseline examination which includes collection of questionnaire data, a history and physical examination, and collection of bloodwork. Monitoring assessments are given every 12 to 18 months and similarly include collection of questionnaire data, a history and physical examination, and collection of bloodwork. All assessments occur at one of several World Trade Center Clinical Centers of Excellence [ 8 ]. As of March 31, 2014, 33,863 persons were eligible for inclusion in the cohort and had completed a Visit 1 [ 9 ]. WTCGRC participants may participate in health monitoring yet opt out of research. Thus, there are members of the WTCGRC who routinely attend their monitoring visits, but their data is unavailable to researchers.
According to CDC, persons born from 1945-1965 are at higher risk for HCV infection than other birth cohorts [ 10 ]. For this HCV Study, responders who presented to the Icahn School of Medicine at Mount Sinai site for a WTCGRC visit (Visit 1 or higher), and were born from 1945 through 1965, were given information about HCV infection and offered study participation that included free HCV testing.
Interested persons completed the informed consent process, signed the informed consent document, and completed the HCV Risk Factor Questionnaire (see below). Recruitment began on December 15, 2016, with materials in English. Spanish materials were available on November 3, 2017, and Polish materials were available on May 4, 2018. The last day of recruitment was July 12, 2018. Persons who declined to participate in the WTCGRC Research Study were included in this study at their discretion, and with the understanding that data beyond the HCV Risk Factor Questionnaire would not be accessed. Enrollment continued until the study sample size was reached. (Sample size is defined below.)
The HCV Risk Factor Questionnaire assessed demographic data (year of birth, country of origin, current type of medical insurance); previous testing, results, and treatment for HCV infection; and traditional risk factors for HCV infection (blood transfusion or organ transplant before July 1992, receipt of clotting factor concentrate produced before 1987, receipt of long-term hemodialysis, receipt of blood from an HCV-infected donor, birth to an HCV-infected mother, history of human immunodeficiency virus (HIV) or acquired immunodeficiency syndrome (AIDS), history of injecting drug use (IDU), and needle stick, sharps, or mucosal exposure to HCV-infected blood as a health care, emergency medical, or public safety worker) [ 10 ]. As a point of clarification, the HCV Risk Factor Questionnaire was administered on the date of HCV antibody testing and asked participants if they ever had any of the traditional risk factors for HCV infection (Supplementary Material, Table S1 ).
Table S1.
Data sources for nested case-control study.
After completing the HCV Risk Factor Questionnaire, study participants completed their usual visit. In addition to blood drawn for routine testing, two 2 mL tubes of blood were drawn. Blood from the first tube was tested for HCV antibody. For those with a positive HCV antibody test, blood from the second tube was tested for HCV RNA using polymerase chain reaction (PCR). Otherwise, blood from the second tube was discarded. Persons with a positive HCV antibody test were defined as having current or prior HCV infection. Current HCV infection was defined by detectable HCV ribonucleic acid (RNA) and prior HCV infection was defined by undetectable HCV RNA. All participants were notified of their results by letter, telephone call, or email based on their preference. Persons with current HCV infection received telephone calls with offers of referral to liver specialists at either the co-located Mount Sinai Liver Medicine Practice or a federally qualified health center in their preferred location. Persons who attended at least one outpatient visit with a liver specialist were defined as being linked to care.
2.1. Statistical Analysis
2.1.1. Cross-sectional Study
For the cross-sectional study, the age-standardized prevalence of HCV infection in this study sample was determined using the US population in the 2010 census as the standard population, and was compared to the age-standardized prevalence of anti-HCV antibodies in the general US population based on National Health and Nutrition Examination Survey (NHANES) data from 2003 through 2012 using the appropriate weights [ 11 ]. The sample size was calculated to provide adequate statistical power for this analysis. With 3900 persons and an expected prevalence of HCV in the US population of 3.2%, this analysis would have 80% power to detect as statistically significant (at α=0.05) a 0.8% difference between the expected prevalence of HCV infection in the US population (3.2%) and the prevalence of HCV among members of the WTCGRC (either≥4.0% or≤2.4%).
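The stated power can be approximated with a one-sample, two-sided z-test for a proportion. This is only a sketch under that assumption: the authors do not state which formula or software option they used for the calculation, and the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_power(p0, p1, n, alpha=0.05):
    """Approximate power of a two-sided z-test of H0: p = p0 when the true
    proportion is p1 (normal approximation; the negligible chance of
    rejecting in the wrong direction is ignored)."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    se_null = sqrt(p0 * (1 - p0) / n)  # standard error under H0
    se_alt = sqrt(p1 * (1 - p1) / n)   # standard error under the alternative
    return nd.cdf((abs(p1 - p0) - z_crit * se_null) / se_alt)


# Values from the text: n = 3900, expected prevalence 3.2%, difference 0.8%.
power_higher = one_sample_power(0.032, 0.040, 3900)
power_lower = one_sample_power(0.032, 0.024, 3900)
```

Under these assumptions both values come out close to the 80% quoted above, with slightly more power against the lower alternative (2.4%) than the higher one (4.0%).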
2.1.2. Nested Case-control Study
2.1.2.1. Study Population
WTCGRC members who consented to research were eligible for inclusion in the nested case-control study. A case was defined as a person with current or prior HCV infection and was measured by a positive HCV antibody test. Controls were persons without HCV infection. Persons with an indeterminate HCV antibody test were removed from the analysis. (All data used in the nested case-control study and the data source are provided in Supplementary Material, Table S1 .)
2.1.2.2. Three Main Exposures
Data for the three main exposures were obtained from the WTCGRC baseline questionnaire. Within the baseline questionnaire, study personnel determined if WTCGRC members had participated in WTC activities in three intervals: September through October 2001, November through December 2001, and January through June 2002. For those intervals in which the member participated, study personnel asked if they had “contact” with “human remains”, “blood or bodily fluids”, and/or “sewage”. Participants were defined as having exposure if they reported contact during any of these three periods.
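The "any period" exposure rule can be written out as a short sketch. The period keys, agent names, and record layout below are hypothetical; the actual coding of the WTCGRC baseline questionnaire is not described in this section.

```python
# Hypothetical keys for the three questionnaire intervals.
PERIODS = ("sep_oct_2001", "nov_dec_2001", "jan_jun_2002")
AGENTS = ("human_remains", "blood_or_bodily_fluids", "sewage")


def ever_exposed(answers, agent):
    """Return True if contact with `agent` was reported in any interval.

    `answers` maps each interval to a dict of agent -> bool, or to None
    when the member did not participate in that interval (and so was
    not asked about contact).
    """
    for period in PERIODS:
        period_answers = answers.get(period)
        if period_answers is None:
            continue
        if period_answers.get(agent, False):
            return True
    return False


# Hypothetical responder: worked in two intervals, sewage contact in 2002.
responder = {
    "sep_oct_2001": {"human_remains": False, "blood_or_bodily_fluids": False, "sewage": False},
    "nov_dec_2001": None,
    "jan_jun_2002": {"human_remains": False, "blood_or_bodily_fluids": False, "sewage": True},
}
exposure = {agent: ever_exposed(responder, agent) for agent in AGENTS}
```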
2.1.2.3. Possible Confounders
Possible confounders for this study included traditional HCV risk factors, demographic characteristics, and activities at the WTC site. Traditional HCV risk factors were obtained from the HCV Risk Factor Questionnaire. Sex, race, and ethnicity were obtained from the WTCGRC baseline questionnaire. Year of birth, country of birth, and insurance status were obtained from the Hepatitis C Risk Factor Questionnaire. While household contacts with HCV infection [ 12 ], sexual risk behaviors [ 13 ], and level of education [ 14 ] have been identified as risk factors for HCV infection, these characteristics were not captured in our questionnaire and were therefore not included in this analysis.
Data on activities at the WTC site were obtained from the WTCGRC baseline questionnaire and included year of enrollment in the WTCGRC, type of work done as a volunteer or worker, use of gloves, use of personal protective equipment, and seeking care for injuries which pierced the skin. For type of work done as volunteer or worker, participants were handed a list of types of work organized by Department of Labor Codes and WTC Activity codes and asked “What activity code best describes what you were doing during this period?” for each of four periods (September 2001, October 2001, November to December 2001, and January to June 2002). Study personnel entered up to three activity codes for each period. A time-weighted measure of protective glove use and protective clothing use was based on participants’ reported use of protective gloves and/or protective clothing (rarely/never, sometimes, most of the time, all the time) during each of three time periods: September-October, November-December, and January-June.
WTCGRC participants reported up to four injuries or illnesses acquired during the WTC activities for which they sought medical care while working at the WTC site. Members were identified as having skin/mucous membrane injury if they indicated seeking medical care for an abrasion, amputation, blister, burn, contusion (bruise), crush, cut/puncture, eye injury, foreign body, or skin irritation/rash.
2.1.2.4. Analytic Approach
It was decided a priori that the final analysis would adjust for traditional HCV risk factors and birth year. Thus, the nested case-control study analyses were limited to participants who had complete exposure data for at least one of the three main exposures, complete data on the traditional HCV risk factors, and reported a year of birth.
Demographic characteristics and activities at the WTC site were evaluated in univariable analysis. Characteristics with a p-value of <.20 were further evaluated in forward, backward, and stepwise manual and machine-assisted multivariable logistic regression analysis with HCV infection as the outcome to find those characteristics independently associated with HCV infection (P ≤.05). Characteristics independently associated with HCV infection were identified as possible confounders and were adjusted for in further analyses.
The final analysis included three multivariable logistic models, one for each main exposure. These models included HCV infection as the outcome and were adjusted for age, traditional HCV risk factors, and the identified possible confounders.
This study was approved by the Icahn School of Medicine Institutional Review Board (Study 16-01343) which conforms to the ethical guidelines of the 1975 Declaration of Helsinki. All participants gave informed consent before taking part. All statistical analyses were conducted in SAS (Version 9.4). A p-value ≤ 0.05 was considered statistically significant. All study data are protected with an Assurance of Confidentiality.
3.1. Cross-sectional Study
A total of 3,935 WTCGRC members enrolled in this current study. Of those, 64 persons were excluded (18 born outside the birth cohort, 21 missing laboratory data, 15 with “indeterminate” HCV antibody results with negative HCV RNA PCR testing, and 10 withdrawals). Of the remaining 3,871, 109 (2.8%) had HCV infection.
The prevalence of HCV infection in study participants was similar to the prevalence in the US population (Supplementary Material, Figure S1). The age-standardized point prevalence of HCV infection in the WTC Cohort was 2.98% [95% CI (2.39, 3.56)] and in the US population was 3.33% [95% CI (2.54, 4.11)] [% difference=0.35%, 95% CI (-0.31%, 1.01%), P=0.47].
Of the 109 persons with antibodies to HCV, 14 (13%) had current infection based on the presence of HCV RNA in their blood, 39 (36%) reported previous treatment for HCV infection, 39 (36%) reported no previous treatment for HCV infection, and 17 (16%) reported they did not know if they had been treated or did not answer the question. Because HCV treatment status is unknown in these last 17 persons, the range of spontaneous clearance of HCV infection in this group is between 36% (39/109) and 51% (56/109).
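The bounds on spontaneous clearance follow directly from the four counts reported above, as this short worked calculation shows. The variable names are illustrative; the counts are those given in the text.

```python
# Counts reported for the 109 antibody-positive participants.
current_infection = 14   # detectable HCV RNA
treated = 39             # reported previous treatment
untreated = 39           # reported no previous treatment
unknown = 17             # did not know or did not answer
total = current_infection + treated + untreated + unknown

# Lower bound: only those reporting no treatment cleared spontaneously.
lower_bound = untreated / total
# Upper bound: all with unknown treatment status also cleared spontaneously.
upper_bound = (untreated + unknown) / total

lower_pct = round(100 * lower_bound)
upper_pct = round(100 * upper_bound)
```

The four categories sum to 109, and the two bounds round to the 36% and 51% quoted in the text.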
Of the 14 persons diagnosed with current HCV infection, two were in care for HCV infection at the time of screening, and 12 were referred for specialized care by study personnel. Of the 12, 11 (92%) requested a referral to the Mount Sinai Liver Medicine Practice, and one requested a referral to a federally qualified health center based on insurance status. Of the 12 referred for care, 10 (83%) were linked to care (attended an in-person appointment with a liver specialist). Of the 10 linked to care, nine persons were prescribed and received anti-HCV therapy and achieved sustained virologic response.
3.2. Nested Case-control Study
Of the 3,871 participants in the cross-sectional study, 507 persons did not have available exposure data (89 did not consent to participate in the WTCGRC Research Study, eight had data that were not yet integrated into the WTCGRC Research Study database, and 410 did not have complete data on at least one of the three main exposures) leaving 3,364 persons. Of the 3,364 persons, 732 did not provide complete information on traditional HCV risk factors leaving 2,632 persons for the nested case-control study ( Table 1 ).
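The participant flow described above can be checked with a few lines of arithmetic. This is only a bookkeeping sketch using the counts reported in the text; the variable names are hypothetical.

```python
cross_sectional = 3871

# Reasons for missing exposure data, as reported.
no_consent = 89
not_integrated = 8
incomplete_exposure = 410
missing_exposure = no_consent + not_integrated + incomplete_exposure

# Persons remaining after each exclusion step.
with_exposure = cross_sectional - missing_exposure
missing_risk_factors = 732
nested_sample = with_exposure - missing_risk_factors

print(missing_exposure, with_exposure, nested_sample)  # 507 3364 2632
```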
Table 1.
Participants with and without hepatitis C virus (HCV) antibodies in the nested case-control study a.
HCV: hepatitis C virus, WTC: World Trade Center, HIV: human immunodeficiency virus, AIDS: acquired immunodeficiency syndrome
a Unless otherwise noted, data are presented for all 61 cases (persons who had HCV antibodies) and 2571 controls (persons without HCV antibodies).
b Analyzed using Chi-square analysis.
c Analyzed using Fisher’s Exact analysis.
d Data on all categories of work at the WTC site are included in Supplementary Material, Table S2 .
e Analyzed using Chi-square test for linear trend.
Of the 2,632 persons included in the nested case-control study, 61 (2.3%) had HCV antibodies indicating prior or current HCV infection and were identified as cases.
Participants in the nested case-control study were 84% men, 56% white, 18% black, 16% multi-racial, 9% not-reported, 1% Asian, 0.3% American Indian and Alaskan Native, 0.04% Pacific Islander, and 23% Latino. There was a higher prevalence of black WTCGRC members in this HCV Study compared to the full WTCGRC Research Study [ 7 ]. The median age on enrollment was 58 years [interquartile range (IQR) 54, 62].
Cases and controls were of similar sex, race, ethnicity, insurance status, and USA birth but differed in time period of birth (P=.001) ( Table 1 ). Cases and controls did not differ in their contact with human remains (P=.95) and blood and/or bodily fluid (P=.25), but did differ in their contact with sewage (P=.02).
After evaluating demographic characteristics and activities at the WTC site in univariable and multivariable analysis adjusting for time period of birth and traditional HCV risk factors, working or volunteering for perimeter security at the WTC site, and working or volunteering as a truck driver at the WTC site were both associated with HCV infection. (Characteristics of the persons working or volunteering for perimeter security and characteristics of persons working or volunteering as a truck driver at the WTC are presented in Supplementary Materials, Table S3 , and Table S4 , respectively.) Thus, the final models adjusted for time period of birth, traditional HCV risk factors, working or volunteering for perimeter security at the WTC site, and working or volunteering as a truck driver at the WTC site.
Table S3.
Characteristics of World Trade Center (WTC) perimeter security workers from a subset of the WTC General Responder Cohort, recruited and tested for hepatitis C virus antibodies from December 15, 2016 – July 12, 2018 (N=520).
ND – Not defined.
*Analyzed using Fisher’s Exact test.
Table S4.
Characteristics of World Trade Center (WTC) truck drivers from a subset of the WTC General Responder Cohort, recruited and tested for HCV from December 15, 2016 – July 12, 2018 (N=24).
In multivariable logistic regression models, adjusting for possible confounders (time period of birth, traditional HCV risk factors, perimeter security work at the WTC site, driving a truck at the WTC site), contact with human remains was not associated with HCV infection [ Table 2 , Model 1: OR=1.10, 95% CI (0.63, 1.91), P=0.74] and contact with blood and/or bodily fluids was not associated with HCV infection [ Table 2 , Model 2: OR=1.45, 95% CI (0.82, 2.56), P=0.20]. Contact with sewage was associated with HCV infection [ Table 2 , Model 3: OR=1.72, 95% CI (1.00, 2.98), P=0.05].
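As a consistency check, the reported P-values can be approximately recovered from each odds ratio and its confidence interval. The sketch below assumes Wald-type intervals that are symmetric on the log-odds scale, which is our assumption and not something the text states. The function name `p_from_or_ci` is hypothetical, and because the published figures are rounded the results are approximate.

```python
import math

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal


def p_from_or_ci(odds_ratio, lower, upper):
    """Approximate two-sided Wald p-value from an OR and its 95% CI."""
    # A Wald CI is symmetric on the log scale, so its width gives the SE.
    se = (math.log(upper) - math.log(lower)) / (2 * Z_95)
    z = math.log(odds_ratio) / se
    # Two-sided tail probability of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))


# (OR, lower, upper) as reported for Models 1 to 3.
models = {
    "human remains": (1.10, 0.63, 1.91),
    "blood/bodily fluids": (1.45, 0.82, 2.56),
    "sewage": (1.72, 1.00, 2.98),
}
for exposure, (or_, lo, hi) in models.items():
    print(f"{exposure}: p ~ {p_from_or_ci(or_, lo, hi):.2f}")
```

Under these assumptions the recovered values round to 0.74, 0.20, and 0.05, in line with the reported ones.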
Table 2.
Multivariable logistic regression models for risk of HCV infection (N=2632) a .
HCV: hepatitis C virus, WTC: World Trade Center
a All models adjusted for time period of birth (1945- 1949, 1950-1954, 1955-1959, 1960-1965) and traditional HCV risk factors (blood transfusion or organ transplant before July 1992, receipt of clotting factor concentrate produced before 1987, receipt of long-term hemodialysis, receipt of blood from HCV-infected donor, birth to HCV-infected mother, history of human immunodeficiency virus infection or acquired immunodeficiency syndrome, history of injecting drug use, and needle stick, sharps, or mucosal exposure to HCV-infected blood as a health care, emergency medical, or public safety worker).
4. Discussion
This study suggests the prevalence of HCV infection in WTCGRC members born from 1945 through 1965 is comparable to that of the US population, the spontaneous HCV infection clearance rate in this population is similar to the general population, and an ongoing cohort study is an effective site for screening for HCV and linkage to care. Contact with sewage at the WTC site may be associated with an increased risk of HCV infection.
The similar prevalence of HCV infection in WTCGRC members born from 1945 through 1965 and in the general US population initially appears reassuring. However, the WTCGRC members, by definition, are persons who worked at the WTC site. Comparisons of groups of workers to the general population may suffer from the “healthy worker effect” bias. In their often-cited article, Li and Sung write, “the ‘healthy worker effect’ reflects that an individual must be relatively healthy to be employable in a workforce, and both morbidity and mortality rates within the workforce are usually lower than in the general population” [ 15 ]. Thus, the similar prevalence of HCV infection in WTCGRC members and the general US population may be reassuring or may reflect an increased risk of HCV infection in WTCGRC workers, which is obfuscated by the healthy worker effect. While Li and Sung would recommend comparing HCV infections among members of the WTCGRC cohort to an external work comparison group [ 15 ], such a comparison is beyond the scope of this current study.
The rate of spontaneous clearance of HCV infection in this study was between 36% and 51%, consistent with the current literature. While the oft-cited systematic review of spontaneous clearance by Micallef et al. found spontaneous clearance of HCV infection to be 26% (95% CI: 22%, 29%) [ 16 ], a more recent systematic review and meta-analysis of studies with longer follow-up by Aisyah et al. found spontaneous clearance at 24 months to be 37.1% (95% CI: 23.7%, 52.8%) [ 17 ].
This study was effective at using an existing cohort study to screen persons for HCV infection and to link infected persons with care. Previous research has suggested that sites which both screen for HCV and provide treatment for HCV are more successful at linking HCV-infected persons to care. Galbraith et al. reported 21% (21/100) of patients with HCV infection were linked to care after screening done in a hospital emergency department, a site in which screening and treatment were not co-located [ 18 ]. In contrast, Jonas et al. reported 81% (214/277) linkage to care in a study of 11,200 persons born from 1945 through 1965 who were screened during routine outpatient visits at Kaiser Permanente [ 19 ], a site in which screening and treatment were co-located. In their study in which 4,514 persons were screened for HCV infection at 5 different federally qualified health centers, Coyle et al. reported that the most successful linkage-to-care rate was seen at the one health center in which HCV testing, care, and treatment were provided in the same setting [67.6% (167 out of 247)] [ 20 ]. The high percentage of persons linked to care in this study is likely due to both the continuing relationship of the participants with the WTCGRC Research Study and the co-location of the Mount Sinai Liver Medicine Practice.
This is the first well-designed epidemiologic study to suggest an association between contact with sewage and HCV infection. A study in 1999 described two sewer workers with no recognized HCV risk factors and routine skin contact with sewer water, who were diagnosed with HCV infection [ 21 ]. A cross-sectional study of 19,503 persons in Brazil, ages 10 to 69, from all 26 State capitals and the Federal District, found that families without public sewage disposal had an increased risk of HCV infection [OR=2.53, 95% CI (1.38, 4.65)] after adjusting for age, IDU, use of sniffed drugs, injection with a glass syringe, and hospitalization [ 22 ]. The authors interpreted this finding as an association between HCV infection and low socioeconomic status, and did not discuss the possibility of contact with sewage as a risk factor for HCV infection.
There are few HCV seroprevalence studies among persons with occupational contact with sewage. A study of 107 sanitary workers in Pakistan, ages 20 to 48, with at least five years of exposure to opening drains, large sewage ducts, and sewage piping, found 39 (36%) had HCV antibodies [ 23 ]. This was higher than the prevalence of HCV infection in the general population of Pakistan (2.6%-5.3%). A study of 410 sanitary workers in Alexandria, Egypt, found 9.8% had HCV antibodies [ 24 ]. This was similar to the estimated prevalence of HCV infection in the region’s general population. However, generalizing findings from this Egyptian study is difficult. Egypt has a history of mass spread of HCV infection during a campaign to fight schistosomiasis, which was followed by mass treatment for infected persons and mass education campaigns to prevent further spread [ 25 ].
The association between contact with sewage and HCV infection may not be real. There are well-established risk factors for which this study was unable to adjust, including household contacts with HCV infection [ 12 ], sexual risk behaviors [ 13 ], and level of education [ 14 ], because our questionnaire did not capture this information. However, it is not clear how exposure to sewage at the WTC site would correlate with these unmeasured risk factors. Additionally, some may question the validity of an association with a P-value of 0.05 found in this analysis. However, a P-value ≤ 0.05 is the generally accepted level by which statistical significance is measured, the one used for this study design, and, therefore, the one we must follow in interpreting our results.
In this study, work as a truck driver was associated with an increased risk of HCV infection and work at perimeter security was associated with a decreased risk of HCV infection. The association between work as a truck driver and increased risk of HCV infection is consistent with previous research [ 26 ]; this consistency provides validation for this study. The decreased risk of HCV infection among persons working at perimeter security may suggest distance from the site was protective against HCV infection. This may lend further support to an increased risk of HCV infection when exposed to sewage at the site.
While the association between HCV infection and contact with sewage has not been widely reported, this association is biologically plausible. A cross-sectional study of 98 stool samples without occult blood from 98 persons with chronic HCV infection found that 68 (69%) had detectable HCV RNA [ 27 ]. Similarly, a cross-sectional study of 12 stool samples from 12 persons with chronic HCV infection found that 10 (83%) had detectable HCV RNA [ 28 ]. Ciesek et al. found HCV RNA infectivity in a liquid environment lasted up to 5 months at lower temperatures (4 °C) and up to 21 days at room temperature (21 °C) [ 29 ]. Paintsil et al. found that an HCV clone dried on fomite surfaces maintained its infectivity for up to 6 weeks at 4 °C and 22 °C [ 30 ]. Thus, WTCGRC members who reported sewage contact may have come into contact with infective HCV RNA and acquired HCV infection through breaks in their skin or mucous membranes.
This study did not find associations between contact with human remains and/or blood and bodily fluid, and HCV infection. The OSHA requirements for the use of protective gear when contacting these substances [ 5 , 6 ] have likely increased worker awareness of their danger and may have led to self-protective behaviors, such as avoidance of these substances when possible and washing hands immediately after contact.
There are several limitations to this study. Most importantly, the WTCGRC baseline questionnaire did not clearly define what constitutes “contact with sewage”, leaving the term open to responders' individual interpretations. However, given the lack of knowledge of most responders about their HCV status at the time they completed the baseline questionnaire, such misclassification would have been non-differential, leading to an underestimate of the association between contact with sewage and HCV infection. Because some participants enrolled in the WTCGRC Research Study many years after the WTC events, there may be misclassification of exposure due to difficulties with recall. Here too, it is likely that the misclassification would have been non-differential, leading to an underestimate of the association between contact with sewage and HCV infection. This study is unable to determine when HCV infection occurred. IDU is closely associated with HCV infection and participants may not report past IDU. However, it is unlikely that IDU is correlated with contact with sewage at the WTC site, and, therefore, missing data on IDU is unlikely to confound the relationship between sewage and HCV infection. There are some unmeasured confounders in this study, including occupation before and after WTC.
5. Conclusions
This study suggests that existing cohort studies may serve as venues for HCV screening and linkage to care. More research is needed on the relationship between contact with sewage and HCV infection. The combination of the current nested case-control study and the cross-sectional study of HCV in Brazil [ 22 ] suggests that contact with sewage may be a risk factor for HCV in persons with occupational contact with sewage and persons with non-occupational contact with sewage. The CDC now recommends that all adults in the US, ages 18 and over, get tested for HCV at least once in their lifetime [ 31 ]. Additional research is needed to determine if the association we detected is real and whether additional HCV screening is appropriate for persons with ongoing contact with sewage.
Acknowledgments:
We thank the staff of the World Trade Center Clinical Centers of Excellence; the labor, community, and volunteer organization stakeholders; and the World Trade Center rescue and recovery workers, who responded generously with their service to the World Trade Center attacks and to whom the World Trade Center programs are dedicated.
Supplementary Materials:
The following are available in the online version: Table S1 : Data sources for nested case-control study, Figure S1: Prevalence of hepatitis C virus antibodies by year of birth in members of the World Trade Center General Responder Cohort and the US population, Table S2 : Association between participant activity at the World Trade Center site and hepatitis C virus antibody status, Table S3 . Characteristics of World Trade Center (WTC) perimeter security workers from a subset of the WTC General Responder Cohort, recruited and tested for hepatitis C virus antibodies from December 15, 2016 – July 12, 2018 (N=520), Table S4 . Characteristics of World Trade Center (WTC) truck drivers from a subset of the WTC General Responder Cohort, recruited and tested for HCV from December 15, 2016 – July 12, 2018 (N=24).
Table S2.
Association between participant activity at the World Trade Center site and hepatitis C virus (HCV) antibody status among a subset of the World Trade Center General Responder Cohort, recruited and tested for HCV from December 15, 2016 - July 12, 2018 (N=2,632).
HCV: hepatitis C virus
† Of the 2571 persons without HCV antibodies (controls), 21 (1%) did not provide any data about the type of activities they performed at the WTC site. The percentages presented here are percentages among the controls for whom there is data on activity at the WTC site.
‡ Analyzed using Fisher's Exact test, which determined the association between participation in the listed activity vs. no participation in the listed activity and HCV infection.
§ Analyzed using the Chi-square test, which determined the association between participation in the listed activity vs. no participation in the listed activity and HCV infection.
This work was supported by the US Centers for Disease Control and Prevention (National Institute for Occupational Safety and Health Grant U01OH011307). The funding body had no role in the design of the study and collection, analysis, and interpretation of data and in writing the manuscript.
Institutional Review Board Statement:
This study was approved by the Icahn School of Medicine Institutional Review Board (Study 16-01343) that conforms to the ethical guidelines of the 1975 Declaration of Helsinki.
Informed Consent Statement:
Written informed consent was obtained from all individual participants included in the study.
Declaration of Interest:
Douglas T. Dieterich is a consultant and speaker for Gilead and AbbVie. The other authors declare no conflicts of interest.
- Ghany MG, Strader DB, Thomas DL, Seeff LB; American Association for the Study of Liver Diseases. Diagnosis, management, and treatment of hepatitis C: an update. Hepatology. Apr 2009;49(4):1335–74. doi: 10.1002/hep.22759.
- Peveling-Oberhag J, Arcaini L, Hansmann ML, Zeuzem S. Hepatitis C-associated B-cell non-Hodgkin lymphomas. Epidemiology, molecular signature and clinical management. J Hepatol. Jul 2013;59(1):169–77. doi: 10.1016/j.jhep.2013.03.018.
- Smith BD, Beckett GA, Yartel A, Holtzman D, Patel N, Ward JW. Previous exposure to HCV among persons born during 1945-1965: prevalence and predictors, United States, 1999-2008. Am J Public Health. Mar 2014;104(3):474–81. doi: 10.2105/AJPH.2013.301549.
- Boscarino JA, Sitarik A, Gordon SC, et al. Risk factors for hepatitis C infection among Vietnam era veterans versus nonveterans: results from the Chronic Hepatitis Cohort Study (CHeCS). J Community Health. Oct 2014;39(5):914–21. doi: 10.1007/s10900-014-9863-5.
- Occupational Safety & Health Administration. Bloodborne pathogens (29 CFR 1910.1030). 2012. Accessed November 26, 2019. https://www.osha.gov/laws-regs/regulations/standardnumber/1910/1910.1030
- Occupational Safety & Health Administration. Personal Protective Equipment (29 CFR 1910 Subpart I). 2019. Accessed November 26, 2019. https://www.osha.gov/laws-regs/regulations/standardnumber/1910/1910SubpartI
- Wisnivesky JP, Teitelbaum SL, Todd AC, et al. Persistence of multiple illnesses in World Trade Center rescue and recovery workers: a cohort study. Lancet. Sep 3 2011;378(9794):888–97. doi: 10.1016/S0140-6736(11)61180-X.
- Centers for Disease Control and Prevention. National Institute for Occupational Safety and Health. World Trade Center Health Program. 2019. Accessed December 16, 2019. https://www.cdc.gov/wtc/clinics.html
- Dasaro CR, Holden WL, Berman KD, et al. Cohort Profile: World Trade Center Health Program General Responder Cohort. Int J Epidemiol. Apr 1 2017;46(2):e9. doi: 10.1093/ije/dyv099.
- Smith BD, Morgan RL, Beckett GA, et al. Recommendations for the identification of chronic hepatitis C virus infection among persons born during 1945-1965. MMWR Recomm Rep. Aug 17 2012;61(RR-4):1–32.
- Centers for Disease Control and Prevention (CDC), National Center for Health Statistics (NCHS). National Health and Nutrition Examination Survey Data. Hyattsville, MD: U.S. Department of Health and Human Services, Centers for Disease Control and Prevention. https://wwwn.cdc.gov/nchs/nhanes/Default.aspx
- Stroffolini T, Lorenzoni U, Menniti-Ippolito F, Infantolino D, Chiaramonte M. Hepatitis C virus infection in spouses: sexual transmission or common exposure to the same risk factors? Am J Gastroenterol. Nov 2001;96(11):3138–41. doi: 10.1111/j.1572-0241.2001.05267.x.
- Terrault NA. Sexual activity as a risk factor for hepatitis C. Hepatology. Nov 2002;36(5 Suppl 1):S99–105. doi: 10.1053/jhep.2002.36797.
- Denniston MM, Jiles RB, Drobeniuc J, et al. Chronic hepatitis C virus infection in the United States, National Health and Nutrition Examination Survey 2003 to 2010. Ann Intern Med. Mar 4 2014;160(5):293–300. doi: 10.7326/M13-1133.
- Li CY, Sung FC. A review of the healthy worker effect in occupational epidemiology. Occup Med (Lond). May 1999;49(4):225–9. doi: 10.1093/occmed/49.4.225.
- Micallef JM, Kaldor JM, Dore GJ. Spontaneous viral clearance following acute hepatitis C infection: a systematic review of longitudinal studies. J Viral Hepat. Jan 2006;13(1):34–41. doi: 10.1111/j.1365-2893.2005.00651.x.
- Aisyah DN, Shallcross L, Hully AJ, O'Brien A, Hayward A. Assessing hepatitis C spontaneous clearance and understanding associated factors-A systematic review and meta-analysis. J Viral Hepat. Jun 2018;25(6):680–698. doi: 10.1111/jvh.12866.
- Galbraith JW, Franco RA, Donnelly JP, et al. Unrecognized chronic hepatitis C virus infection among baby boomers in the emergency department. Hepatology. Mar 2015;61(3):776–82. doi: 10.1002/hep.27410.
- Jonas MC, Rodriguez CV, Redd J, Sloane DA, Winston BJ, Loftus BC. Streamlining Screening to Treatment: The Hepatitis C Cascade of Care at Kaiser Permanente Mid-Atlantic States. Clin Infect Dis. May 15 2016;62(10):1290–1296. doi: 10.1093/cid/ciw086.
- Coyle C, Viner K, Hughes E, et al. Identification and Linkage to Care of HCV-Infected Persons in Five Health Centers - Philadelphia, Pennsylvania, 2012-2014. MMWR Morb Mortal Wkly Rep. May 8 2015;64(17):459–63.
- Brautbar N, Navizadeh N. Sewer workers: occupational risk for hepatitis C–report of two cases and review of literature. Arch Environ Health. Sep-Oct 1999;54(5):328–30. doi: 10.1080/00039899909602495.
- Pereira LM, Martelli CM, Moreira RC, et al. Prevalence and risk factors of Hepatitis C virus infection in Brazil, 2005 through 2009: a cross-sectional study. BMC Infect Dis. Feb 1 2013;13:60. doi: 10.1186/1471-2334-13-60.
- Sheikh SHA, William GP, Ansari F, Waqqar S, Bano S. Hepatits Prevalence of Epidemic Magnitude among Sanitary Workers in Pakistan. IOSR J Dent Med Sci. Jun 2014;13(6):58–61.
- Hassanein FI, Masoud IM, Shehata AI. Infection hazard of exposure to intestinal parasites, H. pylori and hepatitis viruses among municipal sewage workers: a neglected high risk population. Parasitol United J. Aug 2019;12(2):130–138. doi: 10.21608/puj.2019.13679.1047.
- Kandeel A, Genedy M, El-Refai S, Funk AL, Fontanet A, Talaat M. The prevalence of hepatitis C virus infection in Egypt 2015: implications for future policy on prevention and treatment. Liver Int. Jan 2017;37(1):45–53. doi: 10.1111/liv.13186.
- Valway S, Jenison S, Keller N, Vega-Hernandez J, Hubbard McCree D. Risk assessment and screening for sexually transmitted infections, HIV, and hepatitis virus among long-distance truck drivers in New Mexico, 2004-2006. Am J Public Health. Nov 2009;99(11):2063–8. doi: 10.2105/AJPH.2008.145383.
- Heidrich B, Steinmann E, Plumeier I, et al. Frequent detection of HCV RNA and HCVcoreAg in stool of patients with chronic hepatitis C. J Clin Virol. Jul 2016;80:1–7. doi: 10.1016/j.jcv.2016.04.006.
- Monrroy H, Angulo J, Pino K, et al. Detection of high biliary and fecal viral loads in patients with chronic hepatitis C virus infection. Gastroenterol Hepatol. May 2017;40(5):339–347. doi: 10.1016/j.gastrohep.2017.01.004.
- Ciesek S, Friesland M, Steinmann J, et al. How stable is the hepatitis C virus (HCV)? Environmental stability of HCV and its susceptibility to chemical biocides. J Infect Dis. Jun 15 2010;201(12):1859–66. doi: 10.1086/652803.
- Paintsil E, Binka M, Patel A, Lindenbach BD, Heimer R. Hepatitis C virus maintains infectivity for weeks after drying on inanimate surfaces at room temperature: implications for risks of transmission. J Infect Dis. Apr 15 2014;209(8):1205–11. doi: 10.1093/infdis/jit648.
- Schillie S, Wester C, Osborne M, Wesolowski L, Ryerson AB. CDC Recommendations for Hepatitis C Screening Among Adults - United States, 2020. MMWR Recomm Rep. Apr 10 2020;69(2):1–17. doi: 10.15585/mmwr.rr6902a1.
nested case control study
Quick reference.
A case control study that utilizes cases and control subjects already being studied for another purpose; often part of the larger population of a cohort study. The cases are those that arise in the larger population; the controls are other members of the same study population age- and sex-matched, but without the condition of interest. This is a nested portion of a larger group for all of whom some relevant information already exists.
From: nested case control study in A Dictionary of Public Health »
Subjects: Medicine and health — Public Health and Epidemiology
PRINTED FROM OXFORD REFERENCE (www.oxfordreference.com). (c) Copyright Oxford University Press, 2023. All Rights Reserved. Under the terms of the licence agreement, an individual user may print out a PDF of a single entry from a reference work in OR for personal use (for details see Privacy Policy and Legal Notice ).
date: 07 November 2023
Nested case-control studies.
- Ernster VL 1
Preventive Medicine , 01 Sep 1994 , 23(5): 587-590 https://doi.org/10.1006/pmed.1994.1093 PMID: 7845919
Abstract
Full text links .
Read article at publisher's site: https://doi.org/10.1006/pmed.1994.1093
Citations & impact
Impact metrics, citations of article over time, alternative metrics.

Smart citations by scite.ai Smart citations by scite.ai include citation statements extracted from the full text of the citing article. The number of the statements may be higher than the number of citations provided by EuropePMC if one paper cites another multiple times or lower if scite has not yet processed some of the citing articles. Explore citation contexts and check if this article has been supported or disputed. https://scite.ai/reports/10.1006/pmed.1994.1093
Article citations, drug-drug interactions of docetaxel in patients with breast cancer based on insurance claims data..
Shin KH , Ah YM , Cha SH , Choi HD
PLoS One , 18(6):e0287382, 16 Jun 2023
Cited by: 0 articles | PMID: 37327237 | PMCID: PMC10275435
Free to read & use
Risks of leukemia, intracranial tumours and lymphomas in childhood and early adulthood after pediatric radiation exposure from computed tomography.
Wang WH , Sung CY , Wang SC , Shao YJ
CMAJ , 195(16):E575-E583, 01 Apr 2023
Cited by: 0 articles | PMID: 37094867 | PMCID: PMC10125186
Antibiotic Exposure is Associated With a Risk of Esophageal Adenocarcinoma.
Thanawala SU , Kaplan DE , Falk GW , Beveridge CA , Schaubel D , Serper M , Yang YX
Clin Gastroenterol Hepatol , 21(11):2817-2824.e4, 24 Mar 2023
Cited by: 0 articles | PMID: 36967101
Association between WHO First-Step Analgesic Use and Risk of Breast Cancer in Women of Working Age.
Oh HS , Seo HJ
Pharmaceuticals (Basel) , 16(2):323, 20 Feb 2023
Cited by: 0 articles | PMID: 37259467 | PMCID: PMC9961524
Data analysis guidelines for single-cell RNA-seq in biomedical studies and clinical applications.
Su M , Pan T , Chen QZ , Zhou WW , Gong Y , Xu G , Yan HY , Li S , Shi QZ , Zhang Y , He X , Jiang CJ , Fan SC , Li X , Cairns MJ , Wang X , Li YS
Mil Med Res , 9(1):68, 02 Dec 2022
Cited by: 4 articles | PMID: 36461064 | PMCID: PMC9716519
Review Free to read & use
Similar Articles
To arrive at the top five similar articles we use a word-weighted algorithm to compare words from the Title and Abstract of each citation.
[Application of nested case-control study on safe evaluation of post-marketing traditional Chinese medicine injection].
Xiao Y, Zhao Y, Xie Y
Zhongguo Zhong Yao Za Zhi, 36(20):2796-2798, 01 Oct 2011
Cited by: 0 articles | PMID: 22292368
Design options for molecular epidemiology research within cohort studies.
Rundle AG, Vineis P, Ahsan H
Cancer Epidemiol Biomarkers Prev, 14(8):1899-1907, 01 Aug 2005
Cited by: 61 articles | PMID: 16103435
Combining data from 2 nested case-control studies of overlapping cohorts to improve efficiency.
Salim A, Hultman C, Sparén P, Reilly M
Biostatistics, 10(1):70-79, 10 Jun 2008
Cited by: 19 articles | PMID: 18550564
Efficiency of cohort sampling designs: some surprising results.
Langholz B, Thomas DC
Biometrics, 47(4):1563-1571, 01 Dec 1991
Cited by: 15 articles | PMID: 1786329
The nested case-control study in cardiology.
Essebag V, Genest J, Suissa S, Pilote L
Am Heart J, 146(4):581-590, 01 Oct 2003
Cited by: 99 articles | PMID: 14564310
Nested case-control study of selected systemic autoimmune diseases in World Trade Center rescue/recovery workers
Affiliation.
- 1 Montefiore Medical Center and Albert Einstein College of Medicine, Bronx, New York, and Fire Department of the City of New York, Bureau of Health Services, Brooklyn, New York.
- PMID: 25779102
- PMCID: PMC5562156
- DOI: 10.1002/art.39059
Objective: To test the a priori hypothesis that acute and chronic work exposures to the World Trade Center (WTC) site on or after September 11, 2001 were associated with risk of new-onset systemic autoimmune diseases.
Methods: A nested case-control study was performed in WTC rescue/recovery workers who had received a rheumatologist-confirmed systemic autoimmune disease diagnosis between September 12, 2001 and September 11, 2013 (n = 59), each of whom was individually matched to 4 randomly selected controls (n = 236) on the basis of year of hire (±1 year), sex, race, and work assignment (firefighter or emergency medical service). Acute exposure was defined according to the earliest time of arrival (morning of 9/11 versus later) at the WTC site, and chronic exposure was defined as duration (number of months) of WTC site-related work. Rheumatologists were blinded with regard to each subject's exposure status. The conditional odds ratios (CORs) with 95% confidence intervals (95% CIs) for incident autoimmune disease were derived from exact conditional logistic regression models.
Results: Rheumatoid arthritis was the most common autoimmune diagnosis (37% of subjects), followed by spondyloarthritis (22%), inflammatory myositis (14%), systemic lupus erythematosus (12%), systemic sclerosis (5%), Sjögren's syndrome (5%), antiphospholipid syndrome (3%), and granulomatosis with polyangiitis (Wegener's) (2%). The COR for incident autoimmune disease increased by 13% (COR 1.13, 95% CI 1.02-1.26) for each additional month worked at the WTC site. These odds were independent of the association between high acute exposure (working during the morning of 9/11) and disease outcome, which conveyed an elevated, but not statistically significant, risk (COR 1.85, 95% CI 0.86-3.89).
Conclusion: Prolonged work at the WTC site, independent of acute exposure, was an important predictor of post-9/11 systemic autoimmune diseases. The WTC Health Program should expand surveillance efforts for those with extended exposures, as early detection can facilitate early treatment, which has been shown to minimize organ damage and improve quality of life.
© 2015, American College of Rheumatology.
Publication types
- Research Support, U.S. Gov't, P.H.S.
MeSH terms
- Antiphospholipid Syndrome / epidemiology
- Arthritis, Rheumatoid / epidemiology
- Autoimmune Diseases / epidemiology*
- Case-Control Studies
- Emergency Medical Technicians / statistics & numerical data
- Emergency Responders / statistics & numerical data*
- Environmental Exposure / statistics & numerical data*
- Firefighters / statistics & numerical data
- Granulomatosis with Polyangiitis / epidemiology
- Lupus Erythematosus, Systemic / epidemiology
- Middle Aged
- Myositis / epidemiology
- Rescue Work*
- Scleroderma, Systemic / epidemiology
- September 11 Terrorist Attacks*
- Sjogren's Syndrome / epidemiology
- Spondylarthropathies / epidemiology
- Young Adult
Grants and funding
- U01 OH010513/OH/NIOSH CDC HHS/United States
- 1 U01-OH-010513/OH/NIOSH CDC HHS/United States
Prediagnostic Hormone Levels and Risk of Testicular Germ Cell Tumors: A Nested Case–Control Study in the Janus Serum Bank

Cancer Epidemiol Biomarkers Prev 2023;32:1564–71
- Funder(s): National Institutes of Health Intramural Research Program
- Version of Record November 1 2023
- Proof September 6 2023
- Accepted Manuscript August 24 2023
Zeni Wu, Britton Trabert, Chantal Guillemette, Patrick Caron, Gary Bradwin, Barry I. Graubard, Elisabete Weiderpass, Giske Ursin, Hilde Langseth, Katherine A. McGlynn; Prediagnostic Hormone Levels and Risk of Testicular Germ Cell Tumors: A Nested Case–Control Study in the Janus Serum Bank. Cancer Epidemiol Biomarkers Prev 1 November 2023; 32 (11): 1564–1571. https://doi.org/10.1158/1055-9965.EPI-23-0772
It has been hypothesized that poorly functioning Leydig and/or Sertoli cells of the testes, indicated by higher levels of serum gonadotropins and lower levels of androgens, are related to the development of testicular germ cell tumors (TGCT). To investigate this hypothesis, we conducted a nested case–control study within the Janus Serum Bank cohort.
Men who developed TGCT (n = 182) were matched to men who did not (n = 364). Sex steroid hormones were measured using LC/MS. Sex hormone binding globulin, follicle-stimulating hormone (FSH), and luteinizing hormone (LH) were quantified by direct immunoassay. Multivariable logistic regression was used to calculate ORs and 95% confidence intervals (CI) for associations between hormone levels and TGCT risk.
Higher FSH levels [tertile (T) 3 vs. T2: OR = 2.89, 95% CI = 1.83–4.57] were associated with TGCT risk, but higher LH levels were not (OR = 1.26, 95% CI = 0.81–1.96). The only sex steroid hormone associated with risk was androstane-3α, 17β-diol-3G (3α-diol-3G; OR = 2.37, 95% CI = 1.46–3.83). Analysis by histology found that increased FSH levels were related to seminoma (OR = 3.55, 95% CI = 2.12–5.95) but not nonseminoma (OR = 1.19, 95% CI = 0.38–3.13). Increased levels of 3α-diol-3G were related to seminoma (OR = 2.29, 95% CI = 1.35–3.89) and nonsignificantly related to nonseminoma (OR = 2.71, 95% CI = 0.82–8.92).
Higher FSH levels are consistent with the hypothesis that poorly functioning Sertoli cells are related to the development of TGCT. In contrast, higher levels of 3α-diol-3G do not support the hypothesis that insufficient androgenicity is related to risk of TGCT.
Clarifying the role of sex hormones in the development of TGCT may stimulate new research hypotheses.
- Online ISSN 1538-7755
- Print ISSN 1055-9965
- Copyright © 2023 by the American Association for Cancer Research.
A nested case-control (NCC) study is a variation of a case-control study in which cases and controls are drawn from the population in a fully enumerated cohort. Usually, the exposure of interest is only measured among the cases and the selected controls. Thus, the nested case-control study is more efficient than the full cohort design.
A nested case-control study is an efficient design that can be embedded within an existing cohort study or randomised trial. It has a number of advantages compared to the conventional case-control design, and has the potential to answer important research questions using untapped prospectively collected data.
The main advantages of a nested case-control study are as follows: (1) cost reduction and effort minimization, as only a fraction of the parent cohort requires the necessary outcome assessment; (2) reduced selection bias, as both case and control subjects are sampled from the same population; and (3) flexibility in analysis by allowing testing ...
The nested case-control study (NCC) design within a prospective cohort study is used when outcome data are available for all subjects, but the exposure of interest has not been collected, and is difficult or prohibitively expensive to obtain for all subjects. A NCC analysis with good matching procedures yields estimates that are as efficient and unbiased as estimates from the full cohort study.
A major flaw inherent to case-control studies, described as early as 1959, is the difficulty of ensuring that cases and controls are a representative sample of the same source population. In a nested case-control study, the cases emerge from a well-defined source population and the controls are sampled from that same population.
Of the four main types of case-control studies, we will focus on the basic case-control study and the nested case-control study. Other types include the case-cohort study and the case-crossover study, which are discussed elsewhere [9]. In a nested case-control study, the case-control study is embedded within a cohort of patients, and cases and controls are both selected from the same cohort.
The analysis of nested case-control studies uses a proportional hazards model and a modification to the partial likelihood used in full-cohort studies, giving estimates of hazard ratios. Extensions to other survival models are possible. In the standard design, controls are selected randomly from the risk set for each case; however, more ...
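The modified partial likelihood mentioned above can be written out as a sketch. The notation is an assumption introduced here for illustration and does not come from the quoted source: $i_j$ denotes the $j$-th case, $\tilde{R}_j$ the sampled risk set (the case together with its selected controls), $x_k$ the covariate vector of subject $k$, and $\beta$ the vector of log hazard ratios.

```latex
L(\beta) \;=\; \prod_{j=1}^{J}
  \frac{\exp\!\left(\beta^{\top} x_{i_j}\right)}
       {\sum_{k \in \tilde{R}_j} \exp\!\left(\beta^{\top} x_k\right)}
```

The only change from the full-cohort partial likelihood is that the sum in the denominator runs over the sampled risk set rather than over everyone at risk at the case's event time. This has the same form as the likelihood of a conditional logistic regression with one stratum per matched set.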
This is a retrospective nested case-control study. The initial sample size was 8462 admitted to a single cerebrovascular specialty hospital with acute stroke. A total of 156 fall events occurred, and each fall case was randomly matched with six control cases. ... or better than, those produced by traditional tools if sufficient data and ...
Methods: Based on the Jinchang cohort, a nested case-control study was conducted in 1,025 new cases of diabetes after excluding patients with malignant tumors and related endocrine and circulatory system diseases; an age- (±2 years) and gender-matched 1:1 control group of 1,025 subjects was then selected to analyze the relationship between the incidence of diabe...
In this retrospective nested case-control study, we found several risk factors for neurological disorders in HIV-infected people and then developed a simple risk scoring system to identify those at risk. To the best of our knowledge, this scoring system is the first to be specifically designed for identifying neurological disorders in people ...
The nested case-control design, like the case-cohort design, is a schema in which a representative sample of a full cohort is used. It includes all cases and a pre-specified number of controls randomly chosen from the risk set of each failure time (Thomas, 1977). The design is also referred to as incidence density sampling or risk set sampling.
Suppose a prospective cohort study were conducted among almost 90,000 women for the purpose of studying the determinants of cancer and cardiovascular disease.
In a nested case-control study, controls are selected for each case from the individuals who are at risk at the time at which the case occurs. We say that the controls are matched on study time. To adjust for possible confounding, it is common to match on other variables as well. The standard analysis of nested case-control data is based on a partial likelihood which compares the ...
In the nested case-control study, cases of a disease that occur in a defined cohort are identified and, for each, a specified number of matched controls is selected from among those in the cohort who have not developed the disease by the time of disease occurrence in the case.
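The control selection described above (risk set sampling) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure of any study quoted here: the function name `sample_nested_case_control`, the record fields `id`, `time`, and `event`, and the toy cohort are all hypothetical, and matching on additional variables such as age or sex is left out.

```python
import random


def sample_nested_case_control(cohort, n_controls, seed=0):
    """Draw up to n_controls controls per case from that case's risk set.

    cohort: list of dicts with hypothetical keys
        'id'    - subject identifier
        'time'  - follow-up time (event time for cases, censoring time otherwise)
        'event' - True if the subject became a case
    """
    rng = random.Random(seed)
    matched_sets = []
    # Process cases in order of their event times.
    cases = sorted((s for s in cohort if s["event"]), key=lambda s: s["time"])
    for case in cases:
        # Risk set: everyone still under follow-up at the case's event time,
        # excluding the case itself. Subjects who become cases later remain
        # eligible as controls for earlier cases.
        risk_set = [
            s for s in cohort
            if s["time"] >= case["time"] and s["id"] != case["id"]
        ]
        k = min(n_controls, len(risk_set))
        controls = rng.sample(risk_set, k)
        matched_sets.append({
            "case": case["id"],
            "time": case["time"],
            "controls": [c["id"] for c in controls],
        })
    return matched_sets


if __name__ == "__main__":
    toy_cohort = [
        {"id": "a", "time": 2.0, "event": True},
        {"id": "b", "time": 5.0, "event": False},
        {"id": "c", "time": 3.0, "event": True},
        {"id": "d", "time": 1.0, "event": False},
        {"id": "e", "time": 4.0, "event": False},
    ]
    for matched in sample_nested_case_control(toy_cohort, n_controls=2):
        print(matched)
```

Because controls are drawn from those still at risk at each case's event time, a subject can serve as a control for an early case and later become a case. Each returned matched set then forms one stratum in a conditional analysis.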
When certain covariates are difficult to obtain, however, researchers may only have the resources to sub-sample patients on whom to collect complete data: one way is using the nested case-control (NCC) design, in which risk set sampling is performed based on a single outcome.
a) The nested case-control study is a retrospective design
b) The study design minimised selection bias compared with a case-control study
c) Recall bias was minimised compared with a case-control study
d) Causality could be inferred from the association between prescription of antipsychotic drugs and venous thromboembolism
The present nested case-control study measured the relative risk of self-reported breast cancer associated with dietary phosphate intake over 10 annual visits in a cohort of middle-aged U.S ...
There are two main types of cohort sampling designs: nested case-control studies and case-cohort studies; see e.g. Keogh and Cox (2014, Chaps. 7, 8) for a review. The two types of cohort sampling designs differ in the way controls are selected. In a nested case-control study, controls are selected for each case from the individuals
Background: Despite its benefits, it is uncommon to apply the nested case-control design in diagnostic research. We aim to show advantages of this design for diagnostic accuracy studies. Methods: We used data from a full cross-sectional diagnostic study comprising a cohort of 1295 consecutive patients who were selected on their suspicion of having deep vein thrombosis (DVT). We draw nested case ...
Objectives: We studied the association of laboratory testing beyond the international normalised ratio (INR) with bleeding and stroke/transient ischaemic attack (TIA) outcomes in patients with atrial fibrillation treated with warfarin. Design: This was a retrospective nested case-control study from the Finnish Warfarin in Atrial Fibrillation (FinWAF) registry (n=54 568), reporting the management ...
A nested case-control study compared 61 HCV antibody positive cases to 2571 controls. Multivariable logistic regression models adjusting for time of birth, traditional HCV risk factors, and type of work at the World Trade Center (WTC) site, determined if contact with human remains, blood/bodily fluids, and/or sewage at the WTC site was ...
A case control study that utilizes cases and control subjects already being studied for another purpose; often part of the larger population of a cohort study. The cases are those that arise in the larger population; the controls are other members of the same study population age ...
The nested case-control study design (or the case-control in a cohort study) is described here and compared with other designs, including the classic case-control and cohort studies and the case-cohort study. In the nested case-control study, cases of a disease that occur in a defined cohort are identified and, for each, a specified number of ...