
What Is a Case-Control Study? | Definition & Examples

Published on February 4, 2023 by Tegan George. Revised on June 22, 2023.

A case-control study is an observational research design that compares a group of participants who have a condition of interest to a very similar group who lack that condition. Here, the participants who have the attribute under study, such as a disease, are called the “cases,” and those without it are the “controls.”

It’s important to remember that the case group is chosen because they already possess the attribute of interest. The point of the control group is to facilitate investigation, e.g., studying whether the case group systematically exhibits that attribute more than the control group does.


Case-control studies are a type of observational study often used in fields like medical research, environmental health, or epidemiology. While many observational studies are qualitative in nature, case-control studies can also be quantitative, and they often are in healthcare settings. Case-control studies can be used for both exploratory and explanatory research, and they are a good choice for studying research topics like disease exposure and health outcomes.

A case-control study may be a good fit for your research if it meets the following criteria.

  • Data on exposure (e.g., to a chemical or a pesticide) are difficult to obtain or expensive.
  • The disease associated with the exposure you’re studying has a long incubation period or is rare or under-studied (e.g., AIDS in the early 1980s).
  • The population you are studying is difficult to contact for follow-up questions (e.g., asylum seekers).

Retrospective cohort studies use existing secondary research data, such as medical records or databases, to identify a group of people with a common exposure or risk factor and to observe their outcomes over time. Case-control studies conduct primary research, comparing a group of participants possessing a condition of interest to a very similar group lacking that condition in real time.


Case-control studies are common in fields like epidemiology, healthcare, and psychology.

You would then collect data on your participants’ exposure to contaminated drinking water, focusing on variables such as the source of said water and the duration of exposure, for both groups. You could then compare the two to determine if there is a relationship between drinking water contamination and the risk of developing a gastrointestinal illness.

Example: Healthcare case-control study

You are interested in the relationship between the dietary intake of a particular vitamin (e.g., vitamin D) and the risk of developing osteoporosis later in life. Here, the case group would be individuals who have been diagnosed with osteoporosis, while the control group would be individuals without osteoporosis.

You would then collect information on dietary intake of vitamin D for both the cases and controls and compare the two groups to determine if there is a relationship between vitamin D intake and the risk of developing osteoporosis.

Example: Psychology case-control study

You are studying the relationship between early-childhood stress and the likelihood of later developing post-traumatic stress disorder (PTSD). Here, the case group would be individuals who have been diagnosed with PTSD, while the control group would be individuals without PTSD.

Case-control studies are a solid research method choice, but they come with distinct advantages and disadvantages.

Advantages of case-control studies

  • Case-control studies are a great choice if you have any ethical considerations about your participants that could preclude you from using a traditional experimental design.
  • Case-control studies are time efficient and fairly inexpensive to conduct because they require fewer subjects than other research methods.
  • Case-control studies can examine multiple exposures that lead to a single outcome. As such, they truly shine when used to study rare outcomes or outbreaks of a particular disease.

Disadvantages of case-control studies

  • Like other observational studies, case-control studies run a high risk of research biases. They are particularly susceptible to observer bias, recall bias, and interviewer bias.
  • When the exposure under study is very rare, a case-control study can be very time-consuming and inefficient.
  • Case-control studies generally have low internal validity, so their findings are not always credible.

Case-control studies by design focus on one singular outcome. This makes them very rigid and not generalizable, as no extrapolation can be made about other outcomes like risk recurrence or future exposure threat. This leads to less satisfying results than other methodological choices.


A case-control study differs from a cohort study because cohort studies are more longitudinal in nature and do not necessarily require a control group.

While one may be added if the investigator so chooses, members of the cohort are primarily selected because of a shared characteristic among them. In particular, retrospective cohort studies are designed to follow a group of people with a common exposure or risk factor over time and observe their outcomes.

Case-control studies, in contrast, require both a case group and a control group, as suggested by their name, and usually are used to identify risk factors for a disease by comparing cases and controls.

A case-control study differs from a cross-sectional study because case-control studies are naturally retrospective in nature, looking backward in time to identify exposures that may have occurred before the development of the disease.

On the other hand, cross-sectional studies collect data on a population at a single point in time. The goal here is to describe the characteristics of the population, such as their age, gender identity, or health status, and understand the distribution and relationships of these characteristics.

Cases and controls are selected for a case-control study based on their inherent characteristics. Participants already possessing the condition of interest form the “case,” while those without form the “control.”

Keep in mind that by definition the case group is chosen because they already possess the attribute of interest. The point of the control group is to facilitate investigation, e.g., studying whether the case group systematically exhibits that attribute more than the control group does.

The strength of the association between an exposure and a disease in a case-control study is most often measured with the odds ratio (OR). Relative risk (RR) generally cannot be calculated directly from case-control data, although the OR approximates it when the disease is rare.

No, case-control studies cannot establish causality as a standalone measure.

As observational studies, they can suggest associations between an exposure and a disease, but they cannot prove without a doubt that the exposure causes the disease. In particular, issues arising from timing, research biases like recall bias, and the selection of variables lead to low internal validity and the inability to determine causality.



Statistics By Jim


Case Control Study: Definition, Benefits & Examples

By Jim Frost

What is a Case Control Study?

A case control study is a retrospective, observational study that compares two existing groups. Researchers form these groups based on the existence of a condition in the case group and the lack of that condition in the control group. They evaluate the differences in the histories between these two groups looking for factors that might cause a disease.


By evaluating differences in exposure to risk factors between the case and control groups, researchers can learn which factors are associated with the medical condition.

For example, medical researchers study disease X and use a case-control study design to identify risk factors. They create two groups using available medical records from hospitals. Individuals with disease X are in the case group, while those without it are in the control group. If the case group has more exposure to a risk factor than the control group, that exposure is a potential cause for disease X. However, case-control studies establish only correlation and not causation. Be aware of spurious correlations!

Case-control studies are observational studies because researchers do not control the risk factors—they only observe them. They are retrospective studies because the scientists create the case and control groups after the outcomes for the subjects (e.g., disease vs. no disease) are known.

This post explains the benefits and limitations of case-control studies, controlling confounders, and analyzing and interpreting the results. I close with an example case control study showing how to calculate and interpret the results.


Benefits of a Case Control Study

A case control study is a relatively quick and simple design. These studies frequently use existing patient data, and the researchers form the groups after the outcomes are known. Researchers do not conduct an experiment. Instead, they look for differences between the case and control groups that are potential risk factors for the condition. Small groups and individual facilities can conduct case-control studies, unlike more intensive study designs.

Case-control studies are perfect for evaluating outbreaks and rare conditions. Researchers simply need to let a sufficient number of known cases accumulate in an established database. The alternative would be to select a large random sample and hope that the condition afflicts it eventually.

A case control study can provide rapid results during outbreaks where the researchers need quick answers. They are ideal for the preliminary investigation phase, where scientists screen potential risk factors. As such, they can point the way for more thorough, time-consuming, and expensive studies. They are especially beneficial when the current state of science knows little about the connection between risk factors and the medical condition. And when you need to identify potential risk factors quickly!

Cohort studies are another type of observational study that are similar to case-control studies, but there are some important differences.

Limitations of a Case Control Study

Because case-control studies are observational, they cannot establish causality and provide lower quality evidence than experimental designs, such as randomized controlled trials. Additionally, as you’ll see in the next section, this type of study is susceptible to confounding variables unless researchers correctly match traits between the two groups.

A case-control study typically depends on health records. If the necessary data exist in sources available to the researchers, all is good. However, the investigation becomes more complicated if the data are not readily available.

Case-control studies can incorporate biases from the underlying data sources. For example, researchers frequently obtain patient data from hospital records. The population of hospital patients is likely to differ from the general population. Even the control patients are in the hospital for some reason—they likely have serious health problems. Consequently, the subjects in case-control studies are likely to differ from the general population, which reduces the generalizability of the results.

A case-control study cannot estimate incidence or prevalence rates for the disease. The data from these studies do not allow you to calculate the probability of a new person contracting the condition in a given period nor how common it is in the population. This limitation occurs because case-control studies do not use a representative sample.

Case-control studies cannot determine the time between exposure and onset of the medical condition. In fact, case-control studies cannot reliably assess each subject’s exposure to risk factors over time. Longitudinal studies, such as prospective cohort studies, can better make those types of assessment.


Use Matching to Control Confounders

Because case-control studies are observational studies, they are particularly vulnerable to confounding variables and spurious correlations. A confounder correlates with both the risk factor and the outcome variable. Because observational studies don’t use random assignment to equalize confounders between the case and control groups, they can become unbalanced and affect the results.

Unfortunately, confounders can be the actual cause of the medical condition rather than the risk factor that the researchers identify. If a case-control study does not account for confounding variables, it can bias the results and make them untrustworthy.

Case-control studies typically use trait matching to control confounders. This technique involves selecting study participants for the case and control groups with similar characteristics, which helps equalize the groups for potential confounders. Equalizing confounders limits their impact on the results.

Ultimately, the goal is to create case and control groups that have equal risks for developing the condition/disease outside the risk factors the researchers are explicitly assessing. Matching facilitates valid comparisons between the two groups because the controls are similar to cases. The researchers use subject-area knowledge to identify characteristics that are critical to match.

Note that you cannot assess matching variables as potential risk factors. You’ve intentionally equalized them across the case and control groups and, consequently, they do not correlate with the condition. Hence, do not use the risk factors you want to evaluate as trait matching variables.
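The matching idea described above can be sketched in a few lines of Python. This is only a minimal illustration, not the procedure any particular study used: the records, the field names (`id`, `sex`, `age`), the ten-year age bands, and the helper `match_controls` are all hypothetical.

```python
from collections import defaultdict

def age_band(age, width=10):
    """Map an age to the lower edge of its band, e.g. 47 -> 40."""
    return (age // width) * width

def match_controls(cases, candidates, per_case=2):
    """Greedily give each case up to `per_case` unused controls that
    share its sex and age band (hypothetical matching traits)."""
    pool = defaultdict(list)
    for person in candidates:
        pool[(person["sex"], age_band(person["age"]))].append(person)

    matches = {}
    for case in cases:
        key = (case["sex"], age_band(case["age"]))
        picked = pool[key][:per_case]
        pool[key] = pool[key][per_case:]  # each control is used at most once
        matches[case["id"]] = [p["id"] for p in picked]
    return matches

# Hypothetical records for illustration only.
cases = [
    {"id": "case-1", "sex": "F", "age": 47},
    {"id": "case-2", "sex": "M", "age": 63},
]
candidates = [
    {"id": "ctrl-1", "sex": "F", "age": 41},
    {"id": "ctrl-2", "sex": "M", "age": 35},
    {"id": "ctrl-3", "sex": "F", "age": 49},
    {"id": "ctrl-4", "sex": "M", "age": 66},
    {"id": "ctrl-5", "sex": "F", "age": 44},
]

matches = match_controls(cases, candidates)
print(matches)
```

Because sex and age band are equalized by construction here, they could no longer be evaluated as risk factors in this hypothetical study, which is the trade-off noted above.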


Statistical Analysis of a Case Control Study

Researchers frequently include two controls for each case to increase statistical power for a case-control study. Adding even more controls per case provides few statistical benefits, so studies usually do not use more than a 2:1 control to case ratio.
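One way to see the diminishing return from extra controls is to look at the standard error of the log odds ratio. The sketch below assumes the common large-sample approximation SE = sqrt(1/a + 1/b + 1/c + 1/d), which this post does not itself specify, and uses made-up counts: 50 cases (30 exposed) and controls of whom 40% are exposed.

```python
import math

def log_or_standard_error(a, b, c, d):
    """Approximate SE of ln(OR) for a 2x2 table (assumed formula):
    a = exposed cases, b = unexposed cases,
    c = exposed controls, d = unexposed controls."""
    return math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

# Hypothetical study: the 50 cases stay fixed, only the controls grow.
exposed_cases, unexposed_cases = 30, 20
se_by_ratio = {}
for ratio in (1, 2, 4, 10):
    n_controls = 50 * ratio
    exposed_controls = 0.4 * n_controls
    unexposed_controls = n_controls - exposed_controls
    se_by_ratio[ratio] = log_or_standard_error(
        exposed_cases, unexposed_cases, exposed_controls, unexposed_controls
    )
    print(f"{ratio}:1 controls per case -> SE of ln(OR) = {se_by_ratio[ratio]:.3f}")
```

With these invented numbers the standard error drops from about 0.41 at 1:1 to about 0.35 at 2:1, then only to about 0.32 at 4:1 and 0.30 at 10:1. The case counts set a floor that more controls cannot lower.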

For statistical results, case-control studies typically produce an odds ratio for each potential risk factor. The equation below shows how to calculate an odds ratio for a case-control study.

Odds Ratio = (exposed cases / unexposed cases) / (exposed controls / unexposed controls)

Notice how this ratio takes the exposure odds in the case group and divides it by the exposure odds in the control group. Consequently, it quantifies how much higher the odds of exposure are among cases than the controls.

In general, odds ratios greater than one flag potential risk factors because they indicate that exposure was higher in the case group than in the control group. Furthermore, higher ratios signify stronger associations between exposure and the medical condition.

An odds ratio of one indicates that exposure was the same in the case and control groups. Nothing to see here!

Ratios less than one might identify protective factors.


Now, let’s bring this to life with an example!

Example Odds Ratio in a Case-Control Study

The Kent County Health Department in Michigan conducted a case-control study in 2005 for a company lunch that produced an outbreak of vomiting and diarrhea. Out of multiple lunch ingredients, researchers found the following exposure rates for lettuce consumption.

Lettuce        Cases   Controls
Ate               53         33
Did not eat        1          7

By plugging these numbers into the equation, we can calculate the odds ratio for lettuce in this case-control study.

Odds Ratio = (53/1) / (33/7) = (53 × 7) / (1 × 33) = 11.2

The study determined that the odds ratio for lettuce is 11.2.

This ratio indicates that the odds of having eaten lettuce were 11.2 times higher among those with symptoms than among those without symptoms. These results raise a big red flag for contaminated lettuce being the culprit!
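As a quick check, the calculation can be reproduced in Python. This sketch assumes the four counts are 53 exposed cases, 1 unexposed case, 33 exposed controls, and 7 unexposed controls; that assignment is inferred from the reported result, and reading the table the other way round gives the same odds ratio. The function name `odds_ratio` is just an illustrative choice.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Exposure odds among cases divided by exposure odds among controls."""
    case_odds = exposed_cases / unexposed_cases
    control_odds = exposed_controls / unexposed_controls
    return case_odds / control_odds

# Counts from the lettuce example (cell assignment is an assumption).
lettuce_or = odds_ratio(53, 1, 33, 7)
print(f"Odds ratio for lettuce: {lettuce_or:.1f}")
```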


Case Control Studies

  • 1 University of Nebraska Medical Center
  • 2 Spectrum Health/Michigan State University College of Human Medicine
  • PMID: 28846237
  • Bookshelf ID: NBK448143

A case-control study is a type of observational study commonly used to look at factors associated with diseases or outcomes. The case-control study starts with a group of cases, which are the individuals who have the outcome of interest. The researcher then tries to construct a second group of individuals called the controls, who are similar to the case individuals but do not have the outcome of interest. The researcher then looks at historical factors to identify if some exposure(s) is/are found more commonly in the cases than the controls. If the exposure is found more commonly in the cases than in the controls, the researcher can hypothesize that the exposure may be linked to the outcome of interest.

For example, a researcher may want to look at the rare cancer Kaposi's sarcoma. The researcher would find a group of individuals with Kaposi's sarcoma (the cases) and compare them to a group of patients who are similar to the cases in most ways but do not have Kaposi's sarcoma (controls). The researcher could then ask about various exposures to see if any exposure is more common in those with Kaposi's sarcoma (the cases) than those without Kaposi's sarcoma (the controls). The researcher might find that those with Kaposi's sarcoma are more likely to have HIV, and thus conclude that HIV may be a risk factor for the development of Kaposi's sarcoma.

There are many advantages to case-control studies. First, the case-control approach allows for the study of rare diseases. If a disease occurs very infrequently, one would have to follow a large group of people for a long period of time to accrue enough incident cases to study. Such use of resources may be impractical, so a case-control study can be useful for identifying current cases and evaluating historical associated factors. For example, if a disease developed in 1 in 1000 people per year (0.001/year), then in ten years one would expect about 10 cases of the disease to exist in a group of 1000 people. If the disease is much rarer, say 1 in 1,000,000 per year (0.000001/year), this would require following either 1,000,000 people for ten years or 1000 people for 10,000 years to accrue ten total cases. As it may be impractical to follow 1,000,000 people for ten years or to wait 10,000 years for recruitment, a case-control study allows for a more feasible approach.
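The arithmetic behind this argument is simply expected cases = incidence rate × people × years. A minimal sketch follows, assuming a constant incidence rate and ignoring deaths and loss to follow-up; the function names are illustrative.

```python
def expected_cases(rate_per_person_year, people, years):
    """Expected number of new cases under a constant incidence rate."""
    return rate_per_person_year * people * years

def years_needed(target_cases, rate_per_person_year, people):
    """Years of follow-up needed to expect `target_cases` new cases."""
    return target_cases / (rate_per_person_year * people)

common = expected_cases(1 / 1_000, people=1_000, years=10)
rare_large_cohort = years_needed(10, 1 / 1_000_000, people=1_000_000)
rare_small_cohort = years_needed(10, 1 / 1_000_000, people=1_000)

print(f"1 in 1,000 per year, 1,000 people, 10 years: {common:.0f} expected cases")
print(f"1 in 1,000,000 per year, 1,000,000 people: {rare_large_cohort:.0f} years for 10 cases")
print(f"1 in 1,000,000 per year, 1,000 people: {rare_small_cohort:.0f} years for 10 cases")
```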

Second, the case-control study design makes it possible to look at multiple risk factors at once. In the example above about Kaposi's sarcoma, the researcher could ask both the cases and controls about exposures to HIV, asbestos, smoking, lead, sunburns, aniline dye, alcohol, herpes, human papillomavirus, or any number of possible exposures to identify those most likely associated with Kaposi's sarcoma.

Case-control studies can also be very helpful when disease outbreaks occur, and potential links and exposures need to be identified. This study mechanism can be commonly seen in food-related disease outbreaks associated with contaminated products, or when rare diseases start to increase in frequency, as has been seen with measles in recent years.

Because of these advantages, case-control studies are commonly used as one of the first studies to build evidence of an association between exposure and an event or disease.

In a case-control study, the investigator can include more controls than cases, such as a 2:1 or 4:1 ratio of controls to cases, to increase the power of the study.

Disadvantages and Limitations

The most commonly cited disadvantage in case-control studies is the potential for recall bias. Recall bias in a case-control study is the increased likelihood that those with the outcome will recall and report exposures compared to those without the outcome. In other words, even if both groups had exactly the same exposures, the participants in the case group may report the exposure more often than the controls do. Recall bias may lead to concluding that there are associations between exposure and disease that do not, in fact, exist. It arises from subjects' imperfect memories of past exposures. If people with Kaposi's sarcoma are asked about exposure and history (e.g., HIV, asbestos, smoking, lead, sunburn, aniline dye, alcohol, herpes, human papillomavirus), the individuals with the disease are more likely to think harder about these exposures and are therefore more likely than the healthy controls to recall having some of them.
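A small numerical sketch can show how differential recall alone produces an apparent association. All figures here are invented for illustration: 100 cases and 100 controls, 40 truly exposed in each group, with cases reporting 90% of their true exposures and controls only 60%.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Exposure odds among cases divided by exposure odds among controls."""
    return (exposed_cases / unexposed_cases) / (exposed_controls / unexposed_controls)

def reported_counts(n, truly_exposed, recall_rate):
    """Return (reported exposed, reported unexposed) when only a fraction
    `recall_rate` of truly exposed people report their exposure."""
    reported = truly_exposed * recall_rate
    return reported, n - reported

# True exposure is identical in both groups, so the true odds ratio is 1.
true_or = odds_ratio(40, 60, 40, 60)

# Hypothetical recall rates: cases 90%, controls 60%.
case_exposed, case_unexposed = reported_counts(100, 40, 0.9)
control_exposed, control_unexposed = reported_counts(100, 40, 0.6)
observed_or = odds_ratio(case_exposed, case_unexposed, control_exposed, control_unexposed)

print(f"True odds ratio: {true_or:.2f}")
print(f"Odds ratio from reported exposures: {observed_or:.2f}")
```

Under these assumed recall rates, the reported data give an odds ratio of about 1.78 even though the true exposure is the same in both groups.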

Case-control studies, due to their typically retrospective nature, can be used to establish a correlation between exposures and outcomes, but cannot establish causation. These studies simply attempt to find correlations between past events and the current state.

When designing a case-control study, the researcher must find an appropriate control group. Ideally, the case group (those with the outcome) and the control group (those without the outcome) will have almost the same characteristics, such as age, gender, overall health status, and other factors. The two groups should have similar histories and live in similar environments. If, for example, our cases of Kaposi's sarcoma came from across the country but our controls were only chosen from a small community in northern latitudes where people rarely go outside or get sunburns, asking about sunburn may not be a valid exposure to investigate. Similarly, if all of the cases of Kaposi's sarcoma were found to come from a small community outside a battery factory with high levels of lead in the environment, then controls from across the country with minimal lead exposure would not provide an appropriate control group. The investigator must put a great deal of effort into creating a proper control group to bolster the strength of the case-control study as well as enhance their ability to find true and valid potential correlations between exposures and disease states.

Similarly, the researcher must recognize the potential for failing to identify confounding variables or exposures, which introduces the possibility of confounding bias. Confounding bias occurs when a variable that is not accounted for has a relationship with both the exposure and the outcome. As a result, we may accidentally be studying something we are not accounting for but that differs systematically between the groups.

Copyright © 2024, StatPearls Publishing LLC.


Outbreak Investigations


Example of a Case-Control Study

 

Excerpts from introduction of the report by the Massachusetts Department of Health

 

Within a short period of time, 20 cases of hepatitis A were identified in the Marshfield area. The epidemic curve suggested a point source epidemic, and the spot map showed the cases to be spread across the entire South Shore of Massachusetts, although the pattern suggested a focus near Marshfield. Hypothesis-generating interviews identified five food establishments that were candidate sources. Moreover, the disease was rare, so that even if investigators had interviewed a sample of patrons at each of the restaurants, it is most likely that few, if any, would have had recent hepatitis, even from the responsible restaurant.

In a situation like this a case-control design is a much more efficient option. The investigators identified as many cases as possible (19 agreed to answer the questionnaire), and they selected a sample of 38 non-diseased people as a comparison group (the controls). In this case, the "controls" were non-diseased people who were matched to the cases with respect to age, gender, and neighborhood of residence. Investigators then ascertained the prior exposures of subjects in each group, focusing on food establishments and other possibly relevant exposures they had had during the past two months.

Exposure                      Cases   Controls
Ate at Papa Gino's               10         19
Did not eat at Papa Gino's        9         19
Total                            19         38

Given these hypothetical results, the odds that someone who ate at Papa Gino's was a case were 10/19, while the odds that someone not exposed to Papa Gino's became a case were 9/19. These odds are quite similar, and the odds ratio is close to 1.0. The odds ratio can be interpreted the same way as a risk ratio.

Odds Ratio = (10/19) / (9/19) = 1.1

This certainly provides no compelling evidence to suggest an association with Papa Gino's, but, as we did with the risk ratio, we could compute a 95% confidence interval for the odds ratio, and we could also compute a p value. In this case the 95% confidence interval is 0.37 to 3.35, and p= 0.85.
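The interval and p value quoted above can be reproduced with the usual large-sample (Woolf) method on the log odds ratio. The module does not say which method it used, so treat that choice, and the function name `odds_ratio_with_ci`, as assumptions of this sketch.

```python
import math

def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio, approximate 95% CI, and two-sided p value.
    a = exposed cases, b = exposed controls,
    c = unexposed cases, d = unexposed controls."""
    odds_ratio = (a / b) / (c / d)
    log_or = math.log(odds_ratio)
    # Woolf's approximation for the standard error of ln(OR).
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(log_or - z * se)
    upper = math.exp(log_or + z * se)
    # Two-sided p value from the normal approximation to ln(OR) / SE.
    p_value = math.erfc(abs(log_or) / se / math.sqrt(2))
    return odds_ratio, lower, upper, p_value

# Papa Gino's table: 10 and 19 exposed, 9 and 19 unexposed.
or_pg, lower_pg, upper_pg, p_pg = odds_ratio_with_ci(10, 19, 9, 19)
print(f"OR = {or_pg:.2f}, 95% CI {lower_pg:.2f} to {upper_pg:.2f}, p = {p_pg:.2f}")
```

With the Papa Gino's counts this gives an odds ratio of about 1.11, a 95% confidence interval of 0.37 to 3.35, and p = 0.85, matching the figures in the text.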

In contrast, consider the findings for Ron's Grill:

 

Exposure                  Cases   Controls
Ate at Ron's Grill           18          7
Did not eat at Ron's          1         29
Total                        19         38

For Ron's Grill the odds ratio would be computed as follows:

Odds Ratio = (18/7) / (1/29) = 75

This suggests that the odds of being a case were about 75 times higher among patrons of Ron's Grill than among those who did not eat at Ron's. The other three restaurants that had been suspects had odds ratios that were close to 1.0. This certainly provides strong evidence that Ron's Grill was the source of the outbreak, and further investigation confirmed that one of the food handlers at Ron's had recently had a subclinical case of hepatitis A.

In case-control studies, one of the most difficult decisions is how to select the controls. Ideally they should be non-diseased people who come from the same source population as the cases, and, aside from their outcome status, they should be comparable to the cases in order to avoid selection bias. Note that in the Marshfield case-control study the controls were selected in a way to ensure that they were comparable with respect to age and gender and lived in similar neighborhoods.


Content ©2016. All Rights Reserved. Date last modified: May 3, 2016. Wayne W. LaMorte, MD, PhD, MPH

Study Design 101


A study that compares patients who have a disease or outcome of interest (cases) with patients who do not (controls), and looks back to compare how frequently exposure to a risk factor is present in each group, in order to determine the relationship between the risk factor and the disease.

Case control studies are observational because no intervention is attempted and no attempt is made to alter the course of the disease. The goal is to retrospectively determine the exposure to the risk factor of interest from each of the two groups of individuals: cases and controls. These studies are designed to estimate odds.

Case control studies are also known as "retrospective studies" and "case-referent studies."

Advantages

  • Good for studying rare conditions or diseases
  • Less time needed to conduct the study because the condition or disease has already occurred
  • Lets you simultaneously look at multiple risk factors
  • Useful as initial studies to establish an association
  • Can answer questions that could not be answered through other study designs

Disadvantages

  • Retrospective studies have more problems with data quality because they rely on memory and people with a condition will be more motivated to recall risk factors (also called recall bias).
  • Not good for evaluating diagnostic tests because it’s already clear that the cases have the condition and the controls do not
  • It can be difficult to find a suitable control group

Design pitfalls to look out for

Care should be taken to avoid confounding, which arises when an exposure and an outcome are both strongly associated with a third variable. Controls should be subjects who might have been cases in the study but are selected independent of the exposure. Cases and controls should also not be "over-matched."

Is the control group appropriate for the population? Does the study use matching or pairing appropriately to avoid the effects of a confounding variable? Does it use appropriate inclusion and exclusion criteria?

Fictitious Example

There is a suspicion that zinc oxide, the white non-absorbent sunscreen traditionally worn by lifeguards, is more effective at preventing sunburns that lead to skin cancer than absorbent sunscreen lotions. A case-control study was conducted to investigate whether exposure to zinc oxide is a more effective skin cancer prevention measure. The study compared a group of former lifeguards who had developed cancer on their cheeks and noses (cases) with a group of lifeguards without this type of cancer (controls) and assessed their prior exposure to zinc oxide or absorbent sunscreen lotions.

This study would be retrospective in that the former lifeguards would be asked to recall which type of sunscreen they used on their face and approximately how often. This could be either a matched or unmatched study, but efforts would need to be made to ensure that the former lifeguards are of the same average age, and lifeguarded for a similar number of seasons and amount of time per season.

Real-life Examples

Boubekri, M., Cheung, I., Reid, K., Wang, C., & Zee, P. (2014). Impact of windows and daylight exposure on overall health and sleep quality of office workers: a case-control pilot study . Journal of Clinical Sleep Medicine : JCSM : Official Publication of the American Academy of Sleep Medicine, 10 (6), 603-611. https://doi.org/10.5664/jcsm.3780

This pilot study explored the impact of exposure to daylight on the health of office workers (measuring well-being and sleep quality subjectively, and light exposure, activity level and sleep-wake patterns via actigraphy). Individuals with windows in their workplaces had more light exposure, longer sleep duration, and more physical activity. They also reported better scores in the areas of vitality and role limitations due to physical problems, better sleep quality and fewer sleep disturbances.

Togha, M., Razeghi Jahromi, S., Ghorbani, Z., Martami, F., & Seifishahpar, M. (2018). Serum Vitamin D Status in a Group of Migraine Patients Compared With Healthy Controls: A Case-Control Study . Headache, 58 (10), 1530-1540. https://doi.org/10.1111/head.13423

This case-control study compared serum vitamin D levels in individuals who experience migraine headaches with their matched controls. Studied over a period of thirty days, higher levels of serum vitamin D were associated with lower odds of migraine headache.

Related Formulas

  • Odds ratio in an unmatched study
  • Odds ratio in a matched study

Related Terms

Case

A patient with the disease or outcome of interest.

Confounding

When an exposure and an outcome are both strongly associated with a third variable.

Control

A patient who does not have the disease or outcome.

Matched Design

Each case is matched individually with a control according to certain characteristics such as age and gender. It is important to remember that the concordant pairs (pairs in which the case and control are either both exposed or both not exposed) tell us nothing about the risk of exposure separately for cases or controls.
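To make the point about concordant pairs concrete, the sketch below uses the standard matched-pair estimate, in which the odds ratio is the number of pairs where only the case was exposed divided by the number of pairs where only the control was exposed. The data are hypothetical and the function name `matched_odds_ratio` is ours.

```python
def matched_odds_ratio(pairs):
    """Matched-pair odds ratio.

    pairs: iterable of (case_exposed, control_exposed) booleans,
    one tuple per matched case-control pair.
    """
    pairs = list(pairs)
    # Only discordant pairs enter the estimate.
    case_only = sum(1 for case, control in pairs if case and not control)
    control_only = sum(1 for case, control in pairs if control and not case)
    if control_only == 0:
        raise ValueError("odds ratio undefined: no pairs with only the control exposed")
    return case_only / control_only

# Hypothetical data: 10 matched pairs.
pairs = (
    [(True, True)] * 3      # concordant: both exposed
    + [(False, False)] * 2  # concordant: neither exposed
    + [(True, False)] * 4   # discordant: only the case exposed
    + [(False, True)] * 1   # discordant: only the control exposed
)
print(matched_odds_ratio(pairs))  # 4.0
```

Dropping the five concordant pairs from the list leaves the result unchanged, which is the sense in which they carry no information about the association.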

Observed Assignment

The method of assignment of individuals to study and control groups in observational studies when the investigator does not intervene to perform the assignment.

Unmatched Design

The controls are a sample from a suitable non-affected population.

Now test yourself!

1. Case Control Studies are prospective in that they follow the cases and controls over time and observe what occurs.

a) True b) False

2. Which of the following is an advantage of Case Control Studies?

a) They can simultaneously look at multiple risk factors. b) They are useful to initially establish an association between a risk factor and a disease or outcome. c) They take less time to complete because the condition or disease has already occurred. d) b and c only e) a, b, and c


© 2011-2019, The Himmelfarb Health Sciences Library Questions? Ask us .


What Is A Case Control Study?

Julia Simkus

Editor at Simply Psychology

BA (Hons) Psychology, Princeton University

Julia Simkus is a graduate of Princeton University with a Bachelor of Arts in Psychology. She is currently studying for a Master's Degree in Counseling for Mental Health and Wellness in September 2023. Julia's research has been published in peer reviewed journals.


Saul Mcleod, PhD

Editor-in-Chief for Simply Psychology

BSc (Hons) Psychology, MRes, PhD, University of Manchester

Saul Mcleod, PhD., is a qualified psychology teacher with over 18 years of experience in further and higher education. He has been published in peer-reviewed journals, including the Journal of Clinical Psychology.

Olivia Guy-Evans, MSc

Associate Editor for Simply Psychology

BSc (Hons) Psychology, MSc Psychology of Education

Olivia Guy-Evans is a writer and associate editor for Simply Psychology. She has previously worked in healthcare and educational sectors.


A case-control study is a research method where two groups of people are compared – those with the condition (cases) and those without (controls). By looking at their past, researchers try to identify what factors might have contributed to the condition in the ‘case’ group.

Explanation

A case-control study looks at people who already have a certain condition (cases) and people who don’t (controls). By comparing these two groups, researchers try to figure out what might have caused the condition. They look into the past to find clues, like habits or experiences, that are different between the two groups.

The “cases” are the individuals with the disease or condition under study, and the “controls” are similar individuals without the disease or condition of interest.

The controls should have similar characteristics (e.g., age, sex, demographic, health status) to the cases to mitigate the effects of confounding variables.

Case-control studies identify any associations between an exposure and an outcome and help researchers form hypotheses about a particular population.

Researchers will first identify the two groups, and then look back in time to investigate which subjects in each group were exposed to the suspected risk factor.

If the exposure is found more commonly in the cases than the controls, the researcher can hypothesize that the exposure may be linked to the outcome of interest.

Case Control Study

Figure: Schematic diagram of case-control study design. Kenneth F. Schulz and David A. Grimes (2002) Case-control studies: research in reverse . The Lancet Volume 359, Issue 9304, 431 – 434

Quick, inexpensive, and simple

Because these studies use already existing data and do not require any follow-up with subjects, they tend to be quicker and cheaper than other types of research. Case-control studies also do not require large sample sizes.

Beneficial for studying rare diseases

Researchers in case-control studies start with a population of people known to have the target disease instead of following a population and waiting to see who develops it. This enables researchers to identify current cases and enroll a sufficient number of patients with a particular rare disease.

Useful for preliminary research

Case-control studies are beneficial for an initial investigation of a suspected risk factor for a condition. The information obtained from case-control studies then enables researchers to conduct further data analyses to explore any relationships in more depth.

Limitations

Subject to recall bias.

Participants might be unable to remember when they were exposed or omit other details that are important for the study. In addition, those with the outcome are more likely to recall and report exposures more clearly than those without the outcome.

Difficulty finding a suitable control group

It is important that the case group and the control group have almost the same characteristics, such as age, gender, demographics, and health status.

Forming an accurate control group can be challenging, so sometimes researchers enroll multiple control groups to bolster the strength of the case-control study.

Do not demonstrate causation

Case-control studies may show an association between exposures and outcomes, but they cannot demonstrate causation.

A case-control study is an observational study where researchers analyze two groups of people (cases and controls) to look at factors associated with particular diseases or outcomes.

Below are some examples of case-control studies:
  • Investigating the impact of exposure to daylight on the health of office workers (Boubekri et al., 2014).
  • Comparing serum vitamin D levels in individuals who experience migraine headaches with their matched controls (Togha et al., 2018).
  • Analyzing correlations between parental smoking and childhood asthma (Strachan and Cook, 1998).
  • Studying the relationship between elevated concentrations of homocysteine and an increased risk of vascular diseases (Ford et al., 2002).
  • Assessing the magnitude of the association between Helicobacter pylori and the incidence of gastric cancer (Helicobacter and Cancer Collaborative Group, 2001).
  • Evaluating the association between breast cancer risk and saturated fat intake in postmenopausal women (Howe et al., 1990).

Frequently asked questions

1. What’s the difference between a case-control study and a cross-sectional study?

Case-control studies are different from cross-sectional studies in that case-control studies compare groups retrospectively while cross-sectional studies analyze information about a population at a specific point in time.

In cross-sectional studies, researchers are simply examining a group of participants and depicting what already exists in the population.

2. What’s the difference between a case-control study and a longitudinal study?

Case-control studies compare groups retrospectively, while longitudinal studies can compare groups either retrospectively or prospectively.

In a longitudinal study, researchers monitor a population over an extended period of time. Longitudinal studies can be used to study developmental shifts and understand how certain things change as we age.

In addition, case-control studies select subjects according to whether they already have the outcome, whereas longitudinal studies follow a group of subjects over time, often before any outcome has occurred.

3. What’s the difference between a case-control study and a retrospective cohort study?

Case-control studies are retrospective as researchers begin with an outcome and trace backward to investigate exposure; however, they differ from retrospective cohort studies.

In a retrospective cohort study, researchers examine a group before any of the subjects have developed the disease, then examine any factors that differed between the individuals who developed the condition and those who did not.

Thus, the outcome is measured after exposure in retrospective cohort studies, whereas the outcome is measured before the exposure in case-control studies.

Boubekri, M., Cheung, I., Reid, K., Wang, C., & Zee, P. (2014). Impact of windows and daylight exposure on overall health and sleep quality of office workers: a case-control pilot study. Journal of Clinical Sleep Medicine: JCSM: Official Publication of the American Academy of Sleep Medicine, 10 (6), 603-611.

Ford, E. S., Smith, S. J., Stroup, D. F., Steinberg, K. K., Mueller, P. W., & Thacker, S. B. (2002). Homocyst(e)ine and cardiovascular disease: a systematic review of the evidence with special emphasis on case-control studies and nested case-control studies. International Journal of Epidemiology, 31 (1), 59-70.

Helicobacter and Cancer Collaborative Group. (2001). Gastric cancer and Helicobacter pylori: a combined analysis of 12 case control studies nested within prospective cohorts. Gut, 49 (3), 347-353.

Howe, G. R., Hirohata, T., Hislop, T. G., Iscovich, J. M., Yuan, J. M., Katsouyanni, K., … & Shunzhang, Y. (1990). Dietary factors and risk of breast cancer: combined analysis of 12 case—control studies. JNCI: Journal of the National Cancer Institute, 82 (7), 561-569.

Lewallen, S., & Courtright, P. (1998). Epidemiology in practice: case-control studies. Community eye health, 11 (28), 57–58.

Strachan, D. P., & Cook, D. G. (1998). Parental smoking and childhood asthma: longitudinal and case-control studies. Thorax, 53 (3), 204-212.

Tenny, S., Kerndt, C. C., & Hoffman, M. R. (2021). Case Control Studies. In StatPearls . StatPearls Publishing.

Togha, M., Razeghi Jahromi, S., Ghorbani, Z., Martami, F., & Seifishahpar, M. (2018). Serum Vitamin D Status in a Group of Migraine Patients Compared With Healthy Controls: A Case-Control Study. Headache, 58 (10), 1530-1540.

Further Information

  • Schulz, K. F., & Grimes, D. A. (2002). Case-control studies: research in reverse. The Lancet, 359(9304), 431-434.
  • What is a case-control study?


Case-control and Cohort studies: A brief overview

Posted on 6th December 2017 by Saul Crandon


Introduction

Case-control and cohort studies are observational studies that lie near the middle of the hierarchy of evidence . These types of studies, along with randomised controlled trials, constitute analytical studies, whereas case reports and case series define descriptive studies (1). Although these studies are not ranked as highly as randomised controlled trials, they can provide strong evidence if designed appropriately.

Case-control studies

Case-control studies are retrospective. They clearly define two groups at the start: one with the outcome/disease and one without the outcome/disease. They look back to assess whether there is a statistically significant difference in the rates of exposure to a defined risk factor between the groups. See Figure 1 for a pictorial representation of a case-control study design. This can suggest associations between the risk factor and development of the disease in question, although no definitive causality can be drawn. The main outcome measure in case-control studies is odds ratio (OR) .


Figure 1. Case-control study design.

Cases should be selected based on objective inclusion and exclusion criteria from a reliable source such as a disease registry. An inherent issue with selecting cases is that a certain proportion of those with the disease would not have a formal diagnosis, may not present for medical care, may be misdiagnosed or may have died before getting a diagnosis. Regardless of how the cases are selected, they should be representative of the broader disease population that you are investigating to ensure generalisability.

Case-control studies should include two groups that are identical EXCEPT for their outcome / disease status.

As such, controls should also be selected carefully. It is possible to match controls to the cases selected on the basis of various factors (e.g. age, sex) to ensure these do not confound the study results. It may even increase statistical power and study precision by choosing up to three or four controls per case (2).
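The diminishing return from adding controls can be illustrated with a commonly cited rule of thumb, which we assume here rather than take from the text above: with k controls per case, statistical efficiency relative to an unlimited supply of controls is roughly k / (k + 1). The sketch below, with a function name of our own choosing, tabulates that ratio.

```python
def relative_efficiency(k):
    """Approximate efficiency of k controls per case, relative to
    unlimited controls (rule-of-thumb assumption: k / (k + 1))."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return k / (k + 1)

previous = None
for k in range(1, 7):
    eff = relative_efficiency(k)
    gain = "" if previous is None else f"  (+{eff - previous:.3f})"
    print(f"{k} control(s) per case: {eff:.3f}{gain}")
    previous = eff
```

Under this assumption, going from one to four controls per case raises efficiency from 0.5 to 0.8, while a fifth control adds only about 0.03, which is consistent with the advice to stop at three or four.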

Case-control studies can provide fast results, and they are cheaper to perform than most other studies. Because the analysis is retrospective, rare diseases or diseases with long latency periods can be investigated. Furthermore, you can assess multiple exposures to get a better understanding of possible risk factors for the defined outcome / disease.

Nevertheless, as case-control studies are retrospective, they are more prone to bias. One of the main examples is recall bias. Often case-control studies require the participants to self-report their exposure to a certain factor. Recall bias is the systematic difference in how the two groups may recall past events, e.g. in a study investigating stillbirth, a mother who experienced this may recall the possible contributing factors a lot more vividly than a mother who had a healthy birth.

A summary of the pros and cons of case-control studies is provided in Table 1.


Table 1. Advantages and disadvantages of case-control studies.

Cohort studies

Cohort studies can be retrospective or prospective. Retrospective cohort studies are NOT the same as case-control studies.

In retrospective cohort studies, the exposure and outcomes have already happened. They are usually conducted on data that already exists (from prospective studies) and the exposures are defined before looking at the existing outcome data to see whether exposure to a risk factor is associated with a statistically significant difference in the outcome development rate.

Prospective cohort studies are more common. People are recruited into cohort studies regardless of their exposure or outcome status. This is one of their important strengths. People are often recruited because of their geographical area or occupation, for example, and researchers can then measure and analyse a range of exposures and outcomes.

The study then follows these participants for a defined period to assess the proportion that develop the outcome/disease of interest. See Figure 2 for a pictorial representation of a cohort study design. Therefore, cohort studies are good for assessing prognosis, risk factors and harm. The outcome measure in cohort studies is usually a risk ratio / relative risk (RR).
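As a rough illustration of how the risk ratio from a cohort study relates to the odds ratio used in case-control studies, the sketch below computes both from the same counts. The cohorts are entirely hypothetical and the function names are ours.

```python
def risk_ratio(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Risk in the exposed group divided by risk in the unexposed group."""
    return (exposed_cases / exposed_total) / (unexposed_cases / unexposed_total)

def odds_ratio(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Odds of the outcome in the exposed divided by odds in the unexposed."""
    exposed_odds = exposed_cases / (exposed_total - exposed_cases)
    unexposed_odds = unexposed_cases / (unexposed_total - unexposed_cases)
    return exposed_odds / unexposed_odds

# Hypothetical cohorts: (exposed cases, exposed total, unexposed cases, unexposed total)
rare_outcome = (20, 10_000, 10, 10_000)
common_outcome = (400, 1_000, 200, 1_000)

for label, counts in [("rare", rare_outcome), ("common", common_outcome)]:
    print(label, round(risk_ratio(*counts), 2), round(odds_ratio(*counts), 2))
# rare 2.0 2.0
# common 2.0 2.67
```

When the outcome is rare the two measures nearly coincide, but as the outcome becomes common the odds ratio moves further from 1 than the risk ratio does.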


Figure 2. Cohort study design.

Cohort studies should include two groups that are identical EXCEPT for their exposure status.

As a result, both exposed and unexposed groups should be recruited from the same source population. Another important consideration is attrition. If a significant number of participants are not followed up (lost, death, dropped out) then this may impact the validity of the study. Not only does it decrease the study’s power, but there may be attrition bias – a significant difference between the groups of those that did not complete the study.

Cohort studies can assess a range of outcomes allowing an exposure to be rigorously assessed for its impact in developing disease. Additionally, they are good for rare exposures, e.g. contact with a chemical radiation blast.

Whilst cohort studies are useful, they can be expensive and time-consuming, especially if a long follow-up period is chosen or the disease itself is rare or has a long latency.

A summary of the pros and cons of cohort studies is provided in Table 2.


The Strengthening of Reporting of Observational Studies in Epidemiology Statement (STROBE)

STROBE provides a checklist of important steps for conducting these types of studies, as well as acting as best-practice reporting guidelines (3). Both case-control and cohort studies are observational, with varying advantages and disadvantages. However, the most important factor in the quality of evidence these studies provide is their methodological quality.

  • Song, J. and Chung, K. Observational Studies: Cohort and Case-Control Studies .  Plastic and Reconstructive Surgery.  2010 Dec;126(6):2234-2242.
  • Ury HK. Efficiency of case-control studies with multiple controls per case: Continuous or dichotomous data .  Biometrics . 1975 Sep;31(3):643–649.
  • von Elm E, Altman DG, Egger M, Pocock SJ, Gøtzsche PC, Vandenbroucke JP; STROBE Initiative. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: guidelines for reporting observational studies. Lancet 2007 Oct;370(9596):1453-1457. PMID: 18064739.




OOMPH Library Resources: PHW 250/250B Epidemiologic Methods: Epidemiologic Case Study Resources


Epidemiologic Case Studies

  • Epidemiologic Case Studies (US CDC) These case studies are interactive exercises developed to teach epidemiologic principles and practices. They are based on real-life outbreaks and public health problems and were developed in collaboration with the original investigators and experts from the Centers for Disease Control and Prevention (CDC). The case studies require students to apply their epidemiologic knowledge and skills to problems confronted by public health practitioners at the local, state, and national level every day.
  • Case Studies (WHO) From "Strengthening health security by implementing the International Health Regulations," each case has learning objectives and documentation.
  • Case Studies in Social Medicine A series of Perspective articles from the New England Journal of Medicine that highlight the importance of social concepts and social context in clinical medicine. The series uses discussions of real clinical cases to translate theories and methods for understanding social processes into terms that can readily be used in medical education, clinical practice, and health system planning.
  • African Case Studies in Public Health Case study exercises based on real events in African contexts and written by experienced Africa-based public health trainers and practitioners. These case studies represent the most up-to-date and context-appropriate case study exercises for African public health training programs. These exercises are designed to reinforce and instill competencies for addressing health threats in the future leaders of public health in Africa.
  • Case Consortium @ Columbia University: Public Health Cases The case collection includes "teaching" cases. Nearly all the cases are multimedia and based on original research; a few are written from secondary sources. All cases are offered free of charge.
  • Epi Teams Training: Case Studies From the North Carolina Institute for Public Health, this curriculum includes several interactive case studies designed to be used by the Epi Team as a group. These case studies are based on actual outbreaks that have occurred in North Carolina and elsewhere.
  • National Center for Case Study Teaching in Science The mission of the NCCSTS at the University at Buffalo is to promote the development and dissemination of materials and practices for case teaching in the sciences. Our website provides access to an award-winning collection of peer-reviewed case studies. We offer a five-day summer workshop and a two-day fall conference to train faculty in the case method of teaching science. In addition, we are actively engaged in educational research to assess the impact of the case method on student learning. "Case Collection" includes over 100 public health cases.

Books of Case Studies

  • Last Updated: Jun 18, 2024 3:39 PM
  • URL: https://guides.lib.berkeley.edu/publichealth/PHW250

Epidemiology with R


7 Case-control and case-cohort studies

  • Published: December 2020

This chapter addresses case-control and case-cohort studies. In a case-control study, one samples persons based on their disease outcome, so the fraction of diseased persons in a case-control study is usually known (at least approximately) before data collection. In a cohort (follow-up) study, the relationship between some exposure and disease incidence is investigated by following the entire cohort and measuring the rate of occurrence of new cases in the different exposure groups. The follow-up records all persons who develop the disease during the study period. Implicit in this is that the relevant exposure information is available at all times for all persons under follow-up. The chapter then looks at the statistical model for the odds ratio, before differentiating between odds ratio and rate ratio. It also considers confounding and stratified sampling; individually matched studies; and nested case-control studies.


Encyclopedia Britannica


case-control study


case-control study, in epidemiology, observational (nonexperimental) study design used to ascertain information on differences in suspected exposures and outcomes between individuals with a disease of interest (cases) and comparable individuals who do not have the disease (controls). Analysis yields an odds ratio (OR) that reflects the relative odds of exposure in the two populations. Case-control studies can be classified as retrospective (dealing with a past exposure) or prospective (dealing with an anticipated exposure), depending on when cases are identified in relation to the measurement of exposures. The case-control study was first used in its modern form in 1926. It grew in popularity in the 1950s following the publication of several seminal case-control studies that established a link between smoking and lung cancer.
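As a minimal sketch of how an OR can be computed, the snippet below works from a 2x2 table of exposure by case status. The counts are hypothetical (not taken from any study), and the function names and the Woolf (log-based) confidence interval are illustrative choices rather than anything prescribed by the text above.

```python
import math

def odds_ratio(a, b, c, d):
    """Exposure odds ratio from a 2x2 table.

    a: exposed cases      b: unexposed cases
    c: exposed controls   d: unexposed controls
    """
    # Odds of exposure among cases (a/b) divided by odds among controls (c/d).
    return (a * d) / (b * c)

def woolf_ci(a, b, c, d, z=1.96):
    """Approximate 95% confidence interval on the log-odds scale (Woolf method)."""
    log_or = math.log(odds_ratio(a, b, c, d))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)

# Hypothetical counts: 100 cases (40 exposed) and 100 controls (20 exposed).
print(round(odds_ratio(40, 60, 20, 80), 2))  # about 2.67
print(woolf_ci(40, 60, 20, 80))
```

An OR above 1 suggests that exposure is more common among cases than among controls; an interval that excludes 1 suggests the difference is unlikely to be due to chance alone, although it says nothing about bias or confounding.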

Case-control studies are advantageous because they require smaller sample sizes and thus fewer resources and less time than other observational studies. The case-control design also is the most practical option for studying exposure related to rare diseases. That is in part because known cases can be compared with selected controls (as opposed to waiting for cases to emerge, which is required by other observational study designs) and in part because of the rare disease assumption, in which OR mathematically becomes an increasingly better approximation of relative risk as disease incidence declines. Case-control studies also are used for diseases that have long latent periods (long durations between exposure and disease manifestation) and are ideal when multiple potential risk factors are at play.
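The rare disease assumption can be illustrated numerically. The sketch below assumes a hypothetical source population in which exposure doubles the risk of disease (relative risk fixed at 2) and shows how the OR moves toward the relative risk as the baseline risk falls. The risk values and function name are assumptions chosen for illustration.

```python
def odds_ratio_from_risks(p_exposed, p_unexposed):
    """Odds ratio implied by the disease risks in exposed and unexposed groups."""
    odds_exposed = p_exposed / (1 - p_exposed)
    odds_unexposed = p_unexposed / (1 - p_unexposed)
    return odds_exposed / odds_unexposed

RELATIVE_RISK = 2.0  # hypothetical, held constant across scenarios

for baseline_risk in (0.2, 0.05, 0.01, 0.001):
    exposed_risk = RELATIVE_RISK * baseline_risk
    or_value = odds_ratio_from_risks(exposed_risk, baseline_risk)
    # The gap between OR and relative risk shrinks as the disease gets rarer.
    print(f"baseline risk {baseline_risk}: OR = {or_value:.3f}")
```

With a baseline risk of 0.2 the OR overstates the relative risk noticeably, whereas at 0.001 the two are nearly indistinguishable.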

The primary challenge in designing a case-control study is the appropriate selection of cases and controls. Poor selection can result in confounding, in which correlations that are unrelated to the exposure exist between case and control subjects. Confounding in turn affects estimates of the association between disease and exposure, causing selection bias, which distorts OR figures. To overcome selection bias, controls typically are selected from the same source population as that used for the selection of cases. In addition, cases and controls may be matched by relevant characteristics. During the analysis of study data, multivariate analysis (usually logistic regression) can be used to adjust for the effect of measured confounders.
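To make the idea of adjusting for a measured confounder concrete, the sketch below uses the Mantel-Haenszel pooled odds ratio, a simpler stratified technique than the logistic regression mentioned above (it is swapped in here only because it needs no external libraries). The two strata and all counts are hypothetical and were constructed so that the crude OR suggests an association that disappears within each stratum.

```python
def odds_ratio(a, b, c, d):
    """a: exposed cases, b: unexposed cases, c: exposed controls, d: unexposed controls."""
    return (a * d) / (b * c)

def mantel_haenszel_or(strata):
    """Pooled OR across strata; each stratum is a tuple (a, b, c, d)."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator

# Hypothetical data stratified by a binary confounder (for example, age group).
strata = [
    (80, 20, 40, 10),  # stratum 1: stratum-specific OR = 1
    (10, 40, 20, 80),  # stratum 2: stratum-specific OR = 1
]

# Collapsing the strata ignores the confounder and yields the crude OR.
crude = odds_ratio(*[sum(cell) for cell in zip(*strata)])

print(crude)                       # 2.25
print(mantel_haenszel_or(strata))  # 1.0
```

Here the crude OR of 2.25 arises entirely because the confounder is associated with both exposure and case status; the stratified estimate removes that distortion. Unmeasured confounders cannot be handled this way.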

Bias in a case-control study might also result if exposures cannot be measured or recalled equally in both cases and controls. Healthy controls, for example, may not have been seen by a physician for a particular illness or may not remember the details of their illness. Choosing from a population with a disease different from the one of interest but of similar impact or incidence may minimize recall and measurement bias, since affected individuals may be more likely to recall exposures or to have had their information recorded to a level comparable to cases.

A step-by-step guide to causal study design using real-world data

  • Open access
  • Published: 19 June 2024


Sarah Ruth Hoffman, Nilesh Gangan, Xiaoxue Chen, Joseph L. Smith, Arlene Tave, Yiling Yang, Christopher L. Crowe, Susan dosReis & Michael Grabner


Due to the need for generalizable and rapidly delivered evidence to inform healthcare decision-making, real-world data have grown increasingly important to answer causal questions. However, causal inference using observational data poses numerous challenges, and relevant methodological literature is vast. We endeavored to identify underlying unifying themes of causal inference using real-world healthcare data and connect them into a single schema to aid in observational study design, and to demonstrate this schema using a previously published research example. A multidisciplinary team (epidemiology, biostatistics, health economics) reviewed the literature related to causal inference and observational data to identify key concepts. A visual guide to causal study design was developed to concisely and clearly illustrate how the concepts are conceptually related to one another. A case study was selected to demonstrate an application of the guide. An eight-step guide to causal study design was created, integrating essential concepts from the literature, anchored into conceptual groupings according to natural steps in the study design process. The steps include defining the causal research question and the estimand; creating a directed acyclic graph; identifying biases and design and analytic techniques to mitigate their effect, and techniques to examine the robustness of findings. The cardiovascular case study demonstrates the applicability of the steps to developing a research plan. This paper used an existing study to demonstrate the relevance of the guide. We encourage researchers to incorporate this guide at the study design stage in order to elevate the quality of future real-world evidence.


1 Introduction

Approximately 50 new drugs are approved each year in the United States (Mullard 2022). For all new drugs, randomized controlled trials (RCTs) are the gold standard by which potential effectiveness (“efficacy”) and safety are established. However, RCTs cannot guarantee how a drug will perform in a less controlled context. For this reason, regulators frequently require observational, post-approval studies using “real-world” data, sometimes even as a condition of drug approval. The “real-world” data requested by regulators are often derived from insurance claims databases and/or healthcare records. Importantly, these data are recorded during routine clinical care without concern for potential use in research. Yet, in recent years, there has been increasing use of such data for causal inference and regulatory decision making, presenting a variety of methodologic challenges for researchers and stakeholders to consider (Arlett et al. 2022; Berger et al. 2017; Concato and ElZarrad 2022; Cox et al. 2009; European Medicines Agency 2023; Franklin and Schneeweiss 2017; Girman et al. 2014; Hernán and Robins 2016; International Society for Pharmacoeconomics and Outcomes Research (ISPOR) 2022; International Society for Pharmacoepidemiology (ISPE) 2020; Stuart et al. 2013; U.S. Food and Drug Administration 2018; Velentgas et al. 2013).

Current guidance for causal inference using observational healthcare data articulates the need for careful study design (Berger et al. 2017; Cox et al. 2009; European Medicines Agency 2023; Girman et al. 2014; Hernán and Robins 2016; Stuart et al. 2013; Velentgas et al. 2013). In 2009, Cox et al. described common sources of bias in observational data and recommended specific strategies to mitigate these biases (Cox et al. 2009). In 2013, Stuart et al. emphasized counterfactual theory and trial emulation, offered several approaches to address unmeasured confounding, and provided guidance on the use of propensity scores to balance confounding covariates (Stuart et al. 2013). In 2013, the Agency for Healthcare Research and Quality (AHRQ) released an extensive, 200-page guide to developing a protocol for comparative effectiveness research using observational data (Velentgas et al. 2013). The guide emphasized development of the research question, with additional chapters on study design, comparator selection, sensitivity analyses, and directed acyclic graphs (Velentgas et al. 2013). In 2014, Girman et al. provided a clear set of steps for assessing study feasibility including examination of the appropriateness of the data for the research question (i.e., ‘fit-for-purpose’), empirical equipoise, and interpretability, stating that comparative effectiveness research using observational data “should be designed with the goal of drawing a causal inference” (Girman et al. 2014). In 2017, Berger et al. described aspects of “study hygiene,” focusing on procedural practices to enhance confidence in, and credibility of, real-world data studies (Berger et al. 2017). Currently, the European Network of Centres for Pharmacoepidemiology and Pharmacovigilance (ENCePP) maintains a guide on methodological standards in pharmacoepidemiology which discusses causal inference using observational data and includes an overview of study designs, a chapter on methods to address bias and confounding, and guidance on writing statistical analysis plans (European Medicines Agency 2023). In addition to these resources, the “target trial framework” provides a structured approach to planning studies for causal inferences from observational databases (Hernán and Robins 2016; Wang et al. 2023b). This framework, published in 2016, encourages researchers to first imagine a clinical trial for the study question of interest and then to design the observational study to reflect the hypothetical trial (Hernán and Robins 2016).

While the literature addresses critical issues collectively, there remains a need for a framework that puts key components, including the target trial approach, into a simple, overarching schema (Loveless 2022 ) so they can be more easily remembered, and communicated to all stakeholders including (new) researchers, peer-reviewers, and other users of the research findings (e.g., practicing providers, professional clinical societies, regulators). For this reason, we created a step-by-step guide for causal inference using administrative health data, which aims to integrate these various best practices at a high level and complements existing, more specific guidance, including those from the International Society for Pharmacoeconomics and Outcomes Research (ISPOR) and the International Society for Pharmacoepidemiology (ISPE) (Berger et al. 2017 ; Cox et al. 2009 ; Girman et al. 2014 ). We demonstrate the application of this schema using a previously published paper in cardiovascular research.

2 Methods

This work involved a formative phase and an implementation phase to evaluate the utility of the causal guide. In the formative phase, a multidisciplinary team with research expertise in epidemiology, biostatistics, and health economics reviewed selected literature (peer-reviewed publications, including those mentioned in the introduction, as well as graduate-level textbooks) related to causal inference and observational healthcare data from the pharmacoepidemiologic and pharmacoeconomic perspectives. The potential outcomes framework served as the foundation for our conception of causal inference (Rubin 2005 ). Information was grouped into the following four concepts: (1) Defining the Research Question; (2) Defining the Estimand; (3) Identifying and Mitigating Biases; (4) Sensitivity Analysis. A step-by-step guide to causal study design was developed to distill the essential elements of each concept, organizing them into a single schema so that the concepts are clearly related to one another. References for each step of the schema are included in the Supplemental Table.

In the implementation phase we tested the application of the causal guide to previously published work (Dondo et al. 2017 ). The previously published work utilized data from the Myocardial Ischaemia National Audit Project (MINAP), the United Kingdom’s national heart attack register. The goal of the study was to assess the effect of β-blockers on all-cause mortality among patients hospitalized for acute myocardial infarction without heart failure or left ventricular systolic dysfunction. We selected this paper for the case study because of its clear descriptions of the research goal and methods, and the explicit and methodical consideration of potential biases and use of sensitivity analyses to examine the robustness of the main findings.

3 Results

3.1 Overview of the eight steps

The step-by-step guide to causal inference comprises eight distinct steps (Fig.  1 ) across the four concepts. As scientific inquiry and study design are iterative processes, the various steps may be completed in a different order than shown, and steps may be revisited.

Fig. 1 A step-by-step guide for causal study design

Abbreviations: GEE: generalized estimating equations; IPC/TW: inverse probability of censoring/treatment weighting; ITR: individual treatment response; MSM: marginal structural model; TE: treatment effect

Please refer to the Supplemental Table for references providing more in-depth information.

1 Ensure that the exposure and outcome are well-defined based on literature and expert opinion.

2 More specifically, measures of association are not affected by issues such as confounding and selection bias because they do not intend to isolate and quantify a single causal pathway. However, information bias (e.g., variable misclassification) can negatively affect association estimates, and association estimates remain subject to random variability (and are hence reported with confidence intervals).

3 This list is not exhaustive; it focuses on frequently encountered biases.

4 To assess bias in a nonrandomized study following the target trial framework, use of the ROBINS-I tool is recommended (https://www.bmj.com/content/355/bmj.i4919).

5 Only a selection of the most popular approaches is presented here. Other methods exist; e.g., g-computation and g-estimation for both time-invariant and time-varying analysis; instrumental variables; and doubly-robust estimation methods. There are also program evaluation methods (e.g., difference-in-differences, regression discontinuities) that can be applied to pharmacoepidemiologic questions. Conventional outcome regression analysis is not recommended for causal estimation due to issues determining covariate balance, correct model specification, and interpretability of effect estimates.

6 Online tools include, among others, an E-value calculator for unmeasured confounding (https://www.evalue-calculator.com/) and the P95 outcome misclassification estimator (http://apps.p-95.com/ISPE/).

3.2 Defining the research question (step 1)

The process of designing a study begins with defining the research question. Research questions typically center on whether a causal relationship exists between an exposure and an outcome. This contrasts with associative questions, which, by their nature, do not require causal study design elements because they do not attempt to isolate a causal pathway from a single exposure to an outcome under study. It is important to note that the phrasing of the question itself should clarify whether an association or a causal relationship is of interest. The study question “Does statin use reduce the risk of future cardiovascular events?” is explicitly causal and requires that the study design addresses biases such as confounding. In contrast, the study question “Is statin use associated with a reduced risk of future cardiovascular events?” can be answered without control of confounding since the word “association” implies correlation. Too often, however, researchers use the word “association” to describe their findings when their methods were created to address explicitly causal questions (Hernán 2018 ). For example, a study that uses propensity score-based methods to balance risk factors between treatment groups is explicitly attempting to isolate a causal pathway by removing confounding factors. This is different from a study that intends only to measure an association. In fact, some journals may require that the word “association” be used when causal language would be more appropriate; however, this is beginning to change (Flanagin et al. 2024 ).

3.3 Defining the estimand (steps 2, 3, 4)

The estimand is the causal effect of research interest and is described in terms of required design elements: the target population for the counterfactual contrast, the kind of effect, and the effect/outcome measure.

In Step 2, the study team determines the target population of interest, which depends on the research question. For example, we may want to estimate the effect of the treatment in the entire study population, i.e., the hypothetical contrast between all study patients taking the drug of interest versus all study patients taking the comparator (the average treatment effect; ATE). Other effects can be examined, including the average treatment effect in the treated or untreated (ATT or ATU). When covariate distributions are the same across the treated and untreated populations and there is no effect modification by covariates, these effects are generally the same (Wang et al. 2017). In RCTs, this occurs naturally due to randomization, but in non-randomized data, careful study design and statistical methods must be used to mitigate confounding bias.
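
To make the distinction between these estimands concrete, the following minimal sketch computes the ATE, ATT, and ATU from a small, entirely hypothetical table of potential outcomes. The patient records and the field names (`treated`, `y1`, `y0`) are illustrative assumptions and do not come from any study discussed here. In real data only one potential outcome per patient is observed, so these quantities must be estimated rather than read off directly.

```python
from statistics import mean

# Hypothetical potential outcomes for six patients (illustrative values only).
# y1 = outcome if given the drug of interest, y0 = outcome if given the comparator.
patients = [
    {"treated": True,  "y1": 8, "y0": 10},  # individual effect: -2
    {"treated": True,  "y1": 8, "y0": 10},  # individual effect: -2
    {"treated": True,  "y1": 5, "y0": 6},   # individual effect: -1
    {"treated": False, "y1": 8, "y0": 10},  # individual effect: -2
    {"treated": False, "y1": 5, "y0": 6},   # individual effect: -1
    {"treated": False, "y1": 5, "y0": 6},   # individual effect: -1
]


def average_effect(rows):
    """Mean individual-level contrast y1 - y0 over the given rows."""
    return mean(r["y1"] - r["y0"] for r in rows)


ate = average_effect(patients)                                   # whole population
att = average_effect([r for r in patients if r["treated"]])      # treated only
atu = average_effect([r for r in patients if not r["treated"]])  # untreated only

print(f"ATE={ate:.3f}  ATT={att:.3f}  ATU={atu:.3f}")
```

In this toy table the larger individual effect is concentrated among the treated, so the three estimands differ. If the two groups had the same mix of patients, or if every patient had the same effect, they would coincide.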

In Step 3, the study team decides whether to measure the intention-to-treat (ITT), per-protocol, or as-treated effect. The ITT approach is also known as “first-treatment-carried-forward” in the observational literature (Lund et al. 2015). In trials, the ITT measures the effect of treatment assignment rather than the treatment itself, and in observational data the ITT can be conceptualized as measuring the effect of treatment as started. To compute the ITT effect from observational data, patients are placed into the exposure group corresponding to the treatment that they initiate, and treatment switching or discontinuation is purposely ignored in the analysis. Alternatively, a per-protocol effect can be measured from observational data by classifying patients according to the treatment that they initiated but censoring them when they stop, switch, or otherwise change treatment (Danaei et al. 2013; Yang et al. 2014). Finally, “as-treated” effects are estimated from observational data by classifying patients according to their actual treatment exposure during follow-up, for example by using multiple time windows to measure exposure changes (Danaei et al. 2013; Yang et al. 2014).
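
The difference between the ITT and per-protocol definitions can be sketched in a few lines. The example below is a simplified illustration under stated assumptions: the `Patient` record, its field names, and the day counts are hypothetical, and the as-treated approach (which requires time-varying exposure windows) is deliberately left out.

```python
from typing import NamedTuple, Optional, Tuple


class Patient(NamedTuple):
    initiated: str              # treatment started at baseline
    change_day: Optional[int]   # day of stop/switch; None if treatment never changed
    event_day: Optional[int]    # day of the outcome; None if no outcome was recorded
    end_day: int                # administrative end of follow-up


def follow_up(p: Patient, effect: str) -> Tuple[str, int, bool]:
    """Return (exposure group, follow-up days, outcome observed)."""
    if effect == "itt":
        # Treatment switching or discontinuation is purposely ignored.
        stop = p.end_day
    elif effect == "per_protocol":
        # Censor at the first stop/switch, if there is one.
        stop = p.end_day if p.change_day is None else min(p.change_day, p.end_day)
    else:
        raise ValueError(f"unknown effect: {effect!r}")
    observed = p.event_day is not None and p.event_day <= stop
    days = p.event_day if observed else stop
    return p.initiated, days, observed


# Hypothetical patient: starts drug A, stops on day 100, has the outcome on day 150.
example = Patient(initiated="drug_a", change_day=100, event_day=150, end_day=365)
print(follow_up(example, "itt"))
print(follow_up(example, "per_protocol"))
```

Under ITT the day-150 outcome is attributed to drug A, whereas under the per-protocol definition the patient is censored on day 100 and the outcome is not counted. Because such censoring may itself depend on prognosis, per-protocol analyses typically also need a method that accounts for it, such as inverse probability of censoring weighting.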

Step 4 is the final step in specifying the estimand, in which the research team determines the effect measure of interest. This determination has two parts. First, the team must consider how the outcome of interest will be measured. Risks, rates, hazards, odds, and costs are common ways of measuring outcomes, but each measure may be best suited to a particular scenario. For example, risks assume patients across comparison groups have equal follow-up time, while rates allow for variable follow-up time (Rothman et al. 2008). Costs may be of interest in studies focused on economic outcomes, including as inputs to cost-effectiveness analyses. After deciding how the outcome will be measured, it is necessary to consider whether the resulting quantity will be compared across groups using a ratio or a difference. Ratios convey the effect of exposure in a way that is easy to understand, but they do not provide an estimate of how many patients will be affected. On the other hand, differences provide a clearer estimate of the potential public health impact of exposure; for example, by allowing the calculation of the number of patients that must be treated to cause or prevent one instance of the outcome of interest (Tripepi et al. 2007).
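
As a small worked illustration of these choices, the sketch below computes risks and rates for two groups and contrasts them on the ratio and difference scales, including the number needed to treat. All counts and person-time values are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def risk(events: int, n: int) -> float:
    """Cumulative incidence: assumes comparable follow-up for everyone."""
    return events / n


def rate(events: int, person_years: float) -> float:
    """Incidence rate: allows follow-up time to vary between patients."""
    return events / person_years


# Hypothetical counts (not taken from any study).
treated = {"events": 30, "n": 1000, "person_years": 1900.0}
comparator = {"events": 50, "n": 1000, "person_years": 1850.0}

risk_t = risk(treated["events"], treated["n"])
risk_c = risk(comparator["events"], comparator["n"])

risk_ratio = risk_t / risk_c         # relative scale
risk_difference = risk_t - risk_c    # absolute scale
nnt = 1 / abs(risk_difference)       # number needed to treat (or harm)

rate_ratio = (rate(treated["events"], treated["person_years"])
              / rate(comparator["events"], comparator["person_years"]))

print(f"risk ratio={risk_ratio:.2f}, risk difference={risk_difference:.3f}, "
      f"NNT={nnt:.0f}, rate ratio={rate_ratio:.2f}")
```

The same data give a risk ratio that conveys the relative effect and a risk difference whose reciprocal gives the number needed to treat, which is why the choice of scale is part of the estimand rather than a reporting detail.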

3.4 Identifying and mitigating biases (steps 5, 6, 7)

Observational, real-world studies can be subject to multiple potential sources of bias, which can be grouped into confounding, selection, measurement, and time-related biases (Prada-Ramallal et al. 2019 ).

In Step 5, as a practical first approach in developing strategies to address threats to causal inference, researchers should create a visual mapping of factors that may be related to the exposure, outcome, or both (also called a directed acyclic graph or DAG) (Pearl 1995 ). While creating a high-quality DAG can be challenging, guidance is increasingly available to facilitate the process (Ferguson et al. 2020 ; Gatto et al. 2022 ; Hernán and Robins 2020 ; Rodrigues et al. 2022 ; Sauer 2013 ). The types of inter-variable relationships depicted by DAGs include confounders, colliders, and mediators. Confounders are variables that affect both exposure and outcome, and it is necessary to control for them in order to isolate the causal pathway of interest. Colliders represent variables affected by two other variables, such as exposure and outcome (Griffith et al. 2020 ). Colliders should not be conditioned on since by doing so, the association between exposure and outcome will become distorted. Mediators are variables that are affected by the exposure and go on to affect the outcome. As such, mediators are on the causal pathway between exposure and outcome and should also not be conditioned on, otherwise a path between exposure and outcome will be closed and the total effect of the exposure on the outcome cannot be estimated. Mediation analysis is a separate type of analysis aiming to distinguish between direct and indirect (mediated) effects between exposure and outcome and may be applied in certain cases (Richiardi et al. 2013 ). Overall, the process of creating a DAG can create valuable insights about the nature of the hypothesized underlying data generating process and the biases that are likely to be encountered (Digitale et al. 2022 ). Finally, an extension to DAGs which incorporates counterfactual theory is available in the form of Single World Intervention Graphs (SWIGs) as described in a 2013 primer (Richardson and Robins 2013 ).
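
The three roles described above can be read off a DAG mechanically. The following sketch, which uses only the Python standard library, stores a hypothetical DAG as a dictionary of parent-to-children edges and labels each remaining variable as a confounder, mediator, or collider relative to a chosen exposure and outcome. The graph, the variable names, and the simplified role definitions are assumptions made for illustration; dedicated DAG software applies the full back-door criterion, which this sketch does not attempt.

```python
def descendants(dag, node, blocked=frozenset()):
    """All nodes reachable from `node` by directed paths that avoid `blocked`."""
    seen, stack = set(), [node]
    while stack:
        for child in dag.get(stack.pop(), ()):
            if child not in seen and child not in blocked:
                seen.add(child)
                stack.append(child)
    return seen


def classify(dag, exposure, outcome):
    """Label each other node as confounder, mediator, collider, or other."""
    nodes = set(dag) | {c for children in dag.values() for c in children}
    roles = {}
    for n in nodes - {exposure, outcome}:
        causes_exposure = exposure in descendants(dag, n)
        # Reaches the outcome by a path that does not pass through the exposure.
        causes_outcome = outcome in descendants(dag, n, blocked={exposure})
        if causes_exposure and causes_outcome:
            roles[n] = "confounder"
        elif n in descendants(dag, exposure) and outcome in descendants(dag, n):
            roles[n] = "mediator"
        elif (n in descendants(dag, exposure, blocked={outcome})
              and n in descendants(dag, outcome)):
            roles[n] = "collider"
        else:
            roles[n] = "other"
    return roles


# Hypothetical DAG for the statin question; edges point from cause to effect.
dag = {
    "age": ["statin", "cv_event"],
    "statin": ["ldl", "cv_event", "clinic_visit"],
    "ldl": ["cv_event"],
    "cv_event": ["clinic_visit"],
}

print(classify(dag, exposure="statin", outcome="cv_event"))
```

In this toy graph, `age` is a common cause to adjust for, `ldl` lies on the causal pathway and should be left alone when the total effect is of interest, and `clinic_visit` is a common effect that should not be conditioned on.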

In Step 6, researchers comprehensively assess the possibility of different types of bias in their study, above and beyond what the creation of the DAG reveals. Many potential biases have been identified and summarized in the literature (Berger et al. 2017 ; Cox et al. 2009 ; European Medicines Agency 2023 ; Girman et al. 2014 ; Stuart et al. 2013 ; Velentgas et al. 2013 ). Every study can be subject to one or more biases, each of which can be addressed using one or more methods. The study team should thoroughly and explicitly identify all possible biases with consideration for the specifics of the available data and the nuances of the population and health care system(s) from which the data arise. Once the potential biases are identified and listed, the team can consider potential solutions using a variety of study design and analytic techniques.

In Step 7, the study team considers solutions to the biases identified in Step 6. “Target trial” thinking serves as the basis for many of these solutions by requiring researchers to consider how observational studies can be designed to ensure comparison groups are similar and produce valid inferences by emulating RCTs (Labrecque and Swanson 2017; Wang et al. 2023b). Designing studies to include only new users of a drug and an active comparator group is one way of increasing the similarity of patients across both groups, particularly in terms of treatment history. Careful consideration must be paid to the specification of the time periods and their relationship to inclusion/exclusion criteria (Suissa and Dell’Aniello 2020). For instance, if a drug is used intermittently, a longer wash-out period is needed to ensure adequate capture of prior use in order to avoid bias (Riis et al. 2015). The study team should consider how to approach confounding adjustment, and whether both time-invariant and time-varying confounding may be present. Many potential biases exist, and many methods have been developed to address them in order to improve causal estimation from observational data. Many of these methods, such as propensity score estimation, can be enhanced by machine learning (Athey and Imbens 2019; Belthangady et al. 2021; Mai et al. 2022; Onasanya et al. 2024; Schuler and Rose 2017; Westreich et al. 2010). Machine learning has many potential applications in the causal inference discipline, and like other tools, must be used with careful planning and intentionality. To aid in the assessment of potential biases, especially time-related ones, and the development of a plan to address them, the study design should be visualized (Gatto et al. 2022; Schneeweiss et al. 2019). Additionally, we note the opportunity for collaboration across research disciplines (e.g., the application of difference-in-differences methods (Zhou et al. 2016) to the estimation of comparative drug effectiveness and safety).
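
The interplay between the wash-out period and the new-user definition can be illustrated with a short sketch. The function below is a hypothetical helper, not an established algorithm: its name, its parameters, and the default 365-day window are assumptions. It returns a patient's index date only if no dispensing of the drug appears in the wash-out window and the patient was enrolled for that entire window.

```python
from datetime import date, timedelta


def new_user_index_date(dispensings, enrollment_start, study_start, washout_days=365):
    """Return the index date for a new user, or None if the patient is excluded."""
    in_study = sorted(d for d in dispensings if d >= study_start)
    if not in_study:
        return None                       # never dispensed during the study period
    index = in_study[0]
    window_start = index - timedelta(days=washout_days)
    if enrollment_start > window_start:
        return None                       # wash-out window not fully observable
    if any(window_start <= d < index for d in dispensings):
        return None                       # prior use in the window: prevalent user
    return index


# Hypothetical intermittent user: one fill in November 2019, the next in March 2020.
fills = [date(2019, 11, 1), date(2020, 3, 1)]
enrolled = date(2018, 1, 1)
study_start = date(2020, 1, 1)

print(new_user_index_date(fills, enrolled, study_start, washout_days=90))
print(new_user_index_date(fills, enrolled, study_start, washout_days=365))
```

With a 90-day window the November fill falls outside the look-back and the patient is misclassified as a new user, whereas the 365-day window captures the earlier use and excludes the patient. This is the situation the text describes for intermittently used drugs.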

3.5 Quality control and sensitivity analyses (step 8)

Causal study design concludes with Step 8, which includes planning quality control and sensitivity analyses to improve the internal validity of the study. Quality control begins with reviewing study output for prima facie validity. Patient characteristics (e.g., distributions of age, sex, region) should align with expected values from the researchers’ intuition and the literature, and researchers should assess reasons for any discrepancies. Sensitivity analyses should be conducted to determine the robustness of study findings. Researchers can test the stability of study estimates using a different estimand or type of model than was used in the primary analysis. Sensitivity analysis estimates that are similar to those of the primary analysis might confirm that the primary analysis estimates are appropriate. The research team may be interested in how changes to study inclusion/exclusion criteria may affect study findings or wish to address uncertainties related to measuring the exposure or outcome in the administrative data by modifying the algorithms used to identify exposure or outcome (e.g., requiring hospitalization with a diagnosis code in a principal position rather than counting any claim with the diagnosis code in any position). As feasible, existing validation studies for the exposure and outcome should be referenced, or new validation efforts undertaken. The results of such validation studies can inform study estimates via quantitative bias analyses (Lanes and Beachler 2023 ). The study team may also consider biases arising from unmeasured confounding and plan quantitative bias analyses to explore how unmeasured confounding may impact estimates. Quantitative bias analysis can assess the directionality, magnitude, and uncertainty of errors arising from a variety of limitations (Brenner and Gefeller 1993 ; Lash et al. 2009 , 2014 ; Leahy et al. 2022 ).
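
One widely used quantitative bias analysis for unmeasured confounding is the E-value, for which an online calculator is noted in the footnotes to Fig. 1. As a minimal sketch, the function below applies the commonly cited formula for a risk ratio, E = RR + sqrt(RR × (RR − 1)), after inverting protective estimates so that RR ≥ 1. The function name and the example risk ratios are illustrative assumptions. Applying the same function to the confidence limit closer to the null is a common companion calculation.

```python
import math


def e_value(rr: float) -> float:
    """E-value for a risk ratio point estimate (or confidence limit)."""
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    if rr < 1:
        rr = 1 / rr          # protective effects are inverted first
    return rr + math.sqrt(rr * (rr - 1))


# Hypothetical estimates, for illustration only.
for estimate in (2.0, 0.5, 1.0):
    print(estimate, round(e_value(estimate), 2))
```

An E-value is read as the minimum strength of association, on the risk ratio scale, that an unmeasured confounder would need to have with both exposure and outcome to fully explain away the observed estimate. A value of 1 means no unmeasured confounding is needed to explain a null result.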

3.6 Illustration using a previously published research study

In order to demonstrate how the guide can be used to plan a research study utilizing causal methods, we turn to a previously published study (Dondo et al. 2017) that assessed the causal relationship between the use of β-blockers and mortality after acute myocardial infarction in patients without heart failure or left ventricular systolic dysfunction. The investigators sought to answer a causal research question (Step 1), and so we proceed to the estimand. Considering treatment for whom (Step 2), both ATE and ATT were evaluated. Use (or no use) of β-blockers was determined after discharge without taking into consideration discontinuation or future treatment changes (i.e., intention-to-treat; Step 3). Since survival was the primary outcome, an absolute difference in survival time was chosen as the effect measure (Step 4). While there was no explicit directed acyclic graph provided (Step 5), the investigators specified a list of confounders.

Robust methodologies were established by considering possible sources of bias and addressing them with viable solutions (Steps 6 and 7). Table 1 offers a list of the identified potential biases and their corresponding solutions as implemented. For example, to minimize potential biases including prevalent-user bias and selection bias, the sample was restricted to patients with no previous use of β-blockers, no contraindication for β-blockers, and no prescription of loop diuretics. To improve balance across the comparator groups in terms of baseline confounders, i.e., those that could influence both exposure (β-blocker use) and outcome (mortality), propensity score-based inverse probability of treatment weighting (IPTW) was employed. However, we noted that the baseline look-back period to assess measured covariates was not explicitly listed in the paper.
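
To show what propensity score-based weighting does mechanically, the sketch below computes IPTW weights for the ATE and the ATT and uses them to form a weighted risk difference. It is not a reconstruction of the published analysis: the propensity scores are supplied as hypothetical numbers rather than estimated from covariates, and the six records are invented for illustration.

```python
def iptw_weight(treated: bool, ps: float, estimand: str = "ATE") -> float:
    """Inverse probability of treatment weight for one patient."""
    if not 0 < ps < 1:
        raise ValueError("propensity score must lie strictly between 0 and 1")
    if estimand == "ATE":
        return 1 / ps if treated else 1 / (1 - ps)
    if estimand == "ATT":
        return 1.0 if treated else ps / (1 - ps)
    raise ValueError(f"unknown estimand: {estimand!r}")


def weighted_risk(rows, arm: bool, estimand: str) -> float:
    """Weighted proportion with the outcome in one treatment arm."""
    num = den = 0.0
    for treated, ps, outcome in rows:
        if treated == arm:
            w = iptw_weight(treated, ps, estimand)
            num += w * outcome
            den += w
    return num / den


# Hypothetical records: (treated, propensity score, died).
rows = [
    (True, 0.8, 0), (True, 0.5, 1), (True, 0.8, 0),
    (False, 0.5, 1), (False, 0.2, 0), (False, 0.2, 1),
]

rd_ate = weighted_risk(rows, True, "ATE") - weighted_risk(rows, False, "ATE")
rd_att = weighted_risk(rows, True, "ATT") - weighted_risk(rows, False, "ATT")
print(f"weighted risk difference: ATE={rd_ate:.3f}, ATT={rd_att:.3f}")
```

ATE weights reweight both arms to resemble the whole study population, while ATT weights leave the treated as they are and reweight the comparator arm to resemble them. In practice, covariate balance should be checked after weighting, as the case study authors did.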

Quality control and sensitivity analyses (Step 8) are described extensively. The overlap of propensity score distributions between comparator groups was tested and confounder balance was assessed. Since observations in the tail-end of the propensity score distribution may violate the positivity assumption (Crump et al. 2009), a sensitivity analysis was conducted including only cases within 0.1 to 0.9 of the propensity score distribution. While not mentioned by the authors, the PS tails can be influenced by unmeasured confounders (Sturmer et al. 2021), and the findings were robust with and without trimming. An assessment of extreme IPTW weights, while not included, would further help increase confidence in the robustness of the analysis. An instrumental variable approach was employed to assess potential selection bias due to unmeasured confounding, using hospital rates of guideline-indicated prescribing as the instrument. Additionally, potential bias caused by missing data was attenuated through the use of multiple imputation, and separate models were built for complete cases only and imputed/complete cases.
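
The trimming and weight checks discussed here can be sketched as follows. This example assumes that trimming means keeping patients whose estimated propensity score lies between 0.1 and 0.9, which is one reading of the description above. The records are hypothetical, and the effective sample size shown is the standard Kish approximation, (Σw)² / Σw², offered as one possible diagnostic for extreme weights rather than the authors' method.

```python
def trim(rows, lower=0.1, upper=0.9):
    """Keep only records whose propensity score lies within [lower, upper]."""
    return [r for r in rows if lower <= r[1] <= upper]


def ate_weights(rows):
    """IPTW weights for the ATE from (treated, propensity score) records."""
    return [1 / ps if treated else 1 / (1 - ps) for treated, ps in rows]


def effective_sample_size(weights):
    """Kish approximation: equals len(weights) when all weights are equal."""
    return sum(weights) ** 2 / sum(w * w for w in weights)


# Hypothetical records: (treated, propensity score).
rows = [
    (True, 0.95), (True, 0.60), (True, 0.05),
    (False, 0.40), (False, 0.03), (False, 0.50),
]

for label, subset in (("all", rows), ("trimmed", trim(rows))):
    w = ate_weights(subset)
    print(f"{label}: n={len(subset)}, max weight={max(w):.2f}, "
          f"ESS={effective_sample_size(w):.2f}")
```

A single treated patient with a propensity score of 0.05 receives a weight of 20 and dominates the untrimmed analysis. Reporting the maximum weight and the effective sample size alongside the trimmed and untrimmed estimates makes that kind of influence visible.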

4 Discussion

We have described a conceptual schema for designing observational real-world studies to estimate causal effects. The application of this schema to a previously published study illuminates the methodologic structure of the study, revealing how each structural element is related to a potential bias which it is meant to address. Real-world evidence is increasingly accepted by healthcare stakeholders, including the FDA (Concato and Corrigan-Curay 2022 ; Concato and ElZarrad 2022 ), and its use for comparative effectiveness and safety assessments requires appropriate causal study design; our guide is meant to facilitate this design process and complement existing, more specific, guidance.

Existing guidance for causal inference using observational data includes components that can be clearly mapped onto the schema that we have developed. For example, in 2009 Cox et al. described common sources of bias in observational data and recommended specific strategies to mitigate these biases, corresponding to steps 6–8 of our step-by-step guide (Cox et al. 2009). In 2013, the AHRQ emphasized development of the research question, corresponding to steps 1–4 of our guide, with additional chapters on study design, comparator selection, sensitivity analyses, and directed acyclic graphs, which correspond to steps 7 and 5, respectively (Velentgas et al. 2013). Much of Girman et al.’s manuscript (Girman et al. 2014) corresponds with steps 1–4 of our guide, and the matters of equipoise and interpretability correspond specifically to steps 3 and 7–8. The current ENCePP guide on methodological standards in pharmacoepidemiology contains a section on formulating a meaningful research question, corresponding to step 1, and describes strategies to mitigate specific sources of bias, corresponding to steps 6–8 (European Medicines Agency 2023). Recent works by the FDA Sentinel Innovation Center (Desai et al. 2024) and the Joint Initiative for Causal Inference (Dang et al. 2023) provide more advanced exposition of many of the steps in our guide. The target trial framework contains guidance on developing seven components of the study protocol, including eligibility criteria, treatment strategies, assignment procedures, follow-up period, outcome, causal contrast of interest, and analysis plan (Hernán and Robins 2016). Our work places the target trial framework into a larger context illustrating its relationship with other important study planning considerations, including the creation of a directed acyclic graph and incorporation of prespecified sensitivity and quantitative bias analyses.

Ultimately, the feasibility of estimating causal effects relies on the capabilities of the available data. Real-world data sources are complex, and the investigator must carefully consider whether the data on hand are sufficient to answer the research question. For example, a study that relies solely on claims data for outcome ascertainment may suffer from outcome misclassification bias (Lanes and Beachler 2023). This bias can be addressed through medical record validation for a random subset of patients, followed by quantitative bias analysis (Lanes and Beachler 2023). If, instead, the investigator wishes to apply a previously published, claims-based algorithm validated in a different database, they must carefully consider the transportability of that algorithm to their own study population. In this way, causal inference from real-world data requires the ability to think creatively and resourcefully about how various data sources and elements can be leveraged, with consideration for the strengths and limitations of each source. The heart of causal inference is in the pairing of humility and creativity: the humility to acknowledge what the data cannot do, and the creativity to address those limitations as best as one can at the time.
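
As a minimal sketch of how validation results can feed a quantitative bias analysis, the function below back-corrects an observed case count for outcome misclassification using the sensitivity and specificity of a claims-based algorithm. The counts and the validation parameters are hypothetical, and the simple algebraic correction assumes nondifferential misclassification with known sensitivity and specificity. Probabilistic approaches that propagate uncertainty in these parameters are described in the quantitative bias analysis literature cited above.

```python
def corrected_cases(observed_cases: float, n: int,
                    sensitivity: float, specificity: float) -> float:
    """Estimate the true number of cases from an imperfect outcome algorithm."""
    denom = sensitivity + specificity - 1
    if denom <= 0:
        raise ValueError("algorithm must perform better than chance")
    return (observed_cases - (1 - specificity) * n) / denom


# Hypothetical validation results and observed counts (illustration only).
SENSITIVITY, SPECIFICITY = 0.80, 0.98
exposed = {"observed": 98, "n": 1000}
unexposed = {"observed": 59, "n": 1000}

observed_rr = (exposed["observed"] / exposed["n"]) / (unexposed["observed"] / unexposed["n"])

true_exposed = corrected_cases(exposed["observed"], exposed["n"], SENSITIVITY, SPECIFICITY)
true_unexposed = corrected_cases(unexposed["observed"], unexposed["n"], SENSITIVITY, SPECIFICITY)
corrected_rr = (true_exposed / exposed["n"]) / (true_unexposed / unexposed["n"])

print(f"observed RR={observed_rr:.2f}, corrected RR={corrected_rr:.2f}")
```

In this example the imperfect algorithm pulls the observed risk ratio toward the null, and the correction moves it back. If the corrected count falls below zero or above the group size, the assumed sensitivity and specificity are incompatible with the observed data and should be revisited.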

4.1 Limitations

As with any attempt to synthesize a broad array of information into a single, simplified schema, there are several limitations to our work. Space and usability constraints necessitated simplification of the complex source material and selection among many available methodologies, and information about the relative importance of each step is not currently included. Additionally, it is important to consider the context of our work. This step-by-step guide emphasizes analytic techniques (e.g., propensity scores) that are used most frequently within our own research environment and may not include less familiar study designs and analytic techniques. However, one strength of the guide is that additional designs and techniques or concepts can easily be incorporated into the existing schema. The benefit of a schema is that new information can be added and is more readily accessed due to its association with previously sorted information (Loveless 2022). It is also important to note that causal inference was approached as a broad overarching concept defined by the totality of the research, from start to finish, rather than focusing on a particular analytic technique; however, we view this as a strength rather than a limitation.

Finally, the focus of this guide was on the methodologic aspects of study planning. As a result, we did not include steps for drafting or registering the study protocol in a public database or for communicating results. We strongly encourage researchers to register their study protocols and communicate their findings with transparency. A protocol template endorsed by ISPOR and ISPE for studies using real-world data to evaluate treatment effects is available (Wang et al. 2023a ). Additionally, the steps described above are intended to illustrate an order of thinking in the study planning process, and these steps are often iterative. The guide is not intended to reflect the order of study execution; specifically, quality control procedures and sensitivity analyses should also be formulated up-front at the protocol stage.

5 Conclusion

We outlined steps and described key conceptual issues of importance in designing real-world studies to answer causal questions, and created a visually appealing, user-friendly resource to help researchers clearly define and navigate these issues. We hope this guide serves to enhance the quality, and thus the impact, of real-world evidence.

Data availability

No datasets were generated or analysed during the current study.

References

Arlett, P., Kjaer, J., Broich, K., Cooke, E.: Real-world evidence in EU Medicines Regulation: Enabling Use and establishing value. Clin. Pharmacol. Ther. 111 (1), 21–23 (2022)

Athey, S., Imbens, G.W.: Machine learning methods that economists should know about. Annu. Rev. Econ. 11, 685–725 (2019)

Belthangady, C., Stedden, W., Norgeot, B.: Minimizing bias in massive multi-arm observational studies with BCAUS: Balancing covariates automatically using supervision. BMC Med. Res. Methodol. 21 (1), 190 (2021)

Berger, M.L., Sox, H., Willke, R.J., Brixner, D.L., Eichler, H.G., Goettsch, W., Madigan, D., Makady, A., Schneeweiss, S., Tarricone, R., Wang, S.V., Watkins, J., Mullins, C.D.: Good practices for real-world data studies of treatment and/or comparative effectiveness: Recommendations from the joint ISPOR-ISPE Special Task Force on real-world evidence in health care decision making. Pharmacoepidemiol. Drug Saf. 26 (9), 1033–1039 (2017)

Brenner, H., Gefeller, O.: Use of the positive predictive value to correct for disease misclassification in epidemiologic studies. Am. J. Epidemiol. 138 (11), 1007–1015 (1993)

Concato, J., Corrigan-Curay, J.: Real-world evidence - where are we now? N Engl. J. Med. 386 (18), 1680–1682 (2022)

Concato, J., ElZarrad, M.: FDA Issues Draft Guidances on Real-World Evidence, Prepares to Publish More in Future [accessed on 2022]. (2022). https://www.fda.gov/drugs/news-events-human-drugs/fda-issues-draft-guidances-real-world-evidence-prepares-publish-more-future

Cox, E., Martin, B.C., Van Staa, T., Garbe, E., Siebert, U., Johnson, M.L.: Good research practices for comparative effectiveness research: Approaches to mitigate bias and confounding in the design of nonrandomized studies of treatment effects using secondary data sources: The International Society for Pharmacoeconomics and Outcomes Research Good Research Practices for Retrospective Database Analysis Task Force Report–Part II. Value Health. 12 (8), 1053–1061 (2009)

Crump, R.K., Hotz, V.J., Imbens, G.W., Mitnik, O.A.: Dealing with limited overlap in estimation of average treatment effects. Biometrika. 96 (1), 187–199 (2009)

Danaei, G., Rodriguez, L.A., Cantero, O.F., Logan, R., Hernan, M.A.: Observational data for comparative effectiveness research: An emulation of randomised trials of statins and primary prevention of coronary heart disease. Stat. Methods Med. Res. 22 (1), 70–96 (2013)

Dang, L.E., Gruber, S., Lee, H., Dahabreh, I.J., Stuart, E.A., Williamson, B.D., Wyss, R., Diaz, I., Ghosh, D., Kiciman, E., Alemayehu, D., Hoffman, K.L., Vossen, C.Y., Huml, R.A., Ravn, H., Kvist, K., Pratley, R., Shih, M.C., Pennello, G., Martin, D., Waddy, S.P., Barr, C.E., Akacha, M., Buse, J.B., van der Laan, M., Petersen, M.: A causal roadmap for generating high-quality real-world evidence. J. Clin. Transl Sci. 7 (1), e212 (2023)

Desai, R.J., Wang, S.V., Sreedhara, S.K., Zabotka, L., Khosrow-Khavar, F., Nelson, J.C., Shi, X., Toh, S., Wyss, R., Patorno, E., Dutcher, S., Li, J., Lee, H., Ball, R., Dal Pan, G., Segal, J.B., Suissa, S., Rothman, K.J., Greenland, S., Hernan, M.A., Heagerty, P.J., Schneeweiss, S.: Process guide for inferential studies using healthcare data from routine clinical practice to evaluate causal effects of drugs (PRINCIPLED): Considerations from the FDA Sentinel Innovation Center. BMJ. 384 , e076460 (2024)

Digitale, J.C., Martin, J.N., Glymour, M.M.: Tutorial on directed acyclic graphs. J. Clin. Epidemiol. 142 , 264–267 (2022)

Dondo, T.B., Hall, M., West, R.M., Jernberg, T., Lindahl, B., Bueno, H., Danchin, N., Deanfield, J.E., Hemingway, H., Fox, K.A.A., Timmis, A.D., Gale, C.P.: beta-blockers and Mortality after Acute myocardial infarction in patients without heart failure or ventricular dysfunction. J. Am. Coll. Cardiol. 69 (22), 2710–2720 (2017)

European Medicines Agency: ENCePP Guide on Methodological Standards in Pharmacoepidemiology [accessed on 2023]. (2023). https://www.encepp.eu/standards_and_guidances/methodologicalGuide.shtml

Ferguson, K.D., McCann, M., Katikireddi, S.V., Thomson, H., Green, M.J., Smith, D.J., Lewsey, J.D.: Evidence synthesis for constructing directed acyclic graphs (ESC-DAGs): A novel and systematic method for building directed acyclic graphs. Int. J. Epidemiol. 49 (1), 322–329 (2020)

Flanagin, A., Lewis, R.J., Muth, C.C., Curfman, G.: What does the proposed causal inference Framework for Observational studies Mean for JAMA and the JAMA Network Journals? JAMA (2024)

U.S. Food and Drug Administration: Framework for FDA’s Real-World Evidence Program [accessed on 2018]. (2018). https://www.fda.gov/media/120060/download

Franklin, J.M., Schneeweiss, S.: When and how can Real World Data analyses substitute for randomized controlled trials? Clin. Pharmacol. Ther. 102 (6), 924–933 (2017)

Gatto, N.M., Wang, S.V., Murk, W., Mattox, P., Brookhart, M.A., Bate, A., Schneeweiss, S., Rassen, J.A.: Visualizations throughout pharmacoepidemiology study planning, implementation, and reporting. Pharmacoepidemiol Drug Saf. 31 (11), 1140–1152 (2022)

Girman, C.J., Faries, D., Ryan, P., Rotelli, M., Belger, M., Binkowitz, B., O’Neill, R., Drug Information Association CER Scientific Working Group: Pre-study feasibility and identifying sensitivity analyses for protocol pre-specification in comparative effectiveness research. J. Comp. Eff. Res. 3 (3), 259–270 (2014)

Griffith, G.J., Morris, T.T., Tudball, M.J., Herbert, A., Mancano, G., Pike, L., Sharp, G.C., Sterne, J., Palmer, T.M., Davey Smith, G., Tilling, K., Zuccolo, L., Davies, N.M., Hemani, G.: Collider bias undermines our understanding of COVID-19 disease risk and severity. Nat. Commun. 11 (1), 5749 (2020)

Hernán, M.A.: The C-Word: Scientific euphemisms do not improve causal inference from Observational Data. Am. J. Public Health. 108 (5), 616–619 (2018)

Hernán, M.A., Robins, J.M.: Using Big Data to emulate a target Trial when a Randomized Trial is not available. Am. J. Epidemiol. 183 (8), 758–764 (2016)

Hernán, M., Robins, J.: Causal Inference: What if. Chapman & Hall/CRC, Boca Raton (2020)

International Society for Pharmacoeconomics and Outcomes Research (ISPOR): Strategic Initiatives: Real-World Evidence [accessed on 2022]. (2022). https://www.ispor.org/strategic-initiatives/real-world-evidence

International Society for Pharmacoepidemiology (ISPE): Position on Real-World Evidence [accessed on 2020]. (2020). https://pharmacoepi.org/pub/?id=136DECF1-C559-BA4F-92C4-CF6E3ED16BB6

Labrecque, J.A., Swanson, S.A.: Target trial emulation: Teaching epidemiology and beyond. Eur. J. Epidemiol. 32 (6), 473–475 (2017)

Lanes, S., Beachler, D.C.: Validation to correct for outcome misclassification bias. Pharmacoepidemiol Drug Saf. (2023)

Lash, T.L., Fox, M.P., Fink, A.K.: Applying Quantitative bias Analysis to Epidemiologic data. Springer (2009)

Lash, T.L., Fox, M.P., MacLehose, R.F., Maldonado, G., McCandless, L.C., Greenland, S.: Good practices for quantitative bias analysis. Int. J. Epidemiol. 43 (6), 1969–1985 (2014)

Leahy, T.P., Kent, S., Sammon, C., Groenwold, R.H., Grieve, R., Ramagopalan, S., Gomes, M.: Unmeasured confounding in nonrandomized studies: Quantitative bias analysis in health technology assessment. J. Comp. Eff. Res. 11 (12), 851–859 (2022)

Loveless, B.: A Complete Guide to Schema Theory and its Role in Education [accessed on 2022]. (2022). https://www.educationcorner.com/schema-theory/

Lund, J.L., Richardson, D.B., Sturmer, T.: The active comparator, new user study design in pharmacoepidemiology: Historical foundations and contemporary application. Curr. Epidemiol. Rep. 2 (4), 221–228 (2015)

Mai, X., Teng, C., Gao, Y., Governor, S., He, X., Kalloo, G., Hoffman, S., Mbiydzenyuy, D., Beachler, D.: A pragmatic comparison of logistic regression versus machine learning methods for propensity score estimation. Supplement: Abstracts of the 38th International Conference on Pharmacoepidemiology: Advancing Pharmacoepidemiology and Real-World Evidence for the Global Community, August 26–28, 2022, Copenhagen, Denmark. Pharmacoepidemiology and Drug Safety 31(S2). (2022)

Mullard, A.: 2021 FDA approvals. Nat. Rev. Drug Discov. 21 (2), 83–88 (2022)

Onasanya, O., Hoffman, S., Harris, K., Dixon, R., Grabner, M.: Current applications of machine learning for causal inference in healthcare research using observational data. International Society for Pharmacoeconomics and Outcomes Research (ISPOR) Atlanta, GA. (2024)

Pearl, J.: Causal diagrams for empirical research. Biometrika. 82 (4), 669–688 (1995)

Prada-Ramallal, G., Takkouche, B., Figueiras, A.: Bias in pharmacoepidemiologic studies using secondary health care databases: A scoping review. BMC Med. Res. Methodol. 19 (1), 53 (2019)

Richardson, T.S., Robins, J.M.: Single World Intervention Graphs: A Primer [accessed on 2013]. (2013). https://www.stats.ox.ac.uk/~evans/uai13/Richardson.pdf

Richiardi, L., Bellocco, R., Zugna, D.: Mediation analysis in epidemiology: Methods, interpretation and bias. Int. J. Epidemiol. 42 (5), 1511–1519 (2013)

Riis, A.H., Johansen, M.B., Jacobsen, J.B., Brookhart, M.A., Sturmer, T., Stovring, H.: Short look-back periods in pharmacoepidemiologic studies of new users of antibiotics and asthma medications introduce severe misclassification. Pharmacoepidemiol Drug Saf. 24 (5), 478–485 (2015)

Rodrigues, D., Kreif, N., Lawrence-Jones, A., Barahona, M., Mayer, E.: Reflection on modern methods: Constructing directed acyclic graphs (DAGs) with domain experts for health services research. Int. J. Epidemiol. 51 (4), 1339–1348 (2022)

Rothman, K.J., Greenland, S., Lash, T.L.: Modern Epidemiology. Wolters Kluwer Health/Lippincott Williams & Wilkins, Philadelphia (2008)

Rubin, D.B.: Causal inference using potential outcomes. J. Am. Stat. Assoc. 100 (469), 322–331 (2005)

Sauer, B., VanderWeele, T.J.: Use of Directed Acyclic Graphs. In: Velentgas, P., Dreyer, N., Nourjah, P. (eds.) Developing a Protocol for Observational Comparative Effectiveness Research: A User’s Guide. Agency for Healthcare Research and Quality (US) (2013)

Schneeweiss, S., Rassen, J.A., Brown, J.S., Rothman, K.J., Happe, L., Arlett, P., Dal Pan, G., Goettsch, W., Murk, W., Wang, S.V.: Graphical depiction of longitudinal study designs in Health Care databases. Ann. Intern. Med. 170 (6), 398–406 (2019)

Schuler, M.S., Rose, S.: Targeted maximum likelihood estimation for causal inference in Observational studies. Am. J. Epidemiol. 185 (1), 65–73 (2017)

Stuart, E.A., DuGoff, E., Abrams, M., Salkever, D., Steinwachs, D.: Estimating causal effects in observational studies using Electronic Health data: Challenges and (some) solutions. EGEMS (Wash DC) 1 (3). (2013)

Sturmer, T., Webster-Clark, M., Lund, J.L., Wyss, R., Ellis, A.R., Lunt, M., Rothman, K.J., Glynn, R.J.: Propensity score weighting and trimming strategies for reducing Variance and Bias of Treatment Effect estimates: A Simulation Study. Am. J. Epidemiol. 190 (8), 1659–1670 (2021)

Suissa, S., Dell’Aniello, S.: Time-related biases in pharmacoepidemiology. Pharmacoepidemiol Drug Saf. 29 (9), 1101–1110 (2020)

Tripepi, G., Jager, K.J., Dekker, F.W., Wanner, C., Zoccali, C.: Measures of effect: Relative risks, odds ratios, risk difference, and ‘number needed to treat’. Kidney Int. 72 (7), 789–791 (2007)

Velentgas, P., Dreyer, N., Nourjah, P., Smith, S., Torchia, M.: Developing a Protocol for Observational Comparative Effectiveness Research: A User’s Guide. Agency for Healthcare Research and Quality (AHRQ) Publication 12(13). (2013)

Wang, A., Nianogo, R.A., Arah, O.A.: G-computation of average treatment effects on the treated and the untreated. BMC Med. Res. Methodol. 17 (1), 3 (2017)

Wang, S.V., Pottegard, A., Crown, W., Arlett, P., Ashcroft, D.M., Benchimol, E.I., Berger, M.L., Crane, G., Goettsch, W., Hua, W., Kabadi, S., Kern, D.M., Kurz, X., Langan, S., Nonaka, T., Orsini, L., Perez-Gutthann, S., Pinheiro, S., Pratt, N., Schneeweiss, S., Toussi, M., Williams, R.J.: HARmonized Protocol Template to enhance reproducibility of hypothesis evaluating real-world evidence studies on treatment effects: A good practices report of a joint ISPE/ISPOR task force. Pharmacoepidemiol Drug Saf. 32 (1), 44–55 (2023a)

Wang, S.V., Schneeweiss, S., RCT-DUPLICATE Initiative, Franklin, J.M., Desai, R.J., Feldman, W., Garry, E.M., Glynn, R.J., Lin, K.J., Paik, J., Patorno, E., Suissa, S., D’Andrea, E., Jawaid, D., Lee, H., Pawar, A., Sreedhara, S.K., Tesfaye, H., Bessette, L.G., Zabotka, L., Lee, S.B., Gautam, N., York, C., Zakoul, H., Concato, J., Martin, D., Paraoan, D., Quinto, K.: Emulation of Randomized Clinical Trials With Nonrandomized Database Analyses: Results of 32 Clinical Trials. JAMA 329 (16), 1376–1385 (2023b)

Westreich, D., Lessler, J., Funk, M.J.: Propensity score estimation: Neural networks, support vector machines, decision trees (CART), and meta-classifiers as alternatives to logistic regression. J. Clin. Epidemiol. 63 (8), 826–833 (2010)

Yang, S., Eaton, C.B., Lu, J., Lapane, K.L.: Application of marginal structural models in pharmacoepidemiologic studies: A systematic review. Pharmacoepidemiol Drug Saf. 23 (6), 560–571 (2014)

Zhou, H., Taber, C., Arcona, S., Li, Y.: Difference-in-differences method in comparative Effectiveness Research: Utility with unbalanced groups. Appl. Health Econ. Health Policy. 14 (4), 419–429 (2016)

The authors received no financial support for this research.

Author information

Authors and affiliations.

Carelon Research, Wilmington, DE, USA

Sarah Ruth Hoffman, Nilesh Gangan, Joseph L. Smith, Arlene Tave, Yiling Yang, Christopher L. Crowe & Michael Grabner

Elevance Health, Indianapolis, IN, USA

Xiaoxue Chen

University of Maryland School of Pharmacy, Baltimore, MD, USA

Susan dosReis

Contributions

SH, NG, JS, AT, CC, MG are employees of Carelon Research, a wholly owned subsidiary of Elevance Health, which conducts health outcomes research with both internal and external funding, including a variety of private and public entities. XC was an employee of Elevance Health at the time of study conduct. YY was an employee of Carelon Research at the time of study conduct. SH, MG, and JLS are shareholders of Elevance Health. SdR receives funding from GlaxoSmithKline for a project unrelated to the content of this manuscript and conducts research that is funded by state and federal agencies.

Corresponding author

Correspondence to Sarah Ruth Hoffman .

Ethics declarations

Competing interests.

The authors declare no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Supplementary Material 2

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Hoffman, S.R., Gangan, N., Chen, X. et al. A step-by-step guide to causal study design using real-world data. Health Serv Outcomes Res Method (2024). https://doi.org/10.1007/s10742-024-00333-6

Received : 07 December 2023

Revised : 31 May 2024

Accepted : 10 June 2024

Published : 19 June 2024

DOI : https://doi.org/10.1007/s10742-024-00333-6

  • Causal inference
  • Real-world data
  • Confounding
  • Non-randomized data
  • Bias in pharmacoepidemiology

  • Open access
  • Published: 14 June 2024

Associations between deep venous thrombosis and thyroid diseases: a two-sample bidirectional Mendelian randomization study

  • Lifeng Zhang 1,
  • Kaibei Li 2,
  • Qifan Yang 1,
  • Yao Lin 1,
  • Caijuan Geng 1,
  • Wei Huang 1 &
  • Wei Zeng 1

European Journal of Medical Research volume 29, Article number: 327 (2024)

Some previous observational studies have linked deep venous thrombosis (DVT) to thyroid diseases; however, the findings were contradictory. This study aimed to investigate whether some common thyroid diseases can cause DVT using a two-sample Mendelian randomization (MR) approach.

This two-sample MR study used single nucleotide polymorphisms (SNPs) identified by the FinnGen genome-wide association studies (GWAS) to be highly associated with some common thyroid diseases, including autoimmune hyperthyroidism (962 cases and 172,976 controls), subacute thyroiditis (418 cases and 187,684 controls), hypothyroidism (26,342 cases and 59,827 controls), and malignant neoplasm of the thyroid gland (989 cases and 217,803 controls). These SNPs were used as instruments. The outcome dataset for DVT (6,767 cases and 330,392 controls) came from the UK Biobank GWAS and was obtained through the Integrative Epidemiology Unit (IEU) open GWAS project. The inverse variance weighted (IVW), MR-Egger and weighted median methods were used to estimate the causal association between DVT and thyroid diseases. Cochran’s Q test was used to quantify the heterogeneity of the instrumental variables (IVs). The MR Pleiotropy RESidual Sum and Outlier test (MR-PRESSO) was used to detect horizontal pleiotropy. When the causal relationship was significant, bidirectional MR analysis was performed to determine any reverse causal relationships between exposures and outcomes.

This MR study illustrated that autoimmune hyperthyroidism slightly increased the risk of DVT according to the IVW [odds ratio (OR) = 1.0009; p  = 0.024] and weighted median methods [OR = 1.001; p  = 0.028]. According to Cochran’s Q test, there was no evidence of heterogeneity in IVs. Additionally, MR-PRESSO did not detect horizontal pleiotropy ( p  = 0.972). However, no association was observed between other thyroid diseases and DVT using the IVW, weighted median, and MR-Egger regression methods.

Conclusions

This study revealed that autoimmune hyperthyroidism may cause DVT; however, more evidence and larger sample sizes are required to draw more precise conclusions.

Introduction

Deep venous thrombosis (DVT) is a common disease that occurs in 1–2 individuals per 1000 each year [ 1 ]. In the post-COVID-19 era, DVT has shown a higher incidence rate [ 2 ]. Among hospitalized patients, the incidence rate of this disease was as high as 2.7% [ 3 ], increasing the risk of adverse events during hospitalization. According to the Registro Informatizado Enfermedad Tromboembolica (RIETE) registry, which included data from approximately 100,000 patients from 26 countries, the 30-day mortality rate was 2.6% for distal DVT and 3.3% for proximal DVT [ 4 ]. Other studies have shown that the one-year mortality rate of DVT is 19.6% [ 5 ]. DVT and pulmonary embolism (PE), collectively referred to as venous thromboembolism (VTE), constitute a major global burden of disease [ 6 ].

Thyroid diseases are common in the real world. Previous studies have focused on the relationship between DVT and thyroid diseases, including thyroid dysfunction and thyroid cancer. Some case reports [ 7 , 8 , 9 ] have demonstrated that hyperthyroidism is often associated with DVT and indicates a worse prognosis [ 10 ]. The relationship between thyroid tumors and venous thrombosis has troubled researchers for many years. In 1989, the first case of papillary thyroid carcinoma presenting with axillary vein thrombosis as the initial symptom was reported [ 11 ]. In 1995, researchers began to notice the relationship between thyroid tumors and hypercoagulability [ 12 ], laying the foundation for subsequent extensive research. However, the aforementioned observational studies had limitations, such as small sample sizes, selection bias, reverse causality, and confounding factors, which may have led to unreliable conclusions [ 13 ].

Previous studies have explored the relationship between thyroid disease and DVT and revealed that high levels of thyroid hormones may increase the risk of DVT. Hyperthyroidism promotes a procoagulant and hypofibrinolytic state by affecting the von Willebrand factor, factors VIII, IV, and X, fibrinogen, and plasminogen activator inhibitor-1 [ 14 , 15 ]. At the molecular level, researchers believe that thyroid hormones affect coagulation levels through an important nuclear thyroid hormone receptor (TR), TRβ [ 16 ], and participate in pathological coagulation through endothelial dysfunction. Thyroid hormones may have non-genetic effects on the behavior of endothelial cells [ 17 , 18 ]. In a study of tumor thrombosis, Lou [ 19 ] used microarray analysis and found that 303 circular RNAs were differentially expressed in DVT. Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis revealed that the most significantly enriched pathways included the thyroid hormone signaling pathway, endocytosis, and proteoglycans in cancer. This indicated that tumor cells and thyroid hormones might interact to promote thrombosis. Based on these studies, we speculated that thyroid diseases, including thyroid dysfunction and thyroid tumors, may cause DVT.

Mendelian randomization (MR) is a causal inference technique that can be used to assess causal relationships and reverse causation between specific exposures and outcomes. If certain assumptions [ 20 ] are fulfilled, genetic variants can be employed as instrumental variables (IVs) to establish causal relationships. Bidirectional MR analysis can clarify the presence of reverse causal relationships [ 21 ], making the conclusions more comprehensive. Accordingly, we aimed to apply a two-sample MR strategy to investigate whether DVT is related to four thyroid diseases: autoimmune hyperthyroidism, subacute thyroiditis, hypothyroidism, and thyroid cancer.

Study design

MR relies on single nucleotide polymorphisms (SNPs) as IVs. The IVs should fulfill the following three criteria [ 22 ]: (1) IVs should be strongly associated with exposure. (2) Genetic variants must be independent of unmeasured confounding factors that may affect the exposure–outcome association. (3) IVs are presumed to affect the outcome only through their associations with exposure (Fig.  1 ). IVs that met the above requirements were used to estimate the relationship between exposure and outcome. Our study protocol conformed to the STROBE-MR Statement [ 23 ], and all methods were performed in accordance with the relevant guidelines and regulations.

figure 1

The relationship between instrumental variables, exposure, outcome, and confounding factors

Data sources and instruments

Datasets (Table 1) in this study were obtained from a publicly available database, the IEU open genome-wide association studies (GWAS) project [ 24 ] ( https://gwas.mrcieu.ac.uk ). There was no sample overlap between the data sources for the outcome and the exposures. Because only de-identified summary-level data were used, individual-level information such as age and gender was not available. Ethical approval was obtained for all original work. This study complied with the terms of use of the database.

MR analysis was performed using the R package “TwoSampleMR”. SNPs associated with each thyroid disease at the genome-wide significance threshold of p < 5.0 × 10⁻⁸ were selected as potential IVs. To ensure independence between the genetic variants used as IVs, the linkage disequilibrium (LD) threshold for grouping was set to r² < 0.001 with a window size of 10,000 kb. The SNP with the lowest p-value at each locus was retained for analyses.
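
The instrument selection rule described above can be sketched in a few lines. The study itself used the R package TwoSampleMR; the Python below is only an illustrative stand-in, not the authors' code. The `Snp` class, the `select_instruments` function, the `r2_lookup` callable (a placeholder for a reference LD panel), and the toy SNP identifiers are all hypothetical. Only the three thresholds come from the text.

```python
from dataclasses import dataclass

P_THRESHOLD = 5.0e-8   # genome-wide significance threshold stated in the text
R2_THRESHOLD = 0.001   # LD threshold stated in the text
WINDOW_KB = 10_000     # window size stated in the text


@dataclass(frozen=True)
class Snp:
    rsid: str
    chrom: int
    pos: int      # base-pair position
    pval: float   # p-value for association with the exposure


def select_instruments(snps, r2_lookup):
    """Greedy LD pruning: keep the most significant SNP, drop its LD neighbours.

    r2_lookup is a hypothetical callable (rsid_a, rsid_b) -> r^2 that stands in
    for a reference LD panel.
    """
    candidates = sorted(
        (s for s in snps if s.pval < P_THRESHOLD), key=lambda s: s.pval
    )
    kept = []
    for snp in candidates:
        in_ld = any(
            snp.chrom == lead.chrom
            and abs(snp.pos - lead.pos) <= WINDOW_KB * 1000
            and r2_lookup(snp.rsid, lead.rsid) >= R2_THRESHOLD
            for lead in kept
        )
        if not in_ld:
            kept.append(snp)
    return kept


# Toy data, invented for illustration only.
toy_snps = [
    Snp("rsA", 1, 1_000_000, 1e-12),
    Snp("rsB", 1, 1_050_000, 1e-9),   # in LD with rsA
    Snp("rsC", 2, 500_000, 3e-8),
    Snp("rsD", 2, 900_000, 1e-6),     # fails the p-value threshold
]
toy_r2 = {frozenset(("rsA", "rsB")): 0.8}


def toy_lookup(a, b):
    return toy_r2.get(frozenset((a, b)), 0.0)


selected = [s.rsid for s in select_instruments(toy_snps, toy_lookup)]
print(selected)
```

Sorting by p-value before pruning is what guarantees that the SNP with the lowest p-value at each locus is the one retained.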

Statistical analysis

Multiple MR methods were used to infer causal relationships between thyroid diseases and DVT, including the inverse variance weighted (IVW), weighted median, and MR-Egger methods, after harmonizing the SNPs across the GWASs of exposures and outcomes. The main analysis was conducted using the IVW method. Heterogeneity and pleiotropy tests were also performed in each MR analysis. In addition, the MR-PRESSO Global test [ 25 ] was used to detect horizontal pleiotropy. The effect trend of the SNPs was observed through a scatter plot, and a forest plot was used to observe the overall effects. When a significant causal relationship was confirmed by two-sample MR analysis, bidirectional MR analysis was performed to assess reverse causal relationships by swapping the exposure and outcome. Parameters were set the same as before. All of the abovementioned statistical analyses were performed using the package TwoSampleMR (version 0.5.7) in R (version 4.2.1).
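
As a rough illustration of what the IVW, Cochran's Q, and MR-Egger calculations involve, the following Python sketch applies the standard textbook formulas to per-SNP summary statistics. It is not the TwoSampleMR implementation, which may differ in detail (for example, in how standard errors are computed), and it omits the weighted median method. The function names and the toy numbers are assumptions made for this example.

```python
import math


def ivw(beta_exp, beta_out, se_out):
    """Fixed-effect IVW estimate, its standard error, and Cochran's Q."""
    weights = [bx * bx / (se * se) for bx, se in zip(beta_exp, se_out)]
    ratios = [by / bx for bx, by in zip(beta_exp, beta_out)]  # per-SNP Wald ratios
    total = sum(weights)
    beta = sum(w * r for w, r in zip(weights, ratios)) / total
    se = math.sqrt(1.0 / total)
    q = sum(w * (r - beta) ** 2 for w, r in zip(weights, ratios))
    return beta, se, q


def mr_egger(beta_exp, beta_out, se_out):
    """Weighted regression of outcome effects on exposure effects with an intercept.

    Returns (slope, intercept). Assumes SNPs are already oriented so that every
    exposure effect is positive. A non-zero intercept hints at directional
    pleiotropy.
    """
    w = [1.0 / (se * se) for se in se_out]
    sw = sum(w)
    mean_x = sum(wi * x for wi, x in zip(w, beta_exp)) / sw
    mean_y = sum(wi * y for wi, y in zip(w, beta_out)) / sw
    sxy = sum(
        wi * (x - mean_x) * (y - mean_y)
        for wi, x, y in zip(w, beta_exp, beta_out)
    )
    sxx = sum(wi * (x - mean_x) ** 2 for wi, x in zip(w, beta_exp))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Toy summary statistics (invented for illustration, not from the study).
bx = [0.10, 0.20, 0.30]   # SNP-exposure effects
by = [0.02, 0.04, 0.06]   # SNP-outcome effects
se = [0.01, 0.01, 0.01]   # standard errors of the SNP-outcome effects

beta_ivw, se_ivw, q_stat = ivw(bx, by, se)
slope, intercept = mr_egger(bx, by, se)
odds_ratio = math.exp(beta_ivw)  # log-odds estimate converted to an odds ratio
print(beta_ivw, se_ivw, q_stat, slope, intercept, odds_ratio)
```

In this toy example every SNP has the same Wald ratio, so the IVW and MR-Egger slopes agree, Q is zero, and the Egger intercept is zero. Real data would show scatter around the fitted line.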

After harmonizing the SNPs across the GWASs for exposures and outcomes, the IVW (OR = 1.0009, p = 0.024, Table 2) and weighted median analyses (OR = 1.001, p = 0.028) revealed a significant causal effect of autoimmune hyperthyroidism on DVT risk. Cochran’s Q test, the MR-Egger intercept, and the MR-PRESSO test suggested that the results were not influenced by pleiotropy or heterogeneity (Table 2). However, the leave-one-out analysis revealed a significant difference after removing some SNPs (rs179247, rs6679677, rs72891915, and rs942495, p < 0.05, Figure S2a), indicating that the MR results were dependent on these SNPs (Figure S2, Table S1). No significant effects were observed for the other thyroid diseases (Table 2). The scatter plot of the association between thyroid diseases and DVT is presented in Fig. 2, indicating a positive causal relationship between autoimmune hyperthyroidism and DVT (Fig. 2a). The forest plots of single SNPs affecting the risk of DVT are displayed in Figure S1.
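
The idea behind a leave-one-out analysis is to recompute the pooled estimate with each SNP removed in turn and see how far it moves. A minimal Python sketch follows, assuming a fixed-effect IVW estimator. The function names, SNP labels, and numbers are invented for illustration and are unrelated to the SNPs reported above.

```python
def ivw_estimate(beta_exp, beta_out, se_out):
    """Fixed-effect IVW point estimate from per-SNP summary statistics."""
    weights = [bx * bx / (se * se) for bx, se in zip(beta_exp, se_out)]
    ratios = [by / bx for bx, by in zip(beta_exp, beta_out)]
    return sum(w * r for w, r in zip(weights, ratios)) / sum(weights)


def leave_one_out(rsids, beta_exp, beta_out, se_out):
    """Return {rsid: IVW estimate computed without that SNP}."""
    results = {}
    for i, rsid in enumerate(rsids):
        keep = [j for j in range(len(rsids)) if j != i]
        results[rsid] = ivw_estimate(
            [beta_exp[j] for j in keep],
            [beta_out[j] for j in keep],
            [se_out[j] for j in keep],
        )
    return results


# Toy data: snp4 has a much larger Wald ratio than the other three.
rsids = ["snp1", "snp2", "snp3", "snp4"]
bx = [0.1, 0.1, 0.1, 0.1]
by = [0.01, 0.01, 0.01, 0.05]
se = [0.01, 0.01, 0.01, 0.01]

full = ivw_estimate(bx, by, se)
loo = leave_one_out(rsids, bx, by, se)
for rsid, estimate in loo.items():
    print(rsid, round(estimate, 4), "shift:", round(estimate - full, 4))
```

Here dropping snp4 halves the pooled estimate, which is the kind of dependence on individual SNPs that a leave-one-out plot is meant to expose.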

figure 2

The estimated scatter plot of the association between thyroid diseases and DVT. MR-analyses are derived using IVW, MR-Egger, weighted median and mode. By fitting different models, the scatter plot showed the relationship between SNP and exposure factors, predicting the association between SNP and outcomes

Bidirectional MR analysis was performed to further determine the relationship between autoimmune hyperthyroidism and DVT. No reverse causal relationship was observed (Table S2), which suggests that the effect runs from autoimmune hyperthyroidism to DVT rather than in the opposite direction.

This study used MR to assess whether thyroid diseases affect the incidence of DVT. The results showed that autoimmune hyperthyroidism can increase the risk of DVT occurrence, but a reverse causal relationship was not observed between them using bidirectional MR analysis. However, other thyroid diseases, such as subacute thyroiditis, hypothyroidism, and thyroid cancer, did not show a similar effect.

Recently, several studies have suggested that thyroid-related diseases may be associated with the occurrence of DVT in the lower extremities, which provided etiological clues for our research. In 2006, a review mentioned the association between thyroid dysfunction and coagulation disorders [ 26 ], indicating a hypercoagulable state in patients with hyperthyroidism. In 2011, a review further suggested a clear association between hypothyroidism and bleeding tendency, while hyperthyroidism appeared to increase the risk of thrombotic events, particularly cerebral venous thrombosis [ 27 ]. A retrospective cohort study [ 28 ] supported this conclusion, although it only observed a higher proportion of concurrent thyroid dysfunction in patients with cerebral venous thrombosis. The relationship between thyroid function and venous thromboembolism remains controversial. Krieg VJ et al. [ 29 ] found that hypothyroidism has a higher incidence rate in patients with chronic thromboembolic pulmonary hypertension and may be associated with more severe disease, which seems to differ from the previous view that hyperthyroidism may be associated with venous thrombosis. Alsaidan [ 30 ] also reported that the risk of developing venous thrombosis increased almost onefold in cases with a mild-to-moderate elevation of thyroid stimulating hormone and free thyroxine (FT4), and twofold in cases with a severe elevation of thyroid stimulating hormone and FT4. Raised thyroid hormones may increase the synthesis or secretion of coagulation factors or may decrease fibrinolysis, which may lead to coagulation abnormalities.

Other thyroid diseases are also reported to be associated with DVT. In a large prospective cohort study [ 31 ], the incidence of venous thromboembolism was observed to increase in patients with thyroid cancer over the age of 60. However, other retrospective studies did not find any difference compared with the general population [ 32 ]. In the post-COVID-19 era, subacute thyroiditis has received considerable attention from researchers. New evidence suggests that COVID-19 may be associated with subacute thyroiditis [ 33 , 34 ]. Mondal et al. [ 35 ] found that out of 670 COVID-19 patients, 11 presented with post-COVID-19 subacute thyroiditis. Among them, painless subacute thyroiditis appeared earlier and exhibited symptoms of hyperthyroidism. Another case report also indicated the same result, that is, subacute thyroiditis occurred after COVID-19 infection, accompanied by thyroid function changes [ 36 ]. This led us to hypothesize that subacute thyroiditis may cause DVT through alterations in thyroid function.

This study confirmed a significant causal relationship between autoimmune hyperthyroidism and DVT ( p  = 0.02). The data were tested for heterogeneity and gene pleiotropy using MR-Egger, Cochran’s Q, and MR-PRESSO tests. There was no evidence that the results were influenced by pleiotropy or heterogeneity. In the leave-one-out analysis, four of the five selected SNPs showed significant effects of autoimmune hyperthyroidism on DVT, suggesting an impact of these SNPs on DVT outcome. Previous studies have focused on the relationship between hyperthyroidism and its secondary arrhythmias and arterial thromboembolism [ 37 , 38 ]. This study emphasized the risk of DVT in patients with hyperthyroidism, which has certain clinical implications. Prophylactic anticoagulant therapy was observed to help prevent DVT in patients with hyperthyroidism. Unfortunately, the results of this study did not reveal any evidence that suggests a relationship between other thyroid diseases and DVT occurrence. This may be due to the limited database, as this study only included the GWAS data from a subset of European populations. Large-scale multiracial studies are needed in the future.

There are some limitations to this study. First, it was limited to participants of European descent. Consequently, further investigation is required to confirm these findings in other ethnicities. Second, this study did not reveal the relationship between complications of hyperthyroidism and DVT. Additionally, this study selected IVs from the database using statistical methods rather than selecting them from the real population. This may result in weaker effects of the screened IVs and reduce the clinical significance of the MR analysis. Moreover, the definitions of some diseases in this study were not clear in the original database, and some of the diseases were self-reported, which may reduce the accuracy of diagnosis. Further research based on prospective cohort studies and randomized controlled trials (RCTs) is still needed to clarify the causal relationship between DVT and thyroid diseases.

This study analyzed large-scale genetic data and provided evidence of a causal relationship between autoimmune hyperthyroidism and the risk of DVT; no such relationship was found for the other thyroid diseases investigated. Prospective RCTs or MR studies with larger sample sizes are still needed to draw more precise conclusions.

Availability of data and materials

The IEU open GWAS project, https://gwas.mrcieu.ac.uk/

References

Ortel TL, Neumann I, Ageno W, et al. American society of hematology 2020 guidelines for management of venous thromboembolism: treatment of deep vein thrombosis and pulmonary embolism. Blood Adv. 2020;4(19):4693–738.

Mehrabi F, Farshbafnadi M, Rezaei N. Post-discharge thromboembolic events in COVID-19 patients: a review on the necessity for prophylaxis. Clin Appl Thromb Hemost. 2023;29:10760296221148476.

Loffredo L, Vidili G, Sciacqua A, et al. Asymptomatic and symptomatic deep venous thrombosis in hospitalized acutely ill medical patients: risk factors and therapeutic implications. Thromb J. 2022;20(1):72.

RIETE Registry. Death within 30 days. RIETE Registry. 2022[2023.8.23]. https://rieteregistry.com/graphics-interactives/dead-30-days/ .

Minges KE, Bikdeli B, Wang Y, Attaran RR, Krumholz HM. National and regional trends in deep vein thrombosis hospitalization rates, discharge disposition, and outcomes for medicare beneficiaries. Am J Med. 2018;131(10):1200–8.

Di Nisio M, van Es N, Büller HR. Deep vein thrombosis and pulmonary embolism. Lancet. 2016;388(10063):3060–73.

Aquila I, Boca S, Caputo F, et al. An unusual case of sudden death: is there a relationship between thyroid disorders and fatal pulmonary thromboembolism? A case report and review of literature. Am J Forensic Med Pathol. 2017;38(3):229–32.

Katić J, Katić A, Katić K, Duplančić D, Lozo M. Concurrent deep vein thrombosis and pulmonary embolism associated with hyperthyroidism: a case report. Acta Clin Croat. 2021;60(2):314–6.

Hieber M, von Kageneck C, Weiller C, Lambeck J. Thyroid diseases are an underestimated risk factor for cerebral venous sinus thrombosis. Front Neurol. 2020;11:561656.

Pohl KR, Hobohm L, Krieg VJ, et al. Impact of thyroid dysfunction on short-term outcomes and long-term mortality in patients with pulmonary embolism. Thromb Res. 2022;211:70–8.

Sirota DK. Axillary vein thrombosis as the initial symptom in metastatic papillary carcinoma of the thyroid. Mt Sinai J Med. 1989;56(2):111–3.

Raveh E, Cohen M, Shpitzer T, Feinmesser R. Carcinoma of the thyroid: a cause of hypercoagulability? Ear Nose Throat J. 1995;74(2):110–2.

Davey Smith G, Hemani G. Mendelian randomization: genetic anchors for causal inference in epidemiological studies. Hum Mol Genet. 2014;23(R1):R89–98.

Stuijver DJ, van Zaane B, Romualdi E, Brandjes DP, Gerdes VE, Squizzato A. The effect of hyperthyroidism on procoagulant, anticoagulant and fibrinolytic factors: a systematic review and meta-analysis. Thromb Haemost. 2012;108(6):1077–88.

Son HM. Massive cerebral venous sinus thrombosis secondary to Graves’ disease. Yeungnam Univ J Med. 2019;36(3):273–80.

Elbers LP, Moran C, Gerdes VE, et al. The hypercoagulable state in hyperthyroidism is mediated via the thyroid hormone β receptor pathway. Eur J Endocrinol. 2016;174(6):755–62.

Davis PJ, Sudha T, Lin HY, et al. Thyroid hormone, hormone analogs, and angiogenesis. Compr Physiol. 2015;6(1):353–62.

Mousa SA, Lin HY, Tang HY, et al. Modulation of angiogenesis by thyroid hormone and hormone analogues: implications for cancer management. Angiogenesis. 2014;17(3):463–9.

Lou Z, Li X, Li C, et al. Microarray profile of circular RNAs identifies hsa_circ_000455 as a new circular RNA biomarker for deep vein thrombosis. Vascular. 2022;30(3):577–89.

Hemani G, Bowden J, Davey SG. Evaluating the potential role of pleiotropy in Mendelian randomization studies. Hum Mol Genet. 2018;27(R2):R195–208.

Zhang Z, Li L, Hu Z, et al. Causal effects between atrial fibrillation and heart failure: evidence from a bidirectional Mendelian randomization study. BMC Med Genomics. 2023;16(1):187.

Emdin CA, Khera AV, Kathiresan S. Mendelian randomization. JAMA. 2017;318(19):1925–6.

Skrivankova VW, Richmond RC, Woolf BAR, et al. Strengthening the reporting of observational studies in epidemiology using Mendelian randomization: the STROBE-MR statement. JAMA. 2021;326(16):1614–21.

Hemani G, Zheng J, Elsworth B, et al. The MR-Base platform supports systematic causal inference across the human phenome. Elife. 2018;7: e34408.

Verbanck M, Chen CY, Neale B, Do R. Detection of widespread horizontal pleiotropy in causal relationships inferred from Mendelian randomization between complex traits and diseases. Nat Genet. 2018;50(5):693–8.

Franchini M. Hemostatic changes in thyroid diseases: haemostasis and thrombosis. Hematology. 2006;11(3):203–8.

Franchini M, Lippi G, Targher G. Hyperthyroidism and venous thrombosis: a casual or causal association? A systematic literature review. Clin Appl Thromb Hemost. 2011;17(4):387–92.

Fandler-Höfler S, Pilz S, Ertler M, et al. Thyroid dysfunction in cerebral venous thrombosis: a retrospective cohort study. J Neurol. 2022;269(4):2016–21.

Krieg VJ, Hobohm L, Liebetrau C, et al. Risk factors for chronic thromboembolic pulmonary hypertension—importance of thyroid disease and function. Thromb Res. 2020;185:20–6.

Alsaidan AA, Alruwiali F. Association between hyperthyroidism and thromboembolism: a retrospective observational study. Ann Afr Med. 2023;22(2):183–8.

Walker AJ, Card TR, West J, Crooks C, Grainge MJ. Incidence of venous thromboembolism in patients with cancer—a cohort study using linked United Kingdom databases. Eur J Cancer. 2013;49(6):1404–13.

Ordookhani A, Motazedi A, Burman KD. Thrombosis in thyroid cancer. Int J Endocrinol Metab. 2017;16(1): e57897.

Ziaka M, Exadaktylos A. Insights into SARS-CoV-2-associated subacute thyroiditis: from infection to vaccine. Virol J. 2023;20(1):132.

Henke K, Odermatt J, Ziaka M, Rudovich N. Subacute thyroiditis complicating COVID-19 infection. Clin Med Insights Case Rep. 2023;16:11795476231181560.

Mondal S, DasGupta R, Lodh M, Ganguly A. Subacute thyroiditis following recovery from COVID-19 infection: novel clinical findings from an Eastern Indian cohort. Postgrad Med J. 2023;99(1172):558–65.

Nham E, Song E, Hyun H, et al. Concurrent subacute thyroiditis and graves’ disease after COVID-19: a case report. J Korean Med Sci. 2023;38(18): e134.

Mouna E, Molka BB, Sawssan BT, et al. Cardiothyreosis: epidemiological, clinical and therapeutic approach. Clin Med Insights Cardiol. 2023;17:11795468231152042.

Maung AC, Cheong MA, Chua YY, Gardner DS. When a storm showers the blood clots: a case of thyroid storm with systemic thromboembolism. Endocrinol Diabetes Metab Case Rep. 2021;2021:20–0118.

Not applicable.

Author information

Lifeng Zhang and Kaibei Li have contributed equally to this work and share the first authorship.

Authors and Affiliations

Department of Vascular Surgery, Hospital of Chengdu University of Traditional Chinese Medicine, No. 39, Shierqiao Road, Jinniu District, Chengdu, 610072, Sichuan, People’s Republic of China

Lifeng Zhang, Qifan Yang, Yao Lin, Caijuan Geng, Wei Huang & Wei Zeng

Disinfection Supply Center, Hospital of Chengdu University of Traditional Chinese Medicine, No. 39, Shierqiao Road, Jin Niu District, Chengdu, 610072, Sichuan, People’s Republic of China

Contributions

Conception and design: LFZ and WZ. Analysis and interpretation: LFZ, KBL and WZ. Data collection: LFZ, QFY, YL, CJG and WH. Writing the article: LFZ, KBL. Critical revision of the article: LFZ, GFY and WZ. Final approval of the article: LFZ, KBL, YL, CJG, WH, QFY and WZ. Statistical analysis: YL, QFY.

Corresponding author

Correspondence to Wei Zeng .

Ethics declarations

Ethics approval and consent to participate.

Ethical approval was obtained in all original studies. This study complies with the terms of use of the database.

Competing interests

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article.

Zhang, L., Li, K., Yang, Q. et al. Associations between deep venous thrombosis and thyroid diseases: a two-sample bidirectional Mendelian randomization study. Eur J Med Res 29 , 327 (2024). https://doi.org/10.1186/s40001-024-01933-1

Received : 12 September 2023

Accepted : 09 June 2024

Published : 14 June 2024

DOI : https://doi.org/10.1186/s40001-024-01933-1

  • Deep venous thrombosis
  • Thyroid diseases
  • Mendelian randomization analysis

European Journal of Medical Research

ISSN: 2047-783X

  • Can J Hosp Pharm
  • v.67(5); Sep-Oct 2014

An Introduction to the Fundamentals of Cohort and Case–Control Studies

INTRODUCTION

As pharmacotherapy experts, pharmacists are continually updating their knowledge about drug effects. In addition to being knowledge users of research findings, pharmacists increasingly play a larger role in observational studies of drug effects. Observational studies are inherently nonexperimental and, unlike randomized clinical trials (RCTs), do not involve any manipulation (such as randomization) of the treatment and control groups by the investigator.

This article reviews for the practising pharmacist the fundamental design elements and foundational methodologic knowledge for conducting cohort and case–control studies, 2 common and robust observational study designs for elucidating drug–outcome associations. Readers interested in learning about other observational study designs, such as cross-sectional studies, ecological studies, case series, case reports, within-person studies, and quasi-experimental designs, or the critical appraisal of such designs, are referred elsewhere. 1 – 6

WHY WE NEED COHORT AND CASE–CONTROL STUDIES

We need well-designed and rigorous cohort and case–control studies because their findings provide knowledge complementary to that garnered from RCTs (Table 1). The design properties of RCTs maximize their ability to estimate the potential causal effects of drugs under ideal circumstances and thereby to estimate the efficacy of those drugs. However, many RCTs involve a relatively limited number of highly selected patients and a limited duration. Indeed, RCTs typically follow patients for only a small fraction of the time that the drug would be used in clinical practice, especially when the medications are for chronic diseases. Moreover, RCTs typically exclude complex patients, they often use irrelevant comparators (e.g., placebo), and they frequently measure outcomes that are not patient-centred (i.e., surrogate end points). 7 Although many of these limitations may be overcome by designing more pragmatic RCTs that do indeed measure effectiveness, 8 cohort and case–control studies are 2 feasible study design alternatives that address the limitations of RCTs (Table 1) without the considerable financial and human resource costs of pragmatic RCTs.

Table 1. Limitations of Randomized Clinical Trials (RCTs) Potentially Addressed by Cohort and Case–Control Studies

Limitation of RCTs | Cohort and case–control studies
Use of a strict study protocol that is often not representative of typical care | Usually representative of settings of routine medical care
Exclusion of key patient populations, such as children, pregnant women, and elderly people | May focus on vulnerable and under-represented populations
Limited sample size | May include large number of patients, especially if secondary data sources are used, thereby allowing rare events to be detected
Short duration | May follow patients for long periods of time (e.g., years)
Evaluation of irrelevant treatment comparisons | May compare several relevant therapies
Outcomes measured may not be important to the patient (e.g., surrogate end points) | May include any outcome that is measurable within the data source
High cost | Relatively low cost

COHORT STUDIES

A cohort is a group of people who share a common experience or characteristic. The term “cohort” first appeared in the medical literature in the 1930s in an article by epidemiologist W H Frost. 9 Interestingly, the word “cohort” has military roots, originating from the Latin word “cohors”. 10 The term was first used in the Roman military, where a group of 300 to 600 soldiers constituted a cohort. 11

A cohort study compares the experience of 2 or more groups of patients who are followed concurrently forward in time ( Figure 1 ). This prospective tracking, from exposure to outcome, is in fact one of the defining features of a cohort study. 11 The temporal sequence involved in following a group of patients who are exposed to a certain factor (the treatment group) and a group of patients who are not exposed to that factor (the control group) is akin to that of a clinical trial, where instead of chance determining a patient’s exposure status (as occurs in an RCT), choice or happenstance determines exposure status.

Figure 1. Schematic for the design of cohort and case–control studies.

Selecting the Study Cohort

For any cohort study, a source population must be defined, from which the eligible study cohort is derived through application of various inclusion and exclusion criteria. At a minimum, patients entering the study cohort must be free of the outcome of interest. For example, in a cohort study designed to measure the association between atypical antipsychotics and diabetes mellitus, patients with diabetes would have to be excluded from the study cohort because they are not at risk of the outcome. Often, other restrictions are put in place to minimize the risk of bias. For example, restriction to new users of a medication will ensure avoidance of multiple biases. 12 Inclusion of prevalent or current drug users can lead to significant bias because patients who experience early intolerance or adverse effects of a drug may discontinue the drug, and the remaining cohort will consist of a healthier and usually more adherent group. 13 Risk that varies over time, whereby new users have a higher risk of an adverse event, has been observed for numerous associations, including those between nonsteroidal anti-inflammatory drugs and upper gastrointestinal bleeding, 14 oral contraceptives and venous thromboembolism, 15 benzodiazepines and falls, 16 and angiotensin-converting enzyme inhibitors and angioedema. 17

Defining Drug Exposure Groups

Once the study cohort has been created, 2 or more exposure groups must be clearly defined, 1 of which must serve as the control or reference group. The reference group should be clinically relevant. For example, in a comparative safety or effectiveness study, patients taking a drug within the same therapeutic class or receiving usual care may serve as the reference group. If clinically and scientifically relevant, a group with no therapeutic exposure may be the reference group. Drug exposure may be measured in terms of persons or person-time (the time for which a person is exposed to a particular drug). Drug exposure is often categorized in a binary fashion (i.e., yes or no), based on either a minimum number of prescription records (e.g., at least 3 records) or a specified duration of exposure (e.g., at least 90 days’ exposure), or a combination of these 2 factors (i.e., cumulative exposure). Irrespective of how exposure is defined, it is essential that follow-up time be properly categorized following entry into the cohort to avoid time-related bias. 18 Furthermore, the definition of exposure should be coherent with the study hypothesis. For example, a certain amount of time or a certain dose of drug may be required to elicit an effect, or a drug may continue to have an effect once discontinued (e.g., bisphosphonates). Moreover, decisions about when to discontinue drug exposure must be made. There are 2 common approaches: “as treated”, whereby drug exposure is recorded as being stopped when a person no longer meets the definition of exposure; and “intention-to-treat”, whereby a person is considered exposed from the time of first meeting the study’s exposure definition until experiencing the outcome of interest or the end of the study, irrespective of changes in actual exposure status. 
There is no consensus on how to best define drug exposures, and hence the definitions of exposure often vary considerably among cohort studies assessing identical drug–outcome associations.
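The difference between the two approaches to ending exposure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, its parameters, and the dates are hypothetical and do not come from the article, and a real study would also need to handle grace periods, gaps between prescriptions, and censoring.

```python
from datetime import date


def exposed_days(first_exposure, last_exposure, end_of_follow_up, approach):
    """Days counted as exposed for one patient (hypothetical helper).

    first_exposure   -- date the patient first met the exposure definition
    last_exposure    -- date the patient stopped meeting the exposure definition
    end_of_follow_up -- date of the outcome or end of the study, whichever came first
    approach         -- "as_treated" or "intention_to_treat"
    """
    if approach == "as_treated":
        # Exposure stops when the patient no longer meets the exposure definition.
        stop = min(last_exposure, end_of_follow_up)
    elif approach == "intention_to_treat":
        # Exposure continues until the outcome or the end of the study,
        # irrespective of changes in actual exposure status.
        stop = end_of_follow_up
    else:
        raise ValueError("approach must be 'as_treated' or 'intention_to_treat'")
    return (stop - first_exposure).days


# Invented example: drug started 1 Jan 2020, stopped 10 Apr 2020, study ended 31 Dec 2020.
start, stopped, study_end = date(2020, 1, 1), date(2020, 4, 10), date(2020, 12, 31)
print(exposed_days(start, stopped, study_end, "as_treated"))          # 100
print(exposed_days(start, stopped, study_end, "intention_to_treat"))  # 365
```

The same patient contributes very different amounts of exposed person-time under the two definitions, which is one reason results can differ between studies of the same drug–outcome association.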

Measuring Occurrence of Outcomes

Complete and accurate measurement of the outcome of interest is essential to ensure the validity of study results. When subjective outcome data (e.g., diagnosis of pneumonia) are being collected during the study period, exposure status should be blinded for the outcome assessors and adjudicators, to prevent responder bias. When previously collected data (i.e., secondary data) are being used, investigators should ideally use outcome definitions that have been validated in previous studies. For example, Hux and others 19 validated definitions of diabetes by comparing International Classification of Diseases codes obtained from administrative health care databases in Ontario with diagnostic data from primary care charts.

Quantifying the Drug–Outcome Association

For cohort studies, the drug–outcome association is usually expressed as a relative risk, a relative rate, or a hazard ratio. Advanced statistical techniques are used to account for factors other than the drug exposure of interest that might distort the drug–outcome association. These factors or potential confounders are often handled simultaneously with multivariable regression models. Because these statistical models account for measured variables, it is crucial that the data source capture as many potential confounding variables (or proxies of confounders) as possible. Potential confounders should usually be measured before entry into the cohort, to avoid adjustment for factors in the causal pathway.
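As a rough illustration of the crude (unadjusted) measures named above, the following Python sketch computes a relative risk from event counts and a relative rate from events per unit of person-time. The function names and all numbers are invented for illustration. The multivariable adjustment for confounders described above is not shown.

```python
def relative_risk(exposed_events, exposed_total, unexposed_events, unexposed_total):
    """Crude relative risk: risk among the exposed divided by risk among the unexposed."""
    risk_exposed = exposed_events / exposed_total
    risk_unexposed = unexposed_events / unexposed_total
    return risk_exposed / risk_unexposed


def relative_rate(exposed_events, exposed_person_years,
                  unexposed_events, unexposed_person_years):
    """Crude relative rate: events per person-year, exposed versus unexposed."""
    rate_exposed = exposed_events / exposed_person_years
    rate_unexposed = unexposed_events / unexposed_person_years
    return rate_exposed / rate_unexposed


# Invented counts: 30 events among 1000 exposed patients, 15 among 1000 unexposed.
print(round(relative_risk(30, 1000, 15, 1000), 2))

# Invented person-time: 30 events over 2500 person-years versus 15 over 3000.
print(round(relative_rate(30, 2500, 15, 3000), 2))
```

With these invented numbers the risk in the exposed group is twice that in the reference group, while the relative rate is higher because the exposed group contributed less person-time.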

Strengths and Weaknesses

One of the major strengths of a cohort study is that the temporal sequence—drug exposure preceding outcome—is explicit in the study design. The incidence of a particular outcome among persons exposed to a certain drug can be directly calculated using a cohort design. Cohort studies are also relatively efficient for studying rare exposures, and multiple outcomes may be assessed for a single exposure. However, cohort studies with long observation periods may be more susceptible to losses to follow-up and to inaccurate measurement of exposures and outcomes. Large numbers of patients may be required to precisely estimate meaningful drug–outcome associations, especially for rare outcomes or outcomes that take a long time to occur.

CASE-CONTROL STUDIES

The first case–control study using the design with which we are familiar today was published in 1926. However, the concept of case–control studies has its origins in the investigation of disease etiologies through detailed histories and examination of patients. 20

In a case–control study, a number of cases and noncases (controls) are identified, and the occurrence of one or more prior exposures is compared between groups to evaluate drug–outcome associations ( Figure 1 ). A case–control study runs in reverse relative to a cohort study. 21 As such, study inception occurs when a patient experiences an outcome and is thus designated a “case”. A modern conceptual view holds that the case–control study can be thought of as an efficient cohort design. Essentially, patients who would have experienced the outcome of interest in a cohort study are the cases in a case–control study. Similarly, patients who were at risk but did not experience the outcome of interest in a cohort study are the controls in a case–control study. The potential data sources for a case–control study are identical with those for a cohort study, and the investigator may collect data after study inception or may use previously collected data. An extension of the case–control study is the nested case–control study, which is a case–control study conducted within a cohort. Details regarding this design are beyond the scope of this article and are reviewed elsewhere. 22 , 23

Selection of Cases

The first step in a case–control study is to identify the cases through application of explicitly defined inclusion and exclusion criteria. Ideally, cases should be directly sampled from the source population in a manner that is unrelated to the drug exposures of interest; however, the source population that gave rise to the cases is often unknown and difficult to identify (except in a nested-case control study, where the source population is known). The case-selection process and the data sources from which cases were selected should be described in detail, especially if cases are from a variety of sources, such as hospital and community-based sources. Selecting only hospital-based cases may lead to systematic error related to hospital admission practices, whereby exposed cases may be more likely to be admitted and therefore selected into the study (a phenomenon known as Berksonian bias). Furthermore, only new (incident) cases should be selected, as nonincident cases usually over-represent long-term survivors, and diagnostic practices may change over time, introducing potential bias. When cases are selected from a secondary data source, the case definitions should be supported by previous validation studies.

Selection of Controls

The selection of controls in a case–control study is fraught with difficulty and is often the source of significant bias. Essentially, the controls should be selected from the same source population as the cases. 24 In other words, controls should be at risk of becoming cases and should come from a population with the same exposure distribution as the cases. Multiple controls are usually selected for each case, to increase the statistical efficiency of the study; however, the gains are minimal beyond 3 or 4 controls per case. Nonetheless, modern case–control studies involving large databases often use much higher control–case ratios to maximize study precision. To control for potential confounding, cases and controls are often matched on one or more patient characteristics, such as age or sex (although it may not always be appropriate to match on these variables). The study investigator must be careful not to match on too many factors or on factors that are not confounders, as doing so might lead to overmatching and bias. Furthermore, matching should not involve variables that the investigator is interested in examining in association with an outcome. The selection of controls is one of the most difficult aspects of epidemiologic research, and readers are encouraged to consult additional resources. 24 – 28
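The diminishing return from adding controls can be illustrated with a commonly used approximation that is not taken from this article: when the true association is close to null, a study with m controls per case has roughly m / (m + 1) of the statistical efficiency of a study with an unlimited number of controls per case. The function name below is hypothetical.

```python
def relative_efficiency(controls_per_case):
    """Approximate efficiency relative to an unlimited number of controls per case.

    Uses the m / (m + 1) rule of thumb, which is an assumption that holds
    approximately when the association is near the null.
    """
    if controls_per_case < 1:
        raise ValueError("controls_per_case must be at least 1")
    return controls_per_case / (controls_per_case + 1)


for m in (1, 2, 3, 4, 5, 10):
    print(f"{m:>2} controls per case: {relative_efficiency(m):.0%} of maximum efficiency")
```

Under this approximation, moving from 1 to 4 controls per case raises efficiency from 50% to 80%, whereas moving from 4 to 10 adds only about 11 percentage points.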

Similar to the situation for a cohort study, the drug exposures of interest and their definitions should be clearly specified in the methods. Because exposure in a case–control study is determined after the cases have been identified, a period before occurrence of the case, called the “look-back period” or “look-back window”, must be defined. A comparable look-back period must be defined for the control group. Look-back periods should consider the study hypothesis and thus may vary considerably from one study to another. For example, Abdelmoneim and others 29 specified a 120-day look-back period before the date of their cases (patients with acute coronary syndrome) to assess recent exposure to glyburide and gliclazide. Azoulay and others 30 specified an exposure window of any time prior to a year before the date of cases in their study evaluating the association between pioglitazone and bladder cancer. If the investigators are collecting exposure data themselves, then outcome status should be blinded to study personnel.
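A look-back window can be sketched as a simple date check applied in the same way to cases and controls. In the Python sketch below, the function name, the dispensing dates, and the choice to exclude the index date itself are assumptions made for illustration. Only the 120-day window length echoes the example cited above.

```python
from datetime import date, timedelta


def exposed_in_look_back(index_date, dispensing_dates, window_days):
    """Return True if any dispensing falls in the window before the index date.

    The window covers the window_days days before index_date. The index date
    itself is excluded, which is an assumption of this sketch.
    """
    window_start = index_date - timedelta(days=window_days)
    return any(window_start <= d < index_date for d in dispensing_dates)


# Invented example: index date 1 June 2014, 120-day look-back window.
index_date = date(2014, 6, 1)
print(exposed_in_look_back(index_date, [date(2014, 3, 15)], 120))  # True
print(exposed_in_look_back(index_date, [date(2013, 12, 1)], 120))  # False
```

For controls, the index date would typically be an assigned date comparable to the event date of the corresponding case, so that both groups are assessed over an equivalent period.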

In a case–control study, the odds ratio is the usual measure of association reported. This measure is the ratio of the odds of exposure among cases to the odds of exposure among controls, and it approximates the relative risk in most situations, most closely when the outcome is rare. As in a cohort study, the analytic plan for a case–control study typically involves advanced statistical methods to adjust for multiple potential confounders.
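A crude odds ratio can be computed directly from a 2 × 2 table of exposure by case status. The Python sketch below uses invented counts and hypothetical function names. The confidence interval uses the standard log-based (Woolf) approximation, which is an addition here and not described in the article. It does not adjust for confounders, and a matched design would usually call for a matched analysis such as conditional logistic regression.

```python
import math


def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Crude odds ratio: odds of exposure among cases over odds among controls."""
    odds_cases = exposed_cases / unexposed_cases
    odds_controls = exposed_controls / unexposed_controls
    return odds_cases / odds_controls


def woolf_confidence_interval(exposed_cases, unexposed_cases,
                              exposed_controls, unexposed_controls, z=1.96):
    """Approximate 95% confidence interval on the log odds ratio scale."""
    log_or = math.log(odds_ratio(exposed_cases, unexposed_cases,
                                 exposed_controls, unexposed_controls))
    standard_error = math.sqrt(1 / exposed_cases + 1 / unexposed_cases
                               + 1 / exposed_controls + 1 / unexposed_controls)
    return math.exp(log_or - z * standard_error), math.exp(log_or + z * standard_error)


# Invented counts: 40 of 100 cases exposed, 20 of 100 controls exposed.
estimate = odds_ratio(40, 60, 20, 80)
lower, upper = woolf_confidence_interval(40, 60, 20, 80)
print(f"OR = {estimate:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

All four cell counts must be greater than zero for this calculation to work.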

The major strengths of the case–control design are statistical efficiency (i.e., uses fewer data to quantify a drug–outcome association than would be required in a cohort study), efficiency for studying rare outcomes, efficiency for studying conditions with long latency periods, efficiency for handling the time-varying nature of drug exposures, and relatively low cost. The weaknesses of case–control studies include inefficiency for studying rare exposures, difficulty of selecting unbiased controls, and inability to directly calculate incidence rates of outcomes.

LIMITATIONS OF COHORT AND CASE–CONTROL STUDIES

Bias and Confounding

Observational studies are methodologically difficult, susceptible to bias and confounding, and difficult to interpret, given the many types of bias potentially at play. For these reasons, observational studies are limited to studying drug–outcome associations and cannot be used to measure the causal effects of drugs. Recent methodologic advances in design and analytic techniques in pharmacoepidemiology have helped to combat the various types of selection bias, information bias, and confounding at play in cohort and case–control studies (see Appendix 1 , available online at www.cjhp-online.ca/index.php/cjhp/issue/view/104/showToc ). Many of these techniques can account for multiple potential confounders simultaneously. A comprehensive review of these techniques is beyond the scope of this article, but such reviews may be found elsewhere. 25 , 31 – 33 Bias and confounding result in spurious drug–outcome associations and are introduced at both the design and analysis stages. Appendix 2 (available online at www.cjhp-online.ca/index.php/cjhp/issue/view/104/showToc ) illustrates the origin of bias in relation to the cohort design, and Appendix 3 (available online at www.cjhp-online.ca/index.php/cjhp/issue/view/104/showToc ) lists common types of bias that occur in cohort and case–control studies of drug effects.

Study of Intended Drug Effects

Cohort and case–control studies are powerful approaches for estimating the association between drugs and unintended outcomes 34 ; however, their use for studying the intended effects of drugs has spurred debate in the past and remains controversial today. 35 – 37 This controversy has arisen because the propensity for bias and confounding is much higher when estimating the intended effects of drugs (i.e., benefits). 37 This higher propensity for bias is in turn due to the nonrandom nature of prescribing practices and is often referred to as “confounding by the reason for the prescription” or simply “confounding by indication”. Confounding by indication is expected with these types of studies, as it is good medical practice to prescribe intentionally and rationally, as opposed to prescribing according to a random process. 38 Some authors strongly recommend against using observational studies to study intended effects, suggesting instead that we consider restricting our research questions to those of unintended effects because confounding by indication introduces uncontrollable bias. 31 , 34 , 39 , 40 The literature contains numerous examples of confounding by indication. A most striking example is the distorted 27-fold increased risk of thrombotic events associated with use of warfarin, when in fact warfarin prevents thrombotic events. 39 Another example of confounding by indication is the observed relationship between short-acting ß-receptor agonists (e.g., salbutamol) and increased risk of death from asthma. 41 Of course confounding by indication is not verifiable, but it must be considered when studying the intended effects of drugs.

GENERAL CONSIDERATIONS IN CONDUCTING A COHORT OR CASE–CONTROL STUDY

Protocol and Study Team

Cohort and case–control studies aim to quantitatively estimate the association between a drug exposure and outcome. Before embarking on a cohort or case–control study, the investigators must develop a well-articulated and focused research question. 42 Furthermore, the study protocol, including a detailed methodologic and analytic plan, should be consistent with international guidelines. 43 , 44 The study team should have appropriate clinical and methodologic expertise. Clinical expertise is essential for developing exposure and outcome definitions, as well as for understanding the overall clinical context of how the research question fits into the current body of knowledge. Methodologic expertise is critical for ensuring that robust methods are used, to minimize bias and confounding.

Data Sources

To estimate a drug–outcome association in a cohort or case–control study, accurate and comprehensive data must be collected on the drug exposures and outcomes of interest. Study investigators may collect data after study inception or may use previously collected data. The major advantage of prospectively collecting the data (primary data collection) is that the investigators have control over what information is collected; in contrast, when previously collected data are used (secondary data collection), the investigators are limited to the information already collected. Data may often be missing from or inaccurately recorded in secondary data sources, which creates challenges when the data are used for research purposes. Although previously collected data are considered retrospective to study inception, the data themselves are often collected prospectively; therefore, use of the terms “retrospective” and “prospective” may be misleading and usually does not provide any clarity in terms of important design characteristics. 25 There are 3 main sources of existing data: administrative data, medical records, and surveys. Special considerations and the advantages and disadvantages of these secondary data sources are discussed elsewhere. 45 , 46 For studying drug effects, secondary data sources are more commonly used than primary data collection, primarily because of gains in time, cost, and statistical efficiency. Furthermore, use of secondary data sources avoids the Hawthorne effect, whereby knowledge of participation in a study changes the behaviour of study participants and may lead to bias.

CONCLUSIONS

Pharmacists use knowledge from cohort and case–control studies to inform patients, clinicians, and the general public about drug effects. At a basic level, cohort and case–control studies quantitatively estimate the relation between exposures and outcomes. They represent rigorous study designs for answering drug safety and effectiveness questions, with case–control studies being more prone to bias. The methodologic rigour of cohort and case–control studies evaluating drug–outcome associations is advancing, and approaches are being developed and refined that limit the generation of misleading study results. Indeed, both RCTs and observational studies are necessary, and neither is sufficient to learn about the totality of drug effects in the population.

Acknowledgments

John-Michael Gamble is supported by a New Investigator Award in drug safety and effectiveness from the Canadian Institutes of Health Research and a Clinician Scientist Award from the Canadian Diabetes Association.

This article is the sixth in the CJHP Research Primer Series, an initiative of the CJHP Editorial Board and the CSHP Research Committee. The planned 2-year series is intended to appeal to relatively inexperienced researchers, with the goal of building research capacity among practising pharmacists. The articles, presenting simple but rigorous guidance to encourage and support novice researchers, are being solicited from authors with appropriate expertise.

Previous articles in this series:

Bond CM. The research jigsaw: how to get started. Can J Hosp Pharm. 2014;67(1):28–30.

Tully MP. Research: articulating questions, generating hypotheses, and choosing study designs. Can J Hosp Pharm. 2014;67(1):31–4.

Loewen P. Ethical issues in pharmacy practice research: an introductory guide. Can J Hosp Pharm. 2014;67(2):133–7.

Tsuyuki RT. Designing pharmacy practice research trials. Can J Hosp Pharm. 2014;67(3):226–9.

Bresee LC. An introduction to developing surveys for pharmacy practice research. Can J Hosp Pharm. 2014;67(4):286–91.

Competing interests: None declared.

COMMENTS

  1. What Is a Case-Control Study?

    Examples of case-control studies. Case-control studies are common in fields like epidemiology, healthcare, and psychology. Example: Epidemiology case-control study You are examining the relationship between drinking water contamination and the incidence of gastrointestinal illnesses like gastroenteritis. Here, the case group would be individuals who have been diagnosed with a gastrointestinal ...

  2. Case Control Studies

    A case-control study is a type of observational study commonly used to look at factors associated with diseases or outcomes.[1] ... For example, if a disease developed in 1 in 1000 people per year (0.001/year) then in ten years one would expect about 10 cases of a disease to exist in a group of 1000 people. ... Epidemiology Of Study Design ...

  3. Epidemiology in Practice: Case-Control Studies

    Introduction. A case-control study is designed to help determine if an exposure is associated with an outcome (i.e., disease or condition of interest). In theory, the case-control study can be described simply. First, identify the cases (a group known to have the outcome) and the controls (a group known to be free of the outcome).

  4. Case Control Study: Definition, Benefits & Examples

    A case control study is a retrospective, observational study that compares two existing groups. Researchers form these groups based on the existence of a condition in the case group and the lack of that condition in the control group. They evaluate the differences in the histories between these two groups looking for factors that might cause a ...

  5. Methodology Series Module 2: Case-control Studies

    'Individual matching' is one common technique used in case-control study. For example, in the above mentioned metabolic syndrome and psoriasis, we can decide that for each case enrolled in the study, we will enroll a control that is matched for sex and age (+/- 2 years). ... In 'The Dictionary of Epidemiology' by Porta (2014), the ...

  6. A Practical Overview of Case-Control Studies in Clinical Practice

    General Overview of Case-Control Studies. In observational studies, also called epidemiologic studies, the primary objective is to discover and quantify an association between exposures and the outcome of interest, in hopes of drawing causal inference. Observational studies can have a retrospective study design, a prospective design, a cross ...

  7. A Practical Overview of Case-Control Studies in Clinical Practice

    Case-control studies are one of the major observational study designs for performing clinical research. The advantages of these study designs over other study designs are that they are relatively quick to perform, economical, and easy to design and implement. Case-control studies are particularly appropriate for studying disease outbreaks, rare diseases, or outcomes of interest. This article ...

  8. Case Control Studies

    A case-control study is a type of observational study commonly used to look at factors associated with diseases or outcomes. The case-control study starts with a group of cases, which are the individuals who have the outcome of interest. ... For example, if a disease developed in 1 in 1000 people per year (0.001/year) then in ten years one ...

  9. Case-control studies: basic concepts

    Introductory textbooks of epidemiology often fall back on methods of control sampling, ... Hospital-based case-control studies. In most examples presented earlier, the patients are assumed to be sampled from a defined geographical population (via disease registries or by having access to all hospitals of some region), and control subjects are ...

  10. Case-control study—design, measures, and classic examples

    Abstract. Case-control studies are a type of observational epidemiological study that involve comparing two groups of individuals; one group with a defined outcome and the other without (normal). By doing this, one can look back in time to analyze the possible factors that may have contributed to the development of that outcome.

  11. PDF Case-control studies: an efficient study design

    For example, a case-control study could be used to determine whether long-term use of ... Lewallen S, Courtright P. Epidemiology in practice: case-control studies. Community Eye Health. 1998;11:57 ...

  12. Example of a Case-Control Study

    Example of a Case-Control Study. The Salmonella outbreak above occurred in a small, well-defined cohort, and the overall attack rate was 58%. A cohort study design works well in these circumstances. However, in most outbreaks the population is not well defined, and cohort studies are not feasible. A good example of this is an actual outbreak of ...

  13. PDF Case Control Studies

    Karin B. Yeatts, PhD, MS. Case-Control Studies. Case-control studies are used to determine if there is an association between an exposure and a specific health outcome. These studies proceed from effect (e.g. health outcome, condition, disease) to cause (exposure). Case-control studies assess whether exposure is disproportionately distributed ...

  14. Case Control

    Case control studies are observational because no intervention is attempted and no attempt is made to alter the course of the disease. The goal is to retrospectively determine the exposure to the risk factor of interest from each of the two groups of individuals: cases and controls. These studies are designed to estimate odds.

  15. Case Control Study: Definition & Examples

    Examples. A case-control study is an observational study in which researchers analyze two groups of people (cases and controls) to look at factors associated with particular diseases or outcomes. Below are some examples of case-control studies: Investigating the impact of exposure to daylight on the health of office workers (Boubekri et al., 2014).

  16. Case-control and Cohort studies: A brief overview

    Introduction. Case-control and cohort studies are observational studies that lie near the middle of the hierarchy of evidence. These types of studies, along with randomised controlled trials, constitute analytical studies, whereas case reports and case series define descriptive studies (1). Although these studies are not ranked as highly as ...

  17. Epidemiologic Case Study Resources

    The case studies include links to websites and videos, discussion and interactive questions, plus a full package of instructor resources including a helpful instructor's guide with sample answers to discussion questions, and a test bank. The 6 Interactive Case Studies include: 1. Clinical course of COVID-19 2. Epidemiology of COVID-19 3.

  18. Observational Studies: Cohort and Case-Control Studies

    Cohort studies and case-control studies are two primary types of observational studies that aid in evaluating associations between diseases and exposures. In this review article, we describe these study designs, methodological issues, and provide examples from the plastic surgery literature. Keywords: observational studies, case-control study ...

  19. Case-control and case-cohort studies

    This chapter addresses Case-control and case-cohort studies. In a Case-control study, one samples persons based on their disease outcome, so the fraction of diseased persons in a Case-control study is usually known (at least approximately) before data collection. In a cohort (follow-up) study, the relationship between some exposure and disease ...

  20. Case-control study

    case-control study, in epidemiology, observational (nonexperimental) study design used to ascertain information on differences in suspected exposures and outcomes between individuals with a disease of interest (cases) and comparable individuals who do not have the disease (controls). Analysis yields an odds ratio (OR) that reflects the relative probabilities of exposure in the two populations.

  21. Case-control study

    A case-control study (also known as case-referent study) is a type of observational study in which two existing groups differing in outcome are identified and compared on the basis of some supposed causal attribute. Case-control studies are often used to identify factors that may contribute to a medical condition by comparing subjects who have the condition with patients who do not have ...

  22. The case for case-cohort: An applied epidemiologist's guide to re

    For example, if investigators are interested in studying the role of genetic factors as they relate to an exposure, an outcome, or their interaction, it is fairly straight-forward to genotype the same sample that was selected for a prior nested case-control or case-cohort study. 27,31,32 In doing this, investigators using case-cohort sampling ...

  23. A step-by-step guide to causal study design using real-world data

    A case study was selected to demonstrate an application of the guide. An eight-step guide to causal study design was created, integrating essential concepts from the literature, anchored into conceptual groupings according to natural steps in the study design process. ... For example, a study that uses propensity score-based methods to balance ...

  24. Associations between deep venous thrombosis and thyroid diseases: a two

    Some previous observational studies have linked deep venous thrombosis (DVT) to thyroid diseases; however, the findings were contradictory. This study aimed to investigate whether some common thyroid diseases can cause DVT using a two-sample Mendelian randomization (MR) approach. This two-sample MR study used single nucleotide polymorphisms (SNPs) identified by the FinnGen genome-wide ...

  25. An Introduction to the Fundamentals of Cohort and Case-Control Studies

    Design. In a case-control study, a number of cases and noncases (controls) are identified, and the occurrence of one or more prior exposures is compared between groups to evaluate drug-outcome associations ( Figure 1 ). A case-control study runs in reverse relative to a cohort study. 21 As such, study inception occurs when a patient ...
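
Several of the excerpts above mention individual matching, in which each enrolled case is paired with a control of the same sex and a similar age (within 2 years in the example given). A minimal sketch of that idea follows. The record layout, the identifiers, and the greedy "first eligible control" rule are assumptions made for illustration; real studies often use more careful matching procedures.

```python
def match_controls(cases, controls, age_tolerance=2):
    """Greedy 1:1 matching on sex and age (within age_tolerance years).

    Each control is used at most once. A case with no eligible control
    is paired with None. Illustrative only.
    """
    available = list(controls)  # copy so the caller's list is not modified
    pairs = []
    for case in cases:
        for index, control in enumerate(available):
            same_sex = control["sex"] == case["sex"]
            close_age = abs(control["age"] - case["age"]) <= age_tolerance
            if same_sex and close_age:
                pairs.append((case["id"], control["id"]))
                del available[index]  # remove so it cannot be reused
                break
        else:
            # No eligible control remained for this case.
            pairs.append((case["id"], None))
    return pairs


# Hypothetical participants.
cases = [
    {"id": "case1", "sex": "F", "age": 34},
    {"id": "case2", "sex": "M", "age": 50},
    {"id": "case3", "sex": "F", "age": 70},
]
controls = [
    {"id": "ctrl1", "sex": "F", "age": 36},
    {"id": "ctrl2", "sex": "M", "age": 45},
    {"id": "ctrl3", "sex": "M", "age": 51},
]

pairs = match_controls(cases, controls)
for case_id, control_id in pairs:
    print(case_id, "->", control_id)
```

Because the rule is greedy, the order of the cases and controls can change which pairs are formed, and some cases may be left unmatched even when a different pairing would have matched everyone.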